week 7: multiple regression - princeton university · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 +...
TRANSCRIPT
![Page 1: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/1.jpg)
Week 7: Multiple Regression
Brandon Stewart1
Princeton
October 22, 24, 2018
1These slides are heavily influenced by Matt Blackwell, Adam Glynn, Justin Grimmer,Jens Hainmueller and Erin Hartman.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 1 / 140
![Page 2: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/2.jpg)
Where We’ve Been and Where We’re Going...
Last WeekI regression with two variablesI omitted variables, multicollinearity, interactions
This WeekI Monday:
F matrix form of linear regressionF t-tests, F-tests and general linear hypothesis tests
I Wednesday:F problems with p-valuesF agnostic regressionF the bootstrap
Next WeekI break!I then . . . diagnostics
Long RunI probability → inference → regression → causal inference
Questions?
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 2 / 140
![Page 3: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/3.jpg)
Where We’ve Been and Where We’re Going...
Last WeekI regression with two variablesI omitted variables, multicollinearity, interactions
This WeekI Monday:
F matrix form of linear regressionF t-tests, F-tests and general linear hypothesis tests
I Wednesday:F problems with p-valuesF agnostic regressionF the bootstrap
Next WeekI break!I then . . . diagnostics
Long RunI probability → inference → regression → causal inference
Questions?
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 2 / 140
![Page 4: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/4.jpg)
Where We’ve Been and Where We’re Going...
Last WeekI regression with two variablesI omitted variables, multicollinearity, interactions
This WeekI Monday:
F matrix form of linear regressionF t-tests, F-tests and general linear hypothesis tests
I Wednesday:F problems with p-valuesF agnostic regressionF the bootstrap
Next WeekI break!I then . . . diagnostics
Long RunI probability → inference → regression → causal inference
Questions?
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 2 / 140
![Page 5: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/5.jpg)
Where We’ve Been and Where We’re Going...
Last WeekI regression with two variablesI omitted variables, multicollinearity, interactions
This WeekI Monday:
F matrix form of linear regressionF t-tests, F-tests and general linear hypothesis tests
I Wednesday:F problems with p-valuesF agnostic regressionF the bootstrap
Next WeekI break!I then . . . diagnostics
Long RunI probability → inference → regression → causal inference
Questions?
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 2 / 140
![Page 6: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/6.jpg)
Where We’ve Been and Where We’re Going...
Last WeekI regression with two variablesI omitted variables, multicollinearity, interactions
This WeekI Monday:
F matrix form of linear regressionF t-tests, F-tests and general linear hypothesis tests
I Wednesday:F problems with p-valuesF agnostic regressionF the bootstrap
Next WeekI break!I then . . . diagnostics
Long RunI probability → inference → regression → causal inference
Questions?
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 2 / 140
![Page 7: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/7.jpg)
1 Matrix Form of Regression
2 OLS inference in matrix form
3 Standard Hypothesis Tests
4 Testing Joint Significance
5 Testing Linear Hypotheses: The General Case
6 Fun With(out) Weights
7 Appendix: Derivations and Consistency
8 The Problems with p-values
9 Agnostic Regression
10 Inference via the Bootstrap
11 Fun With Weights
12 Appendix: Tricky p-value Example
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 3 / 140
![Page 8: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/8.jpg)
1 Matrix Form of Regression
2 OLS inference in matrix form
3 Standard Hypothesis Tests
4 Testing Joint Significance
5 Testing Linear Hypotheses: The General Case
6 Fun With(out) Weights
7 Appendix: Derivations and Consistency
8 The Problems with p-values
9 Agnostic Regression
10 Inference via the Bootstrap
11 Fun With Weights
12 Appendix: Tricky p-value Example
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 3 / 140
![Page 9: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/9.jpg)
The Linear Model with New Notation
Remember that we wrote the linear model as the following for alli ∈ [1, . . . , n]:
yi = β0 + xiβ1 + ziβ2 + ui
Imagine we had an n of 4. We could write out each formula:
y1 = β0 + x1β1 + z1β2 + u1 (unit 1)
y2 = β0 + x2β1 + z2β2 + u2 (unit 2)
y3 = β0 + x3β1 + z3β2 + u3 (unit 3)
y4 = β0 + x4β1 + z4β2 + u4 (unit 4)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 4 / 140
![Page 10: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/10.jpg)
The Linear Model with New Notation
Remember that we wrote the linear model as the following for alli ∈ [1, . . . , n]:
yi = β0 + xiβ1 + ziβ2 + ui
Imagine we had an n of 4. We could write out each formula:
y1 = β0 + x1β1 + z1β2 + u1 (unit 1)
y2 = β0 + x2β1 + z2β2 + u2 (unit 2)
y3 = β0 + x3β1 + z3β2 + u3 (unit 3)
y4 = β0 + x4β1 + z4β2 + u4 (unit 4)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 4 / 140
![Page 11: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/11.jpg)
The Linear Model with New Notation
Remember that we wrote the linear model as the following for alli ∈ [1, . . . , n]:
yi = β0 + xiβ1 + ziβ2 + ui
Imagine we had an n of 4. We could write out each formula:
y1 = β0 + x1β1 + z1β2 + u1 (unit 1)
y2 = β0 + x2β1 + z2β2 + u2 (unit 2)
y3 = β0 + x3β1 + z3β2 + u3 (unit 3)
y4 = β0 + x4β1 + z4β2 + u4 (unit 4)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 4 / 140
![Page 12: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/12.jpg)
The Linear Model with New Notation
Remember that we wrote the linear model as the following for alli ∈ [1, . . . , n]:
yi = β0 + xiβ1 + ziβ2 + ui
Imagine we had an n of 4. We could write out each formula:
y1 = β0 + x1β1 + z1β2 + u1 (unit 1)
y2 = β0 + x2β1 + z2β2 + u2 (unit 2)
y3 = β0 + x3β1 + z3β2 + u3 (unit 3)
y4 = β0 + x4β1 + z4β2 + u4 (unit 4)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 4 / 140
![Page 13: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/13.jpg)
The Linear Model with New Notation
Remember that we wrote the linear model as the following for alli ∈ [1, . . . , n]:
yi = β0 + xiβ1 + ziβ2 + ui
Imagine we had an n of 4. We could write out each formula:
y1 = β0 + x1β1 + z1β2 + u1 (unit 1)
y2 = β0 + x2β1 + z2β2 + u2 (unit 2)
y3 = β0 + x3β1 + z3β2 + u3 (unit 3)
y4 = β0 + x4β1 + z4β2 + u4 (unit 4)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 4 / 140
![Page 14: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/14.jpg)
The Linear Model with New Notation
Remember that we wrote the linear model as the following for alli ∈ [1, . . . , n]:
yi = β0 + xiβ1 + ziβ2 + ui
Imagine we had an n of 4. We could write out each formula:
y1 = β0 + x1β1 + z1β2 + u1 (unit 1)
y2 = β0 + x2β1 + z2β2 + u2 (unit 2)
y3 = β0 + x3β1 + z3β2 + u3 (unit 3)
y4 = β0 + x4β1 + z4β2 + u4 (unit 4)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 4 / 140
![Page 15: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/15.jpg)
The Linear Model with New Notation
Remember that we wrote the linear model as the following for alli ∈ [1, . . . , n]:
yi = β0 + xiβ1 + ziβ2 + ui
Imagine we had an n of 4. We could write out each formula:
y1 = β0 + x1β1 + z1β2 + u1 (unit 1)
y2 = β0 + x2β1 + z2β2 + u2 (unit 2)
y3 = β0 + x3β1 + z3β2 + u3 (unit 3)
y4 = β0 + x4β1 + z4β2 + u4 (unit 4)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 4 / 140
![Page 16: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/16.jpg)
The Linear Model with New Notation
Remember that we wrote the linear model as the following for alli ∈ [1, . . . , n]:
yi = β0 + xiβ1 + ziβ2 + ui
Imagine we had an n of 4. We could write out each formula:
y1 = β0 + x1β1 + z1β2 + u1 (unit 1)
y2 = β0 + x2β1 + z2β2 + u2 (unit 2)
y3 = β0 + x3β1 + z3β2 + u3 (unit 3)
y4 = β0 + x4β1 + z4β2 + u4 (unit 4)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 4 / 140
![Page 17: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/17.jpg)
The Linear Model with New Notation
Remember that we wrote the linear model as the following for alli ∈ [1, . . . , n]:
yi = β0 + xiβ1 + ziβ2 + ui
Imagine we had an n of 4. We could write out each formula:
y1 = β0 + x1β1 + z1β2 + u1 (unit 1)
y2 = β0 + x2β1 + z2β2 + u2 (unit 2)
y3 = β0 + x3β1 + z3β2 + u3 (unit 3)
y4 = β0 + x4β1 + z4β2 + u4 (unit 4)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 4 / 140
![Page 18: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/18.jpg)
The Linear Model with New Notation
y1 = β0 + x1β1 + z1β2 + u1 (unit 1)
y2 = β0 + x2β1 + z2β2 + u2 (unit 2)
y3 = β0 + x3β1 + z3β2 + u3 (unit 3)
y4 = β0 + x4β1 + z4β2 + u4 (unit 4)
We can write this as:y1
y2
y3
y4
=
1111
β0 +
x1
x2
x3
x4
β1 +
z1
z2
z3
z4
β2 +
u1
u2
u3
u4
Outcome is a linear combination of the the x, z, and u vectors
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 5 / 140
![Page 19: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/19.jpg)
The Linear Model with New Notation
y1 = β0 + x1β1 + z1β2 + u1 (unit 1)
y2 = β0 + x2β1 + z2β2 + u2 (unit 2)
y3 = β0 + x3β1 + z3β2 + u3 (unit 3)
y4 = β0 + x4β1 + z4β2 + u4 (unit 4)
We can write this as:
y1
y2
y3
y4
=
1111
β0 +
x1
x2
x3
x4
β1 +
z1
z2
z3
z4
β2 +
u1
u2
u3
u4
Outcome is a linear combination of the the x, z, and u vectors
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 5 / 140
![Page 20: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/20.jpg)
The Linear Model with New Notation
y1 = β0 + x1β1 + z1β2 + u1 (unit 1)
y2 = β0 + x2β1 + z2β2 + u2 (unit 2)
y3 = β0 + x3β1 + z3β2 + u3 (unit 3)
y4 = β0 + x4β1 + z4β2 + u4 (unit 4)
We can write this as:y1
y2
y3
y4
=
1111
β0 +
x1
x2
x3
x4
β1 +
z1
z2
z3
z4
β2 +
u1
u2
u3
u4
Outcome is a linear combination of the the x, z, and u vectors
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 5 / 140
![Page 21: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/21.jpg)
The Linear Model with New Notation
y1 = β0 + x1β1 + z1β2 + u1 (unit 1)
y2 = β0 + x2β1 + z2β2 + u2 (unit 2)
y3 = β0 + x3β1 + z3β2 + u3 (unit 3)
y4 = β0 + x4β1 + z4β2 + u4 (unit 4)
We can write this as:y1
y2
y3
y4
=
1111
β0 +
x1
x2
x3
x4
β1 +
z1
z2
z3
z4
β2 +
u1
u2
u3
u4
Outcome is a linear combination of the the x, z, and u vectors
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 5 / 140
![Page 22: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/22.jpg)
Grouping Things into Matrices
Can we write this in a more compact form?Yes! Let X and β be the following:
X(4×3)
=
1 x1 z1
1 x2 z2
1 x3 z3
1 x4 z4
β(3×1)
=
β0
β1
β2
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 6 / 140
![Page 23: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/23.jpg)
Grouping Things into Matrices
Can we write this in a more compact form?
Yes! Let X and β be the following:
X(4×3)
=
1 x1 z1
1 x2 z2
1 x3 z3
1 x4 z4
β(3×1)
=
β0
β1
β2
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 6 / 140
![Page 24: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/24.jpg)
Grouping Things into Matrices
Can we write this in a more compact form?Yes! Let X and β be the following:
X(4×3)
=
1 x1 z1
1 x2 z2
1 x3 z3
1 x4 z4
β(3×1)
=
β0
β1
β2
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 6 / 140
![Page 25: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/25.jpg)
Grouping Things into Matrices
Can we write this in a more compact form?Yes! Let X and β be the following:
X(4×3)
=
1 x1 z1
1 x2 z2
1 x3 z3
1 x4 z4
β(3×1)
=
β0
β1
β2
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 6 / 140
![Page 26: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/26.jpg)
Grouping Things into Matrices
Can we write this in a more compact form?Yes! Let X and β be the following:
X(4×3)
=
1 x1 z1
1 x2 z2
1 x3 z3
1 x4 z4
β(3×1)
=
β0
β1
β2
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 6 / 140
![Page 27: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/27.jpg)
Back to Regression
X is the n × (k + 1) design matrix of independent variables
β be the (k + 1)× 1 column vector of coefficients.
Xβ will be n × 1:
Xβ = β0 + β1x1 + β2x2 + · · ·+ βkxk
We can compactly write the linear model as the following:
y(n×1)
= Xβ(n×1)
+ u(n×1)
We can also write this at the individual level, where x′i is the ith rowof X:
yi = x′iβ + ui
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 7 / 140
![Page 28: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/28.jpg)
Back to Regression
X is the n × (k + 1) design matrix of independent variables
β be the (k + 1)× 1 column vector of coefficients.
Xβ will be n × 1:
Xβ = β0 + β1x1 + β2x2 + · · ·+ βkxk
We can compactly write the linear model as the following:
y(n×1)
= Xβ(n×1)
+ u(n×1)
We can also write this at the individual level, where x′i is the ith rowof X:
yi = x′iβ + ui
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 7 / 140
![Page 29: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/29.jpg)
Back to Regression
X is the n × (k + 1) design matrix of independent variables
β be the (k + 1)× 1 column vector of coefficients.
Xβ will be n × 1:
Xβ = β0 + β1x1 + β2x2 + · · ·+ βkxk
We can compactly write the linear model as the following:
y(n×1)
= Xβ(n×1)
+ u(n×1)
We can also write this at the individual level, where x′i is the ith rowof X:
yi = x′iβ + ui
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 7 / 140
![Page 30: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/30.jpg)
Back to Regression
X is the n × (k + 1) design matrix of independent variables
β be the (k + 1)× 1 column vector of coefficients.
Xβ will be n × 1:
Xβ = β0 + β1x1 + β2x2 + · · ·+ βkxk
We can compactly write the linear model as the following:
y(n×1)
= Xβ(n×1)
+ u(n×1)
We can also write this at the individual level, where x′i is the ith rowof X:
yi = x′iβ + ui
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 7 / 140
![Page 31: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/31.jpg)
Back to Regression
X is the n × (k + 1) design matrix of independent variables
β be the (k + 1)× 1 column vector of coefficients.
Xβ will be n × 1:
Xβ = β0 + β1x1 + β2x2 + · · ·+ βkxk
We can compactly write the linear model as the following:
y(n×1)
= Xβ(n×1)
+ u(n×1)
We can also write this at the individual level, where x′i is the ith rowof X:
yi = x′iβ + ui
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 7 / 140
![Page 32: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/32.jpg)
Back to Regression
X is the n × (k + 1) design matrix of independent variables
β be the (k + 1)× 1 column vector of coefficients.
Xβ will be n × 1:
Xβ = β0 + β1x1 + β2x2 + · · ·+ βkxk
We can compactly write the linear model as the following:
y(n×1)
= Xβ(n×1)
+ u(n×1)
We can also write this at the individual level, where x′i is the ith rowof X:
yi = x′iβ + ui
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 7 / 140
![Page 33: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/33.jpg)
Back to Regression
X is the n × (k + 1) design matrix of independent variables
β be the (k + 1)× 1 column vector of coefficients.
Xβ will be n × 1:
Xβ = β0 + β1x1 + β2x2 + · · ·+ βkxk
We can compactly write the linear model as the following:
y(n×1)
= Xβ(n×1)
+ u(n×1)
We can also write this at the individual level, where x′i is the ith rowof X:
yi = x′iβ + ui
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 7 / 140
![Page 34: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/34.jpg)
Back to Regression
X is the n × (k + 1) design matrix of independent variables
β be the (k + 1)× 1 column vector of coefficients.
Xβ will be n × 1:
Xβ = β0 + β1x1 + β2x2 + · · ·+ βkxk
We can compactly write the linear model as the following:
y(n×1)
= Xβ(n×1)
+ u(n×1)
We can also write this at the individual level, where x′i is the ith rowof X:
yi = x′iβ + ui
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 7 / 140
![Page 35: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/35.jpg)
Multiple Linear Regression in Matrix Form
Let β be the matrix of estimated regression coefficients and y be thevector of fitted values:
β =
β0
β1...
βk
y = Xβ
It might be helpful to see this again more written out:
y =
y1
y2
...yn
= Xβ =
1β0 + x11β1 + x12β2 + · · ·+ x1K βk1β0 + x21β1 + x22β2 + · · ·+ x2K βk
...
1β0 + xn1β1 + xn2β2 + · · ·+ xnK βk
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 8 / 140
![Page 36: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/36.jpg)
Multiple Linear Regression in Matrix Form
Let β be the matrix of estimated regression coefficients and y be thevector of fitted values:
β =
β0
β1...
βk
y = Xβ
It might be helpful to see this again more written out:
y =
y1
y2
...yn
= Xβ =
1β0 + x11β1 + x12β2 + · · ·+ x1K βk1β0 + x21β1 + x22β2 + · · ·+ x2K βk
...
1β0 + xn1β1 + xn2β2 + · · ·+ xnK βk
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 8 / 140
![Page 37: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/37.jpg)
Multiple Linear Regression in Matrix Form
Let β be the matrix of estimated regression coefficients and y be thevector of fitted values:
β =
β0
β1...
βk
y = Xβ
It might be helpful to see this again more written out:
y =
y1
y2
...yn
= Xβ =
1β0 + x11β1 + x12β2 + · · ·+ x1K βk1β0 + x21β1 + x22β2 + · · ·+ x2K βk
...
1β0 + xn1β1 + xn2β2 + · · ·+ xnK βk
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 8 / 140
![Page 38: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/38.jpg)
Multiple Linear Regression in Matrix Form
Let β be the matrix of estimated regression coefficients and y be thevector of fitted values:
β =
β0
β1...
βk
y = Xβ
It might be helpful to see this again more written out:
y =
y1
y2
...yn
= Xβ =
1β0 + x11β1 + x12β2 + · · ·+ x1K βk1β0 + x21β1 + x22β2 + · · ·+ x2K βk
...
1β0 + xn1β1 + xn2β2 + · · ·+ xnK βk
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 8 / 140
![Page 39: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/39.jpg)
Multiple Linear Regression in Matrix Form
Let β be the matrix of estimated regression coefficients and y be thevector of fitted values:
β =
β0
β1...
βk
y = Xβ
It might be helpful to see this again more written out:
y =
y1
y2
...yn
= Xβ =
1β0 + x11β1 + x12β2 + · · ·+ x1K βk1β0 + x21β1 + x22β2 + · · ·+ x2K βk
...
1β0 + xn1β1 + xn2β2 + · · ·+ xnK βk
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 8 / 140
![Page 40: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/40.jpg)
Multiple Linear Regression in Matrix Form
Let β be the matrix of estimated regression coefficients and y be thevector of fitted values:
β =
β0
β1...
βk
y = Xβ
It might be helpful to see this again more written out:
y =
y1
y2
...yn
= Xβ =
1β0 + x11β1 + x12β2 + · · ·+ x1K βk1β0 + x21β1 + x22β2 + · · ·+ x2K βk
...
1β0 + xn1β1 + xn2β2 + · · ·+ xnK βk
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 8 / 140
![Page 41: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/41.jpg)
Multiple Linear Regression in Matrix Form
Let β be the matrix of estimated regression coefficients and y be thevector of fitted values:
β =
β0
β1...
βk
y = Xβ
It might be helpful to see this again more written out:
y =
y1
y2
...yn
= Xβ =
1β0 + x11β1 + x12β2 + · · ·+ x1K βk1β0 + x21β1 + x22β2 + · · ·+ x2K βk
...
1β0 + xn1β1 + xn2β2 + · · ·+ xnK βk
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 8 / 140
![Page 42: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/42.jpg)
Multiple Linear Regression in Matrix Form
Let β be the matrix of estimated regression coefficients and y be thevector of fitted values:
β =
β0
β1...
βk
y = Xβ
It might be helpful to see this again more written out:
y =
y1
y2
...yn
= Xβ =
1β0 + x11β1 + x12β2 + · · ·+ x1K βk1β0 + x21β1 + x22β2 + · · ·+ x2K βk
...
1β0 + xn1β1 + xn2β2 + · · ·+ xnK βk
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 8 / 140
![Page 43: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/43.jpg)
Residuals
We can easily write the residuals in matrix form:
u = y − Xβ
Our goal as usual is to minimize the sum of the squared residuals,which we saw earlier we can write:
u′u = (y − Xβ)′(y − Xβ)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 9 / 140
![Page 44: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/44.jpg)
Residuals
We can easily write the residuals in matrix form:
u = y − Xβ
Our goal as usual is to minimize the sum of the squared residuals,which we saw earlier we can write:
u′u = (y − Xβ)′(y − Xβ)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 9 / 140
![Page 45: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/45.jpg)
Residuals
We can easily write the residuals in matrix form:
u = y − Xβ
Our goal as usual is to minimize the sum of the squared residuals,which we saw earlier we can write:
u′u = (y − Xβ)′(y − Xβ)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 9 / 140
![Page 46: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/46.jpg)
Residuals
We can easily write the residuals in matrix form:
u = y − Xβ
Our goal as usual is to minimize the sum of the squared residuals,which we saw earlier we can write:
u′u = (y − Xβ)′(y − Xβ)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 9 / 140
![Page 47: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/47.jpg)
OLS Estimator in Matrix Form
Goal: minimize the sum of the squared residuals
Take (matrix) derivatives, set equal to 0 (see Appendix)
Resulting first order conditions:
X′(y − Xβ) = 0
Rearranging:X′Xβ = X′y
In order to isolate β, we need to move the X′X term to the other sideof the equals sign.
We’ve learned about matrix multiplication, but what about matrix“division”?
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 10 / 140
![Page 48: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/48.jpg)
OLS Estimator in Matrix Form
Goal: minimize the sum of the squared residuals
Take (matrix) derivatives, set equal to 0 (see Appendix)
Resulting first order conditions:
X′(y − Xβ) = 0
Rearranging:X′Xβ = X′y
In order to isolate β, we need to move the X′X term to the other sideof the equals sign.
We’ve learned about matrix multiplication, but what about matrix“division”?
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 10 / 140
![Page 49: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/49.jpg)
OLS Estimator in Matrix Form
Goal: minimize the sum of the squared residuals
Take (matrix) derivatives, set equal to 0 (see Appendix)
Resulting first order conditions:
X′(y − Xβ) = 0
Rearranging:X′Xβ = X′y
In order to isolate β, we need to move the X′X term to the other sideof the equals sign.
We’ve learned about matrix multiplication, but what about matrix“division”?
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 10 / 140
![Page 50: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/50.jpg)
OLS Estimator in Matrix Form
Goal: minimize the sum of the squared residuals
Take (matrix) derivatives, set equal to 0 (see Appendix)
Resulting first order conditions:
X′(y − Xβ) = 0
Rearranging:X′Xβ = X′y
In order to isolate β, we need to move the X′X term to the other sideof the equals sign.
We’ve learned about matrix multiplication, but what about matrix“division”?
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 10 / 140
![Page 51: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/51.jpg)
OLS Estimator in Matrix Form
Goal: minimize the sum of the squared residuals
Take (matrix) derivatives, set equal to 0 (see Appendix)
Resulting first order conditions:
X′(y − Xβ) = 0
Rearranging:X′Xβ = X′y
In order to isolate β, we need to move the X′X term to the other sideof the equals sign.
We’ve learned about matrix multiplication, but what about matrix“division”?
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 10 / 140
![Page 52: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/52.jpg)
OLS Estimator in Matrix Form
Goal: minimize the sum of the squared residuals
Take (matrix) derivatives, set equal to 0 (see Appendix)
Resulting first order conditions:
X′(y − Xβ) = 0
Rearranging:X′Xβ = X′y
In order to isolate β, we need to move the X′X term to the other sideof the equals sign.
We’ve learned about matrix multiplication, but what about matrix“division”?
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 10 / 140
![Page 53: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/53.jpg)
OLS Estimator in Matrix Form
Goal: minimize the sum of the squared residuals
Take (matrix) derivatives, set equal to 0 (see Appendix)
Resulting first order conditions:
X′(y − Xβ) = 0
Rearranging:X′Xβ = X′y
In order to isolate β, we need to move the X′X term to the other sideof the equals sign.
We’ve learned about matrix multiplication, but what about matrix“division”?
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 10 / 140
![Page 54: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/54.jpg)
Back to OLS
Let’s assume, for now, that the inverse of X′X exists
Then we can write the OLS estimator as the following:
β = (X′X)−1X′y
“ex prime ex inverse ex prime y” sear it into your soul.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 11 / 140
![Page 55: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/55.jpg)
Back to OLS
Let’s assume, for now, that the inverse of X′X exists
Then we can write the OLS estimator as the following:
β = (X′X)−1X′y
“ex prime ex inverse ex prime y” sear it into your soul.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 11 / 140
![Page 56: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/56.jpg)
Back to OLS
Let’s assume, for now, that the inverse of X′X exists
Then we can write the OLS estimator as the following:
β = (X′X)−1X′y
“ex prime ex inverse ex prime y” sear it into your soul.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 11 / 140
![Page 57: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/57.jpg)
Back to OLS
Let’s assume, for now, that the inverse of X′X exists
Then we can write the OLS estimator as the following:
β = (X′X)−1X′y
“ex prime ex inverse ex prime y”
sear it into your soul.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 11 / 140
![Page 58: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/58.jpg)
Back to OLS
Let’s assume, for now, that the inverse of X′X exists
Then we can write the OLS estimator as the following:
β = (X′X)−1X′y
“ex prime ex inverse ex prime y” sear it into your soul.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 11 / 140
![Page 59: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/59.jpg)
Intuition for the OLS in Matrix Form
β = (X′X)−1X′y
What’s the intuition here?
“Numerator” X′y: is approximately composed of the covariancesbetween the columns of X and y
“Denominator” X′X is approximately composed of the samplevariances and covariances of variables within X
Thus, we have something like:
β ≈ (variance of X)−1(covariance of X & y)
i.e. analogous to the simple linear regression case!Disclaimer: the final equation is exactly true for all non-intercept coefficients if you remove theintercept from X such that β−0 = Var(X−0)−1Cov(X−0, y). The numerator and denominatorare the variances and covariances if X and y are demeaned and normalized by the sample sizeminus 1.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 12 / 140
![Page 60: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/60.jpg)
Intuition for the OLS in Matrix Form
β = (X′X)−1X′y
What’s the intuition here?
“Numerator” X′y: is approximately composed of the covariancesbetween the columns of X and y
“Denominator” X′X is approximately composed of the samplevariances and covariances of variables within X
Thus, we have something like:
β ≈ (variance of X)−1(covariance of X & y)
i.e. analogous to the simple linear regression case!Disclaimer: the final equation is exactly true for all non-intercept coefficients if you remove theintercept from X such that β−0 = Var(X−0)−1Cov(X−0, y). The numerator and denominatorare the variances and covariances if X and y are demeaned and normalized by the sample sizeminus 1.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 12 / 140
![Page 61: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/61.jpg)
Intuition for the OLS in Matrix Form
β = (X′X)−1X′y
What’s the intuition here?
“Numerator” X′y: is approximately composed of the covariancesbetween the columns of X and y
“Denominator” X′X is approximately composed of the samplevariances and covariances of variables within X
Thus, we have something like:
β ≈ (variance of X)−1(covariance of X & y)
i.e. analogous to the simple linear regression case!Disclaimer: the final equation is exactly true for all non-intercept coefficients if you remove theintercept from X such that β−0 = Var(X−0)−1Cov(X−0, y). The numerator and denominatorare the variances and covariances if X and y are demeaned and normalized by the sample sizeminus 1.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 12 / 140
![Page 62: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/62.jpg)
Intuition for the OLS in Matrix Form
β = (X′X)−1X′y
What’s the intuition here?
“Numerator” X′y: is approximately composed of the covariancesbetween the columns of X and y
“Denominator” X′X is approximately composed of the samplevariances and covariances of variables within X
Thus, we have something like:
β ≈ (variance of X)−1(covariance of X & y)
i.e. analogous to the simple linear regression case!Disclaimer: the final equation is exactly true for all non-intercept coefficients if you remove theintercept from X such that β−0 = Var(X−0)−1Cov(X−0, y). The numerator and denominatorare the variances and covariances if X and y are demeaned and normalized by the sample sizeminus 1.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 12 / 140
![Page 63: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/63.jpg)
Intuition for the OLS in Matrix Form
β = (X′X)−1X′y
What’s the intuition here?
“Numerator” X′y: is approximately composed of the covariancesbetween the columns of X and y
“Denominator” X′X is approximately composed of the samplevariances and covariances of variables within X
Thus, we have something like:
β ≈ (variance of X)−1(covariance of X & y)
i.e. analogous to the simple linear regression case!
Disclaimer: the final equation is exactly true for all non-intercept coefficients if you remove theintercept from X such that β−0 = Var(X−0)−1Cov(X−0, y). The numerator and denominatorare the variances and covariances if X and y are demeaned and normalized by the sample sizeminus 1.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 12 / 140
![Page 64: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/64.jpg)
Intuition for the OLS in Matrix Form
β = (X′X)−1X′y
What’s the intuition here?
“Numerator” X′y: is approximately composed of the covariancesbetween the columns of X and y
“Denominator” X′X is approximately composed of the samplevariances and covariances of variables within X
Thus, we have something like:
β ≈ (variance of X)−1(covariance of X & y)
i.e. analogous to the simple linear regression case!Disclaimer: the final equation is exactly true for all non-intercept coefficients if you remove theintercept from X such that β−0 = Var(X−0)−1Cov(X−0, y). The numerator and denominatorare the variances and covariances if X and y are demeaned and normalized by the sample sizeminus 1.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 12 / 140
![Page 65: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/65.jpg)
1 Matrix Form of Regression
2 OLS inference in matrix form
3 Standard Hypothesis Tests
4 Testing Joint Significance
5 Testing Linear Hypotheses: The General Case
6 Fun With(out) Weights
7 Appendix: Derivations and Consistency
8 The Problems with p-values
9 Agnostic Regression
10 Inference via the Bootstrap
11 Fun With Weights
12 Appendix: Tricky p-value Example
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 13 / 140
![Page 66: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/66.jpg)
1 Matrix Form of Regression
2 OLS inference in matrix form
3 Standard Hypothesis Tests
4 Testing Joint Significance
5 Testing Linear Hypotheses: The General Case
6 Fun With(out) Weights
7 Appendix: Derivations and Consistency
8 The Problems with p-values
9 Agnostic Regression
10 Inference via the Bootstrap
11 Fun With Weights
12 Appendix: Tricky p-value Example
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 13 / 140
![Page 67: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/67.jpg)
OLS Assumptions in Matrix Form
1 Linearity: y = Xβ + u
2 Random/iid sample: (yi , x′i ) are a iid sample from the population.
3 No perfect collinearity: X is an n × (k + 1) matrix with rank k + 1
4 Zero conditional mean: E[u|X] = 0
5 Homoskedasticity: var(u|X) = σ2uIn
6 Normality: u|X ∼ N(0, σ2uIn)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 14 / 140
![Page 68: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/68.jpg)
OLS Assumptions in Matrix Form
1 Linearity: y = Xβ + u
2 Random/iid sample: (yi , x′i ) are a iid sample from the population.
3 No perfect collinearity: X is an n × (k + 1) matrix with rank k + 1
4 Zero conditional mean: E[u|X] = 0
5 Homoskedasticity: var(u|X) = σ2uIn
6 Normality: u|X ∼ N(0, σ2uIn)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 14 / 140
![Page 69: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/69.jpg)
OLS Assumptions in Matrix Form
1 Linearity: y = Xβ + u
2 Random/iid sample: (yi , x′i ) are a iid sample from the population.
3 No perfect collinearity: X is an n × (k + 1) matrix with rank k + 1
4 Zero conditional mean: E[u|X] = 0
5 Homoskedasticity: var(u|X) = σ2uIn
6 Normality: u|X ∼ N(0, σ2uIn)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 14 / 140
![Page 70: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/70.jpg)
Assumption 3: No Perfect Collinearity
Definition (Rank)
The rank of a matrix is the maximum number of linearly independentcolumns.
In matrix form: X is an n × (k + 1) matrix with rank k + 1
If X has rank k + 1, then all of its columns are linearly independent
. . . and none of its columns are linearly dependent implies no perfectcollinearity
X has rank k + 1 and thus (X′X) is invertible
Just like variation in X led us to be able to divide by the variance insimple OLS
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 15 / 140
![Page 71: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/71.jpg)
Assumption 3: No Perfect Collinearity
Definition (Rank)
The rank of a matrix is the maximum number of linearly independentcolumns.
In matrix form: X is an n × (k + 1) matrix with rank k + 1
If X has rank k + 1, then all of its columns are linearly independent
. . . and none of its columns are linearly dependent implies no perfectcollinearity
X has rank k + 1 and thus (X′X) is invertible
Just like variation in X led us to be able to divide by the variance insimple OLS
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 15 / 140
![Page 72: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/72.jpg)
Assumption 3: No Perfect Collinearity
Definition (Rank)
The rank of a matrix is the maximum number of linearly independentcolumns.
In matrix form: X is an n × (k + 1) matrix with rank k + 1
If X has rank k + 1, then all of its columns are linearly independent
. . . and none of its columns are linearly dependent implies no perfectcollinearity
X has rank k + 1 and thus (X′X) is invertible
Just like variation in X led us to be able to divide by the variance insimple OLS
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 15 / 140
![Page 73: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/73.jpg)
Assumption 3: No Perfect Collinearity
Definition (Rank)
The rank of a matrix is the maximum number of linearly independentcolumns.
In matrix form: X is an n × (k + 1) matrix with rank k + 1
If X has rank k + 1, then all of its columns are linearly independent
. . . and none of its columns are linearly dependent implies no perfectcollinearity
X has rank k + 1 and thus (X′X) is invertible
Just like variation in X led us to be able to divide by the variance insimple OLS
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 15 / 140
![Page 74: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/74.jpg)
Assumption 3: No Perfect Collinearity
Definition (Rank)
The rank of a matrix is the maximum number of linearly independentcolumns.
In matrix form: X is an n × (k + 1) matrix with rank k + 1
If X has rank k + 1, then all of its columns are linearly independent
. . . and none of its columns are linearly dependent implies no perfectcollinearity
X has rank k + 1 and thus (X′X) is invertible
Just like variation in X led us to be able to divide by the variance insimple OLS
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 15 / 140
![Page 75: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/75.jpg)
Assumption 3: No Perfect Collinearity
Definition (Rank)
The rank of a matrix is the maximum number of linearly independentcolumns.
In matrix form: X is an n × (k + 1) matrix with rank k + 1
If X has rank k + 1, then all of its columns are linearly independent
. . . and none of its columns are linearly dependent implies no perfectcollinearity
X has rank k + 1 and thus (X′X) is invertible
Just like variation in X led us to be able to divide by the variance insimple OLS
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 15 / 140
![Page 76: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/76.jpg)
Assumption 3: No Perfect Collinearity
Definition (Rank)
The rank of a matrix is the maximum number of linearly independentcolumns.
In matrix form: X is an n × (k + 1) matrix with rank k + 1
If X has rank k + 1, then all of its columns are linearly independent
. . . and none of its columns are linearly dependent implies no perfectcollinearity
X has rank k + 1 and thus (X′X) is invertible
Just like variation in X led us to be able to divide by the variance insimple OLS
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 15 / 140
![Page 77: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/77.jpg)
Assumption 5: Homoskedasticity
The stated homoskedasticity assumption is: var(u|X) = σ2uIn
To really understand this we need to know what var(u|X) is in fullgenerality.
The variance of a vector is actually a matrix:
var[u] = Σu =
var(u1) cov(u1, u2) . . . cov(u1, un)
cov(u2, u1) var(u2) . . . cov(u2, un)...
. . .
cov(un, u1) cov(un, u2) . . . var(un)
This matrix is always symmetric since cov(ui , uj) = cov(uj , ui ) bydefinition.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 16 / 140
![Page 78: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/78.jpg)
Assumption 5: Homoskedasticity
The stated homoskedasticity assumption is: var(u|X) = σ2uIn
To really understand this we need to know what var(u|X) is in fullgenerality.
The variance of a vector is actually a matrix:
var[u] = Σu =
var(u1) cov(u1, u2) . . . cov(u1, un)
cov(u2, u1) var(u2) . . . cov(u2, un)...
. . .
cov(un, u1) cov(un, u2) . . . var(un)
This matrix is always symmetric since cov(ui , uj) = cov(uj , ui ) bydefinition.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 16 / 140
![Page 79: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/79.jpg)
Assumption 5: Homoskedasticity
The stated homoskedasticity assumption is: var(u|X) = σ2uIn
To really understand this we need to know what var(u|X) is in fullgenerality.
The variance of a vector is actually a matrix:
var[u] = Σu =
var(u1) cov(u1, u2) . . . cov(u1, un)
cov(u2, u1) var(u2) . . . cov(u2, un)...
. . .
cov(un, u1) cov(un, u2) . . . var(un)
This matrix is always symmetric since cov(ui , uj) = cov(uj , ui ) bydefinition.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 16 / 140
![Page 80: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/80.jpg)
Assumption 5: Homoskedasticity
The stated homoskedasticity assumption is: var(u|X) = σ2uIn
To really understand this we need to know what var(u|X) is in fullgenerality.
The variance of a vector is actually a matrix:
var[u] = Σu =
var(u1) cov(u1, u2) . . . cov(u1, un)
cov(u2, u1) var(u2) . . . cov(u2, un)...
. . .
cov(un, u1) cov(un, u2) . . . var(un)
This matrix is always symmetric since cov(ui , uj) = cov(uj , ui ) bydefinition.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 16 / 140
![Page 81: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/81.jpg)
Assumption 5: Homoskedasticity
The stated homoskedasticity assumption is: var(u|X) = σ2uIn
To really understand this we need to know what var(u|X) is in fullgenerality.
The variance of a vector is actually a matrix:
var[u] = Σu =
var(u1) cov(u1, u2) . . . cov(u1, un)
cov(u2, u1) var(u2) . . . cov(u2, un)...
. . .
cov(un, u1) cov(un, u2) . . . var(un)
This matrix is always symmetric since cov(ui , uj) = cov(uj , ui ) bydefinition.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 16 / 140
![Page 82: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/82.jpg)
Assumption 5: The Meaning of Homoskedasticity
What does var(u|X) = σ2uIn mean?
In is the n × n identity matrix, σ2u is a scalar.
Visually:
var[u] = σ2uIn =
σ2u 0 0 . . . 0
0 σ2u 0 . . . 0
...0 0 0 . . . σ2
u
In less matrix notation:
I var(ui ) = σ2u for all i (constant variance)
I cov(ui , uj) = 0 for all i 6= j (implied by iid)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 17 / 140
![Page 83: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/83.jpg)
Assumption 5: The Meaning of Homoskedasticity
What does var(u|X) = σ2uIn mean?
In is the n × n identity matrix, σ2u is a scalar.
Visually:
var[u] = σ2uIn =
σ2u 0 0 . . . 0
0 σ2u 0 . . . 0
...0 0 0 . . . σ2
u
In less matrix notation:
I var(ui ) = σ2u for all i (constant variance)
I cov(ui , uj) = 0 for all i 6= j (implied by iid)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 17 / 140
![Page 84: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/84.jpg)
Assumption 5: The Meaning of Homoskedasticity
What does var(u|X) = σ2uIn mean?
In is the n × n identity matrix, σ2u is a scalar.
Visually:
var[u] = σ2uIn =
σ2u 0 0 . . . 0
0 σ2u 0 . . . 0
...0 0 0 . . . σ2
u
In less matrix notation:
I var(ui ) = σ2u for all i (constant variance)
I cov(ui , uj) = 0 for all i 6= j (implied by iid)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 17 / 140
![Page 85: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/85.jpg)
Assumption 5: The Meaning of Homoskedasticity
What does var(u|X) = σ2uIn mean?
In is the n × n identity matrix, σ2u is a scalar.
Visually:
var[u] = σ2uIn =
σ2u 0 0 . . . 0
0 σ2u 0 . . . 0
...0 0 0 . . . σ2
u
In less matrix notation:
I var(ui ) = σ2u for all i (constant variance)
I cov(ui , uj) = 0 for all i 6= j (implied by iid)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 17 / 140
![Page 86: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/86.jpg)
Assumption 5: The Meaning of Homoskedasticity
What does var(u|X) = σ2uIn mean?
In is the n × n identity matrix, σ2u is a scalar.
Visually:
var[u] = σ2uIn =
σ2u 0 0 . . . 0
0 σ2u 0 . . . 0
...0 0 0 . . . σ2
u
In less matrix notation:
I var(ui ) = σ2u for all i (constant variance)
I cov(ui , uj) = 0 for all i 6= j (implied by iid)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 17 / 140
![Page 87: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/87.jpg)
Assumption 5: The Meaning of Homoskedasticity
What does var(u|X) = σ2uIn mean?
In is the n × n identity matrix, σ2u is a scalar.
Visually:
var[u] = σ2uIn =
σ2u 0 0 . . . 0
0 σ2u 0 . . . 0
...0 0 0 . . . σ2
u
In less matrix notation:
I var(ui ) = σ2u for all i (constant variance)
I cov(ui , uj) = 0 for all i 6= j (implied by iid)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 17 / 140
![Page 88: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/88.jpg)
Unbiasedness of β
Is β still unbiased under assumptions 1-4? Does E [β] = β?
β =(X′X
)−1X′y (linearity and no collinearity)
β =(X′X
)−1X′(Xβ + u)
β =(X′X
)−1X′Xβ +
(X′X
)−1X′u
β = Iβ +(X′X
)−1X′u
β = β +(X′X
)−1X′u
E [β|X] = E [β|X] + E [(X′X
)−1X′u|X]
E [β|X] = β +(X′X
)−1X′E [u|X]
E [β|X] = β (zero conditional mean)
So, yes!
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 18 / 140
![Page 89: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/89.jpg)
Unbiasedness of β
Is β still unbiased under assumptions 1-4? Does E [β] = β?
β =(X′X
)−1X′y (linearity and no collinearity)
β =(X′X
)−1X′(Xβ + u)
β =(X′X
)−1X′Xβ +
(X′X
)−1X′u
β = Iβ +(X′X
)−1X′u
β = β +(X′X
)−1X′u
E [β|X] = E [β|X] + E [(X′X
)−1X′u|X]
E [β|X] = β +(X′X
)−1X′E [u|X]
E [β|X] = β (zero conditional mean)
So, yes!
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 18 / 140
![Page 90: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/90.jpg)
Unbiasedness of β
Is β still unbiased under assumptions 1-4? Does E [β] = β?
β =(X′X
)−1X′y (linearity and no collinearity)
β =(X′X
)−1X′(Xβ + u)
β =(X′X
)−1X′Xβ +
(X′X
)−1X′u
β = Iβ +(X′X
)−1X′u
β = β +(X′X
)−1X′u
E [β|X] = E [β|X] + E [(X′X
)−1X′u|X]
E [β|X] = β +(X′X
)−1X′E [u|X]
E [β|X] = β (zero conditional mean)
So, yes!
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 18 / 140
![Page 91: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/91.jpg)
Unbiasedness of β
Is β still unbiased under assumptions 1-4? Does E [β] = β?
β =(X′X
)−1X′y (linearity and no collinearity)
β =(X′X
)−1X′(Xβ + u)
β =(X′X
)−1X′Xβ +
(X′X
)−1X′u
β = Iβ +(X′X
)−1X′u
β = β +(X′X
)−1X′u
E [β|X] = E [β|X] + E [(X′X
)−1X′u|X]
E [β|X] = β +(X′X
)−1X′E [u|X]
E [β|X] = β (zero conditional mean)
So, yes!
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 18 / 140
![Page 92: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/92.jpg)
Unbiasedness of β
Is β still unbiased under assumptions 1-4? Does E [β] = β?
β =(X′X
)−1X′y (linearity and no collinearity)
β =(X′X
)−1X′(Xβ + u)
β =(X′X
)−1X′Xβ +
(X′X
)−1X′u
β = Iβ +(X′X
)−1X′u
β = β +(X′X
)−1X′u
E [β|X] = E [β|X] + E [(X′X
)−1X′u|X]
E [β|X] = β +(X′X
)−1X′E [u|X]
E [β|X] = β (zero conditional mean)
So, yes!
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 18 / 140
![Page 93: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/93.jpg)
Unbiasedness of β
Is β still unbiased under assumptions 1-4? Does E [β] = β?
β =(X′X
)−1X′y (linearity and no collinearity)
β =(X′X
)−1X′(Xβ + u)
β =(X′X
)−1X′Xβ +
(X′X
)−1X′u
β = Iβ +(X′X
)−1X′u
β = β +(X′X
)−1X′u
E [β|X] = E [β|X] + E [(X′X
)−1X′u|X]
E [β|X] = β +(X′X
)−1X′E [u|X]
E [β|X] = β (zero conditional mean)
So, yes!
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 18 / 140
![Page 94: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/94.jpg)
Unbiasedness of β
Is β still unbiased under assumptions 1-4? Does E [β] = β?
β =(X′X
)−1X′y (linearity and no collinearity)
β =(X′X
)−1X′(Xβ + u)
β =(X′X
)−1X′Xβ +
(X′X
)−1X′u
β = Iβ +(X′X
)−1X′u
β = β +(X′X
)−1X′u
E [β|X] = E [β|X] + E [(X′X
)−1X′u|X]
E [β|X] = β +(X′X
)−1X′E [u|X]
E [β|X] = β (zero conditional mean)
So, yes!
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 18 / 140
![Page 95: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/95.jpg)
Unbiasedness of β
Is β still unbiased under assumptions 1-4? Does E [β] = β?
β =(X′X
)−1X′y (linearity and no collinearity)
β =(X′X
)−1X′(Xβ + u)
β =(X′X
)−1X′Xβ +
(X′X
)−1X′u
β = Iβ +(X′X
)−1X′u
β = β +(X′X
)−1X′u
E [β|X] = E [β|X] + E [(X′X
)−1X′u|X]
E [β|X] = β +(X′X
)−1X′E [u|X]
E [β|X] = β (zero conditional mean)
So, yes!
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 18 / 140
![Page 96: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/96.jpg)
Unbiasedness of β
Is β still unbiased under assumptions 1-4? Does E [β] = β?
β =(X′X
)−1X′y (linearity and no collinearity)
β =(X′X
)−1X′(Xβ + u)
β =(X′X
)−1X′Xβ +
(X′X
)−1X′u
β = Iβ +(X′X
)−1X′u
β = β +(X′X
)−1X′u
E [β|X] = E [β|X] + E [(X′X
)−1X′u|X]
E [β|X] = β +(X′X
)−1X′E [u|X]
E [β|X] = β (zero conditional mean)
So, yes!
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 18 / 140
![Page 97: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/97.jpg)
Unbiasedness of β
Is β still unbiased under assumptions 1-4? Does E [β] = β?
β =(X′X
)−1X′y (linearity and no collinearity)
β =(X′X
)−1X′(Xβ + u)
β =(X′X
)−1X′Xβ +
(X′X
)−1X′u
β = Iβ +(X′X
)−1X′u
β = β +(X′X
)−1X′u
E [β|X] = E [β|X] + E [(X′X
)−1X′u|X]
E [β|X] = β +(X′X
)−1X′E [u|X]
E [β|X] = β (zero conditional mean)
So, yes!
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 18 / 140
![Page 98: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/98.jpg)
A Much Shorter Proof of Unbiasedness of β
A shorter (but less helpful later) proof of unbiasedness,
E [β] = E [(X′X
)−1X′y] (definition of the estimator)
=(X′X
)−1X′Xβ (expectation of y)
= β
Now we know the sampling distribution is centered on β we want to derivethe variance of the sampling distribution conditional on X .
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 19 / 140
![Page 99: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/99.jpg)
A Much Shorter Proof of Unbiasedness of β
A shorter (but less helpful later) proof of unbiasedness,
E [β] = E [(X′X
)−1X′y] (definition of the estimator)
=(X′X
)−1X′Xβ (expectation of y)
= β
Now we know the sampling distribution is centered on β we want to derivethe variance of the sampling distribution conditional on X .
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 19 / 140
![Page 100: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/100.jpg)
A Much Shorter Proof of Unbiasedness of β
A shorter (but less helpful later) proof of unbiasedness,
E [β] = E [(X′X
)−1X′y] (definition of the estimator)
=(X′X
)−1X′Xβ (expectation of y)
= β
Now we know the sampling distribution is centered on β we want to derivethe variance of the sampling distribution conditional on X .
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 19 / 140
![Page 101: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/101.jpg)
A Much Shorter Proof of Unbiasedness of β
A shorter (but less helpful later) proof of unbiasedness,
E [β] = E [(X′X
)−1X′y] (definition of the estimator)
=(X′X
)−1X′Xβ (expectation of y)
= β
Now we know the sampling distribution is centered on β we want to derivethe variance of the sampling distribution conditional on X .
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 19 / 140
![Page 102: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/102.jpg)
A Much Shorter Proof of Unbiasedness of β
A shorter (but less helpful later) proof of unbiasedness,
E [β] = E [(X′X
)−1X′y] (definition of the estimator)
=(X′X
)−1X′Xβ (expectation of y)
= β
Now we know the sampling distribution is centered on β we want to derivethe variance of the sampling distribution conditional on X .
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 19 / 140
![Page 103: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/103.jpg)
Rule: Variance of Linear Function of Random Vector
Recall that for a linear transformation of a random variable X we haveV [aX + b] = a2V [X ] with constants a and b.
We will need an analogous rule for linear functions of random vectors.
Definition (Variance of Linear Transformation of Random Vector)
Let f (u) = Au + B be a linear transformation of a random vector u withnon-random vectors or matrices A and B. Then the variance of thetransformation is given by:
V [f (u)] = V [Au + B] = AV [u]A′ = AΣuA′
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 20 / 140
![Page 104: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/104.jpg)
Rule: Variance of Linear Function of Random Vector
Recall that for a linear transformation of a random variable X we haveV [aX + b] = a2V [X ] with constants a and b.
We will need an analogous rule for linear functions of random vectors.
Definition (Variance of Linear Transformation of Random Vector)
Let f (u) = Au + B be a linear transformation of a random vector u withnon-random vectors or matrices A and B. Then the variance of thetransformation is given by:
V [f (u)] = V [Au + B] = AV [u]A′ = AΣuA′
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 20 / 140
![Page 105: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/105.jpg)
Rule: Variance of Linear Function of Random Vector
Recall that for a linear transformation of a random variable X we haveV [aX + b] = a2V [X ] with constants a and b.
We will need an analogous rule for linear functions of random vectors.
Definition (Variance of Linear Transformation of Random Vector)
Let f (u) = Au + B be a linear transformation of a random vector u withnon-random vectors or matrices A and B. Then the variance of thetransformation is given by:
V [f (u)] = V [Au + B] = AV [u]A′ = AΣuA′
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 20 / 140
![Page 106: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/106.jpg)
Conditional Variance of ββ = β + (X′X)
−1 X′u and E [β|X] = β + E [(X′X)−1 X′u|X] = β so the OLS
estimator is a linear function of the errors. Thus:
V [β|X] = V [β|X] + V [(X′X)−1
X′u|X]
= V [(X′X)−1
X′u|X]
= (X′X)−1
X′V [u|X]((X′X)−1
X′)′ (X is nonrandom given X)
= (X′X)−1
X′V [u|X]X (X′X)−1
= (X′X)−1
X′σ2IX (X′X)−1
(by homoskedasticity)
= σ2I (X′X)−1
X′X (X′X)−1
= σ2 (X′X)−1
This gives the (k + 1)× (k + 1) variance-covariance matrix of β.
To estimate V [β|X], we replace σ2 with its unbiased estimator σ2, which is nowwritten using matrix notation as:
σ2 =
∑i u
2i
n − (k + 1)=
u′u
n − (k + 1)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 21 / 140
![Page 107: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/107.jpg)
Conditional Variance of ββ = β + (X′X)
−1 X′u and E [β|X] = β + E [(X′X)−1 X′u|X] = β so the OLS
estimator is a linear function of the errors. Thus:
V [β|X] = V [β|X] + V [(X′X)−1
X′u|X]
= V [(X′X)−1
X′u|X]
= (X′X)−1
X′V [u|X]((X′X)−1
X′)′ (X is nonrandom given X)
= (X′X)−1
X′V [u|X]X (X′X)−1
= (X′X)−1
X′σ2IX (X′X)−1
(by homoskedasticity)
= σ2I (X′X)−1
X′X (X′X)−1
= σ2 (X′X)−1
This gives the (k + 1)× (k + 1) variance-covariance matrix of β.
To estimate V [β|X], we replace σ2 with its unbiased estimator σ2, which is nowwritten using matrix notation as:
σ2 =
∑i u
2i
n − (k + 1)=
u′u
n − (k + 1)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 21 / 140
![Page 108: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/108.jpg)
Conditional Variance of ββ = β + (X′X)
−1 X′u and E [β|X] = β + E [(X′X)−1 X′u|X] = β so the OLS
estimator is a linear function of the errors. Thus:
V [β|X] = V [β|X] + V [(X′X)−1
X′u|X]
= V [(X′X)−1
X′u|X]
= (X′X)−1
X′V [u|X]((X′X)−1
X′)′ (X is nonrandom given X)
= (X′X)−1
X′V [u|X]X (X′X)−1
= (X′X)−1
X′σ2IX (X′X)−1
(by homoskedasticity)
= σ2I (X′X)−1
X′X (X′X)−1
= σ2 (X′X)−1
This gives the (k + 1)× (k + 1) variance-covariance matrix of β.
To estimate V [β|X], we replace σ2 with its unbiased estimator σ2, which is nowwritten using matrix notation as:
σ2 =
∑i u
2i
n − (k + 1)=
u′u
n − (k + 1)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 21 / 140
![Page 109: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/109.jpg)
Conditional Variance of ββ = β + (X′X)
−1 X′u and E [β|X] = β + E [(X′X)−1 X′u|X] = β so the OLS
estimator is a linear function of the errors. Thus:
V [β|X] = V [β|X] + V [(X′X)−1
X′u|X]
= V [(X′X)−1
X′u|X]
= (X′X)−1
X′V [u|X]((X′X)−1
X′)′ (X is nonrandom given X)
= (X′X)−1
X′V [u|X]X (X′X)−1
= (X′X)−1
X′σ2IX (X′X)−1
(by homoskedasticity)
= σ2I (X′X)−1
X′X (X′X)−1
= σ2 (X′X)−1
This gives the (k + 1)× (k + 1) variance-covariance matrix of β.
To estimate V [β|X], we replace σ2 with its unbiased estimator σ2, which is nowwritten using matrix notation as:
σ2 =
∑i u
2i
n − (k + 1)=
u′u
n − (k + 1)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 21 / 140
![Page 110: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/110.jpg)
Conditional Variance of ββ = β + (X′X)
−1 X′u and E [β|X] = β + E [(X′X)−1 X′u|X] = β so the OLS
estimator is a linear function of the errors. Thus:
V [β|X] = V [β|X] + V [(X′X)−1
X′u|X]
= V [(X′X)−1
X′u|X]
= (X′X)−1
X′V [u|X]((X′X)−1
X′)′ (X is nonrandom given X)
= (X′X)−1
X′V [u|X]X (X′X)−1
= (X′X)−1
X′σ2IX (X′X)−1
(by homoskedasticity)
= σ2I (X′X)−1
X′X (X′X)−1
= σ2 (X′X)−1
This gives the (k + 1)× (k + 1) variance-covariance matrix of β.
To estimate V [β|X], we replace σ2 with its unbiased estimator σ2, which is nowwritten using matrix notation as:
σ2 =
∑i u
2i
n − (k + 1)=
u′u
n − (k + 1)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 21 / 140
![Page 111: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/111.jpg)
Conditional Variance of ββ = β + (X′X)
−1 X′u and E [β|X] = β + E [(X′X)−1 X′u|X] = β so the OLS
estimator is a linear function of the errors. Thus:
V [β|X] = V [β|X] + V [(X′X)−1
X′u|X]
= V [(X′X)−1
X′u|X]
= (X′X)−1
X′V [u|X]((X′X)−1
X′)′ (X is nonrandom given X)
= (X′X)−1
X′V [u|X]X (X′X)−1
= (X′X)−1
X′σ2IX (X′X)−1
(by homoskedasticity)
= σ2I (X′X)−1
X′X (X′X)−1
= σ2 (X′X)−1
This gives the (k + 1)× (k + 1) variance-covariance matrix of β.
To estimate V [β|X], we replace σ2 with its unbiased estimator σ2, which is nowwritten using matrix notation as:
σ2 =
∑i u
2i
n − (k + 1)=
u′u
n − (k + 1)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 21 / 140
![Page 112: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/112.jpg)
Conditional Variance of ββ = β + (X′X)
−1 X′u and E [β|X] = β + E [(X′X)−1 X′u|X] = β so the OLS
estimator is a linear function of the errors. Thus:
V [β|X] = V [β|X] + V [(X′X)−1
X′u|X]
= V [(X′X)−1
X′u|X]
= (X′X)−1
X′V [u|X]((X′X)−1
X′)′ (X is nonrandom given X)
= (X′X)−1
X′V [u|X]X (X′X)−1
= (X′X)−1
X′σ2IX (X′X)−1
(by homoskedasticity)
= σ2I (X′X)−1
X′X (X′X)−1
= σ2 (X′X)−1
This gives the (k + 1)× (k + 1) variance-covariance matrix of β.
To estimate V [β|X], we replace σ2 with its unbiased estimator σ2, which is nowwritten using matrix notation as:
σ2 =
∑i u
2i
n − (k + 1)=
u′u
n − (k + 1)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 21 / 140
![Page 113: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/113.jpg)
Conditional Variance of ββ = β + (X′X)
−1 X′u and E [β|X] = β + E [(X′X)−1 X′u|X] = β so the OLS
estimator is a linear function of the errors. Thus:
V [β|X] = V [β|X] + V [(X′X)−1
X′u|X]
= V [(X′X)−1
X′u|X]
= (X′X)−1
X′V [u|X]((X′X)−1
X′)′ (X is nonrandom given X)
= (X′X)−1
X′V [u|X]X (X′X)−1
= (X′X)−1
X′σ2IX (X′X)−1
(by homoskedasticity)
= σ2I (X′X)−1
X′X (X′X)−1
= σ2 (X′X)−1
This gives the (k + 1)× (k + 1) variance-covariance matrix of β.
To estimate V [β|X], we replace σ2 with its unbiased estimator σ2, which is nowwritten using matrix notation as:
σ2 =
∑i u
2i
n − (k + 1)=
u′u
n − (k + 1)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 21 / 140
![Page 114: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/114.jpg)
Conditional Variance of ββ = β + (X′X)
−1 X′u and E [β|X] = β + E [(X′X)−1 X′u|X] = β so the OLS
estimator is a linear function of the errors. Thus:
V [β|X] = V [β|X] + V [(X′X)−1
X′u|X]
= V [(X′X)−1
X′u|X]
= (X′X)−1
X′V [u|X]((X′X)−1
X′)′ (X is nonrandom given X)
= (X′X)−1
X′V [u|X]X (X′X)−1
= (X′X)−1
X′σ2IX (X′X)−1
(by homoskedasticity)
= σ2I (X′X)−1
X′X (X′X)−1
= σ2 (X′X)−1
This gives the (k + 1)× (k + 1) variance-covariance matrix of β.
To estimate V [β|X], we replace σ2 with its unbiased estimator σ2, which is nowwritten using matrix notation as:
σ2 =
∑i u
2i
n − (k + 1)=
u′u
n − (k + 1)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 21 / 140
![Page 115: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/115.jpg)
Conditional Variance of ββ = β + (X′X)
−1 X′u and E [β|X] = β + E [(X′X)−1 X′u|X] = β so the OLS
estimator is a linear function of the errors. Thus:
V [β|X] = V [β|X] + V [(X′X)−1
X′u|X]
= V [(X′X)−1
X′u|X]
= (X′X)−1
X′V [u|X]((X′X)−1
X′)′ (X is nonrandom given X)
= (X′X)−1
X′V [u|X]X (X′X)−1
= (X′X)−1
X′σ2IX (X′X)−1
(by homoskedasticity)
= σ2I (X′X)−1
X′X (X′X)−1
= σ2 (X′X)−1
This gives the (k + 1)× (k + 1) variance-covariance matrix of β.
To estimate V [β|X], we replace σ2 with its unbiased estimator σ2, which is nowwritten using matrix notation as:
σ2 =
∑i u
2i
n − (k + 1)=
u′u
n − (k + 1)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 21 / 140
![Page 116: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/116.jpg)
Sampling Variance for β
Under assumptions 1-5, the variance-covariance matrix of the OLSestimators is given by:
V [β|X] = σ2(X′X
)−1=
β0 β1 β2 · · · βkβ0 V [β0] Cov[β0, β1] Cov[β0, β2] · · · Cov[β0, βk ]
β1 Cov[β0, β1] V [β1] Cov[β1, β2] · · · Cov[β1, βk ]
β2 Cov[β0, β2] Cov[β1, β2] V [β2] · · · Cov[β2, βk ]...
......
.... . .
...
βk Cov[β0, βk ] Cov[βk , β1] Cov[βk , β2] · · · V [βk ]
Recall that standard errors are the square root of the diagonals of thismatrix.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 22 / 140
![Page 117: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/117.jpg)
Overview of Inference in the General Setting
Under assumption 1-5 in large samples:
βj − βjSE [βj ]
∼ N(0, 1)
In small samples, under assumptions 1-6,
βj − βjSE [βj ]
∼ tn−(k+1)
Estimated standard errors are:
SE [βj ] =
√var[β]jj
var[β] = σ2u(X′X)−1
σ2u =
u′u
n − (k + 1)
Thus, confidence intervals and hypothesis tests proceed in essentiallythe same way.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 23 / 140
![Page 118: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/118.jpg)
Overview of Inference in the General Setting
Under assumption 1-5 in large samples:
βj − βjSE [βj ]
∼ N(0, 1)
In small samples, under assumptions 1-6,
βj − βjSE [βj ]
∼ tn−(k+1)
Estimated standard errors are:
SE [βj ] =
√var[β]jj
var[β] = σ2u(X′X)−1
σ2u =
u′u
n − (k + 1)
Thus, confidence intervals and hypothesis tests proceed in essentiallythe same way.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 23 / 140
![Page 119: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/119.jpg)
Overview of Inference in the General Setting
Under assumption 1-5 in large samples:
βj − βjSE [βj ]
∼ N(0, 1)
In small samples, under assumptions 1-6,
βj − βjSE [βj ]
∼ tn−(k+1)
Estimated standard errors are:
SE [βj ] =
√var[β]jj
var[β] = σ2u(X′X)−1
σ2u =
u′u
n − (k + 1)
Thus, confidence intervals and hypothesis tests proceed in essentiallythe same way.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 23 / 140
![Page 120: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/120.jpg)
Overview of Inference in the General Setting
Under assumption 1-5 in large samples:
βj − βjSE [βj ]
∼ N(0, 1)
In small samples, under assumptions 1-6,
βj − βjSE [βj ]
∼ tn−(k+1)
Estimated standard errors are:
SE [βj ] =
√var[β]jj
var[β] = σ2u(X′X)−1
σ2u =
u′u
n − (k + 1)
Thus, confidence intervals and hypothesis tests proceed in essentiallythe same way.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 23 / 140
![Page 121: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/121.jpg)
Properties of the OLS Estimator: Summary
Theorem
Under Assumptions 1–6, the (k + 1)× 1 vector of OLS estimators β, conditionalon X, follows a multivariate normal distribution with mean β andvariance-covariance matrix σ2 (X′X)
−1:
β|X ∼ N(β, σ2 (X′X)
−1)
Each element of β (i.e. β0, ..., βk+1) is normally distributed, and β is anunbiased estimator of β as E [β] = β
Variances and covariances are given by V [β|X] = σ2 (X′X)−1
An unbiased estimator for the error variance σ2 is given by
σ2 =u′u
n − (k + 1)
With a large sample, β approximately follows the same distribution underAssumptions 1–5 only, i.e., without assuming the normality of u.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 24 / 140
![Page 122: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/122.jpg)
Properties of the OLS Estimator: Summary
Theorem
Under Assumptions 1–6, the (k + 1)× 1 vector of OLS estimators β, conditionalon X, follows a multivariate normal distribution with mean β andvariance-covariance matrix σ2 (X′X)
−1:
β|X ∼ N(β, σ2 (X′X)
−1)
Each element of β (i.e. β0, ..., βk+1) is normally distributed, and β is anunbiased estimator of β as E [β] = β
Variances and covariances are given by V [β|X] = σ2 (X′X)−1
An unbiased estimator for the error variance σ2 is given by
σ2 =u′u
n − (k + 1)
With a large sample, β approximately follows the same distribution underAssumptions 1–5 only, i.e., without assuming the normality of u.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 24 / 140
![Page 123: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/123.jpg)
Properties of the OLS Estimator: Summary
Theorem
Under Assumptions 1–6, the (k + 1)× 1 vector of OLS estimators β, conditionalon X, follows a multivariate normal distribution with mean β andvariance-covariance matrix σ2 (X′X)
−1:
β|X ∼ N(β, σ2 (X′X)
−1)
Each element of β (i.e. β0, ..., βk+1) is normally distributed, and β is anunbiased estimator of β as E [β] = β
Variances and covariances are given by V [β|X] = σ2 (X′X)−1
An unbiased estimator for the error variance σ2 is given by
σ2 =u′u
n − (k + 1)
With a large sample, β approximately follows the same distribution underAssumptions 1–5 only, i.e., without assuming the normality of u.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 24 / 140
![Page 124: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/124.jpg)
Properties of the OLS Estimator: Summary
Theorem
Under Assumptions 1–6, the (k + 1)× 1 vector of OLS estimators β, conditionalon X, follows a multivariate normal distribution with mean β andvariance-covariance matrix σ2 (X′X)
−1:
β|X ∼ N(β, σ2 (X′X)
−1)
Each element of β (i.e. β0, ..., βk+1) is normally distributed, and β is anunbiased estimator of β as E [β] = β
Variances and covariances are given by V [β|X] = σ2 (X′X)−1
An unbiased estimator for the error variance σ2 is given by
σ2 =u′u
n − (k + 1)
With a large sample, β approximately follows the same distribution underAssumptions 1–5 only, i.e., without assuming the normality of u.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 24 / 140
![Page 125: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/125.jpg)
Properties of the OLS Estimator: Summary
Theorem
Under Assumptions 1–6, the (k + 1)× 1 vector of OLS estimators β, conditionalon X, follows a multivariate normal distribution with mean β andvariance-covariance matrix σ2 (X′X)
−1:
β|X ∼ N(β, σ2 (X′X)
−1)
Each element of β (i.e. β0, ..., βk+1) is normally distributed, and β is anunbiased estimator of β as E [β] = β
Variances and covariances are given by V [β|X] = σ2 (X′X)−1
An unbiased estimator for the error variance σ2 is given by
σ2 =u′u
n − (k + 1)
With a large sample, β approximately follows the same distribution underAssumptions 1–5 only, i.e., without assuming the normality of u.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 24 / 140
![Page 126: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/126.jpg)
Properties of the OLS Estimator: Summary
Theorem
Under Assumptions 1–6, the (k + 1)× 1 vector of OLS estimators β, conditionalon X, follows a multivariate normal distribution with mean β andvariance-covariance matrix σ2 (X′X)
−1:
β|X ∼ N(β, σ2 (X′X)
−1)
Each element of β (i.e. β0, ..., βk+1) is normally distributed, and β is anunbiased estimator of β as E [β] = β
Variances and covariances are given by V [β|X] = σ2 (X′X)−1
An unbiased estimator for the error variance σ2 is given by
σ2 =u′u
n − (k + 1)
With a large sample, β approximately follows the same distribution underAssumptions 1–5 only, i.e., without assuming the normality of u.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 24 / 140
![Page 127: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/127.jpg)
Implications of the Variance-Covariance Matrix
Note that the sampling distribution is a joint distribution because itinvolves multiple random variables.
This is because the sampling distribution of the terms in β arecorrelated.
In a practical sense, this means that our uncertainty aboutcoefficients is correlated across variables.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 25 / 140
![Page 128: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/128.jpg)
Implications of the Variance-Covariance Matrix
Note that the sampling distribution is a joint distribution because itinvolves multiple random variables.
This is because the sampling distribution of the terms in β arecorrelated.
In a practical sense, this means that our uncertainty aboutcoefficients is correlated across variables.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 25 / 140
![Page 129: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/129.jpg)
Implications of the Variance-Covariance Matrix
Note that the sampling distribution is a joint distribution because itinvolves multiple random variables.
This is because the sampling distribution of the terms in β arecorrelated.
In a practical sense, this means that our uncertainty aboutcoefficients is correlated across variables.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 25 / 140
![Page 130: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/130.jpg)
Implications of the Variance-Covariance Matrix
Note that the sampling distribution is a joint distribution because itinvolves multiple random variables.
This is because the sampling distribution of the terms in β arecorrelated.
In a practical sense, this means that our uncertainty aboutcoefficients is correlated across variables.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 25 / 140
![Page 131: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/131.jpg)
Multivariate Normal: SimulationY = β0 + β1X1 + u with u ∼ N(0, σ2
u = 4) and β0 = 5, β1 = −1, and n = 100:
0.0 0.2 0.4 0.6 0.8 1.0
24
68
Sampling distribution of Regression Lines
x1
y
4.0 4.5 5.0 5.5 6.0
−3
−2
−1
0
Joint sampling distribution
beta_0 hat
beta
_1 h
at
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 26 / 140
![Page 132: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/132.jpg)
Marginals of Multivariate Normal RVs are NormalY = β0 + β1X1 + u with u ∼ N(0, σ2
u = 4) and β0 = 5, β1 = −1, and n = 100:
Sampling Distribution beta_0 hat
beta_0 hat
Fre
quen
cy
4.0 4.5 5.0 5.5 6.0
010
020
030
040
050
060
0Sampling Distribution beta_1 hat
beta_1 hat
Fre
quen
cy
−3 −2 −1 0 1
020
040
060
080
0
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 27 / 140
![Page 133: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/133.jpg)
1 Matrix Form of Regression
2 OLS inference in matrix form
3 Standard Hypothesis Tests
4 Testing Joint Significance
5 Testing Linear Hypotheses: The General Case
6 Fun With(out) Weights
7 Appendix: Derivations and Consistency
8 The Problems with p-values
9 Agnostic Regression
10 Inference via the Bootstrap
11 Fun With Weights
12 Appendix: Tricky p-value Example
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 28 / 140
![Page 134: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/134.jpg)
1 Matrix Form of Regression
2 OLS inference in matrix form
3 Standard Hypothesis Tests
4 Testing Joint Significance
5 Testing Linear Hypotheses: The General Case
6 Fun With(out) Weights
7 Appendix: Derivations and Consistency
8 The Problems with p-values
9 Agnostic Regression
10 Inference via the Bootstrap
11 Fun With Weights
12 Appendix: Tricky p-value Example
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 28 / 140
![Page 135: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/135.jpg)
Running Example: Chilean Referendum on Pinochet
The 1988 Chilean national plebiscite was a national referendum heldto determine whether or not dictator Augusto Pinochet would extendhis rule for another eight-year term in office.
Data: national survey conducted in April and May of 1988 byFLACSO in Chile.
Outcome: 1 if respondent intends to vote for Pinochet, 0 otherwise.We can interpret the β slopes as marginal “effects” on the probabilitythat respondent votes for Pinochet.
Plebiscite was held on October 5, 1988. The No side won with 56%of the vote, with 44% voting Yes.
We model the intended Pinochet vote as a linear function of gender,education, and age of respondents.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 29 / 140
![Page 136: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/136.jpg)
Running Example: Chilean Referendum on Pinochet
The 1988 Chilean national plebiscite was a national referendum heldto determine whether or not dictator Augusto Pinochet would extendhis rule for another eight-year term in office.
Data: national survey conducted in April and May of 1988 byFLACSO in Chile.
Outcome: 1 if respondent intends to vote for Pinochet, 0 otherwise.We can interpret the β slopes as marginal “effects” on the probabilitythat respondent votes for Pinochet.
Plebiscite was held on October 5, 1988. The No side won with 56%of the vote, with 44% voting Yes.
We model the intended Pinochet vote as a linear function of gender,education, and age of respondents.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 29 / 140
![Page 137: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/137.jpg)
Running Example: Chilean Referendum on Pinochet
The 1988 Chilean national plebiscite was a national referendum heldto determine whether or not dictator Augusto Pinochet would extendhis rule for another eight-year term in office.
Data: national survey conducted in April and May of 1988 byFLACSO in Chile.
Outcome: 1 if respondent intends to vote for Pinochet, 0 otherwise.We can interpret the β slopes as marginal “effects” on the probabilitythat respondent votes for Pinochet.
Plebiscite was held on October 5, 1988. The No side won with 56%of the vote, with 44% voting Yes.
We model the intended Pinochet vote as a linear function of gender,education, and age of respondents.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 29 / 140
![Page 138: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/138.jpg)
Running Example: Chilean Referendum on Pinochet
The 1988 Chilean national plebiscite was a national referendum heldto determine whether or not dictator Augusto Pinochet would extendhis rule for another eight-year term in office.
Data: national survey conducted in April and May of 1988 byFLACSO in Chile.
Outcome: 1 if respondent intends to vote for Pinochet, 0 otherwise.We can interpret the β slopes as marginal “effects” on the probabilitythat respondent votes for Pinochet.
Plebiscite was held on October 5, 1988. The No side won with 56%of the vote, with 44% voting Yes.
We model the intended Pinochet vote as a linear function of gender,education, and age of respondents.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 29 / 140
![Page 139: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/139.jpg)
Running Example: Chilean Referendum on Pinochet
The 1988 Chilean national plebiscite was a national referendum heldto determine whether or not dictator Augusto Pinochet would extendhis rule for another eight-year term in office.
Data: national survey conducted in April and May of 1988 byFLACSO in Chile.
Outcome: 1 if respondent intends to vote for Pinochet, 0 otherwise.We can interpret the β slopes as marginal “effects” on the probabilitythat respondent votes for Pinochet.
Plebiscite was held on October 5, 1988. The No side won with 56%of the vote, with 44% voting Yes.
We model the intended Pinochet vote as a linear function of gender,education, and age of respondents.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 29 / 140
![Page 140: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/140.jpg)
Hypothesis Testing in R
Model the intended Pinochet vote as a linear function of gender,education, and age of respondents.
R Code> fit <- lm(vote1 ~ fem + educ + age, data = d)
> summary(fit)
~~~~~
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.4042284 0.0514034 7.864 6.57e-15 ***
fem 0.1360034 0.0237132 5.735 1.15e-08 ***
educ -0.0607604 0.0138649 -4.382 1.25e-05 ***
age 0.0037786 0.0008315 4.544 5.90e-06 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 0.4875 on 1699 degrees of freedom
Multiple R-squared: 0.05112, Adjusted R-squared: 0.04945
F-statistic: 30.51 on 3 and 1699 DF, p-value: < 2.2e-16
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 30 / 140
![Page 141: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/141.jpg)
The t-Value for Multiple Linear Regression
Consider testing a hypothesis about a single regression coefficient βj :
H0 : βj = c
In the simple linear regression we used the t-value to test this kind ofhypothesis.
We can consider the same t-value about βj for the multiple regression:
T =βj − c
SE (βj)
How do we compute SE (βj)?
SE (βj) =
√V (βj) =
√V (β)(j,j) =
√σ2(X′X)−1
(j,j)
where A(j,j) is the (j , j) element of matrix A.
That is, take the variance-covariance matrix of β and square root thediagonal element corresponding to j .
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 31 / 140
![Page 142: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/142.jpg)
The t-Value for Multiple Linear Regression
Consider testing a hypothesis about a single regression coefficient βj :
H0 : βj = c
In the simple linear regression we used the t-value to test this kind ofhypothesis.
We can consider the same t-value about βj for the multiple regression:
T =βj − c
SE (βj)
How do we compute SE (βj)?
SE (βj) =
√V (βj) =
√V (β)(j,j) =
√σ2(X′X)−1
(j,j)
where A(j,j) is the (j , j) element of matrix A.
That is, take the variance-covariance matrix of β and square root thediagonal element corresponding to j .
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 31 / 140
![Page 143: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/143.jpg)
The t-Value for Multiple Linear Regression
Consider testing a hypothesis about a single regression coefficient βj :
H0 : βj = c
In the simple linear regression we used the t-value to test this kind ofhypothesis.
We can consider the same t-value about βj for the multiple regression:
T =βj − c
SE (βj)
How do we compute SE (βj)?
SE (βj) =
√V (βj) =
√V (β)(j,j) =
√σ2(X′X)−1
(j,j)
where A(j,j) is the (j , j) element of matrix A.
That is, take the variance-covariance matrix of β and square root thediagonal element corresponding to j .
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 31 / 140
![Page 144: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/144.jpg)
The t-Value for Multiple Linear Regression
Consider testing a hypothesis about a single regression coefficient βj :
H0 : βj = c
In the simple linear regression we used the t-value to test this kind ofhypothesis.
We can consider the same t-value about βj for the multiple regression:
T =βj − c
SE (βj)
How do we compute SE (βj)?
SE (βj) =
√V (βj) =
√V (β)(j,j) =
√σ2(X′X)−1
(j,j)
where A(j,j) is the (j , j) element of matrix A.
That is, take the variance-covariance matrix of β and square root thediagonal element corresponding to j .
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 31 / 140
![Page 145: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/145.jpg)
The t-Value for Multiple Linear Regression
Consider testing a hypothesis about a single regression coefficient βj :
H0 : βj = c
In the simple linear regression we used the t-value to test this kind ofhypothesis.
We can consider the same t-value about βj for the multiple regression:
T =βj − c
SE (βj)
How do we compute SE (βj)?
SE (βj) =
√V (βj) =
√V (β)(j,j) =
√σ2(X′X)−1
(j,j)
where A(j,j) is the (j , j) element of matrix A.
That is, take the variance-covariance matrix of β and square root thediagonal element corresponding to j .
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 31 / 140
![Page 146: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/146.jpg)
The t-Value for Multiple Linear Regression
Consider testing a hypothesis about a single regression coefficient βj :
H0 : βj = c
In the simple linear regression we used the t-value to test this kind ofhypothesis.
We can consider the same t-value about βj for the multiple regression:
T =βj − c
SE (βj)
How do we compute SE (βj)?
SE (βj) =
√V (βj) =
√V (β)(j,j) =
√σ2(X′X)−1
(j,j)
where A(j,j) is the (j , j) element of matrix A.
That is, take the variance-covariance matrix of β and square root thediagonal element corresponding to j .
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 31 / 140
![Page 147: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/147.jpg)
Getting the Standard ErrorsR Code
> fit <- lm(vote1 ~ fem + educ + age, data = d)
> summary(fit)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.4042284 0.0514034 7.864 6.57e-15 ***
fem 0.1360034 0.0237132 5.735 1.15e-08 ***
educ -0.0607604 0.0138649 -4.382 1.25e-05 ***
age 0.0037786 0.0008315 4.544 5.90e-06 ***
---
We can pull out the variance-covariance matrix σ2(X′X)−1 in R from the lm() object:
R Code> V <- vcov(fit)
> V
(Intercept) fem educ age
(Intercept) 2.642311e-03 -3.455498e-04 -5.270913e-04 -3.357119e-05
fem -3.455498e-04 5.623170e-04 2.249973e-05 8.285291e-07
educ -5.270913e-04 2.249973e-05 1.922354e-04 3.411049e-06
age -3.357119e-05 8.285291e-07 3.411049e-06 6.914098e-07
> sqrt(diag(V))
(Intercept) fem educ age
0.0514034097 0.0237132251 0.0138648980 0.0008315105
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 32 / 140
![Page 148: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/148.jpg)
Getting the Standard ErrorsR Code
> fit <- lm(vote1 ~ fem + educ + age, data = d)
> summary(fit)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.4042284 0.0514034 7.864 6.57e-15 ***
fem 0.1360034 0.0237132 5.735 1.15e-08 ***
educ -0.0607604 0.0138649 -4.382 1.25e-05 ***
age 0.0037786 0.0008315 4.544 5.90e-06 ***
---
We can pull out the variance-covariance matrix σ2(X′X)−1 in R from the lm() object:R Code
> V <- vcov(fit)
> V
(Intercept) fem educ age
(Intercept) 2.642311e-03 -3.455498e-04 -5.270913e-04 -3.357119e-05
fem -3.455498e-04 5.623170e-04 2.249973e-05 8.285291e-07
educ -5.270913e-04 2.249973e-05 1.922354e-04 3.411049e-06
age -3.357119e-05 8.285291e-07 3.411049e-06 6.914098e-07
> sqrt(diag(V))
(Intercept) fem educ age
0.0514034097 0.0237132251 0.0138648980 0.0008315105
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 32 / 140
![Page 149: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/149.jpg)
Using the t-Value as a Test Statistic
The procedure for testing this null hypothesis (βj = c) is identical to the simpleregression case, except that our reference distribution is tn−k−1 instead of tn−2.
1 Compute the t-value as T = (βj − c)/SE [βj ]
2 Compare the value to the critical value tα/2 for the α level test, which underthe null hypothesis satisfies
P(−tα/2 ≤ T ≤ tα/2
)= 1− α
3 Decide whether the realized value of T in our data is unusual given theknown distribution of the test statistic.
4 Finally, either declare that we reject H0 or not, or report the p-value.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 33 / 140
![Page 150: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/150.jpg)
Using the t-Value as a Test Statistic
The procedure for testing this null hypothesis (βj = c) is identical to the simpleregression case, except that our reference distribution is tn−k−1 instead of tn−2.
1 Compute the t-value as T = (βj − c)/SE [βj ]
2 Compare the value to the critical value tα/2 for the α level test, which underthe null hypothesis satisfies
P(−tα/2 ≤ T ≤ tα/2
)= 1− α
3 Decide whether the realized value of T in our data is unusual given theknown distribution of the test statistic.
4 Finally, either declare that we reject H0 or not, or report the p-value.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 33 / 140
![Page 151: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/151.jpg)
Using the t-Value as a Test Statistic
The procedure for testing this null hypothesis (βj = c) is identical to the simpleregression case, except that our reference distribution is tn−k−1 instead of tn−2.
1 Compute the t-value as T = (βj − c)/SE [βj ]
2 Compare the value to the critical value tα/2 for the α level test, which underthe null hypothesis satisfies
P(−tα/2 ≤ T ≤ tα/2
)= 1− α
3 Decide whether the realized value of T in our data is unusual given theknown distribution of the test statistic.
4 Finally, either declare that we reject H0 or not, or report the p-value.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 33 / 140
![Page 152: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/152.jpg)
Using the t-Value as a Test Statistic
The procedure for testing this null hypothesis (βj = c) is identical to the simpleregression case, except that our reference distribution is tn−k−1 instead of tn−2.
1 Compute the t-value as T = (βj − c)/SE [βj ]
2 Compare the value to the critical value tα/2 for the α level test, which underthe null hypothesis satisfies
P(−tα/2 ≤ T ≤ tα/2
)= 1− α
3 Decide whether the realized value of T in our data is unusual given theknown distribution of the test statistic.
4 Finally, either declare that we reject H0 or not, or report the p-value.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 33 / 140
![Page 153: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/153.jpg)
Using the t-Value as a Test Statistic
The procedure for testing this null hypothesis (βj = c) is identical to the simpleregression case, except that our reference distribution is tn−k−1 instead of tn−2.
1 Compute the t-value as T = (βj − c)/SE [βj ]
2 Compare the value to the critical value tα/2 for the α level test, which underthe null hypothesis satisfies
P(−tα/2 ≤ T ≤ tα/2
)= 1− α
3 Decide whether the realized value of T in our data is unusual given theknown distribution of the test statistic.
4 Finally, either declare that we reject H0 or not, or report the p-value.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 33 / 140
![Page 154: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/154.jpg)
Using the t-Value as a Test Statistic
The procedure for testing this null hypothesis (βj = c) is identical to the simpleregression case, except that our reference distribution is tn−k−1 instead of tn−2.
1 Compute the t-value as T = (βj − c)/SE [βj ]
2 Compare the value to the critical value tα/2 for the α level test, which underthe null hypothesis satisfies
P(−tα/2 ≤ T ≤ tα/2
)= 1− α
3 Decide whether the realized value of T in our data is unusual given theknown distribution of the test statistic.
4 Finally, either declare that we reject H0 or not, or report the p-value.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 33 / 140
![Page 155: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/155.jpg)
Confidence IntervalsTo construct confidence intervals, there is again no difference compared to the case ofk = 1, except that we need to use tn−k−1 instead of tn−2
Since we know the sampling distribution for our t-value:
T =βj − c
SE [βj ]∼ tn−k−1
So we also know the probability that the value of our test statistics falls into a giveninterval:
P
(−tα/2 ≤
βj − βjSE [βj ]
≤ tα/2
)= 1− α
We rearrange: [βj − tα/2SE [βj ] , βj + tα/2SE [βj ]
]and thus can construct the confidence intervals as usual using:
βj ± tα/2 · SE [βj ]
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 34 / 140
![Page 156: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/156.jpg)
Confidence IntervalsTo construct confidence intervals, there is again no difference compared to the case ofk = 1, except that we need to use tn−k−1 instead of tn−2
Since we know the sampling distribution for our t-value:
T =βj − c
SE [βj ]∼ tn−k−1
So we also know the probability that the value of our test statistics falls into a giveninterval:
P
(−tα/2 ≤
βj − βjSE [βj ]
≤ tα/2
)= 1− α
We rearrange: [βj − tα/2SE [βj ] , βj + tα/2SE [βj ]
]and thus can construct the confidence intervals as usual using:
βj ± tα/2 · SE [βj ]
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 34 / 140
![Page 157: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/157.jpg)
Confidence IntervalsTo construct confidence intervals, there is again no difference compared to the case ofk = 1, except that we need to use tn−k−1 instead of tn−2
Since we know the sampling distribution for our t-value:
T =βj − c
SE [βj ]∼ tn−k−1
So we also know the probability that the value of our test statistics falls into a giveninterval:
P
(−tα/2 ≤
βj − βjSE [βj ]
≤ tα/2
)= 1− α
We rearrange: [βj − tα/2SE [βj ] , βj + tα/2SE [βj ]
]and thus can construct the confidence intervals as usual using:
βj ± tα/2 · SE [βj ]
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 34 / 140
![Page 158: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/158.jpg)
Confidence IntervalsTo construct confidence intervals, there is again no difference compared to the case ofk = 1, except that we need to use tn−k−1 instead of tn−2
Since we know the sampling distribution for our t-value:
T =βj − c
SE [βj ]∼ tn−k−1
So we also know the probability that the value of our test statistics falls into a giveninterval:
P
(−tα/2 ≤
βj − βjSE [βj ]
≤ tα/2
)= 1− α
We rearrange: [βj − tα/2SE [βj ] , βj + tα/2SE [βj ]
]
and thus can construct the confidence intervals as usual using:
βj ± tα/2 · SE [βj ]
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 34 / 140
![Page 159: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/159.jpg)
Confidence IntervalsTo construct confidence intervals, there is again no difference compared to the case ofk = 1, except that we need to use tn−k−1 instead of tn−2
Since we know the sampling distribution for our t-value:
T =βj − c
SE [βj ]∼ tn−k−1
So we also know the probability that the value of our test statistics falls into a giveninterval:
P
(−tα/2 ≤
βj − βjSE [βj ]
≤ tα/2
)= 1− α
We rearrange: [βj − tα/2SE [βj ] , βj + tα/2SE [βj ]
]and thus can construct the confidence intervals as usual using:
βj ± tα/2 · SE [βj ]
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 34 / 140
![Page 160: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/160.jpg)
Confidence Intervals in RR Code
> fit <- lm(vote1 ~ fem + educ + age, data = d)
> summary(fit)
~~~~~
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.4042284 0.0514034 7.864 6.57e-15 ***
fem 0.1360034 0.0237132 5.735 1.15e-08 ***
educ -0.0607604 0.0138649 -4.382 1.25e-05 ***
age 0.0037786 0.0008315 4.544 5.90e-06 ***
---
R Code> confint(fit)
2.5 % 97.5 %
(Intercept) 0.303407780 0.50504909
fem 0.089493169 0.18251357
educ -0.087954435 -0.03356629
age 0.002147755 0.00540954
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 35 / 140
![Page 161: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/161.jpg)
Testing Hypothesis About a Linear Combination of βj
R Code> fit <- lm(REALGDPCAP ~ Region, data = D)
> summary(fit)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4452.7 783.4 5.684 2.07e-07 ***
RegionAfrica -2552.8 1204.5 -2.119 0.0372 *
RegionAsia 148.9 1149.8 0.129 0.8973
RegionLatAmerica -271.3 1007.0 -0.269 0.7883
RegionOecd 9671.3 1007.0 9.604 5.74e-15 ***
βAsia and βLAm are close. So we may want to test the null hypothesis:
H0 : βLAm = βAsia ⇔ βLAm − βAsia = 0
against the alternative of
H1 : βLAm 6= βAsia ⇔ βLAm − βAsia 6= 0
What would be an appropriate test statistic for this hypothesis?
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 36 / 140
![Page 162: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/162.jpg)
Testing Hypothesis About a Linear Combination of βjR Code
> fit <- lm(REALGDPCAP ~ Region, data = D)
> summary(fit)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4452.7 783.4 5.684 2.07e-07 ***
RegionAfrica -2552.8 1204.5 -2.119 0.0372 *
RegionAsia 148.9 1149.8 0.129 0.8973
RegionLatAmerica -271.3 1007.0 -0.269 0.7883
RegionOecd 9671.3 1007.0 9.604 5.74e-15 ***
βAsia and βLAm are close. So we may want to test the null hypothesis:
H0 : βLAm = βAsia ⇔ βLAm − βAsia = 0
against the alternative of
H1 : βLAm 6= βAsia ⇔ βLAm − βAsia 6= 0
What would be an appropriate test statistic for this hypothesis?
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 36 / 140
![Page 163: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/163.jpg)
Testing Hypothesis About a Linear Combination of βjR Code
> fit <- lm(REALGDPCAP ~ Region, data = D)
> summary(fit)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4452.7 783.4 5.684 2.07e-07 ***
RegionAfrica -2552.8 1204.5 -2.119 0.0372 *
RegionAsia 148.9 1149.8 0.129 0.8973
RegionLatAmerica -271.3 1007.0 -0.269 0.7883
RegionOecd 9671.3 1007.0 9.604 5.74e-15 ***
βAsia and βLAm are close. So we may want to test the null hypothesis:
H0 : βLAm = βAsia ⇔ βLAm − βAsia = 0
against the alternative of
H1 : βLAm 6= βAsia ⇔ βLAm − βAsia 6= 0
What would be an appropriate test statistic for this hypothesis?
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 36 / 140
![Page 164: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/164.jpg)
Testing Hypothesis About a Linear Combination of βjR Code
> fit <- lm(REALGDPCAP ~ Region, data = D)
> summary(fit)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4452.7 783.4 5.684 2.07e-07 ***
RegionAfrica -2552.8 1204.5 -2.119 0.0372 *
RegionAsia 148.9 1149.8 0.129 0.8973
RegionLatAmerica -271.3 1007.0 -0.269 0.7883
RegionOecd 9671.3 1007.0 9.604 5.74e-15 ***
βAsia and βLAm are close. So we may want to test the null hypothesis:
H0 : βLAm = βAsia ⇔ βLAm − βAsia = 0
against the alternative of
H1 : βLAm 6= βAsia ⇔ βLAm − βAsia 6= 0
What would be an appropriate test statistic for this hypothesis?
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 36 / 140
![Page 165: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/165.jpg)
Testing Hypothesis About a Linear Combination of βjR Code
> fit <- lm(REALGDPCAP ~ Region, data = D)
> summary(fit)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4452.7 783.4 5.684 2.07e-07 ***
RegionAfrica -2552.8 1204.5 -2.119 0.0372 *
RegionAsia 148.9 1149.8 0.129 0.8973
RegionLatAmerica -271.3 1007.0 -0.269 0.7883
RegionOecd 9671.3 1007.0 9.604 5.74e-15 ***
βAsia and βLAm are close. So we may want to test the null hypothesis:
H0 : βLAm = βAsia ⇔ βLAm − βAsia = 0
against the alternative of
H1 : βLAm 6= βAsia ⇔ βLAm − βAsia 6= 0
What would be an appropriate test statistic for this hypothesis?
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 36 / 140
![Page 166: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/166.jpg)
Testing Hypothesis About a Linear Combination of βjR Code
> fit <- lm(REALGDPCAP ~ Region, data = D)
> summary(fit)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4452.7 783.4 5.684 2.07e-07 ***
RegionAfrica -2552.8 1204.5 -2.119 0.0372 *
RegionAsia 148.9 1149.8 0.129 0.8973
RegionLatAmerica -271.3 1007.0 -0.269 0.7883
RegionOecd 9671.3 1007.0 9.604 5.74e-15 ***
Let’s consider a t-value:
T =βLAm − βAsia
SE(βLAm − βAsia)
We will reject H0 if T is sufficiently different from zero.
Note that unlike the test of a single hypothesis, both βLAm and βAsia are randomvariables, hence the denominator.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 37 / 140
![Page 167: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/167.jpg)
Testing Hypothesis About a Linear Combination of βjR Code
> fit <- lm(REALGDPCAP ~ Region, data = D)
> summary(fit)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4452.7 783.4 5.684 2.07e-07 ***
RegionAfrica -2552.8 1204.5 -2.119 0.0372 *
RegionAsia 148.9 1149.8 0.129 0.8973
RegionLatAmerica -271.3 1007.0 -0.269 0.7883
RegionOecd 9671.3 1007.0 9.604 5.74e-15 ***
Let’s consider a t-value:
T =βLAm − βAsia
SE(βLAm − βAsia)
We will reject H0 if T is sufficiently different from zero.
Note that unlike the test of a single hypothesis, both βLAm and βAsia are randomvariables, hence the denominator.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 37 / 140
![Page 168: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/168.jpg)
Testing Hypothesis About a Linear Combination of βjOur test statistic:
T =βLAm − βAsia
SE (βLAm − βAsia)∼ tn−k−1
How do you find SE (βLAm − βAsia)?
Is it SE (βLAm)− SE (βAsia)? No!
Is it SE (βLAm) + SE (βAsia)? No!
Recall the following property of the variance:
V (X ± Y ) = V (X ) + V (Y )± 2Cov(X ,Y )
Therefore, the standard error for a linear combination of coefficients is:
SE (β1 ± β2) =
√V (β1) + V (β2)± 2Cov[β1, β2]
which we can calculate from the estimated covariance matrix of β.
Since the estimates of the coefficients are correlated, we need the covarianceterm.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 38 / 140
![Page 169: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/169.jpg)
Testing Hypothesis About a Linear Combination of βjOur test statistic:
T =βLAm − βAsia
SE (βLAm − βAsia)∼ tn−k−1
How do you find SE (βLAm − βAsia)?
Is it SE (βLAm)− SE (βAsia)? No!
Is it SE (βLAm) + SE (βAsia)? No!
Recall the following property of the variance:
V (X ± Y ) = V (X ) + V (Y )± 2Cov(X ,Y )
Therefore, the standard error for a linear combination of coefficients is:
SE (β1 ± β2) =
√V (β1) + V (β2)± 2Cov[β1, β2]
which we can calculate from the estimated covariance matrix of β.
Since the estimates of the coefficients are correlated, we need the covarianceterm.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 38 / 140
![Page 170: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/170.jpg)
Testing Hypothesis About a Linear Combination of βjOur test statistic:
T =βLAm − βAsia
SE (βLAm − βAsia)∼ tn−k−1
How do you find SE (βLAm − βAsia)?
Is it SE (βLAm)− SE (βAsia)?
No!
Is it SE (βLAm) + SE (βAsia)? No!
Recall the following property of the variance:
V (X ± Y ) = V (X ) + V (Y )± 2Cov(X ,Y )
Therefore, the standard error for a linear combination of coefficients is:
SE (β1 ± β2) =
√V (β1) + V (β2)± 2Cov[β1, β2]
which we can calculate from the estimated covariance matrix of β.
Since the estimates of the coefficients are correlated, we need the covarianceterm.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 38 / 140
![Page 171: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/171.jpg)
Testing Hypothesis About a Linear Combination of βjOur test statistic:
T =βLAm − βAsia
SE (βLAm − βAsia)∼ tn−k−1
How do you find SE (βLAm − βAsia)?
Is it SE (βLAm)− SE (βAsia)? No!
Is it SE (βLAm) + SE (βAsia)? No!
Recall the following property of the variance:
V (X ± Y ) = V (X ) + V (Y )± 2Cov(X ,Y )
Therefore, the standard error for a linear combination of coefficients is:
SE (β1 ± β2) =
√V (β1) + V (β2)± 2Cov[β1, β2]
which we can calculate from the estimated covariance matrix of β.
Since the estimates of the coefficients are correlated, we need the covarianceterm.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 38 / 140
![Page 172: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/172.jpg)
Testing Hypothesis About a Linear Combination of βjOur test statistic:
T =βLAm − βAsia
SE (βLAm − βAsia)∼ tn−k−1
How do you find SE (βLAm − βAsia)?
Is it SE (βLAm)− SE (βAsia)? No!
Is it SE (βLAm) + SE (βAsia)?
No!
Recall the following property of the variance:
V (X ± Y ) = V (X ) + V (Y )± 2Cov(X ,Y )
Therefore, the standard error for a linear combination of coefficients is:
SE (β1 ± β2) =
√V (β1) + V (β2)± 2Cov[β1, β2]
which we can calculate from the estimated covariance matrix of β.
Since the estimates of the coefficients are correlated, we need the covarianceterm.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 38 / 140
![Page 173: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/173.jpg)
Testing Hypothesis About a Linear Combination of βjOur test statistic:
T =βLAm − βAsia
SE (βLAm − βAsia)∼ tn−k−1
How do you find SE (βLAm − βAsia)?
Is it SE (βLAm)− SE (βAsia)? No!
Is it SE (βLAm) + SE (βAsia)? No!
Recall the following property of the variance:
V (X ± Y ) = V (X ) + V (Y )± 2Cov(X ,Y )
Therefore, the standard error for a linear combination of coefficients is:
SE (β1 ± β2) =
√V (β1) + V (β2)± 2Cov[β1, β2]
which we can calculate from the estimated covariance matrix of β.
Since the estimates of the coefficients are correlated, we need the covarianceterm.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 38 / 140
![Page 174: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/174.jpg)
Testing Hypothesis About a Linear Combination of βjOur test statistic:
T =βLAm − βAsia
SE (βLAm − βAsia)∼ tn−k−1
How do you find SE (βLAm − βAsia)?
Is it SE (βLAm)− SE (βAsia)? No!
Is it SE (βLAm) + SE (βAsia)? No!
Recall the following property of the variance:
V (X ± Y ) = V (X ) + V (Y )± 2Cov(X ,Y )
Therefore, the standard error for a linear combination of coefficients is:
SE (β1 ± β2) =
√V (β1) + V (β2)± 2Cov[β1, β2]
which we can calculate from the estimated covariance matrix of β.
Since the estimates of the coefficients are correlated, we need the covarianceterm.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 38 / 140
![Page 175: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/175.jpg)
Testing Hypothesis About a Linear Combination of βjOur test statistic:
T =βLAm − βAsia
SE (βLAm − βAsia)∼ tn−k−1
How do you find SE (βLAm − βAsia)?
Is it SE (βLAm)− SE (βAsia)? No!
Is it SE (βLAm) + SE (βAsia)? No!
Recall the following property of the variance:
V (X ± Y ) = V (X ) + V (Y )± 2Cov(X ,Y )
Therefore, the standard error for a linear combination of coefficients is:
SE (β1 ± β2) =
√V (β1) + V (β2)± 2Cov[β1, β2]
which we can calculate from the estimated covariance matrix of β.
Since the estimates of the coefficients are correlated, we need the covarianceterm.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 38 / 140
![Page 176: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/176.jpg)
Testing Hypothesis About a Linear Combination of βjOur test statistic:
T =βLAm − βAsia
SE (βLAm − βAsia)∼ tn−k−1
How do you find SE (βLAm − βAsia)?
Is it SE (βLAm)− SE (βAsia)? No!
Is it SE (βLAm) + SE (βAsia)? No!
Recall the following property of the variance:
V (X ± Y ) = V (X ) + V (Y )± 2Cov(X ,Y )
Therefore, the standard error for a linear combination of coefficients is:
SE (β1 ± β2) =
√V (β1) + V (β2)± 2Cov[β1, β2]
which we can calculate from the estimated covariance matrix of β.
Since the estimates of the coefficients are correlated, we need the covarianceterm.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 38 / 140
![Page 177: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/177.jpg)
Example: GDP per capita on Regions
R Code> fit <- lm(REALGDPCAP ~ Region, data = D)
> V <- vcov(fit)
> V
(Intercept) RegionAfrica RegionAsia RegionLatAmerica
(Intercept) 613769.9 -613769.9 -613769.9 -613769.9
RegionAfrica -613769.9 1450728.8 613769.9 613769.9
RegionAsia -613769.9 613769.9 1321965.9 613769.9
RegionLatAmerica -613769.9 613769.9 613769.9 1014054.6
RegionOecd -613769.9 613769.9 613769.9 613769.9
RegionOecd
(Intercept) -613769.9
RegionAfrica 613769.9
RegionAsia 613769.9
RegionLatAmerica 613769.9
RegionOecd 1014054.6
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 39 / 140
![Page 178: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/178.jpg)
Example: GDP per capita on Regions
R Code> fit <- lm(REALGDPCAP ~ Region, data = D)
> V <- vcov(fit)
> V
(Intercept) RegionAfrica RegionAsia RegionLatAmerica
(Intercept) 613769.9 -613769.9 -613769.9 -613769.9
RegionAfrica -613769.9 1450728.8 613769.9 613769.9
RegionAsia -613769.9 613769.9 1321965.9 613769.9
RegionLatAmerica -613769.9 613769.9 613769.9 1014054.6
RegionOecd -613769.9 613769.9 613769.9 613769.9
RegionOecd
(Intercept) -613769.9
RegionAfrica 613769.9
RegionAsia 613769.9
RegionLatAmerica 613769.9
RegionOecd 1014054.6
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 39 / 140
![Page 179: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/179.jpg)
Example: GDP per capita on Regions
We can then compute the test statistic for the hypothesis of interest:
R Code> se <- sqrt(V[4,4] + V[3,3] - 2*V[3,4])
> se
[1] 1052.844
>
> tstat <- (coef(fit)[4] - coef(fit)[3])/se
> tstat
RegionLatAmerica
-0.3990977
t =βLAm − βAsia
SE(βLAm − βAsia)where
SE(βLAm − βAsia) =
√V (βLAm) + V (βAsia)− 2Cov[βLAm, βAsia]
Plugging in we get t ≈ −0.40. So what do we conclude?
We cannot reject the null that the difference in average GDP resulted from chance.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 40 / 140
![Page 180: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/180.jpg)
Example: GDP per capita on Regions
We can then compute the test statistic for the hypothesis of interest:R Code
> se <- sqrt(V[4,4] + V[3,3] - 2*V[3,4])
> se
[1] 1052.844
>
> tstat <- (coef(fit)[4] - coef(fit)[3])/se
> tstat
RegionLatAmerica
-0.3990977
t =βLAm − βAsia
SE(βLAm − βAsia)where
SE(βLAm − βAsia) =
√V (βLAm) + V (βAsia)− 2Cov[βLAm, βAsia]
Plugging in we get t ≈ −0.40. So what do we conclude?
We cannot reject the null that the difference in average GDP resulted from chance.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 40 / 140
![Page 181: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/181.jpg)
Example: GDP per capita on Regions
We can then compute the test statistic for the hypothesis of interest:R Code
> se <- sqrt(V[4,4] + V[3,3] - 2*V[3,4])
> se
[1] 1052.844
>
> tstat <- (coef(fit)[4] - coef(fit)[3])/se
> tstat
RegionLatAmerica
-0.3990977
t =βLAm − βAsia
SE(βLAm − βAsia)where
SE(βLAm − βAsia) =
√V (βLAm) + V (βAsia)− 2Cov[βLAm, βAsia]
Plugging in we get t ≈ −0.40. So what do we conclude?
We cannot reject the null that the difference in average GDP resulted from chance.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 40 / 140
![Page 182: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/182.jpg)
Aside: Adjusted R2
R Code> fit <- lm(vote1 ~ fem + educ + age, data = d)
> summary(fit)
~~~~~
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.4042284 0.0514034 7.864 6.57e-15 ***
fem 0.1360034 0.0237132 5.735 1.15e-08 ***
educ -0.0607604 0.0138649 -4.382 1.25e-05 ***
age 0.0037786 0.0008315 4.544 5.90e-06 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 0.4875 on 1699 degrees of freedom
Multiple R-squared: 0.05112, Adjusted R-squared: 0.04945
F-statistic: 30.51 on 3 and 1699 DF, p-value: < 2.2e-16
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 41 / 140
![Page 183: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/183.jpg)
Aside: Adjusted R2
R Code> fit <- lm(vote1 ~ fem + educ + age, data = d)
> summary(fit)
~~~~~
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.4042284 0.0514034 7.864 6.57e-15 ***
fem 0.1360034 0.0237132 5.735 1.15e-08 ***
educ -0.0607604 0.0138649 -4.382 1.25e-05 ***
age 0.0037786 0.0008315 4.544 5.90e-06 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 0.4875 on 1699 degrees of freedom
Multiple R-squared: 0.05112, Adjusted R-squared: 0.04945
F-statistic: 30.51 on 3 and 1699 DF, p-value: < 2.2e-16
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 41 / 140
![Page 184: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/184.jpg)
Aside: Adjusted R2
R2 often used to assess in-sample model fit. Recall
R2 = 1− SSres
SStot
where SSres are the sum of squared residuals and the SStot are the sum ofthe squared deviations from the mean.
Perhaps problematically, it can be shown that R2 always stays constant orincreases with more explanatory variables
So, how do we penalize more complex models? Adjusted R2
This makes R2 more ‘comparable’ across models with different numbers ofvariables, but the next section will show you an even better way to approachthat problem in a testing framework.
Still since people report it, let’s quickly derive adjusted R2,
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 42 / 140
![Page 185: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/185.jpg)
Aside: Adjusted R2
R2 often used to assess in-sample model fit.
Recall
R2 = 1− SSres
SStot
where SSres are the sum of squared residuals and the SStot are the sum ofthe squared deviations from the mean.
Perhaps problematically, it can be shown that R2 always stays constant orincreases with more explanatory variables
So, how do we penalize more complex models? Adjusted R2
This makes R2 more ‘comparable’ across models with different numbers ofvariables, but the next section will show you an even better way to approachthat problem in a testing framework.
Still since people report it, let’s quickly derive adjusted R2,
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 42 / 140
![Page 186: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/186.jpg)
Aside: Adjusted R2
R2 often used to assess in-sample model fit. Recall
R2 = 1− SSres
SStot
where SSres are the sum of squared residuals and the SStot are the sum ofthe squared deviations from the mean.
Perhaps problematically, it can be shown that R2 always stays constant orincreases with more explanatory variables
So, how do we penalize more complex models? Adjusted R2
This makes R2 more ‘comparable’ across models with different numbers ofvariables, but the next section will show you an even better way to approachthat problem in a testing framework.
Still since people report it, let’s quickly derive adjusted R2,
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 42 / 140
![Page 187: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/187.jpg)
Aside: Adjusted R2
R2 often used to assess in-sample model fit. Recall
R2 = 1− SSres
SStot
where SSres are the sum of squared residuals and the SStot are the sum ofthe squared deviations from the mean.
Perhaps problematically, it can be shown that R2 always stays constant orincreases with more explanatory variables
So, how do we penalize more complex models? Adjusted R2
This makes R2 more ‘comparable’ across models with different numbers ofvariables, but the next section will show you an even better way to approachthat problem in a testing framework.
Still since people report it, let’s quickly derive adjusted R2,
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 42 / 140
![Page 188: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/188.jpg)
Aside: Adjusted R2
R2 often used to assess in-sample model fit. Recall
R2 = 1− SSres
SStot
where SSres are the sum of squared residuals and the SStot are the sum ofthe squared deviations from the mean.
Perhaps problematically, it can be shown that R2 always stays constant orincreases with more explanatory variables
So, how do we penalize more complex models?
Adjusted R2
This makes R2 more ‘comparable’ across models with different numbers ofvariables, but the next section will show you an even better way to approachthat problem in a testing framework.
Still since people report it, let’s quickly derive adjusted R2,
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 42 / 140
![Page 189: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/189.jpg)
Aside: Adjusted R2
R2 often used to assess in-sample model fit. Recall
R2 = 1− SSres
SStot
where SSres are the sum of squared residuals and the SStot are the sum ofthe squared deviations from the mean.
Perhaps problematically, it can be shown that R2 always stays constant orincreases with more explanatory variables
So, how do we penalize more complex models? Adjusted R2
This makes R2 more ‘comparable’ across models with different numbers ofvariables, but the next section will show you an even better way to approachthat problem in a testing framework.
Still since people report it, let’s quickly derive adjusted R2,
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 42 / 140
![Page 190: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/190.jpg)
Aside: Adjusted R2
R2 often used to assess in-sample model fit. Recall
R2 = 1− SSres
SStot
where SSres are the sum of squared residuals and the SStot are the sum ofthe squared deviations from the mean.
Perhaps problematically, it can be shown that R2 always stays constant orincreases with more explanatory variables
So, how do we penalize more complex models? Adjusted R2
This makes R2 more ‘comparable’ across models with different numbers ofvariables, but the next section will show you an even better way to approachthat problem in a testing framework.
Still since people report it, let’s quickly derive adjusted R2,
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 42 / 140
![Page 191: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/191.jpg)
Aside: Adjusted R2
R2 often used to assess in-sample model fit. Recall
R2 = 1− SSres
SStot
where SSres are the sum of squared residuals and the SStot are the sum ofthe squared deviations from the mean.
Perhaps problematically, it can be shown that R2 always stays constant orincreases with more explanatory variables
So, how do we penalize more complex models? Adjusted R2
This makes R2 more ‘comparable’ across models with different numbers ofvariables, but the next section will show you an even better way to approachthat problem in a testing framework.
Still since people report it, let’s quickly derive adjusted R2,
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 42 / 140
![Page 192: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/192.jpg)
Aside: Adjusted R2
Key idea: rewrite R2 in terms of variances
R2 = 1− SSres/n
SStot/n
= 1− V(SSres)
V(SStot)
where V is a biased estimator of the population variance.
What if we replace the biased estimator with the unbiased estimators
V(SSres) = SSres/(n − k − 1)
V(SStot) = SStot/(n − 1)
Some algebra gets us to
R2adj = R2 − (1− R2)
k − 1
n − k︸ ︷︷ ︸model complexity penalty
Adjusted R2 will always be smaller than R2 and can sometimes be negative!
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 43 / 140
![Page 193: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/193.jpg)
Aside: Adjusted R2
Key idea: rewrite R2 in terms of variances
R2 = 1− SSres/n
SStot/n
= 1− V(SSres)
V(SStot)
where V is a biased estimator of the population variance.
What if we replace the biased estimator with the unbiased estimators
V(SSres) = SSres/(n − k − 1)
V(SStot) = SStot/(n − 1)
Some algebra gets us to
R2adj = R2 − (1− R2)
k − 1
n − k︸ ︷︷ ︸model complexity penalty
Adjusted R2 will always be smaller than R2 and can sometimes be negative!
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 43 / 140
![Page 194: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/194.jpg)
Aside: Adjusted R2
Key idea: rewrite R2 in terms of variances
R2 = 1− SSres/n
SStot/n
= 1− V(SSres)
V(SStot)
where V is a biased estimator of the population variance.
What if we replace the biased estimator with the unbiased estimators
V(SSres) = SSres/(n − k − 1)
V(SStot) = SStot/(n − 1)
Some algebra gets us to
R2adj = R2 − (1− R2)
k − 1
n − k︸ ︷︷ ︸model complexity penalty
Adjusted R2 will always be smaller than R2 and can sometimes be negative!
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 43 / 140
![Page 195: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/195.jpg)
Aside: Adjusted R2
Key idea: rewrite R2 in terms of variances
R2 = 1− SSres/n
SStot/n
= 1− V(SSres)
V(SStot)
where V is a biased estimator of the population variance.
What if we replace the biased estimator with the unbiased estimators
V(SSres) = SSres/(n − k − 1)
V(SStot) = SStot/(n − 1)
Some algebra gets us to
R2adj = R2 − (1− R2)
k − 1
n − k︸ ︷︷ ︸model complexity penalty
Adjusted R2 will always be smaller than R2 and can sometimes be negative!
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 43 / 140
![Page 196: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/196.jpg)
Aside: Adjusted R2
Key idea: rewrite R2 in terms of variances
R2 = 1− SSres/n
SStot/n
= 1− V(SSres)
V(SStot)
where V is a biased estimator of the population variance.
What if we replace the biased estimator with the unbiased estimators
V(SSres) = SSres/(n − k − 1)
V(SStot) = SStot/(n − 1)
Some algebra gets us to
R2adj = R2 − (1− R2)
k − 1
n − k︸ ︷︷ ︸model complexity penalty
Adjusted R2 will always be smaller than R2 and can sometimes be negative!
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 43 / 140
![Page 197: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/197.jpg)
Aside: Adjusted R2
Key idea: rewrite R2 in terms of variances
R2 = 1− SSres/n
SStot/n
= 1− V(SSres)
V(SStot)
where V is a biased estimator of the population variance.
What if we replace the biased estimator with the unbiased estimators
V(SSres) = SSres/(n − k − 1)
V(SStot) = SStot/(n − 1)
Some algebra gets us to
R2adj = R2 − (1− R2)
k − 1
n − k︸ ︷︷ ︸model complexity penalty
Adjusted R2 will always be smaller than R2 and can sometimes be negative!
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 43 / 140
![Page 198: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/198.jpg)
Aside: Adjusted R2
Key idea: rewrite R2 in terms of variances
R2 = 1− SSres/n
SStot/n
= 1− V(SSres)
V(SStot)
where V is a biased estimator of the population variance.
What if we replace the biased estimator with the unbiased estimators
V(SSres) = SSres/(n − k − 1)
V(SStot) = SStot/(n − 1)
Some algebra gets us to
R2adj = R2 − (1− R2)
k − 1
n − k︸ ︷︷ ︸model complexity penalty
Adjusted R2 will always be smaller than R2 and can sometimes be negative!
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 43 / 140
![Page 199: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/199.jpg)
Aside: Adjusted R2
Key idea: rewrite R2 in terms of variances
R2 = 1− SSres/n
SStot/n
= 1− V(SSres)
V(SStot)
where V is a biased estimator of the population variance.
What if we replace the biased estimator with the unbiased estimators
V(SSres) = SSres/(n − k − 1)
V(SStot) = SStot/(n − 1)
Some algebra gets us to
R2adj = R2 − (1− R2)
k − 1
n − k︸ ︷︷ ︸model complexity penalty
Adjusted R2 will always be smaller than R2 and can sometimes be negative!Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 43 / 140
![Page 200: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/200.jpg)
1 Matrix Form of Regression
2 OLS inference in matrix form
3 Standard Hypothesis Tests
4 Testing Joint Significance
5 Testing Linear Hypotheses: The General Case
6 Fun With(out) Weights
7 Appendix: Derivations and Consistency
8 The Problems with p-values
9 Agnostic Regression
10 Inference via the Bootstrap
11 Fun With Weights
12 Appendix: Tricky p-value Example
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 44 / 140
![Page 201: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/201.jpg)
1 Matrix Form of Regression
2 OLS inference in matrix form
3 Standard Hypothesis Tests
4 Testing Joint Significance
5 Testing Linear Hypotheses: The General Case
6 Fun With(out) Weights
7 Appendix: Derivations and Consistency
8 The Problems with p-values
9 Agnostic Regression
10 Inference via the Bootstrap
11 Fun With Weights
12 Appendix: Tricky p-value Example
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 44 / 140
![Page 202: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/202.jpg)
F Test for Joint Significance of Coefficients
In research we often want to test a joint hypothesis which involves multiple linearrestrictions (e.g. β1 = β2 = β3 = 0)
Suppose our regression model is:
Voted = β0 + γ1FEMALE + β1EDUCATION+
γ2(FEMALE · EDUCATION) + β2AGE + γ3(FEMALE · AGE) + u
and we want to testH0 : γ1 = γ2 = γ3 = 0.
Substantively, what question are we asking?
→ Do females and males vote systematically differently from each other?(Under the null, there is no difference in either the intercept or slopes betweenfemales and males).
This is an example of a joint hypothesis test involving three restrictions: γ1 = 0,γ2 = 0, and γ3 = 0.
If all the interaction terms and the group lower order term are close to zero, thenwe fail to reject the null hypothesis of no gender difference.
F tests allows us to to test joint hypothesis
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 45 / 140
![Page 203: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/203.jpg)
F Test for Joint Significance of Coefficients
In research we often want to test a joint hypothesis which involves multiple linearrestrictions (e.g. β1 = β2 = β3 = 0)
Suppose our regression model is:
Voted = β0 + γ1FEMALE + β1EDUCATION+
γ2(FEMALE · EDUCATION) + β2AGE + γ3(FEMALE · AGE) + u
and we want to testH0 : γ1 = γ2 = γ3 = 0.
Substantively, what question are we asking?
→ Do females and males vote systematically differently from each other?(Under the null, there is no difference in either the intercept or slopes betweenfemales and males).
This is an example of a joint hypothesis test involving three restrictions: γ1 = 0,γ2 = 0, and γ3 = 0.
If all the interaction terms and the group lower order term are close to zero, thenwe fail to reject the null hypothesis of no gender difference.
F tests allows us to to test joint hypothesis
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 45 / 140
![Page 204: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/204.jpg)
F Test for Joint Significance of Coefficients
In research we often want to test a joint hypothesis which involves multiple linearrestrictions (e.g. β1 = β2 = β3 = 0)
Suppose our regression model is:
Voted = β0 + γ1FEMALE + β1EDUCATION+
γ2(FEMALE · EDUCATION) + β2AGE + γ3(FEMALE · AGE) + u
and we want to testH0 : γ1 = γ2 = γ3 = 0.
Substantively, what question are we asking?
→ Do females and males vote systematically differently from each other?(Under the null, there is no difference in either the intercept or slopes betweenfemales and males).
This is an example of a joint hypothesis test involving three restrictions: γ1 = 0,γ2 = 0, and γ3 = 0.
If all the interaction terms and the group lower order term are close to zero, thenwe fail to reject the null hypothesis of no gender difference.
F tests allows us to to test joint hypothesis
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 45 / 140
![Page 205: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/205.jpg)
F Test for Joint Significance of Coefficients
In research we often want to test a joint hypothesis which involves multiple linearrestrictions (e.g. β1 = β2 = β3 = 0)
Suppose our regression model is:
Voted = β0 + γ1FEMALE + β1EDUCATION+
γ2(FEMALE · EDUCATION) + β2AGE + γ3(FEMALE · AGE) + u
and we want to testH0 : γ1 = γ2 = γ3 = 0.
Substantively, what question are we asking?
→ Do females and males vote systematically differently from each other?(Under the null, there is no difference in either the intercept or slopes betweenfemales and males).
This is an example of a joint hypothesis test involving three restrictions: γ1 = 0,γ2 = 0, and γ3 = 0.
If all the interaction terms and the group lower order term are close to zero, thenwe fail to reject the null hypothesis of no gender difference.
F tests allows us to to test joint hypothesis
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 45 / 140
![Page 206: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/206.jpg)
F Test for Joint Significance of Coefficients
In research we often want to test a joint hypothesis which involves multiple linearrestrictions (e.g. β1 = β2 = β3 = 0)
Suppose our regression model is:
Voted = β0 + γ1FEMALE + β1EDUCATION+
γ2(FEMALE · EDUCATION) + β2AGE + γ3(FEMALE · AGE) + u
and we want to testH0 : γ1 = γ2 = γ3 = 0.
Substantively, what question are we asking?
→ Do females and males vote systematically differently from each other?(Under the null, there is no difference in either the intercept or slopes betweenfemales and males).
This is an example of a joint hypothesis test involving three restrictions: γ1 = 0,γ2 = 0, and γ3 = 0.
If all the interaction terms and the group lower order term are close to zero, thenwe fail to reject the null hypothesis of no gender difference.
F tests allows us to to test joint hypothesis
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 45 / 140
![Page 207: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/207.jpg)
F Test for Joint Significance of Coefficients
In research we often want to test a joint hypothesis which involves multiple linearrestrictions (e.g. β1 = β2 = β3 = 0)
Suppose our regression model is:
Voted = β0 + γ1FEMALE + β1EDUCATION+
γ2(FEMALE · EDUCATION) + β2AGE + γ3(FEMALE · AGE) + u
and we want to testH0 : γ1 = γ2 = γ3 = 0.
Substantively, what question are we asking?
→ Do females and males vote systematically differently from each other?(Under the null, there is no difference in either the intercept or slopes betweenfemales and males).
This is an example of a joint hypothesis test involving three restrictions: γ1 = 0,γ2 = 0, and γ3 = 0.
If all the interaction terms and the group lower order term are close to zero, thenwe fail to reject the null hypothesis of no gender difference.
F tests allows us to to test joint hypothesis
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 45 / 140
![Page 208: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/208.jpg)
F Test for Joint Significance of Coefficients
In research we often want to test a joint hypothesis which involves multiple linearrestrictions (e.g. β1 = β2 = β3 = 0)
Suppose our regression model is:
Voted = β0 + γ1FEMALE + β1EDUCATION+
γ2(FEMALE · EDUCATION) + β2AGE + γ3(FEMALE · AGE) + u
and we want to testH0 : γ1 = γ2 = γ3 = 0.
Substantively, what question are we asking?
→ Do females and males vote systematically differently from each other?(Under the null, there is no difference in either the intercept or slopes betweenfemales and males).
This is an example of a joint hypothesis test involving three restrictions: γ1 = 0,γ2 = 0, and γ3 = 0.
If all the interaction terms and the group lower order term are close to zero, thenwe fail to reject the null hypothesis of no gender difference.
F tests allows us to to test joint hypothesis
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 45 / 140
![Page 209: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/209.jpg)
F Test for Joint Significance of Coefficients
In research we often want to test a joint hypothesis which involves multiple linearrestrictions (e.g. β1 = β2 = β3 = 0)
Suppose our regression model is:
Voted = β0 + γ1FEMALE + β1EDUCATION+
γ2(FEMALE · EDUCATION) + β2AGE + γ3(FEMALE · AGE) + u
and we want to testH0 : γ1 = γ2 = γ3 = 0.
Substantively, what question are we asking?
→ Do females and males vote systematically differently from each other?(Under the null, there is no difference in either the intercept or slopes betweenfemales and males).
This is an example of a joint hypothesis test involving three restrictions: γ1 = 0,γ2 = 0, and γ3 = 0.
If all the interaction terms and the group lower order term are close to zero, thenwe fail to reject the null hypothesis of no gender difference.
F tests allows us to to test joint hypothesis
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 45 / 140
![Page 210: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/210.jpg)
The χ2 Distribution
To test more than one hypothesis jointly we need to introduce some newprobability distributions.
Suppose Z1, ...,Zn are n i.i.d. random variables following N (0, 1).
Then, the sum of their squares, X =∑n
i=1 Z2i , is distributed according to the χ2
distribution with n degrees of freedom, X ∼ χ2n.
0 10 20 30 40
0.0
0.1
0.2
0.3
0.4
0.5
x
Den
sity
chisquare 1chisquare 4chisquare 15
Properties: X > 0, E [X ] = n and V [X ] = 2n. In R: dchisq(), pchisq(), rchisq()
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 46 / 140
![Page 211: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/211.jpg)
The χ2 Distribution
To test more than one hypothesis jointly we need to introduce some newprobability distributions.
Suppose Z1, ...,Zn are n i.i.d. random variables following N (0, 1).
Then, the sum of their squares, X =∑n
i=1 Z2i , is distributed according to the χ2
distribution with n degrees of freedom, X ∼ χ2n.
0 10 20 30 40
0.0
0.1
0.2
0.3
0.4
0.5
x
Den
sity
chisquare 1chisquare 4chisquare 15
Properties: X > 0, E [X ] = n and V [X ] = 2n. In R: dchisq(), pchisq(), rchisq()
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 46 / 140
![Page 212: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/212.jpg)
The χ2 Distribution
To test more than one hypothesis jointly we need to introduce some newprobability distributions.
Suppose Z1, ...,Zn are n i.i.d. random variables following N (0, 1).
Then, the sum of their squares, X =∑n
i=1 Z2i , is distributed according to the χ2
distribution with n degrees of freedom, X ∼ χ2n.
0 10 20 30 40
0.0
0.1
0.2
0.3
0.4
0.5
x
Den
sity
chisquare 1chisquare 4chisquare 15
Properties: X > 0, E [X ] = n and V [X ] = 2n. In R: dchisq(), pchisq(), rchisq()
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 46 / 140
![Page 213: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/213.jpg)
The χ2 Distribution
To test more than one hypothesis jointly we need to introduce some newprobability distributions.
Suppose Z1, ...,Zn are n i.i.d. random variables following N (0, 1).
Then, the sum of their squares, X =∑n
i=1 Z2i , is distributed according to the χ2
distribution with n degrees of freedom, X ∼ χ2n.
0 10 20 30 40
0.0
0.1
0.2
0.3
0.4
0.5
x
Den
sity
chisquare 1chisquare 4chisquare 15
Properties: X > 0, E [X ] = n and V [X ] = 2n. In R: dchisq(), pchisq(), rchisq()
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 46 / 140
![Page 214: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/214.jpg)
The χ2 Distribution
To test more than one hypothesis jointly we need to introduce some newprobability distributions.
Suppose Z1, ...,Zn are n i.i.d. random variables following N (0, 1).
Then, the sum of their squares, X =∑n
i=1 Z2i , is distributed according to the χ2
distribution with n degrees of freedom, X ∼ χ2n.
0 10 20 30 40
0.0
0.1
0.2
0.3
0.4
0.5
x
Den
sity
chisquare 1chisquare 4chisquare 15
Properties: X > 0, E [X ] = n and V [X ] = 2n. In R: dchisq(), pchisq(), rchisq()
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 46 / 140
![Page 215: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/215.jpg)
The F distribution
The F distribution arises as a ratio of two independent chi-squared distributed randomvariables:
F =X1/df1X2/df2
∼ Fdf1,df2
where X1 ∼ χ2df1
, X2 ∼ χ2df2
, and X1⊥⊥X2.
df1 and df2 are called the numerator degrees of freedom and the denominator degrees offreedom.
0.0 0.5 1.0 1.5 2.0
01
23
4
x
Den
sity
F 1,2F 5,5F 30, 20F 500, 200
In R: df(), pf(), rf()
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 47 / 140
![Page 216: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/216.jpg)
The F distributionThe F distribution arises as a ratio of two independent chi-squared distributed randomvariables:
F =X1/df1X2/df2
∼ Fdf1,df2
where X1 ∼ χ2df1
, X2 ∼ χ2df2
, and X1⊥⊥X2.
df1 and df2 are called the numerator degrees of freedom and the denominator degrees offreedom.
0.0 0.5 1.0 1.5 2.0
01
23
4
x
Den
sity
F 1,2F 5,5F 30, 20F 500, 200
In R: df(), pf(), rf()
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 47 / 140
![Page 217: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/217.jpg)
The F distributionThe F distribution arises as a ratio of two independent chi-squared distributed randomvariables:
F =X1/df1X2/df2
∼ Fdf1,df2
where X1 ∼ χ2df1
, X2 ∼ χ2df2
, and X1⊥⊥X2.
df1 and df2 are called the numerator degrees of freedom and the denominator degrees offreedom.
0.0 0.5 1.0 1.5 2.0
01
23
4
x
Den
sity
F 1,2F 5,5F 30, 20F 500, 200
In R: df(), pf(), rf()
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 47 / 140
![Page 218: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/218.jpg)
The F distributionThe F distribution arises as a ratio of two independent chi-squared distributed randomvariables:
F =X1/df1X2/df2
∼ Fdf1,df2
where X1 ∼ χ2df1
, X2 ∼ χ2df2
, and X1⊥⊥X2.
df1 and df2 are called the numerator degrees of freedom and the denominator degrees offreedom.
0.0 0.5 1.0 1.5 2.0
01
23
4
x
Den
sity
F 1,2F 5,5F 30, 20F 500, 200
In R: df(), pf(), rf()
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 47 / 140
![Page 219: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/219.jpg)
F Test against H0 : γ1 = γ2 = γ3 = 0.
The F statistic can be calculated by the following procedure:
1 Fit the Unrestricted Model (UR) which does not impose H0:
Vote = β0 +γ1FEM +β1EDUC +γ2(FEM ∗EDUC) +β2AGE +γ3(FEM ∗AGE) +u
2 Fit the Restricted Model (R) which does impose H0:
Vote = β0 + β1EDUC + β2AGE + u
3 From the two results, compute the F Statistic:
F0 =(SSRr − SSRur )/q
SSRur/(n − k − 1)
where SSR=sum of squared residuals, q=number of restrictions, k=number ofpredictors in the unrestricted model, and n= # of observations.
Intuition:
increase in prediction error
original prediction error
The F statistics have the following sampling distributions:
Under Assumptions 1–6, F0 ∼ Fq,n−k−1 regardless of the sample size.
Under Assumptions 1–5, qF0a.∼ χ2
q as n→∞ (see next section).
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 48 / 140
![Page 220: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/220.jpg)
F Test against H0 : γ1 = γ2 = γ3 = 0.The F statistic can be calculated by the following procedure:
1 Fit the Unrestricted Model (UR) which does not impose H0:
Vote = β0 +γ1FEM +β1EDUC +γ2(FEM ∗EDUC) +β2AGE +γ3(FEM ∗AGE) +u
2 Fit the Restricted Model (R) which does impose H0:
Vote = β0 + β1EDUC + β2AGE + u
3 From the two results, compute the F Statistic:
F0 =(SSRr − SSRur )/q
SSRur/(n − k − 1)
where SSR=sum of squared residuals, q=number of restrictions, k=number ofpredictors in the unrestricted model, and n= # of observations.
Intuition:
increase in prediction error
original prediction error
The F statistics have the following sampling distributions:
Under Assumptions 1–6, F0 ∼ Fq,n−k−1 regardless of the sample size.
Under Assumptions 1–5, qF0a.∼ χ2
q as n→∞ (see next section).
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 48 / 140
![Page 221: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/221.jpg)
F Test against H0 : γ1 = γ2 = γ3 = 0.The F statistic can be calculated by the following procedure:
1 Fit the Unrestricted Model (UR) which does not impose H0:
Vote = β0 +γ1FEM +β1EDUC +γ2(FEM ∗EDUC) +β2AGE +γ3(FEM ∗AGE) +u
2 Fit the Restricted Model (R) which does impose H0:
Vote = β0 + β1EDUC + β2AGE + u
3 From the two results, compute the F Statistic:
F0 =(SSRr − SSRur )/q
SSRur/(n − k − 1)
where SSR=sum of squared residuals, q=number of restrictions, k=number ofpredictors in the unrestricted model, and n= # of observations.
Intuition:
increase in prediction error
original prediction error
The F statistics have the following sampling distributions:
Under Assumptions 1–6, F0 ∼ Fq,n−k−1 regardless of the sample size.
Under Assumptions 1–5, qF0a.∼ χ2
q as n→∞ (see next section).
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 48 / 140
![Page 222: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/222.jpg)
F Test against H0 : γ1 = γ2 = γ3 = 0.The F statistic can be calculated by the following procedure:
1 Fit the Unrestricted Model (UR) which does not impose H0:
Vote = β0 +γ1FEM +β1EDUC +γ2(FEM ∗EDUC) +β2AGE +γ3(FEM ∗AGE) +u
2 Fit the Restricted Model (R) which does impose H0:
Vote = β0 + β1EDUC + β2AGE + u
3 From the two results, compute the F Statistic:
F0 =(SSRr − SSRur )/q
SSRur/(n − k − 1)
where SSR=sum of squared residuals, q=number of restrictions, k=number ofpredictors in the unrestricted model, and n= # of observations.
Intuition:
increase in prediction error
original prediction error
The F statistics have the following sampling distributions:
Under Assumptions 1–6, F0 ∼ Fq,n−k−1 regardless of the sample size.
Under Assumptions 1–5, qF0a.∼ χ2
q as n→∞ (see next section).
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 48 / 140
![Page 223: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/223.jpg)
F Test against H0 : γ1 = γ2 = γ3 = 0.The F statistic can be calculated by the following procedure:
1 Fit the Unrestricted Model (UR) which does not impose H0:
Vote = β0 +γ1FEM +β1EDUC +γ2(FEM ∗EDUC) +β2AGE +γ3(FEM ∗AGE) +u
2 Fit the Restricted Model (R) which does impose H0:
Vote = β0 + β1EDUC + β2AGE + u
3 From the two results, compute the F Statistic:
F0 =(SSRr − SSRur )/q
SSRur/(n − k − 1)
where SSR=sum of squared residuals, q=number of restrictions, k=number ofpredictors in the unrestricted model, and n= # of observations.
Intuition:
increase in prediction error
original prediction error
The F statistics have the following sampling distributions:
Under Assumptions 1–6, F0 ∼ Fq,n−k−1 regardless of the sample size.
Under Assumptions 1–5, qF0a.∼ χ2
q as n→∞ (see next section).
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 48 / 140
![Page 224: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/224.jpg)
F Test against H0 : γ1 = γ2 = γ3 = 0.The F statistic can be calculated by the following procedure:
1 Fit the Unrestricted Model (UR) which does not impose H0:
Vote = β0 +γ1FEM +β1EDUC +γ2(FEM ∗EDUC) +β2AGE +γ3(FEM ∗AGE) +u
2 Fit the Restricted Model (R) which does impose H0:
Vote = β0 + β1EDUC + β2AGE + u
3 From the two results, compute the F Statistic:
F0 =(SSRr − SSRur )/q
SSRur/(n − k − 1)
where SSR=sum of squared residuals, q=number of restrictions, k=number ofpredictors in the unrestricted model, and n= # of observations.
Intuition:
increase in prediction error
original prediction error
The F statistics have the following sampling distributions:
Under Assumptions 1–6, F0 ∼ Fq,n−k−1 regardless of the sample size.
Under Assumptions 1–5, qF0a.∼ χ2
q as n→∞ (see next section).
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 48 / 140
![Page 225: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/225.jpg)
F Test against H0 : γ1 = γ2 = γ3 = 0.The F statistic can be calculated by the following procedure:
1 Fit the Unrestricted Model (UR) which does not impose H0:
Vote = β0 +γ1FEM +β1EDUC +γ2(FEM ∗EDUC) +β2AGE +γ3(FEM ∗AGE) +u
2 Fit the Restricted Model (R) which does impose H0:
Vote = β0 + β1EDUC + β2AGE + u
3 From the two results, compute the F Statistic:
F0 =(SSRr − SSRur )/q
SSRur/(n − k − 1)
where SSR=sum of squared residuals, q=number of restrictions, k=number ofpredictors in the unrestricted model, and n= # of observations.
Intuition:
increase in prediction error
original prediction error
The F statistics have the following sampling distributions:
Under Assumptions 1–6, F0 ∼ Fq,n−k−1 regardless of the sample size.
Under Assumptions 1–5, qF0a.∼ χ2
q as n→∞ (see next section).
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 48 / 140
![Page 226: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/226.jpg)
F Test against H0 : γ1 = γ2 = γ3 = 0.The F statistic can be calculated by the following procedure:
1 Fit the Unrestricted Model (UR) which does not impose H0:
Vote = β0 +γ1FEM +β1EDUC +γ2(FEM ∗EDUC) +β2AGE +γ3(FEM ∗AGE) +u
2 Fit the Restricted Model (R) which does impose H0:
Vote = β0 + β1EDUC + β2AGE + u
3 From the two results, compute the F Statistic:
F0 =(SSRr − SSRur )/q
SSRur/(n − k − 1)
where SSR=sum of squared residuals, q=number of restrictions, k=number ofpredictors in the unrestricted model, and n= # of observations.
Intuition:
increase in prediction error
original prediction error
The F statistics have the following sampling distributions:
Under Assumptions 1–6, F0 ∼ Fq,n−k−1 regardless of the sample size.
Under Assumptions 1–5, qF0a.∼ χ2
q as n→∞ (see next section).
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 48 / 140
![Page 227: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/227.jpg)
F Test against H0 : γ1 = γ2 = γ3 = 0.The F statistic can be calculated by the following procedure:
1 Fit the Unrestricted Model (UR) which does not impose H0:
Vote = β0 +γ1FEM +β1EDUC +γ2(FEM ∗EDUC) +β2AGE +γ3(FEM ∗AGE) +u
2 Fit the Restricted Model (R) which does impose H0:
Vote = β0 + β1EDUC + β2AGE + u
3 From the two results, compute the F Statistic:
F0 =(SSRr − SSRur )/q
SSRur/(n − k − 1)
where SSR=sum of squared residuals, q=number of restrictions, k=number ofpredictors in the unrestricted model, and n= # of observations.
Intuition:
increase in prediction error
original prediction error
The F statistics have the following sampling distributions:
Under Assumptions 1–6, F0 ∼ Fq,n−k−1 regardless of the sample size.
Under Assumptions 1–5, qF0a.∼ χ2
q as n→∞ (see next section).
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 48 / 140
![Page 228: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/228.jpg)
Unrestricted Model (UR)
R Code> fit.UR <- lm(vote1 ~ fem + educ + age + fem:age + fem:educ, data = Chile)
> summary(fit.UR)
~~~~~
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.293130 0.069242 4.233 2.42e-05 ***
fem 0.368975 0.098883 3.731 0.000197 ***
educ -0.038571 0.019578 -1.970 0.048988 *
age 0.005482 0.001114 4.921 9.44e-07 ***
fem:age -0.003779 0.001673 -2.259 0.024010 *
fem:educ -0.044484 0.027697 -1.606 0.108431
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 0.487 on 1697 degrees of freedom
Multiple R-squared: 0.05451, Adjusted R-squared: 0.05172
F-statistic: 19.57 on 5 and 1697 DF, p-value: < 2.2e-16
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 49 / 140
![Page 229: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/229.jpg)
Restricted Model (R)
R Code> fit.R <- lm(vote1 ~ educ + age, data = Chile)
> summary(fit.R)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.4878039 0.0497550 9.804 < 2e-16 ***
educ -0.0662022 0.0139615 -4.742 2.30e-06 ***
age 0.0035783 0.0008385 4.267 2.09e-05 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 0.4921 on 1700 degrees of freedom
Multiple R-squared: 0.03275, Adjusted R-squared: 0.03161
F-statistic: 28.78 on 2 and 1700 DF, p-value: 5.097e-13
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 50 / 140
![Page 230: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/230.jpg)
F Test in R
R Code> SSR.UR <- sum(resid(fit.UR)^2) # = 402
> SSR.R <- sum(resid(fit.R)^2) # = 411
> DFdenom <- df.residual(fit.UR) # = 1703
> DFnum <- 3
> F <- ((SSR.R - SSR.UR)/DFnum) / (SSR.UR/DFdenom)
> F
[1] 13.01581
> qf(0.99, DFnum, DFdenom)
[1] 3.793171
Given above, what do we conclude?
F0 = 13 is greater than the critical value for a .01 level test. So we rejectthe null hypothesis.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 51 / 140
![Page 231: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/231.jpg)
F Test in R
R Code> SSR.UR <- sum(resid(fit.UR)^2) # = 402
> SSR.R <- sum(resid(fit.R)^2) # = 411
> DFdenom <- df.residual(fit.UR) # = 1703
> DFnum <- 3
> F <- ((SSR.R - SSR.UR)/DFnum) / (SSR.UR/DFdenom)
> F
[1] 13.01581
> qf(0.99, DFnum, DFdenom)
[1] 3.793171
Given above, what do we conclude?F0 = 13 is greater than the critical value for a .01 level test. So we rejectthe null hypothesis.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 51 / 140
![Page 232: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/232.jpg)
Null Distribution, Critical Value, and Test StatisticNote that the F statistic is always positive, so we only look at the right tail of thereference F (or χ2 in a large sample) distribution.
0 5 10 15
0.0
0.2
0.4
0.6
value
dens
ity o
f F(3
,170
3−5−
1)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 52 / 140
![Page 233: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/233.jpg)
F Test Examples I
The F test can be used to test various joint hypotheses which involve multiplelinear restrictions. Consider the regression model:
Y = β0 + β1X1 + β2X2 + β3X3 + ...+ βkXk + u
We may want to test:H0 : β1 = β2 = ... = βk = 0
What question are we asking?
→ Does any of the X variables help to predict Y ?
This is called the omnibus test and is routinely reported by statisticalsoftware.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 53 / 140
![Page 234: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/234.jpg)
F Test Examples I
The F test can be used to test various joint hypotheses which involve multiplelinear restrictions.
Consider the regression model:
Y = β0 + β1X1 + β2X2 + β3X3 + ...+ βkXk + u
We may want to test:H0 : β1 = β2 = ... = βk = 0
What question are we asking?
→ Does any of the X variables help to predict Y ?
This is called the omnibus test and is routinely reported by statisticalsoftware.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 53 / 140
![Page 235: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/235.jpg)
F Test Examples I
The F test can be used to test various joint hypotheses which involve multiplelinear restrictions. Consider the regression model:
Y = β0 + β1X1 + β2X2 + β3X3 + ...+ βkXk + u
We may want to test:H0 : β1 = β2 = ... = βk = 0
What question are we asking?
→ Does any of the X variables help to predict Y ?
This is called the omnibus test and is routinely reported by statisticalsoftware.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 53 / 140
![Page 236: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/236.jpg)
F Test Examples I
The F test can be used to test various joint hypotheses which involve multiplelinear restrictions. Consider the regression model:
Y = β0 + β1X1 + β2X2 + β3X3 + ...+ βkXk + u
We may want to test:H0 : β1 = β2 = ... = βk = 0
What question are we asking?
→ Does any of the X variables help to predict Y ?
This is called the omnibus test and is routinely reported by statisticalsoftware.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 53 / 140
![Page 237: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/237.jpg)
F Test Examples I
The F test can be used to test various joint hypotheses which involve multiplelinear restrictions. Consider the regression model:
Y = β0 + β1X1 + β2X2 + β3X3 + ...+ βkXk + u
We may want to test:H0 : β1 = β2 = ... = βk = 0
What question are we asking?
→ Does any of the X variables help to predict Y ?
This is called the omnibus test and is routinely reported by statisticalsoftware.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 53 / 140
![Page 238: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/238.jpg)
F Test Examples I
The F test can be used to test various joint hypotheses which involve multiplelinear restrictions. Consider the regression model:
Y = β0 + β1X1 + β2X2 + β3X3 + ...+ βkXk + u
We may want to test:H0 : β1 = β2 = ... = βk = 0
What question are we asking?
→ Does any of the X variables help to predict Y ?
This is called the omnibus test and is routinely reported by statisticalsoftware.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 53 / 140
![Page 239: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/239.jpg)
F Test Examples I
The F test can be used to test various joint hypotheses which involve multiplelinear restrictions. Consider the regression model:
Y = β0 + β1X1 + β2X2 + β3X3 + ...+ βkXk + u
We may want to test:H0 : β1 = β2 = ... = βk = 0
What question are we asking?
→ Does any of the X variables help to predict Y ?
This is called the omnibus test and is routinely reported by statisticalsoftware.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 53 / 140
![Page 240: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/240.jpg)
Omnibus Test in R
R Code> summary(fit.UR)
~~~~~
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.293130 0.069242 4.233 2.42e-05 ***
fem 0.368975 0.098883 3.731 0.000197 ***
educ -0.038571 0.019578 -1.970 0.048988 *
age 0.005482 0.001114 4.921 9.44e-07 ***
fem:age -0.003779 0.001673 -2.259 0.024010 *
fem:educ -0.044484 0.027697 -1.606 0.108431
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 0.487 on 1697 degrees of freedom
Multiple R-squared: 0.05451, Adjusted R-squared: 0.05172
F-statistic: 19.57 on 5 and 1697 DF, p-value: < 2.2e-16
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 54 / 140
![Page 241: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/241.jpg)
Omnibus Test in R with Random NoiseR Code
> set.seed(08540)
> p <- 10; x <- matrix(rnorm(p*1000), nrow=1000)
> y <- rnorm(1000); summary(lm(y~x))
~~~~~
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.0115475 0.0320874 -0.360 0.7190
x1 -0.0019803 0.0333524 -0.059 0.9527
x2 0.0666275 0.0314087 2.121 0.0341 *
x3 -0.0008594 0.0321270 -0.027 0.9787
x4 0.0051185 0.0333678 0.153 0.8781
x5 0.0136656 0.0322592 0.424 0.6719
x6 0.0102115 0.0332045 0.308 0.7585
x7 -0.0103903 0.0307639 -0.338 0.7356
x8 -0.0401722 0.0318317 -1.262 0.2072
x9 0.0553019 0.0315548 1.753 0.0800 .
x10 0.0410906 0.0319742 1.285 0.1991
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 1.011 on 989 degrees of freedom
Multiple R-squared: 0.01129, Adjusted R-squared: 0.001294
F-statistic: 1.129 on 10 and 989 DF, p-value: 0.3364
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 55 / 140
![Page 242: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/242.jpg)
F Test Examples II
The F test can be used to test various joint hypotheses which involve multiplelinear restrictions. Consider the regression model:
Y = β0 + β1X1 + β2X2 + β3X3 + ...+ βkXk + u
Next, let’s consider:H0 : β1 = β2 = β3
What question are we asking?
→ Are the coefficients X1, X2 and X3 different from each other?
How many restrictions?
→ Two (β1 − β2 = 0 and β2 − β3 = 0)
How do we fit the restricted model?
→ The null hypothesis implies that the model can be written as:
Y = β0 + β1(X1 + X2 + X3) + ...+ βkXk + u
So we create a new variable X ∗ = X1 + X2 + X3 and fit:
Y = β0 + β1X∗ + ...+ βkXk + u
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 56 / 140
![Page 243: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/243.jpg)
F Test Examples IIThe F test can be used to test various joint hypotheses which involve multiplelinear restrictions. Consider the regression model:
Y = β0 + β1X1 + β2X2 + β3X3 + ...+ βkXk + u
Next, let’s consider:H0 : β1 = β2 = β3
What question are we asking?
→ Are the coefficients X1, X2 and X3 different from each other?
How many restrictions?
→ Two (β1 − β2 = 0 and β2 − β3 = 0)
How do we fit the restricted model?
→ The null hypothesis implies that the model can be written as:
Y = β0 + β1(X1 + X2 + X3) + ...+ βkXk + u
So we create a new variable X ∗ = X1 + X2 + X3 and fit:
Y = β0 + β1X∗ + ...+ βkXk + u
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 56 / 140
![Page 244: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/244.jpg)
F Test Examples IIThe F test can be used to test various joint hypotheses which involve multiplelinear restrictions. Consider the regression model:
Y = β0 + β1X1 + β2X2 + β3X3 + ...+ βkXk + u
Next, let’s consider:H0 : β1 = β2 = β3
What question are we asking?
→ Are the coefficients X1, X2 and X3 different from each other?
How many restrictions?
→ Two (β1 − β2 = 0 and β2 − β3 = 0)
How do we fit the restricted model?
→ The null hypothesis implies that the model can be written as:
Y = β0 + β1(X1 + X2 + X3) + ...+ βkXk + u
So we create a new variable X ∗ = X1 + X2 + X3 and fit:
Y = β0 + β1X∗ + ...+ βkXk + u
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 56 / 140
![Page 245: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/245.jpg)
F Test Examples IIThe F test can be used to test various joint hypotheses which involve multiplelinear restrictions. Consider the regression model:
Y = β0 + β1X1 + β2X2 + β3X3 + ...+ βkXk + u
Next, let’s consider:H0 : β1 = β2 = β3
What question are we asking?
→ Are the coefficients X1, X2 and X3 different from each other?
How many restrictions?
→ Two (β1 − β2 = 0 and β2 − β3 = 0)
How do we fit the restricted model?
→ The null hypothesis implies that the model can be written as:
Y = β0 + β1(X1 + X2 + X3) + ...+ βkXk + u
So we create a new variable X ∗ = X1 + X2 + X3 and fit:
Y = β0 + β1X∗ + ...+ βkXk + u
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 56 / 140
![Page 246: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/246.jpg)
F Test Examples IIThe F test can be used to test various joint hypotheses which involve multiplelinear restrictions. Consider the regression model:
Y = β0 + β1X1 + β2X2 + β3X3 + ...+ βkXk + u
Next, let’s consider:H0 : β1 = β2 = β3
What question are we asking?
→ Are the coefficients X1, X2 and X3 different from each other?
How many restrictions?
→ Two (β1 − β2 = 0 and β2 − β3 = 0)
How do we fit the restricted model?
→ The null hypothesis implies that the model can be written as:
Y = β0 + β1(X1 + X2 + X3) + ...+ βkXk + u
So we create a new variable X ∗ = X1 + X2 + X3 and fit:
Y = β0 + β1X∗ + ...+ βkXk + u
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 56 / 140
![Page 247: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/247.jpg)
F Test Examples IIThe F test can be used to test various joint hypotheses which involve multiplelinear restrictions. Consider the regression model:
Y = β0 + β1X1 + β2X2 + β3X3 + ...+ βkXk + u
Next, let’s consider:H0 : β1 = β2 = β3
What question are we asking?
→ Are the coefficients X1, X2 and X3 different from each other?
How many restrictions?
→ Two (β1 − β2 = 0 and β2 − β3 = 0)
How do we fit the restricted model?
→ The null hypothesis implies that the model can be written as:
Y = β0 + β1(X1 + X2 + X3) + ...+ βkXk + u
So we create a new variable X ∗ = X1 + X2 + X3 and fit:
Y = β0 + β1X∗ + ...+ βkXk + u
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 56 / 140
![Page 248: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/248.jpg)
F Test Examples IIThe F test can be used to test various joint hypotheses which involve multiplelinear restrictions. Consider the regression model:
Y = β0 + β1X1 + β2X2 + β3X3 + ...+ βkXk + u
Next, let’s consider:H0 : β1 = β2 = β3
What question are we asking?
→ Are the coefficients X1, X2 and X3 different from each other?
How many restrictions?
→ Two (β1 − β2 = 0 and β2 − β3 = 0)
How do we fit the restricted model?
→ The null hypothesis implies that the model can be written as:
Y = β0 + β1(X1 + X2 + X3) + ...+ βkXk + u
So we create a new variable X ∗ = X1 + X2 + X3 and fit:
Y = β0 + β1X∗ + ...+ βkXk + u
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 56 / 140
![Page 249: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/249.jpg)
F Test Examples IIThe F test can be used to test various joint hypotheses which involve multiplelinear restrictions. Consider the regression model:
Y = β0 + β1X1 + β2X2 + β3X3 + ...+ βkXk + u
Next, let’s consider:H0 : β1 = β2 = β3
What question are we asking?
→ Are the coefficients X1, X2 and X3 different from each other?
How many restrictions?
→ Two (β1 − β2 = 0 and β2 − β3 = 0)
How do we fit the restricted model?
→ The null hypothesis implies that the model can be written as:
Y = β0 + β1(X1 + X2 + X3) + ...+ βkXk + u
So we create a new variable X ∗ = X1 + X2 + X3 and fit:
Y = β0 + β1X∗ + ...+ βkXk + u
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 56 / 140
![Page 250: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/250.jpg)
Testing Equality of Coefficients in R
R Code> fit.UR2 <- lm(REALGDPCAP ~ Asia + LatAmerica + Transit + Oecd, data = D)
> summary(fit.UR2)
~~~~~
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1899.9 914.9 2.077 0.0410 *
Asia 2701.7 1243.0 2.173 0.0327 *
LatAmerica 2281.5 1112.3 2.051 0.0435 *
Transit 2552.8 1204.5 2.119 0.0372 *
Oecd 12224.2 1112.3 10.990 <2e-16 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 3034 on 80 degrees of freedom
Multiple R-squared: 0.7096, Adjusted R-squared: 0.6951
F-statistic: 48.88 on 4 and 80 DF, p-value: < 2.2e-16
Are the coefficients on Asia, LatAmerica and Transit statisticallysignificantly different?
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 57 / 140
![Page 251: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/251.jpg)
Testing Equality of Coefficients in RR Code
> D$Xstar <- D$Asia + D$LatAmerica + D$Transit
> fit.R2 <- lm(REALGDPCAP ~ Xstar + Oecd, data = D)
> SSR.UR2 <- sum(resid(fit.UR2)^2)
> SSR.R2 <- sum(resid(fit.R2)^2)
> DFdenom <- df.residual(fit.UR2)
> F <- ((SSR.R2 - SSR.UR2)/2) / (SSR.UR2/DFdenom)
> F
[1] 0.08786129
> pf(F, 2, DFdenom, lower.tail = F)
[1] 0.9159762
So, what do we conclude?
The three coefficients are statistically indistinguishable from each other,with the p-value of 0.916.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 58 / 140
![Page 252: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/252.jpg)
Testing Equality of Coefficients in RR Code
> D$Xstar <- D$Asia + D$LatAmerica + D$Transit
> fit.R2 <- lm(REALGDPCAP ~ Xstar + Oecd, data = D)
> SSR.UR2 <- sum(resid(fit.UR2)^2)
> SSR.R2 <- sum(resid(fit.R2)^2)
> DFdenom <- df.residual(fit.UR2)
> F <- ((SSR.R2 - SSR.UR2)/2) / (SSR.UR2/DFdenom)
> F
[1] 0.08786129
> pf(F, 2, DFdenom, lower.tail = F)
[1] 0.9159762
So, what do we conclude?The three coefficients are statistically indistinguishable from each other,with the p-value of 0.916.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 58 / 140
![Page 253: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/253.jpg)
t Test vs. F TestConsider the hypothesis test of
H0 : β1 = β2 vs. H1 : β1 6= β2
What ways have we learned to conduct this test?
Option 1: Compute T = (β1 − β2)/SE (β1 − β2) and do the t test.
Option 2: Create X ∗ = X1 + X2, fit the restricted model, computeF = (SSRR − SSRUR)
/(SSRR/(n − k − 1)) and do the F test.
It turns out these two tests give identical results. This is because
X ∼ tn−k−1 ⇐⇒ X 2 ∼ F1,n−k−1
So, for testing a single hypothesis it does not matter whether one does a ttest or an F test.
Usually, the t test is used for single hypotheses and the F test is used forjoint hypotheses.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 59 / 140
![Page 254: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/254.jpg)
t Test vs. F TestConsider the hypothesis test of
H0 : β1 = β2 vs. H1 : β1 6= β2
What ways have we learned to conduct this test?
Option 1: Compute T = (β1 − β2)/SE (β1 − β2) and do the t test.
Option 2: Create X ∗ = X1 + X2, fit the restricted model, computeF = (SSRR − SSRUR)
/(SSRR/(n − k − 1)) and do the F test.
It turns out these two tests give identical results. This is because
X ∼ tn−k−1 ⇐⇒ X 2 ∼ F1,n−k−1
So, for testing a single hypothesis it does not matter whether one does a ttest or an F test.
Usually, the t test is used for single hypotheses and the F test is used forjoint hypotheses.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 59 / 140
![Page 255: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/255.jpg)
t Test vs. F TestConsider the hypothesis test of
H0 : β1 = β2 vs. H1 : β1 6= β2
What ways have we learned to conduct this test?
Option 1: Compute T = (β1 − β2)/SE (β1 − β2) and do the t test.
Option 2: Create X ∗ = X1 + X2, fit the restricted model, computeF = (SSRR − SSRUR)
/(SSRR/(n − k − 1)) and do the F test.
It turns out these two tests give identical results. This is because
X ∼ tn−k−1 ⇐⇒ X 2 ∼ F1,n−k−1
So, for testing a single hypothesis it does not matter whether one does a ttest or an F test.
Usually, the t test is used for single hypotheses and the F test is used forjoint hypotheses.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 59 / 140
![Page 256: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/256.jpg)
t Test vs. F TestConsider the hypothesis test of
H0 : β1 = β2 vs. H1 : β1 6= β2
What ways have we learned to conduct this test?
Option 1: Compute T = (β1 − β2)/SE (β1 − β2) and do the t test.
Option 2: Create X ∗ = X1 + X2, fit the restricted model, computeF = (SSRR − SSRUR)
/(SSRR/(n − k − 1)) and do the F test.
It turns out these two tests give identical results. This is because
X ∼ tn−k−1 ⇐⇒ X 2 ∼ F1,n−k−1
So, for testing a single hypothesis it does not matter whether one does a ttest or an F test.
Usually, the t test is used for single hypotheses and the F test is used forjoint hypotheses.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 59 / 140
![Page 257: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/257.jpg)
t Test vs. F TestConsider the hypothesis test of
H0 : β1 = β2 vs. H1 : β1 6= β2
What ways have we learned to conduct this test?
Option 1: Compute T = (β1 − β2)/SE (β1 − β2) and do the t test.
Option 2: Create X ∗ = X1 + X2, fit the restricted model, computeF = (SSRR − SSRUR)
/(SSRR/(n − k − 1)) and do the F test.
It turns out these two tests give identical results. This is because
X ∼ tn−k−1 ⇐⇒ X 2 ∼ F1,n−k−1
So, for testing a single hypothesis it does not matter whether one does a ttest or an F test.
Usually, the t test is used for single hypotheses and the F test is used forjoint hypotheses.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 59 / 140
![Page 258: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/258.jpg)
t Test vs. F TestConsider the hypothesis test of
H0 : β1 = β2 vs. H1 : β1 6= β2
What ways have we learned to conduct this test?
Option 1: Compute T = (β1 − β2)/SE (β1 − β2) and do the t test.
Option 2: Create X ∗ = X1 + X2, fit the restricted model, computeF = (SSRR − SSRUR)
/(SSRR/(n − k − 1)) and do the F test.
It turns out these two tests give identical results. This is because
X ∼ tn−k−1 ⇐⇒ X 2 ∼ F1,n−k−1
So, for testing a single hypothesis it does not matter whether one does a ttest or an F test.
Usually, the t test is used for single hypotheses and the F test is used forjoint hypotheses.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 59 / 140
![Page 259: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/259.jpg)
Some More Notes on F Tests
The F-value can also be calculated from R2:
F =(R2
UR − R2R)/q
(1− R2UR)/(n − k − 1)
F tests only work for testing nested models, i.e. the restricted model mustbe a special case of the unrestricted model.
For example F tests cannot be used to test
Y = β0 + β1X1
+ β2X2
+ β3X3 + u
againstY = β0 + β1X1 + β2X2 +
β3X3
+ u
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 60 / 140
![Page 260: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/260.jpg)
Some More Notes on F Tests
The F-value can also be calculated from R2:
F =(R2
UR − R2R)/q
(1− R2UR)/(n − k − 1)
F tests only work for testing nested models, i.e. the restricted model mustbe a special case of the unrestricted model.
For example F tests cannot be used to test
Y = β0 + β1X1
+ β2X2
+ β3X3 + u
againstY = β0 + β1X1 + β2X2 +
β3X3
+ u
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 60 / 140
![Page 261: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/261.jpg)
Some More Notes on F Tests
Joint significance does not necessarily imply the significance of individualcoefficients, or vice versa:Finite-Sample Properties of OLS 45
Figure 1.5: t- versus F -Tests
An Example of a Test Statistic Whose Distribution Depends on XXXTo place the discussion of this section in a proper perspective, it may be usefulto note that there are some statistics whose conditional distribution depends on X.Consider the celebrated Durbin-Watson statistic:
!ni=2(ei ! ei!1)2!n
i=1 e2i.
The conditional distribution, and hence the critical values, of this statistic dependon X, but J. Durbin and G. S. Watson have shown that the critical values fallbetween two bounds (which depends on the sample size, the number of regres-sors, and whether the regressor includes a constant). Therefore, the critical valuesfor the unconditional distribution, too, fall between these bounds.
The statistic is designed for testing whether there is no serial correlation inthe error term. Thus, the null hypothesis is Assumption 1.4, while the maintainedhypothesis is the other assumptions of the classical regression model (including thestrict exogeneity assumption) and the normality assumption. But, as emphasizedin Section 1.1, the strict exogeneity assumption is not satisfied in time-series mod-els typically encountered in econometrics, and serial correlation is an issue thatarises only in time-series models. Thus, the Durbin-Watson statistic is not usefulin econometrics. More useful tests for serial correlation, which are all based onlarge-sample theory, will be covered in the next chapter.
Image Credit: Hayashi (2011) Econometrics
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 61 / 140
![Page 262: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/262.jpg)
1 Matrix Form of Regression
2 OLS inference in matrix form
3 Standard Hypothesis Tests
4 Testing Joint Significance
5 Testing Linear Hypotheses: The General Case
6 Fun With(out) Weights
7 Appendix: Derivations and Consistency
8 The Problems with p-values
9 Agnostic Regression
10 Inference via the Bootstrap
11 Fun With Weights
12 Appendix: Tricky p-value Example
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 62 / 140
![Page 263: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/263.jpg)
1 Matrix Form of Regression
2 OLS inference in matrix form
3 Standard Hypothesis Tests
4 Testing Joint Significance
5 Testing Linear Hypotheses: The General Case
6 Fun With(out) Weights
7 Appendix: Derivations and Consistency
8 The Problems with p-values
9 Agnostic Regression
10 Inference via the Bootstrap
11 Fun With Weights
12 Appendix: Tricky p-value Example
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 62 / 140
![Page 264: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/264.jpg)
Limitation of the F Formula
Consider the following null hypothesis:
H0 : β1 = β2 = β3 = 3
orH0 : β1 = 2β2 = 0.5β3 + 1
Can we test them using the F test?To compute the F value, we need to fit the restricted model. How?
Some restrictions are difficult to impose when fitting the model.
Even when we can, the procedure will be ad hoc and require somecreativity.
Is there a general solution?
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 63 / 140
![Page 265: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/265.jpg)
Limitation of the F Formula
Consider the following null hypothesis:
H0 : β1 = β2 = β3 = 3
orH0 : β1 = 2β2 = 0.5β3 + 1
Can we test them using the F test?
To compute the F value, we need to fit the restricted model. How?
Some restrictions are difficult to impose when fitting the model.
Even when we can, the procedure will be ad hoc and require somecreativity.
Is there a general solution?
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 63 / 140
![Page 266: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/266.jpg)
Limitation of the F Formula
Consider the following null hypothesis:
H0 : β1 = β2 = β3 = 3
orH0 : β1 = 2β2 = 0.5β3 + 1
Can we test them using the F test?To compute the F value, we need to fit the restricted model. How?
Some restrictions are difficult to impose when fitting the model.
Even when we can, the procedure will be ad hoc and require somecreativity.
Is there a general solution?
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 63 / 140
![Page 267: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/267.jpg)
Limitation of the F Formula
Consider the following null hypothesis:
H0 : β1 = β2 = β3 = 3
orH0 : β1 = 2β2 = 0.5β3 + 1
Can we test them using the F test?To compute the F value, we need to fit the restricted model. How?
Some restrictions are difficult to impose when fitting the model.
Even when we can, the procedure will be ad hoc and require somecreativity.
Is there a general solution?
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 63 / 140
![Page 268: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/268.jpg)
Limitation of the F Formula
Consider the following null hypothesis:
H0 : β1 = β2 = β3 = 3
orH0 : β1 = 2β2 = 0.5β3 + 1
Can we test them using the F test?To compute the F value, we need to fit the restricted model. How?
Some restrictions are difficult to impose when fitting the model.
Even when we can, the procedure will be ad hoc and require somecreativity.
Is there a general solution?
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 63 / 140
![Page 269: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/269.jpg)
Limitation of the F Formula
Consider the following null hypothesis:
H0 : β1 = β2 = β3 = 3
orH0 : β1 = 2β2 = 0.5β3 + 1
Can we test them using the F test?To compute the F value, we need to fit the restricted model. How?
Some restrictions are difficult to impose when fitting the model.
Even when we can, the procedure will be ad hoc and require somecreativity.
Is there a general solution?
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 63 / 140
![Page 270: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/270.jpg)
General Procedure for Testing Linear Hypotheses
Notice that any set of q linear hypotheses can be written as
Rβ = r
where
I R is a q × (k + 1) matrix of prespecified coefficients on β (hypothesismatrix)
I β = [β0 β1 · · · βk ]′
I r is a q × 1 vector of prespecified constants
Examples:
β1 = β2 = β3 = 3 ⇔
β1
β2
β3
=
333
⇔ 0 1 0 0
0 0 1 00 0 0 1
·β0
β1
β2
β3
=
333
β1 = 2β2 = 0.5β3+1 ⇔[
β1 − 2β2
β1 − 0.5β3
]=
[01
]⇔[
0 1 −2 00 1 0 −0.5
]·
β0
β1
β2
β3
=
[01
]
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 64 / 140
![Page 271: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/271.jpg)
General Procedure for Testing Linear Hypotheses
Notice that any set of q linear hypotheses can be written as
Rβ = r
where
I R is a q × (k + 1) matrix of prespecified coefficients on β (hypothesismatrix)
I β = [β0 β1 · · · βk ]′
I r is a q × 1 vector of prespecified constants
Examples:
β1 = β2 = β3 = 3 ⇔
β1
β2
β3
=
333
⇔ 0 1 0 0
0 0 1 00 0 0 1
·β0
β1
β2
β3
=
333
β1 = 2β2 = 0.5β3+1 ⇔[
β1 − 2β2
β1 − 0.5β3
]=
[01
]⇔[
0 1 −2 00 1 0 −0.5
]·
β0
β1
β2
β3
=
[01
]
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 64 / 140
![Page 272: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/272.jpg)
General Procedure for Testing Linear Hypotheses
Notice that any set of q linear hypotheses can be written as
Rβ = r
where
I R is a q × (k + 1) matrix of prespecified coefficients on β (hypothesismatrix)
I β = [β0 β1 · · · βk ]′
I r is a q × 1 vector of prespecified constants
Examples:
β1 = β2 = β3 = 3 ⇔
β1
β2
β3
=
333
⇔ 0 1 0 0
0 0 1 00 0 0 1
·β0
β1
β2
β3
=
333
β1 = 2β2 = 0.5β3+1 ⇔[
β1 − 2β2
β1 − 0.5β3
]=
[01
]⇔[
0 1 −2 00 1 0 −0.5
]·
β0
β1
β2
β3
=
[01
]
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 64 / 140
![Page 273: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/273.jpg)
General Procedure for Testing Linear Hypotheses
Notice that any set of q linear hypotheses can be written as
Rβ = r
where
I R is a q × (k + 1) matrix of prespecified coefficients on β (hypothesismatrix)
I β = [β0 β1 · · · βk ]′
I r is a q × 1 vector of prespecified constants
Examples:
β1 = β2 = β3 = 3 ⇔
β1
β2
β3
=
333
⇔ 0 1 0 0
0 0 1 00 0 0 1
·β0
β1
β2
β3
=
333
β1 = 2β2 = 0.5β3+1 ⇔[
β1 − 2β2
β1 − 0.5β3
]=
[01
]⇔[
0 1 −2 00 1 0 −0.5
]·
β0
β1
β2
β3
=
[01
]
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 64 / 140
![Page 274: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/274.jpg)
General Procedure for Testing Linear Hypotheses
Notice that any set of q linear hypotheses can be written as
Rβ = r
where
I R is a q × (k + 1) matrix of prespecified coefficients on β (hypothesismatrix)
I β = [β0 β1 · · · βk ]′
I r is a q × 1 vector of prespecified constants
Examples:
β1 = β2 = β3 = 3 ⇔
β1
β2
β3
=
333
⇔ 0 1 0 0
0 0 1 00 0 0 1
·β0
β1
β2
β3
=
333
β1 = 2β2 = 0.5β3+1 ⇔[
β1 − 2β2
β1 − 0.5β3
]=
[01
]⇔[
0 1 −2 00 1 0 −0.5
]·
β0
β1
β2
β3
=
[01
]
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 64 / 140
![Page 275: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/275.jpg)
General Procedure for Testing Linear Hypotheses
Notice that any set of q linear hypotheses can be written as
Rβ = r
where
I R is a q × (k + 1) matrix of prespecified coefficients on β (hypothesismatrix)
I β = [β0 β1 · · · βk ]′
I r is a q × 1 vector of prespecified constants
Examples:
β1 = β2 = β3 = 3 ⇔
β1
β2
β3
=
333
⇔ 0 1 0 0
0 0 1 00 0 0 1
·β0
β1
β2
β3
=
333
β1 = 2β2 = 0.5β3+1 ⇔[
β1 − 2β2
β1 − 0.5β3
]=
[01
]⇔[
0 1 −2 00 1 0 −0.5
]·
β0
β1
β2
β3
=
[01
]
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 64 / 140
![Page 276: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/276.jpg)
General Procedure for Testing Linear Hypotheses
Notice that any set of q linear hypotheses can be written as
Rβ = r
where
I R is a q × (k + 1) matrix of prespecified coefficients on β (hypothesismatrix)
I β = [β0 β1 · · · βk ]′
I r is a q × 1 vector of prespecified constants
Examples:
β1 = β2 = β3 = 3 ⇔
β1
β2
β3
=
333
⇔ 0 1 0 0
0 0 1 00 0 0 1
·β0
β1
β2
β3
=
333
β1 = 2β2 = 0.5β3+1 ⇔[
β1 − 2β2
β1 − 0.5β3
]=
[01
]⇔[
0 1 −2 00 1 0 −0.5
]·
β0
β1
β2
β3
=
[01
]
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 64 / 140
![Page 277: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/277.jpg)
General Procedure for Testing Linear Hypotheses
Notice that any set of q linear hypotheses can be written as
Rβ = r
where
I R is a q × (k + 1) matrix of prespecified coefficients on β (hypothesismatrix)
I β = [β0 β1 · · · βk ]′
I r is a q × 1 vector of prespecified constants
Examples:
β1 = β2 = β3 = 3 ⇔
β1
β2
β3
=
333
⇔ 0 1 0 0
0 0 1 00 0 0 1
·β0
β1
β2
β3
=
333
β1 = 2β2 = 0.5β3+1 ⇔[
β1 − 2β2
β1 − 0.5β3
]=
[01
]⇔[
0 1 −2 00 1 0 −0.5
]·
β0
β1
β2
β3
=
[01
]
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 64 / 140
![Page 278: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/278.jpg)
Wald Statistic
Let’s consider testing H0 : Rβ = r, a set of q linear restrictions.
If H0 is true, Rβ − r should be zero except for sampling variability.
To formally evaluate the statistical significance of the deviation from zero,we must transform Rβ− r to a statistic that can be compared to a referencedistribution.
It turns out that the following Wald statistic can be used:
W =(
Rβ − r)′·[σ2R(X′X)−1R′
]−1 ·(
Rβ − r)
Looks complicated? Let’s figure out why this makes sense:
I The first and last components give the sum of squares of thecomponents of Rβ − r. This summarizes its deviation from zero.
I The middle component is the variance of Rβ− r. This standardizes thesum of squares to have variance one.
We know β is approximately normal ⇒ Rβ − r should also be normal=⇒ W should therefore be ... χ2 distributed!
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 65 / 140
![Page 279: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/279.jpg)
Wald Statistic
Let’s consider testing H0 : Rβ = r, a set of q linear restrictions.
If H0 is true, Rβ − r should be zero except for sampling variability.
To formally evaluate the statistical significance of the deviation from zero,we must transform Rβ− r to a statistic that can be compared to a referencedistribution.
It turns out that the following Wald statistic can be used:
W =(
Rβ − r)′·[σ2R(X′X)−1R′
]−1 ·(
Rβ − r)
Looks complicated? Let’s figure out why this makes sense:
I The first and last components give the sum of squares of thecomponents of Rβ − r. This summarizes its deviation from zero.
I The middle component is the variance of Rβ− r. This standardizes thesum of squares to have variance one.
We know β is approximately normal ⇒ Rβ − r should also be normal=⇒ W should therefore be ... χ2 distributed!
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 65 / 140
![Page 280: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/280.jpg)
Wald Statistic
Let’s consider testing H0 : Rβ = r, a set of q linear restrictions.
If H0 is true, Rβ − r should be zero except for sampling variability.
To formally evaluate the statistical significance of the deviation from zero,we must transform Rβ− r to a statistic that can be compared to a referencedistribution.
It turns out that the following Wald statistic can be used:
W =(
Rβ − r)′·[σ2R(X′X)−1R′
]−1 ·(
Rβ − r)
Looks complicated? Let’s figure out why this makes sense:
I The first and last components give the sum of squares of thecomponents of Rβ − r. This summarizes its deviation from zero.
I The middle component is the variance of Rβ− r. This standardizes thesum of squares to have variance one.
We know β is approximately normal ⇒ Rβ − r should also be normal=⇒ W should therefore be ... χ2 distributed!
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 65 / 140
![Page 281: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/281.jpg)
Wald Statistic
Let’s consider testing H0 : Rβ = r, a set of q linear restrictions.
If H0 is true, Rβ − r should be zero except for sampling variability.
To formally evaluate the statistical significance of the deviation from zero,we must transform Rβ− r to a statistic that can be compared to a referencedistribution.
It turns out that the following Wald statistic can be used:
W =(
Rβ − r)′·[σ2R(X′X)−1R′
]−1 ·(
Rβ − r)
Looks complicated? Let’s figure out why this makes sense:
I The first and last components give the sum of squares of thecomponents of Rβ − r. This summarizes its deviation from zero.
I The middle component is the variance of Rβ− r. This standardizes thesum of squares to have variance one.
We know β is approximately normal ⇒ Rβ − r should also be normal=⇒ W should therefore be ... χ2 distributed!
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 65 / 140
![Page 282: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/282.jpg)
Wald Statistic
Let’s consider testing H0 : Rβ = r, a set of q linear restrictions.
If H0 is true, Rβ − r should be zero except for sampling variability.
To formally evaluate the statistical significance of the deviation from zero,we must transform Rβ− r to a statistic that can be compared to a referencedistribution.
It turns out that the following Wald statistic can be used:
W =(
Rβ − r)′·[σ2R(X′X)−1R′
]−1 ·(
Rβ − r)
Looks complicated? Let’s figure out why this makes sense:
I The first and last components give the sum of squares of thecomponents of Rβ − r. This summarizes its deviation from zero.
I The middle component is the variance of Rβ− r. This standardizes thesum of squares to have variance one.
We know β is approximately normal ⇒ Rβ − r should also be normal=⇒ W should therefore be ... χ2 distributed!
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 65 / 140
![Page 283: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/283.jpg)
Wald Statistic
Let’s consider testing H0 : Rβ = r, a set of q linear restrictions.
If H0 is true, Rβ − r should be zero except for sampling variability.
To formally evaluate the statistical significance of the deviation from zero,we must transform Rβ− r to a statistic that can be compared to a referencedistribution.
It turns out that the following Wald statistic can be used:
W =(
Rβ − r)′·[σ2R(X′X)−1R′
]−1 ·(
Rβ − r)
Looks complicated? Let’s figure out why this makes sense:
I The first and last components give the sum of squares of thecomponents of Rβ − r. This summarizes its deviation from zero.
I The middle component is the variance of Rβ− r. This standardizes thesum of squares to have variance one.
We know β is approximately normal ⇒ Rβ − r should also be normal=⇒ W should therefore be ... χ2 distributed!
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 65 / 140
![Page 284: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/284.jpg)
Wald Statistic
Let’s consider testing H0 : Rβ = r, a set of q linear restrictions.
If H0 is true, Rβ − r should be zero except for sampling variability.
To formally evaluate the statistical significance of the deviation from zero,we must transform Rβ− r to a statistic that can be compared to a referencedistribution.
It turns out that the following Wald statistic can be used:
W =(
Rβ − r)′·[σ2R(X′X)−1R′
]−1 ·(
Rβ − r)
Looks complicated? Let’s figure out why this makes sense:
I The first and last components give the sum of squares of thecomponents of Rβ − r. This summarizes its deviation from zero.
I The middle component is the variance of Rβ− r. This standardizes thesum of squares to have variance one.
We know β is approximately normal ⇒ Rβ − r should also be normal=⇒ W should therefore be ... χ2 distributed!
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 65 / 140
![Page 285: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/285.jpg)
Wald Statistic
Let’s consider testing H0 : Rβ = r, a set of q linear restrictions.
If H0 is true, Rβ − r should be zero except for sampling variability.
To formally evaluate the statistical significance of the deviation from zero,we must transform Rβ− r to a statistic that can be compared to a referencedistribution.
It turns out that the following Wald statistic can be used:
W =(
Rβ − r)′·[σ2R(X′X)−1R′
]−1 ·(
Rβ − r)
Looks complicated? Let’s figure out why this makes sense:
I The first and last components give the sum of squares of thecomponents of Rβ − r. This summarizes its deviation from zero.
I The middle component is the variance of Rβ− r. This standardizes thesum of squares to have variance one.
We know β is approximately normal ⇒ Rβ − r should also be normal=⇒ W should therefore be ...
χ2 distributed!
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 65 / 140
![Page 286: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/286.jpg)
Wald Statistic
Let’s consider testing H0 : Rβ = r, a set of q linear restrictions.
If H0 is true, Rβ − r should be zero except for sampling variability.
To formally evaluate the statistical significance of the deviation from zero,we must transform Rβ− r to a statistic that can be compared to a referencedistribution.
It turns out that the following Wald statistic can be used:
W =(
Rβ − r)′·[σ2R(X′X)−1R′
]−1 ·(
Rβ − r)
Looks complicated? Let’s figure out why this makes sense:
I The first and last components give the sum of squares of thecomponents of Rβ − r. This summarizes its deviation from zero.
I The middle component is the variance of Rβ− r. This standardizes thesum of squares to have variance one.
We know β is approximately normal ⇒ Rβ − r should also be normal=⇒ W should therefore be ... χ2 distributed!
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 65 / 140
![Page 287: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/287.jpg)
Sampling Distribution of the Wald Statistic
Theorem (Large-Sample Distribution of the Wald Statistic)
Under Assumptions 1–5, as n→∞ the distribution of the Wald statistic approaches thechi square distribution with q degrees of freedom:
Wd→ χ2
q as n→∞
Theorem (Small-Sample Distribution of the Wald Statistic)
Under Assumptions 1–6, for any sample size n the Wald statistic divided by q has the Fdistribution with (q, n − k − 1) degrees of freedom:
W /q ∼ Fq,n−k−1
qFq,n−k−1d→ χ2
q as n→∞, so the difference disappears when n large.> pf(3.1, 2, 500,lower.tail=F) [1] 0.04591619
> pchisq(2*3.1, 2,lower.tail=F) [1] 0.0450492
> pf(3.1, 2, 50000,lower.tail=F) [1] 0.04505786
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 66 / 140
![Page 288: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/288.jpg)
Sampling Distribution of the Wald Statistic
Theorem (Large-Sample Distribution of the Wald Statistic)
Under Assumptions 1–5, as n→∞ the distribution of the Wald statistic approaches thechi square distribution with q degrees of freedom:
Wd→ χ2
q as n→∞
Theorem (Small-Sample Distribution of the Wald Statistic)
Under Assumptions 1–6, for any sample size n the Wald statistic divided by q has the Fdistribution with (q, n − k − 1) degrees of freedom:
W /q ∼ Fq,n−k−1
qFq,n−k−1d→ χ2
q as n→∞, so the difference disappears when n large.> pf(3.1, 2, 500,lower.tail=F) [1] 0.04591619
> pchisq(2*3.1, 2,lower.tail=F) [1] 0.0450492
> pf(3.1, 2, 50000,lower.tail=F) [1] 0.04505786
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 66 / 140
![Page 289: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/289.jpg)
Sampling Distribution of the Wald Statistic
Theorem (Large-Sample Distribution of the Wald Statistic)
Under Assumptions 1–5, as n→∞ the distribution of the Wald statistic approaches thechi square distribution with q degrees of freedom:
Wd→ χ2
q as n→∞
Theorem (Small-Sample Distribution of the Wald Statistic)
Under Assumptions 1–6, for any sample size n the Wald statistic divided by q has the Fdistribution with (q, n − k − 1) degrees of freedom:
W /q ∼ Fq,n−k−1
qFq,n−k−1d→ χ2
q as n→∞, so the difference disappears when n large.> pf(3.1, 2, 500,lower.tail=F) [1] 0.04591619
> pchisq(2*3.1, 2,lower.tail=F) [1] 0.0450492
> pf(3.1, 2, 50000,lower.tail=F) [1] 0.04505786
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 66 / 140
![Page 290: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/290.jpg)
Sampling Distribution of the Wald Statistic
Theorem (Large-Sample Distribution of the Wald Statistic)
Under Assumptions 1–5, as n→∞ the distribution of the Wald statistic approaches thechi square distribution with q degrees of freedom:
Wd→ χ2
q as n→∞
Theorem (Small-Sample Distribution of the Wald Statistic)
Under Assumptions 1–6, for any sample size n the Wald statistic divided by q has the Fdistribution with (q, n − k − 1) degrees of freedom:
W /q ∼ Fq,n−k−1
qFq,n−k−1d→ χ2
q as n→∞, so the difference disappears when n large.> pf(3.1, 2, 500,lower.tail=F) [1] 0.04591619
> pchisq(2*3.1, 2,lower.tail=F) [1] 0.0450492
> pf(3.1, 2, 50000,lower.tail=F) [1] 0.04505786
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 66 / 140
![Page 291: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/291.jpg)
Testing General Linear Hypotheses in R
In R, the linearHypothesis() function in the car package does the Wald testfor general linear hypotheses.
R Code> fit.UR2 <- lm(REALGDPCAP ~ Asia + LatAmerica + Transit + Oecd, data = D)
> R <- matrix(c(0,1,-1,0,0, 0,1,0,-1,0), nrow = 2, byrow = T)
> r <- c(0,0)
> linearHypothesis(fit.UR2, R, r)
Linear hypothesis test
Hypothesis:
Asia - LatAmerica = 0
Asia - Transit = 0
Model 1: restricted model
Model 2: REALGDPCAP ~ Asia + LatAmerica + Transit + Oecd
Res.Df RSS Df Sum of Sq F Pr(>F)
1 82 738141635
2 80 736523836 2 1617798 0.0879 0.916
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 67 / 140
![Page 292: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/292.jpg)
Conclusion
Multiple regression is much like the regression formulations we havealready seen.
We showed how to estimate the coefficients and get the variancecovariance matrix.
Much of the hypothesis test infrastructure ports over nicely, plusthere are new joint tests we can use.
Appendix contains material on:I Derivation for the estimator (+ some of the math for this)I Proof of consistency
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 68 / 140
![Page 293: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/293.jpg)
Conclusion
Multiple regression is much like the regression formulations we havealready seen.
We showed how to estimate the coefficients and get the variancecovariance matrix.
Much of the hypothesis test infrastructure ports over nicely, plusthere are new joint tests we can use.
Appendix contains material on:I Derivation for the estimator (+ some of the math for this)I Proof of consistency
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 68 / 140
![Page 294: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/294.jpg)
Conclusion
Multiple regression is much like the regression formulations we havealready seen.
We showed how to estimate the coefficients and get the variancecovariance matrix.
Much of the hypothesis test infrastructure ports over nicely, plusthere are new joint tests we can use.
Appendix contains material on:I Derivation for the estimator (+ some of the math for this)I Proof of consistency
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 68 / 140
![Page 295: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/295.jpg)
Conclusion
Multiple regression is much like the regression formulations we havealready seen.
We showed how to estimate the coefficients and get the variancecovariance matrix.
Much of the hypothesis test infrastructure ports over nicely, plusthere are new joint tests we can use.
Appendix contains material on:I Derivation for the estimator (+ some of the math for this)I Proof of consistency
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 68 / 140
![Page 296: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/296.jpg)
Conclusion
Multiple regression is much like the regression formulations we havealready seen.
We showed how to estimate the coefficients and get the variancecovariance matrix.
Much of the hypothesis test infrastructure ports over nicely, plusthere are new joint tests we can use.
Appendix contains material on:I Derivation for the estimator (+ some of the math for this)I Proof of consistency
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 68 / 140
![Page 297: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/297.jpg)
Fun Without Weights
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 69 / 140
![Page 298: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/298.jpg)
Improper Linear Models
Proper linear model is one where predictor variables are givenoptimized weights in some way (for example through regression)
Meehl (1954) Clinical Versus Statistical Prediction: A TheoreticalAnalysis and Review of the Evidence argued that proper linear modelsoutperform clinical intuition in many areas.
Dawes argues that even improper linear models (those where weightsare set by hand or set to be equal), outperform clinical intuition.
Equal weight models are argued to be quite robust for thesepredictions
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 70 / 140
![Page 299: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/299.jpg)
Improper Linear Models
Proper linear model is one where predictor variables are givenoptimized weights in some way (for example through regression)
Meehl (1954) Clinical Versus Statistical Prediction: A TheoreticalAnalysis and Review of the Evidence argued that proper linear modelsoutperform clinical intuition in many areas.
Dawes argues that even improper linear models (those where weightsare set by hand or set to be equal), outperform clinical intuition.
Equal weight models are argued to be quite robust for thesepredictions
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 70 / 140
![Page 300: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/300.jpg)
Improper Linear Models
Proper linear model is one where predictor variables are givenoptimized weights in some way (for example through regression)
Meehl (1954) Clinical Versus Statistical Prediction: A TheoreticalAnalysis and Review of the Evidence argued that proper linear modelsoutperform clinical intuition in many areas.
Dawes argues that even improper linear models (those where weightsare set by hand or set to be equal), outperform clinical intuition.
Equal weight models are argued to be quite robust for thesepredictions
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 70 / 140
![Page 301: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/301.jpg)
Improper Linear Models
Proper linear model is one where predictor variables are givenoptimized weights in some way (for example through regression)
Meehl (1954) Clinical Versus Statistical Prediction: A TheoreticalAnalysis and Review of the Evidence argued that proper linear modelsoutperform clinical intuition in many areas.
Dawes argues that even improper linear models (those where weightsare set by hand or set to be equal), outperform clinical intuition.
Equal weight models are argued to be quite robust for thesepredictions
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 70 / 140
![Page 302: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/302.jpg)
Example: Graduate Admissions
Faculty rated all students in the psych department at University ofOregon
Ratings predicted from a proper linear model of student GRE scores,undergrad GPA and selectivity of student’s undergraduate institution.Cross-validated correlation was .38
Correlation of faculty ratings with average rating of admissionscommittee was .19
Standardized and equally weighted improper linear model, correlatedat .48
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 71 / 140
![Page 303: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/303.jpg)
Example: Graduate Admissions
Faculty rated all students in the psych department at University ofOregon
Ratings predicted from a proper linear model of student GRE scores,undergrad GPA and selectivity of student’s undergraduate institution.Cross-validated correlation was .38
Correlation of faculty ratings with average rating of admissionscommittee was .19
Standardized and equally weighted improper linear model, correlatedat .48
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 71 / 140
![Page 304: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/304.jpg)
Example: Graduate Admissions
Faculty rated all students in the psych department at University ofOregon
Ratings predicted from a proper linear model of student GRE scores,undergrad GPA and selectivity of student’s undergraduate institution.Cross-validated correlation was .38
Correlation of faculty ratings with average rating of admissionscommittee was .19
Standardized and equally weighted improper linear model, correlatedat .48
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 71 / 140
![Page 305: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/305.jpg)
Example: Graduate Admissions
Faculty rated all students in the psych department at University ofOregon
Ratings predicted from a proper linear model of student GRE scores,undergrad GPA and selectivity of student’s undergraduate institution.Cross-validated correlation was .38
Correlation of faculty ratings with average rating of admissionscommittee was .19
Standardized and equally weighted improper linear model, correlatedat .48
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 71 / 140
![Page 306: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/306.jpg)
Other Examples
Self-assessed measures of marital happiness: modeled with improperlinear model of (rate of lovemaking - rate of arguments):correlation of .40
Einhorn (1972) study of doctors coding biopsies of patients withHodgkin’s disease and then rated severity. Their rating of severity wasessentially uncorrelated with survival times, but the variables theycoded predicted outcomes using a regression model.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 72 / 140
![Page 307: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/307.jpg)
Other Examples
Self-assessed measures of marital happiness: modeled with improperlinear model of (rate of lovemaking - rate of arguments):correlation of .40
Einhorn (1972) study of doctors coding biopsies of patients withHodgkin’s disease and then rated severity. Their rating of severity wasessentially uncorrelated with survival times, but the variables theycoded predicted outcomes using a regression model.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 72 / 140
![Page 308: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/308.jpg)
Other Examples
Self-assessed measures of marital happiness: modeled with improperlinear model of (rate of lovemaking - rate of arguments):correlation of .40
Einhorn (1972) study of doctors coding biopsies of patients withHodgkin’s disease and then rated severity. Their rating of severity wasessentially uncorrelated with survival times, but the variables theycoded predicted outcomes using a regression model.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 72 / 140
![Page 309: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/309.jpg)
Other Examples
Column descriptions:
C1) average of human judges
C2) model based on human judges
C3) randomly chosen weights preserving signs
C4) equal weighting
C5) cross-validated weights
C6) unattainable optimal linear model
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 73 / 140
![Page 310: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/310.jpg)
The Argument
“People – especially the experts in a field – are much better atselecting and coding information than they are at integrating it.”(573)
The choice of variables is extremely important for prediction!
This parallels a piece of folk wisdom in the machine learning literaturethat a better predictor will beat a better model every time.
People are good at picking out relevant information, but terrible atintegrating it.
The difficulty arises in part because people in general lack a strongreference to the distribution of the predictors.
Linear models are robust to deviations from the optimal weights (seealso Waller 2008 on “Fungible Weights in Multiple Regression”)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 74 / 140
![Page 311: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/311.jpg)
The Argument
“People – especially the experts in a field – are much better atselecting and coding information than they are at integrating it.”(573)
The choice of variables is extremely important for prediction!
This parallels a piece of folk wisdom in the machine learning literaturethat a better predictor will beat a better model every time.
People are good at picking out relevant information, but terrible atintegrating it.
The difficulty arises in part because people in general lack a strongreference to the distribution of the predictors.
Linear models are robust to deviations from the optimal weights (seealso Waller 2008 on “Fungible Weights in Multiple Regression”)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 74 / 140
![Page 312: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/312.jpg)
The Argument
“People – especially the experts in a field – are much better atselecting and coding information than they are at integrating it.”(573)
The choice of variables is extremely important for prediction!
This parallels a piece of folk wisdom in the machine learning literaturethat a better predictor will beat a better model every time.
People are good at picking out relevant information, but terrible atintegrating it.
The difficulty arises in part because people in general lack a strongreference to the distribution of the predictors.
Linear models are robust to deviations from the optimal weights (seealso Waller 2008 on “Fungible Weights in Multiple Regression”)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 74 / 140
![Page 313: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/313.jpg)
The Argument
“People – especially the experts in a field – are much better atselecting and coding information than they are at integrating it.”(573)
The choice of variables is extremely important for prediction!
This parallels a piece of folk wisdom in the machine learning literaturethat a better predictor will beat a better model every time.
People are good at picking out relevant information, but terrible atintegrating it.
The difficulty arises in part because people in general lack a strongreference to the distribution of the predictors.
Linear models are robust to deviations from the optimal weights (seealso Waller 2008 on “Fungible Weights in Multiple Regression”)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 74 / 140
![Page 314: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/314.jpg)
The Argument
“People – especially the experts in a field – are much better atselecting and coding information than they are at integrating it.”(573)
The choice of variables is extremely important for prediction!
This parallels a piece of folk wisdom in the machine learning literaturethat a better predictor will beat a better model every time.
People are good at picking out relevant information, but terrible atintegrating it.
The difficulty arises in part because people in general lack a strongreference to the distribution of the predictors.
Linear models are robust to deviations from the optimal weights (seealso Waller 2008 on “Fungible Weights in Multiple Regression”)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 74 / 140
![Page 315: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/315.jpg)
The Argument
“People – especially the experts in a field – are much better atselecting and coding information than they are at integrating it.”(573)
The choice of variables is extremely important for prediction!
This parallels a piece of folk wisdom in the machine learning literaturethat a better predictor will beat a better model every time.
People are good at picking out relevant information, but terrible atintegrating it.
The difficulty arises in part because people in general lack a strongreference to the distribution of the predictors.
Linear models are robust to deviations from the optimal weights (seealso Waller 2008 on “Fungible Weights in Multiple Regression”)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 74 / 140
![Page 316: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/316.jpg)
My Thoughts on the Argument
Particularly in prediction, looking for the true or right model can bequixotic
The broader research project suggests that a big part of whatquantitative models are doing predictively, is focusing human talent inthe right place.
This all applies because predictors well chosen and the sample size issmall (so the weight optimization isn’t great)
It is a fascinating paper!
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 75 / 140
![Page 317: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/317.jpg)
My Thoughts on the Argument
Particularly in prediction, looking for the true or right model can bequixotic
The broader research project suggests that a big part of whatquantitative models are doing predictively, is focusing human talent inthe right place.
This all applies because predictors well chosen and the sample size issmall (so the weight optimization isn’t great)
It is a fascinating paper!
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 75 / 140
![Page 318: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/318.jpg)
My Thoughts on the Argument
Particularly in prediction, looking for the true or right model can bequixotic
The broader research project suggests that a big part of whatquantitative models are doing predictively, is focusing human talent inthe right place.
This all applies because predictors well chosen and the sample size issmall (so the weight optimization isn’t great)
It is a fascinating paper!
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 75 / 140
![Page 319: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/319.jpg)
My Thoughts on the Argument
Particularly in prediction, looking for the true or right model can bequixotic
The broader research project suggests that a big part of whatquantitative models are doing predictively, is focusing human talent inthe right place.
This all applies because predictors well chosen and the sample size issmall (so the weight optimization isn’t great)
It is a fascinating paper!
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 75 / 140
![Page 320: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/320.jpg)
1 Matrix Form of Regression
2 OLS inference in matrix form
3 Standard Hypothesis Tests
4 Testing Joint Significance
5 Testing Linear Hypotheses: The General Case
6 Fun With(out) Weights
7 Appendix: Derivations and Consistency
8 The Problems with p-values
9 Agnostic Regression
10 Inference via the Bootstrap
11 Fun With Weights
12 Appendix: Tricky p-value Example
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 76 / 140
![Page 321: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/321.jpg)
1 Matrix Form of Regression
2 OLS inference in matrix form
3 Standard Hypothesis Tests
4 Testing Joint Significance
5 Testing Linear Hypotheses: The General Case
6 Fun With(out) Weights
7 Appendix: Derivations and Consistency
8 The Problems with p-values
9 Agnostic Regression
10 Inference via the Bootstrap
11 Fun With Weights
12 Appendix: Tricky p-value Example
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 76 / 140
![Page 322: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/322.jpg)
Gradient
Let v = v(u) be a scalar-valued function Rn → R1 where u is a (n × 1) column
vector. For example: v(u) = c′u where c =
013
and u =
u1
u2
u3
Definition (Gradient)
We can define the column vector of partial derivatives
∂v(u)
∂u=
∂v/∂u1
∂v/∂u2
...∂v/∂un
This vector of partial derivatives is called the gradient.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 77 / 140
![Page 323: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/323.jpg)
Gradient
Let v = v(u) be a scalar-valued function Rn → R1 where u is a (n × 1) column
vector.
For example: v(u) = c′u where c =
013
and u =
u1
u2
u3
Definition (Gradient)
We can define the column vector of partial derivatives
∂v(u)
∂u=
∂v/∂u1
∂v/∂u2
...∂v/∂un
This vector of partial derivatives is called the gradient.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 77 / 140
![Page 324: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/324.jpg)
Gradient
Let v = v(u) be a scalar-valued function Rn → R1 where u is a (n × 1) column
vector. For example: v(u) = c′u where c =
013
and u =
u1
u2
u3
Definition (Gradient)
We can define the column vector of partial derivatives
∂v(u)
∂u=
∂v/∂u1
∂v/∂u2
...∂v/∂un
This vector of partial derivatives is called the gradient.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 77 / 140
![Page 325: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/325.jpg)
Gradient
Let v = v(u) be a scalar-valued function Rn → R1 where u is a (n × 1) column
vector. For example: v(u) = c′u where c =
013
and u =
u1
u2
u3
Definition (Gradient)
We can define the column vector of partial derivatives
∂v(u)
∂u=
∂v/∂u1
∂v/∂u2
...∂v/∂un
This vector of partial derivatives is called the gradient.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 77 / 140
![Page 326: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/326.jpg)
Vector Derivative Rule I (linear functions)
Theorem (differentiation of linear functions)
Given a linear function v(u) = c′u of an (n × 1) vector u, the derivative of v(u)w.r.t. u is given by
∂v
∂u= c
This also works when c is a matrix and therefore v is a vector-valued function.
For example, let v(u) = c′u where c =
013
and u =
u1
u2
u3
, then
v = c′u = 0 · u1 + 1 · u2 + 3 · u3
and∂v
∂u=
∂v/∂u1
∂v/∂u2
∂v/∂u3
=
013
= c
Hence, ∂v
∂u= c
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 78 / 140
![Page 327: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/327.jpg)
Vector Derivative Rule I (linear functions)
Theorem (differentiation of linear functions)
Given a linear function v(u) = c′u of an (n × 1) vector u, the derivative of v(u)w.r.t. u is given by
∂v
∂u= c
This also works when c is a matrix and therefore v is a vector-valued function.
For example, let v(u) = c′u where c =
013
and u =
u1
u2
u3
, then
v = c′u = 0 · u1 + 1 · u2 + 3 · u3
and∂v
∂u=
∂v/∂u1
∂v/∂u2
∂v/∂u3
=
013
= c
Hence, ∂v
∂u= c
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 78 / 140
![Page 328: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/328.jpg)
Vector Derivative Rule I (linear functions)
Theorem (differentiation of linear functions)
Given a linear function v(u) = c′u of an (n × 1) vector u, the derivative of v(u)w.r.t. u is given by
∂v
∂u= c
This also works when c is a matrix and therefore v is a vector-valued function.
For example, let v(u) = c′u where c =
013
and u =
u1
u2
u3
, then
v = c′u = 0 · u1 + 1 · u2 + 3 · u3
and∂v
∂u=
∂v/∂u1
∂v/∂u2
∂v/∂u3
=
013
= c
Hence, ∂v
∂u= c
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 78 / 140
![Page 329: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/329.jpg)
Vector Derivative Rule I (linear functions)
Theorem (differentiation of linear functions)
Given a linear function v(u) = c′u of an (n × 1) vector u, the derivative of v(u)w.r.t. u is given by
∂v
∂u= c
This also works when c is a matrix and therefore v is a vector-valued function.
For example, let v(u) = c′u where c =
013
and u =
u1
u2
u3
, then
v = c′u = 0 · u1 + 1 · u2 + 3 · u3
and∂v
∂u=
∂v/∂u1
∂v/∂u2
∂v/∂u3
=
013
= c
Hence, ∂v
∂u= c
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 78 / 140
![Page 330: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/330.jpg)
Vector Derivative Rule I (linear functions)
Theorem (differentiation of linear functions)
Given a linear function v(u) = c′u of an (n × 1) vector u, the derivative of v(u)w.r.t. u is given by
∂v
∂u= c
This also works when c is a matrix and therefore v is a vector-valued function.
For example, let v(u) = c′u where c =
013
and u =
u1
u2
u3
, then
v = c′u = 0 · u1 + 1 · u2 + 3 · u3
and∂v
∂u=
∂v/∂u1
∂v/∂u2
∂v/∂u3
=
013
= c
Hence, ∂v
∂u= c
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 78 / 140
![Page 331: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/331.jpg)
Vector Derivative Rule I (linear functions)
Theorem (differentiation of linear functions)
Given a linear function v(u) = c′u of an (n × 1) vector u, the derivative of v(u)w.r.t. u is given by
∂v
∂u= c
This also works when c is a matrix and therefore v is a vector-valued function.
For example, let v(u) = c′u where c =
013
and u =
u1
u2
u3
, then
v = c′u = 0 · u1 + 1 · u2 + 3 · u3
and∂v
∂u=
∂v/∂u1
∂v/∂u2
∂v/∂u3
=
013
= c
Hence, ∂v
∂u= c
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 78 / 140
![Page 332: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/332.jpg)
Vector Derivative Rule I (linear functions)
Theorem (differentiation of linear functions)
Given a linear function v(u) = c′u of an (n × 1) vector u, the derivative of v(u)w.r.t. u is given by
∂v
∂u= c
This also works when c is a matrix and therefore v is a vector-valued function.
For example, let v(u) = c′u where c =
013
and u =
u1
u2
u3
, then
v = c′u = 0 · u1 + 1 · u2 + 3 · u3
and∂v
∂u=
∂v/∂u1
∂v/∂u2
∂v/∂u3
=
013
= c
Hence, ∂v
∂u= c
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 78 / 140
![Page 333: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/333.jpg)
Vector Derivative Rule II (quadratic form)
Theorem (quadratic form)
Given a (n × n) symmetric matrix A and a scalar-valued function v(u) = u′Au of(n × 1) vector u, we have
∂v
∂u= A′u + Au = 2Au
For example, let A =
[3 11 5
]and u =
[u1
u2
].Then v(u) = u′Au is equal to
v = [3 · u1 + u2, u1 + 5 · u2]
[u1
u2
]= 3u2
1 + 2u1u2 + 5u22
and
∂v
∂u=
[∂v/∂u1
∂v/∂u2
]=
[6u1 + 2u2
2u1 + 10u2
]= 2 ·
[3u1 + 1u2
1u1 + 5u2
]= 2Au
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 79 / 140
![Page 334: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/334.jpg)
Vector Derivative Rule II (quadratic form)
Theorem (quadratic form)
Given a (n × n) symmetric matrix A and a scalar-valued function v(u) = u′Au of(n × 1) vector u, we have
∂v
∂u= A′u + Au = 2Au
For example, let A =
[3 11 5
]and u =
[u1
u2
].
Then v(u) = u′Au is equal to
v = [3 · u1 + u2, u1 + 5 · u2]
[u1
u2
]= 3u2
1 + 2u1u2 + 5u22
and
∂v
∂u=
[∂v/∂u1
∂v/∂u2
]=
[6u1 + 2u2
2u1 + 10u2
]= 2 ·
[3u1 + 1u2
1u1 + 5u2
]= 2Au
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 79 / 140
![Page 335: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/335.jpg)
Vector Derivative Rule II (quadratic form)
Theorem (quadratic form)
Given a (n × n) symmetric matrix A and a scalar-valued function v(u) = u′Au of(n × 1) vector u, we have
∂v
∂u= A′u + Au = 2Au
For example, let A =
[3 11 5
]and u =
[u1
u2
].Then v(u) = u′Au is equal to
v = [3 · u1 + u2, u1 + 5 · u2]
[u1
u2
]= 3u2
1 + 2u1u2 + 5u22
and
∂v
∂u=
[∂v/∂u1
∂v/∂u2
]=
[6u1 + 2u2
2u1 + 10u2
]= 2 ·
[3u1 + 1u2
1u1 + 5u2
]= 2Au
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 79 / 140
![Page 336: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/336.jpg)
Vector Derivative Rule II (quadratic form)
Theorem (quadratic form)
Given a (n × n) symmetric matrix A and a scalar-valued function v(u) = u′Au of(n × 1) vector u, we have
∂v
∂u= A′u + Au = 2Au
For example, let A =
[3 11 5
]and u =
[u1
u2
].Then v(u) = u′Au is equal to
v = [3 · u1 + u2, u1 + 5 · u2]
[u1
u2
]= 3u2
1 + 2u1u2 + 5u22
and
∂v
∂u=
[∂v/∂u1
∂v/∂u2
]=
[6u1 + 2u2
2u1 + 10u2
]= 2 ·
[3u1 + 1u2
1u1 + 5u2
]= 2Au
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 79 / 140
![Page 337: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/337.jpg)
Vector Derivative Rule II (quadratic form)
Theorem (quadratic form)
Given a (n × n) symmetric matrix A and a scalar-valued function v(u) = u′Au of(n × 1) vector u, we have
∂v
∂u= A′u + Au = 2Au
For example, let A =
[3 11 5
]and u =
[u1
u2
].Then v(u) = u′Au is equal to
v = [3 · u1 + u2, u1 + 5 · u2]
[u1
u2
]= 3u2
1 + 2u1u2 + 5u22
and
∂v
∂u=
[∂v/∂u1
∂v/∂u2
]=
[6u1 + 2u2
2u1 + 10u2
]= 2 ·
[3u1 + 1u2
1u1 + 5u2
]= 2Au
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 79 / 140
![Page 338: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/338.jpg)
Vector Derivative Rule II (quadratic form)
Theorem (quadratic form)
Given a (n × n) symmetric matrix A and a scalar-valued function v(u) = u′Au of(n × 1) vector u, we have
∂v
∂u= A′u + Au = 2Au
For example, let A =
[3 11 5
]and u =
[u1
u2
].Then v(u) = u′Au is equal to
v = [3 · u1 + u2, u1 + 5 · u2]
[u1
u2
]= 3u2
1 + 2u1u2 + 5u22
and
∂v
∂u=
[∂v/∂u1
∂v/∂u2
]=
[6u1 + 2u2
2u1 + 10u2
]= 2 ·
[3u1 + 1u2
1u1 + 5u2
]= 2Au
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 79 / 140
![Page 339: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/339.jpg)
Vector Derivative Rule II (quadratic form)
Theorem (quadratic form)
Given a (n × n) symmetric matrix A and a scalar-valued function v(u) = u′Au of(n × 1) vector u, we have
∂v
∂u= A′u + Au = 2Au
For example, let A =
[3 11 5
]and u =
[u1
u2
].Then v(u) = u′Au is equal to
v = [3 · u1 + u2, u1 + 5 · u2]
[u1
u2
]= 3u2
1 + 2u1u2 + 5u22
and
∂v
∂u=
[∂v/∂u1
∂v/∂u2
]=
[6u1 + 2u2
2u1 + 10u2
]= 2 ·
[3u1 + 1u2
1u1 + 5u2
]= 2Au
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 79 / 140
![Page 340: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/340.jpg)
Vector Derivative Rule II (quadratic form)
Theorem (quadratic form)
Given a (n × n) symmetric matrix A and a scalar-valued function v(u) = u′Au of(n × 1) vector u, we have
∂v
∂u= A′u + Au = 2Au
For example, let A =
[3 11 5
]and u =
[u1
u2
].Then v(u) = u′Au is equal to
v = [3 · u1 + u2, u1 + 5 · u2]
[u1
u2
]= 3u2
1 + 2u1u2 + 5u22
and
∂v
∂u=
[∂v/∂u1
∂v/∂u2
]=
[6u1 + 2u2
2u1 + 10u2
]= 2 ·
[3u1 + 1u2
1u1 + 5u2
]= 2Au
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 79 / 140
![Page 341: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/341.jpg)
Hessian
Suppose v is a scalar-valued function v = f (u) of a (k + 1)× 1 columnvector u =
[u1 u2 · · · uk+1
]′Definition (Hessian)
The (k + 1)× (k + 1) matrix of second-order partial derivatives ofv = f (u) is called the Hessian matrix and denoted
∂v2
∂u∂u′=
∂v2
∂u1∂u1· · · ∂v2
∂u1∂uk+1...
. . ....
∂v2
∂uk+1∂u1· · · ∂v2
∂uk+1∂uk+1
Note: The Hessian is symmetric.
The above rules are used to derive the optimal estimators in the appendixslides.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 80 / 140
![Page 342: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/342.jpg)
Hessian
Suppose v is a scalar-valued function v = f (u) of a (k + 1)× 1 columnvector u =
[u1 u2 · · · uk+1
]′
Definition (Hessian)
The (k + 1)× (k + 1) matrix of second-order partial derivatives ofv = f (u) is called the Hessian matrix and denoted
∂v2
∂u∂u′=
∂v2
∂u1∂u1· · · ∂v2
∂u1∂uk+1...
. . ....
∂v2
∂uk+1∂u1· · · ∂v2
∂uk+1∂uk+1
Note: The Hessian is symmetric.
The above rules are used to derive the optimal estimators in the appendixslides.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 80 / 140
![Page 343: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/343.jpg)
Hessian
Suppose v is a scalar-valued function v = f (u) of a (k + 1)× 1 columnvector u =
[u1 u2 · · · uk+1
]′Definition (Hessian)
The (k + 1)× (k + 1) matrix of second-order partial derivatives ofv = f (u) is called the Hessian matrix and denoted
∂v2
∂u∂u′=
∂v2
∂u1∂u1· · · ∂v2
∂u1∂uk+1...
. . ....
∂v2
∂uk+1∂u1· · · ∂v2
∂uk+1∂uk+1
Note: The Hessian is symmetric.
The above rules are used to derive the optimal estimators in the appendixslides.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 80 / 140
![Page 344: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/344.jpg)
Hessian
Suppose v is a scalar-valued function v = f (u) of a (k + 1)× 1 columnvector u =
[u1 u2 · · · uk+1
]′Definition (Hessian)
The (k + 1)× (k + 1) matrix of second-order partial derivatives ofv = f (u) is called the Hessian matrix and denoted
∂v2
∂u∂u′=
∂v2
∂u1∂u1· · · ∂v2
∂u1∂uk+1...
. . ....
∂v2
∂uk+1∂u1· · · ∂v2
∂uk+1∂uk+1
Note: The Hessian is symmetric.
The above rules are used to derive the optimal estimators in the appendixslides.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 80 / 140
![Page 345: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/345.jpg)
Hessian
Suppose v is a scalar-valued function v = f (u) of a (k + 1)× 1 columnvector u =
[u1 u2 · · · uk+1
]′Definition (Hessian)
The (k + 1)× (k + 1) matrix of second-order partial derivatives ofv = f (u) is called the Hessian matrix and denoted
∂v2
∂u∂u′=
∂v2
∂u1∂u1· · · ∂v2
∂u1∂uk+1...
. . ....
∂v2
∂uk+1∂u1· · · ∂v2
∂uk+1∂uk+1
Note: The Hessian is symmetric.
The above rules are used to derive the optimal estimators in the appendixslides.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 80 / 140
![Page 346: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/346.jpg)
Derivatives with respect to β
S(β,X, y) = (y − Xβ)′(y − Xβ)
= y′y − 2y′Xβ + β′X′Xβ
∂S(β,X, y)
∂β=
− 2X′y + 2X′Xβ
The first term does not contain β
The second term is an example of rule I from the derivative section
The third term is an example of rule II from the derivative section
And while we are at it the Hessian is:
∂2S(β,X, y)
∂β∂β′ = 2X′X
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 81 / 140
![Page 347: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/347.jpg)
Derivatives with respect to β
S(β,X, y) = (y − Xβ)′(y − Xβ)
= y′y − 2y′Xβ + β′X′Xβ
∂S(β,X, y)
∂β= − 2X′y + 2X′Xβ
The first term does not contain β
The second term is an example of rule I from the derivative section
The third term is an example of rule II from the derivative section
And while we are at it the Hessian is:
∂2S(β,X, y)
∂β∂β′ = 2X′X
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 81 / 140
![Page 348: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/348.jpg)
Derivatives with respect to β
S(β,X, y) = (y − Xβ)′(y − Xβ)
= y′y − 2y′Xβ + β′X′Xβ
∂S(β,X, y)
∂β= − 2X′y + 2X′Xβ
The first term does not contain β
The second term is an example of rule I from the derivative section
The third term is an example of rule II from the derivative section
And while we are at it the Hessian is:
∂2S(β,X, y)
∂β∂β′ = 2X′X
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 81 / 140
![Page 349: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/349.jpg)
Solving for β
∂S(β,X, y)
∂β= −2X′y + 2X′Xβ
Setting the vector of partial derivatives equal to zero and substituting βfor β , we can solve for the OLS estimator.
0 = −2X′y + 2X′Xβ
− 2X′Xβ = −2X′y
X′Xβ = X′y(X′X
)−1X′Xβ =
(X′X
)−1X′y
Iβ =(X′X
)−1X′y
β =(X′X
)−1X′y
Note that we implicitly assumed that X′X is invertible.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 82 / 140
![Page 350: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/350.jpg)
Solving for β
∂S(β,X, y)
∂β= −2X′y + 2X′Xβ
Setting the vector of partial derivatives equal to zero and substituting βfor β , we can solve for the OLS estimator.
0 = −2X′y + 2X′Xβ
− 2X′Xβ = −2X′y
X′Xβ = X′y(X′X
)−1X′Xβ =
(X′X
)−1X′y
Iβ =(X′X
)−1X′y
β =(X′X
)−1X′y
Note that we implicitly assumed that X′X is invertible.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 82 / 140
![Page 351: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/351.jpg)
Solving for β
∂S(β,X, y)
∂β= −2X′y + 2X′Xβ
Setting the vector of partial derivatives equal to zero and substituting βfor β , we can solve for the OLS estimator.
0 = −2X′y + 2X′Xβ
− 2X′Xβ = −2X′y
X′Xβ = X′y(X′X
)−1X′Xβ =
(X′X
)−1X′y
Iβ =(X′X
)−1X′y
β =(X′X
)−1X′y
Note that we implicitly assumed that X′X is invertible.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 82 / 140
![Page 352: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/352.jpg)
Solving for β
∂S(β,X, y)
∂β= −2X′y + 2X′Xβ
Setting the vector of partial derivatives equal to zero and substituting βfor β , we can solve for the OLS estimator.
0 = −2X′y + 2X′Xβ
− 2X′Xβ = −2X′y
X′Xβ = X′y
(X′X
)−1X′Xβ =
(X′X
)−1X′y
Iβ =(X′X
)−1X′y
β =(X′X
)−1X′y
Note that we implicitly assumed that X′X is invertible.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 82 / 140
![Page 353: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/353.jpg)
Solving for β
∂S(β,X, y)
∂β= −2X′y + 2X′Xβ
Setting the vector of partial derivatives equal to zero and substituting βfor β , we can solve for the OLS estimator.
0 = −2X′y + 2X′Xβ
− 2X′Xβ = −2X′y
X′Xβ = X′y(X′X
)−1X′Xβ =
(X′X
)−1X′y
Iβ =(X′X
)−1X′y
β =(X′X
)−1X′y
Note that we implicitly assumed that X′X is invertible.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 82 / 140
![Page 354: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/354.jpg)
Solving for β
∂S(β,X, y)
∂β= −2X′y + 2X′Xβ
Setting the vector of partial derivatives equal to zero and substituting βfor β , we can solve for the OLS estimator.
0 = −2X′y + 2X′Xβ
− 2X′Xβ = −2X′y
X′Xβ = X′y(X′X
)−1X′Xβ =
(X′X
)−1X′y
Iβ =(X′X
)−1X′y
β =(X′X
)−1X′y
Note that we implicitly assumed that X′X is invertible.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 82 / 140
![Page 355: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/355.jpg)
Solving for β
∂S(β,X, y)
∂β= −2X′y + 2X′Xβ
Setting the vector of partial derivatives equal to zero and substituting βfor β , we can solve for the OLS estimator.
0 = −2X′y + 2X′Xβ
− 2X′Xβ = −2X′y
X′Xβ = X′y(X′X
)−1X′Xβ =
(X′X
)−1X′y
Iβ =(X′X
)−1X′y
β =(X′X
)−1X′y
Note that we implicitly assumed that X′X is invertible.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 82 / 140
![Page 356: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/356.jpg)
Solving for β
∂S(β,X, y)
∂β= −2X′y + 2X′Xβ
Setting the vector of partial derivatives equal to zero and substituting βfor β , we can solve for the OLS estimator.
0 = −2X′y + 2X′Xβ
− 2X′Xβ = −2X′y
X′Xβ = X′y(X′X
)−1X′Xβ =
(X′X
)−1X′y
Iβ =(X′X
)−1X′y
β =(X′X
)−1X′y
Note that we implicitly assumed that X′X is invertible.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 82 / 140
![Page 357: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/357.jpg)
Consistency of βTo show consistency, we rewrite the OLS estimator in terms of sample means sothat we can apply LLN.
First, note that a matrix cross product can be written as a sum of vector products:
X′X =n∑
i=1
x′ixi and X′y =n∑
i=1
x′iyi
where xi is the 1× (k + 1) row vector of predictor values for unit i .
Now we can rewrite the OLS estimator as,
β =(∑n
i=1x′ixi
)−1 (∑n
i=1x′iyi)
=(∑n
i=1x′ixi
)−1 (∑n
i=1x′i (xiβ + ui )
)= β +
(∑n
i=1x′ixi
)−1 (∑n
i=1x′iui
)= β +
(1
n
∑n
i=1x′ixi
)−1(1
n
∑n
i=1x′iui
)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 83 / 140
![Page 358: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/358.jpg)
Consistency of βTo show consistency, we rewrite the OLS estimator in terms of sample means sothat we can apply LLN.
First, note that a matrix cross product can be written as a sum of vector products:
X′X =n∑
i=1
x′ixi and X′y =n∑
i=1
x′iyi
where xi is the 1× (k + 1) row vector of predictor values for unit i .
Now we can rewrite the OLS estimator as,
β =(∑n
i=1x′ixi
)−1 (∑n
i=1x′iyi)
=(∑n
i=1x′ixi
)−1 (∑n
i=1x′i (xiβ + ui )
)= β +
(∑n
i=1x′ixi
)−1 (∑n
i=1x′iui
)= β +
(1
n
∑n
i=1x′ixi
)−1(1
n
∑n
i=1x′iui
)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 83 / 140
![Page 359: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/359.jpg)
Consistency of βTo show consistency, we rewrite the OLS estimator in terms of sample means sothat we can apply LLN.
First, note that a matrix cross product can be written as a sum of vector products:
X′X =n∑
i=1
x′ixi and X′y =n∑
i=1
x′iyi
where xi is the 1× (k + 1) row vector of predictor values for unit i .
Now we can rewrite the OLS estimator as,
β =(∑n
i=1x′ixi
)−1 (∑n
i=1x′iyi)
=(∑n
i=1x′ixi
)−1 (∑n
i=1x′i (xiβ + ui )
)= β +
(∑n
i=1x′ixi
)−1 (∑n
i=1x′iui
)= β +
(1
n
∑n
i=1x′ixi
)−1(1
n
∑n
i=1x′iui
)Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 83 / 140
![Page 360: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/360.jpg)
Consistency of β
Now let’s apply the LLN to the sample means:(1
n
n∑i=1
x′ixi
)p−→ E [x′ixi ], a (k + 1)× (k + 1) nonsingular matrix.(
1
n
n∑i=1
x′iui
)p−→ E [x′iui ] = 0, by the zero cond. mean assumption.
Therefore, we have
plim(β) = β + (E [x′ixi ])−1 · 0
= β.
We can also show the asymptotic normality of β using a similar argument butwith the CLT.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 84 / 140
![Page 361: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/361.jpg)
Consistency of β
Now let’s apply the LLN to the sample means:(1
n
n∑i=1
x′ixi
)p−→ E [x′ixi ], a (k + 1)× (k + 1) nonsingular matrix.(
1
n
n∑i=1
x′iui
)p−→ E [x′iui ] = 0, by the zero cond. mean assumption.
Therefore, we have
plim(β) = β + (E [x′ixi ])−1 · 0
= β.
We can also show the asymptotic normality of β using a similar argument butwith the CLT.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 84 / 140
![Page 362: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/362.jpg)
Consistency of β
Now let’s apply the LLN to the sample means:(1
n
n∑i=1
x′ixi
)p−→ E [x′ixi ], a (k + 1)× (k + 1) nonsingular matrix.(
1
n
n∑i=1
x′iui
)p−→ E [x′iui ] = 0, by the zero cond. mean assumption.
Therefore, we have
plim(β) = β + (E [x′ixi ])−1 · 0
= β.
We can also show the asymptotic normality of β using a similar argument butwith the CLT.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 84 / 140
![Page 363: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/363.jpg)
References
Wooldridge, Jeffrey. Introductory econometrics: A modern approach.Cengage Learning, 2012.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 85 / 140
![Page 364: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/364.jpg)
Where We’ve Been and Where We’re Going...
Last WeekI regression with two variablesI omitted variables, multicollinearity, interactions
This WeekI Monday:
F matrix form of linear regressionF t-tests, F-tests and general linear hypothesis tests
I Wednesday:F problems with p-valuesF agnostic regressionF the bootstrap
Next WeekI break!I then . . . diagnostics
Long RunI probability → inference → regression → causal inference
Questions?
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 86 / 140
![Page 365: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/365.jpg)
Where We’ve Been and Where We’re Going...
Last WeekI regression with two variablesI omitted variables, multicollinearity, interactions
This WeekI Monday:
F matrix form of linear regressionF t-tests, F-tests and general linear hypothesis tests
I Wednesday:F problems with p-valuesF agnostic regressionF the bootstrap
Next WeekI break!I then . . . diagnostics
Long RunI probability → inference → regression → causal inference
Questions?
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 86 / 140
![Page 366: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/366.jpg)
1 Matrix Form of Regression
2 OLS inference in matrix form
3 Standard Hypothesis Tests
4 Testing Joint Significance
5 Testing Linear Hypotheses: The General Case
6 Fun With(out) Weights
7 Appendix: Derivations and Consistency
8 The Problems with p-values
9 Agnostic Regression
10 Inference via the Bootstrap
11 Fun With Weights
12 Appendix: Tricky p-value Example
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 87 / 140
![Page 367: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/367.jpg)
1 Matrix Form of Regression
2 OLS inference in matrix form
3 Standard Hypothesis Tests
4 Testing Joint Significance
5 Testing Linear Hypotheses: The General Case
6 Fun With(out) Weights
7 Appendix: Derivations and Consistency
8 The Problems with p-values
9 Agnostic Regression
10 Inference via the Bootstrap
11 Fun With Weights
12 Appendix: Tricky p-value Example
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 87 / 140
![Page 368: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/368.jpg)
p-values (courtesy of XKCD)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 88 / 140
![Page 369: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/369.jpg)
The value of the p-value
Every experiment may be said to exist only in order to give thefacts a chance of disproving the null hypothesis.
Ronald Fisher (1935)
In social science (and I think in psychology as well), the nullhypothesis is almost certainly false, false, false, and you don’tneed a p-value to tell you this. The p-value tells you the extentto which a certain aspect of your data are consistent with thenull hypothesis. A lack of rejection doesn’t tell you that thenull hypothesis is likely true; rather, it tells you that you don’thave enough data to reject the null hypothesis.
Andrew Gelman (2010)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 89 / 140
![Page 370: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/370.jpg)
The value of the p-value
Every experiment may be said to exist only in order to give thefacts a chance of disproving the null hypothesis.
Ronald Fisher (1935)
In social science (and I think in psychology as well), the nullhypothesis is almost certainly false, false, false, and you don’tneed a p-value to tell you this. The p-value tells you the extentto which a certain aspect of your data are consistent with thenull hypothesis. A lack of rejection doesn’t tell you that thenull hypothesis is likely true; rather, it tells you that you don’thave enough data to reject the null hypothesis.
Andrew Gelman (2010)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 89 / 140
![Page 371: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/371.jpg)
Problems with p-values
p-values are extremely common in the social sciences and are oftenthe standard by which the value of the finding is judged.
p-values are not:I an indication of a large substantive effectI the probability that the null hypothesis is trueI the probability that the alternative hypothesis is false
a large p-value could mean either that we are in the null world ORthat we had insufficient power
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 90 / 140
![Page 372: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/372.jpg)
Problems with p-values
p-values are extremely common in the social sciences and are oftenthe standard by which the value of the finding is judged.
p-values are not:I an indication of a large substantive effectI the probability that the null hypothesis is trueI the probability that the alternative hypothesis is false
a large p-value could mean either that we are in the null world ORthat we had insufficient power
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 90 / 140
![Page 373: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/373.jpg)
Problems with p-values
p-values are extremely common in the social sciences and are oftenthe standard by which the value of the finding is judged.
p-values are not:
I an indication of a large substantive effectI the probability that the null hypothesis is trueI the probability that the alternative hypothesis is false
a large p-value could mean either that we are in the null world ORthat we had insufficient power
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 90 / 140
![Page 374: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/374.jpg)
Problems with p-values
p-values are extremely common in the social sciences and are oftenthe standard by which the value of the finding is judged.
p-values are not:I an indication of a large substantive effectI the probability that the null hypothesis is trueI the probability that the alternative hypothesis is false
a large p-value could mean either that we are in the null world ORthat we had insufficient power
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 90 / 140
![Page 375: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/375.jpg)
Problems with p-values
p-values are extremely common in the social sciences and are oftenthe standard by which the value of the finding is judged.
p-values are not:I an indication of a large substantive effectI the probability that the null hypothesis is trueI the probability that the alternative hypothesis is false
a large p-value could mean either that we are in the null world ORthat we had insufficient power
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 90 / 140
![Page 376: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/376.jpg)
So what is the basic idea?
The idea was to run an experiment, then see if the results wereconsistent with what random chance might produce. Researcherswould first set up a ’null hypothesis’ that they wanted todisprove, such as there being no correlation or no differencebetween groups. Next, they would play the devil’s advocate and,assuming that this null hypothesis was in fact true, calculate thechances of getting results at least as extreme as what wasactually observed. This probability was the P value. The smallerit was, suggested Fisher, the greater the likelihood that thestraw-man null hypothesis was false.(Nunzo 2014, emphasis mine)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 91 / 140
![Page 377: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/377.jpg)
I’ve got 99 problems. . .
p-values are hard to interpret, but even in the best scenarios they havesome key problems:
they remove focus from data, measurement, theory and thesubstantive quantity of interest
they are often applied outside the dichotomous/decision-makingframework where they make some sense
significance isn’t even a good filter for predictive covariates (Ward etal 2010, Lo et al 2015)
they lead to publication filtering on arbitrary cutoffs.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 92 / 140
![Page 378: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/378.jpg)
I’ve got 99 problems. . .
p-values are hard to interpret, but even in the best scenarios they havesome key problems:
they remove focus from data, measurement, theory and thesubstantive quantity of interest
they are often applied outside the dichotomous/decision-makingframework where they make some sense
significance isn’t even a good filter for predictive covariates (Ward etal 2010, Lo et al 2015)
they lead to publication filtering on arbitrary cutoffs.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 92 / 140
![Page 379: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/379.jpg)
I’ve got 99 problems. . .
p-values are hard to interpret, but even in the best scenarios they havesome key problems:
they remove focus from data, measurement, theory and thesubstantive quantity of interest
they are often applied outside the dichotomous/decision-makingframework where they make some sense
significance isn’t even a good filter for predictive covariates (Ward etal 2010, Lo et al 2015)
they lead to publication filtering on arbitrary cutoffs.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 92 / 140
![Page 380: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/380.jpg)
I’ve got 99 problems. . .
p-values are hard to interpret, but even in the best scenarios they havesome key problems:
they remove focus from data, measurement, theory and thesubstantive quantity of interest
they are often applied outside the dichotomous/decision-makingframework where they make some sense
significance isn’t even a good filter for predictive covariates (Ward etal 2010, Lo et al 2015)
they lead to publication filtering on arbitrary cutoffs.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 92 / 140
![Page 381: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/381.jpg)
I’ve got 99 problems. . .
p-values are hard to interpret, but even in the best scenarios they havesome key problems:
they remove focus from data, measurement, theory and thesubstantive quantity of interest
they are often applied outside the dichotomous/decision-makingframework where they make some sense
significance isn’t even a good filter for predictive covariates (Ward etal 2010, Lo et al 2015)
they lead to publication filtering on arbitrary cutoffs.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 92 / 140
![Page 382: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/382.jpg)
I’ve got 99 problems. . .
p-values are hard to interpret, but even in the best scenarios they havesome key problems:
they remove focus from data, measurement, theory and thesubstantive quantity of interest
they are often applied outside the dichotomous/decision-makingframework where they make some sense
significance isn’t even a good filter for predictive covariates (Ward etal 2010, Lo et al 2015)
they lead to publication filtering on arbitrary cutoffs.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 92 / 140
![Page 383: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/383.jpg)
Arbitrary Cutoffs
Gerber and Malhotra (2006) Top Political Science JournalsStewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 93 / 140
![Page 384: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/384.jpg)
Arbitrary Cutoffs
Gerber and Malhotra (2008) Top Sociology JournalsStewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 93 / 140
![Page 385: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/385.jpg)
Arbitrary Cutoffs
Masicampo and Lalande (2012) Top Psychology Journals
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 93 / 140
![Page 386: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/386.jpg)
Still Not Convinced?The Real Harm of Misinterpreted p-values
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 94 / 140
![Page 387: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/387.jpg)
Example from Hauer: Right-Turn-On-Red
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 95 / 140
![Page 388: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/388.jpg)
The Point in Hauer
Two other interesting examples in Hauer (2004)
Core issue is that lack of significance is not an indication of a zeroeffect, it could also be a lack of power (i.e. a small sample sizerelative to the difficulty of detecting the effect)
On the opposite end, large tech companies rarely use significancetesting because they have huge samples which essentially always findsome non-zero effect. But that doesn’t make the finding significant ina colloquial sense of important.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 96 / 140
![Page 389: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/389.jpg)
The Point in Hauer
Two other interesting examples in Hauer (2004)
Core issue is that lack of significance is not an indication of a zeroeffect, it could also be a lack of power (i.e. a small sample sizerelative to the difficulty of detecting the effect)
On the opposite end, large tech companies rarely use significancetesting because they have huge samples which essentially always findsome non-zero effect. But that doesn’t make the finding significant ina colloquial sense of important.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 96 / 140
![Page 390: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/390.jpg)
The Point in Hauer
Two other interesting examples in Hauer (2004)
Core issue is that lack of significance is not an indication of a zeroeffect, it could also be a lack of power (i.e. a small sample sizerelative to the difficulty of detecting the effect)
On the opposite end, large tech companies rarely use significancetesting because they have huge samples which essentially always findsome non-zero effect. But that doesn’t make the finding significant ina colloquial sense of important.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 96 / 140
![Page 391: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/391.jpg)
The Point in Hauer
Two other interesting examples in Hauer (2004)
Core issue is that lack of significance is not an indication of a zeroeffect, it could also be a lack of power (i.e. a small sample sizerelative to the difficulty of detecting the effect)
On the opposite end, large tech companies rarely use significancetesting because they have huge samples which essentially always findsome non-zero effect. But that doesn’t make the finding significant ina colloquial sense of important.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 96 / 140
![Page 392: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/392.jpg)
p-values and Confidence Intervals
p-values one of most used tests in the social sciences–and you’re tellingme not to rely on them?
Basically, yes.
What’s the matter with you?
- Two reasons not to worship p-values [of many]
1) Statistical: they represent a very specific quantity under a nulldistribution. If you don’t really care about rejecting just that null, thenyou should focus on providing more information
2) Substantive: p-values are divorced from your quantity of interest–whichalmost always should relate to how much an intervention changes aquantity of social scientific interest (newspaper rule)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 97 / 140
![Page 393: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/393.jpg)
p-values and Confidence Intervals
p-values one of most used tests in the social sciences–and you’re tellingme not to rely on them?
Basically, yes.
What’s the matter with you?
- Two reasons not to worship p-values [of many]
1) Statistical: they represent a very specific quantity under a nulldistribution. If you don’t really care about rejecting just that null, thenyou should focus on providing more information
2) Substantive: p-values are divorced from your quantity of interest–whichalmost always should relate to how much an intervention changes aquantity of social scientific interest (newspaper rule)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 97 / 140
![Page 394: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/394.jpg)
p-values and Confidence Intervals
p-values one of most used tests in the social sciences–and you’re tellingme not to rely on them?
Basically, yes.
What’s the matter with you?
- Two reasons not to worship p-values [of many]
1) Statistical: they represent a very specific quantity under a nulldistribution. If you don’t really care about rejecting just that null, thenyou should focus on providing more information
2) Substantive: p-values are divorced from your quantity of interest–whichalmost always should relate to how much an intervention changes aquantity of social scientific interest (newspaper rule)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 97 / 140
![Page 395: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/395.jpg)
p-values and Confidence Intervals
p-values one of most used tests in the social sciences–and you’re tellingme not to rely on them?
Basically, yes.
What’s the matter with you?
- Two reasons not to worship p-values [of many]
1) Statistical: they represent a very specific quantity under a nulldistribution. If you don’t really care about rejecting just that null, thenyou should focus on providing more information
2) Substantive: p-values are divorced from your quantity of interest–whichalmost always should relate to how much an intervention changes aquantity of social scientific interest (newspaper rule)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 97 / 140
![Page 396: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/396.jpg)
p-values and Confidence Intervals
p-values one of most used tests in the social sciences–and you’re tellingme not to rely on them?
Basically, yes.
What’s the matter with you?
- Two reasons not to worship p-values [of many]
1) Statistical: they represent a very specific quantity under a nulldistribution. If you don’t really care about rejecting just that null, thenyou should focus on providing more information
2) Substantive: p-values are divorced from your quantity of interest–whichalmost always should relate to how much an intervention changes aquantity of social scientific interest (newspaper rule)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 97 / 140
![Page 397: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/397.jpg)
p-values and Confidence Intervals
p-values one of most used tests in the social sciences–and you’re tellingme not to rely on them?
Basically, yes.
What’s the matter with you?
- Two reasons not to worship p-values [of many]
1) Statistical: they represent a very specific quantity under a nulldistribution. If you don’t really care about rejecting just that null, thenyou should focus on providing more information
2) Substantive: p-values are divorced from your quantity of interest–whichalmost always should relate to how much an intervention changes aquantity of social scientific interest (newspaper rule)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 97 / 140
![Page 398: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/398.jpg)
p-values and Confidence Intervals
p-values one of most used tests in the social sciences–and you’re tellingme not to rely on them?
Basically, yes.
What’s the matter with you?
- Two reasons not to worship p-values [of many]
1) Statistical: they represent a very specific quantity under a nulldistribution. If you don’t really care about rejecting just that null, thenyou should focus on providing more information
2) Substantive: p-values are divorced from your quantity of interest–whichalmost always should relate to how much an intervention changes aquantity of social scientific interest (newspaper rule)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 97 / 140
![Page 399: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/399.jpg)
p-values and Confidence Intervals
But I want to assess the probability that my hypothesis is true–why can’t Iuse a p-value?
1) Me too, good luck.
2) That’s not what p-values measure
3) No one study is going to eliminate an entire hypothesis; even if thatstudy generates a really small p-value, you’d probably want an entirelydifferent infrastructure
Instead, show quantities you care about with confidence intervals.
Don’t misinterpret, or rely too heavily, on your p-values.They are evidence against your null, not evidence in favor
of your alternative.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 98 / 140
![Page 400: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/400.jpg)
p-values and Confidence Intervals
But I want to assess the probability that my hypothesis is true–why can’t Iuse a p-value?
1) Me too, good luck.
2) That’s not what p-values measure
3) No one study is going to eliminate an entire hypothesis; even if thatstudy generates a really small p-value, you’d probably want an entirelydifferent infrastructure
Instead, show quantities you care about with confidence intervals.
Don’t misinterpret, or rely too heavily, on your p-values.They are evidence against your null, not evidence in favor
of your alternative.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 98 / 140
![Page 401: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/401.jpg)
p-values and Confidence Intervals
But I want to assess the probability that my hypothesis is true–why can’t Iuse a p-value?
1) Me too, good luck.
2) That’s not what p-values measure
3) No one study is going to eliminate an entire hypothesis; even if thatstudy generates a really small p-value, you’d probably want an entirelydifferent infrastructure
Instead, show quantities you care about with confidence intervals.
Don’t misinterpret, or rely too heavily, on your p-values.They are evidence against your null, not evidence in favor
of your alternative.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 98 / 140
![Page 402: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/402.jpg)
p-values and Confidence Intervals
But I want to assess the probability that my hypothesis is true–why can’t Iuse a p-value?
1) Me too, good luck.
2) That’s not what p-values measure
3) No one study is going to eliminate an entire hypothesis; even if thatstudy generates a really small p-value, you’d probably want an entirelydifferent infrastructure
Instead, show quantities you care about with confidence intervals.
Don’t misinterpret, or rely too heavily, on your p-values.They are evidence against your null, not evidence in favor
of your alternative.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 98 / 140
![Page 403: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/403.jpg)
p-values and Confidence Intervals
But I want to assess the probability that my hypothesis is true–why can’t Iuse a p-value?
1) Me too, good luck.
2) That’s not what p-values measure
3) No one study is going to eliminate an entire hypothesis; even if thatstudy generates a really small p-value, you’d probably want an entirelydifferent infrastructure
Instead, show quantities you care about with confidence intervals.
Don’t misinterpret, or rely too heavily, on your p-values.They are evidence against your null, not evidence in favor
of your alternative.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 98 / 140
![Page 404: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/404.jpg)
p-values and Confidence Intervals
But I want to assess the probability that my hypothesis is true–why can’t Iuse a p-value?
1) Me too, good luck.
2) That’s not what p-values measure
3) No one study is going to eliminate an entire hypothesis; even if thatstudy generates a really small p-value, you’d probably want an entirelydifferent infrastructure
Instead, show quantities you care about with confidence intervals.
Don’t misinterpret, or rely too heavily, on your p-values.They are evidence against your null, not evidence in favor
of your alternative.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 98 / 140
![Page 405: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/405.jpg)
p-values and Confidence Intervals
But I want to assess the probability that my hypothesis is true–why can’t Iuse a p-value?
1) Me too, good luck.
2) That’s not what p-values measure
3) No one study is going to eliminate an entire hypothesis; even if thatstudy generates a really small p-value, you’d probably want an entirelydifferent infrastructure
Instead, show quantities you care about with confidence intervals.
Don’t misinterpret, or rely too heavily, on your p-values.They are evidence against your null, not evidence in favor
of your alternative.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 98 / 140
![Page 406: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/406.jpg)
But Let’s Not Obsess Too Much About p-values
From Leek and Peng (2015) “P values are just the tip of the iceberg” Nature.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 99 / 140
![Page 407: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/407.jpg)
1 Matrix Form of Regression
2 OLS inference in matrix form
3 Standard Hypothesis Tests
4 Testing Joint Significance
5 Testing Linear Hypotheses: The General Case
6 Fun With(out) Weights
7 Appendix: Derivations and Consistency
8 The Problems with p-values
9 Agnostic Regression
10 Inference via the Bootstrap
11 Fun With Weights
12 Appendix: Tricky p-value Example
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 100 / 140
![Page 408: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/408.jpg)
1 Matrix Form of Regression
2 OLS inference in matrix form
3 Standard Hypothesis Tests
4 Testing Joint Significance
5 Testing Linear Hypotheses: The General Case
6 Fun With(out) Weights
7 Appendix: Derivations and Consistency
8 The Problems with p-values
9 Agnostic Regression
10 Inference via the Bootstrap
11 Fun With Weights
12 Appendix: Tricky p-value Example
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 100 / 140
![Page 409: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/409.jpg)
Regression as parametric modelingLet’s summarize the parametric view we have taken thus far.
Gauss-Markov assumptions:I (A1) linearity, (A2) i.i.d. sample, (A3) full rank Xi , (A4) zero
conditional mean error, (A5) homoskedasticity.I basically, assume the model is right
OLS is BLUE, plus (A6) normality of the errors and we get smallsample SEs and BUE.What is the basic approach here?
I A1 defines a linear model for the conditional expectation:
E [Yi |Xi ] = µi = X′iβ
I A4-6, define a probabilistic model for the conditional distribution of Yi
given Xi :
Yi |Xi ∼ N (µi , σ2)
I A3 covers the edge-case that the βs are indistinguishable.I A2 ensures that we observe independent samples for estimation.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 101 / 140
![Page 410: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/410.jpg)
Regression as parametric modelingLet’s summarize the parametric view we have taken thus far.
Gauss-Markov assumptions:I (A1) linearity, (A2) i.i.d. sample, (A3) full rank Xi , (A4) zero
conditional mean error, (A5) homoskedasticity.
I basically, assume the model is right
OLS is BLUE, plus (A6) normality of the errors and we get smallsample SEs and BUE.What is the basic approach here?
I A1 defines a linear model for the conditional expectation:
E [Yi |Xi ] = µi = X′iβ
I A4-6, define a probabilistic model for the conditional distribution of Yi
given Xi :
Yi |Xi ∼ N (µi , σ2)
I A3 covers the edge-case that the βs are indistinguishable.I A2 ensures that we observe independent samples for estimation.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 101 / 140
![Page 411: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/411.jpg)
Regression as parametric modelingLet’s summarize the parametric view we have taken thus far.
Gauss-Markov assumptions:I (A1) linearity, (A2) i.i.d. sample, (A3) full rank Xi , (A4) zero
conditional mean error, (A5) homoskedasticity.I basically, assume the model is right
OLS is BLUE, plus (A6) normality of the errors and we get smallsample SEs and BUE.What is the basic approach here?
I A1 defines a linear model for the conditional expectation:
E [Yi |Xi ] = µi = X′iβ
I A4-6, define a probabilistic model for the conditional distribution of Yi
given Xi :
Yi |Xi ∼ N (µi , σ2)
I A3 covers the edge-case that the βs are indistinguishable.I A2 ensures that we observe independent samples for estimation.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 101 / 140
![Page 412: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/412.jpg)
Regression as parametric modelingLet’s summarize the parametric view we have taken thus far.
Gauss-Markov assumptions:I (A1) linearity, (A2) i.i.d. sample, (A3) full rank Xi , (A4) zero
conditional mean error, (A5) homoskedasticity.I basically, assume the model is right
OLS is BLUE, plus (A6) normality of the errors and we get smallsample SEs and BUE.
What is the basic approach here?I A1 defines a linear model for the conditional expectation:
E [Yi |Xi ] = µi = X′iβ
I A4-6, define a probabilistic model for the conditional distribution of Yi
given Xi :
Yi |Xi ∼ N (µi , σ2)
I A3 covers the edge-case that the βs are indistinguishable.I A2 ensures that we observe independent samples for estimation.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 101 / 140
![Page 413: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/413.jpg)
Regression as parametric modelingLet’s summarize the parametric view we have taken thus far.
Gauss-Markov assumptions:I (A1) linearity, (A2) i.i.d. sample, (A3) full rank Xi , (A4) zero
conditional mean error, (A5) homoskedasticity.I basically, assume the model is right
OLS is BLUE, plus (A6) normality of the errors and we get smallsample SEs and BUE.What is the basic approach here?
I A1 defines a linear model for the conditional expectation:
E [Yi |Xi ] = µi = X′iβ
I A4-6, define a probabilistic model for the conditional distribution of Yi
given Xi :
Yi |Xi ∼ N (µi , σ2)
I A3 covers the edge-case that the βs are indistinguishable.I A2 ensures that we observe independent samples for estimation.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 101 / 140
![Page 414: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/414.jpg)
Regression as parametric modelingLet’s summarize the parametric view we have taken thus far.
Gauss-Markov assumptions:I (A1) linearity, (A2) i.i.d. sample, (A3) full rank Xi , (A4) zero
conditional mean error, (A5) homoskedasticity.I basically, assume the model is right
OLS is BLUE, plus (A6) normality of the errors and we get smallsample SEs and BUE.What is the basic approach here?
I A1 defines a linear model for the conditional expectation:
E [Yi |Xi ] = µi = X′iβ
I A4-6, define a probabilistic model for the conditional distribution of Yi
given Xi :
Yi |Xi ∼ N (µi , σ2)
I A3 covers the edge-case that the βs are indistinguishable.I A2 ensures that we observe independent samples for estimation.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 101 / 140
![Page 415: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/415.jpg)
Regression as parametric modelingLet’s summarize the parametric view we have taken thus far.
Gauss-Markov assumptions:I (A1) linearity, (A2) i.i.d. sample, (A3) full rank Xi , (A4) zero
conditional mean error, (A5) homoskedasticity.I basically, assume the model is right
OLS is BLUE, plus (A6) normality of the errors and we get smallsample SEs and BUE.What is the basic approach here?
I A1 defines a linear model for the conditional expectation:
E [Yi |Xi ] = µi = X′iβ
I A4-6, define a probabilistic model for the conditional distribution of Yi
given Xi :
Yi |Xi ∼ N (µi , σ2)
I A3 covers the edge-case that the βs are indistinguishable.I A2 ensures that we observe independent samples for estimation.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 101 / 140
![Page 416: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/416.jpg)
Regression as parametric modelingLet’s summarize the parametric view we have taken thus far.
Gauss-Markov assumptions:I (A1) linearity, (A2) i.i.d. sample, (A3) full rank Xi , (A4) zero
conditional mean error, (A5) homoskedasticity.I basically, assume the model is right
OLS is BLUE, plus (A6) normality of the errors and we get smallsample SEs and BUE.What is the basic approach here?
I A1 defines a linear model for the conditional expectation:
E [Yi |Xi ] = µi = X′iβ
I A4-6, define a probabilistic model for the conditional distribution of Yi
given Xi :
Yi |Xi ∼ N (µi , σ2)
I A3 covers the edge-case that the βs are indistinguishable.
I A2 ensures that we observe independent samples for estimation.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 101 / 140
![Page 417: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/417.jpg)
Regression as parametric modelingLet’s summarize the parametric view we have taken thus far.
Gauss-Markov assumptions:I (A1) linearity, (A2) i.i.d. sample, (A3) full rank Xi , (A4) zero
conditional mean error, (A5) homoskedasticity.I basically, assume the model is right
OLS is BLUE, plus (A6) normality of the errors and we get smallsample SEs and BUE.What is the basic approach here?
I A1 defines a linear model for the conditional expectation:
E [Yi |Xi ] = µi = X′iβ
I A4-6, define a probabilistic model for the conditional distribution of Yi
given Xi :
Yi |Xi ∼ N (µi , σ2)
I A3 covers the edge-case that the βs are indistinguishable.I A2 ensures that we observe independent samples for estimation.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 101 / 140
![Page 418: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/418.jpg)
Agnostic views on regression
[Yi |Xi ] ∼ N(X ′i β, σ2)
These assumptions assume we know a lot about how Yi is ‘generated’.
Justifications for using OLS (like BLUE/BUE) often invoke theseassumptions which are unlikely to hold exactly.
Alternative: take an agnostic view on regression.I use OLS without believing these assumptions.I lean on two things: A2 i.i.d. sample, asymptotics (large-sample
properties)
Lose the distributional assumptions, focus on the conditionalexpectation function (CEF):
µ(x) = E[Yi |Xi = x ] =∑y
y · P[Yi = y |Xi = x ]
NB: this makes no statement about whether or not the CEF you arelooking at is the ‘right’ one.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 102 / 140
![Page 419: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/419.jpg)
Agnostic views on regression
[Yi |Xi ] ∼ N(X ′i β, σ2)
These assumptions assume we know a lot about how Yi is ‘generated’.
Justifications for using OLS (like BLUE/BUE) often invoke theseassumptions which are unlikely to hold exactly.
Alternative: take an agnostic view on regression.I use OLS without believing these assumptions.I lean on two things: A2 i.i.d. sample, asymptotics (large-sample
properties)
Lose the distributional assumptions, focus on the conditionalexpectation function (CEF):
µ(x) = E[Yi |Xi = x ] =∑y
y · P[Yi = y |Xi = x ]
NB: this makes no statement about whether or not the CEF you arelooking at is the ‘right’ one.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 102 / 140
![Page 420: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/420.jpg)
Agnostic views on regression
[Yi |Xi ] ∼ N(X ′i β, σ2)
These assumptions assume we know a lot about how Yi is ‘generated’.
Justifications for using OLS (like BLUE/BUE) often invoke theseassumptions which are unlikely to hold exactly.
Alternative: take an agnostic view on regression.
I use OLS without believing these assumptions.I lean on two things: A2 i.i.d. sample, asymptotics (large-sample
properties)
Lose the distributional assumptions, focus on the conditionalexpectation function (CEF):
µ(x) = E[Yi |Xi = x ] =∑y
y · P[Yi = y |Xi = x ]
NB: this makes no statement about whether or not the CEF you arelooking at is the ‘right’ one.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 102 / 140
![Page 421: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/421.jpg)
Agnostic views on regression
[Yi |Xi ] ∼ N(X ′i β, σ2)
These assumptions assume we know a lot about how Yi is ‘generated’.
Justifications for using OLS (like BLUE/BUE) often invoke theseassumptions which are unlikely to hold exactly.
Alternative: take an agnostic view on regression.I use OLS without believing these assumptions.
I lean on two things: A2 i.i.d. sample, asymptotics (large-sampleproperties)
Lose the distributional assumptions, focus on the conditionalexpectation function (CEF):
µ(x) = E[Yi |Xi = x ] =∑y
y · P[Yi = y |Xi = x ]
NB: this makes no statement about whether or not the CEF you arelooking at is the ‘right’ one.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 102 / 140
![Page 422: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/422.jpg)
Agnostic views on regression
[Yi |Xi ] ∼ N(X ′i β, σ2)
These assumptions assume we know a lot about how Yi is ‘generated’.
Justifications for using OLS (like BLUE/BUE) often invoke theseassumptions which are unlikely to hold exactly.
Alternative: take an agnostic view on regression.I use OLS without believing these assumptions.I lean on two things: A2 i.i.d. sample, asymptotics (large-sample
properties)
Lose the distributional assumptions, focus on the conditionalexpectation function (CEF):
µ(x) = E[Yi |Xi = x ] =∑y
y · P[Yi = y |Xi = x ]
NB: this makes no statement about whether or not the CEF you arelooking at is the ‘right’ one.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 102 / 140
![Page 423: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/423.jpg)
Agnostic views on regression
[Yi |Xi ] ∼ N(X ′i β, σ2)
These assumptions assume we know a lot about how Yi is ‘generated’.
Justifications for using OLS (like BLUE/BUE) often invoke theseassumptions which are unlikely to hold exactly.
Alternative: take an agnostic view on regression.I use OLS without believing these assumptions.I lean on two things: A2 i.i.d. sample, asymptotics (large-sample
properties)
Lose the distributional assumptions, focus on the conditionalexpectation function (CEF):
µ(x) = E[Yi |Xi = x ] =∑y
y · P[Yi = y |Xi = x ]
NB: this makes no statement about whether or not the CEF you arelooking at is the ‘right’ one.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 102 / 140
![Page 424: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/424.jpg)
Agnostic views on regression
[Yi |Xi ] ∼ N(X ′i β, σ2)
These assumptions assume we know a lot about how Yi is ‘generated’.
Justifications for using OLS (like BLUE/BUE) often invoke theseassumptions which are unlikely to hold exactly.
Alternative: take an agnostic view on regression.I use OLS without believing these assumptions.I lean on two things: A2 i.i.d. sample, asymptotics (large-sample
properties)
Lose the distributional assumptions, focus on the conditionalexpectation function (CEF):
µ(x) = E[Yi |Xi = x ] =∑y
y · P[Yi = y |Xi = x ]
NB: this makes no statement about whether or not the CEF you arelooking at is the ‘right’ one.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 102 / 140
![Page 425: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/425.jpg)
Justifying linear regression
Define linear regression:
β = arg minb
E[(Yi − X ′i b)2]
The solution to this is the following:
β = E[XiX′i ]−1E[XiYi ]
Note that the is the population coefficient vector, not the estimatoryet.
In other words, even a non-linear CEF has a “true” linearapproximation, even though that approximation may not be great.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 103 / 140
![Page 426: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/426.jpg)
Justifying linear regression
Define linear regression:
β = arg minb
E[(Yi − X ′i b)2]
The solution to this is the following:
β = E[XiX′i ]−1E[XiYi ]
Note that the is the population coefficient vector, not the estimatoryet.
In other words, even a non-linear CEF has a “true” linearapproximation, even though that approximation may not be great.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 103 / 140
![Page 427: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/427.jpg)
Justifying linear regression
Define linear regression:
β = arg minb
E[(Yi − X ′i b)2]
The solution to this is the following:
β = E[XiX′i ]−1E[XiYi ]
Note that the is the population coefficient vector, not the estimatoryet.
In other words, even a non-linear CEF has a “true” linearapproximation, even though that approximation may not be great.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 103 / 140
![Page 428: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/428.jpg)
Justifying linear regression
Define linear regression:
β = arg minb
E[(Yi − X ′i b)2]
The solution to this is the following:
β = E[XiX′i ]−1E[XiYi ]
Note that the is the population coefficient vector, not the estimatoryet.
In other words, even a non-linear CEF has a “true” linearapproximation, even though that approximation may not be great.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 103 / 140
![Page 429: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/429.jpg)
Regression anatomy
Consider simple linear regression:
(α, β) = arg mina,b
E[(Yi − a− bXi )
2]
In this case, we can write the population/true slope β as:
β = E[XiX′i ]−1E[XiYi ] =
Cov(Yi ,Xi )
Var[Xi ]
With more covariates, β is more complicated, but we can still write itlike this.
Let Xki be the residual from a regression of Xki on all the otherindependent variables. Then, βk , the coefficient for Xki is:
βk =Cov(Yi , Xki )
Var(Xki )
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 104 / 140
![Page 430: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/430.jpg)
Regression anatomy
Consider simple linear regression:
(α, β) = arg mina,b
E[(Yi − a− bXi )
2]
In this case, we can write the population/true slope β as:
β = E[XiX′i ]−1E[XiYi ] =
Cov(Yi ,Xi )
Var[Xi ]
With more covariates, β is more complicated, but we can still write itlike this.
Let Xki be the residual from a regression of Xki on all the otherindependent variables. Then, βk , the coefficient for Xki is:
βk =Cov(Yi , Xki )
Var(Xki )
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 104 / 140
![Page 431: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/431.jpg)
Regression anatomy
Consider simple linear regression:
(α, β) = arg mina,b
E[(Yi − a− bXi )
2]
In this case, we can write the population/true slope β as:
β = E[XiX′i ]−1E[XiYi ] =
Cov(Yi ,Xi )
Var[Xi ]
With more covariates, β is more complicated, but we can still write itlike this.
Let Xki be the residual from a regression of Xki on all the otherindependent variables. Then, βk , the coefficient for Xki is:
βk =Cov(Yi , Xki )
Var(Xki )
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 104 / 140
![Page 432: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/432.jpg)
Regression anatomy
Consider simple linear regression:
(α, β) = arg mina,b
E[(Yi − a− bXi )
2]
In this case, we can write the population/true slope β as:
β = E[XiX′i ]−1E[XiYi ] =
Cov(Yi ,Xi )
Var[Xi ]
With more covariates, β is more complicated, but we can still write itlike this.
Let Xki be the residual from a regression of Xki on all the otherindependent variables. Then, βk , the coefficient for Xki is:
βk =Cov(Yi , Xki )
Var(Xki )
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 104 / 140
![Page 433: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/433.jpg)
Justification 1: Linear CEFs
Justification 1: if the CEF is linear, the population regression functionis it. That is, if E [Yi |Xi ] = X ′i b, then b = β.
When would we expect the CEF to be linear? Two cases.
1 Outcome and covariates are multivariate normal.2 Linear regression model is saturated.
A model is saturated if there are as many parameters as there arepossible combination of the Xi variables.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 105 / 140
![Page 434: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/434.jpg)
Justification 1: Linear CEFs
Justification 1: if the CEF is linear, the population regression functionis it. That is, if E [Yi |Xi ] = X ′i b, then b = β.
When would we expect the CEF to be linear? Two cases.
1 Outcome and covariates are multivariate normal.2 Linear regression model is saturated.
A model is saturated if there are as many parameters as there arepossible combination of the Xi variables.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 105 / 140
![Page 435: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/435.jpg)
Justification 1: Linear CEFs
Justification 1: if the CEF is linear, the population regression functionis it. That is, if E [Yi |Xi ] = X ′i b, then b = β.
When would we expect the CEF to be linear? Two cases.
1 Outcome and covariates are multivariate normal.2 Linear regression model is saturated.
A model is saturated if there are as many parameters as there arepossible combination of the Xi variables.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 105 / 140
![Page 436: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/436.jpg)
Justification 1: Linear CEFs
Justification 1: if the CEF is linear, the population regression functionis it. That is, if E [Yi |Xi ] = X ′i b, then b = β.
When would we expect the CEF to be linear? Two cases.
1 Outcome and covariates are multivariate normal.2 Linear regression model is saturated.
A model is saturated if there are as many parameters as there arepossible combination of the Xi variables.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 105 / 140
![Page 437: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/437.jpg)
Justification 1: Linear CEFs
Justification 1: if the CEF is linear, the population regression functionis it. That is, if E [Yi |Xi ] = X ′i b, then b = β.
When would we expect the CEF to be linear? Two cases.
1 Outcome and covariates are multivariate normal.2 Linear regression model is saturated.
A model is saturated if there are as many parameters as there arepossible combination of the Xi variables.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 105 / 140
![Page 438: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/438.jpg)
Saturated model example
Two binary variables, X1i for marriage status and X2i for havingchildren.
Four possible values of Xi , four possible values of µ(Xi ):
E [Yi |X1i = 0,X2i = 0]
E [Yi |X1i = 1,X2i = 0]
E [Yi |X1i = 0,X2i = 1]
E [Yi |X1i = 1,X2i = 1]
We can write the CEF as follows:
E [Yi |X1i ,X2i ] = α + βX1i + γX2i + δ(X1iX2i )
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 106 / 140
![Page 439: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/439.jpg)
Saturated model example
Two binary variables, X1i for marriage status and X2i for havingchildren.
Four possible values of Xi , four possible values of µ(Xi ):
E [Yi |X1i = 0,X2i = 0]
E [Yi |X1i = 1,X2i = 0]
E [Yi |X1i = 0,X2i = 1]
E [Yi |X1i = 1,X2i = 1]
We can write the CEF as follows:
E [Yi |X1i ,X2i ] = α + βX1i + γX2i + δ(X1iX2i )
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 106 / 140
![Page 440: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/440.jpg)
Saturated model example
Two binary variables, X1i for marriage status and X2i for havingchildren.
Four possible values of Xi , four possible values of µ(Xi ):
E [Yi |X1i = 0,X2i = 0]
E [Yi |X1i = 1,X2i = 0]
E [Yi |X1i = 0,X2i = 1]
E [Yi |X1i = 1,X2i = 1]
We can write the CEF as follows:
E [Yi |X1i ,X2i ] = α + βX1i + γX2i + δ(X1iX2i )
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 106 / 140
![Page 441: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/441.jpg)
Saturated model example
Two binary variables, X1i for marriage status and X2i for havingchildren.
Four possible values of Xi , four possible values of µ(Xi ):
E [Yi |X1i = 0,X2i = 0] = α
E [Yi |X1i = 1,X2i = 0]
E [Yi |X1i = 0,X2i = 1]
E [Yi |X1i = 1,X2i = 1]
We can write the CEF as follows:
E [Yi |X1i ,X2i ] = α + βX1i + γX2i + δ(X1iX2i )
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 106 / 140
![Page 442: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/442.jpg)
Saturated model example
Two binary variables, X1i for marriage status and X2i for havingchildren.
Four possible values of Xi , four possible values of µ(Xi ):
E [Yi |X1i = 0,X2i = 0] = α
E [Yi |X1i = 1,X2i = 0] = α + β
E [Yi |X1i = 0,X2i = 1]
E [Yi |X1i = 1,X2i = 1]
We can write the CEF as follows:
E [Yi |X1i ,X2i ] = α + βX1i + γX2i + δ(X1iX2i )
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 106 / 140
![Page 443: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/443.jpg)
Saturated model example
Two binary variables, X1i for marriage status and X2i for havingchildren.
Four possible values of Xi , four possible values of µ(Xi ):
E [Yi |X1i = 0,X2i = 0] = α
E [Yi |X1i = 1,X2i = 0] = α + β
E [Yi |X1i = 0,X2i = 1] = α + γ
E [Yi |X1i = 1,X2i = 1]
We can write the CEF as follows:
E [Yi |X1i ,X2i ] = α + βX1i + γX2i + δ(X1iX2i )
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 106 / 140
![Page 444: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/444.jpg)
Saturated model example
Two binary variables, X1i for marriage status and X2i for havingchildren.
Four possible values of Xi , four possible values of µ(Xi ):
E [Yi |X1i = 0,X2i = 0] = α
E [Yi |X1i = 1,X2i = 0] = α + β
E [Yi |X1i = 0,X2i = 1] = α + γ
E [Yi |X1i = 1,X2i = 1] = α + β + γ + δ
We can write the CEF as follows:
E [Yi |X1i ,X2i ] = α + βX1i + γX2i + δ(X1iX2i )
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 106 / 140
![Page 445: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/445.jpg)
Saturated model example
E [Yi |X1i ,X2i ] = α + βX1i + γX2i + δ(X1iX2i )
Basically, each value of µ(Xi ) is being estimated separately.
I within-strata estimation.I No borrowing of information from across values of Xi .
Requires a set of dummies for each categorical variable plus allinteractions.
Or, a series of dummies for each unique combination of Xi .
This makes linearity hold mechanically and so linearity is not anassumption.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 107 / 140
![Page 446: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/446.jpg)
Saturated model example
E [Yi |X1i ,X2i ] = α + βX1i + γX2i + δ(X1iX2i )
Basically, each value of µ(Xi ) is being estimated separately.
I within-strata estimation.
I No borrowing of information from across values of Xi .
Requires a set of dummies for each categorical variable plus allinteractions.
Or, a series of dummies for each unique combination of Xi .
This makes linearity hold mechanically and so linearity is not anassumption.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 107 / 140
![Page 447: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/447.jpg)
Saturated model example
E [Yi |X1i ,X2i ] = α + βX1i + γX2i + δ(X1iX2i )
Basically, each value of µ(Xi ) is being estimated separately.
I within-strata estimation.I No borrowing of information from across values of Xi .
Requires a set of dummies for each categorical variable plus allinteractions.
Or, a series of dummies for each unique combination of Xi .
This makes linearity hold mechanically and so linearity is not anassumption.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 107 / 140
![Page 448: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/448.jpg)
Saturated model example
E [Yi |X1i ,X2i ] = α + βX1i + γX2i + δ(X1iX2i )
Basically, each value of µ(Xi ) is being estimated separately.
I within-strata estimation.I No borrowing of information from across values of Xi .
Requires a set of dummies for each categorical variable plus allinteractions.
Or, a series of dummies for each unique combination of Xi .
This makes linearity hold mechanically and so linearity is not anassumption.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 107 / 140
![Page 449: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/449.jpg)
Saturated model example
E [Yi |X1i ,X2i ] = α + βX1i + γX2i + δ(X1iX2i )
Basically, each value of µ(Xi ) is being estimated separately.
I within-strata estimation.I No borrowing of information from across values of Xi .
Requires a set of dummies for each categorical variable plus allinteractions.
Or, a series of dummies for each unique combination of Xi .
This makes linearity hold mechanically and so linearity is not anassumption.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 107 / 140
![Page 450: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/450.jpg)
Saturated model example
E [Yi |X1i ,X2i ] = α + βX1i + γX2i + δ(X1iX2i )
Basically, each value of µ(Xi ) is being estimated separately.
I within-strata estimation.I No borrowing of information from across values of Xi .
Requires a set of dummies for each categorical variable plus allinteractions.
Or, a series of dummies for each unique combination of Xi .
This makes linearity hold mechanically and so linearity is not anassumption.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 107 / 140
![Page 451: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/451.jpg)
Saturated model example
Washington (AER) data on the effects of daughters.
We’ll look at the relationship between voting and number of kids(causal?).
girls <- foreign::read.dta("girls.dta")
head(girls[, c("name", "totchi", "aauw")])
## name totchi aauw
## 1 ABERCROMBIE, NEIL 0 100
## 2 ACKERMAN, GARY L. 3 88
## 3 ADERHOLT, ROBERT B. 0 0
## 4 ALLEN, THOMAS H. 2 100
## 5 ANDREWS, ROBERT E. 2 100
## 6 ARCHER, W.R. 7 0
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 108 / 140
![Page 452: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/452.jpg)
Saturated model example
Washington (AER) data on the effects of daughters.
We’ll look at the relationship between voting and number of kids(causal?).
girls <- foreign::read.dta("girls.dta")
head(girls[, c("name", "totchi", "aauw")])
## name totchi aauw
## 1 ABERCROMBIE, NEIL 0 100
## 2 ACKERMAN, GARY L. 3 88
## 3 ADERHOLT, ROBERT B. 0 0
## 4 ALLEN, THOMAS H. 2 100
## 5 ANDREWS, ROBERT E. 2 100
## 6 ARCHER, W.R. 7 0
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 108 / 140
![Page 453: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/453.jpg)
Saturated model example
Washington (AER) data on the effects of daughters.
We’ll look at the relationship between voting and number of kids(causal?).
girls <- foreign::read.dta("girls.dta")
head(girls[, c("name", "totchi", "aauw")])
## name totchi aauw
## 1 ABERCROMBIE, NEIL 0 100
## 2 ACKERMAN, GARY L. 3 88
## 3 ADERHOLT, ROBERT B. 0 0
## 4 ALLEN, THOMAS H. 2 100
## 5 ANDREWS, ROBERT E. 2 100
## 6 ARCHER, W.R. 7 0
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 108 / 140
![Page 454: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/454.jpg)
Linear model
summary(lm(aauw ~ totchi, data = girls))
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 61.31 1.81 33.81 <2e-16 ***
## totchi -5.33 0.62 -8.59 <2e-16 ***
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
##
## Residual standard error: 42 on 1733 degrees of freedom
## (5 observations deleted due to missingness)
## Multiple R-squared: 0.0408, Adjusted R-squared: 0.0403
## F-statistic: 73.8 on 1 and 1733 DF, p-value: <2e-16
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 109 / 140
![Page 455: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/455.jpg)
Saturated modelsummary(lm(aauw ~ as.factor(totchi), data = girls))
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 56.41 2.76 20.42 < 2e-16 ***
## as.factor(totchi)1 5.45 4.11 1.33 0.1851
## as.factor(totchi)2 -3.80 3.27 -1.16 0.2454
## as.factor(totchi)3 -13.65 3.45 -3.95 8.1e-05 ***
## as.factor(totchi)4 -19.31 4.01 -4.82 1.6e-06 ***
## as.factor(totchi)5 -15.46 4.85 -3.19 0.0015 **
## as.factor(totchi)6 -33.59 10.42 -3.22 0.0013 **
## as.factor(totchi)7 -17.13 11.41 -1.50 0.1336
## as.factor(totchi)8 -55.33 12.28 -4.51 7.0e-06 ***
## as.factor(totchi)9 -50.41 24.08 -2.09 0.0364 *
## as.factor(totchi)10 -53.41 20.90 -2.56 0.0107 *
## as.factor(totchi)12 -56.41 41.53 -1.36 0.1745
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
##
## Residual standard error: 41 on 1723 degrees of freedom
## (5 observations deleted due to missingness)
## Multiple R-squared: 0.0506, Adjusted R-squared: 0.0446
## F-statistic: 8.36 on 11 and 1723 DF, p-value: 1.84e-14
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 110 / 140
![Page 456: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/456.jpg)
Saturated model minus the constantsummary(lm(aauw ~ as.factor(totchi) - 1, data = girls))
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## as.factor(totchi)0 56.41 2.76 20.42 <2e-16 ***
## as.factor(totchi)1 61.86 3.05 20.31 <2e-16 ***
## as.factor(totchi)2 52.62 1.75 30.13 <2e-16 ***
## as.factor(totchi)3 42.76 2.07 20.62 <2e-16 ***
## as.factor(totchi)4 37.11 2.90 12.79 <2e-16 ***
## as.factor(totchi)5 40.95 3.99 10.27 <2e-16 ***
## as.factor(totchi)6 22.82 10.05 2.27 0.0233 *
## as.factor(totchi)7 39.29 11.07 3.55 0.0004 ***
## as.factor(totchi)8 1.08 11.96 0.09 0.9278
## as.factor(totchi)9 6.00 23.92 0.25 0.8020
## as.factor(totchi)10 3.00 20.72 0.14 0.8849
## as.factor(totchi)12 0.00 41.43 0.00 1.0000
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
##
## Residual standard error: 41 on 1723 degrees of freedom
## (5 observations deleted due to missingness)
## Multiple R-squared: 0.587, Adjusted R-squared: 0.584
## F-statistic: 204 on 12 and 1723 DF, p-value: <2e-16
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 111 / 140
![Page 457: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/457.jpg)
Compare to within-strata means
The saturated model makes no assumptions about the between-stratarelationships.
Just calculates within-strata means:
c1 <- coef(lm(aauw ~ as.factor(totchi) - 1, data = girls))
c2 <- with(girls, tapply(aauw, totchi, mean, na.rm = TRUE))
rbind(c1, c2)
## 0 1 2 3 4 5 6 7 8 9 10 12
## c1 56 62 53 43 37 41 23 39 1.1 6 3 0
## c2 56 62 53 43 37 41 23 39 1.1 6 3 0
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 112 / 140
![Page 458: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/458.jpg)
Compare to within-strata means
The saturated model makes no assumptions about the between-stratarelationships.
Just calculates within-strata means:
c1 <- coef(lm(aauw ~ as.factor(totchi) - 1, data = girls))
c2 <- with(girls, tapply(aauw, totchi, mean, na.rm = TRUE))
rbind(c1, c2)
## 0 1 2 3 4 5 6 7 8 9 10 12
## c1 56 62 53 43 37 41 23 39 1.1 6 3 0
## c2 56 62 53 43 37 41 23 39 1.1 6 3 0
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 112 / 140
![Page 459: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/459.jpg)
Compare to within-strata means
The saturated model makes no assumptions about the between-stratarelationships.
Just calculates within-strata means:
c1 <- coef(lm(aauw ~ as.factor(totchi) - 1, data = girls))
c2 <- with(girls, tapply(aauw, totchi, mean, na.rm = TRUE))
rbind(c1, c2)
## 0 1 2 3 4 5 6 7 8 9 10 12
## c1 56 62 53 43 37 41 23 39 1.1 6 3 0
## c2 56 62 53 43 37 41 23 39 1.1 6 3 0
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 112 / 140
![Page 460: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/460.jpg)
Other justifications for OLS
Justification 2: X ′i β is the best linear predictor (in a mean-squarederror sense) of Yi .
I Why?
β = arg minb E[(Yi − X ′i b)2]
Justification 3: X ′i β provides the minimum mean squared error linearapproximation to E [Yi |Xi ].
Even if the CEF is not linear, a linear regression provides the bestlinear approximation to that CEF.
Don’t need to believe the assumptions (linearity) in order to useregression as a good approximation to the CEF.
Warning if the CEF is very nonlinear then this approximation could beterrible!!
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 113 / 140
![Page 461: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/461.jpg)
Other justifications for OLS
Justification 2: X ′i β is the best linear predictor (in a mean-squarederror sense) of Yi .
I Why?
β = arg minb E[(Yi − X ′i b)2]
Justification 3: X ′i β provides the minimum mean squared error linearapproximation to E [Yi |Xi ].
Even if the CEF is not linear, a linear regression provides the bestlinear approximation to that CEF.
Don’t need to believe the assumptions (linearity) in order to useregression as a good approximation to the CEF.
Warning if the CEF is very nonlinear then this approximation could beterrible!!
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 113 / 140
![Page 462: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/462.jpg)
Other justifications for OLS
Justification 2: X ′i β is the best linear predictor (in a mean-squarederror sense) of Yi .
I Why? β = arg minb E[(Yi − X ′i b)2]
Justification 3: X ′i β provides the minimum mean squared error linearapproximation to E [Yi |Xi ].
Even if the CEF is not linear, a linear regression provides the bestlinear approximation to that CEF.
Don’t need to believe the assumptions (linearity) in order to useregression as a good approximation to the CEF.
Warning if the CEF is very nonlinear then this approximation could beterrible!!
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 113 / 140
![Page 463: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/463.jpg)
Other justifications for OLS
Justification 2: X ′i β is the best linear predictor (in a mean-squarederror sense) of Yi .
I Why? β = arg minb E[(Yi − X ′i b)2]
Justification 3: X ′i β provides the minimum mean squared error linearapproximation to E [Yi |Xi ].
Even if the CEF is not linear, a linear regression provides the bestlinear approximation to that CEF.
Don’t need to believe the assumptions (linearity) in order to useregression as a good approximation to the CEF.
Warning if the CEF is very nonlinear then this approximation could beterrible!!
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 113 / 140
![Page 464: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/464.jpg)
Other justifications for OLS
Justification 2: X ′i β is the best linear predictor (in a mean-squarederror sense) of Yi .
I Why? β = arg minb E[(Yi − X ′i b)2]
Justification 3: X ′i β provides the minimum mean squared error linearapproximation to E [Yi |Xi ].
Even if the CEF is not linear, a linear regression provides the bestlinear approximation to that CEF.
Don’t need to believe the assumptions (linearity) in order to useregression as a good approximation to the CEF.
Warning if the CEF is very nonlinear then this approximation could beterrible!!
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 113 / 140
![Page 465: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/465.jpg)
Other justifications for OLS
Justification 2: X ′i β is the best linear predictor (in a mean-squarederror sense) of Yi .
I Why? β = arg minb E[(Yi − X ′i b)2]
Justification 3: X ′i β provides the minimum mean squared error linearapproximation to E [Yi |Xi ].
Even if the CEF is not linear, a linear regression provides the bestlinear approximation to that CEF.
Don’t need to believe the assumptions (linearity) in order to useregression as a good approximation to the CEF.
Warning if the CEF is very nonlinear then this approximation could beterrible!!
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 113 / 140
![Page 466: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/466.jpg)
Other justifications for OLS
Justification 2: X ′i β is the best linear predictor (in a mean-squarederror sense) of Yi .
I Why? β = arg minb E[(Yi − X ′i b)2]
Justification 3: X ′i β provides the minimum mean squared error linearapproximation to E [Yi |Xi ].
Even if the CEF is not linear, a linear regression provides the bestlinear approximation to that CEF.
Don’t need to believe the assumptions (linearity) in order to useregression as a good approximation to the CEF.
Warning if the CEF is very nonlinear then this approximation could beterrible!!
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 113 / 140
![Page 467: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/467.jpg)
The error terms
Let’s define the error term: ei ≡ Yi − X ′i β so that:
Yi = X ′i β + [Yi − X ′i β] = X ′i β + ei
Note the residual ei is uncorrelated with Xi :
E[Xiei ] = E[Xi (Yi − X ′i β)]
= E[XiYi ]− E[XiX′i β]
= E[XiYi ]− E[XiX
′i E[XiX
′i ]−1E[XiYi ]
]= E[XiYi ]− E[XiX
′i ]E[XiX
′i ]−1E[XiYi ]
= E[XiYi ]− E[XiYi ] = 0
No assumptions on the linearity of E[Yi |Xi ].
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 114 / 140
![Page 468: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/468.jpg)
The error terms
Let’s define the error term: ei ≡ Yi − X ′i β so that:
Yi = X ′i β + [Yi − X ′i β] = X ′i β + ei
Note the residual ei is uncorrelated with Xi :
E[Xiei ] = E[Xi (Yi − X ′i β)]
= E[XiYi ]− E[XiX′i β]
= E[XiYi ]− E[XiX
′i E[XiX
′i ]−1E[XiYi ]
]= E[XiYi ]− E[XiX
′i ]E[XiX
′i ]−1E[XiYi ]
= E[XiYi ]− E[XiYi ] = 0
No assumptions on the linearity of E[Yi |Xi ].
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 114 / 140
![Page 469: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/469.jpg)
The error terms
Let’s define the error term: ei ≡ Yi − X ′i β so that:
Yi = X ′i β + [Yi − X ′i β] = X ′i β + ei
Note the residual ei is uncorrelated with Xi :
E[Xiei ] = E[Xi (Yi − X ′i β)]
= E[XiYi ]− E[XiX′i β]
= E[XiYi ]− E[XiX
′i E[XiX
′i ]−1E[XiYi ]
]= E[XiYi ]− E[XiX
′i ]E[XiX
′i ]−1E[XiYi ]
= E[XiYi ]− E[XiYi ] = 0
No assumptions on the linearity of E[Yi |Xi ].
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 114 / 140
![Page 470: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/470.jpg)
The error terms
Let’s define the error term: ei ≡ Yi − X ′i β so that:
Yi = X ′i β + [Yi − X ′i β] = X ′i β + ei
Note the residual ei is uncorrelated with Xi :
E[Xiei ] = E[Xi (Yi − X ′i β)]
= E[XiYi ]− E[XiX′i β]
= E[XiYi ]− E[XiX
′i E[XiX
′i ]−1E[XiYi ]
]= E[XiYi ]− E[XiX
′i ]E[XiX
′i ]−1E[XiYi ]
= E[XiYi ]− E[XiYi ] = 0
No assumptions on the linearity of E[Yi |Xi ].
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 114 / 140
![Page 471: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/471.jpg)
The error terms
Let’s define the error term: ei ≡ Yi − X ′i β so that:
Yi = X ′i β + [Yi − X ′i β] = X ′i β + ei
Note the residual ei is uncorrelated with Xi :
E[Xiei ] = E[Xi (Yi − X ′i β)]
= E[XiYi ]− E[XiX′i β]
= E[XiYi ]− E[XiX
′i E[XiX
′i ]−1E[XiYi ]
]= E[XiYi ]− E[XiX
′i ]E[XiX
′i ]−1E[XiYi ]
= E[XiYi ]− E[XiYi ] = 0
No assumptions on the linearity of E[Yi |Xi ].
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 114 / 140
![Page 472: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/472.jpg)
OLS estimator
We know the population value of β is:
β = E[XiX′i ]−1E[XiYi ]
How do we get an estimator of this?
Plug-in principle replace population expectation with sampleversions:
β =
[1
N
∑i
XiX′i
]−11
N
∑i
XiYi
If you work through the matrix algebra, this turns out to be:
β =(X′X
)−1X′y
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 115 / 140
![Page 473: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/473.jpg)
OLS estimator
We know the population value of β is:
β = E[XiX′i ]−1E[XiYi ]
How do we get an estimator of this?
Plug-in principle replace population expectation with sampleversions:
β =
[1
N
∑i
XiX′i
]−11
N
∑i
XiYi
If you work through the matrix algebra, this turns out to be:
β =(X′X
)−1X′y
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 115 / 140
![Page 474: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/474.jpg)
OLS estimator
We know the population value of β is:
β = E[XiX′i ]−1E[XiYi ]
How do we get an estimator of this?
Plug-in principle replace population expectation with sampleversions:
β =
[1
N
∑i
XiX′i
]−11
N
∑i
XiYi
If you work through the matrix algebra, this turns out to be:
β =(X′X
)−1X′y
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 115 / 140
![Page 475: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/475.jpg)
OLS estimator
We know the population value of β is:
β = E[XiX′i ]−1E[XiYi ]
How do we get an estimator of this?
Plug-in principle replace population expectation with sampleversions:
β =
[1
N
∑i
XiX′i
]−11
N
∑i
XiYi
If you work through the matrix algebra, this turns out to be:
β =(X′X
)−1X′y
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 115 / 140
![Page 476: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/476.jpg)
Asymptotic OLS inference
With this representation in hand, we can write the OLS estimator asfollows:
β = β +
[∑i
XiX′i
]−1∑i
Xiei
Core idea:∑
i Xiei is the sum of r.v.s so the CLT applies.
That, plus some simple asymptotic theory allows us to say:
√N(β − β) N(0,Ω)
Converges in distribution to a Normal distribution with mean vector 0and covariance matrix, Ω:
Ω = E[XiX′i ]−1E[XiX
′i e
2i ]E[XiX
′i ]−1.
No linearity assumption needed!
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 116 / 140
![Page 477: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/477.jpg)
Asymptotic OLS inference
With this representation in hand, we can write the OLS estimator asfollows:
β = β +
[∑i
XiX′i
]−1∑i
Xiei
Core idea:∑
i Xiei is the sum of r.v.s so the CLT applies.
That, plus some simple asymptotic theory allows us to say:
√N(β − β) N(0,Ω)
Converges in distribution to a Normal distribution with mean vector 0and covariance matrix, Ω:
Ω = E[XiX′i ]−1E[XiX
′i e
2i ]E[XiX
′i ]−1.
No linearity assumption needed!
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 116 / 140
![Page 478: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/478.jpg)
Asymptotic OLS inference
With this representation in hand, we can write the OLS estimator asfollows:
β = β +
[∑i
XiX′i
]−1∑i
Xiei
Core idea:∑
i Xiei is the sum of r.v.s so the CLT applies.
That, plus some simple asymptotic theory allows us to say:
√N(β − β) N(0,Ω)
Converges in distribution to a Normal distribution with mean vector 0and covariance matrix, Ω:
Ω = E[XiX′i ]−1E[XiX
′i e
2i ]E[XiX
′i ]−1.
No linearity assumption needed!
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 116 / 140
![Page 479: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/479.jpg)
Asymptotic OLS inference
With this representation in hand, we can write the OLS estimator asfollows:
β = β +
[∑i
XiX′i
]−1∑i
Xiei
Core idea:∑
i Xiei is the sum of r.v.s so the CLT applies.
That, plus some simple asymptotic theory allows us to say:
√N(β − β) N(0,Ω)
Converges in distribution to a Normal distribution with mean vector 0and covariance matrix, Ω:
Ω = E[XiX′i ]−1E[XiX
′i e
2i ]E[XiX
′i ]−1.
No linearity assumption needed!
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 116 / 140
![Page 480: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/480.jpg)
Asymptotic OLS inference
With this representation in hand, we can write the OLS estimator asfollows:
β = β +
[∑i
XiX′i
]−1∑i
Xiei
Core idea:∑
i Xiei is the sum of r.v.s so the CLT applies.
That, plus some simple asymptotic theory allows us to say:
√N(β − β) N(0,Ω)
Converges in distribution to a Normal distribution with mean vector 0and covariance matrix, Ω:
Ω = E[XiX′i ]−1E[XiX
′i e
2i ]E[XiX
′i ]−1.
No linearity assumption needed!
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 116 / 140
![Page 481: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/481.jpg)
Estimating the variance
In large samples then:
√N(β − β) ∼ N(0,Ω)
How to estimate Ω? Plug-in principle again!
Ω =
[∑i
XiX′i
]−1 [∑i
XiX′i e
2i
][∑i
XiX′i
]−1
.
Replace ei with its emprical counterpart (residuals) ei = Yi − X ′i β.
Replace the population moments of Xi with their sample counterparts.
The square root of the diagonals of this covariance matrix are the“robust” or Huber-White standard errors (we will return to this in afew classes).
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 117 / 140
![Page 482: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/482.jpg)
Estimating the variance
In large samples then:
√N(β − β) ∼ N(0,Ω)
How to estimate Ω? Plug-in principle again!
Ω =
[∑i
XiX′i
]−1 [∑i
XiX′i e
2i
][∑i
XiX′i
]−1
.
Replace ei with its emprical counterpart (residuals) ei = Yi − X ′i β.
Replace the population moments of Xi with their sample counterparts.
The square root of the diagonals of this covariance matrix are the“robust” or Huber-White standard errors (we will return to this in afew classes).
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 117 / 140
![Page 483: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/483.jpg)
Estimating the variance
In large samples then:
√N(β − β) ∼ N(0,Ω)
How to estimate Ω? Plug-in principle again!
Ω =
[∑i
XiX′i
]−1 [∑i
XiX′i e
2i
][∑i
XiX′i
]−1
.
Replace ei with its emprical counterpart (residuals) ei = Yi − X ′i β.
Replace the population moments of Xi with their sample counterparts.
The square root of the diagonals of this covariance matrix are the“robust” or Huber-White standard errors (we will return to this in afew classes).
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 117 / 140
![Page 484: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/484.jpg)
Estimating the variance
In large samples then:
√N(β − β) ∼ N(0,Ω)
How to estimate Ω? Plug-in principle again!
Ω =
[∑i
XiX′i
]−1 [∑i
XiX′i e
2i
][∑i
XiX′i
]−1
.
Replace ei with its emprical counterpart (residuals) ei = Yi − X ′i β.
Replace the population moments of Xi with their sample counterparts.
The square root of the diagonals of this covariance matrix are the“robust” or Huber-White standard errors (we will return to this in afew classes).
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 117 / 140
![Page 485: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/485.jpg)
Estimating the variance
In large samples then:
√N(β − β) ∼ N(0,Ω)
How to estimate Ω? Plug-in principle again!
Ω =
[∑i
XiX′i
]−1 [∑i
XiX′i e
2i
][∑i
XiX′i
]−1
.
Replace ei with its emprical counterpart (residuals) ei = Yi − X ′i β.
Replace the population moments of Xi with their sample counterparts.
The square root of the diagonals of this covariance matrix are the“robust” or Huber-White standard errors (we will return to this in afew classes).
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 117 / 140
![Page 486: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/486.jpg)
The Agnostic Statistics Perspective
The key insight here is that we can derive estimators for properties ofthe conditional expectation function under somewhat weakerassumptions
They still rely heavily on large samples (asymptotic results) andindependent samples.
See the Aronow and Miller textbook for a great explanation anddefense of this worldview (also Lin’s 2013 ‘Agnostic notes onregression adjustments to experimental data’).
We will come back to re-thinking the implications for finite samplesduring the diagnostics classes.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 118 / 140
![Page 487: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/487.jpg)
The Agnostic Statistics Perspective
The key insight here is that we can derive estimators for properties ofthe conditional expectation function under somewhat weakerassumptions
They still rely heavily on large samples (asymptotic results) andindependent samples.
See the Aronow and Miller textbook for a great explanation anddefense of this worldview (also Lin’s 2013 ‘Agnostic notes onregression adjustments to experimental data’).
We will come back to re-thinking the implications for finite samplesduring the diagnostics classes.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 118 / 140
![Page 488: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/488.jpg)
The Agnostic Statistics Perspective
The key insight here is that we can derive estimators for properties ofthe conditional expectation function under somewhat weakerassumptions
They still rely heavily on large samples (asymptotic results) andindependent samples.
See the Aronow and Miller textbook for a great explanation anddefense of this worldview (also Lin’s 2013 ‘Agnostic notes onregression adjustments to experimental data’).
We will come back to re-thinking the implications for finite samplesduring the diagnostics classes.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 118 / 140
![Page 489: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/489.jpg)
The Agnostic Statistics Perspective
The key insight here is that we can derive estimators for properties ofthe conditional expectation function under somewhat weakerassumptions
They still rely heavily on large samples (asymptotic results) andindependent samples.
See the Aronow and Miller textbook for a great explanation anddefense of this worldview (also Lin’s 2013 ‘Agnostic notes onregression adjustments to experimental data’).
We will come back to re-thinking the implications for finite samplesduring the diagnostics classes.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 118 / 140
![Page 490: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/490.jpg)
1 Matrix Form of Regression
2 OLS inference in matrix form
3 Standard Hypothesis Tests
4 Testing Joint Significance
5 Testing Linear Hypotheses: The General Case
6 Fun With(out) Weights
7 Appendix: Derivations and Consistency
8 The Problems with p-values
9 Agnostic Regression
10 Inference via the Bootstrap
11 Fun With Weights
12 Appendix: Tricky p-value Example
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 119 / 140
![Page 491: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/491.jpg)
1 Matrix Form of Regression
2 OLS inference in matrix form
3 Standard Hypothesis Tests
4 Testing Joint Significance
5 Testing Linear Hypotheses: The General Case
6 Fun With(out) Weights
7 Appendix: Derivations and Consistency
8 The Problems with p-values
9 Agnostic Regression
10 Inference via the Bootstrap
11 Fun With Weights
12 Appendix: Tricky p-value Example
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 119 / 140
![Page 492: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/492.jpg)
The Bootstrap
How do we do inference if we don’t know how to construct the samplingdistribution for our estimator?
Idea of the Bootstrap: Use the empirical CDF (eCDF) as a plug-in for theCDF, and resample from that.
We are pretending our sample eCDF looks sufficiently close to our trueCDF, and so we’re sampling from the eCDF as an approximation torepeated sampling from the true CDF. This is called a resampling method.
1 Take a with replacement sample of size n from our sample.
2 Calculate our would-be estimate using this bootstrap sample.
3 Repeat steps 1 and 2 many (B) times.
4 Using the resulting collection of bootstrap estimates to calculateestimates of the standard error or confidence intervals (more later).
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 120 / 140
![Page 493: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/493.jpg)
The Bootstrap
How do we do inference if we don’t know how to construct the samplingdistribution for our estimator?
Idea of the Bootstrap: Use the empirical CDF (eCDF) as a plug-in for theCDF, and resample from that.
We are pretending our sample eCDF looks sufficiently close to our trueCDF, and so we’re sampling from the eCDF as an approximation torepeated sampling from the true CDF. This is called a resampling method.
1 Take a with replacement sample of size n from our sample.
2 Calculate our would-be estimate using this bootstrap sample.
3 Repeat steps 1 and 2 many (B) times.
4 Using the resulting collection of bootstrap estimates to calculateestimates of the standard error or confidence intervals (more later).
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 120 / 140
![Page 494: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/494.jpg)
The Bootstrap
How do we do inference if we don’t know how to construct the samplingdistribution for our estimator?
Idea of the Bootstrap: Use the empirical CDF (eCDF) as a plug-in for theCDF, and resample from that.
We are pretending our sample eCDF looks sufficiently close to our trueCDF, and so we’re sampling from the eCDF as an approximation torepeated sampling from the true CDF. This is called a resampling method.
1 Take a with replacement sample of size n from our sample.
2 Calculate our would-be estimate using this bootstrap sample.
3 Repeat steps 1 and 2 many (B) times.
4 Using the resulting collection of bootstrap estimates to calculateestimates of the standard error or confidence intervals (more later).
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 120 / 140
![Page 495: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/495.jpg)
The Bootstrap
How do we do inference if we don’t know how to construct the samplingdistribution for our estimator?
Idea of the Bootstrap: Use the empirical CDF (eCDF) as a plug-in for theCDF, and resample from that.
We are pretending our sample eCDF looks sufficiently close to our trueCDF, and so we’re sampling from the eCDF as an approximation torepeated sampling from the true CDF. This is called a resampling method.
1 Take a with replacement sample of size n from our sample.
2 Calculate our would-be estimate using this bootstrap sample.
3 Repeat steps 1 and 2 many (B) times.
4 Using the resulting collection of bootstrap estimates to calculateestimates of the standard error or confidence intervals (more later).
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 120 / 140
![Page 496: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/496.jpg)
The Bootstrap
How do we do inference if we don’t know how to construct the samplingdistribution for our estimator?
Idea of the Bootstrap: Use the empirical CDF (eCDF) as a plug-in for theCDF, and resample from that.
We are pretending our sample eCDF looks sufficiently close to our trueCDF, and so we’re sampling from the eCDF as an approximation torepeated sampling from the true CDF. This is called a resampling method.
1 Take a with replacement sample of size n from our sample.
2 Calculate our would-be estimate using this bootstrap sample.
3 Repeat steps 1 and 2 many (B) times.
4 Using the resulting collection of bootstrap estimates to calculateestimates of the standard error or confidence intervals (more later).
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 120 / 140
![Page 497: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/497.jpg)
The Bootstrap
How do we do inference if we don’t know how to construct the samplingdistribution for our estimator?
Idea of the Bootstrap: Use the empirical CDF (eCDF) as a plug-in for theCDF, and resample from that.
We are pretending our sample eCDF looks sufficiently close to our trueCDF, and so we’re sampling from the eCDF as an approximation torepeated sampling from the true CDF. This is called a resampling method.
1 Take a with replacement sample of size n from our sample.
2 Calculate our would-be estimate using this bootstrap sample.
3 Repeat steps 1 and 2 many (B) times.
4 Using the resulting collection of bootstrap estimates to calculateestimates of the standard error or confidence intervals (more later).
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 120 / 140
![Page 498: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/498.jpg)
The Bootstrap
How do we do inference if we don’t know how to construct the samplingdistribution for our estimator?
Idea of the Bootstrap: Use the empirical CDF (eCDF) as a plug-in for theCDF, and resample from that.
We are pretending our sample eCDF looks sufficiently close to our trueCDF, and so we’re sampling from the eCDF as an approximation torepeated sampling from the true CDF. This is called a resampling method.
1 Take a with replacement sample of size n from our sample.
2 Calculate our would-be estimate using this bootstrap sample.
3 Repeat steps 1 and 2 many (B) times.
4 Using the resulting collection of bootstrap estimates to calculateestimates of the standard error or confidence intervals (more later).
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 120 / 140
![Page 499: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/499.jpg)
Why The Bootstrap
If we have a very complicated estimator, such as
X41/(X 2 − X 3)2 where samples 2 and 3 are not drawn independently
(β0 + β1)3/(2 + β22)
Then the bootstrap is very useful because we don’t have to derive theanalytical expectation and variance, we can calculate them in thebootstrap.
Bootstrap is really useful when:
you want to avoid making a normal approximation
you want to avoid making certain assumptions (i.e. homoskedasticity)
you are considering an estimator for which an analytical varianceestimator is hard or impossible to calculate.
This is the closest thing to magic I will show you all semester.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 121 / 140
![Page 500: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/500.jpg)
Why The Bootstrap
If we have a very complicated estimator, such as
X41/(X 2 − X 3)2 where samples 2 and 3 are not drawn independently
(β0 + β1)3/(2 + β22)
Then the bootstrap is very useful because we don’t have to derive theanalytical expectation and variance, we can calculate them in thebootstrap.
Bootstrap is really useful when:
you want to avoid making a normal approximation
you want to avoid making certain assumptions (i.e. homoskedasticity)
you are considering an estimator for which an analytical varianceestimator is hard or impossible to calculate.
This is the closest thing to magic I will show you all semester.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 121 / 140
![Page 501: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/501.jpg)
Why The Bootstrap
If we have a very complicated estimator, such as
X41/(X 2 − X 3)2 where samples 2 and 3 are not drawn independently
(β0 + β1)3/(2 + β22)
Then the bootstrap is very useful because we don’t have to derive theanalytical expectation and variance, we can calculate them in thebootstrap.
Bootstrap is really useful when:
you want to avoid making a normal approximation
you want to avoid making certain assumptions (i.e. homoskedasticity)
you are considering an estimator for which an analytical varianceestimator is hard or impossible to calculate.
This is the closest thing to magic I will show you all semester.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 121 / 140
![Page 502: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/502.jpg)
Why The Bootstrap
If we have a very complicated estimator, such as
X41/(X 2 − X 3)2 where samples 2 and 3 are not drawn independently
(β0 + β1)3/(2 + β22)
Then the bootstrap is very useful because we don’t have to derive theanalytical expectation and variance, we can calculate them in thebootstrap.
Bootstrap is really useful when:
you want to avoid making a normal approximation
you want to avoid making certain assumptions (i.e. homoskedasticity)
you are considering an estimator for which an analytical varianceestimator is hard or impossible to calculate.
This is the closest thing to magic I will show you all semester.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 121 / 140
![Page 503: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/503.jpg)
Why The Bootstrap
If we have a very complicated estimator, such as
X41/(X 2 − X 3)2 where samples 2 and 3 are not drawn independently
(β0 + β1)3/(2 + β22)
Then the bootstrap is very useful because we don’t have to derive theanalytical expectation and variance, we can calculate them in thebootstrap.
Bootstrap is really useful when:
you want to avoid making a normal approximation
you want to avoid making certain assumptions (i.e. homoskedasticity)
you are considering an estimator for which an analytical varianceestimator is hard or impossible to calculate.
This is the closest thing to magic I will show you all semester.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 121 / 140
![Page 504: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/504.jpg)
Why The Bootstrap
If we have a very complicated estimator, such as
X41/(X 2 − X 3)2 where samples 2 and 3 are not drawn independently
(β0 + β1)3/(2 + β22)
Then the bootstrap is very useful because we don’t have to derive theanalytical expectation and variance, we can calculate them in thebootstrap.
Bootstrap is really useful when:
you want to avoid making a normal approximation
you want to avoid making certain assumptions (i.e. homoskedasticity)
you are considering an estimator for which an analytical varianceestimator is hard or impossible to calculate.
This is the closest thing to magic I will show you all semester.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 121 / 140
![Page 505: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/505.jpg)
Why The Bootstrap
If we have a very complicated estimator, such as
X41/(X 2 − X 3)2 where samples 2 and 3 are not drawn independently
(β0 + β1)3/(2 + β22)
Then the bootstrap is very useful because we don’t have to derive theanalytical expectation and variance, we can calculate them in thebootstrap.
Bootstrap is really useful when:
you want to avoid making a normal approximation
you want to avoid making certain assumptions (i.e. homoskedasticity)
you are considering an estimator for which an analytical varianceestimator is hard or impossible to calculate.
This is the closest thing to magic I will show you all semester.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 121 / 140
![Page 506: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/506.jpg)
Why The Bootstrap
If we have a very complicated estimator, such as
X41/(X 2 − X 3)2 where samples 2 and 3 are not drawn independently
(β0 + β1)3/(2 + β22)
Then the bootstrap is very useful because we don’t have to derive theanalytical expectation and variance, we can calculate them in thebootstrap.
Bootstrap is really useful when:
you want to avoid making a normal approximation
you want to avoid making certain assumptions (i.e. homoskedasticity)
you are considering an estimator for which an analytical varianceestimator is hard or impossible to calculate.
This is the closest thing to magic I will show you all semester.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 121 / 140
![Page 507: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/507.jpg)
Two ways to calculate intervals and p-values
Using normal approximation intervals and p-values, use the estimatesfrom step 4.[
X − Φ−1(1− α/2) ∗ σboot ,X + Φ−1(1− α/2) ∗ σboot]
I Intuition check: the standard error is just the standard deviation of thebootstrap replicates. There is not square root of n. Why?
Percentile method for the CI: Sort B bootstrap estimates fromsmallest to largest. Grab the values at α/2 ∗ B and 1− α/2 ∗ Bposition.
I Percentile method does not rely on normal approximation but requiresvery large B and thus more computational time.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 122 / 140
![Page 508: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/508.jpg)
Two ways to calculate intervals and p-values
Using normal approximation intervals and p-values, use the estimatesfrom step 4.[
X − Φ−1(1− α/2) ∗ σboot ,X + Φ−1(1− α/2) ∗ σboot]
I Intuition check: the standard error is just the standard deviation of thebootstrap replicates. There is not square root of n. Why?
Percentile method for the CI: Sort B bootstrap estimates fromsmallest to largest. Grab the values at α/2 ∗ B and 1− α/2 ∗ Bposition.
I Percentile method does not rely on normal approximation but requiresvery large B and thus more computational time.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 122 / 140
![Page 509: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/509.jpg)
Two ways to calculate intervals and p-values
Using normal approximation intervals and p-values, use the estimatesfrom step 4.[
X − Φ−1(1− α/2) ∗ σboot ,X + Φ−1(1− α/2) ∗ σboot]
I Intuition check: the standard error is just the standard deviation of thebootstrap replicates. There is not square root of n. Why?
Percentile method for the CI: Sort B bootstrap estimates fromsmallest to largest. Grab the values at α/2 ∗ B and 1− α/2 ∗ Bposition.
I Percentile method does not rely on normal approximation but requiresvery large B and thus more computational time.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 122 / 140
![Page 510: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/510.jpg)
Two ways to calculate intervals and p-values
Using normal approximation intervals and p-values, use the estimatesfrom step 4.[
X − Φ−1(1− α/2) ∗ σboot ,X + Φ−1(1− α/2) ∗ σboot]
I Intuition check: the standard error is just the standard deviation of thebootstrap replicates. There is not square root of n. Why?
Percentile method for the CI: Sort B bootstrap estimates fromsmallest to largest. Grab the values at α/2 ∗ B and 1− α/2 ∗ Bposition.
I Percentile method does not rely on normal approximation but requiresvery large B and thus more computational time.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 122 / 140
![Page 511: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/511.jpg)
Two ways to calculate intervals and p-values
Using normal approximation intervals and p-values, use the estimatesfrom step 4.[
X − Φ−1(1− α/2) ∗ σboot ,X + Φ−1(1− α/2) ∗ σboot]
I Intuition check: the standard error is just the standard deviation of thebootstrap replicates. There is not square root of n. Why?
Percentile method for the CI: Sort B bootstrap estimates fromsmallest to largest. Grab the values at α/2 ∗ B and 1− α/2 ∗ Bposition.
I Percentile method does not rely on normal approximation but requiresvery large B and thus more computational time.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 122 / 140
![Page 512: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/512.jpg)
Example: Linear Regression
We skimmed over the sampling distribution of the variance parameter in alinear regression earlier.
It turns out that σ2 ∼ χ2n−(K+1).
But instead we’ll use Bootstrap:
1) Sample from data set, with replacement n times, X2) Calculate f (X ) (in this case a regression)
3) Repeat B times, form distribution of statistics
4) Calculate confidence interval by identifying α/2 and 1− α/2 value ofstatistic. (percentile method)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 123 / 140
![Page 513: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/513.jpg)
Example: Linear Regression
We skimmed over the sampling distribution of the variance parameter in alinear regression earlier.
It turns out that σ2 ∼ χ2n−(K+1).
But instead we’ll use Bootstrap:
1) Sample from data set, with replacement n times, X2) Calculate f (X ) (in this case a regression)
3) Repeat B times, form distribution of statistics
4) Calculate confidence interval by identifying α/2 and 1− α/2 value ofstatistic. (percentile method)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 123 / 140
![Page 514: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/514.jpg)
Example: Linear Regression
We skimmed over the sampling distribution of the variance parameter in alinear regression earlier.
It turns out that σ2 ∼ χ2n−(K+1).
But instead we’ll use Bootstrap:
1) Sample from data set, with replacement n times, X2) Calculate f (X ) (in this case a regression)
3) Repeat B times, form distribution of statistics
4) Calculate confidence interval by identifying α/2 and 1− α/2 value ofstatistic. (percentile method)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 123 / 140
![Page 515: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/515.jpg)
Example: Linear Regression
We skimmed over the sampling distribution of the variance parameter in alinear regression earlier.
It turns out that σ2 ∼ χ2n−(K+1).
But instead we’ll use Bootstrap:
1) Sample from data set, with replacement n times, X2) Calculate f (X ) (in this case a regression)
3) Repeat B times, form distribution of statistics
4) Calculate confidence interval by identifying α/2 and 1− α/2 value ofstatistic. (percentile method)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 123 / 140
![Page 516: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/516.jpg)
Example: Linear Regression
We skimmed over the sampling distribution of the variance parameter in alinear regression earlier.
It turns out that σ2 ∼ χ2n−(K+1).
But instead we’ll use Bootstrap:
1) Sample from data set, with replacement n times, X
2) Calculate f (X ) (in this case a regression)
3) Repeat B times, form distribution of statistics
4) Calculate confidence interval by identifying α/2 and 1− α/2 value ofstatistic. (percentile method)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 123 / 140
![Page 517: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/517.jpg)
Example: Linear Regression
We skimmed over the sampling distribution of the variance parameter in alinear regression earlier.
It turns out that σ2 ∼ χ2n−(K+1).
But instead we’ll use Bootstrap:
1) Sample from data set, with replacement n times, X2) Calculate f (X ) (in this case a regression)
3) Repeat B times, form distribution of statistics
4) Calculate confidence interval by identifying α/2 and 1− α/2 value ofstatistic. (percentile method)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 123 / 140
![Page 518: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/518.jpg)
Example: Linear Regression
We skimmed over the sampling distribution of the variance parameter in alinear regression earlier.
It turns out that σ2 ∼ χ2n−(K+1).
But instead we’ll use Bootstrap:
1) Sample from data set, with replacement n times, X2) Calculate f (X ) (in this case a regression)
3) Repeat B times, form distribution of statistics
4) Calculate confidence interval by identifying α/2 and 1− α/2 value ofstatistic. (percentile method)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 123 / 140
![Page 519: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/519.jpg)
Example: Linear Regression
We skimmed over the sampling distribution of the variance parameter in alinear regression earlier.
It turns out that σ2 ∼ χ2n−(K+1).
But instead we’ll use Bootstrap:
1) Sample from data set, with replacement n times, X2) Calculate f (X ) (in this case a regression)
3) Repeat B times, form distribution of statistics
4) Calculate confidence interval by identifying α/2 and 1− α/2 value ofstatistic. (percentile method)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 123 / 140
![Page 520: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/520.jpg)
The Bootstrap More Formally
What we are discussing is the nonparametric bootstrap
y1, . . . , yn are the outcomes of independent and identically distributedrandom variables Y1, . . . ,Yn whose PDF and CDF are denoted by fand F .
The sample is used to make inferences about an estimand, denoted byθ using a statistic T whose value in the sample is t.
If we observed F , statistical inference would be very easy, but insteadwe observe F , which is the empirical distribution that put equalprobabilities n−1 at each sample value yi .
I Estimates are constructed by the plug-in principle, which says that theparameter θ = t(F ) is estimated by θ = t(F ). (i.e. we plug in theECDF for the CDF)
I Why does this work? Sampling distribution entirely determined by theCDF and n, WLLN says the ECDF will look more and more like theCDF as n gets large.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 124 / 140
![Page 521: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/521.jpg)
The Bootstrap More Formally
What we are discussing is the nonparametric bootstrap
y1, . . . , yn are the outcomes of independent and identically distributedrandom variables Y1, . . . ,Yn whose PDF and CDF are denoted by fand F .
The sample is used to make inferences about an estimand, denoted byθ using a statistic T whose value in the sample is t.
If we observed F , statistical inference would be very easy, but insteadwe observe F , which is the empirical distribution that put equalprobabilities n−1 at each sample value yi .
I Estimates are constructed by the plug-in principle, which says that theparameter θ = t(F ) is estimated by θ = t(F ). (i.e. we plug in theECDF for the CDF)
I Why does this work? Sampling distribution entirely determined by theCDF and n, WLLN says the ECDF will look more and more like theCDF as n gets large.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 124 / 140
![Page 522: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/522.jpg)
The Bootstrap More Formally
What we are discussing is the nonparametric bootstrap
y1, . . . , yn are the outcomes of independent and identically distributedrandom variables Y1, . . . ,Yn whose PDF and CDF are denoted by fand F .
The sample is used to make inferences about an estimand, denoted byθ using a statistic T whose value in the sample is t.
If we observed F , statistical inference would be very easy, but insteadwe observe F , which is the empirical distribution that put equalprobabilities n−1 at each sample value yi .
I Estimates are constructed by the plug-in principle, which says that theparameter θ = t(F ) is estimated by θ = t(F ). (i.e. we plug in theECDF for the CDF)
I Why does this work? Sampling distribution entirely determined by theCDF and n, WLLN says the ECDF will look more and more like theCDF as n gets large.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 124 / 140
![Page 523: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/523.jpg)
The Bootstrap More Formally
What we are discussing is the nonparametric bootstrap
y1, . . . , yn are the outcomes of independent and identically distributedrandom variables Y1, . . . ,Yn whose PDF and CDF are denoted by fand F .
The sample is used to make inferences about an estimand, denoted byθ using a statistic T whose value in the sample is t.
If we observed F , statistical inference would be very easy, but insteadwe observe F , which is the empirical distribution that put equalprobabilities n−1 at each sample value yi .
I Estimates are constructed by the plug-in principle, which says that theparameter θ = t(F ) is estimated by θ = t(F ). (i.e. we plug in theECDF for the CDF)
I Why does this work? Sampling distribution entirely determined by theCDF and n, WLLN says the ECDF will look more and more like theCDF as n gets large.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 124 / 140
![Page 524: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/524.jpg)
The Bootstrap More Formally
What we are discussing is the nonparametric bootstrap
y1, . . . , yn are the outcomes of independent and identically distributedrandom variables Y1, . . . ,Yn whose PDF and CDF are denoted by fand F .
The sample is used to make inferences about an estimand, denoted byθ using a statistic T whose value in the sample is t.
If we observed F , statistical inference would be very easy, but insteadwe observe F , which is the empirical distribution that put equalprobabilities n−1 at each sample value yi .
I Estimates are constructed by the plug-in principle, which says that theparameter θ = t(F ) is estimated by θ = t(F ). (i.e. we plug in theECDF for the CDF)
I Why does this work? Sampling distribution entirely determined by theCDF and n, WLLN says the ECDF will look more and more like theCDF as n gets large.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 124 / 140
![Page 525: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/525.jpg)
The Bootstrap More Formally
What we are discussing is the nonparametric bootstrap
y1, . . . , yn are the outcomes of independent and identically distributedrandom variables Y1, . . . ,Yn whose PDF and CDF are denoted by fand F .
The sample is used to make inferences about an estimand, denoted byθ using a statistic T whose value in the sample is t.
If we observed F , statistical inference would be very easy, but insteadwe observe F , which is the empirical distribution that put equalprobabilities n−1 at each sample value yi .
I Estimates are constructed by the plug-in principle, which says that theparameter θ = t(F ) is estimated by θ = t(F ). (i.e. we plug in theECDF for the CDF)
I Why does this work? Sampling distribution entirely determined by theCDF and n, WLLN says the ECDF will look more and more like theCDF as n gets large.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 124 / 140
![Page 526: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/526.jpg)
The Bootstrap More Formally
What we are discussing is the nonparametric bootstrap
y1, . . . , yn are the outcomes of independent and identically distributedrandom variables Y1, . . . ,Yn whose PDF and CDF are denoted by fand F .
The sample is used to make inferences about an estimand, denoted byθ using a statistic T whose value in the sample is t.
If we observed F , statistical inference would be very easy, but insteadwe observe F , which is the empirical distribution that put equalprobabilities n−1 at each sample value yi .
I Estimates are constructed by the plug-in principle, which says that theparameter θ = t(F ) is estimated by θ = t(F ). (i.e. we plug in theECDF for the CDF)
I Why does this work? Sampling distribution entirely determined by theCDF and n, WLLN says the ECDF will look more and more like theCDF as n gets large.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 124 / 140
![Page 527: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/527.jpg)
When Does the Bootstrap Fail?Bootstrap works in a wide variety of circumstances, but it does requiresome regularity conditions and it can fail with certain types of data andestimators:
Bootstrap fails when the sampling distribution of the estimator isnon-smooth. (e.g. max and min).Dependent data: nonparametric bootstrap assumes data soindependent so will not work with time series data or other dependentstructures.
I For clustered data, standard bootstrap will not work, but the blockbootstrap will work. In the block bootstrap, clusters are resampled (notnecessarily units) with replacement.
I More on this later.
Many other variants that may be right for certain situations:studentized intervals, jackknife, parametric bootstrap, bag of littlebootstraps, bootstrapping for complex survey designs, etc.
Fox Chapter 21 has a nice section on the bootstrap, Aronow and Miller(2016) covers the theory well.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 125 / 140
![Page 528: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/528.jpg)
When Does the Bootstrap Fail?Bootstrap works in a wide variety of circumstances, but it does requiresome regularity conditions and it can fail with certain types of data andestimators:
Bootstrap fails when the sampling distribution of the estimator isnon-smooth. (e.g. max and min).
Dependent data: nonparametric bootstrap assumes data soindependent so will not work with time series data or other dependentstructures.
I For clustered data, standard bootstrap will not work, but the blockbootstrap will work. In the block bootstrap, clusters are resampled (notnecessarily units) with replacement.
I More on this later.
Many other variants that may be right for certain situations:studentized intervals, jackknife, parametric bootstrap, bag of littlebootstraps, bootstrapping for complex survey designs, etc.
Fox Chapter 21 has a nice section on the bootstrap, Aronow and Miller(2016) covers the theory well.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 125 / 140
![Page 529: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/529.jpg)
When Does the Bootstrap Fail?Bootstrap works in a wide variety of circumstances, but it does requiresome regularity conditions and it can fail with certain types of data andestimators:
Bootstrap fails when the sampling distribution of the estimator isnon-smooth. (e.g. max and min).Dependent data: nonparametric bootstrap assumes data soindependent so will not work with time series data or other dependentstructures.
I For clustered data, standard bootstrap will not work, but the blockbootstrap will work. In the block bootstrap, clusters are resampled (notnecessarily units) with replacement.
I More on this later.
Many other variants that may be right for certain situations:studentized intervals, jackknife, parametric bootstrap, bag of littlebootstraps, bootstrapping for complex survey designs, etc.
Fox Chapter 21 has a nice section on the bootstrap, Aronow and Miller(2016) covers the theory well.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 125 / 140
![Page 530: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/530.jpg)
When Does the Bootstrap Fail?Bootstrap works in a wide variety of circumstances, but it does requiresome regularity conditions and it can fail with certain types of data andestimators:
Bootstrap fails when the sampling distribution of the estimator isnon-smooth. (e.g. max and min).Dependent data: nonparametric bootstrap assumes data soindependent so will not work with time series data or other dependentstructures.
I For clustered data, standard bootstrap will not work, but the blockbootstrap will work. In the block bootstrap, clusters are resampled (notnecessarily units) with replacement.
I More on this later.
Many other variants that may be right for certain situations:studentized intervals, jackknife, parametric bootstrap, bag of littlebootstraps, bootstrapping for complex survey designs, etc.
Fox Chapter 21 has a nice section on the bootstrap, Aronow and Miller(2016) covers the theory well.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 125 / 140
![Page 531: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/531.jpg)
When Does the Bootstrap Fail?Bootstrap works in a wide variety of circumstances, but it does requiresome regularity conditions and it can fail with certain types of data andestimators:
Bootstrap fails when the sampling distribution of the estimator isnon-smooth. (e.g. max and min).Dependent data: nonparametric bootstrap assumes data soindependent so will not work with time series data or other dependentstructures.
I For clustered data, standard bootstrap will not work, but the blockbootstrap will work. In the block bootstrap, clusters are resampled (notnecessarily units) with replacement.
I More on this later.
Many other variants that may be right for certain situations:studentized intervals, jackknife, parametric bootstrap, bag of littlebootstraps, bootstrapping for complex survey designs, etc.
Fox Chapter 21 has a nice section on the bootstrap, Aronow and Miller(2016) covers the theory well.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 125 / 140
![Page 532: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/532.jpg)
When Does the Bootstrap Fail?Bootstrap works in a wide variety of circumstances, but it does requiresome regularity conditions and it can fail with certain types of data andestimators:
Bootstrap fails when the sampling distribution of the estimator isnon-smooth. (e.g. max and min).Dependent data: nonparametric bootstrap assumes data soindependent so will not work with time series data or other dependentstructures.
I For clustered data, standard bootstrap will not work, but the blockbootstrap will work. In the block bootstrap, clusters are resampled (notnecessarily units) with replacement.
I More on this later.
Many other variants that may be right for certain situations:studentized intervals, jackknife, parametric bootstrap, bag of littlebootstraps, bootstrapping for complex survey designs, etc.
Fox Chapter 21 has a nice section on the bootstrap, Aronow and Miller(2016) covers the theory well.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 125 / 140
![Page 533: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/533.jpg)
When Does the Bootstrap Fail?Bootstrap works in a wide variety of circumstances, but it does requiresome regularity conditions and it can fail with certain types of data andestimators:
Bootstrap fails when the sampling distribution of the estimator isnon-smooth. (e.g. max and min).Dependent data: nonparametric bootstrap assumes data soindependent so will not work with time series data or other dependentstructures.
I For clustered data, standard bootstrap will not work, but the blockbootstrap will work. In the block bootstrap, clusters are resampled (notnecessarily units) with replacement.
I More on this later.
Many other variants that may be right for certain situations:studentized intervals, jackknife, parametric bootstrap, bag of littlebootstraps, bootstrapping for complex survey designs, etc.
Fox Chapter 21 has a nice section on the bootstrap, Aronow and Miller(2016) covers the theory well.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 125 / 140
![Page 534: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/534.jpg)
Today in Summary
The difficulty of interpreting p-values
Agnostic Regression: how will regression behave in large samples if wedon’t really buy the assumptions
Bootstrap: a close to assumption free way of constructing confidenceintervals
Appendix contains a fun example of the difficulty of thinking throughp-values.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 126 / 140
![Page 535: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/535.jpg)
Today in Summary
The difficulty of interpreting p-values
Agnostic Regression: how will regression behave in large samples if wedon’t really buy the assumptions
Bootstrap: a close to assumption free way of constructing confidenceintervals
Appendix contains a fun example of the difficulty of thinking throughp-values.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 126 / 140
![Page 536: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/536.jpg)
Today in Summary
The difficulty of interpreting p-values
Agnostic Regression: how will regression behave in large samples if wedon’t really buy the assumptions
Bootstrap: a close to assumption free way of constructing confidenceintervals
Appendix contains a fun example of the difficulty of thinking throughp-values.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 126 / 140
![Page 537: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/537.jpg)
Today in Summary
The difficulty of interpreting p-values
Agnostic Regression: how will regression behave in large samples if wedon’t really buy the assumptions
Bootstrap: a close to assumption free way of constructing confidenceintervals
Appendix contains a fun example of the difficulty of thinking throughp-values.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 126 / 140
![Page 538: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/538.jpg)
Today in Summary
The difficulty of interpreting p-values
Agnostic Regression: how will regression behave in large samples if wedon’t really buy the assumptions
Bootstrap: a close to assumption free way of constructing confidenceintervals
Appendix contains a fun example of the difficulty of thinking throughp-values.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 126 / 140
![Page 539: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/539.jpg)
Next “Week” of Classes (Three Classes)
What can go wrong and how to fix it → Diagnostics
Day 1 (M): Unusual and Influential Data → Robust Estimation
Day 2 (W): Nonlinearity → Generalized Additive Models
Day 3 (M): Unusual Errors → Sandwich Standard Errors
Reading:I Angrist and Pishke Chapter 8 (‘Nonstandard Standard Error Issues’)I Optional: Fox Chapters 11-13I Optional: King and Roberts “How Robust Standard Errors Expose
Methodological Problems They Do Not Fix, and What to Do AboutIt.” Political Analysis, 2, 23: 159179.
I Optional: Aronow and Miller Chapters 4.2-4.4 (Inference, Clustering,Nonlinearity)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 127 / 140
![Page 540: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/540.jpg)
Next “Week” of Classes (Three Classes)
What can go wrong and how to fix it → Diagnostics
Day 1 (M): Unusual and Influential Data → Robust Estimation
Day 2 (W): Nonlinearity → Generalized Additive Models
Day 3 (M): Unusual Errors → Sandwich Standard Errors
Reading:I Angrist and Pishke Chapter 8 (‘Nonstandard Standard Error Issues’)I Optional: Fox Chapters 11-13I Optional: King and Roberts “How Robust Standard Errors Expose
Methodological Problems They Do Not Fix, and What to Do AboutIt.” Political Analysis, 2, 23: 159179.
I Optional: Aronow and Miller Chapters 4.2-4.4 (Inference, Clustering,Nonlinearity)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 127 / 140
![Page 541: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/541.jpg)
Next “Week” of Classes (Three Classes)
What can go wrong and how to fix it → Diagnostics
Day 1 (M): Unusual and Influential Data → Robust Estimation
Day 2 (W): Nonlinearity → Generalized Additive Models
Day 3 (M): Unusual Errors → Sandwich Standard Errors
Reading:I Angrist and Pishke Chapter 8 (‘Nonstandard Standard Error Issues’)I Optional: Fox Chapters 11-13I Optional: King and Roberts “How Robust Standard Errors Expose
Methodological Problems They Do Not Fix, and What to Do AboutIt.” Political Analysis, 2, 23: 159179.
I Optional: Aronow and Miller Chapters 4.2-4.4 (Inference, Clustering,Nonlinearity)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 127 / 140
![Page 542: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/542.jpg)
Next “Week” of Classes (Three Classes)
What can go wrong and how to fix it → Diagnostics
Day 1 (M): Unusual and Influential Data → Robust Estimation
Day 2 (W): Nonlinearity → Generalized Additive Models
Day 3 (M): Unusual Errors → Sandwich Standard Errors
Reading:I Angrist and Pishke Chapter 8 (‘Nonstandard Standard Error Issues’)I Optional: Fox Chapters 11-13I Optional: King and Roberts “How Robust Standard Errors Expose
Methodological Problems They Do Not Fix, and What to Do AboutIt.” Political Analysis, 2, 23: 159179.
I Optional: Aronow and Miller Chapters 4.2-4.4 (Inference, Clustering,Nonlinearity)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 127 / 140
![Page 543: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/543.jpg)
Next “Week” of Classes (Three Classes)
What can go wrong and how to fix it → Diagnostics
Day 1 (M): Unusual and Influential Data → Robust Estimation
Day 2 (W): Nonlinearity → Generalized Additive Models
Day 3 (M): Unusual Errors → Sandwich Standard Errors
Reading:I Angrist and Pishke Chapter 8 (‘Nonstandard Standard Error Issues’)I Optional: Fox Chapters 11-13I Optional: King and Roberts “How Robust Standard Errors Expose
Methodological Problems They Do Not Fix, and What to Do AboutIt.” Political Analysis, 2, 23: 159179.
I Optional: Aronow and Miller Chapters 4.2-4.4 (Inference, Clustering,Nonlinearity)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 127 / 140
![Page 544: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/544.jpg)
Next “Week” of Classes (Three Classes)
What can go wrong and how to fix it → Diagnostics
Day 1 (M): Unusual and Influential Data → Robust Estimation
Day 2 (W): Nonlinearity → Generalized Additive Models
Day 3 (M): Unusual Errors → Sandwich Standard Errors
Reading:I Angrist and Pishke Chapter 8 (‘Nonstandard Standard Error Issues’)I Optional: Fox Chapters 11-13I Optional: King and Roberts “How Robust Standard Errors Expose
Methodological Problems They Do Not Fix, and What to Do AboutIt.” Political Analysis, 2, 23: 159179.
I Optional: Aronow and Miller Chapters 4.2-4.4 (Inference, Clustering,Nonlinearity)
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 127 / 140
![Page 545: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/545.jpg)
1 Matrix Form of Regression
2 OLS inference in matrix form
3 Standard Hypothesis Tests
4 Testing Joint Significance
5 Testing Linear Hypotheses: The General Case
6 Fun With(out) Weights
7 Appendix: Derivations and Consistency
8 The Problems with p-values
9 Agnostic Regression
10 Inference via the Bootstrap
11 Fun With Weights
12 Appendix: Tricky p-value Example
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 128 / 140
![Page 546: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/546.jpg)
1 Matrix Form of Regression
2 OLS inference in matrix form
3 Standard Hypothesis Tests
4 Testing Joint Significance
5 Testing Linear Hypotheses: The General Case
6 Fun With(out) Weights
7 Appendix: Derivations and Consistency
8 The Problems with p-values
9 Agnostic Regression
10 Inference via the Bootstrap
11 Fun With Weights
12 Appendix: Tricky p-value Example
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 128 / 140
![Page 547: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/547.jpg)
Fun With Weights
Aronow, Peter M., and Cyrus Samii. ”Does Regression ProduceRepresentative Estimates of Causal Effects?.” American Journal ofPolitical Science (2015).2
Imagine we care about the possibly heterogeneous causal effect of atreatment D and we control for some covariates X?
We can express the regression as a weighting over individualobservation treatment effects where the weight depends only on X .
Useful technology for understanding what our models are identifyingoff of by showing us our effective sample.
2I’m grateful to Peter Aronow for sharing his slides, several of which are used here.Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 129 / 140
![Page 548: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/548.jpg)
Fun With Weights
Aronow, Peter M., and Cyrus Samii. ”Does Regression ProduceRepresentative Estimates of Causal Effects?.” American Journal ofPolitical Science (2015).2
Imagine we care about the possibly heterogeneous causal effect of atreatment D and we control for some covariates X?
We can express the regression as a weighting over individualobservation treatment effects where the weight depends only on X .
Useful technology for understanding what our models are identifyingoff of by showing us our effective sample.
2I’m grateful to Peter Aronow for sharing his slides, several of which are used here.Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 129 / 140
![Page 549: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/549.jpg)
Fun With Weights
Aronow, Peter M., and Cyrus Samii. ”Does Regression ProduceRepresentative Estimates of Causal Effects?.” American Journal ofPolitical Science (2015).2
Imagine we care about the possibly heterogeneous causal effect of atreatment D and we control for some covariates X?
We can express the regression as a weighting over individualobservation treatment effects where the weight depends only on X .
Useful technology for understanding what our models are identifyingoff of by showing us our effective sample.
2I’m grateful to Peter Aronow for sharing his slides, several of which are used here.Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 129 / 140
![Page 550: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/550.jpg)
Fun With Weights
Aronow, Peter M., and Cyrus Samii. ”Does Regression ProduceRepresentative Estimates of Causal Effects?.” American Journal ofPolitical Science (2015).2
Imagine we care about the possibly heterogeneous causal effect of atreatment D and we control for some covariates X?
We can express the regression as a weighting over individualobservation treatment effects where the weight depends only on X .
Useful technology for understanding what our models are identifyingoff of by showing us our effective sample.
2I’m grateful to Peter Aronow for sharing his slides, several of which are used here.Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 129 / 140
![Page 551: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/551.jpg)
How this works
We start by asking what the estimate of the average causal effect ofinterest converges to in a large sample:
βp→ E [wiτi ]
E [wi ]where wi = (Di − E [Di |X ])2 ,
so that β converges to a reweighted causal effect. AsE [wi |Xi ] = Var[Di |Xi ], we obtain an average causal effect reweighted byconditional variance of the treatment.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 130 / 140
![Page 552: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/552.jpg)
How this works
We start by asking what the estimate of the average causal effect ofinterest converges to in a large sample:
βp→ E [wiτi ]
E [wi ]where wi = (Di − E [Di |X ])2 ,
so that β converges to a reweighted causal effect. AsE [wi |Xi ] = Var[Di |Xi ], we obtain an average causal effect reweighted byconditional variance of the treatment.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 130 / 140
![Page 553: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/553.jpg)
How this works
We start by asking what the estimate of the average causal effect ofinterest converges to in a large sample:
βp→ E [wiτi ]
E [wi ]where wi = (Di − E [Di |X ])2 ,
so that β converges to a reweighted causal effect. AsE [wi |Xi ] = Var[Di |Xi ], we obtain an average causal effect reweighted byconditional variance of the treatment.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 130 / 140
![Page 554: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/554.jpg)
Estimation
A simple, consistent plug-in estimator of wi is available: wi = D2i where
Di is the residualized treatment. (the proof is connected to the partialingout strategy we showed last week)
Easily implemented in R:
wts <- (d - predict(lm(d~x)))^2
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 131 / 140
![Page 555: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/555.jpg)
Estimation
A simple, consistent plug-in estimator of wi is available: wi = D2i where
Di is the residualized treatment. (the proof is connected to the partialingout strategy we showed last week)
Easily implemented in R:
wts <- (d - predict(lm(d~x)))^2
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 131 / 140
![Page 556: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/556.jpg)
Implications
Unpacking the black box of regression gives us substantive insight
When some observations have no weight, this means that thecovariates completely explain their treatment condition.
This is a feature, not a bug, of regression: we can’t learn anythingfrom those cases anyway (i.e. it is automatically handling issues ofcommon support).
The downside is that we have to be aware of what happened!
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 132 / 140
![Page 557: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/557.jpg)
Implications
Unpacking the black box of regression gives us substantive insight
When some observations have no weight, this means that thecovariates completely explain their treatment condition.
This is a feature, not a bug, of regression: we can’t learn anythingfrom those cases anyway (i.e. it is automatically handling issues ofcommon support).
The downside is that we have to be aware of what happened!
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 132 / 140
![Page 558: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/558.jpg)
Implications
Unpacking the black box of regression gives us substantive insight
When some observations have no weight, this means that thecovariates completely explain their treatment condition.
This is a feature, not a bug, of regression: we can’t learn anythingfrom those cases anyway (i.e. it is automatically handling issues ofcommon support).
The downside is that we have to be aware of what happened!
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 132 / 140
![Page 559: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/559.jpg)
Implications
Unpacking the black box of regression gives us substantive insight
When some observations have no weight, this means that thecovariates completely explain their treatment condition.
This is a feature, not a bug, of regression: we can’t learn anythingfrom those cases anyway (i.e. it is automatically handling issues ofcommon support).
The downside is that we have to be aware of what happened!
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 132 / 140
![Page 560: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/560.jpg)
Application
Jensen (2003), “Democratic Governance and Multinational Corporations:Political Regimes and Inflows of Foreign Direct Investment.”
Jensen presents a large-N TSCS-analysis of the causal effects ofgovernance (as measured by the Polity III score) on Foreign DirectInvestment (FDI).
The nominal sample: 114 countries from 1970 to 1997.
Jensen estimates that a 1 unit increase in polity score corresponds to a0.020 increase in net FDI inflows as a percentage of GDP (p < 0.001).
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 133 / 140
![Page 561: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/561.jpg)
Application
Jensen (2003), “Democratic Governance and Multinational Corporations:Political Regimes and Inflows of Foreign Direct Investment.”
Jensen presents a large-N TSCS-analysis of the causal effects ofgovernance (as measured by the Polity III score) on Foreign DirectInvestment (FDI).
The nominal sample: 114 countries from 1970 to 1997.
Jensen estimates that a 1 unit increase in polity score corresponds to a0.020 increase in net FDI inflows as a percentage of GDP (p < 0.001).
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 133 / 140
![Page 562: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/562.jpg)
Application
Jensen (2003), “Democratic Governance and Multinational Corporations:Political Regimes and Inflows of Foreign Direct Investment.”
Jensen presents a large-N TSCS-analysis of the causal effects ofgovernance (as measured by the Polity III score) on Foreign DirectInvestment (FDI).
The nominal sample: 114 countries from 1970 to 1997.
Jensen estimates that a 1 unit increase in polity score corresponds to a0.020 increase in net FDI inflows as a percentage of GDP (p < 0.001).
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 133 / 140
![Page 563: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/563.jpg)
Application
Jensen (2003), “Democratic Governance and Multinational Corporations:Political Regimes and Inflows of Foreign Direct Investment.”
Jensen presents a large-N TSCS-analysis of the causal effects ofgovernance (as measured by the Polity III score) on Foreign DirectInvestment (FDI).
The nominal sample: 114 countries from 1970 to 1997.
Jensen estimates that a 1 unit increase in polity score corresponds to a0.020 increase in net FDI inflows as a percentage of GDP (p < 0.001).
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 133 / 140
![Page 564: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/564.jpg)
Nominal and Effective Samples
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 134 / 140
![Page 565: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/565.jpg)
Nominal and Effective Samples
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 134 / 140
![Page 566: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/566.jpg)
Nominal and Effective Samples
Over 50% of the weight goes to just 12 (out of 114) countries.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 134 / 140
![Page 567: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/567.jpg)
Broader Implications
When causal effects are heterogeneous, we can draw a distinction between“internally valid” and “externally valid” estimates of an AverageTreatment Effect (ATE).
“Internally valid”: reliable estimates of ATEs, but perhaps not for thepopulation you care about
I randomized (lab, field, survey) experiments, instrumental variables,regression discontinuity designs, other natural experiments
“Externally valid”: perhaps unreliable estimates of ATEs, but for thepopulation of interest
I large-N analyses, representative surveys
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 135 / 140
![Page 568: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/568.jpg)
Broader Implications
When causal effects are heterogeneous, we can draw a distinction between“internally valid” and “externally valid” estimates of an AverageTreatment Effect (ATE).
“Internally valid”: reliable estimates of ATEs, but perhaps not for thepopulation you care about
I randomized (lab, field, survey) experiments, instrumental variables,regression discontinuity designs, other natural experiments
“Externally valid”: perhaps unreliable estimates of ATEs, but for thepopulation of interest
I large-N analyses, representative surveys
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 135 / 140
![Page 569: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/569.jpg)
Broader Implications
When causal effects are heterogeneous, we can draw a distinction between“internally valid” and “externally valid” estimates of an AverageTreatment Effect (ATE).
“Internally valid”: reliable estimates of ATEs, but perhaps not for thepopulation you care about
I randomized (lab, field, survey) experiments, instrumental variables,regression discontinuity designs, other natural experiments
“Externally valid”: perhaps unreliable estimates of ATEs, but for thepopulation of interest
I large-N analyses, representative surveys
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 135 / 140
![Page 570: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/570.jpg)
Broader Implications
Aronow and Samii argue that analyses which use regression, even with arepresentative sample, have no greater claim to external validity than do[natural] experiments.
When a treatment is “as-if” randomly assigned conditional oncovariates, regression distorts the sample by implicitly applyingweights.
The effective sample (upon which causal effects are estimated) mayhave radically different properties than the nominal sample.
When there is an underlying natural experiment in the data, aproperly specified regression model may reproduce the internally validestimate associated with the natural experiment.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 136 / 140
![Page 571: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/571.jpg)
Broader Implications
Aronow and Samii argue that analyses which use regression, even with arepresentative sample, have no greater claim to external validity than do[natural] experiments.
When a treatment is “as-if” randomly assigned conditional oncovariates, regression distorts the sample by implicitly applyingweights.
The effective sample (upon which causal effects are estimated) mayhave radically different properties than the nominal sample.
When there is an underlying natural experiment in the data, aproperly specified regression model may reproduce the internally validestimate associated with the natural experiment.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 136 / 140
![Page 572: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/572.jpg)
Broader Implications
Aronow and Samii argue that analyses which use regression, even with arepresentative sample, have no greater claim to external validity than do[natural] experiments.
When a treatment is “as-if” randomly assigned conditional oncovariates, regression distorts the sample by implicitly applyingweights.
The effective sample (upon which causal effects are estimated) mayhave radically different properties than the nominal sample.
When there is an underlying natural experiment in the data, aproperly specified regression model may reproduce the internally validestimate associated with the natural experiment.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 136 / 140
![Page 573: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/573.jpg)
Broader Implications
Aronow and Samii argue that analyses which use regression, even with arepresentative sample, have no greater claim to external validity than do[natural] experiments.
When a treatment is “as-if” randomly assigned conditional oncovariates, regression distorts the sample by implicitly applyingweights.
The effective sample (upon which causal effects are estimated) mayhave radically different properties than the nominal sample.
When there is an underlying natural experiment in the data, aproperly specified regression model may reproduce the internally validestimate associated with the natural experiment.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 136 / 140
![Page 574: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/574.jpg)
1 Matrix Form of Regression
2 OLS inference in matrix form
3 Standard Hypothesis Tests
4 Testing Joint Significance
5 Testing Linear Hypotheses: The General Case
6 Fun With(out) Weights
7 Appendix: Derivations and Consistency
8 The Problems with p-values
9 Agnostic Regression
10 Inference via the Bootstrap
11 Fun With Weights
12 Appendix: Tricky p-value Example
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 137 / 140
![Page 575: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/575.jpg)
1 Matrix Form of Regression
2 OLS inference in matrix form
3 Standard Hypothesis Tests
4 Testing Joint Significance
5 Testing Linear Hypotheses: The General Case
6 Fun With(out) Weights
7 Appendix: Derivations and Consistency
8 The Problems with p-values
9 Agnostic Regression
10 Inference via the Bootstrap
11 Fun With Weights
12 Appendix: Tricky p-value Example
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 137 / 140
![Page 576: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/576.jpg)
Still Not Convinced? A Tricky Example: p-values
Morris 19873
Mr. Allen the candidate for political Party A will run against Mr.Baker of Party B for office. Past races between these parties forthis office were always closer, and it seems this one will be noexception- Party A candidates always have gotten between 40%and 60% of the vote and have won about half of the elections.Allen needs to know for θ =the proportion of voters favoring himtoday, whether H0: θ < .5 or H1 : θ > .5 is true. A randomsample of n voters is taken, with Y voters favoring Allen. Thepopulation is large and it is justifiable to assume thatY ∼Bin(n, θ), the binomial distribution. The estimate θ = Y /nwill be used.
3From a Comment on Berger and Sellke “Testing a Point Null Hypothesis: TheIrreconcilability of P Values and Evidence
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 138 / 140
![Page 577: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/577.jpg)
Still Not Convinced? A Tricky Example: p-values
Morris 19873
Mr. Allen the candidate for political Party A will run against Mr.Baker of Party B for office. Past races between these parties forthis office were always closer, and it seems this one will be noexception- Party A candidates always have gotten between 40%and 60% of the vote and have won about half of the elections.
Allen needs to know for θ =the proportion of voters favoring himtoday, whether H0: θ < .5 or H1 : θ > .5 is true. A randomsample of n voters is taken, with Y voters favoring Allen. Thepopulation is large and it is justifiable to assume thatY ∼Bin(n, θ), the binomial distribution. The estimate θ = Y /nwill be used.
3From a Comment on Berger and Sellke “Testing a Point Null Hypothesis: TheIrreconcilability of P Values and Evidence
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 138 / 140
![Page 578: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/578.jpg)
Still Not Convinced? A Tricky Example: p-values
Morris 19873
Mr. Allen the candidate for political Party A will run against Mr.Baker of Party B for office. Past races between these parties forthis office were always closer, and it seems this one will be noexception- Party A candidates always have gotten between 40%and 60% of the vote and have won about half of the elections.Allen needs to know for θ =the proportion of voters favoring himtoday, whether H0: θ < .5 or H1 : θ > .5 is true. A randomsample of n voters is taken, with Y voters favoring Allen. Thepopulation is large and it is justifiable to assume thatY ∼Bin(n, θ), the binomial distribution. The estimate θ = Y /nwill be used.
3From a Comment on Berger and Sellke “Testing a Point Null Hypothesis: TheIrreconcilability of P Values and Evidence
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 138 / 140
![Page 579: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/579.jpg)
Example: p-values
Question: Which of three outcomes, all having the same p value,would be most encouraging to candidate Allen?
(a) Y = 15, n = 20, θ = .75, (.560, .940)(b) Y = 115, n = 200, θ = .575, (.506, .644)(c) Y = 1046, n = 2000, θ = .523(.501, .545)
Note: The p values are all about .021.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 139 / 140
![Page 580: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/580.jpg)
Example: p-values
Question: Which of three outcomes, all having the same p value,would be most encouraging to candidate Allen?
(a) Y = 15, n = 20, θ = .75, (.560, .940)(b) Y = 115, n = 200, θ = .575, (.506, .644)(c) Y = 1046, n = 2000, θ = .523(.501, .545)
Note: The p values are all about .021.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 139 / 140
![Page 581: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/581.jpg)
Example: p-values
Question: Which of three outcomes, all having the same p value,would be most encouraging to candidate Allen?
(a) Y = 15, n = 20, θ = .75, (.560, .940)
(b) Y = 115, n = 200, θ = .575, (.506, .644)(c) Y = 1046, n = 2000, θ = .523(.501, .545)
Note: The p values are all about .021.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 139 / 140
![Page 582: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/582.jpg)
Example: p-values
Question: Which of three outcomes, all having the same p value,would be most encouraging to candidate Allen?
(a) Y = 15, n = 20, θ = .75, (.560, .940)(b) Y = 115, n = 200, θ = .575, (.506, .644)
(c) Y = 1046, n = 2000, θ = .523(.501, .545)
Note: The p values are all about .021.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 139 / 140
![Page 583: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/583.jpg)
Example: p-values
Question: Which of three outcomes, all having the same p value,would be most encouraging to candidate Allen?
(a) Y = 15, n = 20, θ = .75, (.560, .940)(b) Y = 115, n = 200, θ = .575, (.506, .644)(c) Y = 1046, n = 2000, θ = .523(.501, .545)
Note: The p values are all about .021.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 139 / 140
![Page 584: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/584.jpg)
Example: p-values
Question: Which of three outcomes, all having the same p value,would be most encouraging to candidate Allen?
(a) Y = 15, n = 20, θ = .75, (.560, .940)(b) Y = 115, n = 200, θ = .575, (.506, .644)(c) Y = 1046, n = 2000, θ = .523(.501, .545)
Note: The p values are all about .021.
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 139 / 140
![Page 585: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/585.jpg)
Example: p-values
the p-values corresponds to Pr(H0|t) only when good power obtainsat typical H1 parameter values.
alternatively, simulate the thing you care about
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 140 / 140
![Page 586: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/586.jpg)
Example: p-values
the p-values corresponds to Pr(H0|t) only when good power obtainsat typical H1 parameter values.
alternatively, simulate the thing you care about
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 140 / 140
![Page 587: Week 7: Multiple Regression - Princeton University · 1 1 + z 1 2 + u 1 (unit 1) y 2 = 0 + x 2 1 + z 2 2 + u 2 (unit 2) y 3 = 0 + x 3 1 + z 3 2 + u 3 (unit 3) y 4 = 0 + x 4 1 + z](https://reader035.vdocuments.us/reader035/viewer/2022080717/5f7835fc022c2138100bd01a/html5/thumbnails/587.jpg)
Example: p-values
the p-values corresponds to Pr(H0|t) only when good power obtainsat typical H1 parameter values.
alternatively, simulate the thing you care about
Stewart (Princeton) Week 7: Multiple Regression October 22, 24, 2018 140 / 140