Reminders and hints for HW2 Q1.4
• It is crucial that you get this one right, as it carries through to several other questions. As noted in the question, use the 1913 data to check that your formula is correct.
Q1.5
• Refer to the beginning of HW1 for an example of the “replace” command
• The notes provide examples of the command for running an OLS regression
Q1.8 & 1.9
• Pay attention to the required rounding (precision and accuracy are important)
Q1.10
• REALLY pay attention to the required rounding. It is a slight tweak on what is asked for in 1.8 and 1.9.
Q1.11 – 1.13
• Double-log model… covered later in the notes
Double-Log Specification
(Original) Empirical Specification: $g_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
(Alternative) Empirical Specification: $\ln g_i = \gamma_0 + \gamma_1 \ln x_i + \mu_i$
$\beta_1 = \frac{\Delta g}{\Delta x} = \frac{dg}{dx}$
$\gamma_1 = \frac{d \ln g}{d \ln x} = \frac{dg / g}{dx / x} = \frac{\Delta g / g}{\Delta x / x} = \frac{\%\Delta g}{\%\Delta x}$
$\hat{\gamma}_1 = \frac{\widehat{\%\Delta g}}{\%\Delta x} = 0.61$
Suppose $\%\Delta x = 10$ percent. In this case,
$\widehat{\%\Delta g} = 0.61 \cdot \%\Delta x = 0.61 \cdot 10 = 6.1$
Interpretation: A 10 percent increase in high school grades is associated, on average, with a roughly 6 percent increase in college grades.
Why use double-log? In some cases, it is (1) easier to interpret and (2) a better fit to the data. More on this later…
The homework question is tricky: you need to do these steps in that case!
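As a rough illustration of the elasticity arithmetic above, the sketch below compares the slide's linear approximation (0.61 times the percent change in x) with the exact change implied by the double-log equation. The function names are hypothetical and only the 0.61 estimate and the 10 percent change come from the notes.

```python
def approx_pct_change_g(gamma1, pct_change_x):
    """Elasticity approximation: %change in g is about gamma1 * %change in x."""
    return gamma1 * pct_change_x


def exact_pct_change_g(gamma1, pct_change_x):
    """Exact %change in g implied by ln g = gamma0 + gamma1 * ln x
    when x is multiplied by (1 + pct_change_x / 100)."""
    return ((1 + pct_change_x / 100) ** gamma1 - 1) * 100


gamma1_hat = 0.61   # estimated elasticity quoted in the notes
pct_dx = 10.0       # the 10 percent increase in high school grades

approx = approx_pct_change_g(gamma1_hat, pct_dx)
exact = exact_pct_change_g(gamma1_hat, pct_dx)
print(f"approximate: {approx:.2f} percent, exact: {exact:.2f} percent")
```

The two numbers are close for a change of this size, which is why "roughly 6 percent" is a fair reading; the gap widens for larger percent changes.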
Most important items in an econometrician’s toolbox:
“Regression models designed to control for variables that may mask the causal effects of interest.”
Mostly Harmless Econometrics, p. xi.
Now a slope coefficient indicates the change in the dependent variable associated with a one-unit increase in the explanatory variable, holding the other explanatory variables constant.
Multivariate Model
$g_i = \beta_0 + \beta_1 x_i + \beta_2 s_i + \varepsilon_i$, $i = 1, \dots, 2096$ college students
where
$g_i$ is the student’s cumulative college GPA (1-4).
$x_i$ is the student’s high school GPA (1-4).
$s_i$ is the student’s combined SAT score in hundreds of points (ranges from 6.3 to 16).
$\varepsilon_i$ is a stochastic error term.
$g_i = \beta_0 + \beta_1 x_i + \beta_2 s_i + \varepsilon_i$, $i = 1, \dots, 2096$ college students
$\beta_1 = \left. \frac{\partial g_i}{\partial x_i} \right|_{s_i}$
It is the change in college grades with respect to a change in high school grades holding SAT scores constant.
$\hat{g}_i = 0.688 + 0.374 \cdot x_i + 0.090 \cdot s_i$
Interpretation: A full-letter grade increase in a student’s high school grades is associated, on average, with roughly a third of a letter grade increase in their college grades, holding his or her SAT scores constant.
… holding the other explanatory variables constant, not holding everything else constant. Omitted (and relevant) variables are not held constant.
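A small sketch of what "holding SAT scores constant" means numerically, using the fitted coefficients quoted above. The function name and the example students (a 2.5 versus a 3.5 high school GPA, SAT fixed at 1200) are hypothetical, chosen only for illustration.

```python
# Fitted coefficients from the notes: g_hat = 0.688 + 0.374*x + 0.090*s
B0, B1, B2 = 0.688, 0.374, 0.090


def predict_college_gpa(hs_gpa, sat_hundreds):
    """Predicted college GPA from high school GPA and SAT (in hundreds of points)."""
    return B0 + B1 * hs_gpa + B2 * sat_hundreds


sat = 12.0  # hypothetical SAT of 1200, the same for both students
low = predict_college_gpa(2.5, sat)
high = predict_college_gpa(3.5, sat)

# One full letter grade more in high school, same SAT score:
partial_effect = high - low
print(f"difference in predicted college GPA: {partial_effect:.3f}")
```

Whatever value SAT is fixed at, the difference equals the slope on high school grades, because the SAT term cancels when the two predictions are subtracted.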
Suppose we want to graph the relationship between high school and college grades holding SAT scores constant at their mean, which is 1308 ($\bar{s} = 13.08$). Substituting,
$(\hat{g}_i \mid s_i = \bar{s}) = 0.688 + 0.374 \cdot x_i + 0.090 \cdot 13.08$
$= (0.688 + 0.090 \cdot 13.08) + 0.374 \cdot x_i$
$= 1.865 + 0.374 \cdot x_i$
How do you graph the relationship between the dependent variable and one of many explanatory variables? To do it, you need to hold the other explanatory variables constant, often at their sample means.
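The substitution can be sketched in a few lines: fold the SAT term, evaluated at the SAT mean used in the substitution (13.08 hundreds of points), into the intercept, then evaluate the resulting line over the GPA range. The helper name `conditional_line` is hypothetical, and the list of points stands in for whatever plotting tool you use.

```python
# Fitted coefficients from the notes: g_hat = 0.688 + 0.374*x + 0.090*s
B0, B1, B2 = 0.688, 0.374, 0.090
SAT_MEAN = 13.08  # SAT value used in the substitution, in hundreds of points


def conditional_line(b0, b1, b2, s_fixed):
    """Return (intercept, slope) of g_hat as a function of x with s held at s_fixed."""
    return b0 + b2 * s_fixed, b1


intercept, slope = conditional_line(B0, B1, B2, SAT_MEAN)
print(f"g_hat = {intercept:.3f} + {slope:.3f} * x")

# Points along the line for high school GPAs of 0 through 4, ready to plot.
points = [(x, intercept + slope * x) for x in (0.0, 1.0, 2.0, 3.0, 4.0)]
```

Fixing SAT at a different value shifts the line up or down but leaves its slope unchanged.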
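Multivariate OLS Fitted line: $(\hat{g}_i \mid s_i = \bar{s}) = 1.865 + 0.374 \cdot x_i$
Univariate OLS Fitted line: $\hat{g}_i = 1.233 + 0.548 x_i$
[Figure: both fitted lines, with college GPA on the vertical axis (0.00 to 4.00) and high school GPA on the horizontal axis (0.00 to 4.00).]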
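[Figure: the fitted line $\hat{g}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$ with an observation $(x_i, g_i)$, its fitted value $(x_i, \hat{g}_i)$, and the sample means $\bar{x}$ and $\bar{g}$. The deviation $g_i - \bar{g}$ is split into $\hat{g}_i - \bar{g}$ and the residual $e_i = g_i - \hat{g}_i$.]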
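Decomposition of Variance in g
$(g_i - \bar{g}) = (\hat{g}_i - \bar{g}) + e_i$
$(g_i - \bar{g})^2 = \left( (\hat{g}_i - \bar{g}) + e_i \right)^2 = (\hat{g}_i - \bar{g})^2 + 2 (\hat{g}_i - \bar{g}) e_i + e_i^2$
Demonstrate that:
$\sum (g_i - \bar{g})^2 = \sum (\hat{g}_i - \bar{g})^2 + \sum e_i^2$
Summing the squared deviations over $i$,
$\sum (g_i - \bar{g})^2 = \sum (\hat{g}_i - \bar{g})^2 + \sum e_i^2 + 2 \sum \hat{g}_i e_i - 2 \bar{g} \sum e_i$
The last two terms can be written as
$2n \left( \frac{\sum \hat{g}_i e_i}{n} \right) - 2n \bar{g} \left( \frac{\sum e_i}{n} \right)$
Both are zero: in an OLS regression with an intercept, the residuals average to zero and are uncorrelated with the fitted values.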
$\sum (g_i - \bar{g})^2 = \sum (\hat{g}_i - \bar{g})^2 + \sum e_i^2$
TSS = ESS + RSS
Dividing both sides by TSS,
$1 = \frac{ESS}{TSS} + \frac{RSS}{TSS}$
Define the coefficient of determination as
$R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS}$
The proportion of the total variation in the dependent variable that is explained by the independent variables.
$R^2$ is a measure of the overall fit of the estimated model.
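To see the decomposition at work, the sketch below fits a univariate OLS line by hand and checks numerically that TSS = ESS + RSS and that the two cross terms vanish. The six data points are made up for illustration (they are not the course data set), and the function name `ols_decomposition` is hypothetical.

```python
def ols_decomposition(x, g):
    """Fit g = b0 + b1*x by OLS and return the pieces of the variance decomposition."""
    n = len(x)
    x_bar = sum(x) / n
    g_bar = sum(g) / n

    # Closed-form OLS estimates for a single regressor plus intercept.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (gi - g_bar) for xi, gi in zip(x, g))
    b1 = sxy / sxx
    b0 = g_bar - b1 * x_bar

    g_hat = [b0 + b1 * xi for xi in x]            # fitted values
    e = [gi - gh for gi, gh in zip(g, g_hat)]     # residuals

    tss = sum((gi - g_bar) ** 2 for gi in g)      # total sum of squares
    ess = sum((gh - g_bar) ** 2 for gh in g_hat)  # explained sum of squares
    rss = sum(ei ** 2 for ei in e)                # residual sum of squares

    return {
        "tss": tss, "ess": ess, "rss": rss,
        "r2": ess / tss,
        "sum_e": sum(e),                                          # should be ~0
        "sum_ghat_e": sum(gh * ei for gh, ei in zip(g_hat, e)),   # should be ~0
    }


# Made-up illustrative data: high school GPA and college GPA for six students.
hs_gpa = [2.0, 2.5, 3.0, 3.2, 3.6, 3.9]
college_gpa = [2.1, 2.6, 2.4, 3.0, 3.1, 3.5]

out = ols_decomposition(hs_gpa, college_gpa)
print(f"TSS = {out['tss']:.4f}, ESS + RSS = {out['ess'] + out['rss']:.4f}")
print(f"R^2 = {out['r2']:.3f}")
```

Up to floating-point error, the two cross terms come out as zero, which is exactly why the identity holds, and ESS/TSS agrees with 1 − RSS/TSS.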
High school grades consistently predict college freshmen's grades more accurately than the SAT in both selective and nonselective colleges, and little predictive power is gained by combining the SAT with high school grades.
William Julius Wilson, The American Prospect, 1999
$g_i = \beta_0 + \beta_1 x_i + \varepsilon_i$: High school grades alone explain 16.3 percent of the variation in college grades.
$g_i = \beta_0 + \beta_1 x_i + \beta_2 s_i + \varepsilon_i$: High school grades and SAT scores together explain 25.9 percent of the variation in college grades.
Our evidence doesn’t support William Julius Wilson’s claim!
How hard a student works in high school is a better indication of their potential as a college student than a test that requires six hours on a Saturday.
Editorial, Penn State Daily Collegian, 2008