Omitted Variables
True Model:

$$y_i = \beta_0 + \beta_1 x_i + \beta_2 A_i + v_i \qquad (1)$$

where
• $y_i$ is the student's cumulative college GPA (1-4)
• $x_i$ is the student's high school GPA (1-4)
• $A_i$ is the student's academic ability
• $v_i$ is a stochastic error term with $E(v_i \mid x_i, A_i) = 0$
Empirical Specification:

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad \varepsilon_i = \beta_2 A_i + v_i \qquad (2)$$
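Taking expectations conditional on $x_i$,

$$E(y_i \mid x_i) = \beta_0 + \beta_1 x_i + E(\varepsilon_i \mid x_i)$$

where

$$E(\varepsilon_i \mid x_i) = \beta_2 E(A_i \mid x_i) + E(v_i \mid x_i) = \beta_2 \frac{\operatorname{cov}(A_i, x_i)}{\operatorname{var}(x_i)} x_i$$

The last step uses $E(v_i \mid x_i) = 0$ and treats $E(A_i \mid x_i)$ as linear in $x_i$ with slope $\operatorname{cov}(A_i, x_i)/\operatorname{var}(x_i)$, absorbing any intercept into $\beta_0$. Therefore

$$E(y_i \mid x_i) = \beta_0 + \left(\beta_1 + \beta_2 \frac{\operatorname{cov}(A_i, x_i)}{\operatorname{var}(x_i)}\right) x_i$$

Hence, we can rewrite (2) as

$$y_i = \beta_0 + \left(\beta_1 + \beta_2 \frac{\operatorname{cov}(A_i, x_i)}{\operatorname{var}(x_i)}\right) x_i + \tilde{v}_i, \qquad E(\tilde{v}_i \mid x_i) = 0$$

where $\tilde{v}_i = v_i + \beta_2\,[A_i - E(A_i \mid x_i)]$.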
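Regressing $y$ on $x$ produces $\hat\beta_1$, where

$$E(\hat\beta_1 \mid x_i) = \beta_1 + \beta_2 \frac{\operatorname{cov}(A_i, x_i)}{\operatorname{var}(x_i)}$$

• $\beta_1$: what we hoped to estimate.
• $\beta_2$, the coefficient on the omitted $A_i$ (+): students with greater ability are expected to have higher college GPAs.
• $\operatorname{cov}(A_i, x_i)$ (+): students with greater ability are expected to have higher high school GPAs.
• $\operatorname{var}(x_i)$ (+): variances are always positive.

Hence $\hat\beta_1$ is upward biased.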
Students who work harder in high school and, as a result, earn higher grades are likely to earn higher college grades, i.e., $\beta_1 > 0$. But if we don't control for students' academic abilities, high school grades will appear more important than they really are, because higher-ability students are likely to earn both higher high school grades ($\operatorname{cov}(A_i, x_i) > 0$) and higher college grades ($\beta_2 > 0$). Hence, higher high school grades explain higher college grades both directly and because they act as a proxy for the omitted variable, ability.
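The bias formula can be checked numerically. The sketch below is a minimal simulation under made-up assumptions (not from the slides): ability and errors are normal, and β0 = 1, β1 = 0.3, β2 = 0.5, with high school GPA built to rise with ability. It regresses y on x alone, as in specification (2), and compares the slope with β1 + β2·cov(A, x)/var(x). The function names `ols_slope` and `simulate_ovb` are hypothetical.

```python
import random

def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with intercept): cov(x, y) / var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def simulate_ovb(n=100_000, beta0=1.0, beta1=0.3, beta2=0.5, seed=0):
    """Simulate the true model (1), y = b0 + b1*x + b2*A + v, then estimate
    specification (2) by regressing y on x alone (ability A omitted)."""
    rng = random.Random(seed)
    A = [rng.gauss(0.0, 1.0) for _ in range(n)]               # ability (hypothetical units)
    x = [0.6 * a + rng.gauss(0.0, 0.8) for a in A]            # HS GPA rises with ability: cov(A, x) > 0
    y = [beta0 + beta1 * xi + beta2 * a + rng.gauss(0.0, 0.5)  # v drawn independently of x and A
         for xi, a in zip(x, A)]
    short_slope = ols_slope(x, y)                             # beta1-hat from specification (2)
    # Bias formula: beta1 + beta2 * cov(A, x) / var(x); the ratio is the slope of A on x.
    predicted = beta1 + beta2 * ols_slope(x, A)
    return short_slope, predicted

short, predicted = simulate_ovb()
print(f"short-regression slope: {short:.3f}")
print(f"beta1 + beta2*cov(A,x)/var(x): {predicted:.3f}")
print("true beta1: 0.3")
```

With these made-up parameters, cov(A, x)/var(x) = 0.6 in the population, so both printed numbers should land near 0.3 + 0.5 × 0.6 = 0.6, roughly double the true β1.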
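Suppose the SAT is a proxy variable for ability. In particular, keep the true model

$$y_i = \beta_0 + \beta_1 x_i + \beta_2 A_i + v_i \qquad (1)$$

and suppose ability depends on the SAT score $s_i$:

$$A_i = \delta_0 + \delta_1 s_i + u_i, \qquad E(u_i \mid x_i, s_i) = 0$$

In this case, we are assuming that students' high school grades don't help explain their academic abilities once SAT scores are accounted for. Substituting this equation into (1), we have

$$y_i = (\beta_0 + \beta_2\delta_0) + \beta_1 x_i + (\beta_2\delta_1) s_i + (v_i + \beta_2 u_i)$$

Rewriting,

$$y_i = \beta_0^* + \beta_1 x_i + \beta_2^* s_i + v_i^* \qquad (3)$$

where $\beta_0^* = \beta_0 + \beta_2\delta_0$, $\beta_2^* = \beta_2\delta_1$, and $v_i^* = v_i + \beta_2 u_i$.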
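Two Empirical Specifications

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad \varepsilon_i = \beta_2 A_i + v_i \qquad (2)$$

$$y_i = \beta_0^* + \beta_1 x_i + \beta_2^* s_i + v_i^* \qquad (3)$$

$$E\left(\hat\beta_1^{(2)} \mid x_i\right) = \beta_1 + \beta_2 \frac{\operatorname{cov}(A_i, x_i)}{\operatorname{var}(x_i)} > E\left(\hat\beta_1^{(3)} \mid x_i, s_i\right) = \beta_1$$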
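A minimal simulation sketch of the proxy-variable argument, assuming hypothetical parameters (β1 = 0.3, β2 = 0.5, δ0 = 0, δ1 = 1) and a standardized SAT score; none of these values come from the slides. High school GPA is linked to ability only through s, which mirrors the assumption E(u_i | x_i, s_i) = 0. The coefficient on x with s controlled is computed via the Frisch-Waugh-Lovell theorem (partial s out of x, then regress y on the residual), which is equivalent to running the multiple regression (3). Helper names are illustrative.

```python
import random

def cov(a, b):
    """Sample covariance (divides by n; the scaling cancels in the slope ratios below)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / n

def coef_on_x(y, x, s=None):
    """OLS coefficient on x, optionally controlling for s (intercept always included).

    With a control, this uses the Frisch-Waugh-Lovell theorem: remove the part of x
    explained by s, then regress y on what is left of x."""
    if s is None:
        return cov(x, y) / cov(x, x)
    g = cov(s, x) / cov(s, s)                        # slope from regressing x on s
    x_res = [xi - g * si for xi, si in zip(x, s)]    # x purged of s (a constant shift leaves covariances unchanged)
    return cov(x_res, y) / cov(x_res, x_res)

def simulate_proxy(n=100_000, beta1=0.3, beta2=0.5, delta0=0.0, delta1=1.0, seed=1):
    """Simulate A = d0 + d1*s + u with u independent of x and s, then compare
    specification (2) (y on x) with specification (3) (y on x and s)."""
    rng = random.Random(seed)
    s = [rng.gauss(0.0, 1.0) for _ in range(n)]                      # SAT score, standardized (hypothetical)
    x = [0.6 * si + rng.gauss(0.0, 0.8) for si in s]                 # HS GPA linked to ability only via s
    A = [delta0 + delta1 * si + rng.gauss(0.0, 0.5) for si in s]     # ability = d0 + d1*s + u
    y = [1.0 + beta1 * xi + beta2 * a + rng.gauss(0.0, 0.5)
         for xi, a in zip(x, A)]
    return coef_on_x(y, x), coef_on_x(y, x, s)

b1_spec2, b1_spec3 = simulate_proxy()
print(f"specification (2): {b1_spec2:.3f}   specification (3): {b1_spec3:.3f}   true beta1: 0.3")
```

Under these assumptions specification (2) should come out near 0.6 while specification (3) should be close to the true 0.3, matching the inequality above.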
Dependent Variable: College Grade Point Average (4-point scale)

                                   All Students   All Students
High School Grade Point Average    0.548***       0.374***
                                   (20.18)        (13.54)
Scholastic Aptitude Test (SAT)                    0.090***
                                                  (16.49)
Constant                           1.223***       0.688***
                                   (12.02)        (6.81)
Observations                       2096           2096
R-squared                          0.163          0.259

Absolute value of t-statistics in parentheses. * significant at 10%; ** significant at 5%; *** significant at 1%.
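As expected, the coefficient on high school GPA falls once we control for the SAT proxy: $\hat\beta_1^{(2)} = 0.548 > \hat\beta_1^{(3)} = 0.374$. People often tell high school students that they need to study hard to eventually do well in college. The corresponding estimate is $\hat\beta_1^{(3)} = 0.374$. What is the interpretation of $\hat\beta_1$?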
Dependent Variable: College Grade Point Average (4-point scale)

                                    Whites Only   Non-Whites    All Students
High School Grade Point Average     0.407***      0.353***      0.353***
                                    (7.10)        (11.16)       (11.31)
Scholastic Aptitude Test (SAT)      0.072***      0.091***      0.091***
                                    (6.35)        (14.21)       (14.40)
White-Non-Hispanic (1=yes)                                      0.136
                                                                (0.51)
Interaction term: White × HS GPA                                0.053
                                                                (0.79)
Interaction term: White × SAT                                   -0.019
                                                                (1.46)
Constant                            0.874***      0.738***      0.738***
                                    (3.69)        (6.50)        (6.58)
Observations                        593           1503          2096
R-squared                           0.165         0.266         0.267

Absolute value of t-statistics in parentheses. * significant at 10%; ** significant at 5%; *** significant at 1%.

Non-Hispanic white students appear to get a bigger boost from studying hard in high school than non-white students. But is the difference statistically significant?
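$$0.407 - 0.353 = 0.054$$

This difference matches, up to rounding, the White × HS GPA interaction coefficient (0.053) in the fully interacted model.
But the difference is not statistically significant! (t = 0.79 on the interaction term.)
• Whites Only and Non-Whites columns: regressions run on subsamples of whites and non-whites.
• All Students column: fully interacted model.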
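Why the interaction coefficient reproduces the subsample difference: when every regressor is interacted with the group dummy, pooled OLS is just a reparametrization of running separate regressions on each group. The sketch below checks this on simulated data with one regressor (the slides' model also interacts SAT, and the same identity holds regressor by regressor). The data-generating numbers, the group share, and the helper names `solve` and `ols` are all hypothetical; OLS is computed from the normal equations with a small Gaussian-elimination solver so that only the standard library is needed.

```python
import random

def solve(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(M, b)]           # augmented matrix [M | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):                            # back substitution
        z[r] = (aug[r][n] - sum(aug[r][c] * z[c] for c in range(r + 1, n))) / aug[r][r]
    return z

def ols(X, y):
    """OLS coefficients from the normal equations X'X b = X'y (X is a list of rows)."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

# Hypothetical data: D = 1 marks one group (e.g., white students); x plays the role of HS GPA.
rng = random.Random(2)
n = 2000
D = [1 if rng.random() < 0.3 else 0 for _ in range(n)]
x = [rng.gauss(3.0, 0.5) for _ in range(n)]
y = [0.7 + 0.35 * xi + di * (0.1 + 0.05 * xi) + rng.gauss(0.0, 0.4) for xi, di in zip(x, D)]

# Separate regressions on each subsample: [intercept, slope]
fit1 = ols([[1.0, xi] for xi, di in zip(x, D) if di == 1], [yi for yi, di in zip(y, D) if di == 1])
fit0 = ols([[1.0, xi] for xi, di in zip(x, D) if di == 0], [yi for yi, di in zip(y, D) if di == 0])

# Fully interacted pooled model: y = const + slope*x + group*D + inter*(D*x)
const, slope, group, inter = ols([[1.0, xi, di, di * xi] for xi, di in zip(x, D)], y)

print(f"slope difference across subsamples: {fit1[1] - fit0[1]:.6f}")
print(f"interaction coefficient:            {inter:.6f}")
```

The point estimates match exactly (up to floating-point error), so testing whether the subsample slopes differ amounts to testing the interaction coefficient. Standard errors need not match: with conventional standard errors the pooled model imposes one error variance on both groups, which is one reason t-statistics in the All Students column can differ slightly from those in the subsample columns.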
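Omitted Variables: Youth Smoking and Anti-Smoking Sentiment

True Model:

$$\ln q_{is} = \beta_0 + \beta_1 \ln p_{is} + \beta_2 A_s + v_{is}$$

where
• $q_{is}$ = cigarettes smoked per day by youth $i$ in state $s$
• $p_{is}$ = price per pack
• $A_s$ = anti-smoking sentiment in state $s$
• $v_{is}$ = stochastic error term with $E(v_{is} \mid \ln p_{is}, A_s) = 0$

Empirical Specification:

$$\ln q_{is} = \beta_0 + \beta_1 \ln p_{is} + \varepsilon_{is}, \qquad \varepsilon_{is} = \beta_2 A_s + v_{is}$$

where

$$E(\hat\beta_1 \mid \ln p_{is}) = \beta_1 + \beta_2 \frac{\operatorname{cov}(A_s, \ln p_{is})}{\operatorname{var}(\ln p_{is})}$$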
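• $\beta_1$: what we hoped to estimate.
• $\beta_2$ (−): stronger anti-smoking sentiment leads to less smoking, e.g., less smoking in public places.
• $\operatorname{cov}(A_s, \ln p_{is})$ (+): stronger anti-smoking sentiment leads to higher cigarette taxes, and hence higher prices.
• $\operatorname{var}(\ln p_{is})$ (+): variances are always positive.

The bias term has sign (−)(+)/(+) = (−), so $\hat\beta_1$ is downward biased: the estimated price elasticity is more negative than the true one.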
Dependent Variable: ln(Quantity of cigarettes per day)

                                    All Smokers   All Smokers
Ln(Price per pack)                  -0.350        -0.310
                                    (0.150)       (0.162)
Anti-smoking sentiment in state                   -0.160
                                                  (0.240)
Constant                            2.780         2.741
                                    (0.209)       (0.217)
Observations                        329           329
R-squared                           0.016         0.018

Standard errors in parentheses.
I chose to present standard errors because the most natural test here is whether our estimate implies that young smokers' demand for cigarettes is inelastic. That test (null hypothesis $\beta_1 = -1$) requires a different t-stat than the one produced by standard software programs, which tests $\beta_1 = 0$.
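A minimal sketch of that calculation, using the Ln(Price per pack) estimates and standard errors from the table. The helper name `t_stat` is illustrative; the one-sided alternative β1 > −1 (inelastic demand) and comparison with a normal critical value are assumptions, not something the slides spell out.

```python
def t_stat(estimate, se, null=0.0):
    """t-statistic for H0: beta = null, given a coefficient estimate and its standard error."""
    return (estimate - null) / se

# Ln(price) coefficients and standard errors from the table above
estimates = [("without sentiment", -0.350, 0.150), ("with sentiment", -0.310, 0.162)]

for label, b, se in estimates:
    t_unit = t_stat(b, se, null=-1.0)   # H0: beta1 = -1 (unit-elastic demand)
    t_zero = t_stat(b, se)              # H0: beta1 = 0 (what regression software usually reports)
    print(f"{label}: t vs -1 = {t_unit:.2f}, t vs 0 = {t_zero:.2f}")
```

The statistics against −1 are about 4.33 and 4.26, well above a conventional one-sided cutoff such as 1.645, which is the kind of evidence that would support calling youth demand inelastic under these assumptions.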
Dependent Variable: ln(Quantity of cigarettes per day)

                                                Female Smokers  Male Smokers    All Smokers
Ln(Price per pack)                              -0.267          -0.443          -0.443
                                                (0.210)         (0.212)         (0.206)
Female (1=yes)                                                                  -0.357
                                                                                (0.417)
Interaction term: Female × Ln(Price per pack)                                   0.176
                                                                                (0.299)
Constant                                        2.605           2.962           2.962
                                                (0.291)         (0.298)         (0.289)
Observations                                    156             173             329
R-squared                                       0.010           0.025           0.026

Standard errors in parentheses.
It appears that young women's demand for cigarettes is more inelastic than that of young men:

$$-0.267 - (-0.443) = 0.176$$

$$t\text{-stat} = \frac{0.176}{0.299} = 0.59$$

But the difference is not statistically significant!
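The same arithmetic as a short sketch, using the numbers from the table. The helper name `compare_groups` is hypothetical, and the 1.96 cutoff (two-sided test at 5%, normal approximation) is an assumption.

```python
def compare_groups(b_group1, b_group0, b_interaction, se_interaction, crit=1.96):
    """Check that the interaction coefficient equals the subsample difference,
    and test H0: no difference using the interaction's t-statistic."""
    diff = b_group1 - b_group0            # difference between subsample estimates
    t = b_interaction / se_interaction    # t-statistic on the interaction term
    return diff, t, abs(t) > crit         # two-sided test at the (assumed) 5% level

# Female vs. male ln(price) coefficients, interaction estimate and its standard error
diff, t, significant = compare_groups(-0.267, -0.443, 0.176, 0.299)
print(f"difference = {diff:.3f}, t = {t:.2f}, significant at 5%: {significant}")
```

The subsample difference reproduces the interaction coefficient, 0.176, and its t-statistic of about 0.59 falls well short of 1.96.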