Econ 495 - Econometric Review 1
Contents
2 Dummy Variables
2.1 The case of one dummy variable
2.2 Multiple categories
2.2.1 Interactions among Dummy Variables
2.2.2 Allowing for Different Slopes
2.3 Pooling independent cross-sections over time
2.4 Differences-in-Differences estimator and dummy variables
2 Dummy Variables
2.1 The case of one dummy variable
• Oftentimes, variables that take on continuous values are not available;
instead, dummy variables have to be used.
• For example, Statistics Canada often classifies continuous variables,
such as age, into categories to preserve confidentiality; education is
often available in the form of the highest degree or diploma attained.
• A dummy variable is a variable that takes on the value 1 or 0; dummy
variables are therefore also called binary variables.
• Examples: male (= 1 if the worker is male, 0 otherwise), part-time work status (= 1 if the worker is part-time, 0 otherwise), etc.
• Consider a simple model with one continuous variable X and one dummy D
Yi = β0 + β1Xi + δ0Di + ui
This can be interpreted as an intercept shift
If Di = 0, then Yi = β0 + β1Xi + ui
If Di = 1, then Yi = (β0 + δ0) + β1Xi + ui
where the case of D = 0 is the base or reference group
• Returning to our Canadian wage regression, let’s include a male dummy
log(wagesi) = β0 + β1educi + δ0malei + ui
. gen male=(sex==1)
. regress lrwage male schooling [weight=fweight]
(analytic weights assumed)
(sum of wgt is 2.2780e+06)
      Source |       SS       df       MS              Number of obs =    9720
-------------+------------------------------           F(  2,  9717) = 1156.78
       Model |  452.291246     2  226.145623           Prob > F      =  0.0000
    Residual |  1899.62851  9717   .19549537           R-squared     =  0.1923
-------------+------------------------------           Adj R-squared =  0.1921
       Total |  2351.91976  9719  .241991949           Root MSE      =  .44215
------------------------------------------------------------------------------
      lrwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   .2138704   .0089866    23.80   0.000     .1962548    .2314861
   schooling |   .0841955   .0019724    42.69   0.000     .0803292    .0880618
       _cons |   1.568616   .0268082    58.51   0.000     1.516066    1.621165
------------------------------------------------------------------------------
• Notice that we have included only 1 dummy when there are two groups.
Since female + male = 1, putting both in alongside the constant would have
resulted in perfect collinearity and one variable would have been dropped!
• This is perhaps the simplest example of the dummy variable trap.
• Since we have chosen females as our base group, β0 represents the
intercept for females and δ0 represents the male advantage.
• In terms of expectations, under the zero conditional mean assumption
E(u|male, schooling) = 0,
δ0 = E(lrwage|male = 1, schooling) − E(lrwage|male = 0, schooling)
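The collinearity behind the dummy variable trap can be checked numerically. The sketch below is only an illustration on a made-up sample of four workers (the data and the helper names `gram` and `det3` are assumptions, not part of the LFS example): when a constant, male and female are all included, X'X is singular, so OLS has no unique solution.

```python
# Hypothetical sample of four workers: 1 = male, 0 = female (illustrative only).
male = [1, 0, 1, 0]
female = [1 - m for m in male]   # female + male = 1 for every worker
const = [1] * len(male)

def gram(columns):
    """Return X'X for a design matrix given as a list of columns."""
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in columns] for ci in columns]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

# Constant, male and female together: X'X is singular (the trap).
det_trap = det3(gram([const, male, female]))

# Dropping one dummy gives a non-singular 2x2 X'X.
g = gram([const, male])
det_ok = g[0][0] * g[1][1] - g[0][1] * g[1][0]

print(det_trap, det_ok)
```

A zero determinant means X'X cannot be inverted, which is why the software drops one of the collinear variables.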
• δ0 = 0.214 is the difference in log hourly wage between males and
females, given the same amount of education (and the same error
term u)
• It says that for the same level of education, men earn about 21%
more than women. The exact calculation gives a somewhat larger figure: (wageM − wageF)/wageF = exp(0.2138704) − 1 = 0.238. Equivalently, women earn exp(−0.2138704) − 1 = −0.193, that is, about 19% less than men
• Of course, other factors would have to be taken into account to
determine whether this is a discrimination effect
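The conversion from a log-wage coefficient to an exact percentage difference can be reproduced with a few lines. This is a minimal sketch that only re-uses the reported coefficient on male; the variable names are illustrative.

```python
import math

delta0 = 0.2138704  # coefficient on male from the regression output above

# Exact proportional differences implied by a log-wage gap of delta0.
men_vs_women = math.exp(delta0) - 1    # men relative to women
women_vs_men = math.exp(-delta0) - 1   # women relative to men

print(f"men earn {men_vs_women:.1%} more than women")
print(f"women earn {-women_vs_men:.1%} less than men")
```

The two figures differ because they use different bases (women's wages in the first case, men's wages in the second); the coefficient itself lies between them.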
• Instead, we could have constructed a female dummy,
log(wagesi) = α0 + α1educi + γ0femalei + ui
. gen female=(sex==2)
. regress lrwage female schooling [weight=fweight]
(analytic weights assumed)
(sum of wgt is 2.2780e+06)
      Source |       SS       df       MS              Number of obs =    9720
-------------+------------------------------           F(  2,  9717) = 1156.78
       Model |  452.291246     2  226.145623           Prob > F      =  0.0000
    Residual |  1899.62851  9717   .19549537           R-squared     =  0.1923
-------------+------------------------------           Adj R-squared =  0.1921
       Total |  2351.91976  9719  .241991949           Root MSE      =  .44215
------------------------------------------------------------------------------
      lrwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |  -.2138704   .0089866   -23.80   0.000    -.2314861   -.1962548
   schooling |   .0841955   .0019724    42.69   0.000     .0803292    .0880618
       _cons |   1.782486    .026398    67.52   0.000      1.73074    1.834232
------------------------------------------------------------------------------
• Notice that −γ0 = δ0: the coefficients of the dummy variables are of the same magnitude but of opposite sign
• Also α0 = β0 + δ0 and β0 = α0 + γ0
• It does not matter which group is chosen to be the base group, but keeping track of which group is the base group is important for the interpretation
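These identities can be verified directly from the two sets of reported coefficients. The snippet below is a simple arithmetic check, not a re-estimation; the variable names follow the notation of the two equations.

```python
# Coefficients reported in the two regressions above.
beta0, delta0 = 1.568616, 0.2138704     # male dummy, females as base group
alpha0, gamma0 = 1.782486, -0.2138704   # female dummy, males as base group

# Switching the base group only relabels the same two fitted intercepts.
intercept_female = beta0            # should match alpha0 + gamma0
intercept_male = beta0 + delta0     # should match alpha0

print(round(intercept_female, 4), round(alpha0 + gamma0, 4))
print(round(intercept_male, 4), round(alpha0, 4))
```

Both parameterizations describe the same two parallel lines, which is why the fit statistics of the two regressions are identical.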
2.2 Multiple categories
• We can use dummy variables to control for something with multiple categories
• In our LFS data, education was initially available in terms of 7 categories, as in Table 1
• In this case, we can construct dummy variables for each category, but we have to omit one category
• We could let Stata choose the category by including edd* in the list of explanatory variables, but often it is best to choose it ourselves
• An intermediate category that has a sufficiently large number of observations is a good choice; in the context of wage regressions, high school graduates are often the base group
• Because the base group is absorbed in the intercept, if there are n categories there should be n − 1 dummy variables
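The n − 1 rule can be made concrete with a small helper. This is a hedged sketch: `make_dummies` is a hypothetical function written for illustration, and the six education codes are made up rather than taken from the LFS file.

```python
def make_dummies(values, base):
    """Return {category: 0/1 list} for every category except the base group."""
    categories = sorted(set(values))
    if base not in categories:
        raise ValueError("base group must be one of the observed categories")
    return {c: [int(v == c) for v in values] for c in categories if c != base}

# Hypothetical education codes for six workers (0-6, as in the LFS coding).
educ = [2, 5, 0, 2, 6, 4]
dummies = make_dummies(educ, base=2)   # high school graduates as base group

for category, column in dummies.items():
    print(category, column)
```

Workers in the base group have zeros in every column, so their mean is picked up by the intercept.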
. tab educ90, gen(edd)
    highest |
educational |
 attainment |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        547        4.61        4.61   /* 0 to 8 years */
          1 |      1,878       15.82       20.43   /* Some secondary */
          2 |      2,459       20.72       41.15   /* Grade 11 to 13 */
          3 |      1,088        9.17       50.31   /* Some post secondary */
          4 |      4,002       33.72       84.03   /* Post secondary diploma */
          5 |      1,321       11.13       95.16   /* Bachelors */
          6 |        575        4.84      100.00   /* Graduate degree */
------------+-----------------------------------
      Total |     11,870      100.00
. regress lrwage edd1 edd2 edd4 edd5 edd6 edd7 [weight=fweight] /* edd3 omitted */
(analytic weights assumed)
(sum of wgt is 2.2780e+06)
      Source |       SS       df       MS              Number of obs =    9720
-------------+------------------------------           F(  6,  9713) =  316.00
       Model |  384.123205     6  64.0205341           Prob > F      =  0.0000
    Residual |  1967.79655  9713  .202594106           R-squared     =  0.1633
-------------+------------------------------           Adj R-squared =  0.1628
       Total |  2351.91976  9719  .241991949           Root MSE      =   .4501
------------------------------------------------------------------------------
      lrwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        edd1 |  -.1295773   .0241357    -5.37   0.000    -.1768883   -.0822663
        edd2 |   -.173165   .0157088   -11.02   0.000    -.2039575   -.1423724
        edd4 |  -.0482456   .0173476    -2.78   0.005    -.0822504   -.0142407
        edd5 |   .1495457   .0126066    11.86   0.000     .1248341    .1742573
        edd6 |   .3869386   .0163941    23.60   0.000     .3548028    .4190744
        edd7 |   .5895479   .0226934    25.98   0.000     .5450641    .6340316
       _cons |   2.692276   .0097833   275.19   0.000     2.673099    2.711454
------------------------------------------------------------------------------
• The interpretation of the wage premiums is by comparison with high
school educated workers
• Since the dependent variable is log(wages), the coefficients of edd6
and edd7 mean that workers with a bachelor’s degree make about 39%
more than workers with a high school degree and that percentage is
about 59% for workers with a graduate degree (these are log-point
approximations; the exact differences, exp(coefficient) − 1, are larger)
• If there are a lot of categories, it may make sense to group some
together
. gen lesshs=0
. replace lesshs=1 if educ90<=1
(2425 real changes made)
. gen hs=0
. replace hs=1 if educ90==2
(2459 real changes made)
. gen somecol=0
. replace somecol=1 if educ90==3 | educ90==4
(5090 real changes made)
. gen univ=0
. replace univ=1 if educ90==5 | educ90==6
(1896 real changes made)
. sum lrwage lesshs hs somecol univ [weight=fweight]
(analytic weights assumed)
    Variable |     Obs      Weight        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------------------
      lrwage |    9720     2278028    2.783154   .4919268   1.261226   4.371922
      lesshs |    9720     2278028    .1807976   .3848702          0          1
          hs |    9720     2278028    .2177673   .4127496          0          1
     somecol |    9720     2278028    .4312809   .4952807          0          1
        univ |    9720     2278028    .1701542   .3757875          0          1
. regress lrwage lesshs somecol univ [weight=fweight]
(analytic weights assumed)
(sum of wgt is 2.2780e+06)
      Source |       SS       df       MS              Number of obs =    9720
-------------+------------------------------           F(  3,  9716) =  547.22
       Model |  339.954055     3  113.318018           Prob > F      =  0.0000
    Residual |   2011.9657  9716  .207077573           R-squared     =  0.1445
-------------+------------------------------           Adj R-squared =  0.1443
       Total |  2351.91976  9719  .241991949           Root MSE      =  .45506
------------------------------------------------------------------------------
      lrwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lesshs |   -.162843   .0146856   -11.09   0.000    -.1916297   -.1340562
     somecol |   .1029682   .0121338     8.49   0.000     .0791835    .1267529
        univ |   .4461324   .0149344    29.87   0.000     .4168579     .475407
       _cons |   2.692276   .0098909   272.20   0.000     2.672888    2.711665
------------------------------------------------------------------------------
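The grouping of the seven categories into four can be cross-checked against the tabulation. The sketch below uses only the frequencies shown in the `tab educ90` output; the dictionary names are illustrative.

```python
# Frequencies of educ90 from the tabulation above (codes 0 to 6).
freq = {0: 547, 1: 1878, 2: 2459, 3: 1088, 4: 4002, 5: 1321, 6: 575}

# Mapping that mirrors the gen/replace commands.
groups = {
    "lesshs": [0, 1],
    "hs": [2],
    "somecol": [3, 4],
    "univ": [5, 6],
}

counts = {name: sum(freq[c] for c in codes) for name, codes in groups.items()}
print(counts)
```

The group counts agree with the "real changes made" messages reported by Stata, and they add up to the 11,870 observations in the tabulation.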
• In the case of educational attainment, the dummy variables reflect the choices of individuals. The question of causality is a central issue. Are
individuals with a university degree earning higher wages because they
are more able individuals or because their education made them more productive?
2.2.1 Interactions among Dummy Variables
• Interacting dummy variables is like subdividing the groups
• Suppose we would like to know whether the returns to education are
the same for men and women. Using the 4 education categories above,
there will be 8 groups, so we need 7 dummy variables
• With high school*female as the base group, the regression is
. gen lhsmale=lesshs*male
. gen scolmale=somecol*male
. gen univmale=univ*male
. regress lrwage male lesshs somecol univ lhsmale scolmale univmale [weight=fweight]
(analytic weights assumed)
(sum of wgt is 2.2780e+06)
      Source |       SS       df       MS              Number of obs =    9720
-------------+------------------------------           F(  7,  9712) =  332.96
       Model |  455.186694     7  65.0266706           Prob > F      =  0.0000
    Residual |  1896.73306  9712  .195297885           R-squared     =  0.1935
-------------+------------------------------           Adj R-squared =  0.1930
       Total |  2351.91976  9719  .241991949           Root MSE      =  .44193
------------------------------------------------------------------------------
      lrwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |    .183935   .0192114     9.57   0.000     .1462766    .2215933
      lesshs |  -.2436651   .0215678   -11.30   0.000    -.2859426   -.2013877
     somecol |   .0798044   .0166586     4.79   0.000     .0471502    .1124587
        univ |    .457988   .0207182    22.11   0.000      .417376       .4986
     lhsmale |   .1002952   .0288863     3.47   0.001     .0436721    .1569184
    scolmale |   .0425959    .023568     1.81   0.071    -.0036024    .0887941
    univmale |  -.0310315     .02902    -1.07   0.285    -.0879167    .0258538
       _cons |   2.600929   .0135387   192.11   0.000      2.57439    2.627467
------------------------------------------------------------------------------
• To illustrate the meaning of the interactions, let’s consider the
correspondence between the groups and which dummies are turned on
              Male  Female  True if
lesshs          1      0    male=1 & lesshs=1 & lhsmale=1
lesshs          0      1    lesshs=1
high school     1      0    male=1
high school     0      1    no dummy is turned on, the constant is the intercept for this base group
somecol         1      0    male=1 & somecol=1 & scolmale=1
somecol         0      1    somecol=1
univ            1      0    male=1 & univ=1 & univmale=1
univ            0      1    univ=1
• The coefficient on univ is the premium (about 46%) that a woman with
a university degree gets by comparison with high school educated women
• The interaction univmale is the additional premium, on top of the male
coefficient, that a university educated male gets by comparison with a
university educated female; it is not significant, but of course he still
gets the male premium!
• The significant interaction lhsmale tells us that there is a case for
allowing the education coefficients to differ by gender
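To see how the dummies combine, the predicted mean log wage of each of the 8 groups can be rebuilt from the reported coefficients. This is a sketch for illustration only; the dictionary `b` and the helper `predicted` are hypothetical names, and the numbers are copied from the output above.

```python
# Coefficients from the interacted regression above.
b = {"_cons": 2.600929, "male": 0.183935, "lesshs": -0.2436651,
     "somecol": 0.0798044, "univ": 0.457988, "lhsmale": 0.1002952,
     "scolmale": 0.0425959, "univmale": -0.0310315}

interaction = {"lesshs": "lhsmale", "somecol": "scolmale", "univ": "univmale"}

def predicted(educ, male):
    """Predicted mean log wage for an education group ('hs' is the base)."""
    value = b["_cons"] + b["male"] * male
    if educ != "hs":
        value += b[educ] + b[interaction[educ]] * male
    return value

for educ in ["lesshs", "hs", "somecol", "univ"]:
    gap = predicted(educ, 1) - predicted(educ, 0)
    print(f"{educ:8s} female={predicted(educ, 0):.3f} "
          f"male={predicted(educ, 1):.3f} gap={gap:.3f}")
```

The male-female gap within an education group is the male coefficient plus the relevant interaction, so among university graduates it is 0.184 − 0.031 ≈ 0.153.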
2.2.2 Allowing for Different Slopes
• We can also interact a dummy variable with a continuous variable
• Let’s use the continuous version of educational attainment, to estimate
log(wagesi) = β0 + β1educi + δ0malei + δ1malei ∗ educi + ui
. gen schomale=schooling*male
. regress lrwage male schooling schomale [weight=fweight]
(analytic weights assumed)
(sum of wgt is 2.2780e+06)
      Source |       SS       df       MS              Number of obs =    9720
-------------+------------------------------           F(  3,  9716) =  779.09
       Model |  456.064761     3  152.021587           Prob > F      =  0.0000
    Residual |    1895.855  9716  .195127109           R-squared     =  0.1939
-------------+------------------------------           Adj R-squared =  0.1937
       Total |  2351.91976  9719  .241991949           Root MSE      =  .44173
------------------------------------------------------------------------------
      lrwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   .4436921   .0530265     8.37   0.000     .3397492    .5476351
   schooling |   .0942717   .0030221    31.19   0.000     .0883478    .1001957
    schomale |  -.0175286    .003986    -4.40   0.000    -.0253419   -.0097153
       _cons |    1.43575   .0403754    35.56   0.000     1.356606    1.514894
------------------------------------------------------------------------------
• In this regression β0 = 1.436 is the intercept for females, and β0 + δ0 =
1.436 + 0.444 ≈ 1.88 is the intercept for males
• β1 = 0.0943 is the slope for females and β1 + δ1 = 0.0943 − 0.0175 ≈
0.077 is the slope for males.
• It is typical to find that the wage equation for women has a lower
intercept but a steeper slope. It means that women earn less than
men at low levels of education but that the gap narrows as education
increases.
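The group-specific intercepts and slopes, and the way the gap shrinks with schooling, follow mechanically from the four coefficients. The sketch below re-uses the reported estimates; the helper `gap` and the schooling levels of 12 and 16 years are chosen here only for illustration.

```python
b0, d0 = 1.43575, 0.4436921       # _cons and male
b1, d1 = 0.0942717, -0.0175286    # schooling and schomale

def gap(schooling):
    """Male minus female predicted log wage at a given level of schooling."""
    return d0 + d1 * schooling

male_intercept = b0 + d0
male_slope = b1 + d1
crossing = -d0 / d1   # schooling at which the two fitted lines would meet

print(round(male_intercept, 4), round(male_slope, 4))
print(round(gap(12), 3), round(gap(16), 3), round(crossing, 1))
```

Under these estimates the fitted lines would only meet at roughly 25 years of schooling, so the gap narrows but stays positive over any realistic range of education.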
2.3 Pooling independent cross-sections over time
• Many cross-sectional surveys, such as the Labour Force Survey, are
repeated over time
– We may want to pool cross sections just to get bigger sample sizes
– or to investigate the effect of time
– or to investigate whether relationships have changed over time
• To reflect the fact that the population may have different distributions,
we allow the intercept to differ across the time periods
• This is done by introducing year dummy variables for all but one year
• We can also interact the time dummy with a key variable to look for
changes over time
• For example, let’s see if there has been significant progress in the
gender wage gap in Canada from October 1997 to October 2004
. use c:\data\lfs1004.dta
. append using c:\data\lfs1097.dta
. gen time=0
. replace time=1 if survyear==2004
(59140 real changes made)
. gen female=(sex==2)
. gen femtime=female*time
. regress lrwage female schooling time femtime [weight=fweight]
(analytic weights assumed)
(sum of wgt is 2.4898e+07)
      Source |       SS       df       MS              Number of obs =   98706
-------------+------------------------------           F(  4, 98701) = 6092.42
       Model |  4893.05826     4  1223.26457           Prob > F      =  0.0000
    Residual |  19817.6582 98701  .200784777           R-squared     =  0.1980
-------------+------------------------------           Adj R-squared =  0.1980
       Total |  24710.7165 98705  .250349187           Root MSE      =  .44809
------------------------------------------------------------------------------
      lrwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |  -.2091473   .0042243   -49.51   0.000    -.2174269   -.2008678
   schooling |   .0902508   .0006366   141.76   0.000      .089003    .0914986
        time |  -.0149797   .0039705    -3.77   0.000    -.0227618   -.0071975
     femtime |   .0077309   .0057311     1.35   0.177     -.003502    .0189639
       _cons |   1.700167   .0088124   192.93   0.000     1.682895    1.717439
------------------------------------------------------------------------------
An alternative command is:
. xi: regress lrwage schooling i.female*time [weight=fweight]
i.female          _Ifemale_0-1        (naturally coded; _Ifemale_0 omitted)
i.female*time     _IfemXtime_#        (coded as above)
(analytic weights assumed)
(sum of wgt is 2.4898e+07)
      Source |       SS       df       MS              Number of obs =   98706
-------------+------------------------------           F(  4, 98701) = 6092.42
       Model |  4893.05826     4  1223.26457           Prob > F      =  0.0000
    Residual |  19817.6582 98701  .200784777           R-squared     =  0.1980
-------------+------------------------------           Adj R-squared =  0.1980
       Total |  24710.7165 98705  .250349187           Root MSE      =  .44809
------------------------------------------------------------------------------
      lrwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   schooling |   .0902508   .0006366   141.76   0.000      .089003    .0914986
  _Ifemale_1 |  -.2091473   .0042243   -49.51   0.000    -.2174269   -.2008678
        time |  -.0149797   .0039705    -3.77   0.000    -.0227618   -.0071975
_IfemXtime_1 |   .0077309   .0057311     1.35   0.177     -.003502    .0189639
       _cons |   1.700167   .0088124   192.93   0.000     1.682895    1.717439
------------------------------------------------------------------------------
• Since the coefficient on female*time (0.0077) is not statistically
significant, we do not find a significant improvement in the gender wage
gap, controlling for education
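The year-specific gender gaps and the test on the interaction can be recovered from the reported numbers. This is a minimal sketch; it uses a normal approximation (critical value 1.96) for the confidence interval, which is an assumption made here and is reasonable only because the sample is very large.

```python
female, femtime = -0.2091473, 0.0077309   # coefficients from the pooled regression
se_femtime = 0.0057311                    # standard error of femtime

gap_1997 = female              # time = 0
gap_2004 = female + femtime    # time = 1

t_stat = femtime / se_femtime
# Normal approximation to the 95% interval.
ci = (femtime - 1.96 * se_femtime, femtime + 1.96 * se_femtime)

print(round(gap_1997, 4), round(gap_2004, 4), round(t_stat, 2))
print(tuple(round(x, 4) for x in ci))
```

The interval contains zero, which is the same conclusion as the p-value of 0.177 in the output.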
2.4 Differences-in-Differences estimator and dummy variables
• Experimental data are often panel data, that is, observations on the
same individuals before and after treatment, as would be the case in
the medical field
• In economics, although experimental data are now relatively more
common, they are not always available
• Often, we have to use data from what we call a quasi-experiment or
a natural experiment
• A natural experiment occurs when some exogenous event, either truly
natural or the result of a policy change, changes the environment in
which individuals, economic agents or countries operate
• Like a true experiment, it has a treatment group which is affected
by the policy change and a control group which is thought not to be
affected by the policy change
• Unlike a true experiment, the treatment and control groups are not
chosen randomly
• Sometimes controlling for other variables helps make the group
assignment closer to random; sometimes an instrumental variable is needed
• Let $\bar{Y}_{T,before}$ be the sample average of Y for those in the treatment
group before the experiment and $\bar{Y}_{T,after}$ be the sample average for the
treatment group after the experiment.
• Similarly, let $\bar{Y}_{C,before}$ and $\bar{Y}_{C,after}$ denote the corresponding
pre-treatment and post-treatment sample averages for the control group
• The average change in Y over the course of the experiment for those
in the treatment group is $\Delta\bar{Y}_T = \bar{Y}_{T,after} - \bar{Y}_{T,before}$
• The average change in Y over the course of the experiment for those
in the control group is $\Delta\bar{Y}_C = \bar{Y}_{C,after} - \bar{Y}_{C,before}$
• The differences-in-differences estimator is
$$\hat{\delta}_{DID} = (\bar{Y}_{T,after} - \bar{Y}_{T,before}) - (\bar{Y}_{C,after} - \bar{Y}_{C,before}) = \Delta\bar{Y}_T - \Delta\bar{Y}_C$$
• If the treatment is randomly assigned, then $\hat{\delta}_{DID}$ is an unbiased and
consistent estimator of the causal effect
• To test whether $\hat{\delta}_{DID}$ is statistically different from zero, we can apply
regression analysis to the pooled data set
$$Y = \beta_0 + \delta_0\,after + \beta_1\,treated + \delta_{DID}\,(treated \times after) + u$$
• We can also add other X’s to the regression to control for factors that
might make the treatment and control group different
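The link between the four group averages and the regression coefficients can be sketched as follows. Without additional X's the regression is saturated, so its coefficients are simple functions of the four means. The function names and the numerical averages below are hypothetical and serve only as an illustration.

```python
def did(y_t_before, y_t_after, y_c_before, y_c_after):
    """Differences-in-differences from four group averages."""
    return (y_t_after - y_t_before) - (y_c_after - y_c_before)

def saturated_coefficients(y_t_before, y_t_after, y_c_before, y_c_after):
    """Coefficients of Y on after, treated and treated*after implied by the means."""
    beta0 = y_c_before                   # control group, before
    delta0 = y_c_after - y_c_before      # common time effect
    beta1 = y_t_before - y_c_before      # pre-existing group difference
    delta_did = did(y_t_before, y_t_after, y_c_before, y_c_after)
    return beta0, delta0, beta1, delta_did

# Hypothetical group averages, for illustration only.
means = dict(y_t_before=10.0, y_t_after=14.0, y_c_before=9.0, y_c_after=11.0)
beta0, delta0, beta1, delta_did = saturated_coefficients(**means)

# The four coefficients reproduce the treated-after cell exactly.
fitted_t_after = beta0 + delta0 + beta1 + delta_did
print(delta_did, fitted_t_after)
```

In this made-up example the treatment group gains 4 and the control group gains 2, so the estimated effect is 2. The advantage of the regression form is that it also delivers a standard error.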
• Consider the policy experiment of the 2001 Pay Equity Law in the
Province of Quebec
• Using women from that province as the treated group, and those in
the Rest of Canada [ROC] as controls, we can run the regression of
the wages of women, controlling for education
. gen quebec=0
. replace quebec=1 if prov>=23 & prov<=25
(21320 real changes made)
. gen quetime=quebec*time
. regress lrwage schooling time quebec quetime [weight=fweight] if female==1
(analytic weights assumed)
(sum of wgt is 1.1970e+07)
      Source |       SS       df       MS              Number of obs =   48106
-------------+------------------------------           F(  4, 48101) = 3047.96
       Model |   2266.6519     4  566.662975           Prob > F      =  0.0000
    Residual |  8942.73084 48101  .185915695           R-squared     =  0.2022
-------------+------------------------------           Adj R-squared =  0.2021
       Total |  11209.3827 48105  .233019078           Root MSE      =  .43118
------------------------------------------------------------------------------
      lrwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   schooling |      .1004   .0009101   110.32   0.000     .0986162    .1021838
        time |  -.0109533   .0045211    -2.42   0.015    -.0198147   -.0020918
      quebec |  -.0119871   .0069314    -1.73   0.084    -.0255726    .0015985
     quetime |   .0045126   .0093351     0.48   0.629    -.0137843    .0228095
       _cons |   1.359245   .0125311   108.47   0.000     1.334684    1.383806
------------------------------------------------------------------------------
• Although positive, the coefficient on quetime is not significant.
• Perhaps the Quebec law had some positive spillovers on women’s wages
in ROC, or there may be a specific Quebec trend. We may try to use
men as an additional control group.
• To do this, we will look at the change in the gender gap by interacting
quetime with the female dummy. Of course, we also need to include the
three dummies and all of their pairwise interactions, for a total of
2*2*2-1=7 dummy terms
• This will be called a triple difference (DDD) estimator
• The regression framework allows us to control for schooling and obtain
standard errors easily.
• Without controlling for schooling, this would give the following table
Table 1: Average Log Wages
1997-2004 Change                         Women               Men
                                     Quebec     ROC     Quebec     ROC
      (1) 1997                        2.676    2.691     2.845    2.891
      (2) 2004                        2.705    2.704     2.860    2.894
1st D (3) Row (2) - Row (1)           0.029    0.013     0.014    0.003
2nd D (4) Quebec (3) - ROC (3)        0.016              0.011
3rd D (5) Women (4) - Men (4)         0.005
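The three rounds of differencing in Table 1 can be retraced step by step. The sketch below starts from the rounded averages printed in rows (1) and (2); the helper names `change` and `did` are illustrative. Because the inputs are rounded to three decimals, the results can differ from the table in the last digit (the table was presumably computed from unrounded averages).

```python
# Average log wages from Table 1, as (1997, 2004) pairs.
means = {
    ("women", "quebec"): (2.676, 2.705),
    ("women", "roc"): (2.691, 2.704),
    ("men", "quebec"): (2.845, 2.860),
    ("men", "roc"): (2.891, 2.894),
}

def change(sex, region):
    """First difference: 2004 minus 1997."""
    before, after = means[(sex, region)]
    return after - before

def did(sex):
    """Second difference: Quebec change minus ROC change."""
    return change(sex, "quebec") - change(sex, "roc")

ddd = did("women") - did("men")   # third difference
print(round(did("women"), 3), round(did("men"), 3), round(ddd, 3))
```

Either way the triple difference is well under one log point, which is consistent with the small interaction estimated in the regression.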
. gen feqtime=female*quetime
. gen femque=female*quebec
. gen feqctime=female*quebec*time
. regress lrwage schooling female time quebec femtime quetime femque feqctime [weight=fweight]
(analytic weights assumed)
(sum of wgt is 2.4898e+07)
      Source |       SS       df       MS              Number of obs =   98706
-------------+------------------------------           F(  8, 98697) = 3054.38
       Model |   4903.7356     8   612.96695           Prob > F      =  0.0000
    Residual |  19806.9809 98697  .200684731           R-squared     =  0.1984
-------------+------------------------------           Adj R-squared =  0.1984
       Total |  24710.7165 98705  .250349187           Root MSE      =  .44798
------------------------------------------------------------------------------
      lrwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   schooling |   .0902054   .0006366   141.71   0.000     .0889577     .091453
      female |  -.2144925   .0048316   -44.39   0.000    -.2239623   -.2050227
        time |  -.0158113     .00455    -3.48   0.001    -.0247293   -.0068934
      quebec |  -.0344593   .0067991    -5.07   0.000    -.0477854   -.0211332
     femtime |   .0072471   .0065576     1.11   0.269    -.0056058    .0200999
     quetime |   .0036344   .0092983     0.39   0.696      -.01459    .0218589
      femque |   .0221537   .0099396     2.23   0.026     .0026723    .0416352
    feqctime |   .0020585   .0134833     0.15   0.879    -.0243687    .0284856
       _cons |   1.708992   .0089766   190.38   0.000     1.691398    1.726586
------------------------------------------------------------------------------
• Now femque captures the advantage of women (relative to men) in Quebec
vs. ROC and is significant at the 5% level
• But the difference over time, before and after the implementation of
the law (the coefficient on feqctime), is not significant!
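The set of seven dummy terms in the triple-difference regression can be enumerated mechanically, which helps check that no interaction has been left out. This is an illustrative sketch; the mapping to variable names simply restates the `gen` commands used in this section.

```python
from itertools import combinations

dummies = ["female", "quebec", "time"]

# Every non-empty subset of the three dummies is one regressor.
terms = ["*".join(subset)
         for size in range(1, len(dummies) + 1)
         for subset in combinations(dummies, size)]

# Names used for these terms in the regression above.
stata_names = {
    "female": "female", "quebec": "quebec", "time": "time",
    "female*quebec": "femque", "female*time": "femtime",
    "quebec*time": "quetime", "female*quebec*time": "feqctime",
}

for term in terms:
    print(f"{term:20s} -> {stata_names[term]}")
```

Three main effects, three pairwise interactions and one triple interaction give the 2*2*2-1=7 terms; the coefficient on the triple interaction is the DDD estimate.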