buec 333: an introduction to introduction to …pendakur/teaching/buec333/lecture 07 sep...
TRANSCRIPT
BUEC 333:
An Introduction to
Introduction to Econometrics
Krishna Pendakur, Economics, SFU
What is Econometrics?
• Econometrics is the application of economic theory
and statistical methods to analyze economic data.
• It's both a science and an art.
– There's always a disconnect between economic
theory and the real world. Creatively bridging this gap
to apply economic theory to real-world data is an a
– Economic data are rarely "perfect" (experimental) --
sometimes requires some finesse to get believable
results out of them.
– Economic theory and statistics are grounded in
mathematics. Direct application of either is scientific.
Econometric Questions
• Magnitudes and Tests
• Does reducing class size improve the quality of education? – Common sense tells us that students get more attention in smaller classes,
and therefore probably learn more. But how much more? Reducing class size costs money (need more teachers). Does the improved quality of education justify the expense? Here, the magnitude of improvement matters.
• What will be the value of the S&P 500 one year from today?– Knowing (with certainty) the answer to this question would be worth a lot.
– A good (accurate) estimate is also valuable.
• Do Immigrants Face Discrimination in the Labour Market?– This is a yes/no question. We can use statistics to test whether or not
they face disparity.
So What?
• Knowing some basic econometrics can be very lucrative.– Lots of jobs, some pay very well.
• Knowing your way around some econometric software can too.
• Useful for future study (4th year courses -- you'll see/read quite a bit of empirical work, and maybe do some).
• Even if you end up doing something completely different, it's useful for understanding the world around you.– Managers need to understand where estimates come from and what they tell
you (and what they don't!)
– Helpful for understanding real-world statistics (polls, press about empirical studies, "facts" and arguments, etc.)
– Helpful for understanding finance (alphas, betas, R2, etc.)
• Good Policy Uses Econometric Analysis– Magnitudes Matter
– Causation Matters
BUEC 333 Deliverables…
• In this class, you will learn the statistical methods you would use to answer these questions (and many more ...)
• You will learn how to assess whether a particular estimate/analysis is any good.
• You will learn to use a specific statistical software package (EViews)
• You will get some practice working with real data.
Today
• Correlation
• Magnitudes Matter
• Statistical Precision Matters
• One Dimension Isn’t Enough
• Regression
• Causation
• Instruments, Natural Experiments and Field
Experiments
Some Data
• Long Example: Do immigrants earn less?
– How? Why?
• 2006 Census public-use microdata, sample of
from Census long forms.
• 3% sample of population of BC, about 100,000
observations of individuals
• Earnings from Wages and Salaries (ie.,
working for other people) in 2005
What is Microdata? (Excel)
24119211299106
1521819115116
1321010118146
103199199956
1218888119146
7412628129966
12199999113166
24121221199126
2211314118146
1311415118126
63188117116
75144199916
14123231113186
10311011299921
121999991199116
7512628129966
8412527119986
121999991110186
1221010118116
751121219256
8311213199916
CFSTATCFSIZECFINEFCFINC_ATCFINCBFNMEMBATTSCHAGEIMMAGEGRPABOID
What is Microdata?
• But, there are 56,530 rows and 124 columns
• A bit cumbersome
• Need to summarize this information
– Cross-Tabs
– Tables of Mean
– Regressions
Correlation
• When one number is high, do we expect
another number to be high (or low)?
• When a person is an immigrant, are their
earnings lower?
• Need to map out (at least) immigration status
and earnings
The Cross-Tab
Cross-Tabs show the numbers of observations (rows of the data)that have values in different ranges for different variables.They show how the data are clumped, and how the data vary with each other.
TTTToooottttaaaallll 11116666,,,,666633330000 11111111,,,,555588883333 11111111,,,,555511113333 7777,,,,333300008888 44447777,,,,000033334444 FFFFoooorrrreeeeiiiiggggnnnn----bbbboooorrrrnnnn PPPPeeeerrrrmmmm RRRReeeessss 8888,,,,888811114444 5555,,,,111133332222 4444,,,,888877778888 2222,,,,333355555555 22221111,,,,111177779999 CCCCaaaannnnaaaaddddiiiiaaaannnn----bbbboooorrrrnnnn 7777,,,,222288889999 6666,,,,222211112222 6666,,,,555522220000 4444,,,,888888881111 22224444,,,,999900002222 TTTTeeeemmmmppppoooorrrraaaarrrryyyy RRRReeeessssiiiiddddeeeennnnttttssss 555522227777 222233339999 111111115555 77772222 999955553333 IIIImmmmmmmmiiiiggggrrrraaaannnntttt ssssttttaaaattttuuuussss ZZZZeeeerrrroooo 1111----$$$$22220000kkkk $$$$22220000KKKK----$$$$55550000kkkk $$$$55550000kkkk++++ TTTToooottttaaaallll ggggrrrroooouuuuppppWWWWAAAAGGGGEEEESSSS
• Seems crazy to talk about the earnings of
children, and the real elderly, and people who
arrived after the 2005 income year
• So, drop
– persons under 25
– persons over 69
– immigrants who arrived in 2005 or 2006
TTTToooottttaaaallll 9999,,,,444444442222 6666,,,,888899995555 11110000,,,,444499999999 7777,,,,111199993333 33334444,,,,000022229999 FFFFoooorrrreeeeiiiiggggnnnn----bbbboooorrrrnnnn PPPPeeeerrrrmmmm RRRReeeessss 5555,,,,222200006666 3333,,,,666600008888 4444,,,,555577773333 2222,,,,333300004444 11115555,,,,666699991111 CCCCaaaannnnaaaaddddiiiiaaaannnn----bbbboooorrrrnnnn 3333,,,,999966662222 3333,,,,111122225555 5555,,,,888822223333 4444,,,,888822220000 11117777,,,,777733330000 TTTTeeeemmmmppppoooorrrraaaarrrryyyy RRRReeeessssiiiiddddeeeennnnttttssss 222277774444 111166662222 111100003333 66669999 666600008888 IIIImmmmmmmmiiiiggggrrrraaaannnntttt ssssttttaaaattttuuuussss ZZZZeeeerrrroooo 1111----$$$$22220000kkkk $$$$22220000KKKK----$$$$55550000kkkk $$$$55550000kkkk++++ TTTToooottttaaaallll ggggrrrroooouuuuppppWWWWAAAAGGGGEEEESSSS
Cross-Tabs with Column %s
TTTToooottttaaaallll 111100000000....00000000 111100000000....00000000 111100000000....00000000 111100000000....00000000 111100000000....00000000 FFFFoooorrrreeeeiiiiggggnnnn----bbbboooorrrrnnnn PPPPeeeerrrrmmmm RRRReeeessss 55555555....11114444 55552222....33333333 44443333....55556666 33332222....00003333 44446666....11111111 CCCCaaaannnnaaaaddddiiiiaaaannnn----bbbboooorrrrnnnn 44441111....99996666 44445555....33332222 55555555....44446666 66667777....00001111 55552222....11110000 TTTTeeeemmmmppppoooorrrraaaarrrryyyy RRRReeeessssiiiiddddeeeennnnttttssss 2222....99990000 2222....33335555 0000....99998888 0000....99996666 1111....77779999 IIIImmmmmmmmiiiiggggrrrraaaannnntttt ssssttttaaaattttuuuussss ZZZZeeeerrrroooo 1111----$$$$22220000kkkk $$$$22220000KKKK----$$$$55550000kkkk $$$$55550000kkkk++++ TTTToooottttaaaallll ggggrrrroooouuuuppppWWWWAAAAGGGGEEEESSSS
Cross-Tabs with Row %s
TTTToooottttaaaallll 22227777....77775555 22220000....22226666 33330000....88885555 22221111....11114444 111100000000....00000000 FFFFoooorrrreeeeiiiiggggnnnn----bbbboooorrrrnnnn PPPPeeeerrrrmmmm RRRReeeessss 33333333....11118888 22222222....99999999 22229999....11114444 11114444....66668888 111100000000....00000000 CCCCaaaannnnaaaaddddiiiiaaaannnn----bbbboooorrrrnnnn 22222222....33335555 11117777....66663333 33332222....88884444 22227777....11119999 111100000000....00000000 TTTTeeeemmmmppppoooorrrraaaarrrryyyy RRRReeeessssiiiiddddeeeennnnttttssss 44445555....00007777 22226666....66664444 11116666....99994444 11111111....33335555 111100000000....00000000 IIIImmmmmmmmiiiiggggrrrraaaannnntttt ssssttttaaaattttuuuussss ZZZZeeeerrrroooo 1111----$$$$22220000kkkk $$$$22220000KKKK----$$$$55550000kkkk $$$$55550000kkkk++++ TTTToooottttaaaallll ggggrrrroooouuuuppppWWWWAAAAGGGGEEEESSSS
Do Immigrants Earn Less?
• Temporary Residents have way more zero-
earners, and foreign-born have somewhat
more zero-earners
• Foreign-born permanent residents more low-
earners and less high-earners than Canadian-
borns. The pattern is even stronger for
temporary residents.
Integration Takes Time
• Immigrants are heterogeneous
• Some arrived long ago, some very recently
• Can we account for this in a Cross-Tab?
Do Immigrants Earn Less?
TTTToooottttaaaallll 22227777....88880000 22220000....11111111 33330000....88881111 22221111....22228888 111100000000....00000000 TTTTeeeemmmmppppoooorrrraaaarrrryyyy 44445555....00007777 22226666....66664444 11116666....99994444 11111111....33335555 111100000000....00000000 CCCCddddnnnn----BBBBoooorrrrnnnn 22222222....33335555 11117777....66663333 33332222....88884444 22227777....11119999 111100000000....00000000 2222000000000000ssss 33333333....66666666 33332222....00007777 22227777....22228888 6666....99998888 111100000000....00000000 1111999999990000ssss 33334444....77777777 22225555....99992222 22229999....22223333 11110000....00007777 111100000000....00000000 1111999988880000ssss 22229999....55551111 11119999....55550000 33333333....44449999 11117777....55551111 111100000000....00000000 1111999977770000ssss 22229999....77779999 11116666....11116666 22229999....99994444 22224444....11111111 111100000000....00000000 1111999966660000ssss 44440000....88887777 11114444....99992222 22224444....44444444 11119999....77776666 111100000000....00000000 1111999955550000ssss 44440000....88884444 11116666....44440000 22220000....77777777 22221111....99999999 111100000000....00000000 BBBBeeeeffffoooorrrreeee 1111999955550000 33337777....88888888 11119999....77770000 11119999....77770000 22222222....77773333 111100000000....00000000 CCCCoooohhhhoooorrrrtttt ZZZZeeeerrrroooo 1111----$$$$22220000kkkk $$$$22220000KKKK----$$$$55550000kkkk $$$$55550000kkkk++++ TTTToooottttaaaallll IIIImmmmmmmmiiiiggggrrrraaaannnntttt ggggrrrroooouuuuppppWWWWAAAAGGGGEEEESSSS
Table of Means
• Only 7% of recent immigrants have high earnings, compared with 27% of Cdn-born.
• But, 24% of immigrants from the 1970s have high earnings.
• This is getting clunky: how else can we summarize these data?
• A Table of Means gives the average value of some variable for values or ranges of another variable
Table of Means
TTTTeeeemmmmppppoooorrrraaaarrrryyyy 666600008888 22221111777733337777 CCCCddddnnnn----BBBBoooorrrrnnnn 11117777,,,,777733330000 33338888333311117777 2222000000000000ssss 2222,,,,444466663333 11117777111100003333 1111999999990000ssss 5555,,,,444411112222 22220000222288883333 1111999988880000ssss 2222,,,,444411116666 22227777444477778888 1111999977770000ssss 2222,,,,666677779999 33333333333366665555 1111999966660000ssss 1111,,,,222266660000 33331111777766660000 1111999955550000ssss 555577773333 33330000666688882222BBBBeeeeffffoooorrrreeee 1111999955550000 66666666 22227777555566661111 CCCCoooohhhhoooorrrrtttt NNNN((((WWWWAAAAGGGGEEEESSSS)))) mmmmeeeeaaaannnn((((WWWWAAAAGGGGEEEESSSS))))IIIImmmmmmmmiiiiggggrrrraaaannnntttt
Table of Means
• Those 2000s arrivals have it bad: $20,000
lower earnings than Canadian-borns.
• 1990s arrivals have similarly low earnings.
• But, 1970s arrivals have earnings about $5000
lower than Canadian-born.
Variation
• Used 3% of Greater Vancouver households.
• The mean is not the mean---it is an estimate
of the mean.
• There’s dispersion around the mean. Lots of
people have income far from the mean.
• Standard Deviation (SD) characterises how far
people are from the mean
– SD=the average squared distance from the mean
Table of Means and SDs
TTTTeeeemmmmppppoooorrrraaaarrrryyyy 666600008888 22221111777733337777 55557777888811111111 CCCCddddnnnn----BBBBoooorrrrnnnn 11117777,,,,777733330000 33338888333311117777 55556666333300007777 2222000000000000ssss 2222,,,,444466663333 11117777111100003333 22224444888844448888 1111999999990000ssss 5555,,,,444411112222 22220000222288883333 33331111888888889999 1111999988880000ssss 2222,,,,444411116666 22227777444477778888 33334444888855558888 1111999977770000ssss 2222,,,,666677779999 33333333333366665555 44449999222299997777 1111999966660000ssss 1111,,,,222266660000 33331111777766660000 66665555555588881111 1111999955550000ssss 555577773333 33330000666688882222 55558888666677777777BBBBeeeeffffoooorrrreeee 1111999955550000 66666666 22227777555566661111 33338888333388880000 CCCCoooohhhhoooorrrrtttt NNNN((((WWWWAAAAGGGGEEEESSSS)))) mmmmeeeeaaaannnn((((WWWWAAAAGGGGEEEESSSS)))) ssssdddd((((WWWWAAAAGGGGEEEESSSS))))IIIImmmmmmmmiiiiggggrrrraaaannnntttt
• Immigrants arriving 1961-1970 and 1971-1980
earn almost as much as Canadian-born people
• Those arriving after seem to earn less and less
• But, are these differences statistically
significant---that is, are we sure that they are
there?
Precision
• The precision of an estimated mean depends on
– How much data you have (N)
– How much dispersion there is (sd)
• Standard Error (SE)
– equals the standard deviation divided by the square root of N: SE = SD/√N
• 95% Confidence Interval (CI)
– gives a range such that we’re 95% sure that the true mean lies inside. These are roughly 4 SEswide (2 above, and 2 below, the estimated value)
Table of Means, SDs and SEs
TTTTeeeemmmmppppoooorrrraaaarrrryyyy 666600008888 22221111777733337777 55557777888811111111 2222333344445555 CCCCddddnnnn----BBBBoooorrrrnnnn 11117777,,,,777733330000 33338888333311117777 55556666333300007777 444422223333 2222000000000000ssss 2222,,,,444466663333 11117777111100003333 22224444888844448888 555500001111 1111999999990000ssss 5555,,,,444411112222 22220000222288883333 33331111888888889999 444433333333 1111999988880000ssss 2222,,,,444411116666 22227777444477778888 33334444888855558888 777700009999 1111999977770000ssss 2222,,,,666677779999 33333333333366665555 44449999222299997777 999955552222 1111999966660000ssss 1111,,,,222266660000 33331111777766660000 66665555555588881111 1111888844448888 1111999955550000ssss 555577773333 33330000666688882222 55558888666677777777 2222444455551111BBBBeeeeffffoooorrrreeee 1111999955550000 66666666 22227777555566661111 33338888333388880000 4444777722224444 CCCCoooohhhhoooorrrrtttt NNNN((((WWWWAAAAGGGGEEEESSSS)))) mmmmeeeeaaaannnn((((WWWWAAAAGGGGEEEESSSS)))) ssssdddd((((WWWWAAAAGGGGEEEESSSS)))) sssseeeemmmm((((WWWWAAAAGGGGEEEESSSS))))IIIImmmmmmmmiiiiggggrrrraaaannnntttt
Another Way To Present the Means and Standard Error
Earnings by Immigrant Cohort
15,000
20,000
25,000
30,000
35,000
40,000
Before
1950
19
50s
1960
s
19
70s
1980
s
19
90s
2000
s C
dn-B
orn
Tem
pora
ry
Immigrant Cohort
Mea
n E
arni
ngs
So, Do Immigrants Earn Less?
• The 95% confidence interval for the mean earnings of a Canadian-born person aged 25-69 is entirely above that of the mean earnings of every immigrant cohort.
• A test considers the probability that some relationship is true.
• Can we test whether or not immigrants earn less? Given the probabilities, it is highly improbable that they earn the same as the Canadian-born. So, we reject the hypothesis that they earn the same.
Age and Arrival Cohort
• Immigrants arriving before 1961 and after
1980 earn less than Canadian-born workers.
• But, those arriving before 1961 are old.
• We know old people have low earnings,
because they quit working.
• Those arriving between 1961 and 1980 are in
their prime earning years.
• Is this all about age?
Immigrants Age and Earnings
TTTTeeeemmmmppppoooorrrraaaarrrryyyy 11116666888822220000 22226666999955550000 33338888555555551111 11113333000000000000 1111333333333333 CCCCddddnnnn----BBBBoooorrrrnnnn 33331111555555552222 44443333555511111111 44447777111177776666 33335555222211118888 11110000333333337777 2222000000000000ssss 11118888777799993333 11119999555566664444 11115555555533338888 8888888811110000 2222888899996666 1111999999990000ssss 22220000333311117777 22225555111133339999 22222222777777777777 11110000333366665555 2222111111117777 1111999988880000ssss 33331111666644448888 33331111000055557777 33330000111122224444 11119999111166669999 4444000000008888 1111999977770000ssss 33337777777700004444 44441111333333336666 33333333777788889999 33333333222299991111 6666888877776666 1111999966660000ssss 55553333777700008888 44442222999966667777 22228888888866665555 11110000111188887777 1111999955550000ssss 44449999111199999999 33336666333355559999 7777555533337777BBBBeeeeffffoooorrrreeee 1111999955550000 33331111777766665555 11113333222266667777 CCCCoooohhhhoooorrrrtttt 22225555----33334444 33335555----44444444 44445555----55554444 55555555----66664444 66665555----66669999IIIImmmmmmmmiiiiggggrrrraaaannnntttt AAAAggggeeee GGGGrrrroooouuuupppp
Do Immigrants Earn Less?
• Now that we consider people who are the same
age, the disparities look smaller.
• Immigrants aged 25-34 who arrived in the 1970s
and 1980 have average earnings higher than
their Canadian-born counterparts.
• But, a lot of age-arrival year groups earn less.
Controlling for Age
• When we control for a variable x, we ask “what is
the difference in y for a given value of x?”
• When you look within age categories, there’s less
disparity.
• Thus, controlling for age reduces the amount of
disparity we see between immigrants and
Canadian-born people
Controlling for Education
• Immigrants are more educated than Canadian-
born people. Shouldn’t they earn more?
TTTToooottttaaaallll 11111111....44441111 22224444....33337777 22228888....00002222 6666....88882222 22229999....33338888 111100000000....00000000 TTTTeeeemmmmppppoooorrrraaaarrrryyyy 4444....99994444 11114444....99999999 11116666....99997777 8888....99990000 55554444....22220000 111100000000....00000000 CCCCddddnnnn----BBBBoooorrrrnnnn 9999....22221111 22226666....22227777 33332222....44447777 5555....33336666 22226666....66669999 111100000000....00000000 2222000000000000ssss 11110000....66668888 11117777....66663333 11113333....88889999 9999....44442222 44448888....33338888 111100000000....00000000 1111999999990000ssss 11115555....22222222 22223333....33335555 11119999....99993333 9999....22229999 33332222....22220000 111100000000....00000000 1111999988880000ssss 11117777....00000000 22223333....99991111 22224444....77770000 8888....88881111 22225555....55557777 111100000000....00000000 1111999977770000ssss 11113333....11114444 22222222....11110000 33330000....22220000 7777....66661111 22226666....99995555 111100000000....00000000 1111999966660000ssss 11113333....66667777 22224444....00009999 33333333....00007777 6666....33336666 22222222....88881111 111100000000....00000000 1111999955550000ssss 11116666....77775555 22227777....99992222 33330000....88889999 4444....88889999 11119999....55555555 111100000000....00000000 BBBBeeeeffffoooorrrreeee 1111999955550000 11113333....66664444 11116666....66667777 33336666....33336666 1111....55552222 33331111....88882222 111100000000....00000000 CCCCoooohhhhoooorrrrtttt LLLLTTTT HHHHSSSS HHHHSSSS SSSSoooommmmeeee PPPPSSSS BBBBAAAA////BBBBSSSScccc PPPPoooosssstttt BBBBAAAA TTTToooottttaaaallll IIIImmmmmmmmiiiiggggrrrraaaannnntttt HHHHiiiigggghhhheeeesssstttt LLLLeeeevvvveeeellll ooooffff SSSScccchhhhoooooooolllliiiinnnngggg
Mean Earnings by Education
TTTTeeeemmmmppppoooorrrraaaarrrryyyy 11111111555533333333 11113333555577771111 11118888222222226666 11112222000033337777 22227777555577777777 CCCCddddnnnn----BBBBoooorrrrnnnn 22220000222211118888 33333333777700002222 33335555444444447777 44440000555511112222 55552222111122229999 2222000000000000ssss 8888555500002222 11112222777733337777 11115555999944444444 11113333222244442222 22221111666699992222 1111999999990000ssss 9999444433338888 11114444222233338888 22221111444433335555 22220000222277771111 22229999111100006666 1111999988880000ssss 11111111999911115555 22220000444466663333 22229999999966660000 22229999333355556666 44441111333333335555 1111999977770000ssss 11115555666677779999 22224444222299990000 33334444666611111111 33332222333366663333 44448888333300000000 1111999966660000ssss 11113333000044441111 22226666777711111111 33334444999911118888 22229999222211111111 44444444888811117777 1111999955550000ssss 11117777333344443333 22226666111144442222 22229999111177776666 33334444111144443333 44449999444400005555BBBBeeeeffffoooorrrreeee 1111999955550000 11110000222222222222 11111111000000000000 44443333111166667777 11112222000000000000 22226666555577771111 CCCCoooohhhhoooorrrrtttt LLLLTTTT HHHHSSSS HHHHSSSS SSSSoooommmmeeee PPPPSSSS BBBBAAAA////BBBBSSSScccc PPPPoooosssstttt BBBBAAAAIIIImmmmmmmmiiiiggggrrrraaaannnntttt HHHHiiiigggghhhheeeesssstttt LLLLeeeevvvveeeellll ooooffff SSSScccchhhhoooooooolllliiiinnnngggg
Controlling for Education
• Unlike when we controlled for age, when we
control for education, the disparity gets
bigger.
• So, is the disparity big or small? Should we
control for age or education?
• You could make an even bigger table of
means, or…we could run a regression.
Regression Analysis
• Tool to look at differences along many
dimensions (ie., along age, education and
immigration cohort all at the same time)
• Modeling Assumption: the effects of all these
things can be summarised by adding up a
bunch of individual effects.
• Then, assessing any one of them is easy: its
effect on the total is isolated.
Analysing Regressions
• Cdn-Born, HS, aged 35-44 has estimated
earnings of $29,168 (top number)
• With a PhD (but still Cdn-Born aged 35-44), it
would be $23,459 higher, equalling $52,627
• So, you can isolate the effect of having a PhD
• You can also isolate the effect of immigration
arrival year
Regression Estimates
73617300Post BA2017-20328Temporary
11575838BA/BSc1059-252172000s
7383813Some PS760-189281990s
965-7989LT HS1057-109521980s
1248-2732465-691020-43821970s
827-787855-641446-15381960s
737169245-5421019521950s
766-965725-345999-3662Before 1950
74237629Cdn-born 35-44 HS
Std ErrEstimateVariableStd ErrEstimateVariable
Part of a Regression
2017-20328Temporary
1059-252172000s
760-189281990s
1057-109521980s
1020-43821970s
1446-15381960s
21019521950s
5999-3662Before 1950
74237629Cdn-born 35-44 HS
SEEstVariable
Regression are cool because you can look at part of a regression.
Consider the part with the immigrant cohort effects.
Holding age and education constant, if you compare a Canadian-born person with a Temporary resident, the latter earns $20,328 less.
Immigrants from the 1960s earn only $1538 less, and the confidence interval for that is about [-1500, 4500]: it could be zero dollars less!
Multivariate Analysis
• With tables of proportions or frequencies (ie.
Cross-tabs), you can see concentration.
– Eg., concentration of immigrants at the low end of
the income distribution
• You can also see that they are concentrated in
the middle of the age distribution and the top
of the education distribution
• Thus, controlling for these things is essential.
Multivariate Analysis
• Regressions allow you to control for lots of
things at once.
• One-way analysis on the basis of age might
tell you one thing (eg., earnings seem okay)
• One-way analysis on the basis of education
might tell you another (eg., earnings are low)
• Multivariate analysis can resolve this by
controlling for both at the same time.
R-squared
• The regression model has just three things: year of arrival,
age and education.
• These three things ‘explain’ 7.5% of the variation in earnings
from wages and salaries.
– Explain, as in are correlated with
– Not explain, as in causes
• What explains the other 92.5%?
– Things that explain earnings that are correlated with
arrival, age and education
– Things that explain earnings that are not correlated with
arrival, age and education
Men Earn More---So What
• Men earn more than women. What if we add
“male” to the regression?– 1970s immigrants have the same sex ratio as Canadian-borns
TTTToooottttaaaallll 55551111....77774444 44448888....22226666 111100000000....00000000 TTTTeeeemmmmppppoooorrrraaaarrrryyyy 55555555....55559999 44444444....44441111 111100000000....00000000 CCCCddddnnnn----BBBBoooorrrrnnnn 55550000....11110000 44449999....99990000 111100000000....00000000 2222000000000000ssss 55555555....99994444 44444444....00006666 111100000000....00000000 1111999999990000ssss 55554444....22227777 44445555....77773333 111100000000....00000000 1111999988880000ssss 55553333....00008888 44446666....99992222 111100000000....00000000 1111999977770000ssss 55551111....99992222 44448888....00008888 111100000000....00000000 1111999966660000ssss 55552222....33334444 44447777....66666666 111100000000....00000000 1111999955550000ssss 44449999....55557777 55550000....44443333 111100000000....00000000 BBBBeeeeffffoooorrrreeee 1111999955550000 44445555....44445555 55554444....55555555 111100000000....00000000 CCCCoooohhhhoooorrrrtttt 0000 1111 TTTToooottttaaaallll IIIImmmmmmmmiiiiggggrrrraaaannnntttt mmmmaaaalllleeee
Comparing Regressions
yesjust men
52616172male
1360205887261696273617300Post BA
229568811142687111575838BA/BSc
1366392972835997383813Some PS
1789-11041951-8057965-7989LT HS
2312-352581230-270161248-2732465-69
1519-7198815-7778827-787855-64
136227777261849737169245-54
1418-14263755-9537766-965725-34
3895-158051989-193922017-20328Temporary
2042-318331045-242541059-252172000s
1433-25109750-18298760-189281990s
1975-151311042-105231057-109521980s
1890-78931006-41141020-43821970s
2688-13451426-12181446-15381960s
38071538207277821019521950s
10462-55355915-43875999-3662Before 1950
1360482077772956674237629Cdn-born 35-44 HS
SECoefSECoefSECoef
• Check out just the top left panel: on the left there’s a regression without
male, on the right a regression with male.
• Adding male doesn’t do anything
• This is because male, though important to earnings, is not very correlated
with immigrant arrival cohort.
1989-193922017-20328Temporary
1045-242541059-252172000s
750-18298760-189281990s
1042-105231057-109521980s
1006-41141020-43821970s
1426-12181446-15381960s
207277821019521950s
5915-43875999-3662Before 1950
7772956674237629Cdn-born 35-44 HS
SECoefSECoef
Just Men’s Earnings
yesjust men
male
136020588Post BA
22956881BA/BSc
13663929Some PS
1789-11041LT HS
2312-3525865-69
1519-719855-64
1362277745-54
1418-1426325-34
3895-15805Temporary
2042-318332000s
1433-251091990s
1975-151311980s
1890-78931970s
2688-13451960s
380715381950s
10462-5535Before 1950
136048207Cdn-born 35-44 HS
SECoefThis is the rightmost panel: a regression with just men in it.
The same pattern emerges: immigrants from earlier cohorts have higher earnings, and immigrants from recent cohorts have very low earnings.
Endogeneity
• Adding Male to the regression changed nothing important
(because it wasn’t correlated with cohort)
– But let’s stick with just men.
• What about visible minority status? It is correlated.
TTTToooottttaaaallll 66662222....77770000 33337777....33330000 111100000000....00000000 TTTTeeeemmmmppppoooorrrraaaarrrryyyy 33337777....77778888 66662222....22222222 111100000000....00000000 CCCCddddnnnn----BBBBoooorrrrnnnn 99990000....77771111 9999....22229999 111100000000....00000000 2222000000000000ssss 11117777....55557777 88882222....44443333 111100000000....00000000 1111999999990000ssss 11114444....66661111 88885555....33339999 111100000000....00000000 1111999988880000ssss 11118888....33334444 88881111....66666666 111100000000....00000000 1111999977770000ssss 33335555....77774444 66664444....22226666 111100000000....00000000 1111999966660000ssss 66669999....33338888 33330000....66662222 111100000000....00000000 1111999955550000ssss 88886666....22221111 11113333....77779999 111100000000....00000000 BBBBeeeeffffoooorrrreeee 1111999955550000 99994444....44444444 5555....55556666 111100000000....00000000 CCCCoooohhhhoooorrrrtttt 0000 1111 TTTToooottttaaaallll IIIImmmmmmmmiiiiggggrrrraaaannnntttt VVVVMMMM
1444-9523vm
135820674136020588Post BA
2293735222956881BA/BSc
1365354013663929Some PS
1787-107521789-11041LT HS
2309-353782312-3525865-69
1519-75761519-719855-64
136026251362277745-54
1421-134521418-1426325-34
3952-111813895-15805Temporary
2286-250212042-318332000s
1799-179121433-251091990s
2233-82221975-151311980s
2060-24631890-78931970s
27089982688-13451960s
38042343380715381950s
10448-536110462-5535Before 1950
136449041136048207White Cdn-born 35-44 HS
SECoefSECoef
Endogeneity
• Those numbers changed a lot!
• Visible minority status is correlated with time of arrival: immigrants in later cohorts were more likely to be visible minorities.
• Visible minorities have lower earnings.
• Thus, if you leave visible minority out of the regression, you think you are measuring a cohort effect, but it is really a visible minority effect.
• This is a form of endogeneity. If a coefficient suffers from endogeneity, then its estimated value is not informative of the effect of the variable of interest.
Correlation and Causation
• Two things make correlation different from
causation.
– missing variables: things (eg, earnings and arrival
cohort) may be correlated, but that correlation
may be driven by left-out or unobserved stuff (eg,
country-of-birth).
– reverse causation: things can be correlated, but
you don’t know which causes which.
• Eg, if Chinese-born people fare well in B.C., then they
may choose to move here. This means that their high
wages cause their clustering here.
Causation and Endogeneity
• If two variables are correlated but are not
causally related, the econometric problem is
one of endogeneity (just as above).
Policy Needs Isolated Effects
• If we want to know whether or not recent
cohorts fare badly, a table will do fine.
• If we want to know why they fare badly, we
need to control out stuff
– Eg., part of their ill-fortune is driven by their
ethnic origin, not when they came.
Let’s Educate Immigrants
yesyesjust men
yesvm
135820674136020588Post BA
2293735222956881BA/BSc
1365354013663929Some PS
1787-107521789-11041LT HS
SECoefSECoef
Education pays off no matter how you slice it,controlling for age or not.Should we invest in educating immigrants asa matter of policy?
Policy Needs Causal Effects
• Does the table tell what happens if we get
more educated immigrants?
• A lot of missing variables here:
– Quality of education; relevance of education;
place of education.
– Are people who get educated smarter than
people who don’t? If so, then educating a given
person won’t raise their wage as much as the
difference in earnings might suggest.
Instrumental Variables
• Education is endogenous--that is we don’t
really believe that we have included all the
correlated missing variables.
• We could use an instrument to solve this
problem.
– An instrument is a variable the is correlated with
the problem variable (education in this case) but
not with all those correlated missing variables
– Bit too tricky to define right now.
Experimental Approaches
• The idea of experiments, in the lab, in the
field, or natural experiments, is:
– hold everything constant while changing
something of interest
– Measure the difference across its values
– Since nothing else is changing, there are no
missing variables.
– Since the experimenter is changing the values of
the variable, there is no reverse causation.
Natural Experiments
• Lab experiments don’t work so well for economic questions,
eg, do immigrant arrivals affect native-born wages? Need a
big lab.
• A natural experiment addresses reverse causality: use a
change that for sure is not driven by your supposed
dependent variable.
– Eg, the Mariel Boatlift put more than 100,000 Cubans into
Florida.
– This was not due to high wages in Florida; it was due to
Castro emptying prisons.
– It had no effect on native-born white or hispanic wages
(David Card 1990)
Field Experiments
• Often, you don’t get the natural experiment
you wish for.
• A field experiment uses data created by the
investigator in such a way that there is little or
no missing variables problem.
• Phil Oreopoulous (2009) (in MBC-supported
research) sent out 8000 fake resumes to
figure out whether and why immigrants have
trouble finding work.
Oreopoulous’ Study
• Field experiment with 8,000 mock resumes,
emailed in response to newspaper and online
job postings
• Randomize name, experience, education, and
other characteristics, send, and compare
callback rate differences
• Offers clear and convincing evidence on why
recent immigrants fare poorly in the Canadian
labour market
Example: Different experience
• Names
Example: Different name
Field Experiment Setup
• Respond only to internet or newspaper
postings (e.g. Workopolis, Job Bank,
Craigslist,…) May – Nov. 2008
• 4 resumes per posting, sent over 2-3 days:
• 0: Can. name, Can. education, Can. experience
• 1: foreign name, Can. education, Can. Exp.
• 2: foreign name, foreign education, Can. Exp.
• 3: 4: foreign name, foreign educ., some or all
foreign exp., respectively
Oreopoulous’ Results
• Name discrimination likely illegal– “Subsection 11 (1) of the Code also establishes that the right of a person
under Part I is infringed where a requirement, qualification or factor exists
that is not a prohibited ground of discrimination, but that results in the
exclusion or restriction of a group of persons who are identified by a
prohibited ground of discrimination, unless the requirement or factor is
reasonable and genuine in the circumstances, subject to hardship on the
employer”
• For those with 4-6 years Can. Experience, minimal differences
for foreign vs. Can education
• Lower returns to foreign experience not because of type of
job or employer
• Brits do fine
Food For Thought
• One-way tables are not enough:
– Missing variables may be correlated, and may in fact be driving things
– Multivariate analysis can fix up missing variables problems, and allow
you to get at causation
• Statistical precision matters:
– if things look different, but are not statistically distinguished (eg,
because they are only 1 std err apart), then they may not be different
at all.
• Regression analysis is a multivariate tool that lets you think
about each variable separately.
• Natural experiments, field experiments and instrument
variables help with both missing variables and reverse
causation, and thus allow you to get at causation.