Quantitative Analysis, Pt. 2
ESP 178 Applied Research Methods
Calvin Thigpen
2/28/17
Adapted from Prof. Susan Handy
Review from last week
• Descriptive statistics
  • What’s the point? What are ways to examine?
• Key concepts:
  • Measures of central tendency?
    • Mean
    • Median
    • Mode
  • Measures of variation?
    • Standard deviation
    • Variance
    • Percentiles
Review from last week
• Bivariate (two variable) relationships
  • How can you examine these?
• Multivariable relationships
  • How to analyze?
  • What causality criterion is addressed by including multiple variables in a regression model?
  • What is an important assumption of linear regression?
Important assumption of linear regression
• Outcome is a normally distributed, continuous ratio variable
  • (after accounting for predictor/independent variables)
• This assumption works for (most) ratio DVs
[Figure: height distributions for men and women; x-axis: height, marked at 5’4” and 5’10”]
Coefficient Interpretation
• How do you interpret the coefficient(s)?
  • Ratio
  • Nominal binary
  • Nominal/ordered with multiple categories
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)   -0.04084    0.02206  -1.852   0.0644 .
IndependentV   0.73379    0.02148  34.156   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.6976 on 998 degrees of freedom
Multiple R-squared: 0.539, Adjusted R-squared: 0.5385
F-statistic: 1167 on 1 and 998 DF, p-value: < 2.2e-16
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   4.44421    0.09048  49.120  < 2e-16 ***
culdesac 1    1.46650    0.20871   7.026 3.93e-12 ***
culdesac 2    1.70579    0.32747   5.209 2.31e-07 ***
culdesac 3    3.36348    0.13810  24.356  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.99 on 996 degrees of freedom
Multiple R-squared: 0.3735, Adjusted R-squared: 0.3716
F-statistic: 198 on 3 and 996 DF, p-value: < 2.2e-16

$$y_i = \alpha + \beta x_i + \varepsilon_i, \qquad \varepsilon_i \sim \text{Normal}(0, \sigma^2)$$
Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)       4.44421    0.09445   47.06   <2e-16 ***
culdesac.binary   2.82323    0.13148   21.47   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.078 on 998 degrees of freedom
Multiple R-squared: 0.316, Adjusted R-squared: 0.3153
F-statistic: 461.1 on 1 and 998 DF, p-value: < 2.2e-16
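One way to read the nominal coefficients is to turn them into predicted group means: the intercept is the predicted DV for the reference category, and each dummy coefficient is the difference from that reference. The sketch below does this arithmetic with the estimates printed in the two cul-de-sac outputs; the dictionary layout and the function name `predicted_means` are illustrative choices, not part of the original analysis.

```python
# Estimates copied from the two cul-de-sac linear regression outputs.
binary_fit = {"intercept": 4.44421, "culdesac.binary": 2.82323}
multi_fit = {
    "intercept": 4.44421,
    "culdesac 1": 1.46650,
    "culdesac 2": 1.70579,
    "culdesac 3": 3.36348,
}


def predicted_means(fit):
    """Return the predicted DV mean for the reference group and each dummy level."""
    base = fit["intercept"]
    means = {"reference": base}
    for name, coef in fit.items():
        if name != "intercept":
            # A dummy coefficient is the shift away from the reference group.
            means[name] = base + coef
    return means


binary_means = predicted_means(binary_fit)
multi_means = predicted_means(multi_fit)

for label, means in (("binary", binary_means), ("multi", multi_means)):
    for group, value in means.items():
        print(f"{label:>6}  {group:<16} {value:.2f}")
```

For a ratio predictor such as `IndependentV`, no group lookup is needed: each one-unit increase shifts the predicted DV by the coefficient itself (0.73379 in the first output).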
“But what if my DV isn’t a continuous ratio variable?”
• Continuous ratio variable (linear regression works well):
[Figure: distribution of height]
• Bounded or integer ratio variable (linear regression doesn’t work as well):
[Figure: histogram of “Days Biked in the Past Week”; x-axis: 0 to 7 days, y-axis: frequency, 0 to 120]
“But what if my DV isn’t a continuous ratio variable?”
• Nominal/Ordinal (linear regression really doesn’t work well):
[Figure: two example distributions, one ordinal and one nominal]
“But what if my DV isn’t a continuous ratio variable?”
• Don’t force things!
  • You don’t necessarily have to change your survey question to a continuous ratio outcome (though you might consider it)
• Today: conceptual overview of two alternative approaches, part of the generalized linear model family.
“But what if my DV isn’t a continuous ratio variable?”
• Let’s pick the right tool for the (statistical) problem.
• We’ll find the right tool by thinking critically about how to fit a statistical model for the DV:
Binomial distribution
• Describes the likelihood of a certain # of events (y) occurring, based on a set number of “trials” (n) and an underlying probability (p)
• Important assumptions:
  • The probability is consistent across trials
  • Only two things can happen
    • yes/no
    • heads/tails
    • event/no event
Two flavors
• “Bernoulli”
  • Looking at a single trial (n = 1)
• Aggregate
  • Looking at the results of multiple trials (n = # of trials)
  • Bernoulli, summed up to # of trials
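The two flavors can be sketched with one probability function, since the Bernoulli case is just the binomial with n = 1. This is a minimal illustration using only the Python standard library; the fair-coin probability of 0.5 and the 20 flips are assumed example values, not data from the lecture.

```python
from math import comb


def binomial_pmf(y, n, p):
    """Probability of exactly y events in n trials, each with probability p."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)


p_heads = 0.5  # hypothetical fair coin

# Bernoulli flavor: a single trial, so y can only be 0 or 1.
bernoulli = [binomial_pmf(y, 1, p_heads) for y in (0, 1)]

# Aggregate flavor: number of heads across 20 flips of the same coin.
n_flips = 20
aggregate = [binomial_pmf(y, n_flips, p_heads) for y in range(n_flips + 1)]
most_likely = max(range(n_flips + 1), key=lambda y: aggregate[y])

print("Bernoulli P(0), P(1):", bernoulli)
print("Most likely # of heads in 20 flips:", most_likely)
```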
Poisson distribution
• Describes the likelihood of a certain # of events occurring, based on an average rate of occurrence (lambda)
• Important assumptions:
  • Dependent variable must be a non-negative integer
    • 0, 1, 2, 3, 4, 5…
  • Events occur independently
    • aka they don’t influence each other
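As a rough illustration of how a single rate determines the whole distribution, the Poisson probability of each count can be computed directly from lambda. The rate of 2 events per period below is an assumed example value.

```python
from math import exp, factorial


def poisson_pmf(y, lam):
    """Probability of exactly y events when events occur at average rate lam."""
    return lam**y * exp(-lam) / factorial(y)


lam = 2.0  # hypothetical average rate (e.g. events per day)

# Counts start at 0 and have no upper bound; only the first few are listed here.
probs = [poisson_pmf(y, lam) for y in range(11)]
for y, prob in enumerate(probs):
    print(f"P(y = {y:2d}) = {prob:.4f}")
```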
Binomial distribution examples
• Transportation:
  • Mode choice (bike vs. not bike)
• Water use:
  • Installation of water-efficient toilet
• Electricity:
  • Installation of solar panel
• Natural resources:
  • If trout were illegally caught or not
Poisson distribution examples
• Transportation:
  • # of crashes at a given intersection in a day
• Water use:
  • # of times a toilet is flushed in a day
• Electricity:
  • # of times a light is turned on in an hour
• Natural resources:
  • # of trout illegally caught in a day
Translating these distributions into regression models
• Linear regression
  • Dependent variable and linear model are on the same, unbounded scale.
• Not so with other distributions!
  • We aren’t explaining the outcome itself, we are explaining parameters in the binomial/Poisson/other distributions
• Solution: use a “link function” to address difference in scales of parameter and linear model

$$y_i = \alpha + \beta x_i + \varepsilon_i, \qquad \varepsilon_i \sim \text{Normal}(0, \sigma^2)$$
Link function
• Translates the unbounded linear model into the more restricted scale of the distribution parameter
• Canonical (i.e. “typical”) ways to do so:
  • “Logit” link for binomial
  • Log link for Poisson
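The two canonical links and their inverses are short enough to write out. The sketch below shows that any value of the linear model, however large or small, maps to a probability strictly between 0 and 1 under the inverse logit, and to a positive rate under the inverse log (the exponential). The function names and the sample linear-model values are illustrative.

```python
from math import exp, log


def logit(p):
    """Logit link: maps a probability in (0, 1) onto the unbounded scale."""
    return log(p / (1 - p))


def inv_logit(eta):
    """Inverse logit: maps any linear-model value back to a probability."""
    return 1 / (1 + exp(-eta))


# Hypothetical values of the linear model (alpha + beta * x).
for eta in (-10, -2, 0, 2, 10):
    p = inv_logit(eta)   # binomial parameter: always between 0 and 1
    lam = exp(eta)       # Poisson parameter (inverse of log link): always > 0
    print(f"eta = {eta:>3}   p = {p:.5f}   lambda = {lam:.5f}")
```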
Walk-through example

$$y_i \sim \text{Binomial}(n, p_i)$$

y is a bounded, integer ratio variable out of 20 trials (e.g. 20 coin flips)
x is a continuous ratio variable that varies from -5 to 2 (e.g. amount of weight you place on heads (vs. tails) to bias the coin)

$$\text{logit}(p_i) = 0.3 + 1.2 \cdot x_i$$
Walk-through example
[Figure: three panels plotted against x from −5 to 2: “Linear Equation”, “Transformed Linear Equation onto Probability Scale” (0 to 1), and “Predicted counts out of 20” (0 to 20); the steps between panels are labeled “Linear Model” and “Link Function”]
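The walk-through can be traced numerically in three steps: evaluate the linear equation, push it through the inverse logit to get a probability, then multiply by the 20 trials to get the expected count (the mean of a binomial is n times p). The coefficients 0.3 and 1.2 and the range of x come from the example; the helper name `inv_logit` and the choice of integer x values are assumptions made for this sketch.

```python
from math import exp

N_TRIALS = 20                 # trials in the example (e.g. 20 coin flips)
INTERCEPT, SLOPE = 0.3, 1.2   # logit(p_i) = 0.3 + 1.2 * x_i


def inv_logit(eta):
    """Map the linear equation onto the probability scale."""
    return 1 / (1 + exp(-eta))


rows = []
for x in range(-5, 3):                  # x = -5, -4, ..., 2
    eta = INTERCEPT + SLOPE * x         # step 1: linear model (unbounded)
    p = inv_logit(eta)                  # step 2: link function (0 to 1)
    rows.append((x, eta, p, N_TRIALS * p))  # step 3: expected count out of 20

print(f"{'x':>3} {'linear':>8} {'prob':>7} {'count':>7}")
for x, eta, p, count in rows:
    print(f"{x:>3} {eta:>8.2f} {p:>7.3f} {count:>7.2f}")
```

The linear column runs from -5.7 to 2.7, while the count column stays between 0 and 20, which is the restriction the link function is there to enforce.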
Rethinking the DV: “# of times played outside in the last week”
• What statistical model would you choose to analyze this dependent variable?
Poisson model of # of times played outside last week
How to interpret?
• Similarities?
• Differences?
                   Estimate Std. Error z value Pr(>|z|)
(Intercept)         1.55504    0.02035   76.42   <2e-16 ***
d$culdesac.binary   0.44063    0.02630   16.76   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
What do the coefficients really mean, though?
• To get a better sense, translate back onto scale of the dependent variable!

$$\log \lambda = \alpha + \beta x$$
$$\lambda = e^{\alpha + \beta x}$$
$$\lambda = e^{1.55} = 4.71$$
$$\lambda = e^{1.55 + 0.44} = 7.32$$
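The back-transformation can be checked in a few lines. The sketch below uses the coefficients rounded to two decimals, as in the worked numbers, and also prints the result from the full-precision estimates in the output table, which differs slightly. Variable names are illustrative.

```python
from math import exp

# Rounded coefficients, as used in the worked numbers.
alpha, beta = 1.55, 0.44

lam_not_culdesac = exp(alpha)        # culdesac.binary = 0
lam_culdesac = exp(alpha + beta)     # culdesac.binary = 1

print(f"Expected # of times played outside (x = 0): {lam_not_culdesac:.2f}")
print(f"Expected # of times played outside (x = 1): {lam_culdesac:.2f}")

# Same calculation with the unrounded estimates from the output table.
alpha_full, beta_full = 1.55504, 0.44063
print(f"Unrounded: {exp(alpha_full):.2f} and {exp(alpha_full + beta_full):.2f}")
```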
Poisson model of # of times played outside last week
[Figure: probability (y-axis, 0.00 to 0.20) against count (x-axis, up to 20) under the fitted Poisson model]
How to interpret?
Again, but with multiple cul-de-sac response options
             Estimate Std. Error z value Pr(>|z|)
(Intercept)   1.55504    0.02035  76.419  < 2e-16 ***
culdesac 1    0.24756    0.04421   5.599 2.15e-08 ***
culdesac 2    0.39418    0.06104   6.458 1.06e-10 ***
culdesac 3    0.49983    0.02812  17.774  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
[Figure: probability (y-axis, 0.00 to 0.20) against count (x-axis, up to 20)]
Again, but with age
• How to interpret?
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  3.164247   0.053549   59.09   <2e-16 ***
d$age       -0.158455   0.006259  -25.32   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
[Figure: probability (y-axis, 0.00 to 0.20) against count (x-axis, up to 20)]
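With a ratio predictor such as age, the log link means each one-unit increase multiplies the expected count by a constant factor, exp(coefficient), rather than adding a constant amount. The sketch below applies this to the age model's estimates; treating age as measured in years and the specific ages chosen are assumptions for illustration.

```python
from math import exp

# Estimates from the Poisson model with age as the only predictor.
INTERCEPT, AGE_COEF = 3.164247, -0.158455


def expected_count(age):
    """Expected # of times played outside last week at a given age."""
    return exp(INTERCEPT + AGE_COEF * age)


# Multiplicative change in the expected count per one-unit increase in age.
per_unit_factor = exp(AGE_COEF)
print(f"Each additional unit of age multiplies the count by {per_unit_factor:.3f}")

for age in (5, 10, 15):  # hypothetical ages, assumed to be in years
    print(f"age {age:>2}: expected count = {expected_count(age):.2f}")
```

Because the factor is below 1, the expected count shrinks by the same percentage with every additional unit of age, so the absolute drop is larger at younger ages than at older ones.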
Again, with age and cul-de-sac
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  2.825695   0.063812  44.282  < 2e-16 ***
culdesac 1   0.149847   0.044443   3.372 0.000747 ***
culdesac 2   0.217338   0.061547   3.531 0.000414 ***
culdesac 3   0.296327   0.029891   9.914  < 2e-16 ***
age         -0.135682   0.006639 -20.43   < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
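One way to see what adding age does is to compare the cul-de-sac coefficients from the model without age against those from the model with age, after exponentiating each into a multiplicative factor relative to the reference category. The estimates are copied from the two output tables; the dictionary names are illustrative.

```python
from math import exp

# Cul-de-sac estimates from the Poisson model without age ...
without_age = {"culdesac 1": 0.24756, "culdesac 2": 0.39418, "culdesac 3": 0.49983}
# ... and from the Poisson model that also includes age.
with_age = {"culdesac 1": 0.149847, "culdesac 2": 0.217338, "culdesac 3": 0.296327}

# exp(coefficient) = expected count relative to the reference category.
factors = {
    level: (exp(without_age[level]), exp(with_age[level]))
    for level in without_age
}

print(f"{'level':<12} {'without age':>12} {'with age':>10}")
for level, (unadjusted, adjusted) in factors.items():
    print(f"{level:<12} {unadjusted:>12.3f} {adjusted:>10.3f}")
```

Every factor stays above 1 but moves closer to 1 once age is in the model, which suggests that part of the apparent cul-de-sac difference was accounted for by age.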
Closing: other probability distributions
• Gamma, exponential, ordered logit, etc.
McElreath 2015, Statistical Rethinking
Recap
Dependent Variable Level          Typically Appropriate Model
Nominal (binary)                  Binomial logistic regression
Nominal (multiple categories)     Multinomial logistic regression
Ordinal                           Ordinal logistic regression
Ratio (count, 0 or greater)       Poisson regression
Ratio (count, two outcomes)       Binomial logistic regression
Ratio (unbounded, continuous)     Linear regression