Practical Model Selection and Multi-Model Inference Using R. Presented by: Eric Stolen and Dan Hunt
TRANSCRIPT
Theory
• “A set of propositions set out as an explanation.”
• “Theories are generalizations.”
• “Theories contain questions.”
• “Theories continually change…”
(Ford, E. D. 2000. Scientific Method for Ecological Research. Cambridge University Press.)
Theory
• Example 1 – Wading bird foraging:
– Ideal Free Distribution
– Marginal Value Theorem
– Scramble Competition
Theory
• Example 2 – Indigo Snake Habitat selection
– Animal perception
– Evolutionary Biology
– Population Demography
Hypotheses
• Many views – confusing!
• A hypothesis is a statement derived from scientific theory that postulates something about how the world works
• A testable hypothesis is a hypothesis that can be falsified by a contradiction between a prediction derived from the hypothesis and data measured in the appropriate way
Hypotheses
• To use the Information-theoretic toolbox, we must be able to state a hypothesis as a statistical model (or more precisely an equation which allows us to calculate the maximum likelihood of the hypothesis)
Multiple Working Hypotheses
• We operate with a set of multiple alternative hypotheses (models)
• The many advantages include safeguarding objectivity, and allowing rigorous inference.
• Chamberlain (1890)
• Strong Inference – Platt (1964)
• Karl Popper (ca. 1960)
– Bold Conjectures
Deriving the model set
• This is the tough part (but also the creative part)
• Much thought needed, so don’t rush
• Collaborate, seek outside advice, read the literature, go to meetings…
• How and When hypotheses are better than What hypotheses (strive to predict rather than describe)
Models – Indigo Snake example
• Study of indigo snake habitat use
• Response variable: home range size ln(ha)
• SEX
• Land cover – 2-3 levels (lc2)
• weeks = effort/exposure
• Science question: “Is there a seasonal difference in habitat use between sexes?”
Models – Indigo Snake example
SEX
land cover type (lc2)
weeks
SEX + lc2
SEX + weeks
lc2 + weeks
SEX + lc2 + weeks
SEX + lc2 + SEX * lc2
SEX + lc2 + weeks + SEX * lc2
Models – Indigo Snake example
SEX
land cover
weeks
SEX + land cover
SEX + weeks
land cover + weeks
SEX + land cover + weeks
SEX + land cover + SEX * land cover
SEX + land cover + weeks + SEX * land cover
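The candidate set listed above can be written down as R model formulas before anything is fitted. This is a minimal sketch, assuming a hypothetical response column called `log_hr` (ln home range size) and predictor columns called `SEX`, `lc2`, and `weeks`; the slides do not confirm any of these object names. In R formula syntax the interaction term alone is written `SEX:lc2`.

```r
# Candidate models for the indigo snake example, as right-hand sides.
# `log_hr` is a hypothetical name for the response, ln(home range in ha).
rhs <- c(
  "SEX",
  "lc2",
  "weeks",
  "SEX + lc2",
  "SEX + weeks",
  "lc2 + weeks",
  "SEX + lc2 + weeks",
  "SEX + lc2 + SEX:lc2",
  "SEX + lc2 + weeks + SEX:lc2"
)

# Turn each right-hand side into a formula object.
model_set <- lapply(rhs, function(r) as.formula(paste("log_hr ~", r)))
names(model_set) <- rhs

# With a data frame `snakes` (hypothetical), every model could be fitted with:
# fits <- lapply(model_set, lm, data = snakes)
length(model_set)
```

Writing the whole set down first keeps the models tied to the hypotheses rather than to whatever the data happen to suggest.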
Models – fish habitat use example
• Study of fish habitat use in salt marsh
• Response variable was density ln(fish m⁻² + 1)
• Habitat – vegetated or unvegetated
• Site – 7 impoundments
• Season – 4 seasons
• Science questions:
– “Is there evidence for a difference in density between habitats?”
– “Is there a seasonal difference in habitat use by resident marsh fish?”
Models – fish habitat use example
Site + Season + Habitat + Site*Habitat + Season*Habitat + Site*Season
Site + Season + Habitat + Site*Habitat + Season*Habitat
Site + Season + Habitat + Site*Season + Site*Habitat
Site + Season + Habitat + Site*Season + Season*Habitat
Site + Season + Habitat + Site*Habitat
Site + Habitat + Site*Habitat
Site + Season + Habitat + Season*Habitat
Season + Habitat + Season*Habitat
Site + Season + Habitat + Site*Season
Site + Season + Site*Season
Site + Season + Habitat
Site + Season
Site + Habitat
Season + Habitat
Site
Season
Habitat
Modeling
• Trade-off between precision and bias
• Trying to derive knowledge / advance learning; not “fit the data”
• Relationship between data (quantity and quality) and sophistication of the model
Kullback-Leibler Information
• Basic concept from Information theory
• The information lost when a model is used to represent full reality
• Can also think of it as the distance between a model and full reality
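For discrete distributions, the information lost when a model g approximates truth f is I(f, g) = Σ f log(f / g). The sketch below uses made-up distributions purely for illustration; `truth`, `g1`, and `g2` are assumptions, not data from the talk. In real problems truth is unknown, which is why only differences between models can be estimated.

```r
# Kullback-Leibler information I(f, g) = sum(f * log(f / g)) for discrete
# distributions: the information lost when g is used to approximate f.
kl_info <- function(f, g) sum(f * log(f / g))

# A made-up "truth" and two made-up approximating models (illustration only).
truth <- c(0.50, 0.30, 0.20)
g1    <- c(0.45, 0.35, 0.20)
g2    <- c(0.20, 0.30, 0.50)

kl <- c(g1 = kl_info(truth, g1), g2 = kl_info(truth, g2))
kl                 # g1 loses less information than g2
kl - min(kl)       # only these differences are estimable in practice
```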
Kullback-Leibler Information
[Figure: truth / reality and models G1 (best model in set), G2, G3]
• The relative difference between models is constant
Akaike’s Contributions
• Figured out how to estimate the relative Kullback-Leibler distance between models in a set of models
• Figured out how to link maximum likelihood estimation theory with expected K-L information
• An (Akaike’s) Information Criterion
• $\mathrm{AIC} = -2\log_e\big(\mathcal{L}(\text{model}_i \mid \text{data})\big) + 2K$
I-T mechanics
$\mathrm{AICc}_i = -2\log_e\big(\mathcal{L}(\text{model}_i \mid \text{data})\big) + 2K\left(\frac{n}{n-K-1}\right)$
or
$\mathrm{AICc}_i = \mathrm{AIC}_i + \frac{2K(K+1)}{n-K-1}$
(where K = the number of parameters estimated and n = the sample size)
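The two forms of AICc are algebraically identical, which is easy to check numerically. A minimal sketch, assuming the built-in `mtcars` data as a stand-in for real data; the helper name `aicc` is hypothetical.

```r
# AICc from a fitted model, computed two equivalent ways.
aicc <- function(fit) {
  ll <- logLik(fit)
  K  <- attr(ll, "df")          # number of estimated parameters
  n  <- nobs(fit)               # sample size
  form1 <- -2 * as.numeric(ll) + 2 * K * (n / (n - K - 1))
  form2 <- AIC(fit) + 2 * K * (K + 1) / (n - K - 1)
  c(form1 = form1, form2 = form2)
}

fit <- lm(mpg ~ wt + hp, data = mtcars)   # mtcars ships with R
aicc(fit)
```

The correction term shrinks toward zero as n grows relative to K, so AICc converges to AIC for large samples.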
Model Probability (also Bayesian posterior model probabilities)
Evidence ratio of model i to model j = $w_i / w_j$
I-T mechanics
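$w_i = \dfrac{\exp\left(-\tfrac{1}{2}\Delta_i\right)}{\sum_{r=1}^{R}\exp\left(-\tfrac{1}{2}\Delta_r\right)}$
$w_i = \Pr\{g_i \mid \text{data}\}$
(where $\Delta_i = \mathrm{AICc}_i - \mathrm{AICc}_{\min}$ and R = the number of models in the set)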
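The weight calculation is short enough to do by hand in R. The sketch below takes the AICc column of the indigo snake table that appears later in these slides and recomputes the differences, relative likelihoods, weights, and one evidence ratio; the variable names are hypothetical.

```r
# AICc values copied from the indigo snake table in these slides.
aicc <- c(112.56, 113.41, 114.35, 114.78, 115.38, 116.77, 118.02, 118.02,
          120.27, 121.22, 121.86, 125.63, 127.50, 127.92, 129.91, 139.08,
          140.96)

delta   <- aicc - min(aicc)          # Delta_i
rel_lik <- exp(-0.5 * delta)         # relative likelihood of each model
w       <- rel_lik / sum(rel_lik)    # Akaike weights, sum to 1

round(w[1:4], 2)
w[1] / w[4]      # evidence ratio, best model vs. the fourth-ranked model
```

The first four weights round to the 0.34, 0.22, 0.14, and 0.11 reported in the table, and the evidence ratio rounds to 3.03.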
I-T mechanics
Least Squares Regression
$\mathrm{AICc} = n\log_e(\hat{\sigma}^2) + 2K\left(\frac{n}{n-K-1}\right)$
where $\hat{\sigma}^2 = \mathrm{RSS}/n$
(explain offset for constant part)
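The offset can be seen directly by comparing the residual-sum-of-squares form with R's `AIC()`, which uses the full Gaussian log-likelihood. A minimal sketch on the built-in `mtcars` data, using plain AIC (no small-sample correction) so that only the constant differs; `aic_rss` is a hypothetical helper name.

```r
fit_small <- lm(mpg ~ wt, data = mtcars)
fit_big   <- lm(mpg ~ wt + hp, data = mtcars)

# AIC from the residual sum of squares; K counts coefficients plus sigma^2.
aic_rss <- function(fit) {
  n   <- nobs(fit)
  rss <- sum(residuals(fit)^2)
  K   <- length(coef(fit)) + 1
  n * log(rss / n) + 2 * K
}

# R's AIC() differs from the RSS form by the constant n * (log(2 * pi) + 1),
# which is the same for every model fitted to the same data.
offset <- AIC(fit_small) - aic_rss(fit_small)
c(offset = offset, expected = nobs(fit_small) * (log(2 * pi) + 1))

# Differences between models are unaffected by the constant.
AIC(fit_big) - AIC(fit_small)
aic_rss(fit_big) - aic_rss(fit_small)
```

Because only differences between models matter, either form gives the same ranking, as long as one form is used consistently for the whole set.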
I-T mechanics
Counting Parameters:
K = number of parameters estimated
Least Squares Regression: K = number of parameters + 2 (for intercept & error variance)
I-T mechanics
Counting Parameters:
K = number of parameters estimated
Logistic Regression: K = number of parameters + 1 (for intercept)
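R reports the parameter count it uses for AIC as the `df` attribute of `logLik()`, which makes the counting rule easy to check. A small sketch on the built-in `mtcars` data, chosen only for illustration:

```r
# Linear regression: intercept + one slope + residual variance = 3.
fit_lm <- lm(mpg ~ wt, data = mtcars)
K_lm <- attr(logLik(fit_lm), "df")

# Logistic regression: intercept + one slope = 2 (no variance parameter).
fit_logit <- glm(am ~ wt, family = binomial, data = mtcars)
K_logit <- attr(logLik(fit_logit), "df")

c(least_squares = K_lm, logistic = K_logit)
```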
Comparing Models
model | rows | model.df | k | sumlogL | sumaic | AICc | Δi | L(model i) | wi | wbest/wi | n/k
I + S + H + I * H + S * H | 278 | 264 | 14 | -406.31 | 842.62 | 844.22 | 0.00 | 1.00 | 0.81 | 1.00 | 20
I + S + H + I * S + I * H + S * H | 278 | 255 | 23 | -397.44 | 842.88 | 847.23 | 3.02 | 0.22 | 0.18 | 4.52 | 12
I + S + H + I * S + I * H + S * H + I * S * H | 278 | 248 | 30 | -391.48 | 844.95 | 852.48 | 8.27 | 0.02 | 0.01 | 62.43 | 9
I + S + H + I * S + S * H | 278 | 258 | 20 | -407.01 | 856.01 | 859.28 | 15.06 | 0.00 | 0.00 | 1867.01 | 14
I + H + I * H | 278 | 270 | 8 | -420.96 | 859.91 | 860.45 | 16.23 | 0.00 | 0.00 | 3347.97 | 35
I + S + H + S * H | 278 | 267 | 11 | -420.51 | 865.01 | 866.01 | 21.79 | 0.00 | 0.00 | 53913.94 | 25
I + S + H + I * H | 278 | 267 | 11 | -420.65 | 865.29 | 866.29 | 22.07 | 0.00 | 0.00 | 62073.79 | 25
I + S + H + I * S + I * H | 278 | 258 | 20 | -413.31 | 868.62 | 871.89 | 27.67 | 0.00 | 0.00 | 1.02E+06 | 14
I + H | 278 | 273 | 5 | -437.56 | 887.12 | 887.34 | 43.12 | 0.00 | 0.00 | 2.31E+09 | 56
I + S + H | 278 | 270 | 8 | -437.47 | 892.95 | 893.48 | 49.27 | 0.00 | 0.00 | 4.99E+10 | 35
I + S + H + I * S | 278 | 261 | 17 | -427.95 | 891.90 | 894.25 | 50.04 | 0.00 | 0.00 | 7.33E+10 | 16
S + H + S * H | 278 | 270 | 8 | -454.01 | 926.02 | 926.56 | 82.34 | 0.00 | 0.00 | 7.59E+17 | 35
I | 278 | 274 | 4 | -459.68 | 929.36 | 929.50 | 85.29 | 0.00 | 0.00 | 3.31E+18 | 70
I + S | 278 | 271 | 7 | -457.98 | 931.96 | 932.38 | 88.16 | 0.00 | 0.00 | 1.39E+19 | 40
I + S + I * S | 278 | 262 | 16 | -448.31 | 930.61 | 932.70 | 88.48 | 0.00 | 0.00 | 1.64E+19 | 17
H | 278 | 276 | 2 | -464.39 | 934.78 | 934.83 | 90.61 | 0.00 | 0.00 | 4.75E+19 | 139
S + H | 278 | 273 | 5 | -463.96 | 939.92 | 940.14 | 95.93 | 0.00 | 0.00 | 6.77E+20 | 56
S | 278 | 274 | 4 | -485.38 | 980.76 | 980.91 | 136.70 | 0.00 | 0.00 | 4.82E+29 | 70
Comparing Models
Combined model weight = 0.995
Comparing Models
Evidence Ratio = 4.52
model | rows | model.df | k | sumlogL | sumaic | AICc | Δi | L(model i) | wi | wbest/wi | n/k
SEX + lc2 | 45 | 42 | 3 | -51.99 | 111.98 | 112.56 | 0.00 | 1.00 | 0.34 | 1.00 | 15
SEX + lc2 + SEX * lc2 | 45 | 41 | 4 | -51.21 | 112.41 | 113.41 | 0.85 | 0.65 | 0.22 | 1.53 | 11
SEX + lc2 + weeks | 45 | 41 | 4 | -51.67 | 113.35 | 114.35 | 1.78 | 0.41 | 0.14 | 2.44 | 11
SEX + landc | 45 | 41 | 4 | -51.89 | 113.78 | 114.78 | 2.22 | 0.33 | 0.11 | 3.03 | 11
SEX + lc2 + weeks + SEX * lc2 | 45 | 40 | 5 | -50.92 | 113.84 | 115.38 | 2.81 | 0.24 | 0.08 | 4.09 | 9
SEX + landc + weeks | 45 | 40 | 5 | -51.61 | 115.23 | 116.77 | 4.20 | 0.12 | 0.04 | 8.17 | 9
SEX + landc + lc2 + SEX * lc2 + SEX * landc | 45 | 39 | 6 | -50.90 | 115.81 | 118.02 | 5.45 | 0.07 | 0.02 | 15.28 | 8
SEX + landc + SEX * landc | 45 | 39 | 6 | -50.90 | 115.81 | 118.02 | 5.45 | 0.07 | 0.02 | 15.28 | 8
SEX + landc + weeks + SEX * landc | 45 | 38 | 7 | -50.62 | 117.24 | 120.27 | 7.70 | 0.02 | 0.01 | 47.02 | 6
SEX | 45 | 43 | 2 | -57.47 | 120.94 | 121.22 | 8.66 | 0.01 | 0.00 | 75.90 | 23
SEX + weeks | 45 | 42 | 3 | -56.64 | 121.27 | 121.86 | 9.29 | 0.01 | 0.00 | 104 | 15
lc2 | 45 | 43 | 2 | -59.67 | 125.34 | 125.63 | 13.06 | 0.00 | 0.00 | 686 | 23
landc | 45 | 42 | 3 | -59.46 | 126.91 | 127.50 | 14.94 | 0.00 | 0.00 | 1751 | 15
lc2 + weeks | 45 | 42 | 3 | -59.67 | 127.34 | 127.92 | 15.36 | 0.00 | 0.00 | 2163 | 15
landc + weeks | 45 | 41 | 4 | -59.46 | 128.91 | 129.91 | 17.35 | 0.00 | 0.00 | 5854 | 11
Null | 45 | 44 | 1 | -67.50 | 138.99 | 139.08 | 26.52 | 0.00 | 0.00 | 573574 | 45
weeks | 45 | 43 | 2 | -67.34 | 140.67 | 140.96 | 28.40 | 0.00 | 0.00 | 1465539 | 23
Comparing Models
Evidence Ratio = 3.03
Comparing Models
Evidence Ratio = 4.28: (.34 + .22 + .14 + .08) / (.11 + .04 + .02 + .01)
Mathematical details
• General Linear Models – linear regression and ANOVA
– Link function – Identity link
– Linear equation
– Error distribution – Normal Distribution (Gaussian)
$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$
Mathematical details
• Logistic Regression
– Link function – Logit link: $\ln\left(\pi / (1 - \pi)\right)$
– Linear equation
– Error distribution – Binomial Distribution
$\mathrm{logit}(\pi) = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots$
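The link function is what separates the two model types in R: the right-hand side stays a linear equation, while the `family` argument sets the link and the error distribution. A minimal sketch, assuming the built-in `mtcars` data purely as a stand-in:

```r
# The logit link maps a probability to the log-odds scale.
p <- c(0.1, 0.5, 0.9)
log(p / (1 - p))        # logit by hand
qlogis(p)               # same thing, built in
plogis(qlogis(p))       # inverse logit recovers p

# Identity link with normal errors vs. logit link with binomial errors.
fit_identity <- glm(mpg ~ wt, family = gaussian(link = "identity"), data = mtcars)
fit_logit    <- glm(am ~ wt, family = binomial(link = "logit"), data = mtcars)

# Predictions on the link scale are linear in the coefficients ...
eta <- predict(fit_logit, type = "link")
# ... and the inverse link turns them into probabilities between 0 and 1.
prob <- predict(fit_logit, type = "response")
```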
Mathematical details
• What types of models can be compared within a single I-T analysis?
– Data must be fixed (including response)
– Must be able to calculate maximum likelihood
– (ways to deal with quasi-likelihood)
– Models do not need to be nested
– In some cases AIC is additive
Model Fitting Preliminaries
• Understanding the data/variables
• Avoid data dredging!
• safe data screening practices
• Detect outliers, scale issues, collinearity
• Tools in R
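One way to screen data without dredging is to look only at the predictors and leave the response out until the models are fitted. The sketch below is an assumption about what such screening could look like in base R, using the built-in `mtcars` columns as stand-in predictors; the variance inflation factors are computed by hand so that no add-on package is needed.

```r
# Predictors only; the response is deliberately left out of the screening.
X <- mtcars[, c("wt", "hp", "disp")]

summary(X)                      # ranges reveal scale differences
round(cor(X), 2)                # pairwise correlations flag collinearity

# Variance inflation factor for each predictor: 1 / (1 - R^2) from regressing
# it on the other predictors.
vif <- sapply(names(X), function(v) {
  f  <- reformulate(setdiff(names(X), v), response = v)
  r2 <- summary(lm(f, data = X))$r.squared
  1 / (1 - r2)
})
vif

# Simple outlier flag: standardized values beyond 3 SD.
which(abs(scale(X)) > 3, arr.ind = TRUE)
```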
Tools in R
• Tools in R
– Generalized linear models
• lm
• glm
– Packages
• Design package
– Harrell, F. E. 2001. Regression Modeling Strategies with Applications to Linear Models, Logistic Regression, and Survival Analysis. Springer.
• car package
– Fox, J. 2002. An R and S-Plus Companion to Applied Regression. Sage Publications.
Tools in R
• Tools in R
– Model formula
• Ex)
model4 <- glm(help ~ age2 + sex + mom_dad + suburb + brdeapp + matepp + density + I(density^2), family = binomial, data = choices)
– Output
• summary(model4)
• model4$aic
• model4$coefficients
Model Checking
• Model Checking
– Global model must fit
– Models used for inference must meet assumptions
– Look for numerical problems
• Tools in R
Interpretation of models for inference
• Case 1: One or a few best models
• Examining model parameters and predictions
– Effects
– Prediction
• Graphing results
– Nomograms
– Presenting Results
• Anderson, D. R., W. A. Link, D. H. Johnson, and K. P. Burnham. 2001. Suggestions for presenting the results of data analysis. Journal of Wildlife Management 65:373-378.
Tools
• Calculations in Excel
• AICc, Model weights, model likelihood, evidence ratios
• Sorting the models by evidence (exciting concept)
• Model weights, evidence ratios, relative variable importance
• Model selection uncertainty
• Model-average prediction
• Model-average parameter estimates
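The same calculations listed for Excel can be done in a few lines of R. The sketch below illustrates a model-averaged prediction: each model's prediction is weighted by its Akaike weight and summed. It assumes the built-in `mtcars` data and a made-up three-model set, so the model names and the new observation are illustrative only.

```r
# Small candidate set fitted to mtcars (illustration only).
fits <- list(
  wt    = lm(mpg ~ wt, data = mtcars),
  hp    = lm(mpg ~ hp, data = mtcars),
  wt_hp = lm(mpg ~ wt + hp, data = mtcars)
)

# AICc for one model.
aicc <- function(fit) {
  K <- attr(logLik(fit), "df")
  n <- nobs(fit)
  AIC(fit) + 2 * K * (K + 1) / (n - K - 1)
}

a <- sapply(fits, aicc)
w <- exp(-0.5 * (a - min(a)))
w <- w / sum(w)                         # Akaike weights

# Prediction from each model for one new observation, then the weighted mean.
new_car  <- data.frame(wt = 3, hp = 120)
pred     <- sapply(fits, function(f) unname(predict(f, newdata = new_car)))
avg_pred <- sum(w * pred)

rbind(weight = w, prediction = pred)
avg_pred
```

The averaged prediction always lies between the smallest and largest single-model predictions, pulled toward the models with the most support.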
Multi-model Inference
model | rows | model.df | k | sumlogL | sumaic | AICc | Δi | L(model i) | wi | wbest/wi | n/k
SEX + lc2 | 45 | 42 | 3 | -51.99 | 111.98 | 112.56 | 0.00 | 1.00 | 0.34 | 1.00 | 15
SEX + lc2 + SEX * lc2 | 45 | 41 | 4 | -51.21 | 112.41 | 113.41 | 0.85 | 0.65 | 0.22 | 1.53 | 11
SEX + lc2 + weeks | 45 | 41 | 4 | -51.67 | 113.35 | 114.35 | 1.78 | 0.41 | 0.14 | 2.44 | 11
SEX + landc | 45 | 41 | 4 | -51.89 | 113.78 | 114.78 | 2.22 | 0.33 | 0.11 | 3.03 | 11
SEX + lc2 + weeks + SEX * lc2 | 45 | 40 | 5 | -50.92 | 113.84 | 115.38 | 2.81 | 0.24 | 0.08 | 4.09 | 9
SEX + landc + weeks | 45 | 40 | 5 | -51.61 | 115.23 | 116.77 | 4.20 | 0.12 | 0.04 | 8.17 | 9
SEX + landc + lc2 + SEX * lc2 + SEX * landc | 45 | 39 | 6 | -50.90 | 115.81 | 118.02 | 5.45 | 0.07 | 0.02 | 15.28 | 8
SEX + landc + SEX * landc | 45 | 39 | 6 | -50.90 | 115.81 | 118.02 | 5.45 | 0.07 | 0.02 | 15.28 | 8
SEX + landc + weeks + SEX * landc | 45 | 38 | 7 | -50.62 | 117.24 | 120.27 | 7.70 | 0.02 | 0.01 | 47.02 | 6
SEX | 45 | 43 | 2 | -57.47 | 120.94 | 121.22 | 8.66 | 0.01 | 0.00 | 75.90 | 23
SEX + weeks | 45 | 42 | 3 | -56.64 | 121.27 | 121.86 | 9.29 | 0.01 | 0.00 | 104 | 15
lc2 | 45 | 43 | 2 | -59.67 | 125.34 | 125.63 | 13.06 | 0.00 | 0.00 | 686 | 23
landc | 45 | 42 | 3 | -59.46 | 126.91 | 127.50 | 14.94 | 0.00 | 0.00 | 1751 | 15
lc2 + weeks | 45 | 42 | 3 | -59.67 | 127.34 | 127.92 | 15.36 | 0.00 | 0.00 | 2163 | 15
landc + weeks | 45 | 41 | 4 | -59.46 | 128.91 | 129.91 | 17.35 | 0.00 | 0.00 | 5854 | 11
Null | 45 | 44 | 1 | -67.50 | 138.99 | 139.08 | 26.52 | 0.00 | 0.00 | 573574 | 45
weeks | 45 | 43 | 2 | -67.34 | 140.67 | 140.96 | 28.40 | 0.00 | 0.00 | 1465539 | 23
Multi-model Inference
model | rows | model.df | k | sumlogL | sumaic | AICc | Δi | L(model i) | wi | wbest/wi | n/k
SEX + lc2 | 45 | 42 | 3 | -51.99 | 111.98 | 112.56 | 0.00 | 1.00 | 0.43 | 1.00 | 15
SEX + lc2 + SEX * lc2 | 45 | 41 | 4 | -51.21 | 112.41 | 113.41 | 0.85 | 0.65 | 0.28 | 1.53 | 11
SEX + lc2 + weeks | 45 | 41 | 4 | -51.67 | 113.35 | 114.35 | 1.78 | 0.41 | 0.18 | 2.44 | 11
SEX + lc2 + weeks + SEX * lc2 | 45 | 40 | 5 | -50.92 | 113.84 | 115.38 | 2.81 | 0.24 | 0.10 | 4.09 | 9
SEX | 45 | 43 | 2 | -57.47 | 120.94 | 121.22 | 8.66 | 0.01 | 0.01 | 75.90 | 23
SEX + weeks | 45 | 42 | 3 | -56.64 | 121.27 | 121.86 | 9.29 | 0.01 | 0.00 | 104 | 15
lc2 | 45 | 43 | 2 | -59.67 | 125.34 | 125.63 | 13.06 | 0.00 | 0.00 | 686 | 23
lc2 + weeks | 45 | 42 | 3 | -59.67 | 127.34 | 127.92 | 15.36 | 0.00 | 0.00 | 2163 | 15
Null | 45 | 44 | 1 | -67.50 | 138.99 | 139.08 | 26.52 | 0.00 | 0.00 | 573574 | 45
weeks | 45 | 43 | 2 | -67.34 | 140.67 | 140.96 | 28.40 | 0.00 | 0.00 | 1465539 | 23
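Akaike weights are relative to the model set, so dropping the landc models changes every weight even though the AICc values stay the same. The sketch below recomputes the weights for the reduced set from the AICc column of the table above, then sums them by variable as one common way to express relative variable importance; the variable names in the code are hypothetical.

```r
# Reduced model set and AICc values copied from the table above.
models <- c("SEX + lc2", "SEX + lc2 + SEX * lc2", "SEX + lc2 + weeks",
            "SEX + lc2 + weeks + SEX * lc2", "SEX", "SEX + weeks",
            "lc2", "lc2 + weeks", "Null", "weeks")
aicc <- c(112.56, 113.41, 114.35, 115.38, 121.22, 121.86,
          125.63, 127.92, 139.08, 140.96)

# Weights are always relative to the set, so they are recomputed here.
w <- exp(-0.5 * (aicc - min(aicc)))
w <- w / sum(w)
names(w) <- models
round(w, 2)

# Relative importance: sum of weights over models that contain the variable.
importance <- sapply(c("SEX", "lc2", "weeks"),
                     function(v) sum(w[grepl(v, models, fixed = TRUE)]))
round(importance, 2)
```

The recomputed weights for the top four models round to the 0.43, 0.28, 0.18, and 0.10 shown in the reduced table. Summed weights are easiest to compare when each variable appears in a similar number of models.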
Multi-model Inference