Linear Regression with R 2
Linear Regression with R
2: Model selection
2012-12-10 @HSPH
Kazuki Yoshida, M.D. MPH-CLE student
FREEDOM TO KNOW
Group Website is at:
http://rpubs.com/kaz_yos/useR_at_HSPH
Previously in this group
- Introduction
- Reading Data into R (1)
- Reading Data into R (2)
- Descriptive, continuous
- Descriptive, categorical
- Deducer
- Graphics
- Groupwise, continuous
- Linear regression
Menu
- Linear regression: Model selection
Ingredients
Statistics
- Selection methods
Programming
- step()
- drop1()
- add1()
- leaps::regsubsets()
Open RStudio
Open the saved script that we created last time.
See also the Linear Regression with R 1 slides.
Create full & null models
lm.full <- lm(bwt ~ age + lwt + smoke + ht + ui + ftv.cat + race.cat + preterm, data = lbw)
## Intercept-only (null) model
lm.null <- lm(bwt ~ 1, data = lbw)
Compare two models
anova(lm.full, lm.null)
Model 1: lm.full, Model 2: lm.null
Output columns: residual degrees of freedom, residual sum of squares, difference in residual SS
Partial F-test: significant
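The quantities labelled on the anova() output can be recomputed by hand, which may help in reading the table. The sketch below is an illustration under stated assumptions: it uses MASS::birthwt as a stand-in for the lbw data frame (the derived columns ftv.cat, race.cat and preterm from the earlier session are not recreated), and the names lbw.demo, demo.full and demo.null are hypothetical.

```r
## Stand-in data (assumption): MASS::birthwt instead of the slides' lbw data frame
library(MASS)
lbw.demo <- birthwt
lbw.demo$race <- factor(lbw.demo$race)

## Reduced full model (only raw columns) and intercept-only model
demo.full <- lm(bwt ~ age + lwt + smoke + ht + ui + race, data = lbw.demo)
demo.null <- lm(bwt ~ 1, data = lbw.demo)

## Partial F-test via anova()
anova.out <- anova(demo.null, demo.full)

## Same statistic from residual SS and residual degrees of freedom
rss.null <- deviance(demo.null)
rss.full <- deviance(demo.full)
df.null  <- df.residual(demo.null)
df.full  <- df.residual(demo.full)
f.by.hand <- ((rss.null - rss.full) / (df.null - df.full)) / (rss.full / df.full)
p.by.hand <- pf(f.by.hand, df.null - df.full, df.full, lower.tail = FALSE)

## The two F statistics should agree
c(anova = anova.out$F[2], by.hand = f.by.hand)
```

The numerator is the difference in residual SS per dropped degree of freedom; the denominator is the residual mean square of the larger model.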
Backward elimination
lm.step.bw <- step(lm.full, direction = "backward")
lm.step.bw: final model object
lm.full: specify the full model
Initial AIC is for the full model
Removing ftv.cat makes AIC smallest
Then removing age makes AIC smallest
Finally, doing nothing makes AIC smallest, so the search stops
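The AIC values that step() prints for a linear model can be reproduced with extractAIC(), and a single elimination round can be inspected with drop1(). The sketch below assumes MASS::birthwt as a stand-in for the lbw data frame; lbw.demo and demo.full are hypothetical names, and the model uses only raw columns rather than the slides' derived ones.

```r
## Stand-in data (assumption): MASS::birthwt instead of the slides' lbw data frame
library(MASS)
lbw.demo <- birthwt
lbw.demo$race <- factor(lbw.demo$race)
demo.full <- lm(bwt ~ age + lwt + smoke + ht + ui + race, data = lbw.demo)

## extractAIC() returns c(equivalent df, AIC) on the scale step() reports
aic.full <- extractAIC(demo.full)

## For lm objects this is n * log(RSS / n) + 2 * edf
n   <- length(residuals(demo.full))
rss <- deviance(demo.full)
aic.by.hand <- n * log(rss / n) + 2 * aic.full[1]

## One round of backward elimination: AIC after dropping each single term
drop.table <- drop1(demo.full)
drop.table[order(drop.table$AIC), ]
```

The row sorted to the top is the change step() would make next; if that row is `<none>`, the search stops.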
Forward selection
lm.step.fw <- step(lm.null, scope = ~ age + lwt + smoke + ht + ui + ftv.cat + race.cat + preterm, direction = "forward")
lm.step.fw: final model object
lm.null: specify the null model
scope: formula for possible variables
Initial AIC is for the null model
Adding ui makes AIC smallest
Adding race.cat makes AIC smallest
Adding smoke makes AIC smallest
Still goes on ...
Stepwise selection/elimination
lm.step.both <- step(lm.null, scope = ~ age + lwt + smoke + ht + ui + ftv.cat + race.cat + preterm, direction = "both")
lm.step.both: final model object
lm.null: specify the null model
scope: formula for possible variables
Initial AIC is for the null model
Adding ui makes AIC smallest
Adding race.cat makes AIC smallest
Adding smoke makes AIC smallest
Still goes on ...
Removing a variable is also considered
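The three search directions need not end at the same model, so it can be useful to run them side by side with the step-by-step printout suppressed (trace = 0). This is a sketch on stand-in data: MASS::birthwt replaces the lbw data frame (an assumption), and lbw.demo, demo.scope and the demo.* models are hypothetical names.

```r
## Stand-in data (assumption): MASS::birthwt instead of the slides' lbw data frame
library(MASS)
lbw.demo <- birthwt
lbw.demo$race <- factor(lbw.demo$race)

demo.full  <- lm(bwt ~ age + lwt + smoke + ht + ui + race, data = lbw.demo)
demo.null  <- lm(bwt ~ 1, data = lbw.demo)
demo.scope <- ~ age + lwt + smoke + ht + ui + race

## trace = 0 suppresses the per-step tables
demo.bw   <- step(demo.full, direction = "backward", trace = 0)
demo.fw   <- step(demo.null, scope = demo.scope, direction = "forward", trace = 0)
demo.both <- step(demo.null, scope = demo.scope, direction = "both", trace = 0)

## Compare the selected formulas and their AIC on the step() scale
demo.results <- list(backward = demo.bw, forward = demo.fw, both = demo.both)
lapply(demo.results, formula)
aic.compare <- sapply(demo.results, function(m) extractAIC(m)[2])
aic.compare
```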
F-test using drop1()
## age is the least significant by partial F test
drop1(lm.full, test = "F")
## After elimination, ftv.cat is the least significant
drop1(update(lm.full, ~ . -age), test = "F")
## After elimination, preterm is least significant at p = 0.12.
drop1(update(lm.full, ~ . -age -ftv.cat), test = "F")
## After elimination, all variables are significant at p < 0.1
drop1(update(lm.full, ~ . -age -ftv.cat -preterm), test = "F")
## Show summary for final model
summary(update(lm.full, ~ . -age -ftv.cat -preterm))
Updating models
## Remove age from full model
lm.age.less <- update(lm.full, ~ . -age)
## Adding ui to null model
lm.ui.only <- update(lm.null, ~ . +ui)
~ . -age: all variables (.) minus age
~ . +ui: all variables (.) plus ui
Test full model: age least significant
Remove age, and test: ftv.cat least significant
Remove age, ftv.cat, and test again
Row for age: F-test comparing age-in model to age-out model
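Each row of the drop1() table is a partial F-test of the current model against the model without that term, so the same statistic can be obtained with update() and anova(). The sketch below checks this for age on stand-in data; MASS::birthwt replaces the lbw data frame (an assumption), and lbw.demo, demo.full and demo.age.less are hypothetical names.

```r
## Stand-in data (assumption): MASS::birthwt instead of the slides' lbw data frame
library(MASS)
lbw.demo <- birthwt
lbw.demo$race <- factor(lbw.demo$race)
demo.full <- lm(bwt ~ age + lwt + smoke + ht + ui + race, data = lbw.demo)

## F-tests for dropping each term, one at a time
drop.out <- drop1(demo.full, test = "F")

## The same comparison for age, written out explicitly
demo.age.less <- update(demo.full, ~ . - age)
anova.out <- anova(demo.age.less, demo.full)

## Both F statistics should agree
c(drop1 = drop.out["age", "F value"], anova = anova.out$F[2])
```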
F-test using add1()
## ui is the most significant variable
add1(lm.null, scope = ~ age + lwt + race.cat + smoke + preterm + ht + ui + ftv.cat, test = "F")
## After inclusion, race.cat is the most significant
add1(update(lm.null, ~ . +ui), scope = ~ age + lwt + race.cat + smoke + preterm + ht + ui + ftv.cat, test = "F")
## After inclusion, smoke is the most significant
add1(update(lm.null, ~ . +ui +race.cat), scope = ~ age + lwt + race.cat + smoke + preterm + ht + ui + ftv.cat, test = "F")
## After inclusion, ht is the most significant
add1(update(lm.null, ~ . +ui +race.cat +smoke), scope = ~ age + lwt + race.cat + smoke + preterm + ht + ui + ftv.cat, test = "F")
...
Test null model: ui most significant
Add ui, and test: race.cat most significant
Add ui and race.cat, and test again
Row for ui: F-test comparing ui-out model to ui-in model
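One forward step can also be done programmatically: read the add1() table, pick the candidate with the smallest p-value, and update() the model. The sketch below is a hedged illustration on stand-in data; MASS::birthwt replaces the lbw data frame (an assumption), and lbw.demo, demo.scope, best.term and demo.next are hypothetical names.

```r
## Stand-in data (assumption): MASS::birthwt instead of the slides' lbw data frame
library(MASS)
lbw.demo <- birthwt
lbw.demo$race <- factor(lbw.demo$race)
demo.null  <- lm(bwt ~ 1, data = lbw.demo)
demo.scope <- ~ age + lwt + smoke + ht + ui + race

## F-tests for adding each candidate term to the null model
add.out <- add1(demo.null, scope = demo.scope, test = "F")

## Most significant candidate = smallest p-value (the <none> row is NA and is skipped)
best.term <- rownames(add.out)[which.min(add.out[["Pr(>F)"]])]

## Add that term, then check the F statistic against anova()
demo.next <- update(demo.null, paste("~ . +", best.term))
anova.out <- anova(demo.null, demo.next)
c(add1 = add.out[best.term, "F value"], anova = anova.out$F[2])
```

Repeating these lines with demo.next in place of demo.null gives the next forward step.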
All-subset regression using leaps package
library(leaps)
regsubsets.out <- regsubsets(bwt ~ age + lwt + smoke + ht + ui + ftv.cat + race.cat + preterm, data = lbw, nbest = 1, nvmax = NULL, force.in = NULL, force.out = NULL, method = "exhaustive")
summary(regsubsets.out)
regsubsets.out: result object
Formula: full model
nbest: how many best models (per model size)?
nvmax: max model size
force.in, force.out: variables forced into or out of every model
Best 1-predictor model
Best 7-predictor model
Best 10-predictor model
Variable combination
plot(regsubsets.out, scale = "adjr2", main = "Adjusted R^2")
~ lwt + smoke + ht + ui + race.cat + preterm
~ ui
~ smoke + ht + ui + race
the higher the better
library(car)
subsets(regsubsets.out, statistic = "adjr2", legend = FALSE, min.size = 5, main = "Adjusted R^2")
~ lwt + smoke + ht + ui + race.cat + preterm
subsets(regsubsets.out, statistic = "cp", legend = FALSE, min.size = 5, main = "Mallows' Cp")
~ lwt + smoke + ht + ui + race.cat + preterm
First model for which Mallows' Cp is less than the number of regressors + 1
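The rule above can be checked numerically. The sketch below uses the usual definition Cp = RSS_p / sigma^2 - n + 2p, where sigma^2 is the residual mean square of the full model and p is the number of regressors + 1 (intercept included). It runs on stand-in data: MASS::birthwt replaces the lbw data frame (an assumption), and mallows.cp, lbw.demo, demo.full and demo.small are hypothetical names, not part of leaps or car.

```r
## Stand-in data (assumption): MASS::birthwt instead of the slides' lbw data frame
library(MASS)
lbw.demo <- birthwt
lbw.demo$race <- factor(lbw.demo$race)
demo.full  <- lm(bwt ~ age + lwt + smoke + ht + ui + race, data = lbw.demo)
demo.small <- lm(bwt ~ ui, data = lbw.demo)

## Hypothetical helper: Mallows' Cp of a submodel relative to the full model
mallows.cp <- function(sub.model, full.model) {
  n <- length(residuals(full.model))
  sigma2.full <- deviance(full.model) / df.residual(full.model)
  p <- length(coef(sub.model))  # number of regressors + 1
  deviance(sub.model) / sigma2.full - n + 2 * p
}

## Cp next to p for each candidate; a model with Cp near or below p is a candidate
cp.values <- c(small = mallows.cp(demo.small, demo.full),
               full  = mallows.cp(demo.full, demo.full))
p.values  <- c(small = length(coef(demo.small)),
               full  = length(coef(demo.full)))
rbind(Cp = cp.values, p = p.values)
```

By construction the full model always has Cp equal to p, so the comparison is informative only for the smaller models.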