Linear Regression with R 2: Model selection

2012-12-10 @HSPH
Kazuki Yoshida, M.D.
MPH-CLE student

Group Website is at:

http://rpubs.com/kaz_yos/useR_at_HSPH



Previously in this group

- Introduction
- Reading Data into R (1)
- Reading Data into R (2)
- Descriptive, continuous
- Descriptive, categorical
- Deducer
- Graphics
- Groupwise, continuous
- Linear regression

Menu

- Linear regression: Model selection

Ingredients

Statistics
- Selection methods

Programming
- step()
- drop1()
- add1()
- leaps::regsubsets()

Open RStudio

Open the saved script that we created last time. See also the Linear Regression with R 1 slides.

Create full & null models

lm.full <- lm(bwt ~ age + lwt + smoke + ht + ui + ftv.cat + race.cat + preterm, data = lbw)

lm.null <- lm(bwt ~ 1, data = lbw)

Intercept-only

Compare two models

anova(lm.full, lm.null)

Reading the anova() output:

- Models: Model 1 (lm.full) and Model 2 (lm.null)
- Residual degrees of freedom and residual sum of squares for each model
- Difference in residual SS
- Partial F-test: significant
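The partial F statistic reported by anova() can be reproduced by hand from the residual sums of squares and residual degrees of freedom. A minimal sketch follows. The lbw data frame from the earlier sessions is not shown in these slides, so it is rebuilt here from MASS::birthwt; the recoding of race.cat, ftv.cat and preterm is an assumption and may differ from the group's script.

```r
## Assumption: rebuild lbw from MASS::birthwt; the recodings below are a guess
data(birthwt, package = "MASS")
lbw <- within(birthwt, {
    race.cat <- factor(race, levels = 1:3, labels = c("White", "Black", "Other"))
    ftv.cat  <- cut(ftv, breaks = c(-Inf, 0, 2, Inf), labels = c("None", "Normal", "Many"))
    preterm  <- factor(ptl >= 1, levels = c(FALSE, TRUE), labels = c("0", "1+"))
})

lm.full <- lm(bwt ~ age + lwt + smoke + ht + ui + ftv.cat + race.cat + preterm, data = lbw)
lm.null <- lm(bwt ~ 1, data = lbw)

## Residual SS and residual degrees of freedom for each model
rss.full <- deviance(lm.full)
rss.null <- deviance(lm.null)
df.full  <- df.residual(lm.full)
df.null  <- df.residual(lm.null)

## Partial F = (difference in RSS / difference in df) / (RSS of larger model / its df)
f.stat  <- ((rss.null - rss.full) / (df.null - df.full)) / (rss.full / df.full)
p.value <- pf(f.stat, df.null - df.full, df.full, lower.tail = FALSE)

## Compare with the anova() table (second row holds the test)
anova.out <- anova(lm.full, lm.null)
c(by.hand = f.stat, anova = anova.out$F[2])
```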

Backward elimination

lm.step.bw <- step(lm.full, direction = "backward")

Arguments:

- lm.step.bw: final model object
- lm.full: specify full model

Output:

- Initial AIC for full model
- Removing ftv.cat makes AIC smallest
- Removing age makes AIC smallest
- Doing nothing makes AIC smallest
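The AIC values printed by step() come from extractAIC(), which for a linear model is n * log(RSS / n) + 2 * (number of estimated coefficients). The sketch below checks that by hand and then looks at the elimination path stored in the result. The lbw data frame is rebuilt from MASS::birthwt with assumed recodings, so the path may not match the slides exactly.

```r
## Assumption: rebuild lbw from MASS::birthwt; the recodings below are a guess
data(birthwt, package = "MASS")
lbw <- within(birthwt, {
    race.cat <- factor(race, levels = 1:3, labels = c("White", "Black", "Other"))
    ftv.cat  <- cut(ftv, breaks = c(-Inf, 0, 2, Inf), labels = c("None", "Normal", "Many"))
    preterm  <- factor(ptl >= 1, levels = c(FALSE, TRUE), labels = c("0", "1+"))
})
lm.full <- lm(bwt ~ age + lwt + smoke + ht + ui + ftv.cat + race.cat + preterm, data = lbw)

## extractAIC() returns c(equivalent degrees of freedom, AIC)
aic.step <- extractAIC(lm.full)
n <- nobs(lm.full)
aic.by.hand <- n * log(deviance(lm.full) / n) + 2 * aic.step[1]
c(extractAIC = aic.step[2], by.hand = aic.by.hand)

## trace = 0 suppresses the step-by-step printout
lm.step.bw <- step(lm.full, direction = "backward", trace = 0)

## Elimination path: which term was dropped at each step, and the AIC after it
lm.step.bw$anova

## Formula of the final model
formula(lm.step.bw)
```

Note that AIC() on the same model gives a different number; the two differ by an additive constant, so the ranking of models is the same.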

Forward selection

lm.step.fw <- step(lm.null, scope = ~ age + lwt + smoke + ht + ui + ftv.cat + race.cat + preterm, direction = "forward")

Arguments:

- lm.step.fw: final model object
- lm.null: specify null model
- scope: formula for possible variables

Output:

- Initial AIC for null model
- Adding ui makes AIC smallest
- Adding race.cat makes AIC smallest
- Adding smoke makes AIC smallest
- Still goes on ...
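The scope argument of step() also accepts a list with lower and upper components, which makes the smallest and largest allowed models explicit. A minimal sketch, again with lbw rebuilt from MASS::birthwt under assumed recodings; for forward selection it should give the same final model as the one-sided formula form.

```r
## Assumption: rebuild lbw from MASS::birthwt; the recodings below are a guess
data(birthwt, package = "MASS")
lbw <- within(birthwt, {
    race.cat <- factor(race, levels = 1:3, labels = c("White", "Black", "Other"))
    ftv.cat  <- cut(ftv, breaks = c(-Inf, 0, 2, Inf), labels = c("None", "Normal", "Many"))
    preterm  <- factor(ptl >= 1, levels = c(FALSE, TRUE), labels = c("0", "1+"))
})
lm.null <- lm(bwt ~ 1, data = lbw)

## Largest model that forward selection may reach
upper.formula <- ~ age + lwt + smoke + ht + ui + ftv.cat + race.cat + preterm

## Scope given as a single formula (treated as the upper limit)
lm.step.fw <- step(lm.null, scope = upper.formula,
                   direction = "forward", trace = 0)

## Scope given as a list: intercept-only lower limit, full upper limit
lm.step.fw.list <- step(lm.null,
                        scope = list(lower = ~ 1, upper = upper.formula),
                        direction = "forward", trace = 0)

formula(lm.step.fw)
formula(lm.step.fw.list)
```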

Stepwise selection/elimination

lm.step.both <- step(lm.null, scope = ~ age + lwt + smoke + ht + ui + ftv.cat + race.cat + preterm, direction = "both")

Arguments:

- lm.step.both: final model object
- lm.null: specify null model
- scope: formula for possible variables

Output:

- Initial AIC for null model
- Adding ui makes AIC smallest
- Adding race.cat makes AIC smallest
- Adding smoke makes AIC smallest
- Still goes on ...
- Removing is also considered
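The three directions do not have to end at the same model, so it can help to put the results side by side. The sketch below runs all three and compares the final AIC and the selected terms. lbw is rebuilt from MASS::birthwt with assumed recodings, so the selected terms may differ from the slides.

```r
## Assumption: rebuild lbw from MASS::birthwt; the recodings below are a guess
data(birthwt, package = "MASS")
lbw <- within(birthwt, {
    race.cat <- factor(race, levels = 1:3, labels = c("White", "Black", "Other"))
    ftv.cat  <- cut(ftv, breaks = c(-Inf, 0, 2, Inf), labels = c("None", "Normal", "Many"))
    preterm  <- factor(ptl >= 1, levels = c(FALSE, TRUE), labels = c("0", "1+"))
})
lm.full <- lm(bwt ~ age + lwt + smoke + ht + ui + ftv.cat + race.cat + preterm, data = lbw)
lm.null <- lm(bwt ~ 1, data = lbw)
scope.formula <- ~ age + lwt + smoke + ht + ui + ftv.cat + race.cat + preterm

## Run the three selection strategies quietly
fits <- list(
    backward = step(lm.full, direction = "backward", trace = 0),
    forward  = step(lm.null, scope = scope.formula, direction = "forward", trace = 0),
    both     = step(lm.null, scope = scope.formula, direction = "both", trace = 0)
)

## Final AIC (as used by step) for each strategy
final.aic <- sapply(fits, function(fit) extractAIC(fit)[2])
final.aic

## Terms kept in each final model
lapply(fits, function(fit) sort(attr(terms(fit), "term.labels")))
```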

F-test using drop1()

## age is the least significant by partial F test
drop1(lm.full, test = "F")

## After elimination, ftv.cat is the least significant
drop1(update(lm.full, ~ . -age), test = "F")

## After elimination, preterm is the least significant at p = 0.12
drop1(update(lm.full, ~ . -age -ftv.cat), test = "F")

## After elimination, all variables are significant at p < 0.1
drop1(update(lm.full, ~ . -age -ftv.cat -preterm), test = "F")

## Show summary for final model
summary(update(lm.full, ~ . -age -ftv.cat -preterm))

Updating models

## Remove age from full model: all variables (.) minus age
lm.age.less <- update(lm.full, ~ . -age)

## Add ui to null model: all variables (.) plus ui
lm.ui.only <- update(lm.null, ~ . +ui)

Reading the drop1() output:

- Test full model: age least significant
- Remove age, and test: ftv.cat least significant
- Remove age, ftv.cat
- The age row is an F-test comparing age-in model to age-out model
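Each row of the drop1() table is the same partial F-test that anova() gives when the model without that term is compared to the model with it. A minimal sketch for the age row, with lbw rebuilt from MASS::birthwt under assumed recodings:

```r
## Assumption: rebuild lbw from MASS::birthwt; the recodings below are a guess
data(birthwt, package = "MASS")
lbw <- within(birthwt, {
    race.cat <- factor(race, levels = 1:3, labels = c("White", "Black", "Other"))
    ftv.cat  <- cut(ftv, breaks = c(-Inf, 0, 2, Inf), labels = c("None", "Normal", "Many"))
    preterm  <- factor(ptl >= 1, levels = c(FALSE, TRUE), labels = c("0", "1+"))
})
lm.full <- lm(bwt ~ age + lwt + smoke + ht + ui + ftv.cat + race.cat + preterm, data = lbw)

## One F-test per term, each against the full model
drop1.out <- drop1(lm.full, test = "F")

## The same test for age, done explicitly: age-out model vs age-in model
lm.age.less <- update(lm.full, ~ . -age)
anova.out   <- anova(lm.age.less, lm.full)

## The F statistics and p-values agree
drop1.out["age", c("F value", "Pr(>F)")]
anova.out[2, c("F", "Pr(>F)")]
```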

F-test using add1()

## ui is the most significant variable
add1(lm.null, scope = ~ age + lwt + race.cat + smoke + preterm + ht + ui + ftv.cat, test = "F")

## After inclusion, race.cat is the most significant
add1(update(lm.null, ~ . +ui), scope = ~ age + lwt + race.cat + smoke + preterm + ht + ui + ftv.cat, test = "F")

## After inclusion, smoke is the most significant
add1(update(lm.null, ~ . +ui +race.cat), scope = ~ age + lwt + race.cat + smoke + preterm + ht + ui + ftv.cat, test = "F")

## After inclusion, ht is the most significant
add1(update(lm.null, ~ . +ui +race.cat +smoke), scope = ~ age + lwt + race.cat + smoke + preterm + ht + ui + ftv.cat, test = "F")
...

Reading the add1() output:

- Test null model: ui most significant
- Add ui, and test: race.cat most significant
- Add ui and race.cat
- The ui row is an F-test comparing ui-out model to ui-in model
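The manual add1() steps can be written as a loop that keeps adding the most significant candidate until none passes an entry threshold. This is only a sketch: the threshold alpha = 0.1 is an assumption (the slides do not state an entry criterion for add1()), and lbw is rebuilt from MASS::birthwt with assumed recodings.

```r
## Assumption: rebuild lbw from MASS::birthwt; the recodings below are a guess
data(birthwt, package = "MASS")
lbw <- within(birthwt, {
    race.cat <- factor(race, levels = 1:3, labels = c("White", "Black", "Other"))
    ftv.cat  <- cut(ftv, breaks = c(-Inf, 0, 2, Inf), labels = c("None", "Normal", "Many"))
    preterm  <- factor(ptl >= 1, levels = c(FALSE, TRUE), labels = c("0", "1+"))
})
lm.null <- lm(bwt ~ 1, data = lbw)

scope.formula <- ~ age + lwt + race.cat + smoke + preterm + ht + ui + ftv.cat
candidates    <- attr(terms(scope.formula), "term.labels")
alpha         <- 0.1   # hypothetical entry threshold, not from the slides

fit <- lm.null
repeat {
    ## Stop when every candidate is already in the model
    remaining <- setdiff(candidates, attr(terms(fit), "term.labels"))
    if (length(remaining) == 0) break

    ## F-test for adding each remaining term; the first row is "<none>"
    add1.out <- add1(fit, scope = scope.formula, test = "F")
    p.values <- add1.out[-1, "Pr(>F)"]
    if (min(p.values) >= alpha) break

    ## Add the most significant term and continue
    best <- rownames(add1.out)[-1][which.min(p.values)]
    fit  <- update(fit, as.formula(paste("~ . +", best)))
}

formula(fit)
```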

All-subset regression using leaps package

library(leaps)

regsubsets.out <- regsubsets(bwt ~ age + lwt + smoke + ht + ui + ftv.cat + race.cat + preterm, data = lbw, nbest = 1, nvmax = NULL, force.in = NULL, force.out = NULL, method = "exhaustive")

summary(regsubsets.out)

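Arguments:

- regsubsets.out: result object
- formula: full model
- nbest: how many best models (per model size)?
- nvmax: max model size
- force.in, force.out: forced variables (forced in, forced out)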

Reading the summary() output:

- Each row is a variable combination
- Best 1 predictor model
- Best 7 predictor model
- Best 10 predictor model
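The summary object also stores the fit statistics for each row, so the best model by a given criterion can be picked programmatically rather than read off the printout. A minimal sketch, assuming the leaps package is installed and with lbw rebuilt from MASS::birthwt under assumed recodings:

```r
library(leaps)

## Assumption: rebuild lbw from MASS::birthwt; the recodings below are a guess
data(birthwt, package = "MASS")
lbw <- within(birthwt, {
    race.cat <- factor(race, levels = 1:3, labels = c("White", "Black", "Other"))
    ftv.cat  <- cut(ftv, breaks = c(-Inf, 0, 2, Inf), labels = c("None", "Normal", "Many"))
    preterm  <- factor(ptl >= 1, levels = c(FALSE, TRUE), labels = c("0", "1+"))
})

regsubsets.out <- regsubsets(bwt ~ age + lwt + smoke + ht + ui + ftv.cat + race.cat + preterm,
                             data = lbw, nbest = 1, nvmax = NULL, method = "exhaustive")
summary.out <- summary(regsubsets.out)

## Logical matrix: one row per model size, TRUE where a column is included
summary.out$which

## Adjusted R^2 for each row
summary.out$adjr2

## Row with the highest adjusted R^2, and the coefficients of that model
best.id <- which.max(summary.out$adjr2)
coef(regsubsets.out, id = best.id)
```

Factor variables enter as separate dummy columns, which is why a model with 8 variables can have up to 10 predictors here.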

plot(regsubsets.out, scale = "adjr2", main = "Adjusted R^2")

Example rows in the plot (adjusted R^2: the higher the better):

- ~ lwt + smoke + ht + ui + race.cat + preterm
- ~ ui
- ~ smoke + ht + ui + race.cat

library(car)
subsets(regsubsets.out, statistic = "adjr2", legend = FALSE, min.size = 5, main = "Adjusted R^2")

subsets(regsubsets.out, statistic = "cp", legend = FALSE, min.size = 5, main = "Mallows' Cp")

First model for which Mallows' Cp is less than the number of regressors + 1
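Mallows' Cp can also be computed with base R alone, which shows what the rule compares: Cp = RSS_p / s^2 - n + 2p, where s^2 is the residual variance of the full model and p is the number of coefficients including the intercept (number of regressors + 1). The helper mallows.cp() below is a hypothetical name, the candidate models are taken from the formulas shown on the adjusted R^2 plot, and lbw is rebuilt from MASS::birthwt with assumed recodings.

```r
## Assumption: rebuild lbw from MASS::birthwt; the recodings below are a guess
data(birthwt, package = "MASS")
lbw <- within(birthwt, {
    race.cat <- factor(race, levels = 1:3, labels = c("White", "Black", "Other"))
    ftv.cat  <- cut(ftv, breaks = c(-Inf, 0, 2, Inf), labels = c("None", "Normal", "Many"))
    preterm  <- factor(ptl >= 1, levels = c(FALSE, TRUE), labels = c("0", "1+"))
})
lm.full <- lm(bwt ~ age + lwt + smoke + ht + ui + ftv.cat + race.cat + preterm, data = lbw)

## Hypothetical helper: Cp of a submodel relative to the full model
mallows.cp <- function(fit, full) {
    n           <- nobs(full)
    sigma2.full <- deviance(full) / df.residual(full)
    p           <- length(coef(fit))   # regressors + 1 (intercept)
    deviance(fit) / sigma2.full - n + 2 * p
}

## Candidate models, smallest to largest
fits <- list(
    ui.only = lm(bwt ~ ui, data = lbw),
    four    = lm(bwt ~ smoke + ht + ui + race.cat, data = lbw),
    six     = lm(bwt ~ lwt + smoke + ht + ui + race.cat + preterm, data = lbw),
    full    = lm.full
)

cp.table <- data.frame(
    p  = sapply(fits, function(fit) length(coef(fit))),
    cp = sapply(fits, mallows.cp, full = lm.full)
)
cp.table$cp.le.p <- cp.table$cp <= cp.table$p
cp.table
```

For the full model Cp equals p by construction, so the rule is only informative for the smaller models.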
