linear regression with r 2

Linear Regression with R 2: Model selection
2012-12-10 @HSPH, Kazuki Yoshida, M.D., MPH-CLE student

Upload: kazuki-yoshida

Posted on 20-Jun-2015




Group Website is at:

http://rpubs.com/kaz_yos/useR_at_HSPH


Previously in this group

- Introduction
- Reading Data into R (1)
- Reading Data into R (2)
- Descriptive, continuous
- Descriptive, categorical
- Deducer
- Graphics
- Groupwise, continuous
- Linear regression


Menu

- Linear regression: Model selection


Ingredients

Statistics:
- Selection methods

Programming:
- step()
- drop1()
- add1()
- leaps::regsubsets()


Open RStudio.


Open the saved script that we created last time. See also the Linear Regression with R 1 slides.


Create full & null models

## Full model: all candidate predictors
lm.full <- lm(bwt ~ age + lwt + smoke + ht + ui + ftv.cat + race.cat + preterm, data = lbw)

## Null model: intercept only
lm.null <- lm(bwt ~ 1, data = lbw)
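To make "intercept-only" concrete, here is a minimal sketch that assumes the built-in mtcars data (response mpg) as a stand-in for lbw, which is not reproduced here; the predictor choices are illustrative only. The null model's single coefficient is simply the sample mean of the response.

```r
## Stand-in data: mtcars (ships with R); response mpg
fit.full <- lm(mpg ~ wt + hp + qsec + am + drat, data = mtcars)  # all candidate predictors
fit.null <- lm(mpg ~ 1, data = mtcars)                           # intercept only

## The only coefficient of the null model is the mean of the response
coef(fit.null)
mean(mtcars$mpg)
```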


Compare two models

anova(lm.full, lm.null)

Model 1 is lm.full and Model 2 is lm.null, following the order of the arguments.


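Partial F-test

The output lists the two models with their residual degrees of freedom (Res.Df) and residual sum of squares (RSS), followed by the difference in residual SS (Sum of Sq) and the F-test, which is significant.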
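The partial F statistic that anova() reports can be reproduced by hand from the residual sums of squares and residual degrees of freedom. A minimal sketch, assuming mtcars as a stand-in for lbw (the two models are illustrative):

```r
## Stand-in data: mtcars; a reduced model nested in a larger model
fit.small <- lm(mpg ~ wt, data = mtcars)
fit.big   <- lm(mpg ~ wt + hp + qsec, data = mtcars)

rss.small <- deviance(fit.small)      # residual sum of squares
rss.big   <- deviance(fit.big)
df.small  <- df.residual(fit.small)   # residual degrees of freedom
df.big    <- df.residual(fit.big)

## Partial F = [(RSS_small - RSS_big) / (df_small - df_big)] / (RSS_big / df_big)
F.stat <- ((rss.small - rss.big) / (df.small - df.big)) / (rss.big / df.big)
p.val  <- pf(F.stat, df.small - df.big, df.big, lower.tail = FALSE)

## anova() gives the same F and p-value in its second row
tab <- anova(fit.small, fit.big)
tab
```

Swapping the argument order (as in anova(lm.full, lm.null)) flips the signs of Df and Sum of Sq but leaves F and its p-value unchanged.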


Backward elimination

lm.step.bw <- step(lm.full, direction = "backward")

Specify the full model (lm.full) as the starting point; lm.step.bw stores the final model object.


The trace starts with the initial AIC for the full model. At each step, step() removes the variable whose removal makes AIC smallest: first ftv.cat, then age. It stops when doing nothing (the <none> row) makes AIC smallest.
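For lm fits, the AIC that step() prints comes from extractAIC(), which drops terms that are constant for a given data set. A minimal sketch, assuming mtcars as a stand-in for lbw, shows the formula and that it ranks models the same way as AIC():

```r
fit <- lm(mpg ~ wt + hp + qsec + am + drat, data = mtcars)  # stand-in full model
n   <- nrow(mtcars)

## extractAIC() for lm: n * log(RSS / n) + 2 * edf,
## where edf is the number of estimated coefficients
edf      <- n - df.residual(fit)
aic.step <- n * log(deviance(fit) / n) + 2 * edf
extractAIC(fit)          # c(edf, AIC)

## AIC() differs only by a constant for fixed n, so the ranking is identical
AIC(fit) - aic.step      # equals n * log(2 * pi) + n + 2

## Backward elimination; trace = 0 suppresses the step-by-step printout
fit.bw <- step(fit, direction = "backward", trace = 0)
formula(fit.bw)
```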


Forward selection

lm.step.fw <- step(lm.null, scope = ~ age + lwt + smoke + ht + ui + ftv.cat + race.cat + preterm, direction = "forward")

Specify the null model (lm.null) as the starting point and give the formula for the possible variables in scope; lm.step.fw stores the final model object.


The trace starts with the initial AIC for the null model. At each step, step() adds the variable that makes AIC smallest: first ui, then race.cat, then smoke, and so on until no addition lowers AIC.
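A minimal sketch of the same forward search, assuming mtcars as a stand-in for lbw and an illustrative set of candidates. The anova component of the returned object records the path, one row per accepted step:

```r
fit.null <- lm(mpg ~ 1, data = mtcars)   # stand-in null model

## scope lists the candidate variables; forward selection only adds terms
fit.fw <- step(fit.null,
               scope = ~ wt + hp + qsec + am + drat,
               direction = "forward", trace = 0)

formula(fit.fw)   # final model
fit.fw$anova      # path: "+ var" rows are the variables added, in order
```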


Stepwise selection/elimination

lm.step.both <- step(lm.null, scope = ~ age + lwt + smoke + ht + ui + ftv.cat + race.cat + preterm, direction = "both")

Specify the null model (lm.null) as the starting point and give the formula for the possible variables in scope; lm.step.both stores the final model object.


The trace again starts with the initial AIC for the null model and adds the variable that makes AIC smallest: first ui, then race.cat, then smoke, and so on. Unlike forward selection, each step also considers removing a variable that is already in the model.
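A minimal sketch of stepwise selection, assuming mtcars as a stand-in for lbw. In the recorded path, additions appear as "+ var" and any removals as "- var"; because step() only accepts a move that lowers AIC, the AIC column decreases at every row:

```r
fit.null <- lm(mpg ~ 1, data = mtcars)   # stand-in null model

## direction = "both": each step weighs adding any candidate in scope
## against dropping any term already in the model, then takes the best move
fit.both <- step(fit.null,
                 scope = ~ wt + hp + qsec + am + drat,
                 direction = "both", trace = 0)

fit.both$anova    # "+ var" rows are additions, "- var" rows are removals
```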


F-test using drop1()

## age is the least significant by partial F test
drop1(lm.full, test = "F")

## After eliminating age, ftv.cat is the least significant
drop1(update(lm.full, ~ . -age), test = "F")

## After eliminating ftv.cat, preterm is the least significant (p = 0.12)
drop1(update(lm.full, ~ . -age -ftv.cat), test = "F")

## After eliminating preterm, all remaining variables are significant at p < 0.1
drop1(update(lm.full, ~ . -age -ftv.cat -preterm), test = "F")

## Show summary for the final model
summary(update(lm.full, ~ . -age -ftv.cat -preterm))


Updating models

## Remove age from the full model: ~ . -age means all variables (.) minus age
lm.age.less <- update(lm.full, ~ . -age)

## Add ui to the null model: ~ . +ui means all variables (.) plus ui
lm.ui.only <- update(lm.null, ~ . +ui)
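A minimal sketch of update() on the built-in mtcars data (a stand-in for lbw; the variables are illustrative). The "." on the right-hand side expands to the model's current terms, and the updated fit matches fitting the new formula from scratch:

```r
fit.full <- lm(mpg ~ wt + hp + qsec, data = mtcars)   # stand-in full model
fit.null <- lm(mpg ~ 1, data = mtcars)                # stand-in null model

fit.less <- update(fit.full, ~ . - qsec)   # becomes mpg ~ wt + hp
fit.wt   <- update(fit.null, ~ . + wt)     # becomes mpg ~ wt

formula(fit.less)
formula(fit.wt)
```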


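Reading the drop1() output: drop1(lm.full, test = "F") tests the full model, and age is the least significant. Remove age and test again: ftv.cat is now the least significant, so the next model removes both age and ftv.cat. Each row is an F-test comparing the model with the variable in (e.g., age-in) to the model with it out (age-out).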
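To check that each drop1() row is the nested-model F-test described above, here is a minimal sketch assuming mtcars as a stand-in for lbw. For a term with 1 degree of freedom, the same F also equals the squared t statistic from summary():

```r
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)   # stand-in model

## Each row of drop1(test = "F") is a partial F-test of one term
d1 <- drop1(fit, test = "F")
d1

## The qsec row equals comparing the qsec-out model with the qsec-in model
a <- anova(update(fit, ~ . - qsec), fit)

## For a 1-df term, F = t^2
t.qsec <- coef(summary(fit))["qsec", "t value"]
```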


F-test using add1()

## ui is the most significant variable
add1(lm.null, scope = ~ age + lwt + race.cat + smoke + preterm + ht + ui + ftv.cat, test = "F")

## After including ui, race.cat is the most significant
add1(update(lm.null, ~ . +ui), scope = ~ age + lwt + race.cat + smoke + preterm + ht + ui + ftv.cat, test = "F")

## After including race.cat, smoke is the most significant
add1(update(lm.null, ~ . +ui +race.cat), scope = ~ age + lwt + race.cat + smoke + preterm + ht + ui + ftv.cat, test = "F")

## After including smoke, ht is the most significant
add1(update(lm.null, ~ . +ui +race.cat +smoke), scope = ~ age + lwt + race.cat + smoke + preterm + ht + ui + ftv.cat, test = "F")

...


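Reading the add1() output: add1(lm.null, ...) tests the null model, and ui is the most significant. Add ui and test again: race.cat is now the most significant, so the next model adds both ui and race.cat. Each row is an F-test comparing the model without the variable (e.g., ui-out) to the model with it (ui-in).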
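The manual add1() sequence can be automated in a loop. This is a minimal sketch, not part of the original slides: forward_by_F() is a hypothetical helper, mtcars stands in for lbw, and the entry threshold alpha = 0.10 is an assumption borrowed from the p < 0.1 cutoff used in the drop1() walk-through.

```r
## Hypothetical helper: forward selection by partial F-test using add1()
forward_by_F <- function(fit, scope, alpha = 0.10) {
  repeat {
    ## stop when every candidate in scope is already in the model
    if (length(add.scope(fit, update.formula(fit, scope))) == 0) break
    tab  <- add1(fit, scope = scope, test = "F")[-1, , drop = FALSE]  # drop "<none>" row
    best <- rownames(tab)[which.min(tab[["Pr(>F)"]])]                 # most significant
    if (tab[best, "Pr(>F)"] >= alpha) break    # nothing meets the entry threshold
    fit <- update(fit, as.formula(paste("~ . +", best)))
  }
  fit
}

fit.null <- lm(mpg ~ 1, data = mtcars)   # stand-in null model
fit.sel  <- forward_by_F(fit.null, scope = ~ wt + hp + qsec + am + drat)
formula(fit.sel)
```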


All-subsets regression using the leaps package


library(leaps)

regsubsets.out <- regsubsets(bwt ~ age + lwt + smoke + ht + ui + ftv.cat + race.cat + preterm, data = lbw, nbest = 1, nvmax = NULL, force.in = NULL, force.out = NULL, method = "exhaustive")

summary(regsubsets.out)


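In this call, regsubsets.out is the result object and the formula is the full model. nbest sets how many best models to keep for each model size, nvmax sets the maximum model size, and force.in and force.out specify forced variables.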


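The summary first lists any forced variables, then shows the best variable combination for each model size, from the best 1-predictor model through the best 7-predictor model up to the best 10-predictor model. regsubsets() works on the columns of the model matrix, so each dummy variable of a categorical variable counts as a separate predictor; this is why the largest model has 10 predictors although the formula names 8 variables.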
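What an exhaustive search with nbest = 1 returns can be illustrated by brute force in base R. This is a minimal sketch, not how leaps works internally: it assumes mtcars as a stand-in for lbw and an illustrative set of numeric candidates, fits every subset of each size, and keeps the one with the smallest residual sum of squares.

```r
## Brute-force best-subset search (stand-in for leaps::regsubsets)
resp  <- "mpg"
preds <- c("wt", "hp", "qsec", "am", "drat")   # stand-in candidates from mtcars

best.by.size <- lapply(seq_along(preds), function(k) {
  combos <- combn(preds, k, simplify = FALSE)            # all subsets of size k
  rss <- vapply(combos, function(v) {
    deviance(lm(reformulate(v, response = resp), data = mtcars))
  }, numeric(1))
  list(vars = combos[[which.min(rss)]], rss = min(rss))  # best one, like nbest = 1
})

for (b in best.by.size) cat(length(b$vars), ":", paste(b$vars, collapse = " + "), "\n")
```

The number of fits grows as 2^p, which is why a dedicated package is preferable beyond a handful of predictors.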


plot(regsubsets.out, scale = "adjr2", main = "Adjusted R^2")

In this plot, each row is a model; for adjusted R^2, the higher the better. Models labeled on the plot:

~ lwt + smoke + ht + ui + race.cat + preterm
~ smoke + ht + ui + race
~ ui
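Adjusted R^2 penalizes R^2 for the number of predictors p: adjusted R^2 = 1 - [RSS / (n - p - 1)] / [TSS / (n - 1)]. A minimal sketch, assuming mtcars as a stand-in for lbw, computes it by hand and compares it with summary():

```r
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)   # stand-in model
n   <- nrow(mtcars)
p   <- length(coef(fit)) - 1                      # predictors, excluding the intercept
rss <- deviance(fit)                              # residual sum of squares
tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)     # total sum of squares

## Adjusted R^2 = 1 - [RSS / (n - p - 1)] / [TSS / (n - 1)]
adj.r2 <- 1 - (rss / (n - p - 1)) / (tss / (n - 1))
c(by.hand = adj.r2, summary = summary(fit)$adj.r.squared)
```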


library(car)
subsets(regsubsets.out, statistic = "adjr2", legend = FALSE, min.size = 5, main = "Adjusted R^2")

~ lwt + smoke + ht + ui + race.cat + preterm


subsets(regsubsets.out, statistic = "cp", legend = FALSE, min.size = 5, main = "Mallows' Cp")

~ lwt + smoke + ht + ui + race.cat + preterm

This is the first model for which Mallows' Cp is less than the number of regressors + 1.
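Mallows' Cp compares a submodel's RSS with the error variance estimated from the full model: Cp = RSS_p / sigma^2 - n + 2k, where k is the number of regressors + 1 (intercept). A minimal sketch of this textbook formula, assuming mtcars as a stand-in for lbw; mallows_cp() is a hypothetical helper, and whether leaps/car use exactly this scaling is an assumption. For the full model itself, Cp always equals k.

```r
## Stand-in full model; its residual variance estimates sigma^2
full   <- lm(mpg ~ wt + hp + qsec + am + drat, data = mtcars)
n      <- nrow(mtcars)
sigma2 <- deviance(full) / df.residual(full)

## Hypothetical helper: Cp = RSS_p / sigma2 - n + 2 * k, k = regressors + 1
mallows_cp <- function(fit) {
  k <- length(coef(fit))
  deviance(fit) / sigma2 - n + 2 * k
}

sub <- lm(mpg ~ wt + qsec + am, data = mtcars)
mallows_cp(sub)
mallows_cp(sub) < length(coef(sub))   # the slide's rule: Cp below regressors + 1?
mallows_cp(full)                      # exactly k for the full model
```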
