linear regression with r 1

Linear Regression with R, 2012-12-07 @HSPH, Kazuki Yoshida, M.D. MPH-CLE student. 1: Prepare data/specify model/read results

Upload: kazuki-yoshida

Post on 20-Jun-2015


Page 1: Linear regression with R 1

Linear Regression with R

2012-12-07 @HSPH

Kazuki Yoshida, M.D. MPH-CLE student

FREEDOM TO KNOW

1: Prepare data/specify model/read results

Page 2: Linear regression with R 1

Group Website is at:

http://rpubs.com/kaz_yos/useR_at_HSPH

Page 3: Linear regression with R 1

Previously in this group

- Introduction

- Reading Data into R (1)

- Reading Data into R (2)

- Descriptive, continuous

- Descriptive, categorical

- Deducer

- Graphics

- Groupwise, continuous

Page 4: Linear regression with R 1

Menu

- Linear regression

Page 5: Linear regression with R 1

Ingredients

Statistics

- Data preparation

- Model formula

Programming

- within()

- factor(), relevel()

- lm()

- formula = Y ~ X1 + X2

- summary()

- anova(), car::Anova()

Page 6: Linear regression with R 1

Open RStudio

Page 7: Linear regression with R 1

Create a new script and save it.

Page 9: Linear regression with R 1

lowbwt.dat

We will use the lowbwt dataset used in BIO213

http://www.umass.edu/statdata/statdata/data/lowbwt.txt

http://www.umass.edu/statdata/statdata/data/lowbwt.dat

Page 10: Linear regression with R 1

Load dataset from web

lbw <- read.table("http://www.umass.edu/statdata/statdata/data/lowbwt.dat", header = TRUE, skip = 4)

header = TRUE to pick up variable names

skip = 4 to skip the first 4 rows
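To try out header and skip without depending on the network, a minimal sketch can parse an in-memory string instead of the URL. The four description lines and the values below are made up for illustration; they are not the contents of lowbwt.dat.

```r
# Made-up text standing in for a file with 4 description lines before the header
raw_text <- "Description line 1
Description line 2
Description line 3
Description line 4
ID LOW BWT
85 0 2523
86 0 2551"

# text = lets read.table() parse a string instead of a file or URL.
# skip = 4 drops the first 4 lines; header = TRUE uses the next line as names.
toy <- read.table(text = raw_text, header = TRUE, skip = 4)

names(toy)
nrow(toy)
```

Without skip = 4, read.table() would try to parse the description lines as data.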

Page 11: Linear regression with R 1

"Fix" dataset

lbw[c(10,39), "BWT"] <- c(2655, 3035)

Replace data points to make the dataset identical to the BIO213 dataset

c(10,39): 10th and 39th rows

"BWT": BWT column
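The same row-and-column replacement can be tried on a small data frame first. This is only a sketch; the toy data frame and its values are made up.

```r
# Made-up data frame; values are illustrative only
toy <- data.frame(ID = 1:5, BWT = c(2500, 2600, 2700, 2800, 2900))

# Rows 2 and 4 of the BWT column are overwritten, in that order
toy[c(2, 4), "BWT"] <- c(2655, 3035)

toy$BWT
```

The replacement values are matched to the selected rows by position, so the first value goes to the first row listed.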

Page 12: Linear regression with R 1

Lower case variable names

names(lbw) <- tolower(names(lbw))

Convert variable names to lower case

Put them back into variable names

Page 13: Linear regression with R 1

See overview

Page 14: Linear regression with R 1

library(gpairs)
gpairs(lbw)

Page 15: Linear regression with R 1
Page 16: Linear regression with R 1

Recoding

Changing and creating variables

Page 17: Linear regression with R 1

dataset <- within(dataset, {
    _variable manipulations_
})

dataset (left of <-): name of the newly created dataset (here replacing the original)

within(dataset, ...): take the dataset

{ ... }: perform variable manipulation. You can specify variables by name only. No need for dataset$var_name
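A minimal sketch of the within() pattern on a made-up data frame (the column names and values are hypothetical, not from lbw):

```r
# Made-up data; column names are hypothetical
toy <- data.frame(weight_kg = c(60, 72), height_m = c(1.60, 1.80))

toy <- within(toy, {
    # Columns are referred to by bare name; no toy$ prefix is needed
    bmi  <- weight_kg / height_m^2
    tall <- height_m > 1.7
})

names(toy)
```

Variables assigned inside the braces become new columns of the returned data frame, which is then saved back under the same name.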

Page 18: Linear regression with R 1

lbw <- within(lbw, {
    ## Relabel race
    race.cat <- factor(race, levels = 1:3, labels = c("White","Black","Other"))

    ## Categorize ftv (frequency of visit)
    ftv.cat <- cut(ftv, breaks = c(-Inf, 0, 2, Inf), labels = c("None","Normal","Many"))
    ftv.cat <- relevel(ftv.cat, ref = "Normal")

    ## Dichotomize ptl
    preterm <- factor(ptl >= 1, levels = c(FALSE,TRUE), labels = c("0","1+"))
})

Page 19: Linear regression with R 1

lbw <- within(lbw, {
    ## Relabel race
    race.cat <- factor(race, levels = 1:3, labels = c("White","Black","Other"))

    ## Categorize ftv (frequency of visit)
    ftv.cat <- cut(ftv, breaks = c(-Inf, 0, 2, Inf), labels = c("None","Normal","Many"))
    ftv.cat <- relevel(ftv.cat, ref = "Normal")

    ## Dichotomize ptl
    preterm <- factor(ptl >= 1, levels = c(FALSE,TRUE), labels = c("0","1+"))
})

Categorize race and label:

1 to White

2 to Black

3 to Other

Numeric to categorical: element by element

1st will be reference

Page 20: Linear regression with R 1

Explained in more depth

lbw <- within(lbw, {
    ## Relabel race
    race.cat <- factor(race, levels = 1:3, labels = c("White","Black","Other"))
})

factor() to create categorical variable

race: take the race variable

levels = 1:3: order levels 1, 2, 3. Make 1 the reference level

labels = c("White","Black","Other"): label levels 1, 2, 3 as White, Black, Other

race.cat <-: create a new variable named race.cat
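The factor() call can be checked on a short vector. This is a sketch with made-up race codes, not values taken from lbw.

```r
# Made-up numeric codes
race <- c(1, 3, 2, 1)

race.cat <- factor(race, levels = 1:3, labels = c("White", "Black", "Other"))

levels(race.cat)        # the first level is the one lm() treats as reference
as.character(race.cat)  # each code replaced by its label
```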

Page 21: Linear regression with R 1

How breaks work

lbw <- within(lbw, {
    ## Relabel race
    race.cat <- factor(race, levels = 1:3, labels = c("White","Black","Other"))

    ## Categorize ftv (frequency of visit)
    ftv.cat <- cut(ftv, breaks = c(-Inf, 0, 2, Inf), labels = c("None","Normal","Many"))
    ftv.cat <- relevel(ftv.cat, ref = "Normal")

    ## Dichotomize ptl
    preterm <- factor(ptl >= 1, levels = c(FALSE,TRUE), labels = c("0","1+"))
})

Numeric to categorical: range to element

(-Inf, 0] to None

(0, 2] to Normal

(2, Inf] to Many

Each range excludes its left end and includes its right end, so ftv = 0 is None, ftv = 1 or 2 is Normal, and ftv = 3 to 6 is Many.

1st will be reference
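A quick way to confirm how the breaks fall is to run cut() on the integers 0 to 6 and tabulate the result. The vector below is made up for this sketch.

```r
# Every visit count from 0 to 6, one of each (made-up input)
ftv <- 0:6

ftv.cat <- cut(ftv, breaks = c(-Inf, 0, 2, Inf),
               labels = c("None", "Normal", "Many"))

# Cross-tabulate the original value against the assigned category
table(ftv, ftv.cat)
```

By default cut() uses right-closed intervals, which is why 0 lands in None and 2 lands in Normal.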

Page 22: Linear regression with R 1

Reset reference level

lbw <- within(lbw, {
    ## Relabel race
    race.cat <- factor(race, levels = 1:3, labels = c("White","Black","Other"))

    ## Categorize ftv (frequency of visit)
    ftv.cat <- cut(ftv, breaks = c(-Inf, 0, 2, Inf), labels = c("None","Normal","Many"))
    ftv.cat <- relevel(ftv.cat, ref = "Normal")

    ## Dichotomize ptl
    preterm <- factor(ptl >= 1, levels = c(FALSE,TRUE), labels = c("0","1+"))
})

Change reference level of ftv.cat variable from None to Normal
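What relevel() changes can be seen by comparing the level order before and after. The factor below is made up for this sketch.

```r
# Made-up factor with None as the first (reference) level
ftv.cat <- factor(c("None", "Normal", "Many", "Normal"),
                  levels = c("None", "Normal", "Many"))
levels(ftv.cat)

# relevel() moves the chosen level to the front; the data values are unchanged
ftv.cat2 <- relevel(ftv.cat, ref = "Normal")
levels(ftv.cat2)
```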

Page 23: Linear regression with R 1

Numeric to Boolean to Category

lbw <- within(lbw, {
    ## Relabel race
    race.cat <- factor(race, levels = 1:3, labels = c("White","Black","Other"))

    ## Categorize ftv (frequency of visit)
    ftv.cat <- cut(ftv, breaks = c(-Inf, 0, 2, Inf), labels = c("None","Normal","Many"))
    ftv.cat <- relevel(ftv.cat, ref = "Normal")

    ## Dichotomize ptl
    preterm <- factor(ptl >= 1, levels = c(FALSE,TRUE), labels = c("0","1+"))
})

ptl >= 1: TRUE, FALSE vector created here

ptl < 1 to FALSE, then to "0"

ptl >= 1 to TRUE, then to "1+"

levels: FALSE, TRUE

labels: "0", "1+"
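The two steps (numeric to logical, then logical to factor) can be run separately to see the intermediate vector. The ptl values below are made up for this sketch.

```r
# Made-up counts of previous preterm labors
ptl <- c(0, 1, 0, 3)

# Step 1: comparison gives a logical vector
is_preterm <- ptl >= 1
is_preterm

# Step 2: FALSE and TRUE are relabeled as "0" and "1+"
preterm <- factor(is_preterm, levels = c(FALSE, TRUE), labels = c("0", "1+"))
preterm
```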

Page 24: Linear regression with R 1

Binary 0,1 to No,Yes

One-by-one method

lbw <- within(lbw, {
    ## Categorize smoke ht ui
    smoke <- factor(smoke, levels = 0:1, labels = c("No","Yes"))
    ht    <- factor(ht,    levels = 0:1, labels = c("No","Yes"))
    ui    <- factor(ui,    levels = 0:1, labels = c("No","Yes"))
})

Loop method

## Alternative to above (use one method, not both: applying factor() again
## to the already-labeled variables would turn every value into NA)
lbw[,c("smoke","ht","ui")] <-
    lapply(lbw[,c("smoke","ht","ui")],
           function(var) {
               factor(var, levels = 0:1, labels = c("No","Yes"))
           })
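The loop method can be tried on a made-up data frame to confirm that only the listed columns are converted. The values here are hypothetical.

```r
# Made-up data: three 0/1 columns and one numeric column
toy <- data.frame(smoke = c(0, 1, 1), ht = c(0, 0, 1),
                  ui = c(1, 0, 0), age = c(19, 33, 20))

vars <- c("smoke", "ht", "ui")

# lapply() applies the same factor() call to each selected column
toy[vars] <- lapply(toy[vars], function(var) {
    factor(var, levels = 0:1, labels = c("No", "Yes"))
})

sapply(toy, class)
```

Keeping the column names in one vector avoids typing them twice.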

Page 25: Linear regression with R 1

model formula

Page 26: Linear regression with R 1

outcome ~ predictor1 + predictor2 + predictor3

formula

SAS equivalent: model outcome = predictor1 predictor2 predictor3;

Page 27: Linear regression with R 1

In the case of t-test

age ~ zyg

age: continuous variable to be compared (variable to be explained)

zyg: grouping variable to separate groups (variable used to explain)

Page 28: Linear regression with R 1

Y ~ X1 + X2

linear sum

Page 29: Linear regression with R 1

- .      All variables except for the outcome

- + X2   Add X2 term

- - 1    Remove intercept

- X1:X2  Interaction term between X1 and X2

- X1*X2  Main effects and interaction term
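One way to see how R expands these operators is to ask terms() for the term labels of a formula. This is a sketch; Y, X1, X2, A, and B are placeholder names, and the toy data frame is made up.

```r
# X1*X2 expands to both main effects plus the interaction
attr(terms(Y ~ X1 * X2), "term.labels")

# Writing the interaction out by hand gives the same terms
attr(terms(Y ~ X1 + X2 + X1:X2), "term.labels")

# - 1 removes the intercept (the intercept flag becomes 0)
attr(terms(Y ~ X1 - 1), "intercept")

# . expands to every column of the data except the outcome
toy <- data.frame(Y = 1:3, A = 4:6, B = 7:9)
attr(terms(Y ~ ., data = toy), "term.labels")
```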

Page 30: Linear regression with R 1

Interaction term

Y ~ X1 + X2 + X1:X2

X1 + X2: main effects

X1:X2: interaction

Page 31: Linear regression with R 1

Y ~ X1 * X2

Interaction term

Main effects & interaction

Page 32: Linear regression with R 1

On-the-fly variable manipulation

Y ~ X1 + I(X2 * X3)

I(X2 * X3): new variable (X2 times X3) created on-the-fly and used

I() inhibits formula interpretation, so the * inside it is ordinary multiplication. Use it for math manipulation
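The difference between * inside and outside I() shows up in the coefficient names. This sketch uses mtcars, a dataset built into R, as a stand-in because lbw needs a download; the choice of variables is arbitrary.

```r
# Inside I(), hp * qsec is arithmetic: one new predictor
fit_i <- lm(mpg ~ wt + I(hp * qsec), data = mtcars)
names(coef(fit_i))

# Outside I(), hp * qsec is formula syntax: main effects plus interaction
fit_star <- lm(mpg ~ wt + hp * qsec, data = mtcars)
names(coef(fit_star))
```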

Page 33: Linear regression with R 1

Fit a model

lm.full <- lm(bwt ~ age + lwt + smoke + ht + ui + ftv.cat + race.cat + preterm,
              data = lbw)

Page 34: Linear regression with R 1

lm.full

See model object

Page 35: Linear regression with R 1

Call: command repeated

Coefficient for each variable

Page 36: Linear regression with R 1

summary(lm.full)

See summary

Page 37: Linear regression with R 1

Call: command repeated

Model F-test

Residual distribution

Dummy variables created

R^2 and adjusted R^2

Coef/SE = t
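The pieces of the summary can also be pulled out and checked by hand. This sketch uses mtcars (built into R) as a stand-in for lbw, with factor(cyl) playing the role of a multi-category variable.

```r
fit <- lm(mpg ~ wt + hp + factor(cyl), data = mtcars)

# Coefficient table as a matrix: Estimate, Std. Error, t value, Pr(>|t|)
tab <- coef(summary(fit))
rownames(tab)   # dummy variables appear as factor(cyl)6 and factor(cyl)8

# t value is the estimate divided by its standard error
t_manual <- tab[, "Estimate"] / tab[, "Std. Error"]
all.equal(t_manual, tab[, "t value"])

# R^2 and adjusted R^2
summary(fit)$r.squared
summary(fit)$adj.r.squared
```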

Page 38: Linear regression with R 1

ftv.catNone: those with no 1st trimester visits compared to those with a Normal number of 1st trimester visits (reference level)

ftv.catMany: those with Many 1st trimester visits compared to those with a Normal number of 1st trimester visits (reference level)

Page 39: Linear regression with R 1

race.catBlack: Black people compared to White people (reference level)

race.catOther: people of Other races compared to White people (reference level)

Page 40: Linear regression with R 1

confint(lm.full)

Confidence intervals

Page 41: Linear regression with R 1

Lower boundary

Upper boundary

Confidence intervals
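The lower and upper boundaries can be reproduced from the estimates and standard errors. This is a sketch on mtcars (built into R) as a stand-in for lbw, assuming the default 95% level.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

est   <- coef(fit)
se    <- sqrt(diag(vcov(fit)))
# Critical value from the t distribution with the residual degrees of freedom
tcrit <- qt(0.975, df = df.residual(fit))

manual <- cbind(lower = est - tcrit * se, upper = est + tcrit * se)
ci     <- confint(fit)

all.equal(unname(manual), unname(ci))
```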

Page 42: Linear regression with R 1

anova(lm.full)

ANOVA table (type I)

Page 43: Linear regression with R 1

Degrees of freedom

Sequential SS

Mean SS = SS/DF

F = Mean SS / Mean SS of residual

ANOVA table (type I)

Page 44: Linear regression with R 1

Type I = Sequential SS

1 age: the 1st gets all of its SS in type I

2 lwt: the 2nd gets all but its overlap with the 1st in type I

3 smoke: the last gets only what remains in type I
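Because type I SS are sequential, changing the order of terms changes the table. A sketch on mtcars (built into R), used here as a stand-in for lbw:

```r
# Same two predictors, entered in opposite orders
fit_a <- lm(mpg ~ wt + hp, data = mtcars)
fit_b <- lm(mpg ~ hp + wt, data = mtcars)

tab_a <- anova(fit_a)
tab_b <- anova(fit_b)

# SS credited to wt when it enters first vs. when it enters after hp
tab_a["wt", "Sum Sq"]
tab_b["wt", "Sum Sq"]

# The total (terms plus residual) is the same either way
sum(tab_a[["Sum Sq"]])
sum(tab_b[["Sum Sq"]])
```

In this data wt is credited with more SS when it enters first, because it also gets the part it shares with hp.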

Page 45: Linear regression with R 1

library(car)
Anova(lm.full, type = 3)

ANOVA table (type III)

Page 46: Linear regression with R 1

Degrees of freedom

Marginal SS

F = Mean SS / Mean SS of residual

ANOVA table (type III)

Multi-category variables tested as one

Page 47: Linear regression with R 1

Type III = Marginal SS

1 age: the 1st gets only its margin in type III

2 lwt: the 2nd gets only its margin in type III

3 smoke: the last gets only its margin in type III
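The marginal idea can be sketched without car by using base R's drop1(), which removes each term in turn from the full model. This is a stand-in, not car::Anova() itself: for a model with no interactions the per-term SS should correspond, but drop1() does not report an intercept row. The example uses mtcars (built into R) in place of lbw.

```r
fit <- lm(mpg ~ wt + hp + factor(cyl), data = mtcars)

# Marginal SS: each term is dropped from the full model, one at a time
marginal <- drop1(fit, test = "F")
marginal

# Sequential (type I) table for comparison
sequential <- anova(fit)

# The last term in a sequential table is already adjusted for everything else,
# so its SS matches its marginal SS
marginal["factor(cyl)", "Sum of Sq"]
sequential["factor(cyl)", "Sum Sq"]

# The 3-level factor is tested as one term with 2 degrees of freedom
marginal["factor(cyl)", "Df"]
```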

Page 48: Linear regression with R 1

Comparison

Type I vs. Type III

Page 49: Linear regression with R 1

Effect plot

library(effects)
plot(allEffects(lm.full), ylim = c(2000,4000))

ylim: fix Y-axis values for all plots

Page 50: Linear regression with R 1

Effect of a variable with the other covariates set at their averages
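The idea behind such a plot can be approximated in base R with predict(): vary one predictor over a grid and hold the other at its mean. This is a rough sketch on mtcars (built into R) with continuous covariates only; it is not the effects package's own implementation.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Vary wt over its observed range; hold hp at its sample mean
grid <- data.frame(
    wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 5),
    hp = mean(mtcars$hp)
)

# Fitted values with confidence limits at each grid point
pred <- predict(fit, newdata = grid, interval = "confidence")
cbind(grid, pred)
```

Since hp is held fixed, the fitted values change along the grid only through the wt coefficient.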

Page 51: Linear regression with R 1

Interaction

Page 52: Linear regression with R 1

lm.full.int <- lm(bwt ~ age*lwt + smoke + ht + ui + age*ftv.cat + race.cat*preterm,
                  data = lbw)

age*lwt: Continuous * Continuous

age*ftv.cat: Continuous * Categorical

race.cat*preterm: Categorical * Categorical

This model is for demonstration purposes only.

Page 53: Linear regression with R 1

Anova(lm.full.int, type = 3)

Page 54: Linear regression with R 1

Degrees of freedom

Marginal SS

F = Mean SS / Mean SS of residual

Interaction terms

Page 55: Linear regression with R 1

Continuous * Continuous

plot(effect("age:lwt", lm.full.int))

lwt level

Page 56: Linear regression with R 1

Continuous * Categorical

plot(effect("age:ftv.cat", lm.full.int), multiline = TRUE)

Page 57: Linear regression with R 1

Categorical * Categorical

plot(effect(c("race.cat*preterm"), lm.full.int),
     x.var = "preterm", z.var = "race.cat", multiline = TRUE)

Page 58: Linear regression with R 1