Experimental design and analysis

Multiple linear regression

Gerry Quinn & Mick Keough, 1998
Do not copy or distribute without permission of authors.

Multiple regression

• One response (dependent) variable:
  – Y
• More than one predictor (independent) variable:
  – X1, X2, X3 etc.
  – number of predictors = p
• Number of observations = n

Example

• A sample of 51 mammal species (n = 51)
• Response variable:
  – total sleep time in hrs/day (y)
• Predictors:
  – body weight in kg (x1)
  – brain weight in g (x2)
  – maximum life span in years (x3)
  – gestation time in days (x4)

Regression models

Population model (equation):
• yi = β0 + β1x1 + β2x2 + ... + εi

Sample equation:
• ŷi = b0 + b1x1 + b2x2 + ...
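
A minimal sketch of how the sample coefficients b0, b1, b2 can be obtained by ordinary least squares. The data are simulated, and the sample size, population values and variable names are assumptions made for illustration; they are not the mammal sleep data.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 51, 2                         # hypothetical sample size and number of predictors
X = rng.normal(size=(n, p))          # simulated predictors x1, x2
beta = np.array([2.0, 1.5, -0.5])    # assumed population values of β0, β1, β2

# Population model: y = β0 + β1*x1 + β2*x2 + error
y = beta[0] + X @ beta[1:] + rng.normal(scale=1.0, size=n)

# Design matrix: a column of ones for the intercept, then the predictors.
design = np.column_stack([np.ones(n), X])

# Ordinary least squares estimates b0, b1, b2.
b, *_ = np.linalg.lstsq(design, y, rcond=None)

y_hat = design @ b                   # predicted y from the sample equation
residuals = y - y_hat                # observed y - predicted y
print("b =", b)
```

The estimates in `b` should lie near the assumed population values, and the residuals sum to zero because the model includes an intercept.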

Example

• Regression model:

sleep = intercept + β1*bodywt + β2*brainwt + β3*lifespan + β4*gestime

Multiple regression equation

[Figure: total sleep plotted against log lifespan and log body weight]

Partial regression coefficients

• H0: β1 = 0
• Partial population regression coefficient (slope) for y on x1, holding all other x's constant, equals zero
• Example:
  – slope of regression of sleep against body weight, holding brain weight, max. life span and gestation time constant, is 0.

Partial regression coefficients

• H0: β2 = 0
• Partial population regression coefficient (slope) for y on x2, holding all other x's constant, equals zero
• Example:
  – slope of regression of sleep against brain weight, holding body weight, max. life span and gestation time constant, is 0.
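
One way to see what "holding all other x's constant" means: the partial slope for x1 equals the simple slope obtained after the linear effect of the other predictors has been removed from both y and x1 (the Frisch-Waugh-Lovell result). The sketch below checks this on simulated data; all values and names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 60                                        # hypothetical sample size
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)            # x1 is correlated with x2
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

def ols(design, response):
    """Least squares coefficients for response on the columns of design."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef

ones = np.ones(n)
b_full = ols(np.column_stack([ones, x1, x2]), y)   # b0, b1, b2 from the full model

# Remove the linear effect of x2 (and the intercept) from both y and x1 ...
others = np.column_stack([ones, x2])
y_adj = y - others @ ols(others, y)
x1_adj = x1 - others @ ols(others, x1)

# ... then the simple slope of adjusted y on adjusted x1 is the partial slope b1.
b1_partial = (x1_adj @ y_adj) / (x1_adj @ x1_adj)
print(b_full[1], b1_partial)
```

The two printed values agree to floating-point precision, which is why b1 is read as the effect of x1 with x2 held constant.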

Testing H0: βi = 0

• Use partial t-tests:
• t = bi / SEbi
• Compare with t-distribution with n-(p+1) df
• Separate t-test for each partial regression coefficient in model
• Usual logic of t-tests:
  – reject H0 if P < 0.05
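
A sketch of the partial t-tests on simulated data, assuming the usual least squares standard errors (square roots of the diagonal of MSResidual × (X'X)⁻¹). The data and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 3                                   # hypothetical n and number of predictors
X = rng.normal(size=(n, p))
y = 1.0 + 0.8 * X[:, 0] + rng.normal(size=n)   # only x1 has a non-zero population slope

design = np.column_stack([np.ones(n), X])
b, *_ = np.linalg.lstsq(design, y, rcond=None)

residuals = y - design @ b
df_resid = n - (p + 1)                         # residual df for the t-tests
ms_resid = residuals @ residuals / df_resid

# Standard error of each coefficient from the estimated covariance matrix.
cov_b = ms_resid * np.linalg.inv(design.T @ design)
se_b = np.sqrt(np.diag(cov_b))

t = b / se_b                                   # one partial t statistic per coefficient
print("t =", t, "df =", df_resid)
```

If SciPy is available, a two-tailed P for each statistic could be obtained with `2 * scipy.stats.t.sf(abs(t), df_resid)`.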

Model comparison

• To test H0: β1 = 0
• Fit full model:
  – y = β0 + β1x1 + β2x2 + β3x3 + …
• Fit reduced model:
  – y = β0 + β2x2 + β3x3 + …
• Calculate SSextra:
  – SSRegression(full) - SSRegression(reduced)
• F = MSextra / MSResidual(full)
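
The full-versus-reduced comparison can be sketched as follows on simulated data (all values and names are assumptions). Because only one term is dropped, MSextra has 1 df and the resulting F equals the square of the partial t statistic for b1.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 40                                          # hypothetical sample size
x1, x2, x3 = rng.normal(size=(3, n))
y = 2.0 + 0.5 * x1 + 1.0 * x2 + rng.normal(size=n)

def fit(design, response):
    """Return coefficients and residual sum of squares of a least squares fit."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    resid = response - design @ coef
    return coef, resid @ resid

ones = np.ones(n)
full = np.column_stack([ones, x1, x2, x3])
reduced = np.column_stack([ones, x2, x3])       # x1 dropped

b_full, ss_resid_full = fit(full, y)
_, ss_resid_reduced = fit(reduced, y)

# SSextra = SSRegression(full) - SSRegression(reduced)
#         = SSResidual(reduced) - SSResidual(full), since SSTotal is the same for both.
ss_extra = ss_resid_reduced - ss_resid_full
df_extra = 1
df_resid = n - full.shape[1]                    # n - (p + 1) for the full model
ms_resid = ss_resid_full / df_resid
F = (ss_extra / df_extra) / ms_resid

# Partial t statistic for b1 from the full model, for comparison.
se_b1 = np.sqrt(ms_resid * np.linalg.inv(full.T @ full)[1, 1])
t_b1 = b_full[1] / se_b1
print(F, t_b1 ** 2)
```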

Overall regression model

• H0: β1 = β2 = ... = 0 (all population slopes equal zero).
• Test of whether overall regression equation is significant.
• Use ANOVA F-test:
  – Variation explained by regression
  – Unexplained (residual) variation
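
A sketch of the ANOVA partition behind the overall F-test, using simulated data with hypothetical values. It assumes the usual degrees of freedom: p for the regression and n-(p+1) for the residual.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 25, 2                                   # hypothetical n and number of predictors
X = rng.normal(size=(n, p))
y = 3.0 + X @ np.array([1.0, -0.5]) + rng.normal(size=n)

design = np.column_stack([np.ones(n), X])
b, *_ = np.linalg.lstsq(design, y, rcond=None)
y_hat = design @ b

ss_total = np.sum((y - y.mean()) ** 2)
ss_regression = np.sum((y_hat - y.mean()) ** 2)   # variation explained by regression
ss_residual = np.sum((y - y_hat) ** 2)            # unexplained (residual) variation

ms_regression = ss_regression / p
ms_residual = ss_residual / (n - p - 1)
F = ms_regression / ms_residual                   # compare with F on p and n-(p+1) df
print(ss_total, ss_regression + ss_residual, F)
```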

Regression diagnostics

• Residual is still observed y - predicted y
  – Studentised residuals still work
• Other diagnostics still apply:
  – residual plots
  – Cook's D statistics
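
These diagnostics can be computed from the hat matrix, as in the sketch below on simulated data. It assumes internally studentised residuals and the standard leverage-based form of Cook's D; statistical packages may report a slightly different (externally studentised) residual.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 20, 2                                   # hypothetical n and number of predictors
X = rng.normal(size=(n, p))
y = 1.0 + X @ np.array([1.0, 0.5]) + rng.normal(size=n)

design = np.column_stack([np.ones(n), X])
hat = design @ np.linalg.inv(design.T @ design) @ design.T   # hat matrix
leverage = np.diag(hat)

residuals = y - hat @ y                        # observed y - predicted y
k = p + 1                                      # number of estimated coefficients
ms_resid = residuals @ residuals / (n - k)

# Internally studentised residuals: each residual divided by its own standard error.
studentised = residuals / np.sqrt(ms_resid * (1 - leverage))

# Cook's D combines residual size and leverage for each observation.
cooks_d = studentised ** 2 * leverage / (k * (1 - leverage))
print(np.argmax(cooks_d), cooks_d.max())
```

Plotting `residuals` against the predicted values gives the usual residual plot; the observation with the largest `cooks_d` is the most influential on the fitted coefficients.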

Assumptions

• Normality and homogeneity of variance for response variable

• Independence of observations

• Linearity

• No collinearity

Collinearity

• Collinearity:
  – predictors correlated
• Assumption of no collinearity:
  – predictor variables are uncorrelated with (i.e. independent of) each other
• Collinearity makes estimates of βi's and their significance tests unreliable:
  – low power for individual tests on βi's

Collinearity

Response (y) and 2 predictors (x1 and x2); n = 20

1. x1 and x2 uncorrelated (r = -0.24)

            coeff    se      tol     t       P
intercept   -0.17    1.03            -0.16   0.873
x1           1.13    0.14    0.95     7.86   <0.001
x2           0.12    0.14    0.95     0.86   0.404

R2 = 0.787, F = 31.38, P < 0.001

Collinearity

2. rearrange x2 so x1 and x2 highly correlated (r = 0.99)

            coeff    se      tol     t       P
intercept    0.49    0.72             0.69   0.503
x1           1.55    1.21    0.01     1.28   0.219
x2          -0.45    1.21    0.01    -0.37   0.714

R2 = 0.780, F = 30.05, P < 0.001
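
The inflation of the standard errors can be reproduced in a small simulation. This is only a sketch under assumed values: the data are simulated rather than those in the tables, and the collinear x2 is built as a noisy copy of x1 instead of by rearranging an existing x2.

```python
import numpy as np

def slope_standard_errors(x1, x2, y):
    """Standard errors of b1 and b2 from a least squares fit of y on x1 and x2."""
    n = len(y)
    design = np.column_stack([np.ones(n), x1, x2])
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ b
    ms_resid = resid @ resid / (n - 3)
    se = np.sqrt(np.diag(ms_resid * np.linalg.inv(design.T @ design)))
    return se[1:]

rng = np.random.default_rng(6)
n = 20
x1 = rng.normal(size=n)
noise = rng.normal(size=n)
y = x1 + rng.normal(size=n)                 # y depends on x1 only (assumed)

x2_independent = noise                      # unrelated to x1
x2_collinear = x1 + 0.1 * noise             # almost a copy of x1

se_independent = slope_standard_errors(x1, x2_independent, y)
se_collinear = slope_standard_errors(x1, x2_collinear, y)
print("SE(b1), SE(b2) uncorrelated:", se_independent)
print("SE(b1), SE(b2) collinear:   ", se_collinear)
```

As in the two tables, the overall fit is barely affected, but the standard errors of the individual slopes become much larger when the predictors are highly correlated.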

Checks for collinearity

• Correlation matrix between predictors
• Tolerance for each predictor:
  – 1-R2 for regression of that predictor on all others
  – if tolerance is low (<0.1) then collinearity is a problem
• Variance inflation factor (VIF) for each predictor:
  – 1/tolerance
  – if VIF>10 then collinearity is a problem
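
The three checks can be sketched as below. The helper `tolerances` and the simulated predictors are hypothetical, written only to illustrate the definitions: each predictor is regressed on all the others, tolerance is 1-R2 from that regression, and VIF is its reciprocal.

```python
import numpy as np

def tolerances(X):
    """Tolerance (1 - R2) of each column of X regressed on all other columns."""
    n, p = X.shape
    tol = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        ss_total = np.sum((target - target.mean()) ** 2)
        tol[j] = (resid @ resid) / ss_total     # 1 - R2 = SSResidual / SSTotal
    return tol

rng = np.random.default_rng(7)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)              # nearly collinear with x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

corr = np.corrcoef(X, rowvar=False)             # correlation matrix between predictors
tol = tolerances(X)
vif = 1.0 / tol
flagged = tol < 0.1                             # same rule as VIF > 10
print(corr.round(2), tol.round(3), vif.round(1), flagged, sep="\n")
```

Here x1 and x2 are flagged and x3 is not. With only two predictors, tolerance reduces to 1 - r², where r is the correlation between them.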

Explained variance

R2 = SS Regression / SS Total

• proportion of variation in y explained by linear relationship with x1, x2 etc.
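
Because R2 is built from the same sums of squares as the ANOVA, the overall F statistic can be recovered from R2, n and p. The sketch below applies this identity to the R2 values reported for the two-predictor collinearity example (n = 20); the function name is hypothetical.

```python
def f_from_r2(r2, n, p):
    """Overall F statistic implied by R2 for n observations and p predictors."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))

# R2 values reported for the collinearity example with two predictors, n = 20.
f_uncorrelated = f_from_r2(0.787, n=20, p=2)
f_correlated = f_from_r2(0.780, n=20, p=2)
print(round(f_uncorrelated, 2), round(f_correlated, 2))
```

The results are close to the reported F values of 31.38 and 30.05; the small differences are consistent with R2 having been rounded to three decimal places.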

Example

Species            Sleep   Bodywt     Brainwt   Lifespan   Gestime
African elephant    3.3    6654.000   5712.0    38.6       645
Arctic fox         12.5       3.385     44.5    14.0        60
etc.

Boxplots of variables

Predictors log transformed

Parameter   Estimate   SE     Tol    t       P
Intercept   18.94      3.11          6.09    <0.001
Bodywt      -0.76      1.31   0.08   -0.58   0.565
Brainwt     -0.84      2.03   0.05   -0.42   0.680
Lifespan     2.60      2.05   0.33    1.27   0.211
Gestime     -5.11      1.81   0.36   -2.82   0.007

R2 = 0.486

Collinearity problem for body weight and brain weight:
• low tolerance
• highly correlated

Omit brain weight because body weight and brain weight are so highly correlated.

Parameter   Estimate   SE     Tol    t       P
Intercept   19.06      3.07          6.21    <0.001
Bodywt      -1.25      0.59   0.36   -2.09   0.042
Lifespan     2.19      1.78   0.43    1.23   0.225
Gestime     -5.39      1.67   0.42   -3.23   0.002

R2 = 0.484

No collinearity between any predictors:
• all tolerances OK
• reduced SE and larger (more negative) slope for body weight

Examples from literature

Lampert (1993)

• Ecology 74:1455-1466

• Response variable:
  – Daphnia (water flea) clutch size
• Predictors:
  – body size (mm)
  – particulate organic carbon (mg/L)
  – temperature (°C)

Lampert (1993)

Parameter   Coeff.   SE      t       P
Intercept   -42.34   27.52   -1.54   0.168
Body size    14.76    7.10    2.08   0.076
POC           0.27    0.43    0.61   0.559
Temp          0.73    0.68    1.07   0.321

ANOVA P = 0.052, R2 = 0.684, n = 11
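
As a quick arithmetic check, each t in this table is the coefficient divided by its SE, and with n = 11 and p = 3 predictors the tests have 11-(3+1) = 7 df. The sketch below recomputes the ratios from the rounded table values, so small rounding differences are expected.

```python
# (coefficient, SE, reported t) copied from the Lampert (1993) table
rows = {
    "Intercept": (-42.34, 27.52, -1.54),
    "Body size": (14.76, 7.10, 2.08),
    "POC": (0.27, 0.43, 0.61),
    "Temp": (0.73, 0.68, 1.07),
}

n, p = 11, 3
df_resid = n - (p + 1)          # residual df for each partial t-test

for name, (coef, se, t_reported) in rows.items():
    t = coef / se
    print(f"{name:10s} t = {t:6.2f} (reported {t_reported:5.2f}), df = {df_resid}")
```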

Williams et al. (1993)

• Ecology 74:904-918

• Response variable:
  – Zostera (seagrass) growth
• Predictors:
  – epiphyte biomass
  – porewater ammonium

Williams et al. (1993)

Parameter            Coeff.   P
Epiphyte biomass     0.340    >0.05
Porewater ammonium   0.919    <0.05

R2 = 0.71
Tolerance = 0.839 (so no collinearity)