Regression analysis Linear regression Logistic regression

Upload: darrell-tyler
Posted on 01-Apr-2015

TRANSCRIPT

Page 1: Regression analysis Linear regression Logistic regression

Regression analysis

Linear regression Logistic regression

Page 2

Relationship and association

Page 3

Straight line

[Figure: straight line fitted to data, Hip (cm) on the x-axis; the annotation marks a 1 cm step and the corresponding change of −0.0008 in BMI]

Y = b0 + b1·X

b1 = (Y2 − Y1) / (X2 − X1)   (the slope)

b0 = intersection with the y-axis (the intercept)

BMI = b0 + b1·HIP

Page 4

Best straight line?

Page 5

Best straight line!

[Figure: scatter plot with a fitted line; the residual e1 is the vertical distance from the data point (X1, Y1) to the line]

e1 = Y1 − Ŷ1

Least square estimation: choose the line that minimises the sum of squared residuals

SSE = Σ_{i=1}^{N} e_i² = Σ_{i=1}^{N} (Y_i − Ŷ_i)²
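The "best" line is the one minimising this sum. A small sketch, with made-up data points, shows that the least-squares line beats any shifted alternative:

```python
import numpy as np

# Made-up (X, Y) data points, used only to illustrate the calculation.
X = np.array([95.0, 96.0, 97.0, 98.0, 99.0, 100.0])
Y = np.array([20.1, 21.0, 21.3, 22.2, 22.8, 23.5])

def sse(b0, b1):
    """Sum of squared residuals e_i = Y_i - Yhat_i for the line Yhat = b0 + b1*X."""
    residuals = Y - (b0 + b1 * X)
    return float(np.sum(residuals ** 2))

# np.polyfit with degree 1 returns the least-squares slope and intercept.
b1_hat, b0_hat = np.polyfit(X, Y, 1)
best = sse(b0_hat, b1_hat)
```

Evaluating `sse` for any other intercept or slope gives a larger value, which is exactly what "least squares" means.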

Page 6

Simple linear regression

1. Is the association linear?

[Figure: scatter plot used to judge whether the association is linear]

Page 7

Simple linear regression

1. Is the association linear?
2. Describe the association: what are b0 and b1?

BMI = −12.6 kg/m² + 0.35 kg/m³ · Hip

b1 = Σ_{i=1}^{n} (X_i − X̄)(Y_i − Ȳ) / Σ_{i=1}^{n} (X_i − X̄)²

b0 = Ȳ − b1·X̄
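The two estimator formulas translate directly into code; the data values below are made up for illustration:

```python
import numpy as np

# Made-up data; the point is the closed-form estimators, not the numbers.
X = np.array([95.0, 96.0, 97.0, 98.0, 99.0, 100.0])
Y = np.array([20.1, 21.0, 21.3, 22.2, 22.8, 23.5])

# b1 = sum((X_i - Xbar)(Y_i - Ybar)) / sum((X_i - Xbar)^2)
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)

# b0 = Ybar - b1 * Xbar
b0 = Y.mean() - b1 * X.mean()
```

The result agrees with the least-squares fit returned by `np.polyfit`, since these formulas are the closed-form least-squares solution.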

Page 8

Simple linear regression

1. Is the association linear?
2. Describe the association.
3. Is the slope significantly different from 0? SPSS to the rescue!

Coefficients(a)

Model          B        Std. Error   Beta    t        Sig.
1 (Constant)   -12.581  2.331                -5.396   .000
  Hip          .345     .023         .565    15.266   .000

a. Dependent Variable: BMI

Page 9

Simple linear regression

1. Is the association linear?
2. Describe the association.
3. Is the slope significantly different from 0?
4. How good is the fit?

How far are the data points from the line on average?

r = Σ (X_i − X̄)(Y_i − Ȳ) / √( Σ (X_i − X̄)² · Σ (Y_i − Ȳ)² )

−1 ≤ r ≤ 1
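The same formula in code, again with made-up data; the result agrees with NumPy's built-in `corrcoef`:

```python
import numpy as np

# Made-up paired measurements, for illustration only.
X = np.array([95.0, 96.0, 97.0, 98.0, 99.0, 100.0])
Y = np.array([20.1, 21.0, 21.3, 22.2, 22.8, 23.5])

# r = sum((X_i - Xbar)(Y_i - Ybar)) / sqrt(sum((X_i - Xbar)^2) * sum((Y_i - Ybar)^2))
numerator = np.sum((X - X.mean()) * (Y - Y.mean()))
denominator = np.sqrt(np.sum((X - X.mean()) ** 2) * np.sum((Y - Y.mean()) ** 2))
r = numerator / denominator
```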

Page 10

The Correlation Coefficient, r

[Figure: four scatter plots illustrating r = 0, r = 1, r = 0.7 and r = −0.5]

Page 11

r² – Goodness of fit: how much of the variation can be explained by the model?

[Figure: four scatter plots illustrating R² = 0, R² = 1, R² = 0.5 and R² = 0.2]

Page 12

Multiple linear regression

Could waist measure describe some of the variation in BMI?

BMI = 1.3 kg/m² + 0.42 kg/m³ · Waist

Or even better:

BMI = b0 + b1·HIP + b2·WST

BMI = −12.2 + 0.25·HIP + 0.17·WST
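A sketch of fitting such a two-predictor model with ordinary least squares on simulated data (the coefficients used in the simulation echo the slide's values, but the data themselves are synthetic, so the fitted values only approximate them):

```python
import numpy as np

# Simulate HIP and WST measurements and a BMI that follows the model
# BMI = -12.2 + 0.25*HIP + 0.17*WST plus noise.
rng = np.random.default_rng(0)
HIP = rng.uniform(90, 110, 200)
WST = rng.uniform(70, 100, 200)
BMI = -12.2 + 0.25 * HIP + 0.17 * WST + rng.normal(0, 0.5, 200)

# Design matrix: a column of ones for b0, then one column per predictor.
X = np.column_stack([np.ones_like(HIP), HIP, WST])
b0, b1, b2 = np.linalg.lstsq(X, BMI, rcond=None)[0]
```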

Page 13

Multiple linear regression

Adding age: adj. R² = 0.352

Adding thigh: adj. R² = 0.352?

Coefficients(a)

Model          B       Std. Error  Beta   t       Sig.   95.0% CI for B
                                                         Lower     Upper
1 (Constant)   -9.001  2.449              -3.676  .000   -13.813   -4.190
  Waist        .168    .043        .201   3.923   .000   .084      .252
  Hip          .252    .031        .411   8.012   .000   .190      .313
  Age          -.064   .018        -.126  -3.492  .001   -.101     -.028

a. Dependent Variable: BMI

Coefficients(a)

Model          B       Std. Error  Beta   t       Sig.   95.0% CI for B
                                                         Lower     Upper
1 (Constant)   3.581   1.784              2.007   .045   .075      7.086
  Waist        .168    .043        .201   3.923   .000   .084      .252
  Age          -.064   .018        -.126  -3.492  .001   -.101     -.028
  Thigh        .252    .031        .411   8.012   .000   .190      .313

a. Dependent Variable: BMI

Page 14

Assumptions

1. The dependent variable must be metric continuous.
2. The independent variables must be continuous or ordinal.
3. There is a linear relationship between the dependent and all independent variables.
4. The residuals must have a constant spread.
5. The residuals are normally distributed.
6. The independent variables are not perfectly correlated with each other.

Page 15

Non-parametric correlation

Page 16

Ranked Correlation

Kendall’s τ and Spearman’s r_s

The correlation coefficient lies between −1 and 1, where −1 is a perfect inverse correlation, 0 means no correlation, and 1 means a perfect correlation.

Pearson is the correlation method for normally distributed data. Remember the assumptions:
1. The dependent variable must be metric continuous.
2. The independent variables must be continuous or ordinal.
3. There is a linear relationship between the dependent and all independent variables.
4. The residuals must have a constant spread.
5. The residuals are normally distributed.

Page 17

Kendall’s τ – An example

Page 18

Kendall’s τ – An example

S = P − Q

τ = S / (½ · n(n − 1))
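With made-up ranks, counting concordant (P) and discordant (Q) pairs gives τ directly:

```python
from itertools import combinations

# Made-up paired rankings of n = 5 items.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

# P = number of concordant pairs, Q = number of discordant pairs.
P = Q = 0
for i, j in combinations(range(len(x)), 2):
    sign = (x[j] - x[i]) * (y[j] - y[i])
    if sign > 0:
        P += 1
    elif sign < 0:
        Q += 1

n = len(x)
S = P - Q
tau = S / (n * (n - 1) / 2)
```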

Page 19

Spearman – the same example

d²: 1  4  9  1  1  1  9  9  1  16

r_s = 1 − (6 · Σd_i²) / (n³ − n) = 1 − (6 · 52) / (10³ − 10) = 0.6848
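The slide's numbers can be reproduced in two lines:

```python
# The d^2 values from the slide's example, n = 10 paired ranks.
d2 = [1, 4, 9, 1, 1, 1, 9, 9, 1, 16]
n = 10

# r_s = 1 - 6 * sum(d_i^2) / (n^3 - n)
rs = 1 - 6 * sum(d2) / (n ** 3 - n)
```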

Page 20

Correlation in SPSS

Page 21

Correlation in SPSS

Correlations (Pearson)

                           a        b
a   Pearson Correlation    1        .685*
    Sig. (2-tailed)                 .029
    N                      10       10
b   Pearson Correlation    .685*    1
    Sig. (2-tailed)        .029
    N                      10       10

*. Correlation is significant at the 0.05 level (2-tailed).

Correlations (nonparametric)

                                           a        b
Kendall's tau_b  a  Correlation Coeff.     1.000    .511*
                    Sig. (2-tailed)        .        .040
                    N                      10       10
                 b  Correlation Coeff.     .511*    1.000
                    Sig. (2-tailed)        .040     .
                    N                      10       10
Spearman's rho   a  Correlation Coeff.     1.000    .685*
                    Sig. (2-tailed)        .        .029
                    N                      10       10
                 b  Correlation Coeff.     .685*    1.000
                    Sig. (2-tailed)        .029     .
                    N                      10       10

*. Correlation is significant at the 0.05 level (2-tailed).

Page 22

Logistic regression

Page 23

Logistic Regression

• What if the dependent variable is categorical, and especially binary?
• Could we use some interpolation method?
• Linear regression cannot help us.

Page 24

The sigmoidal curve

p = 1 / (1 + e^(−z)),   z = β0 + β1·x1 + … + βn·xn

[Figure: the sigmoidal curve, p against x, for β0 = 0, β1 = 1]
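A minimal sketch of the curve in code (the function name is made up):

```python
import math

def sigmoid_p(x, b0=0.0, b1=1.0):
    """p = 1 / (1 + exp(-z)) with z = b0 + b1 * x."""
    z = b0 + b1 * x
    return 1.0 / (1.0 + math.exp(-z))
```

With the defaults β0 = 0 and β1 = 1 the curve passes through p = 0.5 at x = 0 and rises monotonically towards 1.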

Page 25

The sigmoidal curve

• The intercept basically just 'scales' the input variable.

p = 1 / (1 + e^(−z)),   z = β0 + β1·x1 + … + βn·xn

[Figure: sigmoidal curves for β0 = 0, β0 = 2 and β0 = −2, all with β1 = 1]

Page 26

The sigmoidal curve

p = 1 / (1 + e^(−z)),   z = β0 + β1·x1 + … + βn·xn

[Figure: sigmoidal curves for β1 = 1, β1 = 2 and β1 = 0.5, all with β0 = 0]

• The intercept basically just 'scales' the input variable.
• Large regression coefficient → the risk factor strongly influences the probability.

Page 27

The sigmoidal curve

p = 1 / (1 + e^(−z)),   z = β0 + β1·x1 + … + βn·xn

[Figure: sigmoidal curves for β1 = 1 and β1 = −1, both with β0 = 0]

• The intercept basically just 'scales' the input variable.
• Large regression coefficient → the risk factor strongly influences the probability.
• Positive regression coefficient → the risk factor increases the probability.
• Logistic regression uses maximum likelihood estimation, not least square estimation.

Page 28

Does age influence the diagnosis? Continuous independent variable

Variables in the Equation

                    B       S.E.   Wald     df  Sig.  Exp(B)  95% C.I. for EXP(B)
                                                              Lower   Upper
Step 1a  Age        .109    .010   108.745  1   .000  1.115   1.092   1.138
         Constant   -4.213  .423   99.097   1   .000  .015

a. Variable(s) entered on step 1: Age.

z = B0 + B1·age

p = 1 / (1 + e^(−z))
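Plugging the table's coefficients into the model is straightforward; the age of 40 below is an arbitrary example value, not from the slides:

```python
import math

# Coefficients from the SPSS output: Constant and Age.
b0, b1 = -4.213, 0.109

# Exp(B) for Age: the odds ratio per extra year of age.
or_per_year = math.exp(b1)

# Predicted probability for an (arbitrarily chosen) 40-year-old.
z = b0 + b1 * 40
p = 1 / (1 + math.exp(-z))
```

The computed `or_per_year` reproduces the table's Exp(B) of 1.115 for Age.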

Page 29

Does previous intake of OCP influence the diagnosis? Categorical independent variable

Variables in the Equation

                    B      S.E.   Wald   df  Sig.  Exp(B)  95% C.I. for EXP(B)
                                                           Lower   Upper
Step 1a  OCP(1)     -.311  .180   2.979  1   .084  .733    .515    1.043
         Constant   .233   .123   3.583  1   .058  1.263

a. Variable(s) entered on step 1: OCP.

z = B0 + B1·OCP

p = 1 / (1 + e^(−z))

If OCP = 1:  p(Y = 1) = 1 / (1 + e^(−(B0 + B1))) = 1 / (1 + e^(−(0.233 − 0.311))) = 0.4805

If OCP = 0:  p(Y = 1) = 1 / (1 + e^(−B0)) = 1 / (1 + e^(−0.233)) = 0.5580
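The two probabilities can be verified from the table's coefficients:

```python
import math

# Coefficients from the SPSS output: Constant = 0.233, OCP(1) = -0.311.
b0, b1 = 0.233, -0.311

p_ocp1 = 1 / (1 + math.exp(-(b0 + b1)))  # previous OCP use
p_ocp0 = 1 / (1 + math.exp(-b0))         # no previous OCP use
```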

Page 30

Odds ratio

o = p / (1 − p) = e^z

odds ratio = e^(B0 + B1) / e^(B0) = e^(B1) = e^(−0.311) = 0.7327
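The cancellation of the B0 terms is easy to check numerically:

```python
import math

b0, b1 = 0.233, -0.311  # Constant and OCP(1) from the model above

odds_ocp1 = math.exp(b0 + b1)       # odds when OCP = 1
odds_ocp0 = math.exp(b0)            # odds when OCP = 0
odds_ratio = odds_ocp1 / odds_ocp0  # the B0 terms cancel: equals exp(b1)
```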

Page 31

Multiple logistic regression

Variables in the Equation

                    B       S.E.   Wald     df  Sig.  Exp(B)  95% C.I. for EXP(B)
                                                              Lower   Upper
Step 1a  Age        .123    .011   115.343  1   .000  1.131   1.106   1.157
         BMI        .083    .019   18.732   1   .000  1.087   1.046   1.128
         OCP        .528    .219   5.808    1   .016  1.695   1.104   2.603
         Constant   -6.974  .762   83.777   1   .000  .001

a. Variable(s) entered on step 1: Age, BMI, OCP.

z = B0 + B1·OCP + B2·age + B3·BMI

p = 1 / (1 + e^(−z))

Page 32

Predicting the diagnosis by logistic regression

What is the probability that the tumor of a 50-year-old woman who has been using OCP and has a BMI of 26 is malignant?

z = −6.974 + 0.123·50 + 0.083·26 + 0.528·1 = 1.862
p = 1 / (1 + e^(−1.862)) = 0.866

Variables in the Equation

                    B       S.E.   Wald     df  Sig.  Exp(B)  95% C.I. for EXP(B)
                                                              Lower   Upper
Step 1a  Age        .123    .011   115.343  1   .000  1.131   1.106   1.157
         BMI        .083    .019   18.732   1   .000  1.087   1.046   1.128
         OCP        .528    .219   5.808    1   .016  1.695   1.104   2.603
         Constant   -6.974  .762   83.777   1   .000  .001

a. Variable(s) entered on step 1: Age, BMI, OCP.
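A quick cross-check of the prediction, using the OCP coefficient 0.528 from the table:

```python
import math

# Coefficients from the 'Variables in the Equation' table.
b = {"const": -6.974, "age": 0.123, "bmi": 0.083, "ocp": 0.528}

# A 50-year-old woman with BMI 26 who has used OCP.
z = b["const"] + b["age"] * 50 + b["bmi"] * 26 + b["ocp"] * 1
p = 1 / (1 + math.exp(-z))
```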