
Regression analysis

Linear regression Logistic regression


Relationship and association


Straight line

[Figure: a straight line through data points, Hip (cm) on the x-axis; a 1 cm increase in Hip corresponds to a change of -0.0008 in BMI]

The general equation of a straight line:

Y = b0 + b1*X

For the line in the figure: BMI = 100.0 - 0.0008*X

b1 = (Y2 - Y1) / (X2 - X1)   (the slope)

b0 = intersection with the y-axis (the intercept)

BMI = b0 + b1*HIP


Best straight line?


Best straight line!

[Figure: scatter plot with a fitted line; the vertical distance from a point (X1, Y1) to the line is its residual]

e1 = Y1 - Ŷ1, and in general ei = Yi - Ŷi for each of the N points.

Least squares estimation: b0 and b1 are chosen so that the sum of the squared residuals, Σ ei² for i = 1 to N, is as small as possible.
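The slides use SPSS, but the idea is easy to sketch in Python/numpy with made-up Hip/BMI numbers (not the course data): the function evaluates the sum of squared residuals for a candidate line, and least squares picks the b0 and b1 that make it smallest.

```python
import numpy as np

# Made-up hip circumference (cm) and BMI (kg/m2) values, for illustration only
hip = np.array([94.0, 97.0, 99.0, 102.0, 105.0])
bmi = np.array([20.1, 21.4, 22.0, 23.2, 24.1])

def sum_of_squared_residuals(b0, b1, x, y):
    """Residuals ei = Yi - Yhat_i for the candidate line Yhat = b0 + b1*x."""
    residuals = y - (b0 + b1 * x)
    return np.sum(residuals ** 2)

# Least squares picks the (b0, b1) with the smallest sum of squared residuals;
# here we just compare two candidate lines.
print(sum_of_squared_residuals(-12.6, 0.35, hip, bmi))
print(sum_of_squared_residuals(-10.0, 0.30, hip, bmi))
```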


Simple linear regression

1. Is the association linear?

[Figure: scatter plot used to judge visually whether the association is linear]


Simple linear regression

1. Is the association linear?
2. Describe the association: what are b0 and b1?

BMI = -12.6 kg/m2 + 0.35 kg/m3 * Hip

b1 = Σ(Xi - X̄)(Yi - Ȳ) / Σ(Xi - X̄)²   (sums over i = 1 to n)

b0 = Ȳ - b1*X̄
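A minimal numpy sketch of the two formulas above, again on made-up data (the course data set is not part of the transcript); np.polyfit is used only as a cross-check.

```python
import numpy as np

# Made-up Hip (cm) and BMI (kg/m2) values, for illustration only
x = np.array([94.0, 97.0, 99.0, 102.0, 105.0])   # Hip
y = np.array([20.1, 21.4, 22.0, 23.2, 24.1])     # BMI

# b1 = sum((Xi - Xbar)(Yi - Ybar)) / sum((Xi - Xbar)^2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
# b0 = Ybar - b1 * Xbar
b0 = y.mean() - b1 * x.mean()

print(b0, b1)
# np.polyfit fits the same least squares line (highest degree first: [b1, b0])
print(np.polyfit(x, y, 1))
```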


Simple linear regression

1. Is the association linear?
2. Describe the association
3. Is the slope significantly different from 0? Let SPSS help!

Coefficientsa

Model              B        Std. Error   Beta     t        Sig.
1  (Constant)    -12.581     2.331                -5.396   .000
   Hip              .345      .023        .565    15.266   .000

a. Dependent Variable: BMI
(B, Std. Error: unstandardized coefficients; Beta: standardized coefficient)
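The slides rely on SPSS for this test; as a rough equivalent, here is a statsmodels sketch (our choice of library, with placeholder hip/bmi arrays since the data behind the table are not in the transcript) that prints B, the standard error, t and the p-value in the same way.

```python
import numpy as np
import statsmodels.api as sm

# Placeholder data, not the data behind the SPSS table above
hip = np.array([94.0, 97.0, 99.0, 102.0, 105.0, 101.0, 96.0, 108.0])
bmi = np.array([20.1, 21.4, 22.0, 23.2, 24.1, 23.0, 21.0, 25.5])

X = sm.add_constant(hip)          # adds the intercept column (b0)
model = sm.OLS(bmi, X).fit()      # least squares fit
print(model.summary())            # coefficient, Std. Error, t and p-value, as in SPSS
```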


Simple linear regression

1. Is the association linear?
2. Describe the association
3. Is the slope significantly different from 0?
4. How good is the fit?

How far are the data points from the line on average?

r = Σ(Xi - X̄)(Yi - Ȳ) / sqrt( Σ(Xi - X̄)² * Σ(Yi - Ȳ)² ),   with -1 ≤ r ≤ 1
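A small sketch of r computed both from the formula above and with scipy.stats.pearsonr, on placeholder data (not the course data); r**2 anticipates the goodness-of-fit slides that follow.

```python
import numpy as np
from scipy.stats import pearsonr

x = np.array([94.0, 97.0, 99.0, 102.0, 105.0])   # placeholder Hip values
y = np.array([20.1, 21.4, 22.0, 23.2, 24.1])     # placeholder BMI values

# r by the formula above
r_manual = np.sum((x - x.mean()) * (y - y.mean())) / np.sqrt(
    np.sum((x - x.mean()) ** 2) * np.sum((y - y.mean()) ** 2))

r, p_value = pearsonr(x, y)       # same r, plus a two-tailed p-value
print(r_manual, r, r ** 2)        # r**2 is the goodness of fit
```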


The Correlation Coefficient, r

[Four example scatter plots: r = 0, r = 1, r = 0.7 and r = -0.5]


r2 – Goodness of fit

How much of the variation can be explained by the model?

[Four example scatter plots with fitted lines: R2 = 0, R2 = 1, R2 = 0.5 and R2 = 0.2]


Multiple linear regression

Could the waist measure describe some of the variation in BMI?

BMI = 1.3 kg/m2 + 0.42 kg/m3 * Waist

Or even better:

BMI = b0 + b1*HIP + b2*WST

BMI = -12.2 + 0.25*HIP + 0.17*WST
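A sketch of a two-predictor fit (BMI on Hip and Waist) with statsmodels on placeholder data; the coefficients on the slide (-12.2, 0.25, 0.17) come from the course data set, which is not reproduced here.

```python
import numpy as np
import statsmodels.api as sm

# Placeholder measurements, for illustration only
hip   = np.array([94.0, 97.0, 99.0, 102.0, 105.0, 101.0, 96.0, 108.0])
waist = np.array([78.0, 82.0, 85.0, 88.0, 93.0, 90.0, 80.0, 97.0])
bmi   = np.array([20.1, 21.4, 22.0, 23.2, 24.1, 23.0, 21.0, 25.5])

X = sm.add_constant(np.column_stack([hip, waist]))   # columns: constant, HIP, WST
fit = sm.OLS(bmi, X).fit()
print(fit.params)                     # [b0, b1, b2]
print(fit.rsquared, fit.rsquared_adj) # fit quality, see the next slide
```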


Multiple linear regression

Adding age: adj R2 = 0.352

Adding thigh: adj R2 = 0.352?

Coefficientsa

Model             B        Std. Error   Beta     t        Sig.    95% Confidence Interval for B
1  (Constant)    -9.001     2.449                -3.676   .000    (-13.813, -4.190)
   Waist            .168     .043        .201     3.923   .000    (.084, .252)
   Hip              .252     .031        .411     8.012   .000    (.190, .313)
   Age             -.064     .018       -.126    -3.492   .001    (-.101, -.028)

a. Dependent Variable: BMI

Coefficientsa

Model             B        Std. Error   Beta     t        Sig.    95% Confidence Interval for B
1  (Constant)     3.581     1.784                 2.007   .045    (.075, 7.086)
   Waist            .168     .043        .201     3.923   .000    (.084, .252)
   Age             -.064     .018       -.126    -3.492   .001    (-.101, -.028)
   Thigh            .252     .031        .411     8.012   .000    (.190, .313)

a. Dependent Variable: BMI
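The adjusted R2 reported above penalises extra predictors, so adding a variable only helps if it explains enough additional variation. A minimal sketch of the standard adjustment formula (the formula itself is not shown on the slides, and the numbers below are only an example):

```python
def adjusted_r2(r2, n, p):
    """Standard adjustment: r2 = ordinary R squared, n = observations, p = predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Example: R2 = 0.36 with 600 observations and 3 predictors
print(adjusted_r2(0.36, 600, 3))   # slightly below 0.36
```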


Assumptions

1. The dependent variable must be metric continuous
2. The independent variables must be continuous or ordinal
3. Linear relationship between the dependent and all independent variables
4. Residuals must have a constant spread
5. Residuals are normally distributed (assumptions 4 and 5 can be checked as in the sketch after this list)
6. Independent variables are not perfectly correlated with each other
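A rough sketch of how assumptions 4 and 5 could be checked on the residuals of a fitted model, using statsmodels and scipy on placeholder data; the particular checks are our choice, not prescribed by the slides.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import shapiro

# Placeholder data; in practice use the residuals of your own fitted model
x = np.array([94.0, 97.0, 99.0, 102.0, 105.0, 101.0, 96.0, 108.0])
y = np.array([20.1, 21.4, 22.0, 23.2, 24.1, 23.0, 21.0, 25.5])

fit = sm.OLS(y, sm.add_constant(x)).fit()
residuals = fit.resid
fitted = fit.fittedvalues

# Assumption 4 (constant spread): a crude check is whether the size of the
# residuals grows with the fitted values.
print(np.corrcoef(fitted, np.abs(residuals))[0, 1])

# Assumption 5 (normality of residuals): Shapiro-Wilk test
stat, p = shapiro(residuals)
print(stat, p)
```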

Non-parametric correlation



Ranked Correlation

Kendall's τ and Spearman's rs

The correlation coefficient lies between -1 and 1, where -1 is a perfect inverse correlation, 0 means no correlation, and 1 means a perfect correlation.

Pearson is the correlation method for normally distributed data. Remember the assumptions:

1. The dependent variable must be metric continuous
2. The independent variables must be continuous or ordinal
3. Linear relationship between the dependent and all independent variables
4. Residuals must have a constant spread
5. Residuals are normally distributed


Kendall's τ – An example


Kendall's τ – An example

S = P - Q   (P = number of concordant pairs, Q = number of discordant pairs)

τ = S / (n*(n-1)/2)


Spearman – the same example

d²:  1  4  9  1  1  1  9  9  1  16   (Σd² = 52, n = 10)

rs = 1 - 6*Σd² / (n³ - n) = 1 - 6*52 / (10³ - 10) = 0.6848
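A quick check of the example: plugging the slide's d² values into the formula reproduces 0.6848. With the raw data (not included in the transcript), scipy's spearmanr and kendalltau compute the same statistics directly.

```python
# d^2 values from the slide; they sum to 52, with n = 10 pairs of ranks
d_squared = [1, 4, 9, 1, 1, 1, 9, 9, 1, 16]
n = 10

rs = 1 - 6 * sum(d_squared) / (n ** 3 - n)
print(rs)   # 0.6848..., matching the slide

# With raw data a and b (placeholders, not in the transcript):
# from scipy.stats import spearmanr, kendalltau
# rho, p = spearmanr(a, b)
# tau, p = kendalltau(a, b)
```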


Correlation in SPSS


Correlation in SPSS

Correlations

                           a        b
a   Pearson Correlation    1        .685*
    Sig. (2-tailed)                 .029
    N                      10       10
b   Pearson Correlation    .685*    1
    Sig. (2-tailed)        .029
    N                      10       10

*. Correlation is significant at the 0.05 level (2-tailed).

Correlations

                                               a        b
Kendall's tau_b   a  Correlation Coefficient   1.000    .511*
                     Sig. (2-tailed)           .        .040
                     N                         10       10
                  b  Correlation Coefficient   .511*    1.000
                     Sig. (2-tailed)           .040     .
                     N                         10       10
Spearman's rho    a  Correlation Coefficient   1.000    .685*
                     Sig. (2-tailed)           .        .029
                     N                         10       10
                  b  Correlation Coefficient   .685*    1.000
                     Sig. (2-tailed)           .029     .
                     N                         10       10

*. Correlation is significant at the 0.05 level (2-tailed).

Logistic regression



Logistic Regression

• What if the dependent variable is categorical, especially binary?

• Could we use some interpolation method?

• Linear regression cannot help us.


The sigmoidal curve

p = 1 / (1 + e^-z),   z = β0 + β1*x1 + ... + βn*xn

[Figure: sigmoidal curve, p plotted against x from -6 to 6, for β0 = 0; β1 = 1]


The sigmoidal curve

p = 1 / (1 + e^-z),   z = β0 + β1*x1 + ... + βn*xn

[Figure: sigmoidal curves, p against x, for β0 = 0, β0 = 2 and β0 = -2, all with β1 = 1]

• The intercept basically just ‘scales’ the input variable


The sigmoidal curve

p = 1 / (1 + e^-z),   z = β0 + β1*x1 + ... + βn*xn

[Figure: sigmoidal curves, p against x, for β1 = 1, β1 = 2 and β1 = 0.5, all with β0 = 0]

• The intercept basically just ‘scales’ the input variable

• Large regression coefficient → risk factor strongly influences the probability


The sigmoidal curve

p = 1 / (1 + e^-z),   z = β0 + β1*x1 + ... + βn*xn

[Figure: sigmoidal curves, p against x, for β1 = 1 and β1 = -1, with β0 = 0]

• The intercept basically just ‘scales’ the input variable

• Large regression coefficient → risk factor strongly influences the probability

• Positive regression coefficient → risk factor increases the probability

• Logistic regression uses maximum likelihood estimation, not least squares estimation
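A minimal numpy sketch of the sigmoidal curve for the parameter settings shown on the last four slides, to make the bullet points concrete (β0 shifts the curve, β1 controls steepness and direction):

```python
import numpy as np

def sigmoid(x, b0, b1):
    """p = 1 / (1 + exp(-z)) with z = b0 + b1*x."""
    return 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))

x = np.linspace(-6, 6, 7)
# Parameter settings from the slides above
for b0, b1 in [(0, 1), (2, 1), (-2, 1), (0, 2), (0, 0.5), (0, -1)]:
    print(b0, b1, np.round(sigmoid(x, b0, b1), 3))
```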


Does age influence the diagnosis? Continuous independent variable

Variables in the Equation

                     B       S.E.    Wald      df   Sig.   Exp(B)   95% C.I. for Exp(B)
Step 1a  Age          .109    .010   108.745   1    .000   1.115    (1.092, 1.138)
         Constant   -4.213    .423    99.097   1    .000    .015

a. Variable(s) entered on step 1: Age.

z = B0 + B1*age,   p = 1 / (1 + e^-z)
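A small check of the table above in numpy: Exp(B) is just e^B, and a predicted probability can be computed for any age (the age of 40 below is purely illustrative, not from the slides).

```python
import numpy as np

b0, b1 = -4.213, 0.109      # Constant and Age from the SPSS table above

print(np.exp(b1))           # Exp(B) = 1.115, the odds ratio per year of age

# Predicted probability at an illustrative age (not from the slides)
age = 40
z = b0 + b1 * age
print(1 / (1 + np.exp(-z)))
```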


Does previous intake of OCP influence the diagnosis? Categorical independent variable

Variables in the Equation

                     B       S.E.    Wald     df   Sig.   Exp(B)   95% C.I. for Exp(B)
Step 1a  OCP(1)      -.311    .180    2.979   1    .084    .733    (.515, 1.043)
         Constant     .233    .123    3.583   1    .058   1.263

a. Variable(s) entered on step 1: OCP.

z = B0 + B1*OCP,   p = 1 / (1 + e^-z)

If OCP = 1:  p(Y=1) = 1 / (1 + e^-(B0 + B1)) = 1 / (1 + e^-(0.233 - 0.311)) = 0.4805

If OCP = 0:  p(Y=1) = 1 / (1 + e^-B0) = 1 / (1 + e^-0.233) = 0.5580


Odds ratio

odds:  o = p / (1 - p) = e^z

odds ratio = e^(B0 + B1) / e^B0 = e^B1 = e^-0.311 = 0.7327
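A numeric check of the OCP slide and the odds ratio above, using the SPSS coefficients: the two predicted probabilities, the ratio of their odds, and e^B1 all agree.

```python
import numpy as np

b0, b1 = 0.233, -0.311                  # Constant and OCP(1) from the SPSS table

p_ocp1 = 1 / (1 + np.exp(-(b0 + b1)))   # 0.4805
p_ocp0 = 1 / (1 + np.exp(-b0))          # 0.5580

odds1 = p_ocp1 / (1 - p_ocp1)
odds0 = p_ocp0 / (1 - p_ocp0)
print(p_ocp1, p_ocp0, odds1 / odds0, np.exp(b1))   # odds ratio = e^B1 = 0.7327
```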


Multiple logistic regression

Variables in the Equation

                     B       S.E.    Wald      df   Sig.   Exp(B)   95% C.I. for Exp(B)
Step 1a  Age          .123    .011   115.343   1    .000   1.131    (1.106, 1.157)
         BMI          .083    .019    18.732   1    .000   1.087    (1.046, 1.128)
         OCP          .528    .219     5.808   1    .016   1.695    (1.104, 2.603)
         Constant   -6.974    .762    83.777   1    .000    .001

a. Variable(s) entered on step 1: Age, BMI, OCP.

z = B0 + B1*OCP + B2*age + B3*BMI,   p = 1 / (1 + e^-z)


Predicting the diagnosis by logistic regression

What is the probability that the tumor of a 50-year-old woman who has been using OCP and has a BMI of 26 is malignant?

z = -6.974 + 0.123*50 + 0.083*26 + 0.528*1 = 1.862
p = 1 / (1 + e^-1.862) = 0.866
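The same calculation in numpy, using the OCP coefficient 0.528 from the table below:

```python
import numpy as np

# Coefficients from the SPSS table (Constant, Age, BMI, OCP)
b0, b_age, b_bmi, b_ocp = -6.974, 0.123, 0.083, 0.528

z = b0 + b_age * 50 + b_bmi * 26 + b_ocp * 1
p = 1 / (1 + np.exp(-z))
print(z, p)    # z = 1.862, p is approximately 0.87
```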

Variables in the Equation

                     B       S.E.    Wald      df   Sig.   Exp(B)   95% C.I. for Exp(B)
Step 1a  Age          .123    .011   115.343   1    .000   1.131    (1.106, 1.157)
         BMI          .083    .019    18.732   1    .000   1.087    (1.046, 1.128)
         OCP          .528    .219     5.808   1    .016   1.695    (1.104, 2.603)
         Constant   -6.974    .762    83.777   1    .000    .001

a. Variable(s) entered on step 1: Age, BMI, OCP.
