Lecture 14: Logistic regression

Upload: teddy

Post on 23-Feb-2016


Page 1: Lecture 14:  Logistic regression


Lecture 14: Logistic regression

Page 2: Lecture 14:  Logistic regression

2×2 table (contingency table)

              Depression   No depression   Total   Risk
Divorce           10            20           30    10/30
No divorce         1            29           30     1/30

Page 3: Lecture 14:  Logistic regression

2×2 table (contingency table)

              Disease   Healthy   Total
Exposed          a         b       a+b
Not-exposed      c         d       c+d

Page 4: Lecture 14:  Logistic regression

Risk (absolute)

• Proportion of initially healthy individuals who contracted the disease during a given period

• The observation period must be the same for everyone

• Given as a proportion (0-1) or a percentage (0-100%)

• Estimation of the risk
  – For exposed: Re = a / (a + b)
  – For non-exposed: Rne = c / (c + d)

Page 5: Lecture 14:  Logistic regression

Relative risk

• Ratio of the risk for the exposed to the risk for the non-exposed (no unit)

RR = Re / Rne

• Interpretation
  – RR > 1: exposure increases the risk
  – RR = 1: exposure does not modify the risk
  – RR < 1: exposure decreases the risk

• When RR < 1, the relative risk reduction (in %) may be presented

RRR = (1 - RR) * 100
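Sketch (not from the original slides): the risks and the relative risk computed in Python from the divorce/depression table on page 2. The function name `risks_from_table` and the RR of 0.8 used for the RRR line are illustrative assumptions.

```python
def risks_from_table(a, b, c, d):
    """Return (Re, Rne, RR) for a 2x2 table.

    a, b: exposed with / without the outcome
    c, d: non-exposed with / without the outcome
    """
    risk_exposed = a / (a + b)
    risk_non_exposed = c / (c + d)
    relative_risk = risk_exposed / risk_non_exposed
    return risk_exposed, risk_non_exposed, relative_risk

# Divorce/depression table: 10 of 30 divorced and 1 of 30 non-divorced are depressed.
re_, rne, rr = risks_from_table(10, 20, 1, 29)
print(f"Re = {re_:.3f}, Rne = {rne:.3f}, RR = {rr:.1f}")

# RRR is only presented when RR < 1; this RR of 0.8 is hypothetical.
hypothetical_rr = 0.8
rrr_percent = (1 - hypothetical_rr) * 100
print(f"RRR = {rrr_percent:.0f}%")
```

In this table the risk of depression is ten times higher among the divorced, so RRR does not apply to it.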

Page 6: Lecture 14:  Logistic regression

Caveat

• Relative risk does not provide any information on the magnitude of the risks being compared

• RR = 3 may correspond to:
  – Re = 45%, Rne = 15%
  – Re = 3%, Rne = 1%
  – Re = 0.006%, Rne = 0.002%
  – …

Page 7: Lecture 14:  Logistic regression

Risk difference

• Difference between the risks of the exposed and the non-exposed individuals

• Given as a proportion or a percentage

• Interpretation
  – DR > 0: exposure increases the risk
  – DR = 0: exposure does not modify the risk
  – DR < 0: exposure decreases the risk

DR = Re - Rne
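Sketch (not from the original slides): the three pairs of risks from the caveat slide all give RR = 3, while the risk difference separates them. Variable names are illustrative.

```python
# Pairs (Re, Rne) from the caveat slide, expressed as percentages.
pairs = [(45.0, 15.0), (3.0, 1.0), (0.006, 0.002)]

results = []
for risk_exposed, risk_non_exposed in pairs:
    rr = risk_exposed / risk_non_exposed   # relative risk (no unit)
    dr = risk_exposed - risk_non_exposed   # risk difference (percentage points)
    results.append((rr, dr))
    print(f"Re = {risk_exposed}%  Rne = {risk_non_exposed}%  "
          f"RR = {rr:.1f}  DR = {dr:.3f} points")
```

The relative risk is the same in every row, but the risk difference ranges from 30 percentage points down to a few thousandths of a point.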

Page 8: Lecture 14:  Logistic regression

Test and estimation

• Null hypothesis: no effect of exposure
  – RR = 1
  – DR = 0

• It is also possible to compute a standard error for each of these statistics and to obtain a confidence interval

H0: Re = Rne

χ² test
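Sketch (not from the original slides): a χ² test of H0 and an approximate confidence interval for RR, applied to the divorce/depression table on page 2. It assumes the Pearson χ² without continuity correction and a log-scale (Wald) interval; with a cell count as small as 1 both are rough approximations. Function names are illustrative.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 df, no continuity correction) and p-value."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

def relative_risk_ci(a, b, c, d, z=1.96):
    """RR with an approximate 95% confidence interval computed on the log scale."""
    rr = (a / (a + b)) / (c / (c + d))
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return rr, rr * math.exp(-z * se_log_rr), rr * math.exp(z * se_log_rr)

chi2, p = chi_square_2x2(10, 20, 1, 29)
rr, low, high = relative_risk_ci(10, 20, 1, 29)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
print(f"RR = {rr:.1f}, 95% CI {low:.2f} to {high:.2f}")
```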

Page 9: Lecture 14:  Logistic regression

BMJ 2003;327:1254-7

Page 10: Lecture 14:  Logistic regression

RR=1.53

Page 11: Lecture 14:  Logistic regression

Prevalence

• Proportion of individuals having a disease (or presenting a characteristic) at a given moment

• The moment is defined by
  – A date (e.g., 19 October 2011)
  – An event (e.g., at birth)

Example: prevalence of smoking in a freshman class

Page 12: Lecture 14:  Logistic regression

Incidence rate

• Risk of contracting a disease during one unit of time

• Numerator: new cases

• Denominator: sum of the person-time at risk
  – Person-years
  – Person-days
  – …
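Sketch (not from the original slides): an incidence rate computed as new cases divided by person-years at risk. The follow-up data are hypothetical and chosen only for illustration.

```python
# Hypothetical follow-up data: (years at risk, became a new case?) for each person.
follow_up = [(2.0, False), (5.0, True), (3.5, False), (1.5, True), (8.0, False)]

new_cases = sum(1 for _, is_case in follow_up if is_case)
person_years = sum(years for years, _ in follow_up)
incidence_rate = new_cases / person_years

print(f"{new_cases} new cases / {person_years} person-years "
      f"= {incidence_rate:.2f} cases per person-year")
```

Each person contributes only the time actually spent at risk, so unequal follow-up durations are handled by the denominator.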

Page 13: Lecture 14:  Logistic regression

Comparison with prospective study: case-control study…

  – Makes it possible to study very rare conditions (e.g., autism, suicide)
  – Can be carried out more quickly
  – Requires fewer observations (lower cost)
  – Makes it possible to test several risk factors for the outcome

• But…
  – Does not allow a risk or an incidence rate to be computed (question: why?)
  – Risk of bias in the measurement of the risk factor
  – Difficult to choose appropriate controls
  – Can only study one condition at a time

Page 14: Lecture 14:  Logistic regression

Case-control study

               Case                 Control
Exposed         a                      b
Non-exposed     c                      d
               (we take everyone)   (we take only a sample)

The sums a+b and c+d are meaningless!

It is impossible to compute the risk and the relative risk

Page 15: Lecture 14:  Logistic regression

Solution: odds ratio

• Transform a proportion into odds:

odds = p / (1 - p)

• Transform odds into a proportion:

p = odds / (1 + odds)
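Sketch (not from the original slides): the two conversions between a proportion and odds, written as Python functions. The function names are illustrative.

```python
def proportion_to_odds(p):
    """odds = p / (1 - p), defined for 0 <= p < 1."""
    return p / (1 - p)

def odds_to_proportion(odds):
    """p = odds / (1 + odds), defined for odds >= 0."""
    return odds / (1 + odds)

for p in (0.1, 0.5, 0.8):
    odds = proportion_to_odds(p)
    print(f"p = {p}  odds = {odds:.2f}  back to p = {odds_to_proportion(odds):.2f}")
```

A proportion of 0.5 corresponds to odds of 1 (one "yes" for each "no"); odds grow without bound as the proportion approaches 1.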

Page 16: Lecture 14:  Logistic regression

Case-control study

                 Case   Control
Exposed           a        b
Non-exposed       c        d
Exposure odds    a/c      b/d

odds ratio = [a/(a+c)] / [c/(a+c)]   (numerator)
             ---------------------
             [b/(b+d)] / [d/(b+d)]   (denominator)

           = (a/c) / (b/d) = ad / bc

Page 17: Lecture 14:  Logistic regression

Prospective study

                 Disease   Healthy   Disease odds
Exposed             a         b          a/b
Non-exposed         c         d          c/d
Exposure odds      a/c       b/d

Page 18: Lecture 14:  Logistic regression

Meaning of a, b, c, d

     Case-control                        Prospective study
a    Cases exposed to the risk factor    People exposed who develop the disease
b    Controls exposed                    People exposed who remain healthy
c    Cases non-exposed                   People non-exposed who develop the disease
d    Controls non-exposed                People non-exposed who remain healthy

Page 19: Lecture 14:  Logistic regression

Property of the odds ratio

• Exposure odds ratio = disease odds ratio

OR(exposure) = (a/c) / (b/d) = ad / bc = (a/b) / (c/d) = OR(disease)

The odds ratio is the same whether it is computed from a prospective study or from a case-control study
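Sketch (not from the original slides): a numerical check that the exposure odds ratio, the disease odds ratio and the cross-product ad/bc coincide. The counts are hypothetical and the function names are illustrative.

```python
def exposure_odds_ratio(a, b, c, d):
    """(a/c) / (b/d): odds of exposure among cases vs. among controls."""
    return (a / c) / (b / d)

def disease_odds_ratio(a, b, c, d):
    """(a/b) / (c/d): odds of disease among exposed vs. among non-exposed."""
    return (a / b) / (c / d)

def cross_product_ratio(a, b, c, d):
    """ad / bc."""
    return (a * d) / (b * c)

# Hypothetical counts, chosen only for illustration.
a, b, c, d = 30, 70, 10, 90
print(exposure_odds_ratio(a, b, c, d),
      disease_odds_ratio(a, b, c, d),
      cross_product_ratio(a, b, c, d))
```

The three values agree up to floating-point rounding, whichever way the table is read.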

Page 20: Lecture 14:  Logistic regression

Odds ratio and relative risk

• When the condition is rare (a << b and c << d), OR is approximately equal to RR

OR = (a/b) / (c/d) ≈ [a/(a+b)] / [c/(c+d)] = RR
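Sketch (not from the original slides): OR and RR compared for a rare and a common condition. Both tables are hypothetical and built so that RR = 3 in each.

```python
def odds_ratio(a, b, c, d):
    """Disease odds ratio (a/b) / (c/d)."""
    return (a / b) / (c / d)

def relative_risk(a, b, c, d):
    """Relative risk [a/(a+b)] / [c/(c+d)]."""
    return (a / (a + b)) / (c / (c + d))

# Hypothetical tables (a, b, c, d): exposed cases, exposed healthy,
# non-exposed cases, non-exposed healthy.
rare = (30, 9970, 10, 9990)      # risks of 0.3% and 0.1%
common = (450, 550, 150, 850)    # risks of 45% and 15%

for label, table in (("rare", rare), ("common", common)):
    print(f"{label}: OR = {odds_ratio(*table):.2f}, RR = {relative_risk(*table):.2f}")
```

For the rare condition the two measures are nearly identical; for the common one the odds ratio is noticeably further from 1 than the relative risk.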

Page 21: Lecture 14:  Logistic regression

• 573 patients with facial clefts

• 763 controls

• Exposure:
  – Taking more than 400 mg of folic acid supplements

BMJ 2007; 334:464-470

Page 22: Lecture 14:  Logistic regression
Page 23: Lecture 14:  Logistic regression

Results

                       Cleft      No cleft
Folic acid < 400 mg     491         618
Folic acid ≥ 400 mg      82         145
Total                   573         763
Odds                  491 / 82    618 / 145

Odds ratio for folic acid < 400 mg:

odds ratio = (491 × 145) / (82 × 618) = 1.4
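Sketch (not from the original slides): the odds ratio recomputed from the four counts in the results table. The confidence interval (Woolf's log-scale method) is an addition for illustration and is not reported on the slide.

```python
import math

# Counts from the results table: exposure here is folic acid < 400 mg.
cases_exposed, controls_exposed = 491, 618
cases_unexposed, controls_unexposed = 82, 145

odds_cases = cases_exposed / cases_unexposed
odds_controls = controls_exposed / controls_unexposed
odds_ratio = odds_cases / odds_controls

# Approximate 95% CI on the log scale (Woolf's method); not part of the slide.
se_log_or = math.sqrt(1 / cases_exposed + 1 / controls_exposed
                      + 1 / cases_unexposed + 1 / controls_unexposed)
ci_low = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
ci_high = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)

print(f"OR = {odds_ratio:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
```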

Page 24: Lecture 14:  Logistic regression

Why not compute the RR?

• The proportion of cases in the sample does not reflect the prevalence in the population, since the case-control design fixes the numbers of cases and controls (often aiming for as many cases as controls).

• Thus, a/(a+b) does not make sense.

Page 25: Lecture 14:  Logistic regression

Interpretation of odds ratio

• Similar to relative risk:
  – OR > 1: IV is positively associated with DV (e.g., exposure is associated with disease)
  – OR = 1: IV is not associated with DV
  – OR < 1: IV is negatively associated with DV

• Folic acid and facial cleft:
  – Not taking folic acid supplements (>400 mg) increases the odds of facial cleft by about 40% (OR = 1.4)

Page 26: Lecture 14:  Logistic regression

Continuous IV

1. Compare the means of cases and controls (t test)

2. Divide the IV into several categories and compute an odds ratio for each category

3. Model:

Page 27: Lecture 14:  Logistic regression

Model: linear regression

[Figure: scatter plot of sex (y-axis; male, female) against weight (x-axis labelled "poids", 40 to 100) with the fitted regression line; R² (linear) = 0.441]

• Predicted proportion of males > 100%

• Predicted proportion of females < 0%

• Distribution of the residuals [y - (a + bx)] is not normal

This is not the right method!

Page 28: Lecture 14:  Logistic regression

Logistic regression

• The DV must be transformed
  – Y: probability that sex = 1 (female), instead of 0 (male)

• "logit": logit(Y) = log[Y / (1 - Y)], the log of the odds!
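Sketch (not from the original slides): the logit transformation and its inverse, using the standard definition logit(p) = log[p / (1 - p)]. The function names are illustrative.

```python
import math

def logit(p):
    """Log of the odds: log(p / (1 - p)), for 0 < p < 1."""
    return math.log(p / (1 - p))

def inverse_logit(x):
    """Map any real number back to a probability between 0 and 1."""
    return 1 / (1 + math.exp(-x))

for p in (0.1, 0.5, 0.9):
    print(f"p = {p}  logit = {logit(p):+.3f}  back to p = {inverse_logit(logit(p)):.1f}")

# Whatever the value of a + b*x, the predicted probability stays between 0 and 1.
for linear_predictor in (-5, 0, 5):
    print(linear_predictor, round(inverse_logit(linear_predictor), 4))
```

Unlike the straight line of linear regression, a model that is linear on the logit scale can never predict a proportion below 0% or above 100%.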

Page 29: Lecture 14:  Logistic regression

Interpretation of "b"

• In linear regression:
  – b: mean change of Y expected for an increase of one unit of X

• By analogy, in logistic regression:
  – b: mean change of logit(Y=1) expected for an increase of one unit of X

e^b = odds ratio of Y for an increase of one unit of X

Page 30: Lecture 14:  Logistic regression

Odds ratio and logistic regression coefficients

• Dependent variable: Y = 1 (case) or Y = 0 (control)

• Independent variable: X = 1 (exposed) or X = 0 (non-exposed)

• Model: logit(y) = a + bx

• Equation among exposed: logit(y_exp) = a + b*1 = a + b

• Equation among non-exposed: logit(y_non-exp) = a + b*0 = a

• Equation for b: b = (a + b) - a = logit(y_exp) - logit(y_non-exp)

b = log[Pr(y_exp = 1) / Pr(y_exp = 0)] - log[Pr(y_non-exp = 1) / Pr(y_non-exp = 0)]
  = log{ [Pr(y_exp = 1) / Pr(y_exp = 0)] / [Pr(y_non-exp = 1) / Pr(y_non-exp = 0)] }
  = log(OR)

e^b = OR
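Sketch (not from the original slides): a logistic regression with one binary X fitted by Newton-Raphson in plain Python, to check numerically that e^b equals the cross-product odds ratio. The counts are those of the folic acid table; the function name and the fitting routine are illustrative assumptions, not the software used in the lecture.

```python
import math

def fit_logistic_binary_x(n1_x1, n0_x1, n1_x0, n0_x0, iterations=25):
    """Fit logit(y) = a + b*x by Newton-Raphson for a single binary x.

    n1_x1 / n0_x1: number with y = 1 / y = 0 among x = 1
    n1_x0 / n0_x0: number with y = 1 / y = 0 among x = 0
    """
    groups = [(1, n1_x1, n1_x1 + n0_x1), (0, n1_x0, n1_x0 + n0_x0)]
    a, b = 0.0, 0.0
    for _ in range(iterations):
        grad_a = grad_b = h_aa = h_ab = h_bb = 0.0
        for x, events, total in groups:
            p = 1 / (1 + math.exp(-(a + b * x)))
            grad_a += events - total * p
            grad_b += x * (events - total * p)
            w = total * p * (1 - p)
            h_aa += w
            h_ab += w * x
            h_bb += w * x * x
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve (information matrix) * step = gradient for 2 parameters.
        step_a = (h_bb * grad_a - h_ab * grad_b) / det
        step_b = (h_aa * grad_b - h_ab * grad_a) / det
        a += step_a
        b += step_b
    return a, b

# Folic acid table: x = 1 is folic acid < 400 mg, y = 1 is a cleft case.
a, b = fit_logistic_binary_x(491, 618, 82, 145)
print(f"a = {a:.3f}, b = {b:.3f}, exp(b) = {math.exp(b):.2f}")
print(f"cross-product odds ratio = {(491 * 145) / (618 * 82):.2f}")
```

With a single binary X the model reproduces the 2×2 table exactly, so the fitted e^b and the hand-computed odds ratio agree.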

Page 31: Lecture 14:  Logistic regression

Example weight - sex

• Weight in 4 categories:
  – odds ratio = 8.9
  – The odds of being a man are multiplied by 8.9 for each increase to a higher category of weight

• Weight in kilos:
  – odds ratio = 1.2
  – The odds of being a man are multiplied by 1.2 for each additional kilo of weight
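Sketch (not from the original slides): because the model is linear on the logit scale, the odds ratio for k additional kilos is the per-kilo odds ratio raised to the power k. Only the value 1.2 comes from the slide; the increments are illustrative.

```python
import math

odds_ratio_per_kg = 1.2              # from the slide: weight in kilos
b = math.log(odds_ratio_per_kg)      # corresponding logistic regression coefficient

for extra_kg in (1, 5, 10):
    # Odds ratios multiply: exp(b * k) = (exp(b)) ** k
    print(f"+{extra_kg} kg: odds multiplied by {math.exp(b * extra_kg):.2f}")
```

A per-unit odds ratio that looks small can therefore correspond to a large effect over a realistic range of X.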

Page 32: Lecture 14:  Logistic regression

Multiple IVs

• Example:

• Results:
  – Odds ratio for one additional kg: 1.15 (p<0.001), adjusted for height
  – Odds ratio for one additional cm: 1.20 (p<0.001), adjusted for weight
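Sketch (not from the original slides): how the two adjusted odds ratios combine when two people differ in both weight and height. It assumes a model without an interaction term, so the effects multiply; the function name and the chosen differences are illustrative.

```python
# Adjusted odds ratios reported on the slide.
or_per_kg = 1.15   # adjusted for height
or_per_cm = 1.20   # adjusted for weight

def combined_odds_ratio(extra_kg, extra_cm):
    """Odds ratio comparing two people who differ by extra_kg and extra_cm.

    Assumes a model with no interaction term, so the effects multiply.
    """
    return or_per_kg ** extra_kg * or_per_cm ** extra_cm

print(f"+5 kg, same height: {combined_odds_ratio(5, 0):.2f}")
print(f"same weight, +10 cm: {combined_odds_ratio(0, 10):.2f}")
print(f"+5 kg and +10 cm: {combined_odds_ratio(5, 10):.2f}")
```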

Page 33: Lecture 14:  Logistic regression

Conclusions

• Case-control study: design to examine associations between risk factors and disease
  – Mostly for rare diseases
  – Efficient and cost-effective

• Odds ratio: measure of association – often similar to relative risk

• Logistic regression: modeling method for binary dependent variables

Page 34: Lecture 14:  Logistic regression


Multilevel logistic regression

This is a random intercept multilevel logistic regression

Page 35: Lecture 14:  Logistic regression

Correct judgment of normality

• Statistical normality test (KS) was correct for 71.4% of the distributions.

• Only 57.1% for AD and JB.

• Levene test correct in both cases (because data were normally distributed)

• You were correct for 71.4% of the distributions, but your errors were not the same as those of the statistical tests.
  – All of you correctly found the bimodal distribution
  – You were not influenced by sample size