Lecture 14: Logistic regression
2
2×2 table (contingency table)
              Depression   No depression   Total   Risk
Divorce           10            20           30    10/30
No divorce         1            29           30     1/30
3
2×2 table (contingency table)
               Disease   Healthy   Total
Exposed           a         b       a+b
Not-exposed       c         d       c+d
4
Risk (absolute)
• Proportion of individuals, initially healthy, who contract the disease during a given period
• The observation period must be the same for everyone
• Given as a proportion (0-1), or a percentage (0-100%)
• Estimation of the risk
– For exposed: $R_e = \frac{a}{a+b}$
– For non-exposed: $R_{ne} = \frac{c}{c+d}$
5
Relative risk
• Ratio of the risk for the exposed to the risk for the non-exposed (no unit)
• Interpretation
– RR > 1: exposure increases the risk
– RR = 1: exposure does not modify the risk
– RR < 1: exposure decreases the risk
• When RR < 1, the relative risk reduction (in %) may be reported
RR = Re / Rne
RRR = (1-RR)*100
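The risk and relative-risk formulas above can be checked on the divorce/depression table from slide 2. This is a minimal Python sketch; the function names are hypothetical, and exact fractions are used so that the results match a hand calculation.

```python
from fractions import Fraction


def risk(cases: int, at_risk: int) -> Fraction:
    """Absolute risk: new cases divided by the number of people initially at risk."""
    return Fraction(cases, at_risk)


def relative_risk(a: int, b: int, c: int, d: int) -> Fraction:
    """RR = Re / Rne for the 2x2 layout: exposed (a diseased, b healthy),
    non-exposed (c diseased, d healthy)."""
    r_e = risk(a, a + b)    # Re = a / (a + b)
    r_ne = risk(c, c + d)   # Rne = c / (c + d)
    return r_e / r_ne


def relative_risk_reduction(rr: float) -> float:
    """RRR = (1 - RR) * 100, reported when RR < 1."""
    return (1 - rr) * 100


# Exposed = divorce, non-exposed = no divorce, disease = depression
print(risk(10, 30), risk(1, 30))        # 1/3 1/30
print(relative_risk(10, 20, 1, 29))     # 10
# Hypothetical protective exposure with RR = 0.75
print(relative_risk_reduction(0.75))    # 25.0
```

Here divorced people have ten times the risk of depression of non-divorced people; the RRR line uses an invented RR only to show the formula.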
6
Caveat
• Relative risk gives no information about the absolute size of the risks being compared
• RR = 3 can correspond to:
– Re = 45%, Rne = 15%
– Re = 3%, Rne = 1%
– Re = 0.006%, Rne = 0.002%
– …
7
Risk difference
• Difference between the risks of the exposed and non-exposed individuals
• Given as a proportion or a percentage
• Interpretation
– DR > 0: exposure increases the risk
– DR = 0: exposure does not modify the risk
– DR < 0: exposure decreases the risk
DR = Re – Rne
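To make the caveat on relative risk concrete, the sketch below (hypothetical helper name) computes the risk difference for the three scenarios that all share RR = 3; only DR tells them apart.

```python
def risk_difference(r_e: float, r_ne: float) -> float:
    """DR = Re - Rne, as a proportion (multiply by 100 for percentage points)."""
    return r_e - r_ne


# (Re, Rne) pairs from the caveat slide, written as proportions
scenarios = [(0.45, 0.15), (0.03, 0.01), (0.00006, 0.00002)]
for r_e, r_ne in scenarios:
    rr = r_e / r_ne
    dr = risk_difference(r_e, r_ne)
    print(f"RR = {rr:.1f}, DR = {dr * 100:.4f} percentage points")

# Divorce/depression table: Re = 10/30, Rne = 1/30
print(round(risk_difference(10 / 30, 1 / 30), 3))  # 0.3
```

The same RR of 3 corresponds to 30, 2 and 0.004 extra cases per 100 exposed people.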
8
Test and estimation
• Null hypothesis: no effect of exposure
– RR = 1
– DR = 0
• A standard error can also be computed for each of these statistics, which gives a confidence interval
H0: Re = Rne, tested with a χ² test
Example (BMJ 2003;327:1254-7): RR = 1.53
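As a sketch of the test and the confidence interval mentioned above: the Pearson χ² statistic for a 2x2 table, and an approximate 95% CI for RR computed from the standard error of log(RR). The log-scale CI is the usual large-sample method and is an assumption here (the slide does not name a method); the function names are hypothetical.

```python
import math


def chi2_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square statistic (no continuity correction) for H0: Re = Rne.

    Returns (statistic, p_value); the test has 1 degree of freedom.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 df the statistic is z**2, so P(chi2 > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def rr_confidence_interval(a: int, b: int, c: int, d: int, z: float = 1.96):
    """RR with an approximate 95% CI computed on the log scale."""
    rr = (a / (a + b)) / (c / (c + d))
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return rr, math.exp(math.log(rr) - z * se_log_rr), math.exp(math.log(rr) + z * se_log_rr)


# Divorce/depression table from slide 2 (small counts: the approximations are rough here)
stat, p_value = chi2_2x2(10, 20, 1, 29)
rr, lower, upper = rr_confidence_interval(10, 20, 1, 29)
print(f"chi2 = {stat:.2f}, p = {p_value:.4f}")  # chi2 = 9.02, ...
print(f"RR = {rr:.1f}, 95% CI {lower:.1f} to {upper:.1f}")
```

With only one depressed person among the non-divorced, the interval for RR is very wide even though the test rejects H0.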
11
Prevalence
• Proportion of individuals having a disease (or presenting a characteristic) at a given moment
• The moment is defined by
– a date (e.g., 19 October 2011)
– an event (e.g., at birth)
Example: prevalence of smoking in a freshman class
12
Incidence rate
• Rate at which new cases of a disease occur per unit of time at risk
• Numerator: new cases
• Denominator: total person-time at risk
– Person-days
– …
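A minimal sketch of the incidence-rate calculation on a small hypothetical cohort (the data are invented for illustration). Each person contributes follow-up time until they become a case or leave the study.

```python
def incidence_rate(new_cases: int, person_time: float) -> float:
    """New cases divided by the total person-time at risk."""
    return new_cases / person_time


# Hypothetical cohort: years each person was followed while at risk
# (follow-up stops at diagnosis for those who become cases)
follow_up_years = [2.0, 5.0, 3.5, 5.0, 1.5]
became_case = [True, False, True, False, False]

rate = incidence_rate(sum(became_case), sum(follow_up_years))
print(f"{rate * 1000:.0f} cases per 1000 person-years")  # 118 cases per 1000 person-years
```

Two cases over 17 person-years gives about 0.118 cases per person-year; scaling to 1000 person-years is only a reporting convention.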
13
Case-control study compared with a prospective study
• Advantages
– Allows the study of very rare conditions (e.g., autism, suicide)
– Can be carried out more quickly
– Requires fewer observations (lower cost)
– Allows several risk factors for the outcome to be tested
• But…
– Does not allow computing a risk or an incidence rate (question: why?)
– Risk of bias in measuring the risk factor
– Difficult to choose appropriate controls
– Can only study one condition at a time
14
Case control study
               Case                 Control
               (we take everyone)   (we take only a sample)
Exposed           a                    b
Non-exposed       c                    d
The sums a+b and c+d have no meaning!
It is impossible to compute the risk and the relative risk.
15
Solution: odds ratio
• Transform a proportion into odds: $\text{odds} = \frac{p}{1-p}$
• Transform odds into a proportion: $p = \frac{\text{odds}}{1+\text{odds}}$
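The two conversions above are inverses of each other; a minimal sketch with hypothetical function names:

```python
def proportion_to_odds(p: float) -> float:
    """odds = p / (1 - p), defined for 0 <= p < 1."""
    return p / (1 - p)


def odds_to_proportion(odds: float) -> float:
    """p = odds / (1 + odds)."""
    return odds / (1 + odds)


print(proportion_to_odds(0.5))   # 1.0 (even odds)
print(proportion_to_odds(0.2))   # odds of 1 to 4
print(odds_to_proportion(0.25))  # 0.2
```

A proportion is bounded by 0 and 1, while odds range from 0 to infinity, which is one reason odds (and their logarithm) are convenient for modelling.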
16
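Case-control study
               Case    Control
Exposed          a        b
Non-exposed      c        d
Exposure odds   a/c      b/d

Numerator (exposure odds among cases) and denominator (exposure odds among controls):
$$\frac{a/(a+c)}{c/(a+c)} = \frac{a}{c}, \qquad \frac{b/(b+d)}{d/(b+d)} = \frac{b}{d}$$
$$\text{odds ratio} = \frac{a/c}{b/d} = \frac{ad}{bc}$$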
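The odds-ratio computation above can be sketched as follows on a hypothetical case-control table (counts invented for illustration). Sampling twice as many controls changes the meaningless "risk" a/(a+b) but leaves the odds ratio unchanged, which is why the OR can be used in this design.

```python
from fractions import Fraction


def odds_ratio_case_control(a: int, b: int, c: int, d: int) -> Fraction:
    """Exposure odds in cases (a/c) divided by exposure odds in controls (b/d)."""
    exposure_odds_cases = Fraction(a, c)
    exposure_odds_controls = Fraction(b, d)
    return exposure_odds_cases / exposure_odds_controls


# Hypothetical counts: all 100 available cases, plus a sample of 100 controls
a, c = 30, 70   # exposed / non-exposed cases
b, d = 20, 80   # exposed / non-exposed controls
or_1 = odds_ratio_case_control(a, b, c, d)
or_2 = odds_ratio_case_control(a, 2 * b, c, 2 * d)  # twice as many controls sampled
print(or_1, or_2)                                   # 12/7 12/7
print(Fraction(a, a + b), Fraction(a, a + 2 * b))   # 3/5 3/7 (not a risk)
assert or_1 == Fraction(a * d, b * c)               # cross-product form ad/bc
```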
17
Prospective study
               Disease   Healthy   Disease odds
Exposed           a         b          a/b
Non-exposed       c         d          c/d
Exposure odds    a/c       b/d
18
Meaning of a, b, c, d
     Case-control study                   Prospective study
a    Cases exposed to the risk factor     Exposed people who develop the disease
b    Controls exposed                     Exposed people who remain healthy
c    Cases not exposed                    Non-exposed people who develop the disease
d    Controls not exposed                 Non-exposed people who remain healthy
19
Property of the odds ratio
• Exposure odds ratio = disease odds ratio
$$\text{OR}_{\text{exposure}} = \frac{a/c}{b/d} = \frac{ad}{bc} = \frac{a/b}{c/d} = \text{OR}_{\text{disease}}$$
The odds ratio is the same whether it is computed from a prospective study (disease odds) or from a case-control study (exposure odds).
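The identity above can be checked exactly with fractions on the two tables used in this lecture (the function names are hypothetical):

```python
from fractions import Fraction


def exposure_odds_ratio(a, b, c, d):
    """(a/c) / (b/d): exposure odds in the diseased over exposure odds in the healthy."""
    return Fraction(a, c) / Fraction(b, d)


def disease_odds_ratio(a, b, c, d):
    """(a/b) / (c/d): disease odds in the exposed over disease odds in the non-exposed."""
    return Fraction(a, b) / Fraction(c, d)


# Divorce/depression table and folic acid/cleft table
for table in [(10, 20, 1, 29), (491, 618, 82, 145)]:
    a, b, c, d = table
    assert exposure_odds_ratio(*table) == disease_odds_ratio(*table) == Fraction(a * d, b * c)
    print(table, float(exposure_odds_ratio(*table)))
```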
20
Odds ratio and relative risk
• When the condition is rare (a << b and c << d), the OR is approximately equal to the RR, because a+b ≈ b and c+d ≈ d:
$$\text{RR} = \frac{a/(a+b)}{c/(c+d)} \approx \frac{a/b}{c/d} = \text{OR}$$
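A quick numerical illustration of the approximation, using two hypothetical cohorts that both have RR = 2: one with a rare disease and one with a common disease.

```python
def relative_risk(a, b, c, d):
    """RR = [a/(a+b)] / [c/(c+d)]."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """OR = ad / bc."""
    return (a * d) / (b * c)


# Hypothetical cohorts, both with a true RR of 2
rare = (20, 9980, 10, 9990)     # 0.2% vs 0.1% develop the disease
common = (400, 600, 200, 800)   # 40% vs 20% develop the disease
for name, table in [("rare", rare), ("common", common)]:
    print(name, round(relative_risk(*table), 3), round(odds_ratio(*table), 3))
# rare 2.0 2.002
# common 2.0 2.667
```

When the disease is common, the OR overstates the RR noticeably, so the approximation should only be used for rare outcomes.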
21
• 573 patients with facial clefts
• 763 controls
• Exposure: taking more than 400 mg of folic acid supplements
BMJ 2007; 334:464-470
23
Results
                        Cleft      No cleft
Folic acid < 400 mg      491         618
Folic acid ≥ 400 mg       82         145
Total                    573         763
Odds                   491/82     618/145

Odds ratio for folic acid < 400 mg:
$$\text{OR} = \frac{491/82}{618/145} = \frac{491 \times 145}{82 \times 618} \approx 1.4$$
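The odds ratio above can be reproduced, with an approximate 95% confidence interval based on the large-sample standard error of log(OR), sqrt(1/a + 1/b + 1/c + 1/d) (Woolf's method; the choice of method is an assumption, since the slide shows no interval). This is the crude OR from the 2x2 table only, not an adjusted estimate from the cited paper.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Crude odds ratio ad/bc with an approximate CI computed on the log scale (Woolf)."""
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log_or)
    upper = math.exp(math.log(or_) + z * se_log_or)
    return or_, lower, upper


# Rows: folic acid < 400 mg (a = cleft, b = no cleft); folic acid >= 400 mg (c, d)
or_, lower, upper = odds_ratio_ci(491, 618, 82, 145)
print(f"OR = {or_:.2f}")  # OR = 1.40
print(f"95% CI {lower:.2f} to {upper:.2f}")
```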
24
Why not compute the RR?
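• The proportion of cases in the sample does not reflect the prevalence in the population, because the case-control design deliberately recruits a chosen number of cases and controls (often as many cases as controls).
• Thus, a/(a+b) does not estimate the risk among the exposed.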
25
Interpretation of odds ratio
• Similar to the relative risk:
– OR > 1: the IV is positively associated with the DV (e.g., exposure is associated with disease)
– OR = 1: the IV is not associated with the DV
– OR < 1: the IV is negatively associated with the DV
• Folic acid and facial cleft:
– Not taking folic acid supplements (>400 mg) is associated with odds of facial cleft about 1.4 times higher (a 40% increase); since clefts are rare, this approximates the increase in risk
26
Continuous IV
1. Compare means of cases and controls (t test)
2. Divide the IV into several categories and compute an odds ratio for each category
3. Model:
27
Model: linear regression
[Figure: sex (male / female) plotted against weight (kg, about 40 to 100), with a fitted linear regression line; R Sq Linear = 0.441]
• Proportion of males > 100%
• Proportion of females < 0%
• Distribution of the residuals [y - (a+bx)] is not normal
• This is not the right method!
28
Logistic regression
• The DV must be transformed
– Y: probability that sex = 1 (female), instead of 0 (male)
• "logit": $\text{logit}(Y) = \ln\left(\frac{Y}{1-Y}\right)$, where $\frac{Y}{1-Y}$ is the odds!
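A minimal sketch of the logit transformation and its inverse (the logistic function); the function names are hypothetical. Whatever value the linear part a + bx takes, the inverse logit maps it back to a probability strictly between 0 and 1, which fixes the "proportion > 100%" problem of the linear model.

```python
import math


def logit(p: float) -> float:
    """Log of the odds: ln(p / (1 - p)), defined for 0 < p < 1."""
    return math.log(p / (1 - p))


def inverse_logit(x: float) -> float:
    """Logistic function: maps any real number to a probability in (0, 1).
    (For x below about -709, math.exp(-x) overflows; a robust version would branch on the sign.)"""
    return 1 / (1 + math.exp(-x))


print(logit(0.5))                  # 0.0
print(inverse_logit(logit(0.8)))   # back to 0.8, up to rounding
print([round(inverse_logit(v), 3) for v in (-5, 0, 5)])  # [0.007, 0.5, 0.993]
```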
29
Interpretation of "b"
• In linear regression:
– b: mean change in Y expected for a one-unit increase in X
• By analogy, in logistic regression:
– b: change in logit(Y=1), i.e. in the log-odds, expected for a one-unit increase in X
$e^b$ = odds ratio of Y = 1 for a one-unit increase in X
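To see why $e^b$ is an odds ratio, the sketch below uses hypothetical coefficients a and b and shows that moving from x to x + 1 multiplies the odds by the same factor at every value of x.

```python
import math

# Hypothetical coefficients for logit(Y=1) = a + b*x
a, b = -3.0, 0.05


def odds(x: float) -> float:
    """Odds of Y = 1 at x, computed from the predicted probability."""
    p = 1 / (1 + math.exp(-(a + b * x)))
    return p / (1 - p)


for x in (40, 70, 100):
    print(x, odds(x + 1) / odds(x))   # the same ratio at every x
print(math.exp(b))                     # e^b, the odds ratio per unit of x
```

The probability itself does not change by a constant amount per unit of x; only the odds change by a constant factor.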
30
Odds ratio and logistic regression coefficients
• Dependent variable: Y = 1 (case) or = 0 (control)
• Independent variable: X = 1 (exposed) or = 0 (non-exposed)
• Model: logit(y) = a + bx
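• Equation among exposed: logit(y_exp) = a + b*1 = a + b
• Equation among non-exposed: logit(y_non-exp) = a + b*0 = a
• Solving for b: b = (a + b) - a = logit(y_exp) - logit(y_non-exp)
$$b = \log\frac{\Pr(y=1)_{\text{exp}}}{\Pr(y=0)_{\text{exp}}} - \log\frac{\Pr(y=1)_{\text{non-exp}}}{\Pr(y=0)_{\text{non-exp}}} = \log\left(\frac{\Pr(y=1)_{\text{exp}} / \Pr(y=0)_{\text{exp}}}{\Pr(y=1)_{\text{non-exp}} / \Pr(y=0)_{\text{non-exp}}}\right) = \log(\text{OR})$$
$$e^b = \text{OR}$$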
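The derivation can be checked numerically on the folic acid table. With a single binary X, the maximum-likelihood logistic fit reproduces the observed proportion of y = 1 in each group, so a and b can be read directly from the table. (In a case-control sample the intercept a depends on how many controls were sampled; only b is interpretable.) Variable names are hypothetical.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1 - p))


# Folic acid table (slide 23) coded as y = 1 for cleft (case), x = 1 for folic acid < 400 mg
cases_exp, controls_exp = 491, 618
cases_non, controls_non = 82, 145

p_exp = cases_exp / (cases_exp + controls_exp)   # Pr(y=1) among the exposed
p_non = cases_non / (cases_non + controls_non)   # Pr(y=1) among the non-exposed

intercept = logit(p_non)                 # a = logit(y_non-exp)
slope = logit(p_exp) - logit(p_non)      # b = logit(y_exp) - logit(y_non-exp)
odds_ratio = (cases_exp * controls_non) / (controls_exp * cases_non)

print(round(math.exp(slope), 4), round(odds_ratio, 4))   # 1.4049 1.4049
```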
31
Example weight - sex
• Weight in 4 categories:
– Odds ratio = 8.9
– The odds of being a man are multiplied by 8.9 for each step up to a higher weight category
• Weight in kilos:
– Odds ratio = 1.2
– The odds of being a man are multiplied by 1.2 for each additional kilo of weight
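Because each additional unit multiplies the odds by the same factor, the odds ratio for an increase of k units is OR^k. As a worked example with the per-kilo odds ratio above (the 10 kg step is chosen only for illustration):

$$\text{OR}_{10\,\text{kg}} = 1.2^{10} \approx 6.19, \qquad \text{equivalently } e^{10b} \text{ with } b = \ln 1.2$$

This is also why the two odds ratios on the slide (8.9 per category, 1.2 per kilo) cannot be compared directly: they refer to different units of X.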
32
Multiple IVs
• Example:
• Results:
– Odds ratio for one additional kg: 1.15 (p < 0.001), adjusted for height
– Odds ratio for one additional cm: 1.20 (p < 0.001), adjusted for weight
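Adjusted odds ratios like these come from a logistic regression with several IVs. As a sketch of how such a model can be fitted, here is a Newton-Raphson (equivalently IRLS) maximum-likelihood fit in plain Python, run on simulated data. The data, the "true" coefficients and the function names are all hypothetical, so the printed ORs will not match the slide; in practice a statistics package would be used. Each exp(coefficient) is an odds ratio adjusted for the other IV.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function (avoids overflow in math.exp)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [list(A[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for j in range(col, n + 1):
                M[r][j] -= factor * M[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood fit of logit(Pr(y=1)) = X beta by Newton-Raphson.
    Each row of X starts with 1.0 for the intercept."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        score = [0.0] * k                      # gradient of the log-likelihood
        info = [[0.0] * k for _ in range(k)]   # information matrix (minus the Hessian)
        for row, yi in zip(X, y):
            p = sigmoid(sum(bj * xj for bj, xj in zip(beta, row)))
            w = p * (1.0 - p)
            for j in range(k):
                score[j] += (yi - p) * row[j]
                for m in range(k):
                    info[j][m] += w * row[j] * row[m]
        step = solve(info, score)
        beta = [bj + sj for bj, sj in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Simulated data: y = 1 for a man; weight and height centred at 70 kg and 170 cm
rng = random.Random(1)
X, y = [], []
for _ in range(500):
    height = rng.gauss(170, 10)
    weight = 70 + 0.8 * (height - 170) + rng.gauss(0, 8)
    eta = 0.10 * (weight - 70) + 0.15 * (height - 170)   # arbitrary "true" coefficients
    y.append(1 if rng.random() < sigmoid(eta) else 0)
    X.append([1.0, weight - 70, height - 170])

beta = fit_logistic(X, y)
print(f"OR per kg (adjusted for height): {math.exp(beta[1]):.2f}")
print(f"OR per cm (adjusted for weight): {math.exp(beta[2]):.2f}")
```

Centring the predictors keeps the intercept near zero, which helps Newton's method start close to the solution.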
33
Conclusions
• Case-control study: a design for examining associations between risk factors and disease
– Mostly for rare diseases
– Efficient and cost-effective
• Odds ratio: a measure of association, often similar to the relative risk (when the disease is rare)
• Logistic regression: modeling method for binary dependent variables
34
Multilevel logistic regression
This is a random intercept multilevel logistic regression
35
Correct judgment of normality
• Statistical normality test (KS) was correct for 71.4% of the distributions.
• Only 57.1% for AD and JB.
• Levene test correct in both cases (because the data were normally distributed)
• You were correct for 71.4% of the distributions, but your errors were not the same as those of the statistical tests.
– All of you correctly identified the bimodal distribution
– You were not influenced by sample size