Source: staff.pubhealth.ku.dk
Logistic regression analysis
Thomas Alexander Gerds
Department of Biostatistics, University of Copenhagen
Carpenter et al.
Regression
The type of the outcome variable determines which kind of model is relevant:
Quantitative (continuous) outcome
I Linear regression
I Association parameters: differences between mean values
0-1 (binary) outcome
I Logistic regression
I Association parameters: odds ratio, differences between log(odds)
Categorical explanatory variable
Group 1, . . . , K (especially binary: K=2)
Linear regression, continuous outcome Y
mean(Y|group k) - mean(Y|reference group)
E.g., the average systolic blood pressure was higher in males compared to females
Logistic regression, binary outcome
Odds ratio = odds(group k) / odds(reference group)

E.g., the risk (odds) of coronary heart disease was higher in males compared to females
Quantitative (continuous) explanatory variables
Linear regression, continuous outcome Y
Differences in mean values per unit of X:

mean(Y|x+1) − mean(Y|x)

E.g., the average systolic blood pressure increased with age

Logistic regression, binary outcome
Differences in log(odds) per unit of X

Odds ratio = odds(x+1) / odds(x)
E.g., the risk (odds) of coronary heart disease increased with age
Linearity in regression models
For a continuous explanatory variable X, linearity means that the effect of a unit change of X on the outcome does not depend on the value of X.

Linear regression, continuous outcome Y

mean(Y|45+1) − mean(Y|45) = mean(Y|46+1) − mean(Y|46) = · · · = mean(Y|61+1) − mean(Y|61)

Logistic regression, binary outcome

odds(45+1)/odds(45) = odds(46+1)/odds(46) = · · · = odds(61+1)/odds(61)
Linearity is a model assumption which should be investigated.
Binary outcome regression

If the outcome variable is binary:

Yi = { 1 if i is diseased
       0 if i is not diseased

then linear regression

Yi = β0 + β1Xi

is not good. The regression line will go below 0 and above 1.
[Figure: scatter plot of a binary outcome (plotted as 0% and 100%) against an explanatory variable ranging from 20 to 80, with a fitted linear regression line; the y-axis runs from −25% to 125% and the fitted line leaves the 0–100% range.]
(Multiple) logistic regression
We denote the probability of the event Yi = 1 for a subject with explanatory variables Xi, Zi, . . . as

P(Yi = 1|Xi, Zi, . . . ) = pi.

The idea is to use the logit function. Instead of pi, which is bounded between 0 and 1, we apply linear regression to log(odds):

logit(pi) = log( pi / (1 − pi) ) = a + b1Zi + b2Xi + . . .

log( pi / (1 − pi) ) can take both negative and positive values.
Warm-up exercises

Complete the following table:

pi       oddsi    logit(pi)
0.001%
                  -7.0
                  -4.5
2.1%
8%
50%
                  3.8
99%
                  11.5
Hint: the following functions take a vector as argument
logit <- function(p){log(p/(1 - p))}
expit <- function(x){exp(x)/(1 + exp(x))}
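As a quick check of these helpers, one table row can be verified by hand (a sketch; only the 50% row is implied directly by the definitions, the other numbers here are illustrative):

```r
logit <- function(p){log(p/(1 - p))}
expit <- function(x){exp(x)/(1 + exp(x))}

# pi = 50% gives odds 1 and logit 0
logit(0.5)              # 0
0.5/(1 - 0.5)           # odds = 1

# expit inverts logit: a logit of -4.5 corresponds to a risk of about 1.1%
expit(-4.5)             # approx. 0.011
expit(logit(0.021))     # returns 0.021 again
```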
Example: Framingham study
I SEX: 1 for males, 2 for females
I AGE: age (years) at baseline (45-62)
I FRW: "Framingham relative weight" (pct.) at baseline (52-222; 11 persons have missing values)
I SBP: systolic blood pressure at baseline (mmHg) (90-300)
I DBP: diastolic blood pressure at baseline (mmHg) (50-160)
I CHOL: cholesterol at baseline (mg/100ml) (96-430)
I CIG: cigarettes per day at baseline (0-60; 1 person has a missing value)
I CHD: 0 if no "coronary heart disease" during follow-up, 1 if "coronary heart disease" at baseline (prevalent cases), x=2-10 if "coronary heart disease" was diagnosed at follow-up no. x
Framingham study: data preparation

library(data.table)
framingham <- fread("data/Framingham.csv")
## remove prevalent cases
framingham <- framingham[CHD!=1,]
## define factor levels/labels
framingham[,Smoke:=factor(CIG>0,levels=c(FALSE,TRUE),labels=c("No","Yes"))]
framingham[,Sex:=factor(SEX,levels=c(1,2),labels=c("Male","Female"))]
## define binary outcome variable
framingham[,Y:=factor(CHD>1,levels=c(FALSE,TRUE),labels=c("no CHD","CHD"))]
framingham

        ID SEX AGE FRW SBP DBP CHOL CIG CHD Smoke    Sex      Y
   1: 1070   2  45  93 100  62  220   0   0    No Female no CHD
   2: 1081   1  48  93 108  70  340   0   0    No   Male no CHD
   3: 1123   2  45  91 160 100  171   0   0    No Female no CHD
   4: 1215   1  50 110 110  70  224   0   0    No   Male no CHD
   5: 1267   1  48  85 110  70  229  25   0   Yes   Male no CHD
  ---
1359: 6432   1  47 113 155 105  175   5   5   Yes   Male    CHD
1360: 6434   1  59  98 124  84  227  20   2   Yes   Male    CHD
1361: 6437   2  55 111 108  74  231   0   0    No Female no CHD
1362: 6440   1  49 114 110  80  218  20   0   Yes   Male no CHD
1363: 6442   2  51  95 152  90  199   1   0   Yes Female no CHD
Framingham outcome
i = subject number: 1, . . . , 1406
Xi = age of subject i
Zi = gender of subject i
Vi = smoking status of subject i
Yi = { 1 if subject i develops coronary heart disease (CHD)
       0 if subject i does not develop CHD

pi = P(Yi = 1|Xi, Vi, Zi, ...) = probability of CHD of subject i

pi / (1 − pi) = odds of CHD of subject i
A binary explanatory variable
Zi = { 1 if i is a man
       0 if i is a woman

Simple logistic regression:

log( pi / (1 − pi) ) = a + bZi = { a       females
                                   a + b   males

That means,

b = (a + b) − a = log(odds for males) − log(odds for females)
  = log( Odds for males / Odds for females )

and

−b = a − (a + b) = log( Odds for females / Odds for males )
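This relation can be checked numerically with the CHD counts from the 2 by 2 table a few slides further on (no CHD/CHD: 479/164 for men, 616/104 for women); a sketch:

```r
# odds of CHD by sex, taken from framingham[,table(Sex,Y)]
odds_male   <- 164/479
odds_female <- 104/616

b <- log(odds_male) - log(odds_female)  # log odds ratio, men vs. women
exp(b)    # approx. 2.03, odds ratio for males vs. females
exp(-b)   # approx. 0.49, odds ratio for females vs. males
```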
Exercise: 2 by 2 contingency table
framingham[,table(Sex,Y)]
        no CHD  CHD
Male       479  164
Female     616  104
I use the tools for 2x2 tables
I compute the odds ratio with 95% confidence limits and corresponding p-value
I report and interpret the result in a sentence
Logistic regression in R
fit1 <- glm(Y∼Sex, data=framingham, family=binomial)
I Y ~ Sex tells R that Y is the outcome and Sex the explanatory variable
I data=framingham tells R where to find Y and Sex
I glm means generalized linear model
I family=binomial tells R that the outcome is binary and the logit link should be used
Logistic regression in R

fit1 <- glm(Y~Sex,data=framingham,family=binomial)
summary(fit1)

Call:
glm(formula = Y ~ Sex, family = binomial, data = framingham)

Deviance Residuals:
    Min      1Q  Median      3Q     Max
-0.7674 -0.7674 -0.5586 -0.5586  1.9672

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.07183    0.09047 -11.847     < 2e-16 ***
SexFemale   -0.70702    0.13937  -5.073 0.000000392 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1351.2 on 1362 degrees of freedom
Residual deviance: 1324.9 on 1361 degrees of freedom
AIC: 1328.9

Number of Fisher Scoring iterations: 4
Number of Fisher Scoring iterations: 4
Confidence intervals for the odds ratio
library(Publish)
fit1 <- glm(Y~Sex,data=framingham,family=binomial)
publish(fit1)

Variable   Units  OddsRatio        CI.95  p-value
     Sex    Male       1.00  [1.00;1.00]        1
          Female       0.49  [0.38;0.65]  <0.0001

Note: 0.49 = exp(−0.71)

Women have a significantly lower risk of developing coronary heart disease than men (odds ratio: 0.49, 95%-CI: [0.38;0.65], p-value <0.0001).
Changing the reference level
framingham[,sex:=relevel(Sex,"Female")]
fit1a <- glm(Y~sex,data=framingham,family=binomial)
publish(fit1a)

Variable   Units  OddsRatio        CI.95  p-value
     sex  Female       1.00  [1.00;1.00]        1
            Male       2.03  [1.54;2.66]  <0.0001

Note: 2.03 = exp(0.71)

Men have a significantly higher risk of developing coronary heart disease than women (odds ratio: 2.03, 95%-CI: [1.5;2.7], p-value <0.0001).
Two explanatory variables:
Zi = { 1 if i male       and   Vi = { 1 if i smokes
       0 female                       0 otherwise

Data can be summarized as two 2 by 2 tables in two ways

Males (Z=1)            Females (Z=0)
       V=0  V=1               V=0  V=1
Y=0    191  288        Y=0    423  192
Y=1     57  107        Y=1     77   27

Smokers (V=1)          Non-smokers (V=0)
     Males  Females         Males  Females
Y=0    288      192    Y=0    191      423
Y=1    107       27    Y=1     57       77
Cochran-Mantel-Haenszel test
In this way, we can study the effect of smoking adjusted for sex:
ORMantel-Haenszel = 0.97; p > 0.05
and also study the effect of Sex adjusted for smoking:
ORMantel-Haenszel = 2.03; p < 0.05
Conclusions
I there is no significant effect of smoking adjusted for sex
I there is a significant effect of sex adjusted for smoking
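A sketch of how the smoking analysis can be reproduced with stats::mantelhaen.test, using the counts from the sex-stratified 2 by 2 tables above. Note the orientation: with this table layout the estimated common odds ratio compares smokers to non-smokers (approx. 1.03), the reciprocal of the 0.97 quoted above:

```r
# sex-stratified 2x2 tables (rows: Y=0/Y=1, columns: non-smoker/smoker)
tab <- array(c(191, 57, 288, 107,   # males
               423, 77, 192, 27),   # females
             dim = c(2, 2, 2),
             dimnames = list(Y     = c("0", "1"),
                             Smoke = c("No", "Yes"),
                             Sex   = c("Male", "Female")))

mantelhaen.test(tab)  # common odds ratio approx. 1.03, p > 0.05
```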
Logistic regression model: two binary variables
log( pi / (1 − pi) ) = a + b1Zi + b2Vi

  = a              Female non-smoker
    a + b1         Male non-smoker
    a + b2         Female smoker
    a + b1 + b2    Male smoker

Note: b1 = (a + b1) − a
         = (a + b1 + b2) − (a + b2)
         = logOR (males vs. females for given smoking status)

and   b2 = (a + b2) − a
         = (a + b1 + b2) − (a + b1)
         = logOR (smokers vs. non-smokers for given gender)
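Using the coefficients reported in the summary output of fit2 on the next slide (intercept −1.09215, SexFemale −0.69521, SmokeYes 0.03296; note that glm uses male non-smoker as reference, the mirror image of the Z-coding above), the four group-specific log(odds) and the two odds ratios can be written out as a sketch:

```r
# coefficients copied from summary(fit2); reference group: male non-smoker
a  <- -1.09215   # intercept: log(odds) for a male non-smoker
b1 <- -0.69521   # SexFemale
b2 <-  0.03296   # SmokeYes

# log(odds) in the four groups
lo <- c(male_nonsmoker   = a,
        female_nonsmoker = a + b1,
        male_smoker      = a + b2,
        female_smoker    = a + b1 + b2)

exp(b1)  # odds ratio females vs. males, identical for smokers and non-smokers (approx. 0.50)
exp(b2)  # odds ratio smokers vs. non-smokers, identical for both sexes (approx. 1.03)
```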
Logistic regression results

fit2=glm(Y~Sex+Smoke,data=framingham,family=binomial)
summary(fit2)

Call:
glm(formula = Y ~ Sex + Smoke, family = binomial, data = framingham)

Deviance Residuals:
    Min      1Q  Median      3Q     Max
-0.7716 -0.7607 -0.5564 -0.5564  1.9708

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.09215    0.12717  -8.588    < 2e-16 ***
SexFemale   -0.69521    0.14635  -4.750 0.00000203 ***
SmokeYes     0.03296    0.14457   0.228       0.82
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1350.8 on 1361 degrees of freedom
Residual deviance: 1324.5 on 1359 degrees of freedom
  (1 observation deleted due to missingness)
AIC: 1330.5
Number of Fisher Scoring iterations: 4
Extracting odds ratios with confidence intervals
publish(fit2,intercept=TRUE)
   Variable   Units  Missing  OddsRatio        CI.95  p-value
(Intercept)                        0.34  [0.26;0.43]  <0.0001
        Sex    Male        0        Ref
             Female                 0.50  [0.37;0.66]  <0.0001
      Smoke      No        1        Ref
                Yes                 1.03  [0.78;1.37]   0.8196

Logistic regression adjusted for smoking status showed that the odds of CHD were halved in women compared to men (odds ratio: 0.50, 95%-CI: [0.37;0.66], p<0.0001).

Exercise: Based on this model, compute the risk of CHD for a non-smoking woman, a non-smoking man, a smoking woman and a smoking man.
Simple logistic regression: categorical explanatory variable:
Categorize age into 4 intervals:
45-48, 49-52, 53-56, 57-62
Summarize in 2 by 4 table
       X = 0   X = 1   X = 2   X = 3
       45-48   49-52   53-56   57-62
Y = 0    308     298     254     235   1095
Y = 1     51      61      64      92    268
         359     359     318     327   1363
(Note: both males and females)
ANOVA: χ2 test
We may test whether the risk of CHD differs between the 4 age groups using a chi-square test statistic, in this case with 3 degrees of freedom:

Null hypothesis:

Odds(age 45−48) = Odds(age 49−52) = Odds(age 53−56) = Odds(age 57−62)

Σ (OBS − EXP)² / EXP = 23.29 ∼ χ²(3), P < 0.001
Conclusion: CHD-risk differs significantly between the age groups.
Logistic regression: categorical variable with 4 levels:
log( pi / (1 − pi) ) =
  a         age 45−48
  a + b1    age 49−52
  a + b2    age 53−56
  a + b3    age 57−62

Reference category 45-48:

a  = log( Odds(45−48) )
b1 = log( Odds(49−52) / Odds(45−48) )
b2 = log( Odds(53−56) / Odds(45−48) )
b3 = log( Odds(57−62) / Odds(45−48) )
Results

framingham[,AgeCut:=cut(AGE,
                        c(40,48,52,56,99),
                        labels=c("45-48","49-52","53-56","57-62"))]
fit3=glm(Y~AgeCut,data=framingham,family=binomial)
publish(fit3,intercept=1L)

   Variable  Units  OddsRatio        CI.95   p-value
(Intercept)              0.17  [0.12;0.22]  < 0.0001
     AgeCut  45-48        Ref
             49-52       1.24  [0.82;1.85]   0.30425
             53-56       1.52  [1.02;2.28]   0.04151
             57-62       2.36  [1.61;3.46]  < 0.0001

Notes:

1. The interpretation depends on the cut-off values
2. Not all comparisons are in the table; for example, the odds ratio for group 49-52 vs 53-56 is not. But it can be computed, and you know how.
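For example, the 49-52 vs. 53-56 odds ratio is the ratio of the two reported odds ratios against the common reference group (a sketch using the rounded values from the table above):

```r
# odds ratios against the common reference group 45-48
or_49_52 <- 1.24
or_53_56 <- 1.52

or_49_52 / or_53_56  # odds ratio for 49-52 vs. 53-56, approx. 0.82
```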
Quantitative explanatory factor
It is often more natural to include the variable AGE (in years) as a quantitative explanatory factor in the model (i.e., NO grouping)

log( pi / (1 − pi) ) = a + b · agei

a = log(odds(age = 0))
b = log(odds(age = x+1)) − log(odds(age = x)) for any age x

Interpretation: For each year,

exp(b) = odds ratio

is the factor by which the odds of CHD increase with each one unit increase of age (here 1 year).
Results

fit5=glm(Y~AGE,data=framingham,family=binomial)
summary(fit5)

Call:
glm(formula = Y ~ AGE, family = binomial, data = framingham)

Deviance Residuals:
    Min      1Q  Median      3Q     Max
-0.8600 -0.7052 -0.6082 -0.5224  2.0294

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.88431    0.77372  -6.313 0.000000000274 ***
AGE          0.06581    0.01446   4.550 0.000005374208 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1351.2 on 1362 degrees of freedom
Residual deviance: 1330.2 on 1361 degrees of freedom
AIC: 1334.2
Number of Fisher Scoring iterations: 4
Results

One year change in age

fit5=glm(Y~AGE,data=framingham,family=binomial)
publish(fit5,intercept=1L)

   Variable  Units  OddsRatio        CI.95   p-value
(Intercept)              0.01  [0.00;0.03]  < 0.0001
        AGE              1.07  [1.04;1.10]  < 0.0001

10-year change in age

framingham[,age10:=AGE/10]
fit5=glm(Y~age10,data=framingham,family=binomial)
publish(fit5,intercept=1)

   Variable  Units  OddsRatio        CI.95   p-value
(Intercept)              0.01  [0.00;0.03]  < 0.0001
      age10              1.93  [1.45;2.56]  < 0.0001
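The two tables are consistent: the effect is linear on the log(odds) scale, so the 10-year odds ratio is the one-year odds ratio raised to the power 10. A quick check with the estimated coefficient b = 0.06581 from the summary output above:

```r
b <- 0.06581   # AGE coefficient from summary(fit5)

exp(b)         # one-year odds ratio, approx. 1.07
exp(10 * b)    # ten-year odds ratio, approx. 1.93

# the two are the same quantity on different scales
all.equal(exp(10 * b), exp(b)^10)  # TRUE
```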
Exercises
If we subtract the value 50 from each person's age:

framingham[,Age50:=AGE-50]
fit5a=glm(Y~Age50,data=framingham,family=binomial)
publish(fit5a,intercept=1)

   Variable  Units  OddsRatio        CI.95   p-value
(Intercept)              0.20  [0.17;0.24]  < 0.0001
      Age50              1.07  [1.04;1.10]  < 0.0001

1. Report the coronary heart disease risk of a person aged 50.
2. Report the association between age and risk of coronary heart disease in a sentence with confidence interval and p-value.
Multiple logistic regression
Additive effects of several explanatory variables:
log( pi / (1 − pi) ) = a + b1Zi + b2Xi + . . .

Multiple logistic regression is a way to control confounding:

The effect on the outcome (odds ratio) of each explanatory variable is mutually adjusted for the other explanatory variables.

I The model assumes that the effect (odds ratio) of Z on Y is the same for all values of X.
I In other words: the effect of Z on Y is not modified by the values of X (no statistical interaction).
Illustration of what "mutually adjusted" means
Additive model (no statistical interactions)
log( pi / (1 − pi) ) = a + b1Zi + b2Xi

Effect of sex Zi (0 = female, 1 = male) adjusted for age (Xi)

odds(age=50, male) / odds(age=50, female)
  = exp(a + b1 + b2 · 50) / exp(a + b2 · 50)
  = exp(a + b1 + b2 · 50 − a − b2 · 50)
  = exp(b1).
The result is the same for age 46 and age 61 and all other ages.
Illustration of what "mutually adjusted" means (continued)

Effect of age (Xi) for males:

odds(age=51, male) / odds(age=50, male)
  = exp(a + b1 + b2 · 51) / exp(a + b1 + b2 · 50)
  = exp(a + b1 + b2 · 51 − a − b1 − b2 · 50)
  = exp(b2).

The result is the same for females:

odds(age=51, female) / odds(age=50, female)
  = exp(a + b2 · 51) / exp(a + b2 · 50)
  = exp(a + b2 · 51 − a − b2 · 50)
  = exp(b2).
Linearity means that the result is the same for a comparison of age63 and age 62 and all other one year differences.
Results

fit.add=glm(Y ~ AGE + Sex, family = binomial,
            data = framingham)
summary(fit.add)

Call:
glm(formula = Y ~ AGE + Sex, family = binomial, data = framingham)

Deviance Residuals:
    Min      1Q  Median      3Q     Max
-0.9910 -0.6927 -0.5958 -0.4500  2.1913

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.59208    0.78019  -5.886 0.00000000396 ***
AGE          0.06672    0.01458   4.575 0.00000475151 ***
SexFemale   -0.71613    0.14052  -5.096 0.00000034612 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1351.2 on 1362 degrees of freedom
Residual deviance: 1303.5 on 1360 degrees of freedom
AIC: 1309.5
Number of Fisher Scoring iterations: 4
Results
fit.add=glm(Y ~ AGE + Sex, family = binomial, data = framingham)
publish(fit.add)

Variable   Units  OddsRatio        CI.95  p-value
     AGE               1.07  [1.04;1.10]  <0.0001
     Sex    Male       1.00  [1.00;1.00]        1
          Female       0.49  [0.37;0.64]  <0.0001

Logistic regression was used to investigate gender differences in odds (risks) of CHD adjusted for age.

The age-adjusted odds ratio was 0.49 (95%-CI: [0.37;0.64]), showing that the risks of CHD were significantly lower for women compared to men (p<0.0001).
Predicted risks based on logistic regression model
A logistic regression model can be used to predict personalized risks:
log( pi / (1 − pi) ) = a + b1Zi + b2Xi + . . .

is equivalent to

pi = exp(a + b1Zi + b2Xi + . . . ) / (1 + exp(a + b1Zi + b2Xi + . . . ))

The risks (and risk ratios) depend on all explanatory variables simultaneously.
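For example, with the coefficients of fit.add copied from the earlier summary output (intercept −4.59208, AGE 0.06672, SexFemale −0.71613), the predicted risk for a 50-year-old woman can be computed by hand; it agrees with the value 0.1221 that predict() returns on the next slide:

```r
# coefficients copied from summary(fit.add)
a        <- -4.59208
b_age    <-  0.06672
b_female <- -0.71613

lp <- a + b_age * 50 + b_female   # linear predictor: log(odds) for a 50-year-old woman
p  <- exp(lp) / (1 + exp(lp))     # back-transform to the risk scale
p                                 # approx. 0.122
```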
Predicted risks based on logistic regression model

Prediction makes most sense for new data

mydata=expand.grid(AGE=c(50,55),Sex=factor(c("Female","Male")))
setDT(mydata)
mydata

   AGE    Sex
1:  50 Female
2:  55 Female
3:  50   Male
4:  55   Male

mydata[,risk:=predict(fit.add,newdata=mydata,type="response")]
mydata

   AGE    Sex      risk
1:  50 Female 0.1221381
2:  55 Female 0.1626353
3:  50   Male 0.2216284
4:  55   Male 0.2844255
Visualization of predicted risks
mydata2 <- setDT(expand.grid(AGE=seq(45,62,1),Sex=factor(c("Female","Male"))))
mydata2[,risk:=predict(fit.add,newdata=mydata2,type="response")]
library(ggplot2)ggplot(mydata2,aes(x=AGE,y=risk,group
=Sex,colour=Sex))+geom_line()+ylim(c(0,1))+xlab("Age (years)")+ylab("Risk of CHD")
[Figure: predicted risk of CHD (y-axis, 0 to 1) versus age in years (x-axis, 45 to 62), one line per Sex (Female, Male).]
40 / 51
![Page 48: Logistic regression analysis - staff.pubhealth.ku.dk](https://reader031.vdocuments.us/reader031/viewer/2022012013/61588420bb4add155a2bf0b3/html5/thumbnails/48.jpg)
Example with more variables
framingham[, Chol10 := CHOL/10]
fit.multi <- glm(Y ~ AGE + Sex + Chol10 + SBP + Smoke,
                 family = binomial, data = framingham)
publish(fit.multi)

Variable Units  Missing OddsRatio CI.95       p-value
AGE             0       1.06      [1.02;1.09] 0.0004181
Sex      Male   0       Ref
         Female         0.38      [0.28;0.52] < 0.0001
Chol10          0       1.05      [1.02;1.08] 0.0026086
SBP             0       1.02      [1.01;1.02] < 0.0001
Smoke    No     1       Ref
         Yes            1.19      [0.88;1.60] 0.2510447
41 / 51
![Page 49: Logistic regression analysis - staff.pubhealth.ku.dk](https://reader031.vdocuments.us/reader031/viewer/2022012013/61588420bb4add155a2bf0b3/html5/thumbnails/49.jpg)
Exercise
1. Report the effect of cholesterol on coronary heart disease from the multiple logistic regression model.
2. Predict the coronary heart disease risks of four smoking females, all aged 50 and with SBP 150, but with different cholesterol values:
   I person 1: 235, person 2: 245, person 3: 351, person 4: 361

   mydata = data.frame(AGE = 50,
                       Sex = factor("Female", levels(framingham$Sex)),
                       Smoke = factor("Yes", levels(framingham$Smoke)),
                       SBP = 150,
                       Chol10 = c(23.5, 24.5, 35.1, 36.1))
3. Compute the risk ratios for 10-unit cholesterol changes from 245 to 235 and from 361 to 351.
4. Repeat 2. and 3. for a male person.
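The mechanics of the risk-ratio computation can be sketched generically in Python; the intercept and cholesterol coefficient below are hypothetical placeholders, not the fitted fit.multi values, so only the pattern matters, not the numbers:

```python
import math

def risk(lp):
    """Risk from a linear predictor via the inverse logit."""
    return math.exp(lp) / (1 + math.exp(lp))

# Hypothetical placeholder coefficients (not the fitted values):
intercept, b_chol10 = -3.0, 0.05

def lp(chol10):
    # Linear predictor for one fixed person, varying only Chol10
    return intercept + b_chol10 * chol10

rr_low  = risk(lp(24.5)) / risk(lp(23.5))  # cholesterol 245 vs 235
rr_high = risk(lp(36.1)) / risk(lp(35.1))  # cholesterol 361 vs 351

# The same 10-unit change corresponds to the same odds ratio
# exp(b_chol10) everywhere, but the two risk ratios differ.
```

This illustrates the take-home point that risk ratios, unlike odds ratios, depend on where on the covariate scale the comparison is made.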
42 / 51
![Page 50: Logistic regression analysis - staff.pubhealth.ku.dk](https://reader031.vdocuments.us/reader031/viewer/2022012013/61588420bb4add155a2bf0b3/html5/thumbnails/50.jpg)
Statistical interaction = Effect modification
The effect of X on Y depends on Z
43 / 51
![Page 51: Logistic regression analysis - staff.pubhealth.ku.dk](https://reader031.vdocuments.us/reader031/viewer/2022012013/61588420bb4add155a2bf0b3/html5/thumbnails/51.jpg)
Effect modification
A statistical interaction (effect modification) requires 3 variables:
I two explanatory variables X, Z
I one outcome Y
In logistic regression, the odds ratio which describes the effect of X on the odds of Y=1 depends on the value of Z.

Symmetry
If the effect of variable X on Y is modified by Z, then the effect of Z on Y is also modified by X.

Example
The age (Z) effect on the CHD risk (Y) may depend on sex (X).
44 / 51
![Page 52: Logistic regression analysis - staff.pubhealth.ku.dk](https://reader031.vdocuments.us/reader031/viewer/2022012013/61588420bb4add155a2bf0b3/html5/thumbnails/52.jpg)
Statistical interaction in R
summary(glm(Y ~ AGE * SEX, family = binomial, data = framingham))
Alternative notation:
summary(glm(Y ~ AGE + SEX + AGE:SEX, family = binomial, data = framingham))
45 / 51
![Page 53: Logistic regression analysis - staff.pubhealth.ku.dk](https://reader031.vdocuments.us/reader031/viewer/2022012013/61588420bb4add155a2bf0b3/html5/thumbnails/53.jpg)
Results

summary(glm(Y ~ AGE * SEX, family = binomial, data = framingham))

Call:
glm(formula = Y ~ AGE * SEX, family = binomial, data = framingham)

Deviance Residuals:
    Min      1Q  Median      3Q     Max
-0.9171 -0.7284 -0.6074 -0.4010  2.3029

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.091694   2.361000   0.039   0.9690
AGE         -0.007736   0.044223  -0.175   0.8611
SEX         -3.544593   1.604311  -2.209   0.0271 *
AGE:SEX      0.052967   0.029871   1.773   0.0762 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1351.2 on 1362 degrees of freedom
Residual deviance: 1300.4 on 1359 degrees of freedom
AIC: 1308.4

Number of Fisher Scoring iterations: 4
46 / 51
![Page 54: Logistic regression analysis - staff.pubhealth.ku.dk](https://reader031.vdocuments.us/reader031/viewer/2022012013/61588420bb4add155a2bf0b3/html5/thumbnails/54.jpg)
Statistical interaction
publish(glm(Y ~ AGE * Sex, family = binomial, data = framingham))
Variable         Units OddsRatio CI.95       p-value
AGE: Sex(Male)         1.05      [1.01;1.09] 0.01629
AGE: Sex(Female)       1.10      [1.05;1.15] < 0.0001
Notes:
I The main effects for AGE and Sex have no interpretation (and are therefore not shown).
I One year more in age increases the odds by 5% in males andby 10% in females.
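The two odds ratios in the table can be recovered from the summary() coefficients on the previous slide. This assumes SEX is coded numerically as 1 = male and 2 = female (an assumption, but it is consistent with both displayed values); the age odds ratio at a given SEX value is then exp(b_AGE + b_AGE:SEX · SEX):

```python
import math

# Coefficients from the summary() output two slides back:
b_age, b_age_sex = -0.007736, 0.052967

# Assumed coding: SEX = 1 (male), SEX = 2 (female)
or_age_male   = math.exp(b_age + b_age_sex * 1)   # close to 1.05
or_age_female = math.exp(b_age + b_age_sex * 2)   # close to 1.10
```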
47 / 51
![Page 55: Logistic regression analysis - staff.pubhealth.ku.dk](https://reader031.vdocuments.us/reader031/viewer/2022012013/61588420bb4add155a2bf0b3/html5/thumbnails/55.jpg)
Predicted risk of the model with an additive effect of age and sex (no effect modification)

[Figure: predicted CHD risk (0 % to 100 %) versus age in years (45 to 60), one curve each for Male and Female.]
48 / 51
![Page 56: Logistic regression analysis - staff.pubhealth.ku.dk](https://reader031.vdocuments.us/reader031/viewer/2022012013/61588420bb4add155a2bf0b3/html5/thumbnails/56.jpg)
Predicted risk of the model with an interaction between age and sex (with effect modification)

[Figure: predicted CHD risk (0 % to 100 %) versus age in years (45 to 60), one curve each for Male and Female.]
49 / 51
![Page 57: Logistic regression analysis - staff.pubhealth.ku.dk](https://reader031.vdocuments.us/reader031/viewer/2022012013/61588420bb4add155a2bf0b3/html5/thumbnails/57.jpg)
Testing for statistical interaction

fit.add = glm(Y ~ AGE + SEX, family = binomial, data = framingham)
fit.int = glm(Y ~ AGE * SEX, family = binomial, data = framingham)
anova(fit.add, fit.int, test = "Chisq")
Analysis of Deviance Table
Model 1: Y ~ AGE + SEX
Model 2: Y ~ AGE * SEX

  Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1      1360     1303.5
2      1359     1300.4  1   3.1676  0.07511 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
There is no statistically significant modification of the age effect by gender (p > 0.05).
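The p-value of the likelihood ratio test can be reproduced by hand: the deviance drop of 3.1676 is compared to a chi-square distribution with 1 degree of freedom. A quick Python check using only the standard library (for 1 df, the chi-square upper-tail probability equals erfc(√(x/2))):

```python
import math

# Likelihood ratio statistic from the anova() table (Df = 1):
chisq = 3.1676

# Chi-square(1) upper-tail probability via the complementary
# error function: P(X > x) = erfc(sqrt(x/2)) for 1 df
p_value = math.erfc(math.sqrt(chisq / 2))
# p_value is about 0.0751, matching Pr(>Chi) in the table
```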
50 / 51
![Page 58: Logistic regression analysis - staff.pubhealth.ku.dk](https://reader031.vdocuments.us/reader031/viewer/2022012013/61588420bb4add155a2bf0b3/html5/thumbnails/58.jpg)
Take home messages
I (Multiple) logistic regression describes associations between one or several explanatory variables and the risk of an event (binary outcome).
I Analysis of an exposure of interest can be adjusted for potential confounders.
I In an additive model (no interactions), odds ratios do not depend on the other explanatory variables.
I Risks and risk ratios predicted by the model depend on the other explanatory variables.
I Linearity and absence of interaction are assumptions which should be investigated.
51 / 51