![Page 1: A Generalized Linear Model for Bernoulli Response …...When the response is Bernoulli or more generally, binomial, the logit link function is one natural choice. However, other link](https://reader033.vdocuments.us/reader033/viewer/2022041613/5e399604604e075f1075b412/html5/thumbnails/1.jpg)
A Generalized Linear Model for
Bernoulli Response Data
Copyright c�2017 Dan Nettleton (Iowa State University) Statistics 510 1 / 46
![Page 2: A Generalized Linear Model for Bernoulli Response …...When the response is Bernoulli or more generally, binomial, the logit link function is one natural choice. However, other link](https://reader033.vdocuments.us/reader033/viewer/2022041613/5e399604604e075f1075b412/html5/thumbnails/2.jpg)
Consider the Gauss-Markov linear model with normal
errors:y = X� + ✏, ✏ ⇠ N(0, �2
I).
Another way to write this model is
8 i = 1, . . . , n, y
i
⇠ N(µi
, �2
), µi
= x
0i
�,
and y
1
, . . . , y
n
are independent.
Copyright c�2017 Dan Nettleton (Iowa State University) Statistics 510 2 / 46
![Page 3: A Generalized Linear Model for Bernoulli Response …...When the response is Bernoulli or more generally, binomial, the logit link function is one natural choice. However, other link](https://reader033.vdocuments.us/reader033/viewer/2022041613/5e399604604e075f1075b412/html5/thumbnails/3.jpg)
This is a special case of what is known as a
generalized linear model.
Here is another special case:
8 i = 1, . . . , n, y
i
⇠ Bernoulli(⇡i
),
⇡i
=
exp(x
0i
�)
1 + exp(x
0i
�),
and y
1
, . . . , y
n
are independent.
Copyright c�2017 Dan Nettleton (Iowa State University) Statistics 510 3 / 46
![Page 4: A Generalized Linear Model for Bernoulli Response …...When the response is Bernoulli or more generally, binomial, the logit link function is one natural choice. However, other link](https://reader033.vdocuments.us/reader033/viewer/2022041613/5e399604604e075f1075b412/html5/thumbnails/4.jpg)
In each example, all responses are independent, and
each response is a draw from one type of distribution
whose parameters may depend on explanatoryvariables through a linear predictor x
0i
�.
Copyright c�2017 Dan Nettleton (Iowa State University) Statistics 510 4 / 46
![Page 5: A Generalized Linear Model for Bernoulli Response …...When the response is Bernoulli or more generally, binomial, the logit link function is one natural choice. However, other link](https://reader033.vdocuments.us/reader033/viewer/2022041613/5e399604604e075f1075b412/html5/thumbnails/5.jpg)
The second model, for the case of a binary response,
is often called a logistic regression model.
Binary responses are common (success/failure,
survive/die, good customer/bad customer, win/lose,etc.)
The logistic regression model can help us understandhow explanatory variables are related to the
probability of “success.”
Copyright c�2017 Dan Nettleton (Iowa State University) Statistics 510 5 / 46
![Page 6: A Generalized Linear Model for Bernoulli Response …...When the response is Bernoulli or more generally, binomial, the logit link function is one natural choice. However, other link](https://reader033.vdocuments.us/reader033/viewer/2022041613/5e399604604e075f1075b412/html5/thumbnails/6.jpg)
Example: Disease Outbreak Study
Source: Applied Linear Statistical Models, 4th edition,by Neter, Kutner, Nachtsheim, Wasserman (1996)
In a health study to investigate an epidemic outbreakof a disease that is spread by mosquitoes, individuals
were randomly sampled within two sectors in a city todetermine if the person had recently contracted the
disease under study.
Copyright c�2017 Dan Nettleton (Iowa State University) Statistics 510 6 / 46
![Page 7: A Generalized Linear Model for Bernoulli Response …...When the response is Bernoulli or more generally, binomial, the logit link function is one natural choice. However, other link](https://reader033.vdocuments.us/reader033/viewer/2022041613/5e399604604e075f1075b412/html5/thumbnails/7.jpg)
Response Variable
y
i
= 0 (person i does not have the disease)
y
i
= 1 (person i has the disease)
Copyright c�2017 Dan Nettleton (Iowa State University) Statistics 510 7 / 46
![Page 8: A Generalized Linear Model for Bernoulli Response …...When the response is Bernoulli or more generally, binomial, the logit link function is one natural choice. However, other link](https://reader033.vdocuments.us/reader033/viewer/2022041613/5e399604604e075f1075b412/html5/thumbnails/8.jpg)
Potential Explanatory Variables
age in years
socioeconomic status
1 = upper
2 = middle
3 = lower
sector (1 or 2)
Copyright c�2017 Dan Nettleton (Iowa State University) Statistics 510 8 / 46
![Page 9: A Generalized Linear Model for Bernoulli Response …...When the response is Bernoulli or more generally, binomial, the logit link function is one natural choice. However, other link](https://reader033.vdocuments.us/reader033/viewer/2022041613/5e399604604e075f1075b412/html5/thumbnails/9.jpg)
Questions of Interest
The potential explanatory variables and the response
were recorded for 196 randomly selected individuals.
Are any of these variables associated with theprobability of disease and if so how?
Copyright c�2017 Dan Nettleton (Iowa State University) Statistics 510 9 / 46
![Page 10: A Generalized Linear Model for Bernoulli Response …...When the response is Bernoulli or more generally, binomial, the logit link function is one natural choice. However, other link](https://reader033.vdocuments.us/reader033/viewer/2022041613/5e399604604e075f1075b412/html5/thumbnails/10.jpg)
We will demonstrate how to use R to fit a logisticregression model to this dataset.
Before delving more deeply into logistic regression,
we will review the basic facts of the Bernoulli
distribution.
Copyright c�2017 Dan Nettleton (Iowa State University) Statistics 510 10 / 46
![Page 11: A Generalized Linear Model for Bernoulli Response …...When the response is Bernoulli or more generally, binomial, the logit link function is one natural choice. However, other link](https://reader033.vdocuments.us/reader033/viewer/2022041613/5e399604604e075f1075b412/html5/thumbnails/11.jpg)
y ⇠ Bernoulli(⇡) has probability mass function
Pr(y = k) = f (k) =
(⇡k
(1 � ⇡)1�k for k 2 {0, 1}0 otherwise
Thus,
Pr(y = 0) = f (0) = ⇡0
(1 � ⇡)1�0
= 1 � ⇡
andPr(y = 1) = f (1) = ⇡1
(1 � ⇡)1�1
= ⇡.
Copyright c�2017 Dan Nettleton (Iowa State University) Statistics 510 11 / 46
![Page 12: A Generalized Linear Model for Bernoulli Response …...When the response is Bernoulli or more generally, binomial, the logit link function is one natural choice. However, other link](https://reader033.vdocuments.us/reader033/viewer/2022041613/5e399604604e075f1075b412/html5/thumbnails/12.jpg)
The variance of y is a function of the mean of y.
E(y) =
1X
k=0
kf (k) = 0 · (1 � ⇡) + 1 · ⇡ = ⇡
E(y
2
) =
1X
k=0
k
2
f (k) = 0
2 · (1 � ⇡) + 1
2 · ⇡ = ⇡
Var(y) = E(y
2
)� [E(y)]
2
= ⇡ � ⇡2
= ⇡(1 � ⇡)
Copyright c�2017 Dan Nettleton (Iowa State University) Statistics 510 12 / 46
![Page 13: A Generalized Linear Model for Bernoulli Response …...When the response is Bernoulli or more generally, binomial, the logit link function is one natural choice. However, other link](https://reader033.vdocuments.us/reader033/viewer/2022041613/5e399604604e075f1075b412/html5/thumbnails/13.jpg)
The Logistic Regression Model
For i = 1, . . . , n, y
i
⇠ Bernoulli(⇡i
),
where⇡
i
=
exp(x
0i
�)
1 + exp(x
0i
�)
and y
1
, . . . , y
n
are independent.
Copyright c�2017 Dan Nettleton (Iowa State University) Statistics 510 13 / 46
![Page 14: A Generalized Linear Model for Bernoulli Response …...When the response is Bernoulli or more generally, binomial, the logit link function is one natural choice. However, other link](https://reader033.vdocuments.us/reader033/viewer/2022041613/5e399604604e075f1075b412/html5/thumbnails/14.jpg)
The Logit Function
The function
g(⇡) = log
✓⇡
1 � ⇡
◆
is called the logit function.
The logit function maps the interval (0, 1) to the realline (�1,1).
⇡ is a probability, so log(
⇡1�⇡) is the log(odds), where
the odds of an event A ⌘ Pr(A)
1�Pr(A)
.
Copyright c�2017 Dan Nettleton (Iowa State University) Statistics 510 14 / 46
![Page 15: A Generalized Linear Model for Bernoulli Response …...When the response is Bernoulli or more generally, binomial, the logit link function is one natural choice. However, other link](https://reader033.vdocuments.us/reader033/viewer/2022041613/5e399604604e075f1075b412/html5/thumbnails/15.jpg)
Note that
g(⇡i
) = log
✓⇡
i
1 � ⇡i
◆
= log
exp(x
0i
�)
1 + exp(x
0i
�)
�1
1 + exp(x
0i
�)
�
= log[exp(x
0i
�)] = x
0i
�.
Copyright c�2017 Dan Nettleton (Iowa State University) Statistics 510 15 / 46
![Page 16: A Generalized Linear Model for Bernoulli Response …...When the response is Bernoulli or more generally, binomial, the logit link function is one natural choice. However, other link](https://reader033.vdocuments.us/reader033/viewer/2022041613/5e399604604e075f1075b412/html5/thumbnails/16.jpg)
Thus, the logistic regression model says that,
y
i
⇠ Bernoulli(⇡i
), where
log
✓⇡
i
1 � ⇡i
◆= x
0i
�
In Generalized Linear Models terminology, the logit is
called the link function because it “links” the mean ofy
i
(i.e., ⇡i
) to the linear predictor x
0i
�.
Copyright c�2017 Dan Nettleton (Iowa State University) Statistics 510 16 / 46
![Page 17: A Generalized Linear Model for Bernoulli Response …...When the response is Bernoulli or more generally, binomial, the logit link function is one natural choice. However, other link](https://reader033.vdocuments.us/reader033/viewer/2022041613/5e399604604e075f1075b412/html5/thumbnails/17.jpg)
For Generalized Linear Models, it is not necessarythat the mean of y
i
be a linear function of �.
Rather, some function of the mean of y
i
is a linear
function of �.
For logistic regression, that function is
logit(⇡i
) = log
✓⇡
i
1 � ⇡i
◆= x
0i
�.
Copyright c�2017 Dan Nettleton (Iowa State University) Statistics 510 17 / 46
![Page 18: A Generalized Linear Model for Bernoulli Response …...When the response is Bernoulli or more generally, binomial, the logit link function is one natural choice. However, other link](https://reader033.vdocuments.us/reader033/viewer/2022041613/5e399604604e075f1075b412/html5/thumbnails/18.jpg)
When the response is Bernoulli or more generally,binomial, the logit link function is one natural choice.However, other link functions can be considered.
Some common choices (that are also available in R)include the following:
Copyright c�2017 Dan Nettleton (Iowa State University) Statistics 510 18 / 46
1
![Page 19: A Generalized Linear Model for Bernoulli Response …...When the response is Bernoulli or more generally, binomial, the logit link function is one natural choice. However, other link](https://reader033.vdocuments.us/reader033/viewer/2022041613/5e399604604e075f1075b412/html5/thumbnails/19.jpg)
1 logit:
log
✓⇡
1 � ⇡
◆= x
0�.
2 probit:�
�1
(⇡) = x
0�,
where �
�1
(·) is the inverse of N(0, 1) CDF.
3 Complementary log-log (cloglog in R):
log(� log(1 � ⇡)) = x
0�.
Copyright c�2017 Dan Nettleton (Iowa State University) Statistics 510 19 / 46
![Page 20: A Generalized Linear Model for Bernoulli Response …...When the response is Bernoulli or more generally, binomial, the logit link function is one natural choice. However, other link](https://reader033.vdocuments.us/reader033/viewer/2022041613/5e399604604e075f1075b412/html5/thumbnails/20.jpg)
Although any of these link functions (or others) can
be used, the logit link has some advantages when itcomes to interpreting the results (as we will discuss
later).
Thus, the logit link is a good choice if it can provide a
good fit to the data.
Copyright c�2017 Dan Nettleton (Iowa State University) Statistics 510 20 / 46
![Page 21: A Generalized Linear Model for Bernoulli Response …...When the response is Bernoulli or more generally, binomial, the logit link function is one natural choice. However, other link](https://reader033.vdocuments.us/reader033/viewer/2022041613/5e399604604e075f1075b412/html5/thumbnails/21.jpg)
The log likelihood function for logistic regression is
`(� | y) =
nX
i=1
log[⇡y
i
i
(1 � ⇡i
)
1�y
i
]
=
nX
i=1
[y
i
log(⇡i
) + (1 � y
i
) log(1 � ⇡i
)]
=
nX
i=1
[y
i
{log(⇡i
)� log(1 � ⇡i
)}+ log(1 � ⇡i
)]
=
nX
i=1
y
i
log
✓⇡
i
1 � ⇡i
◆+ log(1 � ⇡
i
)
�
=
nX
i=1
[y
i
x
0i
� � log(1 + exp{x
0i
�})]
Copyright c�2017 Dan Nettleton (Iowa State University) Statistics 510 21 / 46
![Page 22: A Generalized Linear Model for Bernoulli Response …...When the response is Bernoulli or more generally, binomial, the logit link function is one natural choice. However, other link](https://reader033.vdocuments.us/reader033/viewer/2022041613/5e399604604e075f1075b412/html5/thumbnails/22.jpg)
For Generalized Linear Models, Fisher’s Scoring
Method is typically used to obtain an MLE for �,denoted as ˆ�.
Fisher’s Scoring Method is a variation of the
Newton-Raphson algorithm in which the Hessianmatrix (matrix of second partial derivatives) is
replaced by its expected value (-Fisher Information
matrix).
Copyright c�2017 Dan Nettleton (Iowa State University) Statistics 510 22 / 46
![Page 23: A Generalized Linear Model for Bernoulli Response …...When the response is Bernoulli or more generally, binomial, the logit link function is one natural choice. However, other link](https://reader033.vdocuments.us/reader033/viewer/2022041613/5e399604604e075f1075b412/html5/thumbnails/23.jpg)
For generalized Linear Models, Fisher’s scoring
method results in an iterative weighted least squaresprocedure.
The algorithm is presented for the general case inSection 2.5 of Generalized Linear Models 2nd Edition
(1989) by McCullough and Nelder.
Copyright c�2017 Dan Nettleton (Iowa State University) Statistics 510 23 / 46
![Page 24: A Generalized Linear Model for Bernoulli Response …...When the response is Bernoulli or more generally, binomial, the logit link function is one natural choice. However, other link](https://reader033.vdocuments.us/reader033/viewer/2022041613/5e399604604e075f1075b412/html5/thumbnails/24.jpg)
For sufficiently large samples, ˆ� is approximately
normal with mean � and a variance-covariancematrix that can be approximated by the estimated
inverse of the Fisher Information Matrix; i.e.,
ˆ�·⇠ N(�,ˆI�1
(
ˆ�))
Copyright c�2017 Dan Nettleton (Iowa State University) Statistics 510 24 / 46
![Page 25: A Generalized Linear Model for Bernoulli Response …...When the response is Bernoulli or more generally, binomial, the logit link function is one natural choice. However, other link](https://reader033.vdocuments.us/reader033/viewer/2022041613/5e399604604e075f1075b412/html5/thumbnails/25.jpg)
Inference can be conducted using the Wald approachor via likelihood ratio testing as discussed in our
course notes on likelihood-related topics.
For example, a Wald confidence interval for c
0� with
approximate coverage probability of 0.95 is given by
c
0ˆ� ± 1.96
qc
0ˆ
I
�1
(
ˆ�)c
Copyright c�2017 Dan Nettleton (Iowa State University) Statistics 510 25 / 46
![Page 26: A Generalized Linear Model for Bernoulli Response …...When the response is Bernoulli or more generally, binomial, the logit link function is one natural choice. However, other link](https://reader033.vdocuments.us/reader033/viewer/2022041613/5e399604604e075f1075b412/html5/thumbnails/26.jpg)
Interpretation of Logistic Regression Parameters
Let x = [x
1
, x
2
, . . . , x
j�1
, x
j
, x
j+1
, . . . , x
p
]
0.
Let ˜x = [x
1
, x
2
, . . . , x
j�1
, x
j
+ 1, x
j+1
, . . . , x
p
]
0.
In other words, ˜x is the same as x except that the jthexplanatory variable has been increased by one unit.
Let ⇡ =
exp(x
0�)1+exp(x
0�) and ⇡̃ =
exp(
˜
x
0�)1+exp(
˜
x
0�) .
Copyright c�2017 Dan Nettleton (Iowa State University) Statistics 510 26 / 46
![Page 27: A Generalized Linear Model for Bernoulli Response …...When the response is Bernoulli or more generally, binomial, the logit link function is one natural choice. However, other link](https://reader033.vdocuments.us/reader033/viewer/2022041613/5e399604604e075f1075b412/html5/thumbnails/27.jpg)
The Odds Ratio
⇡̃
1 � ⇡̃
�⇡
1 � ⇡= exp
⇢log
✓⇡̃
1 � ⇡̃
�⇡
1 � ⇡
◆�
= exp
⇢log
✓⇡̃
1 � ⇡̃
◆� log
✓⇡
1 � ⇡
◆�
= exp{˜x0� � x
0�}
= exp{(xj
+ 1)�j
� x
j
�j
}
= exp{�j
}.
Copyright c�2017 Dan Nettleton (Iowa State University) Statistics 510 27 / 46
![Page 28: A Generalized Linear Model for Bernoulli Response …...When the response is Bernoulli or more generally, binomial, the logit link function is one natural choice. However, other link](https://reader033.vdocuments.us/reader033/viewer/2022041613/5e399604604e075f1075b412/html5/thumbnails/28.jpg)
Thus, ⇡̃1�⇡̃ = exp(�
j
)
⇡1�⇡ .
All other explanatory variables held constant, theodds of success at x
j
+ 1 are exp(�j
) times the odds of
success at x
j
.
This is true regardless of the initial value x
j
.
Copyright c�2017 Dan Nettleton (Iowa State University) Statistics 510 28 / 46
![Page 29: A Generalized Linear Model for Bernoulli Response …...When the response is Bernoulli or more generally, binomial, the logit link function is one natural choice. However, other link](https://reader033.vdocuments.us/reader033/viewer/2022041613/5e399604604e075f1075b412/html5/thumbnails/29.jpg)
A one unit increase in the jth explanatory variable
(with all other explanatory variables held constant) isassociated with a multiplicative change in the odds ofsuccess by the factor exp(�
j
).
Copyright c�2017 Dan Nettleton (Iowa State University) Statistics 510 29 / 46
![Page 30: A Generalized Linear Model for Bernoulli Response …...When the response is Bernoulli or more generally, binomial, the logit link function is one natural choice. However, other link](https://reader033.vdocuments.us/reader033/viewer/2022041613/5e399604604e075f1075b412/html5/thumbnails/30.jpg)
If (Lj
,U
j
) is a 100(1 � ↵)% confidence interval for �j
,
then(exp(L
j
), exp(Uj
))
is a 100(1 � ↵)% confidence interval for exp(�j
).
Copyright c�2017 Dan Nettleton (Iowa State University) Statistics 510 30 / 46
P ( L;
E p ; E Uj ) = 100 ( i - a) %
# P ( exp ( L ;) E exp ( p ;) E exp ( U ;) ) = 100 It a) %
![Page 31: A Generalized Linear Model for Bernoulli Response …...When the response is Bernoulli or more generally, binomial, the logit link function is one natural choice. However, other link](https://reader033.vdocuments.us/reader033/viewer/2022041613/5e399604604e075f1075b412/html5/thumbnails/31.jpg)
Also, note that
⇡ =
exp(x
0�)
1 + exp(x
0�)=
1
1
exp(x
0�) + 1
=
1
1 + exp(�x
0�).
Thus, if (Lj
,Uj
) is a 100(1 � ↵)% confidence interval for
x
0�, then a 100(1 � ↵)% confidence interval for ⇡ is✓
1
1 + exp(�L
j
)
,1
1 + exp(�U
j
)
◆.
Copyright c�2017 Dan Nettleton (Iowa State University) Statistics 510 31 / 46
![Page 32: A Generalized Linear Model for Bernoulli Response …...When the response is Bernoulli or more generally, binomial, the logit link function is one natural choice. However, other link](https://reader033.vdocuments.us/reader033/viewer/2022041613/5e399604604e075f1075b412/html5/thumbnails/32.jpg)
> d=read.delim("http://dnett.github.io/S510/Disease.txt")
> head(d)
id age ses sector disease savings
1 1 33 1 1 0 1
2 2 35 1 1 0 1
3 3 6 1 1 0 0
4 4 60 1 1 0 1
5 5 18 3 1 1 0
6 6 26 3 1 0 0
>
> d$ses=factor(d$ses)
> d$sector=factor(d$sector)
Copyright c�2017 Dan Nettleton (Iowa State University) Statistics 510 32 / 46
LwtIanort THIS
VARIABLE IN
THIS EXAMPLE .
![Page 33: A Generalized Linear Model for Bernoulli Response …...When the response is Bernoulli or more generally, binomial, the logit link function is one natural choice. However, other link](https://reader033.vdocuments.us/reader033/viewer/2022041613/5e399604604e075f1075b412/html5/thumbnails/33.jpg)
> o=glm(disease˜age+ses+sector,
+ family=binomial(link=logit),
+ data=d)
>
> summary(o)
Call:
glm(formula = disease ˜ age + ses + sector,
family = binomial(link = logit),
data = d)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.6576 -0.8295 -0.5652 1.0092 2.0842
Copyright c�2017 Dan Nettleton (Iowa State University) Statistics 510 33 / 46
X = [1, aze ,SEI ,
se=3 ,Sester 2)
- WE Win LEARN ABOUT THESE LATER.
![Page 34: A Generalized Linear Model for Bernoulli Response …...When the response is Bernoulli or more generally, binomial, the logit link function is one natural choice. However, other link](https://reader033.vdocuments.us/reader033/viewer/2022041613/5e399604604e075f1075b412/html5/thumbnails/34.jpg)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.293933 0.436769 -5.252 1.5e-07 ***
age 0.026991 0.008675 3.111 0.001862 **
ses2 0.044609 0.432490 0.103 0.917849
ses3 0.253433 0.405532 0.625 0.532011
sector2 1.243630 0.352271 3.530 0.000415 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
(Dispersion parameter for binomial family taken to be 1)
Copyright c�2017 Dan Nettleton (Iowa State University) Statistics 510 34 / 46
Test of
WE WILL LEARN ABOUT THIS LATER.
![Page 35: A Generalized Linear Model for Bernoulli Response …...When the response is Bernoulli or more generally, binomial, the logit link function is one natural choice. However, other link](https://reader033.vdocuments.us/reader033/viewer/2022041613/5e399604604e075f1075b412/html5/thumbnails/35.jpg)
Null deviance: 236.33 on 195 degrees of freedom
Residual deviance: 211.22 on 191 degrees of freedom
AIC: 221.22
Number of Fisher Scoring iterations: 3
Copyright c�2017 Dan Nettleton (Iowa State University) Statistics 510 35 / 46
NULL Moser : logit ( Hi ) = M ( IT ,=Tz= - . . = Tn )
Fuu Morsel : logit ( Hi ) = Xi' @
- 2
lodi) n - i
d. a
-24=(5+45) - Zl ,=(E) n -5
![Page 36: A Generalized Linear Model for Bernoulli Response …...When the response is Bernoulli or more generally, binomial, the logit link function is one natural choice. However, other link](https://reader033.vdocuments.us/reader033/viewer/2022041613/5e399604604e075f1075b412/html5/thumbnails/36.jpg)
> coef(o)
(Intercept) age ses2 ses3 sector2
-2.29393347 0.02699100 0.04460863 0.25343316 1.24363036
> round(vcov(o),3)
(Intercept) age ses2 ses3 sector2
(Intercept) 0.191 -0.002 -0.083 -0.102 -0.080
age -0.002 0.000 0.000 0.000 0.000
ses2 -0.083 0.000 0.187 0.072 0.003
ses3 -0.102 0.000 0.072 0.164 0.039
sector2 -0.080 0.000 0.003 0.039 0.124
Copyright c�2017 Dan Nettleton (Iowa State University) Statistics 510 36 / 46
§ ' B ' B' 3 $4 B's
VEr ( E)
bees
![Page 37: A Generalized Linear Model for Bernoulli Response …...When the response is Bernoulli or more generally, binomial, the logit link function is one natural choice. However, other link](https://reader033.vdocuments.us/reader033/viewer/2022041613/5e399604604e075f1075b412/html5/thumbnails/37.jpg)
> confint(o)
Waiting for profiling to be done...
2.5 % 97.5 %
(Intercept) -3.19560769 -1.47574975
age 0.01024152 0.04445014
ses2 -0.81499026 0.89014587
ses3 -0.53951033 1.05825383
sector2 0.56319260 1.94992969
Copyright c�2017 Dan Nettleton (Iowa State University) Statistics 510 37 / 46
{ bs : Zech - 2llEb'D) extra }" * " " " �1� " ⇒ ¥%¥r%÷¥ )
( ,)<APPROXIMATE 95% CONFIDENCE
INTERVAL For Be .
![Page 38: A Generalized Linear Model for Bernoulli Response …...When the response is Bernoulli or more generally, binomial, the logit link function is one natural choice. However, other link](https://reader033.vdocuments.us/reader033/viewer/2022041613/5e399604604e075f1075b412/html5/thumbnails/38.jpg)
> oreduced=glm(disease˜age+sector,
+ family=binomial(link=logit),
+ data=d)
>
> anova(oreduced,o,test="Chisq")
Analysis of Deviance Table
Model 1: disease ˜ age + sector
Model 2: disease ˜ age + ses + sector
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 193 211.64
2 191 211.22 2 0.4193 0.8109
Copyright c�2017 Dan Nettleton (Iowa State University) Statistics 510 38 / 46
FIT A REDUCED MODEL THAT EXCLUDES Socioeconomic
status ( 5€52,
5€53 ) '
×=[1,
age ,sector}
~ ~ -
Ho :p3=fy=o
Ho :fsesz=Bses3=O
n -3 24dB) -2105.11=2 .ME) -2L ,dEr)
M - 5 Zl ,=(E)y
↳
,=p . VALUE
5-3 PM } 0.4193 )
![Page 39: A Generalized Linear Model for Bernoulli Response …...When the response is Bernoulli or more generally, binomial, the logit link function is one natural choice. However, other link](https://reader033.vdocuments.us/reader033/viewer/2022041613/5e399604604e075f1075b412/html5/thumbnails/39.jpg)
> o=oreduced
> anova(o,test="Chisq")
Analysis of Deviance Table
Model: binomial, link: logit
Response: disease
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 195 236.33
age 1 12.013 194 224.32 0.0005283 ***
sector 1 12.677 193 211.64 0.0003702 ***
Copyright c�2017 Dan Nettleton (Iowa State University) Statistics 510 39 / 46
SWITCH To REDUCED MODEL BECAUSE SES
NOT SIGNIFICANT
-
Nt -
[email protected] zettel )
'D / /) p( X. 312.013 )
zedo.sk,) # (
YannPCXT ? 12.677)
![Page 40: A Generalized Linear Model for Bernoulli Response …...When the response is Bernoulli or more generally, binomial, the logit link function is one natural choice. However, other link](https://reader033.vdocuments.us/reader033/viewer/2022041613/5e399604604e075f1075b412/html5/thumbnails/40.jpg)
> head(model.matrix(o))
(Intercept) age sector2
1 1 33 0
2 1 35 0
3 1 6 0
4 1 60 0
5 1 18 0
6 1 26 0
>
> b=coef(o)
> b
(Intercept) age sector2
-2.15965912 0.02681289 1.18169345
Copyright c�2017 Dan Nettleton (Iowa State University) Statistics 510 40 / 46
E
![Page 41: A Generalized Linear Model for Bernoulli Response …...When the response is Bernoulli or more generally, binomial, the logit link function is one natural choice. However, other link](https://reader033.vdocuments.us/reader033/viewer/2022041613/5e399604604e075f1075b412/html5/thumbnails/41.jpg)
> ci=confint(o)
Waiting for profiling to be done...
> ci
2.5 % 97.5 %
(Intercept) -2.86990940 -1.51605906
age 0.01010532 0.04421365
sector2 0.52854584 1.85407936
Copyright c�2017 Dan Nettleton (Iowa State University) Statistics 510 41 / 46
LIKELIHOOD BASED CONFIDENCE INTERVALS
![Page 42: A Generalized Linear Model for Bernoulli Response …...When the response is Bernoulli or more generally, binomial, the logit link function is one natural choice. However, other link](https://reader033.vdocuments.us/reader033/viewer/2022041613/5e399604604e075f1075b412/html5/thumbnails/42.jpg)
> #How should we interpret our estimate of
> #the slope coefficient on age?
> exp(b[2])
age
1.027176
> #All else equal, the odds of disease are about 1.027
> #times greater for someone age x+1 than for someone
> #age x. An increase of one year in age is associated
> #with an increase in the odds of disease by about 2.7%.
> #A 95% confidence interval for the multiplicative
> #increase factor is
> exp(ci[2,])
2.5 % 97.5 %
1.010157 1.045206
Copyright c�2017 Dan Nettleton (Iowa State University) Statistics 510 42 / 46
exp ( K)
![Page 43: A Generalized Linear Model for Bernoulli Response …...When the response is Bernoulli or more generally, binomial, the logit link function is one natural choice. However, other link](https://reader033.vdocuments.us/reader033/viewer/2022041613/5e399604604e075f1075b412/html5/thumbnails/43.jpg)
> #How should we interpret our estimate of
> #the slope coefficient on sector?
> exp(b[3])
sector2
3.25989
> #All else equal, the odds of disease are about 3.26
> #times greater for someone living in sector 2 than for
> #someone living in sector one.
> #A 95% confidence interval for the multiplicative
> #increase factor is
> exp(ci[3,])
2.5 % 97.5 %
1.696464 6.385816
Copyright c�2017 Dan Nettleton (Iowa State University) Statistics 510 43 / 46
exp ( $3 )
![Page 44: A Generalized Linear Model for Bernoulli Response …...When the response is Bernoulli or more generally, binomial, the logit link function is one natural choice. However, other link](https://reader033.vdocuments.us/reader033/viewer/2022041613/5e399604604e075f1075b412/html5/thumbnails/44.jpg)
> #Estimate the probability that a randomly
> #selected 40-year-old living in sector 2
> #has the disease.
> x=c(1,40,1)
> 1/(1+exp(-t(x)%*%b))
[,1]
[1,] 0.5236198
> #Approximate 95% confidence interval
> #for the probability in question.
> sexb=sqrt(t(x)%*%vcov(o)%*%x)
> cixb=c(t(x)%*%b-2*sexb,t(x)%*%b+2*sexb)
> 1/(1+exp(-cixb))
[1] 0.3965921 0.6476635
Copyright c�2017 Dan Nettleton (Iowa State University) Statistics 510 44 / 46
X.'
= [ I
,40
,I ]
I
I + expt - x.' E)
← a' Etesx
WE±z x.' ELITE
![Page 45: A Generalized Linear Model for Bernoulli Response …...When the response is Bernoulli or more generally, binomial, the logit link function is one natural choice. However, other link](https://reader033.vdocuments.us/reader033/viewer/2022041613/5e399604604e075f1075b412/html5/thumbnails/45.jpg)
> #Plot estimated probabilities as a function
> #of age for each sector.
>
> x=1:85
plot(x,1/(1+exp(-(b[1]+b[2]*x))),ylim=c(0,1),
type="l",col=4,lwd=2,xlab="Age",
ylab="Estimated Probability of Disease", cex.lab=1.3)
lines(x,1/(1+exp(-(b[1]+b[2]*x+b[3]))),col=2,lwd=2)
legend("topleft", legend=c("Sector 1","Sector 2"),
col=c(4,2),lwd=2)
Copyright c�2017 Dan Nettleton (Iowa State University) Statistics 510 45 / 46
![Page 46: A Generalized Linear Model for Bernoulli Response …...When the response is Bernoulli or more generally, binomial, the logit link function is one natural choice. However, other link](https://reader033.vdocuments.us/reader033/viewer/2022041613/5e399604604e075f1075b412/html5/thumbnails/46.jpg)
0 20 40 60 80
0.0
0.2
0.4
0.6
0.8
1.0
Age
Estim
ated
Pro
babi
lity
of D
isea
se
Sector 1
Sector 2
Copyright c�2017 Dan Nettleton (Iowa State University) Statistics 510 46 / 46