
Introduction to Generalized Linear Models

STAC51: Categorical Data Analysis

Mahinda Samarakoon

March 23, 2016


Table of contents

1. Introduction to Generalized Linear Models


Introduction to Generalized Linear Models

In ordinary regression models, we model the means of Normal random variables as functions of some predictors (independent variables).

Recall that the ordinary regression model is given by

$Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i$

where the $\varepsilon_i$ are independent $N(0, \sigma^2)$.

This implies

$E(Y_i) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}.$

This model assumes that $Y_i$ has a Normal distribution.

What if this is not true? For example, $Y$ can be a nominal categorical variable, or $Y$ can be a Poisson random variable; there are many other possibilities.


Generalized linear models (GLMs) extend ordinary regression models to encompass non-normal response variables and modeling functions of the mean.

We can use these models to investigate relationships (associations) among categorical and continuous variables.

They have three components (illustrated in the sketch after this list):

- Random component: identifies the response variable Y and its probability distribution.
- Systematic component: identifies the explanatory variables used in a linear predictor function.
- Link function: specifies the function of E(Y) that the model equates to the linear predictor.
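As a quick illustration (my sketch, not from the slides; the data-frame and variable names are placeholders), the arguments of R's glm() correspond directly to the three components:

# Hypothetical sketch: how a glm() call encodes the three GLM components.
# mydata, x1, x2, y are placeholder names, not from the slides.
fit <- glm(y ~ x1 + x2,                       # systematic component: the linear predictor
           family = binomial(link = "logit"), # random component (binomial) plus link (logit)
           data = mydata)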


Introduction to Generalized Linear Models: Random component

The random component of a GLM consists of a response variable $Y$ with independent observations $(y_1, \ldots, y_N)$ from a distribution in the natural exponential family.

This family has probability density function or mass function of the form

$f(y_i; \theta_i) = a(\theta_i)\, b(y_i) \exp\{y_i Q(\theta_i)\}$   (1)

Some important distributions, including the Poisson and binomial, are in this family.

The value of the parameter $\theta_i$ varies with $i$.

The parameter $Q(\theta)$ is called the natural parameter.

Note that there is a more general formula defining the exponential family, but this is sufficient for the discrete data we discuss here.


Introduction to Generalized Linear Models: Systematic component

The systematic component of a GLM relates a vector $(\eta_1, \ldots, \eta_N)$ to the explanatory variables through a linear model. Let $x_{ij}$ denote the value of predictor $j$ ($j = 1, \ldots, p$) for subject $i$. Then

$\eta_i = \sum_j \beta_j x_{ij}, \quad i = 1, \ldots, N.$

This linear combination of explanatory variables is called the linear predictor.


Introduction to Generalized Linear Models: Link function

The link function connects the random and systematic components.

The model links $\mu_i = E(Y_i)$ to $\eta_i$ by $\eta_i = g(\mu_i)$, where the link function $g$ is a monotonic, differentiable function. Thus, $g$ links $E(Y_i)$ to the explanatory variables through the formula

$g(\mu_i) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}, \quad i = 1, \ldots, N.$
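As a small check (my addition, using base R's make.link() from the stats package), we can see that such a $g$ and its inverse behave as described:

lg <- make.link("logit")  # g(mu) = log(mu / (1 - mu))
mu <- 0.7
eta <- lg$linkfun(mu)     # g maps the mean onto the linear-predictor scale
lg$linkinv(eta)           # g^{-1} maps it back; returns 0.7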


The link function $g(\mu) = \mu$, called the identity link, has $\eta_i = \mu_i$; this gives ordinary regression with normally distributed $Y$.

The link function that transforms the mean to the natural parameter is called the canonical link.

That is, $g(\mu_i) = Q(\theta_i)$ and

$Q(\theta_i) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}, \quad i = 1, \ldots, N.$

Use of the canonical link has advantages (but is not mandatory).


Example: Binomial Logit Models for Binary Data

For binary data, $P(Y = 1) = \pi$ and $P(Y = 0) = 1 - \pi$.

$Y$ has a Bernoulli distribution.

$\mu = E(Y) = \pi$.

We can express the probability mass function as

$f(y; \pi) = \pi^y (1-\pi)^{1-y} = (1-\pi)\,[\pi/(1-\pi)]^y$   (2)
$\qquad\quad\; = (1-\pi) \exp\!\left[y \log\frac{\pi}{1-\pi}\right]$   (3)

for $y = 0$ and $1$.

This is a natural exponential family, identifying $\theta$ with $\pi$: $a(\pi) = 1 - \pi$, $b(y) = 1$, and $Q(\pi) = \log\frac{\pi}{1-\pi}$.

The natural parameter $Q(\pi) = \log\frac{\pi}{1-\pi}$ is the log odds of response 1 (i.e., the log odds of $Y = 1$), the logit of $\pi$.

GLMs with this canonical link function are called logistic regression models, or sometimes simply logit models.


Question: Can we use the ordinary regression model for binary data? That regression model is

$E(Y_i) = \pi_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}, \quad i = 1, \ldots, N.$

The problem with this model is that $\pi_i$ is a probability (i.e., taking values between 0 and 1), but linear functions take values over the entire real line.

This model also doesn't satisfy the usual assumptions of the ordinary regression model: $Y$ does not have a Normal distribution, and $\mathrm{Var}(Y_i) = \pi_i(1 - \pi_i)$ depends on $i$, so $\mathrm{Var}(Y)$ is not constant.


Example: Loglinear Models for Count Data

The simplest distribution for count data is the Poisson distribution.

The probability mass function of the Poisson distribution is

$f(y; \mu) = \frac{e^{-\mu} \mu^y}{y!} = e^{-\mu} \left(\frac{1}{y!}\right) \exp[y \log(\mu)], \quad y = 0, 1, \ldots$

This is a natural exponential family with $\theta = \mu$, $a(\mu) = e^{-\mu}$, $b(y) = 1/y!$, and $Q(\mu) = \log(\mu)$.

The natural parameter is $\log(\mu)$, and so the canonical link is the log link.

The model using the log link is

$\log(\mu_i) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}, \quad i = 1, \ldots, N.$   (4)

This is called a Poisson loglinear model.


Logistic regression model

To simplify the discussion, let's use only one explanatory variable, $x$, for predicting the probability of success, $\pi(x)$.

The logistic regression model for this case is

$\log \frac{\pi(x)}{1 - \pi(x)} = \alpha + \beta x$   (5)

or

$\pi(x) = \frac{\exp(\alpha + \beta x)}{1 + \exp(\alpha + \beta x)}$   (6)

Note: $F(x) = \frac{e^x}{1 + e^x}$ is the c.d.f. of the standard logistic distribution, and so the logistic regression model can be written as $\pi(x) = F(\alpha + \beta x)$, where $F$ is the c.d.f. of the standard logistic distribution.
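A quick numerical check (my addition, using only base R): plogis() is the standard logistic c.d.f. F, so it reproduces equation (6):

alpha <- 1; beta <- 0.5; x <- 2
exp(alpha + beta*x) / (1 + exp(alpha + beta*x))  # pi(x) from (6): 0.8807971
plogis(alpha + beta*x)                           # F(alpha + beta*x): same value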


Graph of π(x) vs x for α = 1 and β = 0.5

#R code for plotting the graph of pi vs x
alpha <- 1
beta1 <- 0.5
curve(expr = exp(alpha+beta1*x)/(1+exp(alpha+beta1*x)),
      from = -15, to = 15, col = "red",
      main = expression(pi(x) == frac(e^{alpha+beta*x}, 1+e^{alpha+beta*x})),
      xlab = "x", ylab = expression(pi(x)),
      panel.first = grid(nx = NULL, ny = NULL, col = "gray", lty = "dotted"))


Graph of π(x) vs x for α = 1 and β = −0.5


Logistic regression model with more than one independent variable

This model can be generalized to more than one independent variable:

$\log \frac{\pi(\mathbf{x})}{1 - \pi(\mathbf{x})} = \alpha + \beta_1 x_1 + \cdots + \beta_p x_p$   (7)

or

$\pi(\mathbf{x}) = \frac{\exp(\alpha + \beta_1 x_1 + \cdots + \beta_p x_p)}{1 + \exp(\alpha + \beta_1 x_1 + \cdots + \beta_p x_p)}$   (8)


Interpretation of β’s

In the model with one independent variable, $\alpha$ represents the log-odds that $Y = 1$ when $x = 0$, and $\beta$ represents the increase in the log-odds that $Y = 1$ when $x$ increases by one unit.

In the model with more than one independent variable, $\alpha$ represents the log-odds that $Y = 1$ when $x_1 = \cdots = x_p = 0$, and $\beta_k$ represents the increase in the log-odds that $Y = 1$ when $x_k$ increases by one unit, holding the other $x$ variables fixed.


For example, in the aspirin study we found that the odds of a heart attack in the placebo group are 1.83 times those in the aspirin group. In this example we can take $x = 1$ as the placebo group and $x = 0$ as the aspirin group; $y = 1$ means the subject had a heart attack and $y = 0$ means they did not.

Substituting $x = 0$ in (5), we get the log odds for the aspirin group:

$\log(\mathrm{odds}(0)) = \log \frac{\pi(0)}{1 - \pi(0)} = \alpha$

Substituting $x = 1$, we get the log odds for the placebo group:

$\log(\mathrm{odds}(1)) = \log \frac{\pi(1)}{1 - \pi(1)} = \alpha + \beta$

and so

$\beta = \log\!\left(\frac{\mathrm{Odds}(1)}{\mathrm{Odds}(0)}\right),$

i.e., $e^\beta$ represents the odds ratio.
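As a numeric check (my addition), using the aspirin-study counts that appear in Example 2 below (189/10845 heart attacks in the placebo group, 104/10933 in the aspirin group):

odds_placebo <- 189 / 10845        # odds of a heart attack, placebo group (x = 1)
odds_aspirin <- 104 / 10933        # odds of a heart attack, aspirin group (x = 0)
odds_placebo / odds_aspirin        # odds ratio: about 1.83
log(odds_placebo / odds_aspirin)   # about 0.605; compare beta-hat in Example 2 below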


Parameter estimation

We use maximum likelihood methods to estimate the parameters (i.e., $\alpha$ and the $\beta$'s). This requires numerical methods.

We can use the R function glm() to estimate the parameters.


Logistic regression model: Example

The R code below fits a logistic regression model for data from an example in Kutner et al. (2004). The example is based on a study of the effect of computer programming experience on the ability to complete a complex programming task, including debugging, within a specified time. Twenty-five persons were selected for the study. They had varying amounts of programming experience (measured in months of experience).


> # Example p565 Kutner et al
> data=read.table("C:/Users/Mahinda/Desktop/CH14TA01.txt", header=F)
> experience <- data[,1]
> task <- data[,2]
> cbind(experience, task)
      experience task
 [1,]         14    0
 [2,]         29    0
 [3,]          6    0
 [4,]         25    1
 [5,]         18    1
 [6,]          4    0
 [7,]         18    0
 [8,]         12    0
 [9,]         22    1
[10,]          6    0
[11,]         30    1
[12,]         11    0
[13,]         30    1
[14,]          5    0
[15,]         20    1
[16,]         13    0
[17,]          9    0
[18,]         32    1
[19,]         24    0
[20,]         13    1
[21,]         19    0
[22,]          4    0
[23,]         28    1
[24,]         22    1
[25,]          8    1


> model1 = glm(task ~ experience, family=binomial)
> summary(model1)

Call:
glm(formula = task ~ experience, family = binomial)

Deviance Residuals:
    Min      1Q  Median      3Q     Max
-1.8992 -0.7509 -0.4140  0.7992  1.9624

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.05970    1.25935  -2.430   0.0151 *
experience   0.16149    0.06498   2.485   0.0129 *
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 34.296 on 24 degrees of freedom
Residual deviance: 25.425 on 23 degrees of freedom
AIC: 29.425

Number of Fisher Scoring iterations: 4

> # for every one-month increase in experience, the estimated odds of being
> # able to perform the task are multiplied by exp(coef(model1)[2]) = exp(0.16149), about 1.175


The estimated probability that a person will be able to perform the task is

$\pi(x_i) = \frac{\exp(\hat{\alpha} + \hat{\beta} x_i)}{1 + \exp(\hat{\alpha} + \hat{\beta} x_i)} = \frac{\exp(-3.05970 + 0.16149\, x_i)}{1 + \exp(-3.05970 + 0.16149\, x_i)}$

For example, the estimated probability that a person with 24 months of experience will be able to perform the task is

$\pi(24) = \frac{\exp(-3.05970 + 0.16149 \times 24)}{1 + \exp(-3.05970 + 0.16149 \times 24)} = 0.6934.$
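The same estimate can be obtained with predict(), which applies the inverse link for us (a sketch, assuming model1 is the fit shown above):

predict(model1, newdata = data.frame(experience = 24), type = "response")
# returns about 0.6934, matching the hand calculation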


> pihat = model1$fitted.values # estimated probabilities
> cbind(experience, task, pihat)
   experience task      pihat
1          14    0 0.31026237
2          29    0 0.83526292
3           6    0 0.10999616
4          25    1 0.72660237
5          18    1 0.46183704
6           4    0 0.08213002
7          18    0 0.46183704
8          12    0 0.24566554
9          22    1 0.62081158
10          6    0 0.10999616
11         30    1 0.85629862
12         11    0 0.21698039
13         30    1 0.85629862
14          5    0 0.09515416
15         20    1 0.54240353
16         13    0 0.27680234
17          9    0 0.16709980
18         32    1 0.89166416
19         24    0 0.69337941
20         13    1 0.27680234
21         19    0 0.50213414
22          4    0 0.08213002
23         28    1 0.81182461
24         22    1 0.62081158
25          8    1 0.14581508


> # Plot of estimated probability vs experience
> Estimated_prob <- function(experience) { exp(model1$coefficients[1] +
+     model1$coefficients[2]*experience) /
+     (1+exp(model1$coefficients[1]+model1$coefficients[2]*experience)) }
> curve(Estimated_prob, from=0, to=40, xlab="experience",
+     ylab="Estimated Probability")
> abline(h=(seq(0,1,by=0.02)), col="blue", lty="dotted")
> abline(v=(seq(0,40,1)), col="blue", lty="dotted")


Logistic regression model: Example 2

The R code below fits a logistic regression model for the (contingency-table) data from the aspirin example above:

> x = c(rep(1, 189+10845), rep(0, 104+10933))
> y = c(rep(1, 189), rep(0, 10845), rep(1, 104), rep(0, 10933))
> length(x)
[1] 22071
> length(y)
[1] 22071
> model1 = glm(y ~ x, family=binomial)
> summary(model1)


Call:
glm(formula = y ~ x, family = binomial)

Deviance Residuals:
    Min      1Q  Median      3Q     Max
-0.1859 -0.1859 -0.1376 -0.1376  3.0544

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.65515    0.09852 -47.250  < 2e-16 ***
x            0.60544    0.12284   4.929 8.28e-07 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 3114.7 on 22070 degrees of freedom
Residual deviance: 3089.3 on 22069 degrees of freedom
AIC: 3093.3


Probit regression model

Another model for a Bernoulli random component $Y$ is the probit regression model. This model uses the inverse standard normal c.d.f. $\Phi^{-1}$ as the link function. That is, the model is

$\pi(x) = \Phi(\alpha + \beta x)$   (9)

or

$\Phi^{-1}(\pi(x)) = \alpha + \beta x.$   (10)

The curve has a similar appearance to the logistic regression curve.


The curves for $\beta = 0.5$ and $\beta = -0.5$, both with $\alpha = 1$, are shown below:



Which model to use?

This is not an easy question.

One way to decide is to try several models and see which one fits the data best.

The logit is easier to interpret, through the use of odds and odds ratios, and so it is used more often.


Probit regression model: Example

> # Example p565 Kutner et al (same data as in the logistic regression example above)
> data=read.table("C:/Users/Mahinda/Desktop/CH14TA01.txt", header=F)
> experience <- data[,1]
> task <- data[,2]


Probit regression model: Example

> model2 = glm(task ~ experience, family=binomial(link = probit))
> summary(model2)

Call:
glm(formula = task ~ experience, family = binomial(link = probit))

Deviance Residuals:
    Min      1Q  Median      3Q     Max
-1.8959 -0.7579 -0.3907  0.8101  1.9691

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.83787    0.69012  -2.663  0.00774 **
experience   0.09686    0.03565   2.717  0.00659 **
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 34.296 on 24 degrees of freedom
Residual deviance: 25.380 on 23 degrees of freedom


Logistic and probit regression models: Example

> pihatlogit = model1$fitted.values # estimated probabilities (logit)
> pihatprobit = model2$fitted.values # estimated probabilities (probit)
> cbind(experience, task, pihatlogit, pihatprobit)
   experience task pihatlogit pihatprobit
1          14    0 0.31026237  0.31495382
2          29    0 0.83526292  0.83422869
3           6    0 0.10999616  0.10442754
4          25    1 0.72660237  0.72024848
5          18    1 0.46183704  0.46238565
6           4    0 0.08213002  0.07346854
7          18    0 0.46183704  0.46238565
8          12    0 0.24566554  0.24965602
9          22    1 0.62081158  0.61524129
10          6    0 0.10999616  0.10442754
11         30    1 0.85629862  0.85721025
12         11    0 0.21698039  0.21992975
13         30    1 0.85629862  0.85721025
14          5    0 0.09515416  0.08793556
15         20    1 0.54240353  0.53954616
16         13    0 0.27680234  0.28139084
17          9    0 0.16709980  0.16698550
18         32    1 0.89166416  0.89645092
19         24    0 0.69337941  0.68677231
20         13    1 0.27680234  0.28139084
21         19    0 0.50213414  0.50097045
22          4    0 0.08213002  0.07346854
23         28    1 0.81182461  0.80898266
24         22    1 0.62081158  0.61524129
25          8    1 0.14581508  0.14389004


Probit regression model: Example

The two estimated regression curves (logistic and probit) are shown below.

> # Plotting estimated regression curves
> # Plot of estimated probability vs experience
> Estimated_prob <- function(experience) { exp(model1$coefficients[1] +
+     model1$coefficients[2]*experience) / (1+exp(model1$coefficients[1]+
+     model1$coefficients[2]*experience)) }
> # Or we can use
> # Estimated_prob <- function(experience) { plogis(model1$coefficients[1] +
> #     model1$coefficients[2]*experience)}
> curve(Estimated_prob, from=0, to=40, col = "green", xlab="experience",
+     ylab="Estimated Probability")
> abline(h=(seq(0,1,by=0.02)), col="blue", lty="dotted")
> abline(v=(seq(0,40,1)), col="blue", lty="dotted")
> #---------------------------------------------------------------------------
> par(new = TRUE)
> # Plotting the estimated probability from the probit model
> Estimated_prob2 <- function(experience) { pnorm(model2$coefficients[1] +
+     model2$coefficients[2]*experience)}
> curve(Estimated_prob2, from=0, to=40, col = "red", axes = FALSE,
+     xlab = "", ylab = "")
> legend(locator(1), legend = c("Logit", "Probit"), lty = c(1,1),
+     col = c("green", "red"))
> # locator(1) places the legend at the place you click on the graph


Generalized linear models for count data

Counts of possible outcomes are non-negative integers.

These are often modeled as Poisson random variables.

A Poisson loglinear GLM assumes a Poisson distribution for the response and the log function as the link function. So the linear predictor is related to the mean by

$\log(\mu(x)) = \alpha + \beta x$   (11)

or

$\mu(x) = \exp(\alpha + \beta x) = e^{\alpha} (e^{\beta})^{x}$   (12)

Interpretation of $\beta$: a unit increase in $x$ has a multiplicative impact of $e^{\beta}$; i.e., the mean of $Y$ at $x + 1$ is equal to the mean at $x$ times $e^{\beta}$.
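A quick check of this multiplicative interpretation (my addition, using the estimates from the crab-data fit shown below):

alpha_hat <- -3.30476; beta_hat <- 0.16405      # estimates from the crab-data fit below
mu <- function(x) exp(alpha_hat + beta_hat * x) # fitted mean function
mu(26) / mu(25)   # ratio of means for a one-unit increase in width
exp(beta_hat)     # the same number, about 1.178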


Horseshoe Crab Mating Example (p. 123)

For each female $i$, assume the number of satellites, $Y_i$, has a Poisson distribution with mean $\mu_i$ depending on the female's shell width ($x_i$). We will model the expected number of satellites with the following model:

$\log(\mu_i) = \alpha + \beta x_i.$

The R code below fits the model for the crab data:

> # Log-linear model example
> # Example p 123
> crab=read.table("C:/Users/Mihinda/Desktop/crab.txt", header=T) #the data file
> model3 <- glm(formula = satellite ~ width, data = crab, family = poisson(link = log))
> summary(model3)

Call:
glm(formula = satellite ~ width, family = poisson(link = log),
    data = crab)

Deviance Residuals:
    Min      1Q  Median      3Q     Max
-2.8526 -1.9884 -0.4933  1.0970  4.9221

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.30476    0.54224  -6.095  1.1e-09 ***
width        0.16405    0.01997   8.216  < 2e-16 ***


> # Predicting the mean response at given value(s) of x
> # Predict for widths 25 and 30
> predict.data<-data.frame(width = c(25, 30))
> # Predicted values for mu
> pred1 <- predict(model3, newdata = predict.data, type = "response", se = TRUE)
> pred1
$fit
       1        2
2.217477 5.035916

$se.fit
        1         2
0.1345945 0.3703386

> pred2 <- predict(model3, newdata = predict.data, se = TRUE)
> # This gives predicted values for log(mu)
> pred2
$fit
        1         2
0.7963699 1.6165954

$se.fit
         1          2
0.06069713 0.07353947


> alpha<-0.05
> lower<-pred1$fit-qnorm(1-alpha/2)*pred1$se
> upper<-pred1$fit+qnorm(1-alpha/2)*pred1$se
> data.frame(predict.data, mu.hat = round(pred1$fit,3), lower = round(lower,3), upper = round(upper,3))
  width mu.hat lower upper
1    25  2.217 1.954 2.481
2    30  5.036 4.310 5.762


> # Plot of estimated mean count vs width
> # (the coefficients must come from the Poisson fit model3, not the logistic model1)
> Estimated_count <- function(width) { exp(model3$coefficients[1] +
+     model3$coefficients[2]*width) }
> curve(Estimated_count, from=20, to=35, xlab="Width",
+     ylab="Estimated Mean Count")
> abline(h=(seq(0,15,by=1)), col="blue", lty="dotted")
> abline(v=(seq(20,35,1)), col="blue", lty="dotted")


Overdispersion for Poisson GLMs

Count data often vary more than we would expect if the response distribution truly were Poisson.

The phenomenon of the data having greater variability than expected for a GLM is called overdispersion.


This might happen because the true distribution is a mixture of different Poisson distributions.

One remedy for this is to find more explanatory variables and add them to the model.

The negative binomial is a related distribution for count data that permits the variance to exceed the mean.

The probability mass function of the negative binomial distribution is given by

$f(y; k, \pi) = \binom{y + k - 1}{y} (1 - \pi)^{y} \pi^{k}, \quad y = 0, 1, 2, \ldots$   (13)

where $k > 0$ and $0 < \pi < 1$ are parameters.

A negative binomial random variable can be interpreted as the number of failures before the $k$th success.


The mean and the variance of this distribution are given by

$E(Y) = \mu = \frac{k(1 - \pi)}{\pi}$   (14)

and

$\mathrm{Var}(Y) = \frac{k(1 - \pi)}{\pi^2}.$

Note that $\frac{k}{\mu + k} = \pi$ and $\mu + \mu^2/k = \frac{k(1 - \pi)}{\pi^2} = \mathrm{Var}(Y)$.

Note that $E(Y) < \mathrm{Var}(Y)$.

$k$ is the (positive) dispersion parameter.

The smaller the dispersion parameter, the larger the variance compared to the mean. In R this parameter is denoted by $\theta$.

Note: Agresti uses $\gamma = 1/k$ as the dispersion parameter.
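A numerical check of (14) (my addition): R's dnbinom() uses the same failures-before-the-kth-success parameterization (size = k, prob = pi), so the mean and variance can be verified by direct summation, truncating where the tail mass is negligible:

k <- 0.9; p <- 0.4
y <- 0:10000                                          # truncation; tail mass is negligible here
m <- sum(y * dnbinom(y, size = k, prob = p))          # E(Y) by summation
v <- sum((y - m)^2 * dnbinom(y, size = k, prob = p))  # Var(Y) by summation
c(m, k*(1-p)/p)    # both 1.35
c(v, k*(1-p)/p^2)  # both 3.375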


This probability mass function can also be written as

$f(y; k, \mu) = \binom{y + k - 1}{y} \left(1 - \frac{k}{\mu + k}\right)^{y} \left(\frac{k}{\mu + k}\right)^{k}, \quad y = 0, 1, 2, \ldots$   (15)


Negative Binomial GLMs: Horseshoe Crab Mating Example

The glm function in R cannot fit negative binomial regression models. We can use the glm.nb function in the MASS package to estimate this model. The R code below uses glm.nb to estimate a negative binomial GLM for the crab data.

> # R code: negative binomial regression
> library(MASS)
> crab=read.table("C:/Users/Mihinda/Desktop/crab.txt", header=T) #the data file
> model3.nb<-glm.nb(formula = satellite ~ width, data = crab,
+     link = log)
> summary(model3.nb)

Call:
glm.nb(formula = satellite ~ width, data = crab, link = log,
    init.theta = 0.90456808)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.05251    1.17143  -3.459 0.000541 ***
width        0.19207    0.04406   4.360  1.3e-05 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1

(Dispersion parameter for Negative Binomial(0.9046) family taken to be 1)


    Null deviance: 213.05 on 172 degrees of freedom
Residual deviance: 195.81 on 171 degrees of freedom
AIC: 757.29

Number of Fisher Scoring iterations: 1

              Theta: 0.905
          Std. Err.: 0.161


Statistical Inference and Model Checking for GLMs: Wald test

One test we are usually interested in is $H_0: \beta = 0$ against $H_a: \beta \neq 0$. For large $n$, MLEs are approximately Normal. In particular, $\hat{\beta} \sim N(\beta, \mathrm{AsVar}(\hat{\beta}))$ approximately, and so

$Z = \frac{\hat{\beta} - \beta}{SE} \xrightarrow{\text{approx}} N(0, 1),$

and this result can be used to calculate an approximate p-value (Wald test).

Consider the crab data. Test whether the number of satellites is independent of the width.

Solution: $z = 8.216$ and the p-value is $< 2 \times 10^{-16} < 0.05$, so we reject the null hypothesis. An approximate 95 percent confidence interval is $0.16405 \pm 1.96 \times 0.01997$.
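In R the Wald interval can be computed directly from the fitted model (a sketch, assuming model3 is the Poisson fit to the crab data from the earlier example):

b  <- coef(summary(model3))["width", "Estimate"]    # 0.16405
se <- coef(summary(model3))["width", "Std. Error"]  # 0.01997
b + c(-1, 1) * qnorm(0.975) * se                    # approximate 95% CI: (0.125, 0.203)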


Statistical Inference and Model Checking for GLMs: Likelihood Ratio test

We have discussed the LRT before; it can also be used here. Recall

$\Lambda = \frac{\text{maximum likelihood under the null hypothesis}}{\text{unrestricted maximum likelihood}}.$

For testing $H_0: \beta = 0$, the numerator is calculated assuming $\beta = 0$; thus the model fit to the data is only $g(\mu) = \alpha$, where $g(\mu)$ is the link function. The denominator is calculated without assuming $\beta = 0$, so the model fit to the data is $g(\mu) = \alpha + \beta x$. We know that for large $n$, $G^2 = -2\log(\Lambda)$ has an approximate chi-squared distribution. The degrees of freedom is the number of parameters in the unrestricted model minus the number of parameters in the model under the null hypothesis.


For example, for testing the null hypothesis $H_0: \beta = 0$, the degrees of freedom is 1.

The value of $G^2$ is not always given in software output.

Software often gives the "null deviance" and the "residual deviance".

These values are $G^2$ values for testing certain other hypotheses, but we can often use them to calculate the value of $G^2$ for our tests.

For example, the value of $G^2$ for testing $H_0: \beta = 0$ against $H_a: \beta \neq 0$ is simply (null deviance) − (residual deviance), as the sketch below shows.
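A minimal sketch in R (assuming model3 is the crab-data Poisson fit from earlier):

G2 <- model3$null.deviance - model3$deviance  # 632.79 - 567.88 = 64.91
df <- model3$df.null - model3$df.residual     # 172 - 171 = 1
pchisq(G2, df = df, lower.tail = FALSE)       # approximate p-value for H0: beta = 0
# anova(model3, test = "Chisq") reports the same comparison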


Null deviance ($G_1^2$) tests $H_0$: model with only $\alpha$ against $H_1$: saturated model.

Example, Poisson GLM: in a Poisson GLM, the saturated model assumes $Y_i \sim \text{Poisson}(\mu_i)$ for $i = 1, \ldots, n$, and the MLE of $\mu_i$ is $y_i$.

$G_1^2 = 2 \sum_{i=1}^{n} y_i \log\!\left(\frac{y_i}{\hat{\mu}_{0,i}}\right)$   (16)

where $\hat{\mu}_{0,i} = e^{\hat{\alpha}_0}$ and $\hat{\alpha}_0$ is the MLE of $\alpha$ in the model $\log \mu_i = \alpha$, $i = 1, \ldots, n$.

Residual deviance ($G_2^2$) tests $H_0$: model with only $\alpha$ and $\beta$ against $H_1$: saturated model.


Statistical Inference and Model Checking for GLMs: Likelihood Ratio test Example

> # Log-linear model example
> # Example p 123
> crab=read.table("C:/Users/Mihinda/Desktop/crab.txt", header=T) #the data file
> model3 <- glm(formula = satellite ~ width, data = crab, family = poisson(link = log))
> summary(model3)

Call:
glm(formula = satellite ~ width, family = poisson(link = log),
    data = crab)

Deviance Residuals:
    Min      1Q  Median      3Q     Max
-2.8526 -1.9884 -0.4933  1.0970  4.9221

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.30476    0.54224  -6.095  1.1e-09 ***
width        0.16405    0.01997   8.216  < 2e-16 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 632.79 on 172 degrees of freedom
Residual deviance: 567.88 on 171 degrees of freedom
AIC: 927.18


Crab data

$G^2 = 632.79 - 567.88 = 64.91$

Degrees of freedom = 172 − 171 = 1

The chi-square critical value at $\alpha = 0.05$ is 3.84.

We reject the null hypothesis $H_0: \beta = 0$.

The data show evidence that width has a significant effect on the number of satellites.


Residuals for GLMs (p. 140)

The Pearson residual for observation $i$ is defined by

$e_i = \frac{y_i - \hat{\mu}_i}{\sqrt{\hat{\mu}_i}}$   (18)

and the standardized residuals are defined by

$r_i = \frac{y_i - \hat{\mu}_i}{\sqrt{\hat{\mu}_i (1 - \hat{h}_i)}}$   (19)

where $\hat{h}_i$ is the $i$th diagonal element of the hat matrix

$H = W^{1/2} X (X^{T} W X)^{-1} X^{T} W^{1/2}.$   (20)

The $\hat{h}_i$'s are known as leverages.

The standardized residual has a distribution that is closer to a standard normal distribution than the Pearson residual.
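As an aside (my addition), R computes these standardized Pearson residuals directly via rstandard(), which divides by sqrt(1 − h_i) internally; this should match the manual computation in the example that follows (poissonReg is the Poisson fit defined there):

r_builtin <- rstandard(poissonReg, type = "pearson")  # standardized Pearson residuals
head(r_builtin)  # should match the r computed by hand in the example below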


Residuals for GLMs: Example

The R code below calculates the standardized residuals and creates residual plots.

> # R code: residuals for Poisson regression
> crab=read.table("C:/Users/Mihinda/Desktop/crab.txt", header=T) #the data file
> poissonReg <- glm(formula = satellite ~ width, data = crab,
+     family = poisson(link = log))
> e <-residuals(poissonReg, type="pearson")
> X<-model.matrix(poissonReg)
> muhat<-predict(poissonReg, type = "response")
> W <- diag(muhat)
> H<-(W^(1/2))%*%X%*%solve(t(X)%*%W%*%X)%*%t(X)%*%(W^(1/2))
> h <- diag(H)
> head(h)
[1] 0.009852370 0.006360719 0.006945761 0.019161622 0.014825698 0.008169498
> r <- e/sqrt(1-h)
> head(e)
        1         2         3         4         5         6
2.1463312 0.8582102 -1.5642375 -1.0726099 -1.5836582 0.5254940
> head(r)
        1         2         3         4         5         6
2.1569832 0.8609527 -1.5696984 -1.0830364 -1.5955298 0.5276537


The leverages, and hence the standardized residuals, can also be obtained using lm.influence:

> h<-lm.influence(poissonReg)$h
> head(h)
          1           2           3           4           5           6
0.009852370 0.006360719 0.006945761 0.019161622 0.014825698 0.008169498
> r <- e/sqrt(1-h)
> head(r)
        1         2         3         4         5         6
2.1569832 0.8609527 -1.5696984 -1.0830364 -1.5955298 0.5276537


The R code below creates the residual plots:

> #----------------------------------------------------------------
> # Standardized residual vs observation number
> plot(x = 1:length(r), y = r, xlab="Observation number",
+     ylab="Standardized residuals",
+     main = "Standardized residuals vs. observation number")
> abline(h = c(-3, 3), lty=3, col="red")
> #----------------------------------------------------------------
> par(mfrow = c(1,1))
> # Plot of residual vs width
> plot(x = crab$width, y = r, xlab="Width",
+     ylab="Standardized Pearson residuals",
+     main = "Standardized Pearson residuals vs. width")
> abline(h = c(-3, 3), lty=3, col="red")
> #----------------------------------------------------------------
> plot(x = crab$width, y = r, xlab="Width",
+     ylab="Standardized Pearson residuals",
+     main = "Standardized Pearson residuals vs. width", type = "n")
> text(x = crab$width, y = r,
+     labels = crab$satellite, cex=0.75)
> abline(h = c(-3,3), lty=3, col="red")


The R code below calculates the standardized residuals for the negative binomial model and creates residual plots:

> # R code: negative binomial regression
> library(MASS)
> crab=read.table("C:/Users/Mihinda/Desktop/crab.txt", header=T) #the data file
> model4.nb<-glm.nb(formula = satellite ~ width, data = crab, link = log)
> e.nb<-residuals(model4.nb, type="pearson")
> h.nb<-lm.influence(model4.nb)$h
> r.nb<-e.nb/sqrt(1-h.nb)
> par(mfrow = c(1,2))
> plot(x = 1:length(r.nb), y = r.nb, xlab="Obs. number",
+     ylab="Standardized residuals",
+     main = "Stand. residuals (Neg Bin model) vs. obs. number")
> abline(h = c(-3, 3), lty=3, col="red")
> plot(x = crab$width, y = r.nb,
+     xlab="Width", ylab="Standardized residuals",
+     main = "Stand. residuals (Neg Bin model) vs. width", type = "n")
> text(x = crab$width, y = r.nb, labels =
+     crab$satellite, cex=0.75)
> abline(h = c(-3, 3), lty=3, col="red")


Goodness of Fit: Pearson Chi-square

For Poisson regression, the statistic is

$\chi^2 = \sum_{i=1}^{n} \frac{(y_i - \hat{\mu}_i)^2}{\hat{\mu}_i}.$

The statistic has an approximate $\chi^2$ distribution with $n$ − (number of model parameters) $= n - 2$ degrees of freedom for large $n$.

In order for the $\chi^2$ approximation to work well, the $\hat{\mu}_i$ should not be small.

Rule of thumb: $\hat{\mu}_i \geq 5$.


Goodness of Fit: LRT

For Poisson regression, the LRT statistic comparing the model with the saturated model is

$G^2 = -2\log(\Lambda) = 2 \sum_{i=1}^{n} y_i \log\!\left(\frac{y_i}{\hat{\mu}_i}\right)$

where $\hat{\mu}_i = e^{\hat{\alpha} + \hat{\beta} x_i}$.

The statistic has an approximate $\chi^2$ distribution with $n - 2$ degrees of freedom for large $n$. In R this is called the residual deviance.
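In R, the corresponding p-value is one line (a sketch, using the crab-data Poisson fit poissonReg from the example on the next slide):

pchisq(deviance(poissonReg), df = df.residual(poissonReg), lower.tail = FALSE)
# for the crab fit, pchisq(567.88, df = 171) is essentially 0, indicating lack of fit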


Goodness of fit: Example

The R code and output for the crab data are given below. Use the Pearson chi-square test and the LRT to test the goodness of fit of the model.

> # R code: residuals for Poisson regression
> crab=read.table("C:/Users/Mihinda/Desktop/crab.txt",
+     header=T) #the data file
> poissonReg <- glm(formula = satellite ~ width, data = crab,
+     family = poisson(link = log))
> summary(poissonReg)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.30476    0.54224  -6.095  1.1e-09 ***
width        0.16405    0.01997   8.216  < 2e-16 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 632.79 on 172 degrees of freedom
Residual deviance: 567.88 on 171 degrees of freedom

> pear.res<-resid(poissonReg, type="pearson")
> pearsonChisq <- sum(pear.res^2)
> pearsonChisq
[1] 544.157
> p_pearson <- 1-pchisq(pearsonChisq, df = poissonReg$df.residual)
> p_pearson
[1] 0


Both tests indicate lack of fit of the model.

Some of the fitted means $\hat{\mu}_i$ (not printed) are less than 5, and so the test is not very reliable.
