IOWA STATE UNIVERSITY
Department of Animal Science
PROC GLIMMIX: Generalized Linear Mixed Models
Animal Science 500
Lecture No. 17-18
October 25, 2010

GLIMMIX Information
- PROC GLIMMIX is a procedure for fitting generalized linear mixed models (GLMMs).
- GLMMs allow for non-normal data and random effects.
- GLMMs allow for correlation among responses.
Source: An Introduction to Generalized Linear Mixed Models Using SAS PROC GLIMMIX, P. Gibbs, SAS Technical Support

Getting GLIMMIX
- SAS 9.1: download the add-on (Windows, Unix, Linux) from
  - http://support.sas.com
  - http://www.sas.com/statistics
- Supported on a limited number of platforms and platform configurations
- SAS 9.2 (available now for most academic sites)

GLIMMIX overview
- PROC GLIMMIX fits statistical models to data with correlations or nonconstant variability and where the response is not necessarily normally distributed.
- These models are known as generalized linear mixed models (GLMMs).
- GLMMs, like linear mixed models, assume normal (Gaussian) random effects.
- Conditional on these random effects, the data can have any distribution in the exponential family.

GLIMMIX overview
- The exponential family comprises many of the elementary discrete and continuous distributions, including:
  - Binary
    - The experiment consists of n repeated trials.
    - Each trial can result in just two possible outcomes. We call one of these outcomes a success and the other a failure.
    - The probability of success, denoted by P, is the same on every trial.
    - The trials are independent; that is, the outcome of one trial does not affect the outcome of other trials.
    - A binary response records the outcome (success or failure) of a single such trial.
  - Binomial
  - Poisson
  - Negative binomial

GLIMMIX overview
- The exponential family comprises many of the elementary discrete and continuous distributions, including:
  - Binomial
    - Covers situations in which, for example, a coin is biased, so that heads and tails have different probabilities.
    - Applies when each trial has just two possible outcomes, with fixed probabilities summing to one.
    - These distributions are called binomial distributions.
  - Poisson
  - Negative binomial

GLIMMIX overview
- The exponential family comprises many of the elementary discrete and continuous distributions, including:
  - Poisson
    - The Poisson distribution is an appropriate model for count data. Examples of such data are:
      - mortality data,
      - the number of misprints in a book,
      - the number of bacteria on a plate, and
      - the number of activations of a Geiger counter.
  - Negative binomial

GLIMMIX overview
- The exponential family comprises many of the elementary discrete and continuous distributions, including:
  - Negative binomial
    - The probability distribution of a negative binomial random variable is called a negative binomial distribution.
    - It is also known as the Pascal distribution.
    - Example: You flip a coin repeatedly and count the number of heads (successes). If you continue flipping the coin until it has landed on heads 2 times, you are conducting a negative binomial experiment.
    - The negative binomial random variable is the number of coin flips required to achieve 2 heads.
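The coin-flip experiment above can be simulated directly. The following DATA step is a minimal sketch and not part of the original lecture: the data set name negbin_sim, the seed, the fair-coin probability of 0.5, and the 10,000 replicates are all illustrative assumptions. It flips a simulated coin until 2 heads have appeared and records how many flips that took.

```sas
/* Sketch: simulate the negative binomial experiment "flip until 2 heads". */
/* Data set name, seed, and replicate count are illustrative assumptions.  */
data negbin_sim;
  call streaminit(500);                        /* arbitrary seed */
  do rep = 1 to 10000;
    heads = 0;
    flips = 0;
    do until (heads = 2);                      /* stop at the 2nd head (success) */
      flips = flips + 1;
      heads = heads + rand('BERNOULLI', 0.5);  /* 1 = head, 0 = tail */
    end;
    output;                                    /* one row per experiment */
  end;
  keep rep flips;
run;

/* Summarize the number of flips required across the simulated experiments */
proc means data=negbin_sim n mean min max;
  var flips;
run;
```

With a fair coin the theoretical mean number of flips is 2 / 0.5 = 4, so the simulated mean should be close to 4, and no experiment can need fewer than 2 flips.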

GLIMMIX overview
- The exponential family comprises many of the elementary discrete and continuous distributions.
- The previous distributions are discrete members of this family.
- The normal, beta, gamma, and chi-square distributions are representatives of the continuous distributions in this family.
- In the absence of random effects, the GLIMMIX procedure fits generalized linear models (as fit by the GENMOD procedure).
- GLMMs are useful for estimating trends in disease rates.

GLIMMIX overview
- The continuous distributions in the exponential family include:
  - Normal, also called Gaussian
  - Beta
    - The beta distribution has two main uses:
      1. As a description of the uncertainty or random variation of a probability, fraction, or prevalence.
      2. As a distribution that can be rescaled and shifted to create distributions with a wide range of shapes over any finite range. As such, it is sometimes used to model expert opinion.

GLIMMIX overview
- The continuous distributions in the exponential family include:
  - Gamma
    - Many applications involve intervals between events, because the gamma distribution arises as the sum of one or more exponentially distributed variables. Examples include queuing models, the flow of items through manufacturing and distribution processes, the load on web servers, and the many forms of telecom exchange.
    - Because of its moderately skewed profile, it is used as a model in a range of disciplines, including climatology, where it is a workable model for rainfall, and financial services, where it has been used to model insurance claims and the size of loan defaults, and hence in probability-of-ruin and value-at-risk calculations.
From http://www.brighton-webs.co.uk/distributions/gamma.asp

GLIMMIX overview
- The continuous distributions in the exponential family include:
  - Chi-square
    - The best-known uses of the chi-square distribution are the common chi-square tests for goodness of fit, which compare an observed distribution to a known or theoretical distribution.
    - Example: comparing the expected movie rating distribution with the observed movie rating distribution.
    - It can also be used to test the independence of two criteria of classification of qualitative data.
    - Test statistic: $\chi^2 = \sum \frac{(O - E)^2}{E}$, where O is an observed frequency and E is the corresponding expected frequency.

GLIMMIX overview
- The continuous distributions in the exponential family include:
  - Chi-square
    - Hypotheses
      - H0: The distribution of observed frequencies equals the distribution of expected frequencies.
      - H1: The distribution of observed frequencies does not equal the distribution of expected frequencies.
    - Assumptions
      - Observations are independent (each subject can appear once and only once in a table).
      - Expected frequencies in each row are at least 15.

Example of a Chi-Square Distribution
- Example 1: Pepsi Challenge
  - Test whether cola preference among 220 college students in a simple random sample is equally distributed.
  - Each individual tastes each of the three colas.
  - Between tastes, subjects eat a soda cracker.
  - Each subject receives the colas in a different order.
  - Each subject then selects which soda he/she likes best.
- Results: Pepsi 85, Coke 57, RC 78.
  - Use equal expected frequencies for each row: E = 220/3 = 73.33.

          O       E      O-E   (O-E)²  (O-E)²/E
Pepsi    85   73.33    11.67   136.19      1.86
Coke     57   73.33   -16.33   266.67      3.64
RC       78   73.33     4.67    21.81      0.30
Totals  220  219.99                   χ² = 5.80

- df = rows - 1 = 3 - 1 = 2.
- Critical value of χ² = 5.99 at alpha = 0.05.
- Observed value of χ² = 5.80.
- Decision: Fail to reject H0.
Example from: http://www.philender.com/courses/intro/notes3/chi.html
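The Pepsi Challenge calculation above can be checked in SAS. The sketch below is an illustration rather than part of the original example: the data set name cola and the variable names brand and count are assumptions. For a one-way table, the CHISQ option of PROC FREQ requests a chi-square test of equal proportions, which corresponds to the equal expected frequencies used above.

```sas
/* Observed preference counts from the example (names are illustrative) */
data cola;
  input brand $ count;
  datalines;
Pepsi 85
Coke 57
RC 78
;
run;

/* One-way chi-square test of equal proportions (goodness of fit) */
proc freq data=cola order=data;
  weight count;            /* each row stands for "count" subjects */
  tables brand / chisq;
run;

/* Critical value of chi-square with 2 df at alpha = 0.05 */
data _null_;
  crit = cinv(0.95, 2);    /* chi-square quantile function */
  put crit= 6.2;
run;
```

The reported chi-square statistic should agree with the hand calculation (about 5.8 on 2 degrees of freedom), and the critical value written to the log should match the 5.99 quoted above.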

Distributions Supported in PROC GLIMMIX
- Discrete
  - Binary
  - Binomial
  - Poisson
  - Geometric
  - Negative Binomial
  - Multinomial (nominal and ordinal)
- Continuous
  - Beta
  - Normal
  - "Lognormal"
  - Gamma
  - Exponential
  - Inverse Gaussian
  - Shifted T
- Distributions are specified through the DIST= (and LINK=) options on the MODEL statement.
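As a rough illustration of how the DIST= and LINK= options are written on the MODEL statement, the sketch below fits a Poisson model with a log link and a binomial model with a logit link. The data set mydata and the variables count, y, n, trt, and block are hypothetical placeholders, not data from this lecture.

```sas
/* Count response: Poisson distribution with log link */
proc glimmix data=mydata;
  class block trt;
  model count = trt / dist=poisson link=log solution;
  random block;                      /* random block effect */
run;

/* Events/trials response: binomial distribution with logit link */
proc glimmix data=mydata;
  class block trt;
  model y/n = trt / dist=binomial link=logit solution;
  random block;
run;
```

If LINK= is omitted, the procedure uses the default link for the chosen distribution, so LINK= is only needed when a different link is wanted.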

GLIMMIX overview
- In the absence of random effects, the GLIMMIX procedure fits generalized linear models (as fit by the GENMOD procedure).
- GLMMs are useful for:
  - estimating trends in disease rates,
  - modeling CD4 counts in a clinical trial over time,
  - modeling the proportion of infected plants on experimental units in a design with randomly selected treatments or randomly selected blocks,
  - predicting the probability of high ozone levels in counties,
  - modeling skewed data over time,
  - analyzing customer preference,
  - joint modeling of multivariate outcomes, etc.

GLIMMIX overview
- The SAS syntax for PROC GLIMMIX is similar to what we have learned for PROC MIXED.
  - This includes the CLASS, MODEL, and RANDOM statements.

PROC GLIMMIX features
- SUBJECT= and GROUP= options, which enable blocking of variance matrices and parameter heterogeneity
- Linear unbiased predictors
- Flexible covariance structures for random and residual random effects, including variance components, unstructured, autoregressive, and spatial structures
- The CONTRAST, ESTIMATE, LSMEANS, and LSMESTIMATE statements, which produce hypothesis tests and estimable linear combinations of effects
- The NLOPTIONS statement, which enables you to exercise control over the numerical optimization. You can choose techniques, update methods, line search algorithms, convergence criteria, and more. Or you can choose the default optimization strategies selected for the particular class of model you are fitting.

PROC GLIMMIX features
- Computed variables with SAS programming statements inside of PROC GLIMMIX (except for variables listed in the CLASS statement). These computed variables can appear in the MODEL, RANDOM, WEIGHT, or FREQ statement.
- User-specified link and variance functions
- Choice of model-based variance-covariance estimators for the fixed effects or empirical (sandwich) estimators, to make the analysis robust against misspecification of the covariance structure and to adjust for small-sample bias
- Joint modeling for multivariate data. For example, you can model binary and normal responses from a subject jointly and use random effects to relate (fuse) the two outcomes.
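To illustrate computed variables, the sketch below creates a log-transformed covariate with a programming statement inside PROC GLIMMIX and then uses it in the MODEL statement. The data set mydata and the variables count, dose, trt, and block are hypothetical; this is a minimal sketch, not an analysis from the lecture.

```sas
proc glimmix data=mydata;
  class block trt;
  logdose = log(dose);        /* computed inside the procedure; not a CLASS variable */
  model count = trt logdose / dist=poisson solution;
  random block;
run;
```

Because logdose is created within the procedure, no separate DATA step is needed to add it to the input data set.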

Comparing the GLIMMIX and MIXED Procedures
- The MIXED procedure is related to the GLIMMIX procedure in the following sense:
  - Linear mixed models are a special case in the family of generalized linear mixed models.
  - A linear mixed model is a generalized linear mixed model in which the conditional distribution is normal and the link function is the identity function.
- Most models that can be fit with the MIXED procedure can also be fit with the GLIMMIX procedure.
- Despite this overlap in functionality, there are also some important differences between the two procedures.
- Knowing these differences enables you to select the most appropriate tool in situations where you have a choice between procedures and to identify situations where a choice does not exist.

Comparing the GLIMMIX and MIXED Procedures
The following REPEATED statement in the MIXED procedure
repeated / subject=id type=ar(1);
is equivalent to the following Random statement in the GLIMMIX procedure:
random _residual_ / subject=id type=ar(1);
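For context, the two statements above might appear in complete procedure calls as follows. This is a sketch under stated assumptions: the data set long and the variables y, trt, and id are hypothetical, and the rows are assumed to be sorted by id and in time order within id so that the AR(1) structure applies to consecutive measurements.

```sas
/* Hypothetical repeated-measures data: one row per subject (id) per occasion */
proc mixed data=long;
  class id trt;
  model y = trt / solution;
  repeated / subject=id type=ar(1);            /* R-side AR(1) structure */
run;

proc glimmix data=long;
  class id trt;
  model y = trt / solution;                    /* no DIST=: normal response, identity link */
  random _residual_ / subject=id type=ar(1);   /* same R-side structure */
run;
```

With a normal response and identity link, the two calls describe the same linear mixed model, so their estimates should essentially agree.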

Syntax: GLIMMIX Procedure
You can specify the following statements in the GLIMMIX procedure:
  PROC GLIMMIX <options> ;
  BY variables ;
  CLASS variables ;
  CONTRAST 'label' contrast-specification <, contrast-specification> <, ...> </ options> ;
  COVTEST <'label'> <test-specification> </ options> ;
  EFFECT effect-specification ;
  ESTIMATE 'label' contrast-specification <(divisor=n)> <, 'label' contrast-specification <(divisor=n)>> <, ...> </ options> ;
  FREQ variable ;
  ID variables ;
  LSMEANS fixed-effects </ options> ;
  LSMESTIMATE fixed-effect <'label'> values <divisor=n> <, <'label'> values <divisor=n>> <, ...> </ options> ;
  MODEL response<(response-options)> = <fixed-effects> </ model-options> ;
  MODEL events/trials = <fixed-effects> </ model-options> ;
  NLOPTIONS <options> ;
  OUTPUT <OUT=SAS-data-set> <keyword<(keyword-options)> <=name>> ... <keyword<(keyword-options)> <=name>> </ options> ;
  PARMS (value-list) ... </ options> ;
  RANDOM random-effects </ options> ;
  WEIGHT variable ;
  Programming statements ;
- The CLASS, CONTRAST, COVTEST, EFFECT, ESTIMATE, LSMEANS, LSMESTIMATE, and RANDOM statements and the programming statements can appear multiple times.
- The PROC GLIMMIX and MODEL statements are required, and the MODEL statement must appear after the CLASS statement if a CLASS statement is included.
- The EFFECT statements must appear before the MODEL statement.

Comparing MIXED and GLIMMIX
PROC GLIMMIX               PROC MIXED
BY                         BY
CLASS                      CLASS
CONTRAST                   CONTRAST
EFFECT
ESTIMATE                   ESTIMATE
FREQ
ID                         ID
LSMEANS                    LSMEANS
LSMESTIMATE
MODEL                      MODEL
NLOPTIONS
OUTPUT
PARMS                      PARMS
                           PRIOR
RANDOM                     RANDOM
                           REPEATED
WEIGHT                     WEIGHT
<Programming Statements>

Comparing MIXED and GLIMMIX
- MIXED uses the RANDOM statement for G-side effects and the REPEATED statement for R-side effects.
- GLIMMIX specifies both types of effects with the RANDOM statement.

Comparing MIXED and GLIMMIX
What are G- and R-side random effects?
Recall from mixed models: Y = X*Beta + Z*Gamma + E
- G-side effects enter through Z*Gamma.
- R-side effects apply to the covariance matrix of E.
- G-side effects are "inside" the link function, making them easier to interpret and understand.
- R-side effects are "outside" the link function and are more difficult to interpret.
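The distinction can be seen in how the RANDOM statement is written in PROC GLIMMIX. The sketch below contrasts a G-side random intercept with an R-side compound-symmetry structure for the same binomial model; the data set long and the variables y, n, trt, and id are hypothetical, and the choice of TYPE=CS is only for illustration.

```sas
/* G-side: a random intercept per subject enters the linear predictor (Z*Gamma) */
proc glimmix data=long;
  class id trt;
  model y/n = trt / dist=binomial link=logit solution;
  random intercept / subject=id;
run;

/* R-side: the _RESIDUAL_ keyword models the residual covariance within subject */
proc glimmix data=long;
  class id trt;
  model y/n = trt / dist=binomial link=logit solution;
  random _residual_ / subject=id type=cs;
run;
```

The first model gives subject-specific (conditional) interpretations of the fixed effects, whereas the second leaves the linear predictor free of random effects and places the correlation in the residual covariance.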
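
Glimmix Example
Proc glimmix data=one;
  /* classification (categorical) variables */
  Class treatment date site load;
  /* events/trials response: dead pigs out of pigs transported */
  Model deads/pigs_transported = treatment / dist=binomial link=logit solution;
  /* nested G-side random effects: site, date within site, load within date and site */
  Random site date(site) load(date*site);
  /* treatment LS-means: inverse-linked means, pairwise differences, confidence limits */
  LSMeans treatment / ilink pdiff cl;
Run;
Quit;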

Glimmix Example
The GLIMMIX Procedure
Model Information
Data Set                      WORK.ONE
Response Variable (Events)    Deads
Response Variable (Trials)    Pigs_Transported
Response Distribution         Binomial
Link Function                 Logit
Variance Function             Default
Variance Matrix               Not blocked
Estimation Technique          Residual PL
Degrees of Freedom Method     Containment

Glimmix Example
Class       Levels   Values
Treatment   2        Blue Red
Date        10       07/07/09 07/08/09 07/13/09 07/14/09 07/15/09
                     07/20/09 07/21/09 07/22/09 07/27/09 07/28/09
Site        2        L&L1 LPB
Load        27       1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
                     19 20 21 22 23 24 25 26 28

Number of Observations Read    54
Number of Observations Used    54
Number of Events               10
Number of Trials               4462

Glimmix Example
Dimensions
G-side Cov. Parameters    3
Columns in X              3
Columns in Z              40
Subjects (Blocks in V)    1
Max Obs per Subject       54

Glimmix Example
The GLIMMIX Procedure
Iteration History
Iteration  Restarts  Subiterations  Objective Function  Change      Max Gradient
0          0         1              180.8730287         2.00000000  9.588598
1          0         0              226.21287482        0.17907168  5.707842
2          0         3              244.93049605        2.00000000  4.510821
3          0         2              241.99123222        0.24664831  4.378435
4          0         2              241.22432004        0.03671922  4.357186
5          0         1              241.08063527        0.00328332  4.35531
6          0         1              241.06655367        0.00015363  4.355223
7          0         0              241.06587398        0.00000000  4.355221

Convergence criterion (PCONV=1.11022E-8) satisfied.
Estimated G matrix is not positive definite.

- Note: The "Estimated G matrix is not positive definite" message usually indicates that one or more variance components in the RANDOM statement are estimated to be zero and could be removed from the model.
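Following the note above, one possible next step is to refit the model with a simpler RANDOM statement. The sketch below keeps only the load(date*site) term, on the assumption that site and date(site) are the components estimated at zero (as the Covariance Parameter Estimates table for this example reports). Whether to drop terms that reflect the study design is a judgment call, so this is an illustration rather than a recommendation.

```sas
/* Refit keeping only the variance component with a nonzero estimate */
proc glimmix data=one;
  class treatment date site load;
  model deads/pigs_transported = treatment / dist=binomial link=logit solution;
  random load(date*site);             /* site and date(site) removed */
  lsmeans treatment / ilink pdiff cl;
run;
```

The denominator degrees of freedom, and therefore the tests and confidence limits, can change when random effects are removed, so the refitted output should be compared with the original.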

Glimmix Example
Fit Statistics
-2 Res Log Pseudo-Likelihood    241.07
Generalized Chi-Square          47.41
Gener. Chi-Square / DF          0.91

Glimmix Example
Covariance Parameter Estimates
Cov Parm          Estimate  Standard Error
Site              0         .
Date(Site)        0         .
Load(Date*Site)   0.1569    0.7068

Glimmix Example
Solutions for Fixed Effects
Effect     Treatment  Estimate  Standard Error  DF  t Value  Pr > |t|
Intercept             -5.9213   0.4160          1   -14.23   0.0447
Treatment  Blue       -0.4067   0.6466          26  -0.63    0.5348
Treatment  Red        0         .               .   .        .

Glimmix Example
Type III Tests of Fixed Effects
Effect     Num DF  Den DF  F Value  Pr > F
Treatment  1       26      0.40     0.5348

Glimmix Example
Treatment Least Squares Means
Treatment  Estimate  Standard Error  DF  t Value  Pr > |t|  Alpha  Lower    Upper    Mean
Blue       -6.3280   0.5066          26  -12.49   <.0001    0.05   -7.3692  -5.2867  0.001782
Red        -5.9213   0.4160          26  -14.23   <.0001    0.05   -6.7764  -5.0662  0.002675

Treatment Least Squares Means
Treatment  Standard Error Mean  Lower Mean  Upper Mean
Blue       0.000901             0.000630    0.005033
Red        0.001110             0.001139    0.006267
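The Mean columns above come from the ILINK option, which applies the inverse of the logit link to the estimates on the logit scale. The DATA step below is a small sketch that redoes this by hand from the printed estimates; the data set and variable names are illustrative, and the standard error on the mean scale is computed with the delta method, which is assumed here to be what the procedure reports.

```sas
/* Back-transform the logit-scale LS-means to the probability scale */
data ilink_check;
  input treatment $ estimate stderr lower upper;
  mean        = 1 / (1 + exp(-estimate));     /* inverse logit */
  lower_mean  = 1 / (1 + exp(-lower));
  upper_mean  = 1 / (1 + exp(-upper));
  stderr_mean = stderr * mean * (1 - mean);   /* delta-method approximation */
  datalines;
Blue -6.3280 0.5066 -7.3692 -5.2867
Red  -5.9213 0.4160 -6.7764 -5.0662
;
run;

proc print data=ilink_check noobs;
  var treatment mean stderr_mean lower_mean upper_mean;
  format mean stderr_mean lower_mean upper_mean 8.6;
run;
```

Up to rounding of the printed inputs, the results should match the Mean, Standard Error Mean, Lower Mean, and Upper Mean columns, for example a mean of about 0.0018 for Blue.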

Glimmix Example
Differences of Treatment Least Squares Means
Treatment  _Treatment  Estimate  Standard Error  DF  t Value  Pr > |t|  Alpha  Lower    Upper
Blue       Red         -0.4067   0.6466          26  -0.63    0.5348    0.05   -1.7358  0.9224
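Because the model uses a logit link, the Blue minus Red difference above is a log odds ratio. The sketch below exponentiates the printed estimate and its confidence limits to express the comparison as an odds ratio; it is an added illustration using only the numbers in the table, with illustrative variable names.

```sas
/* Convert the logit-scale difference (Blue - Red) to an odds ratio */
data _null_;
  estimate = -0.4067;
  lower    = -1.7358;
  upper    =  0.9224;
  odds_ratio = exp(estimate);   /* odds of death, Blue relative to Red */
  or_lower   = exp(lower);
  or_upper   = exp(upper);
  put odds_ratio= 6.3 or_lower= 6.3 or_upper= 6.3;
run;
```

Since the interval on the logit scale contains 0, the odds-ratio interval contains 1, which is consistent with the nonsignificant treatment test (Pr > |t| = 0.5348).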