new ways to analyse counts and proportions from complex ... · regression guide figure 3.15....

New ways to analysecounts and proportions

from complex investigationsa practical introduction to HGLMs

Wageningen, 18th June 2008

Roger PayneVSN International, 5 The Waterhouse,

Waterhouse Street, Hemel Hempstead, UK

Some history• least squares

• Gauss (1809), Legendre (1805)• analysis of variance

• Fisher (1918) The correlation between relatives on the supposition of Mendelianinheritence. Trans. Roy. Soc. Edinb.

• transformations• Fisher (1915) Frequency distribution of the values of the correlation coefficient in

samples from an infinitely large population. Biometrika• Bartlett (1947) The use of transformations. Biometrics

• probit analysis• Fisher (1924) Case of zero survivors in probit assay. Ann.Appl.Biol.• Finney Probit Analysis (1947, 1964, 1971)

• generalized linear models• Nelder & Wedderburn (1972) Generalized linear models. J. Roy. Statist. Soc. A

• generalized linear mixed models• Schall, Estimation in GLMs with random effects. Biometrika• Gilmour et al., 1985, Biometrika• Engel & Keen (1994). A simple approach for the analysis of generalized linear

mixed models. Statistica Neerlandica

• hierarchical generalised linear models• Lee & Nelder (1996) Hierarchical generalised linear models. J. Roy. Statist. Soc. A

..

Generalized linear models• extend the usual regression framework to cater for

non-Normal distributions e.g.•Poisson for counts – number of items sold in a shop,

or numbers of accidents on a road, number of fungal spores on plants etc

•binomial data recording r “successes” out of n trials – surviving patients out of those treated, weeds killed out of those sprayed etc

• also incorporate a link function defining the transformation required for a linear model e.g.•log with Poisson data (counts)•logit, probit or complementary log-log for binomial data

• but, once the distribution and link are defined, you can fit them just like other regression models

..

e.g. Regression Guide Figure 3.14• comparison of the

effectiveness of three analgesic drugs to a standard drug, morphine (Finney, Probit analysis, 3rd Edition 1971, p.103)

• 14 groups of mice were tested for response to the drugs at a range of doses

• N is total number of mice in each group

• R is number responding• need to calculate

LogDose = LOG10(Dose)

..

Regression Guide Figure 3.15

Generalized linear models

• ordinary regression - model y = μ + εμ is the mean predicted by a model μ = X β

e.g. a + b × x

ε is the residual with Normal distribution N(0, σ2)

y (equivalently) has Normal distribution N(μ, σ2)

• generalized linear model – still E(y) = μbut model now defines the linear predictor η = X β

μ & η are related by the link function η = g(μ)

y has distribution from the exponential family (mean μ)

e.g. binomial, gamma, Normal, Poisson

• references• Guide to GenStat, Part 2 Statistics, Section 3.5.• McCullagh, P. & Nelder, J.A. (1989). Generalized Linear

Models (second edition). Chapman & Hall, London.

..

Hierarchical generalized linear models

• expected value E(y) = μlink η = g(μ)distribution Normal, Binomial, Poisson or Gamma (from exponential family)

• but linear predictor η = X β + ∑i Zi νinow contains additional vectors of random effects νi with Normal, beta, gamma or inverse gamma distributions and with their own link functions

• Normal-identity gives a GLMM but HGLM algorithms use much improved Laplace approximations in their use of adjusted profile likelihood

• references• Guide to GenStat, Part 2 Statistics, Section 3.5.11.• Lee, Y. & Nelder, J.A. (1996, 1998, 1999, 2001, 2006..). • Lee, Y., Nelder, J.A. & Pawitan, Y. (2006). Generalized

Linear Models with Random Effects: Unified Analysis via H-likelihood. Chapman & Hall.

..

HGLMs – overview

• Hierarchical generalized linear models• extend generalized linear models to >1 source of error• include generalized linear mixed models as a special case

• but the additional random terms are not constrained to follow a Normal distribution, nor to have an identity link

• allow for modelling of the dispersion of the error terms• extending quasi-likelihood methods of Nelder & Pregibon (1987)

• have an efficient fitting algorithm• no numerical integration is required

• are explained in the book Generalized Linear Models with Random Effects: Unified Analysis via H-likelihood by Lee, Nelder & Pawitan (2006)

• examples available in GenStat for Windows 9th Edition onwards

• Hierarchical generalized nonlinear models• include nonlinear parameters in the HGLM fixed model

• in GenStat for Windows 10th Edition

..

Fitting algorithm

• interconnected Normal and Gamma GLMs (§5.4.4)

HGLMs in GenStat

• procedures (Payne, Lee, Nelder & Noh 2008)• HGFIXEDMODEL – defines the fixed model for an HGLM or DHGLM• HGRANDOMMODEL – defines the random model for an HGLM• HGDRANDOMMODEL – adds random terms into the dispersion models

of an HGLM, so that the whole model becomes a DHGLM• HGNONLINEAR – defines nonlinear parameters for the fixed model• HGANALYSE – fits a hierarchical generalized linear model (HGLM) or a

double hierarchical generalized linear model (DHGLM)• HGDISPLAY – displays results from an HGLM or DHGLM• HGPLOT – produces model-checking plots for an HGLM or DHGLM• HGPREDICT – forms predictions from an HGLM or DHGLM analysis• HGKEEP – saves information from an HGLM or DHGLM analysis• HGGRAPH – plots predictions from an HGLM or DHGLM analysis• HGWALD – gives Wald tests for fixed terms that can be dropped

• menus• Stats | Regression Analysis | Mixed Models | Hierarchical Generalized

Linear Models• cover the standard situations, but not dispersion modelling nor DHGLMs..

GenStat HGLM examples

Example – chocolate cakes

• LNP §5.5• breaking angle of

chocolate cakes• split plot:

Replicate/Batch/Cake• treatment factors:

Recipe (whole-plot factor, between Batches), Temperature (sub-plot factor, within Batches)

• analyse as a Normal-Normal HGLM to compare with familiar REML

..

HGLM menu – for cakes

• find menu in Mixed models section of Regression Analysis on Stats menu

Output: Normal-Normal HGLM

mean model

dispersion model here just fits variance components

estimates of parameters in the mean model

..

Output: Normal-Normal HGLMestimated parameters in mean model (continued) fixed terms only by default

parameters in dispersion models (logged variance components)

assess random & dispersion models by –2×Pβ,v(h)fixed model by –2×Pv(h)for DIC use –2×(h/v)h-likelihood of mean model is –2×(h)EQD's are approximations to profile likelihoods ..

Compare to REML

Assess fixed model using –2 pν(h)

Conjugate HGLMs (LNP §6.2)• random distribution is the conjugate of the GLM distribution

• Normal – Normal most obvious• Poisson – Gamma most useful?• Binomial – Beta next most useful?• Gamma – Inverse Gamma

• algorithmically and intuitively appealing• random parameters are on the canonical scale (LNP §4.5)• contribution of the random parameters to the extended

likelihood (Ξ the h-likelihood) has the same form as the likelihood of the base GLM

• so it can easily be fitted together with the base GLM in the augmented mean model (same variance function, same iterative reweighting scheme etc...)

• likelihood factorizes so no need for numerical integration or approximations

..

Non-conjugate HGLMs• random distribution is conjugate of another GLM distribution

• Poisson – Normal Poisson GLMM• Binomial – Normal Binomial GLMM• Gamma – Normal Gamma GLMM

• algorithmically more difficult, but can still be fitted within the GLM framework (LNP §6.4)

• random parameters no longer on the canonical scale• use extended likelihood to estimate random parameters• use adjusted profile likelihood to estimate fixed parameters• but enhanced Laplace approximations available (Noh & Lee 06)

• augmented mean model now has different GLMs for the base GLM and the augmented units

• fitted using procedure RMGLM

..

Birds in Tasmania• HGLM

• base GLM – Poisson distribution, Log link• random terms – Gamma distribution, Log link• i.e. Poisson-Gamma conjugate HGLM

• random terms• site (Site)• treatment locations within site (SiteTreat)• sample plots within treatment locations (Plot)

• fixed terms• connected by habitat strips (Treatment)• vegetation type (Vegetation)• time of day (AM_vs_PM)

• data set used by Steve Candy, Forestry Tasmania, at the Workshop Extensions of Generalized Linear Models (Nelder, Payne & Candy) before the Australasian Genstat Conference, Surfers Paradise, 30 January 2001

..

Wild

life H

abita

t Stri

ps:

effe

ctive

ness

as n

ative

fore

st b

ird h

abita

t

WildlifeHabitatStrip ‘Treatment’

Sample point

Radiata Pine Plantation(1984)

Eucalypt Plantation(1987)

‘Control’Sample point

Retreat South Aerial Photography (1999)

Vegetation * Treatment * AM_vs_PM

mean model

dispersion model

estimates of parameters in the mean model

Vegetation * Treatment * AM_vs_PM

dispersion parameters

likelihood statistics

d.f.

scaled deviances

Wald tests

•no evidence of a 3-factor interaction..

Omit Vegetation.Treatment.AM_vs_PM

Wald tests and Change

• change deviance 2.60 on 2 d.f. for omitting Vegetation.Treatment.AM_vs_PM (c.f. Wald 2.58)

• next omit Vegetation.Treatment..

Omit Vegetation.Treatment


• change deviance 4.42 on 2 d.f. for omitting Vegetation.Treatment (c.f. Wald 4.49)

• next omit Vegetation.AM_PM..

Omit Vegetation.AM_PM


• change deviance 4.97 on 2 d.f. for omitting Vegetation.AM_PM (c.f. Wald 5.00)

• now study results..

Predicted means

Predicted means on natural scale

Predicted means treatment x time of day

Hierarchical generalized nonlinear models

• expected value E(y) = μlink η = g(μ)

distribution – Normal, Binomial, Poisson or Gamma (from exponential family)

linear predictor η = X β + ∑i Zi γirandom effects γi with either beta, Normal, gamma or inverse gamma distributions, and their own link functions

• nonlinear parameters in fixed terms in the linear predictor • X β = ∑ xi βi• but now some xi's are nonlinear functions of explanatory

variables and parameters that are to be estimated

• extension of generalized nonlinear models of Lane (1996)• constraint – available only for conjugate HGLM's..

Implementation – interlinked GLMs• fit nonlinear parameters by maximizing h-likelihood

of augmented mean model

• Hooded Parrot (Psephotus dissimilis)

• grass parrot in Northern Territory of Australia

• nests in termite mounds• nests also inhabited by

moth larvae that feed on nestling waste

Acknowledgement: S CooneyAustralian National University, Canberrahttp://www.anu.edu.au/BoZo/stuart/HPP.htm

Growth of Hooded Parrot nestlings

• investigate relationship between parrot and moth• beneficial, commensal or parasitic

• 41 nests located and monitored• each brood had between 1-7 chicks• treatments randomized to nests

• moth larvae left or experimentally removed from nest• weight of chicks measured over time• growth modelled over time by logistic curve

• weight = a + c / (1 + exp{–b × (age – m)})• model linear in a and c, nonlinear in b and m

• fit as HGNLM because• brood is a random effect• treatments are applied to complete broods

..

Growth of Hooded Parrot nestlings

Initial values from logistic curve

HGNLM, common A, B, C and M

HGNLM, common B, C and M

HGNLM, common B and M

HGNLM, common M

HGNLM, different A, B, C and M

Likelihood statistics

•no evidence of any treatment effects..

Standard curve (ignoring brood)

• suggestion of a treatment effect..

GNLM with Brood as a fixed term

• significant (but misleading) treatment effects..

Conclusions

• HGLMs (DHGLMs & HGNLMs) provide effective & appropriate models for non-Normal data with several sources of error

• the algorithms (and their GenStat implementation) are very efficient – and convenient to use

• the methodology is described in the book• Lee, Y., Nelder, J.A. & Pawitan, Y. (2006). Generalized Linear Models

with Random Effects: Unified Analysis via H-likelihood. CRC Press.

• the book examples are accessible in GenStat for Windows9th Editions onwards

• there are many recent extensions (+ corrections)• including HGNLMs, in GenStat for Windows 10th Edition• Wald tests and plots of predicted means in the 11th Edition

..

new ways to analyse counts and proportions from complex ... · regression guide figure 3.15....

Documents