lme4: interface, testing, and community issues
Ben Bolker, McMaster University, Departments of Mathematics & Statistics and Biology
15 April 2014
Outline
Introduction
Interface issues
User guidance
Testing
Future directions
lme4
- R package for mixed models
- linear, generalized, nonlinear
- speed and generality
- alternatives (also see http://glmm.wikidot.com/pkg-comparison)
  - R: MCMCglmm, glmmADMB, hglm, others
  - other: AD Model Builder, Stata (GLLAMM, xtmixed, xtmelogit), ASReml, MLwiN, HLM, SAS PROC GLIMMIX/MIXED/NLMIXED, NIMBLE (http://www.slideshare.net/dlebauer/de-valpine-nimble)
  - Bayesian frameworks: INLA, BUGS (JAGS: glm module), Stan
Features
- formula interface
- scalar and vector random effects
- GLMMs: basic + user-specified family/link functions
- extract deviance function
- standard accessors: fixed and random coefficients, residuals, etc.
- predict and simulate methods
- likelihood profiling and parametric bootstrapping
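A minimal sketch of these features on the sleepstudy data that ships with lme4. The model formula is the one used later in the deck; the object names (fm, devfun, pred, sims, ci) are illustrative, and argument names such as re.form assume a reasonably recent lme4 release.

```r
library(lme4)

# Formula interface: correlated random intercept and slope for each Subject
fm <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# Standard accessors
beta <- fixef(fm)       # fixed-effect coefficients
b    <- ranef(fm)       # conditional modes of the random effects
res  <- residuals(fm)   # one residual per observation

# Extract the deviance (here: REML criterion) as a function of theta
devfun <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy,
               devFunOnly = TRUE)
crit_at_fit <- devfun(getME(fm, "theta"))

# Population-level predictions (re.form = NA drops the random effects)
pred <- predict(fm, newdata = data.frame(Days = 0:2), re.form = NA)

# Simulate two new response vectors from the fitted model
sims <- simulate(fm, nsim = 2, seed = 1)

# Profile-likelihood confidence intervals for the fixed effects
# (slower than Wald intervals, since the model is refitted repeatedly)
ci <- suppressMessages(confint(fm, parm = "beta_", method = "profile"))
```

Parametric bootstrapping follows the same pattern through bootMer(), which is left out here only to keep the run time short.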
Downstream packages
afex
agridat
AICcmodavg
aod
aods3
arm
BayesFactor
Bayesthresh
BBRecapture
benchmark
blme
boss
BradleyTerry2
car
catdata
clusterPower
DAAG
difR
dlnm
doBy
effects
expp
ez
flexmix
gamm4
glmulti
gmodels
GWAF
HLMdiag
HSAUR
HSAUR2
influence.ME
irtrees
kulife
kyotil
languageR
lava
lme4
LMERConvenienceFunctions
lmerTest
longpower
lsmeans
mediation
MEMSS
metafor
Metatron
MethComp
mi
mice
miceadds
mixAK
mixlm
MixMAP
mlmRev
MPDiR
multcomp
multiDimBio
MuMIn
NanoStringNorm
nonrandom
ordinal
pamm
pan
papeR
PBImisc
pbkrtest
pedigreemm
phia
phmm
polytomous
prLogistic
R2admb
R2STATS
RcmdrPlugin.NMBU
refund
RLRsim
robustlmm
RVAideMemoire
SASmixed
sirt
spacom
SPOT
Surrogate
texreg
TripleR
ZeligMultilevel
Challenges
- Wide range of users/developers
- Evolving goals
- R is a hacker language . . .
- choice of object-orientation systems (S3/S4/ref class)
fortunes::fortune(121)
Rolf Turner: If you want to simultaneously handcuff yourself, strap yourself into a strait jacket, and tie yourself in knots, and moreover write code which is incomprehensible to the human mind, then S4 methods are indeed the way to go.
Goals
- Simplicity for end-users (formula interface)
- Flexibility for downstream developers (modular chunks)
  - wrappers (ez, afex)
  - inference and diagnostics (pbkrtest, lmerTest)
  - extended models (pedigreemm, blme)
- Modularity for core development/maintenance
- Stability
Layers
i. linear algebra: RcppEigen/CHOLMOD
ii. PWRSS/PIRLS computations
iii. nonlinear optimization
iv. API/formula interface, higher-level functions (profiling, bootstrap, etc.)
Modular structure
(g)lFormula: formula plus data → model elements (model frame, X, reTrms = {Zt, Lambdat, Lind, . . . })
mk(Gl|L)merDevfun: model elements → deviance function (layers i and ii)
optimize(Gl|L)mer: deviance function + starting conditions → estimates of θ and β (layer iii)
mkMerMod: optimization results → merMod object
getME: general-purpose accessor function
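As an illustration of getME, the sketch below pulls the model elements named above out of a fitted model. The component names ("X", "Zt", "Lambdat", "Lind", "theta") are the ones getME documents; the variable names and the sleepstudy fit are only for illustration.

```r
library(lme4)

fm <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

X       <- getME(fm, "X")        # fixed-effects model matrix
Zt      <- getME(fm, "Zt")       # transposed random-effects model matrix (sparse)
Lambdat <- getME(fm, "Lambdat")  # transposed relative covariance factor (sparse)
Lind    <- getME(fm, "Lind")     # maps elements of theta into Lambdat
theta   <- getME(fm, "theta")    # covariance parameters

dim(X)
dim(Zt)

# The nonzero entries of Lambdat are theta, repeated according to Lind
lambda_ok <- isTRUE(all.equal(Lambdat@x, unname(theta[Lind])))
```

The last line shows why Lind matters: the optimizer only ever sees the short vector theta, and Lind says where each element lands in the large sparse factor.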
Modularity in action
library(lme4)

# formula plus data -> model elements
lmod <- lFormula(Reaction ~ Days + (Days | Subject),
                 sleepstudy)
names(lmod)
## [1] "fr" "X" "reTrms" "REML"
## [5] "formula"
# model elements -> deviance function
devfun <- do.call(mkLmerDevfun, lmod)
# deviance function -> parameter estimates
(opt <- optimizeLmer(devfun))
## parameter estimates: 0.967 0.0152 0.231
## objective: 1744
## number of function evaluations: 98
# optimization results -> merMod object
result <- mkMerMod(environment(devfun), opt, lmod$reTrms,
                   fr = lmod$fr)
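One way to check that the modular route is equivalent to the usual one-step call is to compare the two fits directly. This is a sketch: the tolerance is an arbitrary choice, and the object names modular and direct are illustrative.

```r
library(lme4)

# Modular fit, step by step
lmod    <- lFormula(Reaction ~ Days + (Days | Subject), sleepstudy)
devfun  <- do.call(mkLmerDevfun, lmod)
opt     <- optimizeLmer(devfun)
modular <- mkMerMod(environment(devfun), opt, lmod$reTrms, fr = lmod$fr)

# Ordinary one-step fit of the same model
direct <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)

# Compare fixed effects and covariance parameters within a loose tolerance
same_beta  <- isTRUE(all.equal(fixef(modular), fixef(direct),
                               tolerance = 1e-4))
same_theta <- isTRUE(all.equal(getME(modular, "theta"),
                               getME(direct, "theta"),
                               tolerance = 1e-4))
c(beta = same_beta, theta = same_theta)
```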
Fit with pseudo-fixed effects
lmod2 <- lFormula(Reaction ~ Days + (1 | Subject) +
                  (0 + Days | Subject), sleepstudy)
devfun2 <- do.call(mkLmerDevfun, lmod2)
# hold the first element of theta at 20; optimize only the second
tmpf <- function(th) devfun2(c(20, th))
minqa::bobyqa(par = 1, fn = tmpf, lower = 0)
## parameter estimates: 0.248
## objective: 1824
## number of function evaluations: 22
Is it working?
- most downstream packages successfully ported to v. 1.0
- most users weaned from @ accessors (?)
- development seems easier
- will we be able to make large internal changes?
Design issues
- Prevent/warn of silly usage
  - Unidentifiable models (e.g. rank-deficient, single level per random effect)
  - Ill-advised models (e.g. small number of random effect levels)
- Prevent/warn of "bad" fits
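A small sketch of a pre-fit check in action, using made-up data in which every observation has its own level of the grouping factor, so the random-effect variance cannot be separated from the residual variance. The data frame d and the seed are purely illustrative; current lme4 releases are expected to stop with an error here, but the exact wording of the message may differ between versions.

```r
library(lme4)

set.seed(101)
# Hypothetical data: 10 observations, 10 groups (one observation per level)
d <- data.frame(y = rnorm(10), g = factor(1:10))

# Catch the condition instead of letting it abort the script
res <- tryCatch(lmer(y ~ 1 + (1 | g), data = d),
                error = function(e) e)

inherits(res, "error")   # did the pre-fit check stop the fit?
conditionMessage(res)    # the explanation shown to the user
```

The strictness of such checks can be adjusted through lmerControl(), which is how the trade-off between protecting users and allowing unusual use cases is exposed.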
Recent changes
- v. ???: move from nlminb to other default optimizers (no more "false convergence" warnings)
- v. ???: introduce pre-fit checking
- v. 1.0-1: loosen pre-fit checks
- v. 1.0-5: introduce convergence checks
- soon: loosen/restructure convergence checks (use relative rather than absolute gradients)
Open questions: gradient, Hessian calculations?
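To make the absolute versus relative gradient distinction concrete, the sketch below reads the finite-difference derivatives that recent lme4 versions store with the fit and scales the gradient by the Hessian. The location fm@optinfo$derivs is an assumption about the internal layout of current releases, not a stable interface, and the variable names are illustrative.

```r
library(lme4)

fm <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# Derivatives of the objective at the estimates
# (assumed to be stored here by recent lme4 versions)
derivs <- fm@optinfo$derivs

# Absolute criterion: largest gradient component
abs_grad <- max(abs(derivs$gradient))

# Relative criterion: gradient scaled by the curvature (Hessian),
# which makes it insensitive to the scale of the objective
rel_grad <- max(abs(solve(derivs$Hessian, derivs$gradient)))

c(absolute = abs_grad, relative = rel_grad)
```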
Problems
- Computational overhead (e.g. rank-checking)
- Unusual use cases
- Detecting and identifying fitting problems
Model use issues
- Inference for mixed models is tough (e.g. the great degrees-of-freedom debate)
- Ethics: should you provide questionable, imperfect, or poorly understood methods? (e.g. Wald intervals; standard errors on predictions)
- . . . or should you let your users flounder?
Roz Chast
Testing
- Computational core is all floating-point
- Small differences between platforms, compilers, etc. . . .
- . . . but there are many unstable cases
- have to go beyond unit tests
- test examples are large, slow, and sometimes confidential
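Because results differ slightly across platforms, numerical tests are usually written against a tolerance rather than exact equality. The sketch below shows the idea; the reference values are the commonly reported, rounded fixed effects for this sleepstudy model, and the tolerance of 1e-4 is an arbitrary illustrative choice.

```r
library(lme4)

# Exact comparison of floating-point results is fragile
0.1 + 0.2 == 0.3                    # FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE: equal within tolerance

fm <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# Rounded reference values for (Intercept) and Days
ref_beta <- c(251.405, 10.467)

# Tolerance-based check that should pass across platforms and compilers
beta_ok <- isTRUE(all.equal(unname(fixef(fm)), ref_beta, tolerance = 1e-4))
stopifnot(beta_ok)
```

Choosing the tolerance is the hard part: too tight and the test fails on a different BLAS or compiler, too loose and it misses real regressions in the unstable cases.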
Model extensions
- non-linear fitting (present but underdeveloped)
- negative binomial, zero-inflated models: EM/iterative algorithms or add to level III
- flexible variance structures: flexLambda branch
- structure in residuals ("R-side")
Open questions
- restore post hoc MCMC sampling? other (faster) methods for inference and
- limitations of formula interface
- how important is GHQ?
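One rough way to get a feel for the GHQ question on a given data set is to fit the same GLMM with the default Laplace approximation and with adaptive Gauss-Hermite quadrature, then compare the estimates. The sketch uses the cbpp data shipped with lme4; the choice of 9 quadrature points and the object names are illustrative only.

```r
library(lme4)

form <- cbind(incidence, size - incidence) ~ period + (1 | herd)

# Laplace approximation (the default, nAGQ = 1)
gm_laplace <- glmer(form, data = cbpp, family = binomial)

# Adaptive Gauss-Hermite quadrature with 9 points
# (available only for models with a single scalar random effect)
gm_ghq <- glmer(form, data = cbpp, family = binomial, nAGQ = 9)

# Side-by-side fixed-effect estimates
est <- cbind(Laplace = fixef(gm_laplace), GHQ = fixef(gm_ghq))
est
```

If the two columns agree closely, as is often the case for data like these, the extra cost of quadrature buys little; larger differences tend to show up with few observations per group or binary responses.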
The really big picture
- Switch to Julia, or ??
- Commodities
  - Computational linear algebra
  - Nonlinear optimizers
- Language-switching: interface friction
- Advantages of established framework