
The msm Package

November 23, 2006

Version 0.7

Date 2006-11-21

Title Multi-state Markov and hidden Markov models in continuous time

Author Christopher Jackson <[email protected]>

Maintainer Christopher Jackson <[email protected]>

Description Functions for fitting general continuous-time Markov and hidden Markov multi-state models to longitudinal data. Both Markov transition rates and the hidden Markov output process can be modelled in terms of covariates. A variety of observation schemes are supported, including processes observed at arbitrary times, completely-observed processes, and censored states.

License GPL version 2 or newer

R topics documented:

MatrixExp
aneur
boot.msm
bos
coef.msm
crudeinits.msm
deltamethod
ematrix.msm
fev
hazard.msm
heart
hmm-dists
logLik.msm
medists
msm
odds.msm
pexp
plot.msm
pmatrix.msm
pmatrix.piecewise.msm
prevalence.msm
psor
qmatrix.msm
qratio.msm
sim.msm
simmulti.msm
sojourn.msm
statetable.msm
msm.summary
surface.msm
tnorm
totlos.msm
transient.msm
viterbi.msm
Index

MatrixExp Matrix exponential

Description

Calculates the exponential of a square matrix.

Usage

MatrixExp(mat, t = 1, n = 20, k = 3, method="pade")

Arguments

mat A square matrix

t An optional scaling factor for the eigenvalues of mat

n Number of terms in the series approximation to the exponential

k Underflow correction factor, for the series approximation

method "pade" for the Pade approximation, or "series" for the power series approximation.


Details

The exponential E of a square matrix M is calculated as

$$E = U \exp(D) U^{-1}$$

where D is a diagonal matrix with the eigenvalues of M on the diagonal, exp(D) is a diagonal matrix with the exponentiated eigenvalues of M on the diagonal, and U is a matrix whose columns are the eigenvectors of M.

This method of calculation is used if M has distinct eigenvalues. If M has repeated eigenvalues, then its eigenvector matrix may be non-invertible. In this case, the matrix exponential is calculated using the Pade approximation defined by Moler and van Loan (2003), or the less robust power series approximation,

$$\exp(M) = I + M + M^2/2! + M^3/3! + M^4/4! + \dots$$

For a continuous-time homogeneous Markov process with transition intensity matrix Q, the probability of occupying state s at time u + t conditional on occupying state r at time u is given by the (r, s) entry of the matrix exp(tQ).
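The two calculations described above can be sketched in base R. This is an illustration only, not the package's MatrixExp source: the helper names expm_eigen and expm_series and the intensity matrix Q are assumptions made for the example, and the series version omits the underflow correction controlled by k.

```r
# Hypothetical helpers for illustration; not the package's MatrixExp code.
expm_eigen <- function(M, t = 1) {
  e <- eigen(M * t)
  U <- e$vectors
  # E = U exp(D) U^{-1}; only valid when the eigenvalues are distinct
  E <- U %*% diag(exp(e$values), nrow = nrow(M)) %*% solve(U)
  Re(E)
}

expm_series <- function(M, t = 1, n = 20) {
  A <- M * t
  term <- diag(nrow(A))   # current term A^k / k!, starting at k = 0
  total <- term
  for (k in seq_len(n)) {
    term <- term %*% A / k
    total <- total + term
  }
  total
}

# Three-state intensity matrix: rows sum to zero, state 3 is absorbing
Q <- rbind(c(-0.3,  0.2, 0.1),
           c( 0.1, -0.4, 0.3),
           c( 0.0,  0.0, 0.0))

# Transition probability matrices over t = 2 time units
P_eigen  <- expm_eigen(Q, t = 2)
P_series <- expm_series(Q, t = 2, n = 30)
```

Both results should agree to numerical precision, and each row of the result is a probability distribution over the destination states.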

The implementation of the Pade approximation was taken from JAGS by Martyn Plummer (http://www-fis.iarc.fr/~martyn/software/jags).

The series approximation method was adapted from the corresponding function in Jim Lindsey’s R package rmutil (http://popgen.unimaas.nl/~jlindsey/rcode.html).

Value

The exponentiated matrix exp(mat).

References

Cox, D. R. and Miller, H. D. The theory of stochastic processes, Chapman and Hall, London (1965)

Moler, C. and van Loan, C. (2003). Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later. SIAM Review 45, 3–49. At http://epubs.siam.org/sam-bin/dbq/article/41801

aneur Aortic aneurysm progression data

Description

This dataset contains longitudinal measurements of grades of aortic aneurysms, measured by ultrasound examination of the diameter of the aorta.

Usage

data(aneur)


Format

A data frame containing 4337 rows, with each row corresponding to an ultrasound scan from one of 838 men over 65 years of age.

ptnum (numeric) Patient identification number
age (numeric) Recipient age at examination (years)
diam (numeric) Aortic diameter
state (numeric) State of aneurysm.

The states represent successive degrees of aneurysm severity, as indicated by the aortic diameter.

State 1 Aneurysm-free < 30 mm
State 2 Mild aneurysm 30-44 mm
State 3 Moderate aneurysm 45-54 mm
State 4 Severe aneurysm ≥ 55 mm

683 of these men were aneurysm-free at age 65 and were re-screened every two years. The remaining men were aneurysmal at entry and had successive screens with frequency depending on the state of the aneurysm. Severe aneurysms are repaired by surgery.

Source

The Chichester, U.K. randomised controlled trial of screening for abdominal aortic aneurysms by ultrasonography.

References

Jackson, C.H., Sharples, L.D., Thompson, S.G., Duffy, S.W. and Couto, E. Multi-state Markov models for disease progression with classification error. The Statistician, 52(2): 193–209 (2003)

Couto, E., Duffy, S.W., Ashton, H.A., Walker, N.M., Myles, J.P., Scott, R.A.P. and Thompson, S.G. (2002) Probabilities of progression of aortic aneurysms: estimates and implications for screening policy. Journal of Medical Screening 9(1):40–42

boot.msm Bootstrap resampling for multi-state models

Description

Draw a number of bootstrap resamples, refit an msm model to the resamples, and calculate statistics on the refitted models.

Usage

boot.msm(x, stat=pmatrix.msm, B=500, file=NULL)


Arguments

x A fitted msm model, as output by msm.

stat A function to call on each refitted msm model. By default this is pmatrix.msm, returning the transition probability matrix in one time unit. If NULL then no function is computed.

B Number of bootstrap resamples.

file Name of a file in which to save partial results after each replicate. This is saved using save and can be restored using load, producing an object called boot.list containing the partial results.

Details

The bootstrap datasets are computed by resampling independent transitions between pairs of states (for non-hidden models without censoring), or independent patient series (for hidden models or models with censoring).

Confidence intervals or standard errors for the corresponding statistic can be calculated by summarising the returned list of B replicated outputs. This is currently implemented for the transition probability matrix, see pmatrix.msm. At the moment, for other outputs, users will have to write their own code to summarise the output of boot.msm.

Most of msm’s output functions present confidence intervals based on asymptotic standard errors calculated from the Hessian. These are expected to be underestimates of the true standard errors (Cramer-Rao lower bound). Bootstrapping may give a more accurate estimate of the uncertainty.

All objects used in the original call to msm which produced x, such as the qmatrix, should be in the working environment, or else boot.msm will produce an “object not found” error. This enables boot.msm to refit the original model to the replicate datasets.

If stat is NULL, then B different msm model objects will be stored in memory. This is inadvisable: msm objects tend to be large because they contain the original data used for the msm fit, so storing B of them is wasteful of memory.

To specify more than one statistic, write a function that returns a list of different function calls, for example,

stat = function(x) list (pmatrix.msm(x, t=1), pmatrix.msm(x, t=2))

Value

A list with B components, containing the result of calling function stat on each of the refitted models. If stat is NULL, then each component just contains the refitted model. If one of the B model fits was unsuccessful and resulted in an error, then the corresponding list component will contain the error message.
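Since the returned list may mix matrices with error messages, code that summarises it needs to skip the failed replicates. The following is a minimal sketch under stated assumptions: summarise_boot is a hypothetical helper, not part of msm, and the list fake is simulated to stand in for the output of boot.msm with a matrix-valued stat.

```r
# Hypothetical helper: summarise a list of bootstrap replicates, each a matrix
# of the same dimensions, skipping replicates that hold an error message.
summarise_boot <- function(reps, cl = 0.95) {
  ok <- vapply(reps, is.matrix, logical(1))
  reps <- reps[ok]
  arr <- array(unlist(reps), dim = c(dim(reps[[1]]), length(reps)))
  alpha <- (1 - cl) / 2
  list(n.used = length(reps),
       se     = apply(arr, c(1, 2), sd),
       lower  = apply(arr, c(1, 2), quantile, probs = alpha),
       upper  = apply(arr, c(1, 2), quantile, probs = 1 - alpha))
}

# Simulated stand-in for boot.msm output: 200 noisy 2 x 2 matrices
set.seed(1)
base <- matrix(c(0.8, 0.2, 0.3, 0.7), nrow = 2, byrow = TRUE)
fake <- lapply(1:200, function(i) base + matrix(rnorm(4, sd = 0.01), nrow = 2))
fake[[7]] <- "Error in optim: non-finite value"  # one failed refit

res <- summarise_boot(fake)
```

Sizing the array from length(reps) rather than a hard-coded count keeps the summary correct whatever value of B was used and however many refits failed.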

Author(s)

C.H.Jackson <[email protected]>

References

Efron, B. and Tibshirani, R.J. (1993) An Introduction to the Bootstrap, Chapman and Hall.


See Also

pmatrix.msm, totlos.msm

Examples

## Not run:
## Psoriatic arthritis example
data(psor)
psor.q <- rbind(c(0,0.1,0,0),c(0,0,0.1,0),c(0,0,0,0.1),c(0,0,0,0))
psor.msm <- msm(state ~ months, subject=ptnum, data=psor, qmatrix = psor.q,
                covariates = ~ollwsdrt+hieffusn,
                constraint = list(hieffusn=c(1,1,1),ollwsdrt=c(1,1,2)),
                control = list(REPORT=1,trace=2), method="BFGS")
## Bootstrap the baseline transition intensity matrix. This will take a long time.
q.list <- boot.msm(psor.msm, function(x)x$Qmatrices$baseline)
## Manipulate the resulting list of matrices to calculate bootstrap standard errors.
## The third array dimension is the number of replicates (B = 500 by default).
q.array <- array(unlist(q.list), dim=c(4,4,length(q.list)))
apply(q.array, c(1,2), sd)
## Similarly calculate a bootstrap 95% confidence interval
apply(q.array, c(1,2), function(x)quantile(x, c(0.025, 0.975)))
## Bootstrap standard errors are larger than the asymptotic standard errors calculated from the Hessian
psor.msm$QmatricesSE$baseline

## End(Not run)

bos Bronchiolitis obliterans syndrome after lung transplants

Description

A dataset containing histories of bronchiolitis obliterans syndrome (BOS) from lung transplant recipients. BOS is a chronic decline in lung function, often observed after lung transplantation. The condition is classified into four stages of severity: none, mild, moderate and severe.

Usage

data(bos)

Format

A data frame containing 638 rows, grouped by patient, including histories of 204 patients. The first observation for each patient is defined to be stage 1, no BOS, at six months after transplant. Subsequent observations denote the entry times into stages 2, 3, 4, representing mild, moderate and severe BOS respectively, and stage 5, representing death.

ptnum (numeric) Patient identification number
time (numeric) Months after transplant
state (numeric) BOS state entered at this time

Details

The entry time of each patient into each stage of BOS was estimated by clinicians, based on their history of lung function measurements and acute rejection and infection episodes. BOS is only assumed to occur beyond six months after transplant. In the first six months the function of each patient’s new lung stabilises. Subsequently BOS is diagnosed by comparing the lung function against the "baseline" value.

Source

Papworth Hospital, U.K.

References

Heng, D. et al. (1998). Bronchiolitis Obliterans Syndrome: Incidence, Natural History, Prognosis, and Risk Factors. Journal of Heart and Lung Transplantation 17(12):1255–1263.

coef.msm Extract model coefficients

Description

Extract the estimated log transition intensities and the corresponding linear effects of each covariate.

Usage

## S3 method for class 'msm':
coef(object, ...)

Arguments

object A fitted multi-state model object, as returned by msm.

... (unused) further arguments passed to or from other methods.

Value

If there is no misclassification, coef.msm returns a list of matrices. The first component, labelled logbaseline, is a matrix containing the estimated transition intensities on the log scale with any covariates fixed at their means in the data. Each remaining component is a matrix giving the linear effects of the labelled covariate on the matrix of log intensities.

For misclassification models, coef.msm returns a list of lists. The first component, Qmatrices, is a list of matrices as described in the previous paragraph. The additional component Ematrices is a list of similar format containing the logit-misclassification probabilities and any estimated covariate effects.

Author(s)

C. H. Jackson 〈[email protected]〉

See Also

msm


crudeinits.msm Calculate crude initial values for transition intensities

Description

Calculates crude initial values for transition intensities by assuming that the data represent the exact transition times of the Markov process.

Usage

crudeinits.msm(formula, subject, qmatrix, data=NULL, censor=NULL, censor.states=NULL)

Arguments

formula A formula giving the vectors containing the observed states and the corresponding observation times. For example,

state ~ time

Observed states should be in the set 1, ..., n, where n is the number of states.

subject Vector of subject identification numbers for the data specified by formula. If missing, then all observations are assumed to be on the same subject. These must be sorted so that all observations on the same subject are adjacent.

qmatrix Matrix of indicators for the allowed transitions. An initial value will be estimated for each value of qmatrix that is greater than zero. Transitions are taken as disallowed for each entry of qmatrix that is 0.

data An optional data frame in which the variables represented by subject and state can be found.

censor A state, or vector of states, which indicates censoring. See msm.

censor.states Specifies the underlying states which censored observations can represent. See msm.

Details

Suppose we want a crude estimate of the transition intensity $q_{rs}$ from state r to state s. If we observe $n_{rs}$ transitions from state r to state s, and a total of $n_r$ transitions from state r, then $-q_{rs}/q_{rr}$ can be estimated by $n_{rs}/n_r$. Then, given a total of $T_r$ years spent in state r, the mean sojourn time $-1/q_{rr}$ can be estimated as $T_r/n_r$. Thus, $n_{rs}/T_r$ is a crude estimate of $q_{rs}$.

If the data do represent the exact transition times of the Markov process, then these are the exact maximum likelihood estimates.
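The estimate of each intensity as a transition count divided by the total time spent in the origin state can be sketched in base R as follows. The data frame obs and the function crude_q are hypothetical, written only to illustrate the arithmetic; this is not the source of crudeinits.msm, and it ignores censoring and disallowed transitions.

```r
# Toy data assumed to record exact transition times (hypothetical values)
obs <- data.frame(
  id    = c(1, 1, 1, 2, 2, 3, 3, 3),
  time  = c(0, 2, 5, 0, 4, 0, 1, 3),
  state = c(1, 2, 3, 1, 3, 1, 2, 3)
)

crude_q <- function(obs, nstates) {
  n_rs <- matrix(0, nstates, nstates)  # transition counts
  T_r  <- numeric(nstates)             # total time spent in each state
  for (s in split(obs, obs$id)) {
    for (i in seq_len(nrow(s) - 1)) {
      r <- s$state[i]
      n_rs[r, s$state[i + 1]] <- n_rs[r, s$state[i + 1]] + 1
      T_r[r] <- T_r[r] + s$time[i + 1] - s$time[i]
    }
  }
  q <- n_rs / T_r                      # row r is divided by T_r[r]
  q[!is.finite(q)] <- 0                # states in which no time was spent
  diag(q) <- 0
  diag(q) <- -rowSums(q)               # rows of an intensity matrix sum to zero
  q
}

q_hat <- crude_q(obs, nstates = 3)
```

In this toy data 7 time units are spent in state 1 with two moves to state 2 and one to state 3, so the first row of q_hat holds 2/7 and 1/7 off the diagonal.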

Observed transitions which are incompatible with the given qmatrix are ignored. Censored states are ignored.

Value

The estimated transition intensity matrix. This can be used as the qmatrix argument to msm.


Author(s)


See Also

statetable.msm

Examples

data(heart)
twoway4.q <- rbind(c(-0.5, 0.25, 0, 0.25), c(0.166, -0.498, 0.166, 0.166),
                   c(0, 0.25, -0.5, 0.25), c(0, 0, 0, 0))
statetable.msm(state, PTNUM, data=heart)
crudeinits.msm(state ~ years, PTNUM, data=heart, qmatrix=twoway4.q)

deltamethod The delta method

Description

Delta method for approximating the standard error of a transformation g(X) of a random variable X = (x1, x2, . . .), given estimates of the mean and covariance matrix of X.

Usage

deltamethod(g, mean, cov, ses=TRUE)

Arguments

g A formula representing the transformation. The variables must be labelled x1,x2,... For example,

~ 1 / (x1 + x2)

If the transformation returns a vector, then a list of formulae representing (g1, g2, . . .) can be provided, for example

list( ~ x1 + x2, ~ x1 / (x1 + x2) )

mean The estimated mean of X

cov The estimated covariance matrix of X

ses If TRUE, then the standard errors of g1(X), g2(X), . . . are returned. Otherwise the covariance matrix of g(X) is returned.


Details

The delta method expands a differentiable function of a random variable about its mean, usually with a first-order Taylor approximation, and then takes the variance. For example, an approximation to the covariance matrix of g(X) is given by

$$\mathrm{Cov}(g(X)) = g'(\mu) \, \mathrm{Cov}(X) \, [g'(\mu)]^T$$

where µ is an estimate of the mean of X.

A limitation of this function is that variables created by the user are not visible within the formula g. To work around this, it is necessary to build the formula as a string, using functions such as sprintf, then to convert the string to a formula using as.formula. See the example below.
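The covariance formula above can be checked by hand for a scalar transformation. The sketch below is a hypothetical re-implementation using base R's deriv for the gradient; delta_se is not a function in msm, and the mean vector and covariance matrix are made-up values.

```r
# Hypothetical re-implementation for a scalar transformation g
delta_se <- function(g, mean, cov) {
  vars <- paste0("x", seq_along(mean))
  env <- as.list(setNames(mean, vars))
  # deriv() differentiates the right-hand side of the formula symbolically
  d <- eval(deriv(g, vars), env)
  grad <- attr(d, "gradient")            # 1 x p matrix holding g'(mu)
  sqrt(drop(grad %*% cov %*% t(grad)))
}

mu <- c(2, 3)
V  <- matrix(c(0.04, 0.01, 0.01, 0.09), 2)
se <- delta_se(~ 1 / (x1 + x2), mu, V)
```

Here both partial derivatives equal -1/(x1 + x2)^2 = -1/25, so the variance is (0.04 + 0.09 + 2 * 0.01) / 625 and the standard error is its square root.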

Value

A vector containing the standard errors of g1(X), g2(X), . . . or a matrix containing the covariance of g(X).

Author(s)


References

Oehlert, G. W. A note on the delta method. American Statistician 46(1), 1992

Examples

## Simple linear regression, E(y) = alpha + beta x
x <- 1:100
y <- rnorm(100, 4*x, 5)
toy.lm <- lm(y ~ x)
estmean <- coef(toy.lm)
estvar <- summary(toy.lm)$cov.unscaled * summary(toy.lm)$sigma^2

## Estimate of (1 / (alphahat + betahat))
1 / (estmean[1] + estmean[2])
## Approximate standard error
deltamethod (~ 1 / (x1 + x2), estmean, estvar)

## We have a variable z we would like to use within the formula.
z <- 1
## deltamethod (~ z / (x1 + x2), estmean, estvar) will not work.
## Instead, build up the formula as a string, and convert to a formula.
form <- sprintf("~ %f / (x1 + x2)", z)
form
deltamethod(as.formula(form), estmean, estvar)


ematrix.msm Misclassification probability matrix

Description

Extract the estimated misclassification probability matrix, and corresponding confidence intervals, from a fitted multi-state model at a given set of covariate values.

Usage

ematrix.msm(x, covariates="mean", cl=0.95)

Arguments

x A fitted multi-state model, as returned by msm

covariates The covariate values for which to estimate the misclassification probability matrix. This can either be:

the string "mean", denoting the means of the covariates in the data (this is the default),

the number 0, indicating that all the covariates should be set to zero,

or a list of values, with optional names. For example

list (60, 1)

where the order of the list follows the order of the covariates originally given in the model formula, or a named list,

list (age = 60, sex = 1)

cl Width of the symmetric confidence interval to present. Defaults to 0.95.

Details

Misclassification probabilities and covariate effects are estimated on the logit scale by msm. A covariance matrix is estimated from the Hessian of the maximised log-likelihood. From these, the delta method is used to obtain standard errors of the probabilities on the natural scale at arbitrary covariate values. Confidence intervals are estimated by assuming normality on the logit scale.
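For a single probability, the two steps described above (delta-method standard error on the natural scale, interval formed on the logit scale and back-transformed) might look like the sketch below. The estimate and standard error are invented numbers, and the single-probability setting is a simplifying assumption; the calculation inside ematrix.msm may differ in detail.

```r
# Hypothetical estimate of a logit misclassification probability and its SE
logit_p  <- qlogis(0.1)   # estimate on the logit scale
se_logit <- 0.4           # standard error on the logit scale
cl <- 0.95

z <- qnorm(1 - (1 - cl) / 2)
p_hat <- plogis(logit_p)
# Delta method: the derivative of plogis(x) is p (1 - p)
se_p <- p_hat * (1 - p_hat) * se_logit
# Normal interval on the logit scale, back-transformed to a probability
ci <- plogis(logit_p + c(-1, 1) * z * se_logit)
```

Building the interval on the logit scale keeps both limits inside (0, 1), which an interval of the form p_hat plus or minus z times se_p would not guarantee.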

Value

A list with components:

estimate Estimated misclassification probability matrix.

SE Corresponding approximate standard errors.

L Lower confidence limits.

U Upper confidence limits.


The default print method for objects returned by ematrix.msm presents estimates and confidence limits. To present estimates and standard errors, do something like

ematrix.msm(x)[c("estimate","SE")]

Author(s)


See Also

qmatrix.msm

fev FEV1 measurements from lung transplant recipients

Description

A series of measurements of the forced expiratory volume in one second (FEV1) from lung transplant recipients, from six months onwards after their transplant.

Usage

data(fev)

Format

A data frame containing 5896 rows. There are 204 patients; the rows are grouped by patient number and ordered by days after transplant. Each row represents an examination and contains an additional covariate.

ptnum (numeric) Patient identification number.
days (numeric) Examination time (days after transplant).
fev (numeric) Percentage of baseline FEV1. A code of 999 indicates the patient’s date of death.
acute (numeric) 0/1 indicator for whether the patient suffered an acute infection or rejection within 14 days of the visit.

Details

A baseline "normal" FEV1 for each individual is calculated using measurements from the first six months after transplant. After six months, as presented in this dataset, FEV1 is expressed as a percentage of the baseline value.

FEV1 is monitored to diagnose bronchiolitis obliterans syndrome (BOS), a long-term lung function decline, thought to be a form of chronic rejection. Acute rejections and infections also affect the lung function in the short term.


Source


References

Jackson, C.H. and Sharples, L.D. Hidden Markov models for the onset and progression of bronchiolitis obliterans syndrome in lung transplant recipients. Statistics in Medicine, 21(1): 113–128 (2002).

hazard.msm Calculate tables of hazard ratios for covariates on transition intensities

Description

Hazard ratios are computed by exponentiating the estimated covariate effects on the log-transition intensities. This function is called by summary.msm.

Usage

hazard.msm(x, hazard.scale = 1, cl = 0.95)

Arguments

x Output from msm representing a fitted multi-state model.

hazard.scale Vector with as many elements as there are covariates on transition rates. Each element is the increase in the corresponding covariate used to calculate its hazard ratio. Defaults to all 1.

cl Width of the symmetric confidence interval to present. Defaults to 0.95.


Value

A list of tables containing hazard ratio estimates, one table for each covariate. Each table has three columns, containing the hazard ratio, and an approximate lower and upper confidence limit respectively (assuming normality on the log scale), for each Markov chain transition intensity.
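As a rough illustration of how one such table could be assembled, the sketch below exponentiates invented covariate effects and their normal-theory limits on the log scale. The effect sizes, standard errors, row labels and column names are all assumptions for the example, not output from a fitted model.

```r
# Hypothetical covariate effects on the log intensities, with standard errors
beta <- c("State 1 - State 2" = 0.05, "State 2 - State 3" = -0.02)
se   <- c(0.02, 0.03)
hazard.scale <- 10   # hazard ratio for a 10-unit increase in the covariate
cl <- 0.95

z <- qnorm(1 - (1 - cl) / 2)
hr_table <- cbind(HR = exp(hazard.scale * beta),
                  L  = exp(hazard.scale * (beta - z * se)),
                  U  = exp(hazard.scale * (beta + z * se)))
```

With hazard.scale = 10, the first row reports the multiplicative change in the state 1 to state 2 intensity for a 10-unit increase in the covariate, exp(0.5).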

Author(s)


See Also

msm, summary.msm, odds.msm


heart Heart transplant monitoring data

Description

A series of approximately yearly angiographic examinations of heart transplant recipients. The state at each time is a grade of cardiac allograft vasculopathy (CAV), a deterioration of the arterial walls.

Usage

data(heart)

Format

A data frame containing 2846 rows. There are 622 patients, the rows are grouped by patient number and ordered by years after transplant, with each row representing an examination and containing additional covariates.

PTNUM (numeric) Patient identification number
age (numeric) Recipient age at examination (years)
years (numeric) Examination time (years after transplant)
dage (numeric) Age of heart donor (years)
sex (character) sex (0=male, 1=female)
pdiag (character) Primary diagnosis (reason for transplant). IHD=ischaemic heart disease, IDC=idiopathic dilated cardiomyopathy.
cumrej (numeric) Cumulative number of acute rejection episodes
state (numeric) State at the examination. State 1 represents no CAV, state 2 is mild/moderate CAV and state 3 is severe CAV. State 4 indicates death.

Source


References

Sharples, L.D., Jackson, C.H., Parameshwar, J., Wallwork, J. and Large, S.R. (2003). Diagnostic accuracy of coronary angiopathy and risk factors for post-heart-transplant cardiac allograft vasculopathy. Transplantation 76(4):679–82

hmm-dists Hidden Markov model constructors


Description

These functions are used to specify the distribution of the response conditionally on the underlying state in a hidden Markov model. A list of these function calls, with one component for each state, should be used for the hmodel argument to msm. The initial values for the parameters of the distribution should be given as arguments.

Usage

hmmCat(prob, basecat)
hmmIdent(x)
hmmUnif(lower, upper)
hmmNorm(mean, sd)
hmmLNorm(meanlog, sdlog)
hmmExp(rate)
hmmGamma(shape, rate)
hmmWeibull(shape, scale)
hmmPois(rate)
hmmBinom(size, prob)
hmmTNorm(mean, sd, lower, upper)
hmmMETNorm(mean, sd, lower, upper, sderr, meanerr=0)
hmmMEUnif(lower, upper, sderr, meanerr=0)
hmmNBinom(disp, prob)

Arguments

hmmCat represents a categorical response distribution on the set 1, 2, ..., length(prob). The Markov model with misclassification is an example of this type of model. The categories in this case are (some subset of) the observed states.

The hmmIdent distribution is used for underlying states which are observed exactly without error.

hmmUnif, hmmNorm, hmmLNorm, hmmExp, hmmGamma, hmmWeibull, hmmPois, hmmBinom, hmmTNorm and hmmNBinom represent Uniform, Normal, log-Normal, exponential, Gamma, Weibull, Poisson, Binomial, truncated Normal and negative binomial distributions, respectively, with parameterisations the same as the default parameterisations in the corresponding base R distribution functions.

The hmmMETNorm and hmmMEUnif distributions are truncated Normal and Uniform distributions, but with additional Normal measurement error on the response. These are generalisations of the distributions proposed by Satten and Longini (1996) for modelling the progression of CD4 cell counts in monitoring HIV disease. See medists for density, distribution, quantile and random generation functions for these distributions. See also tnorm for density, distribution, quantile and random generation functions for the truncated Normal distribution.

prob (hmmCat) Vector of probabilities of observing category 1, 2, ..., length(prob) respectively. Or the probability governing a binomial or negative binomial distribution.

basecat (hmmCat) Category which is considered to be the "baseline", so that during estimation, the probabilities are parameterised as probabilities relative to this baseline category. By default, the category with the greatest probability is used as the baseline.


x (hmmIdent) Code in the data which denotes the exactly-observed state.

mean (hmmNorm, hmmTNorm, hmmMETNorm) Mean defining a Normal or truncated Normal distribution.

sd (hmmNorm, hmmTNorm, hmmMETNorm) Standard deviation defining a Normal or truncated Normal distribution.

meanlog (hmmLNorm) Mean on the log scale, for a log Normal distribution.

sdlog (hmmLNorm) Standard deviation on the log scale, for a log Normal distribution.

rate (hmmPois, hmmExp, hmmGamma) Rate of a Poisson, Exponential or Gamma distribution (see dpois, dexp, dgamma).

shape (hmmGamma, hmmWeibull) Shape parameter of a Gamma or Weibull distribution (see dgamma, dweibull).

scale (hmmWeibull) Scale parameter of a Weibull distribution (see dweibull).

size (hmmBinom) Order of a Binomial distribution (see dbinom).

disp (hmmNBinom) Dispersion parameter of a negative binomial distribution, also called size or order (see dnbinom).

lower (hmmUnif, hmmTNorm, hmmMETNorm, hmmMEUnif) Lower limit for a Uniform or truncated Normal distribution.

upper (hmmUnif, hmmTNorm, hmmMETNorm, hmmMEUnif) Upper limit for a Uniform or truncated Normal distribution.

sderr (hmmMETNorm, hmmMEUnif) Standard deviation of the Normal measurement error distribution.

meanerr (hmmMETNorm, hmmMEUnif) Additional shift in the measurement error, fixed to 0 by default. This may be modelled in terms of covariates.

Details

See the PDF manual ‘msm-manual.pdf’ in the ‘doc’ subdirectory for algebraic definitions of all these distributions.

Parameters which can be modelled in terms of covariates, on the scale of a link function, are as follows.

PARAMETER NAME    LINK FUNCTION
mean              identity
meanlog           identity
rate              log
scale             log
meanerr           identity
prob              logit

Parameters basecat, lower, upper, size, meanerr are fixed at their initial values. All other parameters are estimated while fitting the hidden Markov model, unless the appropriate fixedpars argument is supplied to msm.


For categorical response distributions (hmmCat) the outcome probabilities initialized to zero are fixed at zero, and the probability corresponding to basecat is fixed to one minus the sum of the remaining probabilities. These remaining probabilities are estimated, and can be modelled in terms of covariates.
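The bookkeeping for a categorical outcome distribution can be illustrated with plain vectors. This sketch only mirrors the rules stated above (zeros stay fixed, the baseline absorbs the remainder); the probabilities and the updated value 0.15 are invented, and no msm function is called.

```r
# Hypothetical initial values for one state's categorical outcome distribution
prob <- c(0.9, 0.1, 0, 0)
basecat <- which.max(prob)          # default baseline: most probable category
fixed_zero <- which(prob == 0)      # held at zero throughout estimation
free <- setdiff(seq_along(prob), c(basecat, fixed_zero))

# Suppose estimation moves the single free probability from 0.1 to 0.15
est <- prob
est[free] <- 0.15
# The baseline probability is one minus the sum of the remaining probabilities
est[basecat] <- 1 - sum(est[-basecat])
```

Only the entries indexed by free count as estimated parameters here; the baseline entry is determined by them.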

Value

Each function returns an object of class hmodel, which is a list containing information about the model. The only component which may be useful to end users is r, a function of one argument n which returns a random sample of size n from the given distribution.

Author(s)


References

Satten, G.A. and Longini, I.M. Markov chains with measurement error: estimating the ’true’ course of a marker of the progression of human immunodeficiency virus disease (with discussion). Applied Statistics 45(3): 275–309 (1996).

Jackson, C.H. and Sharples, L.D. Hidden Markov models for the onset and progression of bronchiolitis obliterans syndrome in lung transplant recipients. Statistics in Medicine, 21(1): 113–128 (2002).

Jackson, C.H., Sharples, L.D., Thompson, S.G., Duffy, S.W. and Couto, E. Multi-state Markov models for disease progression with classification error. The Statistician, 52(2): 193–209 (2003).

See Also

msm

logLik.msm Extract model log-likelihood

Description

Extract the log-likelihood and the number of parameters of a model fitted with msm.

Usage

## S3 method for class 'msm':
logLik(object, ...)

Arguments

object A fitted multi-state model object, as returned by msm.

... (unused) further arguments passed to or from other methods.


Value

The minus log-likelihood of the model represented by ’object’ evaluated at the maximum likelihood estimates.

Author(s)


See Also

msm

medists Measurement error distributions

Description

Truncated Normal and Uniform distributions, where the response is also subject to a Normally distributed measurement error.

Usage

dmenorm(x, mean=0, sd=1, lower=-Inf, upper=Inf, sderr=0, meanerr=0, log = FALSE)
pmenorm(q, mean=0, sd=1, lower=-Inf, upper=Inf, sderr=0, meanerr=0,
        lower.tail = TRUE, log.p = FALSE)
qmenorm(p, mean=0, sd=1, lower=-Inf, upper=Inf, sderr=0, meanerr=0,
        lower.tail = TRUE, log.p = FALSE)
rmenorm(n, mean=0, sd=1, lower=-Inf, upper=Inf, sderr=0, meanerr=0)
dmeunif(x, lower=0, upper=1, sderr=0, meanerr=0, log = FALSE)
pmeunif(q, lower=0, upper=1, sderr=0, meanerr=0, lower.tail = TRUE, log.p = FALSE)
qmeunif(p, lower=0, upper=1, sderr=0, meanerr=0, lower.tail = TRUE, log.p = FALSE)
rmeunif(n, lower=0, upper=1, sderr=0, meanerr=0)

Arguments

x,q vector of quantiles.
p vector of probabilities.
n number of observations. If length(n) > 1, the length is taken to be the number required.
mean vector of means.
sd vector of standard deviations.
lower lower truncation point.
upper upper truncation point.
sderr Standard deviation of measurement error distribution.
meanerr Optional shift for the measurement error distribution.
log, log.p logical; if TRUE, probabilities p are given as log(p).
lower.tail logical; if TRUE (default), probabilities are P [X <= x], otherwise, P [X > x].


Details

The normal distribution with measurement error has density

$$\frac{\Phi(u, \mu_2, \sigma_3) - \Phi(l, \mu_2, \sigma_3)}{\Phi(u, \mu_0, \sigma_0) - \Phi(l, \mu_0, \sigma_0)} \, \phi(x, \mu_0 + \mu_\epsilon, \sigma_2)$$

where

$$\sigma_2^2 = \sigma_0^2 + \sigma_\epsilon^2,$$

$$\sigma_3 = \sigma_0 \sigma_\epsilon / \sigma_2,$$

$$\mu_2 = \frac{(x - \mu_\epsilon) \sigma_0^2 + \mu_0 \sigma_\epsilon^2}{\sigma_2^2},$$

µ0 is the mean of the original Normal distribution before truncation,
σ0 is the corresponding standard deviation,
u is the upper truncation point,
l is the lower truncation point,
σε is the standard deviation of the additional measurement error,
µε is the mean of the measurement error (usually 0),
φ(x) is the density of the corresponding normal distribution, and
Φ(x) is the distribution function of the corresponding normal distribution.

The uniform distribution with measurement error has density

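$$\frac{\Phi(x, \mu_\epsilon + l, \sigma_\epsilon) - \Phi(x, \mu_\epsilon + u, \sigma_\epsilon)}{u - l}$$

These are calculated from the original truncated Normal or Uniform density functions $f(\cdot \mid \mu, \sigma, l, u)$ as

$$\int f(y \mid \mu, \sigma, l, u) \, \phi(x, y + \mu_\epsilon, \sigma_\epsilon) \, dy$$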
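As a rough numerical check of the closed-form Normal density against the convolution integral, a base R sketch along the following lines may help. The helper names dtnorm_sketch, dmenorm_sketch and conv_density are hypothetical and are not the package's implementations. The sketch takes µ2 to be the precision-weighted mean ((x − µε)σ0² + µ0σε²) / σ2², which is an assumption made here so that the two calculations agree.

```r
# Truncated Normal density (hypothetical helper, not msm's dtnorm)
dtnorm_sketch <- function(y, mean, sd, lower, upper) {
  inside <- y >= lower & y <= upper
  inside * dnorm(y, mean, sd) / (pnorm(upper, mean, sd) - pnorm(lower, mean, sd))
}

# Closed-form density of a truncated Normal plus Normal measurement error.
# Assumption of this sketch: mu2 is divided by sd2^2 (precision-weighted mean).
dmenorm_sketch <- function(x, mean, sd, lower, upper, sderr, meanerr = 0) {
  sd2 <- sqrt(sd^2 + sderr^2)
  sd3 <- sd * sderr / sd2
  mu2 <- ((x - meanerr) * sd^2 + mean * sderr^2) / sd2^2
  (pnorm(upper, mu2, sd3) - pnorm(lower, mu2, sd3)) /
    (pnorm(upper, mean, sd) - pnorm(lower, mean, sd)) *
    dnorm(x, mean + meanerr, sd2)
}

# The same density obtained by integrating the convolution numerically
conv_density <- function(x, mean, sd, lower, upper, sderr, meanerr = 0) {
  integrand <- function(y) {
    dtnorm_sketch(y, mean, sd, lower, upper) * dnorm(x, y + meanerr, sderr)
  }
  integrate(integrand, lower, upper)$value
}

closed <- dmenorm_sketch(72, mean = 70, sd = 10, lower = 60, upper = 80, sderr = 3)
by_integration <- conv_density(72, mean = 70, sd = 10, lower = 60, upper = 80, sderr = 3)
print(c(closed = closed, by_integration = by_integration))
```

The two printed numbers should agree to within the tolerance of integrate. The closed form is undefined at sderr = 0, where σ3 is zero, so the sketch is only meant for a strictly positive measurement error.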

If sderr and meanerr are not specified they assume the default values of 0, representing no measurement error variance, and no constant shift in the measurement error, respectively.

Therefore, for example, dmenorm(x) with no other arguments is simply equivalent to dtnorm(x), which in turn is equivalent to dnorm(x).

These distributions were used by Satten and Longini (1996) for CD4 cell counts conditionally on hidden Markov states of HIV infection, and later by Jackson and Sharples (2002) for FEV1 measurements conditionally on states of chronic lung transplant rejection.

These distribution functions are just provided for convenience, and are not optimised for numerical accuracy. To fit a hidden Markov model with these response distributions, use a hmmMETNorm or hmmMEUnif constructor. See the hmm-dists help page for further details.

Value

dmenorm, dmeunif give the density, pmenorm, pmeunif give the distribution function, qmenorm, qmeunif give the quantile function, and rmenorm, rmeunif generate random deviates, for the Normal and Uniform versions respectively.


Author(s)


References

Satten, G.A. and Longini, I.M. Markov chains with measurement error: estimating the 'true' course of a marker of the progression of human immunodeficiency virus disease (with discussion). Applied Statistics 45(3): 275-309 (1996).


See Also

dnorm, dunif, dtnorm

Examples

## what does the distribution look like?
x <- seq(50, 90, by=1)
plot(x, dnorm(x, 70, 10), type="l", ylim=c(0,0.06)) ## standard Normal
lines(x, dtnorm(x, 70, 10, 60, 80), type="l") ## truncated Normal
## truncated Normal with small measurement error
lines(x, dmenorm(x, 70, 10, 60, 80, sderr=3), type="l")

msm Multi-state Markov and hidden Markov models in continuous time

Description

Fit a continuous-time Markov or hidden Markov multi-state model by maximum likelihood. Observations of the process can be made at arbitrary times, or the exact times of transition between states can be known. Covariates can be fitted to the Markov chain transition intensities or to the hidden Markov observation process.

Usage

msm ( formula, subject=NULL, data = list(), qmatrix, gen.inits = FALSE,
      ematrix=NULL, hmodel=NULL, obstype=NULL,
      covariates = NULL, covinits = NULL, constraint = NULL,
      misccovariates = NULL, misccovinits = NULL, miscconstraint = NULL,
      hcovariates = NULL, hcovinits = NULL, hconstraint = NULL,
      qconstraint=NULL, econstraint=NULL, initprobs = NULL, est.initprobs=FALSE,
      death = FALSE, exacttimes = FALSE, censor=NULL,
      censor.states=NULL, cl = 0.95, fixedpars = NULL, center=TRUE,
      opt.method=c("optim","nlm"), hessian=TRUE, use.deriv=FALSE,
      deriv.test=FALSE, analyticp=TRUE, ... )


Arguments

formula A formula giving the vectors containing the observed states and the corresponding observation times. For example,

state ~ time

Observed states should be in the set 1, ..., n, where n is the number of states.

subject Vector of subject identification numbers for the data specified by formula. If missing, then all observations are assumed to be on the same subject. These must be sorted so that all observations on the same subject are adjacent.

data Optional data frame in which to interpret the variables supplied in formula, subject, covariates, misccovariates, hcovariates and obstype.

qmatrix Initial transition intensity matrix of the Markov chain. If an instantaneous transition is not allowed from state r to state s, then qmatrix should have (r, s) entry 0, otherwise it should be non-zero. Any diagonal entry of qmatrix is ignored, as it is constrained to be equal to minus the sum of the rest of the row. For example,

rbind( c( 0, 0.1, 0.01 ), c( 0.1, 0, 0.2 ), c( 0, 0, 0 ) )

represents a 'health - disease - death' model, with transition intensities 0.1 from health to disease, 0.01 from health to death, 0.1 from disease to health, and 0.2 from disease to death. The initial intensities given here are with any covariates set to their means in the data (or set to zero, if center = FALSE).
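To see what the implied intensity matrix looks like once the ignored diagonal is filled in, a short base R sketch is given here. The variable names are illustrative only, and the code does not call msm.

```r
# Initial intensities for the health - disease - death example
Q <- rbind(c(0,   0.1, 0.01),
           c(0.1, 0,   0.2),
           c(0,   0,   0))

# Any supplied diagonal is ignored: each diagonal entry becomes
# minus the sum of the off-diagonal entries in its row
diag(Q) <- 0
diag(Q) <- -rowSums(Q)
print(Q)

# Allowed instantaneous transitions are the non-zero off-diagonal entries
allowed <- Q != 0 & row(Q) != col(Q)
print(allowed)
```

Every row of the completed matrix sums to zero, and the third row is entirely zero because death is absorbing.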

gen.inits If TRUE, then initial values for the transition intensities are estimated by assuming that the data represent the exact transition times of the process. The non-zero entries of the supplied qmatrix are assumed to indicate the allowed transitions of the model.

ematrix If misclassification between states is to be modelled, this should be a matrix of initial values for the misclassification probabilities. The rows represent underlying states, and the columns represent observed states. If an observation of state s is not possible when the subject occupies underlying state r, then ematrix should have (r, s) entry 0. Otherwise ematrix should have (r, s) entry corresponding to the probability of observing s conditionally on occupying true state r. The diagonal of ematrix is ignored, as rows are constrained to sum to 1. For example,

rbind( c( 0, 0.1, 0 ), c( 0.1, 0, 0.1 ), c( 0, 0.1, 0 ) )

represents a model in which misclassifications are only permitted between adjacent states.

For an alternative way of specifying misclassification models, see hmodel.

hmodel Specification of the hidden Markov model. This should be a list of return values from the constructor functions described in the hmm-dists help page. Each element of the list corresponds to the outcome model conditionally on the corresponding underlying state.

For example, consider a three-state hidden Markov model. Suppose the observations in underlying state 1 are generated from a Normal distribution with mean 100 and standard deviation 16, while observations in underlying state 2 are Normal with mean 54 and standard deviation 18. Observations in state 3, representing death, are exactly observed, and coded as 999 in the data. This model is specified as

hmodel = list(hmmNorm(mean=100, sd=16), hmmNorm(mean=54, sd=18), hmmIdent(999))

The mean and standard deviation parameters are estimated starting from these initial values. See the hmm-dists help page for details of the constructor functions for each available distribution.

A misclassification model, that is, a hidden Markov model where the outcomes are misclassified observations of the underlying states, can either be specified using a list of hmmCat objects, or by using an ematrix as in previous versions of msm. For example,

ematrix = rbind( c( 0, 0.1, 0, 0 ), c( 0.1, 0, 0.1, 0 ), c( 0, 0.1, 0, 0 ), c( 0, 0, 0, 0 ) )

is equivalent to

hmodel = list( hmmCat(prob=c(0.9, 0.1, 0, 0)), hmmCat(prob=c(0.1, 0.8, 0.1, 0)), hmmCat(prob=c(0, 0.1, 0.9, 0)), hmmIdent() )
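The correspondence between the two specifications can be checked by hand. The base R sketch below fills in the ignored diagonal of the ematrix so that each row sums to 1, then reads off one probability vector per underlying state. It does not call msm, and the variable names are illustrative.

```r
E <- rbind(c(0,   0.1, 0,   0),
           c(0.1, 0,   0.1, 0),
           c(0,   0.1, 0,   0),
           c(0,   0,   0,   0))

# The diagonal is ignored on input; fill it so that each row sums to 1
diag(E) <- 0
diag(E) <- 1 - rowSums(E)

# Row r is the distribution of the observed state given true state r
probs <- lapply(seq_len(nrow(E)), function(r) E[r, ])
print(probs)
```

The first three vectors match the prob arguments of the hmmCat calls above. The fourth puts all its mass on state 4, consistent with that state being observed without error.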

obstype A vector specifying the observation scheme for each row of the data. This can be included in the data frame data along with the state, time, subject IDs and covariates. Its elements should be either 1, 2 or 3, meaning as follows:

1 An observation of the process at an arbitrary time (a "snapshot" of the process)

2 An exact transition time, with the state at the previous observation retained until the current observation.

3 An exact transition time, but the state at the instant before entering this state is unknown. A common example is death times in studies of chronic diseases.

If obstype is not specified, this defaults to all 1. If obstype is a single number, all observations are assumed to be of this type.

This is a generalisation of the death and exacttimes arguments to allow different schemes per observation.

exacttimes=TRUE specifies that all observations are of obstype 2.

death = death.states specifies that all observations of death.states are of type 3. death = TRUE specifies that all observations in the final absorbing state are of type 3.
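A per-row obstype vector can be built directly in the data frame. The sketch below uses a small hypothetical dataset in which state 4 is death, and marks death times as type 3 and everything else as type 1. The data values and column names are invented for illustration.

```r
# Hypothetical panel data in which state 4 is death
dat <- data.frame(
  subject = c(1, 1, 1, 2, 2),
  time    = c(0, 1.5, 3.2, 0, 2.1),
  state   = c(1, 2, 4, 1, 1)
)

# Snapshots (type 1) everywhere, except exactly-observed death times (type 3)
dat$obstype <- ifelse(dat$state == 4, 3, 1)
print(dat)
```

According to the description above, this coding corresponds to what death = 4 would specify.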

covariates Formula representing the covariates on the transition intensities via a log-linear model. For example,

~ age + sex + treatment


covinits Initial values for log-linear effects of covariates on the transition intensities. This should be a named list with each element corresponding to a covariate. A single element contains the initial values for that covariate on each transition intensity, reading across the rows in order. For a pair of effects constrained to be equal, the initial value for the first of the two effects is used.

For example, for a model with the above qmatrix and age and sex covariates, the following initialises all covariate effects to zero apart from the sex effect on the 2-1 transition, and the age effect on the 1-3 transition.

covinits = list(sex=c(0, 0, 0.1, 0), age=c(0, 0.1, 0, 0))

For factor covariates, name each level by concatenating the name of the covariate with the level name, quoting if necessary. For example, for a covariate agegroup with three levels 0-15, 15-60, 60-, use something like

covinits = list("agegroup15-60"=c(0, 0.1, 0, 0), "agegroup60-"=c(0.1, 0.1, 0, 0))

If not specified or wrongly specified, initial values are assumed to be zero.
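Because the position of each initial value depends on the order of the allowed transitions, it can help to list that order explicitly. The base R sketch below enumerates the non-zero off-diagonal entries of the three-state qmatrix example, reading across rows, and labels the covinits example with them. The object names are illustrative and msm is not called.

```r
Q <- rbind(c(0,   0.1, 0.01),
           c(0.1, 0,   0.2),
           c(0,   0,   0))
diag(Q) <- 0

# which() scans column by column, so scanning t(Q) walks Q row by row
idx <- which(t(Q) != 0, arr.ind = TRUE)
transitions <- paste(idx[, "col"], idx[, "row"], sep = "-")
print(transitions)

# Label each initial value with the transition it applies to
covinits <- list(sex = c(0, 0, 0.1, 0), age = c(0, 0.1, 0, 0))
labelled <- lapply(covinits, function(v) setNames(v, transitions))
print(labelled)
```

With this ordering the non-zero sex value falls on the 2-1 transition and the non-zero age value on the 1-3 transition.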

constraint A list of one vector for each named covariate. The vector indicates which covariate effects on intensities are constrained to be equal. Take, for example, a model with five transition intensities and two covariates. Specifying

constraint = list (age = c(1,1,1,2,2), treatment = c(1,2,3,4,5))

constrains the effect of age to be equal for the first three intensities, and equal for the fourth and fifth. The effect of treatment is assumed to be different for each intensity. Any vector of increasing numbers can be used as indicators. The intensity parameters are assumed to be ordered by reading across the rows of the transition matrix, starting at the first row, ignoring the diagonals.

For categorical covariates, defined using factor(covname), specify constraints as follows:

list(..., covnameVALUE1 = c(...), covnameVALUE2 = c(...), ...)

where VALUE1, VALUE2, ... are the levels of the factor. Make sure the contrasts option is set appropriately, for example, the default

options(contrasts=c("contr.treatment", "contr.poly"))

sets the first (baseline) level of unordered factors to zero.

To assume no covariate effect on a certain transition, set its initial value to zero and use the fixedpars argument to fix it during the optimisation.

misccovariates A formula representing the covariates on the misclassification probabilities, analogously to covariates. Only used if the model is specified using ematrix, rather than hmodel.

misccovinits Initial values for the covariates on the misclassification probabilities, defined in the same way as covinits. Only used if the model is specified using ematrix.

miscconstraint A list of one vector for each named covariate on misclassification probabilities. The vector indicates which covariate effects on misclassification probabilities are constrained to be equal, analogously to constraint. Only used if the model is specified using ematrix.

hcovariates List of formulae the same length as hmodel, defining any covariates governing the hidden Markov outcome models. The covariates operate on a suitably link-transformed linear scale, for example, log scale for a Poisson outcome model. If there are no covariates for a certain hidden state, then insert a NULL in the corresponding place in the list. For example, hcovariates = list(~acute + age, ~acute, NULL).

hcovinits Initial values for the hidden Markov model covariates. A list of the same length as hcovariates. Each element is a vector with initial values for each covariate on that state. For example, the above hcovariates can be initialised with hcovinits = list(c(-8, 0), -8, NULL). Initial values must be given for all or no covariates. If none are given, these are all set to zero. The initial value given in the hmodel constructor function for the corresponding baseline parameter is interpreted as the value of that parameter with any covariates fixed to their means in the data.

hconstraint A named list. Each element is a vector of constraints on the named hidden Markov model parameter. The vector has length equal to the number of times that class of parameter appears in the whole model.

For example, consider the three-state hidden Markov model described above, with normally-distributed outcomes for states 1 and 2. To constrain the outcome variance to be equal for states 1 and 2, and to also constrain the effect of acute on the outcome mean to be equal for states 1 and 2, specify

hconstraint = list(sd = c(1,1), acute=c(1,1))

qconstraint A vector of indicators specifying which baseline transition intensities are equal. For example,

qconstraint = c(1,2,3,3)

constrains the third and fourth intensities to be equal, in a model with four allowed instantaneous transitions.

econstraint A similar vector of indicators specifying which baseline misclassification probabilities are constrained to be equal. Only used if the model is specified using ematrix, rather than hmodel.

initprobs Currently only used in hidden Markov models. Vector of assumed underlying state occupancy probabilities at each individual's first observation. If these are estimated (see est.initprobs), then this defaults to equal probability for each state. Otherwise this defaults to c(1, rep(0, nstates-1)), that is, in state 1 with a probability of 1. Scaled to sum to 1 if necessary.

est.initprobs If TRUE, then the underlying state occupancy probabilities at the first observation will be estimated, starting from initial values taken from the initprobs argument. Be warned that if any of these initial values are 0 or 1, then optim will give a "non-finite value" error. To fix any of these probabilities during the estimation, e.g. at values of 0 or 1, use an appropriate fixedpars argument. Note that the free parameters during this estimation exclude the state 1 occupancy probability, which is fixed at 1 minus the sum of the other probabilities.

death Vector of indices of the death states. A death state is an absorbing state whose time of entry is known exactly, but the individual is assumed to be in an unknown transient state ("alive") at the previous instant. This is the usual situation for times of death in chronic disease monitoring data. For example, if you specify death = c(4, 5) then states 4 and 5 are assumed to be death states. death = TRUE indicates that the final state is a death state, and death = FALSE (the default) indicates that there is no death state. See the obstype argument.

censor A state, or vector of states, which indicates censoring. Censoring means that the observed state is known only to be one of a particular set of states. For example, censor=999 indicates that all observations of 999 in the vector of observed states denote censoring times. By default, this means that the true state could have been anything other than an absorbing state. To specify corresponding true states explicitly, use a censor.states argument.

censor.states Specifies the underlying states which censored observations can represent. If censor is a single number (the default) this can be a vector, or a list with one element. If censor is a vector with more than one element, this should be a list, with each element a vector corresponding to the equivalent element of censor. For example

censor = c(99, 999), censor.states = list(c(2,3), c(3,4))

means that observations coded 99 represent either state 2 or state 3, while observations coded 999 are really either state 3 or state 4.

exacttimes By default, the transitions of the Markov process are assumed to take place at unknown occasions in between the observation times. If exacttimes is set to TRUE, then all observation times are assumed to represent the exact and complete times of transition of the process. This is equivalent to every row of the data having obstype = 2. See the obstype argument.

cl Width of symmetric confidence intervals for maximum likelihood estimates, by default 0.95.

fixedpars Vector of indices of parameters whose values will be fixed at their initial values during the optimisation. These are given in the order: transition intensities (reading across rows of the transition matrix), covariates on intensities (ordered by intensities within covariates), hidden Markov model parameters (ordered by parameters within states), hidden Markov model covariate parameters (ordered by covariates within parameters within states), initial state occupancy probabilities (excluding the first probability, which is fixed at one minus the sum of the others).

For covariates on misclassification probabilities, this is a change from version 0.4 in the parameter ordering. Previously these were ordered by misclassification probabilities within covariates.

This can be useful for profiling likelihoods, and building complex models stage by stage. To fix all parameters, specify fixedpars = TRUE.

center If TRUE (the default) then covariates are centered at their means during the maximum likelihood estimation. This usually improves convergence.


opt.method Quoted name of the R function to perform minimisation of minus twice the log-likelihood. Either "optim" or "nlm". optim is the default.

hessian If TRUE (the default) then the Hessian matrix is computed at the maximum likelihood estimates, to obtain standard errors and confidence intervals.

use.deriv If TRUE then analytic first derivatives are used in the optimisation of the likelihood, when an appropriate quasi-Newton optimisation method, such as BFGS, is being used. Note that the default for optim is a Nelder-Mead method which cannot use derivatives. However, these derivatives, if supplied, are always used to calculate the Hessian.

deriv.test If TRUE, then analytic and numeric derivatives are computed and compared at the initial values, and no optimisation is performed.

analyticp By default, the likelihood for certain simpler 3, 4 and 5 state models is calculated using an analytic expression for the transition probability (P) matrix. To revert to the original method of using the matrix exponential, specify analyticp=FALSE. See the PDF manual for a list of the models for which analytic P matrices are implemented.

... Optional arguments to the general-purpose R optimisation routines, optim or nlm. Useful options for optim include method="BFGS" for using a quasi-Newton optimisation algorithm, which can often be faster than the default Nelder-Mead. If the optimisation fails to converge, consider normalising the problem using, for example, control=list(fnscale = 2500), replacing 2500 by a number of the order of magnitude of the likelihood. If 'false' convergence is reported and the standard errors cannot be calculated due to a non-positive-definite Hessian, then consider tightening the tolerance criteria for convergence. If the optimisation takes a long time, intermediate steps can be printed using the trace argument of the control list. See optim for details.

Details

For full details about the methodology behind the msm package, refer to the PDF manual 'msm-manual.pdf' in the 'doc' subdirectory of the package. This includes a tutorial in the typical use of msm.

Users upgrading from versions of msm less than 0.5 will need to change some of their model fitting syntax. In particular, initial values are now specified in the qmatrix and covinits arguments instead of inits, and qmatrix is no longer a matrix of 0/1 indicators. See the appendix to the PDF manual or the NEWS file in the top-level installation directory for a full list of changes.

For simple multi-state Markov models, the likelihood is calculated in terms of the transition intensity matrix Q. When the data consist of observations of the Markov process at arbitrary times, the exact transition times are not known. Then the likelihood is calculated using the transition probability matrix P(t) = exp(tQ), where exp is the matrix exponential. If state i is observed at time t and state j is observed at time u, then the contribution to the likelihood from this pair of observations is the i, j element of P(u − t). See, for example, Kalbfleisch and Lawless (1985), Kay (1986), or Gentleman et al. (1994).
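The likelihood contribution of a pair of observations can be illustrated in base R. The sketch below uses a hypothetical helper, expm_sketch, which approximates the matrix exponential by scaling and squaring a truncated Taylor series. It is not the routine msm uses, and is only intended for small, well-scaled matrices. The intensity values and observation times are invented for illustration.

```r
# Matrix exponential by scaling and squaring with a truncated Taylor series.
# Hypothetical helper for illustration only.
expm_sketch <- function(A, terms = 20) {
  # Number of halvings, chosen so that the scaled matrix has norm at most 1/2
  s <- max(0, ceiling(log2(max(1, norm(A, "I")))) + 1)
  A <- A / 2^s
  P <- diag(nrow(A))
  term <- diag(nrow(A))
  for (n in seq_len(terms)) {
    term <- term %*% A / n
    P <- P + term
  }
  for (i in seq_len(s)) P <- P %*% P
  P
}

# Health - disease - death intensities with the diagonal filled in
Q <- rbind(c(-0.11, 0.1, 0.01),
           c(0.1, -0.3, 0.2),
           c(0, 0, 0))

# One subject observed in state 1 at time 0.5 and in state 2 at time 2
t_obs <- 0.5
u_obs <- 2
P <- expm_sketch((u_obs - t_obs) * Q)
lik_contrib <- P[1, 2]
print(round(P, 4))
print(lik_contrib)
```

Each row of P sums to 1, and the row for the absorbing state stays at (0, 0, 1). The full likelihood for a subject would be the product of such entries over consecutive pairs of observations.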

For hidden Markov models, the likelihood for an individual with k observations is calculated directly by summing over the unknown state at each time, producing a product of k matrices. The calculation is a generalisation of the method described by Satten and Longini (1996), and also by Jackson and Sharples (2002), and Jackson et al. (2003).
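The sum over unknown states can be organised as a forward recursion. The base R sketch below does this for a simple misclassification model in which every observation is a snapshot. The helpers pmat and hmm_lik are hypothetical and are not msm functions. pmat obtains P(t) from an eigendecomposition and so assumes Q is diagonalisable. All numerical values are invented for illustration.

```r
# Transition probabilities via an eigendecomposition of Q
# (hypothetical helper; assumes Q is diagonalisable)
pmat <- function(Q, t) {
  e <- eigen(Q)
  Re(e$vectors %*% diag(exp(t * e$values)) %*% solve(e$vectors))
}

# Likelihood of one subject's observed states under a misclassification model.
# E[r, s] is the probability of observing s when the true state is r.
hmm_lik <- function(obs, times, Q, E, initprobs) {
  alpha <- initprobs * E[, obs[1]]
  for (j in seq_along(obs)[-1]) {
    P <- pmat(Q, times[j] - times[j - 1])
    alpha <- as.vector(alpha %*% P) * E[, obs[j]]
  }
  sum(alpha)
}

Q <- rbind(c(-0.11, 0.1, 0.01),
           c(0.1, -0.3, 0.2),
           c(0, 0, 0))
E <- rbind(c(0.9, 0.1, 0),
           c(0.1, 0.9, 0),
           c(0, 0, 1))

lik <- hmm_lik(obs = c(1, 2, 2), times = c(0, 1, 2.5), Q = Q, E = E,
               initprobs = c(1, 0, 0))
print(lik)
```

Each pass through the loop multiplies by one transition matrix and one diagonal matrix of outcome probabilities. The recursion gives the same value as summing over every possible path of true states, but at a cost that grows linearly in k.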

There must be enough information in the data on each state to estimate each transition rate, otherwise the likelihood will be flat and the maximum will not be found. It may be appropriate to reduce the number of states in the model, or reduce the number of covariate effects, to ensure convergence. Hidden Markov models are particularly susceptible to non-identifiability, especially when combined with a complex transition matrix.

Choosing an appropriate set of initial values for the optimisation can also be important. For flat likelihoods, 'informative' initial values will often be required.

Value

A list of class msm, with components:

call The original call to msm.

Qmatrices A list of matrices. The first component, labelled logbaseline, is a matrix containing the estimated transition intensities on the log scale with any covariates fixed at their means in the data. Each remaining component is a matrix giving the linear effects of the labelled covariate on the matrix of log intensities. To extract an estimated intensity matrix on the natural scale, at an arbitrary combination of covariate values, use the function qmatrix.msm.

QmatricesSE The standard error matrices corresponding to Qmatrices.

QmatricesL, QmatricesU Corresponding lower and upper symmetric confidence limits, of width 0.95 unless specified otherwise by the cl argument.

Ematrices A list of matrices. The first component, labelled logitbaseline, is the estimated misclassification probability matrix with any covariates fixed at their means in the data. Each remaining component is a matrix giving the linear effects of the labelled covariate on the matrix of logit misclassification probabilities. To extract an estimated misclassification probability matrix on the natural scale, at an arbitrary combination of covariate values, use the function ematrix.msm.

EmatricesSE The standard error matrices corresponding to Ematrices.

EmatricesL, EmatricesU Corresponding lower and upper symmetric confidence limits, of width 0.95 unless specified otherwise by the cl argument.

sojourn A list with components:
mean = estimated mean sojourn times in the transient states, with covariates fixed at their means.
se = corresponding standard errors.

minus2loglik Minus twice the maximised log-likelihood.

estimates Vector of untransformed maximum likelihood estimates returned from optim. Transition intensities are on the log scale and misclassification probabilities are on the logit scale.

estimates.t Vector of transformed maximum likelihood estimates with intensities and probabilities on their natural scales.


fixedpars Indices of estimates which were fixed during the maximum likelihood estimation.

covmat Covariance matrix corresponding to estimates.

ci Matrix of confidence intervals corresponding to estimates.

opt Return value from optim or nlm, giving information about the results of the optimisation.

foundse Logical value indicating whether the Hessian was positive-definite at the supposed maximum of the likelihood. If not, the covariance matrix of the parameters is unavailable. In these cases the optimisation has probably not converged to a maximum.

data A list of constants and vectors giving the data, for use in post-processing.

qmodel A list of objects specifying the model for transition intensities, for use in post-processing.

emodel A list of objects specifying the model for misclassification.

qcmodel A list of objects specifying the model for covariates on the transition intensities.

ecmodel A list of objects specifying the model for covariates on misclassification probabilities.

hmodel A list of class "hmodel", containing objects specifying the hidden Markov model. Estimates of "baseline" location parameters are presented with any covariates fixed to their means in the data.

cmodel A list of objects specifying any model for censoring.

Printing a msm object by typing the object's name at the command line implicitly invokes print.msm. This formats and prints the important information in the model fit. This includes the fitted transition intensity matrix, matrices containing covariate effects on intensities, and mean sojourn times from a fitted msm model. When there is a hidden Markov model, the chief information in the hmodel component is also formatted and printed. This includes estimates and confidence intervals for each parameter.

To extract summary information from the fitted model, it is recommended to use the more flexible extractor functions, such as qmatrix.msm, pmatrix.msm, sojourn.msm, instead of directly reading from list components of msm objects.

Author(s)


References

Kalbfleisch, J. and Lawless, J.F. The analysis of panel data under a Markov assumption. Journal of the American Statistical Association (1985) 80(392): 863–871.

Kay, R. A Markov model for analysing cancer markers and disease states in survival studies. Biometrics (1986) 42: 855–865.

Gentleman, R.C., Lawless, J.F., Lindsey, J.C. and Yan, P. Multi-state Markov models for analysing incomplete disease history data with illustrations for HIV disease. Statistics in Medicine (1994) 13(3): 805–821.


Satten, G.A. and Longini, I.M. Markov chains with measurement error: estimating the 'true' course of a marker of the progression of human immunodeficiency virus disease (with discussion). Applied Statistics 45(3): 275-309 (1996).

Jackson, C.H., Sharples, L.D., Thompson, S.G., Duffy, S.W. and Couto, E. Multi-state Markov models for disease progression with classification error. The Statistician, 52(2): 193–209 (2003).

See Also

simmulti.msm, plot.msm, summary.msm, qmatrix.msm, pmatrix.msm, sojourn.msm.

Examples

### Heart transplant data
### For further details and background to this example, see
### the PDF manual in the doc directory.
data(heart)
print(heart[1:10,])
twoway4.q <- rbind(c(-0.5, 0.25, 0, 0.25), c(0.166, -0.498, 0.166, 0.166),
                   c(0, 0.25, -0.5, 0.25), c(0, 0, 0, 0))
statetable.msm(state, PTNUM, data=heart)
crudeinits.msm(state ~ years, PTNUM, data=heart, qmatrix=twoway4.q)
heart.msm <- msm( state ~ years, subject=PTNUM, data = heart,
                  qmatrix = twoway4.q, death = 4,
                  control = list ( trace = 2, REPORT = 1 ) )
heart.msm
qmatrix.msm(heart.msm)
pmatrix.msm(heart.msm, t=10)
sojourn.msm(heart.msm)

odds.msm Calculate tables of odds ratios for covariates on misclassification probabilities

Description

Odds ratios are computed by exponentiating the estimated covariate effects on the logit-misclassification probabilities.

Usage

odds.msm(x, odds.scale = 1, cl = 0.95)


Arguments

x Output from msm representing a fitted multi-state model.

odds.scale Vector with same elements as number of covariates on misclassification probabilities. Corresponds to the increase in each covariate used to calculate its odds ratio. Defaults to all 1.


Value

A list of tables containing odds ratio estimates, one table for each covariate. Each table has three columns, containing the odds ratio, and an approximate upper 95% and lower 95% confidence limit respectively (assuming normality on the log scale), for each misclassification probability.
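The arithmetic behind one row of such a table can be sketched in base R. The estimate and standard error below are hypothetical, and forming the interval by exponentiating a symmetric Normal interval on the log odds scale is an assumption about how the limits are obtained, based on the "normality on the log scale" remark above.

```r
# Hypothetical estimate and standard error of one covariate effect on the logit scale
beta <- 0.4
se <- 0.15
odds.scale <- 1   # increase in the covariate that the odds ratio refers to
cl <- 0.95

# Normal quantile for a symmetric interval of width cl
z <- qnorm(1 - (1 - cl) / 2)

odds_ratio <- c(OR = exp(odds.scale * beta),
                L  = exp(odds.scale * (beta - z * se)),
                U  = exp(odds.scale * (beta + z * se)))
print(round(odds_ratio, 3))
```

Because the interval is symmetric on the log scale, the limits are not symmetric around the odds ratio itself.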

Author(s)


See Also

msm, hazard.msm

pexp Exponential distribution with piecewise-constant rate

Description

Density, distribution function, quantile function and random generation for a generalisation of the exponential distribution, in which the rate changes at a series of times.

Usage

dpexp(x, rate=1, t=0, log = FALSE)
ppexp(q, rate=1, t=0, lower.tail = TRUE, log.p = FALSE)
qpexp(p, rate=1, t=0, lower.tail = TRUE, log.p = FALSE)
rpexp(n, rate=1, t=0)

Arguments

x,q vector of quantiles.

p vector of probabilities.

n number of observations. If length(n) > 1, the length is taken to be the number required.

rate vector of rates.

t vector of the same length as rate, giving the times at which the rate changes. The first element of t should be 0, and t should be in increasing order.

log, log.p logical; if TRUE, probabilities p are given as log(p).

lower.tail logical; if TRUE (default), probabilities are P[X <= x], otherwise, P[X > x].


Details

Consider the exponential distribution with rates $r_1, \ldots, r_n$ changing at times $t_1, \ldots, t_n$, with $t_1 = 0$. Suppose $t_k$ is the maximum $t_i$ such that $t_i < x$. The density of this distribution at $x > 0$ is $f(x, r_1)$ for $k = 1$, and

$$\left\{ \prod_{i=1}^{k-1} \bigl(1 - F(t_{i+1} - t_i, r_i)\bigr) \right\} f(x - t_k, r_k)$$

for $k > 1$, where $F()$ and $f()$ are the distribution and density functions of the standard exponential distribution.

If rate is of length 1, this is just the standard exponential distribution. Therefore, for example, dpexp(x), with no other arguments, is simply equivalent to dexp(x).
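The density can be sketched in a few lines of base R. The helper dpexp_sketch below is hypothetical and is not the package's dpexp. It takes the probability of surviving to $t_k$ to be the product, over the completed intervals, of $1 - F(t_{i+1} - t_i, r_i)$, which is this sketch's reading of the formula above. The rates and change times are those used in the Examples.

```r
# Piecewise-exponential density (hypothetical helper, not msm's dpexp)
dpexp_sketch <- function(x, rate = 1, t = 0) {
  stopifnot(length(rate) == length(t), t[1] == 0, !is.unsorted(t))
  # Cumulative hazard accumulated by the start of each interval
  cumhaz <- c(0, cumsum(rate[-length(rate)] * diff(t)))
  # Index k of the largest change time strictly below x (at least 1)
  k <- pmax(findInterval(x, t, left.open = TRUE), 1)
  # Survive to t[k], then an exponential density with rate r[k]
  exp(-cumhaz[k]) * dexp(x - t[k], rate[k])
}

rate <- c(0.1, 0.2, 0.05, 0.3)
t <- c(0, 10, 20, 30)

# Density at x = 15: survive (0, 10] at rate 0.1, then rate 0.2 for 5 time units
d15 <- dpexp_sketch(15, rate, t)
print(d15)
print(0.2 * exp(-2))

# With a single rate this reduces to the ordinary exponential density
x <- seq(0.1, 50, by = 0.1)
print(all.equal(dpexp_sketch(x, 0.1, 0), dexp(x, 0.1)))
```

The first two printed values should coincide, since the cumulative hazard at 15 is 0.1 × 10 + 0.2 × 5 = 2.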

Only rpexp is used in the msm package, to simulate from Markov processes with piecewise-constant intensities depending on time-dependent covariates. These functions are merely provided for completeness, and are not optimized for numerical stability.

Value

dpexp gives the density, ppexp gives the distribution function, qpexp gives the quantile function, and rpexp generates random deviates.

Author(s)


See Also

dexp, sim.msm.

Examples

x <- seq(0.1, 50, by=0.1)
rate <- c(0.1, 0.2, 0.05, 0.3)
t <- c(0, 10, 20, 30)
plot(x, dexp(x, 0.1), type="l") ## standard exponential distribution
lines(x, dpexp(x, rate, t), type="l", lty=2) ## distribution with piecewise constant rate
plot(x, pexp(x, 0.1), type="l") ## standard exponential distribution
lines(x, ppexp(x, rate, t), type="l", lty=2) ## distribution with piecewise constant rate

plot.msm Plots of multi-state models

Description

This produces a plot of the expected probability of survival against time, from each transient state. Survival is defined as not entering an absorbing state.


Usage

## S3 method for class 'msm':
plot(x, from, to, range, covariates, legend.pos, ...)

Arguments

x Output from msm, representing a fitted multi-state model object

from States from which to consider survival. Defaults to the complete set of transient states.

to Absorbing state to consider. Defaults to the highest-labelled absorbing state.

range Vector of two elements, giving the range of times to plot for.

covariates Covariate values for which to evaluate the expected probabilities. This can either be:





legend.pos Vector of the x and y position, respectively, of the legend.

... Other arguments to the generic plot function

Author(s)


See Also

msm

pmatrix.msm Transition probability matrix

Description

Extract the estimated transition probability matrix from a fitted multi-state model for a given time interval, at a given set of covariate values.


Usage

pmatrix.msm(x, t=1, covariates="mean", ci.boot=FALSE, cl=0.95, B=500)

Arguments

x A fitted multi-state model, as returned by msm.

t The time interval to estimate the transition probabilities for, by default one unit.

covariates The covariate values at which to estimate the transition probabilities. This can either be:





ci.boot Calculate a bootstrap confidence interval. This is usually time-consuming, and disabled by default. See boot.msm for more details of bootstrapping in msm.

cl Width of the symmetric confidence interval

B Number of bootstrap replicates

Details

For a continuous-time homogeneous Markov process with transition intensity matrix Q, the probability of occupying state s at time u + t conditionally on occupying state r at time u is given by the (r, s) entry of the matrix P(t) = exp(tQ).

For non-homogeneous processes, where covariates and hence the transition intensity matrix are time-dependent, but are piecewise-constant within the time interval [u, u+t], the function pmatrix.piecewise.msm can be used.
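The idea behind the piecewise-constant case can be illustrated in base R: the transition probability matrix over the whole interval is the product of the matrices for the sub-intervals on which Q is constant. The sketch below does not call pmatrix.piecewise.msm. The helper pmat is hypothetical and assumes Q is diagonalisable, and both intensity matrices and the change point are invented for illustration.

```r
# Transition probabilities via an eigendecomposition of Q
# (hypothetical helper; assumes Q is diagonalisable)
pmat <- function(Q, t) {
  e <- eigen(Q)
  Re(e$vectors %*% diag(exp(t * e$values)) %*% solve(e$vectors))
}

# Hypothetical intensity matrices before and after a covariate changes at time 2
Q_before <- rbind(c(-0.11, 0.1, 0.01),
                  c(0.1, -0.3, 0.2),
                  c(0, 0, 0))
Q_after <- rbind(c(-0.22, 0.2, 0.02),
                 c(0.1, -0.3, 0.2),
                 c(0, 0, 0))

# Transition probabilities over [0, 5], with Q changing at time 2
P_piecewise <- pmat(Q_before, 2 - 0) %*% pmat(Q_after, 5 - 2)
print(round(P_piecewise, 4))
print(rowSums(P_piecewise))
```

If the two intensity matrices were identical, the product would reduce to the homogeneous result exp(5Q).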

Value

The matrix of estimated transition probabilities P(t) in the given time. Rows correspond to "from-state" and columns to "to-state".

Author(s)


See Also

qmatrix.msm, pmatrix.piecewise.msm, boot.msm


pmatrix.piecewise.msm Transition probability matrix for processes with piecewise-constant intensities

Description

Extract the estimated transition probability matrix from a fitted non-time-homogeneous multi-state model for a given time interval. This is a generalisation of pmatrix.msm to non-homogeneous models with time-dependent covariates.

Usage

pmatrix.piecewise.msm(x, t1, t2, times, covariates)

Arguments

x A fitted multi-state model, as returned by msm. This should be a non-homogeneous model, whose transition intensity matrix depends on a time-dependent covariate.

t1 The start of the time interval to estimate the transition probabilities for.

t2 The end of the time interval to estimate the transition probabilities for.

times Cut points at which the transition intensity matrix changes.

covariates A list with number of components one greater than the length of times. Each component of the list is specified in the same way as the covariates argument to pmatrix.msm. The components correspond to the covariate values in the intervals

(t1, times[1]], (times[1], times[2]], ..., (times[length(times)], t2]

(assuming that all elements of times are in the interval (t1, t2)).

Details

Suppose a multi-state model has been fitted, in which the transition intensity matrix $Q(x(t))$ is modelled in terms of time-dependent covariates $x(t)$. The transition probability matrix $P(t_1, t_n)$ for the time interval $(t_1, t_n)$ cannot be calculated from the estimated intensity matrix as $\exp((t_n - t_1)Q)$, because $Q$ varies within the interval $(t_1, t_n)$. However, if the covariates are piecewise-constant, or can be approximated as piecewise-constant, then we can calculate $P(t_1, t_n)$ by multiplying together individual matrices $P(t_i, t_{i+1}) = \exp((t_{i+1} - t_i)Q)$, calculated over intervals where $Q$ is constant:

$$P(t_1, t_n) = P(t_1, t_2) \, P(t_2, t_3) \cdots P(t_{n-1}, t_n)$$
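The product of interval-specific matrices can be sketched in base R as follows. This is only an illustration under stated assumptions: mat_exp is a hypothetical simplified matrix exponential, make_Q builds a made-up intensity matrix whose off-diagonal entries are scaled by an interval-specific multiplier, and piecewise_P is a hypothetical name, not a function of the package.

```r
# Illustrative matrix exponential (scaling and squaring, truncated series)
mat_exp <- function(A, squarings = 10, terms = 20) {
  A <- A / 2^squarings
  result <- term <- diag(nrow(A))
  for (k in seq_len(terms)) {
    term <- term %*% A / k
    result <- result + term
  }
  for (i in seq_len(squarings)) result <- result %*% result
  result
}

# Hypothetical baseline off-diagonal intensities; 'multiplier' stands in for
# the effect of the covariate values that apply in one interval.
make_Q <- function(multiplier) {
  Q <- multiplier * rbind(c(0,    0.10, 0.02),
                          c(0.10, 0,    0.01),
                          c(0,    0,    0   ))
  diag(Q) <- -rowSums(Q)   # rows of an intensity matrix sum to zero
  Q
}

# Multiply the interval-specific matrices in time order.
piecewise_P <- function(endpoints, multipliers) {
  P <- diag(3)
  for (i in seq_along(multipliers)) {
    width <- endpoints[i + 1] - endpoints[i]
    P <- P %*% mat_exp(width * make_Q(multipliers[i]))
  }
  P
}

# Three intervals: (0, 5], (5, 10], (10, 15]
P <- piecewise_P(endpoints = c(0, 5, 10, 15), multipliers = c(1, 0.8, 0.5))
```

When all multipliers are equal the process is homogeneous, and the product collapses to the single matrix exponential over the whole interval.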

Value

The matrix of estimated transition probabilities $P(t_1, t_n)$ for the time interval $[t_1, t_n]$. That is, the probabilities of occupying state s at time $t_n$ conditionally on occupying state r at time $t_1$. Rows correspond to "from-state" and columns to "to-state".


Author(s)


See Also

pmatrix.msm

Examples

## Not run:
## In a clinical study, suppose patients are given a placebo in the
## first 5 weeks, then they begin treatment 1 at 5 weeks, and
## a combination of treatments 1 and 2 from 10 weeks.
## Suppose a multi-state model x has been fitted for the patients'
## progress, with treat1 and treat2 as time dependent covariates.

## Cut points for when treatment covariate changes. These must lie
## strictly inside the interval from 0 to 15 weeks.
times <- c(5, 10)

## Indicators for which treatments are active in the three intervals
## (0, 5], (5, 10] and (10, 15]: one more component than length(times)
covariates <- list( list(treat1=0, treat2=0), list(treat1=1, treat2=0), list(treat1=1, treat2=1) )

## Calculate transition probabilities from the start of the study to 15 weeks
pmatrix.piecewise.msm(x, 0, 15, times, covariates)
## End(Not run)

prevalence.msm Tables of observed and expected prevalences

Description

This provides a rough indication of the goodness of fit of a multi-state model, by estimating the observed numbers of individuals occupying each state at a series of times, and comparing these with forecasts from the fitted model.

Usage

prevalence.msm(x, times, timezero=NULL, initstates, covariates="mean",
               misccovariates="mean")

Arguments

x A fitted multi-state model produced by msm.

times Series of times at which to compute the observed and expected prevalences of states.

timezero Initial time of the Markov process. Expected values are forecasted from here. Defaults to the minimum of the observation times given in the data.


initstates Optional vector of the same length as the number of states. Gives the numbers of individuals occupying each state at the initial time. The default is those observed in the data.

covariates Covariate values for which to forecast expected state occupancy. See qmatrix.msm. Defaults to the mean values of the covariates in the data set.

misccovariates (Misclassification models only) Values of covariates on the misclassification probability matrix for which to forecast expected state occupancy. Defaults to the mean values of the covariates in the data set.

Details

To compute ‘observed’ prevalences at a time t, individuals are assumed to be in the same state as at their last observation time preceding t.

The fitted transition probability matrix is used to forecast expected prevalences from the state occupancy at the initial time. To produce the expected number in state j at time t after the start, the number of individuals under observation at time t (including those who have died, but not those lost to follow-up) is multiplied by the product of the proportion of individuals in each state at the initial time and the transition probability matrix in the time interval t. The proportion of individuals in each state at the "initial" time is estimated, if necessary, in the same way as the observed prevalences.

For misclassification models (fitted using an ematrix), this aims to assess the fit of the full model for the observed states. That is, the combined Markov progression model for the true states and the misclassification model. Thus, expected prevalences of true states are estimated from the assumed proportion occupying each state at the initial time using the fitted transition probability matrix. The vector of expected prevalences of true states is then multiplied by the fitted misclassification probability matrix to obtain the expected prevalences of observed states.
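The arithmetic behind the expected prevalences can be sketched in base R. All numbers below are made up, mat_exp is a hypothetical simplified matrix exponential, and the misclassification matrix E is an assumed example with rows for true states and columns for observed states; the sketch is not the package's own code.

```r
# Illustrative matrix exponential (scaling and squaring, truncated series)
mat_exp <- function(A, squarings = 10, terms = 20) {
  A <- A / 2^squarings
  result <- term <- diag(nrow(A))
  for (k in seq_len(terms)) {
    term <- term %*% A / k
    result <- result + term
  }
  for (i in seq_len(squarings)) result <- result %*% result
  result
}

# Hypothetical fitted intensity matrix (state 3 absorbing)
Q <- rbind(c(-0.12,  0.10, 0.02),
           c( 0.10, -0.11, 0.01),
           c( 0,     0,    0   ))

init_counts <- c(80, 20, 0)                  # state occupancy at the initial time
init_prop   <- init_counts / sum(init_counts)
n_at_risk   <- 95                            # hypothetical number under observation at time t
t           <- 2

# Expected numbers in each true state at time t
expected_true <- n_at_risk * as.vector(init_prop %*% mat_exp(t * Q))

# Hypothetical misclassification probabilities (rows: true, columns: observed)
E <- rbind(c(0.9, 0.1, 0),
           c(0.2, 0.8, 0),
           c(0,   0,   1))

# Expected numbers in each observed state at time t
expected_obs <- as.vector(expected_true %*% E)
```

Because the rows of both the transition probability matrix and E sum to one, the expected numbers in either vector add up to the number at risk.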

For general hidden Markov models, the observed state is taken to be the predicted underlying state from the Viterbi algorithm (viterbi.msm). The goodness of fit of these states to the underlying Markov model is tested.

For an example of this approach, see Gentleman et al. (1994).

Value


Observed Table of observed numbers of individuals in each state at each time.

Observed percentages Corresponding percentage of the individuals at risk at each time.

Expected Table of corresponding expected numbers.

Expected percentages Corresponding percentage of the individuals at risk at each time.

Author(s)



References

Gentleman, R.C., Lawless, J.F., Lindsey, J.C. and Yan, P. Multi-state Markov models for analysing incomplete disease history data with illustrations for HIV disease. Statistics in Medicine (1994) 13(3): 805–821.

See Also

msm, summary.msm

psor Psoriatic arthritis data

Description

A series of observations of grades of psoriatic arthritis, as indicated by numbers of damaged joints.

Usage

data(psor)

Format

A data frame containing 806 observations, representing visits to a psoriatic arthritis (PsA) clinic from 305 patients. The rows are grouped by patient number and ordered by examination time. Each row represents an examination and contains additional covariates.

ptnum (numeric) Patient identification number

months (numeric) Examination time in months

state (numeric) Clinical state of PsA. Patients in states 1, 2, 3 and 4 have 0, 1 to 4, 5 to 9 and 10 or more damaged joints, respectively.

hieffusn (numeric) Presence of five or more effusions

ollwsdrt (character) Erythrocyte sedimentation rate of less than 15 mm/h

References

Gladman, D. D. and Farewell, V.T. (1999) Progression in psoriatic arthritis: role of time-varying clinical indicators. J. Rheumatol. 26(11):2409-13

Examples

## Four-state progression-only model with high effusion and low
## sedimentation rate as covariates on the progression rates. High
## effusion is assumed to have the same effect on the 1-2, 2-3, and 3-4
## progression rates, while low sedimentation rate has the same effect
## on the 1-2 and 2-3 intensities, but a different effect on the 3-4.

data(psor)
psor.q <- rbind(c(0,0.1,0,0),c(0,0,0.1,0),c(0,0,0,0.1),c(0,0,0,0))
psor.msm <- msm(state ~ months, subject=ptnum, data=psor,
                qmatrix = psor.q, covariates = ~ollwsdrt+hieffusn,
                constraint = list(hieffusn=c(1,1,1),ollwsdrt=c(1,1,2)),
                fixedpars=FALSE, control = list(REPORT=1,trace=2), method="BFGS")
qmatrix.msm(psor.msm)
sojourn.msm(psor.msm)
hazard.msm(psor.msm)

qmatrix.msm Transition intensity matrix

Description

Extract the estimated transition intensity matrix, and the corresponding standard errors, from a fitted multi-state model at a given set of covariate values.

Usage

qmatrix.msm(x, covariates="mean", sojourn=FALSE, cl=0.95)

Arguments


covariates The covariate values at which to estimate the intensity matrix. This can either be:





sojourn Set to TRUE if the estimated sojourn times and their standard errors should also be returned.


Details

Transition intensities and covariate effects are estimated on the log scale by msm. A covariance matrix is estimated from the Hessian of the maximised log-likelihood. The delta method is used to obtain from these the standard error of the intensities on the natural scale at arbitrary covariate values. Confidence limits are calculated by assuming normality on the log scale.
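For a single intensity, the two calculations described above can be sketched in base R. The estimate and its log-scale standard error below are made-up numbers, and the variable names are hypothetical; the sketch covers only the simplest case of one parameter with no covariates.

```r
q_hat  <- 0.12    # hypothetical intensity estimate on the natural scale
se_log <- 0.25    # hypothetical standard error of log(q_hat)
cl     <- 0.95

# Delta method: q = exp(theta), so dq/dtheta = q and se(q) is about q * se(theta)
se_nat <- q_hat * se_log

# Confidence limits assuming normality on the log scale, then back-transformed
z <- qnorm(1 - (1 - cl) / 2)
L <- exp(log(q_hat) - z * se_log)
U <- exp(log(q_hat) + z * se_log)
```

Limits built this way are always positive and are symmetric around the estimate on the log scale rather than on the natural scale.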


Value


estimate Estimated transition intensity matrix.

SE Corresponding approximate standard errors.

L Lower confidence limits

U Upper confidence limits

If sojourn is TRUE, extra components called sojourn and sojournSE are included, containing the estimate and standard errors, respectively, of the mean sojourn times in each transient state.

The default print method for objects returned by qmatrix.msm presents estimates and confidence limits. To present estimates and standard errors, do something like

qmatrix.msm(x)[c("estimates","SE")]

Author(s)


See Also

pmatrix.msm, sojourn.msm, deltamethod, ematrix.msm

qratio.msm Estimated ratio of transition intensities

Description

Compute the estimate and approximate standard error of the ratio of two estimated transition intensities from a fitted multi-state model at a given set of covariate values.

Usage

qratio.msm(x, ind1, ind2, covariates = "mean", cl = 0.95)

Arguments


ind1 Pair of numbers giving the indices in the intensity matrix of the numerator of the ratio, for example, c(1,2).

ind2 Pair of numbers giving the indices in the intensity matrix of the denominator of the ratio, for example, c(2,1).


covariates The covariate values at which to estimate the intensities. This can either be:






Details

For example, we might want to compute the ratio of the progression rate and recovery rate for a fitted model disease.msm with a health state (state 1) and a disease state (state 2). In this case, the progression rate is the (1,2) entry of the intensity matrix, and the recovery rate is the (2,1) entry. Thus to compute this ratio with covariates set to their means, we call

qratio.msm(disease.msm, c(1,2), c(2,1)) .

Standard errors are estimated by the delta method. Confidence limits are estimated by assuming normality on the log scale.
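As a rough illustration of these two steps for a ratio, the base R sketch below works with two made-up log-intensity estimates and an assumed covariance matrix V between them. The names and numbers are hypothetical, and the model is assumed to have no covariates.

```r
q12 <- 0.10   # hypothetical progression rate, the (1,2) intensity
q21 <- 0.05   # hypothetical recovery rate, the (2,1) intensity
cl  <- 0.95

# Hypothetical covariance matrix of (log q12, log q21)
V <- matrix(c(0.04, 0.01,
              0.01, 0.09), nrow = 2, byrow = TRUE)

ratio <- q12 / q21

# log(ratio) = log q12 - log q21, so its variance is v11 + v22 - 2 * v12
se_log_ratio <- sqrt(V[1, 1] + V[2, 2] - 2 * V[1, 2])

# Delta method on the natural scale: se(ratio) is about ratio * se(log ratio)
se_ratio <- ratio * se_log_ratio

# Confidence limits assuming normality on the log scale
z <- qnorm(1 - (1 - cl) / 2)
L <- ratio * exp(-z * se_log_ratio)
U <- ratio * exp( z * se_log_ratio)
```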

Value

A named vector with elements estimate, se, L and U containing the estimate, standard error, lower and upper confidence limits, respectively, of the ratio of intensities.

Author(s)


See Also

qmatrix.msm

sim.msm Simulate one individual trajectory from a continuous-time Markov model

Description

Simulate one realisation from a continuous-time Markov process up to a given time.


Usage

sim.msm(qmatrix, maxtime, covs=NULL, beta=NULL, obstimes=0, start=1,
        mintime=0)

Arguments

qmatrix The transition intensity matrix of the Markov process. The diagonal of qmatrix is ignored, and computed as appropriate so that the rows sum to zero. For example, a possible qmatrix for a three state illness-death model with recovery is:

rbind( c( 0, 0.1, 0.02 ), c( 0.1, 0, 0.01 ), c( 0, 0, 0 ) )

maxtime Maximum time for the simulated process.

covs Matrix of time-dependent covariates, with one row for each observation time and one column for each covariate.

beta Matrix of linear covariate effects on log transition intensities. The rows correspond to different covariates, and the columns to the transition intensities. The intensities are ordered by reading across rows of the intensity matrix, starting with the first, counting the positive off-diagonal elements of the matrix.

obstimes Vector of times at which the covariates are observed.

start Starting state of the process. Defaults to 1.

mintime Starting time of the process. Defaults to 0.

Details

The effect of time-dependent covariates on the transition intensity matrix for an individual is determined by assuming that the covariate is a step function which remains constant in between the individual’s observation times.
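For the simpler case with no covariates, one realisation of a continuous-time Markov process can be sketched in base R as below. The function sim_ctmc is a hypothetical illustration, not the package's sim.msm: it draws an exponential holding time in the current state, then jumps to another state with probability proportional to the off-diagonal intensities, and it simply stops at maxtime without recording a final state there.

```r
sim_ctmc <- function(qmatrix, maxtime, start = 1, mintime = 0) {
  Q <- qmatrix
  diag(Q) <- 0
  diag(Q) <- -rowSums(Q)            # recompute the diagonal so rows sum to zero
  states <- start
  times  <- mintime
  cur <- start
  t   <- mintime
  repeat {
    rate <- -Q[cur, cur]            # total rate of leaving the current state
    if (rate == 0) break            # absorbing state: nothing more happens
    t <- t + rexp(1, rate)          # exponential holding time
    if (t >= maxtime) break         # next jump would fall after maxtime
    probs <- Q[cur, ]
    probs[cur] <- 0                 # jump probabilities proportional to q_rs
    cur <- sample.int(nrow(Q), 1, prob = probs)
    states <- c(states, cur)
    times  <- c(times, t)
  }
  list(states = states, times = times)
}

set.seed(1)
qmatrix <- rbind(c(0,   0.1, 0.02),
                 c(0.1, 0,   0.01),
                 c(0,   0,   0   ))
traj <- sim_ctmc(qmatrix, maxtime = 30)
```

Handling time-dependent covariates as a step function would amount to rebuilding Q at each observation time and restarting the holding-time draw from there, which this sketch leaves out.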

Value

A list with components,

states Simulated states through which the process moves. This ends with either an absorption before obstime, or a transient state at obstime.

times Exact times at which the process changes to the corresponding states

qmatrix The given transition intensity matrix

Author(s)


See Also

simmulti.msm


Examples

qmatrix <- rbind(c(-0.2, 0.1, 0.1 ),
                 c(0.5, -0.6, 0.1 ),
                 c(0, 0, 0))

sim.msm(qmatrix, 30)

simmulti.msm Simulate multiple trajectories from a multi-state Markov model with arbitrary observation times

Description

Simulate a number of individual realisations from a multi-state Markov process. Observations of the process are made at specified arbitrary times for each individual.

Usage

simmulti.msm(data, qmatrix, covariates=NULL, death = FALSE, start,
             ematrix=NULL, hmodel=NULL, hcovariates=NULL)

Arguments

data A data frame with a mandatory column named time, representing observation times. The optional column named subject corresponds to subject identification numbers. If not given, all observations are assumed to be on the same individual. Observation times should be sorted within individuals. Other named columns of the data frame represent any covariates.

qmatrix The transition intensity matrix of the Markov process, with any covariates set to zero. The diagonal of qmatrix is ignored, and computed as appropriate so that the rows sum to zero. For example, a possible qmatrix for a three state illness-death model with recovery is:

rbind( c( 0, 0.1, 0.02 ), c( 0.1, 0, 0.01 ), c( 0, 0, 0 ) )

covariates List of covariate effects on log transition intensities. Each element is a vector of the effects of one covariate on all the transition intensities. The intensities are ordered by reading across rows of the intensity matrix, starting with the first, counting the positive off-diagonal elements of the matrix. For example, for a multi-state model with three transition intensities, and two covariates x and y on each intensity,

covariates=list(x = c(-0.3,-0.3,-0.3), y=c(0.1, 0.1, 0.1))


death Vector of indices of the death states. A death state is an absorbing state whose time of entry is known exactly, but the individual is assumed to be in an unknown transient state ("alive") at the previous instant. This is the usual situation for times of death in chronic disease monitoring data. For example, if you specify death = c(4, 5) then states 4 and 5 are assumed to be death states. death = TRUE indicates that the final state is a death state, and death = FALSE (the default) indicates that there is no death state.

start A vector with the same number of elements as there are distinct subjects in the data, giving the states in which each corresponding individual begins. Defaults to state 1 for each subject.

ematrix An optional misclassification matrix for generating observed states conditionally on the simulated true states. As defined in msm.

hmodel An optional hidden Markov model for generating observed outcomes conditionally on the simulated true states. As defined in msm.

hcovariates List of the same length as hmodel, defining any covariates governing the hidden Markov outcome models. Unlike in the msm function, this should also define the values of the covariate effects. Each element of the list is a named vector of the initial values for each set of covariates for that state. For example, for a three-state hidden Markov model with two, one and no covariates on the state 1, 2 and 3 outcome models respectively,

hcovariates = list (c(acute=-8, age=0), c(acute=-8), NULL)

Details

sim.msm is called repeatedly to produce a simulated trajectory for each individual. The state at each specified observation time is then taken to produce a new column state. The effect of time-dependent covariates on the transition intensity matrix for an individual is determined by assuming that the covariate is a step function which remains constant in between the individual’s observation times. If the subject enters an absorbing state, then only the first observation in that state is kept in the data frame. Rows corresponding to future observations are deleted. The entry times into states given in death are assumed to be known exactly.
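The step of reading off the state at each observation time, and keeping only the first observation in an absorbing state, can be sketched in base R. The trajectory below is made up and the variable names are hypothetical; the sketch ignores death states with exactly known entry times.

```r
# Hypothetical simulated trajectory: states entered and the exact entry times
traj_states <- c(1, 2, 1, 3)
traj_times  <- c(0, 3.2, 7.5, 11.1)
absorbing   <- 3

# Observation schedule for this individual
obstimes <- seq(0, 24, 2)

# State occupied at each observation time: the last state entered at or before it
state <- traj_states[findInterval(obstimes, traj_times)]

# Keep only the first observation in the absorbing state and drop later rows
first_abs <- match(absorbing, state)
keep <- if (is.na(first_abs)) seq_along(state) else seq_len(first_abs)
obs <- data.frame(time = obstimes[keep], state = state[keep])
```

Here findInterval assumes the entry times are sorted in increasing order and that the first entry time is no later than the first observation time.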

Value

A data frame with columns,

subject Subject identification indicators

time Observation times

state Simulated (true) state at the corresponding time

obs Observed outcome at the corresponding time, if ematrix or hmodel was supplied

plus any supplied covariates.

Author(s)



See Also

sim.msm

Examples

### Simulate 100 individuals with common observation times
sim.df <- data.frame(subject = rep(1:100, rep(13,100)), time = rep(seq(0, 24, 2), 100))
qmatrix <- rbind(c(-0.11, 0.1, 0.01 ),
                 c(0.05, -0.15, 0.1 ),
                 c(0.02, 0.07, -0.09))
simmulti.msm(sim.df, qmatrix)

sojourn.msm Mean sojourn times from a multi-state model

Description

Estimate the mean sojourn times in the transient states of a multi-state model and their confidence limits.

Usage

sojourn.msm(x, covariates="mean", cl=0.95)

Arguments


covariates The covariate values at which to estimate the mean sojourn times. This can either be:



a list of values, with optional names. For example, list(60, 1), where the order of the list follows the order of the covariates originally given in the model formula, or a named list, e.g. list(age = 60, sex = 1)


Details

The mean sojourn time in a transient state r is estimated by $-1/q_{rr}$, where $q_{rr}$ is the rth entry on the diagonal of the estimated transition intensity matrix. Calls deltamethod to find approximate standard errors. Confidence limits are estimated by assuming normality on the log scale.
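A short base R sketch of the estimate, using a made-up intensity matrix: transient states are those with a negative diagonal entry, and the mean sojourn time is minus the reciprocal of that entry. The standard errors se_q are hypothetical inputs, used only to show the delta-method step for the transformation from q to −1/q.

```r
# Hypothetical estimated intensity matrix (state 3 absorbing)
Q <- rbind(c(-0.12,  0.10, 0.02),
           c( 0.10, -0.11, 0.01),
           c( 0,     0,    0   ))

transient <- which(diag(Q) < 0)
q_rr      <- diag(Q)[transient]

# Mean sojourn time in each transient state
sojourn <- -1 / q_rr

# Delta method: d(-1/q)/dq = 1/q^2, so se(sojourn) is about se(q) / q^2
se_q       <- c(0.02, 0.03)          # hypothetical standard errors of q_rr
se_sojourn <- se_q / q_rr^2
```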


Value

A data frame with components:

estimates Estimated mean sojourn times in the transient states.

SE Corresponding standard errors.

L Lower confidence limits.

U Upper confidence limits.

Author(s)


See Also

msm, qmatrix.msm, deltamethod

statetable.msm Table of transitions

Description

Calculates a frequency table counting the number of times each pair of states were observed in successive observation times. This can be a useful way of summarising multi-state data.

Usage

statetable.msm(state, subject, data=NULL)

Arguments

state Observed states, assumed to be ordered by time within each subject.

subject Subject identification numbers corresponding to state. If not given, all observations are assumed to be on the same subject.

data An optional data frame in which the variables represented by subject and state can be found.

Value

A frequency table with starting states as rows and finishing states as columns.
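The counting itself can be sketched in base R on a small made-up data set. The sketch assumes the rows are grouped by subject and ordered by time within subject, and pairs each observation with the next one from the same subject; it is an illustration of the idea rather than the package's implementation.

```r
# Hypothetical observations: three subjects, ordered by time within subject
state   <- c(1, 1, 2, 3,   1, 2, 2,   2, 3)
subject <- c(1, 1, 1, 1,   2, 2, 2,   3, 3)

n <- length(state)

# A successive pair is valid only when both observations share a subject
same_subject <- subject[-1] == subject[-n]

from <- state[-n][same_subject]
to   <- state[-1][same_subject]

tab <- table(from = from, to = to)
```

In this toy data there are six within-subject pairs, so the entries of tab sum to six, with starting states as rows and finishing states as columns.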

Author(s)


See Also

crudeinits.msm


Examples

## Heart transplant data
data(heart)

## 148 deaths from state 1, 48 from state 2 and 55 from state 3.
statetable.msm(state, PTNUM, data=heart)

msm.summary Summarise a fitted multi-state model

Description

Summary method for fitted msm models. Currently, this produces a table of observed and expected state prevalences for each time. For models with covariates, prints hazard ratios with confidence intervals for covariate effects.

Usage

## S3 method for class 'msm':
summary(object, times=NULL, timezero=NULL, initstates=NULL,
        covariates="mean", misccovariates="mean", hazard.scale=1, ...)

Arguments

object A fitted multi-state model object, as returned by msm.

times A sequence of times at which to compare observed and expected prevalences of each state. Defaults to seq(min(times), max(times), (max(times) - min(times))/10).

timezero Initial time of the Markov process. Expected values are forecasted from here. Defaults to the minimum of the observation times given in the data.

initstates Optional vector of the same length as the number of states. Gives the numbers of individuals occupying each state at timezero. The default is that all individuals are in state 1.

covariates Covariate values for which to forecast expected state occupancy. See qmatrix.msm. Defaults to the mean values of the covariates in the data set.

misccovariates (Misclassification models only) Values of covariates on the misclassification probability matrix for which to forecast expected state occupancy. Defaults to the mean values of the covariates in the data set.

hazard.scale Vector with same elements as number of covariates on transition rates. Corresponds to the increase in each covariate used to calculate its hazard ratio. Defaults to all 1.

... further arguments passed to or from other methods


Value

A list of class summary.msm, with components:

prevalences Output from prevalence.msm.

hazard Output from hazard.msm

hazard.scale Value of the hazard.scale argument

Author(s)


See Also

msm, prevalence.msm, hazard.msm

surface.msm Explore the likelihood surface

Description

Plot the log-likelihood surface with respect to two parameters.

Usage

surface.msm(x, params=c(1,2), np=10, type=c("contour","filled.contour","persp","image"),
            point=NULL, xrange=NULL, yrange=NULL, ...)

Arguments

x Output from msm, representing a fitted msm model.

params Integer vector with two elements, giving the indices of the parameters to vary. All other parameters will be fixed. Defaults to c(1,2), representing the first two log transition intensities. See the fixedpars argument to msm for a definition of these indices.

np Number of grid points to use in each direction, by default 10. An np x np grid will be used to evaluate the likelihood surface. If 100 likelihood function evaluations are slow, then reduce this.

type Character string specifying the type of plot to produce.

"contour" Contour plot, using the R function contour.

"filled.contour" Solid-color contour plot, using the R function filled.contour.

"persp" Perspective plot, using the R function persp.

"image" Grid color plot, using the R function image.

point Vector of length n, where n is the number of parameters in the model, including the parameters that will be varied here. This specifies the point at which to fix the likelihood. By default, this is the maximum likelihood estimates stored in the fitted model x, x$estimates.

xrange Range to plot for the first varied parameter. Defaults to plus and minus two standard errors, obtained from the Hessian at the maximum likelihood estimate.

yrange Range to plot for the second varied parameter. Defaults to plus and minus two standard errors, obtained from the Hessian at the maximum likelihood estimate.

... Further arguments to be passed to the plotting function.

Details

Draws a contour or perspective plot. Useful for diagnosing irregularities in the likelihood surface. If you want to use these plots before running the maximum likelihood estimation, then just run msm with all estimates fixed at their initial values.

contour.msm just calls surface.msm with type = "contour".

persp.msm just calls surface.msm with type = "persp".

image.msm just calls surface.msm with type = "image".

As these three functions are methods of the generic functions contour, persp and image, they can be invoked as contour(x), persp(x) or image(x), where x is a fitted msm object.

Author(s)


See Also

msm, contour, filled.contour, persp, image.

tnorm Truncated Normal distribution

Description

Density, distribution function, quantile function and random generation for the truncated Normal distribution with mean equal to mean and standard deviation equal to sd before truncation, and truncated on the interval [lower, upper].

Usage

dtnorm(x, mean=0, sd=1, lower=-Inf, upper=Inf, log = FALSE)
ptnorm(q, mean=0, sd=1, lower=-Inf, upper=Inf, lower.tail = TRUE, log.p = FALSE)
qtnorm(p, mean=0, sd=1, lower=-Inf, upper=Inf, lower.tail = TRUE, log.p = FALSE)
rtnorm(n, mean=0, sd=1, lower=-Inf, upper=Inf)


Arguments

x,q vector of quantiles.

p vector of probabilities.

n number of observations. If length(n) > 1, the length is taken to be the number required.

mean vector of means.

sd vector of standard deviations.

lower lower truncation point.

upper upper truncation point.

log, log.p logical; if TRUE, probabilities p are given as log(p).

lower.tail logical; if TRUE (default), probabilities are P[X <= x], otherwise, P[X > x].

Details

The truncated normal distribution has density

f(x, µ, σ) = φ(x, µ, σ)/(Φ(u, µ, σ) − Φ(l, µ, σ))

for l <= x <= u, and 0 otherwise.

µ is the mean of the original Normal distribution before truncation,
σ is the corresponding standard deviation,
u is the upper truncation point,
l is the lower truncation point,
φ(x) is the density of the corresponding normal distribution, and
Φ(x) is the distribution function of the corresponding normal distribution.

If mean or sd are not specified they assume the default values of 0 and 1, respectively.

If lower or upper are not specified they assume the default values of -Inf and Inf, respectively, corresponding to no lower or no upper truncation.

Therefore, for example, dtnorm(x), with no other arguments, is simply equivalent to dnorm(x).
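The density formula above translates directly into base R. The function dtnorm_sketch below is a hypothetical illustration written from that formula using dnorm and pnorm; it is not the package's dtnorm and omits the log argument.

```r
dtnorm_sketch <- function(x, mean = 0, sd = 1, lower = -Inf, upper = Inf) {
  # Probability mass of the untruncated Normal between the truncation points
  denom <- pnorm(upper, mean, sd) - pnorm(lower, mean, sd)
  # Rescaled Normal density inside [lower, upper], zero outside
  ifelse(x >= lower & x <= upper, dnorm(x, mean, sd) / denom, 0)
}

# Density of a Normal(70, 10) truncated to [60, 80]
x <- seq(50, 90, by = 1)
dens <- dtnorm_sketch(x, mean = 70, sd = 10, lower = 60, upper = 80)

# The truncated density should integrate to one over [60, 80]
area <- integrate(function(v) dtnorm_sketch(v, 70, 10, 60, 80), 60, 80)$value
```

With the default infinite truncation points the denominator is one, so the function reduces to dnorm.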

Only rtnorm is used in the msm package, to simulate from hidden Markov models with truncated normal distributions. These functions are merely provided for completion, and are not optimized for numerical stability. To fit a hidden Markov model with a truncated Normal response distribution, use a hmmTNorm constructor. See the hmm-dists help page for further details.

Value

dtnorm gives the density, ptnorm gives the distribution function, qtnorm gives the quantile function, and rtnorm generates random deviates.

Author(s)


See Also

dnorm


Examples

x <- seq(50, 90, by=1)
plot(x, dnorm(x, 70, 10), type="l", ylim=c(0,0.06)) ## untruncated Normal distribution
lines(x, dtnorm(x, 70, 10, 60, 80), type="l") ## truncated Normal distribution

totlos.msm Total length of stay

Description

Estimate the expected total length of stay in each transient state, for a given period of evolution of a multi-state model. This assumes that the transition rates do not change with time.

Usage

totlos.msm(x, start=1, fromt=0, tot=Inf, covariates="mean", ci.boot=FALSE, cl=0.95, B=500, ...)

Arguments

x A fitted multi-state model, as returned by msm.

start State at the beginning of the period.

fromt Time from which to estimate total length of stay. Defaults to 0, the beginning of the process.

tot Time up to which total length of stay is estimated. Defaults to infinity, giving the expected time spent in the state until absorption. For models without an absorbing state, tot must be specified.

covariates The covariate values to estimate for. This can either be:





ci.boot Calculate a bootstrap confidence interval. This is usually time-consuming, and disabled by default. See boot.msm for more details of bootstrapping in msm.

cl Width of the symmetric confidence interval

B Number of bootstrap replicates

... Further arguments to be passed to the integrate function to control the numerical integration.


Details

The expected total length of stay in state j between times t1 and t2, from the point of view of an individual in state i at time 0, is defined by the integral from t1 to t2 of the i, j entry of the transition probability matrix P(t). As the individual entries of P(t) = exp(tQ) are not available explicitly in terms of t for a general Markov model, this integral is calculated numerically, using the integrate function. This may take a long time for models with many states where P(t) is expensive to calculate.
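The integral can be sketched in base R over a finite period. Here mat_exp is a hypothetical simplified matrix exponential, total_los is a hypothetical helper name, and Q is a made-up progressive model in which state 1 is never re-entered, so that the (1, 1) entry of P(t) is exp(-0.5 t) and the result can be checked by hand.

```r
# Illustrative matrix exponential (scaling and squaring, truncated series)
mat_exp <- function(A, squarings = 10, terms = 20) {
  A <- A / 2^squarings
  result <- term <- diag(nrow(A))
  for (k in seq_len(terms)) {
    term <- term %*% A / k
    result <- result + term
  }
  for (i in seq_len(squarings)) result <- result %*% result
  result
}

# Hypothetical progressive three-state model: 1 -> 2 -> 3, state 3 absorbing
Q <- rbind(c(-0.5,  0.5, 0  ),
           c( 0,   -0.2, 0.2),
           c( 0,    0,   0  ))

# Expected time spent in state j between t1 and t2, starting in state i at time 0
total_los <- function(Q, i, j, t1, t2) {
  # integrate() passes a vector of time points, so evaluate P(t)[i, j] at each
  integrand <- function(t) vapply(t, function(s) mat_exp(s * Q)[i, j], numeric(1))
  integrate(integrand, t1, t2)$value
}

los_state1 <- total_los(Q, i = 1, j = 1, t1 = 0, t2 = 10)
los_state2 <- total_los(Q, i = 1, j = 2, t1 = 0, t2 = 10)
```

The sketch uses a finite upper limit because the simplified mat_exp is not designed for very large arguments.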

For a model where the individual has only one place to go from each state, and each state is visited only once, for example a progressive disease model with no recovery or death, these are equal to the mean sojourn time in each state. However, consider a three-state health-disease-death model with transitions from health to disease, health to death, and disease to death, where everybody starts healthy. In this case the mean sojourn time in the disease state will be greater than the expected length of stay in the disease state. This is because the mean sojourn time in a state is conditional on entering the state, whereas the expected total time diseased is a forecast for a healthy individual, who may die before getting the disease.

Value

A vector of expected total lengths of stay for each transient state.

Author(s)


See Also

sojourn.msm, pmatrix.msm, integrate, boot.msm.

transient.msm Transient and absorbing states

Description

Returns the transient and absorbing states of either a fitted model or a transition intensity matrix.

Usage

transient.msm(x=NULL, qmatrix=NULL)
absorbing.msm(x=NULL, qmatrix=NULL)

Arguments

x A fitted multi-state model as returned by msm.

qmatrix A transition intensity matrix. The diagonal is ignored and taken to be minus the sum of the rest of the row.


Value

A vector of the ordinal indices of the transient or absorbing states.
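For a transition intensity matrix, the classification can be sketched in base R as follows: a state is absorbing when every off-diagonal entry in its row is zero, and transient otherwise. The function names are hypothetical, and the sketch ignores the diagonal, as described for the qmatrix argument.

```r
absorbing_states <- function(qmatrix) {
  Q <- qmatrix
  diag(Q) <- 0                 # the diagonal carries no extra information
  which(rowSums(Q) == 0)       # no way out of the state
}

transient_states <- function(qmatrix) {
  Q <- qmatrix
  diag(Q) <- 0
  which(rowSums(Q) > 0)        # at least one positive exit intensity
}

# Hypothetical three-state illness-death model with recovery
Q <- rbind(c(0,   0.1, 0.02),
           c(0.1, 0,   0.01),
           c(0,   0,   0   ))

absorbing_states(Q)
transient_states(Q)
```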

Author(s)


viterbi.msm Calculate the most likely path through underlying stages

Description

For a fitted hidden Markov model, the Viterbi algorithm recursively constructs the path with the highest probability through the underlying stages.

Usage

viterbi.msm(x)

Arguments

x A fitted hidden Markov multi-state model, as produced by msm

Value

A data frame with columns:

subject = subject identification numbers

time = times of observations

observed = corresponding observed states

fitted = corresponding fitted states found by Viterbi recursion. If the model is not a hidden Markov model, this is just the observed states.

Author(s)


References

Durbin, R., Eddy, S., Krogh, A. and Mitchison, G. Biological sequence analysis, Cambridge University Press, 1998.

See Also

msm

