Danish Mortality Monitoring using the R package surveillance

Michael Höhle¹

¹Department for Infectious Disease Epidemiology, Robert Koch Institute, Berlin, Germany

Statistical Methods for Outbreak Detection, Open University, Milton Keynes, UK

19 May 2010

R package surveillance Michael Hohle 1/ 23


Outline

1 The surveillance package

2 Aberration Detection
   Negative Binomial CUSUM – Theory
   Negative Binomial CUSUM – Mortality Monitoring
   Run-length properties

3 Discussion


What is surveillance? (1)

An open source package for the visualization, modeling and monitoring of count data and categorical time series in public health surveillance

Prospective outbreak detection methods for univariate count data time series:

farrington – Farrington et al. (1996)
cusum – Rossi et al. (1999) and extensions
rogerson – Rogerson and Yamada (2004)
glrnb – Höhle and Paul (2008)

Retrospective count data time series models:

hhh – Held et al. (2005); Paul et al. (2008)
twins – Held et al. (2006)

Spatio-temporal cluster detection:

stcd – Assunção and Correa (2009)


What is surveillance? (2)

Motivation: provide a data structure and an implementation framework for methodological developments

Spin-off: a tool for epidemiologists and others working in applied disease monitoring

Availability: CRAN; the current development version is available from

http://surveillance.r-forge.r-project.org/

The package is available under the GNU General Public License (GPL) v. 2.0.


The EuroMOMO project

European monitoring of excess mortality for public health action (EuroMOMO)

Aim: develop and strengthen real-time monitoring of mortality across Europe in order to enhance the management of serious public health risks such as pandemic influenza, heat waves and cold snaps

Main outcome of mortality monitoring: excess mortality

EuroMOMO in this talk:

Danish mortality data provided by Statens Serum Institut, Denmark, are used to illustrate the surveillance package.


Data structure: The sts class

Surveillance time series $\{y_{it};\ t = 1, \ldots, n,\ i = 1, \ldots, m\}$ are represented using objects of class sts

> data("momo")
> momo
-- An object of class sts --
freq: 52 with strptime format string %V
start: 1994-01-03
dim(observed): 782 8

Head of observed:
     [0,1) [1,5) [5,15) [15,45) [45,65) [65,75) [75,85) [85,Inf)
[1,]    11     4      2      53     212     279     528      408
...

Dates can be stored using the R Date class, which handles the ISO 8601 date standard.
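The weekly time axis follows ISO 8601 weeks (freq 52 with format string %V). As a small illustration in base R only, without the sts class itself, dates can be mapped to ISO year-week labels with format(); the helper name iso_week is hypothetical.

```r
# Map dates to ISO 8601 year-week labels with base R only (no sts object involved).
# %G is the ISO week-based year, %V the ISO week number (01-53).
iso_week <- function(d) format(as.Date(d), "%G-W%V")

start <- as.Date("1994-01-03")                  # start date of the momo series
weeks <- seq(start, by = "week", length.out = 5)
iso_week(weeks)          # "1994-W01" ... "1994-W05"
iso_week("2007-10-01")   # "2007-W40", first week of the monitoring example
iso_week("2004-12-27")   # "2004-W53", one of the years with 53 ISO weeks
```

Years with 53 ISO weeks are the reason the seasonal term in the model below has period 52 or 53.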


Visualizing sts objects (1)

The plot function provides an interface to several visual representations, controlled by the type argument.

> plot(momo[year(momo) >= 2000, ], type = observed ~ time | unit)

[Figure: weekly counts from 2000 onwards, one panel per age group: [0,1), [1,5), [5,15), [15,45), [45,65), [65,75), [75,85), [85,Inf); x-axis: time; y-axis: No. infected.]


Visualizing sts objects (2)

> plot(momo[, "[0,1)"], ylab = "No. of deaths")

[Figure: weekly number of deaths in age group [0,1) from 1994 onwards; x-axis: time; y-axis: No. of deaths (0 to 25).]

Summarizing: the series contain small and large counts, trends and seasonality → take this into account within a statistical model.


Statistical Framework for Aberration Detection

Univariate time series $\{y_t,\ t = 1, 2, \ldots\}$ to monitor

At the unknown time $\tau$, an important change in the process occurs. For each time $t$ we distinguish between two states:

$$x_t = \begin{cases} 0 & \text{if } t < \tau \text{ (in-control)},\\ 1 & \text{otherwise (out-of-control)}. \end{cases}$$

At time $s \geq 1$, the available information is $\boldsymbol{y}_s = \{y_t;\ t \leq s\}$. Detection is based on a statistic $r(\cdot)$ with resulting alarm time

$$T_A = \min\{s \geq 1 : r(\boldsymbol{y}_s) > g\},$$

where $g$ is a known threshold.
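Given the values of the statistic $r(\cdot)$ up to the current time, the alarm time is simply the first index at which $g$ is exceeded. A minimal base-R sketch; the function name alarm_time and the statistic values are hypothetical.

```r
# Alarm time T_A = min{s >= 1 : r_s > g} for a precomputed statistic r_1, r_2, ...
# Returns NA if no alarm occurs within the observed series.
alarm_time <- function(r, g) {
  idx <- which(r > g)
  if (length(idx) == 0) NA_integer_ else idx[1]
}

r <- c(0, 0.8, 2.1, 3.9, 5.2, 4.0)   # hypothetical statistic values
alarm_time(r, g = 4.75)              # 5
alarm_time(r, g = 10)                # NA
```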


Theory: Negative Binomial CUSUM (1)

Likelihood ratio between the out-of-control and in-control models at time $s$, given that $\tau = t$:

$$L(s,t) = \frac{f(\boldsymbol{y}_s \mid \tau = t)}{f(\boldsymbol{y}_s \mid \tau > s)} = \prod_{i=t}^{s} \frac{f(y_i;\theta_1)}{f(y_i;\theta_0)},$$

where $f(\cdot;\theta)$ is the negative binomial PMF with parameter vector $\theta$.

The cumulative sum (CUSUM) procedure is advantageous for detecting sustained shifts:

$$r(\boldsymbol{y}_s) = \max_{1 \leq t \leq s} \log L(s,t).$$
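To make the definition concrete, the following base-R sketch evaluates $\log L(s,t)$ for every candidate change point $t$ and takes the maximum. It assumes the negative binomial is parametrised by mean and dispersion $\alpha$ via size = 1/alpha in R's dnbinom; the means, dispersion and function names are hypothetical, and the $O(s^2)$ brute force only mirrors the formula.

```r
# Log-likelihood ratio contributions log f(y_i; theta1) - log f(y_i; theta0),
# NegBin with mean mu and dispersion alpha (size = 1/alpha).
llr <- function(y, mu0, mu1, alpha) {
  dnbinom(y, size = 1 / alpha, mu = mu1, log = TRUE) -
    dnbinom(y, size = 1 / alpha, mu = mu0, log = TRUE)
}

# Brute-force CUSUM statistic r(y_s) = max_{1 <= t <= s} log L(s, t)
r_bruteforce <- function(y, mu0, mu1, alpha) {
  l <- llr(y, mu0, mu1, alpha)
  s <- length(y)
  logL <- sapply(seq_len(s), function(t) sum(l[t:s]))  # log L(s, t), t = 1..s
  max(logL)
}

# Hypothetical example: constant in-control mean, 20% increase out of control
set.seed(1)
mu0 <- rep(300, 10)
y <- rnbinom(10, size = 1 / 0.01, mu = mu0)
r_bruteforce(y, mu0, 1.2 * mu0, alpha = 0.01)
```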


Theory: Negative Binomial CUSUM (2)

The computation of $r(\boldsymbol{y}_s)$ in recursive form:

$$r_0 = 0, \qquad r_s = \max\left(0,\ r_{s-1} + \log\left\{\frac{f(y_s;\theta_1)}{f(y_s;\theta_0)}\right\}\right), \quad s \geq 1.$$

When there is evidence against the in-control model, the log-likelihood ratio (LLR) contributions are added up.

No credit in the direction of the in-control model is given, because $r_s$ cannot fall below zero.
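Unrolling the recursion gives $r_s = \max\left(0, \max_{1 \leq t \leq s} \log L(s,t)\right)$, so it agrees with the maximum over change points whenever the statistic is positive, which is all that matters for an alarm with $g > 0$. A minimal base-R sketch, assuming known $\mu_{0,t}$, $\mu_{1,t}$ and $\alpha$; cusum_nb is a hypothetical name and not the package's glrnb.

```r
# Recursive NegBin CUSUM: r_0 = 0, r_s = max(0, r_{s-1} + LLR_s).
cusum_nb <- function(y, mu0, mu1, alpha) {
  llr <- dnbinom(y, size = 1 / alpha, mu = mu1, log = TRUE) -
    dnbinom(y, size = 1 / alpha, mu = mu0, log = TRUE)
  r <- numeric(length(y))
  r_prev <- 0
  for (s in seq_along(y)) {
    r[s] <- max(0, r_prev + llr[s])   # never below zero: no in-control credit
    r_prev <- r[s]
  }
  r
}

# Hypothetical series: in control for 8 weeks, then a 20% increase
set.seed(42)
mu0 <- rep(300, 12)
y <- c(rnbinom(8, size = 100, mu = mu0[1:8]),
       rnbinom(4, size = 100, mu = 1.2 * mu0[9:12]))
round(cusum_nb(y, mu0, 1.2 * mu0, alpha = 0.01), 2)
```

The recursion needs only the previous value, which is what makes it suitable for prospective, week-by-week monitoring.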


Theory: Negative Binomial CUSUM (3)

Negative binomial response with fixed dispersion parameter $\alpha$ and in-control mean modeled using a GLM with log link:

$$y_t \sim \mathrm{NegBin}(\mu_{0,t}, \alpha), \qquad \log(\mu_{0,t}) = \log(\mathrm{pop}_t) + \beta_0 + \beta_1 \cdot t + c_t,$$

where $c_t$ is a cyclic function with period 52 or 53, depending on the number of ISO weeks in the year of $t$, and $\mathrm{pop}_t$ denotes the population size in the respective age group at time $t$.

As a consequence, $E(y_t) = \mu_{0,t}$ and $\mathrm{Var}(y_t) = \mu_{0,t} + \alpha \cdot \mu_{0,t}^2$.
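In R, this mean/dispersion parametrisation corresponds to dnbinom(y, size = 1/alpha, mu = mu). This is also why the glrnb call in the application passes alpha = 1/m$theta: glm.nb reports the size parameter as theta. The sketch below checks the two moments numerically by summing over a truncated support; nb_moments and the parameter values are hypothetical.

```r
# Numerical mean and variance of NegBin(mu, alpha) with size = 1/alpha,
# summing over the support 0..ymax (the tail beyond ymax is negligible here).
nb_moments <- function(mu, alpha, ymax = 5000) {
  y <- 0:ymax
  p <- dnbinom(y, size = 1 / alpha, mu = mu)
  m <- sum(y * p)
  c(mean = m, var = sum((y - m)^2 * p))
}

mu <- 250; alpha <- 0.02   # hypothetical values
nb_moments(mu, alpha)      # approx. mean 250, var 250 + 0.02 * 250^2 = 1500
```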

Out-of-control model for given $\kappa > 1$:

$$\mu_{1,t} = \kappa \cdot \mu_{0,t}.$$
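With $\mu_{1,t} = \kappa \mu_{0,t}$ and size $k = 1/\alpha$, the Gamma-function terms of the two PMFs cancel, and the single-observation LLR becomes $y \log\kappa + (y + k)\log\{(k + \mu_{0,t})/(k + \kappa\mu_{0,t})\}$. This is linear in $y$ with positive slope for $\kappa > 1$. A base-R check of this closed form against dnbinom; the function names and counts are hypothetical.

```r
# Closed-form NegBin log-likelihood ratio for mu1 = kappa * mu0, size k = 1/alpha.
llr_closed <- function(y, mu0, kappa, alpha) {
  k <- 1 / alpha
  y * log(kappa) + (y + k) * log((k + mu0) / (k + kappa * mu0))
}

# Reference: difference of log PMFs computed by dnbinom
llr_dnbinom <- function(y, mu0, kappa, alpha) {
  dnbinom(y, size = 1 / alpha, mu = kappa * mu0, log = TRUE) -
    dnbinom(y, size = 1 / alpha, mu = mu0, log = TRUE)
}

y <- c(200, 250, 300, 350)   # hypothetical counts
llr_closed(y, mu0 = 250, kappa = 1.2, alpha = 0.02)
all.equal(llr_closed(y, 250, 1.2, 0.02), llr_dnbinom(y, 250, 1.2, 0.02))  # TRUE
```

The monotonicity in $y$ means that larger counts always push the CUSUM upwards, which is what makes quantities such as the number needed before alarm well defined.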


Application: Negative Binomial CUSUM (1)

Monitoring example: age group 75–84, starting from week 40 in 2007 (i.e. 1 October 2007), using the past 5 years as reference:

> library(MASS)  # provides glm.nb
> m <- glm.nb(`observed.[75,85)` ~ 1 + epoch + sin(2*pi*epochInPeriod) +
+   cos(2*pi*epochInPeriod) + offset(log(`population.[75,85)`)),
+   data = momo.df[phase1, ])

> mu0 <- predict(m, newdata=momo.df[phase2,],type="response")

Aim: to optimally detect a 20% increase in the mean, i.e. $\kappa = 1.2$. Use $g = 4.75$ – consequences?

> kappa <- 1.2
> # c.ARL is the threshold g; theta = log(kappa) fixes the change at kappa
> s.nb <- glrnb(momo[, "[75,85)"], control = list(range = phase2,
+   alpha = 1/m$theta, mu0 = mu0, c.ARL = 4.75, theta = log(kappa),
+   ret = "cases"))


Application: Negative Binomial CUSUM (2)

For week 02 in 2008 an alarm is generated:

[Figure: weekly number of deaths in age group 75–84 from late 2007 through 2008, together with $\mu_0$, $\mu_1$ and the NNBA; x-axis: time (weeks); y-axis: No. of deaths (0 to 400).]

Also shown is the number needed before alarm (NNBA), i.e. given $r(\boldsymbol{y}_{s-1})$, the minimum $y_s$ such that $r(\boldsymbol{y}_s) > g$.
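Because the negative binomial LLR for $\mu_1 = \kappa\mu_0$ is linear and increasing in $y$ (for $\kappa > 1$), the NNBA can be obtained in closed form by solving $r_{s-1} + \mathrm{LLR}(y) > g$ for the smallest integer $y$. A minimal base-R sketch, assuming known $\mu_{0,s}$, $\kappa$, $\alpha$ and $g > 0$; the function name nnba is hypothetical, and this is not necessarily how the package computes it.

```r
# Number needed before alarm (NNBA): given r_{s-1}, the smallest count y_s with
# max(0, r_{s-1} + LLR(y_s)) > g, where LLR(y) = intercept + slope * y.
nnba <- function(r_prev, mu0, kappa, alpha, g) {
  k <- 1 / alpha
  slope <- log(kappa) + log((k + mu0) / (k + kappa * mu0))   # > 0 for kappa > 1
  intercept <- k * log((k + mu0) / (k + kappa * mu0))
  y_star <- (g - r_prev - intercept) / slope   # alarm iff y > y_star
  max(0, floor(y_star) + 1)                    # smallest integer strictly above y_star
}

nnba(r_prev = 2, mu0 = 300, kappa = 1.2, alpha = 0.02, g = 4.75)
```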


[Figure: the same plot with the in-control and out-of-control means labelled $\mu_0^{\mathrm{GAM}}$ and $\mu_1^{\mathrm{GAM}}$, together with the NNBA.]


Run-length of NegBin CUSUM (1)

Interest is in the PMF of $T_A$. It can be computed either by Monte Carlo simulation or by using a Markov chain approximation.

Generalization of Bissell (1984) to time-varying count data CUSUMs: the dynamics of $r_t$ are described by a Markov chain with states

State $0$: $r_t = 0$
State $i$: $r_t \in \left((i-1)\cdot\frac{g}{M},\ i\cdot\frac{g}{M}\right]$, $i = 1, 2, \ldots, M$
State $M+1$: $r_t > g$

Calculation of the $(M+2)\times(M+2)$ transition matrix $P_t$ with elements

$$p_{t,i,j} = P(r_t \in \text{State } j \mid r_{t-1} \in \text{State } i), \qquad i, j = 0, 1, \ldots, M+1,$$

by approximations suggested in Hawkins and Olwell (1998).
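One way to fill $P_t$ is to represent each transient state by a single value (0 for state 0, the interval midpoint for states $1, \ldots, M$) and to compute where one step of the CUSUM takes it. The sketch below is one common discretisation choice, not necessarily the exact approximation used in the package's LRCUSUM.runlength. It uses the closed-form NegBin LLR for $\mu_1 = \kappa\mu_0$, which is increasing in $y$, so each interval probability is a difference of pnbinom values. mu_true plays the role of the mu argument in the LRCUSUM.runlength call (the actual mean, $\mu_{0,t}$ when $\tau = \infty$); all names and values are hypothetical.

```r
# Markov chain approximation of the NegBin CUSUM at one time point t (sketch).
# States: 0 (r = 0), i = 1..M (r in ((i-1)g/M, i g/M]), M+1 (r > g, absorbing).
transition_matrix <- function(mu_true, mu0, kappa, alpha, g, M) {
  k <- 1 / alpha
  # Closed-form LLR(y) = a + b * y for mu1 = kappa * mu0 (b > 0 for kappa > 1)
  b <- log(kappa) + log((k + mu0) / (k + kappa * mu0))
  a <- k * log((k + mu0) / (k + kappa * mu0))
  rep_val <- c(0, (seq_len(M) - 0.5) * g / M)   # representative r of states 0..M
  bounds <- (0:M) * g / M                       # upper interval ends of states 0..M
  P <- matrix(0, M + 2, M + 2)
  for (i in 0:M) {
    # P(r_new <= c) = P(y <= (c - r - a) / b) because the LLR increases in y
    cdf <- pnbinom(floor((bounds - rep_val[i + 1] - a) / b), size = k, mu = mu_true)
    P[i + 1, 1:(M + 1)] <- diff(c(0, cdf))      # land in state 0, 1, ..., M
    P[i + 1, M + 2] <- 1 - cdf[M + 1]           # exceed g: alarm
  }
  P[M + 2, M + 2] <- 1                          # alarm state is absorbing
  P
}

P <- transition_matrix(mu_true = 300, mu0 = 300, kappa = 1.2, alpha = 0.02,
                       g = 4.75, M = 20)
range(rowSums(P))   # every row sums to 1 (up to rounding)
```

Larger $M$ gives a finer grid and a better approximation at the cost of bigger matrices.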


Run-length of NegBin CUSUM (2)

State $M+1$ is absorbing.

The cumulative probability of an alarm at any step up to time $n$, $n \geq 1$, is

$$P(T_A \leq n) = \left[\prod_{t=1}^{n} P_t\right]_{0,M+1}.$$

The PMF of $T_A$ can thus be determined by subtraction: $P(T_A = n) = P(T_A \leq n) - P(T_A \leq n-1)$.
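The matrix-product formula can be sketched in base R for an arbitrary sequence of transition matrices, one per time point. To make the result checkable, the example uses a hypothetical 3-state chain ($M = 1$) in which every transient state moves to the alarm state with probability 0.1 per step, so that $P(T_A \leq n) = 1 - 0.9^n$ exactly; runlength_cdf is a hypothetical name.

```r
# Cumulative alarm probability P(T_A <= n) = [P_1 P_2 ... P_n]_{0, M+1}
# and the PMF of T_A by differencing. P_list holds one transition matrix per time point.
runlength_cdf <- function(P_list) {
  prod_mat <- diag(nrow(P_list[[1]]))
  absorbing <- nrow(prod_mat)
  cdf <- numeric(length(P_list))
  for (t in seq_along(P_list)) {
    prod_mat <- prod_mat %*% P_list[[t]]
    cdf[t] <- prod_mat[1, absorbing]   # start in state 0, be in state M+1 at time t
  }
  cdf
}

# Hypothetical 3-state chain: alarm probability 0.1 per step from each transient state
P <- matrix(c(0.5, 0.4, 0.1,
              0.3, 0.6, 0.1,
              0.0, 0.0, 1.0), nrow = 3, byrow = TRUE)
cdf <- runlength_cdf(rep(list(P), 10))
pmf <- diff(c(0, cdf))   # P(T_A = n) = P(T_A <= n) - P(T_A <= n - 1)
```

For time-varying in-control means, each element of P_list would be a different $P_t$.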

Now: choose $g$ such that $P(T_A \leq 65 \mid \tau = \infty)$ is below some acceptable value, e.g. 10%.

> # For each threshold g in g.grid: in-control (mu = mu0) run-length CDF, keep its last value
> pMarkovChain <- sapply(g.grid, function(g) {
+   TA <- LRCUSUM.runlength(mu = t(mu0), mu0 = t(mu0), mu1 = kappa *
+     t(mu0), h = g, dfun = dY, n = rep(600, length(mu0)),
+     alpha = 1/m$theta)
+   return(tail(TA$cdf, n = 1))
+ })


Run-length of NegBin CUSUM (3)

$P(T_A \leq 65 \mid \tau = \infty)$ as a function of $g$, computed by both Monte Carlo simulation and the Markov chain approximation.

[Figure: $P(T_A \leq 65 \mid \tau = \infty)$ (y-axis, 0 to 0.8, with the level 0.1 marked) against $g$ (x-axis, 1 to 8); curves: Monte Carlo and Markov chain.]

The Markov chain approximation is 5.0 times faster than Monte Carlo based on 1000 samples.
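The Monte Carlo side of this comparison can be sketched in base R by simulating in-control series, running the recursive CUSUM and recording whether it alarms within the monitoring period. The seasonal mean, dispersion and thresholds are hypothetical; nsim = 1000 matches the number of samples mentioned above, and mc_false_alarm is a hypothetical name.

```r
# Monte Carlo estimate of P(T_A <= n | tau = infinity) for the NegBin CUSUM,
# where n = length(mu0) is the length of the monitoring period.
mc_false_alarm <- function(mu0, kappa, alpha, g, nsim = 1000) {
  k <- 1 / alpha
  alarm <- logical(nsim)
  for (b in seq_len(nsim)) {
    y <- rnbinom(length(mu0), size = k, mu = mu0)   # in-control data
    llr <- dnbinom(y, size = k, mu = kappa * mu0, log = TRUE) -
      dnbinom(y, size = k, mu = mu0, log = TRUE)
    r <- 0
    for (s in seq_along(y)) {
      r <- max(0, r + llr[s])
      if (r > g) { alarm[b] <- TRUE; break }
    }
  }
  mean(alarm)
}

# Hypothetical seasonal in-control mean over a 65-week monitoring period
mu0 <- 300 + 50 * cos(2 * pi * (1:65) / 52)
set.seed(2010)
sapply(c(2, 4.75, 8), function(g) mc_false_alarm(mu0, kappa = 1.2, alpha = 0.02, g = g))
```

The Monte Carlo error shrinks only like $1/\sqrt{\text{nsim}}$, which is one reason a deterministic Markov chain approximation is attractive when scanning a grid of thresholds.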


Comparison with the Farrington algorithm

True model: the fitted negative binomial model with mean $\mu_{0,t}$ and dispersion $\alpha_t$, matching the quasi-Poisson model.
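Assuming that "matching" means equal variances, the time-varying dispersion follows from equating the quasi-Poisson variance $\phi\,\mu_{0,t}$ used by the Farrington algorithm (overdispersion parameter $\phi$, a symbol introduced here) with the negative binomial variance:

```latex
\mu_{0,t} + \alpha_t \, \mu_{0,t}^2 = \phi \, \mu_{0,t}
\quad\Longrightarrow\quad
\alpha_t = \frac{\phi - 1}{\mu_{0,t}}, \qquad \phi > 1.
```

A constant $\phi$ therefore implies a dispersion $\alpha_t$ that varies with the mean.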

Based on 1000 realizations of $I(T_A \leq 65 \mid \tau = \infty)$ for the Farrington et al. (1996) algorithm with $\tfrac{2}{3}$-power transform, $b = 5$, $w = 4$ and $\alpha = 0.001$, we obtain

$$P(T_A \leq 65 \mid \tau = \infty) \approx 0.19.$$

A rough estimate of this number would have been

$$1 - \left(1 - \frac{\alpha}{2}\right)^{65} \approx 0.03.$$
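The rough estimate treats the 65 weekly decisions as independent, each with one-sided false-alarm probability $\alpha/2$. A quick arithmetic check in R:

```r
alpha <- 0.001
n <- 65
# Probability of at least one alarm in n independent weeks, each with
# one-sided exceedance probability alpha / 2
p_rough <- 1 - (1 - alpha / 2)^n
round(p_rough, 3)   # 0.032
```

The gap to the simulated 0.19 shows how far the actual per-week false-alarm rate can be from the nominal one.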

Note: using farrington without reweighting and always including a trend, we obtain the Monte Carlo estimate 0.04.


Discussion (1)

surveillance offers visualization, modeling and monitoring of count data and categorical time series.

Combined with Sweave/odfWeave, the package can be used for automatic report generation using LaTeX/OpenOffice.

A starting point to learn more about the package is Höhle and Mazick (2010) or the short course slides available from the R-Forge page.

The current package version is 1.1-6.


Discussion (2)

Current methodological work:

CUSUM changepoint detection in the binomial setting $y_t \sim \mathrm{Bin}(n_t, \pi_t)$ and the multinomial setting $y_t \sim \mathrm{M}_k(n_t, \boldsymbol{\pi}_t)$. The proportions are modeled by logistic, proportional odds and multinomial logistic regression models (Höhle, 2010).

Model-based space-time cluster detection based on the additive-multiplicative intensity model in Höhle (2009).

EuroMOMO work:

Extend to more demographics-oriented modelling of two-dimensional count tables indexed by time and age (Eilers et al., 2008):

$$\log(\mu_{ta}) = \log(\mathrm{pop}_{ta}) + v_{ta} + f_{ta}\cos(\omega t) + g_{ta}\sin(\omega t).$$


Acknowledgements

Persons:

Michaela Paul, Andrea Riebler and Leonhard Held, Institute of Social and Preventive Medicine, University of Zurich, Switzerland

Valentin Wimmer, Ludwig-Maximilians-Universität München, Germany, and Mathias Hofmann, Technical University of Munich, Germany

Thais Correa, Department of Statistics, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil

Financial Support:

German Science Foundation (DFG, 2003-2006)

Munich Center of Health Sciences (2007-2010)


Literature I

Assunção, R., Correa, T., 2009. Surveillance to detect emerging space-time clusters. Computational Statistics & Data Analysis 53 (8), 2817–2830.

Bissell, A. F., 1984. The Performance of Control Charts and Cusums Under Linear Trend. Applied Statistics 33 (2), 145–151.

Eilers, P. H. C., Gampe, J., Marx, B. D., Rau, R., 2008. Modulation models for seasonal time series and incidence tables. Statistics in Medicine 27, 3430–3441.

Farrington, C., Andrews, N., Beale, A., Catchpole, M., 1996. A statistical algorithm for the early detection of outbreaks of infectious disease. Journal of the Royal Statistical Society, Series A 159, 547–563.

Hawkins, D. M., Olwell, D. H., 1998. Cumulative Sum Charts and Charting for Quality Improvement. Statistics for Engineering and Physical Science. Springer.

Held, L., Hofmann, M., Höhle, M., Schmid, V., 2006. A two component model for counts of infectious diseases. Biostatistics 7, 422–437.

Held, L., Höhle, M., Hofmann, M., 2005. A statistical framework for the analysis of multivariate infectious disease surveillance data. Statistical Modelling 5, 187–199.

Höhle, M., 2009. Additive-multiplicative regression models for spatio-temporal epidemics. Biometrical Journal 51 (6), 961–978.


Höhle, M., 2010. Changepoint detection in categorical time series. In: Kneib, T., Tutz, G. (Eds.), Statistical Modelling and Regression Structures – Festschrift in Honour of Ludwig Fahrmeir. Springer, pp. 377–397.

Höhle, M., Mazick, A., 2010. Aberration detection in R illustrated by Danish mortality monitoring. In: Kass-Hout, T., Zhang, X. (Eds.), Biosurveillance: A Health Protection Priority. CRC Press, to appear.

Höhle, M., Paul, M., 2008. Count data regression charts for the monitoring of surveillance time series. Computational Statistics & Data Analysis 52 (9), 4357–4368.

Paul, M., Held, L., Toschke, A. M., 2008. Multivariate modelling of infectious disease surveillance data. Statistics in Medicine 27, 6250–6267.

Rogerson, P., Yamada, I., 2004. Approaches to syndromic surveillance when data consist of small regional counts. Morbidity and Mortality Weekly Report 53, 79–85.

Rossi, G., Lampugnani, L., Marchi, M., 1999. An approximate CUSUM procedure for surveillance of health events. Statistics in Medicine 18, 2111–2122.
