Bayesian Nonparametric Modelling of Recurrent Event Processes. Maria De Iorio, Yale-NUS College; Dept of Statistical Science, University College London. Collaborators: Alessandra Guglielmi (Polimi, Milano, Italy), Marta Tallarita (UCL, London, UK), Giorgio Paulon (UT, Austin, US), Francesca Ieva (Polimi, Milano, Italy) and James Malone-Lee (UCL, London, UK). 21st September 2018. Bayesian Computation for High-Dimensional Statistical Models.


Bayesian Nonparametric Modelling of Recurrent Event Processes

Maria De Iorio

Yale-NUS College

Dept of Statistical Science, University College London

Collaborators: Alessandra Guglielmi (Polimi, Milano, Italy), Marta Tallarita (UCL, London, UK), Giorgio Paulon (UT, Austin, US),

Francesca Ieva (Polimi, Milano, Italy) and James Malone-Lee (UCL, London, UK)

21st September 2018

Bayesian Computation for High-Dimensional Statistical Models

Outline

• Models for recurrent events

• AR(p) models for Gap Times

• Choosing order p

– Urinary Tract Infection

– Readmission Data

• Semiparametric Model for recurrent events with termination

– Administrative Data Application: HF

• Extensions and Conclusions

Recurrent Events

• events repeat over time

• large number of individual processes, small number of recurrent events (recurrent infections, asthma attacks, blood donations)

Example: UTI

The best clinical marker of UTI is pyuria.

Recurrent events for two patients: the last waiting time of the patient on the left is observed, while that of the patient on the right is censored. Red circles denote zero WBC at the visit, while green circles denote WBC greater than 0.

Modelling Recurrent Events (Cook and Lawless, 2007)

For a single process:

• event count process

• gap/waiting times between successive events

Modelling Event Counts

• event counts $\{N(t), t \ge 0\}$, where $N(t)$ = number of events in $[0, t]$.

Intensity function:

$$\lambda(t \mid H(t)) = \lim_{\Delta t \downarrow 0} \frac{P\big(N(t + \Delta t^{-}) - N(t^{-}) = 1 \mid H(t)\big)}{\Delta t}$$

$H(t) = \{N(s), 0 < s < t\}$: history of the process at time $t$

$\lambda(t \mid H(t))$: instantaneous probability of an event occurring at $t$, conditional on the process history

• useful when individuals frequently experience the events of interest, and the events are incidental (occurrence does not materially alter the process itself)

Modelling Gap Times

• gap/waiting times between successive events $W_1, W_2, W_3, \ldots$

$$W_j = T_j - T_{j-1}$$

useful:

– when events are relatively infrequent

– when some type of individual renewal occurs after an event

– when prediction of the next time is of interest

Interpretation: $W_1, W_2, W_3, \ldots$ can be interpreted as observations on a time series at equally spaced time points $j = 1, 2, 3, \ldots$

Ex: recurrent infections (when an individual returns to a similar state after the infection has been cleared), recurrent hospitalizations

Models for Gap Times based on generalized regressions

For a single process/individual:

$$L(W_1, W_2, \ldots, W_n) = L(W_1) \times L(W_2 \mid W_1) \times L(W_3 \mid W_1, W_2) \times \cdots \times L(W_n \mid W_1, \ldots, W_{n-1})$$

Model $L(W_j \mid W_1, \ldots, W_{j-1})$ for all $j$:

• AFT models, e.g. $\log W_j$ is Gaussian distributed, with mean depending on covariates, previous gap times, individual random effect

• PH models, where the hazards of $W_j$ depend on covariates, previous gap times, individual random effect

Models for Gap Times

$t = 0$: start of all event processes (e.g. first infection);

$[0, \tau_i]$: time window

$n_i - 1$ observed events

$$0 := t_{i0} < t_{i1} < t_{i2} < \ldots < t_{i,n_i-1}, \qquad w_{ij} := t_{ij} - t_{i,j-1}$$

$$w_{i n_i} := \tau_i - t_{i,n_i-1}$$

The $n_i$-th event is censored.

Number of events per individual can be different.
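The notation above can be sketched in a few lines of Python. This is only an illustration: the function name `gap_times` and its arguments are hypothetical, and it assumes event times are measured from the common origin $t = 0$ on the same scale as the window end $\tau_i$.

```python
def gap_times(event_times, tau):
    """Return (gaps, observed) for one individual.

    event_times: increasing observed event times t_i1 < ... in (0, tau].
    tau: end of the observation window [0, tau].
    The last gap runs from the last observed event to tau and is censored.
    """
    times = [0.0] + list(event_times)
    increasing = all(a < b for a, b in zip(times, times[1:]))
    if not increasing or times[-1] > tau:
        raise ValueError("event times must be increasing and lie in (0, tau]")
    # Observed gaps w_ij = t_ij - t_i,j-1
    gaps = [b - a for a, b in zip(times, times[1:])]
    observed = [True] * len(gaps)
    # Censored final gap: tau minus the last observed event time
    gaps.append(tau - times[-1])
    observed.append(False)
    return gaps, observed


# Three observed events (n_i - 1 = 3), hence n_i = 4 gap times.
gaps, observed = gap_times([2.0, 5.0, 9.0], tau=12.0)
```

Individuals with different numbers of events simply yield lists of different lengths.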

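Likelihood for m individuals

$$L = \prod_{i=1}^{m} \left[ \prod_{j=1}^{n_i - 1} f_j(w_{ij} \mid z_{ij}) \right] S_{n_i}(w_{i n_i} \mid z_{i n_i})$$

where $z_{ij} = (x_{ij}, w_{i1}, \ldots, w_{i,j-1})$, $f_j$ is the conditional density of the $j$-th gap time and $S_{n_i}$ is the survival function of the censored last gap time.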
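As an illustration of this likelihood, the sketch below evaluates its logarithm under one specific choice that is an assumption here, not part of the slide: each log gap time is Gaussian given its past (a log-normal AFT model), with a supplied conditional mean and a common standard deviation. The names `loglik_individual`, `loglik`, `means` and `sd` are hypothetical.

```python
import math


def normal_logpdf(x, mean, sd):
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def normal_logsf(x, mean, sd):
    # log(1 - Phi(z)) via the complementary error function
    z = (x - mean) / sd
    return math.log(0.5 * math.erfc(z / math.sqrt(2.0)))


def loglik_individual(gaps, means, sd):
    """Log-likelihood of one individual; gaps[-1] is right-censored.

    means[j] stands for the conditional mean of log(gap j) given z_ij
    (assumed to be computed elsewhere from covariates and past gaps).
    """
    ll = 0.0
    for w, mu in zip(gaps[:-1], means[:-1]):
        # density of W: normal density of log(w) times the Jacobian 1/w
        ll += normal_logpdf(math.log(w), mu, sd) - math.log(w)
    # survival contribution of the censored last gap
    ll += normal_logsf(math.log(gaps[-1]), means[-1], sd)
    return ll


def loglik(all_gaps, all_means, sd):
    # individuals are conditionally independent, so contributions add up
    return sum(loglik_individual(g, m, sd) for g, m in zip(all_gaps, all_means))


example = loglik([[1.0, 1.0]], [[0.0, 0.0]], sd=1.0)
```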

The model - likelihood

$Y_{ij} := \log(W_{ij})$, $x_{ij}$ covariate vector at recurrence $j$, $j = 1, \ldots, n_i$, $i = 1, \ldots, n$

$$Y_{ij} \mid x_{ij}, \boldsymbol{\beta}_j, \boldsymbol{\alpha}_i, \sigma^2 \sim N(x_{ij}^T \boldsymbol{\beta}_j + \alpha_{ij}, \sigma^2)$$

The likelihood can be rewritten by considering the joint distribution of $Y_i = (Y_{i1}, \ldots, Y_{i n_i})$

$$Y_i \mid \boldsymbol{\alpha}_i, \boldsymbol{\beta}_1, \ldots, \boldsymbol{\beta}_J, x_i, \sigma^2 \overset{ind.}{\sim} N_{n_i}(m_i, \Sigma_i)$$

• Conditional independence between different patients

• $W_{ij}$ has generalized AFT distribution, i.e. $\log W_{ij} = z_{ij}\beta + \sigma\varepsilon_j$, where $z_{ij} = (x_{ij}, 1)$

• frailty term: $\alpha_{ij}$ depends on the recurrence occasion $j$, but also on the item $i$

• the regression parameters $\boldsymbol{\beta}_j$, expressing the effect of covariates $x_{ij}$, depend on the recurrence occasion $j$

Nonparametric AR(p) Frailty Model

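Autoregressive parameters $\boldsymbol{\alpha}_i = (\alpha_{i1}, \ldots, \alpha_{i n_i})$

$$\alpha_{ij} \mid m_{i0}, m_{i1}, \ldots, m_{ip}, \tau \overset{ind.}{\sim} N\Big(m_{i0} + \sum_{l=1}^{p} m_{il} Y_{i,j-l}, \; \tau^2\Big), \qquad j = p + 1, \ldots, n_i + 1$$

$$(m_{i0}, m_{i1}, \ldots, m_{ip}) \mid G \overset{iid}{\sim} G$$

$$G \sim DP(M, G_0) \quad \text{(nonparametric prior)}$$

The distribution of $\alpha_{ij}$ for $j \le p$ depends only on the available past observations, as in any AR(p) model.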
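To make the autoregressive structure concrete, here is a minimal simulation sketch for a single individual. It is not the authors' implementation: the coefficients $(m_{i0}, \ldots, m_{ip})$ are taken as fixed rather than drawn from $G$, the first $p$ log gap times are supplied as starting values, and the names `ar_mean`, `simulate_log_gaps` and `xb` (standing for $x_{ij}^T \beta_j$) are hypothetical.

```python
import random


def ar_mean(m, past_y):
    """Mean of alpha_ij: m0 + sum_l m_l * Y_{i,j-l}.

    m = (m0, m1, ..., mp); past_y = [Y_{j-1}, ..., Y_{j-p}], most recent first.
    """
    return m[0] + sum(ml * y for ml, y in zip(m[1:], past_y))


def simulate_log_gaps(m, xb, tau, sigma, y_init, rng):
    """Simulate log gap times after the p starting values in y_init.

    xb: regression terms x_ij^T beta_j for the occasions to simulate.
    tau, sigma: standard deviations of the frailty and of the observation.
    """
    p = len(m) - 1
    y = list(y_init)  # oldest first
    for mean_j in xb:
        past = y[::-1][:p]  # the p most recent values, most recent first
        alpha = rng.gauss(ar_mean(m, past), tau)   # frailty alpha_ij
        y.append(rng.gauss(mean_j + alpha, sigma))  # Y_ij
    return y[p:]


rng = random.Random(1)
path = simulate_log_gaps((0.0, 0.5), [0.0] * 5, tau=0.3, sigma=0.2,
                         y_init=[2.0], rng=rng)
```

Setting both standard deviations to zero recovers the deterministic AR(1) recursion, which is a convenient sanity check.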

Why Bayesian Nonparametrics

• model flexibility and robustness, as parametric models often impose unrealistic constraints.

• the models we consider are in fact richly parametric (formally, using an infinite-dimensional parameter space) rather than nonparametric.

• Bayesian nonparametric models involve placing prior distributions on broad families of probability distributions; e.g. Mixtures of Polya trees (MPT) and Dirichlet Process mixtures (DPM).

– nice theoretical properties

– ease of computation

– ease of interpretability

– extensions to more complex setups

– accommodate heterogeneity in the population, over-dispersion of the observations, outliers

– automatic clustering of individuals/identification of subgroups

Background

• Bayesian Nonparametrics (BNP) deals with infinite-dimensional parameter spaces

• Traditional prior distributions become stochastic process priors

• Focus on stochastic processes on discrete probability measures

$$G = \sum_{h=1}^{\infty} w_h \delta_{\theta_h}$$

Assign a flexible prior to $G$, e.g. DP, PT, Pitman-Yor, . . .

Dirichlet Process (DP)

Probability model on distributions:

G ∼ DP(α,G0)

• Two parameters:

- $G_0$: distribution on $(\Theta, \mathcal{A})$

- $\alpha$: precision parameter in $\mathbb{R}^+$

• $G$ is a.s. discrete

[Figure: two panels, $\alpha = 1$ and $\alpha = 10$; horizontal axis from −3 to 3, vertical axis from 0 to 1.]

Sethuraman’s stick breaking representation

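$$G = \sum_{h=1}^{\infty} w_h \delta_{\theta_h}$$

$$\xi_h \sim \mathrm{Beta}(1, \alpha)$$

$$w_h = \xi_h \prod_{i=1}^{h-1} (1 - \xi_i) \quad \text{(scaled Beta distribution)}$$

$$\theta_h \overset{iid}{\sim} G_0, \qquad h = 1, 2, \ldots$$

where $\delta_x$ denotes a point mass at $x$, and $w_h$ are the weights of the point masses at locations $\theta_h$.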
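The stick-breaking construction translates almost line by line into code. The sketch below draws a truncated approximation of $G$; the truncation level and the names `stick_breaking` and `base_sampler` are choices made here for illustration, and the standard normal base measure in the usage line is an arbitrary example rather than the $G_0$ used in the talk.

```python
import random


def stick_breaking(alpha, base_sampler, truncation, rng):
    """Draw the first `truncation` weights and atoms of G ~ DP(alpha, G0).

    Returns (weights, atoms, leftover), where leftover is the stick mass
    not yet assigned after truncation.
    """
    weights, atoms = [], []
    remaining = 1.0  # length of the stick still to be broken
    for _ in range(truncation):
        xi = rng.betavariate(1.0, alpha)     # xi_h ~ Beta(1, alpha)
        weights.append(xi * remaining)       # w_h = xi_h * prod_{i<h} (1 - xi_i)
        atoms.append(base_sampler(rng))      # theta_h ~ G0
        remaining *= 1.0 - xi
    return weights, atoms, remaining


rng = random.Random(0)
weights, atoms, leftover = stick_breaking(
    alpha=1.0, base_sampler=lambda r: r.gauss(0.0, 1.0), truncation=50, rng=rng
)
```

Smaller values of `alpha` put most of the mass on the first few atoms, while larger values spread it over many atoms.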

Dirichlet Process Mixtures (DPM)

In many data analysis applications the discreteness is inappropriate.

To remove discreteness: convolution with a continuous kernel

$$f(y \mid \theta) = p(y \mid \theta)$$

$$\theta \mid G \sim G$$

$$G \sim DP(\alpha, G_0)$$

$\Longrightarrow$ sampling model is

$$f(y) = \int p(y \mid \theta)\, dG(\theta), \qquad G \sim DP(\alpha, G_0)$$

Dirichlet Process Mixtures (DPM)

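. . . or with latent variables $\theta_i$

$$G \sim DP(\alpha, G_0)$$

$$\theta_i \sim G$$

$$f(y_i) = p(y_i \mid \theta_i)$$

Nice feature: the mixing measure $G$ is discrete with probability one, so with small $\alpha$ the mixture is, with high probability, dominated by a finite number of components.

Often $p(y \mid \theta) = N(\beta, \sigma^2) \longrightarrow f(y) = \sum_{h=1}^{\infty} w_h N(\beta_h, \sigma^2)$, where $w_h$ are the weights of $G$.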

Comment

Under $G$: $p(\theta_i = \theta_{i'}) > 0$

Observations share the same $\theta$ ⇒ belong to the same cluster

⇒ DPM induces a random partition of the observations $\{1, \ldots, n\}$

Note: in this case the clustering of observations depends only on the distribution of $y$.
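The random partition induced by ties among the $\theta_i$ can be simulated directly with the Pólya urn (Chinese restaurant) scheme, a standard way to draw from the DP prior on partitions. The sketch below is an illustration only; the function name `polya_urn_labels` is hypothetical, and it samples the prior partition, not the posterior clustering given data.

```python
import random


def polya_urn_labels(n, alpha, rng):
    """Sample cluster labels for n items under a DP with precision alpha.

    Item i (0-based) joins existing cluster c with probability
    sizes[c] / (i + alpha) and opens a new cluster with probability
    alpha / (i + alpha).
    """
    labels, sizes = [], []
    for i in range(n):
        u = rng.random() * (i + alpha)
        acc = 0.0
        chosen = len(sizes)  # default: open a new cluster
        for c, size in enumerate(sizes):
            acc += size
            if u < acc:
                chosen = c
                break
        if chosen == len(sizes):
            sizes.append(0)
        sizes[chosen] += 1
        labels.append(chosen)
    return labels, sizes


labels, sizes = polya_urn_labels(n=100, alpha=1.0, rng=random.Random(42))
```

Items with the same label share the same $\theta$, so `sizes` gives the cluster sizes of one draw from the prior partition.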

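Going back to the Recurrent Events model...

Gap times $Y_{ij} := \log(W_{ij})$, $j = 1, \ldots, n_i$, $i = 1, \ldots, n$

$$Y_{ij} \mid x_{ij}, \boldsymbol{\beta}_j, \boldsymbol{\alpha}_i, \sigma^2 \sim N(x_{ij}^T \boldsymbol{\beta}_j + \alpha_{ij}, \sigma^2)$$

Autoregressive parameters $\boldsymbol{\alpha}_i = (\alpha_{i1}, \ldots, \alpha_{i n_i})$

$$\alpha_{ij} \mid m_{i0}, m_{i1}, \ldots, m_{ip}, \tau \overset{ind.}{\sim} N\Big(m_{i0} + \sum_{l=1}^{p} m_{il} Y_{i,j-l}, \; \tau^2\Big)$$

Random effect distribution:

$$(m_{i0}, m_{i1}, \ldots, m_{ip}) \mid G \overset{iid}{\sim} G$$

$$G \sim DP(M, G_0)$$

The model for $\boldsymbol{\alpha}_i$ is a DPM $\Longrightarrow$ clustering based on $m_i$.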

Hyperpriors

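$$\boldsymbol{\beta}_j \overset{iid}{\sim} N_q(0, \beta_0^2 I_q)$$

$$\sigma^2 \sim \text{Inv-Gamma}(a_\sigma, b_\sigma)$$

$$\tau^2 \sim \text{Inv-Gamma}(a_\tau, b_\tau)$$

$$M \sim U(0, M_0)$$

$$G_0 = N(0, \sigma_g^2) \times \underbrace{TBeta(a_Z, b_Z) \times \cdots \times TBeta(a_Z, b_Z)}_{p \text{ times}}$$

assuming a priori independence among the different parameters.

$TBeta(a_Z, b_Z)$ denotes the translated Beta distribution defined on the interval $(-1, 1)$, with density proportional to $(y + 1)^{a_Z - 1} (1 - y)^{b_Z - 1} \mathbb{1}_{(-1,1)}(y)$.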
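The translated Beta distribution is simply a Beta variable mapped from $(0, 1)$ to $(-1, 1)$ by $y = 2x - 1$. A small sketch, with hypothetical function names `tbeta_pdf` and `tbeta_sample`; the normalising constant $2^{a+b-1} B(a, b)$ follows from that change of variable and is worked out here, not quoted from the slide.

```python
import math
import random


def tbeta_pdf(y, a, b):
    """Density of TBeta(a, b) on (-1, 1).

    Kernel (y + 1)^(a-1) * (1 - y)^(b-1), normalised by 2^(a+b-1) * B(a, b).
    """
    if not -1.0 < y < 1.0:
        return 0.0
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    log_kernel = (a - 1.0) * math.log(y + 1.0) + (b - 1.0) * math.log(1.0 - y)
    return math.exp(log_kernel - (a + b - 1.0) * math.log(2.0) - log_beta)


def tbeta_sample(a, b, rng):
    # If X ~ Beta(a, b) on (0, 1), then 2X - 1 has the density above.
    return 2.0 * rng.betavariate(a, b) - 1.0


rng = random.Random(3)
draws = [tbeta_sample(2.0, 2.0, rng) for _ in range(1000)]
```

Restricting the autoregressive coefficients to $(-1, 1)$ in this way mirrors the usual range for AR(1) stationarity.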

Testing for the Order of Dependence

The order of dependence on past observations, $p$, is often unknown in applications and needs to be estimated.

Two approaches:

• Include in the base measure of the DP a spike and slab distribution on the autoregressive coefficients, leading to the Spiked Dirichlet process prior introduced by Kim et al. (2009)

• Specify a prior directly on $p$ and then, conditional on $p$, specify the prior distribution for the remaining parameters. The dimension of the parameter vector $(m_{i0}, m_{i1}, \ldots, m_{ip})$ changes according to $p$, and consequently so does the dimension of the space where the Dirichlet process measure is defined (Quintana & Muller (2012)).

Example: UTI

• 306 female patients

• 28% censoring

• 975 gap times

• include for each patient a 5-dimensional vector of time-varying covariates: the standardized age of the patient and symptoms (urgency, pain, stress incontinence and voiding symptoms).

Posterior Inference on p

Predictive marginal distributions of $m_{i0}$, $m_{i1}$, $m_{i2}$ and $m_{i3}$ obtained with spike and slab variable selection. No dependence on previous gap times.

Specifying a prior directly on $p$ leads to a posterior distribution with mode at 0.

Example: Hospital Readmissions

• We consider the readmission dataset in the R package frailtypack.

• The dataset contains rehospitalization times (in days) after surgery in patients diagnosed with colorectal cancer.

• The origin of the time axis is the date of the surgical procedure.

• We consider data on N = 197 patients, for a total of 495 recurrent events.

• 60% censoring

• We include covariates:

– chemo: variable indicating if the patient received chemotherapy

– sex: gender of the patient

– dukes: classification of the colorectal cancer, ordinal (A-B, C, D)

– charlson: Charlson comorbidity index, ordinal (0, 1-2, 3), time-varying

Posterior Inference on p

• Predictive marginal distributions of $m_{i0}$, $m_{i1}$, $m_{i2}$ and $m_{i3}$ obtained with spike and slab variable selection.

• Note that the spike and slab prior allows clusters to have different time dependence.

• Specifying a prior directly on $p$ leads to a posterior distribution with mode at 2.

Recurrent Events with Termination

• events repeat over time

• large number of individual processes, small number of recurrent events

• a recurrent event process may be terminated by another event (e.g. death), which acts as an absorbing state

• the terminating event may be observed or censored

Heart failure (HF)

• Heart failure is a chronic (progressive) condition in which the heart muscle is unable to pump enough blood to meet the body’s needs for blood and oxygen

• Symptoms: shortness of breath (dyspnea), cardiac asthma, stasis in the systemic circulation or in the portal circulation, edema, cyanosis, hypertrophy of the heart and unusual fatigue, lack of appetite or nausea, impaired thinking, increased heart rate

• Causes: coronary artery disease including a previous myocardial infarction (heart attack), high blood pressure, atrial fibrillation, valvular heart disease, excess alcohol use, infection, and cardiomyopathy of an unknown cause

• Diagnosis: based on the history of the symptoms and a physical examination, with confirmation by echocardiography and other tests.

• Treatment: e.g. pace-maker, drugs

HF Burden

• HF is one of the main causes of morbidity, hospitalization and death in the western world, and the related economic burden is substantial and expected to increase.

• HF prevalence increases significantly with age

• Individuals are often admitted to hospitals and outpatient care services.

• Multiple readmissions are burdensome to the patient and the healthcare system.

• For example, it costs about 30 billion dollars each year in the US (more than 6 million people in the US are affected); about 80% of those costs are due to hospitalization (http://www.heart.org/)

• The average cost of an HF-related event in Lombardia (the most populous Italian region) is around 6000 Euro

HF data project

The Italian Ministry of Health funded a project proposed by the management of the healthcare system in Lombardia

Participants: Politecnico di Milano (Dept of Management, Dept of Mathematics), Regione Lombardia, hospitals (Niguarda, San Carlo, etc)

Aim: to build a large and reliable database on patients hospitalized for HF, which should link data on hospitalizations, outpatient service utilization, and drug prescriptions and could be used for epidemiological purposes, cost analysis, risk prediction, and quality of care evaluation

In Italy there is a universalistic health care system: the regional offices serve as regulators and payers. Reimbursement to healthcare providers is based on a protocol.

The database was originally intended to help policy makers better manage the burden of HF on the healthcare costs of Lombardia.

Administrative data

Administrative data are complete, systematic, continuous in time and free

They are usually characterised by large numbers and provide an invaluable source to study the prevalence and incidence of major diseases.

Disadvantages of the use of administrative data: lack of accuracy, difficulties in linkage, merging, different coding

In our application no detailed information on relevant clinical markers is available.

These limitations are particularly relevant when studying conditions/diseases characterized by transition from chronic to acute phases and back, as for HF

Need for flexible and robust models: BNP models!

Data

Recurrence process: hospitalizations for HF of the subject

Termination: death of the subject

• identification of hospitalizations for HF (for all adults resident in Lombardia) from all hospitals in Lombardia, hospitalized for the first time from 01/01/2006 to 31/12/2012

• in-hospital deaths collected from the hospitals; out-of-hospital deaths collected from the vital statistics regional dataset

• huge dataset, more than 217 thousand subjects

• N = 810 patients (with at least 1 occurrence in addition to the first, which is the origin of time); 44% have right-censored survival times

Aim

• develop a joint model for waiting times between hospitalizations and survival within a BNP framework for data from HF in Regione Lombardia;

• clustering of subjects according to their history of recurrent events and termination;

• assessment of the relationship between event occurrence, survival and covariates; particularly important because administrative data may be inaccurate!

• time-to-event is the main clinical outcome of interest, but also the dynamics of the hospitalization process itself; controlling/reducing the costs of hospitalizations is important;

• model the relationship between survival times and recurrence of events; several re-hospitalizations are often related to risk of death and are likely to affect it;

• a better understanding of how recurrences affect survival may lead to a more effective planning of healthcare resources.

Joint Model for gap times and termination

$$0 =: T_{i0} < \underbrace{T_{i1} < T_{i2} < \cdots < T_{i n_i}}_{\text{Recurrent Events}} < \underbrace{T_i}_{\text{Time to Event}} \le \zeta_i \; (= 31/06/2012)$$

Gap Times: $W_{ij} = T_{ij} - T_{i,j-1}$, $j = 1, \ldots, n_i$

Last gap time $W_{i,n_i+1}$ censored by $\zeta_i$

Joint Model: $W_{i1}, \ldots, W_{i n_i}, W_{i,n_i+1}, T_i$

Covariates: $x_{ij}$ fixed-time and time-varying covariates influencing $W_{ij}$; $z_i$ fixed predictors of $T_i$

dependent censoring of gap times by termination: semi-competing risks model

Idea: common frailty parameter shared by $T_i$ and $W_{i1}, \ldots, W_{i n_i}$ as in Huang and Liu (2007), and the prior is nonparametric (Brown and Ibrahim, 2003, BNP model for a longitudinal outcome used as predictor for the time-to-event)

A BNP semi-competing risks model: likelihood + BNP prior component

Model:

$$\log W_{ij} \mid \boldsymbol{\beta}_j, x_{ij}, \alpha_i, \sigma_i \overset{ind}{\sim} N(x_{ij}\boldsymbol{\beta}_j + \alpha_i, \sigma_i^2), \qquad j = 1, \ldots, n_i + 1;$$

$$\log T_i \mid z_i, \boldsymbol{\gamma}, \alpha_i, \psi, \eta_i^2 \overset{ind}{\sim} N(z_i\boldsymbol{\gamma} + \psi\alpha_i, \eta_i^2); \qquad i = 1, \ldots, N$$

$$(\alpha_i, \sigma_i^2, \eta_i^2) \mid G \sim G, \qquad G \sim DP(M, G_0)$$

$x_{ij}$: $p$ fixed-time and $q$ time-varying covariates

$z_i$: $r$ fixed-time covariates

• the subject-specific random effect $\alpha_i$ takes into account the dependent censoring of gap times by termination and the correlation between different gap times of the same subject $i$

• $\alpha_i$ allows the clustering to depend on both gap time trajectories and survival outcome
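A short worked calculation shows how the shared frailty ties the two outcomes together. It is a sketch under assumptions added here: we condition on $G$ and on the regression parameters, treat the covariates as fixed, assume $\alpha_i$ has finite variance under $G$, and write the Gaussian errors explicitly as independent standard normals $\varepsilon_{ij}$ and $\varepsilon_i^{*}$ (notation introduced only for this illustration).

```latex
\begin{align*}
\log W_{ij} &= x_{ij}\boldsymbol{\beta}_j + \alpha_i + \sigma_i \varepsilon_{ij},
\qquad
\log T_i = z_i\boldsymbol{\gamma} + \psi\alpha_i + \eta_i \varepsilon_i^{*}, \\
\operatorname{Cov}(\log W_{ij}, \log W_{ij'} \mid G) &= \operatorname{Var}_G(\alpha_i),
\qquad j \neq j', \\
\operatorname{Cov}(\log W_{ij}, \log T_i \mid G) &= \psi \operatorname{Var}_G(\alpha_i).
\end{align*}
```

The cross terms vanish because the errors have mean zero and are independent of $(\alpha_i, \sigma_i^2, \eta_i^2)$. The sign of $\psi$ therefore determines whether longer gap times go with longer or shorter survival, and $\psi = 0$ would decouple the two processes.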

Parametric Prior Component

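Prior independence among parameters $\boldsymbol{\beta}_0$, $(\boldsymbol{\beta}_1, \ldots, \boldsymbol{\beta}_J)$, $\boldsymbol{\gamma}$, $\psi$, $\{(\alpha_i, \sigma_i^2, \eta_i^2)\}_i$

$$\boldsymbol{\beta}_0 \sim N_p(0, \beta_0^2 I_p)$$

$$\boldsymbol{\beta}_1, \ldots, \boldsymbol{\beta}_J \mid \boldsymbol{\mu} := (\mu_1, \ldots, \mu_q)^T, \; \Sigma := \mathrm{diag}(\tau_1^2, \ldots, \tau_q^2) \overset{iid}{\sim} N_q(\boldsymbol{\mu}, \Sigma)$$

$$\mu_1, \ldots, \mu_q \overset{iid}{\sim} N(0, \sigma_\mu^2), \qquad \tau_1^2, \ldots, \tau_q^2 \overset{iid}{\sim} \text{Inv-Gamma}(a_\tau, b_\tau)$$

$$\boldsymbol{\gamma} \sim N_r(0, \gamma_0^2 I_r)$$

$$\psi \sim N(0, \psi_0^2)$$

$$G_0 = N(0, \alpha_0^2) \times \text{Inv-Gamma}(a_\sigma, b_\sigma) \times \text{Inv-Gamma}(a_\eta, b_\eta)$$

$$M \sim U(a_M, b_M).$$

Several variations, e.g. $\mathrm{Var}(Y_{ij} \mid \text{par}) = \sigma_{ij}$ or $\eta_i^2 = \eta^2 \sim$ parametric marginal prior, different distributional assumptions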

Application to HF dataset

N = 810 patients (with at least 1 occurrence in addition to the first, which is the origin of time); 44% have right-censored survival times.

Fixed Time Covariates

measured at entry into the study

• gender of the patient (= 1 if female): the proportion of women for each $j$ is ≈ 50%

• age [years] of the patient at each hospitalization; empirical means:

Women   Men     TOT
78.44   73.14   75.77

• group: clinical classification of the patient according to worldwide indicators (Mazzali et al., 2016)

G1: patient having HF as the cause of admission or complicating another cardiac disease (standard HF)

G2: patient with myocardial or cardiopulmonary diseases

G3: patient with acute HF as a complication of other diseases or for whom HF is reported as comorbidity + G4 (3 subjects only).

⇒ group: binary covariate, = 0 if G1 (560 patients), and 1 otherwise (250).

Time-varying covariates

• rehab: binary, = 1 if any time during the hospitalization is spent in a rehabilitation unit; 11.78% of the hospitalizations, corresponding to 29.01% of the patients.

• ic: binary, = 1 if at least part of the hospitalization is spent in an intensive care unit; 11.95% of the hospitalizations, corresponding to 31.48% of the patients.

• n com: total number of comorbidities (e.g. dementia, weight loss, alcohol abuse, tumor, arrhythmia, hypertension, pulmonary disease, diabetes, psychosis, HIV/AIDS) for each hospitalization. Sample means for all patients observed at the j-th gap time range from 2.19 to 5.09, increasing with j.

• n pro: total number of surgical procedures (e.g. CABG - Coronary Artery Bypass Grafting, PTCA - angioplasty) for each patient hospitalization. Sample means for all patients observed at the j-th gap time range from 0.03 to 0.15.

• summary of time-varying covariates included in the survival regression

Posterior Inference on Regression Coefficients

[Figure: two panels. Left: $\beta_{01}$, $\beta_{02}$, $\beta_{03}$ (Gender, Pathology, Age). Right: $\gamma_1, \ldots, \gamma_7$ (Gender, Pathology, Age, rehab, ic, # of comorb., # of surgical proced.).]

95% marginal posterior credible intervals of the regression coefficients $\boldsymbol{\beta}_0$ (left) in the gap time and $\boldsymbol{\gamma}$ (right) in the survival time likelihood components of the time-homogeneous covariates under our model.

Summary

• there is no effect of gender on the gap times, but a weak effect is detectable on the survival times (with a negative effect on the survival time for women)

• the group variable seems to be weakly relevant for the gap times (patients with non-standard pathology, i.e. not in group G1, have larger gap times); moreover, it has a negative influence on the survival time;

• the age variable is a predictor of both gap times and survival: in particular, there is evidence that the average time between hospitalizations and the survival times are shorter for older patients.

Time-varying Covariates

[Figure: four panels, one per time-varying covariate, each with gap time (1 to 10) on the horizontal axis: Rehabilitation status, Intensive care unit status, # of comorbidities, # of surgical procedures.]

Summary

• Time-varying covariates are reported at the end of each gap time.

• In general, time-varying covariates do not appear to have a strong effect on the recurrence process, except for a few early gap times

• as expected, the uncertainty on the effect estimates increases over time, due to the smaller number of available observations.

• ic has an effect on the distribution of waiting times, as patients with ic equal to 1 show shorter gap times

• both a large number of comorbidities and a large number of surgical procedures yield frequent hospitalizations at the beginning of the study, followed by an opposite effect for later gap times.

Posterior Inference on ψ

• $\psi$ reflects the strength of the relationship between the two processes

• The posterior of $\psi$ is centred away from zero, on the positive axis $\Longrightarrow$ as the time between hospitalizations widens, the probability of survival increases as well.

• This is consistent with the fact that rehospitalization is common among patients affected by HF and that the course of this disease is often characterized by repeated hospital admissions at relatively short intervals and a limited prognosis for survival.

[Figure: marginal posterior density of $\psi$; horizontal axis from 1.34 to 1.48, vertical axis from 0 to 20.]

Cluster model

α1, . . . , αn are a sample from a DP, i.e. a sample with ties:

i and j are in the same cluster ⇔ αi = αj

If ρ = {C1, . . . , Ck} is a partition of the data index set {1, . . . , n}, then the model induces a prior for ρ:

π(ρ) = P(ρ = {C1, . . . , Ck}) = f(#C1, . . . , #Ck),

where f is an infinite exchangeable partition probability function (DP in this case)
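For the DP, the function f has a well-known closed form that depends only on the cluster sizes and the total mass parameter. The sketch below evaluates it and draws a partition with ties via the Chinese restaurant process. The names `dp_log_eppf`, `sample_crp_labels` and the symbol `M` for the total mass are illustrative choices made here, not taken from the slides.

```python
from math import lgamma, log, exp
import random


def dp_log_eppf(sizes, M):
    """Log prior probability of a partition with the given cluster sizes
    under a DP with total mass M:
    f(n_1..n_k) = M^k * Gamma(M) / Gamma(M + n) * prod_j Gamma(n_j)."""
    n = sum(sizes)
    k = len(sizes)
    return k * log(M) + lgamma(M) - lgamma(M + n) + sum(lgamma(s) for s in sizes)


def sample_crp_labels(n, M, seed=0):
    """Draw cluster labels for n items from the Chinese restaurant process,
    the sequential scheme whose partition law is the DP EPPF above."""
    rng = random.Random(seed)
    labels, counts = [], []
    for i in range(n):
        # existing cluster c with probability counts[c] / (M + i),
        # new cluster with probability M / (M + i)
        weights = counts + [M]
        c = rng.choices(range(len(weights)), weights=weights)[0]
        if c == len(counts):
            counts.append(1)
        else:
            counts[c] += 1
        labels.append(c)
    return labels


# Sanity check for n = 3, M = 1: one partition with sizes (3),
# three set partitions with sizes (2, 1), one with sizes (1, 1, 1).
total = (exp(dp_log_eppf([3], 1.0))
         + 3 * exp(dp_log_eppf([2, 1], 1.0))
         + exp(dp_log_eppf([1, 1, 1], 1.0)))
```

Because f depends on the partition only through the cluster sizes, every relabelling of the individuals has the same prior probability, which is the exchangeability property referred to above.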

Cluster estimates are based on the estimate of π(ρ | data), choosing the partition that minimizes the posterior expected value of Binder's loss function with equal weights (Lau & Green, 2007).

- clustering of the individuals in the sample is based on the frailty parameters αi, i.e. on the whole gap-time trajectories and the survival time.
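A minimal sketch of how such a point estimate can be computed from MCMC output. It assumes the sampler returns one vector of cluster labels per iteration, and it simplifies the search by restricting candidates to the sampled partitions, which is a common shortcut and not necessarily the optimization used by the authors or by Lau and Green. The function names are hypothetical.

```python
import numpy as np


def posterior_similarity(label_draws):
    """Estimate P(i and j in the same cluster | data) from S sampled
    label vectors of length n (array of shape (S, n))."""
    draws = np.asarray(label_draws)
    same = draws[:, :, None] == draws[:, None, :]
    return same.mean(axis=0)


def binder_estimate(label_draws):
    """Return the sampled partition minimizing the posterior expected
    Binder loss with equal weights, together with its loss."""
    draws = np.asarray(label_draws)
    psm = posterior_similarity(draws)
    upper = np.triu_indices(draws.shape[1], k=1)  # pairs with i < j
    best_labels, best_loss = None, np.inf
    for labels in draws:
        co = (labels[:, None] == labels[None, :]).astype(float)
        # equal weights: expected loss = sum_{i<j} |1(c_i = c_j) - p_ij|
        loss = np.abs(co - psm)[upper].sum()
        if loss < best_loss:
            best_labels, best_loss = labels.copy(), loss
    return best_labels, best_loss


# Toy example: three individuals, three MCMC draws of the labels.
toy_draws = [[0, 0, 1], [0, 0, 1], [0, 1, 1]]
toy_labels, toy_loss = binder_estimate(toy_draws)
```

In the toy example the partition {1, 2}, {3} is selected, since it agrees with two of the three draws.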


Clustering Inference

[Figure: two panels by cluster (clusters 1 to 6). Panel "α"; panel "Kaplan Meier Estimates": survival probability against time (0 to 2500), p < 0.0001.]


Summary

• The largest cluster of patients (56.85% of the patients, Cluster 2) is characterized by large survival times.

• Two other clusters (3 and 5) include mostly censored observations and are characterized by large survival times and long gap-time trajectories. However, Cluster 3 shows shorter gap times with a large average number of hospitalizations, whereas Cluster 5 includes young patients with the largest waiting times but only a few recurrent events.

• Clusters 1, 4 and 6 present shorter time intervals between hospitalizations compared to the other clusters, as well as shorter survival times.

• The percentage of patients with standard pathology (i.e. group=0) is similar to the overall rate (∼ 70%) in each cluster except Cluster 5, where it is 47%.


Extensions

• Include autoregressive term in the model:

$$\log W_{ij} \mid \text{all the rest} \overset{ind}{\sim} N\Big(x_{ij}\beta_j + \alpha_i + \sum_{k=1}^{r} \delta_{ik} \log W_{i,j-k},\ \sigma_i^2\Big)$$

$$\log T_i \mid \text{all the rest} \overset{ind}{\sim} N\big(z_i\gamma + \psi\alpha_i,\ \eta_i^2\big); \quad i = 1, \ldots, N$$

with parametric or nonparametric prior on the autoregressive coefficients $\delta_{ik}$

• Extend the BNP Autoregressive Frailty Model to include a survival outcome:

$$T_i \mid \gamma_i, z_i^{\star}, \rho_i \sim f(t_i \mid z_i^{\star}\gamma_i, \rho_i)$$

$$\log W_{ij} \mid x_{ij}, \beta_j, \alpha_{ij}, \sigma^2 \sim N\big(x_{ij}^{T}\beta_j + \alpha_{ij},\ \sigma^2\big)$$

$$\alpha_{ij} \mid m_{i0}, m_{i1}, \ldots, m_{ip}, \tau \overset{ind}{\sim} N\Big(m_{i0} + \sum_{l=1}^{p} m_{il} \log W_{i,j-l},\ \tau^2\Big)$$

$$(\rho_i, m_{i0}, m_{i1}, \ldots, m_{ip}) \mid G \overset{iid}{\sim} G, \quad G \sim DP(M, G_0)$$

We link survival and recurrence process by setting a joint nonparametric prior. This is in the spirit of the Random Partition Model with Covariates (Müller & Quintana, 2010) → Augmented Response Models

• Model time course of longitudinal markers
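To make the first extension concrete, the following sketch simulates data from the joint model with an autoregressive term on the log gap times. It is an illustration under strong simplifying assumptions, all introduced here: there are no covariates (the terms x_ij β_j and z_i γ are set to zero), the frailties α_i are drawn from a normal distribution instead of a DP, the variances are common to all patients, lags that fall before the first gap time are dropped, and every numeric value (including ψ = 1.4) is hypothetical.

```python
import numpy as np


def simulate_joint_ar(n_subjects=200, n_gaps=6, r=1, psi=1.4, seed=0):
    """Simulate log gap times and log survival times from
    log W_ij ~ N(alpha_i + sum_k delta_ik * log W_i,j-k, sigma^2),
    log T_i  ~ N(psi * alpha_i, eta^2)."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(5.0, 1.0, size=n_subjects)          # shared frailties
    delta = rng.uniform(-0.3, 0.3, size=(n_subjects, r))   # AR coefficients
    sigma, eta = 0.5, 0.5                                  # common std devs
    log_w = np.zeros((n_subjects, n_gaps))
    for j in range(n_gaps):
        mean = alpha.copy()
        for k in range(1, r + 1):
            if j - k >= 0:  # lags before the first gap time are dropped
                mean += delta[:, k - 1] * log_w[:, j - k]
        log_w[:, j] = rng.normal(mean, sigma)
    log_t = rng.normal(psi * alpha, eta)
    return log_w, log_t


log_w, log_t = simulate_joint_ar()
# With psi > 0, patients with longer average gap times tend to survive longer.
corr = np.corrcoef(log_w.mean(axis=1), log_t)[0, 1]
```

The shared α_i is the only channel linking the two outcomes in this sketch, so setting `psi=0.0` removes the dependence between gap times and survival.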


Conclusions

• HF is among the major causes of hospitalization and death in the population and the main reason for hospital admission in patients ≥ 65 in western countries; urgent need for methods supporting healthcare management

• we have highlighted important determinants of repeated hospitalization as well as survival, for example advanced age, morbidity load, ic and rehabilitation. The variable pathology has an effect on gap times and reflects different HF protocols.

• Profiling patients according to their risk profile should help the implementation of appropriate intervention strategies.

• BNP models, which are flexible and robust, can be useful because administrative data lack accuracy

• joint semiparametric model for recurrent HF hospitalizations and time to death; our approach jointly models survival and the hospitalization times, specifying a DP as random effect distribution of the frailty parameter that links the survival and gap time trajectories


Wrap-up and Acknowledgements

• models for gap times, since events are relatively infrequent and prediction is of interest

• recurrent events: rehospitalizations of patients due to Chronic Heart Failure, UTIs, readmission after surgery for cancer patients

• easy to generalize PH or AFT models to more general models with AR(p) dependence on the gap times or on the random effects

• BNP priors in order to model a more flexible AR(p) parameter distribution and to achieve cluster estimates of patients

• BNP prior to model shared frailty between survival and recurrence process

• other Bayesian nonparametric priors could be used, at the cost of more expensive computations

• future work: much richer dataset, i.e. larger sample size and deeper understanding of new and old explanatory variables, spatial information, longitudinal markers; parallel algorithms for inference on the complete dataset; or BNP covariate-driven clustering models

• Support received from the Royal Society (UK) through the International Exchanges Scheme 2014/R2 (inc CNRS).