chapter 3 frailty models based on the gompertz baseline...
TRANSCRIPT
Chapter 3
Frailty Models Based on the
Gompertz Baseline Distribution
3.1 Introduction
Mostly Weibull distribution is considered for baseline hazard. It is well known that the Weibull
model is not suitable for modeling survival data in all situations. Gompertz distribution is one of
the most important growth models. It has many applications in, for example, medical, biological
and actuarial studies. The Gompertz distribution plays an important role in modeling human
mortality and fitting actuarial tables. It has been used as a growth model and also used to fit
the tumor growth.
This Chapter consider shared frailty models with Gompertz baseline distribution. The mate-
rial of this Chapter i.e. Shared gamma frailty model with Gompertz baseline has been accepted
by the journal Statistics and Probability Letters, 82, 1310-17, (2012) and the remaining content
of this Chapter i.e. Shared inverse Gaussian frailty model with Gompertz baseline has been
submitted for publication in the well-known international journal.
57
Gompertz Baseline Distribution 58
3.2 Gompertz Baseline Distribution
Gompertz model, used most frequently by medical researchers and biologists in modeling the
mortality ratio data, was formulated by Gompertz (1825). Gompertz distribution is a growth
model and has been used in relation with tumor development. Ahuja and Nash (1979) showed
that Gompertz distribution, with a simple conversion, related to some distributions in the Pear-
son distributions family. According to Jaheen (2003), Garg et al. (1970) obtained the maximum
likelihood estimations of the Gompertz distribution parameters. Osman (1987), used a Gompertz
distribution with two parameters, worked on the features of the distribution and offered that it
should be used in modeling the lifespan data analyzing the survival ratio in heterogenic masses.
It has been widely used, especially in actuarial and biological applications and in demography.
A continuous random variable T is said to follow a Gompertz distribution with parameters
λ > 0 and γ > 0 (T ∼ Gompertz(λ, γ)), if the baseline hazard function corresponds to
h0(t) =
λ exp(γ t) ; t > 0, λ > 0, γ ∈ R
0 ; otherwise.(3.1)
and the cumulative hazard function is
H0(t) =
λγ−1(exp(γ t)− 1) ; t > 0, λ > 0, γ ∈ R
0 ; otherwise.(3.2)
where λ and γ are scale and shape parameters, respectively. For γ = 0 the baseline hazard (3.1)
reduces to the exponential hazard.
The corresponding survival function and probability density function of Gompertz distribu-
tion are
S0(t) =
exp[−λγ−1(exp(γt)− 1)] ; t > 0, λ > 0, γ ∈ R
0 ; otherwise.(3.3)
and
f0(t) =
λ exp(γ t) exp[−λγ−1(exp(γt)− 1)] ; t > 0, λ > 0, γ ∈ R
0 ; otherwise.(3.4)
We note that for γ > 0, S0(t) goes to zero for t→∞. With γ < 0, S0(t) goes to 0 < exp(λγ−1) <
1 for t → ∞. Therefore the event never occurs for a proportion exp(λγ−1) of the population.
We therefore consider the case γ > 0.
Gompertz Baseline Distribution 59
In this Chapter, the two-parameter Gompertz distribution is considered. Let us assume that
the independent random variables T1 and T2 have Gompertz distribution with parameters λ1, γ1
and λ2, γ2, respectively. In short we say Gomp(λj , γj), (j = 1, 2).
Here, T1 and T2 are independent, thus, H(ti1, ti2) = H1(ti1) + H2(ti2), where Hj(tij) is the
integrated hazard of Tij , (i = 1, 2, ..., n; j = 1, 2).
Under this independency assumption, using equation (1.1), the resulting regression model
H(tij |Ui, Xi) = uiλjγ−1
j (exp(γj tij)− 1) exp(x′iβ), (i = 1, .....n; j = 1, 2) (3.5)
is indeed an extended cumulative proportional hazards model conditional on frailty and fixed
factors. The conditional survival function of the jth individual in the ith cluster is given by
S(tij |Ui, Xi) = exp[−uiλjγ−1
j exp(x′iβ)(exp(γj tij)− 1)], tij ≥ 0, λj > 0, γj > 0 (3.6)
Here value of Ui is common to two components in a group. When there is no variability in the
distribution of Ui, that is, when Ui has a degenerate distribution then there is no dependency.
When the distribution is not degenerate, the dependence is positive. The value of Ui can be
considered as generated from unknown values of some explanatory variables. Conditional on
Ui = ui, the bivariate survival function is
S(ti1, ti2|Ui, Xi) = S(ti1|Ui, Xi)S(ti2|Ui, Xi)
= exp[−ui exp(x′iβ){λ1γ−11 (exp(γ1ti1)− 1) + λ2γ
−12 (exp(γ2ti2)− 1)}] (3.7)
where Ui (i = 1, 2, · · · , n) follows gamma distribution and inverse Gaussian distribution given
in (1.7) and (1.18), respectively.
3.2.1 Proposed Models
Substituting cumulative hazard function for Gompertz distribution in equations (1.9) and (1.20),
we get the unconditional bivariate survival functions based on gamma frailty and inverse Gaus-
sian frailty as,
Sθ(ti1, ti2|Xi) = [1 + θ exp(x′iβ){λ1γ−11 (exp(γ1ti1)− 1) + λ2γ
−12 (exp(γ2ti2)− 1)}]−1/θ (3.8)
Gompertz Baseline Distribution 60
and
Sθ(ti1, ti2|Xi) = exp[
1− (1 + 2θ exp(x′iβ){λ1γ−11 (exp(γ1ti1)− 1) + λ2γ
−12 (exp(γ2ti2)− 1)})1/2
θ
](3.9)
Here onwards we call equations (3.8) and (3.9) as Model-I and Model-II, respectively. Thus,
Model-I is the shared gamma frailty model under Gompertz baseline hazard and Model-II is the
shared inverse Gaussian frailty model under the same Gompertz baseline hazard.
Once we have unconditional survival function of bivariate random variable (Ti1, Ti2) we can
obtain likelihood function and estimate the parameters of the model. Here onwards in this
Chapter we represent Sθ(ti1, ti2|Xi) as Sθ(ti1, ti2).
3.3 Fitting of Models
Inferential procedures for frailty models have almost exclusively focused on likelihood based
approaches (Liang et al., 1995). The likelihood function in (2.7) for the random censorship
model can be used for the estimation procedure. To estimate the parameters of the model firstly
we must have to obtain likelihood function given by (2.7) for which we need to obtain the
functions, fi1, fi2, fi3, and Fi. Differentiating survival function for each of the model given in
equations (3.8) and (3.9) for obtaining these functions.
For Model-I, we have
fi1 = (1 + θ)λ1λ2 exp(2x′iβ) exp(γ1ti1 + γ2ti2)S(1+2θ)θ (ti1, ti2)
fi2 = λ1 exp(x′iβ) exp(γ1ti1)S(1+θ)θ (ti1, wi)
fi3 = λ2 exp(x′iβ) exp(γ2ti2)S(1+θ)θ (wi, ti2)
Fi = Sθ(wi, wi)
where Sθ(., .) is given by equation (3.8).
Gompertz Baseline Distribution 61
For Model-II, we have
fi1 =λ1λ2 exp(γ1ti1 + γ2ti2)Sθ(ti1, ti2)φ2(ti1, ti2)
[φ1(ti1, ti2)]32
exp(2x′iβ)
fi2 =λ1 exp(γ1ti1)Sθ(ti1, wi)
[φ1(ti1, wi)]12
exp(x′iβ)
fi3 =λ2 exp(γ2ti2)Sθ(wi, ti2)
[φ1(wi, ti2)]12
exp(x′iβ)
Fi = Sθ(wi, wi)
where φ1(ai, bi) = 1 + 2θ{λ1γ−11 (exp(γ1ai − 1)) + λ2γ
−12 (exp(γ2bi − 1))} exp(x′iβ); φ2(ai, bi) =
θ + [φ1(ai, bi)]12 and Sθ(., .) is given by equation (3.9).
Now we apply MCMC procedure discussed in Section (2.5) to estimate the parameters of
the models. Firstly, we consider a simulation study to evaluate the performance of the Bayesian
estimation procedure and then we apply the procedure to litters of rat data.
3.4 Simulation Study
To evaluate the performance of the Bayesian estimation procedure we carried out a simulation
study. For the simulation purpose we have considered only one covariate X = X1 which we
assume to follow binomial distribution for Model-I and normal distribution for Model-II. With
one covariate, the shared gamma frailty model and the shared inverse Gaussian frailty model
given in equation (3.8) and (3.9), respectively, has six parameters. The frailty variable U is
assumed to have gamma distribution and inverse Gaussian distribution with variance θ. Lifetimes
(Ti1, Ti2) for ith pair are conditionally independent for given frailty Ui = ui. We assume that
Tij (i = 1, . . . , n; j = 1, 2) follows the Gompertz baseline distribution. As the Bayesian methods
are time consuming, we generate only fifty, seventy-five and one hundred pairs of lifetimes using
inverse transform technique.
According to the assumption, for given frailty U , lifetimes of individuals are independent.
So, the conditional survival function for an individual for given frailty U = u and a covariate X
at time t > 0 is,
S(t | U,X) = exp[−uH0(t) exp(x′iβ)] (3.10)
Gompertz Baseline Distribution 62
Equating S(t | U,X) to a random number say r (0 < r < 1) over t > 0, using (3.6), we get,
t = γ−1ln
[1 +
γA
λ
](3.11)
where A = − ln(r)u exp(xβ) . Equations (3.11) is a generator to generate lifetimes for model (3.8) and
(3.9). We have generated different random samples of size n = 50, 75 and 100 for lifetimes Ti1
and Ti2 using (3.11) and for frailty from gamma and inverse Gaussian density functions f(u)
given in (1.7) and (1.18). To generate frailty values from gamma distribution we have used direct
commands in R package but for generating frailty values from inverse Gaussian distribution we
have used different packages like ‘SuppDists’, ‘statmod’ etc. in R.
Here we have generated different random samples of size n = 50, 75 and 100 for lifetimes Ti1
and Ti2 using (3.11). But we are giving procedure for sample generation of only one sample size,
say, n = 50. Samples are generated using following procedure for Model-I and Model-II:
1. Generate a random sample of size n = 50 from gamma and inverse Gaussian distributions
having density as given in equation (1.7) and (1.18), with θ = 3.6 as shared frailties (ui)
for ith (i = 1, 2, . . . , 50) cluster.
2. Generate 50 covariate values for X from binomial distribution for Model-I and from normal
distribution for Model-II.
3. Compute exp(Xiβ) with regression coefficient β = 2 and β = 0.7 for Model-I and Model-II,
respectively.
4. Generate 50 pairs of lifetimes (ti1, ti2) for given covariate (xi) using following generators,
ti1 = γ−11 ln
[1 +
γ1Ai1λ1
](3.12)
ti2 = γ−12 ln
[1 +
γ2Ai2λ2
](3.13)
for shared frailty models Model-I and Model-II, where Ai1 = − ln(ri1)ui exp(xiβ) and Ai2 =
− ln(ri2)ui exp(xiβ) ; r1 and r2 are random variables having U(0, 1) distribution and γ1, λ1 are
respectively shape and scale parameters of baseline distribution of first survival time and
γ2, λ2 are that of second survival time.
Gompertz Baseline Distribution 63
5. Generate censoring time wi from exponential distribution with failure rate 0.9 for Model-I
and 1.5 for Model-II.
6. Observe jth individual survival time t∗ij = min(tij , wi) and censoring indicator δij for ith
cluster (i = 1, 2, . . . , 50 and j = 1, 2), where
δij =
1, ; tij ≤ wi
0, ; tij > wi
Thus we have data consists of 50 pairs of survival times (t∗i1, t∗i2) and censoring indicators
δij .
Prior distributions that we have assumed for the parameters of Model-I are respec-
tively, G(0.0001, 0.0001) for baseline parameters λ1, γ1 and γ2 and G(0.01, 0.01) for λ2;
G(0.0001, 0.0001) for frailty parameter θ and N(0, 1000) for regression parameter β. Also,
prior distributions that we have assumed for the parameters of Model-II are respectively,
G(0.0001, 0.0001) for baseline parameters λ1, λ2, γ1 and γ2; G(0.0001, 0.0001) for frailty pa-
rameter θ and N(0, 1000) for regression parameter β. Here G(a, b) is gamma distribution with
shape parameter a and scale parameter b and N(µ, σ2) represents normal distribution with mean
µ and variance σ2.
We run two parallel chains for the proposed model with the different starting points using
Metropolis-Hastings algorithm within Gibbs sampler based on normal transition kernels. We
have iterated both the chains for 95,000 times using gamma prior. For both the chains the
results were somewhat similar so we present here the analysis for only one chain (i.e. chain I)
and for only one sample size for the resulting shared frailty models.
Trace plots and coupling from past plots are presented in Figures 3.1 and 3.3 for Model-I and
Figures 3.2 and 3.4 for Model-II with gamma prior assumption. These plots for other sample sizes
also have same pattern. So, we have not shown any graphs for other sample sizes, only for n = 75
with gamma prior has been shown to get the idea about these graphs. Trace plots are having
zigzag pattern which says that parameters are moving freely and appropriate for both models.
Sample autocorrelation plots are presented in Figures 3.5 and 3.6 for both models. Generated
chains have slightly higher autocorrelation lag which is decided by using sample autocorrelation
function plot.
Gompertz Baseline Distribution 64
Autocorrelation lag for all the parameters have different values so for uniqueness we have taken
Figure 3.1: Trace plots of the posterior samples for the simulation study of
Model-I
lag as maximum of lag of all parameters of a model. To have a single value of lag we have
taken lag value k (given in Table 3.5 for Model-I and Table 3.6 for Model-II) as maximum of
autocorrelation lag of all the parameters of a model.
Gelman-Rubin scale reduction factor and Geweke test with corresponding p-values for both
Gompertz Baseline Distribution 65
Figure 3.2: Trace plots of the posterior samples for the simulation study of
Model-II
the models are provided in Tables 3.1 and 3.3 for Model-I and Tables 3.2 and 3.4 for Model-II,
respectively. From these Tables, we can observe that Gelman-Rubin scale reduction factor values
are quite near to one and also p-values for Geweke test values are large enough to say the chains
attain stationary distribution.
The posterior summaries including posterior mean, standard error, and 95% credible intervals
along with burn in period values, sample autocorrelation lag values and sample sizes are provided
Gompertz Baseline Distribution 66
in Table 3.5 and Table 3.6 for both shared frailty models, respectively. From these Tables, it
Figure 3.3: Coupling from past plots for the simulation study of Model-I
can be observed that estimated values of parameters reach quite close to true values of the
parameters with decreasing standard errors as the sample size goes on increasing. We have used
R statistical software to perform this simulation study.
Gompertz Baseline Distribution 67
Figure 3.4: Coupling from past plots for the simulation study of Model-II
Gompertz Baseline Distribution 68
Figure 3.5: Sample autocorrelation function plots for the simulation study of
Model-I
Gompertz Baseline Distribution 69
Figure 3.6: Sample autocorrelation function plots for the simulation study of
Model-II
Gompertz Baseline Distribution 70
Table 3.1: Gelman-Rubin convergence statistic values for simulation for Model-I
Parameters λ1 λ2 γ1 γ2 θ β
True values 3.0 3.0 1.0 2.0 3.6 2.0
n = 50
stat. value 1.010111 1.003496 1.002179 1.000078 1.001579 1.000022
n = 75
stat. value 1.000276 1.000163 1.000015 0.9999592 1.000463 1.006026
n = 100
stat. value 1.020548 1.007764 1.000666 1.000023 1.001366 1.000979
Table 3.2: Gelman-Rubin convergence statistic values for simulation for Model-
II
Parameters λ1 λ2 γ1 γ2 θ β
True values 3.0 3.0 1.0 2.0 3.6 0.7
n = 50
stat. value 1.000161 0.9999772 1.000717 1.003983 1.006078 1.000492
n = 75
stat. value 1.001183 1.000145 0.9999587 1.001571 1.000021 1.002332
n = 100
stat. value 1.010941 1.004914 1.000031 1.003994 1.001281 1.009831
Gompertz Baseline Distribution 71
Table 3.3: Geweke test values and corresponding p-values for simulation for
Model-I
Sample Size→ n = 50 n = 75 n = 100
True test test test
Parameters values value p− value value p− value value p− value
λ1 3.0 -0.00129 0.49948 -0.00153 0.49938 -0.00056 0.49977
λ2 3.0 -0.00702 0.49719 0.01471 0.50586 0.01807 0.50721
γ1 1.0 -0.00737 0.49705 0.00462 0.50184 0.01586 0.50633
γ2 2.0 -0.00127 0.49949 0.00209 0.50083 0.000401 0.50016
θ 3.6 0.01427 0.50569 0.01133 0.50452 -0.00404 0.49838
β 2.0 -0.00439 0.49824 0.01168 0.50466 -0.00316 0.49873
Table 3.4: Geweke test values and corresponding p-values for simulation for
Model-II
Sample Size→ n = 50 n = 75 n = 100
True test test test
Parameters values value p− value value p− value value p− value
λ1 3.0 0.01211 0.50483 -0.00019 0.49992 -0.00857 0.49658
λ2 3.0 -0.00649 0.49740 0.00726 0.50289 -0.00525 0.49790
γ1 1.0 0.00969 0.50386 0.00742 0.50296 0.01601 0.50639
γ2 2.0 -0.00701 0.49720 0.00629 0.50251 -0.00275 0.49890
θ 3.6 0.00287 0.50114 -0.00808 0.49677 0.01192 0.50475
β 0.7 -0.00152 0.49939 -0.00951 0.49620 0.00921 0.50367
Gompertz Baseline Distribution 72
Table 3.5: Parameters estimates for simulation of shared gamma frailty model
(Model-I) with Gompertz baseline hazard
Parameters λ1 λ2 γ1 γ2 θ β
True values 3.0 3.0 1.0 2.0 3.6 2.0
n = 50
sample size = 870 burn in period = 8000 autocorrelation lag = 100
Estimates 3.339832 3.58423 0.8029933 1.898675 3.728842 1.784984
S.E. 0.5166535 0.5047603 0.2513277 0.2350126 0.3168721 0.2564108
LowerLt. 2.17248 2.030812 0.51023 1.603307 3.039895 1.519136
UpperLt. 3.935928 3.862703 1.464501 2.375263 4.159701 2.412618
n = 75
sample size = 600 burn in period = 5000 autocorrelation lag = 150
Estimates 3.275938 3.352028 0.9781125 1.990647 3.670445 1.807583
S.E. 0.4715845 0.4150472 0.2474298 0.1986308 0.2753138 0.2123236
LowerLt. 2.235856 2.462795 0.550843 1.609894 3.114257 1.521822
UpperLt. 3.927783 3.967307 1.498642 2.360427 4.147788 2.286009
n = 100
sample size = 425 burn in period = 10000 autocorrelation lag = 200
Estimates 2.818765 2.805313 1.019907 1.997276 3.601107 1.829282
S.E. 0.4173838 0.3207179 0.2032705 0.1980163 0.2688608 0.2091362
LowerLt. 2.283129 2.876453 0.5884597 1.621737 3.180849 1.516968
UpperLt. 3.783039 3.992017 1.409367 2.37215 4.156142 2.259177
Gompertz Baseline Distribution 73
Table 3.6: Parameters estimates for simulation of shared inverse Gaussian frailty
model (Model-II) with Gompertz baseline hazard
Parameters λ1 λ2 γ1 γ2 θ β
True values 3.0 3.0 1.0 2.0 3.6 0.7
n = 50
sample size = 582 burn in period = 8000 autocorrelation lag = 170
Estimates 2.8591 2.601739 0.9236211 1.910175 3.508175 0.6428016
S.E. 0.5856733 0.4414507 0.2899155 0.2864638 0.3244146 0.04853413
LowerLt. 1.849004 2.015302 0.5109697 1.413235 3.042288 0.3450716
UpperLt. 3.923291 3.887658 1.459524 2.592098 4.192036 0.8330701
n = 75
sample size = 890 burn in period = 6000 autocorrelation lag = 100
Estimates 2.955741 2.846793 0.974943 1.95329 3.624358 0.7514799
S.E. 0.3480802 0.2487879 0.277412 0.2842537 0.3186504 0.03493911
LowerLt. 2.236078 2.111656 0.5273721 1.520869 3.051731 0.3770092
UpperLt. 3.479806 3.380265 1.456028 2.527003 4.17338 0.8168667
n = 100
sample size = 580 burn in period = 8000 autocorrelation lag = 150
Estimates 2.959122 3.089771 0.9826119 1.993055 3.600546 0.6761037
S.E. 0.2529426 0.2006675 0.265877 0.2802394 0.3161122 0.03123944
LowerLt. 2.400494 2.161371 0.5331393 1.533585 3.097311 0.4168464
UpperLt. 3.330827 3.35982 1.451178 2.519502 4.131662 0.7792813
Gompertz Baseline Distribution 74
3.5 Analysis of Tumorigenesis Experiment Data
In this Section the method is applied to the animal tumorigenesis data described by Mantel
et al. (1977) (given in Appendix A). The experiment involved 50 male and 50 female litters,
consist of three rats within each litter. A subset of the data, for the female rats litter, was
analyzed by Hougaard (1986) who used a Cox type model and a parametric model with Weibull
margins. The Bayesian approach using Gibbs sampling was given by Clayton (1991). We also
use this subset of data but with two controlled rats to illustrate the application of our proposed
models because there should be homogeneity within cluster. One of the main research aims is to
investigate whether some unobserved factors like genetic or environmental factors significantly
increases the failure rate of rats while allowing for common covariates for both rats within the
same litter. Here, we fit the proposed shared frailty models (defined in 3.8 and 3.9) to the female
rats’ tumor time data, without any covariate as we consider only controlled female rats. First,
we check goodness-of-fit of the data for Gompertz baseline distribution and then apply the
Bayesian estimation procedure. In the analysis of this study, we have used the R package. To
(a) C1 (b) C2
Figure 3.7: Graphs for goodness-of-fit for litter matched tumorigenesis data for
Gompertz distribution
check goodness-of-fit of data set, firstly we consider graphical procedure to check appropriateness
of the model. To assess model graphically, we plot the graph of ln[1−{γln(S(t))/λ}] versus t for
Gompertz Baseline Distribution 75
proposed model. If the resulted plot is roughly a straight line then we can say that the underline
baseline model is appropriate. The graph of ln[1−{γln(S(t))/λ}] versus t for Gompertz baseline
distribution for both controlled rats C1 and C2, using R-program, are shown in the Figures 3.7(a)
and 3.7(b), respectively. From the Figures, we can observe that many of the points are nearly
on the straight line.
(a) C1 (b) C2
Figure 3.8: Comparison of non parametric survival function with Gompertz
survival function for litter matched tumorigenesis data
Figures 3.8(a) and 3.8(b) which plot the Kaplan-Meier estimates and hypothesized theo-
retical distribution S0(t) for the model demonstrate a close agreement between the two i.e.
non-parametric and parametric survival curves.
Table 3.7: p-values for Kolmogorov-Smirnov test statistic for litter matched
tumorigenesis data set
Test C1 C2
K-S 0.9804 0.9753
Gompertz Baseline Distribution 76
We have considered a statistical test also viz. Kolmogorov-Smirnov (K-S) test for testing
composite goodness-of-fit hypotheses to the Gompertz baseline distribution for models (3.8)
and (3.9). Test procedure is developed for testing goodness-of-fit with data subjected to random
right censoring. Here, we assume that survival times of rats are conditionally independent so
we apply K-S test to the survival times of both rats separately. The p-values of K-S test for
baseline distribution are presented in Table 3.7 which are very large approximately close to 1.
Thus, from different goodness-of-fit graphs (see Figures 3.7(a) and 3.7(b)) and p-values of K-S
test (see Table 3.7) we can say that there is no statistical evidence to reject the hypothesis that
data are from Gompertz distribution.
In the analysis, we have used the R program. Given the model assumptions, this program
performs the Gibbs sampler by simulating from the full conditional distributions. We run two
parallel chains for both shared frailty models using two sets of prior distributions with the
different starting points using Metropolis-Hastings algorithm within Gibbs sampler based on
normal transition kernels. In our study of this data, the prior distribution we use for frailty
parameter θ is gamma distribution with mean one and large variance, say Γ(φ, φ), with a small
choice of φ. Since, we do not have any prior information about baseline parameters, λ1, γ1, λ2,
and γ2, therefore, prior distributions of these parameters are assumed to be flat. We consider
two different sets of prior distributions for baseline parameters, one is Γ(a, b) and another is
U(c, d). All the hyper-parameters φ, a, b, c, and d are known. Here Γ(a, b) is gamma distribution
with shape parameter a and scale parameter b and U(c, d) represents uniform distribution over
the interval c to d. We set hyper-parameters as
• for Model-I
φ = 0.00001, a = 1; b = 0.0001, c = 0 and d = 100.
• for Model-II
φ = 0.00001; a = 1, b = 0.0001 for λ1, λ2, γ1, and γ2; and c = 0, d = 100 for λ1, λ2, γ2,
and c = 0, d = 60 for γ1.
We implemented 95,000 iterations of the algorithm. To diminish the effect of the starting
distribution, we generally discard the early iterations of each sequence and focus attention on
the remaining.
Gompertz Baseline Distribution 77
Figure 3.9: Posterior for Model-I: time series plots of the posterior samples for
litter matched tumorigenesis data
We described the first 2,700 and 4,000 iterations as a burn-in for Model-I and Model-II,
respectively. There is no effect of prior distributions on posterior summaries because estimates
of parameters are nearly same for both sets of priors. Here, results for chain I and chain II are
similar so we present results for only one chain (i.e. chain I) with Γ(a, b) as prior for baseline
Gompertz Baseline Distribution 78
parameters.
Figure 3.10: Posterior for Model-II: time series plots of the posterior samples
for litter matched tumorigenesis data
Graphs for time series (or trace) plot, coupling from past plots and sample autocorrelation
plot with thinning (i.e. after considering lag shown in Figures) for the parameters are shown in
Figures 3.9, 3.11, and 3.13 and Figures 3.10, 3.12, and 3.14 for Model-I and Model-II, respectively.
Gompertz Baseline Distribution 79
Figure 3.11: Posterior for Model-I: coupling from past plots of the posterior
samples for litter matched tumorigenesis data
Trace plots for all the parameters show zigzag pattern which indicates that parameters
move and mix more freely. Thus, it seems that the Markov chain has reached the stationary
state. However, a sequence of draws after burn-in period may have autocorrelation. Because
of autocorrelation consecutive draws may not be random, but values at widely separated time
points are approximately independent.
Gompertz Baseline Distribution 80
So, a pseudo random sample from the posterior distribution can be found by taking values from
a single run of the Markov chain at widely spaced time points (autocorrelation lag) after burn-in
period.
Figure 3.12: Posterior for Model-II: coupling from past plots of the posterior
samples for litter matched tumorigenesis data
Simulated values of parameters have autocorrelation of lag k (value given below each plot
in Figure 3.13 and Figure 3.14), so every kth iteration is selected as sample to thin the chain
and discarding the rest. The autocorrelation of parameters become almost negligible after the
Gompertz Baseline Distribution 81
defined lag, given in Figures.
Figure 3.13: Plots of autocorrelations with thinning number lag (given below
the Figure) of the posterior samples for litter matched tumorigenesis data for
Model-I
We can also use running mean plots to check how well our chains are mixing. A running
mean plot is a plot of the iterations against the mean of the draws up to each iteration. In fact
running mean plots display a time series of the running mean for each parameter in each chain,
Gompertz Baseline Distribution 82
as shown in Figure 3.15 and Figure 3.16 for Model-I and Model-II, respectively.
These plots should be converging to a value. Running mean plot for each parameter is converging
Figure 3.14: Plots of autocorrelations with thinning number lag (given below
the Figure) of the posterior samples for litter matched tumorigenesis data for
Model-II
to the posterior mean of the parameter, thus, represents a good mixing of chain. Thus, our
diagnostic plots suggest that the MCMC chains are mixing very well.
Gompertz Baseline Distribution 83
(a) (b)
(c) (d)
(e)
Figure 3.15: Graphs of running mean plots for litter matched tumorigenesis
data for Model-I
Gompertz Baseline Distribution 84
(a) (b)
(c) (d)
(e)
Figure 3.16: Graphs of running mean plots for litter matched tumorigenesis
data for Model-II
Gompertz Baseline Distribution 85
Tables 3.8 and 3.9 present the posterior mean, standard deviation, and 95% credible inter-
vals for all baseline parameters, frailty variance along with Gelman-Rubin convergence statistic
values and Geweke test statistic values with corresponding p-values for Model-I and Model-II,
respectively. Monitoring convergence of the chains has been done via the Brooks and Gelman
(1998) convergence-diagnostic.
Hence, once convergence has been achieved, 92,300 and 91,000 observations, for Model-I and
Table 3.8: Posterior summary for litter matched tumorigenesis data set for
Model-I
Credible Interval
Parameters BGR Geweke test Estimate S.E. Lower Upper
stat. value test value p-value limit limit
λ1 1.0016 0.0141 0.5056 0.0006 0.0003 0.0002 0.0014
λ2 1.0007 -0.0137 0.4945 0.0009 0.0004 0.0003 0.0020
γ1 1.0005 -0.0107 0.4956 0.0244 0.0046 0.0156 0.0316
γ2 1.0012 0.0128 0.5051 0.0242 0.0043 0.0160 0.0315
θ 1.0028 0.0041 0.5016 1.9875 0.3127 1.5209 2.6236
Model-II, respectively, are taken from each chain after the burn-in period. On inspection of the
Brooks and Gelman’s diagnostic, we find the BGR (Brooks and Gelman Ratio) convergent to
one, this shows that the convergence for variance of frailty θ and other parameters has been
obtained. Also, the Geweke test statistic values are quite small and corresponding p-values are
large enough to say the chains attain stationary distribution.
There is little difference between the posterior estimates of baseline parameters γ1, γ2 and
frailty parameter θ for Model-I and Model-II, presented in Tables 3.8 and 3.9. The posterior
estimate of dispersion of random effects i.e. θ = 1.9875 for Model-I and θ = 1.9958 for Model-II,
shows that there exists highly significant heterogeneity in population of rats. Thus, there is a
dependence within the litter.
Gompertz Baseline Distribution 86
Table 3.9: Posterior summary for litter matched tumorigenesis data set for
Model-II
Credible Interval
Parameters BGR Geweke test Estimate S.E. Lower Upper
stat. value test value p-value limit limit
λ1 1.1623 0.0059 0.5023 0.0005 0.0002 0.0001 0.0011
λ2 1.0895 0.0079 0.5031 0.0007 0.0003 0.0003 0.0016
γ1 1.0168 -0.0060 0.4975 0.0250 0.0042 0.0159 0.0308
γ2 1.0123 -0.0105 0.4957 0.0248 0.0039 0.0165 0.0306
θ 1.0377 -0.0041 0.4983 1.9958 0.3182 1.5183 2.6637
3.5.1 Testing of Significance of Frailty
Therneau and Grambsch (2000) analyzed this data using gamma and Gaussian frailty models.
They observed the estimate of the variance of the random effect is highest when Gaussian frailty
with AIC method is used. Hanagal (2011) also considered the same data for Weibull distribution
as the baseline model to analyze and test for no frailty in frailty models. From the Gaussian and
gamma frailty models, he observed from the R output that frailty variable (litter) is very highly
significantly related to the recurrence time. Now we test for no-frailty in shared gamma frailty
model and shared inverse Gaussian model. To test significance of frailty we compared two models
M0 and M1. One with shared gamma frailty as model M1,G, shared inverse Gaussian as model
M1,IG and without frailty as model M0. As it is observed from Table 3.10 that AIC, BIC and
DIC are not able to discriminate between the models with frailty and without frailty for Model-I
and Model-II because the difference in information criterion between M0 and M1,G for Model-I
and M0 and M1,IG for Model-II are very small i.e. approximately less than 3. So, we use Bayes
factor for this comparison. Bayes factor value which is 6.363487 and lies in (6, 10) for Model-I
and 2.443901 and lies in (2, 6) for Model-II (see Table 3.11) gives strong positive and positive
evidence against the model M0 i.e. without frailty for both shared frailty models. Therefore, M0
Gompertz Baseline Distribution 87
is worse than M1 based on Bayes factor and all other criterion. Thus, it is observed from the
results that frailty variable i.e litter effect is very highly significantly related to the recurrence
time of rats.
Table 3.10: AIC, BIC and DIC values for test of significance for frailty under
Model-I & Model-II fitted to litter matched tumorigenesis data set
Models Frailty AIC AICc BIC DIC log-likelihood
I with (M1,G) 267.649 269.014 279.114 261.457 -128.825
II with (M1,IG) 266.3979 267.7615 278.2191 260.9202 -128.1989
I&II without (M0) 265.2342 266.123 275.2299 261.9294 -128.6171
Table 3.11: Bayes factor values and decision for test of significance for frailty
under Model-I & Model-II fitted to litter matched tumorigenesis data set
numerator model
Models against 2loge(B10) range Evidence against
denominator model model in denominator
I M1,G against M0 6.363487 ≥ 6 and ≤ 10 Strong Positive
II M1,IG against M0 2.443901 ≥ 2and ≤ 6 Positive
3.6 Discussion
In this Chapter, shared gamma frailty model and shared inverse Gaussian frailty model with
Gompertz baseline distribution has been introduced. We have developed the Bayesian estimation
procedure to estimate the parameters of the models. Firstly we have considered simulation study
to see the performance of the models and then applied to the litter matched tumorigenesis data.
In case of simulation, we have chosen the gamma as prior distribution to check the effect of sample
Gompertz Baseline Distribution 88
size on the posterior estimates. It was observed from all results of Tables and Figures that the
posterior summary estimates with gamma prior assumptions were close to true parameter values.
In this Chapter, we have considered the litter matched tumorigenesis experiment study. Here,
we have discussed the Bayesian estimation procedure including Gibbs sampling for computing
the estimation of the unknown parameters for litter matched tumorigenesis data set. For this
data set we have no covariate as we have considered only controlled rats and also, we were
mainly interested to see the effect of frailty. We have discussed Kolmogorov-Smirnov statistical
tests for testing goodness-of-fit to Gompertz distribution. K-S test gives the statistical evidence
of not rejecting the hypotheses that data follows Gompertz baseline distribution.
For estimation purpose, we have run two parallel chains with two sets of priors from different
starting points and considered the “burn-in” interval for each chain. We have provided 95,000
iterations to perform the analysis. We have clearly written the steps involved in the iteration
procedure, using R-program, given in Appendix B. The quality of convergence was checked
graphically by trace plots and running mean plots and statistically by Geweke test statistics
and Gelman-Rubin statistics (see Brooks and Gelman, 1998). The values of the Gelman-Rubin
statistics in this case are quite close to one. Thus, the sample can be considered to have arisen
from stationary distribution. Thus, all convergence diagnostics suggest that the MCMC chains
are mixing very well, converging to a value, and attain stationary distribution.
We have given the complete posterior summary of the data set including posterior mean,
standard error, and 95% credible interval for both shared frailty models. We have compared
the model with frailty and model without frailty using some information criteria such as, log-
likelihood, AIC, AICc, BIC, DIC, and Bayes factor for both shared frailty models. From the
value of all these criteria we conclude that the shared frailty model provide a suitable choice
for modeling the life time model of litter matched tumorigenesis data as compared to without
frailty model. From the tests for heterogeneity and posterior estimates of frailty parameter set
we can conclude that there is highly significant heterogeneity among the rats. Data analysis of
tumorigenesis experiment indicates that the performance of Bayesian estimation method is quite
satisfactory.