NOVEL METHODS FOR TIME SERIES DATA IN CLINICAL STUDIES
By
OLEKSANDR V. SAVENKOV
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
2012
© 2012 Oleksandr V. Savenkov
This dissertation is dedicated to the memory of Mark C.K. Yang
ACKNOWLEDGMENTS
There are many people without whose support and encouragement I would not be able to
complete this degree.
First and foremost, I would like to thank my advisor, Professor Sam Wu. I am grateful to
Sam for dedicating so much time and effort to my dissertation. I would also like to thank my
first PhD advisor, Professor Mark Yang, who taught me to work hard. It is a real sadness that
he passed away in 2010.
I would like to thank Professor Malay Ghosh and Professor Kshitij Khare for serving on my
PhD committee and giving me valuable advice.
I am grateful to my collaborators and mentors, Professor Panos Pardalos, who is also on my
PhD committee, and Frank Skidmore, for introducing exciting new areas of research to me.
I am very grateful to my friends and mentors, Professor Vladimir Boginski and Professor
Sergiy Butenko, for their invaluable help and encouragement. Thank you.
I would like to thank my peers and the faculty members at the Department of Statistics.
Finally, I would like to thank my family. Although they are thousands of miles away from
me, I feel their love and support every day.
TABLE OF CONTENTS
page
ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT
CHAPTER
1 INTERVENTION ANALYSIS FOR SINGLE-SUBJECT STUDIES
1.1 Introduction
1.2 An Improved Model
1.3 The Test Procedure
1.4 Simulation Studies
1.5 Case Study
1.6 Conclusions

2 ANALYSIS OF VARIANCE BASED ON CROSS-FITTING MEASURE OF SIMILARITY AND PERMUTATION TEST
2.1 Introduction
2.2 Literature Review
2.3 Measure of Distance by ARMA Coefficients
2.4 The Cross-Fitting Measure of Similarity
2.5 ANOVA Based on Cross-Fitting Measure of Similarity and Permutation Test
2.6 Conclusions
APPENDIX
A LIKELIHOOD FUNCTION
B CROSS-FITTING DISTANCE
C R CODE FOR EVALUATING MLE
D MATLAB CODE FOR EVALUATING MLE

REFERENCES
BIOGRAPHICAL SKETCH
LIST OF TABLES
Table

1-1 MSE and Bias for β = (30, 0.8, 0.6)
1-2 MSE and Bias for β = (30, 0, 0)
1-3 Case study results
LIST OF FIGURES
Figure

1-1 Several Common Mean Functions
1-2 Estimated mean functions for an AR process
1-3 Estimated mean functions for an ARMA process
1-4 Intensive CILT
1-5 Distributed CILT
1-6 Intensive PACE
1-7 Distributed PACE
2-1 Comparison of three stationary AR(1) time series
Abstract of Dissertation Presented to the Graduate School of the University of Florida in
Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
NOVEL METHODS FOR TIME SERIES DATA IN CLINICAL STUDIES
By
Oleksandr V. Savenkov
August 2012
Chair: Samuel S. Wu
Major: Statistics
Single-subject or n-of-1 research designs have been widely used to evaluate treatment
interventions. Many statistical procedures, such as split-middle trend lines, regression trend
lines, Shewart-chart trend lines, binomial tests, randomization tests, and Tryon's C statistic,
have been used to analyze single-subject data, but they fail to control the Type I error rate
because the observations form a serially dependent time series.
In this work we present an improved intervention analysis model for the dynamic characteristics
of an intervention effect in a short series of single-subject data. Several potential difficulties
may arise; one of them is that the series of data can be relatively short. To address this issue
we derive the exact likelihood function, which allows us to obtain estimates more directly than
approximate algorithms, which can fail to converge for short series. Because we consider short
series, the chi-squared approximation to the null distribution of the test statistic may not be
valid, so we develop a hypothesis testing procedure for the treatment effect. The methods are
illustrated with a real clinical trial on constraint-induced language therapy for aphasia patients.
In the second part of this work we provide an overview of several approaches to measuring
the similarity or dissimilarity between time series. The main goal is to develop a framework for
investigating group differences in time series. This problem relies on the ability to measure the
distance between time series. We develop a novel approach to measuring the similarity between
time series, the cross-fitting measure. The proposed measure can also be applied to time series
clustering problems.
CHAPTER 1
INTERVENTION ANALYSIS FOR SINGLE-SUBJECT STUDIES
1.1 Introduction
Single-subject designs have been used widely for decades, particularly in the behavioral
sciences, but statistical analysis of such studies remains problematic, primarily because such
data are generally autocorrelated and the observation series is short. The purpose of this work
is to introduce a new analysis model that improves the current methods of drawing conclusions
from single-subject studies. In a typical single-subject design, repeated observations are made
on the lone subject during a baseline period and a subsequent treatment period. The baseline
measurements are intended to establish a stable reference point, and also, in cases of recent
injury or other affliction from which spontaneous improvement might be expected, to estimate
the rate of improvement prior to treatment. After the subject is exposed to the intervention,
the observations continue in an attempt to establish corresponding treatment-period values.
Investigators then desire to test two null hypotheses: 1) there is no difference in overall
outcome between the baseline and treatment periods, and 2) there is no difference in the
rate of change in outcome between the two periods. When spontaneous improvement before
treatment does not occur (i.e., the slope of the baseline data is non-positive), rejection of the
first hypothesis is enough to show that the treatment is effective. In other cases, rejection of
both hypotheses is required [1].
Visual analysis is the traditional and still widely used method of approaching such
studies [1]. The data are plotted across time, with a vertical line separating the baseline and
treatment periods. Investigators then eyeball the data and make informal conclusions about
the effectiveness of the intervention. As one might imagine, this method is highly subjective
and hence unreliable, with one large meta-analysis finding an overall inter-rater agreement
coefficient of only 0.58 [2]. To address this concern, researchers have proposed various tools to
aid visual analysis and make its conclusions more robust. In the split-middle trend line method
[3], the baseline data are divided in half and a line is drawn through their respective medians.
The same procedure is applied to the treatment data, and the level and slope of the two lines
are qualitatively compared. The celeration trend line method [4] is identical, except the lines
extend through means rather than medians. Separate regression lines through the baseline and
treatment data also are sometimes plotted and visually compared. None of these methods,
however, has been shown to offer much improvement in the reliability of visual analysis, with
Type I error rates remaining as high as 84% [5–9].
Methods that are more statistically oriented also often are used. The Shewart procedure
[4, 10] sets reference lines two standard deviations above and below the mean of the baseline
data. If two successive data points in the treatment period fall outside those bounds, one infers
that significant change has occurred. Binomial tests compare the proportion of treatment
points that fall above and below the baseline split-median or celeration line. T-tests are
sometimes performed between the baseline and treatment means, and Tryon’s C-statistic [11] is
frequently used to compare slopes. The autocorrelation of single-subject data causes such tests
to be invalid, however, and Type I error rates remain unacceptably high when autocorrelation is
present [12–15].
Gottman [16] defined interrupted-time-series analysis (ITSA) for a stream of serially
dependent observations across two experimental periods. Based on fitting autoregressive
parameters, the method yields three tests: an F test of the null hypothesis that no overall
change has occurred between the two periods, and t-tests for differences in means and slopes.
Crosbie [17] showed that the ITSA method underestimated positive autocorrelation and
hence could not maintain Type I error control when the baseline and treatment observations
were relatively few (less than 50 observations per time period), making the ITSA method
inapplicable to most clinical settings. Crosbie proposed a corrected version of ITSA, called
ITSACORR, which could handle shorter time series and has since been widely employed.
ITSACORR, however, fails to control Type I error for autocorrelations higher than 0.6 and
sample sizes less than 20, and it also assumes that the intervention effect is a linear trend
change from baseline, which is not appropriate in many applications, such as the examples
we give in our illustrative example section. In addition, Rosner, Munoz, et al. [18] and Rosner
and Munoz [19] present autoregressive models that use regression methods to relate change in
response variables to explanatory variables.
In this work, we consider an improved intervention analysis model [20] for dynamic charac-
teristics of an intervention effect in a short series of single-subject data. The statistical model
is presented in Section 1.2. The maximum likelihood estimates are derived and a hypothesis
testing procedure is proposed in Section 1.3. The methods are illustrated with a real clinical
trial on constraint-induced language therapy for aphasia patients in Section 1.5.
1.2 An Improved Model
Suppose the data Yt, t = 1, ..., n are available as a series observed at equal time intervals.
We assume that the time series is subject to an intervention at time T and that T is known.
The part of the time series Yt, t ≤ T is the preintervention data.
In pioneering work, Box and Tiao [20] considered the following intervention model
Yt = f (t) + Nt , (1–1)
where
• Yt is the observed outcome series
• f (t) is the unknown mean function associated with the known intervention time
• Nt is random noise.
It is assumed that the noise Nt follows an autoregressive moving average (ARMA) model:

φ(B)Nt = θ(B)at

where

• B is the backward shift operator [21]
• at, t = 1, ..., n is a sequence of independent random variables with N(0, σ²) distribution
• φ(B) = 1 − φ1B − φ2B² − ... − φpB^p
• θ(B) = 1 − θ1B − θ2B² − ... − θqB^q
In this work we consider the case of a single intervention. We should mention that such models
are not restricted to a single intervention: several mean functions can be combined for more
sophisticated intervention effects. There are several possible response patterns that depend on
the choice of the mean function f(t). Among others, we can consider the following mean functions:

• f(t) = ωB It
• f(t) = (ωB/(1 − δB)) It
• f(t) = (ωB/(1 − B)) It

The indicator function It is given by

It = 0, if t ≤ T;  1, otherwise. (1–2)

Short-term intervention effects can be specified using the pulse function

Pt = 1, if t = T;  0, otherwise.

Clearly, (1 − B)It = Pt. Thus, without loss of generality, we consider models with the step
function It.
In this work, we assume that f(t) follows a first-order dynamic model for intervention,
with the transfer function of the form

f(t) = (ωB/(1 − δB)) It, (1–3)
Figure 1-1. Several Common Mean Functions
where ω, δ are unknown parameters, with 0 < δ < 1, and B is the backward shift operator
[21]. This implies that

f(t) = 0, if t ≤ T;  ω(1 − δ^(t−T))/(1 − δ), if t > T. (1–4)

Such a transfer function model is appropriate when the response is not expected to be
immediate, an assumption that seems reasonable in clinical studies.
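As a quick illustration, the mean function (1–4) can be evaluated directly. The sketch below (plain Python; the parameter values T = 10, ω = 2, δ = 0.6 are hypothetical) shows the gradual rise from 0 at the intervention toward the asymptote ω/(1 − δ):

```python
# Mean function f(t) from (1-4): zero through the intervention time T,
# then a first-order rise toward the asymptote omega / (1 - delta).
def mean_function(t, T, omega, delta):
    if t <= T:
        return 0.0
    return omega * (1.0 - delta ** (t - T)) / (1.0 - delta)

# Hypothetical parameter values for illustration only.
T, omega, delta = 10, 2.0, 0.6
f = [mean_function(t, T, omega, delta) for t in range(1, 61)]
```

Note that f(T + 1) = ω is the first post-intervention value, and f(t) approaches ω/(1 − δ) = 5 as t grows, matching (1–4).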
In general, it is desirable to choose the form of the transfer function based on information
about the mechanisms that cause the change. We also assume that Nt follows an ARMA(1, 1)
model with mean 0; note that higher-order models for Nt can also be included. For our case, let

Nt = ((1 − θB)/(1 − φB)) at (1–5)

which implies that

Nt − φNt−1 = at − θat−1. (1–6)
Then the model for the time series Yt has the form
Y1 = µ+ N1
Y2 = µ+ N2
...
YT = µ+ NT
YT+1 = µ+ ω + NT+1
...
Yn = µ+ ω(1− δn−T )/(1− δ) + Nn.
In this model the first-order dynamic function is applied to the unknown mean function,
which makes it hard to derive MLEs, since the parameters enter the model non-linearly.
Under model (1–1), we have
Yt = f (t) + Nt = ω + δf (t − 1) + Nt = ω + δYt−1 + (Nt − δNt−1), t ≥ T + 1
To simplify the previous model, we use an ARMA(1,1) time series (W1, ...,Wn) to replace
the terms (N1,N2, ...,NT ;NT+1 − δNT ,NT+2 − δNT+1, ...,Nn − δNn−1). In other words, the
original intervention model (1–1) can be written in the form of (1–7) below:
Y1 = µ+W1
Y2 = µ+W2
...
YT = µ+WT
YT+1 = µ+ ω + δYT +WT+1
...
Yn = µ+ ω + δYn−1 +Wn.
(1–7)
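Model (1–7) is straightforward to simulate, which is useful for checking estimation code. A minimal sketch (Python/NumPy; the parameter values are hypothetical, and the crude ARMA start-up below is a simplifying assumption rather than the exact treatment of initial values used in this work):

```python
import numpy as np

# Simulate the intervention model (1-7): ARMA(1,1) noise W_t, baseline mean mu
# up to time T, then the first-order dynamic response after the intervention.
rng = np.random.default_rng(0)

def simulate(n, T, mu, omega, delta, phi, theta, sigma=1.0):
    a = rng.normal(0.0, sigma, n + 1)
    W = np.empty(n)
    W[0] = a[1] - theta * a[0]      # crude start-up (exact methods use the stationary distribution)
    for t in range(1, n):
        W[t] = phi * W[t - 1] + a[t + 1] - theta * a[t]
    Y = np.empty(n)
    for t in range(n):
        if t < T:                   # pre-intervention: Y_t = mu + W_t
            Y[t] = mu + W[t]
        else:                       # post-intervention: Y_t = mu + omega + delta * Y_{t-1} + W_t
            Y[t] = mu + omega + delta * Y[t - 1] + W[t]
    return Y

Y = simulate(n=60, T=30, mu=30.0, omega=0.8, delta=0.6, phi=0.5, theta=0.3)
```

With the noise switched off (σ = 0), the post-intervention path converges to the steady-state level (µ + ω)/(1 − δ), which makes the simulator easy to sanity-check.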
Instead of applying the first-order dynamic to the mean function, the new model applies it to
the observed time series. Model (1–7) can be rewritten in matrix form as

W = AY − η (1–8)
where
η = (µ, ... ,µ,µ+ ω, ... ,µ+ ω)T , (1–9)
and the matrix A is given by

      | 1   0   0   ...        0   0 |
      | 0   1   0   ...        0   0 |
      | ...                          |
A =   | 0   ...     −δ   1   ...   0 |        (1–10)
      | ...                          |
      | 0   0   0   ...       −δ   1 |

that is, A is the n × n matrix with ones on the diagonal and −δ on the subdiagonal of rows
T + 1, ..., n.
This representation allows us to derive the probability density function of Y .
The model can also be rewritten in the form

W = Y − Bβ (1–11)

β = (µ, ω, δ)^T, (1–12)

where B is the n × 3 matrix

      | 1   0   0    |
      | ...          |
      | 1   0   0    |
B =   | 1   1   YT   |        (1–13)
      | ...          |
      | 1   1   Yn−1 |

whose rows are (1, 0, 0) for t = 1, ..., T and (1, 1, Yt−1) for t = T + 1, ..., n.
This form is useful for deriving estimates for β, the parameters of interest.
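The two matrix forms (1–8) and (1–11) can be checked against each other numerically. A small sketch (Python/NumPy; n, T, and the parameter values are hypothetical) that builds A, η, and B and verifies AY − η = Y − Bβ:

```python
import numpy as np

n, T = 8, 4
mu, omega, delta = 30.0, 0.8, 0.6

rng = np.random.default_rng(1)
Y = rng.normal(mu, 1.0, n)

# A: identity with -delta on the subdiagonal of the post-intervention rows.
A = np.eye(n)
for t in range(T, n):
    A[t, t - 1] = -delta

# eta: mu for the baseline rows, mu + omega afterwards.
eta = np.concatenate([np.full(T, mu), np.full(n - T, mu + omega)])

# B: rows (1, 0, 0) before the intervention, (1, 1, Y_{t-1}) after it.
B = np.zeros((n, 3))
B[:, 0] = 1.0
B[T:, 1] = 1.0
B[T:, 2] = Y[T - 1:n - 1]

beta = np.array([mu, omega, delta])
W1 = A @ Y - eta          # form (1-8)
W2 = Y - B @ beta         # form (1-11)
```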
The probability density function of the vector Y = (Y1, ..., Yn) equals

p(Y |φ, θ, σ) = (2πσ²)^(−n/2) |Z^T Z|^(−1/2) exp{−S*(φ, θ)/(2σ²)} (1–14)
where Z is given by

      | 1                 0                                |
      | 0                 1                                |
      | θ − φ             −φ(θ − φ)(1 − φ²)^(−1/2)         |
Z =   | θ(θ − φ)          −θφ(θ − φ)(1 − φ²)^(−1/2)        |        (1–15)
      | ...                                                |
      | θ^(n−1)(θ − φ)    −θ^(n−1)φ(θ − φ)(1 − φ²)^(−1/2)  |
and

S*(φ, θ) = (AY − η)^T Γ (AY − η) = (Y − Bβ)^T Γ (Y − Bβ), (1–16)

with Γ defined as

Γ = L^T (I − Z(Z^T Z)^(−1) Z^T) L (1–17)

and the matrix L of the form

      | 0                 0                 0                ...   0       0 |
      | 0                 0                 0                ...   0       0 |
      | 1                 0                 0                ...   0       0 |
L =   | θ − φ             1                 0                ...   0       0 |        (1–18)
      | θ(θ − φ)          θ − φ             1                ...   0       0 |
      | ...                                                                 |
      | θ^(n−2)(θ − φ)    θ^(n−3)(θ − φ)    θ^(n−4)(θ − φ)   ...   θ − φ   1 |
Clearly, the form of matrices Z and L depends on the order of the time series Nt .
Consider the likelihood function:

L(φ, θ, σ²|Y) = (2πσ²)^(−n/2) |Z^T Z|^(−1/2) exp{−S*(φ, θ)/(2σ²)} (1–19)
First, it can be shown that for any given (φ, θ), the likelihood function is maximized by

β̂(φ, θ) = (B^T Γ B)^(−1) B^T Γ Y

and

σ̂²(φ, θ) = S*(φ, θ, β̂)/n,

which depend on (φ, θ) through Γ. Plugging them into the likelihood function, we get

L*(φ, θ|β̂, σ̂², Y) = (2π S*(φ, θ, β̂)/n)^(−n/2) |Z^T Z|^(−1/2) exp(−n/2). (1–20)

Therefore, if we let (φ̂, θ̂) be the values that maximize the above expression L*, then the
MLEs of the parameters are obtained as φ̂, θ̂, β̂(φ̂, θ̂) and σ̂²(φ̂, θ̂).
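The inner step of this maximization is a generalized-least-squares computation: for fixed (φ, θ), the matrix Γ is fixed and β̂ minimizes the quadratic form S*(β) = (Y − Bβ)^T Γ (Y − Bβ). A sketch (Python/NumPy) with an arbitrary positive-definite Γ standing in for the Γ of (1–17), and random B and Y:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 12
B = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
Y = rng.normal(size=n)
M = rng.normal(size=(n, n))
Gamma = M @ M.T + n * np.eye(n)   # arbitrary positive-definite stand-in for (1-17)

def S_star(beta):
    # Quadratic form S*(beta) = (Y - B beta)' Gamma (Y - B beta).
    r = Y - B @ beta
    return float(r @ Gamma @ r)

# GLS solution of the normal equations: beta_hat = (B' Gamma B)^{-1} B' Gamma Y.
beta_hat = np.linalg.solve(B.T @ Gamma @ B, B.T @ Gamma @ Y)
```

Because S* is a convex quadratic with Hessian B^T Γ B, the solution of the normal equations is its global minimizer.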
Furthermore, we would like to point out a connection between the MLE β̂(φ, θ) and a
Bayes estimator. If we let β̂ = (B^T Γ B)^(−1) B^T Γ Y, which depends on (φ, θ) through Γ,
then we have

S*(φ, θ, β) = [(Y − Bβ̂) − B(β − β̂)]^T Γ [(Y − Bβ̂) − B(β − β̂)]
            = (Y − Bβ̂)^T Γ (Y − Bβ̂) + (β − β̂)^T B^T Γ B (β − β̂), (1–21)

where the first term is constant given (φ, θ) and Y. Therefore, the likelihood function and
(1–21) imply that, conditioned on (φ, θ, σ²), the posterior distribution of β is multivariate
normal with mean β̂ and covariance σ²(B^T Γ B)^(−1):

L(β|Y, φ, θ, σ²) ∝ exp{−(β − β̂)^T B^T Γ B (β − β̂)/(2σ²)}. (1–22)

In other words, the MLE β̂(φ, θ) is the Bayes estimator with the auxiliary parameters
estimated at (φ̂, θ̂).
1.3 The Test Procedure
Suppose β^T = (µ, β2^T), where β2 = (ω, δ). Then for a treatment effect we would like to
test

H0 : β2 = (0, 0) vs Ha : β2 ≠ (0, 0) (1–23)
Here we consider the partitioned β. It is well known [22] that, whatever the true value of β,

β̂ − β ∼ AN(0, Σ) (1–24)

where Σ is the variance-covariance matrix of the vector β̂.
Let

P =   | 0  1  0 |
      | 0  0  1 |

so that Pβ = (ω, δ)^T = β2. Assume that the matrix Σ is of the form

      | Σ11  Σ12  Σ13 |
Σ =   | Σ21  Σ22  Σ23 |
      | Σ31  Σ32  Σ33 |

Then the variance-covariance matrix of the vector β̂2 is given by

Σ2 = Cov(β̂2) = Cov(Pβ̂) = PΣP^T =   | Σ22  Σ23 |
                                    | Σ32  Σ33 |

To find the p-value one can use the Wald test statistic

Tw = β̂2^T Σ2^(−1) β̂2 ∼ χ²₂ (1–25)
We should mention that likelihood ratio or score test statistics can also be used. This result
holds for large samples, but it may not be valid for small samples, because the chi-squared
approximation to the null distribution of the test statistic may not be as good as for large
samples. To address this issue, we propose the following procedure to find the p-value:
1. Simulate n time series from the model with β2 = (0, 0).

2. Estimate the coefficients β2 from each simulated series and calculate the Wald test
statistic Ti, i = 1, ..., n.

3. Calculate the p-value according to the formula

p = Σ_{i=1}^{n} I(Ti > Tw) / n
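The three steps can be sketched as a Monte-Carlo p-value computed by simulating under the null. The version below (Python/NumPy) is a simplified stand-in: the "model fit" is an OLS regression of the post-period equation with white-noise errors, not the exact-likelihood ARMA(1,1) estimates, so both the statistic and the null model are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

def wald_stat(Y, T):
    # Stand-in Wald statistic for (omega, delta) from the OLS fit of
    # Y_t = mu + omega * I_t + delta * I_t * Y_{t-1} + e_t.
    n = len(Y)
    post = (np.arange(1, n) >= T).astype(float)
    X = np.column_stack([np.ones(n - 1), post, post * Y[:-1]])
    coef, res, *_ = np.linalg.lstsq(X, Y[1:], rcond=None)
    sigma2 = float(res[0]) / (n - 1 - 3) if res.size else 1e-12
    cov = sigma2 * np.linalg.inv(X.T @ X)
    b2 = coef[1:]
    return float(b2 @ np.linalg.solve(cov[1:, 1:], b2))

def mc_pvalue(Y, T, n_sim=500):
    # Simulate series under beta_2 = (0, 0), i.e. constant mean, and count
    # how often the simulated statistic exceeds the observed one.
    t_obs = wald_stat(Y, T)
    mu_hat, s_hat = Y[:T].mean(), Y[:T].std(ddof=1)
    sims = [wald_stat(rng.normal(mu_hat, s_hat, len(Y)), T) for _ in range(n_sim)]
    return sum(t > t_obs for t in sims) / n_sim
```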
1.4 Simulation Studies
In this section we present numerical results comparing an intervention model with AR(1)
errors and an intervention model with ARMA(1, 1) errors. The simulations were performed
for β = (30, 0.8, 0.6) and β = (30, 0, 0). For comparison purposes we looked at two sets of
data: the first set was simulated using the model with AR(1) errors and the second set was
simulated using the model with ARMA(1, 1) errors. The computational experiments were
performed with the free software environment R [23].
The results of simulation studies are summarized in Table 1-1 and Table 1-2.
Table 1-1. MSE and Bias for β = (30, 0.8, 0.6)

                          MSE                       Bias
Model      Fit        µ      ω      δ       µ        ω        δ
AR(1)      AR(1)      0.395  2.578  0.0004  0.0246   0.0767   -0.0007
AR(1)      ARMA(1,1)  0.465  5.7    0.0008  -0.0795  -0.819   0.011
ARMA(1,1)  AR(1)      0.403  3.19   0.0005  0.0254   0.0944   -0.001
ARMA(1,1)  ARMA(1,1)  0.479  6.93   0.001   -0.0717  -0.8985  0.011
Table 1-2. MSE and Bias for β = (30, 0, 0)

                          MSE                        Bias
Model      Fit        µ       ω      δ       µ       ω       δ
AR(1)      AR(1)      0.7325  137.9  0.1518  0.015   -5.07   0.17
AR(1)      ARMA(1,1)  1.102   218.8  0.24    0.078   -11.22  0.369
ARMA(1,1)  AR(1)      0.751   275.4  0.304   0.0304  -8.62   0.287
ARMA(1,1)  ARMA(1,1)  1.073   273.9  0.302   0.103   -10.79  0.353
Figure 1-2. Estimated mean functions for an AR process
Figure 1-3. Estimated mean functions for an ARMA process
1.5 Case Study
For an illustrative example we consider data from a randomized clinical trial of Constraint
Induced Language Therapy (CILT). The main aim of the study was to determine if CILT would
result in observable improvements in speech and if it would be significantly better than regular,
unconstrained language therapy. There were four groups of patients who completed the study:
• Intensive CILT (10 patients)
• Distributed CILT (10 patients)
• Intensive Promoting Aphasic Communicative Effectiveness (PACE) (8 patients)
• Distributed PACE (8 patients)
The PACE therapy [24] was used for the comparison because of its common application in
the rehabilitation of aphasia.
We expect that the clinical response to the intervention can vary within each group;
therefore, the single-subject design is a reasonable approach to test for the treatment effect
for each patient. The results from trials can be combined using meta-analysis or Bayesian
hierarchical models [25].

Model (1–7) and the algorithm for estimating p-values described in Section 1.3 were
programmed in R and in MATLAB. Plots of the fitted models for the different groups of
patients are presented in Figures 1-4, 1-5, 1-6, and 1-7, respectively. The results from the
CILT study are summarized in Table 1-3.
1.6 Conclusions
In this chapter, we developed an improved intervention model for single-subject studies
with a relatively small number of observations per subject. The exact likelihood function for
the model was derived. We also presented a framework for testing a treatment effect in clinical
studies with a single-subject design; this goal is achieved using the coefficient estimates from
the exact likelihood function.
Table 1-3. Case study results

Patient  p-value  TS       µ       ω        δ
1        <0.001   66.01    56.177  1.31     0.325
2        0.87     0.404    42.51   -12.845  0.25
3        <0.001   1145.84  9.25    34.82    0.21
4        0.001    48.01    60.41   17.56    0.06
5        <0.001   927.829  20.02   18.98    0.47
6        0.041    17.41    56.84   51.33    -0.48
7        0.689    1.31     3.98    2.75     0.047
8        0.014    23.1     47.36   1.14     0.21
9        <0.001   170.48   19.5    -0.12    0.69
10       0.002    89.51    59.06   -28.76   0.62
11       0.23     10.13    64.18   44.66    -0.495
12       0.003    57.8     13.15   0.79     0.65
13       <0.001   122.09   50.07   1.40     0.35
14       <0.001   82.98    46.52   13.61    0.33
15       0.035    19.64    56.13   -34.16   0.7
16       <0.001   62.97    34.10   -18.72   0.72
17       0.01     32.58    3.58    4.39     0.56
18       <0.001   104.15   37.28   -16.7    0.76
19       <0.001   112.07   9.41    39.59    0.13
20       0.015    28.29    69.4    -29.98   0.52
21       <0.001   1551.63  10.41   23.58    0.44
22       <0.001   147.12   33.17   20.57    0.38
23       0.031    19.03    34.44   -7.68    0.56
24       <0.001   96.76    53.51   -20.22   0.62
25       <0.001   85.47    2.01    41.61    -0.03
26       <0.001   1892.40  7.83    49.02    0.09
27       0.37     14.26    -2.41   6.41     -0.22
28       0.001    61.42    17.86   5.5      0.16
29       0.026    29.91    3.97    -2.75    0.25
30       0.089    10.44    4.34    -0.94    0.55
31       0.015    26.25    23.10   -17.89   0.44
32       0.011    30.29    16.92   28.99    0.19
33       0.354    17.8     -0.02   0.79     -0.75
34       0.64     1.51     0.17    -0.034   0.37
35       0.024    22.28    39.68   44.85    -0.33
36       <0.001   1287.95  31.26   16.64    0.38
Figure 1-4. Intensive CILT
Figure 1-5. Distributed CILT
Figure 1-6. Intensive PACE
Figure 1-7. Distributed PACE
The model was successfully fit to the data from a randomized clinical trial of Constraint
Induced Language Therapy. Clearly, the applications of such models are not restricted to
clinical studies, though this research was motivated by medical applications.
CHAPTER 2
ANALYSIS OF VARIANCE BASED ON CROSS-FITTING MEASURE OF SIMILARITY AND
PERMUTATION TEST
2.1 Introduction
The analysis of experimental data observed at different time points leads to new statistical
modeling problems. Time series data arise in many scientific fields: economics (stock markets),
medicine (blood pressure traced over time, fMRI), speech recognition, and the physical and
environmental sciences. Typical real problems with time series involve modeling, forecasting,
and clustering. For this reason the study of distance measures and clustering for time series
is an important part of research in several scientific fields. The main goal of this work is to
study group differences in time series. Assume that in each group time series are observed from
many different subjects, each with a different model. We shall claim a group difference if there
is more between-group difference than within-group difference. These types of problems rely
on the ability to measure the similarity or dissimilarity between time series. Defining a reasonable
measure of similarity is a nontrivial task. There are two main approaches to performing pairwise
comparisons between time series. The first approach deals with selected features extracted
from the data. The second approach relies on comparing models built from the raw data,
with likelihood-ratio-type testing. In this work we introduce a distance measure based on
cross-fitting. The proposed measure should be a convenient tool for analysis of variance and
for time series clustering.
2.2 Literature Review
In this section we briefly summarize previous research on measures of time series similarity
and dissimilarity. We discuss methods based on raw data and also the model-based approach,
which is more closely related to the goal of this chapter.
Minkowski distance: Let X and Y be T-dimensional vectors. Then the Minkowski distance in
the Lq norm between the observed values is defined as:

dM = ( Σ_{t=1}^{T} |Xt − Yt|^q )^(1/q)
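A minimal sketch of this distance in plain Python (absolute differences are used so that odd values of q also behave as a norm; for even q this agrees with the formula above):

```python
# Minkowski (L_q) distance between two equal-length series.
def minkowski(x, y, q):
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1.0 / q)

# Small illustrative vectors.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 0.0, 3.0, 1.0]
```

For q = 2 this reduces to the ordinary Euclidean distance.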
There are several distances based on cross-correlation. Golay et al. [26] introduced two
cross-correlation-based distances,

d1cc = ((1 − cc)/(1 + cc))^β

for some β > 0, and

d2cc = 2(1 − cc)

where

cc = Σ_{t=1}^{T} (Xt − µX)(Yt − µY) / (SX SY)

and SX and SY are the standard deviations.
SX and SY are standard deviations.
According to Liao [27], a dissimilarity index based on the cross-correlation function can be
defined as:

d_{i,j} = sqrt( (1 − ρ²_{i,j}(0)) / Σ_{τ=1}^{max} ρ²_{i,j}(τ) )

where ρ_{i,j}(τ) is the cross-correlation between the two time series Xi and Yj at lag τ, and
max is the maximum lag.
So far we have considered metrics based on the similarity of the raw data aligned by time.
Now we focus on approaches based on model comparison, for which the time alignment is
irrelevant. Kalpakis et al. [28] note that in many similarity queries the Euclidean distance
between raw data fails to capture the notion of similarity, and they proposed the Euclidean
distance between the Linear Predictive Coding (LPC) cepstra as a measure of dissimilarity.
Consider an AR(p) time series (Box et al. [21]):

Xt = φ1Xt−1 + φ2Xt−2 + ... + φpXt−p + at

Then the cepstral coefficients satisfy

cn = φ1, if n = 1
cn = φn + Σ_{m=1}^{n−1} (1 − m/n) φm cn−m, if 1 < n ≤ p
cn = Σ_{m=1}^{n−1} (1 − m/n) φm cn−m, if n > p
Alonso et al. [29] developed time series clustering based on forecast densities. Let X^(i) =
(X^(i)_1, ..., X^(i)_T) be the time series corresponding to the ith subject in the sample, and let
f^(i)_{X_{T+h}} denote the density function of the forecast X^(i)_{T+h}. Then the distance is

Dij = ∫ ( f^(i)_{X_{T+h}}(x) − f^(j)_{X_{T+h}}(x) )² dx
Another distance was proposed by Piccolo [30]; it is based on the AR(∞) representation of
ARMA models. We discuss this distance in the next section and compare it with the proposed
cross-fitting distance. Maharaj [31] also used the AR(∞) form of ARMA models to test the
hypothesis that there is a difference between the generating processes of two stationary series.
2.3 Measure of Distance by ARMA Coefficients
The purpose of this section is to outline the idea of the AR distance and of the cross-fitting
measure of similarity. The AR distance was introduced by Piccolo [30]. Corduas and Piccolo [32]
derived the asymptotic distribution of the squared AR distance in order to place the comparison
of time series within a hypothesis testing framework.
Let Zt be a zero-mean ARIMA(p, d, q) process. In the standard notation of Box et al. [21],
such a model is defined as follows:

φ(B)∇^d Zt = θ(B)at (2–1)

where at is a univariate white noise process with zero mean and constant variance σ², and B is
the backward shift operator, defined by BZt = Zt−1. The autoregressive operator of order p
and the moving average operator of order q are defined as:

φ(B) = 1 − φ1B − φ2B² − ... − φpB^p
θ(B) = 1 − θ1B − θ2B² − ... − θqB^q

with the invertibility and stationarity restrictions. The invertibility assumption ensures that Zt
can be represented in the AR(∞) formulation:

π(B)Zt = at (2–2)
with

π(B) = (1 − B)^d φ(B) θ^(−1)(B) = 1 − Σ_{j=1}^{∞} πj B^j

and

Σ_{j=1}^{∞} |πj| < ∞
Based on this representation, Piccolo [30] introduced the Euclidean distance between the
π-weights as a measure of dissimilarity between two ARIMA processes Xt and Yt:

d = sqrt( Σ_{j=1}^{∞} (πxj − πyj)² ), (2–3)

where πxj and πyj are the π-weights from the models for the time series Xt and Yt, respectively.
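The π-weights can be computed by equating coefficients in φ(B) = θ(B)π(B) and truncating the expansion; the recursion below and the truncation level J are our own implementation choices (plain Python, d = 0 case):

```python
# pi-weights of the AR(infinity) representation pi(B) = 1 - sum_j pi_j B^j.
# Matching coefficients in phi(B) = theta(B) * pi(B) gives
#   pi_n = phi_n - theta_n + sum_{i=1}^{n-1} theta_i * pi_{n-i}.
def pi_weights(phi, theta, J=50):
    pi = []
    for n in range(1, J + 1):
        v = (phi[n - 1] if n <= len(phi) else 0.0) \
            - (theta[n - 1] if n <= len(theta) else 0.0)
        v += sum(theta[i - 1] * pi[n - i - 1]
                 for i in range(1, min(n, len(theta) + 1)))
        pi.append(v)
    return pi

# Piccolo's distance (2-3), truncated at J terms. Each model is (phi, theta).
def ar_distance(m1, m2, J=50):
    p1, p2 = pi_weights(*m1, J), pi_weights(*m2, J)
    return sum((a - b) ** 2 for a, b in zip(p1, p2)) ** 0.5
```

For an ARMA(1,1) model the recursion gives the familiar π_j = (φ − θ)θ^(j−1), and for two AR(1) models the distance reduces to |φ1 − φ2|.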
2.4 The Cross-Fitting Measure of Similarity
In this section we develop a new approach to measure similarity between time series.
Suppose there are two time series Xt, t = 1, ..., n and Yt, t = 1, ..., m. Assume we can fit the
series Xt by an ARMA model M1 and the series Yt by an ARMA model M2. To define the
distance between the two series we use the following algorithm:

1. Apply M1 to the time series Xt to obtain the prediction error σ²11.

2. Apply M2 to the time series Xt to obtain the prediction error σ²12.

3. Apply M1 to the time series Yt to obtain the prediction error σ²21.

4. Apply M2 to the time series Yt to obtain the prediction error σ²22.

5. Define the distance between the two time series by:

d(1, 2) = (σ²12 − σ²11)/σ²11 + (σ²21 − σ²22)/σ²22 (2–4)
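The five steps can be sketched with AR(1) models fitted by least squares (Python/NumPy; restricting to zero-mean AR(1) models with OLS fits is a simplifying assumption, not the general ARMA procedure):

```python
import numpy as np

def fit_ar1(x):
    # Least-squares estimate of phi in x_t = phi * x_{t-1} + a_t.
    return float(np.dot(x[1:], x[:-1]) / np.dot(x[:-1], x[:-1]))

def pred_error(x, phi):
    # One-step-ahead mean squared prediction error using coefficient phi.
    r = x[1:] - phi * x[:-1]
    return float(np.mean(r ** 2))

def cross_fit_distance(x, y):
    phi1, phi2 = fit_ar1(x), fit_ar1(y)               # models M1 and M2
    s11, s12 = pred_error(x, phi1), pred_error(x, phi2)
    s21, s22 = pred_error(y, phi1), pred_error(y, phi2)
    return (s12 - s11) / s11 + (s21 - s22) / s22      # equation (2-4)

rng = np.random.default_rng(6)
def sim_ar1(phi, n=500):
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x

x, y = sim_ar1(0.1), sim_ar1(0.8)
d = cross_fit_distance(x, y)
```

Because each σ²ii is the minimized prediction error for its own series, both ratios in (2–4) are non-negative, so the measure satisfies d ≥ 0 by construction.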
In the case when we are only interested in the 'shape' difference of the two time series, we
can standardize the series by

X̃t = (Xt − X̄)/SX,  Ỹt = (Yt − Ȳ)/SY,

where X̄, Ȳ, SX, SY are the standard notation for the sample means and standard deviations,
respectively.
The proposed measure satisfies properties of a semimetric:
1. d(Xt ,Yt) ≥ 0
2. d(Xt ,Yt) = 0 if and only if Xt = Yt
3. d(Xt ,Yt) = d(Yt ,Xt)
Consider two AR(1) models:
• Xt = φ1Xt−1 + εt where εt are iid N(0, 1)
• Yt = φ2Yt−1 + εt , where εt are iid N(0, 1)
In terms of the model coefficients, the cross-fitting distance equals:

d(1, 2) = (φ1 − φ2)²/(1 − φ1²) + (φ1 − φ2)²/(1 − φ2²)
Consider three time series from AR(1) models:
1. Xt = 0.1Xt−1 + εt , t = 1, ..., 100
2. Yt = 0.5Yt−1 + εt , t = 1, ..., 100
3. Zt = 0.9Zt−1 + εt , t = 1, ..., 100
For the cross-fitting measure we obtain, according to our definition, d(1, 2) = 0.37 and
d(2, 3) = 1.055. For the AR distance, it is easy to see that Piccolo's Euclidean distance
between model 1 (φ = 0.1) and model 2 (φ = 0.5) is the same as the distance between
model 2 (φ = 0.5) and model 3 (φ = 0.9). Based on the graphical comparison in Figure 2-1,
however, we expect more dissimilarity between Model 2 and Model 3.
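The quoted values follow from the closed-form AR(1) expression given earlier in this section; a small check in plain Python:

```python
# Closed-form cross-fitting distance between two AR(1) models with
# coefficients phi1 and phi2.
def d_ar1(phi1, phi2):
    return ((phi1 - phi2) ** 2 / (1.0 - phi1 ** 2)
            + (phi1 - phi2) ** 2 / (1.0 - phi2 ** 2))

d12 = d_ar1(0.1, 0.5)   # models 1 and 2
d23 = d_ar1(0.5, 0.9)   # models 2 and 3
```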
We considered three stationary time series for the comparison between Piccolo's distance and
the cross-fitting measure of similarity. The difference is even more illustrative if we consider
two stationary time series and one non-stationary series, because the dissimilarity between a
non-stationary and a stationary time series is even more extreme than between two stationary
series.
Suppose there are two time series from AR(2) models:
• Xt = φ1Xt−1 + φ2Xt−2 + εt ,
• Yt = φ∗1Yt−1 + φ
∗2Yt−2 + ε
∗t
Based on the definition of cross-fitting measure:
d(1, 2) =σ212 − σ211
σ211+σ221 − σ222
σ222
Then σ211 = E(Xt − Xt)2 = E(ε2t ) = 1 and similarly σ222 = 1.
For σ212 we fit data from the time series based on the first model using the second model:
σ212 = γ(0)− 2(φ∗1γ(1) + φ
∗2γ(2)) + (φ
∗21 + φ
∗22 )γ(0) + 2φ
∗1φ
∗2γ(1)
σ221 = γ∗(0)− 2(φ1γ∗(1) + φ2γ∗(2)) + (φ21 + φ22)γ∗(0) + 2φ1φ2γ∗(1)
Where:
γ(0) =1− φ2
1− φ2 − φ21 − φ21φ2 − φ22 + φ32
γ(1) =φ1
1− φ2 − φ21 − φ21φ2 − φ22 + φ32
γ(2) =φ21 − φ2(1− φ2)
1− φ2 − φ21 − φ21φ2 − φ22 + φ32
and

γ∗(0) = (1 − φ∗2) / (1 − φ∗2 − φ∗1^2 − φ∗1^2 φ∗2 − φ∗2^2 + φ∗2^3)
γ∗(1) = φ∗1 / (1 − φ∗2 − φ∗1^2 − φ∗1^2 φ∗2 − φ∗2^2 + φ∗2^3)
γ∗(2) = (φ∗1^2 + φ∗2(1 − φ∗2)) / (1 − φ∗2 − φ∗1^2 − φ∗1^2 φ∗2 − φ∗2^2 + φ∗2^3)
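The closed forms above can be cross-checked numerically against the Yule–Walker system; a sketch assuming unit noise variance (the function names are ours):

```python
import numpy as np

def ar2_gamma(p1, p2):
    """gamma(0), gamma(1), gamma(2) of an AR(2) with unit innovation variance."""
    D = 1 - p2 - p1**2 - p1**2 * p2 - p2**2 + p2**3
    return np.array([(1 - p2) / D, p1 / D, (p1**2 + p2 * (1 - p2)) / D])

def crossfit_ar2(a, b):
    """Cross-fitting distance between AR(2) models a and b (unit noise variance)."""
    def s2(model, fit):
        # expected squared residual when data from `model` are fit with `fit`
        g = ar2_gamma(*model)
        f1, f2 = fit
        return (g[0] - 2 * (f1 * g[1] + f2 * g[2])
                + (f1**2 + f2**2) * g[0] + 2 * f1 * f2 * g[1])
    # sigma_11^2 = sigma_22^2 = 1, so d(1,2) = (s12 - 1)/1 + (s21 - 1)/1
    return (s2(a, b) - 1) + (s2(b, a) - 1)

# The closed forms agree with solving the Yule-Walker equations directly:
p1, p2 = 0.5, 0.3
A = np.array([[1.0, -p1, -p2],      # g0 - p1*g1 - p2*g2 = 1
              [-p1, 1 - p2, 0.0],   # g1 = p1*g0 + p2*g1
              [-p2, -p1, 1.0]])     # g2 = p1*g1 + p2*g0
print(np.allclose(np.linalg.solve(A, [1.0, 0.0, 0.0]), ar2_gamma(p1, p2)))  # True
```

Fitting a model with its own coefficients reproduces the innovation variance, so the distance vanishes when the two models coincide.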
Figure 2-1. Comparison of three stationary AR(1) time series
2.5 ANOVA Based on Cross-Fitting Measure of Similarity and Permutation Test

In this section we consider Fisher's permutation test [33] for the ANOVA with the cross-fitting distance derived in the previous section. The permutation test allows one to estimate a p-value without any assumptions about the distribution of the test statistic under the null hypothesis.
Let group i, i = 1, ..., K, have observations Yij(t), where j = 1, ..., ni is the subject index and t is the time index. Assume the time series model for series ij is Mij, and let the fitting distance between series ij and kl be d(ij, kl). The test statistic for H0, that there is no group difference, is

T = average between-group measure (BM) / average within-group measure (WM)   (2–5)

where BM is given by

BM = (Σ all between-group measures) / (Σ_{i=1}^{K−1} ni Σ_{j=i+1}^{K} nj)   (2–6)

and

WM = (Σ all within-group measures) / (Σ_{j=1}^{K} Σ_{i=1}^{nj−1} (nj − i))   (2–7)
To estimate the p-value of the test one can use the permutation test. The permutation algorithm is as follows:
1. Calculate the statistic T using equation 2–5.
2. Evaluate the value T∗i for each permutation, i = 1, ..., N.
3. Approximate the p-value as

p = (Σ_{i=1}^{N} I(T∗i > T)) / N
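The test and its permutation p-value can be sketched as follows; this is an illustrative Python implementation of equations 2–5 through 2–7, with the pairwise cross-fitting distances assumed to be precomputed in a symmetric matrix `dist` (function and argument names are ours):

```python
import itertools
import random

def anova_stat(dist, labels):
    """T of equation 2-5: mean between-group distance / mean within-group distance."""
    between, within = [], []
    for i, j in itertools.combinations(range(len(labels)), 2):
        (within if labels[i] == labels[j] else between).append(dist[i][j])
    return (sum(between) / len(between)) / (sum(within) / len(within))

def perm_pvalue(dist, labels, n_perm=999, seed=0):
    """Steps 1-3: fraction of label permutations whose T* exceeds the observed T."""
    rng = random.Random(seed)
    t_obs = anova_stat(dist, labels)
    labels = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # permute group labels, keeping the group sizes fixed
        exceed += anova_stat(dist, labels) > t_obs
    return exceed / n_perm
```

For K groups the label vector simply carries K distinct values; shuffling it preserves the group sizes, and the p-value is the proportion of permuted statistics exceeding the observed one.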
2.6 Conclusions

In this chapter we presented the cross-fitting measure, a novel approach to measuring similarity between time series, and discussed some limitations of previous research. We derived an ANOVA based on the cross-fitting measure of similarity together with a permutation test. The proposed measure can be a useful tool for problems that involve time series clustering.
APPENDIX A
LIKELIHOOD FUNCTION

In this appendix, using results from Newbold [34] for an ARMA(1,1) process W = (W1, ..., Wn), we derive the exact likelihood function for the vector Y = (Y1, ..., Yn).
Consider the model:
Y1 = µ+W1
...
YT = µ+WT
YT+1 = µ+ ω + δYT +WT+1
...
Yn = µ+ ω + δYn−1 +Wn
(A–1)
We assume that Wt, t = 1, ..., n, follows an ARMA(1,1) model. Let

a0 = a0
W0 = W0
at = Wt − φWt−1 + θat−1, (1 ≤ t ≤ n)
(A–2)
Define the vector e as

e = (e∗, en)^T (A–3)

where e∗ = (a0, W0)^T and en = (a1, ..., an)^T.
Then the set of equations (A–2) can be written as

e = LW + Xe∗ (A–4)

where

L = ( 0              0              0              ...  0      0
      0              0              0              ...  0      0
      1              0              0              ...  0      0
      (θ−φ)          1              0              ...  0      0
      θ(θ−φ)         (θ−φ)          1              ...  0      0
      ...
      θ^(n−2)(θ−φ)   θ^(n−3)(θ−φ)   θ^(n−4)(θ−φ)   ...  (θ−φ)  1 )

and

X = ( 1      0
      0      1
      θ      −φ
      θ^2    −θφ
      ...
      θ^n    −θ^(n−1)φ )
Then the model (A–4) can be written in block form as

( e∗ )   ( 0  )       ( I  )
( en ) = ( Ln ) W  +  ( Xn ) e∗ (A–5)

Consider

E(e∗e∗^T) = E ( a0^2    a0W0
                a0W0    W0^2 ) (A–6)
The stationary and invertible ARMA(p, q) process can be represented as

Wt = ψ(B)at = Σ_{j=0}^{∞} ψj a_{t−j}

where ψ(B) = φ^(−1)(B) θ(B).
Let γj denote the autocovariance function, γj = E(Wt Wt−j). The set of equations for the autocovariances is given by [21]:

γ0 = φγ1 + σ^2(1 − θψ1)
γ1 = φγ0 − θσ^2
γk = φγk−1  (k ≥ 2)
(A–7)

Solving (A–7) for γ0 and γ1 we obtain

γ0 = ((1 + θ^2 − 2φθ) / (1 − φ^2)) σ^2
γ1 = (((1 − φθ)(φ − θ)) / (1 − φ^2)) σ^2
γk = φγk−1  (k ≥ 2)
(A–8)
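The solutions above can be checked against the ψ-weight expansion γj = σ^2 Σ_k ψk ψ_{k+j}, where for this parametrization ψ0 = 1 and ψj = φ^(j−1)(φ − θ) for j ≥ 1; a numerical sketch with arbitrarily chosen parameter values:

```python
phi, theta, sigma2 = 0.7, 0.4, 1.0

# psi-weights of W_t = phi*W_{t-1} + a_t - theta*a_{t-1}
K = 2000  # truncation point; psi_j decays geometrically, so the tail is negligible
psi = [1.0] + [phi ** (j - 1) * (phi - theta) for j in range(1, K)]

gamma0_psi = sigma2 * sum(p * p for p in psi)
gamma1_psi = sigma2 * sum(psi[k] * psi[k + 1] for k in range(K - 1))

# closed forms (A-8)
gamma0 = (1 + theta**2 - 2 * phi * theta) / (1 - phi**2) * sigma2
gamma1 = (1 - phi * theta) * (phi - theta) / (1 - phi**2) * sigma2

print(abs(gamma0 - gamma0_psi) < 1e-10, abs(gamma1 - gamma1_psi) < 1e-10)  # True True
```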
Clearly, E(a0^2) = σ^2, and

E(W0^2) = γ0 = ((1 + θ^2 − 2φθ) / (1 − φ^2)) σ^2

For j ≥ h ≥ 0,

Cov(a_{t+h−j}, Wt) = Cov(a_{t+h−j}, Σ_{k=0}^{∞} ψk a_{t−k}) = ψ_{j−h} σ^2

Therefore,

E(a0 W0) = ψ0 σ^2 = σ^2
Then

E(e∗e∗^T) = σ^2 Ω (A–9)

where

Ω = ( 1    1
      1    (1 + θ^2 − 2φθ)/(1 − φ^2) )
Let T be a nonsingular matrix such that TΩT^T = I. Multiplying (A–5) by the matrix

( T  0 )
( 0  I )

yields

( u∗ )   ( 0  )       (    I     )
( un ) = ( Ln ) W  +  ( Xn T^(−1) ) u∗ (A–10)

where u∗ = Te∗ and un = en. In matrix notation we can write

u = LW + Zu∗ (A–11)

where Z is given by

Z = ( 1               0
      0               1
      (θ−φ)           −φ(θ−φ)γ
      θ(θ−φ)          −θφ(θ−φ)γ
      ...
      θ^(n−1)(θ−φ)    −θ^(n−1)φ(θ−φ)γ ) (A–12)

with γ = (1 − φ^2)^(−1/2).
Since E(u∗u∗^T) = σ^2 TΩT^T = σ^2 I, the density function of u is given by

f(u | σ) = (2πσ^2)^(−(n+2)/2) exp( −u^T u / (2σ^2) )

Therefore, the joint density function of W and u∗ is given by

f(W, u∗) = (2πσ^2)^(−(n+2)/2) exp( −S(φ, θ, u∗) / (2σ^2) )
with S(φ, θ, u∗) given by

S(φ, θ, u∗) = (LW + Zu∗)^T (LW + Zu∗)

Let

û∗ = −(Z^T Z)^(−1) Z^T L W

Using the fact that

S(φ, θ, u∗) = S(φ, θ) + (u∗ − û∗)^T Z^T Z (u∗ − û∗)

where

S(φ, θ) = (LW + Zû∗)^T (LW + Zû∗),

the joint density function can be written as

f(W, u∗) = f(W | φ, θ, σ) f(u∗ | W, φ, θ, σ)

Therefore, the marginal density function is given by

f(W | φ, θ, σ) = (2πσ^2)^(−n/2) |Z^T Z|^(−1/2) exp( −S(φ, θ) / (2σ^2) ) (A–13)

One can rewrite

S(φ, θ) = (LW − Z(Z^T Z)^(−1) Z^T LW)^T (LW − Z(Z^T Z)^(−1) Z^T LW) = W^T Γ W (A–14)

where

Γ = L^T (I − Z(Z^T Z)^(−1) Z^T) L

Consider model (A–1). In matrix form it can be rewritten as

W = AY − ω (A–15)
where
A =
1 0 0 ... ... 0 0
0 1 0 ... ... 0 0
...
0 ... ... −δ 1 ... 0
...
0 0 0 ... ... −δ 1
(A–16)
Since A is unit lower triangular, det(A) = 1, and the Jacobian of the transformation from Y to W is |det(A)| = 1. (A–17)
Thus, the exact likelihood for the vector Y = (Y1, ..., Yn) is of the form

p(Y | φ, θ, σ) = (2πσ^2)^(−n/2) |Z^T Z|^(−1/2) exp( −S∗(φ, θ) / (2σ^2) ) (A–18)

where

S∗(φ, θ) = (AY − ω)^T Γ (AY − ω)

and

Z^T Z = ( 1 + (θ−φ)^2 (1−θ^(2n))/(1−θ^2)        −(θ−φ)^2 φγ (1−θ^(2n))/(1−θ^2)
          −(θ−φ)^2 φγ (1−θ^(2n))/(1−θ^2)        1 + (θ−φ)^2 φ^2 γ^2 (1−θ^(2n))/(1−θ^2) )

Therefore, the determinant is given by

|Z^T Z| = 1 + (θ−φ)^2 (1−θ^(2n)) / ((1−θ^2)(1−φ^2))
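The determinant formula can be verified numerically by assembling Z explicitly from (A–12) and computing the determinant directly; a sketch with arbitrarily chosen φ, θ, and n:

```python
import numpy as np

phi, theta, n = 0.6, 0.3, 8
g = (1 - phi**2) ** -0.5           # gamma in (A-12)

rows = [[1.0, 0.0], [0.0, 1.0]]    # the identity block for (a0, W0)
for k in range(n):                 # rows for a_1, ..., a_n
    rows.append([theta**k * (theta - phi),
                 -theta**k * phi * (theta - phi) * g])
Z = np.array(rows)

det_direct = np.linalg.det(Z.T @ Z)
det_closed = 1 + (theta - phi)**2 * (1 - theta**(2 * n)) / ((1 - theta**2) * (1 - phi**2))
print(np.isclose(det_direct, det_closed))  # True
```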
APPENDIX B
CROSS-FITTING DISTANCE

Consider two time series from AR(2) models:
• Xt = φ1Xt−1 + φ2Xt−2 + εt
• Yt = φ∗1Yt−1 + φ∗2Yt−2 + ε∗t
Then

σ11^2 = E(Xt − X̂t)^2 = E(εt^2) = 1

For σ12^2 we fit data from model (1) using model (2):

σ12^2 = E(Xt − X̂t)^2 = E(Xt − φ∗1Xt−1 − φ∗2Xt−2)^2
      = E(Xt^2) − 2E(Xt(φ∗1Xt−1 + φ∗2Xt−2)) + E(φ∗1Xt−1 + φ∗2Xt−2)^2
To find Cov(Xt+h, Xt) one can use the difference equations

γ(h) − φ1γ(h − 1) − φ2γ(h − 2) = 0,  h ≥ max(p, q + 1)

with initial conditions

γ(h) − φ1γ(h − 1) − φ2γ(h − 2) = σε^2 Σ_{j=h}^{q} θj ψ_{j−h},  0 ≤ h < max(p, q + 1)

where θj are the coefficients of the MA part: θ0 = 1, θj = 0 for j ≥ 1.
To find ψj one can use the equation

(ψ0 + ψ1 x + ψ2 x^2 + ...)(1 − φ1 x − φ2 x^2 − ...) = 1 + θ1 x + θ2 x^2 + ...

The first few values:

ψ0 = 1
ψ1 − φ1ψ0 = θ1
ψ2 − φ1ψ1 − φ2ψ0 = θ2
And from the system of equations we get:

ψ0 = 1
ψ1 = φ1
ψ2 = φ1^2 + φ2

Thus γ(0), γ(1), γ(2) can be obtained from the system of equations

φ1γ(1) + φ2γ(2) + 1 = γ(0)
φ1γ(0) + φ2γ(1) = γ(1)
φ1γ(1) + φ2γ(0) = γ(2)

which gives

E(Xt^2) = γ(0) = (1 − φ2) / (1 − φ2 − φ1^2 − φ1^2 φ2 − φ2^2 + φ2^3)
E(Xt Xt−1) = γ(1) = φ1 / (1 − φ2 − φ1^2 − φ1^2 φ2 − φ2^2 + φ2^3)
E(Xt Xt−2) = γ(2) = (φ1^2 + φ2(1 − φ2)) / (1 − φ2 − φ1^2 − φ1^2 φ2 − φ2^2 + φ2^3)

And finally,

σ12^2 = γ(0) − 2(φ∗1 γ(1) + φ∗2 γ(2)) + (φ∗1^2 + φ∗2^2) γ(0) + 2 φ∗1 φ∗2 γ(1)
σ21^2 = γ∗(0) − 2(φ1 γ∗(1) + φ2 γ∗(2)) + (φ1^2 + φ2^2) γ∗(0) + 2 φ1 φ2 γ∗(1)
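As an illustrative check of the closed form for σ12^2, one can simulate a long series from model (1), compute the mean squared residual under the coefficients of model (2), and compare it with the γ(.) expressions above (the coefficient values are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
p1, p2 = 0.5, 0.3          # model (1), generates the data
q1, q2 = 0.2, -0.1         # model (2), used only for fitting

# simulate a stationary AR(2) with unit-variance innovations (long burn-in)
N, burn = 400_000, 1_000
x = np.zeros(N + burn)
eps = rng.standard_normal(N + burn)
for t in range(2, N + burn):
    x[t] = p1 * x[t - 1] + p2 * x[t - 2] + eps[t]
x = x[burn:]

# empirical sigma_12^2: mean squared residual when fitting with model (2)
resid = x[2:] - q1 * x[1:-1] - q2 * x[:-2]
emp = np.mean(resid**2)

# closed form from the gamma(.) expressions
D = 1 - p2 - p1**2 - p1**2 * p2 - p2**2 + p2**3
g0, g1 = (1 - p2) / D, p1 / D
g2 = (p1**2 + p2 * (1 - p2)) / D
theo = g0 - 2 * (q1 * g1 + q2 * g2) + (q1**2 + q2**2) * g0 + 2 * q1 * q2 * g1

print(abs(emp - theo) / theo < 0.05)  # True; sampling error shrinks like 1/sqrt(N)
```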
APPENDIX C
R CODE FOR EVALUATING MLE

In this appendix we present R code for calculating the exact likelihood function and the MLE for the improved model.
rm(list = ls())
ciu <- read.table("CIUdata.txt")
ciu <- as.matrix(ciu)

p.val <- function(TS) {
    Y <- TS
    ar.fit <- function(Y) {
        phi <- seq(-0.99, 0.99, 0.01)
        n <- length(Y)
        tcp <- 4
        ## Z matrix of (A-12), with x = theta and y = phi
        Zm <- function(x, y, n) {
            Zn <- matrix(rep(0, 2 * n), nrow = n)
            a <- c(0:(n - 1))
            Zn[, 1] <- (x^a) * (x - y)
            Zn[, 2] <- (-y) * x^a * (x - y) * (1 - y^2)^(-0.5)
            Z <- rbind(diag(1, 2, 2), Zn)
            return(Z)
        }
        ## L matrix of (A-4)
        Lm <- function(x, y = 0, n) {
            Ln <- matrix(rep(0, n * n), nrow = n)
            b <- c(0:(n - 2))
            Ln[, 1] <- c(1, (x^b) * (x - y))
            for (i in 1:(n - 1)) {
                Ln[(i + 1):n, i + 1] <- Ln[1:(n - i), 1]
            }
            L <- rbind(rep(0, n), rep(0, n), Ln)
            return(L)
        }
        ## determinant of Z'Z in closed form
        Determ <- function(x, y, n) {
            1 + (x - y)^2 * (1 - x^(2 * n))/((1 - x^2) * (1 - y^2))
        }
        ## inverse of Z'Z in closed form
        InvM <- function(x, y, n) {
            Inv11 <- 1 + (x - y)^2 * y^2 * (1 - x^(2 * n))/((1 - y^2) * (1 - x^2))
            Inv22 <- 1 + (x - y)^2 * (1 - x^(2 * n))/(1 - x^2)
            Inv12 <- (x - y)^2 * y * (1 - x^(2 * n))/((1 - y^2)^(1/2) * (1 - x^2))
            Determ(x, y, n)^(-1) * matrix(c(Inv11, Inv12, Inv12, Inv22), ncol = 2)
        }
        ## profile log-likelihood in phi (theta fixed at 0)
        likl <- function(x) {
            phi <- x
            I <- diag(1, n + 2, n + 2)
            Z <- Zm(0, phi, n)
            L <- Lm(0, phi, n)
            Det <- Determ(0, phi, n)
            Inv <- InvM(0, phi, n)
            B <- cbind(rep(1, n), c(rep(0, 4), rep(1, 9)), c(rep(0, tcp), Y[tcp:12]))
            Gamma <- t(L) %*% (I - Z %*% Inv %*% t(Z)) %*% L
            betaHat <- solve(t(B) %*% Gamma %*% B) %*% t(B) %*% Gamma %*% Y
            S <- t(Y - B %*% betaHat) %*% Gamma %*% (Y - B %*% betaHat)
            lgl <- -n/2 * log(S) - 1/2 * log(Det)
            return(lgl)
        }
        ## grid search over phi, then refit at the maximizer
        Lik <- sapply(phi, likl)
        phi <- phi[which.max(Lik)]
        I <- diag(1, n + 2, n + 2)
        Z <- Zm(0, phi, n)
        L <- Lm(0, phi, n)
        Det <- Determ(0, phi, n)
        Inv <- InvM(0, phi, n)
        B <- cbind(rep(1, n), c(rep(0, 4), rep(1, 9)), c(rep(0, 4), Y[4:12]))
        Gamma <- t(L) %*% (I - Z %*% Inv %*% t(Z)) %*% L
        temp <- solve(t(B) %*% Gamma %*% B)
        betaHat <- temp %*% t(B) %*% Gamma %*% Y
        S <- t(Y - B %*% betaHat) %*% Gamma %*% (Y - B %*% betaHat)
        s2 <- as.numeric(sqrt(S/n))
        CovInv <- solve(temp[2:3, 2:3])
        TStat <- betaHat[2:3] %*% CovInv %*% betaHat[2:3]/s2^2
        lik <- c(TStat, phi, betaHat, s2)
        return(lik)
    }
    coeff <- as.numeric(ar.fit(TS))
    ## parametric bootstrap: simulate under the fitted AR(1) null
    n.col <- 100
    n.row <- 13
    s.data <- matrix(rep(0, n.col * n.row), nrow = n.row)
    for (i in 1:n.col) {
        s.data[, i] <- arima.sim(list(order = c(1, 0, 0), ar = coeff[2]),
                                 n = n.row) * coeff[6]
    }
    sim.coef <- apply(s.data, 2, ar.fit)
    p <- sum(coeff[1] < sim.coef[1, ])/100
    res <- c(p, coeff)
    return(res)
}
APPENDIX D
MATLAB CODE FOR EVALUATING MLE

The MATLAB code was written by Dr. Sam Wu and Oleksandr Savenkov.
% FitData
DisplayFitOnly
% DisplayFitAndData
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function FitData
Y = InputData;
T = 4; n = 13; Times = 1:13;
llRes = [];
[ParEst, CovOmegaDelta, TestStat, Pvalues] = FitSSdata(Y, T);
[StdC, Pvalue] = EvalCstat(Y);
AllRes = [ParEst, reshape(CovOmegaDelta, 1, 4), TestStat, Pvalues, StdC, Pvalue];
ParEst
fid2 = fopen('AnalysisResults_Tmp.txt','w');
[r,c] = size(AllRes);
for ii = 1:r
for jj=1:c
fprintf(fid2, ['%9.4f '], AllRes(ii, jj));
end
fprintf(fid2, '\n');
end
fclose(fid2);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function DisplayFitOnly
T = 4; n = 13; Times = 1:13;
% BetaHat = [31.92 -13.15 0.36];
% BetaHat = [29.42 18.95 -0.60];
BetaHat = [30.42 0.001 0.005];
Mu = BetaHat(1);
Omega = BetaHat(2);
Delta = BetaHat(3);
Yhat = zeros(n, 1); Yhat(1:T) = Mu;
for t = (T+1):n
Yhat(t) = Mu + Omega + Delta * Yhat(t-1);
end
figure(3)
plot(Times, Yhat, 'b-', 'MarkerSize',4, 'LineWidth',1);
ylim([0, 100]);
set(gca,'ytick',[0 25 50 75 100]);
set(gca,'yticklabel',[' 0'; ' 25'; ' 50'; ' 75'; '100']);
ylabel('Percent');
set(gca,'xtick',[4 8 12]);
xlabel('Sessions')
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function DisplayFitAndData
Y = InputData;
T = 4; n = 13; Times = 1:13;
load AnalysisResults_Tmp.txt;
Est = AnalysisResults_Tmp;
BetaHat = Est(4:6);
Mu = BetaHat(1);
Omega = BetaHat(2);
Delta = BetaHat(3);
Yhat = zeros(n, 1); Yhat(1:T) = Mu;
for t = (T+1):n
Yhat(t) = Mu + Omega + Delta * Yhat(t-1);
end
figure(3)
plot(Times, Y, 'b^', Times, Yhat, 'b-', 'MarkerSize',4, 'LineWidth',1);
ylim([0, 100]);
p1 = Est(13);
p1 = round(p1*1e4)/1e4;
p2 = Est(15);
p2 = round(p2*1e4)/1e4;
title( strcat('p = ', num2str(p1, 4), ' & ', num2str(p2, 4) ) );
set(gca,'ytick',[0 25 50 75 100]);
set(gca,'yticklabel',[' 0'; ' 25'; ' 50'; ' 75'; '100']);
ylabel('Percent');
set(gca,'xtick',[4 8 12]);
xlabel('Sessions')
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [ParEst, CovOmegaDelta, TestStat, Pvalues] = FitSSdata(Y, T)
%% Y -- single subject observation series
%% T -- time of intervention
%%
n = length(Y);
B = [ones(n, 1) [zeros(T,1); ones(n-T, 1)] [zeros(T,1); Y(T:(n-1))]];
[ParEst, Cov_Beta] = GridSearch(B, Y);
CovOmegaDelta = Cov_Beta(2:3, 2:3);
TestStat = ParEst(5:6) * inv(CovOmegaDelta) * ParEst(5:6)';
Pvalue1 = 1-chi2cdf(TestStat, 2);
Pvalue2 = 0; %% EvalPvalue2(ParEst, TestStat, T, n);
Pvalues = [Pvalue1, Pvalue2];
function Pvalue2 = EvalPvalue2(ParEst0, TestStat0, T, n)
Rep = 4e3;
Fi = ParEst0(1);
Theta = ParEst0(2);
s = sqrt(ParEst0(3));
Mu = ParEst0(4);
Pvalue2 = 0;
for i = 1:Rep
Noise = simarma(Fi, -Theta, n, s^2);
Y = Mu + Noise';
B = [ones(n, 1) [zeros(T,1); ones(n-T, 1)] [zeros(T,1); Y(T:(n-1))]];
[ParEst, Cov_Beta] = GridSearch(B, Y);
CovOmegaDelta = Cov_Beta(2:3, 2:3);
TestStat = ParEst(5:6) * inv(CovOmegaDelta) * ParEst(5:6)';
Pvalue2 = Pvalue2 + (TestStat>TestStat0)/Rep;
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% This new version was created on Oct 17, 2011
%%
function [ParEst, Cov_Beta] = GridSearch(B, Y);
tmp = [];
n = length(Y);
for Fi = -0.99:0.01:0.99
for Theta = 0 %% -.99:0.01:0.99
%% s2= mean(Residuals.^2) * (1-Fi^2) / (1 + Theta^2 - 2*Theta*Fi);
[Z, L, DetZpZ, Gamma] = CreateZLGamma(Fi,Theta,n);
BetaHat = inv(B'*Gamma*B) * B' * Gamma * Y;
Residuals = (Y - B*BetaHat);
s2= Residuals' * Gamma * Residuals /n;
ll = -n/2*log(2*pi*s2) -log(DetZpZ)/2 - n/2;
%% ll = -n/2*log(2*pi*s2);
tmp = [tmp; [Fi Theta s2 BetaHat' ll]];
end
end
a = sortrows(tmp, -7);
ParEst = a(1, 1:6);
[Z, L, DetZpZ, Gamma] = CreateZLGamma(ParEst(1), ParEst(2),n);
Tmp = inv(B'*Gamma*B);
BetaHat = Tmp * B' * Gamma * Y;
Residuals = (Y - B*BetaHat);
s2= Residuals' * Gamma * Residuals /n;
Cov_Beta = Tmp * s2;
function [Z, L, DetZpZ, Gamma] = CreateZLGamma(fi,theta,n)
a = (0:(n-1))';
b = (theta .^ a) * (theta - fi);
Z = [[1; 0; b] [0; 1; b*(-fi)*((1-fi^2)^(-0.5))]];
tmp = [0; 1; b];
L = [];
for i = 1:n
tmp = [0; tmp(1:(n+1))];
L = [L tmp];
end
w = (theta - fi)^2 * (1 - theta^(2*n))/(1 - theta^2);
r = 1/sqrt(1-fi^2);
DetZpZ = 1 + w/(1-fi^2);
InvZpZ = [1+w*fi^2*r^2, w*fi*r; w*fi*r, 1+w]/DetZpZ;
Gamma = L' * (eye(n+2) - Z * InvZpZ * Z') * L;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function y=simarma(fi,theta,n,s2,seed)
% y=simarma(fi,theta,n,s2) simulates ARMA process,
% fi vector fi paramaters, theta vector of theta parameters
% n observations
% s2 WN variance
% if fifth argument seed given, seed is the random generator seed
%
% The parametrization is under the Brockwell notations for theta and fi!!!!!!
if nargin==5
randn('seed',seed);
end
y=filter([1 theta],[1 -fi],randn(1,n+20));
y=y(21:n+20)*sqrt(s2);
function [StdC, Pvalue] = EvalCstat(x)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% This program evaluates the C statistic of Tryon (1982) %
% %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
n = length(x); a = x(2:n)-x(1:(n-1)); b = x - mean(x);
sc = sqrt((n-2)/(n-1)/(n+1));
c = 1 - sum(a.^2)/2/sum(b.^2);
StdC = c/sc;
Pvalue = 2*(1-normcdf(StdC));
REFERENCES

[1] Robey R, Schultz M, Crawford A, Sinner C. Review: Single-subject clinical-outcome research: designs, data, effect sizes, and analyses. Aphasiology 1999; 13(6):445–473.

[2] Ottenbacher K. Interrater agreement of visual analysis in single-subject decisions: Quantitative review and analysis. American Journal on Mental Retardation 1993.

[3] Kazdin A. Single-case research designs: Methods for clinical and applied settings. Oxford University Press, New York, 1982.

[4] Bloom M, Fischer J, Orme J, et al. Evaluating practice. Allyn & Bacon, 1982.

[5] Matyas T, Greenwood K. Visual analysis of single-case time series: Effects of variability, serial dependence, and magnitude of intervention effects. Journal of Applied Behavior Analysis 1990; 23(3):341.

[6] Johnson M, Ottenbacher K. Trend line influence on visual analysis of single-subject data in rehabilitation research. Disability & Rehabilitation 1991; 13(2):55–59.

[7] Ottenbacher K, Cusick A. An empirical investigation of interrater agreement for single-subject data using graphs with and without trend lines. Journal of the Association for Persons with Severe Handicaps 1991.

[8] Stocks J, Williams M. Evaluation of single subject data using statistical hypothesis tests versus visual inspection of charts with and without celeration lines. Journal of Social Service Research 1995; 20(3-4):105–126.

[9] Ottenbacher K. Visual inspection of single-subject data: An empirical analysis. Mental Retardation 1990.

[10] Krishef C. Fundamental approaches to single subject design and analysis. Krieger Pub. Co., 1991.

[11] Tryon W. A simplified time-series analysis for evaluating treatment interventions. Journal of Applied Behavior Analysis 1982; 15(3):423.

[12] Phillips J. Serially correlated errors in some single-subject designs. British Journal of Mathematical and Statistical Psychology 1983; 36(2):269–280.

[13] Toothaker L, Banz M, Noble C, Camp J, Davis D. N=1 designs: The failure of ANOVA-based tests. Journal of Educational and Behavioral Statistics 1983; 8(4):289–309.

[14] Sharpley C, Alavosius M. Autocorrelation in behavioral data: An alternative perspective. 1988.

[15] Suen H, Lee P, Owen S. Effects of autocorrelation on single-subject single-facet crossed-design generalizability assessment. Behavioral Assessment 1990; 12:305–315.

[16] Gottman J. Time-series analysis: A comprehensive introduction for social scientists. Cambridge University Press, Cambridge, 1981.

[17] Crosbie J. Interrupted time-series analysis with brief single-subject data. Journal of Consulting and Clinical Psychology 1993; 61(6):966.

[18] Rosner B, Munoz A, Tager I, Speizer F, Weiss S. The use of an autoregressive model for the analysis of longitudinal data in epidemiologic studies. Statistics in Medicine 1985; 4(4):457–467.

[19] Rosner B, Munoz A. Autoregressive modelling for the analysis of longitudinal data with unequally spaced examinations. Statistics in Medicine 1988; 7(1-2):59–71.

[20] Box G, Tiao G. Intervention analysis with applications to economic and environmental problems. Journal of the American Statistical Association 1975; 70(349):70–79.

[21] Box G, Jenkins G, Reinsel G. Time series analysis: Forecasting and control. Prentice Hall, 1994.

[22] DasGupta A. Asymptotic theory of statistics and probability. Springer Verlag, 2008.

[23] Ihaka R, Gentleman R. R: A language for data analysis and graphics. Journal of Computational and Graphical Statistics 1996; 5(3):299–314.

[24] Davis G, Wilcox M. Adult aphasia rehabilitation: Applied pragmatics. College-Hill Press, San Diego, CA, 1985.

[25] Zucker D, Ruthazer R, Schmid C. Individual (n-of-1) trials can be combined to give population comparative treatment effect estimates: methodologic considerations. Journal of Clinical Epidemiology 2010; 63(12):1312–1323.

[26] Golay X, Kollias S, Stoll G, Meier D, Valavanis A, Boesiger P. A new correlation-based fuzzy logic clustering algorithm for fMRI. Magnetic Resonance in Medicine 1998; 40(2):249–260.

[27] Liao W, et al. Clustering of time series data – a survey. Pattern Recognition 2005; 38(11):1857–1874.

[28] Kalpakis K, Gada D, Puttagunta V. Distance measures for effective clustering of ARIMA time-series. Proceedings of the IEEE International Conference on Data Mining, 2001; 273–280.

[29] Alonso A, Berrendero J, Hernandez A, Justel A. Time series clustering based on forecast densities. Computational Statistics & Data Analysis 2006; 51(2):762–776.

[30] Piccolo D. A distance measure for classifying ARIMA models. Journal of Time Series Analysis 1990; 11(2):153–164.

[31] Maharaj E. Cluster of time series. Journal of Classification 2000; 17(2):297–314.

[32] Corduas M, Piccolo D. Time series clustering and classification by the autoregressive metric. Computational Statistics & Data Analysis 2008; 52(4):1860–1872.

[33] Lehmann E, Romano J. Testing statistical hypotheses. Springer Verlag, 2005.

[34] Newbold P. The exact likelihood function for a mixed autoregressive-moving average process. Biometrika 1974; 61(3):423–426.
BIOGRAPHICAL SKETCH
Oleksandr Savenkov was born in 1983 in Ukraine. He obtained his degree in financial mathematics from Donetsk National University in 2004. After graduation he worked for two years as an economist at Raiffeisen Bank Aval. Oleksandr joined the Department of Statistics at the University of Florida as a graduate student in 2006. During his studies he was a teaching assistant for several undergraduate and graduate classes and a research assistant on several scientific projects.