NOVEL METHODS FOR TIME SERIES DATA IN CLINICAL STUDIES
By
OLEKSANDR V. SAVENKOV
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
2012
© 2012 Oleksandr V. Savenkov
This dissertation is dedicated to the memory of Mark C.K. Yang
ACKNOWLEDGMENTS
There are many people without whose support and encouragement I would not be able to
complete this degree.
First and foremost, I would like to thank my advisor, Professor Sam Wu. I am grateful to
Sam for dedicating so much time and effort to my dissertation. I would also like to thank my
first PhD advisor, Professor Mark Yang, who taught me to work hard. It is a real sadness that
he passed away in 2010.
I would like to thank Professor Malay Ghosh and Professor Kshitij Khare for serving on my
PhD committee and giving me valuable advice.
I am grateful to my collaborators and mentors, Professor Panos Pardalos, who is also on my
PhD committee, and Frank Skidmore, for introducing exciting new areas of research to me.
I am very grateful to my friends and mentors, Professor Vladimir Boginski and Professor
Sergiy Butenko, for their invaluable help and encouragement. Thank you.
I would like to thank my peers and the faculty members at the Department of Statistics.
Finally, I would like to thank my family. Although they are thousands of miles away from
me, I feel their love and support every day.
TABLE OF CONTENTS
page
ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT
CHAPTER
1 INTERVENTION ANALYSIS FOR SINGLE-SUBJECT STUDIES
1.1 Introduction
1.2 An Improved Model
1.3 The Test Procedure
1.4 Simulation Studies
1.5 Case Study
1.6 Conclusions

2 ANALYSIS OF VARIANCE BASED ON CROSS-FITTING MEASURE OF SIMILARITY AND PERMUTATION TEST
2.1 Introduction
2.2 Literature Review
2.3 Measure of Distance by ARMA Coefficients
2.4 The Cross-Fitting Measure of Similarity
2.5 ANOVA Based on Cross-Fitting Measure of Similarity and Permutation Test
2.6 Conclusions
APPENDIX
A LIKELIHOOD FUNCTION
B CROSS-FITTING DISTANCE
C R CODE FOR EVALUATING MLE
D MATLAB CODE FOR EVALUATING MLE

REFERENCES
BIOGRAPHICAL SKETCH
LIST OF TABLES
Table

1-1 MSE and Bias for β = (30, 0.8, 0.6)
1-2 MSE and Bias for β = (30, 0, 0)
1-3 Case study results
LIST OF FIGURES
Figure

1-1 Several Common Mean Functions
1-2 Estimated mean functions for an AR process
1-3 Estimated mean functions for an ARMA process
1-4 Intensive CILT
1-5 Distributed CILT
1-6 Intensive PACE
1-7 Distributed PACE
2-1 Comparison of three stationary AR(1) time series
Abstract of Dissertation Presented to the Graduate School of the University of Florida in
Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
NOVEL METHODS FOR TIME SERIES DATA IN CLINICAL STUDIES
By
Oleksandr V. Savenkov
August 2012
Chair: Samuel S. Wu
Major: Statistics
Single-subject or n-of-1 research designs have been widely used to evaluate treatment
interventions. Many statistical procedures, such as split-middle trend lines, regression trend
lines, Shewart-chart trend lines, binomial tests, randomization tests, and Tryon's C statistic,
have been used to analyze single-subject data, but they fail to control the Type I error rate
because the observations form a serially dependent time series.
In this work we present an improved intervention analysis model for the dynamic characteristics
of an intervention effect in a short series of single-subject data. Several potential difficulties
may arise; one of them is that the series of data can be relatively short. To address this issue
we derive the exact likelihood function, which allows us to obtain estimates more directly than
approximate algorithms, which can fail to converge for short series. Because we consider short
series, the chi-squared approximation to the null distribution of the test statistic may not be
valid, so we develop a hypothesis testing procedure for the treatment effect. The methods are
illustrated with a real clinical trial on constraint-induced language therapy for aphasia patients.
In the second part of this work we provide an overview of several approaches to measuring
the similarity or dissimilarity between time series. The main goal is to develop a framework for
investigating group differences in time series. This problem relies on the ability to measure the
distance between time series. We develop a novel approach to measuring the similarity between
time series, the cross-fitting measure. The proposed measure can also be applied to time series
clustering problems.
CHAPTER 1
INTERVENTION ANALYSIS FOR SINGLE-SUBJECT STUDIES
1.1 Introduction
Single-subject designs have been used widely for decades, particularly in the behavioral
sciences, but statistical analysis of such studies remains problematic, primarily because such
data are generally autocorrelated and the observation series is short. The purpose of this work
is to introduce a new analysis model that improves the current methods of drawing conclusions
from single-subject studies. In a typical single-subject design, repeated observations are made
on the lone subject during a baseline period and a subsequent treatment period. The baseline
measurements are intended to establish a stable reference point, and also, in cases of recent
injury or other affliction from which spontaneous improvement might be expected, to estimate
the rate of improvement prior to treatment. After the subject is exposed to the intervention,
the observations continue in an attempt to establish corresponding treatment-period values.
Investigators then desire to test two null hypotheses: 1) there is no difference in overall
outcome between the baseline and treatment periods, and 2) there is no difference in the
rate of change in outcome between the two periods. When spontaneous improvement before
treatment does not occur (i.e., the slope of the baseline data is non-positive), rejection of the
first hypothesis is enough to show that the treatment is effective. In other cases, rejection of
both hypotheses is required [1].
Visual analysis is the traditional and still widely used method of approaching such
studies [1]. The data are plotted across time, with a vertical line separating the baseline and
treatment periods. Investigators then eyeball the data and make informal conclusions about
the effectiveness of the intervention. As one might imagine, this method is highly subjective
and hence unreliable, with one large meta-analysis finding an overall inter-rater agreement
coefficient of only 0.58 [2]. To address this concern, researchers have proposed various tools to
aid visual analysis and make its conclusions more robust. In the split-middle trend line method
[3], the baseline data are divided in half and a line is drawn through their respective medians.
The same procedure is applied to the treatment data, and the level and slope of the two lines
are qualitatively compared. The celeration trend line method [4] is identical, except the lines
extend through means rather than medians. Separate regression lines through the baseline and
treatment data also are sometimes plotted and visually compared. None of these methods,
however, has been shown to offer much improvement in the reliability of visual analysis, with
Type I error rates remaining as high as 84% [5–9].
Methods that are more statistically oriented also often are used. The Shewart procedure
[4, 10] sets reference lines two standard deviations above and below the mean of the baseline
data. If two successive data points in the treatment period fall outside those bounds, one infers
that significant change has occurred. Binomial tests compare the proportion of treatment
points that fall above and below the baseline split-median or celeration line. T-tests are
sometimes performed between the baseline and treatment means, and Tryon’s C-statistic [11] is
frequently used to compare slopes. The autocorrelation of single-subject data causes such tests
to be invalid, however, and Type I error rates remain unacceptably high when autocorrelation is
present [12–15].
Gottman [16] defined interrupted-time-series analysis (ITSA) for a stream of serially
dependent observations across two experimental periods. Based on fitting autoregressive
parameters, the method yields three tests: an F test of the null hypothesis that no overall
change has occurred between the two periods, and t-tests for differences in means and slopes.
Crosbie [17] showed that the ITSA method underestimated positive autocorrelation and
hence could not maintain Type I error control when the baseline and treatment observations
were relatively few (less than 50 observations per time period), making the ITSA method
inapplicable to most clinical settings. Crosbie proposed a corrected version of ITSA, called
ITSACORR, which could handle shorter time series and has since been widely employed.
ITSACORR, however, fails to control Type I error for autocorrelations higher than 0.6 and
sample sizes less than 20, and it also assumes that the intervention effect is a linear trend
change from baseline, which is not appropriate in many applications, such as the examples
we give in our illustrative example section. In addition, Rosner, Munoz, et al. [18] and Rosner
and Munoz [19] present autoregressive models that use regression methods to relate change in
response variables to explanatory variables.
In this work, we consider an improved intervention analysis model [20] for dynamic charac-
teristics of an intervention effect in a short series of single-subject data. The statistical model
is presented in Section 1.2. The maximum likelihood estimates are derived and a hypothesis
testing procedure is proposed in Section 1.3. The methods are illustrated with a real clinical
trial on constraint-induced language therapy for aphasia patients in Section 1.5.
1.2 An Improved Model
Suppose the data Yt, t = 1, ..., n are available as a series observed at equal time intervals.
We assume that the time series is subject to an intervention at time T and that T is known.
The part of the time series Yt, t ≤ T is the preintervention data.
In pioneering work, Box and Tiao [20] considered the following intervention model
Yt = f (t) + Nt , (1–1)
where
• Yt is the observed outcome series
• f (t) is the unknown mean function associated with the known intervention time
• Nt is random noise.
It is assumed that the noise Nt follows an autoregressive moving average (ARMA) model:

φ(B)Nt = θ(B)at

where

• B is the backward shift operator [21]
• at, t = 1, ..., n is a sequence of independent random variables with N(0, σ²) distribution
• φ(B) = 1 − φ1B − φ2B² − ... − φpB^p
• θ(B) = 1 − θ1B − θ2B² − ... − θqB^q
In this work we consider the case of a single intervention. We should mention that such models
are not restricted to a single intervention: several mean functions can be combined for more
sophisticated intervention effects. There are several possible response patterns that depend on
the choice of the mean function f(t). Among others, we can consider the following mean functions:

• f(t) = ωB It
• f(t) = (ωB/(1 − δB)) It
• f(t) = (ωB/(1 − B)) It

The indicator function It is given by

It = 0, if t ≤ T;  1, otherwise. (1–2)

Short-term intervention effects can be specified using the pulse function

Pt = 1, if t = T;  0, otherwise.

Clearly, (1 − B)It = Pt. Thus, without loss of generality, we consider models with the step
function It.
In this work, we assume that f(t) follows a first-order dynamic model for intervention,
with the transfer function of the form

f(t) = (ωB/(1 − δB)) It, (1–3)
Figure 1-1. Several Common Mean Functions
where ω, δ are unknown parameters, with 0 < δ < 1, and B is the backward shift operator
[21]. This implies that

f(t) = 0, if t ≤ T;  ω(1 − δ^(t−T))/(1 − δ), if t > T. (1–4)

Such a transfer function model is appropriate when the response is not expected to be
immediate, an assumption that seems reasonable in clinical studies.
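As a quick illustration, the mean function (1–4) can be evaluated directly. The sketch below (plain Python; the parameter values T = 10, ω = 2, δ = 0.6 are hypothetical) shows the gradual rise from 0 at the intervention toward the asymptote ω/(1 − δ):

```python
# Mean function f(t) from (1-4): zero through the intervention time T,
# then a first-order rise toward the asymptote omega / (1 - delta).
def mean_function(t, T, omega, delta):
    if t <= T:
        return 0.0
    return omega * (1.0 - delta ** (t - T)) / (1.0 - delta)

# Hypothetical parameter values for illustration only.
T, omega, delta = 10, 2.0, 0.6
f = [mean_function(t, T, omega, delta) for t in range(1, 61)]
```

Note that f(T + 1) = ω is the first post-intervention value, and f(t) approaches ω/(1 − δ) = 5 as t grows, matching (1–4).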
In general, it is desirable to choose the form of the transfer function based on information
about the mechanisms that cause the change. We also assume that Nt follows an ARMA(1, 1)
model with mean 0; note that higher-order models for Nt can also be included. For our case, let

Nt = ((1 − θB)/(1 − φB)) at (1–5)

which implies that

Nt − φNt−1 = at − θat−1. (1–6)
Then the model for the time series Yt has the form
Y1 = µ+ N1
Y2 = µ+ N2
...
YT = µ+ NT
YT+1 = µ+ ω + NT+1
...
Yn = µ+ ω(1− δn−T )/(1− δ) + Nn.
In this model the first-order dynamic function is applied to the unknown mean function,
which makes it hard to derive MLEs, since the parameters enter the model non-linearly.
Under model (1–1), we have
Yt = f (t) + Nt = ω + δf (t − 1) + Nt = ω + δYt−1 + (Nt − δNt−1), t ≥ T + 1
To simplify the previous model, we use an ARMA(1,1) time series (W1, ...,Wn) to replace
the terms (N1,N2, ...,NT ;NT+1 − δNT ,NT+2 − δNT+1, ...,Nn − δNn−1). In other words, the
original intervention model (1–1) can be written in the form of (1–7) below:
Y1 = µ+W1
Y2 = µ+W2
...
YT = µ+WT
YT+1 = µ+ ω + δYT +WT+1
...
Yn = µ+ ω + δYn−1 +Wn.
(1–7)
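Model (1–7) is straightforward to simulate, which is useful for checking estimation code. A minimal sketch (Python/NumPy; the parameter values are hypothetical, and the crude ARMA start-up below is a simplifying assumption rather than the exact treatment of initial values used in this work):

```python
import numpy as np

# Simulate the intervention model (1-7): ARMA(1,1) noise W_t, baseline mean mu
# up to time T, then the first-order dynamic response after the intervention.
rng = np.random.default_rng(0)

def simulate(n, T, mu, omega, delta, phi, theta, sigma=1.0):
    a = rng.normal(0.0, sigma, n + 1)
    W = np.empty(n)
    W[0] = a[1] - theta * a[0]      # crude start-up (exact methods use the stationary distribution)
    for t in range(1, n):
        W[t] = phi * W[t - 1] + a[t + 1] - theta * a[t]
    Y = np.empty(n)
    for t in range(n):
        if t < T:                   # pre-intervention: Y_t = mu + W_t
            Y[t] = mu + W[t]
        else:                       # post-intervention: Y_t = mu + omega + delta * Y_{t-1} + W_t
            Y[t] = mu + omega + delta * Y[t - 1] + W[t]
    return Y

Y = simulate(n=60, T=30, mu=30.0, omega=0.8, delta=0.6, phi=0.5, theta=0.3)
```

With the noise switched off (σ = 0), the post-intervention path converges to the steady-state level (µ + ω)/(1 − δ), which makes the simulator easy to sanity-check.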
Instead of applying the first-order dynamic to the mean function, the new model applies it to
the observed time series. Model (1–7) can be rewritten in matrix form as

W = AY − η (1–8)
where
η = (µ, ... ,µ,µ+ ω, ... ,µ+ ω)T , (1–9)
and the matrix A is given by

      | 1   0   0   ...        0   0 |
      | 0   1   0   ...        0   0 |
      | ...                          |
A =   | 0   ...     −δ   1   ...   0 |        (1–10)
      | ...                          |
      | 0   0   0   ...       −δ   1 |

that is, A is the n × n matrix with ones on the diagonal and −δ on the subdiagonal of rows
T + 1, ..., n.
This representation allows us to derive the probability density function of Y .
The model can also be rewritten in the form

W = Y − Bβ (1–11)

β = (µ, ω, δ)^T, (1–12)

where B is the n × 3 matrix

      | 1   0   0    |
      | ...          |
      | 1   0   0    |
B =   | 1   1   YT   |        (1–13)
      | ...          |
      | 1   1   Yn−1 |

whose rows are (1, 0, 0) for t = 1, ..., T and (1, 1, Yt−1) for t = T + 1, ..., n.
This form is useful for deriving estimates for β, the parameters of interest.
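The two matrix forms (1–8) and (1–11) can be checked against each other numerically. A small sketch (Python/NumPy; n, T, and the parameter values are hypothetical) that builds A, η, and B and verifies AY − η = Y − Bβ:

```python
import numpy as np

n, T = 8, 4
mu, omega, delta = 30.0, 0.8, 0.6

rng = np.random.default_rng(1)
Y = rng.normal(mu, 1.0, n)

# A: identity with -delta on the subdiagonal of the post-intervention rows.
A = np.eye(n)
for t in range(T, n):
    A[t, t - 1] = -delta

# eta: mu for the baseline rows, mu + omega afterwards.
eta = np.concatenate([np.full(T, mu), np.full(n - T, mu + omega)])

# B: rows (1, 0, 0) before the intervention, (1, 1, Y_{t-1}) after it.
B = np.zeros((n, 3))
B[:, 0] = 1.0
B[T:, 1] = 1.0
B[T:, 2] = Y[T - 1:n - 1]

beta = np.array([mu, omega, delta])
W1 = A @ Y - eta          # form (1-8)
W2 = Y - B @ beta         # form (1-11)
```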
The probability density function of the vector Y = (Y1, ..., Yn) equals

p(Y |φ, θ, σ) = (2πσ²)^(−n/2) |Z^T Z|^(−1/2) exp{−S*(φ, θ)/(2σ²)} (1–14)
where Z is given by

      | 1                 0                                |
      | 0                 1                                |
      | θ − φ             −φ(θ − φ)(1 − φ²)^(−1/2)         |
Z =   | θ(θ − φ)          −θφ(θ − φ)(1 − φ²)^(−1/2)        |        (1–15)
      | ...                                                |
      | θ^(n−1)(θ − φ)    −θ^(n−1)φ(θ − φ)(1 − φ²)^(−1/2)  |
and

S*(φ, θ) = (AY − η)^T Γ (AY − η) = (Y − Bβ)^T Γ (Y − Bβ), (1–16)

with Γ defined as

Γ = L^T (I − Z(Z^T Z)^(−1) Z^T) L (1–17)

and the matrix L of the form

      | 0                 0                 0                ...   0       0 |
      | 0                 0                 0                ...   0       0 |
      | 1                 0                 0                ...   0       0 |
L =   | θ − φ             1                 0                ...   0       0 |        (1–18)
      | θ(θ − φ)          θ − φ             1                ...   0       0 |
      | ...                                                                 |
      | θ^(n−2)(θ − φ)    θ^(n−3)(θ − φ)    θ^(n−4)(θ − φ)   ...   θ − φ   1 |
Clearly, the form of matrices Z and L depends on the order of the time series Nt .
Consider the likelihood function:

L(φ, θ, σ²|Y) = (2πσ²)^(−n/2) |Z^T Z|^(−1/2) exp{−S*(φ, θ)/(2σ²)} (1–19)
First, it can be shown that for any given (φ, θ), the likelihood function is maximized by

β̂(φ, θ) = (B^T Γ B)^(−1) B^T Γ Y

and

σ̂²(φ, θ) = S*(φ, θ, β̂)/n,

which depend on (φ, θ) through Γ. Plugging them into the likelihood function, we get

L*(φ, θ|β̂, σ̂², Y) = (2π S*(φ, θ, β̂)/n)^(−n/2) |Z^T Z|^(−1/2) exp(−n/2). (1–20)

Therefore, if we let (φ̂, θ̂) be the values that maximize the above expression L*, then the
MLEs of the parameters are obtained as φ̂, θ̂, β̂(φ̂, θ̂) and σ̂²(φ̂, θ̂).
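The inner step of this maximization is a generalized-least-squares computation: for fixed (φ, θ), the matrix Γ is fixed and β̂ minimizes the quadratic form S*(β) = (Y − Bβ)^T Γ (Y − Bβ). A sketch (Python/NumPy) with an arbitrary positive-definite Γ standing in for the Γ of (1–17), and random B and Y:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 12
B = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
Y = rng.normal(size=n)
M = rng.normal(size=(n, n))
Gamma = M @ M.T + n * np.eye(n)   # arbitrary positive-definite stand-in for (1-17)

def S_star(beta):
    # Quadratic form S*(beta) = (Y - B beta)' Gamma (Y - B beta).
    r = Y - B @ beta
    return float(r @ Gamma @ r)

# GLS solution of the normal equations: beta_hat = (B' Gamma B)^{-1} B' Gamma Y.
beta_hat = np.linalg.solve(B.T @ Gamma @ B, B.T @ Gamma @ Y)
```

Because S* is a convex quadratic with Hessian B^T Γ B, the solution of the normal equations is its global minimizer.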
Furthermore, we would like to point out a connection between the MLE β̂(φ, θ) and a
Bayes estimator. If we let β̂ = (B^T Γ B)^(−1) B^T Γ Y, which depends on (φ, θ) through Γ,
then we have

S*(φ, θ, β) = [(Y − Bβ̂) − B(β − β̂)]^T Γ [(Y − Bβ̂) − B(β − β̂)]
            = (Y − Bβ̂)^T Γ (Y − Bβ̂) + (β − β̂)^T B^T Γ B (β − β̂), (1–21)

where the first term is constant given (φ, θ) and Y. Therefore, the likelihood function and
(1–21) imply that, conditioned on (φ, θ, σ²), the posterior distribution of β is multivariate
normal with mean β̂ and covariance σ²(B^T Γ B)^(−1):

L(β|Y, φ, θ, σ²) ∝ exp{−(β − β̂)^T B^T Γ B (β − β̂)/(2σ²)}. (1–22)

In other words, the MLE β̂(φ, θ) is the Bayes estimator with the auxiliary parameters
estimated at (φ̂, θ̂).
1.3 The Test Procedure
Suppose β^T = (µ, β2^T), where β2 = (ω, δ). Then for a treatment effect we would like to
test

H0 : β2 = (0, 0) vs Ha : β2 ≠ (0, 0) (1–23)
Here we consider the partitioned β. It is well known [22] that, whatever the true value of β,

β̂ − β ∼ AN(0, Σ) (1–24)

where Σ is the variance-covariance matrix of the vector β̂.
Let

P =   | 0  1  0 |
      | 0  0  1 |

so that Pβ = (ω, δ)^T = β2. Assume that the matrix Σ is of the form

      | Σ11  Σ12  Σ13 |
Σ =   | Σ21  Σ22  Σ23 |
      | Σ31  Σ32  Σ33 |

Then the variance-covariance matrix of the vector β̂2 is given by

Σ2 = Cov(β̂2) = Cov(Pβ̂) = PΣP^T =   | Σ22  Σ23 |
                                    | Σ32  Σ33 |

To find the p-value one can use the Wald test statistic

Tw = β̂2^T Σ2^(−1) β̂2 ∼ χ²₂ (1–25)
We should mention that likelihood ratio or score test statistics can also be used. This result
holds for large samples, but it may not be valid for small samples, because the chi-squared
approximation to the null distribution of the test statistic may not be as good as for large
samples. To address this issue, we propose the following procedure to find the p-value:
1. Simulate n time series from the model with β2 = (0, 0).

2. Estimate the coefficients β2 from each simulated series and calculate the Wald test
statistic Ti, i = 1, ..., n.

3. Calculate the p-value according to the formula

p = Σ_{i=1}^{n} I(Ti > Tw) / n
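The three steps can be sketched as a Monte-Carlo p-value computed by simulating under the null. The version below (Python/NumPy) is a simplified stand-in: the "model fit" is an OLS regression of the post-period equation with white-noise errors, not the exact-likelihood ARMA(1,1) estimates, so both the statistic and the null model are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

def wald_stat(Y, T):
    # Stand-in Wald statistic for (omega, delta) from the OLS fit of
    # Y_t = mu + omega * I_t + delta * I_t * Y_{t-1} + e_t.
    n = len(Y)
    post = (np.arange(1, n) >= T).astype(float)
    X = np.column_stack([np.ones(n - 1), post, post * Y[:-1]])
    coef, res, *_ = np.linalg.lstsq(X, Y[1:], rcond=None)
    sigma2 = float(res[0]) / (n - 1 - 3) if res.size else 1e-12
    cov = sigma2 * np.linalg.inv(X.T @ X)
    b2 = coef[1:]
    return float(b2 @ np.linalg.solve(cov[1:, 1:], b2))

def mc_pvalue(Y, T, n_sim=500):
    # Simulate series under beta_2 = (0, 0), i.e. constant mean, and count
    # how often the simulated statistic exceeds the observed one.
    t_obs = wald_stat(Y, T)
    mu_hat, s_hat = Y[:T].mean(), Y[:T].std(ddof=1)
    sims = [wald_stat(rng.normal(mu_hat, s_hat, len(Y)), T) for _ in range(n_sim)]
    return sum(t > t_obs for t in sims) / n_sim
```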
1.4 Simulation Studies
In this section we present numerical results comparing an intervention model with AR(1)
errors and an intervention model with ARMA(1, 1) errors. The simulations were performed
for β = (30, 0.8, 0.6) and β = (30, 0, 0). For comparison purposes we looked at two sets of
data: the first set was simulated using the model with AR(1) errors and the second set was
simulated using the model with ARMA(1, 1) errors. The computational experiments were
performed with the free software environment R [23].
The results of simulation studies are summarized in Table 1-1 and Table 1-2.
Table 1-1. MSE and Bias for β = (30, 0.8, 0.6)

                          MSE                       Bias
Model      Fit        µ      ω      δ       µ        ω        δ
AR(1)      AR(1)      0.395  2.578  0.0004  0.0246   0.0767   -0.0007
AR(1)      ARMA(1,1)  0.465  5.7    0.0008  -0.0795  -0.819   0.011
ARMA(1,1)  AR(1)      0.403  3.19   0.0005  0.0254   0.0944   -0.001
ARMA(1,1)  ARMA(1,1)  0.479  6.93   0.001   -0.0717  -0.8985  0.011
Table 1-2. MSE and Bias for β = (30, 0, 0)

                          MSE                        Bias
Model      Fit        µ       ω      δ       µ       ω       δ
AR(1)      AR(1)      0.7325  137.9  0.1518  0.015   -5.07   0.17
AR(1)      ARMA(1,1)  1.102   218.8  0.24    0.078   -11.22  0.369
ARMA(1,1)  AR(1)      0.751   275.4  0.304   0.0304  -8.62   0.287
ARMA(1,1)  ARMA(1,1)  1.073   273.9  0.302   0.103   -10.79  0.353
Figure 1-2. Estimated mean functions for an AR process
Figure 1-3. Estimated mean functions for an ARMA process
1.5 Case Study
For an illustrative example we consider data from a randomized clinical trial of Constraint
Induced Language Therapy (CILT). The main aim of the study was to determine if CILT would
result in observable improvements in speech and if it would be significantly better than regular,
unconstrained language therapy. There were four groups of patients who completed the study:
• Intensive CILT (10 patients)
• Distributed CILT (10 patients)
• Intensive Promoting Aphasic Communicative Effectiveness (PACE) (8 patients)
• Distributed PACE (8 patients)
The PACE therapy [24] was used for the comparison because of its common application in
the rehabilitation of aphasia.
We expect that the clinical response to the intervention can vary within each group;
therefore, the single-subject design is a reasonable approach to test for the treatment effect
for each patient. The results from trials can be combined using meta-analysis or Bayesian
hierarchical models [25].

Model (1–7) and the algorithm for estimating p-values described in Section 1.3 were
programmed in R and in MATLAB. Plots of the fitted models for the different groups of
patients are presented in Figures 1-4, 1-5, 1-6, and 1-7, respectively. The results from the
CILT study are summarized in Table 1-3.
1.6 Conclusions
In this chapter, we developed an improved intervention model for single-subject studies
with a relatively small number of observations per subject. The exact likelihood function for
the model was derived. We also presented a framework for testing a treatment effect in clinical
studies with a single-subject design; this goal is achieved using the coefficient estimates from
the exact likelihood function.
Table 1-3. Case study results

Patient  p-value  TS       µ       ω        δ
1        <0.001   66.01    56.177  1.31     0.325
2        0.87     0.404    42.51   -12.845  0.25
3        <0.001   1145.84  9.25    34.82    0.21
4        0.001    48.01    60.41   17.56    0.06
5        <0.001   927.829  20.02   18.98    0.47
6        0.041    17.41    56.84   51.33    -0.48
7        0.689    1.31     3.98    2.75     0.047
8        0.014    23.1     47.36   1.14     0.21
9        <0.001   170.48   19.5    -0.12    0.69
10       0.002    89.51    59.06   -28.76   0.62
11       0.23     10.13    64.18   44.66    -0.495
12       0.003    57.8     13.15   0.79     0.65
13       <0.001   122.09   50.07   1.40     0.35
14       <0.001   82.98    46.52   13.61    0.33
15       0.035    19.64    56.13   -34.16   0.7
16       <0.001   62.97    34.10   -18.72   0.72
17       0.01     32.58    3.58    4.39     0.56
18       <0.001   104.15   37.28   -16.7    0.76
19       <0.001   112.07   9.41    39.59    0.13
20       0.015    28.29    69.4    -29.98   0.52
21       <0.001   1551.63  10.41   23.58    0.44
22       <0.001   147.12   33.17   20.57    0.38
23       0.031    19.03    34.44   -7.68    0.56
24       <0.001   96.76    53.51   -20.22   0.62
25       <0.001   85.47    2.01    41.61    -0.03
26       <0.001   1892.40  7.83    49.02    0.09
27       0.37     14.26    -2.41   6.41     -0.22
28       0.001    61.42    17.86   5.5      0.16
29       0.026    29.91    3.97    -2.75    0.25
30       0.089    10.44    4.34    -0.94    0.55
31       0.015    26.25    23.10   -17.89   0.44
32       0.011    30.29    16.92   28.99    0.19
33       0.354    17.8     -0.02   0.79     -0.75
34       0.64     1.51     0.17    -0.034   0.37
35       0.024    22.28    39.68   44.85    -0.33
36       <0.001   1287.95  31.26   16.64    0.38
Figure 1-4. Intensive CILT
Figure 1-5. Distributed CILT
Figure 1-6. Intensive PACE
Figure 1-7. Distributed PACE
The model was successfully fit to the data from a randomized clinical trial of Constraint
Induced Language Therapy. Clearly, the applications of such models are not restricted to
clinical studies, though this research was motivated by medical applications.
CHAPTER 2
ANALYSIS OF VARIANCE BASED ON CROSS-FITTING MEASURE OF SIMILARITY AND
PERMUTATION TEST
2.1 Introduction
The analysis of experimental data observed at different time points leads to new statistical
modeling problems. Time series data arise in many scientific fields: economics (stock markets),
medicine (blood pressure traced over time, fMRI), speech recognition, and the physical and
environmental sciences. Typical real problems with time series involve modeling, forecasting,
and clustering. For this reason the study of distance measures and clustering for time series
is an important part of research in several scientific fields. The main goal of this work is to
study group differences in time series. Assume that in each group time series are observed from
many different subjects, each with a different model. We shall claim a group difference if there
is more between-group difference than within-group difference. These types of problems rely
on the ability to measure the similarity or dissimilarity between time series. Defining a reasonable
measure of similarity is a nontrivial task. There are two main approaches to performing pairwise
comparisons between time series. The first approach deals with selected features extracted
from the data. The second approach relies on comparing models built from the raw data,
with likelihood-ratio-type testing. In this work we introduce a distance measure based on
cross-fitting. The proposed measure should be a convenient tool for analysis of variance and
for time series clustering.
2.2 Literature Review
In this section we briefly summarize previous research on measures of time series similarity
and dissimilarity. We discuss methods based on raw data and also the model-based approach,
which is more closely related to the goal of this chapter.
Minkowski distance: Let X and Y be T-dimensional vectors. Then the Minkowski distance in
the Lq norm between the observed values is defined as:

dM = ( Σ_{t=1}^{T} |Xt − Yt|^q )^(1/q)
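A minimal sketch of this distance in plain Python (absolute differences are used so that odd values of q also behave as a norm; for even q this agrees with the formula above):

```python
# Minkowski (L_q) distance between two equal-length series.
def minkowski(x, y, q):
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1.0 / q)

# Small illustrative vectors.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 0.0, 3.0, 1.0]
```

For q = 2 this reduces to the ordinary Euclidean distance.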
There are several distances based on cross-correlation. Golay et al. [26] introduced two
cross-correlation-based distances,

d1cc = ((1 − cc)/(1 + cc))^β

for some β > 0, and

d2cc = 2(1 − cc)

where

cc = Σ_{t=1}^{T} (Xt − µX)(Yt − µY) / (SX SY)

and SX and SY are the standard deviations.
SX and SY are standard deviations.
According to Liao [27], a dissimilarity index based on the cross-correlation function can be
defined as:

d_{i,j} = sqrt( (1 − ρ²_{i,j}(0)) / Σ_{τ=1}^{max} ρ²_{i,j}(τ) )

where ρ_{i,j}(τ) is the cross-correlation between the two time series Xi and Yj at lag τ, and
max is the maximum lag.
So far we have considered metrics based on the similarity of the raw data aligned by time.
Now we focus on approaches based on model comparison, for which the time alignment is
irrelevant. Kalpakis et al. [28] note that in many similarity queries the Euclidean distance
between raw data fails to capture the notion of similarity, and they proposed the Euclidean
distance between the Linear Predictive Coding (LPC) cepstra as a measure of dissimilarity.
Consider an AR(p) time series (Box et al. [21]):

Xt = φ1Xt−1 + φ2Xt−2 + ... + φpXt−p + at

Then the cepstral coefficients satisfy

cn = φ1, if n = 1
cn = φn + Σ_{m=1}^{n−1} (1 − m/n) φm cn−m, if 1 < n ≤ p
cn = Σ_{m=1}^{n−1} (1 − m/n) φm cn−m, if n > p
Alonso et al. [29] developed time series clustering based on forecast densities. Let X^(i) =
(X^(i)_1, ..., X^(i)_T) be the time series corresponding to the ith subject in the sample, and let
f^(i)_{X_{T+h}} denote the density function of the forecast X^(i)_{T+h}. Then the distance is

Dij = ∫ ( f^(i)_{X_{T+h}}(x) − f^(j)_{X_{T+h}}(x) )² dx
Another distance was proposed by Piccolo [30]; it is based on the AR(∞) representation of
ARMA models. We discuss this distance in the next section and compare it with the proposed
cross-fitting distance. Maharaj [31] also used the AR(∞) form of ARMA models to test the
hypothesis that there is a difference between the generating processes of two stationary series.
2.3 Measure of Distance by ARMA Coefficients
The purpose of this section is to outline the idea of the AR distance and of the cross-fitting
measure of similarity. The AR distance was introduced by Piccolo [30]. Corduas and Piccolo [32]
derived the asymptotic distribution of the squared AR distance in order to place the comparison
of time series within a hypothesis testing framework.
Let Zt be a zero-mean ARIMA(p, d, q) process. In the standard notation of Box et al. [21],
such a model is defined as follows:

φ(B)∇^d Zt = θ(B)at (2–1)

where at is a univariate white noise process with zero mean and constant variance σ², and B is
the backward shift operator, defined by BZt = Zt−1. The autoregressive operator of order p
and the moving average operator of order q are defined as:

φ(B) = 1 − φ1B − φ2B² − ... − φpB^p
θ(B) = 1 − θ1B − θ2B² − ... − θqB^q

with the invertibility and stationarity restrictions. The invertibility assumption ensures that Zt
can be represented in the AR(∞) formulation:

π(B)Zt = at (2–2)
with

π(B) = (1 − B)^d φ(B) θ^(−1)(B) = 1 − Σ_{j=1}^{∞} πj B^j

and

Σ_{j=1}^{∞} |πj| < ∞
Based on this representation, Piccolo [30] introduced the Euclidean distance between the
π-weights as a measure of dissimilarity between two ARIMA processes Xt and Yt:

d = sqrt( Σ_{j=1}^{∞} (πxj − πyj)² ), (2–3)

where πxj and πyj are the π-weights from the models for the time series Xt and Yt, respectively.
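The π-weights can be computed by equating coefficients in φ(B) = θ(B)π(B) and truncating the expansion; the recursion below and the truncation level J are our own implementation choices (plain Python, d = 0 case):

```python
# pi-weights of the AR(infinity) representation pi(B) = 1 - sum_j pi_j B^j.
# Matching coefficients in phi(B) = theta(B) * pi(B) gives
#   pi_n = phi_n - theta_n + sum_{i=1}^{n-1} theta_i * pi_{n-i}.
def pi_weights(phi, theta, J=50):
    pi = []
    for n in range(1, J + 1):
        v = (phi[n - 1] if n <= len(phi) else 0.0) \
            - (theta[n - 1] if n <= len(theta) else 0.0)
        v += sum(theta[i - 1] * pi[n - i - 1]
                 for i in range(1, min(n, len(theta) + 1)))
        pi.append(v)
    return pi

# Piccolo's distance (2-3), truncated at J terms. Each model is (phi, theta).
def ar_distance(m1, m2, J=50):
    p1, p2 = pi_weights(*m1, J), pi_weights(*m2, J)
    return sum((a - b) ** 2 for a, b in zip(p1, p2)) ** 0.5
```

For an ARMA(1,1) model the recursion gives the familiar π_j = (φ − θ)θ^(j−1), and for two AR(1) models the distance reduces to |φ1 − φ2|.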
2.4 The Cross-Fitting Measure of Similarity
In this section we develop a new approach to measure similarity between time series.
Suppose there are two time series Xt, t = 1, ..., n and Yt, t = 1, ..., m. Assume we can fit the
series Xt by an ARMA model M1 and the series Yt by an ARMA model M2. To define the
distance between the two series we use the following algorithm:

1. Apply M1 to the time series Xt to obtain the prediction error σ²11.

2. Apply M2 to the time series Xt to obtain the prediction error σ²12.

3. Apply M1 to the time series Yt to obtain the prediction error σ²21.

4. Apply M2 to the time series Yt to obtain the prediction error σ²22.

5. Define the distance between the two time series by:

d(1, 2) = (σ²12 − σ²11)/σ²11 + (σ²21 − σ²22)/σ²22 (2–4)
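The five steps can be sketched with AR(1) models fitted by least squares (Python/NumPy; restricting to zero-mean AR(1) models with OLS fits is a simplifying assumption, not the general ARMA procedure):

```python
import numpy as np

def fit_ar1(x):
    # Least-squares estimate of phi in x_t = phi * x_{t-1} + a_t.
    return float(np.dot(x[1:], x[:-1]) / np.dot(x[:-1], x[:-1]))

def pred_error(x, phi):
    # One-step-ahead mean squared prediction error using coefficient phi.
    r = x[1:] - phi * x[:-1]
    return float(np.mean(r ** 2))

def cross_fit_distance(x, y):
    phi1, phi2 = fit_ar1(x), fit_ar1(y)               # models M1 and M2
    s11, s12 = pred_error(x, phi1), pred_error(x, phi2)
    s21, s22 = pred_error(y, phi1), pred_error(y, phi2)
    return (s12 - s11) / s11 + (s21 - s22) / s22      # equation (2-4)

rng = np.random.default_rng(6)
def sim_ar1(phi, n=500):
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x

x, y = sim_ar1(0.1), sim_ar1(0.8)
d = cross_fit_distance(x, y)
```

Because each σ²ii is the minimized prediction error for its own series, both ratios in (2–4) are non-negative, so the measure satisfies d ≥ 0 by construction.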
In the case when we are only interested in the 'shape' difference of the two time series, we
can standardize the series by

X̃t = (Xt − X̄)/SX,  Ỹt = (Yt − Ȳ)/SY,

where X̄, Ȳ, SX, SY are the standard notation for the sample means and standard deviations,
respectively.
The proposed measure satisfies properties of a semimetric:
1. d(Xt ,Yt) ≥ 0
2. d(Xt ,Yt) = 0 if and only if Xt = Yt
3. d(Xt ,Yt) = d(Yt ,Xt)
Consider two AR(1) models:
• Xt = φ1Xt−1 + εt where εt are iid N(0, 1)
• Yt = φ2Yt−1 + εt , where εt are iid N(0, 1)
In terms of the model coefficients, the cross-fitting distance equals:

d(1, 2) = (φ1 − φ2)²/(1 − φ1²) + (φ1 − φ2)²/(1 − φ2²)
Consider three time series from AR(1) models:
1. Xt = 0.1Xt−1 + εt , t = 1, ..., 100
2. Yt = 0.5Yt−1 + εt , t = 1, ..., 100
3. Zt = 0.9Zt−1 + εt , t = 1, ..., 100
For the cross-fitting measure we obtain, according to our definition, d(1, 2) = 0.37 and
d(2, 3) = 1.055. For the AR distance, it is easy to see that Piccolo's Euclidean distance
between model 1 (φ = 0.1) and model 2 (φ = 0.5) is the same as the distance between
model 2 (φ = 0.5) and model 3 (φ = 0.9). Based on the graphical comparison in Figure 2-1,
however, we expect more dissimilarity between Model 2 and Model 3.
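The quoted values follow from the closed-form AR(1) expression given earlier in this section; a small check in plain Python:

```python
# Closed-form cross-fitting distance between two AR(1) models with
# coefficients phi1 and phi2.
def d_ar1(phi1, phi2):
    return ((phi1 - phi2) ** 2 / (1.0 - phi1 ** 2)
            + (phi1 - phi2) ** 2 / (1.0 - phi2 ** 2))

d12 = d_ar1(0.1, 0.5)   # models 1 and 2
d23 = d_ar1(0.5, 0.9)   # models 2 and 3
```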
We considered three stationary time series for the comparison between Piccolo's distance and
the cross-fitting measure of similarity. The difference is even more illustrative if we consider
two stationary time series and one non-stationary series, because the dissimilarity between a
non-stationary and a stationary time series is even more extreme than between two stationary
series.
Suppose there are two time series from AR(2) models:
• Xt = φ1Xt−1 + φ2Xt−2 + εt ,
• Yt = φ∗1Yt−1 + φ
∗2Yt−2 + ε
∗t
Based on the definition of cross-fitting measure:
d(1, 2) =σ212 − σ211
σ211+σ221 − σ222
σ222
Then σ211 = E(Xt − Xt)2 = E(ε2t ) = 1 and similarly σ222 = 1.
For σ212 we fit data from the time series based on the first model using the second model:
σ212 = γ(0)− 2(φ∗1γ(1) + φ
∗2γ(2)) + (φ
∗21 + φ
∗22 )γ(0) + 2φ
∗1φ
∗2γ(1)
σ221 = γ∗(0)− 2(φ1γ∗(1) + φ2γ∗(2)) + (φ21 + φ22)γ∗(0) + 2φ1φ2γ∗(1)
Where:
γ(0) =1− φ2
1− φ2 − φ21 − φ21φ2 − φ22 + φ32
γ(1) =φ1
1− φ2 − φ21 − φ21φ2 − φ22 + φ32
γ(2) =φ21 − φ2(1− φ2)
1− φ2 − φ21 − φ21φ2 − φ22 + φ32
and

γ∗(0) = (1 − φ∗2) / (1 − φ∗2 − φ∗1^2 − φ∗1^2 φ∗2 − φ∗2^2 + φ∗2^3)
γ∗(1) = φ∗1 / (1 − φ∗2 − φ∗1^2 − φ∗1^2 φ∗2 − φ∗2^2 + φ∗2^3)
γ∗(2) = (φ∗1^2 + φ∗2(1 − φ∗2)) / (1 − φ∗2 − φ∗1^2 − φ∗1^2 φ∗2 − φ∗2^2 + φ∗2^3)
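The closed forms above can be cross-checked numerically against the Yule–Walker system; a sketch assuming unit noise variance (the function names are ours):

```python
import numpy as np

def ar2_gamma(p1, p2):
    """gamma(0), gamma(1), gamma(2) of an AR(2) with unit innovation variance."""
    D = 1 - p2 - p1**2 - p1**2 * p2 - p2**2 + p2**3
    return np.array([(1 - p2) / D, p1 / D, (p1**2 + p2 * (1 - p2)) / D])

def crossfit_ar2(a, b):
    """Cross-fitting distance between AR(2) models a and b (unit noise variance)."""
    def s2(model, fit):
        # expected squared residual when data from `model` are fit with `fit`
        g = ar2_gamma(*model)
        f1, f2 = fit
        return (g[0] - 2 * (f1 * g[1] + f2 * g[2])
                + (f1**2 + f2**2) * g[0] + 2 * f1 * f2 * g[1])
    # sigma_11^2 = sigma_22^2 = 1, so d(1,2) = (s12 - 1)/1 + (s21 - 1)/1
    return (s2(a, b) - 1) + (s2(b, a) - 1)

# The closed forms agree with solving the Yule-Walker equations directly:
p1, p2 = 0.5, 0.3
A = np.array([[1.0, -p1, -p2],      # g0 - p1*g1 - p2*g2 = 1
              [-p1, 1 - p2, 0.0],   # g1 = p1*g0 + p2*g1
              [-p2, -p1, 1.0]])     # g2 = p1*g1 + p2*g0
print(np.allclose(np.linalg.solve(A, [1.0, 0.0, 0.0]), ar2_gamma(p1, p2)))  # True
```

Fitting a model with its own coefficients reproduces the innovation variance, so the distance vanishes when the two models coincide.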
Figure 2-1. Comparison of three stationary AR(1) time series
2.5 ANOVA Based on Cross-Fitting Measure of Similarity and Permutation Test

In this section we consider Fisher's permutation test [33] for the ANOVA with the cross-fitting distance derived in the previous section. The permutation test allows one to estimate a p-value without any assumptions about the distribution of the test statistic under the null hypothesis.
Let group i, i = 1, ..., K, have observations Yij(t), where j = 1, ..., ni is the subject index and t is the time index. Assume the time series model for series ij is Mij, and let the fitting distance between series ij and kl be d(ij, kl). The test statistic for H0, that there is no group difference, is

T = average between-group measure (BM) / average within-group measure (WM)   (2–5)

where BM is given by

BM = (Σ all between-group measures) / (Σ_{i=1}^{K−1} ni Σ_{j=i+1}^{K} nj)   (2–6)

and

WM = (Σ all within-group measures) / (Σ_{j=1}^{K} Σ_{i=1}^{nj−1} (nj − i))   (2–7)
To estimate the p-value of the test one can use the permutation test. The permutation algorithm is as follows:
1. Calculate the statistic T using equation 2–5.
2. Evaluate the value T∗i for each permutation, i = 1, ..., N.
3. Approximate the p-value as

p = (Σ_{i=1}^{N} I(T∗i > T)) / N
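The test and its permutation p-value can be sketched as follows; this is an illustrative Python implementation of equations 2–5 through 2–7, with the pairwise cross-fitting distances assumed to be precomputed in a symmetric matrix `dist` (function and argument names are ours):

```python
import itertools
import random

def anova_stat(dist, labels):
    """T of equation 2-5: mean between-group distance / mean within-group distance."""
    between, within = [], []
    for i, j in itertools.combinations(range(len(labels)), 2):
        (within if labels[i] == labels[j] else between).append(dist[i][j])
    return (sum(between) / len(between)) / (sum(within) / len(within))

def perm_pvalue(dist, labels, n_perm=999, seed=0):
    """Steps 1-3: fraction of label permutations whose T* exceeds the observed T."""
    rng = random.Random(seed)
    t_obs = anova_stat(dist, labels)
    labels = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # permute group labels, keeping the group sizes fixed
        exceed += anova_stat(dist, labels) > t_obs
    return exceed / n_perm
```

For K groups the label vector simply carries K distinct values; shuffling it preserves the group sizes, and the p-value is the proportion of permuted statistics exceeding the observed one.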
2.6 Conclusions

In this chapter we presented the cross-fitting measure, a novel approach to measuring similarity between time series, and discussed some limitations of previous research. We derived an ANOVA based on the cross-fitting measure of similarity together with a permutation test. The proposed measure can be a useful tool for problems that involve time series clustering.
APPENDIX A
LIKELIHOOD FUNCTION

In this appendix, using results from Newbold [34] for an ARMA(1,1) process W = (W1, ..., Wn), we derive the exact likelihood function for the vector Y = (Y1, ..., Yn).
Consider the model:
Y1 = µ+W1
...
YT = µ+WT
YT+1 = µ+ ω + δYT +WT+1
...
Yn = µ+ ω + δYn−1 +Wn
(A–1)
We assume that Wt, t = 1, ..., n, follows an ARMA(1,1) model. Let

a0 = a0
W0 = W0
at = Wt − φWt−1 + θat−1, (1 ≤ t ≤ n)
(A–2)
Define the vector e as

e = (e∗, en)^T (A–3)

where e∗ = (a0, W0)^T and en = (a1, ..., an)^T.
Then the set of equations (A–2) can be written as

e = LW + Xe∗ (A–4)

where

L = ( 0              0              0              ...  0      0
      0              0              0              ...  0      0
      1              0              0              ...  0      0
      (θ−φ)          1              0              ...  0      0
      θ(θ−φ)         (θ−φ)          1              ...  0      0
      ...
      θ^(n−2)(θ−φ)   θ^(n−3)(θ−φ)   θ^(n−4)(θ−φ)   ...  (θ−φ)  1 )

and

X = ( 1      0
      0      1
      θ      −φ
      θ^2    −θφ
      ...
      θ^n    −θ^(n−1)φ )
Then the model (A–4) can be written in block form as

( e∗ )   ( 0  )       ( I  )
( en ) = ( Ln ) W  +  ( Xn ) e∗ (A–5)

Consider

E(e∗e∗^T) = E ( a0^2    a0W0
                a0W0    W0^2 ) (A–6)
The stationary and invertible ARMA(p, q) process can be represented as

Wt = ψ(B)at = Σ_{j=0}^{∞} ψj a_{t−j}

where ψ(B) = φ^(−1)(B) θ(B).
Let γj denote the autocovariance function, γj = E(Wt Wt−j). The set of equations for the autocovariances is given by [21]:

γ0 = φγ1 + σ^2(1 − θψ1)
γ1 = φγ0 − θσ^2
γk = φγk−1  (k ≥ 2)
(A–7)

Solving (A–7) for γ0 and γ1 we obtain

γ0 = ((1 + θ^2 − 2φθ) / (1 − φ^2)) σ^2
γ1 = (((1 − φθ)(φ − θ)) / (1 − φ^2)) σ^2
γk = φγk−1  (k ≥ 2)
(A–8)
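The solutions above can be checked against the ψ-weight expansion γj = σ^2 Σ_k ψk ψ_{k+j}, where for this parametrization ψ0 = 1 and ψj = φ^(j−1)(φ − θ) for j ≥ 1; a numerical sketch with arbitrarily chosen parameter values:

```python
phi, theta, sigma2 = 0.7, 0.4, 1.0

# psi-weights of W_t = phi*W_{t-1} + a_t - theta*a_{t-1}
K = 2000  # truncation point; psi_j decays geometrically, so the tail is negligible
psi = [1.0] + [phi ** (j - 1) * (phi - theta) for j in range(1, K)]

gamma0_psi = sigma2 * sum(p * p for p in psi)
gamma1_psi = sigma2 * sum(psi[k] * psi[k + 1] for k in range(K - 1))

# closed forms (A-8)
gamma0 = (1 + theta**2 - 2 * phi * theta) / (1 - phi**2) * sigma2
gamma1 = (1 - phi * theta) * (phi - theta) / (1 - phi**2) * sigma2

print(abs(gamma0 - gamma0_psi) < 1e-10, abs(gamma1 - gamma1_psi) < 1e-10)  # True True
```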
Clearly, E(a0^2) = σ^2, and

E(W0^2) = γ0 = ((1 + θ^2 − 2φθ) / (1 − φ^2)) σ^2

For j ≥ h ≥ 0,

Cov(a_{t+h−j}, Wt) = Cov(a_{t+h−j}, Σ_{k=0}^{∞} ψk a_{t−k}) = ψ_{j−h} σ^2

Therefore,

E(a0 W0) = ψ0 σ^2 = σ^2
Then

E(e∗e∗^T) = σ^2 Ω (A–9)

where

Ω = ( 1    1
      1    (1 + θ^2 − 2φθ)/(1 − φ^2) )
Let T be a nonsingular matrix such that TΩT^T = I. Multiplying (A–5) by the matrix

( T  0 )
( 0  I )

yields

( u∗ )   ( 0  )       (    I     )
( un ) = ( Ln ) W  +  ( Xn T^(−1) ) u∗ (A–10)

where u∗ = Te∗ and un = en. In matrix notation we can write

u = LW + Zu∗ (A–11)

where Z is given by

Z = ( 1               0
      0               1
      (θ−φ)           −φ(θ−φ)γ
      θ(θ−φ)          −θφ(θ−φ)γ
      ...
      θ^(n−1)(θ−φ)    −θ^(n−1)φ(θ−φ)γ ) (A–12)

with γ = (1 − φ^2)^(−1/2).
Since E(u∗u∗^T) = σ^2 TΩT^T = σ^2 I, the density function of u is given by

f(u | σ) = (2πσ^2)^(−(n+2)/2) exp( −u^T u / (2σ^2) )

Therefore, the joint density function of W and u∗ is given by

f(W, u∗) = (2πσ^2)^(−(n+2)/2) exp( −S(φ, θ, u∗) / (2σ^2) )
with S(φ, θ, u∗) given by

S(φ, θ, u∗) = (LW + Zu∗)^T (LW + Zu∗)

Let

û∗ = −(Z^T Z)^(−1) Z^T L W

Using the fact that

S(φ, θ, u∗) = S(φ, θ) + (u∗ − û∗)^T Z^T Z (u∗ − û∗)

where

S(φ, θ) = (LW + Zû∗)^T (LW + Zû∗),

the joint density function can be written as

f(W, u∗) = f(W | φ, θ, σ) f(u∗ | W, φ, θ, σ)

Therefore, the marginal density function is given by

f(W | φ, θ, σ) = (2πσ^2)^(−n/2) |Z^T Z|^(−1/2) exp( −S(φ, θ) / (2σ^2) ) (A–13)

One can rewrite

S(φ, θ) = (LW − Z(Z^T Z)^(−1) Z^T LW)^T (LW − Z(Z^T Z)^(−1) Z^T LW) = W^T Γ W (A–14)

where

Γ = L^T (I − Z(Z^T Z)^(−1) Z^T) L

Consider model (A–1). In matrix form it can be rewritten as

W = AY − ω (A–15)
where
A =
1 0 0 ... ... 0 0
0 1 0 ... ... 0 0
...
0 ... ... −δ 1 ... 0
...
0 0 0 ... ... −δ 1
(A–16)
Since A is unit lower triangular, det(A) = 1, and the Jacobian of the transformation from Y to W is |det(A)| = 1. (A–17)
Thus, the exact likelihood for the vector Y = (Y1, ..., Yn) is of the form

p(Y | φ, θ, σ) = (2πσ^2)^(−n/2) |Z^T Z|^(−1/2) exp( −S∗(φ, θ) / (2σ^2) ) (A–18)

where

S∗(φ, θ) = (AY − ω)^T Γ (AY − ω)

and

Z^T Z = ( 1 + (θ−φ)^2 (1−θ^(2n))/(1−θ^2)        −(θ−φ)^2 φγ (1−θ^(2n))/(1−θ^2)
          −(θ−φ)^2 φγ (1−θ^(2n))/(1−θ^2)        1 + (θ−φ)^2 φ^2 γ^2 (1−θ^(2n))/(1−θ^2) )

Therefore, the determinant is given by

|Z^T Z| = 1 + (θ−φ)^2 (1−θ^(2n)) / ((1−θ^2)(1−φ^2))
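The determinant formula can be verified numerically by assembling Z explicitly from (A–12) and computing the determinant directly; a sketch with arbitrarily chosen φ, θ, and n:

```python
import numpy as np

phi, theta, n = 0.6, 0.3, 8
g = (1 - phi**2) ** -0.5           # gamma in (A-12)

rows = [[1.0, 0.0], [0.0, 1.0]]    # the identity block for (a0, W0)
for k in range(n):                 # rows for a_1, ..., a_n
    rows.append([theta**k * (theta - phi),
                 -theta**k * phi * (theta - phi) * g])
Z = np.array(rows)

det_direct = np.linalg.det(Z.T @ Z)
det_closed = 1 + (theta - phi)**2 * (1 - theta**(2 * n)) / ((1 - theta**2) * (1 - phi**2))
print(np.isclose(det_direct, det_closed))  # True
```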
APPENDIX B
CROSS-FITTING DISTANCE

Consider two time series from AR(2) models:
• Xt = φ1Xt−1 + φ2Xt−2 + εt
• Yt = φ∗1Yt−1 + φ∗2Yt−2 + ε∗t
Then

σ11^2 = E(Xt − X̂t)^2 = E(εt^2) = 1

For σ12^2 we fit data from model (1) using model (2):

σ12^2 = E(Xt − X̂t)^2 = E(Xt − φ∗1Xt−1 − φ∗2Xt−2)^2
      = E(Xt^2) − 2E(Xt(φ∗1Xt−1 + φ∗2Xt−2)) + E(φ∗1Xt−1 + φ∗2Xt−2)^2
To find Cov(Xt+h, Xt) one can use the difference equations

γ(h) − φ1γ(h − 1) − φ2γ(h − 2) = 0,  h ≥ max(p, q + 1)

with initial conditions

γ(h) − φ1γ(h − 1) − φ2γ(h − 2) = σε^2 Σ_{j=h}^{q} θj ψ_{j−h},  0 ≤ h < max(p, q + 1)

where θj are the coefficients of the MA part: θ0 = 1, θj = 0 for j ≥ 1.
To find ψj one can use the equation

(ψ0 + ψ1 x + ψ2 x^2 + ...)(1 − φ1 x − φ2 x^2 − ...) = 1 + θ1 x + θ2 x^2 + ...

The first few values:

ψ0 = 1
ψ1 − φ1ψ0 = θ1
ψ2 − φ1ψ1 − φ2ψ0 = θ2
And from the system of equations we get:

ψ0 = 1
ψ1 = φ1
ψ2 = φ1^2 + φ2

Thus γ(0), γ(1), γ(2) can be obtained from the system of equations

φ1γ(1) + φ2γ(2) + 1 = γ(0)
φ1γ(0) + φ2γ(1) = γ(1)
φ1γ(1) + φ2γ(0) = γ(2)

which gives

E(Xt^2) = γ(0) = (1 − φ2) / (1 − φ2 − φ1^2 − φ1^2 φ2 − φ2^2 + φ2^3)
E(Xt Xt−1) = γ(1) = φ1 / (1 − φ2 − φ1^2 − φ1^2 φ2 − φ2^2 + φ2^3)
E(Xt Xt−2) = γ(2) = (φ1^2 + φ2(1 − φ2)) / (1 − φ2 − φ1^2 − φ1^2 φ2 − φ2^2 + φ2^3)

And finally,

σ12^2 = γ(0) − 2(φ∗1 γ(1) + φ∗2 γ(2)) + (φ∗1^2 + φ∗2^2) γ(0) + 2 φ∗1 φ∗2 γ(1)
σ21^2 = γ∗(0) − 2(φ1 γ∗(1) + φ2 γ∗(2)) + (φ1^2 + φ2^2) γ∗(0) + 2 φ1 φ2 γ∗(1)
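As an illustrative check of the closed form for σ12^2, one can simulate a long series from model (1), compute the mean squared residual under the coefficients of model (2), and compare it with the γ(.) expressions above (the coefficient values are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
p1, p2 = 0.5, 0.3          # model (1), generates the data
q1, q2 = 0.2, -0.1         # model (2), used only for fitting

# simulate a stationary AR(2) with unit-variance innovations (long burn-in)
N, burn = 400_000, 1_000
x = np.zeros(N + burn)
eps = rng.standard_normal(N + burn)
for t in range(2, N + burn):
    x[t] = p1 * x[t - 1] + p2 * x[t - 2] + eps[t]
x = x[burn:]

# empirical sigma_12^2: mean squared residual when fitting with model (2)
resid = x[2:] - q1 * x[1:-1] - q2 * x[:-2]
emp = np.mean(resid**2)

# closed form from the gamma(.) expressions
D = 1 - p2 - p1**2 - p1**2 * p2 - p2**2 + p2**3
g0, g1 = (1 - p2) / D, p1 / D
g2 = (p1**2 + p2 * (1 - p2)) / D
theo = g0 - 2 * (q1 * g1 + q2 * g2) + (q1**2 + q2**2) * g0 + 2 * q1 * q2 * g1

print(abs(emp - theo) / theo < 0.05)  # True; sampling error shrinks like 1/sqrt(N)
```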
APPENDIX C
R CODE FOR EVALUATING MLE

In this appendix we present R code for calculating the exact likelihood function and the MLE for the improved model.
rm(list = ls())
ciu <- read.table("CIUdata.txt")
ciu <- as.matrix(ciu)

p.val <- function(TS) {
    Y <- TS
    ar.fit <- function(Y) {
        phi <- seq(-0.99, 0.99, 0.01)
        n <- length(Y)
        tcp <- 4
        ## Z matrix of (A-12), with x = theta and y = phi
        Zm <- function(x, y, n) {
            Zn <- matrix(rep(0, 2 * n), nrow = n)
            a <- c(0:(n - 1))
            Zn[, 1] <- (x^a) * (x - y)
            Zn[, 2] <- (-y) * x^a * (x - y) * (1 - y^2)^(-0.5)
            Z <- rbind(diag(1, 2, 2), Zn)
            return(Z)
        }
        ## L matrix of (A-4)
        Lm <- function(x, y = 0, n) {
            Ln <- matrix(rep(0, n * n), nrow = n)
            b <- c(0:(n - 2))
            Ln[, 1] <- c(1, (x^b) * (x - y))
            for (i in 1:(n - 1)) {
                Ln[(i + 1):n, i + 1] <- Ln[1:(n - i), 1]
            }
            L <- rbind(rep(0, n), rep(0, n), Ln)
            return(L)
        }
        ## determinant of Z'Z in closed form
        Determ <- function(x, y, n) {
            1 + (x - y)^2 * (1 - x^(2 * n))/((1 - x^2) * (1 - y^2))
        }
        ## inverse of Z'Z in closed form
        InvM <- function(x, y, n) {
            Inv11 <- 1 + (x - y)^2 * y^2 * (1 - x^(2 * n))/((1 - y^2) * (1 - x^2))
            Inv22 <- 1 + (x - y)^2 * (1 - x^(2 * n))/(1 - x^2)
            Inv12 <- (x - y)^2 * y * (1 - x^(2 * n))/((1 - y^2)^(1/2) * (1 - x^2))
            Determ(x, y, n)^(-1) * matrix(c(Inv11, Inv12, Inv12, Inv22), ncol = 2)
        }
        ## profile log-likelihood in phi (theta fixed at 0)
        likl <- function(x) {
            phi <- x
            I <- diag(1, n + 2, n + 2)
            Z <- Zm(0, phi, n)
            L <- Lm(0, phi, n)
            Det <- Determ(0, phi, n)
            Inv <- InvM(0, phi, n)
            B <- cbind(rep(1, n), c(rep(0, 4), rep(1, 9)), c(rep(0, tcp), Y[tcp:12]))
            Gamma <- t(L) %*% (I - Z %*% Inv %*% t(Z)) %*% L
            betaHat <- solve(t(B) %*% Gamma %*% B) %*% t(B) %*% Gamma %*% Y
            S <- t(Y - B %*% betaHat) %*% Gamma %*% (Y - B %*% betaHat)
            lgl <- -n/2 * log(S) - 1/2 * log(Det)
            return(lgl)
        }
        ## grid search over phi, then refit at the maximizer
        Lik <- sapply(phi, likl)
        phi <- phi[which.max(Lik)]
        I <- diag(1, n + 2, n + 2)
        Z <- Zm(0, phi, n)
        L <- Lm(0, phi, n)
        Det <- Determ(0, phi, n)
        Inv <- InvM(0, phi, n)
        B <- cbind(rep(1, n), c(rep(0, 4), rep(1, 9)), c(rep(0, 4), Y[4:12]))
        Gamma <- t(L) %*% (I - Z %*% Inv %*% t(Z)) %*% L
        temp <- solve(t(B) %*% Gamma %*% B)
        betaHat <- temp %*% t(B) %*% Gamma %*% Y
        S <- t(Y - B %*% betaHat) %*% Gamma %*% (Y - B %*% betaHat)
        s2 <- as.numeric(sqrt(S/n))
        CovInv <- solve(temp[2:3, 2:3])
        TStat <- betaHat[2:3] %*% CovInv %*% betaHat[2:3]/s2^2
        lik <- c(TStat, phi, betaHat, s2)
        return(lik)
    }
    coeff <- as.numeric(ar.fit(TS))
    ## parametric bootstrap: simulate under the fitted AR(1) null
    n.col <- 100
    n.row <- 13
    s.data <- matrix(rep(0, n.col * n.row), nrow = n.row)
    for (i in 1:n.col) {
        s.data[, i] <- arima.sim(list(order = c(1, 0, 0), ar = coeff[2]),
                                 n = n.row) * coeff[6]
    }
    sim.coef <- apply(s.data, 2, ar.fit)
    p <- sum(coeff[1] < sim.coef[1, ])/100
    res <- c(p, coeff)
    return(res)
}
APPENDIX D
MATLAB CODE FOR EVALUATING MLE

The MATLAB code was written by Dr. Sam Wu and Oleksandr Savenkov.
% FitData
DisplayFitOnly
% DisplayFitAndData
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function FitData
Y = InputData;
T = 4; n = 13; Times = 1:13;
llRes = [];
[ParEst, CovOmegaDelta, TestStat, Pvalues] = FitSSdata(Y, T);
[StdC, Pvalue] = EvalCstat(Y);
AllRes = [ParEst, reshape(CovOmegaDelta, 1, 4), TestStat, Pvalues, StdC, Pvalue];
ParEst
fid2 = fopen('AnalysisResults_Tmp.txt','w');
[r,c] = size(AllRes);
for ii = 1:r
for jj=1:c
fprintf(fid2, ['%9.4f '], AllRes(ii, jj));
end
fprintf(fid2, '\n');
end
fclose(fid2);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function DisplayFitOnly
T = 4; n = 13; Times = 1:13;
% BetaHat = [31.92 -13.15 0.36];
% BetaHat = [29.42 18.95 -0.60];
BetaHat = [30.42 0.001 0.005];
Mu = BetaHat(1);
Omega = BetaHat(2);
Delta = BetaHat(3);
Yhat = zeros(n, 1); Yhat(1:T) = Mu;
for t = (T+1):n
Yhat(t) = Mu + Omega + Delta * Yhat(t-1);
end
figure(3)
plot(Times, Yhat, 'b-', 'MarkerSize',4, 'LineWidth',1);
ylim([0, 100]);
set(gca,'ytick',[0 25 50 75 100]);
set(gca,'yticklabel',[' 0'; ' 25'; ' 50'; ' 75'; '100']);
ylabel('Percent');
set(gca,'xtick',[4 8 12]);
xlabel('Sessions')
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function DisplayFitAndData
Y = InputData;
T = 4; n = 13; Times = 1:13;
load AnalysisResults_Tmp.txt;
Est = AnalysisResults_Tmp;
BetaHat = Est(4:6);
Mu = BetaHat(1);
Omega = BetaHat(2);
Delta = BetaHat(3);
Yhat = zeros(n, 1); Yhat(1:T) = Mu;
for t = (T+1):n
Yhat(t) = Mu + Omega + Delta * Yhat(t-1);
end
figure(3)
plot(Times, Y, 'b^', Times, Yhat, 'b-', 'MarkerSize',4, 'LineWidth',1);
ylim([0, 100]);
p1 = Est(13);
p1 = round(p1*1e4)/1e4;
p2 = Est(15);
p2 = round(p2*1e4)/1e4;
title( strcat('p = ', num2str(p1, 4), ' & ', num2str(p2, 4) ) );
set(gca,'ytick',[0 25 50 75 100]);
set(gca,'yticklabel',[' 0'; ' 25'; ' 50'; ' 75'; '100']);
ylabel('Percent');
set(gca,'xtick',[4 8 12]);
xlabel('Sessions')
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [ParEst, CovOmegaDelta, TestStat, Pvalues] = FitSSdata(Y, T)
%% Y -- single subject observation series
%% T -- time of intervention
%%
n = length(Y);
B = [ones(n, 1) [zeros(T,1); ones(n-T, 1)] [zeros(T,1); Y(T:(n-1))]];
[ParEst, Cov_Beta] = GridSearch(B, Y);
CovOmegaDelta = Cov_Beta(2:3, 2:3);
TestStat = ParEst(5:6) * inv(CovOmegaDelta) * ParEst(5:6)';
Pvalue1 = 1-chi2cdf(TestStat, 2);
Pvalue2 = 0; %% EvalPvalue2(ParEst, TestStat, T, n);
Pvalues = [Pvalue1, Pvalue2];
function Pvalue2 = EvalPvalue2(ParEst0, TestStat0, T, n)
Rep = 4e3;
Fi = ParEst0(1);
Theta = ParEst0(2);
s = sqrt(ParEst0(3));
Mu = ParEst0(4);
Pvalue2 = 0;
for i = 1:Rep
Noise = simarma(Fi, -Theta, n, s^2);
Y = Mu + Noise';
B = [ones(n, 1) [zeros(T,1); ones(n-T, 1)] [zeros(T,1); Y(T:(n-1))]];
[ParEst, Cov_Beta] = GridSearch(B, Y);
CovOmegaDelta = Cov_Beta(2:3, 2:3);
TestStat = ParEst(5:6) * inv(CovOmegaDelta) * ParEst(5:6)';
Pvalue2 = Pvalue2 + (TestStat>TestStat0)/Rep;
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% This new version was created on Oct 17, 2011
%%
function [ParEst, Cov_Beta] = GridSearch(B, Y);
tmp = [];
n = length(Y);
for Fi = -0.99:0.01:0.99
for Theta = 0 %% -.99:0.01:0.99
%% s2= mean(Residuals.^2) * (1-Fi^2) / (1 + Theta^2 - 2*Theta*Fi);
[Z, L, DetZpZ, Gamma] = CreateZLGamma(Fi,Theta,n);
BetaHat = inv(B'*Gamma*B) * B' * Gamma * Y;
Residuals = (Y - B*BetaHat);
s2= Residuals' * Gamma * Residuals /n;
ll = -n/2*log(2*pi*s2) -log(DetZpZ)/2 - n/2;
%% ll = -n/2*log(2*pi*s2);
tmp = [tmp; [Fi Theta s2 BetaHat' ll]];
end
end
a = sortrows(tmp, -7);
ParEst = a(1, 1:6);
[Z, L, DetZpZ, Gamma] = CreateZLGamma(ParEst(1), ParEst(2),n);
Tmp = inv(B'*Gamma*B);
BetaHat = Tmp * B' * Gamma * Y;
Residuals = (Y - B*BetaHat);
s2= Residuals' * Gamma * Residuals /n;
Cov_Beta = Tmp * s2;
function [Z, L, DetZpZ, Gamma] = CreateZLGamma(fi,theta,n)
a = (0:(n-1))';
b = (theta .^ a) * (theta - fi);
Z = [[1; 0; b] [0; 1; b*(-fi)*((1-fi^2)^(-0.5))]];
tmp = [0; 1; b];
L = [];
for i = 1:n
tmp = [0; tmp(1:(n+1))];
L = [L tmp];
end
w = (theta - fi)^2 * (1 - theta^(2*n))/(1 - theta^2);
r = 1/sqrt(1-fi^2);
DetZpZ = 1 + w/(1-fi^2);
InvZpZ = [1+w*fi^2*r^2, w*fi*r; w*fi*r, 1+w]/DetZpZ;
Gamma = L' * (eye(n+2) - Z * InvZpZ * Z') * L;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function y=simarma(fi,theta,n,s2,seed)
% y=simarma(fi,theta,n,s2) simulates ARMA process,
% fi vector fi paramaters, theta vector of theta parameters
% n observations
% s2 WN variance
% if fifth argument seed given, seed is the random generator seed
%
% The parametrization is under the Brockwell notations for theta and fi!!!!!!
if nargin==5
randn('seed',seed);
end
y=filter([1 theta],[1 -fi],randn(1,n+20));
y=y(21:n+20)*sqrt(s2);
function [StdC, Pvalue] = EvalCstat(x)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% This program evaluates the C statistic of Tryon (1982) %
% %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
n = length(x); a = x(2:n)-x(1:(n-1)); b = x - mean(x);
sc = sqrt((n-2)/(n-1)/(n+1));
c = 1 - sum(a.^2)/2/sum(b.^2);
StdC = c/sc;
Pvalue = 2*(1-normcdf(StdC));
REFERENCES

[1] Robey R, Schultz M, Crawford A, Sinner C. Review: Single-subject clinical-outcome research: designs, data, effect sizes, and analyses. Aphasiology 1999; 13(6):445–473.

[2] Ottenbacher K. Interrater agreement of visual analysis in single-subject decisions: Quantitative review and analysis. American Journal on Mental Retardation 1993.

[3] Kazdin A. Single-case research designs: Methods for clinical and applied settings. Oxford University Press, New York, 1982.

[4] Bloom M, Fischer J, Orme J, et al. Evaluating practice. Allyn & Bacon, 1982.

[5] Matyas T, Greenwood K. Visual analysis of single-case time series: Effects of variability, serial dependence, and magnitude of intervention effects. Journal of Applied Behavior Analysis 1990; 23(3):341.

[6] Johnson M, Ottenbacher K. Trend line influence on visual analysis of single-subject data in rehabilitation research. Disability & Rehabilitation 1991; 13(2):55–59.

[7] Ottenbacher K, Cusick A. An empirical investigation of interrater agreement for single-subject data using graphs with and without trend lines. Journal of the Association for Persons with Severe Handicaps 1991.

[8] Stocks J, Williams M. Evaluation of single subject data using statistical hypothesis tests versus visual inspection of charts with and without celeration lines. Journal of Social Service Research 1995; 20(3-4):105–126.

[9] Ottenbacher K. Visual inspection of single-subject data: An empirical analysis. Mental Retardation 1990.

[10] Krishef C. Fundamental approaches to single subject design and analysis. Krieger Pub. Co., 1991.

[11] Tryon W. A simplified time-series analysis for evaluating treatment interventions. Journal of Applied Behavior Analysis 1982; 15(3):423.

[12] Phillips J. Serially correlated errors in some single-subject designs. British Journal of Mathematical and Statistical Psychology 1983; 36(2):269–280.

[13] Toothaker L, Banz M, Noble C, Camp J, Davis D. N=1 designs: The failure of ANOVA-based tests. Journal of Educational and Behavioral Statistics 1983; 8(4):289–309.

[14] Sharpley C, Alavosius M. Autocorrelation in behavioral data: An alternative perspective. 1988.

[15] Suen H, Lee P, Owen S. Effects of autocorrelation on single-subject single-facet crossed-design generalizability assessment. Behavioral Assessment 1990; 12:305–315.

[16] Gottman J. Time-series analysis: A comprehensive introduction for social scientists. Cambridge University Press, Cambridge, 1981.

[17] Crosbie J. Interrupted time-series analysis with brief single-subject data. Journal of Consulting and Clinical Psychology 1993; 61(6):966.

[18] Rosner B, Munoz A, Tager I, Speizer F, Weiss S. The use of an autoregressive model for the analysis of longitudinal data in epidemiologic studies. Statistics in Medicine 1985; 4(4):457–467.

[19] Rosner B, Munoz A. Autoregressive modelling for the analysis of longitudinal data with unequally spaced examinations. Statistics in Medicine 1988; 7(1-2):59–71.

[20] Box G, Tiao G. Intervention analysis with applications to economic and environmental problems. Journal of the American Statistical Association 1975; 70(349):70–79.

[21] Box G, Jenkins G, Reinsel G. Time series analysis: Forecasting and control. Prentice Hall, 1994.

[22] DasGupta A. Asymptotic theory of statistics and probability. Springer Verlag, 2008.

[23] Ihaka R, Gentleman R. R: A language for data analysis and graphics. Journal of Computational and Graphical Statistics 1996; 5(3):299–314.

[24] Davis G, Wilcox M. Adult aphasia rehabilitation: Applied pragmatics. College-Hill Press, San Diego, CA, 1985.

[25] Zucker D, Ruthazer R, Schmid C. Individual (n-of-1) trials can be combined to give population comparative treatment effect estimates: methodologic considerations. Journal of Clinical Epidemiology 2010; 63(12):1312–1323.

[26] Golay X, Kollias S, Stoll G, Meier D, Valavanis A, Boesiger P. A new correlation-based fuzzy logic clustering algorithm for fMRI. Magnetic Resonance in Medicine 1998; 40(2):249–260.

[27] Liao W, et al. Clustering of time series data – a survey. Pattern Recognition 2005; 38(11):1857–1874.

[28] Kalpakis K, Gada D, Puttagunta V. Distance measures for effective clustering of ARIMA time-series. Proceedings of the IEEE International Conference on Data Mining, 2001; 273–280.

[29] Alonso A, Berrendero J, Hernandez A, Justel A. Time series clustering based on forecast densities. Computational Statistics & Data Analysis 2006; 51(2):762–776.

[30] Piccolo D. A distance measure for classifying ARIMA models. Journal of Time Series Analysis 1990; 11(2):153–164.

[31] Maharaj E. Cluster of time series. Journal of Classification 2000; 17(2):297–314.

[32] Corduas M, Piccolo D. Time series clustering and classification by the autoregressive metric. Computational Statistics & Data Analysis 2008; 52(4):1860–1872.

[33] Lehmann E, Romano J. Testing statistical hypotheses. Springer Verlag, 2005.

[34] Newbold P. The exact likelihood function for a mixed autoregressive-moving average process. Biometrika 1974; 61(3):423–426.
BIOGRAPHICAL SKETCH
Oleksandr Savenkov was born in 1983 in Ukraine. He obtained his degree in financial mathematics from Donetsk National University in 2004. After graduation he worked for two years as an economist at Raiffeisen Bank Aval. Oleksandr joined the Department of Statistics at the University of Florida as a graduate student in 2006. During his studies he was a teaching assistant for several undergraduate and graduate classes and a research assistant on several scientific projects.