
Journal of Statistical Planning and Inference 126 (2004) 441–460

www.elsevier.com/locate/jspi

Consistent estimation in generalized broken-line regression

Ryan Gill a, Michael Baron b,c,∗,1

a Department of Mathematics, University of Louisville, Louisville, KY 40292, USA
b Programs in Mathematical Sciences, University of Texas at Dallas, Richardson, TX 75083, USA
c Mathematical Sciences, IBM, T.J. Watson Research Center, Yorktown Heights, NY 10598, USA

Received 28 March 2002; accepted 5 September 2003

Abstract

A change-point model is considered where the canonical parameter of an exponential family drifts from its control value at an unknown time and changes according to a broken-line regression. Necessary and sufficient conditions are obtained for the existence of consistent change-point estimators. When sufficient conditions are met, it is shown that the maximum likelihood estimator of the change point is consistent, unlike the classical abrupt change-point models. Results are extended to the case of nonlinear trends and nonequidistant observations.
© 2003 Elsevier B.V. All rights reserved.

Keywords: Canonical parameter; Change-point problem; Consistency; Generalized regression; Maximum likelihood; Natural exponential family

1. Introduction

In the classical parametric change-point model, the data set X consists of two parts, (X_1, …, X_ν) being a sample from distribution P(·|θ_0) and (X_{ν+1}, …, X_n) from distribution P(·|θ_1). The change point ν is the parameter of interest, and θ_j, j = 0, 1, are known or unknown nuisance parameters. This defines an abrupt change model, which has been well studied during the recent decades. Classical references are, e.g., Basseville and Nikiforov (1993), Carlstein (1988), Chernoff and Zacks (1964), Cobb (1978), Hinkley (1970), and Page (1954); also see Bhattacharya (1995), Lai (1995) and Zacks (1983) for surveys of retrospective and sequential change-point estimation methods.

∗ Corresponding author. Programs in Mathematical Sciences, University of Texas at Dallas, Richardson, TX 75083, USA.

E-mail addresses: [email protected] (R. Gill), [email protected] (M. Baron).
1 Research is supported by the National Science Foundation, grant #9818959.

0378-3758/$ - see front matter © 2003 Elsevier B.V. All rights reserved.
doi:10.1016/j.jspi.2003.09.015


However, abrupt change models are found impractical in a number of applications. In many problems related to quality and process control, global climate, economics, pollution, etc., the distribution parameter goes out of control and changes gradually instead of jumping instantly to a new value and stabilizing there.

In this paper, we consider a gradual change-point model, where an observation X_j ∈ R^d, j = 1, …, n, collected at time t(j), has distribution P(·|θ_j) with

θ_j = θ_{t(j),ν} = θ_0 + δ(t(j) − ν)_+.   (1)

Here θ_0 is a control value of the parameter, and the system is “in control” until time ν. After that, the system drifts away from the control value at the rate of δ ∈ R^d units per one unit of time. For a multiparameter family with d > 1, the slope δ determines not only the speed, but also the direction of change. The sequence of sampling times {t(j)} is increasing to t(∞) = lim_{j→∞} t(j) ≤ +∞. Placing no additional assumptions on t(j) allows us to generalize (1) to nonlinear models (Remark 1 below).
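As an illustration (ours, not part of the paper), model (1)–(2) is straightforward to simulate. The sketch below assumes the Poisson family, whose canonical parameter is θ = log(mean), with sampling times t(j) = j; the function name and parameter values are hypothetical.

```python
import numpy as np

def simulate_gradual_change(n, nu, theta0=0.0, delta=0.02, seed=None):
    """Simulate X_1, ..., X_n from the gradual change-point model (1)
    for the Poisson family: the canonical parameter is theta = log(mean),
    so E X_j = exp(theta0 + delta * (t(j) - nu)_+), with t(j) = j."""
    rng = np.random.default_rng(seed)
    t = np.arange(1, n + 1, dtype=float)               # sampling times t(j) = j
    theta = theta0 + delta * np.maximum(t - nu, 0.0)   # broken-line trend (1)
    return t, rng.poisson(np.exp(theta))

t, x = simulate_gradual_change(n=200, nu=100.5, seed=0)
```

Before the change the X_j behave like Poisson(e^{θ_0}) noise; afterwards the drift of the canonical parameter shows up in the mean, the variance, and the tail probabilities simultaneously, which is the point of working with the full exponential family rather than a mean-plus-error model.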

Similar models are studied in Gupta and Ramanayake (2001), for the case of exponential distributions, and in Hušková (1999), Jarušková (1998), Siegmund and Zhang (1995), and a few other papers, for the case of a location parameter family. A commonly considered model involves “a trend” and “an error”, so that only the mean of observed values changes according to the broken-line regression. However, in reality, almost any sliding of a system from its control state results in a change (typically, an increase) of variance, probabilities of large deviations, and other characteristics, in addition to a changing mean. Therefore, we consider the general case of a d-variate natural exponential family

dP_θ(x) = e^{⟨θ,x⟩ − ψ(θ)} dP_0(x)   (2)

with the canonical parameter θ ∈ int(Θ), where

Θ = {θ ∈ R^d | ∫_{R^d} e^{⟨θ,x⟩} dP_0(x) < ∞}

is a convex parameter set, containing the control value θ_0, and ψ(θ) = log ∫_{R^d} e^{⟨θ,x⟩} dP_0(x) is the log Laplace transform of P_0. The function ψ(θ) is analytic and convex on Θ, with ∇ψ(θ) = E_θ X and ∇²ψ(θ) = Var_θ X for all θ ∈ int(Θ) (Brown, 1986).

The model described by (1) and (2) extends the class of generalized linear models (McCullagh and Nelder, 1983; Myers et al., 2002) to the case of a broken-line regression. From this point of view, the role of a link function is played by (∇ψ)^{−1}(μ), where μ = E_θ X. Model (1) allows a generalized linear trend to change at an unknown moment. In a more general setting, the t(j) represent covariates, not necessarily corresponding to sampling times. In this setting, ν is the value of a covariate that separates two different generalized linear trends of the response variable.

Remark 1 (Nonlinear models). Notice that the model defined by (1) is not restricted to linear trends of the parameter. Any nonlinear trend can fit into Eq. (1) by an appropriate transformation of the time scale and a consequent adjustment of the sampling times t(j). For example, consider a system that is suspected to go out of control at time τ and to change at a logarithmic speed, whereas observations are sampled at equidistant intervals, say, every hour. Then the distribution of the collected data coincides with the distribution of X_1, X_2, …, where X_j, observed at the time t(j) = log(j), has the distribution P_{θ_j}, with θ_j going out of control at the time ν = log(τ), in accordance with the basic linear model (1).

Remark 2 (Possible location of a change point). Classical abrupt change-point models assume that the change occurs at one of the sampling times, that is, ν = t(j) for some j. This is caused by the fact that any value of ν between t(j) and t(j + 1) generates the same joint distribution of (X_1, …, X_n). Hence, allowing ν to be a continuous parameter yields an unidentifiable (ill-posed) model.

Unlike the classical models, under model (1) any real value of ν in (−∞, t(n)] results in a different distribution of data. Thus, ν is not necessarily a time epoch when an observation has been collected. It is possible that two consecutive observations are collected at times ν − s and ν + t, but no data point is sampled at the time ν.

The case ν = t(n) is special; it means that all the observed data follow the same “in-control” distribution, i.e., a change has not occurred before the end of our sampling period. In this case, it is impossible to predict when it will occur (except for a Bayesian model with a prior distribution of ν). Any value of ν ≥ t(n) yields the same joint distribution of data; therefore, we let ν ∈ (−∞, t(n)] for identifiability purposes.

Remark 3 (Sampling times). Finally, we no longer assume that sampling occurs at equidistant times. The sequence of sampling times {t(j)} is almost arbitrary, but it is assumed to be known, which is crucial for estimating the change point.

In this paper, we investigate the possibility of estimating the change-point parameter ν consistently. In the abrupt change-point model, there does not exist a consistent change-point estimator (Hinkley, 1970). Indeed, the parameter θ stabilizes at a new value θ_1, and increasing the sample size by obtaining remote observations from the distribution P(·|θ_1) contributes little information about ν. Conversely, under model (1), the parameter drifts further and further away from its control value, making the occurrence of a change more and more apparent. As a result, consistent estimators of the change point exist for certain exponential families.

The maximum likelihood estimation procedure under model (1) is derived in Section 2. Section 3 establishes a necessary condition for the existence of a consistent estimator. Sufficient conditions for the existence of consistent estimators are obtained in Section 4. The proof is constructive; it is shown that, under these conditions, the maximum likelihood estimator is consistent.

2. Maximum likelihood estimation

For the regression-type gradual single change-point model where θ_0 and δ are known, consider a data set x = (x_1, …, x_n), where x_j is the observed value of X_j for j = 1, …, n. Let {f(·|θ) | θ ∈ Θ} be a family of densities, under a reference measure μ, associated with {P_θ | θ ∈ Θ}. Let z ≤ t(n) and let n_z be the number of observations taken at or before time z. Then the likelihood function of x given z is

L(x|z, θ_0, δ) = f(x_1|θ_0) ⋯ f(x_{n_z}|θ_0) f(x_{n_z+1}|θ_{t(n_z+1),z}) ⋯ f(x_n|θ_{t(n),z})

= ∏_{j=1}^{n} f(x_j|θ_0) ∏_{j=n_z+1}^{n} [f(x_j|θ_{t(j),z}) / f(x_j|θ_0)],   (3)

where

θ_{t,z} = θ_0 + δ(t − z)_+,   (4)

the random vectors X_j, j = 1, …, n, are independent, and ∏_k^n ≡ 1 if k > n.

This model may not be well defined for some values of z in (−∞, t(n)]. Let V_n be the set of z for which (2) along with (4) is well defined for all j = 1, …, n. From the convexity of Θ, V_n is connected and V_n = (t(n) − δ̄, ∞), where

δ̄ = sup{t | θ_0 + δt ∈ int(Θ)}.

Indeed, since Θ is convex and θ_0 ∈ int(Θ), it is necessary and sufficient for

{θ_0 + δ(t(j) − z)_+ | j = 1, …, n} ⊂ int(Θ)

that θ_0 + δ(t(n) − z)_+ ∈ int(Θ). The latter implies that (t(n) − z)_+ < δ̄, i.e., z > t(n) − δ̄. All such z constitute the set V_n.

Then, from (3), the maximum likelihood estimate of ν is

ν̂_MLE = ν̂_MLE(n) = argmax_{z∈V_n, z≤t(n)} L(X|z, θ_0, δ) = argmax_{z∈V_n, z≤t(n)} g(z)

(see Section 1, Remark 2), where

g(z) = log L(X|z, θ_0, δ) − log L(X|z, θ_0, 0)

= Σ_{j=n_z+1}^{n} {(t(j) − z)δ^T X_j − φ(t(j) − z) + φ(0)}   (5)

and φ(t) = ψ(θ_0 + δt). Thus, the maximum likelihood estimate can be computed from a cumulative sums process based on independent log-likelihood ratios. In general, it may not be unique; therefore, we consider the set of all maximum likelihood estimates of ν,

M_n = {z ≤ t(n) | g(z) = max_{w∈V_n} g(w)}.
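To make (5) concrete, here is a sketch (our illustration, with hypothetical names) for the Normal(θ, 1) family, where ψ(θ) = θ²/2, so φ(s) = (θ_0 + δs)²/2 and V_n is the whole real line; the exact maximization over a continuum is replaced by a fine grid search.

```python
import numpy as np

def g(z, t, x, theta0, delta):
    """Log-likelihood ratio g(z) of (5) for the Normal(theta, 1) family,
    where psi(theta) = theta**2 / 2 and phi(s) = psi(theta0 + delta*s)."""
    phi = lambda s: 0.5 * (theta0 + delta * s) ** 2
    s = np.maximum(t - z, 0.0)                     # (t(j) - z)_+
    return float(np.sum(s * delta * x - phi(s) + phi(0.0)))

def mle_change_point(t, x, theta0, delta, step=0.01):
    """Approximate argmax of g over z <= t(n) by a grid search,
    a simple stand-in for exact maximization over V_n."""
    grid = np.arange(t[0] - 1.0, t[-1] + step, step)
    return grid[int(np.argmax([g(z, t, x, theta0, delta) for z in grid]))]

rng = np.random.default_rng(1)
t = np.arange(1.0, 201.0)
nu, theta0, delta = 120.3, 0.0, 0.1
x = rng.normal(theta0 + delta * np.maximum(t - nu, 0.0), 1.0)
nu_hat = mle_change_point(t, x, theta0, delta)
```

Note that the candidate z need not coincide with a sampling time (Remark 2), which is why the grid is finer than the spacing of the t(j).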

Lemma 1. The function g(z) is

(a) analytic on V_n \ {t(1), …, t(n)};
(b) continuous on V_n;
(c) strictly concave down on (inf V_n, t(1)] and on [t(j), t(j + 1)] for any j = 1, …, n − 1.

Proof. Since φ is analytic on (t(i), t(i + 1)) and obviously any linear function is analytic, the sum of the (n − i) functions in (5) is analytic on (t(i), t(i + 1)), and (a) follows.

By (a), it suffices to show that g(z) is continuous at all the sampling times t(i). For i = 1, …, n,

lim_{z↑t(i)} g(z) = Σ_{j=(i−1)+1}^{n} lim_{z↑t(i)} {(t(j) − z)δ^T X_j − [φ(t(j) − z) − φ(0)]}

= Σ_{j=i}^{n} {(t(j) − t(i))δ^T X_j − [φ(t(j) − t(i)) − φ(0)]}

= 0 + Σ_{j=i+1}^{n} {(t(j) − t(i))δ^T X_j − [φ(t(j) − t(i)) − φ(0)]}

= g(t(i)).

Similarly, g(z) → g(t(i)) as z ↓ t(i), for i = 0, 1, …, n − 1. This proves the continuity of g(z).

Finally, for z ∈ (t(i), t(i + 1)) we have

g″(z) = Σ_{j=i+1}^{n} {−φ″(t(j) − z)} < 0,

which proves (c).

The last part of the proof is based on Lemma 2 below, which follows from Theorem 1.13(iv) of Brown (1986).

Lemma 2. For any minimal d-dimensional standard exponential family (2), ∇²ψ(θ) is strictly positive definite for all θ ∈ Θ.

3. Consistent estimation: necessary condition

It will be seen that the possibility of estimating the change-point parameter ν consistently depends on the exponential family (2) and on the sequence of sampling times {t(j)}. Unboundedness of ∇ψ(θ) = E_θ X is the key necessary condition for the existence of consistent estimators.

The study of consistency refers to a sequence {X_j, j ≥ 1}, where the X_j are distributed according to (1) and (2). The assumptions of Theorems 1–3 guarantee the existence of such a sequence. For an unbounded sequence of sampling times {t(j)}, Theorems 1 and 2 imply that δ̄ = +∞; thus the parameter set for the change point ν is (−∞, +∞).

The proof is based on the analysis of likelihood ratios

Λ_{y,z} = L(X|y, θ_0, δ)/L(X|z, θ_0, δ) = exp{g(y) − g(z)}.   (6)


Theorem 1. Under model (1), (2) with θ_0 + δτ ∈ int(Θ) for all τ ≥ 0, there does not exist a consistent estimator of the change point ν if

(a) the function φ′(τ) = δ^T ∇ψ(θ_0 + δτ) is bounded from above;
(b) lim inf_{j→∞} {t(j + 1) − t(j)} > 0 for the sequence of sampling times t(j).

Proof. The proof is based on the following contradiction. We will show that the existence of a consistent estimator yields E_{ν=u}(Λ_{u,v}) → ∞ as n → ∞ for u < v. On the other hand, boundedness of E_{ν=u}(Λ_{u,v}) follows from conditions (a) and (b).

Assume the existence of a consistent estimator T_n(X_1, …, X_n). Then, for any u,

P_{ν=u}(|T_n − u| < ε) → 1

as n → ∞. So pick u and v with u < v and consider the likelihood ratio

Λ = Λ_{u,v}(X) = (dP_{ν=u}/dP_{ν=v})(X),

which satisfies

E_v h(X)Λ(X) = E_u h(X)   (7)

for any measurable function h = h(x). Using (7) with the indicator function h_1(x) = I({x : |T_n − u| < ε}), we have

E_v(h_1 Λ) = E_u(h_1) = P_u(|T_n − u| < ε) → 1   (8)

as n → ∞. Also,

E_v(h_1) = P_v(|T_n − u| < ε) → 0   (9)

and

Var_v h_1 = (E_v h_1)(1 − E_v h_1) → 0.   (10)

Since E_v Λ = 1, (8) and (9) imply that

Cov_v(h_1, Λ) = E_v h_1 Λ − E_v h_1 E_v Λ → 1.   (11)

Then, from (10) and (11),

Var_v Λ = Cov_v²(h_1, Λ) / (Corr_v²(h_1, Λ) Var_v h_1) → ∞,   (12)

because the correlation coefficient is bounded between −1 and 1. Now, using (7) with h = Λ along with (12), one has

E_u Λ = E_v Λ² = Var_v Λ + (E_v Λ)² → ∞.   (13)

Now we will find a contradiction with (13) by obtaining a uniform upper bound for E_u Λ. Condition (b) guarantees the existence of positive N and Δ such that t(j + 1) − t(j) ≥ Δ for all j ≥ N. Then, let r = ⌈2(v − u)/Δ⌉. Also, let

M_j = sup{φ′(w) | t(j) − u ≤ w ≤ t(j) + v − 2u}

and

m_j = inf{φ′(w) | t(j) − v ≤ w ≤ t(j) − u}.

Since φ is convex, φ′ is nondecreasing, and M_j ≤ m_k as long as t(j) − 2u ≤ t(k) − 2v. Therefore, for any k, M_{N+kr+j} ≤ m_{N+(k+1)r+j}, because

t(N + (k + 1)r + j) − 2v ≥ t(N + kr + j) + rΔ − 2v ≥ t(N + kr + j) − 2u.

From (5) and (6), we have

log E_u Λ = A + Σ_{j=s}^{n} {log E_u e^{(v−u)δ^T X_j} − [φ(t(j) − u) − φ(t(j) − v)]}

= A + Σ_{j=s}^{n} {[φ(t(j) + v − 2u) − φ(t(j) − u)] − [φ(t(j) − u) − φ(t(j) − v)]}

≤ A + (v − u) Σ_{j=s}^{n} (M_j − m_j)

≤ A + (v − u)N(sup_w φ′(w) − m_s) + (v − u) Σ_{j=0}^{r−1} {Σ_{k=0}^{∞} (M_{N+kr+j} − m_{N+(k+1)r+j}) − m_{N+j}}

≤ A + (v − u)(N sup_w φ′(w) − N m_s − r m_N) < ∞,

where s = n_v + 1 and

A = Σ_{j=n_u+1}^{n_v} {log E_u e^{(t(j)−u)δ^T X_j} − [φ(t(j) − u) − φ(0)]}

= Σ_{j=n_u+1}^{n_v} {φ(2(t(j) − u)) − 2φ(t(j) − u) + φ(0)}.

This contradicts (13) and shows that there is no consistent estimator of ν.

Condition (a) of Theorem 1 amounts to boundedness of the derivative of ψ(θ) along at least one direction in R^d that is not orthogonal to δ. The two most typical situations are considered in the following corollaries.

Corollary 1 (Equally spaced observations). If condition (a) of Theorem 1 holds and t(j) = a + bj for some a ≥ 0 and b > 0, then there is no consistent estimator of ν.

448 R. Gill, M. Baron / Journal of Statistical Planning and Inference 126 (2004) 441–460

Corollary 2 (Univariate case). Let d = 1 and let t(j) satisfy condition (b) of Theorem 1. If δ is positive (negative) and ψ′(θ) is bounded from above (below), then there is no consistent estimator of ν.

To illustrate the result of Theorem 1, we consider the family of exponential distributions, which satisfies condition (a). This example is also applicable to the Beta(α, 1), single-parameter Pareto(α), Laplace(θ), and Weibull(λ, β) (known β) families of distributions, because they have the same canonical representation (2).

Example 1 (Inconsistency of ν̂ for exponential distributions). The family of Exponential(λ) distributions can be written in the form (2) with ψ(θ) = −log(1 − θ), θ = 1 − λ ∈ (−∞, 1). Let θ_0 = 0, δ = −1, and t(j) = j, so that observations X_1, …, X_n are collected at equally spaced times. Consider the estimator ν̃ = argmax_{z=1,…,n} L(X|z, 0, −1).

In order to show that ν̃ is not consistent, it is sufficient to show that P(log Λ_{ν−1,ν} < 0) does not converge to 1 as n → ∞. From (5),

log Λ_{ν−1,ν} = Σ_{j=ν}^{n} (−X_j + log[(j − ν + 2)/(j − ν + 1)]) = log ∏_{j=0}^{n−ν} Y_j + log(n − ν + 2),   (14)

where Y_j = exp{−X_{ν+j}} has the Beta(1 + j, 1) distribution.

For any distinct positive a_1, …, a_N, the product of independent Beta(a_j, 1) random variables has the cumulative distribution function

F(t) = ∏_{j=1}^{N} a_j Σ_{k=1}^{N} [(−1)^{N−1}/a_k] ∏_{i=1, i≠k}^{N} [1/(a_k − a_i)] t^{a_k}

for 0 < t < 1 (see Gill, 2002 for the proof). With a_j = 1 + j, N = n − ν + 1, and t = (n − ν + 2)^{−1}, from (14) we have

P(log Λ_{ν−1,ν} < 0) = F(1/(n − ν + 2)) = F(1/(N + 1))

= N! Σ_{k=1}^{N} [(−1)^{N−1}/k] ∏_{i=1, i≠k}^{N} [1/(k − i)] (1/(1 + N))^k

= 1 − Σ_{k=0}^{N} [N!/(k!(N − k)!)] (−1/(1 + N))^k

= 1 − (1 − 1/(1 + N))^N.

The last probability converges to 1 − e^{−1} ≈ 0.6321 as N → ∞. This result is supported by a Monte Carlo study in Gill (2002). Thus, ν̃ is not consistent, since P(log Λ_{ν−1,ν} < 0) does not converge to 1.
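The closed-form limit above is easy to check numerically; the snippet below (our illustration, with hypothetical names) evaluates 1 − (1 − 1/(N + 1))^N for growing N and compares it with 1 − e^{−1}.

```python
import math

def p_prefers_wrong(N):
    """P(log Lambda_{nu-1,nu} < 0) from Example 1: the probability that
    the likelihood prefers the wrong candidate nu - 1 over the true nu,
    equal to 1 - (1 - 1/(N + 1))**N with N = n - nu + 1."""
    return 1.0 - (1.0 - 1.0 / (N + 1)) ** N

limit = 1.0 - math.exp(-1.0)   # the large-sample limit, about 0.6321
values = {N: p_prefers_wrong(N) for N in (10, 100, 10_000, 1_000_000)}
```

Since the probability of even this single mistake stabilizes near 0.63 instead of vanishing, no amount of additional data forces the discrete maximizer onto the true change point.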

4. Consistent estimation: sufficient conditions

This section gives sufficient conditions on the family P_θ and the sampling times t(j) under which, unlike the classical abrupt change model, the maximum likelihood estimator ν̂_MLE(n) of the change point ν is consistent. A direct proof of consistency requires it to be shown that

P(|ν̂_MLE(n) − ν| ≥ ε) → 0   (15)

as n → ∞. If ν̂_MLE(n) is not defined uniquely, then (15) must hold for any sequence of maximum likelihood estimates z_n ∈ M_n.

It will be convenient to express (15) in terms of the following sets:

U_{z,k,λ} = {x | g(k) − g(z) ≤ λ},   (16)

U_A = ∪_{z∈A} U_{z,ν̂_MLE,0} = {x | A ∩ M_n ≠ ∅},

D⁻_{z,λ} = {x | lim_{y↑z} g′(y) ≤ λ},   (17)

D⁺_{z,λ} = {x | lim_{y↓z} g′(y) ≥ λ},   (18)

V_N = {i ∈ Z⁺ | n − N ≤ i < n and t(i + 1) − t(i) < 1},

W_N = {i ∈ Z⁺ | n − N ≤ i < n and t(i + 1) − t(i) ≥ 1}.

First, we restate (15) in the new notation as

P(U_{(−∞,ν−ε)} ∪ U_{(ν+ε,t(n)]}) → 0   (19)

as n → ∞. Then, the sets involved in (19) are connected with the simpler sets D⁻ and D⁺ by the following two lemmas.

Lemma 3. For any a ∈ [t(i), t(i + 1)) and b ∈ (t(i), t(i + 1)], one has U_{[t(i),b)} ⊂ D⁻_{b,0} and U_{(a,t(i+1)]} ⊂ D⁺_{a,0}.

Lemma 4. For any positive integer N < n, U_{(t(n−N),t(n)]} \ U_{t(n−N),ν,λ} is a subset of

∪_{i=n−N}^{n−1} D⁺_{t(i),λ*_i},

where λ*_i = λ/N if i ∈ V_N, and λ*_i = 0 if i ∈ W_N.

The proofs of these and further lemmata are given in the appendix.

According to Lemma 3, the maximum likelihood estimator ν̂_MLE cannot belong to the interval [t(i), b) if the left derivative of g at b is positive, and it cannot belong to (a, t(i + 1)] if the right derivative of g at a is negative. Also, by Lemma 4, if ν̂_MLE is in the interval (t(n − N), t(n)] and g(ν) > g(t(n − N)) + λ, then the right derivative of g must be at least λ*_i at some sampling time t(i) with i ≥ n − N.

Lemmas 3 and 4 allow estimation of the probability in (15) by means of the sets D⁻ and D⁺. The latter have a rather simple structure, determined by the first derivative of the log-likelihood ratio function g(z).

Lemma 5 (Monotone properties of D⁻_{z,λ} and D⁺_{z,λ}).

(a) For any z, D⁻_{z,λ} is increasing in λ.
(b) For any z, D⁺_{z,λ} is decreasing in λ.
(c) For any λ and i < n, D⁻_{z,λ} is increasing in z on (t(i), t(i + 1)].
(d) For any λ and i < n, D⁺_{z,λ} is decreasing in z on [t(i), t(i + 1)).

In order to estimate the probability in (15), we will derive upper bounds for P(U_{z,ν,λ}), P(D⁻_{z,λ}), and P(D⁺_{z,λ}). Here we use the Chernoff inequality

P(Y ≥ y) ≤ exp(−uy)M_Y(u),   (20)

which holds for all u > 0 for which M_Y(u), the moment generating function of the random variable Y, exists (e.g., Ross, 1994).
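As a quick sanity check of (20) (our illustration, not part of the paper), for a Gaussian Y the bound, optimized over u, equals exp(−(y − μ)²/2σ²) and indeed dominates the exact tail probability:

```python
import math

def chernoff_bound_normal(y, mu=0.0, sigma=1.0):
    """inf over u > 0 of exp(-u*y) * M_Y(u) for Y ~ Normal(mu, sigma^2),
    where M_Y(u) = exp(mu*u + sigma**2 * u**2 / 2); the minimizing
    u = (y - mu) / sigma**2 is valid for y > mu."""
    u = (y - mu) / sigma ** 2
    return math.exp(-u * y + mu * u + 0.5 * (sigma * u) ** 2)

def normal_tail(y, mu=0.0, sigma=1.0):
    """Exact P(Y >= y) via the complementary error function."""
    return 0.5 * math.erfc((y - mu) / (sigma * math.sqrt(2.0)))
```

The same one-line optimization over u is what Lemmas 6–8 below perform implicitly by choosing u small enough that the moment generating function of the relevant sum exists.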

Chernoff bounds for the probabilities mentioned will be expressed in terms of the following integrals:

I(a, b, u) = ∫_{(1−u)a+ub}^{a} φ″(w)(a − w) dw,

J(a, b, u) = (1 − u)I(a, b, u) + uI(b, a, 1 − u),

K(a, b) = ∫_{a}^{b} φ″(w) dw = φ′(b) − φ′(a).

Lemma 6 (Chernoff bound for U_{z,ν,λ}). For sufficiently small u > 0 and any ν, z ∈ V_n,

P(U_{z,ν,λ}) ≤ exp{uλ − Σ_{j=n_m+1}^{n} J((t(j) − ν)_+, (t(j) − z)_+, u)},

where m = min{ν, z}.


Lemma 7 (Chernoff bound for D⁻_{z,λ}). For sufficiently small u > 0 and any ν, z ∈ V_n,

P(D⁻_{z,λ}) ≤ exp{uλ − Σ_{j=n_z^−+1}^{n} [uK(t(j) − ν + u, t(j) − z) + I(t(j) − ν, t(j) − ν + 1, u)]},

where n_z^− = lim_{t↑z} n_t is the number of observations collected before time z.

Lemma 8 (Chernoff bound for D⁺_{z,λ}). For sufficiently small u > 0 and any ν, z ∈ V_n,

P(D⁺_{z,λ}) ≤ exp{−uλ − Σ_{j=n_z+1}^{n} [uK(t(j) − z, t(j) − ν − u) + I(t(j) − ν, t(j) − ν − 1, u)]}.

It remains to derive lower bounds for the integrals I, J, and K that appear in Lemmas 6–8.

Lemma 9. If φ″(w) ≥ C for all w ∈ [a, b] and some C > 0, then

I(a, b, u) ≥ Cu²(a − b)²/2,

J(a, b, u) ≥ Cu(1 − u)(a − b)²/2,

and

K(a, b) ≥ C(b − a).

We are now ready to prove the main result of this section.

Theorem 2. Consider model (1), (2) with θ_0 + δτ ∈ int(Θ) for all τ ≥ 0 and t(∞) = lim_{j→∞} t(j) = +∞. Suppose that φ″(w) ≥ C for all w ≥ 0 and some C > 0. Then the maximum likelihood estimator of the change point is consistent.

Proof. Based on Lemmas 3–9, we will prove that

P(|ν̂_MLE(n) − ν| ≥ ε) = P(U_{(−∞,ν−ε)} ∪ U_{(ν+ε,t(n)]})

≤ P(U_{(−∞,ν−ε)}) + P(U_{(ν+ε,t(n−⌊n^κ⌋)]}) + P(U_{(t(n−⌊n^κ⌋),t(n)]})   (21)

converges to 0 for any ε > 0, by showing that each probability in (21) converges to 0 for some κ ∈ (0, 1).

First, we notice that for any i ≤ n_{ν−ε} and b ∈ (t(i), t(i + 1)],

log P(U_{[t(i),b)}) ≤ log P(D⁻_{b,0})   (22)

≤ −(ε/2) Σ_{j=i+1}^{n} K(t(j) − ν + ε/2, t(j) − b)   (23)

≤ −(Cε/2)(n − i)(ν − b − ε/2)   (24)

≤ −(Cε²/4)(n − n_{ν−ε}),

if b ≤ ν − ε. Here, (22) follows from Lemma 3, (23) follows from Lemma 7 with z = b and u = ε/2, and (24) follows from Lemma 9.

A similar inequality holds for P(U_{(−∞,t(1))}). Then,

P(U_{(−∞,ν−ε)}) ≤ P(U_{(−∞,t(1))}) + Σ_{i=1}^{n_{ν−ε}−1} P(U_{[t(i),t(i+1))}) + P(U_{[t(n_{ν−ε}),ν−ε)})

≤ (n_{ν−ε} + 1) exp{−(Cε²/4)(n − n_{ν−ε})} → 0   (25)

as n → ∞.

Consider the second term in (21). Similarly to (22)–(24),

log P(U_{(a,t(i+1)]}) ≤ log P(D⁺_{a,0})

≤ −(ε/2) Σ_{j=i+1}^{n} K(t(j) − a, t(j) − ν − ε/2)

≤ −(Cε/2)(n − i)(a − ν − ε/2) ≤ −(Cε²/4) n^κ,

by Lemmas 3 and 8, for any i ∈ [n_{ν+ε}, n − ⌊n^κ⌋ − 1] and a ∈ (t(i), t(i + 1)] such that a ≥ ν + ε. Hence,

P(U_{(ν+ε,t(n−⌊n^κ⌋)]}) ≤ P(U_{(ν+ε,t(n_{ν+ε}+1)]}) + Σ_{i=n_{ν+ε}+1}^{n−⌊n^κ⌋−1} P(U_{(t(i),t(i+1)]})

≤ (n − ⌊n^κ⌋ − n_{ν+ε}) exp{−(Cε²/4) n^κ} → 0   (26)

as n → ∞.

Finally, consider the last probability in (21). Let λ = (Cε²/8)(n − n_{ν+ε}) and U_κ = U_{t(n−⌊n^κ⌋),ν,λ}. From Lemma 4,

P(U_{(t(n−⌊n^κ⌋),t(n)]}) ≤ P(U_κ) + P(U_{(t(n−⌊n^κ⌋),t(n)]} \ U_κ)

≤ P(U_κ) + P(∪_{i=n−⌊n^κ⌋}^{n−1} D⁺_{t(i),λ*_i})

≤ P(U_κ) + Σ_{i∈V_{⌊n^κ⌋}} P(D⁺_{t(i),λ/⌊n^κ⌋}) + Σ_{i∈W_{⌊n^κ⌋}} P(D⁺_{t(i),0}).   (27)

We will show that each term in (27) converges to 0. From Lemma 6 with z = t(n − ⌊n^κ⌋) and u = 1/2 and from Lemma 9,

log P(U_κ) ≤ λ/2 − Σ_{j=n_ν+1}^{n} J(t(j) − ν, (t(j) − t(n − ⌊n^κ⌋))_+, 1/2)

≤ λ/2 − (C/8) Σ_{j=n_ν+1}^{n} ((t(j) − ν) − (t(j) − t(n − ⌊n^κ⌋))_+)²

≤ λ/2 − (Cε²/8)(n − n_{ν+ε}) = −(Cε²/16)(n − n_{ν+ε})   (28)

for sufficiently large n. Therefore, as n → ∞,

P(U_κ) = O(exp{−Cε²n/16}) → 0.   (29)

Next, for any i ∈ V_{⌊n^κ⌋}, from Lemma 8 with z = t(i) and u = 1/2,

log P(D⁺_{t(i),λ/⌊n^κ⌋}) ≤ −λ/(2⌊n^κ⌋) ≤ −Cε²(n − n_{ν+ε})/(16 n^κ),

so that

Σ_{i∈V_{⌊n^κ⌋}} P(D⁺_{t(i),λ/⌊n^κ⌋}) ≤ n^κ exp{−Cε²(n − n_{ν+ε})/(16 n^κ)} → 0   (30)

as n → ∞.

The last term in (27) vanishes if the set W_{⌊n^κ⌋} is empty. Otherwise, let ρ be the cardinality of W_{⌊n^κ⌋}, and let w_i be the ith smallest element of W_{⌊n^κ⌋}. Then, applying Lemma 8 with z = t(w_i), λ = 0, u = ε/2 and Lemma 9,

Σ_{i∈W_{⌊n^κ⌋}} P(D⁺_{t(i),0}) = Σ_{i=1}^{ρ} P(D⁺_{t(w_i),0})

≤ Σ_{i=1}^{ρ} exp{−(ε/2) Σ_{j=w_i+1}^{n} K(t(j) − t(w_i), t(j) − ν − ε/2)}

≤ ρ exp{−(Cε/2)(t(n − ⌊n^κ⌋) − ν − ε/2)} → 0   (31)

as n → ∞, because ρ ≤ n^κ and t(w_i) ≥ t(n − ⌊n^κ⌋) → ∞.

From (29)–(31), we see that all three terms in (27) converge to 0. Combining this result with (25) and (26) shows that the probability in (21) also converges to 0, which proves the consistency of ν̂_MLE.

As seen from the next theorem, the maximum likelihood estimator of the change point is always consistent in the case when t(∞) < ∞, with no restrictions on the exponential family. The assumption of bounded sampling times is rather impractical; however, generalized broken-line regression models with lim t(i) < ∞ can be used for covariates other than times of observations. Also, according to Remark 1 of Section 1, this model is equivalent to a nonlinear model with t(∞) = ∞, where the parameter θ_j leaves its control state and converges to a new value.

Theorem 3. If t(∞) < ∞ and θ_0 + δτ ∈ int(Θ) for all 0 ≤ τ ≤ t(∞), then the maximum likelihood estimator of the change point is consistent.

Proof. Similarly to (21) and (27),

P(|ν̂_MLE − ν| ≥ ε) ≤ P(U_{(−∞,z)}) + P(U_{[z,ν−ε)}) + P(U_{(ν+ε,t(n−⌊n^κ⌋)]})

+ P(U_κ) + Σ_{i∈V_{⌊n^κ⌋}} P(D⁺_{t(i),λ*_i}) + Σ_{i∈W_{⌊n^κ⌋}} P(D⁺_{t(i),0}).

Boundedness of the sequence t(j) implies, by Lemma 2, that φ″(t) is separated from 0 on any closed interval that is contained in [0, δ̄). Let Δ = min{δ̄/2, 1} and z = min{ν − ε, t(∞) − δ̄ + Δ}. Then, for any ν ∈ [z, t(∞)] and t ≤ t(∞),

0 ≤ (t − ν)_+ ≤ (t(∞) − z)_+ = (δ̄ − Δ)_+ < δ̄,

and φ″((t − ν)_+) = δ^T ∇²ψ(θ_0 + δ(t − ν)_+)δ ≥ C is separated from 0 by some C > 0. Repeating the proof of Theorem 2, we obtain that

P(U_{[z,ν−ε)}), P(U_{(ν+ε,t(n−⌊n^κ⌋)]}), P(U_κ), Σ_{i∈V_{⌊n^κ⌋}} P(D⁺_{t(i),λ*_i}) → 0

as n → ∞, because (25), (26), (29), and (30) do not require that t(∞) = ∞. Also,

Σ_{i∈W_{⌊n^κ⌋}} P(D⁺_{t(i),0}) ≤ P(W_{⌊n^κ⌋} ≠ ∅) → 0

in the case of a convergent {t(n)}, because t(i + 1) − t(i) < 1 for all i ≥ n − ⌊n^κ⌋ if n is sufficiently large.

Thus, it remains to prove that P(U_{(−∞,z)}) = P(U_{[t(∞)−δ̄,z)}) → 0. Notice that z ≤ t(∞) − δ̄/2, hence t(n) > z and n_z^− < n for sufficiently large n. From Lemmas 3 and 7 with u = ε/2,

log P(U_{[t(∞)−δ̄,z)}) ≤ log P(D⁻_{z,0}) ≤ −(ε/2) Σ_{j=n_z^−+1}^{n} K(t(j) − ν + ε/2, t(j) − z).   (32)

Here n − n_z^− → ∞ and

K(t(j) − ν + ε/2, t(j) − z) ≥ M(j, ε)(ν − ε/2 − z) ≥ M(j, ε) ε/2,

where M(j, ε) = min{φ″(t) | t(j) − ν + ε/2 ≤ t ≤ t(j) − z}. We have

t(j) − z ≤ max{δ̄ − Δ, t(∞) − ν − ε} ≤ max{δ̄ − Δ, δ̄ − ε},

and, for sufficiently large j,

t(j) − ν + ε/2 ≥ t(j) − t(∞) + ε/2 ≥ ε/4.

Hence,

M(j, ε) ≥ min_{t∈[ε/4, δ̄−min{ε,Δ}]} φ″(t) > 0.

Thus, the sum in (32) diverges to +∞, so that P(U_{[t(∞)−δ̄,z)}) → 0, which concludes the proof of the theorem.

Remark 4 (Confidence estimation of the change point). The proofs of Theorems 2 and 3 allow confidence estimation of the change-point parameter ν. The Chernoff bounds (25), (26), (30), and (31) obtained in the proof of Theorem 2, along with (21) and (27), estimate the probability that ν̂_MLE does not belong to the ε-neighborhood of ν. Subtracting their sum from 1 gives a lower bound for the coverage probability of the interval ν̂_MLE ± ε. As often happens, this coverage probability depends on the unknown parameter ν. However, under the assumption that the change point occurred, say, in a large interval [a, b], one can estimate the confidence level and use ν̂_MLE ± ε as a confidence interval for ν.

5. Conclusion

We showed that, unlike the classical abrupt change-point problems, it may be possible to estimate the change-point parameter consistently in the case of gradual changes. For the existence of consistent estimators, it is sufficient to assume an exponential family with φ″(τ) separated from 0. In this case, the maximum likelihood estimator is consistent. It was also shown that no consistent estimators exist when φ′(τ) is bounded from above. In stochastic modeling, an absolute majority of commonly used exponential families fall into one or the other category.


For example, according to our results, consistent estimation of the change-point parameter is possible for the Normal(θ, 1) and Poisson(θ) families with any δ and for the Gamma(α, 1) family with δ > 0, but not in the case of the Binomial(n, p), Beta(α, 1), Exponential(λ), or Normal(0, σ²) families.
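This classification can be checked directly from φ(τ) = ψ(θ_0 + δτ). The sketch below (our illustration, with θ_0 = 0) evaluates φ′ and φ″ for the Normal(θ, 1) family (ψ(θ) = θ²/2, δ = 1) and for the exponential family in the canonical form of Example 1 (ψ(θ) = −log(1 − θ), δ = −1):

```python
import numpy as np

tau = np.linspace(0.0, 50.0, 501)

# Normal(theta, 1): phi(tau) = tau**2 / 2
phi1_normal = tau                  # phi'(tau) = tau, unbounded above
phi2_normal = np.ones_like(tau)    # phi''(tau) = 1 >= C > 0: Theorem 2 applies

# Exponential, Example 1: phi(tau) = psi(-tau) = -log(1 + tau)
phi1_exp = -1.0 / (1.0 + tau)      # phi'(tau) bounded above by 0: Theorem 1 applies
phi2_exp = (1.0 + tau) ** -2       # phi''(tau) -> 0, not separated from 0
```

φ″ ≥ 1 puts the Normal family under the sufficient condition of Theorem 2, while the bounded φ′ of the exponential family triggers the impossibility result of Theorem 1.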

In the case of a bounded sequence of sampling times, no further assumptions are needed, and the maximum likelihood estimator is consistent under any exponential family. It is also consistent in the situation when the canonical parameter gradually drifts from its control state and converges to another value. This includes the case when the parameter approaches the finite boundary of its domain, e.g., in the case of a Gamma family with an expectation tending to infinity.

Appendix A

A.1. Proof of Lemma 3

Let x ∈ U_{[t(i),b)}. Then the maximum value of g on [t(i), b] is attained at some z ∈ [t(i), b) ∩ M_n. Since g is concave down on (t(i), t(i + 1)), it follows that

lim_{y↑b} g′(y) < lim_{y↓z} g′(y) ≤ 0.

Thus, x ∈ D⁻_{b,0}, so that U_{[t(i),b)} ⊂ D⁻_{b,0}.

Similarly, let x ∈ U_{(a,t(i+1)]}. Then the maximum value of g on [a, t(i + 1)] is attained at some z ∈ (a, t(i + 1)] ∩ M_n, and

lim_{y↓a} g′(y) > lim_{y↑z} g′(y) ≥ 0.

Thus, x ∈ D⁺_{a,0}, so that U_{(a,t(i+1)]} ⊂ D⁺_{a,0}.

A.2. Proof of Lemma 4

Suppose x ∉ ∪_{i=n−N}^{n−1} D⁺_{t(i),λ*_i}. Then lim_{y↓t(i)} g′(y) < λ*_i for all i ∈ [n − N, n − 1], which implies that g′(z) < λ*_i for z ∈ (t(i), t(i + 1)). Hence,

g(z) − g(t(i)) < λ*_i (z − t(i))

for z ∈ [t(i), t(i + 1)].

To show that x ∉ U_{(t(n−N),t(n)]} \ U_{t(n−N),ν,λ}, it suffices to show that x ∉ U_{(t(n−N),t(n)]} if x ∉ U_{t(n−N),ν,λ}. The latter implies that

g(ν) − g(t(n − N)) > λ.

Choose arbitrary j ∈ [n − N, n − 1] and z ∈ [t(j), t(j + 1)]. Then

g(ν) − g(z) = g(ν) − g(t(n − N)) + Σ_{i=n−N}^{j−1} {g(t(i)) − g(t(i + 1))} + g(t(j)) − g(z)

> λ − Σ_{i=n−N}^{j−1} λ*_i (t(i + 1) − t(i)) − λ*_j (z − t(j))

= λ − Σ_{i∈V_N∩[n−N,j−1]} (λ/N)(t(i + 1) − t(i)) − λ*_j (z − t(j))

≥ λ − Σ_{i∈V_N∩[n−N,j−1]} λ/N − λ*_j (z − t(j))

≥ λ − λ(j − n + N + 1)/N ≥ 0,

so that x ∉ U_{(t(n−N),t(n)]}. The lemma is proved.

A.3. Proof of Lemma 5

Statements (a) and (b) are obvious, whereas (c) and (d) follow from the concavity of $g(z)$, which implies that $g'(z)$ is a decreasing function.

A.4. Proof of Lemma 6

It follows from (5) and (6) that
$$P(U_{z,\nu,\varepsilon}) = P(Y \ge y), \tag{A.1}$$
where
$$Y = \sum_{j=n_m+1}^{n} \bigl[(t(j)-z)_+ - (t(j)-\nu)_+\bigr]\,\delta^{\mathrm T} X_j$$
and
$$y = -\varepsilon + \sum_{j=n_m+1}^{n} \bigl[\varphi\bigl((t(j)-z)_+\bigr) - \varphi\bigl((t(j)-\nu)_+\bigr)\bigr].$$
Since $E_\theta \exp\{\delta^{\mathrm T}X\} = \exp\{\psi(\theta+\delta) - \psi(\theta)\}$ (see Brown, 1986), the moment generating function of $Y$ can be computed as
$$M_Y(u) = E\exp(uY) = \prod_{j=n_m+1}^{n} \exp\bigl\{\varphi\bigl((1-u)(t(j)-\nu)_+ + u(t(j)-z)_+\bigr) - \varphi\bigl((t(j)-\nu)_+\bigr)\bigr\}.$$
According to the Chernoff inequality (20),
$$P(Y \ge y) \le \exp\Bigl\{u\varepsilon - \sum_{j=n_m+1}^{n} \alpha_j(z)\Bigr\} \tag{A.2}$$
with
$$\begin{aligned}
\alpha_j(z) &= (1-u)\,\varphi\bigl((t(j)-\nu)_+\bigr) + u\,\varphi\bigl((t(j)-z)_+\bigr) - \varphi\bigl((1-u)(t(j)-\nu)_+ + u(t(j)-z)_+\bigr)\\
&= (1-u)\int_{(1-u)(t(j)-\nu)_+ + u(t(j)-z)_+}^{(t(j)-\nu)_+} \varphi''(w)\,\bigl((t(j)-\nu)_+ - w\bigr)\,\mathrm dw\\
&\quad + u\int_{(1-u)(t(j)-\nu)_+ + u(t(j)-z)_+}^{(t(j)-z)_+} \varphi''(w)\,\bigl((t(j)-z)_+ - w\bigr)\,\mathrm dw\\
&= J\bigl((t(j)-\nu)_+,\ (t(j)-z)_+;\ u\bigr),
\end{aligned}$$
where we used the integral remainder form of Taylor's theorem as given in Thompson (1989).
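The Chernoff inequality invoked above can be sanity-checked numerically. The following toy sketch (our own setting, not the paper's $Y$) compares a Monte Carlo tail probability for a Gaussian sum with the corresponding optimized Chernoff bound.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy check of the Chernoff inequality P(Y >= y) <= exp(-u*y) * M_Y(u):
# for Y a sum of m iid N(0,1) variables, M_Y(u) = exp(m*u^2/2), and
# minimizing exp(-u*y + m*u^2/2) over u > 0 gives the bound exp(-y^2/(2m)).
m, y = 20, 12.0
bound = np.exp(-y ** 2 / (2 * m))
Y = rng.standard_normal((200_000, m)).sum(axis=1)
mc_tail = (Y >= y).mean()
print(mc_tail, bound)  # the Monte Carlo tail estimate lies below the bound
```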

A.5. Proof of Lemma 7

From (5) and (17),
$$P(D^-_{z,\varepsilon}) = P(Y \ge y), \tag{A.3}$$
where
$$Y = \sum_{j=n_z+1}^{n} \delta^{\mathrm T} X_{t(j)}$$
and
$$y = -\varepsilon + \sum_{j=n_z+1}^{n} \varphi'(t(j)-z).$$
Then, from the Chernoff inequality (20),
$$P(Y \ge y) \le \exp\Bigl\{u\varepsilon - \sum_{j=n_z+1}^{n} \alpha_j(z)\Bigr\}, \tag{A.4}$$
where
$$\begin{aligned}
\alpha_j(z) &= \varphi(t(j)-\nu) - \varphi(t(j)-\nu+u) + u\,\varphi'(t(j)-z)\\
&= u\int_{t(j)-\nu+u}^{t(j)-z} \varphi''(w)\,\mathrm dw + \int_{t(j)-\nu}^{t(j)-\nu+u} \varphi''(w)\,(w - t(j) + \nu)\,\mathrm dw\\
&= u\,K\bigl(t(j)-\nu+u,\ t(j)-z\bigr) + I\bigl(t(j)-\nu,\ t(j)-\nu+1,\ u\bigr).
\end{aligned}$$


A.6. Proof of Lemma 8

This proof essentially repeats the steps in the proof of Lemma 7, with
$$Y = -\sum_{j=n_z+1}^{n} \delta^{\mathrm T} X_{t(j)}$$
and
$$y = \varepsilon - \sum_{j=n_z+1}^{n} \varphi'(t(j)-z).$$

A.7. Proof of Lemma 9

If $\varphi''(w) \ge C$ on $[a, b]$, then
$$I(a,b,u) \ge C\int_{(1-u)a+ub}^{a} (a-w)\,\mathrm dw = \tfrac12\,Cu^2(a-b)^2. \tag{A.5}$$
The inequality for $J(a,b,u)$ follows directly from (A.5), and the last statement of the lemma is obvious.
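The integral evaluation in (A.5) can be verified symbolically; this sketch uses SymPy, with $\varphi''$ replaced by the constant 1 since the bound $C$ factors out of the integral.

```python
import sympy as sp

a, b, u, w = sp.symbols('a b u w', real=True)

# Evaluate the remainder integral from (A.5) with phi'' replaced by 1:
# the lower limit is the convex combination (1-u)a + u*b.
lower = (1 - u) * a + u * b
integral = sp.integrate(a - w, (w, lower, a))

# The result should equal u^2 (a - b)^2 / 2 identically.
diff = sp.simplify(integral - u ** 2 * (a - b) ** 2 / 2)
print(diff)  # 0
```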

References

Basseville, M., Nikiforov, I.V., 1993. Detection of Abrupt Changes: Theory and Application. PTR Prentice-Hall, Englewood Cliffs, NJ.

Bhattacharya, P.K., 1995. Some aspects of change-point analysis. In: Change-Point Problems, IMS Lecture Notes—Monograph Series, Vol. 23. Institute of Mathematical Statistics, Hayward, CA, pp. 28–56.

Brown, L.D., 1986. Fundamentals of Statistical Exponential Families with Applications in Statistical Decision Theory. IMS Lecture Notes—Monograph Series, Vol. 9. Institute of Mathematical Statistics, Hayward, CA.

Carlstein, E., 1988. Nonparametric estimation of a change-point. Ann. Statist. 16, 188–197.

Chernoff, H., Zacks, S., 1964. Estimating the current mean of a normal distribution which is subject to changes in time. Ann. Math. Statist. 35, 999–1028.

Cobb, G.W., 1978. The problem of the Nile: conditional solution to a change-point problem. Biometrika 62, 243–251.

Gill, R., 2002. Introduction to generalized broken-line regression. Ph.D. Dissertation, The University of Texas at Dallas.

Gupta, A.K., Ramanayake, A., 2001. Change points with linear trends for the exponential distribution. J. Statist. Plann. Inference 93, 181–195.

Hinkley, D.V., 1970. Inference about the change-point in a sequence of random variables. Biometrika 57, 1–17.

Hušková, M., 1999. Gradual changes versus abrupt change. J. Statist. Plann. Inference 76, 109–125.

Jarušková, D., 1998. Testing appearance of linear trend. J. Statist. Plann. Inference 70, 263–276.

Lai, T.L., 1995. Sequential changepoint detection in quality control and dynamical systems. J. R. Statist. Soc. B 57, 613–658.

McCullagh, P., Nelder, J.A., 1983. Generalized Linear Models. Chapman & Hall, London.

Myers, R.H., Montgomery, D.C., Vining, G.G., 2002. Generalized Linear Models: With Applications in Engineering and the Sciences. Wiley-Interscience, New York.

Page, E.S., 1954. Continuous inspection schemes. Biometrika 41, 100–115.

Ross, S.M., 1994. A First Course in Probability. Macmillan, New York.

Siegmund, D.O., Zhang, H., 1995. Confidence regions in broken line regression. In: Change-Point Problems, IMS Lecture Notes—Monograph Series, Vol. 23. Institute of Mathematical Statistics, Hayward, CA, pp. 292–316.

Thompson, H.B., 1989. Taylor's theorem using the generalized Riemann integral. Amer. Math. Monthly 96, 346–350.

Zacks, S., 1983. Survey of classical and Bayesian approaches to the change-point problem: fixed sample and sequential procedures in testing and estimation. In: Recent Advances in Statistics. Academic Press, New York, pp. 245–269.