econometrica, vol. 68, no. 5 september, 2000 , 1249 1280bhansen/718/parkphillips2000.pdf ·...

32
Ž . Econometrica, Vol. 68, No. 5 September, 2000 , 1249]1280 NONSTATIONARY BINARY CHOICE BY JOON Y. PARK AND PETER C. B. PHILLIPS 1 This paper develops an asymptotic theory for time series binary choice models with nonstationary explanatory variables generated as integrated processes. Both logit and Ž . probit models are covered. The maximum likelihood ML estimator is consistent but a new phenomenon arises in its limit distribution theory. The estimator consists of a mixture of two components, one of which is parallel to and the other orthogonal to the direction of the true parameter vector, with the latter being the principal component. The ML estimator is shown to converge at a rate of n 3r4 along its principal component but has the slower rate of n 1r4 convergence in all other directions. This is the first instance known to the authors of multiple convergence rates in models where the regressors have the Ž . same full rank stochastic order and where the parameters appear in linear forms of these regressors. It is a consequence of the fact that the estimating equations involve nonlinear integrable transformations of linear forms of integrated processes as well as polynomials in these processes, and the asymptotic behavior of these elements is quite different. The limit distribution of the ML estimator is derived and is shown to be a mixture of two mixed normal distributions with mixing variates that are dependent upon Brownian local time as well as Brownian motion. It is further shown that the sample proportion of binary choices follows an arc sine law and therefore spends most of its time in the neighborhood of zero or unity. The result has implications for policy decision making that involves binary choices and where the decisions depend on economic fundamentals that involve stochastic trends. Our limit theory shows that, in such conditions, policy is likely to manifest streams of little intervention or intensive intervention. KEYWORDS: Binary choice model, Brownian motion, Brownian local time, dual conver- gence rates, integrated time series, maximum likelihood estimation. 1. INTRODUCTION BINARY CHOICE MODELS ARE NOW A STANDARD TOOL of microeconometrics and have been widely used in empirical research. While most of the empirical applications have been to cross-section data, there are many situations in time series modeling where binary choice dependent variables arise. Ongoing eco- nomic decisions by individual agents over time constitute prime examples: consumers decide to buy certain durable goods, firms choose to retain or fire employees, unions decide to strike or capitulate, women choose to join the workforce, and so on. At the national level, monetary authorities decide to intervene in the market for securities to change interest rates, or intervene in the foreign exchange market to influence exchange rates. One way of formally 1 Park thanks the Department of Economics at Rice University, where he is an Adjunct Professor, for its continuing hospitality, and the Cowles Foundation for support during a visit in June, 1998. Phillips thanks the NSF for support under Grant Nos. SBR 94-22922 and SBR 97-30295. The paper was typed by the authors in SW2.5 and computational work was performed in GAUSS. 1249

Upload: others

Post on 24-Feb-2021

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Econometrica, Vol. 68, No. 5 September, 2000 , 1249 1280bhansen/718/ParkPhillips2000.pdf · variable yU is assumed to be latent and the observed variable is simply the t indicator

Ž .Econometrica, Vol. 68, No. 5 September, 2000 , 1249]1280

NONSTATIONARY BINARY CHOICE

BY JOON Y. PARK AND PETER C. B. PHILLIPS1

This paper develops an asymptotic theory for time series binary choice models withnonstationary explanatory variables generated as integrated processes. Both logit and

Ž .probit models are covered. The maximum likelihood ML estimator is consistent but anew phenomenon arises in its limit distribution theory. The estimator consists of amixture of two components, one of which is parallel to and the other orthogonal to thedirection of the true parameter vector, with the latter being the principal component. TheML estimator is shown to converge at a rate of n3r4 along its principal component but hasthe slower rate of n1r4 convergence in all other directions. This is the first instance knownto the authors of multiple convergence rates in models where the regressors have the

Ž .same full rank stochastic order and where the parameters appear in linear forms ofthese regressors. It is a consequence of the fact that the estimating equations involvenonlinear integrable transformations of linear forms of integrated processes as well aspolynomials in these processes, and the asymptotic behavior of these elements is quitedifferent. The limit distribution of the ML estimator is derived and is shown to be amixture of two mixed normal distributions with mixing variates that are dependent uponBrownian local time as well as Brownian motion.

It is further shown that the sample proportion of binary choices follows an arc sine lawand therefore spends most of its time in the neighborhood of zero or unity. The result hasimplications for policy decision making that involves binary choices and where thedecisions depend on economic fundamentals that involve stochastic trends. Our limittheory shows that, in such conditions, policy is likely to manifest streams of littleintervention or intensive intervention.

KEYWORDS: Binary choice model, Brownian motion, Brownian local time, dual conver-gence rates, integrated time series, maximum likelihood estimation.

1. INTRODUCTION

BINARY CHOICE MODELS ARE NOW A STANDARD TOOL of microeconometrics andhave been widely used in empirical research. While most of the empiricalapplications have been to cross-section data, there are many situations in timeseries modeling where binary choice dependent variables arise. Ongoing eco-nomic decisions by individual agents over time constitute prime examples:consumers decide to buy certain durable goods, firms choose to retain or fireemployees, unions decide to strike or capitulate, women choose to join theworkforce, and so on. At the national level, monetary authorities decide tointervene in the market for securities to change interest rates, or intervene inthe foreign exchange market to influence exchange rates. One way of formally

1 Park thanks the Department of Economics at Rice University, where he is an Adjunct Professor,for its continuing hospitality, and the Cowles Foundation for support during a visit in June, 1998.Phillips thanks the NSF for support under Grant Nos. SBR 94-22922 and SBR 97-30295. The paperwas typed by the authors in SW2.5 and computational work was performed in GAUSS.

1249

Page 2: Econometrica, Vol. 68, No. 5 September, 2000 , 1249 1280bhansen/718/ParkPhillips2000.pdf · variable yU is assumed to be latent and the observed variable is simply the t indicator

J. Y. PARK AND P. C. B. PHILLIPS1250

characterizing these situations is through a binary choice model with covariatesthat are relevant to the choice being made. In such time series applications, thecovariates will often involve nonstationary data reflecting relevant economicconditions. For instance, monetary authority decisions can be assumed to bebased on macroeconomic fundamentals, which will include such variables as realoutput and its components, inflation, unemployment rates, and other indicatorsof economic conditions and performance. Many of these variables displaynonstationary characteristics. In such situations, we may well expect the econo-metric theory of binary choice models to be different from that of traditionalcross-section theory which relies heavily on the simplifying assumption ofindependence across observations.

This paper seeks to develop a new theory for binary choice regressions thataccommodates nonstationary data. In particular, we study binary choice modelswith covariates that are integrated time series and develop a new limit theory for

Ž .the maximum likelihood ML estimation of such systems. Since ML estimationin discrete choice models involves nonlinear optimization, the asymptotic theoryin this paper involves asymptotics for nonlinear functions of integrated time

Ž Ž ..series. Some recent work by the authors Park and Phillips 1998, 1999 hasprovided techniques for analyzing nonlinear regressions with nonstationary timeseries, and this paper shows how to utilize some of those techniques in thecontext of discrete choice models. The main theoretical contribution of thepaper is to derive an asymptotic theory for the ML estimation of nonlineardiscrete choice models with covariates that are integrated time series. Some ofthe results obtained here are directly applicable in the wider context of Mestimation. They also lay the groundwork for the asymptotic analysis of indexmodels consisting of integrated time series.

One of the main findings of the paper is that there are dual rates ofconvergence in binary choice models with integrated regressors. There is a fastrate of convergence of n3r4 in a direction that is orthogonal to that of the truecoefficient vector b . A slower rate of convergence of n1r4 applies in all other0directions. Such dual convergence rates are unexpected in econometric modelswhere the covariates have the same full rank stochastic order, as they do here,and the parameters appear in linear forms of these regressors. In fact, to thebest of our present knowledge, this is the first instance of the phenomenon inasymptotic statistical theory. The explanation for the phenomenon in the pre-sent case is that the nonlinearity arising from the discrete choice probabilityframework confabulates the signal from the regressors. Since the signal worksthrough the probability function, it involves sample moments of the covariates xtin conjunction with scalar functionals of a distribution function evaluated at thelinear form xX b . These nonlinear sample moment functionals have stochastict 0orders that depend on the direction in which they are evaluated. In effect, thereis more sample information about b in directions orthogonal to b than there0are in other directions and, in particular, along b . Heuristically, the signal from0x is attenuated along b because large deviations of xX b contribute less tot 0 t 0the sample second moment since they are attenuated by the scaling of a density

Page 3: Econometrica, Vol. 68, No. 5 September, 2000 , 1249 1280bhansen/718/ParkPhillips2000.pdf · variable yU is assumed to be latent and the observed variable is simply the t indicator

NONSTATIONARY BINARY CHOICE 1251

function that is evaluated in the same direction and that, by its very nature,downweights large deviations. Thus, by virtue of the formulation of the model,and somewhat ironically, there is less information in the data in the direction ofthe true parameter than there is in the orthogonal direction.

Another finding of the paper is that in nonstationary binary choice the sampleproportion of binary choices follows an arc sine law and therefore spends mostof its time in the neighborhood of zero or unity, just as a random walk spendsmost of its time on one side of the origin or the other. This result has sometestable empirical implications for policy decision making. In particular, if policyinvolves market interventions and these are influenced by any factor thatinvolves a stochastic trend, then the theory suggests that market interventionswill most likely occur in streams of little intervention or large numbers ofinterventions.

The paper is organized as follows. Section 2 outlines the model, assumptions,and gives some preliminary results. Section 3 gives the main results on the limittheory of the ML estimator. Section 4 gives a brief numerical illustration of theeffects on nonstationarity on logit and probit estimates. Section 5 concludes.Some useful lemmas are given in Appendix A, Appendix B gives proofs of themain theorems, and notation is summarized in Appendix C.

2. THE MODEL, ASSUMPTIONS, AND PRELIMINARY RESULTS

We consider the regression model given by

Ž . U X1 y sx b y« ,t t 0 t

where x is a vector of explanatory variables and « is an error. The dependentt tvariable yU is assumed to be latent and the observed variable is simply thetindicator

Ž . � U 42 y s1 y G0 .t t

Ž . Ž .The model given by 1 and 2 is a standard binary choice model. As usual, weŽ .assume that x is predetermined, i.e., x is adapted to some filtration FF ,t tq1 t

with respect to which « is measurable.tŽ . Ž .The theory of the binary choice model in 1 and 2 when x is a stationaryt

and ergodic process or short memory time series is obtained by standardmethods. The simple case where x is iid is included in many econometrics textstŽ Ž .. Ž Ž ..e.g., Amemiya 1985 and review articles Dhrymes 1986 , while dynamiccases with weakly dependent data are covered in the theory in WooldridgeŽ . Ž .1994 and White 1994 . The present paper looks, instead, at the case where xtis nonstationary. In particular, we assume that x is an integrated time series,tpossibly of the ARIMA type. All the remaining features of the model areidentical to those of the standard parametric binary choice model. Thus, we let« be iid conditionally on FF with distribution function F, which is assumed tot ty1be known and standardized, the primary cases of interest being the standard

Page 4: Econometrica, Vol. 68, No. 5 September, 2000 , 1249 1280bhansen/718/ParkPhillips2000.pdf · variable yU is assumed to be latent and the observed variable is simply the t indicator

J. Y. PARK AND P. C. B. PHILLIPS1252

Ž . Žnormal leading to the probit model and the standard logistic leading to the. mlogit model . Also, we let b be an interior point of a subset of R which we0

assume to be compact and convex.Ž . Ž . Ž X .In this framework we have E y NFF sP y s1NFF sF x b . Then,t ty1 t ty1 t 0

defining u as the residual int

Ž . Ž X .3 y sF x b qu ,t t 0 t

Ž .it is apparent that u , FF is a martingale difference sequence with conditionalt tvariance

Ž . Ž 2 . 2 Ž X .4 E u NFF ss x b ,t ty1 t 0

2Ž . Ž .Ž Ž ..where s z sF z 1yF z . If the conditioning set or the dynamics areŽ . Ž X . Ž .misspecified and E y NFF /F x b , then u in 3 is no longer a martingalet ty1 t 0 t

difference and the theory developed here does not cover that case. Thus, thepresent paper contributes to correctly specified dynamic binary choice models of

Ž .given distributional functional form.The following assumption on the process generating x underpins the asymp-t

totic development. In particular, the linear process structure and the momentconditions on the innovations assist in the use of embedding arguments thatallow for stochastic process representations of key partial sum processes, asshown in Lemma 1 below.

ASSUMPTION 1: Let x sx q¨ with x s0, and wheret ty1 t 0

`

Ž .¨ sP L e s P e ,Ýt t i tyiis0

Ž . ` 5 5with P 1 nonsingular and Ý i P -`. The inno¨ations e are iid with meanis0 i t5 5 rzero and E e -` for some r)8, ha¨e a distribution that is absolutely continu-t

Ž .ous with respect to Lebesgue measure and ha¨e characteristic function w t which5 5 k Ž .satisfies lim t w t s0 for some k)0.5 t 5 ª`

Ž .LEMMA 1: Let Assumption 1 hold. Then there exists a probability space V , FF, Psupporting sequences of random ¨ariables U and V satisfying the following:nt ntŽ .a Jointly for all 1F tFn,

t t1 1Ž .U , V s u , ¨ .Ý Ýnt nt d i iž /' 'n nis1 is1

Ž .b There exists a representation

TntU sUnt ž /n

Ž .with standard Brownian motion U and time changes T in V , FF, P . Let T snt ntt ŽŽ Ž ..Tnt r n Ž . tq1. Ž . Ž 2Ý t and define FF ss U r , V . Then E t NFF sE u Nis1 ni nt rs1 n s ss1 nt n, ty1 t

Page 5: Econometrica, Vol. 68, No. 5 September, 2000 , 1249 1280bhansen/718/ParkPhillips2000.pdf · variable yU is assumed to be latent and the observed variable is simply the t indicator

NONSTATIONARY BINARY CHOICE 1253

. Ž r . Ž < < 2 r .FF and E t NFF Fc E u NFF for all rG1, where c is some constantty1 nt n, ty1 r t ty1 rdepending only upon r.Ž .c Defining

n ty1 tŽ .V r s V 1 F r- ,Ýn nt ½ 5n nts1

w xm w xthen V ª V in D 0, 1 , the m-fold Cartesian product of the space D 0, 1n a. s.Ž .endowed with the uniform topology, where V is Brownian motion in V , FF, P with

¨ariance matrix S.

For the development of our theory, it is convenient to assume that b /0 and0Ž .to rotate the regressor space using an orthogonal matrix Hs h , H with1 2

Ž X .1r2h sb r b b . We then define1 0 0 0

Ž . X X5 x sh x and x sH x .1 t 1 t 2 t 2 t

Conformable with this rotation, define

Ž . X X6 V sh V and V sH V ,1 1 2 2

Ž .which are Brownian motions of dimensions 1 and my1 , respectively.Our subsequent theory involves the local time of the process V , which we1

Ž .denote by L t, s , where t and s are the temporal and spatial parameters.V1Ž . Ž . Ž .L t, s is a stochastic process in time t and space s and represents theV1

sojourn density of the process V around the spatial point s over the time1w x Ž .interval 0, t . The reader is referred to Revuz and Yor 1994 for an introduc-

Ž .tion to the properties of local time and to Phillips 1998 , Phillips and ParkŽ . Ž .1998 , and Park and Phillips 1998, 1999 for applications of this process in timeseries. In the representation of our limit theory it is especially convenient to usethe scaled local time of V given by1

Ž . Ž . Ž . Ž .7 L t , s s 1rs L t , s ,1 11 V1

Ž .where s is the variance of V . The process L t, s is called chronological11 1 1Ž .local time in Phillips and Park 1998 because it measures the sojourn time in

chronological units that the process spends in the vicinity of the spatial point s.It is defined by the limit

1 tŽ . � < Ž . < 4L t , s s lim 1 V r ys -« dr .H1 12««ª0 0

Ž . Ž .The notation in 6 and 7 will be used frequently in the paper without anyfurther reference.

Since we will be dealing with nonlinear functions of the integrated process x ,tit aids our theory to be more specific about the class of functions we willconsider. In the development that follows we draw upon the general approach of

Ž .Park and Phillips 1999 in studying nonlinear transformations of integratedprocesses. Here we will concentrate on functions that typically arise in the

Page 6: Econometrica, Vol. 68, No. 5 September, 2000 , 1249 1280bhansen/718/ParkPhillips2000.pdf · variable yU is assumed to be latent and the observed variable is simply the t indicator

J. Y. PARK AND P. C. B. PHILLIPS1254

Ž .context of binary choice models such as 3 above, viz. distribution functions,probability densities and their derivatives. These functions play an importantrole in the subsequent development of our theory.

A function f : RªR will be called regular if it is bounded, integrable, anddifferentiable with bounded derivative. We denote by F the class of regularRfunctions. We also consider the class F of bounded and integrable functions,Iand the class F of functions that are bounded and vanish at infinity. Clearly,0

Ž . k Ž .F ;F ;F . Further, we define f by f x sx f x for any function f. ByR I 0 k k

convention, f gF , F or F when f gF , F or F for all is0, . . . , k . The firstk R I 0 i R I 0˙ ¨and second derivatives of f are denoted, respectively, by f and f , when they

exist.We now make some assumptions on the distribution function F of « . We willt

make extensive use of the following notation:

˙ ˙ 2Ž . Ž . Ž .8 GsFrF 1yF ; KsGFsG F 1yF .

˙ ¨ ˙ ¨ASSUMPTION 2: F is three times differentiable so that F, F, G and G all exist.˙ ˙˙ ¨ ¨ 1r2 1r2Ž . Ž . Ž . Ž . Ž Ž . . Ž .Further: a K gF , b F , GF , GF , GF 1yF gF , and c2 R 1 2 2 2 I

˙ 1r2 1r2 3 ˙Ž Ž . . Ž .GF 1yF , G F gF .2 4 0

Ž . x Ž x.For the logit model, F x se r 1qe and we have Gs1. Consequently,˙ x x 2Ž . Ž .KsF and so K x se r 1qe , the density of the logistic distribution.

Ž .Condition a of Assumption 2 is therefore clearly satisfied. Moreover, since˙ Ž . Ž .Gs0, it is simple to check the conditions in b and c of Assumption 2, all of

which hold trivially.For the probit model, it is also not difficult to verify that the conditions in

Assumption 2 hold. Indeed, using Mills ratio we have

x¡ y2w Ž .x1qO x as xª`,Ž .w x Ž .F x~Ž .G x s s yxŽ .Ž Ž ..F x 1yF x y2w Ž .x1qO x as xªy`,¢ Ž .1yF x

where w is the standard normal density function and F is the correspondingŽ .cumulative distribution function. It is apparent from these formulae that G x

Ž < <. < <sO x for large x , and that G is differentiable with a bounded derivative. Itfurther follows that

2Ž .w x˙Ž . Ž . Ž .K x sG x F x s Ž .Ž Ž ..F x 1yF x

Ž .¡xw xy2w Ž .x1qO x as xª`,Ž .F x~s Ž .yxw x

y2w Ž .x1qO x as xªy`,¢ Ž .1yF x

Page 7: Econometrica, Vol. 68, No. 5 September, 2000 , 1249 1280bhansen/718/ParkPhillips2000.pdf · variable yU is assumed to be latent and the observed variable is simply the t indicator

NONSTATIONARY BINARY CHOICE 1255

Ž . Ž .and so K is regular and satisfies condition a of Assumption 2. Conditions bŽ .and c of Assumption 2 follow upon some further routine calculations.

Ž X .For each t, the conditional log likelihood of y given FF is y log F x b qt ty1 t tŽ . w Ž X .x Ž .1yy log 1yF x b . The implied conditional log likelihood for the modelt tŽ . Ž .1 and 2 has the familiar form

n nX XŽ . Ž . Ž . Ž . w Ž .x9 log L b s y log F x b q 1yy log 1yF x b ,Ý Ýn t t t t

ts1 ts1

and this function is well known to be globally concave in the logit and probitŽ Ž ..cases. For example, in the logit case we have c.f. Amemiya 1985, p. 273

2 n xXbt­ log L en X XŽ . Ž . Ž .10 sy L 1yL x x , L sL x b s ,XÝ t t t t t t x bt­b­b9 1qets1

which is negative definite for all b , just as it is when the covariates comprisecross-section observations.

Ž .The hessian 10 is a weighted sample second moment of the integratedprocess x . The weights are given by the logistic density at xX b , viz.t t

e xXt b

Ž .L 1yL s ,t t X 2x btŽ .1qe

and are nonlinear integrable functions of x . In contrast to the case of cross-tsection data that is studied in the literature, or the case where x is a stationarytand ergodic time series, the normed hessian

2 n1 ­ log L 1n XŽ .sy L 1yL x xÝ t t t tn ­b­b9 n ts1

does not converge in probability to a negative definite matrix when x is antintegrated process. In fact, as we will show, the hessian matrix ­ 2 log L r­b­b9nhas elements with different stochastic orders in different directions and, whenappropriately normed to accommodate this divergence, the matrix convergesweakly to a random limit matrix, not a constant matrix. The random limit matrixis negative definite almost surely and so the limit function is also globallyconcave.

3. MAIN RESULTS

ˆ Ž . Ž .Let b be the maximum likelihood estimator of b in 1 . From 9 and usingn 0Ž . Ž . Ž .the notation 8 we can write the score S b and hessian J b asn n

nX XŽ . Ž . Ž . Ž Ž ..11 S b s G x b x y yF x b ,Ýn t t t t

ts1n n

X X X X X˙Ž . Ž . Ž . Ž . Ž Ž ..12 J b sy K x b x x q G x b x x y yF x b .Ý Ýn t t t t t t t tts1 ts1

Page 8: Econometrica, Vol. 68, No. 5 September, 2000 , 1249 1280bhansen/718/ParkPhillips2000.pdf · variable yU is assumed to be latent and the observed variable is simply the t indicator

J. Y. PARK AND P. C. B. PHILLIPS1256

ˆAs usual in ML limit theory, the asymptotic distribution of b will be obtainednfrom the expansion

ˆ ˆŽ . Ž . Ž .13 0sS b sS b qJ b b yb ,Ž . Ž .n n n 0 n n n 0

ˆ ˆŽ . Ž . Ž .where S b and S b are the scores at b and b respectively, and J bn n n 0 n 0 n n

ˆis the hessian matrix with rows evaluated at mean values between b and b . Asn 0it stands, this is a conventional problem in extremum estimation. However, in

Ž . Ž .the present case, the limits of the score S b and the hessian J b aren 0 n 0nonstandard and our first step is to characterize these limits.

Ž . Ž .To analyze the asymptotic behavior of S b and J b , we need to rotaten 0 n 0the coordinate system and reparameterize the model. Corresponding to the

Ž .transformation of x in 5 , lett

a shX b and a sH X b ,1 1 2 2

Ž X . Ž 0 0X. ŽŽ X .1r2 .and define as a , a 9sH9b and a s a , a 9s b b , 0 9sH9b .1 2 0 1 2 0 0 0Since maximum likelihood estimation is invariant with respect to reparameteri-

ˆzation, if a is the maximum likelihood estimator of a , then a sH9b orˆ ˆn 0 n nˆ Ž . Ž .b sHa . The score function S a and hessian J a for the parameter a canˆn n n n

Ž . Ž .be obtained from those of a simply by using the relationships S a sH9S bn nŽ . Ž . Ž .and J a sH9J b H. Furthermore, by premultiplying the expansion 13 byn n

Ž .H9 it is apparent that the expansion remains valid for S a , i.e.,ˆn n

Ž . Ž . Ž . Ž .Ž .14 0sS a sS a qJ a a ya ,ˆ ˆn n n 0 n n n 0

Ž . Ž . Ž .where J b is replaced by J a sH9J b H.n n n n n nThe next two lemmas provide a limit theory for sample moments and

covariance functions which assist in analyzing the asymptotic behavior of theŽ . Ž .score function 11 and hessian 12 .

LEMMA 2: Let Assumption 1 hold, and f : RªR be regular. Then we ha¨e:n `1

Ž . Ž . Ž . Ž .a f x ª L 1, 0 f s ds,Ý H1 t d 1'n y`ts1n `1 1Ž . Ž . Ž . Ž . Ž .b f x x ª V r dL r , 0 f s ds,Ý H H1 t 2 t d 2 1n 0 y`ts1

n `1 1XŽ . Ž . Ž . Ž . Ž . Ž .c f x x x ª V r V r 9 dL r , 0 f s ds,Ý H H1 t 2 t 2 t d 2 2 13r2n 0 y`ts1

jointly as nª`.

LEMMA 3: Let Assumption 1 hold, and assume s 2 f 2, s 2 g 2 gF , s 2 f , s 2 ggFR Iand s 2 f 4, s 2 g 4 gF for f , g : RªR. Then we ha¨e0

y1r4 n Ž .n Ý f x uts1 1 t t 1r2Ž . Ž .15 ª M W 1 ,dy3r4 nž /Ž .n Ý g x x uts1 1 t 2 t t

Page 9: Econometrica, Vol. 68, No. 5 September, 2000 , 1249 1280bhansen/718/ParkPhillips2000.pdf · variable yU is assumed to be latent and the observed variable is simply the t indicator

NONSTATIONARY BINARY CHOICE 1257

where

` `12Ž . Ž . Ž . Ž . Ž . Ž .L 1, 0 f s ds dL r , 0 V r 9 f s g s dsH H H1 s 1 2 s sy` 0 y`

Ms ,` `1 1 2Ž . Ž . Ž . Ž . Ž . Ž . Ž . Ž .V r dL r , 0 g s f s ds V r V r 9 dL r , 0 g s ds� 0H H H H2 1 s s 2 2 1 s

0 y` 0 y`

with f ss f , g ss g and W is m-dimensional Brownian motion with co¨ariances s

matrix I, which is independent of V.

REMARKS: 1. Let V sV ys sy1V , where s and s are respectively2.1 2 21 11 1 11 12the variance of V and the covariance of V and V . We have1 1 2

1 1Ž . Ž . Ž . Ž .V r dL r , 0 s V r dL r , 0 a.s.,H H2 1 2.1 10 0

1 1Ž . Ž . Ž . Ž . Ž . Ž .V r V r 9 dL r , 0 s V r V r 9 dL r , 0 a.s.,H H2 2 1 2.1 2.1 10 0

1 Ž . Ž . � Ž .since H V r dL r, 0 s0 a.s., a result that is easy to deduce because r : V r0 1 1 14 Ž . Ž Ž ..s0 is the support of the measure dL r, 0 e.g. Revuz and Yor 1994, Ch. VI .1

Ž .We may therefore regard V as being independent of V and hence of L in so2 1 1far as the process V arises in the representations of the limit distributions in2Lemmas 2 and 3. The limiting distribution in Lemma 3 is mixed Gaussian. Themixing variates are dependent upon the local time L of V as well as V . We1 1 2

Ž .denote the limit distribution by MN 0, M .Ž .2. Note that the components in 15 are asymptotically dependent in general,

in spite of their differing rates of convergence. A special case in applicationswhen the conditional covariance matrix M is block diagonal is discussed below.

'Ž . Ž .3. In part a of Lemma 2 the sample mean of f x is standardized by n1 trather than the usual n. This reduction in norming arises because the function fis integrable and therefore attenuates contributions from large values of x .1 t

Ž . Ž .The increase in the norming factor that appears in parts b and c arises fromthe presence of the integrated regressor x in the sample moment.2 t

Ž4. If x were replaced by a stationary variate as it would in some directions2 t '.were x to be cointegrated , then the norming would return to n . Thus,2 tsuppose x is stationary, satisfies the same conditions as ¨ in Assumption 1 and3 t tis independent of u . Then, a simple derivation along the lines of the proof oftLemma 2 reveals that

n `1XŽ . Ž . Ž . Ž .16 f x x x ª L 1, 0 f s dsS ,Ý H1 t 3 t 3 t d 1 33'n y`ts1

Ž X .where S sE x x . Likewise, we find that33 3 t 3 t

n `y1r4 2Ž . Ž . Ž . Ž .17 n g x x u ª MN 0, L 1, 0 g s dsS .Ý H1 t 3 t t d 1 s 33ž /y`ts1

Page 10: Econometrica, Vol. 68, No. 5 September, 2000 , 1249 1280bhansen/718/ParkPhillips2000.pdf · variable yU is assumed to be latent and the observed variable is simply the t indicator

J. Y. PARK AND P. C. B. PHILLIPS1258

'Thus, the presence of stationary elements in the regressors does not affect the n4'Ž .convergence rate of the sample mean in Lemma 2 a or the n convergence

Ž . Ž . Ž .rate of the sample covariance in Lemma 3 a . Results 16 and 17 ensure thatthe coefficients of stationary regressors in binary choice models where there arealso some integrated regressors behave in a similar way to the coefficients of x .1 t

Using these results we are now able to characterize the limit forms of theŽ . Ž .score function 11 and hessian 12 .

THEOREM 1: Let Assumptions 1 and 2 hold. Theny1 Ž . 1r2 Ž . y1 Ž . y1D S a ª Q W 1 and D J a D ª yQ,n n 0 d n n 0 n d

Ž 1r4 3r4 .jointly, where D sdiag n , n I ,n my1

` `12 0 0Ž . Ž . Ž . Ž . Ž .L 1, 0 s K a s ds dL r , 0 V r 9 sK a s dsH H H1 1 1 2 1y` 0 y`Ž .18 Qs ,

` `1 10 0Ž . Ž . Ž . Ž . Ž . Ž . Ž .V r dL r , 0 sK a s ds V r V r 9 dL r , 0 K a s ds� 0H H H H2 1 1 2 2 1 10 y` 0 y`

0 Ž X .1r2a s b b , and W is defined as in Lemma 3.1 0 0

If « has a symmetric distribution, as in the probit and logit models, K is anteven function. We therefore have

`Ž .sK s dss0.H

y`

In this case, the matrix Q given in Theorem 2 reduces to a block diagonalmatrix.

Ž . Ž .The asymptotic results for S a and J a presented in Theorem 1 aid inn 0 n 0Ž .deriving the limiting distribution of a . Indeed, from the expansion 14 we mayˆn

expect that the normed and centered estimator satisfiesy1y1 y1 y1Ž . Ž . Ž Ž . . Ž . Ž .19 D a ya sy D J a D D S a qo 1 .ˆn n 0 n n 0 n n n 0 p

Ž .Indeed, 19 is established in the proof of Theorem 2 below. This expansion isŽ .enough to deliver the form of the limit distribution of D a ya .ˆn n 0

THEOREM 2: Let Assumptions 1 and 2 hold. Then, there exists a sequence ofML estimators for which a ª a , andˆn p 0

Ž . y1r2 Ž .D a ya ª Q W 1 ,ˆn n 0 d

in the notation introduced in Theorem 1.

REMARKS: 1. As usual for local extremum estimation problems, Theorem 2establishes the existence of a consistent root of the likelihood equation. The loglikelihood function is well known to be concave in the logit and probit casesŽ Ž . Ž Ž ..and, indeed, in all cases where log F x and log 1yF x are concave func-

Page 11: Econometrica, Vol. 68, No. 5 September, 2000 , 1249 1280bhansen/718/ParkPhillips2000.pdf · variable yU is assumed to be latent and the observed variable is simply the t indicator

NONSTATIONARY BINARY CHOICE 1259

Ž ..tions; see Pratt 1981 , and in such cases the consistent root is unique and is theglobal maximum.

Ž X . ` Ž .2. Let a s a , a 9. When H sK s dss0, we have the limitsˆ ˆ ˆn 1n 2 n y`

y1r2`

1r4 2Ž . Ž . Ž . Ž .n a y1 ª L 1, 0 s K s ds W 1 ,ˆ H1n d 1 1ž /y`

y1r2`13r4 2Ž . Ž . Ž . Ž . Ž .n a ª V r V r 9 dL r , 0 s K s ds W 1 ,ˆ H H2 n d 2 2 1 2ž /0 y`

Ž X.where Ws W , W 9 for W defined in Theorem 2. The limiting distributions of1 2a and a are therefore dependent only through their mixing variates givenˆ ˆ1n 2 nby V. Consequently, in this case a and a become asymptotically indepen-ˆ ˆ1n 2 ndent conditional on x .t

3. It follows from Theorem 2 that

ˆ y1r2 y1Ž . Ž . Ž .20 D H9 b yb ª Q W 1 sMN 0, Q .Ž .n n 0 d

y1r4 'Ž .Setting E sn D sdiag 1, n I we haven n my1

y1y1r4 y1Ž . Ž .D H9n sHE ª h , 0n n 1

so that

4 y1r2 y1ˆ' Ž . Ž . Ž Ž . Ž . .n b yb ª h , 0 Q W 1 sMN 0, h , 0 Q h , 0 9Ž .n 0 d 1 1 1

Ž X y1 .sMN 0, h h q ,1 1 11.2

where

Ž . y121 q sq yq Q q ,11 .2 11 12 22 21

which is expressed in terms of the elements of the partitioned matrix

q q11 12Ž .22 Qs ž /q Q21 22

Ž .with Q defined in 18 . We formalize this result as follows.

COROLLARY 1: Under Assumptions 1 and 2, as nª`

4 y1ˆ' Ž .n b yb ª P MN 0, q ,Ž .n 0 d b 11.20

Ž . Ž . Ž X .y1 Xwhere q is gi en by 21 and 22 and P sb b b b .11.2 b 0 0 0 00

According to this Corollary we have the asymptotic approximation

1y1ˆŽ .23 b ; MN b , P q .n d 0 b 11.20ž /'n

Page 12: Econometrica, Vol. 68, No. 5 September, 2000 , 1249 1280bhansen/718/ParkPhillips2000.pdf · variable yU is assumed to be latent and the observed variable is simply the t indicator

J. Y. PARK AND P. C. B. PHILLIPS1260

ˆŽ .A natural estimate of the conditional covariance matrix of b is provided bynˆ y1Ž .the hessian inverse yJ b , or the more commonly used alternativen n

ˆ y1Ž .yJ b , wheren n

nX XŽ . Ž .J b sy K x b x xÝn t t t

ts1

Ž . Ž .is the negative definite first component of J b in 12 . The next result showsnˆthat these matrices accurately represent the asymptotic covariance matrix of bn

Ž .in 23 .

THEOREM 3: Under Assumptions 1 and 2,

y1 y1y1r2 y1r2 y1ˆ ˆy n J b , y n J b ª P qŽ . Ž .n n n n d b 11.20

as nª`.

ˆIt follows that the usual construction for asymptotic standard errors of bnapplies, even though there are different rates of convergence in the componentsof the estimator. The situation is analogous to that considered by Park and

Ž . Ž .Phillips 1989 in regressions with cointegrated regressors. Moreover, by 20 theˆlimit distribution of b is mixed normal in all directions, albeit at different rates,n

and this fact ensures that Wald tests of restrictions on b have asymptotic0chi-squared distributions, so that statistical inference can proceed in the usualmanner.

` Ž .COROLLARY 2: Let Assumptions 1 and 2 hold. If H sK s dss0, theny`

y1r2`b4 0 2ˆ' Ž . Ž 5 5. Ž .n b yb ª L 1, 0 s K s b ds W 1Ž . Hn 0 d 1 0ž /5 5b y`0

5 5 Ž X .1r2where b s b b , and W is uni ariate standard Brownian motion indepen-0 0 0dent of V.

ˆIt is clear from Corollaries 1 and 2 that b is asymptotically dominated by thencomponent that converges slower. The overall convergence rate is thus given byn1r4, but the limit distribution is singular. It is degenerate along the direction

ˆthat is orthogonal to the true parameter vector b , and, in that direction, b0 nconverges at the faster rate of n3r4.

ˆ X ˆŽ .Also of interest are the predicted probabilities FsF x b and the estimatedt nX ˆ ˆŽ .marginal effects f x b b , where f is the density function corresponding to F.t n n

The limit theory for these functionals is given in the following result.

COROLLARY 3: Let Assumptions 1 and 2 hold. Gi en x sx, the predictedtˆ ˆ ˆ ˆŽ . Ž .probability FsF x9b and estimated marginal effect g s f x9b b ha¨e theˆn x n n

Page 13: Econometrica, Vol. 68, No. 5 September, 2000 , 1249 1280bhansen/718/ParkPhillips2000.pdf · variable yU is assumed to be latent and the observed variable is simply the t indicator

NONSTATIONARY BINARY CHOICE 1261

following asymptotic distributions as nª`

1 2 y1ˆŽ . Ž . Ž .24 F x9b ; MN F x9b , f x9b x9P xq ,Ž .n d 0 0 b 11.20ž /'n

and

1 2 y1Ž . Ž . w Ž . Ž . x25 g ; MN g x9b , f x9b q f 9 x9b x9b P q ,x d 0 0 0 0 b 11.20ž /'n

Ž . Ž .where g x9b s f x9b b .0 0 0

In both cases, the rate of convergence is n1r4. However, the limit distributionof the estimated marginal effects is singular and the faster rate n3r4 applies indirections orthogonal to b . In the Probit case, the asymptotic variance has the0very simple form

1 22 2 y1Ž . Ž .w x9b 1y x9b P q .0 0 b 11.20'n

Ž . Ž .The asymptotic variance formulae given in 24 and 25 correspond to thoseŽ Ž ..given in the literature for the iid or stationary case e.g. Greene 1997 ,

Ž .although the singularity in 25 does not arise in that case. Notwithstanding thesingularity, one can estimate the asymptotic variance matrix of the coefficientestimator by the inverse of the hessian as indicated in Theorem 3, and it isapparent that standard errors for the predicted probabilities and the estimatedmarginal effects may be computed in the usual manner. Thus, the main effect ofnonstationarity is to slow down the rate of convergence in these estimates.

Finally, it is of interest to study the asymptotic behavior of the empiricalaverage r sny1Ýn y . The quantity r is an aggregate proportion and mea-n ts1 t n

Ž .sures the proportion of positive choices i.e. y s1 outcomes in the sample data.tIt can also be used in a predictive manner to forecast the proportion of positive

�choices for some given sequence of data on the covariates, say Xs X : tst4 Ž . � X 41, . . . , n . In this event, we can define y X s1 X b G« . Of course, int t 0 t

Ž .practical applications y X is unobserved, and we would therefore use thety1 n ˆ ˆ X ˆŽ . Ž . Ž . Ž .estimate r X sn Ý F X , where F X sF X b . Such estimates aren ts1 t t t n

useful in policy situations in assessing the likely proportionate number of� 4positive choices arising in the event of the data X : ts1, . . . , n . An example int

monetary policy would be the likely number of market interventions needed fora given scenario of economic fundamentals.

The following result gives the limit theory for such quantities.

�THEOREM 4: Let Assumptions 1 and 2 hold. Suppose the time series Xs X : tt4s1, . . . , n is drawn independently of x from a process with properties equi alent tot

those of x as gi en in Assumption 1. Then the sample proportion r sny1Ýn y ,t n ts1 tŽ . y1 n Ž .the predicted proportion r X sn Ý y X , and the estimated proportionn ts1 t

Page 14: Econometrica, Vol. 68, No. 5 September, 2000 , 1249 1280bhansen/718/ParkPhillips2000.pdf · variable yU is assumed to be latent and the observed variable is simply the t indicator

J. Y. PARK AND P. C. B. PHILLIPS1262

y1 n ˆ X ˆŽ . Ž .r X sn Ý F X b all ha¨e the following limit beha¨ior as nª`:n ts1 t t n

`Ž . Ž . Ž . Ž .26 r , r X , r X ª L 1, s ds.ˆ Hn n n d 1

0

Ž . y1 Ž .An elementary calculation shows that L 1, s s s L 1, srs where ss1 d1r2 Ž .s and L 1, s is the local time of standard Brownian motion. Then11

` ` sy1Ž .L 1, s dss s L 1, dsH H1 d ž /s0 0

` 1Ž . � Ž . 4s L 1, t dts 1 W r )0 dr ,H H0 0

Ž Ž ..a quantity that is well known e.g., Revuz and Yor 1994, p. 232]233 to be arandom variable that follows the arc sine law with probability density

1Ž .27 ' Ž .p x 1yx

w xon 0, 1 . We deduce that the empirical average r and predictive averagesnŽ . Ž .r X , and r X all have the same limit distribution given by the arc sine lawˆn n

Ž .27 . This result is decidedly different from the stationary case, where r ªn pw Ž X .xE F x b and the expectation is taken with respect to the stationary distribu-t 0

tion of x .tThe fact that r follows an arc sine law in the limit means that the samplen

proportion of positive choices spends most of its time in the neighborhood ofzero or unity, just as a random walk spends most of its time on one side of theorigin or the other. To take the explicit example of a monetary policy interven-tion, the theorem has the following implication. If the economic fundamentalsthat determine monetary policy intervention include a stochastic trend, then thelikely number of market interventions needed for any given scenario of funda-

Ž .mentals is determined by the arc sine law 27 . Thus, the theorem tells us that itis most likely that interventions will occur in streams giving periods where thereis very little intervention or periods where there are a large number of interven-tions. This result holds for any given trajectory of fundamentals and for anyparticular policy determining mechanism, i.e., any linear form xX b provided itt 0has a stochastic trend.

4. ILLUSTRATION OF THE EFFECTS OF NONSTATIONARITY

This section reports a brief numerical exercise that illustrates the effects ofŽ .nonstationarity on a binary choice regression. Data were generated from 1 and

Ž .2 using both logit and probit formulations for F and with exogenous covariatesx , which were generated by a bivariate vector autoregression of the formt

a 0x x ¨111 t 1 ty1 1 ts q ,x x ¨ž / ž / ž /ž /0 a2 t 2 ty1 2 t22

Page 15: Econometrica, Vol. 68, No. 5 September, 2000 , 1249 1280bhansen/718/ParkPhillips2000.pdf · variable yU is assumed to be latent and the observed variable is simply the t indicator

NONSTATIONARY BINARY CHOICE 1263

0 0 Ž .FIGURE 1.}Logit model}densities of estimators of b s1, b s0; I 1 case.1 2

Ž . Ž . Ž .with ¨ s ¨ , ¨ 9' iiN 0, I . Both unit root a s1, is1, 2 and stationaryt 1 t 2 t 2 i iŽ .a s0.5, is1, 2 cases were examined. The parameter value was set at b si i 0Ž . X 0 Ž .1, 0 9. Thus, x b sb x sx and the direction orthogonal to b is 0, 1 ,t 0 1 1 t 1 t 0giving the coefficient b 0 s0 of x . The number of replications was 5,000.2 2 t

Figures 1 and 2 show kernel estimates of the sampling distributions of theŽ . 0 0correctly specified logit and probit estimates of the coefficients b and b in1 2

0 0 Ž .FIGURE 2.}Probit model}densities of estimators of b s1, b s0; I 1 case.1 2

Page 16: Econometrica, Vol. 68, No. 5 September, 2000 , 1249 1280bhansen/718/ParkPhillips2000.pdf · variable yU is assumed to be latent and the observed variable is simply the t indicator

J. Y. PARK AND P. C. B. PHILLIPS1264

FIGURE 3.}Logit model}densities of estimators of b 0 s1, b 0 s0; stationary case.1 2

Ž .the unit root case a s1, is1, 2 for sample sizes ns100, 250, 500. Thei igreater concentration of the estimates of b 0 and the differing rates of conver-2gence between the two coefficients are quite apparent in the figures. Comparing

Ž . Ž .the probit results in Figure 2 with those of the logit Figure 1 , reveals theeffect of the longer tails of the logit distribution}the probit estimates havesubstantially greater dispersion for both coefficients. The reason is simply thatthe probit function attenuates the signal from the nonstationary regressors moreseverely than the logit function because of the thinner tails of the normaldensity. Interestingly, this relationship between the logit and probit estimates in

Ža nonstationary regression viz. that probit estimates are more dispersed than. Ž .logit estimates is the opposite of the well known approximate scaling relation-

ship that applies in the standard case, viz. that the logit estimates tend to be'larger than the probit estimates by a factor of close to pr 3 , a feature that

arises from the variance of the logit distribution being p 2r3, compared withŽ Ž ..that of the normal being unity c.f. Greene 1997 .

Figures 3 and 4 show the corresponding estimates for the stationary caseŽ .a s0.5, is1, 2 . There is much less difference between the densities for thei itwo coefficients in this case, just as asymptotic theory for the stationary caseindicates. However, estimates of b 0 are still somewhat more concentrated than2those of estimates of b 0. Interestingly, there is a small but noticeable difference1between the probit and logit densities, with the probit estimates now being moreconcentrated, as theory suggests for the stationary case, which is quite theopposite of the relative behavior in the nonstationary case.

Page 17: Econometrica, Vol. 68, No. 5 September, 2000 , 1249 1280bhansen/718/ParkPhillips2000.pdf · variable yU is assumed to be latent and the observed variable is simply the t indicator

NONSTATIONARY BINARY CHOICE 1265

FIGURE 4.}Probit model}densities of estimators of b 0 s1, b 0 s0; stationary case.1 2

5. CONCLUSION

While binary choice models have been a popular tool of microeconometricsfor several decades, time series and panel data applications of these models arealso important and may well increasingly be so in the future. The present paperprovides an asymptotic theory for maximum likelihood estimation of thesemodels in time series contexts. Our principal finding is that dual rates ofconvergence operate in these models when there are multiple integrated regres-

Ž .sors, even though the regressors have the same full rank stochastic order. Thisoutcome is unusual and is the first instance of the phenomena in asymptoticstatistical theory that the authors have encountered. As we have seen, theprobability function formulation of conventional binary choice models serves toattenuate the signal emanating from an integrated regressor by dint of the factthat large values of xX b contribute little to the score and hessian because theyt 0arise in these quantities by way of a probability density that vanishes at infinity.The effect is that significant signal attenuation occurs in all directions exceptthose that are orthogonal to b . It is further shown that the limit distribution0theory of the ML estimator is mixed normal and that conventional methods ofinference remain valid.

ŽAnother finding of some interest is that the sample proportion of positive or.negative choices follows an arc sine law and therefore spends most of its time in

the neighborhood of zero or unity, just as a random walk spends most of its timeon one side of the origin or the other. This result has some empirical implica-tions for policy decision making, and in particular market interventions. Thus, ina situation where any of the determining factors involves a stochastic trend,market interventions will most likely occur in streams of little intervention or

Page 18: Econometrica, Vol. 68, No. 5 September, 2000 , 1249 1280bhansen/718/ParkPhillips2000.pdf · variable yU is assumed to be latent and the observed variable is simply the t indicator

J. Y. PARK AND P. C. B. PHILLIPS1266

large numbers of interventions. This is obviously a testable empirical implicationof the theory that we hope to explore in later work.

There is scope for extending the results of the paper to some related models.A partial list of extensions that seem valuable for empirical work includesmultivariate models, poly-chotomous choice, panel data situations and nonpara-metric approaches, thereby covering the common extensions of binary choicethat arise in the microeconometric context. As indicated earlier, it is also

Žpossible to extend our theory to include cointegrated regressors or combina-.tions of integrated and stationary regressors . In fact, the outcomes in this case

Ž . Ž .are anticipated in results 16 and 17 and so they have not been detailed hereas they involve little that is new beyond the results already provided.

School of Economics, Seoul National Uni ersity, Seoul 151-742, Korea;[email protected]; http:rrecon.snu.ac.krr facultyr jparkr index e.html]

andCowles Foundation for Research in Economics, Yale Uni ersity, Box 208281, New

Ha¨en, CT 06520-8281, U.S.A.; [email protected]; http:rr korora.econ.yale.edu

Manuscript recei ed October, 1998; final re¨ision recei ed April, 1999.

APPENDIX A: USEFUL LEMMAS AND PROOFS

LEMMA A1: Let f : RªR and fgF .IŽ .a Let «)0 and define

Ž . < Ž . <f x s sup f xqy .«< <y F«

Then f gF .« IŽ .b Let K;R be compact, and define

Ž . < Ž . <f x s sup f cx .KcgK

Then f gF .K I

Ž . Ž .PROOF OF LEMMA A1: It is obvious that both f and f defined in a and b are bounded. It« Ktherefore suffices to show that they are integrable. To simplify the proofs, we assume that f is a

Ž . Ž .symmetric and integrable function that is monotone increasing decreasing on R R . This causesy qŽ .no loss in generality, since any integrable function is bounded by such a function. To deduce part a ,

Ž . Ž . < < Ž . Ž . Ž .we only need to note that f x s f 0 for x F« , and f x s f xy« and f xq« respectively for« «

Ž .x)« and x-y« . Clearly, f is integrable if f is. To prove part b , we choose an arbitrary c gK,« 0w x Ž . ŽŽ . .c )0, and its neighborhood N s c y« , c q« for some «)0. Define f x s f c y« x and0 0 0 0 0 0

ŽŽ . . Ž . Ž .f c q« x respectively for xG0 and x-0. By construction, we then have f cx F f x for all0 0cgN and xgR. It is obvious that f is integrable, and the stated result follows immediately from0 0the compactness of K. Q.E.D.

Page 19: Econometrica, Vol. 68, No. 5 September, 2000 , 1249 1280bhansen/718/ParkPhillips2000.pdf · variable yU is assumed to be latent and the observed variable is simply the t indicator

NONSTATIONARY BINARY CHOICE 1267

LEMMA A2: Let Assumption 1 hold, and f : RªR. Denote by x k the k-times tensor product of x2 t 2 twith itself. Define

n nk k k kŽ . Ž .M s f x x and N s f x x u .Ý Ýn 1 t 2 t n 1 t 2 t t

ts1 ts1

Ž . k Ž 1q k r2 . k Ž Ž1qk .r2 .a For fgF , M so n . Moreo¨er, if fgF , then M sO n .0 n p I n pŽ . k Ž Ž1qk .r2 .b If s fgF , then N so n .0 n p

Ž X .PROOF OF LEMMA A2: Let Vs V , V 9. Note that1 2

kx2 t k k5 Ž . 5 5 Ž . 5sup s sup V r F sup V r q1-` a.s.d 2 n 2'n1FtFn 0FrF1 0FrF1

y1 n < Ž . < Ž .for all large n. For fgF , we have n Ý f x ª 0, as shown in Park and Phillips 1999 . If0 ts1 1 t py1 r2 n < Ž . < Ž .fgF , it follows from Lemma 2 that n Ý f x sO 1 , since f is bounded by a regularI ts1 1 t p

Ž . Ž .function. The stated results in part a therefore may easily be deduced. To prove part b , we noticethat

n12 2 kyŽ1qk . k 2 25 5 Ž .Ž . 5 5n E N sE s f x xÝn 1 t 2 t1qkž /n ts1

12 k 2 2 '5 Ž . 5 Ž .Ž Ž ..FE sup V r q1 s f n V r dr ª 0,H2 1 n pž /ž /00FrF1

by dominated convergence. Q.E.D.

LEMMA A3: Let Assumption 1 hold. Assume s 2 f 2, s 2 g 2 gF and s 2 f 4, s 2 g 4 gF for f , g : RR 0ªR. Define

2 y1r2 2 Ž . 2 2 y3r2 2 Ž . X 2P sn f x u and P sn g x x x u ,1 nt 1 t t 2 nt 1 t 2 t 2 t t

2 Ž 2 .and Q sE P NFF for is1, 2. Then we ha¨e for is1, 2i n t i n t ty1

t t2 2sup P y Q ª 0,Ý Ýi n s i n s p

1FtFn ss1 ss1

as nª`.

PROOF OF LEMMA A3: Notice first that

2 y1r2 Ž 2 2 . Ž . 2 y3r2 Ž 2 2 .Ž . XQ sn s f x and Q sn s g x x x .1 nt 1 t 2 nt 1 t 2 t 2 t

It follows from Lemma 2 that

n`

2 2 2Ž . Ž .Ž .Q ª L 1, 0 s f s ds,HÝ 1 nt d 1y`ts1

n`12 2 2Ž . Ž . Ž . Ž .Ž .Q ª V r V r 9 dL r , 0 s g s ds,H HÝ 2 nt d 2 2 1

0 y`ts1

and therefore, both are obviously tight.Now we show that

n4Ž .E P NFF ª 0,Ý i n t ty1 p

ts1

Page 20: Econometrica, Vol. 68, No. 5 September, 2000 , 1249 1280bhansen/718/ParkPhillips2000.pdf · variable yU is assumed to be latent and the observed variable is simply the t indicator

J. Y. PARK AND P. C. B. PHILLIPS1268

for is1, 2. We may assume w.l.o.g. that x is scalar by considering each component separately. It2 tfollows that

Ž 4 . 2 Ž 2 .E u NFF ss 1y3F q3F ,t ty1 t t t

Ž . 2 Ž .where we write F sF x and s sF 1yF for short. We therefore havet 1 t t t t

n n14 2 4 2Ž . Ž .Ž .Ž .E P NFF s s f 1y3Fq3F x ª 0,Ý Ý1 nt ty1 1 t pnts1 ts1

n n14 2 4 2 4Ž . Ž .Ž .Ž .E P NFF s s g 1y3Fq3F x x ª 0,Ý Ý2 nt ty1 1 t 2 t p3nts1 ts1

Ž . Ž .due to Lemma A2 a . The stated result now follows from Theorem 2.23 of Hall and Heyde 1980 .Q.E.D.

LEMMA A4: Let Assumption 1 hold. Assume s 2 f , s 2 ggF and s 2 f 2, s 2 g 2 gF for f , g : RªR.I 0Define

2 y3r4 Ž . 2 2 y5r4 Ž . 2N sn f x u and N sn g x x u .1 nt 1 t t 2 nt 1 t 2 t t

Then we ha¨e for is1, 2

t2sup N ª 0,Ý i n s p

1FtFn ss1

as nª`.

2 Ž 2 .PROOF OF LEMMA A4: We let M sE N NFF for is1, 2, so thati n t i n t ty1

2 y3r4 Ž 2 .Ž . 2 y5r4 Ž 2 .Ž .M sn s f x and M sn s g x x .1 nt 1 t 2 nt 1 t 2 t

It follows from Lemma A2 that Ýn M 2 ª 0 for is1, 2. Moreover,ts1 i n t p

n n4 y3r2 2 2 2Ž . Ž .Ž . Ž .E N NFF sn s f 1y3Fq3F x ª 0,Ý Ý1 nt ty1 1 t p

ts1 ts1

n n4 y5r2 2 2 2 2Ž . Ž . Ž .Ž .E N NFF sn s g 1y3Fq3F x x ª 0,Ý Ý2 nt ty1 1 t 2 t p

ts1 ts1

where we assume that x is scalar, as in the proof of Lemma A3. The stated result therefore follows2 tas in Lemma A3. Q.E.D.

LEMMA A5: Let Assumption 1 hold. Define

'Ž . � Ž . Ž . 4i r s1 kd F n V r - kq1 d ,nk n 1 n n

'Ž . � Ž . 4i r s1 0F n V r -d ,n 1 n n

n 'Ž . � Ž . 4i r s1 0F n V r -d .1 n

Then we ha¨e

2 cd1 n 2< Ž . Ž . < Ž .E i r y i r dr F 1qkd log nH nk n n3r2ž / n0

Page 21: Econometrica, Vol. 68, No. 5 September, 2000 , 1249 1280bhansen/718/ParkPhillips2000.pdf · variable yU is assumed to be latent and the observed variable is simply the t indicator

NONSTATIONARY BINARY CHOICE 1269

for all large n, where c is some constant. Moreo¨er,

1 n y5r8< Ž . Ž . < Ž .i r y i r drso nH n p0

for any d Gny1r3.n

Ž .PROOF OF LEMMA A5: The stated result follows from Akonom 1993 , precisely as for Lemma 2.5Ž . 1 < Ž . Ž . < 1 < Ž .of Park and Phillips 1999 . We may just apply his results to H i r y i r dr and H i r y0 nk n 0 n

nŽ . < 1Ž Ž . Ž .. 1Ž Ž . nŽ ..i r dr, instead of H i r y i r dr and H i r y i r dr. Though not stated explicitly, it is0 nk n 0 n1 < Ž . Ž . < 1 < Ž . nŽ . <obvious that all his results are applicable for H i r y i r dr and H i r y i r dr as well as0 nk n 0 n

1Ž Ž . Ž .. 1Ž Ž . nŽ ..H i r y i r dr and H i r y i r dr. Q.E.D.0 n k n 0 n

APPENDIX B: PROOFS OF THE MAIN THEOREMS

Ž . Ž .PROOF OF LEMMA 1: We show how to construct sequences U and V satisfying conditionsnt ntŽ . Ž . Ž .a ] c . In the subsequent construction, let V , FF, P be any probability space rich enough to supportthe Brownian motions U and V in addition to the other random elements that it includes. Also,define the filtrations

Ž . Ž .GG ss u , FF and GG ss U , FF ,t t ty1 nt nt n , ty1

and denote by ?NFF the distribution conditional on a sub-s-field FF .0 0Ž .Let n be given and fixed. First, let V be any random variable on V , FF, P , which has the samen1

y1 r2 y1r2 Ž .distribution as n ¨ , i.e., V s n ¨ . Second, let t be a stopping time defined on V , FF, P ,1 n1 d 1 n1Ž . y1 r2 Žfor which U t rn NFF s n u . Such a stopping time exists, as shown in Hall and Heyde 1980,n1 n0 d 1. Ž .Theorem A1 . We then define a random variable on V , FF, P , denoted by V , such that V yn2 n2

< y1 r2 < Ž .nV GG s n ¨ GG , and so on. It is obvious that we may proceed in this way to find T andn1 n1 d 2 1 nt ts1Ž .n Ž .V on V , FF, P successively so thatnt ts1

t tT 1 1ntU FF s u FF and V GG s ¨ GG ,Ý Ýn , ty1 d i ty1 nt nt d i tž / ' 'n n nis1 is1

Ž . Ž .in a zig-zag fashion. The equalities of the distributions in a and the representation of U in b arentfulfilled by construction. The moment conditions for the stopping times t follow from Hall andnt

Ž . Ž .Heyde 1980, Theorem A1 . Moreover, if we define V as in c , then an invariance principle holdsnŽ .for V , as in Phillips and Solo 1992 . In particular, it follows that V converges weakly to V inn n

w xm Ž Ž .D 0, 1 endowed with uniform topology see, e.g., Billingsley 1968, pp. 150]153 for the uniformw x. Ž .topology in D 0, 1 . We may therefore redefine V so that the distribution of U , V is unchangedn nt nt

w xand V ª V uniformly on 0, 1 . This is possible due to the representation theorem in, e.g., Pollardn a. s.Ž .1984, pp. 71]72 of weakly convergent probability measures by the almost sure convergentsequences. Q.E.D.

Ž . Ž .PROOF OF LEMMA 2: Part a follows directly from Park and Phillips 1999, Theorem 5.1 . ToŽ . Ž X .prove part b , we let V s V , V 9, where V is given in Lemma 1. Definen 1 n 2 n n

kn

Ž . Ž . � Ž . 4f x s f kd 1 kd Fx- kq1 d ,Ýn n n nksyk n

where k and d are sequences of numbers satisfying conditions in the proof of Theorem 5.1 inn nŽ .Park and Phillips 1999 . In particular, k ª` and d ª0. We may show using the notation inn n

Lemma A5 that

1 1' ' ' 'Ž . Ž Ž .. Ž . Ž Ž .. Ž . Ž .28 n f n V r V r drs n f n V r V r drqo 1H H1 n 2 n n 1 n 2 n p0 0

kn1' Ž . Ž . Ž . Ž .s n f kd i r V r drqo 1HÝ n n k 2 n p

0ksyk n

'n` 1Ž . Ž . Ž . Ž .s f s ds i r V r drqo 1 ,H Hn n 2 n pž / dy` 0n

Page 22: Econometrica, Vol. 68, No. 5 September, 2000 , 1249 1280bhansen/718/ParkPhillips2000.pdf · variable yU is assumed to be latent and the observed variable is simply the t indicator

J. Y. PARK AND P. C. B. PHILLIPS1270

Ž .following the proof of Theorem 5.1, Park and Phillips 1999 . Note that V ª V uniformly and2 n a. s. 2therefore, V is bounded uniformly in n. Also, we have2 n

1 1 1Ž . Ž . Ž . Ž . < Ž . Ž . < < Ž . <i r V r dry i r V r dr F i r y i r V r drH H Hn k 2 n n 2 n n k n 2 n0 0 0

1< Ž . < < Ž . Ž . <F 1q sup V r i r y i r dr .H2 nk nž /

00FrF1

The stated result therefore follows from Lemma A5 exactly as in the proof of Theorem 5.1, Park andŽ .Phillips 1999 .

We have

` `Ž . Ž . Ž .29 f s dsª f s ds.H Hn

y` y`

yd 5r8'Moreover, if we choose d sn with 0-d-1r8 and let p sd r n , then n p ª`. Wen n n ntherefore have from Lemma A5 that

1 11 1 nŽ . Ž . Ž . Ž . Ž . Ž .30 i r V r drs i r V r drqo 1H Hn 2 n 2 pp p0 0n n

since

1 1 nŽ . Ž . Ž . Ž .i r V r dry i r V r drH Hn 2 n 20 0

1 1n n< Ž . Ž . < < Ž . < < Ž . < < Ž . Ž . <F i r y i r V r drq i r V r yV r drH Hn 2 n 2 n 20 0

1 n< Ž . < < Ž . Ž . < < Ž . Ž . <F 1q sup V r i r y i r drq sup V r yV r .H2 n 2 n 2ž /00FrF1 0FrF1

We may write

1 1 1 1nŽ . Ž . Ž . Ž . Ž .31 i r V r drs V r dL r , p s dsH H H2 2 1 np 0 0 0n

Ž Ž ..using the extended occupation times formula see, e.g., Revuz and Yor 1994, Exercise 1.15, p. 222 .Define

1 1 1Ž . Ž . Ž . Ž . Ž .32 R s V r dL r , p s dsy V r dL r , 0 .H H Hn 2 1 n 2 10 0 0

Ž . Ž .Due to 28 ] 31 , it suffices to show that R ª 0, which we now set out to do. Let c be an a. s. nsequence of numbers such that

c ª` and c p 1r2y« ª0,n n n

for some 0-«-1r2. Define

1Ž . Ž . Ž .I x s V r dL r , x ,H 2 10

cn i iq1 iŽ .I x s V L , x yL , x .Ýn 2 1 1ž / ž / ž /ž /c c cn n nis1

Then we have

< <R FA qB qC ,n n n n

Page 23: Econometrica, Vol. 68, No. 5 September, 2000 , 1249 1280bhansen/718/ParkPhillips2000.pdf · variable yU is assumed to be latent and the observed variable is simply the t indicator

NONSTATIONARY BINARY CHOICE 1271

where

1< Ž . Ž . <A s I p s yI p s ds,Hn n n n

0

1< Ž . Ž . <B s I p s yI 0 ds,Hn n n n

0

1< Ž . Ž . <C s I 0 yI 0 ds.Hn n

0

We will show that A , B , C ª 0.n n n a. s.Ž .It is obvious that C ª 0 since I 0 exists. To show B ª 0, notice that we haven a. s. n a. s.

Ž . < Ž . Ž . < < <1r2y«33 sup L r , p s yL r , 0 Fk p s a.s.,1 n 1 1 nw xrg 0, 1

Ž . Žfor some constant k , due to the uniform Holder continuity of the local time L r, ? see, e.g.,¨1 1Ž ..Revuz and Yor 1994, Corollary 1.8, p. 217 . Therefore,

cn1 i 1r2y«< Ž . Ž . < < <I p s yI 0 F2k V c p s ,Ýn n n 1 2 n nž /ž /c cn nis1

and consequently,

cn1 i 1 1r2y«1r2y« < <B F2k c p V s dsª 0,HÝn 1 n n 2 a . s .ž /ž /c c 0n nis1

as was to be shown.Finally, we let

cn i i iq1n Ž .V r s V 1 F r- ,Ý2 2 ½ 5ž /c c cn n nis1

so that

1 nŽ . Ž . Ž .I x s V r dL r , x .Hn 2 10

As is well known

Ž . < Ž . n Ž . < y1 r2q«34 sup V r yV r Fk c a.s.2 2 2 n0FrF1

Ž . Ž .It follows from 33 and 34 that

1y1 r2q«< Ž . Ž . < < Ž . <I p s yI p s Fk c dL r , p sHn n n 2 n 1 n0

1 1r2y«y1r2q« < Ž . < < <Fk c dL r , 0 q2k p s ,H2 n 1 1 nž /0

and therefore,

1 1 1r2y«y1r2q« y1r2q« 1r2y«< Ž . < < <A Fk c dL r , 0 q2k k c p s dsª 0.H Hn 2 n 1 1 2 n n a . s .0 0

Ž .The proof for part b is thereby complete.

Page 24: Econometrica, Vol. 68, No. 5 September, 2000 , 1249 1280bhansen/718/ParkPhillips2000.pdf · variable yU is assumed to be latent and the observed variable is simply the t indicator

J. Y. PARK AND P. C. B. PHILLIPS1272

Ž . Ž .The proof for part c is analogous to that of part b . We have

n1XŽ .f x x xÝ 1 t 2 t 2 t3r2n ts1

1' 'Ž Ž .. Ž . Ž .s n f n V r V r V r 9 drHd 1 n 2 n 2 n0

1' 'Ž Ž .. Ž . Ž . Ž .s n f n V r V r V r 9 drqo 1H n 1 n 2 n 2 n p0

kn1' 'Ž . � Ž . Ž . 4 Ž . Ž . Ž .s n f kd 1 kd F n V r F kq1 d V r V r 9 drqo 1HÝ n n 1 n n 2 n 2 n p

0ksyk n

kn 'n 1 'Ž . � Ž . 4 Ž . Ž . Ž .s d f kd 1 0F n V r -d V r V r 9 drqo 1HÝn n 1 n n 2 n 2 n pž / d 0nksyk n

1` 1Ž . � Ž . 4 Ž . Ž . Ž .s f s ds 1 0FV r -p V r V r 9 drqo 1H H 1 n 2 2 pž / py` 0n

` 1 1Ž . Ž . Ž . Ž . Ž .s f s ds V r V r 9 dL r , p s dsqo 1H H H 2 2 1 n pž /y` 0 0

` 1Ž . Ž . Ž . Ž . Ž .s f s ds V r V r 9 dL r , 0 qo 1 .H H 2 2 1 pž /y` 0

Ž .Each step can be shown rigorously using the arguments in the proof of part b . Q.E.D.

PROOF OF LEMMA 3: We set ms2. This is just for notational simplicity. The proof for theŽ . 2general case is essentially identical. For any cs c , c gR , we let1 2

Ž . y1 r4 Ž . y3 r4 Ž .A x , x sc n f x qc n f x x ,n 1 2 1 1 2 1 2

and define

ty1 T Tni n , iy1' 'Ž . Ž . Ž .35 M r s n A n V U yUÝn n ni ž / ž /ž /n nis1

Tn , ty1' 'Ž . Ž .q n A n V U r yU ,n nt ž /ž /n

for T rn- rFT rn, where T , ts1, . . . , n, are the time changes introduced in Lemma 1. Onen, ty1 nt ntmay easily see that M is a continuous martingale such thatn

n TnnŽ . Ž .36 A x , x u s M ,Ý n 1 t 2 t t d n ž /nts1

for all n.Ž . 2Ž . 2 Ž . w xLet B x , x ss x A x , x . The quadratic variation M of M is given byn 1 2 1 n 1 2 n n

ty1 T T Tni n , iy1 n , ty12 2' 'w xŽ . Ž . Ž .M r sn A n V y qnA n V ryÝn n ni n ntž / ž /n n nis1

n Tnt'Ž . Ž .s B n V 1 rG qo 1 ,Ý n nt p½ 5nts1

Page 25: Econometrica, Vol. 68, No. 5 September, 2000 , 1249 1280bhansen/718/ParkPhillips2000.pdf · variable yU is assumed to be latent and the observed variable is simply the t indicator

NONSTATIONARY BINARY CHOICE 1273

w xuniformly in rg 0, 1 , by Lemma A3. Consequently, we have

Ž . w xŽ . Ž .37 M r ª c9M r c,n p

w xuniformly in rg 0, 1 , where

` r `2Ž . Ž . Ž . Ž . Ž . Ž .L r , 0 f s ds dL s, 0 V s 9 f s g s dsH H H1 s 1 2 s s

y` 0 y`Ž .M r s ,r ` r `

2Ž . Ž . Ž . Ž . Ž . Ž . Ž . Ž .V s dL s, 0 g s f s ds V s V s 9 dL s, 0 g s ds� 0H H H H2 1 s s 2 2 1 s0 y` 0 y`

Ž . Ž . Ž . Ž . Ž . Ž .due to the results in Lemma 2, and where the notation f s ss s f s and g s ss s g s iss s

being used.Moreover, if we let s be the covariance of U and V andu¨

Ž . 2 Ž . Ž .C x , x ss x A x , x ,n 1 2 1 n 1 2

w xthen the quadratic covariation process M , V of M and V isn n

ty1 T T Tni n , iy1 n , ty1' ' ' 'w xŽ . Ž . Ž .M , V r s n A n V y s q n A n V ry sÝn n ni u¨ n nt u¨ž / ž /n n nis1n Tnt'Ž . Ž .ss C n V 1 rG qo 1 ª 0,Ýu¨ n nt p p½ 5nts1

w xuniformly in rg 0, 1 , by Lemma A4. It follows, in particular, that

Ž . w xŽ Ž ..38 M , V r r ª 0,n n p

Ž . � w x w xŽ . 4where r r s inf sg 0, 1 : M s ) r is a sequence of time changes.n nŽ .The asymptotic distribution of the continuous martingale M in 35 is completely determined byn

Ž . Ž . Ž .37 and 38 , as shown in Revuz and Yor 1994, Theorem 2.3, page 496 . Now define

Ž . Ž Ž ..W r sM r r .n n n

Ž .The process W is the DDS or Dambis, Dubins-Schwarz Brownian motion of the continuousnŽ Ž ..martingale M see, for example, Revuz and Yor 1994, Theorem 1.6, page 173 . It now follows thatn

Ž .V, W converges jointly in distribution to two independent standard linear Brownian motionsnŽ .V, W , say. Therefore,

Tn n Ž .M ª W c9Mc ,n dž /nŽ .which, due to 36 , completes the proof for the first part. Q.E.D.

y1 Ž .PROOF OF THEOREM 1: The limiting distribution of D S a is derived by applying Lemma 3n n 0Ž . Ž . Ž . Ž .with f x sxG x and g x sG x . It is easy to check that the conditions in Lemma 3 hold. Note2 ˙Ž . Ž . Ž .that G F 1yF sK, for which K gF by Assumption 2 a . Also, we have GF 1yF sF and2 R

˙ 4 3 ˙ 3 ˙Ž . Ž .F gF by Assumption 2 b . Moreover, G F 1yF sG F and we require G F gF in Assumption1 1 4 0Ž . 0 Ž X .1r22 c . Thus, with a s b b and using Lemma 3, we obtain1 0 0

nX Xy1 y1Ž . Ž . Ž Ž ..D S a sD G x b H9x y yF x bÝn n 0 n t 0 t t t 0

ts1

y1 r4 n Ž 0 .n Ý G x a x uts1 1 t 1 1 t ts y3 r4 n 0Ž .ž /n Ý G x a x uts1 1 t 1 2 t t

y1y1r4 0 n 0Ž . Ž .n a Ý f x a u1 ts1 1 t 1 tsy3 r4 n 0ž /Ž .n Ý g x a x uts1 1 t 1 2 t t

1r2 Ž .ª M W 1 ,d

Page 26: Econometrica, Vol. 68, No. 5 September, 2000 , 1249 1280bhansen/718/ParkPhillips2000.pdf · variable yU is assumed to be latent and the observed variable is simply the t indicator

J. Y. PARK AND P. C. B. PHILLIPS1274

with

m m11 12Msm Mž /21 22

where

`y20 2 0Ž . Ž . Ž .m s a L 1, 0 f sa ds,H11 1 1 s 1y`

`1y10 0 0Ž . Ž . Ž . Ž . Ž .m s a dL r , 0 V r 9 f sa g sa ds,H H12 1 1 2 s 1 s 10 y`

`1y10 0 0Ž . Ž . Ž . Ž . Ž .m s a V r dL r , 0 g sa f sa ds,H H21 1 2 1 s 1 s 10 y`

`1 2 0Ž . Ž . Ž . Ž .M s V r V r 9 dL r , 0 g sa ds,H H22 2 2 1 s 10 y`

and where

2 2 22 2Ž . Ž . Ž . Ž . Ž . w Ž .x Ž .f x s f x s x sx G x F x 1yF x sx K x ,s

2 2 2 2Ž . Ž . Ž . Ž . Ž . w Ž .x Ž .g x sg x s x sG x F x 1yF x sK x ,s

2 2Ž . Ž . Ž . Ž . Ž . Ž . Ž . w Ž .x Ž .f x g x s f x g x s x sxG x F x 1yF x sxK x .s s

Thus,

` `12 0 0Ž . Ž . Ž . Ž . Ž .L 1, 0 s K a s ds dL r , 0 V r 9 sK a s dsH H H1 1 1 2 1y` 0 y`

Ms ,` `1 10 0Ž . Ž . Ž . Ž . Ž . Ž . Ž .V r dL r , 0 sK a s ds V r V r 9 dL r , 0 K a s ds� 0H H H H2 1 1 2 2 1 1

0 y` 0 y`

y1 Ž .and, with MsQ, we have the stated result for the limit of the score process D S a .n n 0˙ ˙Ž . Ž .To get the stated asymptotic result for J a , we let G sG x and observe thatn 0 t 1 t

n n nXy1r2 2 y1 y3r2˙ ˙ ˙ Ž .n G x u , n G x x u , n G x x u so 1 .Ý Ý Ýt 1 t t t 1 t 2 t t t 2 t 2 t t p

ts1 ts1 ts1

˙ ˙ 2 ˙Ž . Ž . Ž . Ž . Ž .These follow from a direct application of Lemma A2 b with f x sG x , xG x , x G x and usingi˙ 1r2 1r2w Ž . x Ž .the fact that GF 1yF gF by Assumption 2 c . We therefore have2 0

y1 Ž . Ž .D J a D an n 0 n 0

nXy1 0 y1Ž . Ž .syD K x a H9x x HD qo 1Ýn 1 t 1 t t n p

ts1Ž .39

y1 r2 n Ž 0 . 2 y1 n Ž 0 . Xn Ý K x a x n Ý K x a x xts1 1 t 1 1 t ts1 1 t 1 1 t 2 t Ž .s qo 1 .pXy1 n 0 y3r2 n 0Ž . Ž .ž /n Ý K x a x x n Ý K x a x xts1 1 t 1 2 t 1 t ts1 1 t 1 2 t 2 t

The stated results are now immediate from Lemma 2. Q.E.D.

PROOF OF THEOREM 2: We employ a standard approach to local extremum estimation. InŽ .particular, Theorem 10.1 of Wooldridge 1994 allows for varying rates of convergence in the

components of the estimator and is well suited to the present problem. The idea is simply to show

Page 27: Econometrica, Vol. 68, No. 5 September, 2000 , 1249 1280bhansen/718/ParkPhillips2000.pdf · variable yU is assumed to be latent and the observed variable is simply the t indicator

NONSTATIONARY BINARY CHOICE 1275

Ž .that equation 19 holds and that there is a consistent local solution to the likelihood equation.Ž . Ž .Conditions i and ii of Wooldridge’s theorem hold trivially by our assumption about b and our0

Ž . Ž .Assumption 2, so it remains to verify conditions iii and iv of that theorem.The likelihood equation for the ML estimator a isˆn

Ž . Ž .40 S a s0,ˆn n

which has the expansion

Ž . Ž . Ž . Ž .S a sS a qJ a a ya s0,ˆ ˆn n n 0 n n n 0

or

Ž . Ž . Ž .Ž . w Ž . Ž .xŽ .41 S a qJ a a ya q J a yJ a a ya s0,ˆ ˆn 0 n 0 n 0 n n n 0 n 0

Ž . Ž . Ž .where S a and S a are the scores respectively at a and a , and J a is the hessian matrixˆ ˆn n n 0 n 0 n nŽ .with rows evaluated at mean values that lie on the line segment connecting a and a . Then 41ˆn 0

can be written as

y1 Ž . w y1 Ž . y1 x Ž .0sD S a q D J a D D a yaˆn n 0 n n 0 n n n 0

Ž y1 w Ž . Ž .x y1 . Ž .q D J a yJ a D D a ya ,ˆn n n n 0 n n n 0

or

y1 Ž . w y1 Ž . y1 x Ž .0sD S a q D J a D D a yaˆn n 0 n n 0 n n n 0Ž .42

y2 d Ž y1 w Ž . Ž .x y1 . Ž .qn C J a yJ a C D a ya ,ˆn n n n 0 n n n 0

yd y1 Ž . Ž . Ž .where C sD n for some d)0, so that C D so 1 as nª`, as in condition iii a ofn n n nŽ . Ž . Ž . Ž .Wooldridge’s Theorem 10.1. Equation 19 now follows from 42 if the final term of 42 is o 1 .p

Ž . Ž .This will be so, if condition iii b of Wooldridge’s theorem holds.To show this is so, we need to establish that

Ž . 5 y1 w Ž . Ž .x y1 5 Ž .43 sup C J a yJ a C so 1 .n n n 0 n p� 5 Ž .5 4a : C aya F1n 0

Our proof involves looking at the components of the hessian. We therefore partition the hessianconformably with a as

n Ž . n Ž .J a J a11 12Ž .J a s ,n n nŽ . Ž .ž /J a J a21 22

Ž .wherein, from 12 ,

n n nn ˙ ˙Ž . Ž . Ž . Ž . Ž Ž . . Ž .44 J a sy K a x x y G a x x F a yF q G a x x u ,Ý Ý Ýi j t i t jt t i t jt t t t i t jt t

ts1 ts1 ts1

Ž . Ž X .for i, js1, 2 and where we define f a s f x a qx a for any function f : RªR and furthert 1 t 1 2 t 2Ž 0 0X .define f to be the value of the function f at a s a , a 9.t 0 1 2

Ž .For 43 to hold it is sufficient that

y1 r2q2 d < n Ž . n Ž . <n J a yJ a ª 0,11 11 0 p

Ž . y1 q2 d 5 n Ž . n Ž . 545 n J a yJ a ª 0,12 12 0 p

y3 r2q2 d 5 n Ž . n Ž . 5n J a yJ a ª 0,22 22 0 p

uniformly for all a and a satisfying1 2

Ž . < < y1 r4qd 5 5 y3 r4qd46 a y1 Fn and a Fn ,1 2

Page 28: Econometrica, Vol. 68, No. 5 September, 2000 , 1249 1280bhansen/718/ParkPhillips2000.pdf · variable yU is assumed to be latent and the observed variable is simply the t indicator

J. Y. PARK AND P. C. B. PHILLIPS1276

Ž .for some d)0. Now, from 44 we have

Ž . n Ž . n Ž . n Ž . n Ž . n Ž .47 J a yJ a sA a* qB a* qC a* ,i j i j 0 i j i j i j

wheren

Xn 0 0˙Ž . Ž . Ž Ž . Ž ..A a sy K a x x x a ya qx a ya ,Ýi j t i t jt 1 t 1 1 2 t 2 2ts1

nXn 0 0˙˙Ž . Ž . Ž . Ž Ž . Ž ..B a sy GF a x x x a ya qx a ya ,tÝi j i t jt 1 t 1 1 2 t 2 2

ts1

nXn 0 0¨Ž . Ž . Ž Ž . Ž ..C a s G a x x u x a ya qx a ya ,Ýi j t i t jt t 1 t 1 1 2 t 2 2

ts1

and a* is on the line segment connecting a and a . For f : RªR, define f by0

Ž . < Ž . <f x s sup sup f axqb ,< < < <ay1 F« b F«

Ž .for «)0 given. We denote f s f x . As shown in Lemma A1, fgF if fgF . Sincet 1 t I I0 y1r4qd y3r4qd'5 5 Ž . < < 5 5sup x r n sO 1 , a ya Fn , and a Fn , we have for any «)01F t F n 2 t p 1 1 2

< Ž . < Ž . Ž .f x a qx a F f x qo 1 ,1 t 1 2 t 2 1 t p

for large n, uniformly in 1F tFn.Ž .By virtue of 46 we have

n nn y1r4qd y3r4qd˙ ˙5 Ž . 5 5 5 5 5 < < 5 5 5 5 5 5A a Fn K x x x qn K x x x .Ý Ýi j t i t jt 1 t t i t jt 2 t

ts1 ts1

Ž .It therefore follows from Lemma A2 a that

Ž . 5 n Ž . 5 Ž 1r4qd . 5 n Ž . 5 Ž 3r4qd . 5 n Ž . 5 Ž 5r4qd .48 A a sO n , A a sO n , A a sO n ,11 p 12 p 22 p

˙ ˙˙ ¨ ˙Ž . Ž .uniformly in a satisfying 46 . Note that KsGFqGF and hence K gF by Assumption 2 b . We2 Imay use exactly the same argument to deduce that

Ž . 5 n Ž . 5 Ž 1r4qd . 5 n Ž . 5 Ž 3r4qd . 5 n Ž . 5 Ž 5r4qd .49 B a sO n , B a sO n , B a sO n .11 p 12 p 22 p

Finally, we have

n1r2n y1r4qd ¨5 Ž . 5 Ž Ž .. 5 5 5 5 < <C a Fn G F 1yF x x xÝi j t t t i t jt 1 t

ts1

n1r2y3r4qd ¨ Ž Ž .. 5 5 5 5 5 5qn G F 1yF x x x ,Ý t t t i t jt 2 t

ts1

since

2 2Ž Ž < 5 .. Ž . Ž .E u FF FE u NFF sF 1yF ,t ty1 t ty1 t t

Ž . Ž .by the conditional Jensen inequality. It follows directly from Assumption 2 b and Lemma A2 a that

Ž . 5 n Ž . 5 Ž 1r4qd . 5 n Ž . 5 Ž 3r4qd . 5 n Ž . 5 Ž 5r4qd .50 C a sO n , C a sO n , C a sO n ,11 p 12 p 22 p

Ž . Ž . Ž . Ž .uniformly in a given by 46 . If we let 0-d-1r12, we may now easily deduce 45 from 48 , 49 ,Ž . Ž . Ž . Ž .and 50 together with 47 . Hence, 43 holds and therefore 19 .

Page 29: Econometrica, Vol. 68, No. 5 September, 2000 , 1249 1280bhansen/718/ParkPhillips2000.pdf · variable yU is assumed to be latent and the observed variable is simply the t indicator

NONSTATIONARY BINARY CHOICE 1277

Ž .It now follows as in the proof of Theorem 10.1 of Wooldridge 1994 that there exists a solutionŽ .to the likelihood equation 40 with probability approaching one such that

Ž . Ž .D a ya sO 1 .ˆn n 0 p

From Theorem 1, we have the joint weak convergence

Ž . Ž y1 Ž . y1 Ž . y1 . Ž 1r2 Ž . .51 D S a , D J a D ª Q W 1 , yQ ,n n 0 n n 0 n d

Ž .where Q is positive definite with probability one. Thus, condition iv of Wooldridge’s theorem holds.Ž . Ž . Ž . Ž .The given limit distribution of D a ya now follows directly from 42 , 43 , and 51 . Q.E.D.ˆn n 0

PROOF OF THEOREM 3: Write

y1y1ˆ ˆŽ . w Ž . xyJ b syH H9J b H H9n n n n

y1w Ž . xsyH J a H9ˆn n

y1y1 y1 y1 y1w Ž . xsyHD D J a D D H9.ˆn n n n n n

By Theorems 1 and 2 we have

y1 Ž . y1 y1 Ž . y1 Ž .yD J a D syD J a D qo 1 ª Q.ˆn n n n n n 0 n p d

Ž . Ž .Partitioning the hessian matrix conformably with 22 and using 39 , we have

J J11 12y1 y1Ž .52 D Dn nJ Jž /21 22

n Ž 0 . 2 n Ž 0 . XÝ K x a x Ý K x a x xts1 1 t 1 1 t ts1 1 t 1 1 t 2 ty1 y1 Ž .sD D qo 1 .n n pXn 0 n 0Ž . Ž .ž /Ý K x a x x Ý K x a x xts1 1 t 1 2 t 1 t ts1 1 t 1 2 t 2 t

Then

y1y1 y1 y1 y1Ž .J a 1 J yJ J JJ Jˆn n 11.2 11.2 12 2211 12 's s n .y1 y1 y1J Jž /' ' ž /yJ J J Jn n 21 22 22 21 11.2 22

Next observe that

n nXy1r2 y1r2 0 2 y1 0Ž . Ž . Ž . Ž .n J s n K x a x qo 1 y n K x a x x qo 1Ý Ý11 .2 1 t 1 1 t p 1 t 1 1 t 2 t pž / ž /

ts1 ts1

y1n nXy3r2 0 y1 0Ž . Ž . Ž . Ž .? n K x a x x qo 1 n K x a x x qo 1Ý Ý1 t 1 2 t 2 t p 1 t 1 2 t 1 t pž / ž /

ts1 ts1

ª q yq Qy1q sq ,d 11 12 22 21 11.2

1 y1y1 y1 y1 y1 y3r2 y1r2' 'Ž .Ž .Ž . Ž .n J J J s n J n J n J sO n ,11 .2 12 22 11.2 12 22 p'n

and

'n y1y1 y3r2 y1' Ž . Ž .n J s n J sO n .22 22 p3r2n

Page 30: Econometrica, Vol. 68, No. 5 September, 2000 , 1249 1280bhansen/718/ParkPhillips2000.pdf · variable yU is assumed to be latent and the observed variable is simply the t indicator

J. Y. PARK AND P. C. B. PHILLIPS1278

It follows that

y1y1Ž .J an n q 011 .2ª ,d ž /'n 0 0

and thus

y1 y1q 0y1 y1 11.2w Ž . xD J a D ª .ˆn n n n d ž /0 0

Consequently,

y1 y1y1 y1 y1 y1ˆ' 'Ž . w Ž . xy n J b syHD n D J a D D H9ˆn n n n n n n n

y1X X Xy1 y1Ž .ª h q h sb b b b q ,d 1 11.2 1 0 0 0 0 11.2

y1 y1ˆ ˆ' 'Ž . Ž . Ž .giving the stated result for y n J b . The result for y n J b follows from 52 using then n n nsame argument. Q.E.D.

PROOF OF COROLLARY 2: The stated result follows immediately, since

ˆ 0 0Ž . Ž . Ž .b yb sJ a ya qJ a ya ,ˆ ˆn 0 1 1 n 1 2 2 n 2

and4 4 0 y1r2ˆ' 'Ž . Ž . Ž .n b yb s n J a ya qO n ,ˆn 0 1 1 n 1 p

for large n. Q.E.D.

ˆ ˆŽ .PROOF OF COROLLARY 3: Use the following mean value expansions for FsF x9b and g sˆn xˆ ˆŽ .f x9b bn n

ˆ X X X ˆŽ . Ž . Ž .FsF x b q f x b x b yb ,t 0 t n t n 0Ž .53ˆŽ . Ž .Ž .g sg x9b qG x9b b yb ,x 0 n n 0

where

Ž . Ž . Ž .G x9b s f x9b I q f 9 x9b b x9.n n m n n

Then

4 1r4 y1ˆ ˆ'Ž . Ž Ž .. Ž . Ž . Ž . Ž .54 n FyF x9b s f x9b x9 b yb n ; f x9b x9 MN 0, P q0 n n 0 d 0 b 11.20

2 y1Ž .s MN 0, f x9b x9P xq ,Ž .d 0 b 11.20

and4 1r4 y1ˆ' Ž Ž .. Ž .Ž . Ž . Ž .n g yg x9b sG x9b b yb n ; G x9b MN 0, P qx 0 n n 0 d 0 b 11.20

Ž Ž . Ž . y1 .s MN 0, G x9b P G x9b 9q .d 0 b 0 11.20

A simple calculation reveals that

2Ž . Ž . Ž Ž . Ž . .G x9b P G x9b 9s f x9b q f 9 x9b x9b P ,0 b 0 0 0 0 b0 0

and the stated results follow. Q.E.D.

Ž X .PROOF OF THEOREM 4: Since y sF x b qu and u is a martingale difference, we havet t 0 t t

n nXy1 y1 Ž . Ž .r sn y sn F x b qo 1 .Ý Ýn t t 0 p

ts1 ts1

Page 31: Econometrica, Vol. 68, No. 5 September, 2000 , 1249 1280bhansen/718/ParkPhillips2000.pdf · variable yU is assumed to be latent and the observed variable is simply the t indicator

NONSTATIONARY BINARY CHOICE 1279

Ž .The function F z is a cumulative distribution function and is asymptotically homogeneous ofdegree zero as zª` in the sense that

Ž . � 4 Ž .F l z s1 z)0 qR z , l ,

Ž .and R z, l is dominated by a locally integrable function that vanishes at infinity. It thereforeŽ .follows from Park and Phillips 1999, Theorem 5.3 that

n` `

Xy1 Ž . � 4 Ž . Ž .n F x b ª 1 s)0 L 1, s dss L 1, s ds,H HÝ t 0 d 1 1y` 0ts1

as required.Ž . y1 n Ž .The proof for the predicted proportion r X sn Ý y X follows in the same manner. Inn ts1 t

y1 n ˆ ˆ X ˆŽ . Ž . Ž . Ž .the estimated case, r X sn Ý F X , with F X sF X b . By the mean value expansion inn ts1 t t t nŽ . Ž . Ž .54 and using Lemma 2 a and b we find that

ny1 ˆŽ . Ž . Ž . Ž .r X s r X qn f x9b x9 b ybˆ Ýn n n n 0

ts1

Ž . Ž y1 r4 .s r X qO n ,n p

Ž . Ž .and thus r X has the same limit as r X . Q.E.D.n n

APPENDIX C: NOTATION

ª almost surely converge.a. s.ª convergence in probability.pª weak convergence.dŽ .o 1 tends to zero in probability.pŽ .o 1 tends to zero almost surely.a. s.

s distributional equivalence.d; asymptotically distributed as.d' equivalence.W, V , V standard Brownian motions.1 2

Ž .MN 0, V mixed normal distribution with variance V.5 5 k? Euclidean norm in R .F class of regular functions.RF class of bounded integrable functions.IF class of bounded functions vanishing at infinity.0

REFERENCES

Ž .AKONOM, J. 1993 : ‘‘Comportement Asymptotique du Temps d’Occupation du Processus desSommes Partielles,’’ Annales de l’Institut Henri Poincare, 29, 57]81.´

Ž .AMEMIYA, T. 1985 : Ad¨anced Econometrics. Cambridge: Harvard University Press.Ž .BILLINGSLEY, P. 1968 : Con¨ergence of Probability Measures. New York: Wiley.

Ž .DHRYMES, P. 1986 : ‘‘Limited Dependent Variables,’’ Ch. 27 in Handbook of Econometrics, Vol. III,ed. by Z. Griliches and M. D. Intriligator. Amsterdam: North Holland.

Ž .GREENE, W. 1997 : Econometric Analysis, Third Edition. New York: MacMillan.Ž .HALL, P., AND C. C. HEYDE 1980 : Martingale Limit Theory and its Application. New York: Academic

Press.Ž .PARK, J. Y., AND P. C. B. PHILLIPS 1989 : ‘‘Statistical Inference in Regressions with Integrated

Processes: Part 2,’’ Econometric Theory, 5, 95]132.Ž .}}} 1998 : ‘‘Nonlinear Regression with Integrated Processes,’’ Cowles Foundation Discussion

Paper No. 1182.

Page 32: Econometrica, Vol. 68, No. 5 September, 2000 , 1249 1280bhansen/718/ParkPhillips2000.pdf · variable yU is assumed to be latent and the observed variable is simply the t indicator

J. Y. PARK AND P. C. B. PHILLIPS1280

Ž .}}} 1999 : ‘‘Asymptotics for Nonlinear Transformations of Integrated Time Series,’’ Economet-ric Theory, 15, 269]298.

Ž .PHILLIPS, P. C. B. 1998 : ‘‘Econometric Analysis of the Fisher Equation,’’ Cowles FoundationDiscussion Paper No. 1180.

Ž .PHILLIPS, P. C. B., AND J. Y. PARK 1998 : ‘‘Nonstationary Density Estimation and Kernel Autore-gression,’’ Yale University, Cowles Foundation Discussion Paper No. 1181.

Ž .PHILLIPS, P. C. B., AND V. SOLO 1992 : ‘‘Asymptotics for Linear Processes,’’ Annals of Statistics, 20,971]1001.

Ž .POLLARD, D. 1984 : Con¨ergence of Stochastic Processes. New York: Springer-Verlag.Ž .PRATT, J. W. 1981 : ‘‘Concavity of the Log Likelihood,’’ Journal of the American Statistical

Association, 76, 103]106.Ž .REVUZ, D., AND M. YOR 1994 : Continuous Martingale and Brownian Motion, 2nd ed. New York:

Springer-Verlag.Ž .WHITE, H. 1994 : Estimation, Inference and Specification Analysis. Cambridge: Cambridge University

Press.Ž .WOOLDRIDGE, J. M. 1994 : ‘‘Estimation and Inference for Dependent Processes,’’ in Handbook of

Econometrics, Vol. IV, ed. by R. F. Engle and D. L. McFadden. Amsterdam: North Holland.