The Review of Economic Studies Ltd. On the Rigour of Some Misspecification Tests for Modelling Dynamic Relationships Author(s): Jan F. Kiviet Source: The Review of Economic Studies, Vol. 53, No. 2 (Apr., 1986), pp. 241-261 Published by: The Review of Economic Studies Ltd. Stable URL: http://www.jstor.org/stable/2297649 Accessed: 26/02/2009 17:07 Your use of the JSTOR archive indicates your acceptance of JSTOR's Terms and Conditions of Use, available at http://www.jstor.org/page/info/about/policies/terms.jsp. JSTOR's Terms and Conditions of Use provides, in part, that unless you have obtained prior permission, you may not download an entire issue of a journal or multiple copies of articles, and you may use content in the JSTOR archive only for your personal, non-commercial use. Please contact the publisher regarding any further use of this work. Publisher contact information may be obtained at http://www.jstor.org/action/showPublisher?publisherCode=resl. Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printed page of such transmission. JSTOR is a not-for-profit organization founded in 1995 to build trusted digital archives for scholarship. We work with the scholarly community to preserve their work and the materials they rely upon, and to build a common research platform that promotes the discovery and use of these resources. For more information about JSTOR, please contact [email protected]. The Review of Economic Studies Ltd. is collaborating with JSTOR to digitize, preserve and extend access to The Review of Economic Studies. http://www.jstor.org


  • Review of Economic Studies (1986) LIII, 241-261 0034-6527/86/00150241$02.00

    © 1986 The Society for Economic Analysis Limited

    On the Rigour of Some

    Misspecification Tests for

    Modelling Dynamic Relationships

    JAN F. KIVIET

    University of Amsterdam

    For regression models alternative asymptotically equivalent misspecification tests may lead to conflicting inference in small samples. Effective misspecification tests should have correct significance levels irrespective of the true parameters and any redundant regressors in the model, and reasonable power against a wide class of alternative specifications. A simulation study of various tests for serial correlation and predictive failure in models with lagged dependent variables finds many tests defective in small samples. Only particular degrees of freedom adjustments to the test statistics yield improved small sample behaviour.

    1. INTRODUCTION

    Increasing attention is nowadays paid to model selection and specification procedures, especially in modelling dynamic relationships. Numerous test statistics and diagnostic checks have been suggested as tools in model selection strategies; many of these recent developments are summarized in Harvey (1981). The existence of alternative principles (such as likelihood-ratio, Wald and Lagrange multiplier) for generating test statistics alone means that more than one test statistic with desirable properties is usually available for testing a particular null hypothesis against a specific alternative hypothesis. These statistics often have the same limiting distribution, but their small sample distributions generally differ. Moreover, there is a variety of ways of modifying these tests to generate new tests which retain the original asymptotic properties but may have improved small sample properties. The practitioner is thus faced with a proliferation of tests for the same null and alternative hypotheses. Because these alternative test statistics may have different power functions and different true significance levels in small samples, they may cause conflicting statistical inference and consequently confuse model builders.

    For the hypothesis of linear constraints on the coefficients in the general linear model with spherical normal disturbances, Savin (1976) and Breusch (1979) show that the asymptotically equivalent Wald (W), likelihood-ratio (LR), and Lagrange multiplier (LM) tests satisfy the systematic inequality $W \ge LR \ge LM$. Evans and Savin (1982) show that in the classical normal linear regression model the probability of conflict among these tests can be substantial. By applying various small-sample correction factors they also investigated how accurately these tests can approximate an exact test (i.e. a test with correct significance level). As the usual F test is exact in the case they investigated, there is no need to apply a (possibly modified) asymptotic test. Hence their results are only of practical interest if they represent general characteristics of W, LR or LM tests and so suggest a sensible way to modify these tests in more general cases such as those involving nonlinearities or where for other reasons it is difficult to derive an exact test.

    In this paper we assess the inaccuracies involved in applying asymptotic tests in small samples and examine the effectiveness of various small sample correction factors in cases where no exact test is available. Rather than consider specification tests of coefficient restrictions, we investigate two types of misspecification tests, viz. tests for serial correlation and tests for predictive failure, in the linear regression model with lagged dependent variables. (For discussion of the distinction between tests of specification and misspecification see Mizon (1977b).) Both these types of misspecification test have been used in recent applied work to reduce the risk of accepting misspecified models; see, inter alia, Davidson et al. (1978), Hendry and Mizon (1978), Hendry (1980), Mizon and Hendry (1980), Davidson and Hendry (1981) and Hendry and Richard (1982). In these and other applied studies different tests for the same misspecification have led to conflicting inferences. We use Monte Carlo methods to investigate whether it is worthwhile computing several test statistics for the same alternative or whether there is one particular (possibly modified) test available which is a useful and reliable tool in model selection.

    In addition, overparameterisation, or reduction in the effective sample size, can adversely affect the small sample behaviour of test statistics, and so robustness to overparameterisation is desirable. When the small sample distribution of a test statistic is not known and an approximate sampling distribution based on asymptotic theory is used, it is possible that the approximation will deteriorate with reductions in the number of degrees of freedom. For example, if in checking the adequacy of a general model as part of a general to specific modelling exercise a misspecification test is significant, then a further generalisation of the already overparameterised model may result in an even more striking rejection of the null of no misspecification. Indeed, it was precisely this phenomenon encountered in using tests for serial correlation and for predictive failure on a general model that led us to investigate more carefully the small sample behaviour of such tests. It is important to distinguish between evidence of misspecification arising from the inadequacy of the model and evidence of misspecification resulting from the assumed (usually asymptotically valid) distribution of the test statistic providing a poor approximation to the unknown small sample distribution. Evidence of misspecification of the latter type is potentially misleading, and so we explore the value of degrees of freedom adjustments and other modifications in reducing this problem.

    Since we wish to examine the effectiveness of misspecification tests as tools in a complete model selection strategy, Section 2 discusses some crucial aspects of the successive stages of a specification search. This leads us to formulate three characteristics of an effective misspecification test. In Sections 3 and 4 we introduce the different test statistics and try to establish systematic inequalities between the various tests for serial correlation and predictive failure respectively. The Monte Carlo design is described in Section 5. The detailed results, including powers and the noting of cases in which type I error probabilities are parameter invariant, are presented in Sections 6 and 7. We find that in general asymptotic chi-squared critical values are reasonably accurate only when the test statistics are adjusted by an Edgeworth-based correction factor used by Anderson (1958, p. 208). Alternatively, when numerator and denominator of the statistics are corrected for degrees of freedom the F critical values are also reasonably accurate. Section 8 summarizes the conclusions.

    2. MODEL SELECTION IN AD-MODELS

    We consider the linear regression model with predetermined explanatory variables, paying special attention to lagged dependent variables. Mizon (1977a) and Mizon and Hendry (1980) describe a model selection strategy for the Autoregressive Distributed lag (AD) dynamic model, denoted AD($m_0, m_1, \dots, m_k$):

    $$\alpha^{(0)}(L)\,y_t = c + \alpha^{(1)}(L)\,x_t^{(1)} + \dots + \alpha^{(k)}(L)\,x_t^{(k)} + \varepsilon_t \qquad (1)$$

    where $\alpha^{(0)}(L)$ is a polynomial of order $m_0$ in the lag-operator $L$ associated with the dependent variable $y_t$, the lag-operator polynomials $\alpha^{(1)}(L), \dots, \alpha^{(k)}(L)$ associated with the $k$ exogenous variables $x_t^{(1)}, \dots, x_t^{(k)}$ have orders $m_1, \dots, m_k$ respectively, $\varepsilon_t$ is a white-noise disturbance term, and $c$ is a constant. The polynomial $\alpha^{(0)}(L)$ has all its roots outside the unit circle and is normalized so that it includes $m_0$ coefficients; the total number of regressors in (1) is $K = k + 1 + \sum_{j=0}^{k} m_j$. If the data generation process (DGP) is (1) and no extra information is available to restrict these $K$ coefficients, then consistent and asymptotically efficient estimates are obtained by ordinary least squares.
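As a small illustration of the regressor count for model (1), the sketch below evaluates K = k + 1 + (m_0 + m_1 + ... + m_k) for the four specifications estimated later in the Monte Carlo study. The function name `count_regressors` is hypothetical and only serves this example.

```python
def count_regressors(m0, exog_orders):
    """Number of regressors K in an AD(m0, m1, ..., mk) model with a constant.

    m0 lagged dependent variables, (m_j + 1) terms for each of the k
    exogenous variables, and one constant: K = k + 1 + sum_{j=0}^{k} m_j.
    """
    k = len(exog_orders)
    return k + 1 + m0 + sum(exog_orders)


print(count_regressors(1, [1]))  # AD(1, 1): constant, y(t-1), x(t), x(t-1) -> 4
print(count_regressors(2, [2]))  # AD(2, 2) -> 6
print(count_regressors(0, [0]))  # AD(0, 0): constant and x(t) -> 2
print(count_regressors(1, []))   # AD(1): constant and y(t-1) -> 2
```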

    However, in practice the DGP is unknown, so that in a specification search misspecified models will be estimated and tested. These might, for example, omit relevant explanatory variables, choose lag polynomials of too low order, or use inappropriately transformed variables. Such misspecifications will in general lead to inconsistent parameter estimates and incorrect assessment of the estimates' sampling properties. A general to specific modelling strategy (see Mizon (1977a, b) and Mizon and Hendry (1980)) aims to lessen the risk of inconsistency by starting from a quite general model determined by relevant economic theory, the available data and computing limitations, and then testing for acceptable simplifications of it. Hence, initial overparameterisation is a deliberate element of the strategy. Conditional on the general model asymptotically valid t- and F-specification tests can be used to find a parsimonious model. The adequacy of the general model itself must also be tested and this is achieved using misspecification tests, though clearly these tests will also be used at subsequent stages of the modelling process to check the adequacy of simplified models. If a misspecification test statistic is significant, the model specification has to be reconsidered. In the context of AD-models, the order of the polynomials will often be increased or explanatory variables will be replaced or added. These model respecifications need not correspond directly to the alternative hypotheses for which the particular misspecification tests have high power. For example, a significant value of a statistic testing for serial independence need not imply that the model should be augmented by a serially correlated error process; nor should predictive failure lead the investigator automatically to specify a model with shift dummy variables.

    In this paper we investigate characteristics of separate tests only, and ignore problems of applying several tests sequentially to different specifications using the same data. We assert that the effectiveness of a misspecification test in any modelling strategy (particularly in the general to specific strategy mentioned above) depends on the following three criteria. First, the test should have an actual significance level (size) close to the nominal level, and so ceteris paribus tests with known size should be preferred to tests with only asymptotic validity. When the model builder is unaware of a substantial difference between the actual and nominal size of the test, the decisions taken may be inappropriate. Too low an actual size (relative to the nominal size) favours the initial specification, leaving the model builder less critical than he thinks. Too high an actual size implies too frequent rejection of an adequate specification. Secondly, the size of the test should be robust to possible overparameterisation (i.e. including redundant lagged regressors) to help avoid the problems associated with overparameterisation mentioned above. Thirdly, the effectiveness of a misspecification test depends on its power to reject misspecified models in addition to that against which the test is optimal.

    In our simulation study we investigate, with these three criteria in mind, the small sample behaviour of alternative forms of serial correlation and predictive failure misspecification tests. We examine the rejection frequencies of these tests for evidence on their size, on their robustness to changes in the degree of overparameterisation, and on their "power" against a range of alternative hypotheses. We then assess which test statistics can be recommended for practical model selection.

    3. TESTS FOR SERIAL CORRELATION

    Three types of test for the serial independence of the disturbances in dynamic models are often used: tests based upon Box and Pierce (1970)'s time series portmanteau lack-of-fit test, tests suggested in Durbin (1970), and tests based on the Lagrange multiplier principle presented in Breusch (1978) and Godfrey (1978). Serial correlation tests based on the likelihood-ratio and Wald principles are computationally less attractive because they require estimation of the model under the (nonlinear) alternative hypothesis. Here we review over a dozen particular versions of these tests (listed in Table I), and we investigate thoroughly their effectiveness in small samples. In addition, we also consider the Durbin-Watson test statistic which is often reported for models containing lagged dependent variables, despite its well-known inadequacy in dynamic models.

    The Lagrange multiplier test statistic takes the same form for both AR (n) and MA (n) alternatives, where n is the order of the process. We denote by LM one particular version of a number of asymptotically equivalent expressions for this test statistic:

    $$LM = T \cdot e'E\,[E'E - E'X(X'X)^{-1}X'E]^{-1}E'e \,/\, e'e, \qquad (2)$$

    TABLE I

    List of tests for serial correlation of order n, their null distributions, some characteristics and references

    Test        Formula    Null                 Characteristics                   References
    statistic   in text    distribution
    LM          (2), (3)   $\chi^2_n$           can be expressed as $T \cdot R^2$   Breusch (1978, (31)); Godfrey (1978, (16))
    LMC         (4)        $\chi^2_n$
    LMF         (5)        $F_{n,T-K-n}$                                          Durbin (1970, p. 420); Harvey (1981, p. 227)
    LMR         (6)        $\chi^2_n$           LMR > LMC
    LMW         (7)        $\chi^2_n$           LMW ≥ LM; LMW > LMF
    LMP         (8)        $\chi^2_n$           can be negative
    LMN         (9)        $\chi^2_n$           can be negative
    LML         (10)       $\chi^2_n$           can be negative                   Breusch (1978, (23))
    LMD         (11)       $\chi^2_n$           can be negative                   Durbin (1970, (11))
    LM*                    $\chi^2_n$           LM* < LM
    LMW*                   $\chi^2_n$           LMF < LMW* < LMW
    LMP*                   $\chi^2_n$           LMP* ...
    ... BP ...                                                                    Ljung and Box (1978)

    where $T$ is the number of sample observations, $e' = (e_1, \dots, e_T)$ is the T-element vector of residuals from the OLS regression of (1), and $X$ is the $T \times K$ matrix of all the model's explanatory variables. The $T \times n$ matrix $E = [e^1, \dots, e^n]$ is constructed from the T-element vectors $(e^i)' = (0, \dots, 0, e_1, \dots, e_{T-i})$, which include the residuals lagged $i$ periods. Under the null hypothesis of white-noise disturbances $\varepsilon$, LM is asymptotically $\chi^2_n$ distributed, even if $X$ includes lagged dependent or redundant regressors.

    We can show (see Breusch (1978), p. 354) that LM can be expressed as $T$ times the coefficient of determination $R^2$ in the auxiliary regression of $e$ on the $T \times (K + n)$ matrix $[X : E]$:

    $$LM = T \cdot (RSS_{[X]} - RSS_{[X:E]}) \,/\, RSS_{[X]}, \qquad (3)$$

    where $RSS_{[X:E]}$ is the sum of squared residuals from this artificial regression, and $RSS_{[X]} = e'e$. So LM can be considered as a specification test applied to this artificial regression. We also examine the performance of the Edgeworth corrected test statistic discussed by Anderson and used by Evans and Savin (1982, p. 742)

    $$LMC = (T - K - n - 1 + \tfrac{1}{2}n)\,\ln (RSS_{[X]} / RSS_{[X:E]}), \qquad (4)$$

    and the usual F test

    $$LMF = \frac{T-K-n}{n} \cdot (RSS_{[X]} - RSS_{[X:E]}) \,/\, RSS_{[X:E]}. \qquad (5)$$

    Crude versions (incorporating no small sample corrections) of LMC and LMF respectively are the likelihood-ratio statistic

    $$LMR = T \cdot \ln (RSS_{[X]} / RSS_{[X:E]}), \qquad (6)$$

    and the Wald-type statistic

    $$LMW = T \cdot (RSS_{[X]} - RSS_{[X:E]}) \,/\, RSS_{[X:E]}. \qquad (7)$$
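The statistics in (3) to (7) all depend on the data only through the two residual sums of squares, so they can be computed from one auxiliary regression. The following is a minimal sketch assuming NumPy; the function names and the simulated example data (including the coefficient values in `make_example`) are hypothetical, and the factor in LMC is taken to be T - K - n/2 - 1, which is how the correction in (4) is read here. `lm_direct` evaluates the quadratic form (2) so that its agreement with (3) can be checked numerically.

```python
import numpy as np


def ols_residuals(y, X):
    """Residual vector from the OLS regression of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef


def lagged_residuals(e, n):
    """T x n matrix E: column i holds e lagged i periods, with i leading zeros."""
    T = len(e)
    E = np.zeros((T, n))
    for i in range(1, n + 1):
        E[i:, i - 1] = e[:T - i]
    return E


def lm_statistics(y, X, n):
    """LM, LMC, LMF, LMR and LMW from the auxiliary regression of e on [X : E]."""
    T, K = X.shape
    e = ols_residuals(y, X)
    E = lagged_residuals(e, n)
    rss_x = float(e @ e)
    u = ols_residuals(e, np.hstack([X, E]))
    rss_xe = float(u @ u)
    log_ratio = np.log(rss_x / rss_xe)
    gain = (rss_x - rss_xe) / rss_xe
    return {
        "LM": T * (rss_x - rss_xe) / rss_x,        # (3)
        "LMC": (T - K - n / 2 - 1) * log_ratio,    # (4), as read here
        "LMF": (T - K - n) / n * gain,             # (5)
        "LMR": T * log_ratio,                      # (6)
        "LMW": T * gain,                           # (7)
    }


def lm_direct(y, X, n):
    """LM evaluated from the quadratic form in (2)."""
    T = X.shape[0]
    e = ols_residuals(y, X)
    E = lagged_residuals(e, n)
    XtE = X.T @ E
    middle = E.T @ E - XtE.T @ np.linalg.solve(X.T @ X, XtE)
    Ee = E.T @ e
    return float(T * Ee @ np.linalg.solve(middle, Ee) / (e @ e))


def make_example(T=40, seed=0):
    """Illustrative data only: a dynamic regression with one lagged dependent variable."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=T + 1)
    y = np.zeros(T + 1)
    for t in range(1, T + 1):
        y[t] = 0.5 * y[t - 1] + 0.5 * x[t] + 0.3 * x[t - 1] + rng.normal()
    X = np.column_stack([np.ones(T), y[:-1], x[1:], x[:-1]])
    return y[1:], X
```

Calling `lm_statistics(*make_example(), n=4)` returns all five statistics for one simulated sample; comparing each with its own critical value is left to the caller.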

    We also examine the conjecture (see Breusch and Godfrey (1981, p. 74) and Godfrey (1978, p. 1300)) that the finite sample performance of tests may be improved by omitting asymptotically negligible terms. Since under $H_0$, $\mathrm{plim}\,(1/T)E'E = \sigma^2 I_n$ and $\mathrm{plim}\,E'E/e'e = I_n$, a test statistic asymptotically equivalent to LM is

    $$LMP = T \cdot \frac{e'E}{e'e}\left[I_n - \frac{E'X(X'X)^{-1}X'E}{e'e}\right]^{-1}\frac{E'e}{e'e}. \qquad (8)$$

    Omitting the asymptotically negligible covariance between lagged disturbances and exogenous regressors in the LMP statistic yields

    $$LMN = T \cdot \frac{e'E}{e'e}\left[I_n - \frac{E'[X_0 : 0](X'X)^{-1}[X_0 : 0]'E}{e'e}\right]^{-1}\frac{E'e}{e'e}, \qquad (9)$$

    where the $T \times K$ matrix $[X_0 : 0]$ contains the $T \times m_0$ matrix $X_0$ of lagged dependent regressors. Yet another asymptotically equivalent test is obtained (see Breusch 1978, formula (23)) by replacing the matrix $E'X_0/e'e$ in LMN by a consistent estimate $\hat Q_0$ of $\mathrm{plim}\,E'X_0/e'e$ under $H_0$. Then we obtain

    $$LML = T \cdot \frac{e'E}{e'e}\left[I_n - e'e \cdot \hat Q(X'X)^{-1}\hat Q'\right]^{-1}\frac{E'e}{e'e}, \qquad (10)$$

    where $\hat Q = [\hat Q_0 : 0]$ is an $n \times K$ matrix. The LML statistic is very closely related to Durbin (1970)'s test against AR(n) disturbances which can be written as

    $$LMD = T \cdot e'E(E'E)^{-1}\left[I_n - e'e \cdot \hat Q(X'X)^{-1}\hat Q'\right]^{-1}(E'E)^{-1}E'e. \qquad (11)$$

    Since $\mathrm{plim}\,e'e\,[E'E]^{-1} = I_n$ under $H_0$, LMD is asymptotically equivalent to all test statistics mentioned earlier. When $n = 1$, $e'e \ge (e^1)'e^1$, so we have $LML \le LMD$.

    Note that the LMP, LMN, LML and LMD statistics, although asymptotically $\chi^2$ distributed under $H_0$, can all be negative in small samples, because the matrices in square brackets are not necessarily positive definite. However, as the $\chi^2$ distribution only approximates the unknown exact distribution, negative values could be interpreted as insignificant values, but the small sample properties of such a testing procedure would need to be investigated. In fact, for the $n = 1$ case (where the square root of LMD is well-known as Durbin's h statistic) simulations in Hendry and Trivedi (1972) and Spencer (1975) found both rather frequent negative values and a substantial difference between the actual and nominal sizes of the h-test. This suggests that in small samples the distribution of LMD may differ substantially from its asymptotic distribution.

    We investigate the impact of multiplying each of the statistics LM, LMW, LMP, LMN, LML and LMD by $(T - K)/T$. These degrees of freedom corrected versions, where $\sigma^2$ in the denominator is estimated by $e'e/(T - K)$ rather than $e'e/T$, are superscripted * (see Table I). From the inequalities

    $$LM^* = \frac{T-K}{T}\,LM \;\dots$$

    ... are the statistic

    $$BP = T \cdot r'r = T \cdot \frac{e'E}{e'e} \cdot \frac{E'e}{e'e} \qquad (15)$$

    (see Box and Pierce (1970) and Pierce (1971)) and the modification suggested in Ljung and Box (1978)

    $$LBP = T \sum_{i=1}^{n} \frac{T+2}{T-i}\, r_i^2 > BP, \qquad (16)$$

    where $r' = (r_1, \dots, r_n) = e'E/e'e$ is an n-element vector of autocorrelation coefficients of the residual vector, with $r_i = e'e^i/e'e$. BP can be obtained by neglecting the second term within the square brackets of the formulae of LMP, LMN or LML. Only when $m_0 = 0$ (i.e. (1) contains no lagged dependent variables) does BP = LMN = LML hold; if $m_0 > 0$ then BP is not asymptotically equivalent to any of the Lagrange multiplier type test statistics. Following Pierce (1971), BP is usually treated as a $\chi^2_{n - \tilde m_0}$ statistic in an autoregressive model, where $\tilde m_0$ is the number of lagged dependent variables. Note that $\tilde m_0 \le m_0$ with $\tilde m_0 < m_0$ only if $\alpha^{(0)}(L)$ in (1) contains zero coefficients; obviously this test can only be applied when $n > \tilde m_0$. Breusch and Pagan (1980, p. 245) argue that the BP test is inappropriate in models containing both exogenous and lagged dependent variables. However, we will compare its small sample performance with that of the other tests for serial independence discussed above.
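The portmanteau statistics (15) and (16) need only the residual autocorrelations r_i. The sketch below, in plain Python with hypothetical function names, computes both from a residual sequence; the choice of reference distribution is deliberately left to the caller.

```python
def autocorrelations(e, n):
    """r_i = e'e^i / e'e for i = 1..n, where e^i is e lagged i periods with leading zeros."""
    ee = sum(v * v for v in e)
    return [sum(e[t] * e[t - i] for t in range(i, len(e))) / ee
            for i in range(1, n + 1)]


def box_pierce(e, n):
    """BP = T * r'r, as in (15)."""
    T = len(e)
    return T * sum(r * r for r in autocorrelations(e, n))


def ljung_box(e, n):
    """LBP = T * sum_i (T + 2) / (T - i) * r_i^2, as in (16)."""
    T = len(e)
    r = autocorrelations(e, n)
    return T * sum((T + 2) / (T - i) * r[i - 1] ** 2 for i in range(1, n + 1))


residuals = [1.0, -1.0, 1.0, -1.0]   # toy residuals, for illustration only
print(box_pierce(residuals, 1))      # 2.25
print(ljung_box(residuals, 1))       # 4.5
```

Since every weight (T + 2)/(T - i) exceeds one, LBP is larger than BP whenever any r_i is nonzero, which is the inequality stated in (16).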

    4. POST-SAMPLE PREDICTIVE TESTS

    The tests used in a specification search are all employed on the same set of data. So although it may be possible to determine overall significance levels of sequences of such tests (see inter alia Mizon (1977a) and Pagan and Hall (1983)), the perils of data mining are far from imaginary. Thus it is wise to check a model on a fresh set of data (typically the most recent time series data) by a post-sample prediction test. Prediction tests check whether the model specification derived from the sample data also fits the post-sample data relatively well. However, using this additional data at every stage of such a search means that the test statistic is being used as a model selection criterion. Note that post-sample predictive failure can occur for two reasons: the model may be inadequate within sample, or it may be misspecified for only the post-sample period. Table II lists the tests investigated.

    Chow (1960)'s test in the classical linear regression model is

    $$LRF = \frac{T-K}{m} \cdot \frac{RSS_{T+m} - RSS_T}{RSS_T} \qquad (17)$$

    TABLE II

    List of m period post-sample prediction tests, their null distributions, some characteristics and references

    Test        Formula            Null            Characteristics    References
    statistic   in text            distribution
    LRF         (17), (20), (21)   $F_{m,T-K}$                        Chow (1960)
    LR          (18)               $\chi^2_m$
    LRC         (19)               $\chi^2_m$      LRC < LR
    PR          (22)               $\chi^2_m$
    PR*         (23)               $\chi^2_m$      PR > PR* > LRF     Hendry (1980, p. 222)

    where $RSS_{T+m}$ and $RSS_T = e'e$ are the sums of squared residuals of the model estimated from $T + m$ and $T$ observations respectively. If all the regressors are exogenous, LRF has an $F_{m,T-K}$ distribution under the null hypothesis $H_0$ of constant parameters over the entire sample. Note that this test can be applied for any positive value of $m$, and so should not be considered as merely a substitute for the usual test of structural change between two sub-samples with $m < K$ (see Wilson (1978)). It is often the case that the prediction test is used without a particular constructive alternative hypothesis in mind, i.e. it is used as a misspecification test, whereas the structural change test is usually employed with the hypothesis of coefficient non-constancy (with constant error variance) as a constructive alternative. Anderson and Mizon (1984) provide a recent discussion of these tests.

    The alternative hypothesis for the LRF test can be represented by

    $$\begin{pmatrix} y \\ y_* \end{pmatrix} = \begin{bmatrix} X & 0 \\ X_* & Z \end{bmatrix} \begin{pmatrix} \beta \\ \gamma \end{pmatrix} + \begin{pmatrix} \varepsilon \\ \varepsilon_* \end{pmatrix} \quad \text{with} \quad \begin{pmatrix} \varepsilon \\ \varepsilon_* \end{pmatrix} \sim N(0, \sigma^2 I_{T+m}),$$

    where $y_*$ and $\varepsilon_*$ are m-element vectors, $X_*$ is the $m \times K$ matrix of post-sample regressors, and $Z$ is an arbitrary $m \times m$ non-singular matrix (possibly a matrix $I_m$ of $m$ dummy variables). The straightforward F-test of the linear restrictions $\gamma = 0$ leads to the LRF statistic in (17). Hence we see that this test, like the serial correlation test, can also be viewed as a significance test in a classical regression framework. Again we examine modifying the likelihood-ratio test for $\gamma = 0$

    $$LR = (T + m)\,\ln (RSS_{T+m} / RSS_T) \qquad (18)$$

    by the Anderson correction factor, to give

    $$LRC = (T + \tfrac{1}{2}m - K - 1)\,\ln (RSS_{T+m} / RSS_T). \qquad (19)$$

    Both LR and LRC are asymptotically distributed as $\chi^2_m$ under $H_0$. The corrected test has a smaller probability of type I error than LR, but no such inequality exists between LR and LRF.

    Two alternative expressions for LRF are

    $$LRF = \frac{T-K}{m} \cdot \frac{(y_* - \hat y_*)'[I_m + X_*(X'X)^{-1}X_*']^{-1}(y_* - \hat y_*)}{RSS_T} \qquad (20)$$

    $$\phantom{LRF} = \frac{T-K}{m} \cdot \frac{(y_* - \hat y_*)'[I_m - X_*(X'X + X_*'X_*)^{-1}X_*'](y_* - \hat y_*)}{RSS_T} \qquad (21)$$

    where $\hat y_* = X_*(X'X)^{-1}X'y$ is the predicted value of $y_*$. From (21) we see that

    $$PR = T \cdot (y_* - \hat y_*)'(y_* - \hat y_*) / RSS_T = T \cdot PSS_m / RSS_T \qquad (22)$$

    (where $PSS_m$ is the sum of $m$ squared prediction errors) is asymptotically $\chi^2_m$ distributed under $H_0$, and is asymptotically equivalent to LRF, since for finite $m$ and uniformly bounded regressors, so that as $T \to \infty$ $\mathrm{plim}\,(1/T)X_*'X_* = 0$ and $\mathrm{plim}\,X_*'\varepsilon_*/\sqrt{T} = 0$, we have:

    $$\mathrm{plim}\,(y_* - \hat y_*)'X_*(X'X + X_*'X_*)^{-1}X_*'(y_* - \hat y_*) = 0.$$

    Davidson et al. (1978) and Hendry (1980) use a degrees of freedom corrected version of PR

    $$PR^* = (T - K) \cdot PSS_m / RSS_T. \qquad (23)$$

    From (21), (22) and (23) it is clear that these statistics are concerned with predictive failure, and that $PR > PR^* > m \cdot LRF$; with (13) this implies the inequalities:

    $$P\{PR > \chi^2_m(\alpha)\} > P\{PR^* > \chi^2_m(\alpha)\} > P\{m \cdot LRF > \chi^2_m(\alpha)\} = P\{LRF > \tfrac{1}{m}\chi^2_m(\alpha)\} > P\{LRF > F_{m,T-K}(\alpha)\}. \qquad (24)$$

    So PR has a higher rejection frequency than PR*, which in turn is higher than that for LRF, irrespective of the correctness of the model. Again we see that alternative asymptotically equivalent tests may in small samples yield systematically conflicting inference for a chosen nominal size. Such conflict emerges at the 5% level in Hendry (1980, equation (11)), Mizon and Hendry (1980, Table II) and in Davidson and Hendry (1981, equation (13) and Tables 4b and 4c).
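The prediction statistics of this section can be computed from the sample fit and the m post-sample prediction errors. The sketch below assumes NumPy; the function names and the simulated data in `example_split` are hypothetical. LRF is computed both from the two residual sums of squares as in (17) and from the prediction-error form (20), so the two expressions can be compared, and the ordering PR > PR* > m LRF can be checked on any data set.

```python
import numpy as np


def prediction_tests(y, X, y_post, X_post):
    """LRF, LR, LRC, PR and PR* for m post-sample observations."""
    T, K = X.shape
    m = len(y_post)

    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    rss_T = float(e @ e)                       # RSS from the first T observations

    X_all = np.vstack([X, X_post])
    y_all = np.concatenate([y, y_post])
    b_all, *_ = np.linalg.lstsq(X_all, y_all, rcond=None)
    u = y_all - X_all @ b_all
    rss_Tm = float(u @ u)                      # RSS from all T + m observations

    d = y_post - X_post @ b                    # post-sample prediction errors
    pss = float(d @ d)
    log_ratio = np.log(rss_Tm / rss_T)

    # Form (20): quadratic form in the prediction errors.
    A = np.eye(m) + X_post @ np.linalg.solve(X.T @ X, X_post.T)
    lrf_20 = (T - K) / m * float(d @ np.linalg.solve(A, d)) / rss_T

    return {
        "LRF": (T - K) / m * (rss_Tm - rss_T) / rss_T,   # (17)
        "LRF_20": lrf_20,                                # (20)
        "LR": (T + m) * log_ratio,                       # (18)
        "LRC": (T + m / 2 - K - 1) * log_ratio,          # (19)
        "PR": T * pss / rss_T,                           # (22)
        "PR*": (T - K) * pss / rss_T,                    # (23)
    }


def example_split(T=40, m=8, seed=0):
    """Illustrative data only: a static regression split into sample and post-sample."""
    rng = np.random.default_rng(seed)
    X_all = np.column_stack([np.ones(T + m), rng.normal(size=(T + m, 2))])
    y_all = X_all @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=T + m)
    return y_all[:T], X_all[:T], y_all[T:], X_all[T:]
```

The example data use exogenous regressors only; in the dynamic models studied here the regressor matrices would also contain lagged values of the dependent variable.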

    5. THE MONTE CARLO DESIGN

    In the Monte Carlo experiments the actual data generation process was the AD(1, 1) model

    $$y_t = \gamma y_{t-1} + \beta_0 x_t + \beta_1 x_{t-1} + \varepsilon_t \quad \text{with } |\gamma| < 1 \text{ and } \varepsilon_t \sim IIN(0, \sigma^2_\varepsilon). \qquad (25)$$

    We calculated the misspecification tests of interest from four estimated regression equations (each of which included a constant term): the correct AD(1, 1) model; the overparameterised AD(2, 2) specification; the misspecified AD(0, 0) model; and the AD(1) model. For the first two regressions, the empirical rejection frequencies estimate the tests' true significance levels and are used to assess sensitivity to redundant regressors. For the two remaining regressions, the rejection frequencies estimate the power of the tests against two particular types of dynamic misspecification. Note that the AD(0, 0) model excludes any dynamics; the AD(1) model specifies a parsimonious univariate AR(1) time-series process. These experiments do not cover all conceivable circumstances, but in our view tests that fail in these simple cases should no longer be used in models containing lagged dependent variables.

    For (25) two alternative processes to generate $x_t$ were investigated. Given

    $$x_t = \lambda x_{t-1} + \mu + \xi_t, \qquad (26)$$

    where $\xi_t \sim IIN(0, \sigma^2_\xi)$ and $\{\varepsilon_t\}$, $\{\xi_t\}$ are mutually independent, then (i) $|\lambda| < 1$ and $\mu = 0$ generates a stationary AR(1) process for $x_t$, while (ii) $\lambda = 1$ generates a (nonstationary) random walk with drift parameter $\mu$. Appendix A shows how to obtain starting values for the $x_t$ and $y_t$ series in a computationally efficient and statistically satisfactory way.
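A data generation process of this kind can be simulated in a few lines. The sketch below is an assumption-laden illustration, not the scheme of Appendix A: starting values are handled with a simple optional burn-in (or a fixed start for the random walk case), the symbols lambda, mu and xi follow the reading of (26) used here, and the function name, argument names and seed handling are hypothetical. The two `sigma` arguments are standard deviations and must be supplied by the caller.

```python
import random


def simulate_ad11(gamma, beta0, beta1, lam, mu, sigma_eps, sigma_xi,
                  n_obs, x_start=0.0, y_start=0.0, burn_in=0, seed=0):
    """Simulate y(t) = gamma*y(t-1) + beta0*x(t) + beta1*x(t-1) + eps(t),
    with x(t) = lam*x(t-1) + mu + xi(t) and independent normal disturbances."""
    rng = random.Random(seed)
    x_prev, y_prev = x_start, y_start
    xs, ys = [], []
    for t in range(burn_in + n_obs):
        x = lam * x_prev + mu + rng.gauss(0.0, sigma_xi)
        y = gamma * y_prev + beta0 * x + beta1 * x_prev + rng.gauss(0.0, sigma_eps)
        if t >= burn_in:          # discard the burn-in draws (assumed start-up scheme)
            xs.append(x)
            ys.append(y)
        x_prev, y_prev = x, y
    return xs, ys


# Stationary x: |lam| < 1 and mu = 0, with a burn-in to reduce start-up effects.
x_stat, y_stat = simulate_ad11(0.5, 0.1, 0.4, 0.8, 0.0, 1.0, 1.0,
                               n_obs=100, burn_in=200)

# Random walk with drift: lam = 1, started from a fixed value, no burn-in.
x_rw, y_rw = simulate_ad11(0.5, 0.1, 0.4, 1.0, 0.02, 1.0, 1.0,
                           n_obs=100, x_start=1.0)
```

The coefficient and variance values in the two calls are placeholders chosen only to make the example run.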

    The Monte Carlo design fixes the values of 12 parameters: $\{\gamma, \beta_0, \beta_1, \lambda, \mu, x_{-2}, \sigma^2_\varepsilon, \sigma^2_\xi, T, m, n, \alpha\}$. We chose the sample size $T$, the post-sample size $m$, the order of serial correlation $n$, and the nominal significance level $\alpha$ by the grid $T \in \{20, 40, 80\}$, $m \in \{4, 8, 20\}$, $n \in \{1, 4, 8\}$, $\alpha \in \{0.01, 0.05, 0.10\}$. The values of the coefficients $\gamma$, $\beta_0$ and $\beta_1$ determine the systematic dynamics of the model and we selected values we believe are relevant for econometric practice. We first chose the true coefficients of the lagged dependent variable as $\gamma \in \{0.5, 0.9\}$. Given $\gamma$, we then chose $\beta_0$ and $\beta_1$ to obtain values of the total multiplier (the long-run effect of a unit increase in $x$ on $y$) as

    $$TMP = (\beta_0 + \beta_1)/(1 - \gamma) \in \{0.2, 1.0, 5.0\},$$

    and values of the immediate standardized impact multiplier (the proportion of the total multiplier realised instantaneously) as

    $$SIM_0 = \beta_0 / TMP \in \{0.2, 0.5, 0.8\}.$$

    The sets for $\gamma$, TMP and $SIM_0$ define a grid of 18 combinations of coefficient values, which are detailed fully in Appendix B. The values of the parameters in the generating process (26) for the explanatory variable were fixed as follows. In the stationary case we set $\lambda = 0.8$; in the non-stationary case we fixed $\mu = 0.02$ and the starting value $x_{-2} = 1$. In all cases we fixed $\sigma^2 = 1$, and after some initial experimentation chose $\sigma^2 = 3$, to give reasonable values for the coefficient of determination $R^2$ in the AD(1, 1) specification (see Appendix B).
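The 18 coefficient combinations follow mechanically from the definitions of TMP and SIM_0: beta_0 = SIM_0 * TMP and beta_1 = TMP * (1 - gamma) - beta_0. The sketch below enumerates them under that reading; the names are hypothetical, and the resulting values are derived here from the definitions rather than copied from Appendix B.

```python
from itertools import product

GAMMAS = (0.5, 0.9)       # coefficient of the lagged dependent variable
TMPS = (0.2, 1.0, 5.0)    # total multiplier
SIM0S = (0.2, 0.5, 0.8)   # immediate standardized impact multiplier


def coefficient_grid():
    """All (gamma, beta0, beta1) triples implied by the three parameter sets."""
    grid = []
    for gamma, tmp, sim0 in product(GAMMAS, TMPS, SIM0S):
        beta0 = sim0 * tmp                      # SIM0 = beta0 / TMP
        beta1 = tmp * (1.0 - gamma) - beta0     # TMP = (beta0 + beta1) / (1 - gamma)
        grid.append({"gamma": gamma, "TMP": tmp, "SIM0": sim0,
                     "beta0": beta0, "beta1": beta1})
    return grid


grid = coefficient_grid()
print(len(grid))  # 18
```

Note that beta_1 can be negative on this grid, for example when gamma = 0.9 and SIM_0 exceeds 1 - gamma.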

    What number of replications $M$ would be adequate for the present purpose? The probability $p$ that a test statistic leads to rejection of the null hypothesis is estimated by the corresponding rejection frequency $\hat p$ in $M$ replications of each Monte Carlo experiment. The variance of $\hat p$ will be $p(1 - p)/M$, and a 95% confidence interval for $p$ will be approximately (for large $M$)

    $$\left[\hat p - 2\sqrt{\hat p(1 - \hat p)/M},\; \hat p + 2\sqrt{\hat p(1 - \hat p)/M}\right]. \qquad (27)$$

    This implies that $M$ should be around 10,000 (which is prohibitive) if we want to estimate any $p \in [0, 1]$ by a 95% confidence interval no wider than $\hat p \pm 0.01$. Our results are obtained from merely 500 replications. Because this choice of $M$ implies a relatively large confidence interval for $p$ near 0.01, we will not be able to analyse satisfactorily the differences between actual and nominal sizes at $\alpha = 0.01$ and so report no details for this case. However, our results support the conjecture that any considerable difference between a test statistic's small sample and asymptotic distributions usually affects the entire right-hand tail area in the same direction, so it would be rash to suppose that a test statistic that performs badly at the 5% and 10% levels might perform better at the 1% level.
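The arithmetic behind the choice of M can be reproduced directly from (27). The sketch below, with hypothetical function names, evaluates the approximate interval for a given rejection frequency and the number of replications needed for a given half-width at the worst case p = 0.5.

```python
import math


def rejection_interval(p_hat, M):
    """Approximate 95% interval (27): p_hat +/- 2 * sqrt(p_hat * (1 - p_hat) / M)."""
    half = 2.0 * math.sqrt(p_hat * (1.0 - p_hat) / M)
    return p_hat - half, p_hat + half


def replications_needed(half_width, p=0.5):
    """Smallest M with 2 * sqrt(p * (1 - p) / M) <= half_width.

    The rounding step only guards against floating point noise before taking the ceiling.
    """
    return math.ceil(round(4.0 * p * (1.0 - p) / half_width ** 2, 6))


print(replications_needed(0.01))      # 10000
print(rejection_interval(0.05, 500))  # roughly (0.0305, 0.0695)
```

With M = 500 the interval around a true 5% size therefore spans roughly 3% to 7%, which gives some context for the tolerance bands adopted when judging the estimated sizes.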

    Ideally, the observed significance level of the tests examined in the next two sections should be close to the nominal level, regardless of the values of the parameters; failure at any parameter set is sufficient to disqualify a test for practical purposes, as we wish to use it when the parameters are either unknown (i.e. coefficients) or uncontrolled (e.g. the regressors and sample size). Our results for type I errors are reported using simple summary statistics (minimum, mean and maximum) over all 18 coefficient combinations of either the stationary or nonstationary $x_t$ series, or over all 36 different DGP's considered. The precision determined by the confidence interval (27) leads us to suggest that a test fails the criterion of correct size if the estimated significance level exceeds 0.09 or is below 0.02 at the 5% nominal level, or if it exceeds 0.15 or is below 0.06 at the 10% nominal level. Of course, power values deserve a more detailed presentation since they will vary over the different parameter sets.

    Finally, note that different series of random numbers $\{\varepsilon_t\}$, $\{\xi_t\}$ were generated for each of the 36 different DGPs defined by the 18 coefficient combinations and $x_t$ either stationary or non-stationary. In each replication of a DGP one hundred observations on the relevant explanatory and dependent variables have been generated, and these data were used in all the experiments for the various different values of T, n, m and $\alpha$.

    6. RESULTS FOR TESTS FOR SERIAL CORRELATION

    When the model with the correct AD(1, 1) specification is estimated, the test rejection frequencies estimate the actual significance level at a given critical value of the asymptotic distribution. We first discuss the results for the inadequate but popular Durbin-Watson and Box-Pierce type tests. Note that in all the Tables rejection frequencies are expressed as a proportion of the 500 replications.


    TABLE III

    Rejection frequencies of the Durbin-Watson bounds and "asymptotic" tests at the nominal 5% level in the correctly specified AD(1, 1) model over the 36 combinations in the Monte Carlo experiments

                          d_L                     d_U                "asymptotic" d
    Sample size    min    mean   max       min    mean   max       min    mean   max
    T = 20         0.000  0.002  0.008     0.104  0.201  0.322     0.004  0.024  0.046
    T = 40         0.000  0.007  0.018     0.046  0.110  0.204     0.006  0.028  0.058
    T = 80         0.000  0.013  0.028     0.018  0.074  0.124     0.000  0.033  0.062

    Table III presents rejection frequencies, summarised (minimum, mean and maximum) over all 36 combinations of coefficients and exogenous variables, for the Durbin-Watson test statistic DW at the tabulated 5% critical lower and upper bounds $d_L$ and $d_U$. We see that the actual significance level is always very low if $d_L$ is used, but becomes very irregular and generally too high if the inconclusive region is added to the critical region by using $d_U$ as the critical value. If we avoid the inconvenience of the bounding critical values and use $T^{1/2}(1 - DW/2)$ as an approximately asymptotically normal test for serial correlation then we obtain uncalibrated statistical inference: in general the actual significance level is too low, but is sometimes fortuitously correct.
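As an illustration of the "asymptotic" variant, the following sketch computes DW from a residual series and then $T^{1/2}(1 - DW/2)$. It uses the standard textbook definition of DW; the function names are illustrative, and the one-sided comparison with a standard normal critical value (1.645 at the 5% level) is an assumption about how the statistic would be used.

```python
import math

def durbin_watson(residuals):
    """Standard Durbin-Watson statistic: sum of squared first differences
    of the residuals divided by their sum of squares."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

def dw_asymptotic_statistic(residuals):
    """T^(1/2) * (1 - DW/2), treated as approximately N(0, 1) under the null."""
    T = len(residuals)
    return math.sqrt(T) * (1.0 - durbin_watson(residuals) / 2.0)

# Toy residual series with strong negative first-order correlation.
e = [1.0, -1.0, 1.0, -1.0]
print(durbin_watson(e), dw_asymptotic_statistic(e))
```

Since $1 - DW/2$ is approximately the first-order residual autocorrelation, positive serial correlation pushes the statistic into the upper tail.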

    If we apply the DW test to the overparameterised AD(2, 2) model (using $d_U$ and $d_L$ values for K = 6) the significance level is nil for $d_L$, varies wildly with T (we found values from 0.00 to 0.56) for $d_U$, and is very small for the asymptotic test. These results confirm that in the model with lagged dependent variables the DW statistic cannot give valid evidence about autocorrelated disturbances; therefore it is best not calculated at all.

    We investigated both BP and LBP variants of portmanteau residual correlation tests, but as the AD(1, 1) model includes one lagged dependent variable we have mo = imo = 1 and so the tests cannot be performed for serial correlation of order n = 1. Therefore we considered $n \in \{2, 4, 8\}$ for these two tests; the statistics are then compared with critical values of the $\chi^2$ distribution with 1, 3 and 7 degrees of freedom respectively. As with the DW test there is little difference between the stationary and the non-stationary cases. Table IV shows rather high rejection frequencies (even with T = 80) especially for low values of n. Because of this and the wide gap between the minimum and maximum rejection frequencies over the 18 different coefficient combinations, these tests cannot be recommended. The unstable true significance levels-anything from half to three times

    TABLE IV

    Rejection frequencies of the Box-Pierce and the Ljung-Box test in the correctly specified AD(1, 1) model

                    Stationary data, 5% level                 Non-stationary data, 10% level
                    BP                  LBP                   BP                  LBP
    T     n     min   mean  max     min   mean  max       min   mean  max     min   mean  max
    20    2     0.04  0.08  0.15    0.06  0.12  0.21      0.10  0.19  0.33    0.14  0.25  0.42
          4     0.03  0.04  0.07    0.06  0.10  0.14      0.05  0.10  0.14    0.12  0.18  0.27
          8     0.01  0.02  0.05    0.06  0.09  0.14      0.02  0.05  0.08    0.12  0.17  0.25
    40    2     0.03  0.08  0.14    0.04  0.10  0.17      0.10  0.19  0.31    0.12  0.22  0.34
          4     0.02  0.05  0.09    0.04  0.07  0.12      0.08  0.12  0.19    0.10  0.17  0.25
          8     0.02  0.04  0.07    0.05  0.08  0.12      0.05  0.08  0.12    0.11  0.16  0.20
    80    2     0.04  0.09  0.16    0.04  0.10  0.18      0.09  0.20  0.28    0.10  0.21  0.30
          4     0.03  0.06  0.10    0.04  0.07  0.11      0.08  0.13  0.20    0.10  0.16  0.23
          8     0.02  0.05  0.08    0.04  0.07  0.11      0.08  0.11  0.16    0.10  0.14  0.21


    TABLE V

    Rejection frequencies of some of the Lagrange multiplier type tests for serial correlation at the nominal 5% level in the AD(1, 1) and AD(2, 2) models, averaged over the 18 coefficient combinations (stationary data, $\alpha = 0.05$)

                 AD(1, 1)                                                     AD(2, 2)
    T    n    LM    LM*   LMW*  LMP*  LMN*  LML*  LMD*  LMC   LMF      LM    LMP*  LMN*  LML*  LMC   LMF
    20   1    0.08  0.05  0.08  0.04  0.05  0.07  0.08  0.05  0.05     0.12  0.03  0.06  0.10  0.06  0.06
         4    0.09  0.02  0.21  0.04  0.04  0.04  0.16  0.06  0.06     0.15  0.06  0.04  0.05  0.06  0.06
         8    0.04  0.00  0.50  0.05  0.03  0.02  0.33  0.06  0.04     0.13  0.05  0.03  0.03  0.07  0.05
    40   1    0.06  0.05  0.06  0.04  0.05  0.05  0.06  0.05  0.05     0.07  0.03  0.06  0.10  0.05  0.05
         4    0.06  0.03  0.09  0.05  0.05  0.05  0.11  0.04  0.04     0.08  0.07  0.05  0.07  0.05  0.05
         8    0.04  0.02  0.17  0.05  0.04  0.04  0.18  0.04  0.04     0.07  0.05  0.04  0.05  0.04  0.04
    80   1    0.05  0.05  0.05  0.05  0.05  0.05  0.05  0.05  0.05     0.06  0.03  0.06  0.10  0.05  0.05
         4    0.05  0.04  0.07  0.05  0.06  0.05  0.08  0.04  0.05     0.06  0.07  0.04  0.07  0.05  0.05
         8    0.04  0.03  0.09  0.05  0.05  0.05  0.12  0.04  0.04     0.05  0.06  0.04  0.06  0.04  0.04


    the nominal level-preclude effective use of BP or LBP in a model selection strategy. This conclusion is confirmed in the overparameterised AD(2, 2) model and when the LBP test is used with the $\chi^2_n$ (rather than $\chi^2_{n-m_0}$) critical value. In the AD(1) model with n = 20 Davies, Triggs and Newbold (1977) find significance levels for BP considerably less than those predicted by asymptotic theory, especially for small T and $\gamma$. Our results indicate the opposite finding for low values of n in models with exogenous regressors.
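For reference, the two portmanteau statistics can be sketched as follows. The code uses the standard textbook definitions, $BP = T\sum_{j=1}^{n} r_j^2$ and $LBP = T(T+2)\sum_{j=1}^{n} r_j^2/(T-j)$, which may differ in detail from the exact variants used in the experiments; the residuals are assumed to have zero mean, and all function names are illustrative.

```python
def residual_autocorrelations(residuals, n):
    """Sample autocorrelations r_1, ..., r_n of a zero-mean residual series."""
    T = len(residuals)
    denom = sum(e * e for e in residuals)
    return [sum(residuals[t] * residuals[t - j] for t in range(j, T)) / denom
            for j in range(1, n + 1)]

def box_pierce(residuals, n):
    """Box-Pierce statistic: T times the sum of squared autocorrelations."""
    T = len(residuals)
    return T * sum(r * r for r in residual_autocorrelations(residuals, n))

def ljung_box(residuals, n):
    """Ljung-Box statistic: each squared autocorrelation is weighted by
    (T + 2) / (T - j), which inflates the contribution of higher lags."""
    T = len(residuals)
    r = residual_autocorrelations(residuals, n)
    return T * (T + 2) * sum(r[j - 1] ** 2 / (T - j) for j in range(1, n + 1))

e = [1.0, -1.0, 1.0, -1.0]
print(box_pierce(e, 2), ljung_box(e, 2))
```

Because every weight $(T+2)/(T-j)$ exceeds one, LBP is never smaller than BP for the same residuals, which is consistent with the uniformly higher LBP rejection frequencies in Table IV.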

    We now look at the significance levels of the asymptotically valid Lagrange multiplier type tests. Table V presents the average rejection frequencies over the 18 parameter combinations for the stationary data at the 5% level. The results for the non-stationary data lead to the same conclusions and are not reproduced here. We list results for only the degrees of freedom corrected tests and the popular $T \cdot R^2$ version of LM. The rejection frequencies for the latter test appear to increase in the presence of redundant regressors, particularly for T = 20, while LM* fails in the correctly specified model especially for large values of n relative to T. Applying the inequalities in (12) to the results in Table V implies that the LMW, LMP, LMN, LML and LMD tests are even less attractive than their degrees of freedom corrected counterparts, which themselves have generally unsatisfactory type I error probabilities. LMW* and LMD* in particular have extremely high rejection frequencies for small T and large n, while LMP*, LMN* and LML* are adversely influenced by overparameterisation. This provides evidence against the conjecture that omitting asymptotically negligible terms gives better small sample behaviour. We also examined the frequency in the simulations of negative values for the tests (LMP, LMN, LML and LMD) that have non-definite quadratic forms in the numerator of the statistic. Negative values are particularly associated with specific coefficient combinations; frequencies averaged over all combinations ranged from 0.00 to 0.09 in the AD(1, 1) model, and from 0.00 to 0.36 in the AD(2, 2) model.

    It appears from Table V that the LMC and LMF statistics are the only ones that satisfy the criteria of correct size and invariance to overparameterisation. Table VI provides more detailed results for these two tests. Taking into account the Monte Carlo sampling variability approximated by (27) the results for LMF and LMC are satisfying on the whole, especially for $T \ge 40$ (LMC appears a bit vulnerable in the extreme case T = 20 and n = 8). All other tests for serial correlation investigated here are found to be unfit in some respect.

    TABLE VI

    Estimated type I error probabilities of the LMF and LMC test

                 AD(2, 2) model                            AD(1, 1) model
                 stationary data, 5% level                 non-stationary data, 10% level
                 LMF                 LMC                   LMF                 LMC
    T    n    min   mean  max     min   mean  max       min   mean  max     min   mean  max
    20   1    0.04  0.06  0.07    0.05  0.06  0.07      0.08  0.10  0.13    0.08  0.10  0.13
         4    0.04  0.06  0.08    0.04  0.06  0.09      0.08  0.11  0.15    0.08  0.12  0.14
         8    0.03  0.05  0.06    0.05  0.07  0.09      0.06  0.10  0.14    0.08  0.12  0.16
    40   1    0.03  0.05  0.08    0.03  0.05  0.08      0.09  0.10  0.13    0.09  0.10  0.13
         4    0.04  0.05  0.06    0.04  0.05  0.06      0.08  0.11  0.14    0.08  0.11  0.15
         8    0.03  0.04  0.05    0.03  0.04  0.06      0.07  0.10  0.14    0.07  0.10  0.14
    80   1    0.04  0.05  0.07    0.04  0.05  0.07      0.09  0.10  0.13    0.09  0.10  0.13
         4    0.03  0.05  0.06    0.03  0.05  0.06      0.05  0.10  0.14    0.05  0.10  0.14
         8    0.03  0.04  0.06    0.03  0.04  0.06      0.06  0.10  0.12    0.06  0.10  0.12


    In the discussion of Breusch and Godfrey (1981, p. 108), Osborn wonders how the test $LMW \cdot (T - K - n)/T$ compares with LM: she shows there is no simple systematic inequality between them. As we see from (14) this test has a significance level below that of LMW* but exceeding that of LMF. From Table V, we conclude that this particular degrees of freedom correction to LMW will not outperform LMF.

    Coefficient combination 5 in our simulation experiments (see Appendix B for details) is very closely related to the pilot simulation study in Davidson and Hendry (1981), who find acceptable actual significance levels for the crude Lagrange multiplier test LM. Our Monte Carlo study suggests that this results from their large sample size (T = 74) relative to the number of coefficients in a model with no redundant regressors.

    We now examine the power of the tests with respect to alternative hypotheses that do not correspond to the DGP (25). As tests for serial correlation are often used as general diagnostic checks to reveal any serious misspecifications, including the omission of lagged (dependent) variables, it is important that these tests should have power in AD models against misspecification of the dynamic adjustment process. Table VII presents rejection frequencies for the individual coefficient combinations for some tests of the misspecified AD(0, 0) and AD(1) models. Applying the DW test to the AD(0, 0) model (which excludes any lagged dependent variables) gives quite attractive power figures even when $d_L$ is used as the critical value. This test has a significance level below the nominal level in an adequately specified model; nevertheless its power here outperforms the LMF test. Note that in this model $m_0 = 0$ so the BP, LMN and LML tests are equivalent. The power of BP appears to exceed that of LMF, although this may be overstated as BP may have too high a rejection frequency even when the AD(0, 0) specification is correct. However, we know the LMF test has a significance level very close to the nominal level

    TABLE VII

    Rejection frequencies of some particular tests for serial correlation when applied to misspecified models at the nominal 5% level (stationary data). DGP is AD(1, 1)

                                AD(0, 0) model                                             AD(1) model
                                d_L           BP (n = 4)    LMF, n = 1    LMF, n = 4       LMF (T = 40)
    Combination  TMP   MNL   T=20  T=40    T=20  T=40    T=20  T=40    T=20  T=40       n = 1  n = 4
     1           0.2   8     0.75  1.00    0.55  0.97    0.74  0.96    0.50  0.78       0.07   0.06
     2           1.0   8     0.78  1.00    0.52  0.98    0.75  0.98    0.53  0.83       0.09   0.08
     3           5.0   8     0.97  1.00    0.84  1.00    0.95  1.00    0.85  0.99       0.09   0.07
     4           0.2   5     0.74  1.00    0.49  0.98    0.71  0.97    0.49  0.73       0.06   0.04
     5           1.0   5     0.77  1.00    0.52  0.97    0.76  0.97    0.50  0.82       0.06   0.07
     6           5.0   5     0.93  1.00    0.75  1.00    0.93  1.00    0.77  0.98       0.06   0.06
     7           0.2   2     0.76  1.00    0.55  0.98    0.74  0.97    0.52  0.78       0.06   0.05
     8           1.0   2     0.77  0.99    0.49  0.97    0.74  0.97    0.47  0.77       0.04   0.05
     9           5.0   2     0.87  1.00    0.67  0.99    0.85  0.99    0.66  0.90       0.08   0.05
    10           0.2   1.6   0.25  0.81    0.12  0.49    0.25  0.36    0.12  0.05       0.05   0.05
    11           1.0   1.6   0.49  0.92    0.23  0.66    0.47  0.56    0.21  0.07       0.05   0.07
    12           5.0   1.6   0.70  0.99    0.31  0.86    0.70  0.76    0.28  0.09       0.09   0.08
    13           0.2   1.0   0.23  0.81    0.13  0.50    0.22  0.36    0.12  0.04       0.06   0.07
    14           1.0   1.0   0.42  0.93    0.20  0.63    0.41  0.53    0.17  0.06       0.06   0.04
    15           5.0   1.0   0.67  0.97    0.27  0.82    0.64  0.71    0.26  0.09       0.10   0.06
    16           0.2   0.4   0.25  0.83    0.13  0.50    0.26  0.36    0.12  0.03       0.04   0.05
    17           1.0   0.4   0.28  0.84    0.14  0.50    0.24  0.37    0.12  0.04       0.04   0.05
    18           5.0   0.4   0.58  0.97    0.23  0.72    0.52  0.61    0.19  0.06       0.06   0.05


    even in small samples and is a reliable model selection guideline for this reason. In the AD(0, 0) model the power increases with the mean lag MNL, the total multiplier TMP, and the sample size (except for high n and low $\gamma$).

    Whether the power figures in the AD(0, 0) model are satisfactory is a matter for personal judgement. This type of misspecification is often tested by applying the DW or LMF tests, yet cases of insignificant test statistics are frequently found, especially for $\gamma = 0.5$ and n = 4. Since we find that serial correlation tests may have reasonable power against general alternatives, this illustrates the message in Hendry and Mizon (1978) that, for instance, a significant DW statistic should not automatically be followed by estimating an autocorrelated error process, but rather by the specification of a more general AD model that allows for systematic dynamics instead of pure disturbance dynamics.

    The power of the LMF test in the misspecified AD(1) model is very disappointing: in many cases it is indistinguishable from the nominal and actual significance level. This also holds for the non-stationary data. So if the DGP is an AD(1, 1) model, a model selection strategy that starts from an incorrect AR(1) univariate time-series model and tests for residual autocorrelation is unlikely to reject this simple model. This illustrates the assertion in Hendry and Richard (1983, p. 11) that the randomness of the residuals is a necessary but by no means sufficient condition for the adequacy of a model's specification.

    7. RESULTS FOR THE POST-SAMPLE PREDICTION TESTS

    All the tests for predictive failure we investigate are only asymptotically valid. Table VIII presents the main results, averaged over the 18 coefficient combinations, for the correctly specified AD(1, 1) model. Even a sample size of T = 80 is too small for the PR and PR* tests to exhibit their asymptotic qualities; in moderate sample sizes these tests have too large actual significance levels leading to a too frequent incidence of type I errors. As the LRF statistic yields a test with actual size close to nominal size over the whole experimental design, it appears that omitting asymptotically negligible terms (to obtain PR and PR* from (21)) worsens small sample behaviour. Although the results for the crude likelihood-ratio test LR are very poor, it is remarkable that (except for the extreme case T = m = 20) the simple Edgeworth correction in LRC gives much better results.

    TABLE VIII

    Rejection frequencies of post-sample predictive tests in the correctly specified AD(1, 1) model, averaged over the 18 coefficient combinations

                 stationary data; 5% level           non-stationary data; 10% level
    T    m     PR    PR*   LR    LRC   LRF        PR    PR*   LR    LRC   LRF
    20   4     0.30  0.21  0.16  0.06  0.06       0.41  0.32  0.26  0.11  0.11
         8     0.42  0.31  0.26  0.07  0.06       0.55  0.43  0.37  0.12  0.12
         20    0.61  0.47  0.57  0.10  0.06       0.74  0.61  0.69  0.17  0.12
    40   4     0.15  0.12  0.10  0.06  0.06       0.25  0.20  0.18  0.11  0.11
         8     0.20  0.15  0.13  0.06  0.05       0.32  0.26  0.23  0.12  0.11
         20    0.32  0.24  0.30  0.07  0.06       0.47  0.38  0.42  0.13  0.11
    80   4     0.09  0.08  0.07  0.05  0.05       0.16  0.14  0.13  0.10  0.10
         8     0.11  0.09  0.09  0.05  0.05       0.19  0.16  0.15  0.10  0.10
         20    0.16  0.13  0.15  0.05  0.05       0.26  0.22  0.24  0.10  0.10


    The superiority of LRF is again found in the overparameterised AD(2, 2) model dealt with in Table IX. Including redundant regressors appears to further increase the significance levels of PR and PR*, perhaps leading to the unnecessary extension of an already overparameterised (but otherwise adequate) model. The LRF and LRC tests prove to be relatively invariant with respect to both overparameterisation and the coefficient values of the DGP.

    In Table X the power of the LRF test is shown to be very poor with respect to the AD(0, 0) and AD(1) alternatives considered. Apparently the generated sample data are too smooth to produce serious prediction errors. We observe the curious phenomenon that the rejection frequencies decrease for larger sample sizes. This occurs because, for

    TABLE IX

    Rejection frequencies of some post-sample predictive tests at the nominal 5% level in the overparameterised AD(2, 2) model (stationary data)

                               LRC                 LRF
    T    m     PR*   LR     min   mean  max     min   mean  max
    20   4     0.28  0.22   0.04  0.06  0.07    0.04  0.06  0.07
         8     0.39  0.34   0.04  0.07  0.10    0.04  0.06  0.09
         20    0.56  0.68   0.07  0.11  0.15    0.04  0.06  0.10
    40   4     0.14  0.11   0.04  0.06  0.07    0.04  0.06  0.08
         8     0.18  0.16   0.03  0.05  0.08    0.03  0.05  0.07
         20    0.29  0.35   0.03  0.06  0.09    0.02  0.06  0.07
    80   4     0.09  0.08   0.04  0.05  0.07    0.04  0.05  0.07
         8     0.11  0.10   0.03  0.05  0.07    0.03  0.05  0.07
         20    0.15  0.17   0.04  0.05  0.07    0.03  0.05  0.06

    TABLE X

    Rejection frequencies of the LRF test at the nominal 5% level in misspecified models. DGP is AD(1, 1)

                   AD(0, 0) model                 AD(1) model
                   stationary data                stationary data                non-stationary data
                   m = 4         m = 8            m = 4         m = 8            m = 4         m = 8
    Combination  T=20  T=80    T=20  T=80       T=20  T=80    T=20  T=80       T=20  T=80    T=20  T=80
     1           0.24  0.00    0.30  0.00       0.06  0.04    0.07  0.04       0.07  0.06    0.08  0.05
     2           0.25  0.00    0.33  0.00       0.09  0.05    0.09  0.06       0.08  0.06    0.08  0.06
     3           0.25  0.00    0.35  0.01       0.09  0.06    0.09  0.06       0.12  0.11    0.12  0.14
     4           0.20  0.00    0.29  0.01       0.06  0.05    0.06  0.05       0.07  0.05    0.07  0.05
     5           0.19  0.00    0.28  0.00       0.06  0.06    0.06  0.06       0.07  0.05    0.09  0.06
     6           0.26  0.00    0.35  0.00       0.06  0.04    0.06  0.06       0.07  0.07    0.07  0.08
     7           0.19  0.00    0.27  0.00       0.06  0.06    0.06  0.06       0.08  0.06    0.09  0.04
     8           0.22  0.00    0.30  0.00       0.05  0.05    0.04  0.06       0.06  0.04    0.07  0.04
     9           0.22  0.00    0.27  0.00       0.06  0.05    0.08  0.05       0.07  0.04    0.09  0.03
    10           0.09  0.00    0.12  0.00       0.05  0.05    0.05  0.04       0.07  0.07    0.06  0.07
    11           0.12  0.00    0.13  0.00       0.06  0.06    0.05  0.06       0.09  0.08    0.10  0.08
    12           0.16  0.00    0.15  0.00       0.09  0.08    0.09  0.09       0.14  0.09    0.18  0.12
    13           0.08  0.00    0.10  0.00       0.05  0.06    0.06  0.07       0.05  0.07    0.06  0.06
    14           0.11  0.00    0.14  0.00       0.06  0.07    0.06  0.05       0.08  0.06    0.08  0.08
    15           0.13  0.00    0.17  0.00       0.08  0.07    0.10  0.07       0.08  0.07    0.10  0.09
    16           0.09  0.00    0.12  0.00       0.05  0.05    0.04  0.05       0.06  0.05    0.05  0.05
    17           0.09  0.00    0.11  0.00       0.05  0.04    0.04  0.06       0.06  0.05    0.09  0.06
    18           0.11  0.00    0.12  0.00       0.08  0.06    0.06  0.06       0.05  0.07    0.04  0.07


    stationary regressors and fixed m, $\operatorname{plim}\,(1/T)\,RSS_{T+m} = \operatorname{plim}\,(1/T)\,RSS_T$ even in misspecified models, depriving the test of its power in large samples.

    We conclude that although a post-sample prediction test with a correct size in small samples does exist, detecting a dynamic misspecification is likely only if the relevant regressor variable $x_t$ is already included in the specification. Starting with a parsimonious univariate time series model with omitted (lagged) variables, it is doubtful that this will be detected by such a test. However, varying data correlations (not considered in our Monte Carlo design) will obviously enhance the power of a post-sample prediction test.

    8. CONCLUSIONS

    Misspecification tests are important tools in empirical modelling, but they can only be effective if the user has control over the probability of type I errors. Their usefulness improves if they have high power against a wide range of alternative model specifications. In a simulation study we investigated many versions of two general types of misspecification tests applied in finite samples to a single equation linear regression model with a lagged dependent variable and normally distributed disturbances. We found that the rejection probabilities of the tests may vary substantially for different parameter values of the data generation process. For particular tests this may occur both in misspecified models (as is to be expected) and in adequately parameterised or overparameterised models. Because of this lack of robustness, a model builder who is ignorant of the parameter values of the data generation process can be led astray in the model selection process.

    Our Monte Carlo results corroborate theoretical findings that the Durbin-Watson, Box-Pierce, and Ljung-Box tests for serial correlation should not be applied in regression models with lagged dependent variables. In such models these statistics are best not calculated at all as they have no sound interpretation. The simulations also reveal that even asymptotically valid tests such as (generalizations of) Durbin's h-statistic and various formulations of the Lagrange multiplier test (including the popular $T \cdot R^2$ version) have poor small sample properties. However, the Lagrange multiplier type F-test, denoted here by LMF (and computationally as simple as the $T \cdot R^2$ version), appears to have a type I error probability that is relatively invariant to sample size, order of serial correlation, true coefficient values, and redundant regressors. Test LMF is suggested in Harvey (1981, p. 277) where it is referred to as the "goodness of fit F-test", but where the appropriate degrees of freedom correction goes unrecorded. On the basis of our simulation results, it seems reasonable to ignore the fact that LMF does not have an exact F distribution in dynamic models, and to use critical values from the F distribution with n and T - K - n degrees of freedom. The other test statistics could only be used with confidence by evaluating appropriate critical values by simulation, as suggested in Bera and Jarque (1982); we believe this to be an infeasible alternative for the practitioner; besides, there is no indication that tests would be obtained with better power characteristics than LMF. Hence, from the computationally simple tests for serial correlation investigated here, we recommend LMF; the Durbin-Watson test might be preferable only if no lagged dependent variables are included in the specification (as it is then UMP for particular X matrices and specific alternative hypotheses).
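A minimal sketch of how the two computationally simple statistics could be obtained is given below. It assumes that LMF is the usual F statistic for the joint significance of n lagged OLS residuals added to the original regressors, with n and T - K - n degrees of freedom as stated above, and that pre-sample residuals are set to zero; the function names are illustrative, and NumPy is used for the least-squares fits.

```python
import numpy as np

def ols_rss(y, X):
    """Residual sum of squares and residuals from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid), resid

def lm_serial_correlation(y, X, n):
    """Return (LM, LMF) for serial correlation of order n.

    LM  = T * R^2 of the auxiliary regression of the OLS residuals on X
          and n lagged residuals (pre-sample residuals set to zero).
    LMF = F statistic for the n lagged residuals, to be compared with
          F(n, T - K - n) critical values.
    """
    T, K = X.shape
    rss0, e = ols_rss(y, X)
    lagged = np.column_stack(
        [np.concatenate([np.zeros(j), e[:-j]]) for j in range(1, n + 1)]
    )
    rss1, _ = ols_rss(e, np.column_stack([X, lagged]))
    lm = T * (1.0 - rss1 / rss0)
    lmf = ((rss0 - rss1) / n) / (rss1 / (T - K - n))
    return lm, lmf
```

Both statistics come from the same auxiliary regression, so LMF costs no more to compute than the $T \cdot R^2$ version; it simply rescales the same fit as $LMF = (LM/n)\,/\,((T - LM)/(T - K - n))$.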

    For the post-sample prediction tests examined, the F test (denoted LRF here) is also the most reliable. The divergence of the asymptotic distribution from the finite-sample distribution of test statistics is well-illustrated here by the poor properties of the likelihood-ratio test.
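For illustration, a post-sample predictive F test of the kind attributed to Chow (1960) can be sketched as follows. The assumption here, not confirmed by this section, is that LRF takes the familiar form $[(RSS_{T+m} - RSS_T)/m]\,/\,[RSS_T/(T-K)]$, compared with F(m, T - K) critical values; the function names are illustrative.

```python
import numpy as np

def rss(y, X):
    """Residual sum of squares from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def predictive_failure_f(y, X, T):
    """Chow-type predictive failure statistic.

    The first T rows form the estimation sample; the remaining m rows are
    the post-sample observations. K is the number of regressors in X.
    """
    m = len(y) - T
    K = X.shape[1]
    rss_T = rss(y[:T], X[:T])
    rss_all = rss(y, X)
    return ((rss_all - rss_T) / m) / (rss_T / (T - K))
```

A large value signals that the m post-sample observations are poorly predicted by the relationship fitted over the first T observations.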


    For both the LMF and LRF tests, we found that multiplying the corresponding likelihood-ratio test statistics by a simple Edgeworth-based scalar correction factor produces tests that (apart from some extreme cases) have rejection frequencies almost equal to the chosen F tests. As these Edgeworth corrected statistics are used with $\chi^2$ critical values (which are much easier to memorize than F critical values), practitioners might prefer to use these LMC and LRC statistics. In the context of non-linear regression both the F- and the Anderson reformulation of test statistics are employed in Mizon (1977b).

    Of all the test statistics investigated here, only the F tests and the LMC and LRC tests could usefully be re-examined to analyse their rejection probabilities more thoroughly over a wider parameter space, perhaps by using response surface techniques. The present Monte Carlo study has revealed only that many test procedures are deficient in small samples. It also suggests that it is questionable whether serial correlation and predictive failure tests will be very effective in detecting misspecification of an AD model, especially if the specification search starts from a simple ARMA representation while the exogenous explanatory variables of the AD process are themselves modelled parsimoniously by ARIMA processes.

    APPENDIX A

    It is not necessary (and it is computationally inefficient) to adopt starting values, say $x_{-50} = 0 = y_{-50}$, to generate the required values $\{y_t, x_t;\ t = -1, \ldots, T+m\}$ according to the formulas (26) and (25), as has been done frequently in comparable simulation studies. (When estimating the AD(2, 2) model $y_{-1}$ and $x_{-1}$ are needed.) Since general ARMA(p, q) series of length N can be synthesized exactly from N + q IIN(0, 1) drawings (see McLeod and Hipel (1978)), both the waste of random numbers and small sample non-stationarity problems as indicated in Hendry (1979, Appendix B) can be avoided as follows. Using the lag operator L, (25) may be rewritten

    $$y_t = \frac{\beta_0 + \beta_1 L}{1 - \gamma L}\, x_t + \frac{1}{1 - \gamma L}\, \varepsilon_t. \tag{A.1}$$

    Substitution of (26) for the stationary $x_t$ series leads to

    $$y_t = \frac{\beta_0 + \beta_1 L}{(1 - \gamma L)(1 - \lambda L)}\, \xi_t + \frac{1}{1 - \gamma L}\, \varepsilon_t. \tag{A.2}$$

    Hence, $y_t$ is the sum of an ARMA(2, 1) process and an (independent) AR(1) process, which in general leads to an ARMA(3, 2) process, see Granger and Newbold (1977, p. 29). Because in (A.2) both processes have the factor $(1 - \gamma L)^{-1}$ in common, $y_t$ reduces to an ARMA(2, 1) process.

    Now the series $\{y_t, x_t;\ t = -1, \ldots, T+m\}$ can be generated from the mutually independent white-noise series $\{\varepsilon_t;\ t = -1, \ldots, T+m\}$ and $\{\xi_t;\ t = -2, \ldots, T+m\}$ in the following way. By means of the method of McLeod and Hipel we generate the observations $w_{-2}$, $w_{-1}$ and $v_{-1}$ of the AR(2) process $w_t = [(1 - \gamma L)(1 - \lambda L)]^{-1}\xi_t$ and the AR(1) process $v_t = (1 - \gamma L)^{-1}\varepsilon_t$. Then, as

    $$y_t = (\beta_0 + \beta_1 L)\, w_t + v_t, \tag{A.3}$$

    we can calculate $y_{-1}$. From $\{\xi_t\}$ we also generate the AR(1) process $\{x_t;\ t = -1, \ldots, T+m\}$ and then the remaining required observations of $\{y_t;\ t = -1, \ldots, T+m\}$ are obtained directly from the generating formula (25).
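The start-up scheme for the stationary case can be sketched in code. This is not the McLeod and Hipel algorithm itself: as a simpler stand-in, the pre-sample values $w_{-2}$, $w_{-1}$ and $v_{-1}$ are drawn from their joint stationary normal distribution, which should serve the same purpose of avoiding arbitrary starting values. The DGP is assumed to be $y_t = \gamma y_{t-1} + \beta_0 x_t + \beta_1 x_{t-1} + \varepsilon_t$ with $x_t = \lambda x_{t-1} + \xi_t$, and all function and argument names are illustrative.

```python
import numpy as np

def ar2_variance(phi1, phi2, sigma2):
    """Variance of a stationary AR(2) process
    w_t = phi1*w_{t-1} + phi2*w_{t-2} + u_t with Var(u_t) = sigma2."""
    return sigma2 * (1.0 - phi2) / ((1.0 + phi2) * ((1.0 - phi2) ** 2 - phi1 ** 2))

def generate_stationary_ad11(gamma, beta0, beta1, lam,
                             sig2_xi, sig2_eps, n_periods, rng):
    """Return (y, x) for t = -1, 0, ..., n_periods; index 0 holds t = -1.

    Requires |gamma| < 1 and |lam| < 1.
    """
    # w_t = [(1 - gamma L)(1 - lam L)]^{-1} xi_t written as an AR(2) process.
    phi1, phi2 = gamma + lam, -gamma * lam
    g0 = ar2_variance(phi1, phi2, sig2_xi)
    rho1 = phi1 / (1.0 - phi2)                  # first autocorrelation of w_t
    cov = g0 * np.array([[1.0, rho1], [rho1, 1.0]])
    w_m2, w_m1 = rng.multivariate_normal(np.zeros(2), cov)
    # v_t = (1 - gamma L)^{-1} eps_t is AR(1) with variance sig2_eps/(1 - gamma^2).
    v_m1 = rng.normal(0.0, np.sqrt(sig2_eps / (1.0 - gamma ** 2)))

    n = n_periods + 2
    xi = rng.normal(0.0, np.sqrt(sig2_xi), n)    # entry 0 is not used
    eps = rng.normal(0.0, np.sqrt(sig2_eps), n)  # entry 0 is not used
    x = np.empty(n)
    y = np.empty(n)
    x[0] = w_m1 - gamma * w_m2                   # x_{-1} = (1 - gamma L) w_{-1}
    y[0] = beta0 * w_m1 + beta1 * w_m2 + v_m1    # (A.3) evaluated at t = -1
    for i in range(1, n):
        x[i] = lam * x[i - 1] + xi[i]
        y[i] = gamma * y[i - 1] + beta0 * x[i] + beta1 * x[i - 1] + eps[i]
    return y, x
```

A call such as `generate_stationary_ad11(0.5, 0.1, 0.0, 0.8, 3.0, 1.0, 100, np.random.default_rng(1))` returns 102 observations on each series, and no burn-in draws are discarded.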


    For the experiments where $x_t$ is non-stationary we have

    $$x_t = x_{t-1} + g + \xi_t = x_{-2} + (t+2)\,g + \sum_{i=-1}^{t} \xi_i, \qquad (t \ge -1). \tag{A.4}$$

    Now the explanatory variable can be calculated directly from a given starting value $x_{-2}$, from the drift parameter $g$, and from the white-noise series $\{\xi_t\}$. Substitution of (A.4) in (A.1) leads to

    $$y_t = \frac{\beta_0 + \beta_1 L}{1 - \gamma L}\,[x_{-2} + (t+2)\,g] + \frac{\beta_0 + \beta_1 L}{1 - \gamma L} \sum_{i=-1}^{t} \xi_i + \frac{\varepsilon_t}{1 - \gamma L}.$$

    Hence, the time-series $\{y_t\}$ can be calculated from

    $$y_t = \frac{\beta_0 + \beta_1 L}{1 - \gamma L}\,[x_{-2} + (t+2)\,g] + (\beta_0 + \beta_1 L) \sum_{i=-1}^{t} u_i + v_t, \tag{A.5}$$

    where $v_t$ is again the AR(1) process $v_t = (1 - \gamma L)^{-1}\varepsilon_t$ and $u_i = (1 - \gamma L)^{-1}\xi_i$ is another AR(1) process with the same autocorrelation function, but generated from the white-noise series $\{\xi_i\}$.
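The closed form in (A.4) translates directly into a cumulative sum. The sketch below generates only the non-stationary regressor; the function and argument names are illustrative.

```python
import numpy as np

def generate_random_walk_with_drift(x_start, drift, sig2_xi, n_obs, rng):
    """Return (x, xi) with x[k] holding x_t for t = -1 + k, k = 0, ..., n_obs - 1.

    Implements x_t = x_{-2} + (t + 2)*drift + sum_{i=-1}^{t} xi_i from (A.4),
    where x_start plays the role of x_{-2}.
    """
    xi = rng.normal(0.0, np.sqrt(sig2_xi), n_obs)
    steps = np.arange(1, n_obs + 1)          # (t + 2) for t = -1, 0, 1, ...
    x = x_start + steps * drift + np.cumsum(xi)
    return x, xi
```

The result coincides with the recursion $x_t = x_{t-1} + \text{drift} + \xi_t$ started from $x_{-2}$, so no pre-sample observations have to be generated and discarded.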

    APPENDIX B

    Table XI details all 18 possible coefficient combinations of the AD(1, 1) model that serves as the DGP in the Monte Carlo experiments. For each combination it reports the various dynamic adjustment characteristics and the $R^2$ values averaged over the 500 replications for both types of exogenous explanatory variable $x_t$ obtained in the AD(1, 1) specification. For the definition of these dynamic characteristics see Harvey (1981, pp. 224-235). Note that the mean lag MNL is defined only if all lag coefficients in $(\beta_0 + \beta_1 L)(1 - \gamma L)^{-1}$ have the same sign. Since $\gamma > 0$ and $\beta_0 > 0$ in our models, this happens if $\beta_1 > -\gamma\beta_0$, which is always the case here. The formula for the mean lag in (25) is

    $$MNL = \beta_1/(\beta_0 + \beta_1) + \gamma/(1 - \gamma), \qquad (\beta_1 > -\gamma\beta_0).$$

    The median lag measures the number of time periods it takes for 50% of the total multiplier to be realised. Since the standardised interim impact multipliers are

    $$SIM_i = \begin{cases} \beta_0/TMP, & i = 0, \\ 1 - \gamma^i(\beta_1 + \gamma\beta_0)/(\beta_0 + \beta_1) = 1 - \gamma^i(1 - SIM_0), & i \ge 1, \end{cases}$$

    the median lag MDL becomes

    $$MDL = \begin{cases} 0, & SIM_0 \ge 0.5, \\ (0.5 - SIM_0)/(SIM_1 - SIM_0), & SIM_0 < 0.5 \le SIM_1, \\ (\ln\gamma)^{-1} \cdot \ln\bigl(0.5/(1 - SIM_0)\bigr), & \text{otherwise.} \end{cases}$$
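These definitions can be evaluated directly. The sketch below assumes the total multiplier is $TMP = (\beta_0 + \beta_1)/(1 - \gamma)$, which is consistent with the entries of Table XI; the function name and the dictionary keys are illustrative.

```python
import math

def dynamic_characteristics(gamma, beta0, beta1):
    """Total multiplier, standardised interim multipliers, mean lag and
    median lag of an AD(1, 1) model with beta1 > -gamma*beta0."""
    tmp = (beta0 + beta1) / (1.0 - gamma)
    sim0 = beta0 / tmp
    sim1 = 1.0 - gamma * (1.0 - sim0)
    mnl = beta1 / (beta0 + beta1) + gamma / (1.0 - gamma)
    if sim0 >= 0.5:
        mdl = 0.0
    elif sim1 >= 0.5:
        # Linear interpolation between the impact and first interim multiplier.
        mdl = (0.5 - sim0) / (sim1 - sim0)
    else:
        mdl = math.log(0.5 / (1.0 - sim0)) / math.log(gamma)
    return {"TMP": tmp, "SIM0": sim0, "SIM1": sim1, "MNL": mnl, "MDL": mdl}

# Coefficient combinations 1 and 10 of the experimental design.
print(dynamic_characteristics(0.9, 0.04, -0.02))
print(dynamic_characteristics(0.5, 0.04, 0.06))
```

Because $SIM_0$ can land fractionally below 0.5 in floating-point arithmetic, a boundary case may fall into the interpolation branch; the resulting median lag is then of the order of rounding error rather than exactly zero.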

    First version received July 1983; final version accepted October 1985 (Eds.) I would like to thank for their valuable comments on earlier versions of this paper: Anil Bera, David

    Hendry, Jan Podivinsky, Stephen Pollock, participants of a session at the 1981 European Meeting of the Econometric Society in Amsterdam and members of seminars at the University of Leeds and Erasmus University Rotterdam. Also the helpful suggestions from my colleagues at the University of Amsterdam, especially Mars Cramer, are gratefully acknowledged. Finally, I want to thank Managing Editor, Grayham Mizon and a number of anonymous referees of this Review for their remarks and guidance.


    TABLE XI

    Mean $R^2$, x stationary: $\lambda = 0.8$; $\sigma_\xi^2 = 3$; $\sigma_\varepsilon^2 = 1$.
    Mean $R^2$, x non-stationary: $x_{-2} = 1$; drift $= 0.02$; $\sigma_\xi^2 = 3$; [second variance illegible].

                 AD(1, 1) coefficients     Dynamic adjustment characteristics     mean R^2, x stationary     mean R^2, x non-stationary
    Combination  gamma  beta_0  beta_1     TMP   SIM_0  SIM_1  MNL   MDL          T=20    T=40    T=80       T=20    T=40    T=80
     1           0.9    0.04    -0.02      0.2   0.2    0.28   8     4.46         0.580   0.665   0.734      0.623   0.723   0.819
     2           0.9    0.20    -0.10      1.0   0.2    0.28   8     4.46         0.650   0.754   0.816      0.842   0.933   0.971
     3           0.9    1.00    -0.50      5.0   0.2    0.28   8     4.46         0.933   0.963   0.977      0.988   0.996   0.999
     4           0.9    0.10    -0.08      0.2   0.5    0.55   5     0            0.590   0.684   0.746      0.640   0.750   0.831
     5           0.9    0.50    -0.40      1.0   0.5    0.55   5     0            0.738   0.808   0.851      0.905   0.951   0.979
     6           0.9    2.50    -2.00      5.0   0.5    0.55   5     0            0.978   0.984   0.988      0.994   0.997   0.999
     7           0.9    0.16    -0.14      0.2   0.8    0.82   2     0            0.621   0.694   0.744      0.682   0.779   0.853
     8           0.9    0.80    -0.70      1.0   0.8    0.82   2     0            0.843   0.870   0.890      0.944   0.969   0.983
     9           0.9    4.00    -3.50      5.0   0.8    0.82   2     0            0.989   0.992   0.993      0.997   0.999   0.999
    10           0.5    0.04     0.06      0.2   0.2    0.60   1.6   0.75         0.344   0.344   0.348      0.541   0.657   0.765
    11           0.5    0.20     0.30      1.0   0.2    0.60   1.6   0.75         0.760   0.812   0.843      0.931   0.967   0.984
    12           0.5    1.00     1.50      5.0   0.2    0.60   1.6   0.75         0.983   0.990   0.992      0.996   0.999   0.999
    13           0.5    0.10     0         0.2   0.5    0.75   1.0   0            0.340   0.343   0.364      0.535   0.642   0.758
    14           0.5    0.50     0         1.0   0.5    0.75   1.0   0            0.779   0.830   0.853      0.929   0.966   0.984
    15           0.5    2.50     0         5.0   0.5    0.75   1.0   0            0.987   0.991   0.993      0.997   0.999   0.999
    16           0.5    0.16    -0.06      0.2   0.8    0.90   0.4   0            0.371   0.369   0.374      0.560   0.664   0.783
    17           0.5    0.80    -0.30      1.0   0.8    0.90   0.4   0            0.821   0.855   0.871      0.948   0.971   0.985
    18           0.5    4.00    -1.50      5.0   0.8    0.90   0.4   0            0.991   0.993   0.994      0.998   0.999   0.999


    REFERENCES

    ANDERSON, T. W. (1958), An Introduction to Multivariate Statistical Analysis (John Wiley & Sons).

    ANDERSON, G. J. and MIZON, G. E. (1984), "Parameter Constancy Tests: Old and New" (Discussion Paper No. 8325, University of Southampton).

    BERA, A. K. and JARQUE, C. M. (1982), "Model Specification Tests: A Simultaneous Approach", Journal of Econometrics, 20, 59-82.

    BOX, G. E. P. and PIERCE, D. A. (1970), "Distribution of Residual Autocorrelations in Autoregressive-Integrated Moving Average Time Series Models", Journal of the American Statistical Association, 65, 1509-1526.

    BREUSCH, T. S. (1978), "Testing for Autocorrelation in Dynamic Linear Models", Australian Economic Papers, 17, 334-355.

    BREUSCH, T. S. (1979), "Conflict among Criteria for Testing Hypotheses: Extension and Comments", Econometrica, 47, 203-207.

    BREUSCH, T. S. and GODFREY, L. G. (1981), "A Review of Recent Work on Testing for Auto-Correlation in Dynamic Simultaneous Models", in Currie, D., Nobay, R. and Peel, D. (eds) Macroeconomic Analysis (London: Croom Helm).

    BREUSCH, T. S. and PAGAN, A. R. (1980), "The Lagrange Multiplier Test and Its Applications to Model Specification in Econometrics", Review of Economic Studies, 47, 239-253.

    CHOW, G. C. (1960), "Test of Equality between Sets of Coefficients in Two Linear Regressions", Econometrica, 28, 591-605.

    DAVIDSON, J. E. H. and HENDRY, D. F. (1981), "Interpreting Econometric Evidence: The Behaviour of Consumers' Expenditure in the UK", European Economic Review, 16, 177-192.

    DAVIDSON, J. E. H., HENDRY, D. F., SRBA, F. and YEO, S. (1978), "Econometric Modelling of the Aggregate Time-Series Relationship between Consumers' Expenditure and Income in the United Kingdom", The Economic Journal, 88, 661-692.

    DAVIES, N., TRIGGS, C. M. and NEWBOLD, P. (1977), "Significance Levels of the Box-Pierce Portmanteau Statistic in Finite Samples", Biometrika, 64, 3, 517-522.

    DURBIN, J. (1970), "Testing for Serial Correlation in Least-Squares Regression when some of the Regressors are Lagged Dependent Variables", Econometrica, 38, 410-421.

    EVANS, G. B. A. and SAVIN, N. E. (1982), "Conflict Among the Criteria Revisited: the W, LR and LM Tests", Econometrica, 50, 737-748.

    GODFREY, L. G. (1978), "Testing Against General Autoregressive and Moving Average Error Models when the Regressors include Lagged Dependent Variables", Econometrica, 46, 1293-1302.

    HARVEY, A. C. (1981), The Econometric Analysis of Time Series (Oxford: Philip Allan).

    HENDRY, D. F. (1979), "The Behaviour of Inconsistent Instrumental Variables Estimators in Dynamic Systems with Autocorrelated Errors", Journal of Econometrics, 9, 295-314.

    HENDRY, D. F. (1980), "Predictive Failure and Econometric Modelling in Macroeconomics: The Transaction Demand for Money", in Ormerod, P. (ed) Modelling the Economy (London: Heinemann Educational Books).

    HENDRY, D. F. and MIZON, G. E. (1978), "Serial Correlation as a Convenient Simplification, not a Nuisance: A Comment on a Study of the Demand for Money by the Bank of England", The Economic Journal, 88, 549-563.

    HENDRY, D. F. and RICHARD, J-F. (1982), "On the Formulation of Empirical Models in Dynamic Econometrics", Journal of Econometrics, 20, 3-33.

    HENDRY, D. F. and TRIVEDI, P. K. (1972), "Maximum Likelihood Estimation of Difference Equations with Moving Average Errors: A Simulation Study", Review of Economic Studies, 39, 117-145.

    LJUNG, G. M. and BOX, G. E. P. (1978), "On a Measure of Lack of Fit in Time-Series Models", Biometrika, 65, 297-303.

    MARDIA, K. V. and ZEMROCH, P. J. (1978), Tables of the F- and Related Distributions with Algorithms (London: Academic Press).

    MCLEOD, A. I. and HIPEL, K. W. (1978), "Simulation Procedures for Box-Jenkins Models", Water Resources Research, 14, 969-975.

    MIZON, G. E. (1977a), "Model Selection Procedures", in Artis, M. J. and Nobay, A. R. (eds) Studies in Modern Economic Analysis (Oxford: Basil Blackwell).

    MIZON, G. E. (1977b), "Inferential Procedures in Nonlinear Models: An Application in a UK Cross Section Study of Factor Substitution and Returns to Scale", Econometrica, 45, 1221-1242.

    MIZON, G. E. and HENDRY, D. F. (1980), "An Empirical Application and Monte Carlo Analysis of Tests of Dynamic Specification", Review of Economic Studies, 47, 21-45.

    PAGAN, A. R. and HALL, A. D. (1983), "Diagnostic Tests as Residual Analysis (with discussion)", Econometric Reviews, 2, 159-254.

    PIERCE, D. A. (1971), "Distribution of Residual Autocorrelation in the Regression Model with Autoregressive-Moving Average Errors", Journal of the Royal Statistical Society, Series B, 33, 140-146.

    SAVIN, N. E. (1976), "Conflict Among Testing Procedures in a Linear Regression Model with Autoregressive Disturbances", Econometrica, 44, 1303-1315.

    SPENCER, B. G. (1975), "The Small Sample Bias of Durbin's Test for Serial Correlation", Journal of Econometrics, 3, 249-254.

    WILSON, A. L. (1978), "When is the Chow Test UMP?", The American Statistician, 32, 66-68.
