

ELSEVIER Journal of Econometrics 83 (1998) 213-237

Model specification and endogeneity

Alice Nakamura a,*, Masao Nakamura b

a Faculty of Business, University of Alberta, Edmonton, Alb., Canada T6G 2R6
b Faculty of Commerce, University of British Columbia, Vancouver, BC, Canada V6T 1Z2

Received 1 March 1995; received in revised form 1 June 1996

Abstract

This paper considers the treatment of endogenous explanatory variables in the work of the Cowles Commission and in Carl Christ's classic 1966 textbook, and certain problems that arise when this approach is followed in areas such as the study of female labor supply where a priori knowledge is sparse or uncertain. The motivations for, and evidence against the use of, mixed estimation approaches involving exogeneity pretests are explored. The paper concludes with a consideration of complementary and alternative empirical approaches, including greater use of predictive evaluation as suggested by Christ. © 1998 Elsevier Science S.A.

Keywords: Endogeneity; Endogeneity testing; Instrumental variables; Female labor supply
JEL classification: C14; C30

1. Introduction

The Cowles Commission for Research in Economics, in its Chicago era, helped to put the concept of endogeneity into place theoretically as a cornerstone of econometric identification, estimation, and inference.¹ Carl F. Christ deserves considerable credit for the fact that the Cowles Commission approach to these problems became widely accepted. His classic 1966 textbook, Econometric Models and Methods, made these advances more accessible to interested researchers and subsequent generations of graduate students including our own.

* Corresponding author. This paper was prepared for 'Macroeconometrics and Econometric Methods: A Conference in Honor of Professor Carl F. Christ', The Johns Hopkins University, 21-22 April 1995. The research was partially supported by grants from the Social Sciences and Humanities Research Council of Canada. The authors thank Don Andrews, Patrick Asea, Erwin Diewert, R.W. Farebrother, Lawrence Klein, Jan Kmenta, Hiroki Tsurumi, and particularly John Cragg as well as an anonymous referee for helpful comments on earlier versions of various parts of this paper. We also thank Ken Nakamura, a medical student, for help with the section on research methods in epidemiology. All remaining errors and misinterpretations are our responsibility.

0304-4076/98/$19.00 © 1998 Elsevier Science S.A. All rights reserved PII S0304-4076(97)00070-5

Christ's motivation, and that of the Cowles Commission, for defining and drawing out the consequences of the endogeneity of variables in econometric models was to further behavioral understanding. This paper examines how an emphasis on endogeneity has spurred the development and use of instrumental variables estimation and endogeneity tests in areas such as labor economics, and how concerns about endogeneity are (and are not) helping to improve behavioral understanding.

In Sections 2 and 3, we review certain basics of why the concept of endogeneity is important, and distinguish two types of operationally different endogenous variables: those of primary interest in the context of a particular research project, and endogenous control variables. Section 4 presents an introduction to endogeneity testing from the applied perspective of research on female labor supply. Section 5 examines formal statistical properties of the endogeneity tests introduced in Section 4. Section 6 looks back at the original motivations spelled out by Christ for designating some variables as endogenous, and considers possible costs of a preoccupation with endogeneity testing and the instrumentation of right hand variables suspected of being endogenous. Alternative and supplementary research strategies are considered. Section 7 concludes.

2. Endogeneity in traditional simultaneous equations models

Economic models in consumer demand analysis and many areas of macroeconomics have traditionally involved multiple dependent variables theorized to be causally and simultaneously interrelated. This is the context within which concerns about identification and endogeneity evolved, stimulating the development of simultaneous equations estimation methods. It is within this context that Christ introduces the concept of endogeneity. Following the approach of the Cowles Commission (Koopmans and Hood, 1953, pp. 117-120), Christ defines an endogenous variable in a linear simultaneous equations model as the complement of an exogenous variable:

An exogenous variable in a stochastic model is a variable whose value in each period is statistically independent of the values of all the random disturbances in the model in all periods .... Exogenous variables may be random, or may be deliberately set by some agency, as by government. Variables that are not exogenous are endogenous. (Christ 1966, pp. 156-157)

¹ For a retrospective overview of the Cowles Commission's contributions to econometric methods and practice, see Christ (1994).

In Christ's treatment of endogeneity, two context-specific types of endogenous variables are lumped together under this label.² We define and discuss the first in this section and the second in the second subsection of the following section. The first type are the variables that are the direct focus of research interest. Christ (1966, pp. 12, 13) provides an example:

... imagine a theory that claims to be able to make a conditional prediction of national income for next year, given the magnitudes for next year of investment, government purchases, and tax receipts. Then in such a theory national income is an endogenous variable ....

We will refer to endogenous variables for which equations must be developed to fulfill primary research objectives as primary endogenous variables. In Christ's textbook, as in the work of the Cowles Commission, situations are contemplated in which there is true a priori information on the interrelationships among the primary endogenous variables of a model.

Single equation coefficient estimates for the directly included endogenous variables will pick up not only the direct effects of these variables but also spurious effects due to the correlations of these variables with model disturbance terms. This is the endogeneity bias problem. When equations with directly included endogenous explanatory variables are estimated by ordinary least squares (OLS) - the context in which concerns about endogeneity bias problems were first raised - the resulting endogeneity problem is also called the OLS bias problem (Christ, 1966, pp. 453-464). The use of IV or some other simultaneous equations estimation method is how researchers often try to deal with these bias problems.³
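A small simulation can make the bias concrete. The sketch below is illustrative only: the data-generating process, the coefficient values, and the function name `simulate_ols_vs_iv` are assumptions introduced here, not taken from Christ (1966) or from any of the studies cited. It draws a regressor that is correlated with the structural disturbance and compares the single-equation OLS slope with a simple IV slope.

```python
import numpy as np

def simulate_ols_vs_iv(n=20000, beta_w=1.0, rho=0.6, seed=0):
    """Return (ols, iv) slope estimates from one simulated sample.

    All settings are hypothetical and chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                      # exogenous instrument (assumed valid)
    u = rng.normal(size=n)                      # structural disturbance
    # Regressor shock with correlation rho with the structural disturbance.
    e = rho * u + np.sqrt(1.0 - rho**2) * rng.normal(size=n)
    w = 0.8 * z + e                             # endogenous explanatory variable
    y = beta_w * w + u                          # structural equation
    # All variables have mean zero by construction, so intercepts are omitted.
    ols = (w @ y) / (w @ w)                     # single-equation OLS slope
    iv = (z @ y) / (z @ w)                      # simple instrumental variables slope
    return ols, iv

ols_estimate, iv_estimate = simulate_ols_vs_iv()
print(f"OLS: {ols_estimate:.3f}  IV: {iv_estimate:.3f}  true: 1.000")
```

Under these assumed settings the OLS slope has probability limit 1 + 0.6/1.64 rather than 1, because Cov(w, u) = 0.6 and Var(w) = 1.64, while the IV slope is consistent for the true value.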

² Papers by Geweke (1987) led us to think about this distinction.

³ Although it is rarely mentioned in studies using instrumental variables, in finite samples, IV estimates can also be biased even when the instruments are exogenously determined. See Pagan and Jung (1993), Angrist and Krueger (1995) and Buse (1992).

The main acknowledged cost of using a simultaneous equations estimation method is a loss of efficiency. The IV approach involves separating a variable suspected of being endogenous, such as the wage variable in a labor supply model, into the portion explained by an auxiliary IV equation and the auxiliary regression residuals. Suppose the objective is to estimate $\beta_w$, the coefficient of the wage variable, $w$, in a labor supply equation. If $e_w$ denotes the residuals from an auxiliary instrumental wage equation, the loss of efficiency in estimating $\beta_w$ comes from the addition of $\beta_w e_w$ to the error term for the labor supply equation when the predicted wage variable from the auxiliary equation, $\hat{w}$, is substituted for $w$ in the labor supply equation. When the $R^2$ for the auxiliary wage equation is low, this loss of efficiency can be substantial.
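The substitution just described can be written out explicitly. In the sketch below, $h$ (hours of work) and $\varepsilon$ (the labor supply disturbance) are symbols introduced here for illustration, and other regressors are suppressed:

```latex
\begin{align*}
h &= \beta_w w + \varepsilon, \qquad w = \hat{w} + e_w \\
\Rightarrow\quad h &= \beta_w \hat{w} + \left(\beta_w e_w + \varepsilon\right).
\end{align*}
```

The composite error $\beta_w e_w + \varepsilon$ is what the second-stage regression has to contend with. A low $R^2$ in the auxiliary equation means that $e_w$ carries most of the variation in $w$, so the added term is large relative to the variation in $\hat{w}$.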

Nevertheless, with primary endogenous variables in situations where there is appropriate a priori information and sufficient data, basic research objectives and the statistical advantage of avoiding or lessening endogeneity bias problems both point in the direction of using a simultaneous equations estimation method such as IV.

3. Dealing with endogeneity outside the textbook context

3.1. Interrelated primary endogenous variables

In modern empirical research, many of the models used resemble the traditional textbook simultaneous equations models in that they involve sets of interrelated primary endogenous variables. For instance, models of female labor supply are often specified to involve equations for both the wages women receive and their annual hours of work, with hours of work assumed to depend on the wage variable. However, in this and many other areas of applied economic research there is little in the way of strongly held a priori knowledge.⁴ As a consequence, there is considerable disagreement concerning the choice of variables to be included in the equations to be estimated. Low $R^2$ values for the auxiliary IV equations for variables believed to be endogenous are common, and the associated efficiency losses can be large.

Moreover, in areas where a priori knowledge is sparse or uncertain, two other potential costs of instrumentation can be important. The first is that some of the variables included on the right hand side of the auxiliary IV equations may not be exogenous either. This can result in endogeneity bias problems even after instrumentation. In fact, an endogeneity bias problem can be worsened by instrumentation if the explanatory variables included in an auxiliary IV equation pick up components of variation in a variable suspected of being endogenous that are, in fact, correlated with the true equation disturbance term and account for very little of the truly exogenous variation in this variable. An example may help clarify this point.

⁴ There are other important problems we do not deal with including sample selectivity.

Consider an office situation in which the salaries of the secretaries differ depending on labor supply related attributes such as whether they work full time or part time and their seniority, as well as on whether the personnel director happened to be there at the time of the initial hiring or whether the acting personnel director, who makes systematically different decisions about starting wage rates, was in charge. The secretary-to-secretary differences in who did the hiring are a source of purely exogenous wage variation, from a worker perspective, that would not be captured by any of the variables commonly included in auxiliary IV wage equations. Rather, with an IV approach this exogenous wage variation would probably end up as part of the residual wage contribution to the error term for the labor supply equation. On the other hand, some truly endogenous components of wage variation may well be picked up by schooling variables in an instrumental wage equation, because of the effects of unobserved ability and taste factors. This, in turn, could cause endogeneity bias problems despite the use of IV estimation.
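The secretary example can be mimicked in a stylized simulation. Everything in the sketch below is hypothetical: the variable names (`ability`, `schooling`, `hiring`), the coefficients, and the function name `contaminated_instrument_demo` are assumptions made for illustration, not estimates from any study. Schooling is used as the instrument even though it is correlated with unobserved ability, while the exogenous variation from who did the hiring is not observed.

```python
import numpy as np

def contaminated_instrument_demo(n=20000, beta_w=0.5, seed=1):
    """Return (ols, iv) wage-coefficient estimates when the instrument is not exogenous."""
    rng = np.random.default_rng(seed)
    ability = rng.normal(size=n)                 # unobserved; enters the hours disturbance
    schooling = ability + rng.normal(size=n)     # instrument, but correlated with ability
    hiring = 2.0 * rng.normal(size=n)            # purely exogenous wage variation, unobserved
    wage = 0.3 * schooling + hiring              # most wage variation is exogenous here
    u = ability + rng.normal(size=n)             # labor supply disturbance
    hours = beta_w * wage + u
    # Zero-mean variables by construction, so intercepts are omitted.
    ols = (wage @ hours) / (wage @ wage)
    iv = (schooling @ hours) / (schooling @ wage)
    return ols, iv

ols_estimate, iv_estimate = contaminated_instrument_demo()
print(f"OLS: {ols_estimate:.3f}  IV: {iv_estimate:.3f}  true: 0.500")
```

In this assumed design the OLS probability limit is 0.5 + 0.3/4.18, a small bias, whereas the IV probability limit is 0.5 + 1/0.6, because the instrument captures only the ability-related part of wage variation. Instrumentation makes the bias worse here, which is the possibility the text raises.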

A second concern when there is little a priori knowledge is that instrumentation will result in a loss of relevance because the auxiliary equations fail to capture the types of variation in the right-hand side endogenous variables that are important from an applications perspective. This concern is discussed more fully in Section 6.1. This worry, coupled with concerns that the available instruments may not be truly exogenous and with the knowledge that there will always be efficiency losses from instrumentation, has motivated some researchers including ourselves to seek ways of verifying the existence, or of determining the seriousness, of potential endogeneity problems before deciding on whether or not to use IV estimation. More specifically, these concerns have led some researchers to be interested in a pretest estimation strategy, where the choice of whether to stick with single equation estimation results or to use IV or some other simultaneous equations estimation method would depend on empirical pretest evidence concerning the existence or the likely severity of the suspected endogeneity bias problems.

3.2. Endogenous control variables

In the work of the Cowles Commission and in Christ's textbook, the terms exogenous and endogenous are given rigorous statistical definitions. There can be problems of endogeneity bias whenever included explanatory variables fail to satisfy the statistical definition of exogeneity. We have discussed the wage variable in models of female labor supply as an example of a primary endogenous variable for which a behavioral, or at least a forecasting, equation is needed to meet the stated research objectives as well as to deal with possible endogeneity bias problems. However, in many applied research areas, many of the explanatory variables suspected of being endogenous are not of direct research interest.

Explanatory variables suspected of being endogenous but which are not of direct interest in the given research context will be termed endogenous control variables. They are included to control for effects that might otherwise obscure the behavioral responses of prime interest.

4. Evolution of endogeneity testing in labor economics

In this section we trace the evolution of the main endogeneity tests that have been used in cross-sectional and panel data labor economics studies.⁵ Interest in endogeneity pretests in research on female labor supply was stimulated by frustration with the quality of the instrumental equations for the explanatory variables suspected of being endogenous.

For example, in a series of papers, Nakamura and Nakamura produced labor supply wage elasticity estimates for married women that differed greatly from the published results of others for women. The $R^2$ values for the auxiliary wage equations in the Nakamura and Nakamura studies rarely exceeded 0.34 and were mostly in a range of 0.06 to 0.14. This raised questions about whether the weak auxiliary wage equations were the cause of the unusual elasticity estimates and whether the extent of the endogeneity problems for the wage variable justified the inevitable losses in efficiency due to the substitution of poor instrumental wage variables.⁶

⁵ There are other approaches to endogeneity testing besides the sort that has found application in labor economics, including methods for which it is necessary to have observations over time. Also, Reynolds (1982) takes a Bayesian posterior odds approach to the problem of endogeneity testing. Discussions and results concerning alternative possible definitions of endogeneity, causality, and identifiability can be found in Simon (1953), in Zellner (1984) and in Cragg and Donald (1993).

⁶ Killingsworth and Heckman (1986, pp. 134-135) sum up the theoretical arguments that were used to explain why the female uncompensated wage elasticity of labor supply might be expected to be considerably more positive than the male elasticity:

The first step is to apply to commodity demands the discussions of input demand of Hicks (1965, pp. 242-246), Marshall (1920, pp. 386, 852-853), and Pigou (1946, p. 682): the elasticity of demand for a good (in this case, leisure) with respect to its price (in this case, the wage rate) will be greater, the greater is the availability of alternatives to that good. The next step (Mincer, 1962) is to observe that women in effect have more alternative uses for their time - market work, home work and leisure - than do men, who for the most part divide their time between only two uses, market work and leisure. In other words, the substitution towards market work that men undertake when their wage rises is primarily a substitution away from leisure, whereas a wage increase leads women to substitute away from both leisure and home work.

On our elasticity estimates, Killingsworth and Heckman (1986, pp. 185, 193) write:

The main exception to these generalizations concerns the results of studies of US and Canadian data by Nakamura and Nakamura (1981b), Nakamura, Nakamura and Cullen (1979) and Robinson and Tomes (1985). Here, the uncompensated elasticity of labor supply with respect to wages is negative .... it is tempting simply to dismiss such results as mere anomalies ....

Commenting on the evolution of this empirical literature and a later 1981 survey article by Heckman, Killingsworth and MaCurdy, Berndt (1991, pp. 634-634) writes:

Not all labor econometricians agree with Heckman, Killingsworth, and MaCurdy's assessment. In particular, already in the late 1970s, Alice Nakamura and Masao Nakamura reported results of a second-generation study that found female labor supply to be basically unresponsive to changes in wage rates, similar to the findings reported by others for males .... Additional supporting findings were presented later in Nakamura and Nakamura (1981b, 1985a, b). This controversy about whether males and females respond differently to wage rate changes is of considerable interest ....


Doubts were also raised about whether the instrumental wage variables were capturing the relevant wage variations, from the perspective of labor supply behavior. The explanatory variable that accounted for most of the explained variation in the auxiliary wage equations was years of schooling: a variable that remains fixed in value for most adults over long numbers of years. Doubts were raised also as to whether the education and certain other variables in the auxiliary wage equations were truly exogenous. It is problems of these sorts that stimulated the interest of applied labor economists in endogeneity tests.

In 1973 and 1974 papers, Wu presented tests for endogeneity based on his test statistics $T_1$, $T_2$, $T_3$ and $T_4$. Hausman (1978) presented another endogeneity test - one that appeared to be far more convenient to implement than those of Wu. Nakamura and Nakamura (1981a) showed that the OLS-IV specialization of Hausman's endogeneity test approach is identical to Wu's $T_2$ test for the linear class of models specified in Wu's (1973, 1974) papers.⁷ Nakamura and Nakamura also showed the exact relationships of the Hausman and Wu tests to the endogeneity tests of Durbin (1954), Revankar and Hartley (1973), and Revankar (1978).

Basic statistical properties of the endogeneity tests cited above are explored in Section 5. This material is important because these endogeneity tests are still widely used, and some findings that are based on them are important. For example, assertions that variables for past work experience must be instrumented in models for female labor supply are usually backed up by citations to an influential paper by Mroz (1987) that makes extensive use of Wu-Hausman type tests.⁸ Mroz's objectives in his paper were to address endogeneity concerns raised by others, and to try to narrow the wide range of estimates in the published literature for the income and the substitution effects of the wage variable on the labor supply of married women. Mroz explains:

Everyone familiar with the past ten years' research on empirical models of female labor supply is aware of the wide range of estimated income and substitution effects .... The estimates presented ... demonstrate the sensitivity of the wage and income coefficients to minor variations in the variables used to instrument the wage rate .... Notice that estimates using the set of instrumental variables with the wife's market experience .... yield larger wage responses than the rows without this set of instruments .... This suggests a possible specification error. To test for such errors, we apply variants of the specification tests proposed by Durbin (1954), Wu (1973), Hausman (1978), and White (1982). (Mroz 1987, pp. 765-773)

⁷ Much of the proof in Nakamura and Nakamura (1981a) was provided in an anonymous Econometrica referee report as a replacement for a more cumbersome proof in the original paper. We are grateful now to be able to acknowledge this help from Adrian Pagan.

⁸ For specifics of his endogeneity tests, see Mroz (1987, pp. 773-774, 796-798).


Methodological differences in econometrics are of much greater interest when the differences in perspective lead to important differences in reported applied findings. This is certainly the case with the use of endogeneity tests in research on female labor supply!

5. The properties of the Wu-Hausman endogeneity test

To understand whether endogeneity testing provides a general solution to the problem of deciding when to instrument right-hand variables suspected of being endogenous, it is necessary to examine the formal statistical properties of the tests.

As already noted, Wu (1973, 1974) presents the endogeneity test statistics $T_1$, $T_2$, $T_3$ and $T_4$. He finds his $T_2$ test to be the best. Hausman (1978) presents two asymptotically equivalent statistics which differ only in that one uses an IV estimator of the standard error of the regression while the other uses the estimator obtained from application of OLS. Hausman establishes most of his theoretical results for the former of these. We have shown that this statistic of Hausman's, which is Durbin's (1954) statistic, is identical to Wu's $T_3$ or $T_4$ depending on specifics of the estimator for the standard error of the regression. The second of Hausman's statistics (Hausman 1978, p. 1259) is the one that has been used most in applied studies, and that we have shown to be identical to Wu's $T_2$. We refer to it as the Wu-Hausman statistic since it was Hausman who presented it in a form that led to its widespread use but Wu who presented it first.

Both Wu (1973) and Hausman (1978) examine the asymptotic, local power properties of their various test statistics, where by local what is meant is that the departure from the null hypothesis tends to zero as the sample size (n) goes to infinity. However, the alternatives that are relevant in applied settings are rarely local in nature. More relevant comparisons of the test statistics can be carried out using the exact distributional results of Kariya and Hodoshima.

Under the assumption of normal disturbance terms, Kariya and Hodoshima (1980) derive the exact conditional distributions of the Wu-Hausman statistic and of the closely related Revankar (1978) statistic. They show that both these test statistics obey noncentral conditional $F$ distributions. The distribution for the Revankar statistic, $RV$, is characterized by a single noncentral parameter, $\delta_1$. The Wu-Hausman statistic, $T_2$, is shown to obey a doubly noncentral $F$ distribution with the noncentral parameters $\delta_1$, as for the distribution of the Revankar statistic, and $\delta_2$. One hindrance to using these results of Kariya and Hodoshima is that their expressions for $\delta_1$ and $\delta_2$ are difficult to interpret. Building on their results, in the following subsection and the appendix we derive alternative expressions that can be more easily understood. We use these alternative expressions to examine properties of the Wu-Hausman and related endogeneity tests.


5.1. The conditional distributions of the Wu-Hausman and Revankar statistics

Consider the general linear two-equation model with additive disturbances given by⁹

$$y_1 = \alpha_1 y_2 + Z_1 \alpha_2 + \lambda_1 u_1 \tag{1a}$$

and

$$y_2 = Z_1 \beta_1 + Z_2 \beta_2 + v_2, \tag{1b}$$

where $y_1$ and $y_2$ are $(n \times 1)$ vectors of observations on two endogenous variables; $Z_1$ and $Z_2$ are $(n \times K_1)$ and $(n \times K_2)$ matrices of observations on $K_1$ exogenous variables included and $K_2$ exogenous variables excluded from the structural equation (1a) for the primary endogenous variable $y_1$; $v_2 = \lambda_2 u_1 + \lambda_3 u_2$ where $u_1$ and $u_2$ are $(n \times 1)$ vectors of random disturbances which are independently normally distributed with means of 0 and variances of 1; and $\alpha_1$, $\alpha_2$, $\beta_1$, $\beta_2$, $\lambda_1$, $\lambda_2$ and $\lambda_3$ are unknown parameters or parameter vectors of appropriate dimensions. The population correlation between $\lambda_1 u_1$ and $v_2$ is $\rho = \lambda_2/(\lambda_2^2 + \lambda_3^2)^{1/2}$. Let the reduced form equation for $y_1$ be

$$y_1 = Z_1 \Pi_1 + Z_2 \Pi_2 + v_1, \tag{2}$$

where $v_1 = \lambda_1 u_1 + \alpha_1 v_2$. Then we have:

$$\begin{aligned}
\Omega_{11} &= \mathrm{Var}(v_1) = \lambda_1^2 + 2\lambda_1\lambda_2\alpha_1 + \alpha_1^2(\lambda_2^2 + \lambda_3^2), \\
\Omega_{21} &= \mathrm{Cov}(v_1, v_2) = \lambda_1\lambda_2 + \alpha_1(\lambda_2^2 + \lambda_3^2), \\
\Omega_{22} &= \mathrm{Var}(v_2) = \lambda_2^2 + \lambda_3^2, \qquad \Omega_{12} = \Omega_{21}, \\
\Omega_{11.2} &= \{\Omega_{22}\Omega_{11} - (\Omega_{12})^2\}/\Omega_{22} = \lambda_1^2\lambda_3^2/(\lambda_2^2 + \lambda_3^2).
\end{aligned}$$

We denote by $S_{11}$, $S_{12}$, $S_{22}$, $\hat{\beta}_2$ and $\hat{\Pi}_2$ the OLS estimators of $\Omega_{11}$, $\Omega_{12}$, $\Omega_{22}$, $\beta_2$ and $\Pi_2$, respectively. The number of included endogenous explanatory variables in the structural equation (1a) for $y_1$ is $G_2 = 1$. Eq. (1b) is a reduced form instrumental equation for $y_2$. The variable $y_2$ could be either a second primary endogenous or an endogenous control variable. The structural equation (1a) for $y_1$ is assumed to be identified. (The variance of the IV or two-stage least squares estimator for $\alpha_1$ in (1a) exists if $K_2 - 1 \geq 2$.)

The Wu-Hausman and related endogeneity tests can be viewed as tests of either $H_0$: $\mathrm{Cov}(\lambda_1 u_1, v_2) = 0$ or of $H_0$: $\rho = 0$.¹⁰ $\mathrm{Cov}(\lambda_1 u_1, v_2)$ is the population covariance between $\lambda_1 u_1$, the disturbance term of the structural equation for $y_1$, and $v_2$, the disturbance term of the auxiliary reduced form equation for $y_2$. This covariance is the numerator of $\rho$, the population correlation between $\lambda_1 u_1$ and $v_2$. So $\rho$ will be zero if and only if the covariance is zero.
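The link between the covariance and $\rho$ follows in a few lines from the definition $v_2 = \lambda_2 u_1 + \lambda_3 u_2$ and the assumption that $u_1$ and $u_2$ are independent with unit variances. The sign convention $\lambda_1 > 0$ in the last line is an assumption made here so that the expression matches the formula for $\rho$ given with the model:

```latex
\begin{align*}
\operatorname{Cov}(\lambda_1 u_1, v_2)
  &= \operatorname{Cov}(\lambda_1 u_1,\ \lambda_2 u_1 + \lambda_3 u_2) = \lambda_1 \lambda_2, \\
\operatorname{Var}(\lambda_1 u_1) &= \lambda_1^2, \qquad
\operatorname{Var}(v_2) = \lambda_2^2 + \lambda_3^2, \\
\rho &= \frac{\lambda_1 \lambda_2}{|\lambda_1|\,(\lambda_2^2 + \lambda_3^2)^{1/2}}
      = \frac{\lambda_2}{(\lambda_2^2 + \lambda_3^2)^{1/2}} \qquad (\lambda_1 > 0).
\end{align*}
```

With $\lambda_1 \neq 0$, both the covariance and $\rho$ vanish exactly when $\lambda_2 = 0$, that is, when the disturbance of the auxiliary equation contains no component of $u_1$.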

For the given model, the Wu-Hausman statistic may be written as¹¹

$$T_2 = c_1 (Q^*/Q_2) \tag{3}$$

where $c_1 = (n - K_1 - 2G_2)/G_2$,

$$Q^* = (b_1 - b_2)'\left[(y_2' A_2 y_2)^{-1} - (y_2' A_1 y_2)^{-1}\right]^{-1}(b_1 - b_2),$$

$b_i = (y_2' A_i y_2)^{-1} y_2' A_i y_1$ for $i = 1, 2$, $A_1 = I - Z_1(Z_1'Z_1)^{-1}Z_1'$, $A_2 = Z(Z'Z)^{-1}Z' - Z_1(Z_1'Z_1)^{-1}Z_1'$, $Z = (Z_1, Z_2)$, and $Q_2 = S_{11.2} + q_2$, and where

$$q_2 = \hat{\Pi}_2'\left[A_{22}^{-1} - A_{22}^{-1}\hat{\beta}_2(\hat{\beta}_2' A_{22}^{-1}\hat{\beta}_2)^{-1}\hat{\beta}_2' A_{22}^{-1}\right]\hat{\Pi}_2,$$

$A_{22} = [Z_2'Z_2 - Z_2'Z_1(Z_1'Z_1)^{-1}Z_1'Z_2]^{-1}$, and $S_{11.2} = S_{11} - S_{12}S_{22}^{-1}S_{21}$. Using this same notation, Revankar's statistic may be written as

$$RV = c_3 (Q^*/S_{11.2}) \tag{4}$$

where¹² $c_3 = (n - K - G_2)/G_2$ and $K = K_1 + K_2$.

⁹ It is possible as well to prove these results for a general multi-equation system.

¹⁰ See Nakamura and Nakamura (1985c, pp. 215-216, 220-226) and Nakamura et al. (1990, pp. 100-103) on the choice of the null hypothesis in endogeneity testing.

¹¹ See Nakamura and Nakamura (1981a) for the identity of Wu's $T_2$ and the Hausman statistic. See Kariya and Hodoshima (1980, p. 47, Eqs. (3.16) and (3.18)) for expressions in (3) and (4).

Kariya and Hodoshima (1980, p. 48, Eq. (3.31)) show that the distributions of the Wu-Hausman statistic, $T_2$, and Revankar's $RV$, conditional on the values for $\hat{\beta}_2$ and $S_{22}$, are given by

$$T_2 \mid (\hat{\beta}_2, S_{22}) \sim F''(\delta_1, \delta_2;\ G_2,\ n - K_1 - 2G_2) \tag{5a}$$

and

$$RV \mid (\hat{\beta}_2, S_{22}) \sim F'(\delta_1;\ G_2,\ n - K - G_2). \tag{5b}$$

$F''$ and $F'$ are the doubly noncentral and noncentral $F$ distributions, respectively.

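In the appendix, we show that $\delta_1$ in (5a) and (5b) can be expressed as

$$\delta_1 = (\rho^2/\lambda_3^2)\, S_{22}\, R^2_{y_2 - Z_1\beta_1 \cdot Z_2}\, (\text{C.F.}), \qquad (6a)$$

or, for a large $n$, as

$$\delta_1 \approx [\rho^2/(1 - \rho^2)]\,(n - K)\, R^2_{y_2 - Z_1\beta_1 \cdot Z_2}. \qquad (6b)$$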

In (6a), $\rho^2$ is the square of the population correlation between the true error terms for the structural equation (1a) for $y_1$ and for the auxiliary equation (1b) for $y_2$; the parameter $\lambda_3^2$ is the population variance of the component of the true error term for the equation for $y_2$ that is independent of the true error term for equation (1a) for $y_1$; the statistic $S_{22}$ is the OLS estimator of the population variance of the true error term for (1b); and $R^2_{y_2 - Z_1\beta_1 \cdot Z_2}$ is the $R^2$ from the regression of $y_2 - Z_1\beta_1$ on $Z_2$, where $Z_1$ denotes the $K_1$ exogenous variables included in both (1a) and (1b) while $Z_2$ denotes the $K_2$ exogenous variables included in (1b) but excluded from (1a). C.F. approaches 1 as $n$ goes to infinity.
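To get a feel for the magnitudes implied by the large-$n$ expression (6b), the following sketch evaluates $\delta_1$ for a few values of the partial $R^2$. The function name `delta1_large_n` and the sample size, $K$, and $\rho^2$ values are hypothetical choices for illustration, not values taken from the paper.

```python
def delta1_large_n(rho_sq, n, K, partial_r2):
    """Large-n approximation (6b) to the first noncentral parameter."""
    if not 0.0 <= rho_sq < 1.0:
        raise ValueError("rho_sq must lie in [0, 1)")
    if not 0.0 <= partial_r2 <= 1.0:
        raise ValueError("partial_r2 must lie in [0, 1]")
    return rho_sq / (1.0 - rho_sq) * (n - K) * partial_r2


# Illustrative values only: rho^2 = 0.25, n = 500, K = 10.
for r2 in (0.02, 0.10, 0.30):
    print(r2, round(delta1_large_n(0.25, 500, 10, r2), 2))
```

Holding $\rho^2$ and $n$ fixed, $\delta_1$ in this approximation is proportional to the partial $R^2$, so a weak auxiliary equation shrinks $\delta_1$ in the same proportion.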

$^{12}$ $S_{11.2}$ in the denominator of RV as given in (4) is the residual sum of squares from the regression of the OLS residuals from the reduced form equation for $y_1$ on the OLS residuals from the reduced form equation for $y_2$. $Q_2$ in the denominator of $T_2$ as given in (3) is the residual sum of squares from the regression of $y_1$ on $y_2$, $Z_1$ and the OLS residuals from the reduced form equation for $y_2$. Thus, $q_2$ is the amount by which the residual sum of squares from the regression of $y_1$ on $y_2$, $Z_1$ and the OLS residuals from the reduced form equation for $y_2$ exceeds the residual sum of squares from the regression of the OLS residuals from the reduced form equation for $y_1$ on the OLS residuals from the reduced form equation for $y_2$. The quantity $Q^*$ may also be interpreted as the difference between the residual sum of squares when OLS is applied to (1a) and the residual sum of squares when $y_1$ is regressed on $y_2$, $Z_1$ and the OLS residuals from the reduced form equation for $y_2$.

For the second noncentral parameter for the distribution of the Wu-Hausman statistic, it is shown in the appendix that

$$\operatorname{plim}_{n \to \infty} (\delta_2/n) = 0, \qquad (7)$$

and that the Wu-Hausman and Revankar tests are consistent tests. A correct proof of the consistency of these tests does not seem to have been given before in the literature.

5.2. Finite sample comparisons of the Wu-Hausman and Revankar tests

From the appendix expression (A.4) it can be seen that the second noncentral parameter for the Wu-Hausman statistic, $\delta_2$, will always be positive when $K_2 > G_2$. The $T_2$ test will tend to be more powerful than the RV test due to the effect of the associated degrees of freedom, but the RV test will tend to be more powerful than the $T_2$ test due to the effects of $\delta_2$ on the distribution for $T_2$.$^{13}$ However, from expression (A.5) in the appendix, it can be seen that $\delta_2$ cannot increase faster in probability than $\delta_1$ as $n$ increases. This suggests that the properties of the finite sample distribution of the Wu-Hausman statistic are primarily determined by $\delta_1$, a conjecture that is supported by Monte Carlo results.$^{14}$ These Monte Carlo experiments also show that the net effects of $\delta_2$ and of the difference in degrees of freedom on the relative powers of the Wu-Hausman and Revankar tests are typically small. When Eq. (1a) for $y_1$ is just identified, the $T_2$ and RV test statistics are identical, and both have distributions that depend only on $\delta_1$.

Using methods parallel to our analysis of the Wu-Hausman and Revankar tests, it can be shown that the distributions of all of the endogeneity test statistics that were mentioned in Section 4 are primarily determined by $\delta_1$. These results focus attention on expressions (6a) and (6b).

5.3. Poor power for endogeneity tests when the auxiliary instrumental equation is weak

A result that emerges from the expressions for $\delta_1$ and the role of this noncentral parameter in determining the power properties of the Wu-Hausman and related endogeneity tests is that their power rises with the proportion of the variability in the included endogenous variable that is explained by the exogenous variables excluded from the structural equation.

$^{13}$ Because of the properties of the doubly noncentral $F$ distribution (see Lehmann, 1959), the conditional power of the Wu-Hausman test is a strictly increasing function of $\delta_1$ but a strictly decreasing function of $\delta_2$ to the extent that $\delta_1$ and $\delta_2$ can take on independent values.

$^{14}$ We are referring to our own unpublished Monte Carlo results and to Thurman (1986).

More specifically, the power is an increasing function of $R^2_{y_2 - Z_1\beta_1 \cdot Z_2}$, which appears in (6a) and (6b). We can express this partial $R^2$ as

$$R^2_{y_2 - Z_1\beta_1 \cdot Z_2} = \frac{R^2_{y_2 \cdot Z_1, Z_2} - R^2_{y_2 \cdot Z_1}}{1 - R^2_{y_2 \cdot Z_1}},$$

where $R^2_{y_2 \cdot Z_1}$ is the usual overall $R^2$ for the regression of $y_2$ on $Z_1$ and $R^2_{y_2 \cdot Z_1, Z_2}$ is the $R^2$ for the regression of $y_2$ on $Z_1$ and $Z_2$. The minimum value that $R^2_{y_2 - Z_1\beta_1 \cdot Z_2}$ can take on is zero, in which case $R^2_{y_2 \cdot Z_1, Z_2} = R^2_{y_2 \cdot Z_1}$ and (1a) is unidentified. The maximum possible value for $R^2_{y_2 - Z_1\beta_1 \cdot Z_2}$ is $R^2_{y_2 \cdot Z_1, Z_2}$, which is the $R^2$ for the instrumental equation for $y_2$. When this $R^2$ is low, the power of the Wu-Hausman, or any of the other closely related endogeneity tests, will tend to be low.$^{15}$
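The partial $R^2$ formula and its two bounds can be checked with a few lines of arithmetic. This is a minimal sketch; the function name `partial_r2` and the input values are illustrative assumptions.

```python
def partial_r2(r2_full, r2_z1):
    """Partial R^2 of Z2 for y2.

    r2_full: R^2 from regressing y2 on Z1 and Z2.
    r2_z1:   R^2 from regressing y2 on Z1 alone.
    """
    if not 0.0 <= r2_z1 <= r2_full <= 1.0 or r2_z1 == 1.0:
        raise ValueError("need 0 <= r2_z1 <= r2_full <= 1 and r2_z1 < 1")
    return (r2_full - r2_z1) / (1.0 - r2_z1)


# Lower bound: Z2 adds nothing, so the partial R^2 is zero.
print(partial_r2(0.20, 0.20))
# Upper bound: Z1 explains nothing, so the partial R^2 equals the overall R^2.
print(partial_r2(0.20, 0.00))
# An intermediate, hypothetical case.
print(round(partial_r2(0.20, 0.15), 4))
```

Even a respectable-looking overall $R^2$ for the instrumental equation can therefore correspond to a very small partial $R^2$ when most of the fit comes from the included exogenous variables $Z_1$.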

Weak auxiliary equations are commonplace in labor economics research based on micro level data. (See Revankar and Yoshino (1990) for a macro data example where this is so as well.) Mroz (1987, p. 770) reports $R^2$ values for instrumental wage equations ranging from 0.15 to 0.23. The $R^2$ values for auxiliary equations for child status variables are often even lower. In situations like these where a pretest for evaluating a potential endogeneity problem is particularly needed, acceptances of a null hypothesis of no endogeneity are likely to be Type II errors. The use of a Wu-Hausman type endogeneity test provides the appearance, but not the reality, of rigorous investigation of whether it is reasonable to use OLS rather than IV estimation results.

5.4. Endogeneity tests and endogeneity bias

Because the distribution of $T_2$ is primarily determined by $\delta_1$, we see from (6a) and (6b) that the power of the Wu-Hausman test will rise as $\rho^2$ rises, where

$$\rho^2 = \frac{[\mathrm{Cov}(\lambda_1 u_1, \lambda_2 u_1 + \lambda_3 u_2)]^2}{\mathrm{Var}(\lambda_1 u_1)\,\mathrm{Var}(\lambda_2 u_1 + \lambda_3 u_2)} = \frac{\lambda_2^2}{\lambda_2^2 + \lambda_3^2}. \qquad (8)$$

Recall that $\lambda_1 u_1$ is the true error term for Eq. (1a) for $y_1$, and $v_2 = \lambda_2 u_1 + \lambda_3 u_2$ is the error term for Eq. (1b) for $y_2$. Nonzero values of $\rho^2$ are the root source of the OLS bias problem. However, the value of $\rho^2$ is not the sole determinant of the size of the OLS bias. Using the result given in Nakamura and Nakamura (1985c, p. 215, Eq. (4)) and the given properties of $u_1$ and $u_2$, the population value of the OLS bias can be expressed as

$$B = \operatorname{plim}(b_1 - \alpha_1) = \frac{\mathrm{Cov}(\lambda_1 u_1, \lambda_2 u_1 + \lambda_3 u_2)}{\operatorname{plim}(1/n)(y_2'A_1y_2)} = \frac{\lambda_1\lambda_2}{\operatorname{plim}(1/n)(y_2'A_1y_2)}, \qquad (9)$$

$^{15}$ On this topic, see, for example, Cragg and Donald (1993).

where $b_1$ is the OLS estimator of $\alpha_1$ in (1a), and where $y_2'A_1y_2$ is the sum of squared residuals from the regression of $y_2$ on $Z_1$. It can be shown that $B$ depends on $\lambda_1$, $\lambda_2$ and $\lambda_3$ whereas $\rho$ depends on $\lambda_2$ and $\lambda_3$ but not $\lambda_1$ (as is also the case for both $\delta_1$ and $\delta_2$).
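The different roles of the $\lambda$ parameters in (8) and (9) can be seen in a small calculation. The sketch assumes, as the second equality in (8) implies, that $u_1$ and $u_2$ are independent with unit variances, and it further assumes that the probability limit of $(1/n)(y_2'A_1y_2)$ can be written as a hypothetical explained-variance term plus $\lambda_2^2 + \lambda_3^2$; the function names and numerical values are illustrative only.

```python
def rho_squared(lam2, lam3):
    """Squared error correlation from (8); it does not involve lambda_1."""
    return lam2 ** 2 / (lam2 ** 2 + lam3 ** 2)


def ols_bias(lam1, lam2, lam3, explained_var):
    """Population OLS bias B from (9).

    explained_var is a hypothetical input: the variance of the part of y2
    explained by Z2 after partialling out Z1, so that the assumed limit of
    (1/n) y2'A1y2 is explained_var + lam2**2 + lam3**2.
    """
    return lam1 * lam2 / (explained_var + lam2 ** 2 + lam3 ** 2)


# Vary lambda_1 with lambda_2 and lambda_3 held fixed (illustrative values).
for lam1 in (0.5, 1.0, 4.0):
    print(lam1, rho_squared(0.5, 1.0), round(ols_bias(lam1, 0.5, 1.0, 0.25), 4))
```

In this calculation $\rho^2$ stays at the same value on every line while $B$ scales one for one with $\lambda_1$.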

From the expressions for $\rho$ and $B$, we see that $\rho = 0$ if and only if $\lambda_2 = 0$, and when $\lambda_2 = 0$ we also have $B = 0$. Hence $H_0^*$: $B = 0$ will always be true when $H_0'$: $\rho = 0$ and $H_0$: $\mathrm{Cov}(\lambda_1 u_1, v_2) = 0$ are true. But the power of the Wu-Hausman test will not be affected by changes in the value of $\lambda_1$, and hence in the (nonzero) value of $B$, so long as $\lambda_2$ and $\lambda_3$ are fixed. Monte Carlo evidence reveals that when $\rho$ and also the explanatory power of the instrumental equation are low, the power of the test of $H_0^*$: $B = 0$ (and also of $H_0'$: $\rho = 0$ and of $H_0$: $\mathrm{Cov}(\lambda_1 u_1, v_2) = 0$) can be low despite values of $B$ that are arbitrarily large relative to $\alpha_1$, the true coefficient of the endogenous explanatory variable. (See Nakamura and Nakamura, 1985c, pp. 220-221.) Unfortunately, it is $B$ rather than $\rho$ that is of real interest in most applied contexts where there are concerns about the possible endogeneity of included explanatory variables.$^{16}$

The importance of this is that tests of endogeneity are usually applied when researchers have strong a priori reasons for believing that $\rho$, and hence $B$, are nonzero. What most applied researchers who use Wu-Hausman type tests are trying to do is heed Durbin's (1954, p. 27) advice: "Since the use of an instrumental variable involves a certain loss of efficiency one should feel rather cautious about using it until the extent of the bias in the ordinary least-squares estimators has been investigated". Christ (1966, pp. 157-158) offers similar advice:

... there is no point in the enlargement of most models at which a convincing stand can be made against such arguments for the addition of another equation - unless it is the point where all possible variables have already been included, and of course the model would then be utterly unmanageable. What the economist should do in practice, therefore, in my opinion, is to stop adding equations and variables when he believes that the variables he chooses to call exogenous meet the definition closely enough so that the errors incurred through the discrepancy are small in comparison with the degree of accuracy that he thinks is desirable for his purpose (or is attainable). This is necessarily a somewhat arbitrary decision.

If an exogeneity test is to be used as a basis for deciding when $B$ is 'close enough' to zero, it is important that the power of the test reliably rises as the departure of $B$ from zero increases in size. This is not the case for the Wu-Hausman and the other related tests of endogeneity.

$^{16}$ Hausman and Taylor (1981) note: 'It appears in practice ... that $H_0$ is frequently tested in situations where we can infer from the subsequent actions taken that the hypothesis $H_0^*$ was intended ...' (p. 13). However, they do not consider the differences in power that we do for tests of $H_0^*$: $B = 0$ versus $H_0$: $\mathrm{Cov}(\lambda_1 u_1, v_2) = 0$ or $H_0'$: $\rho = 0$.

The perverse power properties of these endogeneity tests are rooted in problems with properly standardizing the difference $(b_1 - b_2)$ in the numerator of expressions (3) and (4) for $T_2$ and RV, respectively. As defined following (3), $b_2$ is an IV estimator of $\alpha_1$ in (1a). This estimator is consistent whether or not the associated explanatory variable is endogenous; thus, under the maintained assumptions of the model, $\operatorname{plim}(b_2) = \alpha_1$. Since $b_1$ is the OLS estimator of $\alpha_1$, $(b_1 - b_2)$ is a point estimator of $B = \operatorname{plim}(b_1 - \alpha_1)$, the population value of the OLS bias defined in (9). This is so under the null hypothesis $H_0$: $\mathrm{Cov}(\lambda_1 u_1, v_2) = 0$, which is equivalent to $H_0'$: $\rho = 0$ and to $H_0^*$: $B = 0$. It is also so under the corresponding alternative hypotheses. However, the variance estimators for the difference $(b_1 - b_2)$ that are used in forming the various endogeneity test statistics are only consistent under the null hypotheses $H_0$, $H_0'$ and $H_0^*$ (see Durbin (1954, p. 29) or Hausman (1978)).

In labor economics, tests of endogeneity are being used as though the difference $(b_1 - b_2)$ were being properly standardized even if there is some endogeneity. In particular, small values of endogeneity test statistics are often interpreted informally as indications that whatever endogeneity problems exist are not severe. This inference is inappropriate. The Wu-Hausman and related tests are tests of the existence of an endogeneity bias; they do not provide a metric for the seriousness of an existing endogeneity bias problem.
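The point that the test statistic is not a metric for the size of the bias can be illustrated with a small simulation. In the sketch below, which uses an assumed data-generating process and hypothetical parameter values, the same random draws are reused while only $\lambda_1$ is changed; $T_2$ is computed through the residual-sum-of-squares interpretation of $Q^*$ and $Q_2$ given in footnote 12, with $G_2 = 1$.

```python
import numpy as np


def resid(y, X):
    """Residuals from an OLS regression of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef


def t2_and_gap(lam1, seed=0, n=400):
    """Return (T2, b1 - b2) for one simulated sample with the given lambda_1."""
    rng = np.random.default_rng(seed)
    Z1 = np.column_stack([np.ones(n), rng.normal(size=n)])
    Z2 = rng.normal(size=(n, 2))
    u1, u2 = rng.normal(size=n), rng.normal(size=n)
    # lambda_2 = 0.5 and lambda_3 = 1.0 are held fixed (illustrative values).
    y2 = Z1 @ np.array([0.5, 1.0]) + Z2 @ np.array([0.4, 0.3]) + 0.5 * u1 + u2
    y1 = 1.0 * y2 + Z1 @ np.array([1.0, -0.5]) + lam1 * u1
    v2 = resid(y2, np.hstack([Z1, Z2]))          # reduced form residuals for y2
    X_ols = np.column_stack([y2, Z1])
    e_ols = resid(y1, X_ols)
    e_aug = resid(y1, np.column_stack([X_ols, v2]))
    q2 = e_aug @ e_aug                           # Q2
    q_star = e_ols @ e_ols - q2                  # Q*
    t2 = (n - Z1.shape[1] - 2) * q_star / q2     # c1 * Q*/Q2 with G2 = 1
    b1 = np.linalg.lstsq(X_ols, y1, rcond=None)[0][0]
    X_iv = np.column_stack([y2 - v2, Z1])        # fitted y2 replaces y2
    b2 = np.linalg.lstsq(X_iv, y1, rcond=None)[0][0]
    return t2, b1 - b2


for lam1 in (0.5, 1.0, 4.0):
    t2, gap = t2_and_gap(lam1)
    print(f"lambda_1={lam1}: T2={t2:.4f}, b1-b2={gap:.4f}")
```

With the draws held fixed, the estimated bias $(b_1 - b_2)$ in this design scales with $\lambda_1$ while $T_2$ is unchanged apart from rounding, so a given value of the statistic is compatible with very different bias magnitudes.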

5.5. The hazard of endogenous instruments

A further result that emerges from examination of expressions (3) and (4) is that the $T_2$ and RV tests are only valid when suitable instrumental variables can be found - suitable in at least the minimal sense that these instruments are independent of the model disturbance terms. Otherwise, it may not be true that $\operatorname{plim}(b_2) = \alpha_1$, and $(b_1 - b_2)$ may be an inconsistent estimator of $B$. Pretesting of the Wu-Hausman variety replaces a priori speculation about the possible endogeneity of the explanatory variable that is the object of the pretest with a priori speculation about the possible endogeneity of the variables in the whole set of instruments to be used in carrying out the endogeneity test. We must ask what is gained by this.

5.6. Rejection of a pretest mixed estimation approach

The properties of Wu-Hausman type tests lead us to reject a mixed estimation strategy, with the decision to use OLS versus IV results depending on the outcome of an exogeneity test. A first reason is that, as shown in Section 5.3, when the efficiency losses from using IV are large, the power of Wu-Hausman type pretests is likely to be low. The tests are weak when they are needed most. Second, as was shown in Section 5.4, these are tests for the existence, not the seriousness, of endogeneity bias. They do not provide an appropriate metric for judging the seriousness of an endogeneity bias problem, which is what applied researchers are usually concerned about when trying to choose between OLS and IV results. A third issue that was noted in Section 5.5 above is that Wu-Hausman type tests are of no use in situations where a researcher is concerned that, in addition to efficiency losses, instrumentation may not fully solve the problem of endogeneity bias because the available instruments are not truly exogenous.

6. Dealing with endogeneity without pretests

Without endogeneity pretesting, we are back in the position of having to make variable by variable judgements as to which of the explanatory variables in a model should be instrumented without the benefit of any generally accepted and replicable decision rule. In this situation, some recommend always instrumenting explanatory variables suspected of being endogenous. These researchers argue that appropriate instruments can almost always be found, and that bias and inconsistency problems are inherently more serious than efficiency losses. As evidence that appropriate instruments can be found, supporters of always instrumenting endogenous variables often cite the growing literature on natural experiments.

6.1. Rejection of an 'always instrument' policy

We reject the 'always instrument' policy. A natural experiment rarely provides exogenous variation for more than one of the included explanatory variables in a model. Natural experiments cannot solve all the endogeneity problems of, say, a labor economist interested in estimating an equation for annual hours of work that contains a wage variable, a work experience variable, an education variable, a husband's income variable, and several child status variables, all of which are probably endogenous to some degree under normal circumstances. Even variables like years of age or an indicator for being nonwhite that are often said to be unassailably exogenous are not really so. No one works more or less simply because another birthday has passed by, or because of being nonwhite. In a labor supply model, variables for age and race are serving as proxies for related circumstances and behavior, some of which are probably endogenous. Thus we reject the 'always instrument' policy because we regard it as infeasible.

Second, with endogenous control variables we reject an 'always instrument' policy because of the resulting diversion of effort. For example, the decision to instrument the child status variables in a labor supply equation means that attention must be diverted away from trying to understand how children affect the labor supply behavior of their mothers to the side issue of how to specify auxiliary child status equations. A related problem of an 'always instrument' policy is that it encourages applied researchers to limit the variables they include in their models so as to avoid difficult instrumentation problems. Also, efforts to find truly exogenous instruments have the unfortunate tendency to push researchers toward the use of substantively irrelevant instruments.

A third reason we oppose an 'always instrument' policy is that there is usually little real evidence that the instruments that are used are exogenous themselves. This was one of our reasons as well for rejecting a Wu-Hausman pretest estimation strategy.

A related fourth reason we oppose an 'always instrument' policy has to do with the simplified nature of economic models. In instrumentation, variables are usually viewed as homogeneous. Consider a wage variable, $w$, which appears in a labor supply equation with a true coefficient $\beta_w$. As before, let $\hat{w}$ denote the predicted values of $w$ from an auxiliary IV equation. Usually we assume that if $\hat{w}$ is substituted for $w$ in the labor supply equation, then $\beta_w$ is the true coefficient of the predicted wage variable too. But $\hat{w}$ will only reflect those components of the variation in $w$ captured by the explanatory variables in the auxiliary wage equation.

People's actual wages change for reasons such as improvements in qualifications leading to merit increments or promotions, or switches to better jobs; geographical moves; job loss; seniority wage contract provisions; personal problems requiring reductions in job responsibilities; and so on. In addition, wages differ from worker to worker because of inter-employer differences in compensation practices, differing economic conditions at the time of initial hiring, statistical discrimination, and other such factors. Different choices of instruments will pick up different components of the variation in a wage variable even if all the instruments are truly exogenous, and the 'true' coefficients associated with these different components may differ. Because of this, our general belief is that multiple instruments should be experimented with and reported for important right-hand side endogenous variables, and that the results for direct estimation should be reported as well. For less important explanatory variables - endogenous control variables for which the researcher is unable or unwilling to spend the effort to check out relevant alternative instruments - we are against instrumentation. In general, we are against claims of parameter consistency based on poorly supported instrumentation attempts.
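A stylized simulation can illustrate how two valid instruments may recover different coefficients when a regressor is not homogeneous. In this sketch the wage is the sum of two exogenously shifted components and an unobserved component, and hours respond differently to each; every variable name and coefficient is a hypothetical assumption chosen for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

z_a = rng.normal(size=n)   # hypothetical instrument shifting one wage component
z_b = rng.normal(size=n)   # hypothetical instrument shifting another component
e = rng.normal(size=n)     # unobserved wage component, also in the hours error

w = z_a + z_b + e
# Assumed responses: 1.0 to the z_a component, 0.2 to the z_b component.
hours = 1.0 * z_a + 0.2 * z_b + 0.6 * e + rng.normal(size=n)


def iv_slope(y, x, z):
    """Simple IV slope with one instrument: Cov(z, y) / Cov(z, x)."""
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]


def ols_slope(y, x):
    """Simple OLS slope: Cov(x, y) / Var(x)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


iv_a = iv_slope(hours, w, z_a)
iv_b = iv_slope(hours, w, z_b)
ols = ols_slope(hours, w)
print(f"IV with z_a: {iv_a:.3f}  IV with z_b: {iv_b:.3f}  OLS: {ols:.3f}")
```

Both instruments are exogenous by construction, yet each IV estimate tracks the response to its own wage component, which is one reason for reporting results for several instrument sets alongside direct estimation.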

6.2. Possible lessons from research in epidemiology

Economics and epidemiology have a great deal in common in terms of pervasive possibilities for endogenous feedback and co-determination of outcome variables. The difficulties are similar, but the research approaches adopted for dealing with these difficulties differ.

Consider the early medical case history findings that seemed to indicate an association between high cholesterol levels and cardiovascular disease. There were suggestions that these results might be partially, or even entirely, due to the fact that those with higher cholesterol levels also, on average, weigh more, exercise less, and smoke more. These concerns might lead an applied econometrician to begin thinking about regressing some indicator of cardiovascular disease on variables for body weight, amount of exercise per week, and cigarettes smoked per week as well as a cholesterol level measure, though the lack of theories about the underlying behavioral mechanisms would hamper this approach. The econometrician would also be troubled by suggestions that factors that had been ignored in data collection, such as tension or hostility levels and other aspects of eating behavior, could affect the likelihood and severity of cardiovascular disease. Without information on these sorts of factors, concerns might arise about correlations between the equation error term and some of the included explanatory factors affecting cardiovascular disease - much as labor economists such as Schultz (1980) and Mroz (1987) have worried that the taste for work in the true disturbance term for a labor supply equation may be correlated with the child status and various other included explanatory variables.

Why is it that medical journals are not filled with cardiovascular multiple regression models, and with studies reporting Wu-Hausman type test results to ascertain whether certain explanatory variables in cardiovascular disease equations are endogenous? Why are attempts to instrument potentially endogenous explanatory variables hard to find in medical research journals?

In exploring potential causal factors, five distinct questions arise: (1) Is there any detectable effect? (2) How big does the effect seem to be? (3) How big would the effect have to be to be of applied relevance? (4) Is it possible that the apparent effect - or the apparent smallness or lack of an effect - is partially, or even wholly, due to other causal factors? (5) Are there synergistic interactions with other causal factors?

Our impression from the epidemiological literature on the effects of cholesterol on cardiovascular disease is that there is heavy reliance on analysis of variance methods for detecting differences among various groupings of the available data. It seems that a matched samples approach is often used in grouping observations, and that the groupings used in successive studies are progressive in that increasing numbers of factors are taken into account in the matching process. Progressively expanded and retargeted data collection efforts seem to be an integral part of this research approach. Replication of estimation results and prediction also appear to be fundamental to the process by which findings become 'established'.

Our impression is that a progressive research strategy is embarked on in epidemiology without pretending to have fully specified models of the relevant behavioral processes, and without the custom of making unsubstantiated claims about the independence of explanatory factors versus unobserved true error terms. This practice stands in contrast to the stated a priori theory needs of empirical economists as portrayed by the Cowles Commission and Christ (1994, p. 33):

The Cowles view was that to understand a particular aspect of economic behavior, such as the price of food, or aggregate personal consumption, one wanted a system of equations capable of describing it. These equations should contain all relevant observable variables, be of known form (e.g. linear, log-linear, quadratic), and have estimatable coefficients . . . . Little attention was given to how to choose the variables and the form of the equations; it was thought that economic theory would provide this information in each case . . . . The main effort was directed to estimating the equations once they had been formulated.

The Cowles Commission research strategy seems to be to use theory and other a priori knowledge to provide answers to questions (4) and (5) above, and to build these answers into the models to be used in carrying out empirical investigations of questions (1) and (2). (Question (3) above is largely ignored in this approach.) This empirical strategy would be appropriate and effective in labor economics if only we had the needed a priori knowledge.

6.3. Prediction as one means for choosing among alternative estimation results

Most of the approaches we would recommend for dealing with potentially endogenous explanatory variables - these include direct estimation as well as estimation with alternative instrument sets, replication with different existing data sets, and ongoing new data collection to permit the inclusion of new control variables - will lead to competing sets of estimation results. How should we choose among them?

Noting the possibility of poor a priori information, Christ (1966) writes:

On many occasions throughout this book it has been emphasized that in practice the so-called a priori information is far from certain and that it is important to have some means of testing and evaluating this information. (p. 531)

In this situation the economist ... can try out several different theoretically reasonable forms, in a sort of experimental fashion confronting each one with relevant data, and then choose among them after he sees how well they fit the data . . . .

The danger lies in the possibility of being too clever or too persistent, and finding an equation that fits the available data well enough but is nevertheless wrong because it describes temporary or accidental features . . . . The best protection we can have against this danger is to test our equations against data that could not have influenced the choice of the equations. (pp. 8-9)

In cross-section studies ... the sample is typically very much larger than in time series studies. We can therefore divide an available sample into two parts, each containing hundreds or thousands of observations, one part to be used initially to help suggest the form of the model and the other part to be used later as a test of the predictive ability of the model chosen. (p. 548)
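The split-sample idea in the passage above can be sketched as follows: one half of a cross-section is used to estimate competing specifications, and the other half is used only to score their predictions. The simulated data, the choice of OLS versus a two-stage IV fit as the competitors, and root mean squared error as the yardstick are all assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
Z1 = np.column_stack([np.ones(n), rng.normal(size=n)])
Z2 = rng.normal(size=(n, 2))
u1, u2 = rng.normal(size=n), rng.normal(size=n)
y2 = Z1 @ np.array([0.5, 1.0]) + Z2 @ np.array([0.4, 0.3]) + 0.5 * u1 + u2
y1 = 1.0 * y2 + Z1 @ np.array([1.0, -0.5]) + u1

fit, hold = slice(0, n // 2), slice(n // 2, n)   # estimation and holdout halves


def ols(y, X):
    """OLS coefficient vector."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def rmse(y, X, coef):
    """Root mean squared prediction error."""
    return float(np.sqrt(np.mean((y - X @ coef) ** 2)))


X = np.column_stack([y2, Z1])
Z = np.hstack([Z1, Z2])

coef_ols = ols(y1[fit], X[fit])
y2_hat = Z[fit] @ ols(y2[fit], Z[fit])           # first-stage fitted values
coef_iv = ols(y1[fit], np.column_stack([y2_hat, Z1[fit]]))

# Score both coefficient sets on data not used in estimation.
results = {
    "OLS": rmse(y1[hold], X[hold], coef_ols),
    "IV": rmse(y1[hold], X[hold], coef_iv),
}
print(results)
```

When both halves come from the same population, as here, OLS can be expected to predict at least as well as IV even if it is biased as an estimate of the structural coefficient, so comparisons of this kind are likely to be more informative across years, data sets, or demographic groups.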

In our studies of female labor supply behavior we experiment with alternative model specifications, and replicate our results for Canadian census data, US census data, and US Panel Study of Income Dynamics (PSID) data for multiple time periods. Also, in most of our studies, we make use of data samples for multiple demographic groups, and we report direct OLS results and also IV results for alternative instrument sets. Our uncompensated wage elasticity estimates for married women consistently fall in a far narrower range than the range of -0.89 to 15.24 reported by Killingsworth (1983, Table 4.3, pp. 194-199) for so-called second generation studies. In our book The Second Paycheck (Nakamura and Nakamura, 1985a), based on PSID data, we provide the elasticity estimates for a number of different estimation approaches and demographic groups, as summarized in Table 1. These are then compared with the elasticity estimates of others:

[E]stimates of the uncompensated wage elasticity of hours of work for wives from experimental data are found from Killingsworth (1983, Table 6.2, pp. 398-399) to range from -0.36 to 0.94 .... the estimates for wives from experimental data span roughly the same range as the estimates for men from both nonexperimental and experimental data (-0.38 to 0.28). Using nonexperimental data, Nakamura et al. (1979, p. 800) obtain uncompensated wage elasticities of hours of work for married women who work of -0.320 to 0.299; Nakamura and Nakamura (1981b, p. 483) report values of -0.495 to 0.654; and values of -0.197 to -0.030 are reported in Nakamura and Nakamura (1983, p. 246) .... Also, using yet another data set for Canada with an improved wage variable Chris Robinson and Nigel Tomes (1985) have obtained results for married women that support our findings. (Nakamura and Nakamura, 1985a, pp. 181-183)

Having documented the effects of differences in estimation methods on the uncompensated wage elasticity, we also present a wide variety of in-sample and out-of-sample predictive evaluations involving cross-sectional, year-to-year, and cumulative comparisons.$^{17}$ Interested readers are referred to Nakamura and Nakamura (1985a), as well as Nakamura and Nakamura (1992) and Nakamura et al. (1990); and also to Nakamura and Walker (1994) for discussions of predictive approaches to model choice.

$^{17}$ We have increasingly focused on year-to-year changes and comparisons (see Nakamura and Nakamura, 1992, 1994) because we agree with Manski (1995, p. 135):

We have seen that the reflection problem can make it rather difficult to draw conclusions about the nature of social effects from observations of the equilibrium outcomes experienced by a population .... Observation of the dynamics of social processes can sometimes open new possibilities for inference.

Table 1
Uncompensated wage elasticities of hours of work evaluated at the mean hours of work

                                                                                          Wives           Unmarried women
                                                                                        21-46   47-64     21-46   47-64
Working women who worked in year t-1
  Using IV wage estimates and including lagged hours in the labor supply equation        0.06    0.04      0.12    0.06
  Using OLS wage estimates and including lagged hours in the labor supply equation      -0.10   -0.08     -0.03   -0.03
  Using IV wage estimates without including lagged hours in the labor supply equation   -0.10   -0.29     -0.02    0.03
Working women who did not work in year t-1
  Using IV wage estimates                                                               -0.59   -0.14      0.21   -0.94
  Using OLS wage estimates                                                              -0.10   -0.32     -0.02   -0.35

Source: The figures shown are from tables in Nakamura and Nakamura (1985a, pp. 187-189).

The key insight that underlies our own use of predictive analyses is that if the estimated equations cannot capture the main features of the distributions of the dependent variables, both in-sample and out-of-sample, 'it is likely that at least one wrong assumption was made somewhere' (Christ, 1966, p. 158). Heckman (1985) in his Foreword to our book The Second Paycheck provides the following generous assessment of our methods of model choice and validation - an assessment that we repeat because it is strongly supportive of an emphasis on prediction and on the use of judgment in enlarging economic models. Heckman writes:

This book advocates and implements a novel approach to model verification. The approach pursued in many recent studies of labor supply has been to arrive at final empirical specifications for a single demographic group by means of a battery of 't' and 'F' tests on the coefficients of candidate variables. The problem of pretest bias is conveniently ignored. Only rarely ... do analysts ask how well fitted micro relationships explain other aspects of labor supply such as the aggregate time series movement. Focusing on one demographic group in isolation, these studies present a bewildering array of findings that have thus far eluded synthesis.

This book does not adopt the conventional 't' ratio methodology. The authors estimate the same models for a variety of age, marital status, and sex groups and look for commonalities of findings across groups. They look for consistency in the impact of explanatory variables on different dimensions of labor supply. Models are simulated both within samples and out of samples ....

Given the lack of basic knowledge about empirical regularities of labor force dynamics, the approach taken by the authors appears the most scientifically promising one. (pp. xi-xii)

7. Conclusions

In many areas of applied economic research, the ordinary least squares (OLS) estimates of key relationships are suspect because of potential endogeneity bias problems. However, instrumental variables (IV) estimates can be suspect as well because of large standard errors and erratic parameter estimates, and because of other concerns such as the potential endogeneity or lack of relevant variation in the variables included in the auxiliary IV equation(s). In cases like these, applied researchers have turned to endogeneity tests such as those of Wu and of Hausman in the quest for a statistically sound and replicable basis for choosing between OLS and IV estimation results. We conclude that this hope is often unrealistic.
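
For readers who want to see the mechanics of such a pretest, the following is a minimal sketch, not the statistic as written in Eq. (3) of the text. It uses the familiar augmented-regression form of a Wu-Hausman type test for a single possibly endogenous regressor: the first-stage residuals are added to the structural equation and their coefficient is tested with an F statistic. The function names (`ols`, `wu_hausman_F`, `simulate`), the simulated data design, and all parameter values are assumptions made purely for illustration.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def wu_hausman_F(y1, y2, Z1, Z2):
    """Augmented-regression F test for the exogeneity of one regressor y2.

    Z1 holds the included exogenous variables and Z2 the excluded
    instruments. Illustrative sketch only.
    """
    n = len(y1)
    Z = np.column_stack([Z1, Z2])
    _, v_hat = ols(Z, y2)                      # first-stage residuals
    X_r = np.column_stack([y2, Z1])            # restricted (OLS) specification
    X_u = np.column_stack([y2, Z1, v_hat])     # adds the first-stage residuals
    _, e_r = ols(X_r, y1)
    _, e_u = ols(X_u, y1)
    ssr_r, ssr_u = e_r @ e_r, e_u @ e_u
    df2 = n - X_u.shape[1]
    F = (ssr_r - ssr_u) / (ssr_u / df2)        # one restriction in the numerator
    return float(F), (1, df2)

def simulate(n, rho, pi2, seed=0):
    """Hypothetical data: corr(u, v) = rho makes y2 endogenous when rho != 0."""
    rng = np.random.default_rng(seed)
    Z1 = np.ones((n, 1))
    Z2 = rng.normal(size=(n, 2))
    u = rng.normal(size=n)
    v = rho * u + np.sqrt(1 - rho**2) * rng.normal(size=n)
    y2 = Z2 @ np.array([pi2, pi2]) + v
    y1 = 1.0 + 0.5 * y2 + u
    return y1, y2, Z1, Z2

F_stat, dof = wu_hausman_F(*simulate(n=2000, rho=0.6, pi2=1.0))
print(F_stat, dof)
```

Lowering `rho`, `pi2`, or `n` in this toy design is one way to see informally how the statistic weakens when the endogeneity is mild, the instruments explain little of the regressor, or the sample is small.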

Lacking a general statistical decision rule for choosing between OLS and IV or other simultaneous equation estimation results, we consider other research strategies that might help. These include strategies for forming hypotheses about potentially important omitted factors: hypotheses that can then guide further data collection efforts. These also include predictive evaluations of alternative sets of estimation results. These are research strategies that can help direct attention toward empirical models that can explain key features of the observed behavior: they are not strategies that will necessarily lead us to decide on a single best estimated model.
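
As a rough illustration of what a predictive evaluation of alternative estimation results can look like in code, the sketch below fits the same equation by OLS and by two-stage least squares on one half of a simulated sample and compares the out-of-sample root mean squared prediction error on the other half. The data design, the helper names (`fit_ols`, `fit_2sls`, `rmse`), and the use of RMSE as the only criterion are assumptions for illustration; they are not the simulation procedures used in the studies cited in the text.

```python
import numpy as np

def fit_ols(X, y):
    """OLS coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def fit_2sls(X, Z, y):
    """2SLS: regress y on the projection of X onto the instrument matrix Z."""
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]

def rmse(y, y_pred):
    """Root mean squared prediction error."""
    return float(np.sqrt(np.mean((y - y_pred) ** 2)))

# Hypothetical data with an endogenous regressor y2 (true slope 0.5).
rng = np.random.default_rng(1)
n = 4000
Z2 = rng.normal(size=(n, 2))
u = rng.normal(size=n)
v = 0.6 * u + 0.8 * rng.normal(size=n)
y2 = Z2 @ np.array([1.0, 1.0]) + v
y1 = 1.0 + 0.5 * y2 + u
X = np.column_stack([y2, np.ones(n)])   # regressors of the structural equation
Z = np.column_stack([Z2, np.ones(n)])   # instruments (including the constant)

# Estimate on the first half, evaluate predictions on the second half.
train, test = slice(0, n // 2), slice(n // 2, n)
b_ols = fit_ols(X[train], y1[train])
b_iv = fit_2sls(X[train], Z[train], y1[train])

report = {name: rmse(y1[test], X[test] @ b)
          for name, b in [("OLS", b_ols), ("IV", b_iv)]}
print(b_ols, b_iv, report)
```

A single fit criterion of this kind is only one ingredient. A fuller evaluation in the spirit of the text would also ask whether each set of estimates reproduces other key features of the observed distributions, within and out of sample.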

Appendix

To derive interpretable expressions for $\delta_1$ and $\delta_2$, we begin with the following expressions for these parameters that Kariya and Hodoshima give in their derivation of the exact conditional distributions of the Wu-Hausman and Revankar statistics:

$$\delta_1 = \eta' M (I + W)^{-1} M \eta / \Omega_{11.2} \tag{A.1}$$

and

$$\delta_2 = \eta' N (I + W)^{-1} N \eta / \Omega_{11.2} = \eta' N \eta / \Omega_{11.2}, \tag{A.2}$$

with $\eta = C(\pi_2 - \Pi_2 \Omega_{22}^{-1} \Omega_{21})$, $M = C\hat{\Pi}_2(\hat{\Pi}_2' C' C \hat{\Pi}_2)^{-1}\hat{\Pi}_2' C'$, $N = I - M$, $W = C\hat{\Pi}_2 S_{22}^{-1} \hat{\Pi}_2' C'$, and $\Omega_{11.2} = \Omega_{11} - \Omega_{12}\Omega_{22}^{-1}\Omega_{21}$, and with $C$ defined as a non-singular ($K_2 \times K_2$) matrix such that $C^2 = Z_2'[I - Z_1(Z_1'Z_1)^{-1}Z_1']Z_2$. The values for $\hat{\Pi}_2$ and $S_{22}$ on which the distributions of $T_2$ and RV are conditional can be readily obtained in any real application by estimating the reduced form equation for $Y_2$. We view the conditional nature of these distributions as an advantage. However, the expressions given in (A.1) and (A.2) are difficult to interpret because it is not obvious how $M$, $N$ and $W$ relate to the salient properties of a given model.

To express $\delta_1$ in (A.1) in more meaningful terms, we let $a = C\hat{\Pi}_2$, $\alpha = C\Pi_2$, and $m = \beta_1 - \Omega_{22}^{-1}\Omega_{21}$. If the structural equation for $y_1$ is identified (i.e., $K_2 \geq G_2$), then $\Pi_2\beta_1 = \pi_2$ and

$$\eta = C(\pi_2 - \Pi_2\Omega_{22}^{-1}\Omega_{21}) = C\Pi_2(\beta_1 - \Omega_{22}^{-1}\Omega_{21}), \tag{A.3}$$

which implies that $\eta = \alpha m$. We also have $(I + W)^{-1} = (I + aa'/S_{22})^{-1} = I - [aa'/(S_{22} + a'a)]$, $a'(I + W)^{-1}a = a'S_{22}a/(S_{22} + a'a)$, and $M = a(a'a)^{-1}a'$. Substituting into (A.1), we get $\delta_1 = (m^2/\Omega_{11.2})\,\alpha'a(a'a)^{-1}[a'S_{22}a/(S_{22} + a'a)](a'a)^{-1}a'\alpha = (m^2/\Omega_{11.2})(S_{22})(\alpha'a/a'a)^2[a'a/(S_{22} + a'a)]$. Since $\rho$ is the correlation coefficient between the disturbance terms of the two equations, and since $\rho^2 = \lambda^2/(\lambda^2 + \sigma^2)$, we have $m^2/\Omega_{11.2} = \rho^2/\sigma^2$. The term $(\alpha'a/a'a)^2$ is essentially a finite sample correction factor that we will denote by C.F. The term $a'a/(S_{22} + a'a)$ is the $R^2$ from the regression of $y_2 - Z_1\Pi_1$ on $Z_2$, which we denote by $R^2_{(y_2 - Z_1\Pi_1) \cdot Z_2}$. This term is the ratio of the variation of $y_2 - Z_1\Pi_1$ explained by $Z_2$ to the total variation of $y_2 - Z_1\Pi_1$; that is, it is the proportion of the total variation of $y_2$ explained by $Z_2$ after removing the effects of $Z_1$. Using this notation, $\delta_1$ can be expressed as in (6a) in the text. Moreover, since $S_{22}/(n - k)$ tends toward $\Omega_{22} = \lambda^2 + \sigma^2$ and C.F. tends toward 1 as $n$ goes to infinity, with $\operatorname{plim}_{n\to\infty} a = \alpha$, we also have the approximate expression (6b).
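
The algebra in this step can be checked numerically. The sketch below is only an illustrative check under assumed inputs: it draws arbitrary vectors to stand in for $a$ and $\alpha$, picks arbitrary positive scalars for $m$, $S_{22}$ and $\Omega_{11.2}$, and compares the matrix form of $\delta_1$ in (A.1), with $\eta = \alpha m$, $M = a(a'a)^{-1}a'$ and $W = aa'/S_{22}$, against the product of $m^2/\Omega_{11.2}$, $S_{22}$, the correction factor $(\alpha'a/a'a)^2$ and the ratio $a'a/(S_{22} + a'a)$. The variable names and numerical values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
K2 = 4
a = rng.normal(size=K2)        # stands in for C times the estimated reduced-form coefficients
alpha = rng.normal(size=K2)    # stands in for C times the true reduced-form coefficients
m, S22, omega_11_2 = 0.7, 3.0, 1.5   # arbitrary illustrative scalars

I = np.eye(K2)
M = np.outer(a, a) / (a @ a)   # M = a (a'a)^{-1} a'
W = np.outer(a, a) / S22       # W = a a' / S22
eta = alpha * m                # eta = alpha * m

# The inverse of (I + W) has the rank-one closed form used in the derivation.
inv_closed = I - np.outer(a, a) / (S22 + a @ a)

# Matrix form of delta_1 as in (A.1).
delta1_matrix = eta @ M @ np.linalg.inv(I + W) @ M @ eta / omega_11_2

# Scalar form: (m^2 / Omega_11.2) * S22 * C.F. * R^2.
cf = (alpha @ a / (a @ a)) ** 2
r2 = (a @ a) / (S22 + a @ a)
delta1_closed = (m**2 / omega_11_2) * S22 * cf * r2

print(delta1_matrix, delta1_closed)
```

The two printed values should agree up to floating point error for any choice of the inputs, which is what the substitution into (A.1) asserts.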

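Using the same notation, $\delta_2$ in (A.2) can be rewritten in a more interpretable exact form as

$$\begin{aligned}
\delta_2 &= (m^2/\Omega_{11.2})\,\alpha'[I - a(a'a)^{-1}a']\alpha \\
&= (m^2/\Omega_{11.2})\,[\{(\alpha'\alpha)(a'a) - (\alpha'a)^2\}/(a'a)] \\
&= (\rho^2/\sigma^2)(n)\,[\{(\alpha'\alpha/n)(a'a/n) - (\alpha'a/n)^2\}/(a'a/n)].
\end{aligned} \tag{A.4}$$

Under standard assumptions, $C^2/n$ converges in probability to a constant matrix. From this property and the consistency of the estimator of $\Pi_2$, it follows that

$$\operatorname{plim}_{n\to\infty}(\delta_2/n) = (\rho^2/\sigma^2)\,[\{(\alpha^*)(\alpha^*) - (\alpha^*)^2\}/\alpha^*] = 0, \tag{A.5}$$

where $\alpha^* = \operatorname{plim}_{n\to\infty}\left(\sum_i \alpha_i^2/n\right)$.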


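Since $N$ defined following Eq. (A.2) is idempotent, we get from (A.4) and (A.5) that $0 = \Omega_{11.2}\operatorname{plim}_{n\to\infty}(\delta_2/n) = \operatorname{plim}_{n\to\infty}(\eta'N'N\eta/n) = (\operatorname{plim}_{n\to\infty}\eta'N'/\sqrt{n})(\operatorname{plim}_{n\to\infty}N\eta/\sqrt{n})$, or

$$\operatorname{plim}_{n\to\infty}(N\eta/\sqrt{n}) = 0. \tag{A.6}$$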

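We also have

$$N\eta = NC(\pi_2 - \Pi_2\Omega_{22}^{-1}\Omega_{21}) = NC\pi_2 - NC\Pi_2\Omega_{22}^{-1}\Omega_{21}. \tag{A.7}$$

Since $NC\Pi_2\Omega_{22}^{-1}\Omega_{21} = [I - C\hat{\Pi}_2(\hat{\Pi}_2'C'C\hat{\Pi}_2)^{-1}\hat{\Pi}_2'C']C\Pi_2\Omega_{22}^{-1}\Omega_{21}$, it follows from $\operatorname{plim}_{n\to\infty}\hat{\Pi}_2 = \Pi_2$ that

$$\operatorname{plim}_{n\to\infty} NC\Pi_2\Omega_{22}^{-1}\Omega_{21} = 0. \tag{A.8}$$

Thus, from (A.6)-(A.8), we get

$$\operatorname{plim}_{n\to\infty}(N\eta/\sqrt{n}) = \operatorname{plim}_{n\to\infty}(NC\pi_2/\sqrt{n}) = 0. \tag{A.9}$$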

We can write $q_2 = \hat{\pi}_2'C'N'NC\hat{\pi}_2$, where $q_2$ is defined following (3) in the text and where it can be seen from (3) and (4) in the text that $q_2$ is the only difference other than degrees of freedom between the Wu-Hausman and Revankar statistics. Thus we have

$$q_2/n = (\hat{\pi}_2'C'N'/\sqrt{n})(NC\hat{\pi}_2/\sqrt{n}). \tag{A.10}$$

Using $\operatorname{plim}_{n\to\infty}\hat{\pi}_2 = \pi_2$, we get from (A.9) that

$$\operatorname{plim}_{n\to\infty}(NC\hat{\pi}_2/\sqrt{n}) = \operatorname{plim}_{n\to\infty}(NC\pi_2/\sqrt{n}) = 0. \tag{A.11}$$

Thus, property (7) in the text follows from (A.10) and (A.11). Kariya and Hodoshima (1980, p. 50) note the result given in (7) without proof.

Tsurumi and Shiba (1982) discuss this result as it relates to the identification problem. This result seems to be implied by Revankar (1978, Proposition 4, p. 175), but his proof is based on incorrect lemmas by Wu, as explained by Hausman (1978, fn. 7, p. 1257): 'Wu's derivation of the (non-local) limiting distribution of the test statistic under the alternative hypothesis in equation (3.12) of his paper (1973) seems incorrect since application of the central limit theorem on p. 748 requires the sum of random variables with zero mean .... Interpreted locally Wu's results seem valid since only the usual least squares variance term $v_1$ is needed'.

The above results imply the following large sample properties. First, result (7) that $\operatorname{plim}_{n\to\infty}(q_2/n) = 0$ means that $\operatorname{plim}_{n\to\infty} T_2 = \operatorname{plim}_{n\to\infty} RV$. Thus for large $n$ both $T_2$ in (3) and RV in (4) essentially obey the same conditional $F'$ distribution, indexed by $\delta_1$ and $G_2$, and hence have essentially the same power. Asymptotically there is no difference in power between the Wu-Hausman and Revankar tests. Secondly, it is seen from (6a) in the text that $\operatorname{plim}_{n\to\infty}\delta_1 = \infty$ since $\operatorname{plim}_{n\to\infty} S_{22} = \infty$. Hence both the Wu-Hausman and Revankar tests have power of one for large values of $n$. This proves that both tests are consistent tests.


Note that when the structural equation for $y_1$ is just identified so that $K_2 = G_2$, then $N = I - M = 0$ and from (A.2) we see that $\delta_2$ is identically equal to zero. Also, since $K = K_1 + K_2$ by definition, when $K_2 = G_2$ the degrees of freedom for the two tests are the same. Hence, the two tests will be equally powerful. This is what we would expect, since in this case $q_2 = 0$ and $c_1 = c_2$, and hence the Wu-Hausman and Revankar statistics are identical.
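
The behavior of $\delta_2$ described here can also be verified numerically. The following sketch is an illustrative check under assumed inputs, not part of the original derivation: with $M = a(a'a)^{-1}a'$, $N = I - M$, $W = aa'/S_{22}$ and $\eta = \alpha m$, it confirms that $N$ is idempotent, that the matrix form of $\delta_2$ in (A.2) equals $(m^2/\Omega_{11.2})\{(\alpha'\alpha)(a'a) - (\alpha'a)^2\}/(a'a)$, and that $\delta_2$ vanishes when $a$ has a single element (the just-identified case with one endogenous regressor). The function names and numerical values are hypothetical.

```python
import numpy as np

def projections(a):
    """Return M = a (a'a)^{-1} a' and N = I - M for a vector a."""
    a = np.atleast_1d(np.asarray(a, dtype=float))
    M = np.outer(a, a) / (a @ a)
    return M, np.eye(len(a)) - M

def delta2(alpha, a, m, S22, omega_11_2):
    """Matrix form of delta_2: eta' N (I + W)^{-1} N eta / Omega_11.2."""
    a = np.atleast_1d(np.asarray(a, dtype=float))
    eta = m * np.atleast_1d(np.asarray(alpha, dtype=float))
    _, N = projections(a)
    W = np.outer(a, a) / S22
    inv = np.linalg.inv(np.eye(len(a)) + W)
    return float(eta @ N @ inv @ N @ eta / omega_11_2)

def delta2_closed(alpha, a, m, omega_11_2):
    """Scalar form: (m^2/Omega_11.2) {(alpha'alpha)(a'a) - (alpha'a)^2} / (a'a)."""
    a = np.atleast_1d(np.asarray(a, dtype=float))
    alpha = np.atleast_1d(np.asarray(alpha, dtype=float))
    num = (alpha @ alpha) * (a @ a) - (alpha @ a) ** 2
    return float((m**2 / omega_11_2) * num / (a @ a))

rng = np.random.default_rng(3)
a_over, alpha_over = rng.normal(size=3), rng.normal(size=3)   # K2 = 3 > G2 = 1
print(delta2(alpha_over, a_over, 0.7, 3.0, 1.5),
      delta2_closed(alpha_over, a_over, 0.7, 1.5))
print(delta2([2.0], [1.3], 0.7, 3.0, 1.5))                    # K2 = G2 = 1
```

In the overidentified case the two expressions coincide, and in the just-identified case $N$ collapses to zero so that $\delta_2$ is zero whatever the other inputs are.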

References

Angrist, J.D., Krueger, A.B., 1995. Split-sample instrumental variables estimates of the return to schooling. Journal of Business and Economic Statistics 13, 225-235.

Berndt, E.R., 1991. The Practice of Econometrics: Classic and Contemporary. Addison Wesley, Reading, MA.

Buse, A., 1992. The bias of instrumental variables estimators. Econometrica 60, 173-180.

Christ, C.F., 1966. Econometric Models and Methods. Wiley, New York.

Christ, C.F., 1994. The Cowles Commission's contributions to econometrics at Chicago, 1939-1955. Journal of Economic Literature 32, 30-59.

Cragg, J.G., Donald, S.G., 1993. Testing identifiability and specification in instrumental variable models. Econometric Theory 9, 222-240.

Durbin, J., 1954. Errors in variables. Review of the International Statistical Institute 22, 23-32.

Geweke, J., 1987. Endogeneity and exogeneity. In: Eatwell, J., Milgate, M., Newman, P. (Eds.), The New Palgrave: a Dictionary of Economics, vol. 2. Macmillan, London, pp. 134-136.

Hausman, J.A., 1978. Specification tests in econometrics. Econometrica 46, 1251-1271.

Hausman, J.A., Taylor, W.E., 1981. Comparing specification tests and classical tests. Bell Laboratories economics discussion paper.

Heckman, J.J., 1985. Foreword. In: Nakamura, A., Nakamura, M. (Eds.), The Second Paycheck: a Socioeconomic Analysis of Earnings. Academic Press, Orlando, FL, pp. xi-xii.

Heckman, J.J., Killingsworth, M.R., MaCurdy, T.E., 1981. Empirical evidence on static labour supply models: a survey of recent developments. In: Hornstein, Z., Grice, J., Webb, A. (Eds.), The Economics of the Labour Market. Her Majesty's Stationery Office, London, pp. 73-122.

Hicks, J.R., 1965. The Theory of Wages, 2nd ed. Macmillan, London.

Kariya, T., Hodoshima, J., 1980. Finite sample properties of the tests of independence in structural systems and LRT. Economic Studies Quarterly 31, 45-56.

Killingsworth, M.R., 1983. Labor Supply. Cambridge University Press, New York.

Killingsworth, M.R., Heckman, J.J., 1986. Female labor supply: a survey. In: Ashenfelter, O., Layard, R. (Eds.), Handbook of Labor Economics, vol. 1. North-Holland, Amsterdam, pp. 103-204.

Koopmans, T.C., Hood, W.C., 1953. The estimation of simultaneous linear economic relationships. In: Hood, W.C., Koopmans, T.C. (Eds.), Studies in Econometric Methods. Wiley, New York, pp. 112-199.

Lehmann, E.L., 1959. Testing Statistical Hypotheses. Wiley, New York.

Manski, C.F., 1995. Identification Problems in the Social Sciences. Harvard University Press, Cambridge, MA.

Marshall, A., 1920. Principles of Economics, 8th ed. Macmillan, New York.

Mincer, J., 1962. Labor force participation of married women: a study of labor supply. In: Aspects of Labor Economics. National Bureau of Economic Research, Princeton University Press, Princeton, NJ, pp. 63-97.

Mroz, T.A., 1987. The sensitivity of an empirical model of married women's hours of work to economic and statistical assumptions. Econometrica 55, 765-799.

Nakamura, A., Nakamura, M., 1981a. On the relationships among several specification error tests presented by Durbin, Wu and Hausman. Econometrica 49, 1583-1588.


Nakamura, A., Nakamura, M., 1981b. A comparison of the labor force behavior of married women in the United States and Canada, with special attention to the impact of income taxes. Econometrica 49, 451-489.

Nakamura, A., Nakamura, M., 1983. Part-time and full-time work behavior of married women: a model with a doubly truncated dependent variable. Canadian Journal of Economics, 229-257.

Nakamura, A., Nakamura, M., 1985a. The Second Paycheck: a Socioeconomic Analysis of Earnings. Academic Press, Orlando, FL.

Nakamura, A., Nakamura, M., 1985b. Dynamic models of the labor force behavior of married women which can be estimated using limited amounts of past information. Journal of Econometrics 27, 273-298.

Nakamura, A., Nakamura, M., 1985c. On the performance of tests by Wu and by Hausman for detecting the ordinary least squares bias problem. Journal of Econometrics 29, 213-227.

Nakamura, A., Nakamura, M., 1992. The econometrics of female labor supply and children. Econometric Reviews 11, 1-71.

Nakamura, A., Nakamura, M., 1994. Predicting female labor supply: effects of children and recent work experience. Journal of Human Resources 29, 304-327.

Nakamura, A., Nakamura, M., Duleep, H.O., 1990. Alternative approaches to model choice. Journal of Economic Behavior and Organization 14, 97-125.

Nakamura, A., Walker, J.R., 1994. Learning from empirical evidence. Journal of Human Resources 29, 223-247.

Nakamura, M., Nakamura, A., Cullen, D., 1979. Job opportunities, the offered wage, and the labor supply of married women. American Economic Review 69, 787-805.

Pagan, A.R., Jung, Y., 1993. Understanding some failures of instrumental variable estimators, Working paper, Australian National University.

Pigou, A.C., 1946. The Economics of Welfare, 4th ed. Macmillan, London.

Revankar, N.S., 1978. Asymptotic relative efficiency analysis of certain tests of independence in a structural system. International Economic Review 19, 165-179.

Revankar, N.S., Hartley, M.J., 1973. An independence test and conditional unbiased predictions in the context of simultaneous equation systems. International Economic Review 14, 625-631.

Revankar, N.S., Yoshino, N., 1990. An expanded equation approach to weak-endogeneity tests in structural systems and a monetary application. Review of Economics and Statistics 72, 173-177.

Reynolds, R.A., 1982. Posterior odds for the hypothesis of independence between stochastic regressors and disturbances. International Economic Review 23, 479-490.

Robinson, C., Tomes, N., 1985. More on the labour supply of Canadian women. Canadian Journal of Economics 18, 156-163.

Schultz, T.P., 1980. Estimating labor supply functions for married women. In: Smith, J. (Ed.), Female Labor Supply. Princeton University Press, Princeton, NJ, pp. 25-89.

Simon, H., 1953. Causal ordering and identifiability. In: Hood, W.C., Koopmans, T.C. (Eds.), Studies in Econometric Methods. Wiley, New York, pp. 49-74.

Thurman, W.N., 1986. Endogeneity testing in a supply and demand framework. Review of Economics and Statistics 68, 638-646.

Tsurumi, H., Shiba, T., 1982. Bayes factor and likelihood ratio interpretation of the F statistics for testing exogeneity. Econometric Society Meetings, New York, December.

White, H., 1982. Maximum likelihood estimation of misspecified models. Econometrica 50, 1-25.

Wu, D., 1973. Alternative tests of independence between stochastic regressors and disturbances. Econometrica 41, 733-750.

Wu, D., 1974. Alternative tests of independence between stochastic regressors and disturbances: finite sample results. Econometrica 42, 529-546.

Zellner, A., 1984. Causality and econometrics. In: Zellner, A. (Ed.), Basic Issues in Econometrics. University of Chicago Press, Chicago, IL, pp. 35-74.