chapter 4 - houston h. stokes pagehhstokes.people.uic.edu/ftp/book1/ch4.docx · web view4.5...

Revised Chapter 4 in Specifying and Diagnostically Testing Econometric Models (Edition 3) © by Houston H. Stokes 17 October 2010 All rights reserved. Preliminary Draft

Chapter 4

Simultaneous Equations Systems.....................................................14.0 Introduction...............................................................14.1 Estimation of Structural Models.................................................2Table 4.1 Matlab Program to obtain Constrained Reduced Form................................3

Table 4.2 Edited output from running Matlab Program in Table 4.1............................54.2 Estimation of OLS, LIML, LS2, LS3, and ILS3.......................................94.3 Examples................................................................16

Table 4.3 Setup for ols, liml, ls2, ls3, and ils3 commands..................................17Table 4.4 SAS Implementation of the Kmenta Model.....................................25Table 4.5 RATS Implementation of the Kmenta Model...................................27

4.4 Exactly identified systems.....................................................34Table 4.6 Exactly Identified Kmenta Problem..........................................34

4.5 Analysis of OLS, 2SLS and 3SLS using Matrix Command..............................38Table 4.7 Matrix Command Implementation of OLS, 2SLS, 3SLS and FIML.....................39

4.6 LS2 and GMM Models and Specification tests........................................52Table 4.8 LS2 and General Method of Moments estimation routines...........................54Table 4.9 Estimation of LS2 and GMM Models using B34S, Stata and Rats......................60

4.8 Conclusion................................................................71

Simultaneous Equations Systems

4.0 Introduction

In section 4.1, after first discussing the basic simultaneous equations model, the constrained reduced form, the unconstrained reduced form and the final form are introduced. The MATLAB symbolic capability is used to illustrate how the constrained reduced form relates to the structural parameters of the model. In section 4.2 the theory behind QR approach to simultaneous equations modeling as developed by Jennings (1980) is discussed in some detail. The simeq command performs estimation of systems of equations by the methods of OLS, limited information maximum likelihood (LIML), two-stage least squares (2SLS), three-stage least squares (3SLS), iterative three-stage least squares (I3SLS), seemingly unrelated regression (SUR) and full information maximum likelihood (FIML), using code developed by Les Jennings (1973, 1980). The Jennings code is unique in that it implements the QR approach to estimate systems of equations, which results in both substantial savings in time and increased accuracy.1

The estimation methods are well known and covered in detail in such books as Johnston (1963, 1972, 1984), Kmenta (1971, 1986), and Pindyck and Rubinfeld (1976, 1981, 1990) and will only be sketched here. What will be discussed are the contributions of Jennings and others. The

1 The B34S qr command is designed to provide up to 16 digits of accuracy. This command, which also allows estimation of the principal component (PC) regression, uses LINPACK code and is documented in Chapter 10. The qr command is distinct from the code in the simeq command. The matrix command contains extensive and programmable QR capability. For further examples see Chapter 10 and 16. and sections of chapter 2

4-1

4-2 Chapter 4

discussion of these techniques follows closely material in Jennings (1980) and Strang (1976).

Section 4.3 illustrates estimation of variants of the Kmenta model using RATS, B34S and SAS while section 4.4 illustrates an exactly identified model. Section 4.5 shows how using the matrix command OLS, LIMF, 3SLS and FIML can be estimated. The code here is for illustration purposes, benchmarking but not production. Section 4.6 shows matrix command subroutines LS2 and GAMEST that respectively do single equation 2SLS and GMM models. This code is 100% production.

4.1 Estimation of Structural Models

Assume a system of G equations with K exogenous variables2

(4.1-1)

where is the kth exogenous variable for the ith period, is the jth endogenous variable for the

ith period, and is the jth equation error term for the ith period. If we define

equation (4.1-1) can be written as

(4.1-2)

If all observations in are included, then

2 For further discussion see Pindyck and Rubinfeld (1981, 339-349).

Simultaneous Equations Systems 4-3

and equation (4.1-2) can be written as

(4.1-3)

From equation (4.1-3), the constrained reduced form can be calculated as

(4.1-4)

If is estimated directly with OLS, then it is called the unconstrained reduced form. The B34S simeq command estimates B, using either OLS, 2SLS, LIML, 3SLS, I3SLS, or FIML. For each estimated vector B, the associated reduced form coefficient vector π can be optionally calculated.3 If B is estimated by OLS, the coefficients will be biased since the key OLS assumption that the right-hand-side variables are orthogonal with the error term is violated.

Model (4.1-3) can be normalized such that the coefficients . The necessary condition for identification of each equation is that the number of endogenous variables - 1 be less than or equal to the number of excluded exogenous variables. The reason for this restriction is that otherwise it would not be possible to solve for the elements of uniquely in terms of the other parameters of the model. A short example from Greene (2003) that is self documented using MATLAB illustrates this problem.

Table 4.1 Matlab Program to obtain Constrained Reduced Form

% Greene (2003) Chapter 15 Problem # 1% y1= g1*y2 + b11*x1 + b21*x2 + b31*x3% y2= g2*y1 + b12*x1 + b22*x2 + b32*x3%% We know BY+GX=E

syms g1 g2 b11 b21 b31 b12 b22 b32

B =[ 1, -g1; -g2, 1]

G =[-b11,-b21,-b31; -b12,-b22,-b32]

a= -1*inv(B)*G

p11=a(1,1)p12=a(1,2)p13=a(1,3)p21=a(2,1)p22=a(2,2)p23=a(2,3)

% Hopeless. Have 6 equations BUT more than 6 variables

3 If the model is exactly identified, the constrained reduced form can be directly estimated by OLS or using (4.1-4) from LIML, 2SLS or 3SLS. This is shown empirically in section 4.5.

4-4 Chapter 4

' Now impose restrictions' ' b21=0 b32=0'

G =[-b11, 0, -b31; -b12,-b22, 0 ]

B,Ga= -1*inv(B)*G

' Here 6 equations and six unknowns g1 g2 b11 b31 b12 b22 'p11=a(1,1)p12=a(1,2)p13=a(1,3)p21=a(2,1)p22=a(2,2)p23=a(2,3)


Table 4.2 Edited output from running Matlab Program in Table 4.1p11 = -1/(-1+g1*g2)*b11+g1/(-1+g1*g2)*b12p12 = -1/(-1+g1*g2)*b21+g1/(-1+g1*g2)*b22p13 = -1/(-1+g1*g2)*b31+g1/(-1+g1*g2)*b32p21 = -g2/(-1+g1*g2)*b11+1/ (-1+g1*g2)*b12p22 = -g2/(-1+g1*g2)*b21+1/ (-1+g1*g2)*b22p23 = -g2/(-1+g1*g2)*b31+1/ (-1+g1*g2)*b32

Here 6 equations and six unknowns g1 g2 b11 b31 b12 b22

p11 = -1/(-1+g1*g2)*b11+g1/(-1+g1*g2)*b12p12 = -g1/(-1+g1*g2)*b22p13 = -1/(-1+g1*g2)*b31p21 = -g2/(-1+g1*g2)*b11+1/(-1+g1*g2)*b12p22 = -1/(-1+g1*g2)*b22p23 = -g2/(-1+g1*g2)*b31

If the excluded exogenous variables of the ith equation are not significant in any other equation, then the ith equation will not be identified, even if it is correctly specified. We note that

where . The reduced form

disturbance is not correlated with the exogenous variables or .

from which we deduce that

(4.1-5)

In summary, = G by K exogenous variable coefficient matrix, = G by G nonsingular endogenous variable coefficient matrix, = K by K symmetric positive definite matrix structural covariance matrix, =G by K constrained reduced form coefficient matrix and = G by G reduced form covariance matrix. The importance of this is that since and can be estimated consistently by OLS, following Greene (2003, 387) if were known, we could obtain from (4.1-4) and from (4.1-5). If there are no endogenous variables on the right, yet a number of equations are estimated where there is covariance in the error term across equations, the seemingly unrelated regression model (SUR) can be estimated as

(4.1-6)

Elements of can be estimated if OLS is used on each of the G equations and

(4.1-7)

4-6 Chapter 4

For more detail see Greene (2003) or other advanced econometric books. Pindyck and Rubinfield (1976, 1981, 1990) provides a particularly good treatment that is consistent with the notation in this chapter.

From (4.1-4) Theil (1971, 463-468) suggests calculating the final form. First partition the ith observation of the exogenous variables into lagged endogenous, current exogenous and lagged exogenous where identifies are used to express lags > 1.

(4.1-8)

Theil (1971) shows that (4.1-8) can be expressed as

(4.1-9)

where is the impact multiplier. If there are no lagged endogenous variables in the system,

and the constrained reduced form and the final form are the same. In this case

. The interim multipliers are which, when summed, form the total multiplier

(4.1-10)

Goldberger (1959) and Kmenta (1971, 592) provide added detail. The importance of (4.1-8) is that it shows the effect on all endogenous variables of a change in any exogenous variable after all effects have had a chance to work themselves out in the system.

There are several common mistakes made in setting up simultaneous equations systems. These include the following:


- Not fully checking for multicollinearity in the equations system.

- Attempting to interpret the estimated B and Γ coefficients as partial derivatives, rather than looking at the reduced form G by K matrix π.

- Not effectively testing whether excluded exogenous variables are significant in at least one other equation in the system.

- Not building into the solution procedure provisions for taking into account the number of significant digits in the data.

The simeq code has unique design characteristics that allow solutions for some of these problems. In the next sections, we will briefly outline some of these features.

Assume for a moment that X is a T by K matrix of observations of the exogenous variables, Y is a T by 1 vector of observations of the endogenous variable, and β is a K element array of OLS coefficients, then the OLS solution for the estimated β from equation (2.1-8) is

. The problem with this approach is that some accuracy is lost by forming the matrix . The QR approach4 proceeds by operating directly on the matrix X to express it in terms of the upper triangular K by K matrix R and the T by T orthogonal matrix Q. X is factored as

(4.1-11)

Since Q'Q = I, then

(4.1-12)

Following Jennings (1980), we define the condition number of matrix X, (C(X)), as the

ratio of the square root of the largest eigenvalue of to the smallest

eigenvalue of

(4.1-13)

4 A good discussion of the QR factorization is contained in Strang (1976). Other references include Jennings (1980) and Dongarra, Bunch, Moler, and Stewart (1979).

4-8 Chapter 4

If , and X is square and nonsingular, then

(4.1-14)

Throughout B34S, 1/C(X) is checked to test for rank problems. Jennings (1980) notes that C(X) can also be used as a measure of relative error. If μ is a measure of round-off error, then

is the bound for the relative error of the calculated solution. In an IBM 370 running double precision, μ is approximately .1E-16. If C(X) is > .1E+8 (1 /C(X) is < .1E-8), then

, meaning that no digits in the reported solution are significant. Jennings (1980) looks at the problem from another perspective. If matrix X has a round-off error of τX such that the actual X used is X+τX, then must be less than 1/C(X) for a solution to exist. If

(4.1-15)

then there exists a such that is singular.5 The user can inspect the estimate of the condition and determine the degree of multicollinearity. Most programs only report problems when the matrix is singular. Inspection of C(X) gives warning of the degree of the problem. The simeq command contains the IPR parameter option with which the user can inform the program of the number of significant digits in X. This information is used to terminate the iterative three-stage (ILS3) iterations when the relative change in the solution is within what would be expected, given the number of significant digits in the data.

Jennings (1980) notes that the relative error of the QR solution to the OLS problem given in equation (4.1-10) has the form

(4.1-16)

where and are of the order of machine precision and are the lengths of the estimated residual and estimated coefficients, respectively. (The length or L2NORM of a vector

ei is defined as ) . Equation (4.1-14) indicates that as the relative error of the computer solution improves, the closer the model fits. An estimate of this relative error is made for OLS, LIML and 2SLS estimators reported by simeq.

4.2 Estimation of OLS, LIML, LS2, LS3, and ILS3

5 For more detail on techniques used in simeq to avoid numerical error in the calculations arising from differences in the means of the data, see Jennings (1980).


For OLS estimation of a system of equations, simeq uses the QR approach discussed earlier. If the reduced option is used, once the structural coefficients B and Γ in equation (4.1-3) are known, the constrained reduced form coefficients π from equation (4.1-4) are displayed. If B and Γ are estimated using OLS, and all structural equations are exactly identified, then the constraints on π imposed from the structural coefficients B and Γ are not binding and π could be estimated directly with OLS or indirectly via (4.1-4). However, if one or more of the equations in the structural equations system (4.1-2) are overidentified, π must be estimated as .

Although the reduced-form coefficients π exist and may be calculated from any set of structural estimates B and Γ, in practice it is not desirable to report those derived from OLS estimation because in the presence of endogenous variables on the right-hand side of an equation, the OLS assumption that the error term is orthogonal with the explanatory variables is violated. Since OLS imposes this constraint as a part of the estimation process, the resulting estimated B and Γ are biased.

The reason that OLS is often used as a benchmark is because from among the class of all linear estimators, OLS produces minimum variance. The loss in predictive power of LIML and 2SLS has to be weighed against the fact that OLS produces biased estimates. If reduced-form coefficients are desired, identities in the system must be entered. The number of identities plus the number of estimated equations must equal the number of endogenous variables in the model. The simeq command requires that the number of model sentences and identity sentences is equal to the number of variables listed in the endogenous sentence.

The 2SLS estimator first estimates all endogenous variables as a function of all exogenous variables. This is equivalent to estimating an unconstrained form of the reduced-form equation (4.1-4). Next, in stage 2 the estimated values of the endogenous variables on the right in

the jth equation are used in place of the actual values of the endogenous variables Y j on the right to estimate equation (4.1-2). Since the estimated values of the endogenous variables on the right are only a function of exogenous variables, the theory suggests they can be assumed to be orthogonal with the population error, and OLS can be safely used for the second stage. In terms of our prior notation, the two-stage estimator for the first equation is

(4.2-1)

where is the matrix of predicted endogenous variables in the first equation and X1 is the

4-10 Chapter 4

matrix of exogenous variables in the first equation. For further details on this traditional estimation approach, see Pindyck and Rubinfeld (1981, 345-347).

The QR approach used by Jennings (1980) involves estimating equation (4.2-1) as the solution of

(4.2-2)

For where and X+ pseudoinverse6 of X. Zj

consists of the X and Y variables in the jth equation. XX+ is not calculated directly but is expressed in terms of the QR factorization of X. By working directly on X, and not forming X'X, substantial accuracy is obtained. Jennings proceeds by writing

(4.2-3)where Ir is the r by r identity matrix and r is the rank of X. Using equation (4.2-3), equation (4.2-2) becomes

(4.2-4)

where and .

The 2SLS covariance matrix can be estimated as

(4.2-5)

where is the degrees of freedom and is the residual sum of squares (or the square of the L2NORM of the residual). There is a substantial controversy in the literature about the

appropriate value for . Since the SEs of the estimated 2SLS coefficients are known only

asymptotically, Theil (1971) suggests that be set equal to T, the number of observations used

6 If we define X+ as the pseudoinverse of the T by K matrix X, then it can be shown (Strang 1976, 138, exercise 3.4.5) that the following four conditions hold: 1. XX+X=X; 2. X+XX+=X+; 3. (XX+)'=XX+; and 4. (X+X)'=X+X . The pseudoinverse can be obtained from the singular value decomposition or the QR factorization of X.


to estimate the model. Others suggest that be set to T-K, similar to what is being used in OLS. If Theil's suggestion is used, the estimated SEs of the coefficients are larger. The T-K option is more conservative. The simeq command produces both estimates of the coefficient standard errors to facilitate comparison with other programs and researcher preferences.

Two-stage least squares estimation of an equation with endogenous variables on the right, in contrast with OLS estimation, in theory produces unbiased coefficients at the cost of some loss of efficiency. If a large system is estimated, it is often impossible to use all exogenous variables in the system because of loss of degrees of freedom. The usual practice is to select a subset of the exogenous variables. The greater the number of exogenous variables relative to the degrees of freedom, the closer the predicted Y variables on the right are to the raw Y variables on the right. In this situation, the 2SLS estimator sum of squares of residuals will approach the OLS estimator sum of squares of residuals. Such an estimator will lose the unbiased property of the 2SLS estimator. Usual econometric practice is to use OLS and 2SLS and compare the results to see how sensitive the OLS results are to simultaneity problems.

While 2SLS results are sensitive to the variable that is used to normalize the system, limited information maximum likelihood (LIML) estimation, which can be used in place of 2SLS, is not so sensitive. Kmenta (1971, 568-570) has a clear discussion which is summarized below. The LIML estimator,7 which is hard to explain in simple terms, involves selecting values for b and δ for each equation such that L is minimized where L = SSE1 / SSE. We define SSE1 as the residual variance of estimating a weighted average of the y variables in the equation on all exogenous variables in the equation, while SSE is the residual variance of estimating a weighted average of the y variables on all the exogenous variables in the system. Since SSE SSE1, L is bounded at 1. The difficulty in LIML estimation is selecting the weights for combining the y variables in the equation. Assume equation 1 of (4.1-1)

(4.2-6) Ignoring time subscripts, we can define

(4.2-7)

If we define and we knew the vector we would know

y*1 since and could regress y* on all x variables on the right in that equation and call

the residual variance SSE1 and next regress on all x variables in the system and call the residual variance SSE. If we define X1 as a matrix consisting of the columns of the x variables on

the right X1= [x1i,...,x1K], and we knew B1*, then we could estimate as

7 Kmenta (1971, 565-572) has one of the clearest descriptions. The discussion here complements that material.

4-12 Chapter 4

(4.2-8)

However, we do not know B1*. If we define

(4.2-9)

(4.2-10)

where X is the matrix of all X variables in the system, then can be written as

(4.2-11)

Minimizing L implies that

det (4.2-12)

The LIML estimator uses eigenvalue analysis to select the vector B1* such that L is minimized. This calculation involves solving the system

(4.2-13)

for the smallest root L which we will call This root can be substituted back into equation (4.2-12) to get B1* and into equation (4.2-8) to get Γ1. Jennings shows that equation (4.2-13) can be rewritten as

. (4.2-14)

Further factorizations lead to accuracy improvements and speed over the traditional methods of solution outlined in Johnston (1984), Kmenta (1971), and other books. Jennings (1973, 1980) briefly discusses tests made for computational accuracy, given the number of significant digits in the data and various tests for nonunique solutions. One of the main objectives of the simeq code was to be able to inform the user if there were problems in identification in theory and in practice. Since the LIML standard errors are known only asymptotically and are, in fact, equal to the 2SLS estimated standard errors, these are used for both the 2SLS and LIML estimators.

In the first stage of 2SLS, π is the unconstrained, reduced form.

Y = πX + V (4.2-15) and is estimated to obtain the predicted variables. 2SLS, OLS, and LIML are all special cases of the Theil (1971) k class estimators. The general formula for the k class estimator for the first


equation (Kmenta 1971, 565) is

(4.2-16)

where is the predicted residual from estimating all but the 1st y variable in equation (4.2-15),

, and X1 is the X variables on the right-hand side of the first equation. (4.2-16) follows directly from (4.2-1). If k=0, equation (4.2-15) is the formula for OLS estimation of the first equation. If k=1, equation (4.2-16) is the formula for 2SLS estimation of the first equation and can be transformed to equation (4.2-5). If k = , the minimum root of equation (4.2-13), equation (4.2-16) is the formula for the LIML estimator (Theil 1971, 504). Hence, OLS, 2SLS, and LIML are all members of the k class of estimators.

Three-stage least squares utilizes the covariance of the residuals across equations from the estimated 2SLS model to improve the estimated coefficients B and Γ. If the model has only exogenous variables on the right-hand side (B = 0), the OLS estimates can be used to calculate the covariance of the residuals across equations. The resulting estimator is the seemingly unrelated regression model (SUR). In this discussion, we will look at the 3SLS model only, since the SUR model is a special case. From (4.2-2) we rewrite the 2SLS estimator for the i th

equation as

, (4.2-17)

which estimates the ith 2SLS equation

. (4.2-18)

If we define8 and multiply equation (4.2-18) by P'X', we obtain

(4.2-19)

which can be written

(4.2-20)

where . If all G 2SLS equations are written as

8 This discussion is based on material contained in Johnston (1984, 486).

4-14 Chapter 4

(4.2-21)

then the system can be written as

w = Wα + ε. (4.2-22)

For each equation, i=j and

(4.2-23)

while the covariance of the error term for the system becomes

(4.2-24)

Equation (4.2-24) indicates that for each equation there is no heteroskedasticity, but that there is contemporaneous correlation of the residuals across equations. Equation (4.2-24) can be estimated from the 2SLS estimates of the residuals of each equation for 3SLS or the OLS estimates of the residuals of each equation for SUR models. Let

(4.2-25)

be such an estimate. The 3SLS estimator of the system , where becomes

(4.2-26)

Jennings (1980) uses two alternative approaches to solve (4.2-26) depending on whether the covariance of the 3SLS estimator

(4.2-27)


is required or not. In the former case, a orthogonal factorization method is used. In the latter case to save space the conjugate gradiant interative algorithm (Lanczos reductyion) suggested by Paige and Sanders (1973) is used. This latter approach may or may not converge. For added detail see Jennings (1980). If the switch kcov=diag is used there will not be convergence issues, since the QR approach will be used. Since many software systems use inversion methods, slight differences in the estimated coefficients will be observed since the QR approach is in theory more accurate. Implementation of the "textbook" approach is illustrated using the matrix command in section 4.4.

In a model with G equations, if the equation of interest is the jth equation, then assuming the exogenous variables in the system are selected correctly and the jth equation is specified correctly, 2SLS estimates are invariant to any other equation. 3SLS of the j th equation, in contrast, is sensitive to the specification of other equations in the system since changes in other equation specifications will alter the estimate of V and thus the 3SLS estimator of δ from equation (4.2-26). Because of this fact, it is imperative that users first inspect the 2SLS estimates closely. The constrained reduced form estimates, π, should be calculated from the OLS and 2SLS models and compared. The differences show the effects of correcting for simultaneity. Next, 3SLS should be performed. A study of the resulting changes in δ and π will show the gain of moving to a system-wide estimation procedure. Since changes in the functional form of one equation i can possibly impact the estimates of another equation j, in this step of model building, sensitivity analysis should be attempted. In a multiequation system, the movement from 2SLS to 3SLS often produces changes in the estimate of δi for one equation but not for another equation. In a model in which all equations are over identified, in general the 3SLS estimators will differ from the 2SLS estimators. If all equations are exactly identified, then V is a diagonal matrix (Theil 1971, 511) and there is no gain for any equation from using 3SLS. In the test problem from Kmenta (1971, 565), which is discussed in the next section, one equation is over identified and one equation is exactly identified. In this case, only the exactly identified equation will be changed by 3SLS. This is because the exactly identified equation gains from information in the over identified equation but the reverse is not true. The over identified equation does not gain from information from the exactly identified equation.

In SUR models, if all equations contain the same variables, there is no gain over OLS from going to SUR, since V is again a diagonal matrix. Just as the LIML method of estimation is an alternative to 2SLS, the FIML is a more costly alternative to 3SLS and I3SLS.

FIML9 is a generalization of LIML for systems of models. Like LIML, it is invariant to the variable used to normalize the model. FIML, in contrast with 3SLS, is highly nonlinear and, as a consequence, much more costly to estimate. Because FIML is asymptotically equivalent to 3SLS (Theil 1971, 525) and the simeq code does not contain any major advantages over other

9 The fiml section of the simeq command is the weakest link. In addition to a probably a scaler error in the fiml standard errors, there often are convergence problems that appear to be data related. In view of this and the fact that 3SLS is an inexpensive substitute, users are encouraged to employ 3SLS and I3SLS in place of FIML. Future releases of B34S will endeavor to improve the FIML code or disable the option. The matrix command implementation of FIML, shown later in section 4.4, provides a look into how such a model might be implemented.

4-16 Chapter 4

programs, the discussion of FIML is left to Theil (1971), Kmenta (1971) and Johnston (1984) except for an annotated FIML example using the matrix command. In the next section, an annotated output is presented.

Iterative 3SLS is an alternative final step in which the estimate of V is updated from the information from the 3SLS estimates. The problem now becomes where do you stop iterating on the estimates of V? The simeq command uses the information on the number of significant digits (see ipr parameter) in the raw data and equation (4.1-8) to terminate the I3SLS iterations if the relative change is within what would be expected, given the number of significant digits in the raw data. If ipr is not set, the simeq command assumes ten digits.

4.3 Examples

Using data on supply and demand from Kmenta (1971, 565), Table 4.1 shows matrix code to estimate models for OLS, LIML, 2SLS, and 3SLS. The reduced-form estimates for each model are calculated. Not all output is shown to save space. The results are the same, digit for digit, as those reported in Kmenta (1971, 582). Note the use of the keyword ls2 for 2SLS and ls3 for 3SLS since the parser will not recognize 2SLS and 3SLS as keywords.


Table 4.3 Setup for ols, liml, ls2, ls3, and ils3 commands

==KMENTA1B34sexec data nohead corr$ Input q p d f a $ Label q = 'Food consumption per head'$ Label p = 'Ratio of food prices to consumer prices'$ Label d = 'Disposable income in constant prices'$ Label f = 'Ratio of t-1 years price to general p'$ Label a = 'Time'$ Comment=('Kmenta (1971) page 565 answers page 582')$ Datacards$ 98.485 100.323 87.4 98.0 1 99.187 104.264 97.6 99.1 2 102.163 103.435 96.7 99.1 3 101.504 104.506 98.2 98.1 4 104.240 98.001 99.8 110.8 5 103.243 99.456 100.5 108.2 6 103.993 101.066 103.2 105.6 7 99.900 104.763 107.8 109.8 8 100.350 96.446 96.6 108.7 9 102.820 91.228 88.9 100.6 10 95.435 93.085 75.1 81.0 11 92.424 98.801 76.9 68.6 12 94.535 102.908 84.6 70.9 13 98.757 98.756 90.6 81.4 14 105.797 95.119 103.1 102.3 15 100.225 98.451 105.1 105.0 16 103.522 86.498 96.4 110.5 17 99.929 104.016 104.4 92.5 18 105.223 105.769 110.7 89.3 19 106.232 113.490 127.1 93.0 20B34sreturn$B34seend$B34sexec simeq printsys reduced ols liml ls2 ls3 ils3 kcov=diag ipr=6$ Heading=('Test Case from Kmenta (1971) Pages 565 - 582 ' ) $ Exogenous constant d f a $ Endogenous p q $ Model lvar=q rvar=(constant p d) Name=('Demand Equation')$ Model lvar=q rvar=(constant p f a) name=('Supply Equation')$ B34seend$

==

4-18 Chapter 4

The OLS results are as follows:

Test Case from Kmenta (1971) Pages 565 - 582

Summary of Input Parameters and Model Number of systems to be estimated - - - 2 Number of identities - - - - - - - - - 0 Number of exogenous variables - - - - 4 Number of endogenous variables - - - - 2 Number of data points in time - - - - - 20 Maximum number of unknowns per system - 4 Print Parameter - - - - - - - - - - - - 2 Solutions wanted 0 => no, 1 => yes - - Reduced form coefficients - - - - - - - 1 Ordinary Least Squares - - - - - - - - 1 LIMLE Solution - - - - - - - - - - - - 1 Two Stage Least Squares - - - - - - - - 1 Three Stage Least Squares - - - - - - - 1 Three Stage Covariance Matrix - - - - - 1 Iterated Three Stage Least Squares - - 1 Covariance Matrix for I3SLSQ - - - - - 1 Maximum number of iterations - - - - - 25 Functional Minimization 3SLSQ - - - - - 0 Covariance Matrix for Functional Min. - 0

Systems described by the following columns of data (Variables)

Name of the System LHS No. X NO. Y

Demand Equation 2 Q 2 1 1 CONSTANT 1 P 2 D * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

Supply Equation 2 Q 3 1 1 CONSTANT 1 P 3 F 4 A * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

B34S 8.10R (D:M:Y) 11/ 4/04 (H:M:S) 11:13:19 SIMEQ STEP PAGE 4


Least Squares Solution for System Number 1 Demand Equation

Condition Number of Matrix is greater than 21.04911571706159 Relative Numerical Error in the Solution 1.301987681166638E-11

LHS Endogenous Variable No. 2 Q

Exogenous Variables (Predetermined) Std. Error t 1 CONSTANT 99.89542 7.519362 13.28509 2 D 0.3346356 0.4542183E-01 7.367285

Endogenous Variables (Jointly Dependent) Std. Error t 3 P -0.3162988 0.9067741E-01 -3.488177

Residual Variance for Structural Disturbances 3.725391173733892 Ratio of Norm Residual to Norm LHS 1.762488253954560E-02

Covariance Matrix of Estimated Parameters

CONSTANT D P 1 2 3 CONSTANT 1 56.54 D 2 0.3216E-01 0.2063E-02 P 3 -0.5948 -0.2333E-02 0.8222E-02

Correlation Matrix of Estimated Parameters

CONSTANT D P 1 2 3 CONSTANT 1 1.000 D 2 0.9417E-01 1.000 P 3 -0.8724 -0.5665 1.000


Least Squares Solution for System Number 2 Supply Equation



Exogenous Variables (Predetermined) Std. Error t


1 CONSTANT 58.27543 11.46291 5.083825 2 F 0.2481333 0.4618785E-01 5.372263 3 A 0.2483023 0.9751777E-01 2.546227

Endogenous Variables (Jointly Dependent) Std. Error t 4 P 0.1603666 0.9488394E-01 1.690134



CONSTANT F A P 1 2 3 4 CONSTANT 1 131.4 F 2 -0.3044 0.2133E-02 A 3 -0.2792 0.1316E-02 0.9510E-02 P 4 -0.9875 0.8440E-03 0.5220E-03 0.9003E-02


CONSTANT F A P 1 2 3 4 CONSTANT 1 1.000 F 2 -0.5749 1.000 A 3 -0.2498 0.2921 1.000 P 4 -0.9079 0.1926 0.5642E-01 1.000


Contemporaneous Covariance of Residuals (Structural Disturbances) For Least Squares Solution.

Condition Number of residual columns, 2.664758

Demand E Supply E 1 2 Demand E 1 3.167 Supply E 2 3.411 4.628

Correlation Matrix of Residuals



Coefficients of the Reduced Form Equations.

Least Squares Solution. Condition number of matrix used to find the reduced form coefficients is no smaller than 4.195815340351579

P Q 1 2 CONSTANT 1 87.31 72.28 D 2 0.7020 0.1126 F 3 -0.5206 0.1647 A 4 -0.5209 0.1648

Mean sum of squares of residuals for the reduced form equations.

1 P 0.42748D+01 2 Q 0.39192D+01

Condition Number of columns of exogenous variables, 11.845

For each estimated equation, the condition number of the matrix, equation (4.1-7), and the relative numerical errors in the solution, equation (4.1-8), are given. The relative numerical errors for the supply and demand equations were .1302E-10 and .13187E-10, respectively. Estimated coefficients agree with Kmenta (1971, 582). From the estimated B and Γ coefficients, the constrained reduced form π coefficients are calculated. The condition number of the exogenous columns, .11845E+2, shows little multicollinearity among the exogenous variables. The next outputs show the corresponding estimates for LIML, 2SLS, and 3LSL. As was discussed earlier, since the asymptotic SEs for LIML are the same as for 2SLS, the simeq command does not print these values.

4-20 Chapter 4


Limited Information - Maximum Likelihood Solution f 1 Demand Equation

Rank and Condition Number of Exogenous Columns 2 8.5174634 Rank and Condition Number of Endogenous Variables orthogonal to X(K) 2 6.5593694 Rank and Condition Number of Endogenous Variables orthogonal to X 2 2.3005812

Value of LIML Parameter is 1.173867141559841



Standard Deviation Equals 2SLSQ Standard Deviation.

Exogenous Variables (Predetermined) 1 CONSTANT 93.61922 2 D 0.3100134

Endogenous Variables (Jointly Dependent) 3 P -0.2295381



Limited Information - Maximum Likelihood Solution f 2 Supply Equation






Exogenous Variables (Predetermined) 1 CONSTANT 49.53244 2 F 0.2556057 3 A 0.2529242

Endogenous Variables (Jointly Dependent) 4 P 0.2400758



Contemporaneous Covariance of Residuals (Structural Disturbances) For LIMLE Solution.







LIMLE Solution. Condition number of matrix used to find the reduced form coefficients is no smaller than 4.258817996669486

P Q 1 2 CONSTANT 1 93.88 72.07 D 2 0.6601 0.1585 F 3 -0.5443 0.1249 A 4 -0.5386 0.1236



1 P 0.41286D+01 2 Q 0.38401D+01


Two Stage Least Squares Solution for System Number 1 Demand Equation



Exogenous Variables (Predetermined) Std. Error t Theil SE Theil t 1 CONSTANT 94.63330 7.920838 11.94738 7.302652 12.95876 2 D 0.3139918 0.4694366E-01 6.688695 0.4327991E-01 7.254908

Endogenous Variables (Jointly Dependent) Std. Error t Theil SE Theil t 3 P -0.2435565 0.9648429E-01 -2.524313 0.8895412E-01 -2.738002





CONSTANT D P 1 2 3 CONSTANT 1 1.000 D 2 0.1326 1.000 P 3 -0.8812 -0.5833 1.000


Two Stage Least Squares Solution for System Number 2 Supply Equation



Exogenous Variables (Predetermined) Std. Error t Theil SE Theil t 1 CONSTANT 49.53244 12.01053 4.124086 10.74254 4.610868 2 F 0.2556057 0.4725007E-01 5.409637 0.4226175E-01 6.048158 3 A 0.2529242 0.9965509E-01 2.537996 0.8913422E-01 2.837565

Endogenous Variables (Jointly Dependent) Std. Error t Theil SE Theil t 4 P 0.2400758 0.9993385E-01 2.402347 0.8938355E-01 2.685905





CONSTANT F A P 1 2 3 4 CONSTANT 1 1.000 F 2 -0.5706 1.000 A 3 -0.2467 0.2924 1.000 P 4 -0.9126 0.1983 0.5815E-01 1.000


Contemporaneous Covariance of Residuals (Structural Disturbances) For Two Stage Least Squares Solution.




4-22 Chapter 4




Two Stage Least Squares Solution Condition number of matrix used to find the reduced form coefficients is no smaller than 4.135372945327849

P Q 1 2 CONSTANT 1 93.25 71.92 D 2 0.6492 0.1559 F 3 -0.5285 0.1287 A 4 -0.5230 0.1274


1 P 0.39831D+01 2 Q 0.38317D+01

Condition number of the large matrix in Three Stage Least Squares 60.70221



Three Stage Least Squares Solution for System Number 1 Demand Equation




Residual Variance (For Structural Disturbances) 3.286454

Three Stage Least Squares Covariance for System Demand Equation


Three Stage Least Squares Solution for System Number 2 Supply Equation





Three Stage Least Squares Covariance for System Supply Equation



Contemporaneous Covariance of Residuals (Structural Disturbances) For Three Stage Least Squares Solution.







Three Stage Least Squares Solution using Orthogonal Factorization. Condition number of matrix used to find the reduced form coefficients is no smaller than 4.232905401139098

P Q 1 2 CONSTANT 1 89.98 72.72 D 2 0.6645 0.1521 F 3 -0.4846 0.1180 A 4 -0.7575 0.1845


1 P 0.19065D+01 2 Q 0.42494D+01

Iterated Three Stage Least Squares Results are given next.

4-24 Chapter 4

Iteration begins for Iterated 3SLSQ.

Condition number of the large matrix in Three Stage Least Squares 147.2220


Iterated Three Stage Least Squares Solution for System No. 1 Demand Equation





Iterated Three Stage Least Squares Covariance for System Demand Equation


Iterated Three Stage Least Squares Solution for System No. 2 Supply Equation



Endogenous Variables (Jointly Dependent) Std. Error t Theil SE Theil t 4 P 0.2270569 0.1069194 2.123627 0.9563159E-01 2.374287


Iterated Three Stage Least Squares Covariance for System Supply Equation



Contemporaneous Covariance of Residuals (Structural Disturbances) For Iterated Three Stage Least Squares Solution.







Iterated Three Stage Least Squares Solution. Condition number of matrix used to find the reduced form coefficients is no smaller than 4.249772824974006

P Q 1 2 CONSTANT 1 89.42 72.86 D 2 0.6672 0.1515 F 3 -0.4770 0.1162 A 4 -0.7981 0.1944


1 P 0.20576D+01


2 Q 0.43519D+01

In the Kmenta test problem, one equation (demand) was overidentified and one equation (supply) was exactly identified. As was mentioned earlier, the 2SLS and 3SLS results for the overidentified equation are the same because the other equation was exactly identified. However, the 3SLS results for the exactly identified equation (supply) differ from the 2SLS results because the other equation (demand) is over identified. Close inspection of the results for 3SLS for the demand equation shows that they are the same as those of Kmenta (1971, 582) and Kmenta (1986, 712). The supply-equation results are the same as those of Kmenta (1971) but differ slightly from those of Kmenta (1986), which appear to be in error.10 To facilitate testing, SAS and RATS setups are shown in Tables 4.2 and 4.3 and their output discussed in some detail.

Table 4.4 SAS Implementation of the Kmenta Model

B34SEXEC OPTIONS OPEN('testsas.sas') UNIT(29) DISP=UNKNOWN$ B34SRUN$B34SEXEC OPTIONS CLEAN(29) $ B34SEEND$B34SEXEC PGMCALL IDATA=29 ICNTRL=29$ SAS $PGMCARDS$proc means; run;

proc syslin 3sls reduced; instruments d f a constant;endogenous p q;

demand: model q = p d;supply: model q = p f a;run;

proc syslin it3sls reduced; instruments d f a constant;endogenous p q;

demand: model q = p d;supply: model q = p f a;run;

B34SRETURN$B34SRUN $B34SEXEC OPTIONS CLOSE(29)$ B34SRUN$

/$ The next card has to be modified to point to SAS location/$ Be sure and wait until SAS gets done before letting B34S resume

B34SEXEC OPTIONS dodos('start /w /r sas testsas') dounix('sas testsas')$ B34SRUN$B34SEXEC OPTIONS NPAGEOUT NOHEADER WRITEOUT(' ','Output from SAS',' ',' ') WRITELOG(' ','Output from SAS',' ',' ') COPYFOUT('testsas.lst') COPYFLOG('testsas.log') dodos('erase testsas.sas','erase testsas.lst','erase testsas.log') dounix('rm testsas.sas','rm testsas.lst','rm testsas.log')$

10 The file example.mac contains an extension of the above test case that calls RATS, SAS and a B34S matrix implementation. For the supply equation SAS gets the Kmenta (1986) results which are 52.1972 (11.8934), .2286 (.0997), .2282 (.0440), (.3611). What RATS calls 3SLS produces what B34S calls I3SLS. Readers are encouraged to use the code in tables 4.4 and 4.5 to further investigate this issue. A major difficulty for the researcher to be able to tell exactly what is being estimated by a software system. For this reason attempting the model on multiple software systems is strongly advised.

4-26 Chapter 4

B34SRUN$

Simultaneous Equations Systems 27

Table 4.5 RATS Implementation of the Kmenta Model

B34SEXEC OPTIONS HEADER$ B34SRUN$

b34sexec options open('rats.dat') unit(28) disp=unknown$ b34srun$b34sexec options open('rats.in') unit(29) disp=unknown$ b34srun$b34sexec options clean(28)$ b34srun$b34sexec options clean(29)$ b34srun$

b34sexec pgmcall$ rats passastspcomments('* ', '* Data passed from B34S(r) system to RATS', '* ', "display @1 %dateandtime() @33 ' Rats Version ' %ratsversion()" '* ') $

PGMCARDS$** heading=('test case from kmenta 1971 page 565 - 582 ' ) $* exogenous constant d f a $* endogenous p q $* model lvar=q rvar=(constant p d) name=('demand eq.') $* model lvar=q rvar=(constant p f a) name=('supply eq.') $

linreg q# constant p dlinreg q# constant p f a

instruments constant d f alinreg(inst) q# constant p dlinreg(inst) q# constant p f a

source d:\r\liml.src@liml q# constant p d@liml q# constant p f a

equation demand q# constant p dequation supply q# constant p f a

* Supply does not match known answers!!

sur(inst,iterations=200) 2# demand resid1# supply resid2

nonlin(parmset=structural) c0 c1 c2 d0 d1 d2 d3

compute c0 = .1compute c1 = .1compute c2 = .1compute d0 = .1compute d1 = .1compute d2 = .1compute d3 = .1

frml d_eq q = c0 + c1*p + c2*dfrml s_eq q = d0 + d1*p + d2*f + d3*anlsystem(inst,parmset=structural,outsigma=v) * * d_eq s_eq

28 Chapter 4

b34sreturn$b34srun $

b34sexec options close(28)$ b34srun$b34sexec options close(29)$ b34srun$b34sexec options/$ dodos(' rats386 rats.in rats.out ') dodos('start /w /r rats32s rats.in /run') dounix('rats rats.in rats.out')$ B34SRUN$

b34sexec options npageout WRITEOUT('Output from RATS',' ',' ') COPYFOUT('rats.out') dodos('ERASE rats.in','ERASE rats.out','ERASE rats.dat') dounix('rm rats.in','rm rats.out','rm rats.dat') $ B34SRUN$

As noted earlier, the 2SLS and 3SLS results for the overidentified equation (demand) are the same. However, the printout shows that the residual variance for the 2SLS result is 3.8664, while the residual variance for the 3SLS result is 3.2865. The reason for this apparent error is that the 2SLS residual variance equals the sum of squared residuals divided by T-K, while the 3SLS calculation uses T; hence, 3.8664 = 3.2865 *(20/17).

To investigate the differences in the supply equation that occur in Kmenta (1971) and (1986), edited and annotated SAS and RATS output is shown next. SAS 3SLS and I3SLS output is shown to agree with Kmenta (1986) for both demand and supply equations.

The SYSLIN Procedure Three-Stage Least Squares Estimation

Parameter Estimates

Parameter Standard Variable DF Estimate Error t Value Pr > |t|

Intercept 1 94.63330 7.920838 11.95 <.0001 P 1 -0.24356 0.096484 -2.52 0.0218 D 1 0.313992 0.046944 6.69 <.0001

Model SUPPLY Dependent Variable Q

Parameter Estimates


Intercept 1 52.19720 11.89337 4.39 0.0005 P 1 0.228589 0.099673 2.29 0.0357 F 1 0.228158 0.043994 5.19 <.0001 A 1 0.361138 0.072889 4.95 0.0001

Endogenous Variables

P Q

DEMAND 0.243557 1 SUPPLY -0.22859 1

Exogenous Variables

Intercept D F A

DEMAND 94.6333 0.313992 0 0 SUPPLY 52.1972 0 0.228158 0.361138


Inverse Endogenous Variables

DEMAND SUPPLY

P 2.11799 -2.11799 Q 0.48415 0.51585

30 Chapter 4

The SYSLIN Procedure Three-Stage Least Squares Estimation

Reduced Form

Intercept D F A

P 89.87924 0.665032 -0.48324 -0.76489 Q 72.74263 0.152019 0.117695 0.186293

The SYSLIN Procedure Iterative Three-Stage Least Squares Estimation

Parameter Estimates


Intercept 1 94.63330 7.920838 11.95 <.0001 P 1 -0.24356 0.096484 -2.52 0.0218 D 1 0.313992 0.046944 6.69 <.0001


Parameter Estimates


Intercept 1 52.66182 12.80511 4.11 0.0008 P 1 0.226586 0.107459 2.11 0.0511 F 1 0.223372 0.046774 4.78 0.0002 A 1 0.380006 0.072010 5.28 <.0001

Endogenous Variables

P Q

DEMAND 0.243557 1 SUPPLY -0.22659 1

Exogenous Variables

Intercept D F A

DEMAND 94.6333 0.313992 0 0 SUPPLY 52.66182 0 0.223372 0.380006

Inverse Endogenous Variables

DEMAND SUPPLY

P 2.127012 -2.12701 Q 0.481952 0.518048

The SYSLIN Procedure Iterative Three-Stage Least Squares Estimation

Reduced Form

Intercept D F A

P 89.27387 0.667864 -0.47512 -0.80828 Q 72.89007 0.151329 0.115718 0.196861

RATS output is shown next for OLS, 2SLS, LIML, and 3SLS two ways. Note that for the supply equation the estimated coefficients, SE’s, t’s and probabilities were:

Constant 52.552667563 11.395623960 4.61165 0.00000399 P 0.227056969 0.095630772 2.37431 0.01758185 F 0.224496638 0.041626039 5.39318 0.00000007 A 0.375573566 0.064094682 5.85967 0.00000000

Which are very close to the B34S I3SLS results which are duplicated below

Exogenous Variables (Predetermined) Std. Error t Theil SE Theil t


1 CONSTANT 52.55269 12.74080 4.124755 11.39572 4.611616 2 F 0.2244964 0.4653972E-01 4.823758 0.4162639E-01 5.393126 3 A 0.3755747 0.7166061E-01 5.241020 0.6409520E-01 5.859638

Endogenous Variables (Jointly Dependent) Std. Error t Theil SE Theil t 4 P 0.2270569 0.1069194 2.123627 0.9563159E-01 2.374287

These results are not at all like the SAS supply equation 3SLS results of


Parameter Estimates


Intercept 1 52.19720 11.89337 4.39 0.0005 P 1 0.228589 0.099673 2.29 0.0357 F 1 0.228158 0.043994 5.19 <.0001 A 1 0.361138 0.072889 4.95 0.0001

And I3SLS results of:

Parameter Estimates


Intercept 1 52.66182 12.80511 4.11 0.0008 P 1 0.226586 0.107459 2.11 0.0511 F 1 0.223372 0.046774 4.78 0.0002 A 1 0.380006 0.072010 5.28 <.0001

that agree with Kmenta (1986) but not with Kmenta (1971). The full RATS output is shown below calculating 3SLS two different ways.

linreg q # constant p d

Linear Regression - Estimation by Least Squares Dependent Variable Q Usable Observations 20 Degrees of Freedom 17 Centered R**2 0.763789 R Bar **2 0.735999 Uncentered R**2 0.999689 T x R**2 19.994 Mean of Dependent Variable 100.89820000 Std Error of Dependent Variable 3.75649822 Standard Error of Estimate 1.93012724 Sum of Squared Residuals 63.331649953 Regression F(2,17) 27.4847 Significance Level of F 0.00000471 Log Likelihood -39.90530 Durbin-Watson Statistic 1.744203

Variable Coeff Std Error T-Stat Signif *******************************************************************************1. Constant 99.89542291 7.51936214 13.28509 0.00000000 2. P -0.31629880 0.09067741 -3.48818 0.00281529 3. D 0.33463560 0.04542183 7.36729 0.00000110

linreg q # constant p f a

Linear Regression - Estimation by Least Squares Dependent Variable Q Usable Observations 20 Degrees of Freedom 16 Centered R**2 0.654807 R Bar **2 0.590084 Uncentered R**2 0.999546 T x R**2 19.991 Mean of Dependent Variable 100.89820000 Std Error of Dependent Variable 3.75649822 Standard Error of Estimate 2.40508651 Sum of Squared Residuals 92.551058175 Regression F(3,16) 10.1170 Significance Level of F 0.00056018 Log Likelihood -43.69905 Durbin-Watson Statistic 2.109731

Variable Coeff Std Error T-Stat Signif *******************************************************************************

32 Chapter 4

1. Constant 58.275431202 11.462909888 5.08383 0.00011056 2. P 0.160366596 0.094883937 1.69013 0.11038810 3. F 0.248133295 0.046187854 5.37226 0.00006227 4. A 0.248302347 0.097517767 2.54623 0.02156713

instruments constant d f a linreg(inst) q # constant p d

Linear Regression - Estimation by Instrumental Variables Dependent Variable Q Usable Observations 20 Degrees of Freedom 17 Mean of Dependent Variable 100.89820000 Std Error of Dependent Variable 3.75649822 Standard Error of Estimate 1.96632066 Sum of Squared Residuals 65.729087795 J-Specification(1) 2.535651 Significance Level of J 0.11130095 Durbin-Watson Statistic 2.009220

Variable Coeff Std Error T-Stat Signif ******************************************************************************* 1. Constant 94.63330387 7.92083831 11.94738 0.00000000 2. P -0.24355654 0.09648429 -2.52431 0.02183240 3. D 0.31399179 0.04694366 6.68869 0.00000381

linreg(inst) q # constant p f a

Linear Regression - Estimation by Instrumental Variables Dependent Variable Q Usable Observations 20 Degrees of Freedom 16 Mean of Dependent Variable 100.89820000 Std Error of Dependent Variable 3.75649822 Standard Error of Estimate 2.45755523 Sum of Squared Residuals 96.633243702 Durbin-Watson Statistic 2.384645

Variable Coeff Std Error T-Stat Signif ******************************************************************************* 1. Constant 49.532441699 12.010526407 4.12409 0.00079536 2. P 0.240075779 0.099933852 2.40235 0.02878451 3. F 0.255605724 0.047250071 5.40964 0.00005785 4. A 0.252924175 0.099655087 2.53800 0.02192877

source d:\r\liml.src @liml q # constant p d

Linear Regression - Estimation by LIML Dependent Variable Q Usable Observations 20 Degrees of Freedom 17 Centered R**2 0.751068 R Bar **2 0.721782 Uncentered R**2 0.999673 T x R**2 19.993 Mean of Dependent Variable 100.89820000 Std Error of Dependent Variable 3.75649822 Standard Error of Estimate 1.98141608 Sum of Squared Residuals 66.742164700 Regression F(2,17) 25.6459 Significance Level of F 0.00000736 Log Likelihood -40.42982 Durbin-Watson Statistic 2.051725


LIML Specification Test Chi-Squared(1)= 3.477343 with Significance Level 0.06221456 @liml q # constant p f a

Linear Regression - Estimation by LIML Dependent Variable Q Usable Observations 20 Degrees of Freedom 16 Centered R**2 0.639582 R Bar **2 0.572004 Uncentered R**2 0.999526 T x R**2 19.991 Mean of Dependent Variable 100.89820000 Std Error of Dependent Variable 3.75649822 Standard Error of Estimate 2.45755523 Sum of Squared Residuals 96.633243702 Regression F(3,16) 9.4643 Significance Level of F 0.00078341 Log Likelihood -44.13068 Durbin-Watson Statistic 2.384645

Variable Coeff Std Error T-Stat Signif *******************************************************************************


1. Constant 49.532441699 12.010526407 4.12409 0.00079536 2. P 0.240075779 0.099933852 2.40235 0.02878451 3. F 0.255605724 0.047250071 5.40964 0.00005785 4. A 0.252924175 0.099655087 2.53800 0.02192877

LIML Specification Test Chi-Squared(0)= 0.000000 with Significance Level NA equation demand q # constant p d equation supply q # constant p f a * Supply does not match known answers!! sur(inst,iterations=200) 2 # demand resid1 # supply resid2

Linear Systems - Estimation by System Instrumental Variables

Iterations Taken 6 Usable Observations 20

Dependent Variable Q Mean of Dependent Variable 100.89820000 Std Error of Dependent Variable 3.75649822 Standard Error of Estimate 1.65490543 Sum of Squared Residuals 65.729087795 Durbin-Watson Statistic 2.009220



Variable Coeff Std Error T-Stat Signif ******************************************************************************* 4. Constant 52.552667563 11.395623960 4.61165 0.00000399 5. P 0.227056969 0.095630772 2.37431 0.01758185 6. F 0.224496638 0.041626039 5.39318 0.00000007 7. A 0.375573566 0.064094682 5.85967 0.00000000

Covariance\Correlation Matrix of Residuals Q QQ 3.286454389751 0.9815996605 Q 4.197924168364 5.565097402593

nonlin(parmset=structural) c0 c1 c2 d0 d1 d2 d3 compute c0 = .1 compute c1 = .1 compute c2 = .1 compute d0 = .1 compute d1 = .1 compute d2 = .1 compute d3 = .1 frml d_eq q = c0 + c1*p + c2*d frml s_eq q = d0 + d1*p + d2*f + d3*a nlsystem(inst,parmset=structural,outsigma=v) * * d_eq s_eq

GMM-No ZU Dependence Convergence in 6 Iterations. Final criterion was 0.0000065 < 0.0000100 Usable Observations 20 Function Value 2.98311941 J-Specification(1) 2.983119 Significance Level of J 0.08413697



34 Chapter 4

Variable Coeff Std Error T-Stat Signif ******************************************************************************* 1. C0 94.63330387 7.30265212 12.95876 0.00000000 2. C1 -0.24355654 0.08895412 -2.73800 0.00618138 3. C2 0.31399179 0.04327991 7.25491 0.00000000 4. D0 52.55266757 11.39562399 4.61165 0.00000399 5. D1 0.22705697 0.09563077 2.37431 0.01758185 6. D2 0.22449664 0.04162604 5.39318 0.00000007 7. D3 0.37557357 0.06409468 5.85967 0.00000000

4.4 Exactly identified systems

Table 4.7 shows the Kmenta supply and demand model modified to be exactly identified. In this form of the model the exogenous variable a was removed from the demand equation. In this case can be directly estimated with OLS and does not have to be calculated as using (4.1-4). It will be shown below that the LIML, 2SLS and 3SLS results are all the same. If

is calculated from the biased OLS model over identified system, it will, however, not be the same.

Table 4.6 Exactly Identified Kmenta Problem

/; Modified PROBLEM FROM KMENTA (1971) PAGE 565 - 582b34sexec options ginclude('b34sdata.mac') member(kmenta); b34srun;b34sexec simeq printsys reduced ols liml ls2 ls3 ils3 icov ipr=6itmax=2000 kcov=diag ; heading=('Modified test case from kmenta 1971 pp 565-582' ) ;* the variable a has been removed from demand equation ; exogenous constant d f ; endogenous p q ; model lvar=q rvar=(constant p d) name=('demand eq.') ; model lvar=q rvar=(constant p f) name=('supply eq.') ; b34seend ;

b34sexec matrix;call loaddata;call olsq(q d f :print);call olsq(p d f :print);b34srun;

Edited output from running the code in Table 4.6 is shown below and will show alternative ways to calculate the constrained reduced form:

Q = 71.7276 + .18278 D + .11739 F (4.4-1) (15.93) (3.86) (2.67)


P = 82.1843 + .4346 D - .28520 F (4.4-2) (10.19) (4.95) (-3.49)

which was estimated in (4.4-1) and (4.4-2) with OLS.

Modified test case from kmenta 1971 pp 565-582

Least Squares Solution for System Number 1 demand eq.



Exogenous Variables (Predetermined) Std. Error t 1 CONSTANT 99.89542 7.519362 13.28509 2 D 0.3346356 0.4542183E-01 7.367285

Endogenous Variables (Jointly Dependent) Std. Error t 3 P -0.3162988 0.9067741E-01 -3.488177



Least Squares Solution for System Number 2 supply eq.



Exogenous Variables (Predetermined) Std. Error t 1 CONSTANT 65.56501 12.76481 5.136387 2 F 0.2137827 0.5080064E-01 4.208269

Endogenous Variables (Jointly Dependent) Std. Error t 3 P 0.1467363 0.1089446 1.346889




Least Squares Solution. Condition number of matrix used to find the reduced form coefficients is no smaller than 4.319326581036200

P Q 1 2 CONSTANT 1 74.14 76.44 D 2 0.7227 0.1060 F 3 -0.4617 0.1460


1 P 0.21861D+02 2 Q 0.44308D+01

Condition Number of columns of exogenous variables, 9.7857


Limited Information - Maximum Likelihood Solution f 1 demand eq.






Exogenous Variables (Predetermined) 1 CONSTANT 106.7894 2 D 0.3616812

36 Chapter 4

Endogenous Variables (Jointly Dependent) 3 P -0.4115989



Limited Information - Maximum Likelihood Solution f 2 supply eq.






Exogenous Variables (Predetermined) 1 CONSTANT 35.90387 2 F 0.2373297

Endogenous Variables (Jointly Dependent) 3 P 0.4205434





LIMLE Solution. Condition number of matrix used to find the reduced form coefficients is no smaller than 2.403435013906650

P Q 1 2 CONSTANT 1 85.18 71.73 D 2 0.4346 0.1828 F 3 -0.2852 0.1174


Two Stage Least Squares Solution for System Number 1 demand eq.




Endogenous Variables (Jointly Dependent) Std. Error t Theil SE Theil t 3 P -0.4115989 0.1448445 -2.841660 0.1335401 -3.082213



Two Stage Least Squares Solution for System Number 2 supply eq.



Exogenous Variables (Predetermined) Std. Error t Theil SE Theil t 1 CONSTANT 35.90387 18.86754 1.902944 17.39501 2.064032 2 F 0.2373297 0.6019217E-01 3.942866 0.5549444E-01 4.276639

Endogenous Variables (Jointly Dependent) Std. Error t Theil SE Theil t 3 P 0.4205434 0.1660421 2.532751 0.1530833 2.747154





Two Stage Least Squares Solution Condition number of matrix used to find the reduced form coefficients is no smaller than 2.403435013906650

P Q 1 2 CONSTANT 1 85.18 71.73 D 2 0.4346 0.1828 F 3 -0.2852 0.1174


Three Stage Least Squares Solution for System Number 1 demand eq.



Endogenous Variables (Jointly Dependent) Std. Error t Theil SE Theil t 3 P -0.4115989 0.1448445 -2.841660 0.1335401 -3.082213


Three Stage Least Squares Solution for System Number 2 supply eq.


Exogenous Variables (Predetermined) Std. Error t Theil SE Theil t 1 CONSTANT 35.90387 18.86754 1.902944 17.39501 2.064032 2 F 0.2373297 0.6019217E-01 3.942866 0.5549444E-01 4.276639

Endogenous Variables (Jointly Dependent) Std. Error t Theil SE Theil t 3 P 0.4205434 0.1660421 2.532751 0.1530833 2.747154



Three Stage Least Squares Solution using Orthogonal Factorization. Condition number of matrix used to find the reduced form coefficients is no smaller than 2.403435013906646

P Q 1 2 CONSTANT 1 85.18 71.73 D 2 0.4346 0.1828 F 3 -0.2852 0.1174

Note that the following OLS regressions successfully replicate the constrained reduced form values calculated by LIML, 2SLS and 3SLS models. In such exactly identified models it is possible to proceed from the reduced form to the coefficients of the estimated simultaneous structural model as shown in Table 4.1 for the theoretical model. B34S(r) Matrix Command. d/m/y 13/ 5/08. h:m:s 8: 9:49.

=> CALL LOADDATA$

=> CALL OLSQ(Q D F :PRINT)$

Ordinary Least Squares Estimation Dependent variable Q Centered R**2 0.7142164973143195 Adjusted R**2 0.6805949087630629 Residual Sum of Squares 76.62264354549249 Residual Variance 4.507214326205441 Standard Error 2.123020095572682 Total Sum of Squares 268.1142991999998 Log Likelihood -41.81037433562074 Mean of the Dependent Variable 100.8982000000000 Std. Error of Dependent Variable 3.756498223780113 Sum Absolute Residuals 32.24420684107891 F( 2, 17) 21.24279452844673 F Significance 0.9999762143066244 1/Condition XPX 5.775396842473943E-07 Maximum Absolute Residual 4.421086526017319 Number of Observations 20

Variable Lag Coefficient SE t D 0 0.18278440 0.47299583E-01 3.8643977 F 0 0.11738935 0.44030665E-01 2.6660816

38 Chapter 4

CONSTANT 0 71.727578 4.5035392 15.926935

=> CALL OLSQ(P D F :PRINT)$

Ordinary Least Squares Estimation Dependent variable P Centered R**2 0.6043888119424351 Adjusted R**2 0.5578463192297805 Residual Sum of Squares 263.9721582328006 Residual Variance 15.52777401369415 Standard Error 3.940529661567611 Total Sum of Squares 667.2514989500000 Log Likelihood -54.17988429200851 Mean of the Dependent Variable 100.0190500000000 Std. Error of Dependent Variable 5.926086393627488 Sum Absolute Residuals 56.50496104816216 F( 2, 17) 12.98574220495295 F Significance 0.9996226165906434 1/Condition XPX 5.775396842473943E-07 Maximum Absolute Residual 9.070540097816391 Number of Observations 20

Variable Lag Coefficient SE t D 0 0.43463860 0.87792579E-01 4.9507442 F 0 -0.28520325 0.81725152E-01 -3.4897855 CONSTANT 0 85.184338 8.3590023 10.190730

4.5 Analysis of OLS, 2SLS and 3SLS using Matrix Command

The matrix command, documented in Chapter 16, provides a means by which to illustrate the estimation of OLS, 2SLS and 3SLS models using “classic textbook” formulas.


Table 4.7 shows code that implements OLS, 2SLS, 3SLS and FIML estimation using these formulas:

Table 4.7 Matrix Command Implementation of OLS, 2SLS, 3SLS and FIML/$/$ Estimates Kmenta Problem with Matrix command./$ Purpose is to illustrate OLS/2SLS/3SLS/FIML both with/$ SIMEQ and with Matrix Commands./$/$ FIML SE same as 3SLS asymptotically (See Greene 5e page 408)/$/$ Problem Discussed in "Specifying and Diagnostically Testing/$ Econometric Models" Chapter 4 Third Edition/$%b34slet verbose=0;/$ set =1 to "test" matrix setup. Usually set=0%b34slet dosimeq=1;/$ set =1 to run the SIMEQ command as well as matrix

B34SEXEC DATA NOHEAD CORR$ INPUT Q P D F A $ LABEL Q = 'Food consumption per head'$ LABEL P = 'Ratio of Food Prices to consumer prices'$ LABEL D = 'Disposable Income in constant prices'$ LABEL F = 'Ratio of T-1 years price to general P'$ LABEL A = 'Time'$ COMMENT=('KMENTA(1971) PAGE 565 ANSWERS PAGE 582')$ DATACARDS$ 98.485 100.323 87.4 98.0 1 99.187 104.264 97.6 99.1 2 102.163 103.435 96.7 99.1 3 101.504 104.506 98.2 98.1 4 104.240 98.001 99.8 110.8 5 103.243 99.456 100.5 108.2 6 103.993 101.066 103.2 105.6 7 99.900 104.763 107.8 109.8 8 100.350 96.446 96.6 108.7 9 102.820 91.228 88.9 100.6 10 95.435 93.085 75.1 81.0 11 92.424 98.801 76.9 68.6 12 94.535 102.908 84.6 70.9 13 98.757 98.756 90.6 81.4 14 105.797 95.119 103.1 102.3 15 100.225 98.451 105.1 105.0 16 103.522 86.498 96.4 110.5 17 99.929 104.016 104.4 92.5 18 105.223 105.769 110.7 89.3 19 106.232 113.490 127.1 93.0 20B34SRETURN$B34SEEND$

%b34sif(&dosimeq.eq.1)%then;B34SEXEC SIMEQ PRINTSYS REDUCED OLS LIML LS2 LS3 FIML FIMLC KCOV=DIAG IPR=6$ HEADING=('Test Case from Kmenta (1971) Pages 565 - 582 ' ) $ EXOGENOUS CONSTANT D F A $ ENDOGENOUS P Q $ MODEL LVAR=Q RVAR=(CONSTANT P D) NAME=('Demand Equation')$ MODEL LVAR=Q RVAR=(CONSTANT P F A) NAME=('Supply Equation')$ B34SEEND$%b34sendif;

b34sexec matrix;call loaddata;verbose=0;

%b34sif(&verbose.ne.0)%then;verbose=1;

40 Chapter 4

%b34sendif;

x_1=mfam(catcol(constant p d));x_2=mfam(catcol(constant p f a));x_1px_1=transpose(x_1)*x_1;x_2px_2=transpose(x_2)*x_2;x_1py_1=transpose(x_1)*vfam(q);x_2py_2=transpose(x_2)*vfam(q);

d1=inv(x_1px_1)*x_1py_1;d2=inv(x_2px_2)*x_2py_2;call print('OLS eq 1 ',d1 );call print('OLS eq 2 ',d2 );

* 2SLS ;* z_i is right hand side of equation i ;x = mfam(catcol(constant d f a));xpx = transpose(x)*x;z_1 = mfam(catcol(constant p d) );z_2 = mfam(catcol(constant p f a));xpz_1 = transpose(x)*z_1;xpz_2 = transpose(x)*z_2;xpy_1 = transpose(x)*vfam(q);xpy_2 = transpose(x)*vfam(q);y_1py_1 = vfam(q)*vfam(q);y_2py_2 = vfam(q)*vfam(q);y_1py_2 = vfam(q)*vfam(q);

ls2eq1=inv(transpose(xpz_1)*inv(xpx)*xpz_1)* (transpose(xpz_1)*inv(xpx)*xpy_1);call print('Two stage estimates Equation 1',ls2eq1);fit1=vfam(q)-z_1*ls2eq1;

sigma11=(y_1py_1 - (2.*vfam(q)*z_1*ls2eq1) + ls2eq1*transpose(z_1)*z_1*ls2eq1)/17.;

if(verbose.ne.0)then;call print('sigma11 ',sigma11:);call print('Residual Variance 1',sigma11*sigma11:);call print('Test 1 ',(fit1*fit1)/ 17.:);call print('Large sample ',(fit1*fit1)/ 20.:);endif;

varcoef1=sigma11*inv(transpose(z_1)*x*inv(xpx)*transpose(x)*z_1);call print('Asymptotic Covariance Matrix eq 1 ',varcoef1);

ls2eq2=inv(transpose(xpz_2)*inv(xpx)*xpz_2)* (transpose(xpz_2)*inv(xpx)*xpy_2);call print('Two stage estimates Equation 2',ls2eq2);

fit2=vfam(q)-z_2*ls2eq2;

sigma22=(y_2py_2 - (2.*vfam(q)*z_2*ls2eq2) + ls2eq2*transpose(z_2)*z_2*ls2eq2)/16.;

if(verbose.ne.0)then;call print('sigma22 ',sigma22:);call print('Residual Variance 2',sigma22*sigma22:);call print('Test 2 ',(fit2*fit2)/ 16.:);call print('Large Sample ',(fit2*fit2)/ 20.:);endif;


sigma12=(y_1py_2 - (vfam(q)*z_1*ls2eq1) - (vfam(q)*z_2*ls2eq2) + ls2eq1*transpose(z_1)*z_2*ls2eq2)/20.;

if(verbose.ne.0)call print('test sigma12 ',sigma12);

varcoef2=sigma22*inv(transpose(z_2)*x*inv(xpx)*transpose(x)*z_2);call print('Asymptotic Covariance Matrix eq 2 ',varcoef2);

* Get sigma(i,j) from fits ;

s=mfam(catcol(fit1,fit2));sigma=(transpose(s)*s)/20.;call print('Large Sample sigma (Jennings) ',sigma);

covar1=sigma(1,1)*inv(transpose(xpz_1)*inv(xpx)*xpz_1);covar2=sigma(2,2)*inv(transpose(xpz_2)*inv(xpx)*xpz_2);call print('Estimated Covariance Matrix - Large Sample':);call print(covar1,covar2);ls2se=dsqrt(array(:covar1(1,1),covar1(2,2),covar1(3,3) covar2(1,1),covar2(2,2),covar2(3,3) covar2(4,4)));call print('SE of LS2 Model Equations - Large Sample',ls2se);

sssigma(1,1)=sigma(1,1)*(20./17.);sssigma(1,2)=sigma(1,2)*(20./dsqrt(17.*16.));sssigma(2,1)=sigma(2,1)*(20./dsqrt(17.*16.));sssigma(2,2)=sigma(2,2)*(20./16.);

call print('Kmenta (Small Sample Sigma ',sssigma);covar1=sssigma(1,1)*inv(transpose(xpz_1)*inv(xpx)*xpz_1);covar2=sssigma(2,2)*inv(transpose(xpz_2)*inv(xpx)*xpz_2);

call print('Estimated Covariance Matrix - Small Sample':);call print(covar1,covar2);ls2se=dsqrt(array(:diag(covar1),diag(covar2)));call print('SE of LS2 Model Equations - Small Sample',ls2se);

* LS3 calculation ;xpxinv=inv(xpx);

/$ sigma=inv(sssigma);

sigma=inv(sigma);

term11= sigma(1,1)*(transpose(xpz_1)*xpxinv*xpz_1);term12= sigma(1,2)*(transpose(xpz_1)*xpxinv*xpz_2);term21= sigma(2,1)*(transpose(xpz_2)*xpxinv*xpz_1);term22= sigma(2,2)*(transpose(xpz_2)*xpxinv*xpz_2);left1 =catcol(term11 term12);left2 =catcol(term21 term22);left =catrow(left1 left2);

if(verbose.ne.0)call print(term11 term12 term21 term22 left1 left2 left);

right1=(sigma(1,1)*(transpose(xpz_1)*xpxinv*xpy_1)) + (sigma(1,2)*(transpose(xpz_1)*xpxinv*xpy_2));right2=(sigma(2,1)*(transpose(xpz_2)*xpxinv*xpy_1)) + (sigma(2,2)*(transpose(xpz_2)*xpxinv*xpy_2));

right=catrow(right1 right2);

42 Chapter 4

call print(right1 right2 right,inv(left));ls3=inv(left)*right;call print('Three Stage Least Squares ',ls3);

ls3se = dsqrt(diag(inv(left)));t3sls=array(norows(ls3):ls3(,1))/afam(ls3se);call print('Three Stage Least Squares SE',ls3se);call print('Three Stage Least Squares t ',t3sls);

* FIML following Kmenta (1971) pages 578 - 581 ;

* q = f(constant P D ) ;* q = g(constant p F A) ;

* q = a1 + a2*p + a3*d + u1 ;* q = b1 + b2*p + b3*f + b4*a + u2;

y = transpose(mfam(catcol(q p)));x = transpose(mfam(catcol(constant d f a)));

gt= 2.* dfloat(norows(y));t =dfloat(norows(y));call print('Using 3sls starting values ',ls3);

/$ a1=sfam(ls3(1));/$ a2=sfam(ls3(2));/$ a3=sfam(ls3(3));/$ b1=sfam(ls3(4));/$ b2=sfam(ls3(5));/$ b3=sfam(ls3(6));/$ b4=sfam(ls3(7));

program model; bigb = matrix(2,2: 1.0, -1.0*a2, 1.0, -1.0*b2); biggamma = matrix(2,4:-1.0*a1, -1.0*a3, 0.0, 0.0, -1.0*b1, 0.0, -1.0*b3, -1.0*b4); u1u2=bigb*y+biggamma*x;

phi = u1u2*transpose(u1u2);

/$ General purpose FIML setup if there are no identities/$ For a discussion of Formulas see Kmenta (1971) page 578-581

func=(-1.0*(gt*pi())/2.0) - ((t/2.0)*dlog(dmax1(dabs(det(phi)) ,.1d-30) )) + ( t *dlog(dmax1(dabs(det(bigb)),.1d-30) )) - (.5*sum(transpose(u1u2)*inv(phi)*u1u2));

call outstring(3, 3,'Function'); call outdouble(36,3,func); call outdouble(4, 4, a1); call outdouble(36,4, a2); call outdouble(55,4, a3);

call outdouble(4 ,5, b1); call outdouble(36,5, b2); call outdouble(55,5, b3);


call outdouble(4, 6, b4); return; end;

call print(model); rvec =vector(7:ls3); ll =vector(7:) -1.d+2; uu =vector(7:) +1.d+3; call echooff;

call cmaxf2(func :name model :parms a1 a2 a3 b1 b2 b3 b4 :ivalue rvec :lower ll :upper UU :maxit 10000 :maxfun 10000 :maxg 10000 :print);

b34srun;

The matrices X_1 and X_2 are built with the catcol command and the OLS estimates for equations 1 and 2 are respectively D1 and D2. Edited results show.OLS eq 1

D1 = Vector of 3 elements

99.8954 -0.316299 0.334636

=> CALL PRINT('OLS eq 2 ',D2 )$

OLS eq 2

D2 = Vector of 4 elements

58.2754 0.160367 0.248133 0.248302

which are consistent with what was obtained with the simeq command. Next using the “textbook” 2SLS formula

(4.5-1)

we obtain the 2SLS estimates and the error covariance matrix which is needed for the 3SLS estimates. Edited results match what was found earlier with simeq. Note that call echooff; has been turned off to illustrate the steps of the calculation.

Two stage estimates Equation 1

LS2EQ1 = Vector of 3 elements

94.6333 -0.243557 0.313992

=> FIT1=VFAM(Q)-Z_1*LS2EQ1$

44 Chapter 4

=> SIGMA11=(Y_1PY_1 - (2.*VFAM(Q)*Z_1*LS2EQ1) + => LS2EQ1*TRANSPOSE(Z_1)*Z_1*LS2EQ1)/17.$

=> IF(VERBOSE.NE.0)THEN$

=> CALL PRINT('sigma11 ',SIGMA11:)$

=> CALL PRINT('Residual Variance 1',SIGMA11*SIGMA11:)$

=> CALL PRINT('Test 1 ',(FIT1*FIT1)/ 17.:)$

=> CALL PRINT('Large sample ',(FIT1*FIT1)/ 20.:)$

=> ENDIF$

=> VARCOEF1=SIGMA11*INV(TRANSPOSE(Z_1)*X*INV(XPX)*TRANSPOSE(X)*Z_1)$

=> CALL PRINT('Asymptotic Covariance Matrix eq 1 ',VARCOEF1)$

Asymptotic Covariance Matrix eq 1

VARCOEF1= Matrix of 3 by 3 elements

1 2 3 1 62.7397 -0.673422 0.493016E-01 2 -0.673422 0.930922E-02 -0.264190E-02 3 0.493016E-01 -0.264190E-02 0.220371E-02

=> LS2EQ2=INV(TRANSPOSE(XPZ_2)*INV(XPX)*XPZ_2)* => (TRANSPOSE(XPZ_2)*INV(XPX)*XPY_2)$

=> CALL PRINT('Two stage estimates Equation 2',LS2EQ2)$

Two stage estimates Equation 2

LS2EQ2 = Vector of 4 elements

49.5324 0.240076 0.255606 0.252924

=> FIT2=VFAM(Q)-Z_2*LS2EQ2$

=> SIGMA22=(Y_2PY_2 - (2.*VFAM(Q)*Z_2*LS2EQ2) + => LS2EQ2*TRANSPOSE(Z_2)*Z_2*LS2EQ2)/16.$


=> CALL PRINT('sigma22 ',SIGMA22:)$

=> CALL PRINT('Residual Variance 2',SIGMA22*SIGMA22:)$

=> CALL PRINT('Test 2 ',(FIT2*FIT2)/ 16.:)$

=> CALL PRINT('Large Sample ',(FIT2*FIT2)/ 20.:)$

=> ENDIF$


=> SIGMA12=(Y_1PY_2 - (VFAM(Q)*Z_1*LS2EQ1) - (VFAM(Q)*Z_2*LS2EQ2) + => LS2EQ1*TRANSPOSE(Z_1)*Z_2*LS2EQ2)/20.$

=> IF(VERBOSE.NE.0)CALL PRINT('test sigma12 ',SIGMA12)$

=> VARCOEF2=SIGMA22*INV(TRANSPOSE(Z_2)*X*INV(XPX)*TRANSPOSE(X)*Z_2)$

=> CALL PRINT('Asymptotic Covariance Matrix eq 2 ',VARCOEF2)$

Asymptotic Covariance Matrix eq 2

VARCOEF2= Matrix of 4 by 4 elements

1 2 3 4 1 144.253 -1.09541 -0.323818 -0.295229 2 -1.09541 0.998677E-02 0.936222E-03 0.579069E-03 3 -0.323818 0.936222E-03 0.223257E-02 0.137681E-02 4 -0.295229 0.579069E-03 0.137681E-02 0.993114E-02

=> * GET SIGMA(I,J) FROM FITS $

=> S=MFAM(CATCOL(FIT1,FIT2))$

=> SIGMA=(TRANSPOSE(S)*S)/20.$

=> CALL PRINT('Large Sample sigma (Jennings) ',SIGMA)$

Large Sample sigma (Jennings)

SIGMA = Matrix of 2 by 2 elements

1 2 1 3.28645 3.59324 2 3.59324 4.83166

=> COVAR1=SIGMA(1,1)*INV(TRANSPOSE(XPZ_1)*INV(XPX)*XPZ_1)$

=> COVAR2=SIGMA(2,2)*INV(TRANSPOSE(XPZ_2)*INV(XPX)*XPZ_2)$

=> CALL PRINT('Estimated Covariance Matrix - Large Sample':)$

Estimated Covariance Matrix - Large Sample

=> CALL PRINT(COVAR1,COVAR2)$

COVAR1 = Matrix of 3 by 3 elements

1 2 3 1 53.3287 -0.572408 0.419064E-01 2 -0.572408 0.791284E-02 -0.224561E-02 3 0.419064E-01 -0.224561E-02 0.187315E-02


1 2 3 4 1 115.402 -0.876328 -0.259055 -0.236183 2 -0.876328 0.798942E-02 0.748977E-03 0.463256E-03 3 -0.259055 0.748977E-03 0.178606E-02 0.110144E-02 4 -0.236183 0.463256E-03 0.110144E-02 0.794491E-02

=> LS2SE=DSQRT(ARRAY(:DIAG(COVAR1),DIAG(COVAR2)))$

46 Chapter 4

=> CALL PRINT('SE of LS2 Model Equations - Large Sample',LS2SE)$

SE of LS2 Model Equations - Large Sample

LS2SE = Array of 7 elements

7.30265 0.889541E-01 0.432799E-01 10.7425 0.893836E-01 0.422617E-01 0.891342E-01

=> SSSIGMA(1,1)=SIGMA(1,1)*(20./17.)$

=> SSSIGMA(1,2)=SIGMA(1,2)*(20./DSQRT(17.*16.))$

=> SSSIGMA(2,1)=SIGMA(2,1)*(20./DSQRT(17.*16.))$

=> SSSIGMA(2,2)=SIGMA(2,2)*(20./16.)$

=> CALL PRINT('Kmenta (Small Sample Sigma ',SSSIGMA)$

Kmenta (Small Sample Sigma

SSSIGMA = Matrix of 2 by 2 elements

1 2 1 3.86642 4.35744 2 4.35744 6.03958

=> COVAR1=SSSIGMA(1,1)*INV(TRANSPOSE(XPZ_1)*INV(XPX)*XPZ_1)$

=> COVAR2=SSSIGMA(2,2)*INV(TRANSPOSE(XPZ_2)*INV(XPX)*XPZ_2)$

=> CALL PRINT('Estimated Covariance Matrix - Small Sample':)$

Estimated Covariance Matrix - Small Sample

=> CALL PRINT(COVAR1,COVAR2)$


1 2 3 1 62.7397 -0.673422 0.493016E-01 2 -0.673422 0.930922E-02 -0.264190E-02 3 0.493016E-01 -0.264190E-02 0.220371E-02


1 2 3 4 1 144.253 -1.09541 -0.323818 -0.295229 2 -1.09541 0.998677E-02 0.936222E-03 0.579069E-03 3 -0.323818 0.936222E-03 0.223257E-02 0.137681E-02 4 -0.295229 0.579069E-03 0.137681E-02 0.993114E-02

=> LS2SE=DSQRT(ARRAY(:COVAR1(1,1),COVAR1(2,2),COVAR1(3,3) => COVAR2(1,1),COVAR2(2,2),COVAR2(3,3) COVAR2(4,4)))$

=> CALL PRINT('SE of LS2 Model Equations - Small Sample',LS2SE)$

SE of LS2 Model Equations - Small Sample

LS2SE = Array of 7 elements


7.92084 0.964843E-01 0.469437E-01 12.0105 0.999339E-01 0.472501E-01 0.996551E-01

Note that the estimated asymptotic covariance matrix for each equation was calculated as

(4.5-2)

The SE for each coefficient is the square root of the diagonal elements of the estimated covariance matrix. The 3SLS model is estimated using the “textbook” equation as

(4.5-3)

where . Equation (4.5-3) comes directly from Kmenta (1971, 577) and is consistent with Theil (1971, 510). It is to me noted that most modern texts The estimated output verifies the simeq 3SLS command. In the matrix program output each term in (4.4-3) is broken out and put together into the left and right parts of (4.5-3), which at first looks formidable. => * LS3 CALCULATION $

=> XPXINV=INV(XPX)$

=> SIGMA=INV(SIGMA)$

=> TERM11= SIGMA(1,1)*(TRANSPOSE(XPZ_1)*XPXINV*XPZ_1)$




=> LEFT1 =CATCOL(TERM11 TERM12)$

=> LEFT2 =CATCOL(TERM21 TERM22)$

=> LEFT =CATROW(LEFT1 LEFT2)$


=> CALL PRINT(TERM11 TERM12 TERM21 TERM22 LEFT1 LEFT2 LEFT)$

=> ENDIF$

=> RIGHT1=(SIGMA(1,1)*(TRANSPOSE(XPZ_1)*XPXINV*XPY_1)) + => (SIGMA(1,2)*(TRANSPOSE(XPZ_1)*XPXINV*XPY_2))$

=> RIGHT2=(SIGMA(2,1)*(TRANSPOSE(XPZ_2)*XPXINV*XPY_1)) + => (SIGMA(2,2)*(TRANSPOSE(XPZ_2)*XPXINV*XPY_2))$

48 Chapter 4

=> RIGHT=CATROW(RIGHT1 RIGHT2)$

=> CALL PRINT(RIGHT1 RIGHT2 RIGHT,INV(LEFT))$

RIGHT1 = Vector of 3 elements

842.104 84261.3 82406.3

RIGHT2 = Vector of 4 elements

-208.606 -20873.2 -20220.4 -2196.91

RIGHT = Matrix of 7 by 1 elements

1 1 842.104 2 84261.3 3 82406.3 4 -208.606 5 -20873.2 6 -20220.4 7 -2196.91

Matrix of 7 by 7 elements

1 2 3 4 5 6 7 1 53.3287 -0.572408 0.419064E-01 52.0707 -0.556756 0.337445E-01 0.509185E-01 2 -0.572408 0.791284E-02 -0.224561E-02 -0.291667 0.494945E-02 -0.180825E-02 -0.272854E-02 3 0.419064E-01 -0.224561E-02 0.187315E-02 -0.232929 0.632767E-03 0.150833E-02 0.227598E-02 4 52.0707 -0.291667 -0.232929 113.162 -0.866671 -0.235979 -0.327163 5 -0.556756 0.494945E-02 0.632767E-03 -0.866671 0.794779E-02 0.649506E-03 0.855426E-03 6 0.337445E-01 -0.180825E-02 0.150833E-02 -0.235979 0.649506E-03 0.154836E-02 0.203856E-02 7 0.509185E-01 -0.272854E-02 0.227598E-02 -0.327163 0.855426E-03 0.203856E-02 0.425029E-02

=> LS3=INV(LEFT)*RIGHT$

=> CALL PRINT('Three State Least Squares ',LS3)$

Three Stage Least Squares

LS3 = Matrix of 7 by 1 elements

1 1 94.6333 2 -0.243557 3 0.313992 4 52.1176 5 0.228932 6 0.228978 7 0.357907

=> LS3SE = DSQRT(DIAG(INV(LEFT)))$

=> CALL PRINT('Three State Least Squares SE',LS3SE)$

Three State Least Squares SE

LS3SE = Vector of 7 elements

7.30265 0.889541E-01 0.432799E-01 10.6378 0.891504E-01 0.393493E-01 0.651943E-01

The estimated standard errors are those suggested by Theil. The FIML estimation method required a maximization procedure. Kmenta (1971) shows that for a model without constraints FIML maximizes

(4.5-4)


where or the number of equations in the model. The Kmenta test problem can be written

(4.5-5)

For this problem

and refer to the Jacobian or absolute value of the determinant of respectively. Using the matrix command it is fairly easy to implement this estimator. Problems can arise of there are local maximums in the problem. The edited FIML results are given next.

=> PROGRAM MODEL$

=> CALL PRINT(MODEL)$

MODEL = Program

PROGRAM MODEL$BIGB = MATRIX(2,2: 1.0, -1.0*A2, 1.0, -1.0*B2)$BIGGAMMA = MATRIX(2,4:-1.0*A1, -1.0*A3, 0.0, 0.0, -1.0*B1, 0.0, -1.0*B3, -1.0*B4)$U1U2=BIGB*Y+BIGGAMMA*X$PHI = U1U2*TRANSPOSE(U1U2)$FUNC=(-1.0*(GT*PI())/2.0) - ((T/2.0)*DLOG(DMAX1(DABS(DET(PHI)) ,.1D-30) )) + ( T *DLOG(DMAX1(DABS(DET(BIGB)),.1D-30) )) - (.5*SUM(TRANSPOSE(U1U2)*INV(PHI)*U1U2))$CALL OUTSTRING(3, 3,'Function')$CALL OUTDOUBLE(36,3,FUNC)$CALL OUTDOUBLE(4, 4, A1)$CALL OUTDOUBLE(36,4, A2)$CALL OUTDOUBLE(55,4, A3)$CALL OUTDOUBLE(4 ,5, B1)$CALL OUTDOUBLE(36,5, B2)$CALL OUTDOUBLE(55,5, B3)$CALL OUTDOUBLE(4, 6, B4)$RETURN$END$

=> RVEC =VECTOR(7:LS3)$

=> LL =VECTOR(7:) -1.D+2$

=> UU =VECTOR(7:) +1.D+3$

=> CALL ECHOOFF$

Constrained Maximum Likelihood Estimation using CMAXF2 Command Final Functional Value -13.37570521223952 # of parameters 7 # of good digits in function 15 # of iterations 28

50 Chapter 4

# of function evaluations 55 # of gradiant evaluations 30 Scaled Gradient Tolerance 6.055454452393343E-06 Scaled Step Tolerance 3.666852862501036E-11 Relative Function Tolerance 3.666852862501036E-11 False Convergence Tolerance 2.220446049250313E-14 Maximum allowable step size 108037.5007234256 Size of Initial Trust region -1.000000000000000 1 / Cond. of Hessian Matrix 2.229180241990960E-09

# Name Coefficient Standard Error T Value 1 A1 93.619219 3.4191227 27.381064 2 A2 -0.22953804 0.60544227E-01 -3.7912458 3 A3 0.31001341 0.34296485E-01 9.0392183 4 B1 51.944511 7.3541629 7.0632799 5 B2 0.23730613 0.45456398E-01 5.2205221 6 B3 0.22081875 0.28752980E-01 7.6798560 7 B4 0.36970888 0.14370566E-01 25.726814

SE calculated as sqrt |diagonal(inv(%hessian))|

Hessian Matrix

1 2 3 4 5 6 7 1 230.516 23089.2 22524.9 -174.328 -17459.8 -16825.9 -1834.71 2 23086.3 0.231266E+07 0.225660E+07 -17458.5 -0.174897E+07 -0.168498E+07 -183728. 3 22522.1 0.225634E+07 0.220289E+07 -17032.0 -0.170639E+07 -0.164483E+07 -179522. 4 -174.305 -17456.3 -17029.9 135.877 13609.6 13117.1 1430.22 5 -17457.4 -0.174875E+07 -0.170618E+07 13607.8 0.136313E+07 0.131360E+07 143221. 6 -16823.6 -0.168477E+07 -0.164463E+07 13115.4 0.131342E+07 0.126732E+07 137918. 7 -1834.45 -183704. -179499. 1430.03 143201. 137898. 15323.9

Gradiant Vector

-0.568518E-06 -0.557801E-04 -0.544320E-04 0.447704E-06 0.438995E-04 0.419615E-04 0.528029E-05

Lower vector

-100.000 -100.000 -100.000 -100.000 -100.000 -100.000 -100.000

Upper vector

1000.00 1000.00 1000.00 1000.00 1000.00 1000.00 1000.00

B34S Matrix Command Ending. Last Command reached.

Space available in allocator 7873665, peak space used 8277 Number variables used 130, peak number used 135 Number temp variables used 36882, # user temp clean 0

and replicate the Kmenta test values for coefficients. The simeq FIML results are:


Functional Minimization Solution for System No. 1 Demand Equation





Functional Minimization 3SLS Covariance for System Demand Equation



Functional Minimization Solution for System No. 2 Supply Equation





Functional Minimization 3SLS Covariance for System Supply Equation



Contemporaneous Covariance of Residuals (Structural Disturbances) For Functional Minimization 3SLSQ Solution.







Condition number of matrix used to find the reduced form coefficients is no smaller than 4.284084281338983

P Q 1 2 CONSTANT 1 89.27 73.13 D 2 0.6641 0.1576 F 3 -0.4730 0.1086 A 4 -0.7919 0.1818


1 P 0.20588D+01 2 Q 0.43479D+01

and give identical coefficients but different SE's due to the algorithm used. Greene (2003, page 408), notes that "asymptotically the covariance matrix for the FIML estimator is the same as that for the 3SLS estimator."

The purpose of this exercise has been to illustrate how "textbook" formulas can be used with a programming language, such as the matrix command, to produce 2SLS, 3SLS and FIML estimates fairly easily where the alternative would be to build a C or Fortran program to perform the calculation. Since "textbook" formulas are used for the matrix example, the accuracy of these calculations are inferior to the QR approach of Jennings (1980), which is the basis for the simeq command. Inspection of the matrix program that implements these estimators may give the reader confidence to tackle other calculations that have not been implemented in commercial

52 Chapter 4

software.11 The matrix examples shown have been coded for teaching purposes (clarity of the code) not research purposes. Many components of the calculation that appear a number of places in a formula such as (4.4-3) have not been calculated once and saved.

4.6 LS2 and GMM Models and Specification tests

The Generalized Method of Moments estimation technique is a generalization of 2SLS that allows for various assumptions on the error distribution. Assume there are l instruments in

Z. The basic idea of GMM is to select coefficients such that

(4.6-1)

where

(4.6-2)

It can be shown that the efficient GMM estimator is

(4.6-3)

where

(4.6-4)

Using the 2SLS residuals, a heteroskedasticity-consistent estimator of S can be obtained as

(4.6-5)

which has been characterized as a standard sandwich approach to robust covariance estimation.For more details see Davidson and MacKinnon (1993, 607-610) and Baum (2006, 194-197)

Hall – Rudebusch - Wilcox (1996) proposed a likelihood ratio test of the relevance of

instrumental variables Z that is based on canonical correlation between the X and Z . The ordered canonical correlation vector can be calculated as the square root of the eigenvalues of

11 The modern pace of research is so fast that if one waits until a new procedure is implemented in commercial software, often it is too late.


(4.6-6)

with associated eigenvectors or the square root of the eigenvalues of

(4.6-7)

with associated eigenvectors . The vectors and maximize the correlation between

and which equals . As noted by Hall-Rudebusch-Wilcox (1996, 287) “ and are the

vectors which yield the highest correlation subject to the constrains that and are orthogonal.” The proposed Anderson statistic

(4.6-8)

is distributed as Chi-squared with (l-k+1) degrees of freedom where l is the rank of Z and k is the rank of X and can be applied to both 2SLS and GMM models. A significant statistic is consistent with appropriate instruments. A disadvantage of the Anderson test, is that it assumes that the regressors are distributed multivariate normal. Further information on the Anderson test

is in Baum (2006, 208). The Anderson statistic can also be displayed in LM form as

or in the Cragg-Donald (1993) form as . If these ststistics are not significant, the instruments selected are weak.

For GMM estimation the Hansen (1982) J statistic which tests for overidentifying restrictions is usually used. The Hansen test, which is also called the Sargon (1958) test, is the value of the efficient GMM objective function

(4.6-9)

and is distributed as chi-square with degrees of freedom l-k. A significant value indicates the selected instruments are not suitable.

The Basmann (1960) over identification test is

(4.6-10)

where is the residual from the LS2 equation and is the residual from a model that

54 Chapter 4

predicts as a function of Z. The Basmann test is distributed as chi-square with degrees of freedom l-k. If the instruments Z have no predictive power, or in other words are orthogonal to

the LS2 residuals, then and the chi-square value will not be significant. A significant chi-square value, however, indicates that the instruments are not suitable since they are not exogenous.

Table 4.8 lists subroutines LS2 and GMMEST that estimate 2SLS and GMM models respectively. For an exactly identified system, LS2 and GMM will be the same. For an overidentified system, GMM is more efficient.

Table 4.8 LS2 and General Method of Moments estimation routines

/;/; Loads LS2 and GMMEST/;subroutine ls2(y1,x1,z1,names,yvar,iprint);/;/; y1 => left hand side Usually set as %y from OLS/; x1 => right hand side. Usually set as %x from OLS step/; z1 => instrumental Variables/; names => Names from OLS step. Usually set as %names/; yvar => usually set from call olsq as %yvar/; iprint => =1 print coef, =2 print covariance in addition/;/; if # of obs for z1 < x1 then x1 will be truncated/;/; Automatic variables created/; %olscoef => OLS Coefficients/; %ols_se => OLS SE/; %ols_t => OLS t/; %ls2coef => LS2 Coefficients/; %ls2_sel => Large Sample LS2 SE/; %ls2_ses => Small Sample LS2 SE/; %ls2_t_l => Large Sample LS2 t/; %ls2_t_s => Small Sample LS2 t/; %rss_ols => e'e for OLS/; %rss_ls2 => e'e for LS2/; %yhatols => yhat for OLS/; %yhatls2 => yhat for LS2/; %resols => OLS Residual/; %resls2 => LS2 Residual/; %covar1 => Large Sample covariance/; %sigma_l => Large Sample sigma/; %sigma_s => Small Sample Sigma/; %z/; %info => Model is ok if = 0/; For conditional Heteroskedasticity Sargan(1958)=Hansen(1982) j test/; %sargan => Sargan(1958) test/; %basmann => Basmann(1960)/;/; Example Job:/;/; b34sexec options ginclude('b34sdata.mac') member(kmenta);


/; b34srun;/;/; b34sexec matrix;/; call loaddata;/; call echooff;/; call print('OLS for Equation # 1':);/; call olsq(q p d :savex :print);/; call ls2a(%y,%x,catcol(d,f,a,constant),%names,%yvar,1);/;/; call print('OLS for Equation # 2':);/; call olsq(q p a f: a :savex :print);/; call ls2a(%y,%x,catcol(d,f,a,constant),%names,%yvar,1);/; b34srun;/;/; Command built 26 April 2010, Mods 26 May 2010 2 August 2010/;y =vfam(y1);%z=mfam(z1);x =mfam(x1);n1=norows(%z);n2=norows(x);if(n2.lt.n1)call deleterow(%z,1,(n1-n2));

if(n1.lt.n2)then;call epprint('ERROR: # obs for instruments < # obs for equation');go to done;endif;

/; This saves the OLS Results

call olsq(y x :noint);%olscoef=%coef;%ols_se=%se;%ols_t =%t;n_k=%nob-%k;%rss_ols=%rss;%yhatols=%yhat;%resols =%res;

* 2SLS ;

zpz = transpose(%z)*%z;zpx = transpose(%z)*x;zpy = transpose(%z)*y;ypy = y*y;irank=rank(zpx);iorder=rank(zpz);/;if(iorder.lt.irank)then;call epprint('ERROR: Model Underidentified.':);go to done;endif;

/;%ls2coef =inv(transpose(zpx)*inv(zpz)*zpx)* (transpose(zpx)*inv(zpz)*zpy);/;/; Error trap turned off/;

/; call gminv((transpose(zpx)*inv(zpz)*zpx),%ls2coef,%info,rrcond);

56 Chapter 4

/; if(%info.ne.0)then;/; go to done;/; endif;

/; %ls2coef=%ls2coef*(transpose(zpx)*inv(zpz)*zpy);

%yhatls2=x*%ls2coef;%resls2 =y-%yhatls2;sigma_w=(ypy - (2.*y*x*%ls2coef) + %ls2coef*transpose(x)*x*%ls2coef)/dfloat(n_k);varcoef=sigma_w*inv(transpose(x)*%z*inv(zpz)*transpose(%z)*x);%ls2_ses=dsqrt(diag(varcoef));

* Get sigma(i,j) from fits ;

%rss_ls2=sumsq(%resls2);%sigma_l=%rss_ls2/dfloat(%nob);%sigma_s=%rss_ls2/dfloat(n_k);%covar_1=%sigma_l*inv(transpose(zpx)*inv(zpz)*zpx);%ls2_sel=dsqrt(diag(%covar_1));%ls2_t_s=afam(%ls2coef)/afam(%ls2_ses);%ls2_t_l=afam(%ls2coef)/afam(%ls2_sel);/;/; squared canonical correlations/;if(iprint.ne.0)then;can_corr=real(eig(inv(transpose(x)*x)*(transpose(x)*%z)*inv(zpz)*zpx));call print(can_corr);anderson=-1.*dfloat(norows(%z)) *dlog(sum(kindas(%z,1.0)-afam(can_corr)));anderlm = dfloat(norows(%z))*min(can_corr);cragg_d = anderlm/(1.0 - min(can_corr));endif;/;/; %sargan & %basmann/;call olsq(%resls2 %z :noint);%basmann=(dfloat( norows(%z)-nocols(%z))*(sumsq(%resls2)-%rss))/%rss;%sargan = dfloat(norows(%z))*%rsq;/;if(iprint.ne.0)then;call print(' ':);call print('OLS and LS2 Estimation':);call print(' ':);gg= 'Dependent Variable ';gg2=c1array(8:yvar);ff=catrow(gg,gg2);call print(ff:);call print('OLS Sum of squared Residuals ',%rss_ols:);call print('LS2 Sum of squared Residuals ',%rss_ls2:);call print('Large Sample ls2 sigma ',%sigma_l:);call print('Small Sample ls2 sigma ',%sigma_s:);call print('Rank of Equation ',irank:);call print('Order of Equation ',iorder:);if(irank.lt.iorder)call print('Equation is overidentified':);if(irank.eq.iorder)call print('Equation is exactly identified':);/;call print('Anderson LR ident./IV Relevance test ',anderson:);/;if(iorder.ge.irank.and.anderson.gt.0.0)then;


aprob=chisqprob(anderson,dfloat(iorder+1-irank));call print('Significance of Anderson LR Statistic',aprob:);endif;/;call print('Anderson Canon Correlation LM test ',anderlm:);/;if(iorder.ge.irank.and.anderlm.gt.0.0)then;aprob=chisqprob(anderlm,dfloat(iorder+1-irank));call print('Significance of Anderson LM Statistic',aprob:);endif;/;call print('Cragg-Donald Chi-Square Weak ID Test ',cragg_d:);/;if(iorder.ge.irank.and.cragg_d.gt.0.0)then;aprob=chisqprob(cragg_d,dfloat(iorder+1-irank));call print('Significance of Cragg-Donald test ',aprob:);endif;/;call print('Basmann ',%basmann:);/;if(iorder.gt.irank.and.%basmann.gt.0.0)then;bprob=chisqprob(%basmann,dfloat(iorder-irank));call print('Significance of Basmann Statistic ',bprob:);endif;/;call print('Sargan N*R-sq / J-Test Test ',%sargan:);/;if(iorder.gt.irank.and.%sargan.gt.0.0)then;sprob=chisqprob(%sargan,dfloat(iorder-irank));call print('Significance of Sargan Statistic ',sprob:);endif;/;call tabulate(names,%olscoef,%ols_se,%ols_t,%ls2coef, %ls2_ses,%ls2_sel, %ls2_t_s,%ls2_t_l :title'+++++++++++++++++++++++++++++++++++++++++++++++++++++');

call print(' ':);if(iprint.eq.2)call print('Estimated Covariance Matrix - Large Sample',%covar_1);endif;/;call makeglobal(%olscoef);call makeglobal(%ols_se);call makeglobal(%ols_t);call makeglobal(%ls2coef);call makeglobal(%ls2_sel);call makeglobal(%ls2_ses);call makeglobal(%ls2_t_l);call makeglobal(%ls2_t_s);call makeglobal(%rss_ols);call makeglobal(%rss_ls2);call makeglobal(%yhatols);call makeglobal(%yhatls2);call makeglobal(%resols);call makeglobal(%resls2);call makeglobal(%covar_1);call makeglobal(%sigma_l);call makeglobal(%sigma_s);call makeglobal(%z);

58 Chapter 4

call makeglobal(%sargan);call makeglobal(%basmann);/; call makeglobal(%info);/;done continue;return;end;subroutine gmmest(y,x,z,names,yvar,j_stat,sigma,iprint);/;/; GMM Model - Built 12 May 2010/;/; Must call ls2 prior to this call to produce global variable/; %z/;/; The following global variables are created:/; %resgmm => GMM Residuals/; %segmm => GMM SE/; %tgmm => GMM t/; %coefgmm => GMM Coef/; %yhatgmm => GMM Y hat/;/; The Anderson Test is discussed in Baum/; "An introduction to Modern Econometrics Using Stata" (2006) p. 208/; Both the IV and LM forms of tgeh test are given./;/; Generates feasable two-step GMM Estimator. Results are the same as/; produced by the RATS "optimalweights" option./;/; Note: When running bootstraps inv(s) can fail to invert if dummy/; variables are in the dataset./;/; See Baum (2006) page 196/;xpz = transpose(x)*z;xpy = transpose(x)*vfam(y);ypy = vfam(y)*vfam(y);/;/; GMM Coefficients/;irank =rank(xpz);iorder=rank(transpose(z)*z);/;if(iorder.lt.irank)then;call epprint('ERROR: Model Underidentified.':);go to done;endif;/;adj=kindas(z,1.0)/dfloat(norows(z));s=hc_sigma(adj,z,%resls2);inv_s=inv(s);%coefgmm=inv(xpz*inv_s*transpose(xpz)) * (xpz*inv_s*transpose(z)*vfam(y));%resgmm =vfam(y)-x*%coefgmm;%yhatgmm=x*%coefgmm;sigma=hc_sigma(kindas(z,1.),z,%resls2);/;/; Logic from Rats User's Guide Version 7 page 245/;j_stat=%resgmm*z*inv(sigma)*transpose(z)*%resgmm;/;/; Stock Watcon 2007 page 734


/;%segmm=dsqrt(diag(inv(xpz*inv(sigma)*transpose(xpz))));%tgmm=afam(%coefgmm)/afam(%segmm);/;/;/; squared canonical correlations/;can_corr = real(eig(inv(transpose(x)*x)*(transpose(x)*z) *inv(transpose(z)*z)* transpose(xpz)));/;if(iprint.gt.1)call print(can_corr);anderson=-1.*dfloat(norows(z)) *dlog(sum(kindas(z,1.0)-afam(can_corr)));anderlm = dfloat(norows(z))*min(can_corr);cragg_d = anderlm/(1.0 - min(can_corr));/;if(iprint.ne.0)then;call print(' ':);call print('GMM Estimates':);call print(' ':);gg= 'Dependent Variable ';gg2=c1array(8:yvar);ff=catrow(gg,gg2);call print(ff:);call print('OLS sum of squares ',sumsq(%resols):);call print('LS2 sum of squares ',sumsq(%resls2):);call print('GMM sum of squares ',sumsq(%resgmm):);call print('Rank of Equation ',irank:);call print('Order of Equation ',iorder:);if(irank.lt.iorder)call print('Equation is overidentified':);if(irank.eq.iorder)call print('Equation is exactly identified':);call print('Anderson ident./IV Relevance test ',anderson:);/;if(iorder.ge.irank.and.anderson.gt.0.0)then;aprob=chisqprob(anderson,dfloat(iorder+1-irank));call print('Significance of Anderson Statistic ',aprob:);endif;/;call print('Anderson Canon Correlation LM test ',anderlm:);/;if(iorder.ge.irank.and.anderlm.gt.0.0)then;aprob=chisqprob(anderlm,dfloat(iorder+1-irank));call print('Significance of Anderson LM Statistic',aprob:);endif;/;call print('Cragg-Donald Chi-Square Weak ID Test ',cragg_d:);/;if(iorder.ge.irank.and.cragg_d.gt.0.0)then;aprob=chisqprob(cragg_d,dfloat(iorder+1-irank));call print('Significance of Cragg-Donald test ',aprob:);endif;/;call print('Hansen J_stat Ident. of instruments',j_stat:);/;if(iorder.gt.irank.and.j_stat.gt.0.0)then;jprob=chisqprob(j_stat,dfloat(iorder-irank));call print('Significance of Hansen j_stat ',jprob:);endif;/;call tabulate(names,%coefgmm,%segmm,%tgmm :title '+++++++++++++++++++++++++++++++++++++++++++++++++++++');

60 Chapter 4

call print(' ':);endif;

call makeglobal(%resgmm);call makeglobal(%segmm);call makeglobal(%tgmm);call makeglobal(%coefgmm);call makeglobal(%yhatgmm);done continue;return;end;

Table 4.9 shows the setup to estimate and test LS2 and GMM models for the Griliches (1976) wage data used as a test case in Baum (2006). The Griliches model regresses the log wage on education, experience, tenure, age, a number of control variables and various year dummy variables. Stata and Rats results are shown for comparison. In addition Baum (2006) can be inspected for replication purposes.

Table 4.9 Estimation of LS2 and GMM Models using B34S, Stata and Rats

%b34slet dob34s1=0;%b34slet dob34s2=1;%b34slet dostata=1;%b34slet dorats =1;b34sexec options ginclude('micro.mac') member(griliches76); b34srun%b34sif(&dob34s1.ne.0)%then;b34sexec matrix;call loaddata;call echooff;call olsq(iq s expr tenure rns smsa iyear_67 iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 med kww age mrt :print);iqyhat=%yhat;

call olsq(lw iqyhat s expr tenure rns smsaiyear_67iyear_68iyear_69iyear_70iyear_71iyear_73 :print);

call olsq(lw iq s expr tenure rns smsaiyear_67iyear_68iyear_69iyear_70iyear_71iyear_73 :print);

call gamfit(lw iq s expr tenure rns[factor,1] smsa[factor,1]iyear_67[factor,1]iyear_68[factor,1]iyear_69[factor,1]iyear_70[factor,1]iyear_71[factor,1]


iyear_73[factor,1] :print);

call marspline(lw iq s expr tenure rns smsaiyear_67 iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 :print :nk 40 :mi 2);

call gamfit(lw80 iq s expr tenure rns[factor,1] smsa[factor,1]iyear_67[factor,1]iyear_68[factor,1]iyear_69[factor,1]iyear_70[factor,1]iyear_71[factor,1]iyear_73[factor,1] :print);

call marspline(lw80 iq s expr tenure rns smsaiyear_67 iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 :print :nk 40 :mi 2);b34srun;%b34sendif;

%b34sif(&dob34s2.ne.0)%then;b34sexec matrix;call loaddata;call load(ls2);call echooff;

call character(lhs,'lw');call character(endvar,'iq');

call character(rhs,'iq s expr tenure rns smsa iyear_67 iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 constant');

call character(ivar,'s expr tenure rns smsa iyear_67 iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 constant med kww age mrt');

call olsq(argument(lhs) argument(rhs) :noint :print :savex);

call ls2(%y,%x,catcol(argument(ivar)),%names,%yvar,1);call print(lhs,rhs,ivar,endvar);

call gmmest(%y,%x,%z,%names,%yvar,j_stat,sigma,1);call graph(%y %yhatols %yhatls2,%yhatgmm :nocontact :pgborder :nolabel);

b34srun;%b34sendif;%b34sif(&dostata.ne.0)%then;b34sexec options open('statdata.do') unit(28) disp=unknown$ b34srun$b34sexec options clean(28)$ b34srun$b34sexec options open('stata.do') unit(29) disp=unknown$ b34srun$b34sexec options clean(29)$ b34srun$b34sexec pgmcall idata=28 icntrl=29$stata$* for detail on stata commands see Baum page 205 ;pgmcards$

* uncomment if do not use /e* log using stata.log, text

global xlist s expr tenure rns smsa iyear_67 /// iyear_68 iyear_69 iyear_70 iyear_71 iyear_73

62 Chapter 4

ivregress 2sls lw $xlist (iq=med kww age mrt)ivregress liml lw $xlist (iq=med kww age mrt)ivregress gmm lw $xlist (iq=med kww age mrt)

ivreg lw $xlist (iq=med kww age mrt)

ivreg2 lw $xlist (iq=med kww age mrt)

ivreg2 lw $xlist (iq=med kww age mrt), gmm2s robustoverid, all

* orthog(age mrt)

gmm (lw-{xb:$xlist iq} +{b0}), /// instruments ($xlist med kww age mrt) onestep nologexit,clear b34sreturn$ b34seend$b34sexec options close(28); b34srun;b34sexec options close(29); b34srun;b34sexec options dounix('stata -b do stata.do ') dodos('stata /e stata.do'); b34srun;b34sexec options npageout writeout('output from stata',' ',' ') copyfout('stata.log') dodos('erase stata.do',/; 'erase stata.log','erase statdata.do') $ b34srun$%b34sendif;

%b34sif(&dorats.ne.0)%then;b34sexec options open('rats.dat') unit(28) disp=unknown$ b34srun$b34sexec options open('rats.in') unit(29) disp=unknown$ b34srun$b34sexec options clean(28)$ b34srun$b34sexec options clean(29)$ b34srun$

b34sexec pgmcall$ rats passastspcomments('* ', '* Data passed from B34S(r) system to RATS', '* ', "display @1 %dateandtime() @33 ' Rats Version ' %ratsversion()" '* ') $

PGMCARDS$*

instruments s expr tenure rns smsa iyear_67 $ iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 $ med kww age mrt constant

* OLS

linreg lw# constant s expr tenure rns smsa iyear_67 $ iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 iq


* 2SLS

linreg(inst) lw# constant s expr tenure rns smsa iyear_67 $ iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 iq

* GMM

linreg(inst,optimalweights) lw# constant s expr tenure rns smsa iyear_67 $ iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 iq

b34sreturn$b34srun $

b34sexec options close(28)$ b34srun$b34sexec options close(29)$ b34srun$b34sexec options/$ dodos(' rats386 rats.in rats.out ') dodos('start /w /r rats32s rats.in /run') dounix('rats rats.in rats.out')$ B34SRUN$

b34sexec options npageout WRITEOUT('Output from RATS',' ',' ') COPYFOUT('rats.out') dodos('ERASE rats.in','ERASE rats.out','ERASE rats.dat') dounix('rm rats.in','rm rats.out','rm rats.dat') $ B34SRUN$%b34sendif;

Edited and annotated results are shown next.

Variable Label # Cases Mean Std. Dev. Variance Maximum Minimum

RNS 1 residency in South 758 0.269129 0.443800 0.196959 1.00000 0.00000 RNS80 2 residency in South in 1980 758 0.292876 0.455383 0.207373 1.00000 0.00000 MRT 3 marital status = 1 if married 758 0.514512 0.500119 0.250119 1.00000 0.00000 MRT80 4 marital status = 1 if married in 1980 758 0.898417 0.302299 0.913845E-01 1.00000 0.00000 SMSA 5 reside metro area = 1 if urban 758 0.704485 0.456575 0.208461 1.00000 0.00000 SMSA80 6 reside metro area = 1 if urban in 1980 758 0.712401 0.452942 0.205156 1.00000 0.00000 MED 7 mother s education, years 758 10.9103 2.74112 7.51374 18.0000 0.00000 IQ 8 iq score 758 103.856 13.6187 185.468 145.000 54.0000 KWW 9 score on knowledge in world of work test 758 36.5739 7.30225 53.3228 56.0000 12.0000 YEAR 10 Year 758 69.0317 2.63179 6.92634 73.0000 66.0000 AGE 11 Age 758 21.8351 2.98176 8.89087 30.0000 16.0000 AGE80 12 Age in 1980 758 33.0119 3.08550 9.52033 38.0000 28.0000 S 13 completed years of schooling 758 13.4050 2.23183 4.98106 18.0000 9.00000 S80 14 completed years of schooling in 1980 758 13.7071 2.21469 4.90486 18.0000 9.00000 EXPR 15 experience, years 758 1.73543 2.10554 4.43331 11.4440 0.00000 EXPR80 16 experience, yearsin 1980 758 11.3943 4.21075 17.7304 22.0450 0.692000 TENURE 17 tenure, years 758 1.83113 1.67363 2.80104 10.0000 0.00000 TENURE80 18 tenure, years in 1980 758 7.36280 5.05024 25.5049 22.0000 0.00000 LW 19 log wage 758 5.68674 0.428949 0.183998 7.05100 4.60500 LW80 20 log wage in 1980 758 6.82656 0.409927 0.168040 8.03200 4.74900 IYEAR_67 21 758 0.831135E-01 0.276236 0.763063E-01 1.00000 0.00000 IYEAR_68 22 758 0.104222 0.305750 0.934828E-01 1.00000 0.00000 IYEAR_69 23 758 0.112137 0.315744 0.996940E-01 1.00000 0.00000 IYEAR_70 24 758 0.844327E-01 0.278219 0.774060E-01 1.00000 0.00000 IYEAR_71 25 758 0.121372 0.326775 0.106782 1.00000 0.00000 IYEAR_73 26 758 0.208443 0.406464 0.165213 1.00000 0.00000 CONSTANT 27 758 1.00000 0.00000 0.00000 1.00000 1.00000

Number of observations in data file 758 Current missing variable code 1.000000000000000E+31

64 Chapter 4

Ordinary Least Squares Estimation Dependent variable LW Centered R**2 0.4301415547786606 Adjusted R**2 0.4209626268019410 Residual Sum of Squares 79.37338878983863 Residual Variance 0.1065414614628706 Standard Error 0.3264068955504320 Total Sum of Squares 139.2861498420176 Log Likelihood -220.3342420049200 Mean of the Dependent Variable 5.686738782319042 Std. Error of Dependent Variable 0.4289493629019316 Sum Absolute Residuals 194.5217111479906 F(12, 745) 46.86185095575703 F Significance 1.000000000000000 1/Condition XPX 1.486105464518127E-06 Maximum Absolute Residual 1.186094775249485 Number of Observations 758

Variable Lag Coefficient SE t IQ 0 0.27121199E-02 0.10314110E-02 2.6295239 S 0 0.61954782E-01 0.72785810E-02 8.5119313 EXPR 0 0.30839472E-01 0.65100828E-02 4.7371858 TENURE 0 0.42163060E-01 0.74812112E-02 5.6358601 RNS 0 -0.96293467E-01 0.27546700E-01 -3.4956444 SMSA 0 0.13289929 0.26575835E-01 5.0007567 IYEAR_67 0 -0.54209478E-01 0.47852181E-01 -1.1328528 IYEAR_68 0 0.80580850E-01 0.44895091E-01 1.7948700 IYEAR_69 0 0.20759151 0.43860470E-01 4.7329979 IYEAR_70 0 0.22822373 0.48799418E-01 4.6767716 IYEAR_71 0 0.22269148 0.43095233E-01 5.1674272 IYEAR_73 0 0.32287469 0.40657433E-01 7.9413448 CONSTANT 0 4.2353569 0.11334886 37.365677

The below listed edited output replicates Baum (2006, 193-194). The Basman and Sargan tests of 97.0249 and 87.655, respectively, show high significance which rejects the null hypothesis that there is no correlation between the residuals of the LS2 model and the instruments. This finding suggests serious problems since endogeniety present in the OLS model will not be removed by LS2 estimation. Note that Stata replicates the Sargon test value. The Anderson value of 54.33 that tests for the relevance of the instruments matches the value reported in Baum (2006, 204) but does not match the value reported by Stata in the printed output that uses the revised ivreg2 Stata command that uses the LM form of the test value of 52.436. The B34S output includes both statistics. Since the null was rejected, the instruments appear relevant in that they are related to the endogenous variables. This is confirmed with the Cragg-Donald (1993) statistic of 56.333. In addition to various LS2 and GMM results, both Stata bootstrap and Stata robust errors results are shown. The bootstrap results do not make do not make assumptions about the distribution of the regressiors.

The Rats coefficient results for LS2 and GMM match B34S and Stata. Note that Rats uses the small sample SE formula while Stata reports the large sample SE. B34S LS2 results report both. The exact formulas for all LS2 and GMM calculations in B34S are contained in the two subroutines listed in Table 4.8.


OLS and LS2 Estimation

Dependent Variable LW OLS Sum of squared Residuals 79.37338878983863 LS2 Sum of squared Residuals 80.01823370030675 Large Sample ls2 sigma 0.1055649521112226 Small Sample ls2 sigma 0.1074070251010829 Rank of Equation 13 Order of Equation 16 Equation is overidentified Anderson LR ident./IV Relevance test 54.33777011513529 Significance of Anderson LR Statistic 0.9999999999552830 Anderson Canon Correlation LM test 52.43586586757428 Significance of Anderson LM Statistic 0.9999999998881718 Cragg-Donald Chi-Square Weak ID Test 56.33277600836977 Significance of Cragg-Donald test 0.9999999999829244 Basmann 97.02497131695870 Significance of Basmann Statistic 1.000000000000000 Sargan N*R-sq / J-Test Test 87.65523169449482 Significance of Sargan Statistic 1.000000000000000

+++++++++++++++++++++++++++++++++++++++++++++++++++++

Obs NAMES %OLSCOEF %OLS_SE %OLS_T %LS2COEF %LS2_SES %LS2_SEL %LS2_T_S %LS2_T_L 1 IQ 0.2712E-02 0.1031E-02 2.630 0.1747E-03 0.3937E-02 0.3903E-02 0.4436E-01 0.4474E-01 2 S 0.6195E-01 0.7279E-02 8.512 0.6918E-01 0.1305E-01 0.1294E-01 5.301 5.347 3 EXPR 0.3084E-01 0.6510E-02 4.737 0.2987E-01 0.6697E-02 0.6639E-02 4.460 4.498 4 TENURE 0.4216E-01 0.7481E-02 5.636 0.4327E-01 0.7693E-02 0.7627E-02 5.625 5.674 5 RNS -0.9629E-01 0.2755E-01 -3.496 -0.1036 0.2974E-01 0.2948E-01 -3.484 -3.514 6 SMSA 0.1329 0.2658E-01 5.001 0.1351 0.2689E-01 0.2666E-01 5.025 5.069 7 IYEAR_67 -0.5421E-01 0.4785E-01 -1.133 -0.5260E-01 0.4811E-01 0.4769E-01 -1.093 -1.103 8 IYEAR_68 0.8058E-01 0.4490E-01 1.795 0.7947E-01 0.4511E-01 0.4472E-01 1.762 1.777 9 IYEAR_69 0.2076 0.4386E-01 4.733 0.2109 0.4432E-01 0.4393E-01 4.759 4.800 10 IYEAR_70 0.2282 0.4880E-01 4.677 0.2386 0.5142E-01 0.5097E-01 4.641 4.682 11 IYEAR_71 0.2227 0.4310E-01 5.167 0.2285 0.4412E-01 0.4374E-01 5.178 5.223 12 IYEAR_73 0.3229 0.4066E-01 7.941 0.3259 0.4107E-01 0.4072E-01 7.935 8.004 13 CONSTANT 4.235 0.1133 37.37 4.400 0.2709 0.2685 16.24 16.38

LHS = LW

RHS = IQ S EXPR TENURE RNS SMSA IYEAR_67 IYEAR_68 IYEAR_69 IYEAR_ 70 IYEAR_71 IYEAR_73 CONSTANT

IVAR = S EXPR TENURE RNS SMSA IYEAR_67 IYEAR_68 IYEAR_69 IYEAR_70 IYEAR_71 IYEAR_73 CONSTANT MED KWW AGE MRT

ENDVAR = iq

GMM Estimates

Dependent Variable LW OLS sum of squares 79.37338878983863 LS2 sum of squares 80.01823370030675 GMM sum of squares 81.26217887229201 Rank of Equation 13 Order of Equation 16 Equation is overidentified Anderson ident./IV Relevance test 54.33777011513529 Significance of Anderson Statistic 0.9999999999552830 Anderson Canon Correlation LM test 52.43586586757428 Significance of Anderson LM Statistic 0.9999999998881718 Cragg-Donald Chi-Square Weak ID Test 56.33277600836977 Significance of Cragg-Donald test 0.9999999999829244 Hansen J_stat Ident. of instruments 74.16487762432548 Significance of Hansen j_stat 0.9999999999999994

+++++++++++++++++++++++++++++++++++++++++++++++++++++

Obs NAMES %COEFGMM %SEGMM %TGMM 1 IQ -0.1401E-02 0.4113E-02 -0.3407 2 S 0.7684E-01 0.1319E-01 5.827 3 EXPR 0.3123E-01 0.6693E-02 4.667 4 TENURE 0.4900E-01 0.7344E-02 6.672 5 RNS -0.1007 0.2959E-01 -3.403 6 SMSA 0.1336 0.2632E-01 5.075 7 IYEAR_67 -0.2101E-01 0.4554E-01 -0.4614 8 IYEAR_68 0.8910E-01 0.4270E-01 2.087 9 IYEAR_69 0.2072 0.4080E-01 5.080 10 IYEAR_70 0.2338 0.5285E-01 4.424 11 IYEAR_71 0.2346 0.4257E-01 5.510 12 IYEAR_73 0.3360 0.4041E-01 8.315 13 CONSTANT 4.437 0.2900 15.30

66 Chapter 4

B34S Matrix Command Ending. Last Command reached.output from stata

___ ____ ____ ____ ____ (R) /__ / ____/ / ____/ ___/ / /___/ / /___/ 11.1 Copyright 2009 StataCorp LP Statistics/Data Analysis StataCorp 4905 Lakeway Drive College Station, Texas 77845 USA 800-STATA-PC http://www.stata.com 979-696-4600 [email protected] 979-696-4601 (fax)

Single-user Stata perpetual license: Serial number: 30110535901 Licensed to: Houston H. Stokes University of Illinois at Chicago

Notes: 1. (/m# option or -set memory-) 120.00 MB allocated to data 2. Stata running in batch mode

. do stata.do

. * File built by B34S on 17/10/10 at 12:29:31 . run statdata.do

. * uncomment if do not use /e . * log using stata.log, text . global xlist s expr tenure rns smsa iyear_67 /// > iyear_68 iyear_69 iyear_70 iyear_71 iyear_73

. bootstrap _b _se, reps(50): /// > ivregress 2sls lw $xlist (iq=med kww age mrt) (running ivregress on estimation sample)

Bootstrap replications (50) ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 .................................................. 50

Bootstrap results Number of obs = 758 Replications = 50

------------------------------------------------------------------------------ | Observed Bootstrap Normal-based | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- b | iq | .0001747 .0074584 0.02 0.981 -.0144435 .0147928 s | .0691759 .0217356 3.18 0.001 .0265749 .1117769 expr | .029866 .0079507 3.76 0.000 .014283 .0454491 tenure | .0432738 .0086468 5.00 0.000 .0263264 .0602211 rns | -.1035897 .0406823 -2.55 0.011 -.1833256 -.0238538 smsa | .1351148 .0258812 5.22 0.000 .0843886 .1858411 iyear_67 | -.052598 .0422675 -1.24 0.213 -.1354408 .0302448 iyear_68 | .0794686 .0459301 1.73 0.084 -.0105528 .16949 iyear_69 | .2108962 .0456788 4.62 0.000 .1213673 .300425 iyear_70 | .2386338 .0592127 4.03 0.000 .122579 .3546886 iyear_71 | .2284609 .0513617 4.45 0.000 .1277939 .3291279 iyear_73 | .3258944 .0432171 7.54 0.000 .2411904 .4105984 _cons | 4.39955 .4995474 8.81 0.000 3.420455 5.378645 -------------+---------------------------------------------------------------- se | iq | .0039035 .0012226 3.19 0.001 .0015073 .0062996 s | .0129366 .0034772 3.72 0.000 .0061214 .0197518 expr | .0066393 .0007373 9.00 0.000 .0051941 .0080845 tenure | .0076271 .0011929 6.39 0.000 .005289 .0099652 rns | .029481 .0052416 5.62 0.000 .0192077 .0397544 smsa | .0266573 .002741 9.73 0.000 .021285 .0320297 iyear_67 | .0476924 .0051268 9.30 0.000 .0376441 .0577407 iyear_68 | .0447194 .004026 11.11 0.000 .0368285 .0526102 iyear_69 | .0439336 .0055467 7.92 0.000 .0330623 .054805 iyear_70 | .0509733 .0052485 9.71 0.000 .0406864 .0612601 iyear_71 | .0437436 .0041483 10.54 0.000 .035613 .0518741 iyear_73 | .0407181 .0041193 9.88 0.000 .0326444 .0487917 _cons | .2685443 .0796381 3.37 0.001 .1124564 .4246321 ------------------------------------------------------------------------------

. * Durbin-Wu-Hausman exogenous test robust errors . ivregress 2sls lw $xlist (iq=med kww age mrt), vce(robust)


Instrumental variables (2SLS) regression Number of obs = 758 Wald chi2(12) = 573.14 Prob > chi2 = 0.0000 R-squared = 0.4255 Root MSE = .32491

------------------------------------------------------------------------------ | Robust lw | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- iq | .0001747 .0041241 0.04 0.966 -.0079085 .0082578 s | .0691759 .0132907 5.20 0.000 .0431266 .0952253 expr | .029866 .0066974 4.46 0.000 .0167394 .0429926 tenure | .0432738 .0073857 5.86 0.000 .0287981 .0577494 rns | -.1035897 .029748 -3.48 0.000 -.1618947 -.0452847 smsa | .1351148 .026333 5.13 0.000 .0835032 .1867265 iyear_67 | -.052598 .0457261 -1.15 0.250 -.1422195 .0370235 iyear_68 | .0794686 .0428231 1.86 0.063 -.0044631 .1634003 iyear_69 | .2108962 .0408774 5.16 0.000 .1307779 .2910144 iyear_70 | .2386338 .0529825 4.50 0.000 .1347901 .3424776 iyear_71 | .2284609 .0426054 5.36 0.000 .1449558 .311966 iyear_73 | .3258944 .0405569 8.04 0.000 .2464044 .4053844 _cons | 4.39955 .290085 15.17 0.000 3.830994 4.968106 ------------------------------------------------------------------------------ Instrumented: iq Instruments: s expr tenure rns smsa iyear_67 iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 med kww age mrt

. ivreg2 lw $xlist (iq=med kww age mrt)

IV (2SLS) estimation --------------------

Estimates efficient for homoskedasticity only Statistics consistent for homoskedasticity only

Number of obs = 758 F( 12, 745) = 45.91 Prob > F = 0.0000 Total (centered) SS = 139.2861498 Centered R2 = 0.4255 Total (uncentered) SS = 24652.24662 Uncentered R2 = 0.9968 Residual SS = 80.0182337 Root MSE = .3249

------------------------------------------------------------------------------ lw | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- iq | .0001747 .0039035 0.04 0.964 -.007476 .0078253 s | .0691759 .0129366 5.35 0.000 .0438206 .0945312 expr | .029866 .0066393 4.50 0.000 .0168533 .0428788 tenure | .0432738 .0076271 5.67 0.000 .0283249 .0582226 rns | -.1035897 .029481 -3.51 0.000 -.1613715 -.0458079 smsa | .1351148 .0266573 5.07 0.000 .0828674 .1873623 iyear_67 | -.052598 .0476924 -1.10 0.270 -.1460734 .0408774 iyear_68 | .0794686 .0447194 1.78 0.076 -.0081797 .1671169 iyear_69 | .2108962 .0439336 4.80 0.000 .1247878 .2970045 iyear_70 | .2386338 .0509733 4.68 0.000 .1387281 .3385396 iyear_71 | .2284609 .0437436 5.22 0.000 .1427251 .3141967 iyear_73 | .3258944 .0407181 8.00 0.000 .2460884 .4057004 _cons | 4.39955 .2685443 16.38 0.000 3.873213 4.925887 ------------------------------------------------------------------------------ Underidentification test (Anderson canon. corr. LM statistic): 52.436 Chi-sq(4) P-val = 0.0000 ------------------------------------------------------------------------------ Weak identification test (Cragg-Donald Wald F statistic): 13.786 Stock-Yogo weak ID test critical values: 5% maximal IV relative bias 16.85 10% maximal IV relative bias 10.27 20% maximal IV relative bias 6.71 30% maximal IV relative bias 5.34 10% maximal IV size 24.58 15% maximal IV size 13.96 20% maximal IV size 10.26 25% maximal IV size 8.31

Source: Stock-Yogo (2005). Reproduced by permission. ------------------------------------------------------------------------------ Sargan statistic (overidentification test of all instruments): 87.655 Chi-sq(3) P-val = 0.0000 ------------------------------------------------------------------------------ Instrumented: iq Included instruments: s expr tenure rns smsa iyear_67 iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 Excluded instruments: med kww age mrt ------------------------------------------------------------------------------

68 Chapter 4

. ivreg2 lw $xlist (iq=med kww age mrt), gmm2s robust

2-Step GMM estimation ---------------------

Estimates efficient for arbitrary heteroskedasticity Statistics robust to heteroskedasticity

Number of obs = 758 F( 12, 745) = 49.67 Prob > F = 0.0000 Total (centered) SS = 139.2861498 Centered R2 = 0.4166 Total (uncentered) SS = 24652.24662 Uncentered R2 = 0.9967 Residual SS = 81.26217887 Root MSE = .3274

------------------------------------------------------------------------------ | Robust lw | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- iq | -.0014014 .0041131 -0.34 0.733 -.009463 .0066602 s | .0768355 .0131859 5.83 0.000 .0509915 .1026794 expr | .0312339 .0066931 4.67 0.000 .0181157 .0443522 tenure | .0489998 .0073437 6.67 0.000 .0346064 .0633931 rns | -.1006811 .0295887 -3.40 0.001 -.1586738 -.0426884 smsa | .1335973 .0263245 5.08 0.000 .0820021 .1851925 iyear_67 | -.0210135 .0455433 -0.46 0.645 -.1102768 .0682498 iyear_68 | .0890993 .042702 2.09 0.037 .0054049 .1727937 iyear_69 | .2072484 .0407995 5.08 0.000 .1272828 .287214 iyear_70 | .2338308 .0528512 4.42 0.000 .1302445 .3374172 iyear_71 | .2345525 .0425661 5.51 0.000 .1511244 .3179805 iyear_73 | .3360267 .0404103 8.32 0.000 .2568239 .4152295 _cons | 4.436784 .2899504 15.30 0.000 3.868492 5.005077 ------------------------------------------------------------------------------ Underidentification test (Kleibergen-Paap rk LM statistic): 41.537 Chi-sq(4) P-val = 0.0000 ------------------------------------------------------------------------------ Weak identification test (Cragg-Donald Wald F statistic): 13.786 (Kleibergen-Paap rk Wald F statistic): 12.167 Stock-Yogo weak ID test critical values: 5% maximal IV relative bias 16.85 10% maximal IV relative bias 10.27 20% maximal IV relative bias 6.71 30% maximal IV relative bias 5.34 10% maximal IV size 24.58 15% maximal IV size 13.96 20% maximal IV size 10.26


Output from RATS

* * Data passed from B34S(r) system to RATS * display @1 %dateandtime() @33 ' Rats Version ' %ratsversion() 10/17/2010 12:29 Rats Version 7.30000 * CALENDAR(IRREGULAR) ALLOCATE 758 OPEN DATA rats.dat DATA(FORMAT=FREE,ORG=OBS, $ MISSING= 0.1000000000000000E+32 ) / $ RNS $ RNS80 $ MRT $ MRT80 $ SMSA $ SMSA80 $ MED $ IQ $ KWW $ YEAR $ AGE $ AGE80 $ S $ S80 $ EXPR $ EXPR80 $ TENURE $ TENURE80 $ LW $ LW80 $ IYEAR_67 $ IYEAR_68 $ IYEAR_69 $ IYEAR_70 $ IYEAR_71 $ IYEAR_73 $ CONSTANT SET TREND = T TABLE Series Obs Mean Std Error Minimum Maximum RNS 758 0.269129288 0.443800128 0.000000000 1.000000000 RNS80 758 0.292875989 0.455382503 0.000000000 1.000000000 MRT 758 0.514511873 0.500119364 0.000000000 1.000000000 MRT80 758 0.898416887 0.302298767 0.000000000 1.000000000 SMSA 758 0.704485488 0.456574966 0.000000000 1.000000000 SMSA80 758 0.712401055 0.452941990 0.000000000 1.000000000 MED 758 10.910290237 2.741119861 0.000000000 18.000000000 IQ 758 103.856200528 13.618666082 54.000000000 145.000000000 KWW 758 36.573878628 7.302246519 12.000000000 56.000000000 YEAR 758 69.031662269 2.631794247 66.000000000 73.000000000 AGE 758 21.835092348 2.981755741 16.000000000 30.000000000 AGE80 758 33.011873351 3.085503913 28.000000000 38.000000000 S 758 13.405013193 2.231828411 9.000000000 18.000000000 S80 758 13.707124011 2.214692601 9.000000000 18.000000000 EXPR 758 1.735428758 2.105542485 0.000000000 11.444000244 EXPR80 758 11.394261214 4.210745167 0.691999972 22.045000076 TENURE 758 1.831134565 1.673629972 0.000000000 10.000000000 TENURE80 758 7.362796834 5.050240439 0.000000000 22.000000000 LW 758 5.686738782 0.428949363 4.605000019 7.051000118 LW80 758 6.826555411 0.409926757 4.749000072 8.031999588 IYEAR_67 758 0.083113456 0.276235910 0.000000000 1.000000000 IYEAR_68 758 0.104221636 0.305749595 0.000000000 1.000000000 IYEAR_69 758 0.112137203 0.315743524 0.000000000 1.000000000 IYEAR_70 758 0.084432718 0.278219253 0.000000000 1.000000000 IYEAR_71 758 0.121372032 0.326774746 0.000000000 1.000000000 IYEAR_73 758 0.208443272 0.406463569 0.000000000 1.000000000 TREND 758 379.500000000 218.960042017 1.000000000 758.000000000

70 Chapter 4

* instruments s expr tenure rns smsa iyear_67 $ iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 $ med kww age mrt constant * OLS linreg lw # constant s expr tenure rns smsa iyear_67 $ iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 iq

Linear Regression - Estimation by Least Squares Dependent Variable LW Usable Observations 758 Degrees of Freedom 745 Centered R**2 0.430142 R Bar **2 0.420963 Uncentered R**2 0.996780 T x R**2 755.559 Mean of Dependent Variable 5.6867387823 Std Error of Dependent Variable 0.4289493629 Standard Error of Estimate 0.3264068956 Sum of Squared Residuals 79.373388790 Regression F(12,745) 46.8619 Significance Level of F 0.00000000 Log Likelihood -220.33424 Durbin-Watson Statistic 1.726206

Variable Coeff Std Error T-Stat Signif ******************************************************************************** 1. Constant 4.235356890 0.113348861 37.36568 0.00000000 2. S 0.061954782 0.007278581 8.51193 0.00000000 3. EXPR 0.030839472 0.006510083 4.73719 0.00000260 4. TENURE 0.042163060 0.007481211 5.63586 0.00000002 5. RNS -0.096293467 0.027546700 -3.49564 0.00050091 6. SMSA 0.132899286 0.026575835 5.00076 0.00000071 7. IYEAR_67 -0.054209478 0.047852181 -1.13285 0.25764051 8. IYEAR_68 0.080580850 0.044895091 1.79487 0.07307967 9. IYEAR_69 0.207591515 0.043860470 4.73300 0.00000265 10. IYEAR_70 0.228223732 0.048799418 4.67677 0.00000346 11. IYEAR_71 0.222691481 0.043095233 5.16743 0.00000031 12. IYEAR_73 0.322874689 0.040657433 7.94134 0.00000000 13. IQ 0.002712120 0.001031411 2.62952 0.00872684

* 2SLS linreg(inst) lw # constant s expr tenure rns smsa iyear_67 $ iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 iq

Linear Regression - Estimation by Instrumental Variables Dependent Variable LW Usable Observations 758 Degrees of Freedom 745 Mean of Dependent Variable 5.6867387823 Std Error of Dependent Variable 0.4289493629 Standard Error of Estimate 0.3277301102 Sum of Squared Residuals 80.018233699 J-Specification(3) 86.151910 Significance Level of J 0.00000000 Durbin-Watson Statistic 1.723148

Variable Coeff Std Error T-Stat Signif ******************************************************************************** 1. Constant 4.399550073 0.270877148 16.24187 0.00000000 2. S 0.069175917 0.013048998 5.30124 0.00000015 3. EXPR 0.029866018 0.006696962 4.45964 0.00000948 4. TENURE 0.043273756 0.007693380 5.62480 0.00000003 5. RNS -0.103589698 0.029737133 -3.48351 0.00052378 6. SMSA 0.135114831 0.026888925 5.02492 0.00000063 7. IYEAR_67 -0.052598010 0.048106697 -1.09336 0.27458852 8. IYEAR_68 0.079468615 0.045107833 1.76175 0.07852207 9. IYEAR_69 0.210896152 0.044315294 4.75899 0.00000234 10. IYEAR_70 0.238633821 0.051416062 4.64123 0.00000409 11. IYEAR_71 0.228460915 0.044123572 5.17775 0.00000029 12. IYEAR_73 0.325894418 0.041071810 7.93475 0.00000000 13. IQ 0.000174655 0.003937397 0.04436 0.96463097


* GMM linreg(inst,optimalweights) lw # constant s expr tenure rns smsa iyear_67 $ iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 iq

Linear Regression - Estimation by GMM Dependent Variable LW Usable Observations 758 Degrees of Freedom 745 Mean of Dependent Variable 5.6867387823 Std Error of Dependent Variable 0.4289493629 Standard Error of Estimate 0.3302676947 Sum of Squared Residuals 81.262178869 J-Specification(3) 74.164878 Significance Level of J 0.00000000 Durbin-Watson Statistic 1.720776

Variable Coeff Std Error T-Stat Signif ******************************************************************************** 1. Constant 4.436784487 0.289950376 15.30188 0.00000000 2. S 0.076835453 0.013185922 5.82708 0.00000001 3. EXPR 0.031233937 0.006693110 4.66658 0.00000306 4. TENURE 0.048999780 0.007343684 6.67237 0.00000000 5. RNS -0.100681114 0.029588671 -3.40269 0.00066726 6. SMSA 0.133597299 0.026324546 5.07501 0.00000039 7. IYEAR_67 -0.021013483 0.045543337 -0.46140 0.64451500 8. IYEAR_68 0.089099315 0.042701995 2.08654 0.03692996 9. IYEAR_69 0.207248405 0.040799543 5.07967 0.00000038 10. IYEAR_70 0.233830843 0.052851170 4.42433 0.00000967 11. IYEAR_71 0.234552477 0.042566121 5.51031 0.00000004 12. IYEAR_73 0.336026675 0.040410335 8.31536 0.00000000 13. IQ -0.001401434 0.004113144 -0.34072 0.73331372

4.7 Potential problems of IV Models

Instrumental variable estimation methods, while necessary and useful for models with endogenous variables on the right, have a number of features that can be serious drawbacks.12 In the first place such estimators are never unbiased when endogenous variables are on the right. Citing Kinal (1980), Wooldridge (2010, 207) notes "when all endogenous variables have homoskedastic normal distributions with expectations linear in the exogenous variables, the number of moments of the 2SLS estimator that exist is one fewer than the number of overidentifying restrictions. This finding implies that when the number of instruments equals the number of explanatory variables, the IV estimator does not have the expected value." Even for large sample analysis, there will be problems if there are weak instruments. Assume a single endogenous variable x on the right or

(4.7-1)

where z is the instrumental variable. It can be shown that

(4.7-2)

The greater the correlation between the instruments and the population error the greater the bias. The weaker the instrument, the smaller and the greater the bias. The bias in the 12 Wooldridge (2010) especially pages 107-114 forms the basis for this section.

72 Chapter 4

OLS estimator is

(4.7-3)

and can be less than the bias in the IV estimator if

(4.7-4)

The more significant the Anderson test, the larger everything else equal and the less the bias in the IV estimator. The more significant the Basmann (1960) test, the larger and the more bias in the IV estimator.

4.8 Conclusion

The simeq command should be used when either there are endogenous variables on the right-hand side of a regression model or when the seemingly unrelated regression model is desired. In the former case, if OLS is attempted, the results will be biased estimates. Jennings (1973, 1980), the original developer of the simeq code, made a major contribution in developing fast and accurate code that was designed to alert the user to problems in the structure of the model. These include rank tests on all the key matrices as well as rank tests on the matrix of exogenous variables in the system. The matrix command was used to illustrate calculation of OLS, LIML, 2SLS, 3SLS and FIML models using more traditional equations that those used by Jennings. SAS and Rats code was shown and the results compared to the B34S program output. Using the matrix command LS2 (same as 2SLS) and GMM routines together with a number of diagnostic tests are shown and the results compared to Stata and Rats using an important dataset studied by Griliches (1975) and Baum (2006).

chapter 4 - houston h. stokes pagehhstokes.people.uic.edu/ftp/book1/ch4.docx · web view4.5...

Documents