3sls 3sls is the combination of 2sls and sur. it is used in an system of equations which are...


Upload: nicole-carney

Post on 28-Mar-2015



3SLS

3SLS is the combination of 2SLS and SUR.

It is used for a system of equations that is endogenous, i.e. in each equation there are endogenous variables on both the left and right hand sides of the equation. THAT IS THE 2SLS PART.

But the error terms in each equation are also correlated across equations. Efficient estimation requires that we take account of this. THAT IS THE SUR (SEEMINGLY UNRELATED REGRESSIONS) PART.

Hence in the regression for the ith equation there are endogenous (Y) variables on the rhs AND the error term is correlated with the error terms in the other equations.

3SLS

log using "g:summ1.log"

If you type the above then a log is created on drive g (on my computer this is the flash drive; on yours you may need to specify another drive).

The name summ1 can be anything. But use the suffix log, which makes Stata write the log as a plain text file.

At the end you can close the log by typing:

log close

So open a log now and you will have a record of this session
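A slightly more defensive way to open and close the log is sketched below. The file name summ1.log and the decision to write into the current working directory are assumptions for illustration only; replace and text are standard options of log using.

```stata
* Close any log that is still open from an earlier session (no error if none is)
capture log close

* Open a plain-text log in the current working directory.
* "summ1.log" is an example name; replace overwrites an existing file of that name.
log using "summ1.log", replace text

* ... the commands for this session go here ...

* Close the log at the end of the session
log close
```

Without replace, Stata refuses to overwrite an existing log of the same name, which is a common stumbling block when a session is rerun.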

3SLS Load Data

clear
use http://www.ats.ucla.edu/stat/stata/examples/greene/TBL16-2

THAT link no longer works. But the following does:

webuse klein

In order to get the rest to work:

rename consump c
rename capital1 k1
rename invest i
rename profits p
rename govt g
rename wagegovt wg
rename taxnetx t
rename wagepriv wp
generate x=totinc

Note that totinc is not renamed: the name t is already used for taxnetx, and totinc is needed to create x.

* generate variables
generate w = wg+wp
generate k = k1+i
generate yr=year-1931
generate p1 = p[_n-1]
generate x1 = x[_n-1]
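The lags p1 and x1 are built with [_n-1], which means "the previous row", so they are only correct if the rows are in time order. A minimal sketch of how one might check this, assuming the variable year holds the calendar year with no gaps (p1_check is a hypothetical name used only for the comparison):

```stata
* [_n-1] refers to the previous row, so sort by time first
sort year

* The first observation has no predecessor, so p1 and x1 are missing there
list year p p1 x x1 in 1/3
count if missing(p1)

* Alternative: declare the data as a time series and use the lag operator
tsset year
generate p1_check = L.p

* Stops with an error if the two versions of the lag ever differ
assert p1_check == p1
```

The lag operator is the safer habit in general, because it respects gaps in the time variable while [_n-1] does not.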


OLS Regression

regress c p p1 w

Regresses c on p , p1 and w (what this equation means is not so important).

Usual output

      Source |       SS       df       MS              Number of obs =      21
-------------+------------------------------           F(  3,    17) =  292.71
       Model |  923.549937     3  307.849979           Prob > F      =  0.0000
    Residual |  17.8794524    17  1.05173249           R-squared     =  0.9810
-------------+------------------------------           Adj R-squared =  0.9777
       Total |  941.429389    20  47.0714695           Root MSE      =  1.0255

------------------------------------------------------------------------------
           c |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           p |   .1929343   .0912102     2.12   0.049     .0004977     .385371
          p1 |   .0898847   .0906479     0.99   0.335    -.1013658    .2811351
           w |   .7962188   .0399439    19.93   0.000     .7119444    .8804931
       _cons |    16.2366   1.302698    12.46   0.000     13.48815    18.98506
------------------------------------------------------------------------------


reg3

By the command reg3, STATA estimates a system of structural equations, where some equations contain endogenous variables among the explanatory variables. Estimation is via three-stage least squares (3SLS). Typically, the endogenous regressors are dependent variables from other equations in the system.

In addition, reg3 can also estimate systems of equations by seemingly unrelated regression (SURE), multivariate regression (MVREG), and equation-by-equation ordinary least squares (OLS) or two-stage least squares (2SLS).


2SLS Regression

reg3 (c p p1 w), 2sls inst(t wg g yr p1 x1 k1)

Regresses c on p, p1 and w. The instruments (i.e. the predetermined or exogenous variables in this equation and the rest of the system) are t wg g yr p1 x1 k1.

This means that p and w (which are not included in the instruments) are endogenous.

The output is as before, but it confirms what the exogenous and endogenous variables are.

Two-stage least-squares regression
----------------------------------------------------------------------
Equation          Obs  Parms        RMSE    "R-sq"     F-Stat        P
----------------------------------------------------------------------
c                  21      3    1.135659    0.9767     225.93   0.0000
----------------------------------------------------------------------

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
c            |
           p |   .0173022   .1312046     0.13   0.897    -.2595153    .2941197
          p1 |   .2162338   .1192217     1.81   0.087    -.0353019    .4677696
           w |   .8101827   .0447351    18.11   0.000        .7158    .9045654
       _cons |   16.55476   1.467979    11.28   0.000     13.45759    19.65192
------------------------------------------------------------------------------
Endogenous variables:  c p w
Exogenous variables:   t wg g yr p1 x1 k1


2SLS Regression

ivreg c p1 (p w = t wg g yr p1 x1 k1)

This is an alternative command to do the same thing. Note that the endogenous variables on the right hand side of the equation are specified inside the brackets, starting with (p w, and the instruments follow the = sign.
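In Stata 10 and later, ivreg has been superseded by ivregress. The sketch below is one way to write the same regression with the newer command. It assumes the renamed Klein variables created earlier; p1 is left out of the instrument list because ivregress treats the included exogenous regressors as instruments automatically.

```stata
* ivregress replaces the older ivreg command.
* small asks for t statistics and the n-k degrees-of-freedom adjustment,
* which is what ivreg reports.
ivregress 2sls c p1 (p w = t wg g yr x1 k1), small

* First-stage diagnostics for the two instrumented variables p and w
estat firststage
```

With the small option the coefficients and standard errors should match the ivreg results; without it ivregress reports z statistics and divides by n, so the standard errors differ slightly.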

The results are identical.

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =      21
-------------+------------------------------           F(  3,    17) =  225.93
       Model |  919.504138     3  306.501379           Prob > F      =  0.0000
    Residual |  21.9252518    17  1.28972069           R-squared     =  0.9767
-------------+------------------------------           Adj R-squared =  0.9726
       Total |  941.429389    20  47.0714695           Root MSE      =  1.1357

------------------------------------------------------------------------------
           c |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           p |   .0173022   .1312046     0.13   0.897    -.2595153    .2941197
           w |   .8101827   .0447351    18.11   0.000        .7158    .9045654
          p1 |   .2162338   .1192217     1.81   0.087    -.0353019    .4677696
       _cons |   16.55476   1.467979    11.28   0.000     13.45759    19.65192
------------------------------------------------------------------------------
Instrumented:  p w
Instruments:   p1 t wg g yr x1 k1


3SLS Regression

reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)

This format does two new things. First, it specifies all three equations in the system. It has to do this because it needs to calculate the covariances between the error terms, and for this it needs to know what the equations, and hence the errors, are.

Secondly, it says 3sls not 2sls.
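reg3 also lets you describe the system the other way round: instead of listing every instrument in inst(), you name the endogenous right-hand-side variables in endog() and any exogenous variables that appear in no equation in exog(). The sketch below is my reading of how the system above translates into that form, assuming the renamed Klein variables; it should give the same estimates, but that is worth confirming by comparing the two outputs.

```stata
* Dependent variables (c, i, wp) are treated as endogenous by default.
* endog() adds the endogenous regressors p, w and x.
* All other regressors (p1, k1, x1, yr) are taken as exogenous, and
* exog() adds t, wg and g, which appear in no equation.
* 3sls is the default estimator for reg3, so it need not be stated.
reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), endog(p w x) exog(t wg g)
```

The list of exogenous variables printed under the output is a quick way to confirm that both specifications use the same instrument set.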

All 3 equations are printed out. This tells us what these equations look like.

Three-stage least-squares regression
----------------------------------------------------------------------
Equation          Obs  Parms        RMSE    "R-sq"       chi2        P
----------------------------------------------------------------------
c                  21      3    .9443305    0.9801     864.59   0.0000
i                  21      3    1.446736    0.8258     162.98   0.0000
wp                 21      3    .7211282    0.9863    1594.75   0.0000
----------------------------------------------------------------------

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
c            |
           p |   .1248904   .1081291     1.16   0.248    -.0870387    .3368194
          p1 |   .1631439   .1004382     1.62   0.104    -.0337113    .3599992
           w |    .790081   .0379379    20.83   0.000      .715724    .8644379
       _cons |   16.44079   1.304549    12.60   0.000     13.88392    18.99766
-------------+----------------------------------------------------------------
i            |
           p |  -.0130791   .1618962    -0.08   0.936    -.3303898    .3042316
          p1 |   .7557238   .1529331     4.94   0.000     .4559805    1.055467
          k1 |  -.1948482   .0325307    -5.99   0.000    -.2586072   -.1310893
       _cons |   28.17785   6.793768     4.15   0.000     14.86231    41.49339
-------------+----------------------------------------------------------------
wp           |
           x |   .4004919   .0318134    12.59   0.000     .3381388     .462845
          x1 |    .181291   .0341588     5.31   0.000     .1143411    .2482409
          yr |    .149674   .0279352     5.36   0.000      .094922    .2044261
       _cons |   1.797216   1.115854     1.61   0.107    -.3898181    3.984251
------------------------------------------------------------------------------
Endogenous variables:  c p w i wp x
Exogenous variables:   t wg g yr p1 x1 k1

Let's compare the three different sets of estimates for the first equation. The coefficient on w is large and highly significant under all three estimators (0.796 in OLS, 0.810 in 2SLS, 0.790 in 3SLS). The coefficient that moves is the one on p: it is 0.193 and significant at the 5% level in OLS, falls to 0.017 and is insignificant in 2SLS, and is 0.125 and insignificant in 3SLS.

Now I expect that if 2SLS is different from OLS because of bias then so should 3SLS be. As it stands, the 3SLS coefficient on p lies between the other two and is closer to OLS than to 2SLS, which does not make an awful lot of sense.

But we do not have many observations (21). Perhaps that is partly why.

              3SLS                 2SLS                 OLS
         coefficient  t stat  coefficient  t stat  coefficient  t stat
p           0.125      1.16      0.017      0.13      0.193      2.12
p1          0.163      1.62      0.216      1.81      0.090      0.99
w           0.790     20.83      0.810     18.11      0.796     19.93
_cons      16.441     12.60     16.555     11.28     16.237     12.46
R2          0.98                 0.977                0.981

3SLS Regression

reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
matrix sig=e(Sigma)

Now this command stores the variances and covariances between the error terms in a matrix I call sig.

You have used generate to generate variables and scalar to generate scalars. Similarly matrix produces a matrix.

e(Sigma) stores this variance covariance matrix from the previous regression.

3SLS Regression

reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
matrix sig=e(Sigma)
display sig[1,1], sig[1,2], sig[1,3]
display sig[2,1], sig[2,2], sig[2,3]
display sig[3,1], sig[3,2], sig[3,3]

. display sig[1,1], sig[1,2], sig[1,3]
1.0440596 .43784767 -.3852272

. display sig[2,1], sig[2,2], sig[2,3]
.43784767 1.3831832 .19260612

. display sig[3,1], sig[3,2], sig[3,3]
-.3852272 .19260612 .47642626

Written out as a matrix:

 1.04406    0.437848   -0.38523
 0.437848   1.383183    0.192606
-0.38523    0.192606    0.476426

The top left element, 1.04406, is the variance of the 1st error term. The element 0.192606 is the covariance of the error terms from equations 2 and 3.

3SLS Regression

 1.04406    0.437848   -0.38523
 0.437848   1.383183    0.192606
-0.38523    0.192606    0.476426

This relates to the variance covariance matrix in the lecture. This matrix is Σ.

Hence 0.437848 relates to σ12 and of course σ21.

3SLS Regression

display sig[1,2]/( sig[1,1]^0.5* sig[2,2]^0.5)

Now this should give the correlation between the error terms from equations 1 and 2.

It is this formula: Correlation (x, y) = σxy /(σx σy). When we do this we get:

. display sig[1,2]/( sig[1,1]^0.5* sig[2,2]^0.5)
.36435149
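The same correlation can be obtained for all pairs of equations at once. The sketch below assumes the matrix sig created above is still in memory; R is a hypothetical name for the resulting correlation matrix. corr() is Stata's matrix function for turning a covariance matrix into a correlation matrix.

```stata
* Show the whole covariance matrix instead of displaying it element by element
matrix list sig

* Convert the covariance matrix into a correlation matrix
matrix R = corr(sig)
matrix list R

* Element [1,2] of R should agree with the hand calculation
display R[1,2]
display sig[1,2]/sqrt(sig[1,1]*sig[2,2])
```

The off-diagonal elements of R are the correlations between the error terms of equations 1 and 2, 1 and 3, and 2 and 3.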

Let's check

reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
matrix sig=e(Sigma)
matrix cy= e(b)
generate rc=c-(cy[1,1]*p+ cy[1,2]*p1+ cy[1,3]*w+cy[1,4])
generate ri=i-(cy[1,5]*p+ cy[1,6]*p1+ cy[1,7]*k1+ cy[1,8])
correlate ri rc

matrix cy= e(b) stores the coefficients from the regression in a row vector we call cy.

cy[1,1] is the first coefficient (on p) in the first equation.
cy[1,4] is the fourth coefficient in the first equation (the constant term).
cy[1,5] is the first coefficient (on p) in the second equation.
Note this is cy[1,5] NOT cy[2,1]: all the coefficients sit in a single row, one equation after another.
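Counting positions in cy is error prone, so it can help to look at the column labels, or to ask for a coefficient by equation and variable name. A minimal sketch, assuming the 3SLS system above (by default reg3 names each equation after its dependent variable, so the equations here are c, i and wp):

```stata
* Re-run the system quietly so that its results are the active estimates
quietly reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)

* Listing the vector shows an equation:variable label above every column
matrix cy = e(b)
matrix list cy

* Coefficient on p in equation c, by name and by position
display [c]_b[p]
display cy[1,1]

* Coefficient on p in equation i, by name and by position
display [i]_b[p]
display cy[1,5]
```

Each pair of display lines should print the same number; referring to coefficients by name keeps working even if a variable is later added to an equation.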

Thus cy[1,1]*p+ cy[1,2]*p1+ cy[1,3]*w+cy[1,4] is the predicted value from the first regression, and rc is the actual value of c minus this prediction, i.e. the error term from the 1st equation. Similarly

i-(cy[1,5]*p+ cy[1,6]*p1+ cy[1,7]*k1+ cy[1,8])

is the actual minus the predicted value, i.e. the error term from the 2nd equation.

correlate ri rc prints out the correlation between the two error terms.

Let's check

. correlate ri rc
(obs=21)

             |       ri       rc
-------------+------------------
          ri |   1.0000
          rc |   0.3011   1.0000

The correlation is 0.30, close to what we had before, but not the same. Now the main purpose of this class is to illustrate commands, so it is not too important. Stata does indeed calculate the e(Sigma) matrix by dividing by n and not n-k, but the divisor cancels in a correlation and so cannot be the reason. The numbers above suggest a different explanation: sig[1,1] = 1.0440596 is exactly the residual sum of squares from the 2SLS regression, 21.9252518, divided by n = 21. So e(Sigma) appears to hold the covariances of the 2SLS residuals, which are what the third stage uses as weights, whereas rc and ri are the final 3SLS residuals.
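One way to look into the gap between 0.36 and 0.30 is to compute the residual correlation twice, once from the 3SLS estimates and once from the equation-by-equation 2SLS estimates. The sketch below assumes the renamed Klein variables; rc3, ri3, rc2 and ri2 are hypothetical names chosen so as not to clash with rc and ri. If e(Sigma) is built from the 2SLS residuals, as sig[1,1] = 21.9252518/21 suggests, the second correlation should be close to the 0.364 obtained from sig.

```stata
* Residuals from the 3SLS estimates, using predict rather than e(b) by hand
quietly reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
predict rc3, equation(c) residuals
predict ri3, equation(i) residuals
correlate ri3 rc3

* Residuals from the same system estimated equation by equation with 2SLS
quietly reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 2sls inst(t wg g yr p1 x1 k1)
predict rc2, equation(c) residuals
predict ri2, equation(i) residuals
correlate ri2 rc2

* The top left element of sig equals the 2SLS residual sum of squares over n
display 21.9252518/21
```

Using predict also avoids the risk of picking the wrong element of e(b) when building residuals by hand.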

Let's check

Click on help (on the tool bar at the top of the screen to the right).
Click on 'stata command'.
In the dialogue box type reg3.

Move down towards the end of the file and you get the following:

Saved results

reg3 saves the following in e():

Scalars
  e(N)          number of observations
  e(k)          number of parameters
  e(k_eq)       number of equations
  e(mss_#)      model sum of squares for equation #
  e(df_m#)      model degrees of freedom for equation #
  e(rss_#)      residual sum of squares for equation #
  e(df_r)       residual degrees of freedom (small)
  e(r2_#)       R-squared for equation #
  e(F_#)        F statistic for equation # (small)
  e(rmse_#)     root mean squared error for equation #
  e(dfk2_adj)   divisor used with VCE when dfk2 specified
  e(ll)         log likelihood
  e(chi2_#)     chi-squared for equation #
  e(p_#)        significance for equation #
  e(ic)         number of iterations
  e(cons_#)     1 when equation # has a constant; 0 otherwise

Some important retrievables

  e(mss_#)      model sum of squares for equation #
  e(rss_#)      residual sum of squares for equation #
  e(r2_#)       R-squared for equation #
  e(F_#)        F statistic for equation # (small)
  e(rmse_#)     root mean squared error for equation #
  e(ll)         log likelihood

Where # is a number, e.g. if it is 2 it means equation 2.

And

Matrices
  e(b)          coefficient vector
  e(Sigma)      Sigma hat matrix
  e(V)          variance-covariance matrix of the estimators
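Rather than going through the help file, the stored results can be listed directly after any estimation command. A minimal sketch, assuming the 3SLS system above has just been estimated (V is a hypothetical matrix name):

```stata
quietly reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)

* Everything the last estimation command left behind in e()
ereturn list

* Individual scalars for equation 1
display e(rss_1)
display e(r2_1)

* The standard error of the first coefficient (p in equation c) is the
* square root of the first diagonal element of e(V)
matrix V = e(V)
display sqrt(V[1,1])
display [c]_se[p]
```

The last two lines should print the same value, which should also match the Std. Err. shown for p in the first equation of the 3SLS output.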

The Hausman Test Again

We looked at this with respect to panel data. But it is a general test that allows us to compare an equation which has been estimated by two different techniques. Here we apply it to comparing OLS with 3SLS.

reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), ols
est store EQNols

reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
est store EQN3sls

hausman EQNols EQN3sls

The Hausman Test Again

Below we run the three regressions specifying ols and store the results as EQNols.

reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), ols
est store EQNols

Then we run the three regressions specifying 3sls and store the results as EQN3sls.

reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
est store EQN3sls

Then we do the Hausman test:

hausman EQNols EQN3sls

The Results

. hausman EQNols EQN3sls

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |     EQNols      EQN3sls       Difference          S.E.
-------------+----------------------------------------------------------------
           p |    .1929343     .1248904         .068044               .
          p1 |    .0898847     .1631439       -.0732592               .
           w |    .7962188      .790081        .0061378        .0124993
------------------------------------------------------------------------------
                          b = consistent under Ho and Ha; obtained from reg3
           B = inconsistent under Ha, efficient under Ho; obtained from reg3

    Test:  Ho:  difference in coefficients not systematic

                  chi2(3) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =        0.06
                Prob>chi2 =      0.9963
                (V_b-V_B is not positive definite)

The table prints out the two sets of coefficients and their difference.

The Hausman test statistic is 0.06

The significance level is 0.9963

This is clearly very far from being significant at the 10% level.

The Hausman Test Again

Hence it would appear that the coefficients from the two regressions are not significantly different.

If OLS were giving biased estimates that 3SLS corrects, they would be different.

Hence we would conclude that there is no endogeneity which requires instrumental variable techniques. Treat this with some caution, though: the output notes that V_b-V_B is not positive definite, and with only 21 observations the test may have little power.

But because the error terms do appear correlated, SUR is probably the appropriate technique, as it is more efficient than equation-by-equation OLS.
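One caveat on the order of the arguments. As far as I understand the syntax, hausman expects the estimator that is consistent under both hypotheses first and the one that is efficient under the null second. Here 3SLS is the consistent one and OLS the efficient one, so the call above labels them the wrong way round (its output describes EQNols as "consistent under Ho and Ha"). A sketch of the call in the documented order, assuming the two sets of stored estimates from above:

```stata
* Confirm that both sets of results are still stored
estimates dir

* Always-consistent estimator (3SLS) first, efficient-under-Ho estimator (OLS) second
hausman EQN3sls EQNols
```

The statistic from this call need not equal the 0.06 reported above, so it is worth rerunning before drawing conclusions.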

Tasks

1. Using the display command, e.g.

display e(mss_2)

print on the screen some of the retrievables from each regression (the above is the model sum of squares for the second equation).

2. Let's look at the display command.

Type:

display "The residual sum of squares =" e(rss_2)

display "The residual sum of squares =" e(rss_2), "and the R2 =" e(r2_2)

display _column(20) "The residual sum of squares =" e(rss_2), _column(50) "and the R2 =" e(r2_2)

display _column(20) "The residual sum of squares =" e(rss_2), _column(60) "and the R2 =" e(r2_2)

display _column(20) "The residual sum of squares =" e(rss_2), _column(60) "and the R2 =" _skip(5) e(r2_2)

display _column(20) "The residual sum of squares =" e(rss_2), _column(60) "and the R2 =" _skip(10) e(r2_2)
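Besides _column() and _skip(), display accepts a numeric format placed in front of a value, which controls how many decimal places are shown. The sketch below assumes the 3SLS system is the most recent estimation; the layout and column positions are arbitrary choices for illustration.

```stata
quietly reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)

* %9.3f prints a number 9 characters wide with 3 decimal places
display "RSS (eq 2) = " %9.3f e(rss_2) "   R2 = " %6.4f e(r2_2)

* A small table with one line per equation
display _column(5) "Equation" _column(20) "RSS" _column(35) "R2"
forvalues j = 1/3 {
    display _column(5) `j' _column(20) %9.3f e(rss_`j') _column(35) %6.4f e(r2_`j')
}
```

The loop works because the # in e(rss_#) and e(r2_#) is simply the equation number, so the local macro j can be substituted into the name.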

Tasks

Close log:

log close

And have a look at it in Word.

webuse klein

In order to get the rest to work:

rename consump c
rename capital1 k1
rename invest i
rename profits p
rename govt g
rename wagegovt wg
rename taxnetx t
rename wagepriv wp
generate x=totinc
generate w = wg+wp
generate k = k1+i
generate yr=year-1931
generate p1 = p[_n-1]
generate x1 = x[_n-1]
reg3 (c p p1 w), 2sls inst(t wg g yr p1 x1 k1)
reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)

Note that totinc is not renamed: the name t is already used for taxnetx, and totinc is needed to create x.