Part 12: Asymptotics for the Regression Model

Econometrics I

Professor William Greene

Stern School of Business

Department of Economics


Setting

The least squares estimator is

\[ b = (X'X)^{-1}X'y = (X'X)^{-1}\sum_i x_i y_i = \beta + (X'X)^{-1}\sum_i x_i \varepsilon_i \]

So, it is a constant vector plus a sum of random variables. Our 'finite sample' results established the behavior of the sum according to the rules of statistics. The question for the present is how does this sum of random variables behave in large samples?


Well Behaved Regressors

A crucial assumption: Convergence of the moment matrix \(X'X/n\) to a positive definite matrix of finite elements, \(Q\)

What kind of data will satisfy this assumption?

What won’t?

Does stochastic vs. nonstochastic matter?

Various conditions for “well behaved X”


Probability Limit

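\[ b = \beta + \left(\frac{1}{n}X'X\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n} x_i\varepsilon_i\right) \]

\[ (b-\beta)(b-\beta)' = \left(\frac{1}{n}X'X\right)^{-1}\left[\frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} x_i\varepsilon_i\varepsilon_j x_j'\right]\left(\frac{1}{n}X'X\right)^{-1} \]

We use convergence in mean square. Adequate for almost all problems, not adequate for some time series problems.

In \(E[(b-\beta)(b-\beta)'|X]\), in the double sum, terms with unequal subscripts have expectation zero.

\[ E[(b-\beta)(b-\beta)'|X] = \left(\frac{1}{n}X'X\right)^{-1}\left[\frac{1}{n^2}\sum_{i=1}^{n} x_i x_i'\,E[\varepsilon_i^2|X]\right]\left(\frac{1}{n}X'X\right)^{-1} \]

\[ = \frac{\sigma^2}{n}\left(\frac{X'X}{n}\right)^{-1}\left(\frac{X'X}{n}\right)\left(\frac{X'X}{n}\right)^{-1} = \frac{\sigma^2}{n}\left(\frac{X'X}{n}\right)^{-1} \]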


Mean Square Convergence

E[b|X]=β for any X.

Var[b|X] → 0 for any specific well behaved X

b converges in mean square to β
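A small simulation can make the mean square convergence concrete. The sketch below is only an illustration under assumed settings that are not part of the lecture: a single regressor with no constant term, regressor values drawn once from a uniform distribution and then held fixed, normal disturbances, and an arbitrary seed. It compares the simulated mean squared error of b with the conditional variance \(\sigma^2/\sum_i x_i^2\) for three sample sizes.

```python
import random

def ols_slope(x, y):
    """Least squares slope for a regression through the origin: sum(x*y) / sum(x*x)."""
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    return sxy / sxx

def simulated_variance(n, beta=0.5, sigma=1.0, reps=1000, seed=12345):
    """Return (simulated MSE of b, theoretical Var[b|X]) for a fixed regressor sample."""
    rng = random.Random(seed)
    x = [rng.uniform(1.0, 3.0) for _ in range(n)]  # held fixed across replications
    squared_errors = []
    for _ in range(reps):
        y = [beta * xi + rng.gauss(0.0, sigma) for xi in x]
        squared_errors.append((ols_slope(x, y) - beta) ** 2)
    mse = sum(squared_errors) / reps
    theory = sigma ** 2 / sum(xi * xi for xi in x)
    return mse, theory

# Both columns should shrink roughly in proportion to 1/n.
results = {n: simulated_variance(n) for n in (25, 100, 400)}
for n, (mse, theory) in results.items():
    print(n, round(mse, 6), round(theory, 6))
```

With an unbiased estimator, the mean squared error equals the variance, so a variance that goes to zero is all that mean square convergence requires here.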


Crucial Assumption of the Model

What must be assumed to get \(\operatorname{plim}\ \frac{1}{n}X'\varepsilon = 0\)?

(1) \(x_i\) = a random vector with finite means and variance and identical distributions.

(2) \(\varepsilon_i\) = a random variable with a constant distribution with finite mean and variance and \(E[\varepsilon_i]=0\)

(3) \(x_i\) and \(\varepsilon_i\) statistically independent.

Then, \(z_i = x_i\varepsilon_i\) = an observation in a random sample, with constant variance matrix and mean vector 0.

\(\frac{1}{n}\sum_{i=1}^{n} z_i\) converges to its expectation by the law of large numbers.


Consistency of s2

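\[ s^2 = \frac{1}{n-K}\,e'e = \frac{1}{n-K}\,\varepsilon'M\varepsilon = \frac{n}{n-K}\cdot\frac{1}{n}\,\varepsilon'M\varepsilon \]

\[ \operatorname{plim}\ s^2 = \operatorname{plim}\ \frac{1}{n}\,\varepsilon'M\varepsilon = \operatorname{plim}\ \frac{1}{n}\left(\varepsilon'\varepsilon - \varepsilon'X(X'X)^{-1}X'\varepsilon\right) \]

\[ = \operatorname{plim}\ \frac{1}{n}\varepsilon'\varepsilon - \left(\operatorname{plim}\ \frac{1}{n}\varepsilon'X\right)\left(\operatorname{plim}\ \frac{1}{n}X'X\right)^{-1}\left(\operatorname{plim}\ \frac{1}{n}X'\varepsilon\right) \]

\[ = \operatorname{plim}\ \frac{1}{n}\varepsilon'\varepsilon - 0'Q^{-1}0 \]

What must be assumed to claim \(\operatorname{plim}\ \frac{1}{n}\varepsilon'\varepsilon = E[\varepsilon_i^2] = \sigma^2\)?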


Asymptotic Distribution

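\[ b = \beta + \left(\frac{1}{n}X'X\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n} x_i\varepsilon_i\right) \]

The limiting behavior of b is the same as that of the statistic that results when the moment matrix is replaced by its limit. We examine the behavior of the modified sum

\[ Q^{-1}\,\frac{1}{n}\sum_{i=1}^{n} x_i\varepsilon_i \]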

Asymptotics

\[ Q^{-1}\,\frac{1}{n}\sum_{i=1}^{n} x_i\varepsilon_i \]

What is the mean of this random vector?

What is its variance?

Do they 'converge' to something? We use this method to find the probability limit.

What is the asymptotic distribution?


Asymptotic Distributions

Finding the asymptotic distribution

b → β in probability. How to describe the distribution?

Has no 'limiting' distribution

Variance → 0; it is O(1/n)

Stabilize the variance? \(\mathrm{Var}[\sqrt{n}\,b] \sim \sigma^2Q^{-1}\) is O(1)

But, \(E[\sqrt{n}\,b] = \sqrt{n}\,\beta\), which diverges

\(\sqrt{n}\,(b - \beta)\) → a random variable with finite mean and variance. (stabilizing transformation)

b apx. β + \(1/\sqrt{n}\) times that random variable


Limiting Distribution

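\[ \sqrt{n}\,(b - \beta) = \sqrt{n}\,(X'X)^{-1}X'\varepsilon = \sqrt{n}\,(X'X/n)^{-1}(X'\varepsilon/n) \]

Limiting behavior is the same as that of

\[ \sqrt{n}\,Q^{-1}(X'\varepsilon/n) \]

Q is a fixed matrix. Behavior depends on the random vector \(\sqrt{n}\,(X'\varepsilon/n)\)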


Limiting Normality

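\[ \sqrt{n}\,\frac{1}{n}X'\varepsilon = \sqrt{n}\,\frac{1}{n}\sum_{i=1}^{n} x_i\varepsilon_i = \sqrt{n}\,\frac{1}{n}\sum_{i=1}^{n} w_i = \sqrt{n}\,\bar{w} \]

\(\bar{w}\) = Mean of a sample. Independent observations.

Mean converges to zero (\(\operatorname{plim}\ (1/n)X'\varepsilon = 0\) already assumed).

\(\sqrt{n}\,\bar{w}\) is a candidate for the Lindeberg-Feller Central Limit Theorem.

Variance of each term \((w_i \mid x_i)\) is \(\sigma^2 x_i x_i'\). Variance of \(\bar{w}\) is \(\sigma^2 (X'X/n)/n\).

\[ \mathrm{Var}[\sqrt{n}\,\bar{w}] \to \sigma^2 Q \]

Based on the CLT, \(\sqrt{n}\,\bar{w} \xrightarrow{d} N[0, \sigma^2 Q]\)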


Asymptotic Distribution

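Limiting distribution of

\[ \sqrt{n}\,(b - \beta) = \left(\frac{X'X}{n}\right)^{-1}\sqrt{n}\,\frac{X'\varepsilon}{n} \]

is the same as that of \(Q^{-1}\sqrt{n}\,\bar{w}\).

\[ \sqrt{n}\,\bar{w} \xrightarrow{d} N[0, \sigma^2 Q] \]

Therefore,

\[ Q^{-1}\sqrt{n}\,\bar{w} \xrightarrow{d} N[0, Q^{-1}(\sigma^2 Q)Q^{-1}] = N[0, \sigma^2 Q^{-1}] \]

Conclude: \(\sqrt{n}\,(b - \beta) \xrightarrow{d} N[0, \sigma^2 Q^{-1}]\)

Approximately: \(b \overset{a}{\sim} N[\beta, (\sigma^2/n)\,Q^{-1}]\)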
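The conclusion above does not use normality of the disturbances, and a quick simulation can illustrate that. The sketch below is an illustration under assumptions that are mine, not the lecture's: one stochastic regressor drawn from uniform(1, 3), no constant term, disturbances drawn from a uniform distribution scaled to have variance 1, and an arbitrary seed. In this scalar case \(Q = E[x^2] = 13/3\), so \(\sqrt{n}(b-\beta)\) should have variance close to \(\sigma^2/Q\) and roughly 95% of the draws should fall within 1.96 limiting standard deviations of zero.

```python
import math
import random

def root_n_draws(n, reps=2000, beta=0.5, seed=2024):
    """Simulate sqrt(n) * (b - beta) for a one-regressor model with uniform disturbances."""
    rng = random.Random(seed)
    half_width = math.sqrt(3.0)  # uniform(-sqrt(3), sqrt(3)) has mean 0 and variance 1
    out = []
    for _ in range(reps):
        sxx = 0.0
        sxe = 0.0
        for _ in range(n):
            x = rng.uniform(1.0, 3.0)
            e = rng.uniform(-half_width, half_width)
            sxx += x * x
            sxe += x * e
        # With y = beta*x + e, the least squares slope is b = beta + sum(x*e)/sum(x*x).
        b = beta + sxe / sxx
        out.append(math.sqrt(n) * (b - beta))
    return out

Q = 13.0 / 3.0  # E[x^2] for x ~ uniform(1, 3)
draws = root_n_draws(200)
var_hat = sum(d * d for d in draws) / len(draws)
sd_limit = math.sqrt(1.0 / Q)  # sigma^2 = 1 in this design
coverage = sum(abs(d) <= 1.96 * sd_limit for d in draws) / len(draws)
print(round(var_hat, 4), round(1.0 / Q, 4), round(coverage, 3))
```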


Asymptotic Properties

Probability Limit and Consistency

Asymptotic Variance

Asymptotic Distribution


Root n Consistency

How 'fast' does b → β?

Asy.Var[b] = \((\sigma^2/n)\,Q^{-1}\) is O(1/n)

Convergence is at the rate of \(1/\sqrt{n}\)

\(\sqrt{n}\,b\) has variance of O(1)

Is there any other kind of convergence?

\(x_1,\ldots,x_n\) = a sample from exponential population; min has variance \(O(1/n^2)\). This is 'n-convergent'

Certain nonparametric estimators have variances that are \(O(1/n^{2/3})\). Less than root n convergent.

Kernel density estimators converge slower than \(\sqrt{n}\)
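The exponential example can be verified in two lines. The derivation below is a sketch that assumes the population is parameterized by a rate \(\theta\) (density \(\theta e^{-\theta x}\)); the symbol \(\theta\) and the notation \(x_{(1)}\) for the sample minimum are mine.

```latex
\[
P\left(x_{(1)} > t\right) = \prod_{i=1}^{n} P(x_i > t) = e^{-n\theta t}
\quad\Longrightarrow\quad
x_{(1)} \sim \text{exponential with rate } n\theta
\]
\[
E\left[x_{(1)}\right] = \frac{1}{n\theta}, \qquad
\mathrm{Var}\left[x_{(1)}\right] = \frac{1}{n^{2}\theta^{2}} = O(1/n^{2})
\]
```

So the stabilizing transformation for the minimum is multiplication by n rather than by \(\sqrt{n}\): \(n\,x_{(1)}\) is exponential with rate \(\theta\) for every n.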


Asymptotic Results

Distribution of b does not depend on normality of ε

Estimator of the asymptotic variance \((\sigma^2/n)\,Q^{-1}\) is \((s^2/n)(X'X/n)^{-1}\). (Degrees of freedom corrections are irrelevant but conventional.)

Slutsky theorem and the delta method apply to functions of b.


Test Statistics

We have established the asymptotic distribution of b. We now turn to the construction of test statistics. In particular, we based tests on the Wald statistic

\[ F[J,n-K] = (1/J)\,(Rb - q)'\,[R\,s^2(X'X)^{-1}R']^{-1}\,(Rb - q) \]

This is the usual test statistic for testing linear hypotheses in the linear regression model, distributed exactly as F if the disturbances are normally distributed. We now obtain some general results that will let us construct test statistics in more general situations.


Full Rank Quadratic Form

A crucial distributional result (exact): If the random vector x has a K-variate normal distribution with mean vector \(\mu\) and covariance matrix \(\Sigma\), then the random variable \(W = (x - \mu)'\Sigma^{-1}(x - \mu)\) has a chi-squared distribution with K degrees of freedom.

(See Section 5.4.2 in the text.)
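The full rank quadratic form result is easy to check numerically. The sketch below assumes an arbitrary bivariate example (K = 2) with a mean vector, covariance matrix, and seed chosen purely for illustration. It draws x from the normal distribution, computes W for each draw, and compares the sample mean of W with K and the share of draws above 5.99 with the 5% tail of a chi-squared variable with 2 degrees of freedom.

```python
import random

def wald_distance_draws(reps=5000, seed=7):
    """Draw x ~ N(mu, Sigma) in two dimensions and return W = (x-mu)' Sigma^{-1} (x-mu)."""
    rng = random.Random(seed)
    mu1, mu2 = 1.0, -2.0
    # Sigma = L L' with L = [[2, 0], [1, 2]], which gives Sigma = [[4, 2], [2, 5]].
    l11, l21, l22 = 2.0, 1.0, 2.0
    s11, s12, s22 = 4.0, 2.0, 5.0
    det = s11 * s22 - s12 * s12
    out = []
    for _ in range(reps):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x1 = mu1 + l11 * z1
        x2 = mu2 + l21 * z1 + l22 * z2
        d1, d2 = x1 - mu1, x2 - mu2
        # Quadratic form using the explicit inverse of the 2x2 matrix Sigma.
        w = (s22 * d1 * d1 - 2.0 * s12 * d1 * d2 + s11 * d2 * d2) / det
        out.append(w)
    return out

w_draws = wald_distance_draws()
mean_w = sum(w_draws) / len(w_draws)
tail_share = sum(w > 5.99 for w in w_draws) / len(w_draws)
print(round(mean_w, 3), round(tail_share, 3))
```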


Building the Wald Statistic-1

Suppose that the same normal distribution assumptions hold, but instead of the parameter matrix \(\Sigma\) we do the computation using a matrix \(S_n\) which has the property \(\operatorname{plim}\ S_n = \Sigma\). The exact chi-squared result no longer holds, but the limiting distribution is the same as if the true \(\Sigma\) were used.


Building the Wald Statistic-2

Suppose the statistic is computed not with an x that has an exact normal distribution, but with an \(x_n\) which has a limiting normal distribution, but whose finite sample distribution might be something else. Our earlier results for functions of random variables give us the result

\[ (x_n - \mu)'\,S_n^{-1}\,(x_n - \mu) \xrightarrow{d} \chi^2[K] \]

(!!!) VVIR! Note that in fact, nothing in this relies on the normal distribution. What we used is consistency of a certain estimator (\(S_n\)) and the central limit theorem for \(x_n\).


General Result for Wald Distance

The Wald distance measure: If \(\operatorname{plim}\ x_n = \mu\), \(x_n\) is asymptotically normally distributed with a mean of \(\mu\) and variance \(\Sigma\), and if \(S_n\) is a consistent estimator of \(\Sigma\), then the Wald statistic, which is a generalized distance measure between \(x_n\) and \(\mu\), converges to a chi-squared variate.

\[ (x_n - \mu)'\,S_n^{-1}\,(x_n - \mu) \xrightarrow{d} \chi^2[K] \]


The F Statistic

An application: (Familiar) Suppose \(b_n\) is the least squares estimator of \(\beta\) based on a sample of n observations. No assumption of normality of the disturbances or about nonstochastic regressors is made. The standard F statistic for testing the hypothesis \(H_0: R\beta - q = 0\) is

\[ F[J, n-K] = \frac{(e^{*\prime}e^{*} - e'e)/J}{e'e/(n-K)} \]

where this is built of two sums of squared residuals: \(e^{*\prime}e^{*}\) from the restricted regression and \(e'e\) from the unrestricted regression.

Without normality, the statistic does not have an exact F distribution. How can we test the hypothesis?


JF is a Wald Statistic

\[ F[J,n-K] = (1/J)\,(Rb_n - q)'\,[R\,s^2(X'X)^{-1}R']^{-1}\,(Rb_n - q) \]

Write \(m = Rb_n - q\). Under the hypothesis, plim m = 0.

\[ \sqrt{n}\,m \xrightarrow{d} N[0, R\,\sigma^2 Q^{-1}R'] \]

Estimate the variance with \(R\,s^2(X'X/n)^{-1}R'\)

Then, \((\sqrt{n}\,m)'\,[\text{Est.Var}(\sqrt{n}\,m)]^{-1}\,(\sqrt{n}\,m)\)

fits exactly into the apparatus developed earlier. If \(\operatorname{plim}\ b_n = \beta\), \(\operatorname{plim}\ s^2 = \sigma^2\), and the other asymptotic results we developed for least squares hold, then

\[ JF[J,n-K] \xrightarrow{d} \chi^2[J] \]
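The link between the quadratic form in \(\sqrt{n}\,m\) and JF is only a cancellation of n. The lines below sketch that step, taking \(R\,s^2(X'X/n)^{-1}R'\) as the estimator of the variance of \(\sqrt{n}\,m\).

```latex
\[
(\sqrt{n}\,m)'\left[R\,s^{2}\left(\tfrac{X'X}{n}\right)^{-1}R'\right]^{-1}(\sqrt{n}\,m)
= n\,m'\left[n\,R\,s^{2}(X'X)^{-1}R'\right]^{-1}m
\]
\[
= m'\left[R\,s^{2}(X'X)^{-1}R'\right]^{-1}m
= J \cdot F[J,\,n-K]
\]
```

The statistic computed from the usual regression output is therefore already the Wald statistic; only the reference distribution changes, from F[J, n-K] to chi-squared with J degrees of freedom.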


Application: Wald Tests

read;nobs=27;nvar=10;names=Year, G , Pg, Y , Pnc , Puc , Ppt , Pd , Pn , Ps $
1960  129.7   .925   6036  1.045   .836   .810   .444   .331   .302
1961  131.3   .914   6113  1.045   .869   .846   .448   .335   .307
1962  137.1   .919   6271  1.041   .948   .874   .457   .338   .314
1963  141.6   .918   6378  1.035   .960   .885   .463   .343   .320
1964  148.8   .914   6727  1.032  1.001   .901   .470   .347   .325
1965  155.9   .949   7027  1.009   .994   .919   .471   .353   .332
1966  164.9   .970   7280   .991   .970   .952   .475   .366   .342
1967  171.0  1.000   7513  1.000  1.000  1.000   .483   .375   .353
1968  183.4  1.014   7728  1.028  1.028  1.046   .501   .390   .368
1969  195.8  1.047   7891  1.044  1.031  1.127   .514   .409   .386
1970  207.4  1.056   8134  1.076  1.043  1.285   .527   .427   .407
1971  218.3  1.063   8322  1.120  1.102  1.377   .547   .442   .431
1972  226.8  1.076   8562  1.110  1.105  1.434   .555   .458   .451
1973  237.9  1.181   9042  1.111  1.176  1.448   .566   .497   .474
1974  225.8  1.599   8867  1.175  1.226  1.480   .604   .572   .513
1975  232.4  1.708   8944  1.276  1.464  1.586   .659   .615   .556
1976  241.7  1.779   9175  1.357  1.679  1.742   .695   .638   .598
1977  249.2  1.882   9381  1.429  1.828  1.824   .727   .671   .648
1978  261.3  1.963   9735  1.538  1.865  1.878   .769   .719   .698
1979  248.9  2.656   9829  1.660  2.010  2.003   .821   .800   .756
1980  226.8  3.691   9722  1.793  2.081  2.516   .892   .894   .839
1981  225.6  4.109   9769  1.902  2.569  3.120   .957   .969   .926
1982  228.8  3.894   9725  1.976  2.964  3.460  1.000  1.000  1.000
1983  239.6  3.764   9930  2.026  3.297  3.626  1.041  1.021  1.062
1984  244.7  3.707  10421  2.085  3.757  3.852  1.038  1.050  1.117
1985  245.8  3.738  10563  2.152  3.797  4.028  1.045  1.075  1.173
1986  269.4  2.921  10780  2.240  3.632  4.264  1.053  1.069  1.224


Data Setup

Create;G=log(G);Pg=log(PG);y=log(y);pnc=log(pnc);puc=log(puc);ppt=log(ppt);pd=log(pd);pn=log(pn);ps=log(ps);t=year-1960$

Namelist;X=one,y,pg,pnc,puc,ppt,pd,pn,ps,t$

Regress;lhs=g;rhs=X$


Regression Model

Based on the gasoline data: The regression equation is

\[ g = \beta_1 + \beta_2 y + \beta_3 pg + \beta_4 pnc + \beta_5 puc + \beta_6 ppt + \beta_7 pd + \beta_8 pn + \beta_9 ps + \beta_{10} t + \varepsilon \]

All variables are logs of the raw variables, so that coefficients are elasticities. The new variable, t, is a time trend, 0,1,…,26, so that \(\beta_{10}\) is the autonomous yearly proportional growth in G.


Least Squares Results

+----------------------------------------------------+
| Ordinary least squares regression                  |
| LHS=G        Mean                 = 5.308616       |
|              Standard deviation   = .2313508       |
| Model size   Parameters           = 10             |
|              Degrees of freedom   = 17             |
| Residuals    Sum of squares       = .003776938     |
|              Standard error of e  = .01490546      |
| Fit          R-squared            = .9972859       |
|              Adjusted R-squared   = .9958490       |
| Model test   F[ 9, 17] (prob)     = 694.07 (.0000) |
|              Chi-sq [ 9] (prob)   = 159.55 (.0000) |
+----------------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |t-ratio |P[|T|>t] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
 Constant    -5.97984140      2.50176400     -2.390   .0287
 Y            1.39438363       .27824509      5.011   .0001   9.03448264
 PG           -.58143705       .06111346     -9.514   .0000    .47679491
 PNC          -.29476979       .25797920     -1.143   .2690    .28100132
 PUC          -.20153591       .07415599     -2.718   .0146    .40523616
 PPT           .08050720       .08706712       .925   .3681    .47071442
 PD           1.50606609       .29745626      5.063   .0001   -.44279509
 PN            .99947385       .27032812      3.697   .0018   -.58532943
 PS           -.81789420       .46197918     -1.770   .0946   -.62272267
 T            -.01251291       .01263559      -.990   .3359   13.0000000


Covariance Matrix


Linear Hypothesis

H0: Aggregate price variables are not significant determinants of gasoline consumption

H0: β7 = β8 = β9 = 0

H1: At least one is nonzero

\[ R = \begin{bmatrix} 0&0&0&0&0&0&1&0&0&0 \\ 0&0&0&0&0&0&0&1&0&0 \\ 0&0&0&0&0&0&0&0&1&0 \end{bmatrix}, \qquad q = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \]

\[ R\beta - q = 0 \]


Wald Test

Matrix ; R = [0,0,0,0,0,0,1,0,0,0/
              0,0,0,0,0,0,0,1,0,0/
              0,0,0,0,0,0,0,0,1,0] ; q = [0 / 0 / 0 ] $
Matrix ; m = R*b - q ; Vm = R*Varb*R' ; List ; Wald = m'<Vm>m $

Matrix WALD has 1 rows and 1 columns.
               1
        +--------------
       1|     66.91506


Restricted Regression

Compare Sums of Squares

Regress; lhs=g;rhs=X; cls:pd=0,pn=0,ps=0$

+----------------------------------------------------+
| Linearly restricted regression                     |
| Ordinary least squares regression                  |
| LHS=G        Mean                 = 5.308616       |
|              Standard deviation   = .2313508       |
| Residuals    Sum of squares       = .01864365      |
|              Standard error of e  = .3053166E-01   |
| Fit          R-squared            = .9866028       |
|              Adjusted R-squared   = .9825836       |
| Model test   F[ 6, 20] (prob)     = 245.47 (.0000) |
| Restrictns.  F[ 3, 17] (prob)     = 22.31 (.0000)  |
| Not using OLS or no constant. Rsqd & F may be < 0. |
| Note, with restrictions imposed, Rsqd may be < 0.  |
+----------------------------------------------------+

Without restrictions: Sum of squares = .00377694, R-squared = .9972859

Note: J(=3)*F = Chi-Squared = 66.915 from before

+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |t-ratio |P[|T|>t] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
 Constant    -4.46504223      4.77789711      -.935   .3631
 Y            1.05851456       .55196204      1.918   .0721   9.03448264
 PG           -.15852276       .05008100     -3.165   .0057    .47679491
 PNC           .21765564       .18336687      1.187   .2516    .28100132
 PUC          -.24298315       .10328032     -2.353   .0309    .40523616
 PPT          -.12617610       .10436708     -1.209   .2432    .47071442
 PD            .000000      ......(Fixed Parameter).......    -.44279509
 PN            .222045D-15  ......(Fixed Parameter).......    -.58532943
 PS           -.444089D-15  ......(Fixed Parameter).......    -.62272267
 T             .02944666       .02126600      1.385   .1841   13.0000000
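The restriction F statistic and its Wald counterpart can be recomputed by hand from the two sums of squares. The short sketch below copies the sums of squared residuals and the dimensions from the printed regression output; the variable names are mine.

```python
# Sums of squared residuals as reported in the restricted and unrestricted outputs.
ssr_restricted = 0.01864365
ssr_unrestricted = 0.003776938
J = 3          # number of restrictions (pd = pn = ps = 0)
n, K = 27, 10  # observations and coefficients in the unrestricted model

# F[J, n-K] = [(e*'e* - e'e)/J] / [e'e/(n-K)]
F = ((ssr_restricted - ssr_unrestricted) / J) / (ssr_unrestricted / (n - K))
wald = J * F   # J*F is the Wald (chi-squared) statistic
print(round(F, 2), round(wald, 2))
```

Up to rounding, F matches the reported F[3, 17] = 22.31 and J*F matches the Wald statistic 66.915 computed earlier from R, b, and the estimated covariance matrix.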


Nonlinear Restrictions

I am interested in testing the hypothesis that certain ratios of elasticities are equal. In particular,

\[ \gamma_1 = \beta_4/\beta_5 - \beta_7/\beta_8 = 0 \]

\[ \gamma_2 = \beta_4/\beta_5 - \beta_9/\beta_8 = 0 \]


Setting Up the Wald Statistic

To do the Wald test, I first need to estimate the asymptotic covariance matrix for the sample estimates of \(\gamma_1\) and \(\gamma_2\). After estimating the regression by least squares, the estimates are

\[ f_1 = b_4/b_5 - b_7/b_8, \qquad f_2 = b_4/b_5 - b_9/b_8. \]

Then, using the delta method, I will estimate the asymptotic variances of \(f_1\) and \(f_2\) and the asymptotic covariance of \(f_1\) and \(f_2\). For this, write \(f_1 = f_1(b)\), that is, a function of the entire 10×1 coefficient vector. Then, I compute the 1×10 derivative vectors, \(d_1 = \partial f_1(b)/\partial b'\) and \(d_2 = \partial f_2(b)/\partial b'\). These vectors are

Element:   1   2   3   4        5            6   7        8           9        10

\[ d_1 = [\,0,\ 0,\ 0,\ 1/b_5,\ -b_4/b_5^2,\ 0,\ -1/b_8,\ b_7/b_8^2,\ 0,\ 0\,] \]

\[ d_2 = [\,0,\ 0,\ 0,\ 1/b_5,\ -b_4/b_5^2,\ 0,\ 0,\ b_9/b_8^2,\ -1/b_8,\ 0\,] \]


Wald Statistics

Then, D = the 2×10 matrix with first row \(d_1\) and second row \(d_2\). The estimator of the asymptotic covariance matrix of \([f_1, f_2]'\) (a 2×1 column vector) is \(V = D\,s^2(X'X)^{-1}D'\). Finally, the Wald test of the hypothesis that \(\gamma = 0\) is carried out by using the chi-squared statistic \(W = (f-0)'V^{-1}(f-0)\). This is a chi-squared statistic with 2 degrees of freedom. The 5% critical value from the chi-squared table is 5.99, so if my sample chi-squared statistic is greater than 5.99, I reject the hypothesis.


Wald Test

In the example below, to make this a little simpler, I computed the 10 variable regression, then extracted the 5×1 subvector of the coefficient vector c = (b4,b5,b7,b8,b9) and its associated part of the 10×10 covariance matrix. Then, I manipulated this smaller set of values.


Application of the Wald Statistic

? Extract subvector and submatrix for the test
matrix;list ; c =b(4:9)]$
matrix;list ; vc=varb(4:9,4:9)
? Compute derivatives
calc ;list ; g11=1/c(2); g12=-c(1)*g11*g11; g13=-1/c(4) ; g14=c(3)*g13*g13 ; g15=0; g21= g11 ; g22=g12 ; g23=0 ; g24=c(5)/c(4)^2 ; g25=-1/c(4)$
? Move derivatives to matrix
matrix;list; dfdc=[g11,g12,g13,g14,g15 / g21,g22,g23,g24,g25]$
? Compute functions, then move to matrix and compute Wald statistic
calc;list ; f1=c(1)/c(2) - c(3)/c(4) ; f2=c(1)/c(2) - c(5)/c(4) $
matrix ; list; f = [f1/f2]$
matrix ; list; vf=dfdc * vc * dfdc' $
matrix ; list ; wald = f' * <vf> * f$


Computations

Matrix C is 5 rows by 1 columns.
            1
 1  -0.2948
 2  -0.2015
 3   1.506
 4   0.9995
 5  -0.8179

Matrix VC is 5 rows by 5 columns.
            1            2            3            4            5
 1   0.6655E-01   0.9479E-02  -0.4070E-01   0.4182E-01  -0.9888E-01
 2   0.9479E-02   0.5499E-02  -0.9155E-02   0.1355E-01  -0.2270E-01
 3  -0.4070E-01  -0.9155E-02   0.8848E-01  -0.2673E-01   0.3145E-01
 4   0.4182E-01   0.1355E-01  -0.2673E-01   0.7308E-01  -0.1038
 5  -0.9888E-01  -0.2270E-01   0.3145E-01  -0.1038       0.2134

G11 = -4.96184   G12 = 7.25755   G13 = -1.00054   G14 = 1.50770     G15 = 0.000000
G21 = -4.96184   G22 = 7.25755   G23 = 0          G24 = -0.818753   G25 = -1.00054

DFDC=[G11,G12,G13,G14,G15/G21,G22,G23,G24,G25]
Matrix DFDC is 2 rows by 5 columns.
          1        2        3        4        5
 1   -4.962    7.258   -1.001    1.508   0.0000
 2   -4.962    7.258   0.0000  -0.8188   -1.001

F1 = -0.442126E-01
F2 = 2.28098
F=[F1/F2]

VF=DFDC*VC*DFDC'
Matrix VF is 2 rows by 2 columns.
          1        2
 1   0.9804   0.7846
 2   0.7846   0.8648

WALD Matrix Result is 1 rows by 1 columns.
          1
 1    22.65
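The last steps of this computation can be retraced outside the original software. The sketch below is an illustration, not the author's program: it copies the rounded values of C, F1, F2 and VF from the printed output, evaluates the analytic derivatives of f1 and f2 with respect to c = (b4, b5, b7, b8, b9), and forms the Wald statistic with an explicit 2×2 inverse. Because the inputs are rounded to the digits shown, the derivatives agree with the printed DFDC only to about two decimal places.

```python
# Values copied from the printed output (rounded to the digits shown there).
c = [-0.2948, -0.2015, 1.506, 0.9995, -0.8179]  # (b4, b5, b7, b8, b9)
f = [-0.0442126, 2.28098]                       # printed F1, F2
vf = [[0.9804, 0.7846], [0.7846, 0.8648]]       # printed VF = DFDC * VC * DFDC'

b4, b5, b7, b8, b9 = c
# Analytic derivatives of f1 = b4/b5 - b7/b8 and f2 = b4/b5 - b9/b8 with respect to c.
d1 = [1 / b5, -b4 / b5**2, -1 / b8, b7 / b8**2, 0.0]
d2 = [1 / b5, -b4 / b5**2, 0.0, b9 / b8**2, -1 / b8]

# Invert the 2x2 matrix VF directly, then form the quadratic form f' VF^{-1} f.
det = vf[0][0] * vf[1][1] - vf[0][1] * vf[1][0]
vf_inv = [[vf[1][1] / det, -vf[0][1] / det],
          [-vf[1][0] / det, vf[0][0] / det]]
wald = sum(f[i] * vf_inv[i][j] * f[j] for i in range(2) for j in range(2))

print([round(v, 3) for v in d1])
print([round(v, 3) for v in d2])
print(round(wald, 2))
```

The statistic is far above the 5% critical value of 5.99 for 2 degrees of freedom, so the hypothesis that both ratio differences are zero is rejected.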


Noninvariance of the Wald Test
