
Lecture Notes - Econometrics: Some Statistics

Paul Söderlind¹

10 February 2011

¹ University of St. Gallen. Address: s/bf-HSG, Rosenbergstrasse 52, CH-9000 St. Gallen, Switzerland. E-mail: [email protected]. Document name: EcmXSta.TeX.


Contents

21 Some Statistics
  21.1 Distributions and Moment Generating Functions
  21.2 Joint and Conditional Distributions and Moments
    21.2.1 Joint and Conditional Distributions
    21.2.2 Moments of Joint Distributions
    21.2.3 Conditional Moments
    21.2.4 Regression Function and Linear Projection
  21.3 Convergence in Probability, Mean Square, and Distribution
  21.4 Laws of Large Numbers and Central Limit Theorems
  21.5 Stationarity
  21.6 Martingales
  21.7 Special Distributions
    21.7.1 The Normal Distribution
    21.7.2 The Lognormal Distribution
    21.7.3 The Chi-Square Distribution
    21.7.4 The t and F Distributions
    21.7.5 The Bernoulli and Binomial Distributions
    21.7.6 The Skew-Normal Distribution
    21.7.7 Generalized Pareto Distribution
  21.8 Inference

    Chapter 21

    Some Statistics

    This section summarizes some useful facts about statistics. Heuristic proofs are given in

    a few cases.

    Some references: Mittelhammer (1996), DeGroot (1986), Greene (2000), Davidson

    (2000), Johnson, Kotz, and Balakrishnan (1994).

    21.1 Distributions and Moment Generating Functions

Most of the stochastic variables we encounter in econometrics are continuous. For a continuous random variable $X$, the range is uncountably infinite and the probability that $X \le x$ is $\Pr(X \le x) = \int_{-\infty}^{x} f(q)\,dq$, where $f(q)$ is the continuous probability density function of $X$. Note that $X$ is a random variable, $x$ is a number (1.23 or so), and $q$ is just a dummy argument in the integral.

Fact 21.1 (cdf and pdf) The cumulative distribution function of the random variable $X$ is $F(x) = \Pr(X \le x) = \int_{-\infty}^{x} f(q)\,dq$. Clearly, $f(x) = dF(x)/dx$. Note that $x$ is just a number, not a random variable.

Fact 21.2 (Moment generating function of $X$) The moment generating function of the random variable $X$ is $\mathrm{mgf}(t) = \mathrm{E}\, e^{tX}$. The $r$th moment is the $r$th derivative of $\mathrm{mgf}(t)$ evaluated at $t = 0$: $\mathrm{E}\, X^r = d^r \mathrm{mgf}(0)/dt^r$. If a moment generating function exists (that is, $\mathrm{E}\, e^{tX} < \infty$ for $t$ in some small interval $(-h, h)$), then it is unique.

Fact 21.3 (Moment generating function of a function of $X$) If $X$ has the moment generating function $\mathrm{mgf}_X(t) = \mathrm{E}\, e^{tX}$, then $g(X)$ has the moment generating function $\mathrm{E}\, e^{tg(X)}$.


The affine function $a + bX$ ($a$ and $b$ are constants) has the moment generating function $\mathrm{mgf}_{g(X)}(t) = \mathrm{E}\, e^{t(a + bX)} = e^{ta}\, \mathrm{E}\, e^{tbX} = e^{ta}\, \mathrm{mgf}_X(bt)$. By setting $b = 1$ and $a = -\mathrm{E}\, X$ we obtain a mgf for central moments (variance, skewness, kurtosis, etc), $\mathrm{mgf}_{(X - \mathrm{E} X)}(t) = e^{-t\, \mathrm{E} X}\, \mathrm{mgf}_X(t)$.

Example 21.4 When $X \sim N(\mu, \sigma^2)$, then $\mathrm{mgf}_X(t) = \exp(\mu t + \sigma^2 t^2/2)$. Let $Z = (X - \mu)/\sigma$, so $a = -\mu/\sigma$ and $b = 1/\sigma$. This gives $\mathrm{mgf}_Z(t) = \exp(-\mu t/\sigma)\, \mathrm{mgf}_X(t/\sigma) = \exp(t^2/2)$. (Of course, this result can also be obtained by directly setting $\mu = 0$ and $\sigma = 1$ in $\mathrm{mgf}_X$.)

Fact 21.5 (Change of variable, univariate case, monotonic function) Suppose $X$ has the probability density function $f_X(c)$ and cumulative distribution function $F_X(c)$. Let $Y = g(X)$ be a continuously differentiable function with $dg/dX > 0$ (so $g(X)$ is increasing) for all $c$ such that $f_X(c) > 0$. Then the cdf of $Y$ is
$$F_Y(c) = \Pr(Y \le c) = \Pr[g(X) \le c] = \Pr[X \le g^{-1}(c)] = F_X[g^{-1}(c)],$$
where $g^{-1}$ is the inverse function of $g$ such that $g^{-1}(Y) = X$. We also have that the pdf of $Y$ is
$$f_Y(c) = f_X[g^{-1}(c)] \left| \frac{dg^{-1}(c)}{dc} \right|.$$
If, instead, $dg/dX < 0$ (so $g(X)$ is decreasing), then we instead have the cdf of $Y$
$$F_Y(c) = \Pr(Y \le c) = \Pr[g(X) \le c] = \Pr[X \ge g^{-1}(c)] = 1 - F_X[g^{-1}(c)],$$
but the same expression for the pdf.

Proof. Differentiate $F_Y(c)$, that is, $F_X[g^{-1}(c)]$, with respect to $c$.

Example 21.6 Let $X \sim U(0,1)$ and $Y = g(X) = F^{-1}(X)$, where $F(c)$ is a strictly increasing cdf. We then get
$$f_Y(c) = \frac{dF(c)}{dc}.$$
The variable $Y$ then has the pdf $dF(c)/dc$ and the cdf $F(c)$. This shows how to generate random numbers from the $F(\cdot)$ distribution: draw $X \sim U(0,1)$ and calculate $Y = F^{-1}(X)$.
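As a numerical illustration of this inverse-cdf method, here is a minimal Python sketch (not part of the original notes; it assumes NumPy and SciPy are available). It draws $X \sim U(0,1)$ and maps the draws through $F^{-1}$, here the inverse cdf (`ppf`) of the standard normal distribution:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(seed=42)

# Draw X ~ U(0,1) and calculate Y = F^{-1}(X); with F = Phi (the standard
# normal cdf), Y should then be N(0,1).
X = rng.uniform(0.0, 1.0, size=100_000)
Y = norm.ppf(X)

print(Y.mean(), Y.std())  # close to 0 and 1
```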


Example 21.7 Let $Y = \exp(X)$, so the inverse function is $X = \ln Y$ with derivative $1/Y$. Then, $f_Y(c) = f_X(\ln c)/c$. Conversely, let $Y = \ln X$, so the inverse function is $X = \exp(Y)$ with derivative $\exp(Y)$. Then, $f_Y(c) = f_X[\exp(c)] \exp(c)$.

Example 21.8 Let $X \sim U(0,2)$, so the pdf and cdf of $X$ are $1/2$ and $c/2$ respectively. Now, let $Y = g(X) = -X$, which gives the pdf and cdf of $Y$ as $1/2$ and $1 + c/2$ respectively. The latter is clearly the same as $1 - F_X[g^{-1}(c)] = 1 - (-c/2)$.

Fact 21.9 (Distribution of a truncated random variable) Let the probability distribution and density functions of $X$ be $F(x)$ and $f(x)$, respectively. The corresponding functions, conditional on $a < X \le b$, are $[F(x) - F(a)]/[F(b) - F(a)]$ and $f(x)/[F(b) - F(a)]$. Clearly, outside $a < X \le b$ the pdf is zero, while the cdf is zero below $a$ and unity above $b$.

21.2 Joint and Conditional Distributions and Moments

21.2.1 Joint and Conditional Distributions

Fact 21.10 (Joint and marginal cdf) Let $X$ and $Y$ be (possibly vectors of) random variables and let $x$ and $y$ be two numbers. The joint cumulative distribution function of $X$ and $Y$ is $H(x,y) = \Pr(X \le x, Y \le y) = \int_{-\infty}^{x} \int_{-\infty}^{y} h(q_x, q_y)\,dq_y\,dq_x$, where $h(x,y) = \partial^2 H(x,y)/\partial x \partial y$ is the joint probability density function.

Fact 21.11 (Joint and marginal pdf) The marginal cdf of $X$ is obtained by integrating out $Y$: $F(x) = \Pr(X \le x, Y \text{ anything}) = \int_{-\infty}^{x} \left[ \int_{-\infty}^{\infty} h(q_x, q_y)\,dq_y \right] dq_x$. This shows that the marginal pdf of $X$ is $f(x) = dF(x)/dx = \int_{-\infty}^{\infty} h(q_x, q_y)\,dq_y$.

Fact 21.12 (Conditional distribution) The pdf of $Y$ conditional on $X = x$ (a number) is $g(y|x) = h(x,y)/f(x)$. This is clearly proportional to the joint pdf (at the given value $x$).

Fact 21.13 (Change of variable, multivariate case, monotonic function) The result in Fact 21.5 still holds if $X$ and $Y$ are both $n \times 1$ vectors, but the derivative is $\partial g^{-1}(c)/\partial c'$, which is an $n \times n$ matrix. If $g_i^{-1}$ is the $i$th function in the vector $g^{-1}$, then
$$\frac{\partial g^{-1}(c)}{\partial c'} = \begin{bmatrix} \dfrac{\partial g_1^{-1}(c)}{\partial c_1} & \cdots & \dfrac{\partial g_1^{-1}(c)}{\partial c_n} \\ \vdots & & \vdots \\ \dfrac{\partial g_n^{-1}(c)}{\partial c_1} & \cdots & \dfrac{\partial g_n^{-1}(c)}{\partial c_n} \end{bmatrix}.$$

21.2.2 Moments of Joint Distributions

Fact 21.14 (Cauchy–Schwarz) $(\mathrm{E}\, XY)^2 \le \mathrm{E}(X^2)\, \mathrm{E}(Y^2)$.

Proof. $0 \le \mathrm{E}[(aX + Y)^2] = a^2 \mathrm{E}(X^2) + 2a\, \mathrm{E}(XY) + \mathrm{E}(Y^2)$. Set $a = -\mathrm{E}(XY)/\mathrm{E}(X^2)$ to get
$$0 \le -\frac{[\mathrm{E}(XY)]^2}{\mathrm{E}(X^2)} + \mathrm{E}(Y^2), \text{ that is, } \frac{[\mathrm{E}(XY)]^2}{\mathrm{E}(X^2)} \le \mathrm{E}(Y^2).$$

Fact 21.15 ($-1 \le \mathrm{Corr}(X,Y) \le 1$) Let $Y$ and $X$ in Fact 21.14 be zero mean variables (or variables minus their means). We then get $[\mathrm{Cov}(X,Y)]^2 \le \mathrm{Var}(X)\, \mathrm{Var}(Y)$, that is, $-1 \le \mathrm{Cov}(X,Y)/[\mathrm{Std}(X)\, \mathrm{Std}(Y)] \le 1$.

21.2.3 Conditional Moments

Fact 21.16 (Conditional moments) $\mathrm{E}(Y|x) = \int y\, g(y|x)\,dy$ and $\mathrm{Var}(Y|x) = \int [y - \mathrm{E}(Y|x)]^2\, g(y|x)\,dy$.

Fact 21.17 (Conditional moments as random variables) Before we observe $X$, the conditional moments are random variables, since $X$ is. We denote these random variables by $\mathrm{E}(Y|X)$, $\mathrm{Var}(Y|X)$, etc.

Fact 21.18 (Law of iterated expectations) $\mathrm{E}\, Y = \mathrm{E}[\mathrm{E}(Y|X)]$. Note that $\mathrm{E}(Y|X)$ is a random variable since it is a function of the random variable $X$. It is not a function of $Y$, however. The outer expectation is therefore an expectation with respect to $X$ only.

Proof. $\mathrm{E}[\mathrm{E}(Y|X)] = \int \left[ \int y\, g(y|x)\,dy \right] f(x)\,dx = \int\!\!\int y\, g(y|x) f(x)\,dy\,dx = \int\!\!\int y\, h(y,x)\,dy\,dx = \mathrm{E}\, Y$.

Fact 21.19 (Conditional vs. unconditional variance) $\mathrm{Var}(Y) = \mathrm{Var}[\mathrm{E}(Y|X)] + \mathrm{E}[\mathrm{Var}(Y|X)]$.
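Fact 21.19 lends itself to a quick Monte Carlo check. The sketch below (not from the notes; the model is a made-up illustration) uses $Y|X \sim N(X, 1 + X^2)$ with $X \sim N(0,1)$, so $\mathrm{E}(Y|X) = X$ and $\mathrm{Var}(Y|X) = 1 + X^2$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Y | X ~ N(X, 1 + X^2) with X ~ N(0,1): E(Y|X) = X, Var(Y|X) = 1 + X^2.
X = rng.standard_normal(n)
Y = X + np.sqrt(1.0 + X**2) * rng.standard_normal(n)

lhs = Y.var()                        # Var(Y)
rhs = X.var() + np.mean(1.0 + X**2)  # Var(E(Y|X)) + E Var(Y|X)
print(lhs, rhs)                      # both close to 3
```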


Fact 21.20 (Properties of conditional expectations) (a) $Y = \mathrm{E}(Y|X) + U$, where $U$ and $\mathrm{E}(Y|X)$ are uncorrelated: $\mathrm{Cov}(X, Y) = \mathrm{Cov}[X, \mathrm{E}(Y|X) + U] = \mathrm{Cov}[X, \mathrm{E}(Y|X)]$. It follows that (b) $\mathrm{Cov}[Y, \mathrm{E}(Y|X)] = \mathrm{Var}[\mathrm{E}(Y|X)]$; and (c) $\mathrm{Var}(Y) = \mathrm{Var}[\mathrm{E}(Y|X)] + \mathrm{Var}(U)$. Property (c) is the same as Fact 21.19, where $\mathrm{Var}(U) = \mathrm{E}[\mathrm{Var}(Y|X)]$.

Proof. $\mathrm{Cov}(X,Y) = \int\!\!\int x(y - \mathrm{E}\, y)\, h(x,y)\,dy\,dx = \int x \left[ \int (y - \mathrm{E}\, y)\, g(y|x)\,dy \right] f(x)\,dx$, but the term in brackets is $\mathrm{E}(Y|X) - \mathrm{E}\, Y$.

Fact 21.21 (Conditional expectation and unconditional orthogonality) $\mathrm{E}(Y|Z) = 0 \Rightarrow \mathrm{E}\, YZ = 0$.

Proof. Note from Fact 21.20 that $\mathrm{E}(Y|X) = 0$ implies $\mathrm{Cov}(X,Y) = 0$, so $\mathrm{E}\, XY = \mathrm{E}\, X\, \mathrm{E}\, Y$ (recall that $\mathrm{Cov}(X,Y) = \mathrm{E}\, XY - \mathrm{E}\, X\, \mathrm{E}\, Y$). Note also that $\mathrm{E}(Y|X) = 0$ implies that $\mathrm{E}\, Y = 0$ (by iterated expectations). We therefore get
$$\mathrm{E}(Y|X) = 0 \Rightarrow \begin{bmatrix} \mathrm{Cov}(X,Y) = 0 \\ \mathrm{E}\, Y = 0 \end{bmatrix} \Rightarrow \mathrm{E}\, YX = 0.$$

21.2.4 Regression Function and Linear Projection

Fact 21.22 (Regression function) Suppose we use information in some variables $X$ to predict $Y$. The choice of the forecasting function $\hat Y = k(X) = \mathrm{E}(Y|X)$ minimizes $\mathrm{E}[Y - k(X)]^2$. The conditional expectation $\mathrm{E}(Y|X)$ is also called the regression function of $Y$ on $X$. See Facts 21.20 and 21.21 for some properties of conditional expectations.

Fact 21.23 (Linear projection) Suppose we want to forecast the scalar $Y$ using the $k \times 1$ vector $X$ and that we restrict the forecasting rule to be linear $\hat Y = X'\beta$. This rule is a linear projection, denoted $P(Y|X)$, if $\beta$ satisfies the orthogonality conditions $\mathrm{E}[X(Y - X'\beta)] = 0_{k \times 1}$, that is, if $\beta = (\mathrm{E}\, XX')^{-1} \mathrm{E}\, XY$. A linear projection minimizes $\mathrm{E}[Y - k(X)]^2$ within the class of linear $k(X)$ functions.

Fact 21.24 (Properties of linear projections) (a) The orthogonality conditions in Fact 21.23 mean that
$$Y = X'\beta + \varepsilon,$$


where $\mathrm{E}(X\varepsilon) = 0_{k \times 1}$. This implies that $\mathrm{E}[P(Y|X)\varepsilon] = 0$, so the forecast and forecast error are orthogonal. (b) The orthogonality conditions also imply that $\mathrm{E}\, XY = \mathrm{E}[X P(Y|X)]$. (c) When $X$ contains a constant, so $\mathrm{E}\, \varepsilon = 0$, then (a) and (b) carry over to covariances: $\mathrm{Cov}[P(Y|X), \varepsilon] = 0$ and $\mathrm{Cov}(X, Y) = \mathrm{Cov}[X, P(Y|X)]$.

Example 21.25 ($P(1|X)$) When $Y_t = 1$, then $\beta = (\mathrm{E}\, XX')^{-1} \mathrm{E}\, X$. For instance, suppose $X = [x_{1t}, x_{2t}]'$. Then
$$\beta = \begin{bmatrix} \mathrm{E}\, x_{1t}^2 & \mathrm{E}\, x_{1t}x_{2t} \\ \mathrm{E}\, x_{2t}x_{1t} & \mathrm{E}\, x_{2t}^2 \end{bmatrix}^{-1} \begin{bmatrix} \mathrm{E}\, x_{1t} \\ \mathrm{E}\, x_{2t} \end{bmatrix}.$$
If $x_{1t} = 1$ in all periods, then this simplifies to $\beta = [1, 0]'$.

Remark 21.26 Some authors prefer to take the transpose of the forecasting rule, that is, to use $\hat Y = \beta'X$. Clearly, since $XX'$ is symmetric, we get $\beta' = \mathrm{E}(YX')(\mathrm{E}\, XX')^{-1}$.

Fact 21.27 (Linear projection with a constant in $X$) If $X$ contains a constant, then $P(aY + b|X) = aP(Y|X) + b$.

Fact 21.28 (Linear projection versus regression function) Both the linear projection and the regression function (see Fact 21.22) minimize $\mathrm{E}[Y - k(X)]^2$, but the linear projection imposes the restriction that $k(X)$ is linear, whereas the regression function does not impose any restrictions. In the special case when $Y$ and $X$ have a joint normal distribution, the linear projection is the regression function.

Fact 21.29 (Linear projection and OLS) The linear projection is about population moments, but OLS is its sample analogue.
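To illustrate Fact 21.29, the following Python sketch (my own illustration, not from the notes) simulates $Y = 1 + 2x + \varepsilon$ and computes the OLS estimate, the sample analogue of $\beta = (\mathrm{E}\, XX')^{-1} \mathrm{E}\, XY$:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 100_000

# Population: Y = 1 + 2x + eps with X = (1, x)', so P(Y|X) has beta = (1, 2)'.
x = rng.standard_normal(T)
Y = 1.0 + 2.0 * x + rng.standard_normal(T)
X = np.column_stack([np.ones(T), x])  # T x 2 matrix of regressors

# OLS: the sample analogue of beta = (E XX')^{-1} E XY.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_hat)                       # close to [1, 2]
```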

21.3 Convergence in Probability, Mean Square, and Distribution

Fact 21.30 (Convergence in probability) The sequence of random variables $\{X_T\}$ converges in probability to the random variable $X$ if (and only if) for all $\varepsilon > 0$
$$\lim_{T \to \infty} \Pr(|X_T - X| < \varepsilon) = 1.$$
We denote this $X_T \overset{p}{\to} X$ or $\mathrm{plim}\, X_T = X$ ($X$ is the probability limit of $X_T$). Note: (a) $X$ can be a constant instead of a random variable; (b) if $X_T$ and $X$ are matrices, then $X_T \overset{p}{\to} X$ if the previous condition holds for every element in the matrices.


Example 21.31 Suppose $X_T = 0$ with probability $(T-1)/T$ and $X_T = T$ with probability $1/T$. Note that $\lim_{T \to \infty} \Pr(|X_T - 0| = 0) = \lim_{T \to \infty} (T-1)/T = 1$, so $\lim_{T \to \infty} \Pr(|X_T - 0| < \varepsilon) = 1$ for any $\varepsilon > 0$. Note also that $\mathrm{E}\, X_T = 0 \cdot (T-1)/T + T \cdot 1/T = 1$, so $X_T$ is biased.

Fact 21.32 (Convergence in mean square) The sequence of random variables $\{X_T\}$ converges in mean square to the random variable $X$ if (and only if)
$$\lim_{T \to \infty} \mathrm{E}(X_T - X)^2 = 0.$$
We denote this $X_T \overset{m}{\to} X$. Note: (a) $X$ can be a constant instead of a random variable; (b) if $X_T$ and $X$ are matrices, then $X_T \overset{m}{\to} X$ if the previous condition holds for every element in the matrices.

[Figure 21.1: Sampling distributions. Left panel: the distribution of the sample average; right panel: the distribution of $\sqrt{T} \times$ the sample average, for $T = 5, 25, 100$. The sample average is of $z_t - 1$, where $z_t$ has a $\chi^2(1)$ distribution.]

Fact 21.33 (Convergence in mean square to a constant) If $X$ in Fact 21.32 is a constant, then $X_T \overset{m}{\to} X$ if (and only if)
$$\lim_{T \to \infty} (\mathrm{E}\, X_T - X)^2 = 0 \text{ and } \lim_{T \to \infty} \mathrm{Var}(X_T) = 0.$$
This means that both the variance and the squared bias go to zero as $T \to \infty$.


Proof. $\mathrm{E}(X_T - X)^2 = \mathrm{E}\, X_T^2 - 2X\, \mathrm{E}\, X_T + X^2$. Add and subtract $(\mathrm{E}\, X_T)^2$ and recall that $\mathrm{Var}(X_T) = \mathrm{E}\, X_T^2 - (\mathrm{E}\, X_T)^2$. This gives $\mathrm{E}(X_T - X)^2 = \mathrm{Var}(X_T) - 2X\, \mathrm{E}\, X_T + X^2 + (\mathrm{E}\, X_T)^2 = \mathrm{Var}(X_T) + (\mathrm{E}\, X_T - X)^2$.

Fact 21.34 (Convergence in distribution) Consider the sequence of random variables $\{X_T\}$ with the associated sequence of cumulative distribution functions $\{F_T\}$. If $\lim_{T \to \infty} F_T = F$ (at all points where $F$ is continuous), then $F$ is the limiting cdf of $X_T$. If there is a random variable $X$ with cdf $F$, then $X_T$ converges in distribution to $X$: $X_T \overset{d}{\to} X$. Instead of comparing cdfs, the comparison can equally well be made in terms of the probability density functions or the moment generating functions.

Fact 21.35 (Relation between the different types of convergence) We have $X_T \overset{m}{\to} X \Rightarrow X_T \overset{p}{\to} X \Rightarrow X_T \overset{d}{\to} X$. The reverse implications are not generally true.

Example 21.36 Consider the random variable in Example 21.31. The expected value is $\mathrm{E}\, X_T = 0 \cdot (T-1)/T + T \cdot 1/T = 1$. This means that the squared bias does not go to zero, so $X_T$ does not converge in mean square to zero.

Fact 21.37 (Slutsky's theorem) If $\{X_T\}$ is a sequence of random matrices such that $\mathrm{plim}\, X_T = X$ and $g(X_T)$ is a continuous function, then $\mathrm{plim}\, g(X_T) = g(X)$.

Fact 21.38 (Continuous mapping theorem) Let the sequences of random matrices $\{X_T\}$ and $\{Y_T\}$, and the non-random matrices $\{a_T\}$ be such that $X_T \overset{d}{\to} X$, $Y_T \overset{p}{\to} Y$, and $a_T \to a$ (a traditional limit). Let $g(X_T, Y_T, a_T)$ be a continuous function. Then $g(X_T, Y_T, a_T) \overset{d}{\to} g(X, Y, a)$.

21.4 Laws of Large Numbers and Central Limit Theorems

Fact 21.39 (Khinchine's theorem) Let $X_t$ be independently and identically distributed (iid) with $\mathrm{E}\, X_t = \mu < \infty$. Then $\sum_{t=1}^{T} X_t/T \overset{p}{\to} \mu$.

Fact 21.40 (Chebyshev's theorem) If $\mathrm{E}\, X_t = 0$ and $\lim_{T \to \infty} \mathrm{Var}\left( \sum_{t=1}^{T} X_t/T \right) = 0$, then $\sum_{t=1}^{T} X_t/T \overset{p}{\to} 0$.

Fact 21.41 (The Lindeberg–Lévy theorem) Let $X_t$ be independently and identically distributed (iid) with $\mathrm{E}\, X_t = 0$ and $\mathrm{Var}(X_t) < \infty$. Then $\frac{1}{\sqrt{T}} \sum_{t=1}^{T} X_t/\sigma \overset{d}{\to} N(0,1)$, where $\sigma^2 = \mathrm{Var}(X_t)$.


21.5 Stationarity

Fact 21.42 (Covariance stationarity) $X_t$ is covariance stationary if
$$\mathrm{E}\, X_t = \mu \text{ is independent of } t, \quad \mathrm{Cov}(X_{t-s}, X_t) = \gamma_s \text{ depends only on } s,$$
and both $\mu$ and $\gamma_s$ are finite.

Fact 21.43 (Strict stationarity) $X_t$ is strictly stationary if, for all $s$, the joint distribution of $X_t, X_{t+1}, \ldots, X_{t+s}$ does not depend on $t$.

Fact 21.44 (Strict stationarity versus covariance stationarity) In general, strict stationarity does not imply covariance stationarity or vice versa. However, strict stationarity with finite first two moments implies covariance stationarity.

21.6 Martingales

Fact 21.45 (Martingale) Let $\Omega_t$ be a set of information in $t$, for instance $Y_t, Y_{t-1}, \ldots$ If $\mathrm{E}\, |Y_t| < \infty$ and $\mathrm{E}(Y_{t+1}|\Omega_t) = Y_t$, then $Y_t$ is a martingale.

Fact 21.46 (Martingale difference) If $Y_t$ is a martingale, then $X_t = Y_t - Y_{t-1}$ is a martingale difference: $X_t$ has $\mathrm{E}\, |X_t| < \infty$ and $\mathrm{E}(X_{t+1}|\Omega_t) = 0$.

Fact 21.47 (Innovations as a martingale difference sequence) The forecast error $X_{t+1} = Y_{t+1} - \mathrm{E}(Y_{t+1}|\Omega_t)$ is a martingale difference.

Fact 21.48 (Properties of martingales) (a) If $Y_t$ is a martingale, then $\mathrm{E}(Y_{t+s}|\Omega_t) = Y_t$ for $s \ge 1$. (b) If $X_t$ is a martingale difference, then $\mathrm{E}(X_{t+s}|\Omega_t) = 0$ for $s \ge 1$.

Proof. (a) Note that $\mathrm{E}(Y_{t+2}|\Omega_{t+1}) = Y_{t+1}$ and take expectations conditional on $\Omega_t$: $\mathrm{E}[\mathrm{E}(Y_{t+2}|\Omega_{t+1})|\Omega_t] = \mathrm{E}(Y_{t+1}|\Omega_t) = Y_t$. By iterated expectations, the first term equals $\mathrm{E}(Y_{t+2}|\Omega_t)$. Repeat this for $t+3$, $t+4$, etc. (b) Essentially the same proof.

Fact 21.49 (Properties of martingale differences) If $X_t$ is a martingale difference and $g_{t-1}$ is a function of $\Omega_{t-1}$, then $X_t g_{t-1}$ is also a martingale difference.


Proof. $\mathrm{E}(X_{t+1} g_t|\Omega_t) = \mathrm{E}(X_{t+1}|\Omega_t)\, g_t = 0$, since $g_t$ is a function of $\Omega_t$.

Fact 21.50 (Martingales, serial independence, and no autocorrelation) (a) $X_t$ is serially uncorrelated if $\mathrm{Cov}(X_t, X_{t+s}) = 0$ for all $s \ne 0$. This means that a linear projection of $X_{t+s}$ on $X_t, X_{t-1}, \ldots$ is a constant, so it cannot help predict $X_{t+s}$. (b) $X_t$ is a martingale difference with respect to its history if $\mathrm{E}(X_{t+s}|X_t, X_{t-1}, \ldots) = 0$ for all $s \ge 1$. This means that no function of $X_t, X_{t-1}, \ldots$ can help predict $X_{t+s}$. (c) $X_t$ is serially independent if $\mathrm{pdf}(X_{t+s}|X_t, X_{t-1}, \ldots) = \mathrm{pdf}(X_{t+s})$. This means that no function of $X_t, X_{t-1}, \ldots$ can help predict any function of $X_{t+s}$.

Fact 21.51 (WLN for martingale difference) If $X_t$ is a martingale difference, then $\mathrm{plim}\, \sum_{t=1}^{T} X_t/T = 0$ if either (a) $X_t$ is strictly stationary and $\mathrm{E}\, |X_t| < \infty$, or (b) $\mathrm{E}\, |X_t|^{1+\delta} < \infty$ for $\delta > 0$ and all $t$. (See Davidson (2000) 6.2.)

Fact 21.52 (CLT for martingale difference) Let $X_t$ be a martingale difference. If $\mathrm{plim}\, \sum_{t=1}^{T} (X_t^2 - \mathrm{E}\, X_t^2)/T = 0$ and either

(a) $X_t$ is strictly stationary, or

(b) $\max_{t \in [1,T]} \dfrac{(\mathrm{E}\, |X_t|^{2+\delta})^{1/(2+\delta)}}{\sum_{t=1}^{T} \mathrm{E}\, X_t^2/T} < \infty$ for $\delta > 0$ and all $T > 1$,

then $\left( \sum_{t=1}^{T} X_t/\sqrt{T} \right) \Big/ \left( \sum_{t=1}^{T} \mathrm{E}\, X_t^2/T \right)^{1/2} \overset{d}{\to} N(0,1)$. (See Davidson (2000) 6.2.)
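As an illustration of Facts 21.51 and 21.52 (my own numerical sketch, not part of the notes), take $X_t = \varepsilon_t \varepsilon_{t-1}$ with $\varepsilon_t$ iid $N(0,1)$: a martingale difference that is not serially independent, with $\mathrm{E}\, X_t^2 = 1$. The sample average should converge to zero and the scaled sum should be approximately $N(0,1)$:

```python
import numpy as np

rng = np.random.default_rng(3)
n_sim, T = 10_000, 500

# X_t = eps_t * eps_{t-1}, eps_t iid N(0,1): E(X_{t+1}|Omega_t) = 0, E X_t^2 = 1.
eps = rng.standard_normal((n_sim, T + 1))
X = eps[:, 1:] * eps[:, :-1]

avg = X.mean(axis=1)               # WLN: close to 0 (Fact 21.51)
stat = X.sum(axis=1) / np.sqrt(T)  # CLT: approximately N(0,1) (Fact 21.52)
print(avg.mean(), stat.std())      # close to 0 and 1
```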

21.7 Special Distributions

21.7.1 The Normal Distribution

Fact 21.53 (Univariate normal distribution) If $X \sim N(\mu, \sigma^2)$, then the probability density function of $X$, $f(x)$, is
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2}\left( \frac{x - \mu}{\sigma} \right)^2}.$$
The moment generating function is $\mathrm{mgf}_X(t) = \exp(\mu t + \sigma^2 t^2/2)$ and the moment generating function around the mean is $\mathrm{mgf}_{(X - \mu)}(t) = \exp(\sigma^2 t^2/2)$.

Example 21.54 The first few moments around the mean are $\mathrm{E}(X - \mu) = 0$, $\mathrm{E}(X - \mu)^2 = \sigma^2$, $\mathrm{E}(X - \mu)^3 = 0$ (all odd moments are zero), $\mathrm{E}(X - \mu)^4 = 3\sigma^4$, $\mathrm{E}(X - \mu)^6 = 15\sigma^6$, and $\mathrm{E}(X - \mu)^8 = 105\sigma^8$.


[Figure 21.2: Normal distributions. Panels: the pdf of $N(0,1)$; the pdf of a bivariate normal distribution with correlation 0.1; and the pdf of a bivariate normal distribution with correlation 0.8.]

Fact 21.55 (Standard normal distribution) If $X \sim N(0,1)$, then the moment generating function is $\mathrm{mgf}_X(t) = \exp(t^2/2)$. Since the mean is zero, $\mathrm{mgf}(t)$ gives central moments. The first few are $\mathrm{E}\, X = 0$, $\mathrm{E}\, X^2 = 1$, $\mathrm{E}\, X^3 = 0$ (all odd moments are zero), and $\mathrm{E}\, X^4 = 3$. The distribution function is $\Pr(X \le a) = \Phi(a) = 1/2 + \frac{1}{2}\, \mathrm{erf}(a/\sqrt{2})$, where $\mathrm{erf}(\cdot)$ is the error function, $\mathrm{erf}(z) = \frac{2}{\sqrt{\pi}} \int_0^z \exp(-t^2)\,dt$. The complementary error function is $\mathrm{erfc}(z) = 1 - \mathrm{erf}(z)$. Since the distribution is symmetric around zero, we have $\Phi(-a) = \Pr(X \le -a) = \Pr(X \ge a) = 1 - \Phi(a)$. Clearly, $1 - \Phi(a) = \Phi(-a) = \frac{1}{2}\, \mathrm{erfc}(a/\sqrt{2})$.

Fact 21.56 (Multivariate normal distribution) If $X$ is an $n \times 1$ vector of random variables with a multivariate normal distribution, with mean vector $\mu$ and variance-covariance matrix $\Sigma$, $N(\mu, \Sigma)$, then the density function is
$$f(x) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left[ -\frac{1}{2}(x - \mu)' \Sigma^{-1} (x - \mu) \right].$$


[Figure 21.3: Density functions of normal distributions. Panels: the pdf of a bivariate normal distribution with correlation 0.1, together with the conditional pdf of $y$ for $x = 0.8$ and $x = 0$; and the same for a bivariate normal distribution with correlation 0.8.]

Fact 21.57 (Conditional normal distribution) Suppose $Z_{m \times 1}$ and $X_{n \times 1}$ are jointly normally distributed
$$\begin{bmatrix} Z \\ X \end{bmatrix} \sim N\left( \begin{bmatrix} \mu_Z \\ \mu_X \end{bmatrix}, \begin{bmatrix} \Sigma_{ZZ} & \Sigma_{ZX} \\ \Sigma_{XZ} & \Sigma_{XX} \end{bmatrix} \right).$$
The distribution of the random variable $Z$ conditional on $X = x$ (a number) is also normal with mean
$$\mathrm{E}(Z|x) = \mu_Z + \Sigma_{ZX} \Sigma_{XX}^{-1} (x - \mu_X),$$
and variance (the variance of $Z$ conditional on $X = x$, that is, the variance of the prediction error $Z - \mathrm{E}(Z|x)$)
$$\mathrm{Var}(Z|x) = \Sigma_{ZZ} - \Sigma_{ZX} \Sigma_{XX}^{-1} \Sigma_{XZ}.$$
Note that the conditional variance is constant in the multivariate normal distribution ($\mathrm{Var}(Z|X)$ is not a random variable in this case). Note also that $\mathrm{Var}(Z|x)$ is less than $\mathrm{Var}(Z) = \Sigma_{ZZ}$ (in a matrix sense) if $X$ contains any relevant information (so $\Sigma_{ZX}$ is not zero, that is, $\mathrm{E}(Z|x)$ is not the same for all $x$).

Fact 21.58 (Stein's lemma) If $Y$ has a normal distribution and $h(\cdot)$ is a differentiable function such that $\mathrm{E}\, |h'(Y)| < \infty$, then $\mathrm{Cov}[Y, h(Y)] = \mathrm{Var}(Y)\, \mathrm{E}\, h'(Y)$.

Proof. $\mathrm{E}[(Y - \mu) h(Y)] = \int_{-\infty}^{\infty} (Y - \mu) h(Y) \phi(Y; \mu, \sigma^2)\,dY$, where $\phi(Y; \mu, \sigma^2)$ is the pdf of $N(\mu, \sigma^2)$. Note that $d\phi(Y; \mu, \sigma^2)/dY = -\phi(Y; \mu, \sigma^2)(Y - \mu)/\sigma^2$, so the integral can be rewritten as $-\sigma^2 \int_{-\infty}^{\infty} h(Y)\,d\phi(Y; \mu, \sigma^2)$. Integration by parts ($\int u\,dv = uv - \int v\,du$) gives
$$-\sigma^2 \left( \left[ h(Y) \phi(Y; \mu, \sigma^2) \right]_{-\infty}^{\infty} - \int_{-\infty}^{\infty} \phi(Y; \mu, \sigma^2) h'(Y)\,dY \right) = \sigma^2\, \mathrm{E}\, h'(Y).$$
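Stein's lemma is easily verified by simulation. The sketch below (my own illustration, not from the notes) uses $h(Y) = Y^3$, so $\mathrm{E}\, h'(Y) = 3\, \mathrm{E}\, Y^2$:

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma = 0.5, 1.0
Y = mu + sigma * rng.standard_normal(2_000_000)

# Stein's lemma with h(Y) = Y^3: Cov(Y, h(Y)) = Var(Y) * E h'(Y).
lhs = np.cov(Y, Y**3)[0, 1]          # Cov(Y, Y^3)
rhs = sigma**2 * np.mean(3 * Y**2)   # Var(Y) * E 3Y^2
print(lhs, rhs)                      # both close to 3*sigma^2*(mu^2 + sigma^2) = 3.75
```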

Fact 21.59 (Stein's lemma 2) It follows from Fact 21.58 that if $X$ and $Y$ have a bivariate normal distribution and $h(\cdot)$ is a differentiable function such that $\mathrm{E}\, |h'(Y)| < \infty$, then $\mathrm{Cov}[X, h(Y)] = \mathrm{Cov}(X, Y)\, \mathrm{E}\, h'(Y)$. (See Söderlind (2009).)

Fact 21.62 (Moments of a truncated normal distribution) Let $X \sim N(\mu, \sigma^2)$ and define $a_0 = (a - \mu)/\sigma$ and $b_0 = (b - \mu)/\sigma$, where $\phi$ and $\Phi$ are the standard normal pdf and cdf. Then, conditional on $a < X \le b$,
$$\mathrm{E}(X|a < X \le b) = \mu + \sigma \frac{\phi(a_0) - \phi(b_0)}{\Phi(b_0) - \Phi(a_0)}$$
and
$$\mathrm{Var}(X|a < X \le b) = \sigma^2 \left\{ 1 + \frac{a_0 \phi(a_0) - b_0 \phi(b_0)}{\Phi(b_0) - \Phi(a_0)} - \left[ \frac{\phi(a_0) - \phi(b_0)}{\Phi(b_0) - \Phi(a_0)} \right]^2 \right\}.$$

Fact 21.63 (Lower truncation) In Fact 21.62, let $b \to \infty$, so we only have the truncation $a < X$. Then, we have
$$\mathrm{E}(X|a < X) = \mu + \sigma \frac{\phi(a_0)}{1 - \Phi(a_0)}$$
and
$$\mathrm{Var}(X|a < X) = \sigma^2 \left\{ 1 + \frac{a_0 \phi(a_0)}{1 - \Phi(a_0)} - \left[ \frac{\phi(a_0)}{1 - \Phi(a_0)} \right]^2 \right\}.$$
(The latter follows from $\lim_{b \to \infty} b_0 \phi(b_0) = 0$.)

Example 21.64 Suppose $X \sim N(0, \sigma^2)$ and we want to calculate $\mathrm{E}\, |X|$. This is the same as $\mathrm{E}(X|X > 0) = 2\sigma\phi(0)$.

Fact 21.65 (Upper truncation) In Fact 21.62, let $a \to -\infty$, so we only have the truncation $X \le b$. Then, we have
$$\mathrm{E}(X|X \le b) = \mu - \sigma \frac{\phi(b_0)}{\Phi(b_0)}$$
and
$$\mathrm{Var}(X|X \le b) = \sigma^2 \left\{ 1 - \frac{b_0 \phi(b_0)}{\Phi(b_0)} - \left[ \frac{\phi(b_0)}{\Phi(b_0)} \right]^2 \right\}.$$
(The latter follows from $\lim_{a \to -\infty} a_0 \phi(a_0) = 0$.)
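The truncation formulas can be checked against a Monte Carlo experiment; the sketch below (my own, with made-up numbers; it uses SciPy for $\phi$ and $\Phi$) does this for the lower truncation in Fact 21.63:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
mu, sigma, a = 0.0, 2.0, 1.0
a0 = (a - mu) / sigma

# Fact 21.63: E(X|a<X) = mu + sigma*lam and
# Var(X|a<X) = sigma^2*(1 + a0*lam - lam^2), with lam = phi(a0)/(1 - Phi(a0)).
lam = norm.pdf(a0) / (1.0 - norm.cdf(a0))
mean_formula = mu + sigma * lam
var_formula = sigma**2 * (1.0 + a0 * lam - lam**2)

X = mu + sigma * rng.standard_normal(2_000_000)
X = X[X > a]  # keep only the draws above the truncation point
print(mean_formula, X.mean())
print(var_formula, X.var())
```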

Fact 21.66 (Delta method) Consider an estimator $\hat\beta_{k \times 1}$ which satisfies
$$\sqrt{T}(\hat\beta - \beta_0) \overset{d}{\to} N(0, \Omega),$$
and suppose we want the asymptotic distribution of a transformation of $\beta$,
$$\gamma_{q \times 1} = g(\beta),$$
where $g(\cdot)$ has continuous first derivatives. The result is
$$\sqrt{T}\left[ g(\hat\beta) - g(\beta_0) \right] \overset{d}{\to} N(0, \Psi_{q \times q}), \text{ where } \Psi = \frac{\partial g(\beta_0)}{\partial \beta'}\, \Omega\, \left[ \frac{\partial g(\beta_0)}{\partial \beta'} \right]', \text{ and } \frac{\partial g(\beta_0)}{\partial \beta'} \text{ is } q \times k.$$

Proof. By the mean value theorem we have
$$g(\hat\beta) = g(\beta_0) + \frac{\partial g(\beta^*)}{\partial \beta'}(\hat\beta - \beta_0),$$


where
$$\frac{\partial g(\beta)}{\partial \beta'} = \begin{bmatrix} \dfrac{\partial g_1(\beta)}{\partial \beta_1} & \cdots & \dfrac{\partial g_1(\beta)}{\partial \beta_k} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial g_q(\beta)}{\partial \beta_1} & \cdots & \dfrac{\partial g_q(\beta)}{\partial \beta_k} \end{bmatrix}_{q \times k},$$
and we evaluate it at $\beta^*$, which is (weakly) between $\hat\beta$ and $\beta_0$. Premultiply by $\sqrt{T}$ and rearrange as
$$\sqrt{T}\left[ g(\hat\beta) - g(\beta_0) \right] = \frac{\partial g(\beta^*)}{\partial \beta'} \sqrt{T}(\hat\beta - \beta_0).$$
If $\hat\beta$ is consistent ($\mathrm{plim}\, \hat\beta = \beta_0$) and $\partial g(\beta^*)/\partial \beta'$ is continuous, then by Slutsky's theorem $\mathrm{plim}\, \partial g(\beta^*)/\partial \beta' = \partial g(\beta_0)/\partial \beta'$, which is a constant. The result then follows from the continuous mapping theorem.
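A small numerical illustration of the delta method (my own sketch, not from the notes): let $\hat\beta$ be a sample mean, so $\sqrt{T}(\hat\beta - \mu) \overset{d}{\to} N(0, \sigma^2)$, and take $g(\beta) = \exp(\beta)$, so the asymptotic variance of $\sqrt{T}[g(\hat\beta) - g(\mu)]$ is $\exp(\mu)^2 \sigma^2$:

```python
import numpy as np

rng = np.random.default_rng(6)
n_sim, T = 20_000, 1_000
mu, sigma = 1.0, 0.5

# beta_hat = sample mean; g(beta) = exp(beta), so dg/dbeta = exp(beta).
x = mu + sigma * rng.standard_normal((n_sim, T))
beta_hat = x.mean(axis=1)
stat = np.sqrt(T) * (np.exp(beta_hat) - np.exp(mu))

# Delta method: Std(stat) should be close to exp(mu)*sigma.
print(stat.std(), np.exp(mu) * sigma)
```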

21.7.2 The Lognormal Distribution

Fact 21.67 (Univariate lognormal distribution) If $x \sim N(\mu, \sigma^2)$ and $y = \exp(x)$, then the probability density function of $y$, $f(y)$, is
$$f(y) = \frac{1}{y\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2}\left( \frac{\ln y - \mu}{\sigma} \right)^2}, \quad y > 0.$$
The $r$th moment of $y$ is $\mathrm{E}\, y^r = \exp(r\mu + r^2\sigma^2/2)$.

Example 21.68 The first two moments are $\mathrm{E}\, y = \exp(\mu + \sigma^2/2)$ and $\mathrm{E}\, y^2 = \exp(2\mu + 2\sigma^2)$. We therefore get $\mathrm{Var}(y) = \exp(2\mu + \sigma^2)\left[ \exp(\sigma^2) - 1 \right]$ and $\mathrm{Std}(y)/\mathrm{E}\, y = \sqrt{\exp(\sigma^2) - 1}$.
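These lognormal moment formulas are simple to verify by simulation (a sketch of my own, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(7)
mu, sigma = 0.1, 0.4

# y = exp(x) with x ~ N(mu, sigma^2): E y = exp(mu + sigma^2/2) and
# Std(y)/E y = sqrt(exp(sigma^2) - 1)   (Example 21.68).
y = np.exp(mu + sigma * rng.standard_normal(2_000_000))
print(y.mean(), np.exp(mu + sigma**2 / 2))
print(y.std() / y.mean(), np.sqrt(np.exp(sigma**2) - 1.0))
```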

Fact 21.69 (Moments of a truncated lognormal distribution) If $x \sim N(\mu, \sigma^2)$ and $y = \exp(x)$, then $\mathrm{E}(y^r | y > a) = \mathrm{E}(y^r)\, \Phi(r\sigma - a_0)/\Phi(-a_0)$, where $a_0 = (\ln a - \mu)/\sigma$. Note that the denominator is $\Pr(y > a) = \Phi(-a_0)$. In contrast, $\mathrm{E}(y^r | y \le b) = \mathrm{E}(y^r)\, \Phi(-r\sigma + b_0)/\Phi(b_0)$, where $b_0 = (\ln b - \mu)/\sigma$. The denominator is $\Pr(y \le b) = \Phi(b_0)$.

Example 21.70 The first two moments of the truncated (from below) lognormal distribution are $\mathrm{E}(y | y > a) = \exp(\mu + \sigma^2/2)\, \Phi(\sigma - a_0)/\Phi(-a_0)$ and $\mathrm{E}(y^2 | y > a) = \exp(2\mu + 2\sigma^2)\, \Phi(2\sigma - a_0)/\Phi(-a_0)$.


Example 21.71 The first two moments of the truncated (from above) lognormal distribution are $\mathrm{E}(y | y \le b) = \exp(\mu + \sigma^2/2)\, \Phi(-\sigma + b_0)/\Phi(b_0)$ and $\mathrm{E}(y^2 | y \le b) = \exp(2\mu + 2\sigma^2)\, \Phi(-2\sigma + b_0)/\Phi(b_0)$.

Fact 21.72 (Multivariate lognormal distribution) Let the $n \times 1$ vector $x$ have a multivariate normal distribution
$$x \sim N(\mu, \Omega), \text{ where } \mu = \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_n \end{bmatrix} \text{ and } \Omega = \begin{bmatrix} \omega_{11} & \cdots & \omega_{1n} \\ \vdots & \ddots & \vdots \\ \omega_{n1} & \cdots & \omega_{nn} \end{bmatrix}.$$
Then $y = \exp(x)$ has a lognormal distribution, with the means and covariances
$$\mathrm{E}\, y_i = \exp(\mu_i + \omega_{ii}/2),$$
$$\mathrm{Cov}(y_i, y_j) = \exp\left[ \mu_i + \mu_j + (\omega_{ii} + \omega_{jj})/2 \right] \left[ \exp(\omega_{ij}) - 1 \right],$$
$$\mathrm{Corr}(y_i, y_j) = \left[ \exp(\omega_{ij}) - 1 \right] \Big/ \sqrt{\left[ \exp(\omega_{ii}) - 1 \right]\left[ \exp(\omega_{jj}) - 1 \right]}.$$
Clearly, $\mathrm{Var}(y_i) = \exp(2\mu_i + \omega_{ii})\left[ \exp(\omega_{ii}) - 1 \right]$. $\mathrm{Cov}(y_i, y_j)$ and $\mathrm{Corr}(y_i, y_j)$ have the same sign as $\mathrm{Corr}(x_i, x_j)$ and are increasing in it. However, $\mathrm{Corr}(y_i, y_j)$ is closer to zero.

21.7.3 The Chi-Square Distribution

Fact 21.73 (The $\chi_n^2$ distribution) If $Y \sim \chi_n^2$, then the pdf of $Y$ is $f(y) = \frac{1}{2^{n/2}\Gamma(n/2)}\, y^{n/2 - 1} e^{-y/2}$, where $\Gamma(\cdot)$ is the gamma function. The moment generating function is $\mathrm{mgf}_Y(t) = (1 - 2t)^{-n/2}$ for $t < 1/2$. The first moments of $Y$ are $\mathrm{E}\, Y = n$ and $\mathrm{Var}(Y) = 2n$.

Fact 21.74 (Quadratic forms of normally distributed random variables) If the $n \times 1$ vector $X \sim N(0, \Sigma)$, then $Y = X'\Sigma^{-1}X \sim \chi_n^2$. Therefore, if the $n$ scalar random variables $X_i$, $i = 1, \ldots, n$, are uncorrelated and have the distributions $N(0, \sigma_i^2)$, $i = 1, \ldots, n$, then $Y = \sum_{i=1}^{n} X_i^2/\sigma_i^2 \sim \chi_n^2$.
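Fact 21.74 can be illustrated by simulation; the covariance matrix below is a made-up example (the sketch is my own, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(8)
n, n_sim = 3, 1_000_000

# X ~ N(0, Sigma): Y = X' Sigma^{-1} X ~ chi2(n), so E Y = n and Var(Y) = 2n.
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
L = np.linalg.cholesky(Sigma)
X = rng.standard_normal((n_sim, n)) @ L.T  # each row is one draw of X'
Y = np.einsum('ij,ij->i', X @ np.linalg.inv(Sigma), X)
print(Y.mean(), Y.var())                   # close to 3 and 6
```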

Fact 21.75 (Distribution of $X'AX$) If the $n \times 1$ vector $X \sim N(0, I)$, and $A$ is a symmetric idempotent matrix ($A = A'$ and $A = AA = A'A$) of rank $r$, then $Y = X'AX \sim \chi_r^2$.

Fact 21.76 (Distribution of $X'\Sigma^{+}X$) If the $n \times 1$ vector $X \sim N(0, \Sigma)$, where $\Sigma$ has rank $r \le n$, then $Y = X'\Sigma^{+}X \sim \chi_r^2$, where $\Sigma^{+}$ is the pseudo inverse of $\Sigma$.


Proof. $\Sigma$ is symmetric, so it can be decomposed as $\Sigma = C\Lambda C'$, where $C$ contains the orthogonal eigenvectors ($C'C = I$) and $\Lambda$ is a diagonal matrix with the eigenvalues along the main diagonal. We therefore have $\Sigma = C\Lambda C' = C_1 \Lambda_{11} C_1'$, where $C_1$ is an $n \times r$ matrix associated with the $r$ non-zero eigenvalues (found in the $r \times r$ matrix $\Lambda_{11}$). The generalized inverse can be shown to be
$$\Sigma^{+} = \begin{bmatrix} C_1 & C_2 \end{bmatrix} \begin{bmatrix} \Lambda_{11}^{-1} & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} C_1 & C_2 \end{bmatrix}' = C_1 \Lambda_{11}^{-1} C_1'.$$
We can write $\Sigma^{+} = C_1 \Lambda_{11}^{-1/2} \Lambda_{11}^{-1/2} C_1'$. Consider the $r \times 1$ vector $Z = \Lambda_{11}^{-1/2} C_1' X$, and note that it has the covariance matrix
$$\mathrm{E}\, ZZ' = \Lambda_{11}^{-1/2} C_1'\, \mathrm{E}\, XX'\, C_1 \Lambda_{11}^{-1/2} = \Lambda_{11}^{-1/2} C_1' C_1 \Lambda_{11} C_1' C_1 \Lambda_{11}^{-1/2} = I_r,$$
since $C_1'C_1 = I_r$. This shows that $Z \sim N(0_{r \times 1}, I_r)$, so $Z'Z = X'\Sigma^{+}X \sim \chi_r^2$.

Fact 21.77 (Convergence to a normal distribution) Let $Y \sim \chi_n^2$ and $Z = (Y - n)/n^{1/2}$. Then $Z \overset{d}{\to} N(0, 2)$ as $n \to \infty$.

Example 21.78 If $Y = \sum_{i=1}^{n} X_i^2/\sigma_i^2$, then this transformation means $Z = \left( \sum_{i=1}^{n} X_i^2/\sigma_i^2 - n \right)/n^{1/2}$.

Proof. We can directly note from the moments of a $\chi_n^2$ variable that $\mathrm{E}\, Z = (\mathrm{E}\, Y - n)/n^{1/2} = 0$ and $\mathrm{Var}(Z) = \mathrm{Var}(Y)/n = 2$. From the general properties of moment generating functions, we note that the moment generating function of $Z$ is
$$\mathrm{mgf}_Z(t) = e^{-t\sqrt{n}} \left( 1 - 2\frac{t}{n^{1/2}} \right)^{-n/2} \text{ with } \lim_{n \to \infty} \mathrm{mgf}_Z(t) = \exp(t^2).$$
This is the moment generating function of a $N(0, 2)$ distribution, which shows that $Z \overset{d}{\to} N(0, 2)$. This result should not come as a surprise as we can think of $Y$ as the sum of $n$ variables; dividing by $n^{1/2}$ is then like creating a scaled sample average for which a central limit theorem applies.

21.7.4 The t and F Distributions

Fact 21.79 (The $F(n_1, n_2)$ distribution) If $Y_1 \sim \chi_{n_1}^2$ and $Y_2 \sim \chi_{n_2}^2$ and $Y_1$ and $Y_2$ are independent, then $Z = (Y_1/n_1)/(Y_2/n_2)$ has an $F(n_1, n_2)$ distribution. This distribution has no moment generating function, but $\mathrm{E}\, Z = n_2/(n_2 - 2)$ for $n_2 > 2$.

[Figure 21.4: $\chi^2$, F, and t distributions. Panels: (a) the pdf of $\chi^2(n)$ for $n = 1, 2, 5, 10$; (b) the pdf of $F(n_1, 10)$ for $n_1 = 2, 5, 10$; (c) the pdf of $F(n_1, 100)$ for $n_1 = 2, 5, 10$; (d) the pdfs of $N(0,1)$, $t(10)$, and $t(50)$.]

Fact 21.80 (Convergence of an $F(n_1, n_2)$ distribution) In Fact 21.79, the distribution of $n_1 Z = Y_1/(Y_2/n_2)$ converges to a $\chi_{n_1}^2$ distribution as $n_2 \to \infty$. (The idea is essentially that as $n_2 \to \infty$ the denominator converges to its mean, which is $\mathrm{E}\, Y_2/n_2 = 1$. Only the numerator is then left, which is a $\chi_{n_1}^2$ variable.)

Fact 21.81 (The $t_n$ distribution) If $X \sim N(0, 1)$ and $Y \sim \chi_n^2$ and $X$ and $Y$ are independent, then $Z = X/(Y/n)^{1/2}$ has a $t_n$ distribution. The moment generating function does not exist, but $\mathrm{E}\, Z = 0$ for $n > 1$ and $\mathrm{Var}(Z) = n/(n - 2)$ for $n > 2$.

Fact 21.82 (Convergence of a $t_n$ distribution) The $t_n$ distribution converges to a $N(0, 1)$ distribution as $n \to \infty$.

Fact 21.83 ($t_n$ versus $F(1, n)$ distribution) If $Z \sim t_n$, then $Z^2 \sim F(1, n)$.


21.7.5 The Bernoulli and Binomial Distributions

Fact 21.84 (Bernoulli distribution) The random variable $X$ can only take two values: 1 or 0, with probability $p$ and $1 - p$ respectively. The moment generating function is $\mathrm{mgf}(t) = pe^t + 1 - p$. This gives $\mathrm{E}(X) = p$ and $\mathrm{Var}(X) = p(1 - p)$.

Example 21.85 (Shifted Bernoulli distribution) Suppose the Bernoulli variable takes the values $a$ or $b$ (instead of 1 and 0) with probability $p$ and $1 - p$ respectively. Then $\mathrm{E}(X) = pa + (1 - p)b$ and $\mathrm{Var}(X) = p(1 - p)(a - b)^2$.

Fact 21.86 (Binomial distribution) Suppose $X_1, X_2, \ldots, X_n$ all have Bernoulli distributions with the parameter $p$. Then, the sum $Y = X_1 + X_2 + \cdots + X_n$ has a Binomial distribution with parameters $p$ and $n$. The pdf is $\mathrm{pdf}(Y) = \frac{n!}{y!(n - y)!}\, p^y (1 - p)^{n - y}$ for $y = 0, 1, \ldots, n$. The moment generating function is $\mathrm{mgf}(t) = (pe^t + 1 - p)^n$. This gives $\mathrm{E}(Y) = np$ and $\mathrm{Var}(Y) = np(1 - p)$.

Example 21.87 (Shifted Binomial distribution) Suppose the Bernoulli variables $X_1, X_2, \ldots, X_n$ take the values $a$ or $b$ (instead of 1 and 0) with probability $p$ and $1 - p$ respectively. Then, the sum $Y = X_1 + X_2 + \cdots + X_n$ has $\mathrm{E}(Y) = n[pa + (1 - p)b]$ and $\mathrm{Var}(Y) = np(1 - p)(a - b)^2$.

21.7.6 The Skew-Normal Distribution

Fact 21.88 (Skew-normal distribution) Let $\phi$ and $\Phi$ be the standard normal pdf and cdf respectively. The pdf of a skew-normal distribution with shape parameter $\alpha$ is then
$$f(z) = 2\phi(z)\Phi(\alpha z).$$
If $Z$ has the above pdf and
$$Y = \mu + \omega Z \text{ with } \omega > 0,$$
then $Y$ is said to have a $SN(\mu, \omega^2, \alpha)$ distribution (see Azzalini (2005)). Clearly, the pdf of $Y$ is
$$f(y) = 2\phi\left[ (y - \mu)/\omega \right] \Phi\left[ \alpha(y - \mu)/\omega \right]/\omega.$$
The moment generating function is $\mathrm{mgf}_y(t) = 2\exp(\mu t + \omega^2 t^2/2)\Phi(\delta\omega t)$, where $\delta = \alpha/\sqrt{1 + \alpha^2}$. When $\alpha > 0$ the distribution is positively skewed (and vice versa), and when $\alpha = 0$ the distribution becomes a normal distribution. When $\alpha \to \infty$, the density function is zero for $Y \le \mu$ and $2\phi\left[ (y - \mu)/\omega \right]/\omega$ otherwise; this is a half-normal distribution.

Example 21.89 The first three moments are as follows. First, notice that $\mathrm{E}\, Z = \delta\sqrt{2/\pi}$, $\mathrm{Var}(Z) = 1 - 2\delta^2/\pi$, and $\mathrm{E}(Z - \mathrm{E}\, Z)^3 = (4/\pi - 1)\sqrt{2/\pi}\, \delta^3$. Then we have
$$\mathrm{E}\, Y = \mu + \omega\, \mathrm{E}\, Z,$$
$$\mathrm{Var}(Y) = \omega^2\, \mathrm{Var}(Z),$$
$$\mathrm{E}(Y - \mathrm{E}\, Y)^3 = \omega^3\, \mathrm{E}(Z - \mathrm{E}\, Z)^3.$$
Notice that with $\alpha = 0$ (so $\delta = 0$), these moments of $Y$ become $\mu$, $\omega^2$, and 0 respectively.
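These moments can be checked by simulating skew-normal draws. One standard stochastic representation (see Azzalini (2005)) is $Z = \delta|U_0| + \sqrt{1 - \delta^2}\, U_1$ with $U_0, U_1$ iid $N(0,1)$; the sketch below (my own, not from the notes) uses it to verify $\mathrm{E}\, Z$ and $\mathrm{Var}(Z)$:

```python
import numpy as np

rng = np.random.default_rng(10)
alpha = 5.0
delta = alpha / np.sqrt(1.0 + alpha**2)
n = 2_000_000

# Z = delta*|U0| + sqrt(1-delta^2)*U1 has a skew-normal distribution.
U0, U1 = rng.standard_normal(n), rng.standard_normal(n)
Z = delta * np.abs(U0) + np.sqrt(1.0 - delta**2) * U1

print(Z.mean(), delta * np.sqrt(2 / np.pi))   # E Z
print(Z.var(), 1.0 - 2.0 * delta**2 / np.pi)  # Var(Z)
```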

21.7.7 Generalized Pareto Distribution

Fact 21.90 (Cdf and pdf of the generalized Pareto distribution) The generalized Pareto distribution is described by a scale parameter ($\beta > 0$) and a shape parameter ($\xi$). The cdf ($\Pr(Z \le z)$, where $Z$ is the random variable and $z$ is a value) is
$$G(z) = \begin{cases} 1 - (1 + \xi z/\beta)^{-1/\xi} & \text{if } \xi \ne 0 \\ 1 - \exp(-z/\beta) & \text{if } \xi = 0, \end{cases}$$
for $0 \le z$, and $z \le -\beta/\xi$ in case $\xi < 0$. The pdf is therefore
$$g(z) = \begin{cases} \frac{1}{\beta}(1 + \xi z/\beta)^{-1/\xi - 1} & \text{if } \xi \ne 0 \\ \frac{1}{\beta}\exp(-z/\beta) & \text{if } \xi = 0. \end{cases}$$
The mean is defined (finite) if $\xi < 1$ and is then $\mathrm{E}(z) = \beta/(1 - \xi)$, the median is $\beta(2^\xi - 1)/\xi$, and the variance is defined if $\xi < 1/2$ and is then $\beta^2/[(1 - \xi)^2(1 - 2\xi)]$.
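A direct implementation of these formulas (my own sketch, not from the notes; the parameter values are arbitrary):

```python
import numpy as np

def gpd_cdf(z, beta, xi):
    """Cdf of the generalized Pareto distribution (Fact 21.90)."""
    z = np.asarray(z, dtype=float)
    if xi == 0.0:
        return 1.0 - np.exp(-z / beta)
    return 1.0 - (1.0 + xi * z / beta) ** (-1.0 / xi)

beta, xi = 2.0, 0.25
median = beta * (2.0**xi - 1.0) / xi  # should solve G(z) = 1/2
print(gpd_cdf(median, beta, xi))      # 0.5
print(beta / (1.0 - xi))              # the mean (defined since xi < 1)
```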

21.8 Inference

Fact 21.91 (Comparing variance-covariance matrices) Let $\mathrm{Var}(\hat\beta)$ and $\mathrm{Var}(\beta^*)$ be the variance-covariance matrices of two estimators, $\hat\beta$ and $\beta^*$, and suppose $\mathrm{Var}(\hat\beta) - \mathrm{Var}(\beta^*)$ is a positive semi-definite matrix. This means that for any non-zero vector $R$, $R'\, \mathrm{Var}(\hat\beta)\, R \ge R'\, \mathrm{Var}(\beta^*)\, R$, so every linear combination of $\hat\beta$ has a variance that is at least as large as the variance of the same linear combination of $\beta^*$. In particular, this means that the variance of every element in $\hat\beta$ (the diagonal elements of $\mathrm{Var}(\hat\beta)$) is at least as large as the variance of the corresponding element of $\beta^*$.
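In practice, positive semi-definiteness of the difference can be checked via its eigenvalues; a small sketch (my own, with made-up matrices, not from the notes):

```python
import numpy as np

# Var(beta_hat) - Var(beta_star) is positive semi-definite iff all of its
# eigenvalues are >= 0 (illustrative matrices).
V_hat = np.array([[2.0, 0.3], [0.3, 1.5]])
V_star = np.array([[1.0, 0.2], [0.2, 1.0]])

eigs = np.linalg.eigvalsh(V_hat - V_star)
print(eigs, np.all(eigs >= -1e-12))

# Any linear combination R'beta then has a weakly larger variance under V_hat:
R = np.array([1.0, -2.0])
print(R @ V_hat @ R, R @ V_star @ R)
```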


Bibliography

Azzalini, A., 2005, "The skew-normal distribution and related multivariate families," Scandinavian Journal of Statistics, 32, 159–188.

Davidson, J., 2000, Econometric Theory, Blackwell Publishers, Oxford.

DeGroot, M. H., 1986, Probability and Statistics, Addison-Wesley, Reading, Massachusetts.

Greene, W. H., 2000, Econometric Analysis, Prentice-Hall, Upper Saddle River, New Jersey, 4th edn.

Johnson, N. L., S. Kotz, and N. Balakrishnan, 1994, Continuous Univariate Distributions, Wiley, New York, 2nd edn.

Mittelhammer, R. C., 1996, Mathematical Statistics for Economics and Business, Springer-Verlag, New York.

Söderlind, P., 2009, "An extended Stein's lemma for asset pricing," Applied Economics Letters, 16, 1005–1008.