1 estimation of constant-cv regression models alan h. feiveson nasa – johnson space center...

32
1 Estimation of constant-CV regression models Alan H. Feiveson NASA – Johnson Space Center Houston, TX SNASUG 2008 Chicago, IL

Upload: merryl-simon

Post on 28-Dec-2015

214 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: 1 Estimation of constant-CV regression models Alan H. Feiveson NASA – Johnson Space Center Houston, TX SNASUG 2008 Chicago, IL

1

Estimation of constant-CV regression models

Alan H. Feiveson

NASA – Johnson Space Center

Houston, TX

SNASUG 2008

Chicago, IL

Page 2: 1 Estimation of constant-CV regression models Alan H. Feiveson NASA – Johnson Space Center Houston, TX SNASUG 2008 Chicago, IL

2

yi = 0 + 1xi + ei V( ei ) = 2

)(

)()(

i

ii yE

ySDyCV

Variance Models with Simple Linear Regression

22)( ii xeV yi = 0 + 1xi + ei

yi = 0 + 1xi + ei V( ei ) = 2(0 + 1xi)2

y = X + Zu

Page 3: 1 Estimation of constant-CV regression models Alan H. Feiveson NASA – Johnson Space Center Houston, TX SNASUG 2008 Chicago, IL

3

.clear

. set obs 100

. gen x=10*uniform()

. gen mu = 1+.5*x

. replace y=mu+.10*mu*invnorm(uniform())

Example: 0 = 1.0, 1 = 0.5, = 0.101

23

45

6y

0 2 4 6 8 10x

yi = 0 + 1xi + ei V(ei) = 2(0 + 1xi)2

Problem: Estimate 0, 1, and

Page 4: 1 Estimation of constant-CV regression models Alan H. Feiveson NASA – Johnson Space Center Houston, TX SNASUG 2008 Chicago, IL

4

22

10

)()(

1)(log

i

ii yV

xyV

Variance Stabilization

yi = 0 + 1xi + ei V(ei) = 2(0 + 1xi)2

)()]('[))(( 2 yVfyfV

But E(log yi) = g( 1, xi)

Page 5: 1 Estimation of constant-CV regression models Alan H. Feiveson NASA – Johnson Space Center Houston, TX SNASUG 2008 Chicago, IL

5

Source | SS df MS Number of obs = 100-------------+------------------------------ F( 2, 97) = 630.65 Model | 15.9635043 2 7.98175216 Prob > F = 0.0000 Residual | 1.22767564 97 .01265645 R-squared = 0.9286-------------+------------------------------ Adj R-squared = 0.9271

Total | 17.19118 99 .173648282 Root MSE = .1125

------------------------------------------------------------------------------ z | Coef. Std. Err. t P>|t| [95% Conf. Interval]-------------+---------------------------------------------------------------- x | .2707448 .018873 14.35 0.000 .2332872 .3082025 x2 | -.0110946 .0017622 -6.30 0.000 -.0145921 -.0075972 _cons | .1783796 .0441957 4.04 0.000 .0906634 .2660958------------------------------------------------------------------------------

. gen z = log(y)

. gen x2 = x*x

. reg z x x2

. predict z_hat

Approximate g( 1, xi) by polynomial in x, then do OLS regression of log y on x:

0 = 1.0, 1 = 0.5, = 0.10

Page 6: 1 Estimation of constant-CV regression models Alan H. Feiveson NASA – Johnson Space Center Houston, TX SNASUG 2008 Chicago, IL

6

0.5

11

.52

0 2 4 6 8 10x

z_hat log(y)

But what about 0 and 1?

Page 7: 1 Estimation of constant-CV regression models Alan H. Feiveson NASA – Johnson Space Center Houston, TX SNASUG 2008 Chicago, IL

7

reg y x

predict muh

reg y x [w=1/muh^2]

.local rmse=e(rmse)

.gen wt = 1/muh^2

.summ wt

.local wbar=r(mean)

.local sigh = sqrt(`wbar’)*`rmse’

Alternative: Iteratively re-weighted regression

Page 8: 1 Estimation of constant-CV regression models Alan H. Feiveson NASA – Johnson Space Center Houston, TX SNASUG 2008 Chicago, IL

8

ITERATION 0

. reg y x

Source | SS df MS Number of obs = 100-------------+------------------------------ F( 1, 98) = 832.95 Model | 172.307709 1 172.307709 Prob > F = 0.0000 Residual | 20.272763 98 .206864928 R-squared = 0.8947-------------+------------------------------ Adj R-squared = 0.8937 Total | 192.580472 99 1.94525729 Root MSE = .45482

------------------------------------------------------------------------------ y | Coef. Std. Err. t P>|t| [95% Conf. Interval]-------------+---------------------------------------------------------------- x | .518296 .0179585 28.86 0.000 .482658 .5539339 _cons | .9530661 .1014154 9.40 0.000 .7518106 1.154322------------------------------------------------------------------------------

. gen wt = 1/(_b[_cons] + _b[x]*x)^2

2)518.0953.0(1

X

Page 9: 1 Estimation of constant-CV regression models Alan H. Feiveson NASA – Johnson Space Center Houston, TX SNASUG 2008 Chicago, IL

9

ITERATION 1

. reg y x [w=wt](analytic weights assumed)(sum of wgt is 1.3046e+01)

Source | SS df MS Number of obs = 100-------------+------------------------------ F( 1, 98) = 1278.10 Model | 123.561274 1 123.561274 Prob > F = 0.0000 Residual | 9.47421712 98 .096675685 R-squared = 0.9288-------------+------------------------------ Adj R-squared = 0.9281 Total | 133.035492 99 1.34379284 Root MSE = .31093

------------------------------------------------------------------------------ y | Coef. Std. Err. t P>|t| [95% Conf. Interval]-------------+---------------------------------------------------------------- x | .510555 .014281 35.75 0.000 .4822147 .5388952 _cons | .9877915 .0533903 18.50 0.000 .8818402 1.093743------------------------------------------------------------------------------

. replace wt = 1/(_b[_cons] + _b[x]*x)^2(100 real changes made)

2)511.0988.0(1

X

Page 10: 1 Estimation of constant-CV regression models Alan H. Feiveson NASA – Johnson Space Center Houston, TX SNASUG 2008 Chicago, IL

10

ITERATION 2

. reg y x [w=wt](analytic weights assumed)(sum of wgt is 1.2842e+01)

Source | SS df MS Number of obs = 100-------------+------------------------------ F( 1, 98) = 1271.77 Model | 124.885941 1 124.885941 Prob > F = 0.0000 Residual | 9.62343467 98 .098198313 R-squared = 0.9285-------------+------------------------------ Adj R-squared = 0.9277 Total | 134.509376 99 1.35868057 Root MSE = .31337

------------------------------------------------------------------------------ y | Coef. Std. Err. t P>|t| [95% Conf. Interval]-------------+---------------------------------------------------------------- x | .510746 .0143219 35.66 0.000 .4823247 .5391674 _cons | .9870233 .0540361 18.27 0.000 .8797904 1.094256------------------------------------------------------------------------------

. replace wt = 1/(_b[_cons] + _b[x]*x)^2(100 real changes made)

2)511.0987.0(1

X

Page 11: 1 Estimation of constant-CV regression models Alan H. Feiveson NASA – Johnson Space Center Houston, TX SNASUG 2008 Chicago, IL

11

ITERATION 3

. noi reg y x [w=wt](analytic weights assumed)(sum of wgt is 1.2846e+01)

Source | SS df MS Number of obs = 100-------------+------------------------------ F( 1, 98) = 1271.92 Model | 124.855967 1 124.855967 Prob > F = 0.0000 Residual | 9.6200215 98 .098163485 R-squared = 0.9285-------------+------------------------------ Adj R-squared = 0.9277 Total | 134.475988 99 1.35834331 Root MSE = .31331

------------------------------------------------------------------------------ y | Coef. Std. Err. t P>|t| [95% Conf. Interval]-------------+---------------------------------------------------------------- x | .5107417 .0143209 35.66 0.000 .4823223 .5391612 _cons | .9870406 .0540213 18.27 0.000 .8798371 1.094244------------------------------------------------------------------------------. summ wt

Variable | Obs Mean Std. Dev. Min Max-------------+-------------------------------------------------------- wt | 100 .1284623 .1209472 .0269917 .5841107

. local wbar=r(mean)

. noi di e(rmse)*sqrt(`wbar')

.11229563

0 = 1.0, 1 = 0.5, = 0.10

Page 12: 1 Estimation of constant-CV regression models Alan H. Feiveson NASA – Johnson Space Center Houston, TX SNASUG 2008 Chicago, IL

12

Can we do this using -xtmixed- ?

. xtmixed y x ||???: x

• How do we get –xtmixed- to estimate a non-constant residual variance?

• Degenerate dependency of random effects (u0i = u1i).

• Coefficients of random intercept and slope (c0 and c1) need to be constrained.

yi = 0 + 1xi + (0 + 1xi)ui

= 0 + 1xi + c0u0i + c1xiu1i

where u0i = u1i and c1/c0 = 1/0

Page 13: 1 Estimation of constant-CV regression models Alan H. Feiveson NASA – Johnson Space Center Houston, TX SNASUG 2008 Chicago, IL

13

yi = 0 + 1xi + c0u0i + c1xiu1i

Can we do this using -xtmixed- ?

set obs 1000gen x = 5*uniform()gen mu = 3+1.4*xgen u0=invnorm(uniform())gen u1=invnorm(uniform())gen y = mu + 0.05*u0 + 0.50*x*u1

How do we get –xtmixed- to estimate a non-constant residual variance?

Ex. 1: Ignore dependency of u’s and constraints on c’s:

gen ord=_nxtmixed y x ||ord: x,noc

Page 14: 1 Estimation of constant-CV regression models Alan H. Feiveson NASA – Johnson Space Center Houston, TX SNASUG 2008 Chicago, IL

14

. xtmixed y x ||ord: x,noc nolog

Mixed-effects REML regression Number of obs = 1000Group variable: ord Number of groups = 1000

Obs per group: min = 1 avg = 1.0 max = 1

Wald chi2(1) = 5745.72Log restricted-likelihood = -1444.8061 Prob > chi2 = 0.0000

------------------------------------------------------------------------------ y | Coef. Std. Err. z P>|z| [95% Conf. Interval]-------------+---------------------------------------------------------------- x | 1.404595 .0185301 75.80 0.000 1.368276 1.440913 _cons | 3.0182 .0116741 258.54 0.000 2.99532 3.041081------------------------------------------------------------------------------

------------------------------------------------------------------------------ Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]-----------------------------+------------------------------------------------ord: Identity | sd(x) | .5058616 .0118386 .4831824 .5296053-----------------------------+------------------------------------------------ sd(Residual) | .0495162 .0143015 .0281125 .0872159------------------------------------------------------------------------------LR test vs. linear regression: chibar2(01) = 718.85 Prob >= chibar2 = 0.0000

0 = 3.0, 1 = 1.4, c0 = 0.05, c1 = 0.50

Page 15: 1 Estimation of constant-CV regression models Alan H. Feiveson NASA – Johnson Space Center Houston, TX SNASUG 2008 Chicago, IL

15

yi = 0 + 1xi + c1ziu1i

Can we do this using -xtmixed- ?

set obs 1000gen x = 5*uniform()gen z = 3 + 1.4*xgen u1=invnorm(uniform())gen y = 3 + 1.4*x + 0.50*z*u1gen ord=_nxtmixed y x ||ord: z,noc

How do we get –xtmixed- to estimate a non-constant residual variance?

Ex. 2: No random intercept, covariate known:

Page 16: 1 Estimation of constant-CV regression models Alan H. Feiveson NASA – Johnson Space Center Houston, TX SNASUG 2008 Chicago, IL

16

Can we do this using -xtmixed- ?

Performing EM optimization:

Performing gradient-based optimization:

Iteration 0: log restricted-likelihood = -2506.6255 numerical derivatives are approximateflat or discontinuous region encounteredIteration 1: log restricted-likelihood = -2503.1832 numerical derivatives are approximate

Garbage!

yi = 0 + 1xi + c1ziu1i

How do we get –xtmixed- to estimate a non-constant residual variance?

Ex. 2: No random intercept, covariate known:

xtmixed y x ||ord: z,noc

Page 17: 1 Estimation of constant-CV regression models Alan H. Feiveson NASA – Johnson Space Center Houston, TX SNASUG 2008 Chicago, IL

17

expand 3sort ordgen yf=y + .001*invnorm(uniform())xtmixed yf x ||ord: z,noc nolog

Can we do this using -xtmixed- ?

How do we get –xtmixed- to estimate a non-constant residual variance?

Ex. 2: No random intercept, covariate known:

yi = 0 + 1xi + c1ziu1i

Page 18: 1 Estimation of constant-CV regression models Alan H. Feiveson NASA – Johnson Space Center Houston, TX SNASUG 2008 Chicago, IL

18

. xtmixed yf x ||ord: z,noc nolog

Mixed-effects REML regression Number of obs = 3000Group variable: ord Number of groups = 1000

Obs per group: min = 3 avg = 3.0 max = 3

Wald chi2(1) = 484.13Log restricted-likelihood = 7952.0717 Prob > chi2 = 0.0000

------------------------------------------------------------------------------ yf | Coef. Std. Err. z P>|z| [95% Conf. Interval]-------------+---------------------------------------------------------------- x | 1.393598 .063337 22.00 0.000 1.26946 1.517736 _cons | 3.071291 .1252851 24.51 0.000 2.825737 3.316846------------------------------------------------------------------------------

------------------------------------------------------------------------------ Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]-----------------------------+------------------------------------------------ord: Identity | sd(z) | .4814862 .0107771 .46082 .5030792-----------------------------+------------------------------------------------ sd(Residual) | .0009896 .0000156 .0009594 .0010207------------------------------------------------------------------------------LR test vs. linear regression: chibar2(01) = 31334.04 Prob >= chibar2 = 0.0000

0 = 3.0, 1 = 1.4, c1 = 0.50

Page 19: 1 Estimation of constant-CV regression models Alan H. Feiveson NASA – Johnson Space Center Houston, TX SNASUG 2008 Chicago, IL

19

Can we do this using -xtmixed- ?

• Degenerate dependency of random effects (u0i = u1i).

• Coefficients of random intercept and slope (c0 and c1) need to be constrained.

yi = 0 + 1xi + (0 + 1xi)ui

= 0 + 1xi + c0u0i + c1xiu1i

where u0i = u1i and c1/c0 = 1/0

Ex 3: No random intercept (unknown covariate)

Page 20: 1 Estimation of constant-CV regression models Alan H. Feiveson NASA – Johnson Space Center Houston, TX SNASUG 2008 Chicago, IL

20

Can we do this using -xtmixed- ?

• Degenerate dependency of random effects (u0i = u1i).

• Coefficients of random intercept and slope (c0 and c1) need to be constrained.

yi = 0 + 1xi + (0 + 1xi)ui

= 0 + 1xi + c0u0i + c1xiu1i

where u0i = u1i and c1/c0 = 1/0

Recast model with one error term and pretend zi = 0 + 1xi is known. Then iterate.

Ex 3: No random intercept (unknown covariate)

Page 21: 1 Estimation of constant-CV regression models Alan H. Feiveson NASA – Johnson Space Center Houston, TX SNASUG 2008 Chicago, IL

21

yi = 0 + 1xi + (0 + 1xi)ui

= 0 + 1xi+ c1ziu1i

Can we do this using -xtmixed- ?

1. Expand and introduce artificial “residual” error term

.expand 3

.gen yf=y + .001*invnorm(uniform())

Page 22: 1 Estimation of constant-CV regression models Alan H. Feiveson NASA – Johnson Space Center Houston, TX SNASUG 2008 Chicago, IL

22

1. Expand and introduce artificial “residual” error term.2. Estimate zi by OLS or other “easy” method.

Can we do this using -xtmixed- ?

.expand 3

.gen yf=y + .001*invnorm(uniform())

.reg y x

.predict zh

yi = 0 + 1xi + (0 + 1xi)ui

= 0 + 1xi+ c1ziu1i

Page 23: 1 Estimation of constant-CV regression models Alan H. Feiveson NASA – Johnson Space Center Houston, TX SNASUG 2008 Chicago, IL

23

1. Expand and introduce artificial “residual” error term.2. Estimate zi by OLS or other “easy” method.3. Fit model pretending prediction zhi is actual zi.

.expand 3

.gen yf=y + .001*invnorm(uniform())

.reg y x

.predict zh

.xtmixed yf x ||ord: zh,noc nolog

yi = 0 + 1xi + c1ziu1i

zi = 0 + 1xi is unknown]

Can we do this using -xtmixed- ?

Page 24: 1 Estimation of constant-CV regression models Alan H. Feiveson NASA – Johnson Space Center Houston, TX SNASUG 2008 Chicago, IL

24

.expand 3

.gen yf=y + .001*invnorm(uniform())

.reg y x

.predict zh

.xtmixed yf x ||ord: zh,noc nolog

.drop zh

.predict zh

Can we do this using -xtmixed- ?

yi = 0 + 1xi + c1ziu1i

zi = 0 + 1xi is unknown]

1. Expand and introduce artificial “residual” error term.2. Estimate zi by OLS or other “easy” method.3. Fit model pretending prediction zhi is actual zi.4. Iterate.

Page 25: 1 Estimation of constant-CV regression models Alan H. Feiveson NASA – Johnson Space Center Houston, TX SNASUG 2008 Chicago, IL

25

. xtmixed yf x ||ord: zh,noc

Mixed-effects REML regression Number of obs = 3000Group variable: ord Number of groups = 1000

Obs per group: min = 3 avg = 3.0 max = 3

Wald chi2(1) = 484.01Log restricted-likelihood = 7952.4049 Prob > chi2 = 0.0000

------------------------------------------------------------------------------ yf | Coef. Std. Err. z P>|z| [95% Conf. Interval]-------------+---------------------------------------------------------------- x | 1.39339 .0633354 22.00 0.000 1.269255 1.517525 _cons | 3.071699 .1261407 24.35 0.000 2.824467 3.31893------------------------------------------------------------------------------

------------------------------------------------------------------------------ Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]-----------------------------+------------------------------------------------ord: Identity | sd(zh) | .4764546 .0106645 .4560044 .4978219-----------------------------+------------------------------------------------ sd(Residual) | .0009896 .0000156 .0009594 .0010207------------------------------------------------------------------------------LR test vs. linear regression: chibar2(01) = 31334.71 Prob >= chibar2 = 0.0000

0 = 3.0, 1 = 1.4, c1 = 0.50

Page 26: 1 Estimation of constant-CV regression models Alan H. Feiveson NASA – Johnson Space Center Houston, TX SNASUG 2008 Chicago, IL

26

args NS nr be0 be1 c1 c2drop _allset obs `NS'gen id=_ngen u1 = invnorm(uniform())expand `nr'sort idgen u2=invnorm(uniform())gen x = 10*uniform()gen z = `be0' + `be1'*xgen zi = z + `c1'*z*u1gen y = zi + `c2'*zi*u2

yij = 0 + 1xij + c1(0 + 1xij)u1i + c2[0 + 1xij + c1(0 + 1xij)u1i]u2i

2-level modelE(yij | xij )

E(yij | xij , u1i )

[“z”]

[“zi”]

Page 27: 1 Estimation of constant-CV regression models Alan H. Feiveson NASA – Johnson Space Center Houston, TX SNASUG 2008 Chicago, IL

27

1 1 1 1 1 1 1 1 1 12

22

22

22

22

2

33

33

33

33

33

4 4 4 4 4 4 4 4 4 4

5

5

5

5

5

5

5

5

5

5

01

02

03

04

05

0

0 2 4 6 8 10x

z_i zy

2-level model (example)

Page 28: 1 Estimation of constant-CV regression models Alan H. Feiveson NASA – Johnson Space Center Houston, TX SNASUG 2008 Chicago, IL

28

//[gen y = zi + `c2'*zi*e]

gen obs=_nexpand 3sort obsgen yf = y + .001*invnorm(uniform())

xtmixed y x ||id: x,noc nologpredict zh0predict uh1i_0,reffects level(id)gen zhi_0 = zh0 + uh1i_0

xtmixed yf x ||id: zh0,noc ||obs: zhi_0,noc nologpredict zh1predict uh1i_1,reffects level(id)gen zhi_1 = zh1 + uh1i_1

xtmixed yf x ||id: zh1,noc ||obs: zhi_1,noc nologpredict zh2predict zhi_2,reffects level(id)gen zhi_2 = zh2 + uh1i_2

noi xtmixed yf x ||id: zh2,noc ||obs: zhi_2,noc nolog

0

1

2

3

Page 29: 1 Estimation of constant-CV regression models Alan H. Feiveson NASA – Johnson Space Center Houston, TX SNASUG 2008 Chicago, IL

29

. run nasug_2008_sim1 20 5 1.0 1.0 .2 .05 6 1

.21211063 .05062864

.21216237 .05076224

.21213417 .05075685

.21213363 .05075686

.21213354 .05075685

.21213353 .05075685

Mixed-effects REML regression Number of obs = 300----------------------------------------------------------- | No. of Observations per Group Group Variable | Groups Minimum Average Maximum----------------+------------------------------------------ id | 20 15 15.0 15 obs | 100 3 3.0 3-----------------------------------------------------------

Page 30: 1 Estimation of constant-CV regression models Alan H. Feiveson NASA – Johnson Space Center Houston, TX SNASUG 2008 Chicago, IL

30

. run nasug_2008_sim1 20 5 1.0 1.0 .2 .05 6 1

Wald chi2(1) = 421.17Log restricted-likelihood = 1038.0836 Prob > chi2 = 0.0000

------------------------------------------------------------------------------ yf | Coef. Std. Err. z P>|z| [95% Conf. Interval]-------------+---------------------------------------------------------------- x | 1.087757 .0530034 20.52 0.000 .9838727 1.191642 _cons | 1.039358 .0535761 19.40 0.000 .9343512 1.144366------------------------------------------------------------------------------

------------------------------------------------------------------------------ Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]-----------------------------+------------------------------------------------id: Identity | sd(zh) | .2121335 .0348399 .1537487 .2926895-----------------------------+------------------------------------------------obs: Identity | sd(zhi) | .0507568 .0040389 .0434271 .0593237-----------------------------+------------------------------------------------ sd(Residual) | .0009429 .0000471 .0008549 .00104------------------------------------------------------------------------------LR test vs. linear regression: chi2(2) = 2866.48 Prob > chi2 = 0.0000

Page 31: 1 Estimation of constant-CV regression models Alan H. Feiveson NASA – Johnson Space Center Houston, TX SNASUG 2008 Chicago, IL

31

be[2] sample: 10000

0.9 1.0 1.1 1.2

0.0 2.5 5.0 7.5 10.0

sige sample: 10000

0.04 0.05 0.06 0.07 0.08

0.0 25.0 50.0 75.0 100.0

c2

be[1] sample: 10000

0.8 0.9 1.0 1.1

0.0 2.0 4.0 6.0 8.0

sigu sample: 10000

0.1 0.2 0.3 0.4

0.0

5.0

10.0

15.0

c1

Bayesian Estimation (WINBUGS)

Page 32: 1 Estimation of constant-CV regression models Alan H. Feiveson NASA – Johnson Space Center Houston, TX SNASUG 2008 Chicago, IL

32

WINBUGS

STATA (xtmixed)

node mean sd 2.5% median 97.5% start samplebe1 1.077 0.04988 0.9847 1.076 1.169 10001 10000be0 1.030 .05257 0.9317 1.03 1.131 10001 10000c1 0.217 0.03455 0.1618 0.2127 0.2958 10001 10000c2 0.064 0.00465 0.0556 0.0637 0.07353 10001 10000

yf | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- x | 1.087757 .0530034 20.52 0.000 .9838727 1.191642 _cons| 1.039358 .0535761 19.40 0.000 .9343512 1.144366------------------------------------------------------------------------------ Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]-----------------------------+------------------------------------------------

id: Identity | sd(xb) | .2121335 .0348399 .1537487 .2926895-----------------------------+------------------------------------------------obs: Identity | sd(muhi) | .0507568 .0040389 .0434271 .0593237-----------------------------+------------------------------------------------