
1

Estimation of constant-CV regression models

Alan H. Feiveson

NASA – Johnson Space Center

Houston, TX

NASUG 2008

Chicago, IL

2

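Variance Models with Simple Linear Regression

$y_i = \beta_0 + \beta_1 x_i + e_i$, $V(e_i) = \sigma^2$

$y_i = \beta_0 + \beta_1 x_i + e_i$, $V(e_i) = \sigma^2 x_i^2$

$y_i = \beta_0 + \beta_1 x_i + e_i$, $V(e_i) = \sigma^2(\beta_0 + \beta_1 x_i)^2$

$CV(y_i) = \dfrac{SD(y_i)}{E(y_i)}$

$y = X\beta + Zu$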

3

Example: $\beta_0 = 1.0$, $\beta_1 = 0.5$, $\sigma = 0.10$

. clear
. set obs 100
. gen x=10*uniform()
. gen mu = 1+.5*x
. gen y=mu+.10*mu*invnorm(uniform())

[Figure: scatter plot of y against x]

$y_i = \beta_0 + \beta_1 x_i + e_i$, $V(e_i) = \sigma^2(\beta_0 + \beta_1 x_i)^2$

Problem: Estimate $\beta_0$, $\beta_1$, and $\sigma$

4

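Variance Stabilization

$y_i = \beta_0 + \beta_1 x_i + e_i$, $V(e_i) = \sigma^2(\beta_0 + \beta_1 x_i)^2$

$V(f(y)) \approx [f'(E(y))]^2\, V(y)$

$V(\log y_i) \approx \left[\dfrac{1}{\beta_0 + \beta_1 x_i}\right]^2 V(y_i) = \sigma^2$

But $E(\log y_i) = g(\beta_0, \beta_1, x_i)$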
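A second-order expansion suggests what the mean of log y looks like. This is only a sketch, assuming $\sigma$ is small; the symbol $\mu_i = \beta_0 + \beta_1 x_i$ is notation introduced here and is not on the slide.

```latex
\begin{align*}
E(\log y_i) &\approx \log \mu_i + \tfrac{1}{2}\left(-\frac{1}{\mu_i^{2}}\right) V(y_i) \\
            &= \log \mu_i - \frac{\sigma^{2}}{2},
\qquad \mu_i = \beta_0 + \beta_1 x_i, \quad V(y_i) = \sigma^{2}\mu_i^{2}.
\end{align*}
```

To this order the mean of log y is roughly $\log(\beta_0 + \beta_1 x_i) - \sigma^2/2$, which is not linear in $x_i$: the log transform stabilizes the variance, but $\beta_0$ and $\beta_1$ no longer appear as regression coefficients.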

5

Approximate $g(\beta_0, \beta_1, x_i)$ by a polynomial in x, then do OLS regression of log y on x:

. gen z = log(y)
. gen x2 = x*x
. reg z x x2

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  2,    97) =  630.65
       Model |  15.9635043     2  7.98175216           Prob > F      =  0.0000
    Residual |  1.22767564    97   .01265645           R-squared     =  0.9286
-------------+------------------------------           Adj R-squared =  0.9271
       Total |    17.19118    99  .173648282           Root MSE      =   .1125

------------------------------------------------------------------------------
           z |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .2707448    .018873    14.35   0.000     .2332872    .3082025
          x2 |  -.0110946   .0017622    -6.30   0.000    -.0145921   -.0075972
       _cons |   .1783796   .0441957     4.04   0.000     .0906634    .2660958
------------------------------------------------------------------------------

. predict z_hat

$\beta_0 = 1.0$, $\beta_1 = 0.5$, $\sigma = 0.10$

6

[Figure: z_hat and log(y) plotted against x]

But what about $\beta_0$ and $\beta_1$?

7

Alternative: Iteratively re-weighted regression

. reg y x
. predict muh
. reg y x [w=1/muh^2]
. local rmse=e(rmse)
. gen wt = 1/muh^2
. summ wt
. local wbar=r(mean)
. local sigh = sqrt(`wbar')*`rmse'
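The re-weighting scheme can also be sketched outside Stata. The following is a minimal pure-Python illustration, not the author's code: the function names `wls_line` and `fit_constant_cv`, the seed, and the choice of three re-weighting passes are assumptions made for the example. It simulates data like the earlier example ($\beta_0 = 1.0$, $\beta_1 = 0.5$, $\sigma = 0.10$), starts from OLS, and repeatedly refits with weights $1/\hat\mu_i^2$.

```python
import math
import random


def wls_line(x, y, w):
    """Weighted least-squares fit of y = b0 + b1*x; returns (b0, b1)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def fit_constant_cv(x, y, n_iter=3):
    """Iteratively re-weighted regression with weights 1/mu_hat^2.

    Returns (b0, b1, sigma_hat). Assumes every fitted mean is nonzero.
    """
    n = len(x)
    w = [1.0] * n
    b0, b1 = wls_line(x, y, w)  # iteration 0: ordinary least squares
    for _ in range(n_iter):
        # weights from the current fitted means, then refit
        w = [1.0 / (b0 + b1 * xi) ** 2 for xi in x]
        b0, b1 = wls_line(x, y, w)
    # sigma^2 estimated by the weighted residual mean square (n - 2 d.f.),
    # using the weights that produced the final fit
    rss = sum(wi * (yi - b0 - b1 * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    return b0, b1, math.sqrt(rss / (n - 2))


rng = random.Random(2008)  # arbitrary seed, for reproducibility only
x = [10 * rng.random() for _ in range(100)]
y = [(1 + 0.5 * xi) * (1 + 0.10 * rng.gauss(0, 1)) for xi in x]

b0, b1, sigma_hat = fit_constant_cv(x, y)
print(b0, b1, sigma_hat)
```

The weighted residual mean square with raw weights plays the role of `sqrt(wbar)*rmse` in the Stata version, under the assumption that Stata rescales analytic weights to sum to the number of observations.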

8

ITERATION 0

. reg y x

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  1,    98) =  832.95
       Model |  172.307709     1  172.307709           Prob > F      =  0.0000
    Residual |   20.272763    98  .206864928           R-squared     =  0.8947
-------------+------------------------------           Adj R-squared =  0.8937
       Total |  192.580472    99  1.94525729           Root MSE      =  .45482

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |    .518296   .0179585    28.86   0.000      .482658    .5539339
       _cons |   .9530661   .1014154     9.40   0.000     .7518106    1.154322
------------------------------------------------------------------------------

. gen wt = 1/(_b[_cons] + _b[x]*x)^2

$w = \dfrac{1}{(0.953 + 0.518\,x)^2}$

9

ITERATION 1

. reg y x [w=wt]
(analytic weights assumed)
(sum of wgt is 1.3046e+01)

. replace wt = 1/(_b[_cons] + _b[x]*x)^2
(100 real changes made)

$w = \dfrac{1}{(0.988 + 0.511\,x)^2}$

10

ITERATION 2

. reg y x [w=wt]
(analytic weights assumed)
(sum of wgt is 1.2842e+01)

. replace wt = 1/(_b[_cons] + _b[x]*x)^2
(100 real changes made)

$w = \dfrac{1}{(0.987 + 0.511\,x)^2}$

11

ITERATION 3

. noi reg y x [w=wt]
(analytic weights assumed)
(sum of wgt is 1.2846e+01)

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .5107417   .0143209    35.66   0.000     .4823223    .5391612
       _cons |   .9870406   .0540213    18.27   0.000     .8798371    1.094244
------------------------------------------------------------------------------

. summ wt

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          wt |       100    .1284623    .1209472   .0269917   .5841107

. local wbar=r(mean)
. noi di e(rmse)*sqrt(`wbar')
.11229563

$\beta_0 = 1.0$, $\beta_1 = 0.5$, $\sigma = 0.10$
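The product of the root MSE and the square root of the mean weight can be read as an estimate of $\sigma$. The sketch below assumes that Stata rescales analytic weights to sum to the number of observations; the symbols $w_i^{*}$, $\bar w$, $r_i$ and $n$ are notation introduced here.

```latex
\begin{align*}
w_i &= \frac{1}{\hat\mu_i^{2}}, \qquad
\bar w = \frac{1}{n}\sum_{i=1}^{n} w_i, \qquad
w_i^{*} = \frac{w_i}{\bar w}, \qquad
r_i = y_i - \hat\beta_0 - \hat\beta_1 x_i, \\
\text{rmse}^{2} &= \frac{1}{n-2}\sum_{i=1}^{n} w_i^{*} r_i^{2}
                = \frac{1}{\bar w}\cdot\frac{1}{n-2}\sum_{i=1}^{n} \frac{r_i^{2}}{\hat\mu_i^{2}}, \\
\hat\sigma^{2} &= \frac{1}{n-2}\sum_{i=1}^{n} \frac{r_i^{2}}{\hat\mu_i^{2}}
                = \bar w \cdot \text{rmse}^{2}.
\end{align*}
```

Here $\hat\mu_i$ comes from the previous iteration's coefficients, so the identity describes the converged fit only approximately until the weights stop changing. On that reading the displayed value .1123 is $\hat\sigma$, to be compared with the simulated $\sigma = 0.10$.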

12

Can we do this using -xtmixed- ?

. xtmixed y x ||???: x

• How do we get -xtmixed- to estimate a non-constant residual variance?

• Degenerate dependency of random effects ($u_{0i} = u_{1i}$).

• Coefficients of random intercept and slope ($c_0$ and $c_1$) need to be constrained.

$y_i = \beta_0 + \beta_1 x_i + \sigma(\beta_0 + \beta_1 x_i)u_i$

$\phantom{y_i} = \beta_0 + \beta_1 x_i + c_0 u_{0i} + c_1 x_i u_{1i}$

where $u_{0i} = u_{1i}$ and $c_1/c_0 = \beta_1/\beta_0$
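The dependency and the constraint both show up in the implied variance. As a sketch, take the u's to be standard normal and let $\rho$ denote the correlation between $u_{0i}$ and $u_{1i}$ ($\rho$ is notation introduced here, not on the slide):

```latex
\begin{align*}
V(y_i \mid x_i) &= c_0^{2} + 2\rho\, c_0 c_1 x_i + c_1^{2} x_i^{2}, \\
\rho = 1 \ (u_{0i} = u_{1i}): \quad V(y_i \mid x_i) &= (c_0 + c_1 x_i)^{2}, \\
\rho = 0 \ (\text{independent effects}): \quad V(y_i \mid x_i) &= c_0^{2} + c_1^{2} x_i^{2}.
\end{align*}
```

The constant-CV model corresponds to the boundary case $\rho = 1$ together with $c_1/c_0 = \beta_1/\beta_0$. Treating the two effects as independent drops the cross term $2 c_0 c_1 x_i$, so the fitted variance function differs from the constant-CV one.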

13

How do we get -xtmixed- to estimate a non-constant residual variance?

$y_i = \beta_0 + \beta_1 x_i + c_0 u_{0i} + c_1 x_i u_{1i}$

set obs 1000
gen x = 5*uniform()
gen mu = 3+1.4*x
gen u0=invnorm(uniform())
gen u1=invnorm(uniform())
gen y = mu + 0.05*u0 + 0.50*x*u1

Ex. 1: Ignore dependency of u's and constraints on c's:

gen ord=_n
xtmixed y x ||ord: x,noc

14

. xtmixed y x ||ord: x,noc nolog

Mixed-effects REML regression                   Number of obs      =      1000
Group variable: ord                             Number of groups   =      1000

                                                Obs per group: min =         1
                                                               avg =       1.0
                                                               max =         1

                                                Wald chi2(1)       =   5745.72
Log restricted-likelihood = -1444.8061          Prob > chi2        =    0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   1.404595   .0185301    75.80   0.000     1.368276    1.440913
       _cons |     3.0182   .0116741   258.54   0.000      2.99532    3.041081
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
ord: Identity                |
                       sd(x) |   .5058616   .0118386      .4831824    .5296053
-----------------------------+------------------------------------------------
                sd(Residual) |   .0495162   .0143015      .0281125    .0872159
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) =   718.85 Prob >= chibar2 = 0.0000

$\beta_0 = 3.0$, $\beta_1 = 1.4$, $c_0 = 0.05$, $c_1 = 0.50$

15

Ex. 2: No random intercept, covariate known:

$y_i = \beta_0 + \beta_1 x_i + c_1 z_i u_{1i}$

set obs 1000
gen x = 5*uniform()
gen z = 3 + 1.4*x
gen u1=invnorm(uniform())
gen y = 3 + 1.4*x + 0.50*z*u1
gen ord=_n
xtmixed y x ||ord: z,noc

16


xtmixed y x ||ord: z,noc

Performing EM optimization:

Performing gradient-based optimization:

Iteration 0:   log restricted-likelihood = -2506.6255
numerical derivatives are approximate
flat or discontinuous region encountered
Iteration 1:   log restricted-likelihood = -2503.1832
numerical derivatives are approximate

Garbage!

17

expand 3
sort ord
gen yf=y + .001*invnorm(uniform())
xtmixed yf x ||ord: z,noc nolog





18

. xtmixed yf x ||ord: z,noc nolog



                                                Wald chi2(1)       =    484.13
Log restricted-likelihood =  7952.0717          Prob > chi2        =    0.0000

------------------------------------------------------------------------------
          yf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   1.393598    .063337    22.00   0.000      1.26946    1.517736
       _cons |   3.071291   .1252851    24.51   0.000     2.825737    3.316846
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
ord: Identity                |
                       sd(z) |   .4814862   .0107771        .46082    .5030792
-----------------------------+------------------------------------------------
                sd(Residual) |   .0009896   .0000156      .0009594    .0010207
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) = 31334.04 Prob >= chibar2 = 0.0000

$\beta_0 = 3.0$, $\beta_1 = 1.4$, $c_1 = 0.50$

19




Ex 3: No random intercept (unknown covariate)

$y_i = \beta_0 + \beta_1 x_i + \sigma(\beta_0 + \beta_1 x_i)u_i$

$\phantom{y_i} = \beta_0 + \beta_1 x_i + c_0 u_{0i} + c_1 x_i u_{1i}$

20




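Ex 3: No random intercept (unknown covariate)

$y_i = \beta_0 + \beta_1 x_i + \sigma(\beta_0 + \beta_1 x_i)u_i$

$\phantom{y_i} = \beta_0 + \beta_1 x_i + c_0 u_{0i} + c_1 x_i u_{1i}$

Recast the model with one error term and pretend $z_i = \beta_0 + \beta_1 x_i$ is known. Then iterate.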

21

$y_i = \beta_0 + \beta_1 x_i + \sigma(\beta_0 + \beta_1 x_i)u_i$

$\phantom{y_i} = \beta_0 + \beta_1 x_i + c_1 z_i u_{1i}$

1. Expand and introduce artificial "residual" error term.

. expand 3
. gen yf=y + .001*invnorm(uniform())

22

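$y_i = \beta_0 + \beta_1 x_i + \sigma(\beta_0 + \beta_1 x_i)u_i$

$\phantom{y_i} = \beta_0 + \beta_1 x_i + c_1 z_i u_{1i}$

1. Expand and introduce artificial "residual" error term.
2. Estimate $z_i$ by OLS or other "easy" method.

. expand 3
. gen yf=y + .001*invnorm(uniform())
. reg y x
. predict zh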

23

1. Expand and introduce artificial "residual" error term.
2. Estimate $z_i$ by OLS or other "easy" method.
3. Fit model pretending prediction $\hat z_i$ (zh) is actual $z_i$.

[$z_i = \beta_0 + \beta_1 x_i$ is unknown]

. expand 3
. gen yf=y + .001*invnorm(uniform())
. reg y x
. predict zh
. xtmixed yf x ||ord: zh,noc nolog


24

1. Expand and introduce artificial "residual" error term.
2. Estimate $z_i$ by OLS or other "easy" method.
3. Fit model pretending prediction $\hat z_i$ (zh) is actual $z_i$.
4. Iterate.

[$z_i = \beta_0 + \beta_1 x_i$ is unknown]

. expand 3
. gen yf=y + .001*invnorm(uniform())
. reg y x
. predict zh
. xtmixed yf x ||ord: zh,noc nolog
. drop zh
. predict zh

25

. xtmixed yf x ||ord: zh,noc

------------------------------------------------------------------------------
          yf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |    1.39339   .0633354    22.00   0.000     1.269255    1.517525
       _cons |   3.071699   .1261407    24.35   0.000     2.824467     3.31893
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
ord: Identity                |
                      sd(zh) |   .4764546   .0106645      .4560044    .4978219
-----------------------------+------------------------------------------------
                sd(Residual) |   .0009896   .0000156      .0009594    .0010207
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) = 31334.71 Prob >= chibar2 = 0.0000

$\beta_0 = 3.0$, $\beta_1 = 1.4$, $c_1 = 0.50$

26

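2-level model

$y_{ij} = \beta_0 + \beta_1 x_{ij} + c_1(\beta_0 + \beta_1 x_{ij})u_{1i} + c_2[\beta_0 + \beta_1 x_{ij} + c_1(\beta_0 + \beta_1 x_{ij})u_{1i}]u_{2ij}$

args NS nr be0 be1 c1 c2
drop _all
set obs `NS'
gen id=_n
gen u1 = invnorm(uniform())
expand `nr'
sort id
gen u2=invnorm(uniform())
gen x = 10*uniform()
gen z = `be0' + `be1'*x
gen zi = z + `c1'*z*u1
gen y = zi + `c2'*zi*u2

$E(y_{ij} \mid x_{ij}) = \beta_0 + \beta_1 x_{ij}$ ["z"]

$E(y_{ij} \mid x_{ij}, u_{1i}) = \beta_0 + \beta_1 x_{ij} + c_1(\beta_0 + \beta_1 x_{ij})u_{1i}$ ["zi"]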
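The 2-level model can be checked by simulation. Since $y = z(1 + c_1 u_1)(1 + c_2 u_2)$ with independent standard normal $u_1$ and $u_2$, the marginal CV of y given x works out to $\sqrt{c_1^2 + c_2^2 + c_1^2 c_2^2}$, constant in x. The pure-Python sketch below mirrors the Stata simulation and compares that value with an empirical CV; the function name `simulate_two_level`, the seed, and the sample sizes are assumptions chosen for illustration.

```python
import math
import random


def simulate_two_level(ns, nr, be0, be1, c1, c2, rng):
    """Simulate ns groups with nr observations each.

    u1 is drawn once per group (shared within the group);
    x and u2 are drawn separately for every observation.
    Returns a list of (group, x, z, zi, y) tuples.
    """
    rows = []
    for i in range(ns):
        u1 = rng.gauss(0, 1)
        for _ in range(nr):
            x = 10 * rng.random()
            z = be0 + be1 * x          # E(y | x)
            zi = z + c1 * z * u1       # E(y | x, u1)
            y = zi + c2 * zi * rng.gauss(0, 1)
            rows.append((i, x, z, zi, y))
    return rows


rng = random.Random(1)  # arbitrary seed, for reproducibility only
c1, c2 = 0.2, 0.05
rows = simulate_two_level(20000, 5, 1.0, 1.0, c1, c2, rng)

# y / z = (1 + c1*u1) * (1 + c2*u2), so its CV does not depend on x
ratios = [y / z for (_, _, z, _, y) in rows]
m = sum(ratios) / len(ratios)
sd = math.sqrt(sum((r - m) ** 2 for r in ratios) / (len(ratios) - 1))
cv_emp = sd / m
cv_theory = math.sqrt(c1 ** 2 + c2 ** 2 + (c1 * c2) ** 2)
print(cv_emp, cv_theory)
```

Many more groups are used here than in the slide's example so that the empirical CV is stable; the group count only affects the precision of the check, not the model.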

27

2-level model (example)

[Figure: z_i, z and y plotted against x, with points labeled 1 to 5]

28

//[gen y = zi + `c2'*zi*e]

gen obs=_n
expand 3
sort obs
gen yf = y + .001*invnorm(uniform())

// Iteration 0
xtmixed y x ||id: x,noc nolog
predict zh0
predict uh1i_0,reffects level(id)
gen zhi_0 = zh0 + uh1i_0

// Iteration 1
xtmixed yf x ||id: zh0,noc ||obs: zhi_0,noc nolog
predict zh1
predict uh1i_1,reffects level(id)
gen zhi_1 = zh1 + uh1i_1

// Iteration 2
xtmixed yf x ||id: zh1,noc ||obs: zhi_1,noc nolog
predict zh2
predict uh1i_2,reffects level(id)
gen zhi_2 = zh2 + uh1i_2

// Iteration 3
noi xtmixed yf x ||id: zh2,noc ||obs: zhi_2,noc nolog

29

. run nasug_2008_sim1 20 5 1.0 1.0 .2 .05 6 1

.21211063 .05062864

.21216237 .05076224

.21213417 .05075685

.21213363 .05075686

.21213354 .05075685

.21213353 .05075685

Mixed-effects REML regression                   Number of obs      =       300

-----------------------------------------------------------
                |   No. of       Observations per Group
 Group Variable |   Groups    Minimum    Average    Maximum
----------------+------------------------------------------
             id |       20         15       15.0         15
            obs |      100          3        3.0          3
-----------------------------------------------------------

30

. run nasug_2008_sim1 20 5 1.0 1.0 .2 .05 6 1


------------------------------------------------------------------------------
          yf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   1.087757   .0530034    20.52   0.000     .9838727    1.191642
       _cons |   1.039358   .0535761    19.40   0.000     .9343512    1.144366
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity                 |
                      sd(zh) |   .2121335   .0348399      .1537487    .2926895
-----------------------------+------------------------------------------------
obs: Identity                |
                     sd(zhi) |   .0507568   .0040389      .0434271    .0593237
-----------------------------+------------------------------------------------
                sd(Residual) |   .0009429   .0000471      .0008549      .00104
------------------------------------------------------------------------------
LR test vs. linear regression:       chi2(2) =  2866.48   Prob > chi2 = 0.0000

31

Bayesian Estimation (WINBUGS)

[Figure: posterior density plots, sample 10000 each: be[1], be[2], sigu (c1), sige (c2)]

32

WINBUGS

node   mean    sd        2.5%     median   97.5%     start   sample
be1    1.077   0.04988   0.9847   1.076    1.169     10001   10000
be0    1.030   .05257    0.9317   1.03     1.131     10001   10000
c1     0.217   0.03455   0.1618   0.2127   0.2958    10001   10000
c2     0.064   0.00465   0.0556   0.0637   0.07353   10001   10000

STATA (xtmixed)

------------------------------------------------------------------------------
          yf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   1.087757   .0530034    20.52   0.000     .9838727    1.191642
       _cons |   1.039358   .0535761    19.40   0.000     .9343512    1.144366
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity                 |
                      sd(xb) |   .2121335   .0348399      .1537487    .2926895
-----------------------------+------------------------------------------------
obs: Identity                |
                    sd(muhi) |   .0507568   .0040389      .0434271    .0593237
-----------------------------+------------------------------------------------
