1
Estimation of constant-CV regression models
Alan H. Feiveson
NASA – Johnson Space Center
Houston, TX
SNASUG 2008
Chicago, IL
2
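Variance Models with Simple Linear Regression
$y_i = \beta_0 + \beta_1 x_i + e_i, \quad V(e_i) = \sigma^2$
$CV(y_i) = \dfrac{SD(y_i)}{E(y_i)}$
$y_i = \beta_0 + \beta_1 x_i + e_i, \quad V(e_i) = \sigma^2 x_i^2$
$y_i = \beta_0 + \beta_1 x_i + e_i, \quad V(e_i) = \sigma^2(\beta_0 + \beta_1 x_i)^2$
$y = X\beta + Zu$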
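A one-line derivation shows why the last variance model is called constant-CV (assuming the mean $\beta_0 + \beta_1 x_i$ is positive, as in the examples that follow):

$$SD(y_i) = \sigma(\beta_0 + \beta_1 x_i), \qquad E(y_i) = \beta_0 + \beta_1 x_i \;\Longrightarrow\; CV(y_i) = \frac{SD(y_i)}{E(y_i)} = \sigma .$$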
3
. clear
. set obs 100
. gen x=10*uniform()
. gen mu = 1+.5*x
. gen y=mu+.10*mu*invnorm(uniform())
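Example: $\beta_0 = 1.0$, $\beta_1 = 0.5$, $\sigma = 0.10$
[Figure: scatter plot of the simulated y versus x, 0 ≤ x ≤ 10]
$y_i = \beta_0 + \beta_1 x_i + e_i, \quad V(e_i) = \sigma^2(\beta_0 + \beta_1 x_i)^2$
Problem: Estimate $\beta_0$, $\beta_1$, and $\sigma$.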
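To see the constant-CV property in simulated data, here is a minimal Python sketch of the same design (assumptions: Python's `random` module stands in for Stata's `uniform()` and `invnorm(uniform())`, the function name `simulate` is hypothetical, and n is raised from 100 to 20000 so the sample statistics are stable). The relative errors (y - mu)/mu have about the same SD on both halves of the x range, while the absolute errors grow with the mean.

```python
import random
import statistics

def simulate(n, b0=1.0, b1=0.5, sigma=0.10, seed=1):
    """Hypothetical helper mirroring the slide's commands."""
    rng = random.Random(seed)
    xs, mus, ys = [], [], []
    for _ in range(n):
        x = 10 * rng.random()                          # gen x = 10*uniform()
        mu = b0 + b1 * x                               # gen mu = 1 + .5*x
        xs.append(x)
        mus.append(mu)
        ys.append(mu + sigma * mu * rng.gauss(0, 1))   # y = mu + .10*mu*N(0,1)
    return xs, mus, ys

xs, mus, ys = simulate(20000)
rel_low = [(y - m) / m for x, m, y in zip(xs, mus, ys) if x < 5]
rel_high = [(y - m) / m for x, m, y in zip(xs, mus, ys) if x >= 5]
abs_low = [y - m for x, m, y in zip(xs, mus, ys) if x < 5]
abs_high = [y - m for x, m, y in zip(xs, mus, ys) if x >= 5]
print(statistics.stdev(rel_low), statistics.stdev(rel_high))   # both near sigma
print(statistics.stdev(abs_low), statistics.stdev(abs_high))   # grows with mu
```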
4
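Variance Stabilization
$y_i = \beta_0 + \beta_1 x_i + e_i, \quad V(e_i) = \sigma^2(\beta_0 + \beta_1 x_i)^2$
Delta method: $V(f(y)) \approx [f'(\mu)]^2\, V(y)$, where $\mu = E(y)$
With $f = \log$: $V(\log y_i) \approx \dfrac{V(y_i)}{(\beta_0 + \beta_1 x_i)^2} = \sigma^2$, a constant.
But $E(\log y_i) = g(\beta, x_i)$ is no longer linear in $x_i$.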
5
Approximate $E(\log y_i)$ by a polynomial in $x_i$, then do an OLS regression of $\log y$ on $x$ and $x^2$:
. gen z = log(y)
. gen x2 = x*x
. reg z x x2

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  2,    97) =  630.65
       Model |  15.9635043     2  7.98175216           Prob > F      =  0.0000
    Residual |  1.22767564    97  .01265645            R-squared     =  0.9286
-------------+------------------------------           Adj R-squared =  0.9271
       Total |    17.19118    99  .173648282           Root MSE      =   .1125

------------------------------------------------------------------------------
           z |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .2707448    .018873    14.35   0.000     .2332872    .3082025
          x2 |  -.0110946   .0017622    -6.30   0.000    -.0145921   -.0075972
       _cons |   .1783796   .0441957     4.04   0.000     .0906634    .2660958
------------------------------------------------------------------------------

. predict z_hat
$\beta_0 = 1.0$, $\beta_1 = 0.5$, $\sigma = 0.10$
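The log-scale fit can be sketched in plain Python (assumptions: the helper names `solve3` and `fit_log_quadratic` are hypothetical, `random` replaces Stata's generators, and n = 5000 instead of 100). It solves the 3x3 normal equations for log(y) on (1, x, x^2); the Root MSE lands near sigma = 0.10, slightly inflated by the quadratic's lack of fit near x = 0, while the polynomial coefficients say nothing directly about beta_0 and beta_1.

```python
import math
import random

def solve3(a, b):
    """Solve the 3x3 system a @ beta = b (Gaussian elimination, partial pivoting)."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * 3
    for r in (2, 1, 0):                                # back substitution
        s = m[r][3] - sum(m[r][c] * beta[c] for c in range(r + 1, 3))
        beta[r] = s / m[r][r]
    return beta

def fit_log_quadratic(xs, ys):
    """OLS of log(y) on (1, x, x^2); returns coefficients and Root MSE."""
    rows = [(1.0, x, x * x) for x in xs]
    zs = [math.log(y) for y in ys]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xtz = [sum(r[i] * z for r, z in zip(rows, zs)) for i in range(3)]
    beta = solve3(xtx, xtz)
    resid = [z - sum(b * v for b, v in zip(beta, r)) for r, z in zip(rows, zs)]
    rmse = math.sqrt(sum(e * e for e in resid) / (len(zs) - 3))
    return beta, rmse

rng = random.Random(2008)
xs = [10 * rng.random() for _ in range(5000)]
# y = mu*(1 + 0.1*N(0,1)) with mu >= 1, so y <= 0 is practically impossible
ys = [(1.0 + 0.5 * x) * (1 + 0.10 * rng.gauss(0, 1)) for x in xs]
beta, rmse = fit_log_quadratic(xs, ys)
print([round(b, 4) for b in beta], round(rmse, 4))
```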
6
[Figure: log(y) and fitted z_hat versus x]
But what about $\beta_0$ and $\beta_1$?
7
Alternative: Iteratively re-weighted regression
reg y x
predict muh
gen wt = 1/muh^2
reg y x [aw=wt]                      // analytic weights 1/muh^2
local rmse = e(rmse)
summ wt
local wbar = r(mean)
local sigh = sqrt(`wbar')*`rmse'     // estimate of sigma
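The re-weighting loop and the final rescaling can be sketched in Python (assumptions: `wls_line` and `irls_constant_cv` are hypothetical names, a fixed 10 iterations replaces a convergence check, and the data are simulated with n = 20000). The last step reproduces `sqrt(wbar)*rmse`: Stata rescales analytic weights to sum to N, so its Root MSE uses w_i/wbar, and multiplying by sqrt(wbar) recovers sqrt(sum(w_i r_i^2)/(N-2)).

```python
import math
import random

def wls_line(xs, ys, ws):
    """Weighted least-squares fit of y = b0 + b1*x."""
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def irls_constant_cv(xs, ys, n_iter=10):
    """Iteratively re-weighted regression for V(y_i) = sigma^2 (b0 + b1*x_i)^2.

    Iteration 0 is plain OLS; each later fit uses weights 1/muhat_i^2
    computed from the previous fit.
    """
    n = len(xs)
    ws = [1.0] * n
    for _ in range(n_iter):
        b0, b1 = wls_line(xs, ys, ws)
        ws = [1.0 / (b0 + b1 * x) ** 2 for x in xs]
    b0, b1 = wls_line(xs, ys, ws)
    resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    wbar = sum(ws) / n
    # Root MSE as Stata reports it with aweights rescaled to sum to n
    rmse_aw = math.sqrt(sum((w / wbar) * r * r for w, r in zip(ws, resid)) / (n - 2))
    sigma_hat = math.sqrt(wbar) * rmse_aw      # undo the rescaling
    return b0, b1, sigma_hat

rng = random.Random(12)
xs = [10 * rng.random() for _ in range(20000)]
ys = [(1.0 + 0.5 * x) * (1 + 0.10 * rng.gauss(0, 1)) for x in xs]
b0, b1, sigma_hat = irls_constant_cv(xs, ys)
print(round(b0, 3), round(b1, 3), round(sigma_hat, 4))
```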
8
ITERATION 0
. reg y x

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  1,    98) =  832.95
       Model |  172.307709     1  172.307709           Prob > F      =  0.0000
    Residual |   20.272763    98  .206864928           R-squared     =  0.8947
-------------+------------------------------           Adj R-squared =  0.8937
       Total |  192.580472    99  1.94525729           Root MSE      =  .45482

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |    .518296   .0179585    28.86   0.000      .482658    .5539339
       _cons |   .9530661   .1014154     9.40   0.000     .7518106    1.154322
------------------------------------------------------------------------------

. gen wt = 1/(_b[_cons] + _b[x]*x)^2
$w_i = 1/(0.953 + 0.518\,x_i)^2$
9
ITERATION 1
. reg y x [w=wt]
(analytic weights assumed)
(sum of wgt is 1.3046e+01)

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  1,    98) = 1278.10
       Model |  123.561274     1  123.561274           Prob > F      =  0.0000
    Residual |  9.47421712    98  .096675685           R-squared     =  0.9288
-------------+------------------------------           Adj R-squared =  0.9281
       Total |  133.035492    99  1.34379284           Root MSE      =  .31093

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |    .510555    .014281    35.75   0.000     .4822147    .5388952
       _cons |   .9877915   .0533903    18.50   0.000     .8818402    1.093743
------------------------------------------------------------------------------

. replace wt = 1/(_b[_cons] + _b[x]*x)^2
(100 real changes made)
$w_i = 1/(0.988 + 0.511\,x_i)^2$
10
ITERATION 2
. reg y x [w=wt]
(analytic weights assumed)
(sum of wgt is 1.2842e+01)

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  1,    98) = 1271.77
       Model |  124.885941     1  124.885941           Prob > F      =  0.0000
    Residual |  9.62343467    98  .098198313           R-squared     =  0.9285
-------------+------------------------------           Adj R-squared =  0.9277
       Total |  134.509376    99  1.35868057           Root MSE      =  .31337

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |    .510746   .0143219    35.66   0.000     .4823247    .5391674
       _cons |   .9870233   .0540361    18.27   0.000     .8797904    1.094256
------------------------------------------------------------------------------

. replace wt = 1/(_b[_cons] + _b[x]*x)^2
(100 real changes made)
$w_i = 1/(0.987 + 0.511\,x_i)^2$
11
ITERATION 3
. noi reg y x [w=wt]
(analytic weights assumed)
(sum of wgt is 1.2846e+01)

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  1,    98) = 1271.92
       Model |  124.855967     1  124.855967           Prob > F      =  0.0000
    Residual |   9.6200215    98  .098163485           R-squared     =  0.9285
-------------+------------------------------           Adj R-squared =  0.9277
       Total |  134.475988    99  1.35834331           Root MSE      =  .31331

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .5107417   .0143209    35.66   0.000     .4823223    .5391612
       _cons |   .9870406   .0540213    18.27   0.000     .8798371    1.094244
------------------------------------------------------------------------------

. summ wt

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          wt |       100    .1284623    .1209472   .0269917   .5841107

. local wbar=r(mean)
. noi di e(rmse)*sqrt(`wbar')
.11229563

$\beta_0 = 1.0$, $\beta_1 = 0.5$, $\sigma = 0.10$
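The last two commands undo Stata's rescaling of analytic weights to sum to $N$. Writing $r_i$ for the residuals and $\bar w$ for the mean weight, and plugging in the iteration-3 numbers printed above:

$$\hat\sigma^2 = \frac{1}{N-2}\sum_i w_i r_i^2 = \bar w \cdot \frac{1}{N-2}\sum_i \frac{w_i}{\bar w}\, r_i^2 = \bar w \cdot e(\mathrm{rmse})^2$$

$$\hat\sigma = 0.31331 \times \sqrt{0.1284623} \approx 0.31331 \times 0.35842 \approx 0.1123$$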
12
Can we do this using -xtmixed- ?
. xtmixed y x ||???: x
• How do we get -xtmixed- to estimate a non-constant residual variance?
• Degenerate dependency of random effects ($u_{0i} = u_{1i}$).
• Coefficients of random intercept and slope ($c_0$ and $c_1$) need to be constrained.
$y_i = \beta_0 + \beta_1 x_i + \sigma(\beta_0 + \beta_1 x_i)u_i$
$\quad = \beta_0 + \beta_1 x_i + c_0 u_{0i} + c_1 x_i u_{1i}$
where $u_{0i} = u_{1i}$ and $c_1/c_0 = \beta_1/\beta_0$
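Assuming $u_{0i}$ and $u_{1i}$ are standard normal, the two versions of the error term imply different variance functions, which is why the dependency and the constraint matter:

$$u_{0i},\,u_{1i}\ \text{independent:}\qquad V(y_i \mid x_i) = c_0^2 + c_1^2 x_i^2$$

$$u_{0i} = u_{1i} = u_i,\ c_0 = \sigma\beta_0,\ c_1 = \sigma\beta_1:\qquad V(y_i \mid x_i) = (c_0 + c_1 x_i)^2 = \sigma^2(\beta_0 + \beta_1 x_i)^2$$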
13
Can we do this using -xtmixed- ?
How do we get -xtmixed- to estimate a non-constant residual variance?
Ex. 1: Ignore dependency of u's and constraints on c's:
$y_i = \beta_0 + \beta_1 x_i + c_0 u_{0i} + c_1 x_i u_{1i}$
set obs 1000
gen x = 5*uniform()
gen mu = 3+1.4*x
gen u0=invnorm(uniform())
gen u1=invnorm(uniform())
gen y = mu + 0.05*u0 + 0.50*x*u1
gen ord=_n
xtmixed y x ||ord: x,noc
14
. xtmixed y x ||ord: x,noc nolog

Mixed-effects REML regression                   Number of obs      =      1000
Group variable: ord                             Number of groups   =      1000

                                                Obs per group: min =         1
                                                               avg =       1.0
                                                               max =         1

                                                Wald chi2(1)       =   5745.72
Log restricted-likelihood = -1444.8061          Prob > chi2        =    0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   1.404595   .0185301    75.80   0.000     1.368276    1.440913
       _cons |     3.0182   .0116741   258.54   0.000      2.99532    3.041081
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
ord: Identity                |
                       sd(x) |   .5058616   .0118386      .4831824    .5296053
-----------------------------+------------------------------------------------
                sd(Residual) |   .0495162   .0143015      .0281125    .0872159
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) =   718.85 Prob >= chibar2 = 0.0000

$\beta_0 = 3.0$, $\beta_1 = 1.4$, $c_0 = 0.05$, $c_1 = 0.50$
15
Can we do this using -xtmixed- ?
How do we get -xtmixed- to estimate a non-constant residual variance?
Ex. 2: No random intercept, covariate known:
$y_i = \beta_0 + \beta_1 x_i + c_1 z_i u_{1i}$
set obs 1000
gen x = 5*uniform()
gen z = 3 + 1.4*x
gen u1=invnorm(uniform())
gen y = 3 + 1.4*x + 0.50*z*u1
gen ord=_n
xtmixed y x ||ord: z,noc
16
Can we do this using -xtmixed- ?
How do we get -xtmixed- to estimate a non-constant residual variance?
Ex. 2: No random intercept, covariate known:
$y_i = \beta_0 + \beta_1 x_i + c_1 z_i u_{1i}$
xtmixed y x ||ord: z,noc

Performing EM optimization:
Performing gradient-based optimization:
Iteration 0:   log restricted-likelihood = -2506.6255
numerical derivatives are approximate
flat or discontinuous region encountered
Iteration 1:   log restricted-likelihood = -2503.1832
numerical derivatives are approximate

Garbage!
17
Can we do this using -xtmixed- ?
How do we get -xtmixed- to estimate a non-constant residual variance?
Ex. 2: No random intercept, covariate known:
$y_i = \beta_0 + \beta_1 x_i + c_1 z_i u_{1i}$
expand 3
sort ord
gen yf=y + .001*invnorm(uniform())
xtmixed yf x ||ord: z,noc nolog
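A small Python sketch suggests why the expand trick helps (assumptions: Python's `random` replaces Stata's generators, the variable names are hypothetical, and the coefficients 3 and 1.4 are treated as known here, which -xtmixed- itself does not need). The three copies of an original observation share one draw of u1, so the spread within a group reflects only the artificial .001 noise, while the group means carry the heteroskedastic term $c_1 z_i u_{1i}$.

```python
import random
import statistics

rng = random.Random(7)
n_groups, n_copies = 1000, 3
groups = []                                       # (x, z, copies of yf) per original obs
for _ in range(n_groups):
    x = 5 * rng.random()
    z = 3 + 1.4 * x
    y = 3 + 1.4 * x + 0.50 * z * rng.gauss(0, 1)  # one u1 per original observation
    # expand 3, then add the small artificial "residual" to each copy
    copies = [y + 0.001 * rng.gauss(0, 1) for _ in range(n_copies)]
    groups.append((x, z, copies))

# Within-group spread sees only the artificial noise (about .001) ...
pooled_var = sum(statistics.variance(c) for _, _, c in groups) / n_groups
within_sd = pooled_var ** 0.5
# ... while the scaled group means recover c1 (about 0.50).
scaled = [(statistics.fmean(c) - 3 - 1.4 * x) / z for x, z, c in groups]
c1_hat = (sum(s * s for s in scaled) / n_groups) ** 0.5
print(round(within_sd, 5), round(c1_hat, 3))
```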
18
. xtmixed yf x ||ord: z,noc nolog

Mixed-effects REML regression                   Number of obs      =      3000
Group variable: ord                             Number of groups   =      1000

                                                Obs per group: min =         3
                                                               avg =       3.0
                                                               max =         3

                                                Wald chi2(1)       =    484.13
Log restricted-likelihood =  7952.0717          Prob > chi2        =    0.0000

------------------------------------------------------------------------------
          yf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   1.393598    .063337    22.00   0.000      1.26946    1.517736
       _cons |   3.071291   .1252851    24.51   0.000     2.825737    3.316846
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
ord: Identity                |
                       sd(z) |   .4814862   .0107771        .46082    .5030792
-----------------------------+------------------------------------------------
                sd(Residual) |   .0009896   .0000156      .0009594    .0010207
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) = 31334.04 Prob >= chibar2 = 0.0000

$\beta_0 = 3.0$, $\beta_1 = 1.4$, $c_1 = 0.50$
19
Can we do this using -xtmixed- ?
Ex 3: No random intercept (unknown covariate)
20
Can we do this using -xtmixed- ?
Recast the model with one error term and pretend $z_i = \beta_0 + \beta_1 x_i$ is known. Then iterate.
Ex 3: No random intercept (unknown covariate)
21
Can we do this using -xtmixed- ?
$y_i = \beta_0 + \beta_1 x_i + \sigma(\beta_0 + \beta_1 x_i)u_i$
$\quad = \beta_0 + \beta_1 x_i + c_1 z_i u_{1i}$
1. Expand and introduce an artificial “residual” error term.
. expand 3
. gen yf=y + .001*invnorm(uniform())
22
Can we do this using -xtmixed- ?
2. Estimate $z_i$ by OLS or another “easy” method.
. reg y x
. predict zh
23
Can we do this using -xtmixed- ?
3. Fit the model pretending the prediction $\hat z_i$ (zh) is the actual $z_i$.
. xtmixed yf x ||ord: zh,noc nolog
$y_i = \beta_0 + \beta_1 x_i + c_1 z_i u_{1i}$   [$z_i = \beta_0 + \beta_1 x_i$ is unknown]
24
Can we do this using -xtmixed- ?
4. Iterate:
. drop zh
. predict zh
25
. xtmixed yf x ||ord: zh,noc

Mixed-effects REML regression                   Number of obs      =      3000
Group variable: ord                             Number of groups   =      1000

                                                Obs per group: min =         3
                                                               avg =       3.0
                                                               max =         3

                                                Wald chi2(1)       =    484.01
Log restricted-likelihood =  7952.4049          Prob > chi2        =    0.0000

------------------------------------------------------------------------------
          yf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |    1.39339   .0633354    22.00   0.000     1.269255    1.517525
       _cons |   3.071699   .1261407    24.35   0.000     2.824467     3.31893
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
ord: Identity                |
                      sd(zh) |   .4764546   .0106645      .4560044    .4978219
-----------------------------+------------------------------------------------
                sd(Residual) |   .0009896   .0000156      .0009594    .0010207
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) = 31334.71 Prob >= chibar2 = 0.0000

$\beta_0 = 3.0$, $\beta_1 = 1.4$, $c_1 = 0.50$
26
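args NS nr be0 be1 c1 c2
drop _all
set obs `NS'
gen id=_n
gen u1 = invnorm(uniform())
expand `nr'
sort id
gen u2=invnorm(uniform())
gen x = 10*uniform()
gen z = `be0' + `be1'*x
gen zi = z + `c1'*z*u1
gen y = zi + `c2'*zi*u2

2-level model:
$y_{ij} = \beta_0 + \beta_1 x_{ij} + c_1(\beta_0 + \beta_1 x_{ij})u_{1i} + c_2[\beta_0 + \beta_1 x_{ij} + c_1(\beta_0 + \beta_1 x_{ij})u_{1i}]u_{2ij}$
where $E(y_{ij} \mid x_{ij}) = \beta_0 + \beta_1 x_{ij}$ [“z”] and $E(y_{ij} \mid x_{ij}, u_{1i}) = \beta_0 + \beta_1 x_{ij} + c_1(\beta_0 + \beta_1 x_{ij})u_{1i}$ [“zi”]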
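A Python transcription of the simulation above makes the two multiplicative error layers explicit (assumptions: Python's `random` replaces `uniform()` and `invnorm(uniform())`, the function name `sim_two_level` is hypothetical, and NS is raised from 20 to 4000 so the checks are stable). Because zi/z - 1 equals c1*u1 and y/zi - 1 equals c2*u2, their standard deviations should come out near c1 and c2.

```python
import random
import statistics

def sim_two_level(ns, nr, be0, be1, c1, c2, seed=0):
    """Hypothetical helper: one u1 per id, one u2 per observation."""
    rng = random.Random(seed)
    rows = []                                  # (id, x, z, zi, y)
    for i in range(ns):
        u1 = rng.gauss(0, 1)                   # drawn before -expand-: shared within id
        for _ in range(nr):
            u2 = rng.gauss(0, 1)               # drawn after -expand-: one per row
            x = 10 * rng.random()
            z = be0 + be1 * x                  # E(y | x)
            zi = z + c1 * z * u1               # E(y | x, u1)
            y = zi + c2 * zi * u2
            rows.append((i, x, z, zi, y))
    return rows

nr = 5
rows = sim_two_level(ns=4000, nr=nr, be0=1.0, be1=1.0, c1=0.2, c2=0.05, seed=2008)
rel_level2 = [r[3] / r[2] - 1 for r in rows[::nr]]   # zi/z - 1, first row of each id
rel_level1 = [r[4] / r[3] - 1 for r in rows]         # y/zi - 1, every row
print(statistics.stdev(rel_level2), statistics.stdev(rel_level1))
```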
27
2-level model (example)
[Figure: simulated y, z, and z_i plotted against x, 0 ≤ x ≤ 10]
28
//[gen y = zi + `c2'*zi*e]
gen obs=_n
expand 3
sort obs
gen yf = y + .001*invnorm(uniform())
xtmixed y x ||id: x,noc nolog                                   // iteration 0
predict zh0
predict uh1i_0,reffects level(id)
gen zhi_0 = zh0 + uh1i_0
xtmixed yf x ||id: zh0,noc ||obs: zhi_0,noc nolog               // iteration 1
predict zh1
predict uh1i_1,reffects level(id)
gen zhi_1 = zh1 + uh1i_1
xtmixed yf x ||id: zh1,noc ||obs: zhi_1,noc nolog               // iteration 2
predict zh2
predict uh1i_2,reffects level(id)
gen zhi_2 = zh2 + uh1i_2
noi xtmixed yf x ||id: zh2,noc ||obs: zhi_2,noc nolog           // iteration 3
29
. run nasug_2008_sim1 20 5 1.0 1.0 .2 .05 6 1
.21211063 .05062864
.21216237 .05076224
.21213417 .05075685
.21213363 .05075686
.21213354 .05075685
.21213353 .05075685
Mixed-effects REML regression                   Number of obs      =       300

-----------------------------------------------------------
                |   No. of       Observations per Group
 Group Variable |   Groups    Minimum    Average    Maximum
----------------+------------------------------------------
             id |       20         15       15.0         15
            obs |      100          3        3.0          3
-----------------------------------------------------------
30
                                                Wald chi2(1)       =    421.17
Log restricted-likelihood =  1038.0836          Prob > chi2        =    0.0000

------------------------------------------------------------------------------
          yf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   1.087757   .0530034    20.52   0.000     .9838727    1.191642
       _cons |   1.039358   .0535761    19.40   0.000     .9343512    1.144366
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity                 |
                      sd(zh) |   .2121335   .0348399      .1537487    .2926895
-----------------------------+------------------------------------------------
obs: Identity                |
                     sd(zhi) |   .0507568   .0040389      .0434271    .0593237
-----------------------------+------------------------------------------------
                sd(Residual) |   .0009429   .0000471      .0008549      .00104
------------------------------------------------------------------------------
LR test vs. linear regression:       chi2(2) =  2866.48   Prob > chi2 = 0.0000
31
Bayesian Estimation (WINBUGS)
[Figure: posterior density plots, 10,000 samples each, for be[1], be[2], sigu (c1), and sige (c2)]
32
WINBUGS
node   mean    sd       2.5%     median   97.5%    start   sample
be1    1.077   0.04988  0.9847   1.076    1.169    10001   10000
be0    1.030   .05257   0.9317   1.03     1.131    10001   10000
c1     0.217   0.03455  0.1618   0.2127   0.2958   10001   10000
c2     0.064   0.00465  0.0556   0.0637   0.07353  10001   10000

STATA (xtmixed)
------------------------------------------------------------------------------
          yf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   1.087757   .0530034    20.52   0.000     .9838727    1.191642
       _cons |   1.039358   .0535761    19.40   0.000     .9343512    1.144366
------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity                 |
                      sd(xb) |   .2121335   .0348399      .1537487    .2926895
-----------------------------+------------------------------------------------
obs: Identity                |
                    sd(muhi) |   .0507568   .0040389      .0434271    .0593237
-----------------------------+------------------------------------------------