ad-a168 on the use of ridge and stein-type ...ad-a168 349 on the use of ridge and stein-type...

29
AD-A168 349 ON THE USE OF RIDGE AND STEIN-TYPE ESTIMATORS IN i/i PREDICTION(U) STANFORD UNIV CR DEPT OF STATISTICS A E GELFAND 21 MAY 86 TR-374 N@8814-86-K-@i56 UNCLASSIFIED F/G 12/1

Upload: others

Post on 02-Sep-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: AD-A168 ON THE USE OF RIDGE AND STEIN-TYPE ...AD-A168 349 ON THE USE OF RIDGE AND STEIN-TYPE ESTIMATORS IN i/i PREDICTION(U) STANFORD UNIV CR DEPT OF STATISTICS A E GELFAND 21 MAY

AD-A168 349 ON THE USE OF RIDGE AND STEIN-TYPE ESTIMATORS IN i/iPREDICTION(U) STANFORD UNIV CR DEPT OF STATISTICSA E GELFAND 21 MAY 86 TR-374 N@8814-86-K-@i56

UNCLASSIFIED F/G 12/1

Page 2: AD-A168 ON THE USE OF RIDGE AND STEIN-TYPE ...AD-A168 349 ON THE USE OF RIDGE AND STEIN-TYPE ESTIMATORS IN i/i PREDICTION(U) STANFORD UNIV CR DEPT OF STATISTICS A E GELFAND 21 MAY

1- 128 112

MtWGP __SLU -E

Page 3: AD-A168 ON THE USE OF RIDGE AND STEIN-TYPE ...AD-A168 349 ON THE USE OF RIDGE AND STEIN-TYPE ESTIMATORS IN i/i PREDICTION(U) STANFORD UNIV CR DEPT OF STATISTICS A E GELFAND 21 MAY

SO NON THE USE OF RIDGE AND STEIN-TYPE ESTIMATORS IN PREDICTION

BY

0)ALAN E. GELFAIID

(0 TECHNICAL REPORT N0. 374

SMAY 21, 1986

a PREPARED UNDER CONTRACT

N00014-86-K-0156 (NR-042-267)

FOR THE OFFICE OF NAVAL RESEARCH

Reproduction In Whole or In Part is Permitted

for any purpose of the United States Government

Approved for public release; distribution unlimited.

DEPARTMENT OF STATISTICS DTICSTANFORD UNIVERSITY ELECTE

*C.STANFORD, CALIFORNIA JUN9 W

*

99

4 86 6 9 07

Page 4: AD-A168 ON THE USE OF RIDGE AND STEIN-TYPE ...AD-A168 349 ON THE USE OF RIDGE AND STEIN-TYPE ESTIMATORS IN i/i PREDICTION(U) STANFORD UNIV CR DEPT OF STATISTICS A E GELFAND 21 MAY

i.4

* :-, ON THE USE OF RIDGE AND STEIN-TYPE ESTIMATORS IN PREDICTION

BY

ALAN E. GELFANDS':

TECHNICAL REPORT NO. 374z.'. MAY 21, 1986

Prepared Under Contract

N00014-86-K-0156 (NR-042-267)

For the Office of Naval Research

Herbert Solomon, Project Director

Reproduction ui Whole or in Part is Permittedfor any purpose of the United States Government

Approved for public release; distribution unlimited.

DTICr!ECTE

" JUN 9 1986.9-. DEPARTMENT OF STATISTICS w

-.. , STANFORD UNIVERSITY

STANFORD, CALIFORNIA B4'..#m

Page 5: AD-A168 ON THE USE OF RIDGE AND STEIN-TYPE ...AD-A168 349 ON THE USE OF RIDGE AND STEIN-TYPE ESTIMATORS IN i/i PREDICTION(U) STANFORD UNIV CR DEPT OF STATISTICS A E GELFAND 21 MAY

ON THE USE OF RIDGE AND STEIN-TYPE ESTIMATORS IN PREDICTION

Alan E. Gelfand

1. Introduction

For the usual regression model with fixed regressors,

Y = X8 + C, Ynxl' Xnxp full rank, BpxI and cnxl - (0,021),

there is considerable literature devoted to alternatives to

the ordinary least squares estimator, BOLS of B. From work

'P. -originally dating to Stein (1956) and James and Stein (1961)

when c is normally distributed and p _ 3, 8OLS is inadmissible

under loss (8-0) Q(6-8), Q an unrestricted positive definite

matrix. Thus, much of this extant discussion focuses on the

development of biased estimators with small "variances" which

achieve a smaller expected loss either uniformly over p-dimen-

sional Euclidean space or at least in the vicinity of some speci-

fied B*. Two "classes" of.such reduced variance regression

estimators are particularly well discussed - ridge estimators

and Stein-type estimators. Either directly or upon orthogonal

transformation these estimators take the form

(1) = Co + (I-C)B'-,.,C OLS

where C is a diagonal matrix, usually data dependent. They may

also be seen to be Bayes or "Empirical" Bayes procedures as well.

The review paper by Draper and Van Nostrand (1979) provides an

-S-

F: { . < "- . ,,' _ZK_ ..* .v.-. .

4. .rN_"

Page 6: AD-A168 ON THE USE OF RIDGE AND STEIN-TYPE ...AD-A168 349 ON THE USE OF RIDGE AND STEIN-TYPE ESTIMATORS IN i/i PREDICTION(U) STANFORD UNIV CR DEPT OF STATISTICS A E GELFAND 21 MAY

!' 32

excellent summary of both the theoretical and simulated effort

in this area. . In the context of cross-validation, i.e. of

examining the performance of an estimator obtained in one

sample in prediction in a second independent sample, the work

of Stone (1974) leads to estimators of the form in (1) as well.

Herein we consider the simplest such cross-validation

problem. At a new vector of predictor values, X0 , we seek to

estimate XTB. We take as loss function (6 (Y)-XTB)2 for an0 0

estimator 6(Y) and we assume henceforth that c is normally dis-

tributed with a2 unknown. Our problem differs from that of

esti.mating the vector B since the results of Cohen (1965) showT^ T

that aXoBOLS is an admissible estimator of X 0 for 0 < a* _,

i.e. the UMVU estimator is admissible. (In fact, 6(Y) of the

form yTY is admissible for X T i.f.f.0

T -1T( Tx-1 <lXTx)-I(2y-X(X X) ). X < X2OO. ) Nonetheless,

if we have some confidence in B', i.e. that B0 is near the true

value B, then it makes sense to attempt to improve upon XO0OLS

in the "vicinity of 0*" using estimators of the form (1). More

specifically, how well do the "classes" of ridge estimators and

of Stein-type estimators perform in this prediction? Can we

make a "best" choice within these classes for a particular

prediction?

The problem of prediction of an independent observation Y

at X0 using the loss (6(Y)-Y 0) is equivalent to that of pre-

dicting XT8, i.e. E (6(Y)-Y 0 )2 C2 + ET(6(Y)-X 2

iVe p 4(W)

Page 7: AD-A168 ON THE USE OF RIDGE AND STEIN-TYPE ...AD-A168 349 ON THE USE OF RIDGE AND STEIN-TYPE ESTIMATORS IN i/i PREDICTION(U) STANFORD UNIV CR DEPT OF STATISTICS A E GELFAND 21 MAY

t

3

For an estimator of the form XS, the expected loss be-

comes

(2) E B-0)T C -T E EX0iC I-0i) 20.

In the sequel we take the generalized ridge estimator BR to be

A-(XTX AlXy

(3) (XTy+ABI)"

where A is p.d. symmetric and possibly dependent on Y. We take

the general Stein-type estimator BS to be

(4) B3 (1- c/Q)BOLS + c/Q B0

' where Q - (BOLS-BU) TxTx(BOLS-) and c may depend on Y. In

practice c/Q is usually replaced by min(c/Q, 1).

In section 2 we calculate the risK (2), of the estimators

(3) and (4) when A, c are constant. We then investigate "best"

choices for A, c. Since these choices will be functions of

and a' as well as X0 , A and c must be estimated from Y. In section

3 we summarize a simulation study which compares the performance

of versions of (3) and (4) which are discussed for the estimation

of B along with others motivated by work in section 2. In section

4 we offer concluding remarks in particular with regard to

multiple prediction.

* -

Page 8: AD-A168 ON THE USE OF RIDGE AND STEIN-TYPE ...AD-A168 349 ON THE USE OF RIDGE AND STEIN-TYPE ESTIMATORS IN i/i PREDICTION(U) STANFORD UNIV CR DEPT OF STATISTICS A E GELFAND 21 MAY

4

2. Theoretical Results

We first .note that for 0OLS (2) becomes

(5) a2XT (XTX) lX .

We now claim that

Theorem 1: For 8R as in (3), (2) becomes

a 2xT(TX+A)-IXTX(XTx+A)-I0

(6)+TTX+A)-l T T+ x-A(-*)(B*) x +A)-

Proof: We transform to principal components form. Let R be

nonsingular such that RXT XRT = I, RART = D, D a diagonal matrix

with diagonal entries di. For any point 0 in p-dimensional'V*

Euclidean space, let a - (R-1)T0. Then

OR (R-)T O a + (I+D) Da*' (I+D)R 0OLS + (ID,

i.e. of the form (1) with C - (I+D)-1 In terms of a, (2)

becomes Ea(a-a)w0w0 (a-a), w0 - RX. Since a - N(a,o 2 I), this

expectation is readily calculated to be

Ii, _

ai 2. TC2. + wT(IC)(a-ca*)(saa)T(IC)w0

Substitution for a, w0 and C yields (6). 0

/

m :m :W 4:2--Kh *'Z-KZ

Page 9: AD-A168 ON THE USE OF RIDGE AND STEIN-TYPE ...AD-A168 349 ON THE USE OF RIDGE AND STEIN-TYPE ESTIMATORS IN i/i PREDICTION(U) STANFORD UNIV CR DEPT OF STATISTICS A E GELFAND 21 MAY

5

Note: Normality is not employed in this calculation.

In (3) A.is usually taken to be diagonal and, in fact,

the class of ridge (as opposed to generalized ridge) estimators

sets A = al, a > 0. The case where either by design or trans-

formation xTx = I reduces (5) to a 2EX and reduces (6), for

generalized ridge estimators (ai are the diagonal elements of

the diagonal matrix A) to

(7) a2 EX2 1 2 + ]EX () -a 1)2

(l+ai ) i

Investigation of this expression reveals that an optimal

choice for the ai to minimize (7) needn't exist although local

minima can be found. In the case of ridge estimation, i.e.

all a. = a, a unique minimum can be found. This occurs at

•2 - 2 T

(8) a0 = 0 0 X 0

where y = xT(8-'). Note that a > 0 and finite provided 8-8k

isn't orthogonal to XO . The associated minimum equals

2 2Toy XoX02 2 T

0 0

~2.~2T-" T

When 6 is such that 8-8* is orthogonal to X0 , then X 8 predicts

perfectly. For such 8's we can obtain zero expected loss and

q

-'

Page 10: AD-A168 ON THE USE OF RIDGE AND STEIN-TYPE ...AD-A168 349 ON THE USE OF RIDGE AND STEIN-TYPE ESTIMATORS IN i/i PREDICTION(U) STANFORD UNIV CR DEPT OF STATISTICS A E GELFAND 21 MAY

would want no weight attached to 8 OLS' I.e. would want a = -.

In fact, it is clear that for X0 ,8 fixed there will be a set

of B' 's which predict X perfectly and that 8' needn't be0

close to B in Euclidean distance. Thus the appropriate pseudo-

metric for the prediction problem is (0 1-0 2 )TX 0X T(1I-2). This

pseudometric clarifies the earlier notion of "vicinity of B"'

and under this distance the further V' is from 8 the closer a

is to 0, i.e. the more weight is placed on BOLS, the closer 0*

is to 0 the larger a becomes, i.e. the more weight is placed

on- o . As would be expected, a 0 is invariant to scaling of Xo,

although the risk clearly isn't.T

Using (8) our estimator of X0 is

T = (la) XOBOLS + (l+a 0 ) a0X0B'a 0

and, in fact, for any fixed a > 0, Ta improves upon XTa2 o2a_1 O

whenever y < 2 a (2+a).

,From (8) a convenient estimator of a0 is:

(9) o Y XTX0. 0 0

when 02 is the usual UMVU estimator of a2 and X TY 0 (SOLS-)-

The fact that Ey- 2 doesn't exist suggests that a will be very

dwunstable and that T will perform poorly. We return to thisa0untabl andthatTa0^2

point in the discussion of the simulation study. Since a is

t . ~

Page 11: AD-A168 ON THE USE OF RIDGE AND STEIN-TYPE ...AD-A168 349 ON THE USE OF RIDGE AND STEIN-TYPE ESTIMATORS IN i/i PREDICTION(U) STANFORD UNIV CR DEPT OF STATISTICS A E GELFAND 21 MAY

7A A A TA

independent of 8 0LS and a0 depends on O only through XOOLS,we may compute, the expected loss for Ta0 If [2 = a XT, then

2 ^ 2 A2 T 0A 0y - N(Y,) and, with T= a X X0, (2) for T- becomes

" 2 2 (T; )2 +2r ; Y. 2_ 2-r 2( r2(10) E Y Y)2 T2 +E (A2 (22 2 )A )-

-. Y+T (YY Ty

The equality (10) is seen using the identity E yf(y)(Y-y),a-..1

SA 2E () (Stein (1973)) valid provided E IAf(Y) which,," Y jhc

as the following calculations show, is the case). Now^22 2 2 2 2 2 2Y P2r X1 , y

2 /2T independent of T /2 ~ X Hence•2A .12 -n2Ll2

(y^ 1^2 )tI-3--,-p) where L - Po(y2/2t2). The expectation

of each term in (10) can thus be evaluated and (10) becomes

(11 T (L(n-)E~-P+L-13)- I(n-p+2) (2L+I) -n-p-2L+l I ,.(ii 2{L(n-)E(n-+2 3)I[(n-p+2L+5) n-p+2L+l~ j

-.. 2-,' :If we divide (11) by T . i.e. consider the risk relative

to that of OLS, then this relative risk is a function of2 2 2

aY /" T. Hence we set T 1 and examine the simpler estimator* A2 3

( +1)-1 3 which may be thought of as an "empirical" Ba-yes estimator

against a normal prior centered at 0, adjusted to have no singulariti

in RI . The risk of this estimator is readily obtained to be

1 + E C +1)-(3y -2) by an argument similar to that leading toY(10). This risk (symmetric about 0) is graphed in Figure I against

% > 0 to illustrate what may be expected, up to scaling, if (11)

is evaluated. Note that the risk is bounded and considerably

.,- , .- -

Page 12: AD-A168 ON THE USE OF RIDGE AND STEIN-TYPE ...AD-A168 349 ON THE USE OF RIDGE AND STEIN-TYPE ESTIMATORS IN i/i PREDICTION(U) STANFORD UNIV CR DEPT OF STATISTICS A E GELFAND 21 MAY

8

less than I for y small. Because (2+)-23has sngularites

in the complex plane it is not admissible.

If we restore xTx, not necessarily diagonal, our estimator

T ^Tin (3) has A = a0X X or a0 X X according to (8) or (9).

Theorem 2: For BS as in (4) with p > 2, (2) becomes

(12) a X0 (XTX) -x 0 + X0T(XX)-1I0( +4co2)r /a2- 2cr2)

where

r E 2L+l r =E11 (p+2(L+M))(p+2(L+M)-2) ' 2 E p+2(L+M)-2

with

L - Po(X) , /2o 2xT(xTx)-IX

(13)

M- Po(6) , 6-- (AXT(XTX)- X0 _ 2 )/ 2 O2 Xo(XTX)-l

where L,M independent and A ( -*) T xTx(0-0).

Proof: As in Theorem 1, let R be nonsingular such that

XT T ~1 T 2Rx = I and let a = (R )T(SoLS-B*). Then a - N(a,o I) with-1,

.(a = (B-)T(8), Q = a a, and (2) becomes

P .

IK-V

g:N

Page 13: AD-A168 ON THE USE OF RIDGE AND STEIN-TYPE ...AD-A168 349 ON THE USE OF RIDGE AND STEIN-TYPE ESTIMATORS IN i/i PREDICTION(U) STANFORD UNIV CR DEPT OF STATISTICS A E GELFAND 21 MAY

c, T- T 2

.~ p-T XTXTX)lwhere w = RX 0 (and w w - 0

If we expand (14) we obtain

• (wT^ 2 w (-))(15) E ( (a-O C.EE (CT)2 /(T)2-2cE (WT a

The last term may be written as -2c~wiE af(a)(Cii-Cii) where

f(a) (C (TC)- (w Tc). Using the Stein identity,2 3f(cz)- A

( E C Eaf(a)(a i-i), which is valid here), on this

expression, after manipulation (15) becomes

a(16) 2wTw + (c2+4ca 2 )E ( TA2/( T)2 - 2c 2w wEa (i/MTa).

TA 2 T^A T2:.(W CT ^ (wT0)2Finally if we let U (w w V = (w then

U- xIL 2 with L as in (13), _ .: - 1+2Lw

a

2 (3Y'IM X with M as in (13)vz..0 p-l+2M

;.;U I +2L p-l+2Mand given L and M, U Be(2 --, 2 ) independent of

U+V 2 H:.q -- Hence2. p+2(L+M)"

. "-?.

P.-"

Page 14: AD-A168 ON THE USE OF RIDGE AND STEIN-TYPE ...AD-A168 349 ON THE USE OF RIDGE AND STEIN-TYPE ESTIMATORS IN i/i PREDICTION(U) STANFORD UNIV CR DEPT OF STATISTICS A E GELFAND 21 MAY

I / % . ".- , ." - . ." . . • , ° .% . " . r - . -. , - o .-r.- , -. - r ~ r -, - r - v_-w-w - .. - . " -4• -

..

.

.

. .

10

(w a)T Ta[ U iLM) T r 1 IL,T 2 w E(- 2 = w wE(V LV( a) (U+V)2

T T-w w E 2L+I 1 w w1

G 2 (p+2(L+MY) p+2(L+M-)-2 21

and similarly

1 1

a ^T^ 2 2aia a

M akinm these substitutions into (16) and restoring Xwe obtain

m0

(12). 0

Note: The proof reveals that the expected loss,3 (2), for more general-

Sestimators of the form (th(Q))os +h(Q) can be developed. In

fact, if

* E8I3h(Q)y Ia OLS,i

the loss Is

2 T T -1 2 22a 2XT XT -_l XEhQ-a2 Eh()(17) a X (X X) X +E h(Q)Y"..4

U Note:Inspectionof ( reveals that the uniquee loss, c2) is oe eea

Sinc Xe loss Inain osclnsfXsoi ti2~~~~~ 0*I0Eh ,. Io oE h ^

Page 15: AD-A168 ON THE USE OF RIDGE AND STEIN-TYPE ...AD-A168 349 ON THE USE OF RIDGE AND STEIN-TYPE ESTIMATORS IN i/i PREDICTION(U) STANFORD UNIV CR DEPT OF STATISTICS A E GELFAND 21 MAY

apparent that for X0 fixed as a - 0, r2 /rl p, i.e.2

C o- a (p-2).. Hence if we believe 8* is close to 8 the

"usual" constant, a 2 (p-2), may be employed. Using this constant,

if A is near 0, the relative risk of X T to X OBOLS will, from

(12), be near 2/p, as it is in the case of estimating 8. As in

remarks after (10), If C2 is the usual estimator of a independent

of 8OLS we may compute (2) for 8S as in (4) withc - o (p-2).

We obtain

o : (19) a X0T(xTx)- o(1+r (p 2 _- 4 + 2 (n - p ) -1(p-2) 2)-2r2(p-2)).

Expressions similar to (19) can be obtained for instances

of the more general estimators mentioned above (17). This suggests

that the risk (2) may be calculated for commonly used (adaptive)

ridge .estimators, e.g. those considered in section 3. However,

without restrictions on the design matrix X, these estimators

often fail to either provide a in closed form or to define

a as a function of Q.

Since to a first order approximation c0 Z o2(2+1)- 1(p-2+2(6-1))

we may estimate c 0 by

^ --

(20) co = a2(2x+l) (p-2+2(0-.))

where X,6 are the expressions in (13) with 1 replaced by .

We would truncate c0 to the interval CO,Q]. Since E - doesn't

doen'

.1 .. -v . . - ... ' . . . . .[ ' . > -. . - .] .. - . - . .. .. -/ ...< .. <

Page 16: AD-A168 ON THE USE OF RIDGE AND STEIN-TYPE ...AD-A168 349 ON THE USE OF RIDGE AND STEIN-TYPE ESTIMATORS IN i/i PREDICTION(U) STANFORD UNIV CR DEPT OF STATISTICS A E GELFAND 21 MAY

12AA

exist c0 will be very unstable (as with a0 in (9)) suggesting

that the resulting predictor will perform poorly. Again we

return to this point in the next section.

4,

3. A Simulation Study

A simulation was conducted to compare the use of the OLS

predictor with the predictors discussed in the previous section

and with predictors arising from other estimators of 0 which

have been discussed in the literature. For convenience we set2 T 2a = 1 and take X X= I, i.e. OLS " N($,I). Without loss of

generality we set 0* = 0 and X 0 = 1. Under this setup ridge

estimators become (+a) 0LS and Stein estimators become

,(1 - c/LS OLS) TOLS In addition to OOLS we consider the

following six estimators of a (4 ridge type, 2 Stein type).

4-. AT

(i) HK - arising from a = P/ BOLSOLS" The ridge estimators

discussed by Hoerl and Kennard (1970), Hoerl, Kennard

and Baldwin (1975) and, in fact, Lawless and Wang (1976)

reduce to 0HK in our setup.4-T

(ii) BRM - arising from a = p/( LSBOLS-P) with (l+a) - = 0ATA

if 6OLSOOLS : p. The RIDGM and STEINM estimators discussed

4- by Dempster, Shatzoff, and Wermuth (1977) reduce to 8RM"

(ili) 8 MG - arising from a - (1 - T ^ -OLSOLS) /0 1 OLS/OLS

with (1+a)- 0 if BOLSOLS I p. The ridge estimator

of McDonald and Galarneau (1975) reduces to 8MG

.4

Page 17: AD-A168 ON THE USE OF RIDGE AND STEIN-TYPE ...AD-A168 349 ON THE USE OF RIDGE AND STEIN-TYPE ESTIMATORS IN i/i PREDICTION(U) STANFORD UNIV CR DEPT OF STATISTICS A E GELFAND 21 MAY

[- I I

13AA

(iv) B^ - arising from a 0 given in (9).a 0

UA

(v) 8p - arising from c = p-2, i.e. the "usual" Stein

estimator.

(vi) B^ - arising from co given in (20), truncated to [0,-).c 0

"Positive part" restrictions were applied to all "shrinkages"

in (v) and (vi).

We note that under the above assumptions the risks in (3)

and (4) and, in fact, of the predictors arising from (i) - (vi)

depend on X 0 and 8 only through (XTS) 2 and aT. Since (XT8) 2

- T 0 < r < 1, we may summarize the results in terms of

T $ and r. We consider p = 3,6,10. For a given p we generated

sets of 2p independent uniform random variables on the interval

[-1,1. In each case we considered the first p observations as

a p vector, standardized to length 1 and designated it as an X0'

Similarly the second p observations are considered as a B vector

with scaling by .1,1,10. Hence we have large OB , i.e. BB - 100,

moderate B T, i.e. B = 1 and small B T, i.e. T B .01. For

each X 0 , pair 1000 B 's were generated from (SI) and using

X each of the seven predictors were calculated for each of the

1000 replications. Bias, variance and mean square error (MSE)

were estimated. A large number of Xo,B pairs (approximately 400)

were investigated enabling a wide range of r's. Table 1 provides

a brief summary indicating the best 0 for prediction over ranges

.A

Page 18: AD-A168 ON THE USE OF RIDGE AND STEIN-TYPE ...AD-A168 349 ON THE USE OF RIDGE AND STEIN-TYPE ESTIMATORS IN i/i PREDICTION(U) STANFORD UNIV CR DEPT OF STATISTICS A E GELFAND 21 MAY

14

for r along with the typical percentage reduction in risk

(using the best predictor over that range), loo(MSE XTAOL

-MSE X 0 )/MSE X 0 L0 0OLL

Several comments are appropriate:

(I) The cross-over points in Table 1 are approximate, but

.in the vicinity of the cross-over competing predictors are

indistinguishable with respect to MSE.

(ii) It is not surprising that regardless of B, if OT8 large

and r large, the OLS predictor is best. In fact, if STB large

and .01 < r < .5, the percent improvement of the best

predictor over OLS is never greater than 5%.

'-. (iii) As expected, BA , BA performed very badly, always sixthSo ao0 T

or seventh, doing well only when BTB large and r very

small (regardless of p). However,. in such cases, improve-

ments will be substantial, increasing as r decreases, while

the other five predictors are indistinguishable. Near r - .0

SB- is best; much below .01, BA is best.a c

V (iv) Op-2 is likely the best overall choice always amongst

the two or three best apart from cases in (i1) above.

TT. ."(v) When 0is small or moderate,an$

, .RM' OHK':: Op-2 MG

were always the best four. When BT B is small and p' 3,

BMG is close in performance to p-2' When is smallA A Tand p 6, RM and BMG split for second best. When B 8

:4' is small and p = 10, p-2 is second with 8 MG third.p-wM

Page 19: AD-A168 ON THE USE OF RIDGE AND STEIN-TYPE ...AD-A168 349 ON THE USE OF RIDGE AND STEIN-TYPE ESTIMATORS IN i/i PREDICTION(U) STANFORD UNIV CR DEPT OF STATISTICS A E GELFAND 21 MAY

• 15A 1 A A

(vi) When r is large c 0 is almost always <0 whence - B OLS.

-o C 0

Table 1 reveals that in many cases substantial reduction

in squared error loss over the OLS predictor can be achieved.

It further suggests the possibility of selecting the predictor

according to 8T$ and r. However, finely detailed selection,

e.g. according to X0 , will be unsuccessful as the performance

.of. and 0, reveals. In practice we will have Aia2 instead of BT

co a0

T T -1 -12and r = (AX0(X X) X0 )- , and we might define estimators^% ^ 2A,r with 8 replaced by 8OLS, a replaced by a • We may calculate

A n-i -E(A)= - (p+A) whence A' = n-l -p Is UM.FVU for A. By an argu-

n-3 -ment similar to that contained in Theorem 2, we may show

E(r) = E(p+2(L+M)) (2L+l) : r+(l-rp)((p+A)-+(p+A) 3 2A) where

L,M are distributed as in (13). For individual predictions,

preliminary calculation of A' and r should enable a Judicious

choice of predictor.

As Thisted and Morris (1980, p. 19) observe, the poorest

estimation case for ridge procedures occurs when (with X TX - I,

B* = 0) BT = (l,O,0,...,o) with 81 large. This is also the

poorest estimation case for-Stein type procedures in the sense

that the first coordinate will account for about half of the

total risk and all coordinate risks would decrease if B1 was

excluded (see Baranchik (1964)). For prediction this implies

2 TA large and r = X0 1/XoX0 . Hence this is the poorest case for

prediction as well, i.e. depending upon X0 1, Improvement will

be small or, in fact, the OLS predictor will be better.

-ale

Page 20: AD-A168 ON THE USE OF RIDGE AND STEIN-TYPE ...AD-A168 349 ON THE USE OF RIDGE AND STEIN-TYPE ESTIMATORS IN i/i PREDICTION(U) STANFORD UNIV CR DEPT OF STATISTICS A E GELFAND 21 MAY

16

i!ow will multicollinearity in XT affect prediction?

Let xTx D, a diagonal matrix with diagonal elements d and

assume d <d 2 ... d . Then the extent of multicollinearity

is usually measured in terms of how close d1 is to 0. Although

ridge methods have been advocated for improved estimation when

d is quite small2 particularly relative to the other d., Thisted

and Morris (p. 21) and others have established that in this case

optimal ridge as well as Stein-type estimators will produce

inconsequential improvement over 0OLS. For prediction (with

2 T22 2St = 0), A = Edi i and r - (X0B)0/(Xi/diEd iB).- - Bingham and

$, Larntz (1977, p. 102) observe that (in this notation) the worst

case for ridge estimation occurs when large 8i are associated

with small d This tells us little about the magnitude of A. How-

ever for fixed X0 and B as XT becomes more severely multicollinear

T^ 2var(X0BOLS) E Xi/di will grow larger and r will become smaller.

As the simulation suggests when r is small, using an appropriate

predictor, we can expect significant improvement over the OLS

predictor.

4. Multiple Prediction

In concluding we offer several comments regarding multiple

or simultaneous prediction. Suppose we wish to make r predictions

* defined by XlX 2 ,...,Xr and we set X* = (Xl,...,Xr). For

-.: convenience we assume the X are a linearly independent set whencei

rank (X*) =r < p. What is an appropriate loss to employ? One

Page 21: AD-A168 ON THE USE OF RIDGE AND STEIN-TYPE ...AD-A168 349 ON THE USE OF RIDGE AND STEIN-TYPE ESTIMATORS IN i/i PREDICTION(U) STANFORD UNIV CR DEPT OF STATISTICS A E GELFAND 21 MAY

17

choice is unweighted sum of squared error loss, i.e.

E(X(a-_8)) 2 =.( -)TGI(8-8) with G = XuXaT. A second choice

T ^arises from the Joint distribution of the XiBOLS, i.e.

XOT OLS - N(x*ToV2(x*T(xTx)-Ix*)-I ) suggests (8-B)T G2 (0-B)

where G2 X*(XT(XTx)-Ix*)-lxT. Others may be envisioned as

well. If r < p, GI,G2 are positive semi-definite whence, as

noted earlier for individual prediction, we may have loss equal

to 0 but B not close to B. Nonetheless, it is well-known that

if P is any positive definite matrix X*T oLS Is admissible for

X'TB under loss ( -0)x (-0) if p 1 2, inadmissible if

p ! 3. In fact, work done by Berger (1976), Bhattacharya (1966),

Bock (1975), Casella (1977), Efron and Morris (1976) and

Strawderman (1978) leads to explicit minimax predictors which

improve upon x*T^OLS. These predictors will be generalized

adaptive ridge of the form

. (I + A(xTOOLS2 ,xgT(x x)-X,,P))-x*TSoLS* (see e.g. Strawderman,

Theorem 6, p. 626, for a family of such predictors),' parelleling,

in a sense, the individual predictors X . and X Bao 0

As Strawderman notes (p. 626) there is no one predictor

which will dominate the OLS predictor for all P. For P - I (i.e.

G0) a very simple procedure Is to use estimates of A and r to

select a good predictor for each Xi. If we are not prepared to

specify P the simulation study suggests that using 8 p-2 (i.e.

c 2 (p-2) in (4)) regardless of X. is a simple but perhaps

adequate choice.

A.6.

Page 22: AD-A168 ON THE USE OF RIDGE AND STEIN-TYPE ...AD-A168 349 ON THE USE OF RIDGE AND STEIN-TYPE ESTIMATORS IN i/i PREDICTION(U) STANFORD UNIV CR DEPT OF STATISTICS A E GELFAND 21 MAY

~18

References

Baranchik, A. (1964). "Multiple Regression and Estimationof the Mean of Multivariate Normal Distribution," Dept.of Statistics, Stanford University, Technical Report #51.

Berger, James 0. (1976). "Admissible Minimax Estimation of aMultivariate Normal Mean with Arbitrary Quadratic Loss,"Ann. Statist. 4, 223-226.

Bhattacharya, P.K. (1966). "Estimating the Mean of a Multi-variate Normal Population with General Quadratic LossFunction," Ann. Math. Statist. 37, 1819-1824.

Bingham, C. and K. Larntz (1977). Comment.J. Amer. Statist. Assoc.72, 97-102.

Bock, Mary Ellen (1975). "Minimax Estimators of the Mean of aMultivariate Normal Distribution," Ann. Statist. 3, 209-218.

Casella, George (1977). "Minimax Ridge Regression Estimation,"Mimeograph Series #497, Dept. of Statistics, PurdueUniversIty.

Cohen, A. (1965). "Estimates. of Linear Combinations of theParameters in the Mean Vector of a Multivariate Distribution,"Ann. Math. Statist. 36, 78-87.

-. Dempster, A.P., Martin Schatzoff and Nanny Wermuth (1977)."A Simulation Stidy of Alternatives to Ordinary LeastSquares," J. Amer. Statist.-Assoc. 72, 77-106.

Draper, N., and R.C. Van Nostrand (1979). "Ridge Regression andJames-Stein Estimation: Review and Comments," Technometrics21 (4), p. 451-466.

Efron, B., and C. Morris (1972). "Limiting Risk of Bayes andEmpirical Bayes Estimators - Part II," J. Amer. Statist.Assoc. 67-, 130-139.

_____, __ii-_1. (1976). "Families of Minimax Estimators

of the Mean of a Multivariate Normal Distribution," Ann. Statist.4, 11-21.

Hoerl, Arthur E., and Robert W. Kennard (1970). "Ridge Regression:Biased Estimation for Nonorthogonal Problems," Technometrics 12,55-67.

Regression: Some Simulations," Comm. in Statist. 43 105-123.

Page 23: AD-A168 ON THE USE OF RIDGE AND STEIN-TYPE ...AD-A168 349 ON THE USE OF RIDGE AND STEIN-TYPE ESTIMATORS IN i/i PREDICTION(U) STANFORD UNIV CR DEPT OF STATISTICS A E GELFAND 21 MAY

19

James, W., and C. Stein (1961). "Estimation with QuadraticLoss," Proceedins of the Fourth Berkeley Symposium, vol. I,ed. by Jerzy Neyman, pp. 361-379. Berkeley and Los Angeles:University of California Press.

Lawless, J. F., and P. Wang (1976). "A Simulation Study of Ridgeand Other Regression Estimators," Comm. in Statist. A5, 307-323.

V McDonald, Gary C., and Diane I. Galarneau (1975). "A Monte CarloEvaluation of Some Ridge Type Estimators," J. Amer. Statist.Assoc. 70, 407-416.

Stein, Charles (1956). "Inadmissibility of the Usual Estimatorfor the Mean of a Multivariate Normal Distribution," Proceed-ings of the Third Berkeley Symposium, Vcl. I, ed. by JerzyNeyman, pp. 197-206. Berkeley and Los Angeles: University ofCalifornia Press.

_______(1973). "Estimation of the Mean of a Multivariate-Normal Distribution," Proceedings of the Prague Symposium on

Asymptotic Statistics, 345-361.

Stone, 14. (1974). "Cross-validatory Choice and Assessment ofStatistical Predictions," J. Roy. Statist. Soc. B-36, 111-147.

Strtwderman, William E. (1978). "Minimax Adaptive GeneralizedRidge Regression Estimators," J. Amer. Statist. Assoc. 73,623-627.

Thisted, R., and C. Morris (1980). "Theoretical Results forAdaptive Ordinary Ridge Regression Estimators," Universityof Chicago Technical Report #94.

,v.I.

-4-:

Page 24: AD-A168 ON THE USE OF RIDGE AND STEIN-TYPE ...AD-A168 349 ON THE USE OF RIDGE AND STEIN-TYPE ESTIMATORS IN i/i PREDICTION(U) STANFORD UNIV CR DEPT OF STATISTICS A E GELFAND 21 MAY

20

Footnctes

1. A possible refinement to using the predictor defined by

S with c = p-2) would employ the "limited translation"

approach as discussed in Efron and Morris (1972, p. 136).

Limiting the amount of shift for each coordinate of BOLS

toward the corresponding coordinate of 0S using a relevance

function, p, leads to an estimator 0 and resulting predictor

X 8 . We also note that the estimator, O resulting from0-1.

James and Stein (1961, p. 366) sets c = 2

With this c (19) becomes

a2TXI( T )-l X{+np2- 1(-p( 2_ 4-raX x) x 0 {I+(n-p+2) (n-p) (p2-1)f-2(n-p+2) (n-2)(p-2) r2).

2. We recognize that these simplifying assumptions diminish the

utility of the simulation study. In particular, certain

estimators which differ in a more general model become

equivalent. The study is only intended to be illustrativeand suggestive. Certainly a more elaborate one might be

undertaken. We also recognize that with these simplification,Pd"

expressions for the exact risks of some of the predictors

below (ignoring restrictions) may be obtained using (17).-%

2 ,.a

Page 25: AD-A168 ON THE USE OF RIDGE AND STEIN-TYPE ...AD-A168 349 ON THE USE OF RIDGE AND STEIN-TYPE ESTIMATORS IN i/i PREDICTION(U) STANFORD UNIV CR DEPT OF STATISTICS A E GELFAND 21 MAY

21

va

4.,

,,

"4-

-4

4,...

<.4I

'.,.

0'0

4' ('3vs

Page 26: AD-A168 ON THE USE OF RIDGE AND STEIN-TYPE ...AD-A168 349 ON THE USE OF RIDGE AND STEIN-TYPE ESTIMATORS IN i/i PREDICTION(U) STANFORD UNIV CR DEPT OF STATISTICS A E GELFAND 21 MAY

* 22

;m 0

4) 0A *i 02<

L44

to (a(~4

r. a CM a0.m4 . 1 '0 ca0ca 0c

4 0av vv(a cv a

-H t4 t4 04 H r- 0- t- 9 0 0

hO' S.-4 Ct) C) %-IA - 0 C) ~ L .- J ~ N r

.4 0 * (v 0 v v ~ 06~

a v a

0 A U"\ UN i-I v A Ul% fH V A LA\ LA\ H- V wl.4- m N~ H- 0 N~ H- 0 M r-I 0

* *4 41

SE-4 ~L

.5.. Cl) C)

94 b4 \.4 co '0o d % :0

V ti ca a4c a an a a2 a 4)IL0 tc

co 0 0 V) o 0

2) _4Z Xr -7 _- z

4 A LA L A IA v

$4 9. %4:4

4-')

H C7\

'I LA A %.

Cu 0% LA\2 %- % 0 2L

00. 4) U .

0.. 4V

OC~4 LiO .4 9

IV .

%- 94 .4

.4-r to

Nih N-J"-)

W. 1

Page 27: AD-A168 ON THE USE OF RIDGE AND STEIN-TYPE ...AD-A168 349 ON THE USE OF RIDGE AND STEIN-TYPE ESTIMATORS IN i/i PREDICTION(U) STANFORD UNIV CR DEPT OF STATISTICS A E GELFAND 21 MAY

23

UNCLASSIFIEDSECURITY CLASSIFICATION OF THIS PAGE (Whn Data Enteare

IN," REPORT DOCUMENTATION PAGE REAF STRUCTIORSi" J'BEFORE COMPLETING FORMREOTNUMBER GVCSON ______________

. . 374

4. TITLE (and Subtitle) S. TYPE OF REPORT A PERIOD COVERED

On The Use Of Ridge And Stein-Type Estimators TECHNICAL REPORTIn Prediction

6. PERFORMING ORG. REPORT NUMBER

7. AUTHOR(a) 1. CONTRACT OR GRANT NUMBER(O)

Alan E. Gelfand N00014-86-K-0156

9. PERFORMING ORGANIZATION NAME AND ADDRESS 10. PROGRAM ELEMENT. PROJECT. TASK

Department of Statistics ARA A WORK UNIT NUMBERS

Stanford University NR-042-267Stanford, CA 94305I, CONTROLLING OFFICE NAME AND ADDRESS 12. REPORT DATE

Office of Naval Research May 21, 1986Statistics & Probability Program Code 1111- 13. NUMBER OF PAGES

24I. MONITORING AGENCY NAME & ADDRESS(If dilferent from Controlllng Olllce) IS. SECURITY CLASS. (ol this report)

UNCLASSIFIEDI. DECLASSIFICATION/DOWNGRAOING

SCHEDULE

1. DISTRIBUTION STATEMENT (of thil Repore)

APPROVED FOR PUBLIC RELEASE: DISTRIBUTION UNLIMITED.

17. DISTRIBUTION STATEMENT (of the abstract entered in Slock 30, it dJllerent frm Repmrt)

18. SUPPLEMENTARY NOTES% ak

19. KEY WORDS (Continue on t.eee side iI neceeeary and idenltify by block nuimber)

Ridge-type estimators, Stein-type estimators, Regression.

20. ABSTRACT (Contirnue an reverse side iI necessary and identify by block number)

PLEASE SEE FOLLOWING PAGE.

....

rYFORM a.DD I FA 73 1473 EDITION Of I NOV65 IS OBSOLETES/ 0102- 014- 6601 1 UNCLASSIFIED_SECURITY CLASSIFICATION OF THIS PAGE (Whe Date Bntered)

do r r., , , , . . . ., . , . . . . , . , . 2 : ",". .,. : -, ,"." " . . ." . , . .",."." .". .'. .

Page 28: AD-A168 ON THE USE OF RIDGE AND STEIN-TYPE ...AD-A168 349 ON THE USE OF RIDGE AND STEIN-TYPE ESTIMATORS IN i/i PREDICTION(U) STANFORD UNIV CR DEPT OF STATISTICS A E GELFAND 21 MAY

* - - . - -- -- -. - -. -_ 7 -m - _ - -C .-

.- UTCLASSIFIED- - aUcUTt CLASIFICATION OF ?Nil PAgE1 Mhs . & Rhore

TECHNICAL REPORT NO. 374

20. ABSTRACT

For the usual regression model with fixed regressors, there is a con-

siderable literature devoted to alternatives to ordinary least squares esti-

mators of the regression parameters. These alternatives are biased with

"small" variances resulting in reduced mean square error over some (perhaps

all) of the parameter space. Two prominent classes of such estimators are

ridge-type and Stein-type estimators.

Consider the simplest prediction problem in this context, i.e. prediction

at a single new vector of prediction values. We-eeleulat5 the risk (squared

,- error) for predictors based on estimators in the above families. While the

ordinary least squares predictor is admissible, a simulation study reveals

4that over regions of the parameter space substantial reduction in risk is

possible using estimators in these families. A simple preliminary procedure

based upon the vector of prediction values is given to select a -,good" esti-

mator from these families. It is apparent that in multiple prediction a

single choice of estimator need not be best.

*.2

S11ECURItY CLASSIFICATION OP THIS pAoE*W.(rDsf ,a

Page 29: AD-A168 ON THE USE OF RIDGE AND STEIN-TYPE ...AD-A168 349 ON THE USE OF RIDGE AND STEIN-TYPE ESTIMATORS IN i/i PREDICTION(U) STANFORD UNIV CR DEPT OF STATISTICS A E GELFAND 21 MAY

'!1 1

* /4'5-',4

>,"

'- .r*