ESTIMATION OF SOMETIMES-POOL PREDICTOR IN MULTIPLE REGRESSION ANALYSIS

A. M. KANDIL

ABSTRACT

An objective statistical procedure is used as an aid in deciding whether or not to pool two or more regression equations. A sometimes-pool predictor is derived. Relative efficiencies of the sometimes-pool predictor with respect to the never-pool predictor are obtained. Biases of estimation of the S.P. predictor are derived, and computations for two numerical examples are presented.

Key words: Sometimes-pool predictor, never-pool predictor, M.S.E. (S.P.), M.S.E. (N.P.), relative efficiency (R.E.)

1. INTRODUCTION

This paper is concerned with inference procedures involving the use of preliminary tests of significance to determine whether or not to pool two or more linear regression lines with each other, or two or more multiple regressions.

• Zagazig University, Banha Branch.

Since an investigator is usually interested in making inferences from a sample about the population from which it was generated, we will be concerned primarily with the effects that pooling, or not pooling, regression estimates has on subsequent inferences.

Bancroft (1944), Mosteller (1948), Bancroft (1964), Kale and Bancroft (1967), Han and Bancroft (1968), and Larson and Barr (1972) studied the problem of pooling data.

Suppose the investigator suspects, but is not certain, that the conditions causing the underlying linear relationship between Y and X are the same for 1969 as for 1970. Johnson, Bancroft and Han (1977) considered the possibility of pooling the 1969 sample regression with the 1970 sample regression.

An important statistical problem in applied substantive research is considered in this paper. Suppose we have a large number of explanatory variables and we wish to investigate the effects of these variables on a response variable. In this case an ordinary multiple regression analysis requires a very large sample to make inferences about the population regression model of interest. Instead, we wish to use an objective statistical procedure as an aid in deciding whether or not to pool two or more multiple regression equations, which are fitted to one suitable sample, to make an inference about a common regression model of interest.

To illustrate the idea of this paper, let us consider the case where we wish to investigate the effects of two variables on a third. Thus, the ordinary multiple regression model is defined as

$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + e_i; \quad i = 1, 2, \ldots, n \qquad (1)$$

The least squares method can be used to estimate $\beta_0$, $\beta_1$ and $\beta_2$, and consequently the estimated value of $Y_i$.

Now, divide the set of explanatory variables into two parts with, say, the first variable $X$ in the first part and the remaining variable $Z$ in the second. Then fit the first linear relationship to the $n$ observations $(X_1, Y_{11}), (X_2, Y_{21}), \ldots, (X_n, Y_{n1})$, denoting this fitted line by

$$Y_{i1} = \beta_{01} + \beta_{11} X_i + u_i; \quad i = 1, 2, \ldots, n \qquad (2)$$

Similarly, fit the second linear relationship to the same sample $(Z_1, Y_{12}), (Z_2, Y_{22}), \ldots, (Z_n, Y_{n2})$, denoting this fitted line by

$$Y_{i2} = \beta_{02} + \beta_{12} Z_i + \varepsilon_i; \quad i = 1, 2, \ldots, n \qquad (3)$$

Let us make the ordinary assumptions about the true deviation terms $u_i$, $\varepsilon_i$ and $e_i$.

In this paper we wish to use an objective statistical procedure as an aid in deciding whether or not to pool the estimated value $\hat Y_{i1}$ from the first regression equation (2) with the estimated value $\hat Y_{i2}$ from the second regression equation (3) to make an inference about a common regression model of interest as in equation (1).

Such a procedure leads to a "sometimes-pool" predictor, which is defined and studied in section (2). The relative efficiency (R.E.), mean square errors (M.S.E.), and biases of estimation of the sometimes-pool predictor are derived in section (3). Two numerical examples are given in section (4).

2. POOLING PROCEDURE

(2.1) THE CASE OF TWO REGRESSION EQUATIONS

Consider the two regression models (2) and (3),

$$Y_{i1} = \beta_{01} + \beta_{11} X_i + u_i; \quad i = 1, 2, \ldots, n$$

$$Y_{i2} = \beta_{02} + \beta_{12} Z_i + \varepsilon_i; \quad i = 1, 2, \ldots, n$$

where $X_i$ and $Z_i$ are known, $u_i$ and $\varepsilon_i$ are normal random variables, $\beta_{01}$ and $\beta_{11}$ are the parameters of the first model, $\beta_{02}$ and $\beta_{12}$ are the parameters of the second model, $Y_{i1}$ is the $i$th observation of a phenomenon for the first model and $Y_{i2}$ is the $i$th observation of the same phenomenon for the second model.

Since the coefficient of determination $R^2$ is considered a measure of the goodness of fit of the fitted line to the observations, we can use it as a weight for the estimated values $\hat Y_{i1}$ and $\hat Y_{i2}$ ($i = 1, 2, \ldots, n$).

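Combining the two weighted regression models, the linear estimator for $Y_i$ is defined as:

$$\hat Y_{ic} = W_1 \hat Y_{i1} + W_2 \hat Y_{i2}; \quad i = 1, 2, \ldots, n \qquad (4)$$

$$W_t = \frac{R_t^2}{R_1^2 + R_2^2}; \quad t = 1, 2 \qquad (5)$$

Thus, we have

$$\hat\beta_{0c} = \frac{R_1^2 \hat\beta_{01} + R_2^2 \hat\beta_{02}}{R_1^2 + R_2^2} = W_1 \hat\beta_{01} + W_2 \hat\beta_{02} \qquad (6)$$

$$\hat\beta_{1c} = \frac{R_1^2 \hat\beta_{11}}{R_1^2 + R_2^2} = W_1 \hat\beta_{11} \qquad (7)$$

$$\hat\beta_{2c} = \frac{R_2^2 \hat\beta_{12}}{R_1^2 + R_2^2} = W_2 \hat\beta_{12} \qquad (8)$$

and hence

$$\hat Y_{ic} = \hat\beta_{0c} + \hat\beta_{1c} X_i + \hat\beta_{2c} Z_i; \quad i = 1, 2, \ldots, n \qquad (9)$$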
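The weighting step in equations (4) to (9) can be illustrated with a short sketch. It is only an illustration, not the author's program: the function name `pooled_coefficients` and the dictionary keys are hypothetical, and the input figures are the fitted values reported for the first numerical example in section 4.

```python
def pooled_coefficients(fit_x, fit_z):
    """Combine two simple regressions with R^2-proportional weights.

    Each fit is a dict with keys 'b0' (intercept), 'b1' (slope) and
    'r2' (coefficient of determination). Names are illustrative only.
    """
    total = fit_x["r2"] + fit_z["r2"]
    w1 = fit_x["r2"] / total          # weight of the line on X, eq. (5)
    w2 = fit_z["r2"] / total          # weight of the line on Z, eq. (5)
    return {
        "w1": w1,
        "w2": w2,
        "b0c": w1 * fit_x["b0"] + w2 * fit_z["b0"],  # pooled intercept
        "b1c": w1 * fit_x["b1"],                     # pooled slope on X
        "b2c": w2 * fit_z["b1"],                     # pooled slope on Z
    }


# Fitted values as reported in the first numerical example (section 4).
fit_x = {"b0": 8.76268, "b1": 0.5763, "r2": 0.9887}
fit_z = {"b0": -24.8362, "b1": 22.6580, "r2": 0.7914}

pooled = pooled_coefficients(fit_x, fit_z)
for name, value in pooled.items():
    print(f"{name} = {value:.4f}")
```

Up to rounding, the printed weights and slopes agree with the values quoted in that example ($W_1 = 0.5554$, $\hat\beta_{1c} = 0.3201$, $\hat\beta_{2c} = 10.0733$).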

Now, we are interested in estimating the equation

$$\hat Y_i = \hat\beta_0 + \hat\beta_1 X_i + \hat\beta_2 Z_i; \quad i = 1, 2, \ldots, n \qquad (10)$$

when it is suspected that $\hat\beta_0 = \hat\beta_{0c}$, $\hat\beta_1 = \hat\beta_{1c}$, $\hat\beta_2 = \hat\beta_{2c}$ and consequently $\hat Y_i = \hat Y_{ic}$. We then test whether the two equations $\hat Y_i$ and $\hat Y_{ic}$ are significantly different by testing whether $\hat\beta_0$ and $\hat\beta_{0c}$, $\hat\beta_1$ and $\hat\beta_{1c}$, and $\hat\beta_2$ and $\hat\beta_{2c}$ are significantly different, and consequently whether $\hat Y_i$ and $\hat Y_{ic}$ are significantly different, in the usual way.

Consider the null hypothesis that the two lines (9) and (10) are both estimates of the true line (1). The estimator of $Y_i$ is therefore defined as

$$Y_i^* = \begin{cases} \hat\beta_0 + \hat\beta_1 X_i + \hat\beta_2 Z_i & \text{if } \chi_2^2 > \chi_\alpha^2 \\ \hat\beta_{0c} + \hat\beta_{1c} X_i + \hat\beta_{2c} Z_i & \text{if } \chi_2^2 \le \chi_\alpha^2 \end{cases} \qquad (11)$$

where $\chi_2^2$ is the test statistic for testing $H_0: \hat Y_i = \hat Y_{ic}$,

$$\chi_2^2 = \sum_{i=1}^{n} \frac{(\hat Y_i - \hat Y_{ic})^2}{\hat Y_{ic}} \qquad (12)$$

and $\chi_\alpha^2$ is the $100(1-\alpha)$ percentage point of the $\chi^2$ distribution with $(n-3)$ degrees of freedom. It should be noted that, since $\hat Y_i$ is the best unbiased estimate of $Y_i$, the observed $Y_i$ can be used to estimate the value of $\chi_2^2$ if $\hat Y_i$ is not defined. The estimator $Y_i^*$ is referred to as the sometimes-pool predictor.
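The decision rule described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the fitted values are made-up numbers rather than data from the paper, and the critical value is supplied by the user from a $\chi^2$ table instead of being computed.

```python
def chi2_statistic(y_never, y_pooled):
    """Sum of (never-pool - pooled)^2 / pooled over the observations.

    The pooled fitted values appear in the denominator, so they must be
    non-zero (in practice positive) for the statistic to make sense.
    """
    return sum((a - b) ** 2 / b for a, b in zip(y_never, y_pooled))


def sometimes_pool(y_never, y_pooled, chi2_critical):
    """Return (predictions, statistic, pooled_flag).

    The pooled line is used when the statistic does not exceed the
    tabulated critical value; otherwise the never-pool line is kept.
    """
    stat = chi2_statistic(y_never, y_pooled)
    if stat <= chi2_critical:
        return list(y_pooled), stat, True
    return list(y_never), stat, False


# Hypothetical fitted values, for illustration only.
y_never = [10.0, 12.0, 15.0]
y_pooled = [10.5, 11.5, 15.5]

pred, stat, pooled_flag = sometimes_pool(y_never, y_pooled, chi2_critical=16.92)
print(pooled_flag, round(stat, 4))
```

With a large critical value such as 16.92 the two lines are not judged different and the pooled predictions are returned; with a very small critical value the same call would fall back to the never-pool predictions.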

(2.2) THE CASE OF MORE THAN TWO MULTIPLE REGRESSION EQUATIONS

Let us consider the case where we have $m > 2$ independent explanatory variables. The multiple regression model is defined as

$$Y = X\beta + \varepsilon \qquad (13)$$

where $Y$ is an $n \times 1$ random vector having expected value $X\beta$ and covariance matrix $\sigma^2 I$, $\beta' = [\beta_0, \beta_1, \ldots, \beta_m]$ is the vector of parameters, $X$ is an $n \times (m+1)$ matrix with the vector $X_0 = 1$ and $m$ vectors $x_j$ ($j = 1, 2, \ldots, m$), which are the values of the $m$ explanatory variables, and $\varepsilon$ is the vector of random variables, with the ordinary assumptions of the least squares method.

Divide the set of explanatory variables into $P > 2$ parts with, say,

$X_1 = (X_{11}, X_{12}, \ldots, X_{1r_1})$ for the first part,

$X_2 = (X_{21}, X_{22}, \ldots, X_{2r_2})$ for the second part,

$X_P = (X_{P1}, X_{P2}, \ldots, X_{Pr_P})$ for the $P$th part,

where $\sum_{t=1}^{P} r_t = m$.

Then fit the $P$ relationships

$$Y_t = X_t \beta_t + \varepsilon_t; \quad t = 1, 2, \ldots, P \qquad (14)$$

where $Y_t$ is an $n \times 1$ random vector of observations, $X_t$ is an $n \times (r_t + 1)$ matrix, and $\beta_t$ is an $(r_t + 1) \times 1$ vector of parameters for the $t$th model.

Therefore, the estimator of $Y$ is defined as

$$Y^{**} = \begin{cases} X\hat\beta_c & \text{if } \chi_2^2 \le \chi_\alpha^2 \\ X\hat\beta & \text{if } \chi_2^2 > \chi_\alpha^2 \end{cases} \qquad (15)$$

where $\hat\beta_c' = [\hat\beta_{0c}, \hat\beta_{1c}, \ldots, \hat\beta_{mc}]$ with

$$\hat\beta_{0c} = \frac{\sum_{t=1}^{P} R_t^2 \hat\beta_{0t}}{\sum_{t=1}^{P} R_t^2} = \sum_{t=1}^{P} W_t \hat\beta_{0t} \qquad (16)$$

$$\hat\beta_{jc} = \frac{R_t^2 \hat\beta_{jt}}{\sum_{t=1}^{P} R_t^2} = W_t \hat\beta_{jt}; \quad j = 1, 2, \ldots, m; \quad t = 1, 2, \ldots, P \qquad (17)$$

$\chi_2^2$ is the test statistic for testing $H_0: \hat Y = \hat Y_c$, and $\chi_\alpha^2$ is the $100(1-\alpha)$ percentage point of the $\chi^2$ distribution with $(n - m - 1)$ degrees of freedom.
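As a rough sketch of how the pooled coefficient vector in (16) and (17) is assembled from the $P$ separate fits, the following helper takes each part's weight measure, intercept and slopes. The function name `pool_parts` and the data layout are hypothetical. The inputs are the figures reported in the second numerical example of section 4, under two assumptions: the value 0.00049 is taken as the weight measure for the third part, and its slope on $x_4$ is taken as positive.

```python
def pool_parts(parts, m):
    """Assemble the pooled coefficient vector from P fitted parts.

    parts: list of dicts with keys
        'r2'        - goodness-of-fit measure used for weighting
        'intercept' - fitted intercept of that part
        'slopes'    - {variable index j (1..m): fitted slope}
    Returns (weights, beta_c) with beta_c[0] the pooled intercept.
    """
    total_r2 = sum(p["r2"] for p in parts)
    weights = []
    beta_c = [0.0] * (m + 1)
    for p in parts:
        w = p["r2"] / total_r2
        weights.append(w)
        beta_c[0] += w * p["intercept"]      # weighted sum of intercepts
        for j, slope in p["slopes"].items():
            beta_c[j] = w * slope            # each variable sits in one part
    return weights, beta_c


# Figures as read from the second numerical example (assumptions noted above).
parts = [
    {"r2": 0.0096, "intercept": 94.6788, "slopes": {1: 0.062182}},
    {"r2": 0.5947, "intercept": 77.0146, "slopes": {2: 0.24487}},
    {"r2": 0.00049, "intercept": 98.7258, "slopes": {3: 0.0049, 4: 0.1383}},
]

weights, beta_c = pool_parts(parts, m=4)
print([round(w, 4) for w in weights])
print([round(b, 6) for b in beta_c])
```

Under these assumptions the output is close to the pooled coefficients quoted in that example (intercept about 77.31, coefficient on $x_2$ about 0.2408).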

3. ESTIMATION OF RELATIVE EFFICIENCIES AND BIASES OF SOMETIMES-POOL ESTIMATORS

(3.1) THE RELATIVE EFFICIENCY OF $Y_i^*$

In this study the estimated relative efficiency (R.E.) of the sometimes-pool estimator $Y_i^*$ to the never-pool estimator $\hat Y_i$ is defined to be the inverse ratio of the arithmetic averages of the mean square errors (M.S.E.) for $Y_i^*$ and the never-pool estimator $\hat Y_i$, taken over the observed values. In other words, R.E.$(Y_i^*)$ is defined as:

$$\text{R.E.}(Y_i^*) = \text{M.S.E.}(\hat Y_i) \, / \, \text{M.S.E.}(Y_i^*) \qquad (18)$$
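A small sketch of the ratio in (18), reading it as the average never-pool M.S.E. divided by the average sometimes-pool M.S.E. taken over the observations. The function name and the per-observation M.S.E. values below are hypothetical and serve only to show the arithmetic.

```python
def relative_efficiency(mse_never, mse_sometimes):
    """Ratio of the average never-pool M.S.E. to the average
    sometimes-pool M.S.E., both averaged over the observations."""
    avg_never = sum(mse_never) / len(mse_never)
    avg_sometimes = sum(mse_sometimes) / len(mse_sometimes)
    return avg_never / avg_sometimes


# Hypothetical per-observation mean square errors.
mse_never = [1.0, 2.0, 3.0]
mse_sometimes = [2.0, 4.0, 6.0]

re = relative_efficiency(mse_never, mse_sometimes)
print(f"R.E. = {re:.0%}")
```

A value below 100 % means the sometimes-pool predictor has the larger average M.S.E. for that data set; a value above 100 % means it has the smaller one.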

It is easy to show that M.S.E.$(Y_i^*)$ can be expressed in terms of the weights $W_1$ and $W_2$, the estimated error variances of the two fitted lines and the sums of squares of $X_i$ and $Z_i$, and that M.S.E.$(\hat Y_i)$ can be expressed in terms of the error variance of model (1) and the sums of squares and products of $X_i$ and $Z_i$. The displayed expressions, which include equations (21) and (22), are not legible in the source.

From equations (21) and (22), R.E.$(Y_i^*)$ is obtained as the ratio defined in (18).

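(3.2) THE BIAS OF ESTIMATOR $Y_i^*$

Since $Y_i^* = W_1 \hat Y_{i1} + W_2 \hat Y_{i2}$, it is easy to show that

$$E(Y_i^*) = (B_0 + B_1 X_i + B_2 Z_i) + Y_i \qquad (25)$$

Therefore

$$\text{Bias}(Y_i^*) = B_0 + B_1 X_i + B_2 Z_i; \quad i = 1, 2, \ldots, n \qquad (26)$$

where

$$B_0 = W_1 \hat\beta_{01} + W_2 \hat\beta_{02} - \hat\beta_0, \quad B_1 = (W_1 - 1)\hat\beta_1, \quad B_2 = (W_2 - 1)\hat\beta_2$$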

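(3.3) THE RELATIVE EFFICIENCY OF $Y^{**}$

The relative efficiency of the sometimes-pool predictor $Y^{**}$ to the never-pool predictor $\hat Y$ is defined in the same fashion as in section (3.1), with

$$\text{M.S.E.}(\hat Y) = \frac{\hat\sigma^2}{n} \sum_{i=1}^{n} \left[ (x_i - \bar x)'(X'X)^{-1}(x_i - \bar x) + \frac{1}{n} \right] \qquad (28)$$

where $(x_i - \bar x)$ is the vector of deviations of the explanatory variables from their means for observation $i$, $i = 1, 2, \ldots, n$, and

$$\text{M.S.E.}(Y^{**}) = \frac{1}{n} \sum_{i=1}^{n} \left\{ \sum_{t=1}^{P} W_t^2 \hat\sigma_t^2 \left[ (x_{it} - \bar x_t)'(X_t'X_t)^{-1}(x_{it} - \bar x_t) + \frac{1}{n} \right] + 2 \sum_{t<s} W_t W_s \, \text{Cov}(\hat Y_{it}, \hat Y_{is}) \right\} \qquad (29)$$

If the $\hat Y_t$ ($t = 1, 2, \ldots, P$) are independent linear relationships, then

$$\text{M.S.E.}(Y^{**}) = \frac{1}{n} \sum_{i=1}^{n} \sum_{t=1}^{P} W_t^2 \hat\sigma_t^2 \left[ (x_{it} - \bar x_t)'(X_t'X_t)^{-1}(x_{it} - \bar x_t) + \frac{1}{n} \right] \qquad (30)$$

and hence

$$\text{R.E.}(Y^{**}) = \frac{\hat\sigma^2 \sum_{i=1}^{n} \left[ (x_i - \bar x)'(X'X)^{-1}(x_i - \bar x) + \frac{1}{n} \right]}{\sum_{i=1}^{n} \sum_{t=1}^{P} W_t^2 \hat\sigma_t^2 \left[ (x_{it} - \bar x_t)'(X_t'X_t)^{-1}(x_{it} - \bar x_t) + \frac{1}{n} \right]} \qquad (31)$$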

(3.4) THE BIAS OF ESTIMATOR $Y^{**}$

Since

$$\hat\beta_c = W (X'X)^{-1} X' Y = HY \qquad (32)$$

then

$$\hat\beta_c - \hat\beta = (H - (X'X)^{-1} X') Y = AY \qquad (33)$$

where $H = W (X'X)^{-1} X'$ and

$$A = H - (X'X)^{-1} X' \qquad (34)$$

and thus

$$E(\hat\beta_c) = E\left[ (A + (X'X)^{-1} X') Y \right] = AX\beta + \beta \qquad (35)$$

$$\text{Bias}(\hat\beta_c) = AX\beta \qquad (36)$$

and therefore

$$\text{Bias}(Y^{**}) = XAX\beta = XAY \qquad (37)$$

4. NUMERICAL EXAMPLES

In this section, two numerical examples are given.

The first illustrates the case of two simple linear regressions; the second illustrates the case of more than two multiple regression equations.

Example (i)

This example is based on the data presented by John Hey (1973). Suppose economic theory leads us to expect that the variable $Y$ (household consumption expenditure) depends on the variables $X$ (household income) and $Z$ (household size). Assume that we have two linear relationships as follows:

$$Y_{i1} = \beta_{01} + \beta_{11} X_i + u_i$$

and

$$Y_{i2} = \beta_{02} + \beta_{12} Z_i + \varepsilon_i; \quad i = 1, 2, \ldots, 20$$

Our computations are:

$R_1^2 = 0.9887$; $W_1 = 0.5554$; $\hat\beta_{01} = 8.76268$; $\hat\beta_{11} = 0.5763$

$R_2^2 = 0.7914$; $W_2 = 0.4446$; $\hat\beta_{02} = -24.8362$; $\hat\beta_{12} = 22.6580$

$\hat\beta_{0c} = -6.1763$; $\hat\beta_{1c} = 0.3201$; $\hat\beta_{2c} = 10.0733$

and therefore

$$\hat Y_{ic} = -6.1763 + 0.3201 X_i + 10.0733 Z_i$$

with $S^2 = 8.5943$; $R^2 = 0.9949$.

Also, the ordinary multiple regression equation has $\hat\beta_0 = 0.5072$; $\hat\beta_1 = 0.4857$; $\hat\beta_2 = 4.7468$, that is,

$$\hat Y_i = 0.5072 + 0.4857 X_i + 4.7468 Z_i$$

with $S^2 = 1.2525$; $R^2 = 0.9993$ and $\bar R^2 = 0.9991$, where $\bar R^2$ is the adjusted $R^2$.

The test statistics are

$$\chi_1^2 = \sum_{i=1}^{n} \frac{(Y_i - \hat Y_{ic})^2}{\hat Y_{ic}} = 2.4853 \text{ with 9 d.f.}$$

$$\chi_2^2 = \sum_{i=1}^{n} \frac{(\hat Y_i - \hat Y_{ic})^2}{\hat Y_{ic}} = 2.19978 \text{ with 9 d.f.}$$

Since $\chi_2^2 < \chi^2_{(0.05,\,9)} = 16.92$, we have $Y_i^* = \hat Y_{ic}$ and R.E.$(Y_i^*) = 40\%$.

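Example (ii)

This example is based on the data presented by Jan Kmenta (1971), where the variable $Y$ (food consumption per head) is the endogenous variable and the explanatory variables are $x_1$ (ratio of food prices to general consumer prices), $x_2$ (disposable income in constant prices), $x_3$ (ratio of preceding year's prices received by farmers for products to general consumer prices), and $x_4$ (time in years). Assume that we have three linear relationships as follows:

$$Y_{i1} = \beta_{01} + \beta_{11} X_{i1} + u_i$$

$$Y_{i2} = \beta_{02} + \beta_{22} X_{i2} + \varepsilon_i$$

$$Y_{i3} = \beta_{03} + \beta_{33} X_{i3} + \beta_{43} X_{i4} + e_i$$

Our computations are:

$R_1^2 = 0.0096$; $W_1 = 0.0159$; $\hat\beta_{01} = 94.6788$; $\hat\beta_{11} = 0.062182$

$R_2^2 = 0.5947$; $W_2 = 0.9833$; $\hat\beta_{02} = 77.0146$; $\hat\beta_{22} = 0.24487$

$R_3^2 = 0.1057$; $\bar R_3^2 = 0.00049$; $W_3 = 0.00081$; $\hat\beta_{03} = 98.7258$; $\hat\beta_{33} = 0.0049$; $\hat\beta_{43} = 0.1383$

(The reported weights agree with $W_t = R_t^2 / \sum_t R_t^2$ when the value 0.00049, apparently the adjusted $\bar R_3^2$, is used for the third equation.)

and hence,

$$\hat Y_{i1} = 94.7688 + 0.0622 X_{i1}$$

$$\hat Y_{i2} = 77.0146 + 0.2449 X_{i2}$$

$$\hat Y_{i3} = 98.7258 + 0.0049 X_{i3} + 0.1383 X_{i4}$$

Also,

$\hat\beta_{0c} = 77.3125$; $\hat\beta_{1c} = 0.000987$; $\hat\beta_{2c} = 0.240787$; $\hat\beta_{3c} = 0.0000039$; $\hat\beta_{4c} = 0.0001116$

$R_c^2 = 0.97069$; $\bar R_c^2 = 0.96287$; $S^2 = 7.2614$

and hence

$$\hat Y_{ic} = 77.3125 + 0.000987 X_{i1} + 0.240787 X_{i2} + 0.0000039 X_{i3} + 0.0001116 X_{i4}$$

Also, the ordinary multiple regression equation is:

$$\hat Y_i = 101.2215 - 0.34497 X_{i1} + 0.3632 X_{i2} + 0.00109 X_{i3} - 0.13379 X_{i4}$$

with $S^2 = 3.3354$; $R^2 = 0.8134$; $\bar R^2 = 0.7636$.

The test statistics are

$$\chi_1^2 = \sum_{i=1}^{n} \frac{(Y_i - \hat Y_{ic})^2}{\hat Y_{ic}} = 0.5853137$$

$$\chi_2^2 = \sum_{i=1}^{n} \frac{(\hat Y_i - \hat Y_{ic})^2}{\hat Y_{ic}} = 1.082596$$

Since $\chi_2^2$ is less than the tabulated $\chi^2$ value 4.601 with $(n - m - 1) = 15$ d.f., we have $Y_i^{**} = \hat Y_{ic}$ and R.E.$(Y_i^{**}) = 0.95\%$.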
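The adjusted coefficients of determination quoted in the examples can be checked with the usual adjustment formula $\bar R^2 = 1 - (1 - R^2)(n-1)/(n-k-1)$. The sketch below is only a check of the arithmetic; the function name is hypothetical, and it assumes $n = 20$ observations and $k = 4$ explanatory variables for the second example, which is what the 15 degrees of freedom imply.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k explanatory variables."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Pooled and ordinary fits of the second example, assuming n = 20, k = 4.
pooled_adj = adjusted_r2(0.97069, n=20, k=4)
ordinary_adj = adjusted_r2(0.8134, n=20, k=4)
print(round(pooled_adj, 5), round(ordinary_adj, 4))
```

Under these assumptions the pooled value reproduces the reported 0.96287, and the ordinary regression gives about 0.7636.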

ACKNOWLEDGEMENT

I am grateful to Dr. M.A. ANWAR for his helpful comments. I am also grateful to Mr. A.W. AWAD for revising the paper linguistically, and I should like to thank Mr. MEDHAT for his help in calculating and typing the paper.

REFERENCES

(1) Bancroft, T.A. (1944) "On Biases in Estimation Due to the Use of Preliminary Tests of Significance". Annals of Mathematical Statistics 15, 190-204.

(2) Bancroft, T.A. (1964) "Analysis and Inference for Incompletely Specified Models Involving the Use of Preliminary Test(s) of Significance". Biometrics 20, 427-442.

(3) Han, C.P. and Bancroft, T.A. (1968) "On Pooling Means When Variance is Unknown". J.A.S.A. 63, 1333-1342.

(4) Kmenta, J. (1971) "Elements of Econometrics". Macmillan Co., N.Y., p. 565.

(5) Hey, J.D. (1973) "Statistics in Economics". Praeger Pub., p. 316.

(6) Johnson, J.P., Bancroft, T.A. and Han, C.P. (1977) "A Pooling Methodology for Regressions in Prediction". Biometrics 33, 57-67.

(7) Larson, H.J. and Barr, D.B. (1972) "A Note on Pooling Regression Estimates". American Statistician 26, No. 5, 35.

(8) Mosteller, F. (1948) "On Pooling Data". J.A.S.A. 43, 231-242.
