Source: summit.sfu.ca/system/files/iritems1/4166/b13525062.pdf
TRANSCRIPT
-
THE ERRORS IN VARIABLES MODEL: ESTIMATION OF
THE LINEAR STRUCTURAL RELATION

by

Stephen Werner
B.Sc., Simon Fraser University, 1969

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF
THE REQUIREMENTS FOR THE DEGREE OF
MASTER OF SCIENCE
in the Department
of
Mathematics

(C) STEPHEN WERNER 1973
SIMON FRASER UNIVERSITY
March 1973

All rights reserved. This thesis may not be
reproduced in whole or in part, by photocopy
or other means, without permission of the author.
-
APPROVAL

Name: Stephen Werner

Degree: Master of Science

Title of Thesis: The Errors in Variables Model: Estimation of the
Linear Structural Relation

Examining Committee:

Chairman: C. Y. Shen

C. Villegas
Senior Supervisor

D. Eaves

R. Rennie

D. Mallory
External Examiner

Date Approved: 16, 1973
-
PARTIAL COPYRIGHT LICENSE

I hereby grant to Simon Fraser University the right to lend my thesis or dissertation (the title of which is shown below) to users of the Simon Fraser University Library, and to make partial or single copies only for such users or in response to a request from the library of any other university, or other educational institution, on its own behalf or for one of its users. I further agree that permission for multiple copying of this thesis for scholarly purposes may be granted by me or the Dean of Graduate Studies. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Title of Thesis/Dissertation: The Errors in Variables Model: Estimation of the Linear Structural Relation
-
ABSTRACT

Consider the linear relation Y = α + βX where α and β are constants which we wish to estimate from a sample of n pairs of observations. We cannot directly measure X or Y because of errors of observation; rather we measure x = X + ε and y = Y + δ. The random variables ε and δ are normally distributed with mean zero and finite variances σ1² and σ2² respectively. If X is also normally distributed (independently of ε and δ) with mean zero and variance σx², we are in a situation known as the linear structural model. If the different values of X are considered as additional parameters we are in a situation known as the linear functional model. Chiefly we will deal with the structural model, and it is the case that no parameters of this model can be consistently estimated. The purpose of this paper is to show why this is so and to show what is required in the way of extra knowledge or assumptions in order that we may consistently estimate these parameters. We are chiefly concerned with estimating β, and we give estimates for the various cases as they arise. In many cases an asymptotic variance of the estimate is also given. The last chapter of the paper is essentially concerned with confidence intervals for β.
-
ACKNOWLEDGMENT

I would like to take this opportunity to express my gratitude to Professor Cesareo Villegas for suggesting the topic and for his help and encouragement while preparing this thesis. I would also like to thank Simon Fraser University and the Simon Fraser University President's research grant for their financial support. Finally, I want to thank Mrs. A. Gerencser for typing the entire work.
-
TABLE OF CONTENTS

Title Page
Approval
Abstract
Acknowledgment
Table of Contents
List of Tables

CHAPTER 1  Introduction
  1.2  The Classical Least Squares Solution
  1.3  The Two Sub-Models of the Errors in Variables Model
    1.3.1  The Functional Model
    1.3.2  The Structural Model
  1.4  Example

CHAPTER 2  Least Squares and Maximum Likelihood
  2.1  Identifiability of the Parameters
  2.2  Least Squares Estimation
  2.3  Maximum Likelihood Estimation
    2.3.1  Knowledge of one error variance
    2.3.2  Knowledge of the ratio λ = σ2²/σ1²
    2.3.3  Both σ1² and σ2² known
    2.3.4  When α is known
  2.4  Example

CHAPTER 3  Estimates Derived from Grouping the Data
  3.2  Example
-
CHAPTER 4  Instrumental Variables
  4.1  One Instrumental Variable Observed Without Error
  4.2  Two Instrumental Variables Observed With Error
  4.3  Example

CHAPTER 5  Controlling the Observations

CHAPTER 6  Cumulants

CHAPTER 7  The Analysis of Variance
  7.1  Replication of the Observations
  7.2  The Analysis of Variance for an Instrumental Variable
  7.3  The Analysis of Variance for the Method of Grouping
  7.4  Example

CHAPTER 8  Confidence Intervals and Tests of Hypotheses
  8.1  Confidence Interval for β, No Extra Information
  8.2  The Case When λ is Known
  8.3  The Case When Both σ1² and σ2² are Known
  8.4  The Use of Instrumental Variables
  8.5  Confidence Intervals Based on Wald's Method
  8.6  Confidence Intervals Using Three Groups
  8.7  Testing Equality of Lines Derived from Several Runs in the Berkson Model
  8.8  Confidence Intervals for the Replicated Case

BIBLIOGRAPHY
-
LIST OF TABLES

Table 1.1  Values of x and y
Table 1.2  Replicated Values of x and y
Table 2.1  Identification of β
Table 2.2  Example
Table 3.1  Optimum Proportions
Table 3.2  Example
Table 7.1  Anovar Table for Replication
Table 7.2  Anovar for Regression with an Instrumental Variable
Table 7.3  Example
-
CHAPTER 1

INTRODUCTION

Consider a linear relation Y = α + βX between two unobservable variables X and Y, where α and β are unknown constants. The purpose of this paper is to present various estimates of this relation. In particular we will be concerned with estimating β, the slope.

A typical experiment will consist of n observations (xᵢ, yᵢ) where

    xᵢ = Xᵢ + εᵢ,   yᵢ = Yᵢ + δᵢ,   i = 1, 2, ..., n.

We assume that the εᵢ are identically and independently distributed with mean zero and finite variance σ1². Similarly the δᵢ will have mean zero and finite variance σ2². Unless otherwise stated both ε and δ will be assumed to follow a normal distribution and be independent. The true values Xᵢ and Yᵢ will always be assumed independent of εᵢ and δᵢ, for every i.
The following are well known definitions which will be useful throughout the balance of this paper. Let θ denote an arbitrary parameter and let θ̂ₙ be an estimate of θ based upon the random sample x₁, x₂, ..., xₙ.

Definition 1: A consistent estimate θ̂ₙ of θ is one which converges in probability to θ. That is, θ̂ₙ is consistent if, for every ε > 0,

    lim (n → ∞) P(|θ̂ₙ − θ| > ε) = 0.

Definition 2: θ̂ₙ is an unbiased estimate of θ if E(θ̂ₙ), the expected value of θ̂ₙ, is equal to θ. It is not necessarily true that a consistent estimate is unbiased.

Definition 3: θ̂ₙ is a sufficient statistic if, given the value of θ̂ₙ, the conditional distribution of x₁, x₂, ..., xₙ is independent of θ. In other words, once θ̂ₙ is known, we get no extra knowledge on the value of θ by having complete knowledge of the sample values.
Definition 4: Let X₁ and X₂ be real-valued continuous random variables with distribution functions F₁(·) and F₂(·) respectively. If, for every real number z,

    F₃(z) = ∫ F₁(z − w) dF₂(w),

we say that F₃ is the convolution of F₁ and F₂ and write F₃ = F₁ * F₂. It is well known that if X₁ and X₂ are independent then F₃ is the distribution of X₁ + X₂.

Definition 5: We say that a distribution function F₁ is divisible by a distribution function F₂ if there exists a distribution function F₃ such that F₁ = F₂ * F₃.
1.2 The Classical Least Squares Solution

Let x and y be real valued random variables with finite second moments, perhaps independent, defined on the same probability space of reference. Let α and β be constants and consider the random variable Z = α + βx.

The regression line of y on x is defined as that line y = α + βx where α and β are chosen so that E(y − Z)² is minimized¹. Let d = y − Z and consider the minimization of E(d²).

Let E(x) = μ and E(y) = ν. Let σx² = E(x − μ)² and σy² = E(y − ν)². Thus

    E(d²) = σy² + β²σx² − 2β Cov(x, y) + (α − ν + βμ)².     (1.2)

The only term involving α is (α − ν + βμ)², and hence E(d²) is minimized with respect to α by putting (α − ν + βμ) = 0. Thus we have

    α = ν − βμ.

Substituting this value of α into equation (1.2), differentiating with respect to β and setting equal to zero gives

    β = Cov(x, y) / σx².

The least squares regression line of y on x is then

    y = ν + [Cov(x, y) / σx²] (x − μ).

¹ This is not the usual definition of a regression curve, E(y|x), although it is true that when x and y have a bivariate normal distribution this regression line will coincide with the regression curve of y on x.
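The minimization above can be checked numerically. The following Python sketch treats a small finite set of points (illustrative values, not from the thesis) as the whole probability space, computes α = ν − βμ and β = Cov(x, y)/σx², and confirms that perturbing either coefficient only increases E(d²).

```python
# Sketch: verify that alpha = nu - beta*mu and beta = Cov(x, y)/Var(x)
# minimize E[(y - alpha - beta*x)^2] over a small finite "population".
pts = [(1.0, 2.9), (2.0, 5.2), (3.0, 6.8), (4.0, 9.1), (5.0, 11.0)]
n = len(pts)

mu = sum(x for x, _ in pts) / n                       # E(x)
nu = sum(y for _, y in pts) / n                       # E(y)
var_x = sum((x - mu) ** 2 for x, _ in pts) / n        # sigma_x^2
cov_xy = sum((x - mu) * (y - nu) for x, y in pts) / n

beta = cov_xy / var_x        # the minimizing slope
alpha = nu - beta * mu       # the minimizing intercept

def mean_sq_dev(a, b):
    """E(d^2) with d = y - a - b*x over the finite population."""
    return sum((y - a - b * x) ** 2 for x, y in pts) / n

best = mean_sq_dev(alpha, beta)
# Any perturbation of (alpha, beta) can only increase E(d^2).
for da, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    assert best <= mean_sq_dev(alpha + da, beta + db)
```

Since E(d²) is a strictly convex quadratic in (α, β), the stationary point found by the two derivative conditions is the unique minimum, which is what the perturbation check exhibits.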
-
We know that Cov(x, y) and σx² can be consistently estimated by

    Sxy = n⁻¹ Σ (xᵢ − x̄)(yᵢ − ȳ)

and

    Sx² = n⁻¹ Σ (xᵢ − x̄)²

respectively; thus

    b = Sxy / Sx²     (1.7)

is a consistent estimate of β. We also see that

    a = ȳ − b x̄     (1.8)

is a consistent estimate of α.

The expressions in (1.7) and (1.8) are the values that minimize Σ (yᵢ − a − b xᵢ)², and for this reason are called the least squares estimates of α and β. No errors have been associated with either x or y and no claim has been made of a linear relation between them. The underlying model for this case is y = α + βx + δ, where δ is a random variable with E(δ) = E(xδ) = 0 and represents random variation associated with the data.
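Estimates (1.7) and (1.8) can be computed directly. The sketch below uses made-up illustrative data and verifies the defining property of the least squares estimates: the residuals satisfy the normal equations, i.e. they are orthogonal to the constant and to x.

```python
# Least squares estimates (1.7) and (1.8): b = Sxy/Sx^2, a = ybar - b*xbar.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9]
n = len(xs)

xbar = sum(xs) / n
ybar = sum(ys) / n
Sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / n
Sx2 = sum((x - xbar) ** 2 for x in xs) / n

b = Sxy / Sx2        # (1.7)
a = ybar - b * xbar  # (1.8)

# (a, b) minimize sum((y - a - b*x)^2), so the residuals satisfy
# the normal equations: they sum to zero and are orthogonal to x.
res = [y - a - b * x for x, y in zip(xs, ys)]
assert abs(sum(res)) < 1e-9
assert abs(sum(r * x for r, x in zip(res, xs))) < 1e-9
```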
Let Y and X be linearly related random variables with Y = α + βX, and let Y be unobservable. Our observations are Xᵢ and yᵢ, for i = 1, 2, ..., n, where yᵢ = Yᵢ + δᵢ, E(δ) = 0 and Y and δ are independent. The underlying model for the least squares regression line of y on X is then y = α + βX + δ. It is clear that, with X
-
replacing x above, these two models may be considered identical. Therefore the statistics a and b as given by (1.7) and (1.8) are consistent estimates of α and β.

From the symmetry of the model we may derive similar estimates when X is observed with error and Y is not, this line being called the regression line of x on Y, where x and Y are now our observations.

When both Y and X are subject to error, attempts have been made to compute both of the regression lines of y on x and x on y (which, in general, are different) and "average" them to get an estimate of the true line Y = α + βX. While it is true that β lies between the slopes of these two lines, we will not be able to find it in this way.
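The bracketing fact quoted above can be seen in a simulation. The sketch below uses assumed parameters (β = 2, σx² = 4, both error variances equal to 1; none of these are the thesis's example values) and checks that the slope of the y-on-x line falls below β while the slope, in the (x, y) plane, of the x-on-y line falls above it.

```python
# Sketch (assumed parameters): with errors in both variables, the
# y-on-x and x-on-y regression slopes bracket the true beta.
import random

rng = random.Random(1)
beta_true, alpha_true = 2.0, 1.0
n = 50_000
data = []
for _ in range(n):
    X = rng.gauss(10.0, 2.0)                      # sigma_X^2 = 4
    Y = alpha_true + beta_true * X
    data.append((X + rng.gauss(0.0, 1.0),         # x = X + eps
                 Y + rng.gauss(0.0, 1.0)))        # y = Y + delta

xbar = sum(x for x, _ in data) / n
ybar = sum(y for _, y in data) / n
Sxy = sum((x - xbar) * (y - ybar) for x, y in data) / n
Sx2 = sum((x - xbar) ** 2 for x, _ in data) / n
Sy2 = sum((y - ybar) ** 2 for _, y in data) / n

slope_y_on_x = Sxy / Sx2   # attenuated below beta
slope_x_on_y = Sy2 / Sxy   # slope, in the (x, y) plane, of the x-on-y line
assert slope_y_on_x < beta_true < slope_x_on_y
```

Knowing only that β lies somewhere in the interval gives no rule for where, which is why "averaging" the two lines cannot recover β.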
1.3 The Two Sub-Models of the Errors in Variables Model

As of yet we have not indicated how the true values X and Y behave. There are two basic models, both satisfying the assumptions of the first section, which we consider. They are the functional model and the structural model, the latter being the chief concern of this paper.
1.3.1 The Functional Model

In this model the true values X and Y are considered to be fixed (non-random) or mathematical variables, both subject to errors of observation. It is the case that X takes on a set of fixed, unknown values X₁, X₂, ..., Xₙ, called "incidental parameters" by Neyman and Scott [45]. Although we will not especially consider this case of the errors in variables model in the balance of the paper, it is worth noting that when we have replicated observations, i.e. when for each i we take Nᵢ additional observations on Xᵢ and Yᵢ (see chapter 7), this model is essentially no different from the model to be described below.
There is an interesting paper by Solari [55] which shows that when α = 0 and the maximum likelihood equations are solved we achieve, not a maximum, but a saddle point, and that no maximum likelihood solution exists. She presumes that this will also be the case for α not equal to zero.

For a fuller discussion on the functional case, refer to the Bibliography, especially Kendall [29, 30], Sprent [56] and Villegas [59, 60, 61].
1.3.2 The Structural Model

The model we describe here will be the underlying model for the balance of this paper, although from time to time some of the basic assumptions will be altered.

The chief difference between the structural and the functional models is that in the structural case the true values are random variables. Our basic model has X (and thus Y) following a normal distribution. Let E(X) = μ and E(Y) = ν, and let X have a finite variance σx². From a random sample (x₁, y₁), (x₂, y₂), ..., (xₙ, yₙ) of size n we see that

    E(x) = μ,
    E(y) = ν = α + βμ,
    Var(x) = σx² + σ1²,
    Var(y) = β²σx² + σ2²,
    Cov(x, y) = βσx².     (1.9)

From equations (1.9) we see that the model has six unknown parameters: α, β, μ, σx², σ1² and σ2². Since x and y have a bivariate normal distribution with parameters μ, ν, Var(x), Var(y) and Cov(x, y), it is clear that even perfect information of these parameters will not be sufficient to provide information on the parameters of the structural model.
Thus the basic structural model, as it stands, is unidentifiable, and all that we are able to estimate is μ, ν, Var(x), Var(y) and Cov(x, y). Our real interest is in estimating β and α, and unless we are given some additional knowledge, or are prepared to alter the model, we cannot do this.

This paper deals then with ways and means of estimating the linear structural relation in those cases in which it is possible to do so, giving those cases which do and do not lead to consistent estimates. It is clear that if b is a consistent estimate of β then a = ȳ − b x̄ is a consistent estimate of α. Thus estimates for α are tied up in estimates for β and we shall not consider them further.

It should be noted at this point that use of the words "structural" and "functional" has not been fixed in the literature; for instance Lindley [37, 38] uses the word functional to denote what we call structural models. It is often difficult to distinguish between the two cases and although the differences may be quite minor, in fact often giving the same numerical results, it is not correct to force data to an inappropriate model. This illustrates an often neglected rule in statistics: never pick a model or decide on the type of inferences to be made after the data has been collected; there is a good chance of introducing a serious bias into the results. The correct time to choose a model is before any observations are made; see Acton [1] on this point.
-
1.4 Example

To illustrate the various estimates given in the following chapters we give an artificially generated example. Since our main interest is the estimation of the parameters α and β, and since the best estimate of α is a = ȳ − b x̄, where b is the estimate of β, we only give the calculated value of b for the various estimates.

All of the data was drawn from random normal tables [47] and transformed so that E(X) = 10, σx² = 4, E(ε) = E(δ) = 0, σ1² = .04 and σ2² = .0625. The line chosen was Y = 2 + 2X, and 30 values of X were obtained, from which we calculated 30 values of Y. We chose 60 values each for ε and δ, the last 30 of which were used only for chapter 7, in which we need replicated observations.

In the tables below we give only the computed values of x = X + ε and y = Y + δ because, in an actual experiment, these would be our only observations.

Table 1.1 VALUES OF x AND y
[table values not reproduced in this transcript]
-
The next table only applies for replicated observations; the order of the observations is the same as in the table above. For example, the i-j entries in Tables (1.1) and (1.2) both correspond to the same true value Xᵢⱼ.

Table 1.2 REPLICATED VALUES OF x AND y
[table values not reproduced in this transcript]

We will require the following statistics and thus list them here. The statistics apply only to Table 1.1; they are not required for the replicated data.
[the computed statistics are not reproduced in this transcript]
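The original table values (drawn from printed random normal tables) did not survive in this transcript. The sketch below regenerates a dataset with the same design so that the later estimates can at least be tried on data of the stated form; the particular numbers it produces are of course not the thesis's values.

```python
# Regenerate the design of section 1.4: X ~ N(10, 4), Y = 2 + 2X,
# x = X + eps with Var(eps) = .04, y = Y + delta with Var(delta) = .0625.
import math
import random

rng = random.Random(1973)
n = 30
X = [rng.gauss(10.0, 2.0) for _ in range(n)]   # sd 2, so variance 4
Y = [2.0 + 2.0 * Xi for Xi in X]
x = [Xi + rng.gauss(0.0, math.sqrt(0.04)) for Xi in X]
y = [Yi + rng.gauss(0.0, math.sqrt(0.0625)) for Yi in Y]

# In a real experiment only (x, y) would be seen.
xbar = sum(x) / n
ybar = sum(y) / n
```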
-
CHAPTER 2

LEAST SQUARES AND MAXIMUM LIKELIHOOD

2.1 Identifiability of the Parameters

Let us consider the basic structural model as outlined in section 1.3. It was mentioned that estimation of β is not possible with the model as outlined; we now consider why this is so.

The five moments given in section 1.3 completely determine a bivariate normal distribution, and the parameters may be estimated by the sample moments, which are sufficient statistics. These estimates are

    x̄  = n⁻¹ Σ xᵢ,
    ȳ  = n⁻¹ Σ yᵢ,
    Sx² = n⁻¹ Σ (xᵢ − x̄)²,
    Sy² = n⁻¹ Σ (yᵢ − ȳ)²,
    Sxy = n⁻¹ Σ (xᵢ − x̄)(yᵢ − ȳ),     (2.1)

all sums running over i = 1, 2, ..., n.
As previously pointed out, we cannot estimate our six unknown parameters with these five equations; unless we can somehow "assign" a value to at least one, no further estimation is possible.

The first two of the equations in section 1.3 do not contribute any information in estimating the other parameters. Thus we drop them and consider

    Var(x) = σx² + σ1²,
    Var(y) = β²σx² + σ2²,
    Cov(x, y) = βσx².     (2.2)
-
The parameters in (2.2) are unidentifiable, meaning that they cannot be determined uniquely from the joint distribution of our observed variables. Following the terminology of Reiersøl [50] we shall refer to a structure when our parameters and distributions in the model have been specified. If P(x, y) denotes the distribution of our observed variables there will exist an infinity of structures generating P(x, y). These structures are called equivalent in the sense that they all generate the same distribution P(x, y), but the parameters do not necessarily have the same value in each structure. For example, if x and y are jointly distributed as above with E(x) = μ, E(y) = ν, Var(x) = 3, Var(y) = 9 and Cov(x, y) = 4, then we find that both of the structures [the two numerical structures are not reproduced in this transcript] lead to the same joint distribution P(x, y) of x and y.

If S₁ is any structure, then an equivalent structure S₂ generating the same P(x, y) may be formed by taking γ ≠ 0 such that β + γ ≠ 0 and γ < σ2² β⁻¹ σx⁻². S₂ is then formed (Moran [41]) by replacing σx², σ1², σ2², β and α with βσx²(β + γ)⁻¹, σ1² + γσx²(β + γ)⁻¹, σ2² − βγσx², β + γ and α − γμ respectively.
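Moran's substitution can be checked directly. The sketch below (with assumed parameter values chosen to satisfy the stated conditions on γ) applies the replacement and confirms that all five observable moments of (1.9) are unchanged.

```python
# Verify Moran's equivalent-structure substitution: both structures
# generate identical observable moments (E(x), E(y), Var(x), Var(y), Cov).
def observed_moments(alpha, beta, mu, sx2, s12, s22):
    """Moments of (x, y) implied by a structure, per equations (1.9)."""
    return (mu,                      # E(x)
            alpha + beta * mu,       # E(y)
            sx2 + s12,               # Var(x)
            beta ** 2 * sx2 + s22,   # Var(y)
            beta * sx2)              # Cov(x, y)

alpha, beta, mu = 2.0, 2.0, 10.0
sx2, s12, s22 = 4.0, 1.0, 3.0
gamma = 0.25      # gamma != 0, beta + gamma != 0, gamma < s22/(beta*sx2)

S1 = observed_moments(alpha, beta, mu, sx2, s12, s22)
S2 = observed_moments(alpha - gamma * mu,
                      beta + gamma,
                      mu,
                      beta * sx2 / (beta + gamma),
                      s12 + gamma * sx2 / (beta + gamma),
                      s22 - beta * gamma * sx2)

assert all(abs(u - v) < 1e-12 for u, v in zip(S1, S2))
```

Since the two structures have different slopes (β and β + γ) yet the same observable moments, no amount of data can distinguish them, which is the content of unidentifiability.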
-
We say that a parameter is identifiable if all equivalent structures lead to the same value of the parameter. Thus we say that the parameters in this model, in particular β, are not identifiable.

We now consider three theorems which will tell us under what conditions β is an identifiable parameter. The proofs, which are straightforward, may be found in Reiersøl [50] and (theorem 1 only) in Reiersøl and Koopmans [35].

Theorem 1. If ε and δ are normally distributed, not necessarily independent, β is identifiable if and only if neither X nor Y is normally distributed.

Theorem 2. When β is identifiable the other parameters are also identifiable if and only if neither X nor Y is divisible by a normal distribution (see definition 5, chapter 1) and exactly one of ε and δ is identically zero.

Theorem 3. When ε and δ are independent and X is normally distributed then β is identifiable if and only if the distributions of neither ε nor δ are divisible by a normal distribution.

We may list the identifiability of β in the various cases with the following table given by Reiersøl [50].
-
Table 2.1 IDENTIFICATION OF β

  CASE                                                      Conclusion on β
  β = 0 or β = ∞                                            β not identifiable
  β ≠ 0, β finite:
    X not normally distributed                              β identifiable
    X normally distributed, neither P(ε) nor P(δ)
      divisible by a normal distribution                    β identifiable
    X normally distributed, either P(ε) or P(δ)
      divisible by a normal distribution                    β not identifiable

It is clear that if x and y are independent then we may assume that X and Y are constant and that β may have any value. If X and Y are not independent then β is not zero or infinite. Thus we have the conclusion "β not identifiable if β = 0 or β = ∞" in Table 2.1.
-
2.2 Least Squares Estimation

A survey of the errors in variables model would be incomplete without mention of some of the many least squares estimates proposed over the years, for least squares has been pursued by almost everyone concerned with a regression problem. The basic idea is to minimize a sum of squares (possibly weighted) of residuals in some direction. The sum of absolute values has also been considered, but does not lend itself to calculus, being discontinuous at the origin, and we shall not consider it further.

When only one variable, Y say, is observed with error, the estimate b derived in chapter 1 is an efficient and consistent estimate of β. When, however, X is also subject to error, b is neither consistent nor unbiased. An exception to this is when we have replicated observations; however, this will be deferred to a later chapter.
Divide the numerator and denominator of b in equation (2.3) by n. The denominator converges in probability to σx² + σ1². The numerator may be written as

    n⁻¹ Σ (xᵢ − μ)(yᵢ − ν) − (x̄ − μ) n⁻¹ Σ (yᵢ − ν) − (ȳ − ν) n⁻¹ Σ (xᵢ − μ) + (x̄ − μ)(ȳ − ν).

The expressions (x̄ − μ) n⁻¹ Σ (yᵢ − ν), (ȳ − ν) n⁻¹ Σ (xᵢ − μ) and (x̄ − μ)(ȳ − ν) all converge in probability to zero. Thus the numerator converges in probability to the same limit as the first expression. This limit is βσx², which we may write as

    plim b = βσx² / (σx² + σ1²).     (2.4)

Richardson and Wu give the mean of b as [equation (2.5), not reproduced in this transcript].

The expressions in equations (2.4) and (2.5) are clearly the same. Thus b is a consistent and unbiased estimate of

    βσx² / (σx² + σ1²),

not of β.
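The attenuation limit (2.4) is easy to exhibit by simulation. The sketch below uses assumed parameters (β = 2, σx² = 4, σ1² = 1, so the limit is 1.6) and shows that the naive least squares slope settles near βσx²/(σx² + σ1²) rather than near β.

```python
# Attenuation of the naive least squares slope under errors in x:
# b -> beta * sx2 / (sx2 + s12) in probability, not beta.
import random

rng = random.Random(7)
beta, sx2, s12 = 2.0, 4.0, 1.0
n = 100_000
pairs = []
for _ in range(n):
    X = rng.gauss(0.0, sx2 ** 0.5)
    pairs.append((X + rng.gauss(0.0, s12 ** 0.5),      # x = X + eps
                  beta * X + rng.gauss(0.0, 0.5)))     # y = Y + delta

xbar = sum(x for x, _ in pairs) / n
ybar = sum(y for _, y in pairs) / n
b = (sum((x - xbar) * (y - ybar) for x, y in pairs) / n) / \
    (sum((x - xbar) ** 2 for x, _ in pairs) / n)

plim_b = beta * sx2 / (sx2 + s12)   # = 1.6 here, while beta = 2
assert abs(b - plim_b) < 0.05
assert b < beta
```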
We now consider some of the attempts made to take the errors of the x observations into account. It should be noted that all of the estimates in the balance of this section (except for the last one) will, in general, be inconsistent.

It should perhaps be mentioned at this point that consistency is not an important property in small samples (Madansky [39]), since we are never too sure just how close b is to β. Consistency relates to identifiability in the sense that if no consistent estimate exists then the parameters are not identifiable, i.e. we have too many parameters.
One of the earliest authors, Adcock [2], suggested minimizing the sums of squares of the normal distances from the observed points to the true line. Pearson [46] called this the major axis of the correlation ellipse, making an angle θ with the X-axis, where

    tan 2θ = 2μ11 / (μ20 − μ02),

where the μᵢⱼ denote population moments. Solving for θ we see that

    β = tan θ = [(μ02 − μ20) + √((μ20 − μ02)² + 4μ11²)] / (2μ11).     (2.7)
-
From equations (2.2) we see that sgn(β) = sgn(μ11), and hence in equation (2.7) we will take the positive square root, as the negative root would imply sgn(β) = −sgn(μ11). The population constants μ11, μ20 and μ02 may be estimated from the sample moments M11, M20 and M02 respectively, where M11 = Sxy, M20 = Sx² and M02 = Sy².

Let T be the estimate of θ. We thus have as estimates of β and α

    b = tan T = [(M02 − M20) + √((M20 − M02)² + 4M11²)] / (2M11)     (2.8)

and a = ȳ − b x̄.

The standard deviations of these estimates are given by Kermack and Haldane [33] [expressions not reproduced in this transcript], where r, the sample correlation coefficient, is given by

    r = Sxy / (Sx Sy).
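The sample version (2.8) can be cross-checked against the geometric definition: the major axis is the leading principal axis of the sample covariance matrix. The sketch below (illustrative data) computes b from (2.8) and then verifies that (1, b) is an eigenvector of the moment matrix belonging to its larger eigenvalue.

```python
# Major axis slope, equation (2.8), cross-checked against the leading
# eigenvector of the sample moment matrix [[M20, M11], [M11, M02]].
import math

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 2.1, 2.9, 4.2, 4.8]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
M20 = sum((x - xbar) ** 2 for x in xs) / n
M02 = sum((y - ybar) ** 2 for y in ys) / n
M11 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / n

b = ((M02 - M20) + math.sqrt((M20 - M02) ** 2 + 4 * M11 ** 2)) / (2 * M11)

# (1, b) should be an eigen-direction of the moment matrix ...
v = (1.0, b)
Av = (M20 * v[0] + M11 * v[1], M11 * v[0] + M02 * v[1])
assert abs(Av[0] * v[1] - Av[1] * v[0]) < 1e-9   # Av parallel to v
# ... belonging to the larger eigenvalue (the major, not minor, axis).
lam = Av[0] / v[0]
assert lam >= (M20 + M02) - lam                  # lam >= the other eigenvalue
```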
It was implicitly assumed for the above estimate that the error variances were the same. In the next section we will also achieve this result when we assume λ = σ2²/σ1² known to be unity.

The disadvantage with this estimate is that it is not invariant under changes of scale, although it is invariant under rotation. In practice the former is usually the more important. It was suggested by Jones [27] and Teissier [57] (see Kermack and Haldane [33]) that the coordinates be standardized to overcome this problem. Thus we transform

    x′ᵢ = (xᵢ − x̄)/Sx,   y′ᵢ = (yᵢ − ȳ)/Sy,

and hence we make μ20 = μ02 = 1 and tan θ (see equation (2.8)) identically unity. In terms of the original data the line becomes

    (y − ȳ)/Sy = ±(x − x̄)/Sx,     (2.9)

the sign of the slope being that of μ11. This line (2.9) is called "the reduced major axis" by Kermack and Haldane [33], who give the standard deviations of b and a as [expressions not reproduced in this transcript].
-
For illustration let us solve the least squares estimate when we assume λ = σ2²/σ1² is known. We will not assume λ = 1, but our solution below will be seen to be the same as eqn. (2.8) if we substitute λ = 1 into it.

To allow for more generality we will allow the variance of yᵢ − α − βxᵢ to vary with i. In this case we minimize the sum

    S = Σ wᵢ (yᵢ − α − β xᵢ)²,     (2.10)

where the wᵢ are inversely proportional to the variance of yᵢ − α − βxᵢ given X (Deming [16], cf. also Kummel [36]), the constant of proportionality being independent of i.

We follow the method of Lindley [37] and minimize S in equation (2.10). Since λ is fixed,

    Var(yᵢ − α − βxᵢ) = σ2² + β²σ1² = σ1²(λ + β²).

Let wᵢ = 1/(λ + β²) for convenience. Differentiating S with respect to β, setting this equal to zero and letting â = ȳ − β̂ x̄ gives

    β̂ = [Sy² − λSx² + √((Sy² − λSx²)² + 4λ Sxy²)] / (2 Sxy),     (2.11)

where we have again taken the positive square root in order that β̂ and Cov(x, y) have the same sign. It can be shown that this value β̂ does indeed correspond to a minimum value of S.
In the terminology of equation (2.8) the above equation becomes

    b = [(M02 − λM20) + √((M02 − λM20)² + 4λ M11²)] / (2 M11).

This estimate of β is consistent, and in fact is the only consistent estimate given in this section unless it is the case that λ = 1. It is interesting to note that, as early as 1879, Kummel [36] minimized a weighted sum of squares and achieved a result, the same as equation (2.11), which agreed with Adcock's estimate only when the error variances were equal. In spite of this there are many more least squares estimates in the literature which are not consistent and which ignore the error variances. Some of these are amazingly complex, an example being York [64], who has possibly the most difficult estimate to compute, relying as it does on iterative methods.

For papers generalizing least squares to the multivariate cases I refer to Sprent [56] and Villegas [59, 60, 61]. These papers consider only consistent estimates.
-
2.3 Maximum Likelihood Estimation

In order to estimate the parameters in equations (2.2), the only information being that of equations (2.1), we have shown, via Reiersøl's theorems, that more information is required. The last three equations of (2.1) are the maximum likelihood solutions of the parameters on the left hand side of equations (2.2). This section will be devoted to maximum likelihood estimates of β where we are either provided with additional information on the error variances or are prepared to make certain assumptions regarding them.
2.3.1 Knowledge of one error variance

(A) σ1² is known.

In this case σx² = Var(x) − σ1² can be estimated by Sx² − σ1², and we may estimate β by

    β̂ = Sxy / (Sx² − σ1²).     (2.12)

(B) σ2² is known.

The problem is symmetrical in σ1² and σ2², and hence we estimate

    β̂ = (Sy² − σ2²) / Sxy.     (2.13)

In the first case there is a positive probability that the known value of σ1² could turn out larger than the estimate Sx² of Var(x), implying that σ̂x² < 0, which is impossible. If this should happen the procedure does not give an estimate of β. Similarly in (B) it could happen that Sy² < σ2² and again we cannot estimate β by this method. The probability of this happening in either case will tend to zero as the sample size increases.
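Both estimates can be tried on simulated data. The sketch below uses assumed parameters (β = 2, σx² = 4, σ1² = 0.5, σ2² = 0.8; not the thesis's example values), computes (2.12) and (2.13), and checks that each lands near β, guarding against the failure case Sx² ≤ σ1² noted above.

```python
# Sketch (assumed parameters): beta estimated with one error variance
# known, (A) beta = Sxy/(Sx^2 - s12) and (B) beta = (Sy^2 - s22)/Sxy.
import random

rng = random.Random(11)
beta, sx2, s12, s22 = 2.0, 4.0, 0.5, 0.8
n = 100_000
obs = []
for _ in range(n):
    X = rng.gauss(5.0, sx2 ** 0.5)
    obs.append((X + rng.gauss(0.0, s12 ** 0.5),
                1.0 + beta * X + rng.gauss(0.0, s22 ** 0.5)))

xbar = sum(x for x, _ in obs) / n
ybar = sum(y for _, y in obs) / n
Sx2 = sum((x - xbar) ** 2 for x, _ in obs) / n
Sy2 = sum((y - ybar) ** 2 for _, y in obs) / n
Sxy = sum((x - xbar) * (y - ybar) for x, y in obs) / n

# (2.12) fails outright if Sx2 <= s12 (negative implied variance).
beta_A = Sxy / (Sx2 - s12) if Sx2 > s12 else None
beta_B = (Sy2 - s22) / Sxy

assert beta_A is not None
assert abs(beta_A - beta) < 0.08
assert abs(beta_B - beta) < 0.08
```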
2.3.2 Knowledge of the ratio λ = σ2²/σ1²

The inconsistency that may arise in section (2.3.1) cannot occur in this case; perhaps this is why this case is the most popular for study in the literature.

With λ known, equations (2.2) become

    Var(x) = σx² + σ1²,   Var(y) = β²σx² + λσ1²,   Cov(x, y) = βσx²,     (2.14)

whence β² Cov(x, y) + β(λ Var(x) − Var(y)) − λ Cov(x, y) = 0, and thus we estimate

    β̂ = [Sy² − λSx² + √((Sy² − λSx²)² + 4λ Sxy²)] / (2 Sxy).     (2.15)

As before we take the positive square root in order that β̂ will have the correct sign, that of Cov(x, y). Equation (2.15) can be seen to be the same as that achieved by using Lindley's weighted least-squares estimate of the preceding section.
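A numerical check of (2.15), with assumed parameters (β = 2, σx² = 4, σ1² = 0.5, λ = 1.6): the sketch below verifies both that β̂ solves the quadratic displayed above and that it lands near the true β.

```python
# Sketch (assumed parameters): estimate (2.15) with lambda = s22/s12 known.
import math
import random

rng = random.Random(42)
beta, sx2, s12 = 2.0, 4.0, 0.5
lam = 1.6                       # lambda = s22/s12
s22 = lam * s12
n = 100_000
obs = []
for _ in range(n):
    X = rng.gauss(0.0, sx2 ** 0.5)
    obs.append((X + rng.gauss(0.0, s12 ** 0.5),
                beta * X + rng.gauss(0.0, s22 ** 0.5)))

xbar = sum(x for x, _ in obs) / n
ybar = sum(y for _, y in obs) / n
Sx2 = sum((x - xbar) ** 2 for x, _ in obs) / n
Sy2 = sum((y - ybar) ** 2 for _, y in obs) / n
Sxy = sum((x - xbar) * (y - ybar) for x, y in obs) / n

beta_hat = (Sy2 - lam * Sx2
            + math.sqrt((Sy2 - lam * Sx2) ** 2 + 4 * lam * Sxy ** 2)) / (2 * Sxy)

# beta_hat is the positive root of b^2*Sxy + b*(lam*Sx2 - Sy2) - lam*Sxy = 0 ...
q = beta_hat ** 2 * Sxy + beta_hat * (lam * Sx2 - Sy2) - lam * Sxy
assert abs(q) < 1e-6
# ... and is consistent for beta.
assert abs(beta_hat - beta) < 0.08
```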
2.3.3. Both σ₁² and σ₂² known

In this case we could use one of (2.12), (2.13) or (2.15) to give us an estimate of β. We could also use (Madansky [39]) the geometric mean of (2.13) and (2.14),

where sgn(β̂) is chosen to be the same as sgn(Cov(x, y)). Thus we have four (usually different) estimates for β, a result which did not endear itself to many people. This situation therefore became known as the "overidentified" case, leading some (e.g. Allen [3]) to recommend that knowledge of λ is to be preferred. Kendall and Stuart [32] say that this case is quite unmanageable since we must obtain the maximum likelihood by solving three equations for the two unknowns, β and σ_X². Kiefer, in his review of this book, does not agree and mentions that the maximum likelihood equations for β, σ_X², α and μ can be solved by maximizing the likelihood with respect to the four parameters. Following this, Barnett [4] differentiates the log-likelihood with respect to α, β, μ and σ_X² and arrives at the equations,
again ± is used to denote sgn(Cov(x, y)).

It is interesting to note that this estimate for β is the same as that when λ is known. The estimate σ̂_X² is not the same as Cov(x, y)/β̂, which is the best we could achieve with just λ known.

In this case, with both σ₁² and σ₂² known, the same difficulties mentioned in section (2.3.1) may arise. If so, this method fails to provide an estimate.
When both error variances are known, Madansky [39] states that we may consider the case where Cov(ε, δ) ≠ 0 and that (2.16) is the maximum likelihood estimate of β. He says that we may estimate Cov(ε, δ) = ρσ₁σ₂. This is not correct (cf. Moran [41]), for only β² is identifiable. To see this let ρ ≠ 0 and consider the last equation of (2.2), which now becomes

and hence it is no longer true that sgn(β) = sgn(Cov(x, y)).

If sgn(ρ) is known then the magnitude of ρ and β itself are identifiable; however, from equation (2.19) we see that sgn(ρ) is not necessarily that of Cov(x, y) and it is unlikely that sgn(ρ) will be forthcoming.
If we assume ρ ≠ 0 and that we know σ₁², we can identify only σ_X², while if only λ is known no parameters are identifiable. It is thus wise when planning an experiment to attempt to keep Cov(ε, δ) = 0 if we plan on using maximum likelihood estimation.
2.3.4. When α is known

If α is known then Y = α + βX passes through the point X = 0, Y = α, and by translation of the coordinate axes we may make the line pass through the origin. We may take

as a consistent estimate of β so long as μ ≠ 0. A test for μ = 0 should of course be made before estimates are taken of any of the parameters, for if μ = 0 they would not be consistent.
There is a danger that could arise in this situation, namely when we are only concerned with approximating the true relation, which may be non-linear, by a linear relation over a certain range. For example, we may know that the true relation passes through the origin while we are only concerned with the range a < X < b, a > 0. In this event the true relation may be anything but linear in the vicinity of the origin, and using α = 0 could seriously affect our results.
We shall defer to a later chapter the study of replication (another form of additional knowledge) using least squares and maximum likelihood methods.

In this chapter, and in most others, we give in tabular form the numerical values of the various estimates, applied to the data of chapter 1. Unless comment seems to be required for our results we shall let the table stand on its own merits. We see from table 2.2 that the best estimates of this chapter gave rise to the best numerical values.
Table 2.2. EXAMPLE

EQUATION   METHOD OF ESTIMATION                              ESTIMATE OF β
2.3        Least squares, no error in X
2.8        Minimize normal distances
2.9        Standardized coordinates
2.12       Max. likelihood, σ₁² = .04 known
2.13       Max. likelihood, σ₂² = .0625 known
2.15       Max. likelihood, λ = 1.5625 known
2.16       Max. likelihood (I), σ₁² = .04, σ₂² = .0625
2.17       Max. likelihood (II), σ₁² = .04, σ₂² = .0625
2.20       α known, α = 2
-
CHAPTER 3
ESTIMATES DERIVED FROM GROUPING THE DATA
Grouping estimates are loosely based on the idea that to define a straight line only two points are required. We form two groups, take the means (x̄₁, ȳ₁) and (x̄₂, ȳ₂) of each, and choose the line through these two points.

Some estimates require that some of the data be dropped, this data forming a third group. To maintain consistency we shall denote our groups G₁, G₂ and G₃. The data to be dropped, if any, will comprise G₂; otherwise G₂ will be empty.

The theorems in the previous chapter tell us that if our parameters are not identifiable they remain so no matter how we may rearrange the data. With this in mind we now investigate Wald's method [62], which was the first published, although the paper by Nair and Shrivastava [43] may have been written concurrently or perhaps earlier.
Wald states that the line Y = α + βX can be estimated in certain cases from the observed values of x and y without knowledge of σ₁² and σ₂². The estimates are all consistent. These cases occur when the following four assumptions are satisfied, the fourth being known as "Wald's condition".

(1) The error terms ε₁, ε₂, ..., εₙ are independently and identically distributed with finite variance σ₁², as are the δ₁, δ₂, ..., δₙ with finite variance σ₂².

(2) E(εᵢ δⱼ) = 0 for all i and for all j.

(3) There exists a single linear relation between the true values X and Y, i.e. Yᵢ = α + β Xᵢ.

(4) lim inf (1/n) |(X₁ + ... + Xₘ) − (Xₘ₊₁ + ... + Xₙ)| > 0,

where for convenience we let n = 2m be even. Let

If the above four conditions are satisfied we will estimate β by
Let

and

From assumption (3) we have that

We now show that b = b₂/b₁ is a consistent estimate of β if our four assumptions are satisfied. From Yᵢ = α + β Xᵢ we have:

The variance of (1/n)[(δ₁ + ... + δₘ) − (δₘ₊₁ + ... + δₙ)] is (1/n²)(n σ₂²) = σ₂²/n, and the variance of (1/n)[(ε₁ + ... + εₘ) − (εₘ₊₁ + ... + εₙ)] is σ₁²/n, both of these converging in probability to zero. Applying assumption (4) to both the numerator and denominator ensures that b converges in probability to β, i.e. b is consistent. Since b is consistent we have that a = ȳ − b x̄ is a consistent estimate of α.
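Wald's two-group procedure can be sketched as follows. This is our own illustration (the function name is ours), assuming the observations arrive already divided into the two groups, i.e. the first m and the last m pairs:

```python
def wald_estimate(x, y):
    """Wald's two-group estimates of slope and intercept:
    split the n = 2m pairs into a first and a second group and
    take the line through the two group means; the intercept is
    then a = ybar - b * xbar."""
    n = len(x)
    m = n // 2
    mean = lambda s: sum(s) / len(s)
    # slope of the line through the two group means
    b = (mean(y[m:]) - mean(y[:m])) / (mean(x[m:]) - mean(x[:m]))
    a = mean(y) - b * mean(x)
    return a, b
```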
We now turn to estimating σ₁² and σ₂². Let

The equations (3.4) represent what would be sample estimates of σ_X², σ_Y² and Cov(X, Y), as given in equations (2.2), if we actually knew the true values Xᵢ and Yᵢ. Let S_x², S_y² and S_xy be defined as in equations (2.1). Then

Equations (3.5) may be proved as follows, recalling that the estimates (2.1) are in fact biased. From equations (2.2) we have

but

and

Thus from (3.7) and (3.8) equation (3.6) becomes

This proves the first equation of (3.5), and the second equation may be shown in an exactly similar way. The third equation follows easily from assumption (2) and from the assumption of independence between error terms and true values.

From assumption (3) we know that

Thus from the last of equations (3.5) and from (3.9)

Substituting equations (3.10) into the first two of equations (3.5) gives

We have shown that b is a consistent estimate of β, and hence it is clear that the expressions

and

converge in probability to σ₁² and σ₂² respectively. Thus (3.12) are consistent estimates of equations (3.11); because of the n/(n−1) adjustment they are also unbiased.
Although our estimates are consistent if assumption (4) is satisfied, they may not be the most effective that we could derive. Our observations were divided into two groups G₁ and G₃, where (xᵢ, yᵢ) ∈ G₁ if i ≤ m and (xᵢ, yᵢ) ∈ G₃ if i > m. It is true that any other division of the observations will also give consistent estimates, so long as the grouping is performed independently of the errors εᵢ and δᵢ, and so long as condition (4) remains satisfied. We consider now how we may improve our estimates.

We obtain a better estimate by finding that estimate which will give us the shortest confidence interval for β (Wald [62]). We will see in chapter 8 that the shortest confidence interval arises when |b₁| is a maximum. From equations (3.1) we see that |b₁| is maximized by ordering the observations, renumbering where necessary, x₁ ≤ x₂ ≤ ... ≤ xₙ. Thus G₁ is the set {(x₁, y₁), (x₂, y₂), ..., (xₘ, yₘ)} and G₃ the set {(xₘ₊₁, yₘ₊₁), ..., (xₙ, yₙ)}. Depending as it does on the values x₁, x₂, ..., xₙ, it is unlikely that this grouping will be independent of the errors ε₁, ε₂, ..., εₙ. If we knew the relative sizes of the true values X₁, X₂, ..., Xₙ (more on this later) we could order the true values X₁ ≤ X₂ ≤ ... ≤ Xₙ, again renumbering where necessary, and let G₁ = {(xᵢ, yᵢ) | Xᵢ ∈ {X₁, X₂, ..., Xₘ}} and G₃ = {(xᵢ, yᵢ) | Xᵢ ∈ {Xₘ₊₁, Xₘ₊₂, ..., Xₙ}}. This grouping is entirely independent of the errors.

The two groupings, ordering x and ordering X, will be identical in the case where the range of ε is the finite interval [−c, c] and all of the observed values x₁, x₂, ..., xₙ fall outside of the interval [x′ − c, x′ + c], where x′ denotes the median of x₁, x₂, ..., xₙ. In this case we may order the x's with confidence that we have performed the grouping independently of the errors. In practice we may order in this way if there exists c > 0 such that P[|ε| ≥ c] is small and the number of xᵢ in [x′ − c, x′ + c] is also small.

Let b′ and b″ be the estimates of β obtained by ordering the x₁, ..., xₙ and the X₁, ..., Xₙ respectively. We consider the case where b″ is not known, as may often be the case, and will now find upper and lower bounds for b″.

If ε is normal, let v² be the sum of the squared residuals in the x-direction divided by the degrees of freedom. A good estimate of c will then be 3v, and the interval [−c, c] may be considered as a possible range for ε. If ε is not normal it may be wise to increase c to as much as c = 5v.
Let S be the set of all possible groupings which satisfy the following conditions

where x′ is again the median value of the x's.

For each grouping g ∈ S calculate b, and let b* and b** be the minimum and maximum values obtained, respectively. Since the X-ordered grouping is in S, we therefore have b* and b** as lower and upper limits, respectively, of b″.

Wald gives a condition which, if satisfied, will imply that the expression in assumption (4) will not converge stochastically to zero. This condition is that there exists X ∈ R such that

where [−c, c] is the range of ε.
If X does not have this property, as it obviously doesn't when X, ε and δ are normal, it may happen that for every grouping defined independently of the errors the expression in assumption (4) converges stochastically to zero. There is one case where the expression does not converge to zero even though our variables are normally distributed, and that is where the order of the X's is known. We can never be sure of the order by merely looking at the data after the experiment, but sometimes in the laboratory we can set up the equipment so that the true X is, say, increased from observation to observation. If this is so, then the ordering given to the observed x's is merely the order of their occurrence. We see then that E(x₁ + x₂ + ... + xₘ) < E(xₘ₊₁ + ... + xₙ) and that b₁ will not tend to zero. Thus we achieve, perhaps not the most efficient estimate, but at least a consistent estimate. This is yet another verification of the truism that the time to begin the statistical analysis is before, not after, the experiment is performed.
We have dealt with Wald's method in some detail because it was the first of the grouping estimators and because it is a fairly simple and straightforward procedure. It is also quite commonly misunderstood and, to quote Moran [40], "caused a considerable amount of confusion in the literature ...". The main difficulty, as might be expected, is dividing the observations so that the distributions of the errors are unaffected.
In the in teres ts of increased efficiency Bar t l e t t [6] divided the --, . . . . - . ---.-. .
observations in to three groups. This was also done by Nair and Str ivis tava
[ 431 (cf . a l so Nair and Bannerjee [ 421) but we w i l l follow the outline of -- .- ---- - - .
Bar t l e t t . Consider the uniform model i n which x = X i s observed without e r ror
and spaced a t equidistant unit intervals . In t h i s case the ordinary l eas t
squares estimate
w i l l provide an unbiased estimate of 6 with an e r ro r variance of 2 - 2
02/E (xi-X) . I f we l e t n = 2k+1 then it can be shown by induction tha t - 2
1 (xi-X) = !?, (!?,+I) (2k+1) . The observations are s p l i t i n t o three groups 3
where the two end groups each have k elements, k i s as close t o n/3
as possible. Bar t le t t then uses the estimate
for β. For locating the line Bartlett has it pass through the overall mean (x̄, ȳ), while Nair and Shrivastava use (3.13) to locate the line as well as to estimate the slope.

Estimate (3.13) has an error variance of

Thus the relative efficiency of b′ is

which can be shown to be a maximum when k = n/3. Thus

When k = n/2 we have
Thus, by using three groups rather than two, we have increased the relative efficiency of our estimate. The increase in efficiency, as it turns out, is approximately twenty percent. Bartlett suggests that in general k = n/3 is to be preferred to k = n/2.
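Bartlett's three-group estimate is easily sketched in code. The function below is our own illustration (names ours); it orders the pairs by x for simplicity, which, as noted above, is only legitimate when the grouping can be regarded as independent of the errors:

```python
def bartlett_estimate(x, y, k=None):
    """Bartlett's three-group slope estimate: order the pairs by x,
    put the lowest k and the highest k observations into the end
    groups (k about n/3), discard the middle group for the slope,
    and pass the line through the overall mean (xbar, ybar)."""
    n = len(x)
    if k is None:
        k = round(n / 3)          # Bartlett's recommended group size
    pairs = sorted(zip(x, y))
    lo, hi = pairs[:k], pairs[n - k:]
    xlo = sum(p[0] for p in lo) / k
    ylo = sum(p[1] for p in lo) / k
    xhi = sum(p[0] for p in hi) / k
    yhi = sum(p[1] for p in hi) / k
    b = (yhi - ylo) / (xhi - xlo)
    a = sum(y) / n - b * sum(x) / n   # line through the overall mean
    return a, b
```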
Rather than just considering X uniformly distributed, Gibson and Jowett [23] went a step further and considered several other distributions. They found that the three-group method was "surprisingly efficient", but recommended the general use of the ratio 1:2:1 for dividing the observations rather than the 1:1:1 given by Bartlett and by Nair and Banerjee. They do this since the normal distribution is the most common, the ratio 1:2:1 being optimum in this case and fairly good in other cases, although it is not too good for extreme skewness. For those specific cases where the distribution of X is known they give the following table of optimum ratios.
Table 3.1. OPTIMUM PROPORTIONS

DISTRIBUTION    RANGE PROPORTIONS    APPROXIMATE RATIOS
Normal          .27 : .46 : .27      1:2:1
Uniform         .33 : .33 : .33      1:1:1
Bell Shape      .31 : .38 : .31      3:4:3
U-Shape         .39 : .22 : .39      2:1:2
J-Shape         .45 : .40 : .15      3:3:1
Skew            .36 : .45 : .19      4:5:2
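The table's proportions generalize Bartlett's equal split. A small sketch (ours, with hypothetical names) of a three-group estimate taking arbitrary end-group proportions:

```python
def grouped_estimate(x, y, p=(0.27, 0.46, 0.27)):
    """Three-group slope estimate with arbitrary proportions, after
    Gibson and Jowett: order the pairs by x, place the lowest
    p[0]*n and the highest p[2]*n observations in the end groups,
    and take the slope of the line through the end-group means."""
    n = len(x)
    k1 = round(p[0] * n)          # size of the lower group
    k3 = round(p[2] * n)          # size of the upper group
    pairs = sorted(zip(x, y))
    lo, hi = pairs[:k1], pairs[n - k3:]
    xlo = sum(q[0] for q in lo) / k1
    ylo = sum(q[1] for q in lo) / k1
    xhi = sum(q[0] for q in hi) / k3
    yhi = sum(q[1] for q in hi) / k3
    return (yhi - ylo) / (xhi - xlo)
```

The default proportions are the table's optimum for the normal case; passing p=(1/3, 1/3, 1/3) recovers Bartlett's split.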
The final grouping estimates that we consider in this chapter are those due to Neyman and Scott [45]. We will briefly review two methods of estimation given by them, with necessary and sufficient conditions for their consistency. For both of these methods they admit that consistent estimates of β will be achieved in "very exceptional cases only". In both methods we do not necessarily assume that the errors are uncorrelated.

For the first method fix two numbers a and b such that P(x ≤ a) > 0 and P(x > b) > 0, and let Z_j and W_j be the mean values of the xᵢ and yᵢ respectively in group G_j for j = 1, 2, where G₁ contains the observations with xᵢ ≤ a and G₂ those with xᵢ > b. An estimate for β is then

b₁ = (W₂ − W₁) / (Z₂ − Z₁).

By the law of large numbers, Z_j and W_j converge in probability to E(Z_j) and E(W_j) respectively; thus the stochastic limit of b₁ is (EW₂ − EW₁)/(EZ₂ − EZ₁). The authors consider conditions for E(W₂ − W₁) = β E(Z₂ − Z₁). Let (c, d) be the shortest interval such that P(c ≤ ε ≤ d) = 1. Since E(ε) = 0 it is clear that c ≤ 0 ≤ d, and they show that necessary and sufficient conditions for the consistency of b₁ are
The second method involves fixing two proportions p₁ and p₂, with p₁ > 0, p₂ > 0 and p₁ + p₂ ≤ 1. Let Z₃ and W₃ denote the means of the xᵢ and yᵢ respectively for i = 1, 2, ..., [np₁] = r, and let Z₄ and W₄ denote the means of the xᵢ and yᵢ respectively for i = n − s + 1, n − s + 2, ..., n, where s = [np₂]. The estimate for β is then

b₂ = (W₄ − W₃) / (Z₄ − Z₃).

Neyman and Scott show that if X_{p₁} and X_{1−p₂} are points such that P(x ≤ X_{p₁}) = p₁ and P(x ≥ X_{1−p₂}) = p₂, and if (c, d) has the same meaning as above, then a necessary and sufficient condition for the consistency of b₂ is
We will also consider grouping methods when we have replicated observations and when we use the analysis of variance. These, however, will be considered in later chapters.

The most important applications of grouping methods occur when we have some extra knowledge on the position of the true values. For example, if the order of the X's is known, or if we have knowledge that our X's were achieved from two (or from k) processes (cf. Madansky [39] on this point), we may form two (or k) groups with each x in its appropriate group and be assured that, so long as ε is independent of the processes, we have grouped independently of the errors and that Wald's condition (assumption (4)) is satisfied. This does not contradict the Reiersøl theorems, for we have additional information at our disposal which can be used to give us a consistent estimate of β.
3.2. Example

The table below, giving most of the estimates of this chapter, is self-explanatory. We do not give values for the Neyman and Scott estimates because, for any values of a, b or p₁, p₂ we might reasonably choose, we would get results similar to others listed here.
Table 3.2. EXAMPLE

METHOD OF ESTIMATION                              ESTIMATE OF β
Wald's Method      : Unordered Data               1.94
                   : x-ordered Data               1.95
                   : X-ordered Data               1.97
Bartlett's Method  : Unordered Data               1.8
                   : x-ordered Data               1.97
                   : X-ordered Data               1.98
Optimum Proportion : .27 : .46 : .27              1.88
                   : x-ordered Data               1.98
                   : X-ordered Data               1.99
-
CHAPTER 4
INSTRUMENTAL VARIABLES
In this chapter we shall consider the use of additional knowledge in the form of instrumental variables. These instrumental variables form at least one set of extra data, highly correlated with X but independent of ε and δ. It is not too difficult to find variables correlated with X and Y, the so-called "investigational variables", yet it may prove difficult to have them independent of the errors.

A further problem that could arise is when the investigational and instrumental variables are so highly correlated that perhaps the instrumental variables should have been added to the relation as a third dimension. Madansky [39] gives an example of this with Y and X being, respectively, the price and the quantity available of butter. The relation is Y = α + βX, with Z₁, the price of margarine, an instrumental variable. He points out that the true relation may perhaps have been better expressed as Y = α + βX + γZ₁.

Instrumental variables were developed independently by Geary [22] and by Reiersøl [48, 49]. We consider first the simplest case, where we have but one instrumental variable.
4.1. One Instrumental Variable observed without error

Let Zᵢ, i = 1, 2, ..., n be a set of variables correlated with Xᵢ and Yᵢ but independent of εᵢ and δᵢ. We may assume that Zᵢ is observed without error, for if there is an error ηᵢ we may replace Zᵢ in what follows by Zᵢ + ηᵢ (Moran [41]). In order that the notation be a little simpler we consider the homogeneous relation, and a prime on a variable will denote measurement about its mean, e.g. Xᵢ′ = Xᵢ − X̄. Thus our relation becomes Y′ = βX′, or

β₁X′ + β₂Y′ = 0,   (4.1)

where β = −β₁/β₂.

If we multiply (4.1) by Zᵢ′/n and sum over i we realize

Consider also the analogous expression involving the observed variables

It is clear that

and

Thus E(A) = E(B) = 0, since we assume independence between Zᵢ and the error terms. Var(A) and Var(B) are both O(1/n), and hence A and B both converge to zero in probability. Thus, irrespective of the actual limiting values of Σ yᵢ′Zᵢ′, Σ Yᵢ′Zᵢ′, Σ xᵢ′Zᵢ′ and Σ Xᵢ′Zᵢ′, we see that expression (4.3) converges in probability to zero, and therefore Σ yᵢ′Zᵢ′ / Σ xᵢ′Zᵢ′ converges in probability to β. Thus

b = Σ yᵢ′Zᵢ′ / Σ xᵢ′Zᵢ′

is a consistent estimate of β = −β₁/β₂ so long as our assumption Cov(Z, x) ≠ 0 is satisfied.
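The estimate above is simple to compute. The sketch below is our own (the function name is ours), using deviations from the sample means so that the homogeneous form applies:

```python
def iv_estimate(x, y, z):
    """One-instrumental-variable slope estimate
    b = sum(y' z') / sum(x' z'), where primes denote deviations
    from the sample means; consistency requires Cov(z, x) != 0."""
    n = len(x)
    xbar, ybar, zbar = sum(x) / n, sum(y) / n, sum(z) / n
    num = sum((yi - ybar) * (zi - zbar) for yi, zi in zip(y, z))
    den = sum((xi - xbar) * (zi - zbar) for xi, zi in zip(x, z))
    return num / den
```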
Let us now compute an asymptotic variance for b for the case where the line goes through the origin. Since X and Z are correlated we may let

E(Xᵢ Zᵢ) = γ,

where γ is a constant which we can estimate, since Z is observed without error. Consider

From the law of large numbers we have that

and

The distribution of n^(−1/2) Σ(δᵢ − βεᵢ)Zᵢ converges by the Central Limit Theorem to a normal distribution with mean zero and variance

We can estimate σ_Z² since Z is observed without error and, by a well-known theorem, the distribution of (4.7) converges to a normal distribution with mean zero and variance

Thus the asymptotic variance of b is
4.2. Two Instrumental Variables observed with error

Reiersøl [49] considers the case where we have two sets Z₁ and Z₂ of instrumental variables related by γ₁Z₁ + γ₂Z₂ = 0, where γ₁ and γ₂ are known constants. We assume Z₁ and Z₂ are observed with error and that our observations are

where the random variables w₁ᵢ and w₂ᵢ have finite variances and means of zero. We assume that all errors are independent of all true values and of each other. The observations can then be represented by the quadruples (xᵢ, yᵢ, z₁ᵢ, z₂ᵢ), and we estimate β by

Let us show that b is a consistent estimate of β. Rewrite b as

If we take the expected value of the denominator of (4.11) we get

Applying the law of large numbers to (4.12) we get

since we have assumed independence of errors. Note that Z₂ = −(γ₁/γ₂)Z₁; thus the denominator converges in probability to

Applying the same technique to the numerator of (4.10) we see that

From equations (4.10), (4.12) and (4.13) we see that b converges in probability, so long as Cov(Z₁, X) ≠ 0, to

Thus b is a consistent estimate of β, so long as Cov(Z₁, X) ≠ 0.

For an example of the use of this type of instrumental variable the reader is referred to the paper by Carlson, Sobel and Watson, which is concerned with estimation in a biochemical situation.
Durbin [20] considers the one-instrumental-variable-without-error case for various choices of Z. He shows that if Zᵢ = ±1 according as xᵢ is greater or smaller than x′, the median of the xᵢ's, the method reduces to Wald's method. If we put the xᵢ's into one of three groups and let Zᵢ = −1, 0 or 1 according to the group in which xᵢ was placed, we have Bartlett's method.

He also considers ordering the xᵢ and letting Zᵢ = i, the position of xᵢ. This is a better instrumental variable if the order of the x's is that of the X's, but Durbin shows that it will still be a good choice even if the ε's are relatively large, in which case he says that the bias should be less than that for the original variables. If we can set up our experiment such that our true values are increasing then this will be a perfect instrumental variable. In this case we do not reorder the x but let Zᵢ be the order of occurrence.
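Durbin's choices of instrument can be written down explicitly. The helpers below are our own sketch (names ours) and are meant to be fed to the estimate b = Σ y′Z′ / Σ x′Z′ of section 4.1:

```python
def durbin_instrument(x):
    """Z_i = the position (1-based rank) of x_i after ordering."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    z = [0] * len(x)
    for rank, i in enumerate(order, start=1):
        z[i] = rank
    return z

def wald_instrument(x):
    """Z_i = +1 or -1 according as x_i lies above or below the
    median of the x's (reduces the IV estimate to Wald's method)."""
    xs = sorted(x)
    n = len(x)
    med = xs[n // 2] if n % 2 else (xs[n // 2 - 1] + xs[n // 2]) / 2
    return [1 if xi > med else -1 for xi in x]
```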
There is a further study on instrumental variables due to Tukey [58] which we will consider in chapter 7, on the Analysis of Variance.

4.3. Example

The concept of using two instrumental variables is not too useful in practice, and since reference has been made to an example using this case, we shall apply our example only to the case of one instrumental variable observed without error.

Estimates using Wald's and Bartlett's methods have already been given and it would be pointless to reproduce them. We consider Durbin's method of ordering the x's and letting Zᵢ = i. For comparison we will also order the X's. The results are

Zᵢ = i, order x : b = 1.977
Zᵢ = i, order X : b = 1.983.
-
CHAPTER 5
CONTROLLING THE OBSERVATIONS
In many experiments, especially under laboratory conditions, we are not so concerned with measuring X as with actually achieving X. For example, in an experiment comparing reaction rates of some chemicals with respect to temperature, it is not often that we measure the rates and see what the temperature happened to be at the time. Rather, we would pick certain temperatures, say 15°C, 20°C, etc., and measure the corresponding rates. This is an example of what Berkson [7] would call a "controlled experiment"; X, the temperature, being the controlled variable.
In the literature the model achieved when one of the variables is controlled is often referred to as the "Berkson model", since Berkson was the originator of the idea. This model makes a pleasant change from the usual errors-in-variables model since, if the conditions are satisfied, both α and β are identifiable, with β estimated by the "usual" estimate

It is also the case, as Berkson claims, that b is consistent. Unfortunately he does not show this too well and is criticized and disputed by Kendall [30]. Lindley [38], however, gives a mathematical justification of the consistency of b in this model and, following his method, we shall show that b is consistent. An almost identical solution is given by
Scheffé [52], who also considers the idea of making several runs or replications on the relation. He allows for the possibility of a different line y = aⱼ + bⱼx being the true line for each run, due to circumstances which we may or may not be able to control during the J (j = 1, 2, ..., J) runs. We let aⱼ and bⱼ be random variables with E(aⱼ) = α and E(bⱼ) = β for all j, and test hypotheses which essentially state that the lines are, in fact, the same. This will be deferred to the chapter on confidence intervals and tests of hypotheses.
In the example at the beginning of this chapter the thermometer reads the value we require, x. However, we know that error exists and that the true temperature is X while our observation is x. We know that x is not a random variable, since it was preselected, and ε, of course, is a random variable defined as before; thus X itself is a random variable and we have X = x − ε. Similar as this model may appear to the errors-in-variables model, there are in fact some rather large differences. One difference is that we not only drop the assumption of independence between X and ε but can actually point out that their correlation coefficient is −1. It is clear that in this model the words "structural" and "functional" have no relevance at all. Our observations on Y are not controlled, so y = Y + δ is the same as before. So

y = α + βx + ξ,   (5.2)
where ξ = (δ − βε) is a normally distributed random variable with zero expectation and finite variance σ₃² = β²σ₁² + σ₂². Equation (5.2) should look familiar, for it represents the classical least squares situation, with y a random variable, x a fixed or mathematical variable, and ξ a normally distributed random variable with mean zero and finite variance. As shown in chapter 1, we minimize S = Σ(yᵢ − a − b xᵢ)² in the vertical direction and achieve equation (5.1) as our consistent estimate of β. We minimize in the direction of the uncontrolled observations; thus if we had controlled y rather than x we would have minimized S in the horizontal direction.
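A small simulation (ours; names and parameter values are hypothetical) illustrates the point: with x preselected and X = x − ε, the ordinary least squares slope of y on the controlled x recovers β:

```python
import random

def berkson_ols(alpha, beta, sigma1, sigma2, n, seed=0):
    """Simulate a controlled (Berkson) experiment and fit by
    ordinary least squares of y on the controlled x."""
    rng = random.Random(seed)
    xs = [10.0 * i / n for i in range(n)]     # preselected levels
    ys = []
    for x in xs:
        X = x - rng.gauss(0.0, sigma1)        # true value; Corr(X, eps) = -1
        ys.append(alpha + beta * X + rng.gauss(0.0, sigma2))
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) \
        / sum((x - xbar) ** 2 for x in xs)
    return ybar - b * xbar, b
```

With, say, α = 2 and β = 2 and moderate error variances, the fitted slope settles near 2 as n grows, as the argument above predicts.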
There is a further interesting point about this model. Let the line achieved by controlling x and minimizing in the y-direction be y = a₁ + b₁x. This value of y is an unbiased and consistent estimate of α + βx for given x. Let the line achieved by controlling y and minimizing in the x-direction be x = a₂ + b₂y; a₂ and b₂ will have expectations of −α/β and 1/β, respectively, for given y. In fact this second line will be equal to the first except for possible sampling differences, which converge to zero as sample size increases. This fact, impressive in itself, led Berkson to claim that with his model there would be only one regression line. In general it is known that the regression problem is not invertible (see Madansky [39] for a discussion of this point) and that the two lines will be different. If we control x then the regression of y on x gives the correct solution; however, the regression of x on y will not, and it may even fail to be linear (Lindley [38]). Thus for x controlled there are two regressions, and similarly for y controlled. This point was somewhat ambiguous in Berkson's paper.
The variance of b, as given by Berkson and verified by Scheffé [52], is, if x is controlled,

which we may estimate by using (Scheffé [52])

If we control y, the variance of b is
The real value of this model is not that in some cases we can use least squares and achieve consistent estimates, but that we can apply the model before we begin the experiment. If it is at all possible to control the observations on one of the variables then we should do so, for then no problems with estimation would arise.

All the tests of hypotheses and confidence intervals that can be derived in the classical regression case are formally the same in the Berkson model. Although the theory is different, the actual results will be the same for both models. This is another advantage of the Berkson model.

Our example is not considered for the Berkson model, for the data, being drawn at random, is not applicable. We could, as some do, force the data to the model and "see what happens"; however, we would only be repeating the least squares solution for no errors in X of chapter 2.
-
CHAPTER 6
CUMULANTS
The idea of using cumulants and moments t o es t imate has been
considered by severa l authors ( c . f . [21], [28] , [3?_], WL, We i n v e s t i - . ._.__----
g a t e this method wi th p a r t i c u l a r reference t o Geary [21].
For convenience let the relation be written as

    β₁X + β₂Y = constant,                                          (6.1)

where θ = -β₁/β₂. Let

    Z = X - E[X],    W = Y - E[Y].                                 (6.2)

Thus β₁Z + β₂W = 0.

If M(h) denotes the moment generating function of a random variable and L(h) the cumulant generating function, then L(h) = log M(h), or

    M(h) = exp L(h) = exp( Σᵢ₌₁^∞ κᵢ hⁱ / i! ),

where κᵢ is the i-th cumulant. We define the cumulants R(c₁, c₂) of order c₁ + c₂ in terms of the moments of the (xᵢ, yᵢ) with the following identity in (h₁, h₂):

    log E[exp(h₁x + h₂y)] = Σ R(c₁, c₂) h₁^{c₁} h₂^{c₂} / (c₁! c₂!).   (6.3)
Since x = X + ε and y = Y + δ, and a well known theorem states that the moment generating function (m.g.f.) of a sum of independent random variables is the product of the m.g.f.'s, we have, Z and ε being independent and W and δ being independent,

    log E[exp(h₁x + h₂y)] = log E[exp(h₁Z + h₂W)] + log E[exp(h₁ε + h₂δ)].   (6.4)

Let L(c₁, c₂) denote the (c₁ + c₂) order cumulant of (Z, W) and let c₁, c₂ ≠ 0. Geary shows that, from (6.3) and (6.4),

    L(c₁, c₂) = R(c₁, c₂),

since the mixed cumulants of the independent errors ε and δ vanish when c₁, c₂ ≥ 1.
A fundamental property of cumulants is that they are invariant under change of origin if the order is at least two (one, in the univariate case). Thus there will be no difficulty in computation even though we performed the transformations in equations (6.2). In fact it was this property of cumulants that led to their original name of "semi-invariants" (Kendall and Stuart [31]).
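This invariance is easy to check numerically. The sketch below (the exponential sample is an arbitrary illustration) computes the second- and third-order sample cumulants, which for these orders equal the central moments, before and after a change of origin:

```python
import random

random.seed(1)
data = [random.expovariate(1.0) for _ in range(1000)]

def central_moment(xs, r):
    # r-th sample moment measured about the sample mean
    m = sum(xs) / len(xs)
    return sum((x - m) ** r for x in xs) / len(xs)

# Second and third cumulants equal the second and third central moments.
k2 = central_moment(data, 2)
k3 = central_moment(data, 3)

shifted = [x + 100.0 for x in data]   # change of origin
print(k2, central_moment(shifted, 2))  # identical up to rounding
print(k3, central_moment(shifted, 3))  # identical up to rounding
```

Only the first cumulant (the mean) moves with the origin; all higher-order cumulants are untouched.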
Theorem 6.1.

    β₁ L(c₁+1, c₂) + β₂ L(c₁, c₂+1) = 0,    c₁, c₂ ≥ 1.

To prove this let γ = θ⁻¹ and rewrite equation (6.1) as Z = γW; then

    E[exp(h₁Z + h₂W)] = E[exp((h₁γ + h₂)W)]                        (6.6)

is an identity in h₁ and h₂. Therefore

    Σ L(c₁, c₂) h₁^{c₁} h₂^{c₂} / (c₁! c₂!) = Σ_d L'(d) (h₁γ + h₂)^d / d!   (6.7)

is immediate from equations (6.3) and (6.6), where L'(d) denotes the d-th cumulant of W. Consider equation (6.7) and identify the coefficients of

    h₁^{c₁+1} h₂^{c₂}

on both sides of the equation. Clearly the coefficient on the left hand side is L(c₁+1, c₂)/((c₁+1)! c₂!); for the right hand side let d = c₁ + c₂ + 1 and consider the expansion

    (h₁γ + h₂)^d = Σ_k (d choose k) γ^k h₁^k h₂^{d-k}

and its term for k = c₁+1. Thus the required term of the right hand side is

    L'(d) (d choose c₁+1) γ^{c₁+1} h₁^{c₁+1} h₂^{c₂} / d!

and our required coefficient is

    γ^{c₁+1} L'(d) / ((c₁+1)! c₂!).

Let us now identify the coefficients of h₁^{c₁} h₂^{c₂+1} on both sides of equation (6.7). On the left hand side the coefficient is clearly L(c₁, c₂+1)/(c₁! (c₂+1)!); for the right hand side let d = c₁ + c₂ + 1 as before and take the term k = c₁ of the expansion of (h₁γ + h₂)^d, so that the required coefficient is

    γ^{c₁} L'(d) / (c₁! (c₂+1)!).

Equating coefficients on the two sides gives

    L(c₁+1, c₂) = γ^{c₁+1} L'(d)    and    L(c₁, c₂+1) = γ^{c₁} L'(d),

so that L(c₁+1, c₂) = γ L(c₁, c₂+1). Since γ = θ⁻¹ = -β₂/β₁, this is

    β₁ L(c₁+1, c₂) + β₂ L(c₁, c₂+1) = 0,

which proves Theorem 6.1.
Therefore

    β = θ = L(c₁, c₂+1) / L(c₁+1, c₂)                              (6.10)

and may be estimated by

    b = R̂(c₁, c₂+1) / R̂(c₁+1, c₂),                                (6.11)

where R̂ denotes the corresponding sample cumulant. Often a "k" is used in place of the "R̂", i.e.

    k(c₁, c₂) = R̂(c₁, c₂),                                        (6.12)

in which case the statistics in (6.12) are known as "k-statistics" (Kaplan [28]).
For convenience we now give the values of up to fourth order cumulants in terms of the moments measured about the means:

    L(1) = M(1) = 0,
    L(2) = M(2),
    L(3) = M(3),                                                   (6.13)
    L(4) = M(4) - 3M(2)²,

with the corresponding bivariate relations L(1,1) = M(1,1), L(2,1) = M(2,1), L(1,2) = M(1,2) and L(2,2) = M(2,2) - M(2,0)M(0,2) - 2M(1,1)².
The sample k-statistics are unbiased and consistent estimates of the population cumulants; thus the quotient (6.11) converges in probability to β of (6.10), so long as R̂(c₁+1, c₂) does not tend to zero, i.e. L(c₁+1, c₂) ≠ 0. Thus so long as L(c₁+1, c₂) ≠ 0 we have an infinity of consistent estimates of β.
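As an illustration of the quotient (6.11), take c₁ = c₂ = 1, so that b = k(1,2)/k(2,1); for order three, cumulants equal central moments. The sketch below (all parameter values are assumptions for illustration) uses a skewed X, since a non-vanishing third cumulant is essential:

```python
import random

random.seed(2)

# Illustrative structural model (assumed values): Y = beta*X with skewed X.
beta = 2.0
n = 200_000
X = [random.expovariate(1.0) for _ in range(n)]        # skewed true values: third cumulant != 0
x = [Xi + random.gauss(0.0, 0.5) for Xi in X]          # observed x = X + eps
y = [beta * Xi + random.gauss(0.0, 0.5) for Xi in X]   # observed y = Y + delta

mx = sum(x) / n
my = sum(y) / n
# Third-order bivariate cumulants equal the corresponding central moments:
k21 = sum((u - mx) ** 2 * (v - my) for u, v in zip(x, y)) / n   # k(2,1)
k12 = sum((u - mx) * (v - my) ** 2 for u, v in zip(x, y)) / n   # k(1,2)

b_hat = k12 / k21   # the quotient (6.11) with c1 = c2 = 1
print(b_hat)        # close to beta = 2
```

Since the independent errors contribute nothing to the mixed third-order cumulants, the ratio converges to β; with a normal X both k(1,2) and k(2,1) would tend to zero and the ratio would be useless, as the text goes on to note.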
In our basic structural model we are concerned with normal distributions which, as is well known, have vanishing cumulants if the order is three or higher. Since c₁ + c₂ + 1 is always at least three we thus know that the cumulants will vanish and that, in the normal case, the use of cumulants is of no value whatsoever.
In the normal case then cumulants will not be used for estimation; unfortunately they are not much better for the non-normal cases. When we compute moments (and therefore cumulants, see equations (6.13)) of order higher than four we are almost wasting our efforts, unless we have a very large number of observations, for the inaccuracy becomes quite unmanageable (Kendall and Stuart [31]). It must also be remembered that for symmetric distributions all odd order cumulants are zero, and all in all it must be stated that estimation via cumulants will generally be unsatisfactory.

We do not estimate β for our example via the results of this chapter. Any results that we could get would not have any value, since X is normally distributed and this is precisely the case where we do not use cumulants.
CHAPTER 7
THE ANALYSIS OF VARIANCE
7.1. Replication of the observations
We have seen that when we know σ₁², σ₂², or λ = σ₂²/σ₁², no problem arises in estimating the linear relation. We consider now the idea of being able to estimate at least one of these parameters from another type of additional knowledge: replication.
Replication has not been considered much in the literature. Dorf and Gurland [17] feel that this may be because those actually involved in experiments do not themselves replicate their observations. Why this should be so is not too clear, since Villegas [59] points out that there is no difficulty in identification nor in achieving consistent estimates of the parameters when replication is available.
Following the solution of Hays [24] we shall derive, by way of the analysis of variance, three estimates due to Tukey [58] along with a fourth estimate given by Dorf and Gurland [17], who also derive the first three but with the assumption that Cov(ε, δ) = 0. This assumption we will drop, and we give all four estimates for the case that ε and δ may or may not be independent. Although this chapter is mainly concerned with the Analysis of Variance, this section will also consider other estimates for the case of replication.
It is convenient to change the notation somewhat and say that we have n "treatments" with Nᵢ, i = 1, 2, ..., n, observations on each treatment, the true values being (Xᵢ, Yᵢ). Thus if (X₁, Y₁), (X₂, Y₂), ..., (Xₙ, Yₙ) are the true values of our n treatments then the observations will be

    xᵢⱼ = Xᵢ + εᵢⱼ,    yᵢⱼ = Yᵢ + δᵢⱼ,    j = 1, 2, ..., Nᵢ.

To keep generality we do not assume that all the Nᵢ are equal, although we will assume that Nᵢ ≥ 2 for at least one i. This assures us that we do indeed have replicated observations.
Define aᵢ as the effect of treatment i, so that aᵢ = Xᵢ - μ; then aᵢ is a normally distributed random variable with mean zero and variance σₓ². Define the following quantities:

    Mₐ = (1/N) Σᵢ Nᵢ aᵢ,    M_ε = (1/N) Σᵢⱼ εᵢⱼ,    where N = Σᵢ Nᵢ.

Let SST denote the total sum of squares,

    SST = Σᵢⱼ (xᵢⱼ - x̄)².
Thus the SST can be broken down into two components: the variation due to both treatments and chance, Σᵢ Nᵢ(x̄ᵢ - x̄)², which we call the sum of squares between (SSB), and the variation due to chance alone, which we call the sum of squares within (SSW). Let us consider SSB. We show first that

    SSB = Σᵢ Nᵢ (aᵢ - Mₐ + M_{εi} - M_ε)²,

where M_{εi} = (1/Nᵢ) Σⱼ εᵢⱼ. Since xᵢⱼ = μ + aᵢ + εᵢⱼ we have

    x̄ᵢ = μ + aᵢ + M_{εi}

and

    x̄ = μ + Mₐ + M_ε.

∴ SSB = Σᵢ Nᵢ (μ + aᵢ + M_{εi} - μ - Mₐ - M_ε)², since the μ's cancel.
We define the mean square between (MSB) as SSB divided by degrees of freedom; there are n-1 d.f. for SSB. The expected value of the mean square between (EMSB) is then

    EMSB = (1/(n-1)) { E[Σᵢ Nᵢ(aᵢ - Mₐ)²] + (n-1) σ₁² }.           (7.2)

When the Nᵢ are all equal to m for all i then

    E[Σᵢ Nᵢ(aᵢ - Mₐ)²] = m(n-1) σₓ²                                 (7.3)

from the definitions of aᵢ and Mₐ. When the Nᵢ are not all equal, Snedecor and Cochran [54] give

    E[Σᵢ Nᵢ(aᵢ - Mₐ)²] = (N - Σᵢ Nᵢ²/N) σₓ².                       (7.4)

Since the expressions in (7.4) and (7.3) are identical when the Nᵢ are all equal we shall use (7.4) in either case; thus we maintain the generality of having the Nᵢ not constant. It may also be shown that

    E[Σᵢ Nᵢ(M_{εi} - M_ε)²] = (n-1) σ₁².                            (7.5)

From equations (7.4) and (7.5) we have that

    EMSB = σ₁² + [(N² - Σᵢ Nᵢ²)/(N(n-1))] σₓ².

We will denote this expression Bₓₓ.
The mean square within (MSW) is defined as the SSW divided by degrees of freedom. There are N-n degrees of freedom for SSW.
To evaluate the expected mean square within (EMSW) let i be fixed and let xᵢ₁, xᵢ₂, ..., x_{iNᵢ} be the replicated observations on Xᵢ. If we only consider treatment i we may estimate Xᵢ by x̄ᵢ and σ₁² by

    (1/(Nᵢ - 1)) Σⱼ (xᵢⱼ - x̄ᵢ)².

Using all of the observations we can estimate σ₁² by Σᵢⱼ (xᵢⱼ - x̄ᵢ)² divided by degrees of freedom; clearly we have Σᵢ (Nᵢ - 1) = N - n d.f. Therefore we estimate σ₁² by

    (1/(N - n)) Σᵢⱼ (xᵢⱼ - x̄ᵢ)².                                   (7.7)

This expression is the same as the MSW; thus MSW estimates σ₁² and EMSW = σ₁². We call this quantity Wₓₓ.
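The identities EMSW = σ₁² and EMSB = σ₁² + mσₓ² can be checked by a small simulation; in the sketch below all parameter values are assumptions for illustration, and equal replication Nᵢ = m is taken for brevity:

```python
import random

random.seed(3)

# Assumed parameters for this sketch:
mu, sigma_x, sigma_1 = 5.0, 2.0, 1.0
n, m = 2000, 5                       # n treatments, m replications each
N = n * m

a = [random.gauss(0.0, sigma_x) for _ in range(n)]   # treatment effects a_i
x = [[mu + a[i] + random.gauss(0.0, sigma_1) for _ in range(m)] for i in range(n)]

grand = sum(sum(row) for row in x) / N
means = [sum(row) / m for row in x]

SSB = sum(m * (means[i] - grand) ** 2 for i in range(n))
SSW = sum((xij - means[i]) ** 2 for i in range(n) for xij in x[i])

MSB = SSB / (n - 1)   # estimates sigma_1^2 + m*sigma_x^2  (B_xx)
MSW = SSW / (N - n)   # estimates sigma_1^2                (W_xx)
print(MSB, MSW)
```

With these values MSB should sit near σ₁² + mσₓ² = 21 and MSW near σ₁² = 1; (MSB - MSW)/m then estimates σₓ².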
In an analogous manner we may calculate the variation due to the y's (B_yy and W_yy) and that due to both x and y (B_xy and W_xy). We tabulate these results in table 7.1, where c = (N² - Σᵢ Nᵢ²)/(N(n-1)).

Table 7.1. ANOVAR TABLE FOR REPLICATION

    SOURCE OF VARIATION    MEAN SQUARE    EXPECTED MEAN SQUARE
    Between, x             Bₓₓ            σ₁² + c σₓ²
    Between, y             B_yy           σ₂² + c β² σₓ²
    Between, xy            B_xy           Cov(ε, δ) + c β σₓ²
    Within, x              Wₓₓ            σ₁²
    Within, y              W_yy           σ₂²
    Within, xy             W_xy           Cov(ε, δ)
On inspection of the expected mean squares we find three estimates of β, all converging in probability to β as n → ∞ and Nᵢ → ∞ for at least one i, so long as the respective denominators do not converge to zero. Thus the following three estimates are all consistent:

    b₁ = (B_xy - W_xy) / (Bₓₓ - Wₓₓ),
    b₂ = (B_yy - W_yy) / (B_xy - W_xy),
    b₃ = ± [ (B_yy - W_yy) / (Bₓₓ - Wₓₓ) ]^{1/2}.

We do not necessarily assume Cov(ε, δ) = 0; if Cov(ε, δ) ≠ 0, however, then (sgn β) is not necessarily that of Cov(x, y) (see chapter 2, section 2.3.3). We may obtain (sgn β) from Cov(x, y) = β σₓ² + Cov(ε, δ) using W_xy, or from a plotting of the observations.
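The three estimates can be sketched numerically as follows (parameter values are illustrative assumptions; the errors are simulated with Cov(ε, δ) ≠ 0 to exercise the general case, and equal replication is taken for brevity):

```python
import random

random.seed(4)

# Assumed parameters for this sketch:
beta, alpha = 1.5, 0.5
sigma_x, s1, s2, rho = 2.0, 0.8, 0.6, 0.5   # rho = Corr(eps, delta) != 0
n, m = 3000, 4
N = n * m

xs, ys = [], []
for _ in range(n):
    Xi = random.gauss(0.0, sigma_x)
    Yi = alpha + beta * Xi
    row_x, row_y = [], []
    for _ in range(m):
        e = random.gauss(0.0, s1)
        # delta correlated with eps so that Cov(eps, delta) = rho*s1*s2:
        d = rho * (s2 / s1) * e + (1 - rho ** 2) ** 0.5 * random.gauss(0.0, s2)
        row_x.append(Xi + e)
        row_y.append(Yi + d)
    xs.append(row_x)
    ys.append(row_y)

def B_and_W(u, v):
    # between and within mean cross-products, equal replication m
    gu = sum(map(sum, u)) / N
    gv = sum(map(sum, v)) / N
    mu_ = [sum(r) / m for r in u]
    mv_ = [sum(r) / m for r in v]
    B = sum(m * (mu_[i] - gu) * (mv_[i] - gv) for i in range(n)) / (n - 1)
    W = sum((u[i][j] - mu_[i]) * (v[i][j] - mv_[i])
            for i in range(n) for j in range(m)) / (N - n)
    return B, W

Bxx, Wxx = B_and_W(xs, xs)
Byy, Wyy = B_and_W(ys, ys)
Bxy, Wxy = B_and_W(xs, ys)

b1 = (Bxy - Wxy) / (Bxx - Wxx)
b2 = (Byy - Wyy) / (Bxy - Wxy)
b3 = ((Byy - Wyy) / (Bxx - Wxx)) ** 0.5   # sign taken from Bxy - Wxy here
print(b1, b2, b3)   # all close to beta = 1.5
```

Note that subtracting the within mean squares removes both the error variances and Cov(ε, δ), which is why the estimates remain consistent with correlated errors.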
We may also use table 7.1 for the functional case if we modify the table slightly. In the functional case the aᵢ are not random variables, but we can interpret them as "fixed effects". The quantity σₓ² has no meaning; however, we may use the table by retaining Σᵢ Nᵢ(aᵢ - Mₐ)² as in (7.2) and substituting this directly in the table for (N - Σᵢ Nᵢ²/N) σₓ². The balance of this section is devoted to other estimates of β for the case of replication.
With replication we have an immediate estimate of λ = σ₂²/σ₁² and we can use the maximum likelihood estimate of β with λ known. This estimate is (Dorf and Gurland [17])

    b₄ = { B_yy - λ̂ Bₓₓ + [ (B_yy - λ̂ Bₓₓ)² + 4 λ̂ B_xy² ]^{1/2} } / (2 B_xy),

where

    λ̂ = W_yy / Wₓₓ.

It is clear that b₄ is consistent so long as B_xy ≠ 0. If B_xy = 0 then we may take b₄ = 0 consistently, unless B_yy - λ Bₓₓ = 0 also, in which event b₄ is indeterminate.
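A self-contained sketch of b₄ with λ̂ = W_yy/Wₓₓ follows (parameter values are assumed; independent errors and equal replication are taken here only for brevity, the quantities Bₓₓ, Wₓₓ, etc. being computed as in table 7.1):

```python
import random

random.seed(5)

# Assumed parameters for this sketch:
beta, sigma_x, s1, s2 = 1.5, 2.0, 0.8, 0.6
n, m = 3000, 4
N = n * m

xs, ys = [], []
for _ in range(n):
    Xi = random.gauss(0.0, sigma_x)
    xs.append([Xi + random.gauss(0.0, s1) for _ in range(m)])
    ys.append([beta * Xi + random.gauss(0.0, s2) for _ in range(m)])

def B_and_W(u, v):
    # between and within mean cross-products, equal replication m
    gu = sum(map(sum, u)) / N
    gv = sum(map(sum, v)) / N
    mu_ = [sum(r) / m for r in u]
    mv_ = [sum(r) / m for r in v]
    B = sum(m * (mu_[i] - gu) * (mv_[i] - gv) for i in range(n)) / (n - 1)
    W = sum((u[i][j] - mu_[i]) * (v[i][j] - mv_[i])
            for i in range(n) for j in range(m)) / (N - n)
    return B, W

Bxx, Wxx = B_and_W(xs, xs)
Byy, Wyy = B_and_W(ys, ys)
Bxy, _ = B_and_W(xs, ys)

lam = Wyy / Wxx   # immediate estimate of lambda = sigma_2^2 / sigma_1^2
b4 = (Byy - lam * Bxx
      + ((Byy - lam * Bxx) ** 2 + 4 * lam * Bxy ** 2) ** 0.5) / (2 * Bxy)
print(b4)   # close to beta = 1.5
```

Substituting the expected mean squares from table 7.1 into the formula shows that its population value is exactly β, which is the consistency claim above.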
The estimate b₄ is not really a maximum likelihood estimate of β, for we do not know λ; we only have an estimate of λ. For the replicated case Villegas [59] gives the maximum likelihood solution for β and we shall outline his method. Assume that ε and δ are not necessarily independent, an assumption that we usually like to make and are able to make in the replicated case, and let

    z̲ᵢⱼ = (xᵢⱼ, yᵢⱼ)′,    e̲ᵢⱼ = (εᵢⱼ, δᵢⱼ)′.

These are vector quantities, hence the underlining. We let the number of replications be the same for each treatment, say Nᵢ = M. Notice that N = Mn.

Let Σ be the (unknown) variance-covariance matrix of the errors e̲ᵢⱼ. The probability density for one z̲ (ignoring the constant) is

    |Σ|^{-1/2} exp[ -(1/2) (z̲ - E[z̲])′ Σ⁻¹ (z̲ - E[z̲]) ]
and therefore the likelihood function is

    L ∝ |Σ|^{-N/2} exp[ -(1/2) Σᵢⱼ (z̲ᵢⱼ - E[z̲ᵢⱼ])′ Σ⁻¹ (z̲ᵢⱼ - E[z̲ᵢⱼ]) ].

We know from table 7.1 that Σ can be estimated; let this estimate be S. Villegas shows that β̂ and