Source: summit.sfu.ca/system/files/iritems1/4166/b13525062.pdf
TRANSCRIPT
-
THE ERRORS IN VARIABLES MODEL: ESTIMATION OF
THE LINEAR STRUCTURAL RELATION

by

Stephen Werner
B.Sc., Simon Fraser University, 1969

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF
THE REQUIREMENTS FOR THE DEGREE OF
MASTER OF SCIENCE
in the Department
of
Mathematics

(C) STEPHEN WERNER 1973
SIMON FRASER UNIVERSITY
March 1973

All rights reserved. This thesis may not be
reproduced in whole or in part, by photocopy
or other means, without permission of the author.
-
APPROVAL

Name: Stephen Werner

Degree: Master of Science

Title of Thesis: The Errors in Variables Model: Estimation of the
Linear Structural Relation

Examining Committee:

Chairman: C. Y. Shen

C. Villegas
Senior Supervisor

D. Eaves

R. Rennie

D. Mallory
External Examiner

Date Approved: 16, 1973
-
PARTIAL COPYRIGHT LICENSE

I hereby grant to Simon Fraser University the right to lend my thesis or dissertation (the title of which is shown below) to users of the Simon Fraser University Library, and to make partial or single copies only for such users or in response to a request from the library of any other university, or other educational institution, on its own behalf or for one of its users. I further agree that permission for multiple copying of this thesis for scholarly purposes may be granted by me or the Dean of Graduate Studies. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Title of Thesis/Dissertation: The Errors in Variables Model: Estimation of the Linear Structural Relation
-
ABSTRACT

Consider the linear relation Y = α + βX where α and β are constants which we wish to estimate from a sample of n pairs of observations. We cannot directly measure X or Y because of errors of observation; rather we measure x = X + ε and y = Y + δ. The random variables ε and δ are normally distributed with mean zero and finite variances σ1² and σ2² respectively. If X is also normally distributed (independently of ε and δ) with mean zero and variance σx², we are in a situation known as the linear structural model. If the different values of X are considered as additional parameters we are in a situation known as the linear functional model. Chiefly we will deal with the structural model, and it is the case that no parameters of this model can be consistently estimated. The purpose of this paper is to show why this is so and to show what is required in the way of extra knowledge or assumptions in order that we may consistently estimate these parameters. We are chiefly concerned with estimating β, and we give estimates for the various cases as they arise. In many cases an asymptotic variance of the estimate is also given. The last chapter of the paper is essentially concerned with confidence intervals for β.
-
ACKNOWLEDGMENT

I would like to take this opportunity to express my gratitude to Professor Cesareo Villegas for suggesting the topic and for his help and encouragement while preparing this thesis. I would also like to thank Simon Fraser University and the Simon Fraser University President's research grant for their financial support. Finally, I want to thank Mrs. A. Gerencser for typing the entire work.
-
TABLE OF CONTENTS

Title Page
Approval
Abstract
Acknowledgment
Table of Contents
List of Tables

CHAPTER 1  Introduction
  1.2  The Classical Least Squares Solution
  1.3  The Two Sub-Models of the Errors in Variables Model
    1.3.1  The Functional Model
    1.3.2  The Structural Model
  1.4  Example

CHAPTER 2  Least Squares and Maximum Likelihood
  2.1  Identifiability of the Parameters
  2.2  Least Squares Estimation
  2.3  Maximum Likelihood Estimation
    2.3.1  Knowledge of one error variance
    2.3.2  Knowledge of the ratio λ = σ2²/σ1²
    2.3.3  Both σ1² and σ2² known
    2.3.4  When α is known
  2.4  Example

CHAPTER 3  Estimates Derived from Grouping the Data
  3.2  Example
-
CHAPTER 4  Instrumental Variables
  4.1  One Instrumental Variable Observed Without Error
  4.2  Two Instrumental Variables Observed With Error
  4.3  Example

CHAPTER 5  Controlling the Observations

CHAPTER 6  Cumulants

CHAPTER 7  The Analysis of Variance
  7.1  Replication of the Observations
  7.2  The Analysis of Variance for an Instrumental Variable
  7.3  The Analysis of Variance for the Method of Grouping
  7.4  Example

CHAPTER 8  Confidence Intervals and Tests of Hypotheses
  8.1  Confidence Interval for β, No Extra Information
  8.2  The Case When λ is Known
  8.3  The Case When Both σ1² and σ2² are Known
  8.4  The Use of Instrumental Variables
  8.5  Confidence Intervals Based on Wald's Method
  8.6  Confidence Intervals Using Three Groups
  8.7  Testing Equality of Lines Derived from Several Runs in the Berkson Model
  8.8  Confidence Intervals for the Replicated Case

BIBLIOGRAPHY
-
LIST OF TABLES

Table 1.1  Values of x and y
Table 1.2  Replicated Values of x and y
Table 2.1  Identification of β
Table 2.2  Example
Table 3.1  Optimum Proportions
Table 3.2  Example
Table 7.1  Anovar Table for Replication
Table 7.2  Anovar for Regression with an Instrumental Variable
Table 7.3  Example
-
CHAPTER 1

INTRODUCTION

Consider a linear relation Y = α + βX between two unobservable variables X and Y, where α and β are unknown constants. The purpose of this paper is to present various estimates of this relation. In particular we will be concerned with estimating β, the slope.

A typical experiment will consist of n observations (xᵢ, yᵢ) where

    xᵢ = Xᵢ + εᵢ,   yᵢ = Yᵢ + δᵢ,   i = 1, 2, ..., n.

We assume that the εᵢ are identically and independently distributed with mean zero and finite variance σ1². Similarly the δᵢ will have mean zero and finite variance σ2². Unless otherwise stated both ε and δ will be assumed to follow a normal distribution and be independent. The true values Xᵢ and Yᵢ will always be assumed independent of εᵢ and δᵢ, for every i.
The following are well known definitions which will be useful throughout the balance of this paper. Let θ denote an arbitrary parameter and let θ̂ₙ be an estimate of θ based upon the random sample x₁, x₂, ..., xₙ.

Definition 1: A consistent estimate θ̂ₙ of θ is one which converges in probability to θ. That is, θ̂ₙ is consistent if, for every ε > 0,

    lim (n → ∞) P(|θ̂ₙ − θ| > ε) = 0.

Definition 2: θ̂ₙ is an unbiased estimate of θ if E(θ̂ₙ), the expected value of θ̂ₙ, is equal to θ. It is not necessarily true that a consistent estimate is unbiased.

Definition 3: θ̂ₙ is a sufficient statistic if, given the value of θ̂ₙ, the conditional distribution of x₁, x₂, ..., xₙ is independent of θ. In other words, once θ̂ₙ is known, we get no extra knowledge on the value of θ by having complete knowledge of the sample values.
Definition 4: Let X₁ and X₂ be real-valued continuous random variables with distribution functions F₁(·) and F₂(·) respectively. If, for every real number z,

    F₃(z) = ∫ F₁(z − w) dF₂(w),

we say that F₃ is the convolution of F₁ and F₂ and write F₃ = F₁ * F₂. It is well known that if X₁ and X₂ are independent then F₃ is the distribution of X₁ + X₂.

Definition 5: We say that a distribution function F₁ is divisible by a distribution function F₂ if there exists a distribution function F₃ such that F₁ = F₂ * F₃.
1.2 The Classical Least Squares Solution

Let x and y be real valued random variables with finite second moments, perhaps independent, defined on the same probability space of reference. Let α and β be constants and consider the random variable Z = α + βx.

The regression line of y on x is defined as that line y = α + βx where α and β are chosen so that E(y − Z)² is minimized¹. Let d = y − Z and consider the minimization of E(d²).

Let E(x) = μ and E(y) = ν. Let σx² = E(x − μ)² and σy² = E(y − ν)². Thus

    E(d²) = σy² + β²σx² − 2β Cov(x, y) + (α − ν + βμ)².     (1.2)

The only term involving α is (α − ν + βμ)², and hence E(d²) is minimized with respect to α by putting (α − ν + βμ) = 0. Thus we have

    α = ν − βμ.

Substituting this value of α into equation (1.2), differentiating with respect to β and setting equal to zero gives

    β = Cov(x, y) / σx².

The least squares regression line of y on x is then

    y = ν + [Cov(x, y) / σx²] (x − μ).

¹ This is not the usual definition of a regression curve, E(y|x), although it is true that when x and y have a bivariate normal distribution this regression line will coincide with the regression curve of y on x.
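The minimization above can be checked numerically. The following Python sketch treats a small finite set of points (illustrative values, not from the thesis) as the whole probability space, computes α = ν − βμ and β = Cov(x, y)/σx², and confirms that perturbing either coefficient only increases E(d²).

```python
# Sketch: verify that alpha = nu - beta*mu and beta = Cov(x, y)/Var(x)
# minimize E[(y - alpha - beta*x)^2] over a small finite "population".
pts = [(1.0, 2.9), (2.0, 5.2), (3.0, 6.8), (4.0, 9.1), (5.0, 11.0)]
n = len(pts)

mu = sum(x for x, _ in pts) / n                       # E(x)
nu = sum(y for _, y in pts) / n                       # E(y)
var_x = sum((x - mu) ** 2 for x, _ in pts) / n        # sigma_x^2
cov_xy = sum((x - mu) * (y - nu) for x, y in pts) / n

beta = cov_xy / var_x        # the minimizing slope
alpha = nu - beta * mu       # the minimizing intercept

def mean_sq_dev(a, b):
    """E(d^2) with d = y - a - b*x over the finite population."""
    return sum((y - a - b * x) ** 2 for x, y in pts) / n

best = mean_sq_dev(alpha, beta)
# Any perturbation of (alpha, beta) can only increase E(d^2).
for da, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    assert best <= mean_sq_dev(alpha + da, beta + db)
```

Since E(d²) is a strictly convex quadratic in (α, β), the stationary point found by the two derivative conditions is the unique minimum, which is what the perturbation check exhibits.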
-
We know that Cov(x, y) and σx² can be consistently estimated by

    Sxy = n⁻¹ Σ (xᵢ − x̄)(yᵢ − ȳ)

and

    Sx² = n⁻¹ Σ (xᵢ − x̄)²

respectively; thus

    b = Sxy / Sx²     (1.7)

is a consistent estimate of β. We also see that

    a = ȳ − b x̄     (1.8)

is a consistent estimate of α.

The expressions in (1.7) and (1.8) are the values that minimize Σ (yᵢ − a − b xᵢ)², and for this reason are called the least squares estimates of α and β. No errors have been associated with either x or y and no claim has been made of a linear relation between them. The underlying model for this case is y = α + βx + δ, where δ is a random variable with E(δ) = E(xδ) = 0 and represents random variation associated with the data.
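Estimates (1.7) and (1.8) can be computed directly. The sketch below uses made-up illustrative data and verifies the defining property of the least squares estimates: the residuals satisfy the normal equations, i.e. they are orthogonal to the constant and to x.

```python
# Least squares estimates (1.7) and (1.8): b = Sxy/Sx^2, a = ybar - b*xbar.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9]
n = len(xs)

xbar = sum(xs) / n
ybar = sum(ys) / n
Sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / n
Sx2 = sum((x - xbar) ** 2 for x in xs) / n

b = Sxy / Sx2        # (1.7)
a = ybar - b * xbar  # (1.8)

# (a, b) minimize sum((y - a - b*x)^2), so the residuals satisfy
# the normal equations: they sum to zero and are orthogonal to x.
res = [y - a - b * x for x, y in zip(xs, ys)]
assert abs(sum(res)) < 1e-9
assert abs(sum(r * x for r, x in zip(res, xs))) < 1e-9
```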
Let Y and X be linearly related random variables with Y = α + βX, and let Y be unobservable. Our observations are Xᵢ and yᵢ, for i = 1, 2, ..., n, where yᵢ = Yᵢ + δᵢ, E(δ) = 0 and Y and δ are independent. The underlying model for the least squares regression line of y on X is then y = α + βX + δ. It is clear that, with X
-
replacing x above, these two models may be considered identical. Therefore the statistics a and b as given by (1.7) and (1.8) are consistent estimates of α and β.

From the symmetry of the model we may derive similar estimates when X is observed with error and Y is not, this line being called the regression line of x on Y, where x and Y are now our observations.

When both Y and X are subject to error, attempts have been made to compute both of the regression lines of y on x and x on y (which, in general, are different) and "average" them to get an estimate of the true line Y = α + βX. While it is true that β lies between the slopes of these two lines, we will not be able to find it in this way.
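The bracketing fact quoted above can be seen in a simulation. The sketch below uses assumed parameters (β = 2, σx² = 4, both error variances equal to 1; none of these are the thesis's example values) and checks that the slope of the y-on-x line falls below β while the slope, in the (x, y) plane, of the x-on-y line falls above it.

```python
# Sketch (assumed parameters): with errors in both variables, the
# y-on-x and x-on-y regression slopes bracket the true beta.
import random

rng = random.Random(1)
beta_true, alpha_true = 2.0, 1.0
n = 50_000
data = []
for _ in range(n):
    X = rng.gauss(10.0, 2.0)                      # sigma_X^2 = 4
    Y = alpha_true + beta_true * X
    data.append((X + rng.gauss(0.0, 1.0),         # x = X + eps
                 Y + rng.gauss(0.0, 1.0)))        # y = Y + delta

xbar = sum(x for x, _ in data) / n
ybar = sum(y for _, y in data) / n
Sxy = sum((x - xbar) * (y - ybar) for x, y in data) / n
Sx2 = sum((x - xbar) ** 2 for x, _ in data) / n
Sy2 = sum((y - ybar) ** 2 for _, y in data) / n

slope_y_on_x = Sxy / Sx2   # attenuated below beta
slope_x_on_y = Sy2 / Sxy   # slope, in the (x, y) plane, of the x-on-y line
assert slope_y_on_x < beta_true < slope_x_on_y
```

Knowing only that β lies somewhere in the interval gives no rule for where, which is why "averaging" the two lines cannot recover β.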
1.3 The Two Sub-Models of the Errors in Variables Model

As of yet we have not indicated how the true values X and Y behave. There are two basic models, both satisfying the assumptions of the first section, which we consider. They are the functional model and the structural model, the latter being the chief concern of this paper.
1.3.1 The Functional Model

In this model the true values X and Y are considered to be fixed (non-random) or mathematical variables, both subject to errors of observation. It is the case that X takes on a set of fixed, unknown values X₁, X₂, ..., Xₙ, called "incidental parameters" by Neyman and Scott [45]. Although we will not especially consider this case of the errors in variables model in the balance of the paper, it is worth noting that when we have replicated observations, i.e. when for each i we take Nᵢ additional observations on Xᵢ and Yᵢ (see chapter 7), this model is essentially no different from the model to be described below.
There is an interesting paper by Solari [55] which shows that when α = 0 and the maximum likelihood equations are solved we achieve, not a maximum, but a saddle point, and that no maximum likelihood solution exists. She presumes that this will also be the case for α not equal to zero.

For a fuller discussion on the functional case, refer to the Bibliography, especially Kendall [29, 30], Sprent [56] and Villegas [59, 60, 61].
1.3.2 The Structural Model

The model we describe here will be the underlying model for the balance of this paper, although from time to time some of the basic assumptions will be altered.

The chief difference between the structural and the functional models is that in the structural case the true values are random variables. Our basic model has X (and thus Y) following a normal distribution. Let E(X) = μ and E(Y) = ν, and let X have a finite variance σx². From a random sample (x₁, y₁), (x₂, y₂), ..., (xₙ, yₙ) of size n we see that

    E(x) = μ,
    E(y) = ν = α + βμ,
    Var(x) = σx² + σ1²,
    Var(y) = β²σx² + σ2²,
    Cov(x, y) = βσx².     (1.9)

From equations (1.9) we see that the model has six unknown parameters: α, β, μ, σx², σ1² and σ2². Since x and y have a bivariate normal distribution with parameters μ, ν, Var(x), Var(y) and Cov(x, y), it is clear that even perfect information of these parameters will not be sufficient to provide information on the parameters of the structural model.
Thus the basic structural model, as it stands, is unidentifiable, and all that we are able to estimate is μ, ν, Var(x), Var(y) and Cov(x, y). Our real interest is in estimating β and α, and unless we are given some additional knowledge, or are prepared to alter the model, we cannot do this.

This paper deals then with ways and means of estimating the linear structural relation in those cases in which it is possible to do so, giving those cases which do and do not lead to consistent estimates. It is clear that if b is a consistent estimate of β then a = ȳ − b x̄ is a consistent estimate of α. Thus estimates for α are tied up in estimates for β and we shall not consider them further.

It should be noted at this point that use of the words "structural" and "functional" has not been fixed in the literature; for instance Lindley [37, 38] uses the word functional to denote what we call structural models. It is often difficult to distinguish between the two cases and although the differences may be quite minor, in fact often giving the same numerical results, it is not correct to force data to an inappropriate model. This illustrates an often neglected rule in statistics: never pick a model or decide on the type of inferences to be made after the data has been collected; there is a good chance of introducing a serious bias into the results. The correct time to choose a model is before any observations are made; see Acton [1] on this point.
-
1.4 Example

To illustrate the various estimates given in the following chapters we give an artificially generated example. Since our main interest is the estimation of the parameters α and β, and since the best estimate of α is a = ȳ − b x̄, where b is the estimate of β, we only give the calculated value of b for the various estimates.

All of the data was drawn from random normal tables [47] and transformed so that E(X) = 10, σx² = 4, E(ε) = E(δ) = 0, σ1² = .04 and σ2² = .0625. The line chosen was Y = 2 + 2X, and 30 values of X were obtained, from which we calculated 30 values of Y. We chose 60 values each for ε and δ, the last 30 of which were used only for chapter 7, in which we need replicated observations.

In the tables below we give only the computed values of x = X + ε and y = Y + δ because, in an actual experiment, these would be our only observations.

Table 1.1 VALUES OF x AND y
[table values not reproduced in this transcript]
-
The next table only applies for replicated observations; the order of the observations is the same as in the table above. For example, the i-j entries in Tables (1.1) and (1.2) both correspond to the same true value Xᵢⱼ.

Table 1.2 REPLICATED VALUES OF x AND y
[table values not reproduced in this transcript]

We will require the following statistics and thus list them here. The statistics apply only to Table 1.1; they are not required for the replicated data.
[the computed statistics are not reproduced in this transcript]
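The original table values (drawn from printed random normal tables) did not survive in this transcript. The sketch below regenerates a dataset with the same design so that the later estimates can at least be tried on data of the stated form; the particular numbers it produces are of course not the thesis's values.

```python
# Regenerate the design of section 1.4: X ~ N(10, 4), Y = 2 + 2X,
# x = X + eps with Var(eps) = .04, y = Y + delta with Var(delta) = .0625.
import math
import random

rng = random.Random(1973)
n = 30
X = [rng.gauss(10.0, 2.0) for _ in range(n)]   # sd 2, so variance 4
Y = [2.0 + 2.0 * Xi for Xi in X]
x = [Xi + rng.gauss(0.0, math.sqrt(0.04)) for Xi in X]
y = [Yi + rng.gauss(0.0, math.sqrt(0.0625)) for Yi in Y]

# In a real experiment only (x, y) would be seen.
xbar = sum(x) / n
ybar = sum(y) / n
```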
-
CHAPTER 2

LEAST SQUARES AND MAXIMUM LIKELIHOOD

2.1 Identifiability of the Parameters

Let us consider the basic structural model as outlined in section 1.3. It was mentioned that estimation of β is not possible with the model as outlined; we now consider why this is so.

The five moments given in section 1.3 completely determine a bivariate normal distribution, and the parameters may be estimated by the sample moments, which are sufficient statistics. These estimates are

    x̄  = n⁻¹ Σ xᵢ,
    ȳ  = n⁻¹ Σ yᵢ,
    Sx² = n⁻¹ Σ (xᵢ − x̄)²,
    Sy² = n⁻¹ Σ (yᵢ − ȳ)²,
    Sxy = n⁻¹ Σ (xᵢ − x̄)(yᵢ − ȳ),     (2.1)

all sums running over i = 1, 2, ..., n.
As previously pointed out, we cannot estimate our six unknown parameters with these five equations; unless we can somehow "assign" a value to at least one, no further estimation is possible.

The first two of the equations in section 1.3 do not contribute any information in estimating the other parameters. Thus we drop them and consider

    Var(x) = σx² + σ1²,
    Var(y) = β²σx² + σ2²,
    Cov(x, y) = βσx².     (2.2)
-
The parameters in (2.2) are unidentifiable, meaning that they cannot be determined uniquely from the joint distribution of our observed variables. Following the terminology of Reiersøl [50] we shall refer to a structure when our parameters and distributions in the model have been specified. If P(x, y) denotes the distribution of our observed variables there will exist an infinity of structures generating P(x, y). These structures are called equivalent in the sense that they all generate the same distribution P(x, y), but the parameters do not necessarily have the same value in each structure. For example, if x and y are jointly distributed as above with E(x) = μ, E(y) = ν, Var(x) = 3, Var(y) = 9 and Cov(x, y) = 4, then we find that both of the structures [the two numerical structures are not reproduced in this transcript] lead to the same joint distribution P(x, y) of x and y.

If S₁ is any structure, then an equivalent structure S₂ generating the same P(x, y) may be formed by taking γ ≠ 0 such that β + γ ≠ 0 and γ < σ2² β⁻¹ σx⁻². S₂ is then formed (Moran [41]) by replacing σx², σ1², σ2², β and α with βσx²(β + γ)⁻¹, σ1² + γσx²(β + γ)⁻¹, σ2² − βγσx², β + γ and α − γμ respectively.
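Moran's substitution can be checked directly. The sketch below (with assumed parameter values chosen to satisfy the stated conditions on γ) applies the replacement and confirms that all five observable moments of (1.9) are unchanged.

```python
# Verify Moran's equivalent-structure substitution: both structures
# generate identical observable moments (E(x), E(y), Var(x), Var(y), Cov).
def observed_moments(alpha, beta, mu, sx2, s12, s22):
    """Moments of (x, y) implied by a structure, per equations (1.9)."""
    return (mu,                      # E(x)
            alpha + beta * mu,       # E(y)
            sx2 + s12,               # Var(x)
            beta ** 2 * sx2 + s22,   # Var(y)
            beta * sx2)              # Cov(x, y)

alpha, beta, mu = 2.0, 2.0, 10.0
sx2, s12, s22 = 4.0, 1.0, 3.0
gamma = 0.25      # gamma != 0, beta + gamma != 0, gamma < s22/(beta*sx2)

S1 = observed_moments(alpha, beta, mu, sx2, s12, s22)
S2 = observed_moments(alpha - gamma * mu,
                      beta + gamma,
                      mu,
                      beta * sx2 / (beta + gamma),
                      s12 + gamma * sx2 / (beta + gamma),
                      s22 - beta * gamma * sx2)

assert all(abs(u - v) < 1e-12 for u, v in zip(S1, S2))
```

Since the two structures have different slopes (β and β + γ) yet the same observable moments, no amount of data can distinguish them, which is the content of unidentifiability.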
-
We say that a parameter is identifiable if all equivalent structures lead to the same value of the parameter. Thus we say that the parameters in this model, in particular β, are not identifiable.

We now consider three theorems which will tell us under what conditions β is an identifiable parameter. The proofs, which are straightforward, may be found in Reiersøl [50] and (theorem 1 only) in Reiersøl and Koopmans [35].

Theorem 1. If ε and δ are normally distributed, not necessarily independent, β is identifiable if and only if neither X nor Y is normally distributed.

Theorem 2. When β is identifiable the other parameters are also identifiable if and only if neither X nor Y is divisible by a normal distribution (see definition 5, chapter 1) and exactly one of ε and δ is identically zero.

Theorem 3. When ε and δ are independent and X is normally distributed then β is identifiable if and only if the distributions of neither ε nor δ are divisible by a normal distribution.

We may list the identifiability of β in the various cases with the following table given by Reiersøl [50].
-
Table 2.1 IDENTIFICATION OF β

  CASE                                                      Conclusion on β
  β = 0 or β = ∞                                            β not identifiable
  β ≠ 0, β finite:
    X not normally distributed                              β identifiable
    X normally distributed, neither P(ε) nor P(δ)
      divisible by a normal distribution                    β identifiable
    X normally distributed, either P(ε) or P(δ)
      divisible by a normal distribution                    β not identifiable

It is clear that if x and y are independent then we may assume that X and Y are constant and that β may have any value. If X and Y are not independent then β is not zero or infinite. Thus we have the conclusion "β not identifiable if β = 0 or β = ∞" in Table 2.1.
-
2.2 Least Squares Estimation

A survey of the errors in variables model would be incomplete without mention of some of the many least squares estimates proposed over the years, for least squares has been pursued by almost everyone concerned with a regression problem. The basic idea is to minimize a sum of squares (possibly weighted) of residuals in some direction. The sum of absolute values has also been considered, but does not lend itself to calculus, being discontinuous at the origin, and we shall not consider it further.

When only one variable, Y say, is observed with error, the estimate b derived in chapter 1 is an efficient and consistent estimate of β. When, however, X is also subject to error, b is neither consistent nor unbiased. An exception to this is when we have replicated observations; however, this will be deferred to a later chapter.
Divide the numerator and denominator of b in equation (2.3) by n. The denominator converges in probability to σx² + σ1². The numerator may be written as

    n⁻¹ Σ (xᵢ − μ)(yᵢ − ν) − (x̄ − μ) n⁻¹ Σ (yᵢ − ν) − (ȳ − ν) n⁻¹ Σ (xᵢ − μ) + (x̄ − μ)(ȳ − ν).

The expressions (x̄ − μ) n⁻¹ Σ (yᵢ − ν), (ȳ − ν) n⁻¹ Σ (xᵢ − μ) and (x̄ − μ)(ȳ − ν) all converge in probability to zero. Thus the numerator converges in probability to the same limit as the first expression. This limit is βσx², which we may write as

    plim b = βσx² / (σx² + σ1²).     (2.4)

Richardson and Wu give the mean of b as [equation (2.5), not reproduced in this transcript].

The expressions in equations (2.4) and (2.5) are clearly the same. Thus b is a consistent and unbiased estimate of

    βσx² / (σx² + σ1²),

not of β.
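The attenuation limit (2.4) is easy to exhibit by simulation. The sketch below uses assumed parameters (β = 2, σx² = 4, σ1² = 1, so the limit is 1.6) and shows that the naive least squares slope settles near βσx²/(σx² + σ1²) rather than near β.

```python
# Attenuation of the naive least squares slope under errors in x:
# b -> beta * sx2 / (sx2 + s12) in probability, not beta.
import random

rng = random.Random(7)
beta, sx2, s12 = 2.0, 4.0, 1.0
n = 100_000
pairs = []
for _ in range(n):
    X = rng.gauss(0.0, sx2 ** 0.5)
    pairs.append((X + rng.gauss(0.0, s12 ** 0.5),      # x = X + eps
                  beta * X + rng.gauss(0.0, 0.5)))     # y = Y + delta

xbar = sum(x for x, _ in pairs) / n
ybar = sum(y for _, y in pairs) / n
b = (sum((x - xbar) * (y - ybar) for x, y in pairs) / n) / \
    (sum((x - xbar) ** 2 for x, _ in pairs) / n)

plim_b = beta * sx2 / (sx2 + s12)   # = 1.6 here, while beta = 2
assert abs(b - plim_b) < 0.05
assert b < beta
```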
We now consider some of the attempts made to take the errors of the x observations into account. It should be noted that all of the estimates in the balance of this section (except for the last one) will, in general, be inconsistent.

It should perhaps be mentioned at this point that consistency is not an important property in small samples (Madansky [39]), since we are never too sure just how close b is to β. Consistency relates to identifiability in the sense that if no consistent estimate exists then the parameters are not identifiable, i.e. we have too many parameters.
One of the earliest authors, Adcock [2], suggested minimizing the sums of squares of the normal distances from the observed points to the true line. Pearson [46] called this the major axis of the correlation ellipse, making an angle θ with the X-axis, where

    tan 2θ = 2μ11 / (μ20 − μ02),

where the μᵢⱼ denote population moments. Solving for θ we see that

    β = tan θ = [(μ02 − μ20) + √((μ20 − μ02)² + 4μ11²)] / (2μ11).     (2.7)
-
From equations (2.2) we see that sgn(β) = sgn(μ11), and hence in equation (2.7) we will take the positive square root, as the negative root would imply sgn(β) = −sgn(μ11). The population constants μ11, μ20 and μ02 may be estimated from the sample moments M11, M20 and M02 respectively, where M11 = Sxy, M20 = Sx² and M02 = Sy².

Let T be the estimate of θ. We thus have as estimates of β and α

    b = tan T = [(M02 − M20) + √((M20 − M02)² + 4M11²)] / (2M11)     (2.8)

and a = ȳ − b x̄.

The standard deviations of these estimates are given by Kermack and Haldane [33] [expressions not reproduced in this transcript], where r, the sample correlation coefficient, is given by

    r = Sxy / (Sx Sy).
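The sample version (2.8) can be cross-checked against the geometric definition: the major axis is the leading principal axis of the sample covariance matrix. The sketch below (illustrative data) computes b from (2.8) and then verifies that (1, b) is an eigenvector of the moment matrix belonging to its larger eigenvalue.

```python
# Major axis slope, equation (2.8), cross-checked against the leading
# eigenvector of the sample moment matrix [[M20, M11], [M11, M02]].
import math

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 2.1, 2.9, 4.2, 4.8]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
M20 = sum((x - xbar) ** 2 for x in xs) / n
M02 = sum((y - ybar) ** 2 for y in ys) / n
M11 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / n

b = ((M02 - M20) + math.sqrt((M20 - M02) ** 2 + 4 * M11 ** 2)) / (2 * M11)

# (1, b) should be an eigen-direction of the moment matrix ...
v = (1.0, b)
Av = (M20 * v[0] + M11 * v[1], M11 * v[0] + M02 * v[1])
assert abs(Av[0] * v[1] - Av[1] * v[0]) < 1e-9   # Av parallel to v
# ... belonging to the larger eigenvalue (the major, not minor, axis).
lam = Av[0] / v[0]
assert lam >= (M20 + M02) - lam                  # lam >= the other eigenvalue
```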
It was implicitly assumed for the above estimate that the error variances were the same. In the next section we will also achieve this result when we assume λ = σ2²/σ1² known to be unity.

The disadvantage with this estimate is that it is not invariant under changes of scale, although it is invariant under rotation. In practice the former is usually the more important. It was suggested by Jones [27] and Teissier [57] (see Kermack and Haldane [33]) that the coordinates be standardized to overcome this problem. Thus we transform

    x′ᵢ = (xᵢ − x̄)/Sx,   y′ᵢ = (yᵢ − ȳ)/Sy,

and hence we make μ20 = μ02 = 1 and tan θ (see equation (2.8)) identically unity. In terms of the original data the line becomes

    (y − ȳ)/Sy = ±(x − x̄)/Sx,     (2.9)

the sign of the slope being that of μ11. This line (2.9) is called "the reduced major axis" by Kermack and Haldane [33], who give the standard deviations of b and a as [expressions not reproduced in this transcript].
-
For illustration let us solve the least squares estimate when we assume λ = σ2²/σ1² is known. We will not assume λ = 1, but our solution below will be seen to be the same as eqn. (2.8) if we substitute λ = 1 into it.

To allow for more generality we will allow the variance of yᵢ − α − βxᵢ to vary with i. In this case we minimize the sum

    S = Σ wᵢ (yᵢ − α − β xᵢ)²,     (2.10)

where the wᵢ are inversely proportional to the variance of yᵢ − α − βxᵢ given X (Deming [16], cf. also Kummel [36]), the constant of proportionality being independent of i.

We follow the method of Lindley [37] and minimize S in equation (2.10). Since λ is fixed,

    Var(yᵢ − α − βxᵢ) = σ2² + β²σ1² = σ1²(λ + β²).

Let wᵢ = 1/(λ + β²) for convenience. Differentiating S with respect to β, setting this equal to zero and letting â = ȳ − β̂ x̄ gives

    β̂ = [Sy² − λSx² + √((Sy² − λSx²)² + 4λ Sxy²)] / (2 Sxy),     (2.11)

where we have again taken the positive square root in order that β̂ and Cov(x, y) have the same sign. It can be shown that this value β̂ does indeed correspond to a minimum value of S.
In the terminology of equation (2.8) the above equation becomes

    b = [(M02 − λM20) + √((M02 − λM20)² + 4λ M11²)] / (2 M11).

This estimate of β is consistent, and in fact is the only consistent estimate given in this section unless it is the case that λ = 1. It is interesting to note that, as early as 1879, Kummel [36] minimized a weighted sum of squares and achieved a result, the same as equation (2.11), which agreed with Adcock's estimate only when the error variances were equal. In spite of this there are many more least squares estimates in the literature which are not consistent and which ignore the error variances. Some of these are amazingly complex, an example being York [64], who has possibly the most difficult estimate to compute, relying as it does on iterative methods.

For papers generalizing least squares to the multivariate cases I refer to Sprent [56] and Villegas [59, 60, 61]. These papers consider only consistent estimates.
-
2.3 Maximum Likelihood Estimation

In order to estimate the parameters in equations (2.2), the only information being that of equations (2.1), we have shown, via Reiersøl's theorems, that more information is required. The last three equations of (2.1) are the maximum likelihood solutions of the parameters on the left hand side of equations (2.2). This section will be devoted to maximum likelihood estimates of β where we are either provided with additional information on the error variances or are prepared to make certain assumptions regarding them.
2.3.1 Knowledge of one error variance

(A) σ1² is known.

In this case σx² = Var(x) − σ1² can be estimated by Sx² − σ1², and we may estimate β by

    β̂ = Sxy / (Sx² − σ1²).     (2.12)

(B) σ2² is known.

The problem is symmetrical in σ1² and σ2², and hence we estimate

    β̂ = (Sy² − σ2²) / Sxy.     (2.13)

In the first case there is a positive probability that the known value of σ1² could turn out larger than the estimate Sx² of Var(x), implying that σ̂x² < 0, which is impossible. If this should happen the procedure does not give an estimate of β. Similarly in (B) it could happen that Sy² < σ2² and again we cannot estimate β by this method. The probability of this happening in either case will tend to zero as the sample size increases.
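Both estimates can be tried on simulated data. The sketch below uses assumed parameters (β = 2, σx² = 4, σ1² = 0.5, σ2² = 0.8; not the thesis's example values), computes (2.12) and (2.13), and checks that each lands near β, guarding against the failure case Sx² ≤ σ1² noted above.

```python
# Sketch (assumed parameters): beta estimated with one error variance
# known, (A) beta = Sxy/(Sx^2 - s12) and (B) beta = (Sy^2 - s22)/Sxy.
import random

rng = random.Random(11)
beta, sx2, s12, s22 = 2.0, 4.0, 0.5, 0.8
n = 100_000
obs = []
for _ in range(n):
    X = rng.gauss(5.0, sx2 ** 0.5)
    obs.append((X + rng.gauss(0.0, s12 ** 0.5),
                1.0 + beta * X + rng.gauss(0.0, s22 ** 0.5)))

xbar = sum(x for x, _ in obs) / n
ybar = sum(y for _, y in obs) / n
Sx2 = sum((x - xbar) ** 2 for x, _ in obs) / n
Sy2 = sum((y - ybar) ** 2 for _, y in obs) / n
Sxy = sum((x - xbar) * (y - ybar) for x, y in obs) / n

# (2.12) fails outright if Sx2 <= s12 (negative implied variance).
beta_A = Sxy / (Sx2 - s12) if Sx2 > s12 else None
beta_B = (Sy2 - s22) / Sxy

assert beta_A is not None
assert abs(beta_A - beta) < 0.08
assert abs(beta_B - beta) < 0.08
```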
2.3.2 Knowledge of the ratio λ = σ2²/σ1²

The inconsistency that may arise in section (2.3.1) cannot occur in this case; perhaps this is why this case is the most popular for study in the literature.

With λ known, equations (2.2) become

    Var(x) = σx² + σ1²,   Var(y) = β²σx² + λσ1²,   Cov(x, y) = βσx²,     (2.14)

whence β² Cov(x, y) + β(λ Var(x) − Var(y)) − λ Cov(x, y) = 0, and thus we estimate

    β̂ = [Sy² − λSx² + √((Sy² − λSx²)² + 4λ Sxy²)] / (2 Sxy).     (2.15)

As before we take the positive square root in order that β̂ will have the correct sign, that of Cov(x, y). Equation (2.15) can be seen to be the same as that achieved by using Lindley's weighted least-squares estimate of the preceding section.
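A numerical check of (2.15), with assumed parameters (β = 2, σx² = 4, σ1² = 0.5, λ = 1.6): the sketch below verifies both that β̂ solves the quadratic displayed above and that it lands near the true β.

```python
# Sketch (assumed parameters): estimate (2.15) with lambda = s22/s12 known.
import math
import random

rng = random.Random(42)
beta, sx2, s12 = 2.0, 4.0, 0.5
lam = 1.6                       # lambda = s22/s12
s22 = lam * s12
n = 100_000
obs = []
for _ in range(n):
    X = rng.gauss(0.0, sx2 ** 0.5)
    obs.append((X + rng.gauss(0.0, s12 ** 0.5),
                beta * X + rng.gauss(0.0, s22 ** 0.5)))

xbar = sum(x for x, _ in obs) / n
ybar = sum(y for _, y in obs) / n
Sx2 = sum((x - xbar) ** 2 for x, _ in obs) / n
Sy2 = sum((y - ybar) ** 2 for _, y in obs) / n
Sxy = sum((x - xbar) * (y - ybar) for x, y in obs) / n

beta_hat = (Sy2 - lam * Sx2
            + math.sqrt((Sy2 - lam * Sx2) ** 2 + 4 * lam * Sxy ** 2)) / (2 * Sxy)

# beta_hat is the positive root of b^2*Sxy + b*(lam*Sx2 - Sy2) - lam*Sxy = 0 ...
q = beta_hat ** 2 * Sxy + beta_hat * (lam * Sx2 - Sy2) - lam * Sxy
assert abs(q) < 1e-6
# ... and is consistent for beta.
assert abs(beta_hat - beta) < 0.08
```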
2.3.3. Both σ₁² and σ₂² known

In this case we could use one of (2.12), (2.13) or (2.15) to give us an estimate of β. We could also use (Madansky [39]) the geometric mean of (2.13) and (2.14),

where sgn(β̂) is chosen to be the same as sgn(Cov(x, y)). Thus we have four (usually different) estimates for β, a result which did not endear itself to many people. This situation therefore became known as the "overidentified" case, leading some (e.g. Allen [3]) to recommend that knowledge of λ is to be preferred. Kendall and Stuart [32] say that this case is quite unmanageable since we must obtain the maximum likelihood by solving three equations for the two unknowns, β and σ_X². Kiefer, in his review of this book, does not agree and mentions that the maximum likelihood equations for β, σ_X², α and μ can be solved by maximizing the likelihood with respect to the four parameters. Following this, Barnett [4] differentiates the log-likelihood with respect to α, β, μ and σ_X² and arrives at the equations,
again ± is used to denote sgn(Cov(x, y)).

It is interesting to note that this estimate for β is the same as that when λ is known. The estimate σ̂_X² is not the same as Cov(x, y)/β̂, which is the best we could achieve with just λ known.

In this case, with both σ₁² and σ₂² known, the same difficulties mentioned in section (2.3.1) may arise. If so, this method fails to provide an estimate.
When both error variances are known, Madansky [39] states that we may consider the case where Cov(ε, δ) ≠ 0 and that (2.16) is the maximum likelihood estimate of β. He says that we may estimate Cov(ε, δ) = ρσ₁σ₂. This is not correct (cf. Moran [41]), for only β² is identifiable. To see this let ρ ≠ 0 and consider the last equation of (2.2), which now becomes

and hence it is no longer true that sgn(β) = sgn(Cov(x, y)).

If sgn(ρ) is known then the magnitude of ρ and β itself are identifiable; however, from equation (2.19) we see that sgn(ρ) is not necessarily that of Cov(x, y) and it is unlikely that sgn(ρ) will be forthcoming.
If we assume ρ ≠ 0 and that we know σ₁², we can identify only σ_X², while if only λ is known no parameters are identifiable. It is thus wise when planning an experiment to attempt to keep Cov(ε, δ) = 0 if we plan on using maximum likelihood estimation.
2.3.4. When α is known

If α is known then Y = α + βX passes through the point X = 0, Y = α, and by translation of the coordinate axes we may make the line pass through the origin. We may take

as a consistent estimate of β so long as μ ≠ 0. A test for μ = 0 should of course be made before estimates are taken of any of the parameters, for if μ = 0 they would not be consistent.
There is a danger that could arise in this situation, namely when we are only concerned with approximating the true relation, which may be non-linear, by a linear relation over a certain range. For example, we may know that the true relation passes through the origin while we are only concerned with the range a < X < b, a > 0. In this event the true relation may be anything but linear in the vicinity of the origin, and using α = 0 could seriously affect our results.
We shall defer to a later chapter the study of replication (another form of additional knowledge) using least squares and maximum likelihood methods.

In this chapter, and in most others, we give in tabular form the numerical values of the various estimates, applied to the data of chapter 1. Unless comment seems to be required for our results we shall let the table stand on its own merits. We see from table 2.2 that the best estimates of this chapter gave rise to the best numerical values.
Table 2.2. EXAMPLE

EQUATION   METHOD OF ESTIMATION                              ESTIMATE OF β
2.3        Least squares, no error in X
2.8        Minimize normal distances
2.9        Standardized coordinates
2.12       Max. likelihood, σ₁² = .04 known
2.13       Max. likelihood, σ₂² = .0625 known
2.15       Max. likelihood, λ = 1.5625 known
2.16       Max. likelihood (I), σ₁² = .04, σ₂² = .0625
2.17       Max. likelihood (II), σ₁² = .04, σ₂² = .0625
2.20       α known, α = 2
-
CHAPTER 3
ESTIMATES DERIVED FROM GROUPING THE DATA
Grouping estimates are loosely based on the idea that to define a straight line only two points are required. We form two groups, take the means (x̄₁, ȳ₁) and (x̄₂, ȳ₂) of each, and choose the line through these two points.

Some estimates require that some of the data be dropped, this data forming a third group. To maintain consistency we shall denote our groups G₁, G₂ and G₃. The data to be dropped, if any, will comprise G₂; otherwise G₂ will be empty.

The theorems in the previous chapter tell us that if our parameters are not identifiable they remain so no matter how we may rearrange the data. With this in mind we now investigate Wald's method [62], which was the first published, although the paper by Nair and Shrivastava [43] may have been written concurrently or perhaps earlier.
Wald states that the line Y = α + βX can be estimated in certain cases from the observed values of x and y without knowledge of σ₁² and σ₂². The estimates are all consistent. These cases occur when the following four assumptions are satisfied, the fourth being known as "Wald's condition".

(1) The error terms ε₁, ε₂, ..., εₙ are independently and identically distributed with finite variance σ₁², as are the δ₁, δ₂, ..., δₙ with finite variance σ₂².

(2) E(εᵢ δⱼ) = 0 for all i and for all j.

(3) There exists a single linear relation between the true values X and Y, i.e. Yᵢ = α + β Xᵢ.

(4) lim inf (1/n) |(X₁ + ... + Xₘ) − (Xₘ₊₁ + ... + Xₙ)| > 0,

where for convenience we let n = 2m be even. Let

If the above four conditions are satisfied we will estimate β by
Let

and

From assumption (3) we have that

We now show that b = b₂/b₁ is a consistent estimate of β if our four assumptions are satisfied. From Yᵢ = α + β Xᵢ we have:

The variance of (1/n)[(δ₁ + ... + δₘ) − (δₘ₊₁ + ... + δₙ)] is (1/n²)(n σ₂²) = σ₂²/n, and the variance of (1/n)[(ε₁ + ... + εₘ) − (εₘ₊₁ + ... + εₙ)] is σ₁²/n, both of these converging in probability to zero. Applying assumption (4) to both the numerator and denominator ensures that b converges in probability to β, i.e. b is consistent. Since b is consistent we have that a = ȳ − b x̄ is a consistent estimate of α.
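Wald's two-group procedure can be sketched as follows. This is our own illustration (the function name is ours), assuming the observations arrive already divided into the two groups, i.e. the first m and the last m pairs:

```python
def wald_estimate(x, y):
    """Wald's two-group estimates of slope and intercept:
    split the n = 2m pairs into a first and a second group and
    take the line through the two group means; the intercept is
    then a = ybar - b * xbar."""
    n = len(x)
    m = n // 2
    mean = lambda s: sum(s) / len(s)
    # slope of the line through the two group means
    b = (mean(y[m:]) - mean(y[:m])) / (mean(x[m:]) - mean(x[:m]))
    a = mean(y) - b * mean(x)
    return a, b
```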
We now turn to estimating σ₁² and σ₂². Let

The equations (3.4) represent what would be sample estimates of σ_X², σ_Y² and Cov(X, Y), as given in equations (2.2), if we actually knew the true values Xᵢ and Yᵢ. Let S_x², S_y² and S_xy be defined as in equations (2.1). Then

Equations (3.5) may be proved as follows, recalling that the estimates (2.1) are in fact biased. From equations (2.2) we have

but

and

Thus from (3.7) and (3.8) equation (3.6) becomes

This proves the first equation of (3.5), and the second equation may be shown in an exactly similar way. The third equation follows easily from assumption (2) and from the assumption of independence between error terms and true values.

From assumption (3) we know that

Thus from the last of equations (3.5) and from (3.9)

Substituting equations (3.10) into the first two of equations (3.5) gives

We have shown that b is a consistent estimate of β, and hence it is clear that the expressions

and

converge in probability to σ₁² and σ₂² respectively. Thus (3.12) are consistent estimates of equations (3.11); because of the n/(n−1) adjustment they are also unbiased.
Although our estimates are consistent if assumption (4) is satisfied, they may not be the most effective that we could derive. Our observations were divided into two groups G₁ and G₃, where (xᵢ, yᵢ) ∈ G₁ if i ≤ m and (xᵢ, yᵢ) ∈ G₃ if i > m. It is true that any other division of the observations will also give consistent estimates, so long as the grouping is performed independently of the errors εᵢ and δᵢ, and so long as condition (4) remains satisfied. We consider now how we may improve our estimates.

We obtain a better estimate by finding that estimate which will give us the shortest confidence interval for β (Wald [62]). We will see in chapter 8 that the shortest confidence interval arises when |b₁| is a maximum. From equations (3.1) we see that |b₁| is maximized by ordering the observations, renumbering where necessary, x₁ ≤ x₂ ≤ ... ≤ xₙ. Thus G₁ is the set {(x₁, y₁), (x₂, y₂), ..., (xₘ, yₘ)} and G₃ the set {(xₘ₊₁, yₘ₊₁), ..., (xₙ, yₙ)}. Depending as it does on the values x₁, x₂, ..., xₙ, it is unlikely that this grouping will be independent of the errors ε₁, ε₂, ..., εₙ. If we knew the relative sizes of the true values X₁, X₂, ..., Xₙ (more on this later) we could order the true values X₁ ≤ X₂ ≤ ... ≤ Xₙ, again renumbering where necessary, and let G₁ = {(xᵢ, yᵢ) | Xᵢ ∈ {X₁, X₂, ..., Xₘ}} and G₃ = {(xᵢ, yᵢ) | Xᵢ ∈ {Xₘ₊₁, Xₘ₊₂, ..., Xₙ}}. This grouping is entirely independent of the errors.

The two groupings, ordering x and ordering X, will be identical in the case where the range of ε is the finite interval [−c, c] and all of the observed values x₁, x₂, ..., xₙ fall outside of the interval [x′ − c, x′ + c], where x′ denotes the median of x₁, x₂, ..., xₙ. In this case we may order the x's with confidence that we have performed the grouping independently of the errors. In practice we may order in this way if there exists c > 0 such that P[|ε| ≥ c] is small and the number of xᵢ in [x′ − c, x′ + c] is also small.

Let b′ and b″ be the estimates of β obtained by ordering the x₁, ..., xₙ and the X₁, ..., Xₙ respectively. We consider the case where b″ is not known, as may often be the case, and will now find upper and lower bounds for b″.

If ε is normal, let v² be the sum of the squared residuals in the x-direction divided by the degrees of freedom. A good estimate of c will then be 3v, and the interval [−c, c] may be considered as a possible range for ε. If ε is not normal it may be wise to increase c to as much as c = 5v.
Let S be the set of all possible groupings which satisfy the following conditions

where x′ is again the median value of the x's.

For each grouping g ∈ S calculate b, and let b* and b** be the minimum and maximum values obtained, respectively. Since the X-ordered grouping is in S, we therefore have b* and b** as lower and upper limits, respectively, of b″.

Wald gives a condition which, if satisfied, will imply that the expression in assumption (4) will not converge stochastically to zero. This condition is that there exists X ∈ R such that

where [−c, c] is the range of ε.
If X does not have this property, as it obviously doesn't when X, ε and δ are normal, it may happen that for every grouping defined independently of the errors the expression in assumption (4) converges stochastically to zero. There is one case where the expression does not converge to zero even though our variables are normally distributed, and that is where the order of the X's is known. We can never be sure of the order by merely looking at the data after the experiment, but sometimes in the laboratory we can set up the equipment so that the true X is, say, increased from observation to observation. If this is so, then the ordering given to the observed x's is merely the order of their occurrence. We see then that E(x₁ + x₂ + ... + xₘ) < E(xₘ₊₁ + ... + xₙ) and that b₁ will not tend to zero. Thus we achieve, perhaps not the most efficient estimate, but at least a consistent estimate. This is yet another verification of the truism that the time to begin the statistical analysis is before, not after, the experiment is performed.
We have dealt with Wald's method in some detail because it was the first of the grouping estimators and because it is a fairly simple and straightforward procedure. It is also quite commonly misunderstood and, to quote Moran [40], "caused a considerable amount of confusion in the literature ...". The main difficulty, as might be expected, is dividing the observations so that the distributions of the errors are unaffected.
In the in teres ts of increased efficiency Bar t l e t t [6] divided the --, . . . . - . ---.-. .
observations in to three groups. This was also done by Nair and Str ivis tava
[ 431 (cf . a l so Nair and Bannerjee [ 421) but we w i l l follow the outline of -- .- ---- - - .
Bar t l e t t . Consider the uniform model i n which x = X i s observed without e r ror
and spaced a t equidistant unit intervals . In t h i s case the ordinary l eas t
squares estimate
w i l l provide an unbiased estimate of 6 with an e r ro r variance of 2 - 2
02/E (xi-X) . I f we l e t n = 2k+1 then it can be shown by induction tha t - 2
1 (xi-X) = !?, (!?,+I) (2k+1) . The observations are s p l i t i n t o three groups 3
where the two end groups each have k elements, k i s as close t o n/3
as possible. Bar t le t t then uses the estimate
for β. For locating the line Bartlett has it pass through the overall mean (x̄, ȳ), while Nair and Shrivastava use (3.13) to locate the line as well as to estimate the slope.

Estimate (3.13) has an error variance of

Thus the relative efficiency of b′ is

which can be shown to be a maximum when k = n/3. Thus

When k = n/2 we have
Thus, by using three groups rather than two, we have increased the relative efficiency of our estimate. The increase in efficiency, as it turns out, is approximately twenty percent. Bartlett suggests that in general k = n/3 is to be preferred to k = n/2.
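Bartlett's three-group estimate is easily sketched in code. The function below is our own illustration (names ours); it orders the pairs by x for simplicity, which, as noted above, is only legitimate when the grouping can be regarded as independent of the errors:

```python
def bartlett_estimate(x, y, k=None):
    """Bartlett's three-group slope estimate: order the pairs by x,
    put the lowest k and the highest k observations into the end
    groups (k about n/3), discard the middle group for the slope,
    and pass the line through the overall mean (xbar, ybar)."""
    n = len(x)
    if k is None:
        k = round(n / 3)          # Bartlett's recommended group size
    pairs = sorted(zip(x, y))
    lo, hi = pairs[:k], pairs[n - k:]
    xlo = sum(p[0] for p in lo) / k
    ylo = sum(p[1] for p in lo) / k
    xhi = sum(p[0] for p in hi) / k
    yhi = sum(p[1] for p in hi) / k
    b = (yhi - ylo) / (xhi - xlo)
    a = sum(y) / n - b * sum(x) / n   # line through the overall mean
    return a, b
```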
Rather than just considering X uniformly distributed, Gibson and Jowett [23] went a step further and considered several other distributions. They found that the three-group method was "surprisingly efficient", but recommended the general use of the ratio 1:2:1 for dividing the observations rather than the 1:1:1 given by Bartlett and by Nair and Banerjee. They do this since the normal distribution is the most common, the ratio 1:2:1 being optimum in this case and fairly good in other cases, although it is not too good for extreme skewness. For those specific cases where the distribution of X is known they give the following table of optimum ratios.
Table 3.1. OPTIMUM PROPORTIONS

DISTRIBUTION    RANGE PROPORTIONS    APPROXIMATE RATIOS
Normal          .27 : .46 : .27      1:2:1
Uniform         .33 : .33 : .33      1:1:1
Bell Shape      .31 : .38 : .31      3:4:3
U-Shape         .39 : .22 : .39      2:1:2
J-Shape         .45 : .40 : .15      3:3:1
Skew            .36 : .45 : .19      4:5:2
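The table's proportions generalize Bartlett's equal split. A small sketch (ours, with hypothetical names) of a three-group estimate taking arbitrary end-group proportions:

```python
def grouped_estimate(x, y, p=(0.27, 0.46, 0.27)):
    """Three-group slope estimate with arbitrary proportions, after
    Gibson and Jowett: order the pairs by x, place the lowest
    p[0]*n and the highest p[2]*n observations in the end groups,
    and take the slope of the line through the end-group means."""
    n = len(x)
    k1 = round(p[0] * n)          # size of the lower group
    k3 = round(p[2] * n)          # size of the upper group
    pairs = sorted(zip(x, y))
    lo, hi = pairs[:k1], pairs[n - k3:]
    xlo = sum(q[0] for q in lo) / k1
    ylo = sum(q[1] for q in lo) / k1
    xhi = sum(q[0] for q in hi) / k3
    yhi = sum(q[1] for q in hi) / k3
    return (yhi - ylo) / (xhi - xlo)
```

The default proportions are the table's optimum for the normal case; passing p=(1/3, 1/3, 1/3) recovers Bartlett's split.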
The final grouping estimates that we consider in this chapter are those due to Neyman and Scott [45]. We will briefly review two methods of estimation given by them, with necessary and sufficient conditions for their consistency. For both of these methods they admit that consistent estimates of β will be achieved in "very exceptional cases only". In both methods we do not necessarily assume that the errors are uncorrelated.

For the first method fix two numbers a and b such that P(x ≤ a) > 0 and P(x > b) > 0, and let Z_j and W_j be the mean values of the xᵢ and yᵢ respectively in group G_j for j = 1, 2, where G₁ contains the observations with xᵢ ≤ a and G₂ those with xᵢ > b. An estimate for β is then

b₁ = (W₂ − W₁) / (Z₂ − Z₁).

By the law of large numbers, Z_j and W_j converge in probability to E(Z_j) and E(W_j) respectively; thus the stochastic limit of b₁ is (EW₂ − EW₁)/(EZ₂ − EZ₁). The authors consider conditions for E(W₂ − W₁) = β E(Z₂ − Z₁). Let (c, d) be the shortest interval such that P(c ≤ ε ≤ d) = 1. Since E(ε) = 0 it is clear that c ≤ 0 ≤ d, and they show that necessary and sufficient conditions for the consistency of b₁ are
The second method involves fixing two proportions p₁ and p₂, with p₁ > 0, p₂ > 0 and p₁ + p₂ ≤ 1. Let Z₃ and W₃ denote the means of the xᵢ and yᵢ respectively for i = 1, 2, ..., [np₁] = r, and let Z₄ and W₄ denote the means of the xᵢ and yᵢ respectively for i = n − s + 1, n − s + 2, ..., n, where s = [np₂]. The estimate for β is then

b₂ = (W₄ − W₃) / (Z₄ − Z₃).

Neyman and Scott show that if X_{p₁} and X_{1−p₂} are points such that P(x ≤ X_{p₁}) = p₁ and P(x ≥ X_{1−p₂}) = p₂, and if (c, d) has the same meaning as above, then a necessary and sufficient condition for the consistency of b₂ is
We will also consider grouping methods when we have replicated observations and when we use the analysis of variance. These, however, will be considered in later chapters.

The most important applications of grouping methods occur when we have some extra knowledge on the position of the true values. For example, if the order of the X's is known, or if we have knowledge that our X's were achieved from two (or from k) processes (cf. Madansky [39] on this point), we may form two (or k) groups with each x in its appropriate group and be assured that, so long as ε is independent of the processes, we have grouped independently of the errors and that Wald's condition (assumption (4)) is satisfied. This does not contradict the Reiersøl theorems, for we have additional information at our disposal which can be used to give us a consistent estimate of β.
3.2. Example

The table below, giving most of the estimates of this chapter, is self-explanatory. We do not give values for the Neyman and Scott estimates because, for any values of a, b or p₁, p₂ we might reasonably choose, we would get results similar to others listed here.
Table 3.2. EXAMPLE

METHOD OF ESTIMATION                              ESTIMATE OF β
Wald's Method      : Unordered Data               1.94
                   : x-ordered Data               1.95
                   : X-ordered Data               1.97
Bartlett's Method  : Unordered Data               1.8
                   : x-ordered Data               1.97
                   : X-ordered Data               1.98
Optimum Proportion : .27 : .46 : .27              1.88
                   : x-ordered Data               1.98
                   : X-ordered Data               1.99
-
CHAPTER 4
INSTRUMENTAL VARIABLES
In this chapter we shall consider the use of additional knowledge in the form of instrumental variables. These instrumental variables form at least one set of extra data, highly correlated with X but independent of ε and δ. It is not too difficult to find variables correlated with X and Y, the so-called "investigational variables", yet it may prove difficult to have them independent of the errors.

A further problem that could arise is when the investigational and instrumental variables are so highly correlated that perhaps the instrumental variables should have been added to the relation as a third dimension. Madansky [39] gives an example of this with Y and X being, respectively, the price and the quantity available of butter. The relation is Y = α + βX, with Z₁, the price of margarine, an instrumental variable. He points out that the true relation may perhaps have been better expressed as Y = α + βX + γZ₁.

Instrumental variables were developed independently by Geary [22] and by Reiersøl [48, 49]. We consider first the simplest case, where we have but one instrumental variable.
4.1. One Instrumental Variable observed without error

Let Zᵢ, i = 1, 2, ..., n be a set of variables correlated with Xᵢ and Yᵢ but independent of εᵢ and δᵢ. We may assume that Zᵢ is observed without error, for if there is an error ηᵢ we may replace Zᵢ in what follows by Zᵢ + ηᵢ (Moran [41]). In order that the notation be a little simpler we consider the homogeneous relation, and a prime on a variable will denote measurement about its mean, e.g. Xᵢ′ = Xᵢ − X̄. Thus our relation becomes Y′ = βX′, or

β₁X′ + β₂Y′ = 0,   (4.1)

where β = −β₁/β₂.

If we multiply (4.1) by Zᵢ′/n and sum over i we realize

Consider also the analogous expression involving the observed variables

It is clear that

and

Thus E(A) = E(B) = 0, since we assume independence between Zᵢ and the error terms. Var(A) and Var(B) are both O(1/n), and hence A and B both converge to zero in probability. Thus, irrespective of the actual limiting values of Σ yᵢ′Zᵢ′, Σ Yᵢ′Zᵢ′, Σ xᵢ′Zᵢ′ and Σ Xᵢ′Zᵢ′, we see that expression (4.3) converges in probability to zero, and therefore Σ yᵢ′Zᵢ′ / Σ xᵢ′Zᵢ′ converges in probability to β. Thus

b = Σ yᵢ′Zᵢ′ / Σ xᵢ′Zᵢ′

is a consistent estimate of β = −β₁/β₂ so long as our assumption Cov(Z, x) ≠ 0 is satisfied.
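The estimate above is simple to compute. The sketch below is our own (the function name is ours), using deviations from the sample means so that the homogeneous form applies:

```python
def iv_estimate(x, y, z):
    """One-instrumental-variable slope estimate
    b = sum(y' z') / sum(x' z'), where primes denote deviations
    from the sample means; consistency requires Cov(z, x) != 0."""
    n = len(x)
    xbar, ybar, zbar = sum(x) / n, sum(y) / n, sum(z) / n
    num = sum((yi - ybar) * (zi - zbar) for yi, zi in zip(y, z))
    den = sum((xi - xbar) * (zi - zbar) for xi, zi in zip(x, z))
    return num / den
```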
Let us now compute an asymptotic variance for b for the case where the line goes through the origin. Since X and Z are correlated we may let

E(Xᵢ Zᵢ) = γ,

where γ is a constant which we can estimate, since Z is observed without error. Consider

From the law of large numbers we have that

and

The distribution of n^(−1/2) Σ(δᵢ − βεᵢ)Zᵢ converges by the Central Limit Theorem to a normal distribution with mean zero and variance

We can estimate σ_Z² since Z is observed without error and, by a well-known theorem, the distribution of (4.7) converges to a normal distribution with mean zero and variance

Thus the asymptotic variance of b is
4.2. Two Instrumental Variables observed with error

Reiersøl [49] considers the case where we have two sets Z₁ and Z₂ of instrumental variables related by γ₁Z₁ + γ₂Z₂ = 0, where γ₁ and γ₂ are known constants. We assume Z₁ and Z₂ are observed with error and that our observations are

where the random variables w₁ᵢ and w₂ᵢ have finite variances and means of zero. We assume that all errors are independent of all true values and of each other. The observations can then be represented by the quadruples (xᵢ, yᵢ, z₁ᵢ, z₂ᵢ), and we estimate β by

Let us show that b is a consistent estimate of β. Rewrite b as

If we take the expected value of the denominator of (4.11) we get

Applying the law of large numbers to (4.12) we get

since we have assumed independence of errors. Note that Z₂ = −(γ₁/γ₂)Z₁; thus the denominator converges in probability to

Applying the same technique to the numerator of (4.10) we see that

From equations (4.10), (4.12) and (4.13) we see that b converges in probability, so long as Cov(Z₁, X) ≠ 0, to

Thus b is a consistent estimate of β, so long as Cov(Z₁, X) ≠ 0.

For an example of the use of this type of instrumental variable the reader is referred to the paper by Carlson, Sobel and Watson, which is concerned with estimation in a biochemical situation.
Durbin [20] considers the one-instrumental-variable-without-error case for various choices of Z. He shows that if Zᵢ = ±1 according as xᵢ is greater or smaller than x′, the median of the xᵢ's, the method reduces to Wald's method. If we put the xᵢ's into one of three groups and let Zᵢ = −1, 0 or 1 according to the group in which xᵢ was placed, we have Bartlett's method.

He also considers ordering the xᵢ and letting Zᵢ = i, the position of xᵢ. This is a better instrumental variable if the order of the x's is that of the X's, but Durbin shows that it will still be a good choice even if the ε's are relatively large, in which case he says that the bias should be less than that for the original variables. If we can set up our experiment such that our true values are increasing then this will be a perfect instrumental variable. In this case we do not reorder the x but let Zᵢ be the order of occurrence.
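Durbin's choices of instrument can be written down explicitly. The helpers below are our own sketch (names ours) and are meant to be fed to the estimate b = Σ y′Z′ / Σ x′Z′ of section 4.1:

```python
def durbin_instrument(x):
    """Z_i = the position (1-based rank) of x_i after ordering."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    z = [0] * len(x)
    for rank, i in enumerate(order, start=1):
        z[i] = rank
    return z

def wald_instrument(x):
    """Z_i = +1 or -1 according as x_i lies above or below the
    median of the x's (reduces the IV estimate to Wald's method)."""
    xs = sorted(x)
    n = len(x)
    med = xs[n // 2] if n % 2 else (xs[n // 2 - 1] + xs[n // 2]) / 2
    return [1 if xi > med else -1 for xi in x]
```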
There is a further study on instrumental variables due to Tukey [58] which we will consider in chapter 7, on the Analysis of Variance.

4.3. Example

The concept of using two instrumental variables is not too useful in practice, and since reference has been made to an example using this case, we shall apply our example only to the case of one instrumental variable observed without error.

Estimates using Wald's and Bartlett's methods have already been given and it would be pointless to reproduce them. We consider Durbin's method of ordering the x's and letting Zᵢ = i. For comparison we will also order the X's. The results are

Zᵢ = i, order x : b = 1.977
Zᵢ = i, order X : b = 1.983.
-
CHAPTER 5
CONTROLLING THE OBSERVATIONS
In many experiments, especially under laboratory conditions, we are not so concerned with measuring X as with actually achieving X. For example, in an experiment comparing reaction rates of some chemicals with respect to temperature, it is not often that we measure the rates and see what the temperature happened to be at the time. Rather, we would pick certain temperatures, say 15°C, 20°C, etc., and measure the corresponding rates. This is an example of what Berkson [7] would call a "controlled experiment"; X, the temperature, being the controlled variable.
In the literature the model achieved when one of the variables is controlled is often referred to as the "Berkson model", since Berkson was the originator of the idea. This model makes a pleasant change from the usual errors-in-variables model since, if the conditions are satisfied, both α and β are identifiable, with β estimated by the "usual" estimate

It is also the case, as Berkson claims, that b is consistent. Unfortunately he does not show this too well and is criticized and disputed by Kendall [30]. Lindley [38], however, gives a mathematical justification of the consistency of b in this model and, following his method, we shall show that b is consistent. An almost identical solution is given by
Scheffé [52], who also considers the idea of making several runs or replications on the relation. He allows for the possibility of a different line y = aⱼ + bⱼx being the true line for each run, due to circumstances which we may or may not be able to control during the J (j = 1, 2, ..., J) runs. We let aⱼ and bⱼ be random variables with E(aⱼ) = α and E(bⱼ) = β for all j, and test hypotheses which essentially state that the lines are, in fact, the same. This will be deferred to the chapter on confidence intervals and tests of hypotheses.
In the example at the beginning of this chapter the thermometer reads the value we require, x. However, we know that error exists and that the true temperature is X while our observation is x. We know that x is not a random variable, since it was preselected, and ε, of course, is a random variable defined as before; thus X itself is a random variable and we have X = x − ε. Similar as this model may appear to the errors-in-variables model, there are in fact some rather large differences. One difference is that we not only drop the assumption of independence between X and ε but can actually point out that their correlation coefficient is −1. It is clear that in this model the words "structural" and "functional" have no relevance at all. Our observations on Y are not controlled, so y = Y + δ is the same as before. So

y = α + βx + ξ,   (5.2)
where ξ = (δ − βε) is a normally distributed random variable with zero expectation and finite variance σ₃² = β²σ₁² + σ₂². Equation (5.2) should look familiar, for it represents the classical least squares situation, with y a random variable, x a fixed or mathematical variable, and ξ a normally distributed random variable with mean zero and finite variance. As shown in chapter 1, we minimize S = Σ(yᵢ − a − b xᵢ)² in the vertical direction and achieve equation (5.1) as our consistent estimate of β. We minimize in the direction of the uncontrolled observations; thus if we had controlled y rather than x we would have minimized S in the horizontal direction.
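A small simulation (ours; names and parameter values are hypothetical) illustrates the point: with x preselected and X = x − ε, the ordinary least squares slope of y on the controlled x recovers β:

```python
import random

def berkson_ols(alpha, beta, sigma1, sigma2, n, seed=0):
    """Simulate a controlled (Berkson) experiment and fit by
    ordinary least squares of y on the controlled x."""
    rng = random.Random(seed)
    xs = [10.0 * i / n for i in range(n)]     # preselected levels
    ys = []
    for x in xs:
        X = x - rng.gauss(0.0, sigma1)        # true value; Corr(X, eps) = -1
        ys.append(alpha + beta * X + rng.gauss(0.0, sigma2))
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) \
        / sum((x - xbar) ** 2 for x in xs)
    return ybar - b * xbar, b
```

With, say, α = 2 and β = 2 and moderate error variances, the fitted slope settles near 2 as n grows, as the argument above predicts.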
There is a further interesting point about this model. Let the line achieved by controlling x and minimizing in the y-direction be y = a₁ + b₁x. This value of y is an unbiased and consistent estimate of α + βx for given x. Let the line achieved by controlling y and minimizing in the x-direction be x = a₂ + b₂y; a₂ and b₂ will have expectations of −α/β and 1/β, respectively, for given y. In fact this second line will be equal to the first except for possible sampling differences, which converge to zero as sample size increases. This fact, impressive in itself, led Berkson to claim that with his model there would be only one regression line. In general it is known that the regression problem is not invertible (see Madansky [39] for a discussion of this point) and that the two lines will be different. If we control x then the regression of y on x gives the correct solution; however, the regression of x on y will not, and it may even fail to be linear (Lindley [38]). Thus for x controlled there are two regressions, and similarly for y controlled. This point was somewhat ambiguous in Berkson's paper.
The variance of b, as given by Berkson and verified by Scheffé [52], is, if x is controlled,

which we may estimate by using (Scheffé [52])

If we control y, the variance of b is
The real value of this model is not that in some cases we can use least squares and achieve consistent estimates, but that we can apply the model before we begin the experiment. If it is at all possible to control the observations on one of the variables then we should do so, for then no problems with estimation would arise.

All the tests of hypotheses and confidence intervals that can be derived in the classical regression case are formally the same in the Berkson model. Although the theory is different, the actual results will be the same for both models. This is another advantage of the Berkson model.

Our example is not considered for the Berkson model, for the data, being drawn at random, is not applicable. We could, as some do, force the data to the model and "see what happens"; however, we would only be repeating the least squares solution for no errors in X of chapter 2.
-
CHAPTER 6
CUMULANTS
The idea of using cumulants and moments t o es t imate has been
considered by severa l authors ( c . f . [21], [28] , [3?_], WL, We i n v e s t i - . ._.__----
g a t e this method wi th p a r t i c u l a r reference t o Geary [21].
For convenience let the relation be written as

    β₁X + β₂Y = constant,                                          (6.1)

where θ = -β₁/β₂. Let

    Z = X - E[X],    W = Y - E[Y].                                 (6.2)

Thus β₁Z + β₂W = 0.

If M(h) denotes the moment generating function of a random variable and L(h) the cumulant generating function, then L(h) = log M(h), or

    M(h) = exp L(h) = exp( Σᵢ₌₁^∞ κᵢ hⁱ / i! ),

where κᵢ is the i-th cumulant. We define the cumulants R(c₁, c₂) of order c₁ + c₂ in terms of the moments of the (xᵢ, yᵢ) with the following identity in (h₁, h₂):

    log E[exp(h₁x + h₂y)] = Σ R(c₁, c₂) h₁^{c₁} h₂^{c₂} / (c₁! c₂!).   (6.3)
Since x = X + ε and y = Y + δ, and a well known theorem states that the moment generating function (m.g.f.) of a sum of independent random variables is the product of the m.g.f.'s, we have, Z and ε being independent and W and δ being independent,

    log E[exp(h₁x + h₂y)] = log E[exp(h₁Z + h₂W)] + log E[exp(h₁ε + h₂δ)].   (6.4)

Let L(c₁, c₂) denote the (c₁ + c₂) order cumulant of (Z, W) and let c₁, c₂ ≠ 0. Geary shows that, from (6.3) and (6.4),

    L(c₁, c₂) = R(c₁, c₂),

since the mixed cumulants of the independent errors ε and δ vanish when c₁, c₂ ≥ 1.
A fundamental property of cumulants is that they are invariant under change of origin if the order is at least two (one, in the univariate case). Thus there will be no difficulty in computation even though we performed the transformations in equations (6.2). In fact it was this property of cumulants that led to their original name of "semi-invariants" (Kendall and Stuart [31]).
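This invariance is easy to check numerically. The sketch below (the exponential sample is an arbitrary illustration) computes the second- and third-order sample cumulants, which for these orders equal the central moments, before and after a change of origin:

```python
import random

random.seed(1)
data = [random.expovariate(1.0) for _ in range(1000)]

def central_moment(xs, r):
    # r-th sample moment measured about the sample mean
    m = sum(xs) / len(xs)
    return sum((x - m) ** r for x in xs) / len(xs)

# Second and third cumulants equal the second and third central moments.
k2 = central_moment(data, 2)
k3 = central_moment(data, 3)

shifted = [x + 100.0 for x in data]   # change of origin
print(k2, central_moment(shifted, 2))  # identical up to rounding
print(k3, central_moment(shifted, 3))  # identical up to rounding
```

Only the first cumulant (the mean) moves with the origin; all higher-order cumulants are untouched.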
Theorem 6.1.

    β₁ L(c₁+1, c₂) + β₂ L(c₁, c₂+1) = 0,    c₁, c₂ ≥ 1.

To prove this let γ = θ⁻¹ and rewrite equation (6.1) as Z = γW; then

    E[exp(h₁Z + h₂W)] = E[exp((h₁γ + h₂)W)]                        (6.6)

is an identity in h₁ and h₂. Therefore

    Σ L(c₁, c₂) h₁^{c₁} h₂^{c₂} / (c₁! c₂!) = Σ_d L'(d) (h₁γ + h₂)^d / d!   (6.7)

is immediate from equations (6.3) and (6.6), where L'(d) denotes the d-th cumulant of W. Consider equation (6.7) and identify the coefficients of

    h₁^{c₁+1} h₂^{c₂}

on both sides of the equation. Clearly the coefficient on the left hand side is L(c₁+1, c₂)/((c₁+1)! c₂!); for the right hand side let d = c₁ + c₂ + 1 and consider the expansion

    (h₁γ + h₂)^d = Σ_k (d choose k) γ^k h₁^k h₂^{d-k}

and its term for k = c₁+1. Thus the required term of the right hand side is

    L'(d) (d choose c₁+1) γ^{c₁+1} h₁^{c₁+1} h₂^{c₂} / d!

and our required coefficient is

    γ^{c₁+1} L'(d) / ((c₁+1)! c₂!).

Let us now identify the coefficients of h₁^{c₁} h₂^{c₂+1} on both sides of equation (6.7). On the left hand side the coefficient is clearly L(c₁, c₂+1)/(c₁! (c₂+1)!); for the right hand side let d = c₁ + c₂ + 1 as before and take the term k = c₁ of the expansion of (h₁γ + h₂)^d, so that the required coefficient is

    γ^{c₁} L'(d) / (c₁! (c₂+1)!).

Equating coefficients on the two sides gives

    L(c₁+1, c₂) = γ^{c₁+1} L'(d)    and    L(c₁, c₂+1) = γ^{c₁} L'(d),

so that L(c₁+1, c₂) = γ L(c₁, c₂+1). Since γ = θ⁻¹ = -β₂/β₁, this is

    β₁ L(c₁+1, c₂) + β₂ L(c₁, c₂+1) = 0,

which proves Theorem 6.1.
Therefore

    β = θ = L(c₁, c₂+1) / L(c₁+1, c₂)                              (6.10)

and may be estimated by

    b = R̂(c₁, c₂+1) / R̂(c₁+1, c₂),                                (6.11)

where R̂ denotes the corresponding sample cumulant. Often a "k" is used in place of the "R̂", i.e.

    k(c₁, c₂) = R̂(c₁, c₂),                                        (6.12)

in which case the statistics in (6.12) are known as "k-statistics" (Kaplan [28]).
For convenience we now give the values of up to fourth order cumulants in terms of the moments measured about the means:

    L(1) = M(1) = 0,
    L(2) = M(2),
    L(3) = M(3),                                                   (6.13)
    L(4) = M(4) - 3M(2)²,

with the corresponding bivariate relations L(1,1) = M(1,1), L(2,1) = M(2,1), L(1,2) = M(1,2) and L(2,2) = M(2,2) - M(2,0)M(0,2) - 2M(1,1)².
The sample k-statistics are unbiased and consistent estimates of the population cumulants; thus the quotient (6.11) converges in probability to β of (6.10), so long as R̂(c₁+1, c₂) does not tend to zero, i.e. L(c₁+1, c₂) ≠ 0. Thus so long as L(c₁+1, c₂) ≠ 0 we have an infinity of consistent estimates of β.
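As an illustration of the quotient (6.11), take c₁ = c₂ = 1, so that b = k(1,2)/k(2,1); for order three, cumulants equal central moments. The sketch below (all parameter values are assumptions for illustration) uses a skewed X, since a non-vanishing third cumulant is essential:

```python
import random

random.seed(2)

# Illustrative structural model (assumed values): Y = beta*X with skewed X.
beta = 2.0
n = 200_000
X = [random.expovariate(1.0) for _ in range(n)]        # skewed true values: third cumulant != 0
x = [Xi + random.gauss(0.0, 0.5) for Xi in X]          # observed x = X + eps
y = [beta * Xi + random.gauss(0.0, 0.5) for Xi in X]   # observed y = Y + delta

mx = sum(x) / n
my = sum(y) / n
# Third-order bivariate cumulants equal the corresponding central moments:
k21 = sum((u - mx) ** 2 * (v - my) for u, v in zip(x, y)) / n   # k(2,1)
k12 = sum((u - mx) * (v - my) ** 2 for u, v in zip(x, y)) / n   # k(1,2)

b_hat = k12 / k21   # the quotient (6.11) with c1 = c2 = 1
print(b_hat)        # close to beta = 2
```

Since the independent errors contribute nothing to the mixed third-order cumulants, the ratio converges to β; with a normal X both k(1,2) and k(2,1) would tend to zero and the ratio would be useless, as the text goes on to note.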
In our basic structural model we are concerned with normal distributions which, as is well known, have vanishing cumulants if the order is three or higher. Since c₁ + c₂ + 1 is always at least three we thus know that the cumulants will vanish and that, in the normal case, the use of cumulants is of no value whatsoever.
In the normal case then cumulants will not be used for estimation; unfortunately they are not much better for the non-normal cases. When we compute moments (and therefore cumulants, see equations (6.13)) of order higher than four we are almost wasting our efforts, unless we have a very large number of observations, for the inaccuracy becomes quite unmanageable (Kendall and Stuart [31]). It must also be remembered that for symmetric distributions all odd order cumulants are zero, and all in all it must be stated that estimation via cumulants will generally be unsatisfactory.

We do not estimate β for our example via the results of this chapter. Any results that we could get would not have any value, since X is normally distributed and this is precisely the case where we do not use cumulants.
CHAPTER 7
THE ANALYSIS OF VARIANCE
7.1. Replication of the observations
We have seen that when we know σ₁², σ₂², or λ = σ₂²/σ₁², no problem arises in estimating the linear relation. We consider now the idea of being able to estimate at least one of these parameters from another type of additional knowledge: replication.
Replication has not been considered much in the literature. Dorf and Gurland [17] feel that this may be because those actually involved in experiments do not themselves replicate their observations. Why this should be so is not too clear, since Villegas [59] points out that there is no difficulty in identification nor in achieving consistent estimates of the parameters when replication is available.
Following the solution of Hays [24] we shall derive, by way of the analysis of variance, three estimates due to Tukey [58] along with a fourth estimate given by Dorf and Gurland [17], who also derive the first three but with the assumption that Cov(ε, δ) = 0. This assumption we will drop, and we give all four estimates for the case that ε and δ may or may not be independent. Although this chapter is mainly concerned with the Analysis of Variance, this section will also consider other estimates for the case of replication.
It is convenient to change the notation somewhat and say that we have n "treatments" with Nᵢ, i = 1, 2, ..., n, observations on each treatment, the true values being (Xᵢ, Yᵢ). Thus if (X₁, Y₁), (X₂, Y₂), ..., (Xₙ, Yₙ) are the true values of our n treatments then the observations will be

    xᵢⱼ = Xᵢ + εᵢⱼ,    yᵢⱼ = Yᵢ + δᵢⱼ,    j = 1, 2, ..., Nᵢ.

To keep generality we do not assume that all the Nᵢ are equal, although we will assume that Nᵢ ≥ 2 for at least one i. This assures us that we do indeed have replicated observations.
Define aᵢ as the effect of treatment i, so that aᵢ = Xᵢ - μ; then aᵢ is a normally distributed random variable with mean zero and variance σₓ². Define the following quantities:

    Mₐ = (1/N) Σᵢ Nᵢ aᵢ,    M_ε = (1/N) Σᵢⱼ εᵢⱼ,    where N = Σᵢ Nᵢ.

Let SST denote the total sum of squares,

    SST = Σᵢⱼ (xᵢⱼ - x̄)².
Thus the SST can be broken down into two components: the variation due to both treatments and chance, Σᵢ Nᵢ(x̄ᵢ - x̄)², which we call the sum of squares between (SSB), and the variation due to chance alone, which we call the sum of squares within (SSW). Let us consider SSB. We show first that

    SSB = Σᵢ Nᵢ (aᵢ - Mₐ + M_{εi} - M_ε)²,

where M_{εi} = (1/Nᵢ) Σⱼ εᵢⱼ. Since xᵢⱼ = μ + aᵢ + εᵢⱼ we have

    x̄ᵢ = μ + aᵢ + M_{εi}

and

    x̄ = μ + Mₐ + M_ε.

∴ SSB = Σᵢ Nᵢ (μ + aᵢ + M_{εi} - μ - Mₐ - M_ε)², since the μ's cancel.
We define the mean square between (MSB) as SSB divided by degrees of freedom; there are n-1 d.f. for SSB. The expected value of the mean square between (EMSB) is then

    EMSB = (1/(n-1)) { E[Σᵢ Nᵢ(aᵢ - Mₐ)²] + (n-1) σ₁² }.           (7.2)

When the Nᵢ are all equal to m for all i then

    E[Σᵢ Nᵢ(aᵢ - Mₐ)²] = m(n-1) σₓ²                                 (7.3)

from the definitions of aᵢ and Mₐ. When the Nᵢ are not all equal, Snedecor and Cochran [54] give

    E[Σᵢ Nᵢ(aᵢ - Mₐ)²] = (N - Σᵢ Nᵢ²/N) σₓ².                       (7.4)

Since the expressions in (7.4) and (7.3) are identical when the Nᵢ are all equal we shall use (7.4) in either case; thus we maintain the generality of having the Nᵢ not constant. It may also be shown that

    E[Σᵢ Nᵢ(M_{εi} - M_ε)²] = (n-1) σ₁².                            (7.5)

From equations (7.4) and (7.5) we have that

    EMSB = σ₁² + [(N² - Σᵢ Nᵢ²)/(N(n-1))] σₓ².

We will denote this expression Bₓₓ.
The mean square within (MSW) is defined as the SSW divided by degrees of freedom. There are N-n degrees of freedom for SSW.
To evaluate the expected mean square within (EMSW) let i be fixed and let xᵢ₁, xᵢ₂, ..., x_{iNᵢ} be the replicated observations on Xᵢ. If we only consider treatment i we may estimate Xᵢ by x̄ᵢ and σ₁² by

    (1/(Nᵢ - 1)) Σⱼ (xᵢⱼ - x̄ᵢ)².

Using all of the observations we can estimate σ₁² by Σᵢⱼ (xᵢⱼ - x̄ᵢ)² divided by degrees of freedom; clearly we have Σᵢ (Nᵢ - 1) = N - n d.f. Therefore we estimate σ₁² by

    (1/(N - n)) Σᵢⱼ (xᵢⱼ - x̄ᵢ)².                                   (7.7)

This expression is the same as the MSW; thus MSW estimates σ₁² and EMSW = σ₁². We call this quantity Wₓₓ.
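The identities EMSW = σ₁² and EMSB = σ₁² + mσₓ² can be checked by a small simulation; in the sketch below all parameter values are assumptions for illustration, and equal replication Nᵢ = m is taken for brevity:

```python
import random

random.seed(3)

# Assumed parameters for this sketch:
mu, sigma_x, sigma_1 = 5.0, 2.0, 1.0
n, m = 2000, 5                       # n treatments, m replications each
N = n * m

a = [random.gauss(0.0, sigma_x) for _ in range(n)]   # treatment effects a_i
x = [[mu + a[i] + random.gauss(0.0, sigma_1) for _ in range(m)] for i in range(n)]

grand = sum(sum(row) for row in x) / N
means = [sum(row) / m for row in x]

SSB = sum(m * (means[i] - grand) ** 2 for i in range(n))
SSW = sum((xij - means[i]) ** 2 for i in range(n) for xij in x[i])

MSB = SSB / (n - 1)   # estimates sigma_1^2 + m*sigma_x^2  (B_xx)
MSW = SSW / (N - n)   # estimates sigma_1^2                (W_xx)
print(MSB, MSW)
```

With these values MSB should sit near σ₁² + mσₓ² = 21 and MSW near σ₁² = 1; (MSB - MSW)/m then estimates σₓ².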
In an analogous manner we may calculate the variation due to the y's (B_yy and W_yy) and that due to both x and y (B_xy and W_xy). We tabulate these results in table 7.1, where c = (N² - Σᵢ Nᵢ²)/(N(n-1)).

Table 7.1. ANOVAR TABLE FOR REPLICATION

    SOURCE OF VARIATION    MEAN SQUARE    EXPECTED MEAN SQUARE
    Between, x             Bₓₓ            σ₁² + c σₓ²
    Between, y             B_yy           σ₂² + c β² σₓ²
    Between, xy            B_xy           Cov(ε, δ) + c β σₓ²
    Within, x              Wₓₓ            σ₁²
    Within, y              W_yy           σ₂²
    Within, xy             W_xy           Cov(ε, δ)
On inspection of the expected mean squares we find three estimates of β, all converging in probability to β as n → ∞ and Nᵢ → ∞ for at least one i, so long as the respective denominators do not converge to zero. Thus the following three estimates are all consistent:

    b₁ = (B_xy - W_xy) / (Bₓₓ - Wₓₓ),
    b₂ = (B_yy - W_yy) / (B_xy - W_xy),
    b₃ = ± [ (B_yy - W_yy) / (Bₓₓ - Wₓₓ) ]^{1/2}.

We do not necessarily assume Cov(ε, δ) = 0; if Cov(ε, δ) ≠ 0, however, then (sgn β) is not necessarily that of Cov(x, y) (see chapter 2, section 2.3.3). We may obtain (sgn β) from Cov(x, y) = β σₓ² + Cov(ε, δ) using W_xy, or from a plotting of the observations.
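The three estimates can be sketched numerically as follows (parameter values are illustrative assumptions; the errors are simulated with Cov(ε, δ) ≠ 0 to exercise the general case, and equal replication is taken for brevity):

```python
import random

random.seed(4)

# Assumed parameters for this sketch:
beta, alpha = 1.5, 0.5
sigma_x, s1, s2, rho = 2.0, 0.8, 0.6, 0.5   # rho = Corr(eps, delta) != 0
n, m = 3000, 4
N = n * m

xs, ys = [], []
for _ in range(n):
    Xi = random.gauss(0.0, sigma_x)
    Yi = alpha + beta * Xi
    row_x, row_y = [], []
    for _ in range(m):
        e = random.gauss(0.0, s1)
        # delta correlated with eps so that Cov(eps, delta) = rho*s1*s2:
        d = rho * (s2 / s1) * e + (1 - rho ** 2) ** 0.5 * random.gauss(0.0, s2)
        row_x.append(Xi + e)
        row_y.append(Yi + d)
    xs.append(row_x)
    ys.append(row_y)

def B_and_W(u, v):
    # between and within mean cross-products, equal replication m
    gu = sum(map(sum, u)) / N
    gv = sum(map(sum, v)) / N
    mu_ = [sum(r) / m for r in u]
    mv_ = [sum(r) / m for r in v]
    B = sum(m * (mu_[i] - gu) * (mv_[i] - gv) for i in range(n)) / (n - 1)
    W = sum((u[i][j] - mu_[i]) * (v[i][j] - mv_[i])
            for i in range(n) for j in range(m)) / (N - n)
    return B, W

Bxx, Wxx = B_and_W(xs, xs)
Byy, Wyy = B_and_W(ys, ys)
Bxy, Wxy = B_and_W(xs, ys)

b1 = (Bxy - Wxy) / (Bxx - Wxx)
b2 = (Byy - Wyy) / (Bxy - Wxy)
b3 = ((Byy - Wyy) / (Bxx - Wxx)) ** 0.5   # sign taken from Bxy - Wxy here
print(b1, b2, b3)   # all close to beta = 1.5
```

Note that subtracting the within mean squares removes both the error variances and Cov(ε, δ), which is why the estimates remain consistent with correlated errors.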
We may also use table 7.1 for the functional case if we modify the table slightly. In the functional case the aᵢ are not random variables, but we can interpret them as "fixed effects". The quantity σₓ² has no meaning; however, we may use the table by retaining Σᵢ Nᵢ(aᵢ - Mₐ)² as in (7.2) and substituting this directly in the table for (N - Σᵢ Nᵢ²/N) σₓ². The balance of this section is devoted to other estimates of β for the case of replication.
With replication we have an immediate estimate of λ = σ₂²/σ₁² and we can use the maximum likelihood estimate of β with λ known. This estimate is (Dorf and Gurland [17])

    b₄ = { B_yy - λ̂ Bₓₓ + [ (B_yy - λ̂ Bₓₓ)² + 4 λ̂ B_xy² ]^{1/2} } / (2 B_xy),

where

    λ̂ = W_yy / Wₓₓ.

It is clear that b₄ is consistent so long as B_xy ≠ 0. If B_xy = 0 then we may take b₄ = 0 consistently, unless B_yy - λ Bₓₓ = 0 also, in which event b₄ is indeterminate.
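A self-contained sketch of b₄ with λ̂ = W_yy/Wₓₓ follows (parameter values are assumed; independent errors and equal replication are taken here only for brevity, the quantities Bₓₓ, Wₓₓ, etc. being computed as in table 7.1):

```python
import random

random.seed(5)

# Assumed parameters for this sketch:
beta, sigma_x, s1, s2 = 1.5, 2.0, 0.8, 0.6
n, m = 3000, 4
N = n * m

xs, ys = [], []
for _ in range(n):
    Xi = random.gauss(0.0, sigma_x)
    xs.append([Xi + random.gauss(0.0, s1) for _ in range(m)])
    ys.append([beta * Xi + random.gauss(0.0, s2) for _ in range(m)])

def B_and_W(u, v):
    # between and within mean cross-products, equal replication m
    gu = sum(map(sum, u)) / N
    gv = sum(map(sum, v)) / N
    mu_ = [sum(r) / m for r in u]
    mv_ = [sum(r) / m for r in v]
    B = sum(m * (mu_[i] - gu) * (mv_[i] - gv) for i in range(n)) / (n - 1)
    W = sum((u[i][j] - mu_[i]) * (v[i][j] - mv_[i])
            for i in range(n) for j in range(m)) / (N - n)
    return B, W

Bxx, Wxx = B_and_W(xs, xs)
Byy, Wyy = B_and_W(ys, ys)
Bxy, _ = B_and_W(xs, ys)

lam = Wyy / Wxx   # immediate estimate of lambda = sigma_2^2 / sigma_1^2
b4 = (Byy - lam * Bxx
      + ((Byy - lam * Bxx) ** 2 + 4 * lam * Bxy ** 2) ** 0.5) / (2 * Bxy)
print(b4)   # close to beta = 1.5
```

Substituting the expected mean squares from table 7.1 into the formula shows that its population value is exactly β, which is the consistency claim above.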
The estimate b₄ is not really a maximum likelihood estimate of β, for we do not know λ; we only have an estimate of λ. For the replicated case Villegas [59] gives the maximum likelihood solution for β and we shall outline his method. Assume that ε and δ are not necessarily independent, an assumption that we usually like to make and are able to make in the replicated case, and let

    z̲ᵢⱼ = (xᵢⱼ, yᵢⱼ)′,    e̲ᵢⱼ = (εᵢⱼ, δᵢⱼ)′.

These are vector quantities, hence the underlining. We let the number of replications be the same for each treatment, say Nᵢ = M. Notice that N = Mn.

Let Σ be the (unknown) variance-covariance matrix of the errors e̲ᵢⱼ. The probability density for one z̲ (ignoring the constant) is

    |Σ|^{-1/2} exp[ -(1/2) (z̲ - E[z̲])′ Σ⁻¹ (z̲ - E[z̲]) ]
and therefore the likelihood function is

    L ∝ |Σ|^{-N/2} exp[ -(1/2) Σᵢⱼ (z̲ᵢⱼ - E[z̲ᵢⱼ])′ Σ⁻¹ (z̲ᵢⱼ - E[z̲ᵢⱼ]) ].

We know from table 7.1 that Σ can be estimated; let this estimate be S. Villegas shows that β̂ and