
  • THE ERRORS IN VARIABLES MODEL: ESTIMATION OF
    THE LINEAR STRUCTURAL RELATION

    by

    Stephen Werner

    B.Sc., Simon Fraser University, 1969

    A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF
    THE REQUIREMENTS FOR THE DEGREE OF
    MASTER OF SCIENCE
    in the Department
    of
    Mathematics

    (C) STEPHEN WERNER 1973

    SIMON FRASER UNIVERSITY
    March 1973

    All rights reserved. This thesis may not be reproduced in whole or in
    part, by photocopy or other means, without permission of the author.

  • APPROVAL

    Name: Stephen Werner

    Degree: Master of Science

    Title of Thesis: The Errors in Variables Model: Estimation of the
    Linear Structural Relation

    Examining Committee:

    Chairman: C. Y. Shen

    C. Villegas
    Senior Supervisor

    D. Eaves

    R. Rennie

    D. Mallory
    External Examiner

    Date Approved: 16, 1973

    (ii)

  • PARTIAL COPYRIGHT LICENSE

    I hereby grant to Simon Fraser University the right to lend my thesis
    or dissertation (the title of which is shown below) to users of the
    Simon Fraser University Library, and to make partial or single copies
    only for such users or in response to a request from the library of
    any other university, or other educational institution, on its own
    behalf or for one of its users. I further agree that permission for
    multiple copying of this thesis for scholarly purposes may be granted
    by me or the Dean of Graduate Studies. It is understood that copying
    or publication of this thesis for financial gain shall not be allowed
    without my written permission.

    Title of Thesis/Dissertation: The Errors in Variables Model:
    Estimation of the Linear Structural Relation

  • ABSTRACT

    Consider the linear relation Y = α + βX where α and β are constants
    which we wish to estimate from a sample of n pairs of observations.
    We cannot directly measure X or Y because of errors of observation;
    rather we measure x = X + ε and y = Y + δ. The random variables ε
    and δ are normally distributed with mean zero and finite variances
    σ₁² and σ₂² respectively. If X is also normally distributed
    (independently of ε and δ) with mean zero and variance σ_X² we are
    in a situation known as the linear structural model. If the
    different values of X are considered as additional parameters we are
    in a situation known as the linear functional model. Chiefly we will
    deal with the structural model, and it is the case that no
    parameters of this model can be consistently estimated. The purpose
    of this paper is to show why this is so and to show what is required
    in the way of extra knowledge or assumptions in order that we may
    consistently estimate these parameters. We are chiefly concerned
    with estimating β, and we give estimates for the various cases as
    they arise. In many cases an asymptotic variance of the estimate is
    also given. The last chapter of the paper is essentially concerned
    with confidence intervals for β.

  • ACKNOWLEDGMENT

    I would like to take this opportunity to express my gratitude to
    Professor Cesareo Villegas for suggesting the topic and for his help
    and encouragement while preparing this thesis. I would also like to
    thank Simon Fraser University and the Simon Fraser University
    President's research grant for their financial support. Finally, I
    want to thank Mrs. A. Gerencser for typing the entire work.

  • TABLE OF CONTENTS

    Title Page
    Approval
    Abstract
    Acknowledgment
    Table of Contents
    List of Tables

    CHAPTER 1 Introduction
    1.2 The Classical Least Squares Solution
    1.3 The Two Sub-Models of the Errors in Variables Model
    1.3.1 The Functional Model
    1.3.2 The Structural Model
    1.4 Example

    CHAPTER 2 Least Squares and Maximum Likelihood
    2.1 Identifiability of the Parameters
    2.2 Least Squares Estimation
    2.3 Maximum Likelihood Estimation
    2.3.1 Knowledge of one error variance
    2.3.2 Knowledge of the ratio λ = σ₂²/σ₁²
    2.3.3 Both σ₁² and σ₂² known
    2.3.4 When α is known
    2.4 Example

    CHAPTER 3 Estimates Derived from Grouping the Data
    3.2 Example

  • CHAPTER 4 Instrumental Variables
    4.1 One Instrumental Variable Observed Without Error
    4.2 Two Instrumental Variables Observed With Error
    4.3 Example

    CHAPTER 5 Controlling the Observations

    CHAPTER 6 Cumulants

    CHAPTER 7 The Analysis of Variance
    7.1 Replication of the Observations
    7.2 The Analysis of Variance for an Instrumental Variable
    7.3 The Analysis of Variance for the Method of Grouping
    7.4 Example

    CHAPTER 8 Confidence Intervals and Tests of Hypotheses
    8.1 Confidence Interval for β, No Extra Information
    8.2 The Case When λ is Known
    8.3 The Case When Both σ₁² and σ₂² are Known
    8.4 The Use of Instrumental Variables
    8.5 Confidence Intervals Based on Wald's Method
    8.6 Confidence Intervals Using Three Groups
    8.7 Testing Equality of Lines Derived from Several Runs in the
    Berkson Model
    8.8 Confidence Intervals for the Replicated Case

    BIBLIOGRAPHY

  • LIST OF TABLES

    Table 1.1 Values of x and y
    Table 1.2 Replicated Values of x and y
    Table 2.1 Identification of β
    Table 2.2 Example
    Table 3.1 Optimum Proportions
    Table 3.2 Example
    Table 7.1 Anovar Table for Replication
    Table 7.2 Anovar for Regression with an Instrumental Variable
    Table 7.3 Example

  • CHAPTER 1

    INTRODUCTION

    Consider a linear relation Y = α + βX between two unobservable
    variables X and Y, where α and β are unknown constants. The purpose
    of this paper is to present various estimates of this relation. In
    particular we will be concerned with estimating β, the slope.

    A typical experiment will consist of n observations (xi, yi) where

        xi = Xi + εi  and  yi = Yi + δi.

    We assume that the εi are identically and independently distributed
    with mean zero and finite variance σ₁². Similarly δi will have mean
    zero and finite variance σ₂². Unless otherwise stated both ε and δ
    will be assumed to follow a normal distribution and be independent.
    The true values Xi and Yi will always be assumed independent of εi
    and δi, for every i.

    The following are well known definitions which will be useful
    throughout the balance of this paper. Let θ denote an arbitrary
    parameter and let θ̂_n be an estimate of θ based upon the random
    sample x₁, x₂, ..., x_n.

    Definition 1: A consistent estimate θ̂_n of θ is one which converges
    in probability to θ. That is, θ̂_n is consistent if, ∀ ε > 0,

        lim_{n→∞} P(|θ̂_n − θ| > ε) = 0.

    Definition 2: θ̂_n is an unbiased estimate of θ if E(θ̂_n), the
    expected value of θ̂_n, is equal to θ. It is not necessarily true
    that a consistent estimate is unbiased.

    Definition 3: θ̂_n is a sufficient statistic if, given the value of
    θ̂_n, the conditional distribution of x₁, x₂, ..., x_n is
    independent of θ. In other words we get no extra knowledge on the
    value of θ by having complete knowledge of the sample values.

    Definition 4: Let X₁ and X₂ be real-valued continuous random
    variables with distribution functions F₁(·) and F₂(·) respectively.
    If, for every real number z,

        F₃(z) = ∫ F₁(z − w) dF₂(w),

    we say that F₃ is the convolution of F₁ and F₂ and write
    F₃ = F₁ * F₂. It is well known that if X₁ and X₂ are independent
    then F₃ is the distribution of X₁ + X₂.

    Definition 5: We say that a distribution function F₁ is divisible by
    a distribution function F₂ if there exists a distribution function
    F₃ such that F₁ = F₂ * F₃.

    1.2. The Classical Least Squares Solution

    Let x and y be real valued random variables with finite second
    moments, possibly dependent, defined on the same probability space
    of reference. Let α and β be constants and consider the random
    variable Z = α + βx. The regression line of y on x is defined as
    that line y = α + βx where α and β are chosen so that E(y − Z)² is
    minimized¹. Let d = y − Z and consider the minimization of E(d²).

    Let E(x) = μ and E(y) = ν. Let σ_x² = E(x − μ)² and σ_y² = E(y − ν)².
    Thus

        E(d²) = σ_y² + β²σ_x² − 2β Cov(x, y) + (α − ν + βμ)².     (1.2)

    The only term involving α is (α − ν + βμ)², and hence E(d²) is
    minimized with respect to α by putting (α − ν + βμ) = 0. Thus we
    have

        α = ν − βμ.

    Substituting this value of α into equation (1.2), differentiating
    with respect to β and setting equal to zero gives

        β = Cov(x, y) / σ_x².

    The least squares regression line of y on x is then

        y = ν + (Cov(x, y) / σ_x²)(x − μ).

    ¹ This is not the usual definition of a regression curve, E(y|x),
    although it is true that when x and y have a bivariate normal
    distribution this regression line will coincide with the regression
    curve of y on x.

  • We know that Cov(x, y) and σ_x² can be consistently estimated by

        S_xy = n⁻¹ Σ (xi − x̄)(yi − ȳ)  and  S_x² = n⁻¹ Σ (xi − x̄)²

    respectively; thus

        b = S_xy / S_x²                                           (1.7)

    is a consistent estimate of β. We also see that

        a = ȳ − b x̄                                               (1.8)

    is a consistent estimate of α.

    The expressions in (1.7) and (1.8) are the values that minimize
    Σ (yi − α − βxi)², and for this reason are called the least squares
    estimates of α and β. No errors have been associated with either x
    or y and no claim has been made of a linear relation between them.
    The underlying model for this case is y = α + βx + δ, where δ is a
    random variable with E(δ) = E(xδ) = 0 and represents random
    variation associated with the data.
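    As a concrete sketch, the least squares estimates (1.7) and (1.8)
    can be computed directly from the sample moments (Python; the data
    values here are made up for illustration, not taken from the
    thesis):

```python
# Least squares estimates from sample moments:
# b = S_xy / S_x^2 (eqn. (1.7)) and a = ybar - b*xbar (eqn. (1.8)).
def least_squares(xs, ys):
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    s_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / n
    s_xx = sum((x - xbar) ** 2 for x in xs) / n
    b = s_xy / s_xx          # slope estimate
    a = ybar - b * xbar      # intercept estimate
    return a, b

a, b = least_squares([0.0, 1.0, 2.0, 3.0], [1.1, 2.9, 5.1, 6.9])
```

    Both moments use the divisor n, matching the sample moments of
    chapter 2; dividing both by n − 1 instead would leave b and a
    unchanged.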

    Let Y and X be linearly related random variables with Y = α + βX
    and let Y be unobservable. Our observations are Xi and yi, for
    i = 1, 2, ..., n, where yi = Yi + δi, E(δ) = 0 and Y and δ are
    independent. The underlying model for the least squares regression
    line of y on X is then y = α + βX + δ. It is clear that, with X

  • replacing x above, these two models may be considered identical.
    Therefore the statistics a and b as given by (1.7) and (1.8) are
    consistent estimates of α and β.

    From the symmetry of the model we may derive similar estimates when
    X is observed with error and Y is not, this line being called the
    regression line of x on Y, where x and Y are now our observations.

    When both Y and X are subject to error, attempts have been made to
    compute both of the regression lines of y on x and x on y (which,
    in general, are different) and "average" them to get an estimate of
    the true line Y = α + βX. While it is true that β lies between the
    slopes of these two lines, we will not be able to find it in this
    way.

    1.3 The Two Sub-Models of the Errors in Variables Model

    As of yet we have not indicated how the true values X and Y behave.
    There are two basic models, both satisfying the assumptions of the
    first section, which we consider. They are the functional model and
    the structural model, the latter being the chief concern of this
    paper.

    1.3.1 The Functional Model

    In this model the true values X and Y are considered to be fixed
    (non-random) or mathematical variables, both subject to errors of
    observation. It is the case that X takes on a set of fixed, unknown
    values: X₁, X₂, ..., X_n, called "incidental parameters" by Neyman
    and Scott [45]. Although we will not especially consider this case
    of the errors in variables model in the balance of the paper, it is
    worth noting that when we have replicated observations, i.e. when
    for each i we take Ni additional observations on Xi and Yi (see
    chapter 7), this model is essentially no different from the model
    to be described below.

    There is an interesting paper by Solari [55] which shows that when
    α = 0 and the maximum likelihood equations are solved we achieve,
    not a maximum, but a saddle point, and that no maximum likelihood
    solution exists. She presumes that this will also be the case for α
    not equal to zero.

    For a fuller discussion on the functional case, refer to the
    Bibliography, especially Kendall [29, 30], Sprent [56] and Villegas
    [59, 60, 61].

    1.3.2 The Structural Model

    The model we describe here will be the underlying model for the
    balance of this paper, although from time to time some of the basic
    assumptions will be altered.

    The chief difference between the structural and the functional
    models is that in the structural case the true values are random
    variables. Our basic model has X (and thus Y) following a normal
    distribution. Let E(X) = μ and E(Y) = ν and let X have a finite
    variance σ_X². From a random sample (x₁, y₁), (x₂, y₂), ...,
    (x_n, y_n) of size n we see that

        E(x) = μ,            E(y) = ν = α + βμ,
        σ_x² = σ_X² + σ₁²,   σ_y² = β²σ_X² + σ₂²,                 (1.9)
        Cov(x, y) = βσ_X².

    From equations (1.9) we see that the model has six unknown
    parameters:

  • α, β, μ, σ_X², σ₁² and σ₂². Since x and y have a bivariate normal
    distribution with parameters μ, ν, σ_x², σ_y² and Cov(x, y), it is
    clear that even perfect information on these parameters will not be
    sufficient to provide information on the parameters of the
    structural model.

    Thus the basic structural model, as it stands, is unidentifiable,
    and all that we are able to estimate is μ, ν, σ_x², σ_y² and
    Cov(x, y). Our real interest is in estimating β and α, and unless
    we are given some additional knowledge, or are prepared to alter
    the model, we cannot do this. This paper deals then with ways and
    means of estimating the linear structural relation in those cases
    in which it is possible to do so, and giving those cases which do
    and do not lead to consistent estimates. It is clear that if b is a
    consistent estimate of β then a = ȳ − b x̄ is a consistent estimate
    of α. Thus estimates for α are tied up in estimates for β and we
    shall not consider them further.

    It should be noted at this point that use of the words "structural"
    and "functional" has not been fixed in the literature; for instance
    Lindley [37, 38] uses the word functional to denote what we call
    structural models. It is often difficult to distinguish between the
    two cases, and although the differences may be quite minor, in fact
    often giving the same numerical results, it is not correct to force
    data to an inappropriate model. This illustrates an often neglected
    rule in statistics: never pick a model or decide on the type of
    inferences to be made after the data has been collected; there is a
    good chance of introducing a serious bias into the results. The
    correct time to choose a model is before any observations are made;
    see Acton [1] on this point.

  • 1.4 Example

    To illustrate the various estimates given in the following chapters
    we give an artificially generated example. Since our main interest
    is the estimation of the parameters α and β, and since the best
    estimate of α is a = ȳ − b x̄, where b is the estimate of β, we only
    give the calculated value of b for the various estimates.

    All of the data was drawn from random normal tables [47] and
    transformed so that E(X) = 10, σ_X² = 4, E(ε) = E(δ) = 0, σ₁² = .04
    and σ₂² = .0625. The line chosen was Y = 2 + 2X and 30 values of X
    were obtained, from which we calculated 30 values of Y. We chose 60
    values each for ε and δ, the last 30 of which were used only for
    chapter 7, in which we need replicated observations.

    In the tables below we give only the computed values of x = X + ε
    and y = Y + δ because, in an actual experiment, these would be our
    only observations.

    Table 1.1 VALUES OF x AND y

  • The next table only applies for replicated observations; the order
    of the observations is the same as in the table above. For example
    the i-j entries in Tables (1.1) and (1.2) both correspond to the
    same true value Xij.

    Table 1.2 REPLICATED VALUES OF x AND y

    We will require the following statistics and thus list them here.
    The statistics apply only to Table 1.1. They are not required for
    the replicated data.
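    The design of this example is easy to re-create in code. A sketch
    (Python; the thesis drew its values from published random normal
    tables, so a pseudo-random generator reproduces the design but not
    the actual entries of Table 1.1):

```python
import random

random.seed(1)
n = 30
alpha, beta = 2.0, 2.0                               # true line Y = 2 + 2X
X   = [random.gauss(10.0, 2.0) for _ in range(n)]    # E(X) = 10, var 4
eps = [random.gauss(0.0, 0.2) for _ in range(n)]     # var .04  (sd 0.2)
dlt = [random.gauss(0.0, 0.25) for _ in range(n)]    # var .0625 (sd 0.25)
Y   = [alpha + beta * Xi for Xi in X]                # unobservable true values
xs  = [Xi + e for Xi, e in zip(X, eps)]              # observed x = X + eps
ys  = [Yi + d for Yi, d in zip(Y, dlt)]              # observed y = Y + delta
```

    Note that random.gauss takes a standard deviation, so the error
    variances .04 and .0625 enter as 0.2 and 0.25.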

  • CHAPTER 2

    LEAST SQUARES AND MAXIMUM LIKELIHOOD

    2.1 Identifiability of the Parameters

    Let us consider the basic structural model as outlined in section
    1.3. It was mentioned that estimation of β is not possible with the
    model as outlined; we now consider why this is so.

    The five moments given in section 1.3 completely determine a
    bivariate normal distribution, and the parameters may be estimated
    by the sample moments, which are sufficient statistics. These
    estimates are

        x̄ = n⁻¹ Σ xi,
        ȳ = n⁻¹ Σ yi,
        S_x² = n⁻¹ Σ (xi − x̄)²,                                   (2.1)
        S_y² = n⁻¹ Σ (yi − ȳ)²,
        S_xy = n⁻¹ Σ (xi − x̄)(yi − ȳ).

    As previously pointed out, we cannot estimate our six unknown
    parameters with these five equations; unless we can somehow
    "assign" a value to at least one, no further estimation is
    possible.

    The first two of the equations in section 1.3 do not contribute any
    information in estimating the other parameters. Thus we drop them
    and consider

        σ_x² = σ_X² + σ₁²,
        σ_y² = β²σ_X² + σ₂²,                                      (2.2)
        Cov(x, y) = βσ_X².

  • The parameters in (2.2) are unidentifiable, meaning that they
    cannot be determined uniquely from the joint distribution of our
    observed variables. Following the terminology of Reiersøl [50] we
    shall refer to a structure when our parameters and distributions in
    the model have been specified. If P(x, y) denotes the distribution
    of our observed variables there will exist an infinity of
    structures generating P(x, y). These structures are called
    equivalent in the sense that they all generate the same
    distribution P(x, y), but the parameters do not necessarily have
    the same value in each structure. For example, if x and y are
    jointly distributed as above with E(x) = μ, E(y) = ν, σ_x² = 3,
    σ_y² = 9 and Cov(x, y) = 4, then we find that more than one
    structure leads to the same joint distribution P(x, y) of x and y.

    If S₁ is any structure then an equivalent structure S₂ generating
    the same P(x, y) may be formed by taking γ ≠ 0 such that β + γ ≠ 0
    and γ < σ₂² β⁻¹ σ_X⁻². S₂ is then formed (Moran [41]) by replacing
    σ_X², σ₁², σ₂², β and α with βσ_X²(β + γ)⁻¹, σ₁² + γσ_X²(β + γ)⁻¹,
    σ₂² − βγσ_X², β + γ and α − γμ respectively.
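    Moran's substitution can be verified numerically: the original
    structure and the transformed one imply identical means, variances
    and covariance for the observable pair (x, y), even though their
    slopes differ. A sketch (Python; the parameter values are
    illustrative, not from the thesis):

```python
def observable_moments(beta, sx2, s12, s22, mu, alpha):
    # First and second moments of (x, y) implied by a structure (eqns. (1.9)).
    return (mu,                        # E(x)
            alpha + beta * mu,         # E(y)
            sx2 + s12,                 # var(x)
            beta ** 2 * sx2 + s22,     # var(y)
            beta * sx2)                # cov(x, y)

# Structure S1 and a valid gamma: gamma != 0, beta + gamma != 0,
# and gamma < s22 / (beta * sx2) so the new delta-variance stays positive.
beta, sx2, s12, s22, mu, alpha = 2.0, 2.0, 1.0, 1.0, 10.0, 2.0
gamma = 0.2

S1 = observable_moments(beta, sx2, s12, s22, mu, alpha)
S2 = observable_moments(beta + gamma,
                        beta * sx2 / (beta + gamma),
                        s12 + gamma * sx2 / (beta + gamma),
                        s22 - beta * gamma * sx2,
                        mu,
                        alpha - gamma * mu)
```

    Since S1 and S2 generate the same bivariate normal P(x, y), no
    amount of data can distinguish the slope 2.0 from the slope 2.2.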

  • We say that a parameter is identifiable if all equivalent
    structures lead to the same value of the parameter. Thus we say
    that the parameters in this model, in particular β, are not
    identifiable.

    We now consider three theorems which will tell us under what
    conditions β is an identifiable parameter. The proofs, which are
    straightforward, may be found in Reiersøl [50] and (theorem 1 only)
    in Reiersøl and Koopmans [35].

    Theorem 1. If ε and δ are normally distributed, not necessarily
    independent, β is identifiable if and only if neither X nor Y is
    normally distributed.

    Theorem 2. When β is identifiable the other parameters are also
    identifiable if and only if neither X nor Y is divisible by a
    normal distribution (see definition 5, chapter 1) and exactly one
    of ε and δ is identically zero.

    Theorem 3. When ε and δ are independent and X is normally
    distributed, then β is identifiable if and only if the
    distributions of neither ε nor δ are divisible by a normal
    distribution.

    We may list the identifiability of β in the various cases with the
    following table given by Reiersøl [50].

  • Table 2.1 IDENTIFICATION OF β

    CASE                                             Conclusion on β
    ------------------------------------------------------------------
    β = 0 or β = ∞                                   β not identifiable
    β ≠ 0, β finite:
      X not normally distributed                     β identifiable
      X normally distributed:
        Neither P(ε) nor P(δ) divisible
        by a normal distribution                     β identifiable
        Either P(ε) or P(δ) divisible
        by a normal distribution                     β not identifiable

    It is clear that if x and y are independent then we may assume that
    X and Y are constant and that β may have any value. If X and Y are
    not independent then β is not zero or infinite. Thus we have the
    conclusion β is not identifiable if β = 0 or β = ∞ in Table 2.1.

    2.2 Least Squares Estimation

    A survey of the errors in variables model would be incomplete
    without mention of some of the many least squares estimates
    proposed over the years, for least squares has been pursued by
    almost everyone concerned with a regression problem. The basic idea
    is to minimize a sum of squares (possibly weighted) of residuals in
    some direction. The sum of absolute values has also been
    considered, but does not lend itself to calculus, being
    discontinuous at the origin, and we shall not consider it further.

    When only one variable, Y say, is observed with error, the estimate
    derived in chapter 1 is an efficient and consistent estimate of β.
    When, however, X is also subject to error, b is neither consistent
    nor unbiased. An exception to this is when we have replicated
    observations; however, this will be deferred to a later chapter.

    Divide the numerator and denominator of b in equation (2.3) by n.
    The denominator converges in probability to σ_x². The numerator may
    be written as

        n⁻¹ Σ (xi − μ)(yi − ν) − (x̄ − μ) n⁻¹ Σ (yi − ν)
            − (ȳ − ν) n⁻¹ Σ (xi − μ) + (x̄ − μ)(ȳ − ν).

    The expressions (x̄ − μ) n⁻¹ Σ (yi − ν), (ȳ − ν) n⁻¹ Σ (xi − μ) and
    (x̄ − μ)(ȳ − ν) all converge in probability to zero. Thus the
    numerator converges in probability to the same limit as the first
    expression. This limit is β σ_X², so b converges in probability to
    β σ_X²/σ_x², which we may write as

        β σ_X² / (σ_X² + σ₁²).                                    (2.4)

    Richardson and Wu give the mean of b as equation (2.5).

  • The expressions in equations (2.4) and (2.5) are clearly the same.
    Thus b is a consistent and unbiased estimate of
    β σ_X²/(σ_X² + σ₁²), not of β.
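    The attenuation in (2.4) shows up readily in simulation: with error
    in x, the naive slope b settles near β σ_X²/(σ_X² + σ₁²) rather
    than β. A sketch (Python; all parameter values are illustrative):

```python
import random

random.seed(0)
beta, sx2, s12 = 2.0, 4.0, 1.0            # illustrative values
n = 100000
X  = [random.gauss(0.0, sx2 ** 0.5) for _ in range(n)]
xs = [Xi + random.gauss(0.0, s12 ** 0.5) for Xi in X]   # x = X + eps
ys = [beta * Xi + random.gauss(0.0, 0.5) for Xi in X]   # alpha = 0, var(delta) = .25

xbar = sum(xs) / n
ybar = sum(ys) / n
b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) \
    / sum((x - xbar) ** 2 for x in xs)    # naive least squares slope

plim_b = beta * sx2 / (sx2 + s12)         # = 1.6, the attenuated limit of (2.4)
```

    With these values the computed b falls close to 1.6, well below the
    true slope 2.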

    We now consider some of the attempts made to take the errors of the
    x observations into account. It should be noted that all of the
    estimates in the balance of this section (except for the last one)
    will, in general, be inconsistent.

    It should perhaps be mentioned at this point that consistency is
    not an important property in small samples (Madansky [39]) since we
    are never too sure just how close b is to β. Consistency relates to
    identifiability in the sense that if no consistent estimate exists
    then the parameters are not identifiable, i.e. we have too many
    parameters.

    One of the earliest authors, Adcock [2], suggested minimizing the
    sums of squares of the normal distances from the observed points to
    the true line. Pearson [46] called this the major axis of the
    correlation ellipse, making an angle θ with the X-axis, where

        tan 2θ = 2μ₁₁ / (μ₂₀ − μ₀₂),

    where the μij denote population moments. Solving for θ we see that

        β = tan θ = [(μ₀₂ − μ₂₀) + √((μ₂₀ − μ₀₂)² + 4μ₁₁²)] / (2μ₁₁).   (2.7)

  • From equations (2.2) we see that sgn(β) = sgn(μ₁₁), and hence in
    equation (2.7) we will take the positive square root, as the
    negative root would imply sgn(β) = −sgn(μ₁₁). The population
    constants μ₁₁, μ₂₀ and μ₀₂ may be estimated from the sample moments
    M₁₁, M₂₀ and M₀₂ respectively.

    Let T be the estimate of θ. We thus have as estimates of β and α

        b = tan T  and  a = ȳ − b x̄.

    The standard deviations of these estimates are given by Kermack and
    Haldane [33]; they involve r, the sample correlation coefficient,
    given by

        r = M₁₁ / √(M₂₀ M₀₂).

    It was implicitly assumed for the above estimate that the error
    variances

  • were the same. In the next section we will also achieve this result
    when we assume λ = σ₂²/σ₁², known to be unity.

    The disadvantage with this estimate is that it is not invariant
    under changes of scale, although it is invariant under rotation. In
    practice the former is usually the more important. It was suggested
    by Jones [27] and Teissier [57] (see Kermack and Haldane [33]) that
    the coordinates be standardized to overcome this problem. Thus we
    transform

        x' = (x − x̄)/S_x  and  y' = (y − ȳ)/S_y,

    and hence we make μ₂₀ = μ₀₂ = 1 and tan θ (see equation (2.8))
    identically unity. In terms of the original data the line becomes

        (y − ȳ) = ± (S_y/S_x)(x − x̄),                             (2.9)

    the sign of the slope being that of μ₁₁. This line (2.9) is called
    "the reduced major axis" by Kermack and Haldane [33], who give the
    standard deviations of these estimates.
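    Both slope estimates of this section are short functions of the
    sample moments: the major axis root of (2.7) and the reduced major
    axis slope, whose magnitude is S_y/S_x with the sign of M₁₁. A
    sketch (Python; function names are mine):

```python
import math

def moments(xs, ys):
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    m20 = sum((x - xbar) ** 2 for x in xs) / n
    m02 = sum((y - ybar) ** 2 for y in ys) / n
    m11 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / n
    return m20, m02, m11

def major_axis_slope(xs, ys):
    # Adcock/Pearson major axis: positive root so that sgn(b) = sgn(m11).
    m20, m02, m11 = moments(xs, ys)
    return ((m02 - m20) + math.sqrt((m02 - m20) ** 2 + 4 * m11 ** 2)) / (2 * m11)

def reduced_major_axis_slope(xs, ys):
    # Kermack-Haldane reduced major axis: |b| = Sy/Sx, sign taken from m11.
    m20, m02, m11 = moments(xs, ys)
    return math.copysign(math.sqrt(m02 / m20), m11)
```

    On error-free data lying exactly on y = 2x + 1 both functions
    return the slope 2, a quick sanity check on the algebra; on noisy
    data the major axis changes if x or y is rescaled, while the
    reduced major axis does not.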

  • For illustration let us solve the least squares estimate when we
    assume λ = σ₂²/σ₁² is known. We will not assume λ = 1, but our
    solution below will be seen to be the same as eqn. (2.8) if we
    substitute λ = 1 into it.

    To allow for more generality we will allow the variance of
    yi − α − β xi to vary with i. In this case we minimize the sum

        S = Σ wi (yi − α − β xi)²,                                (2.10)

    where the wi are inversely proportional to the variance of
    yi − α − β xi given λ (Deming [16], cf. also Kummel [36]), the
    constant of proportionality being independent of i. We follow the
    method of Lindley [37] and minimize S in equation (2.10). Since λ
    is fixed,

        Var(yi − α − β xi) = σ₂² + β²σ₁² = σ₁²(λ + β²).

    Let wi = 1/(λ + β²) for convenience and minimize S with respect to
    β.

  • Setting this equal to zero and letting a = ȳ − β̂ x̄ gives

        β̂ = [(S_y² − λ S_x²) + √((S_y² − λ S_x²)² + 4λ S_xy²)] / (2 S_xy),

    where we have again taken the positive square root in order that β̂
    and Cov(x, y) have the same sign. It can be shown that this value
    β̂ does indeed correspond to a minimum value of S.

    In the terminology of equation (2.8) the above equation becomes

        β̂ = [(μ₀₂ − λ μ₂₀) + √((μ₀₂ − λ μ₂₀)² + 4λ μ₁₁²)] / (2 μ₁₁).

    This estimate of β is consistent, and in fact is the only
    consistent estimate given in this section, unless it is the case
    that λ = 1. It is interesting to note that, as early as 1879,
    Kummel [36] minimized a weighted sum of squares and achieved a
    result, the same as the equation above, which agreed with Adcock's
    estimate only when the error variances were equal. In spite of this
    there are many more least squares estimates in the literature which
    are not consistent and which ignore the error variances. Some of
    these are amazingly complex, an example being York [64], who has
    possibly the most difficult estimate to compute, relying as it does
    on iterative methods.

    For papers generalizing least squares to the multivariate cases I
    refer to Sprent [56] and Villegas [59, 60, 61]. These papers
    consider only consistent estimates.

2.3. Maximum Likelihood Estimation

In order to estimate the parameters in equations (2.2), the only information being that of equations (2.1), we have shown, via Reiersøl's theorems, that more information is required. The last three equations of (2.1) are the maximum likelihood solutions of the parameters on the left hand side of equations (2.2). This section will be devoted to maximum likelihood estimates of β where we are either provided with additional information on the error variances or are prepared to make certain assumptions regarding them.

2.3.1. Knowledge of one error variance

(A) σ₁² is known.

In this case σ_X² = σ_x² − σ₁² can be estimated and we may estimate β by

β̂ = s_xy / (s_x² − σ₁²).    (2.12)

(B) σ₂² is known.

The problem is symmetrical in σ₁² and σ₂² and hence we estimate

β̂ = (s_y² − σ₂²) / s_xy.    (2.13)

In the first case there is a positive probability that the known value of σ₁² could turn out larger than the estimate of σ_x², implying that σ_X² < 0, which is impossible. If this should happen the procedure does not give an estimate of β. Similarly in (B) it could happen that s_y² < σ₂², and again we cannot estimate β by this method. The probability of this happening in either case will tend to zero as the sample size increases.
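As a minimal numerical sketch of case (A) in modern notation (the function name and data below are illustrative, not from the thesis), the estimator β̂ = s_xy/(s_x² − σ₁²) can be coded so that it fails, as described above, whenever the known σ₁² exceeds s_x²:

```python
def beta_sigma1_known(x, y, sigma1_sq):
    """Estimate beta when the error variance of x (sigma_1^2) is known:
    beta_hat = s_xy / (s_x^2 - sigma_1^2).

    Returns None when s_x^2 <= sigma_1^2, in which case the implied
    sigma_X^2 would be negative and the method gives no estimate.
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    s_xx = sum((xi - xbar) ** 2 for xi in x) / n
    s_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / n
    if s_xx <= sigma1_sq:          # estimated sigma_X^2 would be <= 0
        return None                # the procedure gives no estimate
    return s_xy / (s_xx - sigma1_sq)
```

Case (B) is obtained by symmetry, exchanging the roles of x and y.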

2.3.2. Knowledge of the ratio λ = σ₂²/σ₁²

The inconsistency that may arise in section (2.3.1) cannot occur in this case; perhaps this is why this case is the most popular for study in the literature.

With λ known, equations (2.2) become

β² Cov(x, y) + β(λσ_x² − σ_y²) − λ Cov(x, y) = 0,

and thus we estimate

β̂ = [(σ_y² − λσ_x²) + √((σ_y² − λσ_x²)² + 4λ Cov(x, y)²)] / (2 Cov(x, y)).    (2.15)

As before we take the positive square root in order that β̂ will have the correct sign, that of Cov(x, y). Equation (2.15) can be seen to be the same as that achieved by using Lindley's weighted least-squares estimate of the preceding section.
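A sketch of estimate (2.15) in modern notation, using sample moments for the population quantities (the function name and data are illustrative; this is the estimator usually called Deming regression):

```python
import math

def beta_lambda_known(x, y, lam):
    """Estimate beta when the ratio lambda = sigma_2^2 / sigma_1^2 is
    known, by taking the root of
        beta^2 * s_xy + beta * (lam * s_xx - s_yy) - lam * s_xy = 0
    whose sign agrees with that of s_xy (the positive square root)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - xbar) ** 2 for xi in x) / n
    s_yy = sum((yi - ybar) ** 2 for yi in y) / n
    s_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / n
    d = s_yy - lam * s_xx
    # discriminant >= |d|, so the numerator has the sign of s_xy's divisor
    return (d + math.sqrt(d * d + 4.0 * lam * s_xy ** 2)) / (2.0 * s_xy)
```

Since the square root dominates |σ_y² − λσ_x²|, the returned value always carries the sign of s_xy, as required.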

  • 2 2 2.3.3. Both rS1 and O2 known -

    I n t h i s case we could use one of C2.12), (2.13) o r (2.15) t o

    give us an es t imate of 8. We could a l s o use (Madansky [39] ) t h e geometric

    mean of (2.13) and (2.141,

    where sgn (g) i s chosen t o be t h e same a s sgn (cov(x. y ) 1 . Thus we have four (usual ly d i f f e r e n t ) es t imates f o r 8 , a s r e s u l t which d i d not endear

    i t s e l f t o many people. This s i t u a t i o n the re fo re became known a s the

    "over ident i f ied" case, leading some ( e . g. Allen [ 31) t o recommend t h a t knowledge o f X i s t o be p re fe r red . Kendall _ and -- S t u a r t -_- [32] _ _ . say t h a t

    t h i s case is q u i t e unmanageable s i n c e we must ob ta in the maximum l ike l ihood

    by solv ing th ree equations f o r the two unknowns, 6 and a 2 X

Kiefer, in his review of this book, does not agree and mentions that the maximum likelihood equations for β, σ_X², α and μ can be solved by maximizing the likelihood with respect to the four parameters. Following this, Barnett [4] differentiates the log-likelihood with respect to α, β, μ and σ_X² and realizes the equations,

where again ± is used to denote sgn(Cov(x, y)). It is interesting to note that this estimate for β is the same as that when λ is known. The estimate of σ_X² is not the same as Cov(x, y)/β̂, which is the best we could achieve with just λ known.

In this case, with both σ₁² and σ₂² known, the same difficulties mentioned in section (2.3.1) may arise. If so, this method fails to provide an estimate.
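A sketch of a geometric-mean estimator of the kind Madansky describes, formed from the two single-known-variance estimates above (names and data illustrative; as in the text, the method fails when either variance correction goes negative):

```python
import math

def beta_geometric_mean(x, y, sigma1_sq, sigma2_sq):
    """Geometric mean of the two single-known-variance estimates of beta,
    signed to agree with Cov(x, y).  Returns None when s_x^2 <= sigma1_sq
    or s_y^2 <= sigma2_sq, in which case no estimate is available."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - xbar) ** 2 for xi in x) / n
    s_yy = sum((yi - ybar) ** 2 for yi in y) / n
    s_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / n
    if s_xx <= sigma1_sq or s_yy <= sigma2_sq:
        return None
    sign = 1.0 if s_xy >= 0 else -1.0
    # sqrt of the product of s_xy/(s_xx - sigma1_sq) and (s_yy - sigma2_sq)/s_xy
    return sign * math.sqrt((s_yy - sigma2_sq) / (s_xx - sigma1_sq))
```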

When both error variances are known Madansky [39] states that we may consider the case where Cov(ε, δ) ≠ 0 and that (2.16) is the maximum likelihood estimate of β. He says that we may estimate Cov(ε, δ) = ρσ₁σ₂. This is not correct (cf. Moran [41]), for only β² is identifiable. To see this, let ρ ≠ 0 and consider the last equation of (2.2), which now becomes

Cov(x, y) = βσ_X² + ρσ₁σ₂,

and hence it is no longer true that sgn(β) = sgn(Cov(x, y)).

If sgn(ρ) is known then the magnitude of ρ and β itself are identifiable; however, from equation (2.19) we see that sgn(ρ) is not necessarily that of Cov(x, y), and it is unlikely that sgn(ρ) will be forthcoming.

If we assume ρ ≠ 0 and that we know σ₁², we can identify only σ_X², while if only λ is known no parameters are identifiable. It is thus wise when planning an experiment to attempt to keep Cov(ε, δ) = 0 if we plan on using maximum likelihood estimation.

2.3.4. When α is known

If α is known then Y = α + βX passes through the point X = 0, Y = α, and by translation of the coordinate axes we may make the line pass through the origin. We may take the resulting estimate as a consistent estimate of β so long as μ ≠ 0. A test for μ = 0 should of course be made before estimates are taken on any of the parameters, for if μ = 0 they would not be consistent.

There is a danger that could arise in this situation, namely when we are only concerned with approximating the true relation, which may be non-linear, by a linear relation over a certain range. For example, we may know that the true relation passes through the origin while we are only concerned with the range a < X < b, a > 0. In this event the true relation may be anything but linear in the vicinity of the origin, and using α = 0 could seriously affect our results.

We shall defer to a later chapter the study of replication (another form of additional knowledge) using least squares and maximum likelihood methods.

In this chapter, and in most others, we give the numerical values of the various estimates, applied to the data in chapter 1, in tabular form. Unless comment seems to be required for our results we shall let the table stand on its own merits. We see from table 2.2 that the best estimates of this chapter gave rise to the best numerical values.

Table 2.2. EXAMPLE

EQUATION   METHOD OF ESTIMATION                             ESTIMATE OF β
2.3        Least squares, no error in X
2.8        Minimize normal distances
2.9        Standardized coordinates
2.12       Max. likelihood, σ₁² = .04 known
2.13       Max. likelihood, σ₂² = .0625 known
2.15       Max. likelihood, λ = 1.5625 known
2.16       Max. likelihood (I), σ₁² = .04, σ₂² = .0625
2.17       Max. likelihood (II), σ₁² = .04, σ₂² = .0625
2.20       α known, α = 2

CHAPTER 3

ESTIMATES DERIVED FROM GROUPING THE DATA

Grouping estimates are loosely based on the idea that to define a straight line only two points are required. We form two groups, take the means, (x̄₁, ȳ₁) and (x̄₂, ȳ₂), of each, and choose the line through these two points.

Some estimates require that some of the data be dropped, this data being a third group. To maintain consistency we shall denote our groups G₁, G₂ and G₃. The data to be dropped, if any, will comprise G₂; otherwise G₂ will be empty.

The theorems in the previous chapter tell us that if our parameters are not identifiable they remain so no matter how we may rearrange the data. With this in mind we now investigate Wald's method [62], which was the first published, although the paper by Nair and Shrivastava [43] may have been written concurrently or perhaps earlier.

Wald states that the line Y = α + βX can be estimated in certain cases from the observed values of x and y without knowledge of σ₁² and σ₂². The estimates are all consistent. These cases occur when the following four assumptions are satisfied, the fourth being known as "Wald's condition".

(1) The error terms ε₁, ε₂, ..., ε_n are independently and identically distributed with finite variance σ₁², as are the δ₁, δ₂, ..., δ_n with finite variance σ₂².

(2) E(εᵢδⱼ) = 0 for all i and for all j.

(3) There exists a single linear relation between the true values X and Y, i.e. Yᵢ = α + βXᵢ.

(4) lim inf (1/n)|(X₁ + ... + X_m) − (X_{m+1} + ... + X_n)| > 0,

where for convenience we let n = 2m be even. If the above four conditions are satisfied we will estimate β by b = b₂/b₁. Let

b₁ = (1/n)[(x_{m+1} + ... + x_n) − (x₁ + ... + x_m)]

and

b₂ = (1/n)[(y_{m+1} + ... + y_n) − (y₁ + ... + y_m)].    (3.1)

From assumption (3) we have that

(1/n)[(Y_{m+1} + ... + Y_n) − (Y₁ + ... + Y_m)] = β (1/n)[(X_{m+1} + ... + X_n) − (X₁ + ... + X_m)].

We now show that b = b₂/b₁ is a consistent estimate of β if

our four assumptions are satisfied. From Yᵢ = α + βXᵢ we have

b₂ = βb₁ + (1/n)[(δ_{m+1} + ... + δ_n) − (δ₁ + ... + δ_m)] − β(1/n)[(ε_{m+1} + ... + ε_n) − (ε₁ + ... + ε_m)].

The variance of (1/n)[(δ₁ + ... + δ_m) − (δ_{m+1} + ... + δ_n)] is (1/n²)(nσ₂²) = σ₂²/n, and the variance of (1/n)[(ε₁ + ... + ε_m) − (ε_{m+1} + ... + ε_n)] is σ₁²/n, so both of these quantities converge in probability to zero. Applying assumption (4) to both the numerator and denominator ensures that b converges in probability to β, i.e. b is consistent. Since b is consistent we have that a = ȳ − bx̄ is a consistent estimate of α.
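The mechanics of Wald's estimate can be sketched as follows in modern notation (function and data illustrative; recall that for consistency the grouping supplied must be independent of the errors):

```python
def wald_estimate(x, y):
    """Wald's grouping estimate of (alpha, beta).

    Splits the observations, in the order supplied, into a first half G1
    and a second half G3; the slope is the ratio of the differences of
    group means, and the line passes through (xbar, ybar).  The middle
    observation is dropped when n is odd.
    """
    n = len(x)
    m = n // 2
    x1, y1 = sum(x[:m]) / m, sum(y[:m]) / m            # means of G1
    x2, y2 = sum(x[n - m:]) / m, sum(y[n - m:]) / m    # means of G3
    b = (y2 - y1) / (x2 - x1)
    a = sum(y) / n - b * sum(x) / n                    # a = ybar - b * xbar
    return a, b
```

The ratio of group-mean differences equals b₂/b₁, since the common factors of 1/m cancel.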

We now turn to estimating σ₁² and σ₂². The equations (3.4) represent what would be sample estimates of σ_X², σ_Y² and Cov(X, Y), as given in equations (2.2), if we actually knew the true values Xᵢ and Yᵢ. Let s_x², s_y² and s_xy be defined as in equations (2.1). Then

Equations (3.5) may be proved as follows, recalling that the estimates (2.1) are in fact biased. From equations (2.2) we have equation (3.6); then, from (3.7) and (3.8), equation (3.6) reduces to the required form. This proves the first equation of (3.5), and the second equation may be shown in an exactly similar way. The third equation follows easily from assumption (2) and from the assumption of independence between the error terms and the true values.

From assumption (3) we know that

σ_Y² = β²σ_X²  and  Cov(X, Y) = βσ_X².    (3.9)

Thus, from the last of equations (3.5) and from (3.9),

σ̂_X² = s_xy/b,  σ̂_Y² = b·s_xy.    (3.10)

Substituting equations (3.10) into the first two of equations (3.5) gives

σ̂₁² = s_x² − s_xy/b,  σ̂₂² = s_y² − b·s_xy.    (3.11)

We have shown that b is a consistent estimate of β, and hence it is clear that the expressions

(n/(n−1))(s_x² − s_xy/b)  and  (n/(n−1))(s_y² − b·s_xy)    (3.12)

converge in probability to σ₁² and σ₂² respectively. Thus (3.12) are consistent estimates of σ₁² and σ₂²; because of the n/(n−1) adjustment they are also unbiased.
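The adjusted estimates (3.12) can be sketched as follows (a reconstruction in modern notation; the function name and data are illustrative, and the forms used are those the derivation above describes):

```python
def wald_error_variances(x, y, b):
    """Estimates of sigma_1^2 and sigma_2^2 given a consistent slope
    estimate b: the n/(n-1)-adjusted forms of equations (3.11)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - xbar) ** 2 for xi in x) / n
    s_yy = sum((yi - ybar) ** 2 for yi in y) / n
    s_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / n
    adj = n / (n - 1)
    sigma1_sq = adj * (s_xx - s_xy / b)   # error variance on x
    sigma2_sq = adj * (s_yy - b * s_xy)   # error variance on y
    return sigma1_sq, sigma2_sq
```

On error-free data both estimates are zero, as expected.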

Although our estimates are consistent if assumption (4) is satisfied, they may not be the most effective that we could derive. Our observations were divided into two groups, where (xᵢ, yᵢ) ∈ G₁ if i ≤ m and (xᵢ, yᵢ) ∈ G₃ if i > m. It is true that any other division of the observations will also give consistent estimates, so long as the grouping is performed independently of the errors, εᵢ and δᵢ, and so long as condition (4) remains satisfied. We consider now how we may improve our estimates.

We obtain a better estimate by finding that estimate which will give us the shortest confidence interval for β (Wald [62]). We will see in chapter 8 that the shortest confidence interval arises when |b₁| is a maximum.

    From equations (3.1) we see tha t 1 bll is maximized by ordering

  • the observations, renumbering where necessary, xl 5 x2 5 . . . 5 x . Thus n

    G1 i s the s e t { (xl. yl) , (x2, y2) , ..., (xmI ym) 3 and G3 the s e t {hrn+,t Y,,) I ..., (X n , y,)). Depending as it does on the values x 1' x 2' . . . , x it i s unlikely t ha t t h i s grouping w i l l be independent of the

    n ' e r ro r s El , E ~ , . . . , E . I f we knew the r e l a t i ve s i ze s of the t rue values

    n

    X1t X2 , ..., Xn (more on t h i s l a t e r ) we could order the t rue values X1 5

    X2 5 . . . C Xn, again renumbering where necessary. and l e t G = { (xi. yi) I 1 xi € bl. X 2 , . . . , xm}1 and Gg = (xi, yi) I Xi E { x ~ + ~ ~ XmtzI .. . I X n l l . This grouping is en t i r e ly independent of the e r rors .

    The two groupings, ordering x and ordering X I w i l l be i den t i ca l i n

    t he case where the range of E is the f i n i t e i n t e r v a l [-c, c] and a l l of

    the observed values xl, x2, ..., x f a l l outside of the i n t e r v a l [XI-c, n

    x 1 + c 1 , where x ' denotes the median of xl, x2, ..., x . In t h i s case we n

    may order the x ' s with confidence t h a t we have performed the grouping

    independently of the e r rors . I n p rac t ice we may order i n t h i s way i f 3 c >

    0 such t h a t P [ I E ~ 2 c] i s small and the number of x i n [xl-c, x'+cl i is a l so small.

Let b′ and b″ be the estimates of β obtained by ordering the x₁, ..., x_n and the X₁, ..., X_n respectively. We consider the case where b″ is not known, as may often be the case, and will now find upper and lower bounds for b″. If ε is normal, let v² be the sum of the squared residuals in the x-direction divided by the degrees of freedom. A good estimate of c will then be 3v, and the interval [−c, c] may be considered as a possible range for ε. If ε is not normal it may be wise to increase c to as much as c = 5v.

Let S be the set of all possible groupings which satisfy the following conditions, where x′ is again the median value of the x.

For each grouping g ∈ S calculate b, and let b* and b** be the minimum and maximum values obtained, respectively. Since the X-ordered grouping is in S, we therefore have b* and b** as lower and upper limits, respectively, of b″.

Wald gives a condition on the distribution of X, involving the range [−c, c] of ε, which, if satisfied, will imply that the expression in assumption (4) will not converge stochastically to zero.

If X does not have this property, as it obviously does not when X, ε and δ are normal, it may happen that for every grouping defined independently of the errors the expression in assumption (4) converges stochastically to zero. There is one case where the expression does not converge to zero, even though our variables are normally distributed, and that is where the order of the X's is known. We can never be sure of the order by merely looking at the data after the experiment, but sometimes in the laboratory we can set up the equipment so that the true X is, say, increased from observation to observation. If this is so, then the ordering given to the observed x's is merely the order of their occurrence. We see then that E(x₁ + x₂ + ... + x_m) < E(x_{m+1} + ... + x_n) and that b₁ will not tend to zero. Thus we achieve, perhaps not the most efficient estimate, but at least a consistent estimate. This is yet another verification of the truism that the time to begin the statistical analysis is before, not after, the experiment is performed.

We have dealt with Wald's method in some detail because it was the first of the grouping estimators and because it is a fairly simple and straightforward procedure. It is also quite commonly misunderstood and, to quote Moran [40], "caused a considerable amount of confusion in the literature...". The main difficulty, as might be expected, is dividing the observations so that the distributions of the errors are unaffected.

In the interests of increased efficiency Bartlett [6] divided the observations into three groups. This was also done by Nair and Shrivastava [43] (cf. also Nair and Banerjee [42]), but we will follow the outline of Bartlett. Consider the uniform model in which x = X is observed without error and spaced at equidistant unit intervals. In this case the ordinary least squares estimate

will provide an unbiased estimate of β with an error variance of σ₂²/Σ(xᵢ − x̄)². If we let n = 2k + 1, then it can be shown by induction that Σ(xᵢ − x̄)² = k(k + 1)(2k + 1)/3. The observations are split into three groups where the two end groups each have k elements, k as close to n/3 as possible. Bartlett then uses the estimate

b′ = (ȳ₃ − ȳ₁)/(x̄₃ − x̄₁)    (3.13)

for β. For locating the line, Bartlett has it pass through the overall mean (x̄, ȳ), while Nair and Shrivastava use (3.13) to locate the line as well as to estimate the slope.

Estimate (3.13) has an error variance from which the relative efficiency of b′ follows; this efficiency can be shown to be a maximum when k is approximately n/3, and it is smaller when k = n/2. Thus, by using three groups rather than two, we have increased the relative efficiency of our estimate. The increase in efficiency, as it turns out, is approximately twenty percent. Bartlett suggests that in general k = n/3 is to be preferred to k = n/2.
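Bartlett's three-group procedure can be sketched as follows (function and data illustrative; the grouping here orders on the observed x, so the caveats about ordering from the previous section apply):

```python
def bartlett_estimate(x, y, k=None):
    """Bartlett's three-group estimate of (alpha, beta): order on x, use
    the k lowest and k highest observations as the outer groups, take
    the slope between their means, and pass the line through the
    overall mean (xbar, ybar).  Default k is as close to n/3 as possible."""
    n = len(x)
    if k is None:
        k = round(n / 3)
    pairs = sorted(zip(x, y))                      # order on observed x
    lo, hi = pairs[:k], pairs[-k:]                 # outer groups G1 and G3
    x1, y1 = sum(p[0] for p in lo) / k, sum(p[1] for p in lo) / k
    x3, y3 = sum(p[0] for p in hi) / k, sum(p[1] for p in hi) / k
    b = (y3 - y1) / (x3 - x1)
    a = sum(y) / n - b * sum(x) / n                # line through (xbar, ybar)
    return a, b
```

With k = n/2 the middle group is empty and the procedure reduces to Wald's two-group estimate.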

Rather than just considering X uniformly distributed, Gibson and Jowett [23] went a step further and considered several other distributions. They found that the three-group method was "surprisingly efficient", but recommended the general use of the ratio 1:2:1 for dividing the observations rather than the 1:1:1 given by Bartlett and Nair and Banerjee. They do this since the normal distribution has the more common occurrence, the ratio 1:2:1 being optimum in this case and being fairly good in other cases, although it is not too good for extreme skewness. For those specific cases where the distribution of X is known, they give the following table of optimum ratios.

Table 3.1. OPTIMUM PROPORTIONS

DISTRIBUTION   PROPORTIONS        APPROXIMATE RATIOS
Normal         .27 : .46 : .27    1:2:1
Uniform        .33 : .33 : .33    1:1:1
Bell Shape     .31 : .38 : .31    3:4:3
U-Shape        .39 : .22 : .39    2:1:2
J-Shape        .45 : .40 : .15    3:3:1
Skew           .36 : .45 : .19    4:5:2

The final grouping estimates that we consider in this chapter are those due to Neyman and Scott [45]. We will briefly review two methods of estimation given by them, with necessary and sufficient conditions for their consistency. For both of these methods they admit that consistent estimates of β will be achieved in "very exceptional cases only". In both methods we do not necessarily assume that the errors are uncorrelated.

For the first method, fix two numbers a and b such that P(x ≤ a) > 0 and P(x > b) > 0, and let G₁ and G₂ be the groups of observations with xᵢ ≤ a and xᵢ > b respectively. Let Z_j and W_j be the mean values of the xᵢ and yᵢ, respectively, in group G_j for j = 1, 2. An estimate for β is then

b₁ = (W₂ − W₁)/(Z₂ − Z₁).

By the law of large numbers, W_j and Z_j converge in probability to E(W_j) and E(Z_j) respectively; thus the stochastic limit of b₁ is (EW₂ − EW₁)/(EZ₂ − EZ₁). The authors consider conditions for E(W₂ − W₁) = β E(Z₂ − Z₁). Let (c, d) be the shortest interval such that P(c ≤ ε ≤ d) = 1. Since E(ε) = 0 it is clear that c ≤ 0 ≤ d, and they show that necessary and sufficient conditions for the consistency of b₁ are

The second method involves fixing two proportions p₁ and p₂, with p₁ > 0, p₂ > 0 and p₁ + p₂ ≤ 1. Let Z₃ and W₃ denote the means of the xᵢ and yᵢ respectively for i = 1, 2, ..., [np₁] = r, and let Z₄ and W₄ denote the means of the xᵢ and yᵢ respectively for i = n − s + 1, n − s + 2, ..., n, where s = [np₂]. The estimate for β is then

b₂ = (W₄ − W₃)/(Z₄ − Z₃).

Neyman and Scott show that if X_{p₁} and X_{1−p₂} are points such that P(x ≤ X_{p₁}) = p₁ and P(x ≥ X_{1−p₂}) = p₂, and if (c, d) has the same meaning as above, then a necessary and sufficient condition for the consistency of b₂ is

We will also consider grouping methods when we have replicated observations and when we use the analysis of variance. These, however, will be considered in later chapters.

The most important applications of grouping methods occur when we have some extra knowledge on the position of the true values. For example, if the order of the X's is known, or if we have knowledge that our X's were achieved from two (or from k) processes (cf. Madansky [39] on this point), we may form two (or k) groups with each x in its appropriate group and be assured that, so long as ε is independent of the processes, we have grouped independently of the errors and that Wald's condition (assumption (4)) is satisfied. This does not contradict the Reiersøl theorems, for we have additional information at our disposal which can be used to give us a consistent estimate of β.

3.2. Example

The table below, giving most of the estimates of this chapter, is self-explanatory. We do not give values for the Neyman and Scott estimates because for any values of a, b or p₁, p₂ we might reasonably choose, we would get results similar to the others listed here.

Table 3.2. EXAMPLE

METHOD OF ESTIMATION                              ESTIMATE OF β
Wald's Method       : Unordered Data              1.94
                    : x-ordered Data              1.95
                    : X-ordered Data              1.97
Bartlett's Method   : Unordered Data              1.8
                    : x-ordered Data              1.97
                    : X-ordered Data              1.98
Optimum Proportion  : .27 : .46 : .27             1.88
                    : x-ordered Data              1.98
                    : X-ordered Data              1.99

CHAPTER 4

INSTRUMENTAL VARIABLES

In this chapter we shall consider the use of additional knowledge in the form of instrumental variables. These instrumental variables form at least one set of extra data, highly correlated with X but independent of ε and δ. It is not too difficult to find variables correlated with X and Y, the so-called "investigational variables", yet it may prove difficult to have them independent of the errors.

A further problem that could arise is when the investigational and instrumental variables are so highly correlated that perhaps the instrumental variables should have been added to the relation as a third dimension. Madansky [39] gives an example of this with Y and X being, respectively, the price and the quantity available of butter. The relation is Y = α + βX, with Z₁, the price of margarine, an instrumental variable. He points out that the true relation may perhaps have been better expressed as Y = α + βX + γZ₁.

    Instrumental variables were developed independently by Geary [ 221

    and by Reiersdl [48, 491. We consider f i r s t the simplest case where we have -- _ _ - w- - -- - but one instrumental variable.

    4.1. One Instrumental Variable observed without e r r o r

    Let Zi, i = 1 2, . . . n be a s e t of variables correla ted with and Yi but independent of E and di. We may assume t h a t Zi is

    i

    observed without e r ro r , for if there is an e r ro r qi , we may replace 'i

  • i n what follows by Zi + Tli (Moran [41]) . In order t h a t the notation be -- -- -

    a l i t t l e simpler we consider the homogeneous r e l a t i on and a prime on a

    - variable w i l l denote measurement about i t s mean, e .g. X! = Xi - X. Thus

    1

    our re la t ion becomes Y' = B xl or

    where B = -B1/B2.

    I f we multiply (4.1) by Z f /n and sum over i we r ea l i ze

    Consider a l so the analogous expression involving the observed variables

    It is c l ea r t ha t

    and

    Thus E ( A ) = E ( B ) = 0 s ince we assume independence between Z and the i

    e r r o r terms. Var (A) and Var (B) a re both O C ' / ~ ) and hence A and B

    both converge t o zero i n probabi l i ty . Thus i r respec t ive of the ac tua l

    l imit ing values of C yf Z f , C Y; Z f , C x! Zf and C X I Zf we see tha t 1 1

    expression (4.3) converges i n probabi l i ty t o zero and therefore

  • J I

    Cx! Z !

    converges i n p r o b a b i l i t y t o B . Thus

    i s a c o n s i s t e n t es t imate of B = -6 /B s o long a s our assumption 1 2

    Cov(z, x) # 0 i s s a t i s f i e d .
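The instrumental-variable estimate can be sketched as follows (function and data illustrative):

```python
def iv_estimate(x, y, z):
    """Instrumental-variable estimate b = sum(y'z') / sum(x'z'), primes
    denoting deviations from the mean.  Consistent so long as Z is
    independent of the errors and Cov(z, x) != 0."""
    n = len(x)
    xbar, ybar, zbar = sum(x) / n, sum(y) / n, sum(z) / n
    num = sum((yi - ybar) * (zi - zbar) for yi, zi in zip(y, z))
    den = sum((xi - xbar) * (zi - zbar) for xi, zi in zip(x, z))
    return num / den
```

With z taken equal to x this reduces to the ordinary least squares slope, which is inconsistent here precisely because x is not independent of the errors.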

Let us now compute an asymptotic variance for b for the case where the line goes through the origin. Since X and Z are correlated we may let

E(XᵢZᵢ) = γ,

where γ is a constant which we can estimate since Z is observed without error. Consider

b − β = Σ(δᵢ − βεᵢ)Zᵢ / ΣxᵢZᵢ.

From the law of large numbers we have that (1/n)ΣxᵢZᵢ converges in probability to γ, and the distribution of n^{−1/2}Σ(δᵢ − βεᵢ)Zᵢ converges by the Central Limit Theorem to a normal distribution with mean zero and finite variance. We can estimate σ_Z² since Z is observed without error and, by a well-known theorem, the distribution of (4.7) converges to a normal distribution with mean zero. Thus the asymptotic variance of b is

Var(b) ≈ (β²σ₁² + σ₂²)σ_Z² / (nγ²).

4.2. Two Instrumental Variables observed with error

Reiersøl [49] considers the case where we have two sets Z₁ and Z₂ of instrumental variables related by γ₁Z₁ + γ₂Z₂ = 0, where γ₁ and γ₂ are known constants. We assume Z₁ and Z₂ are observed with error, our observations being z₁ᵢ = Z₁ᵢ + W₁ᵢ and z₂ᵢ = Z₂ᵢ + W₂ᵢ, where the random variables W₁ᵢ and W₂ᵢ have finite variances and means of zero. We assume that all errors are independent of all true values and of each other. The observations can then be represented by the quadruples (xᵢ, yᵢ, z₁ᵢ, z₂ᵢ), and we estimate β by equation (4.10).

Let us show that b is a consistent estimate of β. Rewrite b as in (4.11). If we take the expected value of the denominator of (4.11) and apply the law of large numbers, we get (4.12), since we have assumed independence of errors. Note that Z₂ = −(γ₁/γ₂)Z₁; thus the denominator converges in probability to its limit. Applying the same technique to the numerator of (4.10), we see from equations (4.10), (4.12) and (4.13) that b converges in probability to β, so long as Cov(Z₁, X) ≠ 0. Thus b is a consistent estimate of β, so long as Cov(Z₁, X) ≠ 0.

For an example of the use of this type of instrumental variable, the reader is referred to the paper by Carlson, Sobel and Watson, which is concerned with estimation in a biochemical situation.

Durbin [20] considers the one-instrumental-variable-without-error case for various choices of Z. He shows that if Zᵢ = ±1 according to whether xᵢ is greater or smaller than x′, the median of the xᵢ's, the method reduces to Wald's method. If we put the xᵢ's into one of three groups and let Zᵢ = −1, 0 or 1 according to the group in which xᵢ was placed, we have Bartlett's method.

He also considers ordering the xᵢ and letting Zᵢ = i, the position of xᵢ. This is a better instrumental variable if the order of the x's is that of the X's, but Durbin shows that it will still be a good choice even if the ε's are relatively large, in which case he says that the bias should be less than that for the original variables. If we can set up our experiment such that our true values are increasing, then this will be a perfect instrumental variable. In this case we do not reorder the x but let Zᵢ be the order of occurrence.
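Durbin's first observation can be checked directly: with Zᵢ = ±1 about the median, the instrumental-variable formula collapses to Wald's two-group slope (a numerical sketch; data illustrative):

```python
def iv_estimate(x, y, z):
    """b = sum(y'z') / sum(x'z'), primes denoting mean deviations."""
    n = len(x)
    xbar, ybar, zbar = sum(x) / n, sum(y) / n, sum(z) / n
    num = sum((yi - ybar) * (zi - zbar) for yi, zi in zip(y, z))
    den = sum((xi - xbar) * (zi - zbar) for xi, zi in zip(x, z))
    return num / den

def wald_slope(x, y):
    """Two-group slope: ratio of differences of half-sample sums."""
    m = len(x) // 2
    return (sum(y[m:]) - sum(y[:m])) / (sum(x[m:]) - sum(x[:m]))

# Z_i = -1 below the median of x, +1 above: Durbin's choice that
# reproduces Wald's method (n even, x already sorted here).
x = [1.0, 2.0, 4.0, 7.0]
y = [2.9, 5.1, 9.0, 15.2]
z = [-1, -1, 1, 1]
assert abs(iv_estimate(x, y, z) - wald_slope(x, y)) < 1e-9
```

The equality holds because, with z balanced at ±1, both numerator and denominator of the instrumental-variable formula reduce to the difference of half-sample sums.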

  • 45

There is a further study on instrumental variables due to Tukey [58], which we will consider in chapter 7, on the Analysis of Variance.

4.3. Example

The concept of using two instrumental variables is not too useful in practice, and since reference has been made to an example using this case, we shall apply our example only to the case of one instrumental variable observed without error.

Estimates using Wald's and Bartlett's methods have already been given and it would be pointless to reproduce them. We consider Durbin's method of ordering the x's and letting Zᵢ = i. For comparison we will also order the X's. The results are

Zᵢ = i, order x : b = 1.977
Zᵢ = i, order X : b = 1.983.

CHAPTER 5

CONTROLLING THE OBSERVATIONS

In many experiments, especially under laboratory conditions, we are not so concerned with measuring X as with actually achieving X. For example, in an experiment comparing reaction rates of some chemicals with respect to temperature, it is not often that we measure the rates and see what the temperature happened to be at the time. Rather, we would pick certain temperatures, say 15°C, 20°C, etc., and measure the corresponding rates. This is an example of what Berkson [7] would call a "controlled experiment"; X, the temperature, being the controlled variable.

In the literature the model obtained when one of the variables is controlled is often referred to as the "Berkson Model", since Berkson was the originator of the idea. This model makes a pleasant change from the usual errors-in-variables model since, if the conditions are satisfied, both α and β are identifiable, with β estimated by the "usual" least squares estimate

    b = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ (xᵢ − x̄)² .                    (5.1)

It is also the case, as Berkson claims, that b is consistent. Unfortunately he does not show this too well and is criticized and disputed by Kendall [30]. Lindley [38], however, gives a mathematical justification of the consistency of b in this model and, following his method, we shall show that b is consistent. An almost identical solution is given by

Scheffé [52], who also considers the idea of making several runs or replications on the relation. He allows for the possibility of a different line yⱼ = aⱼ + bⱼx being the true line for each run, due to circumstances which we may or may not be able to control during the J (j = 1, 2, ..., J) runs. We let aⱼ and bⱼ be random variables with E(aⱼ) = α and E(bⱼ) = β for all j, and test hypotheses which essentially state that the lines are, in fact, the same. This will be deferred to the chapter on confidence intervals and tests of hypotheses.

In the example at the beginning of this chapter the thermometer reads the value we require, x. However, we know that error exists, so that the true temperature is X while our observation is x. We know that x is not a random variable, since it was preselected; ε, of course, is a random variable defined as before, and thus X itself is a random variable and we have X = x + ε. Similar as this model may appear to the errors-in-variables model, there are in fact some rather large differences. One difference is that we not only drop the assumption of independence between X and the error in the observation, but can actually point out that their correlation coefficient is −1 (the error x − X equals −ε, which is perfectly negatively correlated with X). It is clear that in this model the words "structural" and "functional" have no relevance at all. Our observations on Y are not controlled, so y = Y + δ is the same as before. So

    y = α + βx + η ,                                             (5.2)

where η ≡ (βε + δ) is a normally distributed random variable with zero expectation and finite variance σ₃² = β²σ₁² + σ₂². Equation (5.2) should look familiar, for it represents the classical least squares situation, with y a random variable, x a fixed or mathematical variable, and η a normally distributed random variable with mean zero and finite variance. As shown in chapter 1, we minimize S = Σᵢ (yᵢ − a − b xᵢ)² in the vertical direction and obtain equation (5.1) as our consistent estimate of β. We minimize in the direction of the uncontrolled observations; thus if we had controlled y rather than x, we would have minimized S in the horizontal direction.
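The consistency claim can be checked numerically. The sketch below uses illustrative values (α = 1, β = 2, σ₁ = σ₂ = 1; not the thesis's example): x is controlled at a few preset levels, the achieved value is X = x + ε, and the line is fitted by ordinary least squares as in (5.1).

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, beta = 1.0, 2.0                          # assumed true line
x = np.tile(np.arange(15.0, 40.0, 5.0), 400)    # controlled settings (e.g. temperatures)
eps = rng.normal(0.0, 1.0, x.size)              # the dial is off by eps: X = x + eps
delta = rng.normal(0.0, 1.0, x.size)            # measurement error on Y
X = x + eps
y = alpha + beta * X + delta                    # = alpha + beta*x + (beta*eps + delta)

# ordinary least squares of y on the controlled x, as in (5.1)
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
print(round(b, 3), round(a, 3))
```

Although X is never observed, the slope converges to β because the composite error βε + δ is independent of the preselected x.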

There is a further interesting point about this model. Let the line achieved by controlling x and minimizing in the y-direction be y = a₁ + b₁x. This value of y is an unbiased and consistent estimate of α + βx for given x. Let the line achieved by controlling y and minimizing in the x-direction be x = a₂ + b₂y; a₂ and b₂ will have expectations of −α/β and 1/β for given y. In fact this second line will be equal to the first except for possible sampling differences which converge to zero as sample size increases. This fact, impressive in itself, led Berkson to claim that with his model there would be only one regression line. In general it is known that the regression problem is not invertible (see Madansky [39] for a discussion on this point) and that the two lines will be different. If we control x then the regression of y on x gives the correct solution; however, the regression of x on y will not, and it may not even be linear (Lindley [38]). Thus for x controlled there are two regressions, and similarly for y controlled. This point was somewhat ambiguous in Berkson's paper.

The variance of b as given by Berkson and verified by Scheffé [52] is, if x is controlled,

    Var(b) = σ₃² / Σᵢ (xᵢ − x̄)² ,

which we may estimate by replacing σ₃² with the residual mean square Σᵢ (yᵢ − a − b xᵢ)²/(n − 2) (Scheffé [52]). If we control y, the variance of b₂ is the analogous expression with the roles of x and y interchanged.

The real value of this model is not that in some cases we can use least squares and achieve consistent estimates, but that we can apply the model before we begin the experiment. If it is at all possible to control the observations on one of the variables, then we should do so, for then no problems with estimation would arise.

All the tests of hypotheses and confidence intervals that can be derived in the classical regression case are formally the same in the Berkson model. Although the theory is different, the actual results will be the same for both models. This is another advantage of the Berkson model.

Our example is not considered for the Berkson model, for the data, being drawn at random, is not applicable. We could, as some do, force the data to the model and "see what happens"; however, we would only be repeating the least squares solution for no errors in X of chapter 2.

CHAPTER 6

    CUMULANTS

The idea of using cumulants and moments to estimate β has been considered by several authors (c.f. [21], [28], and others). We investigate this method with particular reference to Geary [21].

For convenience let the relation be written as

    β₁X + β₂Y + β₀ = 0 ,                                         (6.1)

where β = −β₁/β₂. Let

    Z = X − E(X) ,   W = Y − E(Y) .                              (6.2)

Thus β₁Z + β₂W = 0.

If M(h) denotes the moment generating function of a random variable and L(h) the cumulant generating function, then L(h) = log M(h), or

    M(h) = exp L(h) = exp [ Σ_{i≥1} κᵢ hⁱ / i! ] .

We define the cumulants R(c₁, c₂) of order c₁ + c₂ in terms of the moments of the (xᵢ, yᵢ) with the corresponding bivariate identity in (h₁, h₂):

    exp [ Σ R(c₁, c₂) h₁^{c₁} h₂^{c₂} / (c₁! c₂!) ] = E exp(h₁x + h₂y) .

Since

    x − E(x) = Z + ε   and   y − E(y) = W + δ ,                  (6.3)

and a well known theorem states that the moment generating function (m.g.f.) of a sum of independent random variables is the product of the m.g.f.'s, we have, Z and ε and W and δ being independent,

    L_{x,y}(h₁, h₂) = L_{Z,W}(h₁, h₂) + L_ε(h₁) + L_δ(h₂) .      (6.4)

Let L(c₁, c₂) denote the (c₁ + c₂)-order cumulant of (Z, W) and let c₁, c₂ ≠ 0. Geary shows that, from (6.3) and (6.4),

    L(c₁, c₂) = R(c₁, c₂) .

A fundamental property of cumulants is that they are invariant under change of origin if the order is at least two (one, in the univariate case). Thus there will be no difficulty in computation even though we performed the transformations in equations (6.2). In fact it was this property of cumulants that led to their original name of "semi-invariants" (Kendall and Stuart [31]).

Theorem 6.1.

    β₁ L(c₁ + 1, c₂) + β₂ L(c₁, c₂ + 1) = 0 .

To prove this let γ = −β₂/β₁ and rewrite the relation β₁Z + β₂W = 0 as Z = γW; then

    E[exp(h₁Z + h₂W)] = E[exp((h₁γ + h₂)W)]                      (6.6)

is an identity. Therefore

    Σ_{c₁,c₂} L(c₁, c₂) h₁^{c₁} h₂^{c₂} / (c₁! c₂!) = Σ_{d≥1} L(d) (h₁γ + h₂)^d / d!    (6.7)

is immediate from equations (6.3) and (6.6), where L(d) denotes the d-th order cumulant of W. Consider equation (6.7) and identify the coefficients of

    h₁^{c₁+1} h₂^{c₂}

on both sides of the equation. Clearly the coefficient on the left hand side is L(c₁+1, c₂)/((c₁+1)! c₂!). For the right hand side let d = c₁ + c₂ + 1 and consider the expansion of (h₁γ + h₂)^d; the required term is

    [L(d)/d!] · (d! / ((c₁+1)! c₂!)) · γ^{c₁+1} h₁^{c₁+1} h₂^{c₂} ,

and our required coefficient is

    γ^{c₁+1} L(d) / ((c₁+1)! c₂!) ,

so that L(c₁+1, c₂) = γ^{c₁+1} L(d).

Let us now identify the coefficients of h₁^{c₁} h₂^{c₂+1} on both sides of equation (6.7). On the left hand side the coefficient is clearly L(c₁, c₂+1)/(c₁! (c₂+1)!); for the right hand side, with the same d = c₁ + c₂ + 1, the corresponding term of the expansion of (h₁γ + h₂)^d yields the coefficient

    γ^{c₁} L(d) / (c₁! (c₂+1)!) ,

so that L(c₁, c₂+1) = γ^{c₁} L(d). Therefore

    β₁ L(c₁+1, c₂) + β₂ L(c₁, c₂+1) = γ^{c₁} L(d) (β₁γ + β₂) = 0 ,

since β₁γ + β₂ = 0, which proves Theorem 6.1.

Therefore

    β = −β₁/β₂ = L(c₁, c₂ + 1) / L(c₁ + 1, c₂) ,                 (6.10)

and β may be estimated by the corresponding ratio of sample cumulants,

    b = R(c₁, c₂ + 1) / R(c₁ + 1, c₂) .                          (6.11)

Often a "k" is used in place of the "R", i.e.

    b = k(c₁, c₂ + 1) / k(c₁ + 1, c₂) ,                          (6.12)

in which case the statistics in (6.12) are known as "k-statistics" (Kaplan [28]).

For convenience we now give the values of up to fourth order cumulants in terms of the moments measured about the means:

    L(1, 0) = M(1, 0) = 0 ,   L(0, 1) = M(0, 1) = 0
    L(2, 0) = M(2, 0) ,   L(1, 1) = M(1, 1) ,   L(0, 2) = M(0, 2)
    L(3, 0) = M(3, 0) ,   L(2, 1) = M(2, 1) ,   L(1, 2) = M(1, 2) ,   L(0, 3) = M(0, 3)
    L(4, 0) = M(4, 0) − 3 M(2, 0)²
    L(3, 1) = M(3, 1) − 3 M(2, 0) M(1, 1)
    L(2, 2) = M(2, 2) − M(2, 0) M(0, 2) − 2 M(1, 1)² .           (6.13)

The sample k-statistics are unbiased and consistent estimates of the population cumulants; thus the quotient (6.11) converges in probability to β (6.10), so long as R(c₁ + 1, c₂) does not tend to zero, i.e. L(c₁ + 1, c₂) ≠ 0. Thus so long as L(c₁ + 1, c₂) ≠ 0 we have an infinity of consistent estimates of β.
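A minimal numerical sketch of the cumulant estimate with c₁ = c₂ = 1, so that b = k(1, 2)/k(2, 1); third-order cumulants equal third central moments, which makes the estimate easy to compute. The skewed distribution of X and the line are invented for the illustration. The method needs L(2, 1) = β κ₃(X) ≠ 0, hence the deliberately non-normal X.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
X = rng.exponential(1.0, n)          # skewed: third-order cumulant of X is nonzero
y = 1.0 + 2.0 * X + rng.normal(0.0, 0.5, n)   # assumed line, beta = 2
x = X + rng.normal(0.0, 0.5, n)      # normal errors: their third cumulants vanish

dx, dy = x - x.mean(), y - y.mean()
# b = k(1,2)/k(2,1): ratio of third-order mixed sample cumulants (= central moments)
b = np.sum(dx * dy**2) / np.sum(dx**2 * dy)
print(round(b, 3))
```

Repeating this with a normal X would leave both numerator and denominator fluctuating around zero, which is exactly the failure discussed below.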

In our basic structural model we are concerned with normal distributions which, as is well known, have vanishing cumulants if the order is three or higher. Since c₁ + c₂ + 1 is always at least three, we thus know that the cumulants will vanish and that, in the normal case, the use of cumulants is of no value whatsoever.

In the normal case, then, cumulants will not be used for estimation; unfortunately they are not much better for the non-normal cases. When we compute moments (and therefore cumulants, see equations (6.13)) of order higher than four, we are almost wasting our efforts unless we have a very large number of observations, for the inaccuracy becomes quite unmanageable (Kendall and Stuart [31]).

It must also be remembered that for symmetric distributions all odd order cumulants are zero, and all in all it must be stated that estimation via cumulants will generally be unsatisfactory.

We do not estimate β for our example via the results of this chapter. Any results that we could get would not have any value, since X is normally distributed and this is precisely the case where we do not use cumulants.

CHAPTER 7

    THE ANALYSIS OF VARIANCE

    7.1. Repl ica t ion of the observations

We have seen that when we know σ₁², σ₂², or λ = σ₂²σ₁⁻², no problem arises in estimating the linear relation. We consider now the idea of being able to estimate at least one of these parameters from another type of additional knowledge: replication.

Replication has not been considered much in the literature. Dorff and Gurland [17] feel that this may be because those actually involved in experiments do not themselves replicate their observations. Why this should be so is not too clear, since Villegas [59] points out that there is no difficulty in identification, nor in achieving consistent estimates of the parameters, when replication is available.

Following the solution of Hays [24] we shall derive, by way of the analysis of variance, three estimates due to Tukey [58], along with a fourth estimate given by Dorff and Gurland [17], who also derive the first three but with the assumption that Cov(ε, δ) = 0. This assumption we will drop, and give all four estimates for the case that ε and δ may or may not be independent. Although this chapter is mainly concerned with the Analysis of Variance, this section will also consider other estimates for the case of replication.

It is convenient to change the notation somewhat and say that we have n "treatments" with Nᵢ, i = 1, 2, ..., n, observations on each treatment, the true values being (Xᵢ, Yᵢ). Thus if (X₁, Y₁), (X₂, Y₂), ..., (Xₙ, Yₙ) are the true values of our n treatments, then the observations will be

    x_ij = Xᵢ + ε_ij ,   y_ij = Yᵢ + δ_ij ,   j = 1, 2, ..., Nᵢ .

To keep generality we do not assume that all the Nᵢ are equal, although we will assume that Nᵢ ≥ 2 for at least one i. This assures us that we do indeed have replicated observations.

Define aᵢ as the effect of treatment i, so that aᵢ = Xᵢ − μ; then aᵢ is a normally distributed random variable with mean zero and variance σ_X². Define the following quantities:

    Mₑ = (1/N) Σᵢ Σⱼ ε_ij ,   where N = Σᵢ Nᵢ ,
    Mₐ = (1/N) Σᵢ Nᵢ aᵢ .

Let SST denote the total sum of squares,

    SST = Σᵢ Σⱼ (x_ij − x̄)² .

  • 5 8

Thus the SST can be broken down into two components: the variation due to both treatments and chance, Σᵢ Nᵢ(x̄ᵢ − x̄)², which we call the sum of squares between (SSB), and the variation due to chance alone, which we call the sum of squares within (SSW). Let us consider SSB. We show first that

    SSB = Σᵢ Nᵢ (aᵢ + M_{εᵢ} − Mₐ − Mₑ)² ,

where M_{εᵢ} = (1/Nᵢ) Σⱼ ε_ij. Since x_ij = μ + aᵢ + ε_ij we have

    x̄ᵢ = μ + aᵢ + M_{εᵢ}

and

    x̄ = μ + Mₐ + Mₑ ,

    ∴ SSB = Σᵢ Nᵢ (μ + aᵢ + M_{εᵢ} − μ − Mₐ − Mₑ)² = Σᵢ Nᵢ (aᵢ + M_{εᵢ} − Mₐ − Mₑ)² ,

since the μ's cancel.

  • 59

We define the mean square between (MSB) as SSB divided by its degrees of freedom; there are n − 1 d.f. for SSB. The expected value of the mean square between (EMSB) is then obtained as follows. When the Nᵢ are all equal to m for all i, then

    E[Σᵢ Nᵢ (aᵢ − Mₐ)²] = m(n − 1) σ_X²                          (7.3)

from the definitions of aᵢ and Mₐ. When the Nᵢ are not all equal, Snedecor and Cochran [54] give

    E[Σᵢ Nᵢ (aᵢ − Mₐ)²] = (N − Σᵢ Nᵢ²/N) σ_X² .                  (7.4)

Since the expressions in (7.4) and (7.3) are identical when the Nᵢ are all equal, we shall use (7.4) in either case; thus we maintain the generality of having Nᵢ not constant. It may also be shown that

    E[Σᵢ Nᵢ (M_{εᵢ} − Mₑ)²] = (n − 1) σ₁² .                       (7.5)

From equations (7.4) and (7.5) we have that

    EMSB = σ₁² + [(N² − Σᵢ Nᵢ²) / (N(n − 1))] σ_X² .             (7.6)

We will denote this expression B_xx.

The mean square within (MSW) is defined as the SSW divided by degrees of freedom. There are N − n degrees of freedom for SSW.
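The decomposition SST = SSB + SSW, and the claim (shown next) that MSW estimates σ₁², can be checked on simulated replicated data; all parameter values below are invented for the illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 30                            # number of treatments
Ni = rng.integers(2, 6, n)        # unequal replication counts, each >= 2
sigma1 = 0.7                      # assumed s.d. of the error eps on x
Xi = rng.normal(10.0, 2.0, n)     # true treatment values X_i

x = [Xi[i] + rng.normal(0.0, sigma1, Ni[i]) for i in range(n)]
allx = np.concatenate(x)
N = Ni.sum()

SST = np.sum((allx - allx.mean()) ** 2)                              # total
SSB = sum(Ni[i] * (x[i].mean() - allx.mean()) ** 2 for i in range(n))  # between
SSW = sum(np.sum((x[i] - x[i].mean()) ** 2) for i in range(n))         # within
MSW = SSW / (N - n)               # should be close to sigma1**2 = 0.49

print(np.isclose(SST, SSB + SSW), round(MSW, 3))
```

The decomposition holds exactly (it is an algebraic identity), while MSW fluctuates around σ₁² with N − n degrees of freedom.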

To evaluate the expected mean square within (EMSW), let i be fixed and let x_{i1}, x_{i2}, ..., x_{iNᵢ} be the replicated observations on Xᵢ. If we only consider treatment i we may estimate Xᵢ by x̄ᵢ and σ₁² by

    (1/(Nᵢ − 1)) Σⱼ (x_ij − x̄ᵢ)² .

Using all of the observations, then, we can estimate σ₁² by Σᵢ Σⱼ (x_ij − x̄ᵢ)² divided by its degrees of freedom; clearly we have Σᵢ (Nᵢ − 1) = N − n d.f. Therefore we estimate σ₁² by

    (1/(N − n)) Σᵢ Σⱼ (x_ij − x̄ᵢ)² .

This expression is the same as (7.7); thus MSW estimates σ₁² and EMSW = σ₁². We call this quantity W_xx.

In an analogous manner we may calculate the variation due to the y's (B_yy and W_yy) and that due to both x and y (B_xy and W_xy). We tabulate these results in table 7.1.

Table 7.1. ANOVAR TABLE FOR REPLICATION

    SOURCE OF VARIATION          MEAN SQUARE    EXPECTED MEAN SQUARE
    Between treatments (x)       B_xx           σ₁² + c σ_X²
    Within treatments (x)        W_xx           σ₁²
    Between treatments (y)       B_yy           σ₂² + c β² σ_X²
    Within treatments (y)        W_yy           σ₂²
    Between treatments (x, y)    B_xy           Cov(ε, δ) + c β σ_X²
    Within treatments (x, y)     W_xy           Cov(ε, δ)

    where c = (N² − Σᵢ Nᵢ²) / (N(n − 1)) .

On inspection of the expected mean squares we find three estimates of β, all converging in probability to β as n → ∞ and Nᵢ → ∞ for at least one i, so long as the respective denominators do not converge to zero. Thus the following three estimates are all consistent:

    b₁ = (B_xy − W_xy) / (B_xx − W_xx) ,
    b₂ = (B_yy − W_yy) / (B_xy − W_xy) ,
    b₃ = ± [ (B_yy − W_yy) / (B_xx − W_xx) ]^{1/2} .

We do not necessarily assume Cov(ε, δ) = 0; if Cov(ε, δ) ≠ 0, however, then (sgn β) is not necessarily that of Cov(x, y) (see chapter 2, section 2.3.3). We may obtain (sgn β) from Cov(x, y) = β σ_X² + Cov(ε, δ), using W_xy, or from a plotting of the observations.
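These mean-square estimates can be illustrated numerically. The sketch below assumes equal replication (Nᵢ = M), invents all parameter values, and uses the differences B − W of the between and within mean squares (and cross-products) to isolate the treatment components; it deliberately uses correlated errors, the case the thesis allows.

```python
import numpy as np

rng = np.random.default_rng(5)
n, M = 60, 4                    # n treatments, M replicates each (assumed)
beta, alpha = 2.0, 1.0
Xi = rng.normal(0.0, 2.0, n)    # treatment effects: variance sigma_X^2 = 4
Yi = alpha + beta * Xi

cov = np.array([[0.5, 0.3], [0.3, 0.5]])          # Cov(eps, delta) = 0.3 != 0
E = rng.multivariate_normal([0.0, 0.0], cov, size=(n, M))
x = Xi[:, None] + E[:, :, 0]
y = Yi[:, None] + E[:, :, 1]

def B_W(u, v):
    """Between / within mean cross-products for rows = treatments."""
    ub, vb = u.mean(axis=1), v.mean(axis=1)
    B = M * np.sum((ub - ub.mean()) * (vb - vb.mean())) / (n - 1)
    W = np.sum((u - ub[:, None]) * (v - vb[:, None])) / (n * (M - 1))
    return B, W

Bxx, Wxx = B_W(x, x)
Byy, Wyy = B_W(y, y)
Bxy, Wxy = B_W(x, y)

b1 = (Bxy - Wxy) / (Bxx - Wxx)
b2 = (Byy - Wyy) / (Bxy - Wxy)
b3 = np.sign(Bxy - Wxy) * np.sqrt((Byy - Wyy) / (Bxx - Wxx))
print(round(b1, 2), round(b2, 2), round(b3, 2))
```

The sign attached to b₃ here is taken from B_xy − W_xy, which estimates β σ_X² times a positive constant, in line with the sign discussion above.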

We may also use table 7.1 for the functional case if we modify the table slightly. In the functional case the aᵢ are not random variables, but we can interpret them as "fixed effects". The quantity σ_X² then has no meaning; however, we may use the table by retaining Σᵢ Nᵢ (aᵢ − Mₐ)² as in (7.2) and substituting this directly in the table for (N − Σᵢ Nᵢ²/N) σ_X². The balance of this section is devoted to other estimates of β for the case of replication.

With replication we have an immediate estimate of λ = σ₂²/σ₁², and we can use the maximum likelihood estimate of β with λ known. This estimate is (Dorff and Gurland [17])

    b₄ = [ (B_yy − W_yy) − λ̂(B_xx − W_xx)
           + { ((B_yy − W_yy) − λ̂(B_xx − W_xx))² + 4 λ̂ (B_xy − W_xy)² }^{1/2} ]
         / [ 2 (B_xy − W_xy) ] ,

where

    λ̂ = W_yy / W_xx .

It is clear that b₄ is consistent so long as B_xy ≠ 0. If B_xy = 0 then we may take b₄ = 0 consistently, unless B_yy − λB_xx = 0 also, in which event b₄ is indeterminate.

The estimate b₄ is not really a maximum likelihood estimate of β, for we do not know λ; we only have an estimate of λ. For the replicated case, Villegas [59] gives the maximum likelihood solution for β, and we shall outline his method. Assume that ε and δ are not independent, an assumption that we usually like to make and are able to make in the replicated case.

These are vector quantities, hence the underlining. We let the number of replications be the same for each treatment, say Nᵢ = M. Let Σ be the (unknown) variance-covariance matrix of the errors e_ij. The probability density for one z_i (ignoring the constant) is

and therefore the likelihood function is the product of these densities over the n treatments.

We know from table 7.1 that Σ can be estimated; let this estimate be S. Villegas shows that α̂ and β̂