[Lecture Notes in Statistics 53] Relations, Bounds and Approximations for Order Statistics
CHAPTER 5
ORDER STATISTICS FROM A SAMPLE CONTAINING A SINGLE OUTLIER
5.0. Introduction
Density functions and joint density functions of order statistics arising from a sample containing a
single outlier have been given by Shu (1978) and David and Shu (1978), and have been made use of by
David et al. (1977) in tabulating means, variances and covariances of order statistics from a normal sample
comprising one outlier. One may also refer to Vaughan and Venables (1972) for more general expressions
of distributions of order statistics when the sample, in fact, includes k outliers. The importance of a
systematic study of the order statistics from an outlier model and the usefulness of the tables of means,
variances and covariances of these order statistics in the context of robustness has been well demonstrated
by several authors including Andrews et al. (1972), David and Shu (1978), David (1981), and Tiku et al.
(1986).
In this Chapter, we first derive several recurrence relations and identities satisfied by the single and
the product moments of order statistics. These relations are then applied in order to determine the maximum number of single and double integrals to be evaluated for the calculation of the first two single
moments and the product moments of order statistics in a sample of size n, assuming these quantities for
all sample sizes less than n to be known. All these results are presented in Sections 2 through 4. Results
of similar nature for the symmetric outlier case, i.e., the case when all the n observations including the
outlying observation have distributions symmetric about zero, are presented in Section 5. In Section 6, we
derive some relations between the moments of order statistics from a symmetric outlier model and the
moments of order statistics from a single outlier model obtained by folding both the distributions at zero
and also discuss the cumulative rounding error committed by using these recurrence relations. These
results have all been established recently in a series of papers by Balakrishnan (1987b, 1988a,b,c) and they
generalize the results presented in Chapter 2. The functional behaviour of order statistics from location
and scale-outlier models is discussed in detail in Section 7. Finally, in Section 8 we make use of these
results to study the bias and mean square error of various omnibus robust estimators of the location parameter in the normal case. Work of this nature has also been carried out earlier by Crow and Siddiqui
(1967), Gastwirth and Cohen (1970), Andrews et al. (1972), and Tiku (1980). By considering a single
outlier exponential model, Kale and Sinha (1971) and Joshi (1972) have made investigations regarding the
mean square error of a class of estimators of the mean of the exponential distribution; see also Barnett and
Lewis (1978) and Lawless (1982) for further details on this topic.
Throughout this Chapter results are derived under the assumption that the distributions under discussion are absolutely continuous. Since any distribution function can be represented as a weak limit of a
sequence of absolutely continuous distributions, all results which do not explicitly refer to densities continue to hold for general (not necessarily absolutely continuous) distributions. See, for example, Exercises
17-19.
B. C. Arnold et al., Relations, Bounds and Approximations for Order Statistics
© Springer-Verlag Berlin Heidelberg 1989
5.1. Distributions of order statistics
We shall derive here the distributions of order statistics obtained from a sample of size n when an
unidentified single outlier is present in the sample. For convenience, let us represent the sample by n
independent absolutely continuous random variables X_r (r = 1,2,...,n-1) and Y, such that X_r has pdf f(x)
and cdf F(x), and Y has pdf g(x) and cdf G(x). Further, let
Z_{1:n} \le Z_{2:n} \le ... \le Z_{n:n}   (5.1)
be the order statistics obtained by arranging the n independent observations in increasing order of
magnitude.
The cdf of Z_{n:n}, denoted by H_{n:n}(x), may now be obtained as
H_{n:n}(x) = Pr{all of X_1, X_2, ..., X_{n-1}, Y \le x}
= {F(x)}^{n-1} G(x).   (5.2)
Similarly, the cdf of Z_{i:n} (1 \le i \le n-1) may be obtained as
H_{i:n}(x) = Pr{at least i of X_1, X_2, ..., X_{n-1}, Y \le x}
= Pr{exactly i-1 of X_1, X_2, ..., X_{n-1} \le x and Y \le x}
+ Pr{at least i of X_1, X_2, ..., X_{n-1} \le x}
= \binom{n-1}{i-1} {F(x)}^{i-1} {1-F(x)}^{n-i} G(x) + F_{i:n-1}(x),   (5.3)
where F_{i:n-1}(x) is the cdf of the i'th order statistic in a sample of size n-1 drawn from a population with
pdf f(x) and cdf F(x). Differentiating the expressions in (5.2) and (5.3), we obtain the density function of
Z_{i:n} (1 \le i \le n) as
h_{i:n}(x) = \frac{(n-1)!}{(i-2)!(n-i)!} {F(x)}^{i-2} {1-F(x)}^{n-i} G(x) f(x)
+ \frac{(n-1)!}{(i-1)!(n-i)!} {F(x)}^{i-1} {1-F(x)}^{n-i} g(x)
+ \frac{(n-1)!}{(i-1)!(n-i-1)!} {F(x)}^{i-1} {1-F(x)}^{n-i-1} {1-G(x)} f(x),   -\infty < x < \infty,   (5.4)
where the first term drops out if i = 1, and the last term if i = n.
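The three-term density (5.4) is easy to check numerically. The following sketch is a minimal illustration under an assumed model not taken from the text — F uniform on (0,1) and a single Beta(2,1) outlier with G(x) = x^2 — and verifies that each h_{i:n} integrates to one.

```python
from math import factorial

# Assumed illustrative model (not from the text): n-1 observations from
# F(x) = x on (0,1) and one outlier from G(x) = x^2 (a Beta(2,1) variable).
F, f = (lambda x: x), (lambda x: 1.0)
G, g = (lambda x: x * x), (lambda x: 2.0 * x)

def h(i, n, x):
    """Three-term density (5.4) of Z_{i:n} in the single-outlier model."""
    t = 0.0
    if i >= 2:  # first term drops out when i = 1
        t += (factorial(n - 1) / (factorial(i - 2) * factorial(n - i))
              * F(x) ** (i - 2) * (1 - F(x)) ** (n - i) * G(x) * f(x))
    t += (factorial(n - 1) / (factorial(i - 1) * factorial(n - i))
          * F(x) ** (i - 1) * (1 - F(x)) ** (n - i) * g(x))
    if i <= n - 1:  # last term drops out when i = n
        t += (factorial(n - 1) / (factorial(i - 1) * factorial(n - i - 1))
              * F(x) ** (i - 1) * (1 - F(x)) ** (n - i - 1) * (1 - G(x)) * f(x))
    return t

def simpson(fun, a, b, m=2000):
    """Composite Simpson rule with m (even) subintervals."""
    step = (b - a) / m
    s = fun(a) + fun(b)
    for k in range(1, m):
        s += (4 if k % 2 else 2) * fun(a + k * step)
    return s * step / 3.0

n = 5
for i in range(1, n + 1):
    mass = simpson(lambda x: h(i, n, x), 0.0, 1.0)
    print(f"h_{i}:{n} integrates to {mass:.8f}")
```

The same construction extends directly to the joint density (5.6) below.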
In view of the importance of this result we will also give an alternative proof which lends itself to
further extensions. The event x < Z_{i:n} \le x+\Delta x may be seen as follows:
      i-1                 1                 n-i
  -\infty --------- x   x+\Delta x --------- \infty
Z_r \le x for i-1 of the Z_r, x < Z_r \le x+\Delta x for exactly one Z_r, and Z_r > x+\Delta x for the remaining n-i of the Z_r.
Realizing now that the outlying observation Y could belong to any one of the three parcels, we have the
following three possibilities:
(i) X_r \le x for i-2 of the X_r and Y \le x, x < X_r \le x+\Delta x for exactly one X_r, and X_r > x+\Delta x for the remaining n-i of the X_r, with a probability
\frac{(n-1)!}{(i-2)! 1! (n-i)!} {F(x)}^{i-2} G(x) {F(x+\Delta x) - F(x)} {1-F(x+\Delta x)}^{n-i};
(ii) X_r \le x for i-1 of the X_r, x < Y \le x+\Delta x, and X_r > x+\Delta x for the remaining n-i of the X_r, with a probability
\frac{(n-1)!}{(i-1)! 0! (n-i)!} {F(x)}^{i-1} {G(x+\Delta x) - G(x)} {1-F(x+\Delta x)}^{n-i};
and
(iii) X_r \le x for i-1 of the X_r, x < X_r \le x+\Delta x for exactly one X_r, X_r > x+\Delta x for the remaining n-i-1
of the X_r, and Y > x+\Delta x, with a probability
\frac{(n-1)!}{(i-1)! 1! (n-i-1)!} {F(x)}^{i-1} {F(x+\Delta x) - F(x)} {1-F(x+\Delta x)}^{n-i-1} {1-G(x+\Delta x)}.
Regarding \Delta x as small, we could write
Pr(x < Z_{i:n} \le x+\Delta x)
= \frac{(n-1)!}{(i-2)!(n-i)!} {F(x)}^{i-2} G(x) {1-F(x+\Delta x)}^{n-i} f(x) \Delta x
+ \frac{(n-1)!}{(i-1)!(n-i)!} {F(x)}^{i-1} {1-F(x+\Delta x)}^{n-i} g(x) \Delta x
+ \frac{(n-1)!}{(i-1)!(n-i-1)!} {F(x)}^{i-1} {1-F(x+\Delta x)}^{n-i-1} {1-G(x+\Delta x)} f(x) \Delta x + O[(\Delta x)^2],   (5.5)
where O[(\Delta x)^2] denotes the probability of more than one Z_r falling in the interval (x, x+\Delta x) and hence is a
term of order (\Delta x)^2. Dividing both sides of (5.5) by \Delta x and letting \Delta x \to 0, we once again obtain the
density function of Z_{i:n} as in equation (5.4).
Proceeding exactly on similar lines, by noting that the joint event x < Z_{i:n} \le x+\Delta x, y < Z_{j:n} \le y+\Delta y
is obtained by the configuration (neglecting terms of lower order of probability)
      i-1            1           j-i-1            1            n-j
  -\infty ----- x   x+\Delta x ----- y   y+\Delta y ----- \infty
we obtain the joint density function of Z_{i:n} and Z_{j:n} (1 \le i < j \le n) as
h_{i,j:n}(x,y) = \frac{(n-1)!}{(i-2)!(j-i-1)!(n-j)!} {F(x)}^{i-2} {F(y)-F(x)}^{j-i-1} {1-F(y)}^{n-j} G(x) f(x) f(y)
+ \frac{(n-1)!}{(i-1)!(j-i-1)!(n-j)!} {F(x)}^{i-1} {F(y)-F(x)}^{j-i-1} {1-F(y)}^{n-j} g(x) f(y)
+ \frac{(n-1)!}{(i-1)!(j-i-2)!(n-j)!} {F(x)}^{i-1} {F(y)-F(x)}^{j-i-2} {1-F(y)}^{n-j} {G(y)-G(x)} f(x) f(y)
+ \frac{(n-1)!}{(i-1)!(j-i-1)!(n-j)!} {F(x)}^{i-1} {F(y)-F(x)}^{j-i-1} {1-F(y)}^{n-j} f(x) g(y)
+ \frac{(n-1)!}{(i-1)!(j-i-1)!(n-j-1)!} {F(x)}^{i-1} {F(y)-F(x)}^{j-i-1} {1-F(y)}^{n-j-1} {1-G(y)} f(x) f(y),
-\infty < x < y < \infty,   (5.6)
where the first term drops out if i = 1, the last term if j = n, and the middle term if j = i+1. Note that the
densities in equations (5.4) and (5.6) are special cases of a very general result of Vaughan and Venables
(1972). They have essentially expressed the joint density function of k order statistics arising from n
absolutely continuous populations in the form of a permanent.
5.2. Relations for single moments
With the density function of Z_{i:n} as in equation (5.4), we have the single moment \mu_{i:n}^{(k)} (1 \le i \le n,
k \ge 1) as
\mu_{i:n}^{(k)} = \int_{-\infty}^{\infty} x^k h_{i:n}(x) dx.   (5.7)
Further, let f_{i:n}(x) be the pdf of the i'th order statistic in a sample of size n drawn from a continuous
population with pdf f(x) and cdf F(x), and let the single moments of order statistics for this case be denoted by
\nu_{i:n}^{(k)} = \int_{-\infty}^{\infty} x^k f_{i:n}(x) dx
= {B(i, n-i+1)}^{-1} \int_{-\infty}^{\infty} x^k {F(x)}^{i-1} {1-F(x)}^{n-i} f(x) dx,   (5.8)
for 1 \le i \le n and k = 1,2,.... Then these single moments of order statistics, viz., \mu_{i:n}^{(k)} and \nu_{i:n}^{(k)}, satisfy the
following recurrence relations and identities for arbitrary continuous distributions F and G. As mentioned
by David et al. (1977), these results also provide useful checks in assessing the accuracy of the computation
of moments of order statistics from a single outlier model.
Relation 5.1: For n \ge 2 and k = 1,2,...,
\sum_{i=1}^{n} \mu_{i:n}^{(k)} = (n-1) \nu_{1:1}^{(k)} + \mu_{1:1}^{(k)}.
Proof. By considering the identity
\sum_{i=1}^{n} Z_{i:n}^{k} = \sum_{i=1}^{n-1} X_i^{k} + Y^{k}
and taking expectations on both sides, we immediately obtain the required relation.
Setting k = 1 and 2, in particular, we derive the results
\sum_{i=1}^{n} \mu_{i:n} = (n-1) \nu_{1:1} + \mu_{1:1}
and
\sum_{i=1}^{n} \mu_{i:n}^{(2)} = (n-1) \nu_{1:1}^{(2)} + \mu_{1:1}^{(2)},   (5.9)
which have been made use of by David et al. (1977).
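Relation 5.1 lends itself to a direct numerical check. The sketch below assumes, purely for illustration, the uniform-plus-Beta(2,1) outlier model F(x) = x, G(x) = x^2 on (0,1), for which \nu_{1:1}^{(k)} = 1/(k+1) and \mu_{1:1}^{(k)} = 2/(k+2) in closed form; the identity then holds to quadrature accuracy for several n and k.

```python
from math import factorial

# Assumed illustrative model: F(x) = x on (0,1), outlier G(x) = x^2.
F, f = (lambda x: x), (lambda x: 1.0)
G, g = (lambda x: x * x), (lambda x: 2.0 * x)

def h(i, n, x):
    """Density (5.4) of Z_{i:n}."""
    t = 0.0
    if i >= 2:
        t += (factorial(n - 1) / (factorial(i - 2) * factorial(n - i))
              * F(x) ** (i - 2) * (1 - F(x)) ** (n - i) * G(x) * f(x))
    t += (factorial(n - 1) / (factorial(i - 1) * factorial(n - i))
          * F(x) ** (i - 1) * (1 - F(x)) ** (n - i) * g(x))
    if i <= n - 1:
        t += (factorial(n - 1) / (factorial(i - 1) * factorial(n - i - 1))
              * F(x) ** (i - 1) * (1 - F(x)) ** (n - i - 1) * (1 - G(x)) * f(x))
    return t

def simpson(fun, a, b, m=2000):
    """Composite Simpson rule with m (even) subintervals."""
    step = (b - a) / m
    s = fun(a) + fun(b)
    for k in range(1, m):
        s += (4 if k % 2 else 2) * fun(a + k * step)
    return s * step / 3.0

def mu(i, n, k):
    """Single moment (5.7) by quadrature."""
    return simpson(lambda x: x ** k * h(i, n, x), 0.0, 1.0)

for n in (3, 4, 5):
    for k in (1, 2):
        lhs = sum(mu(i, n, k) for i in range(1, n + 1))
        rhs = (n - 1) / (k + 1) + 2.0 / (k + 2)  # (n-1) nu_{1:1}^{(k)} + mu_{1:1}^{(k)}
        print(f"n={n}, k={k}: sum of moments = {lhs:.8f}, identity value = {rhs:.8f}")
```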
Relation 5.2: For 1 \le i \le n-1 and k = 1,2,...,
i \mu_{i+1:n}^{(k)} + (n-i) \mu_{i:n}^{(k)} = (n-1) \mu_{i:n-1}^{(k)} + \nu_{i:n-1}^{(k)}.   (5.10)
Proof. First, consider the expression of i \mu_{i+1:n}^{(k)} from equations (5.4) and (5.7) and split the first term into
two by writing the multiple i as (i-1) + 1. Next, consider the expression of (n-i) \mu_{i:n}^{(k)} and split the last
term into two by writing the multiple n-i as (n-i-1) + 1. Finally, adding the two expressions and simplifying the resulting equation, we obtain the relation (5.10).
Relation 5.2 has been derived by Balakrishnan (1988a) and it is easy to see that we just require the
value of the k'th moment of a single order statistic in a sample of size n, as Relation 5.2 would enable us to
compute the k'th moment of the remaining n-1 order statistics, using of course the moments \mu^{(k)} and \nu^{(k)}
in samples of size less than n. In particular, setting n = 2m and i = m in equation (5.10), we obtain the
following relation.
Relation 5.3: For even values of n, say n = 2m, and k \ge 1,
\frac{1}{2} {\mu_{m+1:2m}^{(k)} + \mu_{m:2m}^{(k)}} = [1 - \frac{1}{2m}] \mu_{m:2m-1}^{(k)} + \frac{1}{2m} \nu_{m:2m-1}^{(k)}
= \mu_{m:2m-1}^{(k)} + \frac{1}{2m} [\nu_{m:2m-1}^{(k)} - \mu_{m:2m-1}^{(k)}].   (5.11)
For k = 1, in particular, we have the expected value of the median in a sample of even size (n = 2m)
on the LHS of (5.11), while on the RHS we have the expected value of the median in a sample of odd size
(n = 2m-1) and an additive biasing factor involving the difference of the expected values of the medians in
a sample of size 2m-1 from the outlier and the non-outlier models.
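The recurrence (5.10), and hence Relation 5.3, can likewise be verified numerically. In the sketch below (same assumed uniform-plus-Beta(2,1) model as above; for the uniform non-outlier population the closed form \nu_{i:m}^{(k)} = m!(k+i-1)!/[(i-1)!(m+k)!] is used), both sides of Relation 5.2 agree to quadrature accuracy, as do both sides of Relation 5.3 for n = 4, m = 2.

```python
from math import factorial

# Assumed illustrative model: F(x) = x on (0,1), outlier G(x) = x^2.
F, f = (lambda x: x), (lambda x: 1.0)
G, g = (lambda x: x * x), (lambda x: 2.0 * x)

def h(i, n, x):
    """Density (5.4) of Z_{i:n}."""
    t = 0.0
    if i >= 2:
        t += (factorial(n - 1) / (factorial(i - 2) * factorial(n - i))
              * F(x) ** (i - 2) * (1 - F(x)) ** (n - i) * G(x) * f(x))
    t += (factorial(n - 1) / (factorial(i - 1) * factorial(n - i))
          * F(x) ** (i - 1) * (1 - F(x)) ** (n - i) * g(x))
    if i <= n - 1:
        t += (factorial(n - 1) / (factorial(i - 1) * factorial(n - i - 1))
              * F(x) ** (i - 1) * (1 - F(x)) ** (n - i - 1) * (1 - G(x)) * f(x))
    return t

def simpson(fun, a, b, m=2000):
    step = (b - a) / m
    s = fun(a) + fun(b)
    for k in range(1, m):
        s += (4 if k % 2 else 2) * fun(a + k * step)
    return s * step / 3.0

def mu(i, n, k):
    return simpson(lambda x: x ** k * h(i, n, x), 0.0, 1.0)

def nu(i, m, k):
    """Exact uniform moment: nu_{i:m}^{(k)} = m!(k+i-1)!/[(i-1)!(m+k)!]."""
    return factorial(m) * factorial(k + i - 1) / (factorial(i - 1) * factorial(m + k))

n = 4
for k in (1, 2):
    for i in range(1, n):
        lhs = i * mu(i + 1, n, k) + (n - i) * mu(i, n, k)       # LHS of (5.10)
        rhs = (n - 1) * mu(i, n - 1, k) + nu(i, n - 1, k)       # RHS of (5.10)
        print(f"(5.10) n={n}, i={i}, k={k}: {lhs:.8f} = {rhs:.8f}")
    # Relation 5.3 with m = n/2 = 2:
    lhs3 = 0.5 * (mu(3, 4, k) + mu(2, 4, k))
    rhs3 = mu(2, 3, k) + (nu(2, 3, k) - mu(2, 3, k)) / 4.0
    print(f"(5.11) k={k}: {lhs3:.8f} = {rhs3:.8f}")
```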
Relation 5.4: For 1 \le i \le n-1 and k = 1,2,...,
\mu_{i:n}^{(k)} = \sum_{j=i}^{n} (-1)^{j-i} \binom{n-1}{j-1} \binom{j-1}{i-1} \mu_{j:j}^{(k)} + \nu_{i:n-1}^{(k)}.   (5.12)
Proof. This relation is simply obtained by considering the expression of \mu_{i:n}^{(k)} from equations (5.4) and
(5.7), upon expanding the term {1-F(x)}^m, (m = n-i, n-i-1), binomially in powers of F(x), and finally
simplifying the resulting expression by making use of equation (5.7) and Relation 2.6.
Relation 5.5: For 2 \le i \le n and k = 1,2,...,
\mu_{i:n}^{(k)} = \sum_{j=n-i+1}^{n} (-1)^{j-n+i-1} \binom{n-1}{j-1} \binom{j-1}{n-i} \mu_{1:j}^{(k)} + \nu_{i-1:n-1}^{(k)}.   (5.13)
Proof. First, consider the expression of \mu_{i:n}^{(k)} from equations (5.4) and (5.7). Next, write the term {F(x)}^m
as [1-{1-F(x)}]^m, (m = i-2, i-1), and expand it binomially in powers of {1-F(x)}. Relation (5.13) now
follows immediately upon simplifying the resulting expression by making use of equation (5.7) and
Relation 2.8.
Remark 5.6: Relations 5.4 and 5.5 have both been derived by Balakrishnan (1988a) and they are quite
important as they usefully express the k'th moment of the i'th order statistic from a sample of size n in
terms of the k'th moment of the largest and the smallest order statistics in samples of size n and less,
respectively. In addition, they also involve the k'th moment of the i'th and (i-1)'th order statistics in a
sample of size n-1 from the non-outlier model. In any case, we could note once again from these two
relations that we just require the value of the k'th moment of a single order statistic (either the smallest or
the largest) in a sample of size n in order to compute the k'th moment of the remaining n-1 order statistics,
given the moments \mu^{(k)} and \nu^{(k)} in samples of size less than n. Note that this conforms with the comment
made earlier based on Relation 5.2. This is quite expected as both Relations 5.4 and 5.5 could be derived
by repeated application of Relation 5.2. However, we need to be careful in employing these two recurrence relations in the computational algorithm as increasing values of n result in large combinatorial terms
and hence in an error of large magnitude.
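Relations 5.4 and 5.5 can be illustrated in the same assumed uniform-plus-Beta(2,1) model: each \mu_{i:n}^{(k)} is recovered from moments of sample maxima (respectively minima) in samples of size n and less, together with one non-outlier moment.

```python
from math import comb, factorial

# Assumed illustrative model: F(x) = x on (0,1), outlier G(x) = x^2.
F, f = (lambda x: x), (lambda x: 1.0)
G, g = (lambda x: x * x), (lambda x: 2.0 * x)

def h(i, n, x):
    """Density (5.4) of Z_{i:n}."""
    t = 0.0
    if i >= 2:
        t += (factorial(n - 1) / (factorial(i - 2) * factorial(n - i))
              * F(x) ** (i - 2) * (1 - F(x)) ** (n - i) * G(x) * f(x))
    t += (factorial(n - 1) / (factorial(i - 1) * factorial(n - i))
          * F(x) ** (i - 1) * (1 - F(x)) ** (n - i) * g(x))
    if i <= n - 1:
        t += (factorial(n - 1) / (factorial(i - 1) * factorial(n - i - 1))
              * F(x) ** (i - 1) * (1 - F(x)) ** (n - i - 1) * (1 - G(x)) * f(x))
    return t

def simpson(fun, a, b, m=2000):
    step = (b - a) / m
    s = fun(a) + fun(b)
    for k in range(1, m):
        s += (4 if k % 2 else 2) * fun(a + k * step)
    return s * step / 3.0

def mu(i, n, k):
    return simpson(lambda x: x ** k * h(i, n, x), 0.0, 1.0)

def nu(i, m, k):
    """Exact uniform moment nu_{i:m}^{(k)}."""
    return factorial(m) * factorial(k + i - 1) / (factorial(i - 1) * factorial(m + k))

n = 4
for k in (1, 2):
    for i in range(1, n):       # Relation 5.4, via moments of maxima mu_{j:j}
        rhs = sum((-1) ** (j - i) * comb(n - 1, j - 1) * comb(j - 1, i - 1) * mu(j, j, k)
                  for j in range(i, n + 1)) + nu(i, n - 1, k)
        print(f"(5.12) i={i}, k={k}: {mu(i, n, k):.8f} = {rhs:.8f}")
    for i in range(2, n + 1):   # Relation 5.5, via moments of minima mu_{1:j}
        rhs = sum((-1) ** (j - n + i - 1) * comb(n - 1, j - 1) * comb(j - 1, n - i) * mu(1, j, k)
                  for j in range(n - i + 1, n + 1)) + nu(i - 1, n - 1, k)
        print(f"(5.13) i={i}, k={k}: {mu(i, n, k):.8f} = {rhs:.8f}")
```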
5.3. Relations for product moments
With the joint density function of Z_{i:n} and Z_{j:n} as in equation (5.6), we have the product moment
\mu_{i,j:n} (1 \le i < j \le n) as
\mu_{i,j:n} = \int\int_{W_1} xy h_{i,j:n}(x,y) dy dx   (5.14)
= \int\int_{W_2} xy h_{i,j:n}(y,x) dy dx,   (5.15)
where W_1 = {(x,y) : -\infty < x < y < \infty} and W_2 = {(x,y) : -\infty < y < x < \infty}. Further, let f_{i,j:n}(x,y) be the
joint pdf of the i'th and j'th order statistics in a sample of size n drawn from a continuous population with
pdf f(x) and cdf F(x), and let the product moments of order statistics for this case be denoted by
\nu_{i,j:n} = \int\int_{W_1} xy f_{i,j:n}(x,y) dy dx   (5.16)
= \int\int_{W_2} xy f_{i,j:n}(y,x) dy dx,   (5.17)
for 1 \le i < j \le n. Then these product moments of order statistics, viz., \mu_{i,j:n} and \nu_{i,j:n}, satisfy the following recurrence relations and identities for any arbitrary continuous distributions F and G. In addition to
providing straightforward generalizations of the results given in Chapter 2 and some simple checks for
assessing the accuracy of the computation of product moments, these results could also be effectively
applied to reduce considerably the amount of numerical computation involved in the evaluation of means,
variances and covariances of order statistics from an outlier model, at least for small sample sizes.
Relation 5.7: For n \ge 2,
\sum_{i=1}^{n} \sum_{j=1}^{n} \mu_{i,j:n} = (n-1) \nu_{1:1}^{(2)} + \mu_{1:1}^{(2)} + (n-1)(n-2) \nu_{1:1}^2 + 2(n-1) \nu_{1:1} \mu_{1:1},   (5.18)
where, for i = j, \mu_{i,i:n} is to be interpreted as \mu_{i:n}^{(2)}.
Proof. The above relation follows directly by considering the identity
\sum_{i=1}^{n} \sum_{j=1}^{n} Z_{i:n} Z_{j:n} = \sum_{i=1}^{n} \sum_{j=1}^{n} Z_i Z_j = \sum_{i=1}^{n} Z_i^2 + \sum_{i=1}^{n} \sum_{j \ne i} Z_i Z_j
and taking expectation on both sides.
Relation 5.8: For n \ge 2,
\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \mu_{i,j:n} = \binom{n-1}{2} \nu_{1:1}^2 + (n-1) \nu_{1:1} \mu_{1:1}.   (5.19)
Proof. This is simply derived from Relation 5.7 upon using the result that
\sum_{i=1}^{n} \mu_{i:n}^{(2)} = (n-1) \nu_{1:1}^{(2)} + \mu_{1:1}^{(2)}
obtained from equation (5.9).
Relations 5.7 and 5.8 are very simple to use and, hence, as pointed out by David et al. (1977), could
be effectively applied to check the accuracy of the computation of the product moments.
Relation 5.9: For 2 \le i < j \le n,
(i-1) \mu_{i,j:n} + (j-i) \mu_{i-1,j:n} + (n-j+1) \mu_{i-1,j-1:n} = (n-1) \mu_{i-1,j-1:n-1} + \nu_{i-1,j-1:n-1}.   (5.20)
Proof. First, consider the expression of the LHS of (5.20) obtained from equation (5.14). Now split the
first term in (i-1) \mu_{i,j:n} into two by writing the multiple i-1 as (i-2)+1; split the middle term in (j-i)
\mu_{i-1,j:n} into two by writing the multiple j-i as (j-i-1)+1; similarly, split the last term in (n-j+1) \mu_{i-1,j-1:n}
into two by writing the multiple n-j+1 as (n-j)+1. Finally, adding all these and simplifying the resulting
expression by making use of equations (5.14) and (5.16), we obtain the RHS of (5.20).
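Relation 5.9 involves genuinely two-dimensional quantities, but it too can be checked by direct numerical integration of the joint density (5.6). The sketch below uses the same assumed uniform-plus-Beta(2,1) model; for the uniform population the closed form \nu_{i,j:m} = i(j+1)/[(m+1)(m+2)] is used.

```python
from math import factorial

# Assumed illustrative model: F(x) = x on (0,1), outlier G(x) = x^2.
F, f = (lambda x: x), (lambda x: 1.0)
G, g = (lambda x: x * x), (lambda x: 2.0 * x)

def simpson(fun, a, b, m=200):
    step = (b - a) / m
    s = fun(a) + fun(b)
    for k in range(1, m):
        s += (4 if k % 2 else 2) * fun(a + k * step)
    return s * step / 3.0

def h2(i, j, n, x, y):
    """Five-term joint density (5.6) of (Z_{i:n}, Z_{j:n}) on x < y."""
    c = factorial(n - 1)
    p, q = F(x), F(y)
    t = 0.0
    if i >= 2:      # drops out when i = 1
        t += (c / (factorial(i - 2) * factorial(j - i - 1) * factorial(n - j))
              * p**(i-2) * (q - p)**(j-i-1) * (1-q)**(n-j) * G(x) * f(x) * f(y))
    t += (c / (factorial(i - 1) * factorial(j - i - 1) * factorial(n - j))
          * p**(i-1) * (q - p)**(j-i-1) * (1-q)**(n-j) * g(x) * f(y))
    if j >= i + 2:  # drops out when j = i+1
        t += (c / (factorial(i - 1) * factorial(j - i - 2) * factorial(n - j))
              * p**(i-1) * (q - p)**(j-i-2) * (1-q)**(n-j) * (G(y) - G(x)) * f(x) * f(y))
    t += (c / (factorial(i - 1) * factorial(j - i - 1) * factorial(n - j))
          * p**(i-1) * (q - p)**(j-i-1) * (1-q)**(n-j) * f(x) * g(y))
    if j <= n - 1:  # drops out when j = n
        t += (c / (factorial(i - 1) * factorial(j - i - 1) * factorial(n - j - 1))
              * p**(i-1) * (q - p)**(j-i-1) * (1-q)**(n-j-1) * (1 - G(y)) * f(x) * f(y))
    return t

def mu2(i, j, n):
    """Product moment (5.14) by iterated Simpson quadrature over 0 < x < y < 1."""
    return simpson(lambda x: simpson(lambda y: x * y * h2(i, j, n, x, y), x, 1.0),
                   0.0, 1.0)

def nu2(i, j, m):
    """Exact uniform product moment nu_{i,j:m} = i(j+1)/[(m+1)(m+2)]."""
    return i * (j + 1) / ((m + 1) * (m + 2))

for (i, j, n) in [(2, 3, 3), (2, 3, 4)]:
    lhs = ((i - 1) * mu2(i, j, n) + (j - i) * mu2(i - 1, j, n)
           + (n - j + 1) * mu2(i - 1, j - 1, n))
    rhs = (n - 1) * mu2(i - 1, j - 1, n - 1) + nu2(i - 1, j - 1, n - 1)
    print(f"(5.20) n={n}, i={i}, j={j}: LHS = {lhs:.6f}, RHS = {rhs:.6f}")
```

For n = 3, i = 2, j = 3 the common value is 11/12 in this model, which can also be checked by hand.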
Relation 5.9 has been derived by Balakrishnan (1988a) and it should be noted that the recurrence
formula (5.20) would enable us to calculate all the product moments \mu_{i,j:n} (1 \le i < j \le n) by knowing n-1
suitably chosen moments, for example, \mu_{1,2:n}, \mu_{2,3:n}, ..., \mu_{n-1,n:n}.
Relation 5.10: For 1 \le i < j \le n,
\mu_{i,j:n} - \nu_{i,j:n} + \sum_{r=0}^{i-1} \sum_{s=0}^{n-j} (-1)^{n-r-s} \binom{n-1}{s} \binom{n-1-s}{r} {\mu_{n-j-s+1,n-i-s+1:n-r-s} - \nu_{n-j-s+1,n-i-s+1:n-r-s}}
= \frac{1}{n} \sum_{r=1}^{j-i} (-1)^{j-i-r} \binom{n}{j-r} \binom{j-r-1}{i-1} {(j-r) \nu_{r:n-j+r} [\mu_{j-r:j-r} - \nu_{j-r:j-r}]
+ (n-j+r) \nu_{j-r:j-r} [\mu_{r:n-j+r} - \nu_{r:n-j+r}]}.   (5.21)
Proof. Noting that W_1 \cup W_2 = R^2, we have for 1 \le i < j \le n
I = \int\int_{R^2} xy h_{i,j:n}(x,y) dy dx
= \int\int_{W_1} xy h_{i,j:n}(x,y) dy dx + \int\int_{W_2} xy h_{i,j:n}(x,y) dy dx
= \mu_{i,j:n} + \int\int_{W_2} xy h_{i,j:n}(x,y) dy dx   (5.22)
upon using (5.14). By writing the term {F(x)}^a as [1-{1-F(x)}]^a, expanding the terms {F(x)}^a and
{1-F(y)}^b binomially in powers of 1-F(x) and F(y), respectively, in the integral over W_2, and simplifying
the resulting expression using (5.15) and (5.17), we get from equation (5.22)
I = \mu_{i,j:n} + \sum_{r=0}^{i-1} \sum_{s=0}^{n-j} (-1)^{n-r-s} \binom{n}{s} \binom{n-s}{r} \nu_{n-j-s+1,n-i-s+1:n-r-s}
+ \sum_{r=0}^{i-1} \sum_{s=0}^{n-j} (-1)^{n-r-s} \binom{n-1}{s} \binom{n-1-s}{r} {\mu_{n-j-s+1,n-i-s+1:n-r-s} - \nu_{n-j-s+1,n-i-s+1:n-r-s}}.
Making use of Relation 2.14, we may rewrite the above equation as
I = \mu_{i,j:n} - \nu_{i,j:n} + \sum_{r=1}^{j-i} (-1)^{j-i-r} \binom{n}{j-r} \binom{j-r-1}{i-1} \nu_{j-r:j-r} \nu_{r:n-j+r}
+ \sum_{r=0}^{i-1} \sum_{s=0}^{n-j} (-1)^{n-r-s} \binom{n-1}{s} \binom{n-1-s}{r} {\mu_{n-j-s+1,n-i-s+1:n-r-s} - \nu_{n-j-s+1,n-i-s+1:n-r-s}}.   (5.23)
We could alternatively write
I = \int\int_{R^2} xy h_{i,j:n}(x,y) dy dx.
Now expanding the term {F(y)-F(x)}^a binomially in powers of F(x) and F(y) and simplifying the resulting
expression by using equations (5.7) and (5.8), we also obtain
I = \frac{1}{n} \sum_{r=1}^{j-i} (-1)^{j-i-r} \binom{n}{j-r} \binom{j-r-1}{i-1} {(j-r) \nu_{r:n-j+r} \mu_{j-r:j-r}
+ (n-j+r) \nu_{j-r:j-r} \mu_{r:n-j+r}}.   (5.24)
Relation (5.21) follows upon equating (5.23) and (5.24).
In the above relation, established by Balakrishnan (1988a), it should be noted that only two product
moments, viz., \mu_{i,j:n} and \mu_{n-j+1,n-i+1:n}, are in samples of size n from the outlier model. In particular,
for j = i+1 we have the following relation.
Relation 5.11: For i = 1,2,...,n-1,
\mu_{i,i+1:n} - \nu_{i,i+1:n} + (-1)^n {\mu_{n-i,n-i+1:n} - \nu_{n-i,n-i+1:n}}
= \sum_{r=0}^{i-1} \sum_{s=1}^{n-i-1} (-1)^{n+1-r-s} \binom{n-1}{s} \binom{n-1-s}{r} {\mu_{n-i-s,n-i-s+1:n-r-s} - \nu_{n-i-s,n-i-s+1:n-r-s}}
+ \sum_{r=1}^{i-1} (-1)^{n+1-r} \binom{n-1}{r} {\mu_{n-i,n-i+1:n-r} - \nu_{n-i,n-i+1:n-r}}
+ \frac{1}{n} \binom{n}{i} {i \nu_{1:n-i} [\mu_{i:i} - \nu_{i:i}] + (n-i) \nu_{i:i} [\mu_{1:n-i} - \nu_{1:n-i}]}.   (5.25)
Similarly, by setting j = n-i+1 in equation (5.21), we obtain the following relation.
Relation 5.12: For i = 1,2,...,[n/2],
{1+(-1)^n} [\mu_{i,n-i+1:n} - \nu_{i,n-i+1:n}]
= \sum_{r=0}^{i-1} \sum_{s=1}^{i-1} (-1)^{n+1-r-s} \binom{n-1}{s} \binom{n-1-s}{r} {\mu_{i-s,n-i-s+1:n-r-s} - \nu_{i-s,n-i-s+1:n-r-s}}
+ \sum_{r=1}^{i-1} (-1)^{n+1-r} \binom{n-1}{r} {\mu_{i,n-i+1:n-r} - \nu_{i,n-i+1:n-r}}
+ \frac{1}{n} \sum_{r=1}^{n-2i+1} (-1)^{n+1-r} \binom{n-i-r}{i-1} \binom{n}{i+r-1} {(i+r-1) \nu_{n-i-r+1:n-i-r+1} [\mu_{r:i+r-1} - \nu_{r:i+r-1}]
+ (n-i-r+1) \nu_{r:i+r-1} [\mu_{n-i-r+1:n-i-r+1} - \nu_{n-i-r+1:n-i-r+1}]}.   (5.26)
For odd values of n, we note from Relation 5.11 that we need to calculate only (n-1)/2 product
moments \mu_{i,i+1:n} (1 \le i \le (n-1)/2). Similarly, for even values of n, Relation 5.12 shows that the product
moments \mu_{i,n-i+1:n} (1 \le i \le [n/2]) could all be obtained from the moments in samples of sizes n-1 and
less. In particular, for even values of n, say n = 2m and i = 1, Relation 5.12 yields
2 [\mu_{1,2m:2m} - \nu_{1,2m:2m}]
= \frac{1}{2m} \sum_{r=1}^{2m-1} (-1)^{r-1} \binom{2m}{r} {r \nu_{2m-r:2m-r} [\mu_{r:r} - \nu_{r:r}] + (2m-r) \nu_{r:r} [\mu_{2m-r:2m-r} - \nu_{2m-r:2m-r}]}
which, upon using equation (2.26) and simplifying, yields the relation
\mu_{1,2m:2m} = \sum_{r=1}^{2m-1} (-1)^{r-1} \binom{2m-1}{r} \nu_{r:r} \mu_{2m-r:2m-r}.   (5.27)
This relation generalizes the result of Govindarajulu (1963a) to the case when the order statistics arise from
a sample comprising a single outlier.
Furthermore, by setting n = 2m and i = m in equation (5.26), we obtain the following recurrence
relation.
Relation 5.13: For m = 1,2,...,
2 [\mu_{m,m+1:2m} - \nu_{m,m+1:2m}]
= \sum_{r=0}^{m-1} \sum_{s=1}^{m-1} (-1)^{r+s-1} \binom{2m-1}{s} \binom{2m-1-s}{r} {\mu_{m-s,m-s+1:2m-r-s} - \nu_{m-s,m-s+1:2m-r-s}}
+ \sum_{r=1}^{m-1} (-1)^{r-1} \binom{2m-1}{r} {\mu_{m,m+1:2m-r} - \nu_{m,m+1:2m-r}}
+ \frac{1}{2} \binom{2m}{m} {\nu_{m:m} [\mu_{1:m} - \nu_{1:m}] + \nu_{1:m} [\mu_{m:m} - \nu_{m:m}]}.   (5.28)
In particular, for the case m = 1, upon using the result that \nu_{1,2:2} = \nu_{1:1}^2, we see that Relation 5.13
reduces to the well-known identity \mu_{1,2:2} = \nu_{1:1} \mu_{1:1}.
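Relation (5.27) can be illustrated in the assumed uniform-plus-Beta(2,1) model of the earlier sketches: there \nu_{r:r} = r/(r+1), and \mu_{j:j} = (j+1)/(j+2) since H_{j:j}(x) = x^{j+1}, so the right-hand side of (5.27) has a closed form that the quadrature value of \mu_{1,2m:2m} should reproduce.

```python
from math import comb, factorial

# Assumed illustrative model: F(x) = x on (0,1), outlier G(x) = x^2.
F, f = (lambda x: x), (lambda x: 1.0)
G, g = (lambda x: x * x), (lambda x: 2.0 * x)

def simpson(fun, a, b, m=200):
    step = (b - a) / m
    s = fun(a) + fun(b)
    for k in range(1, m):
        s += (4 if k % 2 else 2) * fun(a + k * step)
    return s * step / 3.0

def h2(i, j, n, x, y):
    """Joint density (5.6) of (Z_{i:n}, Z_{j:n}) on x < y."""
    c = factorial(n - 1)
    p, q = F(x), F(y)
    t = 0.0
    if i >= 2:
        t += (c / (factorial(i - 2) * factorial(j - i - 1) * factorial(n - j))
              * p**(i-2) * (q - p)**(j-i-1) * (1-q)**(n-j) * G(x) * f(x) * f(y))
    t += (c / (factorial(i - 1) * factorial(j - i - 1) * factorial(n - j))
          * p**(i-1) * (q - p)**(j-i-1) * (1-q)**(n-j) * g(x) * f(y))
    if j >= i + 2:
        t += (c / (factorial(i - 1) * factorial(j - i - 2) * factorial(n - j))
              * p**(i-1) * (q - p)**(j-i-2) * (1-q)**(n-j) * (G(y) - G(x)) * f(x) * f(y))
    t += (c / (factorial(i - 1) * factorial(j - i - 1) * factorial(n - j))
          * p**(i-1) * (q - p)**(j-i-1) * (1-q)**(n-j) * f(x) * g(y))
    if j <= n - 1:
        t += (c / (factorial(i - 1) * factorial(j - i - 1) * factorial(n - j - 1))
              * p**(i-1) * (q - p)**(j-i-1) * (1-q)**(n-j-1) * (1 - G(y)) * f(x) * f(y))
    return t

def mu2(i, j, n):
    """Product moment (5.14) by iterated Simpson quadrature."""
    return simpson(lambda x: simpson(lambda y: x * y * h2(i, j, n, x, y), x, 1.0),
                   0.0, 1.0)

def rhs_527(m):
    """RHS of (5.27) with nu_{r:r} = r/(r+1) and mu_{j:j} = (j+1)/(j+2)."""
    return sum((-1) ** (r - 1) * comb(2 * m - 1, r)
               * (r / (r + 1)) * ((2 * m - r + 1) / (2 * m - r + 2))
               for r in range(1, 2 * m))

for m in (1, 2):
    print(f"m={m}: mu_(1,2m:2m) = {mu2(1, 2 * m, 2 * m):.6f}, RHS of (5.27) = {rhs_527(m):.6f}")
```

For m = 1 both sides equal \nu_{1:1}\mu_{1:1} = 1/3, and for m = 2 both sides equal 1/5 in this model.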
5.4. Relations for covariances
Making use of the results derived in Sections 2 and 3, we now obtain upper bounds for the number
of single and product moments to be evaluated for calculating the means, variances and covariances of
order statistics in a sample of size n by assuming these quantities to be available in samples of sizes n-1
and less. The following theorem, proved by Balakrishnan (1988a), thus generalizes the result given in
Section 2.3 to the case when the order statistics arise from a sample containing a single outlier.
Theorem 5.14: In order to find the means, variances and covariances of order statistics in a sample of size
n out of which n-1 variables are from an arbitrary continuous population with cdf F(x) and one outlying
variable from another continuous population with cdf G(x), given these quantities in samples of sizes n-1
and less and also the quantities corresponding to the population with cdf F(x), one has to evaluate at most
two single moments and (n-2)/2 product moments for even values of n; and two single moments and
(n-1)/2 product moments for odd values of n.
Proof. In view of Relation 5.4, it is sufficient to evaluate just the two single moments \mu_{n:n} and \mu_{n:n}^{(2)} for
calculating the means and variances of all order statistics in a sample of size n. Further, as mentioned
earlier, Relation 5.9 would enable us to calculate all the product moments \mu_{i,j:n} (1 \le i < j \le n) if we know
n-1 suitably chosen moments, for example \mu_{i,i+1:n} (1 \le i \le n-1). When n is odd, we need to evaluate
only (n-1)/2 of these moments as the remaining (n-1)/2 product moments could be obtained by applying
Relation 5.11. Similarly, when n is even, say n = 2m, we need to evaluate only (n-2)/2 = m-1 of the
immediate upper-diagonal product moments as Relation 5.13 gives the product moment \mu_{m,m+1:2m} and
the remaining m-1 product moments could be obtained by applying Relation 5.11.
Relation 5.15: For 1 \le k \le n-1,
\sum_{j=2}^{n-k+1} \binom{n-j}{k-1} \mu_{1,j:n} + \sum_{j=2}^{k+1} \binom{n-j}{n-k-1} \mu_{1,j:n}
= \binom{n-1}{k} \nu_{1:k} \mu_{1:n-k} + \binom{n-1}{k-1} \nu_{1:n-k} \mu_{1:k}.   (5.29)
Proof. For 1 \le k \le n-1, first consider
\sum_{j=2}^{n-k+1} \binom{n-j}{k-1} \mu_{1,j:n} = \sum_{j=2}^{n-k+1} \binom{n-j}{k-1} \int\int_{W_1} xy h_{1,j:n}(x,y) dx dy,   (5.30)
where h_{1,j:n}(x,y) is as given in equation (5.6). Now upon interchanging the summation and the integral
signs and then using the binomial identity
\sum_{r=0}^{m} \binom{m}{r} {F(y)-F(x)}^r {1-F(y)}^{m-r} = {1-F(x)}^m,   (5.31)
we obtain from (5.30) that
\sum_{j=2}^{n-k+1} \binom{n-j}{k-1} \mu_{1,j:n} = \int\int_{W_1} xy H_{k,n}(x,y) dx dy,   (5.32)
where
H_{k,n}(x,y) = \frac{(n-1)!}{(k-1)!(n-k-1)!} {1-F(x)}^{n-k-1} {1-F(y)}^{k-1} g(x) f(y)
+ \frac{(n-1)!}{(k-1)!(n-k-1)!} {1-F(x)}^{n-k-1} {1-F(y)}^{k-1} f(x) g(y)
+ \frac{(n-1)!}{(k-1)!(n-k-2)!} {1-F(x)}^{n-k-2} {1-F(y)}^{k-1} {1-G(x)} f(x) f(y)
+ \frac{(n-1)!}{(k-2)!(n-k-1)!} {1-F(x)}^{n-k-1} {1-F(y)}^{k-2} {1-G(y)} f(x) f(y).   (5.33)
Next, consider for 1 \le k \le n-1
\sum_{j=2}^{k+1} \binom{n-j}{n-k-1} \mu_{1,j:n} = \sum_{j=2}^{k+1} \binom{n-j}{n-k-1} \int\int_{W_2} xy h_{1,j:n}(y,x) dy dx,   (5.34)
from equation (5.15). Now upon interchanging the summation and the integral signs and then using the
binomial identity
\sum_{r=0}^{m} \binom{m}{r} {F(x)-F(y)}^r {1-F(x)}^{m-r} = {1-F(y)}^m,
we obtain from (5.34) that
\sum_{j=2}^{k+1} \binom{n-j}{n-k-1} \mu_{1,j:n} = \int\int_{W_2} xy H_{k,n}(x,y) dx dy,   (5.35)
where H_{k,n}(x,y) is as defined in (5.33). Finally, upon adding equations (5.32) and (5.35), noting that
W_1 \cup W_2 = R^2, and then simplifying the resulting expression by using equations (5.7) and (5.8), we obtain
the relation in (5.29).
Note that Relation 5.15 involves the product moments \mu_{1,j:n} (2 \le j \le n) and first order single
moments only, and that there are only [n/2] distinct equations since the relation for k is the same as the relation for n-k. Thus, for even values of n, there are only n/2 equations in n-1 product moments and a
knowledge of (n-2)/2 of these would enable us to calculate all of these product moments provided the first
single moments in samples of sizes less than n are all known. Similarly, for odd values of n, we only need
to know (n-1)/2 product moments. Note that these bounds are exactly the same as the bounds given in
Theorem 5.14 for the product moments to be evaluated for the calculation of all the product moments.
This is quite expected, since the product moments \mu_{1,j:n} (2 \le j \le n), along with Relation 5.9, are also
sufficient for the evaluation of all the product moments.
Relation 5.16: For 1 \le i \le n-1,
\sum_{j=i+1}^{n} \mu_{i,j:n} + \sum_{r=1}^{i} \mu_{r,i+1:n} = (n-1) \nu_{1:1} \mu_{i:n-1} + \nu_{i:n-1} \mu_{1:1}.   (5.36)
Proof. For 1 \le i \le n-1, consider
\sum_{j=i+1}^{n} \mu_{i,j:n} = \sum_{j=i+1}^{n} \int\int_{W_1} xy h_{i,j:n}(x,y) dx dy,   (5.37)
where h_{i,j:n}(x,y) is as given in (5.6). Upon interchanging the summation and the integral signs and then
using the binomial identity in (5.31), we get from (5.37) that
\sum_{j=i+1}^{n} \mu_{i,j:n} = \int\int_{W_1} xy H*_{i,n}(x,y) dx dy,   (5.38)
where
H*_{i,n}(x,y) = \frac{(n-1)!}{(i-2)!(n-i-1)!} {F(x)}^{i-2} {1-F(x)}^{n-i-1} G(x) f(x) f(y)
+ \frac{(n-1)!}{(i-1)!(n-i-1)!} {F(x)}^{i-1} {1-F(x)}^{n-i-1} g(x) f(y)
+ \frac{(n-1)!}{(i-1)!(n-i-2)!} {F(x)}^{i-1} {1-F(x)}^{n-i-2} {1-G(x)} f(x) f(y)
+ \frac{(n-1)!}{(i-1)!(n-i-1)!} {F(x)}^{i-1} {1-F(x)}^{n-i-1} f(x) g(y).   (5.39)
Proceeding similarly, we also obtain
\sum_{r=1}^{i} \mu_{r,i+1:n} = \sum_{r=1}^{i} \int\int_{W_2} xy h_{r,i+1:n}(y,x) dy dx
= \int\int_{W_2} xy H*_{i,n}(x,y) dx dy,   (5.40)
where H*_{i,n}(x,y) is as defined above in (5.39). By adding equations (5.38) and (5.40), noting that
W_1 \cup W_2 = R^2, and then simplifying the resulting expression using equations (5.7) and (5.8), we derive the
required relation.
Relation 5.16 has been made use of by Balakrishnan (1988a) in deriving some similar relations
satisfied by the covariances of order statistics, viz., \sigma_{i,j:n} = Cov(Z_{i:n}, Z_{j:n}) = \mu_{i,j:n} - \mu_{i:n} \mu_{j:n} (1 \le i < j \le n). These generalize the results of Joshi and Balakrishnan (1982) to the case when the order statistics
arise from a sample comprising a single outlier.
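Relation 5.16 provides one of the simplest computational checks in this section. The sketch below evaluates both sides of (5.36) in the assumed uniform-plus-Beta(2,1) model used throughout these illustrations, with \nu_{1:1} = 1/2, \mu_{1:1} = 2/3, and \nu_{i:m} = i/(m+1).

```python
from math import factorial

# Assumed illustrative model: F(x) = x on (0,1), outlier G(x) = x^2.
F, f = (lambda x: x), (lambda x: 1.0)
G, g = (lambda x: x * x), (lambda x: 2.0 * x)

def simpson(fun, a, b, m=200):
    step = (b - a) / m
    s = fun(a) + fun(b)
    for k in range(1, m):
        s += (4 if k % 2 else 2) * fun(a + k * step)
    return s * step / 3.0

def h(i, n, x):
    """Density (5.4)."""
    t = 0.0
    if i >= 2:
        t += (factorial(n - 1) / (factorial(i - 2) * factorial(n - i))
              * F(x)**(i-2) * (1 - F(x))**(n-i) * G(x) * f(x))
    t += (factorial(n - 1) / (factorial(i - 1) * factorial(n - i))
          * F(x)**(i-1) * (1 - F(x))**(n-i) * g(x))
    if i <= n - 1:
        t += (factorial(n - 1) / (factorial(i - 1) * factorial(n - i - 1))
              * F(x)**(i-1) * (1 - F(x))**(n-i-1) * (1 - G(x)) * f(x))
    return t

def h2(i, j, n, x, y):
    """Joint density (5.6) on x < y."""
    c = factorial(n - 1)
    p, q = F(x), F(y)
    t = 0.0
    if i >= 2:
        t += (c / (factorial(i-2) * factorial(j-i-1) * factorial(n-j))
              * p**(i-2) * (q-p)**(j-i-1) * (1-q)**(n-j) * G(x) * f(x) * f(y))
    t += (c / (factorial(i-1) * factorial(j-i-1) * factorial(n-j))
          * p**(i-1) * (q-p)**(j-i-1) * (1-q)**(n-j) * g(x) * f(y))
    if j >= i + 2:
        t += (c / (factorial(i-1) * factorial(j-i-2) * factorial(n-j))
              * p**(i-1) * (q-p)**(j-i-2) * (1-q)**(n-j) * (G(y)-G(x)) * f(x) * f(y))
    t += (c / (factorial(i-1) * factorial(j-i-1) * factorial(n-j))
          * p**(i-1) * (q-p)**(j-i-1) * (1-q)**(n-j) * f(x) * g(y))
    if j <= n - 1:
        t += (c / (factorial(i-1) * factorial(j-i-1) * factorial(n-j-1))
              * p**(i-1) * (q-p)**(j-i-1) * (1-q)**(n-j-1) * (1-G(y)) * f(x) * f(y))
    return t

def mu(i, n):
    return simpson(lambda x: x * h(i, n, x), 0.0, 1.0)

def mu2(i, j, n):
    return simpson(lambda x: simpson(lambda y: x * y * h2(i, j, n, x, y), x, 1.0),
                   0.0, 1.0)

nu11, mu11 = 0.5, 2.0 / 3.0
for (i, n) in [(1, 3), (2, 4)]:
    lhs = (sum(mu2(i, j, n) for j in range(i + 1, n + 1))
           + sum(mu2(r, i + 1, n) for r in range(1, i + 1)))
    rhs = (n - 1) * nu11 * mu(i, n - 1) + (i / n) * mu11   # nu_{i:n-1} = i/n
    print(f"(5.36) n={n}, i={i}: LHS = {lhs:.6f}, RHS = {rhs:.6f}")
```

For n = 3, i = 1 the common value is 23/36 in this model.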
Relation 5.17: For 1 \le i \le n-1,
\sum_{j=i+1}^{n} \sigma_{i,j:n} + \sum_{j=1}^{i} \sigma_{j,i+1:n}
= [i \nu_{1:1} - \sum_{j=1}^{i} \mu_{j:n}] [\mu_{i+1:n} - \mu_{i:n}] - [\mu_{1:1} - \nu_{1:1}] [\mu_{i:n} - \nu_{i:n-1}].   (5.41)
Proof. Using the fact that \mu_{i,j:n} = \sigma_{i,j:n} + \mu_{i:n} \mu_{j:n} in (5.36), we get
\sum_{j=i+1}^{n} \sigma_{i,j:n} + \sum_{j=1}^{i} \sigma_{j,i+1:n}
= (n-1) \nu_{1:1} \mu_{i:n-1} + \nu_{i:n-1} \mu_{1:1} - \mu_{i:n} \sum_{j=i+1}^{n} \mu_{j:n} - \mu_{i+1:n} \sum_{j=1}^{i} \mu_{j:n}.   (5.42)
With the identity
\sum_{j=i+1}^{n} \mu_{j:n} = (n-1) \nu_{1:1} + \mu_{1:1} - \sum_{j=1}^{i} \mu_{j:n}
obtained from Relation 5.1, we have the RHS of equation (5.42) as
(n-1) \nu_{1:1} [\mu_{i:n-1} - \mu_{i:n}] + \mu_{1:1} [\nu_{i:n-1} - \mu_{i:n}] - [\mu_{i+1:n} - \mu_{i:n}] \sum_{j=1}^{i} \mu_{j:n}.
Now making use of the result that
(n-1) [\mu_{i:n-1} - \mu_{i:n}] = i [\mu_{i+1:n} - \mu_{i:n}] - [\nu_{i:n-1} - \mu_{i:n}]
obtained from Relation 5.2, we derive the relation in (5.41).
Note that Relations 5.16 and 5.17 give extremely simple and useful results for checking the calculations of product moments and covariances of order statistics from a sample of size n comprising a single
outlier. In particular, setting i = 1 and i = n-1 in equation (5.41), we get the identities
2 \sigma_{1,2:n} + \sum_{j=3}^{n} \sigma_{1,j:n} = [\nu_{1:1} - \mu_{1:n}] [\mu_{2:n} - \mu_{1:n}] - [\mu_{1:1} - \nu_{1:1}] [\mu_{1:n} - \nu_{1:n-1}]
and
2 \sigma_{n-1,n:n} + \sum_{j=1}^{n-2} \sigma_{j,n:n} = [\mu_{n:n} - \mu_{1:1}] [\mu_{n:n} - \mu_{n-1:n}] - [\mu_{1:1} - \nu_{1:1}] [\mu_{n-1:n} - \nu_{n-1:n-1}]
= [\mu_{n:n} - \mu_{1:1}] [\mu_{n:n} - \nu_{n-1:n-1}] - [\mu_{n:n} - \nu_{1:1}] [\mu_{n-1:n} - \nu_{n-1:n-1}].
The last equation has been obtained from the previous equation simply by rearranging the terms on the
right-hand side.
5.5. Results for symmetric outlier model
Let us assume that the density functions f(x) and g(x) are both symmetric about zero. It is then easy
to see from equations (5.4) and (5.6) that
h_{n-i+1:n}(x) = h_{i:n}(-x),   1 \le i \le n,
and
h_{n-j+1,n-i+1:n}(y,x) = h_{i,j:n}(-x,-y),   1 \le i < j \le n.
As a result, for a symmetric outlier model (the case when both f(x) and g(x) are symmetric about zero) we
have the relations
\mu_{n-i+1:n}^{(k)} = (-1)^k \mu_{i:n}^{(k)},   1 \le i \le n, k \ge 1,   (5.43)
\mu_{n-j+1,n-i+1:n} = \mu_{i,j:n},   1 \le i < j \le n,   (5.44)
and
\sigma_{n-j+1,n-i+1:n} = \mu_{n-j+1,n-i+1:n} - \mu_{n-j+1:n} \mu_{n-i+1:n} = \mu_{i,j:n} - \mu_{i:n} \mu_{j:n}
= \sigma_{i,j:n},   1 \le i \le j \le n.   (5.45)
Equations (5.43) - (5.45) could be used to simplify many of the results presented in Sections 2 through 4.
These relations then would help us reduce the bounds given in Theorem 5.14 for the case when the order
statistics arise from a symmetric outlier model.
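The symmetry relations (5.43) - (5.45) all flow from the pointwise identity h_{n-i+1:n}(x) = h_{i:n}(-x). The sketch below checks that identity on a grid for an assumed symmetric pair of populations, chosen only for illustration: a standard logistic F together with a scale-2 logistic outlier G.

```python
from math import exp, factorial

# Assumed symmetric example (not from the text): standard logistic F and a
# scale-2 logistic outlier G; both densities are symmetric about zero.
def F(x): return 1.0 / (1.0 + exp(-x))
def f(x): return exp(-x) / (1.0 + exp(-x)) ** 2
def G(x): return 1.0 / (1.0 + exp(-x / 2.0))
def g(x): return 0.5 * exp(-x / 2.0) / (1.0 + exp(-x / 2.0)) ** 2

def h(i, n, x):
    """Density (5.4) of Z_{i:n} in the single-outlier model."""
    t = 0.0
    if i >= 2:
        t += (factorial(n - 1) / (factorial(i - 2) * factorial(n - i))
              * F(x) ** (i - 2) * (1 - F(x)) ** (n - i) * G(x) * f(x))
    t += (factorial(n - 1) / (factorial(i - 1) * factorial(n - i))
          * F(x) ** (i - 1) * (1 - F(x)) ** (n - i) * g(x))
    if i <= n - 1:
        t += (factorial(n - 1) / (factorial(i - 1) * factorial(n - i - 1))
              * F(x) ** (i - 1) * (1 - F(x)) ** (n - i - 1) * (1 - G(x)) * f(x))
    return t

# Pointwise check of h_{n-i+1:n}(x) = h_{i:n}(-x), the source of (5.43)-(5.45).
n = 6
grid = [k / 10.0 for k in range(-50, 51)]
defect = max(abs(h(n - i + 1, n, x) - h(i, n, -x))
             for i in range(1, n + 1) for x in grid)
print("largest symmetry defect on the grid:", defect)
```

The defect is zero up to floating-point rounding, confirming the mirror symmetry term by term.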
Relation 5.18: For n even, say n = 2m,
\mu_{m:2m-1}^{(k)} = 0   for odd k,   (5.46)
= \frac{1}{2m-1} {2m \mu_{m:2m}^{(k)} - \nu_{m:2m-1}^{(k)}}   for even k.   (5.47)
Proof. Equation (5.46) is obtained from (5.43) simply by setting n = 2m-1 and i = m. Equation (5.47), on
the other hand, follows directly from Relation 5.3 upon using the result that \mu_{m+1:2m}^{(k)} = \mu_{m:2m}^{(k)} for even
values of k.
Next, upon using (5.43) and (5.44) in Relation 5.10, we also derive the following result.
Relation 5.19: For 1 \le i < j \le n,
{1+(-1)^n} [\mu_{i,j:n} - \nu_{i,j:n}] = \frac{1}{n} \sum_{r=1}^{j-i} (-1)^{j-i-r-1} \binom{n}{j-r} \binom{j-r-1}{i-1} {(j-r) \nu_{r:n-j+r} [\mu_{1:j-r} - \nu_{1:j-r}]
+ (n-j+r) \nu_{1:j-r} [\mu_{r:n-j+r} - \nu_{r:n-j+r}]}
+ \sum_{r=1}^{i-1} (-1)^{n-r-1} \binom{n-1}{r} {\mu_{n-j+1,n-i+1:n-r} - \nu_{n-j+1,n-i+1:n-r}}
+ \sum_{r=0}^{i-1} \sum_{s=1}^{n-j} (-1)^{n-r-s-1} \binom{n-1}{s} \binom{n-1-s}{r} {\mu_{n-j-s+1,n-i-s+1:n-r-s} - \nu_{n-j-s+1,n-i-s+1:n-r-s}}.   (5.48)
Note that the right-hand side of the above relation involves only the expected values and product
moments in samples of sizes n-1 and less. In particular, for even values of n we get from (5.48) that
2 \mu_{i,j:n} = 2 \mu_{n-j+1,n-i+1:n}
= 2 \nu_{i,j:n} + \sum_{r=1}^{i-1} (-1)^{r-1} \binom{n-1}{r} {\mu_{n-j+1,n-i+1:n-r} - \nu_{n-j+1,n-i+1:n-r}}
+ \frac{1}{n} \sum_{r=1}^{j-i} (-1)^{j-i-r-1} \binom{n}{j-r} \binom{j-r-1}{i-1} {(j-r) \nu_{r:n-j+r} [\mu_{1:j-r} - \nu_{1:j-r}]
+ (n-j+r) \nu_{1:j-r} [\mu_{r:n-j+r} - \nu_{r:n-j+r}]}
+ \sum_{r=0}^{i-1} \sum_{s=1}^{n-j} (-1)^{r+s-1} \binom{n-1}{s} \binom{n-1-s}{r} {\mu_{n-j-s+1,n-i-s+1:n-r-s} - \nu_{n-j-s+1,n-i-s+1:n-r-s}}.   (5.49)
This result, established by Balakrishnan (1988a), generalizes the relation in (2.43) to the case when the
order statistics arise from a symmetric outlier model. Also from equation (5.49) we may note that we do
not have to evaluate any product moments in a sample of size n whenever n is even. In addition, by
setting i = n-1 and j = n in equation (5.49) and using the fact that \mu_{1:1} = \nu_{1:1} = 0, we get
2 \mu_{1,2:n} = 2 \nu_{1,2:n} + \sum_{r=1}^{n-2} (-1)^{r-1} \binom{n-1}{r} {\mu_{1,2:n-r} - \nu_{1,2:n-r}}
which, upon using the result in (2.44), yields the relation
2 \mu_{1,2:n} = \sum_{r=1}^{n-2} (-1)^{r-1} {\binom{n-1}{r} \mu_{1,2:n-r} + \binom{n-1}{r-1} \nu_{1,2:n-r}}   (5.50)
for even values of n.
In the following theorem, which is an analogue of Theorem 5.14 for the symmetric outlier model
case, we essentially make use of equations (5.46) - (5.49) in order to determine the upper bounds for the
number of single and product moments to be evaluated in a sample of size n.
Theorem 5.20: In order to determine the means, variances and covariances of order statistics in a sample
of size n out of which n-1 variables are from an arbitrary continuous symmetric population with cdf F(x)
and one outlying variable from another continuous symmetric population with cdf G(x), given these quantities in samples of sizes n-1 and less and also the quantities corresponding to the population with cdf F(x),
one has to evaluate at most one single moment if n is even; and one single moment and (n-1)/2 product
moments if n is odd.
Proof. In order to compute the first single moments \mu_{i:n} (1 \le i \le n), because of Relation 5.4 we have to
evaluate at most one single moment if n is even and no single moment if n is odd, as in this case we have
\mu_{(n+1)/2:n} = 0 from equation (5.46). Next, in order to compute the second single moments \mu_{i:n}^{(2)} (1 \le i \le n),
we have to evaluate at most one single moment if n is odd and no single moment if n is even, as in this
case we have
\mu_{n/2:n}^{(2)} = \frac{1}{n} {(n-1) \mu_{n/2:n-1}^{(2)} + \nu_{n/2:n-1}^{(2)}}
from equation (5.47). Finally, in order to obtain all the product moments \mu_{i,j:n} (1 \le i < j \le n), we note
from Theorem 5.14 that we have to evaluate at most (n-1)/2 product moments if n is odd; however, for
even values of n we do not have to evaluate any product moment as in this case we could compute all the
product moments by using equation (5.49). Hence, the theorem.
5.6. Results for two related outlier models
In the previous section we have established some relations for both single and product moments of
order statistics from a symmetric outlier model. In this section we once again consider the moments of
order statistics from a symmetric outlier model and express them in terms of the moments of order statistics
in samples drawn from the population with pdf f*(x) (obtained by folding the pdf f(x) at zero) and the moments of order statistics in samples drawn from the population with pdf f*(x) comprising a single outlier with pdf g*(x) (obtained by folding the pdf g(x) at zero). These results have been proved recently by Balakrishnan (1988b) and they generalize the relations presented in Section 2.6 to the case when the order statistics arise from a symmetric outlier model. These results have also been successfully applied by Balakrishnan and Ambagaspitiya (1988) in order to evaluate the means, variances and covariances of order statistics from a single scale-outlier double exponential model. They have then used these quantities in order to examine the variances of various location estimators expressible as linear functions of the order statistics. Similar work for the normal case has been carried out by David and Shu (1978).
In this regard, let us denote for x > 0

F*(x) = 2 F(x) - 1,   f*(x) = 2 f(x),   (5.51)

and

G*(x) = 2 G(x) - 1,   g*(x) = 2 g(x).   (5.52)
That is, the density functions f*(x) and g*(x) are obtained by folding the density functions f(x) and g(x) at the point zero, respectively. Let us now denote the single and the product moments of order statistics in a random sample of size n drawn from a population with pdf f*(x) and cdf F*(x) defined in (5.51) by ν_{i:n}^{*(k)} (1 ≤ i ≤ n, k ≥ 1) and ν_{i,j:n}^{*} (1 ≤ i < j ≤ n). Furthermore, let μ_{i:n}^{*(k)} (1 ≤ i ≤ n, k ≥ 1) and μ_{i,j:n}^{*} (1 ≤ i < j ≤ n) denote the single and the product moments of order statistics obtained from a sample of n independent random variables out of which n-1 have pdf f*(x) and cdf F*(x) defined in (5.51) and one variable has pdf g*(x) and cdf G*(x) defined in (5.52). Then Balakrishnan (1988b) has essentially derived some simple formulae expressing the moments μ_{i:n}^{(k)} and μ_{i,j:n} in terms of μ_{i:n}^{*(k)}, ν_{i:n}^{*(k)}, μ_{i,j:n}^{*} and ν_{i,j:n}^{*}.

Relation 5.21: For 1 ≤ i ≤ n and k = 1,2,...,
2^n μ_{i:n}^{(k)} = Σ_{r=1}^{i-1} C(n-1, r-1) ν_{i-r:n-r}^{*(k)} + (-1)^k Σ_{r=i}^{n-1} C(n-1, r) ν_{r-i+1:r}^{*(k)}

             + Σ_{r=0}^{i-1} C(n-1, r) μ_{i-r:n-r}^{*(k)} + (-1)^k Σ_{r=i}^{n} C(n-1, r-1) μ_{r+1-i:r}^{*(k)}.   (5.53)
Proof. First, express the single moment μ_{i:n}^{(k)} as the sum of three integrals using equations (5.4) and (5.7) and split each integral into two parts, one on the range (-∞,0) and the other on the range (0,∞). In the integrals on the range (-∞,0), make a substitution such that the range becomes (0,∞), use the facts F(-x) = 1-F(x) and G(-x) = 1-G(x) along with equations (5.51) and (5.52), and express the integrands in terms of F*, G*, f* and g*. Now expand (1 + F*(x))^m in powers of F*(x), integrate termwise, and the relation in (5.53) readily follows.
A similar relation for the product moments is established in the following result.
Relation 5.22: For 1 ≤ i < j ≤ n,

2^n μ_{i,j:n} = Σ_{r=1}^{i-1} C(n-1, r-1) ν_{i-r,j-r:n-r}^{*} + Σ_{r=j}^{n-1} C(n-1, r) ν_{r+1-j,r+1-i:r}^{*}

            + Σ_{r=0}^{i-1} C(n-1, r) μ_{i-r,j-r:n-r}^{*} + Σ_{r=j}^{n} C(n-1, r-1) μ_{r+1-j,r+1-i:r}^{*}

            - Σ_{r=i}^{j-1} C(n-1, r-1) ν_{j-r:n-r}^{*} μ_{r+1-i:r}^{*} - Σ_{r=i}^{j-1} C(n-1, r) ν_{r+1-i:r}^{*} μ_{j-r:n-r}^{*}.   (5.54)
Proof. First of all, express the product moment μ_{i,j:n} as the sum of five integrals using equations (5.6) and (5.14). Next, by noting that

W_1 = {(x,y): -∞ < x < y < ∞} = R_1 ∪ R_2 ∪ R_3,

where

R_1 = {(x,y): 0 < x < y < ∞},   R_2 = {(x,y): -∞ < x < 0, 0 < y < ∞}

and

R_3 = {(x,y): -∞ < x < y < 0},

split each of the five integrals into three parts. In the integrals on the range R_1, express F and G in terms of F* and G*, and f and g in terms of f* and g*, respectively, expand (1 + F*)^m binomially in powers of F*, and integrate termwise. In the integrals on the range R_2, make a substitution x = -z, use the results that F(-x) = 1-F(x) and G(-x) = 1-G(x), express the integrand in terms of F*, G*, f* and g*, expand (F*(y) + F*(x))^m binomially in powers of F*(y) and F*(x), and then integrate termwise. Finally, in the integrals on the range R_3, put x = -u and y = -v, use the results that F(-z) = 1-F(z) and G(-z) = 1-G(z), express the integrand in terms of F*, G*, f* and g*, expand (1 + F*)^m binomially in powers of F*, and integrate termwise. Combining all these results and simplifying the resulting expression, we derive the relation in (5.54).
Remark 5.23: If the moments ν_{i:m}^{*(k)}, μ_{i:m}^{*(k)}, ν_{i,j:m}^{*} and μ_{i,j:m}^{*} are all available for sample sizes up to n, then all the single moments μ_{i:n}^{(k)} (1 ≤ i ≤ n) and the product moments μ_{i,j:n} (1 ≤ i < j ≤ n) of order statistics in a sample of size n from a symmetric outlier model (with a single outlier present) could be computed by using Relations 5.21 and 5.22. Thus, for example, Balakrishnan and Ambagaspitiya (1988) have applied Relations 5.21 and 5.22 in computing the means, variances and covariances of order statistics from a single scale-outlier double exponential model by making use of the single and the product moments of order statistics from a standard exponential distribution and also the single and the product moments of order statistics from a single scale-outlier exponential model.
Remark 5.24: In particular, if we set G(x) ≡ F(x) (that is, when the variable Y in the sample is not an outlier), then Relations 5.21 and 5.22 simply reduce to

2^n ν_{i:n}^{(k)} = Σ_{r=0}^{i-1} C(n, r) ν_{i-r:n-r}^{*(k)} + (-1)^k Σ_{r=i}^{n} C(n, r) ν_{r+1-i:r}^{*(k)}

and

2^n ν_{i,j:n} = Σ_{r=0}^{i-1} C(n, r) ν_{i-r,j-r:n-r}^{*} + Σ_{r=j}^{n} C(n, r) ν_{r+1-j,r+1-i:r}^{*} - Σ_{r=i}^{j-1} C(n, r) ν_{r+1-i:r}^{*} ν_{j-r:n-r}^{*},

respectively. Note that these are precisely the same as the results presented in Section 2.6.
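As a sanity check on the reduced relations, the single-moment identity (k = 1) can be verified numerically for a concrete symmetric parent. The sketch below is ours, not from the text: for a Uniform(-1,1) population the folded population is Uniform(0,1), whose order-statistic means are ν_{i:n}^{*} = i/(n+1), while ν_{i:n} = 2i/(n+1) - 1.

```python
from math import comb

# Order-statistic means: for U(0,1), E[U_{i:n}] = i/(n+1);
# for the symmetric parent U(-1,1), E[X_{i:n}] = 2i/(n+1) - 1.
def nu_star(i, n):  # folded population U(0,1)
    return i / (n + 1)

def nu(i, n):       # symmetric population U(-1,1)
    return 2 * i / (n + 1) - 1

# Check  2^n nu_{i:n} = sum_{r=0}^{i-1} C(n,r) nu*_{i-r:n-r}
#                       - sum_{r=i}^{n} C(n,r) nu*_{r+1-i:r}   (k = 1)
for n in range(2, 8):
    for i in range(1, n + 1):
        lhs = 2**n * nu(i, n)
        rhs = sum(comb(n, r) * nu_star(i - r, n - r) for r in range(i)) \
            - sum(comb(n, r) * nu_star(r + 1 - i, r) for r in range(i, n + 1))
        assert abs(lhs - rhs) < 1e-12
```

The same bookkeeping extends to any k and to the full outlier relations once the folded outlier moments μ* are available.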
Remark 5.25: In using Relations 5.21 and 5.22, an error could essentially arise from two sources: (i) due to approximations in the coefficients, and (ii) due to approximations in the pivotal quantities. Since the coefficients occurring in both the relations are simple binomial coefficients which are integral and could be evaluated exactly at least for small values of n, we may assume the error involved due to the approximations in the coefficients to be zero. Now if ε and ε' are the errors involved in approximating each of the pivotal values ν^{*(k)} and μ^{*(k)}, respectively, then the maximum cumulative rounding error in evaluating μ_{i:n}^{(k)} (1 ≤ i ≤ n) by means of Relation 5.21 is given by

2^{-n} [ ε Σ_{r=1}^{i-1} C(n-1, r-1) + ε Σ_{r=i}^{n-1} C(n-1, r) + ε' Σ_{r=0}^{i-1} C(n-1, r) + ε' Σ_{r=i}^{n} C(n-1, r-1) ]

  ≤ 2^{-n} ε* [ Σ_{r=0}^{n-1} C(n-1, r) + Σ_{r=1}^{n} C(n-1, r-1) ] = 2^{-(n-1)} ε* Σ_{r=0}^{n-1} C(n-1, r) = ε*,

where ε* = max(ε, ε'). That is, the maximum error involved in numerically computing the single moments μ_{i:n}^{(k)} (1 ≤ i ≤ n, k ≥ 1) using the moments ν^{*(k)} and μ^{*(k)} is at most ε* = max(ε, ε'), where ε and ε' are respectively the maximum errors involved in approximating each of the pivotal quantities ν^{*(k)} and μ^{*(k)}. Hence, if the moments ν_{i:m}^{*(k)}, μ_{i:m}^{*(k)}, ν_{i,j:m}^{*} and μ_{i,j:m}^{*} are computed sufficiently accurately, then the single moments μ_{i:n}^{(k)} (1 ≤ i ≤ n, k ≥ 1) and the product moments μ_{i,j:n} (1 ≤ i < j ≤ n) could all be computed by using Relations 5.21 and 5.22 without accumulating serious rounding errors, at least for small sample sizes.
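The bound rests on the four binomial sums totalling exactly 2^n for every i, so the total weight multiplying ε* is one. A quick illustrative check (ours):

```python
from math import comb

# The four binomial sums appearing in the error bound of Remark 5.25
# total exactly 2^n for every i, so the cumulative error is <= eps*.
for n in range(2, 12):
    for i in range(1, n + 1):
        total = (sum(comb(n - 1, r - 1) for r in range(1, i))
                 + sum(comb(n - 1, r) for r in range(i, n))
                 + sum(comb(n - 1, r) for r in range(i))
                 + sum(comb(n - 1, r - 1) for r in range(i, n + 1)))
        assert total == 2**n
```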
5.7. Functional behaviour of order statistics
Let us first consider the special case of a location-outlier model, i.e., G(x) = F(x-λ) for all x. We could now write Y = X_n + λ, where X_n is a random variable with pdf f(x) and cdf F(x), independent of the remaining n-1 variables X_1, X_2, ..., X_{n-1}. For convenience let us denote the i'th order statistic in this case by Z_{i:n}(λ), its pdf by h_{i:n}(x;λ) and the cdf by H_{i:n}(x;λ). Then it could be easily seen from equation (5.3) that H_{i:n}(x;λ) is a decreasing function of λ. The behaviour of Z_{i:n}(λ) as a function of λ has been studied in detail by David and Shu (1978) (also see Hampel, 1974).
Denoting the observed values of the random variables X, Y and Z by x, y and z, and inserting y = x_n + λ into the ordered sample of size n-1, viz., x_{1:n-1} ≤ x_{2:n-1} ≤ ... ≤ x_{n-1:n-1}, we then have for fixed values of x_1, x_2, ..., x_n that

z_{1:n}(λ) = x_n + λ      if x_n + λ ≤ x_{1:n-1}
           = x_{1:n-1}    if x_n + λ > x_{1:n-1},   (5.55)

for i = 2,3,...,n-1

z_{i:n}(λ) = x_{i-1:n-1}  if x_n + λ ≤ x_{i-1:n-1}
           = x_n + λ      if x_{i-1:n-1} < x_n + λ ≤ x_{i:n-1}   (5.56)
           = x_{i:n-1}    if x_n + λ > x_{i:n-1},

and

z_{n:n}(λ) = x_{n-1:n-1}  if x_n + λ ≤ x_{n-1:n-1}
           = x_n + λ      if x_n + λ > x_{n-1:n-1}.   (5.57)

From equations (5.55) - (5.57), we see that z_{i:n}(λ) is a nondecreasing function of λ; also

lim_{λ→-∞} z_{1:n}(λ) = z_{1:n}(-∞) = -∞,
lim_{λ→-∞} z_{i:n}(λ) = z_{i:n}(-∞) = x_{i-1:n-1},  2 ≤ i ≤ n,
lim_{λ→∞} z_{i:n}(λ) = z_{i:n}(∞) = x_{i:n-1},  1 ≤ i ≤ n-1,

and

lim_{λ→∞} z_{n:n}(λ) = z_{n:n}(∞) = ∞.   (5.58)
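The piecewise expressions (5.55) - (5.57) say nothing more than: insert the value x_n + λ into the ordered reduced sample and read off the i'th smallest. A small illustrative sketch (our own notation and data):

```python
import bisect

def z(i, base_sorted, xn, lam):
    """i'th order statistic (1-indexed) after inserting the outlying
    value xn + lam into the sorted reduced sample of size n-1."""
    s = list(base_sorted)
    bisect.insort(s, xn + lam)
    return s[i - 1]

base = [-1.2, -0.3, 0.4, 1.1]   # x_{1:4} <= ... <= x_{4:4}, so n = 5
xn = 0.0

# z_{i:n}(lambda) is nondecreasing in lambda ...
for i in range(1, 6):
    vals = [z(i, base, xn, lam) for lam in (-5, -1, 0, 1, 5)]
    assert vals == sorted(vals)

# ... and for 2 <= i <= n-1 its limits are x_{i-1:n-1} and x_{i:n-1}
assert z(3, base, xn, -100) == base[1]   # z_{3:5}(-inf) = x_{2:4}
assert z(3, base, xn, +100) == base[2]   # z_{3:5}(+inf) = x_{3:4}
```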
For a finite value of λ we see from equations (5.55) - (5.57) that E(Z_{i:n}(λ)) = μ_{i:n}(λ), 1 ≤ i ≤ n, exists if E(X) exists. By using the monotone convergence theorem (for example, see Loeve, 1977) we also get for 2 ≤ i ≤ n

lim_{λ→-∞} E(Z_{i:n}(λ)) = E[ lim_{λ→-∞} Z_{i:n}(λ) ],

implying that

μ_{i:n}(-∞) = E(X_{i-1:n-1}) = ν_{i-1:n-1};   (5.59)

similarly, for 1 ≤ i ≤ n-1,

lim_{λ→∞} E(Z_{i:n}(λ)) = E[ lim_{λ→∞} Z_{i:n}(λ) ],

implying that

μ_{i:n}(∞) = E(X_{i:n-1}) = ν_{i:n-1}.   (5.60)

Also,

μ_{1:n}(-∞) = -∞  and  μ_{n:n}(∞) = ∞.   (5.61)

Furthermore, upon noting that for fixed x and y

f(x-λ) → 0,  F(x-λ) → 0,  F(y-λ) - F(x-λ) → 0

as λ → ∞, we have from equations (5.4) and (5.6) that

lim_{λ→∞} h_{i:n}(x;λ) = h_{i:n}(x;∞) = f_{i:n-1}(x),  1 ≤ i ≤ n-1,   (5.62)

and

lim_{λ→∞} h_{i,j:n}(x,y;λ) = h_{i,j:n}(x,y;∞) = f_{i,j:n-1}(x,y),  1 ≤ i < j ≤ n-1.   (5.63)

Using a similar argument we also have

lim_{λ→-∞} h_{i:n}(x;λ) = h_{i:n}(x;-∞) = f_{i-1:n-1}(x),  2 ≤ i ≤ n,   (5.64)

and

lim_{λ→-∞} h_{i,j:n}(x,y;λ) = h_{i,j:n}(x,y;-∞) = f_{i-1,j-1:n-1}(x,y),  2 ≤ i < j ≤ n.   (5.65)

Now upon using the Lebesgue dominated convergence theorem (see Loeve, 1977) we obtain

lim_{λ→∞} σ_{i,j:n}(λ) = σ_{i,j:n}(∞) = Cov(X_{i:n-1}, X_{j:n-1}),  1 ≤ i ≤ j ≤ n-1,   (5.66)

and

lim_{λ→-∞} σ_{i,j:n}(λ) = σ_{i,j:n}(-∞) = Cov(X_{i-1:n-1}, X_{j-1:n-1}),  2 ≤ i ≤ j ≤ n.   (5.67)
Next, let us consider a scale-outlier model, i.e., G(x) = F(x/τ) for all x, where τ > 0. In this case we could write Y = τX_n, where X_n is a random variable with pdf f(x) and cdf F(x), independent of the n-1 variates X_1, X_2, ..., X_{n-1}. For convenience let us denote the i'th order statistic in this case by Z_{i:n}^{*}(τ), its pdf by h_{i:n}^{*}(x;τ) and the cdf by H_{i:n}^{*}(x;τ). From equation (5.3) it could then be seen that H_{i:n}^{*}(x;τ) is a decreasing function of τ for fixed positive values of x and an increasing function of τ for fixed negative values of x. The functional behaviour of Z_{i:n}^{*}(τ) as a function of τ has been studied by David and Shu (1978). As before, denoting the realizations of the variables X, Y and Z by x, y and z, and inserting y = τx_n into the ordered sample x_{1:n-1} ≤ x_{2:n-1} ≤ ... ≤ x_{n-1:n-1}, we have for fixed values of x_1, x_2, ..., x_n that

z_{1:n}^{*}(τ) = τx_n        if τx_n ≤ x_{1:n-1}
              = x_{1:n-1}    if τx_n > x_{1:n-1},   (5.68)

for 2 ≤ i ≤ n-1

z_{i:n}^{*}(τ) = x_{i-1:n-1}  if τx_n ≤ x_{i-1:n-1}
              = τx_n          if x_{i-1:n-1} < τx_n ≤ x_{i:n-1}   (5.69)
              = x_{i:n-1}     if τx_n > x_{i:n-1},

and

z_{n:n}^{*}(τ) = x_{n-1:n-1}  if τx_n ≤ x_{n-1:n-1}
              = τx_n          if τx_n > x_{n-1:n-1}.   (5.70)

From equations (5.68) - (5.70), we see that z_{i:n}^{*}(τ) is nondecreasing in τ if x_n > 0 and is nonincreasing in τ if x_n < 0. Moreover, for 2 ≤ i ≤ n-1,

lim_{τ→∞} z_{i:n}^{*}(τ) = z_{i:n}^{*}(∞) = x_{i:n-1}    if x_n > 0
                                          = x_{i-1:n-1}  if x_n < 0.   (5.71)

Now if X is symmetrically distributed about zero, we have from equation (5.71) that for 2 ≤ i ≤ n-1

lim_{τ→∞} E(Z_{i:n}^{*}(τ)) = E[ lim_{τ→∞} Z_{i:n}^{*}(τ) ] = E[ z_{i:n}^{*}(∞) ]
                            = Pr[X_n > 0] E[X_{i:n-1}] + Pr[X_n < 0] E[X_{i-1:n-1}],

which implies

μ_{i:n}^{*}(∞) = (1/2) [ ν_{i-1:n-1} + ν_{i:n-1} ].   (5.72)
In addition, we also have

lim_{τ→∞} E(Z_{1:n}^{*}(τ)) = μ_{1:n}^{*}(∞) = -∞

and

lim_{τ→∞} E(Z_{n:n}^{*}(τ)) = μ_{n:n}^{*}(∞) = ∞.
We shall make use of all these results in the next section in order to examine the bias and the mean square
error of various estimators of the location parameter that are expressible as linear functions of the order
statistics when an unidentified single outlying observation is present in a sample of size n.
5.8. Applications in robustness studies
The robust estimation of the location and scale parameters of symmetric populations, in particular the normal distribution, has been of considerable interest in the recent past. For various symmetric populations,
Crow and Siddiqui (1967) have studied the efficiency of some estimators of the location parameter such as
the median, Winsorized means and trimmed means. Based on Monte Carlo methods, Andrews et al. (1972)
have carried out a similar study for a much larger class of robust estimators of the location parameter that
also includes many adaptive estimators which essentially adapt themselves to some special features of a
given sample. One could also refer to Tiku et al. (1986) for a detailed comparative study of various robust
estimators of the location and scale parameters.
Here we shall restrict our attention to the following estimators of the location parameter that are
based on a sample of size n.
(a) Sample mean:

X̄_n = (1/n) Σ_{i=1}^{n} Z_{i:n};

(b) Trimmed means:

T_n(r) = [1/(n-2r)] Σ_{i=r+1}^{n-r} Z_{i:n};

(c) Winsorized means:

W_n(r) = (1/n) [ Σ_{i=r+2}^{n-r-1} Z_{i:n} + (r+1) (Z_{r+1:n} + Z_{n-r:n}) ];

(d) Modified maximum likelihood estimators:

M_n(r) = (1/m) [ Σ_{i=r+2}^{n-r-1} Z_{i:n} + (1+rβ) (Z_{r+1:n} + Z_{n-r:n}) ],

where m = n - 2r + 2rβ; Tiku (1967, 1980) has given the expression for β while Tiku et al. (1986) have tabulated the values of β for various choices of n and r;

(e) Linearly weighted means:

L_n(r) = [1/(2(n/2 - r)^2)] Σ_{i=1}^{n/2-r} (2i-1) [Z_{r+i:n} + Z_{n-r-i+1:n}]

for even values of n;

(f) Gastwirth mean:

G_n = 0.3 (Z_{[n/3]+1:n} + Z_{n-[n/3]:n}) + 0.2 (Z_{n/2:n} + Z_{n/2+1:n})

for even values of n, where [n/3] denotes the integral part of n/3.
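The estimators (a) - (f) can be computed directly from a sorted sample. The sketch below is ours, not from the text; β for M_n(r) is passed in as a parameter, since Tiku's tabulated values depend on n and r, and L_n(r) and G_n assume even n as stated above.

```python
def sample_mean(z):
    return sum(z) / len(z)

def trimmed_mean(z, r):
    # drop the r smallest and r largest
    n = len(z)
    return sum(z[r:n - r]) / (n - 2 * r)

def winsorized_mean(z, r):
    # replace the r extremes on each side by the nearest retained value
    n = len(z)
    core = sum(z[r + 1:n - r - 1])
    return (core + (r + 1) * (z[r] + z[n - r - 1])) / n

def mml_estimator(z, r, beta):
    # Tiku's modified maximum likelihood estimator; beta from his tables
    n = len(z)
    m = n - 2 * r + 2 * r * beta
    core = sum(z[r + 1:n - r - 1])
    return (core + (1 + r * beta) * (z[r] + z[n - r - 1])) / m

def linearly_weighted_mean(z, r):
    n = len(z)                       # n assumed even
    m = n // 2 - r
    s = sum((2 * i - 1) * (z[r + i - 1] + z[n - r - i]) for i in range(1, m + 1))
    return s / (2 * m * m)

def gastwirth_mean(z):
    n = len(z)                       # n assumed even
    t = n // 3
    return 0.3 * (z[t] + z[n - t - 1]) + 0.2 * (z[n // 2 - 1] + z[n // 2])
```

For a constant sample every one of these location estimators returns that constant, which is a convenient check of the index bookkeeping.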
For the location-outlier model, the estimators considered above may all be written as

M(λ) = Σ_{i=1}^{n} a_i Z_{i:n}(λ).

From equations (5.55) - (5.57) it could be immediately seen that Σ_{i=1}^{n} a_i z_{i:n}(λ) is a nondecreasing continuous function of λ and, in addition, E(M(λ)) = Σ_{i=1}^{n} a_i μ_{i:n}(λ) is an increasing function of λ, with E(M(∞)) = ∞ except when a_n = 0, and E(M(-∞)) = -∞ except when a_1 = 0. From equations (5.58) and (5.59), we get when a_1 = 0 that

E(M(-∞)) = Σ_{i=2}^{n} a_i ν_{i-1:n-1},

and similarly when a_n = 0 we get from equations (5.58) and (5.60) that

E(M(∞)) = Σ_{i=1}^{n-1} a_i ν_{i:n-1}.

Making use of the tables of expected values of normal order statistics under the location-outlier model prepared by David et al. (1977), the bias of various estimators of the mean μ of a normal N(μ,1) population, based on a sample of size n = 10 with one observation being from a normal N(μ+λ,1) distribution, has been computed for some specific values of λ. These are presented in Table 5.1 given below.
Table 5.1
Bias of various estimators of μ for n = 10 when a single observation
is from N(μ+λ,1) and the others from N(μ,1)

                                        λ
Estimator    0.0     0.5      1.0      1.5      2.0      3.0      4.0       ∞
X̄_10        0.0   0.05000  0.10000  0.15000  0.20000  0.30000  0.40000     ∞
T_10(1)      0.0   0.04912  0.09325  0.12870  0.15400  0.17871  0.18470  0.18563
T_10(2)      0.0   0.04869  0.09023  0.12041  0.13904  0.15311  0.15521  0.15538
Med_10       0.0   0.04832  0.08768  0.11381  0.12795  0.13642  0.13723  0.13726
W_10(1)      0.0   0.04938  0.09506  0.13368  0.16298  0.19407  0.20239  0.20377
W_10(2)      0.0   0.04889  0.09156  0.12389  0.14497  0.16217  0.16504  0.16530
M_10(1)      0.0   0.04934  0.09484  0.13311  0.16194  0.19229  0.20037  0.20169
M_10(2)      0.0   0.04886  0.09137  0.12342  0.14418  0.16091  0.16369  0.16394
L_10(1)      0.0   0.04869  0.09024  0.12056  0.13954  0.15459  0.15727  0.15758
L_10(2)      0.0   0.04850  0.08892  0.11700  0.13328  0.14436  0.14576  0.14585
G_10         0.0   0.04847  0.08873  0.11649  0.13237  0.14285  0.14407  0.14414
By looking at the values of the bias of various estimators given in Table 5.1, we get the ordering

X̄_10 ≺ W_10(1) ≺ M_10(1) ≺ T_10(1) ≺ W_10(2) ≺ M_10(2) ≺ L_10(1)
      ≺ T_10(2) ≺ L_10(2) ≺ G_10 ≺ Med_10,   (5.73)

where ≺ denotes "inferior to". It should be noted that the trimmed means have a smaller bias than the corresponding modified maximum likelihood estimators which, in turn, have a smaller bias than the corresponding Winsorized means. Also, as rightly pointed out by David and Shu (1978), the median is more biased than what we may naively have thought it to be. In addition, the estimators based on the 6 central order statistics are seen to be less subject to bias than those based on the 8 central order statistics. This is not surprising as we would expect an estimator omitting the 4 extreme observations (two smallest and two largest) to exclude the single outlier present in the sample with a larger probability as compared to an estimator omitting only the 2 extreme observations (one smallest and one largest). For the case when n = 10 and λ = 2, for example, we have the following values of Pr{rank(Y) = i} = π_i from an extensive set of tables prepared by Milton (1970):

  i      1      2      3      4      5      6      7      8      9     10
  π_i  0.001  0.003  0.005  0.009  0.014  0.023  0.040  0.072  0.159  0.674

From these values we see that an estimator omitting the four extreme observations (two smallest and two largest) excludes the outlier with probability 0.837, while an estimator omitting the two extreme observations (one smallest and one largest) excludes the outlier with probability 0.675 only.
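Milton's probabilities can be approximated from first principles: conditioning on Y = x, rank(Y) = i requires exactly i-1 of the n-1 remaining N(0,1) values to fall below x, so π_i = C(n-1, i-1) ∫ Φ(x)^{i-1} (1-Φ(x))^{n-i} φ(x-λ) dx. A numerical sketch (ours, simple midpoint quadrature):

```python
from math import comb, erf, exp, pi, sqrt

def Phi(x):   # standard normal cdf
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def phi(x):   # standard normal pdf
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)

def rank_probs(n, lam, lo=-12.0, hi=16.0, steps=20000):
    """pi_i = Pr{rank(Y) = i} for one N(lam,1) outlier among n-1 N(0,1)."""
    h = (hi - lo) / steps
    probs = []
    for i in range(1, n + 1):
        c = comb(n - 1, i - 1)
        s = sum(c * Phi(x)**(i - 1) * (1 - Phi(x))**(n - i) * phi(x - lam)
                for x in (lo + (k + 0.5) * h for k in range(steps)))
        probs.append(s * h)
    return probs

p = rank_probs(10, 2.0)
assert abs(sum(p) - 1.0) < 1e-6
assert 0.6 < p[-1] < 0.7   # the outlier is most often the sample maximum
```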
Considering now the expression of E(M²(λ)) and then integrating by parts, we obtain

E(M²(λ)) = 2 ∫_0^∞ x Pr{ |M(λ)| > x } dx.   (5.74)

Assuming f to be a standardized symmetric density function and taking accordingly a_i = a_{n-i+1} for i = 1,2,...,[n/2], we have M(0) to be symmetrically distributed about 0 and also M(-λ) to be distributed exactly as -M(λ). Then

Pr{ |M(λ)| > x } = Pr{ M(λ) > x } + Pr{ M(-λ) > x }

may be expected to be an increasing function of |λ| and, consequently, we have E(M²(λ)) to be an increasing function of λ from equation (5.74). Furthermore, for a_1 = a_n = 0, we obtain

E(M²(±∞)) = lim_{λ→∞} E(M²(λ))
          = E[ ( lim_{λ→∞} Σ_{i=2}^{n-1} a_i Z_{i:n}(λ) )² ]
          = E[ ( Σ_{i=2}^{n-1} a_i X_{i:n-1} )² ]
          = [E(M(∞))]² + Σ_{i=2}^{n-1} Σ_{j=2}^{n-1} a_i a_j Cov[X_{i:n-1}, X_{j:n-1}].
In Table 5.2 we have presented the mean square error of the various estimators considered earlier, whose bias values are given in Table 5.1. Based on these mean square error values, we observe the partial ordering

Med_10 ≺ G_10 ≺ L_10(2) ≺ L_10(1) ≺ T_10(2)   (5.75)

as well as W_10(1) ≺ M_10(1), W_10(2) ≺ M_10(2) and W_10(2) ≺ T_10(1).
Table 5.2
Mean square error of various estimators of μ for n = 10 when a
single observation is from N(μ+λ,1) and the others from N(μ,1)

                                        λ
Estimator    0.0      0.5      1.0      1.5      2.0      3.0      4.0       ∞
X̄_10      0.10000  0.10250  0.11000  0.12250  0.14000  0.19000  0.26000     ∞
T_10(1)    0.10534  0.10791  0.11471  0.12387  0.13285  0.14475  0.14865  0.14942
T_10(2)    0.11331  0.11603  0.12297  0.13132  0.13848  0.14580  0.14730  0.14745
Med_10     0.13833  0.14161  0.14964  0.15852  0.16524  0.17072  0.17146  0.17150
W_10(1)    0.10437  0.10693  0.11403  0.12405  0.13469  0.15039  0.15627  0.15755
W_10(2)    0.11133  0.11402  0.12106  0.12995  0.13805  0.14713  0.14926  0.14950
M_10(1)    0.10432  0.10688  0.11396  0.12385  0.13430  0.14950  0.15513  0.15581
M_10(2)    0.11125  0.11395  0.12097  0.12974  0.13770  0.14649  0.14853  0.14876
L_10(1)    0.11371  0.11644  0.12337  0.13169  0.13882  0.14626  0.14797  0.14820
L_10(2)    0.12097  0.12386  0.13105  0.13933  0.14598  0.15206  0.15310  0.15318
G_10       0.12256  0.12549  0.13276  0.14111  0.14777  0.15376  0.15472  0.15479
Realize the dilemma we are in at this juncture, as the ordering in (5.75) based on the mean square error is almost a reverse ordering of the estimators given in (5.73) based on the bias. Table 5.2 also reveals that the modified maximum likelihood estimators perform better than the Winsorized means, which in turn perform better than the trimmed means, when λ is small. This is not surprising since the modified maximum likelihood estimators are almost best linear unbiased estimators (and are also almost the maximum likelihood estimators) based on the n-2r central order statistics. For large values of λ, however, the trimmed mean T_10(2) and the modified maximum likelihood estimator M_10(2) both remain optimal. For more details on some properties of these estimators and their applications in developing some robust inference procedures, one could refer to Andrews et al. (1972) and Tiku et al. (1986).
Similarly, for the scale-outlier model we may write the location estimators as

M*(τ) = Σ_{i=r+1}^{n-r} a_i Z_{i:n}^{*}(τ),

which, when a_i = a_{n+1-i}, are clearly symmetrically distributed about 0 for all values of τ. As a result we have E(M*(τ)) = 0 for all values of τ. Moreover, by comparing equation (5.71) with equation (5.58) we immediately see that the limiting behaviour of M*(τ), given X_n > 0, as τ → ∞ corresponds to that of M(λ) as λ → ∞, and in a similar way the limiting behaviour of M*(τ), given X_n < 0, as τ → ∞ corresponds to that of M(λ) as λ → -∞. We have, therefore,

lim_{τ→∞} E[M*²(τ)] = E[M*²(∞)]
                    = Pr(X_n > 0) E[M²(∞)] + Pr(X_n < 0) E[M²(-∞)]
                    = (1/2) { E[M²(∞)] + E[M²(-∞)] }
                    = E[M²(∞)].   (5.76)

Under the scale-outlier model the estimators of location considered earlier are all unbiased for all
values of τ. By making use of the table of variances and covariances of normal order statistics under the scale-outlier model prepared by David et al. (1977), the variance of various estimators of the mean μ of a normal N(μ,1) population, based on a sample of size n = 10 with one observation being from a normal N(μ,τ²) distribution, has been computed for some specific choices of τ. These values are presented in Table 5.3.

Table 5.3
Variance of various estimators of μ for n = 10 when a
single observation is from N(μ,τ²) and the others from N(μ,1)

                                   τ
Estimator    0.5      1.0      2.0      3.0      4.0       ∞
X̄_10      0.09250  0.10000  0.13000  0.18000  0.25000     ∞
T_10(1)    0.09491  0.10534  0.12133  0.12955  0.13417  0.14942
T_10(2)    0.09953  0.11331  0.12773  0.13389  0.13717  0.14745
Med_10     0.11728  0.13833  0.15375  0.15953  0.16249  0.17150
W_10(1)    0.09571  0.10437  0.12215  0.13221  0.13801  0.15754
W_10(2)    0.09972  0.11133  0.12664  0.13365  0.13745  0.14950
M_10(1)    0.09548  0.10432  0.12187  0.13171  0.13735  0.15581
M_10(2)    0.09940  0.11125  0.12638  0.13328  0.13699  0.14876
L_10(1)    0.09934  0.11371  0.12815  0.13436  0.13769  0.14820
L_10(2)    0.10432  0.12097  0.13531  0.14101  0.14398  0.15318
G_10       0.10573  0.12256  0.13703  0.14270  0.14565  0.15479

From these values we observe that the partial ordering in (5.75) still holds, except for the "inlier" situation τ = 0.5 when T_10(2) is inferior to L_10(1). We also observe that W_10(1) ≺ M_10(1), W_10(2) ≺ M_10(2), W_10(2) ≺ T_10(1) and T_10(2) ≺ M_10(2) except when τ is very large. These orderings, in general, agree with those based on Table 5.2. As David and Shu (1978) rightly pointed out, this general agreement is less surprising when we remember that there is identity of results not only in the null case (λ = 0 or τ = 1) but also for the limiting case (λ = τ = ∞), the latter by equation (5.76).
Making use of the results given in Section 5.6 and the explicit expressions for the means, variances and covariances of order statistics from a single scale-outlier exponential model, Balakrishnan and Ambagaspitiya (1988) have evaluated the means, variances and covariances of order statistics from a single scale-outlier double exponential model. The model they have considered is that a sample of size n consists of n-1 observations from a Laplace population with pdf

f(x) = [1/(2σ)] e^{-|x-μ|/σ},   -∞ < x < ∞,  -∞ < μ < ∞,  σ > 0,

while one observation is from a population with pdf

g(x) = [1/(2aσ)] e^{-|x-μ|/(aσ)},   -∞ < x < ∞,  σ > 0,  a > 0.

After noting that the various estimators of the location parameter μ considered earlier are all unbiased, Balakrishnan and Ambagaspitiya (1988) have studied the performance of these estimators by computing their variance for various choices of a and different sample sizes. The median, the linearly weighted mean and the Gastwirth mean all perform very efficiently in this case, and this is to be expected as the double exponential distribution is a symmetric long-tailed distribution. In addition, they have also noted that the trimmed means do better than the corresponding modified maximum likelihood estimators which in turn perform better than the corresponding Winsorized means.
Exercises
1. Suppose X_1, X_2, ..., X_n are independent random variables with X_i (i = 1,2,...,n) having pdf f_i(x) and cdf F_i(x). Then show that the cdf of X_{i:n} may be written as

   H_{i:n}(x) = Σ_{r=i}^{n} Σ_{S_r} Π_{k=1}^{r} F_{j_k}(x) Π_{k=r+1}^{n} { 1 - F_{j_k}(x) },

   where the summation S_r extends over all permutations (j_1, j_2, ..., j_n) of 1,2,...,n for which j_1 < j_2 < ... < j_r and j_{r+1} < j_{r+2} < ... < j_n.
2. (Sen (1970)). In the above problem, denoting the collection of distribution functions (F_1, F_2, ..., F_n) by 𝓕, the average cdf by F̄ = (1/n) Σ_{r=1}^{n} F_r and its unique quantile of order p by ξ_p, i.e., F̄(ξ_p) = p, show that for i = 2,3,...,n-1 and x ≤ ξ_{(i-1)/n} < ξ_{i/n} ≤ y,

   Pr{ x < X_{i:n} ≤ y | 𝓕 } ≥ Pr{ x < X_{i:n} ≤ y | F̄ },

   where the equality holds only if F_1 = F_2 = ... = F_n = F̄ at both the points x and y. Also show, for all x, that

   Pr{ X_{1:n} ≤ x | 𝓕 } ≥ Pr{ X_{1:n} ≤ x | F̄ }

   and

   Pr{ X_{n:n} ≤ x | 𝓕 } ≤ Pr{ X_{n:n} ≤ x | F̄ },

   where the equalities hold only if F_1 = F_2 = ... = F_n = F̄ at the point x. (Hint: You may use a result of Hoeffding (1956) on the distribution of successes in n independent trials.)
3. As in Problem 1, let X_{1:n} ≤ X_{2:n} ≤ ... ≤ X_{n:n} denote the order statistics obtained from n independent random variables X_i (i = 1,2,...,n) having pdf f_i(x) and cdf F_i(x). Then show that:

   (a) the pdf of the largest order statistic X_{n:n} is given by

   h_{n:n}(x) = { Π_{r=1}^{n} F_r(x) } Σ_{r=1}^{n} { f_r(x) / F_r(x) };

   (b) the cdf of the sample range W_n = X_{n:n} - X_{1:n} is given by

   Pr(W_n ≤ w) = Σ_{r=1}^{n} ∫_{-∞}^{∞} f_r(x) Π_{s=1, s≠r}^{n} { F_s(x+w) - F_s(x) } dx.
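Part (a) of Exercise 3 is easy to sanity-check numerically: Π_r F_r(x) is the cdf of X_{n:n}, so the claimed pdf must agree with its derivative. An illustrative check (ours), taking F_r(x) = x^r on (0,1) with pdf f_r(x) = r x^{r-1}:

```python
def H(x, n):
    # cdf of the maximum: the product of the F_r(x) = x^r
    out = 1.0
    for r in range(1, n + 1):
        out *= x**r
    return out

def h(x, n):
    # claimed pdf: (prod F_r) * sum f_r / F_r, with f_r(x) = r x^(r-1)
    return H(x, n) * sum((r * x**(r - 1)) / x**r for r in range(1, n + 1))

# compare with a central-difference derivative of the cdf
n, x, eps = 4, 0.6, 1e-6
numeric = (H(x + eps, n) - H(x - eps, n)) / (2 * eps)
assert abs(h(x, n) - numeric) < 1e-5
```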
4. (Cohn et al. (1960)). Suppose X_ij (i = 1,2,...,k; j = 1,2,...,n) are k independent random samples of size n, with the observations of the i'th sample having pdf f_i(x) and cdf F_i(x), i = 1,2,...,k. Then show that the maxima from the k samples are the k largest of the kn random variables with probability

   n^k ∫_{-∞}^{∞} { Π_{m=1}^{k} F_m^{n-1}(x) } Σ_{r=1}^{k} { Π_{s=1, s≠r}^{k} (1 - F_s(x)) } f_r(x) dx.
5. (Kelleher and Walsh (1972)). In the sample X_{1:n} ≤ X_{2:n} ≤ ... ≤ X_{n:n} of ordered continuous variables, suppose X_{n:n} is an outlier in the sense that it arises from a different population than that giving rise to the remainder of the sample. Show then that the confidence coefficient of the interval (X_{i:n}, X_{n-i+1:n}), i = 2,3,...,[n/2], for the median in samples of n still equals 2^{-n} Σ_{r=i}^{n-i} C(n, r).
6. (Conover (1965); David (1966)). Suppose k mutually independent random samples of size n, each drawn from a continuous population with cdf F(x), are ordered on the basis of the largest observation in each sample. Further, suppose Y_ij (i = 1,2,...,n; j = 1,2,...,k) denotes the i'th variate in order of magnitude in the sample whose largest member Y_1j has rank j among the k maxima Y_11, Y_12, ..., Y_1k. Then show that

   Pr(Y_ij < x) = Σ_{p=0}^{j-1} C(k, p) { 1 - F^n(x) }^p { F^n(x) }^{k-p}

     + Σ_{p=0}^{j-1} Σ_{q=1}^{i-1} Σ_{r=0}^{q-1} j C(k, j) q C(n, q) C(j-1, p) C(q-1, r) (-1)^{q-r-p} [ { F(x) }^{n-1-r} - { F(x) }^{nk-np} ] / [ nk - np + 1 - n + r ],

   where the triple summation is zero when i = 1.
7. (Neyman and Scott (1971)). Let X_{n:n} be called a γ-outlier on the right if X_{n:n} > X_{n-1:n} + γ(X_{n-1:n} - X_{1:n}), γ > 0. Let us also denote the probability that a sample of n observations from a continuous population with pdf f(x) and cdf F(x) will contain such a γ-outlier on the right by Π(γ,n,F). Then prove that

   Π(γ,n,F) = n(n-1) ∫∫_{x<y} { F[(y + γx)/(1 + γ)] - F(x) }^{n-2} f(y) f(x) dy dx.

   Show also that, for fixed n > 2, the above probability Π(γ,n,F) is a decreasing function of γ which tends to 0 as γ → ∞.

8.
(Vaughan and Venables (1972)). Suppose X_1, X_2, ..., X_n are independently distributed, X_j having density function f_j(x) and cdf F_j(x). Consider a sampling process where one value is taken from each of the above n distributions and the sample ordered so that X_{1:n} ≤ X_{2:n} ≤ ... ≤ X_{n:n}.

   a. Show that the pdf of X_{k_1:n}, 1 ≤ k_1 ≤ n, is given by

      h_{k_1:n}(x_1) = { (k_1 - 1)! (n - k_1)! }^{-1} |A_1|_+,

      where A_1 is the n×n matrix consisting of k_1 - 1 identical rows (F_1(x_1), F_2(x_1), ..., F_n(x_1)), one row (f_1(x_1), f_2(x_1), ..., f_n(x_1)), and n - k_1 identical rows (1 - F_1(x_1), 1 - F_2(x_1), ..., 1 - F_n(x_1)), and |A|_+ denotes the permanent of a square matrix A. The permanent of A is defined like the determinant, except that all signs are positive; for details, see Aitken (1939).

   b. Similarly, show that the joint pdf of X_{k_1:n} and X_{k_2:n}, 1 ≤ k_1 < k_2 ≤ n, for x_1 < x_2, is given by

      h_{k_1,k_2:n}(x_1,x_2) = { (k_1 - 1)! (k_2 - k_1 - 1)! (n - k_2)! }^{-1} |A_2|_+,

      where A_2 is the n×n matrix consisting of k_1 - 1 rows (F_1(x_1), ..., F_n(x_1)), one row (f_1(x_1), ..., f_n(x_1)), k_2 - k_1 - 1 rows (F_1(x_2) - F_1(x_1), ..., F_n(x_2) - F_n(x_1)), one row (f_1(x_2), ..., f_n(x_2)), and n - k_2 rows (1 - F_1(x_2), ..., 1 - F_n(x_2)).

   c. Generalize these results and derive the joint density of any subset X_{k_1:n}, X_{k_2:n}, ..., X_{k_p:n} of order statistics, where 1 ≤ k_1 < k_2 < ... < k_p ≤ n.

   d. In particular, show that the density of the full joint distribution of X_{1:n}, X_{2:n}, ..., X_{n:n} at x_1, x_2, ..., x_n is given by |B|_+, the permanent of the n×n matrix B with (i,j)'th element f_j(x_i).

   e. For the special case when there is exactly one outlier in the sample, that is, F_1 = F_2 = ... = F_{n-1} = F and F_n = G, show that the densities in (a) and (b) reduce to the expressions in equations (5.4) and (5.6), respectively.
9. (Balakrishnan (1987b)). For a single-outlier model, show for n ≥ 2 that

   Σ_{i=1}^{n} (1/i) μ_{i:n}^{(k)} = Σ_{i=1}^{n} (1/i) ν_{i:n}^{(k)} + (1/n) Σ_{i=1}^{n} [ μ_{1:i}^{(k)} - ν_{1:i}^{(k)} ]

     = (1/n) Σ_{i=1}^{n} μ_{1:i}^{(k)} + Σ_{i=1}^{n-1} [ 1/i - 1/n ] ν_{1:i}^{(k)}

   and

   Σ_{i=1}^{n} [1/(n-i+1)] μ_{i:n}^{(k)} = Σ_{i=1}^{n} [1/(n-i+1)] ν_{i:n}^{(k)} + (1/n) Σ_{i=1}^{n} [ μ_{i:i}^{(k)} - ν_{i:i}^{(k)} ]

     = (1/n) Σ_{i=1}^{n} μ_{i:i}^{(k)} + Σ_{i=1}^{n-1} [ 1/i - 1/n ] ν_{i:i}^{(k)}.

   In particular, by setting G(x) = F(x) in the above results, deduce the identities given in Exercise 4 of Chapter 2.
10. (Balakrishnan (1988a)). For a single-outlier model, show that

    Σ_{i=1}^{n-1} μ_{i,i+1:n} + Σ_{j=2}^{n} C(n, j) μ_{1,j:j}

      = Σ_{j=1}^{n-1} { C(n, j) ν_{1:n-j} μ_{j:j} + C(n, j) ν_{j:j} μ_{1:n-j} } - Σ_{j=2}^{n-1} C(n, j) ν_{1,j:j}.
11. Assuming X_1, X_2, ..., X_{n-1} to be a random sample from a standard exponential distribution with pdf f(x) = e^{-x}, 0 ≤ x < ∞, and Y to be an independent random variable with pdf g(x) = (1/a) e^{-x/a}, x ≥ 0, a > 0, and then denoting by Z_{1:n} ≤ Z_{2:n} ≤ ... ≤ Z_{n:n} the order statistics obtained from these n variables, derive E(Z_{i:n}), Var(Z_{i:n}) and Cov(Z_{i:n}, Z_{j:n}).
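While Exercise 11 asks for closed-form expressions, a quick Monte Carlo sketch (ours) provides reference values against which such derivations can be checked; for a = 1 the model collapses to an iid standard exponential sample, where E(Z_{i:n}) = Σ_{r=n-i+1}^{n} 1/r is known exactly.

```python
import random

def mc_moments(n, a, reps=200_000, seed=1):
    """Monte Carlo E(Z_{i:n}) for n-1 Exp(1) variables plus one outlier
    that is exponential with mean a (the scale-outlier model above)."""
    rng = random.Random(seed)
    sums = [0.0] * n
    for _ in range(reps):
        z = sorted([rng.expovariate(1.0) for _ in range(n - 1)]
                   + [rng.expovariate(1.0 / a)])   # expovariate takes the rate
        for i in range(n):
            sums[i] += z[i]
    return [s / reps for s in sums]

n = 5
est = mc_moments(n, a=1.0)
# iid check: E(Z_{i:n}) = sum_{r=n-i+1}^{n} 1/r for standard exponential
exact = [sum(1.0 / r for r in range(n - i, n + 1)) for i in range(n)]
for e, t in zip(est, exact):
    assert abs(e - t) < 0.02
```

The same routine with a ≠ 1 gives simulated targets for the outlier-model moments requested in the exercise.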
12. Assuming Y to be an independent random variable with pdf g(x) = e^{-(x-μ)}, x ≥ μ, once again derive E(Z_{i:n}), Var(Z_{i:n}) and Cov(Z_{i:n}, Z_{j:n}).
13. (Balakrishnan and Ambagaspitiya (1988)). Making use of the expressions of means, variances and covariances of order statistics derived in Exercise 11 in Relations 5.21 and 5.22, compute the means, variances and covariances of order statistics from a single scale-outlier double exponential model, viz.,

    X_1, ..., X_{n-1} with pdf (1/2) e^{-|x|},  -∞ < x < ∞,

    and

    Y with pdf (1/2a) e^{-|x|/a},  -∞ < x < ∞,  a > 0,

    for n = 10 and 20, and a = 0.5, 1.0(1.0)10.0. Making use of these values, compare the efficiency of the various estimators of the location parameter considered in Section 8.
14. (Smith and Tong (1983); David (1986)). Let x_r = y_r + z_r, r = 1,2,...,n, and write x_(1) ≤ x_(2) ≤ ... ≤ x_(n) or x_[1] ≥ x_[2] ≥ ... ≥ x_[n], and likewise for y_r and z_r. Then show, for 1 ≤ i ≤ n, that

    x_[i] ≤ min_{r=1,2,...,i} [ y_[r] + z_[i+1-r] ]

    and

    x_(i) ≥ max_{r=1,2,...,i} [ y_(r) + z_(i+1-r) ].

    Thence, derive the inequality
15. (David (1986)). For a single location-outlier model, defining z_r = B > 0 for some unknown value of r and z_r = 0 otherwise, and making use of the results in Exercise 14, show that for i = 1,2,...,n-1

    E[Y_(i)] ≤ E[X_(i)] ≤ min[ E(Y_(i+1)), E(Y_(i)) + B ],

    and for i = n

    max[ E(Y_(n)), E(Y_(1)) + B ] ≤ E(X_(n)) ≤ E(Y_(n)) + B.
16. (Smith and Tong (1983); David (1986)). Let x̃_(i), i = 1,2,...,n, denote the n sums y_(i) + z_(n+1-i) arranged in increasing order of magnitude. Further, let ℓ be a convex linear function of the ordered x_i, viz., ℓ = Σ_{i=1}^{n} c_i x_(i) with c_1 ≤ c_2 ≤ ... ≤ c_n. Then show that for c_1 ≥ 0

    Σ_{i=1}^{n} c_i x̃_(i) ≤ ℓ ≤ Σ_{i=1}^{n} c_i [ y_(i) + z_(i) ].

    Also, show that the above inequality continues to hold if c_m < 0, c_{m+1} ≥ 0, m = 1,2,...,n.
17. Using the notation in Section 5.1, without assuming absolute continuity, verify that
$$i\, H_{i+1:n}(x) + (n-i)\, H_{i:n}(x) = (n-1)\, H_{i:n-1}(x) + F_{i:n-1}(x)$$
for $1 \le i \le n-1$. [Hint: Use equations (5.3) and (2.68).] Due to the above result we may note that Relation 5.2 continues to hold without the assumption of absolute continuity.
18. Denoting the joint c.d.f. of $Z_{r:n}$ and $Z_{s:n}$ ($1 \le r < s \le n$) by $H_{r,s:n}(x,y)$, $x \le y$, show that
$$(i-1)\, H_{i,j:n}(x,y) + (j-i)\, H_{i-1,j:n}(x,y) + (n-j+1)\, H_{i-1,j-1:n}(x,y) = (n-1)\, H_{i-1,j-1:n-1}(x,y) + F_{i-1,j-1:n-1}(x,y)$$
for $2 \le i < j \le n$. Due to this result we may note that Relation 5.9 continues to hold without the assumption of absolute continuity.
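Since $H_{i:n}(x) = P(\text{at least } i \text{ of the } Z\text{'s fall at or below } x)$, the single-moment recurrence of Exercise 17 can be checked exactly at any fixed $x$ by computing the Poisson-binomial distribution of the count of variables below $x$. A sketch for the single-outlier case (the choices $F$ standard exponential and $G$ exponential with mean 2 are illustrative):

```python
import math

def cdf_kth_smallest(i, ps):
    """cdf of the i-th order statistic at a fixed x, i.e. P(at least i of
    the independent indicators {Z_k <= x}, with probs ps = F_k(x), occur),
    via a Poisson-binomial dynamic program."""
    dist = [1.0] + [0.0] * len(ps)          # dist[m] = P(count = m)
    for p in ps:
        for m in range(len(dist) - 1, 0, -1):
            dist[m] = dist[m] * (1 - p) + dist[m - 1] * p
        dist[0] *= (1 - p)
    return sum(dist[i:])

F = lambda x: 1 - math.exp(-x)              # standard exponential cdf
G = lambda x: 1 - math.exp(-x / 2)          # outlier cdf (mean 2, illustrative)

n = 5
for x in (0.3, 1.0, 2.5):
    ps_n   = [F(x)] * (n - 1) + [G(x)]      # outlier sample of size n
    ps_n1  = [F(x)] * (n - 2) + [G(x)]      # outlier sample of size n-1
    ps_iid = [F(x)] * (n - 1)               # i.i.d. sample of size n-1
    for i in range(1, n):
        lhs = i * cdf_kth_smallest(i + 1, ps_n) + (n - i) * cdf_kth_smallest(i, ps_n)
        rhs = (n - 1) * cdf_kth_smallest(i, ps_n1) + cdf_kth_smallest(i, ps_iid)
        assert abs(lhs - rhs) < 1e-9
```

The right-hand side reflects that dropping one of the $n-1$ variables with cdf $F$ leaves an outlier sample of size $n-1$, while dropping the outlier leaves an i.i.d. sample of size $n-1$.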
19. Let $X_1, X_2, \ldots, X_n$ be i.i.d. from an arbitrary distribution function $F$. Let $U_1, U_2, \ldots, U_n$ be i.i.d. Uniform$(0, B)$, independent of the $X$'s. For $i = 1,2,\ldots,n$, define $Y_i = X_i + U_i$. Prove that for $1 \le i \le n$, $Y_{i:n} \to X_{i:n}$ as $B \to 0$. In addition, if the $X_i$'s have a finite $k$-th moment, show that $E(Y_{i:n}^k) \to E(X_{i:n}^k)$ as $B \to 0$. Since the $Y_i$'s are absolutely continuous, this shows that results which do not refer to densities continue to hold for general distributions.
20. (Balakrishnan (1988c)). (i) Let $X_1, X_2, \ldots, X_n$ be independent variables with $X_i$ ($i = 1,2,\ldots,n$) having pdf $f_i(x)$ and cdf $F_i(x)$. Let $X_{1:n} \le X_{2:n} \le \cdots \le X_{n:n}$ be the order statistics obtained from the realizations of $X_1, X_2, \ldots, X_n$. Then the density of $X_{i:n}$ ($1 \le i \le n$), viz., $h_{i:n}(x)$, and the joint density of $X_{i:n}$ and $X_{j:n}$ ($1 \le i < j \le n$), viz., $h_{i,j:n}(x,y)$, are as given in Exercise 8.
Now let us denote $h_{i:n-m}^{[r_1, r_2, \ldots, r_m]}(x)$, $1 \le i \le n-m$, for the density function of the $i$-th order statistic in a sample of size $n-m$ obtained by dropping $X_{r_1}, X_{r_2}, \ldots, X_{r_m}$ from the original set of $n$ variables; similarly, let us denote $h_{i,j:n-1}^{[r]}(x,y)$, $1 \le i < j \le n-1$, for the joint density of the $i$-th and $j$-th order statistics in a sample of size $n-1$ obtained by dropping $X_r$ from the original set of $n$ variables. Then show that for $1 \le i \le n-1$,
$$i\, h_{i+1:n}(x) + (n-i)\, h_{i:n}(x) = \sum_{r=1}^{n} h_{i:n-1}^{[r]}(x),$$
and for $2 \le i < j \le n$,
$$(i-1)\, h_{i,j:n}(x,y) + (j-i)\, h_{i-1,j:n}(x,y) + (n-j+1)\, h_{i-1,j-1:n}(x,y) = \sum_{r=1}^{n} h_{i-1,j-1:n-1}^{[r]}(x,y).$$
(ii) For the $p$-outlier model, that is, $F_1 = F_2 = \cdots = F_{n-p} = F$ and $F_{n-p+1} = \cdots = F_n = G$, deduce the relations
$$i\, h_{i+1:n}(x) + (n-i)\, h_{i:n}(x) = (n-p)\, h_{i:n-1}^{[F]}(x) + p\, h_{i:n-1}^{[G]}(x), \quad 1 \le i \le n-1,$$
and
$$(i-1)\, h_{i,j:n}(x,y) + (j-i)\, h_{i-1,j:n}(x,y) + (n-j+1)\, h_{i-1,j-1:n}(x,y) = (n-p)\, h_{i-1,j-1:n-1}^{[F]}(x,y) + p\, h_{i-1,j-1:n-1}^{[G]}(x,y), \quad 2 \le i < j \le n,$$
where $h_{i:n-1}^{[F]}(x)$ and $h_{i:n-1}^{[G]}(x)$ are the density functions of the $i$-th order statistic in a sample of size $n-1$ from the $p$-outlier model and the $(p-1)$-outlier model, respectively, and similarly for the joint density functions.
(iii) Denote
$$S_{1:n-m}(x) = \sum_{1 \le r_1 < \cdots < r_m \le n} h_{1:n-m}^{[r_1, \ldots, r_m]}(x)$$
and
$$S_{n-m:n-m}(x) = \sum_{1 \le r_1 < \cdots < r_m \le n} h_{n-m:n-m}^{[r_1, \ldots, r_m]}(x),$$
with $S_{1:n}(x) \equiv h_{1:n}(x)$ and $S_{n:n}(x) \equiv h_{n:n}(x)$. Then, by repeated application of the first relation in (i), show that
$$h_{i:n}(x) = \sum_{j=i}^{n} (-1)^{j-i} \binom{j-1}{i-1} S_{j:j}(x), \quad 1 \le i \le n-1,$$
and
$$h_{i:n}(x) = \sum_{j=n-i+1}^{n} (-1)^{j-n+i-1} \binom{j-1}{n-i} S_{1:j}(x), \quad 2 \le i \le n.$$
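At the cdf level, this representation is the classical "at least $i$ of $n$ events" inclusion-exclusion identity, $P(X_{i:n} \le x) = \sum_{j=i}^{n} (-1)^{j-i} \binom{j-1}{i-1} \sum_{|A|=j} \prod_{k \in A} F_k(x)$, which can be verified exactly for small $n$; the cdf values below are illustrative:

```python
import math
from itertools import combinations

def at_least(i, ps):
    """P(at least i of independent events with probabilities ps occur),
    computed exactly by a Poisson-binomial dynamic program."""
    dist = [1.0] + [0.0] * len(ps)
    for p in ps:
        for m in range(len(dist) - 1, 0, -1):
            dist[m] = dist[m] * (1 - p) + dist[m - 1] * p
        dist[0] *= (1 - p)
    return sum(dist[i:])

# cdf values F_k(x) at one fixed x for n = 5 heterogeneous variables
ps = [0.15, 0.30, 0.45, 0.60, 0.80]
n = len(ps)

for i in range(1, n + 1):
    # sum over subsets A of size j of prod_{k in A} F_k(x) plays the
    # role of S_{j:j} (the cdf of the subset maximum, summed over subsets)
    rhs = sum(
        (-1) ** (j - i) * math.comb(j - 1, i - 1)
        * sum(math.prod(ps[k] for k in A) for A in combinations(range(n), j))
        for j in range(i, n + 1)
    )
    assert abs(at_least(i, ps) - rhs) < 1e-12
```

Differentiating the cdf identity (when densities exist) gives the first displayed relation above.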
21. (Balakrishnan, 1989a). (i) Let $X_1, X_2, \ldots, X_n$ be independent random variables with $X_i$ ($i = 1,2,\ldots,n$) having pdf $f_i(x)$ and cdf $F_i(x)$. Then, by making use of the relations given in Exercise 20, show that
$$(i-1)\, \sigma_{i,j:n} + (j-i)\, \sigma_{i-1,j:n} + (n-j+1)\, \sigma_{i-1,j-1:n} = \sum_{r=1}^{n} \left\{ \sigma_{i-1,j-1:n-1}^{[r]} + \left[ \mu_{i-1:n-1}^{[r]} - \mu_{i-1:n} \right] \left[ \mu_{j-1:n-1}^{[r]} - \mu_{j:n} \right] \right\}$$
for $2 \le i < j \le n$, where $\sigma_{i,j:n}$ and $\sigma_{i,j:n-1}^{[r]}$ denote the covariances of the $i$-th and $j$-th order statistics in samples of size $n$ and $n-1$ (with $X_r$ dropped), respectively.
(ii) For the $p$-outlier model, that is, $F_1 = F_2 = \cdots = F_{n-p} = F$ and $F_{n-p+1} = \cdots = F_n = G$, deduce the relation
$$(i-1)\, \sigma_{i,j:n} + (j-i)\, \sigma_{i-1,j:n} + (n-j+1)\, \sigma_{i-1,j-1:n} = (n-p) \left\{ \sigma_{i-1,j-1:n-1}^{[F]} + \left[ \mu_{i-1:n-1}^{[F]} - \mu_{i-1:n} \right] \left[ \mu_{j-1:n-1}^{[F]} - \mu_{j:n} \right] \right\} + p \left\{ \sigma_{i-1,j-1:n-1}^{[G]} + \left[ \mu_{i-1:n-1}^{[G]} - \mu_{i-1:n} \right] \left[ \mu_{j-1:n-1}^{[G]} - \mu_{j:n} \right] \right\},$$
where, as before, $[F]$ and $[G]$ denote the quantities in a sample of size $n-1$ from the $p$-outlier model and the $(p-1)$-outlier model, respectively.
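The covariance recurrence in Exercise 21(i) holds for arbitrary independent random variables, so it can be verified exactly by enumeration when the $X_i$ have small discrete supports. A sketch (the three distributions and the choice $(i,j) = (2,3)$, $n = 3$ are illustrative; 0-based lists hold the order-statistic moments):

```python
from itertools import product

# Three independent discrete variables, each given as (values, probs)
vars3 = [
    ([0.0, 1.0, 3.0], [0.2, 0.5, 0.3]),
    ([-1.0, 2.0],     [0.4, 0.6]),
    ([0.5, 1.5, 4.0], [0.3, 0.3, 0.4]),
]

def moments(variables):
    """Exact means mu[i-1] and covariances cov[i-1][j-1] of the order
    statistics of independent discrete variables, by full enumeration."""
    m = len(variables)
    mu = [0.0] * m
    mu2 = [[0.0] * m for _ in range(m)]
    for outcome in product(*[range(len(v[0])) for v in variables]):
        p, vals = 1.0, []
        for (support, probs), idx in zip(variables, outcome):
            p *= probs[idx]
            vals.append(support[idx])
        vals.sort()
        for i in range(m):
            mu[i] += p * vals[i]
            for j in range(m):
                mu2[i][j] += p * vals[i] * vals[j]
    cov = [[mu2[i][j] - mu[i] * mu[j] for j in range(m)] for i in range(m)]
    return mu, cov

n = len(vars3)
mu_n, cov_n = moments(vars3)
i, j = 2, 3                                   # check the relation at (i, j)

lhs = ((i - 1) * cov_n[i - 1][j - 1]
       + (j - i) * cov_n[i - 2][j - 1]
       + (n - j + 1) * cov_n[i - 2][j - 2])

rhs = 0.0
for r in range(n):
    reduced = vars3[:r] + vars3[r + 1:]       # drop the (r+1)-th variable
    mu_r, cov_r = moments(reduced)
    rhs += (cov_r[i - 2][j - 2]
            + (mu_r[i - 2] - mu_n[i - 2]) * (mu_r[j - 2] - mu_n[j - 1]))

assert abs(lhs - rhs) < 1e-10
```

The supports were chosen with no shared values, so ties between variables never occur and the enumeration mirrors the absolutely continuous case.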
22. (Balakrishnan, 1989b). Let $X_{1:n} \le X_{2:n} \le \cdots \le X_{n:n}$ denote the order statistics obtained from $n$ independent absolutely continuous random variables $X_i$ ($i = 1,2,\ldots,n$), with $X_i$ having pdf $f_i(x)$ and cdf $F_i(x)$. Let us denote the single and product moments of these order statistics by $\mu_{i:n}^{(k)}$ ($1 \le i \le n$; $k \ge 1$) and $\mu_{i,j:n}$ ($1 \le i < j \le n$). Let the density functions $f_i(x)$ all be symmetric about $0$. Then, for $x > 0$, let
$$G_i(x) = 2 F_i(x) - 1 \quad \text{and} \quad g_i(x) = 2 f_i(x);$$
that is, the density functions $g_i(x)$, $i = 1,2,\ldots,n$, are obtained by folding the density functions $f_i(x)$ at zero. Let $Y_{1:n} \le Y_{2:n} \le \cdots \le Y_{n:n}$ denote the order statistics obtained from $n$ independent absolutely continuous random variables $Y_i$ ($i = 1,2,\ldots,n$), with $Y_i$ having pdf $g_i(x)$ and cdf $G_i(x)$. Further, let us denote $\nu_{i:n-\ell}^{(k)[r_1, \ldots, r_\ell]}$ for the $k$-th single moment of $Y_{i:n-\ell}^{[r_1, \ldots, r_\ell]}$ and $\nu_{i,j:n-\ell}^{[r_1, \ldots, r_\ell]}$ for the product moment of $Y_{i:n-\ell}^{[r_1, \ldots, r_\ell]}$ and $Y_{j:n-\ell}^{[r_1, \ldots, r_\ell]}$, where $Y_{i:n-\ell}^{[r_1, \ldots, r_\ell]}$ denotes the $i$-th order statistic in a sample of size $n-\ell$ obtained by dropping $Y_{r_1}, Y_{r_2}, \ldots, Y_{r_\ell}$ from the original set of $n$ variables $Y_1, Y_2, \ldots, Y_n$.
Then show that:
(i) For $1 \le i \le n$ and $k = 1,2,\ldots$,
$$\mu_{i:n}^{(k)} = 2^{-n} \left\{ \sum_{\ell=0}^{i-1} \; \sum_{1 \le r_1 < \cdots < r_\ell \le n} \nu_{i-\ell:n-\ell}^{(k)[r_1, \ldots, r_\ell]} + (-1)^k \sum_{\ell=i}^{n} \; \sum_{1 \le r_1 < \cdots < r_{n-\ell} \le n} \nu_{\ell-i+1:\ell}^{(k)[r_1, \ldots, r_{n-\ell}]} \right\};$$
(ii) For $1 \le i < j \le n$,
$$\mu_{i,j:n} = 2^{-n} \left\{ \sum_{\ell=0}^{i-1} \; \sum_{1 \le r_1 < \cdots < r_\ell \le n} \nu_{i-\ell,j-\ell:n-\ell}^{[r_1, \ldots, r_\ell]} + \sum_{\ell=j}^{n} \; \sum_{1 \le r_1 < \cdots < r_{n-\ell} \le n} \nu_{\ell-j+1,\ell-i+1:\ell}^{[r_1, \ldots, r_{n-\ell}]} - \sum_{\ell=i}^{j-1} \; \sum_{1 \le r_1 < \cdots < r_\ell \le n} \nu_{j-\ell:n-\ell}^{[r_1, \ldots, r_\ell]} \, \nu_{\ell-i+1:\ell}^{[r_{\ell+1}, \ldots, r_n]} \right\}.$$
(iii) Verify that for a single-outlier model, these results reduce to Relations 5.21 and 5.22.
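In the i.i.d. case, every subset $\{r_1,\ldots,r_\ell\}$ contributes the same moment, so relation (i) reduces to $\mu_{i:n}^{(k)} = 2^{-n} \{ \sum_{\ell=0}^{i-1} \binom{n}{\ell} \nu_{i-\ell:n-\ell}^{(k)} + (-1)^k \sum_{\ell=i}^{n} \binom{n}{n-\ell} \nu_{\ell-i+1:\ell}^{(k)} \}$. For $X_i \sim \mathrm{Uniform}(-1,1)$ the folded variables are $\mathrm{Uniform}(0,1)$, whose order-statistic moments are exact rationals, so this reduction can be checked with exact arithmetic (a sketch; the choice of Uniform$(-1,1)$ is illustrative):

```python
from fractions import Fraction
from math import comb

def nu(k, i, m):
    """E[U_{i:m}^k] for i.i.d. Uniform(0,1): product of (i+t)/(m+1+t)."""
    out = Fraction(1)
    for t in range(k):
        out *= Fraction(i + t, m + 1 + t)
    return out

def mu(k, i, n):
    """E[X_{i:n}^k] for i.i.d. Uniform(-1,1), via X = 2U - 1 and the
    binomial expansion of (2U_{i:n} - 1)^k."""
    return sum(comb(k, s) * Fraction(2) ** s * Fraction(-1) ** (k - s) * nu(s, i, n)
               for s in range(k + 1))

for n in range(1, 7):
    for i in range(1, n + 1):
        for k in range(1, 4):
            rhs = (sum(comb(n, l) * nu(k, i - l, n - l) for l in range(i))
                   + Fraction(-1) ** k
                   * sum(comb(n, n - l) * nu(k, l - i + 1, l) for l in range(i, n + 1)))
            assert mu(k, i, n) == rhs / Fraction(2) ** n
```

For odd $k$ the second sum enters with a minus sign, reflecting the negative half of the symmetric distribution.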
23. (Balakrishnan, 1989c). Let $X_1, X_2, \ldots, X_n$ be independent random variables with $X_i$ ($i = 1,2,\ldots,n$) having pdf $f_i(x)$ and cdf $F_i(x)$. Then, by making use of the relations given in Exercise 20, show that:
(i) for $1 \le i \le m \le n-1$ and $k = 1,2,\ldots$,
$$\sum_{1 \le r_1 < \cdots < r_{n-m} \le n} \mu_{i:m}^{(k)[r_1, \ldots, r_{n-m}]} = \sum_{r=0}^{n-m} \binom{i-1+r}{r} \binom{n-i-r}{n-m-r} \mu_{i+r:n}^{(k)},$$
and
(ii) for $1 \le i < j \le m \le n-1$,
$$\sum_{1 \le r_1 < \cdots < r_{n-m} \le n} \mu_{i,j:m}^{[r_1, \ldots, r_{n-m}]} = \sum_{r=0}^{n-m} \sum_{s=r}^{n-m} \binom{i-1+r}{r} \binom{j-i-1+s-r}{s-r} \binom{n-j-s}{n-m-s} \mu_{i+r,j+s:n}.$$
For the $p$-outlier model, that is, $F_1 = \cdots = F_{n-p} = F$ and $F_{n-p+1} = \cdots = F_n = G$, by denoting $\mu_{i:m}^{(k)[r]}$ and $\mu_{i,j:m}^{[r]}$ for the single and the product moments of order statistics in a sample of size $m$ with $r$ outliers present in it, deduce the following relations:
(iii) for $0 \le p \le n$ and $1 \le i \le m \le n-1$,
$$\sum_{r=0}^{p} \binom{n-p}{n-m-r} \binom{p}{r} \mu_{i:m}^{(k)[p-r]} = \sum_{r=0}^{n-m} \binom{i-1+r}{r} \binom{n-i-r}{n-m-r} \mu_{i+r:n}^{(k)[p]},$$
and
(iv) for $0 \le p \le n$ and $1 \le i < j \le m \le n-1$,
$$\sum_{r=0}^{p} \binom{n-p}{n-m-r} \binom{p}{r} \mu_{i,j:m}^{[p-r]} = \sum_{r=0}^{n-m} \sum_{s=r}^{n-m} \binom{i-1+r}{r} \binom{j-i-1+s-r}{s-r} \binom{n-j-s}{n-m-s} \mu_{i+r,j+s:n}^{[p]}.$$
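For i.i.d. variables, relation (i) of Exercise 23 collapses to $\binom{n}{n-m} \mu_{i:m}^{(k)} = \sum_{r=0}^{n-m} \binom{i-1+r}{r} \binom{n-i-r}{n-m-r} \mu_{i+r:n}^{(k)}$: dropping $n-m$ variables with $r$ of them below the $(i+r)$-th overall order statistic makes the $i$-th of the retained $m$ coincide with the $(i+r)$-th of all $n$. This can be checked with exact rational arithmetic for Uniform$(0,1)$ (an illustrative sketch):

```python
from fractions import Fraction
from math import comb

def mu(k, i, n):
    """E[U_{i:n}^k] for i.i.d. Uniform(0,1) order statistics (exact):
    the product of (i+t)/(n+1+t) for t = 0, ..., k-1."""
    out = Fraction(1)
    for t in range(k):
        out *= Fraction(i + t, n + 1 + t)
    return out

for n in range(2, 8):
    for m in range(1, n):
        for i in range(1, m + 1):
            for k in (1, 2, 3):
                lhs = comb(n, n - m) * mu(k, i, m)
                rhs = sum(comb(i - 1 + r, r) * comb(n - i - r, n - m - r)
                          * mu(k, i + r, n)
                          for r in range(n - m + 1))
                assert lhs == rhs
```

The binomial weights count, for each $r$, the subsets dropping $r$ values below and $n-m-r$ values above the $(i+r)$-th overall order statistic.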