[Lecture Notes in Statistics 53] Relations, Bounds and Approximations for Order Statistics
CHAPTER 5
ORDER STATISTICS FROM A SAMPLE CONTAINING A SINGLE OUTLIER
5.0. Introduction
Density functions and joint density functions of order statistics arising from a sample containing a
single outlier have been given by Shu (1978) and David and Shu (1978), and have been made use of by
David et al. (1977) in tabulating means, variances and covariances of order statistics from a normal sample
comprising one outlier. One may also refer to Vaughan and Venables (1972) for more general expressions
of distributions of order statistics when the sample, in fact, includes k outliers. The importance of a
systematic study of the order statistics from an outlier model and the usefulness of the tables of means,
variances and covariances of these order statistics in the context of robustness has been well demonstrated
by several authors including Andrews et al. (1972), David and Shu (1978), David (1981), and Tiku et al.
(1986).
In this Chapter, we first derive several recurrence relations and identities satisfied by the single and
the product moments of order statistics. These relations are then applied in order to determine the maximum number of single and double integrals to be evaluated for the calculation of the first two single
moments and the product moments of order statistics in a sample of size n, assuming these quantities for
all sample sizes less than n to be known. All these results are presented in Sections 2 through 4. Results
of similar nature for the symmetric outlier case, i.e., the case when all the n observations including the
outlying observation have distributions symmetric about zero, are presented in Section 5. In Section 6, we
derive some relations between the moments of order statistics from a symmetric outlier model and the
moments of order statistics from a single outlier model obtained by folding both the distributions at zero
and also discuss the cumulative rounding error committed by using these recurrence relations. These
results have all been established recently in a series of papers by Balakrishnan (1987b, 1988a,b,c) and they
generalize the results presented in Chapter 2. The functional behaviour of order statistics from location
and scale-outlier models is discussed in detail in Section 7. Finally, in Section 8 we make use of these
results to study the bias and mean square error of various omnibus robust estimators of the location parameter in the normal case. Work of this nature has also been carried out earlier by Crow and Siddiqui
(1967), Gastwirth and Cohen (1970), Andrews et al. (1972), and Tiku (1980). By considering a single
outlier exponential model, Kale and Sinha (1971) and Joshi (1972) have made investigations regarding the
mean square error of a class of estimators of the mean of the exponential distribution; see also Barnett and
Lewis (1978) and Lawless (1982) for further details on this topic.
Throughout this Chapter results are derived under the assumption that the distributions under discussion are absolutely continuous. Since any distribution function can be represented as a weak limit of a
sequence of absolutely continuous distributions, all results which do not explicitly refer to densities continue to hold for general (not necessarily absolutely continuous) distributions. See, for example, Exercises
17-19.
B. C. Arnold et al., Relations, Bounds and Approximations for Order Statistics
© Springer-Verlag Berlin Heidelberg 1989
5.1. Distributions of order statistics
We shall derive here the distributions of order statistics obtained from a sample of size n when an
unidentified single outlier is present in the sample. For convenience, let us represent the sample by n
independent absolutely continuous random variables X_r (r = 1,2,...,n-1) and Y, such that X_r has pdf f(x)
and cdf F(x), and Y has pdf g(x) and cdf G(x). Further, let
Z_{1:n} \le Z_{2:n} \le ... \le Z_{n:n}   (5.1)
be the order statistics obtained by arranging the n independent observations in increasing order of
magnitude.
The cdf of Z_{n:n}, denoted by H_{n:n}(x), may now be obtained as
H_{n:n}(x) = Pr{all of X_1, X_2, ..., X_{n-1}, Y \le x}
= {F(x)}^{n-1} G(x).   (5.2)
Similarly, the cdf of Z_{i:n} (1 \le i \le n-1) may be obtained as
H_{i:n}(x) = Pr{at least i of X_1, X_2, ..., X_{n-1}, Y \le x}
= Pr{exactly i-1 of X_1, X_2, ..., X_{n-1} \le x and Y \le x}
+ Pr{at least i of X_1, X_2, ..., X_{n-1} \le x}
= \binom{n-1}{i-1} {F(x)}^{i-1} {1-F(x)}^{n-i} G(x) + F_{i:n-1}(x),   (5.3)
where F_{i:n-1}(x) is the cdf of the i'th order statistic in a sample of size n-1 drawn from a population with
pdf f(x) and cdf F(x). Differentiating the expressions in (5.2) and (5.3), we obtain the density function of
Z_{i:n} (1 \le i \le n) as
h_{i:n}(x) = \frac{(n-1)!}{(i-2)!(n-i)!} {F(x)}^{i-2} {1-F(x)}^{n-i} G(x) f(x)
+ \frac{(n-1)!}{(i-1)!(n-i)!} {F(x)}^{i-1} {1-F(x)}^{n-i} g(x)
+ \frac{(n-1)!}{(i-1)!(n-i-1)!} {F(x)}^{i-1} {1-F(x)}^{n-i-1} {1-G(x)} f(x),   -\infty < x < \infty,   (5.4)
where the first term drops out if i = 1, and the last term if i = n.
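The three-term density (5.4) is easy to check numerically. The following sketch is a minimal illustration under an assumed model not taken from the text — F uniform on (0,1) and a single Beta(2,1) outlier with G(x) = x^2 — and verifies that each h_{i:n} integrates to one.

```python
from math import factorial

# Assumed illustrative model (not from the text): n-1 observations from
# F(x) = x on (0,1) and one outlier from G(x) = x^2 (a Beta(2,1) variable).
F, f = (lambda x: x), (lambda x: 1.0)
G, g = (lambda x: x * x), (lambda x: 2.0 * x)

def h(i, n, x):
    """Three-term density (5.4) of Z_{i:n} in the single-outlier model."""
    t = 0.0
    if i >= 2:  # first term drops out when i = 1
        t += (factorial(n - 1) / (factorial(i - 2) * factorial(n - i))
              * F(x) ** (i - 2) * (1 - F(x)) ** (n - i) * G(x) * f(x))
    t += (factorial(n - 1) / (factorial(i - 1) * factorial(n - i))
          * F(x) ** (i - 1) * (1 - F(x)) ** (n - i) * g(x))
    if i <= n - 1:  # last term drops out when i = n
        t += (factorial(n - 1) / (factorial(i - 1) * factorial(n - i - 1))
              * F(x) ** (i - 1) * (1 - F(x)) ** (n - i - 1) * (1 - G(x)) * f(x))
    return t

def simpson(fun, a, b, m=2000):
    """Composite Simpson rule with m (even) subintervals."""
    step = (b - a) / m
    s = fun(a) + fun(b)
    for k in range(1, m):
        s += (4 if k % 2 else 2) * fun(a + k * step)
    return s * step / 3.0

n = 5
for i in range(1, n + 1):
    mass = simpson(lambda x: h(i, n, x), 0.0, 1.0)
    print(f"h_{i}:{n} integrates to {mass:.8f}")
```

The same construction extends directly to the joint density (5.6) below.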
In view of the importance of this result we will also give an alternative proof which lends itself to
further extensions. The event x < Z_{i:n} \le x+\Delta x may be seen as follows:
      i-1                 1                 n-i
  -\infty --------- x   x+\Delta x --------- \infty
Z_r \le x for i-1 of the Z_r, x < Z_r \le x+\Delta x for exactly one Z_r, and Z_r > x+\Delta x for the remaining n-i of the Z_r.
Realizing now that the outlying observation Y could belong to any one of the three parcels, we have the
following three possibilities:
(i) X_r \le x for i-2 of the X_r and Y \le x, x < X_r \le x+\Delta x for exactly one X_r, and X_r > x+\Delta x for the remaining n-i of the X_r, with a probability
\frac{(n-1)!}{(i-2)! 1! (n-i)!} {F(x)}^{i-2} G(x) {F(x+\Delta x) - F(x)} {1-F(x+\Delta x)}^{n-i};
(ii) X_r \le x for i-1 of the X_r, x < Y \le x+\Delta x, and X_r > x+\Delta x for the remaining n-i of the X_r, with a probability
\frac{(n-1)!}{(i-1)! 0! (n-i)!} {F(x)}^{i-1} {G(x+\Delta x) - G(x)} {1-F(x+\Delta x)}^{n-i};
and
(iii) X_r \le x for i-1 of the X_r, x < X_r \le x+\Delta x for exactly one X_r, X_r > x+\Delta x for the remaining n-i-1
of the X_r, and Y > x+\Delta x, with a probability
\frac{(n-1)!}{(i-1)! 1! (n-i-1)!} {F(x)}^{i-1} {F(x+\Delta x) - F(x)} {1-F(x+\Delta x)}^{n-i-1} {1-G(x+\Delta x)}.
Regarding \Delta x as small, we could write
Pr(x < Z_{i:n} \le x+\Delta x)
= \frac{(n-1)!}{(i-2)!(n-i)!} {F(x)}^{i-2} G(x) {1-F(x+\Delta x)}^{n-i} f(x) \Delta x
+ \frac{(n-1)!}{(i-1)!(n-i)!} {F(x)}^{i-1} {1-F(x+\Delta x)}^{n-i} g(x) \Delta x
+ \frac{(n-1)!}{(i-1)!(n-i-1)!} {F(x)}^{i-1} {1-F(x+\Delta x)}^{n-i-1} {1-G(x+\Delta x)} f(x) \Delta x + O[(\Delta x)^2],   (5.5)
where O[(\Delta x)^2] denotes the probability of more than one Z_r falling in the interval (x, x+\Delta x) and hence is a
term of order (\Delta x)^2. Dividing both sides of (5.5) by \Delta x and letting \Delta x \to 0, we once again obtain the
density function of Z_{i:n} as in equation (5.4).
Proceeding exactly on similar lines, by noting that the joint event x < Z_{i:n} \le x+\Delta x, y < Z_{j:n} \le y+\Delta y
is obtained by the configuration (neglecting terms of lower order of probability)
      i-1            1           j-i-1            1            n-j
  -\infty ----- x   x+\Delta x ----- y   y+\Delta y ----- \infty
we obtain the joint density function of Z_{i:n} and Z_{j:n} (1 \le i < j \le n) as
h_{i,j:n}(x,y) = \frac{(n-1)!}{(i-2)!(j-i-1)!(n-j)!} {F(x)}^{i-2} {F(y)-F(x)}^{j-i-1} {1-F(y)}^{n-j} G(x) f(x) f(y)
+ \frac{(n-1)!}{(i-1)!(j-i-1)!(n-j)!} {F(x)}^{i-1} {F(y)-F(x)}^{j-i-1} {1-F(y)}^{n-j} g(x) f(y)
+ \frac{(n-1)!}{(i-1)!(j-i-2)!(n-j)!} {F(x)}^{i-1} {F(y)-F(x)}^{j-i-2} {1-F(y)}^{n-j} {G(y)-G(x)} f(x) f(y)
+ \frac{(n-1)!}{(i-1)!(j-i-1)!(n-j)!} {F(x)}^{i-1} {F(y)-F(x)}^{j-i-1} {1-F(y)}^{n-j} f(x) g(y)
+ \frac{(n-1)!}{(i-1)!(j-i-1)!(n-j-1)!} {F(x)}^{i-1} {F(y)-F(x)}^{j-i-1} {1-F(y)}^{n-j-1} {1-G(y)} f(x) f(y),
-\infty < x < y < \infty,   (5.6)
where the first term drops out if i = 1, the last term if j = n, and the middle term if j = i+1. Note that the
densities in equations (5.4) and (5.6) are special cases of a very general result of Vaughan and Venables
(1972). They have essentially expressed the joint density function of k order statistics arising from n
absolutely continuous populations in the form of a permanent.
5.2. Relations for single moments
With the density function of Z_{i:n} as in equation (5.4), we have the single moment \mu_{i:n}^{(k)} (1 \le i \le n,
k \ge 1) as
\mu_{i:n}^{(k)} = \int_{-\infty}^{\infty} x^k h_{i:n}(x) dx.   (5.7)
Further, let f_{i:n}(x) be the pdf of the i'th order statistic in a sample of size n drawn from a continuous
population with pdf f(x) and cdf F(x), and let the single moments of order statistics for this case be denoted by
\nu_{i:n}^{(k)} = \int_{-\infty}^{\infty} x^k f_{i:n}(x) dx
= {B(i, n-i+1)}^{-1} \int_{-\infty}^{\infty} x^k {F(x)}^{i-1} {1-F(x)}^{n-i} f(x) dx,   (5.8)
for 1 \le i \le n and k = 1,2,.... Then these single moments of order statistics, viz., \mu_{i:n}^{(k)} and \nu_{i:n}^{(k)}, satisfy the
following recurrence relations and identities for arbitrary continuous distributions F and G. As mentioned
by David et al. (1977), these results also provide useful checks in assessing the accuracy of the computation
of moments of order statistics from a single outlier model.
Relation 5.1: For n \ge 2 and k = 1,2,...,
\sum_{i=1}^{n} \mu_{i:n}^{(k)} = (n-1) \nu_{1:1}^{(k)} + \mu_{1:1}^{(k)}.
Proof. By considering the identity
\sum_{i=1}^{n} Z_{i:n}^{k} = \sum_{i=1}^{n-1} X_i^{k} + Y^{k}
and taking expectations on both sides, we immediately obtain the required relation.
Setting k = 1 and 2, in particular, we derive the results
\sum_{i=1}^{n} \mu_{i:n} = (n-1) \nu_{1:1} + \mu_{1:1}
and
\sum_{i=1}^{n} \mu_{i:n}^{(2)} = (n-1) \nu_{1:1}^{(2)} + \mu_{1:1}^{(2)},   (5.9)
which have been made use of by David et al. (1977).
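Relation 5.1 lends itself to a direct numerical check. The sketch below assumes, purely for illustration, the uniform-plus-Beta(2,1) outlier model F(x) = x, G(x) = x^2 on (0,1), for which \nu_{1:1}^{(k)} = 1/(k+1) and \mu_{1:1}^{(k)} = 2/(k+2) in closed form; the identity then holds to quadrature accuracy for several n and k.

```python
from math import factorial

# Assumed illustrative model: F(x) = x on (0,1), outlier G(x) = x^2.
F, f = (lambda x: x), (lambda x: 1.0)
G, g = (lambda x: x * x), (lambda x: 2.0 * x)

def h(i, n, x):
    """Density (5.4) of Z_{i:n}."""
    t = 0.0
    if i >= 2:
        t += (factorial(n - 1) / (factorial(i - 2) * factorial(n - i))
              * F(x) ** (i - 2) * (1 - F(x)) ** (n - i) * G(x) * f(x))
    t += (factorial(n - 1) / (factorial(i - 1) * factorial(n - i))
          * F(x) ** (i - 1) * (1 - F(x)) ** (n - i) * g(x))
    if i <= n - 1:
        t += (factorial(n - 1) / (factorial(i - 1) * factorial(n - i - 1))
              * F(x) ** (i - 1) * (1 - F(x)) ** (n - i - 1) * (1 - G(x)) * f(x))
    return t

def simpson(fun, a, b, m=2000):
    """Composite Simpson rule with m (even) subintervals."""
    step = (b - a) / m
    s = fun(a) + fun(b)
    for k in range(1, m):
        s += (4 if k % 2 else 2) * fun(a + k * step)
    return s * step / 3.0

def mu(i, n, k):
    """Single moment (5.7) by quadrature."""
    return simpson(lambda x: x ** k * h(i, n, x), 0.0, 1.0)

for n in (3, 4, 5):
    for k in (1, 2):
        lhs = sum(mu(i, n, k) for i in range(1, n + 1))
        rhs = (n - 1) / (k + 1) + 2.0 / (k + 2)  # (n-1) nu_{1:1}^{(k)} + mu_{1:1}^{(k)}
        print(f"n={n}, k={k}: sum of moments = {lhs:.8f}, identity value = {rhs:.8f}")
```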
Relation 5.2: For 1 \le i \le n-1 and k = 1,2,...,
i \mu_{i+1:n}^{(k)} + (n-i) \mu_{i:n}^{(k)} = (n-1) \mu_{i:n-1}^{(k)} + \nu_{i:n-1}^{(k)}.   (5.10)
Proof. First, consider the expression of i \mu_{i+1:n}^{(k)} from equations (5.4) and (5.7) and split the first term into
two by writing the multiple i as (i-1) + 1. Next, consider the expression of (n-i) \mu_{i:n}^{(k)} and split the last
term into two by writing the multiple n-i as (n-i-1) + 1. Finally, adding the two expressions and simplifying the resulting equation, we obtain the relation (5.10).
Relation 5.2 has been derived by Balakrishnan (1988a) and it is easy to see that we just require the
value of the k'th moment of a single order statistic in a sample of size n, as Relation 5.2 would enable us to
compute the k'th moment of the remaining n-1 order statistics, using of course the moments \mu^{(k)} and \nu^{(k)}
in samples of size less than n. In particular, setting n = 2m and i = m in equation (5.10), we obtain the
following relation.
Relation 5.3: For even values of n, say n = 2m, and k \ge 1,
\frac{1}{2} {\mu_{m+1:2m}^{(k)} + \mu_{m:2m}^{(k)}} = [1 - \frac{1}{2m}] \mu_{m:2m-1}^{(k)} + \frac{1}{2m} \nu_{m:2m-1}^{(k)}
= \mu_{m:2m-1}^{(k)} + \frac{1}{2m} [\nu_{m:2m-1}^{(k)} - \mu_{m:2m-1}^{(k)}].   (5.11)
For k = 1, in particular, we have the expected value of the median in a sample of even size (n = 2m)
on the LHS of (5.11), while on the RHS we have the expected value of the median in a sample of odd size
(n = 2m-1) and an additive biasing factor involving the difference of the expected values of the medians in
a sample of size 2m-1 from the outlier and the non-outlier models.
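The recurrence (5.10), and hence Relation 5.3, can likewise be verified numerically. In the sketch below (same assumed uniform-plus-Beta(2,1) model as above; for the uniform non-outlier population the closed form \nu_{i:m}^{(k)} = m!(k+i-1)!/[(i-1)!(m+k)!] is used), both sides of Relation 5.2 agree to quadrature accuracy, as do both sides of Relation 5.3 for n = 4, m = 2.

```python
from math import factorial

# Assumed illustrative model: F(x) = x on (0,1), outlier G(x) = x^2.
F, f = (lambda x: x), (lambda x: 1.0)
G, g = (lambda x: x * x), (lambda x: 2.0 * x)

def h(i, n, x):
    """Density (5.4) of Z_{i:n}."""
    t = 0.0
    if i >= 2:
        t += (factorial(n - 1) / (factorial(i - 2) * factorial(n - i))
              * F(x) ** (i - 2) * (1 - F(x)) ** (n - i) * G(x) * f(x))
    t += (factorial(n - 1) / (factorial(i - 1) * factorial(n - i))
          * F(x) ** (i - 1) * (1 - F(x)) ** (n - i) * g(x))
    if i <= n - 1:
        t += (factorial(n - 1) / (factorial(i - 1) * factorial(n - i - 1))
              * F(x) ** (i - 1) * (1 - F(x)) ** (n - i - 1) * (1 - G(x)) * f(x))
    return t

def simpson(fun, a, b, m=2000):
    step = (b - a) / m
    s = fun(a) + fun(b)
    for k in range(1, m):
        s += (4 if k % 2 else 2) * fun(a + k * step)
    return s * step / 3.0

def mu(i, n, k):
    return simpson(lambda x: x ** k * h(i, n, x), 0.0, 1.0)

def nu(i, m, k):
    """Exact uniform moment: nu_{i:m}^{(k)} = m!(k+i-1)!/[(i-1)!(m+k)!]."""
    return factorial(m) * factorial(k + i - 1) / (factorial(i - 1) * factorial(m + k))

n = 4
for k in (1, 2):
    for i in range(1, n):
        lhs = i * mu(i + 1, n, k) + (n - i) * mu(i, n, k)       # LHS of (5.10)
        rhs = (n - 1) * mu(i, n - 1, k) + nu(i, n - 1, k)       # RHS of (5.10)
        print(f"(5.10) n={n}, i={i}, k={k}: {lhs:.8f} = {rhs:.8f}")
    # Relation 5.3 with m = n/2 = 2:
    lhs3 = 0.5 * (mu(3, 4, k) + mu(2, 4, k))
    rhs3 = mu(2, 3, k) + (nu(2, 3, k) - mu(2, 3, k)) / 4.0
    print(f"(5.11) k={k}: {lhs3:.8f} = {rhs3:.8f}")
```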
Relation 5.4: For 1 \le i \le n-1 and k = 1,2,...,
\mu_{i:n}^{(k)} = \sum_{j=i}^{n} (-1)^{j-i} \binom{n-1}{j-1} \binom{j-1}{i-1} \mu_{j:j}^{(k)} + \nu_{i:n-1}^{(k)}.   (5.12)
Proof. This relation is simply obtained by considering the expression of \mu_{i:n}^{(k)} from equations (5.4) and
(5.7), upon expanding the term {1-F(x)}^m, (m = n-i, n-i-1), binomially in powers of F(x), and finally
simplifying the resulting expression by making use of equation (5.7) and Relation 2.6.
Relation 5.5: For 2 \le i \le n and k = 1,2,...,
\mu_{i:n}^{(k)} = \sum_{j=n-i+1}^{n} (-1)^{j-n+i-1} \binom{n-1}{j-1} \binom{j-1}{n-i} \mu_{1:j}^{(k)} + \nu_{i-1:n-1}^{(k)}.   (5.13)
Proof. First, consider the expression of \mu_{i:n}^{(k)} from equations (5.4) and (5.7). Next, write the term {F(x)}^m
as [1-{1-F(x)}]^m, (m = i-2, i-1), and expand it binomially in powers of {1-F(x)}. Relation (5.13) now
follows immediately upon simplifying the resulting expression by making use of equation (5.7) and
Relation 2.8.
Remark 5.6: Relations 5.4 and 5.5 have both been derived by Balakrishnan (1988a) and they are quite
important as they usefully express the k'th moment of the i'th order statistic from a sample of size n in
terms of the k'th moment of the largest and the smallest order statistics in samples of size n and less,
respectively. In addition, they also involve the k'th moment of the i'th and (i-1)'th order statistics in a
sample of size n-1 from the non-outlier model. In any case, we could note once again from these two
relations that we just require the value of the k'th moment of a single order statistic (either the smallest or
the largest) in a sample of size n in order to compute the k'th moment of the remaining n-1 order statistics,
given the moments \mu^{(k)} and \nu^{(k)} in samples of size less than n. Note that this conforms with the comment
made earlier based on Relation 5.2. This is quite expected as both Relations 5.4 and 5.5 could be derived
by repeated application of Relation 5.2. However, we need to be careful in employing these two recurrence relations in the computational algorithm as increasing values of n result in large combinatorial terms
and hence in an error of large magnitude.
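Relations 5.4 and 5.5 can be illustrated in the same assumed uniform-plus-Beta(2,1) model: each \mu_{i:n}^{(k)} is recovered from moments of sample maxima (respectively minima) in samples of size n and less, together with one non-outlier moment.

```python
from math import comb, factorial

# Assumed illustrative model: F(x) = x on (0,1), outlier G(x) = x^2.
F, f = (lambda x: x), (lambda x: 1.0)
G, g = (lambda x: x * x), (lambda x: 2.0 * x)

def h(i, n, x):
    """Density (5.4) of Z_{i:n}."""
    t = 0.0
    if i >= 2:
        t += (factorial(n - 1) / (factorial(i - 2) * factorial(n - i))
              * F(x) ** (i - 2) * (1 - F(x)) ** (n - i) * G(x) * f(x))
    t += (factorial(n - 1) / (factorial(i - 1) * factorial(n - i))
          * F(x) ** (i - 1) * (1 - F(x)) ** (n - i) * g(x))
    if i <= n - 1:
        t += (factorial(n - 1) / (factorial(i - 1) * factorial(n - i - 1))
              * F(x) ** (i - 1) * (1 - F(x)) ** (n - i - 1) * (1 - G(x)) * f(x))
    return t

def simpson(fun, a, b, m=2000):
    step = (b - a) / m
    s = fun(a) + fun(b)
    for k in range(1, m):
        s += (4 if k % 2 else 2) * fun(a + k * step)
    return s * step / 3.0

def mu(i, n, k):
    return simpson(lambda x: x ** k * h(i, n, x), 0.0, 1.0)

def nu(i, m, k):
    """Exact uniform moment nu_{i:m}^{(k)}."""
    return factorial(m) * factorial(k + i - 1) / (factorial(i - 1) * factorial(m + k))

n = 4
for k in (1, 2):
    for i in range(1, n):       # Relation 5.4, via moments of maxima mu_{j:j}
        rhs = sum((-1) ** (j - i) * comb(n - 1, j - 1) * comb(j - 1, i - 1) * mu(j, j, k)
                  for j in range(i, n + 1)) + nu(i, n - 1, k)
        print(f"(5.12) i={i}, k={k}: {mu(i, n, k):.8f} = {rhs:.8f}")
    for i in range(2, n + 1):   # Relation 5.5, via moments of minima mu_{1:j}
        rhs = sum((-1) ** (j - n + i - 1) * comb(n - 1, j - 1) * comb(j - 1, n - i) * mu(1, j, k)
                  for j in range(n - i + 1, n + 1)) + nu(i - 1, n - 1, k)
        print(f"(5.13) i={i}, k={k}: {mu(i, n, k):.8f} = {rhs:.8f}")
```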
5.3. Relations for product moments
With the joint density function of Z_{i:n} and Z_{j:n} as in equation (5.6), we have the product moment
\mu_{i,j:n} (1 \le i < j \le n) as
\mu_{i,j:n} = \int\int_{W_1} xy h_{i,j:n}(x,y) dy dx   (5.14)
= \int\int_{W_2} xy h_{i,j:n}(y,x) dy dx,   (5.15)
where W_1 = {(x,y) : -\infty < x < y < \infty} and W_2 = {(x,y) : -\infty < y < x < \infty}. Further, let f_{i,j:n}(x,y) be the
joint pdf of the i'th and j'th order statistics in a sample of size n drawn from a continuous population with
pdf f(x) and cdf F(x), and let the product moments of order statistics for this case be denoted by
\nu_{i,j:n} = \int\int_{W_1} xy f_{i,j:n}(x,y) dy dx   (5.16)
= \int\int_{W_2} xy f_{i,j:n}(y,x) dy dx,   (5.17)
for 1 \le i < j \le n. Then these product moments of order statistics, viz., \mu_{i,j:n} and \nu_{i,j:n}, satisfy the following recurrence relations and identities for any arbitrary continuous distributions F and G. In addition to
providing straightforward generalizations of the results given in Chapter 2 and some simple checks for
assessing the accuracy of the computation of product moments, these results could also be effectively
applied to reduce considerably the amount of numerical computation involved in the evaluation of means,
variances and covariances of order statistics from an outlier model, at least for small sample sizes.
Relation 5.7: For n \ge 2,
\sum_{i=1}^{n} \sum_{j=1}^{n} \mu_{i,j:n} = (n-1) \nu_{1:1}^{(2)} + \mu_{1:1}^{(2)} + (n-1)(n-2) \nu_{1:1}^2 + 2(n-1) \nu_{1:1} \mu_{1:1},   (5.18)
where, for i = j, \mu_{i,i:n} is to be interpreted as \mu_{i:n}^{(2)}.
Proof. The above relation follows directly by considering the identity
\sum_{i=1}^{n} \sum_{j=1}^{n} Z_{i:n} Z_{j:n} = \sum_{i=1}^{n} \sum_{j=1}^{n} Z_i Z_j = \sum_{i=1}^{n} Z_i^2 + \sum_{i=1}^{n} \sum_{j \ne i} Z_i Z_j
and taking expectation on both sides.
Relation 5.8: For n \ge 2,
\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \mu_{i,j:n} = \binom{n-1}{2} \nu_{1:1}^2 + (n-1) \nu_{1:1} \mu_{1:1}.   (5.19)
Proof. This is simply derived from Relation 5.7 upon using the result that
\sum_{i=1}^{n} \mu_{i:n}^{(2)} = (n-1) \nu_{1:1}^{(2)} + \mu_{1:1}^{(2)}
obtained from equation (5.9).
Relations 5.7 and 5.8 are very simple to use and, hence, as pointed out by David et al. (1977), could
be effectively applied to check the accuracy of the computation of the product moments.
Relation 5.9: For 2 \le i < j \le n,
(i-1) \mu_{i,j:n} + (j-i) \mu_{i-1,j:n} + (n-j+1) \mu_{i-1,j-1:n} = (n-1) \mu_{i-1,j-1:n-1} + \nu_{i-1,j-1:n-1}.   (5.20)
Proof. First, consider the expression of the LHS of (5.20) obtained from equation (5.14). Now split the
first term in (i-1) \mu_{i,j:n} into two by writing the multiple i-1 as (i-2)+1; split the middle term in (j-i)
\mu_{i-1,j:n} into two by writing the multiple j-i as (j-i-1)+1; similarly, split the last term in (n-j+1) \mu_{i-1,j-1:n}
into two by writing the multiple n-j+1 as (n-j)+1. Finally, adding all these and simplifying the resulting
expression by making use of equations (5.14) and (5.16), we obtain the RHS of (5.20).
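Relation 5.9 involves genuinely two-dimensional quantities, but it too can be checked by direct numerical integration of the joint density (5.6). The sketch below uses the same assumed uniform-plus-Beta(2,1) model; for the uniform population the closed form \nu_{i,j:m} = i(j+1)/[(m+1)(m+2)] is used.

```python
from math import factorial

# Assumed illustrative model: F(x) = x on (0,1), outlier G(x) = x^2.
F, f = (lambda x: x), (lambda x: 1.0)
G, g = (lambda x: x * x), (lambda x: 2.0 * x)

def simpson(fun, a, b, m=200):
    step = (b - a) / m
    s = fun(a) + fun(b)
    for k in range(1, m):
        s += (4 if k % 2 else 2) * fun(a + k * step)
    return s * step / 3.0

def h2(i, j, n, x, y):
    """Five-term joint density (5.6) of (Z_{i:n}, Z_{j:n}) on x < y."""
    c = factorial(n - 1)
    p, q = F(x), F(y)
    t = 0.0
    if i >= 2:      # drops out when i = 1
        t += (c / (factorial(i - 2) * factorial(j - i - 1) * factorial(n - j))
              * p**(i-2) * (q - p)**(j-i-1) * (1-q)**(n-j) * G(x) * f(x) * f(y))
    t += (c / (factorial(i - 1) * factorial(j - i - 1) * factorial(n - j))
          * p**(i-1) * (q - p)**(j-i-1) * (1-q)**(n-j) * g(x) * f(y))
    if j >= i + 2:  # drops out when j = i+1
        t += (c / (factorial(i - 1) * factorial(j - i - 2) * factorial(n - j))
              * p**(i-1) * (q - p)**(j-i-2) * (1-q)**(n-j) * (G(y) - G(x)) * f(x) * f(y))
    t += (c / (factorial(i - 1) * factorial(j - i - 1) * factorial(n - j))
          * p**(i-1) * (q - p)**(j-i-1) * (1-q)**(n-j) * f(x) * g(y))
    if j <= n - 1:  # drops out when j = n
        t += (c / (factorial(i - 1) * factorial(j - i - 1) * factorial(n - j - 1))
              * p**(i-1) * (q - p)**(j-i-1) * (1-q)**(n-j-1) * (1 - G(y)) * f(x) * f(y))
    return t

def mu2(i, j, n):
    """Product moment (5.14) by iterated Simpson quadrature over 0 < x < y < 1."""
    return simpson(lambda x: simpson(lambda y: x * y * h2(i, j, n, x, y), x, 1.0),
                   0.0, 1.0)

def nu2(i, j, m):
    """Exact uniform product moment nu_{i,j:m} = i(j+1)/[(m+1)(m+2)]."""
    return i * (j + 1) / ((m + 1) * (m + 2))

for (i, j, n) in [(2, 3, 3), (2, 3, 4)]:
    lhs = ((i - 1) * mu2(i, j, n) + (j - i) * mu2(i - 1, j, n)
           + (n - j + 1) * mu2(i - 1, j - 1, n))
    rhs = (n - 1) * mu2(i - 1, j - 1, n - 1) + nu2(i - 1, j - 1, n - 1)
    print(f"(5.20) n={n}, i={i}, j={j}: LHS = {lhs:.6f}, RHS = {rhs:.6f}")
```

For n = 3, i = 2, j = 3 the common value is 11/12 in this model, which can also be checked by hand.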
Relation 5.9 has been derived by Balakrishnan (1988a) and it should be noted that the recurrence
formula (5.20) would enable us to calculate all the product moments \mu_{i,j:n} (1 \le i < j \le n) by knowing n-1
suitably chosen moments, for example, \mu_{1,2:n}, \mu_{2,3:n}, ..., \mu_{n-1,n:n}.
Relation 5.10: For 1 \le i < j \le n,
\mu_{i,j:n} - \nu_{i,j:n} + \sum_{r=0}^{i-1} \sum_{s=0}^{n-j} (-1)^{n-r-s} \binom{n-1}{s} \binom{n-1-s}{r} {\mu_{n-j-s+1,n-i-s+1:n-r-s} - \nu_{n-j-s+1,n-i-s+1:n-r-s}}
= \frac{1}{n} \sum_{r=1}^{j-i} (-1)^{j-i-r} \binom{n}{j-r} \binom{j-r-1}{i-1} {(j-r) \nu_{r:n-j+r} [\mu_{j-r:j-r} - \nu_{j-r:j-r}]
+ (n-j+r) \nu_{j-r:j-r} [\mu_{r:n-j+r} - \nu_{r:n-j+r}]}.   (5.21)
Proof. Noting that W_1 \cup W_2 = R^2, we have for 1 \le i < j \le n
I = \int\int_{R^2} xy h_{i,j:n}(x,y) dy dx
= \int\int_{W_1} xy h_{i,j:n}(x,y) dy dx + \int\int_{W_2} xy h_{i,j:n}(x,y) dy dx
= \mu_{i,j:n} + \int\int_{W_2} xy h_{i,j:n}(x,y) dy dx   (5.22)
upon using (5.14). By writing the term {F(x)}^a as [1-{1-F(x)}]^a, expanding the terms {F(x)}^a and
{1-F(y)}^b binomially in powers of 1-F(x) and F(y), respectively, in the integral over W_2, and simplifying
the resulting expression using (5.15) and (5.17), we get from equation (5.22)
I = \mu_{i,j:n} + \sum_{r=0}^{i-1} \sum_{s=0}^{n-j} (-1)^{n-r-s} \binom{n}{s} \binom{n-s}{r} \nu_{n-j-s+1,n-i-s+1:n-r-s}
+ \sum_{r=0}^{i-1} \sum_{s=0}^{n-j} (-1)^{n-r-s} \binom{n-1}{s} \binom{n-1-s}{r} {\mu_{n-j-s+1,n-i-s+1:n-r-s} - \nu_{n-j-s+1,n-i-s+1:n-r-s}}.
Making use of Relation 2.14, we may rewrite the above equation as
I = \mu_{i,j:n} - \nu_{i,j:n} + \sum_{r=1}^{j-i} (-1)^{j-i-r} \binom{n}{j-r} \binom{j-r-1}{i-1} \nu_{j-r:j-r} \nu_{r:n-j+r}
+ \sum_{r=0}^{i-1} \sum_{s=0}^{n-j} (-1)^{n-r-s} \binom{n-1}{s} \binom{n-1-s}{r} {\mu_{n-j-s+1,n-i-s+1:n-r-s} - \nu_{n-j-s+1,n-i-s+1:n-r-s}}.   (5.23)
We could alternatively write
I = \int\int_{R^2} xy h_{i,j:n}(x,y) dy dx.
Now expanding the term {F(y)-F(x)}^a binomially in powers of F(x) and F(y) and simplifying the resulting
expression by using equations (5.7) and (5.8), we also obtain
I = \frac{1}{n} \sum_{r=1}^{j-i} (-1)^{j-i-r} \binom{n}{j-r} \binom{j-r-1}{i-1} {(j-r) \nu_{r:n-j+r} \mu_{j-r:j-r}
+ (n-j+r) \nu_{j-r:j-r} \mu_{r:n-j+r}}.   (5.24)
Relation (5.21) follows upon equating (5.23) and (5.24).
In the above relation, established by Balakrishnan (1988a), it should be noted that only two product
moments, viz., \mu_{i,j:n} and \mu_{n-j+1,n-i+1:n}, are in samples of size n from the outlier model. In particular,
for j = i+1 we have the following relation.
Relation 5.11: For i = 1,2,...,n-1,
\mu_{i,i+1:n} - \nu_{i,i+1:n} + (-1)^n {\mu_{n-i,n-i+1:n} - \nu_{n-i,n-i+1:n}}
= \sum_{r=0}^{i-1} \sum_{s=1}^{n-i-1} (-1)^{n+1-r-s} \binom{n-1}{s} \binom{n-1-s}{r} {\mu_{n-i-s,n-i-s+1:n-r-s} - \nu_{n-i-s,n-i-s+1:n-r-s}}
+ \sum_{r=1}^{i-1} (-1)^{n+1-r} \binom{n-1}{r} {\mu_{n-i,n-i+1:n-r} - \nu_{n-i,n-i+1:n-r}}
+ \frac{1}{n} \binom{n}{i} {i \nu_{1:n-i} [\mu_{i:i} - \nu_{i:i}] + (n-i) \nu_{i:i} [\mu_{1:n-i} - \nu_{1:n-i}]}.   (5.25)
Similarly, by setting j = n-i+1 in equation (5.21), we obtain the following relation.
Relation 5.12: For i = 1,2,...,[n/2],
{1+(-1)^n} [\mu_{i,n-i+1:n} - \nu_{i,n-i+1:n}]
= \sum_{r=0}^{i-1} \sum_{s=1}^{i-1} (-1)^{n+1-r-s} \binom{n-1}{s} \binom{n-1-s}{r} {\mu_{i-s,n-i-s+1:n-r-s} - \nu_{i-s,n-i-s+1:n-r-s}}
+ \sum_{r=1}^{i-1} (-1)^{n+1-r} \binom{n-1}{r} {\mu_{i,n-i+1:n-r} - \nu_{i,n-i+1:n-r}}
+ \frac{1}{n} \sum_{r=1}^{n-2i+1} (-1)^{n+1-r} \binom{n-i-r}{i-1} \binom{n}{i+r-1} {(i+r-1) \nu_{n-i-r+1:n-i-r+1} [\mu_{r:i+r-1} - \nu_{r:i+r-1}]
+ (n-i-r+1) \nu_{r:i+r-1} [\mu_{n-i-r+1:n-i-r+1} - \nu_{n-i-r+1:n-i-r+1}]}.   (5.26)
For odd values of n, we note from Relation 5.11 that we need to calculate only (n-1)/2 product
moments \mu_{i,i+1:n} (1 \le i \le (n-1)/2). Similarly, for even values of n, Relation 5.12 shows that the product
moments \mu_{i,n-i+1:n} (1 \le i \le [n/2]) could all be obtained from the moments in samples of sizes n-1 and
less. In particular, for even values of n, say n = 2m and i = 1, Relation 5.12 yields
2 [\mu_{1,2m:2m} - \nu_{1,2m:2m}]
= \frac{1}{2m} \sum_{r=1}^{2m-1} (-1)^{r-1} \binom{2m}{r} {r \nu_{2m-r:2m-r} [\mu_{r:r} - \nu_{r:r}] + (2m-r) \nu_{r:r} [\mu_{2m-r:2m-r} - \nu_{2m-r:2m-r}]}
which, upon using equation (2.26) and simplifying, yields the relation
\mu_{1,2m:2m} = \sum_{r=1}^{2m-1} (-1)^{r-1} \binom{2m-1}{r} \nu_{r:r} \mu_{2m-r:2m-r}.   (5.27)
This relation generalizes the result of Govindarajulu (1963a) to the case when the order statistics arise from
a sample comprising a single outlier.
Furthermore, by setting n = 2m and i = m in equation (5.26), we obtain the following recurrence
relation.
Relation 5.13: For m = 1,2,...,
2 [\mu_{m,m+1:2m} - \nu_{m,m+1:2m}]
= \sum_{r=0}^{m-1} \sum_{s=1}^{m-1} (-1)^{r+s-1} \binom{2m-1}{s} \binom{2m-1-s}{r} {\mu_{m-s,m-s+1:2m-r-s} - \nu_{m-s,m-s+1:2m-r-s}}
+ \sum_{r=1}^{m-1} (-1)^{r-1} \binom{2m-1}{r} {\mu_{m,m+1:2m-r} - \nu_{m,m+1:2m-r}}
+ \frac{1}{2} \binom{2m}{m} {\nu_{m:m} [\mu_{1:m} - \nu_{1:m}] + \nu_{1:m} [\mu_{m:m} - \nu_{m:m}]}.   (5.28)
In particular, for the case m = 1, upon using the result that \nu_{1,2:2} = \nu_{1:1}^2, we see that Relation 5.13
reduces to the well-known identity \mu_{1,2:2} = \nu_{1:1} \mu_{1:1}.
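Relation (5.27) can be illustrated in the assumed uniform-plus-Beta(2,1) model of the earlier sketches: there \nu_{r:r} = r/(r+1), and \mu_{j:j} = (j+1)/(j+2) since H_{j:j}(x) = x^{j+1}, so the right-hand side of (5.27) has a closed form that the quadrature value of \mu_{1,2m:2m} should reproduce.

```python
from math import comb, factorial

# Assumed illustrative model: F(x) = x on (0,1), outlier G(x) = x^2.
F, f = (lambda x: x), (lambda x: 1.0)
G, g = (lambda x: x * x), (lambda x: 2.0 * x)

def simpson(fun, a, b, m=200):
    step = (b - a) / m
    s = fun(a) + fun(b)
    for k in range(1, m):
        s += (4 if k % 2 else 2) * fun(a + k * step)
    return s * step / 3.0

def h2(i, j, n, x, y):
    """Joint density (5.6) of (Z_{i:n}, Z_{j:n}) on x < y."""
    c = factorial(n - 1)
    p, q = F(x), F(y)
    t = 0.0
    if i >= 2:
        t += (c / (factorial(i - 2) * factorial(j - i - 1) * factorial(n - j))
              * p**(i-2) * (q - p)**(j-i-1) * (1-q)**(n-j) * G(x) * f(x) * f(y))
    t += (c / (factorial(i - 1) * factorial(j - i - 1) * factorial(n - j))
          * p**(i-1) * (q - p)**(j-i-1) * (1-q)**(n-j) * g(x) * f(y))
    if j >= i + 2:
        t += (c / (factorial(i - 1) * factorial(j - i - 2) * factorial(n - j))
              * p**(i-1) * (q - p)**(j-i-2) * (1-q)**(n-j) * (G(y) - G(x)) * f(x) * f(y))
    t += (c / (factorial(i - 1) * factorial(j - i - 1) * factorial(n - j))
          * p**(i-1) * (q - p)**(j-i-1) * (1-q)**(n-j) * f(x) * g(y))
    if j <= n - 1:
        t += (c / (factorial(i - 1) * factorial(j - i - 1) * factorial(n - j - 1))
              * p**(i-1) * (q - p)**(j-i-1) * (1-q)**(n-j-1) * (1 - G(y)) * f(x) * f(y))
    return t

def mu2(i, j, n):
    """Product moment (5.14) by iterated Simpson quadrature."""
    return simpson(lambda x: simpson(lambda y: x * y * h2(i, j, n, x, y), x, 1.0),
                   0.0, 1.0)

def rhs_527(m):
    """RHS of (5.27) with nu_{r:r} = r/(r+1) and mu_{j:j} = (j+1)/(j+2)."""
    return sum((-1) ** (r - 1) * comb(2 * m - 1, r)
               * (r / (r + 1)) * ((2 * m - r + 1) / (2 * m - r + 2))
               for r in range(1, 2 * m))

for m in (1, 2):
    print(f"m={m}: mu_(1,2m:2m) = {mu2(1, 2 * m, 2 * m):.6f}, RHS of (5.27) = {rhs_527(m):.6f}")
```

For m = 1 both sides equal \nu_{1:1}\mu_{1:1} = 1/3, and for m = 2 both sides equal 1/5 in this model.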
5.4. Relations for covariances
Making use of the results derived in Sections 2 and 3, we now obtain upper bounds for the number
of single and product moments to be evaluated for calculating the means, variances and covariances of
order statistics in a sample of size n by assuming these quantities to be available in samples of sizes n-1
and less. The following theorem, proved by Balakrishnan (1988a), thus generalizes the result given in
Section 2.3 to the case when the order statistics arise from a sample containing a single outlier.
Theorem 5.14: In order to find the means, variances and covariances of order statistics in a sample of size
n out of which n-1 variables are from an arbitrary continuous population with cdf F(x) and one outlying
variable from another continuous population with cdf G(x), given these quantities in samples of sizes n-1
and less and also the quantities corresponding to the population with cdf F(x), one has to evaluate at most
two single moments and (n-2)/2 product moments for even values of n; and two single moments and
(n-1)/2 product moments for odd values of n.
Proof. In view of Relation 5.4, it is sufficient to evaluate just the two single moments \mu_{n:n} and \mu_{n:n}^{(2)} for
calculating the means and variances of all order statistics in a sample of size n. Further, as mentioned
earlier, Relation 5.9 would enable us to calculate all the product moments \mu_{i,j:n} (1 \le i < j \le n) if we know
n-1 suitably chosen moments, for example \mu_{i,i+1:n} (1 \le i \le n-1). When n is odd, we need to evaluate
only (n-1)/2 of these moments as the remaining (n-1)/2 product moments could be obtained by applying
Relation 5.11. Similarly, when n is even, say n = 2m, we need to evaluate only (n-2)/2 = m-1 of the
immediate upper-diagonal product moments as Relation 5.13 gives the product moment \mu_{m,m+1:2m} and
the remaining m-1 product moments could be obtained by applying Relation 5.11.
Relation 5.15: For 1 \le k \le n-1,
\sum_{j=2}^{n-k+1} \binom{n-j}{k-1} \mu_{1,j:n} + \sum_{j=2}^{k+1} \binom{n-j}{n-k-1} \mu_{1,j:n}
= \binom{n-1}{k} \nu_{1:k} \mu_{1:n-k} + \binom{n-1}{k-1} \nu_{1:n-k} \mu_{1:k}.   (5.29)
Proof. For 1 \le k \le n-1, first consider
\sum_{j=2}^{n-k+1} \binom{n-j}{k-1} \mu_{1,j:n} = \sum_{j=2}^{n-k+1} \binom{n-j}{k-1} \int\int_{W_1} xy h_{1,j:n}(x,y) dx dy,   (5.30)
where h_{1,j:n}(x,y) is as given in equation (5.6). Now upon interchanging the summation and the integral
signs and then using the binomial identity
\sum_{r=0}^{m} \binom{m}{r} {F(y)-F(x)}^r {1-F(y)}^{m-r} = {1-F(x)}^m,   (5.31)
we obtain from (5.30) that
\sum_{j=2}^{n-k+1} \binom{n-j}{k-1} \mu_{1,j:n} = \int\int_{W_1} xy H_{k,n}(x,y) dx dy,   (5.32)
where
H_{k,n}(x,y) = \frac{(n-1)!}{(k-1)!(n-k-1)!} {1-F(x)}^{n-k-1} {1-F(y)}^{k-1} g(x) f(y)
+ \frac{(n-1)!}{(k-1)!(n-k-1)!} {1-F(x)}^{n-k-1} {1-F(y)}^{k-1} f(x) g(y)
+ \frac{(n-1)!}{(k-1)!(n-k-2)!} {1-F(x)}^{n-k-2} {1-F(y)}^{k-1} {1-G(x)} f(x) f(y)
+ \frac{(n-1)!}{(k-2)!(n-k-1)!} {1-F(x)}^{n-k-1} {1-F(y)}^{k-2} {1-G(y)} f(x) f(y).   (5.33)
Next, consider for 1 \le k \le n-1
\sum_{j=2}^{k+1} \binom{n-j}{n-k-1} \mu_{1,j:n} = \sum_{j=2}^{k+1} \binom{n-j}{n-k-1} \int\int_{W_2} xy h_{1,j:n}(y,x) dy dx,   (5.34)
from equation (5.15). Now upon interchanging the summation and the integral signs and then using the
binomial identity
\sum_{r=0}^{m} \binom{m}{r} {F(x)-F(y)}^r {1-F(x)}^{m-r} = {1-F(y)}^m,
we obtain from (5.34) that
\sum_{j=2}^{k+1} \binom{n-j}{n-k-1} \mu_{1,j:n} = \int\int_{W_2} xy H_{k,n}(x,y) dx dy,   (5.35)
where H_{k,n}(x,y) is as defined in (5.33). Finally, upon adding equations (5.32) and (5.35), noting that
W_1 \cup W_2 = R^2, and then simplifying the resulting expression by using equations (5.7) and (5.8), we obtain
the relation in (5.29).
Note that Relation 5.15 involves the product moments \mu_{1,j:n} (2 \le j \le n) and first order single
moments only, and that there are only [n/2] distinct equations since the relation for k is the same as the relation for n-k. Thus, for even values of n, there are only n/2 equations in n-1 product moments and a
knowledge of (n-2)/2 of these would enable us to calculate all of these product moments provided the first
single moments in samples of sizes less than n are all known. Similarly, for odd values of n, we only need
to know (n-1)/2 product moments. Note that these bounds are exactly the same as the bounds given in
Theorem 5.14 for the product moments to be evaluated for the calculation of all the product moments.
This is quite expected, since the product moments \mu_{1,j:n} (2 \le j \le n), along with Relation 5.9, are also
sufficient for the evaluation of all the product moments.
Relation 5.16: For 1 \le i \le n-1,
\sum_{j=i+1}^{n} \mu_{i,j:n} + \sum_{r=1}^{i} \mu_{r,i+1:n} = (n-1) \nu_{1:1} \mu_{i:n-1} + \nu_{i:n-1} \mu_{1:1}.   (5.36)
Proof. For 1 \le i \le n-1, consider
\sum_{j=i+1}^{n} \mu_{i,j:n} = \sum_{j=i+1}^{n} \int\int_{W_1} xy h_{i,j:n}(x,y) dx dy,   (5.37)
where h_{i,j:n}(x,y) is as given in (5.6). Upon interchanging the summation and the integral signs and then
using the binomial identity in (5.31), we get from (5.37) that
\sum_{j=i+1}^{n} \mu_{i,j:n} = \int\int_{W_1} xy H*_{i,n}(x,y) dx dy,   (5.38)
where
H*_{i,n}(x,y) = \frac{(n-1)!}{(i-2)!(n-i-1)!} {F(x)}^{i-2} {1-F(x)}^{n-i-1} G(x) f(x) f(y)
+ \frac{(n-1)!}{(i-1)!(n-i-1)!} {F(x)}^{i-1} {1-F(x)}^{n-i-1} g(x) f(y)
+ \frac{(n-1)!}{(i-1)!(n-i-2)!} {F(x)}^{i-1} {1-F(x)}^{n-i-2} {1-G(x)} f(x) f(y)
+ \frac{(n-1)!}{(i-1)!(n-i-1)!} {F(x)}^{i-1} {1-F(x)}^{n-i-1} f(x) g(y).   (5.39)
Proceeding similarly, we also obtain
\sum_{r=1}^{i} \mu_{r,i+1:n} = \sum_{r=1}^{i} \int\int_{W_2} xy h_{r,i+1:n}(y,x) dy dx
= \int\int_{W_2} xy H*_{i,n}(x,y) dx dy,   (5.40)
where H*_{i,n}(x,y) is as defined above in (5.39). By adding equations (5.38) and (5.40), noting that
W_1 \cup W_2 = R^2, and then simplifying the resulting expression using equations (5.7) and (5.8), we derive the
required relation.
Relation 5.16 has been made use of by Balakrishnan (1988a) in deriving some similar relations
satisfied by the covariances of order statistics, viz., \sigma_{i,j:n} = Cov(Z_{i:n}, Z_{j:n}) = \mu_{i,j:n} - \mu_{i:n} \mu_{j:n} (1 \le i < j \le n). These generalize the results of Joshi and Balakrishnan (1982) to the case when the order statistics
arise from a sample comprising a single outlier.
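Relation 5.16 provides one of the simplest computational checks in this section. The sketch below evaluates both sides of (5.36) in the assumed uniform-plus-Beta(2,1) model used throughout these illustrations, with \nu_{1:1} = 1/2, \mu_{1:1} = 2/3, and \nu_{i:m} = i/(m+1).

```python
from math import factorial

# Assumed illustrative model: F(x) = x on (0,1), outlier G(x) = x^2.
F, f = (lambda x: x), (lambda x: 1.0)
G, g = (lambda x: x * x), (lambda x: 2.0 * x)

def simpson(fun, a, b, m=200):
    step = (b - a) / m
    s = fun(a) + fun(b)
    for k in range(1, m):
        s += (4 if k % 2 else 2) * fun(a + k * step)
    return s * step / 3.0

def h(i, n, x):
    """Density (5.4)."""
    t = 0.0
    if i >= 2:
        t += (factorial(n - 1) / (factorial(i - 2) * factorial(n - i))
              * F(x)**(i-2) * (1 - F(x))**(n-i) * G(x) * f(x))
    t += (factorial(n - 1) / (factorial(i - 1) * factorial(n - i))
          * F(x)**(i-1) * (1 - F(x))**(n-i) * g(x))
    if i <= n - 1:
        t += (factorial(n - 1) / (factorial(i - 1) * factorial(n - i - 1))
              * F(x)**(i-1) * (1 - F(x))**(n-i-1) * (1 - G(x)) * f(x))
    return t

def h2(i, j, n, x, y):
    """Joint density (5.6) on x < y."""
    c = factorial(n - 1)
    p, q = F(x), F(y)
    t = 0.0
    if i >= 2:
        t += (c / (factorial(i-2) * factorial(j-i-1) * factorial(n-j))
              * p**(i-2) * (q-p)**(j-i-1) * (1-q)**(n-j) * G(x) * f(x) * f(y))
    t += (c / (factorial(i-1) * factorial(j-i-1) * factorial(n-j))
          * p**(i-1) * (q-p)**(j-i-1) * (1-q)**(n-j) * g(x) * f(y))
    if j >= i + 2:
        t += (c / (factorial(i-1) * factorial(j-i-2) * factorial(n-j))
              * p**(i-1) * (q-p)**(j-i-2) * (1-q)**(n-j) * (G(y)-G(x)) * f(x) * f(y))
    t += (c / (factorial(i-1) * factorial(j-i-1) * factorial(n-j))
          * p**(i-1) * (q-p)**(j-i-1) * (1-q)**(n-j) * f(x) * g(y))
    if j <= n - 1:
        t += (c / (factorial(i-1) * factorial(j-i-1) * factorial(n-j-1))
              * p**(i-1) * (q-p)**(j-i-1) * (1-q)**(n-j-1) * (1-G(y)) * f(x) * f(y))
    return t

def mu(i, n):
    return simpson(lambda x: x * h(i, n, x), 0.0, 1.0)

def mu2(i, j, n):
    return simpson(lambda x: simpson(lambda y: x * y * h2(i, j, n, x, y), x, 1.0),
                   0.0, 1.0)

nu11, mu11 = 0.5, 2.0 / 3.0
for (i, n) in [(1, 3), (2, 4)]:
    lhs = (sum(mu2(i, j, n) for j in range(i + 1, n + 1))
           + sum(mu2(r, i + 1, n) for r in range(1, i + 1)))
    rhs = (n - 1) * nu11 * mu(i, n - 1) + (i / n) * mu11   # nu_{i:n-1} = i/n
    print(f"(5.36) n={n}, i={i}: LHS = {lhs:.6f}, RHS = {rhs:.6f}")
```

For n = 3, i = 1 the common value is 23/36 in this model.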
Relation 5.17: For 1 \le i \le n-1,
\sum_{j=i+1}^{n} \sigma_{i,j:n} + \sum_{j=1}^{i} \sigma_{j,i+1:n}
= [i \nu_{1:1} - \sum_{j=1}^{i} \mu_{j:n}] [\mu_{i+1:n} - \mu_{i:n}] - [\mu_{1:1} - \nu_{1:1}] [\mu_{i:n} - \nu_{i:n-1}].   (5.41)
Proof. Using the fact that \mu_{i,j:n} = \sigma_{i,j:n} + \mu_{i:n} \mu_{j:n} in (5.36), we get
\sum_{j=i+1}^{n} \sigma_{i,j:n} + \sum_{j=1}^{i} \sigma_{j,i+1:n}
= (n-1) \nu_{1:1} \mu_{i:n-1} + \nu_{i:n-1} \mu_{1:1} - \mu_{i:n} \sum_{j=i+1}^{n} \mu_{j:n} - \mu_{i+1:n} \sum_{j=1}^{i} \mu_{j:n}.   (5.42)
With the identity
\sum_{j=i+1}^{n} \mu_{j:n} = (n-1) \nu_{1:1} + \mu_{1:1} - \sum_{j=1}^{i} \mu_{j:n}
obtained from Relation 5.1, we have the RHS of equation (5.42) as
(n-1) \nu_{1:1} [\mu_{i:n-1} - \mu_{i:n}] + \mu_{1:1} [\nu_{i:n-1} - \mu_{i:n}] - [\mu_{i+1:n} - \mu_{i:n}] \sum_{j=1}^{i} \mu_{j:n}.
Now making use of the result that
(n-1) [\mu_{i:n-1} - \mu_{i:n}] = i [\mu_{i+1:n} - \mu_{i:n}] - [\nu_{i:n-1} - \mu_{i:n}]
obtained from Relation 5.2, we derive the relation in (5.41).
Note that Relations 5.16 and 5.17 give extremely simple and useful results for checking the calculations of product moments and covariances of order statistics from a sample of size n comprising a single
outlier. In particular, setting i = 1 and i = n-1 in equation (5.41), we get the identities
2 \sigma_{1,2:n} + \sum_{j=3}^{n} \sigma_{1,j:n} = [\nu_{1:1} - \mu_{1:n}] [\mu_{2:n} - \mu_{1:n}] - [\mu_{1:1} - \nu_{1:1}] [\mu_{1:n} - \nu_{1:n-1}]
and
2 \sigma_{n-1,n:n} + \sum_{j=1}^{n-2} \sigma_{j,n:n} = [\mu_{n:n} - \mu_{1:1}] [\mu_{n:n} - \mu_{n-1:n}] - [\mu_{1:1} - \nu_{1:1}] [\mu_{n-1:n} - \nu_{n-1:n-1}]
= [\mu_{n:n} - \mu_{1:1}] [\mu_{n:n} - \nu_{n-1:n-1}] - [\mu_{n:n} - \nu_{1:1}] [\mu_{n-1:n} - \nu_{n-1:n-1}].
The last equation has been obtained from the previous equation simply by rearranging the terms on the
right-hand side.
5.5. Results for symmetric outlier model
Let us assume that the density functions f(x) and g(x) are both symmetric about zero. It is then easy
to see from equations (5.4) and (5.6) that
h_{n-i+1:n}(x) = h_{i:n}(-x),   1 \le i \le n,
and
h_{n-j+1,n-i+1:n}(y,x) = h_{i,j:n}(-x,-y),   1 \le i < j \le n.
As a result, for a symmetric outlier model (the case when both f(x) and g(x) are symmetric about zero) we
have the relations
\mu_{n-i+1:n}^{(k)} = (-1)^k \mu_{i:n}^{(k)},   1 \le i \le n, k \ge 1,   (5.43)
\mu_{n-j+1,n-i+1:n} = \mu_{i,j:n},   1 \le i < j \le n,   (5.44)
and
\sigma_{n-j+1,n-i+1:n} = \mu_{n-j+1,n-i+1:n} - \mu_{n-j+1:n} \mu_{n-i+1:n} = \mu_{i,j:n} - \mu_{i:n} \mu_{j:n}
= \sigma_{i,j:n},   1 \le i \le j \le n.   (5.45)
Equations (5.43) - (5.45) could be used to simplify many of the results presented in Sections 2 through 4.
These relations then would help us reduce the bounds given in Theorem 5.14 for the case when the order
statistics arise from a symmetric outlier model.
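The symmetry relations (5.43) - (5.45) all flow from the pointwise identity h_{n-i+1:n}(x) = h_{i:n}(-x). The sketch below checks that identity on a grid for an assumed symmetric pair of populations, chosen only for illustration: a standard logistic F together with a scale-2 logistic outlier G.

```python
from math import exp, factorial

# Assumed symmetric example (not from the text): standard logistic F and a
# scale-2 logistic outlier G; both densities are symmetric about zero.
def F(x): return 1.0 / (1.0 + exp(-x))
def f(x): return exp(-x) / (1.0 + exp(-x)) ** 2
def G(x): return 1.0 / (1.0 + exp(-x / 2.0))
def g(x): return 0.5 * exp(-x / 2.0) / (1.0 + exp(-x / 2.0)) ** 2

def h(i, n, x):
    """Density (5.4) of Z_{i:n} in the single-outlier model."""
    t = 0.0
    if i >= 2:
        t += (factorial(n - 1) / (factorial(i - 2) * factorial(n - i))
              * F(x) ** (i - 2) * (1 - F(x)) ** (n - i) * G(x) * f(x))
    t += (factorial(n - 1) / (factorial(i - 1) * factorial(n - i))
          * F(x) ** (i - 1) * (1 - F(x)) ** (n - i) * g(x))
    if i <= n - 1:
        t += (factorial(n - 1) / (factorial(i - 1) * factorial(n - i - 1))
              * F(x) ** (i - 1) * (1 - F(x)) ** (n - i - 1) * (1 - G(x)) * f(x))
    return t

# Pointwise check of h_{n-i+1:n}(x) = h_{i:n}(-x), the source of (5.43)-(5.45).
n = 6
grid = [k / 10.0 for k in range(-50, 51)]
defect = max(abs(h(n - i + 1, n, x) - h(i, n, -x))
             for i in range(1, n + 1) for x in grid)
print("largest symmetry defect on the grid:", defect)
```

The defect is zero up to floating-point rounding, confirming the mirror symmetry term by term.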
Relation 5.18: For n even, say n = 2m,
\mu_{m:2m-1}^{(k)} = 0   for odd k,   (5.46)
= \frac{1}{2m-1} {2m \mu_{m:2m}^{(k)} - \nu_{m:2m-1}^{(k)}}   for even k.   (5.47)
Proof. Equation (5.46) is obtained from (5.43) simply by setting n = 2m-1 and i = m. Equation (5.47), on
the other hand, follows directly from Relation 5.3 upon using the result that \mu_{m+1:2m}^{(k)} = \mu_{m:2m}^{(k)} for even
values of k.
Next, upon using (5.43) and (5.44) in Relation 5.10, we also derive the following result.
Relation 5.19: For 1 \le i < j \le n,
{1+(-1)^n} [\mu_{i,j:n} - \nu_{i,j:n}] = \frac{1}{n} \sum_{r=1}^{j-i} (-1)^{j-i-r-1} \binom{n}{j-r} \binom{j-r-1}{i-1} {(j-r) \nu_{r:n-j+r} [\mu_{1:j-r} - \nu_{1:j-r}]
+ (n-j+r) \nu_{1:j-r} [\mu_{r:n-j+r} - \nu_{r:n-j+r}]}
+ \sum_{r=1}^{i-1} (-1)^{n-r-1} \binom{n-1}{r} {\mu_{n-j+1,n-i+1:n-r} - \nu_{n-j+1,n-i+1:n-r}}
+ \sum_{r=0}^{i-1} \sum_{s=1}^{n-j} (-1)^{n-r-s-1} \binom{n-1}{s} \binom{n-1-s}{r} {\mu_{n-j-s+1,n-i-s+1:n-r-s} - \nu_{n-j-s+1,n-i-s+1:n-r-s}}.   (5.48)
Note that the right-hand side of the above relation involves only the expected values and product
moments in samples of sizes n-1 and less. In particular, for even values of n we get from (5.48) that
2 \mu_{i,j:n} = 2 \mu_{n-j+1,n-i+1:n}
= 2 \nu_{i,j:n} + \sum_{r=1}^{i-1} (-1)^{r-1} \binom{n-1}{r} {\mu_{n-j+1,n-i+1:n-r} - \nu_{n-j+1,n-i+1:n-r}}
+ \frac{1}{n} \sum_{r=1}^{j-i} (-1)^{j-i-r-1} \binom{n}{j-r} \binom{j-r-1}{i-1} {(j-r) \nu_{r:n-j+r} [\mu_{1:j-r} - \nu_{1:j-r}]
+ (n-j+r) \nu_{1:j-r} [\mu_{r:n-j+r} - \nu_{r:n-j+r}]}
+ \sum_{r=0}^{i-1} \sum_{s=1}^{n-j} (-1)^{r+s-1} \binom{n-1}{s} \binom{n-1-s}{r} {\mu_{n-j-s+1,n-i-s+1:n-r-s} - \nu_{n-j-s+1,n-i-s+1:n-r-s}}.   (5.49)
This result, established by Balakrishnan (1988a), generalizes the relation in (2.43) to the case when the
order statistics arise from a symmetric outlier model. Also from equation (5.49) we may note that we do
not have to evaluate any product moments in a sample of size n whenever n is even. In addition, by
setting i = n-1 and j = n in equation (5.49) and using the fact that \mu_{1:1} = \nu_{1:1} = 0, we get
2 \mu_{1,2:n} = 2 \nu_{1,2:n} + \sum_{r=1}^{n-2} (-1)^{r-1} \binom{n-1}{r} {\mu_{1,2:n-r} - \nu_{1,2:n-r}}
which, upon using the result in (2.44), yields the relation
2 \mu_{1,2:n} = \sum_{r=1}^{n-2} (-1)^{r-1} {\binom{n-1}{r} \mu_{1,2:n-r} + \binom{n-1}{r-1} \nu_{1,2:n-r}}   (5.50)
for even values of n.
In the following theorem, which is an analogue of Theorem 5.14 for the symmetric outlier model
case, we essentially make use of equations (5.46) - (5.49) in order to determine the upper bounds for the
number of single and product moments to be evaluated in a sample of size n.
Theorem 5.20: In order to determine the means, variances and covariances of order statistics in a sample
of size n out of which n-1 variables are from an arbitrary continuous symmetric population with cdf F(x)
and one outlying variable from another continuous symmetric population with cdf G(x), given these quantities in samples of sizes n-1 and less and also the quantities corresponding to the population with cdf F(x),
one has to evaluate at most one single moment if n is even; and one single moment and (n-1)/2 product
moments if n is odd.
Proof. In order to compute the first single moments \mu_{i:n} (1 \le i \le n), because of Relation 5.4 we have to
evaluate at most one single moment if n is even and no single moment if n is odd, as in this case we have
\mu_{(n+1)/2:n} = 0 from equation (5.46). Next, in order to compute the second single moments \mu_{i:n}^{(2)} (1 \le i \le n),
we have to evaluate at most one single moment if n is odd and no single moment if n is even, as in this
case we have
\mu_{n/2:n}^{(2)} = \frac{1}{n} {(n-1) \mu_{n/2:n-1}^{(2)} + \nu_{n/2:n-1}^{(2)}}
from equation (5.47). Finally, in order to obtain all the product moments \mu_{i,j:n} (1 \le i < j \le n), we note
from Theorem 5.14 that we have to evaluate at most (n-1)/2 product moments if n is odd; however, for
even values of n we do not have to evaluate any product moment as in this case we could compute all the
product moments by using equation (5.49). Hence, the theorem.
5.6. Results for two related outlier models
In the previous section we have established some relations for both single and product moments of
order statistics from a symmetric outlier model. In this section we once again consider the moments of
order statistics from a symmetric outlier model and express them in terms of the moments of order statistics
in samples drawn from the population with pdf f*(x) (obtained by folding the pdf f(x) at zero) and the moments of order statistics in samples drawn from the population with pdf f*(x) comprising a single outlier with pdf g*(x) (obtained by folding the pdf g(x) at zero). These results have been proved recently by Balakrishnan (1988b) and they generalize the relations presented in Section 2.6 to the case when the order statistics arise from a symmetric outlier model. These results have also been successfully applied by Balakrishnan and Ambagaspitiya (1988) in order to evaluate the means, variances and covariances of order statistics from a single scale-outlier double exponential model. They have then used these quantities in order to examine the variances of various location estimators expressible as linear functions of the order statistics. Similar work for the normal case has been carried out by David and Shu (1978).
In this regard, let us denote for x > 0

F*(x) = 2 F(x) - 1,   f*(x) = 2 f(x),   (5.51)

and

G*(x) = 2 G(x) - 1,   g*(x) = 2 g(x).   (5.52)
That is, the density functions f*(x) and g*(x) are obtained by folding the density functions f(x) and g(x) at the point zero, respectively. Let us now denote the single and the product moments of order statistics in a random sample of size n drawn from a population with pdf f*(x) and cdf F*(x) defined in (5.51) by ν_{i:n}^{*(k)} (1 ≤ i ≤ n, k ≥ 1) and ν_{i,j:n}^{*} (1 ≤ i < j ≤ n). Furthermore, let μ_{i:n}^{*(k)} (1 ≤ i ≤ n, k ≥ 1) and μ_{i,j:n}^{*} (1 ≤ i < j ≤ n) denote the single and the product moments of order statistics obtained from a sample of n independent random variables out of which n-1 have pdf f*(x) and cdf F*(x) defined in (5.51) and one variable has pdf g*(x) and cdf G*(x) defined in (5.52). Then Balakrishnan (1988b) has essentially derived some simple formulae expressing the moments μ_{i:n}^{(k)} and μ_{i,j:n} in terms of μ_{i:n}^{*(k)}, ν_{i:n}^{*(k)}, μ_{i,j:n}^{*} and ν_{i,j:n}^{*}.

Relation 5.21: For 1 ≤ i ≤ n and k = 1,2,...,
2^n μ_{i:n}^{(k)} = Σ_{r=1}^{i-1} C(n-1, r-1) ν_{i-r:n-r}^{*(k)} + (-1)^k Σ_{r=i}^{n-1} C(n-1, r) ν_{r-i+1:r}^{*(k)}

             + Σ_{r=0}^{i-1} C(n-1, r) μ_{i-r:n-r}^{*(k)} + (-1)^k Σ_{r=i}^{n} C(n-1, r-1) μ_{r+1-i:r}^{*(k)}.   (5.53)
Proof. First, express the single moment μ_{i:n}^{(k)} as the sum of three integrals using equations (5.4) and (5.7) and split each integral into two parts, one on the range (-∞,0) and the other on the range (0,∞). In the integrals on the range (-∞,0), make a substitution such that the range becomes (0,∞), use the facts F(-x) = 1-F(x) and G(-x) = 1-G(x) along with equations (5.51) and (5.52), and express the integrands in terms of F*, G*, f* and g*. Now expand (1 + F*(x))^m in powers of F*(x), integrate termwise, and the relation in (5.53) readily follows.
A similar relation for the product moments is established in the following result.
Relation 5.22: For 1 ≤ i < j ≤ n,

2^n μ_{i,j:n} = Σ_{r=1}^{i-1} C(n-1, r-1) ν_{i-r,j-r:n-r}^{*} + Σ_{r=j}^{n-1} C(n-1, r) ν_{r+1-j,r+1-i:r}^{*}

            + Σ_{r=0}^{i-1} C(n-1, r) μ_{i-r,j-r:n-r}^{*} + Σ_{r=j}^{n} C(n-1, r-1) μ_{r+1-j,r+1-i:r}^{*}

            - Σ_{r=i}^{j-1} C(n-1, r-1) ν_{j-r:n-r}^{*} μ_{r+1-i:r}^{*} - Σ_{r=i}^{j-1} C(n-1, r) ν_{r+1-i:r}^{*} μ_{j-r:n-r}^{*}.   (5.54)
Proof. First of all, express the product moment μ_{i,j:n} as the sum of five integrals using equations (5.6) and (5.14). Next, by noting that

W_1 = {(x,y): -∞ < x < y < ∞} = R_1 ∪ R_2 ∪ R_3,

where

R_1 = {(x,y): 0 < x < y < ∞},   R_2 = {(x,y): -∞ < x < 0, 0 < y < ∞}

and

R_3 = {(x,y): -∞ < x < y < 0},

split each of the five integrals into three parts. In the integrals on the range R_1, express F and G in terms of F* and G*, and f and g in terms of f* and g*, respectively, expand (1 + F*)^m binomially in powers of F*, and integrate termwise. In the integrals on the range R_2, make a substitution x = -z, use the results that F(-x) = 1-F(x) and G(-x) = 1-G(x), express the integrand in terms of F*, G*, f* and g*, expand (F*(y) + F*(x))^m binomially in powers of F*(y) and F*(x), and then integrate termwise. Finally, in the integrals on the range R_3, put x = -u and y = -v, use the results that F(-z) = 1-F(z) and G(-z) = 1-G(z), express the integrand in terms of F*, G*, f* and g*, expand (1 + F*)^m binomially in powers of F*, and integrate termwise. Combining all these results and simplifying the resulting expression, we derive the relation in (5.54).
Remark 5.23: If the moments ν_{i:m}^{*(k)}, μ_{i:m}^{*(k)}, ν_{i,j:m}^{*} and μ_{i,j:m}^{*} are all available for sample sizes up to n, then all the single moments μ_{i:n}^{(k)} (1 ≤ i ≤ n) and the product moments μ_{i,j:n} (1 ≤ i < j ≤ n) of order statistics in a sample of size n from a symmetric outlier model (with a single outlier present) could be computed by using Relations 5.21 and 5.22. Thus, for example, Balakrishnan and Ambagaspitiya (1988) have applied Relations 5.21 and 5.22 in computing the means, variances and covariances of order statistics from a single scale-outlier double exponential model by making use of the single and the product moments of order statistics from a standard exponential distribution and also the single and the product moments of order statistics from a single scale-outlier exponential model.
Remark 5.24: In particular, if we set G(x) ≡ F(x) (that is, when the variable Y in the sample is not an outlier), then Relations 5.21 and 5.22 simply reduce to

2^n ν_{i:n}^{(k)} = Σ_{r=0}^{i-1} C(n, r) ν_{i-r:n-r}^{*(k)} + (-1)^k Σ_{r=i}^{n} C(n, r) ν_{r+1-i:r}^{*(k)}

and

2^n ν_{i,j:n} = Σ_{r=0}^{i-1} C(n, r) ν_{i-r,j-r:n-r}^{*} + Σ_{r=j}^{n} C(n, r) ν_{r+1-j,r+1-i:r}^{*} - Σ_{r=i}^{j-1} C(n, r) ν_{r+1-i:r}^{*} ν_{j-r:n-r}^{*},

respectively. Note that these are precisely the same as the results presented in Section 2.6.
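As a sanity check on the reduced relations, the single-moment identity (k = 1) can be verified numerically for a concrete symmetric parent. The sketch below is ours, not from the text: for a Uniform(-1,1) population the folded population is Uniform(0,1), whose order-statistic means are ν_{i:n}^{*} = i/(n+1), while ν_{i:n} = 2i/(n+1) - 1.

```python
from math import comb

# Order-statistic means: for U(0,1), E[U_{i:n}] = i/(n+1);
# for the symmetric parent U(-1,1), E[X_{i:n}] = 2i/(n+1) - 1.
def nu_star(i, n):  # folded population U(0,1)
    return i / (n + 1)

def nu(i, n):       # symmetric population U(-1,1)
    return 2 * i / (n + 1) - 1

# Check  2^n nu_{i:n} = sum_{r=0}^{i-1} C(n,r) nu*_{i-r:n-r}
#                       - sum_{r=i}^{n} C(n,r) nu*_{r+1-i:r}   (k = 1)
for n in range(2, 8):
    for i in range(1, n + 1):
        lhs = 2**n * nu(i, n)
        rhs = sum(comb(n, r) * nu_star(i - r, n - r) for r in range(i)) \
            - sum(comb(n, r) * nu_star(r + 1 - i, r) for r in range(i, n + 1))
        assert abs(lhs - rhs) < 1e-12
```

The same bookkeeping extends to any k and to the full outlier relations once the folded outlier moments μ* are available.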
Remark 5.25: In using Relations 5.21 and 5.22, an error could essentially arise from two sources: (i) due to approximations in the coefficients, and (ii) due to approximations in the pivotal quantities. Since the coefficients occurring in both the relations are simple binomial coefficients which are integral and could be evaluated exactly at least for small values of n, we may assume the error involved due to the approximations in the coefficients to be zero. Now if ε and ε' are the errors involved in approximating each of the pivotal values ν^{*(k)} and μ^{*(k)}, respectively, then the maximum cumulative rounding error in evaluating μ_{i:n}^{(k)} (1 ≤ i ≤ n) by means of Relation 5.21 is given by

2^{-n} [ ε Σ_{r=1}^{i-1} C(n-1, r-1) + ε Σ_{r=i}^{n-1} C(n-1, r) + ε' Σ_{r=0}^{i-1} C(n-1, r) + ε' Σ_{r=i}^{n} C(n-1, r-1) ]

  ≤ 2^{-n} ε* [ Σ_{r=0}^{n-1} C(n-1, r) + Σ_{r=1}^{n} C(n-1, r-1) ] = 2^{-(n-1)} ε* Σ_{r=0}^{n-1} C(n-1, r) = ε*,

where ε* = max(ε, ε'). That is, the maximum error involved in numerically computing the single moments μ_{i:n}^{(k)} (1 ≤ i ≤ n, k ≥ 1) using the moments ν^{*(k)} and μ^{*(k)} is at most ε* = max(ε, ε'), where ε and ε' are respectively the maximum errors involved in approximating each of the pivotal quantities ν^{*(k)} and μ^{*(k)}. Hence, if the moments ν_{i:m}^{*(k)}, μ_{i:m}^{*(k)}, ν_{i,j:m}^{*} and μ_{i,j:m}^{*} are computed sufficiently accurately, then the single moments μ_{i:n}^{(k)} (1 ≤ i ≤ n, k ≥ 1) and the product moments μ_{i,j:n} (1 ≤ i < j ≤ n) could all be computed by using Relations 5.21 and 5.22 without accumulating serious rounding errors, at least for small sample sizes.
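The bound rests on the four binomial sums totalling exactly 2^n for every i, so the total weight multiplying ε* is one. A quick illustrative check (ours):

```python
from math import comb

# The four binomial sums appearing in the error bound of Remark 5.25
# total exactly 2^n for every i, so the cumulative error is <= eps*.
for n in range(2, 12):
    for i in range(1, n + 1):
        total = (sum(comb(n - 1, r - 1) for r in range(1, i))
                 + sum(comb(n - 1, r) for r in range(i, n))
                 + sum(comb(n - 1, r) for r in range(i))
                 + sum(comb(n - 1, r - 1) for r in range(i, n + 1)))
        assert total == 2**n
```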
5.7. Functional behaviour of order statistics
Let us first consider the special case of a location-outlier model, i.e., G(x) = F(x-λ) for all x. We could now write Y = X_n + λ, where X_n is a random variable with pdf f(x) and cdf F(x), independent of the remaining n-1 variables X_1, X_2, ..., X_{n-1}. For convenience let us denote the i'th order statistic in this case by Z_{i:n}(λ), its pdf by h_{i:n}(x;λ) and the cdf by H_{i:n}(x;λ). Then it could be easily seen from equation (5.3) that H_{i:n}(x;λ) is a decreasing function of λ. The behaviour of Z_{i:n}(λ) as a function of λ has been studied in detail by David and Shu (1978) (also see Hampel, 1974).
Denoting the observed values of the random variables X, Y and Z by x, y and z, and inserting y = x_n + λ into the ordered sample of size n-1, viz., x_{1:n-1} ≤ x_{2:n-1} ≤ ... ≤ x_{n-1:n-1}, we then have for fixed values of x_1, x_2, ..., x_n that

z_{1:n}(λ) = x_n + λ      if x_n + λ ≤ x_{1:n-1}
           = x_{1:n-1}    if x_n + λ > x_{1:n-1},   (5.55)

for i = 2,3,...,n-1

z_{i:n}(λ) = x_{i-1:n-1}  if x_n + λ ≤ x_{i-1:n-1}
           = x_n + λ      if x_{i-1:n-1} < x_n + λ ≤ x_{i:n-1}   (5.56)
           = x_{i:n-1}    if x_n + λ > x_{i:n-1},

and

z_{n:n}(λ) = x_{n-1:n-1}  if x_n + λ ≤ x_{n-1:n-1}
           = x_n + λ      if x_n + λ > x_{n-1:n-1}.   (5.57)

From equations (5.55) - (5.57), we see that z_{i:n}(λ) is a nondecreasing function of λ; also

lim_{λ→-∞} z_{1:n}(λ) = z_{1:n}(-∞) = -∞,
lim_{λ→-∞} z_{i:n}(λ) = z_{i:n}(-∞) = x_{i-1:n-1},  2 ≤ i ≤ n,
lim_{λ→∞} z_{i:n}(λ) = z_{i:n}(∞) = x_{i:n-1},  1 ≤ i ≤ n-1,

and

lim_{λ→∞} z_{n:n}(λ) = z_{n:n}(∞) = ∞.   (5.58)
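The piecewise expressions (5.55) - (5.57) say nothing more than: insert the value x_n + λ into the ordered reduced sample and read off the i'th smallest. A small illustrative sketch (our own notation and data):

```python
import bisect

def z(i, base_sorted, xn, lam):
    """i'th order statistic (1-indexed) after inserting the outlying
    value xn + lam into the sorted reduced sample of size n-1."""
    s = list(base_sorted)
    bisect.insort(s, xn + lam)
    return s[i - 1]

base = [-1.2, -0.3, 0.4, 1.1]   # x_{1:4} <= ... <= x_{4:4}, so n = 5
xn = 0.0

# z_{i:n}(lambda) is nondecreasing in lambda ...
for i in range(1, 6):
    vals = [z(i, base, xn, lam) for lam in (-5, -1, 0, 1, 5)]
    assert vals == sorted(vals)

# ... and for 2 <= i <= n-1 its limits are x_{i-1:n-1} and x_{i:n-1}
assert z(3, base, xn, -100) == base[1]   # z_{3:5}(-inf) = x_{2:4}
assert z(3, base, xn, +100) == base[2]   # z_{3:5}(+inf) = x_{3:4}
```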
For a finite value of λ we see from equations (5.55) - (5.57) that E(Z_{i:n}(λ)) = μ_{i:n}(λ), 1 ≤ i ≤ n, exists if E(X) exists. By using the monotone convergence theorem (for example, see Loeve, 1977) we also get for 2 ≤ i ≤ n

lim_{λ→-∞} E(Z_{i:n}(λ)) = E[ lim_{λ→-∞} Z_{i:n}(λ) ],

implying that

μ_{i:n}(-∞) = E(X_{i-1:n-1}) = ν_{i-1:n-1};   (5.59)

similarly, for 1 ≤ i ≤ n-1,

lim_{λ→∞} E(Z_{i:n}(λ)) = E[ lim_{λ→∞} Z_{i:n}(λ) ],

implying that

μ_{i:n}(∞) = E(X_{i:n-1}) = ν_{i:n-1}.   (5.60)

Also,

μ_{1:n}(-∞) = -∞  and  μ_{n:n}(∞) = ∞.   (5.61)

Furthermore, upon noting that for fixed x and y

f(x-λ) → 0,  F(x-λ) → 0,  F(y-λ) - F(x-λ) → 0

as λ → ∞, we have from equations (5.4) and (5.6) that

lim_{λ→∞} h_{i:n}(x;λ) = h_{i:n}(x;∞) = f_{i:n-1}(x),  1 ≤ i ≤ n-1,   (5.62)

and

lim_{λ→∞} h_{i,j:n}(x,y;λ) = h_{i,j:n}(x,y;∞) = f_{i,j:n-1}(x,y),  1 ≤ i < j ≤ n-1.   (5.63)

Using a similar argument we also have

lim_{λ→-∞} h_{i:n}(x;λ) = h_{i:n}(x;-∞) = f_{i-1:n-1}(x),  2 ≤ i ≤ n,   (5.64)

and

lim_{λ→-∞} h_{i,j:n}(x,y;λ) = h_{i,j:n}(x,y;-∞) = f_{i-1,j-1:n-1}(x,y),  2 ≤ i < j ≤ n.   (5.65)

Now upon using the Lebesgue dominated convergence theorem (see Loeve, 1977) we obtain

lim_{λ→∞} σ_{i,j:n}(λ) = σ_{i,j:n}(∞) = Cov(X_{i:n-1}, X_{j:n-1}),  1 ≤ i ≤ j ≤ n-1,   (5.66)

and

lim_{λ→-∞} σ_{i,j:n}(λ) = σ_{i,j:n}(-∞) = Cov(X_{i-1:n-1}, X_{j-1:n-1}),  2 ≤ i ≤ j ≤ n.   (5.67)
Next, let us consider a scale-outlier model, i.e., G(x) = F(x/τ) for all x, where τ > 0. In this case we could write Y = τX_n, where X_n is a random variable with pdf f(x) and cdf F(x), independent of the n-1 variates X_1, X_2, ..., X_{n-1}. For convenience let us denote the i'th order statistic in this case by Z_{i:n}^{*}(τ), its pdf by h_{i:n}^{*}(x;τ) and the cdf by H_{i:n}^{*}(x;τ). From equation (5.3) it could then be seen that H_{i:n}^{*}(x;τ) is a decreasing function of τ for fixed positive values of x and an increasing function of τ for fixed negative values of x. The functional behaviour of Z_{i:n}^{*}(τ) as a function of τ has been studied by David and Shu (1978). As before, denoting the realizations of the variables X, Y and Z by x, y and z, and inserting y = τx_n into the ordered sample x_{1:n-1} ≤ x_{2:n-1} ≤ ... ≤ x_{n-1:n-1}, we have for fixed values of x_1, x_2, ..., x_n that

z_{1:n}^{*}(τ) = τx_n        if τx_n ≤ x_{1:n-1}
              = x_{1:n-1}    if τx_n > x_{1:n-1},   (5.68)

for 2 ≤ i ≤ n-1

z_{i:n}^{*}(τ) = x_{i-1:n-1}  if τx_n ≤ x_{i-1:n-1}
              = τx_n          if x_{i-1:n-1} < τx_n ≤ x_{i:n-1}   (5.69)
              = x_{i:n-1}     if τx_n > x_{i:n-1},

and

z_{n:n}^{*}(τ) = x_{n-1:n-1}  if τx_n ≤ x_{n-1:n-1}
              = τx_n          if τx_n > x_{n-1:n-1}.   (5.70)

From equations (5.68) - (5.70), we see that z_{i:n}^{*}(τ) is nondecreasing in τ if x_n > 0 and is nonincreasing in τ if x_n < 0. Moreover, for 2 ≤ i ≤ n-1,

lim_{τ→∞} z_{i:n}^{*}(τ) = z_{i:n}^{*}(∞) = x_{i:n-1}    if x_n > 0
                                          = x_{i-1:n-1}  if x_n < 0.   (5.71)

Now if X is symmetrically distributed about zero, we have from equation (5.71) that for 2 ≤ i ≤ n-1

lim_{τ→∞} E(Z_{i:n}^{*}(τ)) = E[ lim_{τ→∞} Z_{i:n}^{*}(τ) ] = E[ z_{i:n}^{*}(∞) ]
                            = Pr[X_n > 0] E[X_{i:n-1}] + Pr[X_n < 0] E[X_{i-1:n-1}],

which implies

μ_{i:n}^{*}(∞) = (1/2) [ ν_{i-1:n-1} + ν_{i:n-1} ].   (5.72)
In addition, we also have

lim_{τ→∞} E(Z_{1:n}^{*}(τ)) = μ_{1:n}^{*}(∞) = -∞

and

lim_{τ→∞} E(Z_{n:n}^{*}(τ)) = μ_{n:n}^{*}(∞) = ∞.
We shall make use of all these results in the next section in order to examine the bias and the mean square
error of various estimators of the location parameter that are expressible as linear functions of the order
statistics when an unidentified single outlying observation is present in a sample of size n.
5.8. Applications in robustness studies
The robust estimation of the location and scale parameters of symmetric populations, in particular the normal distribution, has been of considerable interest in the recent past. For various symmetric populations,
Crow and Siddiqui (1967) have studied the efficiency of some estimators of the location parameter such as
the median, Winsorized means and trimmed means. Based on Monte Carlo methods, Andrews et al. (1972)
have carried out a similar study for a much larger class of robust estimators of the location parameter that
also includes many adaptive estimators which essentially adapt themselves to some special features of a
given sample. One could also refer to Tiku et al. (1986) for a detailed comparative study of various robust
estimators of the location and scale parameters.
Here we shall restrict our attention to the following estimators of the location parameter that are
based on a sample of size n.
(a) Sample mean:

X̄_n = (1/n) Σ_{i=1}^{n} Z_{i:n};

(b) Trimmed means:

T_n(r) = [1/(n-2r)] Σ_{i=r+1}^{n-r} Z_{i:n};

(c) Winsorized means:

W_n(r) = (1/n) [ Σ_{i=r+2}^{n-r-1} Z_{i:n} + (r+1) (Z_{r+1:n} + Z_{n-r:n}) ];

(d) Modified maximum likelihood estimators:

M_n(r) = (1/m) [ Σ_{i=r+2}^{n-r-1} Z_{i:n} + (1+rβ) (Z_{r+1:n} + Z_{n-r:n}) ],

where m = n - 2r + 2rβ; Tiku (1967, 1980) has given the expression for β while Tiku et al. (1986) have tabulated the values of β for various choices of n and r;

(e) Linearly weighted means:

L_n(r) = [1/(2(n/2 - r)^2)] Σ_{i=1}^{n/2-r} (2i-1) [Z_{r+i:n} + Z_{n-r-i+1:n}]

for even values of n;

(f) Gastwirth mean:

G_n = 0.3 (Z_{[n/3]+1:n} + Z_{n-[n/3]:n}) + 0.2 (Z_{n/2:n} + Z_{n/2+1:n})

for even values of n, where [n/3] denotes the integral part of n/3.
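The estimators (a) - (f) can be computed directly from a sorted sample. The sketch below is ours, not from the text; β for M_n(r) is passed in as a parameter, since Tiku's tabulated values depend on n and r, and L_n(r) and G_n assume even n as stated above.

```python
def sample_mean(z):
    return sum(z) / len(z)

def trimmed_mean(z, r):
    # drop the r smallest and r largest
    n = len(z)
    return sum(z[r:n - r]) / (n - 2 * r)

def winsorized_mean(z, r):
    # replace the r extremes on each side by the nearest retained value
    n = len(z)
    core = sum(z[r + 1:n - r - 1])
    return (core + (r + 1) * (z[r] + z[n - r - 1])) / n

def mml_estimator(z, r, beta):
    # Tiku's modified maximum likelihood estimator; beta from his tables
    n = len(z)
    m = n - 2 * r + 2 * r * beta
    core = sum(z[r + 1:n - r - 1])
    return (core + (1 + r * beta) * (z[r] + z[n - r - 1])) / m

def linearly_weighted_mean(z, r):
    n = len(z)                       # n assumed even
    m = n // 2 - r
    s = sum((2 * i - 1) * (z[r + i - 1] + z[n - r - i]) for i in range(1, m + 1))
    return s / (2 * m * m)

def gastwirth_mean(z):
    n = len(z)                       # n assumed even
    t = n // 3
    return 0.3 * (z[t] + z[n - t - 1]) + 0.2 * (z[n // 2 - 1] + z[n // 2])
```

For a constant sample every one of these location estimators returns that constant, which is a convenient check of the index bookkeeping.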
For the location-outlier model, the estimators considered above may all be written as

M(λ) = Σ_{i=1}^{n} a_i Z_{i:n}(λ).

From equations (5.55) - (5.57) it could be immediately seen that Σ_{i=1}^{n} a_i z_{i:n}(λ) is a nondecreasing continuous function of λ and, in addition, E(M(λ)) = Σ_{i=1}^{n} a_i μ_{i:n}(λ) is an increasing function of λ, with E(M(∞)) = ∞ except when a_n = 0, and E(M(-∞)) = -∞ except when a_1 = 0. From equations (5.58) and (5.59), we get when a_1 = 0 that

E(M(-∞)) = Σ_{i=2}^{n} a_i ν_{i-1:n-1},

and similarly when a_n = 0 we get from equations (5.58) and (5.60) that

E(M(∞)) = Σ_{i=1}^{n-1} a_i ν_{i:n-1}.

Making use of the tables of expected values of normal order statistics under the location-outlier model prepared by David et al. (1977), the bias of various estimators of the mean μ of a normal N(μ,1) population, based on a sample of size n = 10 with one observation being from a normal N(μ+λ,1) distribution, has been computed for some specific values of λ. These are presented in Table 5.1 given below.
Table 5.1
Bias of various estimators of μ for n = 10 when a single observation
is from N(μ+λ,1) and the others from N(μ,1)

                                        λ
Estimator    0.0     0.5      1.0      1.5      2.0      3.0      4.0       ∞
X̄_10        0.0   0.05000  0.10000  0.15000  0.20000  0.30000  0.40000     ∞
T_10(1)      0.0   0.04912  0.09325  0.12870  0.15400  0.17871  0.18470  0.18563
T_10(2)      0.0   0.04869  0.09023  0.12041  0.13904  0.15311  0.15521  0.15538
Med_10       0.0   0.04832  0.08768  0.11381  0.12795  0.13642  0.13723  0.13726
W_10(1)      0.0   0.04938  0.09506  0.13368  0.16298  0.19407  0.20239  0.20377
W_10(2)      0.0   0.04889  0.09156  0.12389  0.14497  0.16217  0.16504  0.16530
M_10(1)      0.0   0.04934  0.09484  0.13311  0.16194  0.19229  0.20037  0.20169
M_10(2)      0.0   0.04886  0.09137  0.12342  0.14418  0.16091  0.16369  0.16394
L_10(1)      0.0   0.04869  0.09024  0.12056  0.13954  0.15459  0.15727  0.15758
L_10(2)      0.0   0.04850  0.08892  0.11700  0.13328  0.14436  0.14576  0.14585
G_10         0.0   0.04847  0.08873  0.11649  0.13237  0.14285  0.14407  0.14414
By looking at the values of the bias of various estimators given in Table 5.1, we get the ordering

X̄_10 ≺ W_10(1) ≺ M_10(1) ≺ T_10(1) ≺ W_10(2) ≺ M_10(2) ≺ L_10(1)
      ≺ T_10(2) ≺ L_10(2) ≺ G_10 ≺ Med_10,   (5.73)

where ≺ denotes "inferior to". It should be noted that the trimmed means have a smaller bias than the corresponding modified maximum likelihood estimators which, in turn, have a smaller bias than the corresponding Winsorized means. Also, as rightly pointed out by David and Shu (1978), the median is more biased than what we may naively have thought it to be. In addition, the estimators based on the 6 central order statistics are seen to be less subject to bias than those based on the 8 central order statistics. This is not surprising as we would expect an estimator omitting the 4 extreme observations (two smallest and two largest) to exclude the single outlier present in the sample with a larger probability as compared to an estimator omitting only the 2 extreme observations (one smallest and one largest). For the case when n = 10 and λ = 2, for example, we have the following values of Pr{rank(Y) = i} = π_i from an extensive set of tables prepared by Milton (1970):

  i      1      2      3      4      5      6      7      8      9     10
  π_i  0.001  0.003  0.005  0.009  0.014  0.023  0.040  0.072  0.159  0.674

From these values we see that an estimator omitting the four extreme observations (two smallest and two largest) excludes the outlier with probability 0.837, while an estimator omitting the two extreme observations (one smallest and one largest) excludes the outlier with probability 0.675 only.
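Milton's probabilities can be approximated from first principles: conditioning on Y = x, rank(Y) = i requires exactly i-1 of the n-1 remaining N(0,1) values to fall below x, so π_i = C(n-1, i-1) ∫ Φ(x)^{i-1} (1-Φ(x))^{n-i} φ(x-λ) dx. A numerical sketch (ours, simple midpoint quadrature):

```python
from math import comb, erf, exp, pi, sqrt

def Phi(x):   # standard normal cdf
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def phi(x):   # standard normal pdf
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)

def rank_probs(n, lam, lo=-12.0, hi=16.0, steps=20000):
    """pi_i = Pr{rank(Y) = i} for one N(lam,1) outlier among n-1 N(0,1)."""
    h = (hi - lo) / steps
    probs = []
    for i in range(1, n + 1):
        c = comb(n - 1, i - 1)
        s = sum(c * Phi(x)**(i - 1) * (1 - Phi(x))**(n - i) * phi(x - lam)
                for x in (lo + (k + 0.5) * h for k in range(steps)))
        probs.append(s * h)
    return probs

p = rank_probs(10, 2.0)
assert abs(sum(p) - 1.0) < 1e-6
assert 0.6 < p[-1] < 0.7   # the outlier is most often the sample maximum
```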
Considering now the expression of E(M²(λ)) and then integrating by parts, we obtain

E(M²(λ)) = 2 ∫_0^∞ x Pr{ |M(λ)| > x } dx.   (5.74)

Assuming f to be a standardized symmetric density function and taking accordingly a_i = a_{n-i+1} for i = 1,2,...,[n/2], we have M(0) to be symmetrically distributed about 0 and also M(-λ) to be distributed exactly as -M(λ). Then

Pr{ |M(λ)| > x } = Pr{ M(λ) > x } + Pr{ M(-λ) > x }

may be expected to be an increasing function of |λ| and, consequently, we have E(M²(λ)) to be an increasing function of λ from equation (5.74). Furthermore, for a_1 = a_n = 0, we obtain

E(M²(±∞)) = lim_{λ→∞} E(M²(λ))
          = E[ ( lim_{λ→∞} Σ_{i=2}^{n-1} a_i Z_{i:n}(λ) )² ]
          = E[ ( Σ_{i=2}^{n-1} a_i X_{i:n-1} )² ]
          = [E(M(∞))]² + Σ_{i=2}^{n-1} Σ_{j=2}^{n-1} a_i a_j Cov[X_{i:n-1}, X_{j:n-1}].
In Table 5.2 we have presented the mean square error of the various estimators considered earlier, whose bias values are given in Table 5.1. Based on these mean square error values, we observe the partial ordering

Med_10 ≺ G_10 ≺ L_10(2) ≺ L_10(1) ≺ T_10(2)   (5.75)

as well as W_10(1) ≺ M_10(1), W_10(2) ≺ M_10(2) and W_10(2) ≺ T_10(1).
Table 5.2
Mean square error of various estimators of μ for n = 10 when a
single observation is from N(μ+λ,1) and the others from N(μ,1)

                                        λ
Estimator    0.0      0.5      1.0      1.5      2.0      3.0      4.0       ∞
X̄_10      0.10000  0.10250  0.11000  0.12250  0.14000  0.19000  0.26000     ∞
T_10(1)    0.10534  0.10791  0.11471  0.12387  0.13285  0.14475  0.14865  0.14942
T_10(2)    0.11331  0.11603  0.12297  0.13132  0.13848  0.14580  0.14730  0.14745
Med_10     0.13833  0.14161  0.14964  0.15852  0.16524  0.17072  0.17146  0.17150
W_10(1)    0.10437  0.10693  0.11403  0.12405  0.13469  0.15039  0.15627  0.15755
W_10(2)    0.11133  0.11402  0.12106  0.12995  0.13805  0.14713  0.14926  0.14950
M_10(1)    0.10432  0.10688  0.11396  0.12385  0.13430  0.14950  0.15513  0.15581
M_10(2)    0.11125  0.11395  0.12097  0.12974  0.13770  0.14649  0.14853  0.14876
L_10(1)    0.11371  0.11644  0.12337  0.13169  0.13882  0.14626  0.14797  0.14820
L_10(2)    0.12097  0.12386  0.13105  0.13933  0.14598  0.15206  0.15310  0.15318
G_10       0.12256  0.12549  0.13276  0.14111  0.14777  0.15376  0.15472  0.15479
Realize the dilemma we are in at this juncture, as the ordering in (5.75) based on the mean square error is almost a reverse ordering of the estimators given in (5.73) based on the bias. Table 5.2 also reveals that the modified maximum likelihood estimators perform better than the Winsorized means, which in turn perform better than the trimmed means, when λ is small. This is not surprising since the modified maximum likelihood estimators are almost best linear unbiased estimators (and are also almost the maximum likelihood estimators) based on the n-2r central order statistics. For large values of λ, however, the trimmed mean T_10(2) and the modified maximum likelihood estimator M_10(2) both remain optimal. For more details on some properties of these estimators and their applications in developing some robust inference procedures, one could refer to Andrews et al. (1972) and Tiku et al. (1986).
Similarly, for the scale-outlier model we may write the location estimators as

M*(τ) = Σ_{i=r+1}^{n-r} a_i Z_{i:n}^{*}(τ),

which, when a_i = a_{n+1-i}, are clearly symmetrically distributed about 0 for all values of τ. As a result we have E(M*(τ)) = 0 for all values of τ. Moreover, by comparing equation (5.71) with equation (5.58) we immediately see that the limiting behaviour of M*(τ), given X_n > 0, as τ → ∞ corresponds to that of M(λ) as λ → ∞, and in a similar way the limiting behaviour of M*(τ), given X_n < 0, as τ → ∞ corresponds to that of M(λ) as λ → -∞. We have, therefore,

lim_{τ→∞} E[M*²(τ)] = E[M*²(∞)]
                    = Pr(X_n > 0) E[M²(∞)] + Pr(X_n < 0) E[M²(-∞)]
                    = (1/2) { E[M²(∞)] + E[M²(-∞)] }
                    = E[M²(∞)].   (5.76)

Under the scale-outlier model the estimators of location considered earlier are all unbiased for all
values of τ. By making use of the table of variances and covariances of normal order statistics under the scale-outlier model prepared by David et al. (1977), the variance of various estimators of the mean μ of a normal N(μ,1) population, based on a sample of size n = 10 with one observation being from a normal N(μ,τ²) distribution, has been computed for some specific choices of τ. These values are presented in Table 5.3.

Table 5.3
Variance of various estimators of μ for n = 10 when a
single observation is from N(μ,τ²) and the others from N(μ,1)

                                   τ
Estimator    0.5      1.0      2.0      3.0      4.0       ∞
X̄_10      0.09250  0.10000  0.13000  0.18000  0.25000     ∞
T_10(1)    0.09491  0.10534  0.12133  0.12955  0.13417  0.14942
T_10(2)    0.09953  0.11331  0.12773  0.13389  0.13717  0.14745
Med_10     0.11728  0.13833  0.15375  0.15953  0.16249  0.17150
W_10(1)    0.09571  0.10437  0.12215  0.13221  0.13801  0.15754
W_10(2)    0.09972  0.11133  0.12664  0.13365  0.13745  0.14950
M_10(1)    0.09548  0.10432  0.12187  0.13171  0.13735  0.15581
M_10(2)    0.09940  0.11125  0.12638  0.13328  0.13699  0.14876
L_10(1)    0.09934  0.11371  0.12815  0.13436  0.13769  0.14820
L_10(2)    0.10432  0.12097  0.13531  0.14101  0.14398  0.15318
G_10       0.10573  0.12256  0.13703  0.14270  0.14565  0.15479

From these values we observe that the partial ordering in (5.75) still holds, except for the "inlier" situation τ = 0.5 when T_10(2) is inferior to L_10(1). We also observe that W_10(1) ≺ M_10(1), W_10(2) ≺ M_10(2), W_10(2) ≺ T_10(1) and T_10(2) ≺ M_10(2) except when τ is very large. These orderings, in general, agree with those based on Table 5.2. As David and Shu (1978) rightly pointed out, this general agreement is less surprising when we remember that there is identity of results not only in the null case (λ = 0 or τ = 1) but also for the limiting case (λ = τ = ∞), the latter by equation (5.76).
Making use of the results given in Section 5.6 and the explicit expressions for the means, variances and covariances of order statistics from a single scale-outlier exponential model, Balakrishnan and Ambagaspitiya (1988) have evaluated the means, variances and covariances of order statistics from a single scale-outlier double exponential model. The model they have considered is that a sample of size n consists of n-1 observations from a Laplace population with pdf

f(x) = [1/(2σ)] e^{-|x-μ|/σ},   -∞ < x < ∞,  -∞ < μ < ∞,  σ > 0,

while one observation is from a population with pdf

g(x) = [1/(2aσ)] e^{-|x-μ|/(aσ)},   -∞ < x < ∞,  σ > 0,  a > 0.

After noting that the various estimators of the location parameter μ considered earlier are all unbiased, Balakrishnan and Ambagaspitiya (1988) have studied the performance of these estimators by computing their variance for various choices of a and different sample sizes. The median, the linearly weighted mean and the Gastwirth mean all perform very efficiently in this case, and this is to be expected as the double exponential distribution is a symmetric long-tailed distribution. In addition, they have also noted that the trimmed means do better than the corresponding modified maximum likelihood estimators which in turn perform better than the corresponding Winsorized means.
Exercises
1. Suppose X_1, X_2, ..., X_n are independent random variables with X_i (i = 1,2,...,n) having pdf f_i(x) and cdf F_i(x). Then show that the cdf of X_{i:n} may be written as

   H_{i:n}(x) = Σ_{r=i}^{n} Σ_{S_r} Π_{k=1}^{r} F_{j_k}(x) Π_{k=r+1}^{n} { 1 - F_{j_k}(x) },

   where the summation S_r extends over all permutations (j_1, j_2, ..., j_n) of 1,2,...,n for which j_1 < j_2 < ... < j_r and j_{r+1} < j_{r+2} < ... < j_n.
2. (Sen (1970)). In the above problem, denoting the collection of distribution functions (F_1, F_2, ..., F_n) by 𝓕, the average cdf by F̄ = (1/n) Σ_{r=1}^{n} F_r and its unique quantile of order p by ξ_p, i.e., F̄(ξ_p) = p, show that for i = 2,3,...,n-1 and x ≤ ξ_{(i-1)/n} < ξ_{i/n} ≤ y,

   Pr{ x < X_{i:n} ≤ y | 𝓕 } ≥ Pr{ x < X_{i:n} ≤ y | F̄ },

   where the equality holds only if F_1 = F_2 = ... = F_n = F̄ at both the points x and y. Also show, for all x, that

   Pr{ X_{1:n} ≤ x | 𝓕 } ≥ Pr{ X_{1:n} ≤ x | F̄ }

   and

   Pr{ X_{n:n} ≤ x | 𝓕 } ≤ Pr{ X_{n:n} ≤ x | F̄ },

   where the equalities hold only if F_1 = F_2 = ... = F_n = F̄ at the point x. (Hint: You may use a result of Hoeffding (1956) on the distribution of successes in n independent trials.)
3. As in Problem 1, let X_{1:n} ≤ X_{2:n} ≤ ... ≤ X_{n:n} denote the order statistics obtained from n independent random variables X_i (i = 1,2,...,n) having pdf f_i(x) and cdf F_i(x). Then show that:

   (a) the pdf of the largest order statistic X_{n:n} is given by

   h_{n:n}(x) = { Π_{r=1}^{n} F_r(x) } Σ_{r=1}^{n} { f_r(x) / F_r(x) };

   (b) the cdf of the sample range W_n = X_{n:n} - X_{1:n} is given by

   Pr(W_n ≤ w) = Σ_{r=1}^{n} ∫_{-∞}^{∞} f_r(x) Π_{s=1, s≠r}^{n} { F_s(x+w) - F_s(x) } dx.
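Part (a) of Exercise 3 is easy to sanity-check numerically: Π_r F_r(x) is the cdf of X_{n:n}, so the claimed pdf must agree with its derivative. An illustrative check (ours), taking F_r(x) = x^r on (0,1) with pdf f_r(x) = r x^{r-1}:

```python
def H(x, n):
    # cdf of the maximum: the product of the F_r(x) = x^r
    out = 1.0
    for r in range(1, n + 1):
        out *= x**r
    return out

def h(x, n):
    # claimed pdf: (prod F_r) * sum f_r / F_r, with f_r(x) = r x^(r-1)
    return H(x, n) * sum((r * x**(r - 1)) / x**r for r in range(1, n + 1))

# compare with a central-difference derivative of the cdf
n, x, eps = 4, 0.6, 1e-6
numeric = (H(x + eps, n) - H(x - eps, n)) / (2 * eps)
assert abs(h(x, n) - numeric) < 1e-5
```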
4. (Cohn et al. (1960)). Suppose X_ij (i = 1,2,...,k; j = 1,2,...,n) are k independent random samples of size n, with the observations of the i'th sample having pdf f_i(x) and cdf F_i(x), i = 1,2,...,k. Then show that the maxima from the k samples are the k largest of the kn random variables with probability

   n^k ∫_{-∞}^{∞} { Π_{m=1}^{k} F_m^{n-1}(x) } Σ_{r=1}^{k} { Π_{s=1, s≠r}^{k} (1 - F_s(x)) } f_r(x) dx.
5. (Kelleher and Walsh (1972)). In the sample X_{1:n} ≤ X_{2:n} ≤ ... ≤ X_{n:n} of ordered continuous variables, suppose X_{n:n} is an outlier in the sense that it arises from a different population than that giving rise to the remainder of the sample. Show then that the confidence coefficient of the interval (X_{i:n}, X_{n-i+1:n}), i = 2,3,...,[n/2], for the median in samples of n still equals 2^{-n} Σ_{r=i}^{n-i} C(n, r).
6. (Conover (1965); David (1966)). Suppose k mutually independent random samples of size n, each drawn from a continuous population with cdf F(x), are ordered on the basis of the largest observation in each sample. Further, suppose Y_ij (i = 1,2,...,n; j = 1,2,...,k) denotes the i'th variate in order of magnitude in the sample whose largest member Y_1j has rank j among the k maxima Y_11, Y_12, ..., Y_1k. Then show that

   Pr(Y_ij < x) = Σ_{p=0}^{j-1} C(k, p) { 1 - F^n(x) }^p { F^n(x) }^{k-p}

     + Σ_{p=0}^{j-1} Σ_{q=1}^{i-1} Σ_{r=0}^{q-1} j C(k, j) q C(n, q) C(j-1, p) C(q-1, r) (-1)^{q-r-p} [ { F(x) }^{n-1-r} - { F(x) }^{nk-np} ] / [ nk - np + 1 - n + r ],

   where the triple summation is zero when i = 1.
7. (Neyman and Scott (1971)). Let X_{n:n} be called a γ-outlier on the right if X_{n:n} > X_{n-1:n} + γ(X_{n-1:n} - X_{1:n}), γ > 0. Let us also denote the probability that a sample of n observations from a continuous population with pdf f(x) and cdf F(x) will contain such a γ-outlier on the right by Π(γ,n,F). Then prove that

   Π(γ,n,F) = n(n-1) ∫∫_{x<y} { F[(y + γx)/(1 + γ)] - F(x) }^{n-2} f(y) f(x) dy dx.

   Show also that, for fixed n > 2, the above probability Π(γ,n,F) is a decreasing function of γ which tends to 0 as γ → ∞.

8.
(Vaughan and Venables (1972)). Suppose X_1, X_2, ..., X_n are independently distributed, X_j having density function f_j(x) and cdf F_j(x). Consider a sampling process where one value is taken from each of the above n distributions and the sample ordered so that X_{1:n} ≤ X_{2:n} ≤ ... ≤ X_{n:n}.

   a. Show that the pdf of X_{k_1:n}, 1 ≤ k_1 ≤ n, is given by

      h_{k_1:n}(x_1) = { (k_1 - 1)! (n - k_1)! }^{-1} |A_1|_+,

      where A_1 is the n×n matrix consisting of k_1 - 1 identical rows (F_1(x_1), F_2(x_1), ..., F_n(x_1)), one row (f_1(x_1), f_2(x_1), ..., f_n(x_1)), and n - k_1 identical rows (1 - F_1(x_1), 1 - F_2(x_1), ..., 1 - F_n(x_1)), and |A|_+ denotes the permanent of a square matrix A. The permanent of A is defined like the determinant, except that all signs are positive; for details, see Aitken (1939).

   b. Similarly, show that the joint pdf of X_{k_1:n} and X_{k_2:n}, 1 ≤ k_1 < k_2 ≤ n, for x_1 < x_2, is given by

      h_{k_1,k_2:n}(x_1,x_2) = { (k_1 - 1)! (k_2 - k_1 - 1)! (n - k_2)! }^{-1} |A_2|_+,

      where A_2 is the n×n matrix consisting of k_1 - 1 rows (F_1(x_1), ..., F_n(x_1)), one row (f_1(x_1), ..., f_n(x_1)), k_2 - k_1 - 1 rows (F_1(x_2) - F_1(x_1), ..., F_n(x_2) - F_n(x_1)), one row (f_1(x_2), ..., f_n(x_2)), and n - k_2 rows (1 - F_1(x_2), ..., 1 - F_n(x_2)).

   c. Generalize these results and derive the joint density of any subset X_{k_1:n}, X_{k_2:n}, ..., X_{k_p:n} of order statistics, where 1 ≤ k_1 < k_2 < ... < k_p ≤ n.

   d. In particular, show that the density of the full joint distribution of X_{1:n}, X_{2:n}, ..., X_{n:n} at x_1, x_2, ..., x_n is given by |B|_+, the permanent of the n×n matrix B with (i,j)'th element f_j(x_i).

   e. For the special case when there is exactly one outlier in the sample, that is, F_1 = F_2 = ... = F_{n-1} = F and F_n = G, show that the densities in (a) and (b) reduce to the expressions in equations (5.4) and (5.6), respectively.
9. (Balakrishnan (1987b)). For a single-outlier model, show for n ≥ 2 that

   Σ_{i=1}^{n} (1/i) μ_{i:n}^{(k)} = Σ_{i=1}^{n} (1/i) ν_{i:n}^{(k)} + (1/n) Σ_{i=1}^{n} [ μ_{1:i}^{(k)} - ν_{1:i}^{(k)} ]

     = (1/n) Σ_{i=1}^{n} μ_{1:i}^{(k)} + Σ_{i=1}^{n-1} [ 1/i - 1/n ] ν_{1:i}^{(k)}

   and

   Σ_{i=1}^{n} [1/(n-i+1)] μ_{i:n}^{(k)} = Σ_{i=1}^{n} [1/(n-i+1)] ν_{i:n}^{(k)} + (1/n) Σ_{i=1}^{n} [ μ_{i:i}^{(k)} - ν_{i:i}^{(k)} ]

     = (1/n) Σ_{i=1}^{n} μ_{i:i}^{(k)} + Σ_{i=1}^{n-1} [ 1/i - 1/n ] ν_{i:i}^{(k)}.

   In particular, by setting G(x) = F(x) in the above results, deduce the identities given in Exercise 4 of Chapter 2.
10. (Balakrishnan (1988a)). For a single-outlier model, show that

    Σ_{i=1}^{n-1} μ_{i,i+1:n} + Σ_{j=2}^{n} C(n, j) μ_{1,j:j}

      = Σ_{j=1}^{n-1} { C(n, j) ν_{1:n-j} μ_{j:j} + C(n, j) ν_{j:j} μ_{1:n-j} } - Σ_{j=2}^{n-1} C(n, j) ν_{1,j:j}.
11. Assuming X_1, X_2, ..., X_{n-1} to be a random sample from a standard exponential distribution with pdf f(x) = e^{-x}, 0 ≤ x < ∞, and Y to be an independent random variable with pdf g(x) = (1/a) e^{-x/a}, x ≥ 0, a > 0, and then denoting by Z_{1:n} ≤ Z_{2:n} ≤ ... ≤ Z_{n:n} the order statistics obtained from these n variables, derive E(Z_{i:n}), Var(Z_{i:n}) and Cov(Z_{i:n}, Z_{j:n}).
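While Exercise 11 asks for closed-form expressions, a quick Monte Carlo sketch (ours) provides reference values against which such derivations can be checked; for a = 1 the model collapses to an iid standard exponential sample, where E(Z_{i:n}) = Σ_{r=n-i+1}^{n} 1/r is known exactly.

```python
import random

def mc_moments(n, a, reps=200_000, seed=1):
    """Monte Carlo E(Z_{i:n}) for n-1 Exp(1) variables plus one outlier
    that is exponential with mean a (the scale-outlier model above)."""
    rng = random.Random(seed)
    sums = [0.0] * n
    for _ in range(reps):
        z = sorted([rng.expovariate(1.0) for _ in range(n - 1)]
                   + [rng.expovariate(1.0 / a)])   # expovariate takes the rate
        for i in range(n):
            sums[i] += z[i]
    return [s / reps for s in sums]

n = 5
est = mc_moments(n, a=1.0)
# iid check: E(Z_{i:n}) = sum_{r=n-i+1}^{n} 1/r for standard exponential
exact = [sum(1.0 / r for r in range(n - i, n + 1)) for i in range(n)]
for e, t in zip(est, exact):
    assert abs(e - t) < 0.02
```

The same routine with a ≠ 1 gives simulated targets for the outlier-model moments requested in the exercise.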
12. Assuming Y to be an independent random variable with pdf g(x) = e^{-(x-μ)}, x ≥ μ, once again derive E(Z_{i:n}), Var(Z_{i:n}) and Cov(Z_{i:n}, Z_{j:n}).
13. (Balakrishnan and Ambagaspitiya (1988)). Making use of the expressions of means, variances and covariances of order statistics derived in Exercise 11 in Relations 5.21 and 5.22, compute the means, variances and covariances of order statistics from a single scale-outlier double exponential model, viz.,

    X_1, ..., X_{n-1} with pdf (1/2) e^{-|x|},  -∞ < x < ∞,

    and

    Y with pdf (1/2a) e^{-|x|/a},  -∞ < x < ∞,  a > 0,

    for n = 10 and 20, and a = 0.5, 1.0(1.0)10.0. Making use of these values, compare the efficiency of the various estimators of the location parameter considered in Section 8.
14. (Smith and Tong (1983); David (1986)). Let x_r = y_r + z_r, r = 1,2,...,n, and write x_(1) ≤ x_(2) ≤ ... ≤ x_(n) or x_[1] ≥ x_[2] ≥ ... ≥ x_[n], and likewise for y_r and z_r. Then show, for 1 ≤ i ≤ n, that

    x_[i] ≤ min_{r=1,2,...,i} [ y_[r] + z_[i+1-r] ]

    and

    x_(i) ≥ max_{r=1,2,...,i} [ y_(r) + z_(i+1-r) ].

    Thence, derive the inequality
15. (David (1986)). For a single location-outlier model, defining z_r = B > 0 for some unknown value of r and z_r = 0 otherwise, and making use of the results in Exercise 14, show that for i = 1,2,...,n-1

    E[Y_(i)] ≤ E[X_(i)] ≤ min[ E(Y_(i+1)), E(Y_(i)) + B ],

    and for i = n

    max[ E(Y_(n)), E(Y_(1)) + B ] ≤ E(X_(n)) ≤ E(Y_(n)) + B.
16. (Smith and Tong (1983); David (1986)). Let x̃_(i), i = 1,2,...,n, denote the n sums y_(i) + z_(n+1-i) arranged in increasing order of magnitude. Further, let ℓ be a convex linear function of the ordered x_i, viz., ℓ = Σ_{i=1}^{n} c_i x_(i) with c_1 ≤ c_2 ≤ ... ≤ c_n. Then show that for c_1 ≥ 0

    Σ_{i=1}^{n} c_i x̃_(i) ≤ ℓ ≤ Σ_{i=1}^{n} c_i [ y_(i) + z_(i) ].

    Also, show that the above inequality continues to hold if c_m < 0, c_{m+1} ≥ 0, m = 1,2,...,n.
17. Using the notation in Section 5.1, without assuming absolute continuity, verify that
$$i\, H_{i+1:n}(x) + (n-i)\, H_{i:n}(x) = (n-1)\, H_{i:n-1}(x) + F_{i:n-1}(x)$$
for $1 \le i \le n-1$. [Hint: Use equations (5.3) and (2.68).] Due to the above result we may note that Relation 5.2 continues to hold without the assumption of absolute continuity.
18. Denoting the joint c.d.f. of $Z_{r:n}$ and $Z_{s:n}$ ($1 \le r < s \le n$) by $H_{r,s:n}(x,y)$, $x \le y$, show that
$$(i-1)\, H_{i,j:n}(x,y) + (j-i)\, H_{i-1,j:n}(x,y) + (n-j+1)\, H_{i-1,j-1:n}(x,y) = (n-1)\, H_{i-1,j-1:n-1}(x,y) + F_{i-1,j-1:n-1}(x,y)$$
for $2 \le i < j \le n$. Due to this result we may note that Relation 5.9 continues to hold without the assumption of absolute continuity.
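Since $H_{i:n}(x) = P(\text{at least } i \text{ of the } Z\text{'s fall at or below } x)$, the single-moment recurrence of Exercise 17 can be checked exactly at any fixed $x$ by computing the Poisson-binomial distribution of the count of variables below $x$. A sketch for the single-outlier case (the choices $F$ standard exponential and $G$ exponential with mean 2 are illustrative):

```python
import math

def cdf_kth_smallest(i, ps):
    """cdf of the i-th order statistic at a fixed x, i.e. P(at least i of
    the independent indicators {Z_k <= x}, with probs ps = F_k(x), occur),
    via a Poisson-binomial dynamic program."""
    dist = [1.0] + [0.0] * len(ps)          # dist[m] = P(count = m)
    for p in ps:
        for m in range(len(dist) - 1, 0, -1):
            dist[m] = dist[m] * (1 - p) + dist[m - 1] * p
        dist[0] *= (1 - p)
    return sum(dist[i:])

F = lambda x: 1 - math.exp(-x)              # standard exponential cdf
G = lambda x: 1 - math.exp(-x / 2)          # outlier cdf (mean 2, illustrative)

n = 5
for x in (0.3, 1.0, 2.5):
    ps_n   = [F(x)] * (n - 1) + [G(x)]      # outlier sample of size n
    ps_n1  = [F(x)] * (n - 2) + [G(x)]      # outlier sample of size n-1
    ps_iid = [F(x)] * (n - 1)               # i.i.d. sample of size n-1
    for i in range(1, n):
        lhs = i * cdf_kth_smallest(i + 1, ps_n) + (n - i) * cdf_kth_smallest(i, ps_n)
        rhs = (n - 1) * cdf_kth_smallest(i, ps_n1) + cdf_kth_smallest(i, ps_iid)
        assert abs(lhs - rhs) < 1e-9
```

The right-hand side reflects that dropping one of the $n-1$ variables with cdf $F$ leaves an outlier sample of size $n-1$, while dropping the outlier leaves an i.i.d. sample of size $n-1$.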
19. Let $X_1, X_2, \ldots, X_n$ be i.i.d. from an arbitrary distribution function $F$. Let $U_1, U_2, \ldots, U_n$ be i.i.d. Uniform$(0, B)$, independent of the $X$'s. For $i = 1,2,\ldots,n$, define $Y_i = X_i + U_i$. Prove that for $1 \le i \le n$, $Y_{i:n} \to X_{i:n}$ as $B \to 0$. In addition, if the $X_i$'s have a finite $k$-th moment, show that $E(Y_{i:n}^k) \to E(X_{i:n}^k)$ as $B \to 0$. Since the $Y_i$'s are absolutely continuous, this shows that results which do not refer to densities continue to hold for general distributions.
20. (Balakrishnan (1988c)). (i) Let $X_1, X_2, \ldots, X_n$ be independent variables with $X_i$ ($i = 1,2,\ldots,n$) having pdf $f_i(x)$ and cdf $F_i(x)$. Let $X_{1:n} \le X_{2:n} \le \cdots \le X_{n:n}$ be the order statistics obtained from the realizations of $X_1, X_2, \ldots, X_n$. Then the density of $X_{i:n}$ ($1 \le i \le n$), viz., $h_{i:n}(x)$, and the joint density of $X_{i:n}$ and $X_{j:n}$ ($1 \le i < j \le n$), viz., $h_{i,j:n}(x,y)$, are as given in Exercise 8.
Now let us denote $h_{i:n-m}^{[r_1, r_2, \ldots, r_m]}(x)$, $1 \le i \le n-m$, for the density function of the $i$-th order statistic in a sample of size $n-m$ obtained by dropping $X_{r_1}, X_{r_2}, \ldots, X_{r_m}$ from the original set of $n$ variables; similarly, let us denote $h_{i,j:n-1}^{[r]}(x,y)$, $1 \le i < j \le n-1$, for the joint density of the $i$-th and $j$-th order statistics in a sample of size $n-1$ obtained by dropping $X_r$ from the original set of $n$ variables. Then show that for $1 \le i \le n-1$,
$$i\, h_{i+1:n}(x) + (n-i)\, h_{i:n}(x) = \sum_{r=1}^{n} h_{i:n-1}^{[r]}(x),$$
and for $2 \le i < j \le n$,
$$(i-1)\, h_{i,j:n}(x,y) + (j-i)\, h_{i-1,j:n}(x,y) + (n-j+1)\, h_{i-1,j-1:n}(x,y) = \sum_{r=1}^{n} h_{i-1,j-1:n-1}^{[r]}(x,y).$$
(ii) For the $p$-outlier model, that is, $F_1 = F_2 = \cdots = F_{n-p} = F$ and $F_{n-p+1} = \cdots = F_n = G$, deduce the relations
$$i\, h_{i+1:n}(x) + (n-i)\, h_{i:n}(x) = (n-p)\, h_{i:n-1}^{[F]}(x) + p\, h_{i:n-1}^{[G]}(x), \quad 1 \le i \le n-1,$$
and
$$(i-1)\, h_{i,j:n}(x,y) + (j-i)\, h_{i-1,j:n}(x,y) + (n-j+1)\, h_{i-1,j-1:n}(x,y) = (n-p)\, h_{i-1,j-1:n-1}^{[F]}(x,y) + p\, h_{i-1,j-1:n-1}^{[G]}(x,y), \quad 2 \le i < j \le n,$$
where $h_{i:n-1}^{[F]}(x)$ and $h_{i:n-1}^{[G]}(x)$ are the density functions of the $i$-th order statistic in a sample of size $n-1$ from the $p$-outlier model and the $(p-1)$-outlier model, respectively, and similarly for the joint density functions.
(iii) Denote
$$S_{1:n-m}(x) = \sum_{1 \le r_1 < \cdots < r_m \le n} h_{1:n-m}^{[r_1, \ldots, r_m]}(x)$$
and
$$S_{n-m:n-m}(x) = \sum_{1 \le r_1 < \cdots < r_m \le n} h_{n-m:n-m}^{[r_1, \ldots, r_m]}(x),$$
with $S_{1:n}(x) \equiv h_{1:n}(x)$ and $S_{n:n}(x) \equiv h_{n:n}(x)$. Then, by repeated application of the first relation in (i), show that
$$h_{i:n}(x) = \sum_{j=i}^{n} (-1)^{j-i} \binom{j-1}{i-1} S_{j:j}(x), \quad 1 \le i \le n-1,$$
and
$$h_{i:n}(x) = \sum_{j=n-i+1}^{n} (-1)^{j-n+i-1} \binom{j-1}{n-i} S_{1:j}(x), \quad 2 \le i \le n.$$
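At the cdf level, this representation is the classical "at least $i$ of $n$ events" inclusion-exclusion identity, $P(X_{i:n} \le x) = \sum_{j=i}^{n} (-1)^{j-i} \binom{j-1}{i-1} \sum_{|A|=j} \prod_{k \in A} F_k(x)$, which can be verified exactly for small $n$; the cdf values below are illustrative:

```python
import math
from itertools import combinations

def at_least(i, ps):
    """P(at least i of independent events with probabilities ps occur),
    computed exactly by a Poisson-binomial dynamic program."""
    dist = [1.0] + [0.0] * len(ps)
    for p in ps:
        for m in range(len(dist) - 1, 0, -1):
            dist[m] = dist[m] * (1 - p) + dist[m - 1] * p
        dist[0] *= (1 - p)
    return sum(dist[i:])

# cdf values F_k(x) at one fixed x for n = 5 heterogeneous variables
ps = [0.15, 0.30, 0.45, 0.60, 0.80]
n = len(ps)

for i in range(1, n + 1):
    # sum over subsets A of size j of prod_{k in A} F_k(x) plays the
    # role of S_{j:j} (the cdf of the subset maximum, summed over subsets)
    rhs = sum(
        (-1) ** (j - i) * math.comb(j - 1, i - 1)
        * sum(math.prod(ps[k] for k in A) for A in combinations(range(n), j))
        for j in range(i, n + 1)
    )
    assert abs(at_least(i, ps) - rhs) < 1e-12
```

Differentiating the cdf identity (when densities exist) gives the first displayed relation above.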
21. (Balakrishnan, 1989a). (i) Let $X_1, X_2, \ldots, X_n$ be independent random variables with $X_i$ ($i = 1,2,\ldots,n$) having pdf $f_i(x)$ and cdf $F_i(x)$. Then, by making use of the relations given in Exercise 20, show that
$$(i-1)\, \sigma_{i,j:n} + (j-i)\, \sigma_{i-1,j:n} + (n-j+1)\, \sigma_{i-1,j-1:n} = \sum_{r=1}^{n} \left\{ \sigma_{i-1,j-1:n-1}^{[r]} + \left[ \mu_{i-1:n-1}^{[r]} - \mu_{i-1:n} \right] \left[ \mu_{j-1:n-1}^{[r]} - \mu_{j:n} \right] \right\}$$
for $2 \le i < j \le n$, where $\sigma_{i,j:n}$ and $\sigma_{i,j:n-1}^{[r]}$ denote the covariances of the $i$-th and $j$-th order statistics in samples of size $n$ and $n-1$ (with $X_r$ dropped), respectively.
(ii) For the $p$-outlier model, that is, $F_1 = F_2 = \cdots = F_{n-p} = F$ and $F_{n-p+1} = \cdots = F_n = G$, deduce the relation
$$(i-1)\, \sigma_{i,j:n} + (j-i)\, \sigma_{i-1,j:n} + (n-j+1)\, \sigma_{i-1,j-1:n} = (n-p) \left\{ \sigma_{i-1,j-1:n-1}^{[F]} + \left[ \mu_{i-1:n-1}^{[F]} - \mu_{i-1:n} \right] \left[ \mu_{j-1:n-1}^{[F]} - \mu_{j:n} \right] \right\} + p \left\{ \sigma_{i-1,j-1:n-1}^{[G]} + \left[ \mu_{i-1:n-1}^{[G]} - \mu_{i-1:n} \right] \left[ \mu_{j-1:n-1}^{[G]} - \mu_{j:n} \right] \right\},$$
where, as before, $[F]$ and $[G]$ denote the quantities in a sample of size $n-1$ from the $p$-outlier model and the $(p-1)$-outlier model, respectively.
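The covariance recurrence in Exercise 21(i) holds for arbitrary independent random variables, so it can be verified exactly by enumeration when the $X_i$ have small discrete supports. A sketch (the three distributions and the choice $(i,j) = (2,3)$, $n = 3$ are illustrative; 0-based lists hold the order-statistic moments):

```python
from itertools import product

# Three independent discrete variables, each given as (values, probs)
vars3 = [
    ([0.0, 1.0, 3.0], [0.2, 0.5, 0.3]),
    ([-1.0, 2.0],     [0.4, 0.6]),
    ([0.5, 1.5, 4.0], [0.3, 0.3, 0.4]),
]

def moments(variables):
    """Exact means mu[i-1] and covariances cov[i-1][j-1] of the order
    statistics of independent discrete variables, by full enumeration."""
    m = len(variables)
    mu = [0.0] * m
    mu2 = [[0.0] * m for _ in range(m)]
    for outcome in product(*[range(len(v[0])) for v in variables]):
        p, vals = 1.0, []
        for (support, probs), idx in zip(variables, outcome):
            p *= probs[idx]
            vals.append(support[idx])
        vals.sort()
        for i in range(m):
            mu[i] += p * vals[i]
            for j in range(m):
                mu2[i][j] += p * vals[i] * vals[j]
    cov = [[mu2[i][j] - mu[i] * mu[j] for j in range(m)] for i in range(m)]
    return mu, cov

n = len(vars3)
mu_n, cov_n = moments(vars3)
i, j = 2, 3                                   # check the relation at (i, j)

lhs = ((i - 1) * cov_n[i - 1][j - 1]
       + (j - i) * cov_n[i - 2][j - 1]
       + (n - j + 1) * cov_n[i - 2][j - 2])

rhs = 0.0
for r in range(n):
    reduced = vars3[:r] + vars3[r + 1:]       # drop the (r+1)-th variable
    mu_r, cov_r = moments(reduced)
    rhs += (cov_r[i - 2][j - 2]
            + (mu_r[i - 2] - mu_n[i - 2]) * (mu_r[j - 2] - mu_n[j - 1]))

assert abs(lhs - rhs) < 1e-10
```

The supports were chosen with no shared values, so ties between variables never occur and the enumeration mirrors the absolutely continuous case.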
22. (Balakrishnan, 1989b). Let $X_{1:n} \le X_{2:n} \le \cdots \le X_{n:n}$ denote the order statistics obtained from $n$ independent absolutely continuous random variables $X_i$ ($i = 1,2,\ldots,n$), with $X_i$ having pdf $f_i(x)$ and cdf $F_i(x)$. Let us denote the single and product moments of these order statistics by $\mu_{i:n}^{(k)}$ ($1 \le i \le n$; $k \ge 1$) and $\mu_{i,j:n}$ ($1 \le i < j \le n$). Let the density functions $f_i(x)$ all be symmetric about $0$. Then, for $x > 0$, let
$$G_i(x) = 2 F_i(x) - 1 \quad \text{and} \quad g_i(x) = 2 f_i(x);$$
that is, the density functions $g_i(x)$, $i = 1,2,\ldots,n$, are obtained by folding the density functions $f_i(x)$ at zero. Let $Y_{1:n} \le Y_{2:n} \le \cdots \le Y_{n:n}$ denote the order statistics obtained from $n$ independent absolutely continuous random variables $Y_i$ ($i = 1,2,\ldots,n$), with $Y_i$ having pdf $g_i(x)$ and cdf $G_i(x)$. Further, let us denote $\nu_{i:n-\ell}^{(k)[r_1, \ldots, r_\ell]}$ for the $k$-th single moment of $Y_{i:n-\ell}^{[r_1, \ldots, r_\ell]}$ and $\nu_{i,j:n-\ell}^{[r_1, \ldots, r_\ell]}$ for the product moment of $Y_{i:n-\ell}^{[r_1, \ldots, r_\ell]}$ and $Y_{j:n-\ell}^{[r_1, \ldots, r_\ell]}$, where $Y_{i:n-\ell}^{[r_1, \ldots, r_\ell]}$ denotes the $i$-th order statistic in a sample of size $n-\ell$ obtained by dropping $Y_{r_1}, Y_{r_2}, \ldots, Y_{r_\ell}$ from the original set of $n$ variables $Y_1, Y_2, \ldots, Y_n$.
Then show that:
(i) For $1 \le i \le n$ and $k = 1,2,\ldots$,
$$\mu_{i:n}^{(k)} = 2^{-n} \left\{ \sum_{\ell=0}^{i-1} \; \sum_{1 \le r_1 < \cdots < r_\ell \le n} \nu_{i-\ell:n-\ell}^{(k)[r_1, \ldots, r_\ell]} + (-1)^k \sum_{\ell=i}^{n} \; \sum_{1 \le r_1 < \cdots < r_{n-\ell} \le n} \nu_{\ell-i+1:\ell}^{(k)[r_1, \ldots, r_{n-\ell}]} \right\};$$
(ii) For $1 \le i < j \le n$,
$$\mu_{i,j:n} = 2^{-n} \left\{ \sum_{\ell=0}^{i-1} \; \sum_{1 \le r_1 < \cdots < r_\ell \le n} \nu_{i-\ell,j-\ell:n-\ell}^{[r_1, \ldots, r_\ell]} + \sum_{\ell=j}^{n} \; \sum_{1 \le r_1 < \cdots < r_{n-\ell} \le n} \nu_{\ell-j+1,\ell-i+1:\ell}^{[r_1, \ldots, r_{n-\ell}]} - \sum_{\ell=i}^{j-1} \; \sum_{1 \le r_1 < \cdots < r_\ell \le n} \nu_{j-\ell:n-\ell}^{[r_1, \ldots, r_\ell]} \, \nu_{\ell-i+1:\ell}^{[r_{\ell+1}, \ldots, r_n]} \right\}.$$
(iii) Verify that for a single-outlier model, these results reduce to Relations 5.21 and 5.22.
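In the i.i.d. case, every subset $\{r_1,\ldots,r_\ell\}$ contributes the same moment, so relation (i) reduces to $\mu_{i:n}^{(k)} = 2^{-n} \{ \sum_{\ell=0}^{i-1} \binom{n}{\ell} \nu_{i-\ell:n-\ell}^{(k)} + (-1)^k \sum_{\ell=i}^{n} \binom{n}{n-\ell} \nu_{\ell-i+1:\ell}^{(k)} \}$. For $X_i \sim \mathrm{Uniform}(-1,1)$ the folded variables are $\mathrm{Uniform}(0,1)$, whose order-statistic moments are exact rationals, so this reduction can be checked with exact arithmetic (a sketch; the choice of Uniform$(-1,1)$ is illustrative):

```python
from fractions import Fraction
from math import comb

def nu(k, i, m):
    """E[U_{i:m}^k] for i.i.d. Uniform(0,1): product of (i+t)/(m+1+t)."""
    out = Fraction(1)
    for t in range(k):
        out *= Fraction(i + t, m + 1 + t)
    return out

def mu(k, i, n):
    """E[X_{i:n}^k] for i.i.d. Uniform(-1,1), via X = 2U - 1 and the
    binomial expansion of (2U_{i:n} - 1)^k."""
    return sum(comb(k, s) * Fraction(2) ** s * Fraction(-1) ** (k - s) * nu(s, i, n)
               for s in range(k + 1))

for n in range(1, 7):
    for i in range(1, n + 1):
        for k in range(1, 4):
            rhs = (sum(comb(n, l) * nu(k, i - l, n - l) for l in range(i))
                   + Fraction(-1) ** k
                   * sum(comb(n, n - l) * nu(k, l - i + 1, l) for l in range(i, n + 1)))
            assert mu(k, i, n) == rhs / Fraction(2) ** n
```

For odd $k$ the second sum enters with a minus sign, reflecting the negative half of the symmetric distribution.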
23. (Balakrishnan, 1989c). Let $X_1, X_2, \ldots, X_n$ be independent random variables with $X_i$ ($i = 1,2,\ldots,n$) having pdf $f_i(x)$ and cdf $F_i(x)$. Then, by making use of the relations given in Exercise 20, show that:
(i) for $1 \le i \le m \le n-1$ and $k = 1,2,\ldots$,
$$\sum_{1 \le r_1 < \cdots < r_{n-m} \le n} \mu_{i:m}^{(k)[r_1, \ldots, r_{n-m}]} = \sum_{r=0}^{n-m} \binom{i-1+r}{r} \binom{n-i-r}{n-m-r} \mu_{i+r:n}^{(k)},$$
and
(ii) for $1 \le i < j \le m \le n-1$,
$$\sum_{1 \le r_1 < \cdots < r_{n-m} \le n} \mu_{i,j:m}^{[r_1, \ldots, r_{n-m}]} = \sum_{r=0}^{n-m} \sum_{s=r}^{n-m} \binom{i-1+r}{r} \binom{j-i-1+s-r}{s-r} \binom{n-j-s}{n-m-s} \mu_{i+r,j+s:n}.$$
For the $p$-outlier model, that is, $F_1 = \cdots = F_{n-p} = F$ and $F_{n-p+1} = \cdots = F_n = G$, by denoting $\mu_{i:m}^{(k)[r]}$ and $\mu_{i,j:m}^{[r]}$ for the single and the product moments of order statistics in a sample of size $m$ with $r$ outliers present in it, deduce the following relations:
(iii) for $0 \le p \le n$ and $1 \le i \le m \le n-1$,
$$\sum_{r=0}^{p} \binom{n-p}{n-m-r} \binom{p}{r} \mu_{i:m}^{(k)[p-r]} = \sum_{r=0}^{n-m} \binom{i-1+r}{r} \binom{n-i-r}{n-m-r} \mu_{i+r:n}^{(k)[p]},$$
and
(iv) for $0 \le p \le n$ and $1 \le i < j \le m \le n-1$,
$$\sum_{r=0}^{p} \binom{n-p}{n-m-r} \binom{p}{r} \mu_{i,j:m}^{[p-r]} = \sum_{r=0}^{n-m} \sum_{s=r}^{n-m} \binom{i-1+r}{r} \binom{j-i-1+s-r}{s-r} \binom{n-j-s}{n-m-s} \mu_{i+r,j+s:n}^{[p]}.$$
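For i.i.d. variables, relation (i) of Exercise 23 collapses to $\binom{n}{n-m} \mu_{i:m}^{(k)} = \sum_{r=0}^{n-m} \binom{i-1+r}{r} \binom{n-i-r}{n-m-r} \mu_{i+r:n}^{(k)}$: dropping $n-m$ variables with $r$ of them below the $(i+r)$-th overall order statistic makes the $i$-th of the retained $m$ coincide with the $(i+r)$-th of all $n$. This can be checked with exact rational arithmetic for Uniform$(0,1)$ (an illustrative sketch):

```python
from fractions import Fraction
from math import comb

def mu(k, i, n):
    """E[U_{i:n}^k] for i.i.d. Uniform(0,1) order statistics (exact):
    the product of (i+t)/(n+1+t) for t = 0, ..., k-1."""
    out = Fraction(1)
    for t in range(k):
        out *= Fraction(i + t, n + 1 + t)
    return out

for n in range(2, 8):
    for m in range(1, n):
        for i in range(1, m + 1):
            for k in (1, 2, 3):
                lhs = comb(n, n - m) * mu(k, i, m)
                rhs = sum(comb(i - 1 + r, r) * comb(n - i - r, n - m - r)
                          * mu(k, i + r, n)
                          for r in range(n - m + 1))
                assert lhs == rhs
```

The binomial weights count, for each $r$, the subsets dropping $r$ values below and $n-m-r$ values above the $(i+r)$-th overall order statistic.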