
Journal of Statistical Planning and Inference 137 (2007) 1647–1657
www.elsevier.com/locate/jspi

Classifying logistic populations using sample medians

Parminder Singh a,∗, Satya N. Mishra b

a Department of Mathematics, Guru Nanak Dev University, Amritsar-143005, India
b Department of Mathematics and Statistics, University of South Alabama, Mobile, AL 36688, USA

Available online 10 October 2006

Abstract

Consider k (≥ 2) independent populations π_1, ..., π_k such that an observation from population π_i follows a logistic distribution with unknown location parameter μ_i and common known variance σ², i = 1, ..., k. Let μ_[1] ≤ ··· ≤ μ_[k] be the unknown ordered values of the μ's, and let the population associated with μ_[k] be the upper extreme population (UEP) and the population associated with μ_[1] be the lower extreme population (LEP). In this paper, we discuss a procedure, on the lines of Liu [On a multiple three-decision procedure for comparing several treatments with a control. Austral. J. Statist. 39, 79–92] and Bohrer [Multiple three-decision rules for parametric signs. J. Amer. Statist. Assoc. 74, 432–437], for classifying k logistic populations by their location parameters as better or worse than a control/standard population. In the absence of any standard/control population, we propose a selection procedure for the simultaneous selection of two non-empty random-size subsets, one containing the population associated with the largest mean and the other containing the population associated with the smallest mean, with a pre-specified probability P∗ (1/k(k − 1) < P∗ < 1). The selection constants required to implement the proposed procedures are tabulated. Using these selection constants, simultaneous confidence intervals for the parameters μ_[i] − μ_[1], i = 2, ..., k, and μ_[k] − μ_[i], i = 1, ..., k − 1, are constructed. A simple instructive numerical example is given.
© 2006 Elsevier B.V. All rights reserved.

Keywords: Subset selection; Type-III error; Control population; Probability of correct selection; Simultaneous confidence intervals

1. Introduction

Let π_1, ..., π_k be k (≥ 2) independent populations such that the population π_i is characterized by the logistic distribution with cumulative distribution function (cdf)

F_i(x) = 1 / [1 + exp{−π(x − μ_i)/(σ√3)}],   |x| < ∞,

where μ_i (−∞ < μ_i < ∞) is the unknown location parameter and σ² is the known common variance, i = 1, ..., k. Let μ_[1] ≤ ··· ≤ μ_[k] be the unknown ordered values of the μ's, the population associated with μ_[k] be the upper extreme population (UEP), and the population associated with μ_[1] be the lower extreme population (LEP). Without loss of generality, we assume the known value of σ to be unity. In ranking and selection problems, the two familiar approaches are the

∗ Corresponding author. E-mail address: [email protected] (P. Singh).

0378-3758/$ - see front matter © 2006 Elsevier B.V. All rights reserved.
doi:10.1016/j.jspi.2006.09.012


indifference zone formulation due to Bechhofer (1954) and the subset selection approach due to Gupta (1965). Since the inception of these approaches, many authors have proposed selection procedures specific to a family of probability distributions. Tong (1969) considered the problem of classifying a set of k normal populations with respect to a control population. Patel and Wyckoff (1990) used sample quasi ranges for classifying a set of k normal populations by their variances as better or worse than a control. Bohrer (1979) proposed a multiple three-decision procedure for two-sided comparisons of the treatments better and worse than the control with respect to location parameters under the normal probability model. Later, Liu (1997) gave extensive tables of critical points to facilitate applications of the procedure due to Bohrer et al. (1981). Mishra (1986) and Mishra and Dudewicz (1987) extended the subset selection approach to the simultaneous selection of extreme populations. Hsu (1981) discussed the multiple comparisons with the best that can be made simultaneously with the selection statement without decreasing the nominal level of the latter. Lam (1986, 1989) proposed simultaneous confidence intervals for the ranked parameters μ_[i] − μ_[j], for i ≠ j, and μ_[k] − μ_i, i = 1, ..., k − 1, for k normal populations. Misra and Dhariyal (1993) proposed a set of 100P∗% simultaneous confidence intervals for all distances from the best and the worst populations.

The logistic density function is symmetric about zero and is more peaked in the center than the normal density function. The hazard function of the logistic distribution is proportional to its cdf, which makes it useful as a growth curve model. The logistic distribution plays an important role in economics, health, the social sciences, logistic regression, etc.; for detailed applications we refer to Balakrishnan (1992), Gupta and Han (1991), Berkson (1944) and Gupta (1962). Moreover, the shape of the logistic distribution is, in many respects, close to that of the normal distribution. However, the logistic distribution has heavier tails than the normal, and hence can serve as a better approximation when an outlier is suspected.

Lorenzen and McDonald (1981) were the first to propose a selection rule, based on sample medians, to select a subset of logistic populations. Han (1987) studied a rule based on sample means and computed approximate selection constants. Later, Vander Laan (1989) studied a subset selection rule for logistic distributions based on sample means when the common sample size is equal to one. Recently, Singh and Gill (2003) proposed a selection procedure, based on sample medians, for selecting good logistic populations.

Throughout the paper, we shall use the following conventions: X_{i1}, ..., X_{in} is a random sample of size n = 2m − 1 from the population π_i, i = 0, 1, ..., k; X_{i:m} is the corresponding sample median; F_m(·) and f_m(·) are, respectively, the cdf and pdf of the median of a sample drawn from the standard logistic population, i.e., the logistic population with mean 0 and unit variance. Let X_{[1]m} ≤ ··· ≤ X_{[k]m} denote the known numerical ordering of the X's, and let X_{(i)m} be the unknown value of the sample median associated with μ_[i]. Suppose π_(i) is the population associated with μ_[i], i = 1, ..., k. Let Ω_U(μ, 0) = {i : μ_i − μ_0 > 0, i = 1, ..., k}; Ω_L(μ, 0) = {i : μ_i − μ_0 < 0, i = 1, ..., k}; and let b+ (b−) denote a value just greater (less) than b but not equal to b. Further, let l denote the cardinality of the set Ω_U.

In Section 2, we discuss a procedure, based on sample medians, for classifying k logistic populations by their location parameters as better or worse than a standard or control. A selection procedure for the simultaneous selection of the extreme populations, together with the infimum of its probability of correct selection (CS), is given in Section 3. Section 4 deals with simultaneous upper confidence intervals for μ_[i] − μ_[1], i = 2, ..., k, and μ_[k] − μ_[i], i = 1, ..., k − 1.

2. Classification with respect to control

Let π_0, π_1, ..., π_k be (k + 1) populations such that an observation from population π_i follows a logistic distribution with unknown mean μ_i and known variance σ_i², i = 0, 1, ..., k. The population π_i (π_0) is called the treatment (control or standard) population, i = 1, ..., k. The treatment population π_i is called better than the control in terms of the location parameter if μ_i − μ_0 > 0 and is termed worse than the control if μ_i − μ_0 < 0, i = 1, ..., k.

In many experimental situations, there is prior information that the k treatments differ from the standard/control, and it may be of interest to decide which treatments are better than the control and which are worse. For example, in comparing several varieties of wheat with a control variety, it is natural to ask which varieties are better (in terms of average yield) than the control and which ones are not.

Classifying a treatment as better than the control when it is actually worse (and vice versa) is considered a misclassification whenever the comparisons are restricted in well-defined terms. A common statistical problem is the two-sided comparison of the treatments better than the control and the treatments worse than the control such that, for a given value of α (0 < α < 1), the probability of no misclassification is at least 1 − α, irrespective of the true configuration of the parameters.


Bohrer (1979) proved that the third decision is necessary to control the probability of no misclassification at 1 − α for α < 0.5.

A multiple three-decision procedure, say R1, for the two-sided comparisons of the treatments with the control, in order to decide which treatments are better than the control and which ones are worse, is as follows:

infer μ_i − μ_0 > 0 if X_{i:m} > X_{0:m} + c,
infer μ_i − μ_0 < 0 if X_{i:m} < X_{0:m} − c,
make no decision on the sign of (μ_i − μ_0) if |X_{i:m} − X_{0:m}| ≤ c,   (2.1)

where, for a pre-specified small value of α (0 < α < 1), the critical point c (> 0) is chosen such that

a(μ|R1) = P[no misclassification of any treatment] ≥ 1 − α for all μ ∈ R^{k+1}.   (2.2)

Misclassification of treatment π_i, also called a type-III error, means deciding that μ_i − μ_0 is positive while it is actually negative (or, conversely, deciding that μ_i − μ_0 is negative while it is actually positive). It may be noted that the probability requirement (2.2) controls the type-III error rate of (2.1) at level α, but the type-I error rate is not controlled at level α. To compute the critical point c satisfying (2.2), it is required to find a μ^0 ∈ R^{k+1} which gives a(μ^0|R1) = inf_μ a(μ|R1). Such a vector μ^0 is referred to as the least favorable point. For the procedure (2.1), we have

a(μ|R1) = P[X_{i:m} − X_{0:m} ≥ −c for all i ∈ Ω_U(μ, 0) and X_{j:m} − X_{0:m} ≤ c for all j ∈ Ω_L(μ, 0)]
= P[X_{i:m} − μ_i ≥ −c + X_{0:m} − μ_0 − δ_{i0} for all i ∈ Ω_U(μ, 0)
  and X_{j:m} − μ_j ≤ c + X_{0:m} − μ_0 − δ_{j0} for all j ∈ Ω_L(μ, 0)],

where δ_{p0} = μ_p − μ_0, p = 1, ..., k. Since δ_{i0} > 0 for i ∈ Ω_U(μ, 0) and δ_{j0} < 0 for j ∈ Ω_L(μ, 0), therefore

a(μ|R1) ≥ P[X_{i:m} − μ_i ≥ X_{0:m} − μ_0 − c for all i ∈ Ω_U(μ, 0)
  and X_{j:m} − μ_j ≤ X_{0:m} − μ_0 + c for all j ∈ Ω_L(μ, 0)].

Using the above discussion, the symmetry of the pdf f_m(·) of the sample median of size n = 2m − 1, and the results proved by Tong (1969), we state the following theorem, which enables us to find a μ ∈ R^{k+1} that minimizes a(μ|R1).

Theorem 2.1. Let l denote the integer part of k/2 and μ^0 = (0, 0+, ..., 0+, 0−, ..., 0−) ∈ R^{k+1}, where the last k − l components equal 0− and the middle l components equal 0+. Then

inf_μ a(μ|R1) = a(μ^0|R1) = ∫_{−∞}^{∞} [1 − F_m(x − c)]^l [F_m(x + c)]^{k−l} f_m(x) dx.   (2.3)

It can easily be seen that the pdf and cdf of the sample median, based on a sample of size n = 2m − 1 from the standard logistic population, are

f_m(x) = (Γ(2m)/Γ²(m)) a (e^{−ax})^m (1 + e^{−ax})^{−2m},   |x| < ∞,   (2.4)


Table 1
Values of c for α = .10

m/k      2       3       4       5       6       7       8       9      10
 1   1.7574  2.3084  2.5464  2.7286  2.8560  2.9657  3.0528  3.1312  3.1973
 2   1.1167  1.4501  1.5938  1.7025  1.7775  1.8418  1.8923  1.9376  1.9755
 3    .8786  1.1363  1.2472  1.3307  1.3880  1.4370  1.4753  1.5097  1.5382
 4    .7468   .9639  1.0573  1.1274  1.1754  1.2163  1.2483  1.2770  1.3007
 5    .6605   .8515   .9337   .9952  1.0372  1.0731  1.1011  1.1261  1.1469
 6    .5985   .7709   .8451   .9005   .9384   .9707   .9959  1.0184  1.0370
 7    .5512   .7096   .7777   .8285   .8633   .8929   .9159   .9365   .9536
 8    .5135   .6608   .7241   .7714   .8037   .8312   .8525   .8717   .8875
 9    .4826   .6209   .6803   .7247   .7549   .7807   .8007   .8186   .8335
10    .4567   .5874   .6436   .6855   .7141   .7384   .7573   .7742   .7882
11    .4346   .5588   .6122   .6520   .6792   .7023   .7202   .7363   .7496
12    .4154   .5340   .5850   .6230   .6490   .6710   .6881   .7035   .7161
13    .3985   .5123   .5612   .5976   .6224   .6436   .6600   .6747   .6868
14    .3836   .4930   .5400   .5750   .5989   .6192   .6350   .6491   .6608
15    .3702   .4757   .5211   .5548   .5779   .5975   .6127   .6263   .6375

Table 2
Values of c for α = .05

m/k      2       3       4       5       6       7       8       9      10
 1   2.3099  2.8199  3.0548  3.2322  3.3586  3.4665  3.5531  3.6306  3.6965
 2   1.4508  1.7507  1.8895  1.9928  2.0658  2.1277  2.1771  2.2210  2.2582
 3   1.1367  1.3658  1.4719  1.5504  1.6057  1.6524  1.6895  1.7225  1.7503
 4    .9642  1.1561  1.2449  1.3104  1.3565  1.3953  1.4261  1.4535  1.4765
 5    .8518  1.0199  1.0978  1.1551  1.1954  1.2293  1.2561  1.2800  1.3000
 6    .7711   .9226   .9928  1.0443  1.0805  1.1109  1.1350  1.1564  1.1744
 7    .7097   .8486   .9130   .9602   .9933  1.0212  1.0432  1.0628  1.0792
 8    .6610   .7900   .8497   .8935   .9243   .9501   .9705   .9887  1.0039
 9    .6210   .7420   .7980   .8390   .8678   .8920   .9112   .9281   .9424
10    .5876   .7018   .7547   .7934   .8206   .8435   .8615   .8775   .8909
11    .5590   .6675   .7177   .7545   .7803   .8020   .8191   .8343   .8470
12    .5342   .6378   .6857   .7208   .7455   .7661   .7825   .7969   .8091
13    .5124   .6117   .6576   .6913   .7149   .7347   .7503   .7642   .7758
14    .4931   .5886   .6327   .6651   .6877   .7068   .7218   .7351   .7463
15    .4758   .5679   .6105   .6417   .6635   .6818   .6963   .7092   .7199

and

F_m(x) = (Γ(2m)/Γ²(m)) Σ_{j=0}^{m−1} (m−1 choose j) (2m − j − 1)^{−1} (−1)^{m−1−j} (1 + e^{−ax})^{j+1−2m},   (2.5)

where a = π/√3.

The integral on the right-hand side of (2.3) is evaluated numerically using Gaussian quadrature, taking f_m(x) and F_m(x) as given in (2.4) and (2.5), respectively. The constants c computed from the equation a(μ^0|R1) = 1 − α for 1 − α = .90, .95 and .99 are presented in Tables 1–3 for m = 1(1)15 and k = 2(1)10.
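For readers who wish to reproduce these constants, the computation reduces to one-dimensional quadrature plus root-finding. The following Python sketch is ours, not the authors' (the function names are illustrative, and adaptive quadrature stands in for the Gaussian quadrature used in the paper):

```python
# Sketch: solve a(mu^0 | R1) = 1 - alpha for the critical point c of
# procedure R1, using Eqs. (2.3)-(2.5).
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq
from scipy.special import comb, gammaln

a = np.pi / np.sqrt(3.0)  # a = pi/sqrt(3), so the standard logistic has unit variance

def f_m(x, m):
    """pdf (2.4) of the median of n = 2m - 1 standard logistic observations."""
    logC = gammaln(2 * m) - 2 * gammaln(m)   # log of Gamma(2m)/Gamma^2(m)
    t = -a * x
    return np.exp(logC + np.log(a) + m * t - 2 * m * np.logaddexp(0.0, t))

def F_m(x, m):
    """cdf (2.5) of the sample median."""
    j = np.arange(m)
    t = np.logaddexp(0.0, -a * x)            # log(1 + e^{-a x}), overflow-safe
    s = (comb(m - 1, j) / (2 * m - j - 1) * (-1.0) ** (m - 1 - j)
         * np.exp((j + 1 - 2 * m) * t)).sum()
    return np.exp(gammaln(2 * m) - 2 * gammaln(m)) * s

def a_lfc(c, m, k):
    """Least favorable probability of no misclassification, Eq. (2.3)."""
    l = k // 2                               # integer part of k/2
    g = lambda x: (1 - F_m(x - c, m)) ** l * F_m(x + c, m) ** (k - l) * f_m(x, m)
    return quad(g, -np.inf, np.inf)[0]

def critical_point(m, k, alpha):
    """Solve a_lfc(c) = 1 - alpha for c."""
    return brentq(lambda c: a_lfc(c, m, k) - (1 - alpha), 1e-6, 10.0)

print(round(critical_point(4, 4, 0.05), 4))  # approx 1.2449, cf. Table 2
```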

As already mentioned, the multiple three-decision procedure does not control the type-I error at level α. We now show that the proposed three-decision procedure controls the type-I error at level 2α. For the multiple three-decision procedure, a type-I error is the event of inferring a population (treatment) π_i to be better or worse than the control population π_0 when it is actually equally good, i.e., μ_i = μ_0. Below we show that the rate of this error is controlled at level 2α. Let P_0(A) be the probability of the event A when μ_i = μ_0 for all i.


Table 3
Values of c for α = .01

m/k      2       3       4       5       6       7       8       9      10
 1   3.4563  3.9183  4.1494  4.3207  4.4460  4.5515  4.6375  4.7138  4.7793
 2   2.1148  2.3728  2.5037  2.5992  2.6691  2.7274  2.7749  2.8168  2.8526
 3   1.6403  1.8324  1.9303  2.0012  2.0531  2.0962  2.1312  2.1620  2.1883
 4   1.3842  1.5429  1.6238  1.6823  1.7249  1.7604  1.7891  1.8143  1.8359
 5   1.2190  1.3569  1.4273  1.4780  1.5150  1.5457  1.5705  1.5924  1.6110
 6   1.1014  1.2248  1.2879  1.3333  1.3664  1.3938  1.4159  1.4354  1.4521
 7   1.0122  1.1250  1.1826  1.2240  1.2541  1.2791  1.2994  1.3171  1.3322
 8    .9417  1.0460  1.0994  1.1377  1.1656  1.1887  1.2074  1.2238  1.2378
 9    .8840   .9817  1.0316  1.0674  1.0935  1.1150  1.1325  1.1478  1.1609
10    .8358   .9279   .9749  1.0086  1.0332  1.0535  1.0700  1.0844  1.0967
11    .7947   .8820   .9267   .9586   .9819  1.0012  1.0167  1.0304  1.0420
12    .7591   .8423   .8849   .9153   .9375   .9559   .9707   .9837   .9949
13    .7279   .8076   .8483   .8775   .8987   .9163   .9304   .9429   .9535
14    .7003   .7768   .8159   .8439   .8643   .8811   .8948   .9067   .9169
15    .6755   .7493   .7870   .8139   .8336   .8498   .8629   .8744   .8842

Then, we have

P_0[no error] = 1 − P_0[at least one error]
= 1 − P_0[max_{1≤i≤k} (X_{i:m} − X_{0:m}) > c or min_{1≤i≤k} (X_{i:m} − X_{0:m}) < −c]
≥ 1 − P_0[max_{1≤i≤k} (X_{i:m} − X_{0:m}) > c] − P_0[min_{1≤i≤k} (X_{i:m} − X_{0:m}) < −c]
= P_0[max_{1≤i≤k} (X_{i:m} − X_{0:m}) < c] + P_0[min_{1≤i≤k} (X_{i:m} − X_{0:m}) > −c] − 1
= ∫_{−∞}^{∞} [F_m(x + c)]^k f_m(x) dx + ∫_{−∞}^{∞} [1 − F_m(x − c)]^k f_m(x) dx − 1
≥ 2 ∫_{−∞}^{∞} [1 − F_m(x − c)]^l [F_m(x + c)]^{k−l} f_m(x) dx − 1   (by using Theorem 2.1)
≥ 2(1 − α) − 1
= 1 − 2α.

Thus, the multiple three-decision procedure controls the rate of type-I error at 2α.
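This bound is easy to verify numerically under the assumptions of the sketch above (whose helpers f_m, F_m, quad and critical_point it reuses):

```python
# Check P_0[no error] >= 1 - 2*alpha when mu_i = mu_0 for all i;
# reuses f_m, F_m, quad and critical_point from the earlier sketch.
m, k, alpha = 4, 4, 0.05
c = critical_point(m, k, alpha)
up = quad(lambda x: F_m(x + c, m) ** k * f_m(x, m), -np.inf, np.inf)[0]
dn = quad(lambda x: (1 - F_m(x - c, m)) ** k * f_m(x, m), -np.inf, np.inf)[0]
print(up + dn - 1 >= 1 - 2 * alpha)  # True: type-I error rate at most 2*alpha
```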

To assess the efficiency of the proposed procedure, we define the power function, say b(μ, δ), as

b(μ, δ) = P[correct classification of all those treatments with |μ_i − μ_0| > δ],

where δ > 0 specifies the smallest absolute effect from the control. Let b(μ, δ) equal unity for those μ satisfying max_{1≤i≤k} |μ_i − μ_0| ≤ δ, as no treatments need to be detected for such μ. Since there is no prior information about the true configuration of μ, we use the least favorable power

b(δ) = min_μ b(μ, δ)

as a sensitivity measure of the procedure. Now,

b(μ, δ|R1) = P[T_i ≥ −c for all i ∈ Ω_U(μ, δ) and T_j ≤ c for all j ∈ Ω_L(μ, δ)],

where T_i = X_{i:m} − X_{0:m}, Ω_U(μ, δ) = {i : μ_i − μ_0 > δ} and Ω_L(μ, δ) = {i : μ_i − μ_0 < −δ}. The following theorem gives the least favorable power of the proposed procedure.


Table 4
Values of the least favorable power b(δ) for selected m and k (1 − α = .95)

 m  k   δ = 0  δ = .1  δ = .2  δ = .3  δ = .4  δ = .5  δ = .6  δ = .7  δ = .8  δ = .9
 5  3   .9500   .9684   .9805   .9883   .9932   .9961   .9978   .9988   .9993   .9996
 7  5   .9500   .9721   .9851   .9923   .9962   .9982   .9992   .9996   .9998   .9999
 7  7   .9500   .9724   .9854   .9926   .9964   .9983   .9992   .9997   .9999   .9999
10  7   .9500   .9760   .9893   .9955   .9982   .9993   .9998   .9999   1.000   1.000

Theorem 2.2. Let l denote the integer part of k/2 and μ^0 = (0, δ+, ..., δ+, −δ−, ..., −δ−) ∈ R^{k+1}, where the last k − l components equal −δ− and the middle l components equal δ+. Then

b(δ) = inf_μ b(μ, δ|R1) = b(μ^0, δ|R1) = ∫_{−∞}^{∞} [1 − F_m(x − c − δ)]^l [F_m(x + c + δ)]^{k−l} f_m(x) dx.

Proof. The proof is analogous to that of Theorem 2.1. □

The values of the least favorable power b(δ), computed for the proposed procedure R1 under the parametric configuration μ^0 = (0, δ+, ..., δ+, −δ−, ..., −δ−) ∈ R^{k+1} for δ = 0(.1).9 and selected values of k and n, are presented in Table 4, from which it is clear that the least favorable power b(δ) increases with δ and with m.
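As a check on Table 4, the least favorable power of Theorem 2.2 can be evaluated with the same helpers as in the Section 2 sketch (f_m, F_m, quad and critical_point; that code must be run first):

```python
# Least favorable power b(delta) of Theorem 2.2; reuses f_m, F_m, quad
# and critical_point from the Section 2 sketch.
def power_lfc(delta, m, k, alpha):
    c = critical_point(m, k, alpha)   # critical point of procedure R1
    l = k // 2                        # integer part of k/2
    g = lambda x: ((1 - F_m(x - c - delta, m)) ** l
                   * F_m(x + c + delta, m) ** (k - l) * f_m(x, m))
    return quad(g, -np.inf, np.inf)[0]

print(round(power_lfc(0.5, 5, 3, 0.05), 4))  # approx .9961, cf. Table 4
```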

3. Simultaneous selection of extreme populations

Here, our goal is the simultaneous selection of two non-empty random-size subsets, one, say S_U, containing the UEP and the other, say S_L, containing the LEP. Thus, a CS event occurs if π_(k) ∈ S_U and π_(1) ∈ S_L. For a given P∗ (1/k(k − 1) < P∗ < 1), a subset selection procedure, say R2, is said to satisfy the P∗ condition if

P_μ[CS|R2] ≥ P∗ for all μ ∈ Ω.   (3.1)

The proposed selection procedure R2 is: include population π_i in the subset S_U iff

X_{i:m} ≥ X_{[k]m} − c,

i.e., choose the set S_U as

S_U = {i : X_{i:m} ≥ X_{[k]m} − c},

and include π_i in the subset S_L iff

X_{i:m} ≤ X_{[1]m} + c,

i.e., choose the set S_L as

S_L = {i : X_{i:m} ≤ X_{[1]m} + c},

where the selection constant c is determined so that the probability requirement (3.1) is satisfied. The theorem following the sketch below gives the expression for the probability of CS and its infimum.
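Operationally, R2 is a one-line filter on the observed medians. A minimal sketch (illustrative names, not from the paper):

```python
# Sketch of rule R2: build the subsets S_U and S_L from the observed sample
# medians, with the selection constant c taken from Tables 5-7.
def select_extremes(medians, c):
    hi, lo = max(medians), min(medians)
    S_U = [i for i, x in enumerate(medians, start=1) if x >= hi - c]  # for the UEP
    S_L = [i for i, x in enumerate(medians, start=1) if x <= lo + c]  # for the LEP
    return S_U, S_L
```

Both subsets are non-empty by construction, since the largest median always falls in S_U and the smallest in S_L.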

Theorem 3.1. Let c be the constant satisfying the equation

∫_{−∞}^{∞} ∫_{−∞}^{y+c} (F_m(y + c) − F_m(x − c))^{k−2} f_m(x) f_m(y) dx dy = P∗.   (3.2)

Then, for the proposed selection rule R2, we get

inf_{μ∈Ω} P_μ(CS|R2) = P∗.

Proof.

P_μ(CS|R2) = P[X_{(k)m} ≥ X_{[k]m} − c and X_{(1)m} ≤ X_{[1]m} + c]
= P[X_{(i)m} ≤ X_{(k)m} + c, i = 1, ..., k − 1 and X_{(j)m} ≥ X_{(1)m} − c, j = 2, ..., k]
= P[X_{(1)m} − c ≤ X_{(i)m} ≤ X_{(k)m} + c, i = 2, ..., k − 1 and X_{(1)m} ≤ X_{(k)m} + c]
= ∫_{−∞}^{∞} ∫_{−∞}^{y+c+μ_[k]−μ_[1]} P[x − c + (μ_[1] − μ_[i]) ≤ X_{(i)m} − μ_[i] ≤ y + (μ_[k] − μ_[i]) + c, i = 2, ..., k − 1] f_m(x) f_m(y) dx dy
= ∫_{−∞}^{∞} ∫_{−∞}^{y+c+μ_[k]−μ_[1]} ∏_{i=2}^{k−1} [F_m(y + c + (μ_[k] − μ_[i])) − F_m(x − c + (μ_[1] − μ_[i]))] f_m(x) f_m(y) dx dy   (3.3)
≥ ∫_{−∞}^{∞} ∫_{−∞}^{y+c} [F_m(y + c) − F_m(x − c)]^{k−2} f_m(x) f_m(y) dx dy   (3.4)
= inf_{μ∈Ω} P_μ(CS|R2).

The inequality in (3.4) holds since the probability in (3.3) is greater than or equal to the probability obtained by putting μ_[1] = ··· = μ_[k]. This proves Theorem 3.1. □

3.1. Computation of selection constants

By taking f_m(x) and F_m(x) as given in (2.4) and (2.5), respectively, we have

∫_{−∞}^{∞} ∫_{−∞}^{y+c} (F_m(y + c) − F_m(x − c))^{k−2} f_m(x) f_m(y) dx dy
= (Γ(2m)/Γ²(m))^k ∫_{−∞}^{∞} ∫_{−∞}^{y+c} {Σ_{j=0}^{m−1} (m−1 choose j) (2m − j − 1)^{−1} (−1)^{m−1−j}}^{k−2}
  × {[1 + e^{−a(y+c)}]^{j+1−2m} − [1 + e^{−a(x−c)}]^{j+1−2m}}^{k−2} {a (e^{−ax})^m (1 + e^{−ax})^{−2m}}
  × {a (e^{−ay})^m (1 + e^{−ay})^{−2m}} dx dy.


Table 5
Values of c for P∗ = .90

m/k      2       3       4       5       6       7       8       9      10
 1   1.7575  2.4719  2.7899  2.9942  3.1445  3.2632  3.3611  3.4446  3.5171
 2   1.1168  1.5531  1.7413  1.8601  1.9463  2.0138  2.0692  2.1159  2.1565
 3    .8787  1.2171  1.3611  1.4514  1.5166  1.5674  1.6089  1.6438  1.6740
 4    .7468  1.0324  1.1532  1.2286  1.2829  1.3251  1.3595  1.3884  1.4134
 5    .6606   .9121  1.0180  1.0840  1.1314  1.1682  1.1981  1.2233  1.2450
 6    .5985   .8257   .9212   .9806  1.0232  1.0562  1.0831  1.1057  1.1251
 7    .5512   .7600   .8476   .9019   .9410   .9712   .9957  1.0164  1.0341
 8    .5135   .7078   .7892   .8396   .8758   .9038   .9266   .9457   .9621
 9    .4827   .6650   .7413   .7886   .8225   .8487   .8700   .8879   .9033
10    .4567   .6292   .7012   .7459   .7779   .8026   .8227   .8396   .8541
11    .4346   .5986   .6670   .7094   .7398   .7633   .7824   .7983   .8121
12    .4154   .5720   .6374   .6779   .7068   .7292   .7474   .7627   .7758
13    .3986   .5487   .6113   .6501   .6779   .6993   .7167   .7313   .7439
14    .3836   .5280   .5883   .6256   .6522   .6728   .6896   .7036   .7156
15    .3702   .5095   .5676   .6035   .6293   .6492   .6653   .6788   .6904

Table 6
Values of c for P∗ = .95

m/k      2       3       4       5       6       7       8       9      10
 1   2.3100  2.9978  3.3086  3.5091  3.6574  3.7745  3.8716  3.9543  4.0261
 2   1.4509  1.8605  2.0401  2.1545  2.2379  2.3034  2.3572  2.4028  2.4424
 3   1.1368  1.4511  1.5872  1.6732  1.7357  1.7845  1.8245  1.8583  1.8876
 4    .9643  1.2281  1.3415  1.4129  1.4646  1.5050  1.5380  1.5659  1.5899
 5    .8518  1.0834  1.1826  1.2448  1.2898  1.3249  1.3536  1.3777  1.3985
 6    .7712   .9799  1.0691  1.1250  1.1653  1.1968  1.2224  1.2439  1.2626
 7    .7098   .9013   .9830  1.0341  1.0710  1.0997  1.1231  1.1428  1.1598
 8    .6610   .8390   .9147   .9621   .9963  1.0229  1.0445  1.0627  1.0785
 9    .6210   .7880   .8589   .9033   .9353   .9602   .9804   .9975  1.0121
10    .5876   .7453   .8123   .8541   .8843   .9077   .9268   .9428   .9567
11    .5590   .7088   .7724   .8122   .8407   .8630   .8811   .8963   .9094
12    .5342   .6772   .7379   .7758   .8031   .8243   .8415   .8560   .8685
13    .5124   .6495   .7076   .7440   .7701   .7904   .8069   .8208   .8327
14    .4931   .6250   .6809   .7157   .7408   .7603   .7762   .7895   .8010
15    .4759   .6030   .6569   .6905   .7147   .7334   .7487   .7615   .7726

Putting u = (1 + e^{−ax})^{−1}, v = (1 + e^{−ay})^{−1} and letting c_1 = e^{−ac}, the equation becomes

∫_{−∞}^{∞} ∫_{−∞}^{y+c} (F_m(y + c) − F_m(x − c))^{k−2} f_m(x) f_m(y) dx dy
= (Γ(2m)/Γ²(m))^k ∫_0^1 ∫_0^{v/((1−c_1)v+c_1)} {Σ_{j=0}^{m−1} (m−1 choose j) (2m − j − 1)^{−1} (−1)^{m−1−j}}^{k−2}
  × {[v/((1 − c_1)v + c_1)]^{2m−j−1} − [uc_1/((c_1 − 1)u + 1)]^{2m−j−1}}^{k−2} {(1 − u)^{m−1} u^{m−1}}
  × {(1 − v)^{m−1} v^{m−1}} du dv.

The constants c satisfying this equation are computed numerically. The computed values are tabulated in Tables 5–7 for m = 1(1)15, k = 2(1)10 and P∗ = .90, .95, .99.
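Alternatively, the selection constant can be obtained by applying adaptive two-dimensional quadrature directly to (3.2); the sketch below (our code, reusing f_m, F_m and brentq from the Section 2 sketch) is slow but serviceable:

```python
# Solve (3.2) for the selection constant c of rule R2 with scipy's dblquad;
# reuses f_m, F_m and brentq from the Section 2 sketch.
from scipy.integrate import dblquad

def pcs_inf(c, m, k):
    # g(x, y): x is the inner variable, integrated over x <= y + c
    g = lambda x, y: (F_m(y + c, m) - F_m(x - c, m)) ** (k - 2) * f_m(x, m) * f_m(y, m)
    return dblquad(g, -np.inf, np.inf, lambda y: -np.inf, lambda y: y + c)[0]

c = brentq(lambda c: pcs_inf(c, 4, 4) - 0.95, 1e-6, 10.0)
print(round(c, 4))  # approx 1.3415, cf. Table 6 (m = 4, k = 4, P* = .95)
```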


Table 7
Values of c for P∗ = .99

m/k      2       3       4       5       6       7       8       9      10
 1   3.4566  4.1127  4.4140  4.6104  4.7549  4.8695  4.9660  5.0470  5.1181
 2   2.1153  2.4884  2.6549  2.7625  2.8419  2.9045  2.9559  2.9998  3.0380
 3   1.6404  1.9203  2.0440  2.1232  2.1812  2.2271  2.2642  2.2963  2.3241
 4   1.3843  1.6159  1.7179  1.7827  1.8304  1.8675  1.8983  1.9242  1.9467
 5   1.2190  1.4207  1.5090  1.5653  1.6062  1.6385  1.6651  1.6873  1.7066
 6   1.1016  1.2820  1.3612  1.4115  1.4480  1.4766  1.5002  1.5200  1.5372
 7   1.0124  1.1776  1.2496  1.2953  1.3286  1.3546  1.3761  1.3941  1.4098
 8    .9419  1.0946  1.1613  1.2035  1.2345  1.2584  1.2783  1.2949  1.3094
 9    .8841  1.0273  1.0895  1.1290  1.1577  1.1801  1.1988  1.2142  1.2276
10    .8359   .9709  1.0295  1.0667  1.0937  1.1149  1.1322  1.1469  1.1596
11    .7947   .9229   .9784  1.0136  1.0394  1.0593  1.0758  1.0898  1.1016
12    .7592   .8813   .9343   .9678   .9922  1.0113  1.0269  1.0402  1.0515
13    .7281   .8449   .8956   .9278   .9511   .9693   .9842   .9970  1.0077
14    .7003   .8126   .8613   .8922   .9145   .9321   .9464   .9585   .9689
15    .6757   .7839   .8307   .8605   .8821   .8988   .9126   .9243   .9343

Remark. It is easy to verify that the pdf f_m(x − μ) of the sample median, based on a sample of size n = 2m − 1 from a logistic distribution with mean μ and unit variance, has the monotone likelihood ratio property. This is a necessary and sufficient condition to show that 2kP∗ ≤ max_{μ∈Ω} E(S|R2) ≤ 2k (see Mishra, 1986), where S is the size of the selected subsets. Furthermore, using the arguments of Mishra (1986), one can verify the monotonicity property of the proposed selection procedure.

4. Simultaneous confidence intervals

In the following theorem we discuss simultaneous upper confidence intervals for the parameters {μ_[i] − μ_[1], i = 2, ..., k; μ_[k] − μ_[i], i = 1, ..., k − 1}.

Theorem 4.1. Let c_1 and c_2 be two positive constants satisfying the equation

∫_{−∞}^{∞} ∫_{−∞}^{y+c∗} (F_m(y + c_2) − F_m(x − c_1))^{k−2} f_m(x) f_m(y) dx dy = P∗,   (4.1)

where c∗ = min(c_1, c_2). Then a set of 100P∗% simultaneous upper confidence intervals for the parameters {μ_[2] − μ_[1], ..., μ_[k−1] − μ_[1], μ_[k] − μ_[1], μ_[k] − μ_[2], ..., μ_[k] − μ_[k−1]} is given by {[0, D_2], ..., [0, D_{k−1}], [0, D], [0, D̄_2], ..., [0, D̄_{k−1}]}, where D = X_{[k]m} − X_{[1]m} + c∗, D_i = X_{[i]m} − X_{[1]m} + c_1, and D̄_i = X_{[k]m} − X_{[i]m} + c_2, i = 2, ..., k − 1.

Proof. Define the event A as

A = ⋂_{i=2}^{k} ⋂_{j=1}^{k−1} {X_{(i)m} − μ_[i] ≥ X_{(1)m} − μ_[1] − c_1, X_{(k)m} − μ_[k] ≥ X_{(j)m} − μ_[j] − c_2}.

Obviously,

P_μ(A) = ∫_{−∞}^{∞} ∫_{−∞}^{y+c∗} (F_m(y + c_2) − F_m(x − c_1))^{k−2} f_m(x) f_m(y) dx dy = P∗, μ ∈ Ω.


Moreover,

A = ⋂_{i=2}^{k} ⋂_{j=1}^{k−1} {X_{(i)m} − X_{(1)m} + c_1 ≥ μ_[i] − μ_[1], X_{(k)m} − X_{(j)m} + c_2 ≥ μ_[k] − μ_[j]}
⊆ ⋂_{i=2}^{k} ⋂_{j=1}^{k−1} {X_{(i)m} − X_{[1]m} + c_1 ≥ μ_[i] − μ_[1], X_{[k]m} − X_{(j)m} + c_2 ≥ μ_[k] − μ_[j]}
⊆ ⋂_{i=2}^{k} ⋂_{s=i}^{k} ⋂_{j=1}^{k−1} ⋂_{t=1}^{j} {X_{(s)m} − X_{[1]m} + c_1 ≥ μ_[i] − μ_[1], X_{[k]m} − X_{(t)m} + c_2 ≥ μ_[k] − μ_[j]}
= ⋂_{i=2}^{k} ⋂_{j=1}^{k−1} {min_{i≤s≤k} X_{(s)m} − X_{[1]m} + c_1 ≥ μ_[i] − μ_[1], X_{[k]m} − max_{1≤t≤j} X_{(t)m} + c_2 ≥ μ_[k] − μ_[j]}
⊆ ⋂_{i=2}^{k} ⋂_{j=1}^{k−1} {X_{[i]m} − X_{[1]m} + c_1 ≥ μ_[i] − μ_[1], X_{[k]m} − X_{[j]m} + c_2 ≥ μ_[k] − μ_[j]}.

Thus,

P_μ[⋂_{i=2}^{k} ⋂_{j=1}^{k−1} {X_{[i]m} − X_{[1]m} + c_1 ≥ μ_[i] − μ_[1], X_{[k]m} − X_{[j]m} + c_2 ≥ μ_[k] − μ_[j]}] ≥ P∗.

Hence the proof follows.
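In practice, once c_1 and c_2 have been solved from (4.1), the confidence bounds of Theorem 4.1 are simple functions of the ordered medians. A small illustrative helper (our names):

```python
# Simultaneous upper confidence bounds of Theorem 4.1 from the sample
# medians; c1 and c2 are assumed to have been solved from Eq. (4.1).
def upper_bounds(medians, c1, c2):
    x = sorted(medians)                    # X_[1]m <= ... <= X_[k]m
    k, c_star = len(x), min(c1, c2)
    D = x[-1] - x[0] + c_star              # bound on mu_[k] - mu_[1]
    D_i = {i + 1: x[i] - x[0] + c1 for i in range(1, k - 1)}      # mu_[i] - mu_[1]
    Dbar_i = {i + 1: x[-1] - x[i] + c2 for i in range(1, k - 1)}  # mu_[k] - mu_[i]
    return D, D_i, Dbar_i
```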

5. An example

Random samples of common sample size n = 7 (m = 4), generated from logistic populations with mean μ_i and variance one, i = 0, 1, ..., 4, are given below:

π_0 (μ_0 = 3.0): −.4634, 3.0699, 2.9212, 2.2858, 2.9419, 2.2320, 4.3472
π_1 (μ_1 = 2.5): 2.1827, 3.4395, 2.7718, 1.5311, 3.5330, 3.1481, 2.5956
π_2 (μ_2 = 5.0): 4.7008, 4.1997, 1.5544, 5.5713, 4.9615, 4.9413, 3.6445
π_3 (μ_3 = 1.0): .4803, 2.0398, .5040, 1.4225, .7829, .8523, 1.1959
π_4 (μ_4 = 1.5): .3481, .1338, 1.0089, .5735, −3.0307, −1.0026, .0422

From the above data, we have

X_{0:4} = 2.9212, X_{1:4} = 2.7718, X_{2:4} = 4.7008, X_{3:4} = .8523, X_{4:4} = .1338.

For m = 4, k = 4 and α = .05, Table 2 gives c = 1.2449. Using the procedure given in (2.1), we make no decision on the sign of μ_1 − μ_0, and infer (μ_2 − μ_0) > 0, (μ_3 − μ_0) < 0 and (μ_4 − μ_0) < 0.
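The whole example can be reproduced in a few lines; the sketch below (data copied from above) applies rule (2.1) with c = 1.2449:

```python
# Reproduce the Section 5 classification with rule (2.1).
import numpy as np

samples = {
    0: [-.4634, 3.0699, 2.9212, 2.2858, 2.9419, 2.2320, 4.3472],
    1: [2.1827, 3.4395, 2.7718, 1.5311, 3.5330, 3.1481, 2.5956],
    2: [4.7008, 4.1997, 1.5544, 5.5713, 4.9615, 4.9413, 3.6445],
    3: [.4803, 2.0398, .5040, 1.4225, .7829, .8523, 1.1959],
    4: [.3481, .1338, 1.0089, .5735, -3.0307, -1.0026, .0422],
}
c = 1.2449  # Table 2: m = 4, k = 4, alpha = .05
med = {i: float(np.median(s)) for i, s in samples.items()}
for i in range(1, 5):
    d = med[i] - med[0]
    if d > c:
        print(f"infer mu_{i} - mu_0 > 0")
    elif d < -c:
        print(f"infer mu_{i} - mu_0 < 0")
    else:
        print(f"no decision on the sign of mu_{i} - mu_0")
```

This prints no decision for π_1 and the inferences (μ_2 − μ_0) > 0, (μ_3 − μ_0) < 0 and (μ_4 − μ_0) < 0, agreeing with the conclusions above.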

Acknowledgements

The authors are thankful to the referees for their valuable suggestions and comments, which led to an appreciable improvement of this paper.


References

Balakrishnan, N., 1992. Handbook of the Logistic Distribution. Marcel Dekker, New York.
Bechhofer, R.E., 1954. A single-sample multiple decision procedure for ranking means of normal populations with known variances. Ann. Math. Statist. 25, 16–39.
Berkson, J., 1944. Application of the logistic function to bio-assay. J. Amer. Statist. Assoc. 39, 357–365.
Bohrer, R., 1979. Multiple three-decision rules for parametric signs. J. Amer. Statist. Assoc. 74, 432–437.
Bohrer, R., Chow, W., Faith, R., Joshi, V.M., Wu, C.F., 1981. Multiple three-decision rules for factorial simple effects: Bonferroni wins again! J. Amer. Statist. Assoc. 76, 119–124.
Gupta, S.S., 1962. Life test sampling plans for normal and lognormal distributions. Technometrics 4, 151–175.
Gupta, S.S., 1965. On some multiple decision (selection and ranking) rules. Technometrics 7, 225–245.
Gupta, S.S., Han, S., 1991. An elimination type two-stage procedure for selecting the population with the largest mean from k logistic populations. Amer. J. Math. Manage. Sci. 11, 351–370.
Han, S., 1987. Contributions to selection and ranking theory with special reference to logistic populations. Ph.D. Thesis (also Technical Report No. 87-138). Department of Statistics, Purdue University, West Lafayette, Indiana.
Hsu, J.C., 1981. Simultaneous confidence intervals for all distances from the best. Ann. Statist. 9, 1026–1034.
Lam, K., 1986. A new procedure for selecting good populations. Biometrika 73, 201–206.
Lam, K., 1989. The multiple comparison of ranked parameters. Comm. Statist. Theory Methods 18 (4), 1217–1237.
Liu, W., 1997. On a multiple three-decision procedure for comparing several treatments with a control. Austral. J. Statist. 39, 79–92.
Lorenzen, T.J., McDonald, G.C., 1981. Selecting logistic populations using the sample medians. Comm. Statist. Theory Methods 10, 101–124.
Mishra, S.N., 1986. Simultaneous selection of extreme populations: scale parameter case. Progr. Mathematics 20, 91–105.
Mishra, S.N., Dudewicz, E.J., 1987. Simultaneous selection of extreme populations: a subset selection approach. Biometrical J. 29, 471–483.
Misra, N., Dhariyal, I.D., 1993. Simultaneous confidence intervals for all distances from the worst and best populations. Comm. Statist. Theory Methods 22 (11), 3097–3116.
Patel, J.K., Wyckoff, J., 1990. Classifying normal populations with respect to control using sample quasi ranges on censored data. Amer. J. Math. Manage. Sci. 10, 367–385.
Singh, P., Gill, A.N., 2003. A class of selection procedures for good logistic populations. J. Korean Statist. Soc. 32 (3), 299–309.
Tong, Y.L., 1969. On partitioning a set of normal populations by their locations with respect to a control. Ann. Math. Statist. 40, 1300–1324.
Vander Laan, 1989. Selection from logistic populations. Statist. Neerlandica 43, 169–174.