
This article was downloaded by: [Northeastern University]
On: 05 December 2014, At: 16:58
Publisher: Taylor & Francis
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Journal of Applied Statistics
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/cjas20

Multiple comparisons based on a modified one-step M-estimator
Rand Wilcox, Department of Psychology, University of Southern California, USA
Published online: 02 Aug 2010.

To cite this article: Rand Wilcox (2003) Multiple comparisons based on a modified one-step M-estimator, Journal of Applied Statistics, 30:10, 1231-1241, DOI: 10.1080/0266476032000137463

To link to this article: http://dx.doi.org/10.1080/0266476032000137463

Taylor & Francis makes every effort to ensure the accuracy of all the information (the "Content") contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content.

This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http://www.tandfonline.com/page/terms-and-conditions



Journal of Applied Statistics, Vol. 30, No. 10, December, 2003, 1231–1241

Multiple comparisons based on a modified one-step M-estimator

RAND R. WILCOX, Department of Psychology, University of Southern California, USA

Although many methods are available for performing multiple comparisons based on some measure of location, most can be unsatisfactory in at least some situations, particularly in simulations where sample sizes are small, say less than or equal to twenty. That is, the actual Type I error probability can substantially exceed the nominal level, and for some methods the actual Type I error probability can be well below the nominal level, suggesting that power might be relatively poor. In addition, all methods based on means can have relatively low power under arbitrarily small departures from normality. Currently, a method based on 20% trimmed means and a percentile bootstrap method performs relatively well (Wilcox, in press). However, symmetric trimming was used, even when sampling from a highly skewed distribution, and a rigid adherence to 20% trimming can result in low efficiency when a distribution is sufficiently heavy-tailed. Robust M-estimators are more flexible, but they can be unsatisfactory in terms of Type I errors when sample sizes are small. This paper describes an alternative approach based on a modified one-step M-estimator that introduces more flexibility than a trimmed mean but provides better control over Type I error probabilities compared with using a one-step M-estimator.

1 Introduction

There are, of course, many multiple comparison procedures that can be used with a variety of measures of location. As is well known, arbitrarily small departures from normality can destroy power when comparing means, a result that follows almost immediately from a classic paper by Tukey (1960). Given the goal of comparing groups based on some reasonable measure of location, two families of robust alternatives to the sample mean currently stand out: trimmed means and

Correspondence: R. R. Wilcox, Department of Psychology, University Park Campus, Los Angeles, California 90089-1061, USA. E-mail: [email protected]

ISSN 0266-4763 print; 1360-0532 online/03/0101231-11 © 2003 Taylor & Francis Ltd

DOI: 10.1080/0266476032000137463


M-estimators. In terms of controlling Type I error probabilities and probability coverage, excellent results can be had when comparing all pairs of independent groups by employing 20% trimmed means with a percentile bootstrap method recently proposed by Wilcox (in press). Moreover, power is relatively high when distributions are normal, compared with standard methods based on means. But unlike methods based on means, power can remain relatively high when sampling from a heavy-tailed distribution.

There are, however, two practical concerns regarding trimmed means. The first has to do with symmetric trimming. To explain, consider a random sample of n observations: X_1, . . . , X_n. Typically, the trimmed mean consists of deciding in advance that the proportion of observations to be trimmed will be c, in which case g = [cn] observations are trimmed from both tails. That is, put the observations in ascending order, yielding X_(1) ≤ . . . ≤ X_(n), in which case the c-trimmed mean is

X̄_t = (X_(g+1) + . . . + X_(n−g)) / (n − 2g)

Symmetric trimming is intuitively appealing when sampling from a symmetric distribution. But if sampling is from a distribution that is highly skewed to the right, then intuition suggests that, at least in some situations, more observations should be trimmed from the right tail than from the left. However, if asymmetric trimming is used, probability coverage can now be unsatisfactory (and so, of course, control over the probability of a Type I error can be unsatisfactory as well) when using any of the methods covered in Wilcox (1997).
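As a concrete sketch, the symmetric c-trimmed mean defined above can be computed as follows. The Python rendering and the function name are mine, not the paper's; only the definition comes from the text.

```python
def trimmed_mean(x, c=0.2):
    """Symmetric c-trimmed mean: trim g = [cn] observations from each tail,
    then average what remains."""
    xs = sorted(x)
    n = len(xs)
    g = int(c * n)            # g = [cn], the count trimmed from each tail
    kept = xs[g:n - g]        # X_(g+1), ..., X_(n-g)
    return sum(kept) / len(kept)

# With n = 10 and c = 0.2, g = 2 values are trimmed from each tail.
print(trimmed_mean(range(1, 11)))  # → 5.5
```

With 20% trimming this matches the estimator used by method P below; changing c trades efficiency under normality against resistance to outliers.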

The finite sample breakdown point of an estimator is the smallest proportion of observations which, when altered, can make its value arbitrarily small or large. The finite sample breakdown point of the trimmed mean is c. So, with 20% trimming, more than 20% of the observations must be outliers to render the trimmed mean meaningless. Huber (1993) has argued that a finite sample breakdown point less than or equal to 10% is dangerous. Setting c = 0.2 addresses this concern, and power remains reasonably high compared with standard methods based on zero trimming. However, another concern is fixing the amount of trimming to be used. For many situations, 20% trimming seems adequate in terms of the proportion of outliers one is likely to encounter. Nevertheless, situations do arise where the proportion of outliers is found to be greater than 20%. (An illustration is given in the final section of this paper.) One could routinely use a higher amount of trimming; however, if the amount of trimming is too high, power can be low when sampling from a light-tailed distribution (such as a normal distribution) where outliers are relatively rare.

In contrast are M-estimators, where a measure of location, h, is the solution to

Σ ψ((X_i − h)/q) = 0    (1)

where ψ is some odd function and q is some measure of scale. Setting ψ(x) = x corresponds to the sample mean. Among the many choices for ψ, Huber's ψ,

ψ(x) = max[−K, min(K, x)]

stands out (e.g. Staudte & Sheather, 1990), where the constant K is typically taken to be 1.28. The standard choice for q is the so-called median absolute deviation


(MAD) estimate of scale. Letting M be the usual median, MAD is the median of the values |X_1 − M|, . . . , |X_n − M|. Equation (1) can be solved with the Newton–Raphson method, and a single iteration of this technique yields (with K = 1.28) the well-known one-step M-estimator

h = [1.28(MADN)(i_2 − i_1) + Σ_{i = i_1+1}^{n−i_2} X_(i)] / (n − i_1 − i_2)    (2)

where MADN = MAD/0.6745, i_1 is the number of observations X_i such that (X_i − M)/MADN < −K, and i_2 is the number of observations X_i such that (X_i − M)/MADN > K. One appealing feature of equation (2) is that its finite sample breakdown point is 0.5, the highest possible value. Moreover, its efficiency compares well to the sample mean under normality, but unlike the mean its efficiency remains high when sampling from a heavy-tailed distribution.
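A minimal sketch of the one-step M-estimator in equation (2), using the median and MADN exactly as defined above (the Python rendering and function name are mine):

```python
def one_step_m(x, K=1.28):
    """One-step M-estimator, equation (2): a single Newton-Raphson step
    from the median, with scale MADN = MAD/0.6745 and Huber's K = 1.28."""
    xs = sorted(x)
    n = len(xs)
    M = (xs[(n - 1) // 2] + xs[n // 2]) / 2            # sample median
    d = sorted(abs(xi - M) for xi in xs)
    madn = ((d[(n - 1) // 2] + d[n // 2]) / 2) / 0.6745
    i1 = sum((xi - M) / madn < -K for xi in xs)        # flagged in lower tail
    i2 = sum((xi - M) / madn > K for xi in xs)         # flagged in upper tail
    core = sum(xs[i1:n - i2])                          # X_(i1+1) + ... + X_(n-i2)
    return (K * madn * (i2 - i1) + core) / (n - i1 - i2)
```

For the sample [1, ..., 9, 100], only the value 100 is flagged in the upper tail (i_1 = 0, i_2 = 1), and the correction term 1.28·MADN·(i_2 − i_1) pulls the estimate slightly above the mean of the unflagged values.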

A practical concern with equation (2) is computing confidence intervals or testing hypotheses. Currently, it seems that a percentile bootstrap method performs reasonably well provided the sample sizes are not too small (Wilcox, 1997). But with sample sizes less than or equal to 20, problems arise. (Switching to a bootstrap-t method makes matters worse.) In contrast, with sample sizes as small as ten, methods for computing confidence intervals for trimmed means are available that perform well in simulations where methods based on a robust M-estimator are unsatisfactory, provided the amount of trimming is at least 20%. The purpose of this paper is to suggest using a simple modification of the one-step M-estimator given by equation (2) and to consider how multiple comparisons might be made using a percentile bootstrap method. In simulations, the proposed method competes well with techniques based on trimmed means in terms of both power and probability coverage.

2 A modified one-step M-estimator

The proposed modification of the one-step M-estimator given by equation (2) is to drop the term containing MADN and simply use

h = ( Σ_{i = i_1+1}^{n−i_2} X_(i) ) / (n − i_1 − i_2)    (3)

with K adjusted so that efficiency is good under normality. Here, the main concern is with small sample sizes, so K is adjusted so that efficiency is reasonably good under normality for n ≤ 100. In particular, using simulations with 10 000 replications, it was found that with K = 2.24, the standard error of the sample mean divided by the standard error of h is approximately 0.9 for n = 20(5)100. For n = 10 and 15, this ratio is 0.88. So, in effect, one flags the value X_i as an outlier if |X_i − M|/MADN > 2.24, which is a special case of the outlier detection strategy of Rousseeuw & van Zomeren (1990); but unlike the one-step M-estimator, one merely averages the values remaining after outliers are discarded. The basic strategy of checking for outliers and discarding them is a rather obvious approach to robust estimation and it is not new. In particular, the class of skipped estimators originally proposed by Tukey and studied by Andrews et al. (1972) is based on this approach, but the class of skipped estimators they studied is based on outlier detection rules stemming from the boxplot. An advantage of using an outlier detection rule based on M and MAD is that the resulting finite sample breakdown point of h is 0.5.
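The modified one-step M-estimator of equation (3) can be sketched as follows (Python rendering and function name mine; only the rule itself comes from the text):

```python
def mom(x, K=2.24):
    """Modified one-step M-estimator, equation (3): discard every X_i with
    |X_i - M|/MADN > K (K = 2.24), then average the remaining values."""
    xs = sorted(x)
    n = len(xs)
    M = (xs[(n - 1) // 2] + xs[n // 2]) / 2            # sample median
    d = sorted(abs(xi - M) for xi in xs)
    madn = ((d[(n - 1) // 2] + d[n // 2]) / 2) / 0.6745
    kept = [xi for xi in xs if abs(xi - M) / madn <= K]
    return sum(kept) / len(kept)

# 100 is flagged as an outlier and discarded; the other nine are averaged.
print(mom([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))  # → 5.0
```

Note that, unlike a fixed 20% trimmed mean, the amount trimmed here is determined by the data and may differ between the two tails.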


That h has a finite sample breakdown point of 0.5 follows from the well-known fact that MAD has a finite sample breakdown point of 0.5. In practical terms, if a distribution is sufficiently heavy-tailed, h can have a substantially smaller standard error than the 20% trimmed mean, which in turn can mean substantially more power when testing hypotheses. (If we use a standard boxplot rule based on the lower and upper quartiles to detect outliers, the breakdown point would be only 0.25.)

In fairness, when sampling from a heavy-tailed distribution, the efficiency of the trimmed mean can be increased by increasing the amount of trimming. The median, for example, will have high efficiency when sampling from a very heavy-tailed distribution, but not when a distribution is normal. What is being suggested is that the amount of trimming be empirically determined using an outlier detection rule that avoids masking. (That is, the outlier detection rule should not itself be affected by outliers in a manner that masks their presence.) The idea is that, by introducing flexibility into how much is trimmed, good efficiency will be obtained when sampling from normal distributions as well as very heavy-tailed distributions where the 20% trimmed mean performs rather poorly. Simultaneously, the goal is to allow asymmetric trimming and achieve reasonably good control over the probability of a Type I error in situations where unsatisfactory control is obtained with the one-step M-estimator given by equation (2).

Although the main goal is to test hypotheses using an estimator that avoids the practical problems mentioned in the introduction, perhaps some general comments should be made about the population parameter h being estimated. First, it is noted that h is well defined and given by

h = (1/m) ∫_{g−Kq}^{g+Kq} x dF(x)

where g is the population median, q is the population value estimated by MADN, and

m = ∫_{g−Kq}^{g+Kq} dF(x)

Staudte & Sheather (1990) note that any measure of location should satisfy four basic properties. First, if the random variable X has measure of location h, then for any constant b, X + b should have measure of location h + b. Second, the measure of location associated with −X should be −h. Third, X ≥ 0 implies h ≥ 0, and fourth, bX should have measure of location bh (scale equivariance). It is readily verified that h satisfies all four of these properties.

Finally, general results on the influence function of skipped estimators are reported by Andrews et al. (1972), which can be used to derive an asymptotic expression for the standard error of h. However, even when using simpler skipped estimators, expressions for asymptotic variances are rather complicated when sampling from asymmetric distributions. Here, the emphasis is on small sample sizes when performing all pairwise comparisons among J independent groups. By employing the percentile bootstrap method, an expression for the standard error will not be needed.


3 Hypothesis testing

For J independent groups, this section takes up the problem of testing

H_0 : h_j = h_k    (4)

for all j < k. The strategy is to compare two groups using a special case of a general strategy derived by Liu & Singh (1997), and then control the familywise error rate (the probability of at least one Type I error) using a method described later in this section.

Let X_ij, i = 1, . . . , n_j; j = 1, . . . , J be a random sample of n_j observations randomly sampled from the jth group. First, consider J = 2, and for fixed j let X*_1j, . . . , X*_njj be a bootstrap sample obtained by randomly resampling with replacement n_j observations from X_1j, . . . , X_njj. Next, let

p*_jk = P(h*_j > h*_k)

where h*_j is the value of h based on a bootstrap sample from the jth group. That is, p*_jk is the probability that, when resampling from the empirical distributions associated with the jth and kth groups, the value of h*_j will be greater than the value of h*_k. Notice that p*_jk reflects the degree to which the empirical distributions differ. If the empirical distributions are identical, then p* = 0.5.

Here, p*_jk is estimated in the following manner. Generate B bootstrap samples from the jth group, each bootstrap sample having n_j observations, and for each of the bootstrap samples compute h and label the results h*_jb, b = 1, . . . , B. Let I_b = 1 if h*_jb > h*_kb, otherwise I_b = 0. Then an estimate of p*_jk is

p̂*_jk = (1/B) Σ_{b=1}^{B} I_b

An alternative estimate is to compute the Mann-Whitney U statistic on the bootstrap estimates of h and then rescale U to get an estimate of p*_jk. From general results in Liu & Singh (1997), it follows that when h_j = h_k, the distribution of p*_jk converges to a uniform distribution over the unit interval. This result also follows from general theoretical results reported by Hall (1986). So, when comparing groups j and k only, this suggests rejecting H_0 : h_j = h_k if p̂*_jk ≤ α/2 or if p̂*_jk ≥ 1 − α/2.
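The bootstrap estimate p̂*_jk described above can be sketched as follows. The helper reuses the modified one-step M-estimator of equation (3); all function names and the Python rendering are mine.

```python
import random

def mom(x, K=2.24):
    """Modified one-step M-estimator, equation (3)."""
    xs = sorted(x)
    n = len(xs)
    M = (xs[(n - 1) // 2] + xs[n // 2]) / 2
    d = sorted(abs(xi - M) for xi in xs)
    madn = ((d[(n - 1) // 2] + d[n // 2]) / 2) / 0.6745
    kept = [xi for xi in xs if abs(xi - M) / madn <= K]
    return sum(kept) / len(kept)

def phat_jk(xj, xk, B=1000, seed=0):
    """Estimate p*_jk = P(h*_j > h*_k): the proportion of bootstrap pairs
    in which the estimate for group j exceeds that for group k."""
    rng = random.Random(seed)
    count = 0
    for _ in range(B):
        hj = mom([rng.choice(xj) for _ in xj])   # bootstrap sample, size n_j
        hk = mom([rng.choice(xk) for _ in xk])   # bootstrap sample, size n_k
        if hj > hk:
            count += 1
    return count / B
```

For completely separated groups the estimate is 0 or 1; values near 0.5 indicate little evidence of a difference between the groups.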

There remains the problem of controlling the familywise error rate (FWE). A seemingly simple strategy is to employ the Bonferroni inequality and reject if p̂*_jk ≤ α/(2C) or if p̂*_jk ≥ 1 − α/(2C), where C is the total number of hypotheses to be tested. This approach was found to perform reasonably well in simulations provided all groups have a sample size of 100. However, for small to moderate sample sizes, this approach was found to be much too conservative, meaning that the actual value of FWE can be substantially smaller than the nominal level. In particular, it was found that when testing at the 0.05 level, FWE is approximately 0.025 or less among the situations considered in Section 5. Surely this is unacceptable and requires some adjustment when sample sizes are small.

A better approach is to reject if p̂*_jk ≤ α/C or if p̂*_jk ≥ 1 − α/C, except of course when C = 1. It was found, however, that even this approach is somewhat conservative in terms of Type I errors. Here, a slight variation of this method is used instead, where α is adjusted using the C-variate Studentized maximum modulus distribution with infinite degrees of freedom. Let

p̂*_mjk = min{ p̂*_jk , 1 − p̂*_jk }


Let P* be the minimum value of p̂*_mjk over all j < k. Then reject H_0 : h_j = h_k if p̂*_mjk < α_C, where α_C = 2 × (1 − Φ(q_C)), Φ is the standard normal cumulative distribution, and q_C is the 1 − α quantile of the Studentized maximum modulus distribution with infinite degrees of freedom. Confidence intervals for h_j − h_k are given by (D*_jk(L+1), D*_jk(U)), where L is α_C B/2 rounded to the nearest integer, U = B − L, and D*_jkb = h*_jb − h*_kb. Bechhofer & Dunnett (1982) report quantiles of this distribution for C = 2(1)32. This will be called method M.
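Putting the pieces together, the rejection step of method M (the confidence-interval step is omitted) might be sketched as below. The α_C value must be supplied from the Studentized maximum modulus tables; in the usage check, 0.0160 is the J = 3 value taken from Table 4. The function names and the Python rendering are mine, not the paper's.

```python
import random
from itertools import combinations

def mom(x, K=2.24):
    """Modified one-step M-estimator, equation (3)."""
    xs = sorted(x)
    n = len(xs)
    M = (xs[(n - 1) // 2] + xs[n // 2]) / 2
    d = sorted(abs(xi - M) for xi in xs)
    madn = ((d[(n - 1) // 2] + d[n // 2]) / 2) / 0.6745
    kept = [xi for xi in xs if abs(xi - M) / madn <= K]
    return sum(kept) / len(kept)

def method_m(groups, alpha_c, B=500, seed=0):
    """All pairwise comparisons: reject H0: h_j = h_k whenever
    min(p-hat*_jk, 1 - p-hat*_jk) < alpha_c, with p-hat*_jk estimated
    by the percentile bootstrap."""
    rng = random.Random(seed)
    # B bootstrap MOM estimates per group
    boots = [[mom([rng.choice(g) for _ in g]) for _ in range(B)]
             for g in groups]
    rejected = []
    for j, k in combinations(range(len(groups)), 2):
        p = sum(bj > bk for bj, bk in zip(boots[j], boots[k])) / B
        if min(p, 1 - p) < alpha_c:
            rejected.append((j, k))
    return rejected
```

With three groups, two identical and one far shifted, both pairs involving the shifted group are rejected while the identical pair typically is not.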

Although the emphasis here is on small sample sizes, it is noted that when sampling from standard normal distributions with J = 4 and all sample sizes equal to 100, FWE is approximately 0.06 with α = 0.05. Perhaps, with very large sample sizes, control over FWE becomes unsatisfactory, but this has not been established.

4 Design of the simulation study

The small-sample properties of method M were studied for both J = 2 and J = 4 with α = 0.05. The first set of simulations is based on the exponential and lognormal distributions. The lognormal is particularly important because it is known to cause problems when working with means (e.g. Westfall & Young, 1993). The second set of simulations generates observations from one of four g-and-h distributions: normal, symmetric with heavy tails, asymmetric with relatively light tails and asymmetric with a relatively heavy tail. For J = 2, B = 1000 was used, and for J = 4, results are based on B = 2000.

Simulations with J = 2 are based on sample sizes n = (11, 11), (11, 21). Unequal standard deviations were considered by generating observations from a specific distribution and multiplying the observations in the jth group by p_j. The two choices for the p_j values were σ = (p_1, p_2) = (1, 1) and (1, 6). When generating observations from a skewed distribution, observations were shifted so that h_j = 0 before multiplying by p_j, in which case the null hypothesis remains true. The proportion of rejections among 2000 replications was used to estimate the actual probability of a Type I error.

Next, simulations were run by generating observations from g-and-h distributions, which include normal distributions as a special case. If Z is a standard normal random variable, then an observation, X, from the g-and-h distribution is given by

X = [(e^{gZ} − 1)/g] e^{hZ²/2}

When g = 0, this last expression is taken to be

X = Z e^{hZ²/2}

The case g = h = 0 corresponds to a standard normal random variable. With g = 0, X has a symmetric distribution with increasingly heavier tails as h increases. As g increases from 0, the distribution becomes more skewed. Hoaglin (1985) gives a detailed description of the g-and-h distribution. To provide at least some indication of its properties, Table 1 lists the skewness (κ_1) and kurtosis (κ_2) for the four distributions considered here. The strategy is to perform simulations on symmetric distributions having relatively light or heavy tails, plus skewed distributions also having relatively light or heavy tails. The case g = h = 0.5 is included to see how each method performs under what would seem like extreme conditions. The idea


Table 1. Some properties of the g-and-h distribution

                 Theoretical          Estimated
 g     h       κ1        κ2        κ1         κ2
0.0   0.0     0.00      3.00      0.00       3.00
0.0   0.5     0.00       —        0.00    11 896.2
0.5   0.0     1.75      8.9       1.81       9.7
0.5   0.5      —         —      120.10    18 393.6

is that, if a method performs reasonably well under extreme conditions, this provides some assurance that it will perform well under conditions likely to be encountered in practice.

When h > 1/k and g > 0, E(X − μ)^k is not defined and the corresponding entry in Table 1 is left blank. A possible criticism of simulations performed on a computer is that observations are generated from a finite interval, so the moments are finite even when in theory they are not, in which case observations are not being generated from a distribution having the theoretical skewness and kurtosis values listed in Table 1. In fact, as h increases, there is an increasing difference between the theoretical and actual values for skewness and kurtosis. Accordingly, Table 1 also lists the estimated skewness (κ̂_1) and kurtosis (κ̂_2) based on 100 000 observations generated from the distribution.
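Observations from a g-and-h distribution can be generated as follows (Python rendering and function names mine; the transformation is the one given above):

```python
import math
import random

def g_and_h(g, h, z):
    """Transform a standard normal deviate z into a g-and-h deviate."""
    if g == 0:
        return z * math.exp(h * z * z / 2)        # symmetric case
    return ((math.exp(g * z) - 1) / g) * math.exp(h * z * z / 2)

def g_and_h_sample(g, h, n, seed=0):
    """Generate n observations from the g-and-h distribution."""
    rng = random.Random(seed)
    return [g_and_h(g, h, rng.gauss(0, 1)) for _ in range(n)]
```

Setting g = h = 0 reproduces the standard normal; increasing h thickens the tails and increasing g adds right skew.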

5 Results

5.1 J = 2

Table 2 reports the results for the exponential and lognormal distributions when testing at the 0.05 level. To add perspective, Table 2 also lists results when h is replaced by a 20% trimmed mean and the critical values are adjusted as described in Wilcox (in press). This is method P in Table 2. Method BT is the bootstrap-t method based on 20% trimmed means. Method Y is the method for trimmed means proposed by Yuen (1974).

Table 3 shows the results of the simulations when sampling from a g-and-h distribution, which includes normal distributions as a special case. Now the estimated Type I error probabilities for method M range between 0.021 and 0.051. Generally, method M is more conservative than methods P, BT and Y.

Table 2. Estimated Type I error probabilities for exponential and lognormal distributions, J = 2

                                          Method
Distribution     n          σ        P      BT     Y      M
exponential   (11, 11)   (1, 1)    0.059  0.047  0.037  0.030
              (11, 11)   (1, 6)    0.063  0.061  0.076  0.055
              (11, 21)   (1, 1)    0.051  0.061  0.050  0.030
              (11, 21)   (1, 6)    0.054  0.048  0.079  0.038
              (21, 11)   (1, 6)    0.060  0.051  0.065  0.061
lognormal     (11, 11)   (1, 1)    0.050  0.052  0.031  0.042
              (11, 11)   (1, 6)    0.051  0.070  0.079  0.056
              (11, 21)   (1, 1)    0.051  0.067  0.042  0.052
              (11, 21)   (1, 6)    0.054  0.059  0.083  0.054
              (21, 11)   (1, 6)    0.059  0.055  0.067  0.069


Table 3. Estimated Type I error probabilities for g-and-h distributions, J = 2

                                        Method
 g     h      n          σ        P      BT     Y      M
0.0   0.0   (11, 11)   (1, 1)   0.059  0.045  0.050  0.033
            (11, 11)   (1, 6)   0.062  0.048  0.059  0.045
            (11, 21)   (1, 1)   0.050  0.056  0.057  0.034
            (11, 21)   (1, 6)   0.054  0.051  0.059  0.042
            (21, 11)   (1, 6)   0.061  0.053  0.055  0.044
0.0   0.5   (11, 11)   (1, 1)   0.039  0.058  0.038  0.021
            (11, 11)   (1, 6)   0.053  0.053  0.040  0.033
            (11, 21)   (1, 1)   0.039  0.068  0.042  0.027
            (11, 21)   (1, 6)   0.055  0.066  0.042  0.021
            (21, 11)   (1, 6)   0.055  0.062  0.043  0.037
0.5   0.0   (11, 11)   (1, 1)   0.058  0.056  0.042  0.029
            (11, 11)   (1, 6)   0.062  0.052  0.063  0.051
            (11, 21)   (1, 1)   0.050  0.062  0.052  0.032
            (11, 21)   (1, 6)   0.053  0.052  0.065  0.049
            (21, 11)   (1, 6)   0.058  0.054  0.059  0.044
0.5   0.5   (11, 11)   (1, 1)   0.038  0.056  0.034  0.025
            (11, 11)   (1, 6)   0.045  0.064  0.046  0.033
            (11, 21)   (1, 1)   0.041  0.068  0.039  0.025
            (11, 21)   (1, 6)   0.047  0.068  0.048  0.024
            (21, 11)   (1, 6)   0.053  0.065  0.049  0.039

The power of method M was found to be competitive with a variety of other techniques. For example, when sampling from standard normal distributions with μ_1 − μ_2 = 1, α = 0.05 and sample sizes of 25 for both groups, power is approximately 0.931 when using Welch's (1938) heteroscedastic method for means. Using method M instead, power is 0.86. If instead observations are sampled from the mixed normal distribution,

H(x) = 0.9Φ(x) + 0.1Φ(x/10)

power is 0.8 versus 0.28 using Welch's (1938) method. If method P with 20% trimmed means is used, power is 0.75. Generally, it seems that method M provides power about as high or higher than method P, but presumably exceptions occur because they are based on different measures of location. The main point is that there is little separating methods M, P and BT in terms of Type I error probabilities, but method M provides a more flexible approach regarding where and how much to trim, which can translate into higher efficiency versus rigidly using 20% trimming. (An illustration is given in the final section of the paper.)

5.2 Checks on α_C

Before continuing, it is noted that, as a partial check on the use of α_C, simulations were used to estimate α_C when sampling from standard normal distributions. Table 4 shows the resulting estimates for J = 3(1)8, labelled α̂_C and based on 10 000 replications, with all sample sizes equal to 11 and when sampling from standard normal distributions. As is evident, there is little separating α_C and its estimated value. Very similar results are obtained when all of the sample sizes are increased to 100.


Table 4. A comparison of α_C and α̂_C, FWE = 0.05

J      α_C       α̂_C
3     0.0160    0.0168
4     0.0075    0.0085
5     0.0056    0.0053
6     0.0041    0.0034
7     0.0029    0.0025
8     0.0024    0.0018

In Table 4, exact values for q_C have not been tabled for C > 32, so an approximation was used instead:

q_C = 2.383904 C^{1/10} − 0.202

which was derived from the exact quantiles corresponding to C = 10(1)28 in conjunction with the half-slope ratio method described by Velleman & Hoaglin (1981). (The absolute values of the residuals of this approximation do not exceed 0.01 for any C, C = 10(1)28.) So the results in Table 4 are not as compelling when J > 6, but they were included anyway to provide at least a crude check.
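The approximation above, combined with α_C = 2(1 − Φ(q_C)) from Section 3, can be sketched as follows. The paper uses exact Bechhofer–Dunnett quantiles when they are available, so this is only a rough check; the Python rendering and the erf-based normal cdf are mine.

```python
import math

def q_c_approx(C):
    """Approximate 1 - alpha quantile (alpha = 0.05) of the C-variate
    Studentized maximum modulus distribution, infinite degrees of freedom."""
    return 2.383904 * C ** 0.1 - 0.202

def alpha_c(C):
    """alpha_C = 2 * (1 - Phi(q_C)), Phi the standard normal cdf."""
    q = q_c_approx(C)
    phi = 0.5 * (1.0 + math.erf(q / math.sqrt(2.0)))
    return 2.0 * (1.0 - phi)
```

As expected, α_C shrinks as the number of comparisons C grows, since each individual test must be performed at a more stringent level to hold FWE at 0.05.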

5.3 J = 4

Next, for J = 4, simulations were run with the same distributions used with J = 2 and with sample sizes (n_1, n_2, n_3, n_4) = (11, 11, 11, 11) and (10, 15, 20, 25). To study the effect of unequal variances, observations in the jth group were multiplied by p_j, where now (p_1, p_2, p_3, p_4) = (1, 1, 1, 1), (1, 1, 1, 5) and (5, 1, 1, 1). (Of course, for skewed distributions, distributions are first shifted so that h_j = 0 before multiplying by p_j. The value of h was determined based on one million observations generated from the appropriate distribution.)

Table 5 shows the results for exponential and lognormal distributions using B = 2000. These distributions are of particular interest because they are known to cause problems when comparing means. Note that, when using method Y, the estimated probability of a Type I error goes as high as 0.104. Method BT avoids Type I error probabilities that are well above the nominal level, but the estimate

Table 5. Estimated Type I error probabilities for lognormal and exponential distributions, J = 4

                                                       Method
Distribution     n                  σ             P      BT     Y      M
exponential   (11, 11, 11, 11)   (1, 1, 1, 1)   0.048  0.039  0.060  0.040
              (11, 11, 11, 11)   (1, 1, 1, 5)   0.049  0.039  0.096  0.056
              (10, 15, 20, 25)   (1, 1, 1, 1)   0.042  0.041  0.046  0.049
              (10, 15, 20, 25)   (1, 1, 1, 5)   0.039  0.040  0.061  0.053
              (10, 15, 20, 25)   (5, 1, 1, 1)   0.046  0.052  0.098  0.064
lognormal     (11, 11, 11, 11)   (1, 1, 1, 1)   0.048  0.015  0.030  0.042
              (11, 11, 11, 11)   (1, 1, 1, 5)   0.048  0.046  0.091  0.056
              (10, 15, 20, 25)   (1, 1, 1, 1)   0.040  0.030  0.033  0.052
              (10, 15, 20, 25)   (1, 1, 1, 5)   0.038  0.039  0.056  0.054
              (10, 15, 20, 25)   (5, 1, 1, 1)   0.042  0.063  0.104  0.069


Table 6. Estimated Type I error probabilities for g-and-h distributions, J = 4

                                                      Method
 g     h      n                  σ             P      BT     Y      M
0.0   0.0   (11, 11, 11, 11)   (1, 1, 1, 1)  0.051  0.039  0.060  0.041
                               (1, 1, 1, 5)  0.050  0.039  0.070  0.051
            (10, 15, 20, 25)   (1, 1, 1, 1)  0.053  0.041  0.059  0.055
                               (1, 1, 1, 5)  0.047  0.040  0.054  0.053
                               (5, 1, 1, 1)  0.053  0.047  0.069  0.060
0.0   0.5   (11, 11, 11, 11)   (1, 1, 1, 1)  0.045  0.021  0.060  0.028
                               (1, 1, 1, 5)  0.046  0.031  0.061  0.036
            (10, 15, 20, 25)   (1, 1, 1, 1)  0.043  0.032  0.059  0.030
                               (1, 1, 1, 5)  0.039  0.026  0.054  0.032
                               (5, 1, 1, 1)  0.046  0.033  0.069  0.039
0.5   0.0   (11, 11, 11, 11)   (1, 1, 1, 1)  0.052  0.029  0.047  0.038
                               (1, 1, 1, 5)  0.052  0.042  0.077  0.048
            (10, 15, 20, 25)   (1, 1, 1, 1)  0.053  0.039  0.052  0.042
                               (1, 1, 1, 5)  0.048  0.032  0.034  0.045
                               (5, 1, 1, 1)  0.053  0.049  0.078  0.054
0.5   0.5   (11, 11, 11, 11)   (1, 1, 1, 1)  0.043  0.020  0.028  0.023
                               (1, 1, 1, 5)  0.047  0.038  0.042  0.040
            (10, 15, 20, 25)   (1, 1, 1, 1)  0.044  0.039  0.031  0.037
                               (1, 1, 1, 5)  0.040  0.032  0.034  0.038
                               (5, 1, 1, 1)  0.047  0.053  0.047  0.030

drops as low as 0.015. The two highest estimates of the probability of a Type I error using method M were 0.069 and 0.064; otherwise, the estimates range between 0.040 and 0.056.

Table 6 reports simulation results when observations are generated from g-and-h distributions. The largest of the estimated Type I error probabilities when using method M is 0.060, and the lowest is 0.023. For method Y, the highest estimate is 0.078.

6 Concluding remarks

It might help to note that the proposed method has certain similarities to Gosset's original derivation of Student's T: assume normality and homoscedasticity, make adjustments to the critical values when the sample sizes are small, and hope that the method continues to perform well under heteroscedasticity and non-normality. As is well known, Student's T does not perform well in simulations, but the results in this paper indicate that method M does provide adequate control over FWE.

Despite the advantages of using the modified one-step M-estimator, if the goal is to have the Type I error probability as close as possible to the nominal level, method P appears to be better in general. However, all indications are that, typically, method M competes well. Moreover, its flexibility about how much and where to trim can make a practical difference. For example, in an unpublished study by E. Dana dealing with self-awareness, the following values were observed.

Group 1: 77 87 87 114 151 210 219 246 253 262 296

299 306 376 428 515 666 1310 2611

Group 2: 59 106 174 207 219 237 313 365 458 497 515


529 557 615 625 645 973 1065 3215

Using method M, the 0.95 confidence interval for the difference based on the modified one-step M-estimator is (−343.53, 4.01) versus (−391.31, 99.85) using method P, the point being that the ratio of the lengths of the confidence intervals is 0.71. For the first group, the modified one-step M-estimator trims the three largest values rather than the two largest values, as is done by the 20% trimmed mean. For both groups, none of the smaller values are trimmed by the modified one-step M-estimator.
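The trimming behaviour just described can be verified directly from the data (Python rendering and function name mine; the data are the Group 1 values listed above):

```python
def mom_with_flags(x, K=2.24):
    """Return the modified one-step M-estimate (equation (3)) together
    with the values discarded as outliers."""
    xs = sorted(x)
    n = len(xs)
    M = (xs[(n - 1) // 2] + xs[n // 2]) / 2
    d = sorted(abs(xi - M) for xi in xs)
    madn = ((d[(n - 1) // 2] + d[n // 2]) / 2) / 0.6745
    kept = [xi for xi in xs if abs(xi - M) / madn <= K]
    out = [xi for xi in xs if abs(xi - M) / madn > K]
    return sum(kept) / len(kept), out

group1 = [77, 87, 87, 114, 151, 210, 219, 246, 253, 262, 296,
          299, 306, 376, 428, 515, 666, 1310, 2611]
est, out = mom_with_flags(group1)
print(out)  # → [666, 1310, 2611]: only the three largest values are trimmed
```

For Group 1 the rule flags exactly the three largest observations and none of the smallest, matching the asymmetric trimming reported in the text.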

REFERENCES

Andrews, D. F., Bickel, P. J., Hampel, F. R., Huber, P. J., Rogers, W. H. & Tukey, J. W. (1972) Robust Estimates of Location: Survey and Advances (Princeton, NJ, Princeton University Press).

Bechhofer, R. E. & Dunnett, C. W. (1982) Multiple comparisons for orthogonal contrasts, Technometrics, 24, pp. 213–222.

Hall, P. (1986) On the bootstrap and confidence intervals, Annals of Statistics, 14, pp. 1431–1452.

Hoaglin, D. C. (1985) Summarizing shape numerically: the g-and-h distributions. In: D. Hoaglin, F. Mosteller & J. Tukey (eds) Exploring Data Tables, Trends, and Shapes, pp. 461–515 (New York, Wiley).

Huber, P. (1993) Projection pursuit and robustness. In: S. Morgenthaler, E. Ronchetti & W. Stahel (eds) New Directions in Statistical Data Analysis and Robustness (Boston, Birkhauser Verlag).

Liu, R. Y. & Singh, K. (1997) Notions of limiting P values based on data depth and bootstrap, Journal of the American Statistical Association, 92, pp. 266–277.

Rousseeuw, P. J. & van Zomeren, B. C. (1990) Unmasking multivariate outliers and leverage points (with discussion), Journal of the American Statistical Association, 85, pp. 633–639.

Staudte, R. G. & Sheather, S. J. (1990) Robust Estimation and Testing (New York, Wiley).

Tukey, J. W. (1960) A survey of sampling from contaminated normal distributions. In: I. Olkin et al. (eds) Contributions to Probability and Statistics (Stanford, CA, Stanford University Press).

Velleman, P. F. & Hoaglin, D. C. (1981) Applications, Basics, and Computing of Exploratory Data Analysis (Boston, MA, Duxbury).

Welch, B. L. (1938) The significance of the difference between two means when the population variances are unequal, Biometrika, 29, pp. 350–362.

Westfall, P. H. & Young, S. S. (1993) Resampling-Based Multiple Testing (New York, Wiley).

Wilcox, R. R. (1997) Introduction to Robust Estimation and Hypothesis Testing (San Diego, CA, Academic Press).

Wilcox, R. R. (in press) Pairwise comparisons of trimmed means for two or more groups, Psychometrika.

Yuen, K. K. (1974) The two-sample trimmed t for unequal population variances, Biometrika, 61, pp. 165–170.
