BIAS ROBUST ESTIMATION OF SCALE
by
R. Douglas Martin
Ruben H. Zamar
TECHNICAL REPORT No. 214
August 1991
Department of Statistics, GN-22
University of Washington
Seattle, Washington 98195 USA
Bias Robust Estimation of Scale
R.D. Martin
University of Washington
Ruben H. Zamar
University of British Columbia
August, 1991
This research was supported by the Office of Naval Research under contract N00014-88-K-0265, with partial support from the Natural Sciences and Engineering Research Council of Canada.
ABSTRACT

We consider the problem of robust estimation of scale when the distribution of the data belongs to a contamination neighborhood of a parametric location-scale family. We consider the class of M-estimates of scale and show that, under certain assumptions, these scale estimates converge to their asymptotic functionals uniformly with respect to the underlying distribution, and with respect to the M-estimate defining score function χ. We establish expressions for the maximum asymptotic bias of M-estimates of scale over the contamination neighborhood as a function of the fraction of contamination. Using these expressions we construct asymptotically min-max bias robust estimates of scale. In particular, we show that a scaled version of the Madm (median of absolute deviations about the median) is approximately min-max bias-robust within the class of Huber's proposal 2 joint estimates of location and scale. We also consider the larger class of M-estimates of scale with general location, and show that a scaled version of the Shorth (the shortest half of the data) is approximately min-max bias robust in this class. Finally, we present the results of a Monte Carlo study showing that the Shorth has attractive finite sample size mean squared error properties for contaminated Gaussian data.
The August 1991 Technical Report No. 214 form of this paper is a considerably extended version of the October, 1989 Technical Report 184.
KEY WORDS: Bias robustness, M-estimates, scale, location, Madm, Shorth
1. INTRODUCTION

The main theoretical approach to robustness has consisted of studying the asymptotic behavior of estimates when the underlying distribution of the data belongs to some neighborhood (e.g., ε-contamination or Levy neighborhood) of a parametric model. In this context one tries to obtain estimates which optimize some appealing criterion, e.g., minimize the maximum asymptotic variance over a given neighborhood. Huber (1964) is the earliest example of this approach, with focus on M-estimates of location.
The best known part of Huber (1964) is the result that a particular M-estimate
of location, namely the one with psi-function ψ(x) = min{c, max(x, -c)}, minimizes
the maximum asymptotic variance over symmetric e-contamination neighborhoods
of a Gaussian model. A considerably less well known part of Huber (1964) is that
concerned with asymptotic bias of location estimates for unrestricted asymmetric
e-contamination neighborhoods of a nominal Gaussian model: among all translation
equivariant estimates, the median minimizes the maximum asymptotic bias over
such neighborhoods. The relevance of this result seems considerable in view of the
needed realism of allowing asymmetric contamination.
Recently there has been a renewed interest in bias-robustness. In particular
Donoho and Liu (1988) have shown that minimum distance estimates have desirable bias robustness properties. Martin, Yohai and Zamar (1989) have obtained
asymptotically minimax bias regression estimates, and Martin and Zamar (1989)
have obtained minimax bias estimates of scale for positive random variables.
In this paper we obtain minimax bias robust estimates of scale for contamination
models with a nominal distribution which is symmetric about an unknown location
parameter. More precisely, we assume that
(A0) F₀ is a specified distribution function with an even and unimodal density f₀.

The distribution F of the independent and identically distributed observations X₁, ..., Xₙ belongs to the ε-contaminated family

F_ε = { F : F(x) = (1 - ε) F₀((x - μ₀)/s₀) + ε H(x), x ∈ R, H an arbitrary distribution }, ε ∈ (0, .5),   (1)

where μ₀ is an unknown location parameter and s₀ is the scale parameter to be estimated.
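As a concrete illustration (ours, not part of the report), data from the ε-contaminated family (1) can be simulated as follows; the contaminating distribution H, here N(10, 1), and all parameter values are arbitrary choices for the sketch.

```python
import numpy as np

def rcontam(n, eps, mu0=0.0, s0=1.0,
            h=lambda rng, k: rng.normal(10.0, 1.0, k), seed=0):
    """Draw n i.i.d. observations from the eps-contaminated family (1):
    each point comes from F0((x - mu0)/s0) with probability 1 - eps,
    and from an arbitrary H (illustratively N(10, 1)) with probability eps."""
    rng = np.random.default_rng(seed)
    x = mu0 + s0 * rng.normal(size=n)          # nominal Gaussian part
    mask = rng.random(n) < eps                 # contamination indicators
    x[mask] = h(rng, int(mask.sum()))          # replace by draws from H
    return x
```

Replacing `h` by a point mass at zero (inliers) or at a very large value (outliers) reproduces the two least favorable contaminations studied below.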
Our main goal is to derive scale estimates which minimize the maximum asymptotic bias over the neighborhood (1). The maximum asymptotic bias B_T(ε) of an estimate T, regarded as a function of ε, yields a curve: the maximum bias curve. This curve includes the gross-error sensitivity (GES) as its slope at ε = 0, and also the breakdown point ε*, the value of ε at which B_T(ε) goes to infinity. The two-number summary consisting of the GES and ε* provides considerable information, but one naturally would like to have the entire curve B_T(ε) if possible. Not only do such curves allow one to check the range of accuracy of the GES as a linear approximation, they may also lead to a different preference ordering of competing estimates than one might make on the basis of GES and ε* (e.g., see Section 5 and also Martin, Yohai and Zamar, 1989, who find min-max bias robust regression estimates with GES = ∞).

Figure 1 displays the maximum bias curves for three proposed robust estimates of scale: H95, a Huber proposal 2 estimate of scale, adjusted for 95% efficiency at the Gaussian (Huber, 1964); the median of absolute deviations about the median (Madm); and the "shortest half" of the data (Shorth). Observe that ε*_Shorth = ε*_Madm = .5, the largest possible value of ε*, while ε*_H95 = .17. The breakdown-point of a classical Gaussian maximum likelihood estimate is typically zero. The GES lines provide local linear approximations to the maximum bias, which are reasonable for not too large values of ε (just how large, the reader can judge; see the rule of thumb in Hampel et al., 1986).
In this paper we also show that, under certain regularity conditions, the finite sample value and the asymptotic value of robust M-scale estimates are uniformly close as F ranges over the family F_ε. Moreover, prior results in Martin and Zamar (1989) indicate that the bias is a significant component of the mean-squared error for rather small to moderate sample sizes, depending on the value of ε.
The remainder of the paper is organized as follows. Section 2 introduces the class of M-estimates of scale with general location. This class includes the well known Huber (Proposal 2) joint estimates of location and scale, as well as the class of S-estimates of scale, which are defined by analogy with the S-estimates of regression (Rousseeuw and Yohai, 1984). Section 2 also shows that, under certain regularity conditions, the finite sample value and the asymptotic value of these scale estimates are uniformly close. Section 3 introduces a generalized notion of asymptotic bias. Section 4 constructs min-max bias robust Huber (Proposal 2) estimates of scale, and shows that these estimates are well approximated by a scaled version of the median of absolute deviations about the median (Madm). Section 5 constructs min-max bias-robust S-estimates of scale, which are shown to be min-max in the larger class of M-estimates of scale with general location introduced in Section 2. Section 5 also shows that these estimates are reasonably well approximated by a scaled version of the Shorth. Section 6 briefly discusses the relation between bias-robust Huber estimates and bias-robust S-estimates. Sections 7 and 8 give some encouraging finite sample results. Finally, Section 9 closes with a brief discussion of the GES linear approximation to the maximum bias curve. Proofs of lemmas and theorems are given in Section 10.
Our results on the Shorth complement recent results of Rousseeuw and Leroy
(1988), who propose the Shorth as a robust scale estimate. They derive the influence function, the finite sample breakdown-point, and a correction factor to achieve approximate finite sample unbiasedness at the normal distribution. Another
interesting recent work on the Shorth is that of Grubel (1988), who establishes
asymptotic normality.
2. M-ESTIMATES OF SCALE WITH GENERAL LOCATION
Estimates of scale are conveniently viewed as translation invariant, scale equivariant functionals S(F) defined over a subset F of distribution functions, which is assumed to include all the empirical distribution functions Fₙ and the ε-contamination family (1). The scale estimate sₙ is then obtained by evaluating the functional S(F) at Fₙ: sₙ = S(Fₙ).
Suppose that
(A1) χ(x) is even, nondecreasing on [0, ∞), bounded, with at most a finite number of jumps, and lim_{x→∞} χ(x) = 1.

Let

b(χ) = E_{F₀} χ(X),

and for each t ∈ R let S(F, t) be the M-estimate of scale of X - t defined by

S(F, t) = sup { s : E_F χ[(X - t)/s] ≥ b(χ) }.   (2)

We also assume that

(A2) ε < b(χ) < 1 - ε, for a given ε ∈ (0, .5).

The sup in definition (2) is needed to insure existence and to handle possible discontinuities of F and χ. If χ (or F) is continuous, then S(F, t) solves the equation

E_F χ[(X - t)/S(F, t)] = b(χ).   (3)
Since the location parameter μ₀ in (1) is unknown, it must be estimated along with s₀. Let T(F) be a location and scale equivariant functional, that is,

T[F((x - t)/s)] = s T[F(x)] + t,  ∀ t ∈ R, ∀ s > 0.

The M-estimate of scale with general location is now defined as

S(F) = S[F, T(F)].
Some particular cases are:
Huber Proposal 2. In this case T(F) and S[F, T(F)] simultaneously satisfy (3), with t = T(F), and

E_F ψ[(X - T(F))/S(F, T(F))] = 0,   (4)

where

(A3) ψ(x) is odd, nondecreasing, bounded, with at most a finite number of jumps.
In particular the Madm is obtained when χ is the jump function

χ_a(x) = 0 if |x| ≤ a;  1 if |x| > a,   (5)

with a = F₀⁻¹(3/4), and ψ is the sign function

ψ(x) = -1 if x < 0;  0 if x = 0;  1 if x > 0.
S-estimates of scale. In this case the centering functional T_χ(F) is a minimizer of S(F, t), and

S(F) = min_t S(F, t) = S[F, T_χ(F)].   (7)

It is not difficult to see that S(F) satisfies (3) and (4) with ψ(x) = χ'(x). Since χ(x) is bounded, ψ(x) tends to zero as x tends to infinity; that is, ψ(x) is redescending. In particular, the Shorth is obtained when χ is given by (5). Observe that the Madm and the Shorth both have the same chi-function (5) but different centering functionals.
The following lemma shows that under mild assumptions the breakdown point of the functional S(F, t) is larger than ε. The proof is straightforward and therefore omitted.
LEMMA 1. Let K > 0 be given and suppose that (A0), (A1) and (A2) hold. Then there exist 0 < s₁ < s₂ < ∞ such that s₁ ≤ S(F, t) ≤ s₂ for all |t| ≤ K and F ∈ F_ε.
Theorem 1 below shows that, under some regularity conditions which include the continuity of χ, S(Fₙ) → S(F) a.s. [F] as n → ∞, uniformly over F_ε × C, where C is a certain class of χ-functions. Unfortunately, the case of χ-functions of the jump type given by (5) is not covered by Theorem 1. However, Theorem 6 and the Monte Carlo results presented in Sections 7 and 8 support the finite sample relevance of the asymptotic minimax-bias theory for this important special case.
The following definitions are needed for stating Theorem 1. Let

g_χ(s, t) = E_{F₀} χ[(X - t)/s],  h_χ(s, t) = (∂/∂s) g_χ(s, t).   (8)
THEOREM 1. Suppose that (A0)-(A2) hold, that χ and h_χ(s, t) are continuous, and that h_χ(s, t) < 0 for all s > 0, t ∈ R. Then, for all δ > 0 and all K > 0:

(a) lim_{m→∞} sup_{F∈F_ε} P_F { sup_{n≥m} sup_{|t|≤K} |S(Fₙ, t) - S(F, t)| > δ } = 0.

(b) lim_{m→∞} sup_{F∈F_ε} P_F { sup_{n≥m} |S(Fₙ) - S(F)| > δ } = 0, where S(F) = inf_{t∈R} S(F, t).

(c) lim_{m→∞} sup_{F∈F_ε} P_F { sup_{n≥m} |S[Fₙ, T(Fₙ)] - S[F, T(F)]| > δ } = 0.

(d) Let x₀ > 0 and define the class C as the set of χ-functions satisfying all the previous assumptions and such that (i) χ(x) = 1 ∀ |x| ≥ x₀ and (ii) there exists h₀(s, t) such that h_χ(s, t) ≤ h₀(s, t) < 0, ∀ s > 0, ∀ t ∈ R. Then (a), (b) and (c) hold uniformly on C.
Remark. Suppose that a certain function χ₀ satisfies (A1) and (A2) and is such that

h_{χ₀}(s, t) = -(1/s) E_{F₀} [ χ₀'((X - t)/s) ((X - t)/s) ] < 0, ∀ s > 0, t ∈ R, and χ₀(x) = 1, ∀ |x| > x₀.

Then the set

C = { χ : χ'(x) ≤ χ₀'(x), ∀ x > 0, with h_χ(s, t) = -(1/s) E_{F₀} [ χ'((X - t)/s) ((X - t)/s) ] }

satisfies the assumptions of Theorem 1.
3. GENERALIZED BIAS
Although the M-estimates of scale with general location introduced in Section 2 are Fisher consistent at the nominal distribution F₀, they are in general asymptotically biased for F ∈ F_ε. Furthermore, the "raw" asymptotic bias B_r[S(F)] = S(F) - s₀ can be of two distinct kinds: when F is an outliers generating distribution, the bias B_r[S(F)] is positive, and when F is an inliers generating distribution, the bias B_r[S(F)] is negative.

As in Martin and Zamar (1989), we consider generalized bias functions in which the relative weighting of positive and negative bias is independently chosen, by allowing the user to put positive and negative bias on different scales. Specifically, we define the generalized bias

B[S(F)] = L₁[S(F)/s₀] if 0 < S(F) ≤ s₀;  L₂[S(F)/s₀] if s₀ < S(F) ≤ ∞,   (9)

where L₁ and L₂ are continuous and non-negative, with L₁ decreasing, L₁(1) = 0, L₁(0) = ∞, and L₂ increasing, L₂(1) = 0, L₂(∞) = ∞.
We are interested in the maximum generalized bias,

B(ε) = max_{F∈F_ε} B[S(F)].   (10)

From the monotonicity of L₁ and L₂, it follows that B(χ, T) = max { L₁[S⁻/s₀], L₂[S⁺/s₀] }, where S⁻ and S⁺ denote the infimum and the supremum of the functional S(F) as F ranges over F_ε.
4. BIAS ROBUST HUBER ESTIMATES
In view of the historical importance and high degree of familiarity of Huber
(Proposal 2) estimates we first focus on obtaining bias robust estimates in this
class. To emphasize the dependence on X and t/J we use the notation S(F, X, t/J),S+(X,t/J), S+(X,t/J), etc.
The first step toward finding the bias robust Huber estimate is deriving the expressions (16) and (17) for S⁻(χ, T_ψ) and S⁺(χ, T_ψ). Claims which are made below without proof can be easily verified under (A0)-(A3).

The maximum value S⁺(χ, ψ) of the scale functional S(F, χ, ψ) is produced by a point mass contamination at infinity, δ_∞, and such contamination also produces the maximum value of the location estimate T_ψ(F). The estimating equations in this limit case are

(1 - ε) E_{F₀} χ[(X - t)/s] + ε = b(χ)   (11)

and

(1 - ε) E_{F₀} ψ[(X - t)/s] + ε = 0.   (12)
Let γ_χ(t) be the unique solution (in s) of (11) for fixed t, and let T_ψ(s) be the unique solution (in t) of (12) for fixed s > 0. The function m_{χ,ψ}(t) = T_ψ[γ_χ(t)] is continuous and non-decreasing. Also, the pair (s*, t*) simultaneously satisfies (11) and (12) if and only if t* = m_{χ,ψ}(t*) and s* = γ_χ(t*). The following lemma characterizes the maximum asymptotic bias due to outliers of the location and scale estimates, and provides an algorithm for computing it.

LEMMA 2. Suppose that (A0)-(A3) hold. For each n ≥ 1 let tₙ = m_{χ,ψ}(t_{n-1}), with t₀ given by (13), and let t* = inf { t > t₀ : m_{χ,ψ}(t) = t }. Then (a) lim_{n→∞} tₙ = t* and s* = γ_χ(t*); and (b) the maximum asymptotic bias of the location estimate T(F, χ, ψ) is t*, and the maximum value of the scale functional S(F, χ, ψ) is S⁺(χ, ψ) = s*.
The minimum value of the scale functional S(F, χ, ψ), S⁻(χ, ψ), is produced by a point mass contamination δ₀ at zero. In this case the estimating equations are

(1 - ε) E_{F₀} χ[(X - t)/s] + ε χ(-t/s) = b(χ)   (14)

and

(1 - ε) E_{F₀} ψ[(X - t)/s] + ε ψ(-t/s) = 0.   (15)

By monotonicity of ψ, t = 0 for all s > 0. Let g_χ⁻¹(·, t) be the inverse of g_χ(s, t) with respect to s, for fixed t. Then, from (14) with t = 0, it follows that

S⁻(χ, ψ) = g_χ⁻¹( b(χ)/(1 - ε), 0 ).   (16)
Optimal Centering

The choice of ψ has an effect on the maximum asymptotic bias of the scale estimate by virtue of affecting the bias t* of the location estimate. Observe that since S⁻(χ, ψ) doesn't depend on ψ (see (16)), the optimal choice of ψ must be based on S⁺(χ, ψ) alone.

It follows from Lemma 2 and (11), with t = t*, that

S⁺(χ, ψ) = g_χ⁻¹( (b(χ) - ε)/(1 - ε), t* ).   (17)

Since for all 0 < α < 1 the function g_χ⁻¹(α, t) is non-decreasing in t, S⁺(χ, ψ) is minimized by minimizing the maximum location bias t*. Therefore, by the Huber (1964) result, the median centering is optimal:

THEOREM 2. Suppose that (A0)-(A2) hold and let ψ₀ be the sign function. Then S⁺(χ, ψ₀) ≤ S⁺(χ, ψ) for all s > 0 and all ψ satisfying (A3).
It is not difficult to see that Theorem 2 holds for the class of all M-estimates of scale with centering functional T(F) having the "monotonicity" property

T(F) ≤ T[(1 - ε)F₀ + ε δ_∞].   (18)
The Minimax-Bias Huber Estimate of Scale
By Theorem 2 it suffices to consider S⁺(χ, ψ₀) and S⁻(χ, ψ₀), and the function ψ can be dropped from the notation. It will be shown that under certain conditions the maximum generalized bias B(χ) (see (10)) is minimized by a jump function χ_{a*} (see (5)).

For each a > 0, let B(a) = B(χ_a) and

b(a) = b(χ_a) = 2[1 - F₀(a)].   (19)

We begin by showing that, given 0 < ε < .5, F₀, L₁ and L₂, there exists a* such that

B(a*) ≤ B(a),  ∀ a > 0.   (20)

Let a₀ = F₀⁻¹[(1 + ε)/2] and a₁ = F₀⁻¹[(2 - ε)/2]. From (19), the corresponding values of b are b₀ = b(a₀) = 1 - ε and b₁ = b(a₁) = ε. Hence, letting S⁻(a) = S⁻(χ_a) and S⁺(a) = S⁺(χ_a), we have
lim_{a→a₀} S⁻(a) = lim_{a→a₀} (1/a) F₀⁻¹[ 1 - b(a)/(2(1 - ε)) ] = (1/a₀) F₀⁻¹(.5) = 0   (21)

and

lim_{a→a₁} S⁺(a) ≥ lim_{a→a₁} g_{χ_a}⁻¹( (b(a) - ε)/(1 - ε), 0 ) = lim_{a→a₁} (1/a) F₀⁻¹[ 1 - (b(a) - ε)/(2(1 - ε)) ] = (1/a₁) F₀⁻¹(1) = ∞.   (22)
Since S⁻(a) → 0 as a → a₀ and S⁺(a) → +∞ as a → a₁, B(a) → +∞ as a → a₀ or a → a₁. Therefore there exists a*, with a₀ < a* < a₁, satisfying (20); that is, χ_{a*} is bias robust among all the jump functions χ_a. The next theorem shows that χ_{a*} is in fact bias robust among all Huber estimates.
THEOREM 3. Let s* be given by (23) and t₀ by (13). Suppose that, in addition to (A0)-(A1), the following conditions hold:
(1) f₀(x) > 0, ∀ x ∈ R, and f₀(sx)/f₀(x) is increasing in |x|, ∀ 0 < s < 1.
(2) The function k₀(x) = [f₀(s*x - t₀) + f₀(s*x + t₀)]/f₀(x) is decreasing in |x|.
(3) S⁻(a) and S⁺(a) are both strictly monotone at a = a*.
Then B(a*) ≤ B(χ, ψ) for all pairs (χ, ψ) satisfying (A1)-(A3).
It can be shown that the conditions of Theorem 3 hold, for example, when F₀ is the standard normal distribution and ε < .35 (see Martin and Zamar, 1987).
Near Optimality of the Madm
Let b* be the value of b(Xa*) EFoXa*(X). Since the bias robust estimate ofTheorem 3 is based on Xa*' using the median for centering, it follows that the biasrobust Huber estimate is the n - [nb*] order statistic of the absolute value of the
residuals about the median (scaled by l/a*), where ao(c) < a* < al(c). Since both,ao(c) and al(c) tend to Fo-
1(.75) as € -+ .5, so does a", Thus, as c -+ .5, the bias
robust Huber estimate is the well known Madm, whose breakdown-point is equal to
.5.It came as a pleasant surprise that for a broad range of c, the maximum bias of the
bias robust estimate is very close to the Madm for the leading case of the nominal
Gaussian distribution and the logarithmic loss function Lz(t) = -L1(t) = log(t).Table 1 shows the values of a", b* = b(a*), the minimax bias B(a*) and maximum
bias of the Madm values of L The value of a for the Madm is
.674. Therefore in case appreciable between the Madm
and the bias robust Note in particular that even when we choose e small,
e.g. c = .05, breakdown-point of estimate is close
to .5.
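For concreteness, the order-statistic description above can be coded directly. This is our sketch, with `a_star` and `b_star` supplied by the user (e.g., from Table 1); with a* = F₀⁻¹(3/4) ≈ 0.6745 and b* = .5 it reduces to the Madm.

```python
import numpy as np

def huber_biasrobust_scale(x, a_star, b_star):
    """Bias robust Huber estimate of scale: the (n - [n b*])-th order
    statistic of the absolute residuals about the median, scaled by 1/a*."""
    r = np.sort(np.abs(x - np.median(x)))
    n = len(r)
    k = n - int(n * b_star)       # index of the n - [n b*] order statistic
    return r[k - 1] / a_star
```

With Madm tuning the estimate is Gaussian-consistent, returning approximately 1 on large standard normal samples.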
5. BIAS ROBUST S-ESTIMATES
A natural question is whether further gains in bias robustness can be achieved when one searches for a minimax solution within a larger class of estimates. In
ε      a*    b* = b(a*)   B(a*)   B(Madm)
0.05   --    --           --      0.063
0.10   --    --           --      0.135
0.15   --    --           --      0.221
0.20   --    0.501        0.324   0.324
0.30   --    0.499        0.609   0.612
0.40   --    0.487        1.072   1.166
0.45   --    0.476        1.440   1.779

Table 1. Bias-robust Huber proposal 2 estimates of scale when F₀ = standard normal. Logarithmic loss function.
particular one may consider the entire class of M-estimates of scale with general location. This larger class of course includes joint M-estimates of location and scale with redescending as well as monotone ψ for the location estimate.
As a first step in dealing with this problem, we show that it suffices to restrict attention to the smaller class of S-estimates of scale.
The following notation is needed for stating Theorem 4. Let S⁺(χ) and S⁻(χ) denote the maximum and minimum asymptotic values of the S-estimate of scale based on χ (see (7)). Let S⁺(χ, T) and S⁻(χ, T) denote the maximum and minimum asymptotic values of the M-estimate of scale S_χ[F, T(F)] with general location, based on χ and the location estimate T(F).
THEOREM 4. Suppose that (A0)-(A2) hold and let γ_χ(s) = g_χ(s, 0), where g_χ(s, t) is given by (8). Let T be a location-scale equivariant estimate satisfying the condition

T[(1 - ε)F₀ + ε δ₀] = 0.

Then
(a) S⁺(χ) ≤ S⁺(χ, T), and
(b) S⁻(χ) = S⁻(χ, T),
so that the S-estimate based on χ has maximum generalized bias no larger than that of any M-estimate of scale with general location based on the same χ. Moreover, there exists a jump function χ_{a*} whose S-estimate has minimax asymptotic bias over the class of M-estimates of scale with general location.

Therefore, the minimax estimate is an S-estimate based on the jump function χ_{a*}.
Near Optimality of the Shorth
As in Section 4, a* -+ Fo-1(.75) as E -+ .5. Thus the minimax estimate of scale
with general location tends to the Shorth as E -+ .5. Table 2 shows the values
of a*,b* = b(a*), the bias B(a*) and the maximum bias B(Shorth) ofthe Shorth for some values of E. These results show that the minimax estimate isreasonably well approximated by the Shorth in terms of maximum bias, the approximation being less good for larger values of E. One again finds that a breakdown
point reasonably close to .5 is obtained by the minimax estimate for a wide rangeof values of E.
ε      a*      b(a*)   B(a*)   B(Shorth)
0.05   0.650   0.516   0.060   0.060
0.10   0.700   0.484   0.127   0.135
0.15   0.716   0.474   0.201   0.220
0.20   0.726   0.468   0.284   0.322
0.30   0.751   0.453   0.495   0.612
0.40   0.763   0.445   0.845   1.166
0.45   0.746   0.456   1.236   1.779

Table 2. Bias-robust M-estimates of scale with general location when F₀ = standard normal. Logarithmic loss function.
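The entries of Table 2 can be reproduced numerically. The sketch below is ours, not from the report; it uses the closed forms that follow from (16)-(17) for the jump function χ_a with F₀ = N(0,1): the worst inliers (point mass at zero) give S⁻(a) = Φ⁻¹(1 - b(a)/(2(1 - ε)))/a, the worst outliers (point mass at infinity) give S⁺(a) = Φ⁻¹(1 - (b(a) - ε)/(2(1 - ε)))/a, and the log-loss maximum bias max(-log S⁻, log S⁺) is minimized over a by a crude grid search.

```python
import math

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def Phi_inv(p):
    lo, hi = -10.0, 10.0
    for _ in range(100):                 # bisection inverse of Phi
        mid = 0.5 * (lo + hi)
        if Phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def max_bias(a, eps):
    """Log-loss maximum bias of the S-estimate of scale with jump
    chi-function chi_a, nominal F0 = N(0, 1)."""
    b = 2.0 * (1.0 - Phi(a))                                       # b(a), eq. (19)
    s_minus = Phi_inv(1.0 - b / (2.0 * (1.0 - eps))) / a           # inliers at 0
    s_plus = Phi_inv(1.0 - (b - eps) / (2.0 * (1.0 - eps))) / a    # outliers at infinity
    return max(-math.log(s_minus), math.log(s_plus))

def minimax_a(eps, grid=None):
    # grid search for the least favorable jump point a*
    grid = grid or [0.5 + 0.001 * i for i in range(500)]
    return min(grid, key=lambda a: max_bias(a, eps))
```

For ε = 0.20 this recovers a* ≈ 0.726 and B(a*) ≈ 0.284, in agreement with the corresponding row of Table 2.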
We would also remark that the location estimate based on the shortest half was studied by Andrews et al. (1972), who noted its slow rate of convergence.
6. MINIMAX HUBER VERSUS SHORTH ESTIMATES OF SCALE: MAXIMUM BIAS COMPARISON
The class of centering functionals considered in Section 4 excludes M-estimates of location with redescending psi-functions; such location estimates are of course allowed in the larger class considered in Section 5. We now show that the S-estimate of location T_χ(F) is in fact an M-estimate of location with redescending psi-function ψ(x) = χ'(x). Let t* = argmin_t S(F, t). The monotonicity of χ(x) on [0, ∞) and the definition of the S-estimate of scale S(F) (see (7)) imply that t* minimizes the function l(t) = E_F χ[(X - t)/S(F)] and therefore satisfies the equation l'(t*) = 0; that is, t* satisfies the location M-estimate equation

E_F χ'[(X - t*)/S(F)] = 0.
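For the jump chi-function the minimizer t* is the midpoint of the shortest half of the data, which can be computed directly; this is our sketch, not the authors' code.

```python
import numpy as np

def shorth_location(x):
    """Midpoint of the shortest half of the data: the minimizer t* of
    S(F_n, t) when chi is the jump function, i.e., the S-estimate of location."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = len(xs)
    h = n // 2 + 1                           # size of a "half" of the sample
    lengths = xs[h - 1:] - xs[: n - h + 1]   # lengths of all h-point windows
    i = int(np.argmin(lengths))
    return 0.5 * (xs[i] + xs[i + h - 1])
```

Because this centering ignores points outside the shortest half, a gross outlier cannot drag it away, which is the redescending behavior discussed above.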
The price paid for using a monotone psi-function for centering is a larger maximum bias due to outliers, with no compensating advantage in breakdown point. We now compare the maximum bias of the Shorth relative to the minimax estimates.
Figures 2a and 2b display the maximum bias curves of the minimax Huber and S-estimates of scale (for the case of logarithmic loss function) for outliers and inliers, respectively. The logarithmic bias curves for the Madm and the Shorth are also shown. Figure 2 shows uniformly smaller bias for the minimax S-estimate than for the minimax Huber estimate. Notice that in Figure 2a the maximum bias curve for the Shorth is uniformly smaller than that of the minimax S-estimate, whereas the opposite is true in Figure 2b. This is a consequence of the relative way in which the logarithmic loss function penalizes positive and negative bias. It is worth noticing that if one is primarily concerned with positive bias due to outliers, the Shorth is the best choice. The better performance of the S-estimates relative to the Huber estimates in the case of outliers is a consequence of the S-estimate of location being an M-estimate with redescending psi-function, which suffers no bias from gross outliers.
Table 3 reports Monte Carlo mean-squared-error relative efficiencies of the SHORTH and the MADM for sample sizes n = 20, n = 40 and n = 100.

Table 3. Mean-squared-error relative efficiencies of SHORTH and MADM.
7. FINITE SAMPLE RELEVANCE OF ASYMPTOTIC BIAS ROBUSTNESS
Unfortunately, the functions χ_a(·) are discontinuous, and so Theorem 1 cannot be invoked to claim finite sample relevance for the asymptotic minimax theory. However,
we can prove the following result, which is relevant to the finite sample size situation.
THEOREM 6. Let 0 < a < ∞. Then for each Δ > 0,

lim_{m→∞} inf_{F∈F_ε} P_F { S⁻(a) - Δ ≤ S_{χ_a}(Fₙ) ≤ S⁺(a) + Δ, ∀ n ≥ m } = 1.
The Monte Carlo results summarized in Figures 3 and 4 compare finite sample maximum bias estimates for the Shorth and for the Madm with the corresponding asymptotic maximum bias curves. Observe that for both estimates, and both for outliers and for inliers, the asymptotic maximum bias curves turn out to be rather close to the finite sample bias curves.
8. FINITE SAMPLE COMPARISON WITH OTHER ESTIMATES
A Monte Carlo simulation study was carried out to compare the bias and mean-squared-error (MSE) performance of the following scale estimates: the minimax bias scale estimates, the Madm, the Shorth, the A-estimate of scale discussed by Lax (1985), and the rejection-plus-standard-deviation estimate (with α = .01) discussed by Simonoff (1987). Some results for sample size n = 20 are presented in Figure 5 (MSE) and Figure 6 (bias), for the case F₀ = N(0, 1) and logarithmic loss. Each simulated sample contains exactly [20ε] outliers generated from the four different distributions indicated at the tops of the figures. Similar results (not presented here) were obtained for n = 40, n = 100 and for other types of contaminating distributions. The main conclusions are: (1) when ε < .10 the four estimates perform equally well; (2) for larger fractions of outliers the Shorth and the Madm usually outperform the other two estimates, with the Shorth being somewhat better; and (3) when the outliers are large and well separated from the rest of the data, e.g., generated from a N(10, 1), the rejection-plus-standard-deviation estimate performs better than the other three estimates.
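A stripped-down version of such a simulation is sketched below (ours, not the authors' code); the contaminating distribution and the Madm-type estimator in the usage lines are illustrative choices.

```python
import numpy as np

def mc_mse_log_loss(estimator, eps, contam, n=20, reps=500, seed=0):
    """Monte Carlo mean squared log-loss of a scale estimate when each
    sample of size n contains exactly [n*eps] outliers drawn from `contam`."""
    rng = np.random.default_rng(seed)
    k = int(n * eps)                     # exact number of outliers per sample
    losses = []
    for _ in range(reps):
        x = rng.normal(size=n)           # nominal N(0,1) part
        if k > 0:
            x[:k] = contam(rng, k)       # plant the outliers
        losses.append(np.log(estimator(x)) ** 2)
    return float(np.mean(losses))

# e.g., the Madm under N(10,1) contamination:
madm = lambda x: np.median(np.abs(x - np.median(x))) / 0.6745
mse = mc_mse_log_loss(madm, 0.10, lambda rng, k: rng.normal(10.0, 1.0, k))
```

Swapping in other estimators and contaminating distributions gives comparisons of the kind summarized in Figures 5 and 6.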
9. THE GES APPROXIMATION
Hampel (1974) introduced the gross-error-sensitivity (GES), based on the influence curve, as a local measure of bias robustness, and the corresponding linear approximation to the maximum bias curve is remarkably accurate in some cases. Our maximum bias curves allow one to check directly how far this approximation is reliable, and how much more bias-robustness can be gained by minimizing the entire curve.
Hampel et al. (1986) established that, based on the gross-error sensitivity, the Madm is the most bias robust M-estimate of scale for vanishingly small fractions of contamination ε.
The Shorth has the same gross-error-sensitivity as the Madm, namely 0.787 (see Leroy and Rousseeuw, 1988). However, this leaves unanswered the question of optimality for each ε ∈ (0, .5), and our results show that the Shorth is a better estimate than the Madm from the global (i.e., ε > 0) min-max bias point of view.
On the other hand, it appears that the accuracy of the GES approximation depends on the treatment of the nuisance parameters. For the Madm, the GES approximation to the maximal bias curve fails to reflect the impact of the bias of estimation of the nuisance location parameter. Since the maximum bias of the Shorth is unaffected by the asymptotic bias of the location estimate, the GES approximation is better in this case.
10. PROOFS OF LEMMAS AND THEOREMS
The following lemma is needed to prove Theorem 1.
LEMMA 3. Let 0 < s₁ < s₂ < ∞ be as in Lemma 1. Suppose that (A0)-(A2) hold and also assume that χ and h_χ(s, t) are continuous and h_χ(s, t) < 0, ∀ s > 0, t ∈ R. Then, for all K > 0, we have:
(a) χ[(x - t)/s] is uniformly continuous on (x, s, t) ∈ R × [s₁, s₂] × [-K, K].
(b) S(F, t) is uniformly continuous on t ∈ R, uniformly on F ∈ F_ε.
(c) χ[(x - t)/S(F, t)] is uniformly continuous on (x, t) ∈ R × [-K, K], uniformly on F ∈ F_ε.
Proof. Let δ > 0 be fixed and let B = [s₁, s₂] × [-K, K]. Since χ(x) is continuous, even, monotone on [0, ∞) and bounded, it is uniformly continuous. Let Δ₁ > 0 be such that |x - x'| < Δ₁ implies |χ(x) - χ(x')| < δ. Also, since lim_{|x|→∞} χ(x) = 1, there exists x₀ > 0 such that |x| > x₀ and |x'| > x₀ imply |χ(x) - χ(x')| < δ. Let x₁ > 0 be such that if |x| > x₁, then |(x - t)/s| > x₀ for all (s, t) ∈ B. So χ[(x - t)/s] is uniformly continuous on {x : |x| > x₁} × B. If |x| ≤ x₁ then

|(x - t)/s - (x - t')/s'| ≤ x₁ |1/s - 1/s'| + |t/s - t'/s'|,

and (a) follows. To prove (b) notice that the assumptions on h_χ(s, t) imply that there exists Δ > 0 such that |t - t'| < Δ, |t| ≤ K, |t'| ≤ K imply

| E_F χ[ (X - t')/(S(F, t) - δ) ] - E_F χ[ (X - t)/(S(F, t) - δ) ] | < δ₀/4,

where δ₀ = δ(1 - ε) min_{(s,t)∈B} |h_χ(s, t)| > 0. Hence

E_F χ[ (X - t')/(S(F, t) - δ) ] ≥ E_F χ[ (X - t)/(S(F, t) - δ) ] - δ₀/4 ≥ (1 - ε) E_{F₀} χ[ (X - t)/(S(F, t) - δ) ] - δ₀/4 ≥ b(χ) + δ(1 - ε) min_{(s,t)∈B} |h_χ(s, t)| - 3δ₀/4 ≥ b(χ) + δ₀/4 > b(χ).   (24)

Thus, S(F, t') ≥ S(F, t) - δ and (b) holds. Finally, (c) follows directly from (a) and (b). □
Proof of Theorem 1. Let δ > 0 be fixed. It can be shown, as in the proof of Lemma 3(b), that there exists 0 < δ₀ < 1 such that

E_F χ[ (X - t)/(S(F, t) - δ) ] - b(χ) ≥ δ₀,  ∀ |t| ≤ K, ∀ F ∈ F_ε.

For all γ ≥ 0, let Bₙ(t, γ) = { (1/n) Σᵢ χ[ (Xᵢ - t)/(S(F, t) - δ) ] - b(χ) > γ }. By Lemma 3(c) there exist -K = t₁ < t₂ < ... < t_m = K such that

∩_{|t|≤K} Bₙ(t, 0) ⊇ ∩_{j=1}^{m} Bₙ(t_j, δ₀/2),   (25)

for all n. By (25) and Bernstein's inequality, for each j = 1, ..., m and for all F ∈ F_ε,

P_F { Bₙᶜ(t_j, δ₀/2) } ≤ P_F { | (1/n) Σᵢ χ[ (Xᵢ - t_j)/(S(F, t_j) - δ) ] - E_F χ[ (X - t_j)/(S(F, t_j) - δ) ] | > δ₀/2 } ≤ e^{-nδ₀²/12}.   (26)

By (25) and (26),

P_F { inf_{|t|≤K} [S(Fₙ, t) - S(F, t)] > -δ } ≥ P_F { ∩_{|t|≤K} Bₙ(t, 0) } ≥ 1 - Σ_{j=1}^{m} P_F { Bₙᶜ(t_j, δ₀/2) } ≥ 1 - m e^{-nδ₀²/12},

for all F ∈ F_ε. Analogously, we can show that

P_F { sup_{|t|≤K} [S(Fₙ, t) - S(F, t)] < δ } ≥ 1 - m e^{-nδ₀²/12},

and since Σ_{n=1}^{∞} m e^{-nδ₀²/12} < ∞, (a) follows by the Borel-Cantelli lemma. To prove (b), first notice that, since b(χ) < 1 - ε, there exists K₁ > 0 such that for all |t| > K₁,

E_F χ[ (X - t)/S(F, 0) ] ≥ E_F χ[ (X - t)/s₂ ] ≥ (1 - ε) E_{F₀} χ[ (X - t)/s₂ ] > b(χ),

where s₂ is as in Lemma 1; indeed, by the Dominated Convergence Theorem, (1 - ε) E_{F₀} χ[ (X - t)/s₂ ] → 1 - ε > b(χ) as |t| → ∞. Hence S(F, t) > S(F, 0), ∀ |t| > K₁, ∀ F ∈ F_ε, and so S(F) = inf_{t∈R} S(F, t) = inf_{|t|≤K₁} S(F, t). On the other hand, let K₃ and δ₁ > 0 be such that (1 - ε) P_{F₀}(|X| ≤ K₃) > b(χ) + δ₁. Observe now that

lim_{K₂→∞} inf_{|t|>K₂} χ( (x - t)/(s₁ + δ) ) 1(|x| < K₃) = 1(|x| < K₃), ∀ x ∈ R,

where 1(|x| < K₃) = 1 if |x| < K₃ and equal to zero otherwise. Hence, by the Dominated Convergence Theorem, there exists K₂ ≥ K₁ such that

E_F { inf_{|t|>K₂} χ[ (X - t)/(S(F) + δ) ] } ≥ (1 - ε) E_{F₀} { inf_{|t|>K₂} χ[ (X - t)/(S(F) + δ) ] 1(|X| ≤ K₃) } > b(χ) + δ₁.   (27)

Let δ₂ = min{δ₀², δ₁²}/12. By (a), (27) and Bernstein's inequality,

P_F { |S(Fₙ) - S(F)| < δ } ≥ P_F { sup_{|t|≤K₂} |S(Fₙ, t) - S(F, t)| < δ } - P_F { S(Fₙ) ≠ inf_{|t|≤K₂} S(Fₙ, t) } ≥ 1 - e^{-nγ},  ∀ F ∈ F_ε,

for some γ > 0, and (b) follows. Since

P_F { |S[Fₙ, T(Fₙ)] - S[F, T(F)]| > δ } ≤ P_F { sup_{|t|≤2K} |S(Fₙ, t) - S(F, t)| > δ/2 } + P_F { |S[F, T(Fₙ)] - S[F, T(F)]| > δ/2 },

(c) follows from (a) and Lemma 3(c). Finally, (d) follows by noticing that, under the given assumptions, all the statements made in the proofs of (a), (b) and (c) hold uniformly for all χ in C. □
Proof of Lemma 2. Since the median minimizes the maximum asymptotic bias among location equivariant estimates (Huber, 1964), and since t₀ and t₁ are the maximum asymptotic biases of the median and of a location equivariant estimate, respectively, t₀ ≤ t₁. Thus, t₁ = m(t₀) ≤ m(t₁) = t₂ and, in general, tₙ ≤ t_{n+1}. Let t** = lim_{n→∞} tₙ. Since t** = lim_{n→∞} t_{n+1} = lim_{n→∞} m(tₙ) = m(lim_{n→∞} tₙ) = m(t**), we have t** ≥ t*. On the other hand, if t satisfies t = m(t) > t₀, then t = m(t) ≥ tₙ for all n. Therefore t ≥ t** and so t** ≤ t*. The second part of (a) follows directly from the continuity of γ_χ(t). To prove (b), observe that t* is a lower bound for the maximum bias of the Huber estimate of location. This lower bound is achieved if the estimate is computed by the recursion formula t_{n+1} = m(tₙ), starting from the median. □
For each b ∈ (ε, 1 - ε) let C_b be the class of χ-functions satisfying (A1) and b(χ) = b. Also let C be the class of functions satisfying (A1) and (A2), that is, C = ∪_{ε<b<1-ε} C_b. The following lemma is needed to prove Theorem 3.

LEMMA 4. Let χ ∈ C_b and let a = F₀⁻¹[1 - (b/2)]. Under the assumptions of Theorem 3, (a) S⁻(χ_a) ≥ S⁻(χ) for all χ ∈ C_b; and (b) g_{χ_a}(s*, t₀) ≤ g_χ(s*, t₀) for all χ ∈ C_b.

Proof of Lemma 4. Since χ and χ_a both belong to C_b, E_{F₀}[χ(X) - χ_a(X)] = 0, with χ(x) - χ_a(x) ≥ 0 for |x| ≤ a and χ(x) - χ_a(x) ≤ 0 for |x| > a. Part (a) then follows from (16) and condition (1) of Theorem 3. For part (b), a change of variables and the evenness of χ and f₀ give

g_χ(s*, t₀) - g_{χ_a}(s*, t₀) = (s*/2) ∫ [χ(x) - χ_a(x)] k₀(x) f₀(x) dx ≥ (s*/2) k₀(a) ∫ [χ(x) - χ_a(x)] f₀(x) dx = 0,

since k₀(x) is decreasing in |x| (condition (2) of Theorem 3), and (b) follows. □
Proof of Theorem 3. First of all notice that, since S⁺(a) and S⁻(a) are strictly monotone at a* and L₁ and L₂ are strictly monotone, we have L₁[S⁻(a*)] = L₂[S⁺(a*)] = B(a*). Let χ ∈ C be fixed and set b = b(χ). Let a = F₀⁻¹[1 - (b/2)] so that b(χ_a) = b. If g_χ(s*, t₀) ≥ (b - ε)/(1 - ε) then S⁺(χ) ≥ s*, so B(χ) ≥ L₂[S⁺(χ)] ≥ L₂(s*) = B(a*). On the other hand, suppose that g_χ(s*, t₀) < (b - ε)/(1 - ε), that is, S⁺(χ) < s*. Since χ ∈ C_b, by Lemma 4(b) we have g_{χ_a}(s*, t₀) ≤ g_χ(s*, t₀) < (b - ε)/(1 - ε). Hence S⁺(a) < s*, too. In view of the optimality of χ_{a*} among jump functions we have B(a) ≥ B(a*) and so L₁[S⁻(a)] ≥ L₁[S⁻(a*)]. For the particular b in question, by Lemma 4(a), S⁻(χ_a) ≥ S⁻(χ). Therefore, B(χ) ≥ L₁[S⁻(χ)] ≥ L₁[S⁻(a)] ≥ L₁[S⁻(a*)] = B(a*), and the theorem follows. □
Proof of Theorem 4. Let F_∞ = (1 - ε)F₀ + ε δ_∞, t_∞ = T(F_∞) and s_∞ = S_χ(F_∞, t_∞). First notice that

S⁺(χ, T) ≥ s_∞,   (28)

where S_χ(F, 0) denotes the S-scale functional based on χ and the true location 0. By definition of the S-estimate of scale, for all F ∈ F_ε, S_χ(F) = inf_t S_χ(F, t) ≤ inf_t S_χ(F_∞, t) = S_χ(F_∞, 0), and so S⁺(χ) ≤ S_χ(F_∞, 0). Assume first that s_∞ < ∞ and so t_∞ < ∞. By monotonicity of g_χ(s, t),

b = E_{F_∞} χ[(X - t_∞)/s_∞] = (1 - ε) g_χ(s_∞, t_∞) + ε ≥ (1 - ε) g_χ(s_∞, 0) + ε = E_{F_∞} χ(X/s_∞).

Therefore,

s_∞ ≥ S_χ(F_∞, 0) = γ_χ⁻¹[ (b - ε)/(1 - ε) ].   (29)

Observe that, if s_∞ = ∞, then (29) trivially holds. Now, (28) and (29) imply that S⁺(χ) ≤ γ_χ⁻¹[ (b - ε)/(1 - ε) ] ≤ S⁺(χ, T), and (a) follows by taking T(F) = argmin_t S_χ(F, t).
The proof of (b) is similar, using F₀* = (1 - ε)F₀ + ε δ₀ and the condition T[(1 - ε)F₀ + ε δ₀] = 0. The last statement of the theorem follows from part (a) of Theorem 4 and Theorem 2 of Martin and Zamar (1989). □
Proof of Theorem 6. Let 0 < a < ∞. It suffices to show that, for each Δ > 0,

lim_{m→∞} sup_{F∈F_ε} P_F { sup_{n≥m} S_{χ_a}(Fₙ) ≥ S⁺(a) + Δ } = 0   (30)

and

lim_{m→∞} sup_{F∈F_ε} P_F { inf_{n≥m} S_{χ_a}(Fₙ) ≤ S⁻(a) - Δ } = 0.   (31)

For each δ > 0 the approximating (even) function p_δ(x) is defined, for x ≥ 0, as

p_δ(x) = 0 if 0 ≤ x ≤ a - δ;  1 - (a - x)/δ if a - δ ≤ x ≤ a;  1 if x ≥ a.   (32)

Notice that p_δ(x) is continuous and that p_δ(x) ≥ χ_a(x) for all x. For each t ∈ R and all F, let

S_δ(F, t) = sup { s : E_F p_δ[(X - t)/s] ≥ b(χ_a) }.   (33)

Clearly, for all t and all F (including the empirical c.d.f. Fₙ) we have S_{χ_a}(F, t) ≤ S_δ(F, t) and so

S_{χ_a}(Fₙ) ≤ S_δ(Fₙ).   (34)

It is not difficult to verify that, for any given Δ > 0, there exists δ₀ > 0 such that

S_{δ₀}⁺(a) ≤ S⁺(a) + Δ/2,   (35)

where S_{δ₀}⁺(a) denotes the maximum asymptotic value of S_{δ₀}(Fₙ) over F_ε. By Theorem 1, applied to the continuous function p_{δ₀},

lim_{m→∞} sup_{F∈F_ε} P_F { sup_{n≥m} S_{δ₀}(Fₙ) ≥ S_{δ₀}⁺(a) + Δ/2 } = 0,   (36)

and (30) follows from (33), (34), (35) and (36). In a similar way, (31) can be proved using the continuous function

p̃_δ(x) = 1 if 0 ≤ x ≤ a;  (a + δ - x)/δ if a ≤ x ≤ a + δ;  0 if x ≥ a + δ   (37)

and

S̃_δ(F, t) = inf { s : E_F p̃_δ[(X - t)/s] ≥ 1 - b(χ_a) }. □   (38)
ACKNOWLEDGEMENTS

We wish to thank Ricardo Fraiman for some useful comments with regard to
establishing uniform consistency of M-estimates of scale. We also thank the referees
for a number of very helpful comments and suggestions.
FIGURE 1. Maximum bias curves for H95 (Huber), MADM and SHORTH, together with the GES linear approximations GES(H95) and GES(MADM) = GES(SHORTH). Log scale; EPSILON from 0.0 to 0.5.
FIGURE 2. Maximum bias curves of the minimax Huber and S-estimates (SS) of scale, the MADM and the SHORTH, for the logarithmic loss function; EPSILON from 0.0 to 0.5. (a) Maximum bias due to outliers. (b) Maximum bias due to inliers.
FIGURE #3. Mean-squared-error curves for n=20, F₀=N(0,1) and logarithmic loss function. Panels: N(0,25), N(6,1), U(3.5,10) and N(10,1) contamination; estimates shown include the SHORTH, the A-est and rej[.01]; EPSILON from 0.0 to 0.4.
FIGURE #4. Bias curves for n=20, F₀=N(0,1) and logarithmic loss function. Panels: N(0,25), N(6,1), U(3.5,10) and N(10,1) contamination; EPSILON from 0.0 to 0.4.
FIGURE #5. Sample (n=20) and asymptotic maximum bias curves for MAD and SHORTH for F=N(0,1) and logarithmic loss. Panels: N(6,1), N(0,0.01), N(10,1) and point mass at 0 contamination; curves shown for MAD n=inf, SHORTH n=inf, MAD n=20 and SHORTH n=20; EPSILON from 0.0 to 0.4.
FIGURE #6. Sample (n=40) and asymptotic maximum bias curves for MAD and SHORTH for F=N(0,1) and logarithmic loss. Panels: N(6,1), N(0,0.01), N(10,1) and point mass at 0 contamination; curves shown for MAD n=inf, SHORTH n=inf, MAD n=40 and SHORTH n=40; EPSILON from 0.0 to 0.4.
REFERENCES
Andrews, D.F., Bickel, P.J., Hampel, F.R., Huber, P.J., Rogers, W.H., and Tukey, J.W. (1972). Robust estimates of location: Survey and advances. Princeton University Press, Princeton.
Donoho, D.L. and Liu, R.C. (1988). The automatic robustness of minimum distance functionals. Ann. Statist. 16, 552-586.
Grubel, R. (1988). The length of the Shorth.Ann. Statist. 16, 619-628.
Hampel, F.R. (1968). Contributions to the theory of robust estimation. Ph.D.thesis. University of California. Berkeley.
Hampel, F.R. (1971). A general qualitative definition of robustness. Ann. Math. Statist. 42, 1887-1896.
Hampel, F.R. (1974). The influence curve and its role in robust estimation. Jour. Amer. Statist. Assoc. 69, 383-393.
Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J., and Stahel, W.A. (1986). Robuststatistics: The approach based on influence functions. John Wiley. New York.
Huber, P.J. (1964). Robust estimation of a location parameter. Ann. Math. Statist. 35, 73-101.
Huber, P.J. (1981). Robust statistics. John Wiley. New York.
Lax, D.A. (1985). Robust estimators of scale: finite sample performance in long-tailed symmetric distributions. Jour. Amer. Statist. Assoc. 80, 736-741.
Martin, R.D. and Zamar, R.H. (1987). Minimax bias robust M-estimates of scale. Tech. Rep. No. 12, Department of Statistics, University of Washington, Seattle, WA.
Martin, R.D., Yohai, V.J. and Zamar, R.H. (1989). Min-max bias robust regression. Ann. Statist. 17, 1608-1630.
Martin, R.D. and Zamar, R.H. (1989). Asymptotically min-max bias robust M-estimates of scale for positive random variables. Jour. Amer. Statist. Assoc. 84, 494-501.
Rousseeuw, P.J. and Leroy, A.M. (1988). A robust scale estimator based on the shortest half. Statistica Neerlandica 42, 103-122.
Rousseeuw, P.J. and Yohai, V.J. (1984). Robust regression by means of S-estimators. In Robust and Nonlinear Time Series Analysis, Lecture Notes in Statistics 26, Springer, New York, 256-272.
Simonoff, J.S. (1987). Outlier detection and robust estimation of scale. Jour. of Statist. Comput. and Simulation 27, 79-92.
Yohai, V.J. and Zamar, R.H. (1988). High breakdown-point estimates of regression by means of the minimization of an efficient scale. Jour. Amer. Statist. Assoc. 83, 406-413.