BIAS ROBUST ESTIMATION OF SCALE
by
R. Douglas Martin
Ruben H. Zamar
TECHNICAL REPORT No. 214
August 1991
Department of Statistics, GN-22
University of Washington
Seattle, Washington 98195 USA
Bias Robust Estimation of Scale
R.D. Martin
University of Washington
Ruben H. Zamar
University of British Columbia
August, 1991
This research was supported by the Office of Naval Research under contract N00014-88-K-0265, with partial support from the Natural Sciences and Engineering Research Council of Canada.
ABSTRACT

We consider the problem of robust estimation of scale when the distribution of the data belongs to a contamination neighborhood of a parametric location-scale family. We consider the class of M-estimates of scale and show that, under certain assumptions, these scale estimates converge to their asymptotic functionals uniformly with respect to the underlying distribution, and with respect to the M-estimate defining score function χ. We establish expressions for the maximum asymptotic bias of M-estimates of scale over the contamination neighborhood as a function of the fraction of contamination. Using these expressions we construct asymptotically min-max bias robust estimates of scale. In particular, we show that a scaled version of the Madm (median of absolute deviations about the median) is approximately min-max bias-robust within the class of Huber's proposal 2 joint estimates of location and scale. We also consider the larger class of M-estimates of scale with general location, and show that a scaled version of the Shorth (the shortest half of the data) is approximately min-max bias robust in this class. Finally, we present the results of a Monte Carlo study showing that the Shorth has attractive finite sample size mean squared error properties for contaminated Gaussian data.
The August 1991 Technical Report No. 214 form of this paper is a considerably extended version of the October, 1989 Technical Report 184.
KEY WORDS: Bias robustness, M-estimates, scale, location, Madm, Shorth
1. INTRODUCTION

The main theoretical approach to robustness has consisted of studying the asymptotic behavior of estimates when the underlying distribution of the data belongs to some neighborhood (e.g., ε-contamination or Levy neighborhood) of a parametric model. In this context one tries to obtain estimates which optimize some appealing criterion, e.g., minimize the maximum asymptotic variance over a given neighborhood. Huber (1964) is the earliest example of this approach, with focus on M-estimates of location.
The best known part of Huber (1964) is the result that a particular M-estimate
of location, namely the one with psi-function ψ(x) = min{c, max(x, -c)}, minimizes
the maximum asymptotic variance over symmetric e-contamination neighborhoods
of a Gaussian model. A considerably less well known part of Huber (1964) is that
concerned with asymptotic bias of location estimates for unrestricted asymmetric
e-contamination neighborhoods of a nominal Gaussian model: among all translation
equivariant estimates, the median minimizes the maximum asymptotic bias over
such neighborhoods. The relevance of this result seems considerable in view of the
needed realism of allowing asymmetric contamination.
Recently there has been a renewed interest in bias-robustness. In particular
Donoho and Liu (1988) have shown that minimum distance estimates have desirable bias robustness properties. Martin, Yohai and Zamar (1989) have obtained
asymptotically minimax bias regression estimates, and Martin and Zamar (1989)
have obtained minimax bias estimates of scale for positive random variables.
In this paper we obtain minimax bias robust estimates of scale for contamination
models with a nominal distribution which is symmetric about an unknown location
parameter. More precisely, we assume that
(A0) F₀ is a specified distribution function with an even and unimodal density f₀.

The distribution F of the independent and identically distributed observations X₁, ..., Xₙ belongs to the ε-contaminated family

F_ε = { F : F(x) = (1 - ε) F₀((x - μ₀)/s₀) + ε H(x), x ∈ R, H an arbitrary distribution }, ε ∈ (0, .5),   (1)

where μ₀ is an unknown location parameter and s₀ is the scale parameter to be estimated.
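As a concrete illustration (ours, not part of the report), data from the ε-contaminated family (1) can be simulated as follows; the contaminating distribution H, here N(10, 1), and all parameter values are arbitrary choices for the sketch.

```python
import numpy as np

def rcontam(n, eps, mu0=0.0, s0=1.0,
            h=lambda rng, k: rng.normal(10.0, 1.0, k), seed=0):
    """Draw n i.i.d. observations from the eps-contaminated family (1):
    each point comes from F0((x - mu0)/s0) with probability 1 - eps,
    and from an arbitrary H (illustratively N(10, 1)) with probability eps."""
    rng = np.random.default_rng(seed)
    x = mu0 + s0 * rng.normal(size=n)          # nominal Gaussian part
    mask = rng.random(n) < eps                 # contamination indicators
    x[mask] = h(rng, int(mask.sum()))          # replace by draws from H
    return x
```

Replacing `h` by a point mass at zero (inliers) or at a very large value (outliers) reproduces the two least favorable contaminations studied below.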
Our main goal is to derive scale estimates which minimize the maximum asymptotic bias over the neighborhood (1). The maximum asymptotic bias B_T(ε) of an estimate T, regarded as a function of ε, yields a curve: the maximum bias curve. This curve includes the gross-error sensitivity (GES) as its slope at ε = 0, and also the breakdown point ε*, the value of ε at which B_T(ε) goes to infinity. The two-number summary consisting of the GES and ε* provides considerable information, but one naturally would like to have the entire curve B_T(ε) if possible. Not only do such curves allow one to check the range of accuracy of the GES as a linear approximation, they may also lead to a different preference ordering of competing estimates than one might make on the basis of GES and ε* (e.g., see Section 5 and also Martin, Yohai and Zamar, 1989, who find min-max bias robust regression estimates with GES = ∞).

Figure 1 displays the maximum bias curves for three proposed robust estimates of scale: H95, a Huber proposal 2 estimate of scale, adjusted for 95% efficiency at the Gaussian (Huber, 1964); the median of absolute deviations about the median (Madm); and the "shortest half" of the data (Shorth). Observe that ε*_Shorth = ε*_Madm = .5, the largest possible value of ε*, while ε*_H95 = .17. The breakdown-point of a classical Gaussian maximum likelihood estimate is typically zero. The GES lines provide local linear approximations to the maximum bias, which are reasonable for not too large values of ε (just how large, the reader can judge; see the rule of thumb in Hampel et al., 1986).
In this paper we also show that, under certain regularity conditions, the finite sample value and the asymptotic value of robust M-scale estimates are uniformly close as F ranges over the family F_ε. Moreover, prior results in Martin and Zamar (1989) indicate that the bias is a significant component of the mean-squared error for rather small to moderate sample sizes, depending on the value of ε.
The remainder of the paper is organized as follows. Section 2 introduces the class of M-estimates of scale with general location. This class includes the well known Huber (Proposal 2) joint estimates of location and scale, as well as the class of S-estimates of scale, which are defined by analogy with the S-estimates of regression (Rousseeuw and Yohai, 1984). Section 2 also shows that, under certain regularity conditions, the finite sample value and the asymptotic value of these scale estimates are uniformly close. Section 3 introduces a generalized notion of asymptotic bias. Section 4 constructs min-max bias robust Huber (Proposal 2) estimates of scale, and shows that these estimates are well approximated by a scaled version of the median of absolute deviations about the median (Madm). Section 5 constructs min-max bias-robust S-estimates of scale, which are shown to be min-max in the larger class of M-estimates of scale with general location introduced in Section 2. Section 5 also shows that these estimates are reasonably well approximated by a scaled version of the Shorth. Section 6 briefly discusses the relation between bias-robust Huber estimates and bias-robust S-estimates. Sections 7 and 8 give some encouraging finite sample results. Finally, Section 9 closes with a brief discussion of the GES linear approximation to the maximum bias curve. Proofs of lemmas and theorems are given in Section 10.
Our results on the Shorth complement recent results of Rousseeuw and Leroy
(1988), who propose the Shorth as a robust scale estimate. They derive the influence function, the finite sample breakdown-point, and a correction factor to achieve approximate finite sample unbiasedness at the normal distribution. Another
interesting recent work on the Shorth is that of Grubel (1988), who establishes
asymptotic normality.
2. M-ESTIMATES OF SCALE WITH GENERAL LOCATION
Estimates of scale are conveniently viewed as translation invariant, scale equivariant functionals S(F) defined over a subset F of distribution functions, which is assumed to include all the empirical distribution functions Fₙ and the ε-contamination family (1). The scale estimate sₙ is then obtained by evaluating the functional S(F) at Fₙ: sₙ = S(Fₙ).
Suppose that
(A1) χ(x) is even, nondecreasing on [0, ∞), bounded, with at most a finite number of jumps, and lim_{x→∞} χ(x) = 1.

Let

b(χ) = E_{F₀} χ(X),

and for each t ∈ R let S(F, t) be the M-estimate of scale of X - t defined by

S(F, t) = sup { s : E_F χ[(X - t)/s] ≥ b(χ) }.   (2)

We also assume that

(A2) ε < b(χ) < 1 - ε, for a given ε ∈ (0, .5).

The sup in definition (2) is needed to insure existence and to handle possible discontinuities of F and χ. If χ (or F) is continuous, then S(F, t) solves the equation

E_F χ[(X - t)/S(F, t)] = b(χ).   (3)
Since the location parameter μ₀ in (1) is unknown, it must be estimated along with s₀. Let T(F) be a location and scale equivariant functional, that is,

T[F((x - t)/s)] = s T[F(x)] + t,  ∀ t ∈ R, ∀ s > 0.

The M-estimate of scale with general location is now defined as

S(F) = S[F, T(F)].
Some particular cases are:
Huber Proposal 2. In this case T(F) and S[F, T(F)] simultaneously satisfy (3), with t = T(F), and

E_F ψ[(X - T(F))/S(F, T(F))] = 0,   (4)

where

(A3) ψ(x) is odd, nondecreasing, bounded, with at most a finite number of jumps.
In particular the Madm is obtained when χ is the jump function

χ_a(x) = 0 if |x| ≤ a;  1 if |x| > a,   (5)

with a = F₀⁻¹(3/4), and ψ is the sign function

ψ(x) = -1 if x < 0;  0 if x = 0;  1 if x > 0.
S-estimates of scale. In this case the centering functional T_χ(F) is a minimizer of S(F, t), and

S(F) = min_t S(F, t) = S[F, T_χ(F)].   (7)

It is not difficult to see that S(F) satisfies (3) and (4) with ψ(x) = χ'(x). Since χ(x) is bounded, ψ(x) tends to zero as x tends to infinity; that is, ψ(x) is redescending. In particular, the Shorth is obtained when χ is given by (5). Observe that the Madm and the Shorth both have the same chi-function (5) but different centering functionals.
The following lemma shows that under mild assumptions the breakdown point of the functional S(F, t) is larger than ε. The proof is straightforward and therefore omitted.
LEMMA 1. Let K > 0 be given and suppose that (A0), (A1) and (A2) hold. Then there exist 0 < s₁ < s₂ < ∞ such that s₁ ≤ S(F, t) ≤ s₂ for all |t| ≤ K and F ∈ F_ε.
Theorem 1 below shows that, under some regularity conditions which include the continuity of χ, S(Fₙ) → S(F) a.s. [F] as n → ∞, uniformly over F_ε × C, where C is a certain class of χ-functions. Unfortunately, the case of χ-functions of the jump type given by (5) is not covered by Theorem 1. However, Theorem 6 and the Monte Carlo results presented in Sections 7 and 8 support the finite sample relevance of the asymptotic minimax-bias theory for this important special case.
The following definitions are needed for stating Theorem 1. Let

g_χ(s, t) = E_{F₀} χ[(X - t)/s],  h_χ(s, t) = (∂/∂s) g_χ(s, t).   (8)
THEOREM 1. Suppose that (A0)-(A2) hold, that χ and h_χ(s, t) are continuous, and that h_χ(s, t) < 0 for all s > 0, t ∈ R. Then, for all δ > 0 and all K > 0:

(a) lim_{m→∞} sup_{F∈F_ε} P_F { sup_{n≥m} sup_{|t|≤K} |S(Fₙ, t) - S(F, t)| > δ } = 0.

(b) lim_{m→∞} sup_{F∈F_ε} P_F { sup_{n≥m} |S(Fₙ) - S(F)| > δ } = 0, where S(F) = inf_{t∈R} S(F, t).

(c) lim_{m→∞} sup_{F∈F_ε} P_F { sup_{n≥m} |S[Fₙ, T(Fₙ)] - S[F, T(F)]| > δ } = 0.

(d) Let x₀ > 0 and define the class C as the set of χ-functions satisfying all the previous assumptions and such that (i) χ(x) = 1 ∀ |x| ≥ x₀ and (ii) there exists h₀(s, t) such that h_χ(s, t) ≤ h₀(s, t) < 0, ∀ s > 0, ∀ t ∈ R. Then (a), (b) and (c) hold uniformly on C.
Remark. Suppose that a certain function χ₀ satisfies (A1) and (A2) and is such that

h_{χ₀}(s, t) = -(1/s) E_{F₀} [ χ₀'((X - t)/s) ((X - t)/s) ] < 0, ∀ s > 0, t ∈ R, and χ₀(x) = 1, ∀ |x| > x₀.

Then the set

C = { χ : χ'(x) ≤ χ₀'(x), ∀ x > 0, with h_χ(s, t) = -(1/s) E_{F₀} [ χ'((X - t)/s) ((X - t)/s) ] }

satisfies the assumptions of Theorem 1.
3. GENERALIZED BIAS
Although the M-estimates of scale with general location introduced in Section 2 are Fisher consistent at the nominal distribution F₀, they are in general asymptotically biased for F ∈ F_ε. Furthermore, the "raw" asymptotic bias B_r[S(F)] = S(F) - s₀ can be of two distinct kinds: when F is an outliers generating distribution, the bias B_r[S(F)] is positive, and when F is an inliers generating distribution, the bias B_r[S(F)] is negative.

As in Martin and Zamar (1989), we consider generalized bias functions in which the relative weighting of positive and negative bias is independently chosen, by allowing the user to put positive and negative bias on different scales. Specifically, we define the generalized bias

B[S(F)] = L₁[S(F)/s₀] if 0 < S(F) ≤ s₀;  L₂[S(F)/s₀] if s₀ < S(F) ≤ ∞,   (9)

where L₁ and L₂ are continuous and non-negative, with L₁ decreasing, L₁(1) = 0, L₁(0) = ∞, and L₂ increasing, L₂(1) = 0, L₂(∞) = ∞.
We are interested in the maximum generalized bias,

B(ε) = max_{F∈F_ε} B[S(F)].   (10)

From the monotonicity of L₁ and L₂, it follows that B(χ, T) = max { L₁[S⁻/s₀], L₂[S⁺/s₀] }, where S⁻ and S⁺ denote the infimum and the supremum of the functional S(F) as F ranges over F_ε.
4. BIAS ROBUST HUBER ESTIMATES
In view of the historical importance and high degree of familiarity of Huber
(Proposal 2) estimates we first focus on obtaining bias robust estimates in this
class. To emphasize the dependence on X and t/J we use the notation S(F, X, t/J),S+(X,t/J), S+(X,t/J), etc.
The first step toward finding the bias robust Huber estimate is deriving the expressions (16) and (17) for S⁻(χ, T_ψ) and S⁺(χ, T_ψ). Claims which are made below without proof can be easily verified under (A0)-(A3).

The maximum value S⁺(χ, ψ) of the scale functional S(F, χ, ψ) is produced by a point mass contamination at infinity, δ_∞, and such contamination also produces the maximum value of the location estimate T_ψ(F). The estimating equations in this limit case are

(1 - ε) E_{F₀} χ[(X - t)/s] + ε = b(χ)   (11)

and

(1 - ε) E_{F₀} ψ[(X - t)/s] + ε = 0.   (12)
Let γ_χ(t) be the unique solution (in s) of (11) for fixed t, and let T_ψ(s) be the unique solution (in t) of (12) for fixed s > 0. The function m_{χ,ψ}(t) = T_ψ[γ_χ(t)] is continuous and non-decreasing. Also, the pair (s*, t*) simultaneously satisfies (11) and (12) if and only if t* = m_{χ,ψ}(t*) and s* = γ_χ(t*). The following lemma characterizes the maximum asymptotic bias due to outliers of the location and scale estimates, and provides an algorithm for computing it.

LEMMA 2. Suppose that (A0)-(A3) hold. For each n ≥ 1 let tₙ = m_{χ,ψ}(t_{n-1}), with t₀ given by (13), and let t* = inf { t > t₀ : m_{χ,ψ}(t) = t }. Then (a) lim_{n→∞} tₙ = t* and s* = γ_χ(t*); and (b) the maximum asymptotic bias of the location estimate T(F, χ, ψ) is t*, and the maximum value of the scale functional S(F, χ, ψ) is S⁺(χ, ψ) = s*.
The minimum value of the scale functional S(F, χ, ψ), S⁻(χ, ψ), is produced by a point mass contamination δ₀ at zero. In this case the estimating equations are

(1 - ε) E_{F₀} χ[(X - t)/s] + ε χ(-t/s) = b(χ)   (14)

and

(1 - ε) E_{F₀} ψ[(X - t)/s] + ε ψ(-t/s) = 0.   (15)

By monotonicity of ψ, t = 0 for all s > 0. Let g_χ⁻¹(·, t) be the inverse of g_χ(s, t) with respect to s, for fixed t. Then, from (14) with t = 0, it follows that

S⁻(χ, ψ) = g_χ⁻¹( b(χ)/(1 - ε), 0 ).   (16)
Optimal Centering

The choice of ψ has an effect on the maximum asymptotic bias of the scale estimate by virtue of affecting the bias t* of the location estimate. Observe that since S⁻(χ, ψ) doesn't depend on ψ (see (16)), the optimal choice of ψ must be based on S⁺(χ, ψ) alone.

It follows from Lemma 2 and (11), with t = t*, that

S⁺(χ, ψ) = g_χ⁻¹( (b(χ) - ε)/(1 - ε), t* ).   (17)

Since for all 0 < α < 1 the function g_χ⁻¹(α, t) is non-decreasing in t, S⁺(χ, ψ) is minimized by minimizing the maximum location bias t*. Therefore, by the Huber (1964) result, the median centering is optimal:

THEOREM 2. Suppose that (A0)-(A2) hold and let ψ₀ be the sign function. Then S⁺(χ, ψ₀) ≤ S⁺(χ, ψ) for all s > 0 and all ψ satisfying (A3).
It is not difficult to see that Theorem 2 holds for the class of all M-estimates of scale with centering functional T(F) having the "monotonicity" property

T(F) ≤ T[(1 - ε)F₀ + ε δ_∞].   (18)
The Minimax-Bias Huber Estimate of Scale
By Theorem 2 it suffices to consider S⁺(χ, ψ₀) and S⁻(χ, ψ₀), and the function ψ can be dropped from the notation. It will be shown that under certain conditions the maximum generalized bias B(χ) (see (10)) is minimized by a jump function χ_{a*} (see (5)).

For each a > 0, let B(a) = B(χ_a) and

b(a) = b(χ_a) = 2[1 - F₀(a)].   (19)

We begin by showing that, given 0 < ε < .5, F₀, L₁ and L₂, there exists a* such that

B(a*) ≤ B(a),  ∀ a > 0.   (20)

Let a₀ = F₀⁻¹[(1 + ε)/2] and a₁ = F₀⁻¹[(2 - ε)/2]. From (19), the corresponding values of b are b₀ = b(a₀) = 1 - ε and b₁ = b(a₁) = ε. Hence, letting S⁻(a) = S⁻(χ_a) and S⁺(a) = S⁺(χ_a), we have
lim_{a→a₀} S⁻(a) = lim_{a→a₀} (1/a) F₀⁻¹[ 1 - b(a)/(2(1 - ε)) ] = (1/a₀) F₀⁻¹(.5) = 0   (21)

and

lim_{a→a₁} S⁺(a) ≥ lim_{a→a₁} g_{χ_a}⁻¹( (b(a) - ε)/(1 - ε), 0 ) = lim_{a→a₁} (1/a) F₀⁻¹[ 1 - (b(a) - ε)/(2(1 - ε)) ] = (1/a₁) F₀⁻¹(1) = ∞.   (22)
Since S⁻(a) → 0 as a → a₀ and S⁺(a) → +∞ as a → a₁, B(a) → +∞ as a → a₀ or a → a₁. Therefore there exists a*, with a₀ < a* < a₁, satisfying (20); that is, χ_{a*} is bias robust among all the jump functions χ_a. The next theorem shows that χ_{a*} is in fact bias robust among all Huber estimates.
THEOREM 3. Let s* be given by (23) and t₀ by (13). Suppose that, in addition to (A0)-(A1), the following conditions hold:
(1) f₀(x) > 0, ∀ x ∈ R, and f₀(sx)/f₀(x) is increasing in |x|, ∀ 0 < s < 1.
(2) The function k₀(x) = [f₀(s*x - t₀) + f₀(s*x + t₀)]/f₀(x) is decreasing in |x|.
(3) S⁻(a) and S⁺(a) are both strictly monotone at a = a*.
Then B(a*) ≤ B(χ, ψ) for all pairs (χ, ψ) satisfying (A1)-(A3).
It can be shown that the conditions of Theorem 3 hold, for example, when F₀ is the standard normal distribution and ε < .35 (see Martin and Zamar, 1987).
Near Optimality of the Madm
Let b* be the value of b(Xa*) EFoXa*(X). Since the bias robust estimate ofTheorem 3 is based on Xa*' using the median for centering, it follows that the biasrobust Huber estimate is the n - [nb*] order statistic of the absolute value of the
residuals about the median (scaled by l/a*), where ao(c) < a* < al(c). Since both,ao(c) and al(c) tend to Fo-
1(.75) as € -+ .5, so does a", Thus, as c -+ .5, the bias
robust Huber estimate is the well known Madm, whose breakdown-point is equal to
.5.It came as a pleasant surprise that for a broad range of c, the maximum bias of the
bias robust estimate is very close to the Madm for the leading case of the nominal
Gaussian distribution and the logarithmic loss function Lz(t) = -L1(t) = log(t).Table 1 shows the values of a", b* = b(a*), the minimax bias B(a*) and maximum
bias of the Madm values of L The value of a for the Madm is
.674. Therefore in case appreciable between the Madm
and the bias robust Note in particular that even when we choose e small,
e.g. c = .05, breakdown-point of estimate is close
to .5.
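For concreteness, the order-statistic description above can be coded directly. This is our sketch, with `a_star` and `b_star` supplied by the user (e.g., from Table 1); with a* = F₀⁻¹(3/4) ≈ 0.6745 and b* = .5 it reduces to the Madm.

```python
import numpy as np

def huber_biasrobust_scale(x, a_star, b_star):
    """Bias robust Huber estimate of scale: the (n - [n b*])-th order
    statistic of the absolute residuals about the median, scaled by 1/a*."""
    r = np.sort(np.abs(x - np.median(x)))
    n = len(r)
    k = n - int(n * b_star)       # index of the n - [n b*] order statistic
    return r[k - 1] / a_star
```

With Madm tuning the estimate is Gaussian-consistent, returning approximately 1 on large standard normal samples.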
5. BIAS ROBUST S-ESTIMATES
A natural question is whether further gains in bias robustness can be achieved when one searches for a minimax solution within a larger class of estimates. In
ε      a*    b* = b(a*)   B(a*)   B(Madm)
0.05   --    --           --      0.063
0.10   --    --           --      0.135
0.15   --    --           --      0.221
0.20   --    0.501        0.324   0.324
0.30   --    0.499        0.609   0.612
0.40   --    0.487        1.072   1.166
0.45   --    0.476        1.440   1.779

Table 1. Bias-robust Huber proposal 2 estimates of scale when F₀ = standard normal. Logarithmic loss function.
particular one may consider the entire class of M-estimates of scale with general location. This larger class of course includes joint M-estimates of location and scale with redescending as well as monotone ψ for the location estimate.
As a first step in dealing with this problem, we show that it suffices to restrict attention to the smaller class of S-estimates of scale.
The following notation is needed for stating Theorem 4. Let S⁺(χ) and S⁻(χ) denote the maximum and minimum asymptotic values of the S-estimate of scale based on χ (see (7)). Let S⁺(χ, T) and S⁻(χ, T) denote the maximum and minimum asymptotic values of the M-estimate of scale S_χ[F, T(F)] with general location, based on χ and the location estimate T(F).
THEOREM 4. Suppose that (A0)-(A2) hold and let γ_χ(s) = g_χ(s, 0), where g_χ(s, t) is given by (8). Let T be a location-scale equivariant estimate satisfying the condition

T[(1 - ε)F₀ + ε δ₀] = 0.

Then
(a) S⁺(χ) ≤ S⁺(χ, T), and
(b) S⁻(χ) = S⁻(χ, T),
so that the S-estimate based on χ has maximum generalized bias no larger than that of any M-estimate of scale with general location based on the same χ. Moreover, there exists a jump function χ_{a*} whose S-estimate has minimax asymptotic bias over the class of M-estimates of scale with general location.

Therefore, the minimax estimate is an S-estimate based on the jump function χ_{a*}.
Near Optimality of the Shorth
As in Section 4, a* -+ Fo-1(.75) as E -+ .5. Thus the minimax estimate of scale
with general location tends to the Shorth as E -+ .5. Table 2 shows the values
of a*,b* = b(a*), the bias B(a*) and the maximum bias B(Shorth) ofthe Shorth for some values of E. These results show that the minimax estimate isreasonably well approximated by the Shorth in terms of maximum bias, the approximation being less good for larger values of E. One again finds that a breakdown
point reasonably close to .5 is obtained by the minimax estimate for a wide rangeof values of E.
ε      a*      b(a*)   B(a*)   B(Shorth)
0.05   0.650   0.516   0.060   0.060
0.10   0.700   0.484   0.127   0.135
0.15   0.716   0.474   0.201   0.220
0.20   0.726   0.468   0.284   0.322
0.30   0.751   0.453   0.495   0.612
0.40   0.763   0.445   0.845   1.166
0.45   0.746   0.456   1.236   1.779

Table 2. Bias-robust M-estimates of scale with general location when F₀ = standard normal. Logarithmic loss function.
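The entries of Table 2 can be reproduced numerically. The sketch below is ours, not from the report; it uses the closed forms that follow from (16)-(17) for the jump function χ_a with F₀ = N(0,1): the worst inliers (point mass at zero) give S⁻(a) = Φ⁻¹(1 - b(a)/(2(1 - ε)))/a, the worst outliers (point mass at infinity) give S⁺(a) = Φ⁻¹(1 - (b(a) - ε)/(2(1 - ε)))/a, and the log-loss maximum bias max(-log S⁻, log S⁺) is minimized over a by a crude grid search.

```python
import math

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def Phi_inv(p):
    lo, hi = -10.0, 10.0
    for _ in range(100):                 # bisection inverse of Phi
        mid = 0.5 * (lo + hi)
        if Phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def max_bias(a, eps):
    """Log-loss maximum bias of the S-estimate of scale with jump
    chi-function chi_a, nominal F0 = N(0, 1)."""
    b = 2.0 * (1.0 - Phi(a))                                       # b(a), eq. (19)
    s_minus = Phi_inv(1.0 - b / (2.0 * (1.0 - eps))) / a           # inliers at 0
    s_plus = Phi_inv(1.0 - (b - eps) / (2.0 * (1.0 - eps))) / a    # outliers at infinity
    return max(-math.log(s_minus), math.log(s_plus))

def minimax_a(eps, grid=None):
    # grid search for the least favorable jump point a*
    grid = grid or [0.5 + 0.001 * i for i in range(500)]
    return min(grid, key=lambda a: max_bias(a, eps))
```

For ε = 0.20 this recovers a* ≈ 0.726 and B(a*) ≈ 0.284, in agreement with the corresponding row of Table 2.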
We would also remark that the location estimate based on the shortest half was studied by Andrews et al. (1972), who noted its slow rate of convergence.
6. MINIMAX HUBER VERSUS SHORTH ESTIMATES OF SCALE: MAXIMUM BIAS COMPARISON
The class of centering functionals considered in Section 4 excludes M-estimates of location with redescending psi-functions; such location estimates are of course allowed in the larger class considered in Section 5. We now show that the S-estimate of location T_χ(F) is in fact an M-estimate of location with redescending psi-function ψ(x) = χ'(x). Let t* = argmin_t S(F, t). The monotonicity of χ(x) on [0, ∞) and the definition of the S-estimate of scale S(F) (see (7)) imply that t* minimizes the function l(t) = E_F χ[(X - t)/S(F)] and therefore satisfies the equation l'(t*) = 0; that is, t* satisfies the location M-estimate equation

E_F χ'[(X - t*)/S(F)] = 0.
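For the jump chi-function the minimizer t* is the midpoint of the shortest half of the data, which can be computed directly; this is our sketch, not the authors' code.

```python
import numpy as np

def shorth_location(x):
    """Midpoint of the shortest half of the data: the minimizer t* of
    S(F_n, t) when chi is the jump function, i.e., the S-estimate of location."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = len(xs)
    h = n // 2 + 1                           # size of a "half" of the sample
    lengths = xs[h - 1:] - xs[: n - h + 1]   # lengths of all h-point windows
    i = int(np.argmin(lengths))
    return 0.5 * (xs[i] + xs[i + h - 1])
```

Because this centering ignores points outside the shortest half, a gross outlier cannot drag it away, which is the redescending behavior discussed above.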
The price paid for using a monotone psi-function for centering is a larger maximum bias due to outliers, with no compensating advantage in breakdown point. We now compare the maximum bias of the Shorth relative to the minimax estimates.
Figures 2a and 2b display the maximum bias curves of the minimax Huber and S-estimates of scale (for the case of logarithmic loss function) for outliers and inliers, respectively. The logarithmic bias curves for the Madm and the Shorth are also shown. Figure 2 shows uniformly smaller bias for the minimax S-estimate than for the minimax Huber estimate. Notice that in Figure 2a the maximum bias curve for the Shorth is uniformly smaller than that of the minimax S-estimate, whereas the opposite is true in Figure 2b. This is a consequence of the relative way in which the logarithmic loss function penalizes positive and negative bias. It is worth noticing that if one is primarily concerned with positive bias due to outliers, the Shorth is the best choice. The better performance of the S-estimates relative to the Huber estimates in the case of outliers is a consequence of the S-estimate of location being an M-estimate with redescending psi-function, which suffers no bias from gross outliers.
Table 3 reports Monte Carlo mean-squared-error relative efficiencies of the SHORTH and the MADM for sample sizes n = 20, n = 40 and n = 100.

Table 3. Mean-squared-error relative efficiencies of SHORTH and MADM.
7. FINITE SAMPLE RELEVANCE OF ASYMPTOTIC BIAS ROBUSTNESS
Unfortunately, the functions χ_a(·) are discontinuous, and so Theorem 1 cannot be invoked to claim finite sample relevance for the asymptotic minimax theory. However,
we can prove the following result, which is relevant to the finite sample size situation.
THEOREM 6. Let 0 < a < ∞. Then for each Δ > 0,

lim_{m→∞} inf_{F∈F_ε} P_F { S⁻(a) - Δ ≤ S_{χ_a}(Fₙ) ≤ S⁺(a) + Δ, ∀ n ≥ m } = 1.
The Monte Carlo results summarized in Figures 3 and 4 compare finite sample maximum bias estimates for the Shorth and for the Madm with the corresponding asymptotic maximum bias curves. Observe that for both estimates, and both for outliers and for inliers, the asymptotic maximum bias curves turn out to be rather close to the finite sample bias curves.
8. FINITE SAMPLE COMPARISON WITH OTHER ESTIMATES
A Monte Carlo simulation study was carried out to compare the bias and mean-squared-error (MSE) performance of the following scale estimates: the minimax bias scale estimates, the Madm, the Shorth, the A-estimate of scale discussed by Lax (1985), and the rejection-plus-standard-deviation estimate (with α = .01) discussed by Simonoff (1987). Some results for sample size n = 20 are presented in Figure 5 (MSE) and Figure 6 (bias), for the case F₀ = N(0, 1) and logarithmic loss. Each simulated sample contains exactly [20ε] outliers generated from the four different distributions indicated at the tops of the figures. Similar results (not presented here) were obtained for n = 40, n = 100 and for other types of contaminating distributions. The main conclusions are: (1) when ε < .10 the four estimates perform equally well; (2) for larger fractions of outliers the Shorth and the Madm usually outperform the other two estimates, with the Shorth being somewhat better; and (3) when the outliers are large and well separated from the rest of the data, e.g., generated from a N(10, 1), the rejection-plus-standard-deviation estimate performs better than the other three estimates.
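A stripped-down version of such a simulation is sketched below (ours, not the authors' code); the contaminating distribution and the Madm-type estimator in the usage lines are illustrative choices.

```python
import numpy as np

def mc_mse_log_loss(estimator, eps, contam, n=20, reps=500, seed=0):
    """Monte Carlo mean squared log-loss of a scale estimate when each
    sample of size n contains exactly [n*eps] outliers drawn from `contam`."""
    rng = np.random.default_rng(seed)
    k = int(n * eps)                     # exact number of outliers per sample
    losses = []
    for _ in range(reps):
        x = rng.normal(size=n)           # nominal N(0,1) part
        if k > 0:
            x[:k] = contam(rng, k)       # plant the outliers
        losses.append(np.log(estimator(x)) ** 2)
    return float(np.mean(losses))

# e.g., the Madm under N(10,1) contamination:
madm = lambda x: np.median(np.abs(x - np.median(x))) / 0.6745
mse = mc_mse_log_loss(madm, 0.10, lambda rng, k: rng.normal(10.0, 1.0, k))
```

Swapping in other estimators and contaminating distributions gives comparisons of the kind summarized in Figures 5 and 6.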
9. THE GES APPROXIMATION
Hampel (1974) introduced the gross-error-sensitivity (GES), based on the influence curve, as a local measure of bias robustness, and the corresponding linear approximation to the maximum bias curve is remarkably accurate in some cases. Our maximum bias curves allow one to check directly how far this approximation is reliable, and how much more bias-robustness can be gained by minimizing the entire curve.
Hampel et al. (1986) established that, based on the gross-error sensitivity, the Madm is the most bias robust M-estimate of scale for vanishingly small fractions of contamination ε.
The Shorth has the same gross-error-sensitivity as the Madm, namely 0.787 (see Leroy and Rousseeuw, 1988). However, this leaves unanswered the question of optimality for each ε ∈ (0, .5), and our results show that the Shorth is a better estimate than the Madm from the global (i.e., ε > 0) min-max bias point of view.
On the other hand, it appears that the accuracy of the GES approximation depends on the treatment of the nuisance parameters. For the Madm, the GES approximation to the maximal bias curve fails to reflect the impact of the bias of estimation of the nuisance location parameter. Since the maximum bias of the Shorth is unaffected by the asymptotic bias of the location estimate, the GES approximation is better in this case.
10. PROOFS OF LEMMAS AND THEOREMS
The following lemma is needed to prove Theorem 1.
LEMMA 3. Let 0 < s₁ < s₂ < ∞ be as in Lemma 1. Suppose that (A0)-(A2) hold and also assume that χ and h_χ(s, t) are continuous and h_χ(s, t) < 0, ∀ s > 0, t ∈ R. Then, for all K > 0, we have:
(a) χ[(x - t)/s] is uniformly continuous on (x, s, t) ∈ R × [s₁, s₂] × [-K, K].
(b) S(F, t) is uniformly continuous on t ∈ R, uniformly on F ∈ F_ε.
(c) χ[(x - t)/S(F, t)] is uniformly continuous on (x, t) ∈ R × [-K, K], uniformly on F ∈ F_ε.
Proof. Let δ > 0 be fixed and let B = [s₁, s₂] × [-K, K]. Since χ(x) is continuous, even, monotone on [0, ∞) and bounded, it is uniformly continuous. Let Δ₁ > 0 be such that |x - x'| < Δ₁ implies |χ(x) - χ(x')| < δ. Also, since lim_{|x|→∞} χ(x) = 1, there exists x₀ > 0 such that |x| > x₀ and |x'| > x₀ imply |χ(x) - χ(x')| < δ. Let x₁ > 0 be such that if |x| > x₁, then |(x - t)/s| > x₀ for all (s, t) ∈ B. So χ[(x - t)/s] is uniformly continuous on {x : |x| > x₁} × B. If |x| ≤ x₁ then

|(x - t)/s - (x - t')/s'| ≤ x₁ |1/s - 1/s'| + |t/s - t'/s'|,

and (a) follows. To prove (b) notice that the assumptions on h_χ(s, t) imply that there exists Δ > 0 such that |t - t'| < Δ, |t| ≤ K, |t'| ≤ K imply

| E_F χ[ (X - t')/(S(F, t) - δ) ] - E_F χ[ (X - t)/(S(F, t) - δ) ] | < δ₀/4,

where δ₀ = δ(1 - ε) min_{(s,t)∈B} |h_χ(s, t)| > 0. Hence

E_F χ[ (X - t')/(S(F, t) - δ) ] ≥ E_F χ[ (X - t)/(S(F, t) - δ) ] - δ₀/4 ≥ (1 - ε) E_{F₀} χ[ (X - t)/(S(F, t) - δ) ] - δ₀/4 ≥ b(χ) + δ(1 - ε) min_{(s,t)∈B} |h_χ(s, t)| - 3δ₀/4 ≥ b(χ) + δ₀/4 > b(χ).   (24)

Thus, S(F, t') ≥ S(F, t) - δ and (b) holds. Finally, (c) follows directly from (a) and (b). □
Proof of Theorem 1. Let δ > 0 be fixed. It can be shown, as in the proof of Lemma 3(b), that there exists 0 < δ₀ < 1 such that

E_F χ[ (X - t)/(S(F, t) - δ) ] - b(χ) ≥ δ₀,  ∀ |t| ≤ K, ∀ F ∈ F_ε.

For all γ ≥ 0, let Bₙ(t, γ) = { (1/n) Σᵢ χ[ (Xᵢ - t)/(S(F, t) - δ) ] - b(χ) > γ }. By Lemma 3(c) there exist -K = t₁ < t₂ < ... < t_m = K such that

∩_{|t|≤K} Bₙ(t, 0) ⊇ ∩_{j=1}^{m} Bₙ(t_j, δ₀/2),   (25)

for all n. By (25) and Bernstein's inequality, for each j = 1, ..., m and for all F ∈ F_ε,

P_F { Bₙᶜ(t_j, δ₀/2) } ≤ P_F { | (1/n) Σᵢ χ[ (Xᵢ - t_j)/(S(F, t_j) - δ) ] - E_F χ[ (X - t_j)/(S(F, t_j) - δ) ] | > δ₀/2 } ≤ e^{-nδ₀²/12}.   (26)

By (25) and (26),

P_F { inf_{|t|≤K} [S(Fₙ, t) - S(F, t)] > -δ } ≥ P_F { ∩_{|t|≤K} Bₙ(t, 0) } ≥ 1 - Σ_{j=1}^{m} P_F { Bₙᶜ(t_j, δ₀/2) } ≥ 1 - m e^{-nδ₀²/12},

for all F ∈ F_ε. Analogously, we can show that

P_F { sup_{|t|≤K} [S(Fₙ, t) - S(F, t)] < δ } ≥ 1 - m e^{-nδ₀²/12},

and since Σ_{n=1}^{∞} m e^{-nδ₀²/12} < ∞, (a) follows by the Borel-Cantelli lemma. To prove (b), first notice that, since b(χ) < 1 - ε, there exists K₁ > 0 such that for all |t| > K₁,

E_F χ[ (X - t)/S(F, 0) ] ≥ E_F χ[ (X - t)/s₂ ] ≥ (1 - ε) E_{F₀} χ[ (X - t)/s₂ ] > b(χ),

where s₂ is as in Lemma 1; indeed, by the Dominated Convergence Theorem, (1 - ε) E_{F₀} χ[ (X - t)/s₂ ] → 1 - ε > b(χ) as |t| → ∞. Hence S(F, t) > S(F, 0), ∀ |t| > K₁, ∀ F ∈ F_ε, and so S(F) = inf_{t∈R} S(F, t) = inf_{|t|≤K₁} S(F, t). On the other hand, let K₃ and δ₁ > 0 be such that (1 - ε) P_{F₀}(|X| ≤ K₃) > b(χ) + δ₁. Observe now that

lim_{K₂→∞} inf_{|t|>K₂} χ( (x - t)/(s₁ + δ) ) 1(|x| < K₃) = 1(|x| < K₃), ∀ x ∈ R,

where 1(|x| < K₃) = 1 if |x| < K₃ and equal to zero otherwise. Hence, by the Dominated Convergence Theorem, there exists K₂ ≥ K₁ such that

E_F { inf_{|t|>K₂} χ[ (X - t)/(S(F) + δ) ] } ≥ (1 - ε) E_{F₀} { inf_{|t|>K₂} χ[ (X - t)/(S(F) + δ) ] 1(|X| ≤ K₃) } > b(χ) + δ₁.   (27)

Let δ₂ = min{δ₀², δ₁²}/12. By (a), (27) and Bernstein's inequality,

P_F { |S(Fₙ) - S(F)| < δ } ≥ P_F { sup_{|t|≤K₂} |S(Fₙ, t) - S(F, t)| < δ } - P_F { S(Fₙ) ≠ inf_{|t|≤K₂} S(Fₙ, t) } ≥ 1 - e^{-nγ},  ∀ F ∈ F_ε,

for some γ > 0, and (b) follows. Since

P_F { |S[Fₙ, T(Fₙ)] - S[F, T(F)]| > δ } ≤ P_F { sup_{|t|≤2K} |S(Fₙ, t) - S(F, t)| > δ/2 } + P_F { |S[F, T(Fₙ)] - S[F, T(F)]| > δ/2 },

(c) follows from (a) and Lemma 3(c). Finally, (d) follows by noticing that, under the given assumptions, all the statements made in the proofs of (a), (b) and (c) hold uniformly for all χ in C. □
Proof of Lemma 2. Since the median minimizes the maximum asymptotic bias among location equivariant estimates (Huber, 1964), and since t₀ and t₁ are the maximum asymptotic biases of the median and of a location equivariant estimate, respectively, t₀ ≤ t₁. Thus, t₁ = m(t₀) ≤ m(t₁) = t₂ and, in general, tₙ ≤ t_{n+1}. Let t** = lim_{n→∞} tₙ. Since t** = lim_{n→∞} t_{n+1} = lim_{n→∞} m(tₙ) = m(lim_{n→∞} tₙ) = m(t**), we have t** ≥ t*. On the other hand, if t satisfies t = m(t) > t₀, then t = m(t) ≥ tₙ for all n. Therefore t ≥ t** and so t** ≤ t*. The second part of (a) follows directly from the continuity of γ_χ(t). To prove (b), observe that t* is a lower bound for the maximum bias of the Huber estimate of location. This lower bound is achieved if the estimate is computed by the recursion formula t_{n+1} = m(tₙ), starting from the median. □
For each b ∈ (ε, 1 - ε) let C_b be the class of χ-functions satisfying (A1) and b(χ) = b. Also let C be the class of functions satisfying (A1) and (A2), that is, C = ∪_{ε<b<1-ε} C_b. The following lemma is needed to prove Theorem 3.

LEMMA 4. Let χ ∈ C_b and let a = F₀⁻¹[1 - (b/2)]. Under the assumptions of Theorem 3, (a) S⁻(χ_a) ≥ S⁻(χ) for all χ ∈ C_b; and (b) g_{χ_a}(s*, t₀) ≤ g_χ(s*, t₀) for all χ ∈ C_b.

Proof of Lemma 4. Since χ and χ_a both belong to C_b, E_{F₀}[χ(X) - χ_a(X)] = 0, with χ(x) - χ_a(x) ≥ 0 for |x| ≤ a and χ(x) - χ_a(x) ≤ 0 for |x| > a. Part (a) then follows from (16) and condition (1) of Theorem 3. For part (b), a change of variables and the evenness of χ and f₀ give

g_χ(s*, t₀) - g_{χ_a}(s*, t₀) = (s*/2) ∫ [χ(x) - χ_a(x)] k₀(x) f₀(x) dx ≥ (s*/2) k₀(a) ∫ [χ(x) - χ_a(x)] f₀(x) dx = 0,

since k₀(x) is decreasing in |x| (condition (2) of Theorem 3), and (b) follows. □
Proof of Theorem 3. First of all notice that, since S⁺(a) and S⁻(a) are strictly monotone at a* and L₁ and L₂ are strictly monotone, we have L₁[S⁻(a*)] = L₂[S⁺(a*)] = B(a*). Let χ ∈ C be fixed and set b = b(χ). Let a = F₀⁻¹[1 - (b/2)] so that b(χ_a) = b. If g_χ(s*, t₀) ≥ (b - ε)/(1 - ε) then S⁺(χ) ≥ s*, so B(χ) ≥ L₂[S⁺(χ)] ≥ L₂(s*) = B(a*). On the other hand, suppose that g_χ(s*, t₀) < (b - ε)/(1 - ε), that is, S⁺(χ) < s*. Since χ ∈ C_b, by Lemma 4(b) we have g_{χ_a}(s*, t₀) ≤ g_χ(s*, t₀) < (b - ε)/(1 - ε). Hence S⁺(a) < s*, too. In view of the optimality of χ_{a*} among jump functions we have B(a) ≥ B(a*) and so L₁[S⁻(a)] ≥ L₁[S⁻(a*)]. For the particular b in question, by Lemma 4(a), S⁻(χ_a) ≥ S⁻(χ). Therefore, B(χ) ≥ L₁[S⁻(χ)] ≥ L₁[S⁻(a)] ≥ L₁[S⁻(a*)] = B(a*), and the theorem follows. □
Proof of Theorem 4. Let F_∞ = (1 - ε)F₀ + ε δ_∞, t_∞ = T(F_∞) and s_∞ = S_χ(F_∞, t_∞). First notice that

S⁺(χ, T) ≥ s_∞,   (28)

where S_χ(F, 0) denotes the S-scale functional based on χ and the true location 0. By definition of the S-estimate of scale, for all F ∈ F_ε, S_χ(F) = inf_t S_χ(F, t) ≤ inf_t S_χ(F_∞, t) = S_χ(F_∞, 0), and so S⁺(χ) ≤ S_χ(F_∞, 0). Assume first that s_∞ < ∞ and so t_∞ < ∞. By monotonicity of g_χ(s, t),

b = E_{F_∞} χ[(X - t_∞)/s_∞] = (1 - ε) g_χ(s_∞, t_∞) + ε ≥ (1 - ε) g_χ(s_∞, 0) + ε = E_{F_∞} χ(X/s_∞).

Therefore,

s_∞ ≥ S_χ(F_∞, 0) = γ_χ⁻¹[ (b - ε)/(1 - ε) ].   (29)

Observe that, if s_∞ = ∞, then (29) trivially holds. Now, (28) and (29) imply that S⁺(χ) ≤ γ_χ⁻¹[ (b - ε)/(1 - ε) ] ≤ S⁺(χ, T), and (a) follows by taking T(F) = argmin_t S_χ(F, t).
The proof of (b) is similar, using F₀* = (1 - ε)F₀ + ε δ₀ and the condition T[(1 - ε)F₀ + ε δ₀] = 0. The last statement of the theorem follows from part (a) of Theorem 4 and Theorem 2 of Martin and Zamar (1989). □
Proof of Theorem 6. Let 0 < a < ∞. It suffices to show that, for each Δ > 0,

lim_{m→∞} sup_{F∈F_ε} P_F { sup_{n≥m} S_{χ_a}(Fₙ) ≥ S⁺(a) + Δ } = 0   (30)

and

lim_{m→∞} sup_{F∈F_ε} P_F { inf_{n≥m} S_{χ_a}(Fₙ) ≤ S⁻(a) - Δ } = 0.   (31)

For each δ > 0 the approximating (even) function p_δ(x) is defined, for x ≥ 0, as

p_δ(x) = 0 if 0 ≤ x ≤ a - δ;  1 - (a - x)/δ if a - δ ≤ x ≤ a;  1 if x ≥ a.   (32)

Notice that p_δ(x) is continuous and that p_δ(x) ≥ χ_a(x) for all x. For each t ∈ R and all F, let

S_δ(F, t) = sup { s : E_F p_δ[(X - t)/s] ≥ b(χ_a) }.   (33)

Clearly, for all t and all F (including the empirical c.d.f. Fₙ) we have S_{χ_a}(F, t) ≤ S_δ(F, t) and so

S_{χ_a}(Fₙ) ≤ S_δ(Fₙ).   (34)

It is not difficult to verify that, for any given Δ > 0, there exists δ₀ > 0 such that

S_{δ₀}⁺(a) ≤ S⁺(a) + Δ/2,   (35)

where S_{δ₀}⁺(a) denotes the maximum asymptotic value of S_{δ₀}(Fₙ) over F_ε. By Theorem 1, applied to the continuous function p_{δ₀},

lim_{m→∞} sup_{F∈F_ε} P_F { sup_{n≥m} S_{δ₀}(Fₙ) ≥ S_{δ₀}⁺(a) + Δ/2 } = 0,   (36)

and (30) follows from (33), (34), (35) and (36). In a similar way, (31) can be proved using the continuous function

p̃_δ(x) = 1 if 0 ≤ x ≤ a;  (a + δ - x)/δ if a ≤ x ≤ a + δ;  0 if x ≥ a + δ   (37)

and

S̃_δ(F, t) = inf { s : E_F p̃_δ[(X - t)/s] ≥ 1 - b(χ_a) }. □   (38)
ACKNOWLEDGEMENTS

We wish to thank Ricardo Fraiman for some useful comments with regard to
establishing uniform consistency of M-estimates of scale. We also thank the referees
for a number of very helpful comments and suggestions.
FIGURE 1. Maximum bias curves for H95 (Huber), MADM and SHORTH, together with the GES linear approximations GES(H95) and GES(MADM) = GES(SHORTH). Log scale; EPSILON from 0.0 to 0.5.
FIGURE 2. Maximum bias curves of the minimax Huber and S-estimates (SS) of scale, the MADM and the SHORTH, for the logarithmic loss function; EPSILON from 0.0 to 0.5. (a) Maximum bias due to outliers. (b) Maximum bias due to inliers.
FIGURE #3. Mean-squared-error curves for n=20, F₀=N(0,1) and logarithmic loss function. Panels: N(0,25), N(6,1), U(3.5,10) and N(10,1) contamination; estimates shown include the SHORTH, the A-est and rej[.01]; EPSILON from 0.0 to 0.4.
FIGURE #4. Bias curves for n=20, F₀=N(0,1) and logarithmic loss function. Panels: N(0,25), N(6,1), U(3.5,10) and N(10,1) contamination; EPSILON from 0.0 to 0.4.
FIGURE #5. Sample (n=20) and asymptotic maximum bias curves for MAD and SHORTH for F=N(0,1) and logarithmic loss. Panels: N(6,1), N(0,0.01), N(10,1) and point mass at 0 contamination; curves shown for MAD n=inf, SHORTH n=inf, MAD n=20 and SHORTH n=20; EPSILON from 0.0 to 0.4.
FIGURE #6. Sample (n=40) and asymptotic maximum bias curves for MAD and SHORTH for F=N(0,1) and logarithmic loss. Panels: N(6,1), N(0,0.01), N(10,1) and point mass at 0 contamination; curves shown for MAD n=inf, SHORTH n=inf, MAD n=40 and SHORTH n=40; EPSILON from 0.0 to 0.4.
REFERENCES
Andrews, D.F., Bickel, P.J., Hampel, F.R., Huber, P.J., Rogers, W.H., and Tukey, J.W. (1972). Robust estimates of location: Survey and advances. Princeton University Press, Princeton.
Donoho, D.L. and Liu, R.C. (1988). The automatic robustness of minimum distance functionals. Ann. Statist. 16, 552-586.
Grubel, R. (1988). The length of the Shorth.Ann. Statist. 16, 619-628.
Hampel, F.R. (1968). Contributions to the theory of robust estimation. Ph.D.thesis. University of California. Berkeley.
Hampel, F.R. (1971). A general qualitative definition of robustness. Ann. Math. Statist. 42, 1887-1896.
Hampel, F.R. (1974). The influence curve and its role in robust estimation. Jour. Amer. Statist. Assoc. 69, 383-393.
Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J., and Stahel, W.A. (1986). Robuststatistics: The approach based on influence functions. John Wiley. New York.
Huber, P.J. (1964). Robust estimation of a location parameter. Ann. Math. Statist. 35, 73-101.
Huber, P.J. (1981). Robust statistics. John Wiley. New York.
Lax, D.A. (1985). Robust estimators of scale: finite sample performance in long-tailed symmetric distributions. Jour. Amer. Statist. Assoc. 80, 736-741.
Martin, R.D. and Zamar, R.H. (1987). Minimax bias robust M-estimates of scale. Tech. Rep. No. 12, Department of Statistics, University of Washington, Seattle, WA.
Martin, R.D., Yohai, V.J. and Zamar, R.H. (1989). Min-max bias robust regression. Ann. Statist. 17, 1608-1630.
Martin, R.D. and Zamar, R.H. (1989). Asymptotically min-max bias robust M-estimates of scale for positive random variables. Jour. Amer. Statist. Assoc. 84, 494-501.
Rousseeuw, P.J. and Leroy, A.M. (1988). A robust scale estimator based on the shortest half. Statistica Neerlandica 42, 103-122.
Rousseeuw, P.J. and Yohai, V.J. (1984). Robust regression by means of S-estimators. In Robust and Nonlinear Time Series Analysis, Lecture Notes in Statistics 26, Springer, New York, 256-272.
Simonoff, J.S. (1987). Outlier detection and robust estimation of scale. Jour. of Statist. Comput. and Simulation 27, 79-92.
Yohai, V.J. and Zamar, R.H. (1988). High breakdown-point estimates of regression by means of the minimization of an efficient scale. Jour. Amer. Statist. Assoc. 83, 406-413.