nonsmooth analysis and frechet …boos/library/mimeo.archive/isms__1575.pdfwith respect to the...
TRANSCRIPT
•
•
•
..NONSMOOTH ANALYSIS AND FRECHET DIFFERENTIABILITY
OF
M-FUNCTIONALS
by
Brenton R. ClarkeMurdoch University
Key Words: M-estimators, Robustness, Gross error sensitivity,Asymptotic expansions, Asymptotic normality, Weak continuity,Selection functional, Local uniqueness, Empirical distributionfunction.
Mathematics subject classification (Amer. Math. Soc.)primary 62E20secondary 62G35
Abbreviated Title: Relaxing Conditions for Frechet Differentiability.
•
•
-2-
1. INTRODUCTION
In a recent paper the author showed that contrary to popular
opinion, strict Fr'chet differentiability of the class of M-fUnctionals
is frequently possible. A necessary requirement for existence of the
Frechet derivative is that the defining psi function is uniformly
bounded, and this naturally excludes those nonrobust estimators
such as the maximum likelihood estimator in normal parametric models.
On the other hand, in that paper, smoothness assumptions were imposed
on the defining psi function which are not appropriate for many common
robust proposals in M-estimation theory, such as lIuber's(1964) minimax
solution and Hampel's(l974) three part redescender used in estimating
location. A host of robust solutions for more general parametric
families are obtained through Hampel' s(1968) lemma 5, and generalizations
of it (cL Hampel 1978), and these almost invariably are functions with
"sharp corners". Indeed the problem that is presented by failure of
psi functions to have continuous partial derivatives has been the
focus of papers by Huber(1967), Carroll(1978) with respect to proofs
of asymptotic normality. While Frechet differentiability of the M-functional
apriori gives asymptotic normality of the M-estimator, at least for
real valued observation spaces, it also gives a direct expansion
by which the degree of robustness can be directly m('asured through the
gross error sensitivity. The latter quantity is the supremum of the
absolute value of the influence curve of Hampel(l968,1974), and Huber(l977)
assuming existence of the Fr'chet derivative shows that the maximum
asymptotic bias in contaminated neighbourhoods of a parametric distribution
is proportional to the gross error sensitivity. Subsequently Frechet
differentiability of a statistical functional is an important tool
in the robust description of an estimator, and complements the definition
of a robust functional as one that is weakly continuous (cf. Hampel 1971).
-3-
In this paper the methods of nonsmooth analysis • described in the book
by F.H. Clarke(1983). are introduced to the theory of statistical expansions,
and are used here in the proofs of weak continuity and Fr~chet differentia
bility of M-functionals. Subsequently the conditions for Fr~chet
differentiability given in Clarke(1983) can be relaxed to include most
popular M-functionals.
The M-estimator is a solution of equations
IJJ(X.T) dF (x)n = o • (1.1)
where F is that distribution which attributes atomic mass lin to eachn
of n independent identically distributed observations xl •.... ,xn > having
common distribution Fe=G , the space of probability distributions
defined on some separable metrizeable observation space R. For the
applications in this paper it is only necessary to consider R = E.
the real line. The parameter Te.O. an open subset of Euc lidean •
r-space Er
, and F = {F : T EO} is a parametric family where theT
usual assumption is that F = Fe for some eeo. The function tP: RxO -+- Er
can be defined through minimization of some loss function. or obtained
by some other optimal criteria .. The theory of robustness makes use of
the M-functional T defined on G. so that more generally T[G] is a
solution of equations
Ill. IjJ(X,T) dG(x) = 0
if a solution exists, T[G] = 00 otherwise. Thus the estimator is
(1. 2)
given by the functional T evaluated at F • and its asymptotic propertiesn
follow from continuity and differentiability of T at F with respect to
suitable metrics defined on G. This approach to asymptotic theory for
statistics was first considered by Von Mises(1947) .
-4-
To avoid ambiguity. and also for good statistical practice. the
concept of a selection functional p was introduced by Clarke(1983).
in order to identify in the event of several solutions of the equations (1.2).
that root which is to be the estimator.
is defined by ~.p so that
That is. the M-functional
in6 I(~.G) p(G ••) = peG, T[~.p,G]),
where
W(x,') dG(x) = 0, • EM
,if a solution exists. Otherwise T[~.p,G] =~. The functional T
is then Frechet differentiable at F with respect to the pair (G,d*),
for suitable metrics d* on G. if T can be approximated by a linear
-e
,functional TF which is defined on the linear space spanned by the
differences G - H of members of G, so that
,I T[G] - T[F] - TF(G - F) I = o(d*(G,F)) (1. 3)
•
•
as d*((;,F) -+ 0, GE.G. Essentially the expansion for Frechet differen-
tiability is dependent on a local expansion of equations(I.2), and
a robust selection functional will automatically select the Frechet
differentiable root. whenever one exists. To the latter end one uses
an auxilliary functional p(G.T) = IT - el to prove existence of a unique
Frechet differentiable root in a local neighbourhood of the parameter e
when considering the derivative at Fe' Also it is sufficient to consider
the expansion (1.3) for T defined on G, and the usual mathematical
extension of the domain of T to the linear space of signed measures is
of little importance here .
The Frechet derivative may be considered strong in the sense
that existence of the Frechet derivative for statistical functionals
implies existence of the weaker Hadamard or compact derivatives of
Reeds(1976), Fernholz(1983), and the Gateaux derivative discussed by
Kallianpur(1963), a special case of which is the influence curve
IC(x.F.T) = .urn£-+0
-5-
T[(l-e:)F+e:o ] - T[F]x
; here 0 is the distribution attributing mass 1 to the point x.x
The G~teaux derivative is given by
J IC(x.F.T) d(G-F)(x).
which coincides with the Fr~chet derivative when the latter exists.
Unfortunately comments by Kallianpur(1963) which were in specific
relation to the maximum likelihood estimator (rnle) led other researchers
to believe the derivative too strong to obtain. Indeed ~wber(l98l)
states " Unfortunately the concept of Fr~chet differentiability appears
too strong: in too many cases, the Frechet derivative does not exist,
"and even if it does. the fact is difficult to establish.
In Clarke(l983) simple conditions for Frcchet differentiability of
M-functionals were given together with a counterexample to the comments
of Kallianpur.
Boos and Serfling(1980) introduce the related notion of a
quasi-differential which assumes the same expansion (1.3). but
restricts G=F and allows for small order errors in probabilityn
with respect to the Kolmogorov distance between F and F. Thisn
expansion does not offer the same properties of robust description
of the estimating functional, and even the mean functional satisfies
e-•
this stochastic form of differentiability. Beran(1977) also adopts
a differential approach using the Hellinger metric, though this appears
to be for more specific application.
A weaker set of conditions than conditions A of Clarke(1983)
are introduced in section 2. though for smooth psi functions conditions
A of that paper are easier to apply. Theorem 2.1 of this paper is,
necessary to show condition A4
,introduced here. hplds for the popular
nonsmooth psi functions. It can be considered as a variation
-e•
- 6 -
or a generalization of the Glivenko Cantelli result. Conditions A'
are used in sections 3 and 4 in the theorems that give existence of a
unique continuous and ~chet differentiable root of equations (1.2).
In particular the arguments for weak continuity follow when either of
L~vy or Prokhorov metrics are used. Important examples of application
are given in section 5, together with the conclusion.
2. A DISCUSSION OF DEFINITIONS AND CONDITIONS A'
Suppose f maps Er to itself and 0 is a point near which
f is Lipschitz. Denote nf to be the set of points at which f fails
to be differentiable, which by Radermachers theorem is known to be a
set of Lebesgue measure zero. Let Jf(T) be the usual r x r matrix
of partial derivatives whenever T E.nf
.
Definition 2.1: The gene~alized Jacobian of f at 6 ~ denoted by
(If(6) ~ is the convex hull of all r x r matY'ices Z obtained m;
the limit of a sequence of the fo~ Jf( T. )1
whc1>e T· +6 and1
The generalized .Jacobian (If(6) is said to be of maximal rank
provided every matrix in (If(6) is of maximal rank (i.e. nonsingular).
The following proposition is proved on page 71 of F.H. Clarke (1983).
Proposition 2.1: The gene~alized Jacobian (If(6) is upper semicontinuous~
which means, g1:ven (> a there exists a 0 > a such that for T 6 Uo(6) ,
the open ball of Y'adiuB 0 cente"l'ed at 6,
Here
(If(T) c af(e) + c Brxr
B is the unit ball of matrices for which B 6B impliesrxr rxr
II B II ~ 1 .
- 7 -
Remark 2.1: Without loss of generality we can assume "B"
least upper bound of IBYI where lyl ~ 1
to be the
Frequently several solutions of equations (1.1), (1.2) can exist
whereupon a robust selection of the functional root is obtained using the
idea of a selection functional p introduced in Clarke (1983). The
robust selection functional retains the continuity properties of the selected
root in small enough neighbourhoods n(E,F) c G of a distribution F,
which can be considered here to be defined by metrices d.. The M-functiona1
is then defined by ~ and p as T[~,p,.J. Typical choices for d*
include dk , dL ' dp the Ko1mogorov, L~vy and Prokhorov metrics respectively.
Condi tions A':
A'o
A'1
~(X.T) is an r x 1 vector function on R x e which
is continuous and bounded on R x D where Dee is some
nondegenerate compact interval containing 0 in its interior,
and R is some separable metrizab1e space
e-•
A' ~(X,T) is locally Lipschitz in T about 0 in the sense2
that for some constant ex
IWCX,T) - ljJ(x,O) I < I'r - 01
uniformly in xeR and for all T in a neighbourhood of 0
/\' Letting differentiation be with respect to the argument in3
parentheses ;lKF
('r) is of maximal rank at t = 0o
/\~ Given <') > 0 there exists an c > 0 such that for all Gsn(E ,FO)
SUPTED IKG(T) - KFe (T) I < <')
and
aKG(T) c aKF (T) + 0 Brxr uniformly in T'" D .e
Remark 2.2:
Remark 2.3:
A' = AQ - Q
For a function
- 8 -
satisfying A'1
it follows from remark
2.2 and theorem 6.1 in Clarke (1983) that given 0 > 0 there exists an
E > 0 such that for all Gen(E,Fe)
SUPTED IKG(T) - KF (T) I < 0 ,e
whenever n(E,Fe) is generated by metrics dk , dL, dp
This establishes the first part of condition A~.
Remark 2.4: If KF (T) is continuously differentiable in T at 8e
then A3 ~ A; , where condition A3 is that of Clarke (1983).
Conditions A' - A'o 3can be considered fairly straightforward, whereas
the condition A'4
is not so obvious. When R = E , the real line, it can
-ebe shown to be a consequence of the following theorem, a proof of which is
detailed in the appendix. It is sufficient here to establish the result
for the Kolmogorov distance dk .
Theorem 2.1 : Let A be a class of continuous functions defined on E
with the following properties: (1) A is uniformly bounded, that is,
there exists a constant H such that If(x) I ::; H < 00 foY' aU f.:::. A
and x E E ; and (2) A is equiaontinuous. Let Fo IS: G be (liven.
Then,
foY' every 0> 0 there is an <: > 0 such that dk(Fs,G) ::; E
implies
saPhA sUPx4liEu{+oo} IfIx
fdG = flxfdFel < 6 ,
wheY'e integration is peY'formed over the intervals Jx which can be
eitheY' open or closed of the foY'm (_oo,x) or (_oo,x]
(2.1)
Remark 2.5:
- 9 -
A similar proof yields the same result with dL replaced
In some instances Fr6chet differentiability with respect to
dk implies that with respect to dL , dp following (6.2) of Clarke (1983).
Consider W with continuous partial derivatives bar on a finite
set of points SeT) . From F.H. Clarke (1983 pp. 75-83) it follows that
3KG(T) = 3 JW(y,T)dG(y) c J 3W(y,T)dG(y) , (2.2)
from right hand side can be expanded to a finite summation
f.(y,T)dG(y) + I 3W(X,T)G{X}.J Xe$(T)
Here f. e A andJ
for j = I, ...m .
~~ (y,T) = fj(y,T) on the connected interval I j ,
Since W is Lipschitz in T and 3W(X,T) bounded,
theorem 2.1 implies condition A~ .
3. UNIQUENESS OF FUNCTIONAL SOLUTIONS TO EQUATIONS
For those psi functions which do not admit a unique root of
e-
the equations, at least a unique root of the equations in a local region
of the parameter space about e can be shown to exist for small enough
neighbourhoods of Fe' If conditions A' are with respect to L6vy
or Prokhorov neighbourhoods, existence of a weakly continuous root is shown,
for which the global argument of Clarke (1983) can be used to select it if
more than one root exists. When the Kolmogorov distance is used only
consistency is directly established.
The following propositions are established on pp.252-255 of F.H. Clarke
(1983), and obviate the condition of continuous derivatives in the argument
for the inverse function theorem.
- 10 -
Proposition 3.1:
Section 2 and
Suppose f satisfies properties described in
4A f $ infaf(O) IIM(S,f) II ,where the infimum is taken over a'll matrices M(O ,f) € df(O) ~ and foY'
some 0 > O,TEiUo(O) imp ties
2A f :::; infdf(T) IIM(T ,f) IIThen foY' arobitroaroy T1 • T2 e (fo(O) , the closure o[the baH Uo(O) ,
If(Td - f(T2)1 ~ 2Af
IT1-T21 .
Proposition 3.2: UndeY' the conditions of Prooposition 3.1
f(Uo(O)) contains U,\ o(f(O)) .f
Remark 3.1: For VE U,\ 0(f(8)) we can define f-1
(v) to be thef
unique T ~UA 0(0) such that f(T) = v and Proposition 3.1 impliesf
f- 1 is Lipschitz with Lipschitz constant 1/(2Af
)
-e LO!TVDa~: Lc t (Jondi tiona I\. ' ho tel foY' Borne tjJ ~ p 'l'hen theY'e is a
01 > a and on (1 > 0 such that foY' aU
M( T , G) E aKG (-r )
wiU satisfy IIM(T .G) II > 2,\ lJher>e ,\ 1"S defined to be a vatue foY' which
M(0.FS)E3KF (0) impUes IIM(S,Fo)ll> 41.. •o
Remark 3.2: [f KF
(T) is continuously differentiable in T theno
the choice of ,\ = 1/(4 IIM(G ,Fe) -111) satisies the criterion of Lemma 3.1.
whenever
is upper semicontinuous, choose
3Kp (T) c 3KF (0) + ,\B8 0 rxr
there exists an (1 > 0 such that
JKFe
such that
1\.'4
Proof of Lcnuna 3.1: Since
by Proposition 2.1 01 > 0
l e U01
(0) . By condition
Gen(El,Fe) implies
AB rxr uniformly in
- 11 -
Hence given M(T ,G) ES ClKG(T)
such that
for there exists M(a,Fa
) e: ClKF
(a)e
IIM(T,G) - M(a,Fe)11 < 2>. ,
whence by Proposition 3.1 IIM(T,G)II > 21.. •
It is now possible to state and prove the uniqueness argument of Theorem
3.1 of Clarke (1983) using weakened conditions A'. The result also
implies existence of a weakly continuous root for either L~vy or
Prokhorov neighbourhoods. As usual the following selection functional is
only used as an auxilliary device.
Theorem 3.1: Let p(G,T) = IT-81 and suppose aonditions A' hold.
Then given K > 0 thepe exists an c > 0 suah that G .n(~,F8)
e-
implies T[ ljJ,p ,GJ exists and is an element of U (8)K
Fw:other
for this ( there is a K* > 0 suah that
l(ljJ,G) n U *(e) = T[ljJ,p,GJ ,K
is of maximal Y'ank for For any null sequenae
of positive number-s le t, {Gk
} be an arbi trar>y sequenae for whiah
lim T[ljJ,p,GkJ = TrljJ,p,FeJ = 8 .k->=
- 12 -
Proof of Theorem 3.1 Since aKp (.) is upper semicontinuous in6
such that 'eU *(6) impliesK
infaKG
(.) IIM(T.G) II > 21.. for all G en(€l'F6)
where the infimum is taken over all matrices M(',G)E: aKG(.) Here
01 101, and A are defined in Lemma 3.1. Hence
41.. (G) = infaK (a) IIM(a,G) II > 21.. .G
Choose 0 < €* ~ €1 so that the following relations hold
aKG(T) c aKF (T) + (A/4)B8 rxr by A'it
c aKp (6) + (A/2)B6 rxr
by Proposition 2.1
c aKG
(6) + ASrxr by A'it
Then for every M(T ,G) e: aKG
(.) there exists an M(8 ,G) €' aKG(6) such that
IIM(.,G) - M(G,G) II < >. < 2A(G) ,
whenever Ge n(€:* ,F8) and uniformly in • e U *(8) .K
By Proposition 3.1 KG(.) is a one-to-one function from U *(6) onto
K
KG(UK
*(6)) and by Proposition 3.2 the image set contains the open ball
of radius AK*/2 about KG(6)
as in Clarke (1983).
The argument for uniqueness now proceeds
4. FHEClUiT DIFFI:RENTlA13ILITY
With this
KF (T) has at least6
T = 6 , which is denoted M(8)
It will be assumed in this section that
a continuous derivative KF (T) at6
This is common with absolutely continuous parametric families.
restriction Pr~chet differentiability follows.
- 13 -
Theorem 4.1: Let P(G,T) = IT-el and assume conditions A'
hold with respect to this functional and neighbourhoods generated by
the metrics d* on G. Suppose for al l G 6 G
J $(x,e)d(G-Fe)(x) = O(d*(G,Fe))
R
Then T[$,p,.] is Frechet differentiable at Fe with respect to
(G,d*) and has derivative
(4.1)
To prove the theorem it is necessary to introduce the following
generalization of the mean value result
in F.H. Clarke (1983)
described as Proposition 2.6.5
Proposition 4.1 Let f be Lipschitz on an open convex set U in
Er and let Tl and T2 be points in U. Then one has
(The ,'ight hand side above denotes f,he (~onvex hull of aU points of the
Since
ambigui ty.J
Proof of Theorem 4.1: Abbreviate T[$,p,.J = T[.] and let
K*,E be given by Theorem 2. Let {E k} be so that €k + 0+ as
k -+ 00 and let {Gk} be any sequence such that Gk 6. n(€k,Fa) By
theorem 2, TCGk ] exists and is unique in U * (e) for k > ko whereK
Eko
~ E By A' see that for arbitrary 6 > 0'+
aKG
(T) c aKF
(T) + 6B uniformly in TeDk a rxr
(4.2)
- 14 -
for sufficiently large k. Consider the two term expansion,
where I~k-0 I < IT[GkJ - 6I ' which tends to zero as k -+ QO by
(4.3)
theorem 3.1, and Tk is evaluated at different points for each
component function expansion obtained as a consequence of Proposition 4.1
(i. e. ~k takes different values in each row of matrix M). See from
(4.3), (4.1) and Lemma 3.1 that
Also,
T[GkJ - 0 = -M(O)-l KG (0) - M(O)-l{M(~k,Gk)-M(O)}(T[GkJ - 0) .k
By upper semicontinuity of KC;(T) in T and (4.2)
IIM(;k,Gk) - M(6)" = 0(1) .
So
5. EXAMPLES AND CONCLUSION
Huber (1964, 1981) introduced a proposal for estimation of location
and scale of the normal distribution defined as a solution of
where e = {(T1,T2): - QO < T1 < QO , T2 > o} and the vector function
IjJ = (ljJl,~'2)' where
IjJl(X) = max L-k , min(k,x)J
and ~(k) = Imin(k2,x2)d~(x) Here cl> denotes the normal distribution.
setting ] = (61,02) , where now distinguishes the vector parameter,
it follows that since Kcl> (~) is continuously differentiable, the Jacobian
- 15 -
M(~)1 f 1/!{(y)d~(y) 0::; - e:;-2
(S .1)
0 f Y1/!2 (y) d~ (y)
Condition A'o follows since E~[1jJ]::; 0 A' A'1 ' 2
hold by
inspection, and A'3
holds since M(!) is nonsingular. Remark 2.2
suffices for the first part of A~. To apply theorem 2.1 consider the
function
f(x,;E) ::; I (x)(Tl-kT 2 ,Tl+k'r Z)
1+ -
-1 -ke-
It is clear that A::; {f( .• r):r ~D} forms a bounded equicontinuous
class of functions on E. Also
J f(x,!)dG(x)hl-kT z,Tl+k1 Z)
where differentiation of W is with respect to the second argument.
::; (x-e I}F.t!(X) ~ 82
while
- 16 -
A'4 follows by Theorem 2.1 and because ClljJ(Tl-k'r 2 • .:O and
Cl1jJ('tl+k'r 2 •.!) are bounded. Assumption (4.1) holds for the
Kolmogorov distance through integration by parts and noting that 1jJ
is a function of total bounded variation. Thus by Theorems 3.1
and 4.1 there exists a root that is Fr~chet differentiable at
F! = ~(x~e:) with respect to dk Since ~ has a bounded density
T is Fr~chet differentiable with respect to dL
, dp
also.
Consequently, the infinitesimal robustness of this M-estimator
at the normal parametric distribution is evident through Fr~chet
differentiability. It
distribution Fo(~~el}
fO (x)
is also Fr~chet differentiable at the
for which the density function of Fo
x2
= (l-€) e -2" for Ix I ~ k
ffi
is
k2
(l-E) 2---eI2iT
- klxlfor Ix I > k
with k and c connected through
~ - 2 ~(-k) = (K l-E
(~ =~' being the standard normal density). Then the M-estimator
coincides with the mle, and provides another example of a robust
and asymptotically efficient estimator.
Examples where multiple roots of the equations exist include
Hampel's 3-part redescender M-estimator for location dependent on
three parameters a.b.c;
ljJa,b,c (x) = x
a sign(x)
~a c-b sign(x)
o
Ixl ~ aa ~ Ixl ~ b
b ~ Ixl ~ c
c ~ Ixl
- 17 -
I -1 1 IWith the choice of selection functional p(G,.) = • - G (-) ,2
whereby the root closest to the median is selected, the functional
T[~a,bJc'P,·] is Fr€chet differentiable at ~(Xe:l).
In a sense weak continuity and Frechet differentiability of the
functional at the empirical distribution function are also important.
Weak continuity at F indicates stability of the estimate in then
presence of rounding errors in the recording of observations,
and at least for sufficiently large n the effects of gross errors
can be considered blunted. Fr~chet differentiability at F on then
other hand,could be used to justify asymptotics involved in Edgeworth
type expansions and bootstrapping, for example as considered in
Hampel (1982) , Beran(1982). When the psi function is smooth, the
only change to the arguments of Clarke(1983) forFr~chet differentiability
at Fn
is to replace Fo by Fn in conditions AI
-A4
.
Similarly the same substitution of conditions can be made in
the results of this paper, however if it should occur that an observation
X falls exactly at the point where ~(X,.) does not have a continuous
partial derivative at T = T[F ] then the generalized gradientn
aKF (T[Fn]) does not reduce to a single matrix. Even though suchn
an event would occur with probability zero in most forseeable examples
in which the underlying distribution was absolutely continuous, it can
be said nevertheless that the proof used in theorem 4.1 does not follow
through. In this instance the question of whether T is Frechet
differentiable at F is then left open.n
At least in the domain of
M-functionals defined through (1.2), it can be concluded that Huber's(1981)
remarks should not be interpreted in the sense that Fr€chet differentiability
is too strong. This is only the case for nonrobust M-functionals,
and consequently we should consider Frechet differentiability an advantage.
- 18 -
The problems induced by nonsmooth psi functions are not
unique to proofs of Frechet differentiability, and are applicable
to many asymptotic proofs. More frequently it is the case, that
rather than consider the difficulties, the appropriate smoothness
assumptions are made in the proofs, but somehow the results are
expected to be applicable to those continuous but nonsmooth functions also.
Nonsmooth analysis can the be considered as one possible avenue
of justifying such an approach.
6 . ACKNOtJLEDGEMENT
The author wishes to express gratitude to Professor F.B.Hampel
and Professor R.J.Carroll for encouragement and the opportunity to
pursue this research. This research has received partial support
from a US Airforce Grant No. F 49620 82 C 009, while the author was
visiting the University of North Carolina at Chapel Hill.
-19 -
6. APPENDIX
The proof of theorem 2.1 is preceded by some necessary lemmas.
The notational abbreviation G(x-) = lim h~o G(x - h) is used.
Lemma 1: Let G(x) be any distribution function for which G(x) - G(x-) <
n/4 for x ~(a,b), where a < b real, and n > 0 are given. If
G(b-) - G(a) > n, then there exists a finite partition
a = Xo < Xl < ••••• < xk ' = b,
so that
G(x~) - G(x. 1) < n, j=l, ... ,k'.J J -
Proof: Define G""l (t) = inf {x I G(x) ~ t, X Ell [a,b]}
Since G is right continuous G(G-I(t)) ~ t, choose
Yj = G- I {G(a) + teG(b -) - G(a))},
where k ~ 1 is chosen so that
G(b -) - G(a) < k .; 2(G(b -) - G(a))
Then
n n
G(Yj) - G(Yj_l) ? G(a) + teG(b-) - G(a)) - G(Yj_l)
? G(a) + teG(b-) - G(a))
- {G(a) + Lr!.(G(b -) - G(a)) + n/4}
1 - /= k(G(b) - G(a)) - n 4
~ n/4
If y. ~(a,b), j=l, ... ,k,J
then
t !wn
G(y.) - G(y~) Gey.)J J J
? G(y.)J
? n/4
G(y j -1)
- G(y. 1)J-
- 20 -
But this contradicts the initial assumption. Now since
G(Yj) ~ G(a) + i<G(b-) - G(a)) j = 1, ... , k,
then
G(y~)J G(y j -I)
1 -~ ~G(b) - G(a)) < n
..Note that Yo = a, Yl > a, and if Yk < b, then G(b-) - G(Yk) = O.
Let
a = Xo < xl < •.•• < xk '
be the partition formed from
= b
k{y.}. 1 u {b}
J J =
Lemma 6.2: Let Fe be given. Also for given c > 0 let e (-c, cJ.
Then VI n > 0 3 ( > 0 such that
sUPfeA supx«C Ifen! f(y)dG(y)x
dk(G,Fe) s c' implies
- fen! f(Y)dFe(Y)!< nx
(6.1)
where intervals I may represent either open or closed intervals fromx
-00 to x.
Proof:
in e
lGiven n > 0, let {d.}. I be the at most finite set of points1 1=
such that Pe(d i ) - Fe(d i ) ~ n/(16H), if they exist. Since the
family A is equicontinuous and C the closure of e, is compact, we
may choose a decomposition
-c = aD < Ul < •.••. < a = cm
so that u i _1 < x < Y <. a i implies !f(x) - fey) I < T)/4, for every fEA,
there exists a finite decomposition
* *a. I < a. ,1- 1
so that
so that
be the further decomposition obtained* kLet {a. } .1 1=0
m{a. } .
1 1=0and {d.} ~ I'
1 1=
*-From Lemma 6. 1 whenever Fe (a i )
n·{x .. }.11J J =0
i=l, ..... ,m.
i = 1, ... , k.
by combining the points
and
*ai_I = \0 < xii < ••• <
for which
x.In.
1
*a.1
If
Fe(xij) - Fe (x i (j_l)) < n/(4H)
* - . *FeCai ) - ['O(a i _l ) < n/(4H), set
j=I, ... ,n.1
X.10
and
(6.2)
- 21 -
n'That is, no further partitioning is necessary. Let {b.}. be the1 1=0
set of points that partition (-c,c] formed by combining {x .. }nji ,1J "0
i = 1, ... , k. Denote F* the possibly improper distribution that
attributes weight Fe(b i ) - Fe(bi) to the points bi , and weight
to the points p. = !lb. + b. 1)' i = 1, ... , n' - 1.1 2' 1 1+
Suppose xEC is given. Then either: Ca) there exists an 0:s; i :s; n' - 1x
such that b. < x < b. 1;J 1 +
X X
or (b) there exists an 1 :s; i :s; n'x
for
which x = b..lX
For case (a) and
IJenI fdF*x
feA
- fenl fdF e I :;x
i xI J If(P.)
j =1 (b. b.) JJ -1, J
:s; i Fe{C} + 2H{Fe(bi +1) - Fe(b i )}x x
3< 11/4 + 11/2 =~
For case (b) where x = b.1
X
If fdF* - f. fdF e \enI Ln Ix x
for some 1:s; i x :s; n'
LX
$ I f If(P.)j=l (b. l,b.) J
J - J
Hence ~
sUPfEA sUPXEC Ifcnlx
fdF * - JcnlxfdFel < ~Ihis is true for any distribution satisfying the inequalities (6.2).
- 22 -
..
In particular we can choose E* such that dk(G,Fe) < E* implies
G(x.~) - G(x .. 1) < nl (4H) , for j = I, ... , n.1J 1J - 1
i = 1, ... , k.
Let G* be the corresponding improper measure constructed from G.
Then
sup£ A sup c If fdG* - f fdGI6 xe, CnI CnI
x xIt is now convenient to consider case (b)
supfeA If fdG* - f fdF e*Ien I CnIx x
3<~
first
(+)
i x+ L IG{b.}
j=l J~ II Q"f If d(G-F,) I
J-1 (b. l,b.)J - J
$ H ~ni If dCG-Fe) I + ~i IG{b.} - FG{b·}I~J=l (b. l,b.) ]=1 J J
J- J
(*)
Choose a < E' < E* such that dk(G,Fe) < E' implies (*) < n/4.
There are two possibilities for case (a). Either
• for some a $ i s n' - 1,x
whence
(+) s (*) + sUPf~A If fd(G:F;)I(b. c]nI
lX' X
(*) < n/4 ,
or
a $ i :S n' - 1,xsXSb· 1 ,I +
X
p.IX
for which
(+) :S (*) + H /GCbi-+1) - G(b i ) - Fe(b i - +1) + Fe(b i ) I < n/2.X X X X
For either case if it happens
If fdG*CnIx
that dk(G,Fe) < E',
f fdF~ I < n/ 2 .Cn!x
then
Then the lemma follows.
- 23 -
Proof of Theorem 2.1: Given 0 > 0 choose c > 0 so that
Fe{E - C} < 0/(8H)
For any x e (_oo,_c] and G within K01mogorov distance 0/(8H) from Fe
If I fdG - II fdFel ~ H(G{Ix} + Fe{Ix})x x
~ 0/2
Let c' be given by Lemma 6.2 for the choice of n = 0/2.
Choose c = min{c', 0/(8H)}. Then for arbitrary x > c and G within
Kolmogorov distance E
IJI
fdG - II fdFelx x
Hence
from Fe
~ H(G{E - C} + Fe{E - C})
+ SUPy~clfcnI fdG fcnr fdF e \y y
< 0/2 + 0/2 by Lemma 6.2
dk(G,Fe) < ( implies
sUPfeA sUPxeE If I fdG - JfdFel < 0x
•
..
.,
- 24 -
REFERENCES
Beran, R.: Robust location estimates, Ann. Statist.; 5, 431-444 (1977)
Beran, R.: Estimated sampling distributions: the bootstrap and
competitors, Ann. Statist., 10, 212-225 (1982)
Boos, D.O. and Serfling, R.J.: A note on differentials and the CLT
and LIL for statistical functions with application to M-estimates.
Ann. Statist., 8, 618-624 (1980)
Carroll, R.J.: On the asymptotic distribution of multivariate
M-estimates, J. of Mult. Analysis, 8, 361-371 (1978)
Clarke, B.R.: Uniqueness and Fr~chet differentiability of functional
solutions to maximum likelihood type equations, Ann. Statist.,
1196-1205 (1983)
Clarke, F.H.: Optimization and Nonsmooth Analysis, Wiley, N~w York
(1983)
Fernholz, L.T.: Von Mises Calculus for Statistical Functionals,
Lecture notes in statistics, 19 Springer Verlag, New York, 1983.
Hampel, F.R.: Contributions to the Theory of Robust Estimation, Ph.D.
Thesis, University of California, Berkeley, 1968
Hampel, F.R.: A general qualitative definition of robustness. Ann Math.
Statist. 42, 1887-1896 (1971)
Hampel, F.R.: The influence curve and its role in robust estimation,
J. Amer. Statist. Ass., 62, 1179-1186 (1974)
Hampel, F.R.: Optimally bounding the gross error-sensitivity and the
influence of position in factor space; Proc. Amer. Statist. Assoc.,
Statist. Computing Section, 59-64 (1978)
- 25 -
Hampel, F.R.: Small-sample asymptotic distributions of M-estimators
of location, Biometrika, 69, 29-46 (1982)
Huber, P.J.: Robust estimation of a location parameter, Ann. Math.
Statist., 35, 73-101 (1964)
Huber, P.J.: The behaviour of maximum likelihood estimates under
non standard conditions, in: Proc. Fifth Berkeley Symposium on
Mathematical Statistics and Probability Vol. 1, University of
California Press, Berkeley (1967)
Huber, P.J.: Robust Statistical Procedures, Regional Conference
Series in Applied Mathematics No. 27, Soc. Industr. App. Math.,
Philadelphia, Penn. 1977
Huber, P.J.: Robust Statistics, Wiley, New York, 1981
Kallianpur, G.: Von Mises functions and maximum likelihood
estimation. Sankhya, Ser. A., 23, 149-158 (1963)
Reeds, J.A.: On the Definition of von Mises Functions, Ph.D
Thesis, Department of Statistics, Harvard University, Cambridge,
Mass. 1976.
Von Mises, R.: On the asymptotic distribution of differentiable
statistical functions. An .. Math. Statist. 18,309-348 (1947)
..
..