
NONSMOOTH ANALYSIS AND FRECHET DIFFERENTIABILITY OF M-FUNCTIONALS

by

Brenton R. Clarke
Murdoch University

Key Words: M-estimators, Robustness, Gross error sensitivity, Asymptotic expansions, Asymptotic normality, Weak continuity, Selection functional, Local uniqueness, Empirical distribution function.

Mathematics subject classification (Amer. Math. Soc.): primary 62E20, secondary 62G35

Abbreviated Title: Relaxing Conditions for Frechet Differentiability.


1. INTRODUCTION

In a recent paper the author showed that, contrary to popular opinion, strict Fréchet differentiability of the class of M-functionals is frequently possible. A necessary requirement for existence of the Fréchet derivative is that the defining psi function is uniformly bounded, and this naturally excludes those nonrobust estimators such as the maximum likelihood estimator in normal parametric models. On the other hand, in that paper, smoothness assumptions were imposed on the defining psi function which are not appropriate for many common robust proposals in M-estimation theory, such as Huber's (1964) minimax solution and Hampel's (1974) three part redescender used in estimating location. A host of robust solutions for more general parametric families are obtained through Hampel's (1968) Lemma 5, and generalizations of it (cf. Hampel 1978), and these almost invariably are functions with "sharp corners". Indeed the problem that is presented by failure of psi functions to have continuous partial derivatives has been the focus of papers by Huber (1967) and Carroll (1978) with respect to proofs of asymptotic normality. While Fréchet differentiability of the M-functional a priori gives asymptotic normality of the M-estimator, at least for real valued observation spaces, it also gives a direct expansion by which the degree of robustness can be directly measured through the gross error sensitivity. The latter quantity is the supremum of the absolute value of the influence curve of Hampel (1968, 1974), and Huber (1977), assuming existence of the Fréchet derivative, shows that the maximum asymptotic bias in contaminated neighbourhoods of a parametric distribution is proportional to the gross error sensitivity. Consequently Fréchet differentiability of a statistical functional is an important tool in the robust description of an estimator, and complements the definition of a robust functional as one that is weakly continuous (cf. Hampel 1971).

In this paper the methods of nonsmooth analysis, described in the book by F.H. Clarke (1983), are introduced to the theory of statistical expansions, and are used here in the proofs of weak continuity and Fréchet differentiability of M-functionals. Consequently the conditions for Fréchet differentiability given in Clarke (1983) can be relaxed to include most popular M-functionals.

The M-estimator is a solution of equations

$$\int \psi(x,\tau)\, dF_n(x) = 0, \tag{1.1}$$

where $F_n$ is that distribution which attributes atomic mass $1/n$ to each of $n$ independent identically distributed observations $x_1, \ldots, x_n$, having common distribution $F \in \mathcal{G}$, the space of probability distributions defined on some separable metrizable observation space $R$. For the applications in this paper it is only necessary to consider $R = E$, the real line. The parameter $\tau \in \Theta$, an open subset of Euclidean $r$-space $E^r$, and $\mathcal{F} = \{F_\tau : \tau \in \Theta\}$ is a parametric family where the usual assumption is that $F = F_\theta$ for some $\theta \in \Theta$. The function $\psi : R \times \Theta \to E^r$ can be defined through minimization of some loss function, or obtained by some other optimal criteria. The theory of robustness makes use of the M-functional $T$ defined on $\mathcal{G}$, so that more generally $T[G]$ is a solution of equations

$$\int_R \psi(x,\tau)\, dG(x) = 0 \tag{1.2}$$

if a solution exists, $T[G] = \infty$ otherwise. Thus the estimator is given by the functional $T$ evaluated at $F_n$, and its asymptotic properties follow from continuity and differentiability of $T$ at $F$ with respect to suitable metrics defined on $\mathcal{G}$. This approach to asymptotic theory for statistics was first considered by Von Mises (1947).
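As a concrete illustration of equation (1.1), the following minimal sketch solves the empirical estimating equation for a location parameter with Huber's clipped psi function. The tuning constant `k = 1.345`, the bisection solver, and the sample values are illustrative assumptions and are not taken from the paper.

```python
def huber_psi(u, k=1.345):
    """Huber's psi: the identity clipped to [-k, k] (bounded, with sharp corners)."""
    return max(-k, min(k, u))


def m_estimate_location(xs, k=1.345, tol=1e-10):
    """Solve sum_i psi(x_i - tau) = 0, i.e. equation (1.1) with F_n, by bisection.

    The score is non-increasing in tau, non-negative at min(xs) and
    non-positive at max(xs), so a root lies in [min(xs), max(xs)].
    """
    def score(tau):
        return sum(huber_psi(x - tau, k) for x in xs)

    lo, hi = min(xs), max(xs)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical sample with one gross error at 25.0.
sample = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.1, 25.0]
tau_hat = m_estimate_location(sample)
sample_mean = sum(sample) / len(sample)
```

Because the psi function is bounded, the gross error moves `tau_hat` far less than it moves `sample_mean`.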

To avoid ambiguity, and also for good statistical practice, the concept of a selection functional $\rho$ was introduced by Clarke (1983), in order to identify, in the event of several solutions of the equations (1.2), that root which is to be the estimator. That is, the M-functional is defined by $\psi$, $\rho$ so that

$$\inf_{\tau \in I(\psi,G)} \rho(G,\tau) = \rho(G, T[\psi,\rho,G]),$$

where

$$I(\psi,G) = \Big\{\tau : \int_R \psi(x,\tau)\, dG(x) = 0,\ \tau \in \Theta\Big\},$$

if a solution exists. Otherwise $T[\psi,\rho,G] = \infty$. The functional $T$ is then Fréchet differentiable at $F$ with respect to the pair $(\mathcal{G}, d_*)$, for suitable metrics $d_*$ on $\mathcal{G}$, if $T$ can be approximated by a linear functional $T'_F$ which is defined on the linear space spanned by the differences $G - H$ of members of $\mathcal{G}$, so that

$$|T[G] - T[F] - T'_F(G - F)| = o(d_*(G,F)) \tag{1.3}$$

as $d_*(G,F) \to 0$, $G \in \mathcal{G}$. Essentially the expansion for Fréchet differentiability is dependent on a local expansion of equations (1.2), and a robust selection functional will automatically select the Fréchet differentiable root, whenever one exists. To the latter end one uses an auxiliary functional $\rho(G,\tau) = |\tau - \theta|$ to prove existence of a unique Fréchet differentiable root in a local neighbourhood of the parameter $\theta$ when considering the derivative at $F_\theta$. Also it is sufficient to consider the expansion (1.3) for $T$ defined on $\mathcal{G}$, and the usual mathematical extension of the domain of $T$ to the linear space of signed measures is of little importance here.

The Fréchet derivative may be considered strong in the sense that existence of the Fréchet derivative for statistical functionals implies existence of the weaker Hadamard or compact derivatives of Reeds (1976), Fernholz (1983), and the Gâteaux derivative discussed by Kallianpur (1963), a special case of which is the influence curve

$$IC(x,F,T) = \lim_{\epsilon \to 0} \frac{T[(1-\epsilon)F + \epsilon\delta_x] - T[F]}{\epsilon};$$

here $\delta_x$ is the distribution attributing mass 1 to the point $x$. The Gâteaux derivative is given by

$$\int IC(x,F,T)\, d(G-F)(x),$$

which coincides with the Fréchet derivative when the latter exists. Unfortunately comments by Kallianpur (1963), which were in specific relation to the maximum likelihood estimator (mle), led other researchers to believe the derivative too strong to obtain. Indeed Huber (1981) states "Unfortunately the concept of Fréchet differentiability appears too strong: in too many cases, the Fréchet derivative does not exist, and even if it does, the fact is difficult to establish." In Clarke (1983) simple conditions for Fréchet differentiability of M-functionals were given together with a counterexample to the comments of Kallianpur.
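The influence curve can be approximated numerically by contaminating the model with a small point mass. The sketch below does this for the Huber location functional at the standard normal distribution and compares the result with the standard closed form $\psi(x)/(2\Phi(k)-1)$. The constant `K = 1.345`, the contamination size `eps`, and all function names are illustrative assumptions, not notation from the paper.

```python
import math

K = 1.345  # illustrative tuning constant


def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def psi(u, k=K):
    return max(-k, min(k, u))


def mean_psi_normal(mu, k=K):
    """E[psi(Y)] for Y ~ N(mu, 1), in closed form."""
    a, b = -k - mu, k - mu
    return (k * (1.0 - Phi(b)) - k * Phi(a)
            + mu * (Phi(b) - Phi(a)) + phi(a) - phi(b))


def T_contaminated(x, eps, k=K):
    """Root in tau of (1 - eps) E_Phi[psi(X - tau)] + eps psi(x - tau) = 0."""
    def score(tau):
        return (1.0 - eps) * mean_psi_normal(-tau, k) + eps * psi(x - tau, k)

    lo, hi = -10.0, 10.0          # the score is decreasing in tau
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ic_numeric(x, eps=1e-6):
    """Difference quotient from the definition of the influence curve."""
    return (T_contaminated(x, eps) - T_contaminated(x, 0.0)) / eps


def ic_formula(x, k=K):
    """Closed form psi(x) / (2 Phi(k) - 1) for the Huber location functional."""
    return psi(x, k) / (2.0 * Phi(k) - 1.0)


# Supremum of |IC|: the gross error sensitivity of this functional.
gross_error_sensitivity = K / (2.0 * Phi(K) - 1.0)
```

Since the psi function is bounded, `ic_formula` is bounded in `x`, and its supremum is the gross error sensitivity referred to in the introduction.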

Boos and Serfling (1980) introduce the related notion of a quasi-differential which assumes the same expansion (1.3), but restricts $G = F_n$ and allows for small order errors in probability with respect to the Kolmogorov distance between $F_n$ and $F$. This expansion does not offer the same properties of robust description of the estimating functional, and even the mean functional satisfies this stochastic form of differentiability. Beran (1977) also adopts a differential approach using the Hellinger metric, though this appears to be for more specific application.

A weaker set of conditions than conditions A of Clarke (1983) is introduced in section 2, though for smooth psi functions conditions A of that paper are easier to apply. Theorem 2.1 of this paper is necessary to show condition $A'_4$, introduced here, holds for the popular nonsmooth psi functions. It can be considered as a variation or a generalization of the Glivenko Cantelli result. Conditions A' are used in sections 3 and 4 in the theorems that give existence of a unique continuous and Fréchet differentiable root of equations (1.2). In particular the arguments for weak continuity follow when either of Lévy or Prokhorov metrics are used. Important examples of application are given in section 5, together with the conclusion.

2. A DISCUSSION OF DEFINITIONS AND CONDITIONS A'

Suppose $f$ maps $E^r$ to itself and $\theta$ is a point near which $f$ is Lipschitz. Denote $\Omega_f$ to be the set of points at which $f$ fails to be differentiable, which by Rademacher's theorem is known to be a set of Lebesgue measure zero. Let $Jf(\tau)$ be the usual $r \times r$ matrix of partial derivatives whenever $\tau \notin \Omega_f$.

Definition 2.1: The generalized Jacobian of $f$ at $\theta$, denoted by $\partial f(\theta)$, is the convex hull of all $r \times r$ matrices $Z$ obtained as the limit of a sequence of the form $Jf(\tau_i)$, where $\tau_i \to \theta$ and $\tau_i \notin \Omega_f$.

The generalized Jacobian $\partial f(\theta)$ is said to be of maximal rank provided every matrix in $\partial f(\theta)$ is of maximal rank (i.e. nonsingular).

The following proposition is proved on page 71 of F.H. Clarke (1983).

Proposition 2.1: The generalized Jacobian $\partial f(\cdot)$ is upper semicontinuous, which means, given $\epsilon > 0$ there exists a $\delta > 0$ such that for $\tau \in U_\delta(\theta)$, the open ball of radius $\delta$ centered at $\theta$,

$$\partial f(\tau) \subset \partial f(\theta) + \epsilon B_{r \times r}.$$

Here $B_{r \times r}$ is the unit ball of matrices for which $B \in B_{r \times r}$ implies $\|B\| \le 1$.

Remark 2.1: Without loss of generality we can assume $\|B\|$ to be the least upper bound of $|By|$ where $|y| \le 1$.
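In one dimension the generalized Jacobian of Definition 2.1 is a closed interval, and it can be approximated by collecting slopes at points near $\theta$. The sketch below does this for $f(\tau) = \psi_k(x - \tau)$ with Huber's clipped psi. The helper `generalized_derivative`, the sampling radius, and the step size are assumptions made for illustration; the routine is a numerical approximation, not an exact computation.

```python
def huber_psi(u, k=1.345):
    return max(-k, min(k, u))


def generalized_derivative(f, theta, radius=1e-3, n=2001, h=1e-7):
    """Approximate the 1-D generalized Jacobian of f at theta as an interval.

    Central-difference slopes are taken at n points within `radius` of theta,
    and the convex hull [min, max] of those slopes is returned. Slopes taken
    across a corner are averages of the one-sided slopes, so they fall inside
    the hull of the limiting derivatives.
    """
    slopes = []
    for i in range(n):
        tau = theta - radius + 2.0 * radius * i / (n - 1)
        slopes.append((f(tau + h) - f(tau - h)) / (2.0 * h))
    return min(slopes), max(slopes)


x, k = 2.0, 1.345


def f(tau):
    return huber_psi(x - tau, k)


corner = x - k                                     # f has a sharp corner here
at_corner = generalized_derivative(f, corner)      # close to [-1, 0]
at_smooth = generalized_derivative(f, 1.0)         # close to [-1, -1]
```

At the corner the interval does not collapse to a single value, which is the situation the generalized Jacobian is designed to handle.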

Frequently several solutions of equations (1.1), (1.2) can exist, whereupon a robust selection of the functional root is obtained using the idea of a selection functional $\rho$ introduced in Clarke (1983). The robust selection functional retains the continuity properties of the selected root in small enough neighbourhoods $n(\epsilon,F) \subset \mathcal{G}$ of a distribution $F$, which can be considered here to be defined by metrics $d_*$. The M-functional is then defined by $\psi$ and $\rho$ as $T[\psi,\rho,\cdot]$. Typical choices for $d_*$ include $d_k$, $d_L$, $d_p$, the Kolmogorov, Lévy and Prokhorov metrics respectively.

Conditions A':

$A'_0$

$A'_1$ $\psi(x,\tau)$ is an $r \times 1$ vector function on $R \times \Theta$ which is continuous and bounded on $R \times D$, where $D \subset \Theta$ is some nondegenerate compact interval containing $\theta$ in its interior, and $R$ is some separable metrizable space.

$A'_2$ $\psi(x,\tau)$ is locally Lipschitz in $\tau$ about $\theta$ in the sense that for some constant $\alpha$

$$|\psi(x,\tau) - \psi(x,\theta)| < \alpha|\tau - \theta|$$

uniformly in $x \in R$ and for all $\tau$ in a neighbourhood of $\theta$.

$A'_3$ Letting differentiation be with respect to the argument in parentheses, $\partial K_{F_\theta}(\tau)$ is of maximal rank at $\tau = \theta$.

$A'_4$ Given $\delta > 0$ there exists an $\epsilon > 0$ such that for all $G \in n(\epsilon,F_\theta)$

$$\sup_{\tau \in D} |K_G(\tau) - K_{F_\theta}(\tau)| < \delta$$

and

$$\partial K_G(\tau) \subset \partial K_{F_\theta}(\tau) + \delta B_{r \times r} \quad \text{uniformly in } \tau \in D.$$

Remark 2.2: $A'_0 = A_0$.

Remark 2.3: For a function $\psi$ satisfying $A'_1$ it follows from Remark 2.2 and Theorem 6.1 in Clarke (1983) that given $\delta > 0$ there exists an $\epsilon > 0$ such that for all $G \in n(\epsilon,F_\theta)$

$$\sup_{\tau \in D} |K_G(\tau) - K_{F_\theta}(\tau)| < \delta,$$

whenever $n(\epsilon,F_\theta)$ is generated by metrics $d_k$, $d_L$, $d_p$. This establishes the first part of condition $A'_4$.

Remark 2.4: If $K_{F_\theta}(\tau)$ is continuously differentiable in $\tau$ at $\theta$ then $A_3 \Leftrightarrow A'_3$, where condition $A_3$ is that of Clarke (1983).

Conditions $A'_0$ - $A'_3$ can be considered fairly straightforward, whereas the condition $A'_4$ is not so obvious. When $R = E$, the real line, it can be shown to be a consequence of the following theorem, a proof of which is detailed in the appendix. It is sufficient here to establish the result for the Kolmogorov distance $d_k$.

Theorem 2.1: Let $A$ be a class of continuous functions defined on $E$ with the following properties: (1) $A$ is uniformly bounded, that is, there exists a constant $H$ such that $|f(x)| \le H < \infty$ for all $f \in A$ and $x \in E$; and (2) $A$ is equicontinuous. Let $F_\theta \in \mathcal{G}$ be given. Then, for every $\delta > 0$ there is an $\epsilon > 0$ such that $d_k(F_\theta,G) \le \epsilon$ implies

$$\sup_{f \in A}\ \sup_{x \in E \cup \{+\infty\}} \Big| \int_{I_x} f\, dG - \int_{I_x} f\, dF_\theta \Big| < \delta, \tag{2.1}$$

where integration is performed over the intervals $I_x$ which can be either open or closed of the form $(-\infty,x)$ or $(-\infty,x]$.
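A small numerical experiment can make the statement of Theorem 2.1 concrete. In the sketch below the uniform distribution on $(0,1)$ stands in for $F_\theta$, $G_n$ places mass $1/n$ at the midpoints $(i+0.5)/n$, and the class $A$ consists of a few cosine functions, which are bounded by $H = 1$ and equicontinuous. All of these choices are illustrative assumptions; the supremum over $x$ is evaluated only at the atoms of $G_n$, for both open and closed intervals.

```python
import math


def kolmogorov_distance(n):
    """d_k between Uniform(0,1) and G_n with mass 1/n at (i + 0.5)/n."""
    return 0.5 / n


def partial_integral_gap(n, freqs=(0.5, 1.0, 2.0)):
    """Largest |int_{I_x} f dG_n - int_{I_x} f dF| found over the atoms of G_n.

    F is Uniform(0,1) and f(y) = cos(a*y) for a in `freqs`, so the integral
    of f over (0, p] with respect to F is sin(a*p)/a.
    """
    atoms = [(i + 0.5) / n for i in range(n)]
    worst = 0.0
    for a in freqs:
        running = 0.0                              # integral of f dG_n over (-inf, p)
        for p in atoms:
            exact = math.sin(a * p) / a
            worst = max(worst, abs(running - exact))   # open interval (-inf, p)
            running += math.cos(a * p) / n
            worst = max(worst, abs(running - exact))   # closed interval (-inf, p]
    return worst


sizes = (10, 100, 1000)
gaps = [partial_integral_gap(n) for n in sizes]
distances = [kolmogorov_distance(n) for n in sizes]
```

As the Kolmogorov distance in `distances` shrinks, so does the uniform gap in `gaps`, which is the behaviour (2.1) asserts.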

Remark 2.5: A similar proof yields the same result with $d_L$ replaced

In some instances Fréchet differentiability with respect to $d_k$ implies that with respect to $d_L$, $d_p$ following (6.2) of Clarke (1983).

Consider $\psi$ with continuous partial derivatives except on a finite set of points $S(\tau)$. From F.H. Clarke (1983 pp. 75-83) it follows that

$$\partial K_G(\tau) = \partial \int \psi(y,\tau)\, dG(y) \subset \int \partial\psi(y,\tau)\, dG(y), \tag{2.2}$$

from which the right hand side can be expanded to a finite summation

$$\sum_{j=1}^{m} \int_{I_j} f_j(y,\tau)\, dG(y) + \sum_{x \in S(\tau)} \partial\psi(x,\tau)\, G\{x\}.$$

Here $f_j \in A$ and $\frac{\partial\psi}{\partial\tau}(y,\tau) = f_j(y,\tau)$ on the connected interval $I_j$, for $j = 1,\ldots,m$. Since $\psi$ is Lipschitz in $\tau$ and $\partial\psi(x,\tau)$ bounded, theorem 2.1 implies condition $A'_4$.
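For a one-dimensional location problem with Huber's clipped psi, $\psi(x,\tau) = \psi_k(x - \tau)$, the right hand side of (2.2) can be written down explicitly for an empirical distribution: the derivative in $\tau$ is $-1$ where $|x - \tau| < k$, $0$ where $|x - \tau| > k$, and the interval $[-1, 0]$ at the two corners. The sketch below evaluates that bound. The function name, the tolerance used to decide whether an observation sits on a corner, and the sample are hypothetical.

```python
def generalized_derivative_bounds(xs, tau, k=1.345, tol=1e-12):
    """Interval given by the right hand side of (2.2) for G = F_n.

    Each observation strictly inside (tau - k, tau + k) contributes -1/n,
    each observation on a corner |x - tau| = k contributes [-1/n, 0],
    and observations outside contribute 0.
    """
    n = len(xs)
    inside = sum(1 for x in xs if abs(x - tau) < k - tol)
    on_corner = sum(1 for x in xs if abs(abs(x - tau) - k) <= tol)
    lower = -(inside + on_corner) / n
    upper = -inside / n
    return lower, upper


sample = [-1.0, 0.0, 0.5, 2.0]
# With k = 2 and tau = 0 the observation 2.0 sits exactly on a corner.
with_corner = generalized_derivative_bounds(sample, tau=0.0, k=2.0)
# Moving tau slightly removes the tie and the interval collapses to a point.
without_corner = generalized_derivative_bounds(sample, tau=0.1, k=2.0)
```

The interval is a single point unless an observation falls exactly on a corner of the psi function.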

3. UNIQUENESS OF FUNCTIONAL SOLUTIONS TO EQUATIONS

For those psi functions which do not admit a unique root of the equations, at least a unique root of the equations in a local region of the parameter space about $\theta$ can be shown to exist for small enough neighbourhoods of $F_\theta$. If conditions A' are with respect to Lévy or Prokhorov neighbourhoods, existence of a weakly continuous root is shown, for which the global argument of Clarke (1983) can be used to select it if more than one root exists. When the Kolmogorov distance is used only consistency is directly established.

The following propositions are established on pp. 252-255 of F.H. Clarke (1983), and obviate the condition of continuous derivatives in the argument for the inverse function theorem.

Proposition 3.1: Suppose $f$ satisfies properties described in Section 2 and

$$4\lambda_f \le \inf_{\partial f(\theta)} \|M(\theta,f)\|,$$

where the infimum is taken over all matrices $M(\theta,f) \in \partial f(\theta)$, and for some $\delta > 0$, $\tau \in U_\delta(\theta)$ implies

$$2\lambda_f \le \inf_{\partial f(\tau)} \|M(\tau,f)\|.$$

Then for arbitrary $\tau_1, \tau_2 \in \bar{U}_\delta(\theta)$, the closure of the ball $U_\delta(\theta)$,

$$|f(\tau_1) - f(\tau_2)| \ge 2\lambda_f |\tau_1 - \tau_2|.$$

Proposition 3.2: Under the conditions of Proposition 3.1, $f(U_\delta(\theta))$ contains $U_{\lambda_f \delta}(f(\theta))$.

Remark 3.1: For $v \in U_{\lambda_f \delta}(f(\theta))$ we can define $f^{-1}(v)$ to be the unique $\tau \in U_\delta(\theta)$ such that $f(\tau) = v$, and Proposition 3.1 implies $f^{-1}$ is Lipschitz with Lipschitz constant $1/(2\lambda_f)$.
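Propositions 3.1 and 3.2 can be checked on a simple nonsmooth function of one variable. The sketch below uses the hypothetical example $f(t) = 2t + |t|$, whose generalized derivative at $0$ is the interval $[1, 3]$, so that $\lambda_f = 1/4$ satisfies both bounds of Proposition 3.1 on any ball about $0$. The example function, the radius `DELTA`, and the grids are assumptions made for illustration.

```python
def f(t):
    """Piecewise linear with a corner at 0: slope 1 on the left, 3 on the right."""
    return 2.0 * t + abs(t)


def f_inverse(v):
    """Explicit inverse of f on the real line."""
    return v / 3.0 if v >= 0 else v


LAMBDA_F = 0.25   # 4 * LAMBDA_F <= 1, the smallest slope in the generalized derivative
DELTA = 1.0

grid = [-DELTA + 2.0 * DELTA * i / 200 for i in range(201)]

# Proposition 3.1: |f(s) - f(t)| >= 2 * lambda_f * |s - t| on the closed ball.
lower_lipschitz_ok = all(
    abs(f(s) - f(t)) >= 2.0 * LAMBDA_F * abs(s - t) - 1e-12
    for s in grid for t in grid
)

# Proposition 3.2 and Remark 3.1: every v within lambda_f * delta of f(0) = 0
# has a preimage inside the ball of radius delta, and f_inverse recovers it.
radius = LAMBDA_F * DELTA
image_grid = [-radius + 2.0 * radius * i / 200 for i in range(1, 200)]
inverse_ok = all(
    abs(f_inverse(v)) < DELTA and abs(f(f_inverse(v)) - v) < 1e-12
    for v in image_grid
)

# The inverse is Lipschitz with constant at most 1 / (2 * lambda_f) = 2.
inverse_lipschitz_ok = all(
    abs(f_inverse(u) - f_inverse(v)) <= abs(u - v) / (2.0 * LAMBDA_F) + 1e-12
    for u in image_grid for v in image_grid
)
```

No derivative at the corner is needed: the bounds only use the extreme slopes on either side.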

Lemma 3.1: Let conditions A' hold for some $\psi$, $\rho$. Then there is a $\delta_1 > 0$ and an $\epsilon_1 > 0$ such that for all $\tau \in U_{\delta_1}(\theta)$ and $G \in n(\epsilon_1,F_\theta)$, every matrix

$$M(\tau,G) \in \partial K_G(\tau)$$

will satisfy $\|M(\tau,G)\| > 2\lambda$, where $\lambda$ is defined to be a value for which $M(\theta,F_\theta) \in \partial K_{F_\theta}(\theta)$ implies $\|M(\theta,F_\theta)\| > 4\lambda$.

Remark 3.2: If $K_{F_\theta}(\tau)$ is continuously differentiable in $\tau$ then the choice of $\lambda = 1/(4\|M(\theta,F_\theta)^{-1}\|)$ satisfies the criterion of Lemma 3.1.

Proof of Lemma 3.1: Since $\partial K_{F_\theta}$ is upper semicontinuous, choose by Proposition 2.1 $\delta_1 > 0$ such that

$$\partial K_{F_\theta}(\tau) \subset \partial K_{F_\theta}(\theta) + \lambda B_{r \times r}$$

whenever $\tau \in U_{\delta_1}(\theta)$. By condition $A'_4$ there exists an $\epsilon_1 > 0$ such that $G \in n(\epsilon_1,F_\theta)$ implies

$$\partial K_G(\tau) \subset \partial K_{F_\theta}(\tau) + \lambda B_{r \times r} \quad \text{uniformly in } \tau \in D.$$

Hence given $M(\tau,G) \in \partial K_G(\tau)$ for $\tau \in U_{\delta_1}(\theta)$ there exists $M(\theta,F_\theta) \in \partial K_{F_\theta}(\theta)$ such that

$$\|M(\tau,G) - M(\theta,F_\theta)\| < 2\lambda,$$

whence by Proposition 3.1 $\|M(\tau,G)\| > 2\lambda$.

It is now possible to state and prove the uniqueness argument of Theorem 3.1 of Clarke (1983) using weakened conditions A'. The result also implies existence of a weakly continuous root for either Lévy or Prokhorov neighbourhoods. As usual the following selection functional is only used as an auxiliary device.

Theorem 3.1: Let $\rho(G,\tau) = |\tau - \theta|$ and suppose conditions A' hold. Then given $\kappa > 0$ there exists an $\epsilon > 0$ such that $G \in n(\epsilon,F_\theta)$ implies $T[\psi,\rho,G]$ exists and is an element of $U_\kappa(\theta)$. Further for this $\epsilon$ there is a $\kappa^* > 0$ such that

$$I(\psi,G) \cap U_{\kappa^*}(\theta) = T[\psi,\rho,G],$$

and $\partial K_G(\tau)$ is of maximal rank for $\tau \in U_{\kappa^*}(\theta)$. For any null sequence of positive numbers $\{\epsilon_k\}$ let $\{G_k\}$ be an arbitrary sequence for which $G_k \in n(\epsilon_k,F_\theta)$. Then

$$\lim_{k \to \infty} T[\psi,\rho,G_k] = T[\psi,\rho,F_\theta] = \theta.$$

Proof of Theorem 3.1: Since $\partial K_{F_\theta}(\cdot)$ is upper semicontinuous, by Lemma 3.1 there is a $0 < \kappa^* \le \delta_1$ such that $\tau \in U_{\kappa^*}(\theta)$ implies

$$\inf_{\partial K_G(\tau)} \|M(\tau,G)\| > 2\lambda \quad \text{for all } G \in n(\epsilon_1,F_\theta),$$

where the infimum is taken over all matrices $M(\tau,G) \in \partial K_G(\tau)$. Here $\delta_1$, $\epsilon_1$, and $\lambda$ are defined in Lemma 3.1. Hence

$$4\lambda(G) = \inf_{\partial K_G(\theta)} \|M(\theta,G)\| > 2\lambda.$$

Choose $0 < \epsilon^* \le \epsilon_1$ so that the following relations hold:

$$\partial K_G(\tau) \subset \partial K_{F_\theta}(\tau) + (\lambda/4) B_{r \times r} \quad \text{by } A'_4$$
$$\subset \partial K_{F_\theta}(\theta) + (\lambda/2) B_{r \times r} \quad \text{by Proposition 2.1}$$
$$\subset \partial K_G(\theta) + \lambda B_{r \times r} \quad \text{by } A'_4.$$

Then for every $M(\tau,G) \in \partial K_G(\tau)$ there exists an $M(\theta,G) \in \partial K_G(\theta)$ such that

$$\|M(\tau,G) - M(\theta,G)\| < \lambda < 2\lambda(G),$$

whenever $G \in n(\epsilon^*,F_\theta)$ and uniformly in $\tau \in U_{\kappa^*}(\theta)$. By Proposition 3.1 $K_G(\cdot)$ is a one-to-one function from $U_{\kappa^*}(\theta)$ onto $K_G(U_{\kappa^*}(\theta))$, and by Proposition 3.2 the image set contains the open ball of radius $\lambda\kappa^*/2$ about $K_G(\theta)$. The argument for uniqueness now proceeds as in Clarke (1983).

4. FRECHET DIFFERENTIABILITY

It will be assumed in this section that $K_{F_\theta}(\tau)$ has at least a continuous derivative at $\tau = \theta$, which is denoted $M(\theta)$. This is common with absolutely continuous parametric families. With this restriction Fréchet differentiability follows.

Theorem 4.1: Let $\rho(G,\tau) = |\tau - \theta|$ and assume conditions A' hold with respect to this functional and neighbourhoods generated by the metrics $d_*$ on $\mathcal{G}$. Suppose for all $G \in \mathcal{G}$

$$\int_R \psi(x,\theta)\, d(G - F_\theta)(x) = O(d_*(G,F_\theta)). \tag{4.1}$$

Then $T[\psi,\rho,\cdot]$ is Fréchet differentiable at $F_\theta$ with respect to $(\mathcal{G},d_*)$ and has derivative

$$T'_{F_\theta}(G - F_\theta) = -M(\theta)^{-1} \int_R \psi(x,\theta)\, d(G - F_\theta)(x).$$

To prove the theorem it is necessary to introduce the following generalization of the mean value result, described as Proposition 2.6.5 in F.H. Clarke (1983).

Proposition 4.1: Let $f$ be Lipschitz on an open convex set $U$ in $E^r$ and let $\tau_1$ and $\tau_2$ be points in $U$. Then one has

$$f(\tau_2) - f(\tau_1) \in \mathrm{co}\, \partial f([\tau_1,\tau_2])(\tau_2 - \tau_1).$$

(The right hand side above denotes the convex hull of all points of the form $Z(\tau_2 - \tau_1)$ where $Z \in \partial f(u)$ for some point $u$ in $[\tau_1,\tau_2]$. Since $[\mathrm{co}\, \partial f([\tau_1,\tau_2])](\tau_2 - \tau_1) = \mathrm{co}[\partial f([\tau_1,\tau_2])(\tau_2 - \tau_1)]$, there is no ambiguity.)

Proof of Theorem 4.1: Abbreviate $T[\psi,\rho,\cdot] = T[\cdot]$ and let $\kappa^*$, $\epsilon$ be given by Theorem 3.1. Let $\{\epsilon_k\}$ be so that $\epsilon_k \to 0+$ as $k \to \infty$ and let $\{G_k\}$ be any sequence such that $G_k \in n(\epsilon_k,F_\theta)$. By Theorem 3.1, $T[G_k]$ exists and is unique in $U_{\kappa^*}(\theta)$ for $k > k_0$ where $\epsilon_{k_0} \le \epsilon$. By $A'_4$ see that for arbitrary $\delta > 0$

$$\partial K_{G_k}(\tau) \subset \partial K_{F_\theta}(\tau) + \delta B_{r \times r} \quad \text{uniformly in } \tau \in D \tag{4.2}$$

for sufficiently large $k$. Consider the two term expansion,

$$0 = K_{G_k}(T[G_k]) = K_{G_k}(\theta) + M(\tilde\tau_k,G_k)(T[G_k] - \theta), \tag{4.3}$$

where $|\tilde\tau_k - \theta| < |T[G_k] - \theta|$, which tends to zero as $k \to \infty$ by Theorem 3.1, and $\tilde\tau_k$ is evaluated at different points for each component function expansion obtained as a consequence of Proposition 4.1 (i.e. $\tilde\tau_k$ takes different values in each row of matrix $M$). See from (4.3), (4.1) and Lemma 3.1 that

$$|T[G_k] - \theta| = O(d_*(G_k,F_\theta)).$$

Also,

$$T[G_k] - \theta = -M(\theta)^{-1} K_{G_k}(\theta) - M(\theta)^{-1}\{M(\tilde\tau_k,G_k) - M(\theta)\}(T[G_k] - \theta).$$

By upper semicontinuity of $\partial K_G(\tau)$ in $\tau$ and (4.2)

$$\|M(\tilde\tau_k,G_k) - M(\theta)\| = o(1).$$

So

$$|T[G_k] - \theta + M(\theta)^{-1} K_{G_k}(\theta)| = o(d_*(G_k,F_\theta)).$$
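The first order expansion behind Theorem 4.1 can be examined numerically for the Huber location functional at the standard normal distribution. In the sketch below $G_\epsilon = (1-\epsilon)\Phi + \epsilon N(3,1)$, the linear term is $-M(\theta)^{-1} K_G(\theta)$ with $\theta = 0$ and $M(\theta) = -(2\Phi(k) - 1)$, and the Kolmogorov distance is $\epsilon(2\Phi(1.5) - 1)$. The contaminating distribution, the constant `K`, and the function names are assumptions made for illustration.

```python
import math

K = 1.345  # illustrative tuning constant


def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def mean_psi(mu, k=K):
    """E[psi_k(Y)] for Y ~ N(mu, 1), where psi_k clips to [-k, k]."""
    a, b = -k - mu, k - mu
    return (k * (1.0 - Phi(b)) - k * Phi(a)
            + mu * (Phi(b) - Phi(a)) + phi(a) - phi(b))


def K_G(tau, eps, shift=3.0):
    """K_G(tau) = integral of psi(x - tau) dG(x) for G = (1-eps) N(0,1) + eps N(shift,1)."""
    return (1.0 - eps) * mean_psi(-tau) + eps * mean_psi(shift - tau)


def T(eps, shift=3.0):
    """Root of K_G by bisection; K_G is decreasing in tau."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if K_G(mid, eps, shift) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def remainder_ratio(eps, shift=3.0):
    """|T[G] - theta - linear term| / d_k(G, Phi), with theta = 0."""
    M = -(2.0 * Phi(K) - 1.0)                       # derivative of K_Phi at 0
    linear = -K_G(0.0, eps, shift) / M
    d_k = eps * (2.0 * Phi(shift / 2.0) - 1.0)      # sup_x |G(x) - Phi(x)|
    return abs(T(eps, shift) - linear) / d_k


ratios = [remainder_ratio(e) for e in (0.1, 0.01, 0.001)]
```

The entries of `ratios` shrink as the contamination fraction shrinks, consistent with a remainder of smaller order than the distance.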

5. EXAMPLES AND CONCLUSION

Huber (1964, 1981) introduced a proposal for estimation of location and scale of the normal distribution defined as a solution of

$$\int \psi\Big(\frac{x - \tau_1}{\tau_2}\Big)\, dG(x) = 0,$$

where $\Theta = \{(\tau_1,\tau_2) : -\infty < \tau_1 < \infty,\ \tau_2 > 0\}$ and the vector function $\psi = (\psi_1,\psi_2)'$, where

$$\psi_1(x) = \max[-k, \min(k,x)], \qquad \psi_2(x) = \psi_1^2(x) - \beta(k),$$

and $\beta(k) = \int \min(k^2,x^2)\, d\Phi(x)$. Here $\Phi$ denotes the normal distribution. Setting $\underline{\theta} = (\theta_1,\theta_2)$, where the underline now distinguishes the vector parameter, it follows that since $K_\Phi(\underline{\tau})$ is continuously differentiable, the Jacobian

$$M(\underline{\theta}) = -\theta_2^{-1} \begin{pmatrix} \int \psi_1'(y)\, d\Phi(y) & 0 \\ 0 & \int y\,\psi_2'(y)\, d\Phi(y) \end{pmatrix}. \tag{5.1}$$

Condition $A'_0$ follows since $E_\Phi[\psi] = 0$. $A'_1$, $A'_2$ hold by inspection, and $A'_3$ holds since $M(\underline{\theta})$ is nonsingular. Remark 2.2 suffices for the first part of $A'_4$. To apply theorem 2.1 consider the

function

f(x,;E) ::; I (x)(Tl-kT 2 ,Tl+k'r Z)

1+ -

-1 -ke-

It is clear that $A = \{f(\cdot,\underline{\tau}) : \underline{\tau} \in D\}$ forms a bounded equicontinuous class of functions on $E$. Also

J f(x,!)dG(x)hl-kT z,Tl+k1 Z)

where differentiation of W is with respect to the second argument.

::; (x-e I}F.t!(X) ~ 82

while

$A'_4$ follows by Theorem 2.1 and because $\partial\psi(\tau_1 - k\tau_2, \underline{\tau})$ and $\partial\psi(\tau_1 + k\tau_2, \underline{\tau})$ are bounded. Assumption (4.1) holds for the Kolmogorov distance through integration by parts and noting that $\psi$ is a function of total bounded variation. Thus by Theorems 3.1 and 4.1 there exists a root that is Fréchet differentiable at $F_{\underline{\theta}} = \Phi((x - \theta_1)/\theta_2)$ with respect to $d_k$. Since $\Phi$ has a bounded density $T$ is Fréchet differentiable with respect to $d_L$, $d_p$ also.
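The location and scale proposal discussed in this section can be sketched in a few lines, assuming its usual form: $\psi_1$ is the clipped identity, $\psi_2 = \psi_1^2 - \beta(k)$, and both components are applied to the standardized residuals $(x - \tau_1)/\tau_2$. The closed form used for $\beta(k)$, the simple alternating iteration, the starting values, the choice `k = 1.5`, and the sample are assumptions made for illustration rather than the procedure of the paper.

```python
import math
import statistics


def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def beta(k):
    """beta(k) = integral of min(k^2, x^2) dPhi(x), evaluated in closed form."""
    return (2.0 * Phi(k) - 1.0) - 2.0 * k * phi(k) + 2.0 * k * k * (1.0 - Phi(k))


def clipped_residuals(xs, loc, scale, k):
    return [max(-k, min(k, (x - loc) / scale)) for x in xs]


def proposal2(xs, k=1.5, iters=2000):
    """Alternate a location step and a scale step until both equations hold."""
    loc = statistics.median(xs)
    scale = 1.4826 * statistics.median([abs(x - loc) for x in xs])
    b = beta(k)
    for _ in range(iters):
        r = clipped_residuals(xs, loc, scale, k)
        loc += scale * sum(r) / len(r)                       # mean of psi_1 -> 0
        r = clipped_residuals(xs, loc, scale, k)
        scale *= math.sqrt(sum(v * v for v in r) / (len(r) * b))  # mean of psi_1^2 -> beta
    return loc, scale


# Hypothetical sample with one outlying value at 14.0.
sample = [9.8, 10.1, 10.0, 9.9, 10.3, 9.7, 10.2, 10.0, 9.6, 10.4, 14.0, 9.9]
K_TUNE = 1.5
loc_hat, scale_hat = proposal2(sample, k=K_TUNE)
final = clipped_residuals(sample, loc_hat, scale_hat, K_TUNE)
location_equation = sum(final) / len(final)
scale_equation = sum(v * v for v in final) / len(final) - beta(K_TUNE)
```

At convergence both `location_equation` and `scale_equation` are numerically zero, which is the pair of estimating equations with $G = F_n$.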

Consequently, the infinitesimal robustness of this M-estimator at the normal parametric distribution is evident through Fréchet differentiability. It is also Fréchet differentiable at the distribution $F_0((x - \theta_1)/\theta_2)$ for which the density function of $F_0$ is

$$f_0(x) = \frac{1-\epsilon}{\sqrt{2\pi}}\, e^{-x^2/2} \quad \text{for } |x| \le k,$$
$$f_0(x) = \frac{1-\epsilon}{\sqrt{2\pi}}\, e^{k^2/2 - k|x|} \quad \text{for } |x| > k,$$

with $k$ and $\epsilon$ connected through

$$\frac{2\phi(k)}{k} - 2\Phi(-k) = \frac{\epsilon}{1-\epsilon}$$

($\phi = \Phi'$ being the standard normal density). Then the M-estimator coincides with the mle, and provides another example of a robust and asymptotically efficient estimator.
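The relation connecting $k$ and $\epsilon$ has no closed form solution for $k$, but it is easy to solve numerically. The sketch below reads the relation as $2\phi(k)/k - 2\Phi(-k) = \epsilon/(1-\epsilon)$ and uses bisection, relying on the left hand side being strictly decreasing in $k$. The function names and the bracketing interval are assumptions made for illustration.

```python
import math


def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def contamination_for(k):
    """epsilon satisfying 2*phi(k)/k - 2*Phi(-k) = eps / (1 - eps)."""
    r = 2.0 * phi(k) / k - 2.0 * Phi(-k)
    return r / (1.0 + r)


def k_for(eps, lo=1e-6, hi=10.0):
    """Invert contamination_for by bisection (it is decreasing in k)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if contamination_for(mid) > eps:
            lo = mid          # k too small: implied contamination too large
        else:
            hi = mid
    return 0.5 * (lo + hi)


k_5_percent = k_for(0.05)
k_10_percent = k_for(0.10)
```

Larger contamination fractions correspond to smaller clipping constants, so `k_10_percent` is smaller than `k_5_percent`.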

Examples where multiple roots of the equations exist include Hampel's 3-part redescender M-estimator for location dependent on three parameters $a$, $b$, $c$:

$$\psi_{a,b,c}(x) = \begin{cases} x, & |x| \le a \\ a\,\mathrm{sign}(x), & a \le |x| \le b \\ a\,\dfrac{c - |x|}{c - b}\,\mathrm{sign}(x), & b \le |x| \le c \\ 0, & c \le |x|. \end{cases}$$

With the choice of selection functional $\rho(G,\tau) = |\tau - G^{-1}(\tfrac{1}{2})|$, whereby the root closest to the median is selected, the functional $T[\psi_{a,b,c},\rho,\cdot]$ is Fréchet differentiable at $\Phi(x - \theta_1)$.
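The effect of the median-based selection functional can be seen on a small two-cluster sample, where the redescending score has more than one root. In the sketch below the constants `a`, `b`, `c`, the grid search used to locate sign changes, and the sample are hypothetical choices; a root that falls exactly on a grid point would be missed by the strict sign test, which is acceptable for an illustration.

```python
import statistics


def hampel_psi(u, a=1.7, b=3.4, c=8.5):
    """Three part redescending psi function."""
    v = abs(u)
    sign = 1.0 if u >= 0 else -1.0
    if v <= a:
        return u
    if v <= b:
        return a * sign
    if v <= c:
        return a * (c - v) / (c - b) * sign
    return 0.0


def score(xs, tau):
    return sum(hampel_psi(x - tau) for x in xs)


def find_roots(xs, step=0.01):
    """Locate roots of the score in [min(xs), max(xs)] via sign changes and bisection."""
    roots = []
    n_steps = int(round((max(xs) - min(xs)) / step))
    grid = [min(xs) + i * step for i in range(n_steps + 1)]
    for lo, hi in zip(grid, grid[1:]):
        if score(xs, lo) * score(xs, hi) < 0:
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                if score(xs, lo) * score(xs, mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            roots.append(0.5 * (lo + hi))
    return roots


def select_root(xs):
    """Selection functional rho(G, tau) = |tau - median|: take the closest root."""
    med = statistics.median(xs)
    return min(find_roots(xs), key=lambda t: abs(t - med))


# Hypothetical sample: five points near 0 and four near 8.
sample = [-0.4, -0.1, 0.0, 0.2, 0.3, 7.8, 7.9, 8.1, 8.2]
roots = find_roots(sample)
selected = select_root(sample)
```

For this sample there are three roots, and the one nearest the sample median belongs to the larger cluster.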

In a sense weak continuity and Fréchet differentiability of the functional at the empirical distribution function are also important. Weak continuity at $F_n$ indicates stability of the estimate in the presence of rounding errors in the recording of observations, and at least for sufficiently large $n$ the effects of gross errors can be considered blunted. Fréchet differentiability at $F_n$, on the other hand, could be used to justify asymptotics involved in Edgeworth type expansions and bootstrapping, for example as considered in Hampel (1982), Beran (1982). When the psi function is smooth, the only change to the arguments of Clarke (1983) for Fréchet differentiability at $F_n$ is to replace $F_\theta$ by $F_n$ in conditions $A_1$ - $A_4$.

Similarly the same substitution of conditions can be made in the results of this paper; however if it should occur that an observation $X$ falls exactly at the point where $\psi(X,\cdot)$ does not have a continuous partial derivative at $\tau = T[F_n]$ then the generalized gradient $\partial K_{F_n}(T[F_n])$ does not reduce to a single matrix. Even though such an event would occur with probability zero in most foreseeable examples in which the underlying distribution was absolutely continuous, it can be said nevertheless that the proof used in theorem 4.1 does not follow through. In this instance the question of whether $T$ is Fréchet differentiable at $F_n$ is then left open.

At least in the domain of M-functionals defined through (1.2), it can be concluded that Huber's (1981) remarks should not be interpreted in the sense that Fréchet differentiability is too strong. This is only the case for nonrobust M-functionals, and consequently we should consider Fréchet differentiability an advantage.

The problems induced by nonsmooth psi functions are not unique to proofs of Fréchet differentiability, and are applicable to many asymptotic proofs. More frequently it is the case that, rather than consider the difficulties, the appropriate smoothness assumptions are made in the proofs, but somehow the results are expected to be applicable to those continuous but nonsmooth functions also. Nonsmooth analysis can then be considered as one possible avenue of justifying such an approach.

6. ACKNOWLEDGEMENT

The author wishes to express gratitude to Professor F.R. Hampel and Professor R.J. Carroll for encouragement and the opportunity to pursue this research. This research has received partial support from a US Air Force Grant No. F 49620 82 C 009, while the author was visiting the University of North Carolina at Chapel Hill.

6. APPENDIX

The proof of theorem 2.1 is preceded by some necessary lemmas. The notational abbreviation $G(x-) = \lim_{h \downarrow 0} G(x - h)$ is used.

Lemma 6.1: Let $G(x)$ be any distribution function for which $G(x) - G(x-) < \eta/4$ for $x \in (a,b)$, where $a < b$ real, and $\eta > 0$ are given. If $G(b-) - G(a) > \eta$, then there exists a finite partition

$$a = x_0 < x_1 < \cdots < x_{k'} = b,$$

so that

$$G(x_j-) - G(x_{j-1}) < \eta, \quad j = 1,\ldots,k'.$$

Proof: Define $G^{-1}(t) = \inf\{x \mid G(x) \ge t,\ x \in [a,b]\}$. Since $G$ is right continuous $G(G^{-1}(t)) \ge t$. Choose

$$y_j = G^{-1}\Big\{G(a) + \frac{j}{k}\big(G(b-) - G(a)\big)\Big\},$$

where $k \ge 1$ is chosen so that

$$\frac{G(b-) - G(a)}{\eta} < k \le \frac{2\big(G(b-) - G(a)\big)}{\eta}.$$

Then

G(Yj) - G(Yj_l) ? G(a) + teG(b-) - G(a)) - G(Yj_l)

? G(a) + teG(b-) - G(a))

- {G(a) + Lr!.(G(b -) - G(a)) + n/4}

1 - /= k(G(b) - G(a)) - n 4

~ n/4

If y. ~(a,b), j=l, ... ,k,J

then

t !wn

G(y.) - G(y~) Gey.)J J J

? G(y.)J

? n/4

G(y j -1)

- G(y. 1)J-


But this contradicts the initial assumption. Now since

$$G(y_j) \ge G(a) + \frac{j}{k}\big(G(b-) - G(a)\big), \quad j = 1,\ldots,k,$$

then

$$G(y_j-) - G(y_{j-1}) \le \frac{1}{k}\big(G(b-) - G(a)\big) < \eta.$$

Note that $y_0 = a$, $y_1 > a$, and if $y_k < b$, then $G(b-) - G(y_k) = 0$. Let

$$a = x_0 < x_1 < \cdots < x_{k'} = b$$

be the partition formed from $\{y_j\}_{j=0}^{k} \cup \{b\}$.
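The construction in the proof of Lemma 6.1 is easy to carry out when $G$ is continuous, since then $G(x-) = G(x)$ and the jump condition holds trivially. The sketch below takes $G = \Phi$, chooses $k$ as the smallest integer exceeding $(G(b) - G(a))/\eta$, and places the partition points at equally spaced quantiles. The function names, the bisection quantile routine, and the numerical values are assumptions made for illustration.

```python
import math


def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def quantile(t, lo, hi):
    """inf{x in [lo, hi] : Phi(x) >= t}, found by bisection."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if Phi(mid) >= t:
            hi = mid
        else:
            lo = mid
    return hi


def lemma_partition(a, b, eta):
    """Partition a = x_0 < ... < x_k = b with G(x_j) - G(x_{j-1}) < eta for G = Phi."""
    mass = Phi(b) - Phi(a)
    if mass <= eta:
        return [a, b]                       # the lemma's hypothesis mass > eta fails
    k = math.floor(mass / eta) + 1          # mass/eta < k <= 2*mass/eta when mass > eta
    ys = [quantile(Phi(a) + j * mass / k, a, b) for j in range(k + 1)]
    ys[0], ys[-1] = a, b                    # pin the end points exactly
    return ys


ETA = 0.1
points = lemma_partition(-2.0, 2.0, ETA)
increments = [Phi(y1) - Phi(y0) for y0, y1 in zip(points, points[1:])]
```

Each cell of the partition carries the same mass, which is strictly less than `ETA` by the choice of `k`.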

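Lemma 6.2: Let $F_\theta$ be given. Also for given $c > 0$ let $C = (-c, c]$. Then $\forall\, \eta > 0$ $\exists\, \epsilon' > 0$ such that $d_k(G,F_\theta) \le \epsilon'$ implies

$$\sup_{f \in A}\ \sup_{x \in C} \Big| \int_{C \cap I_x} f(y)\, dG(y) - \int_{C \cap I_x} f(y)\, dF_\theta(y) \Big| < \eta, \tag{6.1}$$

where intervals $I_x$ may represent either open or closed intervals from $-\infty$ to $x$.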

Proof: Given $\eta > 0$, let $\{d_i\}_{i=1}^{l}$ be the at most finite set of points in $C$ such that $F_\theta(d_i) - F_\theta(d_i-) \ge \eta/(16H)$, if they exist. Since the family $A$ is equicontinuous and $\bar{C}$, the closure of $C$, is compact, we may choose a decomposition

$$-c = a_0 < a_1 < \cdots < a_m = c$$

so that $a_{i-1} < x < y < a_i$ implies $|f(x) - f(y)| < \eta/4$, for every $f \in A$, $i = 1,\ldots,m$. Let $\{a^*_i\}_{i=0}^{k}$ be the further decomposition obtained by combining the points $\{a_i\}_{i=0}^{m}$ and $\{d_i\}_{i=1}^{l}$. From Lemma 6.1, whenever $F_\theta(a^*_i-) - F_\theta(a^*_{i-1}) \ge \eta/(4H)$ there exists a finite decomposition

$$a^*_{i-1} = x_{i0} < x_{i1} < \cdots < x_{in_i} = a^*_i, \quad i = 1,\ldots,k,$$

for which

$$F_\theta(x_{ij}-) - F_\theta(x_{i(j-1)}) < \eta/(4H), \quad j = 1,\ldots,n_i. \tag{6.2}$$

If $F_\theta(a^*_i-) - F_\theta(a^*_{i-1}) < \eta/(4H)$, set $x_{i0} = a^*_{i-1}$ and $x_{i1} = a^*_i$. That is, no further partitioning is necessary. Let $\{b_i\}_{i=0}^{n'}$ be the set of points that partition $(-c,c]$ formed by combining $\{x_{ij}\}_{j=0}^{n_i}$, $i = 1,\ldots,k$. Denote $F^*_\theta$ the possibly improper distribution that attributes weight $F_\theta(b_i) - F_\theta(b_i-)$ to the points $b_i$, and weight $F_\theta(b_{i+1}-) - F_\theta(b_i)$ to the points $p_i = \tfrac{1}{2}(b_i + b_{i+1})$, $i = 0,\ldots,n'-1$.

Suppose $x \in C$ is given. Then either: (a) there exists an $0 \le i_x \le n'-1$ such that $b_{i_x} < x < b_{i_x+1}$; or (b) there exists an $1 \le i_x \le n'$ for which $x = b_{i_x}$.

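For case (a) and $f \in A$,

$$\Big| \int_{C \cap I_x} f\, dF^*_\theta - \int_{C \cap I_x} f\, dF_\theta \Big| \le \sum_{j=1}^{i_x} \int_{(b_{j-1},b_j)} |f(p_{j-1}) - f(y)|\, dF_\theta(y) + 2H\{F_\theta(b_{i_x+1}-) - F_\theta(b_{i_x})\}$$
$$\le \frac{\eta}{4} F_\theta\{C\} + 2H\{F_\theta(b_{i_x+1}-) - F_\theta(b_{i_x})\}$$
$$< \eta/4 + \eta/2 = 3\eta/4.$$

For case (b), where $x = b_{i_x}$ for some $1 \le i_x \le n'$,

$$\Big| \int_{C \cap I_x} f\, dF^*_\theta - \int_{C \cap I_x} f\, dF_\theta \Big| \le \sum_{j=1}^{i_x} \int_{(b_{j-1},b_j)} |f(p_{j-1}) - f(y)|\, dF_\theta(y) \le \frac{\eta}{4} F_\theta\{C\}.$$

Hence

$$\sup_{f \in A}\ \sup_{x \in C} \Big| \int_{C \cap I_x} f\, dF^*_\theta - \int_{C \cap I_x} f\, dF_\theta \Big| < 3\eta/4.$$

This is true for any distribution satisfying the inequalities (6.2).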

In particular we can choose $\epsilon^*$ such that $d_k(G,F_\theta) < \epsilon^*$ implies

$$G(x_{ij}-) - G(x_{i(j-1)}) < \eta/(4H), \quad j = 1,\ldots,n_i,\ i = 1,\ldots,k.$$

Let $G^*$ be the corresponding improper measure constructed from $G$. Then

$$\sup_{f \in A}\ \sup_{x \in C} \Big| \int_{C \cap I_x} f\, dG^* - \int_{C \cap I_x} f\, dG \Big| < 3\eta/4.$$

It is now convenient to consider case (b) first:

$$\sup_{f \in A} \Big| \int_{C \cap I_x} f\, dG^* - \int_{C \cap I_x} f\, dF^*_\theta \Big| \tag{+}$$
$$\le H \sum_{j=1}^{i_x} \Big| \int_{(b_{j-1},b_j)} d(G - F_\theta) \Big| + H \sum_{j=1}^{i_x} |G\{b_j\} - F_\theta\{b_j\}|$$
$$\le H \sum_{j=1}^{n'} \Big| \int_{(b_{j-1},b_j)} d(G - F_\theta) \Big| + H \sum_{j=1}^{n'} |G\{b_j\} - F_\theta\{b_j\}|. \tag{*}$$

Choose $0 < \epsilon' < \epsilon^*$ such that $d_k(G,F_\theta) < \epsilon'$ implies $(*) < \eta/4$. There are two possibilities for case (a). Either

$$b_{i_x} < x < p_{i_x} \quad \text{for some } 0 \le i_x \le n'-1,$$

whence

$$(+) \le (*) + \sup_{f \in A} \Big| \int_{(b_{i_x},c] \cap I_x} f\, d(G^* - F^*_\theta) \Big| = (*) < \eta/4,$$

or

$$p_{i_x} \le x < b_{i_x+1}, \quad 0 \le i_x \le n'-1,$$

for which

$$(+) \le (*) + H\,|G(b_{i_x+1}-) - G(b_{i_x}) - F_\theta(b_{i_x+1}-) + F_\theta(b_{i_x})| < \eta/2.$$

For either case, if it happens that $d_k(G,F_\theta) < \epsilon'$, then

$$\Big| \int_{C \cap I_x} f\, dG^* - \int_{C \cap I_x} f\, dF^*_\theta \Big| < \eta/2.$$

Then the lemma follows.

Proof of Theorem 2.1: Given $\delta > 0$ choose $c > 0$ so that

$$F_\theta\{E - C\} < \delta/(8H).$$

For any $x \in (-\infty,-c]$ and $G$ within Kolmogorov distance $\delta/(8H)$ from $F_\theta$,

$$\Big| \int_{I_x} f\, dG - \int_{I_x} f\, dF_\theta \Big| \le H\big(G\{I_x\} + F_\theta\{I_x\}\big) \le \delta/2.$$

Let $\epsilon'$ be given by Lemma 6.2 for the choice of $\eta = \delta/2$. Choose $\epsilon = \min\{\epsilon', \delta/(8H)\}$. Then for arbitrary $x > -c$ and $G$ within Kolmogorov distance $\epsilon$ from $F_\theta$,

$$\Big| \int_{I_x} f\, dG - \int_{I_x} f\, dF_\theta \Big| \le H\big(G\{E - C\} + F_\theta\{E - C\}\big) + \sup_{y \in C} \Big| \int_{C \cap I_y} f\, dG - \int_{C \cap I_y} f\, dF_\theta \Big|$$
$$< \delta/2 + \delta/2 \quad \text{by Lemma 6.2.}$$

Hence $d_k(G,F_\theta) < \epsilon$ implies

$$\sup_{f \in A}\ \sup_{x \in E} \Big| \int_{I_x} f\, dG - \int_{I_x} f\, dF_\theta \Big| < \delta.$$

REFERENCES

Beran, R.: Robust location estimates, Ann. Statist., 5, 431-444 (1977)

Beran, R.: Estimated sampling distributions: the bootstrap and competitors, Ann. Statist., 10, 212-225 (1982)

Boos, D.D. and Serfling, R.J.: A note on differentials and the CLT and LIL for statistical functions with application to M-estimates, Ann. Statist., 8, 618-624 (1980)

Carroll, R.J.: On the asymptotic distribution of multivariate M-estimates, J. of Mult. Analysis, 8, 361-371 (1978)

Clarke, B.R.: Uniqueness and Fréchet differentiability of functional solutions to maximum likelihood type equations, Ann. Statist., 1196-1205 (1983)

Clarke, F.H.: Optimization and Nonsmooth Analysis, Wiley, New York (1983)

Fernholz, L.T.: Von Mises Calculus for Statistical Functionals, Lecture notes in statistics, 19, Springer Verlag, New York (1983)

Hampel, F.R.: Contributions to the Theory of Robust Estimation, Ph.D. Thesis, University of California, Berkeley (1968)

Hampel, F.R.: A general qualitative definition of robustness, Ann. Math. Statist., 42, 1887-1896 (1971)

Hampel, F.R.: The influence curve and its role in robust estimation, J. Amer. Statist. Ass., 62, 1179-1186 (1974)

Hampel, F.R.: Optimally bounding the gross error-sensitivity and the influence of position in factor space, Proc. Amer. Statist. Assoc., Statist. Computing Section, 59-64 (1978)

Hampel, F.R.: Small-sample asymptotic distributions of M-estimators of location, Biometrika, 69, 29-46 (1982)

Huber, P.J.: Robust estimation of a location parameter, Ann. Math. Statist., 35, 73-101 (1964)

Huber, P.J.: The behaviour of maximum likelihood estimates under non standard conditions, in: Proc. Fifth Berkeley Symposium on Mathematical Statistics and Probability Vol. 1, University of California Press, Berkeley (1967)

Huber, P.J.: Robust Statistical Procedures, Regional Conference Series in Applied Mathematics No. 27, Soc. Industr. App. Math., Philadelphia, Penn. (1977)

Huber, P.J.: Robust Statistics, Wiley, New York (1981)

Kallianpur, G.: Von Mises functions and maximum likelihood estimation, Sankhya, Ser. A., 23, 149-158 (1963)

Reeds, J.A.: On the Definition of von Mises Functions, Ph.D. Thesis, Department of Statistics, Harvard University, Cambridge, Mass. (1976)

Von Mises, R.: On the asymptotic distribution of differentiable statistical functions, Ann. Math. Statist., 18, 309-348 (1947)