
Central Limit Theorem and convergence to stable laws in Mallows distance

Oliver Johnson and Richard Samworth
Statistical Laboratory, University of Cambridge, Wilberforce Road, Cambridge, CB3 0WB, UK.

October 3, 2005

Running title: CLT and stable convergence in Mallows distance
Keywords: Central Limit Theorem, Mallows distance, probability metric, stable law, Wasserstein distance

    Abstract

We give a new proof of the classical Central Limit Theorem, in the Mallows ($L^r$-Wasserstein) distance. Our proof is elementary in the sense that it does not require complex analysis, but rather makes use of a simple subadditive inequality related to this metric. The key is to analyse the case where equality holds. We provide some results concerning rates of convergence. We also consider convergence to stable distributions, and obtain a bound on the rate of such convergence.

    1 Introduction and main results

The spirit of the Central Limit Theorem, that normalised sums of independent random variables converge to a normal distribution, can be understood in different senses, according to the distance used. For example, in addition to the standard Central Limit Theorem in the sense of weak convergence, we mention the proofs in Prohorov (1952) of $L^1$ convergence of densities, in Gnedenko and Kolmogorov (1954) of $L^\infty$ convergence of densities, in Barron (1986) of convergence in relative entropy and in Shimizu (1975) and Johnson and Barron (2004) of convergence in Fisher information.

In this paper we consider the Central Limit Theorem with respect to the Mallows distance and prove convergence to stable laws in the infinite variance setting. We study the rates of convergence in both cases.

Definition 1.1 For any $r > 0$, we define the Mallows $r$-distance between probability distribution functions $F_X$ and $F_Y$ as
$$d_r(F_X, F_Y) = \left( \inf_{(X,Y)} \mathbb{E}|X - Y|^r \right)^{1/r},$$
where the infimum is taken over pairs $(X, Y)$ whose marginal distribution functions are $F_X$ and $F_Y$ respectively, and may be infinite. Where it causes no confusion, we write $d_r(X, Y)$ for $d_r(F_X, F_Y)$.
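Since the optimal coupling turns out to be the quantile coupling $X^* = F_X^{-1}(U)$, $Y^* = F_Y^{-1}(U)$ (Lemma 2.3 below), $d_r$ is easy to estimate from data: sort two equal-size samples and average $|x_{(i)} - y_{(i)}|^r$. The following sketch is ours, not from the paper; it checks the estimator against the closed form $d_2^2 = (\mu_X - \mu_Y)^2 + (\sigma_X - \sigma_Y)^2$ for two normal laws.

```python
import numpy as np

rng = np.random.default_rng(0)

def mallows_distance(x, y, r=2.0):
    """Monte Carlo estimate of d_r(F_X, F_Y) from two equal-size samples.

    Sorting both samples pairs the i-th empirical quantiles, i.e. the
    coupling X* = F_X^{-1}(U), Y* = F_Y^{-1}(U) that attains the infimum
    for r >= 1 (Lemma 2.3 below)."""
    x_sorted, y_sorted = np.sort(x), np.sort(y)
    return np.mean(np.abs(x_sorted - y_sorted) ** r) ** (1.0 / r)

# For two normals the optimal map is linear, so
# d_2^2 = (mu_X - mu_Y)^2 + (sigma_X - sigma_Y)^2 = 1 + 1 = 2 here.
x = rng.normal(0.0, 1.0, size=200_000)
y = rng.normal(1.0, 2.0, size=200_000)
print(mallows_distance(x, y) ** 2)  # approximately 2.0
```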

Define $\mathcal{F}_r$ to be the set of distribution functions $F$ such that $\int |x|^r \, dF(x) < \infty$. For $\sigma^2 > 0$, we write $Z_{\mu,\sigma^2}$ for a $N(\mu, \sigma^2)$ random variable, and abbreviate $Z_{0,\sigma^2}$ to $Z_{\sigma^2}$.

Theorem 1.2 Let $X_1, X_2, \ldots$ be independent and identically distributed random variables with mean zero and finite variance $\sigma^2 > 0$, and let $S_n = (X_1 + \ldots + X_n)/\sqrt{n}$. Then
$$\lim_{n \to \infty} d_2(S_n, Z_{\sigma^2}) = 0.$$

Moreover, Theorem 3.2 shows that for any $r \geq 2$, if $d_r(X_i, Z_{\sigma^2}) < \infty$ then $\lim_{n\to\infty} d_r(S_n, Z_{\sigma^2}) = 0$.

Theorem 1.3 Let $X_1, X_2, \ldots$ be independent random variables with mean zero ($\alpha > 1$), and let $S_n = (X_1 + \ldots + X_n)/n^{1/\alpha}$. If there exists an $\alpha$-stable random variable $Y$ such that $\sup_i d_\alpha(X_i, Y) < \infty$, then $\lim_{n\to\infty} d_\alpha(S_n, Y) = 0$.

The proof of Theorem 1.2 runs along the following lines. Suppose the $X_i$ have mean $\mu$ and variance $\sigma^2 > 0$; write $D(X)$ for $d_2^2(X, Z_{\mu,\sigma^2})$, and let $T_k = S_{2^k}$.

1. In Proposition 2.4, we observe that

$$D\left(\frac{X_1 + X_2}{\sqrt{2}}\right) \leq D(X_1), \qquad (1)$$

with equality if and only if $X_1, X_2 \sim Z_{\mu,\sigma^2}$. Hence $D(T_k)$ is decreasing and bounded below, so converges to some $D_\infty$.

2. In Proposition 2.5, we use a compactness argument to show that there exists a strictly increasing sequence $(k_r)$ and a random variable $T$ such that
$$\lim_{r\to\infty} D(T_{k_r}) = D(T).$$
Further,
$$\lim_{r\to\infty} D(T_{k_r+1}) = \lim_{r\to\infty} D\left(\frac{T_{k_r} + T_{k_r}'}{\sqrt{2}}\right) = D\left(\frac{T + T'}{\sqrt{2}}\right),$$
where the $T_{k_r}'$ and $T'$ are independent copies of $T_{k_r}$ and $T$ respectively.


3. We combine these two results: since $D(T_{k_r})$ and $D(T_{k_r+1})$ are both subsequences of the convergent sequence $D(T_k)$, they must have a common limit. That is,
$$D_\infty = D(T) = D\left(\frac{T + T'}{\sqrt{2}}\right),$$
so by the condition for equality in Proposition 2.4, we deduce that $T \sim N(0, \sigma^2)$ and $D_\infty = 0$.

4. Proposition 2.4 implies the standard subadditive relation
$$(m + n)D(S_{m+n}) \leq mD(S_m) + nD(S_n).$$
Now Theorem 6.6.1 of Hille (1948) implies that $D(S_n)$ converges to $\inf_n D(S_n) = 0$.

    The proof of Theorem 1.3 is given in Section 5.
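To see steps 1–4 in action, one can estimate $D(T_k)$ by pairing empirical quantiles of $T_k$ with the corresponding normal quantiles. The sketch below is our illustration (centred $\mathrm{Exp}(1)$ increments are an arbitrary choice); it shows $D(T_k)$ decreasing towards zero, as inequality (1) and step 4 predict.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

def D_hat(sample, sigma=1.0):
    """Estimate D(X) = d_2^2(X, Z_{sigma^2}) via the quantile coupling."""
    n = len(sample)
    u = (np.arange(1, n + 1) - 0.5) / n
    return np.mean((np.sort(sample) - sigma * norm.ppf(u)) ** 2)

# X_i ~ Exp(1) - 1 (mean 0, variance 1); T_k = S_{2^k}.
for k in range(7):
    m = 2 ** k
    t_k = (rng.exponential(1.0, size=(100_000, m)).sum(axis=1) - m) / np.sqrt(m)
    print(k, round(D_hat(t_k), 5))  # decreasing in k, tending to 0
```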

    2 Subadditivity of Mallows distance

The Mallows distance and related metrics originated with a transportation problem posed by Monge in 1781 (Rachev 1984; Dudley 1989, pp. 329–330). Kantorovich generalised this problem, and considered the distance obtained by minimising $\mathbb{E}c(X, Y)$, for a general metric $c$ (known as the cost function), over all joint distributions of pairs $(X, Y)$ with fixed marginals. This distance is also known as the Wasserstein metric. Rachev (1984) reviews applications to differential geometry, infinite-dimensional linear programming and information theory, among many others. Mallows (1972) focused on the metric which we have called $d_2$, while $d_1$ is sometimes called the Gini index.

In Lemma 2.3 below, we review the existence and uniqueness of the construction which attains the infimum in Definition 1.1, using the concept of a quasi-monotone function.

Definition 2.1 A function $k : \mathbb{R}^2 \to \mathbb{R}$ induces a signed measure $\mu_k$ on $\mathbb{R}^2$ given by
$$\mu_k\{(x, x'] \times (y, y']\} = k(x', y') + k(x, y) - k(x', y) - k(x, y').$$
We say that $k$ is quasi-monotone if $\mu_k$ is a non-negative measure.


The function $k(x, y) = -|x - y|^r$ is quasi-monotone for $r \geq 1$, and if $r > 1$ then the measure $\mu_k$ is absolutely continuous, with a density which is positive Lebesgue almost everywhere. Tchen (1980, Corollary 2.1) gives the following result, a two-dimensional version of integration by parts.

Lemma 2.2 Let $k(x, y)$ be a quasi-monotone function and let $H_1(x, y)$ and $H_2(x, y)$ be distribution functions with the same marginals, where $H_1(x, y) \leq H_2(x, y)$ for all $x, y$. Suppose there exists an $H_1$- and $H_2$-integrable function $g(x, y)$, bounded on compact sets, such that $|k(x_B, y_B)| \leq g(x, y)$, where $x_B = \max\{-B, \min\{x, B\}\}$. Then
$$\int k(x, y)\,dH_2(x, y) - \int k(x, y)\,dH_1(x, y) = \int \{\bar{H}_2(x, y) - \bar{H}_1(x, y)\}\,d\mu_k(x, y).$$
Here $\bar{H}_i(x, y) = P(X < x, Y < y)$, where $(X, Y)$ have joint distribution function $H_i$.

Lemma 2.3 For $r \geq 1$, consider the joint distribution of pairs $(X, Y)$ where $X$ and $Y$ have fixed marginals $F_X$ and $F_Y$, both in $\mathcal{F}_r$. Then
$$\mathbb{E}|X^* - Y^*|^r \leq \mathbb{E}|X - Y|^r, \qquad (2)$$
where $X^* = F_X^{-1}(U)$, $Y^* = F_Y^{-1}(U)$ and $U \sim U(0, 1)$. For $r > 1$, equality is attained only if $(X, Y) \sim (X^*, Y^*)$.

Proof Observe, as in Fréchet (1951), that if the random variables $X, Y$ have fixed marginals $F_X$ and $F_Y$, then
$$P(X \leq x, Y \leq y) \leq H^+(x, y), \qquad (3)$$
where $H^+(x, y) = \min(F_X(x), F_Y(y))$. This bound is achieved by taking $U \sim U(0, 1)$ and setting $X^* = F_X^{-1}(U)$, $Y^* = F_Y^{-1}(U)$.

Thus, by Lemma 2.2, with $k(x, y) = -|x - y|^r$ for $r \geq 1$, and taking $H_1(x, y) = P(X \leq x, Y \leq y)$ and $H_2 = H^+$, we deduce that
$$\mathbb{E}|X - Y|^r - \mathbb{E}|X^* - Y^*|^r = \int \{\bar{H}^+(x, y) - \bar{H}_1(x, y)\}\,d\mu_k(x, y) \geq 0,$$
so $(X^*, Y^*)$ achieves the infimum in the definition of the Wasserstein distance.


Finally, since taking $r > 1$ implies that the measure $\mu_k$ has a strictly positive density with respect to Lebesgue measure, we can only have equality in (2) if $P(X \leq x, Y \leq y) = \min\{F_X(x), F_Y(y)\}$ Lebesgue almost everywhere. But the joint distribution function is right-continuous, so this condition determines the value of $P(X \leq x, Y \leq y)$ everywhere.

Using the construction in Lemma 2.3, Bickel and Freedman (1981) establish that if $X_1$ and $X_2$ are independent and $Y_1$ and $Y_2$ are independent, then
$$d_2^2(X_1 + X_2, Y_1 + Y_2) \leq d_2^2(X_1, Y_1) + d_2^2(X_2, Y_2). \qquad (4)$$
Similar subadditive expressions arise in the proof of convergence of Fisher information in Johnson and Barron (2004). By focusing on the case $r = 2$ in Definition 1.1, and by using the theory of $L^2$ spaces and projections, we establish parallels with the Fisher information argument.

We prove Equation (4) below, and further consider the case of equality in this relation. Major (1978, p. 504) gives an equivalent construction to that given in Lemma 2.3. If $F_Y$ is a continuous distribution function, then $F_Y(Y) \sim U(0, 1)$, so we may generate $Y \sim F_Y$ and take $X = F_X^{-1} \circ F_Y(Y)$. Recall that if $\mathbb{E}X = \mu$ and $\mathrm{Var}\,X = \sigma^2$, we write $D(X)$ for $d_2^2(X, Z_{\mu,\sigma^2})$.
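Major's construction is straightforward to implement. In the sketch below (ours), $Y \sim N(0, 1)$ and $X = F_X^{-1} \circ \Phi(Y)$ for a centred $\mathrm{Exp}(1)$ target; the coupled cost estimates $d_2^2$, and any other coupling with the same marginals, such as an independent one, costs more.

```python
import numpy as np
from scipy.stats import expon, norm

rng = np.random.default_rng(2)

# Major's construction: Y ~ N(0,1), X = F_X^{-1}(Phi(Y)).
# Target F_X: Exp(1) shifted to mean zero, so both marginals have variance 1.
y = rng.normal(0.0, 1.0, size=500_000)
x = expon.ppf(norm.cdf(y)) - 1.0

print(np.mean((x - y) ** 2))        # estimates d_2^2(X, Z_1)

# Breaking the coupling (same marginals, independent pairing) costs more:
print(np.mean((rng.permutation(x) - y) ** 2))  # approx Var X + Var Y = 2
```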

Proposition 2.4 If $X_1, X_2$ are independent, with finite variances $\sigma_1^2, \sigma_2^2 > 0$, then for any $t \in (0, 1)$,
$$D\left(\sqrt{t}X_1 + \sqrt{1 - t}\,X_2\right) \leq tD(X_1) + (1 - t)D(X_2),$$
with equality if and only if $X_1$ and $X_2$ are normal.

Proof We consider bounding $D(X_1 + X_2)$ for independent $X_1$ and $X_2$ with mean zero, since the general result follows on translation and rescaling.

We generate independent $Y_i^* \sim N(0, \sigma_i^2)$, and take $X_i^* = F_{X_i}^{-1} \circ \Phi_{\sigma_i^2}(Y_i^*) = h_i(Y_i^*)$, say, for $i = 1, 2$. Further, writing $\sigma^2 = \sigma_1^2 + \sigma_2^2$, we define $Y^* = Y_1^* + Y_2^*$ and set $X^* = F_{X_1+X_2}^{-1} \circ \Phi_{\sigma^2}(Y_1^* + Y_2^*) = h(Y_1^* + Y_2^*)$, say. Then
$$d_2^2(X_1 + X_2, Y_1 + Y_2) = \mathbb{E}(X^* - Y^*)^2 \leq \mathbb{E}(X_1^* + X_2^* - Y_1^* - Y_2^*)^2 = \mathbb{E}(X_1^* - Y_1^*)^2 + \mathbb{E}(X_2^* - Y_2^*)^2 = d_2^2(X_1, Y_1) + d_2^2(X_2, Y_2).$$


Equality holds if and only if $(X_1^* + X_2^*, Y_1^* + Y_2^*)$ has the same distribution as $(X^*, Y^*)$. By our construction of $Y^* = Y_1^* + Y_2^*$, this means that $(X_1^* + X_2^*, Y_1^* + Y_2^*)$ has the same distribution as $(X^*, Y_1^* + Y_2^*)$, so $P\{X_1^* + X_2^* = h(Y_1^* + Y_2^*)\} = P\{X^* = h(Y_1^* + Y_2^*)\} = 1$. Thus, if equality holds, then
$$h_1(Y_1^*) + h_2(Y_2^*) = h(Y_1^* + Y_2^*) \quad \text{almost surely.} \qquad (5)$$

Brown (1982) and Johnson and Barron (2004) showed that equality holds in Equation (5) if and only if $h, h_1, h_2$ are linear. In particular, Proposition 2.1 of Johnson and Barron (2004) implies that there exist constants $a_i$ and $b_i$ such that
$$\mathbb{E}\{h(Y_1^* + Y_2^*) - h_1(Y_1^*) - h_2(Y_2^*)\}^2 \geq \frac{2\sigma_1^2\sigma_2^2}{(\sigma_1^2 + \sigma_2^2)^2}\left[\mathbb{E}\{h_1(Y_1^*) - a_1Y_1^* - b_1\}^2 + \mathbb{E}\{h_2(Y_2^*) - a_2Y_2^* - b_2\}^2\right]. \qquad (6)$$

Hence, if Equation (5) holds, then $h_i(u) = a_iu + b_i$ almost everywhere. Since $Y_i^*$ and $X_i^*$ have the same mean and variance, it follows that $a_i = 1$, $b_i = 0$. Hence $h_1(u) = h_2(u) = u$ and $X_i^* = Y_i^*$.

Recall that $T_k = S_{2^k}$, where $S_n = (X_1 + \ldots + X_n)/\sqrt{n}$ is a normalised sum of independent and identically distributed random variables of mean zero and finite variance $\sigma^2$.

Proposition 2.5 There exists a strictly increasing sequence $(k_r) \subseteq \mathbb{N}$ and a random variable $T$ such that
$$\lim_{r\to\infty} D(T_{k_r}) = D(T).$$
If $T_{k_r}'$ and $T'$ are independent copies of $T_{k_r}$ and $T$ respectively, then
$$\lim_{r\to\infty} D(T_{k_r+1}) = \lim_{r\to\infty} D\left(\frac{T_{k_r} + T_{k_r}'}{\sqrt{2}}\right) = D\left(\frac{T + T'}{\sqrt{2}}\right).$$

Proof Since $\mathrm{Var}(T_k) = \sigma^2$ for all $k$, the sequence $(T_k)$ is tight. Therefore, by Prohorov's theorem, there exists a strictly increasing sequence $(k_r)$ and a random variable $T$ such that
$$T_{k_r} \stackrel{d}{\to} T \qquad (7)$$


as $r \to \infty$. Moreover, the proof of Lemma 5.2 of Brown (1982) shows that the sequence $(T_{k_r}^2)$ is uniformly integrable. But this, combined with Equation (7), implies that $\lim_{r\to\infty} d_2(T_{k_r}, T) = 0$ (Bickel and Freedman 1981, Lemma 8.3(b)). Hence
$$D(T_{k_r}) = d_2^2(T_{k_r}, Z_{\sigma^2}) \leq \{d_2(T_{k_r}, T) + d_2(T, Z_{\sigma^2})\}^2 \to d_2^2(T, Z_{\sigma^2}) = D(T)$$
as $r \to \infty$. Similarly, $d_2^2(T, Z_{\sigma^2}) \leq \{d_2(T, T_{k_r}) + d_2(T_{k_r}, Z_{\sigma^2})\}^2$, yielding the opposite inequality. This proves the first part of the proposition.

For the second part, it suffices to observe that $T_{k_r} + T_{k_r}' \stackrel{d}{\to} T + T'$ as $r \to \infty$, and $\mathbb{E}(T_{k_r} + T_{k_r}')^2 \to \mathbb{E}(T + T')^2$, and then use the same argument as in the first part of the proposition.

Combining Propositions 2.4 and 2.5, as described in Section 1, the proof of Theorem 1.2 is now complete.

3 Convergence of $d_r$ for general $r$

The subadditive inequality (4) arises in part from a moment inequality; that is, if $X_1$ and $X_2$ are independent with mean zero, then $\mathbb{E}|X_1 + X_2|^r \leq \mathbb{E}|X_1|^r + \mathbb{E}|X_2|^r$ for $r = 2$. Similar results imply that for $r \geq 2$, we have $\lim_{n\to\infty} d_r(S_n, Z_{\sigma^2}) = 0$. First, we prove the following lemma.

Lemma 3.1 Consider independent random variables $V_1, V_2, \ldots$ and $W_1, W_2, \ldots$, where for some $r \geq 2$ and for all $i$, $\mathbb{E}|V_i|^r < \infty$ and $\mathbb{E}|W_i|^r < \infty$. Then for any $m$, there exists a constant $c(r)$ such that
$$d_r^r(V_1 + \ldots + V_m, W_1 + \ldots + W_m) \leq c(r)\left\{\sum_{i=1}^m d_r^r(V_i, W_i) + \left(\sum_{i=1}^m d_2^2(V_i, W_i)\right)^{r/2}\right\}.$$


Proof We consider independent $U_i \sim U(0, 1)$, and set $V_i^* = F_{V_i}^{-1}(U_i)$ and $W_i^* = F_{W_i}^{-1}(U_i)$. Then
$$d_r^r(V_1 + \ldots + V_m, W_1 + \ldots + W_m) \leq \mathbb{E}\left|\sum_{i=1}^m (V_i^* - W_i^*)\right|^r \leq c(r)\left\{\sum_{i=1}^m \mathbb{E}|V_i^* - W_i^*|^r + \left(\sum_{i=1}^m \mathbb{E}|V_i^* - W_i^*|^2\right)^{r/2}\right\},$$
as required. The final line is an application of Rosenthal's inequality (Petrov 1995, Theorem 2.9) to the sequence $(V_i^* - W_i^*)$.

Using Lemma 3.1, we establish the following theorem.

Theorem 3.2 Let $X_1, X_2, \ldots$ be independent and identically distributed random variables with mean zero, variance $\sigma^2 > 0$ and $\mathbb{E}|X_1|^r < \infty$ for some $r \geq 2$. If $S_n = (X_1 + \ldots + X_n)/\sqrt{n}$, then
$$\lim_{n\to\infty} d_r(S_n, Z_{\sigma^2}) = 0.$$

Proof Theorem 1.2 covers the case $r = 2$, so we need only consider $r > 2$. We use a scaled version of Lemma 3.1 twice. First, we take $V_i = X_i$, $W_i \sim N(0, \sigma^2)$ and $m = n$, in order to deduce that, by monotonicity of the $r$-norms,
$$d_r^r(S_n, Z_{\sigma^2}) \leq c(r)\left\{n^{1-r/2}d_r^r(X_1, Z_{\sigma^2}) + d_2^2(X_1, Z_{\sigma^2})^{r/2}\right\} \leq c(r)\left(n^{1-r/2} + 1\right)d_r^r(X_1, Z_{\sigma^2}),$$
so that $d_r^r(S_n, Z_{\sigma^2})$ is uniformly bounded in $n$, by $K$, say. Then, for general $n$, define $N = \lfloor\sqrt{n}\rfloor$, take $m = \lceil n/N\rceil$, and $u = n - (m - 1)N \leq N$. In Lemma 3.1, take
$$V_i = X_{(i-1)N+1} + \ldots + X_{iN}, \quad \text{for } i = 1, \ldots, m - 1,$$
$$V_m = X_{(m-1)N+1} + \ldots + X_n,$$
and $W_i \sim N(0, N\sigma^2)$ for $i = 1, \ldots, m - 1$, $W_m \sim N(0, u\sigma^2)$, independently. Now the uniform bound above gives, on rescaling,
$$d_r^r(V_i, W_i) = N^{r/2}d_r^r(S_N, Z_{\sigma^2}) \leq N^{r/2}K \quad \text{for } i = 1, \ldots, m - 1,$$


and $d_r^r(V_m, W_m) = u^{r/2}d_r^r(S_u, Z_{\sigma^2}) \leq N^{r/2}K$. Further, $d_2^2(V_i, W_i) = N d_2^2(S_N, Z_{\sigma^2})$ for $i = 1, \ldots, m - 1$, and $d_2^2(V_m, W_m) = u\,d_2^2(S_u, Z_{\sigma^2}) \leq N d_2^2(S_1, Z_{\sigma^2})$. Hence, using Lemma 3.1 again, we obtain
$$d_r^r(S_n, Z_{\sigma^2}) = \frac{1}{n^{r/2}}\,d_r^r(V_1 + \ldots + V_m, W_1 + \ldots + W_m)$$
$$\leq \frac{c(r)}{n^{r/2}}\left\{\sum_{i=1}^m d_r^r(V_i, W_i) + \left(\sum_{i=1}^m d_2^2(V_i, W_i)\right)^{r/2}\right\}$$
$$\leq c(r)\left\{\frac{mKN^{r/2}}{n^{r/2}} + \left(\frac{N(m-1)}{n}d_2^2(S_N, Z_{\sigma^2}) + \frac{N}{n}d_2^2(S_1, Z_{\sigma^2})\right)^{r/2}\right\}$$
$$\leq c(r)\left\{\frac{mK}{(m-1)^{r/2}} + \left(d_2^2(S_N, Z_{\sigma^2}) + \frac{1}{m-1}d_2^2(S_1, Z_{\sigma^2})\right)^{r/2}\right\}.$$
This converges to zero since $\lim_{n\to\infty} d_2(S_N, Z_{\sigma^2}) = 0$.
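As a sanity check on Theorem 3.2, the quantile-coupling estimator used earlier extends to general $r$; with centred $\mathrm{Exp}(1)$ increments (all moments finite, so the theorem applies for $r = 4$), the estimates decrease with $n$. This is a sketch of ours, and only indicative: the extreme quantiles make the $r = 4$ estimate noisy.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)

def d_r_hat(sample, sigma, r):
    """Quantile-coupling estimate of d_r(F_sample, Phi_{sigma^2})."""
    n = len(sample)
    u = (np.arange(1, n + 1) - 0.5) / n
    return np.mean(np.abs(np.sort(sample) - sigma * norm.ppf(u)) ** r) ** (1 / r)

for n in [1, 4, 16, 64, 256]:
    s_n = (rng.exponential(1.0, size=(200_000, n)).sum(axis=1) - n) / np.sqrt(n)
    print(n, round(d_r_hat(s_n, 1.0, 4.0), 4))  # decreases towards 0
```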

    4 Strengthening subadditivity

Under certain conditions, we obtain a rate for the convergence in Theorem 1.2. Equation (1) shows that $D(T_k)$ is decreasing. Since $D(T_k)$ is bounded below, the difference sequence $D(T_k) - D(T_{k+1})$ converges to zero. As in Johnson and Barron (2004), we examine this difference sequence, to show that its convergence implies convergence of $D(T_k)$ to zero.

Further, in the spirit of Johnson and Barron (2004), we hope that if the difference sequence is small, then equality nearly holds in Equation (5), and so the functions $h, h_1, h_2$ are nearly linear. This implies that if $\mathrm{Cov}(X, Y)$ is close to its maximum, then $X$ is close to $h(Y)$ in the $L^2$ sense.

Following del Barrio et al. (1999), we define a new distance quantity $\tilde{D}(X) = \inf_{m,s^2} d_2^2(X, Z_{m,s^2})$. Notice that $D(X) = 2\sigma^2 - 2\sigma k \leq 2\sigma^2$, where $k = \int_0^1 F_X^{-1}(x)\Phi^{-1}(x)\,dx$. This follows since $F_X^{-1}$ and $\Phi^{-1}$ are increasing functions, so $k \geq 0$ by Chebyshev's rearrangement lemma. Using results of del Barrio et al. (1999), it follows that
$$\tilde{D}(X) = \sigma^2 - k^2 = D(X) - \frac{D(X)^2}{4\sigma^2},$$
and convergence of $D(S_n)$ to zero is equivalent to convergence of $\tilde{D}(S_n)$ to zero.

Proposition 4.1 Let $X_1$ and $X_2$ be independent and identically distributed random variables with mean $\mu$, variance $\sigma^2 > 0$ and densities (with respect to Lebesgue measure). Defining $g(u) = \Phi_{\sqrt{2}\mu,\sigma^2}^{-1} \circ F_{(X_1+X_2)/\sqrt{2}}(u)$, if the derivative $g'(u) \geq c$ for all $u$, then
$$D\left(\frac{X_1 + X_2}{\sqrt{2}}\right) \leq \left(1 - \frac{c}{2}\right)D(X_1) + \frac{cD(X_1)^2}{8\sigma^2} \leq \left(1 - \frac{c}{4}\right)D(X_1).$$

Proof As before, translation invariance allows us to take $\mathbb{E}X_i = 0$. For random variables $X, Y$, we consider the difference term in Equation (3) and write $g(u) = F_Y^{-1} \circ F_X(u)$, and $h(u) = g^{-1}(u)$. The function $k(x, y) = -\{x - h(y)\}^2$ is quasi-monotone and induces the measure $d\mu_k(x, y) = 2h'(y)\,dx\,dy$. Taking $H_1(x, y) = P(X \leq x, Y \leq y)$ and $H_2(x, y) = \min\{F_X(x), F_Y(y)\}$ in Lemma 2.2 implies that
$$\mathbb{E}\{X - h(Y)\}^2 = 2\int h'(y)\{\bar{H}_2(x, y) - \bar{H}_1(x, y)\}\,dx\,dy,$$
since $\mathbb{E}\{X^* - h(Y^*)\}^2 = 0$. By assumption $h'(y) \leq 1/c$, so
$$\mathbb{E}\{X - h(Y)\}^2 \leq \frac{2}{c}\{\mathrm{Cov}(X^*, Y^*) - \mathrm{Cov}(X, Y)\}.$$
Again take $Y_1^*, Y_2^*$ independent $N(0, \sigma^2)$ and set $X_i^* = F_{X_i}^{-1} \circ F_{Y_i}(Y_i^*) = h_i(Y_i^*)$. Then define $Y^* = Y_1^* + Y_2^*$ and take $X^* = F_{X_1+X_2}^{-1} \circ F_{Y_1+Y_2}(Y^*)$. Then there exist $a$ and $b$ such that
$$d_2^2(X_1, Y_1) + d_2^2(X_2, Y_2) - d_2^2(X_1 + X_2, Y_1 + Y_2)$$
$$= \mathbb{E}(X_1^* + X_2^* - Y_1^* - Y_2^*)^2 - \mathbb{E}(X^* - Y^*)^2$$
$$= 2\,\mathrm{Cov}(X^*, Y^*) - 2\,\mathrm{Cov}(X_1^* + X_2^*, Y_1^* + Y_2^*)$$
$$\geq c\,\mathbb{E}\{X_1^* + X_2^* - h(Y_1^* + Y_2^*)\}^2$$
$$= c\,\mathbb{E}\{h_1(Y_1^*) + h_2(Y_2^*) - h(Y_1^* + Y_2^*)\}^2 \geq c\,\mathbb{E}\{h_1(Y_1^*) - aY_1^* - b\}^2 \geq c\tilde{D}(X_1),$$
where the penultimate inequality follows by Equation (6). Recall that $D(X) \leq 2\sigma^2$, so that $\tilde{D}(X) = D(X) - D(X)^2/(4\sigma^2) \geq D(X)/2$. The result follows on rescaling.


We briefly discuss the strength of the condition imposed. If $X$ has mean zero, distribution function $F_X$ and continuous density $f_X$, define the scale-invariant quantity
$$C(X) = \inf_u\,(\Phi_{\sigma^2}^{-1} \circ F_X)'(u) = \inf_{p\in(0,1)}\frac{f_X(F_X^{-1}(p))}{\varphi_{\sigma^2}(\Phi_{\sigma^2}^{-1}(p))} = \inf_{p\in(0,1)}\frac{\sigma f_X(F_X^{-1}(p))}{\varphi(\Phi^{-1}(p))},$$
where $\varphi$ and $\Phi$ denote the standard normal density and distribution function. We want to understand when $C(X) > 0$.

Example 4.2 If $X \sim U(0, 1)$, then $C(X) = 1/(\sqrt{12}\,\sup_x \varphi(x)) = \sqrt{\pi/6}$.
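This value is easy to confirm numerically: with $f_X \equiv 1$ and $\sigma = 1/\sqrt{12}$, the ratio $\sigma f_X(F_X^{-1}(p))/\varphi(\Phi^{-1}(p))$ is minimised at $p = 1/2$. A quick check of ours:

```python
import numpy as np
from scipy.stats import norm

# C(X) = inf_p sigma * f_X(F_X^{-1}(p)) / phi(Phi^{-1}(p)) for X ~ U(0,1):
# f_X = 1, sigma = 1/sqrt(12), and phi(Phi^{-1}(p)) peaks at p = 1/2.
p = np.linspace(1e-6, 1 - 1e-6, 2_000_001)
sigma = 1.0 / np.sqrt(12.0)
ratio = sigma / norm.pdf(norm.ppf(p))
print(ratio.min(), np.sqrt(np.pi / 6.0))  # both approximately 0.72360
```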

Lemma 4.3 If $X$ has mean zero and variance $\sigma^2$, then $C(X)^2 \leq \sigma^2/(\sigma^2 + \mathrm{median}(X)^2)$.

Proof By the Mean Value Inequality, for all $p$,
$$|\Phi_{\sigma^2}^{-1}(p)| = |\Phi_{\sigma^2}^{-1}(p) - \Phi_{\sigma^2}^{-1}(1/2)| \geq C(X)|F_X^{-1}(p) - F_X^{-1}(1/2)|,$$
so that
$$\sigma^2 + F_X^{-1}(1/2)^2 = \int_0^1 F_X^{-1}(p)^2\,dp + F_X^{-1}(1/2)^2 = \int_0^1 \{F_X^{-1}(p) - F_X^{-1}(1/2)\}^2\,dp \leq \frac{1}{C(X)^2}\int_0^1 \Phi_{\sigma^2}^{-1}(p)^2\,dp = \frac{\sigma^2}{C(X)^2}.$$

In general, we are concerned with the rate at which $f_X(x) \to 0$ at the edges of the support.

Lemma 4.4 If, for some $\epsilon > 0$,
$$f_X(F_X^{-1}(p)) \geq c(1 - p)^{1-\epsilon} \quad \text{as } p \to 1, \qquad (8)$$
then $\lim_{p\to 1} f_X(F_X^{-1}(p))/\varphi(\Phi^{-1}(p)) = \infty$. Correspondingly, if
$$f_X(F_X^{-1}(p)) \geq cp^{1-\epsilon} \quad \text{as } p \to 0, \qquad (9)$$
then $\lim_{p\to 0} f_X(F_X^{-1}(p))/\varphi(\Phi^{-1}(p)) = \infty$.


Proof Simply note that, by the Mills ratio (Shorack and Wellner 1986, p. 850), $1 - \Phi(x) \sim \varphi(x)/x$ as $x \to \infty$, so that as $p \to 1$,
$$\varphi(\Phi^{-1}(p)) \sim (1 - p)\Phi^{-1}(p) \sim (1 - p)\sqrt{-2\log(1 - p)}.$$

Example 4.5

1. The density of the $n$-fold convolution of $U(0, 1)$ random variables is given by $f_X(x) = x^{n-1}/(n-1)!$ for $0 < x < 1$, hence $F_X^{-1}(p) = (n!\,p)^{1/n}$ and $f_X(F_X^{-1}(p)) = n(n!)^{-1/n}p^{(n-1)/n}$, so that Equation (9) holds.

2. For an $\mathrm{Exp}(1)$ random variable, $f_X(F_X^{-1}(p)) = 1 - p$, so that Equation (8) fails and $C(X) = 0$.

To obtain bounds on $D(S_n)$ as $n \to \infty$, we need to control the sequence $C(S_n)$. Motivated by properties of the (seemingly related) Poincaré constant, we conjecture that $C((X_1+X_2)/\sqrt{2}) \geq C(X_1)$ for independent and identically distributed $X_i$. If this is true and $C(X) = c$, then $C(S_n) \geq c$ for all $n$. Assuming that $C(S_n) \geq c$ for all $n$, note that $D(T_k) \leq (1 - c/4)^k D(X_1) \leq (1 - c/4)^k(2\sigma^2)$. Now
$$D(T_{k+1}) \leq D(T_k)\left(1 - \frac{c}{2}\right)\left\{1 + \frac{cD(T_k)}{8\sigma^2(1 - c/2)}\right\},$$
so
$$\prod_{k=0}^{\infty}\left\{1 + \frac{cD(T_k)}{8\sigma^2(1 - c/2)}\right\} \leq \exp\left\{\sum_{k=0}^{\infty}\frac{cD(T_k)}{8\sigma^2(1 - c/2)}\right\} \leq \exp\left(\frac{1}{1 - c/2}\right).$$
We deduce that
$$D(T_k) \leq D(X_1)\exp\left(\frac{1}{1 - c/2}\right)\left(1 - \frac{c}{2}\right)^k,$$
or $D(S_n) = O(n^{-t})$, where $t = -\log_2(1 - c/2)$.

Remark 4.6 In general, convergence of $d_4(S_n, Z_{\sigma^2})$ cannot occur at a rate faster than $O(1/n)$. This follows because $\mathbb{E}S_n^4 = 3\sigma^4 + \kappa(X_1)/n$, where $\kappa(X)$, the excess kurtosis, is defined by $\kappa(X) = \mathbb{E}X^4 - 3(\mathbb{E}X^2)^2$ (when $\mathbb{E}X = 0$). Thus, by Minkowski's inequality,
$$d_4(S_n, Z_{\sigma^2}) \geq \left|(\mathbb{E}S_n^4)^{1/4} - (\mathbb{E}Z_{\sigma^2}^4)^{1/4}\right| = 3^{1/4}\sigma\left|\left(1 + \frac{\kappa(X_1)}{3\sigma^4 n}\right)^{1/4} - 1\right| = \frac{3^{1/4}|\kappa(X_1)|}{12\sigma^3 n} + O\left(\frac{1}{n^2}\right).$$
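The moment identity behind this remark is elementary to verify. For instance (our check, using Rademacher increments, for which $\sigma^2 = 1$, $\mathbb{E}X^4 = 1$ and so $\kappa = -2$):

```python
# For i.i.d. mean-zero X_i: E(X_1 + ... + X_n)^4 = n EX^4 + 3n(n-1)(EX^2)^2,
# so ES_n^4 = 3 sigma^4 + kappa/n.  Rademacher case: EX^2 = EX^4 = 1, kappa = -2.
def fourth_moment_Sn(n):
    return (n * 1.0 + 3.0 * n * (n - 1) * 1.0) / n ** 2

for n in [1, 2, 10, 100]:
    print(n, fourth_moment_Sn(n), 3.0 - 2.0 / n)  # the two columns agree exactly
```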

Motivated by this remark, and by analogy with the rates discovered in Johnson and Barron (2004), we conjecture that the true rate of convergence is $D(S_n) = O(1/n)$. To obtain this, we would need to control $1 - C(S_n)$.

    5 Convergence to stable distributions

We now consider convergence to other stable distributions. Gnedenko and Kolmogorov (1954) review classical results of this kind. We say that $Y$ is $\alpha$-stable if, when $Y_1, \ldots, Y_n$ are independent copies of $Y$, we have $(Y_1 + \ldots + Y_n - b_n)/n^{1/\alpha} \sim Y$ for some sequence $(b_n)$. Note that $\alpha$-stable variables only exist for $0 < \alpha \leq 2$; we assume for the rest of this section that $\alpha < 2$.

Definition 5.1 If $X$ has a distribution function of the form
$$F_X(x) = \frac{c_1 + b_X(x)}{|x|^\alpha} \quad \text{for } x < 0,$$
$$1 - F_X(x) = \frac{c_2 + b_X(x)}{x^\alpha} \quad \text{for } x \geq 0,$$
where $b_X(x) \to 0$ as $x \to \pm\infty$, then we say that $X$ is in the domain of normal attraction of some stable $Y$ with tail parameters $c_1, c_2$.

Theorem 5 of Section 35 of Gnedenko and Kolmogorov (1954) shows that if $F_X$ is of this form, there exist a sequence $(a_n)$ and an $\alpha$-stable distribution function $F_Y$, determined by the parameters $\alpha, c_1, c_2$, such that
$$\frac{X_1 + \ldots + X_n - a_n}{n^{1/\alpha}} \stackrel{d}{\to} F_Y. \qquad (10)$$

Although Equation (10) is obviously very similar to the standard Central Limit Theorem, one important distinguishing feature is that both $\mathbb{E}|X|^\alpha$ and $\mathbb{E}|Y|^\alpha$ are infinite for $0 < \alpha < 2$.


We use the following moment bounds from von Bahr and Esseen (1965). If $X_1, X_2, \ldots$ are independent, then
$$\mathbb{E}|X_1 + \ldots + X_n|^r \leq \sum_{i=1}^n \mathbb{E}|X_i|^r \quad \text{for } 0 < r \leq 1, \qquad (11)$$
$$\mathbb{E}|X_1 + \ldots + X_n|^r \leq 2\sum_{i=1}^n \mathbb{E}|X_i|^r \quad \text{when } \mathbb{E}X_i = 0, \text{ for } 1 < r \leq 2. \qquad (12)$$

Now, using ideas of Stout (1979), we show that for a subset of the domain of normal attraction, $d_\beta(X, Y) < \infty$ for some $\beta > \alpha$.

Definition 5.2 We say that a random variable is in the domain of strong normal attraction of $Y$ if the function $b_X(x)$ from Definition 5.1 satisfies
$$b_X(x) \leq C|x|^{-\delta},$$
for some constant $C$ and some $\delta > 0$.

Cramér (1963) shows that such random variables have an Edgeworth-style expansion, and thus convergence to $Y$ occurs. However, his proof requires some involved analysis and use of characteristic functions. See also Mijnheer (1984) and Mijnheer (1986), which use bounds based on the quantile transformation described above.

We can regard Definition 5.2 as being analogous to requiring a bounded $(2+\epsilon)$th moment in the Central Limit Theorem, which allows an explicit rate of convergence (via the Berry–Esseen theorem). We now show the relevance of Definition 5.2 to the problem of stable convergence.

Lemma 5.3 If $X$ is in the domain of strong normal attraction of an $\alpha$-stable random variable $Y$, then $d_\beta(X, Y) < \infty$ for some $\beta > \alpha$.

Proof We show that Major's construction always gives a joint distribution $(X, W)$ with $\mathbb{E}|X - W|^\beta < \infty$, and hence $d_\beta(X, W) < \infty$. Following Stout (1979), define a random variable $W$ by
$$P(W > x) = c_2x^{-\alpha} \quad \text{if } x > (2c_2)^{1/\alpha},$$
$$P(W \leq x) = c_1|x|^{-\alpha} \quad \text{if } x < -(2c_1)^{1/\alpha},$$
$$P\left(W \in [-(2c_1)^{1/\alpha}, (2c_2)^{1/\alpha}]\right) = 0.$$


Then for $w > 1/2$, $F_W^{-1}(w) = \{c_2/(1 - w)\}^{1/\alpha}$, and so for $x \geq 0$,
$$x - F_W^{-1}(F_X(x)) = x\left\{1 - \left(\frac{c_2}{c_2 + b_X(x)}\right)^{1/\alpha}\right\}.$$
Now, since $b_X(x) \to 0$, there exists $K$ such that if $x \geq K$ then $|b_X(x)| \leq c_2/2$. By the Mean Value Inequality, if $|t| \leq 1/2$, then
$$\left|1 - (1 + t)^{-1/\alpha}\right| \leq \frac{|t|\,2^{1+1/\alpha}}{\alpha},$$
so that for $x \geq K$,
$$\left|x - F_W^{-1} \circ F_X(x)\right| \leq \frac{2^{1+1/\alpha}x|b_X(x)|}{\alpha c_2}.$$
Thus, if $X$ is in the strong domain of attraction, then
$$\int_{|x|\geq K}\left|x - F_W^{-1} \circ F_X(x)\right|^\beta dF_X(x) \leq \left(\frac{2^{1+1/\alpha}C}{\alpha c_2}\right)^\beta\int_{|x|\geq K}|x|^{\beta(1-\delta)}\,dF_X(x).$$
Hence $d_\beta(X, W)$ is finite for all $\beta$ if $\delta \geq 1$, and for $\beta < \alpha/(1 - \delta)$ if $\delta < 1$. Moreover, Mijnheer (1986, Equation (2.2)) shows that if $Y$ is $\alpha$-stable, then as $x \to \infty$,
$$P(Y \geq x) = c_2x^{-\alpha} + O\left(\frac{1}{x^{2\alpha}}\right),$$
and so $Y$ is in its own domain of strong normal attraction. Thus, using the construction above, $d_\beta(Y, W)$ is finite for all $\beta$ if $\alpha \geq 1$ and for $\beta < \alpha/(1 - \alpha)$ otherwise.

Recall that the triangle inequality holds for $d_\beta$ or $d_\beta^\beta$, according as $\beta \geq 1$ or $\beta < 1$. Hence $d_\beta(X, Y)$ is finite for all $\beta$ if $\min(\alpha, \delta) \geq 1$, and for $\beta < \alpha/(1 - \min(\alpha, \delta))$ otherwise.

Theorem 5.4 Let $X_1, X_2, \ldots$ be independent random variables with mean zero ($\alpha > 1$), and let $S_n = (X_1 + \ldots + X_n)/n^{1/\alpha}$. Suppose there exists an $\alpha$-stable random variable $Y$ and $Y_1, Y_2, \ldots$ having the same distribution as $Y$, and satisfying
$$\frac{1}{n}\sum_{i=1}^n\mathbb{E}\left\{|X_i - Y_i|^\alpha\,\mathbf{1}(|X_i - Y_i| > b)\right\} \to 0 \quad \text{as } b \to \infty, \text{ uniformly in } n. \qquad (13)$$
If $\alpha \neq 1$ then $\lim_{n\to\infty} d_\alpha(S_n, Y) = 0$, and if $\alpha = 1$ then there exists a sequence $c_n = n^{-1}\sum_{i=1}^n\mathbb{E}(X_i - Y_i)$ such that $\lim_{n\to\infty} d_\alpha(S_n - c_n, Y) = 0$.

Proof (Suggested by an anonymous referee.) Fix $\epsilon > 0$. Suppose first that $1 \leq \alpha \leq 2$, and let $d_i = \mathbb{E}(X_i - Y_i)$. Note that $d_i = 0$ if $\alpha > 1$.


Let $b > 0$ and define
$$U_i = (X_i - Y_i)\mathbf{1}(|X_i - Y_i| \leq b) - \mathbb{E}\{(X_i - Y_i)\mathbf{1}(|X_i - Y_i| \leq b)\},$$
$$V_i = (X_i - Y_i)\mathbf{1}(|X_i - Y_i| > b) - \mathbb{E}\{(X_i - Y_i)\mathbf{1}(|X_i - Y_i| > b)\}.$$
Then by Equation (12),
$$d_\alpha^\alpha(S_n - c_n, Y) \leq \frac{1}{n}\mathbb{E}\left|\sum_{i=1}^n(X_i - Y_i - d_i)\right|^\alpha = \frac{1}{n}\mathbb{E}\left|\sum_{i=1}^n U_i + \sum_{i=1}^n V_i\right|^\alpha$$
$$\leq \frac{2^{\alpha-1}}{n}\mathbb{E}\left|\sum_{i=1}^n U_i\right|^\alpha + \frac{2^{\alpha-1}}{n}\mathbb{E}\left|\sum_{i=1}^n V_i\right|^\alpha$$
$$\leq \frac{2^{\alpha-1}}{n}\left\{\mathbb{E}\left(\sum_{i=1}^n U_i\right)^2\right\}^{\alpha/2} + \frac{2^\alpha}{n}\sum_{i=1}^n\mathbb{E}|V_i|^\alpha$$
$$\leq \frac{2^{\alpha-1}}{n}\left(\sum_{i=1}^n\mathbb{E}U_i^2\right)^{\alpha/2} + \frac{2^{2\alpha-1}}{n}\sum_{i=1}^n\mathbb{E}\{|X_i - Y_i|^\alpha\,\mathbf{1}(|X_i - Y_i| > b)\} + \frac{2^{2\alpha-1}}{n}\sum_{i=1}^n\left[\mathbb{E}\{|X_i - Y_i|\,\mathbf{1}(|X_i - Y_i| > b)\}\right]^\alpha$$
$$\leq \frac{2^{2\alpha-1}b^\alpha}{n^{1-\alpha/2}} + \frac{2^{2\alpha}}{n}\sum_{i=1}^n\mathbb{E}\{|X_i - Y_i|^\alpha\,\mathbf{1}(|X_i - Y_i| > b)\}.$$
The result follows on choosing $b$ sufficiently large to control the second term, and then $n$ sufficiently large to control the first.

For $0 < \alpha < 1$, take $U_i$ as before, take $V_i = (X_i - Y_i)\mathbf{1}(|X_i - Y_i| > b)$ and $a_i = \mathbb{E}\{(X_i - Y_i)\mathbf{1}(|X_i - Y_i| \leq b)\}$. Now, using Equation (11),
$$d_\alpha^\alpha(S_n, Y) \leq \frac{1}{n}\mathbb{E}\left|\sum_{i=1}^n U_i + \sum_{i=1}^n V_i + \sum_{i=1}^n a_i\right|^\alpha \leq \frac{1}{n}\mathbb{E}\left|\sum_{i=1}^n U_i\right|^\alpha + \frac{1}{n}\mathbb{E}\left|\sum_{i=1}^n V_i\right|^\alpha + \frac{1}{n}\left|\sum_{i=1}^n a_i\right|^\alpha$$
$$\leq \frac{1}{n}\left\{\mathbb{E}\left(\sum_{i=1}^n U_i\right)^2\right\}^{\alpha/2} + \frac{1}{n}\sum_{i=1}^n\mathbb{E}|V_i|^\alpha + \frac{b^\alpha}{n^{1-\alpha}},$$
so again, since $b$ is arbitrary, the result follows.

Note that when $X_1, X_2, \ldots$ are identically distributed, the Lindeberg-type condition (13) reduces to the requirement that $d_\alpha(X_1, Y) < \infty$.


    References

Barron, A. R. (1986). Entropy and the central limit theorem. Ann. Probab. 14, 336–342.

Bickel, P. J. and Freedman, D. A. (1981). Some asymptotic theory for the bootstrap. Ann. Statist. 9, 1196–1217.

Brown, L. D. (1982). A proof of the central limit theorem motivated by the Cramér–Rao inequality. In G. Kallianpur, P. R. Krishnaiah and J. K. Ghosh (eds), Statistics and Probability: Essays in Honor of C. R. Rao, pp. 141–148. North-Holland, New York.

Cramér, H. (1963). On asymptotic expansions for sums of independent random variables with a limiting stable distribution. Sankhyā Ser. A 25, 13–24.

Csiszár, I. (1965). A note on limiting distributions on topological groups. Magyar Tud. Akad. Mat. Kutató Int. Közl. 9, 595–599.

del Barrio, E. et al. (1999). Tests of goodness of fit based on the L2-Wasserstein distance. Ann. Statist. 27, 1230–1239.

Dudley, R. M. (1989). Real Analysis and Probability. Wadsworth and Brooks/Cole Advanced Books and Software, Pacific Grove, CA.

Fréchet, M. (1951). Sur les tableaux de corrélation dont les marges sont données. Ann. Univ. Lyon Sect. A (3) 14, 53–77.

Gnedenko, B. V. and Kolmogorov, A. N. (1954). Limit Distributions for Sums of Independent Random Variables. Addison-Wesley, Cambridge, MA.

Hille, E. (1948). Functional Analysis and Semi-Groups. American Mathematical Society Colloquium Publications, vol. 31. American Mathematical Society, New York.

Johnson, O. T. and Barron, A. R. (2004). Fisher information inequalities and the central limit theorem. Probab. Theory Related Fields 129, 391–409.

Kendall, D. G. (1963). Information theory and the limit theorem for Markov chains and processes with a countable infinity of states. Ann. Inst. Statist. Math. 15, 137–143.

Major, P. (1978). On the invariance principle for sums of independent identically distributed random variables. J. Multivariate Anal. 8, 487–517.

Mallows, C. L. (1972). A note on asymptotic joint normality. Ann. Math. Statist. 43, 508–515.

Mijnheer, J. (1984). On the rate of convergence to a stable limit law. In Limit Theorems in Probability and Statistics, Vol. I, II (Veszprém, 1982), vol. 36 of Colloq. Math. Soc. János Bolyai, pp. 761–775. North-Holland, Amsterdam.

Mijnheer, J. (1986). On the rate of convergence to a stable limit law. II. Litovsk. Mat. Sb. 26, 482–487.

Petrov, V. V. (1995). Limit Theorems of Probability Theory: Sequences of Independent Random Variables. Oxford Science Publications, Oxford.

Prohorov, Y. (1952). On a local limit theorem for densities. Doklady Akad. Nauk SSSR (N.S.) 83, 797–800. In Russian.

Rachev, S. T. (1984). The Monge–Kantorovich problem on mass transfer and its applications in stochastics. Teor. Veroyatnost. i Primenen. 29, 625–653.

Rachev, S. T. and Rüschendorf, L. (1992). Rate of convergence for sums and maxima and doubly ideal metrics. Teor. Veroyatnost. i Primenen. 37, 276–289.

Rachev, S. T. and Rüschendorf, L. (1994). On the rate of convergence in the CLT with respect to the Kantorovich metric. In Probability in Banach Spaces, 9 (Sandjberg, 1993), vol. 35 of Progr. Probab., pp. 193–207. Birkhäuser, Boston.

Rényi, A. (1961). On measures of entropy and information. In J. Neyman (ed.), Proceedings of the 4th Berkeley Symposium on Mathematical Statistics and Probability, pp. 547–561. University of California Press, Berkeley.

Shimizu, R. (1975). On Fisher's amount of information for location family. In G. P. Patil et al. (eds), Statistical Distributions in Scientific Work, Vol. 3, pp. 305–312. Reidel.

Shorack, G. R. and Wellner, J. A. (1986). Empirical Processes with Applications to Statistics. John Wiley and Sons, New York.

Stout, W. (1979). Almost sure invariance principles when $EX_1^2 = \infty$. Z. Wahrsch. Verw. Gebiete 49, 23–32.

Tchen, A. H. (1980). Inequalities for distributions with given marginals. Ann. Probab. 8, 814–827.

von Bahr, B. and Esseen, C.-G. (1965). Inequalities for the $r$th absolute moment of a sum of independent random variables, $1 \leq r \leq 2$. Ann. Math. Statist. 36, 299–303.