IMPROVED QUANTILE INFERENCE VIA FIXED-SMOOTHING ASYMPTOTICS AND EDGEWORTH EXPANSION
DAVID M. KAPLAN
Abstract. Estimation of a sample quantile's variance requires estimation of the probability density at the quantile. The common quantile spacing method involves smoothing parameter m. When m, n → ∞, the corresponding Studentized test statistic asymptotically follows a standard normal distribution. Holding m fixed asymptotically yields a nonstandard distribution dependent on m that contains the Edgeworth expansion term capturing the variance of the quantile spacing. Consequently, the fixed-m distribution is more accurate than the standard normal under both asymptotic frameworks. For the fixed-m test, I propose an m to maximize power subject to size control, as calculated via Edgeworth expansion. Compared with similar methods, the new method controls size better and maintains good or better power in simulations. Results for two-sample quantile treatment effect inference are given in parallel.
Keywords: Edgeworth expansion; Fixed-smoothing asymptotics; Inference; Quantile; Studentize; Testing-optimal.
1. Introduction
This paper considers inference on population quantiles, specifically via the Studen-
tized test statistic from Siddiqui (1960) and Bloch & Gastwirth (1968) (jointly SBG
hereafter), for one- and two-sample setups. Median inference is a special case. Nonparametric inference on conditional quantiles is an immediate extension if the conditioning variables are all discrete¹; for continuous conditioning variables, the approach
Date: July 5, 2013; first version available online January 14, 2011. Email: [email protected]. Thanks to Yixiao Sun for much patient and enthusiastic guidance and discussion. Thanks also to Karen Messer, Dimitris Politis, Andres Santos, Hal White, Brendan Beare, Ivana Komunjer, Joel Sobel, Marjorie Flavin, Kelly Paulson, Matt Goldman, Linchun Chen, and other attentive and interactive listeners from UCSD audiences. Mail: Department of Economics, University of Missouri–Columbia, 909 University Ave, 118 Professional Bldg, Columbia, MO 65211-6040, USA.
¹In addition to common variables like race and gender, variables like age measured in years may be treated as discrete as long as enough observations exist for each value of interest.
in Kaplan (2012) may be used with an adjusted bandwidth rate. Like a t-statistic for
the mean, the SBG statistic is normalized using a consistent estimate of the variance
of the quantile estimator, and it is asymptotically standard normal.
I develop a new asymptotic theory that is more accurate than the standard normal
reference distribution traditionally used with the SBG statistic. With more accuracy
comes improved inference properties,² i.e. controlling size where previous methods
were size-distorted without sacrificing much power. A plug-in procedure to choose
the testing-optimal smoothing parameter m translates the theory into practice.
This work builds partly on Goh (2004), who suggests using fixed-m asymptotics to
improve inference on the Studentized quantile. Goh (2004) uses simulated critical
values to examine the performance of existing m suggestions, which are tailored to
standard normal critical values. Here, a simple and accurate analytic approximation is
given for the critical values. Using this approximation, the new procedure for selecting
m is tailored to the fixed-m distribution. I also provide the theoretical justification
for the accuracy of the fixed-m distribution, which complements the simulations in
Goh (2004).
The two key results here are a nonstandard fixed-m asymptotic distribution and a
higher-order Edgeworth expansion. For a scalar location model, Siddiqui (1960) gives
the fixed-m result, and Hall & Sheather (1988, hereafter cited as “HS88”) give a spe-
cial case of the Edgeworth expansion in Theorem 1 below. The Edgeworth expansion
is more accurate than a standard normal since it contains higher-order terms that
otherwise end up in the remainder, so its remainder becomes of smaller order. There
are intuitive reasons why the fixed-m approximation is also more accurate than the
standard normal, but we show this by rigorous theoretical arguments.
In the standard first-order asymptotics, both m → ∞ and n → ∞ (and m/n → 0);
since m→∞, this may be called “large-m” asymptotics. In contrast, fixed-m asymp-
totics only approximates n → ∞ while fixing m at its actual finite sample value.
Fixed-m is an instance of “fixed-smoothing” asymptotics in the sense that this vari-
ance does not go to zero in the limit as it does in “increasing-smoothing.” It turns
out that the fixed-m asymptotics includes the high-order Edgeworth term captur-
ing the variance of the quantile spacing. Consequently, the fixed-m distribution is
²Inverting the level-α tests proposed here yields level-α confidence intervals. I use hypothesis testing language throughout, but size distortion is analogous to coverage probability error, and higher power corresponds to shorter interval lengths.
higher-order accurate under the conventional large-m asymptotics, while the standard
normal distribution is only first-order accurate. Under the fixed-m asymptotics, the
standard normal is not even first-order accurate. In other words, the fixed-m dis-
tribution is more accurate than the standard normal irrespective of the asymptotic
framework. Similar results on the accuracy of fixed-smoothing asymptotics have been
found in time series inference (e.g., Sun, Phillips & Jin, 2008); this result suggests
fixed-smoothing asymptotics may be valuable more generally.
Not only is accuracy gained with the Edgeworth and fixed-m distributions; they are also sensitive to m. By reflecting the effect of m, they allow us to determine the best
choice of m. With the fixed-m and Edgeworth results, I construct a test dependent
on m using the fixed-m critical values, and then evaluate the type I and type II error
of the test using the more accurate Edgeworth expansion. Then I can optimally select
m to minimize type II error subject to control of type I error. Approximating (instead
of simulating) the fixed-m critical values reveals the Edgeworth/fixed-m connection
above, and it also makes the test easier to implement in practice.
The critical value correction here provides size control robust to incorrectly chosen m,
whereas in HS88 the choice of m is critical to controlling size (or not). In simulations,
the new method has correct size even where the HS88 method is size-distorted. Power
is still good because m is explicitly chosen to maximize it, using the Edgeworth ex-
pansion; in simulations, it can be significantly better than HS88. HS88 do not provide
a separate result for the two-sample case. Monte Carlo simulations show that the new
method controls size better than the common Gaussian plug-in version of HS88 and
various bootstrap methods, while maintaining competitive power. The method of
Hutson (1999), following a fractional order statistic approach, has better properties
but is not always computable. A two-sample analog with similarly good performance
is given by Goldman & Kaplan (2012). For two-sample inference, bootstrap can per-
form well, though the new method is good and open to a particular refinement that
would increase power; more research in both approaches is needed.
Section 2 presents the model and hypothesis testing problem. Section 3 gives a non-
standard asymptotic result and corresponding corrected critical value, while Section
4 gives Edgeworth expansions of the standard asymptotic limiting distribution. Using
these, a method for selecting smoothing parameter m is given in Section 5, which is
followed by a simulation study and conclusion. Results for the two-sample extension
are provided in parallel. More details and discussion are available in the working pa-
per version; full technical proofs and calculations are in the supplemental appendices;
all are available on the author’s website.
2. Quantile estimation and hypothesis testing
For simplicity and intuition, consider an iid sample of continuous random variable
X, whose p-quantile is ξp. The estimator ξ̂p used in SBG is an order statistic. The rth order statistic for a sample of n values is defined as the rth smallest value in the sample and written as Xn,r, such that Xn,1 < Xn,2 < · · · < Xn,n. The SBG estimator is

ξ̂p = Xn,r,   r = ⌊np⌋ + 1,

where ⌊np⌋ is the floor function (greatest integer not exceeding np). Writing the cumulative distribution function (cdf) of X as F(x) and the probability density function (pdf) as f(x) ≡ F′(x), I make the usual assumption that f(x) is positive and continuous in a neighborhood of the point ξp. Consequently, ξp is the unique p-quantile such that F(ξp) = p.
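As a concrete illustration, the order-statistic estimator takes only a few lines; this sketch (the function name is mine, not from the paper's code) assumes an iid sample from a continuous distribution, stored in an array:

```python
import numpy as np

def sbg_quantile(x, p):
    """SBG point estimator: the order statistic X_{n,r} with r = floor(n*p) + 1.
    Sketch; assumes an iid sample with no ties (continuous X)."""
    x = np.sort(np.asarray(x, dtype=float))
    r = int(np.floor(x.size * p)) + 1    # 1-based rank r = ⌊np⌋ + 1
    return x[r - 1]                      # convert to 0-based indexing

print(sbg_quantile(range(1, 11), 0.5))   # n = 10, r = 6 → 6.0
```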
The standard asymptotic result (for the vector form, see Mosteller, 1946; Siddiqui, 1960, also gives the following scalar result) for a central³ quantile estimator is

√n(Xn,r − ξp) →d N[0, p(1 − p){f(ξp)}⁻²].

A consistent estimator Sm,n of 1/f(ξp) that is asymptotically independent of Xn,r leads to the Studentized sample quantile, which has the pivotal asymptotic distribution

(1)   √n(Xn,r − ξp) / [Sm,n√(p(1 − p))] →d N(0, 1).

Siddiqui (1960) and Bloch & Gastwirth (1968) propose and show consistency of

(2)   Sm,n ≡ {n/(2m)}(Xn,r+m − Xn,r−m)

when m → ∞ and m/n → 0 as n → ∞.
For two-sided inference on ξp, I consider the parameterization ξp = β − γ/√n. The null and alternative hypotheses are H0 : ξp = β and H1 : ξp ≠ β, respectively. When
³"Central" means that in the limit, r/n → p ∈ (0, 1) as n → ∞; i.e., r → ∞ and is some fraction of the sample size n. In contrast, "intermediate" would take r → ∞ but r/n → 0 (or r/n → 1, n − r → ∞); "extreme" would fix r < ∞ or n − r < ∞.
γ = 0 the null is true. The test statistic examined in this paper is
(3)   Tm,n ≡ √n(Xn,r − β) / [Sm,n√(p(1 − p))]
and will be called the SBG test statistic due to its use of (2). From (1), Tm,n is
asymptotically standard normal when γ = 0. A corresponding hypothesis test would
then compare Tm,n to critical values from a standard normal distribution.
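The pieces above assemble directly; the following sketch (names and error handling are my own, under the paper's setup of an iid continuous sample with 1 ≤ r − m and r + m ≤ n) computes Tm,n from (3) using the spacing Sm,n from (2):

```python
import numpy as np

def sbg_test_stat(x, p, beta, m):
    """One-sample SBG statistic T_{m,n} from (3), with the quantile
    spacing S_{m,n} from (2).  Illustrative sketch; names are mine."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    r = int(np.floor(n * p)) + 1                 # 1-based rank, r = ⌊np⌋ + 1
    if r - m < 1 or r + m > n:
        raise ValueError("need 1 <= r - m and r + m <= n")
    # S_{m,n} = {n/(2m)}(X_{n,r+m} − X_{n,r−m}), consistent for 1/f(ξp)
    s_mn = n / (2 * m) * (x[r + m - 1] - x[r - m - 1])
    return np.sqrt(n) * (x[r - 1] - beta) / (s_mn * np.sqrt(p * (1 - p)))
```

Under the traditional approach, |Tm,n| is compared to z1−α/2 = 1.96 for α = 0.05; Section 3.2 replaces this with a corrected critical value.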
For the two-sample case, assume that there are independent samples of X and Y ,
with nx and ny observations, respectively. For simplicity, let n = nx = ny. For
instance, if 2n individuals are separated into balanced treatment and control groups,
one might want to test whether the treatment effect at quantile p is zero, at significance level α.
The marginal pdfs are fX(·) and fY (·). Interest is in testing if ξp,x = ξp,y. Under the
null hypothesis H0 : ξp,x = ξp,y = ξp, the first-order asymptotic result is

√n(Xn,r − ξp) − √n(Yn,r − ξp) →d N(0, p(1 − p)[{fX(ξp)}⁻² + {fY(ξp)}⁻²]),
using the fact that the variance of the sum (or difference) of two independent normals is the sum of the variances. The pivot for the two-sample case is then

√n(Xn,r − Yn,r) / {√[{fX(ξp)}⁻² + {fY(ξp)}⁻²]√(p(1 − p))} →d N(0, 1).

The Studentized version uses the same quantile spacing estimators as above:

T̃m,n ≡ √n(Xn,r − Yn,r) / {√[{n/(2m)}²(Xn,r+m − Xn,r−m)² + {n/(2m)}²(Yn,r+m − Yn,r−m)²]√(p(1 − p))}.
The same m is used for X and Y in anticipation of a Gaussian plug-in approach that
would yield the same m for X and Y regardless. This two-sample setup is easily
extended to the case of unequal sample sizes nx ≠ ny, starting with √nx(Xn,r − ξp) − √(nx/ny)·√ny(Yn,r − ξp) and assuming nx/ny is constant asymptotically, but this is
omitted for clarity.
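A minimal sketch of this two-sample statistic, assuming equal sample sizes as in the text (the function name is illustrative, not from the paper's code):

```python
import numpy as np

def sbg_two_sample_stat(x, y, p, m):
    """Two-sample SBG statistic displayed above (equal sample sizes).
    Sketch; names are illustrative."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    n = x.size
    if y.size != n:
        raise ValueError("this sketch assumes nx = ny = n")
    r = int(np.floor(n * p)) + 1
    sx = n / (2 * m) * (x[r + m - 1] - x[r - m - 1])   # spacing for X
    sy = n / (2 * m) * (y[r + m - 1] - y[r - m - 1])   # spacing for Y
    return (np.sqrt(n) * (x[r - 1] - y[r - 1])
            / (np.sqrt(sx ** 2 + sy ** 2) * np.sqrt(p * (1 - p))))
```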
3. Fixed-m asymptotics and corrected critical value
The standard asymptotic result with m → ∞ as n → ∞ is a standard normal
distribution for the SBG test statistic Tm,n under the null hypothesis when γ = 0.
Since in reality m < ∞ for any given finite sample test, holding m fixed as n → ∞ may give a more accurate asymptotic approximation of the true finite sample test statistic distribution. With m → ∞, there is increasing smoothing and Var(Sm,n) = O(1/m) → 0; with m fixed, we have fixed-smoothing asymptotics since that variance does not disappear. The fixed-m distribution is given below, along with a simple formula for critical values that doesn't require simulation.
3.1. Fixed-m asymptotics. Siddiqui (1960, eqn. 5.4) provides the fixed-m asymp-
totic distribution; Goh (2004, Appendix D.1) provides a nice alternative proof for the
median, which readily extends to any quantile.⁴ If γ = 0,

(4)   Tm,n →d Z/V4m ≡ Tm,∞ as n → ∞, m fixed,

with Z ∼ N(0, 1), V4m ∼ χ²4m/(4m), Z ⊥⊥ V4m, and Sm,n and Tm,n as in (2) and (3).
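The limit in (4) is straightforward to simulate, which makes its fatter-than-normal tails easy to see; this Monte Carlo sketch (the value of m, the seed, and the replication count are illustrative choices, not from the paper) draws Z and V4m independently:

```python
import numpy as np

# Monte Carlo sketch of the fixed-m limit in (4): T_{m,∞} = Z / V_{4m},
# Z ~ N(0,1) independent of V_{4m} ~ χ²_{4m}/(4m).
rng = np.random.default_rng(0)
m, reps = 5, 200_000
z = rng.standard_normal(reps)
v = rng.chisquare(4 * m, reps) / (4 * m)
t = z / v
q95 = np.quantile(np.abs(t), 0.95)   # simulated two-sided 5% critical value
# Fatter tails than N(0,1): q95 exceeds 1.96, near the approximation
# 1.96 + 1.88/m derived in Section 3.2
print(round(float(q95), 2))
```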
The above distribution is conceptually similar to the Student’s t-distribution. A t-
statistic from normally distributed iid data has a standard normal distribution if either
the variance is known (so denominator is constant) or the sample size approaches
infinity. The distribution Tm,∞ is also standard normal if either the variance is known
(denominator in Tm,n is constant) or m→∞. When using an estimated variance in
a finite sample t-statistic, the more accurate t-approximation has fatter tails than a
standard normal, and it is given by Z/√Vv, where Z ∼ N(0, 1), Vv ∼ χ²v/v with v the
degrees of freedom, and Z ⊥⊥ Vv. Similarly, when an estimated variance is used in the
SBG test statistic (the random Sm,n instead of constant 1/f(ξp) in the denominator),
the result is the distribution in (4) above with asymptotically independent numerator
and denominator. The fixed-m distribution reflects this uncertainty in the variance
estimator that is lost under the standard asymptotics.
For the two-sample test statistic under the null, as n → ∞ with m fixed,

T̃m,n →d Z/U ≡ T̃m,∞, where

U ≡ (1 + ε)^{1/2},   ε ≡ [(V4m,1² − 1) + (V4m,2² − 1){fX(ξp)/fY(ξp)}²] / [1 + {fX(ξp)/fY(ξp)}²],
and Z, V4m,1, and V4m,2 are mutually independent. The derivation is in the full
appendix online. The strategy is the same, using results from Siddiqui (1960) for
asymptotic independence and distributions for each component.
Unlike the one-sample case, T̃m,∞ is not pivotal. We can either estimate the density ratio fX(ξp)/fY(ξp) or consider the upper bound for critical values. This is the
⁴The result stated for a general quantile in Theorem 2 in Section 3.5 of Goh (2004) appears to have a typo and not follow from a generalization of the proof given, nor does it agree with Siddiqui (1960).
same nuisance parameter faced by the two-sample fractional order statistic method of Goldman & Kaplan (2012). Note that ε is a weighted average of V4m,1² − 1 and V4m,2² − 1, where the weights sum to one and V4m,1 ⊥⊥ V4m,2. To see the effect of the weights, consider any W1 and W2 with Cov(W1, W2) = 0, Var(W1) = Var(W2) = σ²W, and λ ∈ [0, 1]. Then

Var{λW1 + (1 − λ)W2} = λ²Var(W1) + (1 − λ)²Var(W2) + 2λ(1 − λ)Cov(W1, W2)
                     = σ²W{λ² + (1 − λ)²}.

This is maximized by λ = 1 or 1 − λ = 1, giving σ²W. The minimum has first order condition 0 = 2λ + 2(1 − λ)(−1), yielding λ = 1/2 = 1 − λ and Var{λW1 + (1 − λ)W2} = σ²W/2. This means the variance of ε (and U and T̃m,∞) is smallest when the weights are each 1/2, which is when fX(ξp) = fY(ξp). When the weights are zero and one, which is when the variance of ε is largest, we get the special case of testing against a constant and T̃m,∞ = Tm,∞. Thus, critical values from the one-sample case provide conservative inference in the two-sample case.
3.2. Corrected critical value. An approximation of the fixed-m cdf around the standard normal cdf will lead to critical values based on the standard normal distribution. The standard normal cdf is Φ(·), the standard normal pdf is φ(·) = Φ′(·), and the first derivative is φ′(z) = −zφ(z). The χ²4m distribution has mean 4m, variance 8m, and third central moment 32m. The general approach here is to use the independence of Z and V4m to rewrite the cdf in terms of Φ, and then to expand around E(V4m) = 1. Using (4), for a critical value z,

P(Tm,∞ < z) = P(Z/V4m < z) = E{Φ(zV4m)} = E[Φ{z + z(V4m − 1)}]
            = E[Φ(z) + Φ′(z)z(V4m − 1) + (1/2)Φ″(z){z(V4m − 1)}²] + O(m⁻²)
            = Φ(z) − {z³φ(z)/2}E{((χ²4m − 4m)/(4m))²} + O(m⁻²)
(5)         = Φ(z) − z³φ(z)/(4m) + O(m⁻²).
Note that (5) is a distribution function itself: its derivative in z is {φ(z)/(4m)}(z⁴ − 3z² + 4m), which is positive for all z when m > 9/16 (as it always is); the limits at −∞ and ∞ are zero and one; and it is càdlàg since it is differentiable everywhere.
The approximation error O(m⁻²) does not change if m is fixed, but it goes to zero as m → ∞, which the selected m does as n → ∞ (see Section 5.3). Uniform convergence of Φ(z) − z³φ(z)/(4m) to P(Tm,∞ < z) over z ∈ R as m → ∞ then obtains via Polya's
Theorem (e.g., DasGupta, 2008, Thm. 1.3(b)). Appendix A shows the accuracy of
(5) for m > 2; for m ≤ 2, simulated critical values can be used, as in the provided
code.
To find the value z that makes the nonstandard cdf above take probability 1 − α (for rejection probability α under the null for an upper one-sided test), I set Φ(z) − z³φ(z)/(4m) = 1 − α and solve for z. If z is some deviation of order m⁻¹ around the value z1−α such that Φ(z1−α) = 1 − α for the standard normal cdf, then solving for c in z = z1−α + c/m gives

(6)   c = z³1−α/4 + O(m⁻¹),

and thus the upper one-sided test's corrected critical value is

z = z1−α + c/m = z1−α + z³1−α/(4m) + O(m⁻²).
For a symmetric two-sided test, since the additional term in (5) is an odd function of z, the critical value is the same but with z1−α/2, yielding

(7)   z = zα,m + O(m⁻²),   zα,m ≡ z1−α/2 + z³1−α/2/(4m).

Note zα,m > z1−α/2 and depends on m; e.g., z.05,m = 1.96 + 1.88/m. Using the same method with an additional term in the expansion, as compared in Appendix A,

z = z1−α/2 + z³1−α/2/(4m) + (z⁵1−α/2 + 8z³1−α/2)/(96m²) + O(m⁻³).
The rest of this paper uses (7); the results in Section 5 also go through with the above
third-order corrected critical value or with a simulated critical value.
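The corrected critical values are simple closed-form adjustments; this sketch implements (7) and the third-order version above (the function name and the `order` switch are mine):

```python
def corrected_cv(z_half, m, order=2):
    """Two-sided corrected critical value z_{α,m} from (7); order=3 adds
    the m^{-2} term displayed above.  Illustrative sketch."""
    cv = z_half + z_half ** 3 / (4 * m)              # (7): z_{1−α/2} + z³/(4m)
    if order == 3:
        cv += (z_half ** 5 + 8 * z_half ** 3) / (96 * m ** 2)
    return cv

# α = 0.05: z_{.975} = 1.96, so z_{.05,m} = 1.96 + 1.88/m
print(round(corrected_cv(1.96, 10), 3))   # → 2.148
```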
In the two-sample case under H0 : FX⁻¹(p) = FY⁻¹(p) = ξp, after calculating some moments (see supplemental Two-sample Appendix C), the fixed-m distribution can be approximated by

P(T̃m,∞ < z) = Φ(z) + φ(z)zE(U − 1) + (1/2){−zφ(z)}z²E{(U − 1)²} + O[E{(U − 1)³}]
            = Φ(z) − {φ(z)/(4m)}[z³({fX(ξp)}⁻⁴ + {fY(ξp)}⁻⁴)/S0⁴ − 2z{fX(ξp)}⁻²{fY(ξp)}⁻²/S0⁴] + O(m⁻²)
            = Φ(z) − {φ(z)/(4m)}[z³(1 + δ⁴)/(1 + δ²)² − 2zδ²/(1 + δ²)²] + O(m⁻²),

(8)   S0 ≡ ([fX{FX⁻¹(p)}]⁻² + [fY{FY⁻¹(p)}]⁻²)^{1/2},   δ ≡ fX(ξp)/fY(ξp).

The corresponding critical value is

z = z̃α,m + O(m⁻²) ≤ zα,m + O(m⁻²),

(9)   z̃α,m = z1−α/2 + [z³1−α/2({fX(ξp)}⁻⁴ + {fY(ξp)}⁻⁴) − 2z1−α/2{fX(ξp)}⁻²{fY(ξp)}⁻²]/(4mS0⁴)
            = z1−α/2 + [z³1−α/2(1 + δ⁴) − 2z1−α/2δ²]/{4m(1 + δ²)²}.
When the pdf of Y collapses toward a constant, {fY(ξp)}⁻¹ → 0 and so δ → 0; P(T̃m,∞ < z) reduces to (5), and z̃α,m to zα,m. Thus, as an alternative to estimating δ = fX(ξp)/fY(ξp), the one-sample critical value zα,m provides conservative two-sample inference, as discussed in Section 3.1. Under the exchangeability assumption used for permutation tests, δ = 1 and z̃α,m attains its smallest possible value.
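A sketch of the two-sample critical value (9), alongside the one-sample (7) for comparison; here δ is simply an input, reflecting that in practice it must be estimated, bounded conservatively (δ → 0), or set by exchangeability (δ = 1):

```python
def one_sample_cv(z_half, m):
    """One-sample corrected critical value z_{α,m} from (7)."""
    return z_half + z_half ** 3 / (4 * m)

def two_sample_cv(z_half, m, delta):
    """Two-sample corrected critical value from (9), with δ = fX(ξp)/fY(ξp).
    Illustrative sketch."""
    num = z_half ** 3 * (1 + delta ** 4) - 2 * z_half * delta ** 2
    return z_half + num / (4 * m * (1 + delta ** 2) ** 2)

# δ → 0 recovers the conservative one-sample value; δ = 1 gives the smallest
cvs = [two_sample_cv(1.96, 8, d) for d in (0.0, 0.5, 1.0)]
print([round(c, 3) for c in cvs])
```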
4. Edgeworth expansion
HS88 give the Edgeworth expansion for the asymptotic distribution of Tm,n when
γ = 0. To calculate the type II error of a test using the fixed-m critical values (Section
5.2), the case γ ≠ 0 is needed. Section 5.2 shows how to apply this result.
4.1. One-sample case. The result here includes the result of HS88 as a special case,
and indeed the results match⁵ when γ = 0. The ui functions are indexed by γ, so u1,0
is the same as u1 from HS88 (page 385), and similarly for u2,0 and u3,0.
Theorem 1. Assume f(ξp) > 0 and that, in a neighborhood of ξp, f″ exists and satisfies a Lipschitz condition, i.e. for some ε > 0 and all x, y sufficiently close to ξp, |f″(x) − f″(y)| ≤ constant·|x − y|^ε. Suppose m = m(n) → ∞ as n → ∞, in such a manner that for some fixed δ > 0 and all sufficiently large n, n^δ ≤ m(n) ≤ n^{1−δ}.
⁵There appears to be a typo in the originally published result; the first part of the first term of u1 in HS88 was (1/6)[p(1 − p)]^{1/2}, but it appears to be (1/6)[p/(1 − p)]^{1/2} as given above.
Define C ≡ γf(ξp)/√(p(1 − p)), and defining functions

u1,γ(z) ≡ (1/6){p/(1 − p)}^{1/2}[(1 + p)/p](z² − 1) − C{(1 − p)/p}^{1/2}[1 − pf′(ξp)/{f(ξp)}² − 1/{2(1 − p)}]z
        − (1/2){p/(1 − p)}^{1/2}[1 + f′(ξp)/{f(ξp)}²(1 − p)]z²
        − {(⌊np⌋ + 1 − np) − 1 + (1/2)(1 − p)}{p(1 − p)}^{−1/2},

u2,γ(z) ≡ (1/4)(2Cz² − C²z − z³),   and

u3,γ(z) ≡ [3{f′(ξp)}² − f(ξp)f″(ξp)]/[6{f(ξp)}⁴](z − C),

it follows that

(10)   sup−∞<z<∞ |P(Tm,n < z) − {Φ(z + C) + n^{−1/2}u1,γ(z + C)φ(z + C) + m⁻¹u2,γ(z + C)φ(z + C) + (m/n)²u3,γ(z + C)φ(z + C)}| = o{m⁻¹ + (m/n)²}.
A sketch of the proof is given in Appendix D. The assumptions are the same as in
HS88.
For γ = 0, the term m⁻¹u2,0(z)φ(z) = −z³φ(z)/(4m) is identical to the term in the
fixed-m distribution in (5). Thus under the null, the fixed-m distribution captures the
high-order Edgeworth term associated with the variance of Sm,n. In other words, the
fixed-m distribution is high-order accurate under the conventional large-m asymp-
totics, while the standard normal distribution is only first-order accurate. Since the
fixed-m distribution is also more accurate under fixed-m asymptotics, where the stan-
dard normal is not even first-order accurate, theory strongly indicates that fixed-m
critical values are more accurate, which is borne out in simulations here and in Goh
(2004).
Let γ ≠ 0 so that the null hypothesis is false, where as before H0 : ξp = β with ξp = β − γ/√n. Letting S0 ≡ 1/f(ξp),

Tm,n = [√n(Xn,r − ξp) − γ]/[Sm,n√(p(1 − p))] = √n(Xn,r − ξp)/[Sm,n√(p(1 − p))] − [γ/{√(p(1 − p))S0}](S0/Sm,n + 1 − 1),

P(Tm,n < z) = P{√n(Xn,r − ξp)/[Sm,n√(p(1 − p))] − [γ/{√(p(1 − p))S0}](S0/Sm,n + 1 − 1) < z}
            = P{√n(Xn,r − ξp)/[Sm,n√(p(1 − p))] − [γ/{√(p(1 − p))S0}](S0/Sm,n − 1) < z + C}
(11)        = P{[√n(Xn,r − ξp) + γ(Sm,n/S0 − 1)]/[Sm,n√(p(1 − p))] < z + C}.
If the true S0 were known and used in Tm,n instead of its estimator Sm,n, this
would be simply the distribution from HS88 with a shift of the critical value by
C ≡ γ/{S0√(p(1 − p))}, which is γ normalized by the true (hypothetically known)
variance. But Sm,n is random, so the HS88 expansion is insufficient and Theorem 1
is needed.
4.2. Two-sample case. The strategy and results are similar; the full proof may be
found in the supplemental Two-sample Appendix.
Theorem 2. Let Xi ∼iid FX and Yi ∼iid FY, and Xi ⊥⊥ Yj ∀ i, j. With the assumptions in Theorem 1 applied to both FX and FY, and letting C ≡ γS0⁻¹/√(p(1 − p)), define functions

ũ1,γ(z) ≡ (1/6)[(1 + p)/√(p(1 − p))][(gx³ − gy³)/S0³](z² − 1)
        + {(a2gx² − a′2gy²)(1 − p) + (a1gx² − a′1gy²)}{p/(1 − p)}^{1/2}(2p²S0³)⁻¹(z²)
        − {2(a2gx² − a′2gy²) + (a1gx² − a′1gy²)/(1 − p)}(2pS0³)⁻¹C{(1 − p)/p}^{1/2}(z)
        + {(a2gx² − a′2gy²) − S0²(a2 − a′2)}{p(1 − p)}^{1/2}(2p²S0³)⁻¹
        − {(⌊np⌋ + 1 − np) − 1 + (1/2)(1 − p)}(gx − gy)(S0√(p(1 − p)))⁻¹,

ũ2,γ(z) ≡ −(1/4)[(gx⁴ + gy⁴)/S0⁴]z³ + (1/2)(gx²gy²/S0⁴)z − (1/2)(gx²gy²/S0⁴)C + (1/4)[(gx⁴ + gy⁴)/S0⁴](2Cz² − C²z),

ũ3,γ(z) ≡ [(gxg″x + gyg″y)/(6S0²)](z − C),

gX(·) ≡ 1/fX(FX⁻¹(·)),  gx ≡ gX(p),  g″x ≡ g″X(p),  HX(x) ≡ FX⁻¹(e^{−x}),  ai ≡ HX^{(i)}(Σ_{j=r}^{n} j⁻¹),
gY(·) ≡ 1/fY(FY⁻¹(·)),  gy ≡ gY(p),  g″y ≡ g″Y(p),  HY(y) ≡ FY⁻¹(e^{−y}),  a′i ≡ HY^{(i)}(Σ_{j=r}^{n} j⁻¹),

where H^{(i)}(·) is the ith derivative of the function H(·).
It follows that

sup−∞<z<∞ |P(T̃m,n < z) − {Φ(z + C) + n^{−1/2}ũ1,γ(z + C)φ(z + C) + m⁻¹ũ2,γ(z + C)φ(z + C) + (m/n)²ũ3,γ(z + C)φ(z + C)}| = o{m⁻¹ + (m/n)²}.
For a two-sided test, ũ1,γ will cancel out, so the unknown terms therein may be ignored. Under the null, when C = 0, the ũ2,γ term again exactly matches the fixed-m distribution, demonstrating that the fixed-m distribution is more accurate than the standard normal under large-m (as well as fixed-m) asymptotics. A similar comment applies here, too, about the effect of Sm,n being random.
5. Optimal Smoothing Parameter Selection
5.1. Type I error. In order to select m to minimize type II error subject to control
of type I error, expressions for type I and type II error dependent on m are needed.
Comparing with the Edgeworth expansion in (10) when γ = 0, the approximate type
I error of the two-sided symmetric test can be calculated using the corrected critical
values from (7). Note that u1,0(z)φ(z) in (10) is an even function since φ(z) = φ(−z)
and z only appears as z2 in u1,0(z), so u1,0(z) = u1,0(−z); thus it will cancel out
for a two-sided test. Also note that the functions u2,0(z) and u3,0(z) are odd, so
u2,0(−z) = −u2,0(z) and u3,0(−z) = −u3,0(z). Below, the second high-order term will
disappear due to the use of the fixed-m critical value, leaving only the third high-order
term.
Proposition 3. If γ = 0, then P(|Tm,n| > zα,m) = eI + o{m⁻¹ + (m/n)²},

(12)   eI = α − 2(m/n)²u3,0(z1−α/2)φ(z1−α/2).
Proof. Starting with (10) with γ = 0, the two-sided symmetric test rejection probability under the null hypothesis for critical value z is

P(|Tm,n| > z | H0) = P(Tm,n > z | H0) + P(Tm,n < −z | H0)
                   = 2 − 2Φ(z) − 2m⁻¹u2,0(z)φ(z) − 2(m/n)²u3,0(z)φ(z) + o{m⁻¹ + (m/n)²}.
With the corrected critical value zα,m = z1−α/2 + z³1−α/2/(4m),

P(|Tm,n| > zα,m | H0) = 2 − 2Φ(zα,m) + 2z³α,mφ(zα,m)/(4m) − 2(m/n)²u3,0(zα,m)φ(zα,m) + o{m⁻¹ + (m/n)²}
                      = α − 2(m/n)²u3,0(z1−α/2)φ(z1−α/2) + o{m⁻¹ + (m/n)²}.   ∎
The dominant part of the type I error, eI, depends on m, n, and z1−α/2, as well as f(ξp), f′(ξp), and f″(ξp) through u3,0(z1−α/2). Up to higher-order remainder terms, the type I error does not exceed nominal size α if u3,0(z1−α/2) ≥ 0, and there will be size distortion if u3,0(z1−α/2) < 0.
For common regions of common distributions, u3,0(z1−α/2) ≥ 0 and so eI ≤ α when using corrected critical values. Since z1−α/2 > 0, the sign of u3,0(z1−α/2) depends on the sign of 3{f′(ξp)}² − f(ξp)f″(ξp), or equivalently of the third derivative of the inverse cdf, ∂³F⁻¹(p)/∂p³. According to HS88, for p = 0.5, this is positive for all symmetric unimodal densities and almost all skew unimodal densities, except when f′ changes quickly near ξp. I found that for any quantile p, the sign is positive for t-, normal, exponential, χ², and Fréchet distributions; see Appendix B. This suggests that simply using the fixed-m corrected critical values alone is enough to reduce eI to α or below.
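For the standard normal, the sign claim is easy to verify directly: with f = φ, f′(x) = −xφ(x) and f″(x) = (x² − 1)φ(x), so 3(f′)² − ff″ = φ(x)²(2x² + 1) > 0 for every x. A quick numerical check of this identity (a sketch, not from the paper's code):

```python
from math import sqrt, pi, exp

def phi(x):                            # standard normal pdf
    return exp(-x * x / 2) / sqrt(2 * pi)

def sign_term(x):
    """3 f'(x)^2 − f(x) f''(x) for f = φ (standard normal)."""
    fp = -x * phi(x)                   # φ'(x) = −xφ(x)
    fpp = (x * x - 1) * phi(x)         # φ''(x) = (x² − 1)φ(x)
    return 3 * fp ** 2 - phi(x) * fpp

# identity: 3(f')² − ff'' = φ(x)²(2x² + 1) > 0 everywhere
grid = [i / 10 - 4 for i in range(81)]
ok = all(sign_term(x) > 0 for x in grid)
print(ok)   # → True
```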
For the two-sample case, as shown in the full appendix, the result is similar.
Proposition 4. If γ = 0, then

P(|T̃m,n| > zα,m) ≤ P(|T̃m,n| > z̃α,m) = ẽI + o{m⁻¹ + (m/n)²},

ẽI = α − 2(m/n)²ũ3,0(z1−α/2)φ(z1−α/2).
Here also, the corrected critical value approximated from the fixed-m distribution leads to the dominant part of type I error being bounded by α for all common distributions, since the sign of ũ3,0 is positive for the same reasons as in the one-sample case. The one-sample critical value zα,m gives conservative inference; the infeasible z̃α,m results in better power but requires estimating δ ≡ fX(ξp)/fY(ξp); the z̃α,m under exchangeability (δ = 1) leads to the best power but also size distortion if exchangeability is violated.
5.2. Type II Error. As above, I use the Edgeworth expansion in (10) to approximate
the type II error of the two-sided symmetric test using critical values from (7). Since
a uniformly most powerful test does not exist for general alternative hypotheses, I
will follow a common strategy in the optimal testing literature and pick a reasonable
alternative hypothesis against which to maximize power. The hope is that this will
produce a test near the power envelope at all alternatives, even if it is not strictly the
uniformly most powerful.
I choose to maximize power against the alternative where first-order power is 50% for a two-sided test. From above, the type II error is thus 0.5 = P(|Tm,n| < z1−α/2) ≈ G_{C²}(z²1−α/2), where G_{C²} is the cdf of a noncentral χ² distribution with one degree of freedom and noncentrality parameter C², and C was defined in Theorem 1. For α = 0.05, this gives γf(ξp)/√(p(1 − p)) ≡ C = ±1.96, or γ = ±1.96√(p(1 − p))/f(ξp).
Calculation of the following type II error may be found in the working paper.
Proposition 5. If C² solves 0.5 = G_{C²}(z²1−α/2), and writing f for f(ξp) and similarly f′ and f″, then

P(|Tm,n| < zα,m) = eII + o{m⁻¹ + (m/n)²},

(13)   eII = 0.5 + (1/4)m⁻¹{φ(z1−α/2 − C) − φ(z1−α/2 + C)}Cz²1−α/2
            + (m/n)²[{3(f′)² − ff″}/(6f⁴)]z1−α/2{φ(z1−α/2 + C) + φ(z1−α/2 − C)} + O(n^{−1/2}).
There is a bias term and a variance term in eII above. Per Bloch & Gastwirth (1968),
the variance and bias of Sm,n are indeed of order m−1 and (m/n)2, respectively, as
given in their (2.5) and (2.6):

AsyVar(Sm,n) = (2mf²)⁻¹ as m → ∞, m/n → 0,

AsyBias(Sm,n) = (m/n)²[{3(f′)² − ff″}/(6f⁵)] as m → ∞, m/n → 0,
similar to above aside from the additional 1/f² and 1/f factors. As m → ∞,
Var(Sm,n) → 0 (increasing smoothing); with m fixed, this variance is also fixed
(fixed smoothing). As n grows but Sm,n only uses a proportion of the n observa-
tions approaching zero, the bias will also decrease. The bias decreases to zero in the
fixed-m thought experiment, too, and thus is not captured by the fixed-m asymptotic
distribution.
Now there are expressions for the dominant components of type I and type II error
as functions of m, given in (12) and (13), so m can be chosen to minimize eII subject
to control of eI .
First, I look at eII more closely and try to sign some terms. As discussed before, for common distributions {3(f′)² − ff″}/(6f⁴) ≥ 0, so the entire (m/n)² expression is positive; type II error from this term is increasing in m, so a smaller value of m would minimize it. The m⁻¹ term in (13) is also positive since φ(z1−α/2 − C) > φ(z1−α/2 + C), so it is minimized by a larger m.
It is then possible that minimizing eII gives an "interior" solution m, i.e. m ∈ [1, min(r − 1, n − r)]. For example, for a standard normal distribution where n = 100 and p = 0.5, m can be between one and 49. As shown in Appendix B, 3(f′)² − ff″ = [φ(0)]², and dividing by 6f⁴ yields 1/(6[φ(0)]²) = 2π/6. Then, in choosing m to minimize (1/4)m⁻¹φ(0)(1.96)³ + (m/n)²√(2π)(1.96)/6 = m⁻¹(0.751) + m²(0.819)×10⁻⁴, the m⁻¹ term dominates for small m and the other term for large m, so that eII is minimized by m = 17, a feasible choice in practice.
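The worked example above can be reproduced by direct search over the feasible range (a sketch; the two coefficients follow the display above):

```python
from math import sqrt, pi

# Standard normal, n = 100, p = 0.5, α = 0.05, C = 1.96:
# minimize (0.751)/m + (0.819e-4) m² over feasible m.
phi0 = 1 / sqrt(2 * pi)                 # φ(0)
n = 100
a = 0.25 * phi0 * 1.96 ** 3             # 1/m coefficient, ≈ 0.751
b = sqrt(2 * pi) * 1.96 / 6 / n ** 2    # m² coefficient, ≈ 0.819e-4
m_star = min(range(1, 50), key=lambda m: a / m + b * m * m)
print(m_star)   # → 17
```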
The two-sample case is similar. Calculations are in the supplemental appendix.
Proposition 6. If C² solves 0.5 = G_{C²}(z²1−α/2),

θ ≡ S0⁻⁴(gx⁴ + gy⁴) = (1 + δa⁴)/(1 + δa²)²,

gx ≡ 1/fX(FX⁻¹(p)) and gy ≡ 1/fY(FY⁻¹(p)) are as defined in Theorem 2, as are g″x and g″y, δa ≡ fX(FX⁻¹(p))/fY(FY⁻¹(p)) similar to (8), and S0 ≡ [{fX(FX⁻¹(p))}⁻² + {fY(FY⁻¹(p))}⁻²]^{1/2} as in (8), then

P(|T̃m,n| < zα,m) = ẽII + o{m⁻¹ + (m/n)²},

ẽII = 0.5 + (1/4)m⁻¹[φ(z1−α/2 + C){−θCz²1−α/2 + (1 − θ)z1−α/2} + φ(z1−α/2 − C){θCz²1−α/2 + (1 − θ)z1−α/2}]
     + (m/n)²[(gxg″x + gyg″y)/(6S0²)]z1−α/2{φ(z1−α/2 + C) + φ(z1−α/2 − C)} + O(n^{−1/2}).
5.3. Choice of m. With the fixed-m corrected critical values, eI ≤ α for all quantiles
of common distributions, as discussed. In implementing a method to choose m, below
I use a Gaussian plug-in assumption (as is commonly done with other suggestions for
m), and consequently the calculated eI is indeed always less than nominal α (see
Appendix B). Thus, the method reduces to minimizing eII . Since eII has a positive
m⁻¹ component and a positive m² component, it will be U-shaped for positive m.
If the first-order condition yields an infeasibly big m, the biggest feasible m is the
minimizer over the feasible range.
For a randomized alternative for a symmetric two-sided test, the first-order condition of (13) is

0 = ∂eII/∂m = −(1/4)m⁻²[φ(z1−α/2 − C) − φ(z1−α/2 + C)]Cz²1−α/2
            + (2m/n²)[{3(f′)² − ff″}/(6f⁴)]z1−α/2[φ(z1−α/2 + C) + φ(z1−α/2 − C)],

m = { (1/4){φ(z1−α/2 − C) − φ(z1−α/2 + C)}Cz²1−α/2 / [(2/n²)[{3(f′)² − ff″}/(6f⁴)]z1−α/2{φ(z1−α/2 + C) + φ(z1−α/2 − C)}] }^{1/3}.
Remember that C is chosen ahead (such as C = 1.96), φ is the standard normal pdf, z1−α/2 is determined by α, and n is known for any given sample; but the object {3(f′)² − ff″}/(6f⁴) is unknown. Since size control isn't dependent on m, it may be worth estimating that object even though the variance of that estimator will be large; this option has not been explored.
As is common in the kernel bandwidth selection literature, I will plug in⁶ φ for f (or rather φ(Φ⁻¹(p)) for f(ξp)). In all the simulations below, when np is close to 1 or n, this makes no difference in the m selected; the same is true if F is actually normal, or in special cases like the median of a log-normal. In some cases the m chosen will be very different, but the effect on power is small; e.g., with Unif(0, 1), n = 45, p = 0.5, α = 0.05, the Gaussian plug-in yields m = 9 instead of m = 22, but the power loss is only 4% at the alternative considered in Proposition 5. Still, estimation of f, f′, and f″ may be best for larger n.
The Gaussian plug-in yields
\begin{align*}
m_K(n, p, \alpha, C) &= \left\{ (1/4)\{\phi(z_{1-\alpha/2}-C) - \phi(z_{1-\alpha/2}+C)\} C z_{1-\alpha/2}^2 \Big/ \left[(2/n^2)\,\frac{3(\phi'(\Phi^{-1}(p)))^2 - \phi(\Phi^{-1}(p))\phi''(\Phi^{-1}(p))}{6(\phi(\Phi^{-1}(p)))^4}\, z_{1-\alpha/2}\big\{\phi(z_{1-\alpha/2}+C) + \phi(z_{1-\alpha/2}-C)\big\}\right] \right\}^{1/3} \\
&= n^{2/3}(C z_{1-\alpha/2})^{1/3}(3/4)^{1/3}\left(\frac{(\phi(\Phi^{-1}(p)))^2}{2(\Phi^{-1}(p))^2+1}\right)^{1/3} \times \left(\frac{\phi(z_{1-\alpha/2}-C) - \phi(z_{1-\alpha/2}+C)}{\phi(z_{1-\alpha/2}-C) + \phi(z_{1-\alpha/2}+C)}\right)^{1/3},
\end{align*}
and for the 50% first-order power C and α = 0.05, as calculated below,
\[
m_K(n, p, \alpha = 0.05, C = 1.96) = n^{2/3}(1.42)\left(\frac{(\phi(\Phi^{-1}(p)))^2}{2(\Phi^{-1}(p))^2+1}\right)^{1/3}. \tag{14}
\]

⁶Here, using N(0, 1) is mathematically equivalent to N(µ, σ²) since µ and σ cancel out.
Compare (14) with the suggested m from HS88. Using their Edgeworth expansion under the null, they calculate the level error of a two-sided symmetric confidence interval constructed by inverting $T_{m,n}$ with standard normal critical values. Since the $n^{-1/2}$ term disappears for two-sided tests and confidence intervals, m can be chosen to make the level error zero as long as $u_{3,0}$ is positive⁷. Rewriting their (3.1) to match my notation above,
\[
m_{HS} = \left\lfloor n^{2/3} z_{1-\alpha/2}^{2/3} \left[1.5 f^4 / \{3(f')^2 - ff''\}\right]^{1/3} \right\rfloor.
\]
HS88 proceed with a median-specific, data-dependent method. More commonly, papers like Koenker & Xiao (2002) plug in φ for f when computing $h_{HS}$ (HS = Hall and Sheather) in their simulations, as shown explicitly on page 3 of their electronic appendix; Goh & Knight (2009) use the same Gaussian plug-in version for their simulations. Koenker & Xiao (2002) also use a Gaussian plug-in procedure based on Bofinger (1975), whose m is of rate $n^{4/5}$ to minimize MSE; it is also considered in the simulation study below. Their "bandwidth" h is just m/n, so they use the equivalent of
\[
m_{HS} = n^{2/3} z_{1-\alpha/2}^{2/3} \left(1.5\, \frac{[\phi(\Phi^{-1}(p))]^2}{2(\Phi^{-1}(p))^2+1}\right)^{1/3}, \tag{15}
\]
\[
m_B = n^{4/5} \left(4.5\, \frac{[\phi(\Phi^{-1}(p))]^4}{[2(\Phi^{-1}(p))^2+1]^2}\right)^{1/5}. \tag{16}
\]
For p = 0.5, α = 0.05, these two are equal at n = 20.93; that is, $m_B < m_{HS}$ if n < 21 and $m_B > m_{HS}$ if n ≥ 21. Since they use the same critical values, size and power will both be smaller whenever m is bigger.
⁷If $u_{3,0}$ is negative, their first-order condition for m is similar but with an additional $(1/2)^{1/3}$ coefficient; in Section 3, they only consider positive $u_{3,0}$.
Figure 1. Plot of the ratio $m_K/m_{HS}$ against the first-order power used to calculate C; 0.5 is used throughout this paper.
Since $z_{1-\alpha/2}^{2/3}(1.5)^{1/3} = 1.79$ for α = 0.05, $m_K(n, p, \alpha = 0.05, C = 1.96) = 0.79\, m_{HS}$. This ratio is calculated for other values of C based on other first-order power values, and the plot is shown in Figure 1; the dependence is mild until values close to 0% or 100% are chosen. Remember that $m_K$ is optimized for the test using corrected critical values from the fixed-m distribution, while $m_{HS}$ is chosen for use with standard normal critical values. It then makes sense that $m_K < m_{HS}$: the fixed-m critical values control size, allowing a smaller m when otherwise $e_I$ might be a binding constraint for a test with standard normal critical values.
For the two-sample case, the strategy is the same. The testing-optimal mK is found
by taking the first-order condition of (6) with respect to m, and then solving for
m. Since the two-sample eII has positive m−1 and m2 terms, it is U-shaped, which
again justifies picking the largest feasible m if the calculated m is infeasibly large.
Depending on which version of $z_{\alpha,m}$ is used, solving for m in the FOC yields
\[
z_{\alpha,m}:\quad m_K = n^{2/3}(3/4)^{1/3}\left(\frac{S_0^2}{g_x g_x'' + g_y g_y''}\right)^{1/3} \times \left\{\frac{2g_x^2 g_y^2}{S_0^4} + \frac{g_x^4 + g_y^4}{S_0^4}\, C z_{1-\alpha/2}\, \frac{\phi(z_{1-\alpha/2}-C) - \phi(z_{1-\alpha/2}+C)}{\phi(z_{1-\alpha/2}-C) + \phi(z_{1-\alpha/2}+C)}\right\}^{1/3},
\]
\[
z_{\alpha,m}:\quad m_K = n^{2/3}(3/4)^{1/3}\left(\frac{S_0^2}{g_x g_x'' + g_y g_y''}\right)^{1/3} \times \left\{\frac{2g_x^2 g_y^2}{S_0^4}(z_{1-\alpha/2}^2 + 2) + \frac{g_x^4 + g_y^4}{S_0^4}\, C z_{1-\alpha/2}\, \frac{\phi(z_{1-\alpha/2}-C) - \phi(z_{1-\alpha/2}+C)}{\phi(z_{1-\alpha/2}-C) + \phi(z_{1-\alpha/2}+C)}\right\}^{1/3},
\]
where the latter m is weakly larger, with equality in the limiting special case where
either X or Y is a constant. I again chose to proceed with a Gaussian plug-in (using
sample variances) for the g and S0 terms due to the difficulty of estimating g′′, but
power can be increased by more refined plug-in methods.
Figure 2. Analytic and empirical $e_I$ and $e_{II}$ by m. Both: n = 21, p = 0.5, α = 0.05, F is log-normal with µ = 0, σ = 3/2, 5000 replications (for empirical values). Lines are analytic; markers without lines are empirical. Legends: "An" is analytic (Edgeworth-derived), "Emp" is empirical (simulated), "std" is standard normal critical values, and "fix" is fixed-m critical values. Right ($e_{II}$): randomized alternative (see text), $\gamma/\sqrt{n} = 0.80$.
Figure 2 compares the Edgeworth approximations to the true (simulated) type I and type II errors for scalar data drawn from a log-normal distribution. For non-normal distributions like this one, the approximation can differ significantly from the truth. With standard normal critical values, type I error is monotonically decreasing in m while type II error is monotonically increasing, so setting eI = α exactly should minimize eII subject to eI ≤ α. With fixed-m critical values, this is not true, so the Edgeworth expansion for general γ in Theorem 1 is necessary. With the fixed-m critical values, type I error is always below α (approximately, and usually truly), and the type II error curve near the selected m is very flat since it is near the minimum where the slope is zero. Consequently, a larger-than-optimal m can cost HS88 more power than it costs the new method, and a smaller-than-optimal m incurs size distortion for HS88 but not for the new method.
6. Simulation Study⁸
Regarding m, Siddiqui (1960) originally suggested a value of m on the order of n1/2.
Bloch & Gastwirth (1968) then suggested a rate of n4/5, to minimize the asymptotic
mean square error of Sm,n. With this n4/5 rate, Bofinger (1975) then suggested the mB
above in (16). HS88, using an Edgeworth expansion under the null hypothesis, find
that an m of order n2/3 minimizes the level error of two-sided tests, and they provide
an infeasible expression for m. For the median, they also give a data-dependent
method for selecting m; for a general quantile, the Gaussian plug-in version of mHS
in (15) is more commonly used, as in Koenker & Xiao (2002) and Goh & Knight
(2009).
The following simulations compare five methods. The first method is presented in
this paper, with fixed-m corrected critical values and mK chosen to minimize type II
error subject to control of type I error; the code is publicly available as Kaplan (2011).
For the two-sample $z_{\alpha,m}$, I implemented a simplistic estimator of $\theta \equiv S_0^{-4}(g_x^4 + g_y^4)$ using the quantile spacings; a better estimator of θ would improve performance, as
Figure 11 exemplifies. The second method uses standard normal critical values and
mHS as in (15). The third method uses standard normal critical values and mB as
in (16). These three methods are referred to as “New method,” “HS,” and “B,”
respectively, in the text as well as figures in this section. For the new method, I used
the floor function to get an integer value for mK ; for HS and B, I instead rounded to
the nearest integer, to not amplify their size distortion.
The fourth method “BS” is a bootstrap method. The conclusion of much experimen-
tation was that a symmetric percentile-t using bootstrapped variance had the best
size control in the simulations presented below. For the number of outer (B1) and
inner (B2, for the variance estimator) bootstrap replications, B1 = 99 and B2 = 100
performed best and thus were used; any exceptions are noted below.
Compared to the best-performing bootstrap method, increasing B1 from 99 to 999
more often increased size distortion (Table 3), though not by much. Decreasing B2
from 100 to 25 was also worse. Empirical size distortion was also reduced by taking
the 1 − α quantile of the absolute values and comparing the absolute value of the
⁸MATLAB code implementing a hybrid of Hutson's method and the new method, quantile_inf.m, is publicly available through the author's website or MATLAB File Exchange; R code quantile_inf.R is also available on the author's website. MATLAB code for simulations is available from the author upon request.
(non-bootstrap) test statistic instead of taking α/2 and 1 − α/2 quantiles from the
sample of B1 Studentized bootstrap test statistics—i.e., the symmetric test was better
than the equal-tailed test. The more basic percentile method (without Studentiza-
tion; either equal-tail or symmetric) had worse size distortion in small samples or
less central quantiles, though less distortion with the median in larger samples; and
Studentizing with the SBG sparsity estimator or a kernel density estimator worsened
the size distortion in some cases. Assuming a normally distributed test statistic and
bootstrapping only the variance also had severe size distortion in many cases. For
example, for F = N(0, 1), n = 11, p = 0.1, α = 5%, one-sample empirical size of
the symmetric percentile-t with B1 = 99 and B2 = 100 is 9.7%; with B1 = 999 in-
stead, 9.8%; with an equal-tail test instead, 14.6%; with a symmetric percentile with
B1 = 99 or 999, 17.3% or 17.5%. Using the m-out-of-n bootstrap had the potential to perform better if the oracle m were used, but in many cases there was size distortion even with the best m, and the risk of increased size distortion from picking the wrong m was often large. The optimal choice of smoothing parameter there, and with the smoothed bootstrap, is difficult but critical; see Bickel & Sakov (2008) and Ho & Lee (2005) for recent efforts. With respect to the example setup, a percentile method smoothing with $h = 0.92n^{-1/5}$ gives 5.1% for N(0, 1) but 23.1% for slash, or 9.0% and 17.1% for smoothed percentile-t; compare also the top and bottom halves of Table 2 in Brown, Hall & Young (2001).
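The winning variant, symmetric percentile-t with an inner bootstrap for the variance, can be sketched as follows; the quantile estimator uses the order statistic $X_{(r)}$ with r = ⌊np⌋ + 1, and the helper names are mine, so this is an illustration rather than the exact simulation code:

```python
import random
import statistics

def sample_quantile(xs, p):
    """Order-statistic quantile estimate: X_(r), r = floor(n*p) + 1."""
    xs = sorted(xs)
    r = int(len(xs) * p) + 1
    return xs[min(r, len(xs)) - 1]

def boot_se(xs, p, B2, rng):
    """Inner bootstrap: standard error of the sample quantile."""
    n = len(xs)
    reps = [sample_quantile([rng.choice(xs) for _ in range(n)], p) for _ in range(B2)]
    return statistics.pstdev(reps)

def symmetric_percentile_t_ci(xs, p, alpha=0.05, B1=99, B2=100, seed=0):
    """Symmetric percentile-t CI for the p-quantile with bootstrapped variance:
    take the 1 - alpha quantile of |t*| over B1 outer resamples."""
    rng = random.Random(seed)
    n = len(xs)
    q_hat = sample_quantile(xs, p)
    se_hat = boot_se(xs, p, B2, rng)
    t_abs = []
    for _ in range(B1):
        bs = [rng.choice(xs) for _ in range(n)]
        se_b = boot_se(bs, p, B2, rng)
        if se_b > 0:
            t_abs.append(abs(sample_quantile(bs, p) - q_hat) / se_b)
    t_abs.sort()
    crit = t_abs[min(int((1 - alpha) * len(t_abs)), len(t_abs) - 1)]
    return q_hat - crit * se_hat, q_hat + crit * se_hat
```

Using the 1 − α quantile of the absolute Studentized statistics (rather than the α/2 and 1 − α/2 quantiles) is what makes the test symmetric rather than equal-tailed.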
Figure 3. Graph showing for which combinations of quantile p ∈ (0, 1) and sample size n Hutson (1999) is computable, in the middle region. An extension marginally increases the region by not requiring an equal-tail test. Left: α = 5%. Right: α = 1%.
Figure 4. Example of superior performance of Hutson (1999): size is always controlled, power is better. In this one-sample case, uncalibrated power is better for almost all alternatives even though all other methods are size distorted. Distribution is log-normal with µ = 0, σ = 3/2; α = 5%, n = 21, and power is for p = 0.85.
Figure 5. Example of superior performance of Hutson (1999), two-sample. Distributions are both log-normal with µ = 0, σ = 3/2; α = 5%, n = 21, and power is for p = 0.55.
The fifth method is an exact method from Hutson (1999). For one-sample inference,
it performed best in every single simulation where it could be used, so the results here
focus on cases when it cannot be computed. An example of its superior size control
and power is Figure 4. A graph showing when a simple extension of Hutson (1999), as
well as the original, can be computed is in Figure 3. Instead of putting α/2 probability
in each tail as in Hutson (1999), the extension shifts probability between the tails if
necessary. Simulations show that it performs equally well within the region of Figure
3. For two-sample inference, a simplistic extension of Hutson (1999) outperforms all
other methods most of the time, as in Figure 5, but not always. Since a more refined extension of Hutson (1999) seems likely to be best for two-sample inference, the results here focus on cases where that method cannot be used. However, if such a
refinement is not found, bootstrap and the new method provide strong competition;
HS and B can have significantly worse power, as Figure 5 shows.
Another "exact" method computes $\sum_{i=1}^n 1\{X_i < \beta\}$, which under $H_0: \xi_p = \beta$ is
distributed Binomial(n, p). One downside is that the power curve often flattens out
as ξp deviates farther from β. A bigger downside is that randomization is needed,
particularly for small n and for np near one or n. For instance, with n = 11, p = 0.1,
and α = 5%, there is a 31.4% probability of observing zero Xi less than ξp. For a
two-sided test, we would randomly reject 0.025/0.314 = 8% of the time we observe
zero, and not reject 92% of the time we observe zero. Because of the common aversion
to randomized tests, this method is left out of the simulation results. However, the
randomized binomial test always has exact size, and its power curve near H0 is often
steeper than the non-exact tests; both are desirable properties.
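The randomized binomial test just described can be implemented in a few lines; a sketch (names mine), reproducing the n = 11, p = 0.1 example:

```python
import math

def binom_cdf(k, n, p):
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def randomized_binomial_test(k, n, p, alpha=0.05):
    """Lower-tail piece of the two-sided randomized test of H0: xi_p = beta,
    where k = #{X_i < beta} ~ Binomial(n, p) under H0.  Returns the
    probability of rejecting given K = k (the upper tail is analogous)."""
    cdf_k = binom_cdf(k, n, p)
    cdf_below = binom_cdf(k - 1, n, p) if k > 0 else 0.0
    pmf_k = cdf_k - cdf_below
    if cdf_k <= alpha / 2:           # reject outright
        return 1.0
    if cdf_below >= alpha / 2:       # never reject on this tail
        return 0.0
    return (alpha / 2 - cdf_below) / pmf_k  # randomize on the boundary value

# Example from the text: n = 11, p = 0.1, observing zero X_i below beta.
print(round(randomized_binomial_test(0, 11, 0.1), 2))  # 0.08
```

Averaging the rejection probability over the Binomial(n, p) distribution of K returns exactly α/2 per tail, which is the sense in which the randomized test has exact size.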
The hope of this paper was to produce a new method that controls size better than
HS and B (and BS) while keeping power competitive. For one-sample inference in
cases when Hutson (1999) is not computable, the following simulations show that
the new method eliminates or reduces the size distortion of HS, B, and BS. The new
method also has good power, sometimes even better than BS. When Hutson (1999)
is computable, the new method can have better power than HS and B, too.
The two-sample results are more complicated since performance depends on how
similar the two distributions are. The one-sample results represent one extreme of
the two-sample case as the ratio fX(ξp)/fY (ξp) goes to infinity (or zero), when size
is hardest to control. The other extreme is when fX(ξp) = fY (ξp). In that case, HS
and B9 have less size distortion, but still some. BS, though, seems to control size
and have better power than the new method with estimated θ; with the true θ, BS
and the new method perform quite similarly. General two-sample setups would have
results in between these extremes of fX/fY →∞ (or zero) and fX/fY = 1.
Unless otherwise noted, 10,000 simulation replications were used, and α = 5%.
⁹Since HS88 and Bofinger (1975) do not provide values of m for the two-sample case, I used their m values for the univariate case.
Table 1. Empirical size as percentage, n = 3, 4, p = 0.5, one-sample.

                          n = 3                n = 4
Distribution        New   BS    HS/B     New   BS    HS/B
N(0, 1)             1.6   1.2   11.9     2.4   4.1   15.4
Logn(0, σ = 3/2)    3.3   1.3   12.8     1.8   3.6   10.0
Exp(1)              3.0   1.2   13.5     2.3   4.6   12.9
Uniform(0, 1)       3.2   1.5   16.0     5.0   6.8ᵃ  21.3
GEV(0, 1, 0)        1.9   1.3   12.1     2.1   4.0   13.6
χ²₁                 4.8   1.6   14.8     2.6   4.3   11.8

ᵃ6.7% with 999 outer replications.
Table 2. Empirical size as percentage, n = 3, 4, p = 0.5, two-sample with identical independent distributions.

                          n = 3                n = 4
Distribution        Newᵃ  BS    HS/B     New   BS    HS/B
N(0, 1)             0.8   1.3   6.3      1.1   2.5   6.5
Logn(0, σ = 3/2)    0.4   0.8   4.5      0.6   1.3   3.7
Exp(1)              1.1   1.4   6.4      0.9   1.9   5.9
Uniform(0, 1)       1.4   1.9   8.6      1.8   3.6   8.6

ᵃIf the true θ is used instead: 1.6, 1.1, 1.6, 2.8.
In the extreme sample size case where n = 3 or n = 4, HS and B are severely size distorted while the new method and BS control size (except BS for n = 4 and the uniform distribution), as shown in Table 1. In the other two-sample special case, where the distributions are identical, all rejection probabilities are much smaller, though some size distortion remains for HS and B; see Table 2. For one-sample power, BS is very poor for n = 3, but better than the new method at some alternatives for n = 4; see Figure 6. For two-sample power, BS is equal to the new method for a range of alternatives with n = 3, and better for n = 4; see Figure 6.
For n = 11, all methods can have size distortion, but the new method has the least,
while HS and B have the most. This occurs even for a N(0, 1), but is worse for the
fat-tailed slash distribution; see Figure 7. Whether or not BS is size distorted, the
new method still has competitive power with it, as shown in Figure 8.
For n = 45 and p = 0.05 or p = 0.95, the new method controls size except for
empirical size around 5.5% for the slash distribution. In contrast, BS, HS, and B
are all size distorted, sometimes significantly; see Table 3. Still, the new method’s
Figure 6. Empirical power properties; p = 0.5. Top row: F =Exp(1), one-sample. Bottom: N(0, 1), two-sample. Left column:n = 3. Right: n = 4.
Table 3. Empirical size as percentage, p = 0.95, one-sample.

                          n = 45                        n = 21
                          BS; B1 =                      BS; B1 =
Distribution        New   99    999   HS/B      New    99     999    HS/B
N(0, 1)             3.2   6.4   6.5   10.4      6.7    11.3   11.5   25.3
Slash               5.3   7.8   8.1   10.7      12.9   17.8   18.6   30.3
Logn(0, σ = 3/2)    5.0   7.3   7.5   10.7      11.3   14.6   15.2   28.7
Exp(1)              4.0   6.9   7.0   10.7      8.0    12.6   13.1   26.9
Uniform(0, 1)       3.3   5.4   5.3   11.2      4.7    6.8    6.7    22.3
GEV(0, 1, 0)        4.1   7.2   7.3   11.0      7.3    12.2   12.6   25.9
χ²₁                 3.6   6.5   6.4   10.2      8.7    13.5   13.9   27.5
Figure 7. Empirical size properties, one-sample, n = 11. Left: N(0, 1). Right: slash.
Figure 8. Empirical power properties, one-sample, n = 11. Left: N(0, 1), p = 0.2. Right: slash, p = 0.1.
Table 4. Empirical size as percentage, p = 0.95, two-sample with independent distributions of the same shape but differing in variance by a factor of 25. The distribution with larger variance is shown.

                        n = 21             n = 29             n = 45
Distribution        New    BS    HS/B   New   BS    HS/B   New   BS    HS/B
N(0, 1)             6.3    9.5   21.1   3.8   6.9   13.5   3.4   5.5   8.6
Logn(0, σ = 3/2)    11.5   15.6  28.2   5.9   9.3   17.8   4.6   7.2   10.1
Exp(1)              7.4    10.9  21.6   4.0   7.9   14.1   3.5   6.0   8.1
Uniform(0, √12)     5.6    6.6   19.8   4.3   5.7   16.0   3.9   5.2   10.7
Figure 9. Empirical power properties, one-sample. n = 45, p = 0.95. Left: slash. Right: GEV(0, 1, 0).
Figure 10. Empirical power properties, two-sample. n = 45, p = 0.95. Left: N(0, 1) distributions. Right: Uniform(0, √12).
power is competitive with BS even without calibration, as in Figure 9. For the
two-sample case with identical distributions, BS always controls size along with the
new method. However, moving toward the limiting case where one distribution is a constant, when there is a significant difference in variance between the two samples' distributions (a factor of 25), BS has significantly worse size distortion than the new method for n = 21, and has size distortion while the new method has none for n = 29
and n = 45; see Table 4. In the case of identical distributions, the new method has
power competitive with BS for n = 21, but by n = 45 BS has significantly better
power; see Figures 10 and 11.
Figure 11. Empirical power properties, two-sample. n = 21. Left:N(0, 1) distributions, p = 0.05. Right: Exp(1), p = 0.05. Betterestimators of θ would increase power for the new method.
The new method cannot be used as-is when np < 1 or np ≥ n − 1, i.e., when the estimator is the sample minimum or maximum. It would be computationally possible to use a one-sided quantile spacing in those cases, but at that point an extreme quantile framework is more appropriate; see Chernozhukov & Fernandez-Val (2011).
For any sample size, it appears that quantiles close enough to zero or one will cause
size distortion in HS, B, and BS. For example, even with n = 250 and a normal
distribution, tests of the p = 0.005 quantile for α = 0.05 have empirical size 9.6% for
BS and 20.7% for HS and B, compared with 5.3% for the new method.
There are two different proximate "causes" of size distortion for HS and B in general, illustrated in Figure 12. When n = 21 and p = 0.2, r = ⌊(21)(0.2)⌋ + 1 = 5, which means that m can be at most 4 such that r − m ≥ 1; and mHS = mB = 4, the biggest m possible. This is the best choice, but it is still size distorted. For n = 21 and any p < 0.35, mHS = mB = r − 1, the maximum possible m. However, when n = 21 and p ≥ 0.35, mHS and mB are strictly less than the maximum possible m. In those cases, HS and B could have chosen a larger m to lessen the size distortion, but did not. In some cases, both "causes" apply, such as with F = Exp(1), n = 71, p = 0.2, and α = 0.05, where a bigger mHS would have reduced the size distortion, but not eliminated it. Focusing on cases when Hutson (1999) cannot be computed, the use of standard critical values is always the proximate cause since mHS is always the maximum allowed.
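The feasibility cap on m can be made concrete. Assuming the two-sided spacing $X_{(r+m)} - X_{(r-m)}$ with r = ⌊np⌋ + 1, both r − m ≥ 1 and r + m ≤ n are required, consistent with the bounds quoted in the text:

```python
import math

def max_feasible_m(n, p):
    """Largest m with r - m >= 1 and r + m <= n, where r = floor(n*p) + 1."""
    r = math.floor(n * p) + 1
    return min(r - 1, n - r)

print(max_feasible_m(21, 0.2))   # 4  (so m_HS = m_B = 4 is the cap)
print(max_feasible_m(100, 0.5))  # 49 (m "can be between one and 49")
```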
Figure 12. Proximate causes of size distortion for HS and B. Em-pirical type I error (x markers) is plotted against m, for a test withstandard normal critical values; missing marker on the left for m = 1is above 0.15. The dashed line in the right plot is the plug-in eI usedto select mHS. Left: no m can control size. Right: approximationerror leads to mHS = 7 < 10 and size distortion. F = Uniform(0, 1),n = 21, α = 0.05, and p = 0.2 on the left, p = 0.5 on the right; 5000replications.
Of theoretical (if not directly practical) interest is Figure 13. The new method chooses
m to maximize power, and in some cases this produces significantly better power than
HS or B. This is true in both the one-sample case (top row) and the two-sample case
(bottom row).
Theoretical calculations corroborate the new method’s power advantage over HS in
the Figure 13 simulations. Type II error for the new method is given in Proposition
5. As seen in Appendix C, using standard normal critical value z1−α/2 instead of zα,m
will only alter the m−1 term. To compare the new method to HS, only the m−1 and
(m/n)2 terms are needed since the first-order term and the n−1/2 term are identical.
The m−1 term is always larger for the new method due to the more conservative
critical value, while the (m/n)2 term is always larger for HS since mHS > mK .
Specifically, for α = 0.05, C = 1.96 (equivalent to −1.96 since the Cauchy distribution is symmetric about its median), and p = 0.5, the smoothing parameters are $m_K \approx (0.77)n^{2/3}$ and $m_{HS} \approx (0.97)n^{2/3}$, using (14) and (15). As discussed, I rounded mHS to the nearest integer to avoid exacerbating the size distortion already seen in Table 3, for example, while the floor function is used for the new method. For n = 11, $m_K = \lfloor 3.8 \rfloor = 3$ and $m_{HS} = \mathrm{round}(4.8) = 5$; for n = 41, $m_K = \lfloor 9.2 \rfloor = 9$ and
Figure 13. Empirical power curves, comparing n = 11 (left column) and n = 41 (right column). 50,000 replications each, p = 0.5, α = 0.05, and the distribution is Cauchy (t(1)). Each panel plots the empirical null rejection probability against the absolute deviation from H0 for the new method, HS, and B. Top row: one-sample. Bottom row: two-sample.
$m_{HS} = \mathrm{round}(11.5) = 12$. For the Cauchy distribution, $f(x) = \pi^{-1}(1+x^2)^{-1}$, $f(0) = \pi^{-1}$, $f'(0) = 0$, and $f''(0) = -2\pi^{-1}$, so
\[
\frac{3(f')^2 - ff''}{6f^4} = \pi^2/3.
\]
Taking $\phi(1.96 + 1.96) \approx 0$ and $\phi(0) = (2\pi)^{-1/2}$, the new method's type II error has $m^{-1}$ term approximately $(0.751)m_K^{-1}$, while the $m^{-1}$ term for HS is approximately zero. The $(m/n)^2$ term is approximately $(2.57)(m/n)^2$. This means that, up to $o(m^{-1} + m^2/n^2)$ terms, the type II error difference is
\[
e_{II}^K - e_{II}^{HS} = m_K^{-1}(0.751) + (m_K^2 - m_{HS}^2)(2.57/n^2).
\]
Plugging in n = 11, $m_K = 3$, and $m_{HS} = 5$, $e_{II}^K - e_{II}^{HS} = -0.090 < 0$, meaning better power for the new method. With n = 41, $m_K = 9$, and $m_{HS} = 12$, $e_{II}^K - e_{II}^{HS} = -0.013 < 0$, again better power for the new method. To compare with Figure 13, note that $\gamma/\sqrt{n} = (C/f(0))\sqrt{p(1-p)/n} \approx 0.9$ for n = 11 and $\gamma/\sqrt{n} \approx 0.5$ for n = 41.
The analytic differences from the m−1 and (m/n)2 terms are still smaller than the
differences in the top row of Figure 13; with small samples, the o(m−1 +m2/n2) terms
also contribute significantly.
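The arithmetic behind these two comparisons can be reproduced directly; a sketch (function name mine):

```python
import math

def norm_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def e2_diff_cauchy(n, m_K, m_HS, C=1.96, z=1.959964):
    """e_II^K - e_II^HS up to o(m^{-1} + m^2/n^2) for the Cauchy median:
    the m^{-1} term uses phi(z + C) ~ 0 so its coefficient is
    (1/4) phi(0) z^3 ~ 0.751; the (m/n)^2 coefficient is
    (pi^2/3) z phi(0) ~ 2.57."""
    t1 = 0.25 * norm_pdf(0.0) * z**3 / m_K         # ~ 0.751 / m_K
    c2 = (math.pi**2 / 3) * z * norm_pdf(0.0)      # ~ 2.57
    return t1 + (m_K**2 - m_HS**2) * c2 / n**2

print(round(e2_diff_cauchy(11, 3, 5), 3))   # -0.09
print(round(e2_diff_cauchy(41, 9, 12), 3))  # -0.013
```

Both differences are negative, matching the power advantage for the new method reported above.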
The π2/3 coefficient in the (m/n)2 term, based on the Cauchy PDF and its deriva-
tives, is key to having that term outweigh the m−1 term. With a standard normal
distribution, the coefficient is π/3, smaller by a factor of π; for other distributions
and/or quantiles, the coefficient may be larger.
In the two-sample case, the results shown and discussed were for independent sam-
ples, as the theoretical results assumed. If there is enough negative correlation, size
distortion occurs for all methods; if there is positive correlation, power will decrease
for all methods. The effect of uncorrelated, dependent samples could go either way.
In practice, examining the sample correlation as well as one of the many tests for sta-
tistical independence is recommended, if independence is not known a priori.
7. Conclusion
This paper proposes new one- and two-sample quantile testing procedures based on
the SBG test statistic in (3). Critical values dependent on smoothing parameter m
are derived from fixed-smoothing asymptotics. These are more accurate than the
conventional standard normal critical values since the fixed-m distribution is shown
to be more accurate under both fixed-m and large-m asymptotics. Type I and II
errors are approximated using an Edgeworth expansion, and the testing-optimal mK
minimizes type II error subject to control of type I error, up to higher-order terms.
Simulations show that, compared with the previous SBG methods from HS88 and
Bofinger (1975), the new method greatly reduces size distortion while maintaining
good power. The new method also outperforms bootstrap methods for one-sample
tests and certain two-sample cases. Consequently, this new method is recommended
in one-sample cases when Hutson (1999) is not computable. In two-sample cases
when Goldman & Kaplan (2012) is not computable, performance depends on the
unknown distributions and warrants further research into both bootstrap methods
and the two-sample plug-in method (particularly estimating θ) proposed here. Finally,
theoretical justification for fixed-smoothing asymptotics is provided outside of the
time series context; there are likely additional models that may benefit from this
perspective.
References
[1] Bickel, Peter J. & Anat Sakov, 'On the choice of m in the m out of n bootstrap and confidence bounds for extrema.' Statistica Sinica, 18 (3), pp. 967–985, 2008.
[2] Bloch, Daniel A. & Joseph L. Gastwirth, 'On a Simple Estimate of the Reciprocal of the Density Function.' The Annals of Mathematical Statistics, 39 (3), pp. 1083–1085, 1968.
[3] Bofinger, Eve, 'Estimation of a density function using order statistics.' Australian Journal of Statistics, 17 (1), pp. 1–7, 1975.
[4] Brown, B. M., Peter Hall & G. A. Young, 'The smoothed median and the bootstrap.' Biometrika, 88 (2), pp. 519–534, 2001.
[5] Chernozhukov, Victor & Ivan Fernandez-Val, 'Inference for Extremal Conditional Quantile Models, with an Application to Market and Birthweight Risks.' The Review of Economic Studies, 78 (2), pp. 559–589, 2011.
[6] DasGupta, Anirban, Asymptotic Theory of Statistics and Probability, New York: Springer, p. 3, 2008.
[7] David, Herbert Aron, Order Statistics, Wiley Series in Probability and Mathematical Statistics, Wiley, 2nd edition, 1981.
[8] Goh, S. Chuan & Keith Knight, 'Nonstandard Quantile-Regression Inference.' Econometric Theory, 25 (5), pp. 1415–1432, 2009.
[9] Goh, S. Chuan, 'Smoothing choices and distributional approximations for econometric inference.' Ph.D. dissertation, UC-Berkeley, 2004.
[10] Goldman, Matt & David M. Kaplan, 'IDEAL Quantile Inference via Interpolated Duals of Exact Analytic L-statistics.' 2012. Working paper.
[11] Hall, Peter & Simon J. Sheather, 'On the Distribution of a Studentized Quantile.' Technical Report 71, Department of Statistics, Pennsylvania State University, 1987.
[12] ———, 'On the Distribution of a Studentized Quantile.' Journal of the Royal Statistical Society, Series B (Methodological), 50 (3), pp. 381–391, 1988.
[13] Ho, Yvonne H. S. & Stephen M. S. Lee, 'Iterated Smoothed Bootstrap Confidence Intervals for Population Quantiles.' The Annals of Statistics, 33 (1), pp. 437–462, 2005.
[14] Hutson, Alan D., 'Calculating nonparametric confidence intervals for quantiles using fractional order statistics.' Journal of Applied Statistics, 26 (3), pp. 343–353, 1999.
[15] Kaplan, David M., 'Inference on quantiles: confidence intervals, p-values, and testing.' MATLAB Central File Exchange, http://www.mathworks.com/matlabcentral/fileexchange/32410-quantile-inference-testing-univariate-and-bivariate, August 2011. Retrieved 25 March 2012.
[16] ———, 'IDEAL inference on conditional quantiles.' 2012. Working paper.
[17] Koenker, Roger & Zhijie Xiao, 'Inference on the Quantile Regression Process.' Econometrica, 70 (4), pp. 1583–1612, 2002.
[18] Mosteller, F., 'On Some Useful "Inefficient" Statistics.' Annals of Mathematical Statistics, 17, pp. 377–408, 1946.
[19] Siddiqui, Mohammed M., 'Distribution of quantiles in samples from a bivariate population.' J. Research of the NBS, B. Math. and Math. Physics, 64B (3), pp. 145–150, 1960.
[20] Sun, Yixiao, Peter C. B. Phillips & Sainan Jin, 'Optimal Bandwidth Selection in Heteroskedasticity-Autocorrelation Robust Testing.' Econometrica, 76 (1), pp. 175–194, 2008.

Appendix A. Accuracy of approximation of fixed-m critical values

The approximate fixed-m critical value in (7) is quite accurate for all but m = 1, 2, as Table 5 shows. The second approximation alternative adds the $O(m^{-2})$ term to the approximation. The third alternative uses the critical value from the Student's t-distribution with the degrees of freedom chosen to match the variance. To compare, for various m and α, I simulated the two-sided rejection probability for a given critical value, taking $T_{m,\infty}$ from (4) as the true distribution. One million simulation replications per m and α were run; to gauge simulation error, I also include in the table the critical values given in Goh (2004) (who ran 500,000 replications each). Additional values of m and α are available in the online appendix.

Table 5. Simulated rejection probabilities (%) for different fixed-m critical value approximations, α = 5%.

      rejection probability for critical value
m     Goh (2004, simulated)   including m⁻¹   including m⁻²   t
1     4.93                    8.21            5.84            6.87
2     5.03                    6.25            5.20            5.47
3     5.05                    5.70            5.10            5.28
4     5.15                    5.46            5.06            5.21
5     4.92                    5.23            4.96            5.08
10    5.01                    5.14            5.06            5.14
20    5.04                    5.01            4.99            5.04
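The one-term approximation from (7), $z_{\alpha,m} = z_{1-\alpha/2} + z_{1-\alpha/2}^3/(4m)$, is easy to tabulate against the standard normal value; a sketch:

```python
def z_fixed_m(m, z=1.959964):
    """O(1/m)-corrected fixed-m critical value from (7): z + z^3 / (4m)."""
    return z + z**3 / (4 * m)

for m in (1, 2, 5, 10, 20):
    print(m, round(z_fixed_m(m), 3))
# The correction shrinks like 1/m, so the value approaches 1.96 as m grows;
# Table 5 shows this one-term version is accurate for all but m = 1, 2.
```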
Appendix B. Sign of u3,0(z) for common distributions
Here, I only show the work for the normal distribution; the method is the same for
the other distributions. Work for t-, exponential, χ2, and Frechet distributions is in
the supplemental appendix. For the uniform distribution, u3,0 = 0 since f ′ = 0 and
f ′′ = 0.
Starting with the pdf and derivatives of the normal distribution,
f_n(x) = (1/(σ√(2π))) e^{-(x-µ)^2/(2σ^2)},    f_n'(x) = -((x-µ)/σ^2) f_n(x),

f_n''(x) = -f_n(x)/σ^2 - f_n'(x)(x-µ)/σ^2 = -f_n(x)/σ^2 + ((x-µ)^2/σ^4) f_n(x) = f_n(x)[(x-µ)^2/σ^4 - 1/σ^2],

3[f_n'(x)]^2 - f_n(x)f_n''(x) = 3[-((x-µ)/σ^2) f_n(x)]^2 - [f_n(x)]^2[(x-µ)^2/σ^4 - 1/σ^2] = [f_n(x)]^2{2(x-µ)^2/σ^4 + 1/σ^2} ≥ 0,
for any µ and σ2. Thus the sign of u3,0(z) is always the sign of z for the normal
distribution at any quantile.
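The closed form above is easy to spot-check numerically; this sketch (mine, stdlib only) compares finite-difference derivatives of the standard normal pdf (µ = 0, σ = 1) against f(x)^2[2(x-µ)^2/σ^4 + 1/σ^2]:

```python
import math

def f(x):
    # standard normal pdf (mu = 0, sigma = 1)
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def d1(x, h=1e-5):
    # central first difference for f'(x)
    return (f(x + h) - f(x - h)) / (2.0 * h)

def d2(x, h=1e-4):
    # central second difference for f''(x)
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

grid = [-4.0, -2.0, -0.5, 0.0, 0.5, 1.0, 3.0]
lhs = [3.0 * d1(x)**2 - f(x) * d2(x) for x in grid]   # 3 f'^2 - f f''
rhs = [f(x)**2 * (2.0 * x * x + 1.0) for x in grid]   # closed form derived above
```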
Appendix C. Calculation of type II error (Proposition 5)
The type II error is the probability of not rejecting when H0 is false. For a two-sided
symmetric test, this is
P (|Tm,n| < z) = P (Tm,n < z)− P (Tm,n < −z).
Letting the corrected critical value be z_{α,m} = z_{1-α/2} + z_{1-α/2}^3/(4m) as in (7), and expanding via (11) and (10), for C > 0,

P(|T_{m,n}| < z_{α,m}) = L^+ + H_1^+ - H_2^+ + O(n^{-1/2}) + o(m^{-1} + (m/n)^2), with

L^+ = Φ(z_{α,m} + C) - Φ(-z_{α,m} + C),
H_1^+ = φ(z_{α,m} + C)[m^{-1}u_{2,γ}(z_{α,m} + C) + (m/n)^2 u_{3,γ}(z_{α,m} + C)],
H_2^+ = φ(-z_{α,m} + C)[m^{-1}u_{2,γ}(C - z_{α,m}) + (m/n)^2 u_{3,γ}(C - z_{α,m})].
I write O(n^{-1/2}) for the n^{-1/2} terms since they do not depend on m and thus do not affect the optimization problem for selecting m. Define L^-, H_1^-, and H_2^- similarly but with -C < 0 instead of C > 0, and thus -γ = -C√(p(1-p))/f(ξ_p) instead of γ:

L^- = Φ(z_{α,m} - C) - Φ(-z_{α,m} - C),
H_1^- = φ(z_{α,m} - C)[m^{-1}u_{2,-γ}(z_{α,m} - C) + (m/n)^2 u_{3,-γ}(z_{α,m} - C)],
H_2^- = φ(-z_{α,m} - C)[m^{-1}u_{2,-γ}(-C - z_{α,m}) + (m/n)^2 u_{3,-γ}(-C - z_{α,m})].
I calculate average power where the alternatives +C and -C each have 0.5 probability. Using φ(-x) = φ(x),

P(|T_{m,n}| < z_{α,m}) = (1/2){(L^+ + L^-) + (H_1^+ + H_1^-) - (H_2^+ + H_2^-) + O(n^{-1/2}) + o(m^{-1} + (m/n)^2)}.    (17)
For the first-order term L^+,

Φ(z_{α,m} + C) - Φ(-z_{α,m} + C) = Φ(C + z_{1-α/2} + z_{1-α/2}^3/(4m)) - Φ(C - z_{1-α/2} - z_{1-α/2}^3/(4m))
= 0.5 + m^{-1}(1/4)z_{1-α/2}^3[φ(z_{1-α/2} + C) + φ(C - z_{1-α/2})]    (18)
since in Proposition 5, C solves 0.5 = Φ(z_{1-α/2} + C) - Φ(-z_{1-α/2} + C). Since the fixed-m critical value is larger than the standard normal critical value, this term contributes additional type II error in the m^{-1} term. Similarly,
L^- = Φ(z_{α,m} - C) - Φ(-z_{α,m} - C)
= 0.5 + m^{-1}(1/4)z_{1-α/2}^3[φ(C - z_{1-α/2}) + φ(-z_{1-α/2} - C)] = L^+.
Within the m^{-1} and (m/n)^2 terms, anything o(1) will end up in the o(m^{-1} + (m/n)^2) remainder. Thus, for those terms,

z_{α,m} = z_{1-α/2} + O(m^{-1}),    φ(z_{α,m} + C) = φ(z_{1-α/2} + C) + O(m^{-1}).
For the m^{-1} terms, since φ(x) = φ(-x) and C ≡ γf(ξ_p)/√(p(1-p)), and letting d_1 ≡ z_{α,m} + C and d_2 ≡ z_{α,m} - C,

u_{2,γ}(d_1) - u_{2,-γ}(-d_1)
= -(1/4)d_1^3 - (1/4)C^2 d_1 + (1/4)2C d_1^2 - [-(1/4)(-d_1)^3 - (1/4)(-C)^2(-d_1) + (1/4)2(-C)(-d_1)^2]
= -(1/2)(d_1^3 + C^2 d_1 - 2C d_1^2) = -(1/2)(z_{1-α/2} + C)z_{1-α/2}^2 + O(m^{-1}),

u_{2,-γ}(d_2) - u_{2,γ}(-d_2) = -(1/2)(d_2^3 + C^2 d_2 + 2C d_2^2) = -(1/2)(z_{1-α/2} - C)z_{1-α/2}^2 + O(m^{-1}),

φ(z_{α,m} + C)m^{-1}[u_{2,γ}(z_{α,m} + C) - u_{2,-γ}(-z_{α,m} - C)]
+ φ(z_{α,m} - C)m^{-1}[u_{2,-γ}(z_{α,m} - C) - u_{2,γ}(-z_{α,m} + C)]
= -(1/2)m^{-1}[φ(z_{1-α/2} + C)(z_{1-α/2} + C)z_{1-α/2}^2 + φ(z_{1-α/2} - C)(z_{1-α/2} - C)z_{1-α/2}^2] + O(m^{-2}).    (19)
For the (m/n)^2 terms, writing f(ξ_p) and its derivatives as f, f', f'', and letting M ≡ [3(f')^2 - ff'']/(6f^4),

u_{3,γ}(z_{α,m} + C) - u_{3,-γ}(-z_{α,m} - C) = M(z_{α,m} + C - C) - M(-z_{α,m} - C - (-C)) = 2Mz_{1-α/2} + O(m^{-1}),
u_{3,-γ}(z_{α,m} - C) - u_{3,γ}(-z_{α,m} + C) = M(z_{α,m} - C - (-C)) - M(-z_{α,m} + C - C) = 2Mz_{1-α/2} + O(m^{-1}),

(m/n)^2 {φ(z_{α,m} + C)[u_{3,γ}(z_{α,m} + C) - u_{3,-γ}(-z_{α,m} - C)] + φ(z_{α,m} - C)[u_{3,-γ}(z_{α,m} - C) - u_{3,γ}(-z_{α,m} + C)]}
= (m/n)^2 · 2 · ([3(f')^2 - ff'']/(6f^4)) z_{1-α/2}[φ(z_{1-α/2} + C) + φ(z_{1-α/2} - C)] + O(m/n^2).    (20)
Combining (18), (19), and (20), (17) becomes

P(|T_{m,n}| < z_{α,m}) = (1/2){2(0.5 + m^{-1}(1/4)z_{1-α/2}^3[φ(C + z_{1-α/2}) + φ(C - z_{1-α/2})])
- 2(1/4)m^{-1}[φ(z_{1-α/2} + C)(z_{1-α/2} + C)z_{1-α/2}^2 + φ(z_{1-α/2} - C)(z_{1-α/2} - C)z_{1-α/2}^2]
+ 2(m/n)^2 ([3(f')^2 - ff'']/(6f^4)) z_{1-α/2}[φ(z_{1-α/2} + C) + φ(z_{1-α/2} - C)]}
+ O(n^{-1/2}) + o(m^{-1} + (m/n)^2),
which simplifies to the forms given in Proposition 5.
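As a numerical illustration of the pieces above (function and variable names are mine), the following stdlib-only sketch solves for C as in Proposition 5 and evaluates the first-order term plus the m^{-1} and (m/n)^2 corrections:

```python
import math

def Phi(x):
    # standard normal cdf via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):
    # standard normal pdf
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

z = 1.959963984540054  # z_{1-alpha/2} for alpha = 0.05

# Solve 0.5 = Phi(z + C) - Phi(-z + C) for C > 0 by bisection
lo, hi = 0.0, 5.0
for _ in range(200):
    c = 0.5 * (lo + hi)
    if Phi(z + c) - Phi(-z + c) > 0.5:
        lo = c
    else:
        hi = c
C = 0.5 * (lo + hi)

def type2(m, n, M):
    """0.5 plus the m^{-1} and (m/n)^2 terms of P(|T_{m,n}| < z_{alpha,m});
    M = [3 f'^2 - f f'']/(6 f^4) evaluated at xi_p."""
    h1 = (z**3 * (phi(C + z) + phi(C - z))
          - z**2 * (phi(z + C) * (z + C) + phi(z - C) * (z - C))) / (4.0 * m)
    h2 = (m / n)**2 * M * z * (phi(z + C) + phi(z - C))
    return 0.5 + h1 + h2
```

At the standard normal median, M = π/3; both correction terms are positive, so e.g. type2(15, 100, math.pi/3) exceeds the first-order value 0.5.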
Appendix D. Proof sketch of Edgeworth expansion (Theorem 1) of
SBG test statistic under local alternative hypothesis
This is a proof sketch comparable in detail to Hall & Sheather (1988, "HS88" hereafter); the full proof is in the supplemental appendix. The structure here follows HS88, taking their results as given (proofs are in my full version). Full references for citations found only in this proof are in HS88. This proof of Theorem 1 uses the same setup and definitions as in the main text.
As in Section 2, the null hypothesis is H0 : ξp = β, and the true ξp = β − γ/√n. I
continue from (11), which showed that
P(T_{m,n} < z) = P( [√n(X_{n,r} - ξ_p) + γ(S_{m,n}/S_0 - 1)] / [S_{m,n}√(p(1-p))] < z + C ),

where C ≡ γf(ξ_p)/√(p(1-p)) and S_0 ≡ 1/f(ξ_p). I want to derive a higher-order expansion around the (shifted) standard normal distribution.
As in HS88, I first deal with centering X_{n,r} = X_{n,⌊np⌋+1} since, as n increases, ⌊np⌋ increases in discrete steps of one. So instead of X_{n,r} - ξ_p there is X_{n,r} - η_p, where

η_p ≡ F^{-1}[exp(-Σ_{j=r}^{n} j^{-1})].    (21)
This centering effect is no different when γ ≠ 0; HS88 (p. 389) conclude that the effect is, defining w_n ≡ [ε_n - 1 + (1/2)(1-p)]/[p(1-p)]^{1/2},

n^{1/2}(X_{n,r} - ξ_p)/τ = n^{1/2}(X_{n,r} - η_p)/τ + n^{-1/2}w_n + O_p[n^{-1/2}m^{-1/2} + (m/n)^2 n^{-1/2}].

The remainder n^{-1/2}m^{-1/2} + (m/n)^2 n^{-1/2} = o[m^{-1} + (m/n)^2], and so

P[n^{1/2}(X_{n,r} - ξ_p)/τ ≤ z] = P[n^{1/2}(X_{n,r} - η_p)/τ ≤ z - n^{-1/2}w_n] + O(n^{-3/2})
= P[n^{1/2}(X_{n,r} - η_p)/τ ≤ z] - n^{-1/2}w_n φ(z) + O(n^{-1}),

P[ (n^{1/2}(X_{n,r} - ξ_p) + γ(S_{m,n}/S_0 - 1))/τ ≤ z ] = P[ (n^{1/2}(X_{n,r} - η_p) + γ(S_{m,n}/S_0 - 1))/τ ≤ z ] - n^{-1/2}w_n φ(z) + o[m^{-1} + (m/n)^2].    (22)
As HS88 note, this is the so-called ‘delta method’ for deducing Edgeworth expansions;
see Bhattacharya and Ghosh (1978, p. 438). Thus, instead of proving (10), the theorem follows by proving
sup_{-∞<z<∞} | P[ (n^{1/2}(X_{n,r} - η_p) + γ(S_{m,n}/S_0 - 1))/τ ≤ z ] - [Φ(z) + n^{-1/2}u_{1,γ}^†(z)φ(z) + m^{-1}u_{2,γ}(z)φ(z) + (m/n)^2 u_{3,γ}(z)φ(z)] | = o[m^{-1} + (m/n)^2],    (23)

where

u_{1,γ}^†(z) ≡ (1/6)(p/(1-p))^{1/2}((1+p)/p)(z^2 - 1) - (γf(ξ_p)/p)(1 - pf'(ξ_p)/[f(ξ_p)]^2 - 1/(2(1-p)))z - (1/2)(p/(1-p))^{1/2}(1 + (1-p)f'(ξ_p)/[f(ξ_p)]^2)z^2,

u_{2,γ}(z) ≡ (1/4)[ 2γf(ξ_p)z^2/[p(1-p)]^{1/2} - γ^2[f(ξ_p)]^2 z/(p(1-p)) - z^3 ],

u_{3,γ}(z) ≡ ([3[f'(ξ_p)]^2 - f(ξ_p)f''(ξ_p)]/(6[f(ξ_p)]^4))(z - γf(ξ_p)/[p(1-p)]^{1/2}).
Now to prove the above. As defined in HS88, let

G ≡ F^{-1}, g ≡ G', and H(x) ≡ F^{-1}(e^{-x}),    (24)

and let W_1, W_2, ... be independent exponential random variables with unit mean. The sequence {X_{n,s}}_{s=1}^{n} has the same joint distribution as {H(Σ_{s≤j≤n} j^{-1}W_j)}_{s=1}^{n} (e.g. David, 1981, p. 21), and in a slight abuse of notation suggested by HS88, write X_{n,s} = H(Σ_{s≤j≤n} j^{-1}W_j). Let V_j ≡ W_j - 1,

Δ_1 ≡ Σ_{j=r-m}^{r-1} j^{-1}V_j,  Δ_2 ≡ Σ_{j=r}^{r+m-1} j^{-1}V_j,  Δ_3 ≡ Σ_{j=r+m}^{n} j^{-1}V_j,  a_k ≡ H^{(k)}(Σ_{j=r}^{n} j^{-1}),

Z ≡ [p(1-p)]^{1/2}[n^{1/2}(X_{n,r} - η_p) + γ(S_{m,n}/S_0 - 1)]/τ    (25)
= [n^{1/2}(X_{n,r} - η_p) + γ(S_{m,n}/S_0 - 1)][(n/2m)(X_{n,r+m} - X_{n,r-m})]^{-1}.
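The distributional identity X_{n,s} = H(Σ_{s≤j≤n} j^{-1}W_j) is easy to check by simulation. A small sketch (mine, stdlib only) for the standard exponential F, where H(x) = -log(1 - e^{-x}) and the 5th order statistic of n = 9 has exact mean Σ_{j=5}^{9} 1/j ≈ 0.7456:

```python
import math
import random

random.seed(12345)

def H(x):
    # H(x) = F^{-1}(e^{-x}) for standard exponential F, i.e. -log(1 - e^{-x})
    return -math.log1p(-math.exp(-x))

n, s = 9, 5
reps = 100_000

mean_direct = 0.0
mean_repr = 0.0
for _ in range(reps):
    # direct: s-th smallest of an i.i.d. exponential sample
    mean_direct += sorted(random.expovariate(1.0) for _ in range(n))[s - 1]
    # representation: H(sum_{j=s}^{n} W_j / j) with W_j i.i.d. unit exponential
    mean_repr += H(sum(random.expovariate(1.0) / j for j in range(s, n + 1)))
mean_direct /= reps
mean_repr /= reps
```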
Note that Δ_1 and Δ_2 are O_p(n^{-1}m^{1/2}) and Δ_3 is O_p(n^{-1/2}). With the above and Taylor expansions, Z = Y + R, where

Y ≡ -pn^{1/2}[(1 + δ)(Δ_2 + Δ_3) + B],  δ ≡ -(m/n)^2 g''(p)[6g(p)]^{-1},    (26)

B ≡ δΨ + (n/m)(b_1Δ_1 + b_2Δ_2)Δ_2 + (n/m)b_3(Δ_1 + Δ_2)(Δ_3 + Ψ) + (n/m)^2 b_4(Δ_1 + Δ_2)^2(Δ_3 + Ψ) + b_5Δ_3(Δ_3 + 2Ψ),    (27)

b_1 ≡ -p/2,  b_2 ≡ -p/2,  b_3 ≡ -p/2 - (m/n)(a_2/2a_1),  b_4 ≡ (p/2)^2,  b_5 ≡ -a_2/2a_1,  Ψ ≡ γ/(pg(p)√n), and

R = O_p[n^{-1/2}m^{-1/2} + n^{-3/2}m + m^{-3/2} + (m/n)^{2+ε}].    (28)
Here, Y is of the same form as in HS88, but with Ψ now additionally appearing in the higher-order B terms. In the definition of Z, γ appears only in the numerator, not in the denominator. From the stochastic expansion of S_{m,n}, which is already required for the denominator of Z, S_{m,n}/S_0 = 1 + ν, where ν contains the higher-order terms that are dropped for the first-order asymptotic result:

ν = (n/2m)p(Δ_1 + Δ_2) + (a_2/2a_1)(Δ_1 + Δ_2 + 2Δ_3) + (m/n)^2 g''(p)/(6g(p)) + O_p((m/n)^{2+ε} + n^{-1/2}m^{-1/2} + mn^{-3/2}).    (29)
Thus, γ enters only in higher-order terms, through the numerator of Z.
Note the Taylor expansion, with all sums over some common range:

H(Σ W_j/j) - H(Σ 1/j) = (Σ V_j/j)H'(Σ 1/j) + (1/2)(Σ V_j/j)^2 H''(Σ 1/j) + ... .    (30)
For the numerator of Z, applying the expansion above yields

X_{n,r} - η_p = (Σ_{j=r}^{n} V_j/j)H'(Σ_{j=r}^{n} 1/j) + (1/2)(Σ_{j=r}^{n} V_j/j)^2 H''(Σ_{j=r}^{n} 1/j) + O_p(n^{-3/2})a_3
= (Δ_2 + Δ_3)a_1 + (1/2)(Δ_2 + Δ_3)^2 a_2 + O_p(n^{-3/2}),    (31)
since X_{n,r} = H(Σ_{j≥r} W_j/j), η_p ≡ H(Σ_{j=r}^{n} 1/j), and a_3 = O(1). Including γ as described earlier,

n^{1/2}(X_{n,r} - η_p) + γ(S_{m,n}/S_0 - 1) = √n[(Δ_2 + Δ_3)a_1 + (1/2)(Δ_2 + Δ_3)^2 a_2 + O_p(n^{-3/2})] + γν.    (32)
The denominator of Z does not include γ, so the stochastic expansion is identical to the intermediate results from HS88:

X_{n,r+m} - H(Σ_{j=r+m}^{n} j^{-1}) - {X_{n,r-m} - H(Σ_{j=r-m}^{n} j^{-1})}
= -a_1(Δ_1 + Δ_2) - (m/np)a_2(Δ_1 + Δ_2 + 2Δ_3) + O_p(n^{-3/2}m^{1/2} + n^{-5/2}m^2),    (33)

H(Σ_{j=r+m}^{n} j^{-1}) - H(Σ_{j=r-m}^{n} j^{-1}) = (2m/n)[g(p) + (1/6)(m/n)^2 g''(p) + O{(m/n)^{2+ε} + n^{-1}}].    (34)
When γ = 0, plugging in (31), (33), and (34) yields

Z = -pn^{1/2}( (Δ_2 + Δ_3)(-a_1/(pg(p))) - (1/(2pg(p)))(Δ_2 + Δ_3)^2 a_2 + O_p(n^{-3/2}) )
× [ (n/2m)( -(a_1/g(p))(Δ_1 + Δ_2) - (m/n)(a_2/(pg(p)))(Δ_1 + Δ_2 + 2Δ_3) + O_p(n^{-3/2}m^{1/2} + n^{-5/2}m^2) ) + 1 + (m/n)^2 g''(p)/(6g(p)) + O{(m/n)^{2+ε} + n^{-1}} ]^{-1}
≡ -pn^{1/2}Θ(1 + ν)^{-1} = -pn^{1/2}Θ(1 - ν + ν^2 + O(ν^3)),    (35)

defining ν (same as above) and Θ for ease of notation.
If (32) (with the new γ term) is used instead of (31), and noting that above I divided through by g(p) in addition to factoring -pn^{1/2} out of the numerator,

Z = [-pn^{1/2}Θ + γν/g(p)][1 + ν]^{-1} = -pn^{1/2}(Θ - Ψν)(1 + ν)^{-1}
= -pn^{1/2}[Θ - (Θ + Ψ)ν + (Θ + Ψ)ν^2 + O(n^{-1/2}ν^3)],    (36)

with Ψ ≡ γ/(pg(p)√n) as in (27). Θ and Ψ are both O_p(n^{-1/2}), and ν^3 = O_p(m^{-3/2}), which is in R, so the remainder works out.
As given in HS88, with restrictions n^η ≤ m ≤ n^{1-η} and 0 < ε ≤ 1/6, (28) implies

R = O_p{[m^{-1} + (m/n)^2]n^{-εη}},    (37)

which they state may be strengthened to

P{|R| > [m^{-1} + (m/n)^2]n^{-ζ}} = O(n^{-λ}), for all 0 < ζ < εη and all λ > 0,

since by Markov's and Rosenthal's inequalities (Burkholder, 1973, p. 40),

Σ_{i=1}^{2} P(|nm^{-1/2}Δ_i| > n^ε) + P(|n^{1/2}Δ_3| > n^ε) = O(n^{-λ}), for all ε > 0 and all λ > 0.
Thus, the theorem follows from proving the result for Y rather than for Z. As HS88 put it, "Since Δ_1, Δ_2 and Δ_3 are independent and have smooth distributions, it is elementary (but tedious^{10}) to estimate the difference between exp(-t^2/2) and the characteristic function of Y, and then derive an Edgeworth expansion by following classical arguments, as in Petrov (1975) or Bhattacharya and Ghosh (1978). To simply identify terms in the expansion here, only moments of Y are needed," the result of which is, as stated in (3.2) of HS88 (it turns out that the difference with γ ≠ 0 is that B contains more terms, but B enters these equations the same way):

E[(-p^{-1}Y)^ℓ] = z_1(ℓ) + z_2(ℓ) + z_3(ℓ) + O{m^{-3/2} + m^{-1/2}(m/n)^2},    (38)

z_1 ≡ n^{ℓ/2} E[{(1 + δ)(Δ_2 + Δ_3)}^ℓ],  z_2 ≡ ℓ n^{ℓ/2} E{(Δ_2 + Δ_3)^{ℓ-1} B},  z_3 ≡ (1/2)ℓ(ℓ-1) n^{ℓ/2} E{(Δ_2 + Δ_3)^{ℓ-2} B^2}.
With the new Ψ terms, and letting D_i ≡ n^{1/2}Δ_i to follow the notation of HS88,

z_2 = ℓ{ n^{-1/2}[p^{-2}b_2 E(D_3^{ℓ-1}) + 2b_5 E(D_3^ℓ)Ψ√n] + 2m^{-1}p^{-2}b_4[E(D_3^ℓ) + E(D_3^{ℓ-1})Ψ√n] + n^{-1/2}b_5 E(D_3^{ℓ+1}) + δE(D_3^{ℓ-1})Ψ√n }
+ ℓ(ℓ-1)n^{-1/2}p^{-2}b_3( E(D_3^{ℓ-1}) + E(D_3^{ℓ-2})Ψ√n ) + O(mn^{-3/2} + m^{-1/2}n^{-1/2}),    (39)

z_3 = ℓ(ℓ-1)m^{-1}b_3^2 p^{-2}[ E(D_3^ℓ) + 2E(D_3^{ℓ-1})Ψ√n + E(D_3^{ℓ-2})Ψ^2 n ] + O(m^{-1/2}n^{-1/2} + m^{-3/2} + (m/n)^4).    (40)
Noting that b_3 = b_2 + O(m/n) = -(p/2) + O(m/n) and b_3^2 = b_4 + O(m/n) = (p/2)^2 + O(m/n),

z_2 + z_3 = ℓ{ n^{-1/2}[p^{-2}b_2 E(D_3^{ℓ-1}) + 2b_5 E(D_3^ℓ)Ψ√n] + 2m^{-1}p^{-2}b_4[E(D_3^ℓ) + E(D_3^{ℓ-1})Ψ√n] + n^{-1/2}b_5 E(D_3^{ℓ+1}) + δE(D_3^{ℓ-1})Ψ√n }
+ ℓ(ℓ-1)n^{-1/2}p^{-2}b_3( E(D_3^{ℓ-1}) + E(D_3^{ℓ-2})Ψ√n )
+ ℓ(ℓ-1)m^{-1}b_3^2 p^{-2}[ E(D_3^ℓ) + 2E(D_3^{ℓ-1})Ψ√n + E(D_3^{ℓ-2})Ψ^2 n ]
+ O(mn^{-3/2} + m^{-1/2}n^{-1/2} + m^{-3/2} + (m/n)^4)

= n^{-1/2}ℓ{ -(ℓ-1)(2p)^{-1}Ψ√n E(D_3^{ℓ-2}) - ℓ(2p)^{-1}E(D_3^{ℓ-1}) - (a_2/a_1)E(D_3^ℓ)Ψ√n - (a_2/2a_1)E(D_3^{ℓ+1}) }
+ m^{-1}ℓ{ ((ℓ-1)/4)E(D_3^{ℓ-2})Ψ^2 n + (ℓ/2)Ψ√n E(D_3^{ℓ-1}) + ((ℓ+1)/4)E(D_3^ℓ) }
+ δℓE(D_3^{ℓ-1})Ψ√n + O(mn^{-3/2} + m^{-1/2}n^{-1/2} + m^{-3/2} + (m/n)^4).    (41)

10 Again, please see full proof on my website for every "tedious" step.
Plugging in z_2 + z_3, (38) becomes

E[(-p^{-1}Y)^ℓ] = E[{(1 + δ)(D_2 + D_3)}^ℓ]
+ n^{-1/2}ℓ{ -(ℓ-1)(2p)^{-1}Ψ√n E(D_3^{ℓ-2}) - ℓ(2p)^{-1}E(D_3^{ℓ-1}) - (a_2/a_1)E(D_3^ℓ)Ψ√n - (a_2/2a_1)E(D_3^{ℓ+1}) }
+ m^{-1}ℓ{ ((ℓ-1)/4)E(D_3^{ℓ-2})Ψ^2 n + (ℓ/2)Ψ√n E(D_3^{ℓ-1}) + ((ℓ+1)/4)E(D_3^ℓ) }
+ δℓE(D_3^{ℓ-1})Ψ√n + o(m^{-1} + (m/n)^2).    (42)
As in HS88, let

K ≡ [p(1-p)]^{-1/2}Y,  L ≡ -[p/(1-p)]^{1/2}(1 + δ)(D_2 + D_3),    (43)

and as they state,

E(D_3^{2k}) = [(1-p)/p]^k (2k)!(k!2^k)^{-1} + O(n^{-1}),  E(D_3^{2k+1}) = O(n^{-1/2}),    (44)

where D_3 is asymptotically N(0, (1-p)/p). Multiplying equation (42) by {-[p/(1-p)]^{1/2}}^ℓ (it)^ℓ/ℓ! and adding over ℓ, the formal expansion is

E(e^{itK}) = E(e^{itL}) + n^{-1/2}α_1(t) + m^{-1}α_2(t) + δα_3(t) + o[m^{-1} + (m/n)^2],    (45)
where

α_1(t) ≡ e^{-t^2/2}[ ((1/2) - b_5(1-p))(p(1-p))^{-1/2}((it)^3 + (it)) + (it)^2 Ψ√n (2b_5 - 1/(2(1-p))) ],

α_2(t) ≡ e^{-t^2/2}(1/4)[ (it)^4 + (it)^2(3 + Ψ^2 n p/(1-p)) - 2((it)^3 + (it))Ψ√n (p/(1-p))^{1/2} ],

α_3(t) ≡ e^{-t^2/2}(it)( -Ψ√n (p/(1-p))^{1/2} ),

which are, respectively, the Fourier-Stieltjes transforms of

a_1(z) ≡ -[ Ψ√n (2b_5 - 1/(2(1-p)))z + ((1/2) - b_5(1-p))(p(1-p))^{-1/2}z^2 ]φ(z),    (46)

a_2(z) ≡ (1/4)[ -Ψ^2 n (p/(1-p))z + 2Ψ√n (p/(1-p))^{1/2}z^2 - z^3 ]φ(z), and    (47)

a_3(z) ≡ [ Ψ√n (p/(1-p))^{1/2} ]φ(z).    (48)
The characteristic function of L is

E(e^{itL}) = {1 + δ(it)^2 - n^{-1/2}(1/6)[p/(1-p)]^{1/2}[(1+p)/p](it)^3} e^{-t^2/2} + O(δ^2 + n^{-1}),    (49)

where the right-hand side is the Fourier-Stieltjes transform of

Φ(z) - δzφ(z) + n^{-1/2}(1/6)[p/(1-p)]^{1/2}[(1+p)/p](z^2 - 1)φ(z) + O(δ^2 + n^{-1}).    (50)
To explicitly show the characteristic function of L, I take an expansion of the cumulant generating function and plug in the known mean, variance, and third moment, calculating the third cumulant from the third moment. The fourth cumulant is also needed to show the bound on the continuation of the series expansion. Skipping the steps of calculation (found in the full proof), E(L) = 0, E(L^2) = 1 + 2δ + O(δ^2 + n^{-1}), and E(L^3) = -n^{-1/2}(1 + p)p^{-1/2}(1-p)^{-1/2} + O(δ^2 + n^{-1}). Note, then, that O((µ_2' - 1)^2) = O(δ^2). Expanding the log of the characteristic function of L, which is the cumulant generating function,

ln E(e^{itL}) = Σ_{j=1}^{∞} ((it)^j/j!)κ_j    (κ_j ≡ jth cumulant)
= (it)κ_1 - (t^2/2)κ_2 + ((it)^3/6)κ_3 + ((it)^4/4!)κ_4 + ...
= (it)µ_1' - (t^2/2)(µ_2' - (µ_1')^2) + ((it)^3/6)(µ_3' - 3µ_2'µ_1' + 2(µ_1')^3) + ((it)^4/4!)(µ_4' - 4µ_3'µ_1' - 3(µ_2')^2 + 12µ_2'(µ_1')^2 - 6(µ_1')^4) + ...
= 0 - (t^2/2)µ_2' + ((it)^3/6)µ_3' + ((it)^4/4!)(µ_4' - 3(µ_2')^2) + ..., so

E(e^{itL}) = exp{ -(t^2/2)(1 + (µ_2' - 1)) + ((it)^3/6)µ_3' + ((it)^4/4!)(µ_4' - 3(µ_2')^2) + ... },

and plugging in leads to (49).
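The moment-to-cumulant identities used above can be checked on a distribution with known cumulants; for the unit exponential, the raw moments are µ_j' = j! and the cumulants are κ_j = (j-1)!:

```python
import math

# raw moments of the unit exponential: mu'_j = j!
mu = [math.factorial(j) for j in range(5)]

# cumulants from raw moments, exactly as in the expansion above
kappa1 = mu[1]
kappa2 = mu[2] - mu[1]**2
kappa3 = mu[3] - 3 * mu[2] * mu[1] + 2 * mu[1]**3
kappa4 = mu[4] - 4 * mu[3] * mu[1] - 3 * mu[2]**2 + 12 * mu[2] * mu[1]**2 - 6 * mu[1]**4
# known exponential cumulants: kappa_j = (j-1)!, i.e. 1, 1, 2, 6
```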
When simplifying terms, it can easily be verified that the expression in (50),

Φ(z) - δzφ(z) + n^{-1/2}(1/6)(p/(1-p))^{1/2}((1+p)/p)(z^2 - 1)φ(z) = Φ(z) - δzφ(z) + c(z^2 - 1)φ(z) = Φ(z) + δφ'(z) + cφ''(z),

with local shorthand c ≡ n^{-1/2}(1/6)(p/(1-p))^{1/2}((1+p)/p), has Fourier-Stieltjes transform

e^{-t^2/2} + δ(-1)^2(it)^2 e^{-t^2/2} + c(-1)^3(it)^3 e^{-t^2/2} = e^{-t^2/2}(1 + δ(it)^2 - c(it)^3),

matching (49). The proof in HS88 implies that the conditions are such that the remainder will be the same in both equations.
Combining results going back to (45) and noting that b_5 = -a_2/(2a_1) = 1/2 + pg'(p)[2g(p)]^{-1} + O(n^{-1}), the distribution function of K admits the Edgeworth expansion on the RHS of (23):

P(K ≤ z) = Φ(z) + {n^{-1/2}u_{1,γ}^†(z) + m^{-1}u_{2,γ}(z) + (m/n)^2 u_{3,γ}(z)}φ(z) + o{m^{-1} + (m/n)^2},    (51)
where

u_{1,γ}^†(z) ≡ (1/6)(p/(1-p))^{1/2}((1+p)/p)(z^2 - 1) - (γf(ξ_p)/p)(1 - pf'(ξ_p)/[f(ξ_p)]^2 - 1/(2(1-p)))z - (1/2)(p/(1-p))^{1/2}(1 + (1-p)f'(ξ_p)/[f(ξ_p)]^2)z^2,

u_{2,γ}(z) ≡ (1/4)[ 2γf(ξ_p)z^2/[p(1-p)]^{1/2} - γ^2[f(ξ_p)]^2 z/(p(1-p)) - z^3 ], and

u_{3,γ}(z) ≡ ([3[f'(ξ_p)]^2 - f(ξ_p)f''(ξ_p)]/(6[f(ξ_p)]^4))(z - γf(ξ_p)/[p(1-p)]^{1/2}),

and the RHS of (51) is identical to the expansion indicated by (23).
Note that this is for η_p, not ξ_p (hence u_{1,γ}^† instead of u_{1,γ}), but it implies the final result in Theorem 1 when combined with equation (22). To show the final result, the inverse Fourier-Stieltjes transforms can simply be added to get the final answers, since the transform is linear (and consequently its inverse is linear).
Starting at (45), adding the inverse Fourier-Stieltjes transforms of the RHS terms leads to the distribution (cdf) of K:

P(K ≤ z) = Φ(z) - δzφ(z) + n^{-1/2}(1/6){p/(1-p)}^{1/2}{(1+p)/p}(z^2 - 1)φ(z) + O(δ^2 + n^{-1}) + n^{-1/2}a_1(z) + m^{-1}a_2(z) + δa_3(z) + o{m^{-1} + (m/n)^2}

= Φ(z) + δφ(z)( Ψ√n (p/(1-p))^{1/2} - z )
+ n^{-1/2}φ(z)[ (1/6)(p/(1-p))^{1/2}((1+p)/p)(z^2 - 1) - Ψ√n (2b_5 - 1/(2(1-p)))z - ((1/2) - b_5(1-p))(p(1-p))^{-1/2}z^2 ]
+ m^{-1}φ(z)(1/4)[ -Ψ^2 n (p/(1-p))z + 2Ψ√n (p/(1-p))^{1/2}z^2 - z^3 ] + o{m^{-1} + (m/n)^2},

and using δ ≡ -(m/n)^2 g''(p)/(6g(p)) and Ψ = γ/(pg(p)√n),

= Φ(z) + (m/n)^2 (g''(p)/(6g(p)))φ(z)( z - γ/(g(p)[p(1-p)]^{1/2}) )
+ n^{-1/2}φ(z)[ (1/6)(p/(1-p))^{1/2}((1+p)/p)(z^2 - 1) - (γ/(pg(p)))(1 + pg'(p)/g(p) - 1/(2(1-p)))z - (1/2)(p/(1-p))^{1/2}(1 - (1-p)g'(p)/g(p))z^2 ]
+ m^{-1}φ(z)(1/4)[ -γ^2 z/([g(p)]^2 p(1-p)) + 2γz^2/(g(p)[p(1-p)]^{1/2}) - z^3 ] + o{m^{-1} + (m/n)^2},

and using g(p) = 1/f(F^{-1}(p)), g'(p) = -f'(F^{-1}(p))/[f(F^{-1}(p))]^3, and g''(p) = [-f''(ξ_p)f(ξ_p) + 3[f'(ξ_p)]^2]/[f(ξ_p)]^5, and thus g'(p)/g(p) = -f'(ξ_p)/[f(ξ_p)]^2,

= Φ(z) + (m/n)^2 φ(z)([3[f'(ξ_p)]^2 - f(ξ_p)f''(ξ_p)]/(6[f(ξ_p)]^4))( z - γf(ξ_p)/[p(1-p)]^{1/2} )
+ n^{-1/2}φ(z)[ (1/6)(p/(1-p))^{1/2}((1+p)/p)(z^2 - 1) - (γf(ξ_p)/p)(1 - pf'(ξ_p)/[f(ξ_p)]^2 - 1/(2(1-p)))z - (1/2)(p/(1-p))^{1/2}(1 + (1-p)f'(ξ_p)/[f(ξ_p)]^2)z^2 ]
+ m^{-1}φ(z)(1/4)[ 2γf(ξ_p)z^2/[p(1-p)]^{1/2} - γ^2[f(ξ_p)]^2 z/(p(1-p)) - z^3 ] + o{m^{-1} + (m/n)^2}

= Φ(z) + n^{-1/2}u_{1,γ}^†(z)φ(z) + (m/n)^2 u_{3,γ}(z)φ(z) + m^{-1}u_{2,γ}(z)φ(z) + o{m^{-1} + (m/n)^2},

as in (51).
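As a closing reference point, the two smoothing-dependent terms of the expansion can be coded directly from their definitions (argument names are mine); the spot check uses the standard normal at the median, where f = φ(0), f' = 0, f'' = -φ(0), so u_{3,0}(z) = (π/3)z:

```python
import math

def u2(z, gamma, p, f):
    # u_{2,gamma}(z) from (51); f = f(xi_p)
    c = gamma * f / math.sqrt(p * (1.0 - p))
    return 0.25 * (2.0 * c * z**2 - c**2 * z - z**3)

def u3(z, gamma, p, f, fp, fpp):
    # u_{3,gamma}(z) from (51); fp, fpp = f'(xi_p), f''(xi_p)
    M = (3.0 * fp**2 - f * fpp) / (6.0 * f**4)
    c = gamma * f / math.sqrt(p * (1.0 - p))
    return M * (z - c)

f0 = 1.0 / math.sqrt(2.0 * math.pi)   # phi(0), the normal pdf at its median
```

Under the null (γ = 0), u2 reduces to -z^3/4, the familiar term behind the critical-value correction z_{1-α/2} + z_{1-α/2}^3/(4m).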
Department of Economics, University of Missouri–Columbia