IMPROVED QUANTILE INFERENCE VIA FIXED-SMOOTHING ASYMPTOTICS AND EDGEWORTH EXPANSION
DAVID M. KAPLAN
Abstract. Estimation of a sample quantile's variance requires estimation of the probability density at the quantile. The common quantile spacing method involves smoothing parameter m. When m, n → ∞, the corresponding Studentized test statistic asymptotically follows a standard normal distribution. Holding m fixed asymptotically yields a nonstandard distribution dependent on m that contains the Edgeworth expansion term capturing the variance of the quantile spacing. Consequently, the fixed-m distribution is more accurate than the standard normal under both asymptotic frameworks. For the fixed-m test, I propose an m to maximize power subject to size control, as calculated via Edgeworth expansion. Compared with similar methods, the new method controls size better and maintains good or better power in simulations. Results for two-sample quantile treatment effect inference are given in parallel.
Keywords: Edgeworth expansion; Fixed-smoothing asymptotics; Inference; Quantile; Studentize; Testing-optimal.
1. Introduction
This paper considers inference on population quantiles, specifically via the Studen-
tized test statistic from Siddiqui (1960) and Bloch & Gastwirth (1968) (jointly SBG
hereafter), for one- and two-sample setups. Median inference is a special case. Nonparametric inference on conditional quantiles is an immediate extension if the conditioning variables are all discrete¹; for continuous conditioning variables, the approach
Date: July 5, 2013; first version available online January 14, 2011. Email: [email protected]. Thanks to Yixiao Sun for much patient and enthusiastic guidance and discussion. Thanks also to Karen Messer, Dimitris Politis, Andres Santos, Hal White, Brendan Beare, Ivana Komunjer, Joel Sobel, Marjorie Flavin, Kelly Paulson, Matt Goldman, Linchun Chen, and other attentive and interactive listeners from UCSD audiences. Mail: Department of Economics, University of Missouri–Columbia, 909 University Ave, 118 Professional Bldg, Columbia, MO 65211-6040, USA.
¹In addition to common variables like race and gender, variables like age measured in years may be treated as discrete as long as enough observations exist for each value of interest.
in Kaplan (2012) may be used with an adjusted bandwidth rate. Like a t-statistic for
the mean, the SBG statistic is normalized using a consistent estimate of the variance
of the quantile estimator, and it is asymptotically standard normal.
I develop a new asymptotic theory that is more accurate than the standard normal
reference distribution traditionally used with the SBG statistic. With more accuracy
comes improved inference properties,² i.e. controlling size where previous methods
were size-distorted without sacrificing much power. A plug-in procedure to choose
the testing-optimal smoothing parameter m translates the theory into practice.
This work builds partly on Goh (2004), who suggests using fixed-m asymptotics to
improve inference on the Studentized quantile. Goh (2004) uses simulated critical
values to examine the performance of existing m suggestions, which are tailored to
standard normal critical values. Here, a simple and accurate analytic approximation is
given for the critical values. Using this approximation, the new procedure for selecting
m is tailored to the fixed-m distribution. I also provide the theoretical justification
for the accuracy of the fixed-m distribution, which complements the simulations in
Goh (2004).
The two key results here are a nonstandard fixed-m asymptotic distribution and a
higher-order Edgeworth expansion. For a scalar location model, Siddiqui (1960) gives
the fixed-m result, and Hall & Sheather (1988, hereafter cited as “HS88”) give a spe-
cial case of the Edgeworth expansion in Theorem 1 below. The Edgeworth expansion
is more accurate than a standard normal since it contains higher-order terms that
otherwise end up in the remainder, so its remainder becomes of smaller order. There
are intuitive reasons why the fixed-m approximation is also more accurate than the
standard normal, but we show this by rigorous theoretical arguments.
In the standard first-order asymptotics, both m → ∞ and n → ∞ (and m/n → 0);
since m→∞, this may be called “large-m” asymptotics. In contrast, fixed-m asymp-
totics only approximates n → ∞ while fixing m at its actual finite sample value.
Fixed-m is an instance of “fixed-smoothing” asymptotics in the sense that this vari-
ance does not go to zero in the limit as it does in “increasing-smoothing.” It turns
out that the fixed-m asymptotics includes the high-order Edgeworth term captur-
ing the variance of the quantile spacing. Consequently, the fixed-m distribution is
²Inverting the level-α tests proposed here yields level-α confidence intervals. I use hypothesis testing language throughout, but size distortion is analogous to coverage probability error, and higher power corresponds to shorter interval lengths.
higher-order accurate under the conventional large-m asymptotics, while the standard
normal distribution is only first-order accurate. Under the fixed-m asymptotics, the
standard normal is not even first-order accurate. In other words, the fixed-m dis-
tribution is more accurate than the standard normal irrespective of the asymptotic
framework. Similar results on the accuracy of fixed-smoothing asymptotics have been
found in time series inference (e.g., Sun, Phillips & Jin, 2008); this result suggests
fixed-smoothing asymptotics may be valuable more generally.
Not only is accuracy gained with the Edgeworth and fixed-m distributions; they are also sensitive to m. By reflecting the effect of m, they allow us to determine the best
choice of m. With the fixed-m and Edgeworth results, I construct a test dependent
on m using the fixed-m critical values, and then evaluate the type I and type II error
of the test using the more accurate Edgeworth expansion. Then I can optimally select
m to minimize type II error subject to control of type I error. Approximating (instead
of simulating) the fixed-m critical values reveals the Edgeworth/fixed-m connection
above, and it also makes the test easier to implement in practice.
The critical value correction here provides size control robust to incorrectly chosen m,
whereas in HS88 the choice of m is critical to controlling size (or not). In simulations,
the new method has correct size even where the HS88 method is size-distorted. Power
is still good because m is explicitly chosen to maximize it, using the Edgeworth ex-
pansion; in simulations, it can be significantly better than HS88. HS88 do not provide
a separate result for the two-sample case. Monte Carlo simulations show that the new
method controls size better than the common Gaussian plug-in version of HS88 and
various bootstrap methods, while maintaining competitive power. The method of
Hutson (1999), following a fractional order statistic approach, has better properties
but is not always computable. A two-sample analog with similarly good performance
is given by Goldman & Kaplan (2012). For two-sample inference, bootstrap can per-
form well, though the new method is good and open to a particular refinement that
would increase power; more research in both approaches is needed.
Section 2 presents the model and hypothesis testing problem. Section 3 gives a non-
standard asymptotic result and corresponding corrected critical value, while Section
4 gives Edgeworth expansions of the standard asymptotic limiting distribution. Using
these, a method for selecting smoothing parameter m is given in Section 5, which is
followed by a simulation study and conclusion. Results for the two-sample extension
are provided in parallel. More details and discussion are available in the working pa-
per version; full technical proofs and calculations are in the supplemental appendices;
all are available on the author’s website.
2. Quantile estimation and hypothesis testing
For simplicity and intuition, consider an iid sample of continuous random variable
X, whose p-quantile is ξp. The estimator ξ̂p used in SBG is an order statistic. The rth order statistic for a sample of n values is defined as the rth smallest value in the sample and written as Xn,r, such that Xn,1 < Xn,2 < · · · < Xn,n. The SBG estimator is

ξ̂p = Xn,r,   r = ⌊np⌋ + 1,

where ⌊np⌋ is the floor function (greatest integer not exceeding np). Writing the cumulative distribution function (cdf) of X as F(x) and the probability density function (pdf) as f(x) ≡ F′(x), I make the usual assumption that f(x) is positive and continuous in a neighborhood of the point ξp. Consequently, ξp is the unique p-quantile such that F(ξp) = p.
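As a concrete illustration, the order-statistic estimator takes only a few lines; this sketch (the function name is mine, not from the paper's code) assumes an iid sample from a continuous distribution, stored in an array:

```python
import numpy as np

def sbg_quantile(x, p):
    """SBG point estimator: the order statistic X_{n,r} with r = floor(n*p) + 1.
    Sketch; assumes an iid sample with no ties (continuous X)."""
    x = np.sort(np.asarray(x, dtype=float))
    r = int(np.floor(x.size * p)) + 1    # 1-based rank r = ⌊np⌋ + 1
    return x[r - 1]                      # convert to 0-based indexing

print(sbg_quantile(range(1, 11), 0.5))   # n = 10, r = 6 → 6.0
```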
The standard asymptotic result (for the vector form, see Mosteller, 1946; Siddiqui, 1960, also gives the following scalar result) for a central³ quantile estimator is

√n(Xn,r − ξp) →d N[0, p(1 − p){f(ξp)}⁻²].

A consistent estimator Sm,n of 1/f(ξp) that is asymptotically independent of Xn,r leads to the Studentized sample quantile, which has the pivotal asymptotic distribution

(1)   √n(Xn,r − ξp) / [Sm,n√(p(1 − p))] →d N(0, 1).

Siddiqui (1960) and Bloch & Gastwirth (1968) propose and show consistency of

(2)   Sm,n ≡ {n/(2m)}(Xn,r+m − Xn,r−m)

when m → ∞ and m/n → 0 as n → ∞.
For two-sided inference on ξp, I consider the parameterization ξp = β − γ/√n. The null and alternative hypotheses are H0 : ξp = β and H1 : ξp ≠ β, respectively. When
³"Central" means that in the limit, r/n → p ∈ (0, 1) as n → ∞; i.e., r → ∞ and is some fraction of the sample size n. In contrast, "intermediate" would take r → ∞ but r/n → 0 (or r/n → 1, n − r → ∞); "extreme" would fix r < ∞ or n − r < ∞.
γ = 0 the null is true. The test statistic examined in this paper is
(3)   Tm,n ≡ √n(Xn,r − β) / [Sm,n√(p(1 − p))]
and will be called the SBG test statistic due to its use of (2). From (1), Tm,n is
asymptotically standard normal when γ = 0. A corresponding hypothesis test would
then compare Tm,n to critical values from a standard normal distribution.
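The pieces above assemble directly; the following sketch (names and error handling are my own, under the paper's setup of an iid continuous sample with 1 ≤ r − m and r + m ≤ n) computes Tm,n from (3) using the spacing Sm,n from (2):

```python
import numpy as np

def sbg_test_stat(x, p, beta, m):
    """One-sample SBG statistic T_{m,n} from (3), with the quantile
    spacing S_{m,n} from (2).  Illustrative sketch; names are mine."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    r = int(np.floor(n * p)) + 1                 # 1-based rank, r = ⌊np⌋ + 1
    if r - m < 1 or r + m > n:
        raise ValueError("need 1 <= r - m and r + m <= n")
    # S_{m,n} = {n/(2m)}(X_{n,r+m} − X_{n,r−m}), consistent for 1/f(ξp)
    s_mn = n / (2 * m) * (x[r + m - 1] - x[r - m - 1])
    return np.sqrt(n) * (x[r - 1] - beta) / (s_mn * np.sqrt(p * (1 - p)))
```

Under the traditional approach, |Tm,n| is compared to z1−α/2 = 1.96 for α = 0.05; Section 3.2 replaces this with a corrected critical value.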
For the two-sample case, assume that there are independent samples of X and Y ,
with nx and ny observations, respectively. For simplicity, let n = nx = ny. For
instance, if 2n individuals are separated into balanced treatment and control groups,
one might want to test whether the treatment effect at quantile p is zero, at significance level α.
The marginal pdfs are fX(·) and fY (·). Interest is in testing if ξp,x = ξp,y. Under the
null hypothesis H0 : ξp,x = ξp,y = ξp, the first-order asymptotic result is

√n(Xn,r − ξp) − √n(Yn,r − ξp) →d N(0, p(1 − p)[{fX(ξp)}⁻² + {fY(ξp)}⁻²]),
using the fact that the variance of the sum (or difference) of two independent normals is the sum of the variances. The pivot for the two-sample case is then

√n(Xn,r − Yn,r) / {√[{fX(ξp)}⁻² + {fY(ξp)}⁻²]√(p(1 − p))} →d N(0, 1).

The Studentized version uses the same quantile spacing estimators as above:

T̃m,n ≡ √n(Xn,r − Yn,r) / {√[{n/(2m)}²(Xn,r+m − Xn,r−m)² + {n/(2m)}²(Yn,r+m − Yn,r−m)²]√(p(1 − p))}.
The same m is used for X and Y in anticipation of a Gaussian plug-in approach that
would yield the same m for X and Y regardless. This two-sample setup is easily
extended to the case of unequal sample sizes nx ≠ ny, starting with √nx(Xn,r − ξp) − √(nx/ny)·√ny(Yn,r − ξp) and assuming nx/ny is constant asymptotically, but this is
omitted for clarity.
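A minimal sketch of this two-sample statistic, assuming equal sample sizes as in the text (the function name is illustrative, not from the paper's code):

```python
import numpy as np

def sbg_two_sample_stat(x, y, p, m):
    """Two-sample SBG statistic displayed above (equal sample sizes).
    Sketch; names are illustrative."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    n = x.size
    if y.size != n:
        raise ValueError("this sketch assumes nx = ny = n")
    r = int(np.floor(n * p)) + 1
    sx = n / (2 * m) * (x[r + m - 1] - x[r - m - 1])   # spacing for X
    sy = n / (2 * m) * (y[r + m - 1] - y[r - m - 1])   # spacing for Y
    return (np.sqrt(n) * (x[r - 1] - y[r - 1])
            / (np.sqrt(sx ** 2 + sy ** 2) * np.sqrt(p * (1 - p))))
```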
3. Fixed-m asymptotics and corrected critical value
The standard asymptotic result with m → ∞ as n → ∞ is a standard normal
distribution for the SBG test statistic Tm,n under the null hypothesis when γ = 0.
Since in reality m < ∞ for any given finite sample test, holding m fixed as n → ∞ may give a more accurate asymptotic approximation of the true finite sample test statistic distribution. With m → ∞, there is increasing smoothing and Var(Sm,n) = O(1/m) → 0; with m fixed, we have fixed-smoothing asymptotics since that variance does not disappear. The fixed-m distribution is given below, along with a simple formula for critical values that doesn't require simulation.
3.1. Fixed-m asymptotics. Siddiqui (1960, eqn. 5.4) provides the fixed-m asymp-
totic distribution; Goh (2004, Appendix D.1) provides a nice alternative proof for the
median, which readily extends to any quantile.⁴ If γ = 0,

(4)   Tm,n →d Z/V4m ≡ Tm,∞ as n → ∞, m fixed,

with Z ∼ N(0, 1), V4m ∼ χ²4m/(4m), Z ⊥⊥ V4m, and Sm,n and Tm,n as in (2) and (3).
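The limit in (4) is straightforward to simulate, which makes its fatter-than-normal tails easy to see; this Monte Carlo sketch (the value of m, the seed, and the replication count are illustrative choices, not from the paper) draws Z and V4m independently:

```python
import numpy as np

# Monte Carlo sketch of the fixed-m limit in (4): T_{m,∞} = Z / V_{4m},
# Z ~ N(0,1) independent of V_{4m} ~ χ²_{4m}/(4m).
rng = np.random.default_rng(0)
m, reps = 5, 200_000
z = rng.standard_normal(reps)
v = rng.chisquare(4 * m, reps) / (4 * m)
t = z / v
q95 = np.quantile(np.abs(t), 0.95)   # simulated two-sided 5% critical value
# Fatter tails than N(0,1): q95 exceeds 1.96, near the approximation
# 1.96 + 1.88/m derived in Section 3.2
print(round(float(q95), 2))
```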
The above distribution is conceptually similar to the Student’s t-distribution. A t-
statistic from normally distributed iid data has a standard normal distribution if either
the variance is known (so denominator is constant) or the sample size approaches
infinity. The distribution Tm,∞ is also standard normal if either the variance is known
(denominator in Tm,n is constant) or m→∞. When using an estimated variance in
a finite sample t-statistic, the more accurate t-approximation has fatter tails than a
standard normal, and it is given by Z/√Vv, where Z ∼ N(0, 1), Vv ∼ χ²v/v with v the
degrees of freedom, and Z ⊥⊥ Vv. Similarly, when an estimated variance is used in the
SBG test statistic (the random Sm,n instead of constant 1/f(ξp) in the denominator),
the result is the distribution in (4) above with asymptotically independent numerator
and denominator. The fixed-m distribution reflects this uncertainty in the variance
estimator that is lost under the standard asymptotics.
For the two-sample test statistic under the null, as n → ∞ with m fixed,

T̃m,n →d Z/U ≡ T̃m,∞, where

U ≡ (1 + ε)^{1/2},   ε ≡ [(V4m,1² − 1) + (V4m,2² − 1){fX(ξp)/fY(ξp)}²] / [1 + {fX(ξp)/fY(ξp)}²],
and Z, V4m,1, and V4m,2 are mutually independent. The derivation is in the full
appendix online. The strategy is the same, using results from Siddiqui (1960) for
asymptotic independence and distributions for each component.
Unlike the one-sample case, T̃m,∞ is not pivotal. We can either estimate the density ratio fX(ξp)/fY(ξp) or consider the upper bound for critical values. This is the
⁴The result stated for a general quantile in Theorem 2 in Section 3.5 of Goh (2004) appears to have a typo and not follow from a generalization of the proof given, nor does it agree with Siddiqui (1960).
same nuisance parameter faced by the two-sample fractional order statistic method of Goldman & Kaplan (2012). Note that ε is a weighted average of V4m,1² − 1 and V4m,2² − 1, where the weights sum to one and V4m,1 ⊥⊥ V4m,2. To see the effect of the weights, consider any W1 and W2 with Cov(W1, W2) = 0, Var(W1) = Var(W2) = σ²W, and λ ∈ [0, 1]. Then

Var{λW1 + (1 − λ)W2} = λ²Var(W1) + (1 − λ)²Var(W2) + 2λ(1 − λ)Cov(W1, W2)
                     = σ²W{λ² + (1 − λ)²}.

This is maximized by λ = 1 or 1 − λ = 1, giving σ²W. The minimum has first order condition 0 = 2λ + 2(1 − λ)(−1), yielding λ = 1/2 = 1 − λ and Var{λW1 + (1 − λ)W2} = σ²W/2. This means the variance of ε (and U and T̃m,∞) is smallest when the weights are each 1/2, which is when fX(ξp) = fY(ξp). When the weights are zero and one, which is when the variance of ε is largest, we get the special case of testing against a constant and T̃m,∞ = Tm,∞. Thus, critical values from the one-sample case provide conservative inference in the two-sample case.
3.2. Corrected critical value. An approximation of the fixed-m cdf around the standard normal cdf will lead to critical values based on the standard normal distribution. The standard normal cdf is Φ(·), the standard normal pdf is φ(·) = Φ′(·), and the first derivative is φ′(z) = −zφ(z). The χ²4m distribution has mean 4m, variance 8m, and third central moment 32m. The general approach here is to use the independence of Z and V4m to rewrite the cdf in terms of Φ, and then to expand around E(V4m) = 1. Using (4), for a critical value z,

P(Tm,∞ < z) = P(Z/V4m < z) = E{Φ(zV4m)} = E[Φ{z + z(V4m − 1)}]
            = E[Φ(z) + Φ′(z)z(V4m − 1) + (1/2)Φ″(z){z(V4m − 1)}²] + O(m⁻²)
            = Φ(z) − {z³φ(z)/2}E{((χ²4m − 4m)/(4m))²} + O(m⁻²)
(5)         = Φ(z) − z³φ(z)/(4m) + O(m⁻²).
Note that (5) is a distribution function itself: its derivative in z is {φ(z)/(4m)}(z⁴ − 3z² + 4m), which is positive for all z when m > 9/16 (as it always is); the limits at −∞ and ∞ are zero and one; and it is càdlàg since it is differentiable everywhere.
The approximation error O(m⁻²) does not change if m is fixed, but it goes to zero as m → ∞, which the selected m does as n → ∞ (see Section 5.3). Uniform convergence of Φ(z) − z³φ(z)/(4m) to P(Tm,∞ < z) over z ∈ R as m → ∞ then obtains via Polya's
Theorem (e.g., DasGupta, 2008, Thm. 1.3(b)). Appendix A shows the accuracy of
(5) for m > 2; for m ≤ 2, simulated critical values can be used, as in the provided
code.
To find the value z that makes the nonstandard cdf above take probability 1 − α (for rejection probability α under the null for an upper one-sided test), I set Φ(z) − z³φ(z)/(4m) = 1 − α and solve for z. If z is some deviation of order m⁻¹ around the value z1−α such that Φ(z1−α) = 1 − α for the standard normal cdf, then solving for c in z = z1−α + c/m gives

(6)   c = z³1−α/4 + O(m⁻¹),

and thus the upper one-sided test's corrected critical value is

z = z1−α + c/m = z1−α + z³1−α/(4m) + O(m⁻²).
For a symmetric two-sided test, since the additional term in (5) is an odd function of z, the critical value is the same but with z1−α/2, yielding

(7)   z = zα,m + O(m⁻²),   zα,m ≡ z1−α/2 + z³1−α/2/(4m).

Note zα,m > z1−α/2 and depends on m; e.g., z.05,m = 1.96 + 1.88/m. Using the same method with an additional term in the expansion, as compared in Appendix A,

z = z1−α/2 + z³1−α/2/(4m) + (z⁵1−α/2 + 8z³1−α/2)/(96m²) + O(m⁻³).
The rest of this paper uses (7); the results in Section 5 also go through with the above
third-order corrected critical value or with a simulated critical value.
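The corrected critical values are simple closed-form adjustments; this sketch implements (7) and the third-order version above (the function name and the `order` switch are mine):

```python
def corrected_cv(z_half, m, order=2):
    """Two-sided corrected critical value z_{α,m} from (7); order=3 adds
    the m^{-2} term displayed above.  Illustrative sketch."""
    cv = z_half + z_half ** 3 / (4 * m)              # (7): z_{1−α/2} + z³/(4m)
    if order == 3:
        cv += (z_half ** 5 + 8 * z_half ** 3) / (96 * m ** 2)
    return cv

# α = 0.05: z_{.975} = 1.96, so z_{.05,m} = 1.96 + 1.88/m
print(round(corrected_cv(1.96, 10), 3))   # → 2.148
```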
In the two-sample case under H0 : FX⁻¹(p) = FY⁻¹(p) = ξp, after calculating some moments (see supplemental Two-sample Appendix C), the fixed-m distribution can be approximated by

P(T̃m,∞ < z) = Φ(z) + φ(z)zE(U − 1) + (1/2){−zφ(z)}z²E{(U − 1)²} + O[E{(U − 1)³}]
            = Φ(z) − {φ(z)/(4m)}[z³({fX(ξp)}⁻⁴ + {fY(ξp)}⁻⁴)/S0⁴ − 2z{fX(ξp)}⁻²{fY(ξp)}⁻²/S0⁴] + O(m⁻²)
            = Φ(z) − {φ(z)/(4m)}[z³(1 + δ⁴)/(1 + δ²)² − 2zδ²/(1 + δ²)²] + O(m⁻²),

(8)   S0 ≡ ([fX{FX⁻¹(p)}]⁻² + [fY{FY⁻¹(p)}]⁻²)^{1/2},   δ ≡ fX(ξp)/fY(ξp).

The corresponding critical value is

z = z̃α,m + O(m⁻²) ≤ zα,m + O(m⁻²),

(9)   z̃α,m = z1−α/2 + [z³1−α/2({fX(ξp)}⁻⁴ + {fY(ξp)}⁻⁴) − 2z1−α/2{fX(ξp)}⁻²{fY(ξp)}⁻²]/(4mS0⁴)
            = z1−α/2 + [z³1−α/2(1 + δ⁴) − 2z1−α/2δ²]/{4m(1 + δ²)²}.
When the pdf of Y collapses toward a constant, {fY(ξp)}⁻¹ → 0 and so δ → 0; P(T̃m,∞ < z) reduces to (5), and z̃α,m to zα,m. Thus, as an alternative to estimating δ = fX(ξp)/fY(ξp), the one-sample critical value zα,m provides conservative two-sample inference, as discussed in Section 3.1. Under the exchangeability assumption used for permutation tests, δ = 1 and z̃α,m attains its smallest possible value.
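A sketch of the two-sample critical value (9), alongside the one-sample (7) for comparison; here δ is simply an input, reflecting that in practice it must be estimated, bounded conservatively (δ → 0), or set by exchangeability (δ = 1):

```python
def one_sample_cv(z_half, m):
    """One-sample corrected critical value z_{α,m} from (7)."""
    return z_half + z_half ** 3 / (4 * m)

def two_sample_cv(z_half, m, delta):
    """Two-sample corrected critical value from (9), with δ = fX(ξp)/fY(ξp).
    Illustrative sketch."""
    num = z_half ** 3 * (1 + delta ** 4) - 2 * z_half * delta ** 2
    return z_half + num / (4 * m * (1 + delta ** 2) ** 2)

# δ → 0 recovers the conservative one-sample value; δ = 1 gives the smallest
cvs = [two_sample_cv(1.96, 8, d) for d in (0.0, 0.5, 1.0)]
print([round(c, 3) for c in cvs])
```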
4. Edgeworth expansion
HS88 give the Edgeworth expansion for the asymptotic distribution of Tm,n when
γ = 0. To calculate the type II error of a test using the fixed-m critical values (Section
5.2), the case γ ≠ 0 is needed. Section 5.2 shows how to apply this result.
4.1. One-sample case. The result here includes the result of HS88 as a special case,
and indeed the results match⁵ when γ = 0. The ui functions are indexed by γ, so u1,0
is the same as u1 from HS88 (page 385), and similarly for u2,0 and u3,0.
Theorem 1. Assume f(ξp) > 0 and that, in a neighborhood of ξp, f″ exists and satisfies a Lipschitz condition, i.e. for some ε > 0 and all x, y sufficiently close to ξp, |f″(x) − f″(y)| ≤ constant·|x − y|^ε. Suppose m = m(n) → ∞ as n → ∞, in such a manner that for some fixed δ > 0 and all sufficiently large n, n^δ ≤ m(n) ≤ n^{1−δ}.
⁵There appears to be a typo in the originally published result; the first part of the first term of u1 in HS88 was (1/6)[p(1 − p)]^{1/2}, but it appears to be (1/6)[p/(1 − p)]^{1/2} as given above.
Define C ≡ γf(ξp)/√(p(1 − p)), and defining functions

u1,γ(z) ≡ (1/6){p/(1 − p)}^{1/2}[(1 + p)/p](z² − 1) − C{(1 − p)/p}^{1/2}[1 − pf′(ξp)/{f(ξp)}² − 1/{2(1 − p)}]z
        − (1/2){p/(1 − p)}^{1/2}[1 + f′(ξp)/{f(ξp)}²(1 − p)]z²
        − {(⌊np⌋ + 1 − np) − 1 + (1/2)(1 − p)}{p(1 − p)}^{−1/2},

u2,γ(z) ≡ (1/4)(2Cz² − C²z − z³),   and

u3,γ(z) ≡ [3{f′(ξp)}² − f(ξp)f″(ξp)]/[6{f(ξp)}⁴](z − C),

it follows that

(10)   sup−∞<z<∞ |P(Tm,n < z) − {Φ(z + C) + n^{−1/2}u1,γ(z + C)φ(z + C) + m⁻¹u2,γ(z + C)φ(z + C) + (m/n)²u3,γ(z + C)φ(z + C)}| = o{m⁻¹ + (m/n)²}.
A sketch of the proof is given in Appendix D. The assumptions are the same as in
HS88.
For γ = 0, the term m⁻¹u2,0(z)φ(z) = −z³φ(z)/(4m) is identical to the term in the
fixed-m distribution in (5). Thus under the null, the fixed-m distribution captures the
high-order Edgeworth term associated with the variance of Sm,n. In other words, the
fixed-m distribution is high-order accurate under the conventional large-m asymp-
totics, while the standard normal distribution is only first-order accurate. Since the
fixed-m distribution is also more accurate under fixed-m asymptotics, where the stan-
dard normal is not even first-order accurate, theory strongly indicates that fixed-m
critical values are more accurate, which is borne out in simulations here and in Goh
(2004).
Let γ ≠ 0 so that the null hypothesis is false, where as before H0 : ξp = β with ξp = β − γ/√n. Letting S0 ≡ 1/f(ξp),

Tm,n = [√n(Xn,r − ξp) − γ]/[Sm,n√(p(1 − p))] = √n(Xn,r − ξp)/[Sm,n√(p(1 − p))] − [γ/{√(p(1 − p))S0}](S0/Sm,n + 1 − 1),

P(Tm,n < z) = P{√n(Xn,r − ξp)/[Sm,n√(p(1 − p))] − [γ/{√(p(1 − p))S0}](S0/Sm,n + 1 − 1) < z}
            = P{√n(Xn,r − ξp)/[Sm,n√(p(1 − p))] − [γ/{√(p(1 − p))S0}](S0/Sm,n − 1) < z + C}
(11)        = P{[√n(Xn,r − ξp) + γ(Sm,n/S0 − 1)]/[Sm,n√(p(1 − p))] < z + C}.
If the true S0 were known and used in Tm,n instead of its estimator Sm,n, this
would be simply the distribution from HS88 with a shift of the critical value by
C ≡ γ/{S0√(p(1 − p))}, which is γ normalized by the true (hypothetically known)
variance. But Sm,n is random, so the HS88 expansion is insufficient and Theorem 1
is needed.
4.2. Two-sample case. The strategy and results are similar; the full proof may be
found in the supplemental Two-sample Appendix.
Theorem 2. Let Xi ∼iid FX and Yi ∼iid FY, and Xi ⊥⊥ Yj ∀ i, j. With the assumptions in Theorem 1 applied to both FX and FY, and letting C ≡ γS0⁻¹/√(p(1 − p)), define functions

ũ1,γ(z) ≡ (1/6)[(1 + p)/√(p(1 − p))][(gx³ − gy³)/S0³](z² − 1)
        + {(a2gx² − a′2gy²)(1 − p) + (a1gx² − a′1gy²)}{p/(1 − p)}^{1/2}(2p²S0³)⁻¹(z²)
        − {2(a2gx² − a′2gy²) + (a1gx² − a′1gy²)/(1 − p)}(2pS0³)⁻¹C{(1 − p)/p}^{1/2}(z)
        + {(a2gx² − a′2gy²) − S0²(a2 − a′2)}{p(1 − p)}^{1/2}(2p²S0³)⁻¹
        − {(⌊np⌋ + 1 − np) − 1 + (1/2)(1 − p)}(gx − gy)(S0√(p(1 − p)))⁻¹,

ũ2,γ(z) ≡ −(1/4)[(gx⁴ + gy⁴)/S0⁴]z³ + (1/2)(gx²gy²/S0⁴)z − (1/2)(gx²gy²/S0⁴)C + (1/4)[(gx⁴ + gy⁴)/S0⁴](2Cz² − C²z),

ũ3,γ(z) ≡ [(gxg″x + gyg″y)/(6S0²)](z − C),

gX(·) ≡ 1/fX(FX⁻¹(·)),  gx ≡ gX(p),  g″x ≡ g″X(p),  HX(x) ≡ FX⁻¹(e^{−x}),  ai ≡ HX^{(i)}(Σ_{j=r}^{n} j⁻¹),
gY(·) ≡ 1/fY(FY⁻¹(·)),  gy ≡ gY(p),  g″y ≡ g″Y(p),  HY(y) ≡ FY⁻¹(e^{−y}),  a′i ≡ HY^{(i)}(Σ_{j=r}^{n} j⁻¹),

where H^{(i)}(·) is the ith derivative of the function H(·).
It follows that

sup−∞<z<∞ |P(T̃m,n < z) − {Φ(z + C) + n^{−1/2}ũ1,γ(z + C)φ(z + C) + m⁻¹ũ2,γ(z + C)φ(z + C) + (m/n)²ũ3,γ(z + C)φ(z + C)}| = o{m⁻¹ + (m/n)²}.
For a two-sided test, ũ1,γ will cancel out, so the unknown terms therein may be ignored. Under the null, when C = 0, the ũ2,γ term again exactly matches the fixed-m distribution, demonstrating that the fixed-m distribution is more accurate than the standard normal under large-m (as well as fixed-m) asymptotics. A similar comment applies here, too, about the effect of Sm,n being random.
5. Optimal Smoothing Parameter Selection
5.1. Type I error. In order to select m to minimize type II error subject to control
of type I error, expressions for type I and type II error dependent on m are needed.
Comparing with the Edgeworth expansion in (10) when γ = 0, the approximate type
I error of the two-sided symmetric test can be calculated using the corrected critical
values from (7). Note that u1,0(z)φ(z) in (10) is an even function since φ(z) = φ(−z)
and z only appears as z2 in u1,0(z), so u1,0(z) = u1,0(−z); thus it will cancel out
for a two-sided test. Also note that the functions u2,0(z) and u3,0(z) are odd, so
u2,0(−z) = −u2,0(z) and u3,0(−z) = −u3,0(z). Below, the second high-order term will
disappear due to the use of the fixed-m critical value, leaving only the third high-order
term.
Proposition 3. If γ = 0, then P(|Tm,n| > zα,m) = eI + o{m⁻¹ + (m/n)²},

(12)   eI = α − 2(m/n)²u3,0(z1−α/2)φ(z1−α/2).
Proof. Starting with (10) with γ = 0, the two-sided symmetric test rejection probability under the null hypothesis for critical value z is

P(|Tm,n| > z | H0) = P(Tm,n > z | H0) + P(Tm,n < −z | H0)
                   = 2 − 2Φ(z) − 2m⁻¹u2,0(z)φ(z) − 2(m/n)²u3,0(z)φ(z) + o{m⁻¹ + (m/n)²}.
With the corrected critical value zα,m = z1−α/2 + z³1−α/2/(4m),

P(|Tm,n| > zα,m | H0) = 2 − 2Φ(zα,m) + 2z³α,mφ(zα,m)/(4m) − 2(m/n)²u3,0(zα,m)φ(zα,m) + o{m⁻¹ + (m/n)²}
                      = α − 2(m/n)²u3,0(z1−α/2)φ(z1−α/2) + o{m⁻¹ + (m/n)²}.   ∎
The dominant part of the type I error, eI, depends on m, n, and z1−α/2, as well as f(ξp), f′(ξp), and f″(ξp) through u3,0(z1−α/2). Up to higher-order remainder terms, the type I error does not exceed nominal size α if u3,0(z1−α/2) ≥ 0, and there will be size distortion if u3,0(z1−α/2) < 0.
For common regions of common distributions, u3,0(z1−α/2) ≥ 0 and so eI ≤ α when using corrected critical values. Since z1−α/2 > 0, the sign of u3,0(z1−α/2) depends on the sign of 3{f′(ξp)}² − f(ξp)f″(ξp), or equivalently of the third derivative of the inverse cdf, ∂³F⁻¹(p)/∂p³. According to HS88, for p = 0.5, this is positive for all symmetric unimodal densities and almost all skew unimodal densities, except when f′ changes quickly near ξp. I found that for any quantile p, the sign is positive for t-, normal, exponential, χ², and Fréchet distributions; see Appendix B. This suggests that simply using the fixed-m corrected critical values alone is enough to reduce eI to α or below.
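For the standard normal, the sign claim is easy to verify directly: with f = φ, f′(x) = −xφ(x) and f″(x) = (x² − 1)φ(x), so 3(f′)² − ff″ = φ(x)²(2x² + 1) > 0 for every x. A quick numerical check of this identity (a sketch, not from the paper's code):

```python
from math import sqrt, pi, exp

def phi(x):                            # standard normal pdf
    return exp(-x * x / 2) / sqrt(2 * pi)

def sign_term(x):
    """3 f'(x)^2 − f(x) f''(x) for f = φ (standard normal)."""
    fp = -x * phi(x)                   # φ'(x) = −xφ(x)
    fpp = (x * x - 1) * phi(x)         # φ''(x) = (x² − 1)φ(x)
    return 3 * fp ** 2 - phi(x) * fpp

# identity: 3(f')² − ff'' = φ(x)²(2x² + 1) > 0 everywhere
grid = [i / 10 - 4 for i in range(81)]
ok = all(sign_term(x) > 0 for x in grid)
print(ok)   # → True
```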
For the two-sample case, as shown in the full appendix, the result is similar.
Proposition 4. If γ = 0, then

P(|T̃m,n| > zα,m) ≤ P(|T̃m,n| > z̃α,m) = ẽI + o{m⁻¹ + (m/n)²},

ẽI = α − 2(m/n)²ũ3,0(z1−α/2)φ(z1−α/2).
Here also, the corrected critical value approximated from the fixed-m distribution leads to the dominant part of type I error being bounded by α for all common distributions, since the sign of ũ3,0 is positive for the same reasons as in the one-sample case. The one-sample critical value zα,m gives conservative inference; the infeasible z̃α,m results in better power but requires estimating δ ≡ fX(ξp)/fY(ξp); the z̃α,m under exchangeability (δ = 1) leads to the best power but also size distortion if exchangeability is violated.
5.2. Type II Error. As above, I use the Edgeworth expansion in (10) to approximate
the type II error of the two-sided symmetric test using critical values from (7). Since
a uniformly most powerful test does not exist for general alternative hypotheses, I
will follow a common strategy in the optimal testing literature and pick a reasonable
alternative hypothesis against which to maximize power. The hope is that this will
produce a test near the power envelope at all alternatives, even if it is not strictly the
uniformly most powerful.
I choose to maximize power against the alternative where first-order power is 50% for a two-sided test. From above, the type II error is thus 0.5 = P(|Tm,n| < z1−α/2) ≈ G_{C²}(z²1−α/2), where G_{C²} is the cdf of a noncentral χ² distribution with one degree of freedom and noncentrality parameter C², and C was defined in Theorem 1. For α = 0.05, this gives γf(ξp)/√(p(1 − p)) ≡ C = ±1.96, or γ = ±1.96√(p(1 − p))/f(ξp).
Calculation of the following type II error may be found in the working paper.
Proposition 5. If C² solves 0.5 = G_{C²}(z²1−α/2), and writing f for f(ξp) and similarly f′ and f″, then

P(|Tm,n| < zα,m) = eII + o{m⁻¹ + (m/n)²},

(13)   eII = 0.5 + (1/4)m⁻¹{φ(z1−α/2 − C) − φ(z1−α/2 + C)}Cz²1−α/2
            + (m/n)²[{3(f′)² − ff″}/(6f⁴)]z1−α/2{φ(z1−α/2 + C) + φ(z1−α/2 − C)} + O(n^{−1/2}).
There is a bias term and a variance term in eII above. Per Bloch & Gastwirth (1968),
the variance and bias of Sm,n are indeed of order m−1 and (m/n)2, respectively, as
given in their (2.5) and (2.6):

AsyVar(Sm,n) = (2mf²)⁻¹ as m → ∞, m/n → 0,

AsyBias(Sm,n) = (m/n)²[{3(f′)² − ff″}/(6f⁵)] as m → ∞, m/n → 0,
similar to above aside from the additional 1/f² and 1/f factors. As m → ∞,
Var(Sm,n) → 0 (increasing smoothing); with m fixed, this variance is also fixed
(fixed smoothing). As n grows but Sm,n only uses a proportion of the n observa-
tions approaching zero, the bias will also decrease. The bias decreases to zero in the
fixed-m thought experiment, too, and thus is not captured by the fixed-m asymptotic
distribution.
Now there are expressions for the dominant components of type I and type II error
as functions of m, given in (12) and (13), so m can be chosen to minimize eII subject
to control of eI .
First, I look at eII more closely and try to sign some terms. As discussed before, for common distributions {3(f′)² − ff″}/(6f⁴) ≥ 0, so the entire (m/n)² expression is positive; type II error from this term is increasing in m, so a smaller value of m would minimize it. The m⁻¹ term in (13) is also positive since φ(z1−α/2 − C) > φ(z1−α/2 + C), so it is minimized by a larger m.
It is then possible that minimizing eII gives an "interior" solution m, i.e. m ∈ [1, min(r − 1, n − r)]. For example, for a standard normal distribution where n = 100 and p = 0.5, m can be between one and 49. As shown in Appendix B, 3(f′)² − ff″ = [φ(0)]², and dividing by 6f⁴ yields 1/(6[φ(0)]²) = 2π/6. Then, in choosing m to minimize (1/4)m⁻¹φ(0)(1.96)³ + (m/n)²√(2π)(1.96)/6 = m⁻¹(0.751) + m²(0.819)×10⁻⁴, the m⁻¹ term dominates for small m and the other term for large m, so that eII is minimized by m = 17, a feasible choice in practice.
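The worked example above can be reproduced by direct search over the feasible range (a sketch; the two coefficients follow the display above):

```python
from math import sqrt, pi

# Standard normal, n = 100, p = 0.5, α = 0.05, C = 1.96:
# minimize (0.751)/m + (0.819e-4) m² over feasible m.
phi0 = 1 / sqrt(2 * pi)                 # φ(0)
n = 100
a = 0.25 * phi0 * 1.96 ** 3             # 1/m coefficient, ≈ 0.751
b = sqrt(2 * pi) * 1.96 / 6 / n ** 2    # m² coefficient, ≈ 0.819e-4
m_star = min(range(1, 50), key=lambda m: a / m + b * m * m)
print(m_star)   # → 17
```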
The two-sample case is similar. Calculations are in the supplemental appendix.
Proposition 6. If C² solves 0.5 = G_{C²}(z²1−α/2),

θ ≡ S0⁻⁴(gx⁴ + gy⁴) = (1 + δa⁴)/(1 + δa²)²,

gx ≡ 1/fX(FX⁻¹(p)) and gy ≡ 1/fY(FY⁻¹(p)) are as defined in Theorem 2, as are g″x and g″y, δa ≡ fX(FX⁻¹(p))/fY(FY⁻¹(p)) similar to (8), and S0 ≡ [{fX(FX⁻¹(p))}⁻² + {fY(FY⁻¹(p))}⁻²]^{1/2} as in (8), then

P(|T̃m,n| < zα,m) = ẽII + o{m⁻¹ + (m/n)²},

ẽII = 0.5 + (1/4)m⁻¹[φ(z1−α/2 + C){−θCz²1−α/2 + (1 − θ)z1−α/2} + φ(z1−α/2 − C){θCz²1−α/2 + (1 − θ)z1−α/2}]
     + (m/n)²[(gxg″x + gyg″y)/(6S0²)]z1−α/2{φ(z1−α/2 + C) + φ(z1−α/2 − C)} + O(n^{−1/2}).
5.3. Choice of m. With the fixed-m corrected critical values, eI ≤ α for all quantiles
of common distributions, as discussed. In implementing a method to choose m, below
I use a Gaussian plug-in assumption (as is commonly done with other suggestions for
m), and consequently the calculated eI is indeed always less than nominal α (see
Appendix B). Thus, the method reduces to minimizing eII . Since eII has a positive
m⁻¹ component and a positive m² component, it will be U-shaped for positive m.
If the first-order condition yields an infeasibly big m, the biggest feasible m is the
minimizer over the feasible range.
For a randomized alternative for a symmetric two-sided test, the first-order condition of (13) is

0 = ∂eII/∂m = −(1/4)m⁻²[φ(z1−α/2 − C) − φ(z1−α/2 + C)]Cz²1−α/2
            + (2m/n²)[{3(f′)² − ff″}/(6f⁴)]z1−α/2[φ(z1−α/2 + C) + φ(z1−α/2 − C)],

m = { (1/4){φ(z1−α/2 − C) − φ(z1−α/2 + C)}Cz²1−α/2 / [(2/n²)[{3(f′)² − ff″}/(6f⁴)]z1−α/2{φ(z1−α/2 + C) + φ(z1−α/2 − C)}] }^{1/3}.
Remember that C is chosen ahead (such as C = 1.96), φ is the standard normal pdf, z1−α/2 is determined by α, and n is known for any given sample; but the object {3(f′)² − ff″}/(6f⁴) is unknown. Since size control isn't dependent on m, it may be worth estimating that object even though the variance of that estimator will be large; this option has not been explored.
As is common in the kernel bandwidth selection literature, I will plug in⁶ φ for f (or rather φ(Φ⁻¹(p)) for f(ξp)). In all the simulations below, when np is close to 1 or n, this makes no difference in the m selected; the same is true if F is actually normal, or in special cases like the median of a log-normal. In some cases the m chosen will be very different, but the effect on power is small; e.g., with Unif(0, 1), n = 45, p = 0.5, α = 0.05, the Gaussian plug-in yields m = 9 instead of m = 22, but the power loss is only 4% at the alternative considered in Proposition 5. Still, estimation of f, f′, and f″ may be best for larger n.
The Gaussian plug-in yields
\begin{align*}
m_K(n, p, \alpha, C) &= \left\{ (1/4)\{\phi(z_{1-\alpha/2}-C) - \phi(z_{1-\alpha/2}+C)\} C z_{1-\alpha/2}^2 \Big/ \left[(2/n^2)\,\frac{3(\phi'(\Phi^{-1}(p)))^2 - \phi(\Phi^{-1}(p))\phi''(\Phi^{-1}(p))}{6(\phi(\Phi^{-1}(p)))^4}\, z_{1-\alpha/2}\big\{\phi(z_{1-\alpha/2}+C) + \phi(z_{1-\alpha/2}-C)\big\}\right] \right\}^{1/3} \\
&= n^{2/3}(C z_{1-\alpha/2})^{1/3}(3/4)^{1/3}\left(\frac{(\phi(\Phi^{-1}(p)))^2}{2(\Phi^{-1}(p))^2+1}\right)^{1/3} \times \left(\frac{\phi(z_{1-\alpha/2}-C) - \phi(z_{1-\alpha/2}+C)}{\phi(z_{1-\alpha/2}-C) + \phi(z_{1-\alpha/2}+C)}\right)^{1/3},
\end{align*}
and for the 50% first-order power C and α = 0.05, as calculated below,
\[
m_K(n, p, \alpha = 0.05, C = 1.96) = n^{2/3}(1.42)\left(\frac{(\phi(\Phi^{-1}(p)))^2}{2(\Phi^{-1}(p))^2+1}\right)^{1/3}. \tag{14}
\]

⁶Here, using N(0, 1) is mathematically equivalent to N(µ, σ²) since µ and σ cancel out.
Compare (14) with the suggested m from HS88. Using their Edgeworth expansion under the null, they calculate the level error of a two-sided symmetric confidence interval constructed by inverting $T_{m,n}$ with standard normal critical values. Since the $n^{-1/2}$ term disappears for two-sided tests and confidence intervals, m can be chosen to make the level error zero as long as $u_{3,0}$ is positive⁷. Rewriting their (3.1) to match my notation above,
\[
m_{HS} = \left\lfloor n^{2/3} z_{1-\alpha/2}^{2/3} \left[1.5 f^4 / \{3(f')^2 - ff''\}\right]^{1/3} \right\rfloor.
\]
HS88 proceed with a median-specific, data-dependent method. More commonly, papers like Koenker & Xiao (2002) plug in φ for f when computing $h_{HS}$ (HS = Hall and Sheather) in their simulations, as shown explicitly on page 3 of their electronic appendix; Goh & Knight (2009) use the same Gaussian plug-in version for their simulations. Koenker & Xiao (2002) also use a Gaussian plug-in procedure based on Bofinger (1975), whose m is of rate $n^{4/5}$ to minimize MSE; it is also considered in the simulation study below. Their "bandwidth" h is just m/n, so they use the equivalent of
\[
m_{HS} = n^{2/3} z_{1-\alpha/2}^{2/3} \left(1.5\, \frac{[\phi(\Phi^{-1}(p))]^2}{2(\Phi^{-1}(p))^2+1}\right)^{1/3}, \tag{15}
\]
\[
m_B = n^{4/5} \left(4.5\, \frac{[\phi(\Phi^{-1}(p))]^4}{[2(\Phi^{-1}(p))^2+1]^2}\right)^{1/5}. \tag{16}
\]
For p = 0.5, α = 0.05, these two are equal at n = 20.93; that is, $m_B < m_{HS}$ if n < 21 and $m_B > m_{HS}$ if n ≥ 21. Since they use the same critical values, size and power will both be smaller whenever m is bigger.
⁷If $u_{3,0}$ is negative, their first-order condition for m is similar but with an additional $(1/2)^{1/3}$ coefficient; in Section 3, they only consider positive $u_{3,0}$.
Figure 1. Plot of the ratio $m_K/m_{HS}$ against the first-order power used to calculate C; 0.5 is used throughout this paper.
Since $z_{1-\alpha/2}^{2/3}(1.5)^{1/3} = 1.79$ for α = 0.05, $m_K(n, p, \alpha = 0.05, C = 1.96) = 0.79\, m_{HS}$. This ratio is calculated for other values of C based on other first-order power values, and the plot is shown in Figure 1; the dependence is mild until values close to 0% or 100% are chosen. Remember that $m_K$ is optimized for the test using corrected critical values from the fixed-m distribution, while $m_{HS}$ is chosen for use with standard normal critical values. It then makes sense that $m_K < m_{HS}$: the fixed-m critical values control size, allowing a smaller m when otherwise $e_I$ might be a binding constraint for a test with standard normal critical values.
For the two-sample case, the strategy is the same. The testing-optimal mK is found
by taking the first-order condition of (6) with respect to m, and then solving for
m. Since the two-sample eII has positive m−1 and m2 terms, it is U-shaped, which
again justifies picking the largest feasible m if the calculated m is infeasibly large.
Depending on which version of $z_{\alpha,m}$ is used, solving for m in the FOC yields
\[
z_{\alpha,m}:\quad m_K = n^{2/3}(3/4)^{1/3}\left(\frac{S_0^2}{g_x g_x'' + g_y g_y''}\right)^{1/3} \times \left\{\frac{2g_x^2 g_y^2}{S_0^4} + \frac{g_x^4 + g_y^4}{S_0^4}\, C z_{1-\alpha/2}\, \frac{\phi(z_{1-\alpha/2}-C) - \phi(z_{1-\alpha/2}+C)}{\phi(z_{1-\alpha/2}-C) + \phi(z_{1-\alpha/2}+C)}\right\}^{1/3},
\]
\[
z_{\alpha,m}:\quad m_K = n^{2/3}(3/4)^{1/3}\left(\frac{S_0^2}{g_x g_x'' + g_y g_y''}\right)^{1/3} \times \left\{\frac{2g_x^2 g_y^2}{S_0^4}(z_{1-\alpha/2}^2 + 2) + \frac{g_x^4 + g_y^4}{S_0^4}\, C z_{1-\alpha/2}\, \frac{\phi(z_{1-\alpha/2}-C) - \phi(z_{1-\alpha/2}+C)}{\phi(z_{1-\alpha/2}-C) + \phi(z_{1-\alpha/2}+C)}\right\}^{1/3},
\]
where the latter m is weakly larger, with equality in the limiting special case where
either X or Y is a constant. I again chose to proceed with a Gaussian plug-in (using
sample variances) for the g and S0 terms due to the difficulty of estimating g′′, but
power can be increased by more refined plug-in methods.
Figure 2. Analytic and empirical $e_I$ and $e_{II}$ by m. Both: n = 21, p = 0.5, α = 0.05, F is log-normal with µ = 0, σ = 3/2, 5000 replications (for empirical values). Lines are analytic; markers without lines are empirical. Legends: "An" is analytic (Edgeworth-derived), "Emp" is empirical (simulated), "std" is standard normal critical values, and "fix" is fixed-m critical values. Right ($e_{II}$): randomized alternative (see text), $\gamma/\sqrt{n} = 0.80$.
Figure 2 compares the Edgeworth approximations to the true (simulated) type I and type II errors for scalar data drawn from a log-normal distribution. For non-normal distributions like this one, the approximation can differ significantly from the truth. With standard normal critical values, type I error is monotonically decreasing in m while type II error is monotonically increasing, so setting eI = α exactly should minimize eII subject to eI ≤ α. With fixed-m critical values, this is not true, so the Edgeworth expansion for general γ in Theorem 1 is necessary. With the fixed-m critical values, type I error is always below α (approximately, and usually truly), and the type II error curve near the selected m is very flat since it is near the minimum where the slope is zero. Consequently, a larger-than-optimal m can cost HS88 more power than it costs the new method, and a smaller-than-optimal m incurs size distortion for HS88 but not for the new method.
6. Simulation Study⁸
Regarding m, Siddiqui (1960) originally suggested a value of m on the order of n1/2.
Bloch & Gastwirth (1968) then suggested a rate of n4/5, to minimize the asymptotic
mean square error of Sm,n. With this n4/5 rate, Bofinger (1975) then suggested the mB
above in (16). HS88, using an Edgeworth expansion under the null hypothesis, find
that an m of order n2/3 minimizes the level error of two-sided tests, and they provide
an infeasible expression for m. For the median, they also give a data-dependent
method for selecting m; for a general quantile, the Gaussian plug-in version of mHS
in (15) is more commonly used, as in Koenker & Xiao (2002) and Goh & Knight
(2009).
The following simulations compare five methods. The first method is presented in
this paper, with fixed-m corrected critical values and mK chosen to minimize type II
error subject to control of type I error; the code is publicly available as Kaplan (2011).
For the two-sample $z_{\alpha,m}$, I implemented a simplistic estimator of $\theta \equiv S_0^{-4}(g_x^4 + g_y^4)$ using the quantile spacings; a better estimator of θ would improve performance, as
Figure 11 exemplifies. The second method uses standard normal critical values and
mHS as in (15). The third method uses standard normal critical values and mB as
in (16). These three methods are referred to as “New method,” “HS,” and “B,”
respectively, in the text as well as figures in this section. For the new method, I used
the floor function to get an integer value for mK ; for HS and B, I instead rounded to
the nearest integer, to not amplify their size distortion.
The fourth method “BS” is a bootstrap method. The conclusion of much experimen-
tation was that a symmetric percentile-t using bootstrapped variance had the best
size control in the simulations presented below. For the number of outer (B1) and
inner (B2, for the variance estimator) bootstrap replications, B1 = 99 and B2 = 100
performed best and thus were used; any exceptions are noted below.
Compared to the best-performing bootstrap method, increasing B1 from 99 to 999
more often increased size distortion (Table 3), though not by much. Decreasing B2
from 100 to 25 was also worse. Empirical size distortion was also reduced by taking
the 1 − α quantile of the absolute values and comparing the absolute value of the
⁸MATLAB code implementing a hybrid of Hutson's method and the new method, quantile_inf.m, is publicly available through the author's website or MATLAB File Exchange; R code quantile_inf.R is also available on the author's website. MATLAB code for simulations is available from the author upon request.
(non-bootstrap) test statistic instead of taking α/2 and 1 − α/2 quantiles from the
sample of B1 Studentized bootstrap test statistics—i.e., the symmetric test was better
than the equal-tailed test. The more basic percentile method (without Studentiza-
tion; either equal-tail or symmetric) had worse size distortion in small samples or
less central quantiles, though less distortion with the median in larger samples; and
Studentizing with the SBG sparsity estimator or a kernel density estimator worsened
the size distortion in some cases. Assuming a normally distributed test statistic and
bootstrapping only the variance also had severe size distortion in many cases. For
example, for F = N(0, 1), n = 11, p = 0.1, α = 5%, one-sample empirical size of
the symmetric percentile-t with B1 = 99 and B2 = 100 is 9.7%; with B1 = 999 in-
stead, 9.8%; with an equal-tail test instead, 14.6%; with a symmetric percentile with
B1 = 99 or 999, 17.3% or 17.5%. Using the m-out-of-n bootstrap had the potential to perform better if the oracle m were used, but in many cases there was size distortion even with the best m, and the risk of increased size distortion from picking the wrong m was often large. The optimal choice of smoothing parameter there, and with the smoothed bootstrap, is difficult but critical; see Bickel & Sakov (2008) and Ho & Lee (2005) for recent efforts. With respect to the example setup, a percentile method smoothing with $h = 0.92n^{-1/5}$ gives 5.1% for N(0, 1) but 23.1% for slash, or 9.0% and 17.1% for smoothed percentile-t; compare also the top and bottom halves of Table 2 in Brown, Hall & Young (2001).
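The winning variant, symmetric percentile-t with an inner bootstrap for the variance, can be sketched as follows; the quantile estimator uses the order statistic $X_{(r)}$ with r = ⌊np⌋ + 1, and the helper names are mine, so this is an illustration rather than the exact simulation code:

```python
import random
import statistics

def sample_quantile(xs, p):
    """Order-statistic quantile estimate: X_(r), r = floor(n*p) + 1."""
    xs = sorted(xs)
    r = int(len(xs) * p) + 1
    return xs[min(r, len(xs)) - 1]

def boot_se(xs, p, B2, rng):
    """Inner bootstrap: standard error of the sample quantile."""
    n = len(xs)
    reps = [sample_quantile([rng.choice(xs) for _ in range(n)], p) for _ in range(B2)]
    return statistics.pstdev(reps)

def symmetric_percentile_t_ci(xs, p, alpha=0.05, B1=99, B2=100, seed=0):
    """Symmetric percentile-t CI for the p-quantile with bootstrapped variance:
    take the 1 - alpha quantile of |t*| over B1 outer resamples."""
    rng = random.Random(seed)
    n = len(xs)
    q_hat = sample_quantile(xs, p)
    se_hat = boot_se(xs, p, B2, rng)
    t_abs = []
    for _ in range(B1):
        bs = [rng.choice(xs) for _ in range(n)]
        se_b = boot_se(bs, p, B2, rng)
        if se_b > 0:
            t_abs.append(abs(sample_quantile(bs, p) - q_hat) / se_b)
    t_abs.sort()
    crit = t_abs[min(int((1 - alpha) * len(t_abs)), len(t_abs) - 1)]
    return q_hat - crit * se_hat, q_hat + crit * se_hat
```

Using the 1 − α quantile of the absolute Studentized statistics (rather than the α/2 and 1 − α/2 quantiles) is what makes the test symmetric rather than equal-tailed.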
Figure 3. Graph showing for which combinations of quantile p ∈ (0, 1) and sample size n Hutson (1999) is computable, in the middle region. An extension marginally increases the region by not requiring an equal-tail test. Left: α = 5%. Right: α = 1%.
Figure 4. Example of superior performance of Hutson (1999): size is always controlled, power is better. In this one-sample case, uncalibrated power is better for almost all alternatives even though all other methods are size distorted. Distribution is log-normal with µ = 0, σ = 3/2; α = 5%, n = 21, and power is for p = 0.85.
Figure 5. Example of superior performance of Hutson (1999), two-sample. Distributions are both log-normal with µ = 0, σ = 3/2; α = 5%, n = 21, and power is for p = 0.55.
The fifth method is an exact method from Hutson (1999). For one-sample inference,
it performed best in every single simulation where it could be used, so the results here
focus on cases when it cannot be computed. An example of its superior size control
and power is Figure 4. A graph showing when a simple extension of Hutson (1999), as
well as the original, can be computed is in Figure 3. Instead of putting α/2 probability
in each tail as in Hutson (1999), the extension shifts probability between the tails if
necessary. Simulations show that it performs equally well within the region of Figure
3. For two-sample inference, a simplistic extension of Hutson (1999) outperforms all
other methods most of the time, as in Figure 5, but not always. Since a more refined extension of Hutson (1999) seems likely to be best for two-sample inference, the results here focus on cases where that method cannot be used. However, if such a
refinement is not found, bootstrap and the new method provide strong competition;
HS and B can have significantly worse power, as Figure 5 shows.
Another "exact" method computes $\sum_{i=1}^n 1\{X_i < \beta\}$, which under $H_0: \xi_p = \beta$ is
distributed Binomial(n, p). One downside is that the power curve often flattens out
as ξp deviates farther from β. A bigger downside is that randomization is needed,
particularly for small n and for np near one or n. For instance, with n = 11, p = 0.1,
and α = 5%, there is a 31.4% probability of observing zero Xi less than ξp. For a
two-sided test, we would randomly reject 0.025/0.314 = 8% of the time we observe
zero, and not reject 92% of the time we observe zero. Because of the common aversion
to randomized tests, this method is left out of the simulation results. However, the
randomized binomial test always has exact size, and its power curve near H0 is often
steeper than the non-exact tests; both are desirable properties.
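The randomized binomial test just described can be implemented in a few lines; a sketch (names mine), reproducing the n = 11, p = 0.1 example:

```python
import math

def binom_cdf(k, n, p):
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def randomized_binomial_test(k, n, p, alpha=0.05):
    """Lower-tail piece of the two-sided randomized test of H0: xi_p = beta,
    where k = #{X_i < beta} ~ Binomial(n, p) under H0.  Returns the
    probability of rejecting given K = k (the upper tail is analogous)."""
    cdf_k = binom_cdf(k, n, p)
    cdf_below = binom_cdf(k - 1, n, p) if k > 0 else 0.0
    pmf_k = cdf_k - cdf_below
    if cdf_k <= alpha / 2:           # reject outright
        return 1.0
    if cdf_below >= alpha / 2:       # never reject on this tail
        return 0.0
    return (alpha / 2 - cdf_below) / pmf_k  # randomize on the boundary value

# Example from the text: n = 11, p = 0.1, observing zero X_i below beta.
print(round(randomized_binomial_test(0, 11, 0.1), 2))  # 0.08
```

Averaging the rejection probability over the Binomial(n, p) distribution of K returns exactly α/2 per tail, which is the sense in which the randomized test has exact size.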
The hope of this paper was to produce a new method that controls size better than
HS and B (and BS) while keeping power competitive. For one-sample inference in
cases when Hutson (1999) is not computable, the following simulations show that
the new method eliminates or reduces the size distortion of HS, B, and BS. The new
method also has good power, sometimes even better than BS. When Hutson (1999)
is computable, the new method can have better power than HS and B, too.
The two-sample results are more complicated since performance depends on how
similar the two distributions are. The one-sample results represent one extreme of
the two-sample case as the ratio fX(ξp)/fY (ξp) goes to infinity (or zero), when size
is hardest to control. The other extreme is when fX(ξp) = fY (ξp). In that case, HS
and B9 have less size distortion, but still some. BS, though, seems to control size
and have better power than the new method with estimated θ; with the true θ, BS
and the new method perform quite similarly. General two-sample setups would have
results in between these extremes of fX/fY →∞ (or zero) and fX/fY = 1.
Unless otherwise noted, 10,000 simulation replications were used, and α = 5%.
⁹Since HS88 and Bofinger (1975) do not provide values of m for the two-sample case, I used their m values for the univariate case.
Table 1. Empirical size as percentage, n = 3, 4, p = 0.5, one-sample.

                          n = 3                n = 4
Distribution        New   BS    HS/B     New   BS    HS/B
N(0, 1)             1.6   1.2   11.9     2.4   4.1   15.4
Logn(0, σ = 3/2)    3.3   1.3   12.8     1.8   3.6   10.0
Exp(1)              3.0   1.2   13.5     2.3   4.6   12.9
Uniform(0, 1)       3.2   1.5   16.0     5.0   6.8ᵃ  21.3
GEV(0, 1, 0)        1.9   1.3   12.1     2.1   4.0   13.6
χ²₁                 4.8   1.6   14.8     2.6   4.3   11.8

ᵃ6.7% with 999 outer replications.
Table 2. Empirical size as percentage, n = 3, 4, p = 0.5, two-sample with identical independent distributions.

                          n = 3                n = 4
Distribution        Newᵃ  BS    HS/B     New   BS    HS/B
N(0, 1)             0.8   1.3   6.3      1.1   2.5   6.5
Logn(0, σ = 3/2)    0.4   0.8   4.5      0.6   1.3   3.7
Exp(1)              1.1   1.4   6.4      0.9   1.9   5.9
Uniform(0, 1)       1.4   1.9   8.6      1.8   3.6   8.6

ᵃIf the true θ is used instead: 1.6, 1.1, 1.6, 2.8.
In the extreme sample size case where n = 3 or n = 4, HS and B are severely size distorted while the new method and BS control size (except BS for n = 4 and the uniform distribution), as shown in Table 1. In the other two-sample special case, where the distributions are identical, all rejection probabilities are much smaller, though some size distortion remains for HS and B; see Table 2. For one-sample power, BS is very poor for n = 3, but better than the new method at some alternatives for n = 4; see Figure 6. For two-sample power, BS is equal to the new method for a range of alternatives with n = 3, and better for n = 4; see Figure 6.
For n = 11, all methods can have size distortion, but the new method has the least,
while HS and B have the most. This occurs even for a N(0, 1), but is worse for the
fat-tailed slash distribution; see Figure 7. Whether or not BS is size distorted, the
new method still has competitive power with it, as shown in Figure 8.
For n = 45 and p = 0.05 or p = 0.95, the new method controls size except for
empirical size around 5.5% for the slash distribution. In contrast, BS, HS, and B
are all size distorted, sometimes significantly; see Table 3. Still, the new method’s
Figure 6. Empirical power properties; p = 0.5. Top row: F =Exp(1), one-sample. Bottom: N(0, 1), two-sample. Left column:n = 3. Right: n = 4.
Table 3. Empirical size as percentage, p = 0.95, one-sample.

                          n = 45                        n = 21
                          BS; B1 =                      BS; B1 =
Distribution        New   99    999   HS/B      New    99     999    HS/B
N(0, 1)             3.2   6.4   6.5   10.4      6.7    11.3   11.5   25.3
Slash               5.3   7.8   8.1   10.7      12.9   17.8   18.6   30.3
Logn(0, σ = 3/2)    5.0   7.3   7.5   10.7      11.3   14.6   15.2   28.7
Exp(1)              4.0   6.9   7.0   10.7      8.0    12.6   13.1   26.9
Uniform(0, 1)       3.3   5.4   5.3   11.2      4.7    6.8    6.7    22.3
GEV(0, 1, 0)        4.1   7.2   7.3   11.0      7.3    12.2   12.6   25.9
χ²₁                 3.6   6.5   6.4   10.2      8.7    13.5   13.9   27.5
Figure 7. Empirical size properties, one-sample, n = 11. Left: N(0, 1). Right: slash.
Figure 8. Empirical power properties, one-sample, n = 11. Left: N(0, 1), p = 0.2. Right: slash, p = 0.1.
Table 4. Empirical size as percentage, p = 0.95, two-sample with independent distributions of the same shape but differing in variance by a factor of 25. The distribution with larger variance is shown.

                        n = 21             n = 29             n = 45
Distribution        New    BS    HS/B   New   BS    HS/B   New   BS    HS/B
N(0, 1)             6.3    9.5   21.1   3.8   6.9   13.5   3.4   5.5   8.6
Logn(0, σ = 3/2)    11.5   15.6  28.2   5.9   9.3   17.8   4.6   7.2   10.1
Exp(1)              7.4    10.9  21.6   4.0   7.9   14.1   3.5   6.0   8.1
Uniform(0, √12)     5.6    6.6   19.8   4.3   5.7   16.0   3.9   5.2   10.7
Figure 9. Empirical power properties, one-sample. n = 45, p = 0.95. Left: slash. Right: GEV(0, 1, 0).
Figure 10. Empirical power properties, two-sample. n = 45, p = 0.95. Left: N(0, 1) distributions. Right: Uniform(0, √12).
power is competitive with BS even without calibration, as in Figure 9. For the
two-sample case with identical distributions, BS always controls size along with the
new method. However, moving toward the limiting case where one distribution is a constant, when there is a significant difference in variance between the two samples' distributions (a factor of 25), BS has significantly worse size distortion than the new method for n = 21, and has size distortion while the new method has none for n = 29
and n = 45; see Table 4. In the case of identical distributions, the new method has
power competitive with BS for n = 21, but by n = 45 BS has significantly better
power; see Figures 10 and 11.
Figure 11. Empirical power properties, two-sample. n = 21. Left:N(0, 1) distributions, p = 0.05. Right: Exp(1), p = 0.05. Betterestimators of θ would increase power for the new method.
The new method cannot be used as-is when np < 1 or np ≥ n − 1, i.e., when the estimator is the sample minimum or maximum. It would be computationally possible to use a one-sided quantile spacing in those cases, but at that point an extreme quantile framework is more appropriate; see Chernozhukov & Fernandez-Val (2011).
For any sample size, it appears that quantiles close enough to zero or one will cause
size distortion in HS, B, and BS. For example, even with n = 250 and a normal
distribution, tests of the p = 0.005 quantile for α = 0.05 have empirical size 9.6% for
BS and 20.7% for HS and B, compared with 5.3% for the new method.
There are two different proximate "causes" of size distortion for HS and B in general, illustrated in Figure 12. When n = 21 and p = 0.2, r = ⌊(21)(0.2)⌋ + 1 = 5, which means that m can be at most 4 such that r − m ≥ 1; and mHS = mB = 4, the biggest m possible. This is the best choice, but it is still size distorted. For n = 21 and any p < 0.35, mHS = mB = r − 1, the maximum possible m. However, when n = 21 and p ≥ 0.35, mHS and mB are strictly less than the maximum possible m. In those cases, HS and B could have chosen a larger m to lessen the size distortion, but did not. In some cases, both "causes" apply, such as with F = Exp(1), n = 71, p = 0.2, and α = 0.05, where a bigger mHS would have reduced the size distortion, but not eliminated it. Focusing on cases when Hutson (1999) cannot be computed, the use of standard critical values is always the proximate cause since mHS is always the maximum allowed.
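The feasibility cap on m can be made concrete. Assuming the two-sided spacing $X_{(r+m)} - X_{(r-m)}$ with r = ⌊np⌋ + 1, both r − m ≥ 1 and r + m ≤ n are required, consistent with the bounds quoted in the text:

```python
import math

def max_feasible_m(n, p):
    """Largest m with r - m >= 1 and r + m <= n, where r = floor(n*p) + 1."""
    r = math.floor(n * p) + 1
    return min(r - 1, n - r)

print(max_feasible_m(21, 0.2))   # 4  (so m_HS = m_B = 4 is the cap)
print(max_feasible_m(100, 0.5))  # 49 (m "can be between one and 49")
```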
Figure 12. Proximate causes of size distortion for HS and B. Em-pirical type I error (x markers) is plotted against m, for a test withstandard normal critical values; missing marker on the left for m = 1is above 0.15. The dashed line in the right plot is the plug-in eI usedto select mHS. Left: no m can control size. Right: approximationerror leads to mHS = 7 < 10 and size distortion. F = Uniform(0, 1),n = 21, α = 0.05, and p = 0.2 on the left, p = 0.5 on the right; 5000replications.
Of theoretical (if not directly practical) interest is Figure 13. The new method chooses
m to maximize power, and in some cases this produces significantly better power than
HS or B. This is true in both the one-sample case (top row) and the two-sample case
(bottom row).
Theoretical calculations corroborate the new method’s power advantage over HS in
the Figure 13 simulations. Type II error for the new method is given in Proposition
5. As seen in Appendix C, using standard normal critical value z1−α/2 instead of zα,m
will only alter the m−1 term. To compare the new method to HS, only the m−1 and
(m/n)2 terms are needed since the first-order term and the n−1/2 term are identical.
The m−1 term is always larger for the new method due to the more conservative
critical value, while the (m/n)2 term is always larger for HS since mHS > mK .
Specifically, for α = 0.05, C = 1.96 (equivalent to −1.96 since the Cauchy distribution is symmetric about its median), and p = 0.5, the smoothing parameters are $m_K \approx (0.77)n^{2/3}$ and $m_{HS} \approx (0.97)n^{2/3}$, using (14) and (15). As discussed, I rounded mHS to the nearest integer to avoid exacerbating the size distortion already seen in Table 3, for example, while the floor function is used for the new method. For n = 11, $m_K = \lfloor 3.8 \rfloor = 3$ and $m_{HS} = \mathrm{round}(4.8) = 5$; for n = 41, $m_K = \lfloor 9.2 \rfloor = 9$ and
Figure 13. Empirical power curves, comparing n = 11 (left column) and n = 41 (right column). 50,000 replications each, p = 0.5, α = 0.05, and the distribution is Cauchy (t(1)). Each panel plots the empirical null rejection probability against the absolute deviation from H0 for the new method, HS, and B. Top row: one-sample. Bottom row: two-sample.
$m_{HS} = \mathrm{round}(11.5) = 12$. For the Cauchy distribution, $f(x) = \pi^{-1}(1+x^2)^{-1}$, $f(0) = \pi^{-1}$, $f'(0) = 0$, and $f''(0) = -2\pi^{-1}$, so
\[
\frac{3(f')^2 - ff''}{6f^4} = \pi^2/3.
\]
Taking $\phi(1.96 + 1.96) \approx 0$ and $\phi(0) = (2\pi)^{-1/2}$, the new method's type II error has $m^{-1}$ term approximately $(0.751)m_K^{-1}$, while the $m^{-1}$ term for HS is approximately zero. The $(m/n)^2$ term is approximately $(2.57)(m/n)^2$. This means that, up to $o(m^{-1} + m^2/n^2)$ terms, the type II error difference is
\[
e_{II}^K - e_{II}^{HS} = m_K^{-1}(0.751) + (m_K^2 - m_{HS}^2)(2.57/n^2).
\]
Plugging in n = 11, $m_K = 3$, and $m_{HS} = 5$, $e_{II}^K - e_{II}^{HS} = -0.090 < 0$, meaning better power for the new method. With n = 41, $m_K = 9$, and $m_{HS} = 12$, $e_{II}^K - e_{II}^{HS} = -0.013 < 0$, again better power for the new method. To compare with Figure 13, note that $\gamma/\sqrt{n} = (C/f(0))\sqrt{p(1-p)/n} \approx 0.9$ for n = 11 and $\gamma/\sqrt{n} \approx 0.5$ for n = 41.
The analytic differences from the m−1 and (m/n)2 terms are still smaller than the
differences in the top row of Figure 13; with small samples, the o(m−1 +m2/n2) terms
also contribute significantly.
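The arithmetic behind these two comparisons can be reproduced directly; a sketch (function name mine):

```python
import math

def norm_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def e2_diff_cauchy(n, m_K, m_HS, C=1.96, z=1.959964):
    """e_II^K - e_II^HS up to o(m^{-1} + m^2/n^2) for the Cauchy median:
    the m^{-1} term uses phi(z + C) ~ 0 so its coefficient is
    (1/4) phi(0) z^3 ~ 0.751; the (m/n)^2 coefficient is
    (pi^2/3) z phi(0) ~ 2.57."""
    t1 = 0.25 * norm_pdf(0.0) * z**3 / m_K         # ~ 0.751 / m_K
    c2 = (math.pi**2 / 3) * z * norm_pdf(0.0)      # ~ 2.57
    return t1 + (m_K**2 - m_HS**2) * c2 / n**2

print(round(e2_diff_cauchy(11, 3, 5), 3))   # -0.09
print(round(e2_diff_cauchy(41, 9, 12), 3))  # -0.013
```

Both differences are negative, matching the power advantage for the new method reported above.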
The π2/3 coefficient in the (m/n)2 term, based on the Cauchy PDF and its deriva-
tives, is key to having that term outweigh the m−1 term. With a standard normal
distribution, the coefficient is π/3, smaller by a factor of π; for other distributions
and/or quantiles, the coefficient may be larger.
In the two-sample case, the results shown and discussed were for independent sam-
ples, as the theoretical results assumed. If there is enough negative correlation, size
distortion occurs for all methods; if there is positive correlation, power will decrease
for all methods. The effect of uncorrelated, dependent samples could go either way.
In practice, examining the sample correlation as well as one of the many tests for sta-
tistical independence is recommended, if independence is not known a priori.
7. Conclusion
This paper proposes new one- and two-sample quantile testing procedures based on
the SBG test statistic in (3). Critical values dependent on smoothing parameter m
are derived from fixed-smoothing asymptotics. These are more accurate than the
conventional standard normal critical values since the fixed-m distribution is shown
to be more accurate under both fixed-m and large-m asymptotics. Type I and II
errors are approximated using an Edgeworth expansion, and the testing-optimal mK
minimizes type II error subject to control of type I error, up to higher-order terms.
Simulations show that, compared with the previous SBG methods from HS88 and
Bofinger (1975), the new method greatly reduces size distortion while maintaining
good power. The new method also outperforms bootstrap methods for one-sample
tests and certain two-sample cases. Consequently, this new method is recommended
in one-sample cases when Hutson (1999) is not computable. In two-sample cases
when Goldman & Kaplan (2012) is not computable, performance depends on the
unknown distributions and warrants further research into both bootstrap methods
and the two-sample plug-in method (particularly estimating θ) proposed here. Finally,
theoretical justification for fixed-smoothing asymptotics is provided outside of the
time series context; there are likely additional models that may benefit from this
perspective.
References
[1] Bickel, Peter J. & Anat Sakov, 'On the choice of m in the m out of n bootstrap and confidence bounds for extrema.' Statistica Sinica, 18 (3), pp. 967–985, 2008.
[2] Bloch, Daniel A. & Joseph L. Gastwirth, 'On a Simple Estimate of the Reciprocal of the Density Function.' The Annals of Mathematical Statistics, 39 (3), pp. 1083–1085, 1968.
[3] Bofinger, Eve, 'Estimation of a density function using order statistics.' Australian Journal of Statistics, 17 (1), pp. 1–7, 1975.
[4] Brown, B. M., Peter Hall & G. A. Young, 'The smoothed median and the bootstrap.' Biometrika, 88 (2), pp. 519–534, 2001.
[5] Chernozhukov, Victor & Ivan Fernandez-Val, 'Inference for Extremal Conditional Quantile Models, with an Application to Market and Birthweight Risks.' The Review of Economic Studies, 78 (2), pp. 559–589, 2011.
[6] DasGupta, Anirban, Asymptotic Theory of Statistics and Probability, New York: Springer, p. 3, 2008.
[7] David, Herbert Aron, Order Statistics, Wiley Series in Probability and Mathematical Statistics, Wiley, 2nd edition, 1981.
[8] Goh, S. Chuan & Keith Knight, 'Nonstandard Quantile-Regression Inference.' Econometric Theory, 25 (5), pp. 1415–1432, 2009.
[9] Goh, S. Chuan, 'Smoothing choices and distributional approximations for econometric inference.' Ph.D. dissertation, UC-Berkeley, 2004.
[10] Goldman, Matt & David M. Kaplan, 'IDEAL Quantile Inference via Interpolated Duals of Exact Analytic L-statistics.' 2012. Working paper.
[11] Hall, Peter & Simon J. Sheather, 'On the Distribution of a Studentized Quantile.' Technical Report 71, Department of Statistics, Pennsylvania State University, 1987.
[12] ———, 'On the Distribution of a Studentized Quantile.' Journal of the Royal Statistical Society, Series B (Methodological), 50 (3), pp. 381–391, 1988.
[13] Ho, Yvonne H. S. & Stephen M. S. Lee, 'Iterated Smoothed Bootstrap Confidence Intervals for Population Quantiles.' The Annals of Statistics, 33 (1), pp. 437–462, 2005.
[14] Hutson, Alan D., 'Calculating nonparametric confidence intervals for quantiles using fractional order statistics.' Journal of Applied Statistics, 26 (3), pp. 343–353, 1999.
[15] Kaplan, David M., 'Inference on quantiles: confidence intervals, p-values, and testing.' MATLAB Central File Exchange, http://www.mathworks.com/matlabcentral/fileexchange/32410-quantile-inference-testing-univariate-and-bivariate, August 2011. Retrieved 25 March 2012.
[16] ———, 'IDEAL inference on conditional quantiles.' 2012. Working paper.
[17] Koenker, Roger & Zhijie Xiao, 'Inference on the Quantile Regression Process.' Econometrica, 70 (4), pp. 1583–1612, 2002.
[18] Mosteller, F., 'On Some Useful "Inefficient" Statistics.' Annals of Mathematical Statistics, 17, pp. 377–408, 1946.
[19] Siddiqui, Mohammed M., 'Distribution of quantiles in samples from a bivariate population.' J. Research of the NBS, B. Math. and Math. Physics, 64B (3), pp. 145–150, 1960.
[20] Sun, Yixiao, Peter C. B. Phillips & Sainan Jin, 'Optimal Bandwidth Selection in Heteroskedasticity-Autocorrelation Robust Testing.' Econometrica, 76 (1), pp. 175–194, 2008.

Appendix A. Accuracy of approximation of fixed-m critical values

The approximate fixed-m critical value in (7) is quite accurate for all but m = 1, 2, as Table 5 shows. The second approximation alternative adds the $O(m^{-2})$ term to the approximation. The third alternative uses the critical value from the Student's t-distribution with the degrees of freedom chosen to match the variance. To compare, for various m and α, I simulated the two-sided rejection probability for a given critical value, taking $T_{m,\infty}$ from (4) as the true distribution. One million simulation replications per m and α were run; to gauge simulation error, I also include in the table the critical values given in Goh (2004) (who ran 500,000 replications each). Additional values of m and α are available in the online appendix.

Table 5. Simulated rejection probabilities (%) for different fixed-m critical value approximations, α = 5%.

      rejection probability for critical value
m     Goh (2004, simulated)   including m⁻¹   including m⁻²   t
1     4.93                    8.21            5.84            6.87
2     5.03                    6.25            5.20            5.47
3     5.05                    5.70            5.10            5.28
4     5.15                    5.46            5.06            5.21
5     4.92                    5.23            4.96            5.08
10    5.01                    5.14            5.06            5.14
20    5.04                    5.01            4.99            5.04
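The one-term approximation from (7), $z_{\alpha,m} = z_{1-\alpha/2} + z_{1-\alpha/2}^3/(4m)$, is easy to tabulate against the standard normal value; a sketch:

```python
def z_fixed_m(m, z=1.959964):
    """O(1/m)-corrected fixed-m critical value from (7): z + z^3 / (4m)."""
    return z + z**3 / (4 * m)

for m in (1, 2, 5, 10, 20):
    print(m, round(z_fixed_m(m), 3))
# The correction shrinks like 1/m, so the value approaches 1.96 as m grows;
# Table 5 shows this one-term version is accurate for all but m = 1, 2.
```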
Appendix B. Sign of u3,0(z) for common distributions
Here, I only show the work for the normal distribution; the method is the same for
the other distributions. Work for t-, exponential, χ2, and Frechet distributions is in
the supplemental appendix. For the uniform distribution, u3,0 = 0 since f ′ = 0 and
f ′′ = 0.
Starting with the pdf and derivatives of the normal distribution,
f_n(x) = (1/(σ√(2π))) e^{-(x-µ)^2/(2σ^2)},    f_n'(x) = -((x-µ)/σ^2) f_n(x),

f_n''(x) = -f_n(x)/σ^2 - f_n'(x)(x-µ)/σ^2 = -f_n(x)/σ^2 + ((x-µ)^2/σ^4) f_n(x) = f_n(x)[(x-µ)^2/σ^4 - 1/σ^2],

3[f_n'(x)]^2 - f_n(x)f_n''(x) = 3[-((x-µ)/σ^2) f_n(x)]^2 - [f_n(x)]^2[(x-µ)^2/σ^4 - 1/σ^2] = [f_n(x)]^2{2(x-µ)^2/σ^4 + 1/σ^2} ≥ 0,
for any µ and σ2. Thus the sign of u3,0(z) is always the sign of z for the normal
distribution at any quantile.
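The closed form above is easy to spot-check numerically; this sketch (mine, stdlib only) compares finite-difference derivatives of the standard normal pdf (µ = 0, σ = 1) against f(x)^2[2(x-µ)^2/σ^4 + 1/σ^2]:

```python
import math

def f(x):
    # standard normal pdf (mu = 0, sigma = 1)
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def d1(x, h=1e-5):
    # central first difference for f'(x)
    return (f(x + h) - f(x - h)) / (2.0 * h)

def d2(x, h=1e-4):
    # central second difference for f''(x)
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

grid = [-4.0, -2.0, -0.5, 0.0, 0.5, 1.0, 3.0]
lhs = [3.0 * d1(x)**2 - f(x) * d2(x) for x in grid]   # 3 f'^2 - f f''
rhs = [f(x)**2 * (2.0 * x * x + 1.0) for x in grid]   # closed form derived above
```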
Appendix C. Calculation of type II error (Proposition 5)
The type II error is the probability of not rejecting when H0 is false. For a two-sided
symmetric test, this is
P (|Tm,n| < z) = P (Tm,n < z)− P (Tm,n < −z).
Letting the corrected critical value be z_{α,m} = z_{1-α/2} + z_{1-α/2}^3/(4m) as in (7), and expanding via (11) and (10), for C > 0,

P(|T_{m,n}| < z_{α,m}) = L^+ + H_1^+ - H_2^+ + O(n^{-1/2}) + o(m^{-1} + (m/n)^2), with

L^+ = Φ(z_{α,m} + C) - Φ(-z_{α,m} + C),
H_1^+ = φ(z_{α,m} + C)[m^{-1}u_{2,γ}(z_{α,m} + C) + (m/n)^2 u_{3,γ}(z_{α,m} + C)],
H_2^+ = φ(-z_{α,m} + C)[m^{-1}u_{2,γ}(C - z_{α,m}) + (m/n)^2 u_{3,γ}(C - z_{α,m})].
I write O(n^{-1/2}) for the n^{-1/2} terms since they do not depend on m and thus do not affect the optimization problem for selecting m. Define L^-, H_1^-, and H_2^- similarly but with -C < 0 instead of C > 0, and thus -γ = -C√(p(1-p))/f(ξ_p) instead of γ:

L^- = Φ(z_{α,m} - C) - Φ(-z_{α,m} - C),
H_1^- = φ(z_{α,m} - C)[m^{-1}u_{2,-γ}(z_{α,m} - C) + (m/n)^2 u_{3,-γ}(z_{α,m} - C)],
H_2^- = φ(-z_{α,m} - C)[m^{-1}u_{2,-γ}(-C - z_{α,m}) + (m/n)^2 u_{3,-γ}(-C - z_{α,m})].
I calculate average power where the alternatives +C and -C each have 0.5 probability. Using φ(-x) = φ(x),

P(|T_{m,n}| < z_{α,m}) = (1/2){(L^+ + L^-) + (H_1^+ + H_1^-) - (H_2^+ + H_2^-) + O(n^{-1/2}) + o(m^{-1} + (m/n)^2)}.    (17)
For the first-order term L^+,

Φ(z_{α,m} + C) - Φ(-z_{α,m} + C) = Φ(C + z_{1-α/2} + z_{1-α/2}^3/(4m)) - Φ(C - z_{1-α/2} - z_{1-α/2}^3/(4m))
= 0.5 + m^{-1}(1/4)z_{1-α/2}^3[φ(z_{1-α/2} + C) + φ(C - z_{1-α/2})]    (18)
since in Proposition 5, C solves 0.5 = Φ(z_{1-α/2} + C) - Φ(-z_{1-α/2} + C). Since the fixed-m critical value is larger than the standard normal critical value, this term contributes additional type II error in the m^{-1} term. Similarly,
L^- = Φ(z_{α,m} - C) - Φ(-z_{α,m} - C)
= 0.5 + m^{-1}(1/4)z_{1-α/2}^3[φ(C - z_{1-α/2}) + φ(-z_{1-α/2} - C)] = L^+.
Within the m^{-1} and (m/n)^2 terms, anything o(1) will end up in the o(m^{-1} + (m/n)^2) remainder. Thus, for those terms,

z_{α,m} = z_{1-α/2} + O(m^{-1}),    φ(z_{α,m} + C) = φ(z_{1-α/2} + C) + O(m^{-1}).
For the m^{-1} terms, since φ(x) = φ(-x) and C ≡ γf(ξ_p)/√(p(1-p)), and letting d_1 ≡ z_{α,m} + C and d_2 ≡ z_{α,m} - C,

u_{2,γ}(d_1) - u_{2,-γ}(-d_1)
= -(1/4)d_1^3 - (1/4)C^2 d_1 + (1/4)2C d_1^2 - [-(1/4)(-d_1)^3 - (1/4)(-C)^2(-d_1) + (1/4)2(-C)(-d_1)^2]
= -(1/2)(d_1^3 + C^2 d_1 - 2C d_1^2) = -(1/2)(z_{1-α/2} + C)z_{1-α/2}^2 + O(m^{-1}),

u_{2,-γ}(d_2) - u_{2,γ}(-d_2) = -(1/2)(d_2^3 + C^2 d_2 + 2C d_2^2) = -(1/2)(z_{1-α/2} - C)z_{1-α/2}^2 + O(m^{-1}),

φ(z_{α,m} + C)m^{-1}[u_{2,γ}(z_{α,m} + C) - u_{2,-γ}(-z_{α,m} - C)]
+ φ(z_{α,m} - C)m^{-1}[u_{2,-γ}(z_{α,m} - C) - u_{2,γ}(-z_{α,m} + C)]
= -(1/2)m^{-1}[φ(z_{1-α/2} + C)(z_{1-α/2} + C)z_{1-α/2}^2 + φ(z_{1-α/2} - C)(z_{1-α/2} - C)z_{1-α/2}^2] + O(m^{-2}).    (19)
For the (m/n)^2 terms, writing f(ξ_p) and its derivatives as f, f', f'', and letting M ≡ [3(f')^2 - ff'']/(6f^4),

u_{3,γ}(z_{α,m} + C) - u_{3,-γ}(-z_{α,m} - C) = M(z_{α,m} + C - C) - M(-z_{α,m} - C - (-C)) = 2Mz_{1-α/2} + O(m^{-1}),
u_{3,-γ}(z_{α,m} - C) - u_{3,γ}(-z_{α,m} + C) = M(z_{α,m} - C - (-C)) - M(-z_{α,m} + C - C) = 2Mz_{1-α/2} + O(m^{-1}),

(m/n)^2 {φ(z_{α,m} + C)[u_{3,γ}(z_{α,m} + C) - u_{3,-γ}(-z_{α,m} - C)] + φ(z_{α,m} - C)[u_{3,-γ}(z_{α,m} - C) - u_{3,γ}(-z_{α,m} + C)]}
= (m/n)^2 · 2 · ([3(f')^2 - ff'']/(6f^4)) z_{1-α/2}[φ(z_{1-α/2} + C) + φ(z_{1-α/2} - C)] + O(m/n^2).    (20)
Combining (18), (19), and (20), (17) becomes

P(|T_{m,n}| < z_{α,m}) = (1/2){2(0.5 + m^{-1}(1/4)z_{1-α/2}^3[φ(C + z_{1-α/2}) + φ(C - z_{1-α/2})])
- 2(1/4)m^{-1}[φ(z_{1-α/2} + C)(z_{1-α/2} + C)z_{1-α/2}^2 + φ(z_{1-α/2} - C)(z_{1-α/2} - C)z_{1-α/2}^2]
+ 2(m/n)^2 ([3(f')^2 - ff'']/(6f^4)) z_{1-α/2}[φ(z_{1-α/2} + C) + φ(z_{1-α/2} - C)]}
+ O(n^{-1/2}) + o(m^{-1} + (m/n)^2),
which simplifies to the forms given in Proposition 5.
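As a numerical illustration of the pieces above (function and variable names are mine), the following stdlib-only sketch solves for C as in Proposition 5 and evaluates the first-order term plus the m^{-1} and (m/n)^2 corrections:

```python
import math

def Phi(x):
    # standard normal cdf via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):
    # standard normal pdf
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

z = 1.959963984540054  # z_{1-alpha/2} for alpha = 0.05

# Solve 0.5 = Phi(z + C) - Phi(-z + C) for C > 0 by bisection
lo, hi = 0.0, 5.0
for _ in range(200):
    c = 0.5 * (lo + hi)
    if Phi(z + c) - Phi(-z + c) > 0.5:
        lo = c
    else:
        hi = c
C = 0.5 * (lo + hi)

def type2(m, n, M):
    """0.5 plus the m^{-1} and (m/n)^2 terms of P(|T_{m,n}| < z_{alpha,m});
    M = [3 f'^2 - f f'']/(6 f^4) evaluated at xi_p."""
    h1 = (z**3 * (phi(C + z) + phi(C - z))
          - z**2 * (phi(z + C) * (z + C) + phi(z - C) * (z - C))) / (4.0 * m)
    h2 = (m / n)**2 * M * z * (phi(z + C) + phi(z - C))
    return 0.5 + h1 + h2
```

At the standard normal median, M = π/3; both correction terms are positive, so e.g. type2(15, 100, math.pi/3) exceeds the first-order value 0.5.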
Appendix D. Proof sketch of Edgeworth expansion (Theorem 1) of
SBG test statistic under local alternative hypothesis
This is a proof sketch comparable in detail to Hall & Sheather (1988, "HS88" hereafter); the full proof is in the supplemental appendix. The structure here follows HS88, taking their results as given (proofs are in my full version). Full references for citations found only in this proof are in HS88. This proof of Theorem 1 uses the same setup and definitions as in the main text.
As in Section 2, the null hypothesis is H0 : ξp = β, and the true ξp = β − γ/√n. I
continue from (11), which showed that
P(T_{m,n} < z) = P( [√n(X_{n,r} - ξ_p) + γ(S_{m,n}/S_0 - 1)] / [S_{m,n}√(p(1-p))] < z + C ),

where C ≡ γf(ξ_p)/√(p(1-p)) and S_0 ≡ 1/f(ξ_p). I want to derive a higher-order expansion around the (shifted) standard normal distribution.
As in HS88, I first deal with centering X_{n,r} = X_{n,⌊np⌋+1} since, as n increases, ⌊np⌋ increases in discrete steps of one. So instead of X_{n,r} - ξ_p there is X_{n,r} - η_p, where

η_p ≡ F^{-1}[exp(-Σ_{j=r}^{n} j^{-1})].    (21)
This centering effect is no different when γ ≠ 0; HS88 (p. 389) conclude that the effect is, defining w_n ≡ [ε_n - 1 + (1/2)(1-p)]/[p(1-p)]^{1/2},

n^{1/2}(X_{n,r} - ξ_p)/τ = n^{1/2}(X_{n,r} - η_p)/τ + n^{-1/2}w_n + O_p[n^{-1/2}m^{-1/2} + (m/n)^2 n^{-1/2}].

The remainder n^{-1/2}m^{-1/2} + (m/n)^2 n^{-1/2} = o[m^{-1} + (m/n)^2], and so

P[n^{1/2}(X_{n,r} - ξ_p)/τ ≤ z] = P[n^{1/2}(X_{n,r} - η_p)/τ ≤ z - n^{-1/2}w_n] + O(n^{-3/2})
= P[n^{1/2}(X_{n,r} - η_p)/τ ≤ z] - n^{-1/2}w_n φ(z) + O(n^{-1}),

P[ (n^{1/2}(X_{n,r} - ξ_p) + γ(S_{m,n}/S_0 - 1))/τ ≤ z ] = P[ (n^{1/2}(X_{n,r} - η_p) + γ(S_{m,n}/S_0 - 1))/τ ≤ z ] - n^{-1/2}w_n φ(z) + o[m^{-1} + (m/n)^2].    (22)
As HS88 note, this is the so-called ‘delta method’ for deducing Edgeworth expansions;
see Bhattacharya and Ghosh (1978, p. 438). Thus, instead of proving (10), the theorem follows by proving
sup_{-∞<z<∞} | P[ (n^{1/2}(X_{n,r} - η_p) + γ(S_{m,n}/S_0 - 1))/τ ≤ z ] - [Φ(z) + n^{-1/2}u_{1,γ}^†(z)φ(z) + m^{-1}u_{2,γ}(z)φ(z) + (m/n)^2 u_{3,γ}(z)φ(z)] | = o[m^{-1} + (m/n)^2],    (23)

where

u_{1,γ}^†(z) ≡ (1/6)(p/(1-p))^{1/2}((1+p)/p)(z^2 - 1) - (γf(ξ_p)/p)(1 - pf'(ξ_p)/[f(ξ_p)]^2 - 1/(2(1-p)))z - (1/2)(p/(1-p))^{1/2}(1 + (1-p)f'(ξ_p)/[f(ξ_p)]^2)z^2,

u_{2,γ}(z) ≡ (1/4)[ 2γf(ξ_p)z^2/[p(1-p)]^{1/2} - γ^2[f(ξ_p)]^2 z/(p(1-p)) - z^3 ],

u_{3,γ}(z) ≡ ([3[f'(ξ_p)]^2 - f(ξ_p)f''(ξ_p)]/(6[f(ξ_p)]^4))(z - γf(ξ_p)/[p(1-p)]^{1/2}).
Now to prove the above. As defined in HS88, let

G ≡ F^{-1}, g ≡ G', and H(x) ≡ F^{-1}(e^{-x}),    (24)

and let W_1, W_2, ... be independent exponential random variables with unit mean. The sequence {X_{n,s}}_{s=1}^{n} has the same joint distribution as {H(Σ_{s≤j≤n} j^{-1}W_j)}_{s=1}^{n} (e.g. David, 1981, p. 21), and in a slight abuse of notation suggested by HS88, write X_{n,s} = H(Σ_{s≤j≤n} j^{-1}W_j). Let V_j ≡ W_j - 1,

Δ_1 ≡ Σ_{j=r-m}^{r-1} j^{-1}V_j,  Δ_2 ≡ Σ_{j=r}^{r+m-1} j^{-1}V_j,  Δ_3 ≡ Σ_{j=r+m}^{n} j^{-1}V_j,  a_k ≡ H^{(k)}(Σ_{j=r}^{n} j^{-1}),

Z ≡ [p(1-p)]^{1/2}[n^{1/2}(X_{n,r} - η_p) + γ(S_{m,n}/S_0 - 1)]/τ    (25)
= [n^{1/2}(X_{n,r} - η_p) + γ(S_{m,n}/S_0 - 1)][(n/2m)(X_{n,r+m} - X_{n,r-m})]^{-1}.
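The distributional identity X_{n,s} = H(Σ_{s≤j≤n} j^{-1}W_j) is easy to check by simulation. A small sketch (mine, stdlib only) for the standard exponential F, where H(x) = -log(1 - e^{-x}) and the 5th order statistic of n = 9 has exact mean Σ_{j=5}^{9} 1/j ≈ 0.7456:

```python
import math
import random

random.seed(12345)

def H(x):
    # H(x) = F^{-1}(e^{-x}) for standard exponential F, i.e. -log(1 - e^{-x})
    return -math.log1p(-math.exp(-x))

n, s = 9, 5
reps = 100_000

mean_direct = 0.0
mean_repr = 0.0
for _ in range(reps):
    # direct: s-th smallest of an i.i.d. exponential sample
    mean_direct += sorted(random.expovariate(1.0) for _ in range(n))[s - 1]
    # representation: H(sum_{j=s}^{n} W_j / j) with W_j i.i.d. unit exponential
    mean_repr += H(sum(random.expovariate(1.0) / j for j in range(s, n + 1)))
mean_direct /= reps
mean_repr /= reps
```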
Note that Δ_1 and Δ_2 are O_p(n^{-1}m^{1/2}) and Δ_3 is O_p(n^{-1/2}). With the above and Taylor expansions, Z = Y + R, where

Y ≡ -pn^{1/2}[(1 + δ)(Δ_2 + Δ_3) + B],  δ ≡ -(m/n)^2 g''(p)[6g(p)]^{-1},    (26)

B ≡ δΨ + (n/m)(b_1Δ_1 + b_2Δ_2)Δ_2 + (n/m)b_3(Δ_1 + Δ_2)(Δ_3 + Ψ) + (n/m)^2 b_4(Δ_1 + Δ_2)^2(Δ_3 + Ψ) + b_5Δ_3(Δ_3 + 2Ψ),    (27)

b_1 ≡ -p/2,  b_2 ≡ -p/2,  b_3 ≡ -p/2 - (m/n)(a_2/2a_1),  b_4 ≡ (p/2)^2,  b_5 ≡ -a_2/2a_1,  Ψ ≡ γ/(pg(p)√n), and

R = O_p[n^{-1/2}m^{-1/2} + n^{-3/2}m + m^{-3/2} + (m/n)^{2+ε}].    (28)
Here, Y is of the same form as in HS88, but with Ψ now additionally appearing in the higher-order B terms. In the definition of Z, γ appears only in the numerator, not in the denominator. From the stochastic expansion of S_{m,n}, which is already required for the denominator of Z, S_{m,n}/S_0 = 1 + ν, where ν contains the higher-order terms that are dropped for the first-order asymptotic result:

ν = (n/2m)p(Δ_1 + Δ_2) + (a_2/2a_1)(Δ_1 + Δ_2 + 2Δ_3) + (m/n)^2 g''(p)/(6g(p)) + O_p((m/n)^{2+ε} + n^{-1/2}m^{-1/2} + mn^{-3/2}).    (29)
Thus, γ enters only in higher-order terms, through the numerator of Z.
Note the Taylor expansion, with all sums over some common range:

H(Σ W_j/j) - H(Σ 1/j) = (Σ V_j/j)H'(Σ 1/j) + (1/2)(Σ V_j/j)^2 H''(Σ 1/j) + ... .    (30)
For the numerator of Z, applying the expansion above yields

X_{n,r} - η_p = (Σ_{j=r}^{n} V_j/j)H'(Σ_{j=r}^{n} 1/j) + (1/2)(Σ_{j=r}^{n} V_j/j)^2 H''(Σ_{j=r}^{n} 1/j) + O_p(n^{-3/2})a_3
= (Δ_2 + Δ_3)a_1 + (1/2)(Δ_2 + Δ_3)^2 a_2 + O_p(n^{-3/2}),    (31)
since X_{n,r} = H(Σ_{j≥r} W_j/j), η_p ≡ H(Σ_{j=r}^{n} 1/j), and a_3 = O(1). Including γ as described earlier,

n^{1/2}(X_{n,r} - η_p) + γ(S_{m,n}/S_0 - 1) = √n[(Δ_2 + Δ_3)a_1 + (1/2)(Δ_2 + Δ_3)^2 a_2 + O_p(n^{-3/2})] + γν.    (32)
The denominator of Z does not include γ, so the stochastic expansion is identical to the intermediate results from HS88:

X_{n,r+m} - H(Σ_{j=r+m}^{n} j^{-1}) - {X_{n,r-m} - H(Σ_{j=r-m}^{n} j^{-1})}
= -a_1(Δ_1 + Δ_2) - (m/np)a_2(Δ_1 + Δ_2 + 2Δ_3) + O_p(n^{-3/2}m^{1/2} + n^{-5/2}m^2),    (33)

H(Σ_{j=r+m}^{n} j^{-1}) - H(Σ_{j=r-m}^{n} j^{-1}) = (2m/n)[g(p) + (1/6)(m/n)^2 g''(p) + O{(m/n)^{2+ε} + n^{-1}}].    (34)
When γ = 0, plugging in (31), (33), and (34) yields

Z = -pn^{1/2}( (Δ_2 + Δ_3)(-a_1/(pg(p))) - (1/(2pg(p)))(Δ_2 + Δ_3)^2 a_2 + O_p(n^{-3/2}) )
× [ (n/2m)( -(a_1/g(p))(Δ_1 + Δ_2) - (m/n)(a_2/(pg(p)))(Δ_1 + Δ_2 + 2Δ_3) + O_p(n^{-3/2}m^{1/2} + n^{-5/2}m^2) ) + 1 + (m/n)^2 g''(p)/(6g(p)) + O{(m/n)^{2+ε} + n^{-1}} ]^{-1}
≡ -pn^{1/2}Θ(1 + ν)^{-1} = -pn^{1/2}Θ(1 - ν + ν^2 + O(ν^3)),    (35)

defining ν (same as above) and Θ for ease of notation.
If (32) (with the new γ term) is used instead of (31), and noting that above I divided through by g(p) in addition to factoring -pn^{1/2} out of the numerator,

Z = [-pn^{1/2}Θ + γν/g(p)][1 + ν]^{-1} = -pn^{1/2}(Θ - Ψν)(1 + ν)^{-1}
= -pn^{1/2}[Θ - (Θ + Ψ)ν + (Θ + Ψ)ν^2 + O(n^{-1/2}ν^3)],    (36)

with Ψ ≡ γ/(pg(p)√n) as in (27). Θ and Ψ are both O_p(n^{-1/2}), and ν^3 = O_p(m^{-3/2}), which is in R, so the remainder works out.
As given in HS88, with restrictions n^η ≤ m ≤ n^{1-η} and 0 < ε ≤ 1/6, (28) implies

R = O_p{[m^{-1} + (m/n)^2]n^{-εη}},    (37)

which they state may be strengthened to

P{|R| > [m^{-1} + (m/n)^2]n^{-ζ}} = O(n^{-λ}), for all 0 < ζ < εη and all λ > 0,

since by Markov's and Rosenthal's inequalities (Burkholder, 1973, p. 40),

Σ_{i=1}^{2} P(|nm^{-1/2}Δ_i| > n^ε) + P(|n^{1/2}Δ_3| > n^ε) = O(n^{-λ}), for all ε > 0 and all λ > 0.
Thus, the theorem follows from proving the result for Y rather than for Z. As HS88 put it, "Since Δ_1, Δ_2 and Δ_3 are independent and have smooth distributions, it is elementary (but tedious^{10}) to estimate the difference between exp(-t^2/2) and the characteristic function of Y, and then derive an Edgeworth expansion by following classical arguments, as in Petrov (1975) or Bhattacharya and Ghosh (1978). To simply identify terms in the expansion here, only moments of Y are needed," the result of which is, as stated in (3.2) of HS88 (it turns out that the difference with γ ≠ 0 is that B contains more terms, but B enters these equations the same way):

E[(-p^{-1}Y)^ℓ] = z_1(ℓ) + z_2(ℓ) + z_3(ℓ) + O{m^{-3/2} + m^{-1/2}(m/n)^2},    (38)

z_1 ≡ n^{ℓ/2} E[{(1 + δ)(Δ_2 + Δ_3)}^ℓ],  z_2 ≡ ℓ n^{ℓ/2} E{(Δ_2 + Δ_3)^{ℓ-1} B},  z_3 ≡ (1/2)ℓ(ℓ-1) n^{ℓ/2} E{(Δ_2 + Δ_3)^{ℓ-2} B^2}.
With the new Ψ terms, and letting D_i ≡ n^{1/2}Δ_i to follow the notation of HS88,

z_2 = ℓ{ n^{-1/2}[p^{-2}b_2 E(D_3^{ℓ-1}) + 2b_5 E(D_3^ℓ)Ψ√n] + 2m^{-1}p^{-2}b_4[E(D_3^ℓ) + E(D_3^{ℓ-1})Ψ√n] + n^{-1/2}b_5 E(D_3^{ℓ+1}) + δE(D_3^{ℓ-1})Ψ√n }
+ ℓ(ℓ-1)n^{-1/2}p^{-2}b_3( E(D_3^{ℓ-1}) + E(D_3^{ℓ-2})Ψ√n ) + O(mn^{-3/2} + m^{-1/2}n^{-1/2}),    (39)

z_3 = ℓ(ℓ-1)m^{-1}b_3^2 p^{-2}[ E(D_3^ℓ) + 2E(D_3^{ℓ-1})Ψ√n + E(D_3^{ℓ-2})Ψ^2 n ] + O(m^{-1/2}n^{-1/2} + m^{-3/2} + (m/n)^4).    (40)
Noting that b_3 = b_2 + O(m/n) = -(p/2) + O(m/n) and b_3^2 = b_4 + O(m/n) = (p/2)^2 + O(m/n),

z_2 + z_3 = ℓ{ n^{-1/2}[p^{-2}b_2 E(D_3^{ℓ-1}) + 2b_5 E(D_3^ℓ)Ψ√n] + 2m^{-1}p^{-2}b_4[E(D_3^ℓ) + E(D_3^{ℓ-1})Ψ√n] + n^{-1/2}b_5 E(D_3^{ℓ+1}) + δE(D_3^{ℓ-1})Ψ√n }
+ ℓ(ℓ-1)n^{-1/2}p^{-2}b_3( E(D_3^{ℓ-1}) + E(D_3^{ℓ-2})Ψ√n )
+ ℓ(ℓ-1)m^{-1}b_3^2 p^{-2}[ E(D_3^ℓ) + 2E(D_3^{ℓ-1})Ψ√n + E(D_3^{ℓ-2})Ψ^2 n ]
+ O(mn^{-3/2} + m^{-1/2}n^{-1/2} + m^{-3/2} + (m/n)^4)

= n^{-1/2}ℓ{ -(ℓ-1)(2p)^{-1}Ψ√n E(D_3^{ℓ-2}) - ℓ(2p)^{-1}E(D_3^{ℓ-1}) - (a_2/a_1)E(D_3^ℓ)Ψ√n - (a_2/2a_1)E(D_3^{ℓ+1}) }
+ m^{-1}ℓ{ ((ℓ-1)/4)E(D_3^{ℓ-2})Ψ^2 n + (ℓ/2)Ψ√n E(D_3^{ℓ-1}) + ((ℓ+1)/4)E(D_3^ℓ) }
+ δℓE(D_3^{ℓ-1})Ψ√n + O(mn^{-3/2} + m^{-1/2}n^{-1/2} + m^{-3/2} + (m/n)^4).    (41)

10 Again, please see full proof on my website for every "tedious" step.
Plugging in z_2 + z_3, (38) becomes

E[(-p^{-1}Y)^ℓ] = E[{(1 + δ)(D_2 + D_3)}^ℓ]
+ n^{-1/2}ℓ{ -(ℓ-1)(2p)^{-1}Ψ√n E(D_3^{ℓ-2}) - ℓ(2p)^{-1}E(D_3^{ℓ-1}) - (a_2/a_1)E(D_3^ℓ)Ψ√n - (a_2/2a_1)E(D_3^{ℓ+1}) }
+ m^{-1}ℓ{ ((ℓ-1)/4)E(D_3^{ℓ-2})Ψ^2 n + (ℓ/2)Ψ√n E(D_3^{ℓ-1}) + ((ℓ+1)/4)E(D_3^ℓ) }
+ δℓE(D_3^{ℓ-1})Ψ√n + o(m^{-1} + (m/n)^2).    (42)
As in HS88, let

K ≡ [p(1-p)]^{-1/2}Y,  L ≡ -[p/(1-p)]^{1/2}(1 + δ)(D_2 + D_3),    (43)

and as they state,

E(D_3^{2k}) = [(1-p)/p]^k (2k)!(k!2^k)^{-1} + O(n^{-1}),  E(D_3^{2k+1}) = O(n^{-1/2}),    (44)

where D_3 is asymptotically N(0, (1-p)/p). Multiplying equation (42) by {-[p/(1-p)]^{1/2}}^ℓ (it)^ℓ/ℓ! and adding over ℓ, the formal expansion is

E(e^{itK}) = E(e^{itL}) + n^{-1/2}α_1(t) + m^{-1}α_2(t) + δα_3(t) + o[m^{-1} + (m/n)^2],    (45)
where

α_1(t) ≡ e^{-t^2/2}[ ((1/2) - b_5(1-p))(p(1-p))^{-1/2}((it)^3 + (it)) + (it)^2 Ψ√n (2b_5 - 1/(2(1-p))) ],

α_2(t) ≡ e^{-t^2/2}(1/4)[ (it)^4 + (it)^2(3 + Ψ^2 n p/(1-p)) - 2((it)^3 + (it))Ψ√n (p/(1-p))^{1/2} ],

α_3(t) ≡ e^{-t^2/2}(it)( -Ψ√n (p/(1-p))^{1/2} ),

which are, respectively, the Fourier-Stieltjes transforms of

a_1(z) ≡ -[ Ψ√n (2b_5 - 1/(2(1-p)))z + ((1/2) - b_5(1-p))(p(1-p))^{-1/2}z^2 ]φ(z),    (46)

a_2(z) ≡ (1/4)[ -Ψ^2 n (p/(1-p))z + 2Ψ√n (p/(1-p))^{1/2}z^2 - z^3 ]φ(z), and    (47)

a_3(z) ≡ [ Ψ√n (p/(1-p))^{1/2} ]φ(z).    (48)
The characteristic function of L is

E(e^{itL}) = {1 + δ(it)^2 - n^{-1/2}(1/6)[p/(1-p)]^{1/2}[(1+p)/p](it)^3} e^{-t^2/2} + O(δ^2 + n^{-1}),    (49)

where the right-hand side is the Fourier-Stieltjes transform of

Φ(z) - δzφ(z) + n^{-1/2}(1/6)[p/(1-p)]^{1/2}[(1+p)/p](z^2 - 1)φ(z) + O(δ^2 + n^{-1}).    (50)
To explicitly show the characteristic function of L, I take an expansion of the cumulant generating function and plug in the known mean, variance, and third moment, calculating the third cumulant from the third moment. The fourth cumulant is also needed to show the bound on the continuation of the series expansion. Skipping the steps of calculation (found in the full proof), E(L) = 0, E(L^2) = 1 + 2δ + O(δ^2 + n^{-1}), and E(L^3) = -n^{-1/2}(1 + p)p^{-1/2}(1-p)^{-1/2} + O(δ^2 + n^{-1}). Note, then, that O((µ_2' - 1)^2) = O(δ^2). Expanding the log of the characteristic function of L, which is the cumulant generating function,

ln E(e^{itL}) = Σ_{j=1}^{∞} ((it)^j/j!)κ_j    (κ_j ≡ jth cumulant)
= (it)κ_1 - (t^2/2)κ_2 + ((it)^3/6)κ_3 + ((it)^4/4!)κ_4 + ...
= (it)µ_1' - (t^2/2)(µ_2' - (µ_1')^2) + ((it)^3/6)(µ_3' - 3µ_2'µ_1' + 2(µ_1')^3) + ((it)^4/4!)(µ_4' - 4µ_3'µ_1' - 3(µ_2')^2 + 12µ_2'(µ_1')^2 - 6(µ_1')^4) + ...
= 0 - (t^2/2)µ_2' + ((it)^3/6)µ_3' + ((it)^4/4!)(µ_4' - 3(µ_2')^2) + ..., so

E(e^{itL}) = exp{ -(t^2/2)(1 + (µ_2' - 1)) + ((it)^3/6)µ_3' + ((it)^4/4!)(µ_4' - 3(µ_2')^2) + ... },

and plugging in leads to (49).
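The moment-to-cumulant identities used above can be checked on a distribution with known cumulants; for the unit exponential, the raw moments are µ_j' = j! and the cumulants are κ_j = (j-1)!:

```python
import math

# raw moments of the unit exponential: mu'_j = j!
mu = [math.factorial(j) for j in range(5)]

# cumulants from raw moments, exactly as in the expansion above
kappa1 = mu[1]
kappa2 = mu[2] - mu[1]**2
kappa3 = mu[3] - 3 * mu[2] * mu[1] + 2 * mu[1]**3
kappa4 = mu[4] - 4 * mu[3] * mu[1] - 3 * mu[2]**2 + 12 * mu[2] * mu[1]**2 - 6 * mu[1]**4
# known exponential cumulants: kappa_j = (j-1)!, i.e. 1, 1, 2, 6
```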
When simplifying terms, it can easily be verified that the expression in (50),

Φ(z) - δzφ(z) + n^{-1/2}(1/6)(p/(1-p))^{1/2}((1+p)/p)(z^2 - 1)φ(z) = Φ(z) - δzφ(z) + c(z^2 - 1)φ(z) = Φ(z) + δφ'(z) + cφ''(z),

with local shorthand c ≡ n^{-1/2}(1/6)(p/(1-p))^{1/2}((1+p)/p), has Fourier-Stieltjes transform

e^{-t^2/2} + δ(-1)^2(it)^2 e^{-t^2/2} + c(-1)^3(it)^3 e^{-t^2/2} = e^{-t^2/2}(1 + δ(it)^2 - c(it)^3),

matching (49). The proof in HS88 implies that the conditions are such that the remainder will be the same in both equations.
Combining results going back to (45) and noting that b_5 = -a_2/(2a_1) = 1/2 + pg'(p)[2g(p)]^{-1} + O(n^{-1}), the distribution function of K admits the Edgeworth expansion on the RHS of (23):

P(K ≤ z) = Φ(z) + {n^{-1/2}u_{1,γ}^†(z) + m^{-1}u_{2,γ}(z) + (m/n)^2 u_{3,γ}(z)}φ(z) + o{m^{-1} + (m/n)^2},    (51)
where

u_{1,γ}^†(z) ≡ (1/6)(p/(1-p))^{1/2}((1+p)/p)(z^2 - 1) - (γf(ξ_p)/p)(1 - pf'(ξ_p)/[f(ξ_p)]^2 - 1/(2(1-p)))z - (1/2)(p/(1-p))^{1/2}(1 + (1-p)f'(ξ_p)/[f(ξ_p)]^2)z^2,

u_{2,γ}(z) ≡ (1/4)[ 2γf(ξ_p)z^2/[p(1-p)]^{1/2} - γ^2[f(ξ_p)]^2 z/(p(1-p)) - z^3 ], and

u_{3,γ}(z) ≡ ([3[f'(ξ_p)]^2 - f(ξ_p)f''(ξ_p)]/(6[f(ξ_p)]^4))(z - γf(ξ_p)/[p(1-p)]^{1/2}),

and the RHS of (51) is identical to the expansion indicated by (23).
Note that this is for η_p, not ξ_p (hence u_{1,γ}^† instead of u_{1,γ}), but it implies the final result in Theorem 1 when combined with equation (22). To show the final result, the inverse Fourier-Stieltjes transforms can simply be added to get the final answers, since the transform is linear (and consequently its inverse is linear).
Starting at (45), adding the inverse Fourier-Stieltjes transforms of the RHS terms leads to the distribution (cdf) of K:

P(K ≤ z) = Φ(z) - δzφ(z) + n^{-1/2}(1/6){p/(1-p)}^{1/2}{(1+p)/p}(z^2 - 1)φ(z) + O(δ^2 + n^{-1}) + n^{-1/2}a_1(z) + m^{-1}a_2(z) + δa_3(z) + o{m^{-1} + (m/n)^2}

= Φ(z) + δφ(z)( Ψ√n (p/(1-p))^{1/2} - z )
+ n^{-1/2}φ(z)[ (1/6)(p/(1-p))^{1/2}((1+p)/p)(z^2 - 1) - Ψ√n (2b_5 - 1/(2(1-p)))z - ((1/2) - b_5(1-p))(p(1-p))^{-1/2}z^2 ]
+ m^{-1}φ(z)(1/4)[ -Ψ^2 n (p/(1-p))z + 2Ψ√n (p/(1-p))^{1/2}z^2 - z^3 ] + o{m^{-1} + (m/n)^2},

and using δ ≡ -(m/n)^2 g''(p)/(6g(p)) and Ψ = γ/(pg(p)√n),

= Φ(z) + (m/n)^2 (g''(p)/(6g(p)))φ(z)( z - γ/(g(p)[p(1-p)]^{1/2}) )
+ n^{-1/2}φ(z)[ (1/6)(p/(1-p))^{1/2}((1+p)/p)(z^2 - 1) - (γ/(pg(p)))(1 + pg'(p)/g(p) - 1/(2(1-p)))z - (1/2)(p/(1-p))^{1/2}(1 - (1-p)g'(p)/g(p))z^2 ]
+ m^{-1}φ(z)(1/4)[ -γ^2 z/([g(p)]^2 p(1-p)) + 2γz^2/(g(p)[p(1-p)]^{1/2}) - z^3 ] + o{m^{-1} + (m/n)^2},

and using g(p) = 1/f(F^{-1}(p)), g'(p) = -f'(F^{-1}(p))/[f(F^{-1}(p))]^3, and g''(p) = [-f''(ξ_p)f(ξ_p) + 3[f'(ξ_p)]^2]/[f(ξ_p)]^5, and thus g'(p)/g(p) = -f'(ξ_p)/[f(ξ_p)]^2,

= Φ(z) + (m/n)^2 φ(z)([3[f'(ξ_p)]^2 - f(ξ_p)f''(ξ_p)]/(6[f(ξ_p)]^4))( z - γf(ξ_p)/[p(1-p)]^{1/2} )
+ n^{-1/2}φ(z)[ (1/6)(p/(1-p))^{1/2}((1+p)/p)(z^2 - 1) - (γf(ξ_p)/p)(1 - pf'(ξ_p)/[f(ξ_p)]^2 - 1/(2(1-p)))z - (1/2)(p/(1-p))^{1/2}(1 + (1-p)f'(ξ_p)/[f(ξ_p)]^2)z^2 ]
+ m^{-1}φ(z)(1/4)[ 2γf(ξ_p)z^2/[p(1-p)]^{1/2} - γ^2[f(ξ_p)]^2 z/(p(1-p)) - z^3 ] + o{m^{-1} + (m/n)^2}

= Φ(z) + n^{-1/2}u_{1,γ}^†(z)φ(z) + (m/n)^2 u_{3,γ}(z)φ(z) + m^{-1}u_{2,γ}(z)φ(z) + o{m^{-1} + (m/n)^2},

as in (51).
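As a closing reference point, the two smoothing-dependent terms of the expansion can be coded directly from their definitions (argument names are mine); the spot check uses the standard normal at the median, where f = φ(0), f' = 0, f'' = -φ(0), so u_{3,0}(z) = (π/3)z:

```python
import math

def u2(z, gamma, p, f):
    # u_{2,gamma}(z) from (51); f = f(xi_p)
    c = gamma * f / math.sqrt(p * (1.0 - p))
    return 0.25 * (2.0 * c * z**2 - c**2 * z - z**3)

def u3(z, gamma, p, f, fp, fpp):
    # u_{3,gamma}(z) from (51); fp, fpp = f'(xi_p), f''(xi_p)
    M = (3.0 * fp**2 - f * fpp) / (6.0 * f**4)
    c = gamma * f / math.sqrt(p * (1.0 - p))
    return M * (z - c)

f0 = 1.0 / math.sqrt(2.0 * math.pi)   # phi(0), the normal pdf at its median
```

Under the null (γ = 0), u2 reduces to -z^3/4, the familiar term behind the critical-value correction z_{1-α/2} + z_{1-α/2}^3/(4m).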
Department of Economics, University of Missouri–Columbia