
Inference on distributions, quantiles, and quantile treatment effects using the Dirichlet distribution

David M. Kaplan
University of Missouri (Economics)

Statistics Colloquium
22 April 2014

Dave Kaplan (Mizzou Economics) Dirichlet-based distributional and quantile inference 1 / 66


Papers and coauthor

Top 3 (currently) under “Working Papers” on my research webpage:

“IDEAL quantile inference via interpolated duals of exact analytic L-statistics” (with Matt Goldman, UC San Diego)

“IDEAL inference on conditional quantiles”

“True equality (of pointwise sensitivity) at last: a Dirichlet alternative to Kolmogorov-Smirnov inference on distributions” (with Matt)



Outline

1 Inference on distributions

2 Inference on quantiles

3 Quantile inference: theory and methods

4 Quantile simulations

5 Conclusion










Wilks (1962)

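$Y_i \overset{iid}{\sim} F$, continuous $\implies F(Y_i) \overset{d}{=} U_i \overset{iid}{\sim} \mathrm{Uniform}(0,1)$

$Y_{n:k}$: $k$th order statistic ($k$th-smallest value in sample);

$F(Y_{n:k}) \overset{d}{=} U_{n:k} \sim \beta(k, n + 1 - k)$

$\{F(Y_{n:1}), \dots, F(Y_{n:n})\} \sim \mathrm{Dir}^*(1, \dots, 1; 1)$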
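The Beta result can be sanity-checked by simulation. The sketch below is only an illustration, not code from the talk: it draws uniform samples (which stand in for $F(Y_i)$ under any continuous $F$), takes the $k$th order statistic, and compares the simulated mean with the Beta$(k, n+1-k)$ mean $k/(n+1)$. The function name, seed, and replication count are arbitrary choices.

```python
import random

def simulate_cdf_at_order_stat(n, k, reps, seed=0):
    """Draw F(Y_{n:k}) repeatedly; with continuous F this equals U_{n:k}."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        u = sorted(rng.random() for _ in range(n))
        draws.append(u[k - 1])  # k-th smallest (k is 1-indexed)
    return draws

n, k = 21, 4
draws = simulate_cdf_at_order_stat(n, k, reps=20000)
sim_mean = sum(draws) / len(draws)
beta_mean = k / (n + 1)  # mean of Beta(k, n + 1 - k)
print(f"simulated mean {sim_mean:.4f} vs Beta mean {beta_mean:.4f}")
```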






[Figure: N(0,1), n=21. Horizontal axis: Y; vertical axis: Probability.]

[Figure: N(0,1), n=21, k=4, 90% CI. Horizontal axis: Y; vertical axis: Probability.]

[Figure: N(0,1), n=21, k=11, 90% CI. Horizontal axis: Y; vertical axis: Probability.]

[Figure: N(0,1), n=21, k=17, 90% CI. Horizontal axis: Y; vertical axis: Probability.]



Inference on distributions

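Beta: pointwise $1 - \tilde\alpha$ CI for $F(\cdot)$ at each $Y_{n:k}$

Dirichlet: determines which pointwise $\tilde\alpha$ yields a $1 - \alpha$ uniform confidence band

Hypothesis testing, incl. 2-sample/first-order stochastic dominance (FOSD)

$n$ pointwise tests controlling the familywise error rate (FWER)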
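One way the pointwise level could be calibrated by simulation is sketched below. This is an assumption-laden illustration rather than the talk's implementation: it uses equal-tailed two-sided Beta intervals, a modest number of replications, and the integer-$k$ identity $P(U_{n:k} \le u) = P(\mathrm{Binomial}(n,u) \ge k)$ in place of a Beta CDF routine. All function names are hypothetical.

```python
import math
import random

def beta_cdf_int(u, k, n):
    """CDF of Beta(k, n + 1 - k) at u for integer k, via the binomial identity
    P(U_{n:k} <= u) = P(Binomial(n, u) >= k)."""
    return sum(math.comb(n, j) * u**j * (1.0 - u)**(n - j) for j in range(k, n + 1))

def calibrate_alpha_tilde(n, alpha, reps=2000, seed=0):
    """Simulate the pointwise level alpha_tilde whose equal-tailed Beta
    intervals jointly cover all n order statistics with probability 1 - alpha."""
    rng = random.Random(seed)
    min_tail = []
    for _ in range(reps):
        u = sorted(rng.random() for _ in range(n))
        # Smallest two-sided tail probability across the n order statistics.
        tails = []
        for k in range(1, n + 1):
            p = beta_cdf_int(u[k - 1], k, n)
            tails.append(min(p, 1.0 - p))
        min_tail.append(min(tails))
    min_tail.sort()
    # The band fails iff some tail probability is below alpha_tilde / 2,
    # so alpha_tilde / 2 is the alpha-quantile of the simulated minimum.
    return 2.0 * min_tail[int(alpha * reps)]

n, alpha = 20, 0.10
alpha_tilde = calibrate_alpha_tilde(n, alpha)
print(f"n={n}, alpha={alpha}: simulated alpha_tilde ~ {alpha_tilde:.4f}")
```

Because the joint distribution of the uniform order statistics does not depend on $F$, a mapping of this kind only needs to be simulated once per $(\alpha, n)$ and can then be stored as a lookup.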



[Figure: N(0,1), n=21. Horizontal axis: Y; vertical axis: Probability.]

[Figure: 90% uniform confidence band. Horizontal axis: Y; vertical axis: F(Y).]



Comparison with Kolmogorov–Smirnov

Kolmogorov–Smirnov test (Kolmogorov, 1933; Smirnov, 1939, 1948):

Statistic: $D_n = \sup_y \sqrt{n}\,|\hat{F}(y) - F(y)|$

Limit: $\sup_{t \in (0,1)} |B(t)|$ (Brownian bridge)

“weighted KS”: weight by inverse asymptotic standard deviation, $1/\sqrt{F(y)[1 - F(y)]}$ (Anderson and Darling, 1952)
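For concreteness, the unweighted statistic can be computed as in the following sketch. It assumes a continuous hypothesized CDF, so the supremum is attained just before or at an order statistic; the function names and the sample values are made up for illustration.

```python
import math

def ks_statistic(sample, cdf):
    """sqrt(n) * sup_y |F_hat(y) - F(y)| for a continuous hypothesized CDF."""
    y = sorted(sample)
    n = len(y)
    sup_diff = 0.0
    for i, yi in enumerate(y, start=1):
        f = cdf(yi)
        # F_hat jumps from (i - 1)/n to i/n at y_i, so check both sides.
        sup_diff = max(sup_diff, i / n - f, f - (i - 1) / n)
    return math.sqrt(n) * sup_diff

def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Toy check against Uniform(0,1), where F(y) = y.
d_unif = ks_statistic([0.1, 0.4, 0.7], lambda v: v)
# Illustrative (made-up) data tested against N(0,1).
d_norm = ks_statistic([-1.2, -0.3, 0.1, 0.8, 1.9], std_normal_cdf)
print(d_unif, d_norm)
```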




Similarities with KS:

Nonparametric, distribution-free

Exact, finite-sample coverage/size

Fast to compute (given pre-computed α̃ ↦ α mapping/lookup)

Identifies specific regions (quantiles) where equality is rejected (e.g., where a treatment has a measurable effect)

Primary advantage:

KS is “insensitive in tails” (Brownian bridge variance biggest in middle)

Weighted KS tries for “equal weights” (p. 203), but overweights tails (different asymptotics for “extreme” order statistics, $Y_{n:r}$ w/ fixed $r$ as $n \to \infty$)

Dirichlet: equal relative pointwise type I error rates by construction









[Figure: Pointwise type I error, n=20. Horizontal axis: Order statistic; vertical axis: Rejection probability. Series: Dirichlet, KS, weighted KS.]

[Figure: Pointwise type I error, n=100. Horizontal axis: Order statistic; vertical axis: Rejection probability.]




Comparison with KS

Asymptotic pointwise type I error rates (mostly dropping constants):

KS: at central order statistics, constant (depends on pointwise variance); extreme: $\exp\{-\sqrt{n}\}$

Weighted KS: central $\exp\{-\ln[\ln(n)]\}$, extreme $\exp\{-\sqrt{\ln[\ln(n)]}\}$

Dirichlet: central and extreme $\exp\{-c_1(\alpha) - c_2(\alpha)\sqrt{\ln[\ln(n)]}\}$



[Figure: Fitted and simulated $\tilde\alpha(\alpha, n)$. Horizontal axis: $\ln[\ln(n)]$; vertical axis: $-\ln(\tilde\alpha)$.]

[Figure: Universally fitted and simulated $\tilde\alpha(\alpha, n)$. Horizontal axis: $\ln[\ln(n)]$; vertical axis: $-\ln(\tilde\alpha)$.]



$H_0$: N(0,1); data: N(0.3,1); n = 100, α = 0.1, $10^6$ replications

Overall power: 82.4% KS, 80.5% Dirichlet, 62.4% weighted KS

[Figure: H1: pnorm(qnorm(x, 0.3, 1), 0, 1). Horizontal axis: Quantile; vertical axis: $G(F^{-1}(q))$. Curves: H0, True.]

[Figure: Pointwise power, n=100, H1: pnorm(qnorm(x, 0.3, 1), 0, 1). Horizontal axis: Order statistic; vertical axis: Rejection probability.]




$H_0$: N(0,1); data: N(0,0.7); n = 100, α = 0.1, $10^6$ replications

Overall power: 98.6% weighted KS, 92.0% Dirichlet, 65.6% KS

[Figure: H1: pnorm(qnorm(x, 0, 0.7), 0, 1). Horizontal axis: Quantile; vertical axis: $G(F^{-1}(q))$. Curves: H0, True.]

[Figure: Pointwise power, n=100, H1: pnorm(qnorm(x, 0, 0.7), 0, 1). Horizontal axis: Order statistic; vertical axis: Rejection probability.]




$H_0$: N(0,1); data: N(0,1.2); n = 100, α = 0.1, $10^6$ replications

Overall power: 64.2% Dirichlet, 25.5% KS, 2.8% weighted KS

[Figure: H1: pnorm(qnorm(x, 0, 1.2), 0, 1). Horizontal axis: Quantile; vertical axis: $G(F^{-1}(q))$. Curves: H0, True.]

[Figure: Pointwise power, n=100, H1: pnorm(qnorm(x, 0, 1.2), 0, 1). Horizontal axis: Order statistic; vertical axis: Rejection probability.]




Empirical example: testing family of quantile treatment effect null hypotheses

[Figure: Gift wage: library task, period 1. Horizontal axis: Quantile; vertical axis: Books entered. Series: Treatment, Control.]

[Figure: Gift wage: fundraising task, period 1. Horizontal axis: Quantile; vertical axis: Dollars raised. Series: Treatment, Control.]



Distributional inference summary

Inference on distributions, one-sample and two-sample

Precise control of familywise error rate and pointwise error rate, unlike KS

Fast computation from simulation-calibrated α̃(α,n)

Extensions: improve power via step-down/step-up; k-FWER; conditional distributions





Exact inference on a quantile?

Let n = 11, consider only k = 9

CI for CDF: take quantiles of $F(Y_{11:9}) \sim \beta(9,3)$

CI for quantile:

$P(F(Y_{11:9}) > 0.53) = 95\% = P(Y_{11:9} > F^{-1}(0.53))$,

so $Y_{11:9}$ is endpoint for lower one-sided CI for 0.53-quantile

Exact, finite-sample coverage

But what if I care about the median instead?
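The 95% figure for n = 11, k = 9 can be checked directly. The sketch below uses the standard identity that $U_{n:k} > u$ exactly when fewer than $k$ of the $n$ uniforms fall at or below $u$; the function name is a hypothetical helper, not part of the talk's code.

```python
import math

def prob_order_stat_exceeds(n, k, u):
    """P(U_{n:k} > u) for integer k: fewer than k of n uniforms fall at or below u."""
    return sum(math.comb(n, j) * u**j * (1.0 - u)**(n - j) for j in range(k))

coverage = prob_order_stat_exceeds(n=11, k=9, u=0.53)
print(f"P(F(Y_11:9) > 0.53) = {coverage:.4f}")
```

The printed value should be very close to 0.95, which is why the 0.53-quantile (rather than the median) is the one this particular order statistic covers exactly.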






Quantile inference via asymptotic normality

Ex: inference on median $Q_Y(0.5)$, using iid $\{Y_i\}_{i=1}^n$

$\sqrt{n}(\hat{Q} - Q_0) \overset{d}{\to} N(0, \sigma_Q^2)$

1-sided: $(-\infty, \hat{Q} + 1.64\hat\sigma_Q/\sqrt{n})$

Why do we do that?

$P(\hat{Q} + 1.64\sigma/\sqrt{n} < Q_0) = 0.05 = \alpha$

[Figure: Determination of Upper Endpoint, n=11, p=0.5, α=0.05, Y~Unif(0,1). Horizontal axis: Y; vertical axis: Density. Marked: $\hat{Q} + 1.64\sigma/\sqrt{n}$ and the 5% tail.]

But: σ hard to estimate; asymptotic approx (worse in tails)

Normal Approximation for Uniform Sample 0.50-quantile, n = 11

[Figure: Normal Approximation for Uniform Sample Quantiles, n=11. Horizontal axis: u; vertical axis: Density. Series: Exact, Normal approx.]
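The gap between the exact and normal-approximation distributions can be quantified for the Unif(0,1) median with n = 11. The sketch below is an illustration under that specific setup: the exact law is Beta(6,6), and the approximation uses the textbook asymptotic variance $p(1-p)/(n f(Q)^2)$ with $f = 1$. Function names are hypothetical.

```python
import math

n, p = 11, 0.5
k = 6  # sample median of n = 11 observations is the 6th order statistic

# Exact: U_{n:k} ~ Beta(k, n + 1 - k); variance a*b / ((a+b)^2 (a+b+1)).
a, b = k, n + 1 - k
exact_sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

# Asymptotic: variance p(1-p) / (n f(Q)^2), with f = 1 for Unif(0,1).
normal_sd = math.sqrt(p * (1.0 - p) / n)

def exact_cdf(u):
    """P(U_{n:k} <= u) via the binomial identity (integer k)."""
    return sum(math.comb(n, j) * u**j * (1.0 - u)**(n - j) for j in range(k, n + 1))

def normal_cdf(u):
    z = (u - p) / normal_sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

for u in (0.2, 0.3, 0.5):
    print(f"u={u}: exact {exact_cdf(u):.4f}, normal approx {normal_cdf(u):.4f}")
```

In this setup the asymptotic standard deviation is larger than the exact one, and the two CDFs disagree most away from the center, in line with the "worse in tails" remark.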



Quantile inference via order statistics

Ex: inference on median $Q_Y(0.5)$, using iid $\{Y_i\}_{i=1}^n$

Order statistic: $Y_{n:k}$ is $k$th smallest out of $n$ in $\{Y_i\}_{i=1}^n$

Approach: $(-\infty, Y_{n:k}]$ as lower one-sided CI; which $k$?

Want: $\alpha = P(Y_{n:k} < F_Y^{-1}(0.5)) = P(U_{n:k} < 0.5)$

Use: $U_i \overset{iid}{\sim} \mathrm{Unif}(0,1)$



Endpoint selection using beta distribution

$\alpha = 0.05 = P(U_{n:k} < 0.5)$

$U_{n:k} \sim \mathrm{Beta}(k, n + 1 - k)$, $k \in \{1, 2, \dots, n\} \implies k \in [1, n] \subset \mathbb{R}$

[Figure: Determination of Upper Endpoint, n=11, p=0.5, α=0.05. Horizontal axis: u; vertical axis: Density. Beta densities shown for k = 6 with P(U_{n:6} < 0.5) = 50%, k = 8 with 11.3%, k = 9 with 3.3%, and fractional k = 8.69 with α = 5%.]
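The endpoint search above can be reproduced numerically. The following is a rough sketch, not the talk's R code: the Beta CDF is obtained by Simpson integration of the density (an assumption made to stay within the standard library), and the fractional $k$ is found by bisection. All function names are hypothetical.

```python
import math

def beta_cdf(x, a, b, steps=2000):
    """Regularized incomplete beta I_x(a, b) by composite Simpson integration.
    Adequate here because a > 1 keeps the integrand smooth on [0, x] for x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    def pdf(t):
        if t <= 0.0 or t >= 1.0:
            return 0.0
        return math.exp(log_norm + (a - 1.0) * math.log(t) + (b - 1.0) * math.log(1.0 - t))
    h = x / steps
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * pdf(i * h)
    return total * h / 3.0

def tail_prob(k, n=11, p=0.5):
    """P(U_{n:k} < p) with U_{n:k} ~ Beta(k, n + 1 - k), k possibly fractional."""
    return beta_cdf(p, k, n + 1.0 - k)

def solve_k(alpha, n=11, p=0.5, lo=6.0, hi=11.0):
    """Bisection for the fractional k with P(U_{n:k} < p) = alpha (tail_prob decreases in k)."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if tail_prob(mid, n, p) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for k in (6, 8, 9):
    print(f"k={k}: P(U_11:{k} < 0.5) = {tail_prob(k):.3f}")
k_star = solve_k(0.05)
print(f"fractional k for alpha = 0.05: {k_star:.2f}")
```

The integer cases should match the 50%, 11.3%, and 3.3% values on the slide, and the bisection result should fall between 8 and 9, near the fractional k reported there.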



Fractional order statistics

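$\alpha = P(U_{n:k} < 0.5)$, $U^I_{n:k} \sim \mathrm{Beta}(k, n + 1 - k)$

Connecting back to reality, $k \in [1, n] \subset \mathbb{R}$:

(observed) $U^L_{n:k} = (1 - \epsilon)U_{n:\lfloor k \rfloor} + \epsilon U_{n:\lfloor k \rfloor + 1}$, $\epsilon \equiv k - \lfloor k \rfloor$

(Jones 2002) $U^I_{n:k} \overset{d}{=} (1 - C_\epsilon)U_{n:\lfloor k \rfloor} + C_\epsilon U_{n:\lfloor k \rfloor + 1}$, $C_\epsilon \sim \mathrm{Beta}(\epsilon, 1 - \epsilon)$

Hutson (1999): suggests this method but w/o theoretical justification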
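The observed (linearly interpolated) fractional order statistic is simple to compute. The sketch below applies the interpolation formula to a made-up sample of size 11; the function name and data are illustrative assumptions.

```python
import math

def interpolated_order_statistic(sample, k):
    """Linearly interpolated fractional order statistic:
    (1 - eps) * Y_{n:floor(k)} + eps * Y_{n:floor(k)+1}, eps = k - floor(k)."""
    y = sorted(sample)
    n = len(y)
    if not 1 <= k <= n:
        raise ValueError("k must lie in [1, n]")
    lower = math.floor(k)
    eps = k - lower
    if eps == 0:
        return y[lower - 1]  # integer k: an observed order statistic
    return (1.0 - eps) * y[lower - 1] + eps * y[lower]

data = [3.1, 0.4, 2.2, 1.7, 5.0, 4.3, 0.9, 2.8, 3.6, 1.2, 4.8]  # made-up sample, n = 11
endpoint = interpolated_order_statistic(data, 8.69)
print(endpoint)
```

With k = 8.69 the result is a weighted average of the 8th and 9th smallest observations, with weight 0.69 on the 9th.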






Recap:

Endpoint is $Y_{n:k}$ (instead of $\hat{Q} + 1.64\hat\sigma/\sqrt{n}$)

Determine exact, fractional $k$

Approx $Y^I_{n:k}$ by $Y^L_{n:k}$, Interpolated Dual of Exact Analytic L-statistic (IDEAL): $O(n^{-1})$ coverage probability error





Assumptions

Assumption (PDF)

For each quantile $u_j$, the PDF $f(\cdot)$ satisfies (i) $f(F^{-1}(u_j)) > 0$; (ii) $f''(\cdot)$ is continuous in some neighborhood of $F^{-1}(u_j)$.

Assumption (sampling)

Independent samples $\{X_i\}_{i=1}^{n_x}$ and $\{Y_i\}_{i=1}^{n_y}$ are drawn iid from respective CDFs $F_X$ and $F_Y$. Respective quantile functions are denoted $Q_X(\cdot) \equiv F_X^{-1}(\cdot)$ and $Q_Y(\cdot) \equiv F_Y^{-1}(\cdot)$.

Assumption (sample sizes)

Sample sizes grow as $\lim_{n_x \to \infty} \sqrt{n_y/n_x} \equiv \mu > 0$.



Theorem: fractional order statistic processes

Theorem

Wherever PDF $\geq \delta$,

$\sup_{k \in (1,n)} |X^I_{n:k} - X^L_{n:k}| = O_p(n^{-1}[\log n])$.



Lemma (Dirichlet PDF approximation)

Basically, precise approximation of Dirichlet PDF and derivative at values that for our purposes (i.e., for quantile inference) are drawn with probability quickly approaching one.



Theorem (CDF approximation)

Let $L^I$ = lin. com. of idealized fractional order statistics,

$L^L$ = lin. com. of linearly interpolated fractional order statistics.

(i) For samples $\{X_i\}_{i=1}^n$ with $X_i \overset{iid}{\sim} F$, and for a given constant $K$,

$P(L^L < K) - P(L^I < K) = C_1(K, n, \dots)\,n^{-1} + O(n^{-3/2}\log(n))$.

(ii) For samples $\{X_i\}_{i=1}^n$ with $X_i \overset{iid}{\sim} F$,

$\sup_{K \in \mathbb{R}} [P(L^L < K) - P(L^I < K)] = C_2(n, \dots)\,n^{-1} + O(n^{-3/2}\log(n))$

(iii) Given independent samples $\{X_i\}_{i=1}^{n_x}$ and $\{Y_i\}_{i=1}^{n_y}$ with $X_i \overset{iid}{\sim} F_X$ and $Y_i \overset{iid}{\sim} F_Y$, with $n_x \propto n_y \propto n$,

$\sup_{K \in \mathbb{R}} [P(L^L_X + L^L_Y < K) - P(L^I_X + L^I_Y < K)] = O(n^{-1})$



CPE: single quantile

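Lower one-sided coverage probability for $Q(u)$ is

$P(X^L_{n:k(\alpha)} > Q(u)) \overset{\text{Thm 3}}{=} P(X^I_{n:k(\alpha)} > Q(u)) + \dfrac{\epsilon_h(1 - \epsilon_h)\, z_{1-\alpha} \exp\{-z_{1-\alpha}^2/2\}}{u_h(\alpha)(1 - u_h(\alpha))\sqrt{2\pi}}\, n^{-1} + O(n^{-3/2}\log(n))$,

$P(X^I_{n:k(\alpha)} > Q(u)) = P(U^I_{n:k(\alpha)} > u) = 1 - \alpha$.

$\implies$ one- or two-sided CI: CPE is $O(n^{-1})$

$\implies$ calibrated CI: CPE is $O(n^{-3/2}\log(n))$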



CPE: quantile treatment effect

Coverage probability (lower one-sided CI)

$= P\{Y^L_{n:m(\alpha)} - X^L_{n:k(\alpha)} > Q_Y(p) - Q_X(p)\}$

$= P\{Y^I_{n:m(\alpha)} - X^I_{n:k(\alpha)} > Q_Y(p) - Q_X(p)\} + O(n^{-1})$

$= P\{F_Y^{-1}(U^{I,1}_{n:m(\alpha)}) - F_Y^{-1}(p) - [F_X^{-1}(U^{I,2}_{n:k(\alpha)}) - F_X^{-1}(p)] > 0\} + O(n^{-1})$

$= P\left\{\dfrac{U^{I,1}_{n:m(\alpha)} - p}{f_Y(F_Y^{-1}(p))} - \dfrac{U^{I,2}_{n:k(\alpha)} - p}{f_X(F_X^{-1}(p))} > 0\right\} + T + O(n^{-1})$

$= P\{\gamma(U^{I,1}_{n:m(\alpha)} - p) > U^{I,2}_{n:k(\alpha)} - p\} + T + O(n^{-1})$

$= 1 - \alpha + C + T + O(n^{-1})$

$C$: conditioning on $\hat\gamma$; $T$: Taylor remainder; $O(n^{-1})$: interpolation



CPE: joint, linear combinations

Linear combination: object of interest is $\sum_{j=1}^{J} \psi_j Q(u_j)$. CPE is same as for QTE: one-sided $O(n^{-1/2})$, two-sided $O(n^{-2/3})$

Joint: object of interest is $(Q(u_1), \dots, Q(u_J))$. One- or two-sided CPE is $O(n^{-1})$

Setting     Type                          CPE (two-sided)
1-sample    single quantile               O(n^{-3/2} log(n))
1-sample    joint (multiple quantiles)    O(n^{-1})
1-sample    linear combination            O(n^{-2/3})
2-sample    treatment effect              O(n^{-2/3})

CPE ≡ P(θ0 ∈ CI) − (1 − α)



Nonparametric conditional model

[Figure: Data with IDEAL 95% Confidence Intervals. Horizontal axis: X; vertical axis: Y. Series: Data, Pointwise.]

Complements Fan and Liu (2013): they show that the same method is valid in wide range of sampling assumptions; I show high-order accuracy under somewhat stronger assumptions



Setup

Object of interest: $Q_{Y|X}(p; X = x_0)$

Observables: $(X_i, Y_i) \in \mathbb{R}^d \times \mathbb{R}$

Discrete X easily added: no bias

Definition (effective sample)

effective sample window: $C_h \equiv \{x : \|x - x_0\| \leq h\}$

effective sample: $\{Y_i : X_i \in C_h\}$

effective sample size: $N_n \equiv \#\{Y_i : X_i \in C_h\}$
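The effective sample is just the set of outcomes whose covariates fall in the window. A minimal sketch follows, assuming the Euclidean norm (the definition above does not specify which norm) and using made-up data; the function name is hypothetical.

```python
import math

def effective_sample(xs, ys, x0, h):
    """Return the Y_i whose X_i lie in the window C_h = {x : ||x - x0|| <= h}.
    Each x and x0 is a sequence of d coordinates; the norm is Euclidean (an assumption)."""
    return [y for x, y in zip(xs, ys) if math.dist(x, x0) <= h]

# Made-up data with d = 1.
xs = [(0.10,), (0.45,), (0.50,), (0.58,), (0.90,)]
ys = [1.3, 0.2, -0.4, 0.7, 2.1]
window_ys = effective_sample(xs, ys, x0=(0.5,), h=0.1)
n_eff = len(window_ys)  # effective sample size N_n
print(window_ys, n_eff)
```

The 1-sample method is then applied to `window_ys` as if it were an iid sample of size `n_eff`.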



Single quantile: pointwise, joint over multiple x0

[Figure: Horizontal axis: X; vertical axis: Y. Series: Data, Pointwise, Joint.]



Joint over multiple quantiles

[Figure: Pointwise (by x0) IDEAL 95% CIs for conditional joint upper and lower quartiles. Horizontal axis: X; vertical axis: Y. Series: Data, Upper quartile CI, Lower quartile CI.]



Linear combinations (interquartile range)

[Figure: Pointwise IDEAL 95% CIs for conditional interquartile range (IQR). Horizontal axis: X; vertical axis: IQR. Series: True IQR, IDEAL CI.]



Conditional quantile treatment effects



Approach

Inference on $Q_{Y|X}(p; x_0)$ using $Y_i$ in shrinking $C_h$

Use 1-sample IDEAL method on effective sample; CPE in terms of $N_n$

Derive nonparametric bias from using $X_i \neq x_0$

Optimal bandwidth; overall CPE



Conditional: assumptions

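(1) cts $(X_i, Y_i) \in \mathbb{R}^d \times \mathbb{R}$, $Y_i = Q_{Y|X}(p; X_i) + \varepsilon_i$, $\varepsilon_i \perp\!\!\!\perp \varepsilon_j \mid \{X_k\}_{k=1}^n$, $(\varepsilon_i \mid X_i = x) \sim F_{\varepsilon|X=x}$. (More general sampling: Fan and Liu, 2013 working paper, Gaussian limits)

Assumption (bias)

(2) $f_X(x_0) > 0$; $f_X(\cdot)$ has smoothness $s_X = k_X + \gamma_X > 1$ near $x_0$

(3) $\forall u$ in neighborhood of $p$, $Q_{Y|X}(u; \cdot)$ has smoothness $s_Q = k_Q + \gamma_Q > 2$

(4i) As $n \to \infty$, $h \to 0$

Assumption (IDEAL)

(4ii) As $n \to \infty$, $nh^d/[\log(n)]^2 \to \infty$ ($\implies N_n \overset{a.s.}{\to} \infty$)

(5) $f_{Y|X}(Q_{Y|X}(p; x_0); x_0) > 0$ (unique cond. quantile)

(6) Smoothness: $\forall y$ in neighborhood of $Q_{Y|X}(u; x_0)$, $\forall x$ in neighborhood of $x_0$, $f_{Y|X}(y; x) \in C^2$ (wrt $y$)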



Bias: $Q_{Y|C_h}(p) - Q_{Y|X}(p; x_0)$

Theorem (2)

Under Assumptions 2–4(i) and 6, with $\xi_p \equiv Q_{Y|X}(p; x_0)$:

$\text{Bias} = -h^2\, \dfrac{f_X(x_0) F^{(0,2)}_{Y|X}(\xi_p; x_0) + 2 f'_X(x_0) F^{(0,1)}_{Y|X}(\xi_p; x_0)}{6 f_X(x_0) f_{Y|X}(\xi_p; x_0)} + o(h^2)$



Optimal bandwidth and CPE rates (two-sided)

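CPE-optimal: $h^* = \arg\min_h \text{CPE}(h)$

$\text{CPE} = \text{CPE}_{GK} + \text{CPE}_{Bias} = O(N_n^{-1}) + O(N_n h^4)$

$N_n \asymp nh^d$

Theorem (3)

Under Assumptions 1–6, two-sided CPE-optimal:

$h^* \asymp n^{-1/(2+d)}$, $\text{CPE} = O(n^{-2/(2+d)})$

Ex: $d = 1 \implies h^* \asymp n^{-1/3}$, $\text{CPE} = O(n^{-2/3})$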
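The bandwidth and CPE rates follow from balancing the two CPE terms. The following is a sketch that treats the displayed orders as exact rates and ignores constants:

```latex
\begin{align*}
N_n^{-1} \asymp N_n h^4
  &\iff N_n \asymp h^{-2} \\
N_n \asymp n h^d
  &\implies n h^d \asymp h^{-2}
  \iff h^* \asymp n^{-1/(2+d)} \\
\text{CPE} &= O(N_n^{-1}) = O\big((h^*)^2\big) = O\big(n^{-2/(2+d)}\big)
\end{align*}
```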






Optimal CPE comparison (two-sided)

IDEAL: $\text{CPE} = O(n^{-2/(2+d)})$

Local polynomial/asy. normality: $\text{CPE} = O(n^{-[s_Q/(s_Q+d)]/2})$

$d = 1$ or $d = 2$: IDEAL better even if assume $s_Q = \infty$

$d = 3$: IDEAL better unless assume $k_Q \geq 11$, 11th-degree local polynomial (364 terms)

$d \to \infty$: IDEAL better unless $k_Q \geq 4$
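The comparison comes down to comparing two rate exponents. The sketch below encodes the two exponents as stated and solves for the smoothness at which they are equal; the function names are hypothetical, and the break-even formula $s_Q = 4d/(d-2)$ is simple algebra on those two expressions rather than a result quoted from the papers.

```python
from fractions import Fraction

def ideal_exponent(d):
    """r such that IDEAL two-sided CPE = O(n^{-r}): r = 2 / (2 + d)."""
    return Fraction(2, 2 + d)

def local_poly_exponent(s_q, d):
    """r such that the local polynomial CPE = O(n^{-r}): r = [s_Q / (s_Q + d)] / 2."""
    return Fraction(s_q, s_q + d) / 2

def break_even_smoothness(d):
    """s_Q at which the two exponents coincide; solves s/(2(s+d)) = 2/(2+d), needs d > 2."""
    if d <= 2:
        raise ValueError("no finite break-even smoothness for d <= 2")
    return Fraction(4 * d, d - 2)

for d in (1, 2):
    # As s_Q -> infinity the local polynomial exponent approaches 1/2 from below.
    print(d, ideal_exponent(d), "vs limit", Fraction(1, 2))
for d in (3, 10, 1000):
    print(d, "break-even s_Q =", break_even_smoothness(d))
```

For d = 3 the break-even smoothness is 12, and it decreases toward 4 as d grows, which lines up with the thresholds listed above.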



IDEAL CPE summary

Setting        Type                          CPE (two-sided)
1-sample       single quantile               O(n^{-3/2} log(n))
1-sample       joint (multiple quantiles)    O(n^{-1})
1-sample       linear combination            O(n^{-2/3})
2-sample       treatment effect              O(n^{-2/3})
conditional    single quantile               O(n^{-2/(2+d)})
conditional    joint                         O(n^{-2/(2+d)})
conditional    linear combination            O(n^{-8/(12+7d)})
conditional    treatment effect              O(n^{-8/(12+7d)})

CPE ≡ P(θ0 ∈ CI) − (1 − α)



General kernel?

Above: implicit uniform kernel $K(x/h) = 1\{-h < x < h\}$

Let $\tilde{f}(x) = f(x)K(x/h)/E[K(x/h)]$; can show bias is $O(h^r)$ for $r$th-order kernel for $p$-quantile from density $\int_{\mathbb{R}} f_{Y|X}(y; x)\tilde{f}(x)\,dx$

If $K(\cdot) \geq 0$, then can perform rejection sampling (from full sample of $n$) to get iid draws from this distribution; the $f(x)$ cancels out, so probability of acceptance only depends on $K(\cdot)$, which is known: $P(\text{accept}) = K(x_i/h)/\sup_x K(x)$

But if $r > 2$, then some $K(x) < 0$: can’t have negative acceptance probability (. . . right?)
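The rejection-sampling idea for a nonnegative kernel can be sketched as follows. This is an illustration under assumptions: the Epanechnikov kernel is one possible nonnegative choice, the covariate is scalar, the kernel is centered at a point of interest `x0` (the slide writes $K(x_i/h)$, which corresponds to `x0 = 0`), and the data are simulated. Function names are hypothetical.

```python
import random

def epanechnikov(u):
    """A nonnegative second-order kernel with sup K = 0.75 (one possible choice)."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def kernel_rejection_sample(xs, ys, x0, h, kernel=epanechnikov, k_sup=0.75, seed=0):
    """Keep observation i with probability K((x_i - x0) / h) / sup K."""
    rng = random.Random(seed)
    kept = []
    for x, y in zip(xs, ys):
        accept_prob = kernel((x - x0) / h) / k_sup
        if rng.random() < accept_prob:
            kept.append((x, y))
    return kept

# Made-up scalar data, for illustration only.
data_rng = random.Random(1)
xs = [data_rng.random() for _ in range(400)]
ys = [x + data_rng.gauss(0.0, 0.2) for x in xs]
kept = kernel_rejection_sample(xs, ys, x0=0.5, h=0.1)
print(len(kept), "of", len(xs), "observations accepted")
```

A kernel that takes negative values would produce a negative `accept_prob` for some observations, which is the obstacle raised in the last point above.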










Unconditional QTE: power

Location difference of one unit between medians of X and Y

$n_x = n_y = 25$, $F_X = F_Y$, α = 0.05, p = 0.5

              N(0,1)   Logistic(0,1)   Exp(1)   LogN(0,1)
IDEAL         0.79     0.39            0.91     0.75
K11           0.70     0.31            0.86     0.64
Hut07         0.75     0.38            0.87     0.57
BS-t (99)     0.70     0.35            0.84     0.68
BS-t (999)    0.71     0.36            0.86     0.68
Horowitz98    0.62     0.27            0.87     0.66



Conditional: other methods

quantreg::rqss (nonparametric; est. asy. covariance matrix)

quantreg::rq (parametric; bootstrap)



Conditional: setup

Koenker rqss simulations: iid errors (various), σ(X) = 0.2 or 0.2(1 + X), median, n = 400, α = 0.05

[Figure: $x(1 - x)\sin\left(\dfrac{2\pi(1 + 2^{-7/5})}{x + 2^{-7/5}}\right)$. Horizontal axis: x; vertical axis: $Q_{Y|X}(p; x)$.]

Conditional: runtime

[Figure: Runtimes (log-log). Horizontal axis: Sample size (200 to 100,000); vertical axis: Runtime (minutes, 0.01 to 100). Series: IDEAL, BS(99), rqss.]



Pointwise CP: Gaussian, homoskedastic

[Figure: Pointwise Coverage Probability. Horizontal axis: X; vertical axis: CP (%). Series: Nominal, IDEAL, rqss, rq(perc).]



Pointwise CP: Cauchy, homoskedastic

[Figure: Horizontal axis: X; vertical axis: CP (%).]




Pointwise CP: centered $\chi^2_3$, heteroskedastic

[Figure: Horizontal axis: X; vertical axis: CP (%).]




[Figure: Joint Hypotheses. Horizontal axis: X; vertical axis: Y. Point markers: 0 (no shift), + (shift = +0.1), − (shift = −0.1).]



Joint power curves: Gaussian, homoskedastic

[Figure: Joint Power Curves. Horizontal axis: Deviation of Null from Truth; vertical axis: Rejection Probability (%). Series: IDEAL, rqss, rq(perc).]



Joint power curves: Cauchy, homoskedastic

[Figure: Joint Power Curves. Vertical axis: Rejection Probability (%). Series: IDEAL, rqss, rq(perc).]



Joint power curves: centered $\chi^2_3$, heteroskedastic

[Figure: Joint Power Curves. Vertical axis: Rejection Probability (%). Series: IDEAL, rqss, rq(perc).]



Pointwise power against ±0.1: Gaussian, homoskedastic

[Figure: Pointwise Power. Deviation: -0.1 or 0.1. Horizontal axis: X; vertical axis: Rejection Probability (%). Series: IDEAL, rqss.]



Pointwise power against ±0.1: Cauchy, homoskedastic

[Figure: Pointwise Power. Vertical axis: Rejection Probability (%). Series: IDEAL, rqss.]



Pointwise power against ±0.1: centered $\chi^2_3$, heteroskedastic

[Figure: Pointwise Power. Vertical axis: Rejection Probability (%). Series: IDEAL, rqss.]



Conditional QTE: median

[Figure: Horizontal axis: x0s; vertical axis: y0s.]

CP:       0.925   0.965   0.945   0.920   0.955   0.950
Length:   0.367   0.354   0.324   0.283   0.271   0.323



Conditional quantile treatment effects

                       x0 value
p      0.04    0.224   0.408   0.592   0.776   0.96    Joint

Coverage Probability
0.75   0.938   0.924   0.938   0.940   0.948   0.960   0.952
0.50   0.926   0.962   0.964   0.952   0.944   0.952   0.946
0.25   0.948   0.960   0.948   0.930   0.922   0.924   0.932

Median Interval Length
0.75   0.366   0.364   0.325   0.281   0.268   0.322
0.50   0.348   0.340   0.323   0.282   0.250   0.297
0.25   0.373   0.373   0.337   0.316   0.258   0.341

Table: CP and median interval length for IDEAL CIs for conditional QTEs; 1 − α = 0.95, n = 400 for both treatment and control samples, 500 replications.










Recap

Wilks (1962) Dirichlet: inference on distributions

Fractional order statistic theory (Stigler 1977): IDEAL inference on quantiles

Methods: single quantile (Hutson 1999), joint, linear combinations, quantile treatment effects; unconditional, conditional (complementing Fan and Liu 2013)

Results: improved CPE in most common cases, robust

R code, examples, papers on my website


