Optimal Sup-norm Rate, Adaptive Estimation, and Inference on NPIV

Xiaohong Chen (Yale) and Tim Christensen (NYU)

Cemmap Celebration Conference | Andrew's Birthday Conference, November 14-16, 2014

Introduction (1)

- we consider nonparametric instrumental variables (NPIV) regression:

  Y_i = h_0(X_i) + u_i,   E[u_i | X_i] ≠ 0,   E[u_i | W_i] = 0

- endogeneity is an important issue in economics
- being nonparametric in h_0 avoids functional-form misspecification
- h_0 is identified via the conditional moment restriction

  E[Y_i | W_i] = E[h_0(X_i) | W_i]

- this "smoothes out" features of h_0, making h_0 difficult to recover
- NPIV is an ill-posed inverse problem with unknown operator

Introduction (2)

- there is a large and growing literature on NPIV:

  1. identification/consistency: Newey & Powell (03); Carrasco, Florens & Renault (07); Andrews (11); ...
  2. convergence rates in L² norm: Hall & Horowitz (05); Blundell, Chen & Kristensen (BCK, 07); Chen & Reiß (11); Darolles, Fan, Florens & Renault (11); ...
  3. almost rate-adaptive estimation in L²: Horowitz (14)
  4. almost rate-adaptive estimation of linear functionals: Breunig & Johannes (13)
  5. inference on linear functionals of h_0: Ai & Chen (AC, 03, 07); Carrasco, Florens & Renault (07); Horowitz & Lee (13)
  6. inference on nonlinear functionals of h_0: Chen & Pouzo (14)
  7. testing: Horowitz (12); Canay, Santos & Shaikh (13); Breunig (13); ...
  8. partial identification: Santos (12); Freyberger & Horowitz (13); ...

- all the existing published results on NPIV are based on the L² norm.

Contributions of this paper

1. we derive the upper bound on sup-norm convergence rates for general sieve NPIV estimators.
2. we derive minimax lower bounds in sup-norm loss over a Hölder class of functions for NPIR (nonparametric indirect regression) and NPIV.
3. we show that spline and wavelet sieve NPIV estimators attain the sup-norm minimax lower bounds, and hence attain the optimal sup-norm convergence rates.
4. we introduce a data-driven procedure for choosing the dimension of the sieve NPIV that is sup-norm rate-adaptive.
5. we provide inference theory for plug-in sieve NPIV estimators of nonlinear functionals of h_0 under mild conditions.

- An application: inference on exact consumer surplus in nonparametric demand estimation when both price and income are endogenous.

Parametric vs nonparametric IV

- parametric IV model

  Y_i = X_i'β_0 + u_i,   E[u_i X_i] ≠ 0,   E[u_i W_i] = 0

- identified if rank(E[X_i W_i']) = dim(β_0)

- nonparametric IV model

  Y_i = h_0(X_i) + u_i,   E[u_i | X_i] ≠ 0,   E[u_i | W_i] = 0

- identified if h ↦ E[h(X_i) | W_i = ·] is injective

Parametric vs sieve nonparametric IV

- A parametric IV model can be estimated via 2SLS:

  β̂ = [X'W(W'W)^{-1}W'X]^{-1} X'W(W'W)^{-1}W'Y

- NP (03), AC (03), BCK (07): a nonparametric IV model can be estimated via sieve NPIV, i.e., 2SLS on basis functions (a code sketch follows below):

  ĥ(x) = ψ^J(x)'ĉ,   ĉ = [Ψ'B(B'B)^{-1}B'Ψ]^{-1} Ψ'B(B'B)^{-1}B'Y

  ψ^J(x) = (ψ_{J1}(x), ..., ψ_{JJ}(x))',   Ψ = (ψ^J(X_1), ..., ψ^J(X_n))'
  b^K(w) = (b_{K1}(w), ..., b_{KK}(w))',   B = (b^K(W_1), ..., b^K(W_n))'

- K ≥ J, with J = sieve number of endogenous regressors (the key smoothing parameter), K = sieve number of instruments.

- Horowitz (11): modified sieve NPIV: K = J and b^K = ψ^J = orthonormal series of L²([0,1]^d).
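As a concrete illustration, here is a minimal NumPy sketch of the sieve NPIV estimator above (2SLS on basis functions); the function name and arguments are ours, not from the paper, and the basis matrices Ψ and B are assumed precomputed:

```python
import numpy as np

def sieve_npiv(Y, Psi, B):
    """Sketch: c_hat with h_hat(x) = psi^J(x)' c_hat.

    Y   : (n,)   outcomes
    Psi : (n, J) basis functions of the endogenous regressor X
    B   : (n, K) basis functions of the instrument W, with K >= J
    """
    # first stage: project Psi onto the instrument space, P_B Psi = B (B'B)^{-1} B' Psi
    PB_Psi = B @ np.linalg.solve(B.T @ B, B.T @ Psi)
    # 2SLS: c_hat = [Psi' B (B'B)^{-1} B' Psi]^{-1} Psi' B (B'B)^{-1} B' Y
    return np.linalg.solve(Psi.T @ PB_Psi, PB_Psi.T @ Y)
```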

Outline

1. Optimal sup-norm rates

2. Sup-norm rate-adaptive estimation

3. MC study I: Adaptive estimation procedure

4. Application: Asymptotic normality of plug-in sieve NPIV estimators of nonlinear functionals

5. MC study II: Bootstrap uniform confidence sets

Preliminaries: measuring ill-posedness

- let Π_K : L²(W) → B_K denote the orthogonal projection onto the sieve space B_K = clsp{b_{K1}, ..., b_{KK}}
- weak norm ‖h‖_{w,2} = ‖Π_K T h‖_{L²(W)}, where Th(W_i) = E[h(X_i) | W_i]
- BCK (07): sieve measure of ill-posedness

  s_{JK}^{-1} = sup_{h ∈ Ψ_J : ‖h‖_{w,2} ≠ 0} ‖h‖_{L²(X)} / ‖h‖_{w,2} = 1 / s_min( G_ψ^{-1/2} S' G_b^{-1/2} )

  where G_b = G_{b,K} = E[b^K(W_i) b^K(W_i)'], G_ψ = G_{ψ,J} = E[ψ^J(X_i) ψ^J(X_i)'], and S' = S'_{JK} = E[ψ^J(X_i) b^K(W_i)'].

- the NPIV model is said to be
  - mildly ill-posed if s_{JK}^{-1} = O(J^{ς/d}) for some ς > 0
  - severely ill-posed if s_{JK}^{-1} = O(exp(½ J^{ς/d})) for some ς > 0
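The sample analogue ŝ_{JK} (used below in the adaptive procedure) is easy to compute from the basis matrices; a sketch, where inv_sqrt is a small helper we define for the inverse symmetric square root:

```python
import numpy as np

def inv_sqrt(G):
    # inverse symmetric square root of a positive definite Gram matrix
    w, V = np.linalg.eigh(G)
    return V @ np.diag(w ** -0.5) @ V.T

def s_hat_JK(Psi, B):
    """Sketch: smallest singular value of G_psi^{-1/2} S' G_b^{-1/2}
    (sample version); its reciprocal estimates the sieve ill-posedness."""
    n = Psi.shape[0]
    G_psi = Psi.T @ Psi / n          # sample Gram matrix of psi^J
    G_b = B.T @ B / n                # sample Gram matrix of b^K
    S_t = Psi.T @ B / n              # sample version of S' = E[psi^J(X) b^K(W)']
    M = inv_sqrt(G_psi) @ S_t @ inv_sqrt(G_b)
    return np.linalg.svd(M, compute_uv=False).min()
```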

Preliminaries: roughness properties of the sieve

- following Newey (97), we define ζ(J) = ζ_b(K) ∨ ζ_ψ(J), where

  ζ_b(K) := sup_w ‖G_b^{-1/2} b^K(w)‖_{ℓ²},   ζ_ψ(J) := sup_x ‖G_ψ^{-1/2} ψ^J(x)‖_{ℓ²}

- we also introduce

  ξ_ψ(J) := sup_x ‖ψ^J(x)‖_{ℓ¹}

  which is better suited to studying sup-norm rates

- the sup-norm variance term depends on ξ_ψ(J), e_J = λ_min(G_{ψ,J}), and s_{JK}^{-1}

Assumptions imposed for sup-norm rate

1. (i) {(X_i, Y_i, W_i)}_{i=1}^n is an i.i.d. sample; (ii) X has compact support 𝒳 ⊂ R^d with nonempty interior; W has support 𝒲 ⊂ R^{d_w}; (iii) sup_x |h_0(x)| < ∞; (iv) h ↦ E[h(X) | W = ·] is injective on L^∞(𝒳)

2. (i) sup_w E[u_i² | W_i = w] ≤ σ̄²; (ii) E[|u_i|^{2+δ}] < ∞ for some δ > 0

3. (i) λ_min(G_{b,K}) > 0; e_J = λ_min(G_{ψ,J}) > 0; J ≤ K; (ii) s_{JK}^{-1} ζ(J) √((J log J)/n) = o(1); (iii) ζ_b(K)^{(2+δ)/δ} √((log J)/n) = o(1)

4. there exists π_J h_0 ∈ Ψ_J such that: (i) ‖h_0 − π_J h_0‖_∞ ≤ C* J^{-p/d}; (ii) s_{JK}^{-1} ‖h_0 − π_J h_0‖_{w,2} ≤ C*_2 ‖h_0 − π_J h_0‖_{L²(X)}; (iii) ‖Q_J(h_0 − π_J h_0)‖_∞ ≤ C*_∞ ‖h_0 − π_J h_0‖_∞, with Q_J : L²(X) → Ψ_J the oblique projection

   Q_J h(x) = ψ^J(x)'[S'G_b^{-1}S]^{-1} S'G_b^{-1} E[b^K(W_i) h(X_i)]

Upper bound (1)

Theorem (Upper bound for NPIV). Let Assumptions 1–4 hold. Then:

  ‖ĥ − h_0‖_∞ = O_p( J^{-p/d} + s_{JK}^{-1} ξ_ψ(J) √((log J)/(n e_J)) )

- For Cohen-Daubechies-Vial (CDV) wavelets and B-splines, we show that [ξ_ψ(J)]²/e_J = O(J), hence

  ‖ĥ − h_0‖_∞ = O_p( J^{-p/d} + s_{JK}^{-1} √((J log J)/n) )

Upper bound (2)

Corollary. Let 𝒳 = [0,1]^d, let the density f of X satisfy 0 < inf_x f(x) ≤ sup_x f(x) < ∞, and let Ψ_J be spanned by a CDV wavelet basis or B-spline basis of sufficient regularity. Then:

Mildly ill-posed case: choosing J ≍ K ≍ (n/log n)^{d/(2(p+ς)+d)} yields

  ‖ĥ − h_0‖_∞ = O_p( (n/log n)^{-p/(2(p+ς)+d)} )

Severely ill-posed case: choosing J = c'_0 (log n)^{d/ς} for any c'_0 ∈ (0,1) and K = c_0 J for some finite c_0 ≥ 1 yields

  ‖ĥ − h_0‖_∞ = O_p( (log n)^{-p/ς} )

Optimality (1)

- Chen and Reiß (11) showed that the L²(X) rates
  - ‖ĥ − h_0‖_{L²(X)} = O_p(n^{-p/(2(p+ς)+d)}) in the mildly ill-posed case
  - ‖ĥ − h_0‖_{L²(X)} = O_p((log n)^{-p/ς}) in the severely ill-posed case

  are optimal in the L² minimax sense

- sup norm ≥ L² norm
- therefore our sup-norm rate is optimal in the severely ill-posed case
- what about the mildly ill-posed case?
- we now derive the minimax lower bound in sup-norm loss, i.e. the rate r_n over a parameter space H such that

  lim inf_{n→∞} inf_{ĥ_n} sup_{h ∈ H} P_h( ‖ĥ_n − h‖_∞ ≥ c r_n ) ≥ c' > 0

  for constants c, c'.

Optimality (2)

- trick: rewrite the NPIV model in terms of a nonparametric indirect regression (NPIR) model:

  Y_i = E[h_0(X_i) | W_i] + ε_i,   E[ε_i | W_i] = 0,   ε_i ~ N(0, σ_0(W_i)²)

  where E[· | W_i] is known and σ_0(·)² ≥ σ_0² > 0

- NPIV:

  Y_i = h_0(X_i) + u_i,   u_i := E[h_0(X_i) | W_i] − h_0(X_i) + ε_i

  where by construction E[u_i | W_i] = 0

- NPIR is more informative than NPIV
- implication: lower bound for NPIV ≥ lower bound for NPIR

Lower bound for NPIR

Assumption (S). (i) h_0 ∈ B^p_{∞,∞}([0,1]^d); (ii) there is a ς > 0 such that

  ‖Th‖_{L²(W)} ≲ ‖h‖_{B^{-ς}_{2,2}}

for all h ∈ B(p, L) := {h ∈ B^p_{∞,∞}([0,1]^d) : ‖h‖_{B^p_{∞,∞}} ≤ L}.

Theorem (Lower bound for NPIR). Let Assumption S hold for the NPIR model with a random sample {(Y_i, W_i)}_{i=1}^n. Then:

  lim inf_{n→∞} inf_{ĥ_n} sup_{h ∈ B(p,L)} P_h( ‖ĥ_n − h‖_∞ ≥ c (n/log n)^{-p/(2(p+ς)+d)} ) ≥ c' > 0

where inf_{ĥ_n} denotes the infimum over all estimators based on the sample of size n, and the constants c, c' depend only on p, L, d, ς, σ_0.

Lower bound for NPIV

Corollary (Lower bound for NPIV). Let Assumption S hold for the NPIV model with a random sample {(X_i, Y_i, W_i)}_{i=1}^n and inf_w E[u² | W = w] ≥ σ_0². Then:

  lim inf_{n→∞} inf_{ĥ_n} sup_{h ∈ B(p,L)} P_h( ‖ĥ_n − h‖_∞ ≥ c (n/log n)^{-p/(2(p+ς)+d)} ) ≥ c' > 0

where inf_{ĥ_n} denotes the infimum over all estimators based on the sample of size n, and the constants c, c' depend only on p, L, d, ς.

Adaptive estimation for NPIV

- must choose J optimally to attain the optimal rates
- the optimal choice depends on the unknown p and s_{JK}^{-1}
- we want a data-driven method for choosing J optimally
- existing methods focus on L² loss, minimizing an MSE-type criterion
  - Horowitz (14): modified sieve NPIV: K = J and b^K = ψ^J = orthonormal series of L²([0,1]^d); optimal in L² up to a log n factor
  - Liu & Tao (14): Mallows C_p model selection of sieve NPIV assuming homoskedastic errors
- CV/AIC/BIC/Mallows criteria aren't well suited to sup-norm rates
- we introduce a sup-norm adaptive Lepski-type procedure

Lepski-type procedure

- set K = K(J) ≍ J deterministically (e.g. K = c_0 J + a)
- choose J by the following method (a code sketch follows below). Define the sets:

  𝒥_0 = { j ∈ [J_min, Ĵ_max] : j^{-p/d} ≤ C_0 V̂_sup(j) }
  𝒥 = { j ∈ [J_min, Ĵ_max] : ‖ĥ_j − ĥ_l‖_∞ ≤ √2 σ̄ [V̂_sup(j) + V̂_sup(l)] ∀ l ∈ (j, Ĵ_max] }

  where

  V_sup(j) = s_{jK(j)}^{-1} ξ_ψ(j) √((log n)/(n e_j)),   V̂_sup(j) = ŝ_{jK(j)}^{-1} ξ_ψ(j) √((log n)/(n ê_j))

  ŝ_{JK(J)} = s_min( (Ψ'Ψ)^{-1/2} (Ψ'B) (B'B)^{-1/2} ),   ê_J = λ_min(Ψ'Ψ/n)

- J_0 = min{j ∈ 𝒥_0} is optimal but infeasible (it depends on the unknown p)
- Ĵ = min{j ∈ 𝒥} is our data-driven estimator of J_0
- ĥ_Ĵ denotes the sieve NPIV estimator with J = Ĵ, K = K(Ĵ)
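A minimal sketch of the selection rule Ĵ = min{j ∈ 𝒥}, assuming each ĥ_j has already been evaluated on a fine grid of x values and V̂_sup(j) computed as above; all names are illustrative:

```python
import numpy as np

def lepski_J(h_hat, V_hat, J_grid, sigma_bar):
    """Sketch: smallest j whose estimate is sup-norm close to every
    estimate with larger dimension l, at the threshold above.

    h_hat : dict j -> h_hat_j evaluated on a fine x grid
    V_hat : dict j -> V_hat_sup(j)
    """
    for idx, j in enumerate(J_grid):
        if all(np.max(np.abs(h_hat[j] - h_hat[l]))
               <= np.sqrt(2.0) * sigma_bar * (V_hat[j] + V_hat[l])
               for l in J_grid[idx + 1:]):
            return j
    return J_grid[-1]    # fall back to the largest candidate
```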

Heuristic argument

- Suppose 𝒥_0 ⊆ 𝒥. Then:

  Ĵ := min 𝒥 ≤ J_0 := min 𝒥_0

  and:

  ‖ĥ_Ĵ − h_0‖_∞ ≤ ‖ĥ_{J_0} − h_0‖_∞ + ‖ĥ_Ĵ − ĥ_{J_0}‖_∞
               ≤ ‖ĥ_{J_0} − h_0‖_∞ + 2√2 σ̄ ŝ_{J_0K(J_0)}^{-1} ξ_ψ(J_0) √((log n)/(n ê_{J_0}))
               ≲ ‖ĥ_{J_0} − h_0‖_∞ + s_{J_0K(J_0)}^{-1} ξ_ψ(J_0) √((log n)/(n e_{J_0}))   wpa1

  ⇒ ‖ĥ_Ĵ − h_0‖_∞ = O_p( J_0^{-p/d} + s_{J_0K(J_0)}^{-1} ξ_ψ(J_0) √((log n)/(n e_{J_0})) )

- implication: Ĵ is rate-adaptive to the oracle J_0

Choosing Ĵ_max

- we still need to choose J_max
- data-driven estimator of J_max:

  Ĵ_max = min{ J > J_min : ŝ_{JK(J)}^{-1} ζ̂(J) √((J L(J) log n)/n) ≥ 1 }

  where L(J) = a log(log J) for some constant a > 0.
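A sketch of this stopping rule, treating ŝ_{JK(J)} and ζ̂(J) as user-supplied functions (hypothetical names):

```python
import numpy as np

def J_max_hat(J_grid, s_hat, zeta_hat, n, a=0.1):
    """Sketch: first J on the grid (J >= 3, so log log J > 0) at which
    the scaled variance proxy reaches 1."""
    for J in J_grid:
        L = a * np.log(np.log(J))
        if zeta_hat(J) / s_hat(J) * np.sqrt(J * L * np.log(n) / n) >= 1.0:
            return J
    return J_grid[-1]
```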

Oracle property

- now consider the special case with a CDV wavelet or B-spline sieve, rectangular support, and a well-behaved density

Theorem (Adaptivity). Let Assumptions 1–4 hold and s_{J̄_maxK(J̄_max)}^{-1} √((J̄_max² log n)/n) = o(1). Then: J̲_max ≤ Ĵ_max ≤ J̄_max wpa1 (with J̲_max ≤ J̄_max deterministic sequences); and

  𝒥_0 ⊆ 𝒥 wpa1

and so:

  ‖ĥ_Ĵ − h_0‖_∞ = O_p( J_0^{-p/d} + s_{J_0K(J_0)}^{-1} √((J_0 log n)/n) )

- implication: sup-norm rate adaptive in both the mildly and severely ill-posed cases; no loss of a log n factor
- automatically implies L²(X)-norm rate adaptivity in the severely ill-posed case, and almost adaptivity in the mildly ill-posed case (up to a log n factor)

MC design

- Newey and Powell (03) design, but with compact support: generate

  (U_i, V*_i, W*_i)' ~ N( 0, Σ ),   Σ = ⎡ 1    0.5  0 ⎤
                                        ⎢ 0.5  1    0 ⎥
                                        ⎣ 0    0    1 ⎦

  and set X_i = Φ((W*_i + V*_i)/√2) and W_i = Φ(W*_i)

- linear design: h_0(x) = 4x − 2
- nonlinear design: h_0(x) = log(|6x − 3| + 1) sgn(x − 1/2)
- generate 1000 samples of length 1000
- implement with cubic/quartic B-splines (with nested knots) and Legendre polynomials
- use σ̄ = 1 (the true σ) and σ̄ = 0.1
- take L(J) = (1/10) log log J in the definition of Ĵ_max
- compare sup-norm and L²-norm error of the Lepski procedure against the infeasible choice of J which minimizes sup-norm error in each sample
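A sketch of this data-generating process in NumPy/SciPy (the function name and seed handling are ours):

```python
import numpy as np
from scipy.stats import norm

def simulate(n=1000, design="nonlinear", seed=0):
    """Sketch: one MC sample from the compact-support Newey-Powell design."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, 0.5, 0.0],
                    [0.5, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])
    U, V_star, W_star = rng.multivariate_normal(np.zeros(3), cov, size=n).T
    X = norm.cdf((W_star + V_star) / np.sqrt(2.0))   # endogenous regressor on [0,1]
    W = norm.cdf(W_star)                             # instrument on [0,1]
    if design == "linear":
        h0 = 4.0 * X - 2.0
    else:
        h0 = np.log(np.abs(6.0 * X - 3.0) + 1.0) * np.sign(X - 0.5)
    return h0 + U, X, W                              # Y, X, W
```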

MC design: linear h_0 (black), nonlinear h_0 (red)

[Figure: the two h_0 functions plotted over x ∈ [0, 1], y ∈ [−2, 2].]

MC design: scatter plot of (X_i, W_i)

[Figure: scatter of X (horizontal) against W (vertical) on [0, 1]².]

MC design: scatter plot of (X_i, Y_i) with nonlinear h_0

[Figure: scatter of X ∈ [0, 1] against Y ∈ [−5, 5].]

MC results: Lepski procedure, linear design

Table 1: Linear design, cubic (r = 4) and quartic (r = 5) B-spline bases

                    Lepski (σ̄ = 1)      Lepski (σ̄ = 0.1)    Infeasible
       rJ   rK      L∞ loss  L² loss    L∞ loss  L² loss    L∞ loss  L² loss

Results with K(J) = J − rJ + rK
Mean    4    4      0.4262   0.1547     0.4262   0.1547     0.4141   0.1608
Med.    4    4      0.3828   0.1394     0.3828   0.1394     0.3708   0.1443
Mean    4    5      0.4179   0.1524     0.4209   0.1536     0.3937   0.1540
Med.    4    5      0.3681   0.1368     0.3692   0.1370     0.3476   0.1368
Mean    5    5      0.6633   0.2355     0.6633   0.2355     0.6243   0.2494
Med.    5    5      0.6007   0.2202     0.6007   0.2202     0.5646   0.2311

Results with K(J) = 2(J − rJ) + rK + 1
Mean    4    4      0.4188   0.1526     0.4188   0.1526     0.3895   0.1552
Med.    4    4      0.3696   0.1375     0.3696   0.1375     0.3470   0.1371
Mean    4    5      0.3918   0.1439     0.3945   0.1449     0.3720   0.1486
Med.    4    5      0.3430   0.1291     0.3430   0.1291     0.3295   0.1311
Mean    5    5      0.6366   0.2277     0.6366   0.2277     0.5816   0.2352
Med.    5    5      0.5800   0.2089     0.5800   0.2089     0.5228   0.2111

MC results: Lepski procedure, linear design

Table 2: Linear design, Legendre polynomial bases

        Lepski (σ̄ = 1)      Lepski (σ̄ = 0.1)    Infeasible
        L∞ loss  L² loss    L∞ loss  L² loss    L∞ loss  L² loss

Results with K(J) = J
Mean    0.0882   0.0492     0.2943   0.1185     0.0869   0.0494
Med.    0.0777   0.0452     0.1674   0.0810     0.0764   0.0453

Results with K(J) = 2J
Mean    0.0878   0.0490     0.2745   0.1119     0.0862   0.0492
Med.    0.0779   0.0453     0.1640   0.0807     0.0766   0.0455

MC results: Lepski procedure, nonlinear design

Table 3: Nonlinear design, cubic (r = 4) and quartic (r = 5) B-spline bases

                    Lepski (σ̄ = 1)      Lepski (σ̄ = 0.1)    Infeasible
       rJ   rK      L∞ loss  L² loss    L∞ loss  L² loss    L∞ loss  L² loss

Results with K(J) = J − rJ + rK
Mean    4    4      0.4343   0.1621     0.4343   0.1621     0.4233   0.1671
Med.    4    4      0.3855   0.1469     0.3855   0.1469     0.3748   0.1503
Mean    4    5      0.4262   0.1600     0.4271   0.1605     0.4030   0.1615
Med.    4    5      0.3738   0.1444     0.3744   0.1445     0.3514   0.1445
Mean    5    5      0.6726   0.2407     0.6726   0.2407     0.6318   0.2531
Med.    5    5      0.6069   0.2278     0.6069   0.2278     0.5646   0.2345

Results with K(J) = 2(J − rJ) + rK + 1
Mean    4    4      0.4271   0.1601     0.4286   0.1609     0.3987   0.1623
Med.    4    4      0.3764   0.1445     0.3764   0.1445     0.3518   0.1443
Mean    4    5      0.4002   0.1518     0.4029   0.1528     0.3812   0.1563
Med.    4    5      0.3410   0.1384     0.3414   0.1384     0.3258   0.1402
Mean    5    5      0.6471   0.2330     0.6471   0.2330     0.5895   0.2390
Med.    5    5      0.5797   0.2143     0.5797   0.2143     0.5341   0.2141

MC results: Lepski procedure, nonlinear design

Table 4: Nonlinear design, Legendre polynomial bases

        Lepski (σ̄ = 1)      Lepski (σ̄ = 0.1)    Infeasible
        L∞ loss  L² loss    L∞ loss  L² loss    L∞ loss  L² loss

Results with K(J) = J
Mean    0.2494   0.1305     0.4283   0.1719     0.2297   0.1224
Med.    0.2367   0.1266     0.3210   0.1426     0.2218   0.1243

Results with K(J) = 2J
Mean    0.2475   0.1306     0.4063   0.1644     0.2241   0.1208
Med.    0.2346   0.1267     0.3132   0.1395     0.2178   0.1242

Pointwise and uniform inference

- our sup-norm rates allow for mild low-level conditions for asymptotic normality of plug-in sieve NPIV estimators of possibly nonlinear functionals of h_0 in two cases:

  1. "pointwise" inference on f(h_0)
     - e.g.: exact consumer surplus of a price change from p_0 to p_1 at income i:

       Q_i = h_0(P_i, I_i) + u_i,   f(h_0) = S_i(p_0)

       where S'_i(p) = −h_0(p, i − S_i(p)) and S_i(p_1) = 0
     - cf. Hausman & Newey (95), Vanhems (10), Blundell et al. (12)

  2. "uniform" inference on {f_τ(h_0) : τ ∈ T} where T ⊂ R^{d_T}
     - e.g.: uniform inference on consumer surplus/deadweight loss

Pointwise inference (1)

- we focus on slower-than-√n functionals that are bounded w.r.t. the sup norm:

5. (i) there exists a linear functional Df(h_0)[·] and a constant C s.t.

     |f(h) − f(h_0) − Df(h_0)[h − h_0]| ≤ C ‖h − h_0‖²_∞

   for all h ∈ N_n(h_0), where ĥ ∈ N_n wpa1;
   (ii) V_n^{-1/2} ‖ĥ − h_0‖²_∞ = o_p(n^{-1/2})

- includes CS/DWL functionals and the quadratic functional
- sufficient for a more general condition of Chen and Pouzo (14)
- here V_n (↗ ∞) is the sieve variance

    V_n = Df(h_0)[ψ^J]' Σ_n Df(h_0)[ψ^J]
    Σ_n = [S'G_b^{-1}S]^{-1} ( S'G_b^{-1} Ω G_b^{-1} S ) [S'G_b^{-1}S]^{-1}

  where S = E[b^K(W_i) ψ^J(X_i)'], Ω = E[u_i² b^K(W_i) b^K(W_i)']

Pointwise inference (2)

Theorem (Pointwise asymptotic normality of sieve t-statistics). Let Assumptions 1–5 (etc.) hold. Then:

  √n ( f(ĥ) − f(h_0) ) / V̂_n^{1/2} →_d N(0, 1)

- here V̂_n is the sieve variance estimator

    V̂_n = Df(ĥ)[ψ^J]' Σ̂ Df(ĥ)[ψ^J]
    Σ̂ = [Ŝ'Ĝ_b^{-1}Ŝ]^{-1} ( Ŝ'Ĝ_b^{-1} Ω̂ Ĝ_b^{-1} Ŝ ) [Ŝ'Ĝ_b^{-1}Ŝ]^{-1}
    Ŝ = B'Ψ/n,   Ĝ_b = B'B/n,   Ω̂ = n^{-1} Σ_{i=1}^n û_i² b^K(W_i) b^K(W_i)'

- just like the 2SLS variance estimator but using basis functions; cf. Chen and Pouzo (14), Newey (13).
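A sketch of the plug-in variance computation, assuming the NPIV residuals û_i and the gradient vector Df(ĥ)[ψ^J] are available; all names are illustrative:

```python
import numpy as np

def sieve_variance(Df_psi, Psi, B, u_hat):
    """Sketch: V_hat_n = Df' Sigma_hat Df for a functional with estimated
    gradient Df_psi = Df(h_hat)[psi^J] (length-J vector)."""
    n = len(u_hat)
    S_hat = B.T @ Psi / n                              # S_hat = B'Psi/n, (K, J)
    Gb_inv = np.linalg.inv(B.T @ B / n)                # Gb_hat^{-1}
    Omega = (B * (u_hat ** 2)[:, None]).T @ B / n      # E_n[u_hat^2 b b']
    A = np.linalg.inv(S_hat.T @ Gb_inv @ S_hat)        # [S'Gb^{-1}S]^{-1}
    Sigma = A @ (S_hat.T @ Gb_inv @ Omega @ Gb_inv @ S_hat) @ A
    return Df_psi @ Sigma @ Df_psi
```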

Uniform inference (1)

- we now impose a uniform (in τ ∈ T) version of Assumption 5:

5'. (i) Df_τ(h_0)[·] is a linear functional for each τ ∈ T; (ii) there exists a constant C s.t.

     sup_{τ∈T} |f_τ(h) − f_τ(h_0) − Df_τ(h_0)[h − h_0]| ≤ C ‖h − h_0‖²_∞

   for all h ∈ N_n(h_0), where ĥ ∈ N_n wpa1;
   (iii) sup_{τ∈T} V_{τ,n}^{-1/2} ‖ĥ − h_0‖²_∞ = o_p(n^{-1/2})

- here V_{τ,n} = Df_τ(h_0)[ψ^J]' Σ_n Df_τ(h_0)[ψ^J]
- estimated by V̂_{τ,n} = Df_τ(ĥ)[ψ^J]' Σ̂ Df_τ(ĥ)[ψ^J]

Uniform inference (2)

Theorem (Uniform asymptotic normality of sieve t-statistics). Let Assumptions 1–5' (etc.) hold. Then there exists a sequence of tight Gaussian processes G_n on ℓ^∞(T) with covariance function

  E[G_n(τ_1) G_n(τ_2)] = Df_{τ_1}(h_0)[ψ^J]' Σ_n Df_{τ_2}(h_0)[ψ^J] / ( V_{τ_1,n}^{1/2} V_{τ_2,n}^{1/2} )

and random variables Z_n =_d sup_{τ∈T} |G_n(τ)| such that

  sup_{τ∈T} | √n ( f_τ(ĥ) − f_τ(h_0) ) / V̂_{τ,n}^{1/2} | = Z_n + o_p(1)

as n, J, K → ∞.

- we follow the Chernozhukov, Chetverikov & Kato (14) construction (see also Chernozhukov, Lee & Rosen (13)) rather than a strong approximation

Example: uniform confidence bands

- f_τ(h_0) = h_0(τ) with T = 𝒳, and Df_τ(h)[ψ^J] = ψ^J(τ)
- by the previous theorem, there exists a sequence of tight Gaussian processes G_n on ℓ^∞(𝒳) with covariance function

  E[G_n(x_1) G_n(x_2)] = ψ^J(x_1)' Σ_n ψ^J(x_2) / ( V_{x_1,n}^{1/2} V_{x_2,n}^{1/2} )

  and random variables Z_n =_d sup_{x∈𝒳} |G_n(x)| such that

  sup_{x∈𝒳} | √n ( ĥ(x) − h_0(x) ) / V̂_{x,n}^{1/2} | = Z_n + o_p(1)

  as n, J, K → ∞.

- invert for a uniform confidence band
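Given a critical value z* for Z_n (e.g. from the sieve score bootstrap sketched in the MC section below), inverting the sup t-statistic gives the band ĥ(x) ± z* V̂_{x,n}^{1/2}/√n; a sketch:

```python
import numpy as np

def uniform_band(h_hat_grid, V_hat_grid, z_star, n):
    """Sketch: band h_hat(x) +/- z* sqrt(V_hat_{x,n} / n) over a grid of x,
    where z* approximates the (1 - alpha) quantile of sup_x |G_n(x)|."""
    half_width = z_star * np.sqrt(V_hat_grid / n)
    return h_hat_grid - half_width, h_hat_grid + half_width
```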

Example: uniform inference on exact consumer surplus

- f_τ(h_0) = S_i(p) with τ = (p, i) ∈ T = [p̲_0, p̄_0] × [i̲, ī]

  Df_τ(h_0)[ψ^J] = −∫_{p_1}^{p} ψ^J(t, i − S_i(t)) e^{∫_p^t ∂_2 h_0(u, i − S_i(u)) du} dt

  Df_τ(ĥ)[ψ^J] = −∫_{p_1}^{p} ψ^J(t, i − Ŝ_i(t)) e^{∫_p^t ∂_2 ĥ(u, i − Ŝ_i(u)) du} dt

- uniform asymptotic normality of {Ŝ_i(p) : (p, i) ∈ [p̲_0, p̄_0] × [i̲, ī]} follows from the previous theorem
- we could equally consider uniform inference on deadweight loss
- our sup-norm rates are critical here to control the bias
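A sketch of computing Ŝ_i(p) from an estimated demand function ĥ by integrating the defining ODE backward from the terminal condition Ŝ_i(p_1) = 0 (function names are ours):

```python
import numpy as np
from scipy.integrate import solve_ivp

def consumer_surplus(h, p, p1, income):
    """Sketch: S_i(p) solving S'(t) = -h(t, income - S(t)), S(p1) = 0,
    integrated backward from p1 to p; h is the (estimated) demand function."""
    sol = solve_ivp(lambda t, S: [-h(t, income - S[0])],
                    t_span=(p1, p), y0=[0.0], rtol=1e-8)
    return sol.y[0, -1]
```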

MC design

- same Newey and Powell (03) linear and nonlinear designs
- estimate Ĵ_max as before; use J = Ĵ_max, K = K(Ĵ_max) to implement the sieve NPIV estimator
- estimate critical values for uniform confidence bands using the sieve score bootstrap (Chen and Pouzo, 14) with the Mammen (93) two-point distribution and 1000 bootstrap replications for each sample (a code sketch follows below)
- computationally simpler than the bootstrap sieve t-statistic in Chen & Pouzo (14) or the bootstrap in Horowitz & Lee (12)
- compare MC coverage with nominal coverage probabilities
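A sketch of the sieve score bootstrap for the sup t-statistic critical value: the residual scores b^K(W_i)û_i are perturbed by i.i.d. Mammen two-point multipliers. The weight construction follows Mammen (93); everything else (names, arguments) is illustrative:

```python
import numpy as np

def mammen_weights(n, rng):
    """Mammen (1993) two-point multipliers: mean 0, variance 1."""
    p = (np.sqrt(5.0) + 1.0) / (2.0 * np.sqrt(5.0))
    return np.where(rng.random(n) < p,
                    (1.0 - np.sqrt(5.0)) / 2.0,
                    (1.0 + np.sqrt(5.0)) / 2.0)

def score_bootstrap_critval(Psi_grid, Psi, B, u_hat, V_grid, alpha=0.05,
                            n_boot=1000, seed=0):
    """Sketch: (1 - alpha) critical value for sup_x |t-statistic| by
    perturbing the scores b^K(W_i) u_hat_i with multiplier weights."""
    rng = np.random.default_rng(seed)
    n = len(u_hat)
    S_hat = B.T @ Psi / n
    Gb_inv = np.linalg.inv(B.T @ B / n)
    A = np.linalg.inv(S_hat.T @ Gb_inv @ S_hat) @ S_hat.T @ Gb_inv  # (J, K)
    sup_stats = np.empty(n_boot)
    for b in range(n_boot):
        w = mammen_weights(n, rng)
        score = B.T @ (w * u_hat) / np.sqrt(n)          # perturbed score, (K,)
        t_proc = (Psi_grid @ (A @ score)) / np.sqrt(V_grid)
        sup_stats[b] = np.max(np.abs(t_proc))
    return np.quantile(sup_stats, 1.0 - alpha)
```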

Estimated UCBs (dashed), ĥ (black line), h_0 (red line)

[Figure: estimated uniform confidence band over x ∈ [0, 1], y ∈ [−2.5, 2].]

MC results: coverage probabilities

Table 5: Linear and nonlinear design, cubic (r = 4) and quartic (r = 5) B-spline bases

                        K(J) = J − rJ + rK      K(J) = 2(J − rJ) + rK + 1
            rJ   rK     90%    95%    99%       90%    95%    99%
linear       4    4     0.933  0.966  0.996     0.944  0.971  0.994
linear       4    5     0.937  0.975  0.995     0.937  0.963  0.994
linear       5    5     0.961  0.983  0.997     0.959  0.985  0.997
nonlinear    4    4     0.884  0.945  0.987     0.912  0.956  0.989
nonlinear    4    5     0.894  0.946  0.987     0.906  0.951  0.987
nonlinear    5    5     0.956  0.978  0.995     0.951  0.979  0.996

MC results: coverage probabilities

Table 6: Linear and nonlinear design, Legendre polynomial bases

              K(J) = J                  K(J) = 2J
              90%    95%    99%         90%    95%    99%
linear        0.937  0.964  0.997       0.928  0.959  0.989
nonlinear     0.901  0.952  0.988       0.906  0.948  0.989

Conclusions

- contributions:

  1. optimal sup-norm rates and their attainability by sieve estimators
  2. a Lepski procedure for adaptive estimation in sup norm
  3. pointwise and uniform inference on possibly nonlinear functionals

- the first such results for NPIV (or indeed for any ill-posed inverse problem with unknown operator)
- application to inference on consumer surplus in demand estimation and uniform confidence bands