TRANSCRIPT
STATISTICS–THEORY AND PRACTICE
Richard A. Johnson Conference, University of Wisconsin
Revisiting Local Asymptotic Normality (LAN) and Passing on to Local Asymptotically Mixed Normality (LAMN)
G.G. Roussas
University of California, Davis
(formerly, University of Wisconsin, Madison)
May, 2008
1. Basic Problem
On the basis of a random sample, whose probability law depends on a parameter θ, discriminate between two values θ and θ* (θ ≠ θ*).
2. Some Notation
X_0, X_1, ..., X_n i.i.d., (𝒳, 𝒜, P_θ), θ ∈ Θ open ⊆ ℝ^k, k ≥ 1,
𝒜_n = σ(X_0, ..., X_n), P_{n,θ} = P_θ | 𝒜_n,
P_{0,θ} ≈ P_{0,θ*}, θ, θ* ∈ Θ (θ ≠ θ*),

q(X_0; θ, θ*) = dP_{0,θ*}/dP_{0,θ},

ϕ_j(θ, θ*) = ϕ(X_j; θ, θ*) = [q(X_j; θ, θ*)]^{1/2},

L_n(θ, θ*) = dP_{n,θ*}/dP_{n,θ} = ∏_{j=0}^n ϕ_j^2(θ, θ*),

Λ_n(θ, θ*) = log L_n(θ, θ*);

restrict to θ*'s close to θ; i.e., θ_n = θ + h_n/√n,
h_n → h ∈ ℝ^k (all limits taken as n → ∞).
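The objects above can be made concrete with a minimal sketch for the N(θ, 1) family (an illustration, not part of the talk; the family, sample size, and parameter values are assumptions of the example):

```python
import math
import random

# Illustration (assumed N(theta, 1) family): q(x; theta, theta*) is the density
# ratio dP_{0,theta*}/dP_{0,theta}, phi_j = q(X_j)^{1/2}, L_n = prod phi_j^2,
# and Lambda_n = log L_n.

def q(x, theta, theta_star):
    """Density ratio of N(theta*, 1) to N(theta, 1) at x."""
    return math.exp(-0.5 * (x - theta_star) ** 2 + 0.5 * (x - theta) ** 2)

def log_likelihood_ratio(xs, theta, theta_star):
    """Lambda_n(theta, theta*) = sum_j log phi_j^2 = sum_j log q(X_j)."""
    return sum(math.log(q(x, theta, theta_star)) for x in xs)

random.seed(0)
theta, h, n = 1.0, 0.5, 400
theta_n = theta + h / math.sqrt(n)        # local alternative theta + h/sqrt(n)
xs = [random.gauss(theta, 1.0) for _ in range(n + 1)]   # X_0, ..., X_n
lam = log_likelihood_ratio(xs, theta, theta_n)

# For this family Lambda_n has the closed form
# (theta_n - theta) * sum_j (X_j - theta) - (n + 1)/2 * (theta_n - theta)^2.
delta = theta_n - theta
closed = delta * sum(x - theta for x in xs) - 0.5 * (n + 1) * delta ** 2
```

The agreement of `lam` with `closed` is just the algebra of normal densities; for general families no closed form is available and the q.m. framework below takes over.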
3. Basic Assumption
ϕ_0(θ, θ*) is differentiable in q.m. with respect to θ* at θ – under P_θ – with q.m. derivative ϕ̇_0(θ) (a k × 1 random vector). Set

L_n(θ, θ_n) = dP_{n,θ_n}/dP_{n,θ}, Λ_n(θ, θ_n) = log L_n(θ, θ_n),

Δ_n(θ) = (2/√n) ∑_{j=0}^n ϕ̇_j(θ),

Γ(θ): h′Γ(θ)h = 4E_θ[h′ϕ̇_0(θ)]^2, h ∈ ℝ^k,

A(h, θ) = (1/2) h′Γ(θ)h.
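For the N(θ, 1) family the q.m. derivative and Γ(θ) can be computed explicitly (an illustrative computation, not in the talk):

```latex
\begin{aligned}
q(x;\theta,\theta^*) &= \exp\{(\theta^*-\theta)(x-\theta) - \tfrac12(\theta^*-\theta)^2\},\\
\varphi(x;\theta,\theta^*) &= q^{1/2}
  = \exp\{\tfrac12(\theta^*-\theta)(x-\theta) - \tfrac14(\theta^*-\theta)^2\},\\
\dot\varphi_0(\theta) &= \partial_{\theta^*}\,\varphi(X_0;\theta,\theta^*)\big|_{\theta^*=\theta}
  = \tfrac12\,(X_0-\theta),\\
\Gamma(\theta) &= 4E_\theta[\dot\varphi_0(\theta)]^2 = E_\theta(X_0-\theta)^2 = 1
  \quad(\text{the Fisher information}),\\
A(h,\theta) &= \tfrac12\,h^2\Gamma(\theta) = \tfrac12\,h^2 .
\end{aligned}
```

So in this example Δ_n(θ) = (2/√n)∑ϕ̇_j(θ) = n^{-1/2}∑_{j=0}^n (X_j − θ), which will be used in the sketches below.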
4. First Installment of Results
Theorem 1: Λ_n(θ, θ_n) − h′Δ_n(θ) → −A(h, θ) in P_{n,θ}-probability.

Theorem 2: ℒ[Δ_n(θ) | P_{n,θ}] ⇒ N(0, Γ(θ)).

Theorem 3: ℒ[Λ_n(θ, θ_n) | P_{n,θ}] ⇒ N(−(1/2) h′Γ(θ)h, h′Γ(θ)h).
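Theorem 2 can be checked numerically; a minimal Monte Carlo sketch for the N(θ, 1) family (an illustration; there Γ(θ) = 1 and Δ_n(θ) = n^{-1/2}∑(X_j − θ)):

```python
import math
import random
import statistics

# Monte Carlo sketch (assumed N(theta, 1) family): phi_dot_j = (X_j - theta)/2,
# so Delta_n(theta) = (2/sqrt(n)) * sum_j phi_dot_j = (1/sqrt(n)) * sum_j (X_j - theta).
# Theorem 2: L[Delta_n | P_{n,theta}] => N(0, Gamma(theta)) with Gamma = 1 here.

random.seed(1)
theta, n, reps = 0.3, 400, 2000
samples = []
for _ in range(reps):
    xs = [random.gauss(theta, 1.0) for _ in range(n + 1)]   # X_0, ..., X_n
    samples.append(sum(x - theta for x in xs) / math.sqrt(n))

m = statistics.mean(samples)        # should be near 0
v = statistics.variance(samples)    # should be near Gamma(theta) = 1
```

With these draws the empirical mean and variance of Δ_n(θ) land close to 0 and 1, as Theorem 2 predicts.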
5. Reminder
{P_n} and {Q_n}, defined on (𝒳_n, 𝒜_n), are contiguous if, whenever P_n(A_n) → 0, A_n ∈ 𝒜_n, then Q_n(A_n) → 0, and vice versa.
6. Second Installment of Results
Theorem 4: Λ_n(θ, θ_n) − h′Δ_n(θ) → −A(h, θ) in P_{n,θ_n}-probability.

Theorem 5: ℒ[Δ_n(θ) | P_{n,θ_n}] ⇒ N(Γ(θ)h, Γ(θ)).

Theorem 6: ℒ[Λ_n(θ, θ_n) | P_{n,θ_n}] ⇒ N((1/2) h′Γ(θ)h, h′Γ(θ)h).
7. Loose Interpretation of Theorem 1
Λ_n(θ, θ_n) ≈ h′Δ_n(θ) − A(h, θ)

or

L_n(θ, θ_n) ≈ e^{h′Δ_n(θ) − A(h,θ)}.
8. Precise Formulation of Above Result
Theorem 7: There exists a (suitably) truncated version Δ*_n(θ) of Δ_n(θ) such that:

E_θ e^{h′Δ*_n(θ)} =: e^{B_n(h)} < ∞,

P_{n,θ}[Δ*_n(θ) ≠ Δ_n(θ)] → 0, P_{n,θ_n}[Δ*_n(θ) ≠ Δ_n(θ)] → 0,

and if

R_{n,h}(A) = e^{−B_n(h)} ∫_A e^{h′Δ*_n(θ)} dP_{n,θ}, A ∈ 𝒜_n,

(so that dR_{n,h}/dP_{n,θ} = e^{h′Δ*_n(θ) − B_n(h)}, h ∈ ℝ^k),

then

‖P_{n,θ_n} − R_{n,h_n}‖ → 0, or

sup{‖P_{n,θ+h/√n} − R_{n,h}‖ ; h ∈ B bounded ⊂ ℝ^k} → 0.
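In the N(θ, 1) illustration the approximating exponential family can be written down exactly, and no truncation is needed (a worked example, not in the talk):

```latex
\begin{aligned}
\Delta_n(\theta) &= n^{-1/2}\textstyle\sum_{j=0}^n (X_j-\theta), \qquad
E_\theta\, e^{h\Delta_n(\theta)} = e^{h^2(n+1)/2n} =: e^{B_n(h)},\\
\frac{dR_{n,h}}{dP_{n,\theta}} &= e^{h\Delta_n(\theta)-B_n(h)} .
\end{aligned}
```

A direct computation with normal densities gives dP_{n,θ_n}/dP_{n,θ} = exp{hΔ_n(θ) − h²(n+1)/2n} for θ_n = θ + h/√n, so here R_{n,h} coincides with P_{n,θ_n} and ‖P_{n,θ_n} − R_{n,h}‖ = 0 for every n; Theorem 7 asserts that this picture holds asymptotically in general.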
9. Statistical Significance of Theorem 7
Exploitation of the local approximation (in the L_1-norm, or total variation norm) of the given family of probability measures by an exponential family of probability measures.
10. Statistical Applications of the Theorems
(i) Θ ⊆ R
(a) For hypotheses (alternatives) for which there are UMP tests in the exponential family, we can construct tests ϕ_n – based on Δ_n(θ_j), θ_j, j = 0, 1, 2 boundary points – which are AUMP (i.e., lim sup [sup (E_θ ω_n − E_θ ϕ_n; θ ∈ A)] ≤ 0) among all tests ω_n such that lim sup [sup (E_θ ω_n; θ ∈ H)] ≤ α. For example: H: θ ≤ θ_0, A: θ > θ_0 at level α,

ϕ_n = ϕ_n(Δ_n(θ_0)) =
  1,   if Δ_n(θ_0) > c_n,
  γ_n, if Δ_n(θ_0) = c_n,
  0,   if Δ_n(θ_0) < c_n,

with c_n, γ_n determined by E_{θ_0} ϕ_n = α.
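A minimal sketch of the one-sided test, again assuming the N(θ, 1) family (Γ = 1 there; Δ_n(θ_0) is continuously distributed, so the randomization γ_n plays no role and c_n may be taken as the upper α-quantile of N(0, Γ)):

```python
import math
import random
from statistics import NormalDist

# Sketch (assumed N(theta,1) family, Gamma = 1): test H: theta <= theta0 vs
# A: theta > theta0 at level alpha, rejecting when Delta_n(theta0) >= c_n.
# Delta_n is continuous, so c_n can be taken as the upper alpha-quantile of N(0, 1).

def delta_n(xs, theta0):
    n = len(xs) - 1
    return sum(x - theta0 for x in xs) / math.sqrt(n)

random.seed(2)
alpha, theta0, n, reps = 0.05, 0.0, 200, 4000
c = NormalDist().inv_cdf(1 - alpha)     # c_n -> upper alpha-quantile
rejections = 0
for _ in range(reps):
    xs = [random.gauss(theta0, 1.0) for _ in range(n + 1)]   # data from H
    if delta_n(xs, theta0) >= c:
        rejections += 1
level = rejections / reps               # empirical size, near alpha
```

The empirical rejection rate under θ_0 settles near α, which is the size requirement E_{θ_0}ϕ_n = α in the limit.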
(b) For hypotheses (alternatives) for which there are UMPU tests in the exponential family, we can construct tests ϕ_n – based on Δ_n(θ_j), θ_j, j = 0, 1, 2 boundary points – which are AUMPU (i.e., lim sup [sup (E_θ ω_n − E_θ ϕ_n; θ ∈ A)] ≤ 0) among all asymptotically unbiased tests ω_n (i.e., lim inf [inf (E_θ ω_n; θ ∈ A)] ≥ α). For example: H: θ = θ_0, A: θ ≠ θ_0 at level α,

ϕ_n = ϕ_n(Δ_n(θ_0)) =
  1, if Δ_n(θ_0) < a_n or Δ_n(θ_0) > b_n,
  0, if a_n ≤ Δ_n(θ_0) ≤ b_n,

(a_n < b_n) with a_n → −ξ_{α/2}, b_n → ξ_{α/2} (ξ_p = upper p-th quantile of N(0, Γ(θ_0))) is AUMPU.

Remark: Actually, the tests are locally AUMP or AUMPU, but they become globally so under the additional assumption:

Δ_n(θ_j) → ±∞ in P_{n,θ_n}-probability if √n(θ_n − θ_j) → ±∞, j = 0, 1, 2.
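Theorem 5 makes the local power of such tests computable. A sketch for the two-sided test in the N(θ, 1) family (Γ = 1; the values of h, n, α are assumptions of the illustration):

```python
import math
import random
from statistics import NormalDist

# Sketch (assumed N(theta,1) family, Gamma = 1): under theta_n = theta0 + h/sqrt(n),
# Theorem 5 gives Delta_n(theta0) => N(Gamma*h, Gamma) = N(h, 1), so the two-sided
# test rejecting when |Delta_n(theta0)| > b (b the upper alpha/2-quantile) has
# limiting power Phi(h - b) + Phi(-h - b).

nd = NormalDist()
alpha, theta0, h, n, reps = 0.05, 0.0, 2.0, 200, 4000
b = nd.inv_cdf(1 - alpha / 2)
theta_n = theta0 + h / math.sqrt(n)

random.seed(3)
rejections = 0
for _ in range(reps):
    xs = [random.gauss(theta_n, 1.0) for _ in range(n + 1)]  # data from theta_n
    d = sum(x - theta0 for x in xs) / math.sqrt(n)
    if abs(d) > b:
        rejections += 1

power = rejections / reps
limit_power = nd.cdf(h - b) + nd.cdf(-h - b)   # Theorem 5 prediction
```

The simulated power under the contiguous alternative matches the normal-limit formula, which is exactly the computation Theorems 4–6 license.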
(ii) Θ ⊆ Rk, k ≥ 1
H: θ = θ_0, A: θ ≠ θ_0 at level α.

Simple tests ϕ_n are constructed – based on Δ_n(θ_0) – which enjoy Wald-type asymptotically optimal properties:

Weighted average power over certain surfaces is largest – within a class of competing tests.

The sup of the difference between the sup and the inf of the power over certain surfaces → 0.

The sup, over certain surfaces, of the difference between the envelope power and the power of the test ϕ_n, compared with the corresponding sup for any competing test, yields a difference whose lim sup is ≤ 0.
11. Set θ_n = θ + h/√n. Then the estimate T_n is regular if

ℒ[√n(T_n − θ_n) | P_{n,θ_n}] ⇒ ℒ(θ),

a probability measure (not depending on h).

Theorem 8: For regular estimates,

ℒ(θ) = N(0, Γ^{-1}(θ)) ∗ ℒ*(θ),

ℒ*(θ) a specific probability measure.
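In the N(θ, 1) illustration (not in the talk), the sample mean X̄_n = (n+1)^{-1}∑_{j=0}^n X_j shows the convolution theorem at its sharpest:

```latex
\mathcal{L}\big[\sqrt{n}\,(\bar X_n - \theta_n)\,\big|\,P_{n,\theta_n}\big]
\Rightarrow N(0,1) = N\big(0,\Gamma^{-1}(\theta)\big),
\qquad \text{so } \mathcal{L}(\theta) = N\big(0,\Gamma^{-1}(\theta)\big) * \delta_0 .
```

The limit does not depend on h, so X̄_n is regular, and its ℒ*(θ) is degenerate at 0; for any other regular estimate the extra convolution factor ℒ*(θ) can only spread the limit law out, which is the sense in which N(0, Γ^{-1}(θ)) is the best attainable limit.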
12. Application to Asymptotic Efficiency of
Estimates via Theorem 8: The Weiss-Wolfowitz
Approach
Θ ⊆ R
P_{n,θ}[√n(T_n − θ) ≤ x] → F_T(x; θ), a d.f., continuously in Θ for each fixed x ∈ ℝ and continuously on ℝ for each θ ∈ Θ. Let ℓ_T(θ) and u_T(θ) be the "smallest" and the "largest" median of the (continuous) d.f. F_T(·; θ). Then

lim P_{n,θ}(θ − t_1/√n + ℓ_T(θ)/√n ≤ T_n ≤ θ + t_2/√n + u_T(θ)/√n) ≤ B(θ; t_1, t_2),

B(θ; t_1, t_2) = Φ[t_2 σ(θ)] − Φ[−t_1 σ(θ)], t_1, t_2 > 0,

Φ is the d.f. of N(0, σ^2(θ)), σ^2(θ) = 4E_θ[ϕ̇_0(θ)]^2.
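A sketch of the bound for the N(θ, 1) family (an assumed illustration: σ(θ) = 1, and for T_n = X̄_n the medians ℓ_T(θ) = u_T(θ) = 0, so the bound should be attained):

```python
import math
import random
from statistics import NormalDist

# Sketch (assumed N(theta,1) family, sigma(theta) = 1): for the sample mean,
# l_T(theta) = u_T(theta) = 0, and the Weiss-Wolfowitz bound
#   B(theta; t1, t2) = Phi(t2 * sigma) - Phi(-t1 * sigma)
# should be attained: P_{n,theta}(theta - t1/sqrt(n) <= T_n <= theta + t2/sqrt(n)) -> B.

random.seed(4)
theta, t1, t2, n, reps = 1.0, 1.0, 1.5, 100, 4000
hits = 0
for _ in range(reps):
    xs = [random.gauss(theta, 1.0) for _ in range(n)]
    tn = sum(xs) / n                    # T_n = sample mean
    if theta - t1 / math.sqrt(n) <= tn <= theta + t2 / math.sqrt(n):
        hits += 1

coverage = hits / reps
bound = NormalDist().cdf(t2) - NormalDist().cdf(-t1)   # B with sigma = 1
```

The empirical concentration of X̄_n in the shrinking interval matches B(θ; t_1, t_2); any other estimate satisfying the assumptions can only do worse in the limit.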
Θ ⊆ Rk, k ≥ 1
Under similar assumptions as above,

lim P_{n,θ}[−t_1 + ℓ_T(θ, h) ≤ √n h′(T_n − θ) ≤ t_2 + u_T(θ, h)] ≤ Φ[t_2 σ^{-1}(θ, h)] − Φ[−t_1 σ^{-1}(θ, h)],

t_1, t_2 > 0, Φ the d.f. of N(0, σ^2(θ, h)), σ^2(θ, h) = h′Γ^{-1}(θ)h.
Θ ⊆ R
ℒ[√n(T_n − θ) | P_{n,θ}] ⇒ ℒ_{T,θ}, a probability measure having 0 as its median. Then

lim sup P_{n,θ}(θ − t_1/√n ≤ T_n ≤ θ + t_2/√n) ≤ B(θ; t_1, t_2), t_1, t_2 > 0.

Consider median-unbiased T_n's; i.e., P_θ(T_n ≥ θ) ≥ 1/2, P_θ(T_n ≤ θ) ≥ 1/2. Then

lim sup P_θ(θ − t_1/√n ≤ T_n ≤ θ + t_2/√n) ≤ B(θ; t_1, t_2), t_1, t_2 > 0.
13. Application to Asymptotic Efficiency of
Estimates: The Classical Approach
(i) Θ ⊆ R
Consider estimates T_n such that

P_θ[√n(T_n − θ) ≤ x] → Φ_T(x; θ)

(or continuously so in Θ), where Φ_T(·; θ) is the d.f. of N(0, σ_T^2(θ)). Then σ_T^2(θ) ≥ 1/σ^2(θ) a.e.[ℓ] (or pointwise, respectively).

Also, lim inf E_θ[n(T_n − θ)^2] ≥ 1/σ^2(θ) a.e.[ℓ] (or pointwise, respectively).

(ii) Θ ⊆ ℝ^k, k ≥ 1

Consider estimates T_n such that

P_θ[√n(T_n − θ) ≤ x] ⇒ Φ^{(k)}(x; C_T(θ))

continuously in Θ, where Φ^{(k)}(·; C_T(θ)) is the d.f. of N(0, C_T(θ)) with C_T(θ) positive definite. Then, with Γ(θ) = 4E_θ[ϕ̇_0(θ)ϕ̇_0′(θ)], C_T(θ) − Γ^{-1}(θ) is positive semi-definite a.e.[ℓ_k]. Also, C_T is continuous in Θ. Furthermore, if Γ is also continuous, then C_T(θ) − Γ^{-1}(θ) is positive semi-definite for all θ ∈ Θ.
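The variance bound can be seen numerically; a sketch for the N(θ, 1) family (an assumed illustration: 1/σ^2(θ) = 1, the sample mean attains the bound, the sample median has asymptotic variance π/2 > 1):

```python
import math
import random
import statistics

# Sketch (assumed N(theta,1) family, 1/sigma^2(theta) = 1): the sample mean
# attains the bound sigma_T^2 >= 1/sigma^2(theta) = 1, while the sample median
# has asymptotic variance pi/2 > 1, illustrating the inequality.

random.seed(5)
theta, n, reps = 0.0, 201, 2000          # odd n: median is a single order statistic
mean_errs, med_errs = [], []
for _ in range(reps):
    xs = [random.gauss(theta, 1.0) for _ in range(n)]
    mean_errs.append(math.sqrt(n) * (statistics.fmean(xs) - theta))
    med_errs.append(math.sqrt(n) * (statistics.median(xs) - theta))

v_mean = statistics.variance(mean_errs)  # near 1 = 1/sigma^2(theta)
v_med = statistics.variance(med_errs)    # near pi/2, strictly above the bound
```

Both estimates are asymptotically normal, but only the mean's limiting variance meets the lower bound; the median's exceeds it by the factor π/2.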
14. Generalizations
Most of the above results have been generalized to:

(i) the i.n.n.i.d. case
(ii) Markov processes
(iii) Semi-Markov processes
(iv) General stochastic processes, which need not even be stationary
(v) Continuous time Markov processes
(vi) Lévy processes of the discontinuous type
(vii) Continuous time diffusions and Gaussian processes with known covariance.

Also,

(viii) Certain generalizations when the sample size is a stopping time.
15. Case Where the Underlying Family of Probability Measures is not LAN but Rather LAMN
Refer to Theorem 1:

Λ_n(θ, θ_n) − [h′Δ_n(θ) − A(h, θ)] → 0 in P_{n,θ}-probability,

A(h, θ) non-random. Instead, it may happen that

Λ_n(θ, θ_n) − [h′Δ_n(θ) − (1/2) h′T_n(θ)h] → 0 in P_{n,θ}-probability,

T_n(θ) k × k 𝒜_n-measurable random matrices,

Δ_n(θ) = T_n^{1/2}(θ) W_n(θ),

W_n(θ) k-dimensional 𝒜_n-measurable random vectors,

ℒ{[W_n(θ), T_n(θ)] | P_{n,θ}} ⇒ ℒ{[W, T(θ)] | P_θ}, W ∼ N(0, I_k) independent of T(θ), or

ℒ{[Δ_n(θ), T_n(θ)] | P_{n,θ}} ⇒ ℒ{[Δ(θ), T(θ)] | P_θ}, Δ(θ) = T^{1/2}(θ)W. It follows that, in the limit under the local alternatives θ_n,

ℒ[Δ(θ) | T(θ)] = N(T(θ)h, T(θ)).
Thus, Theorem 1 becomes here

Theorem 1′: Λ_n(θ, θ_n) − h′Δ_n(θ) + (1/2) h′T_n(θ)h → 0 in P_{n,θ}-probability.

The underlying family of probability measures is referred to as Locally Asymptotically Mixed Normal (LAMN).

From Theorem 1′,

L_n(θ, θ_n) ≈ e^{h′Δ_n(θ) − (1/2) h′T_n(θ)h}.

The expression on the RHS above is referred to as a curved exponential family.

From the fact that, in the limit under the local alternatives θ_n, ℒ[Δ(θ) | T(θ)] = N(T(θ)h, T(θ)), it follows that, in the limit, inference can be carried out conditionally as in a normal distribution, and then one reverts to the original family of probability measures.
Examples
(i) Explosive autoregressive process of first order

X_j = θX_{j−1} + ε_j, X_0 = 0, |θ| > 1,
ε_j, j ≥ 1, i.i.d. ∼ N(0, 1).

Here

Δ_n(θ) = ((θ^2 − 1)/θ^n) ∑_{j=1}^n X_{j−1}ε_j,

T_n(θ) = ((θ^2 − 1)^2/θ^{2n}) ∑_{j=1}^n X_{j−1}^2,

ℒ[T(θ) | P_θ] = χ_1^2, W ∼ N(0, 1), T^{1/2}(θ) ∼ N(0, 1), T^{1/2}(θ) and W independent.
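A simulation sketch of the explosive AR(1) case. The normalization of T_n below is an assumption of this illustration, chosen to be consistent with the stated χ²_1 limit:

```python
import math
import random
import statistics

# Simulation sketch for explosive AR(1): X_j = theta * X_{j-1} + eps_j, X_0 = 0,
# |theta| > 1, eps_j i.i.d. N(0,1). The normalization below is an assumption of
# this illustration, chosen so that T_n(theta) has the chi^2_1 limit:
#   T_n(theta) = ((theta^2 - 1)^2 / theta^(2n)) * sum_{j=1}^n X_{j-1}^2.

random.seed(6)
theta, n, reps = 1.5, 40, 3000
t_samples = []
for _ in range(reps):
    x, s = 0.0, 0.0
    for _ in range(n):
        s += x * x                       # accumulate X_{j-1}^2
        x = theta * x + random.gauss(0.0, 1.0)
    t_samples.append((theta**2 - 1.0) ** 2 / theta ** (2 * n) * s)

m = statistics.mean(t_samples)           # chi^2_1 has mean 1
v = statistics.variance(t_samples)       # chi^2_1 has variance 2
```

The simulated T_n(θ) has mean near 1 and variance near 2, consistent with a χ²_1 limit; the randomness of the limit (rather than a constant Γ(θ)) is exactly what makes the family LAMN rather than LAN.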
(ii) Super-critical Galton–Watson branching process with geometric offspring distribution

Here

Δ_n(θ) = (1/θ^{(n−1)/2}) ∑_{j=1}^n (X_j − θX_{j−1}),

T_n(θ) = ((θ − 1)/θ^n) ∑_{j=1}^n X_{j−1},

T(θ) exponentially distributed with mean 1, W ∼ N(0, 1), T^{1/2}(θ) and W independent.

For LAMN families, some results analogous to those associated with LAN families have been established.
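A simulation sketch of the branching-process case. The specific offspring parametrization (geometric on {0, 1, 2, ...} with mean θ, started from X_0 = 1) is an assumption of this illustration, and only the mean of T_n is checked against the stated exponential (mean 1) limit:

```python
import math
import random
import statistics

# Simulation sketch for the Galton-Watson example. Assumptions of this
# illustration: offspring distribution geometric on {0, 1, 2, ...} with
# P(k) = (1 - c) * c**k and mean c/(1 - c) = theta; X_0 = 1. We check that
#   T_n(theta) = ((theta - 1)/theta^n) * sum_{j=1}^n X_{j-1}
# has mean approaching 1, matching the exponential (mean 1) limit in the mean.

def geometric(c):
    """Number of failures before the first success: P(k) = (1 - c) * c**k."""
    u = 1.0 - random.random()            # u in (0, 1]
    return int(math.floor(math.log(u) / math.log(c)))

random.seed(7)
theta, n, reps = 2.0, 8, 1500
c = theta / (1.0 + theta)                # mean c/(1 - c) = theta
t_samples = []
for _ in range(reps):
    x, s = 1, 0
    for _ in range(n):
        s += x                           # accumulate X_{j-1}
        x = sum(geometric(c) for _ in range(x))   # next generation size
    t_samples.append((theta - 1.0) / theta**n * s)

m = statistics.mean(t_samples)           # E[T_n] = 1 - theta**(-n), near 1
```

Here E[T_n(θ)] = 1 − θ^{-n} exactly, so the empirical mean sits near 1; note that paths hitting extinction contribute small values of T_n, again a genuinely random limiting "information" T(θ).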
Some Selected References
1. Akritas, M.G. (1978): Contiguity of Probability Measures Associated with Continuous Time Stochastic Processes. Ph.D. Thesis, Department of Statistics, University of Wisconsin, Madison.
2. Akritas, M.G. and Roussas, G.G. (1980): Asymptotic inference in continuous semi-Markov processes. Scandinavian Journal of Statistics, Vol. 7, 73–79.
3. Bhattacharya, D. and Roussas, G.G. (2001): Exponential approximation for randomly stopped locally asymptotically mixture of normal experiments. Stochastic Modeling and Applications, Vol. 4, No. 2, 56–71.
4. Jeganathan, P. (1982): On the asymptotic theory of estimation when the limit of the log-likelihood ratios is mixed normal. Sankhyā, Vol. 44, Ser. A, 173–212.
5. Jeganathan, P. (1995): Some aspects of asymptotic theory with applications to time series models. Econometric Theory, Vol. 11, 818–887.
6. Johnson, R.A. and Roussas, G.G. (1969): Asymptotically most powerful tests in Markov processes. Annals of Mathematical Statistics, Vol. 40, 1207–1215.
7. Johnson, R.A. and Roussas, G.G. (1970): Asymptotically optimal tests in Markov processes. Annals of Mathematical Statistics, Vol. 41, 918–938.
8. Johnson, R.A. and Roussas, G.G. (1972): Applications of contiguity to multiparameter hypothesis testing. Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, 195–226.
9. Le Cam, L. (1960): Locally asymptotically normal families of distributions. University of California Publications in Statistics, Vol. 3, 37–98.
10. Le Cam, L. (1986): Asymptotic Methods in Statistical Decision Theory. Springer-Verlag, New York.
11. Le Cam, L. and Yang, G.L. (2000): Asymptotics in Statistics: Some Basic Concepts (2nd edition). Springer Series in Statistics, Springer, New York.
12. Lind, B. and Roussas, G.G. (1972): A remark on quadratic mean differentiability. Annals of Mathematical Statistics, Vol. 43, 1030–1034.
13. Roussas, G.G. (1972): Contiguity of Probability Measures: Some Applications in Statistics. Cambridge University Press, Cambridge.
14. Roussas, G.G. (1979): Asymptotic distribution of the log-likelihood function for stochastic processes. Z. Wahrscheinlichkeitstheorie und Verwandte Gebiete, Vol. 47, 31–46.
15. Roussas, G.G. (2005): An Introduction to Measure-Theoretic Probability. Elsevier Academic Press, Burlington, Massachusetts.
16. Roussas, G.G. and Soms, A. (1972): On the exponential approximation of a family of probability measures and a representation theorem of Hájek–Inagaki. Annals of the Institute of Statistical Mathematics, Vol. 25, 27–39.
17. Roussas, G.G. and Bhattacharya, D. (1999): Asymptotic behavior of the log-likelihood function in stochastic processes when based on a random number of random variables. In Semi-Markov Models and Applications, Jacques Janssen and Nikolaos Limnios (eds), 119–147, Kluwer Academic Publishers, Dordrecht.
18. Roussas, G.G. and Bhattacharya, D. (2002): Exponential approximation of distributions. Teoriya Imovirnostey ta Matematichna Statystika, Vol. 66, 109–120. Also in Theory of Probability and Mathematical Statistics, Vol. 66, 119–132 (2003) (English version).
19. Roussas, G.G. and Bhattacharya, D. (2008): Hájek–Inagaki representation theorem, under a general stochastic processes framework, based on stopping times. Statistics & Probability Letters.
20. Van der Vaart, A.W. (1998): Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge University Press.
21. Wald, A. (1943): Tests of statistical hypotheses concerning several parameters when the number of observations is large. Transactions of the American Mathematical Society, Vol. 54, 426–482.