ON ASYMPTOTIC LIKELIHOOD INFERENCE: REMOVING P-VALUE SINGULARITIES
Rongcai Li
A thesis submitted in conformity with the requirements
for the Degree of Doctor of Philosophy
Graduate Department of Statistics
University of Toronto
© Copyright by Rongcai Li 2001
National Library of Canada / Bibliothèque nationale du Canada
Acquisitions and Bibliographic Services

The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.

The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
On Asymptotic Likelihood Inference: Removing P-value Singularities
Rongcai Li, PhD 2001
Department of Statistics
University of Toronto
Abstract
Recent likelihood asymptotics initiated by Barndorff-Nielsen (1986) and Lugannani & Rice (1980) has developed two combining formulas, often called third order formulas, for computing p-values and confidence coefficients with high accuracy. Fraser & Reid (1995) and many others have extended these results to very general contexts and extensively explored their applications to various statistical problems. These two formulas, however, have certain singularities near the maximum likelihood value. In this thesis we develop a theory for removing the singularities in a general statistical context using the tangent exponential model developed by Cakmak et al. (1998) and Abebe et al. (1995). In doing so, the asymptotic expansions of the signed likelihood ratio statistic and the standardized measure of departure are first obtained in terms of the standardized third and fourth derivatives of the log density and are then used to form a bridge for the p-value functions in the neighborhood of the maximum likelihood value. The concept of a nonnormality measure is also developed and its implications for the singularity problems are discussed. In addition, its expressions for two types of tangent exponential model are related.

We have also developed several alternative combining formulas for obtaining highly accurate p-values and confidence intervals. Unlike the Lugannani & Rice formula, these formulas are generally continuous at the extremes. Numerical studies are used to determine and compare the accuracy and reliability of the different combining formulas. These studies show that these new formulas outperform the existing ones in many cases.
Acknowledgement
First and foremost, I would like to express my sincere gratitude to my supervisor, Professor D.A.S. Fraser, for his guidance, encouragement, inspiration and patience through the period of my Ph.D. program here at the University of Toronto. Without his invaluable supervision and support this thesis could not have been completed.

I am grateful to Professor Nancy Reid for her advice and support through my PhD study.

I also highly appreciate the kind assistance from Professor A.C.M. Wong. My conversations with him have been so helpful.

The financial support provided by the Ontario Government and the Department of Statistics of the University of Toronto is acknowledged and is highly appreciated.

Last but not least, I would like to thank my parents and wife for their love and continuing support through my PhD study. This thesis is dedicated to them and to my lovely daughter Jiefei and son Ben.
Contents
1 A Review of Theory on Asymptotic Inference
1.1 Likelihood and first order asymptotic theory
1.2 Saddlepoint approximations
1.3 The p*-formula
1.4 Tail probability approximations
1.5 Overview and Outline
2 Likelihood Asymptotics: On Removing Singularities
2.1 Introduction
2.2 Asymptotic exponential model
2.3 Departures from standard normality: scalar case
2.4 Bridging the singularity: scalar case
2.5 Bridging the singularity: scalar interest
2.6 Graphical bridging of the singularity
2.7 Interrelating two nonnormality measures
2.8 Discontinuity at the extremes
3 Approximating Tail Probabilities: the Gamma Combiner
3.1 Inference in the gamma model
3.2 An alternative third order combining formula: the gamma combiner
3.3 Some numerical studies
3.4 Inference for the noncentral chi-squared distribution
3.5 Theoretical work on the Gamma Combiner
4 Approximating Tail Probabilities: the Student t Combiner
4.1 Why the Student t combiner?
4.2 Inference in the location Student model
4.3 The Student t combiner
5 Conclusion
Bibliography
List of Figures
2.1 The gamma model Γ⁻¹(θ)y^{θ−1}e^{−y} with y⁰ = 10. The asymptotic approximations Φ_LR(θ) and Φ_BN(θ) for the p-value function p(θ) for testing θ are plotted against θ. The bridge p_B(θ) at the maximum likelihood value is superimposed on the exact p(θ).
2.2 For the gamma model with mean μ and shape β, the p-value approximations Φ_LR(μ) and Φ_BN(μ) for testing μ are plotted for a sample of 20. The aberrant behavior at the maximum likelihood value is successfully bridged using (2.32) together with a graphical d₂ determined from Figure 2.3.
2.3 For the gamma model and data for Figure 2.2, the departure measures d₁ and d₂ are plotted against the signed likelihood ratio r and a bridging straight line is obtained graphically. The measures d₁ and d₂ are so close that they overlap in this figure.
The approximations Φ_LR and Φ_BN for the distribution function of the noncentral chi-squared distribution with 5 degrees of freedom and noncentrality 1. The compression modification Φ_C avoids the negative values found with the Φ_LR approximation.
Approximations to the t distribution by different formulas: the Barndorff-Nielsen formula, the gamma formula, and the signed likelihood ratio statistic.
Logistic model f(y; θ) = e^{y−θ}/(1 + e^{y−θ})² with observation y = 0. Approximations to the significance function by different methods: the gamma formula, the Barndorff-Nielsen formula, and the signed likelihood ratio statistic.
Comparison of approximations to the noncentral χ² with degrees of freedom n = 5 and noncentrality 100. The two Cox & Reid methods are (3.12) and (3.13) respectively.
Comparison of approximations to the noncentral χ² with degrees of freedom n = 10 and noncentrality 25. The two Cox & Reid methods are (3.12) and (3.13) respectively.
List of Tables
3.1 The gamma model f(y; θ, β) with shape β = 0.1. The approximations to the p-values for testing θ = 1 with different observed y⁰.
3.2 Approximations to the Student(2) distribution by the Barndorff-Nielsen formula, the Lugannani & Rice formula and the gamma formula.
3.3 Test in the location logistic model. Approximations to p-values by different methods: the signed likelihood ratio, the Barndorff-Nielsen formula, the gamma formula and the Bartlett correction.
3.4 Approximations to G_{n,δ}(ψ) with n = 2
3.5 Approximations to G_{n,δ}(ψ) with n = 5
3.6 Approximations to G_{n,δ}(ψ) with n = 10
Chapter 1
A Review of Theory on Asymptotic Inference

Parametric statistics is a very important and active branch of statistics. It deals with distribution problems in situations where previous knowledge and conjecture provide us with the density form of the distribution except that some parameters in the distribution are unknown. We are typically interested in estimating parameters and then testing hypotheses on some or all the parameters or obtaining confidence coefficients. In some cases, these problems have a simple exact solution. But in other cases either they do not have an exact solution or the exact solution is too complicated to be used directly. These situations seem to necessitate the use of asymptotic methods. In this chapter we review some background in this area.
1.1 Likelihood and first order asymptotic theory

The study of parametric statistics based on the likelihood function was initiated by Fisher (1922, 1925). The likelihood is a function of the parameter as determined by the data. It can be viewed as summarizing all the information in the data about the parameter in a simple form. One of the various reasons for its current popularity is that it can provide simple and general approximations for sampling distributions of likelihood based quantities. These approximations are associated with various large sample limit results. The simplest primary result is called the Central Limit Theorem, which states that the mean of a sample of n independent and identically distributed variables is asymptotically normally distributed as the sample size n goes to infinity. In more general contexts Central Limit Theorem type results hold as long as the quantity of information supplied by the sample tends to infinity. Increasing the sample size is just one simple way of increasing the quantity of information. Based on these limit theorems, we may justify first order asymptotic theory, which we can summarize as follows.

Assume that y₁, y₂, ..., yₙ is a sample from a statistical model {f(y; θ) : θ ∈ Ω ⊆ R^p} where y is a scalar and θ is a p-dimensional parameter vector. Let ℓ(θ) = ℓ(θ; y₁, ..., yₙ) = Σᵢ₌₁ⁿ log f(yᵢ; θ) denote the log likelihood function and θ̂ = θ̂(y₁, ..., yₙ) denote the maximum likelihood estimate of the full parameter vector. To test the hypothesis

    H₀: θ = θ₀,

one familiar statistic is Wilks' log likelihood ratio statistic

    w(θ₀) = 2{ℓ(θ̂) − ℓ(θ₀)},

which converges in distribution under the hypothesis to the central chi-square distribution with p degrees of freedom, where p is the dimension of θ. See Wilks (1938) for more details.
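As a small numerical illustration, w can be computed directly from the likelihood. The sketch below assumes a hypothetical exponential model with rate θ (the data values and the tested θ₀ are made up for illustration) and computes w(θ₀) together with the corresponding first order p-value from the chi-square distribution with one degree of freedom:

```python
import math

def loglik(theta, data):
    # Exponential(rate theta) log likelihood: n log(theta) - theta * sum(y)
    return len(data) * math.log(theta) - theta * sum(data)

def wilks_w(theta0, data):
    # w(theta0) = 2{l(theta_hat) - l(theta0)}, with MLE theta_hat = n / sum(y)
    theta_hat = len(data) / sum(data)
    return 2.0 * (loglik(theta_hat, data) - loglik(theta0, data))

data = [0.8, 1.3, 0.4, 2.1, 0.9, 1.6, 0.7, 1.2]   # illustrative sample
w = wilks_w(1.0, data)
# For p = 1, Pr(chi2_1 > w) = 2{1 - Phi(sqrt(w))} = 1 - erf(sqrt(w/2))
p_value = 1.0 - math.erf(math.sqrt(w / 2.0))
```

By construction w is nonnegative and vanishes at θ₀ = θ̂; the p-value here is first order accurate only.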
For the same situation, two other statistics are often used, the score statistic and the Wald statistic, which are defined respectively as

    S(θ₀) = U(θ₀)ᵀ i⁻¹(θ₀) U(θ₀),    W(θ₀) = (θ̂ − θ₀)ᵀ i(θ̂) (θ̂ − θ₀),

where U(θ) = ∂ℓ(θ)/∂θ is the score function and i(θ) = E{U(θ)U(θ)ᵀ} is the (expected) Fisher information. These two statistics also converge in distribution to the same chi-square distribution; see Wald (1941, 1943). Both the Wilks and score statistics are invariant to reparameterization of the model and the hypothesis while the Wald statistic is not.

Several asymptotically equivalent versions of these statistics are available due to the asymptotic equivalence of the expected Fisher information and the observed Fisher information j = j(θ̂) = −∂²ℓ(θ)/∂θ∂θᵀ|_{θ=θ̂}: that is, i(θ) in the above expressions can be replaced by j, and the resulting statistics follow the same chi-square distribution.

If θ is a scalar parameter, a square root version of these statistics is sometimes suggested:

    r = sgn(θ̂ − θ₀) w^{1/2}(θ₀),    q = (θ̂ − θ₀) j^{1/2}(θ̂),

where q is the standardized maximum likelihood estimate. It is easily seen that these are asymptotically standard normal.
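To make the scalar versions concrete, the following sketch computes the likelihood root r and the standardized departure q for a hypothetical exponential-rate model (sample size, sum of observations and tested value are all made up), using the observed information j(θ) = n/θ² for that model; in a well-behaved problem the two are of comparable size:

```python
import math

n, s = 10, 12.5              # hypothetical sample size and sum of observations
theta_hat = n / s            # MLE of the exponential rate
theta0 = 1.0                 # tested value

def loglik(theta):
    return n * math.log(theta) - theta * s

w = 2.0 * (loglik(theta_hat) - loglik(theta0))
r = math.copysign(math.sqrt(w), theta_hat - theta0)   # signed root of w
j_hat = n / theta_hat**2                              # observed information at the MLE
q = (theta_hat - theta0) * math.sqrt(j_hat)           # standardized MLE departure
```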
If θ is a vector parameter and we are interested in testing one of its scalar components, the full parameter can sometimes be partitioned as θᵀ = (ψ, λᵀ), where ψ = ψ(θ) is the parameter of interest. In such cases the profile log likelihood function and the constrained maximum likelihood estimate are typically used to express the first order statistics:

    r = sgn(ψ̂ − ψ) [2{ℓ_p(ψ̂) − ℓ_p(ψ)}]^{1/2},    q = (ψ̂ − ψ) / {i^{ψψ}(θ̂)}^{1/2},

where λ̂_ψ is the constrained maximum likelihood estimate obtained by maximizing ℓ(ψ, λ) for fixed ψ, ℓ_p(ψ) = ℓ(ψ, λ̂_ψ) is the profile log likelihood, and i^{ψψ} is the component of the inverse of i(θ) corresponding to the ψ component. For more details see Barndorff-Nielsen and Cox (1994).
These statistics sometimes give fairly similar approximations for practical use. However, it should be kept in mind that they are meant for situations where we have a sufficiently large number of observations from a fairly simple model.

In other situations, they can give very different results for the same problem. In such cases the likelihood ratio statistic is often more reliable than the other two statistics. Such situations suggest the search for improvements to these first order results. In fact, in the past several decades, many efforts have been directed towards improving these results, an area often called higher order approximation theory. The theory not only gives us accurate results in more complicated situations but also gives us confidence in applying the first order theory in simple situations after it is confirmed by the higher order theory (Pierce & Peters 1992, 1994). The higher order theory is the focus of the next three sections.
1.2 Saddlepoint approximations

In order to get a better approximation in likelihood inference, an asymptotic expansion for the density of a statistic of interest is obtained so that the first two or three terms provide an approximation to the density function. It is not always obvious that this approximation will give a better finite-sample approximation to the true density than that based on the first term alone, but in many cases it turns out to be surprisingly accurate.

Let Y₁, Y₂, ..., Yₙ be independent, identically distributed random vectors from a density f(y) on R^p. The moment generating function is denoted by

    M(θ) = E exp(θᵀY)

and the cumulant generating function by

    K(θ) = log M(θ).

The saddlepoint expansion of the density of the mean vector, Ȳ = n⁻¹ Σ Yᵢ, is given by

    f_Ȳ(ȳ) = (n/2π)^{p/2} |K''(θ̂)|^{−1/2} exp[n{K(θ̂) − θ̂ᵀȳ}] {1 + Rₙ}.    (1.1)

The right-hand side of (1.1), excluding the error term, is called the saddlepoint approximation to the density of Ȳ. The value θ̂ = θ̂(ȳ) is called the saddlepoint and is the solution to the saddlepoint equation

    K'(θ̂) = ȳ,

where K'(θ) = ∂K/∂θ and K''(θ) = ∂²K/∂θ∂θᵀ, and |K''(θ)| is the determinant of K''(θ). The remainder Rₙ has an expansion in powers of n⁻¹. The expansion (1.1) was first derived in Daniels (1954) using two different approaches. One approach uses "exponential tilting" (Efron, 1981). It first embeds f(y) in a conjugate exponential family

    f_θ(y) = exp{θᵀy − K(θ)} f(y).

The Edgeworth method is then applied to obtain an expansion for f_Ȳ(ȳ). This technique is attributed to Cramér (1938) and Esscher (1932). The other approach, from which the approximation takes its name, uses the saddlepoint technique from applied mathematics. The density of Ȳ is first expressed as the Fourier inversion integral of its characteristic function and then a Taylor expansion method is used to get the saddlepoint approximation. An excellent review of this was given by Reid (1988). Daniels (1987) also gives a detailed discussion.

For further discussion of the vector form of the saddlepoint expansion, see Barndorff-Nielsen and Cox (1989).

Even though introduced by Daniels in 1954, the saddlepoint approximation was not widely acknowledged until a paper by Barndorff-Nielsen and Cox (1979) discussed it for general statistical application.
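For a quick numerical check of (1.1) in the scalar case, one can take Yᵢ ~ Exponential(1), so that K(s) = −log(1 − s), the saddlepoint is ŝ = 1 − 1/ȳ, and the exact density of Ȳ is Gamma(n, rate n). The sketch below (with a made-up n and a few ȳ values) exhibits the well-known behavior for the gamma case: the relative error of the saddlepoint approximation is constant in ȳ, so renormalization makes the approximation exact:

```python
import math

def saddlepoint_density(ybar, n):
    # K(s) = -log(1 - s); saddlepoint equation K'(s) = 1/(1 - s) = ybar
    s_hat = 1.0 - 1.0 / ybar
    K = -math.log(1.0 - s_hat)            # equals log(ybar)
    K2 = 1.0 / (1.0 - s_hat) ** 2         # K''(s_hat) = ybar**2
    return math.sqrt(n / (2.0 * math.pi * K2)) * math.exp(n * (K - s_hat * ybar))

def exact_density(ybar, n):
    # Ybar ~ Gamma(shape n, rate n)
    return n**n * ybar**(n - 1) * math.exp(-n * ybar) / math.gamma(n)

n = 5
ratios = [saddlepoint_density(y, n) / exact_density(y, n) for y in (0.5, 1.0, 2.0)]
```

The common ratio is the Stirling-series factor 1 + 1/(12n) + ..., independent of ȳ.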
1.3 The p*-formula

In the previous section it was mentioned that the saddlepoint approximation is surprisingly accurate in many contexts. The exponential family provides one of these contexts, and in this case the approximation can be reformulated entirely in terms of the likelihood function.

Let Y₁, Y₂, ..., Yₙ be independent and identically distributed random vectors with density

    f(y; θ) = exp{θᵀy − k(θ)} h(y)

with θ ∈ Ω ⊆ R^p. Denote the sample mean by Ȳ = n⁻¹ Σ Yᵢ; then the log-likelihood can be written as

    ℓ(θ) = nθᵀȳ − nk(θ).

It is easily seen that Ȳ is a minimal sufficient statistic for θ. The maximum likelihood estimate of θ, denoted by θ̂, is defined by k'(θ̂) = ȳ. Note that the transformation from ȳ to θ̂ is one-to-one, so we can rewrite the log likelihood as ℓ(θ; θ̂).

Note that the cumulant generating function of Y is K(t) = k(θ + t) − k(θ). Then the saddlepoint approximation, given by (1.1), leads to the following approximation to the density of θ̂:

    f(θ̂; θ) = (2π)^{−p/2} |j(θ̂)|^{1/2} exp{ℓ(θ; θ̂) − ℓ(θ̂; θ̂)} {1 + O(n⁻¹)},

where j(θ̂) = −∂²ℓ(θ)/∂θ∂θᵀ|_{θ=θ̂} is the observed Fisher information, in this case equal to nk''(θ̂). If we replace (2π)^{−p/2} by a normalizing constant

    c(θ) = [ ∫ |j(θ̂)|^{1/2} exp{ℓ(θ; θ̂) − ℓ(θ̂; θ̂)} dθ̂ ]⁻¹,    (1.4)

the resulting approximation is

    f(θ̂; θ) = c(θ) |j(θ̂)|^{1/2} exp{ℓ(θ; θ̂) − ℓ(θ̂; θ̂)},    (1.5)

and the relative error of (1.5) will be reduced to O(n^{−3/2}) (Durbin, 1980). The right-hand side of (1.5) is a special case of a more general formula, which has come to be known as Barndorff-Nielsen's approximation or the p*-formula, as Barndorff-Nielsen (1980, 1983) showed that (1.5) holds outside the exponential family and also investigated its application there extensively. The same formula continues to provide an approximation to the conditional distribution of the maximum likelihood estimate, conditional on an approximate ancillary statistic a. Approximately ancillary is taken to mean that the distribution of a depends on θ only in terms of O(n⁻¹) or higher, for θ within O(n^{−1/2}) of the true value. We write the likelihood as ℓ(θ) = ℓ(θ; θ̂, a). Then the density of θ̂ is approximated by

    p*(θ̂ | a; θ) = c(θ, a) |j(θ̂)|^{1/2} exp{ℓ(θ; θ̂, a) − ℓ(θ̂; θ̂, a)},

where c(θ, a) is a normalizing constant in a sense similar to (1.4). This formula has an error of order O(n^{−3/2}).

An important feature of this formula is that it gives the exact density in the case of a transformation model.

The accuracy of this approximation has been extensively investigated, particularly by Barndorff-Nielsen (1980, 1983, 1985, 1986). Detailed discussions and reviews are given by Barndorff-Nielsen and Cox (1994) and Reid (1988, 1995).
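The exactness claims can be checked numerically in a one-parameter case. For a hypothetical exponential model with rate θ (sample size and true rate are made up), the density of θ̂ = n/ΣYᵢ is available in closed form, and the unnormalized saddlepoint-type approximation to it reproduces the exact density up to a constant factor, so the renormalized version (1.5) is exact here:

```python
import math

n, theta = 4, 2.0   # hypothetical sample size and true rate

def p_star_unnormalized(theta_hat):
    # (2 pi)^(-1/2) j(theta_hat)^(1/2) exp{l(theta) - l(theta_hat)},
    # with l(t) = n log t - t*s at observed s = n / theta_hat, j(t) = n / t**2
    s = n / theta_hat
    l = lambda t: n * math.log(t) - t * s
    j_hat = n / theta_hat**2
    return math.sqrt(j_hat / (2.0 * math.pi)) * math.exp(l(theta) - l(theta_hat))

def exact_density(theta_hat):
    # theta_hat = n / S with S ~ Gamma(n, rate theta); change of variables
    s = n / theta_hat
    f_s = theta**n * s**(n - 1) * math.exp(-theta * s) / math.gamma(n)
    return f_s * n / theta_hat**2

ratios = [p_star_unnormalized(t) / exact_density(t) for t in (1.0, 2.0, 4.0)]
```

Since the ratio is constant in θ̂, dividing by the normalizing constant (1.4) recovers the exact density.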
1.4 Tail probability approximations

In statistical inference, we are often interested in approximating the cumulative distribution function as opposed to the density function of a statistic, as we need the cumulative distribution function to compute a p-value or confidence coefficient. This section reviews recent results on using the likelihood function to approximate cumulative distribution functions.

Let us first discuss a simple situation where both the data and the full parameter are essentially one dimensional. This discussion will throw some light on more complicated situations.

Assume that Y₁, Y₂, ..., Yₙ are independent and identically distributed from a density f(y; θ) with cumulant generating function K(s) = log E exp(sY). A saddlepoint approximation to the cumulative distribution function of Ȳ was first derived in Lugannani & Rice (1980) and reviewed in Daniels (1987):

    F_Ȳ(ȳ) = Φ(r) + φ(r) (1/r − 1/q),    (1.6)

where Φ and φ are the cumulative distribution function and density function of the standard normal, respectively,

    r = sgn(ŝ) [2n{ŝȳ − K(ŝ)}]^{1/2},    q = ŝ {nK''(ŝ)}^{1/2},

and ŝ is defined by K'(ŝ) = ȳ. The relative error in (1.6) is O(n^{−3/2}).

In the special case of a canonical exponential family model

    f(y; θ) = exp{θy − k(θ)} h(y),

the maximum likelihood estimate θ̂ is a one-to-one function of ȳ. The likelihood formulation of the quantities r and q is

    r = sgn(θ̂ − θ) [2{ℓ(θ̂) − ℓ(θ)}]^{1/2},    (1.9)
    q = (θ̂ − θ) j^{1/2}(θ̂),    (1.10)

where j(θ) = −ℓ''(θ). It can be seen that, at the value θ̂ = θ, the approximation breaks down, as r = q = 0. It should be replaced at that point by its limiting value 1/2 + γ̂/{6(2πn)^{1/2}}, where γ̂ = n⁻¹ℓ'''(θ̂)/{n⁻¹j(θ̂)}^{3/2} (Reid, 1996).
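The following sketch applies (1.6), in its likelihood form (1.9)-(1.10), to a hypothetical exponential-rate model with made-up sample size and observed sum; here the exact tail probability Pr(S ≤ s⁰; θ) is available through the Poisson-gamma identity, so the third order accuracy is directly visible:

```python
import math

n, s0 = 5, 5.0               # hypothetical sample size and observed sum
theta_hat = n / s0           # MLE of the exponential rate

def lr_approx(theta):
    # Lugannani & Rice (1.6) with likelihood-based r (1.9) and q (1.10);
    # undefined at theta = theta_hat, where r = q = 0
    l = lambda t: n * math.log(t) - t * s0
    w = 2.0 * (l(theta_hat) - l(theta))
    r = math.copysign(math.sqrt(w), theta - theta_hat)  # sign chosen for Pr(S <= s0)
    q = (theta - theta_hat) * math.sqrt(n) / theta_hat
    Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    phi = math.exp(-0.5 * r * r) / math.sqrt(2.0 * math.pi)
    return Phi(r) + phi * (1.0 / r - 1.0 / q)

def exact(theta):
    # Pr(S <= s0) = 1 - Pr(Poisson(theta*s0) <= n-1), S ~ Gamma(n, rate theta)
    lam = theta * s0
    return 1.0 - sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(n))

err_hi = abs(lr_approx(2.0) - exact(2.0))
err_lo = abs(lr_approx(0.5) - exact(0.5))
```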
Intriguingly, however, the results are not restricted to distributions with cumulant generating functions, but apply to general statistical models f(y; θ) with suitable smoothness and asymptotic properties.
Denote by F(y; θ) the cumulative distribution function of Y. We consider the approximation

    F(y; θ) = Φ(r) + φ(r)(1/r − 1/q),    (1.11)

with r as in (1.9) but with a q somewhat different from (1.10) that uses a nominal reparameterization φ(θ) specific to the data point y,

    q = {φ(θ̂) − φ(θ)} ĵ^{1/2}_{(φφ)},    (1.12)

where

    φ(θ) = ∂ℓ(θ; y)/∂y |_{y=y⁰},    ĵ_{(φφ)} = j(θ̂) {∂φ(θ)/∂θ |_{θ=θ̂}}⁻².

Fraser (1990) derived the expression in (1.12) by using a tangent exponential model approximation to a general one-dimensional model. An alternative expression for q in (1.12) is

    q = {ℓ_{;θ̂}(θ̂) − ℓ_{;θ̂}(θ)} j(θ̂)^{−1/2},

where ℓ_{;θ̂}(θ) = ∂ℓ(θ; y)/∂θ̂ is a sample space derivative.

This expression for q is due to Barndorff-Nielsen (1988, 1990) and was derived by integrating the p*-formula directly. Actually Barndorff-Nielsen derived a so-called r*-formula

    F(y; θ) = Φ(r*),    (1.15)

where

    r* = r + r⁻¹ log(q/r).

This approximation is accurate to the same order of accuracy as that in (1.11). Their equivalence was examined in detail by Jensen (1992) by expanding r* about r for exponential family models. It also follows from expansions given in Barndorff-Nielsen (1986), as indicated by DiCiccio and Martin (1993).
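For a quick check of the r*-formula (1.15), the sketch below again uses a hypothetical exponential-rate model (made-up sample size and observed sum; exact tail probabilities from the Poisson-gamma identity) and compares Φ(r*) to the exact value at several tested rates:

```python
import math

n, s0 = 5, 5.0               # hypothetical sample size and observed sum
theta_hat = n / s0

def r_star_approx(theta):
    # Barndorff-Nielsen (1.15): Phi(r*), with r* = r + log(q/r)/r
    l = lambda t: n * math.log(t) - t * s0
    r = math.copysign(math.sqrt(2.0 * (l(theta_hat) - l(theta))), theta - theta_hat)
    q = (theta - theta_hat) * math.sqrt(n) / theta_hat
    r_star = r + math.log(q / r) / r
    return 0.5 * (1.0 + math.erf(r_star / math.sqrt(2.0)))

def exact(theta):
    # Pr(S <= s0) = 1 - Pr(Poisson(theta*s0) <= n-1), S ~ Gamma(n, rate theta)
    lam = theta * s0
    return 1.0 - sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(n))

errors = [abs(r_star_approx(t) - exact(t)) for t in (0.5, 1.5, 2.0)]
```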
The approximation given by (1.11) or (1.15) is not only asymptotically third-order accurate but also is found to work very well even for small samples.

We now extend the approximation (1.11) or (1.15) to the vector full parameter case, where we suppose one scalar component is of interest. Denote the p-dimensional parameter θ by (ψ, λᵀ)ᵀ, where ψ = ψ(θ) is the component of interest and λ is the (p − 1)-dimensional nuisance parameter.

The minimal sufficient statistic may or may not have the same dimension as the full parameter. If it has, and we are in an exponential model, we can eliminate the nuisance parameters by conditioning or by marginalization, and if we are in a transformation model, we can eliminate them by marginalizing. For the exponential family case, see Fraser, Reid & Wong (1991) and Skovgaard (1986); for transformation models, see DiCiccio, Field & Fraser (1990).

If it has higher dimension, the general approach is first to reduce its dimension to that of the full parameter by conditioning on an exact or approximate ancillary statistic. Skovgaard (1986) and Fraser & Reid (1995) show that a second order ancillary is sufficient to give a third-order accurate approximation as in (1.11) or (1.15). Barndorff-Nielsen (1986, 1991) and Barndorff-Nielsen & Wood (1998) describe how to construct an ancillary in some situations.
Denote by a an exact or approximate ancillary. The full log-likelihood function ℓ(θ) = ℓ(θ; y) can be written as ℓ(θ; θ̂, a), and the sample space derivative ℓ_{;θ̂}(θ; θ̂, a) is defined by ∂ℓ(θ; θ̂, a)/∂θ̂ for fixed ancillary. The approximation is again given by (1.11) or (1.15) with the following r and q:

    r = sgn(ψ̂ − ψ) [2{ℓ(θ̂) − ℓ(θ̂_ψ)}]^{1/2},    (1.17)

    q = |ℓ_{;θ̂}(θ̂) − ℓ_{;θ̂}(θ̂_ψ) ; ℓ_{λ;θ̂}(θ̂_ψ)| / {|j(θ̂)| |j_{λλ}(θ̂_ψ)|}^{1/2},    (1.18)

where the determinant in the numerator has first row ℓ_{;θ̂}(θ̂) − ℓ_{;θ̂}(θ̂_ψ) and remaining rows ℓ_{λ;θ̂}(θ̂_ψ), the sample space derivative of the score for the nuisance parameter λ.

This approximation depends on the specification of an exact or approximate ancillary statistic a, as the calculation of the quantity q in (1.18) involves a. But in general the choice of an ancillary may not be apparent.

Fortunately, Fraser & Reid (1995) show that the explicit form of an ancillary is not necessary. We only need its tangent directions at the data point for the calculation of the p-value. Fraser, Reid & Wu (1999) develop a general tail probability formula which is the focus of the rest of this section.
Consider a vector response y = (y₁, y₂, ..., yₙ) with independent scalar components and density function f(y; θ). Assume that the log-likelihood function ℓ(θ; y) is O(n) in θ, that the maximum likelihood estimate converges at rate n^{−1/2} to θ, and that various differentiability properties hold as discussed for example in DiCiccio, Field & Fraser (1990). The third-order approximation to the p-value still has the form of (1.11) or (1.15) with the same r in (1.17) but with a special q designated Q to distinguish it from the special q discussed earlier; the third order accuracy of this approximation is the central issue addressed in Fraser, Reid & Wu (1999). The construction of Q involves two steps: the first step is a reduction by conditioning from the dimension n of the variable y to the dimension p of the full parameter. This reduction in dimension is obtained by conditioning on an exact or approximate ancillary. As we mentioned earlier, we only need its p tangent direction vectors at the data point. Denote these vectors by V = (v₁, v₂, ..., v_p), where all the vᵢ are n-dimensional vectors, so V can be treated as an n by p array. This array can be constructed by using a vector z = (z₁, z₂, ..., zₙ)ᵀ of pivotal quantities zᵢ = zᵢ(y; θ) that has a fixed distribution. One simple choice for the case of independent scalar coordinates uses the successive distribution functions zᵢ = F(yᵢ; θ), i = 1, 2, ..., n, which are uniformly distributed. Then the array V is obtained by differentiating:

    V = ∂y/∂θᵀ |_{(y⁰, θ̂⁰)} = −(∂z/∂yᵀ)⁻¹ (∂z/∂θᵀ) |_{(y⁰, θ̂⁰)},    (1.19)

where the first expression is calculated for fixed z, and y⁰ and θ̂⁰ are the observed data point and maximum likelihood estimate, respectively. Then a tangent exponential model, which is an exponential family model whose asymptotic expression is closest to that of the true model, is obtained. The canonical parameters of this exponential model are obtained by differentiating the likelihood in the tangent directions V:

    φᵀ(θ) = ∂ℓ(θ; y)/∂V |_{y=y⁰},    (1.20)

where vᵢ is the i-th column of V and ∂ℓ(θ; y)/∂V denotes the vector of directional derivatives (∂ℓ(θ; y)/∂v₁, ..., ∂ℓ(θ; y)/∂v_p) defined by ∂ℓ(θ; y)/∂vᵢ = ∂ℓ(θ; y + tvᵢ)/∂t |_{t=0}.
Now we come to the second step: eliminate the nuisance parameter by marginalization to a pivotal quantity that depends on the parameter of interest ψ but whose distribution is free of the nuisance parameter λ. By doing this, the parameter of interest is isolated and the dimension p of the variable is again reduced, now to dimension 1, giving what can be viewed as an intrinsic measure of departure from what is expected under ψ(θ) = ψ.

Typically the reparameterization φ in (1.20) does not have the parameter of interest ψ as a linear component. Hence a new scalar parameter χ is extracted from φ and then used to assess ψ:

    χ(θ) = ψ_φ(θ̂_ψ) φ(θ) / |ψ_φ(θ̂_ψ)|,    (1.21)

where ψ_φ(θ) = ∂ψ(θ)/∂φ, and

    Q = sgn(ψ̂ − ψ) |χ(θ̂) − χ(θ̂_ψ)| {|j_{(φφ)}(θ̂)| / |j_{(λλ)}(θ̂_ψ)|}^{1/2},    (1.22)

where the argument φ in (1.22) is to indicate that the full and nuisance information determinants are recalibrated on the φ scale:

    |j_{(φφ)}(θ̂)| = |j_{θθ}(θ̂)| |φ_θ(θ̂)|⁻²,    |j_{(λλ)}(θ̂_ψ)| = |j_{λλ}(θ̂_ψ)| |φ_λᵀ(θ̂_ψ) φ_λ(θ̂_ψ)|⁻¹.

This is the frequentist analysis of the third order approximation. The Bayesian analysis can also be found in Fraser, Reid & Wu (1999), and this Bayesian version generalizes the results of DiCiccio & Martin (1991, 1993).
When we are interested in several components of a vector full parameter, one option for making inferences is to use the "sequential method": testing the components one by one. Sometimes simultaneous inference for them may seem more appropriate. Skovgaard (2000) gives a detailed discussion of this.

Here we have concentrated on the methods developed by Barndorff-Nielsen, Fraser and many other authors. This does not mean these methods are the only options to get higher order accurate p-values. In fact, at least one other asymptotic approach to the same problem is also available. The usual test statistic to test p components of θ is the generalized likelihood-ratio statistic:

    w(ψ) = 2{ℓ(θ̂) − ℓ(θ̂_ψ)},    (1.23)

which is asymptotically χ²_p-distributed. It can be shown that

    E{w(ψ)} = p{1 + b(ψ, λ)/n + O(n⁻²)},

and that the rescaled statistic

    w̃(ψ) = w(ψ) / {1 + b(ψ̂, λ̂)/n}    (1.25)

follows the χ²_p distribution to a higher order of accuracy. So an idea to improve the accuracy of the inference is that, instead of using w(ψ) in (1.23), we can use w̃(ψ) in (1.25). This technique is known as the Bartlett correction, which originates from Bartlett (1937) and Lawley (1956). The problem with this method is that it is not easy to implement, as a general expression for the estimate of b(ψ, λ) is complicated, involving fourth-order cumulants of derivatives of the log-likelihood function. Some background may be found in DiCiccio (1986), DiCiccio & Stern (1993), Barndorff-Nielsen & Cox (1994, Ch. 9) and Pace & Salvan (1997, Ch. 7).

We compare in some examples the performance of these different approximations with that of the approximations we develop in later chapters.
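The mean adjustment underlying the Bartlett correction can be seen in a small simulation. For the exponential-rate model, testing the true rate, the expected likelihood ratio statistic is approximately 1 + 1/(6n); this factor, and the made-up sample size and replication count, are assumptions of the sketch. The simulated mean of w sits above 1, and dividing by the Bartlett factor brings it back toward the chi-square(1) mean:

```python
import math
import random

random.seed(1)
n, reps = 5, 20000

def w_stat(data, theta0):
    # w = 2{l(theta_hat) - l(theta0)} for the Exponential(rate) model
    theta_hat = len(data) / sum(data)
    l = lambda t: len(data) * math.log(t) - t * sum(data)
    return 2.0 * (l(theta_hat) - l(theta0))

mean_w = sum(w_stat([random.expovariate(1.0) for _ in range(n)], 1.0)
             for _ in range(reps)) / reps

bartlett_factor = 1.0 + 1.0 / (6.0 * n)      # approximate E{w} for this model
mean_corrected = mean_w / bartlett_factor    # should be close to 1
```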
1.5 Overview and Outline

This thesis concentrates on likelihood asymptotics: it develops the theory for removing singularities in the Barndorff-Nielsen combining formula (Barndorff-Nielsen, 1986) and the Lugannani & Rice formula (Lugannani & Rice, 1980). It also develops several alternative combining formulas.

Chapter 2 proposes a procedure for removing the singularity of the combining formulas at the maximum likelihood estimate. This kind of singularity is a downstream version of one addressed by Daniels (1987) for the scalar saddlepoint context. Our procedure is based on the tangent exponential model (Cakmak et al., 1998 and Abebe et al., 1995), which is an exponential model whose asymptotic expansion at the data point of interest is closest to that of the true model. The likelihood is then approximated in terms of the standardized cumulants of the approximating tangent exponential model, so asymptotic expansions of the two elements r and q, and hence of the r* in the Barndorff-Nielsen formula, can be obtained. The expansion for r* provides a third order bridge for computing the p-value for values of the parameter in a small neighborhood of the maximum likelihood estimate. A comparison of the thickness of the tails of the interest distribution to that of the normal distribution provides the basis for defining a nonnormality measure. The implications of this measure are also discussed. There we also address the issue of discontinuity at the extremes for the Lugannani & Rice formula and provide some remedies for it.

Chapter 3 develops an alternative third order combining formula using the gamma distribution. One feature of the gamma distribution is that one of its tails is thicker than that of the normal whereas the other is thinner. This feature makes the gamma very useful in computing the p-value for both thick-tailed and thin-tailed distributions. Results indicate good performance of this formula in various situations.

Chapter 4 develops another alternative combining formula using the Student t distribution. Because both of its tails are thicker than those of the normal, this formula can be used only for thick-tailed distributions.

Finally, a brief conclusion is given in Chapter 5.
Chapter 2
Likelihood Asymptotics:
On Removing Singularities

Recent likelihood asymptotics has produced highly accurate p-values for very general contexts. The terminal combining formulas for the production of these p-values have certain singularities, however, near the maximum likelihood value. The singularities at the maximum likelihood value are a downstream version of an issue addressed by Daniels (1987) for the scalar saddlepoint context; he provided an approximate value at the singularities, which involved a standardized third order cumulant. For a general statistical context we develop a third degree bridge for the p-value function at the maximum likelihood value for the case with no nuisance parameters, and a limiting value at the singularity for the case with nuisance parameters.
2.1 Introduction
The saddlepoint method introduced to statistics by Daniels (1954) and Barndorff-
Nielsen & Cox (1979) gives a highly accurate approximation for a density function
with known cumulant generating function. Lugannani & Rice (1980) used the sad-
dlepoint method to develop a distribution function approximation as an alternative
to numerical integration of the approximate density function. The Lugannani &
Rice (1980) approximation has a singularity at the saddlepoint, which can be re-
placed (Daniels 1987) by its limiting value, a multiple of a third order standardized
cumulant. Barndorff-Nielsen (1986) developed an alternative distribution function
approximation as part of extending results beyond the exponential model context.
These distribution function approximations quite generally use two rather dif-
ferent inputs of information from likelihood. The first is almost always the signed
square root r of the log likelihood ratio given below at (2.3). The second is some
appropriately defined maximum likelihood departure q; the search for the appropri-
ate q has been the recent focus for obtaining progressively more general p-values in
likelihood asymptotics.
These two inputs are combined using either of the following two formulas to give
a third order p-value for testing a scalar parameter value:

Φ_LR(ψ) = Φ(r) + φ(r)(1/r − 1/q),   (2.1)

Φ_BN(ψ) = Φ(r*),   r* = r − r^{-1} log(r/q),   (2.2)

due to Lugannani & Rice (1980) and Barndorff-Nielsen (1986) respectively, as de-
veloped for specific contexts; φ(r) and Φ(r) are the standard normal density and
distribution functions.
In the cases we consider, r is the likelihood root, the signed square root of the
log likelihood ratio statistic,

r = sgn(ψ̂ − ψ)[2{ℓ(θ̂; y) − ℓ(θ̂_ψ; y)}]^{1/2},   (2.3)

where ℓ(θ; y) is log likelihood, ψ(θ) is the scalar interest parameter with tested value
ψ, and θ̂ and θ̂_ψ are the maximum likelihood values without and with the constraint
ψ(θ) = ψ. The definition of q is less straightforward as it typically depends on
more than just observed likelihood; several expressions are recorded below at (2.13),
(2.30), (2.33). Both r and q are standard normal to order O(n^{-1/2}), but with the
appropriately defined q the p-values given by (2.1) and (2.2) are distributed under
the model as uniform(0, 1) to order O(n^{-3/2}).
We are assuming that we have a continuous statistical model f(y; θ) with di-
mensions n and p for y and θ, and that ψ(θ) is a scalar parameter of interest. The
reduction in dimension from n to p is achieved in principle by conditioning on an
approximate ancillary statistic, but only the tangents to the approximate ancil-
lary are needed at the data point of interest. It is shown in Fraser & Reid (1995,
2000) that these tangents can be obtained from a full dimensional pivotal quantity
z = z(y; θ); related details are recorded in Sections 2.3 and 2.4.
The r and q in (2.1) and (2.2) are functions of ψ and y and are both close to zero
when ψ is near ψ(θ̂). This poses obvious numerical difficulties for the evaluation of
r^{-1} − q^{-1} and r/q; see Figure 2.1 for an example of numerical perturbations near the
maximum likelihood value. Also, for extreme values of ψ, the second term in (2.1)
can overwhelm the tail probability from the first term and give a value outside the
acceptable range [0, 1] for a p-value; see Figure 2.4 for an example where a range of
values less than zero is recorded for large values of the parameter.
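The two combining formulas and the numerical difficulty near r = 0 can be sketched directly. The following is a minimal illustration, not from the thesis, assuming the standard forms Φ(r) + φ(r)(1/r − 1/q) for (2.1) and Φ(r*) with r* = r − r^{-1} log(r/q) for (2.2); the function names are ours. A tiny relative perturbation of q near r = 0 swings the computed p-value wildly, which is the instability the bridging formulas are designed to remove.

```python
from math import erf, exp, log, pi, sqrt

def Phi(x):                      # standard normal distribution function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def phi(x):                      # standard normal density
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)

def p_lugannani_rice(r, q):
    # (2.1): Phi(r) + phi(r)(1/r - 1/q); singular at r = 0 or q = 0
    return Phi(r) + phi(r) * (1.0 / r - 1.0 / q)

def p_barndorff_nielsen(r, q):
    # (2.2): Phi(r*) with r* = r - (1/r) log(r/q); singular at r = 0
    return Phi(r - log(r / q) / r)

# Away from the maximum likelihood value both formulas are stable:
print(p_lugannani_rice(1.5, 1.6), p_barndorff_nielsen(1.5, 1.6))

# Near r = 0 a 0.1% perturbation of q produces catastrophic values,
# since 1/r - 1/q and log(r/q)/r amplify any error in q by roughly 1/r^2:
r = 1e-8
for q in (r * (1 + 1e-3), r * (1 - 1e-3)):
    print(p_lugannani_rice(r, q), p_barndorff_nielsen(r, q))
```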
In Section 2.2, we give some background on the third order asymptotic exponential-
type model, both the likelihood centered version and the density centered version.
This serves as the basis for the development of the bridging formulas in later sections.
In Section 2.3 we examine an asymptotic model with scalar variable and param-
eter and define two measures of how the asymptotic density of the signed likelihood
ratio departs from the standard normal; asymptotic expressions are obtained for the
measures and they are seen to be essentially equivalent.
In Section 2.4 we examine the scalar parameter case and use the two measures
of departure to develop a third order bridge for the singularity at the maximum
likelihood value.
In Section 2.5 we examine the case of a vector full parameter θ with scalar interest
parameter ψ = ψ(θ). The techniques from Section 2.3 are then used to develop a
second order bridge at the maximum likelihood value.
Then in Section 2.6 we use functional properties of the departure measures to
develop simple third order graphical procedures for bridging at the maximum value,
for both the scalar and vector full parameter cases.
In Section 2.7, we generalize the relation between the two departure measures to
an arbitrary point.
In Section 2.8 we consider alternatives to the combining formulas (2.1) and (2.2)
to avoid the possible singularities at the extremes of the parameters being tested.
2.2 Third order asymptotic exponential-type model:
likelihood centered and density centered
For the development of the concept of nonnormality and the theory for remov-
ing the singularity intrinsic to the Barndorff-Nielsen formula and the
Lugannani & Rice formula, we need some background on third order asymptotic
exponential models, both the likelihood centered version and the density centered
version. They were first introduced by Abebe et al (1995) and Cakmak et al (1998),
respectively. These are both exponential type approximations. The density centered
version is an approximation to the density for a certain fixed parameter value and is
therefore relevant to distributional properties, while the likelihood centered version
is for a fixed data value and would therefore bear on the effectiveness of test quantities
for inference with fixed data. The ideas and techniques used in developing these
two versions of the approximations are very similar, so we will concentrate on describ-
ing the development of the likelihood centered version. Then at the end of this
section we will briefly record the density centered version of the exponential type
approximation.
For testing a scalar interest parameter in a large sample asymptotic context,
methods with third order accuracy are available that make a reduction to the sim-
ple case having a scalar parameter and scalar variable. Such a reduced model can
arise by marginalization and conditioning with an exponential model or by condi-
tioning and marginalization with a transformation model; the latter sequence is also
available generally in the asymptotic context (Fraser & Reid, 1993, 1995).
For a real variable and real parameter consider a density function f_n(y; θ) that
depends on a mathematical parameter n, usually sample size. We assume that for
each θ, y is O_p(n^{-1/2}) about a maximum density point and that ℓ(θ; y) = log f_n(y; θ)
is O(n) and with either argument fixed has a unique maximum.
The objective in developing exponential type approximations is to approximate
the observed significance function p(θ) = P(y ≤ y⁰; θ) by the exponential approxi-
mation rather than by the usual normal approximation. Accordingly we first expand
the log-density ℓ(θ; y) in a Taylor series in parameter and variable about
a data value y⁰ of interest and the corresponding maximum likelihood parameter
value θ₀ = θ̂(y⁰),
where the coefficient a_ij = (∂^i/∂θ^i)(∂^j/∂y^j) ℓ(θ; y)|_{(θ₀, y⁰)} and in particular a_10 = 0
from the maximum likelihood property at θ₀.
We record the coefficients in a matrix,
We then reexpress both θ and y, and do so in two steps. In each step, we will
record the new parameter and variable φ, x as functions of the old θ, y and record
the new coefficients A_ij as functions of the old. To avoid notational growth, we will
then replace φ, x, A by θ, y, a for the next step. The transformations thus need to
be compounded accordingly.
In the first step, a location-scale standardization is applied to the variable y
and the parameter θ.
This transformation makes the variable centre at the data value y⁰ and also gives
a unit cross Hessian. The new coefficients are
with A_ij = O(n^{−(i+j)/2+1}) for i + j ≥ 2, as a consequence of the asymptotic properties;
thus A_12, A_03 are O(n^{-1/2}) and A_40, A_31, A_22, A_13, A_04 are O(n^{-1}). Other terms
with i + j > 4 are ignored as we choose to work only to order O(n^{-3/2}). So in terms
of lower case letters, the array of coefficients takes the form
In the second step, we consider a reexpression of the variable and the parameter
so that this approximation has the key characteristics of a canonical exponential model
(A_21 = A_31 = A_12 = A_13 = 0). This requires transformations
The resulting coefficients, in terms of the old coefficients, are
and the coefficient array takes the form
As we know, for each θ there is a density and the density integrates to 1.
This means that the coefficients are interconnected; in fact with A_30 = −a3/n^{1/2},
A_40 = −a4/n, and A_22 = c/n, it can be shown that all other coefficients are
determined and the coefficient array takes the form
(a_ij) =
where a = −(1/2) log(2π), a3 and a4 are standardized third and fourth derivatives of
the log density ℓ(φ; x) with respect to φ at {φ(θ₀), x⁰}, and φ = φ(θ) and x = x(y)
are local reexpressions of θ and y that are used to obtain the tangent exponential
model approximation relative to the data point y⁰, and c is a measure of nonex-
ponentiality; if c = 0 the model is exponential to the third order. These are intrinsic
parameters describing shape characteristics of the model. For some recent details
see Andrews, Fraser & Wong (2000).
An asymptotic statistical model having log density
with coefficient array (a_ij) as in (2.9) is called in Cakmak et al (1998) the canonical
exponential type asymptotic model in likelihood centered form; it describes the
essential characteristics of a large sample distribution relative to the exponential
pattern. This model gives a method to compare the score, maximum likelihood
estimate and the signed likelihood ratio departures for a fixed data point.
The density centered version of the asymptotic exponential type model was de-
veloped for a chosen parameter value and the corresponding maximum density value
in Abebe et al (1995). The model takes the same Taylor expansion form as in
(2.10) but with coefficient array as follows
where a = −(1/2) log(2π), α3 and α4 are standardized third and fourth derivatives of
ℓ(φ; x) with respect to x at {φ(θ₀), ŷ(θ₀)}, where ŷ(θ₀) is the maximum density point
for θ₀, and φ = φ(θ) and x = x(y) are local reexpressions of θ and y that are used
to obtain the tangent exponential model relative to the parameter value θ₀; the
constant c is a measure of nonexponentiality and is given as ∂⁴ℓ/∂x²∂φ² evaluated
at {φ(θ₀), ŷ(θ₀)}. This model gives a method to compare the score, maximum
likelihood estimate and the signed likelihood ratio departures for a fixed parameter
value.
2.3 Departures from standard normality: scalar case
Consider first the case of an asymptotic model with scalar variable and scalar
parameter. Many properties for the more general context can be derived from this
case. We assume the model f(y; θ) leads to a log density ℓ(θ; y) = log f(y; θ) that
has the usual asymptotic properties, such as ℓ(θ; y) = O_p(n), var{ℓ_θ(θ; Y)} = O(n),
and so on. For testing a value θ, the likelihood quantity q is intrinsically based on
a locally defined reparameterization,

φ(θ) = (∂/∂y) ℓ(θ; y)|_{y = y⁰},   (2.12)

which is the canonical parameter of the tangent exponential model at the data
point y⁰ of interest (Fraser, 1990). This is used to form the standardized measure
of departure

q(θ) = {φ(θ̂) − φ(θ)} ĵ_φφ^{1/2},   ĵ_φφ = ĵ_θθ φ_θ^{-2}(θ̂),   (2.13)

where ĵ_θθ = −ℓ_θθ(θ̂; y) is the observed information and φ_θ = ∂φ(θ)/∂θ
evaluated at the maximum likelihood value θ̂(y) adjusts the information
standardization to that for φ.
the density of the iikelihood root r(0; y) is then given by
where k is a constant to third order. This can be obtained by change of variable
from y to r starting from the p'-formula (Barndofi-Nielsen, 1983) or starting from
the sacidlepoint approximation to the tangent exponential mode1 ( b e r & Reid,
1995, 2000).
We investigate how the third order likelihood based distribution for r differs from
the nominal standard normal of ûrst order theory; and we can do this in terms of
the density Cunction (2.14) or the distribution function venions (2.1) and (2.2).
In terms of its functional form the expression (2.14) has the factor r/q attached to
the basic standard density φ(r). The factor r/q is greater (or less) than 1 according
as a tail of the density for r is thicker (or thinner) than that of the standard normal.
As a measure of departure from standard normal we then examine how r/q exceeds
1, taken relative to r:

d1 = (r/q − 1)/r = 1/q − 1/r.   (2.15)

We could also examine the distribution function (2.1) and how it falls short of the
nominal Φ(r), taken relative to the density φ(r),

d1 = {Φ(r) − Φ_LR}/φ(r);   (2.16)

this gives the same measure.
For a second measure we examine the argument of the distribution function
approximation (2.2); the argument is usually designated r*. We consider how it
falls short of the nominal normal deviation r:

d2 = r − r* = r^{-1} log(r/q).   (2.17)
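Written out from their descriptions, d1 = (r/q − 1)/r = 1/q − 1/r and d2 = r − r* = r^{-1} log(r/q); a quick numerical check (a sketch of ours, not from the thesis) confirms that the two measures agree closely in the moderate-deviation range, since d2 = r^{-1} log(1 + r d1) = d1 − r d1²/2 + ···:

```python
from math import log

def d1(r, q):
    # departure of the density factor r/q from 1, taken relative to r
    return 1.0 / q - 1.0 / r

def d2(r, q):
    # shortfall of r* = r - (1/r) log(r/q) from the nominal deviation r
    return log(r / q) / r

# The difference d1 - d2 is of order r*d1^2, hence small for moderate (r, q);
# the pairs below are illustrative values with r and q of the same sign.
for r, q in [(0.5, 0.52), (1.0, 1.05), (-1.3334, -1.4241), (2.0, 1.9)]:
    print(r, q, d1(r, q), d2(r, q))
```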
We now determine the asymptotic form of these departure measures, as a means
to bridge the singularity at the maximum likelihood point. Taylor series expansion
methods were used in Cakmak et al (1998) to determine the local form of a statistical
model relative to a particular data point, say y⁰.
In the case that y and θ are both scalars we can examine the asymptotic form
of d(r, q) as a function of r and q by expanding the log density ℓ(θ; y) = log f(y; θ)
about a reference point in terms of standardized deviations for y and for θ. A data-
oriented expansion (Cakmak et al, 1998) uses (θ₀, y₀) = (θ̂(y₀), y₀) and standardizes
with respect to the coefficients of θ² and θy. If the departures are then reexpressed
in exponential model form to the third order we obtain the likelihood centered
exponential model (2.10) with coefficient array as in (2.9). From this we obtain the
log likelihood at y⁰ and the log density at θ₀ given respectively by
and
The only nonzero mixed derivative terms to this order are yθ and y²θ²/4n; k₁ is a
constant and a = −log(2π)/2. The a3 and a4 are the standardized cumulants of the
null density, and c is a measure of nonexponentiality; these are intrinsic parameters
describing shape characteristics of the model. From this an expansion for q in terms
of r for fixed y = y₀ is obtained,
which gives
which gives
These r d t s are used to obtain asymptotic expressions for dl and d2 for fked data
y = go and varying 8:
In a paralle1 way, for a parameter-oriented qansion (Abebe et al, 1995) we can
use (Ba, y,,) = (Bo , g(Ba)) and standardize with respect to coefficients of yl and 30. If
the deparhm are then reexpressed towards qonential form we obtain the density
centered tangent exponential model (2.10) with coefficient array (2.11). Then the
log density at 80 and the log likelihood at y0 are given respectively by
and
together with mixed derivative temu y9 and $02/4n. k2 is a constant. From this
an expansion for g in terms of r for fixed B = O0 is obtained,
which gives
and thus gives asyrnptotic expressions for the two nonnormality measures dl and dz
for ûxed 0 = Bo and varying y:
These two types of expressions for d1 and d2 can be interrelated by taking the
model (2.18, 2.19) centered at (θ⁰, y⁰) and reexpressing it in the form (2.22, 2.23).
For this we take θ₀ = θ̂⁰, which is zero in (2.19), and obtain ŷ(θ₀) = a3/2n^{1/2}
+ O(n^{-1}); this then gives α3 = a3 to order O(n^{-1}) and α4 = a4 − 3a3² − 6c to order
O(n^{-1/2}). The constants k₁ and k₂ check under the reexpressions. We can then
record (2.24) and (2.25) respectively as
Formulas (2.20), (2.21), (2.26) and (2.27) all use standardized likelihood cumu-
lants a3, a4 for a point (y₀, θ₀) with r = 0; the first two record the change in d for
fixed y₀ and the last two for fixed θ₀.
If θ₀ = θ̂(y₀) or if y₀ = ŷ(θ₀) then the expansion coefficients are linked by the
norming property which gives α3 = a3 + O(n^{-1/2}), α4 = a4 − 3a3² − 6c + O(n^{-1/2}).
All these expressions for the dᵢ are accurate to O(n^{-3/2}).
Some clarity on the roles of the two versions (2.20), (2.21) and (2.24), (2.25)
arises by noting that r and q are functions of y and θ over a moderate deviations
range from some initial y₀ or θ₀ of interest. Along the curve C = {(θ, y): θ = θ̂(y)}
we have r = q = 0, and to first derivative we have r = q. The departure measure is
then describing how r and q differ beyond the first derivative.
We could have started with a point (θ₀, y₀) with some particular value for r =
r(θ₀, y₀) and then used (2.17) to examine the change in d for fixed y₀, or (2.24), (2.25) to
examine the change in d for fixed θ₀. For this we note that the a3, a4 would be values
determined on C with the particular y₀, and the α3, α4, c would be values determined
on C with the particular θ₀; Section 2.7 discusses this in detail.
Example 2.1. Cauchy location model.
Consider the location Cauchy f(y − θ) = π^{-1}{1 + (y − θ)²}^{-1} with n = 1. For
this we have θ̂ = y and
The exponential parameter can be standardized, φ = 2θ/(1 + θ²), giving
from which we obtain a3 = 0, a4 = 9, and thus d1 = d2. The lack of skewness
removes differences between the two versions of the departure measure.
More generally, when the parameter θ is a scalar but the observable variable has
dimension n, we define a vector v by
where z = z(y, θ) is an n × 1 vector of natural pivotal quantities. As shown in
Fraser & Reid (1995), this vector can be used to define a canonical parametrization
φ for the original model, and then defining q as the standardized maximum likeli-
hood departure in this parametrization ensures that (2.1) and (2.2) are third order
approximations to the p-value conditional on an approximately ancillary statistic.
Thus the dimension reduction from n to 1 is achieved by conditioning on an ap-
proximate ancillary statistic, but this ancillary is not explicitly needed, just the
derivative of ℓ in the directions (2.28) for the ancillary at the data point. Using v,
the reparameterization φ in (2.12) is generalized to
and the expanded expression for q in (2.13) uses derivatives in the direction v rather
than with respect to the original scalar y:
In this more general context the expressions (2.20) and (2.21) for the departure
measures remain available, but the versions (2.24) and (2.25) for varying data point
are typically not available, as they would need model information along the contour
of the observed approximate ancillary.
2.4 Bridging the singularity: scalar case
The measures of departure developed in the preceding section provide a simple
and direct means for bridging the maximum likelihood singularity in the p-value
formulas.
From (2.1) and (2.20) we obtain
and from (2.2) and (2.21) we obtain
These can be viewed as Bartlett type corrections to the likelihood ratio but are
derived from observed likelihood.
Example 2.2. Cauchy location model.
Consider the location Cauchy model with data y = 0, as examined in Example
2.1. From the two bridging formulas we obtain
The exact p-value is of course available as

p(θ) = .5 + π^{-1} tan^{-1}{±(e^{r²/2} − 1)^{1/2}},

with the sign taken as that of r. At r = 0 all three are equal to 0.5. Close to r = 0
we check numerically; at the point, say, r = 0.1 we have
p = .522499.
The rather small departure of the approximations from the exact is of course due to
the almost impossibly small sample size n = 1 and to the sharp peak at the centre
of the Cauchy model.
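The Cauchy numbers above can be checked directly. The sketch below is ours, not the thesis's code; it uses the reparameterization φ = 2θ/(1 + θ²) of Example 2.1, the observed information ĵ = 2 and φ_θ(0) = 2 at θ̂ = 0, and the standard combining formulas, recovering the exact p ≈ 0.522499 at r = 0.1:

```python
from math import atan, erf, exp, log, pi, sqrt

Phi = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))
phi = lambda x: exp(-0.5 * x * x) / sqrt(2.0 * pi)

# Cauchy location model with data y = 0, so theta_hat = 0.
# Likelihood root r = sign(-theta) sqrt{2 log(1 + theta^2)}, inverted here:
def theta_from_r(r):
    t = sqrt(exp(r * r / 2.0) - 1.0)
    return -t if r > 0 else t

def p_exact(theta):
    # P(Y <= 0; theta) for the density pi^{-1}{1 + (y - theta)^2}^{-1}
    return 0.5 + atan(-theta) / pi

def p_approx(r):
    theta = theta_from_r(r)
    # phi(theta) = 2 theta/(1 + theta^2); j_hat = 2 and phi_theta(0) = 2,
    # so the recalibrated information is 1/2 and (2.13) gives:
    q = (0.0 - 2.0 * theta / (1.0 + theta * theta)) * sqrt(0.5)
    p_lr = Phi(r) + phi(r) * (1.0 / r - 1.0 / q)
    p_bn = Phi(r - log(r / q) / r)
    return p_lr, p_bn

r = 0.1
print(p_exact(theta_from_r(r)))   # close to the quoted .522499
print(p_approx(r))                # both approximations nearby
```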
For the bridging formulas (2.31) and (2.32) we could have done full Taylor
series expansions in r but, as in many similar asymptotic calculations, there are
advantages to retaining the φ(r) and Φ(r), which reflect the dominant role of the
signed likelihood ratio r.
Example 2.3. Consider the simple gamma model on the positive axis,

f(y; θ) = Γ^{-1}(θ) y^{θ−1} e^{−y},

with data y = 10. The significance function p(θ) is plotted in Figure 2.1. Note
the computational irregularities near the maximum likelihood value θ̂ = 10.495838.
Simple calculations give a3 = −0.315901 and a4 = 0.199422, from which we obtain
the bridge

p_B(θ) = Φ(0.9972348 r − 0.0526502).

The likelihood approximations Φ_LR(θ), Φ_BN(θ) are plotted in Figure 2.1 together
with the bridge p_B(θ) and the exact p_X(θ). Clearly a simple algorithm can choose
between the approximation and the bridge to give a close construction for the exact.
Figure 2.1: The gamma model Γ^{-1}(θ) y^{θ−1} e^{−y} with y⁰ = 10. The asymptotic approx-
imations Φ_LR(θ) and Φ_BN(θ) for the p-value function p_X(θ) for testing θ are plotted
against θ. The bridge p_B(θ) at the maximum likelihood value is superimposed on
the exact p_X(θ).
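The quantities quoted in Example 2.3 can be reproduced numerically; the sketch below is ours, not from the thesis. It obtains θ̂ from the score equation ψ(θ̂) = log y⁰ by bisection (ψ here is the digamma function, computed by finite differences of log Γ), takes a3 and a4 as the standardized third and fourth cumulants ψ''(θ̂)/ψ'(θ̂)^{3/2} and ψ'''(θ̂)/ψ'(θ̂)² of the canonical sufficient statistic — an identification consistent with the quoted values — and evaluates the bridge with the constants stated in the text:

```python
from math import erf, lgamma, log, sqrt

Phi = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))

def psi(n, x, h=0.05):
    # n-th derivative of digamma by nested central differences of lgamma;
    # h = 0.05 keeps both truncation and rounding error small for n <= 3
    if n == 0:
        return (lgamma(x + h) - lgamma(x - h)) / (2.0 * h)
    return (psi(n - 1, x + h) - psi(n - 1, x - h)) / (2.0 * h)

y0 = 10.0
lo, hi = 1.0, 50.0                  # bisection for psi(theta) = log y0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if psi(0, mid) < log(y0):
        lo = mid
    else:
        hi = mid
theta_hat = 0.5 * (lo + hi)         # about 10.495838

k2, k3, k4 = psi(1, theta_hat), psi(2, theta_hat), psi(3, theta_hat)
a3 = k3 / k2 ** 1.5                 # about -0.315901
a4 = k4 / k2 ** 2                   # about  0.199422

def r_of(theta):
    ell = lambda t: -lgamma(t) + (t - 1.0) * log(y0)  # the -y0 term cancels
    diff = ell(theta_hat) - ell(theta)
    return (1.0 if theta_hat > theta else -1.0) * sqrt(2.0 * max(diff, 0.0))

def p_bridge(theta):
    # bridge of Example 2.3: p_B(theta) = Phi(0.9972348 r - 0.0526502)
    return Phi(0.9972348 * r_of(theta) - 0.0526502)

print(theta_hat, a3, a4, p_bridge(theta_hat))
```

Note that at θ = θ̂ the bridge gives Φ(−0.0526502) ≈ 0.479 rather than 0.5, reflecting the skewness correction −a3/6.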
2.5 Bridging the singularity: scalar interest parameter
Now consider a continuous statistical model f(y; θ) with dimensions n and p for
y and θ, and let ψ(θ) be a scalar interest parameter with, say, θ' = (λ', ψ). Again
there is an approximate ancillary with vectors V = (v₁ ··· v_p) given as
where z = z(y, θ) is still an n × 1 vector of natural pivotal quantities. The exponential
type parameter is
and d/dV gives a row vector of directional derivatives; for some discussion of exam-
ples see Fraser, Wong & Wu (1999).
For testing ψ(θ) = ψ using (2.1) or (2.2), the r is given by (2.3) and the q by the
following extension (Fraser, Reid & Wu, 1999) of (2.13) and (2.30):
where the numerator and denominator determinants are the full and nuisance in-
formation determinants recalibrated on the φ scale, and χ(θ) = u'φ is a rotated φ
coordinate based on a unit vector
perpendicular to ψ{θ(φ)} at the constrained maximum likelihood value θ̂_ψ.
For bridging the discontinuity at the maximum likelihood value ψ(θ̂) = ψ̂, the
calculations are more complex and we temporarily restrict our attention to O(n^{-1})
accuracy. Let ℓ(φ) = ℓ{θ(φ); y⁰} be the observed likelihood reexpressed in terms
of φ, and suppose it has been normed and recentered and rescaled so that φ̂ = 0,
ℓ(0) = ℓ_φ(0) = 0, and ℓ_φφ' = −I. Then in tensor summation notation we have
to second order. Also for convenience we restrict attention to a p = 2 dimensional
parameter.
For the scalar interest parameter ψ{θ(φ)} we suppose that the φ coordinates have
been rotated so that ψ(φ) = ψ̂ is tangent to φ₁ = 0, and ψ(θ) has been relocated
and rescaled so that ψ̂ = 0, ∂ψ/∂φ₁ = 1, at the maximum likelihood value φ = 0;
then ψ(φ) = φ₁ + cφ₂²/2n^{1/2}, where c is a second derivative measuring the curvature
of ψ = ψ̂ relative to φ at φ = 0.
The signed likelihood ratio for testing ψ can be calculated to the second order.
The maximum likelihood departure (2.33) uses a unit vector u which is the first
coordinate vector at ψ̂ = 0 and locally can change direction by O(n^{-1/2}); the de-
parture, however, is −φ₁ to second order based on the cosine of an O(n^{-1/2})
angle. The nuisance information is
It follows that
Combining (2.35) and (2.36) we obtain
from which it follows that

d = −(a_111 + 3a_122 − 3c)/6n^{1/2}.

The bridging p-value formula is then

p(ψ) = Φ(r) − dφ(r) = Φ(r − d)

to second order.
2.6 Graphical bridging of the singularity
For the case of a scalar full parameter we have seen in Section 2.3 that the
departure measures (2.15), (2.16) and (2.17) are linear in r to the third order and
thus provide simple third order bridging, using (2.31) and (2.32). For the more
general p-dimensional full parameter we have from Section 2.5 that the departure
measures are constant (2.38) to the second order.
The development (Fraser & Reid, 1995, 2000) of the p-value formulas from tan-
gent exponential model approximations records the p-value as a tail probability from
an adjusted asymptotic density; and Cheah et al (1995) show that such an adjusted
density is itself an asymptotic model. Together these show that the departure mea-
sures
are asymptotically linear in r to the third order under parameter change for fixed
data. This is of course consistent with the familiar location-scale standardizations
of the signed likelihood ratio that give a third order standard normal variable.
Now consider a particular assessment of a parameter ψ with given data, together
with possible instability in the p-value formulas (2.1) and (2.2). We propose plotting
d1 and d2 against the signed likelihood ratio r. Any instability in the p-value formulas
will show in d1 and d2, as Φ(r) is typically smooth. Accordingly we propose fitting
a line for d1 or d2 plotted against r, excluding the middle possibly unstable values
and the extreme values; the fitted d1 or d2 is then used with (2.31) and (2.32) to
bridge the singularity.
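The graphical procedure can be sketched in code. The following is our illustration, applied for simplicity to the scalar gamma model of Example 2.3 rather than the two-parameter model of Example 2.4: d2 = r − r* is computed on a grid of parameter values, a straight line is fitted by least squares after excluding a window around r = 0 and the extreme values, and the fitted line then replaces d2 in Φ(r − d2) near the maximum likelihood value. The constants θ̂ and ĵ = ψ'(θ̂) below are our computed values for that example.

```python
from math import erf, lgamma, log, sqrt

Phi = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))

y0, theta_hat = 10.0, 10.495838      # data and MLE from Example 2.3
jhat = 0.09996                       # approx. trigamma(theta_hat), observed info

def r_q(theta):
    ell = lambda t: -lgamma(t) + (t - 1.0) * log(y0)
    s = 1.0 if theta_hat > theta else -1.0
    r = s * sqrt(2.0 * max(ell(theta_hat) - ell(theta), 0.0))
    q = (theta_hat - theta) * sqrt(jhat)   # canonical-parameter departure
    return r, q

# Collect (r, d2) with d2 = r - r* = (1/r) log(r/q), excluding the possibly
# unstable middle |r| < 0.25 and the extremes |r| > 1.8, as the text prescribes.
pts = []
theta = 5.0
while theta <= 18.0:
    r, q = r_q(theta)
    if 0.25 < abs(r) < 1.8:
        pts.append((r, log(r / q) / r))
    theta += 0.1

# Least squares line d2 = b0 + b1*r through the retained points.
n = len(pts)
sx = sum(r for r, _ in pts); sy = sum(d for _, d in pts)
sxx = sum(r * r for r, _ in pts); sxy = sum(r * d for r, d in pts)
b1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
b0 = (sy - b1 * sx) / n

def p_bridged(theta):
    r, _ = r_q(theta)
    return Phi(r - (b0 + b1 * r))

print(b0, b1)              # compare 0.0526502 and 0.0027652 of Example 2.3
print(p_bridged(theta_hat))
```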
Example 2.4. Consider the gamma model with mean μ and shape parameter β,
and data from Fraser, Reid and Wong (1997):
152 152 115 109 137 88 94 77 160 165
125 40 128 123 136 101 62 153 83 69.
For testing the parameter μ we record the approximations Φ_LR(μ) and Φ_BN(μ) in
Figure 2.2. Note the aberrant behavior near the maximum likelihood value μ̂⁰ =
113.45. For bridging at the μ̂⁰ value we plot d1 and d2 from (2.39) against the likelihood
root r, in Figure 2.3. The bridging p-value using (2.32) with the marked segment
of the straight line fit for d2 is then recorded in Figure 2.2.
Figure 2.2: For the gamma model with mean μ and shape β, the p-value approx-
imations Φ_LR(μ) and Φ_BN(μ) for testing μ are plotted for a sample of 20. The
aberrant behavior at the maximum likelihood value is successfully bridged using
(2.32) together with a graphical d2 determined from Figure 2.3.
Figure 2.3: For the gamma model and data of Figure 2.2, the departure measures
d1 and d2 are plotted against the signed likelihood ratio r and a bridging straight
line is obtained graphically; d1 and d2 are so close that they overlap in this figure.
2.7 Interrelating two nonnormality measures
In this section we interrelate the two nonnormality measures of departure at an
arbitrary point (y₀, θ₀). For this, we take the likelihood centered exponential type asymp-
totic model (2.10) with coefficient array as in (2.9) (which we call model A hereafter),
which is centered at (y₀, θ̂(y₀)). So in this model the new coordinate of y₀ is zero.
We denote the new coordinate of θ₀ in this model by δ₀. The idea in interrelating
these two measures is to reexpress model A as the density centered exponential type
asymptotic model (2.10) with coefficient array as in (2.11) (which we call model B
hereafter), so that the α3, α4 and c of model B can be written in terms of the a3, a4 and c of
model A. From this, d1 and d2 can easily be rewritten in terms of a3, a4 and c.
From model A, we know that the density is
Now we find the maximum density point of the density with θ = δ₀. Setting the
gradient ℓ_y(θ; y) = 0 and solving for y with θ = δ₀ gives
To reexpress model A as model B, we need several steps. As in the procedure
used to develop the likelihood centered tangent exponential model, at each step we
will make a change of parameter and variable from (θ, y) with coefficients (a_ij) to
a new parameter and variable (φ, x) with new coefficients (A_ij), and record the new
coefficients A_ij as functions of the old. To avoid notational growth, we will then
replace φ, x, A by θ, y, a for the next step. The final transformation required can
be obtained by compounding those used in each step accordingly.
First we recenter the model at (δ₀, ŷ₀) using
φ = θ − δ₀,
x = y − ŷ₀;
consequently, the log density now has the form Σ_{ij} A_ij φ^i x^j / i!j!, where the coefficients
Our goal is to reexpress model A as model B so that we can get the relation
between a3, a4 and α3, α4, c. The coefficients in the first column will not be needed
to get that relation; accordingly, we will not calculate them in the subsequent steps.
Other coefficients are of order O(n^{-3/2}) and are ignored. All A_ij's are now changed
to lower case a_ij's and (x, φ) is changed to (y, θ). After this, we can go to the second
step: make the transformation
to standardize the variable with respect to its second derivative at the maximum
of the null density f(y; θ), and standardize θ to get a unit Hessian between the new
parameter φ and variable x. The resulting new coefficients are now
and A_31 = 0. We record the coefficients in an array where the first column values
are changed to lower case letters as before:
Again the variable and parameter (x, φ) are changed to (y, θ). Now we can make
a transformation so that a_12 is zero:
The new coefficients are now
with all other coefficients either unchanged or not of interest. In lower case, the
coefficient array is
The (x, φ) is changed to (y, θ). In the third step we recenter the variable so that
the new null density has its maximum at zero:
All the coefficients of interest are unchanged except A_01. After
this, once again (x, φ) is changed to (y, θ). To get model B, we only need one more
step: the fourth step. In this step, we make a transformation of the parameter φ
alone, leaving
x = y.
This transformation changes the remaining coefficient to zero with all other co-
efficients of interest unchanged. So model A finally has the coefficient array given.
This array has the form of the coefficient array (2.11) for model B. Comparing these
two arrays, we have
which gives the relation between a3, a4 and α3, α4, c:
to order O(n^{-3/2}). Then, substituting the above relation into (2.24) and (2.25), we
have, for fixed θ and varying y,
In particular, if δ₀ = 0, the relation (2.46) becomes α3 = a3, α4 = a4 − 3a3² − 6c to
order O(n^{-3/2}) and the nonnormality measures are reexpressed as
which are (2.26) and (2.27) respectively.
2.8 Discontinuity at the extremes
The Lugannani & Rice combining formula has another disadvantage in that it can
produce values outside the acceptable [0, 1] range for p-values. The mechanics of this
can be seen in the scalar parameter case with, say, large values of r. We consider this
from a distribution function viewpoint (fixed θ) rather than the p-value viewpoint
(fixed y).
For large values of r the first formula can be viewed using (2.14) as the integral of
a normal density φ(r) together with an adjustment factor r/q. The first correction
term to Φ(r) in (2.1) is φ(r)/r, which provides the Mills ratio evaluation of the
right tail of the normal. As the Mills ratio for the normal is typically on the large
side, this first correction can produce an approximate value greater than 1. The
second correction is r/q times the Mills ratio term and provides an adjusted Mills ratio
appropriate to the scaled density (2.14). If the right tail is very thin and r/q is small,
then this compensating adjustment may not be enough to bring the value below 1.
A reasonable objective is a modified formula that generally tracks the Lugannani &
Rice formula (2.1) but avoids the singularity just described.
Formula (2.1) can be written

Φ(r; d) = Φ(r) − dφ(r),   (2.48)

using the nonnormality measure d1 from Section 2.3, which takes the form (2.24)
in the fixed θ context. We can consider this for any fixed r and then examine
convergence as d goes to zero:

Φ(r; 0) = Φ(r) + O(n^{-3/2}),
Φ_d(r; 0) = −φ(r) + O(n^{-1}),   (2.49)
Φ_dd(r; 0) = 0 + O(n^{-1/2}),

where the subscripts denote differentiation. The second, Barndorff-Nielsen, formula
(2.2) can be written

Φ_BN(r; d) = Φ(r − r^{-1} log(1 + rd)) = Φ(r − d + rd²/2 − ···),

which of course satisfies (2.49), as is easily checked by differentiation or expansion.
For simulations the tail singularity with the Lugannani & Rice formula (2.1) can
be avoided by compressing towards the Barndorff-Nielsen formula (2.2); for the right
tail of the distribution function use
and for the left tail use
This retains the third order asymptotic property (2.49) but limits the value to being
at most half way from Φ₂(r, q) to the particular bound. Alternative proportions,
even proportions dependent on r, can replace the .50 above.
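The displays (2.50) and (2.51) are not reproduced here, but one reading of the compression rule — cap the Lugannani & Rice value at most half way from the Barndorff-Nielsen value toward the bound 1 on the right tail and 0 on the left — can be sketched as follows. This is our own illustration; the function names and the exact form of the cap are assumptions, not the thesis's formulas.

```python
from math import erf, exp, log, pi, sqrt

Phi = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))
phi = lambda x: exp(-0.5 * x * x) / sqrt(2.0 * pi)

def p_lr(r, q):
    return Phi(r) + phi(r) * (1.0 / r - 1.0 / q)

def p_bn(r, q):
    return Phi(r - log(r / q) / r)

def p_compressed(r, q):
    # Assumed compression: limit the Lugannani & Rice value to at most half
    # way from the Barndorff-Nielsen value toward the bound 1 (right tail)
    # or 0 (left tail); proportions other than .50 could replace 0.5.
    lr, bn = p_lr(r, q), p_bn(r, q)
    upper = bn + 0.5 * (1.0 - bn)
    lower = 0.5 * bn
    return min(max(lr, lower), upper)

# In the normal range the compression leaves (2.1) untouched:
print(p_lr(1.5, 1.6), p_compressed(1.5, 1.6))
# A configuration with r/q small where (2.1) strays below 0 gets clamped:
print(p_lr(-3.0, -100.0), p_compressed(-3.0, -100.0))
```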
Example 2.5. Consider the noncentral chi-square distribution with noncentrality
ε² = 1 and degrees of freedom f = 5. With the noncentrality as parameter we use the
asymptotic approximations (2.1) and (2.2) to approximate the distribution function
for the chi-squared variable η² when ε² = 1; see Figure 2.4. The modification
from (2.51) is also plotted there; it does avoid the long range of negative p-values
for small values of η², but still falls short of the exact. We do note that the left end
of the distribution corresponds to a singularity in the original model.
Figure 2.4: The approximations Φ_LR and Φ_BN for the distribution function of
the noncentral chi-squared distribution with degrees of freedom 5 and noncentrality
ε² = 1. The compression modification Φ_C avoids the negative values found with the
Φ_LR approximation. The horizontal axis is η², from 0 to 30.
Chapter 3
Approximating Tail Probabilities:
Gamma Combiner
In this chapter we develop a new combining formula for calculating third order tail probabilities; the formula uses asymptotic results derived from the gamma model. Consider inference for a scalar interest parameter ψ = ψ(θ) in a continuous statistical model with density f(y; θ), where y is a vector of length n and θ is a vector of length p. The inference is presented as an approximate p-value p(ψ) for assessing a hypothesised value ψ. To this end, we calculate the familiar first-order statistics: the signed likelihood ratio statistic r and the maximum likelihood departure q. Then we calculate the p-value by assuming that the r and q come from testing a hypothesis θ = θ₀ for the gamma model, Gamma(z; θ, p), using various values of θ₀ and various values of the shape parameter p. We shall show that for each pair (r, q), there exists a pair (θ, p) if some conditions are satisfied. We will also show that this new combining formula is a third order formula.
Section 3.1 discusses inference in the gamma statistical model. This discussion provides part of the logic for using the gamma as the basis for a new combining formula. Section 3.2 proposes what can be called a gamma combining formula for computing p-values. Section 3.3 gives some numerical results. In Section 3.4 we apply this combining formula to the noncentral chi-square distribution and compare the results to those obtained by other existing methods. Finally, we rigorously establish the third order property of this combining formula in Section 3.5.
3.1 Inference in the gamma model
Assume that y is an observation from the gamma model with density

f(y; θ, p) = Γ^{-1}(p) θ^p y^{p-1} e^{-θy},    (3.1)

where y is a scalar variable, θ is a scalar parameter of interest, and p is a mathematical parameter which we assume is a certain number whose value will be discussed in the next section. For computing the p-value, we would begin with calculating two first order statistics r and q. The log-likelihood function for this model is ℓ(θ) = ℓ(θ; y) = log f(y; θ, p) = p log θ - yθ, where additive constants free of θ, including log Γ^{-1}(p), are ignored. From this we have the score function ∂ℓ/∂θ = p/θ - y, and the derivative of the score function ∂²ℓ/∂θ² = -p/θ². Hence, the maximum likelihood estimate of θ is obtained by solving the score equation ℓ′(θ̂) = 0, and the result is θ̂ = p/y. The observed Fisher information is then

ĵ = -∂²ℓ/∂θ²|_{θ=θ̂} = p/θ̂² = y²/p.

Therefore, the signed likelihood ratio statistic is given by

r = sgn(p - z) {2(z - p + p log p - p log z)}^{1/2},    (3.2)

and the maximum likelihood departure is given by

q = (p - z)/p^{1/2},    (3.3)

where z = yθ. It is not difficult to prove that the above r and q have the relationship (3.4). Notice that q > 0 if and only if p - yθ > 0, or equivalently, y < p/θ. So we can see that the gamma distribution has a thin tail on the right and a thick tail on the left.
The above r and q can be used to make inferences in the gamma model, such as testing any hypothesis θ = θ₀. The p-value can be calculated based on formula (2.12) or (2.13). This, however, is not our purpose here. Instead, the r and q form the basis for us to develop the gamma combining formula later in this chapter.
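The closed forms (3.2) and (3.3) are easy to check numerically. The following sketch (Python; the function names are illustrative, not from the thesis) verifies that the closed form for r agrees with the direct definition r = sgn(θ̂ - θ){2(ℓ(θ̂) - ℓ(θ))}^{1/2} at an arbitrary point:

```python
import math

def gamma_r_q(y, theta, p):
    """r and q of Section 3.1 for testing theta in the gamma model
    with shape p, using z = y*theta and theta_hat = p/y."""
    z = y * theta
    r = math.copysign(
        math.sqrt(2.0 * (z - p + p * math.log(p) - p * math.log(z))), p - z)
    q = (p - z) / math.sqrt(p)
    return r, q

def loglik(theta, y, p):
    # log-likelihood p*log(theta) - y*theta, additive constants dropped
    return p * math.log(theta) - y * theta

# Direct definition: r = sgn(theta_hat - theta)*sqrt(2*(l(theta_hat)-l(theta)))
y, theta, p = 1.3, 2.0, 4.0
theta_hat = p / y
r, q = gamma_r_q(y, theta, p)
r_direct = math.copysign(
    math.sqrt(2.0 * (loglik(theta_hat, y, p) - loglik(theta, y, p))),
    theta_hat - theta)
```

The check uses y = 1.3, θ = 2, p = 4, for which z = 2.6 < p, so r > q > 0.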
3.2 An alternative third order combining formula:
the gamma combiner
The usefulness of the gamma model (3.1) in computing a p-value for a general continuous statistical model is that we can incorporate the mathematical parameter p in such a way that an inference in a general model can correspond to one in a gamma model in some sense. This correspondence provides the basis for us to use the gamma model to compute p-values in a general model. The property (3.4) of the gamma model implies that it is useful for both thin-tailed and thick-tailed general models. In fact, the p-value in a general model can be approximated by using the gamma model if we can properly choose its tails. First, one can show that for any r and q such that rq > 0: if r > q, the system

r = sgn(p - z) {2(z - p + p log p - p log z)}^{1/2},
q = (p - z)/p^{1/2},    (3.5)

has one unique solution (p, z); and if r < q, the corresponding system (3.6), with the roles of the tails switched, has one unique solution (p, z).
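A workable way to solve the system (3.5) numerically, using the reduction to the single variable w = z/p that also appears in Section 3.5 (so that r²/q² = 2(w - 1 - log w)/(w - 1)² and p = q²/(1 - w)²), is plain bisection; this is an implementation sketch, not the thesis's algorithm:

```python
import math

def solve_gamma_pair(r, q):
    """Invert the system (3.5): given r > q > 0, find (p, z) with
    r = sqrt(2*(z - p + p*log p - p*log z)) and q = (p - z)/sqrt(p).
    With w = z/p in (0, 1), r^2/q^2 = 2*(w - 1 - log w)/(w - 1)^2,
    which is solved by bisection; then p = q^2/(1 - w)^2 and z = w*p."""
    assert r > q > 0.0
    ratio = (r / q) ** 2
    F = lambda w: 2.0 * (w - 1.0 - math.log(w)) / (w - 1.0) ** 2
    lo, hi = 1e-300, 1.0 - 1e-12   # F decreases from +inf toward 1 on (0, 1)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if F(mid) > ratio:
            lo = mid
        else:
            hi = mid
    w = 0.5 * (lo + hi)
    p = q * q / (1.0 - w) ** 2
    return p, w * p

# Round trip: start from (p, z) = (4, 2.6), form (r, q), invert back.
p0, z0 = 4.0, 2.6
r0 = math.sqrt(2.0 * (z0 - p0 + p0 * math.log(p0 / z0)))
q0 = (p0 - z0) / math.sqrt(p0)
p_hat, z_hat = solve_gamma_pair(r0, q0)
```

A round trip from (p, z) = (4, 2.6) through (r, q) and back recovers the original pair.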
Now we are ready to put forward an alternative third order combining formula, which can be called the gamma combining formula.

Theorem 3.1 Assume y₁, …, y_n are observations from a general continuous statistical model f(y; θ) with good asymptotic properties as outlined in Fraser, Reid, & Wu (1999), where θ is a p-dimensional parameter. Also assume that ψ(θ) is a scalar interest parameter, r is the usual signed square root of the likelihood ratio statistic given by (1.17), and q is calculated as in (1.21). Then to third order the p-value can be approximated by the gamma tail probability (3.8) determined by G_p(z), where (p, z) is the unique solution to the system (3.5) or (3.6) according as r > q or r < q, and

G_p(z) = ∫₀^z Γ^{-1}(p) x^{p-1} e^{-x} dx    (3.9)

is the cumulative distribution function of the gamma distribution with density Γ^{-1}(p) x^{p-1} e^{-x}.
Since the system (3.5) is the same as (3.2) and (3.3), we can see that any p-value in a general model with r > q can be approximated by the p-value in a gamma model for testing the hypothesis θ = θ₀ with observation y₀, where z = y₀θ₀. For the case r < q, we need to switch the tails in some sense.
We make some comments here: if the statistical model is really a gamma model with θ being the only unknown parameter, then this combining formula gives exact p-values for testing θ. Let us look at how badly the p-values can be approximated by the combining formulas such as the Barndorff-Nielsen formula (2.2) or the Lugannani & Rice formula (2.1) when a model has a density peak near zero. Consider the gamma model

f(y; θ, β) = Γ^{-1}(β) θ (yθ)^{β-1} e^{-θy}

with β = 0.1. Table 3.1 gives the approximations to p-values for testing the hypothesis θ = θ₀ for different observed y₀. Our proposed gamma combiner gives exact p-values for this model, while neither the Barndorff-Nielsen formula nor the Lugannani & Rice formula can give good approximations, though the former does slightly better than the latter. This suggests that when a model is a gamma model or is close to a gamma model, especially when the model has a high peak near zero like this gamma model, one may need formulas like the gamma combiner to calculate p-values with the required accuracy. In fact, this is one of the advantages of the gamma approximation over other combining formulas. It can provide better approximations or even exact p-values in some important target models.
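To illustrate the comparison behind Table 3.1, the sketch below evaluates the two combining formulas against the exact gamma tail for β = 0.1; the observed value y₀ = 0.05, and the upper-tail orientation of the reported p-value, are illustrative choices made here, not values taken from the table:

```python
import math

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def gamma_cdf(p, z, terms=200):
    # lower regularized incomplete gamma via its power series
    s, term = 0.0, 1.0 / p
    for k in range(terms):
        s += term
        term *= z / (p + k + 1.0)
    return (z ** p) * math.exp(-z) * s / math.gamma(p)

# Gamma model with shape beta = 0.1, testing theta0 = 1 at observed y0.
beta, theta0, y0 = 0.1, 1.0, 0.05
z = y0 * theta0
r = math.copysign(
    math.sqrt(2.0 * (z - beta + beta * math.log(beta / z))), beta - z)
q = (beta - z) / math.sqrt(beta)

p_exact = 1.0 - gamma_cdf(beta, z)             # what the gamma combiner returns
p_LR = Phi(r) + phi(r) * (1.0 / r - 1.0 / q)   # Lugannani & Rice (2.1)
p_BN = Phi(r - math.log(r / q) / r)            # Barndorff-Nielsen (2.2)
```

With the density peak at zero, both approximations miss the exact value by a wide margin, the Barndorff-Nielsen form somewhat less badly, which is the pattern reported for Table 3.1.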
We will postpone establishing the third order property of the gamma combining formula to Section 3.5.

Table 3.1: The gamma model f(y; θ, β) = Γ^{-1}(β) θ (yθ)^{β-1} e^{-θy} with β = 0.1: approximations to the p-values for testing θ = 1 with different observed y₀.
3.3 Some numerical studies
In this section, we want to see the performance of the gamma combining formula and compare it with the Lugannani & Rice formula (1.11) and the Barndorff-Nielsen formula (1.15). As the first example, we check the performance of the gamma formula in the context of the normal distribution.
Example 3.1. Let y₁, y₂, …, y_n be a sample from N(μ, σ²) with unknown mean μ and unknown variance σ². Suppose we want to test the hypothesis H₀ : μ = 0. In this example, we know the usual t test is an exact test, so it is easy for us to compare the performance of these third order formulas.
The log likelihood function is

ℓ(μ, σ) = -n log σ - (1/(2σ²)) Σ_{i=1}^n (y_i - μ)².

It follows that the maximum likelihood estimates of μ and σ² are given by μ̂ = ȳ and σ̂² = Σ_{i=1}^n (y_i - μ̂)²/n, respectively. We also need the constrained maximum likelihood estimate of σ²: σ̂²_μ = Σ_{i=1}^n (y_i - μ)²/n. Therefore the signed likelihood ratio statistic for testing the hypothesis that μ is the true value is

r = sgn(μ̂ - μ) {n log(σ̂²_μ/σ̂²)}^{1/2},

and the standardized departure of the maximum likelihood estimate is calculated by using (1.21); the resulting quantity is

q = n^{1/2} (ȳ - μ) σ̂ / σ̂²_μ.
Then the p-value for the one-sided test of μ will be given by (1.11) or (1.15).

The exact t-test statistic is

t = (ȳ - μ)/(s/√n),

where s is the sample standard deviation defined by s² = Σ_{i=1}^n (y_i - ȳ)²/(n - 1). The t statistic follows the t distribution with n - 1 degrees of freedom.
It is not difficult to write r and q in terms of the t statistic as

r = sign(t) {n log(1 + t²/(n - 1))}^{1/2},
q = t {1 + t²/(n - 1)}^{-1} {n/(n - 1)}^{1/2}.
Since the t distribution converges to the standard normal distribution as the degrees of freedom go to infinity, and r, q, and r* all also converge to a standard normal random variable, they give very similar approximations when the sample size n is very large. Thus, to better see the difference in the performance of these methods, we choose a small sample size, say n = 3. In this case, the t distribution has tails much heavier than the standard normal distribution. Figure 3.1 below shows that the gamma combining formula and the Barndorff-Nielsen combining formula both give very good approximations to the t distribution.
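The comparison in Figure 3.1 and Table 3.2 can be reproduced in a few lines. The closed form used below for q in terms of t is a reconstruction consistent with the Student combiner equations of Chapter 4 and should be treated as an assumption, not a quotation:

```python
import math

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# n = 3 normal sample, f = n - 1 = 2 degrees of freedom, observed t = 2.
n, t = 3, 2.0
f = n - 1
r = math.copysign(math.sqrt(n * math.log1p(t * t / f)), t)
# q in terms of t: a reconstructed closed form, treated as an assumption.
q = t * math.sqrt(n / f) / (1.0 + t * t / f)

p_exact = 0.5 - t / (2.0 * math.sqrt(f + t * t))  # upper tail of Student(2)
p_r = 1.0 - Phi(r)                                 # first order only
p_BN = 1.0 - Phi(r - math.log(r / q) / r)          # Barndorff-Nielsen (1.15)
```

At n = 3 the Student(2) tail is heavy, and the first-order value Φ(r) misses it badly while the Barndorff-Nielsen combination lands close to the exact upper-tail probability, as the figure indicates.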
Our purpose here of course is not to replace the exact t-test by our proposed gamma formula or the Barndorff-Nielsen formula. Instead, we try to get a feeling for how the gamma formula behaves so we can apply it when the exact p-value is either unavailable or very hard to get.
Figure 3.1: Approximations to the t distribution by different formulas: the Barndorff-Nielsen formula, the gamma formula, and the signed likelihood ratio statistic.
Table 3.2 gives a more detailed comparison. From this table we can see that the gamma formula gives very good approximations to probability at the middle quantiles of the Student distribution.

Although the gamma formula has some numerical problems and gives slightly thinner approximations to probability at the quantiles in the far tails of the Student distribution, it does give approximations as good as those given by the Barndorff-Nielsen formula or the Lugannani & Rice formula.
The numerical problems arise when we try to solve the equation in w that determines the solution of (3.5). Its unique root is so close to zero (in fact 0 < w < 10^{-…}) if r/q > 45 that a computer will set this root equal to zero. But to get the p-value we need a non-zero root with higher accuracy. As a result, the gamma approximation should be avoided for a range of such cases.
Example 3.2. Logistic model. This is a pure location model with one observation and with an error distribution given by the logistic density; thus the model for y is

f(y; θ) = e^{y-θ}/(1 + e^{y-θ})².

For the hypothesis θ = θ₀ the signed likelihood ratio statistic is r = sgn(y - θ₀){2(ℓ(θ̂) - ℓ(θ₀))}^{1/2}
Table 3.2: Approximations to the Student(2) distribution by the Barndorff-Nielsen formula, the Lugannani & Rice formula and the gamma formula (3.8), at a range of exact probabilities and t quantiles; entries for the gamma formula in the far tails are NA.
and the standardized departure of the maximum likelihood estimate is computed from (1.21). The numerical results are shown in Table 3.3, where the approximations to p-values correspond to various exact p-values for these tests. Also p-values from the Bartlett correction are reported. The Bartlett correction factor (1.24) is E(w) ≈ 23/20 when computed from the general asymptotic expression, which involves cumulants of order up to four, in the uniform distribution (Skovgaard, 2000).
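A sketch of this logistic calculation follows; the form of q, computed in the reparameterization φ(θ) = ℓ_{;y}(θ; y⁰) from the general theory cited in Chapter 1, is a reconstruction and should be read as an assumption:

```python
import math

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Location logistic: f(y; theta) = exp(y-theta)/(1+exp(y-theta))^2,
# observed y0 = 0, hypothesis theta0 = 2 (illustrative values).
y0, theta0 = 0.0, 2.0

def loglik(theta):
    return (y0 - theta) - 2.0 * math.log1p(math.exp(y0 - theta))

theta_hat = y0          # the MLE of a pure location parameter is y0
r = math.copysign(
    math.sqrt(2.0 * (loglik(theta_hat) - loglik(theta0))), theta_hat - theta0)

# q in the reparameterization phi(theta) = d/dy log f(y; theta) at y0,
# which here is tanh((theta - y0)/2); observed information jhat = 1/2
# and phi'(theta_hat) = 1/2 (reconstruction, treated as an assumption).
phi_of = lambda th: math.tanh((th - y0) / 2.0)
q = (phi_of(theta_hat) - phi_of(theta0)) * math.sqrt(0.5) / 0.5

p_r = Phi(r)                                   # first order
p_BN = Phi(r - math.log(r / q) / r)            # Barndorff-Nielsen (1.15)
p_exact = 1.0 / (1.0 + math.exp(theta0 - y0))  # logistic cdf at y0
```

The Barndorff-Nielsen value sits much closer to the exact logistic tail probability than Φ(r), which is the pattern reported in Table 3.3.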
From this table, we know that p-values from the Barndorff-Nielsen formula, the gamma formula and the Bartlett correction are very similar, but the first two, especially the gamma formula, give much better approximations than the signed square root of the likelihood ratio statistic. Skovgaard (2000) gave a similar comparison of these methods, excluding the gamma formula.

Table 3.3: Test in the location logistic model. Approximations to p-values by different methods: the signed likelihood ratio, the Barndorff-Nielsen formula, the gamma formula and the Bartlett correction. Columns: exact p-value | p-value by r | Barndorff-Nielsen | Gamma | Bartlett.
We also compare approximations to the significance function, a function giving the p-value for different θ₀ for given data. Here we assume, without loss of generality, that the observed y is 0. The significance function approximations are plotted in Figure 3.2. This figure shows that the approximations given by the gamma formula and the Barndorff-Nielsen formula are almost the same as the exact function, while the likelihood root r gives an unsatisfactory approximation to the significance function.
3.4 Inference for the noncentral chi-squared distribution
As another example, in this section we use the gamma combiner to approximate the noncentral chi-squared distribution and compare its accuracy with that of several other methods.

The noncentral chi-squared distribution is often encountered when we want to calculate the power of tests on the mean of a multivariate normal distribution; for details, see for example Anderson (1975) and Patnaik (1949). It arises naturally from the normal distribution. More specifically, if y₁, y₂, …, y_n are independent and normally distributed with mean μ_i and variance 1, then

η² = Σ_{i=1}^n y_i²

follows the chi-square distribution with n degrees of freedom and noncentrality θ² = Σ_{i=1}^n μ_i². Usually we want to calculate its cumulative distribution function, which can be written as the series (3.10) below.
Figure 3.2: Logistic model f(y; θ) = e^{y-θ}/(1 + e^{y-θ})² with observation y = 0. Approximations to the significance function by different methods: the gamma formula, the Barndorff-Nielsen formula, and the signed likelihood ratio statistic.
G_{θ²}(η²) = Σ_{k=0}^∞ e^{-θ²/2} (θ²/2)^k / k! · Pr{χ²_{n+2k} ≤ η²},    (3.10)

where χ²_{n+2k} is chi-squared distributed with n + 2k degrees of freedom; see, for example, Johnson & Kotz (1970, p. 132) for details.
Many authors have studied this noncentral chi-squared distribution in order to avoid using the infinite series (3.10). Among them, Bol'shev and Kuznetzov (1963) obtained an approximation (3.11). Cox and Reid (1987) proposed two simpler approximations, (3.12) and (3.13). The approximation (3.12) was obtained by inverting the cumulant generating function and using an Edgeworth expansion, while (3.13) can be seen as its asymptotic expansion.
Cohen (1988) provided a procedure for evaluating a noncentral χ² distribution. This method uses a tabulation of the three lowest degrees of freedom of the noncentral chi-square distribution function, or equivalently an effective computer algorithm for their evaluation, and it requires recursive evaluation.
Wu (1999) developed a saddlepoint type of approximation. To get this approximation, first we note the density of η² (Johnson and Kotz, 1970); its moment generating function is

M(s) = (1 - 2s)^{-n/2} exp{θ²s/(1 - 2s)},

and its cumulant generating function is

K(s) = log M(s) = -(n/2) log(1 - 2s) + θ²s/(1 - 2s).

The saddlepoint density approximation is then given as (3.14), where K″(s) is the second derivative of K(s) and ŝ is the saddlepoint given by K′(ŝ) = η². Therefore the tail probability based on the Lugannani-Rice formula (2.11) is obtained with

r = sgn(ŝ) {2(ŝη² - K(ŝ))}^{1/2},  q = ŝ {K″(ŝ)}^{1/2}.
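For this cumulant generating function the saddlepoint equation K′(ŝ) = x can be solved in closed form, so the whole tail approximation fits in a few lines; the sketch below uses the standard Lugannani & Rice tail form, with illustrative names:

```python
import math

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi_dens(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def K(s, n, lam):
    # cgf of the noncentral chi-square with n df and noncentrality lam
    return -0.5 * n * math.log1p(-2.0 * s) + lam * s / (1.0 - 2.0 * s)

def saddlepoint_sf(x, n, lam):
    """Lugannani & Rice tail approximation Pr(eta^2 > x).  With
    u = 1/(1 - 2s), the saddlepoint equation K'(s) = x becomes the
    quadratic lam*u^2 + n*u = x, solved in closed form.  Breaks down
    near x = n + lam, where the saddlepoint is zero."""
    u = (-n + math.sqrt(n * n + 4.0 * lam * x)) / (2.0 * lam)
    s = 0.5 * (1.0 - 1.0 / u)
    w = math.copysign(math.sqrt(2.0 * (s * x - K(s, n, lam))), s)
    Kpp = 2.0 * n * u * u + 4.0 * lam * u ** 3   # K''(s)
    v = s * math.sqrt(Kpp)
    return 1.0 - Phi(w) + phi_dens(w) * (1.0 / v - 1.0 / w)

p_sp = saddlepoint_sf(10.0, 5, 1.0)   # n = 5, noncentrality theta^2 = 1
```

For n = 5 and θ² = 1 the approximate upper-tail probability at η² = 10 is close to the exact series value near 0.138; the zero-saddlepoint breakdown at x = n + θ² is exactly the failure region noted later in this section.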
Fraser, Wong & Wu (1998) discussed what is called the double saddlepoint approximation (Reid, 1995). This approximation uses the Lugannani & Rice formula (1.11) or the Barndorff-Nielsen formula (1.15), in which q is obtained by (1.21). The main idea is to reparameterize the problem. More specifically, let y_i = ρα_i + e_i, i = 1, 2, …, n, where α is an n-dimensional vector and e₁, e₂, …, e_n are a sample from the normal distribution with mean 0 and known variance 1. The parameter is θ = (ρ, α), with ρ taken as the parameter of interest.
The probability to the left of an observed η² for the noncentral chi-squared distribution with n degrees of freedom and noncentrality θ² is given by G_{θ²}(η²) as in (3.10). This probability can be approximated by the Barndorff-Nielsen formula (1.15) or the Lugannani & Rice formula (1.11), with third order accuracy. There the r and q are given by (3.15) and (3.16).
Alternatively, we may use the gamma combining formula (3.8) with the same r and q above. Table 3.4 and Table 3.5 below give numerical comparisons of these methods for some selected values of the distribution function of the noncentral chi-square distribution with n degrees of freedom and noncentrality θ². Part of the tables is cited here from Fraser, Wong & Wu (1998) and Wu (1999). In all these examples, the exact probabilities are obtained by the truncated series

Σ_{k=0}^N e^{-θ²/2} (θ²/2)^k / k! · Pr{χ²_{n+2k} ≤ η²},

where N is an integer chosen to make the neglected terms negligible.
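Computing the exact probabilities by truncating the series (3.10) can be sketched as follows; the chi-square distribution function is evaluated through the incomplete gamma series, and the truncation point is chosen by accumulated Poisson weight rather than a fixed N (an implementation choice, not the thesis's rule):

```python
import math

def chi2_cdf(x, k):
    # cdf of the central chi-square with k df:
    # regularized lower incomplete gamma at (k/2, x/2), by power series
    a, z = 0.5 * k, 0.5 * x
    s, term = 0.0, 1.0 / a
    for j in range(500):
        s += term
        term *= z / (a + j + 1.0)
    return (z ** a) * math.exp(-z) * s / math.gamma(a)

def ncx2_cdf(x, n, lam, tol=1e-12):
    """(3.10): G(x) = sum_k e^{-lam/2}(lam/2)^k/k! * F_{chi2_{n+2k}}(x),
    truncated once the remaining Poisson mass is below tol."""
    half = 0.5 * lam
    w = math.exp(-half)        # Poisson weight at k = 0
    total, mass, k = 0.0, 0.0, 0
    while mass < 1.0 - tol:
        total += w * chi2_cdf(x, n + 2 * k)
        mass += w
        k += 1
        w *= half / k
    return total
```

Two built-in checks: for 2 degrees of freedom chi2_cdf reduces to 1 - e^{-x/2}, and as the noncentrality goes to zero ncx2_cdf collapses to the central chi-square distribution function.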
We can see, from these two tables, that Cox & Reid's approximations (3.12) and (3.13) seem unsatisfactory: the former is good only for small probabilities and the latter is even worse; so is Bol'shev & Kuznetzov's approximation (3.11). The saddlepoint approximation (3.14) breaks down in a large neighborhood of the value of η² for which the saddlepoint is zero. In addition, its performance elsewhere is not as good as the double saddlepoint approximation using the Barndorff-Nielsen formula (1.15) or the Lugannani & Rice formula (1.11) based on r in (3.15) and q in (3.16). Generally speaking, the Barndorff-Nielsen method provides the best approximation to the noncentral chi-square distribution. The gamma combining formula (3.8) can also give very good approximations to the noncentral chi-squared distribution, especially for a medium large number of degrees of freedom and a large noncentrality. Although it may sometimes have numerical problems, as reported in Example 3.1 in the previous section, when r and q are quite different in magnitude, say when r/q > 45 or q/r > 45, it outperforms the double saddlepoint approximations in many cases.
Figure 3.3 and Figure 3.4 compare the approximations to the noncentral chi-squared distribution by different methods. From these figures we can see the performance of these methods over the whole range of the variable η². The approximations to the noncentral chi-squared distribution with degrees of freedom n = 5 and noncentrality θ² = 100 by the Barndorff-Nielsen method, the Lugannani-Rice method and the Gamma method are so close to the exact distribution that they overlap in Figure 3.3. Figure 3.4 shows that these three methods give very good approximations, though the Barndorff-Nielsen method gives the best one, for the case with n = 10 and θ² = 25.
Table 3.4: Approximations to G_{θ²}(η²) with n = 2. Methods compared: Cox & Reid (3.12), Cox & Reid (3.13), Bol'shev (3.11), Saddlepoint (3.14), Lugannani & Rice (1.11), Barndorff-Nielsen (1.15), Gamma (3.8), Exact (3.10).

Table 3.5: Approximations to G_{θ²}(η²) with n = 5. Methods compared: Cox & Reid (3.12), Cox & Reid (3.13), Bol'shev (3.11), Saddlepoint (3.14), Lugannani & Rice (1.11), Barndorff-Nielsen (1.15), Gamma (3.8), Exact (3.10).

Table 3.6: Approximations to G_{θ²}(η²) with n = 10. Methods compared: Cox & Reid (3.12), Cox & Reid (3.13), Bol'shev (3.11), Saddlepoint (3.14), Lugannani & Rice (1.11), Barndorff-Nielsen (1.15), Gamma (3.8), Exact (3.10).
Figure 3.3: Comparison of approximations to the noncentral χ² with degrees of freedom n = 5 and noncentrality θ² = 100. The two Cox & Reid methods are (3.12) and (3.13) respectively.

Figure 3.4: Comparison of approximations to the noncentral χ² with degrees of freedom n = 10 and noncentrality θ² = 25. The two Cox & Reid methods are (3.12) and (3.13) respectively.
3.5 Theoretical work on the Gamma Combiner
In this section, we give the proof of Theorem 3.1. That is, we establish the third order property of the Gamma Combiner.

Without loss of generality, we simply assume hereafter r > q > 0. Then (p, z) is the solution to the equation system

r = {2(z - p + p log p - p log z)}^{1/2},  q = (p - z)/p^{1/2}.    (3.17)

Other cases can be proved in a similar way.
We will proceed in several steps. Since we are developing asymptotic results, all the arguments here are asymptotically correct to the appropriate order. First we note from Section 2.4 and Section 2.5 that, for a general model with the asymptotic properties, r and q satisfy (3.18), where a is a certain constant determined by the model.
Lemma 3.1 Let n denote the size of the sample from which r and q are calculated. Then p → ∞ in probability as n → ∞.

Proof. We know (p, z) is the solution to the equation system (3.17). Let w = z/p; then w must satisfy F(w) - r²/q² = 0, where

F(w) = 2(w - 1 - log w)/(w - 1)²

is strictly monotonic, so that w = F^{-1}(r²/q²) → 1 as n → ∞, since r/q → 1. Therefore p = q²/(1 - w)² → ∞ as n → ∞.
Lemma 3.2 The r and q in (3.17) satisfy (3.19)-(3.21).

Proof. Expanding the r in (3.17), we obtain (3.19). It is not difficult to prove (3.20) and (3.21) based on (3.19).

By comparing (3.18) and (3.21), one obtains

Lemma 3.3 The solution p has the same order of magnitude as the sample size n.
In order to simplify the calculations in the proof of Theorem 3.1, we also need the Hermite polynomials, which are defined by

H_n(x) φ(x) = (-1)^n (d^n/dx^n) φ(x).    (3.22)

The explicit forms of the Hermite polynomials H₀(x) to H₅(x) are as follows (see for example Barndorff-Nielsen and Cox 1989):

H₀(x) = 1,  H₁(x) = x,  H₂(x) = x² - 1,  H₃(x) = x³ - 3x,
H₄(x) = x⁴ - 6x² + 3,  H₅(x) = x⁵ - 10x³ + 15x.

Hermite polynomials have many interesting properties. A property used in our proof is the relationship in the following lemma.

Lemma 3.4 For Hermite polynomials,

∫_z^∞ H_n(x) φ(x) dx = H_{n-1}(z) φ(z).    (3.23)

Proof. From definition (3.22) of the polynomials, we have

H_n(x) φ(x) = -(d/dx){H_{n-1}(x) φ(x)};    (3.24)

integrating both sides of (3.24) yields (3.23).
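The reconstructed form of Lemma 3.4 can be checked numerically; in the sketch below the polynomial recurrence and the trapezoid-rule integration are implementation choices:

```python
import math

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def hermite(n, x):
    """Probabilist's Hermite polynomial He_n(x) via the recurrence
    He_{k+1}(x) = x*He_k(x) - k*He_{k-1}(x)."""
    h0, h1 = 1.0, x
    if n == 0:
        return h0
    for k in range(1, n):
        h0, h1 = h1, x * h1 - k * h0
    return h1

def tail_integral(n, z, upper=12.0, steps=50000):
    # trapezoid rule for the integral of He_n(x)*phi(x) over [z, upper];
    # the normal density makes the tail beyond `upper` negligible
    h = (upper - z) / steps
    total = 0.5 * (hermite(n, z) * phi(z) + hermite(n, upper) * phi(upper))
    for i in range(1, steps):
        x = z + i * h
        total += hermite(n, x) * phi(x)
    return total * h

# Lemma 3.4 (reconstructed): integral_z^inf H_n(x) phi(x) dx = H_{n-1}(z) phi(z)
z, n = 1.3, 4
lhs = tail_integral(n, z)
rhs = hermite(n - 1, z) * phi(z)
```

The two sides agree to the accuracy of the quadrature, which is consistent with the integration-by-parts argument in the proof above.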
Since the Lugannani & Rice combining formula (1.11) has third order accuracy, it suffices to show that (3.25) holds. Now we are ready to prove it.
The proof of Theorem 3.1: For r > q > 0, we have (3.26). Note that (3.26) is the probability Pr{U > z}, where the random variable U follows the gamma distribution with density

f(u) = Γ^{-1}(p) u^{p-1} e^{-u},  u > 0.

We know U has mean p and variance p. As we want to approximate the probability (3.26) by the standard normal density and distribution function as in (3.25), we need to standardize the gamma variable U. Let

V = (U - p)/p^{1/2}.

Then V has mean zero and standard deviation one, and (3.27) follows, where f(v) is the density of V. We need the asymptotic expansion of this density in terms of the standard normal density φ(v). By substituting the asymptotic expansion

log Γ(p) = (p - 1/2) log p - p + (1/2) log(2π) + 1/(12p) + O(p^{-3})

(see Abramowitz and Stegun 1972, Lawless 1982) into log f(v), we obtain (3.28), where φ(·) is the density of the standard normal. After substituting (3.28) into (3.27), we have (3.29). Finally the substitution of (3.20) and (3.21) into (3.29) and some algebraic calculation yield (3.25).
Chapter 4
Approximating Tail Probabilities: the Student t Combiner
In this chapter we shall develop another third order combining formula. The idea is that, for asymptotic inference in a general statistical model, we first calculate the signed likelihood ratio statistic r and the standardized departure of the maximum likelihood estimate q. We think of this pair r and q as coming from a location Student distribution model where we test the location parameter. We then seek to determine the degrees of freedom and the corresponding quantile of the t distribution. The desired p-value is finally approximated by the distribution function of Student's t distribution, which is readily available in any standard textbook.
4.1 Why the Student t combiner?
As we know, the Barndorff-Nielsen formula and the Lugannani & Rice formula are both third order formulas for approximating tail probabilities. They give very good approximations in many situations, even when the sample size is very small. However, they give only approximate, not exact, tail probabilities even in some very simple cases. Example 3.1 gives such a case, where y₁, y₂, …, y_n are independent and identically normally distributed with unknown mean μ and unknown variance σ². We know the log likelihood function is

ℓ(μ, σ) = -n log σ - (1/(2σ²)) Σ_{i=1}^n (y_i - μ)².

Therefore the signed likelihood ratio statistic for testing the hypothesis that μ = μ₀ is the true value is

r = sgn(μ̂ - μ₀) {n log(σ̂²_{μ₀}/σ̂²)}^{1/2},

and the standardized departure of the maximum likelihood estimate can be calculated by using (1.21). Then an approximation to the p-value of the one-sided test of μ = μ₀ is p(μ₀) or 1 - p(μ₀), where p(μ₀) is given by the Lugannani-Rice formula (1.11) or the Barndorff-Nielsen formula (1.15). However, we know the Student t statistic

t = (ȳ - μ₀)/(s/√n)

gives an exact test. Student-like distributions are very common in practice, and they have thicker tailed probability densities than the normal. A question naturally arises: can we have a new combiner that can process r and q better and give a better approximation to the p-value?
For this, it is not difficult to write r and q in terms of the t statistic and degrees of freedom f, where f = n - 1, as

r = sign(t) {(f + 1) log(1 + t²/f)}^{1/2},    (4.1)
q = t {1 + t²/f}^{-1} {(f + 1)/f}^{1/2}.    (4.2)

A natural solution would be, for a Student-like statistical model, to calculate r and q in the usual way, then solve (4.1) and (4.2) for t and f; the desired p-value is obtained as H_f(t), where H_f denotes the cumulative distribution function of the Student t distribution with f degrees of freedom.

Of course, in order for the equation system consisting of (4.1) and (4.2) to have a solution (t, f), it is not difficult to show that r and q should satisfy r/q > 1. In other cases, the method provided here should be avoided.
4.2 Inference in the location Student model

Suppose y comes from a location Student t distribution with f degrees of freedom. That is,

y = μ + e,

where e follows the Student distribution with f degrees of freedom. Therefore y has the density function f(y; μ) = g(y - μ), where g(t) is the density of the Student distribution with f degrees of freedom:

g(t) = Γ((f + 1)/2) / {Γ(f/2) (fπ)^{1/2}} · (1 + t²/f)^{-(f+1)/2}.
If we want to test the hypothesis μ = μ₀, we proceed as follows. The log likelihood function is

ℓ(μ) = -((f + 1)/2) log{1 + (y - μ)²/f},

and from this we obtain the score function and the derivative of the score function. Therefore the maximum likelihood estimate of μ is μ̂ = y, and the observed Fisher information is ĵ_{μμ} = (f + 1)/f. Working in the corresponding new parameterization, we obtain the departure (4.5), and it is not difficult to get the signed likelihood ratio statistic (4.6). Therefore, comparing (4.6) and (4.5) with (4.1) and (4.2), we know the Student combiner gives an exact p-value for testing the hypothesis H₀: μ = μ₀ in the location Student t distribution model.
4.3 The Student t combiner
In the previous two sections, we discussed two situations. In the first situation we test μ with the normal distribution N(μ, σ²) and in the second we test μ with the Student distribution centered at μ. Neither the Barndorff-Nielsen formula nor the Lugannani & Rice formula can process r and q to give exact p-values for these two important models. Both models have thicker tails than the normal distribution. We propose what can be called the Student t combiner to target similar models like these.

As we suggested in Section 4.1, to get p-values for statistical models with thick tails, that is, the ones with r/q > 1 where r and q are calculated in Step 1 below, we use the following procedure:

Step 1. Use (1.17) and (1.21) to calculate r and q as usual;

Step 2. Solve (4.1) and (4.2) for f and t;

Step 3. Get the p-value H_f(t), where H_f is the cumulative distribution function of the Student t distribution with f degrees of freedom.
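The three steps can be sketched as follows, under the closed forms for (4.1) and (4.2) assumed earlier in this chapter (reconstructions, not quotations); bisection on f, with t forced by the r-equation at each trial f, implements Step 2:

```python
import math

def student_r_q(t, f):
    """(4.1) and (4.2) in the reconstructed forms assumed here: the
    r and q produced by a location Student_f model at departure t."""
    r = math.copysign(math.sqrt((f + 1.0) * math.log1p(t * t / f)), t)
    q = t * math.sqrt((f + 1.0) / f) / (1.0 + t * t / f)
    return r, q

def student_combiner(r, q):
    """Given r > q > 0 (the thick-tailed case r/q > 1), recover (t, f)
    by bisecting on f: each trial f forces t through the r-equation,
    and f is adjusted until the q-equation matches.  Relies on the
    uniqueness of the solution claimed for the system."""
    assert r > q > 0.0
    def t_from_r(f):
        return math.sqrt(f * math.expm1(r * r / (f + 1.0)))
    def gap(f):
        return student_r_q(t_from_r(f), f)[1] - q
    f_lo, f_hi = 0.05, 1.0e7       # gap < 0 at f_lo and > 0 at f_hi
    for _ in range(200):
        f_mid = math.sqrt(f_lo * f_hi)   # bisect on the log scale
        if gap(f_mid) < 0.0:
            f_lo = f_mid
        else:
            f_hi = f_mid
    f = math.sqrt(f_lo * f_hi)
    return t_from_r(f), f

# Round trip: Student(2) at t = 2 gives (r, q); the combiner recovers them.
r0, q0 = student_r_q(2.0, 2.0)
t_hat, f_hat = student_combiner(r0, q0)
# Step 3 would report H_f(t); for f = 2, H_2(t) = 1/2 + t/(2*sqrt(2 + t^2)).
```

For data generated by these equations the combiner recovers (t, f) exactly, so H_f(t) reproduces the exact Student p-value, consistent with the exactness argument of Section 4.2.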
The idea for establishing the third order property of the Student combiner is similar to the one used in the proof for the Gamma combiner. However, in this case we need the condition 1/r - 1/q = O(n^{-1}), which basically requires that the model in question be symmetric in some sense. This condition comes naturally if we consider the symmetry of the Student distribution, though it may not be trivial at all. However, if a model is not far from this requirement, this procedure might still apply; we need numerical simulations to confirm this, and the order of this procedure remains an open issue.

As we have seen in Section 4.1 and Section 4.2, in normal distribution sampling the Student combiner will give an exact p-value for testing the location μ. Then, as suggested by Prof. John Tukey, this approximation would be preferable for location scale analysis where the error distribution could be at or near the normal form; for some related discussion see Fraser, Wong and Wu (1999).
Chapter 5
Conclusion
A bridging method is developed to calculate the p-value for values of the interest parameter in the neighborhood of the maximum likelihood estimate, and it is rigorously established by using likelihood asymptotics. Its importance can be found in simulations and hypothesis testing. Inaccuracy in calculating the p-values for this small neighborhood may not only dramatically distort the behavior of the statistic in question in simulations, and thus lead to incorrect conclusions about the statistic, but may also lead to an incorrect rejection of a hypothesis in which the tested value is very close to the maximum likelihood value. This method is a downstream version of the issue addressed by Daniels (1987), who defined the p-value only at the maximum likelihood value, in the sense that our bridging method defines p-values for the neighborhood and the p-value at the maximum value coincides with the one defined by Daniels.
Alternative combining formulas, the Gamma Combiner and the Student Combiner, are also developed in this thesis. One of the advantages of the Gamma Combiner is that it gives exact p-values for gamma-type models. It also targets other similar models. Numerical studies indicate its good performance. Correspondingly, the Student combiner gives exact p-values in normal distribution sampling models, while it is also preferable for location scale analysis where the error distribution could be at or near the normal form.

The Lugannani & Rice formula has the problem of producing p-values outside the acceptable [0, 1] range. Neither the Gamma combining formula nor the Student combining formula has this kind of discontinuity at the extremes. This is another advantage of these two combining formulas.
Bibliography
[1] Abebe, F., Cakmak, S., Cheah, P.K., Fraser, D.A.S., Kuhn, J., and Reid, N. (1995). Third order asymptotic model: exponential and location type approximations. Parisankhyan Samikkha 2, 25-35.

[2] Anderson, T.W. (1975). An Introduction to Multivariate Statistical Analysis. New York: Wiley.

[3] Andrews, D.F., Fraser, D.A.S. and Wong, A. (2000). Higher order Laplace integration and the hyperaccuracy of recent likelihood methods. Submitted, J. Amer. Statist. Assoc.

[4] Barndorff-Nielsen, O.E. (1980). Conditionality resolutions. Biometrika 67, 293-310.

[5] Barndorff-Nielsen, O.E. (1983). On a formula for the distribution of the maximum likelihood estimator. Biometrika 70, 343-365.

[6] Barndorff-Nielsen, O.E. (1985). Confidence limits from c|ĵ|^{1/2}L̄ in the single parameter case. Scand. J. Statist. 12, 83-87.

[7] Barndorff-Nielsen, O.E. (1986). Inference on full or partial parameters based on the standardized signed log likelihood ratio. Biometrika 73, 307-322.

[8] Barndorff-Nielsen, O.E. (1988). Discussion of "Saddlepoint methods and statistical inference" by N. Reid. Statistical Science 3, 228-229.

[9] Barndorff-Nielsen, O.E. (1990). Approximate interval probabilities. J. R. Statist. Soc. B 52, 485-496.

[10] Barndorff-Nielsen, O.E. (1991). Modified signed log likelihood ratio. Biometrika 78, 557-563.

[11] Barndorff-Nielsen, O.E. (1994). Adjusted versions of profile likelihood and directed likelihood, and extended likelihood. J. R. Statist. Soc. B 56, 125-140.

[12] Barndorff-Nielsen, O.E. & Chamberlin, S.R. (1991). An ancillary invariant modification of the signed log likelihood ratio. Scand. J. Statist. 18, 341-352.

[13] Barndorff-Nielsen, O.E. & Chamberlin, S.R. (1994). Stable and invariant adjusted directed likelihoods. Biometrika 81, 485-494.

[14] Barndorff-Nielsen, O.E. & Cox, D.R. (1979). Edgeworth and saddlepoint approximations with statistical applications. J. R. Statist. Soc. B 41, 279-312.

[15] Barndorff-Nielsen, O.E. & Cox, D.R. (1989). Asymptotic Techniques for Use in Statistics. London: Chapman and Hall.

[16] Barndorff-Nielsen, O.E. & Cox, D.R. (1994). Inference and Asymptotics. London: Chapman and Hall.

[17] Barndorff-Nielsen, O.E. & Wood, A.T.A. (1998). On large deviations and choice of ancillary for p* and r*. Bernoulli 4, 35-63.

[18] Bartlett, M.S. (1937). Properties of sufficiency and statistical tests. Proc. Roy. Soc. London Ser. A 160, 268-282.
[19] C h & , S., McDunnough, P., Reid, N. & Yuan, X. (1998). Likelihood centered
asymptotic model: exponential and location model versions. J. Statist. Plan.
Inf. 66, 211-22.
(201 Cheah, P.K., F'raser, D.A.S. & Reid, N. (1995). Adjustment to Iikelihood and
densities; caiculating significance. J. Statist. Res, 29, 1-13.
[21] Cohen, J.D. (1988). Noncentral chi-square: some observations on recurrence.
The American Statistician 42, 12c122.
[22] Cm, D.R. & Reid, N. (1987). Approximations to noncentrd distributions.
Canadian Journal of Statistics 15, 105-114.
[23] Cramer, H. (1938). Sur un nouveau theoreme-limite des probabiiities. Actualités
Sci. Indust. 736, 5-23.
[24] Daniels, H.E. (1954). Saddlepoint approximations in statistia. Ann. Math.
Statist. 25, 631-650.
[25] Daniels, H.E. (1987). Tai1 probabity approximations. Internationd Statistical
Review 55, 37-48-
[26] DiCiccio, T. (1986). PLppraximate conditionai inference for location families.
Canad. J. StatZst. 14, 1-18.
[27] DiCiccio, T., Field, C. & Fraser, D.A.S. (1990). Marginal tail probabilities and inference for real parameters. Biometrika 77, 65-76.
[28] DiCiccio, T. & Martin, M.A. (1991). Approximations of marginal tail probabilities for a class of smooth functions with applications to Bayesian and conditional inference. Biometrika 78, 891-902.
[29] DiCiccio, T. & Martin, M.A. (1993). Simple modifications for signed roots of likelihood ratio statistics. J. R. Statist. Soc. B 55, 305-18.
[30] DiCiccio, T. & Stern, S.E. (1993). On Bartlett adjustments for approximate Bayesian inference. Biometrika 80, 731-740.
[31] Durbin, J. (1980). Approximations for densities of sufficient estimators. Biometrika 67, 311-333.
[32] Efron, B. (1981). Nonparametric standard errors and confidence intervals (with discussion). Canad. J. Statist. 9, 139-172.
[33] Esscher, F. (1932). The probability function in the collective theory of risk. Skand. Aktuarietidskr. 15, 175-195.
[34] Fisher, R.A. (1922). On the mathematical foundations of theoretical statistics. Phil. Trans. Roy. Soc. London Ser. A 222, 309-368.
[35] Fisher, R.A. (1925). Theory of statistical estimation. Proc. Cambridge Philos. Soc. 22, 700-725.
[36] Fraser, D.A.S. (1964). Local conditional sufficiency. J. R. Statist. Soc. B 26, 52-62.
[37] Fraser, D.A.S. (1990). Tail probabilities from observed likelihood. Biometrika 77, 67-76.
[38] Fraser, D.A.S. & Reid, N. (1993). Simple asymptotic connections between density and cumulant functions leading to accurate approximations for distribution functions. Statist. Sinica 3, 67-82.
[39] Fraser, D.A.S. & Reid, N. (1995). Ancillaries and third-order significance. Utilitas Mathematica 47, 33-53.
[40] Fraser, D.A.S. & Reid, N. (1998). Ancillary information for statistical inference. Proceedings of a Symposium on Empirical Bayes and Likelihood Inference. New York: Springer-Verlag, to appear.
[41] Fraser, D.A.S. & Reid, N. (2000). Ancillary information for statistical inference. Proceedings of the Conference on Empirical Bayes and Likelihood, Eds: E. Ahmed and N. Reid. Springer-Verlag, to appear.
[42] Fraser, D.A.S., Reid, N. & Wong (1991). Exponential linear models: a two-pass procedure for saddlepoint approximation. J. Roy. Statist. Soc. B 53, 483-492.
[43] Fraser, D.A.S., Reid, N. & Wu, J. (1999). A simple general formula for tail probabilities for frequentist and Bayesian inference. Biometrika 86, 249-264.
[44] Fraser, D.A.S., Wong, A.C.M. & Wu, J. (1998). An approximation for the noncentral chi-square distribution. Commun. Statist. - Simula. 27(2), 279-287.
[45] Hinkley, D.V. (1977). Conditional inference about a normal mean with known coefficient of variation. Biometrika 64, 105-8.
[46] Lawley, D.N. (1956). A general method for approximating to the distribution of the likelihood ratio criteria. Biometrika 43, 295-303.
[47] Jensen, J.L. (1992). The modified signed likelihood statistic and saddlepoint approximations. Biometrika 79, 693-704.
[48] Johnson, N.L. & Kotz, S. (1970). Distributions in Statistics: Continuous Univariate Distributions. New York: Houghton Mifflin.
[49] Lieblein, J. & Zelen, M. (1956). Statistical investigation of the fatigue life of deep-groove ball bearings. J. Research, National Bureau of Standards 57, 273-316.
[50] Lugannani, R. & Rice, S. (1980). Saddlepoint approximation for the distribution function of the sum of independent variables. Adv. Appl. Prob. 12, 475-90.
[51] Pace, L. & Salvan, A. (1997). Principles of Statistical Inference from a Neo-Fisherian Perspective. Singapore: World Scientific Publishing Co.
[52] Patnaik, P.B. (1949). The noncentral χ2 and F distributions and their applications. Biometrika 36, 202-232.
[53] Pierce, D.A. & Peters, D. (1992). Practical use of higher-order asymptotics for multiparameter exponential families (with Discussion). J. R. Statist. Soc. B 54,
[54] Pierce, D.A. & Peters, D. (1994). Higher order asymptotics and the likelihood principle: One parameter models. Biometrika 81, 1-10.
[55] Posten, H.O. (1989). An effective algorithm for the noncentral chi-square distribution function. The American Statistician 43, 261-263.
[56] Reid, N. (1988). Saddlepoint methods and statistical inference (with discussion). Statist. Sci. 3, 213-238.
[57] Reid, N. (1995). The roles of conditioning in inference. Statistical Science 10, 138-157.
[58] Reid, N. (1996). Likelihood and higher-order approximations to tail areas: A review and annotated bibliography. Canad. J. Statist. 24, 141-66.
[59] Skovgaard, I.M. (1986). Successive improvements of the order of ancillarity.
Biometrika 73, 516-19.
[60] Skovgaard, I.M. (1996). An explicit large-deviation approximation to one parameter tests. Bernoulli 2, 145-65.
[61] Skovgaard, I.M. (2000). Likelihood Asymptotics. Scand. J. Statist., to appear.
[62] Vangel, M.G. (1996). Confidence intervals for a normal coefficient of variation. Amer. Statist. 50, 21-26.
[63] Wald, A. (1941). Asymptotically most powerful tests of statistical hypotheses. Ann. Math. Statist. 12, 1-19.
[64] Wald, A. (1943). Tests of statistical hypotheses concerning several parameters when the number of observations is large. Trans. Amer. Math. Soc. 54, 426-482.
[65] Wang, S. & Gray, H.L. (1993). Approximating tail probabilities of noncentral distributions. Computational Statistics and Data Analysis 15, 343-352.
[66] Wilks, S.S. (1938). The large-sample distribution of the likelihood ratio for testing composite hypotheses. Ann. Math. Statist. 9, 60-62.
[67] Wu, J. (1999). Asymptotic Likelihood Inference. Ph.D. Thesis, Department of Statistics, University of Toronto.