ON ASYMPTOTIC LIKELIHOOD INFERENCE: REMOVING P-VALUE SINGULARITIES
Rongcai Li
A thesis submitted in conformity with the requirements
for the Degree of Doctor of Philosophy
Graduate Department of Statistics
University of Toronto
© Copyright by Rongcai Li 2001
National Library of Canada / Bibliothèque nationale du Canada
Acquisitions and Bibliographic Services

The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.

The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
On Asymptotic Likelihood Inference: Removing P-value Singularities
Rongcai Li, PhD 2001
Department of Statistics
University of Toronto
Abstract
Recent likelihood asymptotics initiated by Barndorff-Nielsen (1986) and Lugannani & Rice (1980) has developed two combining formulas, often called third order formulas, for computing p-values and confidence coefficients with high accuracy. Fraser & Reid (1995) and many others have extended these results to very general contexts and extensively explored their applications to various statistical problems. These two formulas, however, have certain singularities near the maximum likelihood value. In this thesis we develop a theory for removing the singularities in a general statistical context using the tangent exponential model developed by Cakmak et al. (1998) and Abebe et al. (1995). In doing so, the asymptotic expansions of the signed likelihood ratio statistic and the standardized measure of departure are first obtained in terms of the standardized third and fourth derivatives of the log density and are then used to form a bridge for the p-value functions in the neighborhood of the maximum likelihood value. The concept of a nonnormality measure is also developed and its implications for the singularity problems are discussed. In addition, its expressions for two types of tangent exponential model are related.

We have also developed several alternative combining formulas for obtaining highly accurate p-values and confidence intervals. Unlike the Lugannani & Rice formula, these formulas are generally continuous at the extremes. Numerical studies are used to determine and compare the accuracy and reliability of the different combining formulas. These studies show that these new formulas outperform the existing ones in many cases.
Acknowledgement
First and foremost, I would like to express my sincere gratitude to my supervisor, Professor D.A.S. Fraser, for his guidance, encouragement, inspiration and patience through the period of my Ph.D. program here at the University of Toronto. Without his invaluable supervision and support this thesis could not have been completed.

I am grateful to Professor Nancy Reid for her advice and support through my PhD study.

I also highly appreciate the kind assistance from Professor A.C.M. Wong. My conversations with him have been so helpful.

The financial support provided by the Ontario Government and the Department of Statistics of the University of Toronto is acknowledged and is highly appreciated.

Last but not least, I would like to thank my parents and wife for their love and continuing support through my PhD study. This thesis is dedicated to them and to my lovely daughter Jiefei and son Ben.
Contents
1 A Review of Theory on Asymptotic Inference
1.1 Likelihood and first order asymptotic theory
1.2 Saddlepoint approximations
1.3 The p*-formula
1.4 Tail probability approximations
1.5 Overview and Outline
2 Likelihood Asymptotics: On Removing Singularities
2.1 Introduction
2.2 Asymptotic exponential model
2.3 Departures from standard normality: scalar case
2.4 Bridging the singularity: scalar case
2.5 Bridging the singularity: scalar interest
2.6 Graphical bridging of the singularity
2.7 Interrelating two nonnormality measures
2.8 Discontinuity at the extremes
3 Approximating Tail Probabilities: the Gamma Combiner
3.1 Inference in the gamma model
3.2 An alternative third order combining formula: the gamma combiner
3.3 Some numerical studies
3.4 Inference for the noncentral chi-squared distribution
3.5 Theoretical work on the Gamma Combiner
4 Approximating Tail Probabilities: the Student t Combiner
4.1 Why the Student t combiner?
4.2 Inference in the location Student model
4.3 The Student t combiner
5 Conclusion
Bibliography
List of Figures
2.1 The gamma model Γ⁻¹(θ)y^{θ−1}e^{−y} with y⁰ = 10. The asymptotic approximations Φ_LR(θ) and Φ_BN(θ) for the p-value function p(θ) for testing θ are plotted against θ. The bridge p_B(θ) at the maximum likelihood value is superimposed on the exact p(θ).
2.2 For the gamma model with mean μ and shape β, the p-value approximations Φ_LR(μ) and Φ_BN(μ) for testing μ are plotted for a sample of 20. The aberrant behavior at the maximum likelihood value is successfully bridged using (2.32) together with a graphical d₂ determined from Figure 2.3.
2.3 For the gamma model and data for Figure 2.2, the departure measures d₁ and d₂ are plotted against the signed likelihood ratio r and a bridging straight line is obtained graphically. The measures d₁ and d₂ are so close that they overlap in this figure.
The approximations Φ_LR and Φ_BN for the distribution function of the noncentral chi-squared distribution with 5 degrees of freedom and noncentrality 1. The compression modification Φ_C avoids the negative values found with the Φ_LR approximation.
Approximations to the t distribution by different formulas: the Barndorff-Nielsen formula, the gamma formula, and the signed likelihood ratio statistic.
Logistic model f(y; θ) = e^{y−θ}/(1 + e^{y−θ})² with observation y = 0. Approximations to the significance function by different methods: the gamma formula, the Barndorff-Nielsen formula, and the signed likelihood ratio statistic.
Comparison of approximations to the noncentral χ² with degrees of freedom n = 5 and noncentrality 100. The two Cox & Reid methods are (3.12) and (3.13) respectively.
Comparison of approximations to the noncentral χ² with degrees of freedom n = 10 and noncentrality 25. The two Cox & Reid methods are (3.12) and (3.13) respectively.
List of Tables
3.1 The gamma model f(y; θ, β) with shape β = 0.1. The approximations to the p-values for testing θ = 1 with different observed y⁰.
3.2 Approximations to the Student(2) distribution by the Barndorff-Nielsen formula, the Lugannani & Rice formula and the gamma formula.
3.3 Test in the location logistic model. Approximations to p-values by different methods: the signed likelihood ratio, the Barndorff-Nielsen formula, the gamma formula and the Bartlett correction.
3.4 Approximations to G_{n,δ}(ψ) with n = 2
3.5 Approximations to G_{n,δ}(ψ) with n = 5
3.6 Approximations to G_{n,δ}(ψ) with n = 10
Chapter 1
A Review of Theory on Asymptotic Inference

Parametric statistics is a very important and active branch of statistics. It deals with distribution problems in situations where previous knowledge and conjecture provide us with the density form of the distribution except that some parameters in the distribution are unknown. We are typically interested in estimating parameters and then testing hypotheses on some or all the parameters or obtaining confidence coefficients. In some cases, these problems have a simple exact solution. But in other cases either they do not have an exact solution or the exact solution is too complicated to be used directly. These situations seem to necessitate the use of asymptotic methods. In this chapter we review some background in this area.
1.1 Likelihood and first order asymptotic theory

The study of parametric statistics based on the likelihood function was initiated by Fisher (1922, 1925). The likelihood is a function of the parameter as determined by the data. It can be viewed as summarizing all the information in the data about the parameter in a simple form. One of the various reasons for its current popularity is that it can provide simple and general approximations for sampling distributions of likelihood based quantities. These approximations are associated with various large sample limit results. The simplest primary result is called the Central Limit Theorem, which states that the mean of a sample of n independent and identically distributed variables is asymptotically normally distributed as the sample size n goes to infinity. In more general contexts Central Limit Theorem type results hold as long as the quantity of information supplied by the sample tends to infinity. Increasing the sample size is just one simple way of increasing the quantity of information. Based on these limit theorems, we may justify first order asymptotic theory, which we can summarize as follows.

Assume that y₁, y₂, ..., yₙ is a sample from a statistical model {f(y; θ) : θ ∈ Ω ⊆ R^p} where y is a scalar and θ is a p-dimensional parameter vector. Let ℓ(θ) = ℓ(θ; y₁, ..., yₙ) = Σᵢ₌₁ⁿ log f(yᵢ; θ) denote the log likelihood function and θ̂ = θ̂(y₁, ..., yₙ) denote the maximum likelihood estimate of the full parameter vector. To test the hypothesis

    H₀: θ = θ₀,

one familiar statistic is Wilks' log likelihood ratio statistic

    w(θ₀) = 2{ℓ(θ̂) − ℓ(θ₀)},

which converges in distribution under the hypothesis to the central chi-square distribution with p degrees of freedom, where p is the dimension of θ. See Wilks (1938) for more details.
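As a small numerical illustration, w can be computed directly from the likelihood. The sketch below assumes a hypothetical exponential model with rate θ (the data values and the tested θ₀ are made up for illustration) and computes w(θ₀) together with the corresponding first order p-value from the chi-square distribution with one degree of freedom:

```python
import math

def loglik(theta, data):
    # Exponential(rate theta) log likelihood: n log(theta) - theta * sum(y)
    return len(data) * math.log(theta) - theta * sum(data)

def wilks_w(theta0, data):
    # w(theta0) = 2{l(theta_hat) - l(theta0)}, with MLE theta_hat = n / sum(y)
    theta_hat = len(data) / sum(data)
    return 2.0 * (loglik(theta_hat, data) - loglik(theta0, data))

data = [0.8, 1.3, 0.4, 2.1, 0.9, 1.6, 0.7, 1.2]   # illustrative sample
w = wilks_w(1.0, data)
# For p = 1, Pr(chi2_1 > w) = 2{1 - Phi(sqrt(w))} = 1 - erf(sqrt(w/2))
p_value = 1.0 - math.erf(math.sqrt(w / 2.0))
```

By construction w is nonnegative and vanishes at θ₀ = θ̂; the p-value here is first order accurate only.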
For the same situation, two other statistics are often used, the score statistic and the Wald statistic, which are defined respectively as

    S(θ₀) = U(θ₀)ᵀ i⁻¹(θ₀) U(θ₀),    W(θ₀) = (θ̂ − θ₀)ᵀ i(θ̂) (θ̂ − θ₀),

where U(θ) = ∂ℓ(θ)/∂θ is the score function and i(θ) = E{U(θ)U(θ)ᵀ} is the (expected) Fisher information. These two statistics also converge in distribution to the same chi-square distribution; see Wald (1941, 1943). Both the Wilks and score statistics are invariant to reparameterization of the model and the hypothesis while the Wald statistic is not.

Several asymptotically equivalent versions of these statistics are available due to the asymptotic equivalence of the expected Fisher information and the observed Fisher information j = j(θ̂) = −∂²ℓ(θ)/∂θ∂θᵀ|_{θ=θ̂}: that is, i(θ) in the above expressions can be replaced by j, and the resulting statistics follow the same chi-square distribution.

If θ is a scalar parameter, a square root version of these statistics is sometimes suggested:

    r = sgn(θ̂ − θ₀) w^{1/2}(θ₀),    q = (θ̂ − θ₀) j^{1/2}(θ̂),

where q is the standardized maximum likelihood estimate. It is easily seen that these are asymptotically standard normal.
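To make the scalar versions concrete, the following sketch computes the likelihood root r and the standardized departure q for a hypothetical exponential-rate model (sample size, sum of observations and tested value are all made up), using the observed information j(θ) = n/θ² for that model; in a well-behaved problem the two are of comparable size:

```python
import math

n, s = 10, 12.5              # hypothetical sample size and sum of observations
theta_hat = n / s            # MLE of the exponential rate
theta0 = 1.0                 # tested value

def loglik(theta):
    return n * math.log(theta) - theta * s

w = 2.0 * (loglik(theta_hat) - loglik(theta0))
r = math.copysign(math.sqrt(w), theta_hat - theta0)   # signed root of w
j_hat = n / theta_hat**2                              # observed information at the MLE
q = (theta_hat - theta0) * math.sqrt(j_hat)           # standardized MLE departure
```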
If θ is a vector parameter and we are interested in testing one of its scalar components, the full parameter can sometimes be partitioned as θᵀ = (ψ, λᵀ), where ψ = ψ(θ) is the parameter of interest. In such cases the profile log likelihood function and the constrained maximum likelihood estimate are typically used to express the first order statistics:

    r = sgn(ψ̂ − ψ) [2{ℓ_p(ψ̂) − ℓ_p(ψ)}]^{1/2},    q = (ψ̂ − ψ) / {i^{ψψ}(θ̂)}^{1/2},

where λ̂_ψ is the constrained maximum likelihood estimate obtained by maximizing ℓ(ψ, λ) for fixed ψ, ℓ_p(ψ) = ℓ(ψ, λ̂_ψ) is the profile log likelihood, and i^{ψψ} is the component of the inverse of i(θ) corresponding to the ψ component. For more details see Barndorff-Nielsen and Cox (1994).
These statistics sometimes give fairly similar approximations for practical use. However, it should be kept in mind that they are meant for situations where we have a sufficiently large number of observations from a fairly simple model.

In other situations, they can give very different results for the same problem. In such cases the likelihood ratio statistic is often more reliable than the other two statistics. Such situations suggest the search for improvements to these first order results. In fact, in the past several decades, many efforts have been directed towards improving these results, an area often called higher order approximation theory. The theory not only gives us accurate results in more complicated situations but also gives us confidence in applying the first order theory in simple situations after it is confirmed by the higher order theory (Pierce & Peters 1992, 1994). The higher order theory is the focus of the next three sections.
1.2 Saddlepoint approximations

In order to get a better approximation in likelihood inference, an asymptotic expansion for the density of a statistic of interest is obtained so that the first two or three terms provide an approximation to the density function. It is not always obvious that this approximation will give a better finite-sample approximation to the true density than that based on the first term alone, but in many cases it turns out to be surprisingly accurate.

Let Y₁, Y₂, ..., Yₙ be independent, identically distributed random vectors from a density f(y) on R^p. The moment generating function is denoted by

    M(θ) = E exp(θᵀY)

and the cumulant generating function by

    K(θ) = log M(θ).

The saddlepoint expansion of the density of the mean vector, Ȳ = n⁻¹ Σ Yᵢ, is given by

    f_Ȳ(ȳ) = (n/2π)^{p/2} |K''(θ̂)|^{−1/2} exp[n{K(θ̂) − θ̂ᵀȳ}] {1 + Rₙ}.    (1.1)

The right-hand side of (1.1), excluding the error term, is called the saddlepoint approximation to the density of Ȳ. The value θ̂ = θ̂(ȳ) is called the saddlepoint and is the solution to the saddlepoint equation

    K'(θ̂) = ȳ,

where K'(θ) = ∂K/∂θ and K''(θ) = ∂²K/∂θ∂θᵀ, and |K''(θ)| is the determinant of K''(θ). The remainder Rₙ has an expansion in powers of n⁻¹. The expansion (1.1) was first derived in Daniels (1954) using two different approaches. One approach uses "exponential tilting" (Efron, 1981). It first embeds f(y) in a conjugate exponential family

    f_θ(y) = exp{θᵀy − K(θ)} f(y).

The Edgeworth method is then applied to obtain an expansion for f_Ȳ(ȳ). This technique is attributed to Cramér (1938) and Esscher (1932). The other approach, from which the approximation takes its name, uses the saddlepoint technique from applied mathematics. The density of Ȳ is first expressed as the Fourier inversion integral of its characteristic function and then a Taylor expansion method is used to get the saddlepoint approximation. An excellent review of this was given by Reid (1988). Daniels (1987) also gives a detailed discussion.

For further discussion of the vector form of the saddlepoint expansion, see Barndorff-Nielsen and Cox (1989).

Even though introduced by Daniels in 1954, the saddlepoint approximation was not widely acknowledged until a paper by Barndorff-Nielsen and Cox (1979) discussed it for general statistical application.
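For a quick numerical check of (1.1) in the scalar case, one can take Yᵢ ~ Exponential(1), so that K(s) = −log(1 − s), the saddlepoint is ŝ = 1 − 1/ȳ, and the exact density of Ȳ is Gamma(n, rate n). The sketch below (with a made-up n and a few ȳ values) exhibits the well-known behavior for the gamma case: the relative error of the saddlepoint approximation is constant in ȳ, so renormalization makes the approximation exact:

```python
import math

def saddlepoint_density(ybar, n):
    # K(s) = -log(1 - s); saddlepoint equation K'(s) = 1/(1 - s) = ybar
    s_hat = 1.0 - 1.0 / ybar
    K = -math.log(1.0 - s_hat)            # equals log(ybar)
    K2 = 1.0 / (1.0 - s_hat) ** 2         # K''(s_hat) = ybar**2
    return math.sqrt(n / (2.0 * math.pi * K2)) * math.exp(n * (K - s_hat * ybar))

def exact_density(ybar, n):
    # Ybar ~ Gamma(shape n, rate n)
    return n**n * ybar**(n - 1) * math.exp(-n * ybar) / math.gamma(n)

n = 5
ratios = [saddlepoint_density(y, n) / exact_density(y, n) for y in (0.5, 1.0, 2.0)]
```

The common ratio is the Stirling-series factor 1 + 1/(12n) + ..., independent of ȳ.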
1.3 The p*-formula

In the previous section it was mentioned that the saddlepoint approximation is surprisingly accurate in many contexts. The exponential family provides one of these contexts, and in this case the approximation can be reformulated entirely in terms of the likelihood function.

Let Y₁, Y₂, ..., Yₙ be independent and identically distributed random vectors with density

    f(y; θ) = exp{θᵀy − k(θ)} h(y)

with θ ∈ Ω ⊆ R^p. Denote the sample mean by Ȳ = n⁻¹ Σ Yᵢ; then the log-likelihood can be written as

    ℓ(θ) = nθᵀȳ − nk(θ).

It is easily seen that Ȳ is a minimal sufficient statistic for θ. The maximum likelihood estimate of θ, denoted by θ̂, is defined by k'(θ̂) = ȳ. Note that the transformation from ȳ to θ̂ is one-to-one, so we can rewrite the log likelihood as ℓ(θ; θ̂).

Note that the cumulant generating function of Y is K(t) = k(θ + t) − k(θ). Then the saddlepoint approximation, given by (1.1), leads to the following approximation to the density of θ̂:

    f(θ̂; θ) = (2π)^{−p/2} |j(θ̂)|^{1/2} exp{ℓ(θ; θ̂) − ℓ(θ̂; θ̂)} {1 + O(n⁻¹)},

where j(θ̂) = −∂²ℓ(θ)/∂θ∂θᵀ|_{θ=θ̂} is the observed Fisher information, in this case equal to nk''(θ̂). If we replace (2π)^{−p/2} by a normalizing constant

    c(θ) = [ ∫ |j(θ̂)|^{1/2} exp{ℓ(θ; θ̂) − ℓ(θ̂; θ̂)} dθ̂ ]⁻¹,    (1.4)

the resulting approximation is

    f(θ̂; θ) = c(θ) |j(θ̂)|^{1/2} exp{ℓ(θ; θ̂) − ℓ(θ̂; θ̂)},    (1.5)

and the relative error of (1.5) will be reduced to O(n^{−3/2}) (Durbin, 1980). The right-hand side of (1.5) is a special case of a more general formula, which has come to be known as Barndorff-Nielsen's approximation or the p*-formula, as Barndorff-Nielsen (1980, 1983) showed that (1.5) holds outside the exponential family and also investigated its application there extensively. The same formula continues to provide an approximation to the conditional distribution of the maximum likelihood estimate, conditional on an approximate ancillary statistic a. Approximately ancillary is taken to mean that the distribution of a depends on θ only in terms of O(n⁻¹) or higher, for θ within O(n^{−1/2}) of the true value. We write the likelihood as ℓ(θ) = ℓ(θ; θ̂, a). Then the density of θ̂ is approximated by

    p*(θ̂ | a; θ) = c(θ, a) |j(θ̂)|^{1/2} exp{ℓ(θ; θ̂, a) − ℓ(θ̂; θ̂, a)},

where c(θ, a) is a normalizing constant in a sense similar to (1.4). This formula has an error of order O(n^{−3/2}).

An important feature of this formula is that it gives the exact density in the case of a transformation model.

The accuracy of this approximation has been extensively investigated, particularly by Barndorff-Nielsen (1980, 1983, 1985, 1986). Detailed discussions and reviews are given by Barndorff-Nielsen and Cox (1994) and Reid (1988, 1995).
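The exactness claims can be checked numerically in a one-parameter case. For a hypothetical exponential model with rate θ (sample size and true rate are made up), the density of θ̂ = n/ΣYᵢ is available in closed form, and the unnormalized saddlepoint-type approximation to it reproduces the exact density up to a constant factor, so the renormalized version (1.5) is exact here:

```python
import math

n, theta = 4, 2.0   # hypothetical sample size and true rate

def p_star_unnormalized(theta_hat):
    # (2 pi)^(-1/2) j(theta_hat)^(1/2) exp{l(theta) - l(theta_hat)},
    # with l(t) = n log t - t*s at observed s = n / theta_hat, j(t) = n / t**2
    s = n / theta_hat
    l = lambda t: n * math.log(t) - t * s
    j_hat = n / theta_hat**2
    return math.sqrt(j_hat / (2.0 * math.pi)) * math.exp(l(theta) - l(theta_hat))

def exact_density(theta_hat):
    # theta_hat = n / S with S ~ Gamma(n, rate theta); change of variables
    s = n / theta_hat
    f_s = theta**n * s**(n - 1) * math.exp(-theta * s) / math.gamma(n)
    return f_s * n / theta_hat**2

ratios = [p_star_unnormalized(t) / exact_density(t) for t in (1.0, 2.0, 4.0)]
```

Since the ratio is constant in θ̂, dividing by the normalizing constant (1.4) recovers the exact density.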
1.4 Tail probability approximations

In statistical inference, we are often interested in approximating the cumulative distribution function as opposed to the density function of a statistic, as we need the cumulative distribution function to compute a p-value or confidence coefficient. This section reviews recent results on using the likelihood function to approximate cumulative distribution functions.

Let us first discuss a simple situation where both the data and the full parameter are essentially one dimensional. This discussion will throw some light on more complicated situations.

Assume that Y₁, Y₂, ..., Yₙ are independent and identically distributed from a density f(y; θ) with cumulant generating function K(s) = log E exp(sY). A saddlepoint approximation to the cumulative distribution function of Ȳ was first derived in Lugannani & Rice (1980) and reviewed in Daniels (1987):

    F_Ȳ(ȳ) = Φ(r) + φ(r) (1/r − 1/q),    (1.6)

where Φ and φ are the cumulative distribution function and density function of the standard normal, respectively,

    r = sgn(ŝ) [2n{ŝȳ − K(ŝ)}]^{1/2},    q = ŝ {nK''(ŝ)}^{1/2},

and ŝ is defined by K'(ŝ) = ȳ. The relative error in (1.6) is O(n^{−3/2}).

In the special case of a canonical exponential family model

    f(y; θ) = exp{θy − k(θ)} h(y),

the maximum likelihood estimate θ̂ is a one-to-one function of ȳ. The likelihood formulation of the quantities r and q is

    r = sgn(θ̂ − θ) [2{ℓ(θ̂) − ℓ(θ)}]^{1/2},    (1.9)
    q = (θ̂ − θ) j^{1/2}(θ̂),    (1.10)

where j(θ) = −ℓ''(θ). It can be seen that, at the value θ̂ = θ, the approximation breaks down, as r = q = 0. It should be replaced at that point by its limiting value 1/2 + γ̂/{6(2πn)^{1/2}}, where γ̂ = n⁻¹ℓ'''(θ̂)/{n⁻¹j(θ̂)}^{3/2} (Reid, 1996).
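The following sketch applies (1.6), in its likelihood form (1.9)-(1.10), to a hypothetical exponential-rate model with made-up sample size and observed sum; here the exact tail probability Pr(S ≤ s⁰; θ) is available through the Poisson-gamma identity, so the third order accuracy is directly visible:

```python
import math

n, s0 = 5, 5.0               # hypothetical sample size and observed sum
theta_hat = n / s0           # MLE of the exponential rate

def lr_approx(theta):
    # Lugannani & Rice (1.6) with likelihood-based r (1.9) and q (1.10);
    # undefined at theta = theta_hat, where r = q = 0
    l = lambda t: n * math.log(t) - t * s0
    w = 2.0 * (l(theta_hat) - l(theta))
    r = math.copysign(math.sqrt(w), theta - theta_hat)  # sign chosen for Pr(S <= s0)
    q = (theta - theta_hat) * math.sqrt(n) / theta_hat
    Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    phi = math.exp(-0.5 * r * r) / math.sqrt(2.0 * math.pi)
    return Phi(r) + phi * (1.0 / r - 1.0 / q)

def exact(theta):
    # Pr(S <= s0) = 1 - Pr(Poisson(theta*s0) <= n-1), S ~ Gamma(n, rate theta)
    lam = theta * s0
    return 1.0 - sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(n))

err_hi = abs(lr_approx(2.0) - exact(2.0))
err_lo = abs(lr_approx(0.5) - exact(0.5))
```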
Intriguingly, however, the results are not restricted to distributions with cumulant generating functions, but apply to general statistical models f(y; θ) with suitable smoothness and asymptotic properties.
Denote by F(y; θ) the cumulative distribution function of Y. We consider the approximation

    F(y; θ) = Φ(r) + φ(r)(1/r − 1/q),    (1.11)

with r as in (1.9) but with a q somewhat different from (1.10) that uses a nominal reparameterization φ(θ) specific to the data point y,

    q = {φ(θ̂) − φ(θ)} ĵ^{1/2}_{(φφ)},    (1.12)

where

    φ(θ) = ∂ℓ(θ; y)/∂y |_{y=y⁰},    ĵ_{(φφ)} = j(θ̂) {∂φ(θ)/∂θ |_{θ=θ̂}}⁻².

Fraser (1990) derived the expression in (1.12) by using a tangent exponential model approximation to a general one-dimensional model. An alternative expression for q in (1.12) is

    q = {ℓ_{;θ̂}(θ̂) − ℓ_{;θ̂}(θ)} j(θ̂)^{−1/2},

where ℓ_{;θ̂}(θ) = ∂ℓ(θ; y)/∂θ̂ is a sample space derivative.

This expression for q is due to Barndorff-Nielsen (1988, 1990) and was derived by integrating the p*-formula directly. Actually Barndorff-Nielsen derived a so-called r*-formula

    F(y; θ) = Φ(r*),    (1.15)

where

    r* = r + r⁻¹ log(q/r).

This approximation is accurate to the same order of accuracy as that in (1.11). Their equivalence was examined in detail by Jensen (1992) by expanding r* about r for exponential family models. It also follows from expansions given in Barndorff-Nielsen (1986), as indicated by DiCiccio and Martin (1993).
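For a quick check of the r*-formula (1.15), the sketch below again uses a hypothetical exponential-rate model (made-up sample size and observed sum; exact tail probabilities from the Poisson-gamma identity) and compares Φ(r*) to the exact value at several tested rates:

```python
import math

n, s0 = 5, 5.0               # hypothetical sample size and observed sum
theta_hat = n / s0

def r_star_approx(theta):
    # Barndorff-Nielsen (1.15): Phi(r*), with r* = r + log(q/r)/r
    l = lambda t: n * math.log(t) - t * s0
    r = math.copysign(math.sqrt(2.0 * (l(theta_hat) - l(theta))), theta - theta_hat)
    q = (theta - theta_hat) * math.sqrt(n) / theta_hat
    r_star = r + math.log(q / r) / r
    return 0.5 * (1.0 + math.erf(r_star / math.sqrt(2.0)))

def exact(theta):
    # Pr(S <= s0) = 1 - Pr(Poisson(theta*s0) <= n-1), S ~ Gamma(n, rate theta)
    lam = theta * s0
    return 1.0 - sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(n))

errors = [abs(r_star_approx(t) - exact(t)) for t in (0.5, 1.5, 2.0)]
```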
The approximation given by (1.11) or (1.15) is not only asymptotically third-order accurate but also is found to work very well even for small samples.

We now extend the approximation (1.11) or (1.15) to the vector full parameter case, where we suppose one scalar component is of interest. Denote the p-dimensional parameter θ by (ψ, λᵀ)ᵀ, where ψ = ψ(θ) is the component of interest and λ is the (p − 1)-dimensional nuisance parameter.

The minimal sufficient statistic may or may not have the same dimension as the full parameter. If it has, and we are in an exponential model, we can eliminate the nuisance parameters by conditioning or by marginalization, and if we are in a transformation model, we can eliminate them by marginalizing. For the exponential family case, see Fraser, Reid & Wong (1991) and Skovgaard (1986); for transformation models, see DiCiccio, Field & Fraser (1990).

If it has higher dimension, the general approach is first to reduce its dimension to that of the full parameter by conditioning on an exact or approximate ancillary statistic. Skovgaard (1986) and Fraser & Reid (1995) show that a second order ancillary is sufficient to give a third-order accurate approximation as in (1.11) or (1.15). Barndorff-Nielsen (1986, 1991) and Barndorff-Nielsen & Wood (1998) describe how to construct an ancillary in some situations.
Denote by a an exact or approximate ancillary. The full log-likelihood function ℓ(θ) = ℓ(θ; y) can be written as ℓ(θ; θ̂, a), and the sample space derivative ℓ_{;θ̂}(θ; θ̂, a) is defined by ∂ℓ(θ; θ̂, a)/∂θ̂ for fixed ancillary. The approximation is again given by (1.11) or (1.15) with the following r and q:

    r = sgn(ψ̂ − ψ) [2{ℓ(θ̂) − ℓ(θ̂_ψ)}]^{1/2},    (1.17)

    q = |ℓ_{;θ̂}(θ̂) − ℓ_{;θ̂}(θ̂_ψ) ; ℓ_{λ;θ̂}(θ̂_ψ)| / {|j(θ̂)| |j_{λλ}(θ̂_ψ)|}^{1/2},    (1.18)

where the determinant in the numerator has first row ℓ_{;θ̂}(θ̂) − ℓ_{;θ̂}(θ̂_ψ) and remaining rows ℓ_{λ;θ̂}(θ̂_ψ), the sample space derivative of the score for the nuisance parameter λ.

This approximation depends on the specification of an exact or approximate ancillary statistic a, as the calculation of the quantity q in (1.18) involves a. But in general the choice of an ancillary may not be apparent.

Fortunately, Fraser & Reid (1995) show that the explicit form of an ancillary is not necessary. We only need its tangent directions at the data point for the calculation of the p-value. Fraser, Reid & Wu (1999) develop a general tail probability formula which is the focus of the rest of this section.
Consider a vector response y = (y₁, y₂, ..., yₙ) with independent scalar components and density function f(y; θ). Assume that the log-likelihood function ℓ(θ; y) is O(n) in θ, that the maximum likelihood estimate converges at rate n^{−1/2} to θ, and that various differentiability properties hold as discussed for example in DiCiccio, Field & Fraser (1990). The third-order approximation to the p-value still has the form of (1.11) or (1.15) with the same r in (1.17) but with a special q designated Q to distinguish it from the special q discussed earlier; the third order accuracy of this approximation is the central issue addressed in Fraser, Reid & Wu (1999). The construction of Q involves two steps: the first step is a reduction by conditioning from the dimension n of the variable y to the dimension p of the full parameter. This reduction in dimension is obtained by conditioning on an exact or approximate ancillary. As we mentioned earlier, we only need its p tangent direction vectors at the data point. Denote these vectors by V = (v₁, v₂, ..., v_p), where all the vᵢ are n-dimensional vectors, so V can be treated as an n by p array. This array can be constructed by using a vector z = (z₁, z₂, ..., zₙ)ᵀ of pivotal quantities zᵢ = zᵢ(y; θ) that has a fixed distribution. One simple choice for the case of independent scalar coordinates uses the successive distribution functions zᵢ = F(yᵢ; θ), i = 1, 2, ..., n, which are uniformly distributed. Then the array V is obtained by differentiating:

    V = ∂y/∂θᵀ |_{(y⁰, θ̂⁰)} = −(∂z/∂yᵀ)⁻¹ (∂z/∂θᵀ) |_{(y⁰, θ̂⁰)},    (1.19)

where the first expression is calculated for fixed z, and y⁰ and θ̂⁰ are the observed data point and maximum likelihood estimate, respectively. Then a tangent exponential model, which is an exponential family model whose asymptotic expression is closest to that of the true model, is obtained. The canonical parameters of this exponential model are obtained by differentiating the likelihood in the tangent directions V:

    φᵀ(θ) = ∂ℓ(θ; y)/∂V |_{y=y⁰},    (1.20)

where vᵢ is the i-th column of V and ∂ℓ(θ; y)/∂V denotes the vector of directional derivatives (∂ℓ(θ; y)/∂v₁, ..., ∂ℓ(θ; y)/∂v_p) defined by ∂ℓ(θ; y)/∂vᵢ = ∂ℓ(θ; y + tvᵢ)/∂t |_{t=0}.
Now we come to the second step: eliminate the nuisance parameter by marginalization to a pivotal quantity that depends on the parameter of interest ψ but whose distribution is free of the nuisance parameter λ. By doing this, the parameter of interest is isolated and the dimension p of the variable is again reduced, now to dimension 1, giving what can be viewed as an intrinsic measure of departure from what is expected under ψ(θ) = ψ.

Typically the reparameterization φ in (1.20) does not have the parameter of interest ψ as a linear component. Hence a new scalar parameter χ is extracted from φ and then used to assess ψ:

    χ(θ) = ψ_φ(θ̂_ψ) φ(θ) / |ψ_φ(θ̂_ψ)|,    (1.21)

where ψ_φ(θ) = ∂ψ(θ)/∂φ, and

    Q = sgn(ψ̂ − ψ) |χ(θ̂) − χ(θ̂_ψ)| {|j_{(φφ)}(θ̂)| / |j_{(λλ)}(θ̂_ψ)|}^{1/2},    (1.22)

where the argument φ in (1.22) is to indicate that the full and nuisance information determinants are recalibrated on the φ scale:

    |j_{(φφ)}(θ̂)| = |j_{θθ}(θ̂)| |φ_θ(θ̂)|⁻²,    |j_{(λλ)}(θ̂_ψ)| = |j_{λλ}(θ̂_ψ)| |φ_λᵀ(θ̂_ψ) φ_λ(θ̂_ψ)|⁻¹.

This is the frequentist analysis of the third order approximation. The Bayesian analysis can also be found in Fraser, Reid & Wu (1999), and this Bayesian version generalizes the results of DiCiccio & Martin (1991, 1993).
When we are interested in several components of a vector full parameter, one option for making inferences is to use the "sequential method": testing the components one by one. Sometimes simultaneous inference for them may seem more appropriate. Skovgaard (2000) gives a detailed discussion of this.

Here we have concentrated on the methods developed by Barndorff-Nielsen, Fraser and many other authors. This does not mean these methods are the only options to get higher order accurate p-values. In fact, at least one other asymptotic approach to the same problem is also available. The usual test statistic to test p components of θ is the generalized likelihood-ratio statistic:

    w(ψ) = 2{ℓ(θ̂) − ℓ(θ̂_ψ)},    (1.23)

which is asymptotically χ²_p-distributed. It can be shown that

    E{w(ψ)} = p{1 + b(ψ, λ)/n + O(n⁻²)},

and that the rescaled statistic

    w̃(ψ) = w(ψ) / {1 + b(ψ̂, λ̂)/n}    (1.25)

follows the χ²_p distribution to a higher order of accuracy. So an idea to improve the accuracy of the inference is that, instead of using w(ψ) in (1.23), we can use w̃(ψ) in (1.25). This technique is known as the Bartlett correction, which originates from Bartlett (1937) and Lawley (1956). The problem with this method is that it is not easy to implement, as a general expression for the estimate of b(ψ, λ) is complicated, involving fourth-order cumulants of derivatives of the log-likelihood function. Some background may be found in DiCiccio (1986), DiCiccio & Stern (1993), Barndorff-Nielsen & Cox (1994, Ch. 9) and Pace & Salvan (1997, Ch. 7).

We compare in some examples the performance of these different approximations with that of the approximations we develop in later chapters.
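The mean adjustment underlying the Bartlett correction can be seen in a small simulation. For the exponential-rate model, testing the true rate, the expected likelihood ratio statistic is approximately 1 + 1/(6n); this factor, and the made-up sample size and replication count, are assumptions of the sketch. The simulated mean of w sits above 1, and dividing by the Bartlett factor brings it back toward the chi-square(1) mean:

```python
import math
import random

random.seed(1)
n, reps = 5, 20000

def w_stat(data, theta0):
    # w = 2{l(theta_hat) - l(theta0)} for the Exponential(rate) model
    theta_hat = len(data) / sum(data)
    l = lambda t: len(data) * math.log(t) - t * sum(data)
    return 2.0 * (l(theta_hat) - l(theta0))

mean_w = sum(w_stat([random.expovariate(1.0) for _ in range(n)], 1.0)
             for _ in range(reps)) / reps

bartlett_factor = 1.0 + 1.0 / (6.0 * n)      # approximate E{w} for this model
mean_corrected = mean_w / bartlett_factor    # should be close to 1
```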
1.5 Overview and Outline

This thesis concentrates on likelihood asymptotics: it develops the theory for removing singularities in the Barndorff-Nielsen combining formula (Barndorff-Nielsen, 1986) and the Lugannani & Rice formula (Lugannani & Rice, 1980). It also develops several alternative combining formulas.

Chapter 2 proposes a procedure for removing the singularity of the combining formulas at the maximum likelihood estimate. This kind of singularity is a downstream version of one addressed by Daniels (1987) for the scalar saddlepoint context. Our procedure is based on the tangent exponential model (Cakmak et al., 1998 and Abebe et al., 1995), which is an exponential model whose asymptotic expansion at the data point of interest is closest to that of the true model. The likelihood is then approximated in terms of the standardized cumulants of the approximating tangent exponential model, so asymptotic expansions of the two elements r and q, and hence of the r* in the Barndorff-Nielsen formula, can be obtained. The expansion for r* provides a third order bridge for computing the p-value for values of the parameter in a small neighborhood of the maximum likelihood estimate. A comparison of the thickness of the tails of the interest distribution to that of the normal distribution provides the basis for defining a nonnormality measure. The implications of this measure are also discussed. There we also address the issue of discontinuity at the extremes for the Lugannani & Rice formula and provide some remedies for it.

Chapter 3 develops an alternative third order combining formula using the gamma distribution. One feature of the gamma distribution is that one of its tails is thicker than that of the normal whereas the other is thinner. This feature makes the gamma very useful in computing the p-value for both thick-tailed and thin-tailed distributions. Results indicate good performance of this formula in various situations.

Chapter 4 develops another alternative combining formula using the Student t distribution. Because both of its tails are thicker than those of the normal, this formula can be used only for thick-tailed distributions.

Finally, a brief conclusion is given in Chapter 5.
Chapter 2
Likelihood Asymptotics:
On Removing Singularities

Recent likelihood asymptotics has produced highly accurate p-values for very general contexts. The terminal combining formulas for the production of these p-values have certain singularities, however, near the maximum likelihood value. The singularities at the maximum likelihood value are a downstream version of an issue addressed by Daniels (1987) for the scalar saddlepoint context; he provided an approximate value at the singularities, which involved a standardized third order cumulant. For a general statistical context we develop a third degree bridge for the p-value function at the maximum likelihood value for the case with no nuisance parameters, and a limiting value at the singularity for the case with nuisance parameters.
2.1 Introduction
The saddlepoint method introduced to statistics by Daniels (1954) and Barndorff-
Nielsen & Cox (1979) gives a highly accurate approximation for a density function
with known cumulant generating function. Lugannani & Rice (1980) used the sad-
dlepoint method to develop a distribution function approximation as an alternative
to numerical integration of the approximate density function. The Lugannani &
Rice (1980) approximation has a singularity at the saddlepoint, which can be re-
placed (Daniels 1987) by its limiting value, a multiple of a third order standardized
cumulant. Barndorff-Nielsen (1986) developed an alternative distribution function
approximation as part of extending results beyond the exponential model context.
These distribution function approximations quite generally use two rather dif-
ferent inputs of information from likelihood. The first is almost always the signed
square root r of the log likelihood ratio given below at (2.3). The second is some
appropriately defined maximum likelihood departure q; the search for the appropri-
ate q has been the recent focus for obtaining progressively more general p-values in
likelihood asymptotics.
These two inputs are combined using either of the following two formulas to give
a third order p-value for testing a scalar parameter value:

Φ_LR(ψ) = Φ(r) + φ(r)(1/r − 1/q),   (2.1)

Φ_BN(ψ) = Φ(r*),   r* = r − r^{-1} log(r/q),   (2.2)

due to Lugannani & Rice (1980) and Barndorff-Nielsen (1986) respectively, as de-
veloped for specific contexts; φ(r) and Φ(r) are the standard normal density and
distribution functions.
In the cases we consider, r is the likelihood root, the signed square root of the
log likelihood ratio statistic,

r = sgn(ψ̂ − ψ)[2{ℓ(θ̂; y) − ℓ(θ̂_ψ; y)}]^{1/2},   (2.3)

where ℓ(θ; y) is log likelihood, ψ(θ) is the scalar interest parameter with tested value
ψ, and θ̂ and θ̂_ψ are the maximum likelihood values without and with the constraint
ψ(θ) = ψ. The definition of q is less straightforward as it typically depends on
more than just observed likelihood; several expressions are recorded below at (2.13),
(2.30), (2.33). Both r and q are standard normal to order O(n^{-1/2}), but with the
appropriately defined q the p-values given by (2.1) and (2.2) are distributed under
the model as uniform(0, 1) to order O(n^{-3/2}).
We are assuming that we have a continuous statistical model f(y; θ) with di-
mensions n and p for y and θ, and that ψ(θ) is a scalar parameter of interest. The
reduction in dimension from n to p is achieved in principle by conditioning on an
approximate ancillary statistic, but only the tangents to the approximate ancil-
lary are needed at the data point of interest. It is shown in Fraser & Reid (1995,
2000) that these tangents can be obtained from a full dimensional pivotal quantity
z = z(y; θ); related details are recorded in Sections 2.3 and 2.4.
The r and q in (2.1) and (2.2) are functions of ψ and y and are both close to zero
when ψ is near ψ(θ̂). This poses obvious numerical difficulties for the evaluation of
r^{-1} − q^{-1} and r/q; see Figure 2.1 for an example of numerical perturbations near the
maximum likelihood value. Also, for extreme values of ψ, the second term in (2.1)
can overwhelm the tail probability from the first term and give a value outside the
acceptable range [0, 1] for a p-value; see Figure 2.4 for an example where a range of
values less than zero is recorded for large values of the parameter.
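The two combining formulas and the numerical difficulty near r = 0 can be sketched directly. The following is a minimal illustration, not from the thesis, assuming the standard forms Φ(r) + φ(r)(1/r − 1/q) for (2.1) and Φ(r*) with r* = r − r^{-1} log(r/q) for (2.2); the function names are ours. A tiny relative perturbation of q near r = 0 swings the computed p-value wildly, which is the instability the bridging formulas are designed to remove.

```python
from math import erf, exp, log, pi, sqrt

def Phi(x):                      # standard normal distribution function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def phi(x):                      # standard normal density
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)

def p_lugannani_rice(r, q):
    # (2.1): Phi(r) + phi(r)(1/r - 1/q); singular at r = 0 or q = 0
    return Phi(r) + phi(r) * (1.0 / r - 1.0 / q)

def p_barndorff_nielsen(r, q):
    # (2.2): Phi(r*) with r* = r - (1/r) log(r/q); singular at r = 0
    return Phi(r - log(r / q) / r)

# Away from the maximum likelihood value both formulas are stable:
print(p_lugannani_rice(1.5, 1.6), p_barndorff_nielsen(1.5, 1.6))

# Near r = 0 a 0.1% perturbation of q produces catastrophic values,
# since 1/r - 1/q and log(r/q)/r amplify any error in q by roughly 1/r^2:
r = 1e-8
for q in (r * (1 + 1e-3), r * (1 - 1e-3)):
    print(p_lugannani_rice(r, q), p_barndorff_nielsen(r, q))
```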
In Section 2.2, we give some background on the third order asymptotic exponential-
type model, both the likelihood centered version and the density centered version.
This serves as the basis for the development of the bridging formulas in later sections.
In Section 2.3 we examine an asymptotic model with scalar variable and param-
eter and define two measures of how the asymptotic density of the signed likelihood
ratio departs from the standard normal; asymptotic expressions are obtained for the
measures and they are seen to be essentially equivalent.
In Section 2.4 we examine the scalar parameter case and use the two measures
of departure to develop a third order bridge for the singularity at the maximum
likelihood value.
In Section 2.5 we examine the case of a vector full parameter θ with scalar interest
parameter ψ = ψ(θ). The techniques from Section 2.3 are then used to develop a
second order bridge at the maximum likelihood value.
Then in Section 2.6 we use functional properties of the departure measures to
develop simple third order graphical procedures for bridging at the maximum value,
for both the scalar and vector full parameter cases.
In Section 2.7, we generalize the relation between the two departure measures to
an arbitrary point.
In Section 2.8 we consider alternatives to the combining formulas (2.1) and (2.2)
to avoid the possible singularities at the extremes of the parameters being tested.
2.2 Third order asymptotic exponential-type model:
likelihood centered and density centered
For the development of the concept of nonnormality and the theory for remov-
ing the singularity intrinsic to the Barndorff-Nielsen formula and the
Lugannani & Rice formula, we need some background on third order asymptotic
exponential models, both the likelihood centered version and the density centered
version. They were first introduced by Abebe et al (1995) and Cakmak et al (1998),
respectively. These are both exponential type approximations. The density centered
version is an approximation to the density for a certain fixed parameter value and is
therefore relevant to distributional properties, while the likelihood centered version
is for a fixed data value and would therefore bear on the effectiveness of test quantities
for inference with fixed data. The ideas and techniques used in developing these
two versions of the approximations are very similar, so we will concentrate on describ-
ing the development of the likelihood centered version. Then at the end of this
section we will briefly record the density centered version of the exponential type
approximation.
For testing a scalar interest parameter in a large sample asymptotic context,
methods with third order accuracy are available that make a reduction to the sim-
ple case having a scalar parameter and scalar variable. Such a reduced model can
arise by marginalization and conditioning with an exponential model or by condi-
tioning and marginalization with a transformation model; the latter sequence is also
available generally in the asymptotic context (Fraser & Reid, 1993, 1995).
For a real variable and real parameter consider a density function f_n(y; θ) that
depends on a mathematical parameter n, usually sample size. We assume that for
each θ, y is O_p(n^{-1/2}) about a maximum density point and that ℓ(θ; y) = log f_n(y; θ)
is O(n) and with either argument fixed has a unique maximum.
The objective in developing exponential type approximations is to approximate
the observed significance function p(θ) = P(y ≤ y⁰; θ) by the exponential approxi-
mation rather than by the usual normal approximation. Accordingly we first expand
the log-density ℓ(θ; y) in a Taylor series in parameter and variable about
a data value y⁰ of interest and the corresponding maximum likelihood parameter
value θ₀ = θ̂(y⁰),
where the coefficient a_ij = (∂^i/∂θ^i)(∂^j/∂y^j) ℓ(θ; y)|_{(θ₀, y⁰)} and in particular a_10 = 0
from the maximum likelihood property at θ₀.
We record the coefficients in a matrix,
We then reexpress both θ and y, and do so in two steps. In each step, we will
record the new parameter and variable φ, x as functions of the old θ, y and record
the new coefficients A_ij as functions of the old. To avoid notational growth, we will
then replace φ, x, A by θ, y, a for the next step. The transformations thus need to
be compounded accordingly.
In the first step, a location-scale standardization is applied to the variable y
and the parameter θ.
This transformation makes the variable centre at the data value y⁰ and also gives
a unit cross Hessian. The new coefficients are
with A_ij = O(n^{−(i+j)/2+1}) for i + j ≥ 2, as a consequence of the asymptotic properties;
thus A_12, A_03 are O(n^{-1/2}) and A_40, A_31, A_22, A_13, A_04 are O(n^{-1}). Other terms
with i + j > 4 are ignored as we choose to work only to order O(n^{-3/2}). So in terms
of lower case letters, the array of coefficients takes the form
In the second step, we consider a reexpression of the variable and the parameter
so that this approximation has the key characteristics of a canonical exponential model
(A_21 = A_31 = A_12 = A_13 = 0). This requires transformations
The resulting coefficients, in terms of the old coefficients, are
and the coefficient array takes the form
As we know, for each θ there is a density and the density integrates to 1.
This means that the coefficients are interconnected; in fact with A_30 = −a3/n^{1/2},
A_40 = −a4/n, and A_22 = c/n, it can be shown that all other coefficients are
determined and the coefficient array takes the form
(a_ij) =
where a = −(1/2) log(2π), a3 and a4 are standardized third and fourth derivatives of
the log density ℓ(φ; x) with respect to φ at {φ(θ₀), x⁰}, and φ = φ(θ) and x = x(y)
are local reexpressions of θ and y that are used to obtain the tangent exponential
model approximation relative to the data point y⁰, and c is a measure of nonex-
ponentiality; if c = 0 the model is exponential to the third order. These are intrinsic
parameters describing shape characteristics of the model. For some recent details
see Andrews, Fraser & Wong (2000).
An asymptotic statistical model having log density
with coefficient array (a_ij) as in (2.9) is called in Cakmak et al (1998) the canonical
exponential type asymptotic model in likelihood centered form; it describes the
essential characteristics of a large sample distribution relative to the exponential
pattern. This model gives a method to compare the score, maximum likelihood
estimate and the signed likelihood ratio departures for a fixed data point.
The density centered version of the asymptotic exponential type model was de-
veloped for a chosen parameter value and the corresponding maximum density value
in Abebe et al (1995). The model takes the same Taylor expansion form as in
(2.10) but with coefficient array as follows
where a = −(1/2) log(2π), α3 and α4 are standardized third and fourth derivatives of
ℓ(φ; x) with respect to x at {φ(θ₀), ŷ(θ₀)}, where ŷ(θ₀) is the maximum density point
for θ₀, and φ = φ(θ) and x = x(y) are local reexpressions of θ and y that are used
to obtain the tangent exponential model relative to the parameter value θ₀; the
constant c is a measure of nonexponentiality and is given as ∂⁴ℓ/∂x²∂φ² evaluated
at {φ(θ₀), ŷ(θ₀)}. This model gives a method to compare the score, maximum
likelihood estimate and the signed likelihood ratio departures for a fixed parameter
value.
2.3 Departures from standard normality: scalar case
Consider first the case of an asymptotic model with scalar variable and scalar
parameter. Many properties for the more general context can be derived from this
case. We assume the model f(y; θ) leads to a log density ℓ(θ; y) = log f(y; θ) that
has the usual asymptotic properties, such as ℓ(θ; y) = O_p(n), var{ℓ_θ(θ; Y)} = O(n),
and so on. For testing a value θ, the likelihood quantity q is intrinsically based on
a locally defined reparameterization,

φ(θ) = (∂/∂y) ℓ(θ; y)|_{y = y⁰},   (2.12)

which is the canonical parameter of the tangent exponential model at the data
point y⁰ of interest (Fraser, 1990). This is used to form the standardized measure
of departure

q(θ) = {φ(θ̂) − φ(θ)} ĵ_φφ^{1/2},   ĵ_φφ = ĵ_θθ φ_θ^{-2}(θ̂),   (2.13)

where ĵ_θθ = −ℓ_θθ(θ̂; y) is the observed information and φ_θ = ∂φ(θ)/∂θ
evaluated at the maximum likelihood value θ̂(y) adjusts the information
standardization to that for φ.
the density of the iikelihood root r(0; y) is then given by
where k is a constant to third order. This can be obtained by change of variable
from y to r starting from the p'-formula (Barndofi-Nielsen, 1983) or starting from
the sacidlepoint approximation to the tangent exponential mode1 ( b e r & Reid,
1995, 2000).
We investigate how the third order likelihood based distribution for r differs from
the nominal standard normal of ûrst order theory; and we can do this in terms of
the density Cunction (2.14) or the distribution function venions (2.1) and (2.2).
In terms of its functional form the expression (2.14) has the factor r/q attached to
the basic standard density φ(r). The factor r/q is greater (or less) than 1 according
as a tail of the density for r is thicker (or thinner) than that of the standard normal.
As a measure of departure from standard normal we then examine how r/q exceeds
1, taken relative to r:

d1 = (r/q − 1)/r = 1/q − 1/r.   (2.15)

We could also examine the distribution function (2.1) and how it falls short of the
nominal Φ(r), taken relative to the density φ(r),

d1 = {Φ(r) − Φ_LR}/φ(r);   (2.16)

this gives the same measure.
For a second measure we examine the argument of the distribution function
approximation (2.2); the argument is usually designated r*. We consider how it
falls short of the nominal normal deviation r:

d2 = r − r* = r^{-1} log(r/q).   (2.17)
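Written out from their descriptions, d1 = (r/q − 1)/r = 1/q − 1/r and d2 = r − r* = r^{-1} log(r/q); a quick numerical check (a sketch of ours, not from the thesis) confirms that the two measures agree closely in the moderate-deviation range, since d2 = r^{-1} log(1 + r d1) = d1 − r d1²/2 + ···:

```python
from math import log

def d1(r, q):
    # departure of the density factor r/q from 1, taken relative to r
    return 1.0 / q - 1.0 / r

def d2(r, q):
    # shortfall of r* = r - (1/r) log(r/q) from the nominal deviation r
    return log(r / q) / r

# The difference d1 - d2 is of order r*d1^2, hence small for moderate (r, q);
# the pairs below are illustrative values with r and q of the same sign.
for r, q in [(0.5, 0.52), (1.0, 1.05), (-1.3334, -1.4241), (2.0, 1.9)]:
    print(r, q, d1(r, q), d2(r, q))
```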
We now determine the asymptotic form of these departure measures, as a means
to bridge the singularity at the maximum likelihood point. Taylor series expansion
methods were used in Cakmak et al (1998) to determine the local form of a statistical
model relative to a particular data point, say y⁰.
In the case that y and θ are both scalars we can examine the asymptotic form
of d(r, q) as a function of r and q by expanding the log density ℓ(θ; y) = log f(y; θ)
about a reference point in terms of standardized deviations for y and for θ. A data-
oriented expansion (Cakmak et al, 1998) uses (θ₀, y₀) = (θ̂(y₀), y₀) and standardizes
with respect to the coefficients of θ² and θy. If the departures are then reexpressed
in exponential model form to the third order we obtain the likelihood centered
exponential model (2.10) with coefficient array as in (2.9). From this we obtain the
log likelihood at y⁰ and the log density at θ₀ given respectively by
and
The only nonzero mixed derivative terms to this order are yθ and y²θ²/4n; k₁ is a
constant and a = −log(2π)/2. The a3 and a4 are the standardized cumulants of the
null density, and c is a measure of nonexponentiality; these are intrinsic parameters
describing shape characteristics of the model. From this an expansion for q in terms
of r for fixed y = y₀ is obtained,
which gives
which gives
These r d t s are used to obtain asymptotic expressions for dl and d2 for fked data
y = go and varying 8:
In a paralle1 way, for a parameter-oriented qansion (Abebe et al, 1995) we can
use (Ba, y,,) = (Bo , g(Ba)) and standardize with respect to coefficients of yl and 30. If
the deparhm are then reexpressed towards qonential form we obtain the density
centered tangent exponential model (2.10) with coefficient array (2.11). Then the
log density at 80 and the log likelihood at y0 are given respectively by
and
together with mixed derivative temu y9 and $02/4n. k2 is a constant. From this
an expansion for g in terms of r for fixed B = O0 is obtained,
which gives
and thus gives asyrnptotic expressions for the two nonnormality measures dl and dz
for ûxed 0 = Bo and varying y:
These two types of expressions for d1 and d2 can be interrelated by taking the
model (2.18, 2.19) centered at (θ⁰, y⁰) and reexpressing it in the form (2.22, 2.23).
For this we take θ₀ = θ̂⁰, which is zero in (2.19), and obtain ŷ(θ₀) = a3/2n^{1/2}
+ O(n^{-1}); this then gives α3 = a3 to order O(n^{-1}) and α4 = a4 − 3a3² − 6c to order
O(n^{-1/2}). The constants k₁ and k₂ check under the reexpressions. We can then
record (2.24) and (2.25) respectively as
Formulas (2.20), (2.21), (2.26) and (2.27) all use standardized likelihood cumu-
lants a3, a4 for a point (y₀, θ₀) with r = 0; the first two record the change in d for
fixed y₀ and the last two for fixed θ₀.
If θ₀ = θ̂(y₀) or if y₀ = ŷ(θ₀) then the expansion coefficients are linked by the
norming property which gives α3 = a3 + O(n^{-1/2}), α4 = a4 − 3a3² − 6c + O(n^{-1/2}).
All these expressions for the dᵢ are accurate to O(n^{-3/2}).
Some clarity on the roles of the two versions (2.20), (2.21) and (2.24), (2.25)
arises by noting that r and q are functions of y and θ over a moderate deviations
range from some initial y₀ or θ₀ of interest. Along the curve C = {(θ, y): θ = θ̂(y)}
we have r = q = 0, and to first derivative we have r = q. The departure measure is
then describing how r and q differ beyond the first derivative.
We could have started with a point (θ₀, y₀) with some particular value for r =
r(θ₀, y₀) and then used (2.17) to examine the change in d for fixed y₀, or (2.24), (2.25) to
examine the change in d for fixed θ₀. For this we note that the a3, a4 would be values
determined on C with the particular y₀, and the α3, α4, c would be values determined
on C with the particular θ₀; Section 2.7 discusses this in detail.
Example 2.1. Cauchy location model.
Consider the location Cauchy f(y − θ) = π^{-1}{1 + (y − θ)²}^{-1} with n = 1. For
this we have θ̂ = y and
The exponential parameter can be standardized, φ = 2θ/(1 + θ²), giving
from which we obtain a3 = 0, a4 = 9, and thus d1 = d2. The lack of skewness
removes differences between the two versions of the departure measure.
More generally, when the parameter θ is a scalar but the observable variable has
dimension n, we define a vector v by
where z = z(y, θ) is an n × 1 vector of natural pivotal quantities. As shown in
Fraser & Reid (1995), this vector can be used to define a canonical parametrization
φ for the original model, and then defining q as the standardized maximum likeli-
hood departure in this parametrization ensures that (2.1) and (2.2) are third order
approximations to the p-value conditional on an approximately ancillary statistic.
Thus the dimension reduction from n to 1 is achieved by conditioning on an ap-
proximate ancillary statistic, but this ancillary is not explicitly needed, just the
derivative of ℓ in the directions (2.28) for the ancillary at the data point. Using v,
the reparameterization φ in (2.12) is generalized to
and the expanded expression for q in (2.13) uses derivatives in the direction v rather
than with respect to the original scalar y:
In this more general context the expressions (2.20) and (2.21) for the departure
measures remain available, but the versions (2.24) and (2.25) for varying data point
are typically not available, as they would need model information along the contour
of the observed approximate ancillary.
2.4 Bridging the singularity: scalar case
The measures of departure developed in the preceding section provide a simple
and direct means for bridging the maximum likelihood singularity in the p-value
formulas.
From (2.1) and (2.20) we obtain
and from (2.2) and (2.21) we obtain
These can be viewed as Bartlett type corrections to the likelihood ratio but are
derived from observed likelihood.
Example 2.2. Cauchy location model.
Consider the location Cauchy model with data y = 0, as examined in Example
2.1. From the two bridging formulas we obtain
The exact p-value is of course available as

p(θ) = .5 + π^{-1} tan^{-1}{±(e^{r²/2} − 1)^{1/2}},

with the sign taken as that of r. At r = 0 all three are equal to 0.5. Close to r = 0
we check numerically; at the point, say, r = 0.1 we have
p = .522499.
The rather small departure of the approximations from the exact is of course due to
the almost impossibly small sample size n = 1 and to the sharp peak at the centre
of the Cauchy model.
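The Cauchy numbers above can be checked directly. The sketch below is ours, not the thesis's code; it uses the reparameterization φ = 2θ/(1 + θ²) of Example 2.1, the observed information ĵ = 2 and φ_θ(0) = 2 at θ̂ = 0, and the standard combining formulas, recovering the exact p ≈ 0.522499 at r = 0.1:

```python
from math import atan, erf, exp, log, pi, sqrt

Phi = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))
phi = lambda x: exp(-0.5 * x * x) / sqrt(2.0 * pi)

# Cauchy location model with data y = 0, so theta_hat = 0.
# Likelihood root r = sign(-theta) sqrt{2 log(1 + theta^2)}, inverted here:
def theta_from_r(r):
    t = sqrt(exp(r * r / 2.0) - 1.0)
    return -t if r > 0 else t

def p_exact(theta):
    # P(Y <= 0; theta) for the density pi^{-1}{1 + (y - theta)^2}^{-1}
    return 0.5 + atan(-theta) / pi

def p_approx(r):
    theta = theta_from_r(r)
    # phi(theta) = 2 theta/(1 + theta^2); j_hat = 2 and phi_theta(0) = 2,
    # so the recalibrated information is 1/2 and (2.13) gives:
    q = (0.0 - 2.0 * theta / (1.0 + theta * theta)) * sqrt(0.5)
    p_lr = Phi(r) + phi(r) * (1.0 / r - 1.0 / q)
    p_bn = Phi(r - log(r / q) / r)
    return p_lr, p_bn

r = 0.1
print(p_exact(theta_from_r(r)))   # close to the quoted .522499
print(p_approx(r))                # both approximations nearby
```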
For the bridging formulas (2.31) and (2.32) we could have done full Taylor
series expansions in r but, as in many similar asymptotic calculations, there are
advantages to retaining the φ(r) and Φ(r), which reflect the dominant role of the
signed likelihood ratio r.
Example 2.3. Consider the simple gamma model on the positive axis,

f(y; θ) = Γ^{-1}(θ) y^{θ−1} e^{−y},

with data y = 10. The significance function p(θ) is plotted in Figure 2.1. Note
the computational irregularities near the maximum likelihood value θ̂ = 10.495838.
Simple calculations give a3 = −0.315901 and a4 = 0.199422, from which we obtain
the bridge

p_B(θ) = Φ(0.9972348 r − 0.0526502).

The likelihood approximations Φ_LR(θ), Φ_BN(θ) are plotted in Figure 2.1 together
with the bridge p_B(θ) and the exact p_X(θ). Clearly a simple algorithm can choose
between the approximation and the bridge to give a close construction for the exact.
Figure 2.1: The gamma model Γ^{-1}(θ) y^{θ−1} e^{−y} with y⁰ = 10. The asymptotic approx-
imations Φ_LR(θ) and Φ_BN(θ) for the p-value function p_X(θ) for testing θ are plotted
against θ. The bridge p_B(θ) at the maximum likelihood value is superimposed on
the exact p_X(θ).
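The quantities quoted in Example 2.3 can be reproduced numerically; the sketch below is ours, not from the thesis. It obtains θ̂ from the score equation ψ(θ̂) = log y⁰ by bisection (ψ here is the digamma function, computed by finite differences of log Γ), takes a3 and a4 as the standardized third and fourth cumulants ψ''(θ̂)/ψ'(θ̂)^{3/2} and ψ'''(θ̂)/ψ'(θ̂)² of the canonical sufficient statistic — an identification consistent with the quoted values — and evaluates the bridge with the constants stated in the text:

```python
from math import erf, lgamma, log, sqrt

Phi = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))

def psi(n, x, h=0.05):
    # n-th derivative of digamma by nested central differences of lgamma;
    # h = 0.05 keeps both truncation and rounding error small for n <= 3
    if n == 0:
        return (lgamma(x + h) - lgamma(x - h)) / (2.0 * h)
    return (psi(n - 1, x + h) - psi(n - 1, x - h)) / (2.0 * h)

y0 = 10.0
lo, hi = 1.0, 50.0                  # bisection for psi(theta) = log y0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if psi(0, mid) < log(y0):
        lo = mid
    else:
        hi = mid
theta_hat = 0.5 * (lo + hi)         # about 10.495838

k2, k3, k4 = psi(1, theta_hat), psi(2, theta_hat), psi(3, theta_hat)
a3 = k3 / k2 ** 1.5                 # about -0.315901
a4 = k4 / k2 ** 2                   # about  0.199422

def r_of(theta):
    ell = lambda t: -lgamma(t) + (t - 1.0) * log(y0)  # the -y0 term cancels
    diff = ell(theta_hat) - ell(theta)
    return (1.0 if theta_hat > theta else -1.0) * sqrt(2.0 * max(diff, 0.0))

def p_bridge(theta):
    # bridge of Example 2.3: p_B(theta) = Phi(0.9972348 r - 0.0526502)
    return Phi(0.9972348 * r_of(theta) - 0.0526502)

print(theta_hat, a3, a4, p_bridge(theta_hat))
```

Note that at θ = θ̂ the bridge gives Φ(−0.0526502) ≈ 0.479 rather than 0.5, reflecting the skewness correction −a3/6.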
2.5 Bridging the singularity: scalar interest parameter
Now consider a continuous statistical model f(y; θ) with dimensions n and p for
y and θ, and let ψ(θ) be a scalar interest parameter with, say, θ' = (λ', ψ). Again
there is an approximate ancillary with vectors V = (v₁ ··· v_p) given as
where z = z(y, θ) is still an n × 1 vector of natural pivotal quantities. The exponential
type parameter is
and d/dV gives a row vector of directional derivatives; for some discussion of exam-
ples see Fraser, Wong & Wu (1999).
For testing ψ(θ) = ψ using (2.1) or (2.2), the r is given by (2.3) and the q by the
following extension (Fraser, Reid & Wu, 1999) of (2.13) and (2.30):
where the numerator and denominator determinants are the full and nuisance in-
formation determinants recalibrated on the φ scale, and χ(θ) = u'φ is a rotated φ
coordinate based on a unit vector
perpendicular to ψ{θ(φ)} at the constrained maximum likelihood value θ̂_ψ.
For bridging the discontinuity at the maximum likelihood value ψ(θ̂) = ψ̂, the
calculations are more complex and we temporarily restrict our attention to O(n^{-1})
accuracy. Let ℓ(φ) = ℓ{θ(φ); y⁰} be the observed likelihood reexpressed in terms
of φ, and suppose it has been normed and recentered and rescaled so that φ̂ = 0,
ℓ(0) = ℓ_φ(0) = 0, and ℓ_φφ' = −I. Then in tensor summation notation we have
to second order. Also for convenience we restrict attention to a p = 2 dimensional
parameter.
For the scalar interest parameter ψ{θ(φ)} we suppose that the φ coordinates have
been rotated so that ψ(φ) = ψ̂ is tangent to φ₁ = 0, and ψ(θ) has been relocated
and rescaled so that ψ̂ = 0, ∂ψ/∂φ₁ = 1, at the maximum likelihood value φ = 0;
then ψ(φ) = φ₁ + cφ₂²/2n^{1/2}, where c is a second derivative measuring the curvature
of ψ = ψ̂ relative to φ at φ = 0.
The signed likelihood ratio for testing ψ can be calculated to the second order.
The maximum likelihood departure (2.33) uses a unit vector u which is the first
coordinate vector at ψ̂ = 0 and locally can change direction by O(n^{-1/2}); the de-
parture, however, is −φ₁ to second order based on the cosine of an O(n^{-1/2})
angle. The nuisance information is
It follows that
Combining (2.35) and (2.36) we obtain
from which it follows that

d = −(a_111 + 3a_122 − 3c)/6n^{1/2}.

The bridging p-value formula is then

p(ψ) = Φ(r) − dφ(r) = Φ(r − d)

to second order.
2.6 Graphical bridging of the singularity
For the case of a scalar full parameter we have seen in Section 2.3 that the
departure measures (2.15), (2.16) and (2.17) are linear in r to the third order and
thus provide simple third order bridging, using (2.31) and (2.32). For the more
general p-dimensional full parameter we have from Section 2.5 that the departure
measures are constant (2.38) to the second order.
The development (Fraser & Reid, 1995, 2000) of the p-value formulas from tan-
gent exponential model approximations records the p-value as a tail probability from
an adjusted asymptotic density; and Cheah et al (1995) show that such an adjusted
density is itself an asymptotic model. Together these show that the departure mea-
sures
are asymptotically linear in r to the third order under parameter change for fixed
data. This is of course consistent with the familiar location-scale standardizations
of the signed likelihood ratio that give a third order standard normal variable.
Now consider a particular assessment of a parameter ψ with given data, together
with possible instability in the p-value formulas (2.1) and (2.2). We propose plotting
d1 and d2 against the signed likelihood ratio r. Any instability in the p-value formulas
will show in d1 and d2, as Φ(r) is typically smooth. Accordingly we propose fitting
a line for d1 or d2 plotted against r, excluding the middle possibly unstable values
and the extreme values; the fitted d1 or d2 is then used with (2.31) and (2.32) to
bridge the singularity.
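The graphical procedure can be sketched in code. The following is our illustration, applied for simplicity to the scalar gamma model of Example 2.3 rather than the two-parameter model of Example 2.4: d2 = r − r* is computed on a grid of parameter values, a straight line is fitted by least squares after excluding a window around r = 0 and the extreme values, and the fitted line then replaces d2 in Φ(r − d2) near the maximum likelihood value. The constants θ̂ and ĵ = ψ'(θ̂) below are our computed values for that example.

```python
from math import erf, lgamma, log, sqrt

Phi = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))

y0, theta_hat = 10.0, 10.495838      # data and MLE from Example 2.3
jhat = 0.09996                       # approx. trigamma(theta_hat), observed info

def r_q(theta):
    ell = lambda t: -lgamma(t) + (t - 1.0) * log(y0)
    s = 1.0 if theta_hat > theta else -1.0
    r = s * sqrt(2.0 * max(ell(theta_hat) - ell(theta), 0.0))
    q = (theta_hat - theta) * sqrt(jhat)   # canonical-parameter departure
    return r, q

# Collect (r, d2) with d2 = r - r* = (1/r) log(r/q), excluding the possibly
# unstable middle |r| < 0.25 and the extremes |r| > 1.8, as the text prescribes.
pts = []
theta = 5.0
while theta <= 18.0:
    r, q = r_q(theta)
    if 0.25 < abs(r) < 1.8:
        pts.append((r, log(r / q) / r))
    theta += 0.1

# Least squares line d2 = b0 + b1*r through the retained points.
n = len(pts)
sx = sum(r for r, _ in pts); sy = sum(d for _, d in pts)
sxx = sum(r * r for r, _ in pts); sxy = sum(r * d for r, d in pts)
b1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
b0 = (sy - b1 * sx) / n

def p_bridged(theta):
    r, _ = r_q(theta)
    return Phi(r - (b0 + b1 * r))

print(b0, b1)              # compare 0.0526502 and 0.0027652 of Example 2.3
print(p_bridged(theta_hat))
```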
Example 2.4. Consider the gamma model with mean μ and shape parameter β,
and data from Fraser, Reid and Wong (1997):
152 152 115 109 137 88 94 77 160 165
125 40 128 123 136 101 62 153 83 69.
For testing the parameter μ we record the approximations Φ_LR(μ) and Φ_BN(μ) in
Figure 2.2. Note the aberrant behavior near the maximum likelihood value μ̂⁰ =
113.45. For bridging at the μ̂⁰ value we plot d1 and d2 from (2.39) against the likelihood
root r, in Figure 2.3. The bridging p-value using (2.32) with the marked segment
of the straight line fit for d2 is then recorded in Figure 2.2.
Figure 2.2: For the gamma model with mean μ and shape β, the p-value approx-
imations Φ_LR(μ) and Φ_BN(μ) for testing μ are plotted for a sample of 20. The
aberrant behavior at the maximum likelihood value is successfully bridged using
(2.32) together with a graphical d2 determined from Figure 2.3.
Figure 2.3: For the gamma model and data of Figure 2.2, the departure measures
d1 and d2 are plotted against the signed likelihood ratio r and a bridging straight
line is obtained graphically; d1 and d2 are so close that they overlap in this figure.
2.7 Interrelating two nonnormality measures
In this section we interrelate the two nonnormality measures of departure at an
arbitrary point (y₀, θ₀). For this, we take the likelihood centered exponential type asymp-
totic model (2.10) with coefficient array as in (2.9) (which we call model A hereafter),
which is centered at (y₀, θ̂(y₀)). So in this model the new coordinate of y₀ is zero.
We denote the new coordinate of θ₀ in this model by δ₀. The idea in interrelating
these two measures is to reexpress model A as the density centered exponential type
asymptotic model (2.10) with coefficient array as in (2.11) (which we call model B
hereafter), so that the α3, α4 and c of model B can be written in terms of the a3, a4 and c of
model A. From this, d1 and d2 can easily be rewritten in terms of a3, a4 and c.
From model A, we know that the density is
Now we find the maximum density point of the density with θ = δ₀. Setting the
gradient ℓ_y(θ; y) = 0 and solving for y with θ = δ₀ gives
To reexpress model A as model B, we need several steps. As in the procedure
used to develop the likelihood centered tangent exponential model, at each step we
will make a change of parameter and variable from (θ, y) with coefficients (a_ij) to
a new parameter and variable (φ, x) with new coefficients (A_ij), and record the new
coefficients A_ij as functions of the old. To avoid notational growth, we will then
replace φ, x, A by θ, y, a for the next step. The final transformation required can
be obtained by compounding those used in each step accordingly.
First we recenter the model at (δ₀, ŷ₀) using
φ = θ − δ₀,
x = y − ŷ₀;
consequently, the log density now has the form Σ_{ij} A_ij φ^i x^j / i!j!, where the coefficients
Our goal is to reexpress model A as model B so that we can get the relation
between a3, a4 and α3, α4, c. The coefficients in the first column will not be needed
to get that relation; accordingly, we will not calculate them in the subsequent steps.
Other coefficients are of order O(n^{-3/2}) and are ignored. All A_ij's are now changed
to lower case a_ij's and (x, φ) is changed to (y, θ). After this, we can go to the second
step: make the transformation
to standardize the variable with respect to its second derivative at the maximum
of the null density f(y; θ), and standardize θ to get a unit Hessian between the new
parameter φ and variable x. The resulting new coefficients are now
and A_31 = 0. We record the coefficients in an array where the first column values
are changed to lower case letters as before:
Again the variable and parameter (x, φ) are changed to (y, θ). Now we can make
a transformation so that a_12 is zero:
The new coefficients are now
with all other coefficients either unchanged or not of interest. In lower case, the
coefficient array is
The (x, φ) is changed to (y, θ). In the third step we recenter the variable so that
the new null density has its maximum at zero:
All the coefficients of interest are unchanged except A_01. After
this, once again (x, φ) is changed to (y, θ). To get model B, we only need one more
step: the fourth step. In this step, we make a transformation of the parameter φ
alone, leaving
x = y.
This transformation changes the remaining coefficient to zero with all other co-
efficients of interest unchanged. So model A finally has the coefficient array given.
This array has the form of the coefficient array (2.11) for model B. Comparing these
two arrays, we have
which gives the relation between a3, a4 and α3, α4, c:
to order O(n^{-3/2}). Then, substituting the above relation into (2.24) and (2.25), we
have, for fixed θ and varying y,
In particular, if δ₀ = 0, the relation (2.46) becomes α3 = a3, α4 = a4 − 3a3² − 6c to
order O(n^{-3/2}) and the nonnormality measures are reexpressed as
which are (2.26) and (2.27) respectively.
2.8 Discontinuity at the extremes
The Lugannani & Rice combining formula has another disadvantage in that it can
produce values outside the acceptable [0, 1] range for p-values. The mechanics of this
can be seen in the scalar parameter case with, say, large values of r. We consider this
from a distribution function viewpoint (fixed θ) rather than the p-value viewpoint
(fixed y).
For large values of r the first formula can be viewed using (2.14) as the integral of
a normal density φ(r) together with an adjustment factor r/q. The first correction
term to Φ(r) in (2.1) is φ(r)/r, which provides the Mills ratio evaluation of the
right tail of the normal. As the Mills ratio for the normal is typically on the large
side, this first correction can produce an approximate value greater than 1. The
second correction is r/q times the Mills ratio term and provides an adjusted Mills ratio
appropriate to the scaled density (2.14). If the right tail is very thin and r/q is small,
then this compensating adjustment may not be enough to bring the value below 1.
A reasonable objective is a modified formula that generally tracks the Lugannani &
Rice formula (2.1) but avoids the singularity just described.
Formula (2.1) can be written

Φ(r; d) = Φ(r) − dφ(r),   (2.48)

using the nonnormality measure d1 from Section 2.3, which takes the form (2.24)
in the fixed θ context. We can consider this for any fixed r and then examine
convergence as d goes to zero:

Φ(r; 0) = Φ(r) + O(n^{-3/2}),
Φ_d(r; 0) = −φ(r) + O(n^{-1}),   (2.49)
Φ_dd(r; 0) = 0 + O(n^{-1/2}),

where the subscripts denote differentiation. The second, Barndorff-Nielsen, formula
(2.2) can be written

Φ_BN(r; d) = Φ(r − r^{-1} log(1 + rd)) = Φ(r − d + rd²/2 − ···),

which of course satisfies (2.49), as is easily checked by differentiation or expansion.
For simulations the tail singularity with the Lugannani & Rice formula (2.1) can
be avoided by compressing towards the Barndorff-Nielsen formula (2.2); for the right
tail of the distribution function use
and for the left tail use
This retains the third order asymptotic property (2.49) but limits the value to being
at most half way from Φ₂(r, q) to the particular bound. Alternative proportions,
even proportions dependent on r, can replace the .50 above.
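The displays (2.50) and (2.51) are not reproduced here, but one reading of the compression rule — cap the Lugannani & Rice value at most half way from the Barndorff-Nielsen value toward the bound 1 on the right tail and 0 on the left — can be sketched as follows. This is our own illustration; the function names and the exact form of the cap are assumptions, not the thesis's formulas.

```python
from math import erf, exp, log, pi, sqrt

Phi = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))
phi = lambda x: exp(-0.5 * x * x) / sqrt(2.0 * pi)

def p_lr(r, q):
    return Phi(r) + phi(r) * (1.0 / r - 1.0 / q)

def p_bn(r, q):
    return Phi(r - log(r / q) / r)

def p_compressed(r, q):
    # Assumed compression: limit the Lugannani & Rice value to at most half
    # way from the Barndorff-Nielsen value toward the bound 1 (right tail)
    # or 0 (left tail); proportions other than .50 could replace 0.5.
    lr, bn = p_lr(r, q), p_bn(r, q)
    upper = bn + 0.5 * (1.0 - bn)
    lower = 0.5 * bn
    return min(max(lr, lower), upper)

# In the normal range the compression leaves (2.1) untouched:
print(p_lr(1.5, 1.6), p_compressed(1.5, 1.6))
# A configuration with r/q small where (2.1) strays below 0 gets clamped:
print(p_lr(-3.0, -100.0), p_compressed(-3.0, -100.0))
```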
Example 2.5. Consider the noncentral chi-square distribution with noncentrality
ε² = 1 and degrees of freedom f = 5. With the noncentrality as parameter we use the
asymptotic approximations (2.1) and (2.2) to approximate the distribution function
for the chi-squared variable η² when ε² = 1; see Figure 2.4. The modification
from (2.51) is also plotted there; it does avoid the long range of negative p-values
for small values of η², but still falls short of the exact. We do note that the left end
of the distribution corresponds to a singularity in the original model.
Figure 2.4: The approximations Φ_LR and Φ_BN for the distribution function of
the noncentral chi-squared distribution with degrees of freedom 5 and noncentrality
ε² = 1. The compression modification Φ_C avoids the negative values found with the
Φ_LR approximation. The horizontal axis is η², from 0 to 30.
Chapter 3
Approximating Tail Probabilities:
Gamma Combiner
In this chapter we develop a new combining formula for calculating third order tail probabilities; the formula uses asymptotic results derived from the gamma model. Consider inference for a scalar interest parameter ψ = ψ(θ) in a continuous statistical model with density f(y; θ), where y is a vector of length n and θ is a vector of length p. The inference is presented as an approximate p-value p(ψ) for assessing a hypothesised value ψ. To this end, we calculate the familiar first-order statistics: the signed likelihood ratio statistic r and the maximum likelihood departure q. Then we calculate the p-value by assuming that the r and q come from testing a hypothesis θ = θ₀ for the gamma model, Gamma(z; θ, p), using various values of θ₀ and various values of the shape parameter p. We shall show that for each pair (r, q), there exists a pair (θ, p) if some conditions are satisfied. We will also show that this new combining formula is a third order formula.
Section 3.1 discusses inference in the gamma statistical model. This discussion provides part of the logic for using the gamma as the basis for a new combining formula. Section 3.2 proposes what can be called a gamma combining formula for computing p-values. Section 3.3 gives some numerical results. In Section 3.4 we apply this combining formula to the noncentral chi-square distribution and compare the results to those obtained by other existing methods. Finally, we rigorously establish the third order property of this combining formula in Section 3.5.
3.1 Inference in the gamma model
Assume that y is an observation from the gamma model with density

f(y; θ, p) = Γ^{-1}(p) θ^p y^{p-1} e^{-θy},    (3.1)

where y is a scalar variable, θ is a scalar parameter of interest, and p is a mathematical parameter which we assume is a certain number whose value will be discussed in the next section. For computing the p-value, we would begin with calculating two first order statistics r and q. The log-likelihood function for this model is ℓ(θ) = ℓ(θ; y) = log f(y; θ, p) = p log θ - yθ, where additive constants free of θ, including log Γ^{-1}(p), are ignored. From this we have the score function ∂ℓ/∂θ = p/θ - y, and the derivative of the score function ∂²ℓ/∂θ² = -p/θ². Hence, the maximum likelihood estimate of θ is obtained by solving the score equation ℓ′(θ̂) = 0, and the result is θ̂ = p/y. The observed Fisher information is then

ĵ = -∂²ℓ/∂θ²|_{θ=θ̂} = p/θ̂² = y²/p.

Therefore, the signed likelihood ratio statistic is given by

r = sgn(p - z) {2(z - p + p log p - p log z)}^{1/2},    (3.2)

and the maximum likelihood departure is given by

q = (p - z)/p^{1/2},    (3.3)

where z = yθ. It is not difficult to prove that the above r and q have the relationship (3.4). Notice that q > 0 if and only if p - yθ > 0, or equivalently, y < p/θ. So we can see that the gamma distribution has a thin tail on the right and a thick tail on the left.
The above r and q can be used to make inferences in the gamma model, such as testing any hypothesis θ = θ₀. The p-value can be calculated based on formula (2.12) or (2.13). This, however, is not our purpose here. Instead, the r and q form the basis for us to develop the gamma combining formula later in this chapter.
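The closed forms (3.2) and (3.3) are easy to check numerically. The following sketch (Python; the function names are illustrative, not from the thesis) verifies that the closed form for r agrees with the direct definition r = sgn(θ̂ - θ){2(ℓ(θ̂) - ℓ(θ))}^{1/2} at an arbitrary point:

```python
import math

def gamma_r_q(y, theta, p):
    """r and q of Section 3.1 for testing theta in the gamma model
    with shape p, using z = y*theta and theta_hat = p/y."""
    z = y * theta
    r = math.copysign(
        math.sqrt(2.0 * (z - p + p * math.log(p) - p * math.log(z))), p - z)
    q = (p - z) / math.sqrt(p)
    return r, q

def loglik(theta, y, p):
    # log-likelihood p*log(theta) - y*theta, additive constants dropped
    return p * math.log(theta) - y * theta

# Direct definition: r = sgn(theta_hat - theta)*sqrt(2*(l(theta_hat)-l(theta)))
y, theta, p = 1.3, 2.0, 4.0
theta_hat = p / y
r, q = gamma_r_q(y, theta, p)
r_direct = math.copysign(
    math.sqrt(2.0 * (loglik(theta_hat, y, p) - loglik(theta, y, p))),
    theta_hat - theta)
```

The check uses y = 1.3, θ = 2, p = 4, for which z = 2.6 < p, so r > q > 0.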
3.2 An alternative third order combining formula:
the gamma combiner
The usefulness of the gamma model (3.1) in computing a p-value for a general continuous statistical model is that we can incorporate the mathematical parameter p in such a way that an inference in a general model can correspond to one in a gamma model in some sense. This correspondence provides the basis for us to use the gamma model to compute p-values in a general model. The property (3.4) of the gamma model implies that it is useful for both thin-tailed and thick-tailed general models. In fact, the p-value in a general model can be approximated by using the gamma model if we can properly choose its tails. First, one can show that for any r and q such that rq > 0: if r > q, the system

r = sgn(p - z) {2(z - p + p log p - p log z)}^{1/2},
q = (p - z)/p^{1/2},    (3.5)

has one unique solution (p, z); and if r < q, the corresponding system (3.6), with the roles of the tails switched, has one unique solution (p, z).
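A workable way to solve the system (3.5) numerically, using the reduction to the single variable w = z/p that also appears in Section 3.5 (so that r²/q² = 2(w - 1 - log w)/(w - 1)² and p = q²/(1 - w)²), is plain bisection; this is an implementation sketch, not the thesis's algorithm:

```python
import math

def solve_gamma_pair(r, q):
    """Invert the system (3.5): given r > q > 0, find (p, z) with
    r = sqrt(2*(z - p + p*log p - p*log z)) and q = (p - z)/sqrt(p).
    With w = z/p in (0, 1), r^2/q^2 = 2*(w - 1 - log w)/(w - 1)^2,
    which is solved by bisection; then p = q^2/(1 - w)^2 and z = w*p."""
    assert r > q > 0.0
    ratio = (r / q) ** 2
    F = lambda w: 2.0 * (w - 1.0 - math.log(w)) / (w - 1.0) ** 2
    lo, hi = 1e-300, 1.0 - 1e-12   # F decreases from +inf toward 1 on (0, 1)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if F(mid) > ratio:
            lo = mid
        else:
            hi = mid
    w = 0.5 * (lo + hi)
    p = q * q / (1.0 - w) ** 2
    return p, w * p

# Round trip: start from (p, z) = (4, 2.6), form (r, q), invert back.
p0, z0 = 4.0, 2.6
r0 = math.sqrt(2.0 * (z0 - p0 + p0 * math.log(p0 / z0)))
q0 = (p0 - z0) / math.sqrt(p0)
p_hat, z_hat = solve_gamma_pair(r0, q0)
```

A round trip from (p, z) = (4, 2.6) through (r, q) and back recovers the original pair.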
Now we are ready to put forward an alternative third order combining formula, which can be called the gamma combining formula.

Theorem 3.1 Assume y₁, …, y_n are observations from a general continuous statistical model f(y; θ) with good asymptotic properties as outlined in Fraser, Reid, & Wu (1999), where θ is a p-dimensional parameter. Also assume that ψ(θ) is a scalar interest parameter, r is the usual signed square root of the likelihood ratio statistic given by (1.17), and q is calculated as in (1.21). Then to third order the p-value can be approximated by the gamma tail probability (3.8) determined by G_p(z), where (p, z) is the unique solution to the system (3.5) or (3.6) according as r > q or r < q, and

G_p(z) = ∫₀^z Γ^{-1}(p) x^{p-1} e^{-x} dx    (3.9)

is the cumulative distribution function of the gamma distribution with density Γ^{-1}(p) x^{p-1} e^{-x}.
Since the system (3.5) is the same as (3.2) and (3.3), we can see that any p-value in a general model with r > q can be approximated by the p-value in a gamma model for testing the hypothesis θ = θ₀ with observation y₀, where z = y₀θ₀. For the case r < q, we need to switch the tails in some sense.
We make some comments here: if the statistical model is really a gamma model with θ being the only unknown parameter, then this combining formula gives exact p-values for testing θ. Let us look at how badly the p-values can be approximated by the combining formulas such as the Barndorff-Nielsen formula (2.2) or the Lugannani & Rice formula (2.1) when a model has a density peak near zero. Consider the gamma model

f(y; θ, β) = Γ^{-1}(β) θ (yθ)^{β-1} e^{-θy}

with β = 0.1. Table 3.1 gives the approximations to p-values for testing the hypothesis θ = θ₀ for different observed y₀. Our proposed gamma combiner gives exact p-values for this model, while neither the Barndorff-Nielsen formula nor the Lugannani & Rice formula can give good approximations, though the former does slightly better than the latter. This suggests that when a model is a gamma model or is close to a gamma model, especially when the model has a high peak near zero like this gamma model, one may need formulas like the gamma combiner to calculate p-values with the required accuracy. In fact, this is one of the advantages of the gamma approximation over other combining formulas. It can provide better approximations or even exact p-values in some important target models.
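To illustrate the comparison behind Table 3.1, the sketch below evaluates the two combining formulas against the exact gamma tail for β = 0.1; the observed value y₀ = 0.05, and the upper-tail orientation of the reported p-value, are illustrative choices made here, not values taken from the table:

```python
import math

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def gamma_cdf(p, z, terms=200):
    # lower regularized incomplete gamma via its power series
    s, term = 0.0, 1.0 / p
    for k in range(terms):
        s += term
        term *= z / (p + k + 1.0)
    return (z ** p) * math.exp(-z) * s / math.gamma(p)

# Gamma model with shape beta = 0.1, testing theta0 = 1 at observed y0.
beta, theta0, y0 = 0.1, 1.0, 0.05
z = y0 * theta0
r = math.copysign(
    math.sqrt(2.0 * (z - beta + beta * math.log(beta / z))), beta - z)
q = (beta - z) / math.sqrt(beta)

p_exact = 1.0 - gamma_cdf(beta, z)             # what the gamma combiner returns
p_LR = Phi(r) + phi(r) * (1.0 / r - 1.0 / q)   # Lugannani & Rice (2.1)
p_BN = Phi(r - math.log(r / q) / r)            # Barndorff-Nielsen (2.2)
```

With the density peak at zero, both approximations miss the exact value by a wide margin, the Barndorff-Nielsen form somewhat less badly, which is the pattern reported for Table 3.1.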
We will postpone establishing the third order property of the gamma combining formula to Section 3.5.

Table 3.1: The gamma model f(y; θ, β) = Γ^{-1}(β) θ (yθ)^{β-1} e^{-θy} with β = 0.1: approximations to the p-values for testing θ = 1 with different observed y₀.
3.3 Some numerical studies
In this section, we want to see the performance of the gamma combining formula and compare it with the Lugannani & Rice formula (1.11) and the Barndorff-Nielsen formula (1.15). As the first example, we check the performance of the gamma formula in the context of the normal distribution.
Example 3.1. Let y₁, y₂, …, y_n be a sample from N(μ, σ²) with unknown mean μ and unknown variance σ². Suppose we want to test the hypothesis H₀ : μ = 0. In this example, we know the usual t test is an exact test, so it is easy for us to compare the performance of these third order formulas.
The log likelihood function is

ℓ(μ, σ) = -n log σ - (1/(2σ²)) Σ_{i=1}^n (y_i - μ)².

It follows that the maximum likelihood estimates of μ and σ² are given by μ̂ = ȳ and σ̂² = Σ_{i=1}^n (y_i - μ̂)²/n, respectively. We also need the constrained maximum likelihood estimate of σ²: σ̂²_μ = Σ_{i=1}^n (y_i - μ)²/n. Therefore the signed likelihood ratio statistic for testing the hypothesis that μ is the true value is

r = sgn(μ̂ - μ) {n log(σ̂²_μ/σ̂²)}^{1/2},

and the standardized departure of the maximum likelihood estimate is calculated by using (1.21); the resulting quantity is

q = n^{1/2} (ȳ - μ) σ̂ / σ̂²_μ.
Then the p-value for the one-sided test of μ will be given by (1.11) or (1.15).

The exact t-test statistic is

t = (ȳ - μ)/(s/√n),

where s is the sample standard deviation defined by s² = Σ_{i=1}^n (y_i - ȳ)²/(n - 1). The t statistic follows the t distribution with n - 1 degrees of freedom.
It is not difficult to write r and q in terms of the t statistic as

r = sign(t) {n log(1 + t²/(n - 1))}^{1/2},
q = t {1 + t²/(n - 1)}^{-1} {n/(n - 1)}^{1/2}.
Since the t distribution converges to the standard normal distribution as the degrees of freedom go to infinity, and r, q, and r* all also converge to a standard normal random variable, they give very similar approximations when the sample size n is very large. Thus, to better see the difference in the performance of these methods, we choose a small sample size, say n = 3. In this case, the t distribution has tails much heavier than the standard normal distribution. Figure 3.1 below shows that the gamma combining formula and the Barndorff-Nielsen combining formula both give very good approximations to the t distribution.
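The comparison in Figure 3.1 and Table 3.2 can be reproduced in a few lines. The closed form used below for q in terms of t is a reconstruction consistent with the Student combiner equations of Chapter 4 and should be treated as an assumption, not a quotation:

```python
import math

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# n = 3 normal sample, f = n - 1 = 2 degrees of freedom, observed t = 2.
n, t = 3, 2.0
f = n - 1
r = math.copysign(math.sqrt(n * math.log1p(t * t / f)), t)
# q in terms of t: a reconstructed closed form, treated as an assumption.
q = t * math.sqrt(n / f) / (1.0 + t * t / f)

p_exact = 0.5 - t / (2.0 * math.sqrt(f + t * t))  # upper tail of Student(2)
p_r = 1.0 - Phi(r)                                 # first order only
p_BN = 1.0 - Phi(r - math.log(r / q) / r)          # Barndorff-Nielsen (1.15)
```

At n = 3 the Student(2) tail is heavy, and the first-order value Φ(r) misses it badly while the Barndorff-Nielsen combination lands close to the exact upper-tail probability, as the figure indicates.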
Our purpose here of course is not to replace the exact t-test by our proposed gamma formula or the Barndorff-Nielsen formula. Instead, we try to get a feeling for how the gamma formula behaves so we can apply it when the exact p-value is either unavailable or very hard to get.
Figure 3.1: Approximations to the t distribution by different formulas: the Barndorff-Nielsen formula, the gamma formula, and the signed likelihood ratio statistic.
Table 3.2 gives a more detailed comparison. From this table we can see that the gamma formula gives very good approximations to probability at the middle quantiles of the Student distribution.

Although the gamma formula has some numerical problems and gives slightly thinner approximations to probability at the quantiles in the far tails of the Student distribution, it does give approximations as good as those given by the Barndorff-Nielsen formula or the Lugannani & Rice formula.
The numerical problems arise when we try to solve the equation in w that determines the solution of (3.5). Its unique root is so close to zero (in fact 0 < w < 10^{-…}) if r/q > 45 that a computer will set this root equal to zero. But to get the p-value we need a non-zero root with higher accuracy. As a result, the gamma approximation should be avoided for a range of such cases.
Example 3.2. Logistic model. This is a pure location model with one observation and with an error distribution given by the logistic density; thus the model for y is

f(y; θ) = e^{y-θ}/(1 + e^{y-θ})².

For the hypothesis θ = θ₀ the signed likelihood ratio statistic is r = sgn(y - θ₀){2(ℓ(θ̂) - ℓ(θ₀))}^{1/2}
Table 3.2: Approximations to the Student(2) distribution by the Barndorff-Nielsen formula, the Lugannani & Rice formula and the gamma formula (3.8), at a range of exact probabilities and t quantiles; entries for the gamma formula in the far tails are NA.
and the standardized departure of the maximum likelihood estimate is computed from (1.21). The numerical results are shown in Table 3.3, where the approximations to p-values correspond to various exact p-values for these tests. Also p-values from the Bartlett correction are reported. The Bartlett correction factor (1.24) is E(w) ≈ 23/20 when computed from the general asymptotic expression, which involves cumulants of order up to four, in the uniform distribution (Skovgaard, 2000).
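A sketch of this logistic calculation follows; the form of q, computed in the reparameterization φ(θ) = ℓ_{;y}(θ; y⁰) from the general theory cited in Chapter 1, is a reconstruction and should be read as an assumption:

```python
import math

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Location logistic: f(y; theta) = exp(y-theta)/(1+exp(y-theta))^2,
# observed y0 = 0, hypothesis theta0 = 2 (illustrative values).
y0, theta0 = 0.0, 2.0

def loglik(theta):
    return (y0 - theta) - 2.0 * math.log1p(math.exp(y0 - theta))

theta_hat = y0          # the MLE of a pure location parameter is y0
r = math.copysign(
    math.sqrt(2.0 * (loglik(theta_hat) - loglik(theta0))), theta_hat - theta0)

# q in the reparameterization phi(theta) = d/dy log f(y; theta) at y0,
# which here is tanh((theta - y0)/2); observed information jhat = 1/2
# and phi'(theta_hat) = 1/2 (reconstruction, treated as an assumption).
phi_of = lambda th: math.tanh((th - y0) / 2.0)
q = (phi_of(theta_hat) - phi_of(theta0)) * math.sqrt(0.5) / 0.5

p_r = Phi(r)                                   # first order
p_BN = Phi(r - math.log(r / q) / r)            # Barndorff-Nielsen (1.15)
p_exact = 1.0 / (1.0 + math.exp(theta0 - y0))  # logistic cdf at y0
```

The Barndorff-Nielsen value sits much closer to the exact logistic tail probability than Φ(r), which is the pattern reported in Table 3.3.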
From this table, we know that p-values from the Barndorff-Nielsen formula, the gamma formula and the Bartlett correction are very similar, but the first two, especially the gamma formula, give much better approximations than the signed square root of the likelihood ratio statistic. Skovgaard (2000) gave a similar comparison of these methods, excluding the gamma formula.

Table 3.3: Test in the location logistic model. Approximations to p-values by different methods: the signed likelihood ratio, the Barndorff-Nielsen formula, the gamma formula and the Bartlett correction. Columns: exact p-value | p-value by r | Barndorff-Nielsen | Gamma | Bartlett.
We also compare approximations to the significance function, a function giving the p-value for different θ₀ for given data. Here we assume, without loss of generality, that the observed y is 0. The significance function approximations are plotted in Figure 3.2. This figure shows that the approximations given by the gamma formula and the Barndorff-Nielsen formula are almost the same as the exact function, while the likelihood root r gives an unsatisfactory approximation to the significance function.
3.4 Inference for the noncentral chi-squared distribution
As another example, in this section we use the gamma combiner to approximate the noncentral chi-squared distribution and compare its accuracy with that of several other methods.

The noncentral chi-squared distribution is often encountered when we want to calculate the power of tests on the mean of a multivariate normal distribution; for details, see for example Anderson (1975) and Patnaik (1949). It arises naturally from the normal distribution. More specifically, if y₁, y₂, …, y_n are independent and normally distributed with mean μ_i and variance 1, then

η² = Σ_{i=1}^n y_i²

follows the chi-square distribution with n degrees of freedom and noncentrality θ² = Σ_{i=1}^n μ_i². Usually we want to calculate its cumulative distribution function, which can be written as the series (3.10) below.
Figure 3.2: Logistic model f(y; θ) = e^{y-θ}/(1 + e^{y-θ})² with observation y = 0. Approximations to the significance function by different methods: the gamma formula, the Barndorff-Nielsen formula, and the signed likelihood ratio statistic.
G_{θ²}(η²) = Σ_{k=0}^∞ e^{-θ²/2} (θ²/2)^k / k! · Pr{χ²_{n+2k} ≤ η²},    (3.10)

where χ²_{n+2k} is chi-squared distributed with n + 2k degrees of freedom; see, for example, Johnson & Kotz (1970, p. 132) for details.
Many authors have studied this noncentral chi-squared distribution in order to avoid using the infinite series (3.10). Among them, Bol'shev and Kuznetzov (1963) obtained an approximation (3.11). Cox and Reid (1987) proposed two simpler approximations, (3.12) and (3.13). The approximation (3.12) was obtained by inverting the cumulant generating function and using an Edgeworth expansion, while (3.13) can be seen as its asymptotic expansion.
Cohen (1988) provided a procedure for evaluating a noncentral χ² distribution. This method uses a tabulation of the three lowest degrees of freedom of the noncentral chi-square distribution function, or equivalently an effective computer algorithm for their evaluation, and it requires recursive evaluation.
Wu (1999) developed a saddlepoint type of approximation. To get this approximation, first we note the density of η² (Johnson and Kotz, 1970); its moment generating function is

M(s) = (1 - 2s)^{-n/2} exp{θ²s/(1 - 2s)},

and its cumulant generating function is

K(s) = log M(s) = -(n/2) log(1 - 2s) + θ²s/(1 - 2s).

The saddlepoint density approximation is then given as (3.14), where K″(s) is the second derivative of K(s) and ŝ is the saddlepoint given by K′(ŝ) = η². Therefore the tail probability based on the Lugannani-Rice formula (2.11) is obtained with

r = sgn(ŝ) {2(ŝη² - K(ŝ))}^{1/2},  q = ŝ {K″(ŝ)}^{1/2}.
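For this cumulant generating function the saddlepoint equation K′(ŝ) = x can be solved in closed form, so the whole tail approximation fits in a few lines; the sketch below uses the standard Lugannani & Rice tail form, with illustrative names:

```python
import math

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi_dens(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def K(s, n, lam):
    # cgf of the noncentral chi-square with n df and noncentrality lam
    return -0.5 * n * math.log1p(-2.0 * s) + lam * s / (1.0 - 2.0 * s)

def saddlepoint_sf(x, n, lam):
    """Lugannani & Rice tail approximation Pr(eta^2 > x).  With
    u = 1/(1 - 2s), the saddlepoint equation K'(s) = x becomes the
    quadratic lam*u^2 + n*u = x, solved in closed form.  Breaks down
    near x = n + lam, where the saddlepoint is zero."""
    u = (-n + math.sqrt(n * n + 4.0 * lam * x)) / (2.0 * lam)
    s = 0.5 * (1.0 - 1.0 / u)
    w = math.copysign(math.sqrt(2.0 * (s * x - K(s, n, lam))), s)
    Kpp = 2.0 * n * u * u + 4.0 * lam * u ** 3   # K''(s)
    v = s * math.sqrt(Kpp)
    return 1.0 - Phi(w) + phi_dens(w) * (1.0 / v - 1.0 / w)

p_sp = saddlepoint_sf(10.0, 5, 1.0)   # n = 5, noncentrality theta^2 = 1
```

For n = 5 and θ² = 1 the approximate upper-tail probability at η² = 10 is close to the exact series value near 0.138; the zero-saddlepoint breakdown at x = n + θ² is exactly the failure region noted later in this section.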
Fraser, Wong & Wu (1998) discussed what is called the double saddlepoint approximation (Reid, 1995). This approximation uses the Lugannani & Rice formula (1.11) or the Barndorff-Nielsen formula (1.15), in which q is obtained by (1.21). The main idea is to reparameterize the problem. More specifically, let y_i = ρα_i + e_i, i = 1, 2, …, n, where α is an n-dimensional vector and e₁, e₂, …, e_n are a sample from the normal distribution with mean 0 and known variance 1. The parameter is θ = (ρ, α), with ρ taken as the parameter of interest.
The probability to the left of an observed η² for the noncentral chi-squared distribution with n degrees of freedom and noncentrality θ² is given by G_{θ²}(η²) as in (3.10). This probability can be approximated by the Barndorff-Nielsen formula (1.15) or the Lugannani & Rice formula (1.11), with third order accuracy. There the r and q are given by (3.15) and (3.16).
Alternatively, we may use the gamma combining formula (3.8) with the same r and q above. Table 3.4 and Table 3.5 below give numerical comparisons of these methods for some selected values of the distribution function of the noncentral chi-square distribution with n degrees of freedom and noncentrality θ². Part of the tables is cited here from Fraser, Wong & Wu (1998) and Wu (1999). In all these examples, the exact probabilities are obtained by the truncated series

Σ_{k=0}^N e^{-θ²/2} (θ²/2)^k / k! · Pr{χ²_{n+2k} ≤ η²},

where N is an integer chosen to make the neglected terms negligible.
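Computing the exact probabilities by truncating the series (3.10) can be sketched as follows; the chi-square distribution function is evaluated through the incomplete gamma series, and the truncation point is chosen by accumulated Poisson weight rather than a fixed N (an implementation choice, not the thesis's rule):

```python
import math

def chi2_cdf(x, k):
    # cdf of the central chi-square with k df:
    # regularized lower incomplete gamma at (k/2, x/2), by power series
    a, z = 0.5 * k, 0.5 * x
    s, term = 0.0, 1.0 / a
    for j in range(500):
        s += term
        term *= z / (a + j + 1.0)
    return (z ** a) * math.exp(-z) * s / math.gamma(a)

def ncx2_cdf(x, n, lam, tol=1e-12):
    """(3.10): G(x) = sum_k e^{-lam/2}(lam/2)^k/k! * F_{chi2_{n+2k}}(x),
    truncated once the remaining Poisson mass is below tol."""
    half = 0.5 * lam
    w = math.exp(-half)        # Poisson weight at k = 0
    total, mass, k = 0.0, 0.0, 0
    while mass < 1.0 - tol:
        total += w * chi2_cdf(x, n + 2 * k)
        mass += w
        k += 1
        w *= half / k
    return total
```

Two built-in checks: for 2 degrees of freedom chi2_cdf reduces to 1 - e^{-x/2}, and as the noncentrality goes to zero ncx2_cdf collapses to the central chi-square distribution function.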
We can see, from these two tables, that Cox & Reid's approximations (3.12) and (3.13) seem unsatisfactory: the former is good only for small probabilities and the latter is even worse; so is Bol'shev & Kuznetzov's approximation (3.11). The saddlepoint approximation (3.14) breaks down in a large neighborhood of the value of η² for which the saddlepoint is zero. In addition, its performance elsewhere is not as good as the double saddlepoint approximation using the Barndorff-Nielsen formula (1.15) or the Lugannani & Rice formula (1.11) based on r in (3.15) and q in (3.16). Generally speaking, the Barndorff-Nielsen method provides the best approximation to the noncentral chi-square distribution. The gamma combining formula (3.8) can also give very good approximations to the noncentral chi-squared distribution, especially for a medium large number of degrees of freedom and a large noncentrality. Although it may sometimes have numerical problems, as reported in Example 3.1 in the previous section, when r and q are quite different in magnitude, say when r/q > 45 or q/r > 45, it outperforms the double saddlepoint approximations in many cases.
Figure 3.3 and Figure 3.4 compare the approximations to the noncentral chi-squared distribution by different methods. From these figures we can see the performance of these methods over the whole range of the variable η². The approximations to the noncentral chi-squared distribution with degrees of freedom n = 5 and noncentrality θ² = 100 by the Barndorff-Nielsen method, the Lugannani-Rice method and the Gamma method are so close to the exact distribution that they overlap in Figure 3.3. Figure 3.4 shows that these three methods give very good approximations, though the Barndorff-Nielsen method gives the best one, for the case with n = 10 and θ² = 25.
Table 3.4: Approximations to G_{θ²}(η²) with n = 2. Methods compared: Cox & Reid (3.12), Cox & Reid (3.13), Bol'shev (3.11), Saddlepoint (3.14), Lugannani & Rice (1.11), Barndorff-Nielsen (1.15), Gamma (3.8), Exact (3.10).

Table 3.5: Approximations to G_{θ²}(η²) with n = 5. Methods compared: Cox & Reid (3.12), Cox & Reid (3.13), Bol'shev (3.11), Saddlepoint (3.14), Lugannani & Rice (1.11), Barndorff-Nielsen (1.15), Gamma (3.8), Exact (3.10).

Table 3.6: Approximations to G_{θ²}(η²) with n = 10. Methods compared: Cox & Reid (3.12), Cox & Reid (3.13), Bol'shev (3.11), Saddlepoint (3.14), Lugannani & Rice (1.11), Barndorff-Nielsen (1.15), Gamma (3.8), Exact (3.10).
Figure 3.3: Comparison of approximations to the noncentral χ² with degrees of freedom n = 5 and noncentrality θ² = 100. The two Cox & Reid methods are (3.12) and (3.13) respectively.

Figure 3.4: Comparison of approximations to the noncentral χ² with degrees of freedom n = 10 and noncentrality θ² = 25. The two Cox & Reid methods are (3.12) and (3.13) respectively.
3.5 Theoretical work on the Gamma Combiner
In this section, we give the proof of Theorem 3.1. That is, we establish the third order property of the Gamma Combiner.

Without loss of generality, we simply assume hereafter r > q > 0. Then (p, z) is the solution to the equation system

r = {2(z - p + p log p - p log z)}^{1/2},  q = (p - z)/p^{1/2}.    (3.17)

Other cases can be proved in a similar way.
We will proceed in several steps. Since we are developing asymptotic results, all the arguments here are asymptotically correct to the appropriate order. First we note from Section 2.4 and Section 2.5 that, for a general model with the asymptotic properties, r and q satisfy (3.18), where a is a certain constant determined by the model.
Lemma 3.1 Let n denote the size of the sample from which r and q are calculated. Then p → ∞ in probability as n → ∞.

Proof. We know (p, z) is the solution to the equation system (3.17). Let w = z/p; then w must satisfy F(w) - r²/q² = 0, where

F(w) = 2(w - 1 - log w)/(w - 1)²

is strictly monotonic, so that w = F^{-1}(r²/q²) → 1 as n → ∞, since r/q → 1. Therefore p = q²/(1 - w)² → ∞ as n → ∞.
Lemma 3.2 The r and q in (3.17) satisfy (3.19)-(3.21).

Proof. Expanding the r in (3.17), we obtain (3.19). It is not difficult to prove (3.20) and (3.21) based on (3.19).

By comparing (3.18) and (3.21), one obtains

Lemma 3.3 The solution p has the same order of magnitude as the sample size n.
In order to simplify the calculations in the proof of Theorem 3.1, we also need the Hermite polynomials, which are defined by

H_n(x) φ(x) = (-1)^n (d^n/dx^n) φ(x).    (3.22)

The explicit forms of the Hermite polynomials H₀(x) to H₅(x) are as follows (see for example Barndorff-Nielsen and Cox 1989):

H₀(x) = 1,  H₁(x) = x,  H₂(x) = x² - 1,  H₃(x) = x³ - 3x,
H₄(x) = x⁴ - 6x² + 3,  H₅(x) = x⁵ - 10x³ + 15x.

Hermite polynomials have many interesting properties. A property used in our proof is the relationship in the following lemma.

Lemma 3.4 For Hermite polynomials,

∫_z^∞ H_n(x) φ(x) dx = H_{n-1}(z) φ(z).    (3.23)

Proof. From definition (3.22) of the polynomials, we have

H_n(x) φ(x) = -(d/dx){H_{n-1}(x) φ(x)};    (3.24)

integrating both sides of (3.24) yields (3.23).
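The reconstructed form of Lemma 3.4 can be checked numerically; in the sketch below the polynomial recurrence and the trapezoid-rule integration are implementation choices:

```python
import math

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def hermite(n, x):
    """Probabilist's Hermite polynomial He_n(x) via the recurrence
    He_{k+1}(x) = x*He_k(x) - k*He_{k-1}(x)."""
    h0, h1 = 1.0, x
    if n == 0:
        return h0
    for k in range(1, n):
        h0, h1 = h1, x * h1 - k * h0
    return h1

def tail_integral(n, z, upper=12.0, steps=50000):
    # trapezoid rule for the integral of He_n(x)*phi(x) over [z, upper];
    # the normal density makes the tail beyond `upper` negligible
    h = (upper - z) / steps
    total = 0.5 * (hermite(n, z) * phi(z) + hermite(n, upper) * phi(upper))
    for i in range(1, steps):
        x = z + i * h
        total += hermite(n, x) * phi(x)
    return total * h

# Lemma 3.4 (reconstructed): integral_z^inf H_n(x) phi(x) dx = H_{n-1}(z) phi(z)
z, n = 1.3, 4
lhs = tail_integral(n, z)
rhs = hermite(n - 1, z) * phi(z)
```

The two sides agree to the accuracy of the quadrature, which is consistent with the integration-by-parts argument in the proof above.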
Since the Lugannani & Rice combining formula (1.11) has third order accuracy, it suffices to show that (3.25) holds. Now we are ready to prove it.
The proof of Theorem 3.1: For r > q > 0, we have (3.26). Note that (3.26) is the probability Pr{U > z}, where the random variable U follows the gamma distribution with density

f(u) = Γ^{-1}(p) u^{p-1} e^{-u},  u > 0.

We know U has mean p and variance p. As we want to approximate the probability (3.26) by the standard normal density and distribution function as in (3.25), we need to standardize the gamma variable U. Let

V = (U - p)/p^{1/2}.

Then V has mean zero and standard deviation one, and (3.27) follows, where f(v) is the density of V. We need the asymptotic expansion of this density in terms of the standard normal density φ(v). By substituting the asymptotic expansion

log Γ(p) = (p - 1/2) log p - p + (1/2) log(2π) + 1/(12p) + O(p^{-3})

(see Abramowitz and Stegun 1972, Lawless 1982) into log f(v), we obtain (3.28), where φ(·) is the density of the standard normal. After substituting (3.28) into (3.27), we have (3.29). Finally the substitution of (3.20) and (3.21) into (3.29) and some algebraic calculation yield (3.25).
Chapter 4
Approximating Tail Probabilities: the Student t Combiner
In this chapter we shall develop another third order combining formula. The idea is that, for asymptotic inference in a general statistical model, we first calculate the signed likelihood ratio statistic r and the standardized departure of the maximum likelihood estimate q. We think of this pair r and q as coming from a location Student distribution model where we test the location parameter. We then seek to determine the degrees of freedom and the corresponding quantile of the t distribution. The desired p-value is finally approximated by the distribution function of Student's t distribution, which is readily available in any standard textbook.
4.1 Why the Student t combiner?
As we know, the Barndorff-Nielsen formula and the Lugannani & Rice formula are both third order formulas for approximating tail probabilities. They give very good approximations in many situations, even when the sample size is very small. However, they give only approximate, not exact, tail probabilities even in some very simple cases. Example 3.1 gives such a case, where y₁, y₂, …, y_n are independent and identically normally distributed with unknown mean μ and unknown variance σ². We know the log likelihood function is

ℓ(μ, σ) = -n log σ - (1/(2σ²)) Σ_{i=1}^n (y_i - μ)².

Therefore the signed likelihood ratio statistic for testing the hypothesis that μ = μ₀ is the true value is

r = sgn(μ̂ - μ₀) {n log(σ̂²_{μ₀}/σ̂²)}^{1/2},

and the standardized departure of the maximum likelihood estimate can be calculated by using (1.21). Then an approximation to the p-value of the one-sided test of μ = μ₀ is p(μ₀) or 1 - p(μ₀), where p(μ₀) is given by the Lugannani-Rice formula (1.11) or the Barndorff-Nielsen formula (1.15). However, we know the Student t statistic

t = (ȳ - μ₀)/(s/√n)

gives an exact test. Student-like distributions are very common in practice, and they have thicker tailed probability densities than the normal. A question naturally arises: can we have a new combiner that can process r and q better and give a better approximation to the p-value?
For this, it is not difficult to write r and q in terms of the t statistic and degrees of freedom f, where f = n - 1, as

r = sign(t) {(f + 1) log(1 + t²/f)}^{1/2},    (4.1)
q = t {1 + t²/f}^{-1} {(f + 1)/f}^{1/2}.    (4.2)

A natural solution would be, for a Student-like statistical model, to calculate r and q in the usual way, then solve (4.1) and (4.2) for t and f; the desired p-value is obtained as H_f(t), where H_f denotes the cumulative distribution function of the Student t distribution with f degrees of freedom.

Of course, in order for the equation system consisting of (4.1) and (4.2) to have a solution (t, f), it is not difficult to show that r and q should satisfy r/q > 1. In other cases, the method provided here should be avoided.
4.2 Inference in the location Student model

Suppose y comes from a location Student t distribution with f degrees of freedom. That is,

y = μ + e,

where e follows the Student distribution with f degrees of freedom. Therefore y has the density function f(y; μ) = g(y - μ), where g(t) is the density of the Student distribution with f degrees of freedom:

g(t) = Γ((f + 1)/2) / {Γ(f/2) (fπ)^{1/2}} · (1 + t²/f)^{-(f+1)/2}.
If we want to test the hypothesis μ = μ₀, we proceed as follows. The log likelihood function is

ℓ(μ) = -((f + 1)/2) log{1 + (y - μ)²/f},

and from this we obtain the score function and the derivative of the score function. Therefore the maximum likelihood estimate of μ is μ̂ = y, and the observed Fisher information is ĵ_{μμ} = (f + 1)/f. Working in the corresponding new parameterization, we obtain the departure (4.5), and it is not difficult to get the signed likelihood ratio statistic (4.6). Therefore, comparing (4.6) and (4.5) with (4.1) and (4.2), we know the Student combiner gives an exact p-value for testing the hypothesis H₀: μ = μ₀ in the location Student t distribution model.
4.3 The Student t combiner
In the previous two sections, we discussed two situations. In the first situation we test μ with the normal distribution N(μ, σ²) and in the second we test μ with the Student distribution centered at μ. Neither the Barndorff-Nielsen formula nor the Lugannani & Rice formula can process r and q to give exact p-values for these two important models. Both models have thicker tails than the normal distribution. We propose what can be called the Student t combiner to target similar models like these.

As we suggested in Section 4.1, to get p-values for statistical models with thick tails, that is, the ones with r/q > 1 where r and q are calculated in Step 1 below, we use the following procedure:

Step 1. Use (1.17) and (1.21) to calculate r and q as usual;

Step 2. Solve (4.1) and (4.2) for f and t;

Step 3. Get the p-value H_f(t), where H_f is the cumulative distribution function of the Student t distribution with f degrees of freedom.
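The three steps can be sketched as follows, under the closed forms for (4.1) and (4.2) assumed earlier in this chapter (reconstructions, not quotations); bisection on f, with t forced by the r-equation at each trial f, implements Step 2:

```python
import math

def student_r_q(t, f):
    """(4.1) and (4.2) in the reconstructed forms assumed here: the
    r and q produced by a location Student_f model at departure t."""
    r = math.copysign(math.sqrt((f + 1.0) * math.log1p(t * t / f)), t)
    q = t * math.sqrt((f + 1.0) / f) / (1.0 + t * t / f)
    return r, q

def student_combiner(r, q):
    """Given r > q > 0 (the thick-tailed case r/q > 1), recover (t, f)
    by bisecting on f: each trial f forces t through the r-equation,
    and f is adjusted until the q-equation matches.  Relies on the
    uniqueness of the solution claimed for the system."""
    assert r > q > 0.0
    def t_from_r(f):
        return math.sqrt(f * math.expm1(r * r / (f + 1.0)))
    def gap(f):
        return student_r_q(t_from_r(f), f)[1] - q
    f_lo, f_hi = 0.05, 1.0e7       # gap < 0 at f_lo and > 0 at f_hi
    for _ in range(200):
        f_mid = math.sqrt(f_lo * f_hi)   # bisect on the log scale
        if gap(f_mid) < 0.0:
            f_lo = f_mid
        else:
            f_hi = f_mid
    f = math.sqrt(f_lo * f_hi)
    return t_from_r(f), f

# Round trip: Student(2) at t = 2 gives (r, q); the combiner recovers them.
r0, q0 = student_r_q(2.0, 2.0)
t_hat, f_hat = student_combiner(r0, q0)
# Step 3 would report H_f(t); for f = 2, H_2(t) = 1/2 + t/(2*sqrt(2 + t^2)).
```

For data generated by these equations the combiner recovers (t, f) exactly, so H_f(t) reproduces the exact Student p-value, consistent with the exactness argument of Section 4.2.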
The idea for establishing the third order property of the Student combiner is similar to the one used in the proof for the Gamma combiner. However, in this case we need the condition 1/r - 1/q = O(n^{-1}), which basically requires that the model in question be symmetric in some sense. This condition comes naturally if we consider the symmetry of the Student distribution, though it may not be trivial at all. However, if a model is not far from this requirement, this procedure might still apply; we need numerical simulations to confirm this, and the order of this procedure remains an open issue.

As we have seen in Section 4.1 and Section 4.2, in normal distribution sampling the Student combiner will give an exact p-value for testing the location μ. Then, as suggested by Prof. John Tukey, this approximation would be preferable for location scale analysis where the error distribution could be at or near the normal form; for some related discussion see Fraser, Wong and Wu (1999).
Chapter 5
Conclusion
A bridging method is developed to calculate the p-value for values of the interest parameter in the neighborhood of the maximum likelihood estimate, and it is rigorously established by using likelihood asymptotics. Its importance can be found in simulations and hypothesis testing. Inaccuracy in calculating the p-values for this small neighborhood may not only dramatically distort the behavior of the statistic in question in simulations, and thus lead to incorrect conclusions about the statistic, but may also lead to an incorrect rejection of a hypothesis in which the tested value is very close to the maximum likelihood value. This method is a downstream version of the issue addressed by Daniels (1987), who defined the p-value only at the maximum likelihood value, in the sense that our bridging method defines p-values for the neighborhood and the p-value at the maximum value coincides with the one defined by Daniels.
Alternative combining formulas, the Gamma Combiner and the Student Combiner, are also developed in this thesis. One of the advantages of the Gamma Combiner is that it gives exact p-values for gamma-type models. It also targets other similar models. Numerical studies indicate its good performance. Correspondingly, the Student combiner gives exact p-values in normal distribution sampling models, while it is also preferable for location scale analysis where the error distribution could be at or near the normal form.

The Lugannani & Rice formula has the problem of producing p-values outside the acceptable [0, 1] range. Neither the Gamma combining formula nor the Student combining formula has this kind of discontinuity at the extremes. This is another advantage of these two combining formulas.
Bibliography
[1] Abebe, F., Cakmak, S., Cheah, P.K., Fraser, D.A.S., Kuhn, J., and Reid, N. (1995). Third order asymptotic model: exponential and location type approximations. Parisankhyan Samikkha 2, 25-35.

[2] Anderson, T.W. (1975). An Introduction to Multivariate Statistical Analysis. New York: Wiley.

[3] Andrews, D.F., Fraser, D.A.S. and Wong, A. (2000). Higher order Laplace integration and the hyperaccuracy of recent likelihood methods. Submitted, J. Amer. Statist. Assoc.

[4] Barndorff-Nielsen, O.E. (1980). Conditionality resolutions. Biometrika 67, 293-310.

[5] Barndorff-Nielsen, O.E. (1983). On a formula for the distribution of the maximum likelihood estimator. Biometrika 70, 343-365.

[6] Barndorff-Nielsen, O.E. (1985). Confidence limits from c|ĵ|^{1/2}L̄ in the single parameter case. Scand. J. Statist. 12, 83-87.

[7] Barndorff-Nielsen, O.E. (1986). Inference on full or partial parameters based on the standardized signed log likelihood ratio. Biometrika 73, 307-322.

[8] Barndorff-Nielsen, O.E. (1988). Discussion of "Saddlepoint methods and statistical inference" by N. Reid. Statistical Science 3, 228-229.

[9] Barndorff-Nielsen, O.E. (1990). Approximate interval probabilities. J. R. Statist. Soc. B 52, 485-496.

[10] Barndorff-Nielsen, O.E. (1991). Modified signed log likelihood ratio. Biometrika 78, 557-563.

[11] Barndorff-Nielsen, O.E. (1994). Adjusted versions of profile likelihood and directed likelihood, and extended likelihood. J. R. Statist. Soc. B 56, 125-140.

[12] Barndorff-Nielsen, O.E. & Chamberlin, S.R. (1991). An ancillary invariant modification of the signed log likelihood ratio. Scand. J. Statist. 18, 341-352.

[13] Barndorff-Nielsen, O.E. & Chamberlin, S.R. (1994). Stable and invariant adjusted directed likelihoods. Biometrika 81, 485-494.

[14] Barndorff-Nielsen, O.E. & Cox, D.R. (1979). Edgeworth and saddlepoint approximations with statistical applications. J. R. Statist. Soc. B 41, 279-312.

[15] Barndorff-Nielsen, O.E. & Cox, D.R. (1989). Asymptotic Techniques for Use in Statistics. London: Chapman and Hall.

[16] Barndorff-Nielsen, O.E. & Cox, D.R. (1994). Inference and Asymptotics. London: Chapman and Hall.

[17] Barndorff-Nielsen, O.E. & Wood, A.T.A. (1998). On large deviations and choice of ancillary for p* and r*. Bernoulli 4, 35-63.

[18] Bartlett, M.S. (1937). Properties of sufficiency and statistical tests. Proc. Roy. Soc. London Ser. A 160, 268-282.
[19] C h & , S., McDunnough, P., Reid, N. & Yuan, X. (1998). Likelihood centered
asymptotic model: exponential and location model versions. J. Statist. Plan.
Inf. 66, 211-22.
(201 Cheah, P.K., F'raser, D.A.S. & Reid, N. (1995). Adjustment to Iikelihood and
densities; caiculating significance. J. Statist. Res, 29, 1-13.
[21] Cohen, J.D. (1988). Noncentral chi-square: some observations on recurrence.
The American Statistician 42, 12c122.
[22] Cm, D.R. & Reid, N. (1987). Approximations to noncentrd distributions.
Canadian Journal of Statistics 15, 105-114.
[23] Cramer, H. (1938). Sur un nouveau theoreme-limite des probabiiities. Actualités
Sci. Indust. 736, 5-23.
[24] Daniels, H.E. (1954). Saddlepoint approximations in statistia. Ann. Math.
Statist. 25, 631-650.
[25] Daniels, H.E. (1987). Tai1 probabity approximations. Internationd Statistical
Review 55, 37-48-
[26] DiCiccio, T. (1986). PLppraximate conditionai inference for location families.
Canad. J. StatZst. 14, 1-18.
[27] DiCiccio, T., Field, C. & Fraser, D.A.S. (1990). Marginal tail probabilities and inference for real parameters. Biometrika 77, 65-76.
[28] DiCiccio, T. & Martin, M.A. (1991). Approximations of marginal tail probabilities for a class of smooth functions with applications to Bayesian and conditional inference. Biometrika 78, 891-902.
[29] DiCiccio, T. & Martin, M.A. (1993). Simple modifications for signed roots of likelihood ratio statistics. J. R. Statist. Soc. B 55, 305-18.
[30] DiCiccio, T. & Stern, S.E. (1993). On Bartlett adjustments for approximate Bayesian inference. Biometrika 80, 731-740.
[31] Durbin, J. (1980). Approximations for densities of sufficient estimators. Biometrika 67, 311-333.
[32] Efron, B. (1981). Nonparametric standard errors and confidence intervals (with discussion). Canad. J. Statist. 9, 139-172.
[33] Esscher, F. (1932). The probability function in the collective theory of risk. Skand. Aktuarietidskr. 15, 175-195.
[34] Fisher, R.A. (1922). On the mathematical foundations of theoretical statistics. Phil. Trans. Roy. Soc. London Ser. A 222, 309-368.
[35] Fisher, R.A. (1925). Theory of statistical estimation. Proc. Cambridge Philos. Soc. 22, 700-725.
[36] Fraser, D.A.S. (1964). Local conditional sufficiency. J. R. Statist. Soc. B 26, 52-62.
[37] Fraser, D.A.S. (1990). Tail probabilities from observed likelihood. Biometrika 77, 67-76.
[38] Fraser, D.A.S. & Reid, N. (1993). Simple asymptotic connections between density and cumulant functions leading to accurate approximations for distribution functions. Statist. Sinica 3, 67-82.
[39] Fraser, D.A.S. & Reid, N. (1995). Ancillaries and third-order significance. Utilitas Mathematica 47, 33-53.
[40] Fraser, D.A.S. & Reid, N. (1998). Ancillary information for statistical inference. Proceedings of a Symposium on Empirical Bayes and Likelihood Inference. New York: Springer-Verlag, to appear.
[41] Fraser, D.A.S. & Reid, N. (2000). Ancillary information for statistical inference. Proceedings of the Conference on Empirical Bayes and Likelihood, Eds: E. Ahmed and N. Reid. Springer-Verlag, to appear.
[42] Fraser, D.A.S., Reid, N. & Wong (1991). Exponential linear models: a two-pass procedure for saddlepoint approximation. J. Roy. Statist. Soc. B 53, 483-492.
[43] Fraser, D.A.S., Reid, N. & Wu, J. (1999). A simple general formula for tail probabilities for frequentist and Bayesian inference. Biometrika 86, 249-264.
[44] Fraser, D.A.S., Wong, A.C.M. & Wu, J. (1998). An approximation for the noncentral chi-square distribution. Commun. Statist. - Simula. 27(2), 279-287.
[45] Hinkley, D.V. (1977). Conditional inference about a normal mean with known coefficient of variation. Biometrika 64, 105-8.
[46] Lawley, D.N. (1956). A general method for approximating to the distribution of the likelihood ratio criteria. Biometrika 43, 295-303.
[47] Jensen, J.L. (1992). The modified signed likelihood statistic and saddlepoint approximations. Biometrika 79, 693-704.
[48] Johnson, N.L. & Kotz, S. (1970). Distributions in Statistics: Continuous Univariate Distributions. New York: Houghton Mifflin.
[49] Lieblein, J. & Zelen, M. (1956). Statistical investigation of the fatigue life of deep-groove ball bearings. J. Research, National Bureau of Standards 57, 273-316.
[50] Lugannani, R. & Rice, S. (1980). Saddlepoint approximation for the distribution function of the sum of independent variables. Adv. Appl. Prob. 12, 475-90.
[51] Pace, L. & Salvan, A. (1997). Principles of Statistical Inference from a Neo-Fisherian Perspective. Singapore: World Scientific Publishing Co.
[52] Patnaik, P.B. (1949). The noncentral χ2 and F distributions and their applications. Biometrika 36, 202-232.
[53] Pierce, D.A. & Peters, D. (1992). Practical use of higher-order asymptotics for multiparameter exponential families (with Discussion). J. R. Statist. Soc. B 54,
[54] Pierce, D.A. & Peters, D. (1994). Higher order asymptotics and the likelihood principle: One parameter models. Biometrika 81, 1-10.
[55] Posten, H.O. (1989). An effective algorithm for the noncentral chi-square distribution function. The American Statistician 43, 261-263.
[56] Reid, N. (1988). Saddlepoint methods and statistical inference (with discussion). Statist. Sci. 3, 213-238.
[57] Reid, N. (1995). The roles of conditioning in inference. Statistical Science 10, 138-157.
[58] Reid, N. (1996). Likelihood and higher-order approximations to tail areas: A review and annotated bibliography. Canad. J. Statist. 24, 141-66.
[59] Skovgaard, I.M. (1986). Successive improvements of the order of ancillarity.
Biometrika 73, 516-19.
[60] Skovgaard, I.M. (1996). An explicit large-deviation approximation to one parameter tests. Bernoulli 2, 145-65.
[61] Skovgaard, I.M. (2000). Likelihood Asymptotics. Scand. J. Statist., to appear.
[62] Vangel, M.G. (1996). Confidence intervals for a normal coefficient of variation. Amer. Statist. 50, 21-26.
[63] Wald, A. (1941). Asymptotically most powerful tests of statistical hypotheses. Ann. Math. Statist. 12, 1-19.
[64] Wald, A. (1943). Tests of statistical hypotheses concerning several parameters when the number of observations is large. Trans. Amer. Math. Soc. 54, 426-482.
[65] Wang, S. & Gray, H.L. (1993). Approximating tail probabilities of noncentral distributions. Computational Statistics and Data Analysis 15, 343-352.
[66] Wilks, S.S. (1938). The large-sample distribution of the likelihood ratio for testing composite hypotheses. Ann. Math. Statist. 9, 60-62.
[67] Wu, J. (1999). Asymptotic Likelihood Inference. Ph.D. Thesis, Department of Statistics, University of Toronto.