
Score Tests in Semiparametric Models

Raymond J. Carroll

Department of Statistics

Faculties of Nutrition and Toxicology

Texas A&M University

http://stat.tamu.edu/~carroll

Papers available at my web site

Texas is surrounded on all sides by foreign countries: Mexico to the south and the United States to the east, west and north

College Station, home of Texas A&M University

Big Bend National Park

Wichita Falls, Wichita Falls, that’s my hometown

West Texas

Palo Duro Canyon, the Grand Canyon of Texas

Guadalupe Mountains National Park

East Texas

Palo Duro Canyon of the Red River

Co-Authors

Arnab Maity

Co-Authors

Nilanjan Chatterjee

Co-Authors

Kyusang Yu Enno Mammen

Outline

• Parametric Score Tests

• Straightforward extension to semiparametric models

• Profile Score Testing

• Gene-Environment Interactions

• Repeated Measures

Parametric Models

• Parametric Score Tests

• Parameter of interest = $\beta$

• Nuisance parameter = $\theta$

• Interested in testing whether $\beta = 0$

• Log-likelihood function = $\mathcal{L}(Y, X, Z, \beta, \theta)$

Parametric Models

• Score tests are convenient when it is easy to maximize the null loglikelihood $\sum_{i=1}^n \mathcal{L}(Y_i, X_i, Z_i, 0, \theta)$

• But hard to maximize the entire loglikelihood $\sum_{i=1}^n \mathcal{L}(Y_i, X_i, Z_i, \beta, \theta)$

Parametric Models

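• Let $\hat\theta(\beta)$ be the MLE of $\theta$ for a given value of $\beta$

• Let subscripts denote derivatives, e.g. $\mathcal{L}_\beta = \partial \mathcal{L} / \partial \beta$

• Then the normalized score test statistic is just

$$S = n^{-1/2} \sum_{i=1}^n \mathcal{L}_\beta\{Y_i, X_i, Z_i, 0, \hat\theta(0)\}$$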

Parametric Models

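• Let $\mathcal{I}$ be the Fisher information evaluated at $\beta = 0$, with sub-matrices such as $\mathcal{I}_{\beta\theta}$

• Then using likelihood properties, the score statistic under the null hypothesis is asymptotically equivalent to

$$n^{-1/2} \sum_{i=1}^n \left[ \mathcal{L}_\beta\{Y_i, X_i, Z_i, 0, \theta\} - \mathcal{I}_{\beta\theta} \mathcal{I}_{\theta\theta}^{-1} \mathcal{L}_\theta\{Y_i, X_i, Z_i, 0, \theta\} \right]$$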

Parametric Models

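• The asymptotic variance of the score statistic is

$$T = \mathcal{I}_{\beta\beta} - \mathcal{I}_{\beta\theta} \mathcal{I}_{\theta\theta}^{-1} \mathcal{I}_{\theta\beta}$$

• Remember, all computed at the null $\beta = 0$

• Under the null, if $\beta$ has dimension $p$, then

$$S^\top T^{-1} S \Rightarrow \chi^2_p$$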

Parametric Models

• The key point about the score test is that all computations are done at the null hypothesis

• Thus, if maximizing the loglikelihood at the null is easy, the score test is easy to implement.
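
As an illustration of the parametric recipe, here is a minimal sketch for a logistic regression with a scalar parameter of interest. The model, the simulated data, and the names `fit_null` and `score_test` are assumptions made for this example, not part of the talk. Only the null model is fitted; the statistic is $S^2 / T$ and is compared with a chi-square on 1 degree of freedom.

```python
import math
import numpy as np

def expit(t):
    return 1.0 / (1.0 + np.exp(-t))

def fit_null(W, y, n_iter=25):
    """Newton-Raphson MLE of theta in logit P(Y=1) = W @ theta (null model, beta = 0)."""
    theta = np.zeros(W.shape[1])
    for _ in range(n_iter):
        p = expit(W @ theta)
        grad = W.T @ (y - p)
        hess = (W * (p * (1 - p))[:, None]).T @ W
        theta += np.linalg.solve(hess, grad)
    return theta

def score_test(x, W, y):
    """Score test of beta = 0 in logit P(Y=1) = x * beta + W @ theta."""
    n = len(y)
    p = expit(W @ fit_null(W, y))          # fitted probabilities at the null
    v = p * (1 - p)
    S = x @ (y - p) / np.sqrt(n)           # normalized score for beta
    I_bb = np.mean(v * x**2)               # blocks of the estimated information
    I_bt = (W * (v * x)[:, None]).mean(axis=0)
    I_tt = (W * v[:, None]).T @ W / n
    T = I_bb - I_bt @ np.linalg.solve(I_tt, I_bt)
    stat = S**2 / T
    # Upper tail of chi-square with 1 df: P(chi2_1 > s) = erfc(sqrt(s / 2))
    return stat, math.erfc(math.sqrt(stat / 2.0))

# Hypothetical data simulated under the null (x has no effect on y)
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)
x = rng.normal(size=n)
y = rng.binomial(1, expit(-0.5 + z)).astype(float)
W = np.column_stack([np.ones(n), z])
stat, pval = score_test(x, W, y)
```

Note that the full model containing `x` is never fitted, which is the practical appeal of the score test.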

Semiparametric Models

• Now the loglikelihood has the form $\mathcal{L}\{Y_i, X_i, \beta, \theta(Z_i)\}$

• Here, $\theta(\cdot)$ is an unknown function. The obvious score statistic is

$$n^{-1/2} \sum_{i=1}^n \mathcal{L}_\beta\{Y_i, X_i, 0, \hat\theta(Z_i, 0)\}$$

• Where $\hat\theta(Z_i, 0)$ is an estimate under the null

Semiparametric Models

• Estimating $\theta(\cdot)$ in a loglikelihood like $\mathcal{L}\{Y_i, X_i, 0, \theta(Z_i)\}$

• This is standard

• Kernel methods use local likelihood

• Splines use penalized loglikelihood

Simple Local Likelihood

• Let $K$ be a density function, and $h$ a bandwidth

• Your target is the function at $z$

• The kernel weights for local likelihood are $K\{(Z_i - z)/h\}$

• If $K$ is the uniform density, only observations within $h$ of $z$ get any weight

Only observations within h = 0.25 of x = -1.0 get any weight
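
A small sketch of these kernel weights, using the uniform density on $[-1, 1]$ with bandwidth 0.25 and target point -1.0 as in the figure. The sample values in `Z` and the function names are made up for illustration.

```python
import numpy as np

def uniform_kernel(u):
    """Uniform density on [-1, 1]."""
    return 0.5 * (np.abs(u) <= 1.0)

def kernel_weights(Z, z, h, K=uniform_kernel):
    """Local likelihood weights K((Z_i - z) / h)."""
    return K((Z - z) / h)

Z = np.array([-1.4, -1.2, -1.0, -0.8, -0.7, 0.0])   # hypothetical observations
w = kernel_weights(Z, z=-1.0, h=0.25)
# Only the points within 0.25 of -1.0 (here -1.2, -1.0, -0.8) get nonzero weight
```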

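• Near $z$, the function should be nearly linear

• The idea then is to do a likelihood estimate local to $z$ via weighting, i.e., maximize over $(\alpha_0, \alpha_1)$

$$\sum_{i=1}^n K\left(\frac{Z_i - z}{h}\right) \mathcal{L}\{Y_i, X_i, 0, \alpha_0 + \alpha_1 (Z_i - z)\}$$

• Then announce $\hat\theta(z) = \hat\alpha_0$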
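
A minimal sketch of the local linear step, under the simplifying assumption of a Gaussian loglikelihood with no $X$ term, so that maximizing the kernel-weighted loglikelihood reduces to weighted least squares. The Epanechnikov kernel and the function names are choices made for this example only; other likelihoods would need an iterative maximizer.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov density, supported on [-1, 1]."""
    return 0.75 * np.clip(1.0 - u**2, 0.0, None)

def local_linear_theta(z, Z, Y, h, K=epanechnikov):
    """Maximize sum_i K((Z_i - z)/h) * [-(Y_i - a0 - a1 (Z_i - z))^2 / 2] over (a0, a1).

    Returns the fitted intercept a0, which is the estimate of theta(z).
    """
    w = K((Z - z) / h)
    D = np.column_stack([np.ones_like(Z), Z - z])   # local design: intercept and slope
    A = D.T @ (D * w[:, None])
    b = D.T @ (w * Y)
    alpha = np.linalg.solve(A, b)
    return alpha[0]

# Check on an exactly linear function, which a local linear fit reproduces
Z = np.linspace(-2.0, 2.0, 201)
Y = 1.0 + 2.0 * Z
theta_at_03 = local_linear_theta(0.3, Z, Y, h=0.5)
```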

• It is well-known that the optimal bandwidth is $h \propto n^{-1/5}$

• The bandwidth can be estimated from data using such things as cross-validation
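
One way cross-validation might be carried out is sketched below: leave-one-out squared prediction error for a local linear fit, minimized over a small grid of bandwidths. The Gaussian kernel, the grid, and the simulated data are assumptions for illustration, and the Gaussian-likelihood simplification (weighted least squares) is used again.

```python
import numpy as np

def loo_cv_score(Z, Y, h):
    """Leave-one-out squared prediction error of a local linear fit (Gaussian kernel)."""
    n = len(Z)
    err = 0.0
    for i in range(n):
        keep = np.arange(n) != i                 # drop observation i
        d = Z[keep] - Z[i]
        w = np.exp(-0.5 * (d / h) ** 2)
        D = np.column_stack([np.ones(n - 1), d])
        alpha = np.linalg.solve(D.T @ (D * w[:, None]), D.T @ (w * Y[keep]))
        err += (Y[i] - alpha[0]) ** 2            # alpha[0] predicts at Z[i]
    return err / n

# Hypothetical data: smooth signal plus noise
rng = np.random.default_rng(1)
Z = rng.uniform(-2.0, 2.0, 200)
Y = np.sin(2.0 * Z) + 0.3 * rng.normal(size=200)

grid = np.array([0.05, 0.1, 0.2, 0.4, 0.8, 1.6])
cv = np.array([loo_cv_score(Z, Y, h) for h in grid])
h_cv = grid[np.argmin(cv)]                       # bandwidth with smallest CV error
```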

Score Test Problem

• The score statistic is

$$S = n^{-1/2} \sum_{i=1}^n \mathcal{L}_\beta\{Y_i, X_i, 0, \hat\theta(Z_i, 0)\}$$

• Unfortunately, when $h \propto n^{-1/5}$ this statistic is no longer asymptotically normally distributed with mean zero

• The asymptotic test level = 1!

Score Test Problem

• The problem can be fixed up in an ad hoc way by setting $h \propto n^{-1/3}$

• This defeats the point of the score test, which is to use standard methods, not ad hoc ones.

Profiling in Semiparametrics

• In profile methods, one does a series of steps

• For every $\beta$, estimate the function by using local likelihood to maximize

$$\sum_{i=1}^n K\left(\frac{Z_i - z}{h}\right) \mathcal{L}\{Y_i, X_i, \beta, \alpha_0 + \alpha_1 (Z_i - z)\}$$

• Call it $\hat\theta(z, \beta)$

• Then maximize the semiparametric profile loglikelihood

$$n^{-1/2} \sum_{i=1}^n \mathcal{L}\{Y_i, X_i, \beta, \hat\theta(Z_i, \beta)\}$$

• Often difficult to do the maximization, hence the need to do score tests

• The semiparametric profile loglikelihood has many of the same features as profiling does in parametric problems.

• The key feature is that it is a projection, so that it is orthogonal to the score for $\theta(Z)$, or to any function of $Z$ alone.

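• The semiparametric profile score is

$$n^{-1/2} \sum_{i=1}^n \frac{\partial}{\partial \beta} \mathcal{L}\{Y_i, X_i, \beta, \hat\theta(Z_i, \beta)\} \Big|_{\beta = 0} \approx n^{-1/2} \sum_{i=1}^n \left[ \mathcal{L}_\beta\{Y_i, X_i, 0, \hat\theta(Z_i, 0)\} + \mathcal{L}_\theta\{Y_i, X_i, 0, \hat\theta(Z_i, 0)\} \frac{\partial}{\partial \beta} \hat\theta(Z_i, \beta) \Big|_{\beta = 0} \right]$$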

• The problem is to compute $\frac{\partial}{\partial \beta} \hat\theta(Z_i, \beta) \big|_{\beta = 0}$

• Without doing profile likelihood!

• The definition of local likelihood is that for every $\beta$,

$$0 = E\left[ \mathcal{L}_\theta\{Y, X, \beta, \theta(Z, \beta)\} \mid Z = z \right]$$

• Differentiate with respect to $\beta$.

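• Then

$$\theta_\beta(z, 0) = - \frac{E\left[ \mathcal{L}_{\beta\theta}\{Y, X, 0, \theta(Z, 0)\} \mid Z = z \right]}{E\left[ \mathcal{L}_{\theta\theta}\{Y, X, 0, \theta(Z, 0)\} \mid Z = z \right]}$$

• Algorithm: Estimate numerator and denominator by nonparametric regression

• All done at the null model!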
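
A sketch of this algorithm for one concrete case, assumed here for illustration: a logistic model with logit $P(Y = 1 \mid X, Z) = X\beta + \theta(Z)$, for which $\mathcal{L}_{\theta\theta} = -p(1-p)$ and $\mathcal{L}_{\beta\theta} = -X p(1-p)$. The derivative of $\theta$ with respect to $\beta$ at the null is minus the ratio of two conditional expectations given $Z$, and each is estimated by a nonparametric regression on $Z$. A Nadaraya-Watson smoother stands in for whatever smoother one prefers, and the null fit of $p$ is obtained by smoothing $Y$ on $Z$ rather than by local likelihood. All names and the simulated data are hypothetical.

```python
import numpy as np

def nw_smooth(Z, V, h):
    """Nadaraya-Watson regression of V on Z (Gaussian kernel), at the sample points."""
    K = np.exp(-0.5 * ((Z[:, None] - Z[None, :]) / h) ** 2)
    return (K @ V) / K.sum(axis=1)

# Hypothetical data generated at the null (beta = 0), with E[X | Z] = Z
rng = np.random.default_rng(3)
n = 2000
Z = rng.uniform(-1.0, 1.0, n)
X = Z + 0.5 * rng.normal(size=n)
Y = rng.binomial(1, 1.0 / (1.0 + np.exp(-np.sin(2.0 * Z)))).astype(float)

h = 0.2
p_hat = nw_smooth(Z, Y, h)          # null-model estimate of P(Y = 1 | Z)
v = p_hat * (1.0 - p_hat)

num = nw_smooth(Z, -X * v, h)       # estimates E[L_{beta theta} | Z]
den = nw_smooth(Z, -v, h)           # estimates E[L_{theta theta} | Z]
theta_beta = -num / den             # estimate of theta_beta(Z_i, 0)
```

In this particular model the weights $p(1-p)$ depend on $Z$ only at the null, so the target reduces to $-E(X \mid Z)$, which is $-Z$ for the simulated data above.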

Results

• There are two things to estimate at the null model: $\hat\theta(Z, 0)$ and $\frac{\partial}{\partial \beta} \hat\theta(Z, 0) = \hat\theta_\beta(Z, 0)$

• Any method can be used without affecting the asymptotic properties

• Not true without profiling

Results

• We have implemented the test in some cases using the following methods:

• Kernels

• Splines from gam in S-PLUS

• Splines from R

• Penalized regression splines

• All results are similar, as they should be: because we have projected and profiled, the method of fitting does not matter

Results

• The null distribution of the score test is asymptotically the same as if the following were known: $\theta(Z)$ and $\frac{\partial}{\partial \beta} \theta(Z, 0) = \theta_\beta(Z, 0)$

Results

• This means its variance is the same as the variance of

$$n^{-1/2} \sum_{i=1}^n \left[ \mathcal{L}_\beta\{Y_i, X_i, 0, \theta(Z_i)\} + \mathcal{L}_\theta\{Y_i, X_i, 0, \theta(Z_i)\} \theta_\beta(Z_i, 0) \right]$$

• This is trivial to estimate

• If you use different methods, the asymptotic variance may differ
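
Putting the pieces together, here is a hedged end-to-end sketch for an assumed partially linear Gaussian model $Y = X\beta + \theta(Z) + \varepsilon$ with scalar $\beta$. In that model the score for $\beta$ at the null is $X\{Y - \theta(Z)\}$, the score for $\theta$ is $Y - \theta(Z)$, and the derivative of $\theta$ with respect to $\beta$ is $-E(X \mid Z)$, so each summand is $\{X - E(X \mid Z)\}\{Y - \theta(Z)\}$. The variance is estimated by the average of the squared summands. The smoother, bandwidth, and simulated data are illustrative choices only.

```python
import math
import numpy as np

def nw_smooth(Z, V, h):
    """Nadaraya-Watson regression of V on Z (Gaussian kernel), at the sample points."""
    K = np.exp(-0.5 * ((Z[:, None] - Z[None, :]) / h) ** 2)
    return (K @ V) / K.sum(axis=1)

def profile_score_test(Y, X, Z, h):
    """Profile score test of beta = 0 in Y = X * beta + theta(Z) + error (scalar beta)."""
    n = len(Y)
    theta_hat = nw_smooth(Z, Y, h)            # null fit of theta(Z)
    theta_beta = -nw_smooth(Z, X, h)          # estimate of theta_beta(Z, 0) = -E[X | Z]
    resid = Y - theta_hat
    terms = X * resid + resid * theta_beta    # L_beta + L_theta * theta_beta
    S = terms.sum() / np.sqrt(n)
    T = np.mean(terms**2)                     # variance estimate from the summands
    stat = S**2 / T
    # Upper tail of chi-square with 1 df
    return stat, math.erfc(math.sqrt(stat / 2.0))

# Hypothetical data: one sample under the null, one under beta = 1
rng = np.random.default_rng(2)
n = 500
Z = rng.uniform(-1.0, 1.0, n)
X = Z + 0.5 * rng.normal(size=n)
eps = 0.5 * rng.normal(size=n)
stat0, p0 = profile_score_test(np.sin(2.0 * Z) + eps, X, Z, h=0.2)
stat1, p1 = profile_score_test(np.sin(2.0 * Z) + X + eps, X, Z, h=0.2)
```

Both fits are done at the null model only; no profile likelihood in $\beta$ is ever maximized.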

Results

• With this substitution, the semiparametric score test requires no undersmoothing

• Any method works

• How does one do undersmoothing for a spline or an orthogonal series?

Results

• Finally, the method is a locally semiparametric efficient test for the null hypothesis

• For power as well, the method of nonparametric regression that you use does not matter

Example

• Colorectal adenoma: a precursor of colorectal cancer

• N-acetyltransferase 2 (NAT2): plays an important role in detoxification of certain aromatic carcinogens present in cigarette smoke

• Case-control study of colorectal adenoma

• Association between colorectal adenoma and the candidate gene NAT2 in relation to smoking history.

Example

• Y = colorectal adenoma

• X = genetic information (below)

• Z = years since stopping smoking
