empirical likelihood estimation and testing in covariance structure...

Empirical Likelihood Estimation and Testing in CovarianceStructure Models

Stanislav Kolenikov∗ Yiyong Yuan†

February 17, 2009

Abstract

This paper applies empirical likelihood approach to estimation and testing in structural equation mod-els. Relations to other estimation methods and optimality of the moment conditions as estimating equationsare demonstrated. Model fit for over-identified models can be tested using the empirical likelihood ratio.The bootstrap can be used to estimate Bartlett correction factor to improve the accuracy of the asymptoticχ2 distribution of this test. Estimation examples of real and simulated non-elliptically distributed data areprovided.

AMS Classification: 62F12, 62H12, 62H15Keywords: Bartlett correction, covariance structure models, empirical likelihood, simulation, structural

equation models

∗Corresponding author. Mailing address: 146 Middlebush Hall, Department of Statistics, University of Missouri, Colubmia, MO65211-6100, USA. Email: [email protected]. Phone: +1 (573) 882-6376. Fax: +1 (573) 884-5524.†324 Mumford Hall, Department of Agricultural Economics, University of Missouri, Columbia, MO 65211, USA.

1

1 Introduction

Structural equation modeling (SEM) is a set of statistical techniques to analyze linear relations among ran-dom variables including unobserved (latent) ones. They are widely used in social sciences, such as psychol-ogy, sociology, anthropology (MacCallum & Austin 2000), and gaining recognition in natural sciences suchas ecology (Pugesek, Tomer & von Eye 2002). The first general formulations of SEM are due to the worksof Joreskog, Keesing and Wiley in the 1970s. Their framework is known as JKW or LISREL model and iswidely used in current SEM work. There are also formulations proposed by Bentler & Weeks (1980) andMcArdle & McDonald (1984) that are of similar representation capacity.

At its early stages, estimation and inference for SEM were made possible by contributions in econo-metrics (simultaneous equation models) and psychometrics (factor analysis). Lawley (1940), Anderson &Rubin (1956), Joreskog (1969) proposed hypothesis testing in factor analysis. Bock & Bargmann (1966)introduced analysis of covariance. Joreskog (1973) proposed normal theory maximum likelihood estimationof general SEMs. Joreskog & Goldberger (1972) and Browne (1974, 1982, 1984) set up least squares esti-mation and its various modifications. Particularly, Browne proposed estimator under elliptical distributionand Arbitrary Distribution Function (ADF) estimator also known as the Weighted Least Squares (WLS) fordistribution-free model. Bentler (1983) suggested using higher-order moments to identify model that is oth-erwise not identified by covariance based estimation approach. Muthen (1984) applied SEMs to categoricaland censored data. Bollen (1996a) investigated a limited information, two stage least squares (2SLS) instru-mental variables estimator for latent variable SEMs. For a survey of problems and approaches in structuralequation modeling, see Bollen (1989).

All those estimation methods have their pros and cons. The maximum likelihood estimation is suscep-tible to biases, especially in the standard errors, when the distribution of the data deviates from multivariatenormal. While analysis of the asymptotic robustness conditions (Satorra 1990, Satorra & Bentler 1990) andcorrections to the standard errors (Satorra & Bentler 1994) are available, efficiency of the method is compro-mised. ADF estimation procedure appears appealing as it achieves asymptotic efficiency. However, Chou,Bentler & Satorra (1991), Muthen & Kaplan (1985, 1992), and Hu, Bentler & Kano (1992), among others,found that ADF performs poorly with small sample sizes or large model degrees of freedom. The search forthe best estimation method continues. This paper proposes a new estimation method with strong backing intheoretical statistics and econometrics.

Empirical likelihood (EL) theory, as a non-parametric inference method, has been developed first formoments of i.i.d. data (Owen 1990), then for regression, i.e., non-i.i.d. data (Owen 1991), and then forarbitrary estimating equations (Qin & Lawless 1994), thus allowing one to incorporate auxiliary informationin estimation procedures. Since then it has been extended by many researchers into a systematic theory(Owen 2001) and found a great amount of following in econometrics (Mittelhammer, Judge & Miller 2000,Imbens 2002, Kitamura 2007). Computational algorithms and corrections improving on numerical and smallsample issues have been proposed (Hall & La Scala 1990, Owen 2001, Chen, Variyath & Abraham 2008).

We adopt empirical likelihood estimation and testing procedures for structural equation models. Thegeneral theory of EL for estimating equations allowed us to identify the optimal estimating equations whichturned out to be simply the moment conditions. A test of model fit naturally arises from the EL theory, andthe bootstrap can be used to approximate Bartlett correction factor. A simulation compared EL to maximum(quasi-)likelihood and least squares, and showed that the former indeed provides improved estimates when

2

the normality conditions are violated.The remainder of the paper is organized as follows. The theory of empirical likelihood is reviewed in

Section 2, and the framework of structural equation modeling, in Section 3. Section 4 blends covariancestructure models and empirical likelihood. Examples and simulations are given in Section 5. Some furtherextensions and related issues are overviewed in Section 6. Section 7 concludes.

2 Basics of empirical likelihood

The concept of empirical likelihood stems from MLE estimation of the distribution function. For i.i.d. data,a non-parametric likelihood can be constructed as a product of probabilities Prob(X = x) = F (x)−F (x−)of individual observations:

L(F ) =n∏i=1

[F (xi)− F (xi−)

]. (1)

Maximization of this likelihood is maximizing the point probability mass of the observed sample. Let theempirical distribution function be

Fn(x) =1n

n∑i=1

1I[xi ≤ x],

where 1I[·] is a 0/1 indicator function, and inequality is understood component-wise for multivariate x. It iseasy to check that Fn maximizes non-parametric likelihood (1) (Owen 2001, Theorem 2.1). Thus, Fn is thenon-parametric MLE of F .

Analogous to the parametric likelihood ratio test, define empirical likelihood ratio

R(F ) =L(F )L(Fn)

and profile likelihood function

R(θ) = sup{R(F )|T (F ) = θ, F ∈ F}, (2)

where T (F ) is a functional of the distribution F . A broad range of distribution characteristics can be thoughtof functionals of F , including moments and quantiles. In this paper, we consider the second moments of thedata in covariance structure models.

As in the fully parametric case, the likelihood ratio test based on R(θ) rejects for low values of R(θ),and confidence sets can be formed as

{θ|R(θ) ≥ r0},

where r0 controls coverage.As long as profile likelihood (2) plays the role of the regular likelihood for inferential purposes, it can

also be used to generate point estimates. Define maximum empirical likelihood estimator (MELE) as

θ = arg maxθR(θ). (3)

3

If the condition T (F ) = 0 can be thought of or expressed as a set of estimating equations EF m(X, θ) =0, the estimation problem can be written as

θ = arg maxw,θ{n∏i=1

nwi|n∑i=1

wim(xi, θ), wi ≥ 0,n∑i=1

wi = 1}. (4)

A number of general results about the properties of the resulting estimates have been established by Qin &Lawless (1994).

Theorem 1 Let Xi ∈ IRd be iid random vectors, and suppose that θ0 ∈ IRt is uniquely determined byE[m(X, θ)] = 0, where m(X, θ) takes values in IRt+q , for q ≥ 0. If there is a neighborhood Θ of θ0 and afunction M(x) with E[M(X)] <∞ so that for all θ ∈ Θ:

• E[∂m(X, θ0)/∂θ] has rank t,

• E[m(X, θ0)m(X, θ0)′] is positive definite,

• ∂m(x, θ)/∂θ is continuous in θ,

• ∂2m(x, θ)/∂θ ∂θ′ is continuous in θ,

• max{‖m(x, θ)‖3, ‖∂m(x, θ)/∂θ‖, ‖∂2m(x, θ)/∂θ ∂θ′‖} ≤M(x),

then MELE is consistent and asymptotically normal with asymptotic variance V[θ] satisfying

limn→∞

nV(θ) ={

E(∂m∂θ

)′[E(mm′)]−1 E(∂m∂θ

)}−1

(5)

This asymptotic variance is at least as small as that of any estimator of the form EA(θ)m(X, θ) = 0, whereA(θ) is a p× (p+ q) matrix with rank p for all values of θ. Furthermore,

− 2 log(R(θ0)/R(θ)) d−→ χ2t (6)

and− 2 log(R(θ)) d−→ χ2

q (7)

as n→∞.

While formally the problem (4) appears to be of maximization over n+p−1 parameters (as maximizationis performed jointly over the n−1 weights and p parameters), the effective degrees of freedom turned out tobe the same as in the classical likelihood problems (Wilks 1938). There are t degrees of freedom correspondto the model space and the number of parameters, and residual q degrees of freedom to test the goodness offit of the model.

The regularity conditions are quite typical for M -estimation problems (Huber 1974, White 1982). Theyare comparable to the conditions of Wald (1948) if the likelihood scores ∂l(θ,X)/∂θ are taken as the es-timating equations m(X, θ), and more restrictive than those of Huber (1967) in requiring existence andsmoothness of higher order derivatives.

4

Finally, (5) is the smallest possible variance of an estimator obtained from the set of estimating equationsm(x, θ), or semiparametric efficiency bound (Stein 1956, Godambe & Thompson 1984, Chamberlain 1987,Bickel, Klaassen, Ritov & Wellner 1998). Thus, MELE is asymptotically efficient. Newey & Smith (2004)have also established higher order properties of EL, and found it to generally have lower biases than otherestimators in class of generalized method of moments (GMM) and generalized empirical likelihood (GEL)estimators. With appropriate bias corrections, it also has the smallest higher order variance.

Confidence intervals, and in general confidence sets, can be constructed by inversion of the empiricallikelihood ratio results in Theorem 1. An asymptotic confidence set based on the χ2 distribution is{

θ : R(θ) > cα,Prob[χ2q < −2 ln cα] = 1− α

}. (8)

The empirical likelihood confidence regions have the same rate of coverage accuracy as parametriclikelihood, jackknife and the bootstrap methods. The coverage error decreases to 0 at the rate of 1

n asn→∞, and affords Bartlett correction (Barndorff-Nielsen & Cox 1994), reducing the error rate from

Prob[−2 lnR(θ0) ≤ x] = Prob[χ2q ≤ x] +O(n−1)

toProb[(1 + a/n)(−2 lnR(θ0)) ≤ x] = Prob[χ2

q ≤ x] +O(n−2)

with Bartlett factor a. Bartlett-correctability was established by DiCiccio, Hall & Romano (1991) for exactlyidentified models, and by Chen & Cui (2007) for overidentifed models.

Similar higher order improvements are also possible with the bootstrap calibration (Hall & La Scala1990, Hall & Horowitz 1996, Brown & Newey 2002), either by direct estimation of the distribution functionof −2 lnR(θ), or by a bootstrap-based estimate of Bartlett factor a:

R(θMELE) =q lnR(θMELE)

1B

∑Bb=1 lnR(θ(b)∗ )

, (9)

where B is the number of the bootstrap replications, and θ(b)∗ are the estimates obtained in b-th bootstrapreplicate. Calibration of the Bartlett correction can be done with a few dozen bootstrap samples, while directestimation of the tail probability might require hundreds and thousands of the bootstrap replications.

Some general computational details, including the recently proposed adjusted empirical likelihood (AEL)method (Chen et al. 2008), are given in Appendix B.

3 Covariance structure models

This section provides a brief description of covariance structure models, or structural equation models(SEM)1, and the main estimation and inference procedures associated with them. They encompass fac-tor analysis, regression, errors-in-variables, growth models, and simultaneous equations as special cases(Bollen 1989, Yuan & Bentler 2007).

1A more general version of SEM will also include models for the means of the variables. In this paper, we consider the meanstructure to be saturated, and concentrate on the covariance structure.

5

A general structural equation model consists of a latent model relating the latent variables:

η = αη +Bη + Γξ + ζ, (10)

and a measurement model linking the latent and the observed variables:

y = αy + Λyη + ε, (11)

x = αx + Λxξ + δ. (12)

Here, ξ are exogenous latent variables, η are endogenous latent variables, x and y are indicators of theexogenous and endogenous latent variables, and error terms ζ, ε, δ are uncorrelated with one another and ξ.The set of model parameters includes the intercepts α, the effect of exogenous variables on the endogenousones Γ, the effect of endogenous variables on one another B, the loadings Λx,Λy , and variances andcovariances of unrestricted elements of block diagonal V[(ξ′, ζ′, ε′, δ′)′] = diag[Φ,Ψ,Θε,Θδ ]. Equations(10) – (12) imply a certain covariance structure:

Σ(θ) = V[(xy

)]= V

[(ΛxΦΛ′x + Θδ ΛxΦΓ′M ′

MΓΦΛ′x MΓΦΓ′M ′ +MΨM ′ + Θε

)], M = Λy(I −B)−1. (13)

Appropriateness, or goodness of fit, of the researcher’s model is assessed via various tests of the null hy-pothesis

H0 : Cov(z) = Σ(θ) vs. H1 : Cov(z) 6= Σ(θ), (14)

where z = (x′, y′) is the vector of stacked observed variables. If the null is rejected, the interpretation is thatthe covariance of the data cannot be approximated by the class of covariance matrices Σ(θ).

As most nonlinear models, SEMs face identification issues. A model is said to be identified if no twosets of parameters can generate the same likelihood. In the context of covariance structure models, it isusually weakened to the requirement that no two sets of parameters can generate the same covariance ma-trix. Identification by higher moments, although technically possible (Bentler 1983), typically requires hugesample sizes. Hence, the parameter identification problem can be cast in terms of inverting the (componentsof) implicit function Σ(θ) given by (13). If there are just enough conditions to identify all the elements ofθ, the model is said to be exactly identified. If there is at least one θ that has more than one condition to bedetermined from, the model is said to be over-identified, and goodness of fit becomes possible (and usuallydesirable). For a detailed discussion on identification, see Bollen (1989).

Among the estimation methods for structural equation models, maximum likelihood estimation and vari-ations of the least squares procedures are the two dominant ones.

The maximum likelihood estimator (MLE), also referred to as full information maximum likelihood(FIML) estimator, is defined as

arg minθFML(θ, S) = arg min{log |Σ(θ)|+ tr[SΣ−1(θ)]− log |S| − dim(x′, y′)′}, (15)

where S is the MLE or unbiased estimator of the variance-covariance matrix without any structure imposed(i.e., the sample covariance matrix). It can be obtained either by assuming the multivariate normality of ob-served variables or by the Wishart distribution of the unbiased estimator of variance-covariance matrix. The

6

objective function in (15) is the likelihood ratio against a saturated unstructured model normalized per ob-servation. MLE has a number of well known good properties. It is scale free, consistent and asymptoticallyefficient, provided that the model is correct in both structural and distributional specifications. However,when the distribution of the data is not multivariate normal, MLE loses efficiency, and the covariance ma-trix of the parameter estimates is no longer the inverse of the information matrix (Satorra 1990, Satorra &Bentler 1994).

The goodness of fit test is available as the likelihood ratio test of (14) against a saturated model thatdoes not assume any structure. It can also be computed equivalently as T = (n − 1)FML(θ, S). Underasymptotic robustness conditions (Satorra 1990), including the simplest case of multivariate normality, ithas the asymptotic χ2

q distribution where the degrees of freedom of the model q = p∗ − t is the differencebetween the total number of moments p∗ = dim(z)(dim(z) + 1)/2 and the number of parameters t. Whenthe asymptotic robustness conditions are violated, its distribution becomes a mixture of χ2

1 distributions withweights different from 1. Satterthwaite approximation to this distribution was proposed by Satorra & Bentler(1994) and is widely used in practice.

Alternatively, the distribution of T under the null can be approximated by the bootstrap. To make the dataconform to the null hypothesis, Beran & Srivastava (1985) and Bollen & Stine (1992) proposed to augmentthem as

X = Σ(θ)1/2S−1/2X, (16)

where θ is a valid value of the parameter vector usually taken to be an estimator obtained from any of theavailable methods. With this transformation, X is guaranteed to have the correct model structure in thebootstrap population while retaining the multivariate kurtosis of the original data. Hence, the bootstrapdistribution of the test statistic gives an approximation to the distribution of the test statistic under the null.

Different versions of the least squares procedures minimize the distance between the sample covariancematrix and the model implied covariance matrix. Unweighted Least Squares (ULS) estimator minimizes thesquare loss, or the Frobenius norm of the residual matrix:

θULS = arg minθ{1

2tr[(S − Σ(θ))2]}.

Browne (1982) established consistency and provided inferential procedures.The Generalized Least Squares (GLS) estimator is defined as

θGLS = arg minθ

12

tr{

[(Σ(θ)S−1 − I)]2}.

For correctly specified model structure, the GLS estimator is asymptotically equivalent to MLE (Lee &Jennrich 1979).

The most general formulation is the Weighted Least Squares (WLS) estimator, also titled ArbitraryGeneralized Least Squares (AGLS) in Yuan & Bentler (2007), defined as

θWLS = arg minθFWLS(S, θ,W ) = arg min

θ{[s− σ(θ)]′W−1[s− σ(θ)]}, (17)

where s = vech(S) and σ(θ) = vech[Σ(θ)] are non-redundant vectorizations of the two covariance matrices(Magnus & Neudecker 1999), and W−1 is a positive definite (possibly random) weight matrix.

7

By a choice of the weight matrix W , the ULS or GLS methods can be reproduced, so WLS is the mostgeneral formulation of the minimal distance estimation methods. Indeed,

tr{

[(S − Σ(θ))V −1]2} = tr[V −1(S − Σ(θ))V −1(S − Σ(θ))

]= vec′(S − Σ(θ))(V −1 ⊗ V −1) vec(S − Σ(θ))

= vech′(S − Σ(θ))D′(V −1 ⊗ V −1)D vech(S − Σ(θ))

by the properties of Kronecker products, and definition of the duplication matrix D (Magnus & Neudecker1999). Here, V = I for ULS and V = S−1 for GLS. Thus, most of the modern treatments of SEM such asYuan & Bentler (2007) are based on WLS.

As shown by Browne (1984), if W is chosen to be the asymptotic sample covariance matrix of s or aconsistent estimator of it, the WLS is consistent and asymptotically efficient as long as the 4-th order mo-ments of the observed variables are finite. If such W is used in estimation, it is known as the AsymptoticallyDistribution Free (ADF) estimation method. It is capable of achieving asymptotic efficiency for arbitrarydistributions satisfying the regularity conditions of sufficiently many moments. However, as shown by Chouet al. (1991), Muthen & Kaplan (1985, 1992), and Hu et al. (1992), its performance under small sample sizeor large model degree of freedom is poor, and beneficial asymptotic properties are expressed only for samplesizes more than 10, 000.

Finally, a limited information estimation method that concentrates on one or few parameters at a time istwo-stage least squares estimator with instrumental variables (Bollen 1996a, Bollen 1996b, Bollen 2001).This method allows to estimate regression-type coefficients, including latent variable loadings on indicators,and provides useful specification tests. Sargan (1958) test checks instrument validity, and Hausman (1978)test checks validity of the model as a whole by comparing 2SLS estiamtes with another estimator, usuallythe MLE.

The 2SLS method imposes identification condition of setting one of the latent variable loadings to 1. Itcan be assumed without loss of generality that it is the first indicator of a latent variable. Assume also forsimplicity that all variables are considered in their deviations from the (sample) means, and all intercepts in(10)–(12) are zeroes. Consider the first exogenous and endogenous latent variables, so that x1, x2, . . . , xkare indicators of ξ1, and y1, y2, . . . , yl are indicators of η1. By using the above scaling convention, express

ξ1 = x1 − δ1, η1 = y1 − ε1.

Then the regression equations for other indicators can be written as

x2 = λx21x1 + δ2 − λx21δ1 ≡ λx21x1 + δ2, (18)

. . .

xk = λxk1x1 + δk − λxk1δ1 ≡ λxk1x1 + δk. (19)

Estimation of say (18) by OLS will lead to biased results since the regressor x1 is correlated with the compos-ite error δ2 (they both contain δ1). However, one can find instrumental variables to estimate (18) by a com-mon econometric technique of two stage least squares (Davidson & MacKinnon 1993, Wooldridge 2002).The set of instrumental variables must satisfy the following conditions: (i) the instruments are correlatedwith the regressors; (ii) the instruments are uncorrelated with the (composite) error term; (iii) there should

8

be at least as many instruments as there are regressors. If measurement errors δ are uncorrelated, the re-maining indicators of ξ1, namely x3, . . . , xk can be used as instruments in regression equation (18). Indeed,Cov(x1, x3) = λx31φ11 6= 0, Cov(x3, δ2) = E[(λx31ξ1 + δ3, δ2 − λx21δ1] = 0, and similarly for othervariables. Since the regression equation only contains one explanatory variable x1, there must be at leastone instrument to satisfy condition (iii). Similarly, other loadings λx31, . . . , λxk1 for indicators of latentvariables can be estimated.

To estimate the coefficients of regression between two latent variables, assume variables ξ1 and η1 arerelated as

η1 = γ11ξ1 + ζ1.

Then the corresponding equations in the scaling indicators are

y1 = γ11x1 + ζ1 − γ11δ1 ≡ γ11x1 + ζ1. (20)

The instruments for this equation can again be chosen from the remaining indicators of ξ1. However, indi-cators of η1 cannot be used as instruments since they contain ζ1.

4 Application of empirical likelihood theory to SEM

In this section, we apply the empirical likelihood concepts outlined in Section 2 to structural equation modelsof Section 3.

4.1 Optimal estimating equations

The estimating equations for maximum likelihood objective function (15) can be found using its differential(Magnus & Neudecker 1999) with respect to the implied moments matrix Σ(θ) and eventually the vector ofparameters θ. See Appendix A for derivations. The resulting maximum likelihood estimating equations, orscore equations, are given by

σ(θ)W−1NT (s− σ(θ)) = 0 (21)

where the normal theory weight matrix is

W−1NT = 2D′(Σ−1 ⊗ Σ−1)D,

and σ(θ) is the Jacobian of the transformation θ 7→ σ(θ) ≡ vech Σ(θ). Hence, the estimating equations arelinear combinations of the generic moment conditions

mkl(X, θ) = s− σ(θ), 1 ≤ k ≤ l ≤ p, (22)

where the coefficients for the linear combinations are provided by the normal theory weight matrix.Using similar techniques, the estimating equations for the weighted least squares are given by

σ(θ)′W−1(s− σ(θ)) = 0,

with details in the Appendix. Hence, they also represent linear combinations of the moment conditions (22).

9

Let us now derive the estimating equations for the 2SLS method. Unlike the MLE or WLS estimatingequations derived as first order conditions for certain optimization problems, 2SLS estimating equations canbe obtained directly. For equation (18) and instruments x3, . . . , xk, the estimating equations are∑

i

[xi2 − λx21xi1](xi3, . . . , xik)′ = 0, (23)

with corresponding population analogues

E[x2 − λx21x1](x3, . . . , xk)′ = 0 (24)

or

Cov(x2, x3)− λx21 Cov(x1, x3) = 0...

Cov(x2, xk)− λx21 Cov(x1, xk) = 0 (25)

(recall that the intercepts were suppressed).For equation (20) with instruments x3, . . . , xk, the estimating equations are∑

i

[yi1 − γ11xi1](xi3, . . . , xik)′ = 0, (26)

with corresponding population analogues

E[y1 − γ11x1](x3, . . . , xk)′ = 0 (27)

or

Cov(y1, x3)− γ11 Cov(x1, x3) = 0...

Cov(y1, xk)− γ11 Cov(x1, xk) = 0 (28)

Both systems (23) and (26) are obtained as linear combinations of the generic moment conditions (22).E.g., the first equation of (25) is obtained from the two moment conditions, m13 = s13 − λ3φ11 andm23 = s23−λ2λ3φ11 = 0, with weights−λx21 and 1, so that the implied moments disappear from the finalexpression −λx21s13 + s23.

We now see that all the estimation methods outlined above are based on the moment conditions (22) withvarious linear combinations of those providing estimating equations for each particular method (and eachparticular parameter in 2SLS). Hence, the following result holds:

Corollary 1 For the structural equations model (10)–(13), the empirical likelihood estimator based on mo-ment conditions (22) is an asymptotically efficient estimator in the class of all estimators based on the secondorder moments only.

10

In particular, asymptotic efficiency of MELE is at least as good as that of any WLS or 2SLS estimator.The conditions of Theorem 1 are easy to interpret for SEM. The first condition requires that rk σ(θ0) = p,

which is equivalent to local identifiability of the model. The next condition on positivity of V[s− σ(θ)] willhold unless some moments are identical in case of perfect collinearity or redundancy of the variables. Thisalso has to do with identification of the model. The derivatives for the smoothness conditions are givenby Neudecker & Satorra (1991), and their continuity is easily verified as long as |Σ(θ)| > 0. Finally, theconditions on the upper bounds of the estimating equations and their derivatives dictate that the sixth momentof the data is finite, which is a slightly more demanding condition than existence of the fourth order momentsrequired for consistency and asymptotic efficiency of ADF.

Thus, the maximum empirical likelihood estimates for the structural equation model (10)–(12) are ob-tained as

θMELE = arg maxθ,w

n∑i=1

ln(nwi), (29)

subject to constraints

n∑i=1

wi = 1,

n∑i=1

wi[(zik − µk(θ))(zil − µl(θ))− σkl(θ)

]= 0, 1 ≤ k ≤ l ≤ p. (30)

If the means of the variables are not modeled, µi(θ) = xi. The dual saddle point formulation (see AppendixB) is

θMELE = arg maxθ

minλ

n∑i=1

ln1n

11 + λ′

[vech(zi − µ(θ))(zi − µ(θ))′ − σ(θ)

] . (31)

4.2 A test of model fit

In SEM analysis, substantive interest of the researcher usually lies in testing hypotheses (14) of whether hermodel is compatible with the data. In the EL framework, test of overall model can be constructed based onthe profile likelihood

R(θ) = max{n∏i=1

nwi|n∑i=1

wi vech[ZiZ ′i − Σ(θ)] = 0, wi ≥ 0,n∑i=1

wi = 1},

By the result (7) of Theorem 1, it has a familiar χ2q distribution:

TMELE = −2 logR(θMELE) d−→ χ2q.

The degrees of freedom of the test is the number of overidentifying restrictions, and is the same as in thenormal-theory based χ2-difference test for multivariate normal data. However, EL-based test does not invokeany distributional assumptions, and does not need any corrections like those of Satorra & Bentler (1994) forthe normal theory test statistic T .

11

An asymptotically equivalent test of model fit can be obtained via Lagrange multipliers λ. Imbens (2002)suggests the following form of the test statistic:

LMMELE = nλ′Γλ, (32)

where λ are the Lagrange multipliers at the optimum, and Γ is the estimator of the variance of estimatingequations, e.g.,

Γ = (s− σ(θ))(s− σ(θ))′.

This is the same matrix of empirical fourth moments that appears in ADF and Satorra & Bentler (1994)corrections.

5 Examples

In this section, we report the results of simulation undertaken to compare the performance of several estima-tion and testing methods, including EL and AEL. We also report the results of analysis based on a popularfactor analysis data set.

5.1 Simulation study

For this simulation project, a confirmatory factor analysis model was considered:

xk = λkξ + εk, k = 1, . . . , 6, (33)

where all parameters λk, θk and φ were equal to one.Three distributions were considered to study possible effects of skewness and kurtosis, either marginal

or joint:

D1: ξ ∼ N(0, 1), εk ∼ N(0, 1) for k = 1, 2, 3, εk is half-normal, scaled to have mean 0 and variance 1,with marginal skewness of

√2(4− π)/(π− 2)3/2 = 0/407 and kurtosis 8(π− 3)/(π− 2)2 = 0.869.

D2: ξ ∼ N(0, 1), εk ∼ N(0, 1) for k = 1, 2, 3, εk is lognormal with mean of logs equal to−0.5 ln(e1−1)−0.5 and variances of the logs equal to 1, scaled to have mean 0 and variance 1. The marginal skewnessof the error terms is (e + 2)

√(e− 1) = 6.185, and marginal kurtosis is e4 + 2e3 + 3e2 − 6 = 110.9.

D3: ξ ∼ N(0, 1), εk ∼ N(0, 1) for k = 1, 2, 3, (ε4, ε5, ε6)′ ∼ t6, multivariate t distribution, scaled to haveunit marginal variances.

For distributions D1 and D2, asymptotic robustness conditions (Satorra 1990) predict that the quasi-MLEswill be asymptotically efficient for all parameters except those describing the variances of the non-normalterms (i.e, θ4 through θ6). Distribution D2 should be more problematic as it has higher skewness andkurtosis. Distribution D3 violates those conditions, and the distribution-free nature of empirical likelihoodmethods would be expected to provide asymptotic efficiency gains.

For each distribution, sample sizes n = 100, 200, 300, 500 and 1000 typical for applications have beenstudied. For each sample size, at least 250 Monte Carlo replications were taken. For each Monte Carlosample, the following estimation and testing procedures were performed, in this order:

12

1. ULS estimation, using the population values of the parameters as starting values;

2. quasi-MLE, or FIML, estimation, using a convex combination of ULS estimates (with weight 0.9) andpopulation values (with weight 0.1) as starting values;

3. AEL estimation, using a convex combination of FIML estimates (with weight 0.8), ULS (with weight0.1) and population values (with weight 0.1) as starting values;

4. EL estimation, using a convex combination of FIML and AEL estimates (with weights 0.4) and pop-ulation parameters (with weight 0.2).

Huber (1967) sandwich variance estimator was used with all methods, with the necessary derivatives takennumerically.

The tables with simulation results are given in appendix C available externally2. Here, we provide asummary of results that we found to be interesting.

The degree of violation of multivariate normality is assessed in Table 2. The deviations from normalityare notable enough to be detected by Mardia (1970) test even at the smallest sample sizes.

The quality of the asymptotic χ29 approximation to the goodness of fit tests is explored in Table 3 using

Kolmogorov-Smirnov test (Conover 1998, Kendall, Stuart, Ord & Arnold 1999). Quasi-likelihood ratio test,or FIML fit statistic, and likelihood ratio and score/LM tests for empirical likelihood and adjusted empiricallikelihood are compared. Among the distributions satisfying asymptotic robustness conditions, the one withlight tails (D1) produced valid χ2 statistics for sample sizes of 500 and above, with the quasi-likelihoodratio test statistic TFIML achieving asymptotic normality even faster. The heavy tailed distribution D2 onlyyielded χ2 approximation for the TFIML statistic. All other test statistics did not perform well. However,additional simulation results (Section 5.1) suggest that the situation can be rectified using Bartlett-correctedtests.

We investigate three aspects of parameter estimation: bias, efficiency, and confidence interval coverage,each approximated by the corresponding Monte Carlo quantity. Each of the characteristics was analyzedaccording to the second order expansion:

√n(θrj,n − θj) = βb0 + βb1n

−1/2 + εbnr, (34)

n(θrj,n − θj,n)2 = βe0 + βe1n−1/2 + εenr, (35)

1I{θj > θrj,n − 1.645s.e.[θrj,n]

}= βl0 + βl1n

−1/2 + εlnr,

1I{θj < θrj,n + 1.645s.e.[θrj,n])

}= βr0 + βr1n

−1/2 + εrnr. (36)

Here, θrj,n is the estimate of parameter θj in r-th Monte Carlo sample for the sample size n, and subindicesb, e, l and r stand for the models for bias, efficiency and coverage on the left and right limits of the CI,respectively. The regression models for each characteristic was estimated simultaneously for all methods,and the standard errors were corrected for clustering with the Monte Carlo sample being the cluster. Thisrepresentation allows for concise analytic results driven summary of the simulation (Skrondal 2000).

2 Tables will be made available as a separate file on the authors’ website(s).

13

Note that some parameters can be considered parallel, as some of the variables in the model can bepermuted. Hence, the parallel sets are: loadings and variances of the variables with normal errors, λ2–λ3

and θ2–θ3; loadings and variances of the variables with skewed and/or heavy tailed errors, λ4–λ6 and θ4–θ6.Table 4 summarizes the results for the bias of the estimators. There is little evidence of the higher

order biases in estimation of the loadings, although this might be an issue of the power of the Monte Carlostudy. Both EL and AEL seem to exhibit higher order (negative) biases in estimation of the unique varianceparameters θk.

Table 5 summarizes the results for the variance of the estimators. While efficiency of ULS and quasi-MLE, surprisingly, is about the same on most occasions, the estimates of AEL and EL suggest higherefficiency. Individual differences do not appear to be significant at conventional levels, probably due toinsufficient simulation study size, but overall F -tests reject the null that performance of the methods is thesame within a given distribution. The distribution for which AEL and EL offer clear advantage is D2, withlognormal distribution of the error terms. In agreement with predictions of asymptotic robustness theory,FIML is not efficient for those parameters. Lower asymptotic variances of the empirical likelihood meth-ods is somewhat offset by greater magnitude of higher order terms, which disagrees with predictions fromNewey & Smith (2004).

Combining the biases and variances together into MSE, Table 6 shows that the leading term of the MSEof the EL/AEL estimates is generally lower than those of other methods, although the second order term isgreater for EL/AEL than for other methods. The result is by and large driven by the variance part of MSE.

The confidence intervals coverage is reported in Table 7. One-sided CI nominal coverage is 95% (withthe coverage error of O(n−1/2)), and two-sided CI nominal coverage is 90% (with the coverage error ofO(n−1)). For all methods, the CI was constructed as the point estimate ± 1.645 times the reported standarderror, and coverage in each tail was studied separately. Additionally, CIs based on profile EL (39) werestudied for the distribution D2. The reported entries are deviations from the nominal coverage, so positiveentries suggest overcoverage, and negative entries, undercoverage. Also, higher order terms show the effectof the sample size. For instance, a coefficient of -200 of the term n−1/2 indicates that for a sample size ofn = 100, the confidence interval has a coverage of only 75% vs. the target 95%. The lack of coverage onlyimproves to 89% with the sample size of n = 1000. For two-sided coverage, the second term is of the ordern−1. The reported coefficient directly shows under- or overcoverage for the sample size of n = 1000, withcoverage error being 10 times great for the sample size of n = 100.

Generally, confidence intervals for the variance parameters θ are the least accurate ones, especially forthe variables were the distributions were tweaked (variables 4–6). That is not surprising in the light ofasymptotic robustness theory.

For all distributions but D1, confidence intervals based on ULS estimation results are notably off thenominal coverage. Interestingly, in a lot of situations overcoverage occurs, so that the ULS CIs are too long.FIML did not have any major problems with the relatively simple distributions D1 (that satisfies asymptoticrobustness conditions and has relatively light tails) and D3 (that violates asymptotic robustness, but producesa distribution that is relatively close to an elliptic one). EL and AEL methods performed similarly, and tendedto have problems more often that FIML, although less often than ULS. Overcoverage was also observed withthese methods. They also tended to have greater magnitudes of the second order terms. The main term andthe second order term were of opposite signs in several cases, compensating one another in the moderatesample sizes. This may also be an indication of misspecified asymptotic expansion, e.g. when the term

14

O(n−1/2) is absent in the expansion, while higher order terms such as O(n−1) are present. Specificationswith both n−1/2 and n−1 as regressors were attempted, but the results were unstable because of very highmulticollinearity between the two terms.

Contrary to the expectations, the profile empirical likelihood based CIs did not perform better in small tomoderate size samples. While asymptotic coverage is on target, the small sample (second order) terms aregenerally greater in magnitude than with other methods, and indicate notable undercoverage. However, inthe most difficult situation of the CI for the variance of the lognormal error terms in D2, the profile EL wasthe only method that had accurate coverage. All other methods overcovered on the left side and undercoveredon the right side, with tendency to undercover for two-sided intervals.

A combination of the bootstrap based Bartlett correction with profiling the likelihood might improve thesituation, but this approach was not studied in the current project.

Across most columns of Tables 4 through 7, F -tests are significant at conventional levels. It indicatesthat the estimators have notably different performance for most parameters and characteristics (34)–(36).

An additional simulation was used to gauge viability of the bootstrap to provide direct estimates of thegoodness of fit statistic and/or Bartlett correction to it. Distribution D3 that featured multivariate Studentdistribution of several error terms was used, with sample sizes of 100, 200, 500 and 1000. The numberof Monte Carlo repetitions was 400, 600, 600 and 300, respectively. For each Monte Carlo sample, thefollowing estimation and testing procedures were performed, in this order:

1. ULS estimation, using the population values as the starting point

2. (quasi-)MLE estimation, using a convex combination of ULS estimates (with weight 0.9) and popula-tion values (with weight 0.1)

3. EL estimation, using a convex combination of MLE (with weight 0.8), ULS (with weight 0.1) andpopulation values (with weight 0.1)

4. B = 200 bootstrap replications of EL estimation from the augmented distribution rotated accordingto (16)

5. The direct bootstrap p-value was computed as

pBS =1B

B∑b=1

1I[T ∗b > TEL]

where TEL is the empirical likelihood ratio for the overall goodness of fit test achieved in the originalMonte Carlo sample, and T ∗b is the EL ratio statistic obtained in b-th bootstrap sample

6. The Bartlett-corrected statistics were obtained according to (9) using either all 200 bootstrap replica-tions, or just the first 50 bootstrap replications.

The simulation results are summarized in Table 8. The statistics TEL, TFIML, and the Bartlett-correctedversions of TEL were compared to their asymptotic targets of χ2

9, while the direct bootstrap estimate pBSwas compared to the uniform U [0, 1] distribution. As expected, the likelihood ratio test for FIML rejects the

15

model, since the distribution is misspecified. The empirical likelihood ratio only achieves its asymptotic χ2

distribution at the largest sample sizes used. However, the bootstrap approach showed great performance,and Bartlett-corrected R(θMELE) had the desired pivotal distribution with as few as 50 bootstrap replica-tions and sample size as small as n = 100.

5.2 Empirical illustration

We have used Holzinger-Swineford data analyzed by Joreskog (1969) to compare several estimation meth-ods3. The following variables were used in the analysis:

Factor 1 Factor 2 Factor 3y1, visual perception test y4, paragraph comprehension test y7, speeded addition testy2, spatial relations test y5, sentence completion test y8, speeded counting of dots in shape

y3, lozenges y6, word meaning test y9, straight and curved caps

In order to avoid numeric optimization issues such as poorly condition matrices, the variables werescaled by 6, 4, 8, 3, 4, 7, 23, 20, 36, to make their empirical variances close to 1. The identification conditionimposed on the system was to set the variances of the latent factors to 1, so that φ parameters represent factorcorrelations. Three methods were compared: maximum likelihood (FIML), ULS and empirical likelihood4.The results are reported in Table 1.

We would expect that the coefficient estimates would be comparable to one another, while the standarderrors would be smaller for the empirical likelihood and larger for the less efficient ULS, with the (quasi)ML with Huber (1967) sandwich standard errors in between. Also, the model is known to fit quite poorly(Yuan & Bentler 2007), so the overall fit tests are expected to reject the model.

Those expectations were by and large confirmed by the analysis, although the relations between the es-timated covariance matrices were not monotonic. For each of the differences V[θML]− V[θEL], V[θULS ]−V[θEL], 15 out of 21 eigenvalues were positive, and 6 were negative and smaller in magnitude that the pos-itive ones, showing a tendency for V[θEL] matrix to be smaller than the other two. Some of the parametersthat EL seemed to be fitting more poorly are those of the latent factor structure Φ.

The normal theory T statistic was 85.17, but due to severe non-normality, kurtosis correcting (Satorra& Bentler (1994) adjusted T ) was computed and reported in the table. Both adjusted T and empiricallikelihood ratio test safely rejected the model, as expected. The bootstrap from the null using rotated data(16) was used to calibrate the χ2 test statistics for both the FIML method and the empirical likelihood, usingthe FIML and EL estimates for θ, respectively. R = 500 bootstrap replications were taken for each of themethods. The number of bootstrap samples where the empirical likelihood ratio for the correctly specifiedmodel exceeded 91.281 was 39, so the direct bootstrap p-value is 39/500 = 0.078 indicating a mild rejection.None of the bootstrap samples had resampled T statistic exceeding 85.17, hence, the the bootstrap p-valuefor the quasi-ML T is less than 0.002.

Bartlett correction factor can also be approximated by the bootstrap from the null distribution. The meanof the empirical likelihood ratios was 63.363, much larger than the degrees of freedom 24. Hence, the

3 The data set is that available in R package MBESS.4 Empirical likelihood was implemented by direct optimization of weights in MATLAB, as well as through the dual problem in

Stata. The results agreed to numeric accuracy.

16

Table 1: Factor analysis of Holzinger-Swineford data.

Parameter FIML ULS ELλ11 0.899 0.976 0.800

(0.101) (0.106) (0.133)λ21 0.498 0.489 0.443

(0.0879) (0.0804) (0.0966)λ31 0.656 0.614 0.724

(0.0807) (0.0786) (0.0911)λ42 0.990 1.006 1.040

(0.0614) (0.0622) (0.0616)λ52 1.102 1.061 1.108

(0.0548) (0.0585) (0.0519)λ62 0.917 0.941 0.936

(0.0582) (0.0639) (0.0484)λ73 0.619 0.489 0.619

(0.0857) (0.0904) (0.0954)λ83 0.731 0.631 0.697

(0.0922) (0.0832) (0.0826)λ93 0.671 0.866 0.717

(0.0985) (0.0869) (0.109)φ12 0.459 0.430 0.381

(0.0734) (0.0726) (0.102)φ23 0.284 0.292 0.241

(0.0853) (0.0730) (0.109)φ13 0.470 0.464 0.515

(0.118) (0.0736) (0.159)

Parameter FIML ULS ELθ1 0.549 0.405 0.676

(0.157) (0.179) (0.182)θ2 1.134 1.143 1.176

(0.112) (0.116) (0.111)θ3 0.844 0.898 0.746

(0.101) (0.0941) (0.136)θ4 0.371 0.338 0.355

(0.0504) (0.0621) (0.0495)θ5 0.446 0.534 0.403

(0.0568) (0.0709) (0.0511)θ6 0.356 0.311 0.311

(0.0466) (0.0519) (0.0401)θ7 0.797 0.941 0.761

(0.0968) (0.0873) (0.0910)θ8 0.488 0.624 0.445

(0.118) (0.0941) (0.0935)θ9 0.568 0.267 0.498

(0.118) (0.131) (0.120)χ2 test 72.812 91.281d.f. 21.3 24p-value 3 · 10−8 9 · 10−10

Direct bootstrap percentile< 0.002 0.078

Bootstrap-calibrated Bartlett corrected 34.574p-value 0.075

Standard errors are in parentheses. First column: FIML criterion, Huber sandwich standard errors, adjusted Satorra-Bentler test. Second column: ULS criterion. Third column: empirical likelihood and empirical likelihood ratio test.

17

Bartlett-corrected empirical likelihood ratio is

q lnR(θMELE)1B

∑Bb=1 lnR(θ(b)∗ )

=24 · 91.281

63.363= 34.574,

where θ(r)∗ is the MELE obtained in r-th bootstrap sample. It is reported in the last block of Table 1, and thecorresponding p-value of 0.075 is similar to that of the bootstrap distribution tail probability.

The empirical likelihood weights wi demonstrated an appreciable spread, with the minimal value of0.619 · 10−3, the maximum value of 22.23 · 10−3 (c.f. 1/n = 3.32 · 10−3), and coefficient of variationof 1.24. This quite notable variability of weights might be yet another indication of poor model fit. Ifthe estimating equations were all correct, then the Lagrange multipliers λ are of order Op(n−1/2) (Qin &Lawless 1994), and hence the weights are 1/n(1 + Op(n−1/2)). In this example, however, the variabilityof weights is quite pronounced. The EL method had to go a great length in approximating the empiricalcovariance matrix with the weighted implied one.

Estimation of the model with analytical derivatives in λ and numeric derivatives in θ took about 20minutes on a Windows XP computer with 2 GHz processor.

6 Discussion

There is a number of further topics that can be studied with regards to empirical likelihood approach instructural equation modeling.

Estimators based on a set of estimating equations, like those in (22), have been studied extensively ineconometrics. The method of combining several moment conditions together via a quadratic form is knownas generalized method of moments (Hansen 1982, Hall 2005, GMM). The properties of GMM estimators,such as consistency, asymptotic normality and asymptotic efficiency, have been established in econometricsliterature, and it has become the basic estimation framework within which many other estimation methodscan be explained (Hayashi 2000). In SEM, the WLS is the analogue of GMM estimator, and ADF is theasymptotically efficient GMM estimator. As with ADF estimator, finite sample problems of GMM havealso been noted in econometric literature (Hansen, Heaton & Yaron 1996). Relations between GMM andempirical likelihood have been studied extensively (Smith 1997, Imbens 2002, Bera & Bilias 2002, Newey& Smith 2004, Hall, Inoue, Jana & Shin 2007). Newey & Smith (2004) demonstrated that EL estimates havesmaller biases than other estimation methods such as (different versions of) GMM. This accumulated bodyof literature will be helpful in formulating new estimators for SEM and investigating their properties.

There is a number of related information-theory based estimators, including Euclidean empirical likeli-hood (Owen 2001), exponential tilting (Kitamura & Stutzer 1997), generalized empirical likelihood (Smith1997), and Cressie-Read divergence family (Cressie & Read 1984). All of those estimators can be enter-tained for SEM, as well.

The empirical likelihood framework can provide inference for derived quantities such as estimates ofdirect, indirect and total effects. Testing whether a direct effect is zero is usually available through a teston a single parameter. A test on whether an indirect or a total effect is zero can be obtained by forming thecorresponding nonlinear combinations of parameters, setting this direct or indirect effect to zero, adding it asan additional estimating equation along with the main equations (22), and forming the empirical likelihood

18

ratio against the base model. Alternatively, the confidence intervals for the indirect and total effects can beformed as mentioned above by finding the level points of the empirical likelihood ratio.

While we have covered the main estimation methods commonly used in structural equation modeling,other more exotic methods can also be entertained provided they result in closed form estimating equations.For instance, robust estimators and estimating equations implied by them (Moustaki & Victoria-Feser 2006,Siemsen & Bollen 2007) can be used instead of the usual second moments (22).

It is interesting to note that empirical likelihood naturally treats missing data in pairwise fashion. That is,if the value xij of j-th variable is missing in i-th observation, this will lead to exclusion of i-th observationonly from the estimating equations referring to the missing variable. All pairs of non-missing variables willstill be present in their respective estimating equations for that observation. The estimating approaches thatuse estimates of covariance matrices in their objective functions naturally tend to rely on listwise deletion,where observation will be lost if any of its entries are missing, unless special modifications of the standardestimation procedures are performed.

Other bootstrap procedures can be used for Bartlett correction calibration in place of the bootstrap fromthe rotated distribution (16). One possible bootstrap scheme is unequal probability bootstrap from the distri-bution with estimated empirical likelihood weights. Alternatively, the bootstrap samples can be taken fromthe original distribution, and empirical likelihood ratios can be formed against the empirical likelihood ofthe original sample.

Empirical likelihood procedures place considerable computing demands. Consider the data set with nobservations, and the model with p observed variables and t parameters. Direct optimization of weights isa constrained maximization procedure with n + t parameters, and optimization via the dual problem is anested set of maximization over t parameters where each nested call is a minimization with respect to p∗

parameters where p∗ = p(p+ 1)/2. Depending on the sample size and optimizers available to the end user,one or the other might be preferred. A generic implementation is available in GAUSS by Bruce Hansen5.From the authors’ experience, pushing the problems beyond a dozen or so observed variables may requireenormous computing power generally not available in desktop formats.

7 Conclusion

This paper proposed to use empirical likelihood framework for estimation and testing in structural equationand covariance structure models. Based on the asymptotic theory of empirical likelihood and its extensions,the method has been expected to produce asymptotically efficient estimates as well as to provide a test of fitthat has a pivotal χ2 distribution regardless of the underlying distribution. The only other method known tohave this important property is Browne’s (1984) ADF that usually needs very large sample sizes.

It has been demonstrated that the EL or AEL estimates based on the complete set of second order mo-ments will be asymptotically efficient, and have asymptotic variance at least as small as those of MLE, WLSor 2SLS estimators. Based on higher order theory, we would expect EL/AEL estimators to have lower biasesand better coverage of confidence intervals.

Simulation evidence confirmed some of those findings, but not all of them. When asymptotic robustnessconditions are satisfied, estimation and testing based on quasi-MLE produced results that are arguably better

5 http://www.ssc.wisc.edu/˜bhansen/progs/progs gmm.html

19

that those based on EL methods, with more accurate confidence intervals and asymptotic χ2 distribution ofthe goodness of fit test providing reasonable approximation at lower sample sizes. However, when asymp-totic robustness conditions are violated, only the EL/AEL methods produce χ2 distribution, although thesample sizes might have to be in high hundreds of observations.

Limited simulation evidence also suggests that Bartlett corrections may bring the sample sizes requiredfor asymptotic χ2 distribution to be sufficiently accurate down to low hundreds. The number of bootstrapreplications that can be used to calibrate the Bartlett factor may be as low as 50.

There was mixed evidence regarding performance of the empirical likelihood estimates. They tended tohave higher order biases in our simulations, and while the reported variances and MSEs were lower than forother methods, they failed to reach “significance” with the current number of Monte Carlo replications.

The simulation results suggest that the best way to apply the EL procedure in SEM is to complementthe main estimation with the bootstrap calibration of the Bartlett corrections, probably for both the overallgoodness of fit test and the CIs for the parameters of primary interest. Then reasonably accurate results canbe obtained for sample sizes as low as n = 100.

Acknowledgments

The authors would like to thank Kenneth Bollen, Douglas Miller, Victoria Savalei, Oksana Loginova, threereferees and the associate editor for helpful comments. The errors and omissions are our responsibility. Thiswork was supported by NSF grant SES-0617193 with funds from Social Security Administration.

20

References

Anderson, T. W. & Rubin, H. (1956), Statistical inference in factor analysis, in ‘Proceedings of the ThirdBerkeley Symposium on Mathematical Statistics and Probability’, Vol. 5, pp. 111–150.

Azzalini, A. & Dalla Valle, A. (1996), ‘The multivariate skew-normal distribution’, Biometrika 83(4), 715–726.

Barndorff-Nielsen, O. E. & Cox, D. R. (1994), Inference and Asymptotics, Monographs on Statistics andApplied Probability, Chapman & Hall/CRC, New York.

Bentler, P. M. (1983), ‘Simultaneous equation systems as moment structure models’, Journal of Economet-rics 22(1-2), 13–42.

Bentler, P. M. & Weeks, D. G. (1980), ‘Linear structural equations with latent variables’, Psychometrika45, 289–308.

Bera, A. K. & Bilias, Y. (2002), ‘The MM, ME, ML, EL, EF and GMM approaches to estimation: a synthe-sis’, Journal of Econometrics 107(1-2), 51–86.

Beran, R. & Srivastava, M. S. (1985), ‘Bootstrap tests and confidence regions for functions of a covariancematrix’, The Annals of Statistics 13(1), 95–115.

Bickel, P. J., Klaassen, C. A. J., Ritov, Y. & Wellner, J. A. (1998), Efficient and Adaptive Estimation forSemiparametric Models, Springer.

Bock, R. & Bargmann, R. (1966), ‘Analysis of covariance structures’, Psychometrika 31(4), 507–534.

Bollen, K. A. (1989), Structural Equations with Latent Variables, Wiley, New York.

Bollen, K. A. (1996a), ‘An alternative two stage least squares (2SLS) estimator for latent variable models’,Psychometrika 61(1), 109–121.

Bollen, K. A. (1996b), A limited-information estimator for LISREL models with and without heteroscedasticerrors, in G. Marcoulides & R. Schumaker, eds, ‘Advanced Structural Equation Modeling Techniques’,Erlbaum, Mahwah, NJ, pp. 227–241.

Bollen, K. A. (2001), Two-stage least squares and latent variable models: Simultaneous estimation androbustness to misspecifications, in R. Cudeck, S. D. Toit & D. Sorbom, eds, ‘Structural EquationModeling: Present and Future, A Festschrift in Honor of Karl Joreskog’, Scientific Software, pp. 119–138.

Bollen, K. & Stine, R. (1992), ‘Bootstrapping goodness of fit measures in structural equation models’,Sociological Methods and Research 21, 205–229.

Brown, B. W. & Newey, W. K. (2002), ‘Generalized method of moments, efficient bootstrapping, and im-proved inference’, Journal of Business and Economic Statistics 20, 507–517.

21

Browne, M. W. (1974), ‘Generalized least squares estimators in the analysis of covariances structures’, SouthAfrican Statistical Journal 8, 1–24.

Browne, M. W. (1982), Covariance structures, in D. M. Hawkins, ed., ‘Topics in Multivariate Analysis’,Cambridge University Press, pp. 72–141.

Browne, M. W. (1984), ‘Asymptotically distribution-free methods for the analysis of the covariance struc-tures’, British Journal of Mathematical and Statistical Psychology 37, 62–83.

Chamberlain, G. (1987), ‘Asymptotic efficiency in estimation with conditional moment restrictions’, Journalof Econometrics 34(3), 305–334.

Chen, J., Variyath, A. M. & Abraham, B. (2008), ‘Adjusted empirical likelihood and its properties’, Journalof Computational and Graphical Statistics 17, 426–443.

Chen, S. X. & Cui, H. (2007), ‘On the second-order properties of empirical likelihood with moment restric-tions’, Journal of Econometrics 141(2), 492–516.

Chou, C., Bentler, P. & Satorra, A. (1991), ‘Scaled test statistics and robust standard errors for non-normaldata in covariance structure analysis: a Monte Carlo study’, British Journal of Mathematical andStatistial Psychology 44(2), 347–357.

Conover, W. (1998), Practical nonparametric statistics, Wiley Series in Probability and Statistics, JohnWiley and Sons, New York.

Cressie, N. & Read, T. (1984), ‘Multinomial goodness-of-fit tests’, Journal of the Royal Statistical SocietySeries B 46, 440–464.

Davidson, R. & MacKinnon, J. (1993), Estimation and Inference in Econometrics, Oxford University Press,New York.

DiCiccio, T., Hall, P. & Romano, J. (1991), ‘Empirical likelihood is Bartlett-correctable’, The Annals ofStatistics 19(2), 1053–1061.

Godambe, V. P. & Thompson, M. E. (1984), ‘Robust estimation through estimating equations’, Biometrika71(1), 115–125.

Hall, A. R. (2005), Generalized Method of Moments, Oxford University Press.

Hall, A. R., Inoue, A., Jana, K. & Shin, C. (2007), ‘Information in generalized method of moments estima-tion and entropy-based moment selection’, Journal of Econometrics 138(2), 488–512.

Hall, P. & Horowitz, J. L. (1996), ‘Bootstrap critical values for tests based on generalized-method-of-moments estimators’, Econometrica 64(4), 891–916.

Hall, P. & La Scala, B. (1990), ‘Methodology and algorithms of empirical likelihood’, International Statis-tical Review / Revue Internationale de Statistique 58(2), 109–127.

22

Hansen, L. P. (1982), ‘Large sample properties of generalized method of moments estimators’, Econometrica12, 347–360.

Hansen, L. P., Heaton, J. & Yaron, A. (1996), ‘Finite-sample properties of some alternative GMM estima-tors’, Journal of Business & Economic Statistics 14(3), 262–280.

Hausman, J. (1978), ‘Specification tests in econmetrics’, Econometrica 46, 1251–1271.

Hayashi, F. (2000), Econometrics, Princeton University Press, Princeton, NJ.

Hu, L. T., Bentler, P. M. & Kano, Y. (1992), ‘Can test statistics in covariance structure analysis be trusted?’,Psychological Bulletin 112(2), 351–362.

Huber, P. (1967), The behavior of the maximum likelihood estimates under nonstandard conditions, in ‘Pro-ceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability’, Vol. 1, Univer-sity of California Press, Berkeley, pp. 221–233.

Huber, P. (1974), Robust Statistics, Wiley, New York.

Imbens, G. W. (2002), ‘Generalized method of moments and empirical likelihood’, Journal of Business andEconomic Statistics pp. 493–506.

Joreskog, K. (1969), ‘A general approach to confirmatory maximum likelihood factor analysis’, Psychome-trika 34(2), 183–202.

Joreskog, K. G. (1973), A general method for estimating a linear structural equation system, in A. Goldberger& O. Duncan, eds, ‘Structural Equation Models in the Social Sciences’, Seminar Press, New York,pp. 85–112.

Joreskog, K. & Goldberger, A. (1972), ‘Factor analysis by generalized least squares’, Psychometrika37(3), 243–260.

Kendall, M., Stuart, A., Ord, K. J. & Arnold, S. (1999), Advanced Theory of Statistics: Classical Infer-ence and and the Linear Model, Vol. 2A of Kendall’s Library of Statistics, 6 edn, A Hodder ArnoldPublication.

Kitamura, Y. (2007), Empirical likelihood methods in econometrics: Theory and practice, in R. Blundell,W. Newey & T. Persson, eds, ‘Advances in Economics and Econometrics, Vol. III’, Cambridge Uni-versity Press, New York.

Kitamura, Y. & Stutzer, M. (1997), ‘An information theoretic alternative to generalized method of momentsestimation’, Econometrica 65(4), 861–874.

Lawley, D. N. (1940), ‘The estimation of factor loadings by the method of maximum likelihood’, Proceed-ings of the Royal Society of Edinburgh 60, 64–82.

Lee, S. & Jennrich, R. (1979), ‘A study of algorithms for covariance structure analysis with specific com-parisons using factor analysis’, Psychometrika 44(1), 99–113.

23

MacCallum, R. C. & Austin, J. T. (2000), ‘Applications of structural equation modeling in psychologicalresearch’, Annual Review of Psychology 51(1), 201–226.

Magnus, J. R. & Neudecker, H. (1999), Matrix calculus with applications in statistics and econometrics,2nd edn, John Wiley & Sons.

Mardia, K. V. (1970), ‘Measures of multivariate skewness and kurtosis with applications’, Biometrika57, 519–530.

McArdle, J. J. & McDonald, R. P. (1984), ‘Some algebraic properties of the reticular action model formoment structures.’, The British Journal of Mathematical and Statistical Psychology 37, 234–251.

Mittelhammer, R. C., Judge, G. G. & Miller, D. J. (2000), Econometric Foundations, Cambridge UniversityPress.

Moustaki, I. & Victoria-Feser, M.-P. (2006), ‘Bounded influence robust estimation in generalized linearlatent variable models’, Journal of the American Statistical Association 101(474), 644–653. DOI10.1198/016214505000001320.

Muthen, B. (1984), ‘A general structural equation model with dichotomous, ordered categorical, and contin-uous latent variable indicators’, Psychometrika 49, 115–132.

Muthen, B. & Kaplan, D. (1985), ‘A comparison of some methodologies for the factor analysis of non-normal Likert variables’, British Journal of Mathematical and Statistical Psychology 38(2), 171–189.

Muthen, B. & Kaplan, D. (1992), ‘A comparison of some methodologies for the factor analysis of non-normal Likert variables: A note on the size of the model’, British Journal of Mathematical and Statis-tical Psychology 45(1), 19–30.

Neudecker, H. & Satorra, A. (1991), ‘Linear structural relations: Gradient and Hessian of the fitting func-tion’, Statistics and Probability Letters 11, 57–61.

Newey, W. K. & Smith, R. J. (2004), ‘Higher order properties of gmm and generalized empirical likelihoodestimators’, Econometrica 72(1), 219–255.

Owen, A. B. (1988), ‘Empirical likelihood ratio confidence intervals for a single functional’, Biometrika75(2), 237–249.

Owen, A. B. (1990), ‘Empirical likelihood ratio confidence regions’, The Annals of Statistics 18(1), 90–120.

Owen, A. B. (1991), ‘Empirical likelihood for linear models’, The Annals of Statistics 19(4), 1725–1747.

Owen, A. B. (2001), Empirical Likelihood, Chapman & Hall/CRC.

Pugesek, B. H., Tomer, A. & von Eye, A., eds (2002), Structural Equation Modeling: Applications inEcological and Evolutionary Biology, Cambridge University Press.

Qin, J. & Lawless, J. (1994), ‘Empirical likelihood and general estimating equations’, The Annals of Statis-tics 22(1), 300–325.

24

Sargan, J. D. (1958), ‘The estimation of economic relationships using instrumental variables’, Econometrica26(3), 393–415.

Satorra, A. (1990), ‘Robustness issues in structural equation modeling: A review of recent developments’,Quality and Quantity 24, 367–386.

Satorra, A. & Bentler, P. M. (1990), ‘Model conditions for asymptotic robustness in the analysis oflinear relations’, Computational Statistics and Data Analysis 10(3), 235–249. doi:10.1016/0167-9473(90)90004-2.

Satorra, A. & Bentler, P. M. (1994), Corrections to test statistics and standard errors in covariance structureanalysis, in A. von Eye & C. C. Clogg, eds, ‘Latent variables analysis’, Sage, Thousands Oaks, CA,pp. 399–419.

Siemsen, E. & Bollen, K. A. (2007), ‘Least absolute deviation estimation in structural equation modeling’,Sociological Methods Research 36(2), 227–265.

Skrondal, A. (2000), ‘Design and analysis of Monte Carlo experiments: Attacking the conventional wis-dom’, Multivariate Behavioral Research 35(2), 137–167.

Smith, R. J. (1997), ‘Alternative semi-parametric likelihood approaches to generalised method of momentsestimation’, The Economic Journal 107(441), 503–519.

Stein, C. (1956), Efficient nonparametric testing and estimation, in ‘Proceedings of the Third Berkeley Sym-posium on Mathematical Statistics and Probability’, Vol. 1, University of California Press, Berkeley.

Wald, A. (1948), ‘Asymptotic properties of the maximum likelihood estimate of an unknown parameter of adiscrete stochastic process’, The Annals of Mathematical Statistics 19(1), 40–46.

White, H. (1982), ‘Maximum likelihood estimation of misspecified models’, Econometrica 50(1), 1–26.

Wilks, S. S. (1938), ‘The large sample distribution of the likelihood ratio for testing composite hypotheses’,The Annals of Mathematical Statistics 9(1), 60–62.

Wooldridge, J. M. (2002), Econometric Analysis of Cross Section and Panel Data, MIT Press, Cambridge,MA.

Yuan, K.-H. & Bentler, P. M. (2007), Structural equation modeling, in C. Rao & S. Sinharay, eds, ‘Handbookof Statistics: Psychometrics’, Vol. 26 of Handbook of Statistics, Elsevier, chapter 10.

25

A Estimating equations for SEM estimators

For the maximum likelihood objective function (15), the matrix differential is

dFML(θ, S) = d{log |Σ(θ)|+ tr[Σ−1(θ)S]− log |S| − dim(x′y′)′}= tr[Σ−1(θ) d Σ(θ)]− tr[Σ−1(θ) d Σ(θ)Σ−1S]

= − tr[Σ−1(θ) d Σ(θ)Σ−1(S − Σ(θ))] = − vec[d Σ′(θ)]′(Σ−1 ⊗ Σ−1) vec[S − Σ(θ)]

= − d θ′σ(θ)′D′(Σ−1 ⊗ Σ−1)D vech[S − Σ(θ)],

where D is duplication matrix (Magnus & Neudecker 1999), and σ(·) is the Jacobian of the implied mo-ments:

σ(θ) =∂σ(θ)∂θ′

.

In order to satisfy the first order condition

dFML(·) = 0

for arbitrary d θ, the estimates must satisfy

σ(θ)′D′(Σ−1 ⊗ Σ−1)D(s− σ(θ)) = 0.

These are the estimating equations for FIML estimation method.Assuming the weight matrix W does not depend on the parameters over which minimization is per-

formed, the differential of the WLS objective function (17) is

dFWLS(θ, S,W ) = d{[s− σ(θ)]′W−1[s− σ(θ)]} = 2 d θ′σ(θ)′W−1(s− σ(θ)),

resulting in the first order conditions

σ(θ)′W−1(s− σ(θ)) = 0.

B Computation of EL estimates

Computing MELEs and profile likelihoods requires numeric optimization. Solvers capable of dealing withlarge number of variables are necessary for efficient computation. An interior point optimizing package“Knitro” for MATLAB was used in our early work and in the empirical example. Alternative optimizerNPSOL is much faster but often returns slightly violated constraints conditions which could make the pointestimates and the value of χ2 statistic inaccurate.

An alternative approach to finding the solutions θ and the empirical likelihood ratios R(θ) is based onduality of the optimization problem. The Lagrangian function for problem (3) is

H =n∑i=1

logwi + λ0(1−n∑i=1

wi)− nλ′n∑i=1

wim(xi, θ), (37)

26

where dimension t + q of the Lagrange multiplier λ matches that of m(xi, θ). By the standard empiricallikelihood calculations given in Owen (1990, 1991, 2001) or Qin & Lawless (1994), the Lagrange multiplierλ0 can be shown to be equal to n, and eliminated, and the weights have explicit expression

wi =1n

11 + λ′m(xi, θ)

. (38)

Thus, n+ t− 1-dimensional problem (3) is reduced to a saddle point search over smaller 2t+ q space:

θEL = arg maxθ

minλ

n∑i=1

log1n

11 + λ′m(xi, θ)

.

The solution is a point in the space of (θ, λ), so computations must be organized as an inner loop of mini-mization over λ, and an outer loop of maximization over θ (Hall & La Scala 1990, Kitamura 2007).

Also in order to obtain valid weights 0 ≤ wi ≤ 1, λ and θ must satisfy 1 + λ′m(xi, θ) ≥ 1/n for each i.To bypass this complication, Owen (2001)[Sec. 3.14] suggested to modify the objective function. Introducean augmented function

l∗(z) =

{log(z), z ≥ 1/n

log(1/n)− 3/2 + 2nz − (nz)2/2, z < 1/n.

It coincides with log(z) in the valid domain, penalizes its argument for going below 1/n but unlike log(z)allows it to become negative. It also provides a smooth transition with two continuous derivatives at z = 1/n.Then the objective function that can be safely used without regard to any other remaining restrictions is

L∗(λ, θ) =n∑i=1

l∗(1 + λ′m(xi, θ)

).

The resulting estimates and test are asymptotically equivalent to the EL, since the effect of the modificationis away from the relevant domain of the argument of l∗(·), which is 1 +Op(n−1/2).

Another complication that may arise in the empirical likelihood computations is that the true value ofthe parameter is not feasible for any combinations of the empirical likelihood weights wi unless the non-negativity condition is violated. In the simplest case of inference for the mean, the sample convex hullmay fail to capture the true mean. For general estimating equations case, the situation is more complicated,as the set of parameter values supported by the sample is not convex, although generally connected. Thisproblem is a convolution of a small sample and model specification issues. For properly specified models,the probability that zero is inside the convex hull of m(xi, θ) goes to 1 exponentially fast as n → ∞. Chenet al. (2008) proposed to augment the data, or estimating equations, by an extra artificial observation that isa scaled version of the sample mean. The “size” of this observation can be a slowly growing function of thesample size. Chen et al. (2008) suggested using

xn+1 = −anxn, an = max(1, log(n)/2).

In the inference for the mean problems, they showed that this adjustment does not affect the asymptotic prop-erties of EL, at the same time allowing to bypass the convex hull problem, and also incidentally improvingupon undercoverage in small samples.

27

With MELE and EL ratio at hand, an overall model fit test can be performed directly with EL ratio value,and confidence sets can be explicitly written as

Cθ,α = {θ|2 logR(θ)

R(θ)≥ χ2

u,α,

n∑i=1

wi vech[ZiZ ′i − Σ(θ)] = 0, wi ≥ 0,n∑i=1

wi = 1}.

For individual confidence interval of θk, we can solve the following constrained programming problem:

max or min{θk|2 logR(θ)

R(θ)≥ χ2

1,α,

n∑i=1

wi vech[(ZiZ ′i − Σ(θ))] = 0, wi ≥ 0,n∑i=1

wi = 1}.

to compute the maximum and minimum values allowed for specified significance level. Profile likelihoodfor parameter θk can be defined as

R∗(θk; θ−k) = maxw{n∏i=1

nwi|n∑i=1

wi vech[(ZiZ ′i − Σ(θ))] = 0, wi ≥ 0,n∑i=1

wi = 1},

in which all components of θ other than θk (denoted as θ−k) are fixed. Then the confidence interval for θkis obtained by solving

− 2 logR∗(θk; θ−k) = χ21,α, (39)

for θk fixing the remaining components at their MELE values θ−k. The idea can be straightforwardly adaptedfor AEL, as well.

28

C Simulation results

The following tables summarize the simulation results. Four distributions were considered to study possibleeffects of skewness and kurtosis, either marginal or joint:

D1: ξ ∼ N(0, 1), εk ∼ N(0, 1) for k = 1, 2, 3, εk is half-normal, scaled to have mean 0 and variance 1,with marginal skewness of

√2(4− π)/(π− 2)3/2 = 0/407 and kurtosis 8(π− 3)/(π− 2)2 = 0.869.

D2: ξ ∼ N(0, 1), εk ∼ N(0, 1) for k = 1, 2, 3, εk is lognormal with mean of logs equal to−0.5 ln(e1−1)−0.5 and variances of the logs equal to 1, scaled to have mean 0 and variance 1. The marginal skewnessof the error terms is (e + 2)∗

√(e−1) = 6.185, and marginal kurtosis is e4 + 2e3 + 3e2−6 = 110.9.

D3: ξ ∼ N(0, 1), εk ∼ N(0, 1) for k = 1, 2, 3, (ε4, ε5, ε6)′ ∼ t6, multivariate t distribution, scaled to haveunit marginal variances.

D4: x is multivariate skew normal (Azzalini & Dalla Valle 1996) with skewing vector 0.09(0,0,1,1,-1,-1),and the covariance matrix of the original distribution equal to

2 1 1 1 1 11 2 1 1 1 11 1 2.0665 1.0665 0.9335 0.93351 1 1.0665 2.0665 0.9335 0.93351 1 0.9335 0.9335 2.0665 1.06651 1 0.9335 0.9335 1.0665 2.0665

Throughout this appendix, single star (*) indicates nominal significance at 1% level, and double star

(**), at 0.1% level. F -tests are contrasts between estimators, for a given distribution, across all distributions.Significant p-value indicates that different estimators have different performance.

29

Table 2: Multivariate normality tests.

Sample Distributionsize D1 D2 D3 D4

Multivariate skewness, b1,p100 76.09 100.00 59.62 9.91200 98.66 100.00 70.31 9.26300 100.00 100.00 80.00 9.01500 100.00 100.00 81.40 8.111000 100.00 100.00 93.70 9.42

Multivariate kurtosis, b2,p100 14.35 99.50 55.77 7.75200 25.89 100.00 86.56 10.00300 39.57 100.00 97.29 9.01500 52.83 100.00 99.17 9.731000 82.96 100.00 100.00 9.96Entries are Monte-Carlo estimates of the power ofMardia (1970) test of asymptotic 90% size.

30

Table 3: Asymptotic χ2 approximation for goodness of fit tests.

LMAEL LMEL TAEL TEL TFIML

D1, n= 100 0.000 0.000 0.000 0.000 0.458 K-S test p-value63.48 43.04 20.87 25.22 12.61 Empirical test size

200 0.000 0.000 0.008 0.001 0.386 K-S test p-value30.80 24.11 15.63 15.63 12.05 Empirical test size



















Entries are p-values of Kolmogorov-Smirnov test statistic to compare the Monte Carlodistribution of a given test statistic with the asymptotic χ2

9, and Monte Carlo rejec-tion rates for the test of nominal 10% size using the quantile of the χ2

9 distribution,Prob[χ2

9 > 14.68] = 0.1.

31

Table 4: Estimates bias.

The entries are regression estimates and their standard errors of the two-term expansion (34). The standarderrors are corrected for clustering due to the same Monte Carlo sample.

λ2 λ3 λ4 λ5 λ6 φ11

b/se b/se b/se b/se b/se b/seULS, leading term for D1 -0.125 -0.113 -0.237 -0.154 -0.185 0.153

(0.134) (0.133) (0.139) (0.139) (0.137) (0.214)MLE, leading term for D1 -0.082 -0.047 -0.205 -0.091 -0.136 0.073

(0.130) (0.128) (0.135) (0.135) (0.131) (0.206)AEL, leading term for D1 -0.113 -0.079 -0.220 -0.159 -0.221 0.130

(0.132) (0.132) (0.137) (0.137) (0.136) (0.207)EL, leading term for D1 -0.114 -0.077 -0.223 -0.158 -0.221 0.111

(0.132) (0.132) (0.137) (0.138) (0.137) (0.211)ULS, leading term for D2 -0.034 -0.031 0.024 0.026 0.071 -0.086

(0.059) (0.060) (0.061) (0.061) (0.060) (0.097)MLE, leading term for D2 -0.051 -0.030 0.011 0.008 0.036 -0.041

(0.057) (0.058) (0.059) (0.059) (0.058) (0.095)AEL, leading term for D2 -0.060 -0.049 -0.069 -0.059 -0.072 0.076

(0.060) (0.060) (0.061) (0.059) (0.059) (0.096)EL, leading term for D2 -0.057 -0.046 -0.063 -0.055 -0.071 -0.015

(0.060) (0.060) (0.061) (0.059) (0.059) (0.098)ULS, leading term for D3 -0.074 0.014 0.050 0.036 -0.039 -0.044

(0.130) (0.134) (0.132) (0.130) (0.129) (0.209)MLE, leading term for D3 -0.078 0.006 0.076 0.052 -0.045 -0.064

(0.128) (0.132) (0.129) (0.128) (0.128) (0.205)AEL, leading term for D3 -0.066 0.030 0.118 0.075 0.007 -0.111

(0.131) (0.134) (0.129) (0.129) (0.132) (0.207)EL, leading term for D3 -0.065 0.032 0.118 0.076 0.005 -0.157

(0.131) (0.134) (0.129) (0.129) (0.132) (0.212)ULS, leading term for D4 -0.051 -0.659** -0.670** -0.659** -0.547** 0.422*

(0.089) (0.092) (0.092) (0.089) (0.089) (0.146)MLE, leading term for D4 -0.053 -0.653** -0.669** -0.665** -0.536** 0.435*

(0.086) (0.089) (0.089) (0.086) (0.087) (0.141)AEL, leading term for D4 -0.036 -0.673** -0.653** -0.689** -0.534** 0.430*

(0.090) (0.093) (0.093) (0.091) (0.090) (0.142)EL, leading term for D4 -0.035 -0.673** -0.656** -0.688** -0.536** 0.400*

(0.090) (0.093) (0.093) (0.091) (0.090) (0.145)

32

λ2 λ3 λ4 λ5 λ6 φ11

b/se b/se b/se b/se b/se b/seULS, n−1/2 term for D1 3.837 2.452 4.639 3.485 3.436 -1.738

(2.140) (2.075) (2.212) (2.207) (2.147) (3.332)MLE, n−1/2 term for D1 3.157 1.363 4.098 2.197 2.226 -0.470

(2.035) (1.961) (2.129) (2.118) (2.033) (3.174)AEL, n−1/2 term for D1 3.938 2.049 4.379 3.315 3.645 -5.529

(2.097) (2.088) (2.212) (2.201) (2.205) (3.203)EL, n−1/2 term for D1 3.966 2.005 4.428 3.296 3.656 -0.753

(2.097) (2.095) (2.211) (2.212) (2.211) (3.307)ULS, n−1/2 term for D2 1.654 1.744 0.654 1.285 0.038 2.186

(0.913) (0.942) (0.970) (0.950) (0.937) (1.506)MLE, n−1/2 term for D2 1.954 1.558 0.930 1.579 0.703 1.337

(0.873) (0.901) (0.932) (0.898) (0.910) (1.438)AEL, n−1/2 term for D2 2.174 1.806 1.416 1.786 1.577 -4.400*

(0.933) (0.957) (0.960) (0.923) (0.919) (1.462)EL, n−1/2 term for D2 2.118 1.740 1.297 1.699 1.554 2.004

(0.934) (0.958) (0.961) (0.922) (0.922) (1.521)ULS, n−1/2 term for D3 3.516 1.934 0.316 0.679 2.657 -0.322

(2.037) (2.080) (2.045) (2.014) (2.025) (3.127)MLE, n−1/2 term for D3 3.119 1.547 0.146 0.518 2.564 0.045

(1.962) (2.014) (1.967) (1.942) (1.996) (3.030)AEL, n−1/2 term for D3 3.116 1.033 -0.257 0.130 1.730 -3.230

(2.047) (2.055) (1.980) (1.993) (2.109) (3.060)EL, n−1/2 term for D3 3.085 0.992 -0.264 0.120 1.764 2.060

(2.048) (2.058) (1.982) (1.995) (2.114) (3.183)ULS, n−1/2 term for D4 2.160 7.778** 7.576** 6.305** 5.723** -3.029

(1.432) (1.453) (1.472) (1.397) (1.430) (2.307)MLE, n−1/2 term for D4 2.397 7.929** 7.902** 6.718** 5.652** -3.567

(1.349) (1.387) (1.397) (1.321) (1.352) (2.183)AEL, n−1/2 term for D4 2.396 8.639** 7.879** 7.757** 6.008** -8.580**

(1.426) (1.469) (1.486) (1.434) (1.425) (2.209)EL, n−1/2 term for D4 2.372 8.629** 7.935** 7.737** 6.033** -3.219

(1.426) (1.471) (1.483) (1.434) (1.429) (2.282)F -test, leading terms 0.602 0.000 0.000 0.000 0.000 0.000F -test, n−1/2 terms 0.126 0.000 0.000 0.000 0.000 0.000

33

θ1 θ2 θ3 θ4 θ5 θ6b/se b/se b/se b/se b/se b/se

ULS, leading term for D1 -0.298 -0.139 -0.267 0.049 0.074 0.056(0.148) (0.140) (0.138) (0.165) (0.171) (0.163)

MLE, leading term for D1 -0.270 -0.181 -0.275 0.025 0.022 0.064(0.142) (0.134) (0.132) (0.159) (0.164) (0.157)

AEL, leading term for D1 -0.119 -0.144 -0.169 0.209 0.220 0.253(0.144) (0.138) (0.131) (0.161) (0.169) (0.157)

EL, leading term for D1 -0.145 -0.159 -0.194 0.186 0.211 0.240(0.146) (0.140) (0.133) (0.164) (0.171) (0.160)

ULS, leading term for D2 -0.047 0.008 -0.028 -0.364 -0.323 -0.192(0.065) (0.063) (0.064) (0.350) (0.307) (0.320)

MLE, leading term for D2 -0.056 -0.010 -0.021 -0.257 -0.305 -0.260(0.063) (0.061) (0.062) (0.347) (0.299) (0.317)

AEL, leading term for D2 0.067 0.110 0.155 -1.282** -1.419** -1.103**(0.063) (0.062) (0.062) (0.255) (0.224) (0.247)

EL, leading term for D2 -0.005 0.038 0.078 -1.292** -1.434** -1.082**(0.064) (0.063) (0.063) (0.258) (0.226) (0.247)

ULS, leading term for D3 0.277 -0.026 0.049 -0.261 -0.160 0.189(0.138) (0.138) (0.140) (0.203) (0.197) (0.218)

MLE, leading term for D3 0.269 -0.032 0.069 -0.277 -0.098 0.194(0.135) (0.136) (0.137) (0.197) (0.192) (0.213)

AEL, leading term for D3 0.445* 0.130 0.242 -0.199 -0.116 0.299(0.136) (0.137) (0.135) (0.169) (0.183) (0.193)

EL, leading term for D3 0.421* 0.098 0.209 -0.228 -0.146 0.279(0.138) (0.139) (0.137) (0.171) (0.186) (0.196)

ULS, leading term for D4 -0.417** -0.454** 3.046** 3.087** 2.965** 3.055**(0.093) (0.094) (0.099) (0.099) (0.106) (0.101)

MLE, leading term for D4 -0.397** -0.435** 2.965** 3.019** 2.951** 3.043**(0.090) (0.092) (0.096) (0.096) (0.102) (0.099)

AEL, leading term for D4 -0.226 -0.271* 3.143** 3.133** 3.111** 3.178**(0.092) (0.093) (0.099) (0.097) (0.105) (0.099)

EL, leading term for D4 -0.251* -0.300* 3.116** 3.103** 3.081** 3.150**(0.093) (0.094) (0.100) (0.099) (0.106) (0.101)

34


ULS, n−1/2 term for D1 2.863 -1.744 1.648 -1.917 -4.320 -3.152(2.329) (2.166) (2.168) (2.575) (2.597) (2.561)

MLE, n−1/2 term for D1 2.190 -0.939 1.691 -1.697 -3.484 -3.311(2.215) (2.052) (2.057) (2.468) (2.485) (2.458)

AEL, n−1/2 term for D1 -6.165* -7.451** -6.232* -10.648** -13.222** -13.041**(2.243) (2.160) (2.056) (2.498) (2.582) (2.478)

EL, n−1/2 term for D1 -1.244 -2.788 -1.372 -5.766 -8.672* -8.397**(2.310) (2.210) (2.100) (2.576) (2.636) (2.541)

ULS, n−1/2 term for D2 -1.747 -2.070 -1.257 3.022 2.148 1.080(0.998) (0.973) (1.001) (5.664) (4.756) (4.769)

MLE, n−1/2 term for D2 -1.574 -1.637 -1.363 1.216 1.452 2.458(0.940) (0.929) (0.947) (5.559) (4.500) (4.701)

AEL, n−1/2 term for D2 -10.298** -10.523** -10.879** -18.433** -18.036** -22.220**(0.956) (0.937) (0.949) (3.593) (3.029) (3.337)

EL, n−1/2 term for D2 -4.329** -4.529** -4.789** -13.922** -13.401** -18.327**(0.979) (0.959) (0.973) (3.669) (3.089) (3.297)

ULS, n−1/2 term for D3 -6.675* -2.079 -2.046 2.136 0.622 -6.212(2.080) (2.141) (2.114) (3.313) (3.006) (3.307)

MLE, n−1/2 term for D3 -6.329* -1.247 -2.249 2.130 -0.876 -6.442(2.004) (2.074) (2.045) (3.118) (2.893) (3.189)

AEL, n−1/2 term for D3 -15.227** -10.030** -10.869** -11.035** -12.767** -19.753**(2.036) (2.093) (2.014) (2.521) (2.691) (2.785)

EL, n−1/2 term for D3 -10.362** -4.989 -5.813* -6.078 -7.820* -14.986**(2.086) (2.147) (2.058) (2.580) (2.763) (2.865)

ULS, n−1/2 term for D4 1.112 2.054 -27.242** -27.259** -24.438** -26.475**(1.458) (1.470) (1.552) (1.553) (1.688) (1.582)

MLE, n−1/2 term for D4 0.850 1.961 -25.776** -26.220** -24.079** -25.923**(1.381) (1.401) (1.476) (1.456) (1.568) (1.503)

AEL, n−1/2 term for D4 -8.364** -6.912** -36.605** -35.525** -34.699** -36.304**(1.415) (1.425) (1.514) (1.482) (1.610) (1.513)

EL, n−1/2 term for D4 -3.236 -1.690 -30.927** -29.781** -28.928** -30.595**(1.447) (1.463) (1.550) (1.523) (1.648) (1.553)

F -test, leading terms 0.000 0.000 0.000 0.000 0.000 0.000F -test, n−1/2 terms 0.000 0.000 0.000 0.000 0.000 0.000

35

Table 5: Estimates efficiency.The entries are regression estimates and their standard errors of the two-term expansion (35). The standarderrors are corrected for clustering due to the same Monte Carlo sample.

λ2 λ3 λ4 λ5 λ6 φ11

b/se b/se b/se b/se b/se b/seULS, leading term for D1 1.973 2.317 2.313 2.340 2.340 6.236

(0.317) (0.304) (0.420) (0.392) (0.346) (0.747)MLE, leading term for D1 2.170 2.472 2.282 2.305 2.393 6.468

(0.306) (0.291) (0.396) (0.382) (0.333) (0.732)AEL, leading term for D1 2.109 2.139 1.977 2.059 1.786 6.233

(0.351) (0.323) (0.455) (0.407) (0.408) (0.719)EL, leading term for D1 2.109 2.118 1.980 2.024 1.767 5.927

(0.351) (0.324) (0.457) (0.409) (0.409) (0.762)ULS, leading term for D2 2.411 2.228 2.130 2.367 2.280 6.641












(0.238) (0.241) (0.239) (0.248) (0.236) (0.593)

36

λ2 λ3 λ4 λ5 λ6 φ11

b/se b/se b/se b/se b/se b/seULS, n−1/2 term for D1 7.698 1.203 4.745 5.485 1.590 2.611

(5.656) (4.971) (7.641) (6.855) (5.846) (11.828)MLE, n−1/2 term for D1 5.000 -1.202 5.985 6.280 0.674 -0.218

(5.285) (4.663) (7.101) (6.711) (5.501) (11.613)AEL, n−1/2 term for D1 8.606 7.241 13.567 12.437 14.756 4.686

(6.295) (5.515) (8.617) (7.375) (7.535) (11.350)EL, n−1/2 term for D1 8.612 7.708 13.498 13.187 15.156 14.112

(6.299) (5.543) (8.654) (7.443) (7.582) (12.449)ULS, n−1/2 term for D2 1.639 5.997 8.386 4.278 5.134 1.853

(2.256) (2.547) (2.816) (2.887) (2.415) (5.818)MLE, n−1/2 term for D2 1.089 5.591 9.220 2.464 7.277 0.288

(2.256) (2.627) (2.714) (2.467) (2.557) (5.473)AEL, n−1/2 term for D2 8.709 13.708 12.659 7.630 7.209 5.329

(2.815) (3.083) (3.840) (2.616) (2.686) (5.619)EL, n−1/2 term for D2 8.920 13.904 12.776 7.420 7.642 18.300

(2.831) (3.089) (3.908) (2.591) (2.705) (6.170)ULS, n−1/2 term for D3 5.620 3.265 3.089 3.361 7.205 -20.386

(5.133) (5.909) (5.418) (5.019) (4.576) (12.171)MLE, n−1/2 term for D3 3.775 2.524 2.353 2.004 9.122 -22.902

(4.904) (5.717) (5.001) (4.697) (4.594) (11.658)AEL, n−1/2 term for D3 10.424 4.350 4.454 6.816 18.582 -20.512

(5.403) (5.126) (5.111) (4.917) (5.897) (12.581)EL, n−1/2 term for D3 10.511 4.616 4.617 6.992 18.909 -8.394

(5.420) (5.139) (5.115) (4.948) (5.962) (14.256)ULS, n−1/2 term for D4 6.286 1.737 4.542 0.107 5.096 4.480

(3.852) (3.982) (3.983) (3.363) (3.944) (10.014)MLE, n−1/2 term for D4 6.428 3.423 6.496 0.692 5.798 7.296

(3.506) (3.887) (3.733) (3.232) (3.575) (9.202)AEL, n−1/2 term for D4 12.466 10.613 13.305 9.928 11.517 9.518

(4.047) (4.044) (4.038) (4.207) (4.025) (9.281)EL, n−1/2 term for D4 12.466 10.812 12.961 9.834 11.985 20.442


37


ULS, leading term for D1 2.876 2.848 2.569 3.766 4.628 3.514(0.371) (0.319) (0.327) (0.470) (0.490) (0.505)

MLE, leading term for D1 2.878 2.840 2.504 3.725 4.511 3.325(0.359) (0.311) (0.309) (0.455) (0.474) (0.481)

AEL, leading term for D1 2.865 2.449 2.420 3.713 4.383 3.132(0.382) (0.340) (0.300) (0.484) (0.498) (0.511)

EL, leading term for D1 2.731 2.372 2.369 3.545 4.307 3.010(0.399) (0.346) (0.308) (0.507) (0.513) (0.542)













38


ULS, n−1/2 term for D1 4.650 -2.900 5.049 1.636 -11.425 3.180(5.814) (4.843) (5.094) (7.632) (7.197) (8.851)

MLE, n−1/2 term for D1 4.429 -2.607 5.920 2.862 -9.156 6.277(5.614) (4.731) (4.761) (7.313) (6.937) (8.422)

AEL, n−1/2 term for D1 5.894 5.990 7.458 5.172 -4.240 10.827(6.128) (5.684) (4.658) (8.037) (7.524) (9.527)

EL, n−1/2 term for D1 10.192 8.891 9.916 10.590 -0.765 15.158(6.545) (5.810) (4.841) (8.600) (7.856) (10.259)

ULS, n−1/2 term for D2 -2.211 -1.887 3.305 418.918 52.535 -191.131(2.349) (2.338) (2.511) (456.246) (173.799) (189.255)

MLE, n−1/2 term for D2 -4.116 -1.603 1.594 515.605 -15.057 -115.106(2.227) (2.251) (2.344) (603.379) (160.489) (199.914)

AEL, n−1/2 term for D2 -2.837 -0.864 1.060 -433.131 -455.161 -569.998(2.306) (2.431) (2.326) (150.233) (45.925) (122.664)

EL, n−1/2 term for D2 0.209 1.966 4.295 -404.847 -439.127 -609.404(2.416) (2.570) (2.504) (161.682) (47.809) (111.105)

ULS, n−1/2 term for D3 -4.868 4.474 -3.059 43.845 -3.866 -13.443(4.928) (6.290) (5.325) (28.944) (13.054) (27.874)

MLE, n−1/2 term for D3 -5.736 3.088 -3.798 34.223 -6.211 -17.643(4.692) (6.175) (5.199) (25.754) (12.069) (27.043)

AEL, n−1/2 term for D3 -4.250 3.471 -5.603 -5.635 -20.490 -36.826(4.430) (5.633) (5.238) (7.635) (9.835) (14.502)

EL, n−1/2 term for D3 -1.215 7.050 -2.955 -1.142 -14.730 -30.509(4.643) (6.088) (5.482) (8.005) (10.201) (15.193)

ULS, n−1/2 term for D4 -0.781 -1.974 0.219 -0.996 3.381 -2.742(3.402) (3.942) (3.954) (3.932) (4.851) (4.068)

MLE, n−1/2 term for D4 -0.024 -0.439 0.642 -2.925 0.747 -1.945(3.252) (3.808) (3.727) (3.697) (4.400) (3.961)

AEL, n−1/2 term for D4 1.922 0.727 0.995 -1.192 1.101 -1.950(3.578) (3.871) (4.056) (3.803) (5.036) (4.243)

EL, n−1/2 term for D4 4.797 4.340 4.149 2.482 4.729 1.723(3.751) (4.061) (4.244) (4.103) (5.284) (4.516)


39

Table 6: Estimates mean squared error.

The entries are regression estimates and their standard errors of the two-term expansion accurate to op(n−1).The standard errors are corrected for clustering due to the same Monte Carlo sample.

λ2 λ3 λ4 λ5 λ6 φ11

b/se b/se b/se b/se b/se b/seULS, leading term for D1 1.918 2.282 2.190 2.274 2.272 6.247
















(0.243) (0.250) (0.246) (0.256) (0.241) (0.604)

40

λ2 λ3 λ4 λ5 λ6 φ11

b/se b/se b/se b/se b/se b/seULS, n−1/2 term for D1 9.597 2.008 7.530 7.271 3.193 3.310

(6.346) (5.184) (8.398) (7.359) (6.318) (11.569)MLE, n−1/2 term for D1 6.439 -0.777 8.169 7.107 1.399 0.028

(5.786) (4.801) (7.707) (7.067) (5.752) (11.400)AEL, n−1/2 term for D1 10.620 7.824 15.862 13.699 15.946 9.583

(6.991) (5.691) (9.297) (7.762) (7.979) (10.816)EL, n−1/2 term for D1 10.649 8.273 15.830 14.436 16.351 14.435

(6.996) (5.718) (9.340) (7.819) (8.033) (12.198)ULS, n−1/2 term for D2 1.906 6.186 8.423 4.522 5.134 2.092

(2.321) (2.591) (2.854) (2.932) (2.437) (5.891)MLE, n−1/2 term for D2 1.450 5.731 9.305 2.787 7.374 0.380

(2.331) (2.671) (2.765) (2.505) (2.597) (5.517)AEL, n−1/2 term for D2 9.199 13.924 12.799 7.852 7.391 7.717

(2.930) (3.158) (3.890) (2.680) (2.723) (5.494)EL, n−1/2 term for D2 9.389 14.104 12.893 7.622 7.821 18.665

(2.945) (3.160) (3.953) (2.649) (2.741) (6.275)ULS, n−1/2 term for D3 7.503 4.002 3.165 3.492 8.043 -20.329

(5.680) (6.179) (5.499) (5.126) (4.714) (11.992)MLE, n−1/2 term for D3 5.125 3.011 2.366 2.093 9.852 -23.129

(5.357) (5.970) (5.080) (4.786) (4.717) (11.538)AEL, n−1/2 term for D3 11.751 4.586 4.357 6.803 19.033 -18.100

(5.837) (5.292) (5.219) (5.025) (6.055) (12.244)EL, n−1/2 term for D3 11.817 4.836 4.519 6.977 19.370 -9.113

(5.854) (5.305) (5.222) (5.056) (6.125) (14.242)ULS, n−1/2 term for D4 6.724 -1.088 1.658 -2.917 2.508 2.635

(3.966) (4.118) (4.109) (3.433) (3.991) (10.282)MLE, n−1/2 term for D4 6.977 1.001 4.069 -2.040 3.436 5.516

(3.621) (4.031) (3.874) (3.314) (3.626) (9.373)AEL, n−1/2 term for D4 13.082 8.718 11.045 7.526 9.331 11.651

(4.156) (4.243) (4.173) (4.341) (4.096) (9.006)EL, n−1/2 term for D4 13.072 8.905 10.706 7.418 9.794 18.948


41


















42


ULS, n−1/2 term for D1 4.190 -1.831 4.831 1.761 -8.649 4.208(5.826) (5.023) (5.145) (7.577) (7.147) (8.461)

MLE, n−1/2 term for D1 3.762 -2.107 5.612 2.808 -7.246 7.424(5.572) (4.843) (4.800) (7.263) (6.837) (8.033)

AEL, n−1/2 term for D1 12.800 16.168 14.716 15.910 14.949 28.211(6.195) (6.246) (5.219) (7.938) (7.769) (8.419)

EL, n−1/2 term for D1 10.711 10.974 10.709 12.617 6.637 21.189(6.386) (5.826) (4.899) (8.352) (7.569) (9.293)

ULS, n−1/2 term for D2 -1.586 -1.231 3.513 418.272 51.447 -191.318(2.346) (2.325) (2.506) (456.232) (172.005) (187.744)

MLE, n−1/2 term for D2 -3.556 -1.131 1.847 515.119 -15.938 -116.018(2.223) (2.221) (2.340) (602.804) (158.575) (198.692)

AEL, n−1/2 term for D2 10.943 12.916 14.168 -344.173 -360.641 -457.965(2.642) (2.582) (2.654) (139.864) (38.480) (114.209)

EL, n−1/2 term for D2 2.957 4.699 6.653 -346.174 -377.152 -527.335(2.436) (2.509) (2.526) (152.169) (40.698) (104.709)

ULS, n−1/2 term for D3 -2.855 5.382 -2.224 43.439 -4.101 -11.453(4.908) (6.162) (5.267) (28.870) (12.813) (27.199)

MLE, n−1/2 term for D3 -4.045 3.593 -2.980 33.784 -6.173 -15.590(4.666) (6.086) (5.137) (25.665) (11.716) (26.387)

AEL, n−1/2 term for D3 14.571 16.004 7.477 15.891 4.643 2.982(5.335) (6.017) (5.469) (8.978) (9.778) (13.927)

EL, n−1/2 term for D3 5.030 10.139 0.491 6.862 -4.348 -9.676(4.887) (6.055) (5.385) (8.451) (9.727) (14.424)

ULS, n−1/2 term for D4 -1.537 -3.597 -78.898 -83.325 -73.443 -83.267(3.442) (3.814) (6.305) (6.606) (7.674) (6.757)

MLE, n−1/2 term for D4 -0.646 -1.916 -73.879 -81.047 -73.431 -79.999(3.263) (3.690) (5.949) (6.202) (7.029) (6.492)

AEL, n−1/2 term for D4 15.953 12.382 -62.332 -68.599 -64.618 -68.598(4.087) (4.058) (5.717) (5.677) (6.720) (5.920)

EL, n−1/2 term for D4 8.044 6.073 -72.025 -76.064 -71.559 -76.978(3.832) (3.950) (6.180) (6.314) (7.538) (6.570)


43

Table 7: Confidence interval coverage.

The entries are regression estimates and their standard errors of the two-term expansion (36). The standarderrors are corrected for clustering due to the same Monte Carlo sample.

Left limit, λ2 λ3 λ4 λ5 λ6 φ11

nominal 95% b/se b/se b/se b/se b/se b/seULS, leading term for D1 -1.155 -1.092 -1.224 0.384 1.261 0.044

(2.632) (2.439) (2.648) (2.486) (2.569) (2.535)MLE, leading term for D1 -1.252 -1.526 -1.320 1.340 1.582 0.419

(1.688) (1.333) (1.722) (1.447) (1.495) (1.488)AEL, leading term for D1 -0.092 -0.767 1.706 1.003 2.612 0.454

(1.621) (1.437) (1.787) (1.479) (1.559) (1.439)EL, leading term for D1 0.210 -0.767 2.880 1.595 2.914 0.954

(1.646) (1.437) (1.868) (1.531) (1.584) (1.586)ULS, leading term for D2 7.558** 8.127** 8.492** 7.245** 7.464** 8.810**


(0.695) (0.625) (0.683) (0.691) (0.688) (0.657)AEL, leading term for D2 0.800 1.746 2.226* 0.431 0.703 1.541

(0.774) (0.709) (0.746) (0.793) (0.755) (0.621)EL, leading term for D2 1.229 2.199* 2.809** 0.601 1.024 2.323*

(0.797) (0.725) (0.770) (0.808) (0.781) (0.737)ELprofile, leading term for D2 0.653 -0.079 -1.451 -1.609 -1.139 -5.543*

(2.011) (1.926) (2.159) (2.220) (2.184) (2.145)ULS, leading term for D3 0.298 -0.137 -0.076 -1.337 1.327 -0.853

(2.178) (2.170) (2.134) (2.060) (2.034) (2.061)MLE, leading term for D3 1.850 0.461 1.129 0.219 2.247 -0.248

(1.596) (1.580) (1.535) (1.343) (1.278) (1.396)AEL, leading term for D3 1.212 -0.018 -0.341 0.798 3.908* -0.064

(1.703) (1.687) (1.627) (1.441) (1.514) (1.336)EL, leading term for D3 2.009 0.670 -0.453 0.835 4.098* -0.890


(1.556) (1.436) (1.442) (1.417) (1.446) (1.600)MLE, leading term for D4 -1.012 2.759* 2.362* 3.326** 1.975 0.100

(1.010) (0.845) (0.828) (0.735) (0.839) (1.075)AEL, leading term for D4 -0.481 3.102* 2.414* 3.385** 2.909** 0.765

(1.086) (0.954) (0.847) (0.856) (0.827) (0.928)EL, leading term for D4 0.280 3.175** 2.785* 3.747** 3.191** 0.591

(1.141) (0.962) (0.911) (0.879) (0.867) (1.106)ELprofile, leading term for D4 -1.524 3.567 4.939* 4.104 1.896 -10.683**

(1.871) (1.654) (1.703) (1.615) (1.581) (2.067)44

Left limit, λ2 λ3 λ4 λ5 λ6 φ11

nominal 95% b/se b/se b/se b/se b/se b/seULS, n−1/2 term for D1 -54.064 -39.677 -55.740 -73.860 -78.511 -68.242

(42.450) (39.096) (42.726) (40.648) (42.228) (41.297)MLE, n−1/2 term for D1 41.490 57.229** 39.814 4.309 15.744 22.324

(24.215) (16.176) (24.874) (21.978) (22.628) (21.734)AEL, n−1/2 term for D1 20.940 39.097 -19.956 5.674 -9.672 28.762

(24.020) (19.563) (28.647) (22.347) (24.907) (20.620)EL, n−1/2 term for D1 13.150 39.097 -46.339 -8.314 -17.462 5.081

(24.804) (19.563) (30.925) (23.946) (25.646) (24.259)ULS, n−1/2 term for D2 -159.489** -171.549** -173.557** -157.195** -157.392** -181.828**

(16.996) (16.766) (17.038) (17.111) (16.866) (17.142)MLE, n−1/2 term for D2 15.549 2.943 9.482 18.249 22.552 4.459

(10.190) (9.325) (10.182) (10.052) (9.880) (9.835)AEL, n−1/2 term for D2 -3.791 -13.006 -21.937 -3.668 -3.044 6.614

(11.876) (11.049) (11.852) (12.150) (11.533) (9.222)EL, n−1/2 term for D2 -16.663 -22.624 -36.189* -10.932 -13.321 -28.010

(12.483) (11.511) (12.489) (12.501) (12.146) (11.793)ELprofile, n−1/2 term for D2 -164.138** -152.140** -173.342** -201.052** -200.312** -120.306**

(33.217) (31.776) (35.271) (36.303) (35.809) (34.320)ULS, n−1/2 term for D3 -190.794** -184.980** -177.624** -159.668** -203.830** -165.094**

(36.703) (36.453) (35.809) (34.229) (35.098) (34.423)MLE, n−1/2 term for D3 -5.133 16.070 13.633 24.731 -9.159 33.457

(24.688) (23.416) (22.943) (18.790) (19.588) (19.313)AEL, n−1/2 term for D3 -4.378 10.705 26.626 5.871 -49.054 36.451

(26.243) (25.252) (23.719) (21.352) (25.038) (18.159)EL, n−1/2 term for D3 -21.985 -7.541 27.237 2.898 -56.868 36.677

(27.739) (26.862) (23.723) (22.120) (26.174) (20.637)ULS, n−1/2 term for D4 -205.728** -256.946** -251.166** -251.795** -250.123** -229.190**

(27.474) (26.795) (26.815) (26.514) (26.802) (28.363)MLE, n−1/2 term for D4 43.072* -6.251 7.441 -6.002 7.207 14.675

(13.955) (13.196) (12.462) (11.524) (12.519) (16.041)AEL, n−1/2 term for D4 20.018 -19.053 0.597 -20.113 -10.506 18.096

(15.981) (15.392) (12.963) (13.882) (13.034) (13.481)EL, n−1/2 term for D4 0.372 -22.044 -12.650 -28.439 -19.908 1.236

(17.505) (15.597) (14.456) (14.535) (14.000) (16.950)ELprofile, n−1/2 term for D4 -136.577** -176.450** -202.557** -167.294** -143.031** -25.723


45

Left limit, θ1 θ2 θ3 θ4 θ5 θ6nominal 95% b/se b/se b/se b/se b/se b/seULS, leading term for D1 2.423 1.443 2.250 -0.248 -0.534 1.430

(2.507) (2.413) (2.438) (2.549) (2.466) (2.447)MLE, leading term for D1 2.340 2.718 2.940 0.375 -0.395 2.502

(1.463) (1.175) (1.312) (1.502) (1.351) (1.258)AEL, leading term for D1 1.493 3.194* 2.968** 0.119 0.818 2.920

(1.254) (0.980) (0.643) (1.425) (1.292) (1.158)EL, leading term for D1 2.292 4.062** 5.698** -0.132 0.728 3.107


(0.958) (0.929) (0.963) (0.752) (0.752) (0.759)MLE, leading term for D2 -0.163 1.096 0.128 4.563** 4.390** 4.493**

(0.648) (0.641) (0.672) (0.187) (0.216) (0.257)AEL, leading term for D2 0.435 1.224 1.201 3.812** 3.996** 4.012**

(0.572) (0.567) (0.545) (0.377) (0.377) (0.377)EL, leading term for D2 0.275 0.949 1.187 3.640** 3.993** 3.703**

(0.712) (0.693) (0.677) (0.416) (0.403) (0.409)ELprofile, leading term for D2 0.175 1.751 0.600 1.646 0.733 -0.648

(1.301) (1.272) (1.326) (1.284) (1.122) (1.273)ULS, leading term for D3 -0.111 -0.732 -0.543 3.477 0.381 0.188

(2.004) (2.085) (2.123) (1.680) (1.993) (1.950)MLE, leading term for D3 1.296 -0.019 0.401 4.418** 1.590 1.658

(1.242) (1.414) (1.473) (0.587) (1.304) (1.255)AEL, leading term for D3 1.683 0.617 0.251 4.608** 2.155 1.840

(1.065) (1.309) (1.297) (0.416) (1.097) (1.208)EL, leading term for D3 1.275 0.023 1.276 4.881** 2.343 1.434

(1.355) (1.578) (1.497) (0.799) (1.336) (1.274)ULS, leading term for D4 11.201** 11.316** -31.664** -30.615** -31.126** -29.527**

(1.472) (1.454) (2.452) (2.442) (2.497) (2.446)MLE, leading term for D4 1.532 2.260* -39.857** -39.325** -39.215** -38.619**

(0.867) (0.850) (2.188) (2.190) (2.262) (2.208)AEL, leading term for D4 2.743** 2.756** -34.639** -31.471** -33.517** -33.897**

(0.671) (0.738) (2.020) (2.005) (2.110) (2.049)EL, leading term for D4 1.988 2.454* -38.458** -37.311** -37.281** -36.032**

(0.859) (0.901) (2.203) (2.178) (2.231) (2.205)ELprofile, leading term for D4 1.562 1.891 -45.781** -41.664** -41.472** -42.387**

(1.082) (1.088) (2.351) (2.343) (2.381) (2.382)

46

Left limit, θ1 θ2 θ3 θ4 θ5 θ6nominal 95% b/se b/se b/se b/se b/se b/seULS, n−1/2 term for D1 -110.327* -73.107 -97.639 -73.238 -48.888 -82.710

(41.869) (39.717) (40.608) (41.518) (39.796) (40.366)MLE, n−1/2 term for D1 -16.390 2.606 -16.475 13.240 44.172 1.969

(23.336) (17.913) (21.067) (22.246) (17.832) (19.286)AEL, n−1/2 term for D1 17.217 4.561 9.706 25.877 31.166 0.675

(18.065) (14.635) (7.258) (20.261) (17.878) (17.812)EL, n−1/2 term for D1 -21.217 -20.977 -61.998* 11.785 6.009 -13.630

(24.572) (20.443) (23.966) (23.364) (24.895) (20.823)ULS, n−1/2 term for D2 -145.716** -165.322** -159.156** -179.593** -179.593** -181.124**

(16.726) (16.607) (16.986) (14.791) (14.791) (14.895)MLE, n−1/2 term for D2 31.252** 11.327 30.167* 3.743 5.721 2.075

(8.963) (9.413) (9.500) (2.498) (2.823) (3.823)AEL, n−1/2 term for D2 39.206** 24.111* 29.821** 0.246 -0.783 0.082

(7.483) (7.861) (7.333) (5.679) (5.766) (5.754)EL, n−1/2 term for D2 19.635 5.302 6.750 -2.006 -2.701 0.062

(10.425) (10.392) (10.132) (6.349) (6.266) (6.199)ELprofile, n−1/2 term for D2 -11.790 -39.757 -22.276 -25.345 -3.315 8.759

(20.242) (20.573) (20.938) (20.528) (17.139) (19.197)ULS, n−1/2 term for D3 -165.151** -169.420** -177.228** -203.981** -173.100** -164.036**

(33.717) (34.871) (35.581) (30.696) (33.786) (32.979)MLE, n−1/2 term for D3 22.839 29.763 18.232 -7.271 16.907 21.769

(17.508) (19.861) (21.434) (9.327) (18.980) (17.966)AEL, n−1/2 term for D3 22.548 29.038 31.364 -6.777 20.884 24.781

(14.377) (18.219) (17.732) (6.746) (15.357) (17.112)EL, n−1/2 term for D3 11.271 15.993 1.736 -24.276 4.767 23.005

(19.905) (23.223) (22.633) (13.852) (20.239) (18.165)ULS, n−1/2 term for D4 -235.580** -242.845** 204.209** 193.581** 173.279** 176.921**

(26.947) (26.774) (36.628) (36.517) (37.652) (36.722)MLE, n−1/2 term for D4 17.469 -0.459 431.852** 426.100** 402.198** 410.946**

(12.622) (12.977) (29.043) (29.172) (31.089) (29.770)AEL, n−1/2 term for D4 11.938 5.756 425.105** 380.290** 379.210** 398.598**

(9.620) (11.040) (25.370) (25.740) (28.069) (26.369)EL, n−1/2 term for D4 6.998 -9.020 415.462** 402.620** 383.005** 369.691**

(12.887) (14.121) (29.662) (29.196) (30.620) (30.123)ELprofile, n−1/2 term for D4 -10.618 -24.998 439.117** 381.701** 359.986** 375.613**


47

Right limit, λ2 λ3 λ4 λ5 λ6 φ11

nominal 95% b/se b/se b/se b/se b/se b/seULS, leading term for D1 1.241 -1.352 -2.322 -0.154 -1.911 1.458

(1.571) (1.787) (1.878) (1.892) (1.760) (2.031)MLE, leading term for D1 1.177 -1.422 -3.461 0.671 -0.750 1.315

(1.702) (1.840) (1.987) (1.995) (1.859) (2.165)AEL, leading term for D1 1.703 -1.854 -0.702 2.200 -1.462 2.086

(1.749) (1.921) (2.049) (2.082) (2.000) (2.383)EL, leading term for D1 1.602 -1.497 -0.691 2.491 -0.870 2.555

(1.753) (1.972) (2.050) (2.098) (2.036) (2.245)ULS, leading term for D2 -1.074 -0.285 -0.289 -0.457 0.868 -0.766

(0.858) (0.878) (0.910) (0.857) (0.868) (0.900)MLE, leading term for D2 -0.123 0.425 0.724 -0.525 1.344 0.303

(0.881) (0.906) (0.940) (0.893) (0.897) (0.928)AEL, leading term for D2 0.877 0.589 -1.036 -0.355 1.497 2.080

(0.915) (0.955) (1.060) (1.002) (0.991) (1.042)EL, leading term for D2 1.069 1.113 -0.568 -0.367 1.688 1.560

(0.937) (0.978) (1.079) (1.015) (1.008) (0.970)ELprofile, leading term for D2 -2.241 -0.554 -0.383 -2.504 -3.925 -5.861

(1.927) (1.972) (2.282) (2.233) (2.157) (2.276)ULS, leading term for D3 1.405 -0.689 2.716 -1.290 0.821 -2.621

(1.566) (1.797) (1.810) (1.815) (1.746) (2.075)MLE, leading term for D3 0.976 -0.510 2.291 -0.773 1.338 -2.815

(1.573) (1.804) (1.821) (1.891) (1.829) (2.078)AEL, leading term for D3 1.544 0.430 3.507 1.219 2.191 -0.578


(1.794) (1.942) (2.016) (2.012) (1.906) (2.174)ULS, leading term for D4 0.681 -8.084** -7.724** -6.568** -3.964* 2.257

(1.263) (1.543) (1.576) (1.497) (1.519) (1.159)MLE, leading term for D4 1.897 -7.544** -6.405** -6.458** -2.498 3.439*

(1.290) (1.598) (1.596) (1.511) (1.546) (1.235)AEL, leading term for D4 2.313 -7.828** -5.382* -5.216** -2.865 5.493**

(1.306) (1.604) (1.672) (1.518) (1.532) (1.374)EL, leading term for D4 2.760 -6.963** -4.695* -5.192** -2.841 5.223**

(1.336) (1.658) (1.701) (1.546) (1.576) (1.290)ELprofile, leading term for D4 -2.325 -12.326** -10.877** -12.368** -8.219** -2.244

(1.856) (2.074) (2.082) (2.062) (2.036) (1.991)

48

Right limit, λ2 λ3 λ4 λ5 λ6 φ11

nominal 95% b/se b/se b/se b/se b/se b/seULS, n−1/2 term for D1 -26.301 10.881 24.096 -29.928 20.112 -38.296

(24.954) (26.798) (27.790) (29.982) (25.844) (32.988)MLE, n−1/2 term for D1 -25.241 9.238 34.486 -53.349 -4.660 -48.560

(27.126) (27.769) (29.186) (32.425) (28.668) (35.315)AEL, n−1/2 term for D1 -45.148 12.157 -33.492 -79.996 -15.341 -90.730

(28.561) (29.015) (32.556) (34.738) (31.127) (39.612)EL, n−1/2 term for D1 -46.292 0.664 -35.084 -86.194 -29.329 -80.244

(28.615) (30.294) (32.594) (35.169) (32.200) (37.404)ULS, n−1/2 term for D2 -1.067 -13.066 -19.462 -11.952 -37.158* -15.063

(13.003) (13.581) (14.180) (13.182) (13.801) (13.908)MLE, n−1/2 term for D2 -22.160 -30.793 -43.796* -17.859 -52.762** -37.800

(13.732) (14.309) (15.013) (13.840) (14.460) (14.703)AEL, n−1/2 term for D2 -49.653** -48.622* -52.211* -51.775* -84.439** -103.476**

(14.663) (15.275) (16.816) (15.949) (16.213) (17.196)EL, n−1/2 term for D2 -59.237** -63.596** -66.560** -55.518** -93.164** -72.839**


(31.183) (32.350) (37.521) (36.390) (34.903) (36.434)ULS, n−1/2 term for D3 -12.239 6.054 -46.481 13.370 -11.161 25.316

(24.172) (27.032) (29.244) (27.016) (26.978) (30.972)MLE, n−1/2 term for D3 -16.047 -7.548 -51.535 -11.636 -36.163 16.539

(24.238) (27.478) (29.394) (28.942) (28.976) (31.117)AEL, n−1/2 term for D3 -39.478 -45.319 -89.000* -59.247 -63.012 -49.304

(27.727) (30.650) (32.740) (30.981) (30.663) (36.661)EL, n−1/2 term for D3 -55.835 -48.458 -97.426* -68.767 -68.464 -45.440

(29.054) (30.734) (33.524) (32.460) (31.057) (34.470)ULS, n−1/2 term for D4 -23.547 77.241** 66.559* 48.797 16.813 -43.425

(20.023) (22.127) (22.940) (21.850) (23.090) (18.917)MLE, n−1/2 term for D4 -47.670 58.222 40.730 42.797 -15.004 -71.176**

(21.010) (23.473) (23.818) (22.209) (24.130) (20.731)AEL, n−1/2 term for D4 -62.859* 53.324 4.836 18.241 -13.752 -139.532**

(21.527) (23.587) (25.676) (22.821) (23.822) (23.942)EL, n−1/2 term for D4 -74.385** 28.984 -11.813 12.469 -25.501 -118.378**

(22.215) (24.955) (26.423) (23.399) (24.688) (22.461)ELprofile, n−1/2 term for D4 -109.657** -5.340 -28.536 -19.028 -73.965 -168.465**


49

Right limit, θ1 θ2 θ3 θ4 θ5 θ6nominal 95% b/se b/se b/se b/se b/se b/seULS, leading term for D1 -2.886 -2.070 -1.662 2.401 -1.260 0.682

(2.168) (2.329) (2.153) (2.101) (2.451) (2.131)MLE, leading term for D1 -1.186 -2.282 -1.246 3.390 -1.802 3.682

(2.236) (2.322) (2.155) (2.250) (2.493) (2.224)AEL, leading term for D1 2.557 0.965 3.232 6.491 5.812 10.436**

(2.614) (2.815) (2.477) (2.610) (2.959) (2.724)EL, leading term for D1 2.220 1.004 3.684 6.266 4.895 9.267**

(2.468) (2.614) (2.255) (2.477) (2.830) (2.506)ULS, leading term for D2 -0.087 0.297 1.335 -12.161** -9.814** -12.479**

(1.012) (0.973) (0.927) (1.590) (1.559) (1.585)MLE, leading term for D2 -0.036 1.566 2.173 -9.295** -7.298** -9.456**

(1.054) (1.000) (0.968) (1.612) (1.578) (1.603)AEL, leading term for D2 4.265** 6.227** 5.405** -10.675** -8.031** -7.415**

(1.280) (1.225) (1.228) (1.735) (1.718) (1.709)EL, leading term for D2 2.963 4.014** 4.749** -10.714** -7.885** -8.235**

(1.191) (1.128) (1.134) (1.731) (1.713) (1.710)ELprofile, leading term for D2 4.283 7.135** 8.190** -6.344 -1.863 -1.562

(1.870) (1.702) (1.766) (2.669) (2.632) (2.614)ULS, leading term for D3 2.842 0.180 -0.668 0.303 -0.247 0.412

(2.049) (2.090) (2.011) (2.430) (2.463) (2.435)MLE, leading term for D3 1.207 0.586 0.422 -0.871 0.273 -0.501

(2.053) (2.088) (2.065) (2.459) (2.545) (2.474)AEL, leading term for D3 7.984* 5.285 3.401 1.964 8.754* 8.105*

(2.649) (2.643) (2.550) (2.950) (2.982) (3.029)EL, leading term for D3 8.369** 3.977 3.461 2.641 7.622* 7.494

(2.412) (2.453) (2.340) (2.823) (2.895) (2.918)ULS, leading term for D4 -2.049 -6.400** 7.232** 7.044** 6.443** 6.896**

(1.558) (1.553) (0.666) (0.630) (0.643) (0.537)MLE, leading term for D4 -1.416 -4.438* 7.626** 7.281** 6.326** 6.894**

(1.611) (1.598) (0.727) (0.650) (0.633) (0.537)AEL, leading term for D4 2.133 0.851 13.065** 11.781** 11.675** 12.180**

(1.901) (1.932) (1.091) (0.988) (1.028) (1.046)EL, leading term for D4 1.906 0.723 10.927** 10.445** 9.741** 10.303**

(1.760) (1.800) (0.968) (0.890) (0.903) (0.889)ELprofile, leading term for D4 3.784 2.538 12.703** 11.446** 10.591** 11.171**

(1.817) (1.841) (1.045) (0.943) (0.963) (0.937)

50

Right limit, θ1 θ2 θ3 θ4 θ5 θ6nominal 95% b/se b/se b/se b/se b/se b/seULS, n−1/2 term for D1 -7.265 -22.133 -33.083 -88.923 -50.925 -54.936

(33.472) (36.738) (34.034) (35.244) (39.401) (34.685)MLE, n−1/2 term for D1 -42.341 -22.847 -41.340 -123.472* -47.598 -118.474*

(35.722) (36.570) (34.318) (38.312) (39.932) (37.916)AEL, n−1/2 term for D1 -174.233** -161.983** -176.964** -241.972** -272.831** -330.926**

(44.241) (46.840) (42.380) (45.525) (50.477) (48.387)EL, n−1/2 term for D1 -139.221** -114.938* -145.154** -207.402** -218.434** -263.957**

(41.585) (43.286) (38.774) (43.279) (48.180) (44.796)ULS, n−1/2 term for D2 -53.624** -51.441** -64.679** -166.689** -193.378** -161.234**

(16.156) (15.546) (15.045) (24.937) (24.633) (24.846)MLE, n−1/2 term for D2 -65.399** -76.859** -90.113** -242.233** -256.210** -230.362**

(16.915) (16.306) (15.977) (25.504) (25.122) (25.343)AEL, n−1/2 term for D2 -232.752** -248.673** -242.042** -454.669** -483.383** -511.645**

(21.479) (20.915) (20.864) (26.697) (26.521) (26.375)EL, n−1/2 term for D2 -171.152** -166.090** -186.463** -425.972** -463.020** -476.136**


(31.792) (29.875) (31.220) (42.085) (41.657) (41.477)ULS, n−1/2 term for D3 -78.282 -35.333 -26.424 -84.945 -85.595 -111.700*

(33.509) (32.887) (31.239) (38.980) (39.396) (39.378)MLE, n−1/2 term for D3 -65.004 -51.408 -59.484 -96.955 -129.682* -127.918*

(32.989) (33.203) (32.914) (39.348) (41.229) (39.998)AEL, n−1/2 term for D3 -275.489** -221.239** -180.139** -277.185** -403.373** -423.857**

(45.076) (44.237) (42.280) (48.456) (50.126) (50.650)EL, n−1/2 term for D3 -230.518** -162.060** -145.405** -246.449** -348.222** -367.576**

(41.589) (40.794) (38.869) (46.584) (48.715) (49.033)ULS, n−1/2 term for D4 -46.863 34.674 -64.706** -63.416** -52.959** -50.827**

(24.718) (23.089) (13.287) (12.723) (12.446) (11.171)MLE, n−1/2 term for D4 -65.015 0.094 -74.745** -67.900** -51.037** -50.187**

(25.843) (24.570) (14.423) (13.172) (12.215) (11.154)AEL, n−1/2 term for D4 -216.457** -187.651** -217.169** -185.351** -186.000** -190.698**

(31.734) (31.913) (22.261) (20.414) (20.887) (21.256)EL, n−1/2 term for D4 -164.325** -141.934** -160.581** -149.686** -137.526** -138.384**



51

Two-sided, λ2 λ3 λ4 λ5 λ6 φ11

nominal 90% b/se b/se b/se b/se b/se b/seULS, leading term for D1 -1.971 -3.038 -3.871 -2.696 -1.806 -0.840

(1.735) (1.726) (1.816) (1.765) (1.741) (1.782)MLE, leading term for D1 0.393 -1.245 -2.449 0.276 1.283 1.323

(1.408) (1.392) (1.537) (1.452) (1.376) (1.493)AEL, leading term for D1 0.695 -1.253 -0.618 0.872 0.405 1.182

(1.405) (1.437) (1.557) (1.477) (1.460) (1.569)EL, leading term for D1 0.590 -1.225 -0.232 1.190 0.666 1.693

(1.415) (1.455) (1.572) (1.489) (1.474) (1.541)ULS, leading term for D2 1.990* 2.619** 2.897** 2.036* 2.826** 2.501**

(0.724) (0.718) (0.725) (0.724) (0.718) (0.730)MLE, leading term for D2 0.060 0.960 0.849 -0.248 0.372 0.512

(0.663) (0.651) (0.670) (0.666) (0.667) (0.667)AEL, leading term for D2 0.089 0.527 -0.866 -1.572 -0.477 0.770

(0.694) (0.690) (0.741) (0.739) (0.723) (0.703)EL, leading term for D2 0.054 0.829 -0.625 -1.749 -0.500 0.903

(0.708) (0.701) (0.752) (0.749) (0.735) (0.701)ELprofile, leading term for D2 -9.644** -9.565** -14.080** -16.043** -15.761** -18.919**

(1.506) (1.499) (1.619) (1.630) (1.615) (1.645)ULS, leading term for D3 -6.545** -8.353** -6.040** -9.406** -6.604** -9.671**

(1.645) (1.701) (1.659) (1.693) (1.634) (1.749)MLE, leading term for D3 2.190 0.233 2.506 -0.367 2.059 -1.445

(1.310) (1.393) (1.346) (1.386) (1.308) (1.466)AEL, leading term for D3 1.518 -0.718 1.523 0.106 2.662 -0.727

(1.404) (1.483) (1.451) (1.431) (1.383) (1.542)EL, leading term for D3 2.180 -0.838 1.498 0.085 2.756 -0.364

(1.426) (1.503) (1.467) (1.468) (1.404) (1.513)ULS, leading term for D4 2.825 -0.324 -0.133 0.275 1.770 4.078**

(1.097) (1.170) (1.177) (1.144) (1.145) (1.075)MLE, leading term for D4 0.959 -3.009* -2.277 -2.084 -0.478 2.011

(0.950) (1.061) (1.047) (1.005) (1.018) (0.942)AEL, leading term for D4 0.674 -3.493* -2.488 -1.943 -0.559 2.709*

(0.981) (1.085) (1.085) (1.029) (1.014) (0.953)EL, leading term for D4 1.041 -3.299* -2.242 -1.937 -0.880 2.450

(1.002) (1.105) (1.106) (1.045) (1.043) (0.966)ELprofile, leading term for D4 -10.707** -13.771** -12.154** -13.651** -12.903** -18.032**

(1.413) (1.439) (1.441) (1.429) (1.412) (1.509)

52

Two-sided, λ2 λ3 λ4 λ5 λ6 φ11

nominal 90% b/se b/se b/se b/se b/se b/seULS, n−1 term for D1 -0.668 -0.273 -0.378 -0.798 -0.566 -0.977*

(0.357) (0.342) (0.365) (0.363) (0.357) (0.378)MLE, 1000 · n−1 term for D1 0.123 0.552 0.515 -0.294 0.052 -0.280

(0.261) (0.227) (0.269) (0.286) (0.260) (0.303)AEL, 1000 · n−1 term for D1 -0.131 0.413 -0.384 -0.516 -0.183 -0.569

(0.270) (0.248) (0.311) (0.302) (0.286) (0.328)EL, 1000 · n−1 term for D1 -0.187 0.325 -0.597 -0.672 -0.346 -0.650

(0.274) (0.257) (0.322) (0.311) (0.296) (0.326)ULS, 1000 · n−1 term for D2 -1.241** -1.414** -1.514** -1.304** -1.490** -1.514**

(0.155) (0.157) (0.159) (0.156) (0.158) (0.160)MLE, 1000 · n−1 term for D2 -0.035 -0.191 -0.281 0.017 -0.202 -0.235

(0.124) (0.125) (0.131) (0.123) (0.128) (0.129)AEL, 1000 · n−1 term for D2 -0.392* -0.457** -0.577** -0.406* -0.623** -0.716**

(0.137) (0.137) (0.148) (0.145) (0.145) (0.144)EL, 1000 · n−1 term for D2 -0.559** -0.650** -0.796** -0.485** -0.770** -0.743**

(0.142) (0.143) (0.153) (0.147) (0.149) (0.144)ELprofile, 1000 · n−1 term for D2 -1.996** -2.135** -3.183** -2.844** -2.597** -1.886**

(0.306) (0.305) (0.318) (0.319) (0.318) (0.321)ULS, 1000 · n−1 term for D3 -0.996* -0.818 -1.196** -0.526 -1.050* -0.566

(0.322) (0.326) (0.331) (0.314) (0.320) (0.328)MLE, 1000 · n−1 term for D3 -0.155 0.057 -0.329 0.143 -0.295 0.340

(0.254) (0.259) (0.270) (0.251) (0.256) (0.263)AEL, 1000 · n−1 term for D3 -0.340 -0.233 -0.510 -0.321 -0.806* -0.163

(0.279) (0.286) (0.295) (0.278) (0.291) (0.300)EL, 1000 · n−1 term for D3 -0.610 -0.375 -0.572 -0.434 -0.912* -0.128

(0.294) (0.294) (0.300) (0.290) (0.298) (0.293)ULS, 1000 · n−1 term for D4 -1.833** -1.480** -1.541** -1.603** -1.893** -2.144**

(0.245) (0.250) (0.252) (0.247) (0.253) (0.247)MLE, 1000 · n−1 term for D4 -0.083 0.326 0.274 0.280 -0.122 -0.449

(0.183) (0.192) (0.192) (0.179) (0.197) (0.192)AEL, 1000 · n−1 term for D4 -0.340 0.201 -0.035 -0.000 -0.205 -0.902**

(0.195) (0.200) (0.207) (0.193) (0.197) (0.204)EL, 1000 · n−1 term for D4 -0.588* -0.015 -0.271 -0.113 -0.360 -0.884**



53

Two-sided, θ1 θ2 θ3 θ4 θ5 θ6noiminal 90% b/se b/se b/se b/se b/se b/seULS, leading term for D1 -3.590 -2.716 -3.059 -2.306 -3.989 -1.468

(1.843) (1.840) (1.812) (1.822) (1.900) (1.790)MLE, leading term for D1 -0.743 -0.070 -0.372 0.380 -2.047 2.820

(1.553) (1.514) (1.490) (1.553) (1.636) (1.454)AEL, leading term for D1 -0.719 -0.277 0.694 0.186 0.053 3.805

(1.664) (1.696) (1.519) (1.694) (1.812) (1.648)EL, leading term for D1 -0.369 1.431 2.977 0.283 -0.043 4.329*

(1.649) (1.612) (1.482) (1.665) (1.796) (1.563)ULS, leading term for D2 1.246 2.491** 2.884** -10.516** -9.050** -10.799**

(0.768) (0.740) (0.734) (0.982) (0.967) (0.982)MLE, leading term for D2 -1.158 0.782 0.524 -12.014** -10.606** -11.957**

(0.723) (0.688) (0.687) (0.971) (0.955) (0.968)AEL, leading term for D2 -0.730 1.005 0.454 -20.600** -18.598** -19.001**

(0.804) (0.771) (0.774) (1.060) (1.050) (1.049)EL, leading term for D2 -1.005 0.329 0.732 -20.034** -17.886** -19.122**

(0.789) (0.754) (0.752) (1.057) (1.046) (1.049)ELprofile, leading term for D2 -1.481 1.700 0.833 -19.357** -16.699** -18.441**

(1.276) (1.180) (1.227) (1.657) (1.634) (1.643)ULS, leading term for D3 -6.613** -8.820** -9.679** -7.026** -9.627** -9.950**

(1.699) (1.752) (1.766) (1.746) (1.843) (1.839)MLE, leading term for D3 1.232 -0.021 -0.547 0.196 -1.463 -2.331

(1.402) (1.461) (1.488) (1.503) (1.645) (1.624)AEL, leading term for D3 2.371 0.479 -0.748 -2.318 -0.428 -1.965

(1.631) (1.671) (1.647) (1.771) (1.818) (1.876)EL, leading term for D3 3.413 -0.112 0.511 -0.834 -0.098 -1.343

(1.549) (1.646) (1.582) (1.717) (1.794) (1.825)ULS, leading term for D4 1.371 -0.771 -18.650** -18.146** -19.381** -17.247**

(1.162) (1.179) (1.478) (1.467) (1.495) (1.461)MLE, leading term for D4 -1.132 -2.032 -20.515** -20.349** -21.256** -19.883**

(1.059) (1.057) (1.389) (1.381) (1.407) (1.376)AEL, leading term for D4 -0.927 -1.293 -14.098** -12.763** -14.801** -14.301**

(1.159) (1.181) (1.353) (1.318) (1.373) (1.355)EL, leading term for D4 -0.541 -0.917 -18.582** -18.241** -19.021** -17.679**

(1.118) (1.145) (1.417) (1.397) (1.418) (1.405)ELprofile, leading term for D4 -0.785 -1.246 -24.518** -22.874** -23.754** -23.340**

(1.180) (1.193) (1.506) (1.484) (1.500) (1.500)

54

Two-sided, θ1 θ2 θ3 θ4 θ5 θ6noiminal 90% b/se b/se b/se b/se b/se b/seULS, n−1 term for D1 -0.949 -0.874 -1.015* -1.274** -0.915 -1.130*

(0.381) (0.383) (0.377) (0.385) (0.393) (0.379)MLE, 1000 · n−1 term for D1 -0.395 -0.171 -0.341 -0.782 -0.085 -0.878*

(0.310) (0.297) (0.294) (0.325) (0.318) (0.316)AEL, 1000 · n−1 term for D1 -1.126* -1.211** -1.099** -1.584** -1.915** -2.483**

(0.353) (0.364) (0.326) (0.371) (0.398) (0.386)EL, 1000 · n−1 term for D1 -1.149* -1.092* -1.459** -1.425** -1.711** -2.083**

(0.352) (0.350) (0.336) (0.363) (0.393) (0.368)ULS, 1000 · n−1 term for D2 -1.555** -1.667** -1.719** -2.563** -2.738** -2.518**

(0.166) (0.163) (0.163) (0.196) (0.195) (0.196)MLE, 1000 · n−1 term for D2 -0.263 -0.496** -0.441* -1.703** -1.777** -1.625**

(0.140) (0.138) (0.136) (0.192) (0.190) (0.191)AEL, 1000 · n−1 term for D2 -1.493** -1.699** -1.591** -3.277** -3.509** -3.659**

(0.169) (0.167) (0.166) (0.199) (0.198) (0.197)EL, 1000 · n−1 term for D2 -1.171** -1.212** -1.349** -3.081** -3.378** -3.388**


(0.272) (0.263) (0.271) (0.315) (0.314) (0.314)ULS, 1000 · n−1 term for D3 -1.319** -1.016* -0.954* -1.631** -1.440** -1.502**

(0.341) (0.341) (0.341) (0.356) (0.367) (0.367)MLE, 1000 · n−1 term for D3 -0.308 -0.173 -0.272 -0.714 -0.841 -0.710

(0.276) (0.282) (0.289) (0.305) (0.335) (0.324)AEL, 1000 · n−1 term for D3 -1.924** -1.490** -1.107* -2.000** -2.840** -2.944**

(0.359) (0.356) (0.342) (0.374) (0.393) (0.400)EL, 1000 · n−1 term for D3 -1.689** -1.134** -1.074* -1.934** -2.577** -2.545**

(0.344) (0.344) (0.331) (0.367) (0.388) (0.390)ULS, 1000 · n−1 term for D4 -2.209** -1.639** 0.637 0.588 0.476 0.539

(0.257) (0.252) (0.278) (0.276) (0.283) (0.276)MLE, 1000 · n−1 term for D4 -0.386 -0.039 2.353** 2.374** 2.288** 2.376**

(0.210) (0.201) (0.228) (0.224) (0.235) (0.224)AEL, 1000 · n−1 term for D4 -1.560** -1.449** 1.217** 1.160** 1.109** 1.231**

(0.247) (0.250) (0.242) (0.234) (0.248) (0.242)EL, 1000 · n−1 term for D4 -1.207** -1.196** 1.543** 1.590** 1.509** 1.416**

(0.236) (0.241) (0.252) (0.244) (0.251) (0.250)ELprofile, 1000 · n−1 term for D4 -1.721** -1.639** 1.415** 1.269** 1.193** 1.284**


55

Table 8: Simulation results for the bootstrap approximation of Bartlett correction.

Sample sizes 100 200 500 1000Number of Monte Carlo samples 400 600 600 300EL 0.275 0.117 0.051 0.065

0.000 0.000 0.079 0.139FIML 0.428 0.460 0.522 0.476

0.000 0.000 0.000 0.000Bartlett corrected, B = 200 0.062 0.044 0.044 0.047

0.078 0.178 0.179 0.496Bartlett corrected, B = 50 0.067 0.074 0.038 0.036

0.043 0.126 0.322 0.805Direct bootstrap 0.085 0.024 0.033 0.053

0.004 0.860 0.496 0.331Reported entries are Kolmogorov-Smirnov distances D and p-values ofthe test against the asymptotic χ2

9 distribution, except the direct boot-strap estimate which is referred to the uniform distribution.

56

empirical likelihood estimation and testing in covariance structure...

Documents