
Information-Theoretic Distribution Tests with

Application to Symmetry and Normality

Thanasis Stengos and Ximing Wu

March 4, 2004

Abstract

We derive distribution free tests based on the Maximum Entropy densities to test

the null hypotheses of symmetry and normality. The proposed tests are derived from

maximizing the differential entropy subject to moment constraints. By exploiting the

equivalence between Maximum Entropy and Maximum Likelihood estimates of the

exponential families, we can use the conventional Likelihood Ratio, Wald and Lagrange

Multiplier testing principles in the maximum entropy framework. Monte Carlo evidence

suggests that they have desirable small sample properties, when compared with the

standard parametric tests used in the literature, such as the standardized skewness

coefficient test for symmetry or the Jarque and Bera (1980) test for normality. We

apply the proposed symmetry tests to test the nominal wage rigidity hypothesis of

wage determination process.

Department of Economics, University of Guelph. Email: [email protected]. We want to thank seminar participants at Penn State University for comments. Financial support from SSHRC of Canada is gratefully acknowledged.

Corresponding author. Department of Economics, University of Guelph, Ontario, Canada, N1G 2W1. Email: [email protected]; Tel: (519) 824-4120, ext. 53014; Fax: (519) 763-8497.

JEL code: C1, C12, C16; Key words: distribution test, maximum entropy, symmetry, normality.



1 Introduction

There are many parametric and nonparametric tests proposed in the literature to test the

hypothesis that a distribution is symmetric about a known median. For example, among

the most widely used parametric tests there is the standardized skewness coefficient test of

Gupta (1967) and in the category of nonparametric tests those proposed by Fan and Gencay

(1995) and Ahmad and Li (1997). The latter tests are distribution free based on local kernel

estimation. They offer an advantage over the more traditional parametric tests such as the

standardized skewness coefficient test in that they are consistent, since they are based on

the whole distribution and not simply a part of it. However, as they are based on local

smoothing methods these tests depend on the choice of bandwidth that is used in density

estimation, something that may affect their power and size in finite samples.

In this paper we introduce two alternative distribution free tests based on the maximum

entropy (ME) densities. Unlike the above mentioned kernel based nonparametric tests, the

proposed tests do not depend on bandwidth selection. Our proposed tests differ from those

introduced by Imbens, Spady and Johnson (1998) that minimize the discrete Kullback-

Leibler information criterion (cross entropy) or other Cressie-Read family statistics subject

to moment constraints. They are derived from maximizing differential entropy subject to

moment constraints. By exploiting the equivalence between ME and ML estimates for the

exponential families, we can use the conventional likelihood ratio (LR), Wald and Lagrange

Multiplier (LM) testing principles in the maximum entropy framework. Hence, our tests

share the optimality properties of the standard maximum likelihood based tests. We show

that the ME approach leads to simple yet powerful tests of symmetry which have desirable

small sample properties. One of the tests is asymptotically equivalent to the conventional

skewness test but is always more powerful. We also derive a normality test with similar

properties that compares favorably with the existing JB test proposed in Jarque and Bera

(1980) and Bera and Jarque (1981).

The paper is organized as follows. In the next section we present the information the-

oretic framework on which we base our analysis. We then proceed to derive our symmetry

and normality tests and discuss their properties. In the following section we present some

simulation results and then we present the results of an empirical application using a unique

set of Canadian wage-change contract data. Finally, before we conclude, we discuss some possible


extensions.

2 Information-theoretic distribution test

The ME principle states that among all the distributions that satisfy certain moment con-

straints, we should choose the one that maximizes Shannon's information entropy. According

to Jaynes (1957), the ME distribution is uniquely determined as the one which is maximally

noncommittal with regard to missing information, and that it agrees with what is known,

but expresses maximum uncertainty with respect to all other matters.

The ME (maxent) density is obtained by maximizing the entropy subject to some moment

constraints. Let $X$ be a random variable distributed with a probability density function (pdf)

$f(\theta_0)$, and $X_1, X_2, \ldots, X_n$ be an i.i.d. random sample of size $n$ generated according to $f(\theta_0)$.

We also let $\hat\theta$ be the estimate of $\theta_0$ based on a particular sample realization. We maximize the entropy

\[ W = -\int f(x,\theta) \log f(x,\theta)\, dx, \]

subject to

\[ \int f(x,\theta)\, dx = 1, \qquad \int g_k(x) f(x,\theta)\, dx = \hat\mu_k, \quad k = 1, 2, \ldots, K, \]

where $\hat\mu_k = \frac{1}{n}\sum_{i=1}^{n} g_k(x_i)$, and $g_k(x)$ is continuous and at least twice differentiable. The solution takes the form

\[ f(x,\hat\theta) = \exp\Big( -\hat\theta_0 - \sum_{k=1}^{K} \hat\theta_k g_k(x) \Big). \tag{1} \]

To ensure $f(x,\hat\theta)$ integrates to one, we set

\[ \hat\theta_0 = \log \int \exp\Big( -\sum_{k=1}^{K} \hat\theta_k g_k(x) \Big)\, dx. \]

The maximized entropy is $W = \hat\theta_0 + \sum_{k=1}^{K} \hat\theta_k \hat\mu_k$. The maxent density is of the generalized exponential family and can be completely characterized by the moments $E g_k(x)$, $k = 1, 2, \ldots, K$.


We call these moments characterizing moments, which are the sufficient statistics of the

maxent density. A wide range of distributions belong to this family. For example, the Pear-

son family and its extensions described in Cobb et al. (1983), which nest the normal, beta,

gamma and inverse gamma densities as special cases, are all maxent densities.

In general, there is no analytical solution for the maxent density, and nonlinear optimiza-

tion is required (Zellner and Highfield (1988), Ormoneit and White (1999) and Wu (2003)).

We use Lagrange's method to solve this problem by iteratively updating

\[ \theta^{t+1} = \theta^{t} + H^{-1} b, \]

where for the $(t+1)$th stage of the updating, $b_k = \int g_k(x) f(x,\theta^{t})\, dx - \hat\mu_k$ and the Hessian

matrix $H$ takes the form

\[ H_{k,j} = \int g_k(x)\, g_j(x)\, f(x,\theta^{t})\, dx. \]

The positive-definiteness of the Hessian ensures the existence and uniqueness of the solu-

tion.1
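The updating scheme above can be sketched numerically. The following is a minimal illustration, not the authors' implementation: it assumes integrals are approximated by the trapezoid rule on a finite grid, treats $\theta_0$ as a free parameter paired with $g_0(x) = 1$ (as in footnote 1), and uses the sign convention $f = \exp(-\sum_k \theta_k g_k)$, under which the Newton step is $\theta + H^{-1} b$. The function name `fit_maxent`, the grid, and the starting values are hypothetical choices made for the example.

```python
import numpy as np

def fit_maxent(mu, basis, grid, theta_init, n_iter=100, tol=1e-12):
    """Newton iterations for f(x) = exp(-sum_k theta_k g_k(x)), with g_0 = 1.

    mu         : target moments of g_1, ..., g_K
    basis      : list of callables g_1, ..., g_K
    grid       : equally spaced integration grid (assumed wide enough)
    theta_init : starting values for (theta_0, ..., theta_K)
    """
    dx = grid[1] - grid[0]
    w = np.full(grid.shape, dx)          # trapezoid weights
    w[0] = w[-1] = 0.5 * dx
    G = np.vstack([np.ones_like(grid)] + [g(grid) for g in basis])  # (K+1, N)
    target = np.concatenate(([1.0], np.asarray(mu, dtype=float)))   # mu_0 = 1
    theta = np.asarray(theta_init, dtype=float).copy()
    for _ in range(n_iter):
        f = np.exp(-theta @ G)           # current (unnormalized) density
        b = (G * f) @ w - target         # b_k = int g_k f dx - mu_k
        H = (G * (f * w)) @ G.T          # H_kj = int g_k g_j f dx
        step = np.linalg.solve(H, b)
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    return theta

# Example: constraints E[x] = 0.2 and E[x^2] = 1.24 give a N(0.2, 1.2) density.
grid = np.linspace(-10.0, 10.0, 4001)
basis = [lambda x: x, lambda x: x**2]
start = np.array([0.5 * np.log(2 * np.pi), 0.0, 0.5])   # standard normal
theta = fit_maxent([0.2, 1.24], basis, grid, start)
```

With only the first two moments imposed, the solution is Gaussian, so the fitted multipliers can be checked against the closed form $\theta_1 = -m/s^2$ and $\theta_2 = 1/(2s^2)$. For higher-order or trigonometric moments no closed form is available and some step-size control may be needed; that refinement is omitted here.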

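Given Eq. (1), we can also estimate $f(x,\theta)$ using MLE. The maximized log-likelihood

\[ l = \sum_{i=1}^{n} \log f(x_i,\hat\theta) = -\sum_{i=1}^{n} \Big( \hat\theta_0 + \sum_{k=1}^{K} \hat\theta_k g_k(x_i) \Big) = -n \Big( \hat\theta_0 + \sum_{k=1}^{K} \hat\theta_k \hat\mu_k \Big) = -nW. \]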

Therefore, when the distribution is of the generalized exponential family, MLE and ME are

equivalent. Moreover, they are also equivalent to a method of moments (MM) estimator.

This ME/MLE/MM estimator only uses the sample characterizing moments.
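The identity $l = -nW$ can be checked on a case with a closed-form solution. The sketch below assumes the two characterizing moments $E x$ and $E x^2$, for which the maxent density is the normal density with the sample mean and the (biased) sample variance; the sample itself is simulated and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=0.3, scale=1.5, size=200)   # illustrative sample
n = x.size

# Sample characterizing moments for g_1(x) = x and g_2(x) = x^2.
mu1, mu2 = x.mean(), np.mean(x**2)
s2 = mu2 - mu1**2

# Lagrange multipliers of the Gaussian maxent density exp(-t0 - t1*x - t2*x^2).
theta1 = -mu1 / s2
theta2 = 1.0 / (2.0 * s2)
theta0 = mu1**2 / (2.0 * s2) + 0.5 * np.log(2.0 * np.pi * s2)

W = theta0 + theta1 * mu1 + theta2 * mu2              # maximized entropy
loglik = np.sum(-(theta0 + theta1 * x + theta2 * x**2))  # maximized log-likelihood
```

Here `W` equals the Gaussian entropy $\tfrac12 \log(2\pi e s^2)$ and `loglik` equals `-n * W`, so the ME, ML and method-of-moments estimates coincide in this example.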

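1 Let $\gamma = [\gamma_0, \gamma_1, \ldots, \gamma_K]'$ be a non-zero vector and $g_0(x) = 1$; we have

\[ \gamma' H \gamma = \sum_{k=0}^{K} \sum_{j=0}^{K} \gamma_k \gamma_j \int g_k(x)\, g_j(x)\, f(x,\theta)\, dx = \int \Big( \sum_{k=0}^{K} \gamma_k g_k(x) \Big)^{2} f(x,\theta)\, dx > 0. \]

Hence, $H$ is positive-definite.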


Although the MLE and ME are equivalent in our case, there are some conceptual dif-

ferences. For MLE, the restricted estimates are obtained by imposing some constraints on

the parameters. In contrast, for ME, the dimension of the parameter is determined by the

number of moment restrictions imposed: the more moment restrictions, the more complex

the distribution. To reconcile these two methods, we note that a ME estimate with the first

m moment restrictions has a solution of the form

\[ f(\theta) = \exp\Big( -\theta_0 - \sum_{k=1}^{m} \theta_k g_k(x) \Big), \]

which implicitly sets $\theta_j$, $j = m+1, m+2, \ldots$, to be zero. Instead, when we impose more

moment restrictions, say, $\int g_{m+1}(x) f(\theta)\, dx = \hat\mu_{m+1}$, we let the data choose the appropriate value of $\theta_{m+1}$.2 In this sense, the estimate with more moment restrictions is in fact less restricted, or more flexible. The ME and MLE share the same objective function (up to

a proportion) which is determined by the moment restrictions of the maximum entropy

problem. Therefore, we can regard the ME approach as a method of model selection, which

generates a MLE solution.

Consider an $M$-dimensional parameter space $\Theta_M$, and suppose we want to test if $\theta \in \Theta_m$, a subspace of $\Theta_M$, $\Theta_m \subset \Theta_M$. Because of the equivalence between the ME and MLE, we can use the

traditional LR, Wald and LM principles to construct test statistics.3

For $j = m, M$, let $\hat\theta_j$

be the MLE estimates in $\Theta_j$, and $l_j$ and $W_j$ be their corresponding log-likelihood and maximized

entropy. We have

\[
\begin{aligned}
\int f(\hat\theta_m) \log f(\hat\theta_m)\, dx
&= -\int \sum_{k=0}^{m} \hat\theta_{m,k}\, g_k(x)\, f(\hat\theta_m)\, dx \\
&= -\sum_{k=0}^{m} \hat\theta_{m,k} \int g_k(x)\, f(\hat\theta_m)\, dx
 = -\sum_{k=0}^{m} \hat\theta_{m,k} \int g_k(x)\, f(\hat\theta_M)\, dx \\
&= -\int \sum_{k=0}^{m} \hat\theta_{m,k}\, g_k(x)\, f(\hat\theta_M)\, dx
 = \int f(\hat\theta_M) \log f(\hat\theta_m)\, dx.
\end{aligned}
\]

2 The only case that $\theta_{m+1} = 0$ is when the moment restriction $\int g_{m+1} f(\theta_m)\, dx = \hat\mu_{m+1}$ is not binding, or the $(m+1)$th moment is identical to its prediction based on the maxent density $f(\theta_m)$ from the first $m$ moments. In this case, the $(m+1)$th moment does not contain any additional information that will further reduce the entropy.

3 Imbens et al. (1998) propose similar tests in the information-theoretic generalized empirical likelihood framework.


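The third equality follows because the first $m$ moments of $f(\hat\theta_m)$ are identical to those of

$f(\hat\theta_M)$. Consequently, the log-likelihood ratio

\[
\begin{aligned}
R = -2\,(l_m - l_M) &= 2n\,(W_m - W_M) \\
&= 2n \Big( -\int f(\hat\theta_m) \log f(\hat\theta_m)\, dx + \int f(\hat\theta_M) \log f(\hat\theta_M)\, dx \Big) \\
&= 2n \Big( -\int f(\hat\theta_M) \log f(\hat\theta_m)\, dx + \int f(\hat\theta_M) \log f(\hat\theta_M)\, dx \Big) \\
&= 2n \int f(\hat\theta_M) \log \frac{f(\hat\theta_M)}{f(\hat\theta_m)}\, dx,
\end{aligned}
\]

which is the Kullback-Leibler distance statistic between $f(\hat\theta_M)$ and $f(\hat\theta_m)$ multiplied by

twice the sample size. Therefore, if $f(\theta_m)$ is the true model and nested in $f(\theta_M)$, the

quasi-MLE estimate $f(\hat\theta_M)$ is equivalent to the estimate that minimizes the Kullback-Leibler

statistic between $f(\theta_M)$ and $f(\theta_m)$, as shown in White (1982).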
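The equality between the entropy-difference form and the Kullback-Leibler form of the likelihood ratio can be illustrated with a pair of nested Gaussian maxent densities. This is a sketch under assumptions that are not in the text: the restricted model matches only $E x^2$ (so it is $N(0, \mu_2)$), the unrestricted model also matches $E x$ (so it is $N(\mu_1, \mu_2 - \mu_1^2)$), and the values of `n`, `mu1` and `mu2` are arbitrary. Integrals are computed with a hand-written trapezoid rule.

```python
import numpy as np

def integrate(y, x):
    """Trapezoid rule on the grid x."""
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

def normal_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

n = 100                  # hypothetical sample size
mu1, mu2 = 0.4, 1.0      # hypothetical sample moments E[x], E[x^2]

x = np.linspace(-12.0, 12.0, 8001)
f_m = normal_pdf(x, 0.0, mu2)              # restricted: matches E[x^2] only
f_M = normal_pdf(x, mu1, mu2 - mu1**2)     # unrestricted: matches E[x^2] and E[x]

W_m = -integrate(f_m * np.log(f_m), x)     # entropy of the restricted density
W_M = -integrate(f_M * np.log(f_M), x)     # entropy of the unrestricted density

lr_entropy = 2 * n * (W_m - W_M)
lr_kl = 2 * n * integrate(f_M * np.log(f_M / f_m), x)
```

Both quantities agree with the closed form $n \log\{\mu_2 / (\mu_2 - \mu_1^2)\}$, because the restricted and unrestricted densities share the same second moment.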

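If we partition $\theta_u = (\theta_m, \theta_{M-m}) = (\theta_{1u}, \theta_{2u})$ for the unrestricted model and similarly

$\theta_r = (\theta_{1r}, 0)$ for the restricted model, then the score function is

\[
S(x, \theta_m, \theta_{M-m}) =
\begin{pmatrix}
\dfrac{\partial \ln f}{\partial \theta_m}(x \mid \theta_m, \theta_{M-m}) \\[2ex]
\dfrac{\partial \ln f}{\partial \theta_{M-m}}(x \mid \theta_m, \theta_{M-m})
\end{pmatrix},
\]

and the Hessian is

\[
H(x, \theta_m, \theta_{M-m}) =
\begin{pmatrix}
\dfrac{\partial^2 \ln f}{\partial \theta_m \partial \theta_m'}(x \mid \theta_m, \theta_{M-m}) &
\dfrac{\partial^2 \ln f}{\partial \theta_m \partial \theta_{M-m}'}(x \mid \theta_m, \theta_{M-m}) \\[2ex]
\dfrac{\partial^2 \ln f}{\partial \theta_{M-m} \partial \theta_m'}(x \mid \theta_m, \theta_{M-m}) &
\dfrac{\partial^2 \ln f}{\partial \theta_{M-m} \partial \theta_{M-m}'}(x \mid \theta_m, \theta_{M-m})
\end{pmatrix}.
\]

If we partition similarly the inverse of the information matrix $I = -E(H)$ as

\[
I^{-1} =
\begin{pmatrix}
I^{11} & I^{12} \\
I^{21} & I^{22}
\end{pmatrix},
\]

then the Wald test is defined as

\[ WD = n\, \hat\theta_{2u}' \big( I^{22} \big)^{-1} \hat\theta_{2u}, \]

whereas the Lagrange Multiplier test is defined as

\[ LM = \frac{1}{n} \Big( \sum_{i=1}^{n} S(x_i, \hat\theta_{1r}, 0) \Big)' I^{22} \Big( \sum_{i=1}^{n} S(x_i, \hat\theta_{1r}, 0) \Big). \]

All the tests are asymptotically equivalent and distributed as $\chi^2$ with $(M - m)$ degrees of freedom.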

3 Test of Symmetry and Normality

In this section, we use the proposed ME method to obtain test statistics for symmetry and

normality. Since the LR and Wald procedures require the estimation of the ME density,

which in general has no analytical solution and is computationally quite involved, we focus

on the LM test, which reduces surprisingly to a simple functional form.

3.1 Symmetry Test

As before, let X be a random variable distributed with a pdf f0, and X1, X2,...,Xn be an

i.i.d. random sample of size n generated according to f0. The standard test of skewness

takes the form

\[ b = \frac{\sqrt{n}\, \hat\mu_3}{\sqrt{\hat\mu_6 - 6\hat\mu_4 + 9}}, \]

where $\hat\mu_j = \frac{1}{n} \sum_{i=1}^{n} x_i^j$. The test statistic $b$ is asymptotically distributed as $N(0, 1)$. Although originally proposed under the assumption of normality, Gupta (1967) shows that the test

is also applicable without this assumption provided the underlying distribution has finite

moments up to order six.

3.1.1 Two Alternative Maxent Density Estimators.

Alternatively, we can approximate f0 by the maxent densities and then use the LM test

proposed above. In this paper we consider two simple, yet flexible functional forms. If we

approximate f0 using the maxent density subject to the first four arithmetic moments, the

solution takes the form

\[ f_1 = \exp\Big( -\sum_{k=0}^{4} \theta_k x^k \Big). \]


This exponential quartic form was first discussed by Fisher (1922) and studied in the maxi-

mum entropy framework in Zellner and Highfield (1988), Ormoneit and White (1999) and

Wu (2003).

However, instead of using the high order sample moments, whose small sample properties

may be unreliable, we can use generalized moments of sin (x) and cos(x) .4 The resulting

density takes the form

\[ f_2 = \exp\Big( -\sum_{k=0}^{2} \theta_k x^k - \theta_3 \sin(x) - \theta_4 \cos(x) \Big). \]

This is in the spirit of the orthogonal trigonometric polynomials method of Cencov (1962).

Since sine and cosine terms are bounded in $[-1, 1]$, these two moments always exist and

are not sensitive to outliers. An additional advantage of $f_2$ is its numerical stability in

estimation. For large values of $x$, $f_1$, an exponential function where the exponent is quartic,

may encounter a numerical overflow with badly chosen initial values of the $\theta$s. However, $f_2$ does

not run into this problem because of the bounded range of $\sin(x)$ and $\cos(x)$.

Pearson uses

\[ f'(x) = \frac{(x - a)\, f(x)}{b_0 + b_1 x + b_2 x^2} \]

to characterize the Pearson family distributions (Stuart and Ord, 1994). This family includes

the exponential, normal, beta of first and second kind, and gamma distributions as special cases.

Cobb et al. (1983) further generalize the Pearson distributions using

\[ \frac{f'(x)}{f(x)} = -\frac{g(x)}{v(x)}, \]

where $g(x)$ is the so called shape polynomial of $x$ up to order $K$ and $v(x)$ takes the form

of $1$, $x$, $x^2$ or $x(1 - x)$. The maxent densities are even more flexible than the Cobb et al. family.

Denote $f(x) = \exp(h(x))$; generally no restrictions are imposed on $h(x)$ except that it is continuously differentiable. Following Cobb et al. (1983)'s framework, approximating

the differentiable density $f(x)$ by $\exp(h(x))$ is equivalent to approximating $f'(x)/f(x)$ by

4 When the domain of $f(x) = \exp\big( -\sum_{j=0}^{K} \theta_j x^j \big)$ is the real line, $K$ should be an even number and $\theta_K > 0$

to ensure that $f(x)$ is a proper density function, as we require $\lim_{|x| \to \infty} f(x) = 0$. However, for $f_2$, we can have $\theta_3 = 0$ and $\theta_4 = 0$ as the even function $x^2$ is the dominant term in $f_2$. For the test of symmetry based on the third term, the choice of the last characterizing moments is immaterial.


$h'(x)$. The power series and Fourier series are two commonly used approximation methods.

One can see that $f_1$ corresponds to a power series approximation of $f$, while $f_2$ is a mixture

of both. Gallant (1981) notes that including a linear term helps to considerably reduce the

number of sine/cosine terms in a Fourier approximation of a non-periodic function. If a

quadratic term is included as well, then curvature restriction may be imposed. Moreover,

sine and cosine can be expressed as infinite power series

\[ \sin(x) = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n+1}}{(2n+1)!} = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots, \]

\[ \cos(x) = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n}}{(2n)!} = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots. \]

Therefore, the exponent of f2 is essentially an infinite power series with some coefficient

restrictions.

We proceed to shed some light into how f1 and f2 approximate the underlying distribu-

tions by conducting two experiments. In the first one, we estimate f1 and f2 from a random

sample of standard normal variates with sample size 50. We repeat the experiment 1,000

times. Denote $\hat f_{ij}$ as the estimate from the $j$th experiment, $i = 1, 2$. Because the experiments are independent, we define the average estimate as

\[ \bar f_i = \Big( \prod_{j=1}^{1000} \hat f_{ij} \Big)^{\frac{1}{1000}} = \exp\Big( -\frac{1}{1000} \sum_{j=1}^{1000} \sum_{k} \hat\theta_{ikj}\, g_{ik}(x) \Big), \quad i = 1, 2, \]

where the $\hat\theta$s are the estimated Lagrange multipliers. For comparison, we also calculate the two-term Edgeworth expansion of each experiment.5 The average estimate is also defined

similarly as the geometric average of each estimate. Figure 1 plots the average estimated

maxent density and Edgeworth expansion, together with the theoretical density. The plot

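5 The Edgeworth expansion is obtained as

\[ \hat f(x) = \Big[ 1 + \frac{1}{6} \sqrt{\hat\beta_1}\, H_3(x) + \frac{1}{24} (\hat\beta_2 - 3)\, H_4(x) + \frac{1}{72} \hat\beta_1\, H_6(x) \Big] \phi(x), \]

where $\sqrt{\beta_1}$ and $\beta_2$ are the coefficients of skewness and kurtosis, $H_i$ is the $i$th order Hermite polynomial and $\phi(x)$ is the standard normal density function. Hence, the average estimate is defined as

\[ \bar f(x) = \frac{1}{n} \sum_{i=1}^{n} \Big[ 1 + \frac{1}{6} \sqrt{\hat\beta_{1,i}}\, H_3(x) + \frac{1}{24} (\hat\beta_{2,i} - 3)\, H_4(x) + \frac{1}{72} \hat\beta_{1,i}\, H_6(x) \Big] \phi(x). \]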


suggests that both maxent estimates approximate the underlying distributions well. The

Edgeworth expansion appears to be slightly closer to the underlying distribution. However,

the Edgeworth expansion is not a proper density estimate and may have negative values

for some $x$. In fact, when we evaluate the Edgeworth expansion on the range $[-4, 4]$, in

787 out of the 1,000 experiments we encounter negative values, which are replaced by some

arbitrarily small positive numbers (1e-6 in this study). Moreover, the Edgeworth expansion

usually does not integrate to unity. In contrast, the maxent estimates, by construction, are

proper densities that are always positive and integrate to one.
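The point that the Edgeworth expansion need not be a proper density is easy to reproduce. The sketch below evaluates the two-term expansion using the probabilists' Hermite polynomials $H_3$, $H_4$ and $H_6$; the function name `edgeworth_pdf` is hypothetical, and the inputs are population skewness and kurtosis values chosen for illustration (those of a standardized $\chi^2_3$ variate), not estimates from the paper's experiments.

```python
import numpy as np

def edgeworth_pdf(x, skew, kurt):
    """Two-term Edgeworth expansion around the standard normal density.

    skew : coefficient of skewness, sqrt(beta_1)
    kurt : coefficient of kurtosis, beta_2 (equal to 3 for the normal)
    """
    phi = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)
    h3 = x**3 - 3 * x
    h4 = x**4 - 6 * x**2 + 3
    h6 = x**6 - 15 * x**4 + 45 * x**2 - 15
    return (1 + skew / 6 * h3 + (kurt - 3) / 24 * h4 + skew**2 / 72 * h6) * phi

x = np.linspace(-4.0, 4.0, 801)

# Standardized chi-square with 3 degrees of freedom:
# skewness sqrt(8/3), kurtosis 3 + 12/3 = 7.
f_chi2 = edgeworth_pdf(x, np.sqrt(8.0 / 3.0), 7.0)

# With zero skewness and kurtosis 3 the expansion collapses to the normal density.
f_norm = edgeworth_pdf(x, 0.0, 3.0)
```

For the skewed case the expansion dips below zero on part of $[-4, 4]$ (for instance near $x = -2$), whereas a maxent density of the form $\exp(\cdot)$ is positive by construction.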

We run a second experiment on $\chi^2$ with 3 degrees of freedom, and the results are plotted

in Figure 2. One can see that when the underlying distributions are not close to normal, the

Edgeworth expansion misses badly. In contrast, the maxent densities are considerably closer

to $f_0$. Moreover, we can exploit the fact that the domain of the distribution is positive to improve

the approximation by restricting $x$ to the positive half line or using a $\log(x)$ term in the

maxent density.6

3.1.2 The symmetry test statistics based on f1 and f2.

We do not assume normality for our symmetry test. Given the maxent density f(x) , we can

test the assumption of symmetry by testing if the Lagrange Multipliers associated with the

moments that are odd functions, i.e., $E g(x) = -E g(-x)$, are zero. For $f_1$, the information matrix under symmetry takes the form

\[
I =
\begin{pmatrix}
1 & 0 & 1 & 0 & \mu_4 \\
0 & 1 & 0 & \mu_4 & 0 \\
1 & 0 & \mu_4 & 0 & \mu_6 \\
0 & \mu_4 & 0 & \mu_6 & 0 \\
\mu_4 & 0 & \mu_6 & 0 & \mu_8
\end{pmatrix}.
\]

6 A detailed comparison between maxent estimates and the Edgeworth expansion is not pursued here any further and is left as a topic of future research.


Under symmetry, the score function for $f_1$ is $S = n\,[0, 0, 0, \hat\mu_3, 0]'$. Then the LM test for symmetry, which is equivalent to testing $\theta_3 = 0$, is defined as

\[ t_{s1} = \frac{1}{n} S' I^{-1} S = \frac{n\, \hat\mu_3^2}{\hat\mu_6 - \hat\mu_4^2}, \]

where $t_{s1}$ is distributed as $\chi^2$ with one degree of freedom, and therefore is asymptotically

equivalent to $b^2$. Both tests require the existence of the moments up to order 6. Comparing

with the conventional skewness test $b = \sqrt{n}\, \hat\mu_3 / \sqrt{\hat\mu_6 - 6\hat\mu_4 + 9}$, we note that, since $(\hat\mu_4 - 3)^2 \ge 0$,

\[ \hat\mu_4^2 \ge 6\hat\mu_4 - 9, \]

which in turn implies that $t_{s1} \ge b^2$. The equality holds only when $\hat\mu_4 = 3$, where the first four moments coincide with those of the standard normal distribution. Otherwise, $t_{s1}$ always has

higher power than $b$ under the alternative hypothesis of asymmetry.
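The two skewness statistics compared above are simple functions of the third, fourth and sixth sample moments. A minimal sketch follows, assuming (as the information matrix implicitly does) that the data are first standardized to zero mean and unit second moment; the helper names are hypothetical.

```python
import numpy as np

def standardize(x):
    """Center and scale so that the first two sample moments are 0 and 1."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()      # ddof=0: second sample moment is 1

def skewness_tests(x):
    """Return (b, ts1): the standardized skewness test and the maxent LM test."""
    z = standardize(x)
    n = z.size
    m3, m4, m6 = (np.mean(z**k) for k in (3, 4, 6))
    b = np.sqrt(n) * m3 / np.sqrt(m6 - 6 * m4 + 9)
    ts1 = n * m3**2 / (m6 - m4**2)
    return b, ts1

rng = np.random.default_rng(1)
sample = rng.lognormal(size=50)          # illustrative asymmetric sample
b, ts1 = skewness_tests(sample)
```

Because the denominator of `ts1` is never larger than that of `b**2`, the computed `ts1` is at least `b**2` for any sample, which is the inequality used in the text. Both statistics can be compared with the $\chi^2_1$ critical value (about 3.84 at the 5 per cent level, applied to `b**2` for a two-sided test).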

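The information matrix of $f_2$ under symmetry is

\[
I =
\begin{pmatrix}
1 & 0 & 1 & 0 & \mu_c \\
0 & 1 & 0 & \mu_{1,s} & 0 \\
1 & 0 & \mu_4 & 0 & \mu_{2,c} \\
0 & \mu_{1,s} & 0 & \mu_{s^2} & 0 \\
\mu_c & 0 & \mu_{2,c} & 0 & \mu_{c^2}
\end{pmatrix},
\]

where $\mu_c = E[\cos(x)]$, $\mu_{1,s} = E[x \sin(x)]$, $\mu_{2,c} = E[x^2 \cos(x)]$, $\mu_{s^2} = E[\sin(x)^2]$ and $\mu_{c^2} = E[\cos(x)^2]$.

Now the test for symmetry is equivalent to testing if the Lagrange Multiplier for $\mu_s = E \sin(x)$ is zero. Under the restriction of symmetry, the score function of $f_2$ is $S = n\,[0, 0, 0, \hat\mu_s, 0]'$, and the test statistic is given by

\[ t_{s2} = \frac{1}{n} S' I^{-1} S = \frac{n\, \hat\mu_s^2}{\hat\mu_{s^2} - \hat\mu_{1,s}^2}, \]

where $\hat\mu_s = \frac{1}{n} \sum_{i=1}^{n} \sin(x_i)$, $\hat\mu_{1,s} = \frac{1}{n} \sum_{i=1}^{n} x_i \sin(x_i)$ and $\hat\mu_{s^2} = \frac{1}{n} \sum_{i=1}^{n} \sin(x_i)^2$. The test statistic $t_{s2}$ is also asymptotically distributed as

$\chi^2$ with one degree of freedom.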
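The trigonometric symmetry statistic only needs three sample averages. The sketch below is an illustration under the same standardization assumption as before (zero mean, unit second moment); the function name `ts2_statistic` and the simulated samples are hypothetical.

```python
import numpy as np

def ts2_statistic(x):
    """LM symmetry statistic based on the sin(x) moment of the f2 density."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()          # standardized data
    n = z.size
    mu_s = np.mean(np.sin(z))             # sample E[sin(x)]
    mu_1s = np.mean(z * np.sin(z))        # sample E[x sin(x)]
    mu_s2 = np.mean(np.sin(z) ** 2)       # sample E[sin(x)^2]
    return n * mu_s**2 / (mu_s2 - mu_1s**2)

rng = np.random.default_rng(2)

# A sample that is exactly symmetric about zero gives a statistic of zero.
v = rng.standard_normal(250)
ts2_symmetric = ts2_statistic(np.concatenate([v, -v]))

# A strongly asymmetric sample (exponential) gives a large statistic.
ts2_exponential = ts2_statistic(rng.exponential(size=500))
```

The denominator is positive by the Cauchy-Schwarz inequality once the second sample moment equals one, so the statistic is well defined for any non-degenerate sample.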


propose a test based on f2. Under normality, the information matrix of f2 takes the form

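\[
I =
\begin{pmatrix}
1 & 0 & 1 & 0 & e^{-\frac12} \\
0 & 1 & 0 & e^{-\frac12} & 0 \\
1 & 0 & 3 & 0 & 0 \\
0 & e^{-\frac12} & 0 & \frac{1 - e^{-2}}{2} & 0 \\
e^{-\frac12} & 0 & 0 & 0 & \frac{1 + e^{-2}}{2}
\end{pmatrix}.
\]

The score function under normality restriction is $S = n\,\big[0, 0, 0, \hat\mu_s, \hat\mu_c - e^{-\frac12}\big]'$ and the new LM test is given by

\[ t_{n2} = \frac{1}{n} S' I^{-1} S = n \left[ \frac{2 \hat\mu_s^2}{1 - 2e^{-1} - e^{-2}} + \frac{2 \big( \hat\mu_c - e^{-\frac12} \big)^2}{1 - 3e^{-1} + e^{-2}} \right]. \]

Both $t_{n1}$ and $t_{n2}$ are asymptotically distributed as

$\chi^2$ with two degrees of freedom.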
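The closed form of the normality statistic based on $f_2$ can be sketched as follows. This is an illustration only: the data are assumed to be standardized first, and the helper names `trig_moments` and `tn2_statistic` are hypothetical.

```python
import numpy as np

def trig_moments(x):
    """Return (n, sample E[sin(x)], sample E[cos(x)]) for standardized data."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return z.size, np.mean(np.sin(z)), np.mean(np.cos(z))

def tn2_statistic(x):
    """LM normality statistic based on the sin(x) and cos(x) moments."""
    n, mu_s, mu_c = trig_moments(x)
    e1, e2 = np.exp(-1.0), np.exp(-2.0)
    odd_part = 2 * mu_s**2 / (1 - 2 * e1 - e2)                   # sin(x) term
    even_part = 2 * (mu_c - np.exp(-0.5)) ** 2 / (1 - 3 * e1 + e2)  # cos(x) term
    return n * (odd_part + even_part)

rng = np.random.default_rng(3)
sample = rng.exponential(size=500)       # illustrative non-normal sample
tn2 = tn2_statistic(sample)
```

The two denominators are the reciprocals (up to the factor 2) of the relevant diagonal elements of $I^{-1}$, so the closed form can be cross-checked by inverting the 5 by 5 information matrix numerically. The statistic is compared with the $\chi^2_2$ critical value, about 5.99 at the 5 per cent level.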

Under normality, the correlation of $\hat\mu_3$ and $\hat\mu_4$ is practically zero, and so is that of $\hat\mu_s$ and

$\hat\mu_c$. However, the correlation of $|\hat\mu_3|$ and $\hat\mu_4$ is 0.52, 0.41 and 0.32 for 10,000 random normal samples with $n$ = 50, 100 and 200, while the correlation of $|\hat\mu_s|$ and $\hat\mu_c$ is 0.33, 0.22 and 0.17

for $n$ = 50, 100, 200. Therefore, we expect that $t_{n2}$ converges to the $\chi^2_2$ distribution faster than

$t_{n1}$ asymptotically.

4 Simulations

In this section, we use Monte Carlo simulations to assess the size and power of the proposed

tests. Following Randles et al. (1980) and Bai and Ng (2003), we consider well known

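distributions such as the normal, the $t$ and the $\chi^2$, as well as distributions from the generalized

lambda family. The generalized lambda distribution, which nests a range of symmetric and asymmetric distributions, is defined in terms of the inverse of the cumulative distribution

\[ F^{-1}(u) = \lambda_1 + \big[ u^{\lambda_3} - (1 - u)^{\lambda_4} \big] / \lambda_2, \quad 0 < u < 1. \]

For both the symmetry and normality test, we consider the following symmetric and

asymmetric distributions:

S1: $N(0, 1)$;

S2: $t_5$;

S3: $e_1 I(z \le 0.5) + e_2 I(z > 0.5)$, where $z \sim U(0, 1)$, $e_1 \sim N(-1, 1)$, and $e_2 \sim N(1, 1)$;

S4: $F^{-1}(u) = \lambda_1 + [u^{\lambda_3} - (1 - u)^{\lambda_4}] / \lambda_2$, $\lambda_1 = 0$, $\lambda_2 = 0.19754$, $\lambda_3 = 0.134915$, $\lambda_4 = 0.134915$;

S5: $F^{-1}(u) = \lambda_1 + [u^{\lambda_3} - (1 - u)^{\lambda_4}] / \lambda_2$, $\lambda_1 = 0$, $\lambda_2 = 1$, $\lambda_3 = 0.8$, $\lambda_4 = 0.8$;

A1: lognormal: $\exp(e)$, $e \sim N(0, 1)$;

A2: $\chi^2_3$;

A3: exponential: $-\ln(e)$, $e \sim U(0, 1)$;

A4: $F^{-1}(u) = \lambda_1 + [u^{\lambda_3} - (1 - u)^{\lambda_4}] / \lambda_2$, $\lambda_1 = 0$, $\lambda_2 = 1$, $\lambda_3 = 1.4$, $\lambda_4 = 0.25$;

A5: $F^{-1}(u) = \lambda_1 + [u^{\lambda_3} - (1 - u)^{\lambda_4}] / \lambda_2$, $\lambda_1 = 0$, $\lambda_2 = 1$, $\lambda_3 = 0.0075$, $\lambda_4 = 0.03$.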
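Because the generalized lambda distribution is defined through its quantile function, samples can be drawn by the inverse transform method. The sketch below is illustrative; the function names are hypothetical, and the parameter values used in the example are those listed for S4.

```python
import numpy as np

def gld_quantile(u, lam1, lam2, lam3, lam4):
    """Quantile function F^{-1}(u) of the generalized lambda distribution."""
    u = np.asarray(u, dtype=float)
    return lam1 + (u**lam3 - (1.0 - u) ** lam4) / lam2

def gld_sample(n, lams, rng):
    """Draw n variates by applying the quantile function to uniform draws."""
    return gld_quantile(rng.uniform(size=n), *lams)

S4 = (0.0, 0.19754, 0.134915, 0.134915)   # symmetric case: lam3 == lam4

rng = np.random.default_rng(4)
draws = gld_sample(1000, S4, rng)

u = np.linspace(0.01, 0.99, 99)
q = gld_quantile(u, *S4)
```

When $\lambda_3 = \lambda_4$ the quantile function satisfies $F^{-1}(u) = 2\lambda_1 - F^{-1}(1 - u)$, so the distribution is symmetric about $\lambda_1$; unequal values of $\lambda_3$ and $\lambda_4$ produce the asymmetric cases.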

For each distribution, we draw 100,000 random samples of size n = 20, 50, 100, 200, 500

and 1,000 and calculate the symmetry and normality test statistics discussed above. There is

a large body of work on symmetry and normality tests, for example, Gupta (1967), Randles

et al. (1980), Bera and Jarque (1981), D'Agostino et al. (1990), Ahmad and Li (1997),

Bai and Ng (2003) and Bontemps and Meddahi (2003). Our results are comparable to and

in general more favorable than those of existing studies, especially when the sample size is

small. Table 1 reports the results of the symmetry test at the 5 per cent level of significance.

We report the results for sample sizes up to 200 as the power of the tests is nearly unity for

$n \ge 500$ for all the asymmetric distributions. The first five rows, for symmetric distributions, report size and the next five, for asymmetric distributions, show power. The size of the tests

remains stable across different sample sizes. For sample sizes ranging from 20 to 200,

which are frequently encountered in empirical work, the variation in size for $b$ is no more than

2%, and that for $t_{s1}$ and $t_{s2}$ is less than 1%. We find that $t_{s1}$ always rejects more often than

$b$, confirming the result of the previous section that $t_{s1} \ge b^2$. Also in general, $t_{s2}$ tends to reject more often than the other two tests. In addition, we observe high power of

the proposed tests against asymmetric distributions, and the power increases rapidly to unity

with sample size. Overall, $t_{s1}$ and $t_{s2}$ are considerably more powerful than $b$. For example,


the power of the two maxent tests against the lognormal distribution (A1) is often more

than twice that of the conventional skewness test when the sample size is small.
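A scaled-down version of the size and power comparison for $b$ and $t_{s1}$ can be sketched as below. It is not the authors' simulation code: it uses 5,000 replications rather than 100,000, a fixed seed, samples standardized row by row, and the $\chi^2_1$ 5 per cent critical value hard-coded; the function name `rejection_rates` is hypothetical.

```python
import numpy as np

CHI2_1_CRIT = 3.841458820694124   # 95th percentile of chi-square(1)

def rejection_rates(sampler, n, reps, rng):
    """Rejection frequencies of the b test (two-sided) and the ts1 test."""
    x = sampler(rng, (reps, n))                      # one sample per row
    z = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)
    m3 = np.mean(z**3, axis=1)
    m4 = np.mean(z**4, axis=1)
    m6 = np.mean(z**6, axis=1)
    b_squared = n * m3**2 / (m6 - 6 * m4 + 9)
    ts1 = n * m3**2 / (m6 - m4**2)
    return np.mean(b_squared > CHI2_1_CRIT), np.mean(ts1 > CHI2_1_CRIT)

rng = np.random.default_rng(2004)

# Size under the standard normal (S1) and power under the lognormal (A1), n = 50.
size_b, size_ts1 = rejection_rates(
    lambda r, shape: r.standard_normal(shape), 50, 5000, rng)
power_b, power_ts1 = rejection_rates(
    lambda r, shape: r.lognormal(size=shape), 50, 5000, rng)
```

Since $t_{s1} \ge b^2$ sample by sample, the rejection frequency of `ts1` can never fall below that of `b` in such an experiment; the interesting quantity is how large the gap is under asymmetric alternatives such as the lognormal.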

Table 2 reports the results for the normality tests at the 5 per cent significance level.

The first row reflects the size and the rest show the power of the tests. In terms of size the

two tests are comparable; however, the second test is generally more powerful. For example,

for distribution S3, the power of JB test is only 0.06 for n = 200, while that of tn2 is 0.18.

So is the case for distribution A4, where tn2 is considerably more powerful even when the

sample size is small. For distribution S4, whose first four moments coincide with those of

the standard normal distribution, both tests as expected have very low power. Randles et

al. (1980) and Bai and Ng (2003) also report very low power against S4. Even then though,

tn2 appears to be slightly more powerful than tn1.

5 Empirical Application

The wage determination process is one of the most studied areas of empirical labor economics.

At the root of this research lies the question of downward nominal rigidity, which, if prevalent,

would interfere with the functioning of the labor market preventing the efficient reallocation

of workers from low to high demand areas and inducing unemployment. Furthermore, if

nominal rigidity is more pervasive in some sectors than in others, similar shocks will have different price and quantity effects. For example, if unions are more resistant to wage cuts

than the non-union sector, real wage realignment may be more difficult to achieve in the union

sector. The same may be true in the public sector when compared with the private sector,

since the former is typically more unionized than the latter. There is an expanding research

area that studies this issue, see Christofides and Stengos (2003) for a recent review of the

current literature. The role of symmetry as a gauge of the extent and significance of downward

nominal rigidity is the subject of considerable debate. Card and Hyslop (1997) note that

most wage determination models imply symmetry and use the portion of the distribution

above the median as the no-rigidity counterfactual in their own work. McLaughlin (2000)

provides reasons other than downward wage rigidity for believing that the wage-change

distribution may be asymmetric. Whatever the reasons, testing for symmetry or the lack

of it (asymmetry) is a question of paramount importance in empirical labor economics and

serves as a first stage in offering a better understanding of wage determination. The starting


point of most of these studies is the construction of wage-change histograms from data on

individual agents. However, visual evidence must be filtered through standard statistical

procedures in order to measure the quantitative significance of certain forces and effects.

In this context, consideration of the symmetry of the wage-change distribution has played

a very important role. A number of symmetry measures have been used in this literature.

These include the standardized skewness coefficient test b, the difference between the median

and mean, symmetrically differenced histograms and nonparametric symmetry tests; see

McLaughlin (2000) and Christofides and Stengos (2003) for a discussion of these different

measures.

We apply our proposed LM test statistics based on the maxent densities f1 and f2 on

a set of Canadian public sector union wage contract data for different years in the 1978 to

1994 period. The union contract data used in this paper concern the annualized change

over the life of each contract in the base-wage rate agreed to by employers and unions in

the Canadian public sector. The data, compiled initially by Labor Canada (now Human

Resources Development Canada), start in 1978 and cover contracts involving between

500 and nearly 80,000 employees. Collective bargaining agreements often

implement an increase in the nominal wage rate at the beginning of the contract and then

again at yearly intervals. Contract duration (in years) and the number of nominal revisions

are correlated but not identical and, in any case, the main pattern is one of infrequent wage adjustments. A result of this fact is that, depending on what contract sub-period one chooses

to focus on, the implied nominal change could be made arbitrarily large or small. Under

these circumstances, and given that contract duration varies substantially across settlements,

a natural interval over which to measure wage adjustment is contract duration itself, taking

care to standardize across contracts by annualizing. In the contract data the private-public

sector distinction is made by the data collection agency itself and is based on the sources of

funding for the employer. Thus, health and education contracts are classified as belonging

to the public sector because these services are not market-provided.

The data span the high-inflation period of 1978-1982, the medium-inflation period of

1984-1989 and the low-inflation period of 1990-94. The contract data involves agreements

which do not contain Cost-of-Living-Allowance (COLA) clauses and, therefore, do not raise

the additional complication that some of the relevant wage flexibility may come from the

indexation clause. To avoid spurious correlations that may arise by pooling different years


together we look at the years at the beginning and end of each of the three mentioned periods.

The left panel of Table 3 provides some descriptive statistics for the data used in 1978, 1982,

1984, 1989, 1990 and 1994 and the right panel of Table 3 reports the test statistics for these

years. The first column is the conventional skewness test $b$, the next two are the maxent tests

$t_{s1}$ and $t_{s2}$. As was expected from the findings in the simulations, the standardized skewness

coefficient test statistic b is less powerful than the information based test statistics ts1 and

ts2. In two out of six years b fails to reject the symmetry hypothesis whereas the other two

test statistics do. These results suggest that the standardized skewness coefficient test would

give unreliable results. The only times that this test statistic rejects the null hypothesis of

symmetry are in 1978 and 1982, which are high inflation years. It fails to reject the null during

the medium inflation years, whereas the two maxent density based statistics reject the null

during these years. During these years public sector union contracts displayed considerable

downward nominal rigidity in order to avoid real wage erosion due to inflation.

6 Extensions

In addition to their simplicity, a major advantage of the proposed tests is their generality. In

this section, we briefly discuss some potential extensions of the tests.

First, we can increase the terms of polynomials in the maxent density. For $f_1$, we can use moments higher than order 4, and for $f_2$, we can use $\sin(kx)$ and $\cos(kx)$, $k = 2, 3, \ldots$.

Moreover, we can mix high order polynomials and trigonometric polynomials. The derivation of the

test statistics is straightforward.

Second, we can use our method of normality test for other distributions. For example,

the gamma distribution can be characterized as a maxent distribution

\[ f_0(x) = \exp( -\theta_0 - \theta_1 x - \theta_2 \log x ), \quad x > 0. \]

Because $E x$ and $E \log x$ are the sufficient statistics for gamma distribution, the presence of

any additional terms in the exponent of $f_0(x)$ rejects the hypothesis that $x$ is distributed

according to a gamma distribution. Let

\[ f(x) = \exp\Big( -\theta_0 - \theta_1 x - \theta_2 \log x - \sum_{k=3}^{K} \theta_k g_k(x) \Big); \]

the test of $\theta_k = 0$ for $k \ge 3$ is then the LM test for gamma distribution. The discussions in the previous section suggest that the natural candidates for $g_k(x)$ may include

$x^{i+1}$, $\sin(ix)$, $\cos(ix)$, $(\log x)^{i+1}$, $\sin(i \log x)$, $\cos(i \log x)$, $i = 1, 2, 3, \ldots$, or their combinations.

Third, we can generalize our tests to regression residuals within the framework of White

and McDonald (1980). For time series data or heteroskedastic data, we can use the approach

of Bai and Ng (2003) or Bontemps and Meddahi (2003). In general, for non-i.i.d. data, to test

whether the Lagrange Multipliers associated with sample moments $g_k(x)$ in the maxent density are

zero, we need to estimate a Heteroskedasticity and Autocorrelation Consistent (HAC) covariance

matrix for those moments.

7 Conclusion

In this paper we derive distribution free tests based on the maximum entropy densities to

test the null hypotheses of symmetry and normality. The proposed tests are derived from

maximizing differential entropy subject to moment constraints. By exploiting the equivalence

between ME and ML estimates for the exponential families, we can use the conventional LR,

Wald and LM testing principles in the maximum entropy framework. Hence, our tests share

the optimality properties of the standard ML based tests. We show that the ME approach

leads to simple yet powerful LM tests of symmetry and normality. The proposed tests have

desirable small sample properties, when compared with the standard parametric tests used in the literature, such as the standardized skewness coefficient test for symmetry or the

Jarque and Bera (1980) test for normality. These properties are confirmed by our Monte

Carlo experiments and empirical application.


Gupta, M. K., 1967, An asymptotically nonparametric test of symmetry, Annals of Mathematical Statistics,

38, 849-866.

Imbens, G. W., R. H. Spady and P. Johnson, 1998, Information theoretic approaches to

inference in moment condition models, Econometrica, 66, 333-357.

Jarque, C. and A. Bera, 1980, Efficient tests for normality, homoscedasticity and serial

independence of regression residuals, Economics Letters, 6, 255-59.

Jaynes, E. T., 1957, Information theory and statistical mechanics, Physical Review, 106, 620-

630.

McLaughlin, K. J., 2000, Testing for asymmetry in the distribution of wage changes, Mimeo.

Ormoneit, D. and H. White, 1999, An efficient algorithm to compute maximum entropy

densities, Econometric Reviews, 18(2), 141-67.

Randles, R. H., M. A. Fligner, G. E. Policello and D. A. Wolfe, 1980, An asymptotically distribution-free

test for symmetry versus asymmetry, Journal of the American Statistical Association,

75, 168-172.

Shannon, C. E., 1949, The mathematical theory of communication (University of Illinois

Press: Urbana).

Stuart, A. and J. K. Ord, 1994, Kendall's advanced theory of statistics, Vol. 1 (Edward

Arnold), 6th Edition.

Vasicek, O., 1976, A test for normality based on sample entropy, Journal of the Royal

Statistical Society, Series B, 38, 54-59.

White, H., 1982, Maximum likelihood estimation of misspecified models, Econometrica, 50,

1-26.

White, H. and G. M. McDonald, 1980, Some large-sample tests for nonnormality in the

linear regression model, Journal of the American Statistical Association, 75, 16-28.

Wu, X., 2003, Calculation of maximum entropy densities with application to income distri-

bution, Journal of Econometrics, 115, 347-354.

Zellner, A. and R. A. Highfield, 1988, Calculation of maximum entropy distribution and

approximation of marginal posterior distributions, Journal of Econometrics, 37, 195-209.


Table 1: Size and Power of Symmetry Test

        n = 20            n = 50            n = 100           n = 200
       b   ts1   ts2     b   ts1   ts2     b   ts1   ts2     b   ts1   ts2
S1  0.02  0.05  0.05  0.03  0.04  0.05  0.04  0.05  0.05  0.04  0.05  0.05
S2  0.03  0.06  0.07  0.03  0.05  0.07  0.03  0.05  0.08  0.03  0.05  0.08
S3  0.02  0.06  0.06  0.03  0.06  0.06  0.04  0.06  0.06  0.04  0.06  0.06
S4  0.03  0.05  0.05  0.03  0.04  0.05  0.04  0.05  0.05  0.05  0.05  0.05
S5  0.03  0.06  0.07  0.04  0.05  0.07  0.03  0.05  0.08  0.03  0.05  0.08
A1  0.27  0.59  0.70  0.36  0.76  0.99  0.43  0.81  1.00  0.54  0.83  1.00
A2  0.23  0.35  0.41  0.58  0.71  0.90  0.80  0.89  1.00  0.93  0.97  1.00
A3  0.29  0.44  0.54  0.58  0.75  0.96  0.76  0.89  1.00  0.91  0.96  1.00
A4  0.13  0.28  0.30  0.53  0.72  0.73  0.89  0.96  0.96  1.00  1.00  1.00
A5  0.14  0.23  0.27  0.40  0.52  0.69  0.68  0.79  0.97  0.87  0.93  1.00

Table 2: Size and Power of Normality Test

       n = 20      n = 50      n = 100     n = 200     n = 500     n = 1000
     tn1   tn2   tn1   tn2   tn1   tn2   tn1   tn2   tn1   tn2   tn1   tn2
S1  0.02  0.03  0.04  0.04  0.04  0.04  0.04  0.04  0.05  0.05  0.05  0.05
S2  0.17  0.18  0.39  0.40  0.63  0.63  0.86  0.86  0.99  1.00  1.00  1.00
S3  0.01  0.01  0.01  0.01  0.00  0.04  0.06  0.18  0.51  0.68  0.93  0.97
S4  0.02  0.03  0.03  0.04  0.04  0.04  0.04  0.04  0.03  0.04  0.04  0.04
S5  0.16  0.18  0.39  0.41  0.63  0.64  0.87  0.88  1.00  1.00  1.00  1.00
A1  0.72  0.80  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
A2  0.36  0.44  0.86  0.93  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
A3  0.48  0.58  0.96  0.99  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
A4  0.02  0.04  0.08  0.40  0.80  0.97  1.00  1.00  1.00  1.00  1.00  1.00
A5  0.30  0.35  0.73  0.79  0.97  0.98  1.00  1.00  1.00  1.00  1.00  1.00

Table 3: Symmetry Test of Wage Adjustment Data

Year  Mean  Std.   Min   Max       b     ts1     ts2
1978  7.10  2.48   0.0  19.4   3.34*  15.32*  26.20*
1983  5.20  3.03  -8.4  16.2  -2.54*   6.72*   8.14*
1984  3.39  2.25  -0.4  10.8  -1.90    3.90*  14.03*
1989  5.10  1.93  -0.2  19.8   1.89    4.34*  16.07*
1990  5.37  2.22  -0.3  15.2   0.24    0.08    2.27
1994  0.12  2.01  -7.5  17.7   1.12    1.71    0.53

*: Rejected at 5% significance level.


[Plot: density against x; curves: Theoretical, Maxent f1, Maxent f2, Edgeworth]

Figure 1: Estimated Maxent densities and Edgeworth expansion for standard normal.

[Plot: density against x; curves: Theoretical, Maxent f1, Maxent f2, Edgeworth]

Figure 2: Estimated Maxent densities and Edgeworth expansion for $\chi^2$ with d.f. = 3.
