information-theoretic distribution tests with symmetry 2004
8/14/2019 Information-Theoretic Distribution Tests With Symmetry 2004
Information-Theoretic Distribution Tests with
Application to Symmetry and Normality
Thanasis Stengos and Ximing Wu
March 4, 2004
Abstract
We derive distribution free tests based on the Maximum Entropy densities to test
the null hypotheses of symmetry and normality. The proposed tests are derived from
maximizing the differential entropy subject to moment constraints. By exploiting the
equivalence between Maximum Entropy and Maximum Likelihood estimates of the
exponential families, we can use the conventional Likelihood Ratio, Wald and Lagrange
Multiplier testing principles in the maximum entropy framework. Monte Carlo evidence
suggests that they have desirable small sample properties, when compared with the
standard parametric tests used in the literature, such as the standardized skewness
coefficient test for symmetry or the Jarque and Bera (1980) test for normality. We
apply the proposed symmetry tests to test the nominal wage rigidity hypothesis of
wage determination process.
Department of Economics, University of Guelph. Email: [email protected]. We want to thank seminar participants at Penn State University for comments. Financial support from SSHRC of Canada is gratefully acknowledged.
Corresponding author. Department of Economics, University of Guelph, Ontario, Canada, N1G 2W1. Email: [email protected]; Tel: (519) 824-4120, ext. 53014; Fax: (519) 763-8497.
JEL code: C1, C12, C16; Key words: distribution test, maximum entropy, symmetry, normality.
1 Introduction
There are many parametric and nonparametric tests proposed in the literature to test the
hypothesis that a distribution is symmetric about a known median. For example, among
the most widely used parametric tests there is the standardized skewness coefficient test of
Gupta (1967) and in the category of nonparametric tests those proposed by Fan and Gencay
(1995) and Ahmad and Li (1997). The latter tests are distribution free based on local kernel
estimation. They offer an advantage over the more traditional parametric tests such as the
standardized skewness coefficient test in that they are consistent, since they are based on
the whole distribution and not simply a part of it. However, as they are based on local
smoothing methods these tests depend on the choice of bandwidth that is used in density
estimation, something that may affect their power and size in finite samples.
In this paper we introduce two alternative distribution free tests based on the maximum
entropy (ME) densities. Unlike the above mentioned kernel based nonparametric tests, the
proposed tests do not depend on bandwidth selection. Our proposed tests differ from those
introduced by Imbens, Spady and Johnson (1998) that minimize the discrete Kullback-
Leibler information criterion (cross entropy) or other Cressie-Read family statistics subject
to moment constraints. They are derived from maximizing differential entropy subject to
moment constraints. By exploiting the equivalence between ME and ML estimates for the
exponential families, we can use the conventional likelihood ratio (LR), Wald and Lagrange
Multiplier (LM) testing principles in the maximum entropy framework. Hence, our tests
share the optimality properties of the standard maximum likelihood based tests. We show
that the ME approach leads to simple yet powerful tests of symmetry which have desirable
small sample properties. One of the tests is asymptotically equivalent to the conventional
skewness test but is always more powerful. We also derive a normality test with similar
properties that compares favorably with the existing JB test proposed in Jarque and Bera
(1980) and Bera and Jarque (1981).
The paper is organized as follows. In the next section we present the information the-
oretic framework on which we base our analysis. We then proceed to derive our symmetry
and normality tests and discuss their properties. In the following section we present some
simulation results and then we present the results of an empirical application using a unique
set of Canadian wage-change contract data. Finally, before we conclude, we discuss some possible
extensions.
2 Information-theoretic distribution test
The ME principle states that among all the distributions that satisfy certain moment con-
straints, we should choose the one that maximizes Shannon's information entropy. According
to Jaynes (1957), the ME distribution is uniquely determined as the one which is maximally
noncommittal with regard to missing information, and that it agrees with what is known,
but expresses maximum uncertainty with respect to all other matters.
The ME (maxent) density is obtained by maximizing the entropy subject to some moment
constraints. Let $X$ be a random variable distributed with a probability density function (pdf)
$f(x, \theta_0)$, and $X_1, X_2, \ldots, X_n$ be an i.i.d. random sample of size $n$ generated according to $f(x, \theta_0)$.
We also let $\hat\theta$ be the estimate of $\theta_0$ based on a particular sample realization. We maximize the entropy
$$W = -\int f(x,\theta) \log f(x,\theta)\, dx,$$
subject to
$$\int f(x,\theta)\, dx = 1, \qquad \int g_k(x) f(x,\theta)\, dx = \hat\mu_k, \quad k = 1, 2, \ldots, K,$$
where $\hat\mu_k = \frac{1}{n}\sum_{i=1}^{n} g_k(x_i)$, and $g_k(x)$ is continuous and at least twice differentiable. The solution takes the form
$$f(x,\hat\theta) = \exp\Big(-\hat\theta_0 - \sum_{k=1}^{K} \hat\theta_k g_k(x)\Big). \tag{1}$$
To ensure $f(x,\hat\theta)$ integrates to one, we set
$$\hat\theta_0 = \log\Big(\int \exp\Big(-\sum_{k=1}^{K} \hat\theta_k g_k(x)\Big) dx\Big).$$
The maximized entropy is $W = \hat\theta_0 + \sum_{k=1}^{K} \hat\theta_k \hat\mu_k$. The maxent density is of the generalized exponential family and can be completely characterized by the moments $E g_k(x)$, $k = 1, 2, \ldots, K$.
We call these moments characterizing moments, which are the sufficient statistics of the
maxent density. A wide range of distributions belong to this family. For example, the Pear-
son family and its extensions described in Cobb et al. (1983), which nests the normal, beta,
gamma and inverse gamma densities as special cases, are all maxent densities.
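As a small numerical illustration of Eq. (1), the following sketch evaluates a maxent density on a grid for given multipliers. It is only a sketch under stated assumptions: the helper name `maxent_density`, the uniform grid and the Riemann-sum integration are illustrative choices, not part of the paper. With characterizing moments $x$ and $x^2$ and multipliers $(0, 0.5)$, the density is the standard normal, so the normalizing term should be close to $\log\sqrt{2\pi}$ and the maximized entropy close to $\frac{1}{2}\log(2\pi e)$.

```python
import numpy as np

def maxent_density(theta, g_funcs, x):
    """Evaluate f(x) = exp(-theta0 - sum_k theta_k g_k(x)) on a uniform grid.

    Hypothetical helper: theta0 is chosen so that f integrates to one,
    with the integral approximated by a Riemann sum on the grid x.
    """
    expo = -sum(t * g(x) for t, g in zip(theta, g_funcs))
    dx = x[1] - x[0]
    theta0 = np.log(np.sum(np.exp(expo)) * dx)  # log of the normalizing integral
    return theta0, np.exp(expo - theta0)

x = np.linspace(-10.0, 10.0, 20001)
dx = x[1] - x[0]
g_funcs = [lambda v: v, lambda v: v**2]   # characterizing moments x and x^2
theta = [0.0, 0.5]                        # multipliers of the standard normal

theta0, f = maxent_density(theta, g_funcs, x)
mu = [np.sum(g(x) * f) * dx for g in g_funcs]        # implied moments E g_k(x)
W = theta0 + sum(t * m for t, m in zip(theta, mu))   # maximized entropy
```

The same helper accepts any list of moment functions, for instance $\sin(x)$ and $\cos(x)$, as long as the grid covers the region where the density has mass.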
In general, there is no analytical solution for the maxent density, and nonlinear optimization is required (Zellner and Highfield (1988), Ormoneit and White (1999) and Wu (2003)).
We use Lagrange's method to solve this problem by iteratively updating
$$\hat\theta^{(t+1)} = \hat\theta^{(t)} + H^{-1} b,$$
where for the $(t+1)$th stage of the updating, $b_k = \int g_k(x) f(x, \hat\theta^{(t)})\, dx - \hat\mu_k$ and the Hessian matrix $H$ takes the form
$$H_{k,j} = \int g_k(x) g_j(x) f(x, \hat\theta^{(t)})\, dx.$$
The step enters with a positive sign because, under Eq. (1), $\partial \int g_k(x) f(x,\theta)\, dx / \partial \theta_j = -H_{k,j}$. The positive definiteness of the Hessian ensures the existence and uniqueness of the solution.1
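The iterative update can be sketched numerically as follows. This is a minimal sketch and not the authors' implementation: it applies a damped Newton iteration to the convex dual objective $\log \int \exp(-\sum_k \theta_k g_k(x))\,dx + \sum_k \theta_k \hat\mu_k$ on a fixed grid, so the Hessian used is the covariance of the $g_k$ under the current density rather than the raw second-moment matrix in the text, and a backtracking line search is added for robustness. The function name `fit_maxent`, the grid and the tolerances are assumptions made for illustration.

```python
import numpy as np

def fit_maxent(mu_hat, g, w, theta_init=None, tol=1e-8, max_iter=100):
    """Fit f(x) = exp(-theta0 - sum_k theta_k g_k(x)) to target moments.

    mu_hat : (K,) target moments
    g      : (K, N) array with g[k] = g_k evaluated on the grid
    w      : (N,) quadrature weights of the grid
    Returns (theta0, theta). Hypothetical helper for illustration only.
    """
    mu_hat = np.asarray(mu_hat, dtype=float)
    if theta_init is None:
        theta = np.zeros(mu_hat.size)
    else:
        theta = np.array(theta_init, dtype=float)

    def log_z(th):
        a = -th @ g
        m = a.max()                       # log-sum-exp for numerical stability
        return m + np.log(np.sum(w * np.exp(a - m)))

    def dual(th):
        return log_z(th) + th @ mu_hat    # convex in th

    for _ in range(max_iter):
        a = -theta @ g
        p = w * np.exp(a - a.max())
        p /= p.sum()                      # f(x, theta) dx on the grid
        eg = g @ p                        # current moments E g_k
        b = eg - mu_hat                   # b_k = int g_k f dx - mu_hat_k
        if np.max(np.abs(b)) < tol:
            break
        gc = g - eg[:, None]
        H = (gc * p) @ gc.T               # covariance of g under f
        step = np.linalg.solve(H, b)      # Newton direction: theta + H^{-1} b
        t, base = 1.0, dual(theta)
        # Backtracking (Armijo) line search on the dual objective.
        while dual(theta + t * step) > base - 1e-4 * t * (b @ step) + 1e-12 and t > 1e-8:
            t *= 0.5
        theta = theta + t * step
    return log_z(theta), theta

x = np.linspace(-10.0, 10.0, 4001)
w = np.full(x.size, x[1] - x[0])
g = np.vstack([x, x**2])
# Targets E x = 0.3 and E x^2 = 1.2 correspond to a normal with variance 1.11.
theta0, theta = fit_maxent([0.3, 1.2], g, w)
```

With only the first two arithmetic moments imposed, the fitted multipliers should reproduce the normal density with mean 0.3 and variance 1.11, which gives a convenient check of the iteration before moving to higher-order or trigonometric moments.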
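Given Eq. (1), we can also estimate $f(x,\theta)$ using MLE. The maximized log-likelihood is
$$l = \sum_{i=1}^{n} \log f(x_i, \hat\theta) = -\sum_{i=1}^{n}\Big(\hat\theta_0 + \sum_{k=1}^{K} \hat\theta_k g_k(x_i)\Big) = -n\Big(\hat\theta_0 + \sum_{k=1}^{K} \hat\theta_k \hat\mu_k\Big) = -nW.$$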
Therefore, when the distribution is of the generalized exponential family, MLE and ME are
equivalent. Moreover, they are also equivalent to a method of moments (MM) estimator.
This ME/MLE/MM estimator only uses the sample characterizing moments.
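The identity $l = -nW$ can be checked numerically in the simplest case. The sketch below assumes the characterizing moments are $x$ and $x^2$, so that the maxent density is the normal with the sample mean and (uncorrected) sample variance; the simulated data and the seed are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=500)
n = x.size

# ME / MLE / MM estimates coincide: sample mean and variance (ddof=0).
mu_hat, var_hat = x.mean(), x.var()

# Maximized Gaussian log-likelihood evaluated at the estimates.
loglik = np.sum(-0.5 * np.log(2 * np.pi * var_hat) - (x - mu_hat) ** 2 / (2 * var_hat))

# Entropy of N(mu_hat, var_hat), i.e. the maximized entropy W.
W = 0.5 * np.log(2 * np.pi * np.e * var_hat)
```

Here `loglik` and `-n * W` agree up to floating-point error, which is the normal-family instance of the equivalence stated above.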
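1 Let $\gamma = [\gamma_0, \gamma_1, \ldots, \gamma_K]$ be a non-zero vector and $g_0(x) = 1$. We have
$$\gamma' H \gamma = \sum_{k=0}^{K}\sum_{j=0}^{K} \gamma_k \gamma_j \int g_k(x) g_j(x) f(x,\theta)\, dx = \int \Big(\sum_{k=0}^{K} \gamma_k g_k(x)\Big)^2 f(x,\theta)\, dx > 0.$$
Hence, $H$ is positive-definite.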
Although the MLE and ME are equivalent in our case, there are some conceptual dif-
ferences. For MLE, the restricted estimates are obtained by imposing some constraints on
the parameters. In contrast, for ME, the dimension of the parameter is determined by the
number of moment restrictions imposed: the more moment restrictions, the more complex
the distribution. To reconcile these two methods, we note that a ME estimate with the first
m moment restrictions has a solution of the form
$$f(\theta) = \exp\Big(-\theta_0 - \sum_{k=1}^{m} \theta_k g_k(x)\Big),$$
which implicitly sets $\theta_j$, $j = m+1, m+2, \ldots$, to be zero. Instead, when we impose more moment restrictions, say, $\int g_{m+1}(x) f(\theta)\, dx = \hat\mu_{m+1}$, we let the data choose the appropriate value of $\theta_{m+1}$.2 In this sense, the estimate with more moment restrictions is in fact less restricted, or more flexible. The ME and MLE share the same objective function (up to a proportion) which is determined by the moment restrictions of the maximum entropy problem. Therefore, we can regard the ME approach as a method of model selection, which generates a MLE solution.
Consider an $M$-dimensional parameter space $\Theta_M$, and suppose we want to test whether $\theta \in \Theta_m$, a subspace of $\Theta_M$, $\Theta_m \subset \Theta_M$. Because of the equivalence between the ME and MLE, we can use the traditional LR, Wald and LM principles to construct test statistics.3 For $j = m, M$, let $\hat\theta_j$ be the MLE estimates in $\Theta_j$, and let $l_j$ and $W_j$ be their corresponding log-likelihood and maximized entropy. With $g_0(x) = 1$, we have
$$\begin{aligned}
\int f(\hat\theta_m) \log f(\hat\theta_m)\, dx &= -\int \sum_{k=0}^{m} \hat\theta_{m,k}\, g_k(x)\, f(\hat\theta_m)\, dx \\
&= -\sum_{k=0}^{m} \hat\theta_{m,k} \int g_k(x) f(\hat\theta_m)\, dx = -\sum_{k=0}^{m} \hat\theta_{m,k} \int g_k(x) f(\hat\theta_M)\, dx \\
&= -\int \sum_{k=0}^{m} \hat\theta_{m,k}\, g_k(x)\, f(\hat\theta_M)\, dx = \int f(\hat\theta_M) \log f(\hat\theta_m)\, dx.
\end{aligned}$$
2 The only case that $\theta_{m+1} = 0$ is when the moment restriction $\int g_{m+1} f(\theta_m) = \hat\mu_{m+1}$ is not binding, or the $(m+1)$th moment is identical to its prediction based on the maxent density $f(\theta_m)$ from the first $m$ moments. In this case, the $(m+1)$th moment does not contain any additional information that will further reduce the entropy.
3 Imbens et al. (1998) propose similar tests in the information-theoretic generalized empirical likelihood framework.
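The third equality follows because the first $m$ moments of $f(\hat\theta_m)$ are identical to those of $f(\hat\theta_M)$. Consequently, the log-likelihood ratio
$$\begin{aligned}
R &= -2\,(l_m - l_M) = 2n\,(W_m - W_M) \\
&= 2n\Big[-\int f(\hat\theta_m) \log f(\hat\theta_m)\, dx + \int f(\hat\theta_M) \log f(\hat\theta_M)\, dx\Big] \\
&= 2n\Big[-\int f(\hat\theta_M) \log f(\hat\theta_m)\, dx + \int f(\hat\theta_M) \log f(\hat\theta_M)\, dx\Big] \\
&= 2n \int f(\hat\theta_M) \log \frac{f(\hat\theta_M)}{f(\hat\theta_m)}\, dx,
\end{aligned}$$
which is the Kullback-Leibler distance statistic between $f(\hat\theta_M)$ and $f(\hat\theta_m)$ multiplied by
twice the sample size. Therefore, if $f(\theta_m)$ is the true model and nested in $f(\theta_M)$, the
quasi-MLE estimate $f(\hat\theta_M)$ is equivalent to the estimate that minimizes the Kullback-Leibler
statistic between $f(\theta_M)$ and $f(\theta_m)$, as shown in White (1982).
If we partition $\theta_u = (\theta_m, \theta_{M-m}) = (\theta_{1u}, \theta_{2u})$ for the unrestricted model and similarly
$\theta_r = (\theta_{1r}, 0)$ for the restricted model, then the score function is
$$S(x, \theta_m, \theta_{M-m}) = \begin{bmatrix} \dfrac{\partial \ln f}{\partial \theta_m}(x \mid \theta_m, \theta_{M-m}) \\[2ex] \dfrac{\partial \ln f}{\partial \theta_{M-m}}(x \mid \theta_m, \theta_{M-m}) \end{bmatrix},$$
and the Hessian is
$$H(x, \theta_m, \theta_{M-m}) = \begin{bmatrix} \dfrac{\partial^2 \ln f}{\partial \theta_m \partial \theta_m'}(x \mid \theta_m, \theta_{M-m}) & \dfrac{\partial^2 \ln f}{\partial \theta_m \partial \theta_{M-m}'}(x \mid \theta_m, \theta_{M-m}) \\[2ex] \dfrac{\partial^2 \ln f}{\partial \theta_{M-m} \partial \theta_m'}(x \mid \theta_m, \theta_{M-m}) & \dfrac{\partial^2 \ln f}{\partial \theta_{M-m} \partial \theta_{M-m}'}(x \mid \theta_m, \theta_{M-m}) \end{bmatrix}.$$
If we partition similarly the inverse of the information matrix $I = -E(H)$ as
$$I^{-1} = \begin{bmatrix} I^{11} & I^{12} \\ I^{21} & I^{22} \end{bmatrix},$$
then the Wald test is defined as
$$WD = n\, \hat\theta_{2u}' \big(I^{22}\big)^{-1} \hat\theta_{2u},$$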
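whereas the Lagrange Multiplier test is defined as
$$LM = \frac{1}{n} \Big[\sum_{i=1}^{n} S\big(x_i, \hat\theta_{1r}, 0\big)\Big]' I^{22} \Big[\sum_{i=1}^{n} S\big(x_i, \hat\theta_{1r}, 0\big)\Big],$$
with $S$ restricted to its $\theta_{M-m}$ block, the only block of the summed score that is non-zero at the restricted estimate.
All the tests are asymptotically equivalent and distributed as $\chi^2$ with $(M - m)$ degrees of freedom.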
3 Test of Symmetry and Normality
In this section, we use the proposed ME method to obtain test statistics for symmetry and
normality. Since the LR and Wald procedures require the estimation of the ME density,
which in general has no analytical solution and is computationally quite involved, we focus
on the LM test, which reduces surprisingly to a simple functional form.
3.1 Symmetry Test
As before, let X be a random variable distributed with a pdf f0, and X1, X2,...,Xn be an
i.i.d. random sample of size n generated according to f0. The standard test of skewness
takes the form
$$b = \frac{\sqrt{n}\, \hat\mu_3}{\sqrt{\hat\mu_6 - 6\hat\mu_4 + 9}},$$
where $\hat\mu_j = \frac{1}{n}\sum_{i=1}^{n} x_i^j$. The test statistic $b$ is asymptotically distributed as $N(0, 1)$. Although originally proposed under the assumption of normality, Gupta (1967) shows that the test
is also applicable without this assumption provided the underlying distribution has finite
moments up to order six.
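A minimal sketch of the skewness statistic $b$ follows. It assumes the sample is first standardized to zero mean and unit variance (using the uncorrected variance), which is what the constants 6 and 9 in the denominator presuppose; the function names are illustrative and not taken from the paper.

```python
import numpy as np

def standardize(x):
    """Center and scale to zero mean and unit (ddof=0) variance."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def skewness_test(x):
    """Standardized skewness statistic b, asymptotically N(0, 1) under symmetry."""
    z = standardize(x)
    n = z.size
    m3, m4, m6 = (np.mean(z ** j) for j in (3, 4, 6))
    return np.sqrt(n) * m3 / np.sqrt(m6 - 6.0 * m4 + 9.0)

# A right-skewed deterministic sample and a symmetric one.
b_skewed = skewness_test(np.exp(np.linspace(-2.0, 2.0, 101)))
b_symmetric = skewness_test(np.linspace(-1.0, 1.0, 51))
```

A two-sided test at the 5 per cent level would reject symmetry when $|b| > 1.96$.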
3.1.1 Two Alternative Maxent Density Estimators.
Alternatively, we can approximate f0 by the maxent densities and then use the LM test
proposed above. In this paper we consider two simple, yet flexible functional forms. If we
approximate f0 using the maxent density subject to the first four arithmetic moments, the
solution takes the form
$$f_1 = \exp\Big(-\sum_{k=0}^{4} \theta_k x^k\Big).$$
This exponential quartic form was first discussed by Fisher (1922) and studied in the maxi-
mum entropy framework in Zellner and Highfield (1988), Ormoneit and White (1999) and
Wu (2003).
However, instead of using the high order sample moments, whose small sample properties
may be unreliable, we can use generalized moments of sin (x) and cos(x) .4 The resulting
density takes the form
$$f_2 = \exp\Big(-\sum_{k=0}^{2} \theta_k x^k - \theta_3 \sin(x) - \theta_4 \cos(x)\Big).$$
This is in the spirit of the orthogonal trigonometric polynomials method of Cencov (1962).
Since sine and cosine terms are bounded in $[-1, 1]$, these two moments always exist and
are not sensitive to outliers. An additional advantage of $f_2$ is its numerical stability in
estimation. For large values of $x$, $f_1$, an exponential function where the exponent is quartic,
may encounter a numerical overflow with badly chosen initial values of the $\theta$'s. However, $f_2$ does
not run into this problem because of the bounded range of $\sin(x)$ and $\cos(x)$.
4 When the domain of $f(x) = \exp\big(-\sum_{j=0}^{K} \theta_j x^j\big)$ is the real line, $K$ should be an even number and $\theta_K > 0$ to ensure that $f(x)$ is a proper density function, as we require $\lim_{|x| \to \infty} f(x) = 0$. However, for $f_2$, we can have $\theta_3 = 0$ and $\theta_4 = 0$ as the even function $x^2$ is the dominant term in $f_2$. For the test of symmetry based on the third term, the choice of the last characterizing moments is immaterial.
Pearson uses
$$f'(x) = \frac{(x - a)\, f(x)}{b_0 + b_1 x + b_2 x^2}$$
to characterize the Pearson family distributions (Stuart and Ord, 1994). This family includes
the exponential, normal, beta of first and second kind, and gamma distributions as special cases.
Cobb et al. (1983) further generalize the Pearson distributions using
$$\frac{f'(x)}{f(x)} = \frac{g(x)}{v(x)},$$
where $g(x)$ is the so-called shape polynomial of $x$ up to order $K$ and $v(x)$ takes the form
of $1$, $x$, $x^2$ or $x(1 - x)$. The maxent densities are even more flexible than the Cobb et al. family.
Denote $f(x) = \exp(h(x))$; generally no restrictions are imposed on $h(x)$ except that it is continuously differentiable. Following the framework of Cobb et al. (1983), approximating
the differentiable density $f(x)$ by $\exp(h(x))$ is equivalent to approximating $f'(x)/f(x)$ by
$h'(x)$. The power series and Fourier series are two commonly used approximation methods.
One can see that $f_1$ corresponds to a power series approximation of $f$, while $f_2$ is a mixture
of both. Gallant (1981) notes that including a linear term helps to considerably reduce the
number of sine/cosine terms in a Fourier approximation of a non-periodic function. If a
quadratic term is included as well, then a curvature restriction may be imposed. Moreover,
sine and cosine can be expressed as infinite power series
$$\sin(x) = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n+1}}{(2n+1)!} = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \ldots$$
$$\cos(x) = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n}}{(2n)!} = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \ldots$$
Therefore, the exponent of f2 is essentially an infinite power series with some coefficient
restrictions.
We proceed to shed some light on how $f_1$ and $f_2$ approximate the underlying distributions by conducting two experiments. In the first one, we estimate $f_1$ and $f_2$ from a random
sample of standard normal variates with sample size 50. We repeat the experiment 1,000
times. Denote $\hat f_{ij}$ as the estimate from the $j$th experiment, $i = 1, 2$. Because the experiments are independent, we define the average estimate as
$$\bar f_i = \Big(\prod_{j=1}^{1000} \hat f_{ij}\Big)^{1/1000} = \exp\Big(-\frac{1}{1000} \sum_{j=1}^{1000} \sum_{k=0}^{4} \hat\theta_{ikj}\, g_{ik}(x)\Big), \quad i = 1, 2,$$
where the $\hat\theta$'s are the estimated Lagrange multipliers. For comparison, we also calculate the two-term Edgeworth expansion of each experiment.5 The average estimate is also defined
similarly as the geometric average of each estimate. Figure 1 plots the average estimated
maxent density and Edgeworth expansion, together with the theoretical density. The plot
suggests that both maxent estimates approximate the underlying distributions well. The
Edgeworth expansion appears to be slightly closer to the underlying distribution. However,
the Edgeworth expansion is not a proper density estimate and may have negative values
for some $x$. In fact, when we evaluate the Edgeworth expansion on the range $[-4, 4]$, in
787 out of the 1,000 experiments we encounter negative values, which are replaced by some
arbitrarily small positive numbers (1e-6 in this study). Moreover, the Edgeworth expansion
usually does not integrate to unity. In contrast, the maxent estimates, by construction, are
proper densities that are always positive and integrate to one.
5 The Edgeworth expansion is obtained as
$$f(x) = \Big[1 + \frac{1}{6}\gamma_1 H_3(x) + \frac{1}{24}(\gamma_2 - 3) H_4(x) + \frac{1}{72}\gamma_1^2 H_6(x)\Big] \phi(x),$$
where $\gamma_1$ and $\gamma_2$ are the coefficients of skewness and kurtosis, $H_i$ is the $i$th order Hermite polynomial and $\phi(x)$ is the standard normal density function. Hence, the average estimate is defined as
$$\bar f(x) = \frac{1}{n} \sum_{i=1}^{n} \Big[1 + \frac{1}{6}\hat\gamma_{1,i} H_3(x) + \frac{1}{24}(\hat\gamma_{2,i} - 3) H_4(x) + \frac{1}{72}\hat\gamma_{1,i}^2 H_6(x)\Big] \phi(x).$$
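The possibility of negative values in the Edgeworth expansion of footnote 5 can be illustrated with a short sketch. It assumes the probabilists' Hermite polynomials $H_3$, $H_4$ and $H_6$ written out explicitly, and it plugs in the population skewness $\sqrt{8/3}$ and kurtosis 7 of the $\chi^2$ distribution with 3 degrees of freedom rather than sample estimates; the function name `edgeworth` is illustrative.

```python
import numpy as np

def edgeworth(x, g1, g2):
    """Two-term Edgeworth expansion around the standard normal.

    g1 is the skewness coefficient and g2 the kurtosis (3 for the normal).
    """
    h3 = x**3 - 3 * x
    h4 = x**4 - 6 * x**2 + 3
    h6 = x**6 - 15 * x**4 + 45 * x**2 - 15
    phi = np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)
    return (1 + g1 / 6 * h3 + (g2 - 3) / 24 * h4 + g1**2 / 72 * h6) * phi

x = np.linspace(-4.0, 4.0, 801)
phi = np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)

normal_case = edgeworth(x, 0.0, 3.0)             # reduces to the normal density
chi2_case = edgeworth(x, np.sqrt(8.0 / 3.0), 7.0)  # skewness/kurtosis of chi-square(3)
has_negative_values = bool(chi2_case.min() < 0)
```

With skewness and kurtosis this far from the normal values, the bracketed polynomial turns negative in the left tail, so the expansion is not a valid density there.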
We run a second experiment on the $\chi^2$ distribution with 3 degrees of freedom, and the results are plotted
in Figure 2. One can see that when the underlying distributions are not close to normal, the
Edgeworth expansion misses badly. In contrast, the maxent densities are considerably closer
to f0. Moreover, we can exploit the fact the domain of the distribution is positive to improve
the approximation by restricting x to be the positive half line or using a log(x) term in the
maxent density.6
3.1.2 The symmetry test statistics based on f1 and f2.
We do not assume normality for our symmetry test. Given the maxent density $f(x)$, we can
test the assumption of symmetry by testing if the Lagrange Multipliers associated with the
moments that are odd functions, i.e., $Eg(x) = -Eg(-x)$, are zero. For $f_1$, the information matrix under symmetry takes the form
$$I = \begin{bmatrix}
1 & 0 & 1 & 0 & \mu_4 \\
0 & 1 & 0 & \mu_4 & 0 \\
1 & 0 & \mu_4 & 0 & \mu_6 \\
0 & \mu_4 & 0 & \mu_6 & 0 \\
\mu_4 & 0 & \mu_6 & 0 & \mu_8
\end{bmatrix}.$$
6 A detailed comparison between maxent estimates and the Edgeworth expansion is not pursued here any further and is left as a topic of future research.
Under symmetry, the score function for $f_1$ is $S = n\,[0, 0, 0, -\hat\mu_3, 0]'$. Then the LM test for symmetry, which is equivalent to testing $\theta_3 = 0$, is defined as
$$t_{s1} = \frac{1}{n} S' I^{-1} S = \frac{n \hat\mu_3^2}{\hat\mu_6 - \hat\mu_4^2},$$
where $t_{s1}$ is asymptotically distributed as $\chi^2$ with one degree of freedom, and therefore is asymptotically
equivalent to $b^2$. Both tests require the existence of the moments up to order 6. Comparing
with the conventional skewness test $b = \sqrt{n}\,\hat\mu_3 / \sqrt{\hat\mu_6 - 6\hat\mu_4 + 9}$, we note that $\hat\mu_4^2 \geq 6\hat\mu_4 - 9$, since $(\hat\mu_4 - 3)^2 \geq 0$,
which in turn implies that $t_{s1} \geq b^2$. The equality holds only when $\hat\mu_4 = 3$, where the first
four moments coincide with those of the standard normal distribution. Otherwise, $t_{s1}$ always has
higher power than $b$ under the alternative hypothesis of asymmetry.
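The comparison between $t_{s1}$ and $b^2$ can be sketched numerically. The code below is an illustration under the assumption that the sample is standardized to zero mean and unit (uncorrected) variance before the moments are computed; the function name is not from the paper.

```python
import numpy as np

def symmetry_b2_and_ts1(x):
    """Return (b^2, ts1) for a sample, after standardizing it."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()          # zero mean, unit variance (ddof=0)
    n = z.size
    m3, m4, m6 = (np.mean(z ** j) for j in (3, 4, 6))
    b2 = n * m3**2 / (m6 - 6.0 * m4 + 9.0)
    ts1 = n * m3**2 / (m6 - m4**2)        # LM statistic based on f1
    return b2, ts1

# Deterministic right-skewed sample and a symmetric sample.
b2_skewed, ts1_skewed = symmetry_b2_and_ts1(np.exp(np.linspace(-2.0, 2.0, 101)))
b2_sym, ts1_sym = symmetry_b2_and_ts1(np.linspace(-1.0, 1.0, 51))
```

Both statistics are compared with the $\chi^2_1$ critical value (3.841 at the 5 per cent level), and the inequality $t_{s1} \geq b^2$ holds sample by sample because the two denominators differ by $(\hat\mu_4 - 3)^2$.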
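The information matrix of $f_2$ under symmetry is
$$I = \begin{bmatrix}
1 & 0 & 1 & 0 & \mu_c \\
0 & 1 & 0 & \mu_{1,s} & 0 \\
1 & 0 & \mu_4 & 0 & \mu_{2,c} \\
0 & \mu_{1,s} & 0 & \mu_{s^2} & 0 \\
\mu_c & 0 & \mu_{2,c} & 0 & \mu_{c^2}
\end{bmatrix},$$
where $\mu_c = E[\cos(x)]$, $\mu_{1,s} = E[x \sin(x)]$, $\mu_{2,c} = E[x^2 \cos(x)]$, $\mu_{s^2} = E[\sin(x)^2]$ and $\mu_{c^2} = E[\cos(x)^2]$.
Now the test for symmetry is equivalent to testing if the Lagrange Multiplier for $\mu_s = E\sin(x)$ is zero. Under the restriction of symmetry, the score function of $f_2$ is $S = n\,[0, 0, 0, -\hat\mu_s, 0]'$, and the test statistic is given by
$$t_{s2} = \frac{1}{n} S' I^{-1} S = \frac{n \hat\mu_s^2}{\hat\mu_{s^2} - \hat\mu_{1,s}^2},$$
where $\hat\mu_s = \frac{1}{n}\sum_{i=1}^{n} \sin(x_i)$, $\hat\mu_{1,s} = \frac{1}{n}\sum_{i=1}^{n} x_i \sin(x_i)$ and $\hat\mu_{s^2} = \frac{1}{n}\sum_{i=1}^{n} \sin(x_i)^2$. The test statistic $t_{s2}$ is also asymptotically distributed as
$\chi^2$ with one degree of freedom.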
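A corresponding sketch for the trigonometric statistic $t_{s2}$ is given below. As before, the standardization step and the function name are assumptions made for illustration, not details confirmed by the paper.

```python
import numpy as np

def lm_symmetry_ts2(x):
    """LM symmetry statistic based on f2, using sin(x) as the odd moment."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()          # zero mean, unit variance (ddof=0)
    n = z.size
    s = np.sin(z)
    mu_s = s.mean()                       # sample mean of sin(x)
    mu_1s = np.mean(z * s)                # sample mean of x sin(x)
    mu_s2 = np.mean(s ** 2)               # sample mean of sin(x)^2
    return n * mu_s**2 / (mu_s2 - mu_1s**2)

ts2_skewed = lm_symmetry_ts2(np.exp(np.linspace(-2.0, 2.0, 101)))
ts2_sym = lm_symmetry_ts2(np.linspace(-1.0, 1.0, 51))
```

Because $\sin(x)$ is bounded, a single extreme observation moves $\hat\mu_s$ by at most $1/n$, which is the robustness to outliers noted in the text.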
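propose a test based on $f_2$. Under normality, the information matrix of $f_2$ takes the form
$$I = \begin{bmatrix}
1 & 0 & 1 & 0 & e^{-\frac{1}{2}} \\
0 & 1 & 0 & e^{-\frac{1}{2}} & 0 \\
1 & 0 & 3 & 0 & 0 \\
0 & e^{-\frac{1}{2}} & 0 & \frac{1 - e^{-2}}{2} & 0 \\
e^{-\frac{1}{2}} & 0 & 0 & 0 & \frac{1 + e^{-2}}{2}
\end{bmatrix}.$$
The score function under the normality restriction is $S = n\,\big[0, 0, 0, -\hat\mu_s, -(\hat\mu_c - e^{-\frac{1}{2}})\big]'$ and the new LM test is given by
$$t_{n2} = \frac{1}{n} S' I^{-1} S = n \left[\frac{2\hat\mu_s^2}{1 - 2e^{-1} - e^{-2}} + \frac{2\big(\hat\mu_c - e^{-\frac{1}{2}}\big)^2}{1 - 3e^{-1} + e^{-2}}\right].$$
Both $t_{n1}$ and $t_{n2}$ are asymptotically distributed as
$\chi^2$ with two degrees of freedom.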
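The closed form of $t_{n2}$ can be sketched as follows, again assuming that the sample is standardized first; the function name is illustrative. The two denominators are the constants $1 - 2e^{-1} - e^{-2} \approx 0.129$ and $1 - 3e^{-1} + e^{-2} \approx 0.032$ obtained by inverting the information matrix under normality.

```python
import numpy as np

def lm_normality_tn2(x):
    """LM normality statistic based on f2 (sin and cos moments)."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()          # zero mean, unit variance (ddof=0)
    n = z.size
    mu_s = np.mean(np.sin(z))
    mu_c = np.mean(np.cos(z))
    e1, e2 = np.exp(-1.0), np.exp(-2.0)
    odd_part = 2.0 * mu_s**2 / (1.0 - 2.0 * e1 - e2)
    even_part = 2.0 * (mu_c - np.exp(-0.5))**2 / (1.0 - 3.0 * e1 + e2)
    return n * (odd_part + even_part)

rng = np.random.default_rng(1)
sample = rng.exponential(size=300)        # arbitrary non-normal illustration
tn2 = lm_normality_tn2(sample)
```

The statistic is compared with the $\chi^2_2$ critical value, 5.991 at the 5 per cent level.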
Under normality, the correlation of $\hat\mu_3$ and $\hat\mu_4$ is practically zero, and so is that of $\hat\mu_s$ and
$\hat\mu_c$. However, the correlation of $|\hat\mu_3|$ and $\hat\mu_4$ is 0.52, 0.41 and 0.32 for 10,000 random normal samples with n = 50, 100 and 200, while the correlation of $|\hat\mu_s|$ and $\hat\mu_c$ is 0.33, 0.22 and 0.17
for n = 50, 100, 200. Therefore, we expect that $t_{n2}$ converges to its asymptotic $\chi^2_2$ distribution faster than
$t_{n1}$.
4 Simulations
In this section, we use Monte Carlo simulations to assess the size and power of the proposed
tests. Following Randles et al. (1980) and Bai and Ng (2003), we consider well known
distributions such as the normal, the $t$ and the $\chi^2$, as well as distributions from the generalized
lambda family. The generalized lambda distribution, which nests a range of symmetric and asymmetric distributions, is defined in terms of the inverse of the cumulative distribution
$$F^{-1}(u) = \lambda_1 + \big[u^{\lambda_3} - (1 - u)^{\lambda_4}\big] / \lambda_2, \quad 0 < u < 1.$$
For both the symmetry and normality test, we consider the following symmetric and
asymmetric distributions:
S1: $N(0, 1)$;
S2: $t_5$;
S3: $e_1 I(z \leq 0.5) + e_2 I(z > 0.5)$, where $z \sim U(0, 1)$, $e_1 \sim N(-1, 1)$, and $e_2 \sim N(1, 1)$;
S4: $F^{-1}(u) = \lambda_1 + [u^{\lambda_3} - (1 - u)^{\lambda_4}]/\lambda_2$, $\lambda_1 = 0$, $\lambda_2 = 0.19754$, $\lambda_3 = 0.134915$, $\lambda_4 = 0.134915$;
S5: $F^{-1}(u) = \lambda_1 + [u^{\lambda_3} - (1 - u)^{\lambda_4}]/\lambda_2$, $\lambda_1 = 0$, $\lambda_2 = -1$, $\lambda_3 = -0.8$, $\lambda_4 = -0.8$;
A1: lognormal: $\exp(e)$, $e \sim N(0, 1)$;
A2: $\chi^2_3$;
A3: exponential: $-\ln(e)$, $e \sim U(0, 1)$;
A4: $F^{-1}(u) = \lambda_1 + [u^{\lambda_3} - (1 - u)^{\lambda_4}]/\lambda_2$, $\lambda_1 = 0$, $\lambda_2 = 1$, $\lambda_3 = 1.4$, $\lambda_4 = 0.25$;
A5: $F^{-1}(u) = \lambda_1 + [u^{\lambda_3} - (1 - u)^{\lambda_4}]/\lambda_2$, $\lambda_1 = 0$, $\lambda_2 = -1$, $\lambda_3 = -0.0075$, $\lambda_4 = -0.03$.
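Because the generalized lambda distribution is defined through its inverse cumulative distribution, samples can be drawn by applying $F^{-1}$ to uniform variates. The sketch below illustrates this for distribution S4, whose parameters are chosen to mimic the standard normal; the function names and the seed are illustrative assumptions.

```python
import numpy as np

def gld_quantile(u, lam1, lam2, lam3, lam4):
    """Inverse CDF of the generalized lambda distribution, 0 < u < 1."""
    return lam1 + (u**lam3 - (1.0 - u)**lam4) / lam2

def gld_sample(n, lam, rng):
    """Draw n variates by inverse-transform sampling.

    rng.uniform draws from [0, 1); with negative lam3 a draw of exactly 0
    would map to infinity, which is possible in principle but very unlikely.
    """
    return gld_quantile(rng.uniform(size=n), *lam)

rng = np.random.default_rng(0)
s4_params = (0.0, 0.19754, 0.134915, 0.134915)
s4 = gld_sample(200_000, s4_params, rng)
s4_mean, s4_var = s4.mean(), s4.var()
```

For S4 the simulated mean and variance should be close to 0 and 1, consistent with the statement that its first four moments coincide with those of the standard normal.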
For each distribution, we draw 100,000 random samples of size n = 20, 50, 100, 200, 500
and 1,000 and calculate the symmetry and normality test statistics discussed above. There is
a large body of work on symmetry and normality tests, for example, Gupta (1967), Randles
et al. (1980), Bera and Jarque (1981), D'Agostino et al. (1990), Ahmad and Li (1997),
Bai and Ng (2003) and Bontemps and Meddahi (2003). Our results are comparable to and
in general more favorable than those of existing studies, especially when the sample size is
small. Table 1 reports the results of the symmetry test at the 5 per cent level of significance.
We report the results for sample sizes up to 200 as the power of the tests is nearly unity for
$n \geq 500$ for all the asymmetric distributions. The first five rows for symmetric distributions report size and the next five for asymmetric distributions show power. The size of the tests
remains stable across different sample sizes. For sample sizes ranging from 20 to 200,
which are frequently encountered in empirical work, the variation in size for $b$ is no more than
2%, and that for $t_{s1}$ and $t_{s2}$ is less than 1%. We find that $t_{s1}$ always rejects more often than
$b$, confirming the result of the previous section that $t_{s1} \geq b^2$. Also in general, $t_{s2}$ tends to reject more often than the other two tests. On the other hand, we observe high power of
the proposed tests against asymmetric distributions and the power increases rapidly to unity
with sample size. Overall, $t_{s1}$ and $t_{s2}$ are considerably more powerful than $b$. For example,
the power of the two maxent tests against the lognormal distribution (A1) is often more
than twice that of the conventional skewness test when the sample size is small.
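The structure of such a size and power experiment can be sketched as below. This is a scaled-down illustration rather than a reproduction of Table 1: it uses 2,000 replications instead of 100,000, only the $b$ and $t_{s1}$ statistics, only the N(0, 1) and lognormal designs, and it assumes each sample is standardized before the moments are computed. All function names are illustrative.

```python
import numpy as np

def symmetry_stats(x):
    """Return (b^2, ts1) computed from a standardized sample."""
    z = (x - x.mean()) / x.std()
    n = z.size
    m3, m4, m6 = np.mean(z**3), np.mean(z**4), np.mean(z**6)
    b2 = n * m3**2 / (m6 - 6.0 * m4 + 9.0)
    ts1 = n * m3**2 / (m6 - m4**2)
    return b2, ts1

def rejection_rates(draw, n, reps, rng, crit=3.841):
    """Share of replications in which each statistic exceeds crit.

    crit defaults to the 5 per cent critical value of chi-square(1);
    b^2 > crit is the same event as |b| > 1.96.
    """
    rejections = np.zeros(2)
    for _ in range(reps):
        rejections += np.array(symmetry_stats(draw(rng, n))) > crit
    return rejections / reps

rng = np.random.default_rng(0)
size_b, size_ts1 = rejection_rates(lambda r, n: r.normal(size=n), 50, 2000, rng)
power_b, power_ts1 = rejection_rates(lambda r, n: r.lognormal(size=n), 50, 2000, rng)
```

Since $t_{s1} \geq b^2$ in every sample, the rejection rate of $t_{s1}$ can never fall below that of $b$ in such an experiment, whatever the data generating process.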
Table 2 reports the results for the normality tests at the 5 per cent significance level.
The first row reflects the size and the rest show the power of the tests. In terms of size the
two tests are comparable; however, the second test is generally more powerful. For example,
for distribution S3, the power of JB test is only 0.06 for n = 200, while that of tn2 is 0.18.
So is the case for distribution A4, where tn2 is considerably more powerful even when the
sample size is small. For distribution S4, whose first four moments coincide with those of
the standard normal distribution, both tests as expected have very low power. Randles et
al. (1980) and Bai and Ng (2003) also report very low power against S4. Even then though,
tn2 appears to be slightly more powerful than tn1.
5 Empirical Application
The wage determination process is one of the most studied areas of empirical labor economics.
At the root of this research lies the question of downward nominal rigidity, which, if prevalent,
would interfere with the functioning of the labor market preventing the efficient reallocation
of workers from low to high demand areas and inducing unemployment. Furthermore, if
nominal rigidity is more pervasive in some sectors than in others, similar shocks will have different price and quantity effects. For example, if unions are more resistant to wage cuts
than the non-union sector, real wage realignment may be more difficult to achieve in the union
sector. The same may be true in the public sector when compared with the private sector,
since the former is typically more unionized than the latter. There is an expanding research
area that studies this issue, see Christofides and Stengos (2003) for a recent review of the
current literature. The role of symmetry as a gauge of the extent and significance of downward
nominal rigidity is the subject of considerable debate. Card and Hyslop (1997) note that
most wage determination models imply symmetry and use the portion of the distribution
above the median as the no-rigidity counterfactual in their own work. McLaughlin (2000)
provides reasons other than downward wage rigidity for believing that the wage-change
distribution may be asymmetric. Whatever the reasons, testing for symmetry or the lack
of it (asymmetry) is a question of paramount importance in empirical labor economics and
serves as a first stage in offering a better understanding of wage determination. The starting
point of most of these studies is the construction of wage-change histograms from data on
individual agents. However, visual evidence must be filtered through standard statistical
procedures in order to measure the quantitative significance of certain forces and effects.
In this context, consideration of the symmetry of the wage-change distribution has played
a very important role. A number of symmetry measures have been used in this literature.
These include the standardized skewness coefficient test b, the difference between the median
and mean, symmetrically differenced histograms and nonparametric symmetry tests, see
McLaughlin (2000) and Christofides and Stengos (2003) for a discussion of these different
measures.
We apply our proposed LM test statistics based on the maxent densities f1 and f2 on
a set of Canadian public sector union wage contract data for different years in the 1978 to
1994 period. The union contract data used in this paper concern the annualized change
over the life of each contract in the base-wage rate agreed to by employers and unions in
the Canadian public sector. The data, compiled initially by Labor Canada (now Human
Resources Development Canada), start in 1978 and cover contracts involving from 500 to
nearly 80,000 employees. Collective bargaining agreements often
implement an increase in the nominal wage rate at the beginning of the contract and then
again at yearly intervals. Contract duration (in years) and the number of nominal revisions
are correlated but not identical and, in any case, the main pattern is one of infrequent wage
adjustments. A result of this fact is that, depending on what contract sub-period one chooses
to focus on, the implied nominal change could be made arbitrarily large or small. Under
these circumstances, and given that contract duration varies substantially across settlements,
a natural interval over which to measure wage adjustment is contract duration itself, taking
care to standardize across contracts by annualizing. In the contract data the private-public
sector distinction is made by the data collection agency itself and is based on the sources of
funding for the employer. Thus, health and education contracts are classified as belonging
to the public sector because these services are not market-provided.
The data span the high-inflation period of 1978-1982, the medium-inflation period of
1984-1989 and the low-inflation period of 1990-94. The contract data involves agreements
which do not contain Cost-of-Living-Allowance (COLA) clauses and, therefore, do not raise
the additional complication that some of the relevant wage flexibility may come from the
indexation clause. To avoid spurious correlations that may arise by pooling different years
together we look at the years at the beginning and end of each of the three mentioned periods.
The left panel of Table 3 provides some descriptive statistics for the data used in 1978, 1982,
1984, 1989, 1990 and 1994 and the right panel of Table 3 reports the test statistics for these
years. The first column is the conventional skewness test b, the next two are the maxent test
ts1 and ts2. As was expected from the findings in the simulations, the standardized skewness
coefficient test statistic b is less powerful than the information based test statistics ts1 and
ts2. In two out of six years b fails to reject the symmetry hypothesis whereas the other two
test statistics do. These results suggest that the standardized skewness coefficient test would
give unreliable results. The only times that this test statistic rejects the null hypothesis of
symmetry are in 1978 and 1982, which are high inflation years. It fails to reject the null during
the medium inflation years, whereas the two maxent density based statistics reject the null
during these years. During these years public sector union contracts displayed considerable
downward nominal rigidity in order to avoid real wage erosion due to inflation.
6 Extensions
In addition to their simplicity, a major advantage of the proposed tests is their generality. In
this section, we briefly discuss some potential extensions of the tests.
First, we can increase the terms of polynomials in the maxent density. For $f_1$, we can
use moments higher than order 4, and for $f_2$, we can use $\sin(kx)$ and $\cos(kx)$, $k = 2, 3, \ldots$.
Moreover, we can mix high order polynomials and trigonometric polynomials. The derivation of the
test statistics is straightforward.
Second, we can use our method of normality test for other distributions. For example,
the gamma distribution can be characterized as a maxent distribution
$$f_0(x) = \exp\big(-\theta_0 - \theta_1 x - \theta_2 \log x\big), \quad x > 0.$$
Because $Ex$ and $E\log x$ are the sufficient statistics for the gamma distribution, the presence of
any additional terms in the exponent of $f_0(x)$ rejects the hypothesis that $x$ is distributed
according to a gamma distribution. Let
$$f(x) = \exp\Big(-\theta_0 - \theta_1 x - \theta_2 \log x - \sum_{k=3}^{K} \theta_k g_k(x)\Big);$$
the test of $\theta_k = 0$ for $k \geq 3$ is then the LM test for the gamma distribution. The discussions in the previous section suggest that the natural candidates for $g_k(x)$ may include
$x^{i+1}$, $\sin(ix)$, $\cos(ix)$, $(\log x)^{i+1}$, $\sin(i \log x)$, $\cos(i \log x)$, $i = 1, 2, 3, \ldots$, or their combinations.
Third, we can generalize our tests to regression residuals within the framework of White
and McDonald (1980). For time series data or heteroskedastic data, we can use the approach
of Bai and Ng (2003) or Bontemps and Meddahi (2003). In general, for non-i.i.d. data, to test
the Lagrange Multipliers associated with sample moments gk (x) in the maxent density being
zero, we need to estimate a Heteroskedasticity and Autocorrelation Consistent (HAC) covariance
matrix for those moments.
7 Conclusion
In this paper we derive distribution free tests based on the maximum entropy densities to
test the null hypotheses of symmetry and normality. The proposed tests are derived from
maximizing differential entropy subject to moment constraints. By exploiting the equivalence
between ME and ML estimates for the exponential families, we can use the conventional LR,
Wald and LM testing principles in the maximum entropy framework. Hence, our tests share
the optimality properties of the standard ML based tests. We show that the ME approach
leads to simple yet powerful LM tests of symmetry and normality. The proposed tests have
desirable small sample properties, when compared with the standard parametric tests used
in the literature, such as the standardized skewness coefficient test for symmetry or the
Jarque and Bera (1980) test for normality. These properties are confirmed by our Monte
Carlo experiments and empirical application.
Gupta, M. K., 1967, An asymptotically nonparametric test of symmetry, Annals of Mathematical
Statistics, 38, 849-866.
Imbens, G. W., R. H. Spady and P. Johnson, 1998, Information theoretic approaches to
inference in moment condition models, Econometrica, 66, 333-357.
Jarque, C. and A. Bera, 1980, Efficient tests for normality, homoscedasticity and serial
independence of regression residuals, Economics Letters, 6, 255-59.
Jaynes, E. T., 1957, Information theory and statistical mechanics, Physical Review, 106,
620-630.
McLaughlin, K. J., 2000, Testing for asymmetry in the distribution of wage changes, Mimeo.
Ormoneit, D. and H. White, 1999, An efficient algorithm to compute maximum entropy
densities, Econometric Reviews, 18(2), 141-67.
Randles, R. H., M. A. Fligner, G. E. Policello and D. A. Wolfe, 1980, An asymptotically
distribution-free test for symmetry versus asymmetry, Journal of the American Statistical
Association, 75, 168-172.
Shannon, C. E., 1949, The mathematical theory of communication (University of Illinois
Press: Urbana).
Stuart, A. and J. K. Ord, 1994, Kendall's advanced theory of statistics, Vol. 1 (Edward
Arnold), 6th Edition.
Vasicek, O., 1976, A test for normality based on sample entropy, Journal of the Royal
Statistical Society, Series B, 38, 54-59.
White, H., 1982, Maximum likelihood estimation of misspecified models, Econometrica, 50,
1-26.
White, H. and G. M. MacDonald, 1980, Some large-sample tests for nonnormality in the
linear regression model, Journal of the American Statistical Association, 75, 16-28.
Wu, X., 2003, Calculation of maximum entropy densities with application to income distri-
bution, Journal of Econometrics, 115, 347-354.
Zellner, A. and R. A. Highfield, 1988, Calculation of maximum entropy distributions and
approximation of marginal posterior distributions, Journal of Econometrics, 37, 195-209.
Table 1: Size and Power of Symmetry Test
    n = 20            n = 50            n = 100           n = 200
    b     ts1   ts2   b     ts1   ts2   b     ts1   ts2   b     ts1   ts2
S1  0.02  0.05  0.05  0.03  0.04  0.05  0.04  0.05  0.05  0.04  0.05  0.05
S2  0.03  0.06  0.07  0.03  0.05  0.07  0.03  0.05  0.08  0.03  0.05  0.08
S3  0.02  0.06  0.06  0.03  0.06  0.06  0.04  0.06  0.06  0.04  0.06  0.06
S4  0.03  0.05  0.05  0.03  0.04  0.05  0.04  0.05  0.05  0.05  0.05  0.05
S5  0.03  0.06  0.07  0.04  0.05  0.07  0.03  0.05  0.08  0.03  0.05  0.08
A1  0.27  0.59  0.70  0.36  0.76  0.99  0.43  0.81  1.00  0.54  0.83  1.00
A2  0.23  0.35  0.41  0.58  0.71  0.90  0.80  0.89  1.00  0.93  0.97  1.00
A3  0.29  0.44  0.54  0.58  0.75  0.96  0.76  0.89  1.00  0.91  0.96  1.00
A4  0.13  0.28  0.30  0.53  0.72  0.73  0.89  0.96  0.96  1.00  1.00  1.00
A5  0.14  0.23  0.27  0.40  0.52  0.69  0.68  0.79  0.97  0.87  0.93  1.00
Table 2: Size and Power of Normality Test
    n = 20      n = 50      n = 100     n = 200     n = 500     n = 1000
    tn1   tn2   tn1   tn2   tn1   tn2   tn1   tn2   tn1   tn2   tn1   tn2
S1  0.02  0.03  0.04  0.04  0.04  0.04  0.04  0.04  0.05  0.05  0.05  0.05
S2  0.17  0.18  0.39  0.40  0.63  0.63  0.86  0.86  0.99  1.00  1.00  1.00
S3  0.01  0.01  0.01  0.01  0.00  0.04  0.06  0.18  0.51  0.68  0.93  0.97
S4  0.02  0.03  0.03  0.04  0.04  0.04  0.04  0.04  0.03  0.04  0.04  0.04
S5  0.16  0.18  0.39  0.41  0.63  0.64  0.87  0.88  1.00  1.00  1.00  1.00
A1  0.72  0.80  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
A2  0.36  0.44  0.86  0.93  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
A3  0.48  0.58  0.96  0.99  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
A4  0.02  0.04  0.08  0.40  0.80  0.97  1.00  1.00  1.00  1.00  1.00  1.00
A5  0.30  0.35  0.73  0.79  0.97  0.98  1.00  1.00  1.00  1.00  1.00  1.00
Table 3: Symmetry Test of Wage Adjustment Data
Year  Mean  Std.   Min   Max       b     ts1     ts2
1978  7.10  2.48   0.0  19.4   3.34*  15.32*  26.20*
1983  5.20  3.03  -8.4  16.2  -2.54*   6.72*   8.14*
1984  3.39  2.25  -0.4  10.8  -1.90    3.90*  14.03*
1989  5.10  1.93  -0.2  19.8   1.89    4.34*  16.07*
1990  5.37  2.22  -0.3  15.2   0.24    0.08    2.27
1994  0.12  2.01  -7.5  17.7   1.12    1.71    0.53
*: Rejected at 5% significance level.
[Plot: density against x, for x from -4 to 4; curves: Theoretical, Maxent f1, Maxent f2, Edgeworth.]
Figure 1: Estimated Maxent densities and Edgeworth expansion for standard normal.
[Plot: density against x, for x from -1 to 5; curves: Theoretical, Maxent f1, Maxent f2, Edgeworth.]
Figure 2: Estimated Maxent densities and Edgeworth expansion for χ² with d.f. = 3.