TRANSCRIPT
8/14/2019 Entropy Statistics Tests Goodness of Fit Wald Lagrange Likelihood Ratio
Cairo University
Institute of Statistical Studies & Research
Department of Mathematical Statistics
Tests Based on Sampling Entropy
By
Mohamed Soliman Abdallah
Supervised By
Prof. Samir Kamel Ashour, Professor of Mathematical Statistics, Department of Mathematical Statistics, I.S.S.R., Cairo University
Dr. Esam Aly Amin, Lecturer of Mathematical Statistics, Department of Mathematical Statistics, I.S.S.R., Cairo University
A Thesis Submitted to the Department of Mathematical Statistics
In Partial Fulfillment of the Requirements for the Degree of
MASTER OF SCIENCE IN
Mathematical Statistics
2009
Acknowledgement
All Gratitude due to ALLAH
I would like to extend my appreciation to my advisor Prof. Samir
Kamel Ashour for his support, guidance, and patience during the
completion of this study and my graduate studies; I feel very lucky
to have worked with such a person. Special thanks to Dr. Esam Ali
Amin for his encouragement to complete this thesis.
I would like to express my gratitude to my parents, my brother,
my sister, Mr. Ebrahim and Mr. Tag Eldean, and everyone who helped me
throughout this work.
I cannot forget to thank Wikipedia, which made the research
process easier and faster than before.
Summary
Goodness of fit tests play a vital role in scientific and academic work, so many
goodness of fit tests have been suggested by researchers over the previous decades,
each with its own philosophy behind its derivation. In particular, this study deals
in some detail with tests based on sampling Shannon's (1948) entropy.
There are mainly two schools that employ the sampling entropy for goodness of
fit. On the one hand there is the non-parametric estimation approach, which contains
two routes: first, Vasicek's (1975) approach, which tests the sample's distribution by
considering the ratio between the observed entropy and the expected entropy; second,
the approach of Arizono and Ohta (1989), which tests the sample's distribution by
considering the difference between the observed entropy and the expected entropy.
On the other hand there is the parametric approach of Stengos and Wu (2004, 2007),
who proposed four flexible tests derived by the Lagrange multiplier test based on the
principle of maximum entropy proposed by Jaynes (1957).
In this study we concentrate on the parametric approach by deriving the
Stengos and Wu (2004, 2007) tests by the Wald and likelihood ratio tests instead of
the Lagrange multiplier test; in addition we carry out simulated comparisons among
the entropy estimators mentioned in this essay, as well as numerical comparisons
among all tests based on Shannon's (1948) entropy; finally we propose some academic
points that can be pursued in future research.
Table of Contents
Pages
Chapter (1) Introduction 1
Chapter (2) Definitions and Notation
2.1 Estimation Theory 4
2.2 Methods of Estimation 8
2.2.1 Method of Moments 8
2.2.2 Method of Maximum Likelihood 9
2.2.3 Method of Least Squares 10
2.3 Hypotheses Testing 12
2.3.1 Introduction to Hypotheses Testing 12
2.3.2 Tests Based on Likelihood Function 13
2.4 Measures of Information 14
2.4.1 Shannon Entropy and Related Measures 19
2.4.2 Kullback-Leibler Divergence (Relative Entropy) 22
Chapter (3) Goodness of Fit Based on Maximum Entropy 25
3.1 Parameters Estimation Based on Maximum Entropy 25
3.2 Entropy Estimation Using Sampling m-Spacing 35
3.2.1 Entropy Estimation Using Vasicek's Estimator 35
3.2.2 Entropy Estimation Using Correa's Estimator 41
3.2.3 Entropy Estimation Using Wieczorkowski et al.'s Estimators 42
3.3 Goodness of Fit Based on Maximum Entropy 45
3.3.1 Goodness of Fit Based on Likelihood Tests 46
3.3.2 Goodness of Fit Based on Sampling m-Spacing 58
Chapter (4) Monte Carlo Simulation of the Entropy Tests 67
4.1 Simulated Results for the Performance of Entropy Estimators 67
4.2 Power Comparisons Among Tests Based on Sampling Entropy 69
4.2.1 Simulated Results for Testing Normality 69
4.2.2 Simulated Results for Testing Uniformity 73
4.2.3 Simulated Results for Testing Exponentiality 74
4.3 Extensions 76
Appendices 77
Appendix (1): Tables 78
Appendix (2): Programs 128
References 160
Chapter (I)
Introduction
Testing the distribution of a sample has long been an interesting issue in
the literature; it is considered a key topic in statistical modeling and is
potentially useful for developing statistical methodology. In particular, this essay
is concerned with tests based on sampling Shannon's (1948) entropy.
Entropy as a statistical concept was formulated by Shannon (1948) to measure
the uncertainty, or the size of the information, in a sample, such that increasing
entropy denotes less information and more uncertainty. In addition, Kullback and
Leibler (KL) (1951) proposed an indicator to measure the divergence between two
samples by comparing the amount of information that can be obtained from each
sample, so that high values of KL refer to a wide gap between the two samples and
vice versa.
Jaynes (1957) utilized the Shannon entropy and proposed a flexible tool for
estimating the probability of events using prior information, for instance the
mean or the variance of the events; Singh and Rajagopal (1986) then extended this
tool for estimating the parameters of frequency distributions.
Vasicek (1975) discovered a new route for testing normality based on
sampling entropy. The main problem he faced was how to estimate the sampling
entropy; he chose to estimate the entropy function using m-spacings. After that,
sampling entropy became a well-known topic reported in a variety of fields, so
that there are four different entropy estimators based on m-spacings besides
Vasicek's (1975) estimator; this study will concentrate on Correa's (1995) estimator
and the two estimators of Wieczorkowski and Grzegorzewski (1999).
Dudewicz and Van der Meulen (1981) extended Vasicek's (1975) approach
for testing uniformity, while Gokhale (1983) applied Vasicek's (1975) idea to many
distributions. Taufer (2002) proposed a new idea for testing exponentiality via
the two transformations proposed by Seshadri and Csorgo (1969): his idea applies
one of the two transformations to the data and then applies Dudewicz and
Van der Meulen's (1981) test, so that if the transformed data are truly uniform one
can conclude that the original data follow an exponential distribution, and vice versa.
Arizono and Ohta (1989) proposed an interesting idea for testing
normality by utilizing KL to test the divergence between the observed sample and the
expected sample under the null hypothesis; moreover, Ebrahimi and Habibullah
(1991) applied the KL approach to the exponential distribution, and Mao (2002) applied
the same approach to multivariate distributions.
Stengos and Wu (2004, 2007) proposed another way of testing
normality based on entropy via the Lagrange multiplier test: they used Jaynes' (1957)
idea to derive four flexible normality tests. This idea can be regarded as the
parametric approach because it does not need to estimate the entropy function.
The main purposes of this study can be summarized as follows:
1. Review both the parametric and the non-parametric goodness of fit tests based
on sampling entropy.
2. Instead of deriving only the Lagrange multiplier test for testing normality, as
Stengos and Wu (2004, 2007) did, derive the Wald and likelihood ratio tests for
testing normality as well, and make a comparison among the three tests.
3. Investigate numerically the performance of the entropy estimators based on
m-spacings.
4. Carry out a numerical comparison among the non-parametric tests on the one
hand and the parametric tests on the other.
The components of this study are organized in the following form:
1. The second chapter is divided into four parts: the first part discusses some
topics in estimation theory, the second part shows in brief some methods of
estimation, the third part concentrates on hypothesis testing and some common
tests that will be used in this study, and finally the fourth part deals with a
review of Shannon's (1948) entropy and some related measures of information.
2. The third chapter is essentially divided into three parts: the first part focuses
on estimation based on the principle of maximum entropy, the second part is
concerned with some estimators of entropy based on m-spacings, and the third
part explains both the parametric and the non-parametric tests based on sampling
entropy.
3. Finally, the fourth chapter contains comments on the numerical results
concluded from the Monte Carlo simulation, as well as suggestions for future
academic research.
Chapter (II)
Definitions and Notation
This chapter is concerned with some important definitions and notation that
will be used in this study. The first section deals with some definitions related to
estimation theory, the second is concerned with different approaches to
estimation, the third section is devoted to some topics in hypothesis testing,
and finally the fourth section discusses Shannon's entropy and some measures of
information.
2.1 Estimation Theory
In most statistical studies the parameters of the population are unknown and
must be estimated from the sample, because it is impossible or just too much trouble
(in terms of time or expense) to look at the entire population; therefore estimation
theory has a vital role in statistical inference and is divided into point estimation
and interval estimation.
Definition (2.1.1): A point estimate is a number obtained from computations on the
observed values of the random sample that serves as an approximation to a
parameter of the population.
It is important to point out the difference between an estimate and the
corresponding estimator: an estimate is a particular value calculated from a
specified sample of observations, whereas an estimator is a random variable whose
realized value is the point estimate. Of course, there are many estimators
corresponding to each population parameter $\theta$, so one would probably obtain
many different estimates for $\theta$; thus it is required to discuss the criteria
that make one estimator preferable to another.
Definition (2.1.2): Suppose $\hat\theta$ is a statistic computed from the observed
random sample and considered as a point estimator for $\theta$. We call
$\hat\theta$ an unbiased estimator for $\theta$ iff:
$$E(\hat\theta) = \theta$$
If the previous condition holds only as the sample size grows large, we call
$\hat\theta$ an asymptotically unbiased estimator for $\theta$.
Definition (2.1.3): Suppose $\hat\theta_1$, $\hat\theta_2$ are two estimators for
$\theta$ and:
$$\frac{MSE(\hat\theta_1)}{MSE(\hat\theta_2)} = \frac{E(\hat\theta_1-\theta)^2}{E(\hat\theta_2-\theta)^2} < 1$$
then $\hat\theta_1$ is more efficient than $\hat\theta_2$, where $MSE$ refers to the
mean square error of the estimator. If the previous condition holds only as the
sample size grows large, then $\hat\theta_1$ is asymptotically more efficient than
$\hat\theta_2$.
Definition (2.1.4): The statistic $\hat\theta$ is a consistent estimator for
$\theta$ iff:
$$\lim_{n\to\infty} P(|\hat\theta-\theta| < \epsilon) = 1, \qquad \text{where } \epsilon > 0$$
It is obvious that consistency is an asymptotic property, sometimes called
convergence in probability. If $\hat\theta$ is an unbiased estimator for $\theta$
and its MSE tends to zero as the sample size grows large, then $\hat\theta$ is a
consistent estimator for $\theta$.
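Consistency as defined above can be illustrated by a short simulation (a sketch of my own, not one of the thesis's programs): the sample mean of Uniform(0, 1) draws is unbiased with MSE tending to zero, so the probability that it lies within a fixed $\epsilon$ of the population mean 0.5 should approach one as $n$ grows.

```python
import random

# Sketch: estimate P(|mean - 0.5| < eps) for the sample mean of
# Uniform(0, 1) data at two sample sizes; the fraction should rise
# toward 1 as n grows, illustrating convergence in probability.
random.seed(1)

def coverage(n, eps=0.05, reps=2000):
    """Fraction of replications where the sample mean is within eps of 0.5."""
    hits = 0
    for _ in range(reps):
        xs = [random.random() for _ in range(n)]
        if abs(sum(xs) / n - 0.5) < eps:
            hits += 1
    return hits / reps

small, large = coverage(10), coverage(1000)
print(small, large)  # the second fraction is markedly closer to 1
```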
Definition (2.1.5): An estimator $\hat\theta$ is said to be a sufficient statistic
iff it utilizes all the information in the sample relevant to the estimation of
$\theta$; that is, all the knowledge about $\theta$ that can be gained from the
whole sample can just as well be gained from $\hat\theta$ alone. In mathematical
form, $\hat\theta$ is a sufficient statistic iff the conditional probability
distribution of the random sample given $\hat\theta$ is independent of $\theta$.
Definition (2.1.6): The probability density function (p.d.f.) $f(x,\theta)$ is
considered a member of the exponential family if $f(x,\theta)$ can be rewritten in
the following form:
$$f(x,\theta) = a(\theta)\, b(x)\exp\{c(\theta)\, g(x)\}$$
where $a(\theta)$ and $c(\theta)$ are two functions of $\theta$, and $b(x)$ and
$g(x)$ are two functions of $x$. Furthermore, the exponential family can be extended
to more than one parameter as follows:
$$f(x,\theta) = a(\theta)\, b(x)\exp\Big\{\sum_{j=1}^{J} c_j(\theta)\, g_j(x)\Big\}$$
One of the advantages of $f(x,\theta)$ belonging to the exponential family is that
$\sum_i g_1(x_i), \sum_i g_2(x_i), \dots, \sum_i g_J(x_i)$ can be considered joint
sufficient statistics for $\theta$.
Cramér and Rao proposed an inequality giving the lower bound for the
variance of an unbiased estimator. Assume $\hat\theta$ is an unbiased estimator for
$\theta$; then:
$$V(\hat\theta) \ge \frac{1}{nE\left[\left(\frac{d}{d\theta}\ln f(x;\theta)\right)^2\right]} \qquad (2.1.1)$$
If the two sides coincide, then $\hat\theta$ is the best estimator for $\theta$
among unbiased estimators. From inequality (2.1.1), we notice the following points:
1. Inequality (2.1.1) can take another equivalent form:
$$V(\hat\theta) \ge \frac{1}{nE\left[\left(\frac{d}{d\theta}\ln f(x;\theta)\right)^2\right]} = \frac{1}{-nE\left[\frac{d^2}{d\theta^2}\ln f(x;\theta)\right]}$$
2. The denominator of (2.1.1) is called the Fisher information $I(\theta)$, which is
an index of the size of the information in the sample concerning $\theta$. Obviously,
more information leads to more accuracy, meaning less variability. If $\theta$ is a
vector of parameters of order $J \times 1$, the Fisher information becomes the
information matrix of order $J \times J$, which can be expressed as:
$$I(\theta)_{ij} = \begin{cases} -nE\left[\dfrac{d^2}{d\theta_i^2}\ln f(x,\theta)\right] & \text{if } i = j \\[6pt] -nE\left[\dfrac{d^2}{d\theta_i\, d\theta_j}\ln f(x,\theta)\right] & \text{if } i \ne j \end{cases}$$
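The Cramér-Rao bound above can be checked numerically (a sketch of my own, using a normal model as the assumed example): for $X \sim N(\mu, \sigma^2)$ with $\sigma$ known, the per-observation Fisher information for $\mu$ is $1/\sigma^2$, so the bound for $n$ observations is $\sigma^2/n$, which the sample mean attains.

```python
import random
import statistics

# Sketch: for N(mu, sigma^2) with sigma known, I(mu) = n / sigma^2, so
# the Cramer-Rao lower bound is sigma^2 / n.  The sample mean is
# unbiased and attains this bound; we verify by Monte Carlo.
random.seed(2)
mu, sigma, n, reps = 3.0, 2.0, 25, 4000

crlb = sigma ** 2 / n  # 1 / (n * I_per_obs(mu)) = sigma^2 / n
means = [statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
         for _ in range(reps)]
var_of_mean = statistics.variance(means)  # should sit near the bound
print(crlb, round(var_of_mean, 3))
```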
Definition (2.1.7): A confidence interval is an interval determined by two numbers
obtained from computations on the observed values that is expected to contain the
parameter $\theta$ in its interior.
Let $X_1, X_2, \dots, X_n$ be a random sample from $f(x;\theta)$, and assume
$L = t_1(x_1, x_2, \dots, x_n)$ and $U = t_2(x_1, x_2, \dots, x_n)$ satisfy
$L < \theta < U$. A common construction uses a pivotal quantity
$Q(X_1, X_2, \dots, X_n;\theta)$ whose distribution is free of $\theta$:
1. The statement
$$P(a < Q(X_1, X_2, \dots, X_n;\theta) < b) = 1 - \alpha$$
is converted to:
$$P(g_1(X_1, X_2, \dots, X_n) < \theta < g_2(X_1, X_2, \dots, X_n)) = 1 - \alpha$$
where $(a, b, \alpha)$ can be regarded as values free of $\theta$.
2. Obtain two values $(a, b)$ in the domain of the pivotal quantity, with $a < b$,
which minimize the length of the interval:
$$\text{Length} = g_2(X_1, X_2, \dots, X_n; b) - g_1(X_1, X_2, \dots, X_n; a)$$
subject to
$$\int_a^b h(Q(X_1, X_2, \dots, X_n;\theta))\, dQ = 1 - \alpha$$
where $h(Q(X_1, X_2, \dots, X_n;\theta))$ is the sampling distribution of
$Q(X_1, X_2, \dots, X_n;\theta)$.
Furthermore, Guenther (1969) concluded that two-sided confidence intervals
based on symmetric distributions can be considered the shortest confidence intervals
because of the symmetry of the distribution, whereas a confidence interval based on
an asymmetric distribution cannot be considered the shortest; he therefore
recommended using the table proposed by Tate and Klett (1959) for the shortest
confidence interval based on the chi-square distribution for different sample sizes
and various levels of significance.
2.2 Methods of Estimation
Having established some criteria for judging the performance of estimators,
it is now required to discuss in brief the methods of estimating the population's
parameters. Many methods for constructing estimators have been proposed in the
statistical literature; this section is concerned with three methods of estimation.
2.2.1 Method of Moments
It is difficult to trace back who introduced the method of moments (MOM),
but Johann Bernoulli (1667-1748) was the first to use the method in his work (see
Gelder (1997)). This method is based on solving simultaneously a system of $J$
equations that match the observed sample moments with the corresponding population
moments, where $J$ refers to the number of estimated parameters. Typically,
different types of observed sample moments can be used, as follows:
1. The moments about zero (raw moments):
$$m_j = \frac{\sum_{i=1}^{n} x_i^j}{n} = E(x^j)$$
2. The central moments:
$$\frac{\sum_{i=1}^{n} (x_i-\bar{x})^j}{n} = E(x-\bar{x})^j$$
3. The standard moments:
$$\frac{\sum_{i=1}^{n}\left(\dfrac{x_i-\bar{x}}{\sigma}\right)^j}{n} = E\left(\frac{x-\bar{x}}{\sigma}\right)^j$$
where $\bar{x}$ and $\sigma$ refer to the mean and the standard deviation of the
probability density function (p.d.f.) respectively. The method of moments in general
provides estimators that are biased but consistent for large sample sizes, and not
efficient; they are often used because they lead to very simple computations, and
they may serve as first approximations or initial values for other methods that
require iteration. The method is not unique: instead of using the raw moments, we
can use the central moments and thereby obtain different estimators; unfortunately,
in some cases MOM is not applicable at all.
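A minimal sketch of the moment-matching step (the exponential example and names are my own, not the thesis's): for an Exponential(rate) sample, the first population raw moment is $E(X) = 1/\text{rate}$, so equating it to the sample mean $m_1$ gives the MOM estimator $\widehat{\text{rate}} = 1/m_1$.

```python
import random
import statistics

# Sketch: method-of-moments estimation of the rate of an exponential
# distribution by matching the first raw moment E(X) = 1/rate to the
# first sample moment m_1 (the sample mean).
random.seed(3)
true_rate = 2.0
sample = [random.expovariate(true_rate) for _ in range(20000)]

sample_mean = statistics.fmean(sample)  # first sample raw moment m_1
rate_hat = 1.0 / sample_mean            # solve m_1 = 1/rate for rate
print(round(rate_hat, 2))               # close to the true rate 2.0
```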
2.2.2 Method of Maximum Likelihood
It is difficult to trace who discovered this tool, but Bernoulli in 1700 was the
first to report on it (see Gelder (1997)). The idea is that the specified sample
should be given a high probability of being drawn, so it is required to search for
the parameters that maximize the likelihood function of the specified sample. The
likelihood function is the joint density function of the completely random sample,
taking the following form:
$$L(x_1, \dots, x_n;\theta) = \prod_{i=1}^{n} f(x_i;\theta)$$
The method of maximum likelihood estimates $\theta$ by searching for the
value $\hat\theta$ that maximizes $L(x_1, \dots, x_n;\theta)$; hence $\hat\theta$
is called the maximum likelihood estimator (MLE). Indeed, $\hat\theta$ is obtained
in many cases by solving the following equation:
$$\frac{dL(x_1, \dots, x_n;\theta)}{d\theta} = 0$$
In addition, the maximum likelihood method can be used to estimate $J$
unknown parameters by solving simultaneously the following $J$ homogeneous
equations:
$$\frac{dL(x_1 \dots x_n;\theta)}{d\theta_j} = 0, \qquad j = 1 \dots J \qquad (2.2.1)$$
Indeed, $\hat\theta$ cannot be obtained from (2.2.1) if the following conditions
(often called regularity conditions) are not valid:
1. The first and second derivatives of the likelihood function must be defined.
2. The range of the $X$'s does not depend on the unknown parameters.
3. The Fisher information corresponding to each parameter is greater than zero.
Typically, solving (2.2.1) is not easy; thus one can use a monotonic
transformation that makes the calculation easier:
$$\frac{d\ln L(x_1 \dots x_n;\theta)}{d\theta_j} = \sum_{i=1}^{n} \frac{d\ln f(x_i;\theta)}{d\theta_j}$$
In general, MLE estimates are asymptotically unbiased and consistent
estimators of the parameters. They have a powerful property called invariance:
if $\hat\theta$ is the MLE for $\theta$, then $g(\hat\theta)$ is the MLE for
$g(\theta)$. Furthermore, MLE estimates are asymptotically normally distributed, so
confidence intervals derived from MLE estimates can be considered the shortest
confidence intervals when the sample size is large. If there is an efficient
estimator for $\theta$ that achieves the Cramér-Rao lower bound, it must be the
MLE. If $\theta$ is a location parameter then $\hat\theta - \theta$ is a pivotal
quantity; also, if $\theta$ is a scale parameter then $\hat\theta/\theta$ is a
pivotal quantity.
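The maximization described above can be sketched concretely (my own example, not the thesis's): for an Exponential(rate) sample the log-likelihood is $n\ln(\text{rate}) - \text{rate}\sum x_i$, and setting its derivative to zero gives the closed-form MLE $\widehat{\text{rate}} = 1/\bar{x}$; a coarse grid search over the log-likelihood should land at the same value.

```python
import math
import random
import statistics

# Sketch: MLE for the exponential rate, comparing the closed-form
# solution 1/mean(x) against a brute-force maximization of the
# log-likelihood n*ln(rate) - rate*sum(x) over a grid.
random.seed(4)
sample = [random.expovariate(1.5) for _ in range(5000)]
n, s = len(sample), sum(sample)

def loglik(rate):
    """Exponential log-likelihood: n*ln(rate) - rate * sum(x)."""
    return n * math.log(rate) - rate * s

closed_form = 1.0 / statistics.fmean(sample)
grid = [i / 1000 for i in range(100, 3000)]  # candidate rates 0.1 .. 2.999
grid_mle = max(grid, key=loglik)             # grid maximizer
print(round(closed_form, 2), round(grid_mle, 2))  # the two agree
```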
2.2.3 Method of Ordinary Least Squares
The method of least squares, or ordinary least squares (OLS), plays a vital
role in statistical research, particularly regression analysis, and is historically
much older than the method of moments and the method of maximum likelihood; it is
interesting to note that it was proposed by Gauss (see Gelder (1997)). Typically,
OLS is used to estimate the relation between two variables, known as the
independent and dependent variables. Least squares problems fall into two
categories, linear and non-linear models. The linear least squares problem has a
closed-form solution, whereas the non-linear problem in general does not; this
study will focus on one independent variable. Suppose there is a theoretical
relation between $Y$ and $X$ that can be expressed as:
$$Y_i = B_0 + B_1 X_i + U_i, \qquad i = 1 \dots n$$
where:
$Y_i$: the dependent or response random variable;
$X_i$: the independent fixed variable;
$U_i$: a random variable representing the residual (error) of the model.
For estimating $B_0$ and $B_1$, one might suggest obtaining the estimators that
minimize $\sum_{i=1}^{n} U_i$, but since the residuals are either positive or
negative, their sum may be small even for poor estimators. To avoid this problem
one could resort to minimizing $\sum_{i=1}^{n} |U_i|$, but sums of absolute values
are not convenient to work with mathematically. To overcome this difficulty, OLS
states that $B_0$ and $B_1$ should minimize $\sum_{i=1}^{n} U_i^2$; taking the
partial derivatives with respect to $B_0$ and $B_1$ respectively:
$$\frac{d}{dB_0}\sum_{i=1}^{n} U_i^2 = -2\sum_{i=1}^{n}(y_i - B_0 - B_1 x_i) \quad \text{and} \quad \frac{d}{dB_1}\sum_{i=1}^{n} U_i^2 = -2\sum_{i=1}^{n} x_i (y_i - B_0 - B_1 x_i)$$
Setting these to zero gives the OLS normal equations:
$$\sum_{i=1}^{n}(y_i - b_0 - b_1 x_i) = 0 \quad \text{and} \quad \sum_{i=1}^{n} x_i (y_i - b_0 - b_1 x_i) = 0 \qquad (2.2.2)$$
Solving (2.2.2) simultaneously yields $b_1$ and $b_0$:
$$b_1 = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2} \quad \text{and} \quad b_0 = \bar{y} - b_1 \bar{x}$$
In general, estimators based on OLS are identical to the MLE if the
normality assumption holds (see Mood et al. (1974)). Further, OLS estimators have
a powerful property in comparison with other estimators linear in the dependent
variable, known as the Gauss-Markov theorem: if the errors have expectation zero
conditional on the independent variables, are uncorrelated, and have equal
variances, i.e. $Var(Y_i \mid X_i) = Var(U_i \mid X_i) = \sigma^2$, then the OLS
estimators will be unbiased estimators for $B_0$ and $B_1$ and are more precise
(less variable) than any other unbiased estimators belonging to the class of linear
functions of the response variable. In other words, among all linear unbiased
estimators the OLS estimators have the smallest dispersion in repeated samples at
fixed explanatory values; this property is well known as the best linear unbiased
estimator (BLUE) property.
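The closed-form solution above can be sketched directly (the simulated data and coefficient values are my own assumptions): generate $y = 1 + 3x$ plus noise and recover the coefficients with the formulas for $b_1$ and $b_0$.

```python
import random
import statistics

# Sketch: closed-form simple OLS,
#   b1 = (sum x_i y_i - n*xbar*ybar) / (sum x_i^2 - n*xbar^2)
#   b0 = ybar - b1*xbar,
# applied to data generated from y = 1 + 3x + N(0, 0.1) noise.
random.seed(5)
n = 500
xs = [i / n for i in range(n)]
ys = [1.0 + 3.0 * x + random.gauss(0, 0.1) for x in xs]

xbar, ybar = statistics.fmean(xs), statistics.fmean(ys)
b1 = (sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar) / \
     (sum(x * x for x in xs) - n * xbar * xbar)
b0 = ybar - b1 * xbar
print(round(b0, 1), round(b1, 1))  # near the true intercept 1 and slope 3
```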
2.3 Hypotheses Testing
A statistical hypothesis test is a method of making a statistical decision using
sample data; it is considered a key technique of statistical inference. The aim of
hypothesis testing is to use the information in the sample to guide us in accepting
or rejecting a doubtful hypothesis, called the null hypothesis $H_0$, against the
alternative hypothesis $H_1$.
2.3.1 Introduction to Hypotheses Testing
In fact, there are two types of hypotheses in the academic literature,
classified as:
1. Parametric hypotheses, which are concerned with one or more constraints imposed
upon the parameters of a certain distribution.
2. Non-parametric hypotheses, which are statements about the form of the cumulative
distribution function or probability function of the distribution from which the
sample is drawn.
Definition (2.2.1): The critical region $C(X_1, X_2, \dots, X_n)$ is a subset of
the sample space (where the sample space consists of all possible samples that can
be drawn from the population at a fixed sample size) for which $H_0$ is rejected.
Indeed, $C(X_1, X_2, \dots, X_n)$ plays a significant role in accepting or
rejecting the null hypothesis $H_0$.
Definition (2.2.2): A test statistic $T(X_1, X_2, \dots, X_n)$ is a rule or
procedure for deciding whether or not to reject the null hypothesis based on its
sampling distribution, so that the decision is to reject the null hypothesis iff:
$$T(X_1, X_2, \dots, X_n) \in C(X_1, X_2, \dots, X_n)$$
Typically, hypotheses can be classified as follows:
1. Simple hypothesis: the statistical hypothesis specifies the probability
distribution completely.
2. Composite hypothesis: the statistical hypothesis does not specify the
probability distribution completely.
It is noted that $H_0$ should be taken as a simple hypothesis to enable us to
derive the sampling distribution of the test statistic. Since accepting or
rejecting $H_0$ is based on the sample data instead of the whole population, the
decision can be affected by two kinds of errors:
1. Type I error $\alpha$: this error is committed when we reject $H_0$ although it
is correct; $\alpha$ is also called the level of significance:
$$P(T(X_1, X_2, \dots, X_n) \in C(X_1, X_2, \dots, X_n) \mid H_0 \text{ is correct}) = \alpha$$
2. Type II error $\beta$: this error is committed when we accept $H_0$ although
$H_1$ is correct; the complement of $\beta$ is called the power of the test,
$1-\beta$:
$$P(T(X_1, X_2, \dots, X_n) \notin C(X_1, X_2, \dots, X_n) \mid H_1 \text{ is correct}) = \beta$$
2.3.2 Tests Based on Likelihood Function
Mainly, we need a test statistic that keeps the two errors of the decision as
small as possible; unfortunately, with a fixed sample size, if one of the errors is
minimized the other is maximized, so there is a negative relation between the two
errors. To overcome this problem we can fix the more serious error, the type I
error, and search for the test that has the minimum type II error, i.e. the most
powerful test. Indeed, there are various approaches for deriving most powerful
tests (see Engle (1984)); this study is concerned with the likelihood ratio test
(LR), the Wald test (WT), and the Lagrange multiplier (score) test (LM).
1. Likelihood Ratio Test (LR)
This test was proposed by Marriott in 1990 (see Han 2002). The test operates
by obtaining the ratio between two likelihood functions, one evaluated under the
restricted parameter space and the other under the unrestricted parameter space.
Suppose we have $L(x_1, x_2, \dots, x_n;\theta)$ and it is required to test:
$$H_0: \theta = \theta_0 \quad \text{vs.} \quad H_1: \theta \ne \theta_0$$
Hence the likelihood ratio test will be:
$$LRT = \frac{\max_{H_0} L(x_1 \dots x_n;\theta)}{\max L(x_1 \dots x_n;\theta)} = \frac{L(x_1 \dots x_n;\theta_0)}{L(x_1 \dots x_n;\hat\theta)}$$
where $\theta$ is a $J \times 1$ vector of the tested parameters and $\hat\theta$
refers to the MLE of $\theta$, so that LRT lies between zero and one; therefore
large values of LRT are evidence of agreement between $\theta_0$ and the MLE,
which enables us to accept $H_0$, while small values of LRT guide us to reject
$H_0$. Because the sampling distribution of LRT is not well known, it is
recommended to use the following formula (see Engle (1984)):
$$LR = 2\big(\ln L(x_1 \dots x_n;\hat\theta) - \ln L(x_1 \dots x_n;\theta_0)\big)$$
It is proved that LR has a chi-square distribution with $J$ degrees of freedom for
large sample sizes, so one can reject $H_0$ if $LR \ge \chi^2_{\alpha, J}$.
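A worked illustration of the LR statistic above (my own sketch; the normal model with known unit variance is an assumed example, not the thesis's): for a $N(\mu, 1)$ sample the MLE is $\bar{x}$, and $LR = 2(\ln L(\bar{x}) - \ln L(\mu_0))$ reduces to $n(\bar{x}-\mu_0)^2$, compared against the chi-square(1) critical value 3.841 at the 5% level.

```python
import random

# Sketch: LR test of H0: mu = mu0 for N(mu, 1) with known variance.
# Up to a constant, ln L(mu) = -0.5 * sum((x - mu)^2), and the LR
# statistic 2*(lnL(xbar) - lnL(mu0)) equals n*(xbar - mu0)^2.
random.seed(6)

def lr_stat(sample, mu0=0.0):
    n = len(sample)
    xbar = sum(sample) / n
    loglik = lambda mu: -0.5 * sum((x - mu) ** 2 for x in sample)
    return 2.0 * (loglik(xbar) - loglik(mu0))

under_null = [random.gauss(0.0, 1.0) for _ in range(100)]
under_alt  = [random.gauss(0.7, 1.0) for _ in range(100)]
# Reject H0 when LR exceeds the chi-square(1) 5% critical value 3.841.
print(lr_stat(under_null) > 3.841, lr_stat(under_alt) > 3.841)
```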
2. Wald Test (WT)
Another test of the agreement between the null and alternative
hypotheses was proposed by Wald in 1943 (see Han 2002); the idea is based on
testing whether the distance between $\hat\theta$ and $\theta_0$ is significantly
large or not via the following formula:
$$WT_J = (\hat\theta - \theta_0)^t\, I(\hat\theta)\, (\hat\theta - \theta_0)$$
Since the MLE always has an asymptotic normal distribution, $WT_J$ works by
standardizing $\hat\theta$ so that it is approximately standard normally
distributed, then taking the square to obtain a limiting chi-square distribution
with $J$ degrees of freedom. Suppose it is required to test $\theta_h$ as a subset
of $\theta$, where:
$$\theta_h = \{\theta_1, \theta_2, \dots, \theta_h\}, \qquad h < J$$
First, partition $\theta$ as:
$$\theta = \{\theta_h, \theta_{J-h}\}$$
Second, partition the variance-covariance matrix of $\hat\theta$ as:
$$V(\hat\theta) = I^{-1}(\theta) = \begin{pmatrix} I^{-1}(\theta_h) & Cov(\theta_h, \theta_{J-h}) \\ Cov(\theta_{J-h}, \theta_h) & I^{-1}(\theta_{J-h}) \end{pmatrix}$$
where $Cov$ refers to the covariance matrix between the two vectors; hence $WT_h$
will be:
$$WT_h = (\hat\theta_h - \theta_{h0})^t\, \big[I^{-1}(\hat\theta_h)\big]^{-1} (\hat\theta_h - \theta_{h0})$$
Hence $WT_h$ has a limiting chi-square distribution with $h$ degrees of freedom.
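In the same assumed normal model used above (my own sketch, not the thesis's example), the Wald statistic is easy to compute: for $N(\mu, 1)$ with known variance, $\hat\mu = \bar{x}$ and $I(\mu) = n$, so $WT = (\bar{x} - \mu_0)\, n\, (\bar{x} - \mu_0) = n(\bar{x}-\mu_0)^2$, which coincides with the LR statistic in this simple case.

```python
import random

# Sketch: Wald test of H0: mu = mu0 for N(mu, 1) with known variance.
# The MLE is xbar and the Fisher information for mu from n observations
# is I = n, so WT = (xbar - mu0) * n * (xbar - mu0).
random.seed(7)

def wald_stat(sample, mu0=0.0):
    n = len(sample)
    xbar = sum(sample) / n
    info = n  # Fisher information for mu when sigma = 1
    return (xbar - mu0) * info * (xbar - mu0)

shifted = [random.gauss(0.8, 1.0) for _ in range(64)]
print(wald_stat(shifted) > 3.841)  # compare with chi-square(1) at 5%
```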
3. Lagrange Multiplier Test
In statistical inference there is a well-known test related to the Lagrange
multiplier (LM) for testing hypotheses concerning the parameters of the
distribution. Aitcheson and Silvey (1958) proposed the Lagrange multiplier test
(score test), which derives from restricted maximum likelihood estimation using a
Lagrange multiplier; therefore it is first required to explain briefly the method
of Lagrange multipliers, and then to discuss the score test.
In mathematical optimization, the method of Lagrange multipliers provides a
strategy for finding the maximum or minimum of an objective function subject to
constraints; this method is due to Joseph Louis Lagrange. Suppose it is required to
obtain the extreme values of the objective function:
$$f(x_1, x_2, \dots, x_n)$$
subject to
$$g_i(x_1, x_2, \dots, x_n) = 0, \qquad i = 1 \dots m$$
where $f(x_1, x_2, \dots, x_n)$ and $g_i(x_1, x_2, \dots, x_n)$ are differentiable
functions and $m < n$. This method requires forming the Lagrangian equation, then
differentiating the Lagrangian with respect to the $x$'s and the $\lambda$'s, where
the $\lambda$'s refer to the Lagrange multipliers; that yields $n+m$ equations.
Finally, solving the $n+m$ homogeneous equations in $n+m$ unknowns yields the
values of the $x$'s that represent the extreme values of $f(x_1, x_2, \dots, x_n)$
while at the same time satisfying the $m$ conditions; for more details see Thomas
(2005).
To recognize whether the solution represents a maximum, a minimum, or a saddle
point, it is required to calculate the determinant of the $n \times n$ Hessian
matrix, which has the following form:
$$Hess(x)_{ij} = \begin{cases} \dfrac{d^2 f(x_{1o}, x_{2o}, \dots, x_{no})}{dx_i^2} & \text{for } i = j \\[6pt] \dfrac{d^2 f(x_{1o}, x_{2o}, \dots, x_{no})}{dx_i\, dx_j} & \text{for } i \ne j \end{cases}$$
where $(x_{1o}, x_{2o}, \dots, x_{no})$ are the values of the $x$'s which satisfy
the $m$ conditions. Thus, via the determinant of the Hessian matrix one can
recognize whether the solution represents a maximum, a minimum, or a saddle point,
as follows:
1. $(x_{1o}, x_{2o}, \dots, x_{no})$ are classified as minimum values of
$f(x_1, x_2, \dots, x_n)$ iff $Hess > 0$.
2. $(x_{1o}, x_{2o}, \dots, x_{no})$ are classified as maximum values of
$f(x_1, x_2, \dots, x_n)$ iff $Hess < 0$.
3. $(x_{1o}, x_{2o}, \dots, x_{no})$ are classified as saddle points of
$f(x_1, x_2, \dots, x_n)$ iff $Hess = 0$.
In the calculus of variations there is a fundamental equation based on the
Lagrangian, known as the Euler-Lagrange equation. The Euler-Lagrange equation is
useful for solving an optimization problem in which the objective function is given
as a functional (a function of a function) and one seeks the function that
maximizes or minimizes it. To see this point, suppose it is required to seek the
$f(x)$ that maximizes the following functional:
$$F(f(x), f'(x), x)$$
where $f'(x)$ denotes the first derivative of $f(x)$ with respect to $x$. The
solution, without proof, will be according to Riley et al. (2006):
$$\frac{dF(f(x), f'(x), x)}{df(x)} = \frac{d}{dx}\left(\frac{dF(f(x), f'(x), x)}{df'(x)}\right)$$
If the functional does not contain $f'(x)$, the Euler-Lagrange equation becomes:
$$\frac{dF(f(x), f'(x), x)}{df(x)} = 0$$
An excellent example that makes the idea clearer is the principle of maximum
entropy method, which will be explained later.
The idea of LM is that $\ln L(x_1, \dots, x_n;\theta)$ is maximized subject to
the null hypothesis $\theta = \theta_0$; hence the Lagrangian function can be
expressed as:
$$Lagr(\theta, \lambda) = \ln L(x_1, \dots, x_n;\theta) - \lambda(\theta - \theta_0)$$
Differentiating $Lagr(\theta, \lambda)$ with respect to $\theta$ and $\lambda$ and
setting to zero yields:
$$\frac{dLagr(\theta, \lambda)}{d\theta} = \frac{d\ln L(x_1, \dots, x_n;\theta)}{d\theta} - \lambda = 0 \quad \text{and} \quad \frac{dLagr(\theta, \lambda)}{d\lambda} = -(\theta - \theta_0) = 0 \qquad (2.3.1)$$
One can solve (2.3.1) simultaneously by obtaining the derivative of
$\ln L(x_1, \dots, x_n;\theta)$ with respect to $\theta$, then substituting
$\theta = \theta_0$ into the derivative, which yields:
$$\frac{dLagr(\theta, \lambda)}{d\theta}\bigg|_{\theta=\theta_0} = \frac{d\ln L(x_1, \dots, x_n;\theta_0)}{d\theta} = \lambda \qquad (2.3.2)$$
Typically (2.3.2) is known as the score function $S(\theta_0)$. Since $\theta$ is
often unknown, it will be estimated by the MLE; hence a small value of
$S(\theta_0)$ indicates that $\theta_0$ is close to the MLE, and we accept the null
hypothesis; otherwise we reject $\theta_0$. Thus the score test measures the
distance between the tested value $\theta_0$ and the MLE by testing whether
$S(\theta_0)$ is significantly different from zero or not. Notice that under $H_0$
the mean and the variance of $S(\theta_0)$ are zero and the Fisher information
$I(\theta)$ respectively; thus LM can be written as:
$$LM = \frac{(S(\theta_0))^2}{I(\theta_0)}$$
Mainly, LM has a chi-square distribution with one degree of freedom for large
samples under the null hypothesis; for more details see Judge et al. (1982).
Suppose we have $J$ parameters and it is required to test them simultaneously; then
the LM test has the following form:
$$LM = (S(\theta_0))^t\, I^{-1}(\theta_0)\, S(\theta_0) \qquad (2.3.3)$$
where $S(\theta_o)$ refers to the score function of the vector $\theta_o$ and $I(\theta_o)^{-1}$ refers to the inverse of the information matrix of order $J \times J$, taking the following forms respectively:
$$S(\theta) = \left[\frac{dL(x_1,\dots,x_n;\theta)}{d\theta_j}\right]_{J\times 1}$$
and
$$I(\theta)_{J\times J} = \begin{cases} -E\left(\dfrac{d^2\ln L(x_1,\dots,x_n;\theta)}{d\theta_i^2}\right) & \text{for } i = j \\[2mm] -E\left(\dfrac{d^2\ln L(x_1,\dots,x_n;\theta)}{d\theta_i\,d\theta_j}\right) & \text{for } i \neq j \end{cases}$$
It is proven that (2.3.3) has a chi-square distribution with $J$ degrees of freedom (see Engle (1984)); further, an interesting relationship between the three tests can be represented geometrically when $\theta$ is one-dimensional, as follows:
Figure (1): The likelihood tests in one dimension
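To make the one-parameter score test concrete, here is a small numerical sketch (an illustration added here, not taken from the thesis) of the LM statistic for testing the mean of a Poisson sample, using the standard Poisson results $S(\theta_0) = \sum x_i/\theta_0 - n$ and $I(\theta_0) = n/\theta_0$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.poisson(lam=2.0, size=200)   # sample actually drawn with theta = 2
theta0 = 2.0                         # null value to test

# Score and total Fisher information for the Poisson mean (standard results)
score = x.sum() / theta0 - x.size    # S(theta0) = sum(x_i)/theta0 - n
info = x.size / theta0               # I(theta0) = n/theta0
LM = score**2 / info

# Under H0 the statistic is approximately chi-square with 1 degree of freedom
p_value = stats.chi2.sf(LM, df=1)
print(LM, p_value)
```

Since the sample is generated under the null, the statistic should typically be small relative to the chi-square(1) critical value.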
2.4 Measures of Information
A great variety of information measures have been proposed in the literature recently (see Esteban and Morales (1995)). Since Shannon (1948) was the first to write about measuring a sample's information, and made a huge contribution to the development of information theory, this section is concerned with Shannon's entropy.
2.4.1. Shannon Entropy and Related Measures
The origin of the entropy concept goes back to Ludwig Boltzmann (1877); it is a Greek notion meaning transformation. It was given a probabilistic interpretation in information theory by Shannon (1948), who considered entropy as an index of the uncertainty associated with a random variable, expressed in nats, where
nat (sometimes nit or nepit) is a unit of information or entropy, based on natural
logarithms.
Definition (2.4.1): Let there be $n$ events with probabilities $p_1, p_2, \dots, p_n$ adding up to 1. Shannon (1948) stated that the entropy corresponding to these events takes the following formula:
$$H(X) = -\sum_{i=1}^{n} p(x_i)\ln p(x_i) \qquad(2.4.1)$$
He claimed that via (2.4.1) one can transform the information in the sample from an invisible form into a numerical physical form, so that comparisons can easily be made and understood; further, it can be regarded as the variance for qualitative data.
Assume $n_1, n_2, \dots, n_k$ are the numbers of times each category occurs in an experiment of length $n$, where:
$$\sum_{i=1}^{k} n_i = n \quad\text{and}\quad p_i = \frac{n_i}{n}$$
Shannon (1948) mentioned that the number of all possible combinations that partition $n$ into $k$ categories of sizes $n_i$ can be an indicator of the accuracy of any decision associated with this sample (see Golan (1996) and Mack (1988)); one can present the number of all possible combinations as:
$$W = C^{n}_{n_1, n_2, \dots, n_k} = \frac{n!}{n_1!\, n_2! \cdots n_k!} \qquad(2.4.2)$$
It is obvious that (2.4.2) is always greater than or equal to one and less than or equal to $\frac{n!}{((n/k)!)^k}$. If (2.4.2) equals one, this indicates that the sample has one category, which refers to maximum accuracy and minimum uncertainty. For more simplicity Shannon (1948) preferred to deal with the logarithm of $W$ as follows:
$$\ln(W) = \ln n! - \sum_{i=1}^{k}\ln n_i!$$
Using Stirling's approximation, which states:
$$\ln x! \approx x\ln x - x \quad\text{as } x \to \infty$$
$\ln(W)$ becomes:
$$\ln(W) \approx n\ln n - n - \sum_{i=1}^{k} n_i\ln n_i + \sum_{i=1}^{k} n_i$$
$$= n\ln n - \sum_{i=1}^{k} n_i\ln n_i$$
$$= n\ln n - \sum_{i=1}^{k} n p_i\ln(n p_i)$$
$$= n\ln n - \sum_{i=1}^{k} n p_i(\ln n + \ln p_i)$$
$$= n\ln n - \ln n\sum_{i=1}^{k} n p_i - n\sum_{i=1}^{k} p_i\ln p_i = -n\sum_{i=1}^{k} p_i\ln p_i$$
Therefore one can conclude:
$$\frac{1}{n}\ln(W) \approx -\sum_{i=1}^{k} p_i\ln p_i = H(p)$$
Typically Shannon's (1948) entropy can be regarded as a measure of the average accuracy associated with decisions about the sample. Equation (2.4.1), according to Shannon (1948), satisfies the following properties:
1. The quantity $H(X)$ reaches a minimum, equal to zero, when one of the events is a certainty, assuming $0\ln(0) = 0$, and $H(X)$ reaches its maximum when all the probabilities are equal; hence $H(X)$ can be regarded as a concave function. For instance, suppose an experiment has two outcomes; then the entropy curve is:
Figure (2): The curve of H(p) in one dimension
2. If some events have zero probability, they can just as well be left out of the entropy when we evaluate the uncertainty.
3. Entropy must be symmetric: it does not depend on the order of the probabilities.
For a continuous distribution (2.4.1) takes the following formula:
$$H(X) = -\int_{-\infty}^{\infty} f(x,\theta)\ln f(x,\theta)\,dx$$
One can easily notice that entropy for continuous variables satisfies Shannon's (1948) properties, but it can take negative values.
Definition (2.3.2): Joint entropy is a measure concerned with the uncertainty of two variables, taking the following formula:
$$H(X,Y) = -\sum_{i=1}^{n} p(x_i, y_i)\ln p(x_i, y_i)$$
It is obvious that:
$$H(X,Y) \le H(X) + H(Y)$$
According to Shannon (1948), the uncertainty of a joint event is less than or equal to the sum of the individual uncertainties, with equality only if the events are independent.
Definition (2.3.3): Mutual information measures the information that X and Y share, taking the following formula:
$$M(X,Y) = \sum_{i=1}^{n} p(x_i, y_i)\ln\frac{p(x_i, y_i)}{p(x_i)p(y_i)}$$
It is obvious that $M(X,Y) = 0$ if the two variables are independent.
Definition (2.3.4): Conditional entropy $H(X/Y)$ is a measure of what Y does not say about X, meaning how much information is in X but not in Y; it takes the following formula:
$$H(X/Y) = H(X,Y) - H(Y)$$
If the two variables are independent then the conditional entropy $H(X/Y)$ equals $H(X)$.
Remark: Definitions (2.3.2) to (2.3.4) can be extended to continuous variables by just replacing the summation symbol with the integration symbol. One can see that the measures of information are related as follows:
Venn diagram: relation between the information measures
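The relations above can be verified numerically. The following sketch (an illustration with a made-up joint distribution, not from the thesis) checks $H(X,Y) \le H(X) + H(Y)$, $H(X/Y) = H(X,Y) - H(Y)$, and the mutual information identity $M(X,Y) = H(X) + H(Y) - H(X,Y)$:

```python
import numpy as np

def H(p):
    """Entropy in nats of a probability array (zero entries ignored)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# A hypothetical joint pmf p(x, y) for two dependent binary variables
pxy = np.array([[0.4, 0.1],
                [0.1, 0.4]])
px, py = pxy.sum(axis=1), pxy.sum(axis=0)    # marginal distributions

H_xy = H(pxy)                    # joint entropy H(X,Y)
H_x, H_y = H(px), H(py)
H_x_given_y = H_xy - H_y         # conditional entropy H(X/Y) = H(X,Y) - H(Y)
M_xy = H_x + H_y - H_xy          # mutual information shared by X and Y
print(H_xy, H_x_given_y, M_xy)
```

Because the two variables are dependent here, the mutual information comes out strictly positive.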
2.4.2. Kullback-Leibler Divergence (Relative Entropy)
Definition (2.3.5): Kullback and Leibler (1951) introduced relative entropy, or information divergence, which measures the distance between two distributions of a
random variable. This information measure is also known as KL-entropy, taking the following formula:
$$KL(X/Y) = \sum_{i=1}^{n} p(x_i)\ln\frac{p(x_i)}{q(y_i)} \qquad(2.4.3)$$
where $p(x_i)$ and $q(y_i) > 0$. Typically (2.4.3) can be regarded as the relative entropy for using Y instead of X. Actually there is a relation between $KL(X/Y)$ and $H(X)$ as follows:
$$KL(X/Y) = \sum_{i=1}^{n} p(x_i)\ln p(x_i) - \sum_{i=1}^{n} p(x_i)\ln q(y_i) = -H(X) - \sum_{i=1}^{n} p(x_i)\ln q(y_i) \qquad(2.4.4)$$
Thus (2.4.4) can be considered a good tool for discrimination between two distributions (see Gohale (1983)). Indeed $KL(X/Y)$ has the following famous properties:
1. $KL(X/Y)$ is not symmetric:
$$KL(X/Y) \neq KL(Y/X)$$
2. $KL(X/Y)$ is a non-negative measure and it equals zero iff X and Y are identical:
$$KL(X/Y) \ge 0 \qquad(2.4.5)$$
According to Lue (2007), (2.3.5) can be studied using the following identity:
$$x\ln\left(\frac{x}{y}\right) \ge x - y \quad\text{for } x, y > 0 \qquad(2.4.6)$$
Hence, one can bound (2.4.3) according to (2.4.6) as:
$$\sum_{i=1}^{n} p(x_i)\ln\frac{p(x_i)}{q(y_i)} \ge \sum_{i=1}^{n} p(x_i) - \sum_{i=1}^{n} q(y_i) \quad\text{for } p(x_i), q(y_i) > 0$$
$$= 1 - \sum_{i=1}^{n} q(y_i) \ge 0$$
Thus one can conclude that $KL(X/Y) \ge 0$. Indeed KL can be applied when the variables are continuous by replacing the summation symbol with integration notation; furthermore all the properties remain valid. Therefore it is recommended in the literature to use $KL(X/Y)$ instead of $H(X)$ for continuous distributions (see Dukkipati (2006)).
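These two properties are easy to observe numerically. The following sketch (an illustration added here; `scipy.stats.entropy` with two arguments computes the Kullback-Leibler divergence in nats) checks non-negativity, the zero value for identical distributions, and the asymmetry:

```python
import numpy as np
from scipy.stats import entropy

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.25, 0.5, 0.25])

# scipy.stats.entropy(p, q) computes KL(p/q) = sum p_i ln(p_i / q_i)
kl_pq = entropy(p, q)
kl_qp = entropy(q, p)
kl_pp = entropy(p, p)    # distance of a distribution from itself
print(kl_pq, kl_qp, kl_pp)
```

The two directed divergences differ, illustrating property 1, while the self-divergence is zero, illustrating property 2.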
Chapter (III)
Goodness of Fit Based on Maximum Entropy
Statistical distributions play a vital role in scientific research, since recognizing the probability distribution of the sample under study is the key in many situations. There are many goodness of fit tests proposed in the literature to test the hypothesis that the drawn sample has a specific distribution. This chapter is organized as follows: on one hand it discusses parameter estimation based on maximum entropy and estimation of the entropy function; on the other hand it uses these for fitting the distribution of the sample.
3.1 Parameter Estimation Based on Maximum Entropy
According to Jaynes (1957) the principle of maximum entropy (POME) is a relatively new estimation method and can be regarded as a flexible and powerful tool for estimating the probability distribution. Using the maximum entropy method one should pick the probability distribution of the specified sample which satisfies certain moments, represented in one constraint or more (typically the mean, variance, skewness, etc.), and at the same time maximizes the sampling entropy.
In the discrete case, estimating the probability distribution representing the sample by POME requires:
1. Define the entropy for the available data.
2. Define the given or prior information as some independent constraints.
3. Maximize the entropy function subject to the independent constraints.
In mathematical form, it is required to maximize:
$$H(X) = -\sum_{i=1}^{n} p(x_i)\ln p(x_i)$$
subject to the consistency constraints:
$$\sum_{i=1}^{n} p(x_i) = 1 \quad\text{and}\quad \sum_{i=1}^{n} g_j(x_i)p(x_i) = c_j, \quad j = 1,\dots,J$$
where the $c_j$ are constant numbers. Define the following Lagrangian function:
$$Lagr(p(x_i),\lambda) = -\sum_{i=1}^{n} p(x_i)\ln p(x_i) - (\lambda_o - 1)\left(\sum_{i=1}^{n} p(x_i) - 1\right) - \sum_{j=1}^{J}\lambda_j\left(\sum_{i=1}^{n} g_j(x_i)p(x_i) - c_j\right)$$
where $\lambda$ denotes the vector of Lagrange multipliers $(\lambda_o, \lambda_1, \dots, \lambda_J)$. Using differentiation we have:
$$\frac{dLagr(p(x_i),\lambda)}{dp(x_i)} = -\ln p(x_i) - \lambda_o - \sum_{j=1}^{J}\lambda_j g_j(x_i) = 0$$
Hence the mass function of maximum entropy will be:
$$p(x_i,\lambda) = \exp\left(-\lambda_o - \sum_{j=1}^{J}\lambda_j g_j(x_i)\right) \qquad(3.1.1)$$
It is easy to check that the general solution (3.1.1) gives the maximum entropy. To make the idea more obvious, according to Paul (2003), suppose a restaurant has three meals {C, D, E} priced {$1, $2, $3} respectively, and we have the information that the customer spends on average $1.5 per meal. The probability that the customer will demand each meal is computed via POME as follows:
1. Define the entropy of the sample:
$$H(x) = -\sum_{i=1}^{3} p(x_i)\ln p(x_i)$$
where $x_i$ represents the price of meal $i$.
2. Define the given or prior information as independent constraints:
$$\sum_{i=1}^{3} p(x_i) = 1 \quad\text{and}\quad \sum_{i=1}^{3} x_i p(x_i) = 1.5$$
3. Maximize the entropy function subject to the two independent constraints, using the Lagrangian function as follows:
$$Lagr(p(x_i),\lambda) = -\sum_{i=1}^{3} p(x_i)\ln p(x_i) - (\lambda_o - 1)\left(\sum_{i=1}^{3} p(x_i) - 1\right) - \lambda_1\left(\sum_{i=1}^{3} x_i p(x_i) - 1.5\right) \qquad(3.1.2)$$
Differentiating (3.1.2) with respect to $p(x_i)$ and equating to zero yields:
$$p(x_i,\lambda) = \exp(-\lambda_o - \lambda_1 x_i) \qquad(3.1.3)$$
To estimate the probabilities of the meals, substitute (3.1.3) into the two independent constraints:
$$\exp(-\lambda_o - \lambda_1) + \exp(-\lambda_o - 2\lambda_1) + \exp(-\lambda_o - 3\lambda_1) = 1$$
and
$$\exp(-\lambda_o - \lambda_1) + 2\exp(-\lambda_o - 2\lambda_1) + 3\exp(-\lambda_o - 3\lambda_1) = 1.5$$
Solving the previous system simultaneously gives:
$$\lambda_o = -0.35, \quad \lambda_1 = 0.834 \qquad(3.1.4)$$
Substituting (3.1.4) into (3.1.3) yields:
$$p(x_1) = 0.615, \quad p(x_2) = 0.268, \quad p(x_3) = 0.116$$
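The two-equation system can also be solved numerically. The following sketch (my own check with `scipy.optimize.fsolve`, not the hand computation of the thesis) reproduces the meal probabilities:

```python
import numpy as np
from scipy.optimize import fsolve

prices = np.array([1.0, 2.0, 3.0])

def constraints(lams):
    """Normalization and mean constraints for p(x_i) = exp(-lam0 - lam1*x_i)."""
    lam0, lam1 = lams
    p = np.exp(-lam0 - lam1 * prices)
    return [p.sum() - 1.0, (prices * p).sum() - 1.5]

lam0, lam1 = fsolve(constraints, x0=[0.0, 0.0])
probs = np.exp(-lam0 - lam1 * prices)
print(lam0, lam1, probs)
```

The solver recovers probabilities close to (0.616, 0.268, 0.116), matching the closed-form result above.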
Similarly, for a continuous distribution it is required to obtain the $f(x,\theta)$ that maximizes the following entropy (objective) function:
$$H(X) = -\int_{-\infty}^{\infty} f(x,\theta)\ln f(x,\theta)\,dx$$
subject to
$$\int_{-\infty}^{\infty} f(x,\theta)\,dx = 1 \quad\text{and}\quad \int_{-\infty}^{\infty} g_j(x) f(x,\theta)\,dx = m_j, \quad j = 1,\dots,J \qquad(3.1.5)$$
where $f(x,\theta)$ satisfies the regularity conditions. To optimize the entropy function subject to the conditions in (3.1.5), the Lagrangian function will be:
$$Lagr(x,\lambda,\theta) = -\int_{-\infty}^{\infty} f(x,\theta)\ln f(x,\theta)\,dx - (\lambda_o - 1)\left(\int_{-\infty}^{\infty} f(x,\theta)\,dx - 1\right) - \sum_{j=1}^{J}\lambda_j\left(\int_{-\infty}^{\infty} g_j(x) f(x,\theta)\,dx - m_j\right)$$
$$= \int_{-\infty}^{\infty}\left\{-f(x,\theta)\ln f(x,\theta) - (\lambda_o - 1) f(x,\theta) - \sum_{j=1}^{J}\lambda_j g_j(x) f(x,\theta)\right\}dx + (\lambda_o - 1) + \sum_{j=1}^{J}\lambda_j m_j \qquad(3.1.6)$$
One can see that (3.1.6) is a functional; therefore differentiating (3.1.6) with respect to $f(x,\theta)$ using the Euler-Lagrange equation yields, according to Lue (2007):
$$-\ln f(x,\theta) - \lambda_o - \sum_{j=1}^{J}\lambda_j g_j(x) = 0$$
Hence the maximum entropy density will be:
$$f(x,\theta,\lambda) = \exp\left(-\lambda_0 - \sum_{j=1}^{J}\lambda_j g_j(x)\right) \qquad(3.1.7)$$
where $\lambda_0$ is called the normalizing term and is related to the other Lagrange multipliers by the following formula:
$$\lambda_0 = \ln\left(\int_{-\infty}^{\infty}\exp\left(-\sum_{j=1}^{J}\lambda_j g_j(x)\right)dx\right) \qquad(3.1.8)$$
Also, (3.1.7) is valid for any type of moment form. To make the idea more obvious (see Radriguez (1984)), suppose it is required to search for the p.d.f. that represents the sample given that its variance equals two; hence the Lagrangian equation will be:
$$Lagr(x,\theta,\lambda) = -\int_{-\infty}^{\infty} f(x,\theta)\ln f(x,\theta)\,dx - (\lambda_o - 1)\left(\int_{-\infty}^{\infty} f(x,\theta)\,dx - 1\right) - \lambda_1\left(\int_{-\infty}^{\infty}(x-\theta_1)^2 f(x,\theta)\,dx - 2\right)$$
where $\theta$ is the vector of the p.d.f. parameters and $\theta_1$ refers to the mean of the sample. Using (3.1.7) the maximum entropy density has the following formula:
$$f(x,\theta,\lambda) = \exp(-\lambda_0 - \lambda_1(x-\theta_1)^2) \qquad(3.1.9)$$
where:
$$\lambda_0 = \ln\left(\int_{-\infty}^{\infty}\exp(-\lambda_1(x-\theta_1)^2)\,dx\right) = \ln\left(\sqrt{\frac{\pi}{\lambda_1}}\int_{-\infty}^{\infty}\sqrt{\frac{\lambda_1}{\pi}}\exp(-\lambda_1(x-\theta_1)^2)\,dx\right) = \ln\sqrt{\frac{\pi}{\lambda_1}} + \ln\left(\int_{-\infty}^{\infty}\sqrt{\frac{\lambda_1}{\pi}}\exp(-\lambda_1(x-\theta_1)^2)\,dx\right) \qquad(3.1.10)$$
The second term of (3.1.10) is the integral of a normal density, so it vanishes and:
$$\lambda_0 = \ln\sqrt{\frac{\pi}{\lambda_1}}$$
Substituting $\lambda_0$ in (3.1.9) gives:
$$f(x,\theta,\lambda) = \sqrt{\frac{\lambda_1}{\pi}}\exp(-\lambda_1(x-\theta_1)^2) \qquad(3.1.11)$$
Actually (3.1.11) belongs to the normal distribution; hence the normal distribution has maximum entropy among all distributions subject to a fixed variance.
Singh and Rajagopal (1986) proposed a new approach for estimating the parameters of a probability density via the principle of maximum entropy (POME); in addition Singh et al. (1986) applied POME to various continuous distributions. Their idea generally consists of three steps, summarized as:
1. Transforming the probability density function into a function of the Lagrange multipliers instead of a function of the parameters of the distribution.
2. Estimating the Lagrange multipliers.
3. Recognizing the relation between the Lagrange multipliers and the parameters of the distribution.
Note that transforming the probability density function into a function of the Lagrange multipliers can be carried out by inserting the probability density function's raw moments into (3.1.5).
Estimating the Lagrange multipliers $\lambda_j$ can be done in two ways. First, one can insert the maximum entropy density into (3.1.5), which yields $J+1$ nonlinear equations in $J+1$ unknowns, and then solve numerically to reach the suitable solution (see Zellner et al. (1988) and Wu (2003)). The second way is transforming the constrained optimization problem into an unconstrained optimization problem using the dual approach (see Golan et al. (1996)); this idea can be summarized as:
a) Substitute (3.1.7) into the objective function, so it becomes:
$$H_d(\lambda) = \int_{-\infty}^{\infty} f(x,\lambda)\left\{\lambda_o + \sum_{j=1}^{J}\lambda_j g_j(x)\right\}dx = \lambda_0 + \sum_{j=1}^{J}\lambda_j\int_{-\infty}^{\infty} g_j(x) f(x,\lambda)\,dx$$
Using (3.1.5) this yields:
$$H_d(\lambda) = \lambda_0 + \sum_{j=1}^{J}\lambda_j m_j \qquad(3.1.12)$$
b) The objective function (3.1.12) now relies only on the Lagrange multipliers; since it has an inverse relation with the constrained objective function, maximizing the entropy requires minimizing (3.1.12). Obtaining the derivative with respect to $\lambda_j$ to satisfy the first-order condition:
$$\frac{dH_d}{d\lambda_j} = \frac{d\lambda_0}{d\lambda_j} + m_j = \frac{d}{d\lambda_j}\ln\left(\int_{-\infty}^{\infty}\exp\left(-\sum_{j=1}^{J}\lambda_j g_j(x)\right)dx\right) + m_j$$
$$= -\frac{\int_{-\infty}^{\infty} g_j(x)\exp(-\sum_{j=1}^{J}\lambda_j g_j(x))\,dx}{\int_{-\infty}^{\infty}\exp(-\sum_{j=1}^{J}\lambda_j g_j(x))\,dx} + m_j$$
Using (3.1.8) we have:
$$\frac{dH_d}{d\lambda_j} = -\int_{-\infty}^{\infty} g_j(x)\exp\left(-\lambda_0 - \sum_{j=1}^{J}\lambda_j g_j(x)\right)dx + m_j = -\int_{-\infty}^{\infty} g_j(x) f(x,\lambda)\,dx + m_j \qquad(3.1.13)$$
To ensure that the estimated $\hat\lambda$'s are the minimum values of the dual entropy, one should evaluate the second derivative of (3.1.13) as follows:
$$\frac{d^2 H_d}{d\lambda_j d\lambda_i} = \frac{d}{d\lambda_i}\left(-\int_{-\infty}^{\infty} g_j(x) f(x,\lambda)\,dx + m_j\right) = -\frac{d}{d\lambda_i}\int_{-\infty}^{\infty} g_j(x)\exp\left(-\lambda_0 - \sum_{j=1}^{J}\lambda_j g_j(x)\right)dx$$
$$= \int_{-\infty}^{\infty} g_j(x) g_i(x) f(x,\lambda)\,dx - \int_{-\infty}^{\infty} g_j(x) f(x,\lambda)\,dx\int_{-\infty}^{\infty} g_i(x) f(x,\lambda)\,dx$$
$$= E(g_i(x) g_j(x)) - E(g_i(x))E(g_j(x)), \quad 1 \le i, j \le J \qquad(3.1.14)$$
The second derivative (3.1.14) is a square matrix of order $J$ (known as the Hessian matrix); it is a variance-covariance matrix, which is everywhere positive definite, thus the $\hat\lambda$'s can be regarded as the minimum values of the dual entropy.
The most serious step during estimation by POME is recognizing the relation between the estimated Lagrange multipliers and the parameters of the distribution. Generally it is required to compute (3.1.8), insert it into the maximum entropy density (3.1.7), and finally make a comparison between (3.1.7) and the original probability density. To make this idea simpler, let $X_1, X_2, \dots, X_n$ be a random sample of size $n$ generated from a normal distribution with $(\mu, \sigma^2)$. It is clear that the entropy function corresponding to the normal distribution will be:
$$H(x) = -\int_{-\infty}^{\infty} f(x,\theta)\ln f(x,\theta)\,dx = -\int_{-\infty}^{\infty} f(x,\theta)\ln\left(\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{.5(x-\mu)^2}{\sigma^2}\right)\right)dx$$
$$= -\int_{-\infty}^{\infty} f(x,\theta)\ln A\,dx + B\int_{-\infty}^{\infty} f(x,\theta)(x-\mu)^2\,dx$$
$$= -\ln(A) + B\left\{\int_{-\infty}^{\infty} x^2 f(x,\theta)\,dx - 2\mu\int_{-\infty}^{\infty} x f(x,\theta)\,dx + \mu^2\right\}$$
$$= -\ln(A) + B\{E(x^2) - 2\mu E(x) + \mu^2\}$$
where $A = \frac{1}{\sigma\sqrt{2\pi}}$ and $B = \frac{1}{2\sigma^2}$. According to Singh et al. (1986) the sufficient constraints for estimating $(\mu, \sigma^2)$ are represented in $\{E(x), E(x^2)\}$; hence, to transform the density function into a function of the Lagrange multipliers, one should maximize the following entropy function:
$$H(X) = -\int_{-\infty}^{\infty} f(x,\mu,\sigma^2)\ln f(x,\mu,\sigma^2)\,dx$$
subject to:
$$\int_{-\infty}^{\infty} f(x,\mu,\sigma^2)\,dx = 1 \quad\text{and}\quad \int_{-\infty}^{\infty} x^j f(x,\mu,\sigma^2)\,dx = m_j, \quad j = 1, 2 \qquad(3.1.15)$$
It was proved as the general solution (3.1.7) that:
$$f(x,\lambda) = \exp(-\lambda_0 - \lambda_1 x - \lambda_2 x^2) \qquad(3.1.16)$$
Estimating the $\lambda_j$ can be done in two ways: the first is based on inserting (3.1.16) into (3.1.15), which yields three nonlinear equations in three unknowns, solved by any numerical approach; the second is based on transforming the constrained optimization into an unconstrained optimization via the dual approach. To obtain the parameters of the distribution, first it is required to obtain $\lambda_0$ as follows:
$$\lambda_0 = \ln\left(\int_{-\infty}^{\infty}\exp(-\lambda_1 x - \lambda_2 x^2)\,dx\right) \qquad(3.1.17)$$
Actually the integrand of (3.1.17) is close to a normal density; according to Singh et al. (1986) the solution will be:
$$\lambda_0 = .5\ln\pi - .5\ln\lambda_2 + \frac{\lambda_1^2}{4\lambda_2} \qquad(3.1.18)$$
Substituting (3.1.18) into (3.1.16) yields:
$$f(x,\lambda) = \exp\left(-.5\ln\pi + .5\ln\lambda_2 - \frac{\lambda_1^2}{4\lambda_2} - \lambda_1 x - \lambda_2 x^2\right) = \sqrt{\frac{\lambda_2}{\pi}}\exp\left(-\frac{\lambda_1^2}{4\lambda_2} - \lambda_1 x - \lambda_2 x^2\right) = \sqrt{\frac{\lambda_2}{\pi}}\exp\left(-\lambda_2\left(x + \frac{\lambda_1}{2\lambda_2}\right)^2\right) \qquad(3.1.19)$$
It is easily seen that equation (3.1.19) is a normal density with mean $-\frac{\lambda_1}{2\lambda_2}$ and variance $\frac{1}{2\lambda_2}$, so that:
$$\mu = -\frac{\lambda_1}{2\lambda_2} \quad\text{and}\quad \sigma^2 = \frac{1}{2\lambda_2} \qquad(3.1.20)$$
In addition, to make the calculations easier, inserting (3.1.19) into (3.1.15) yields:
$$\int_{-\infty}^{\infty} f(x,\lambda)\,dx = 1, \quad \int_{-\infty}^{\infty} x f(x,\lambda)\,dx = m_1 \quad\text{and}\quad \int_{-\infty}^{\infty} x^2 f(x,\lambda)\,dx = m_2$$
In the light of (3.1.20) the constraints convert to:
$$\int_{-\infty}^{\infty} f(x,\lambda)\,dx = 1, \quad -\frac{\lambda_1}{2\lambda_2} = m_1 \quad\text{and}\quad \frac{1}{2\lambda_2} + \left(\frac{\lambda_1}{2\lambda_2}\right)^2 = m_2$$
Hence $\lambda_1$ and $\lambda_2$ have the closed forms:
$$\lambda_1 = -\frac{m_1}{m_2'} \quad\text{and}\quad \lambda_2 = \frac{1}{2 m_2'}, \quad\text{where } m_2' = m_2 - m_1^2 \qquad(3.1.21)$$
Stengos and Wu (2004, 2007) proved that there is an equivalence between the maximum entropy density $f(x,\lambda)$ and the original p.d.f. if it is a member of the exponential family:
$$f(x,\theta) = \exp\left(\ln(a(\theta)) + \ln(b(x)) + \sum_{j=1}^{J} c_j(\theta) g_j(x)\right) \qquad(3.1.22)$$
Comparing (3.1.22) with (3.1.7), one concludes that $-\lambda_o$ corresponds to $\ln(a(\theta))$, $-\lambda_j g_j(x)$ corresponds to $c_j(\theta) g_j(x)$, and $b(x)$ is one. Due to this symmetric relation they concluded that, as long as the density belongs to the exponential family, the parameter estimators based on either MLE or POME will be identical, as follows:
$$\ln L(x_1,\dots,x_n;\theta) = \sum_{i=1}^{n}\ln f(x_i,\theta) = \sum_{i=1}^{n}\ln\left(\exp\left(-\lambda_o - \sum_{j=1}^{J}\lambda_j g_j(x_i)\right)\right) = -n\lambda_o - n\sum_{j=1}^{J}\lambda_j m_j = -n H_d(\lambda)$$
Hence the values that maximize the likelihood function are the same as the values that minimize the dual entropy, and are equivalent to maximizing the constrained entropy function, as long as the distribution belongs to the exponential family; due to this relation one can conclude the uniqueness of the maximum entropy estimates in this case.
Singh (1996) applied the maximum entropy approach to distributions where the regularity conditions do not hold, concluding by Monte Carlo simulation that maximum entropy yielded the least parameter bias for all sample sizes compared to other methods of estimation such as probability weighted moments, maximum likelihood and the method of moments, so that overall maximum entropy offers an alternative method for estimating the parameters of frequency distributions.
3.2 Entropy Estimation Using Sampling m-Spacing
Probability density function estimation has been addressed in the statistical literature by a variety of nonparametric methods (see Beirlant et al. (1997)). This section is concerned with density estimation based on m-spacing.
3.2.1 Entropy Estimation Using Vasicek's Estimator
Let $X_1, X_2, \dots, X_n$ be random variables of size $n \ge 3$, and let $X_{(1)}, X_{(2)}, \dots, X_{(n)}$ denote the corresponding order statistics. The sample entropy can be defined as:
$$H(x) = -E(\ln f(x)) = -E\left(\ln\frac{dF(x)}{dx}\right) \approx -E\left(\ln\frac{dF_n(x_{(i)})}{dx}\right)$$
where $F_n(x)$ is the following empirical distribution function:
$$F_n(x) = \frac{\text{number of observations} \le x}{n}$$
According to Mood et al. (1976) it is proved that:
$$P\left(\sup_{x}|F_n(x) - F(x)| \underset{n\to\infty}{\longrightarrow} 0\right) = 1$$
Vasicek (1976) estimated the slope of the cumulative function by replacing the cumulative function with the empirical distribution function and the differential operator with the difference operator (see Mao (2001)); therefore the slope takes the following formula:
$$\frac{\widehat{dF(x)}}{dx} = \frac{F_n(x_{(i+m)}) - F_n(x_{(i-m)})}{x_{(i+m)} - x_{(i-m)}} = \frac{\frac{i+m}{n} - \frac{i-m}{n}}{x_{(i+m)} - x_{(i-m)}} = \frac{2m}{n(x_{(i+m)} - x_{(i-m)})} \qquad(3.2.1)$$
where $m$ is a positive integer chosen by the user, well known as the window size. Choosing $m$ is a serious problem; typically it is recommended to pick the $m$ that gives the least mean square error (MSE) for each sample size. Hence, substituting (3.2.1) into $H(x)$ yields Vasicek's (1976) entropy estimator:
$$H(x)_{vas} = \frac{1}{n}\sum_{i=1}^{n}\ln\left(\frac{n}{2m}(x_{(i+m)} - x_{(i-m)})\right)$$
where $x_{(i-m)} = x_{(1)}$ for $i \le m$ and $x_{(i+m)} = x_{(n)}$ for $i > n - m$. Indeed (3.2.1) can be considered an application of the mean value theorem: if $f(x)$ is continuous on the closed interval $[a,b]$ and differentiable inside the interval, then there exists $c$ in the open interval $(a,b)$ such that:
$$\frac{df(c)}{dx} = \frac{f(b) - f(a)}{b - a}$$
Because $H(x)_{vas}$ always has a boundary bias when the estimation is based on spacings with $i \le m$ or $i > n - m$, Vasicek (1976) recommended that $m$ should be less than $\frac{n}{2}$ to reduce the boundary bias; further, he proved that $H(x)_{vas}$ is an asymptotically unbiased estimator for $H(x)$ when $n \to \infty$, $m \to \infty$ and $\frac{m}{n} \to 0$. To see this point it is required to decompose $H(x)_{vas}$ into three parts in order to study its behavior, as follows:
$$H(x)_{vas} = -\frac{1}{n}\sum_{i=1}^{n}\ln f(x_i) - U_{mn} - V_{mn} \qquad(3.2.2)$$
where:
$$U_{mn} = -\frac{1}{n}\sum_{i=1}^{n}\ln\left\{\frac{n}{2m}\left(F(X_{(i+m)}) - F(X_{(i-m)})\right)\right\} \quad\text{and}\quad V_{mn} = \frac{1}{n}\sum_{i=1}^{n}\ln\left\{\frac{F(X_{(i+m)}) - F(X_{(i-m)})}{f(x_i)(X_{(i+m)} - X_{(i-m)})}\right\}$$
To double check (3.2.2):
$$-\frac{1}{n}\sum_{i=1}^{n}\ln f(x_i) - U_{mn} - V_{mn} = -\frac{1}{n}\sum_{i=1}^{n}\ln f(x_i) + \frac{1}{n}\sum_{i=1}^{n}\ln\left\{\frac{n}{2m}\left(F(X_{(i+m)}) - F(X_{(i-m)})\right)\right\} - \frac{1}{n}\sum_{i=1}^{n}\ln\left\{\frac{F(X_{(i+m)}) - F(X_{(i-m)})}{f(x_i)(X_{(i+m)} - X_{(i-m)})}\right\}$$
$$= \frac{1}{n}\sum_{i=1}^{n}\left\{-\ln f(x_i) + \ln\frac{n}{2m} + \ln\left(F(X_{(i+m)}) - F(X_{(i-m)})\right) - \ln\left(F(X_{(i+m)}) - F(X_{(i-m)})\right) + \ln f(x_i) + \ln(X_{(i+m)} - X_{(i-m)})\right\}$$
$$= \frac{1}{n}\sum_{i=1}^{n}\ln\left(\frac{n}{2m}(X_{(i+m)} - X_{(i-m)})\right) \qquad(3.2.3)$$
It is clear that (3.2.3) equals (3.2.2); hence, to study the behavior of $H(x)_{vas}$, it is enough to study separately the properties of the three parts in (3.2.2). First, the expected value of $-\frac{1}{n}\sum_{i=1}^{n}\ln f(x_i)$ equals $H(x)$ at large sample sizes by the law of large numbers.
Definition (3.2.1): The weak law of large numbers states that if $n > \frac{\sigma^2}{\varepsilon^2\eta}$ then the sample average is close to the population average:
$$P\left(\left|\frac{1}{n}\sum_{i=1}^{n} y_i - \mu\right| < \varepsilon\right) \ge 1 - \eta$$
where $\mu$ and $\sigma^2$ refer to the mean and the variance of the population respectively, and $(\varepsilon, \eta)$ are any two specified numbers satisfying $\varepsilon > 0$ and $0 < \eta < 1$; for more details see Mood et al. (1976).
So if we denote $y_i = \ln f(x_i)$, then according to definition (3.2.1), for large sample sizes the absolute difference between $\bar{y}$ and $E(y_i)$ will be small with high probability. The other two parts represent the sources of noise; fortunately it is proven under some conditions that the two parts approximately vanish. For a fixed sample size the effect of $V_{mn}$ decreases with decreasing values of $m$, since for any interval $(x_{(i-m)}, x_{(i+m)})$ there exists $x' \in (x_{(i-m)}, x_{(i+m)})$ such that:
$$\frac{F(x_{(i+m)}) - F(x_{(i-m)})}{x_{(i+m)} - x_{(i-m)}} = f(x')$$
Therefore decreasing the window size decreases the effect of $V_{mn}$, as follows:
$$V_{mn} = \frac{1}{n}\sum_{i=1}^{n}\ln\left\{\frac{F(X_{(i+m)}) - F(X_{(i-m)})}{f(x_i)(X_{(i+m)} - X_{(i-m)})}\right\} = \frac{1}{n}\sum_{i=1}^{n}\ln\frac{f(x')}{f(x_i)} \to 0$$
Also, $E(U_{mn})$ can be written as:
$$E(U_{mn}) = -E\left(\frac{1}{n}\sum_{i=1}^{n}\ln\left\{\frac{n}{2m}\left(F(X_{(i+m)}) - F(X_{(i-m)})\right)\right\}\right) = -\frac{1}{n}\sum_{i=1}^{n}E\ln\left\{F(X_{(i+m)}) - F(X_{(i-m)})\right\} - \ln(n) + \ln(2m)$$
$$= -\frac{1}{n}\sum_{i=1}^{m}E\ln\left\{F(X_{(i+m)}) - F(X_{(1)})\right\} - \frac{1}{n}\sum_{i=m+1}^{n-m}E\ln\left\{F(X_{(i+m)}) - F(X_{(i-m)})\right\} - \frac{1}{n}\sum_{i=n-m+1}^{n}E\ln\left\{F(X_{(n)}) - F(X_{(i-m)})\right\} - \ln(n) + \ln(2m)$$
Suppose $h_{(i)} = F(X_{(i)})$; since $h_{(i)}$ has a uniform (0,1) distribution (see Mood et al. (1976)), the joint distribution of $(h_{(i)}, h_{(i+j)})$ takes:
$$f(h_{(i)}, h_{(i+j)}) = \frac{n!\, h_{(i)}^{i-1}(h_{(i+j)} - h_{(i)})^{j-1}(1 - h_{(i+j)})^{n-i-j}}{(i-1)!(j-1)!(n-i-j)!}, \quad 0 \le h_{(i)} \le h_{(i+j)} \le 1$$
Recognizing the p.d.f. of $c_{(o)} = h_{(i+j)} - h_{(i)}$ requires obtaining the joint distribution of $c_{(o)}$ and $h_{(i)}$, which yields:
$$f(c_{(o)}, h_{(i)}) = \frac{n!\, h_{(i)}^{i-1} c_{(o)}^{j-1}(1 - c_{(o)} - h_{(i)})^{n-i-j}}{(i-1)!(j-1)!(n-i-j)!}, \quad 0 \le c_{(o)} \le 1$$
Hence the marginal distribution of $c_{(o)}$ is:
$$f(c_{(o)}) = \frac{n!\, c_{(o)}^{j-1}}{(i-1)!(j-1)!(n-i-j)!}\int_{0}^{1-c_{(o)}} h_{(i)}^{i-1}(1 - c_{(o)} - h_{(i)})^{n-i-j}\,dh_{(i)}$$
Using the binomial expansion and taking the following identity into consideration:
$$\sum_{t=0}^{n-i-j}\frac{(-1)^t (n-i-j)!}{t!\,(n-i-j-t)!\,(t+i)} = \frac{(i-1)!\,(n-i-j)!}{(n-j)!}$$
it follows that $c_{(o)}$ has a Beta$(j, n-j+1)$ distribution. Hence $E(\ln c_{(o)})$, according to Kendall and Stuart (1969), can be computed by first obtaining $E(\ln(1-c_{(o)}))$. Starting from:
$$\int_{0}^{1} c_{(o)}^{j-1}(1-c_{(o)})^{n-j}\,dc_{(o)} = B(j, n-j+1) \qquad(3.2.4)$$
obtaining the derivative of (3.2.4) with respect to $n$ gives:
$$\int_{0}^{1} c_{(o)}^{j-1}(1-c_{(o)})^{n-j}\ln(1-c_{(o)})\,dc_{(o)} = \frac{dB(j, n-j+1)}{dn}$$
Hence $E(\ln(1-c_{(o)}))$ will be:
$$E(\ln(1-c_{(o)})) = \frac{1}{B(j, n-j+1)}\frac{dB(j, n-j+1)}{dn} = \frac{d\ln B(j, n-j+1)}{dn} = \psi(n-j+1) - \psi(n+1)$$
where $\psi(x)$ is the digamma function, which has the following formula:
$$\psi(x) = \Gamma'(x)/\Gamma(x)$$
Obtaining $E(\ln c_{(o)})$ requires calculating the derivative of (3.2.4) with respect to $j$:
$$\int_{0}^{1} c_{(o)}^{j-1}(1-c_{(o)})^{n-j}[\ln(c_{(o)}) - \ln(1-c_{(o)})]\,dc_{(o)} = \frac{dB(j, n-j+1)}{dj} \qquad(3.2.5)$$
Hence (3.2.5) gives:
$$E(\ln c_{(o)}) = E(\ln(1-c_{(o)})) + \frac{1}{B(j, n-j+1)}\frac{dB(j, n-j+1)}{dj} = \psi(n-j+1) - \psi(n+1) + \psi(j) - \psi(n-j+1) = \psi(j) - \psi(n+1)$$
Hence $E(U_{mn})$ can be computed as:
$$E(U_{mn}) = -\frac{1}{n}\sum_{i=1}^{m}\{\psi(i+m-1) - \psi(n+1)\} - \frac{1}{n}\sum_{i=m+1}^{n-m}\{\psi(2m) - \psi(n+1)\} - \frac{1}{n}\sum_{i=n-m+1}^{n}\{\psi(n+m-i) - \psi(n+1)\} - \ln(n) + \ln(2m)$$
Arranging the terms:
$$E(U_{mn}) = \psi(n+1) - \ln(n) + \ln(2m) - \left(1 - \frac{2m}{n}\right)\psi(2m) - \frac{2}{n}\sum_{i=1}^{m}\psi(i+m-1)$$
Using the fact that for large $x$ (see Pardo (2003)):
$$\psi(x) \approx \ln(x) - \frac{1}{2x} \qquad(3.2.6)$$
Taking (3.2.6) into view, $E(U_{mn})$ will be:
$$E(U_{mn}) \approx \frac{1}{2n} + \frac{2m}{n}\ln(2m) + \left(1 - \frac{2m}{n}\right)\frac{1}{4m} - \frac{2}{n}\sum_{i=1}^{m}\psi(i+m-1) \qquad(3.2.7)$$
According to (3.2.7), $E(U_{mn})$ tends to zero under the assumptions $n \to \infty$, $m \to \infty$ and $\frac{m}{n} \to 0$, so that Vasicek (1976) concluded that under these conditions $H(x)_{vas}$ is an asymptotically unbiased estimator of $H(x)$. Unfortunately there are some problems in this proof, which were treated by Song (2000).
3.2.2 Entropy Estimation Using Correa's Estimator
Correa (1995) gave another estimator of entropy based on m-spacing. He claimed that his estimator can be regarded as a modification of Vasicek's (1976) estimator, and concluded by simulation that it has a smaller mean square error than Vasicek's (1976) estimator. His idea is based on estimating $\frac{dF(x_{(i)})}{dx_{(i)}}$ over the interval $(x_{(i-m)}, x_{(i+m)})$; but instead of taking only the upper and lower ends of the interval $(x_{(i-m)}, x_{(i+m)})$, he used all $2m+1$ points in the interval by applying ordinary least squares (OLS) to the following model:
$$F_n(x_{(j)}) = B_{0i} + B_{1i} x_{(j)} + \epsilon_{ij}, \quad j = i-m,\dots,i+m, \quad i = 1,\dots,n$$
with $x_{(j)} = x_{(1)}$ when $j < 1$ and $x_{(j)} = x_{(n)}$ when $j > n$, where $B_{1i}$ can be regarded as the slope of the empirical distribution function $F_n(x_{(j)})$ on the observations of the sample within the interval $(x_{(i-m)}, x_{(i+m)})$. Hence it is required to fit $n$ models, each of which can be written as:
$$\frac{j}{n} = b_{0i} + b_{1i} x_{(j)}, \quad j = i-m,\dots,i+m$$
where:
where, by the usual OLS formulas and using the fact that the mean of $j/n$ over $j = i-m, \ldots, i+m$ is $i/n$,

$$b_{1i} = \frac{\sum_{j=i-m}^{i+m}\left(x_{(j)} - \bar{x}_{(i)}\right)\left(\dfrac{j}{n} - \dfrac{i}{n}\right)}{\sum_{j=i-m}^{i+m}\left(x_{(j)} - \bar{x}_{(i)}\right)^2} = \frac{\sum_{j=i-m}^{i+m}\left(x_{(j)} - \bar{x}_{(i)}\right)(j-i)}{n\sum_{j=i-m}^{i+m}\left(x_{(j)} - \bar{x}_{(i)}\right)^2} \qquad (3.2.8)$$
where

$$\bar{x}_{(i)} = \frac{1}{2m+1}\sum_{j=i-m}^{i+m} x_{(j)}$$
Finally the estimate of the entropy will be:

$$\hat{H}_{corr}(x) = -\frac{1}{n}\sum_{i=1}^{n}\ln(b_{1i})$$
A numerical comparison among $\hat{H}_{corr}(x)$, $\hat{H}_{van}(x)$ and $\hat{H}_{vas}(x)$ was performed by Correa (1995) with sample sizes 10, 20 and 50, each with $m$ equal to 1, 2, 3 and 4, for three distributions: N(0,1), Uniform(0,1) and Exp(1) ($\hat{H}_{van}(x)$ is not considered in this study). He concluded that $\hat{H}_{corr}(x)$ has the smallest mean squared error and is not affected by the level of the window size $m$.
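A minimal sketch of Correa's estimator as reconstructed above, in Python. The padding rule $x_{(j)} = x_{(1)}$ for $j < 1$ and $x_{(j)} = x_{(n)}$ for $j > n$ follows the model statement; the function name is ours, not Correa's.

```python
import math

def correa_entropy(sample, m):
    """Correa's m-spacing entropy estimate: H = -(1/n) * sum_i ln(b_1i)."""
    x = sorted(sample)
    n = len(x)

    def xs(j):  # x_(j) with boundary padding, 1-based index j
        return x[min(max(j, 1), n) - 1]

    total = 0.0
    for i in range(1, n + 1):
        window = range(i - m, i + m + 1)
        xbar = sum(xs(j) for j in window) / (2 * m + 1)
        num = sum((xs(j) - xbar) * (j - i) for j in window)  # slope numerator
        den = n * sum((xs(j) - xbar) ** 2 for j in window)   # slope denominator
        total += math.log(num / den)
    return -total / n
```

For an evenly spaced sample on (0,1) the estimate is close to the true uniform entropy 0, and rescaling the data by a factor $c$ shifts the estimate by exactly $\ln c$, as the slope $b_{1i}$ scales by $1/c$.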
3.2.3 Entropy Estimation Using Wieczorkowski and Grzegorzewski's Estimators
Wieczorkowski and Grzegorzewski (1999) proposed three new estimators; this study will concentrate only on their modifications of the estimators of Vasicek (1975) and Correa (1995). First, one can conclude from (3.2.5) that:
$$E(\hat{H}_{vas}(x)) = H(x) + E(U_{mn})$$

so that

$$E(U_{mn}) = \ln(n) - \ln(2m) + \left(1-\frac{2m}{n}\right)\psi(2m) - \psi(n+1) + \frac{2}{n}\sum_{i=1}^{m}\psi(i+m-1)$$
The bias of $\hat{H}_{vas}(x)$ can therefore be written as:

$$E(\hat{H}_{vas}(x)) - H(x) = \ln(n) - \ln(2m) + \left(1-\frac{2m}{n}\right)\psi(2m) - \psi(n+1) + \frac{2}{n}\sum_{i=1}^{m}\psi(i+m-1)$$
Wieczorkowski and Grzegorzewski (1999) decided to correct the bias of $\hat{H}_{vas}(x)$ by subtracting $E(U_{mn})$ from $\hat{H}_{vas}(x)$ as follows:
$$\hat{H}_{w1}(x) = \hat{H}_{vas}(x) - \ln(n) + \ln(2m) - \left(1-\frac{2m}{n}\right)\psi(2m) + \psi(n+1) - \frac{2}{n}\sum_{i=1}^{m}\psi(i+m-1) \qquad (3.2.9)$$
Actually (3.2.9) can be regarded as a corrected version of Vasicek's (1975) estimator; thus Wieczorkowski and Grzegorzewski (1999) were surprised that Vasicek (1975) had not used $\hat{H}_{w1}(x)$. Secondly, they proposed an estimator which modifies Correa's (1995) estimator by the jackknife method; their idea consists of the following steps:
1. Let $\hat{H}^{(-i)}_{corr}(x)$ be the estimator of $H(x)$ obtained after removing the $i$-th observation; it is thus required to calculate $\hat{H}^{(-i)}_{corr}(x)$ $n$ times.
2. Obtain $\bar{H}_{corr}(x)$, which has the following formula:
$$\bar{H}_{corr}(x) = \frac{1}{n}\sum_{i=1}^{n}\hat{H}^{(-i)}_{corr}(x)$$
3. Calculate the jackknife estimator $\hat{H}_{w2}(x)$, which can be expressed as:
$$\hat{H}_{w2}(x) = n\,\hat{H}_{corr}(x) - (n-1)\,\bar{H}_{corr}(x)$$
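The three steps can be sketched generically. The wrapper below is ours (not from the thesis); it is demonstrated on the plug-in variance, whose bias is exactly of the form $a/n$, so the jackknife removes it completely.

```python
import statistics

def jackknife(estimator, sample):
    """Steps 1-3: n leave-one-out estimates, their mean, then n*full - (n-1)*mean."""
    n = len(sample)
    full = estimator(sample)                                          # estimate on all data
    loo = [estimator(sample[:i] + sample[i + 1:]) for i in range(n)]  # step 1
    loo_mean = sum(loo) / n                                           # step 2
    return n * full - (n - 1) * loo_mean                              # step 3

def plugin_var(xs):
    m = sum(xs) / len(xs)
    return sum((v - m) ** 2 for v in xs) / len(xs)  # biased: divides by n

data = [1.0, 2.0, 4.0, 7.0]
print(jackknife(plugin_var, data), statistics.variance(data))  # both 7.0
```

Jackknifing the plug-in variance reproduces the unbiased sample variance exactly, a classical illustration of the bias reduction exploited for $\hat{H}_{w2}(x)$.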
It is obvious that $\hat{H}_{w2}(x)$ inherits the properties of $\hat{H}_{corr}(x)$: if $\hat{H}_{corr}(x)$ is an unbiased estimator then so are $\bar{H}_{corr}(x)$ and $\hat{H}_{w2}(x)$; however, if $\hat{H}_{corr}(x)$ is biased then $\hat{H}_{w2}(x)$ is biased as well, but less so. To prove this point, note that according to Wasserman (2006) the bias of many statistics can often be expressed as:

$$\operatorname{bias}(\hat{H}(x)) = \frac{a}{n} + \frac{b}{n^{2}} + O\!\left(\frac{1}{n^{3}}\right) \qquad (3.2.10)$$
where $O(n^{-k})$, according to Theil (1971), denotes a sequence of terms whose leading (dominant) term is of order $n^{-k}$, i.e. $n^{k}O(n^{-k})$ is a bounded sequence for large $n$. When (3.2.10) holds, it follows that:
$$\operatorname{bias}(\bar{H}_{corr}(x)) = \frac{a}{n-1} + \frac{b}{(n-1)^{2}} + O\!\left(\frac{1}{n^{3}}\right)$$
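Combining (3.2.10) with the last expansion, the standard jackknife computation completes the argument:

```latex
\operatorname{bias}(\hat{H}_{w2}(x))
  = n\,\operatorname{bias}(\hat{H}_{corr}(x)) - (n-1)\,\operatorname{bias}(\bar{H}_{corr}(x))
  = b\left(\frac{1}{n} - \frac{1}{n-1}\right) + O\!\left(\frac{1}{n^{2}}\right)
  = -\frac{b}{n(n-1)} + O\!\left(\frac{1}{n^{2}}\right)
```

so the $a/n$ term cancels and the bias of $\hat{H}_{w2}(x)$ is of order $1/n^{2}$ rather than $1/n$.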
Hence $\hat{H}_{w2}(x)$ works better than $\hat{H}_{corr}(x)$ because it has less bias; finally, $\hat{H}_{vas}(x)$ does not possess good statistical properties.
3.3 Goodness of Fit Tests Based on Maximum Entropy
The entropy of a random variable plays a fundamental role not only in information theory but also in the testing of hypotheses; indeed there are two different approaches to goodness of fit via maximum entropy: tests based on likelihood tests and tests based on sample $m$-spacings.
3.3.1 Goodness of Fit Based on Likelihood Tests
Stengos and Wu (2004, 2007) derived general distribution tests based on the method of maximum entropy density. Their tests rest on the fact that there is a one-to-one relation between the maximum entropy density $f(x, \lambda)$ and the probability density function of a specified distribution; in other words, they tested whether the maximum entropy density corresponding to the distribution under the null hypothesis, the normal distribution, is suitable for representing the given sample, via testing $\lambda_j = 0,\; j = 3, \ldots, J$. In practice $J$ will be small; in our case $J = 4$. They provided four flexible tests based on the Lagrange multiplier principle, and these reduce, surprisingly, to test statistics with a simple closed form. It is also worth noting that the Wald test (WT) and the likelihood ratio (LR) test can be used for testing $\lambda_j = 0,\; j = 3, 4$.
1. Tests Based on the Third and the Fourth Moments
Stengos and Wu (2004) claimed that one can test a distribution's maximum entropy density via the third and the fourth moments as follows:

$$f_1(x, \lambda) = \exp\left(-\lambda_0 - \sum_{j=1}^{4}\lambda_j x^{j}\right)$$
Clearly $f_1(x,\lambda)$ is integrable over the real line only if the dominant term in the exponent, $x^4$, is of even degree; otherwise $f_1(x,\lambda)$ will explode as $x \to \pm\infty$. The second condition is that the coefficient associated with the dominant term, which is of even degree by the first condition, must be positive; otherwise $f_1(x,\lambda)$ will again explode as $x \to \pm\infty$. For testing the normality of the sample it is required to test:
$$H_0: \lambda_j = 0 \quad \text{vs} \quad H_1: \lambda_j = \hat{\lambda}_j, \qquad j = 3, 4$$

where $\hat{\lambda}$ denotes the maximum entropy estimator of $\lambda$.
a) According to Stengos and Wu (2004), to run the Lagrange multiplier test it is required to obtain the score function and the information matrix under the null hypothesis (the sample follows the standard normal), as follows:
$$S_1(\lambda^{\circ}) = \left.\frac{d}{d\lambda_j}\sum_{i=1}^{n}\ln\exp\left(-\lambda_0 - \sum_{j=1}^{4}\lambda_j x_i^{j}\right)\right|_{\lambda=\lambda^{\circ}} = \left.n\left\{-\frac{d}{d\lambda_j}\ln\left(\int_{-\infty}^{\infty}\exp\left(-\sum_{j=1}^{4}\lambda_j x^{j}\right)dx\right) - m_j\right\}\right|_{\lambda=\lambda^{\circ}}$$

where $m_j = \frac{1}{n}\sum_{i=1}^{n} x_i^{j}$ denotes the $j$-th sample moment, so that

$$S_1(\lambda^{\circ}) = n\left(E^{\circ}(x) - m_1,\; E^{\circ}(x^2) - m_2,\; E^{\circ}(x^3) - m_3,\; E^{\circ}(x^4) - m_4\right)'$$
where $E^{\circ}(x^{j})$ refers to the expectation of $x^{j}$ under the standard normal; using (3.1.21), $E^{\circ}(x^{j})$ has the following formula:

$$E^{\circ}(x^{j}) = \int_{-\infty}^{\infty} x^{j}\exp\left(-\lambda^{\circ}_N - 0.5x^{2}\right)dx$$
where

$$\lambda^{\circ}_N = \ln\left(\int_{-\infty}^{\infty}\exp(-0.5x^{2})\,dx\right)$$
Since it is required to test only whether $\lambda_3$ and $\lambda_4$ equal zero, the score function under the standard normal becomes:

$$S_1(\lambda^{\circ}) = n\left(0,\; 0,\; E^{\circ}(x^3) - m_3,\; E^{\circ}(x^4) - m_4\right)' = n\left(0,\; 0,\; -m_3,\; 3 - m_4\right)'$$
The information matrix $I_{4\times 4}(\lambda^{\circ})$ associated with testing normality can be expressed as:

$$I(\lambda^{\circ}) = \left\{-nE\left[\frac{d^{2}\ln f_1(x,\lambda)}{d\lambda_i\, d\lambda_j}\right]\right\}_{\lambda=\lambda^{\circ}} = n\left\{E^{\circ}(x^{i+j}) - E^{\circ}(x^{i})E^{\circ}(x^{j})\right\} = n\begin{pmatrix} 1 & 0 & 3 & 0 \\ 0 & 2 & 0 & 12 \\ 3 & 0 & 15 & 0 \\ 0 & 12 & 0 & 96 \end{pmatrix}, \qquad 1 \le i, j \le 4$$
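The entries of this matrix follow from the standard normal moments $E(x^{k}) = 0$ for odd $k$ and $(k-1)!!$ for even $k$; the short check below rebuilds the matrix from that rule (illustrative code of ours, not from the thesis).

```python
from fractions import Fraction

def normal_moment(k):
    """E(x^k) under N(0,1): 0 for odd k, double factorial (k-1)!! for even k."""
    if k % 2:
        return Fraction(0)
    r = Fraction(1)
    for j in range(k - 1, 0, -2):
        r *= j
    return r

# I(lambda)/n with entries E(x^{i+j}) - E(x^i) E(x^j), 1 <= i, j <= 4
M = [[normal_moment(i + j) - normal_moment(i) * normal_moment(j)
      for j in range(1, 5)] for i in range(1, 5)]
for row in M:
    print([int(v) for v in row])
```

The loop prints the four rows [1, 0, 3, 0], [0, 2, 0, 12], [3, 0, 15, 0], [0, 12, 0, 96], matching the matrix above.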
According to Engle (1984) the Lagrange multiplier test takes the following form:

$$LM_1' = S_1(\lambda^{\circ})'\, I^{-1}(\lambda^{\circ})\, S_1(\lambda^{\circ}) = n\left(\frac{m_3^{2}}{6} + \frac{(m_4-3)^{2}}{24}\right)$$
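The closed form can be transcribed directly into code. Since the derivation assumes a standard normal null, the sketch below (our naming and an assumption of ours for convenience) standardizes the sample with its own first two moments before computing $m_3$ and $m_4$.

```python
import math

def lm1_statistic(sample):
    """LM_1' = n * (m3^2/6 + (m4 - 3)^2/24) on the standardized sample."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((v - mean) ** 2 for v in sample) / n
    z = [(v - mean) / math.sqrt(var) for v in sample]
    m3 = sum(v ** 3 for v in z) / n   # third sample moment (skewness)
    m4 = sum(v ** 4 for v in z) / n   # fourth sample moment (kurtosis)
    return n * (m3 ** 2 / 6 + (m4 - 3) ** 2 / 24)
```

For a symmetric two-point sample the statistic has an exact hand-checkable value: with $z = \pm 1$ one gets $m_3 = 0$ and $m_4 = 1$, so $LM_1' = n(1-3)^{2}/24 = n/6$.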
Although $LM_1'$ has a simple closed form, it tests only whether the sample follows the standard normal, so the sample must be standardized before the test is applied. Fortunately, $LM_1$ can also be derived under the normal($\hat{\mu}$, $\hat{\sigma}^{2}$), where $\hat{\mu}$ and $\hat{\sigma}^{2}$ denote the maximum entropy estimators of the normal distribution's parameters. Hence the score function under normal($\hat{\mu}$, $\hat{\sigma}^{2}$) will be:

$$S_1(\hat{\lambda}) = n\left(0,\; 0,\; \hat{E}(x^3) - m_3,\; \hat{E}(x^4) - m_4\right)'$$
where $\hat{E}(x^{j})$ refers to the expectation of $x^{j}$ under normal($\hat{\mu}$, $\hat{\sigma}^{2}$); using (3.1.21) it can be written as:

$$\hat{E}(x^{j}) = \int_{-\infty}^{\infty} x^{j} f_N(x,\hat{\lambda})\,dx = \int_{-\infty}^{\infty} x^{j}\exp\left(-\hat{\lambda}_N - \frac{(x-m_1)^{2}}{2(m_2-m_1^{2})}\right)dx$$
where

$$\hat{\lambda}_N = \ln\left(\int_{-\infty}^{\infty}\exp\left(-\frac{(x-m_1)^{2}}{2(m_2-m_1^{2})}\right)dx\right)$$
The information matrix under normal($\hat{\mu}$, $\hat{\sigma}^{2}$) can be expressed as:

$$I(\hat{\lambda}) = n\left\{\hat{E}(x^{i+j}) - \hat{E}(x^{i})\hat{E}(x^{j})\right\}, \qquad 1 \le i, j \le 4$$
Finally the Lagrange multiplier test will be:

$$LM_1 = S_1(\hat{\lambda})'\, I^{-1}(\hat{\lambda})\, S_1(\hat{\lambda})$$

One can observe that $LM_1'$ is a special case of $LM_1$.
b) For testing $H_0: \lambda_h = 0,\; h = 3, 4$ via the Wald test (WT), it is first required to partition $\lambda$ as $\lambda' = (\lambda_1', \lambda_2')$, where