

    Cairo University

    Institute of Statistical Studies & Research

    Department of Mathematical Statistics

    Tests Based on Sampling Entropy

    By

    Mohamed Soliman Abdallah

    Supervised By

Prof. Samir Kamel Ashour
Professor of Mathematical Statistics,
Department of Mathematical Statistics,
I.S.S.R., Cairo University

Dr. Esam Aly Amin
Lecturer of Mathematical Statistics,
Department of Mathematical Statistics,
I.S.S.R., Cairo University


    A Thesis Submitted to the Department of Mathematical Statistics

    In Partial Fulfillment of the Requirements for the Degree of

    MASTER OF SCIENCE IN

    Mathematical Statistics

    2009


    Acknowledgement

    All Gratitude due to ALLAH

I would like to extend my appreciation to my advisor Prof. Samir Kamel Ashour for his support, guidance, and patience during the completion of this study and my graduate studies; I feel very lucky to have worked with such people. Special thanks to Dr. Esam Ali Amin for his encouragement to complete this thesis.

I would like to express my gratitude to my parents, my brother, my sister, Mr Ebrahim and Mr Tag Eldean, and everyone who helped me throughout this work.

I cannot forget to thank Wikipedia.com, which made the research process easier and faster than before.

    Summary


Goodness of fit tests play a vital role in scientific and academic work, so many such tests have been suggested by researchers over the previous decades, each with its own philosophy behind its derivation. In particular, this study deals in some detail with tests based on sampling Shannon's (1948) entropy.

Mainly there are two schools that employ the sampling entropy for goodness of fit. On one hand is the nonparametric approach, which contains two routes: first, Vasicek's (1975) approach, which tests the sample's distribution by considering the ratio between the observed entropy and the expected entropy; second, the approach of Arizono and Ohta (1989), which tests the sample's distribution by considering the difference between the observed entropy and the expected entropy. On the other hand is the parametric approach of Stengos and Wu (2004), (2007), who proposed four flexible tests derived via the Lagrange multiplier test based on the principle of maximum entropy proposed by Jaynes (1957).

In this study we concentrate on the parametric approach by deriving the Stengos and Wu (2004), (2007) tests via the Wald and likelihood ratio tests instead of the Lagrange multiplier test. In addition, we carry out simulated comparisons among entropy estimators, which are presented only in this essay, as well as numerical comparisons among all tests based on Shannon's (1948) entropy. Finally, we propose some academic points that can be pursued in future research.

    Table of Contents


    Pages

    Chapter (1) Introduction 1

    Chapter (2) Definitions and Notation

    2.1 Estimation Theory 4

    2.2 Methods of Estimation 8

    2.2.1 Method of Moments 8

    2.2.2 Method of Maximum Likelihood 9

    2.2.3 Method of Least Square 10

    2.3 Hypotheses Testing 12

    2.3.1 Introduction to Hypotheses Testing 12

    2.3.2 Tests Based on Likelihood Function 13

    2.4 Measures of Information 14

2.4.1 Shannon Entropy and Related Measures 19

2.4.2 Kullback Leibler Divergence (Relative Entropy) 22

Chapter (3) Goodness of Fit Based on Maximum Entropy 25

3.1 Parameters Estimation Based on Maximum Entropy 25

    3.2 Entropy Estimation Using Sampling m-Spacing 35

3.2.1 Entropy Estimation Using Vasicek's Estimator 35

3.2.2 Entropy Estimation Using Correa's Estimator 41

3.2.3 Entropy Estimation Using Wieczorkowski et al.'s Estimators 42

    3.3 Goodness of Fit Based on Maximum Entropy 45

    3.3.1 Goodness of Fit Based on Likelihood Tests 46

    3.3.2 Goodness of Fit Based on Sampling m-Spacing 58


Chapter (4) Monte Carlo Simulation of the Entropy Tests 67

4.1 Simulated Results for the Performance of Entropy Estimators 67

    4.2 Power Comparisons Among Tests Based on Sampling Entropy 69

    4.2.1 Simulated Results for Testing the Normality 69

    4.2.2 Simulated Results for Testing the Uniformity 73

4.2.3 Simulated Results for Testing the Exponentiality 74

    4.3 Extensions 76

    Appendices 77

    Appendix(1): Tables 78

    Appendix(2): Programs 128

    References 160


    Chapter (I)

    Introduction

Testing the distribution of a sample has long been an interesting issue in the literature; it is considered a keyword in statistical modeling and potentially useful for developing statistical methodology. In particular, this essay is concerned with tests based on the sample Shannon (1948) entropy.

Entropy as a statistical concept was formulated by Shannon (1948) to measure the uncertainty, or the size of the information, in a sample, such that increasing entropy denotes less information and more uncertainty. In addition, Kullback and Leibler (KL) (1951) proposed an indicator to measure the divergence between two samples by comparing the amount of information that can be obtained from each sample, so that high values of KL refer to a wide divergence between the two samples and vice versa.

Jaynes (1957) utilized the Shannon entropy and proposed a flexible tool for estimating the probabilities of events using prior information, for instance the average or the variance of the events; Singh and Rajagopal (1986) then extended this tool for estimating the parameters of frequency distributions.

Vasicek (1975) discovered a new route for testing normality based on sampling entropy. The main problem he faced was how to estimate the sampling entropy; he preferred to estimate the entropy function using m-spacing. Since then, sampling entropy has become a well-known topic reported in a variety of fields, so that there are four different entropy estimators based on m-spacing besides Vasicek's (1975) estimator; this study will concentrate on Correa's (1995) estimator and the two estimators of Wieczorkowski and Grzegorzewski (1999).

Dudewicz and Van der Meulen (1981) extended Vasicek's (1975) approach for testing uniformity, and Gokhale (1983) applied Vasicek's (1975) idea to a number of distributions. Taufer (2002) proposed a new idea for testing exponentiality via the two transformations proposed by Seshadri and Csorgo (1969): his idea applies either of the two transformations to the data and then applies Dudewicz and Van der Meulen's (1981) test, so that if the transformed data are really uniform one can conclude that the original data follow the exponential distribution, and vice versa.

Arizono and Ohta (1989) proposed an interesting idea for testing normality by utilizing KL to test the divergence between the observed sample and the expected sample under the null hypothesis. Moreover, Ebrahimi and Habibullah (1991) applied the KL approach to the exponential distribution, and Mao (2002) applied the same approach to multivariate distributions.

Stengos and Wu (2004), (2007) proposed another way of testing normality based on entropy via the Lagrange multiplier test: they used Jaynes' (1957) idea to derive four flexible normality tests. This idea can be regarded as the parametric approach because it does not need to estimate the entropy function.

The main purposes of this study can be summarized as follows:

1. Reviewing both the parametric and the nonparametric goodness of fit tests based on sampling entropy.

2. Instead of deriving only the Lagrange multiplier test for testing normality as Stengos and Wu (2004), (2007) did, deriving the Wald and likelihood ratio tests for testing normality as well, and making a comparison between the three tests.


3. Investigating numerically the performance of the entropy estimators based on m-spacing.

4. Carrying out a numerical comparison among the nonparametric tests on one hand and the parametric tests on the other.

Indeed, the components of this study are organized as follows:

1. The second chapter is divided into four parts: the first part discusses some topics in estimation theory, the second part shows in brief some methods of estimation, the third part concentrates on hypothesis testing and some common tests that will be used in this study, and finally the fourth part deals with a review of Shannon's (1948) entropy and some related measures of information.

2. The third chapter is essentially divided into three parts: the first part focuses on estimation based on the principle of maximum entropy, the second part is concerned with some estimators of entropy based on m-spacing, and the third part explains both the parametric tests and the nonparametric tests based on sampling entropy.

3. Finally, the fourth chapter contains comments on the numerical results concluded from the Monte Carlo simulation, as well as suggestions for future lines of academic research.


    Chapter (II)

    Definitions and Notation

This chapter is concerned with some important definitions and notation that will be used in this study. The first section deals with some definitions associated with estimation theory, the second is concerned with different approaches to estimation, the third section is devoted to some topics in hypothesis testing, and finally the fourth section discusses Shannon's entropy and some measures of information.

2.1 Estimation Theory

In most statistical studies the parameters of the population are unknown and must be estimated from the sample, because it is impossible or just too much trouble (in terms of time or expense) to look at the entire population; therefore estimation theory has a vital role in statistical inference and is divided into point estimation and interval estimation.

Definition (2.1.1): A point estimate is a number obtained from computations on the observed values of the random sample that serves as an approximation to the parameter of the population.


It is important to point out the difference between an estimate and the corresponding estimator: an estimate is a particular value calculated from a specified sample of observations, while an estimator is a random variable regarded as a point estimator. Of course, there are many estimators corresponding to each population parameter $\theta$, so one would probably obtain many different estimates for $\theta$; thus it is required to discuss the criteria that make one estimator preferable to another.

Definition (2.1.2): Suppose $\hat{\theta}$ is a statistic computed from the observed random sample and considered as a point estimator for $\theta$. We call $\hat{\theta}$ an unbiased estimator for $\theta$ iff

$$E(\hat{\theta}) = \theta$$

If the previous condition holds only for large sample sizes, we call $\hat{\theta}$ an asymptotically unbiased estimator for $\theta$.

Definition (2.1.3): Suppose $\hat{\theta}_1$ and $\hat{\theta}_2$ are two estimators for $\theta$ and

$$\frac{MSE(\hat{\theta}_1)}{MSE(\hat{\theta}_2)} = \frac{E(\hat{\theta}_1 - \theta)^2}{E(\hat{\theta}_2 - \theta)^2} < 1;$$

then $\hat{\theta}_1$ is more efficient than $\hat{\theta}_2$, where MSE refers to the mean square error of the estimator. If the previous condition holds only for large sample sizes, then $\hat{\theta}_1$ is asymptotically more efficient than $\hat{\theta}_2$.

Definition (2.1.4): The statistic $\hat{\theta}$ is a consistent estimator for $\theta$ iff

$$\lim_{n \to \infty} P(|\hat{\theta} - \theta| < \epsilon) = 1, \qquad \epsilon > 0.$$

It is obvious that consistency is an asymptotic property, sometimes called convergence in probability. If $\hat{\theta}$ is an unbiased estimator for $\theta$ and its MSE tends to zero as the sample size grows, then $\hat{\theta}$ is a consistent estimator for $\theta$.

Definition (2.1.5): An estimator $\hat{\theta}$ is said to be a sufficient statistic iff it utilizes all the information in a sample relevant to the estimation of $\theta$; that is, all the knowledge about $\theta$ that can be gained from the whole sample can just as well be gained from $\hat{\theta}$ alone. In mathematical form, $\hat{\theta}$ is a sufficient statistic iff the conditional probability distribution of the random sample given $\hat{\theta}$ is independent of $\theta$.

Definition (2.1.6): The probability density function (p.d.f.) $f(x, \theta)$ is considered a member of the exponential family if $f(x, \theta)$ can be rewritten in the following form:

$$f(x, \theta) = a(\theta)\, b(x)\, \exp\{c(\theta) g(x)\}$$

where $a(\theta)$ and $c(\theta)$ are functions of $\theta$ only, while $b(x)$ and $g(x)$ are functions of $x$ only. Furthermore, the exponential family can be extended to more than one parameter as follows:

$$f(x, \theta) = a(\theta)\, b(x)\, \exp\left(\sum_{j=1}^{J} c_j(\theta) g_j(x)\right)$$

One advantage of $f(x, \theta)$ belonging to the exponential family is that $\sum_i g_1(x_i), \sum_i g_2(x_i), \ldots, \sum_i g_J(x_i)$ can be considered joint sufficient statistics for $\theta$.

Cramér and Rao proposed an inequality that gives a lower bound for the variance of an unbiased estimator. Assume $\hat{\theta}$ is an unbiased estimator for $\theta$; then

$$V(\hat{\theta}) \geq \frac{1}{nE\left(\frac{d}{d\theta}\ln f(x;\theta)\right)^2} \qquad (2.1.1)$$

If the two sides coincide, then $\hat{\theta}$ is the best estimator for $\theta$ among unbiased estimators. From inequality (2.1.1) we note the following points:

1. Inequality (2.1.1) can take another equivalent form:

$$V(\hat{\theta}) \geq \frac{1}{nE\left(\frac{d}{d\theta}\ln f(x;\theta)\right)^2} = \frac{-1}{nE\left(\frac{d^2}{d\theta^2}\ln f(x;\theta)\right)}$$


2. The denominator of (2.1.1) is called the Fisher information $I(\theta)$, an index of the amount of information in the sample about $\theta$. Obviously, more information leads to more accuracy, meaning less variability. If $\theta$ is a vector of $J \times 1$ parameters, the Fisher information becomes the information matrix of order $J \times J$, which can be expressed as:

$$I(\theta)_{J \times J} = \begin{cases} -nE\left(\dfrac{d^2}{d\theta_i^2}\ln f(x,\theta)\right) & \text{if } i = j \\[2mm] -nE\left(\dfrac{d^2}{d\theta_i\, d\theta_j}\ln f(x,\theta)\right) & \text{if } i \neq j \end{cases}$$

Definition (2.1.7): A confidence interval is an interval determined by two numbers obtained from computations on the observed values that is expected to contain the parameter $\theta$ in its interior.

Let $X_1, X_2, \ldots, X_n$ be a random sample from $f(x;\theta)$, and assume $L = t_1(x_1, x_2, \ldots, x_n)$ and $U = t_2(x_1, x_2, \ldots, x_n)$ satisfy $P(L < \theta < U) = 1 - \alpha$. A common construction is the pivotal quantity method:

1. Find a pivotal quantity $Q(X_1, \ldots, X_n; \theta)$ whose sampling distribution does not depend on $\theta$, so that

$$P(a < Q(X_1, \ldots, X_n; \theta) < b) = 1 - \alpha$$

can be converted to

$$P(g_1(X_1, \ldots, X_n) < \theta < g_2(X_1, \ldots, X_n)) = 1 - \alpha,$$

where $(a, b, \alpha)$ can be regarded as values free of $\theta$.

2. Obtain two values $(a, b)$ in the domain of the pivotal quantity, with $a < b$, which minimize the length of the interval

$$\text{Length} = g_2(X_1, \ldots, X_n) - g_1(X_1, \ldots, X_n)$$

subject to

$$\int_a^b h(Q(X_1, \ldots, X_n; \theta))\, dQ = 1 - \alpha,$$

where $h(Q(X_1, \ldots, X_n; \theta))$ is the sampling distribution of $Q(X_1, \ldots, X_n; \theta)$.

Furthermore, Guenther (1969) concluded that the two-sided confidence interval based on a symmetric distribution can be considered the shortest confidence interval because of the symmetry of the distribution, whereas a confidence interval based on an asymmetric distribution cannot; so he recommended using the table proposed by Tate and Klett (1959) for the shortest confidence interval based on the chi-square distribution for different sample sizes and various levels of significance.

2.2 Methods of Estimation

Having established some criteria for judging the performance of estimators, it is now required to discuss briefly the methods of estimating the population's parameters. Many methods for constructing estimators have been proposed in the statistical literature; this section is concerned with three of them.


2.2.1 Method of Moments

It is difficult to trace back who introduced the method of moments (MOM), but Johann Bernoulli (1667-1748) was the first to use the method in his work (see Gelder (1997)). This method is based on solving simultaneously a system of $J$ equations that match the observed sample moments with the corresponding population moments, where $J$ refers to the number of estimated parameters. Typically, different types of observed sample moments can be used, as follows:

1. The moments about zero (raw moments):

$$m_j = \frac{\sum_{i=1}^{n} x_i^j}{n} = E(x^j)$$

2. The central moments:

$$m'_j = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^j}{n} = E(x - \bar{x})^j$$

3. The standard moments:

$$\frac{\sum_{i=1}^{n} \left(\dfrac{x_i - \bar{x}}{\sigma}\right)^j}{n} = E\left(\frac{x - \bar{x}}{\sigma}\right)^j$$

where $\bar{x}$ and $\sigma$ refer to the mean and the standard deviation of the probability density function (p.d.f.) respectively.

The method of moments in general provides estimators that are biased but consistent for large sample sizes, and not efficient. They are often used because they lead to very simple computations; moreover, they may be used as a first approximation or as initial values for other methods that require iteration. The method is not unique: instead of using the raw moments we can use the central moments and thereby obtain other estimators. Unfortunately, in some cases MOM cannot be applied.
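To make the recipe concrete, the short sketch below is a minimal illustration (not part of the thesis programs); the choice of a gamma model, the simulated data, and NumPy are assumptions of the example. Matching the first two sample moments with the gamma's population moments gives closed-form MOM estimators.

```python
import numpy as np

# Method-of-moments sketch for a gamma(shape k, scale s) sample.
# Population moments: E(X) = k*s and Var(X) = k*s**2, so matching the
# observed mean and variance gives closed-form MOM estimators.
rng = np.random.default_rng(0)
x = rng.gamma(shape=2.0, scale=3.0, size=500)   # simulated sample

m1 = x.mean()                 # first sample moment
v = x.var()                   # second central sample moment
k_hat = m1**2 / v             # solve k*s = m1 and k*s**2 = v
s_hat = v / m1

print(f"MOM estimates: shape = {k_hat:.3f}, scale = {s_hat:.3f}")
```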

2.2.2 Method of Maximum Likelihood

It is difficult to trace who discovered this tool, but Bernoulli around 1700 was the first to report on it (see Gelder (1997)). The idea is that the specified sample should be given a high probability of being drawn, so one searches for the parameters that maximize the likelihood function for the specified sample. The likelihood function is the joint density function of the complete random sample, taking the following form:

$$L(x_1, \ldots, x_n; \theta) = \prod_{i=1}^{n} f(x_i; \theta)$$

The method of maximum likelihood estimates $\theta$ by searching for the value $\hat{\theta}$ that maximizes $L(x_1, \ldots, x_n; \theta)$; hence $\hat{\theta}$ is called the maximum likelihood estimator (MLE). In many cases $\hat{\theta}$ is obtained by solving the equation

$$\frac{dL(x_1, \ldots, x_n; \theta)}{d\theta} = 0$$

In addition, the maximum likelihood method can be used to estimate $J$ unknown parameters by solving simultaneously the following $J$ homogeneous equations:

$$\frac{dL(x_1, \ldots, x_n; \theta)}{d\theta_j} = 0, \qquad j = 1, \ldots, J \qquad (2.2.1)$$

Indeed, $\hat{\theta}$ cannot be obtained from (2.2.1) if the following conditions (often called regularity conditions) are not valid:

1. The first and second derivatives of the likelihood function must be defined.

2. The range of the $X$'s does not depend on the unknown parameters.

3. The Fisher information corresponding to each parameter is greater than zero.

Typically (2.2.1) cannot be solved easily; thus one can use a monotonic transformation that makes the calculation easier:


$$\frac{d \ln L(x_1, \ldots, x_n; \theta)}{d\theta_j} = \sum_{i=1}^{n} \frac{d \ln f(x_i; \theta)}{d\theta_j}$$

In general, MLEs are asymptotically unbiased and consistent estimators of the parameters. They have a powerful property called invariance: if $\hat{\theta}$ is the MLE for $\theta$, then $g(\hat{\theta})$ is the MLE for $g(\theta)$. Furthermore, MLEs are asymptotically normally distributed, so a confidence interval derived from MLE estimates can be considered the shortest confidence interval when the sample size is large. If there is an efficient estimator for $\theta$ that achieves the Cramér-Rao lower bound, it must be the MLE. If $\theta$ is a location parameter then $\hat{\theta} - \theta$ is a pivotal quantity; also, if $\theta$ is a scale parameter then $\hat{\theta}/\theta$ is a pivotal quantity.
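As a hedged illustration of solving the likelihood equations numerically rather than in closed form, the sketch below minimizes the negative log-likelihood of a normal model directly; the model, the simulated data, and the SciPy optimizer settings are assumptions of the example, not the thesis programs.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=2.0, size=200)     # simulated sample

def neg_loglik(params):
    mu, log_sigma = params                       # log-sigma keeps sigma > 0
    return -np.sum(norm.logpdf(x, loc=mu, scale=np.exp(log_sigma)))

start = np.array([0.0, 0.0])                     # crude initial values
res = minimize(neg_loglik, start, method="Nelder-Mead")
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(f"MLE: mu = {mu_hat:.3f}, sigma = {sigma_hat:.3f}")
```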

    2.2.3 Method of Ordinary Least Squares

The method of least squares, or ordinary least squares (OLS), plays a vital role in statistical research, particularly regression analysis, and is historically much older than the method of moments and the method of maximum likelihood; it is interesting to note that it was proposed by Gauss (see Gelder (1997)). Typically OLS is used to estimate the relation between two variables known as the independent and dependent variables. Least squares problems fall into two categories, linear and non-linear models; the linear least squares problem has a closed-form solution, whereas the non-linear problem generally does not. This study will focus on one independent variable. Suppose there is a theoretical relation between $Y$ and $X$ that can be expressed as:

$$Y_i = B_0 + B_1 X_i + U_i, \qquad i = 1, \ldots, n$$

where:

$Y_i$: represents the dependent or response random variable.

$X_i$: represents the independent fixed variable.


$U_i$: is a random variable representing the residual of the model.

For estimating $B_0$ and $B_1$, one could suggest obtaining the estimators that minimize $\sum_{i=1}^{n} U_i$; but since the residuals are either positive or negative, their sum may be small even for poor estimators. To avoid this problem one could resort to minimizing $\sum_{i=1}^{n} |U_i|$, but sums of absolute values are not convenient to work with mathematically. To overcome this difficulty, OLS states that $B_0$ and $B_1$ should minimize $\sum_{i=1}^{n} U_i^2$. Taking the partial derivatives with respect to $B_0$ and $B_1$ respectively:

$$\frac{d}{dB_0}\sum_{i=1}^{n} U_i^2 = -2\sum_{i=1}^{n}(y_i - B_0 - B_1 x_i) \quad \text{and} \quad \frac{d}{dB_1}\sum_{i=1}^{n} U_i^2 = -2\sum_{i=1}^{n} x_i (y_i - B_0 - B_1 x_i)$$

Setting the derivatives to zero gives the OLS estimators:

$$-2\sum_{i=1}^{n}(y_i - b_0 - b_1 x_i) = 0 \quad \text{and} \quad -2\sum_{i=1}^{n} x_i(y_i - b_0 - b_1 x_i) = 0 \qquad (2.2.2)$$

Solving (2.2.2) simultaneously gives $b_1$ and $b_0$:

$$b_1 = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2} \quad \text{and} \quad b_0 = \bar{y} - b_1 \bar{x}$$

In general, estimators based on OLS coincide with the MLE if the normality assumption is made (see Mood et al. (1974)). Further, OLS estimators have a powerful property compared with other estimators that are linear in the dependent variable, known as the Gauss-Markov theorem: if the errors have expectation zero conditional on the independent variables, are uncorrelated, and have equal variances, that is $Var(Y_i \mid X_i) = Var(U_i \mid X_i) = \sigma^2$, then the OLS estimators are unbiased estimators for $B_0$ and $B_1$ and are more precise (less variable) than any other unbiased estimators belonging to the class of linear functions of the response variable. In other words, among all linear unbiased estimators the OLS estimators have the smallest dispersion in repeated samples at fixed explanatory values; this property is well known as the best linear unbiased estimator (BLUE).
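The closed-form formulas for $b_1$ and $b_0$ translate directly into code. The following sketch uses illustrative simulated data (an assumption of the example) and cross-checks the hand-computed estimators against NumPy's least-squares solver.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 50)
y = 1.5 + 0.8 * x + rng.normal(scale=0.5, size=x.size)   # Y = B0 + B1*X + U

# OLS estimators from the normal equations (2.2.2)
b1 = (np.sum(x * y) - x.size * x.mean() * y.mean()) / \
     (np.sum(x**2) - x.size * x.mean()**2)
b0 = y.mean() - b1 * x.mean()

# Cross-check with NumPy's least-squares solver
coef, *_ = np.linalg.lstsq(np.column_stack([np.ones_like(x), x]), y, rcond=None)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}; lstsq gives {coef.round(3)}")
```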

2.3 Hypotheses Testing

A statistical hypothesis test is a method of making a statistical decision using sampling data; it is considered a key technique of statistical inference. The aim of hypothesis testing is to use the information in the sample to guide us in accepting or rejecting the doubtful hypothesis, called the null hypothesis $H_0$, against the alternative hypothesis $H_1$.

2.3.1 Introduction to Hypotheses Testing

In fact there are two types of hypotheses in the academic literature, classified as:

1. Parametric hypotheses, which are concerned with one or more constraints imposed upon the parameters of a certain distribution.

2. Non-parametric hypotheses, which are statements about the form of the cumulative distribution function or probability function of the distribution from which the sample is drawn.

Definition (2.2.1): The critical region $C(X_1, X_2, \ldots, X_n)$ is a subset of the sample space, where the sample space consists of all possible samples that can be drawn from the population at a fixed sample size, for which $H_0$ is rejected. Indeed, $C(X_1, X_2, \ldots, X_n)$ plays a significant role in accepting or rejecting the null hypothesis $H_0$.

Definition (2.2.2): A test statistic $T(X_1, X_2, \ldots, X_n)$ is a rule or procedure for deciding whether or not to reject the null hypothesis based on its sampling distribution, so that the decision is to reject the null hypothesis iff:


$$T(X_1, X_2, \ldots, X_n) \in C(X_1, X_2, \ldots, X_n)$$

Typically a statistical hypothesis can be classified as follows:

1. Simple hypothesis: if the statistical hypothesis specifies the probability distribution completely.

2. Composite hypothesis: if the statistical hypothesis does not specify the probability distribution completely.

It is noted that $H_0$ should be taken as a simple hypothesis to enable us to derive the sampling distribution of the test statistic. Since accepting or rejecting $H_0$ is based on the sampling data instead of the population, the decision can be affected by two kinds of errors:

1. Type I error $\alpha$: this error is made when we reject $H_0$ although it is correct; it is also called the level of significance:

$$P(T(X_1, \ldots, X_n) \in C(X_1, \ldots, X_n) \mid H_0 \text{ is correct}) = \alpha$$

2. Type II error $\beta$: this error is made when we reject $H_1$ although it is correct; the complement of $\beta$ is called the power of the test, $1 - \beta$:

$$P(T(X_1, \ldots, X_n) \notin C(X_1, \ldots, X_n) \mid H_1 \text{ is correct}) = \beta$$

2.3.2 Tests Based on the Likelihood Function

Mainly we need a test statistic that keeps the two errors of the decision as small as possible. Unfortunately, with a fixed sample size, if one of the errors is minimized the other is maximized, so there is a negative relation between the two errors. To overcome this problem we can fix the more serious error, the type I error, and search for the test that has the minimum type II error, i.e. the most powerful test. Indeed there are various approaches for deriving most powerful tests (see Engle (1984)); this study is concerned with the likelihood ratio test (LR), the Wald test (WT), and the Lagrange multiplier (score) test (LM).


1. Likelihood Ratio Test (LR)

This test was proposed by Marriott in 1990 (see Han 2002). The test is operated by obtaining the ratio between two likelihood functions, one evaluated under the restricted parameter space and the other under the unrestricted parameter space. Suppose we have $L(x_1, x_2, \ldots, x_n; \theta)$ and it is required to test

$$H_0: \theta = \theta_0 \quad \text{vs.} \quad H_1: \theta \neq \theta_0.$$

Hence the likelihood ratio statistic will be

$$LRT = \frac{\max_{H_0} L(x_1, \ldots, x_n; \theta)}{\max L(x_1, \ldots, x_n; \theta)} = \frac{L(x_1, \ldots, x_n; \theta_0)}{L(x_1, \ldots, x_n; \hat{\theta})}$$

where $\theta$ is a $J \times 1$ vector of the tested parameters and $\hat{\theta}$ refers to the MLE of $\theta$, so that LRT lies between zero and one. Therefore large values of LRT are evidence of agreement between $\theta_0$ and the MLE, which enables us to accept $H_0$, while small values of LRT lead to rejecting $H_0$. Because the sampling distribution of LRT is not well known, it is recommended to use the following form (see Engle (1984)):

$$LR = 2\left(\ln L(x_1, \ldots, x_n; \hat{\theta}) - \ln L(x_1, \ldots, x_n; \theta_0)\right)$$

It is proved that LR has a chi-square distribution with $J$ degrees of freedom for large sample sizes, so one can conclude to reject $H_0$ if $LR \geq \chi^2_{\alpha, J}$.
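The LR recipe above can be sketched in a few lines; the exponential model, the simulated data, and SciPy are assumptions of this illustration rather than the thesis application.

```python
import numpy as np
from scipy.stats import expon, chi2

rng = np.random.default_rng(3)
x = rng.exponential(scale=1.0 / 1.5, size=100)   # true rate 1.5

def loglik(rate):
    return np.sum(expon.logpdf(x, scale=1.0 / rate))

rate_hat = 1.0 / x.mean()                        # MLE of the rate
rate_0 = 1.0                                     # value under H0
LR = 2.0 * (loglik(rate_hat) - loglik(rate_0))   # 2(ln L(MLE) - ln L(theta0))
crit = chi2.ppf(0.95, df=1)                      # chi-square critical value, J = 1
print(f"LR = {LR:.3f}, reject H0: {LR >= crit}")
```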

2. Wald Test (WT)

Another test of the agreement between the null and alternative hypotheses was proposed by Wald in 1943 (see Han 2002). The idea is based on testing whether the distance between $\hat{\theta}$ and $\theta_0$ is significantly large or not via the following formula:

$$WT_J = (\hat{\theta} - \theta_0)^t\, I(\hat{\theta})\, (\hat{\theta} - \theta_0)$$

Since the MLE always has an asymptotic normal distribution, $WT_J$ works by standardizing $\hat{\theta}$ so that it is approximately standard normal, then taking the square, which yields a limiting chi-square distribution with $J$ degrees of freedom. Suppose it is required to test $\theta_h$ as a subset of $\theta$, where

$$\theta_h = (\theta_1, \theta_2, \ldots, \theta_h), \qquad h < J.$$

First, partition $\theta$ as

$$\theta = (\theta_h, \theta_{J-h}).$$

Second, partition the variance-covariance matrix of $\hat{\theta}$ as

$$V(\hat{\theta}) = \begin{pmatrix} I^{-1}(\theta_h) & Cov(\theta_h, \theta_{J-h}) \\ Cov(\theta_{J-h}, \theta_h) & I^{-1}(\theta_{J-h}) \end{pmatrix}$$

where $Cov$ refers to the covariance matrix between the two vectors. Hence $WT_h$ will be

$$WT_h = (\hat{\theta}_h - \theta_{h0})^t\, \left(I^{-1}(\hat{\theta}_h)\right)^{-1} (\hat{\theta}_h - \theta_{h0})$$

Hence $WT_h$ has a limiting chi-square distribution with $h$ degrees of freedom.
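For a single parameter the Wald statistic reduces to $(\hat{\theta} - \theta_0)^2 I(\hat{\theta})$. The sketch below reuses the same illustrative exponential model as before (an assumption of the example) together with the standard fact that the Fisher information for an exponential rate is $n/\lambda^2$.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
x = rng.exponential(scale=1.0 / 1.5, size=100)   # true rate 1.5

rate_hat = 1.0 / x.mean()                        # MLE of the rate
rate_0 = 1.0                                     # value under H0
info = x.size / rate_hat**2                      # Fisher information n / lambda^2
W = (rate_hat - rate_0)**2 * info                # (theta_hat - theta_0)^2 I(theta_hat)
crit = chi2.ppf(0.95, df=1)
print(f"Wald = {W:.3f}, reject H0: {W >= crit}")
```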

3. Lagrange Multiplier Test

In statistical inference there is a well-known test related to the Lagrange multiplier (LM) for testing hypotheses concerned with the parameters of the distribution. Aitchison and Silvey (1958) proposed the Lagrange multiplier test (score test), which is derived from restricted maximum likelihood estimation using a Lagrange multiplier. Therefore it is first required to explain briefly the method of Lagrange multipliers, and then to discuss the score test.

In mathematical optimization, the method of Lagrange multipliers provides a strategy for finding the maximum or minimum of an objective function subject to constraints; this method is due to Joseph Louis Lagrange. Suppose it is required to obtain the extreme values of the objective function

$$f(x_1, x_2, \ldots, x_n)$$

subject to

$$g_i(x_1, x_2, \ldots, x_n) = 0, \qquad i = 1, \ldots, m,$$

where $f(x_1, \ldots, x_n)$ and $g_i(x_1, \ldots, x_n)$ are differentiable functions and $m < n$. This method requires forming the Lagrangian equation, then differentiating the Lagrangian with respect to the $x$'s and the $\lambda$'s, where the $\lambda$'s denote the Lagrange multipliers; this yields $n + m$ equations. Finally, solving the $n + m$ homogeneous equations in $n + m$ unknowns yields the values of the $x$'s that represent the extreme values of $f(x_1, x_2, \ldots, x_n)$ while at the same time satisfying the $m$ conditions; for more details see Thomas (2005).

To recognize whether the solution represents a maximum, a minimum, or a saddle point, it is required to calculate the determinant of the $n \times n$ Hessian matrix, which has the following form:

$$Hess(x^o)_{n \times n} = \begin{cases} \dfrac{d^2 f(x_1^o, x_2^o, \ldots, x_n^o)}{dx_i^2} & \text{for } i = j \\[2mm] \dfrac{d^2 f(x_1^o, x_2^o, \ldots, x_n^o)}{dx_i\, dx_j} & \text{for } i \neq j \end{cases}$$

where $(x_1^o, x_2^o, \ldots, x_n^o)$ are the values of the $x$'s which satisfy the $m$ conditions. Thus, via the determinant of the Hessian matrix, one can recognize whether the solution represents a maximum, a minimum, or a saddle point as follows:

1. $(x_1^o, x_2^o, \ldots, x_n^o)$ are classified as minimum values for $f(x_1, x_2, \ldots, x_n)$ iff $|Hess| > 0$.

2. $(x_1^o, x_2^o, \ldots, x_n^o)$ are classified as maximum values for $f(x_1, x_2, \ldots, x_n)$ iff $|Hess| < 0$.

3. $(x_1^o, x_2^o, \ldots, x_n^o)$ are classified as saddle points for $f(x_1, x_2, \ldots, x_n)$ iff $|Hess| = 0$.

In the calculus of variations there is a fundamental equation based on the Lagrangian, known as the Euler-Lagrange equation. The Euler-Lagrange equation is useful for solving an optimization problem in which the objective is a functional (a function of a function) and one seeks the function that maximizes or minimizes it. To see this point, suppose it is required to find the $f(x)$ that maximizes the functional

$$\int F(f(x), f'(x), x)\, dx$$

where $f'(x)$ denotes the first derivative of $f(x)$ with respect to $x$. The solution, stated without proof, will be, according to Riley et al. (2006):

$$\frac{dF(f(x), f'(x), x)}{df(x)} = \frac{d}{dx}\left(\frac{dF(f(x), f'(x), x)}{df'(x)}\right)$$

If the functional does not contain $f'(x)$, the Euler-Lagrange equation becomes:

$$\frac{dF(f(x), f'(x), x)}{df(x)} = 0$$

An excellent example that makes the idea clearer is the principle of maximum entropy method, which will be explained later.


The idea of LM is to maximize $\ln L(x_1, \ldots, x_n; \theta)$ subject to the null hypothesis $\theta = \theta_0$; hence the Lagrangian function can be expressed as

$$Lagr(\theta, \lambda) = \ln L(x_1, \ldots, x_n; \theta) - \lambda(\theta - \theta_0)$$

Differentiating $Lagr(\theta, \lambda)$ with respect to $\theta$ and $\lambda$ and setting the derivatives to zero yields:

$$\frac{d\,Lagr(\theta, \lambda)}{d\theta} = \frac{d\ln L(x_1, \ldots, x_n; \theta)}{d\theta} - \lambda = 0 \quad \text{and} \quad \frac{d\,Lagr(\theta, \lambda)}{d\lambda} = \theta - \theta_0 = 0 \qquad (2.3.1)$$

One can solve (2.3.1) simultaneously by obtaining the derivative of $\ln L(x_1, \ldots, x_n; \theta)$ with respect to $\theta$ and then replacing $\theta$ with $\theta_0$ in the derivative, which yields

$$\frac{d\,Lagr(\theta, \lambda)}{d\theta}\bigg|_{\theta = \theta_0} = \frac{d\ln L(x_1, \ldots, x_n; \theta_0)}{d\theta} = \lambda \qquad (2.3.2)$$

Typically (2.3.2) is known as the score function $S(\theta_0)$. Since $\theta$ is often unknown it is estimated by the MLE; hence a small value of $S(\theta_0)$ indicates that $\theta_0$ is close to the MLE and we accept the null hypothesis, otherwise we reject the hypothesis that $\theta_0$ coincides with the MLE. Thus the score test measures the distance between the tested value $\theta_0$ and the MLE by testing whether $S(\theta_0)$ differs significantly from zero or not. Notice that under $H_0$ the mean and the variance of $S(\theta_0)$ are zero and the Fisher information $I(\theta)$ respectively; thus LM can be written as

$$LM = \frac{(S(\theta_0))^2}{I(\theta_0)}$$

Mainly LM has a chi-square distribution with one degree of freedom for large samples under the null hypothesis; for more details see Judge et al. (1982). Suppose we have $J$ parameters and it is required to test them simultaneously; then the LM test has the following form:

$$LM = S(\theta_0)^t\, I^{-1}(\theta_0)\, S(\theta_0) \qquad (2.3.3)$$


where $S(\theta_0)$ refers to the score function of the vector $\theta_0$ and $I^{-1}(\theta_0)$ refers to the inverse of the information matrix of order $J \times J$, taking the following forms respectively:

$$S(\theta) = \frac{d\ln L(x_1, \ldots, x_n; \theta)}{d\theta_j} \quad \text{and} \quad I(\theta)_{J \times J} = \begin{cases} -E\left(\dfrac{d^2}{d\theta_i^2}\ln L(x_1, \ldots, x_n; \theta)\right) & \text{for } i = j \\[2mm] -E\left(\dfrac{d^2}{d\theta_i\, d\theta_j}\ln L(x_1, \ldots, x_n; \theta)\right) & \text{for } i \neq j \end{cases}$$

It is proven that (2.3.3) has a chi-square distribution with $J$ degrees of freedom (see Engle (1984)). Further, an interesting relationship between the three tests can be represented geometrically when $\theta$ is one-dimensional, as shown in Figure (1).

    Figure (1): The likelihood tests in one dimension
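Continuing the same illustrative exponential example used for the LR and Wald sketches (an assumption of this illustration, not the thesis application), the score test evaluates the derivative of the log-likelihood at $\theta_0$ only, so no unrestricted MLE is needed.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
x = rng.exponential(scale=1.0 / 1.5, size=100)   # true rate 1.5

rate_0 = 1.0                                     # value under H0
score = x.size / rate_0 - np.sum(x)              # d/d(rate) of n*ln(rate) - rate*sum(x)
info = x.size / rate_0**2                        # Fisher information at rate_0
LM = score**2 / info                             # (S(theta_0))^2 / I(theta_0)
crit = chi2.ppf(0.95, df=1)
print(f"LM = {LM:.3f}, reject H0: {LM >= crit}")
```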

  • 8/14/2019 Entropy Statistics Tests Goodness of Fit Wald Lagrange Likelihood Ratio

    27/89

2.4 Measures of Information

A great variety of information measures have been proposed in the literature recently (see Esteban and Morales (1995)). Since Shannon (1948) was the first to write about measuring a sample's information and made a huge contribution to the development of information theory, this section is concerned with Shannon's entropy.

2.4.1 Shannon Entropy and Related Measures

The origin of the entropy concept goes back to Ludwig Boltzmann (1877); it is a Greek term meaning transformation, and it was given a probabilistic interpretation in information theory by Shannon (1948). He considered the entropy as an index of the uncertainty associated with a random variable, expressed in nats, where a nat (sometimes nit or nepit) is a unit of information or entropy based on natural logarithms.

Definition (2.4.1): Let there be $n$ events with probabilities $p_1, p_2, \ldots, p_n$ adding up to 1. Shannon (1948) stated that the entropy corresponding to these events takes the following form:

$$H(X) = -\sum_{i=1}^{n} p(x_i) \ln p(x_i) \qquad (2.4.1)$$

He claimed that via (2.4.1) one can transform the information in the sample from an invisible form to a numerical, physical form, so that comparisons can easily be made and understood; further, it can be regarded as the analogue of the variance for qualitative data.

Assume $n_1, n_2, \ldots, n_k$ are the numbers of occurrences of each category in an experiment of length $n$, where

$$\sum_{i=1}^{k} n_i = n \quad \text{and} \quad p_i = \frac{n_i}{n}.$$

Shannon (1948) mentioned that the number of all possible combinations that partition $n$ into $k$ categories of sizes $n_i$ can be an indicator of the accuracy of any decision associated with this sample (see Golan (1996) and Mack (1988)); one can present the number of all possible combinations as

$$W = C^{n}_{n_1, n_2, \ldots, n_k} = \frac{n!}{n_1!\, n_2! \cdots n_k!} \qquad (2.4.2)$$

It is obvious that (2.4.2) is always greater than or equal to one and at most $\dfrac{n!}{((n/k)!)^k}$, its value when the observations are divided equally among the $k$ categories. If (2.4.2) equals one, the sample has only one category, which corresponds to maximum accuracy and minimum uncertainty. For simplicity Shannon (1948) preferred to deal with the logarithm of $W$, as follows:

$$\ln(W) = \ln n! - \sum_{i=1}^{k} \ln n_i!$$


Using the Stirling approximation, which states

$$\ln x! \approx x\ln x - x \quad \text{as } x \to \infty,$$

$\ln(W)$ becomes

$$\ln(W) \approx n\ln n - n - \sum_{i=1}^{k}(n_i \ln n_i - n_i) = n\ln n - \sum_{i=1}^{k} n_i \ln n_i$$

$$= n\ln n - \sum_{i=1}^{k} n p_i \ln(n p_i) = n\ln n - \sum_{i=1}^{k} n p_i(\ln n + \ln p_i) = -n\sum_{i=1}^{k} p_i \ln p_i.$$

Therefore one can conclude

$$\frac{1}{n}\ln(W) \approx -\sum_{i=1}^{k} p_i \ln p_i = H(p).$$
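The Stirling argument above can be checked numerically. The sketch below uses illustrative counts (an assumption of the example) and SciPy's log-gamma function for the exact factorials, comparing $\ln W / n$ with $H(p)$.

```python
import numpy as np
from scipy.special import gammaln

# Counts n_1..n_k of each category in an experiment of length n
counts = np.array([50, 30, 15, 5])
n = counts.sum()
p = counts / n

# Exact ln W = ln n! - sum ln n_i!   (computed via the log-gamma function)
lnW = gammaln(n + 1) - np.sum(gammaln(counts + 1))

# Shannon entropy H(p) = -sum p_i ln p_i, in nats
H = -np.sum(p * np.log(p))

print(f"ln(W)/n = {lnW / n:.4f},  H(p) = {H:.4f}")   # close for moderate n
```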

Typically, Shannon's (1948) entropy can be regarded as a measure of the average accuracy associated with decisions based on the sample. Equation (2.4.1), according to Shannon (1948), satisfies the following properties:

1. The quantity $H(X)$ reaches a minimum, equal to zero, when one of the events is a certainty, assuming $0\ln(0) = 0$, and $H(X)$ reaches its maximum when all the probabilities are equal; hence $H(X)$ can be regarded as a concave function. For instance, suppose an experiment has two outcomes; then the entropy curve is as shown in Figure (2):

    Figure (2): The curve of H(p) in one dimension


2. If some events have zero probability, they can simply be left out of the entropy when we evaluate the uncertainty.

3. Entropy information must be symmetric; it does not depend on the order of the probabilities.

For a continuous distribution, (2.4.1) takes the following form:

$$H(X) = -\int_{-\infty}^{\infty} f(x, \theta)\ln f(x, \theta)\, dx$$

One can easily notice that the entropy of a continuous variable satisfies Shannon's (1948) properties, but it can take negative values.

Definition (2.3.2): Joint entropy is a measure concerned with the uncertainty of two variables, taking the following form:

$$H(X, Y) = -\sum_{i=1}^{n} p(x_i, y_i)\ln p(x_i, y_i)$$

It is obvious that

$$H(X, Y) \leq H(X) + H(Y).$$

According to Shannon (1948), the uncertainty of a joint event is less than or equal to the sum of the individual uncertainties, with equality only if the events are independent.

Definition (2.3.3): Mutual information measures the information that $X$ and $Y$ share, taking the following form:

$$M(X, Y) = \sum_{i=1}^{n} p(x_i, y_i)\ln\frac{p(x_i, y_i)}{p(x_i)\, p(y_i)}$$

It is obvious that $M(X, Y) = 0$ if the two variables are independent.

Definition (2.3.4): Conditional entropy $H(X/Y)$ is a measure of what $Y$ does not say about $X$, meaning how much information is in $X$ that is not in $Y$; it takes the following form:

$$H(X/Y) = H(X, Y) - H(Y)$$

If the two variables are independent, then the conditional entropy $H(X/Y)$ equals $H(X)$.

Remark: Definitions (2.3.2) to (2.3.4) can be extended to continuous variables by simply replacing the summation symbol with the integration symbol. One can see that the measures of information are related as in the following Venn diagram:

Venn diagram: relation between the information measures
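The identities relating $H(X)$, $H(Y)$, $H(X,Y)$, $M(X,Y)$, and $H(X/Y)$ can be verified on any small joint probability table; the table used below is an assumption of the sketch, chosen only for illustration.

```python
import numpy as np

# A small joint probability table p(x, y) (rows: X, columns: Y)
pxy = np.array([[0.20, 0.10],
                [0.15, 0.55]])
px, py = pxy.sum(axis=1), pxy.sum(axis=0)        # marginal distributions

H = lambda p: -np.sum(p * np.log(p))             # Shannon entropy in nats
Hxy = H(pxy)                                     # joint entropy H(X, Y)
Hx, Hy = H(px), H(py)
M = np.sum(pxy * np.log(pxy / np.outer(px, py))) # mutual information M(X, Y)
Hx_given_y = Hxy - Hy                            # conditional entropy H(X|Y)

print(f"H(X,Y) = {Hxy:.4f} <= H(X) + H(Y) = {Hx + Hy:.4f}")
print(f"M(X,Y) = {M:.4f} = H(X) - H(X|Y) = {Hx - Hx_given_y:.4f}")
```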

2.4.2 Kullback Leibler Divergence (Relative Entropy)

Definition (2.3.5): Kullback and Leibler (1951) introduced relative entropy, or information divergence, which measures the distance between two distributions of a random variable. This information measure is also known as KL-entropy, taking the following form:

$$KL(X/Y) = \sum_{i=1}^{n} p(x_i)\ln\frac{p(x_i)}{q(y_i)} \qquad (2.4.3)$$

where $p(x_i)$ and $q(y_i) > 0$. Typically (2.4.3) can be regarded as the relative entropy for using $Y$ instead of $X$; indeed, there is a relation between $KL(X/Y)$ and $H(X)$, as follows:

$$KL(X/Y) = \sum_{i=1}^{n} p(x_i)\ln p(x_i) - \sum_{i=1}^{n} p(x_i)\ln q(y_i) = -H(X) - \sum_{i=1}^{n} p(x_i)\ln q(y_i) \qquad (2.4.4)$$

Thus (2.4.4) can be considered a good tool for discriminating between two distributions (see Gokhale (1983)). Indeed $KL(X/Y)$ has the following famous properties:

1. $KL(X/Y)$ is not symmetric:

$$KL(X/Y) \neq KL(Y/X)$$

2. $KL(X/Y)$ is a non-negative measure and it equals zero iff $X$ and $Y$ are identical:

$$KL(X/Y) \geq 0 \text{ for all } i \qquad (2.4.5)$$

According to Lue (2007), (2.4.5) can be studied using the following identity:

$$x\ln\left(\frac{x}{y}\right) \geq x - y \quad \text{for } x, y > 0 \qquad (2.4.6)$$

Hence one can bound (2.4.3) according to (2.4.6) as

$$\sum_{i=1}^{n} p(x_i)\ln\frac{p(x_i)}{q(y_i)} \geq \sum_{i=1}^{n} p(x_i) - \sum_{i=1}^{n} q(y_i) \quad \text{for } p(x_i), q(y_i) > 0$$

$$\geq 1 - \sum_{i=1}^{n} q(y_i) \geq 0$$

Thus one can conclude that $KL(X/Y) \geq 0$ for all $i$. Indeed, KL can also be applied when the variables are continuous by replacing the summation symbol with the integration sign, and all the properties remain valid; therefore it is recommended in the literature to use $KL(X/Y)$ instead of $H(X)$ for continuous distributions (see Dukkipati (2006)).
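The sketch below (illustrative discrete distributions, NumPy assumed) computes (2.4.3) directly and shows the asymmetry and non-negativity properties in action.

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])    # distribution of X
q = np.array([0.4, 0.4, 0.2])    # distribution of Y (all entries > 0)

def kl(p, q):
    # KL(X/Y) = sum p_i * ln(p_i / q_i), in nats
    return np.sum(p * np.log(p / q))

print(f"KL(X/Y) = {kl(p, q):.4f}")   # non-negative
print(f"KL(Y/X) = {kl(q, p):.4f}")   # generally different: KL is not symmetric
```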

Chapter (III)

Goodness of Fit Based on Maximum Entropy

Statistical distributions play a vital role in scientific research, since recognizing the probability distribution of the sample under study is the key issue in many situations. Many goodness of fit tests have been proposed in the literature to test the hypothesis that the drawn sample has a specific distribution. This chapter is organized as follows: on one hand it discusses parameter estimation based on maximum entropy and estimation of the entropy function, and on the other hand it uses parameter estimation and entropy estimation for fitting the distribution of the sample.

3.1 Parameters Estimation Based on Maximum Entropy

According to Jaynes (1957), the principle of maximum entropy (POME) is a relatively new estimation method and can be regarded as a flexible and powerful tool for estimating the probability distribution. Using the maximum entropy method one should pick the probability distribution of the specified sample which satisfies certain moments, represented by one or more constraints (typically the mean, variance, skewness, etc.), and at the same time maximizes the sample's entropy.

In the discrete case, estimating the probability distribution representing the sample by POME requires:

1. Defining the entropy of the available data.

2. Defining the given or prior information as a set of independent constraints.


3. Maximizing the entropy function subject to the independent constraints.

In mathematical form, it is required to maximize

$$H(X) = -\sum_{i=1}^{n} p(x_i)\ln p(x_i)$$

subject to the consistency constraints

$$\sum_{i=1}^{n} p(x_i) = 1 \quad \text{and} \quad \sum_{i=1}^{n} g_j(x_i)\, p(x_i) = c_j, \qquad j = 1, \ldots, J,$$

where the $c_j$ are constant numbers. Define the following Lagrangian function:

$$Lagr(p(x_i), \lambda) = -\sum_{i=1}^{n} p(x_i)\ln p(x_i) - (\lambda_0 - 1)\left(\sum_{i=1}^{n} p(x_i) - 1\right) - \sum_{j=1}^{J}\lambda_j\left(\sum_{i=1}^{n} g_j(x_i)\, p(x_i) - c_j\right)$$

where $\lambda$ denotes the vector of Lagrange multipliers $(\lambda_0, \lambda_1, \ldots, \lambda_J)$. Using differentiation we have

$$\frac{d\,Lagr(p(x_i), \lambda)}{dp(x_i)} = -\ln p(x_i) - \lambda_0 - \sum_{j=1}^{J}\lambda_j g_j(x_i) = 0, \qquad j = 1, \ldots, J.$$

Hence the maximum entropy mass function will be

$$p(x_i) = \exp\left(-\lambda_0 - \sum_{j=1}^{J}\lambda_j g_j(x_i)\right) \qquad (3.1.1)$$

It is easy to check that the general solution (3.1.1) gives the maximum entropy. To make the idea more concrete, according to Paul (2003), suppose a restaurant has three meals {C, D, E} with prices {1$, 2$, 3$} respectively, and we have the information that the customer spends on average 1.5$ on a meal. The probability that the customer will demand each meal is computed via POME as follows:

1. Define the entropy of the sample:

$$H(x) = -\sum_{i=1}^{3} p(x_i)\ln p(x_i)$$

where $x_i$ represents the price of meal $i$.

2. Define the given or prior information as independent constraints:


$$\sum_{i=1}^{3} p(x_i) = 1 \quad \text{and} \quad \sum_{i=1}^{3} x_i\, p(x_i) = 1.5$$

3. Maximize the entropy function subject to the two independent constraints, using the Lagrangian function:

$$Lagr(p(x_i), \lambda) = -\sum_{i=1}^{3} p(x_i)\ln p(x_i) - (\lambda_0 - 1)\left(\sum_{i=1}^{3} p(x_i) - 1\right) - \lambda_1\left(\sum_{i=1}^{3} x_i\, p(x_i) - 1.5\right) \qquad (3.1.2)$$

Differentiating (3.1.2) with respect to $p(x_i)$ and equating to zero yields

$$p(x_i, \lambda) = \exp(-\lambda_0 - \lambda_1 x_i) \qquad (3.1.3)$$

To estimate the probabilities of the meals, substitute (3.1.3) into the two independent constraints:

$$\exp(-\lambda_0 - \lambda_1) + \exp(-\lambda_0 - 2\lambda_1) + \exp(-\lambda_0 - 3\lambda_1) = 1$$

and

$$\exp(-\lambda_0 - \lambda_1) + 2\exp(-\lambda_0 - 2\lambda_1) + 3\exp(-\lambda_0 - 3\lambda_1) = 1.5$$

Solving the previous system simultaneously gives

$$\lambda_0 = -0.35, \qquad \lambda_1 = 0.834 \qquad (3.1.4)$$

Substituting (3.1.4) into (3.1.3) yields

$$p(x_1) = 0.615, \qquad p(x_2) = 0.268, \qquad p(x_3) = 0.116.$$
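The two nonlinear constraint equations of the restaurant example can also be solved numerically; the sketch below (SciPy's root finder is an assumption of the illustration) reproduces the probabilities quoted above.

```python
import numpy as np
from scipy.optimize import fsolve

prices = np.array([1.0, 2.0, 3.0])      # meals C, D, E cost 1$, 2$, 3$
mean_spend = 1.5                        # prior information: average spend

def constraints(lam):
    lam0, lam1 = lam
    p = np.exp(-lam0 - lam1 * prices)   # maximum entropy mass function (3.1.3)
    return [p.sum() - 1.0,              # probabilities add to one
            np.dot(prices, p) - mean_spend]   # mean constraint

lam0, lam1 = fsolve(constraints, x0=[0.0, 0.0])
p = np.exp(-lam0 - lam1 * prices)
print(f"lambda_0 = {lam0:.3f}, lambda_1 = {lam1:.3f}, p = {p.round(3)}")
```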

Similarly, for a continuous distribution it is required to obtain the $f(x, \theta)$ that maximizes the following entropy (objective) function:

$$H(X) = -\int_{-\infty}^{\infty} f(x, \theta)\ln f(x, \theta)\, dx$$

subject to


$$\int_{-\infty}^{\infty} f(x, \theta)\, dx = 1 \quad \text{and} \quad \int_{-\infty}^{\infty} g_j(x)\, f(x, \theta)\, dx = m_j, \qquad j = 1, \ldots, J, \qquad (3.1.5)$$

where $f(x, \theta)$ satisfies the regularity conditions. To optimize the entropy function subject to the conditions in (3.1.5), the Lagrangian function will be

$$Lagr(x, \theta, \lambda) = -\int_{-\infty}^{\infty} f(x, \theta)\ln f(x, \theta)\, dx - (\lambda_0 - 1)\left(\int_{-\infty}^{\infty} f(x, \theta)\, dx - 1\right) - \sum_{j=1}^{J}\lambda_j\left(\int_{-\infty}^{\infty} g_j(x)\, f(x, \theta)\, dx - m_j\right) \qquad (3.1.6)$$

One can see that (3.1.6) is a functional; therefore, differentiating (3.1.6) with respect to $f(x, \theta)$ using the Euler-Lagrange equation yields, according to Lue (2007),

$$-\ln f(x, \theta) - \lambda_0 - \sum_{j=1}^{J}\lambda_j g_j(x) = 0.$$

Hence the maximum entropy density will be

$$f(x, \theta, \lambda) = \exp\left(-\lambda_0 - \sum_{j=1}^{J}\lambda_j g_j(x)\right) \qquad (3.1.7)$$

where $\lambda_0$ is called the normalizing term and is related to the other Lagrange multipliers by the following formula:

$$\lambda_0 = \ln\left(\int_{-\infty}^{\infty}\exp\left(-\sum_{j=1}^{J}\lambda_j g_j(x)\right) dx\right) \qquad (3.1.8)$$

Also, (3.1.7) is valid for any type of moment form. To make the idea clearer (see Radriguez (1984)), suppose it is required to search for the p.d.f. that represents the sample given that its variance equals two; hence the Lagrangian equation will be:


$$Lagr(x, \theta, \lambda) = -\int_{-\infty}^{\infty} f(x, \theta)\ln f(x, \theta)\, dx - (\lambda_0 - 1)\left(\int_{-\infty}^{\infty} f(x, \theta)\, dx - 1\right) - \lambda_1\left(\int_{-\infty}^{\infty}(x - \theta_1)^2 f(x, \theta)\, dx - 2\right)$$

where $\theta$ is the vector of the p.d.f. parameters and $\theta_1$ refers to the mean of the sample. Using (3.1.7), the maximum entropy density has the following form:

$$f(x, \lambda, \theta) = \exp\left(-\lambda_0 - \lambda_1 (x - \theta_1)^2\right) \qquad (3.1.9)$$

where

$$\lambda_0 = \ln\left(\int_{-\infty}^{\infty}\exp\left(-\lambda_1(x - \theta_1)^2\right) dx\right) = \ln\left(\sqrt{\frac{\pi}{\lambda_1}}\right) + \ln\left(\int_{-\infty}^{\infty}\sqrt{\frac{\lambda_1}{\pi}}\exp\left(-\lambda_1(x - \theta_1)^2\right) dx\right) \qquad (3.1.10)$$

The second term of (3.1.10) is the integral of a normal density, so it equals zero and

$$\lambda_0 = \ln\left(\sqrt{\frac{\pi}{\lambda_1}}\right).$$

Substituting $\lambda_0$ into (3.1.9) gives

$$f(x, \lambda, \theta) = \sqrt{\frac{\lambda_1}{\pi}}\exp\left(-\lambda_1 (x - \theta_1)^2\right) \qquad (3.1.11)$$

Actually (3.1.11) belongs to the normal distribution; hence the normal distribution has maximum entropy among all distributions subject to a fixed variance.

Singh and Rajagopal (1986) proposed a new approach for estimating the parameters of a probability density via the principle of maximum entropy (POME); in addition, Singh et al. (1986) applied POME to various continuous distributions. Their idea generally consists of three steps, summarized as:


1. Transforming the probability density function into a function of the Lagrange multipliers instead of a function of the parameters of the distribution.

2. Estimating the Lagrange multipliers.

3. Recognizing the relation between the Lagrange multipliers and the parameters of the distribution.

Note that transforming the probability density function into a function of the Lagrange multipliers instead of the parameters can be done by inserting the probability density function's raw moments into (3.1.5).

Estimating the Lagrange multipliers $\lambda_j$ can be done in two ways. First, one can insert the maximum entropy density into (3.1.5), which yields $J + 1$ nonlinear equations in $J + 1$ unknowns, and then solve numerically to reach a suitable solution (see Zellener et al. (1988) and Wu (2003)). The second way is to transform the constrained optimization problem into an unconstrained optimization problem using the dual approach (see Golan et al. (1996)); this idea can be summarized as:

a) Substitute (3.1.7) into the objective function, so that it becomes

$$H_d(\lambda) = -\int_{-\infty}^{\infty} f(x, \lambda)\left(-\lambda_0 - \sum_{j=1}^{J}\lambda_j g_j(x)\right) dx = \lambda_0\int_{-\infty}^{\infty} f(x, \lambda)\, dx + \sum_{j=1}^{J}\lambda_j\int_{-\infty}^{\infty} g_j(x)\, f(x, \lambda)\, dx$$

Using (3.1.5) it yields

$$H_d(\lambda) = \lambda_0 + \sum_{j=1}^{J}\lambda_j m_j \qquad (3.1.12)$$

b) Hence the objective function (3.1.12) depends only on the Lagrange multipliers, and it has an inverse relation with the original entropy objective; thus maximizing the entropy requires minimizing (3.1.12). Obtain the derivative with respect to $\lambda_j$ to satisfy the first-order condition:

$$\frac{dH_d(\lambda)}{d\lambda_j} = \frac{d\lambda_0}{d\lambda_j} + m_j = \frac{d}{d\lambda_j}\ln\left(\int_{-\infty}^{\infty}\exp\left(-\sum_{j=1}^{J}\lambda_j g_j(x)\right) dx\right) + m_j$$

$$= \frac{-\int_{-\infty}^{\infty} g_j(x)\exp\left(-\sum_{j=1}^{J}\lambda_j g_j(x)\right) dx}{\int_{-\infty}^{\infty}\exp\left(-\sum_{j=1}^{J}\lambda_j g_j(x)\right) dx} + m_j = -\int_{-\infty}^{\infty} g_j(x)\exp\left(-\lambda_0 - \sum_{j=1}^{J}\lambda_j g_j(x)\right) dx + m_j$$

Using (3.1.8) we have

$$\frac{dH_d(\lambda)}{d\lambda_j} = -\int_{-\infty}^{\infty} g_j(x)\, f(x, \lambda)\, dx + m_j = 0 \quad \Rightarrow \quad \int_{-\infty}^{\infty} g_j(x)\, f(x, \lambda)\, dx = m_j \qquad (3.1.13)$$

    To assure that the estimates of sPdas the minimum values of the dual entropy, one

    should evaluate the second derivative of (3.1.13) as follows:

\frac{d^2H_d}{d\lambda_j\, d\lambda_i} = \frac{d}{d\lambda_i}\Big(-\int_{-\infty}^{\infty} g_j(x)\, f(x,\lambda)\,dx + m_j\Big) = -\frac{d}{d\lambda_i}\int_{-\infty}^{\infty} g_j(x)\exp\Big(-\lambda_0 - \sum_{j=1}^{J}\lambda_j g_j(x)\Big)dx

= \int_{-\infty}^{\infty} g_j(x)\exp\Big(-\lambda_0 - \sum_{j=1}^{J}\lambda_j g_j(x)\Big)\Big(\frac{d\lambda_0}{d\lambda_i} + g_i(x)\Big)dx

= \frac{d\lambda_0}{d\lambda_i}\int_{-\infty}^{\infty} g_j(x)\, f(x,\lambda)\,dx + \int_{-\infty}^{\infty} g_i(x)\, g_j(x)\, f(x,\lambda)\,dx

= \int_{-\infty}^{\infty} g_i(x)\, g_j(x)\, f(x,\lambda)\,dx - \int_{-\infty}^{\infty} g_i(x)\, f(x,\lambda)\,dx \int_{-\infty}^{\infty} g_j(x)\, f(x,\lambda)\,dx

= E(g_i(x)\,g_j(x)) - E(g_i(x))\,E(g_j(x)), \qquad 1 \le i, j \le J    (3.1.14)

The second derivative (3.1.14) is a square matrix of order J (known as the Hessian matrix); it can be regarded as a variance-covariance matrix, which is everywhere positive definite, thus the λ's that solve (3.1.13) can be regarded as the minimizers of the dual entropy.
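As an illustration of the dual route, the following is a minimal sketch (in Python, assuming NumPy and SciPy; not part of the thesis) that minimizes H_d(λ) = λ_0(λ) + Σ λ_j m_j numerically for the two constraints g_1(x) = x and g_2(x) = x², with hypothetical sample moments m1 and m2, and compares the result with the closed form (3.1.21) derived below for the normal case.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize

m1, m2 = 0.5, 1.3   # hypothetical raw sample moments (1/n) * sum(x**j)

def lambda0(lam):
    # normalization term: lambda_0 = ln integral exp(-lam1*x - lam2*x^2) dx
    val, _ = quad(lambda x: np.exp(-lam[0] * x - lam[1] * x**2), -np.inf, np.inf)
    return np.log(val)

def dual_entropy(lam):
    # H_d(lambda) = lambda_0(lambda) + lambda_1*m1 + lambda_2*m2, as in (3.1.12)
    return lambda0(lam) + lam[0] * m1 + lam[1] * m2

res = minimize(dual_entropy, x0=np.array([0.0, 1.0]),
               method="L-BFGS-B", bounds=[(None, None), (1e-6, None)])
print("numerical lambda_1, lambda_2:", res.x)
print("closed form (3.1.21)        :", -m1 / (m2 - m1**2), 1 / (2 * (m2 - m1**2)))
```

The bound on λ_2 only keeps the integral convergent during the search; the minimizer itself lies well inside the feasible region.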

The most delicate step in estimation by POME is recognizing the relation between the estimated Lagrange multipliers and the parameters of the distribution. Generally, it is required to compute (3.1.8), insert it in the maximum entropy density (3.1.7), and finally compare (3.1.7) with the original probability density. To make this idea clearer, let X_1, X_2, ..., X_n be a random sample of size n generated from a normal distribution with parameters (μ, σ²); the entropy function corresponding to the normal distribution is:

H(x) = -\int_{-\infty}^{\infty} f(x,\theta)\,\ln f(x,\theta)\,dx

= -\int_{-\infty}^{\infty} f(x,\theta)\,\ln\Big[\frac{1}{\sigma\sqrt{2\pi}}\exp\Big(\frac{-.5(x-\mu)^2}{\sigma^2}\Big)\Big]dx

= -\int_{-\infty}^{\infty} f(x,\theta)\,\ln(A)\,dx + B\int_{-\infty}^{\infty} f(x,\theta)(x-\mu)^2\,dx

= -\ln(A) + B\Big\{\int_{-\infty}^{\infty} x^2 f(x,\theta)\,dx - 2\mu\int_{-\infty}^{\infty} x f(x,\theta)\,dx + \mu^2\Big\}

= -\ln(A) + B\{E(x^2) - 2\mu E(x) + \mu^2\}

where A = \frac{1}{\sigma\sqrt{2\pi}} and B = \frac{1}{2\sigma^2}. According to Singh et al. (1986), the sufficient constraints for estimating (μ, σ²) are represented by {E(x), E(x²)}; hence for transforming the density function into a function of the Lagrange multipliers one should maximize the following entropy function:

H(X) = -\int_{-\infty}^{\infty} f(x,\mu,\sigma^2)\,\ln f(x,\mu,\sigma^2)\,dx

Subject to:

\int_{-\infty}^{\infty} f(x,\mu,\sigma^2)\,dx = 1 \quad\text{and}\quad \int_{-\infty}^{\infty} x^j f(x,\mu,\sigma^2)\,dx = m_j,\; j = 1,2    (3.1.15)


From the general solution (3.1.7) it follows that:

f(x,\lambda) = \exp\Big(-\lambda_0 - \sum_{j=1}^{2}\lambda_j x^j\Big)    (3.1.16)

Estimating λ_j can be done in two ways: the first is based on inserting (3.1.16) in (3.1.15), which yields three nonlinear equations in three unknowns that can be solved by any numerical approach; the second is based on transforming the constrained optimization into an unconstrained one via the dual approach. To obtain the parameters of the distribution, it is first required to obtain λ_0N as follows:

\lambda_{0N} = \ln\Big(\int_{-\infty}^{\infty}\exp\Big(-\sum_{j=1}^{2}\lambda_{jN}\, x^j\Big)dx\Big)    (3.1.17)

Actually, the integral in (3.1.17) is essentially a normal integral; according to Singh et al. (1986) the solution is:

\lambda_{0N} = .5\ln(\pi) - .5\ln(\lambda_{2N}) + \frac{\lambda_{1N}^2}{4\lambda_{2N}}    (3.1.18)

    Substituting (3.1.18) in (3.1.16) yields:

f(x,\lambda) = \exp\Big(-.5\ln(\pi) + .5\ln(\lambda_{2N}) - \frac{\lambda_{1N}^2}{4\lambda_{2N}} - \lambda_{1N}x - \lambda_{2N}x^2\Big)

= \sqrt{\frac{\lambda_{2N}}{\pi}}\exp\Big(-\frac{\lambda_{1N}^2}{4\lambda_{2N}} - \lambda_{1N}x - \lambda_{2N}x^2\Big)

= \sqrt{\frac{\lambda_{2N}}{\pi}}\exp\Big(-\lambda_{2N}\Big(x^2 + \frac{\lambda_{1N}}{\lambda_{2N}}x + \frac{\lambda_{1N}^2}{4\lambda_{2N}^2}\Big)\Big)

= \sqrt{\frac{\lambda_{2N}}{\pi}}\exp\Big(-\lambda_{2N}\Big(x + \frac{\lambda_{1N}}{2\lambda_{2N}}\Big)^2\Big)    (3.1.19)


It is easily seen that equation (3.1.19) is a normal density with mean -\frac{\lambda_{1N}}{2\lambda_{2N}} and variance \frac{1}{2\lambda_{2N}}, so that:

\mu = -\frac{\lambda_{1N}}{2\lambda_{2N}} \quad\text{and}\quad \sigma^2 = \frac{1}{2\lambda_{2N}}    (3.1.20)

In addition, to make the calculations easier, inserting (3.1.19) in (3.1.15) yields:

\int_{-\infty}^{\infty} f(x,\lambda)\,dx = 1, \quad \int_{-\infty}^{\infty} x\, f(x,\lambda)\,dx = m_1 \quad\text{and}\quad \int_{-\infty}^{\infty} x^2 f(x,\lambda)\,dx = m_2

In the light of (3.1.20) the constraints become:

\int_{-\infty}^{\infty} f(x,\lambda)\,dx = 1, \quad -\frac{\lambda_{1N}}{2\lambda_{2N}} = m_1 \quad\text{and}\quad \Big(-\frac{\lambda_{1N}}{2\lambda_{2N}}\Big)^2 + \frac{1}{2\lambda_{2N}} = m_2

Hence λ_1N and λ_2N have the closed form:

\lambda_{1N} = \frac{-m_1}{m_2 - m_1^2} \quad\text{and}\quad \lambda_{2N} = \frac{1}{2(m_2 - m_1^2)}    (3.1.21)
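The closed form (3.1.21) together with (3.1.20) makes the normal case easy to check numerically. A small sketch (Python with NumPy assumed; the sample is hypothetical and not from the thesis) follows.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1000)   # hypothetical normal sample

m1, m2 = x.mean(), np.mean(x**2)                # raw sample moments
lam1 = -m1 / (m2 - m1**2)                       # lambda_1N from (3.1.21)
lam2 = 1.0 / (2.0 * (m2 - m1**2))               # lambda_2N from (3.1.21)

mu_hat = -lam1 / (2.0 * lam2)                   # (3.1.20): equals m1
sigma2_hat = 1.0 / (2.0 * lam2)                 # (3.1.20): equals m2 - m1**2
print(mu_hat, sigma2_hat)
```

As expected, the POME estimates recovered this way coincide with the moment (and, for the normal, the maximum likelihood) estimates.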

Stengos and Wu (2004),(2007) proved that there is an equivalence between the maximum entropy density f(x,λ) and the original p.d.f. whenever it is a member of the exponential family, as follows:

f(x,\theta) = \exp\Big(\ln(a(\theta)) + \ln(b(x)) + \sum_{j=1}^{J} c_j(\theta)\, g_j(x)\Big)    (3.1.22)

Comparing (3.1.22) with (3.1.7), it can be concluded that λ_0 corresponds to ln(a(θ)), λ_j g_j(x) corresponds to c_j(θ)g_j(x), and b(x) is one. Due to this symmetric relation they concluded that, as long as the density belongs to the exponential family, the parameter estimators based on either MLE or POME will be identical, as follows:


\ln L(x_1,\ldots,x_n;\theta) = \sum_{i=1}^{n}\ln f(x_i,\theta) = \sum_{i=1}^{n}\ln\Big(\exp\Big(\ln(a(\theta)) + \ln(b(x_i)) + \sum_{j=1}^{J} c_j(\theta)\, g_j(x_i)\Big)\Big)

= \sum_{i=1}^{n}\ln\Big(\exp\Big(-\lambda_0 - \sum_{j=1}^{J}\lambda_j g_j(x_i)\Big)\Big)

= -n\lambda_0 - n\sum_{j=1}^{J}\lambda_j m_j = -nH_d(x)

Hence the values that maximize the likelihood function are the same as the values that minimize the dual entropy, and equivalently maximize the constrained entropy function, as long as the distribution belongs to the exponential family; due to this relation one can conclude the uniqueness of the maximum entropy estimates in this case.

Singh (1996) applied the maximum entropy approach to distributions where the regularity conditions are not valid and concluded by Monte Carlo simulation that maximum entropy yielded the least parameter bias for all sample sizes compared to other methods of estimation such as probability weighted moments, maximum likelihood, and the method of moments; overall, maximum entropy offers an alternative method for estimating the parameters of frequency distributions.

3.2 Entropy Estimation Using Sampling m-Spacing

Probability density function estimation has been addressed in the statistical literature by a variety of nonparametric methods (see Beirlant et al. (1997)). This section is concerned with density estimation based on m-spacing.

3.2.1 Entropy Estimation Using Vasicek's Estimator

Let X_1, X_2, ..., X_n be a random sample of size n ≥ 3, and let X_(1), X_(2), ..., X_(n) denote the corresponding order statistics; the sample entropy can be defined as:

H(x) = -E(\ln f(x)) = E\Big(\ln\frac{1}{f(x)}\Big) = E\Big(\ln\Big(\frac{dF(x)}{dx}\Big)^{-1}\Big) \approx E\Big(\ln\Big(\frac{dF_n(x)}{dx}\Big)^{-1}\Big), \qquad 1 \le i \le n

where F_n(x) is the following empirical distribution function:

F_n(x) = \frac{\text{number of observations} \le x}{n}


    According to Mood et al. (1976) it is proved that:

P\Big(\sup_x\big|F_n(x) - F(x)\big| \to 0\Big) = 1 \quad\text{as } n \to \infty

Vasicek (1976) estimated the slope of the cumulative distribution function by replacing the cumulative function with the empirical distribution function and the differential operator with the difference operator (see Mao (2001)); therefore the slope takes the following form:

\frac{dF(x_{(i)})}{dx_{(i)}} \approx \frac{F_n(x_{(i+m)}) - F_n(x_{(i-m)})}{x_{(i+m)} - x_{(i-m)}} = \frac{\frac{i+m}{n} - \frac{i-m}{n}}{x_{(i+m)} - x_{(i-m)}} = \frac{2m}{n\big(x_{(i+m)} - x_{(i-m)}\big)}    (3.2.1)

where m is a positive integer chosen by the user, known as the window size. Choosing m is a serious problem; typically it is recommended to pick the m that gives the least mean square error (MSE) for each sample size. Substituting (3.2.1) in H(x) yields Vasicek's (1975) entropy estimator:

\hat H(x)_{vas} = \frac{1}{n}\sum_{i=1}^{n}\ln\Big(\frac{n}{2m}\big(x_{(i+m)} - x_{(i-m)}\big)\Big)

where x_(i-m) = x_(1) for i - m < 1 and x_(i+m) = x_(n) for i + m > n. Indeed, (3.2.1) can be considered as an application of the mean value theorem: if f(x) is continuous on the closed interval [a,b] and differentiable inside the interval, then there exists c in the open interval (a,b) such that:

\frac{df(c)}{dx} = \frac{f(b) - f(a)}{b - a}
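Vasicek's estimator is simple to compute. The following minimal sketch (Python with NumPy assumed; not part of the thesis) implements the formula above with the boundary convention just stated and checks it against the known N(0,1) entropy.

```python
import numpy as np

def vasicek_entropy(x, m):
    """Vasicek's (1975) m-spacing entropy estimate of a univariate sample."""
    x = np.sort(np.asarray(x))
    n = len(x)
    upper = x[np.minimum(np.arange(n) + m, n - 1)]   # x_(i+m), clipped to x_(n)
    lower = x[np.maximum(np.arange(n) - m, 0)]       # x_(i-m), clipped to x_(1)
    return np.mean(np.log(n * (upper - lower) / (2.0 * m)))

rng = np.random.default_rng(1)
sample = rng.normal(size=200)
print(vasicek_entropy(sample, m=3))   # true N(0,1) entropy: 0.5*ln(2*pi*e) ~ 1.419
```

For finite samples the estimate typically falls somewhat below the true value, which is exactly the boundary bias discussed next.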

Because Ĥ(x) always has a boundary bias during estimation based on spacing when i - m < 1 or i + m > n, Vasicek (1976) recommended that m should be less than n/2 to reduce the boundary bias; further, he proved that Ĥ(x) is an unbiased estimator of H(x) when n → ∞, m → ∞ and m/n → 0. To see this point it is required to decompose Ĥ(x) into three parts in order to study its behavior, as follows:

\hat H(x)_{vas} = -\frac{1}{n}\sum_{i=1}^{n}\ln f(x_i) - U_{mn} - V_{mn}    (3.2.2)

    Where:

U_{mn} = -\ln\frac{n}{2m} - \frac{1}{n}\sum_{i=1}^{n}\ln\big\{F(X_{(i+m)}) - F(X_{(i-m)})\big\} \quad\text{and}\quad V_{mn} = \frac{1}{n}\sum_{i=1}^{n}\ln\Big\{\frac{F(X_{(i+m)}) - F(X_{(i-m)})}{f(x_i)\big(X_{(i+m)} - X_{(i-m)}\big)}\Big\}

To double check (3.2.2):

-\frac{1}{n}\sum_{i=1}^{n}\ln f(x_i) - U_{mn} - V_{mn}

= -\frac{1}{n}\sum_{i=1}^{n}\ln f(x_i) + \ln\frac{n}{2m} + \frac{1}{n}\sum_{i=1}^{n}\ln\big\{F(X_{(i+m)}) - F(X_{(i-m)})\big\} - \frac{1}{n}\sum_{i=1}^{n}\ln\Big\{\frac{F(X_{(i+m)}) - F(X_{(i-m)})}{f(x_i)\big(X_{(i+m)} - X_{(i-m)}\big)}\Big\}

= \frac{1}{n}\sum_{i=1}^{n}\Big\{-\ln f(x_i) + \ln\frac{n}{2m} + \ln\big(F(X_{(i+m)}) - F(X_{(i-m)})\big) - \ln\big(F(X_{(i+m)}) - F(X_{(i-m)})\big) + \ln f(x_i) + \ln\big(X_{(i+m)} - X_{(i-m)}\big)\Big\}

= \frac{1}{n}\sum_{i=1}^{n}\ln\Big(\frac{n}{2m}\big(X_{(i+m)} - X_{(i-m)}\big)\Big)    (3.2.3)

It is clear that (3.2.3) equals Ĥ(x)_vas, hence to study the behavior of Ĥ(x) it is enough to study separately the properties of the three parts in (3.2.2). First, the expected value of -(1/n) Σ_{i=1}^n ln f(x_i) equals H(x) for large sample sizes by the law of large numbers.


Definition (3.2.1): The weak law of large numbers states that if n > \frac{\sigma^2}{\varepsilon^2\delta}, then the sample average is close to the population average with high probability:

P\Big(\Big|\frac{1}{n}\sum_{i=1}^{n} y_i - \mu\Big| < \varepsilon\Big) \ge 1 - \delta

where μ and σ² refer to the mean and the variance of the population respectively, and (ε, δ) are any two specified numbers satisfying ε > 0 and 0 < δ < 1; for more details see Mood et al. (1976).

So if we denote y_i = -ln f(x_i), then according to Definition (3.2.1), for large sample sizes the absolute difference between ȳ and E(y_i) will be small with high probability. Thus the two remaining parts, U_mn and V_mn, represent the sources of noise; fortunately it is proven that, under some conditions, the two parts approximately vanish. For a fixed sample size the effect of V_mn decreases with decreasing values of m, since for any interval (x_(i-m), x_(i+m)) there exists x_i' in (x_(i-m), x_(i+m)) such that:

\frac{F(x_{(i+m)}) - F(x_{(i-m)})}{x_{(i+m)} - x_{(i-m)}} = f(x_i')

Therefore decreasing the window size will decrease the effect of V_mn, as follows:

V_{mn} = \frac{1}{n}\sum_{i=1}^{n}\ln\Big\{\frac{F(X_{(i+m)}) - F(X_{(i-m)})}{f(x_i)\big(X_{(i+m)} - X_{(i-m)}\big)}\Big\} \approx \frac{1}{n}\sum_{i=1}^{n}\ln\frac{f(x_i')}{f(x_i)} \to 0

Also, E(U_mn) can be written as:

E(U_{mn}) = E\Big(-\ln\frac{n}{2m} - \frac{1}{n}\sum_{i=1}^{n}\ln\big\{F(X_{(i+m)}) - F(X_{(i-m)})\big\}\Big)

= -\frac{1}{n}\sum_{i=1}^{n}E\big[\ln\big\{F(X_{(i+m)}) - F(X_{(i-m)})\big\}\big] - \ln(n) + \ln(2m)

= -\frac{1}{n}\sum_{i=1}^{m}E\big[\ln\big\{F(X_{(i+m)}) - F(X_{(1)})\big\}\big] - \frac{1}{n}\sum_{i=m+1}^{n-m}E\big[\ln\big\{F(X_{(i+m)}) - F(X_{(i-m)})\big\}\big] - \frac{1}{n}\sum_{i=n-m+1}^{n}E\big[\ln\big\{F(X_{(n)}) - F(X_{(i-m)})\big\}\big] - \ln(n) + \ln(2m)


Suppose h_(i) = F(X_(i)); since h_(i) is uniform(0,1) (see Mood et al. (1976)), the joint distribution of (h_(i), h_(i+j)) takes the form:

f(h_{(i)}, h_{(i+j)}) = \frac{n!\; h_{(i)}^{\,i-1}\,\big(h_{(i+j)} - h_{(i)}\big)^{\,j-1}\,\big(1 - h_{(i+j)}\big)^{\,n-i-j}}{(i-1)!\,(j-1)!\,(n-i-j)!}, \qquad 0 < h_{(i)} < h_{(i+j)} < 1

Recognizing the p.d.f. of c_(o) = h_(i+j) - h_(i) requires obtaining the joint distribution of c_(o) and h_(i), which yields:

f(h_{(i)}, c_{(o)}) = \frac{n!\; h_{(i)}^{\,i-1}\, c_{(o)}^{\,j-1}\,\big(1 - c_{(o)} - h_{(i)}\big)^{\,n-i-j}}{(i-1)!\,(j-1)!\,(n-i-j)!}, \qquad 0 < h_{(i)} < 1 - c_{(o)},\; 0 < c_{(o)} < 1

Hence the marginal distribution of c_(o) is:

f(c_{(o)}) = \frac{n!}{(i-1)!\,(j-1)!\,(n-i-j)!}\; c_{(o)}^{\,j-1}\int_{0}^{1-c_{(o)}} h_{(i)}^{\,i-1}\,\big(1 - c_{(o)} - h_{(i)}\big)^{\,n-i-j}\, dh_{(i)}

Using the binomial expansion:

f(c_{(o)}) = \frac{n!}{(i-1)!\,(j-1)!\,(n-i-j)!}\; c_{(o)}^{\,j-1}\sum_{t=0}^{n-i-j}\binom{n-i-j}{t}(-1)^t\,\big(1 - c_{(o)}\big)^{\,n-i-j-t}\int_{0}^{1-c_{(o)}} h_{(i)}^{\,i+t-1}\, dh_{(i)}

= \frac{n!}{(i-1)!\,(j-1)!\,(n-i-j)!}\; c_{(o)}^{\,j-1}\,\big(1 - c_{(o)}\big)^{\,n-j}\sum_{t=0}^{n-i-j}\binom{n-i-j}{t}\frac{(-1)^t}{t+i}, \qquad 0 < c_{(o)} < 1

Taking the following identity into consideration:

\sum_{t=0}^{n-i-j}\binom{n-i-j}{t}\frac{(-1)^t}{t+i} = \frac{\Gamma(i)\,\Gamma(n-i-j+1)}{\Gamma(n-j+1)} = \frac{(i-1)!\,(n-i-j)!}{(n-j)!}


Substituting this identity shows that c_(o) has a Beta(j, n-j+1) distribution; hence E(ln c_(o)), according to Kendall and Stuart (1969), can be computed by first obtaining E(ln(1 - c_(o))) as follows:

\int_{0}^{1} c_{(o)}^{\,j-1}\,\big(1 - c_{(o)}\big)^{\,n-j}\, dc_{(o)} = B(j, n-j+1)    (3.2.4)

Taking the derivative of (3.2.4) with respect to n gives:

\int_{0}^{1} c_{(o)}^{\,j-1}\,\big(1 - c_{(o)}\big)^{\,n-j}\,\ln\big(1 - c_{(o)}\big)\, dc_{(o)} = \frac{dB(j, n-j+1)}{dn}

Hence E(ln(1 - c_(o))) will be:

E\big(\ln(1 - c_{(o)})\big) = \frac{1}{B(j, n-j+1)}\frac{dB(j, n-j+1)}{dn} = \frac{d\ln B(j, n-j+1)}{dn} = \psi(n-j+1) - \psi(n+1)

where ψ(x) is the digamma function, given by:

\psi(x) = \Gamma'(x)/\Gamma(x)

Obtaining E(ln c_(o)) requires calculating the derivative of (3.2.4) with respect to j as follows:

\int_{0}^{1} c_{(o)}^{\,j-1}\,\big(1 - c_{(o)}\big)^{\,n-j}\,\big[\ln(c_{(o)}) - \ln(1 - c_{(o)})\big]\, dc_{(o)} = \frac{dB(j, n-j+1)}{dj}    (3.2.5)

Hence, from (3.2.5):

E(\ln c_{(o)}) - E\big(\ln(1 - c_{(o)})\big) = \frac{1}{B(j, n-j+1)}\frac{dB(j, n-j+1)}{dj}

E(\ln c_{(o)}) = \psi(n-j+1) - \psi(n+1) + \psi(j) - \psi(n-j+1) = \psi(j) - \psi(n+1)
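This Beta result is easy to verify by simulation. The short sketch below (Python with NumPy/SciPy assumed; a numerical check, not part of the thesis) simulates uniform order statistics and compares the Monte Carlo mean of ln c_(o) with ψ(j) - ψ(n+1) for hypothetical values of n, i and j.

```python
import numpy as np
from scipy.special import digamma

n, i, j = 30, 5, 7                                     # hypothetical sample size and indices
rng = np.random.default_rng(2)
u = np.sort(rng.uniform(size=(100_000, n)), axis=1)    # rows of uniform(0,1) order statistics
c = u[:, i + j - 1] - u[:, i - 1]                      # c_(o) = h_(i+j) - h_(i) (0-based indexing)
print(np.log(c).mean(), digamma(j) - digamma(n + 1))   # the two values should be close
```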

Hence E(U_mn) can be computed as:


E(U_{mn}) = -\frac{1}{n}\sum_{i=1}^{m}\big\{\psi(i+m-1) - \psi(n+1)\big\} - \frac{1}{n}\sum_{i=m+1}^{n-m}\big\{\psi(2m) - \psi(n+1)\big\} - \frac{1}{n}\sum_{i=n-m+1}^{n}\big\{\psi(n-i+m) - \psi(n+1)\big\} - \ln(n) + \ln(2m)

    Arranging the terms:

E(U_{mn}) = -\frac{1}{n}\Big[\sum_{i=1}^{m}\psi(i+m-1) + (n-2m)\,\psi(2m) + \sum_{i=n-m+1}^{n}\psi(n-i+m)\Big] + \psi(n+1) - \ln(n) + \ln(2m)

and, since \sum_{i=n-m+1}^{n}\psi(n-i+m) = \sum_{i=1}^{m}\psi(i+m-1),

E(U_{mn}) = -\frac{2}{n}\sum_{i=1}^{m}\psi(i+m-1) - \Big(1 - \frac{2m}{n}\Big)\psi(2m) + \psi(n+1) - \ln(n) + \ln(2m)

Using the fact that for large x (see Pardo (2003)):

\psi(x) \approx \ln(x) - \frac{1}{2x}    (3.2.6)

Taking (3.2.6) into account, E(U_mn) becomes:

E(U_{mn}) \approx -\frac{2}{n}\sum_{i=1}^{m}\psi(i+m-1) - \Big(1 - \frac{2m}{n}\Big)\Big(\ln(2m) - \frac{1}{4m}\Big) + \ln(n+1) - \frac{1}{2(n+1)} - \ln(n) + \ln(2m)

= -\frac{2}{n}\sum_{i=1}^{m}\psi(i+m-1) + \frac{2m}{n}\ln(2m) + \frac{1}{4m}\Big(1 - \frac{2m}{n}\Big) + \ln\Big(\frac{n+1}{n}\Big) - \frac{1}{2(n+1)}    (3.2.7)

According to (3.2.7), E(U_mn) tends to zero under the assumptions n → ∞, m → ∞ and m/n → 0, so Vasicek (1975) concluded that under these conditions Ĥ(x) is an asymptotically unbiased estimator of H(x). Unfortunately there are some problems in this proof, which were treated by Song (2000).

3.2.2 Entropy Estimation Using Correa's Estimator


Correa (1995) gave another estimator of entropy based on m-spacing; he claimed that his estimator can be regarded as a modified version of Vasicek's (1975) estimator, and he concluded by simulation that it has a smaller mean square error than Vasicek's (1975). His idea is based on estimating dF(x_(i))/dx_(i) over the interval (x_(i-m), x_(i+m)), but instead of taking only the upper and lower endpoints of the interval, he used all 2m+1 points in (x_(i-m), x_(i+m)) by applying ordinary least squares (OLS) to the following model:

F_n(x_{(j)}) = B_{0i} + B_{1i}\, x_{(j)} + \varepsilon_j, \qquad j = i-m, \ldots, i+m,\; i = 1, \ldots, n

with x_(j) = x_(1) for j < 1 and x_(j) = x_(n) for j > n, where B_1i can be regarded as the slope of the empirical distribution function F_n(x_(j)) on the sample observations within the interval (x_(i-m), x_(i+m)); hence it is required to fit n models, each of which can be rewritten as:

\frac{j}{n} = b_0 + b_1\, x_{(j)}, \qquad j = i-m, \ldots, i+m

    Where:

b_{1i} = \frac{\sum_{j=i-m}^{i+m}\Big(\frac{j}{n} - \frac{1}{2m+1}\sum_{j=i-m}^{i+m}\frac{j}{n}\Big)\big(x_{(j)} - \bar x_{(i)}\big)}{\sum_{j=i-m}^{i+m}\big(x_{(j)} - \bar x_{(i)}\big)^2}

= \frac{\sum_{j=i-m}^{i+m}\frac{j-i}{n}\big(x_{(j)} - \bar x_{(i)}\big)}{\sum_{j=i-m}^{i+m}\big(x_{(j)} - \bar x_{(i)}\big)^2}

= \frac{\sum_{j=i-m}^{i+m}(j-i)\big(x_{(j)} - \bar x_{(i)}\big)}{n\sum_{j=i-m}^{i+m}\big(x_{(j)} - \bar x_{(i)}\big)^2}    (3.2.8)

    Where:

\bar x_{(i)} = \frac{1}{2m+1}\sum_{j=i-m}^{i+m} x_{(j)}

    Finally the estimate of the entropy will be:

\hat H(x)_{corr} = -\frac{1}{n}\sum_{i=1}^{n}\ln(b_{1i})
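A minimal sketch of this estimator (Python with NumPy assumed; not part of the thesis), using the local OLS slope (3.2.8) over the 2m+1 points in each window, with the boundary indices reusing x_(1) and x_(n) as above:

```python
import numpy as np

def correa_entropy(x, m):
    """Correa's (1995) entropy estimate via the local OLS slope (3.2.8)."""
    x = np.sort(np.asarray(x))
    n = len(x)
    total = 0.0
    for i in range(n):
        j = np.arange(i - m, i + m + 1)              # window j = i-m, ..., i+m
        xj = x[np.clip(j, 0, n - 1)]                 # boundary reuse of x_(1), x_(n)
        xbar = xj.mean()
        num = np.sum((xj - xbar) * (j - i))
        den = n * np.sum((xj - xbar) ** 2)
        total += np.log(num / den)
    return -total / n

rng = np.random.default_rng(3)
print(correa_entropy(rng.normal(size=200), m=3))
```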

A numerical comparison among Ĥ(x)_corr, Ĥ(x)_van and Ĥ(x)_vas was performed by Correa (1992) with sample sizes 10, 20 and 50, each with m equal to 1, 2, 3 and 4, for three distributions N(0,1), Uniform(0,1) and Exp(1), where Ĥ(x)_van is not covered in this study; he concluded that Ĥ(x)_corr has the smallest mean square error and is not affected by the level of the window size m.

3.2.3 Entropy Estimation Using Wieczorkowski and Grzegorzewski's Estimators

Wieczorkowski and Grzegorzewski (1999) proposed three new estimators; here we concentrate only on the modifications of the estimators of Vasicek (1975) and Correa (1995). First, one can conclude from (3.2.5) that:

E(\hat H(x)_{vas}) = E\Big(-\frac{1}{n}\sum_{i=1}^{n}\ln f(x_i)\Big) - E(U_{mn}) - E(V_{mn}) = H(x) - E(U_{mn}) - E(V_{mn})

\approx H(x) + \ln(n) - \ln(2m) - \psi(n+1) + \Big(1 - \frac{2m}{n}\Big)\psi(2m) + \frac{2}{n}\sum_{i=1}^{m}\psi(i+m-1)

The bias of Ĥ(x)_vas can therefore be written as:

E(\hat H(x)_{vas}) - H(x) \approx \ln(n) - \ln(2m) - \psi(n+1) + \Big(1 - \frac{2m}{n}\Big)\psi(2m) + \frac{2}{n}\sum_{i=1}^{m}\psi(i+m-1)

Wieczorkowski and Grzegorzewski (1999) decided to correct the bias of Ĥ(x)_vas by subtracting this bias term from Ĥ(x)_vas, as follows:


\hat H(x)_{w1} = \hat H(x)_{vas} - \ln(n) + \ln(2m) - \Big(1 - \frac{2m}{n}\Big)\psi(2m) + \psi(n+1) - \frac{2}{n}\sum_{i=1}^{m}\psi(i+m-1)    (3.2.9)
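The correction in (3.2.9) is purely a function of n and m, so it is straightforward to apply in code. Below is a minimal sketch (Python with NumPy/SciPy assumed; not part of the thesis) in which vasicek_entropy is the plain m-spacing estimator sketched earlier in this section.

```python
import numpy as np
from scipy.special import digamma

def vasicek_entropy(x, m):
    x = np.sort(np.asarray(x)); n = len(x)
    up = x[np.minimum(np.arange(n) + m, n - 1)]
    lo = x[np.maximum(np.arange(n) - m, 0)]
    return np.mean(np.log(n * (up - lo) / (2.0 * m)))

def vasicek_corrected(x, m):
    # H_w1 = H_vas - ln(n) + ln(2m) - (1 - 2m/n)*psi(2m) + psi(n+1)
    #        - (2/n) * sum_{i=1}^{m} psi(i + m - 1)        ... equation (3.2.9)
    n = len(x)
    corr = (-np.log(n) + np.log(2 * m)
            - (1 - 2 * m / n) * digamma(2 * m)
            + digamma(n + 1)
            - (2 / n) * digamma(np.arange(1, m + 1) + m - 1).sum())
    return vasicek_entropy(x, m) + corr
```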

Actually, (3.2.9) can be regarded as a bias-corrected Vasicek (1975) estimator; thus Wieczorkowski and Grzegorzewski (1999) were surprised that Vasicek (1975) had not used Ĥ(x)_w1 himself. Secondly, they proposed an estimator that modifies Correa's (1992) estimator by the jackknife method; their idea consists of the following steps:

1. Let Ĥ(x)_corr^(-i) be the Correa estimator of H(x) computed after removing the i-th observation; it is therefore required to calculate Ĥ(x)_corr^(-i) n times.

2. Obtain the average H̄(x)_corr, which has the following formula:

\bar H(x)_{corr} = \frac{1}{n}\sum_{i=1}^{n}\hat H(x)_{corr}^{(-i)}

3. Calculate the jackknife estimator Ĥ(x)_w2, which can be expressed as (a sketch in code follows these steps):

\hat H(x)_{w2} = n\,\hat H(x)_{corr} - (n-1)\,\bar H(x)_{corr}
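The jackknife step is the same regardless of which base estimator is plugged in. The sketch below (Python with NumPy assumed; not part of the thesis) is written generically: `estimator` can be the correa_entropy sketch shown earlier in this section, or any other entropy estimator with the signature estimator(x, m).

```python
import numpy as np

def jackknife_entropy(estimator, x, m):
    """Jackknife modification: H_w2 = n*H_full - (n-1)*mean(leave-one-out estimates)."""
    x = np.asarray(x)
    n = len(x)
    h_full = estimator(x, m)                                        # full-sample estimate
    h_loo = [estimator(np.delete(x, i), m) for i in range(n)]       # step 1: n leave-one-out fits
    h_bar = np.mean(h_loo)                                          # step 2: their average
    return n * h_full - (n - 1) * h_bar                             # step 3: jackknife estimate
```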

It is obvious that Ĥ(x)_w2 shares the properties of Ĥ(x)_corr: if Ĥ(x)_corr is an unbiased estimator, then so are H̄(x)_corr and Ĥ(x)_w2; however, if Ĥ(x)_corr is biased, then Ĥ(x)_w2 is biased as well, but to a lesser extent. To prove this point, according to Wasserman (2006), the bias of many statistics can often be expressed as:

bias(\hat H(x)) = \frac{a}{n} + \frac{b}{n^2} + O\Big(\frac{1}{n^3}\Big)    (3.2.10)

where O(n^{-k}), according to Theil (1971), denotes a sequence of terms whose leading (dominant) term is of order n^{-k}, so that n^k O(n^{-k}) is a bounded sequence for large n. When (3.2.10) holds, then:

bias(\bar H(x)_{corr}) = \frac{a}{n-1} + \frac{b}{(n-1)^2} + O\Big(\frac{1}{n^3}\Big)


Hence bias(Ĥ(x)_w2) = n·bias(Ĥ(x)_corr) - (n-1)·bias(H̄(x)_corr) is of order O(1/n²), so Ĥ(x)_w2 works better than Ĥ(x)_corr because it has less bias; finally, Ĥ(x)_vas does not possess equally good statistical properties.

3.3 Goodness of Fit Tests Based on Maximum Entropy

The entropy of a random variable plays a fundamental role not only in information theory but also in hypothesis testing; indeed there are two different approaches for goodness of fit via maximum entropy: tests based on likelihood-type tests and tests based on sampling m-spacing.

3.3.1 Goodness of Fit Based on Likelihood Tests

Stengos and Wu (2004),(2007) derived general distribution tests based on the method of maximum entropy density. Their tests rest on the fact that there is a one-to-one relation between the maximum entropy density f(x,λ) and the probability density function of a specified distribution; in other words, they tested whether the maximum entropy density corresponding to the distribution under the null hypothesis, the normal distribution, is suitable for representing the given sample via testing λ_j = 0, j = 3,...,J. In practice J will be small, in our case J = 4. They provided four flexible tests based on the Lagrange multiplier principle that reduce, surprisingly, to test statistics with a simple closed form. It is worth noting that one can also use the Wald test (WT) and the likelihood ratio (LR) test for testing λ_j = 0, j = 3, 4.

1. Tests Based on the Third and the Fourth Moments

Stengos and Wu (2004) claimed that one can test the distribution's maximum entropy density via the third and the fourth moments as follows:

f_1(x,\lambda) = \exp\Big(-\lambda_0 - \sum_{j=1}^{4}\lambda_j x^j\Big)

Clearly f_1(x,λ) is integrable over the real line only if the dominant term in the exponent, x^4, enters as an even power, otherwise f_1(x,λ) will explode as x → ±∞; the second condition is that the coefficient associated with the dominant term, which is an even power by the first condition, must be positive, otherwise f_1(x,λ) will again explode as x → ±∞. For testing the normality of the sample it is required to test:

H_o: \lambda_j = 0 \quad\text{v.s.}\quad H_1: \lambda_j = \hat\lambda_j, \qquad j = 3, 4

where λ̂ denotes the maximum entropy estimator of λ.

a) According to Stengos and Wu (2004), for running the Lagrange multiplier test it is required to obtain the score function and the information matrix under the null hypothesis, that the sample follows the standard normal, as follows:

S(\lambda') = \frac{d}{d\lambda_j}\sum_{i=1}^{n}\ln\Big(\exp\Big(-\lambda_0 - \sum_{j=1}^{4}\lambda_j x_i^j\Big)\Big)

= \frac{d}{d\lambda_j}\Big(-n\lambda_0 - n\sum_{j=1}^{4}\lambda_j m_j\Big) = -n\Big(\frac{d\lambda_0}{d\lambda_j} + m_j\Big)

= -n\Big(\frac{d}{d\lambda_j}\ln\Big(\int_{-\infty}^{\infty}\exp\Big(-\sum_{j=1}^{4}\lambda_j x^j\Big)dx\Big) + m_j\Big)

S(\lambda') = n\,\big(E_o'(x) - m_1,\; E_o'(x^2) - m_2,\; E_o'(x^3) - m_3,\; E_o'(x^4) - m_4\big)

where E_o'(x^j) refers to the expectation of x^j under the standard normal; using (3.1.21), E_o'(x^j) has the following formula:

E_o'(x^j) = \int_{-\infty}^{\infty} x^j \exp\big(-\lambda_{0N}' - \lambda_{1N}'\,x - \lambda_{2N}'\,x^2\big)dx = \int_{-\infty}^{\infty} x^j \exp\big(-\lambda_{0N}' - .5x^2\big)dx

Where:

\lambda_{0N}' = \ln\Big(\int_{-\infty}^{\infty}\exp(-.5x^2)\,dx\Big)

Since it is required to test only that λ_3 and λ_4 equal zero, the score function under the standard normal becomes:

S(\lambda') = n\,\big(0,\; 0,\; E_o'(x^3) - m_3,\; E_o'(x^4) - m_4\big) = n\,\big(0,\; 0,\; -m_3,\; 3 - m_4\big)


The 4×4 information matrix I(λ') associated with testing normality can be expressed as:

I(\lambda')_{(4,4)} = \Big[-nE\Big(\frac{d^2\ln f(x_i,\lambda)}{d\lambda_i\, d\lambda_j}\Big)\Big]

I(\lambda')_{J\times J} = n\,\big\{E_o'(x^{i+j}) - E_o'(x^i)\,E_o'(x^j)\big\} = n\begin{bmatrix} 1 & 0 & 3 & 0 \\ 0 & 2 & 0 & 12 \\ 3 & 0 & 15 & 0 \\ 0 & 12 & 0 & 96 \end{bmatrix}, \qquad 1 \le i, j \le 4
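These entries can be checked numerically from the raw moments of the standard normal. The short sketch below (Python with NumPy/SciPy assumed; a numerical check, not part of the thesis) rebuilds the matrix as the covariance of (x, x², x³, x⁴) under N(0,1).

```python
import numpy as np
from scipy.stats import norm

mom = [norm.moment(k) for k in range(9)]    # E(x^0), ..., E(x^8) under N(0,1)
I = np.array([[mom[i + j] - mom[i] * mom[j] for j in range(1, 5)]
              for i in range(1, 5)])
print(I)   # expected: [[1,0,3,0],[0,2,0,12],[3,0,15,0],[0,12,0,96]]
```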

    According to Engle (1984) the Lagrange multiplier test takes the following formula:

LM_1' = S(\lambda')^{t}\, I_N(\lambda')^{-1}\, S(\lambda') = n\Big(\frac{m_3^2}{6} + \frac{(m_4 - 3)^2}{24}\Big)
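Because LM_1' depends only on the third and fourth sample moments of the standardized data, it is trivial to compute. A minimal sketch (Python with NumPy/SciPy assumed; not part of the thesis) is given below; the comparison with a chi-square(2) critical value follows the standard asymptotic theory for an LM test of two restrictions, which is an assumption added here rather than a statement from the text above.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(4)
x = rng.normal(size=500)                       # hypothetical sample to be tested

z = (x - x.mean()) / x.std()                   # standardize before applying the test
m3, m4 = np.mean(z**3), np.mean(z**4)
lm1 = len(z) * (m3**2 / 6 + (m4 - 3)**2 / 24)
print(lm1, "reject normality:", lm1 > chi2.ppf(0.95, df=2))
```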

Although LM_1' has a simple closed form, it tests only whether the sample follows the standard normal, so the sample must be standardized before applying the test. Fortunately, LM_1 can also be derived under the normal (μ̂, σ̂²), where μ̂ and σ̂² denote the maximum entropy estimators of the normal distribution's parameters. Hence the score function under the normal (μ̂, σ̂²) will be:

S(\hat\lambda_1) = n\,\big(0,\; 0,\; E_o(x^3) - m_3,\; E_o(x^4) - m_4\big)

where E_o(x^j) refers to the expectation of x^j under the normal (μ̂, σ̂²); using (3.1.21), it takes the form:

E_o(x^j) = \int_{-\infty}^{\infty} x^j f(x,\hat\lambda_N)\,dx = \int_{-\infty}^{\infty} x^j \exp\big(-\hat\lambda_{0N} - \hat\lambda_{1N}\,x - \hat\lambda_{2N}\,x^2\big)dx = \int_{-\infty}^{\infty} x^j \exp\Big(-\hat\lambda_{0N} + \frac{m_1}{m_2 - m_1^2}\,x - \frac{1}{2(m_2 - m_1^2)}\,x^2\Big)dx

    Where:


\hat\lambda_{0N} = \ln\Big(\int_{-\infty}^{\infty}\exp\Big(\frac{m_1}{m_2 - m_1^2}\,x - \frac{1}{2(m_2 - m_1^2)}\,x^2\Big)dx\Big)

The information matrix under the normal (μ̂, σ̂²) can be expressed as:

I_N(\hat\lambda_1) = n\,\big\{E_o(x^{i+j}) - E_o(x^i)\,E_o(x^j)\big\}, \qquad 1 \le i, j \le 4

    Finally the Lagrange multiplier test will be:

LM_1 = S(\hat\lambda_1)^{t}\, I_N(\hat\lambda_1)^{-1}\, S(\hat\lambda_1)

One can observe that LM_1' is a special case of LM_1.

b) For testing λ_h = 0, h = 3, 4 via the Wald test (WT), it is first required to partition λ as:

\lambda = (\lambda_1 \;\; \lambda_2)', where