testing hypothesis

OutlineLarge Sample Tests

p-value

Testing Hypothesis

Maura Mezzetti

Department of Economics and FinanceUniversita Tor Vergata

Mezzetti Testing Hypothesis


p-value

Hypothesis Testing

”It is a mistake to confound strangeness with mystery”

Sherlock HolmesA Study in Scarlet



p-value

Outline

english

1 Large Sample TestsWald TestScore TestLikelihood Ratio Test

2 p-valueApplication to Poisson Distribution



p-value

Wald TestScore TestLikelihood Ratio Test

Large Sample Tests

Why do we need large sample hypothesis testing?

The advantage of a large sample for testing is that we do not needthe distribution of the test statistics: less computations needed! ...but results hold asymptotically



p-value


Large Sample Tests

Suppose we wish to test H0 : θ = θ0 against H0 : θ 6= θ0. Thelikelihood-based give rise to several possible tests



p-value


Large Sample Tests

Let logL(θ) denote the loglikelihood and θn the consistent root ofthe likelihood equation. Intuitively, the farther θ0 is from θn, thestronger the evidence against the null hypothesis.

But how far is far enough?

If θ0 is close to θn

logL(θ0) should also be close to logL(θn)

logL′(θ0) should also be close to logL′(θn) = 0.



p-value


Large Sample Tests

Wald test

If we base a test upon the value of (θn − θ0)

Likelihood ratio test

If we base a test upon the value of logL(θn)− logL(θ0)

(Rao) score test

If we base a test upon the value of logL′(θ0)



p-value


Three asymptotic tests



p-value


Large Sample Tests

We argued that under certain regularity conditions, the followingfacts are true under H0:

√n(θn − θ0) −→ N

(0,

1

I (θ0)

)

1√n

(logL′(θ0)) −→ N (0, I (θ0))

−1

n(logL′′(θ0)) −→ I (θ0)



p-value


Wald Test

Under certain regularity conditions, the maximum likelihoodestimator θ has approximately in large samples a (multivariate)normal distribution with mean equal to the true parameter valueand variance-covariance matrix given by the inverse of theinformation matrix, so that

θ ∼ N

(θ0,

1

nI (θ0)

)



p-value


Wald Test

Wald test statistics (parameter θ scalar)

(θ − θ0)×√

nI (θ0) ∼ N(0, 1)

A consistent estimator is used

(θ − θ0)×√

nI (θ) ∼ N(0, 1)

More generally, often it is written as:

θ − θ0√Var(θ)

∼ N(0, 1)



p-value


Wald Test Statistics

When θ is a parameter of dimension p. This result provides a basisfor constructing tests of hypotheses and confidence regions. Forexample under the hypothesis H0 : θ = θ0 for a fixed value θ0, thequadratic form

W = (θ − θ0)′I (θ)(θ − θ0)

written also as:

W = (θ − θ0)′var−1(θ)(θ − θ0)

has approximately in large samples a chi-squared distribution withp degrees of freedom.



p-value


Comments on Wald Test Stastiscs

You may find also this formula:

W = (θ − θ0)′I (θ0)(θ − θ0)

and assumed an approximately chi-squared distribution with pdegrees of freedom.

REMEMBER:

limn→∞I (θ) = I (θ0)



p-value


Score Test

Under some regularity conditions the score itself has an asymptoticnormal distribution with mean 0 and variance equal to theinformation matrix, so that

u(θ) ∼ N(0, In(θ))



p-value


Score Test Statistcs

u(θ0) ∼ N(0, In(θ0))

∂logL(θ|x1,...,xn)∂θ√nI (θ0)

∼ N(0, 1)



p-value


Score Test

When θ is a parameter of dimension p. Under some regularityconditions the score itself has an asymptotic normal distributionwith mean 0 and variance-covariance matrix equal to theinformation matrix, so that

u(θ) ∼ Np(0, In(θ))



p-value


Score Test

This result provides another basis for constructing tests ofhypotheses and confidence regions. For example under H0 : θ = θ0

for a fixed value θ0, the quadratic form

Q = (u(θ0))′I−1(θ0)u(θ0)

Q = (u(θ0))′I−1(θ0)u(θ0)

has approximately in large samples a chi-squared distribution withp degrees of freedom.



p-value


Likelihood Ratio Tests

Under certain regularity conditions, minus twice the log of thelikelihood ratio has approximately in large samples a chi-squaredistribution

−2logλ = 2logL(θ, x)− 2logL(θ0, x)

In case θ is a scalar, −2logλ has a chi-square distribution with 1degree of freedom.



p-value


Likelihood Ratio Tests

Under certain regularity conditions, minus twice the log of thelikelihood ratio has approximately in large samples a chi-squaredistribution with degrees of freedom equal to the difference in thenumber of parameters between the two models. Thus,

−2logλ = 2logL(θω2 , x)− 2logL(θω1 , x)

where the degrees of freedom are ν = dim(ω2)− dim(ω1), thenumber of parameters in the larger model ω2 minus the number ofparameters in the smaller model ω1.



p-value


Large Sample Tests

The LR, Wald, and Score tests (the ”trinity” of test statistics)require different models to be estimated.

LR test requires both the restricted and unrestricted modelsto be estimated

Wald test requires only the unrestricted model to beestimated.

Score test requires only the restricted model to be estimated.

Applicability of each test then depends on the nature of thehypotheses. For H0 : θ = θ0, the restricted model is trivial toestimate, and so LR or Score test might be preferred. ForH0 : θ > θ0, restricted model is a constrained maximizationproblem, so Wald test might be preferred.



p-value


Large Sample Tests

The score test does not require θn whereas the other two tests do.

The Wald test is most easily interpretable and yields immediateconfidence intervals.

The score test and likelihood ratio test are invariant underreparameterization, whereas the Wald test is not.



p-value


Large Sample Tests: H0 θ = θ0

Wald Test

Under the null, the normalised distance between θ and θ0 shouldbe small.

Lagrange multiplier (or Score) test

Under the null, the score u(θ0) evaluated at θ0 should be close tozero.

Likelihood ratio test

The ratio between the likelihood L(x , θ) evaluated at at theestimate θ and the likelihood L(x , θ0) evaluated at θ0 should beclose to 1



p-value


Relationship between the three tests

From the Handbook Chapter 13 by Robert Engle

These three general principles have a certain symmetry which hasrevolutionized the teaching of hypothesis tests and thedevelopment of new procedures:

Essentially, the Lagrance Multiplier approach starts at the nulland asks whether movement toward the alternative would bean improvement

The Wald approace starts at the alternative and considersmovement toward the null

The Likelihood ratio method compares the two hypothesisdirectly on an equal basis.



p-valueApplication to Poisson Distribution

p-value

A common complaint about Hypothesis Testing is the he choice ofthe significance level α, that is essentially arbitrary. To makematters worse, the conclusions (Reject or Dont Reject H ) dependon what value of α is used.




p-value

Definition: A p-value p(x) is a test statistic satisfying0 ≤ p(x) ≤ 1 for every sample point x . Small valuesof p(x) give evidence that H1 is true.

Theorem: Let W (X ) be a test statistic such that large value ofW give evidence that H1 is true. For each samplepoint x , define

p(x) = supθ∈Θ0

Pθ(W (X ) ≥W (x))

then p(X ) is a valid p-value.




Steps to find p-value

1 Let T be the test statistic.

2 Compute the value of T using the sample x1, . . . , xn. Say it isa.

3 The p-value is given by

p − value =P(T < a|H0), if lower tail testP(T > a|H0), if upper tail test

P(|T | > |a||H0), if two tail test




p-value

The smallest level at which H0 would be rejected.

The the probability that you would obtain evidence against H0

which is at least as strong as that actually observed, if H0

were true.

The smaller the p − value, the stronger the evidence that H0

is false.




Example: psychiatrist problem

A psychiatrist wants to determine if a small amount of cokedecreases the reaction time in adults.The reaction time for a specified test is assumed normallydistributed and has mean equal to µ = 0.15 seconds and σ = 0.2.A sample of 81 people were given a small amount of coke and theirreaction time averaged 0.11 seconds.Does this sample result support the psychiatrist’s hypothesis?





The reaction time is expected to decrease

What are null and alternative hypothesis?

H0 : µ = 0.15 H1 : µ < 0.15

If the null hypothesis were true, could I observe a samplemean of 0.11 from a sampling distribution centered in 0.15and standard deviation σ = 0.2?

P(X < 0.11|µ = 0.15

)= P

(X − 0.15

0.281

<0.11− 0.15

0.281

)= 0.0359





Hypothesis Testing

Population

I believe the population mean time is 0.15 (hypothesis).

MeanX = 0.11

Reject hypothesis! Not close.

Random sample

☺☺☺☺☺☺☺☺

☺☺☺☺

☺☺☺☺

☺☺☺☺

☺☺☺☺

☺☺☺☺

☺☺☺☺☺☺☺☺





Hypothesis Testing Process

Population

I believe thepopulationreaction timeis 0.15(Hypothesis)

REJECT

The SampleMean Is 0.11

SampleHypothesis

?15.011.0 =<= µxIs

yes

no ACCEPT





Sample Meanµµµµ =0.15

Sampling Distribution

It is unlikely that we would get a sample mean of this value ...

... if in fact this werethe population mean.

... Therefore, we reject the null hypothesis that µ = 0.15

0.11H0




p-value: Example: psychiatrist problem

0.0359 represents the probability to observe a sample meanwith a more extreme value than 0.11.

Probability of obtaining a test statistic as extreme as, or moreextreme, (≤ or ,≥) than the actual sample value, given H0 istrue

If we think the p-value is too low to believe the observed teststatistic is obtained by chance only, then we would rejectchance (reject the null hypothesis).

Otherwise, we fail to reject chance and do not reject the nullhypothesis.




Rejection Region: Example: psychiatrist problem

R =

(x1, x2, . . . , xn) :x − µ0√

σ2

n

< −z1−α





α = 0.05

R =

(x1, x2, . . . , xn) :x − 0.15√

0.22

81

< −1.645

−1.8 < −1.645 Reject null hypothesis

R =

(x1, x2, . . . , xn) :x − 0.15√

0.22

81

< −2.32

−1.8 < −2.32 Accept null hypothesis





R =

{(x1, x2, . . . , xn) : x < µ0 − z1−α

√σ2

n

}





α = 0.05

R = {(x1, x2, . . . , xn) : x < 0.1134}

Reject null hypothesis

R = {(x1, x2, . . . , xn) : x < 0.0984}

Accept null hypothesis




Example: psychiatrist problem: Decision

p-value= 0.0359 represents the probability to observe asample mean with a more extreme value than 0.11

If α = 0.05, we reject null hypothesis, coke does have effecton time reaction (coke decreases reaction time)

If α = 0.01, we DO NOT reject null hypothesis, coke doesNOT have effect on time reaction




Application to Poisson Distribution

Let (X1, . . . ,Xn) be a random sample of i.i.d. random variablesdistributed as a Poisson distribution with parameter λ

f (x ;λ) =λxexp(−λ)

x!

We wish to test H0 : λ = 6 versus H1 : λ 6= 6We observe a random sample (X1, . . . ,X100), such that∑

i xi = 550





Observe that the joint pdf of X = (X1, . . . ,Xn) is give by

f (x1, . . . , x : n;λ) =∏i

λxi exp(−λ)

xi !

L(λ) =λ∑

i xi exp(−nλ)∏i xi !

l(λ) =∑i

xi log(λ)− nλ− log

(∏i

xi !

)dl(λ)

λ=

∑i xiλ− n

d2l(λ)

λ2= −

∑i xiλ2

λ = x = 5.5





l(λ) =∑i

xi log(λ)− nλ− log

(∏i

xi !

)dl(λ)

λ=

∑i xiλ− n

d2l(λ)

λ2= −

∑i xiλ2

In(λ) = −E(d2l(λ)

λ2

)= −n

λ

λ = x




Poisson- Large sample test

WALD Test statistics

x − λ0√xn

approximate distribution N(0, 1)

x − λ0√xn

=5.5− 6√

5.5100

= −2.13

p-value=0.0332





Score test statistics ∑i xiλ0− n√nλ0

approximate distribution N(0, 1)

∑i xiλ0− n√nλ0

= −2.0412

p-value=0.04146





Likelihood Ratio Test Statistics

2×(l(λ)− l(λ0)

)= 2×

(∑i

xi logλ

λ0− n(λ− λ0)

)

Λ(x) = 2×

(∑i

xi logx

λ0− n(x − λ0)

)= 4.2875

p − value = 0.0384

approximate distribution χ1


testing hypothesis

Documents