chapter 4 est. test of hypothesis

ESTIMATION AND TEST OF HYPOTHESES: ONE-SAMPLE, TWO-SAMPLE

Introduction to Hypothesis Testing: Suppose you have to buy cornflakes from a salesman. The issue is not the price of the cornflakes but the amount of cornflakes in each box. The salesman appears and claims that the cornflakes he is selling are packaged at 10 oz/box. You have exactly four possible views of his claim.

1. He is honest: μ = 10 oz.

2. He is conservative and there is more than 10 oz/box: μ > 10 oz.

3. He is trying to cheat you and there is less than 10 oz/box: μ < 10 oz.

4. He is new on the job and does not really know the amount per box; his claim could be high or low, i.e. μ ≠ 10 oz.

If you think he is honest, you would just go ahead and order your cornflakes from him. You may, however, hold one of the other views: he is i) conservative, ii) a liar, or iii) clueless. The position you hold regarding the salesman can be any one of these, but not more than one. You cannot assume that he is both a liar and conservative, i.e. μ < 10 oz and μ > 10 oz, at the same time.

Proper use of the scientific method will allow you to test one of these alternative positions through a sampling process. Remember that you can choose only one to test.

How would you decide?

CASE I: Testing that the salesman is conservative

Suppose the salesman is remarkably shy and seems to lack self-confidence. You feel from his general conduct that he is being conservative in his claim of 10 oz/box. The situation can be summarized with a pair of hypotheses, which are actually a pair of predictions.

A) The first is the salesman’s claim and the prediction we will directly test. It is usually called Ho or the null hypothesis. In this case Ho: μ = 10 oz.

B) The second is called the alternative or research hypothesis, which is your belief or position. The alternative hypothesis in this case is Ha: μ > 10 oz.

Writing the null hypothesis as Ho: μ ≤ 10 oz, the predictions take the following forms:

Ho: μ ≤ 10 oz (null hypothesis)
Ha: μ > 10 oz (alternative hypothesis)

We have now generated two mutually exclusive and all-inclusive possibilities. Therefore, either Ho or Ha will be true, but not both.

Hypotheses:

A. Salesman’s claim (Ho)

B. Customer’s belief or position (Ha)

In order to test the salesman’s claim (Ho) against your views (Ha), you decide to do a small experiment. You select 25 boxes of cornflakes from a consignment and carefully empty each box, then weigh and record its contents. This experimental sampling is done after you have formulated the two hypotheses. If the first hypothesis were true, you would expect the sample mean of the 25 boxes to be close to or less than 10 oz.

If the second hypothesis were true, you would expect the sample mean to be significantly greater than 10 oz. We have to think about what "significantly greater" means in this context. In statistics, significantly less, more, or different means that the result of the experiment would be a rare result if the null hypothesis were true. In other words, the result is far enough from the prediction in the null hypothesis that we feel we must reject the truthfulness of that hypothesis.

This idea leads to the problem of what is a rare result, or a result rare enough to make us sufficiently suspicious of the null hypothesis. For now we will say that a result is rare enough if it could occur by chance less than 1 in 20 times if the null hypothesis were true. When that happens we will reject the null hypothesis and consequently accept the alternative one. Let’s now look at how this decision-making criterion works in CASE I.

Ho: μ ≤ 10 oz
Ha: μ > 10 oz

n = 25, and assume that σ = 1.0 oz and is widely known.

Suppose the mean of your 25-box sample is X̄ = 10.36 oz. Is that significantly greater than 10 oz, so that we should reject the claim of 10 oz stated in Ho? Clearly it is greater than 10 oz, but is this mean rare enough under the claim of μ ≤ 10 oz for us to reject the claim?

To answer this question we will use the standard normal transformation to find the probability of X̄ ≥ 10.36 oz when the mean of the sampling distribution of X̄ is 10 oz. If this probability is less than 0.05 (1 in 20), we consider the result to be too rare for acceptance of Ho.
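The probability described above can be sketched in a few lines of Python. This is only an illustration, using the values from the cornflakes example (μ = 10 oz, σ = 1.0 oz, n = 25, X̄ = 10.36 oz); the variable names are made up for the sketch, and the normal tail area is obtained from the complementary error function rather than from a printed table.

```python
import math

# Values taken from the cornflakes example.
mu0 = 10.0      # mean claimed under Ho (oz)
sigma = 1.0     # population standard deviation, assumed known (oz)
n = 25          # number of boxes sampled
x_bar = 10.36   # observed sample mean (oz)

# Standard normal transformation of the sample mean.
z = (x_bar - mu0) / (sigma / math.sqrt(n))

# Right-tail probability P(Z >= z) for a standard normal variable.
p_right = 0.5 * math.erfc(z / math.sqrt(2))

print(f"Z = {z:.2f}, P(Z >= {z:.2f}) = {p_right:.4f}")
```

With these inputs the tail probability comes out near 0.036, which is below the 1-in-20 cutoff, so under this one-tailed setup the result would count as rare.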

CASE II: Testing that the salesman is a cheat

Suppose our salesman is a fast and smooth talker with fancy clothes and a new sports car. Your view might be that a cornflakes salesman only gains this type of affluence through unethical practices. You think this guy is a cheat. Your null hypothesis is Ho: μ ≥ 10 oz and your alternative hypothesis is Ha: μ < 10 oz. Notice that the two hypotheses are again mutually exclusive and all-inclusive and that the equal sign is always in the null hypothesis.

It is the null hypothesis (the salesman’s claim) that will be tested.

Ho: μ ≥ 10 oz
Ha: μ < 10 oz

Suppose you again sample 25 boxes to determine the average weight. The question you want to answer and the predictions (Ho, Ha) stemming from that question are again formulated before the sampling is done.

n = 25, σ = 1.0 oz, and again we find X̄ = 10.36 oz. How does this result fit our predictions? If Ho is false, we expect the mean to be significantly less than 10 oz. Here the sample mean is above 10 oz, so it offers no evidence against Ho.

CASE III: Testing that the salesman is clueless

The last case is somewhat different from the first two in that we really don’t know whether to expect the mean of the sample to be higher or lower than the salesman’s claim. The salesman is new on the job and does not know his product very well. The claim of 10 oz per box is what he has been told, but you don’t have a sense that he is either overly conservative (CASE I) or dishonest (CASE II). Your alternative hypothesis here is less focused. It becomes that the mean is different from 10 oz. The predictions become

Ho: μ = 10 oz
Ha: μ ≠ 10 oz

Under Ho we expect X̄ to be close to 10 oz, while under Ha we expect X̄ to be different from 10 oz in either direction, i.e. significantly smaller or significantly larger than 10 oz.

TYPICAL STEPS IN A STATISTICAL TEST OF HYPOTHESIS

1. State the problem: should I buy cornflakes from the salesman?

2. Formulate the null and alternative hypotheses.

Ho: μ = 10 oz
Ha: μ ≠ 10 oz

3. Choose the level of significance. This means choosing the probability of rejecting a true null hypothesis. We chose 1 in 20 in our cornflakes example, that is, 5% or 0.05. When Z was so extreme that it would occur less than 1 in 20 times if Ho were true, we rejected Ho.

4. Determine the appropriate test statistic. Here we mean the index whose sampling distribution is known, so that objective criteria can be used to decide between Ho and Ha. In the cornflakes example we used a Z transformation because, under the Central Limit Theorem, X̄ was assumed to be normally or approximately normally distributed and the value of σ was known. Z is calculated as

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

5. Calculate the appropriate test statistic. Only after the first four steps are completed can one do the sampling and generate the so-called test statistic. Here

$$Z = \frac{10.36 - 10.0}{1.0 / \sqrt{25}} = \frac{0.36}{0.20} = 1.8$$

6. Determine the critical values for the sampling distribution and appropriate level of significance. For the two-tailed test and a level of significance of 1 in 20 we have critical values of ±1.960 (Table C.3). These values or more extreme ones occur only 1 in 20 times if Ho is true. The critical values serve as cutoff points in the sampling distribution for the regions in which we reject Ho.

7. Compare the test statistic to the critical values. In a two-tailed test, the critical values are ±1.960 and the test statistic is 1.8, so −1.960 < 1.8 < 1.960.

8. Based on the comparison in step 7, accept or reject Ho. Since Z falls between the critical values, it is not extreme enough to reject Ho.

9. State your conclusion and answer the question posed in step 1. So we accept Ho.
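Steps 4 to 8 can be sketched as a small Python function. This is a minimal illustration under the assumptions of the cornflakes example (σ known, two-tailed test); the function name two_tailed_z_test is hypothetical, and the critical value is computed with the standard library’s NormalDist instead of being read from Table C.3.

```python
import math
from statistics import NormalDist

def two_tailed_z_test(x_bar, mu0, sigma, n, alpha=0.05):
    """Return (z, critical value, reject Ho?) for a two-tailed Z test."""
    z = (x_bar - mu0) / (sigma / math.sqrt(n))        # step 5: test statistic
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)      # step 6: critical value
    return z, z_crit, abs(z) > z_crit                 # steps 7-8: compare, decide

z, z_crit, reject = two_tailed_z_test(10.36, 10.0, 1.0, 25)
print(f"Z = {z:.2f}, critical values = ±{z_crit:.3f}, reject Ho: {reject}")
```

For the cornflakes numbers the function reports Z = 1.80 inside ±1.960, so Ho is not rejected, in agreement with step 8.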

TYPE I VS TYPE II ERROR IN HYPOTHESIS TESTING

Because the predictions in Ho and Ha are written so that they are mutually exclusive and all-inclusive, we have a situation where one is true and the other is automatically false.

When Ho is true, then Ha is false.

If we accept Ho, we have done the right thing.

If we reject Ho, we have made an error.

This type of mistake is called a Type I error.

When Ho is false, then Ha is true.

If we accept Ho, we have made an error.

If we reject Ho, we have done the right thing.

The second type of mistake is called a Type II error.
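A short simulation can make the Type I error concrete. The sketch below assumes the cornflakes setting with Ho true (μ = 10 oz, σ = 1.0 oz, n = 25) and a two-tailed test at α = 0.05; the seed and the number of trials are arbitrary choices made for the illustration.

```python
import math
import random

random.seed(1)  # fixed seed so the run is repeatable

mu0, sigma, n = 10.0, 1.0, 25   # cornflakes setting, with Ho true
z_crit = 1.960                  # two-tailed critical value for alpha = 0.05
trials = 10_000                 # arbitrary number of simulated experiments

rejections = 0
for _ in range(trials):
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    x_bar = sum(sample) / n
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    if abs(z) > z_crit:
        rejections += 1   # Ho is true here, so every rejection is a Type I error

type1_rate = rejections / trials
print(f"Observed Type I error rate: {type1_rate:.3f}")
```

Because Ho is true in every simulated experiment, the fraction of rejections should land close to the chosen significance level of 0.05.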

t-test (Hypothesis involving the mean)

Example 1. A forest ecologist studying regeneration of rain forest communities in gaps caused by large trees falling during storms read that stinging (bow) tree, Dendrocnide excelsa, seedlings will grow 1.5 m/yr in direct sunlight in such gaps. In the gaps in her study plot she identified 9 specimens of this species and measured them in 2009 and again 1 yr later. Listed below are the changes in height (m) for the nine specimens.

1.9 2.5 1.6 2.0 1.5 2.7 1.9 1.0 2.0

Do her data support the published contention that seedlings of this species will average 1.5 m of growth per yr in direct sunlight?

Solution

Hypotheses:
Ho: μ = 1.5 m/yr
Ha: μ ≠ 1.5 m/yr

If the sample mean for the 9 specimens is close to 1.5 m/yr, we will accept Ho. If the sample mean is significantly larger or smaller than 1.5 m/yr, we will accept Ha (reject Ho). By significantly different we mean values so rare that they would occur by chance less than 5% of the time if Ho is true, i.e. α = 0.05. The test statistic will be

$$t = \frac{\bar{X} - \mu}{s / \sqrt{n}}$$

Here n = 9, X̄ = 1.90 m, s² = 0.260 m², s = 0.51 m, and

$$t = \frac{\bar{X} - \mu}{s / \sqrt{n}} = \frac{1.90 - 1.50}{0.51 / \sqrt{9}} = \frac{0.40}{0.51 / 3} = 2.35$$

Clearly the t-value of 2.35 is not zero, but is it far enough away from zero that we can comfortably reject Ho? With a predetermined α level of 0.05, we must get a t-value so far from zero that it would occur less than 5% of the time if Ho is true.
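The summary statistics and the t-value can be reproduced from the nine growth measurements. The sketch below uses only Python’s standard library; the variable names are illustrative, and statistics.variance is the sample variance with n − 1 in the denominator, which is what the hand calculation uses.

```python
import math
import statistics

growth = [1.9, 2.5, 1.6, 2.0, 1.5, 2.7, 1.9, 1.0, 2.0]  # m/yr, from the example
mu0 = 1.5                                                # published value under Ho

n = len(growth)
x_bar = statistics.mean(growth)
s2 = statistics.variance(growth)   # sample variance (n - 1 in the denominator)
s = math.sqrt(s2)

# One-sample t statistic with v = n - 1 degrees of freedom.
t = (x_bar - mu0) / (s / math.sqrt(n))
print(f"n = {n}, mean = {x_bar:.2f}, s^2 = {s2:.3f}, s = {s:.2f}, t = {t:.2f}")
```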

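From Table C.4 we have the following sampling distribution for t with v = n − 1 = 8 and α = 0.05 for a two-tailed test.

[Figure: t distribution centred at 0 with critical values −2.306 and +2.306. The region between them is the acceptance region; each tail beyond them is a rejection region of area 0.025. The observed t = 2.35 falls in the right-hand rejection region.]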

If Ho is true and we sample hundreds or thousands of times with samples of 9 specimens, and each time we calculate the t-value for the sample, these t-values would form a distribution with the shape indicated above. 2.5% of the samples would generate t-values below −2.306 and 2.5% of the samples would generate t-values above 2.306. So values as extreme as ±2.306 are rare if Ho is true.

The test statistic in this sample is 2.35, and since 2.35 > 2.306 the result would be considered rare for a true null hypothesis. We reject Ho based on this comparison and conclude that the average growth of stinging trees in direct sunlight is different from the published value and is, in fact, greater than 1.5 m/yr.

Rejecting Ho may lead to a Type I error.

EXAMPLE: TWO SAMPLE TEST

Watching an infomercial on TV, you hear the claim that, without changing your eating habits, a particular herbal extract taken daily will allow you to lose 5 lb in 5 days. You decide to test this claim by enlisting 12 of your classmates in an experiment. You weigh each subject, ask them to use the herbal extract for 5 days, and then weigh them again. From the results recorded below, test the infomercial’s claim of 5 lb lost in 5 days.

Subject   Weight before (lb)   Weight after (lb)
1         128                  120
2         131                  123
3         165                  163
4         140                  141
5         178                  170
6         121                  118
7         190                  188
8         135                  136
9         118                  121
10        146                  140
11        212                  207
12        135                  126

Solution: Because the data are paired, we are not directly interested in the values presented above, but in the differences or changes within the pairs. Think of the data as in groups:

Group 1   Group 2
X11       X21
X12       X22
X13       X23
…         …
X1n       X2n

For the paired data here we wish to investigate the differences, or di’s, where X11 − X21 = d1, X12 − X22 = d2, …, X1n − X2n = dn.

Expressing the data set in terms of these differences, the di’s, we have the following table. Note the importance of the sign of these differences.

Subject   di (lb)   Subject   di (lb)
1         8         7         2
2         8         8         -1
3         2         9         -3
4         -1        10        6
5         8         11        5
6         3         12        9

The infomercial claim of a 5 lb loss in 5 days could be written Ho: μB − μA ≥ 5 lb (B = before, A = after), but Ho: μd ≥ 5 lb is somewhat more appealing.

Ho: μd ≥ 5 lb
Ha: μd < 5 lb

The test is left-tailed because only an average loss smaller than 5 lb would contradict the claim. Choose α = 0.05. Since the two columns of data collapse into one column of interest, we treat these data now as a one-sample experiment.

There is no preliminary F test, and our only assumption is that the di’s are approximately normally distributed. The test statistic for the paired sample t test is

$$t = \frac{\bar{X}_d - \mu_d}{s_d / \sqrt{n}}$$

with v = n − 1, where n is the number of pairs of data points.

Here X̄d = 3.8 lb, sd = 4.1 lb, and n = 12. We expect this statistic to be close to 0 if Ho is true, i.e. the herbal extract allows you to lose 5 lb in 5 days. We expect this statistic to be significantly less than 0 if the claim is false.

$$t = \frac{3.8 - 5}{4.1 / \sqrt{12}} = -1.01$$

With v = n − 1 = 12 − 1 = 11, the critical value for this left-tailed test from Table C.4 is t0.05(11) = −1.796. Since −1.796 < −1.01, the test statistic does not deviate enough from expectation under a true Ho for you to reject Ho. The data gathered from your classmates support the claim of an average loss of 5 lb in 5 days with the herbal extract. Because you accept Ho here, you may be making a Type II error (accepting a false Ho), but we have no way of quantifying the probability of this type of error.
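The paired calculation can be checked directly from the before and after weights. The sketch below is an illustration with made-up variable names; the critical value is the one quoted from Table C.4, not computed. Working from the unrounded mean and standard deviation gives t ≈ −0.98 rather than the −1.01 obtained from the rounded values 3.8 and 4.1, and the decision is the same.

```python
import math
import statistics

before = [128, 131, 165, 140, 178, 121, 190, 135, 118, 146, 212, 135]
after = [120, 123, 163, 141, 170, 118, 188, 136, 121, 140, 207, 126]
mu_d0 = 5.0        # claimed mean loss (lb)
t_crit = -1.796    # left-tailed critical value quoted in the text for v = 11

d = [b - a for b, a in zip(before, after)]   # weight lost by each subject
n = len(d)
d_bar = statistics.mean(d)
s_d = statistics.stdev(d)                    # sample standard deviation of the di

t = (d_bar - mu_d0) / (s_d / math.sqrt(n))
reject = t < t_crit                          # left-tailed decision rule
print(f"d = {d}")
print(f"mean = {d_bar:.2f}, s_d = {s_d:.2f}, t = {t:.2f}, reject Ho: {reject}")
```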

EXAMPLE 3

An experiment was conducted to compare the performance of two varieties of wheat, A and B. Seven farms were randomly chosen for the experiment, and the yields in metric tons per hectare for each variety on each farm were as follows:

Farm   Yield of var. A   Yield of var. B
1      4.6               4.1
2      4.8               4.0
3      3.2               3.5
4      4.7               4.1
5      4.3               4.5
6      3.7               3.3
7      4.1               3.8

a) Why do you think both varieties were tested on each farm rather than testing variety A on seven farms and variety B on seven different farms?

b) Carry out a hypothesis test to decide whether the mean yields are the same for the two varieties.

Solution: The experiment was designed to test both varieties on each farm because different farms may have significantly different yields due to differences in

i) soil characteristics

ii) microclimate

iii) cultivation practices

“Pairing” the data points accounts for most of the “between farm” variability and should leave any difference in yield due solely to wheat variety.

Farm   Difference (A − B)
1      0.5
2      0.8
3      -0.3
4      0.6
5      -0.2
6      0.4
7      0.3

The hypotheses are

Ho: μA − μB = 0, or μd = 0
Ha: μd ≠ 0

Let α = 0.05. Then n = 7, X̄d = 0.30 ton/hectare, and sd = 0.41 ton/hectare.

$$t = \frac{0.30 - 0}{0.41 / \sqrt{7}} = 1.94$$

With v = 7 − 1 = 6, the critical values from Table C.4 are t0.025(6) = −2.447 and t0.975(6) = 2.447. Since −2.447 < 1.94 < 2.447, the test statistic does not deviate enough from 0, the expected t-value if Ho is true, to reject Ho. From the data given we cannot say that the yields of varieties A and B are significantly different.
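The wheat comparison follows the same paired pattern, so it can be wrapped in a small reusable function. The name paired_t is hypothetical and the sketch uses only the standard library; the critical value 2.447 is taken from the text rather than computed.

```python
import math
import statistics

def paired_t(x, y, mu_d0=0.0):
    """Paired-sample t statistic for the differences x - y, with v = n - 1."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    t = (statistics.mean(d) - mu_d0) / (statistics.stdev(d) / math.sqrt(n))
    return t, n - 1

yield_a = [4.6, 4.8, 3.2, 4.7, 4.3, 3.7, 4.1]   # ton/hectare, variety A
yield_b = [4.1, 4.0, 3.5, 4.1, 4.5, 3.3, 3.8]   # ton/hectare, variety B
t_crit = 2.447   # two-tailed critical value quoted in the text for v = 6

t, df = paired_t(yield_a, yield_b)
reject = abs(t) > t_crit
print(f"t = {t:.2f} with v = {df}, reject Ho: {reject}")
```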

CHI-SQUARE TEST

Example: A geneticist interested in human populations has been studying growth patterns in US males since 1900. A monograph written in 1902 states that the mean height of adult US males is 67.0 inches with a standard deviation of 3.5 inches. Wishing to see if these values have changed over the 20th century, the geneticist measured a random sample of adult US males and found that X̄ = 69.4 inches and s = 4.0 inches. Are these values significantly different from the values published in 1902?

Solution: There are two questions here, one about the mean and the second about the standard deviation or variance. Two questions require two sets of hypotheses and two test statistics. For the question about means, the hypotheses are

Ho: μ = 67.0 inches
Ha: μ ≠ 67.0 inches

With n = 28 and α = 0.01, this is a two-tailed test, with the question and hypotheses (Ho and Ha) formulated before the data were collected or analyzed.

$$t = \frac{\bar{X} - \mu}{s / \sqrt{n}} = \frac{69.4 - 67.0}{4.0 / \sqrt{28}} = \frac{2.4}{0.76} = 3.16$$

Using an α level of 0.01 for v = n − 1 = 27, we find the critical values to be ±2.771 (Table C.4).

Since 3.16 > 2.771, we reject Ho and say that the modern mean is significantly different from that reported in 1902 and, in fact, is higher than the reported value (because the t-value falls in the right-hand tail). P(Type I error) < 0.01.

For the question about variance, the hypotheses are (with 3.5² = 12.25)

Ho: σ² = 12.25 inch²
Ha: σ² ≠ 12.25 inch²

The question about variability is answered with a chi-square statistic. Here n = 28. Then

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2} = \frac{(28-1)(16)}{12.25} = 35.3$$

The value is expected to be close to 27 (n − 1) if Ho is true and significantly different from 27 if Ha is true.

From Table C.5, using an alpha level of 0.01 for v = 27, we find the critical values for χ² to be 11.8 and 49.6. Since 11.8 < 35.3 < 49.6, we do not reject Ho here. There is no statistical support for Ha. The right-tail probability P(χ² ≥ 35.3) lies between 0.25 (χ² = 31.5) and 0.10 (χ² = 36.7), so the two-tailed P value is between 0.500 and 0.200, indicating that the calculated value is not a rare event under the null hypothesis.

We would conclude that the mean height of adult US males is higher now than reported in 1902, but the variability in heights is not significantly different today than in 1902.
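Both tests in this example can be computed side by side. The sketch below is illustrative only: the variable names are made up and the critical values are the ones quoted from Tables C.4 and C.5. Without rounding the standard error to 0.76, t comes out near 3.17 rather than 3.16; the conclusions do not change.

```python
import math

n = 28
x_bar, s = 69.4, 4.0          # modern sample (inches)
mu0, sigma0 = 67.0, 3.5       # values reported in 1902 (inches)

# Critical values quoted in the text for v = 27 and alpha = 0.01.
t_crit = 2.771
chi2_low, chi2_high = 11.8, 49.6

t = (x_bar - mu0) / (s / math.sqrt(n))        # test on the mean
chi2 = (n - 1) * s**2 / sigma0**2             # test on the variance

reject_mean = abs(t) > t_crit
reject_variance = chi2 < chi2_low or chi2 > chi2_high
print(f"t = {t:.2f}, reject Ho for the mean: {reject_mean}")
print(f"chi-square = {chi2:.1f}, reject Ho for the variance: {reject_variance}")
```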

CHI-SQUARE TEST FOR GOODNESS OF FIT

Assumptions for the χ² test for goodness of fit are that

1. An independent random sample of size n is drawn from the population.

2. The population can be divided into a set of k mutually exclusive categories.

3. The expected frequencies for each category must be specified. Let Ei denote the expected frequency for the i-th category. The sample size must be sufficiently large so that each Ei is at least 5 (categories may be combined to achieve this).

The hypothesis test takes only one form:

Ho: The observed frequency distribution is the same as the hypothesized frequency distribution.

Ha: The observed and hypothesized frequency distributions are different.

Generally speaking, this is an example of a statistical test where one wishes to confirm the null hypothesis.

Test statistic

Let Oi denote the observed frequency of the i-th category. The test statistic is based on the difference between the observed and expected frequencies, Oi − Ei:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

The intuition for the test is that if the observed and expected frequencies are nearly equal for each category, then each Oi − Ei will be small and, hence, χ² will be small. Small values of χ² should lead to acceptance of Ho, while large values lead to rejection. The test is always right-tailed: Ho is rejected only when the test statistic exceeds a specified value.

The statistic has an approximate chi-square distribution when Ho is true; the approximation improves as sample size increases. The values of the chi-square distribution are tabulated in Table C.5.

GOODNESS OF FIT: EXAMPLE

The progeny of self-fertilized four-o’clocks were expected to flower red, pink and white in the ratio of 1:2:1. There were 240 progeny produced, with 55 red plants, 132 pink plants, and 53 white plants. Are these data reasonably consistent with the Mendelian 1:2:1 ratio?

Solution: The hypotheses are

Ho: The data are consistent with a Mendelian model (1:2:1)

Ha: The data are inconsistent with a Mendelian model (1:2:1)

The three colours are the three categories. In order to calculate expected frequencies, no parameters need to be estimated. The Mendelian ratios are given: 25% red, 50% pink and 25% white. Using the fact that there are 240 observations, the expected number of red four-o’clocks is 0.25 × 240 = 60, i.e. Ei = 60. Similar calculations for pink and white yield the following table:

Category   Oi    Ei    (Oi − Ei)²/Ei
Red        55    60    0.42
Pink       132   120   1.20
White      53    60    0.82
Total      240   240   2.44

$$\chi^2 = \sum_{i=1}^{3} \frac{(O_i - E_i)^2}{E_i} = 0.42 + 1.20 + 0.82 = 2.44$$

v = df = number of categories − 1 = 3 − 1 = 2. Let α = 0.05.

Because the test is right-tailed, the critical value occurs when $P(\chi^2 < \chi^2_{1-\alpha}) = 1 - \alpha$. Thus in Table C.5 for df = 2 and P = 1 − α = 0.95, the critical value is found to be 5.99. Since 2.44 < 5.99, Ho is accepted. This supports the Mendelian 1:2:1 ratio.
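The goodness-of-fit calculation for the four-o’clocks can be sketched as follows. The dictionary layout and names are illustrative choices, and the critical value 5.99 is the one quoted from Table C.5. Summing the unrounded contributions gives about 2.43; the 2.44 in the table comes from adding the rounded terms.

```python
observed = {"red": 55, "pink": 132, "white": 53}
ratio = {"red": 1, "pink": 2, "white": 1}      # Mendelian 1:2:1 model
chi2_crit = 5.99   # critical value quoted in the text for df = 2, alpha = 0.05

n = sum(observed.values())
ratio_total = sum(ratio.values())
expected = {k: n * ratio[k] / ratio_total for k in observed}

# Contribution (O - E)^2 / E of each category, then the chi-square statistic.
contrib = {k: (observed[k] - expected[k]) ** 2 / expected[k] for k in observed}
chi2 = sum(contrib.values())
df = len(observed) - 1

reject = chi2 > chi2_crit   # right-tailed decision rule
print(f"expected = {expected}")
print(f"chi-square = {chi2:.2f} with df = {df}, reject Ho: {reject}")
```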
