hypothesis testing

Hypothesis Testing Chapter 6

Upload: asdasdas-asdasdasdsadsasddssa

Post on 10-Nov-2014


Page 1: Hypothesis Testing

Hypothesis Testing

Chapter 6

Page 2: Hypothesis Testing

Hypothesis Testing

• In the last chapter we learned that there are two types of statistical inference
– Estimation
– Hypothesis testing

• Hypothesis testing is about making a decision concerning a population by examining a sample from that population

Page 3: Hypothesis Testing

Hypothesis Testing

• A hypothesis is defined as a statement about one or more populations

• It concerns the parameters of the populations, such as the mean, standard deviation, or variance

• For that reason, the techniques used in hypothesis testing are also called parametric tests

Page 4: Hypothesis Testing

Hypothesis Testing

• With hypothesis testing we can answer many different questions, for example
– A hospital administrator may hypothesize that the average length of stay of patients is five days
– A doctor may hypothesize that a certain drug will be effective in 90% of all cases

• By means of hypothesis testing, one determines whether or not such statements are compatible with the available data

Page 5: Hypothesis Testing

Hypothesis Testing

• Here we are interested in two types of hypotheses
– Research hypotheses
– Statistical hypotheses

• Research hypotheses lead directly to statistical hypotheses

• For that reason, we will assume that the research hypotheses for the examples and the exercises have already been considered

Page 6: Hypothesis Testing

Hypothesis Testing

• In this book, the general procedure for hypothesis testing is outlined in 9 steps
– 1. Data
– 2. Assumptions
– 3. Hypotheses
– 4. Test Statistic
– 5. Distribution of the Test Statistic
– 6. Decision Rule
– 7. Calculation of the Test Statistic
– 8. Statistical Decision
– 9. Conclusion

Page 7: Hypothesis Testing

Data

• Here we need to examine the data and understand their nature in order to choose the appropriate test

• For example, we need to know whether the data consist of counts of objects or measurements on objects

Page 8: Hypothesis Testing

Assumptions

• As in the previous chapter, we need to make almost the same assumptions about the data

• These are
– Normality of the population distribution
– Equality of the variances
– Independence of the samples

Page 9: Hypothesis Testing

Hypotheses

• There are two statistical hypotheses involved in hypothesis testing

• These are
– Null hypothesis (Ho): the hypothesis to be tested. It is also called the hypothesis of no difference
– Alternative hypothesis (HA): the hypothesis about the difference

Page 10: Hypothesis Testing

Hypotheses

• For example

• Can we conclude that a certain population mean is not 50?

• The null hypothesis is Ho: µ = 50

• The alternative hypothesis is HA: µ ≠ 50

• Now, suppose we want to know if we can conclude that the population mean is greater than 50

• Then the hypotheses will be

• Ho: µ ≤ 50   HA: µ > 50

• If it is about concluding that the population mean is less than 50, then the hypotheses are

• Ho: µ ≥ 50   HA: µ < 50

Page 11: Hypothesis Testing

Hypotheses

• In summary

• We may state the following rules of thumb for deciding what statement goes in the null hypothesis and what statement goes in the alternative hypothesis
– 1. What you hope or expect to be able to conclude as a result of the test usually should be placed in the alternative hypothesis
– 2. The null hypothesis should contain a statement of equality, either =, ≤, or ≥
– 3. The null hypothesis is the hypothesis that is tested
– 4. The null and alternative hypotheses are complementary. That is, the two together exhaust all possibilities regarding the value that the hypothesized parameter can assume

Page 12: Hypothesis Testing

Hypotheses

• Here you need to realize that:

• In general, neither hypothesis testing nor statistical inference leads to the proof of a hypothesis

• It only indicates whether the hypothesis is supported or not supported by the available data

Page 13: Hypothesis Testing

Test Statistic

• One example of a test statistic is

$$ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} $$

• where µ0 is the hypothesized value of the population mean

• This test statistic is related to the statistic we have seen in the previous chapter

$$ z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} $$

Page 14: Hypothesis Testing

Test Statistic

• The general equation for a test statistic is

$$ \text{test statistic} = \frac{\text{relevant statistic} - \text{hypothesized parameter}}{\text{standard error of the relevant statistic}} $$

• For example

$$ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} $$

Page 15: Hypothesis Testing

Distribution of the Test Statistic

• Here, we say that the distribution of the test statistic

$$ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} $$

• follows the standard normal distribution if the null hypothesis is true and the assumptions are met

Page 16: Hypothesis Testing

Decision Rule

• The distribution graph will have two regions
– Acceptance region
– Rejection region

• Here α is the probability of rejecting a true null hypothesis

• The error committed when a true null hypothesis is rejected is called the type I error

• The probability of accepting a false null hypothesis is called β

• The error committed when a false null hypothesis is accepted is called the type II error

Page 17: Hypothesis Testing

Decision Rule

• The goal is to make α small

• However, we do not have control over β

• In practice, β is larger than α

• With all this, one thing is certain: we never know whether we have committed one of these errors when we reject or fail to reject a null hypothesis, since the true state of affairs is unknown

Page 18: Hypothesis Testing

                        Condition of Null Hypothesis
                        True              False
Fail to reject Ho       Correct action    Type II error
Reject Ho               Type I error      Correct action

Page 19: Hypothesis Testing

Calculation of the Test statistic

• From the data contained in the sample we can calculate the test statistic

$$ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} $$

• and compare it with the acceptance and rejection regions that have been specified

Page 20: Hypothesis Testing

8. Statistical Decision

• The statistical decision consists of rejecting or not rejecting the null hypothesis

• It is rejected if the computed value of the test statistic falls in the rejection region

• It is not rejected if the computed value of the test statistic falls in the acceptance region

Page 21: Hypothesis Testing

9. Conclusion

• In the end

• If Ho is rejected, we conclude that HA is true. If Ho is not rejected, we conclude that Ho may be true

• It is important to realize that when the null hypothesis is not rejected one should not say that the null hypothesis is accepted

• We should say that the null hypothesis is “not rejected”

• Now, with all these precautions, we will look at some tests

Page 22: Hypothesis Testing

HYPOTHESIS TESTING: A SINGLE POPULATION MEAN

Here we will cover the testing of a hypothesis about a population mean under three different conditions

– 1. when sampling is from a normally distributed population of values with known variance

– 2. when sampling is from a normally distributed population with unknown variance

– 3. when sampling is from a population that is not normally distributed

Page 23: Hypothesis Testing

HYPOTHESIS TESTING: A SINGLE POPULATION MEAN

• When sampling is from a normally distributed population and the population variance is known, the test statistic for testing Ho: µ = µ0 is

$$ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} $$

Page 24: Hypothesis Testing

Sampling from Normally Distributed Populations: Population Variances Known

• Example 6.2.1: Researchers are interested in the mean level of some enzyme in a certain population

• Let us say that they are asking the following question

• Can we conclude that the mean enzyme level in this population is different from 25?

• The data available to the researchers are the determinations made on a sample of 10 with a mean of 22, and the sample is known to come from a population with a variance of 45

• Let us also say that we want the probability of rejecting a true null hypothesis to be α = 0.05 (that is, we want 95% confidence)

Page 25: Hypothesis Testing

Sampling from Normally Distributed Populations: Population Variances Known

• Now we need to look at the rejection and acceptance regions for the example

• From Table C, the value of z for 0.975 is 1.96

• So we may state the decision rule for the test as:

• Reject Ho if the computed value of the test statistic is either ≥ 1.96 or ≤ -1.96

• Otherwise do not reject Ho

• Because the rejection region is split between both tails, this test is called a two sided test

[Figure: standard normal curve (µ = 0, σ = 1) with central area 0.95 between -1.96 and 1.96 and α/2 = 0.025 in each tail; the sample mean 22 and the hypothesized mean 25 are marked on the original scale]

Page 26: Hypothesis Testing

Sampling from Normally Distributed Populations: Population Variances Known

• Here we set the hypotheses as

• Ho: µ = 25   HA: µ ≠ 25

• Now we can calculate the test statistic as

$$ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{22 - 25}{6.71/\sqrt{10}} = -1.41 $$

Page 27: Hypothesis Testing

Sampling from Normally Distributed Populations: Population Variances Known

• Since the calculated value of z (-1.41) lies between the table values of -1.96 and 1.96, we do not reject the null hypothesis

• Then we can say that the computed value of the test statistic is not significant at the 0.05 level

• So the conclusion is that µ may be equal to 25
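The calculation and decision of Example 6.2.1 can be retraced with a short Python sketch. The function name `z_statistic` and the variable names are illustrative choices, not part of the slides; the critical value 1.96 is the Table C value quoted above.

```python
import math

def z_statistic(sample_mean, mu0, sigma, n):
    """z = (x_bar - mu0) / (sigma / sqrt(n)) for a known population sigma."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))

# Example 6.2.1: n = 10, sample mean 22, population variance 45, Ho: mu = 25
z = z_statistic(sample_mean=22, mu0=25, sigma=math.sqrt(45), n=10)

critical = 1.96                    # Table C value for a two sided test at alpha = 0.05
reject_h0 = abs(z) >= critical     # two sided decision rule

print(f"z = {z:.2f}, reject Ho: {reject_h0}")
```

Because |z| is smaller than 1.96, the sketch reaches the same decision as the slide: Ho is not rejected.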

Page 28: Hypothesis Testing

Sampling from Normally Distributed Populations: Population Variances Known

• p Values: Instead of saying that an observed value of the test statistic is significant or not significant, we may want to report the exact probability of getting a value as extreme as or more extreme than the one observed if the null hypothesis is true

• For our example, the p value is p=0.1586

• The statement p=0.1586 means that the probability of getting a value as extreme as 1.41 in either direction when the null hypothesis is true is 0.1586

Page 29: Hypothesis Testing

Sampling from Normally Distributed Populations: Population Variances Known

• The value 0.1586 is obtained from Table C: the area to the left of z=1.41 is 0.9207, so the area to its right is 1 - 0.9207 = 0.0793, and the area to the left of z=-1.41 is also 0.0793

• Since the p value is the probability of observing z ≥ 1.41 or z ≤ -1.41 when the null hypothesis is true, each tail accounts for half of it (0.1586 / 2 = 0.0793)

• This means that, when Ho is true, the probability of obtaining a value of z as large as or larger than 1.41 is 0.0793

• The probability of observing a value of z as small as or smaller than -1.41 is also 0.0793

Page 30: Hypothesis Testing

Sampling from Normally Distributed Populations: Population Variances Known

• The probability of one or the other of these events occurring, when Ho is true, is equal to the sum of the two individual probabilities

• So that p = 0.0793 + 0.0793 = 0.1586

• The quantity p is referred to as the p value for the test
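As a minimal sketch, the two tail areas can also be computed with Python's standard library instead of being read from Table C. The rounded statistic z = -1.41 is used so that the result matches the table lookup; the variable names are illustrative.

```python
from statistics import NormalDist

z = -1.41                      # rounded test statistic from Example 6.2.1
std_normal = NormalDist()      # standard normal: mean 0, standard deviation 1

lower_tail = std_normal.cdf(-abs(z))        # P(Z <= -1.41)
upper_tail = 1 - std_normal.cdf(abs(z))     # P(Z >= 1.41)
p_value = lower_tail + upper_tail           # two sided p value

print(f"each tail = {lower_tail:.4f}, p = {p_value:.4f}")
```

Using the unrounded statistic instead of -1.41 would change the p value slightly in the third decimal place.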

Page 31: Hypothesis Testing

Sampling from Normally Distributed Populations: Population Variances Known

• The p value for a test may also be defined as the smallest value of α for which the null hypothesis can be rejected

• So the general rule is

• If the p value is less than or equal to α, we reject the null hypothesis

• If the p value is greater than α, we do not reject the null hypothesis

Page 32: Hypothesis Testing

Testing Ho by Means of a Confidence Interval

• One can also use confidence intervals, as seen in the previous chapter, to test hypotheses

• In Example 6.2.1 we tested the hypotheses

• Ho: µ = 25   HA: µ ≠ 25

$$ z = \frac{22 - 25}{6.71/\sqrt{10}} = -1.41 $$

• and concluded that we are not able to reject Ho since the computed value of the test statistic falls in the acceptance region

• We can arrive at the same conclusion using a 100(1-α) percent confidence interval

Page 33: Hypothesis Testing

Testing Ho by Means of a Confidence Interval

• The 95 percent confidence interval for µ is

$$ 22 \pm 1.96\sqrt{45/10} = 22 \pm 1.96(2.1213) = 22 \pm 4.16 = (17.84,\ 26.16) $$

Page 34: Hypothesis Testing

Testing Ho by Means of a Confidence Interval

• Since the interval (17.84, 26.16) includes 25, we reach the same conclusion as we did with the test statistic: Ho is not rejected
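The confidence-interval version of the test can be sketched as follows. The helper name `z_confidence_interval` is an illustrative assumption; the critical value is taken from `statistics.NormalDist` rather than from Table C, which gives 1.96 to two decimals.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, alpha=0.05):
    """Two sided 100(1 - alpha)% interval for the mean with known sigma."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    half_width = z_crit * sigma / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

# Example 6.2.1: sample mean 22, population variance 45, n = 10
low, high = z_confidence_interval(22, math.sqrt(45), 10)

mu0 = 25
reject_h0 = not (low <= mu0 <= high)   # reject only if mu0 lies outside the interval

print(f"({low:.2f}, {high:.2f}), reject Ho: {reject_h0}")
```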

Page 35: Hypothesis Testing

Testing Ho by Means of a Confidence Interval

• In general

• When testing a null hypothesis by means of a two sided confidence interval, we reject Ho at the α level of significance if the hypothesized parameter is not contained within the 100(1-α) percent confidence interval

• If the hypothesized parameter is contained within the interval, Ho cannot be rejected at the α level of significance

Page 36: Hypothesis Testing

One sided Hypothesis Tests

• A hypothesis test may be one sided in which case all the rejection region is in one or the other tail of the distribution

• Whether a one sided or a two sided test is used depends on the nature of the question being asked by the researcher

• If both large and small values will cause rejection of the null hypothesis, a two sided test is indicated

• When only sufficiently small values or only sufficiently large values will cause rejection of the null hypothesis, a one sided test is indicated

Page 37: Hypothesis Testing

One sided Hypothesis Tests

• Example 6.2.2:

• Now for Example 6.2.1, suppose that instead of asking if they could conclude that µ ≠ 25, the researchers had asked the following question

• Can we conclude that µ < 25?

• To this question we could reply that they can so conclude if they can reject the null hypothesis that µ ≥ 25

Page 38: Hypothesis Testing

One sided Hypothesis Tests

• Example 6.2.2:

• Now let us set up the hypotheses first and calculate the test statistic as

• Ho: µ ≥ 25   HA: µ < 25

$$ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{22 - 25}{\sqrt{45/10}} = -1.41 $$

Page 39: Hypothesis Testing

One sided Hypothesis Tests

• The question is which z value from Table C will be used?

• Let us say again that α = 0.05

• Since we are only interested in values less than 25, the test is one sided and all of α goes in one tail of the distribution

[Figure: standard normal curve f(x) (µ = 0, σ = 1) with area 0.95 to the right of -1.645 and α = 0.05 in the lower tail]

Page 40: Hypothesis Testing

One sided Hypothesis Tests

• So, now we know that the table value of z for 0.05 will be -1.645

• Then this value will be compared with the value for the test statistic (-1.41)

• Since -1.41 > -1.645, we are unable to reject the null hypothesis

• So, the conclusion is that the population mean may be greater than or equal to 25, and we act accordingly

• If the question was “can we conclude that the mean is greater than 25?” then the Table C value would be +1.645

Page 41: Hypothesis Testing

One sided Hypothesis Tests

• The p value for the test is now 0.0793, since P(z ≤ -1.41), when Ho is true, is 0.0793. This is the area to the left of -1.41 under the standard normal curve, as given in Table C

Page 42: Hypothesis Testing

Sampling from a Normally Distributed Population: Population Variance Unknown

• In reality, most of the time we do not know the population variance

• In such cases the test statistic for

• Ho: µ = µ0

• becomes

$$ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} $$

Page 43: Hypothesis Testing

Sampling from a Normally Distributed Population: Population Variance Unknown

• Example 6.2.3

• Researchers collected serum amylase values from a random sample of 15 apparently healthy subjects

• They want to know whether they can conclude that the mean of the population from which the sample of serum amylase determinations came is different from 120

• The mean and the standard deviation computed from the sample are 96 and 35 units / 100 mL, respectively

• Here we take α = 0.05

Page 44: Hypothesis Testing

Sampling from a Normally Distributed Population: Population Variance Unknown

• From the way the question is asked we see that the test is two sided

• Ho: µ = 120   HA: µ ≠ 120

$$ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{96 - 120}{35/\sqrt{15}} = \frac{-24}{9.04} = -2.65 $$

Page 45: Hypothesis Testing

Sampling from a Normally Distributed Population: Population Variance Unknown

• Now we need to look up the t value in Table E for α = 0.05 (remember it is a two sided test, so we look at the value at 0.975) with 14 degrees of freedom

• The t values to the right and left of which 0.025 of the area lies are 2.1448 and -2.1448

• Since the calculated t value is outside of the acceptance region, we reject the null hypothesis

• So the conclusion is that, based on the data, the mean of the population from which the sample came is not 120
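A minimal Python sketch of Example 6.2.3 follows. The function name `t_statistic` is illustrative, and the critical value 2.1448 is the Table E value quoted in the slides (14 degrees of freedom), since the standard library has no t distribution. Without intermediate rounding the statistic comes out near -2.66; the slide's -2.65 results from rounding the standard error to 9.04 first.

```python
import math

def t_statistic(sample_mean, mu0, s, n):
    """t = (x_bar - mu0) / (s / sqrt(n)) with n - 1 degrees of freedom."""
    return (sample_mean - mu0) / (s / math.sqrt(n))

# Example 6.2.3: n = 15, sample mean 96, sample standard deviation 35, Ho: mu = 120
t = t_statistic(sample_mean=96, mu0=120, s=35, n=15)

t_table = 2.1448                  # Table E, 14 degrees of freedom, 0.975 percentile
reject_h0 = abs(t) >= t_table     # two sided decision rule

print(f"t = {t:.3f}, reject Ho: {reject_h0}")
```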

Page 46: Hypothesis Testing

Sampling from a Normally Distributed Population: Population Variance Unknown

• The exact p value for this test cannot be obtained from Table E since it gives t values only for selected percentiles

• The p value can be stated as an interval, however

• In this example, -2.65 is less than -2.624, the value of t to the left of which lies 0.01 of the area under the t distribution with 14 degrees of freedom, but greater than -2.9768, to the left of which lies 0.005 of the area

• Consequently, when Ho is true, the probability of obtaining a value of t as small as or smaller than -2.65 is less than 0.01 but greater than 0.005

• So we can state this as:

• 0.005 < P(t ≤ -2.65) < 0.01

Page 47: Hypothesis Testing

Sampling from a Normally Distributed Population: Population Variance Unknown

• Here, since the test is two sided, we must allow for the possibility of a computed value of the test statistic as large in the opposite direction as that observed

• Table E reveals that 0.005 < P(t ≥ 2.65) < 0.01

• The p value, then, is 0.01 < p < 0.02

Page 48: Hypothesis Testing

Sampling from a Normally Distributed Population: Population Variance Unknown

• If in the previous example the hypotheses had been

• Ho: µ ≥ 120   HA: µ < 120

• the testing procedure would have led to a one sided test with all of the rejection region in the lower tail of the distribution

Page 49: Hypothesis Testing

Sampling from a Normally Distributed Population: Population Variance Unknown

• If the hypotheses had been

• Ho: µ ≤ 120   HA: µ > 120

• we would have had a one sided test with all of the rejection region in the upper tail of the distribution

Page 50: Hypothesis Testing

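Sampling from a Population That is Not Normally Distributed

• In this case, we must have a sample size that is equal to or greater than 30

• If so, we can use the test statistic

$$ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} $$

• If the population standard deviation is not known, then s replaces σ and the test statistic is

$$ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} $$

• For large n this statistic is approximately standard normal, so z table values are used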

Page 51: Hypothesis Testing

Sampling from a Population That is Not Normally Distributed

• Example 6.2.4

• In a health survey of a certain community 150 persons were interviewed

• One of the items of information obtained was the number of prescriptions each person had had filled during the past year

• The average number for the 150 people was 5.8 with a standard deviation of 3.1

• The investigator wishes to know if these data provide sufficient evidence to indicate that the population mean is greater than 5

• Here we take α = 0.05

Page 52: Hypothesis Testing

Sampling from a Population That is Not Normally Distributed

• Based on the question asked, we set the hypotheses as

• Ho: µ ≤ 5   HA: µ > 5

• Then the test statistic is

$$ z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{5.8 - 5.0}{3.1/\sqrt{150}} = \frac{0.8}{0.25} = 3.2 $$

Page 53: Hypothesis Testing

Sampling from a Population That is Not Normally Distributed

• Since the question is one sided, the critical value of the test statistic at 0.95 is 1.645

[Figure: standard normal curve f(x) (µ = 0, σ = 1) with area 0.95 to the left of 1.645 and α = 0.05 in the upper tail]

Page 54: Hypothesis Testing

Sampling from a Population That is Not Normally Distributed

• Since the calculated z value ( z=3.2) is greater than the table value (z=1.645) we reject Ho

• Then we conclude that the mean number of prescriptions filled per person per year for this population is greater than 5

• The p value for this test is 0.0007

Page 55: Hypothesis Testing

Computer Analysis

• The following are the head circumferences (centimeters) at birth of 15 infants

33.38 32.15 33.99 34.10 33.97
34.34 33.95 33.85 34.23 32.73
33.46 34.13 34.45 34.19 34.05

• The goal is to test

• Ho: µ = 34.5   HA: µ ≠ 34.5

Page 56: Hypothesis Testing

Computer Analysis

• Spreadsheet output for the test of Ho: µ = 34.5 against HA: µ ≠ 34.5 on the head circumferences (centimeters) at birth of 15 infants

n            15
mean         33.798
stdev        0.630297
var          0.397274
µ0           34.5
α            0.05
t (cal)      -4.31358
t (table)    2.144789   two sided
t (table)    1.761309   one sided

• Since the calculated t value lies outside the table values (-2.145 and 2.145), we reject Ho and conclude that the population mean is not 34.5
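The spreadsheet figures can be reproduced from the 15 head circumferences with Python's standard library. This is a sketch under the assumption that the two sided test of Ho: µ = 34.5 is intended; the critical value 2.144789 is copied from the output above, and the variable names are illustrative.

```python
import math
from statistics import mean, stdev

# Head circumferences (cm) at birth of 15 infants
head = [33.38, 32.15, 33.99, 34.10, 33.97,
        34.34, 33.95, 33.85, 34.23, 32.73,
        33.46, 34.13, 34.45, 34.19, 34.05]

mu0 = 34.5
n = len(head)
x_bar = mean(head)
s = stdev(head)                          # sample standard deviation (n - 1 divisor)
t_cal = (x_bar - mu0) / (s / math.sqrt(n))

t_table = 2.144789                       # two sided value for 14 df, from the output above
reject_h0 = abs(t_cal) >= t_table

print(f"n={n}, mean={x_bar:.3f}, stdev={s:.6f}, t={t_cal:.5f}, reject Ho: {reject_h0}")
```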

Page 57: Hypothesis Testing

Hypothesis Testing: The Difference Between Two Population Means

• Here we can formulate the following hypotheses

• 1. Ho: µ1 - µ2 = 0   HA: µ1 - µ2 ≠ 0

• 2. Ho: µ1 - µ2 ≥ 0   HA: µ1 - µ2 < 0

• 3. Ho: µ1 - µ2 ≤ 0   HA: µ1 - µ2 > 0

Page 58: Hypothesis Testing

Hypothesis Testing: The Difference Between Two Population Means

• As in the single-mean case, we will discuss this issue under three subcategories:
– 1. when sampling is from a normally distributed population of values with known variance
– 2. when sampling is from a normally distributed population with unknown variance
– 3. when sampling is from a population that is not normally distributed

Page 59: Hypothesis Testing

Sampling From Normally Distributed Population Of Known Variance

• The test statistic will be

$$ z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} $$

Page 60: Hypothesis Testing

Sampling From Normally Distributed Population Of Known Variance

• Example 6.3.1

• The task is to find out if there is sufficient evidence to indicate a difference in mean serum uric acid levels between normal individuals and individuals with Down syndrome

• A sample of 12 individuals with Down syndrome and a sample of 15 normal individuals were taken

• The means are 4.5 mg/100 mL and 3.4 mg/100 mL

• α = 0.05

Page 61: Hypothesis Testing

Sampling From Normally Distributed Population Of Known Variance

• The hypotheses are

• Ho: µ1 - µ2 = 0   HA: µ1 - µ2 ≠ 0

• Alternatively, we can also set the hypotheses as:

• Ho: µ1 = µ2   HA: µ1 ≠ µ2

Page 62: Hypothesis Testing

Sampling From Normally Distributed Population Of Known Variance

• Since we are dealing with a case where the variances are known, the test is based on the critical values z = ±1.96 (at 0.975)

• Here we will be able to reject the null hypothesis if the calculated z value is outside of the range

-1.96 < zcalculated < 1.96

Page 63: Hypothesis Testing

Sampling From Normally Distributed Population Of Known Variance

• Now we calculate the z value as

$$ z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{(4.5 - 3.4) - 0}{\sqrt{1/12 + 1/15}} = \frac{1.1}{0.39} = 2.82 $$

• Since the calculated value is outside of the range we just described, we reject Ho and conclude that, on the basis of these data, there is an indication that the two population means are not equal

• The p value for the test is p=0.0048

Page 64: Hypothesis Testing

Sampling From Normally Distributed Population Of Unknown Variance

• If the variances are not known, then there are two possibilities
– Population variances equal
– Population variances unequal

• Let us look at the first case

• Here we calculate a pooled variance as

$$ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} $$

Page 65: Hypothesis Testing

Sampling From Normally Distributed Population Of Unknown Variance

• The test statistic is then calculated as

$$ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_0}{\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}} $$

Page 66: Hypothesis Testing

Sampling From Normally Distributed Population Of Unknown Variance

• Example 6.3.2: The data are serum amylase determinations for n2=15 healthy subjects and n1=22 hospitalized subjects

• The sample means and standard deviations are

x̄1 = 120 units/mL   s1 = 40 units/mL
x̄2 = 96 units/mL    s2 = 35 units/mL

• The researchers wish to know if they would be justified in concluding that the population means are different

Page 67: Hypothesis Testing

Sampling From Normally Distributed Population Of Unknown Variance

• The hypotheses are set as

• Ho: µ1 - µ2 = 0   HA: µ1 - µ2 ≠ 0

• For α=0.05 at 22+15-2 = 35 degrees of freedom the t value from the table is ±2.0301

• So, we reject Ho if the calculated t value is outside the range -2.0301 < t < 2.0301

Page 68: Hypothesis Testing

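$$ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(22 - 1)(40)^2 + (15 - 1)(35)^2}{22 + 15 - 2} = 1450 $$

$$ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_0}{\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}} = \frac{(120 - 96) - 0}{\sqrt{\frac{1450}{22} + \frac{1450}{15}}} = 1.88 $$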

Page 69: Hypothesis Testing

Sampling From Normally Distributed Population Of Unknown Variance

• Since the calculated t value is in the range -2.0301 < 1.88 < 2.0301

• we are unable to reject Ho

• So, based on the data, we cannot conclude that the two population means are different

• Also, for this test 0.10 > p > 0.05 since 1.6896 < 1.88 < 2.0301
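The pooled-variance calculation of Example 6.3.2 can be sketched in Python as follows. The function names `pooled_variance` and `pooled_t` are illustrative, and the critical value 2.0301 is the table value quoted in the slides for 35 degrees of freedom.

```python
import math

def pooled_variance(n1, s1, n2, s2):
    """Pooled variance: ((n1-1)s1^2 + (n2-1)s2^2) / (n1 + n2 - 2)."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)

def pooled_t(mean1, mean2, sp2, n1, n2, diff0=0.0):
    """t statistic for two independent samples with equal population variances."""
    return ((mean1 - mean2) - diff0) / math.sqrt(sp2 / n1 + sp2 / n2)

# Example 6.3.2: hospitalized subjects (n1 = 22) and healthy subjects (n2 = 15)
sp2 = pooled_variance(n1=22, s1=40, n2=15, s2=35)
t = pooled_t(mean1=120, mean2=96, sp2=sp2, n1=22, n2=15)

t_table = 2.0301                  # table value for alpha = 0.05, 35 degrees of freedom
reject_h0 = abs(t) >= t_table

print(f"sp^2 = {sp2:.0f}, t = {t:.2f}, reject Ho: {reject_h0}")
```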

Page 70: Hypothesis Testing

Sampling From Normally Distributed Population Of Unknown Variance

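• If the population variances are unequal, the test statistic is

$$ t' = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} $$

• The table value for t' in a two sided test is obtained as

$$ t'_{1-\alpha/2} = \frac{w_1 t_1 + w_2 t_2}{w_1 + w_2}, \qquad w_1 = \frac{s_1^2}{n_1}, \quad w_2 = \frac{s_2^2}{n_2} $$

• where t1 and t2 are the table values of t at 1-α/2 with n1-1 and n2-1 degrees of freedom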

Page 71: Hypothesis Testing

Sampling From Normally Distributed Population Of Unknown Variance

• Example 6.3.3

• The aim is to determine whether two populations differ with respect to the mean value of total serum complement activity (CH50)

• The data are CH50 activity determinations on n2=20 normal subjects and n1=10 subjects with disease

• The following data are obtained

x̄1 = 62.6   s1 = 33.8
x̄2 = 47.2   s2 = 10.1

Page 72: Hypothesis Testing

Sampling From Normally Distributed Population Of Unknown Variance

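• The hypotheses are set as

• Ho: µ1 - µ2 = 0   HA: µ1 - µ2 ≠ 0

• For α=0.05 we obtain the t table value as

$$ w_1 = \frac{s_1^2}{n_1} = \frac{(33.8)^2}{10} = 114.244, \qquad w_2 = \frac{s_2^2}{n_2} = \frac{(10.1)^2}{20} = 5.1005 $$

$$ t'_{1-0.05/2} = \frac{w_1 t_1 + w_2 t_2}{w_1 + w_2} = \frac{114.244(2.2622) + 5.1005(2.0930)}{114.244 + 5.1005} = 2.255 $$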

Page 73: Hypothesis Testing

Sampling From Normally Distributed Population Of Unknown Variance

• So we can only reject Ho if the calculated t' is either ≥ 2.255 or ≤ -2.255

$$ t' = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{(62.6 - 47.2) - 0}{\sqrt{\frac{(33.8)^2}{10} + \frac{(10.1)^2}{20}}} = 1.41 $$

• Here -2.255 < 1.41 < 2.255

• Then we cannot reject Ho

• So, we cannot conclude that the two population means are different
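The unequal-variance procedure of Example 6.3.3 can be retraced with the sketch below. The names `weighted_critical_t` and `t_prime` are illustrative; the two table values 2.2622 (9 degrees of freedom) and 2.0930 (19 degrees of freedom) are the ones used in the slides.

```python
import math

def weighted_critical_t(s1, n1, t1, s2, n2, t2):
    """Critical value t' = (w1*t1 + w2*t2) / (w1 + w2), with w_i = s_i^2 / n_i."""
    w1, w2 = s1**2 / n1, s2**2 / n2
    return (w1 * t1 + w2 * t2) / (w1 + w2)

def t_prime(mean1, s1, n1, mean2, s2, n2, diff0=0.0):
    """Test statistic for two independent samples with unequal variances."""
    return ((mean1 - mean2) - diff0) / math.sqrt(s1**2 / n1 + s2**2 / n2)

# Example 6.3.3: diseased subjects (n1 = 10) and normal subjects (n2 = 20)
t_crit = weighted_critical_t(s1=33.8, n1=10, t1=2.2622, s2=10.1, n2=20, t2=2.0930)
t_cal = t_prime(mean1=62.6, s1=33.8, n1=10, mean2=47.2, s2=10.1, n2=20)

reject_h0 = abs(t_cal) >= t_crit   # two sided decision rule

print(f"t' critical = {t_crit:.3f}, t' calculated = {t_cal:.2f}, reject Ho: {reject_h0}")
```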

Page 74: Hypothesis Testing

Sampling From Non-normally Distributed Populations

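• In this case, if we have large samples, the sampling distribution is approximately normal (by the central limit theorem) and we use the z statistic

$$ z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} $$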

Page 75: Hypothesis Testing

Sampling From Non-normally Distributed Populations

• Example 6.3.4

• A hospital administrator wants to know if the population that patronizes hospital A has a larger mean family income than the population that patronizes hospital B

• The data consist of the family incomes of 75 patients admitted to hospital A and of 80 patients admitted to hospital B

• The sample means are:

x̄1 = $6800
x̄2 = $5450

Page 76: Hypothesis Testing

Sampling From Non-normally Distributed Populations

• Example 6.3.4

• Now let us assume that the data constitute two independent random samples, each drawn from a nonnormally distributed population with the standard deviations given below

x̄1 = $6800   σ1 = $600
x̄2 = $5450   σ2 = $500

Set up appropriate hypotheses for an α of 0.01 and test them

Page 77: Hypothesis Testing

Sampling From Non-normally Distributed Populations

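• Example 6.3.4

• Ho: µ1 - µ2 ≤ 0   HA: µ1 - µ2 > 0

• or, equivalently, Ho: µ1 ≤ µ2   HA: µ1 > µ2

$$ z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{(6800 - 5450) - 0}{\sqrt{\frac{(600)^2}{75} + \frac{(500)^2}{80}}} = \frac{1350}{89} = 15.17 $$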

Page 78: Hypothesis Testing

Sampling From Non-normally Distributed Populations

• Example 6.3.4

Since the calculated z value (15.17) is larger than the critical z value (z-table=2.33), we reject Ho and conclude that the population patronizing hospital A has a larger mean family income than the population patronizing hospital B.

Page 79: Hypothesis Testing

PAIRED COMPARISONS

The test procedures we have considered so far assume that the samples are independent.

They are not appropriate for related observations resulting from nonindependent samples.

For this type of problem paired comparison test procedures are used.

The objective of a paired comparison is to eliminate a maximum number of sources of extraneous variation by making the pairs similar with respect to as many variables as possible.

Page 80: Hypothesis Testing

PAIRED COMPARISONS

The paired test statistic can be applied either as a z test or as a t test, depending on whether the variance of the differences is known

If the variance of the differences is known we can use a z test

$$ z = \frac{\bar{d} - \mu_d}{\sigma_d/\sqrt{n}} $$

Page 81: Hypothesis Testing

PAIRED COMPARISONS

If the variance of the differences is NOT known we can use a t test

$$ t = \frac{\bar{d} - \mu_{d_0}}{s_{\bar{d}}}, \qquad s_{\bar{d}} = \frac{s_d}{\sqrt{n}} $$

Page 82: Hypothesis Testing

Example 6.4.1

Twelve subjects participated in an experiment to study the effectiveness of a certain diet, combined with a program of exercise, in reducing serum cholesterol levels

The table below shows the data

Do the data provide sufficient evidence for us to conclude that the diet-exercise program is effective in reducing serum cholesterol levels?

Serum cholesterol levels

Subject   Before (x1)   After (x2)   Difference (x2-x1)
1         201           200          -1
2         231           236          5
3         221           216          -5
4         260           233          -27
5         228           224          -4
6         237           216          -21
7         326           296          -30
8         235           195          -40
9         240           207          -33
10        267           247          -20
11        284           210          -74
12        201           209          8

Page 83: Hypothesis Testing

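Example 6.4.1

• Ho: µd ≥ 0   HA: µd < 0

$$ \bar{d} = \frac{\sum d_i}{n} = \frac{(-1) + (5) + (-5) + \dots + (8)}{12} = \frac{-242}{12} = -20.17 $$

$$ s_d^2 = \frac{\sum (d_i - \bar{d})^2}{n - 1} = \frac{n\sum d_i^2 - (\sum d_i)^2}{n(n - 1)} = \frac{12(10766) - (-242)^2}{12(11)} = 535.06 $$

$$ t = \frac{-20.17 - 0}{\sqrt{535.06/12}} = \frac{-20.17}{6.68} = -3.02, \qquad t_{table} = -1.7959 $$

Since -3.02 < -1.7959, we reject the null hypothesis (Ho)

So the diet program is effective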

Page 84: Hypothesis Testing

A 95% confidence interval for µd can be obtained as

$$ \bar{d} \pm t_{(1-\alpha/2)}\, s_{\bar{d}} = -20.17 \pm 2.201(6.68) = -20.17 \pm 14.70 = (-34.87,\ -5.47) $$

As seen, the confidence interval does not contain zero, so we can reject the null hypothesis
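The paired comparison of Example 6.4.1 can be recomputed from the raw before/after values with Python's standard library. This is a sketch with illustrative variable names; the table values -1.7959 (one sided) and 2.201 (two sided, 11 degrees of freedom) are taken from the slides. Carrying full precision shifts the interval limits by about 0.01 relative to the rounded hand calculation.

```python
import math
from statistics import mean, stdev

before = [201, 231, 221, 260, 228, 237, 326, 235, 240, 267, 284, 201]
after = [200, 236, 216, 233, 224, 216, 296, 195, 207, 247, 210, 209]

diffs = [a - b for a, b in zip(after, before)]   # d_i = x2 - x1
n = len(diffs)
d_bar = mean(diffs)
se = stdev(diffs) / math.sqrt(n)                 # standard error of the mean difference

t_cal = (d_bar - 0) / se                         # Ho: mu_d >= 0
t_one_sided = -1.7959                            # lower-tail table value, 11 df, alpha = 0.05
reject_h0 = t_cal <= t_one_sided

t_two_sided = 2.201                              # table value for the 95% interval, 11 df
ci = (d_bar - t_two_sided * se, d_bar + t_two_sided * se)

print(f"d_bar = {d_bar:.2f}, t = {t_cal:.2f}, reject Ho: {reject_h0}")
print(f"95% CI for mu_d: ({ci[0]:.2f}, {ci[1]:.2f})")
```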

Page 85: Hypothesis Testing

Two machines are used for filling plastic bottles with a net volume of 16.0 ounces. The fill volume can be assumed normal with standard deviations σ1=0.020 and σ2=0.025 ounces. A member of the quality engineering staff suspects that both machines fill to the same mean net volume, whether or not this volume is 16.0 ounces. A random sample of 10 bottles is taken from the output of each machine, as given below.

Machine 1   Machine 2
16.03       16.02
16.04       15.97
16.05       15.96
16.05       16.01
16.02       15.99
16.01       16.03
15.96       16.04
15.98       16.02
16.02       16.01
15.99       16.00

Page 86: Hypothesis Testing

HYPOTHESIS TESTING: A SINGLE PROPORTION

$$ z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0 q_0}{n}}} $$

Page 87: Hypothesis Testing
Page 88: Hypothesis Testing


Page 89: Hypothesis Testing
Page 90: Hypothesis Testing

Example 6.5.1

• Ho: p = 0.5   HA: p ≠ 0.5

$$ z = \frac{0.41 - 0.50}{\sqrt{\frac{(0.5)(0.5)}{300}}} = \frac{-0.09}{0.0289} = -3.11 $$

ztable = ±1.96
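A minimal Python sketch of the single-proportion test in Example 6.5.1 follows. The function name `proportion_z` is illustrative. Without rounding the standard error to 0.0289 first, the statistic comes out near -3.12 rather than -3.11; the decision is the same.

```python
import math

def proportion_z(p_hat, p0, n):
    """z = (p_hat - p0) / sqrt(p0 * q0 / n), with q0 = 1 - p0."""
    q0 = 1 - p0
    return (p_hat - p0) / math.sqrt(p0 * q0 / n)

# Example 6.5.1: sample proportion 0.41, n = 300, Ho: p = 0.5
z = proportion_z(p_hat=0.41, p0=0.50, n=300)

z_table = 1.96                   # two sided critical value quoted in the slide
reject_h0 = abs(z) >= z_table

print(f"z = {z:.3f}, reject Ho: {reject_h0}")
```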

Page 91: Hypothesis Testing

HYPOTHESIS TESTING: THE DIFFERENCE BETWEEN TWO POPULATION PROPORTIONS

Page 92: Hypothesis Testing
Page 93: Hypothesis Testing

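HYPOTHESIS TESTING: THE DIFFERENCE BETWEEN TWO POPULATION PROPORTIONS

$$ \bar{p} = \frac{x_1 + x_2}{n_1 + n_2} $$

$$ \hat{\sigma}_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{\bar{p}(1 - \bar{p})}{n_1} + \frac{\bar{p}(1 - \bar{p})}{n_2}} $$

$$ z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)_0}{\hat{\sigma}_{\hat{p}_1 - \hat{p}_2}} $$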

Page 94: Hypothesis Testing
Page 95: Hypothesis Testing
Page 96: Hypothesis Testing
Page 97: Hypothesis Testing

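Example 6.6.1

$$ \hat{p}_1 = 78/100 = 0.78, \qquad \hat{p}_2 = 90/100 = 0.90 $$

$$ \bar{p} = \frac{90 + 78}{100 + 100} = 0.84 $$

• Ho: p2 - p1 ≤ 0   HA: p2 - p1 > 0

$$ z = \frac{(0.90 - 0.78) - 0}{\sqrt{\frac{(0.84)(0.16)}{100} + \frac{(0.84)(0.16)}{100}}} = \frac{0.12}{0.0518} = 2.32 $$

ztable = 1.645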
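The pooled-proportion calculation of Example 6.6.1 can be sketched as below. The function name `two_proportion_z` is illustrative, and the difference is taken as p̂2 - p̂1 to match the numerator (0.90 - 0.78) in the worked calculation. With full precision the statistic is about 2.31; the slide's 2.32 comes from rounding the standard error to 0.0518.

```python
import math

def two_proportion_z(x1, n1, x2, n2, diff0=0.0):
    """Pooled z statistic for the difference p2_hat - p1_hat; also returns p_bar."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)                       # pooled proportion
    se = math.sqrt(p_bar * (1 - p_bar) / n1 + p_bar * (1 - p_bar) / n2)
    return ((p2_hat - p1_hat) - diff0) / se, p_bar

# Example 6.6.1: 78 of 100 in sample 1, 90 of 100 in sample 2
z, p_bar = two_proportion_z(x1=78, n1=100, x2=90, n2=100)

z_table = 1.645                  # critical value quoted in the slide
exceeds_table = z >= z_table

print(f"p_bar = {p_bar:.2f}, z = {z:.3f}, exceeds table value: {exceeds_table}")
```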

Page 98: Hypothesis Testing

HYPOTHESIS TESTING: A SINGLE POPULATION VARIANCE

Page 99: Hypothesis Testing
Page 100: Hypothesis Testing
Page 101: Hypothesis Testing
Page 102: Hypothesis Testing
Page 103: Hypothesis Testing
Page 104: Hypothesis Testing

HYPOTHESIS TESTING: A SINGLE POPULATION VARIANCE

$$ \chi^2 = \frac{(n - 1)s^2}{\sigma^2} $$

Example 6.7.1

• Ho: σ² = 2500   HA: σ² ≠ 2500

$$ \chi^2 = \frac{(14)(1225)}{2500} = 6.86 $$

χ² table = 5.629 and 26.119
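The chi-square statistic of Example 6.7.1 and its comparison with the two table values can be sketched as follows. The function name is illustrative, and n = 15 is inferred from the factor 14 = n - 1 in the worked calculation; the decision printed at the end simply applies the two sided rule to the table values quoted above.

```python
def chi_square_statistic(n, sample_variance, sigma0_squared):
    """chi^2 = (n - 1) * s^2 / sigma0^2 with n - 1 degrees of freedom."""
    return (n - 1) * sample_variance / sigma0_squared

# Example 6.7.1: s^2 = 1225, Ho: sigma^2 = 2500, n = 15 (assumed from n - 1 = 14)
chi2 = chi_square_statistic(n=15, sample_variance=1225, sigma0_squared=2500)

lower, upper = 5.629, 26.119                 # table values quoted in the slide
reject_h0 = chi2 <= lower or chi2 >= upper   # two sided decision rule

print(f"chi^2 = {chi2:.2f}, reject Ho: {reject_h0}")
```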

Page 105: Hypothesis Testing

HYPOTHESIS TESTING: THE RATIO OF TWO POPULATION VARIANCES

Page 106: Hypothesis Testing

HYPOTHESIS TESTING: THE RATIO OF TWO POPULATION VARIANCES

Page 107: Hypothesis Testing

HYPOTHESIS TESTING: THE RATIO OF TWO POPULATION VARIANCES

Page 108: Hypothesis Testing
Page 109: Hypothesis Testing
Page 110: Hypothesis Testing
Page 111: Hypothesis Testing
Page 112: Hypothesis Testing

HYPOTHESIS TESTING: THE RATIO OF TWO POPULATION VARIANCES

Example 6.8.1

$H_0: \sigma_1^2 \le \sigma_2^2$

$H_A: \sigma_1^2 > \sigma_2^2$

$V.R. = s_1^2 / s_2^2$

$V.R. = 1600/1225 = 1.31$

Ftable = 2.39 for 20 numerator degrees of freedom (21 is not given in the table, so the closest value is used)
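The variance ratio of Example 6.8.1 can be reproduced with the sketch below. The function name is illustrative, and comparing V.R. against the single table value in the upper tail is an assumption based on the one table value quoted.

```python
def variance_ratio(s1_sq, s2_sq):
    """Variance ratio V.R. = s1^2 / s2^2."""
    return s1_sq / s2_sq

vr = variance_ratio(1600, 1225)

# Table value quoted in the example; upper-tail comparison is assumed here.
f_table = 2.39
reject_h0 = vr > f_table

print(f"V.R. = {vr:.2f}, reject H0: {reject_h0}")
```

Since 1.31 is smaller than 2.39, the sketch does not reject the null hypothesis.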

Page 113: Hypothesis Testing

Test For Outliers

•One of the important aims of statistical tests is to recognize the presence or absence of outliers

•Outliers in a series of measurements are extraordinarily small or large observations compared with the bulk of the data

•There are test procedures for detecting outliers in data; here we will look at Dixon’s Q-test

•The Q-test is one of the most frequently used outlier test procedures

Page 114: Hypothesis Testing

Test For Outliers

•The Q-test uses the range of the measurements and can be applied even when only a few data are available

•The n measurements are arranged in ascending order

•If the very small value to be tested as an outlier is denoted by x1 and the very large value by xn, then the test statistic is calculated as given on the next slide

Page 115: Hypothesis Testing

Test For Outliers

•For the smallest one

$Q_1 = \dfrac{x_2 - x_1}{x_n - x_1}$

Page 116: Hypothesis Testing

Test For Outliers

•For the largest one

$Q_n = \dfrac{x_n - x_{n-1}}{x_n - x_1}$

Page 117: Hypothesis Testing

Test For Outliers

•The null hypothesis, i.e., that the considered measurement is not an outlier, is accepted if the quantity Q < Q(1-α; n)

•If the calculated Q value is greater than the table value [Q > Q(1-α; n)], then we reject the null hypothesis and say that the value is an outlier

•Q values for selected significance levels and degrees of freedom are given in standard tables

Page 118: Hypothesis Testing

Example

•Trace analysis of polycyclic aromatic hydrocarbons (PAH) in a soil revealed for the trace constituent benzo[a]pyrene the following values in mg/kg dry weight

•5.30, 5.00, 5.10, 5.20, 5.10, 6.20, 5.15

•Apply the Q-test to check whether the smallest and the largest values might be outliers

Page 119: Hypothesis Testing

Example

•First we need to arrange the data in ascending order:

•5.00, 5.10, 5.10, 5.15, 5.20, 5.30, 6.20

•Then we can calculate the Q value for both the smallest and the largest values. For the smallest value:

$Q_1 = \dfrac{x_2 - x_1}{x_n - x_1} = \dfrac{5.10 - 5.00}{6.20 - 5.00} = 0.083$

Page 120: Hypothesis Testing

Example

•For the largest value

$Q_n = \dfrac{x_n - x_{n-1}}{x_n - x_1} = \dfrac{6.20 - 5.30}{6.20 - 5.00} = 0.75$

•For α = 0.01 we can obtain the table value as

•Q(1-0.01 = 0.99; n = 7) = 0.64

•Since the Q value for the smallest value (0.083) is much smaller than the table value, we cannot eliminate the smallest value as an outlier

•However, the Q value for the largest value (0.75) is larger than the table value, and for this reason we can eliminate the largest value as an outlier
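The Q-test on the benzo[a]pyrene data can be sketched in Python as shown below. The helper name `dixon_q` is illustrative; the table value 0.64 is the one quoted in the example, not computed by the code.

```python
def dixon_q(values):
    """Return (Q for the smallest value, Q for the largest value)."""
    xs = sorted(values)              # arrange in ascending order
    data_range = xs[-1] - xs[0]      # x_n - x_1
    q_low = (xs[1] - xs[0]) / data_range
    q_high = (xs[-1] - xs[-2]) / data_range
    return q_low, q_high

data = [5.30, 5.00, 5.10, 5.20, 5.10, 6.20, 5.15]  # mg/kg dry weight
q_low, q_high = dixon_q(data)

q_table = 0.64  # Q(0.99; n = 7) from the example
print(f"smallest: Q = {q_low:.3f}, outlier: {q_low > q_table}")
print(f"largest:  Q = {q_high:.2f}, outlier: {q_high > q_table}")
```

Only the largest value, 6.20, is flagged, in agreement with the conclusion of the example.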

Page 121: Hypothesis Testing

Grubbs’s Test for Outlier

•It can be applied to series of 3 to 150 measurements

•By use of the test quantity T, the distance of the suspicious value from the mean is determined and related to the standard deviation of the measurements

•The null hypothesis, according to which x* is not an outlier within the measurement series of n values, is accepted at level α if the test quantity T satisfies:

$T = \dfrac{|\bar{x} - x^*|}{s} < T_{table}(1 - \alpha; n)$

Page 122: Hypothesis Testing

Grubbs’s Test for Outlier

•Example

•The data for the trace analysis of benzo[a]pyrene from the previous example are also used in Grubbs’s test

•The mean of the data was 5.29 and the standard deviation was 0.411

•Then we can calculate the T values for the smallest and the largest values as

$T_1 = \dfrac{|\bar{x} - x_1^*|}{s} = \dfrac{|5.29 - 5.00|}{0.411} = 0.71$

$T_n = \dfrac{|\bar{x} - x_n^*|}{s} = \dfrac{|5.29 - 6.20|}{0.411} = 2.21$

Page 123: Hypothesis Testing

Grubbs’s Test for Outlier

•Example

$T_1 = \dfrac{|\bar{x} - x_1^*|}{s} = \dfrac{|5.29 - 5.00|}{0.411} = 0.71$

$T_n = \dfrac{|\bar{x} - x_n^*|}{s} = \dfrac{|5.29 - 6.20|}{0.411} = 2.21$

•The table value at α = 0.01 is

•T(1-0.01 = 0.99; n = 7) = 2.10

•As a result, the test result is not significant for the smallest value (0.71 < 2.10) but is significant for the largest value (2.21 > 2.10)

•So the largest value is an outlier
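The Grubbs calculation can be sketched with Python's standard `statistics` module. The variable names are illustrative, and the table value 2.10 is taken from the example rather than computed. The sample standard deviation (n - 1 in the denominator) is used, which reproduces the 0.411 quoted above.

```python
import statistics

data = [5.30, 5.00, 5.10, 5.20, 5.10, 6.20, 5.15]  # mg/kg dry weight

mean = statistics.mean(data)   # about 5.29
s = statistics.stdev(data)     # sample standard deviation, about 0.411

# Distance of each suspicious value from the mean, in units of s.
t_low = abs(mean - min(data)) / s
t_high = abs(mean - max(data)) / s

t_table = 2.10  # T(0.99; n = 7) from the example
print(f"smallest: T = {t_low:.2f}, outlier: {t_low > t_table}")
print(f"largest:  T = {t_high:.2f}, outlier: {t_high > t_table}")
```

The unrounded mean and standard deviation give T values that agree with the slide values to two decimals.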

Page 124: Hypothesis Testing

Non-parametric Tests for Method Comparison

•The tests that we have seen so far all require that the data be normally distributed

•When this is not the case, distribution-free methods need to be used

•These methods do not require parameters such as the mean and standard deviation used in the previous tests

•For that reason, they are called non-parametric methods

•These methods require more replicate measurements

•They do not use the values of the quantitative variables

•They use the ranks of the data and are based on counting

Page 125: Hypothesis Testing

Non-parametric Tests for Method Comparison

•We will look at two examples of non-parametric tests

•These are:

•The Mann-Whitney U-test for the comparison of independent samples

•The Wilcoxon T-test for paired measurements

•When normality is doubtful, you should always consider these tests, especially in the case of small samples

Page 126: Hypothesis Testing

The Mann-Whitney U-test

•This test is based on ranking the samples, taking both groups (group A and group B) of the data together

•It gives rank 1 to the lowest result, rank 2 to the second lowest, etc.

•If n1 and n2 are the numbers of data in the groups with the smallest and largest number of results, respectively, and R1 and R2 are the sums of the ranks in these two groups, then we can set up the equations as:

$U_1 = n_1 n_2 + \dfrac{n_1(n_1 + 1)}{2} - R_1$

$U_2 = n_1 n_2 + \dfrac{n_2(n_2 + 1)}{2} - R_2$

Page 127: Hypothesis Testing

The Mann-Whitney U-test

•The smaller of the two U values is used to evaluate the test

•When there are ties, the average of the tied ranks is given

•The Mann-Whitney test compares the medians of the two samples

•The smaller the difference between the medians, the smaller the difference between U1 and U2

$U_1 = n_1 n_2 + \dfrac{n_1(n_1 + 1)}{2} - R_1$

$U_2 = n_1 n_2 + \dfrac{n_2(n_2 + 1)}{2} - R_2$

$H_0: U_1 = U_2$

$H_1: U_1 \neq U_2$

Page 128: Hypothesis Testing

The Mann-Whitney U-test

•Example: The following two groups of measurements are to be compared

A      B
11.1   10.9
13.7   11.2
14.8   12.1
11.2   12.4
15.0   15.5
16.1   14.6
17.3   13.5
10.9   10.8
10.8
11.7

•Here the lowest result, 10.8, is given the rank 1

•Since 10.8 occurs twice (once in group A and once in group B), both values are given the rank 1.5, the average of ranks 1 and 2:

$\text{Rank} = \dfrac{1 + 2}{2} = 1.5$

Page 129: Hypothesis Testing

The Mann-Whitney U-test

•If we set the hypotheses as

$H_0: U_1 = U_2$

$H_1: U_1 \neq U_2$

•This will be a two-sided test

Group  result  rank   Group  result  rank
A      10.8    1.5    B      12.4    10
B      10.8    1.5    B      13.5    11
A      10.9    3.5    A      13.7    12
B      10.9    3.5    B      14.6    13
A      11.1    5      A      14.8    14
A      11.2    6.5    A      15.0    15
B      11.2    6.5    B      15.5    16
A      11.7    8      A      16.1    17
B      12.1    9      A      17.3    18

Page 130: Hypothesis Testing

The Mann-Whitney U-test

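•R1 is the sum of the ranks in group B:

•R1 = 1.5+3.5+6.5+9+10+11+13+16 = 70.5

•R2 is the sum of the ranks in group A:

•R2 = 1.5+3.5+5+6.5+8+12+14+15+17+18 = 100.5

$U_1 = n_1 n_2 + \dfrac{n_1(n_1 + 1)}{2} - R_1 = 8 \cdot 10 + \dfrac{8(8 + 1)}{2} - 70.5 = 45.5$

$U_2 = n_1 n_2 + \dfrac{n_2(n_2 + 1)}{2} - R_2 = 8 \cdot 10 + \dfrac{10(10 + 1)}{2} - 100.5 = 34.5$

Notice that $U_1 + U_2 = n_1 n_2$

$U = \min(U_1, U_2) = 34.5$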

Page 131: Hypothesis Testing

The Mann-Whitney U-test

•From the table for a two-sided test with n1 = 8 and n2 = 10, a value of 17 is found

•If an observed U value is less than or equal to the value in the table, the null hypothesis may be rejected at the significance level of the table

•Since our calculated value (34.5) is larger than 17, we conclude that there is no difference between the two groups

$U_1 = n_1 n_2 + \dfrac{n_1(n_1 + 1)}{2} - R_1 = 8 \cdot 10 + \dfrac{8(8 + 1)}{2} - 70.5 = 45.5$

$U_2 = n_1 n_2 + \dfrac{n_2(n_2 + 1)}{2} - R_2 = 8 \cdot 10 + \dfrac{10(10 + 1)}{2} - 100.5 = 34.5$

Notice that $U_1 + U_2 = n_1 n_2$

$U = \min(U_1, U_2) = 34.5$
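The ranking with averaged ties and the two U values of this example can be reproduced with a short Python sketch. The helper `average_ranks` and the variable names are illustrative and not from the slides; the critical value 17 is the table value quoted above.

```python
def average_ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    sorted_vals = sorted(values)
    # A value first found at 0-based index i and occurring c times occupies
    # ranks i+1 .. i+c, whose mean is i + (c + 1) / 2.
    return [sorted_vals.index(v) + (sorted_vals.count(v) + 1) / 2 for v in values]

group_a = [11.1, 13.7, 14.8, 11.2, 15.0, 16.1, 17.3, 10.9, 10.8, 11.7]
group_b = [10.9, 11.2, 12.1, 12.4, 15.5, 14.6, 13.5, 10.8]

# Group B is the smaller group (n1 = 8), group A the larger one (n2 = 10).
n1, n2 = len(group_b), len(group_a)
ranks = average_ranks(group_b + group_a)   # rank both groups together
r1 = sum(ranks[:n1])                       # rank sum of group B
r2 = sum(ranks[n1:])                       # rank sum of group A

u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
u = min(u1, u2)

u_table = 17                 # two-sided table value for n1 = 8, n2 = 10
reject_h0 = u <= u_table
print(f"R1 = {r1}, R2 = {r2}, U1 = {u1}, U2 = {u2}, U = {u}, reject H0: {reject_h0}")
```

The check U1 + U2 = n1 n2 is a convenient way to catch ranking mistakes when the sums are done by hand.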

Page 132: Hypothesis Testing

The Mann-Whitney U-test

•We can now check whether the data used in this test show any tendency toward a normal distribution.

sample  raw A  raw B  ranked A  ranked B  (j-0.5)/10  (j-0.5)/8  ranked A and B  (j-0.5)/18
1       11.10  10.90  10.80     10.80     0.05        0.06       10.80           0.03
2       13.70  11.20  10.90     10.90     0.15        0.19       10.80           0.08
3       14.80  12.10  11.10     11.20     0.25        0.31       10.90           0.14
4       11.20  12.40  11.20     12.10     0.35        0.44       10.90           0.19
5       15.00  15.50  11.70     12.40     0.45        0.56       11.10           0.25
6       16.10  14.60  13.70     13.50     0.55        0.69       11.20           0.31
7       17.30  13.50  14.80     14.60     0.65        0.81       11.20           0.36
8       10.90  10.80  15.00     15.50     0.75        0.94       11.70           0.42
9       10.80         16.10               0.85                   12.10           0.47
10      11.70         17.30               0.95                   12.40           0.53
11                                                               13.50           0.58
12                                                               13.70           0.64
13                                                               14.60           0.69
14                                                               14.80           0.75
15                                                               15.00           0.81
16                                                               15.50           0.86
17                                                               16.10           0.92
18                                                               17.30           0.97
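The probability columns of the table are the plotting positions (j-0.5)/n of the ranked data. A minimal Python sketch for group B follows. The function name is illustrative, and the added normal quantile column (via `statistics.NormalDist`, available from Python 3.8) is an extra that does not appear on the slide.

```python
from statistics import NormalDist

def plotting_positions(values):
    """Pair each ranked value with its plotting position (j - 0.5) / n."""
    xs = sorted(values)
    n = len(xs)
    return [(x, (j - 0.5) / n) for j, x in enumerate(xs, start=1)]

group_b = [10.9, 11.2, 12.1, 12.4, 15.5, 14.6, 13.5, 10.8]
positions = plotting_positions(group_b)

for x, p in positions:
    # Standard normal quantile for the plotting position; roughly linear
    # behaviour of z against x would suggest approximate normality.
    z = NormalDist().inv_cdf(p)
    print(f"{x:6.2f}  {p:.2f}  {z:+.2f}")
```

The same function applied to group A or to the combined data gives the (j-0.5)/10 and (j-0.5)/18 columns.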

Page 133: Hypothesis Testing

The Mann-Whitney U-test

•We can now check whether the data used in this test show any tendency toward a normal distribution.

[Figure: Normal probability plot. x-axis: Measurement (10.00 to 18.00); y-axis: Probability (0.00 to 1.00); series: Group A, Group B, A and B]

Page 134: Hypothesis Testing

Wilcoxon Matched Pairs Signed-Rank test

•In this test, the differences (di) of the paired data are calculated first

•These di values are then ranked without regard to sign, starting with the smallest value

•Then each rank is given the same sign as the corresponding difference

•If there are ties, the same rule (take the average) is applied as in the Mann-Whitney test

•If any di value is zero, you can either drop it from the analysis or assign it a rank of (p+1)/2, in which p is the number of zero differences

•In this case half of the zero differences take a negative rank and the other half a positive rank

Page 135: Hypothesis Testing

Wilcoxon Matched Pairs Signed-Rank test

•The null hypothesis is that methods A and B are equivalent

•If H0 is true, it would be expected that the sum of all ranks for positive differences (T+) would be close to the sum for negative differences (T-).

For the two-sided case, the hypotheses are then:

$H_0: A = B$

$H_1: A \neq B$

The Wilcoxon T statistic is calculated as: T = min(T+, T-)

The smaller the value of T, the larger the significance of the difference

Page 136: Hypothesis Testing

Wilcoxon Matched Pairs Signed-Rank test

Let's now work through an example

sample R T d=R-T rank signed rank

1 114 116 -2 1 -1

2 49 42 7 7.5 7.5

3 100 95 5 4 4

4 20 10 10 9.5 9.5

5 90 94 -4 2.5 -2.5

6 106 100 6 5.5 5.5

7 100 96 4 2.5 2.5

8 95 102 -7 7.5 -7.5

9 160 150 10 9.5 9.5

10 110 104 6 5.5 5.5

Page 137: Hypothesis Testing

Wilcoxon Matched Pairs Signed-Rank test

•The critical (table) values of T as a function of n and α are given in tables.

•In our example, the ranks of all positive differences add up to T+ = 44.0

•And the ranks of all negative differences add up to T- = 11.0, so T = min(T+, T-) = 11.0

•If the calculated T value is equal to or smaller than the table value, the null hypothesis is rejected.

•For α = 0.05 and n = 10 in our two-sided test, the table value is T = 8.

•Since 11.0 is larger than 8, the null hypothesis is accepted and we can conclude that there is no difference between the two methods
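The signed-rank sums of this example can be reproduced with the Python sketch below. The helper and variable names are illustrative; zero differences are simply dropped here (there are none in these data), which is one of the two options described earlier, and the table value 8 is taken from the slide.

```python
def average_ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    sorted_vals = sorted(values)
    return [sorted_vals.index(v) + (sorted_vals.count(v) + 1) / 2 for v in values]

method_r = [114, 49, 100, 20, 90, 106, 100, 95, 160, 110]
method_t = [116, 42, 95, 10, 94, 100, 96, 102, 150, 104]

# Paired differences d = R - T, with zero differences dropped.
d = [r - t for r, t in zip(method_r, method_t) if r != t]

# Rank the absolute differences, then split the ranks by the sign of d.
ranks = average_ranks([abs(di) for di in d])
t_plus = sum(rk for di, rk in zip(d, ranks) if di > 0)
t_minus = sum(rk for di, rk in zip(d, ranks) if di < 0)
t_stat = min(t_plus, t_minus)

t_table = 8                      # two-sided table value for n = 10
reject_h0 = t_stat <= t_table
print(f"T+ = {t_plus}, T- = {t_minus}, T = {t_stat}, reject H0: {reject_h0}")
```

A useful check is that T+ + T- equals n(n+1)/2 = 55 for n = 10 non-zero differences.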

Page 138: Hypothesis Testing

Q.1 A liquid dietary product implies in its advertising that use of the product for one month results in an average weight loss of at least 3 pounds. Eight subjects use the product for one month, and the resulting weight loss data are reported below. Use hypothesis-testing procedures to answer the following questions.

(a) Do the data support the claim of the producer of the dietary product at 95% confidence?

(b) Do the data support the claim of the producer of the dietary product at 99% confidence?

(c) In an effort to improve sales, the producer is considering changing its claim from “at least 3 pounds” to “at least 5 pounds.” Repeat parts (a) and (b) to test this new claim.

Page 139: Hypothesis Testing

Q.2 The overall distance traveled by a golf ball is tested by hitting the ball with Iron Byron, a mechanical golfer with a swing that is said to emulate the legendary champion, Byron Nelson. Ten randomly selected balls of two different brands are tested and the overall distance measured. The data follow:

Brand 1: 275, 286, 287, 271, 283, 271, 279, 275, 263, 267

Brand 2: 258, 244, 260, 265, 273, 281, 271, 270, 263, 268

(a) Is there evidence that overall distance is approximately normally distributed?

(b) Test the hypothesis that both brands of ball have equal mean overall distance at 95% confidence.