
Chapter 9 Large-Sample Tests of Hypotheses

General Objectives:

In this chapter, the concept of a statistical test of a hypothesis is formally introduced. The sampling distributions of statistics presented in earlier chapters are used to construct large-sample tests concerning the values of population parameters of interest to the experimenter.

Specific Topics

1. A Statistical test of hypotheses

2. Large-sample test about a population mean

3. Large-sample test about $(\mu_1 - \mu_2)$

4. Testing a hypothesis about a population proportion p

5. Testing a hypothesis about $(p_1 - p_2)$

9.1 Testing Hypotheses About Population Parameters

Samples can be used to estimate the mean potency of a population.

Two possibilities:

- The mean potency does not exceed the minimum allowable potency.

- The mean potency exceeds the minimum allowable potency.

This is an example of a statistical test of a hypothesis.

9.2 A Statistical Test of Hypothesis

A statistical test of hypothesis consists of five parts:

1. The null hypothesis, denoted by $H_0$

2. The alternative hypothesis, denoted by $H_a$

3. The test statistic and its p-value

4. The rejection region

5. The conclusion

Definition: The two competing hypotheses are the alternative hypothesis $H_a$, generally the hypothesis that the researcher wishes to support, and the null hypothesis $H_0$, a contradiction of the alternative hypothesis.

The researcher then uses the sample data to decide whether the evidence favors $H_a$ rather than $H_0$ and draws one of these two conclusions:

- Reject $H_0$ and conclude that $H_a$ is true.

- Accept (do not reject) $H_0$ as true.

Examples 9.1 and 9.2 show null and alternative hypotheses. A test of hypothesis can be two-tailed or one-tailed; a one-tailed test is either left-tailed or right-tailed.

The test statistic is a single number calculated from sample data. The p-value is a probability calculated using the test statistic. Either or both of these measures act as a decision maker for the researcher in deciding whether to reject or accept $H_0$. Example 9.3 deals with the z-score and the p-value.

Figures 9.1 and 9.2 show acceptance and rejection regions.

Example 9.3

For the test of hypothesis in Example 9.1, the average hourly wage $\bar{x}$ for a random sample of 100 California construction workers might provide a good test statistic for testing

$H_0: \mu = 14$ versus $H_a: \mu \neq 14$

If the null hypothesis $H_0$ is true, then the sample mean should not be too far from the population mean $\mu = 14$. Suppose that this sample produces a sample mean $\bar{x} = 15$ with standard deviation $s = 2$. Is this sample evidence likely or unlikely to occur, if in fact $H_0$ is true? You can use two measures to find out. Since the sample size is large, the sampling distribution of $\bar{x}$ is approximately normal with mean $\mu = 14$ and standard error $\sigma/\sqrt{n}$, estimated as

$SE = \frac{s}{\sqrt{n}} = \frac{2}{\sqrt{100}} = .2$

The test statistic $\bar{x} = 15$ lies

$z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{15 - 14}{.2} = 5$

standard deviations from the population mean.

The p-value is the probability of observing a test statistic that is five or more standard deviations from the mean. Since z measures the number of standard deviations a normal random variable lies from its mean, you have

$p\text{-value} = P(z > 5) + P(z < -5) \approx 0$

The large value of the test statistic and the small p-value mean that you have observed a very unlikely event, if indeed $H_0$ is true and $\mu = 14$.
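The arithmetic in Example 9.3 can be checked with a few lines of Python. This is a minimal sketch, not part of the original example: the function name is hypothetical, and the sample mean of 15 is an assumption chosen to be consistent with the "five standard deviations" stated in the text.

```python
import math


def two_tailed_p_value(z: float) -> float:
    """Return P(|Z| >= |z|) for a standard normal Z.

    Uses the identity 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    """
    return math.erfc(abs(z) / math.sqrt(2))


# Values from Example 9.3 (x_bar = 15 is an assumed sample mean).
mu_0 = 14.0   # hypothesized population mean
x_bar = 15.0  # sample mean (assumption)
s = 2.0       # sample standard deviation
n = 100       # sample size

standard_error = s / math.sqrt(n)
z = (x_bar - mu_0) / standard_error
p_value = two_tailed_p_value(z)

print(f"SE = {standard_error:.2f}, z = {z:.2f}, p-value = {p_value:.2e}")
```

The p-value is far below any conventional significance level, which matches the statement that it is approximately 0.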

Definition:

A Type I error for a statistical test is the error of rejecting the null hypothesis when it is true.

The level of significance (significance level) $\alpha$ for a statistical test of a hypothesis is

$\alpha = P(\text{Type I error}) = P(\text{falsely rejecting } H_0) = P(\text{rejecting } H_0 \text{ when it is true})$

The value $\alpha$ represents the maximum tolerable risk of incorrectly rejecting $H_0$.
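One way to see what the significance level means in practice is a small simulation: draw many samples from a population in which the null hypothesis is true and count how often a two-tailed z test rejects it. The sketch below is illustrative only. The normal population with mean 14 and standard deviation 2, the sample size of 100, the cutoff 1.96 (for a 5% two-tailed test), and the function name are all assumptions, not values from the chapter.

```python
import math
import random


def simulated_type_i_rate(mu_0, sigma, n, z_cutoff, trials, seed=0):
    """Estimate how often a two-tailed z test rejects a TRUE null hypothesis."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    standard_error = sigma / math.sqrt(n)
    rejections = 0
    for _ in range(trials):
        # Sample n observations from a population where H0 (mu = mu_0) holds.
        sample_mean = sum(rng.gauss(mu_0, sigma) for _ in range(n)) / n
        z = (sample_mean - mu_0) / standard_error
        if abs(z) > z_cutoff:
            rejections += 1  # a Type I error: H0 rejected although it is true
    return rejections / trials


rate = simulated_type_i_rate(mu_0=14.0, sigma=2.0, n=100, z_cutoff=1.96, trials=4000)
print(f"Observed Type I error rate: {rate:.3f}")
```

With enough trials the observed rate should settle near the chosen significance level of .05, which is the sense in which the researcher controls the risk of a Type I error.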

9.3 A Large-Sample Test About a Population Mean

$H_0: \mu = \mu_0$

$H_a: \mu > \mu_0$

The standard error of $\bar{x}$ is calculated as $SE = \sigma/\sqrt{n}$.

The standardized test statistic:

$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$

Figure 9.3 shows a rejection region. Examples 9.4 and 9.5 deal with tests of hypotheses concerning the mean.

Figure 9.3 The rejection region of a right-tailed test with $\alpha = .01$

Example 9.4

The average weekly earnings for women in managerial and professional positions is $670. Do men in the same positions have average weekly earnings that are higher than those for women? A random sample of $n = 40$ men in managerial and professional positions showed $\bar{x} = 725$ and $s = 102$ (in dollars). Test the appropriate hypothesis using $\alpha = .01$.

Solution

You would like to show that the average weekly earnings for men are higher than $670, the women's average. Hence, if $\mu$ is the average weekly earnings in managerial and professional positions for men, the hypotheses to be tested are

$H_0: \mu = 670$ versus $H_a: \mu > 670$

The rejection region for this one-tailed test consists of large values of $\bar{x}$ or, equivalently, values of the standardized test statistic z in the right tail of the standard normal distribution, with $\alpha = .01$. This value is found in Table 3 of Appendix I to be $z = 2.33$, as shown in Figure 9.3. The observed value of the test statistic, using s as an estimate of the population standard deviation, is

$z = \frac{\bar{x} - 670}{\sigma/\sqrt{n}} \approx \frac{725 - 670}{102/\sqrt{40}} = 3.41$

Since the observed value of the test statistic falls in the rejection region, you can reject $H_0$ and conclude that the average weekly earnings for men in managerial and professional positions are significantly higher than those for women. The risk of making this decision incorrectly, that is, of rejecting $H_0$ when it is true, is $\alpha = .01$.

Figure 9.4 The rejection region for a two-tailed test with $\alpha = .01$

The two-tailed hypothesis is written as $H_a: \mu \neq \mu_0$, which implies either $\mu > \mu_0$ or $\mu < \mu_0$.

Large-Sample Statistical Test for $\mu$:

1. Null hypothesis: $H_0: \mu = \mu_0$

2. Alternative hypothesis:

One-Tailed Test: $H_a: \mu > \mu_0$ (or $H_a: \mu < \mu_0$)

Two-Tailed Test: $H_a: \mu \neq \mu_0$

3. Test statistic:

$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$

If $\sigma$ is unknown (which is usually the case), substitute the sample standard deviation s for $\sigma$.

4. Rejection region: Reject $H_0$ when

One-Tailed Test: $z > z_\alpha$ (or $z < -z_\alpha$ when the alternative hypothesis is $H_a: \mu < \mu_0$)

Two-Tailed Test: $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$

Assumptions: The n observations in the sample are randomly selected from the population and n is large, say $n \geq 30$.
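The four steps of the large-sample test for a mean can be sketched as a single function. This is one possible arrangement, not a prescribed implementation: the function name, its parameters, and the `tail` labels are hypothetical, and the critical value comes from Python's `statistics.NormalDist` rather than from a printed normal table.

```python
import math
from statistics import NormalDist


def large_sample_z_test(x_bar, mu_0, s, n, alpha, tail):
    """Large-sample z test for a population mean.

    tail: "right" (Ha: mu > mu_0), "left" (Ha: mu < mu_0), or "two" (Ha: mu != mu_0).
    Returns (z, critical_value, reject_h0).
    """
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    z = (x_bar - mu_0) / (s / math.sqrt(n))  # s substitutes for unknown sigma
    if tail == "right":
        critical = std_normal.inv_cdf(1 - alpha)
        reject = z > critical
    elif tail == "left":
        critical = std_normal.inv_cdf(1 - alpha)
        reject = z < -critical
    elif tail == "two":
        critical = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z) > critical
    else:
        raise ValueError("tail must be 'right', 'left', or 'two'")
    return z, critical, reject


# Usage with the figures from Example 9.4 (right-tailed test, alpha = .01).
z, critical, reject = large_sample_z_test(
    x_bar=725, mu_0=670, s=102, n=40, alpha=0.01, tail="right"
)
print(f"z = {z:.2f}, critical value = {critical:.2f}, reject H0: {reject}")
```

The computed critical value is about 2.33, in agreement with the table value quoted in Example 9.4, and the observed z of about 3.41 falls in the rejection region.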

The unnumbered figures on page 344 show one- and two-tailed rejection regions:

Calculating the p-Value

To avoid any ambiguity in their conclusions, some experimenters prefer to use a variable level of significance called the p-value for the test.

Definition: The p-value or observed significance level of a statistical test is the smallest value of $\alpha$ for which $H_0$ can be rejected. It is the actual risk of committing a Type I error, if $H_0$ is rejected based on the observed value of the test statistic. The p-value measures the strength of the evidence against $H_0$.

The p-value of the test is actually the area to the right of the calculated value of the test statistic (if the critical value is in the right tail).

Figure 9.5 illustrates variable rejection regions.

Figure 9.5 Variable rejection regions

Definition: If the p-value is less than a preassigned significance level $\alpha$, then the null hypothesis can be rejected, and you can report that the results are statistically significant at level $\alpha$.

Example 9.6 shows the calculation of the p-value for a two-tailed test.

Example 9.6

Calculate the p-value for the two-tailed test of hypothesis in Example 9.5. Use the p-value to draw conclusions regarding the statistical test.

Solution

The rejection region for this two-tailed test of hypothesis is found in both tails of the normal probability distribution. Since the observed value of the test statistic is $z = -3.03$, the smallest rejection region that you can use and still reject $H_0$ is $|z| > 3.03$. For this rejection region, the value of $\alpha$ is the p-value:

$p\text{-value} = P(z > 3.03) + P(z < -3.03) = 2(.5 - .4988) = 2(.0012) = .0024$

Notice that the two-tailed p-value is actually twice the tail area corresponding to the calculated value of the test statistic. If this p-value of .0024 is less than the preassigned level of significance $\alpha$, $H_0$ can be rejected. For this test, you can reject $H_0$ at either the 1% or the 5% level of significance.

Many researchers use a “sliding scale” to classify their results:

- If the p-value is less than .01, $H_0$ is rejected. The results are highly significant.

- If the p-value is between .01 and .05, $H_0$ is rejected. The results are statistically significant.

- If the p-value is between .05 and .10, $H_0$ is usually not rejected. The results are only tending toward statistical significance.

- If the p-value is greater than .10, $H_0$ is not rejected. The results are not statistically significant.
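The sliding scale above can be written as a small lookup, shown here together with the two-tailed p-value from Example 9.6. This is only a sketch: the function name is hypothetical, and the treatment of values that fall exactly on a boundary (.01, .05, .10) is an assumption, since the scale as stated does not say which side they belong to.

```python
from statistics import NormalDist


def classify_p_value(p_value: float) -> str:
    """Map a p-value onto the sliding scale (boundary handling is an assumption)."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < 0.01:
        return "highly significant"
    if p_value < 0.05:
        return "statistically significant"
    if p_value < 0.10:
        return "tending toward statistical significance"
    return "not statistically significant"


# Two-tailed p-value for the observed z in Example 9.6.
z_observed = -3.03
p_value = 2 * NormalDist().cdf(-abs(z_observed))

print(f"p-value = {p_value:.4f}: {classify_p_value(p_value)}")
```

The computed p-value rounds to .0024, matching the table-based calculation in Example 9.6.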

Example 9.7 conducts a test of hypothesis concerning the mean.

The p-value approach does have two advantages:

- Statistical output from packages such as Minitab usually reports the p-value of the test.

- Based on the p-value, your test results can be evaluated using any significance level you wish to use.

The smaller the p-value, the more unlikely it is that $H_0$ is true!

Table 9.1 illustrates a decision table.

Table 9.1 Decision table

                  Null Hypothesis
Decision          True                False
Reject $H_0$      Type I error        Correct decision
Accept $H_0$      Correct decision    Type II error

Definition: A Type I error for a statistical test is the error of rejecting the null hypothesis when it is true. The probability of making a Type I error is denoted by the symbol $\alpha$.

A Type II error for a statistical test is the error of accepting (not rejecting) the null hypothesis when it is false and some alternative hypothesis is true. The probability of making a Type II error is denoted by the symbol $\beta$.

Notice that the probability of a Type I error is exactly the same as the level of significance $\alpha$ and is therefore controlled by the researcher.

Keep in mind that “accepting” a particular hypothesis means deciding in its favor.

There is always a risk of being wrong, measured by $\alpha$ and $\beta$.

Definition: The power of a statistical test, given as

$1 - \beta = P(\text{reject } H_0 \text{ when } H_a \text{ is true})$

measures the ability of the test to perform as required.

A graph of $(1 - \beta)$, the probability of rejecting $H_0$ when in fact $H_0$ is false, as a function of the true value of the parameter of interest is called the power curve for the statistical test.

Ideally, you would like $\alpha$ to be small and the power $(1 - \beta)$ to be large.

Example 9.8 shows the calculation of $\beta$ and the power of the test $(1 - \beta)$.

Figure 9.7 Calculating $\beta$ in Example 9.8

Figure 9.8 Power curve for Example 9.8
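To make the idea of a power curve concrete, the sketch below computes the power of a right-tailed large-sample test at several possible true means. It does not reproduce Example 9.8, whose figures are not given here. The null value 670, standard deviation 102, sample size 40, and significance level .01 are borrowed from Example 9.4, the candidate true means are hypothetical, and the function name is an assumption.

```python
import math
from statistics import NormalDist


def power_right_tailed(mu_0, mu_true, sigma, n, alpha):
    """Power (1 - beta) of a right-tailed z test when the true mean is mu_true."""
    std_normal = NormalDist()
    standard_error = sigma / math.sqrt(n)
    # H0 is rejected when the sample mean exceeds this cutoff.
    critical_mean = mu_0 + std_normal.inv_cdf(1 - alpha) * standard_error
    # beta = P(sample mean <= cutoff) when the mean really is mu_true.
    beta = std_normal.cdf((critical_mean - mu_true) / standard_error)
    return 1 - beta


# Trace a few points of the power curve for hypothetical true means.
for mu_true in (670, 690, 710, 730, 750):
    power = power_right_tailed(mu_0=670, mu_true=mu_true, sigma=102, n=40, alpha=0.01)
    print(f"true mean = {mu_true}: power = {power:.3f}")
```

When the true mean equals the null value, the power equals the significance level, and it rises toward 1 as the true mean moves further into the alternative; plotting these points gives the shape of a power curve.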

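9.4 A Large-Sample Test of Hypothesis for the Difference Between Two Population Means

In testing whether the difference in sample means $(\bar{x}_1 - \bar{x}_2)$ indicates that the true difference in population means differs from a specified value, $(\mu_1 - \mu_2) = D_0$, you can use the standard error of the difference in sample means:

$SE = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

in the form of a z statistic to measure how many standard deviations the difference $(\bar{x}_1 - \bar{x}_2)$ lies from the hypothesized difference $D_0$.

Large-Sample Statistical Test for $(\mu_1 - \mu_2)$:

1. Null hypothesis: $H_0: (\mu_1 - \mu_2) = D_0$, where $D_0$ is some specified difference that you wish to test. For many tests, you will hypothesize that there is no difference between $\mu_1$ and $\mu_2$; that is, $D_0 = 0$.

2. Alternative hypothesis:

One-Tailed Test: $H_a: (\mu_1 - \mu_2) > D_0$ [or $H_a: (\mu_1 - \mu_2) < D_0$]

Two-Tailed Test: $H_a: (\mu_1 - \mu_2) \neq D_0$

3. Test statistic:

$z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{SE} = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$

If $\sigma_1^2$ and $\sigma_2^2$ are unknown (which is usually the case), substitute the sample variances $s_1^2$ and $s_2^2$ for $\sigma_1^2$ and $\sigma_2^2$, respectively.

4. Rejection region: Reject $H_0$ when

One-Tailed Test: $z > z_\alpha$ [or $z < -z_\alpha$ when the alternative hypothesis is $H_a: (\mu_1 - \mu_2) < D_0$]

Two-Tailed Test: $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$

or when p-value $< \alpha$.

Assumptions: The samples are randomly and independently selected from the two populations and $n_1 \geq 30$ and $n_2 \geq 30$.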

Example 9.9 illustrates a test of the difference in two means.

Example 9.9

A university investigation conducted to determine whether car ownership affects academic achievement was based on two random samples of 100 male students, each drawn from the student body. The grade point average for the $n_1 = 100$ nonowners of cars had an average and variance equal to

$\bar{x}_1 = 2.70$ and $s_1^2 = .36$

as opposed to

$\bar{x}_2 = 2.54$ and $s_2^2 = .40$

for the $n_2 = 100$ car owners. Do the data present sufficient evidence to indicate a difference in the mean achievements between car owners and nonowners of cars? Test using $\alpha = .05$.

Solution

To detect a difference, if it exists, between the mean academic achievements for nonowners of cars $\mu_1$ and car owners $\mu_2$, you will test the null hypothesis that there is no difference between the means against the alternative hypothesis that $(\mu_1 - \mu_2) \neq 0$; that is,

$H_0: (\mu_1 - \mu_2) = D_0 = 0$ versus $H_a: (\mu_1 - \mu_2) \neq 0$

Substituting into the formula for the test statistic, you get

$z \approx \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{2.70 - 2.54}{\sqrt{\frac{.36}{100} + \frac{.40}{100}}} = 1.84$
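The substitution in Example 9.9 can be reproduced numerically. The following is a minimal sketch with a hypothetical function name; it uses the sample variances in place of the unknown population variances, as the procedure allows, and takes the two-tailed critical value from `statistics.NormalDist` instead of a table.

```python
import math
from statistics import NormalDist


def two_sample_z(x_bar_1, var_1, n_1, x_bar_2, var_2, n_2, d_0=0.0):
    """Large-sample z statistic and two-tailed p-value for (mu_1 - mu_2) = d_0."""
    standard_error = math.sqrt(var_1 / n_1 + var_2 / n_2)
    z = ((x_bar_1 - x_bar_2) - d_0) / standard_error
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, standard_error, p_two_tailed


# Figures from Example 9.9: nonowners (sample 1) versus car owners (sample 2).
z, standard_error, p_value = two_sample_z(2.70, 0.36, 100, 2.54, 0.40, 100)

alpha = 0.05
critical = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
reject = abs(z) > critical

print(f"z = {z:.2f}, critical value = {critical:.2f}, p-value = {p_value:.4f}")
print("Reject H0" if reject else "Do not reject H0")
```

With these figures the observed z of about 1.84 does not exceed the two-tailed critical value of about 1.96, so the null hypothesis of no difference is not rejected at the 5% level.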

Hypothesis Testing and Confidence Intervals

- If the confidence interval you construct contains the value of the parameter specified by $H_0$, then that value is one of the likely or possible values of the parameter and $H_0$ should not be rejected.

- If the hypothesized value lies outside of the confidence limits, the null hypothesis is rejected at the $\alpha$ level of significance.

Example 9.10 constructs a 95% confidence interval for the difference in average academic achievements.
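The link between a confidence interval and a two-tailed test can be illustrated with the car-ownership data from Example 9.9. The sketch below is an assumption about how such an interval might be computed, using the usual large-sample form (difference in sample means plus or minus a normal critical value times the standard error); the function name is hypothetical.

```python
import math
from statistics import NormalDist


def difference_ci(x_bar_1, var_1, n_1, x_bar_2, var_2, n_2, confidence=0.95):
    """Large-sample confidence interval for (mu_1 - mu_2)."""
    standard_error = math.sqrt(var_1 / n_1 + var_2 / n_2)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    difference = x_bar_1 - x_bar_2
    return difference - z_crit * standard_error, difference + z_crit * standard_error


# Sample figures from Example 9.9.
lower, upper = difference_ci(2.70, 0.36, 100, 2.54, 0.40, 100)

d_0 = 0.0  # hypothesized difference under H0
contains_d_0 = lower <= d_0 <= upper

print(f"95% CI for (mu_1 - mu_2): ({lower:.2f}, {upper:.2f})")
print("Interval contains D_0: do not reject H0" if contains_d_0
      else "D_0 lies outside the interval: reject H0")
```

Because the interval includes 0, a difference of 0 remains a plausible value, which agrees with a two-tailed test at the 5% level not rejecting the null hypothesis.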

It is important to understand the difference between results that are “significant” and results that are “practically” important. In statistical language, the word significant does not necessarily mean “important”, but only that the results are unlikely to have occurred by chance alone.

The unnumbered example on page 364 illustrates a case of statistical versus practical significance.

9.5 A Large-Sample Test of a Hypothesis for a Binomial Proportion

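Large-Sample Statistical Test for p

1. Null hypothesis: $H_0: p = p_0$

2. Alternative hypothesis:

One-Tailed Test: $H_a: p > p_0$ (or $H_a: p < p_0$)

Two-Tailed Test: $H_a: p \neq p_0$

3. Test statistic:

$z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0 q_0}{n}}}$ with $\hat{p} = \frac{x}{n}$ and $q_0 = 1 - p_0$

where x is the number of successes in n binomial trials.

4. Rejection region: Reject $H_0$ when

One-Tailed Test: $z > z_\alpha$ (or $z < -z_\alpha$ when the alternative hypothesis is $H_a: p < p_0$)

Two-Tailed Test: $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$

or when p-value $< \alpha$.

Assumption: The sampling satisfies the assumptions of a binomial experiment and n is large enough so that the sampling distribution of $\hat{p}$ can be approximated by a normal distribution ($np_0 > 5$ and $nq_0 > 5$).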

Example 9.11 shows a large sample test of hypothesis for a binomial proportion.

Example 9.11

Regardless of age, about 20% of American adults participate in fitness activities at least twice a week. However, these fitness activities change as the people get older, and occasional participants become nonparticipants as they age. In a local survey of $n = 100$ adults over 40 years old, a total of 15 people indicated that they participated in a fitness activity at least twice a week. Do these data indicate that the participation rate for adults over 40 years of age is significantly less than the 20% figure? Calculate the p-value and use it to draw the appropriate conclusions.

Solution

It is assumed that the sampling procedure satisfies the requirements of a binomial experiment. You can answer the question posed by testing the hypothesis

$H_0: p = .2$ versus $H_a: p < .2$

A one-tailed test is used because you wish to detect whether the value of p is less than .2.

The point estimator of p is $\hat{p} = x/n$, and the test statistic is

$z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0 q_0}{n}}}$

When $H_0$ is true, the value of p is $p_0 = .2$, and the sampling distribution of $\hat{p}$ has a mean equal to $p_0$ and a standard deviation of $\sqrt{p_0 q_0 / n}$. Hence, $\sqrt{\hat{p}\hat{q}/n}$ is not used to estimate the standard error of $\hat{p}$ in this case because the test statistic is calculated under the assumption that $H_0$ is true. (When you estimate the value of p using the estimator $\hat{p}$, the standard error of $\hat{p}$ is not known and is estimated by $\sqrt{\hat{p}\hat{q}/n}$.)

The value of the test statistic is

$z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0 q_0}{n}}} = \frac{.15 - .20}{\sqrt{\frac{(.20)(.80)}{100}}} = -1.25$

The p-value associated with this test is found as the area under the standard normal curve to the left of $z = -1.25$ as shown in Figure 9.10. Therefore,

$p\text{-value} = P(z < -1.25) = .5 - .3944 = .1056$

Figure 9.10 p-value for Example 9.11
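The calculation in Example 9.11 can be verified with a short script. This is a sketch under the example's own figures (15 successes in 100 trials, null value .20); the function name is hypothetical, and the standard error is computed from the null value rather than from the sample proportion, as the solution explains.

```python
import math
from statistics import NormalDist


def proportion_z_test_left(x, n, p_0):
    """Left-tailed large-sample test of H0: p = p_0 against Ha: p < p_0."""
    p_hat = x / n
    # The standard error uses p_0, because z is computed assuming H0 is true.
    standard_error = math.sqrt(p_0 * (1 - p_0) / n)
    z = (p_hat - p_0) / standard_error
    p_value = NormalDist().cdf(z)  # area to the left of the observed z
    return p_hat, z, p_value


x, n, p_0 = 15, 100, 0.20

# Large-sample check: n * p_0 and n * q_0 should both exceed 5.
assert n * p_0 > 5 and n * (1 - p_0) > 5

p_hat, z, p_value = proportion_z_test_left(x, n, p_0)
print(f"p_hat = {p_hat:.2f}, z = {z:.2f}, p-value = {p_value:.4f}")
```

The p-value of about .1056 is greater than .10, so on the sliding scale described earlier the result would not be classed as statistically significant.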

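9.6 A Large-Sample Test of Hypothesis for the Difference Between Two Binomial Proportions

Large-Sample Statistical Test for $(p_1 - p_2)$:

1. Null hypothesis: $H_0: (p_1 - p_2) = 0$ or equivalently $H_0: p_1 = p_2$

2. Alternative hypothesis:

One-Tailed Test: $H_a: (p_1 - p_2) > 0$ [or $H_a: (p_1 - p_2) < 0$]

Two-Tailed Test: $H_a: (p_1 - p_2) \neq 0$

3. Test statistic:

$z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{SE} = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}}$

where $\hat{p}_1 = x_1/n_1$ and $\hat{p}_2 = x_2/n_2$.

Since the common value of $p_1 = p_2 = p$ (used in the standard error) is unknown, it is estimated by

$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$

and the test statistic is

$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\frac{\hat{p}\hat{q}}{n_1} + \frac{\hat{p}\hat{q}}{n_2}}}$ or $z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$

4. Rejection region: Reject $H_0$ when

One-Tailed Test: $z > z_\alpha$ [or $z < -z_\alpha$ when the alternative hypothesis is $H_a: (p_1 - p_2) < 0$]

Two-Tailed Test: $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$

or when p-value $< \alpha$.

Assumptions: Samples are selected in a random and independent manner from two binomial populations, and $n_1$ and $n_2$ are large enough so that the sampling distribution of $(\hat{p}_1 - \hat{p}_2)$ can be approximated by a normal distribution. That is, $n_1\hat{p}_1$, $n_1\hat{q}_1$, $n_2\hat{p}_2$, and $n_2\hat{q}_2$ should all be greater than 5.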

Example 9.12 illustrates a large-sample statistical test for the difference in two populations and Figure 9.11 shows the location of the rejection region in this example.

Figure 9.11

In some situations, you may need to test for a difference $D_0$ (other than 0) between two binomial proportions. If this is the case, the test statistic is modified for testing $H_0: (p_1 - p_2) = D_0$, and a pooled estimate for a common p is no longer used in the standard error. The modified test statistic is

$z = \frac{(\hat{p}_1 - \hat{p}_2) - D_0}{\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}}$

Although this test statistic is not used often, the procedure is no different from other large-sample tests you have already mastered!
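Both versions of the two-proportion statistic, pooled when the hypothesized difference is 0 and unpooled otherwise, can be sketched in one function. The function name is hypothetical, and the counts used below (60 of 200 and 45 of 200) are made-up illustration values, not data from Example 9.12.

```python
import math


def two_proportion_z(x_1, n_1, x_2, n_2, d_0=0.0):
    """z statistic for H0: (p_1 - p_2) = d_0 with large samples.

    Uses the pooled estimate of p in the standard error when d_0 is 0,
    and the separate sample proportions when d_0 is nonzero.
    """
    p_hat_1 = x_1 / n_1
    p_hat_2 = x_2 / n_2
    if d_0 == 0.0:
        p_pooled = (x_1 + x_2) / (n_1 + n_2)
        standard_error = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n_1 + 1 / n_2))
    else:
        standard_error = math.sqrt(
            p_hat_1 * (1 - p_hat_1) / n_1 + p_hat_2 * (1 - p_hat_2) / n_2
        )
    return ((p_hat_1 - p_hat_2) - d_0) / standard_error


# Hypothetical counts, for illustration only.
z_pooled = two_proportion_z(60, 200, 45, 200)              # tests a difference of 0
z_unpooled = two_proportion_z(60, 200, 45, 200, d_0=0.05)  # tests a difference of .05

print(f"z (D_0 = 0, pooled SE)     = {z_pooled:.3f}")
print(f"z (D_0 = .05, unpooled SE) = {z_unpooled:.3f}")
```

Either statistic is then compared with the same normal critical values, or converted to a p-value, exactly as in the other large-sample tests.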

9.7 Some Comments on Testing Hypotheses

If the p-value is greater than .05, the results are reported as NS (not significant at the 5% level).

If the p-value lies between .01 and .05, the results are reported as P < .05 (significant at the 5% level).

If the p-value lies between .001 and .01, the results are reported as P < .01 (“highly significant” or significant at the 1% level).

If the p-value is less than .001, the results are reported as P < .001 (“very highly significant” or significant at the .1% level).

Key Concepts and Formulas

I. Parts of a Statistical Test

1. Null hypothesis: a contradiction of the alternative hypothesis

2. Alternative hypothesis: the hypothesis the researcher wants to support.

3. Test statistic and its p-value: sample evidence calculated from sample data.

4. Rejection region—critical values and significance levels: values that separate rejection and nonrejection of the null hypothesis

5. Conclusion: Reject or do not reject the null hypothesis, stating the practical significance of your conclusion.

II. Errors and Statistical Significance

1. The significance level $\alpha$ is the probability of rejecting $H_0$ when it is in fact true.

2. The p-value is the probability of observing a test statistic as extreme as or more extreme than the one observed; also, the smallest value of $\alpha$ for which $H_0$ can be rejected.

3. When the p-value is less than the significance level $\alpha$, the null hypothesis is rejected. This happens when the test statistic exceeds the critical value.

4. In a Type II error, $\beta$ is the probability of accepting $H_0$ when it is in fact false. The power of the test is $(1 - \beta)$, the probability of rejecting $H_0$ when it is false.

III. Large-Sample Test Statistics Using the z Distribution

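To test one of the four population parameters when the sample sizes are large, use the following test statistics:

Parameter $\mu$: $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$

Parameter $p$: $z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0 q_0}{n}}}$

Parameter $(\mu_1 - \mu_2)$: $z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$

Parameter $(p_1 - p_2)$: $z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$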
