Post on 12-Jan-2016
Chapter 9 Large-Sample Tests of Hypotheses
General Objectives:
In this chapter, the concept of a statistical test of a hypothesis is formally introduced. The sampling distributions of statistics presented in earlier chapters are used to construct large-sample tests concerning the values of population parameters of interest to the experimenter.
©1998 Brooks/Cole Publishing/ITP
Specific Topics
1. A Statistical test of hypotheses
2. Large-sample test about a population mean
3. Large-sample test about \((\mu_1 - \mu_2)\)
4. Testing a hypothesis about a population proportion p
5. Testing a hypothesis about \((p_1 - p_2)\)
9.1 Testing Hypotheses About Population Parameters
Samples can be used to estimate the mean potency of a population.
Two possibilities:
- The mean potency does not exceed the minimum allowable potency.
- The mean potency exceeds the minimum allowable potency.
This is an example of a statistical test of a hypothesis.
9.2 A Statistical Test of Hypothesis
A statistical test of hypothesis consists of five parts:
1. The null hypothesis, denoted by \(H_0\)
2. The alternative hypothesis, denoted by \(H_a\)
3. The test statistic and its p-value
4. The rejection region
5. The conclusion
Definition: The two competing hypotheses are the alternative hypothesis \(H_a\), generally the hypothesis that the researcher wishes to support, and the null hypothesis \(H_0\), a contradiction of the alternative hypothesis.
The researcher then uses the sample data to decide whether the evidence favors \(H_a\) rather than \(H_0\) and draws one of these two conclusions:
- Reject \(H_0\) and conclude that \(H_a\) is true.
- Accept (do not reject) \(H_0\) as true.
Examples 9.1 and 9.2 show null and alternative hypotheses. A test of hypothesis can be two-tailed or one-tailed; a one-tailed test is either left-tailed or right-tailed.
The test statistic is a single number calculated from sample data. The p-value is a probability calculated using the test statistic. Either or both of these measures act as a decision maker for the researcher in deciding whether to reject or accept \(H_0\). Example 9.3 deals with the z-score and the p-value.
Figures 9.1 and 9.2 show acceptance and rejection regions.
Example 9.3
For the test of hypothesis in Example 9.1, the average hourly wage \(\bar{x}\) for a random sample of 100 California construction workers might provide a good test statistic for testing
\[ H_0: \mu = 14 \quad \text{versus} \quad H_a: \mu \neq 14 \]
If the null hypothesis \(H_0\) is true, then the sample mean should not be too far from the population mean \(\mu = 14\). Suppose that this sample produces a sample mean \(\bar{x} = 15\) with standard deviation \(s = 2\). Is this sample evidence likely or unlikely to occur, if in fact \(H_0\) is true? You can use two measures to find out. Since the sample size is large, the sampling distribution of \(\bar{x}\) is approximately normal with mean \(\mu = 14\) and standard error
\[ \mathrm{SE} = s/\sqrt{n} = 2/\sqrt{100} = .2 \]
The test statistic \(\bar{x} = 15\) lies
\[ z = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{15 - 14}{.2} = 5 \]
standard deviations from the population mean.
The p-value is the probability of observing a test statistic that is five or more standard deviations from the mean. Since z measures the number of standard deviations a normal random variable lies from its mean, you have
\[ p\text{-value} = P(z > 5) + P(z < -5) \approx 0 \]
The large value of the test statistic and the small p-value mean that you have observed a very unlikely event, if indeed \(H_0\) is true and \(\mu = 14\).
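The arithmetic in Example 9.3 can be checked with a short Python sketch. It uses only the numbers stated in the example; the variable names are illustrative, and the standard library's `NormalDist` stands in for the normal table.

```python
from math import sqrt
from statistics import NormalDist

# Values stated in Example 9.3
mu0 = 14.0    # population mean under H0
xbar = 15.0   # observed sample mean
s = 2.0       # sample standard deviation
n = 100       # sample size

se = s / sqrt(n)        # standard error of the sample mean
z = (xbar - mu0) / se   # standard errors between xbar and mu0

# Two-tailed p-value: P(z > |z_obs|) + P(z < -|z_obs|)
std_normal = NormalDist()  # mean 0, standard deviation 1
p_value = 2 * (1 - std_normal.cdf(abs(z)))

print(f"SE = {se:.2f}, z = {z:.2f}, p-value = {p_value:.2e}")
```

The computed p-value is not exactly zero, but it is far smaller than any significance level used in practice, which is what the example means by "approximately 0".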
Definition:
A Type I error for a statistical test is the error of rejecting the null hypothesis when it is true.
The level of significance (significance level) \(\alpha\) for a statistical test of a hypothesis is
\[ \alpha = P(\text{Type I error}) = P(\text{falsely rejecting } H_0) = P(\text{rejecting } H_0 \text{ when it is true}) \]
The value \(\alpha\) represents the maximum tolerable risk of incorrectly rejecting \(H_0\).
9.3 A Large-Sample Test About a Population Mean
\[ H_0: \mu = \mu_0 \]
\[ H_a: \mu > \mu_0 \]
The standard error of \(\bar{x}\) is calculated as
\[ \mathrm{SE} = \sigma/\sqrt{n} \]
The standardized test statistic:
\[ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \]
Figure 9.3 shows a rejection region. Examples 9.4 and 9.5 deal with tests of hypotheses concerning the mean.
Figure 9.3 The rejection region of a right-tailed test with \(\alpha = .01\)
Example 9.4
The average weekly earnings for women in managerial and professional positions are $670. Do men in the same positions have average weekly earnings that are higher than those for women? A random sample of \(n = 40\) men in managerial and professional positions showed \(\bar{x} = \$725\) and \(s = \$102\). Test the appropriate hypothesis using \(\alpha = .01\).
Solution
You would like to show that the average weekly earnings for men are higher than $670, the women’s average. Hence, if \(\mu\) is the average weekly earnings in managerial and professional positions for men, the hypotheses to be tested are
\[ H_0: \mu = 670 \quad \text{versus} \quad H_a: \mu > 670 \]
The rejection region for this one-tailed test consists of large values of \(\bar{x}\) or, equivalently, values of the standardized test statistic z in the right tail of the standard normal distribution, with \(\alpha = .01\). This value is found in Table 3 of Appendix I to be \(z = 2.33\), as shown in Figure 9.3. The observed value of the test statistic, using s as an estimate of the population standard deviation, is
\[ z = \frac{\bar{x} - 670}{s/\sqrt{n}} = \frac{725 - 670}{102/\sqrt{40}} = 3.41 \]
Since the observed value of the test statistic falls in the rejection region, you can reject \(H_0\) and conclude that the average weekly earnings for men in managerial and professional positions are significantly higher than those for women. The probability that you have made an incorrect decision is \(\alpha = .01\).
Figure 9.4 The rejection region for a two-tailed test with \(\alpha = .01\)
The two-tailed hypothesis is written as \(H_a: \mu \neq \mu_0\), which implies either \(\mu > \mu_0\) or \(\mu < \mu_0\).
Large-Sample Statistical Test for \(\mu\):
1. Null hypothesis: \(H_0: \mu = \mu_0\)
2. Alternative hypothesis:
   One-Tailed Test: \(H_a: \mu > \mu_0\) (or \(H_a: \mu < \mu_0\))
   Two-Tailed Test: \(H_a: \mu \neq \mu_0\)
3. Test statistic:
\[ z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \]
If \(\sigma\) is unknown (which is usually the case), substitute the sample standard deviation s for \(\sigma\).
4. Rejection region: Reject \(H_0\) when
   One-Tailed Test: \(z > z_\alpha\) (or \(z < -z_\alpha\) when the alternative hypothesis is \(H_a: \mu < \mu_0\))
   Two-Tailed Test: \(z > z_{\alpha/2}\) or \(z < -z_{\alpha/2}\)
Assumptions: The n observations in the sample are randomly selected from the population and n is large, say, \(n \geq 30\).
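The four-step procedure for a mean can be sketched as one Python function. This is a minimal illustration, not the textbook's own code: the function name `large_sample_mean_test` and its `tail` argument are hypothetical, and critical values come from the standard library's `NormalDist.inv_cdf` rather than from Table 3. The usage lines apply it to the data of Example 9.4.

```python
from math import sqrt
from statistics import NormalDist

def large_sample_mean_test(xbar, s, n, mu0, alpha, tail):
    """Large-sample z test of H0: mu = mu0.

    tail is "right" (Ha: mu > mu0), "left" (Ha: mu < mu0),
    or "two" (Ha: mu != mu0). Returns (z, critical value, reject).
    """
    if n < 30:
        raise ValueError("the large-sample test assumes n >= 30")
    z = (xbar - mu0) / (s / sqrt(n))   # s substitutes for unknown sigma
    std_normal = NormalDist()
    if tail == "right":
        crit = std_normal.inv_cdf(1 - alpha)      # z_alpha
        reject = z > crit
    elif tail == "left":
        crit = std_normal.inv_cdf(alpha)          # -z_alpha
        reject = z < crit
    elif tail == "two":
        crit = std_normal.inv_cdf(1 - alpha / 2)  # z_{alpha/2}
        reject = abs(z) > crit
    else:
        raise ValueError("tail must be 'right', 'left', or 'two'")
    return z, crit, reject

# Example 9.4: n = 40, xbar = 725, s = 102, H0: mu = 670, alpha = .01
z, crit, reject = large_sample_mean_test(725, 102, 40, 670, 0.01, "right")
print(f"z = {z:.2f}, critical value = {crit:.2f}, reject H0: {reject}")
```

With the Example 9.4 inputs the statistic rounds to 3.41 and the critical value to 2.33, so the function reports that \(H_0\) is rejected, in agreement with the worked solution.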
The unnumbered figures on page 344 show one- and two-tailed rejection regions:
Calculating the p-Value
To avoid any ambiguity in their conclusions, some experimenters prefer to use a variable level of significance called the p-value for the test.
Definition: The p-value or observed significance level of a statistical test is the smallest value of \(\alpha\) for which \(H_0\) can be rejected. It is the actual risk of committing a Type I error, if \(H_0\) is rejected based on the observed value of the test statistic. The p-value measures the strength of the evidence against \(H_0\).
The p-value of the test is actually the area to the right of the calculated value of the test statistic (if the critical value is in the right tail).
Figure 9.5 illustrates variable rejection regions.
Figure 9.5 Variable rejection regions
Definition: If the p-value is less than a preassigned significance level \(\alpha\), then the null hypothesis can be rejected, and you can report that the results are statistically significant at level \(\alpha\).
Example 9.6 shows the calculation of the p-value for a two-tailed test.
Example 9.6
Calculate the p-value for the two-tailed test of hypothesis in Example 9.5. Use the p-value to draw conclusions regarding the statistical test.
Solution
The rejection region for this two-tailed test of hypothesis is found in both tails of the normal probability distribution. Since the observed value of the test statistic is \(z = 3.03\), the smallest rejection region that you can use and still reject \(H_0\) is \(|z| > 3.03\). For this rejection region, the value of \(\alpha\) is the p-value:
\[ p\text{-value} = P(z > 3.03) + P(z < -3.03) = 2(.5 - .4988) = 2(.0012) = .0024 \]
Notice that the two-tailed p-value is actually twice the tail area corresponding to the calculated value of the test statistic. If this p-value = .0024 is less than the preassigned level of significance \(\alpha\), \(H_0\) can be rejected. For this test, you can reject \(H_0\) at either the 1% or the 5% level of significance.
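The idea that the p-value is the smallest \(\alpha\) at which \(H_0\) can be rejected can be illustrated numerically. The sketch takes only \(z = 3.03\) from Example 9.6; the helper `rejects` and the extra level \(\alpha = .001\) are added for illustration. It compares the rejection-region decision with the rule "reject when p-value < \(\alpha\)".

```python
from statistics import NormalDist

std_normal = NormalDist()
z_obs = 3.03  # observed test statistic from Example 9.6

# Two-tailed p-value: twice the tail area beyond |z_obs|
p_value = 2 * (1 - std_normal.cdf(abs(z_obs)))

def rejects(alpha):
    """Two-tailed rejection-region decision at significance level alpha."""
    crit = std_normal.inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    return abs(z_obs) > crit

print(f"p-value = {p_value:.4f}")
for alpha in (0.05, 0.01, 0.001):
    print(f"alpha = {alpha}: region rejects = {rejects(alpha)}, "
          f"p-value < alpha = {p_value < alpha}")
```

The two decision rules agree at every level tried: \(H_0\) is rejected at the 5% and 1% levels but not at the .1% level, because the p-value lies between .001 and .01.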
Many researchers use a “sliding scale” to classify their results:
- If the p-value is less than .01, \(H_0\) is rejected. The results are highly significant.
- If the p-value is between .01 and .05, \(H_0\) is rejected. The results are statistically significant.
- If the p-value is between .05 and .10, \(H_0\) is usually not rejected. The results are only tending toward statistical significance.
- If the p-value is greater than .10, \(H_0\) is not rejected. The results are not statistically significant.
Example 9.7 conducts a test of hypothesis concerning the mean.
The p-value approach does have two advantages:
- Statistical output from packages such as Minitab usually reports the p-value of the test.
- Based on the p-value, your test results can be evaluated using any significance level you wish to use.
The smaller the p-value, the more unlikely it is that \(H_0\) is true! Table 9.1 illustrates a decision table.
Table 9.1 Decision table
                    Null Hypothesis
Decision            True                False
Reject \(H_0\)      Type I error        Correct decision
Accept \(H_0\)      Correct decision    Type II error
Definition: A Type I error for a statistical test is the error of rejecting the null hypothesis when it is true. The probability of making a Type I error is denoted by the symbol \(\alpha\).
A Type II error for a statistical test is the error of accepting (not rejecting) the null hypothesis when it is false and some alternative hypothesis is true. The probability of making a Type II error is denoted by the symbol \(\beta\).
Notice that the probability of a Type I error is exactly the same as the level of significance \(\alpha\) and is therefore controlled by the researcher.
Keep in mind that “accepting” a particular hypothesis means deciding in its favor.
There is always a risk of being wrong, measured by \(\alpha\) and \(\beta\).
Definition: The power of a statistical test, given as
\[ 1 - \beta = P(\text{reject } H_0 \text{ when } H_a \text{ is true}) \]
measures the ability of the test to perform as required.
A graph of \((1 - \beta)\), the probability of rejecting \(H_0\) when in fact \(H_0\) is false, as a function of the true value of the parameter of interest is called the power curve for the statistical test.
Ideally, you would like \(\alpha\) to be small and the power \((1 - \beta)\) to be large.
Example 9.8 shows the calculation of \(\beta\) and the power of the test \((1 - \beta)\).
Figure 9.7 Calculating \(\beta\) in Example 9.8
Figure 9.8 Power curve for Example 9.8
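A power calculation of this kind can be sketched in Python. The data of Example 9.8 are not reproduced in this transcript, so the inputs here are hypothetical: a right-tailed test of \(H_0: \mu = 670\) with \(\sigma\) taken to be 102, \(n = 40\), and \(\alpha = .01\), borrowed from the setting of Example 9.4 purely for illustration. The function name `power_right_tailed` is also an assumption.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()

def power_right_tailed(mu_true, mu0, sigma, n, alpha):
    """Power (1 - beta) of a right-tailed large-sample z test of H0: mu = mu0."""
    se = sigma / sqrt(n)
    # H0 is rejected when the sample mean exceeds this cutoff
    cutoff = mu0 + std_normal.inv_cdf(1 - alpha) * se
    # beta = P(sample mean <= cutoff) when the true mean is mu_true
    beta = std_normal.cdf((cutoff - mu_true) / se)
    return 1 - beta

# Hypothetical inputs, NOT the data of Example 9.8
mu0, sigma, n, alpha = 670.0, 102.0, 40, 0.01
for mu_true in (670, 700, 725, 750):
    power = power_right_tailed(mu_true, mu0, sigma, n, alpha)
    print(f"true mean {mu_true}: power = {power:.3f}")
```

Evaluating the function over a grid of true means traces out a power curve: the power equals \(\alpha\) when the true mean equals \(\mu_0\) and rises toward 1 as the true mean moves further into the alternative.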
9.4 A Large-Sample Test of Hypothesis for the Difference Between Two Population Means
In testing whether the difference in sample means \((\bar{x}_1 - \bar{x}_2)\) indicates that the true difference in population means differs from a specified value, \((\mu_1 - \mu_2) = D_0\), you can use the standard error of the difference in sample means:
\[ \mathrm{SE} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \]
in the form of a z statistic to measure how many standard deviations the difference \((\bar{x}_1 - \bar{x}_2)\) lies from the hypothesized difference \(D_0\).
Large-Sample Statistical Test for \((\mu_1 - \mu_2)\):
1. Null hypothesis: \(H_0: (\mu_1 - \mu_2) = D_0\), where \(D_0\) is some specified difference that you wish to test. For many tests, you will hypothesize that there is no difference between \(\mu_1\) and \(\mu_2\); that is, \(D_0 = 0\).
2. Alternative hypothesis:
   One-Tailed Test: \(H_a: (\mu_1 - \mu_2) > D_0\) [or \(H_a: (\mu_1 - \mu_2) < D_0\)]
   Two-Tailed Test: \(H_a: (\mu_1 - \mu_2) \neq D_0\)
3. Test statistic:
\[ z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\mathrm{SE}} = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \]
If \(\sigma_1^2\) and \(\sigma_2^2\) are unknown (which is usually the case), substitute the sample variances \(s_1^2\) and \(s_2^2\) for \(\sigma_1^2\) and \(\sigma_2^2\), respectively.
4. Rejection region: Reject \(H_0\) when
   One-Tailed Test: \(z > z_\alpha\) [or \(z < -z_\alpha\) when the alternative hypothesis is \(H_a: (\mu_1 - \mu_2) < D_0\)]
   Two-Tailed Test: \(z > z_{\alpha/2}\) or \(z < -z_{\alpha/2}\)
or when p-value \(< \alpha\).
Assumptions: The samples are randomly and independently selected from the two populations and \(n_1 \geq 30\) and \(n_2 \geq 30\).
Example 9.9 illustrates a test of the difference in two means.
Example 9.9
A university investigation conducted to determine whether car ownership affects academic achievement was based on two random samples of 100 male students, each drawn from the student body. The grade point average for the \(n_1 = 100\) nonowners of cars had an average and variance equal to
\[ \bar{x}_1 = 2.70 \quad \text{and} \quad s_1^2 = .36 \]
as opposed to
\[ \bar{x}_2 = 2.54 \quad \text{and} \quad s_2^2 = .40 \]
for the \(n_2 = 100\) car owners. Do the data present sufficient evidence to indicate a difference in the mean achievements between car owners and nonowners of cars? Test using \(\alpha = .05\).
Solution
To detect a difference, if it exists, between the mean academic achievements for nonowners of cars \(\mu_1\) and car owners \(\mu_2\), you will test the null hypothesis that there is no difference between the means against the alternative hypothesis that \((\mu_1 - \mu_2) \neq 0\); that is,
\[ H_0: (\mu_1 - \mu_2) = D_0 = 0 \quad \text{versus} \quad H_a: (\mu_1 - \mu_2) \neq 0 \]
Substituting into the formula for the test statistic, you get
\[ z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{2.70 - 2.54}{\sqrt{\dfrac{.36}{100} + \dfrac{.40}{100}}} = 1.84 \]
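The substitution in Example 9.9 can be reproduced with a short Python sketch. The function name `two_sample_z` is illustrative, and the critical value and p-value come from the standard library's `NormalDist` rather than from the normal table.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(xbar1, var1, n1, xbar2, var2, n2, d0=0.0):
    """Large-sample z statistic for H0: (mu1 - mu2) = d0,
    with sample variances standing in for the population variances."""
    se = sqrt(var1 / n1 + var2 / n2)
    return ((xbar1 - xbar2) - d0) / se

# Example 9.9: nonowners (sample 1) versus car owners (sample 2)
z = two_sample_z(2.70, 0.36, 100, 2.54, 0.40, 100)

alpha = 0.05
std_normal = NormalDist()
crit = std_normal.inv_cdf(1 - alpha / 2)       # z_{alpha/2} for a two-tailed test
p_value = 2 * (1 - std_normal.cdf(abs(z)))
reject = abs(z) > crit

print(f"z = {z:.2f}, critical value = {crit:.2f}, "
      f"p-value = {p_value:.4f}, reject H0: {reject}")
```

Because the observed statistic of about 1.84 does not exceed the two-tailed critical value of about 1.96, the sketch reports that \(H_0\) is not rejected at \(\alpha = .05\).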
Hypothesis Testing and Confidence Intervals
- If the confidence interval you construct contains the value of the parameter specified by \(H_0\), then that value is one of the likely or possible values of the parameter and \(H_0\) should not be rejected.
- If the hypothesized value lies outside of the confidence limits, the null hypothesis is rejected at the \(\alpha\) level of significance.
Example 9.10 constructs a 95% confidence interval for the difference in average academic achievements.
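The link between a two-tailed test and a confidence interval can be illustrated with the car-ownership data of Example 9.9. This is a sketch of the standard large-sample interval \((\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\,\mathrm{SE}\), not a reproduction of the textbook's Example 9.10; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Data stated in Example 9.9
xbar1, var1, n1 = 2.70, 0.36, 100   # nonowners of cars
xbar2, var2, n2 = 2.54, 0.40, 100   # car owners
d0 = 0.0                            # difference specified by H0
confidence = 0.95

se = sqrt(var1 / n1 + var2 / n2)
z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{.025}

diff = xbar1 - xbar2
lower = diff - z_crit * se
upper = diff + z_crit * se
contains_d0 = lower <= d0 <= upper

print(f"95% CI for (mu1 - mu2): ({lower:.2f}, {upper:.2f})")
print(f"interval contains D0 = {d0}: {contains_d0}")
```

The interval runs from slightly below 0 to about .33, so it contains the hypothesized difference of 0. This matches the two-tailed test at \(\alpha = .05\), which does not reject \(H_0\).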
It is important to understand the difference between results that are “significant” and results that are “practically” important. In statistical language, the word significant does not necessarily mean “important,” but only that the results are unlikely to have occurred by chance alone.
The unnumbered example on page 364 illustrates a case of statistical versus practical significance.
9.5 A Large-Sample Test of a Hypothesis for a Binomial Proportion
Large-Sample Statistical Test for p
1. Null hypothesis: \(H_0: p = p_0\)
2. Alternative hypothesis:
   One-Tailed Test: \(H_a: p > p_0\) (or \(H_a: p < p_0\))
   Two-Tailed Test: \(H_a: p \neq p_0\)
3. Test statistic:
\[ z = \frac{\hat{p} - p_0}{\mathrm{SE}} = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 q_0}{n}}} \quad \text{with} \quad \hat{p} = \frac{x}{n} \]
where x is the number of successes in n binomial trials and \(q_0 = 1 - p_0\).
4. Rejection region: Reject \(H_0\) when
   One-Tailed Test: \(z > z_\alpha\) (or \(z < -z_\alpha\) when the alternative hypothesis is \(H_a: p < p_0\))
   Two-Tailed Test: \(z > z_{\alpha/2}\) or \(z < -z_{\alpha/2}\)
or when p-value \(< \alpha\).
Assumption: The sampling satisfies the assumptions of a binomial experiment and n is large enough so that the sampling distribution of \(\hat{p}\) can be approximated by a normal distribution (\(np_0 > 5\) and \(nq_0 > 5\)).
Example 9.11 shows a large sample test of hypothesis for a binomial proportion.
Example 9.11
Regardless of age, about 20% of American adults participate in fitness activities at least twice a week. However, these fitness activities change as the people get older, and occasional participants become nonparticipants as they age. In a local survey of \(n = 100\) adults over 40 years old, a total of 15 people indicated that they participated in a fitness activity at least twice a week. Do these data indicate that the participation rate for adults over 40 years of age is significantly less than the 20% figure? Calculate the p-value and use it to draw the appropriate conclusions.
Solution
It is assumed that the sampling procedure satisfies the requirements of a binomial experiment. You can answer the question posed by testing the hypothesis
\[ H_0: p = .2 \quad \text{versus} \quad H_a: p < .2 \]
A one-tailed test is used because you wish to detect whether the value of p is less than .2.
The point estimator of p is \(\hat{p} = x/n\), and the test statistic is
\[ z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 q_0}{n}}} \]
When \(H_0\) is true, the value of p is \(p_0 = .2\), and the sampling distribution of \(\hat{p}\) has a mean equal to \(p_0\) and a standard deviation of \(\sqrt{p_0 q_0/n}\). Hence, \(\sqrt{\hat{p}\hat{q}/n}\) is not used to estimate the standard error of \(\hat{p}\) in this case because the test statistic is calculated under the assumption that \(H_0\) is true. (When you estimate the value of p using the estimator \(\hat{p}\), the standard error of \(\hat{p}\) is not known and is estimated by \(\sqrt{\hat{p}\hat{q}/n}\).)
The value of the test statistic is
\[ z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 q_0}{n}}} = \frac{.15 - .20}{\sqrt{\dfrac{(.20)(.80)}{100}}} = -1.25 \]
The p-value associated with this test is found as the area under the standard normal curve to the left of \(z = -1.25\) as shown in Figure 9.10. Therefore,
\[ p\text{-value} = P(z < -1.25) = (.5 - .3944) = .1056 \]
Figure 9.10 p-value for Example 9.11
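The calculation in Example 9.11 can be sketched in Python as follows. The function name `one_proportion_z` is illustrative; the standard error is computed from \(p_0\), as the solution explains, and the sample-size check mirrors the stated assumption that \(np_0\) and \(nq_0\) both exceed 5.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z(x, n, p0):
    """Large-sample z statistic for H0: p = p0, using p0 in the standard error."""
    q0 = 1 - p0
    if n * p0 <= 5 or n * q0 <= 5:
        raise ValueError("normal approximation assumes n*p0 > 5 and n*q0 > 5")
    p_hat = x / n                 # point estimator of p
    se = sqrt(p0 * q0 / n)        # standard error computed under H0
    return (p_hat - p0) / se

# Example 9.11: 15 of n = 100 adults over 40, H0: p = .2 versus Ha: p < .2
z = one_proportion_z(15, 100, 0.2)
p_value = NormalDist().cdf(z)     # left-tail area for a left-tailed test

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

The computed p-value is a little above .10. On the sliding scale given earlier in the chapter, a p-value greater than .10 means \(H_0\) is not rejected and the results are not statistically significant.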
9.6 A Large-Sample Test of Hypothesis for the Difference Between Two Binomial Proportions
Large-Sample Statistical Test for \((p_1 - p_2)\):
1. Null hypothesis: \(H_0: (p_1 - p_2) = 0\) or equivalently \(H_0: p_1 = p_2\)
2. Alternative hypothesis:
   One-Tailed Test: \(H_a: (p_1 - p_2) > 0\) [or \(H_a: (p_1 - p_2) < 0\)]
   Two-Tailed Test: \(H_a: (p_1 - p_2) \neq 0\)
3. Test statistic:
\[ z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\mathrm{SE}} = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}} = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{pq}{n_1} + \dfrac{pq}{n_2}}} \]
where \(\hat{p}_1 = x_1/n_1\) and \(\hat{p}_2 = x_2/n_2\). Since the common value of \(p_1 = p_2 = p\) (used in the standard error) is unknown, it is estimated by
\[ \hat{p} = \frac{x_1 + x_2}{n_1 + n_2} \]
and the test statistic is
\[ z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\dfrac{\hat{p}\hat{q}}{n_1} + \dfrac{\hat{p}\hat{q}}{n_2}}} \quad \text{or} \quad z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\hat{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \]
4. Rejection region: Reject \(H_0\) when
   One-Tailed Test: \(z > z_\alpha\) [or \(z < -z_\alpha\) when the alternative hypothesis is \(H_a: (p_1 - p_2) < 0\)]
   Two-Tailed Test: \(z > z_{\alpha/2}\) or \(z < -z_{\alpha/2}\)
or when p-value \(< \alpha\).
Assumptions: Samples are selected in a random and independent manner from two binomial populations, and \(n_1\) and \(n_2\) are large enough so that the sampling distribution of \((\hat{p}_1 - \hat{p}_2)\) can be approximated by a normal distribution. That is, \(n_1\hat{p}_1\), \(n_1\hat{q}_1\), \(n_2\hat{p}_2\), and \(n_2\hat{q}_2\) should all be greater than 5.
Example 9.12 illustrates a large-sample statistical test for the difference in two populations and Figure 9.11 shows the location of the rejection region in this example.
Figure 9.11
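A pooled test for the difference between two binomial proportions might be sketched as follows. The data of Example 9.12 are not reproduced in this transcript, so the counts used here are hypothetical, and the function name `pooled_two_proportion_z` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def pooled_two_proportion_z(x1, n1, x2, n2):
    """Large-sample z statistic for H0: p1 = p2 using the pooled estimate of p."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # All four expected counts should exceed 5 for the normal approximation
    counts = (n1 * p1_hat, n1 * (1 - p1_hat), n2 * p2_hat, n2 * (1 - p2_hat))
    if min(counts) <= 5:
        raise ValueError("sample sizes too small for the normal approximation")
    p_hat = (x1 + x2) / (n1 + n2)   # pooled estimate of the common p
    q_hat = 1 - p_hat
    se = sqrt(p_hat * q_hat * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se

# Hypothetical counts, NOT the data of Example 9.12
z = pooled_two_proportion_z(60, 200, 40, 200)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-tailed p-value

print(f"z = {z:.2f}, two-tailed p-value = {p_value:.4f}")
```

Pooling is appropriate here only because \(H_0\) asserts that the two proportions are equal, so both samples estimate the same p.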
In some situations, you may need to test for a difference \(D_0\) (other than 0) between two binomial proportions. If this is the case, the test statistic is modified for testing \(H_0: (p_1 - p_2) = D_0\), and a pooled estimate for a common p is no longer used in the standard error. The modified test statistic is
\[ z = \frac{(\hat{p}_1 - \hat{p}_2) - D_0}{\sqrt{\dfrac{\hat{p}_1\hat{q}_1}{n_1} + \dfrac{\hat{p}_2\hat{q}_2}{n_2}}} \]
Although this test statistic is not used often, the procedure is no different from other large-sample tests you have already mastered!
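The modified statistic for a nonzero \(D_0\) can be sketched in a few lines of Python. The counts and the value \(D_0 = .05\) are hypothetical, chosen only to show the unpooled standard error at work, and the function name `two_proportion_z_d0` is illustrative.

```python
from math import sqrt

def two_proportion_z_d0(x1, n1, x2, n2, d0):
    """z statistic for H0: (p1 - p2) = d0 with d0 != 0.
    Each proportion contributes its own variance term; no pooling."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    return ((p1_hat - p2_hat) - d0) / se

# Hypothetical data: 60 of 200 versus 40 of 200, testing D0 = .05
z = two_proportion_z_d0(60, 200, 40, 200, d0=0.05)
print(f"z = {z:.2f}")
```

The resulting z is compared with \(z_\alpha\) or \(z_{\alpha/2}\) exactly as in the other large-sample tests.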
9.7 Some Comments on Testing Hypotheses
If the p-value is greater than .05, the results are reported as NS: not significant at the 5% level.
If the p-value lies between .05 and .01, the results are reported as P < .05: significant at the 5% level.
If the p-value lies between .01 and .001, the results are reported as P < .01: “highly significant” or significant at the 1% level.
If the p-value is less than .001, the results are reported as P < .001: “very highly significant” or significant at the .1% level.
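These reporting conventions amount to a simple lookup, sketched here in Python. The function name `report_label` is hypothetical, and the treatment of p-values that fall exactly on a boundary (.05, .01, or .001) is an assumption, since the text does not say which side they belong to.

```python
def report_label(p_value):
    """Map a p-value to the reporting label used in the research literature.

    Assumption: a p-value exactly equal to a boundary is placed in the
    more significant category; the source text leaves this unspecified.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value > 0.05:
        return "NS"
    if p_value > 0.01:
        return "P < .05"
    if p_value > 0.001:
        return "P < .01"
    return "P < .001"

# p-values from Examples 9.11 and 9.6, plus two illustrative values
for p in (0.1056, 0.0024, 0.03, 0.0005):
    print(f"p-value = {p}: {report_label(p)}")
```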
Key Concepts and Formulas
I. Parts of a Statistical Test
1. Null hypothesis: a contradiction of the alternative hypothesis
2. Alternative hypothesis: the hypothesis the researcher wants to support.
3. Test statistic and its p-value: sample evidence calculated from sample data.
4. Rejection region—critical values and significance levels: values that separate rejection and nonrejection of the null hypothesis
5. Conclusion: Reject or do not reject the null hypothesis, stating the practical significance of your conclusion.
II. Errors and Statistical Significance
1. The significance level \(\alpha\) is the probability of rejecting \(H_0\) when it is in fact true.
2. The p-value is the probability, when \(H_0\) is true, of observing a test statistic as extreme as or more extreme than the one observed; also, the smallest value of \(\alpha\) for which \(H_0\) can be rejected.
3. When the p-value is less than the significance level \(\alpha\), the null hypothesis is rejected. This happens when the test statistic exceeds the critical value.
4. In a Type II error, \(\beta\) is the probability of accepting \(H_0\) when it is in fact false. The power of the test is \((1 - \beta)\), the probability of rejecting \(H_0\) when it is false.
III. Large-Sample Test Statistics Using the z Distribution
To test one of the four population parameters when the sample sizes are large, use the following test statistics: