Introduction to Hypothesis Testing
Chapter 11
11.1 Introduction
• The purpose of hypothesis testing is to determine whether there is enough statistical evidence in favor of a certain belief about a parameter.
• Examples
– Is there statistical evidence, in a random sample of potential customers, to support the hypothesis that more than 10% of potential customers will purchase a new product?
– Is a new drug effective in curing a certain disease? A sample of patients is randomly selected. Half of them are given the drug while the other half are given a placebo. The improvement in the patients' conditions is then measured and compared.
11.2 Concepts of Hypothesis Testing
• The critical concepts of hypothesis testing.
– Example:
• An operations manager needs to determine whether the mean demand during lead time is greater than 350.
• If so, changes in the ordering policy are needed.
– There are two hypotheses about a population mean:
• H0: The null hypothesis, μ = 350
• H1: The alternative hypothesis, μ > 350 (this is what you want to prove)
• Assume the null hypothesis is true (μ = 350).
– Sample from the demand population, and build a statistic related to the parameter hypothesized (the sample mean).
– Pose the question: How probable is it to obtain a sample mean at least as extreme as the one observed from the sample, if H0 is correct?
– If the sample mean is x̄ = 355, it is only slightly above 350. In this case the mean μ is not likely to be greater than 350. Do not reject the null hypothesis.
– If the sample mean is x̄ = 450, it is much larger than 350, so the mean μ is likely to be greater than 350. Reject the null hypothesis.
Types of Errors
• Two types of errors may occur when deciding whether to reject H0 based on the statistic value.
– Type I error: Reject H0 when it is true.
– Type II error: Do not reject H0 when it is false.
• Example continued
– Type I error: Reject H0 (μ = 350) in favor of H1 (μ > 350) when the real value of μ is 350.
– Type II error: Believe that H0 is correct (μ = 350) when the real value of μ is greater than 350.
Controlling the probability of committing a Type I error
• Recall:
– H0: μ = 350 and H1: μ > 350.
– H0 is rejected if x̄ is sufficiently large.
• Thus, a Type I error is made if x̄ > critical value when μ = 350.
• By properly selecting the critical value we can limit the probability of committing a Type I error to an acceptable level.
11.3 Testing the Population Mean When the Population Standard Deviation is Known
• Example 11.1
– A new billing system for a department store will be cost-effective only if the mean monthly account is more than $170.
– A sample of 400 accounts has a mean of $178.
– If accounts are approximately normally distributed with σ = $65, can we conclude that the new system will be cost-effective?
Testing the Population Mean (σ is Known)
• Example 11.1 – Solution
– The population of interest is the credit accounts at the store.
– We want to know whether the mean account for all customers is greater than $170.
H1: μ > 170
– The null hypothesis must specify a single value of the parameter μ.
H0: μ = 170
Approaches to Testing
• There are two approaches to testing whether the sample mean supports the alternative hypothesis (H1):
– The rejection region method, which is mandatory for manual testing (but can also be used when testing is supported by statistical software).
– The p-value method, which is mostly used when statistical software is available.
The Rejection Region Method
The rejection region is a range of values such that if the test statistic falls into that range, the null hypothesis is rejected in favor of the alternative hypothesis.
The Rejection Region Method for a Right-Tail Test
Example 11.1 – solution continued
• Recall: H0: μ = 170, H1: μ > 170. Therefore,
• It seems reasonable to reject the null hypothesis and believe that μ > 170 if the sample mean is sufficiently large.
[Figure: sampling distribution of x̄ with the rejection region ("Reject H0 here") to the right of the critical value of the sample mean.]
Example 11.1 – solution continued
• Define a critical value for x̄, denoted x̄_L, that is just large enough to reject the null hypothesis.
• Reject the null hypothesis if x̄ > x̄_L.
Determining the Critical Value for the Rejection Region
• Let the probability of committing a Type I error be α (also called the significance level).
• Find the value of the sample mean that is just large enough so that the actual probability of committing a Type I error does not exceed α.
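Determining the Critical Value for a Right-Tail Test
Example 11.1 – solution continued
The probability of a Type I error is allowed to be α:
\[ \alpha = P(\text{commit a Type I error}) = P(\text{reject } H_0 \mid H_0 \text{ is true}) = P(\bar{x} > \bar{x}_L \mid H_0 \text{ is true}) = P\left(Z > \frac{\bar{x}_L - 170}{65/\sqrt{400}}\right) \]
Since P(Z > z_α) = α, we have:
\[ z_\alpha = \frac{\bar{x}_L - 170}{65/\sqrt{400}} \]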
Example 11.1 – solution continued
Solving for the critical value of the sample mean:
\[ \bar{x}_L = 170 + z_\alpha \frac{65}{\sqrt{400}} \]
If we select α = 0.05, then z_.05 = 1.645 and
\[ \bar{x}_L = 170 + 1.645 \cdot \frac{65}{\sqrt{400}} = 175.34 \]
Reject the null hypothesis if x̄ > 175.34.
Conclusion: Since the sample mean (178) is greater than the critical value of 175.34, there is sufficient evidence to infer that the mean monthly balance is greater than $170 at the 5% significance level.
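The rejection region calculation for Example 11.1 can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the function name `right_tail_critical_mean` is hypothetical, and it uses the standard library's `statistics.NormalDist` to obtain z_α. Because the code uses an unrounded z_α, it gives roughly 175.35 rather than the 175.34 reported on the slide; the decision is the same.

```python
from math import sqrt
from statistics import NormalDist

def right_tail_critical_mean(mu0, sigma, n, alpha):
    """Critical sample mean x_L for a right-tail z-test of H0: mu = mu0 (sigma known)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # upper-tail critical value z_alpha
    return mu0 + z_alpha * sigma / sqrt(n)

# Example 11.1 inputs taken from the slides
x_bar = 178
x_L = right_tail_critical_mean(mu0=170, sigma=65, n=400, alpha=0.05)
reject_h0 = x_bar > x_L  # right-tail rule: reject when the sample mean exceeds x_L

print(f"critical value = {x_L:.2f}, reject H0: {reject_h0}")
```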
The standardized test statistic
– Instead of using the statistic x̄, we can use the standardized value z:
\[ z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \]
– Then, the rejection region becomes z > z_α (one-tail test).
• Example 11.1 – continued
– We redo this example using the standardized test statistic. Recall: H0: μ = 170, H1: μ > 170.
– Test statistic:
\[ z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{178 - 170}{65/\sqrt{400}} = 2.46 \]
– Rejection region: z > z_.05 = 1.645. Reject the null hypothesis if Z > 1.645.
Conclusion: Since Z = 2.46 > 1.645, reject the null hypothesis in favor of the alternative hypothesis.
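The same decision in standardized form can be sketched as below. The helper name `z_statistic` is an assumption made for illustration; the inputs are those of Example 11.1.

```python
from math import sqrt
from statistics import NormalDist

def z_statistic(x_bar, mu0, sigma, n):
    """Standardized test statistic z = (x_bar - mu0) / (sigma / sqrt(n))."""
    return (x_bar - mu0) / (sigma / sqrt(n))

z = z_statistic(x_bar=178, mu0=170, sigma=65, n=400)
z_crit = NormalDist().inv_cdf(1 - 0.05)  # z_.05, about 1.645
reject_h0 = z > z_crit                   # right-tail rejection region z > z_alpha

print(f"z = {z:.2f}, critical z = {z_crit:.3f}, reject H0: {reject_h0}")
```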
P-value Method
– The p-value provides information about the amount of statistical evidence that supports the alternative hypothesis.
– The p-value of a test is the probability of observing a test statistic at least as extreme as the one computed, given that the null hypothesis is true.
– Let us demonstrate the concept on Example 11.1.
The p-value is the probability of observing a test statistic at least as extreme as x̄ = 178, given that μ = 170:
\[ P(\bar{x} > 178 \mid \mu = 170) = P\left(Z > \frac{178 - 170}{65/\sqrt{400}}\right) = P(Z > 2.4615) = .0069 \]
Interpreting the p-value
Because the probability that the sample mean will assume a value of more than 178 when μ = 170 is so small (.0069), there are reasons to believe that μ > 170.
Note how the event x̄ ≥ 178 is rare under H0 (μ = 170), but becomes more probable under H1 (μ > 170).
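A short sketch of the p-value computation for Example 11.1 follows. The function name `right_tail_p_value` is hypothetical; the normal tail area comes from `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

def right_tail_p_value(x_bar, mu0, sigma, n):
    """P(sample mean >= x_bar) when H0: mu = mu0 is true and sigma is known."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    return 1 - NormalDist().cdf(z)  # area to the right of z

p_value = right_tail_p_value(x_bar=178, mu0=170, sigma=65, n=400)
print(f"p-value = {p_value:.4f}")
```

A sample mean equal to the hypothesized mean would give a p-value of 0.5, which is a quick sanity check on the tail direction.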
We can conclude that the smaller the p-value, the more statistical evidence exists to support the alternative hypothesis.
• Describing the p-value
– If the p-value is less than 1%, there is overwhelming evidence that supports the alternative hypothesis.
– If the p-value is between 1% and 5%, there is strong evidence that supports the alternative hypothesis.
– If the p-value is between 5% and 10%, there is weak evidence that supports the alternative hypothesis.
– If the p-value exceeds 10%, there is no evidence that supports the alternative hypothesis.
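The verbal scale above can be written as a small lookup, sketched here. The function name `describe_p_value` is hypothetical, and the treatment of p-values that fall exactly on 1%, 5%, or 10% is an assumption, since the slide does not say which category those boundaries belong to.

```python
def describe_p_value(p):
    """Map a p-value to the verbal evidence scale used on the slide."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p-value must lie in [0, 1]")
    # Assumption: each boundary value is assigned to the weaker category.
    if p < 0.01:
        return "overwhelming evidence"
    if p < 0.05:
        return "strong evidence"
    if p < 0.10:
        return "weak evidence"
    return "no evidence"

# Example 11.1 reported a p-value of .0069
label = describe_p_value(0.0069)
print(label)
```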
The p-value and the Rejection Region Methods
– The p-value can be used when making decisions based on rejection region methods as follows:
• Define the hypotheses to test, and the required significance level α.
• Perform the sampling procedure, calculate the test statistic and the p-value associated with it.
• Compare the p-value to α. Reject the null hypothesis only if p-value < α; otherwise, do not reject the null hypothesis.
[Figure: sampling distribution of x̄ under μ = 170, showing x̄_L = 175.34 (α = 0.05) and the p-value as the area to the right of x̄ = 178.]
Conclusions of a Test of Hypothesis
• If we reject the null hypothesis, we conclude that there is enough evidence to infer that the alternative hypothesis is true.
• If we do not reject the null hypothesis, we conclude that there is not enough statistical evidence to infer that the alternative hypothesis is true.
The alternative hypothesis is the more important one. It represents what we are investigating.
A Left-Tail Test
• The SSA Envelope Example
– The chief financial officer at FedEx believes that including a stamped self-addressed (SSA) envelope in the monthly invoice sent to customers will decrease the amount of time it takes for customers to pay their monthly bills.
– Currently, customers return their payments in 24 days on average, with a standard deviation of 6 days.
• The SSA envelope example – continued
– It was calculated that an improvement of two days on average will cover the costs of the envelopes (checks can be deposited earlier).
– A random sample of 220 customers was selected and SSA envelopes were included with their invoice packs.
– The times at which customers' payments were received were recorded (SSA.xls).
– Can the CFO conclude that the plan will be profitable at the 10% significance level?
• The SSA envelope example – Solution
– The parameter tested is the population mean payment period (μ).
– The hypotheses are:
H0: μ = 22
H1: μ < 22 (The CFO wants to know whether the plan will be profitable.)
• The SSA envelope example – Solution continued
– The rejection region: It makes sense to believe that μ < 22 if the sample mean is sufficiently smaller than 22.
– Reject the null hypothesis if x̄ < x̄_S, where x̄_S is the critical value of the sample mean.
• The SSA envelope example – Solution continued
– The standardized one-tail, left-hand test is defined by the rejection region
\[ z < -z_\alpha = -z_{.10} = -1.28 \]
– Test statistic:
\[ z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{21.63 - 22}{6/\sqrt{220}} = -.91 \]
Since −.91 > −1.28, do not reject the null hypothesis.
The p-value = P(Z < −.91) = .1814. Since .1814 > .10, do not reject the null hypothesis.
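The left-tail test for the SSA envelope example can be sketched as follows. The sample mean 21.63 is taken from the slide (the raw data in SSA.xls are not reproduced here), and the function name `left_tail_z_test` is hypothetical. With an unrounded z the p-value comes out near .180 rather than the slide's .1814, which uses z rounded to −.91; the conclusion is unchanged.

```python
from math import sqrt
from statistics import NormalDist

def left_tail_z_test(x_bar, mu0, sigma, n, alpha):
    """Left-tail z-test of H0: mu = mu0 against H1: mu < mu0 (sigma known)."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    z_crit = NormalDist().inv_cdf(alpha)  # -z_alpha, the left-tail critical value
    p_value = NormalDist().cdf(z)         # area to the left of z
    return z, z_crit, p_value, z < z_crit

z, z_crit, p_value, reject_h0 = left_tail_z_test(
    x_bar=21.63, mu0=22, sigma=6, n=220, alpha=0.10
)
print(f"z = {z:.2f}, critical z = {z_crit:.2f}, p = {p_value:.4f}, reject H0: {reject_h0}")
```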
A Two-Tail Test
• Example 11.2
– AT&T has been challenged by competitors who argued that their rates resulted in lower bills.
– A statistics practitioner determines that the mean and standard deviation of monthly long-distance bills for all AT&T residential customers are $17.09 and $3.87, respectively.
• Example 11.2 – continued
– A random sample of 100 customers is selected and customers' bills recalculated using a leading competitor's rates (see Xm11-02).
– Assuming the standard deviation is the same (3.87), can we infer that there is a difference between AT&T's bills and the competitor's bills (on average)?
• Solution
– Is the mean different from 17.09?
H0: μ = 17.09
H1: μ ≠ 17.09
– Define the rejection region: z < −z_{α/2} or z > z_{α/2}
Solution – continued
If H0 is true (μ = 17.09), x̄ can still fall far above or far below 17.09, in which case we erroneously reject H0 in favor of H1 (μ ≠ 17.09).
We want this erroneous rejection of H0 to be a rare event, say a 5% chance.
[Figure: sampling distribution of x̄ centered at 17.09 with α/2 = 0.025 in each tail.]
Solution – continued
From the sample we have x̄ = 17.55, so
\[ z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{17.55 - 17.09}{3.87/\sqrt{100}} = 1.19 \]
Rejection region (two-tail test, α/2 = 0.025 in each tail): z < −z_{α/2} = −1.96 or z > z_{α/2} = 1.96.
Since −1.96 < 1.19 < 1.96, the test statistic does not fall in the rejection region. There is insufficient evidence to infer that there is a difference between the bills of AT&T and the competitor.
Also, by the p-value approach: the p-value = P(Z < −1.19) + P(Z > 1.19) = 2(.1173) = .2346 > .05.
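A sketch of the two-tail test for Example 11.2 is given below. The sample mean 17.55 comes from the slide (the Xm11-02 data are not reproduced here), and the function name `two_tail_z_test` is an assumption made for illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_tail_z_test(x_bar, mu0, sigma, n, alpha):
    """Two-tail z-test of H0: mu = mu0 against H1: mu != mu0 (sigma known)."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # both tails beyond |z|
    return z, z_crit, p_value, abs(z) > z_crit

z, z_crit, p_value, reject_h0 = two_tail_z_test(
    x_bar=17.55, mu0=17.09, sigma=3.87, n=100, alpha=0.05
)
print(f"z = {z:.2f}, critical z = {z_crit:.2f}, p = {p_value:.4f}, reject H0: {reject_h0}")
```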
11.4 Calculating the Probability of a Type II Error
• To properly interpret the results of a test of hypothesis, we need to
– specify an appropriate significance level or judge the p-value of a test;
– understand the relationship between Type I and Type II errors.
– How do we compute the probability of a Type II error?
Calculation of the Probability of a Type II Error
• To calculate the probability of a Type II error we need to…
– express the rejection region directly, in terms of the parameter hypothesized (not standardized);
– specify the alternative value under H1.
• Let us revisit Example 11.1
– Express the rejection region directly, not in standardized terms: the rejection region was x̄ > 175.34 with α = .05.
– Specify the alternative value under H1: let the alternative value be μ = 180 (rather than just μ > 170).
[Figure: sampling distributions of x̄ under H0: μ = 170 and under H1: μ = 180, with x̄_L = 175.34, α = .05, and the "Do not reject H0" region to the left of x̄_L.]
– A Type II error occurs when a false H0 is not rejected. Here H0: μ = 170 is false (in fact μ = 180), yet x̄ < 175.34, so H0 is not rejected.
\[ \beta = P(\bar{x} < 175.34 \mid H_0 \text{ is false}) = P(\bar{x} < 175.34 \mid \mu = 180) = P\left(Z < \frac{175.34 - 180}{65/\sqrt{400}}\right) = .0764 \]
Effects on β of changing α
• Decreasing the significance level α increases the value of β, and vice versa.
[Figure: sampling distributions centered at μ = 170 and μ = 180; a larger α corresponds to a smaller β.]
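The trade-off between α and β can be illustrated numerically with the Example 11.1 settings (μ = 170 under H0, alternative μ = 180, σ = 65, n = 400). This is a sketch only; the function name `beta_for_alpha` and the particular α values are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

def beta_for_alpha(mu0, mu1, sigma, n, alpha):
    """Type II error probability of a right-tail z-test at significance level alpha."""
    standard_error = sigma / sqrt(n)
    x_crit = mu0 + NormalDist().inv_cdf(1 - alpha) * standard_error
    return NormalDist(mu=mu1, sigma=standard_error).cdf(x_crit)

alphas = [0.01, 0.05, 0.10]
betas = {a: beta_for_alpha(mu0=170, mu1=180, sigma=65, n=400, alpha=a) for a in alphas}

for a in alphas:
    print(f"alpha = {a:.2f} -> beta = {betas[a]:.4f}")
```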
Judging the Test
• A hypothesis test is effectively defined by the significance level α and by the sample size n.
• If the probability of a Type II error β is judged to be too large, we can reduce it by
– increasing α, and/or
– increasing the sample size.
• Increasing the sample size reduces β.
Recall:
\[ z_\alpha = \frac{\bar{x}_L - \mu}{\sigma/\sqrt{n}}, \quad \text{thus} \quad \bar{x}_L = \mu + z_\alpha \frac{\sigma}{\sqrt{n}} \]
By increasing the sample size, the standard deviation of the sampling distribution of the mean decreases. Thus, x̄_L decreases.
Note what happens when n increases: α does not change, but β becomes smaller.
• In Example 11.1, suppose n increases from 400 to 1000.
\[ \bar{x}_L = \mu + z_\alpha \frac{\sigma}{\sqrt{n}} = 170 + 1.645 \cdot \frac{65}{\sqrt{1000}} = 173.38 \]
\[ \beta = P\left(Z < \frac{173.38 - 180}{65/\sqrt{1000}}\right) = P(Z < -3.22) \approx 0 \]
• α remains 5%, but the probability of a Type II error drops dramatically.
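The effect of the larger sample can be checked with a short sketch that recomputes the critical value and β for n = 400 and n = 1000 at α = 0.05. The function name `critical_mean_and_beta` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def critical_mean_and_beta(mu0, mu1, sigma, n, alpha):
    """Return (x_L, beta) for a right-tail z-test with known sigma."""
    standard_error = sigma / sqrt(n)
    x_crit = mu0 + NormalDist().inv_cdf(1 - alpha) * standard_error
    beta = NormalDist(mu=mu1, sigma=standard_error).cdf(x_crit)
    return x_crit, beta

x_L_400, beta_400 = critical_mean_and_beta(170, 180, 65, 400, 0.05)
x_L_1000, beta_1000 = critical_mean_and_beta(170, 180, 65, 1000, 0.05)

print(f"n = 400:  x_L = {x_L_400:.2f}, beta = {beta_400:.4f}")
print(f"n = 1000: x_L = {x_L_1000:.2f}, beta = {beta_1000:.4f}")
```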
• Power of a test
– The power of a test is defined as 1 − β.
– It represents the probability of rejecting the null hypothesis when it is false.
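As a closing sketch, the power of the Example 11.1 test at the alternative μ = 180 can be computed as 1 − β. The function name `power_right_tail` is hypothetical. One useful check: when the alternative mean equals the null value, the rejection probability is simply α.

```python
from math import sqrt
from statistics import NormalDist

def power_right_tail(mu0, mu1, sigma, n, alpha):
    """Power = 1 - beta: probability of rejecting H0: mu = mu0 when the true mean is mu1."""
    standard_error = sigma / sqrt(n)
    x_crit = mu0 + NormalDist().inv_cdf(1 - alpha) * standard_error
    beta = NormalDist(mu=mu1, sigma=standard_error).cdf(x_crit)
    return 1 - beta

power_at_180 = power_right_tail(mu0=170, mu1=180, sigma=65, n=400, alpha=0.05)
power_at_null = power_right_tail(mu0=170, mu1=170, sigma=65, n=400, alpha=0.05)

print(f"power at mu = 180: {power_at_180:.4f}")
print(f"rejection probability at mu = 170: {power_at_null:.4f}")
```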