Student's t-test
Recall: The z-test for means
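The Test Statistic
$$z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \approx \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$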
Comments
• The sampling distribution of this statistic is the standard Normal distribution.
• The replacement of $\sigma$ by s leaves this distribution unchanged only if the sample size n is large.
For small sample sizes:
The sampling distribution of
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
is called Student's t distribution with $n - 1$ degrees of freedom.
Properties of Student’s t distribution
• Similar to the standard normal distribution:
  – Symmetric
  – Unimodal
  – Centred at zero
• Larger spread about zero.
  – The reason for this is the increased variability introduced by replacing $\sigma$ by s.
• As the sample size increases (degrees of freedom increase) the t distribution approaches the standard normal distribution.
[Figure: density curves of the t distribution and the standard normal distribution, plotted from -4 to 4 with densities from 0 to 0.4]
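The properties above can be checked numerically. The following is a minimal sketch, not part of the original slides: it assumes the standard closed-form density of the t distribution, and the function name `t_pdf` is illustrative. It compares the t density with the standard normal density at the centre (0) and in the tail (3) for increasing degrees of freedom.

```python
import math
from statistics import NormalDist

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    # lgamma is used instead of gamma so that large df does not overflow
    log_const = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    const = math.exp(log_const) / math.sqrt(df * math.pi)
    return const * (1 + t * t / df) ** (-(df + 1) / 2)

std_normal = NormalDist()  # mean 0, standard deviation 1

# Lower peak and heavier tails for small df; close to normal for large df
for df in (2, 5, 30, 1000):
    print(f"df={df:5d}  pdf(0)={t_pdf(0.0, df):.4f}  pdf(3)={t_pdf(3.0, df):.4f}")
print(f"normal    pdf(0)={std_normal.pdf(0.0):.4f}  pdf(3)={std_normal.pdf(3.0):.4f}")
```

For small degrees of freedom the t density is lower at zero and higher in the tails than the normal density, which is the "larger spread" described above.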
The Situation
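• Let x1, x2, x3, …, xn denote a sample from a normal population with mean $\mu$ and standard deviation $\sigma$. Both $\mu$ and $\sigma$ are unknown.
• Let
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \quad \text{(the sample mean)}$$
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \quad \text{(the sample standard deviation)}$$
• We want to test whether the mean, $\mu$, is equal to some given value $\mu_0$.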
The Test Statistic
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
The sampling distribution of the test statistic is the t distribution with $n - 1$ degrees of freedom.
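The Alternative Hypothesis $H_A$ | The Critical Region
$H_A: \mu \neq \mu_0$ | $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$
$H_A: \mu > \mu_0$ | $t > t_{\alpha}$
$H_A: \mu < \mu_0$ | $t < -t_{\alpha}$
$t_{\alpha}$ and $t_{\alpha/2}$ are critical values under the t distribution with $n - 1$ degrees of freedom.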
Critical values for the t-distribution
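[Figure: t density centred at 0, with upper-tail area $\alpha$ (or $\alpha/2$) to the right of the critical value $t_{\alpha}$ (or $t_{\alpha/2}$)]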
Critical values for the t-distribution are provided in tables. A link to these tables is given with today's lecture.
To use the table: look up df, then look up $\alpha$.
Note: the values tabled for df = ∞ are the same values for the standard normal distribution, z
…
Example
• Let x1, x2, x3 , x4, x5, x6 denote weight loss from a new diet for n = 6 cases.
• Assume that x1, x2, x3, x4, x5, x6 is a sample from a normal population with mean $\mu$ and standard deviation $\sigma$. Both $\mu$ and $\sigma$ are unknown.
• We want to test:
$H_0: \mu = 0$ (New diet is not effective)
versus
$H_A: \mu > 0$ (New diet is effective)
The Test Statistic
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
The Critical region:
Reject $H_0$ if $t > t_{\alpha}$
The Data
Case $i$ | 1 | 2 | 3 | 4 | 5 | 6
Weight loss $x_i$ | 2.0 | 1.0 | 1.4 | -1.8 | 0.9 | 2.3
The summary statistics:
$\bar{x} = 0.96667$ and $s = 1.462418$
The Test Statistic
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{0.96667 - 0}{1.462418/\sqrt{6}} = 1.619$$
The Critical Region (using $\alpha$ = 0.05)
Reject $H_0$ if $t > t_{0.05} = 2.015$ for 5 d.f.
Conclusion: Since t = 1.619 is less than 2.015, accept $H_0$.
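The calculation in this example can be reproduced in a few lines. This is a minimal sketch using only Python's standard library. The critical value 2.015 is taken from the t table as in the slides rather than computed, and the variable names are illustrative.

```python
import math
from statistics import mean, stdev

weight_loss = [2.0, 1.0, 1.4, -1.8, 0.9, 2.3]  # data from the example
mu_0 = 0.0       # value of the mean under H0
t_crit = 2.015   # t_{0.05} for 5 d.f., read from the t table

n = len(weight_loss)
x_bar = mean(weight_loss)
s = stdev(weight_loss)  # sample standard deviation (divisor n - 1)
t_stat = (x_bar - mu_0) / (s / math.sqrt(n))

# One-sided test: reject H0 only for large positive t (H_A: mu > 0)
reject_h0 = t_stat > t_crit
print(f"x_bar={x_bar:.5f}  s={s:.6f}  t={t_stat:.3f}  reject H0: {reject_h0}")
```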
Confidence Intervals
Confidence Intervals for the mean of a Normal Population, $\mu$, using the Standard Normal distribution
$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
Confidence Intervals for the mean of a Normal Population, $\mu$, using the t distribution
$$\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$$
Example
• Let x1, x2, x3 , x4, x5, x6 denote weight loss from a new diet for n = 6 cases.
The Data:
Case $i$ | 1 | 2 | 3 | 4 | 5 | 6
Weight loss $x_i$ | 2.0 | 1.0 | 1.4 | -1.8 | 0.9 | 2.3
The summary statistics:
$\bar{x} = 0.96667$ and $s = 1.462418$
Confidence Intervals (use $\alpha$ = 0.05)
$$\bar{x} \pm t_{0.025}\frac{s}{\sqrt{n}}$$
$$0.96667 \pm 2.571\,\frac{1.462418}{\sqrt{6}}$$
$$0.96667 \pm 1.535$$
-0.57 to 2.50
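As a check on the arithmetic, the interval can be recomputed from the raw data. This is a minimal sketch using Python's standard library. The value 2.571 is the tabled $t_{0.025}$ for 5 d.f. used in the slides, and the variable names are illustrative.

```python
import math
from statistics import mean, stdev

weight_loss = [2.0, 1.0, 1.4, -1.8, 0.9, 2.3]  # data from the example
t_025 = 2.571  # t_{0.025} for 5 d.f., read from the t table

n = len(weight_loss)
x_bar = mean(weight_loss)
half_width = t_025 * stdev(weight_loss) / math.sqrt(n)

lower, upper = x_bar - half_width, x_bar + half_width
print(f"{x_bar:.5f} +/- {half_width:.3f}  ->  {lower:.2f} to {upper:.2f}")
```

The interval contains 0, which agrees with the earlier decision to accept $H_0$ for this data set.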
Summary
Statistical Inference
Estimation by Confidence Intervals
Confidence Interval for a Proportion
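$$\hat{p} \pm z_{\alpha/2}\,\sigma_{\hat{p}}$$
where
$$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} \approx \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
and $z_{\alpha/2}$ is the upper $\alpha/2$ critical point of the standard normal distribution.
Error Bound
$$B = z_{\alpha/2}\,\sigma_{\hat{p}} = z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} \approx z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$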
Determination of Sample Size
The sample size that will estimate p with an Error Bound B and level of confidence P = 1 – $\alpha$ is:
$$n = \frac{z_{\alpha/2}^2\, p^*(1 - p^*)}{B^2}$$
where:
• B is the desired Error Bound
• $z_{\alpha/2}$ is the $\alpha/2$ critical value for the standard normal distribution
• p* is some preliminary estimate of p.
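The sample size formula for a proportion can be sketched as follows. The function name and the input values (B = 0.03, p* = 0.5, 95% confidence) are hypothetical and chosen only for illustration. The critical value comes from `statistics.NormalDist` in the standard library, and the result is rounded up because n must be a whole number.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(error_bound, p_star, confidence=0.95):
    """n = z_{alpha/2}^2 * p*(1 - p*) / B^2, rounded up."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 critical point
    n = z ** 2 * p_star * (1 - p_star) / error_bound ** 2
    return math.ceil(n)

# Hypothetical inputs: error bound 0.03, preliminary estimate p* = 0.5
n_needed = sample_size_for_proportion(error_bound=0.03, p_star=0.5)
print(n_needed)
```

Using p* = 0.5 gives the largest value of p*(1 - p*), so it is the conservative choice when no preliminary estimate is available.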
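Confidence Intervals for the mean of a Normal Population, $\mu$
$$\bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}}$$
$$\text{or } \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
$$\text{or } \bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$$
where $\bar{x}$ is the sample mean, s is the sample standard deviation, and $z_{\alpha/2}$ is the upper $\alpha/2$ critical point of the standard normal distribution.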
Determination of Sample Size
The sample size that will estimate $\mu$ with an Error Bound B and level of confidence P = 1 – $\alpha$ is:
$$n = \frac{z_{\alpha/2}^2\,\sigma^2}{B^2} \approx \frac{z_{\alpha/2}^2\,(s^*)^2}{B^2}$$
where:
• B is the desired Error Bound
• $z_{\alpha/2}$ is the $\alpha/2$ critical value for the standard normal distribution
• s* is some preliminary estimate of s.
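The same calculation for a mean can be sketched as below. The function name and the inputs (B = 0.5, s* = 1.5, 95% confidence) are hypothetical values chosen for illustration, not figures from the slides.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(error_bound, s_star, confidence=0.95):
    """n = z_{alpha/2}^2 * (s*)^2 / B^2, rounded up."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 critical point
    return math.ceil(z ** 2 * s_star ** 2 / error_bound ** 2)

# Hypothetical inputs: error bound 0.5, preliminary estimate s* = 1.5
n_needed = sample_size_for_mean(error_bound=0.5, s_star=1.5)
print(n_needed)
```

Halving the error bound multiplies the required sample size by about four, since B enters the formula squared.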
Confidence Intervals for the mean of a Normal Population, $\mu$, using the t distribution
$$\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$$
Hypothesis Testing
An important area of statistical inference
To define a statistical Test we
1. Choose a statistic (called the test statistic)
2. Divide the range of possible values for the test statistic into two parts
• The Acceptance Region
• The Critical Region
To perform a statistical Test we
1. Collect the data.
2. Compute the value of the test statistic.
3. Make the Decision:
• If the value of the test statistic is in the Acceptance Region we decide to accept H0 .
• If the value of the test statistic is in the Critical Region we decide to reject H0 .
Determining the Critical Region
1. The Critical Region should consist of values of the test statistic that indicate that HA is true. (hence H0 should be rejected).
2. The size of the Critical Region is determined so that the probability of making a type I error, $\alpha$, is at some pre-determined level (usually 0.05 or 0.01). This value is called the significance level of the test.
Significance level = $\alpha$ = P[test makes type I error]
To find the Critical Region
1. Find the sampling distribution of the test statistic when H0 is true.
2. Locate the Critical Region in the tails (left, right, or both) of the sampling distribution of the test statistic when H0 is true.
Whether you locate the critical region in the left tail, the right tail, or both tails depends on which values indicate HA is true.
The tails chosen = values indicating HA.
3. The size of the Critical Region is chosen so that the area over the critical region and under the sampling distribution of the test statistic when H0 is true is the desired level of $\alpha$ = P[type I error].
[Figure: sampling distribution of the test statistic when H0 is true, with the Critical Region marked; area of the Critical Region = $\alpha$]
The z-test for Proportions
Testing the probability of success in a binomial experiment
Situation
• A success-failure experiment has been repeated n times.
• The probability of success p is unknown. We want to test either
1. $H_0: p = p_0$ versus $H_A: p \neq p_0$
or
2. $H_0: p = p_0$ versus $H_A: p > p_0$
or
3. $H_0: p = p_0$ versus $H_A: p < p_0$
The Test Statistic
$$z = \frac{\hat{p} - p_0}{\sigma_{\hat{p}}} = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}$$
Critical Region (dependent on HA)
Alternative Hypothesis | Critical Region
$H_A: p \neq p_0$ | $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$
$H_A: p > p_0$ | $z > z_{\alpha}$
$H_A: p < p_0$ | $z < -z_{\alpha}$
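The z-test for a proportion, with its three possible critical regions, can be sketched as follows. The function name, the `alternative` labels, and the data (60 successes in 100 trials, $p_0$ = 0.5) are hypothetical and used only for illustration.

```python
import math
from statistics import NormalDist

def proportion_z_test(successes, n, p_0, alternative="two-sided", alpha=0.05):
    """Return (z, reject_h0) for the z-test of H0: p = p_0."""
    p_hat = successes / n
    z = (p_hat - p_0) / math.sqrt(p_0 * (1 - p_0) / n)
    std_normal = NormalDist()
    if alternative == "two-sided":   # H_A: p != p_0
        reject = abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    elif alternative == "greater":   # H_A: p > p_0
        reject = z > std_normal.inv_cdf(1 - alpha)
    elif alternative == "less":      # H_A: p < p_0
        reject = z < -std_normal.inv_cdf(1 - alpha)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z, reject

# Hypothetical data: 60 successes in n = 100 trials, testing p_0 = 0.5
z_stat, reject_h0 = proportion_z_test(60, 100, 0.5, alternative="greater")
print(f"z = {z_stat:.3f}, reject H0: {reject_h0}")
```

Note that the standard error uses $p_0$, not $\hat{p}$, because the sampling distribution is evaluated under $H_0$.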
The z-test for the mean of a Normal population (large samples)
Situation
• A sample of n is selected from a normal population with mean $\mu$ (unknown) and standard deviation $\sigma$. We want to test either
1. $H_0: \mu = \mu_0$ versus $H_A: \mu \neq \mu_0$
or
2. $H_0: \mu = \mu_0$ versus $H_A: \mu > \mu_0$
or
3. $H_0: \mu = \mu_0$ versus $H_A: \mu < \mu_0$
The Test Statistic
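$$z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \approx \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \quad \text{if } n \text{ is large.}$$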
Critical Region (dependent on HA)
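Alternative Hypothesis | Critical Region
$H_A: \mu \neq \mu_0$ | $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$
$H_A: \mu > \mu_0$ | $z > z_{\alpha}$
$H_A: \mu < \mu_0$ | $z < -z_{\alpha}$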
The t-test for the mean of a Normal population (small samples)
Situation
• A sample of n is selected from a normal population with mean $\mu$ (unknown) and standard deviation $\sigma$ (unknown). We want to test either
1. $H_0: \mu = \mu_0$ versus $H_A: \mu \neq \mu_0$
or
2. $H_0: \mu = \mu_0$ versus $H_A: \mu > \mu_0$
or
3. $H_0: \mu = \mu_0$ versus $H_A: \mu < \mu_0$
The Test Statistic
$$t = \frac{\bar{x} - \mu_0}{s_{\bar{x}}} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
Critical Region (dependent on HA)
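Alternative Hypothesis | Critical Region
$H_A: \mu \neq \mu_0$ | $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$
$H_A: \mu > \mu_0$ | $t > t_{\alpha}$
$H_A: \mu < \mu_0$ | $t < -t_{\alpha}$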
Testing and Estimation of Variances
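Let x1, x2, x3, …, xn denote a sample from a Normal distribution with mean $\mu$ and standard deviation $\sigma$ (variance $\sigma^2$).
The point estimator of the variance $\sigma^2$ is:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
The point estimator of the standard deviation $\sigma$ is:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
The statistic
$$U = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{\sigma^2} = \frac{(n - 1)s^2}{\sigma^2}$$
has a $\chi^2$ distribution with $n - 1$ degrees of freedom.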
Sampling Theory
Critical Points of the $\chi^2$ distribution
[Figure: density of the $\chi^2$ distribution, plotted over $\chi^2$ values from 0 to 20]
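Confidence intervals for $\sigma^2$ and $\sigma$.
[Figure: density of the $\chi^2$ distribution over 0 to 20, with area $\alpha/2$ in the lower tail below $\chi^2_{1-\alpha/2}$ and area $\alpha/2$ in the upper tail above $\chi^2_{\alpha/2}$]
It is true that
$$P\left[\chi^2_{1-\alpha/2} \le \frac{(n - 1)s^2}{\sigma^2} \le \chi^2_{\alpha/2}\right] = 1 - \alpha$$
from which we can show
$$P\left[\frac{(n - 1)s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n - 1)s^2}{\chi^2_{1-\alpha/2}}\right] = 1 - \alpha$$
and
$$P\left[\sqrt{\frac{n - 1}{\chi^2_{\alpha/2}}}\,s \le \sigma \le \sqrt{\frac{n - 1}{\chi^2_{1-\alpha/2}}}\,s\right] = 1 - \alpha$$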
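Hence (1 – $\alpha$)100% confidence limits for $\sigma^2$ are:
$$\frac{(n - 1)s^2}{\chi^2_{\alpha/2}} \text{ to } \frac{(n - 1)s^2}{\chi^2_{1-\alpha/2}}$$
and (1 – $\alpha$)100% confidence limits for $\sigma$ are:
$$\sqrt{\frac{n - 1}{\chi^2_{\alpha/2}}}\,s \text{ to } \sqrt{\frac{n - 1}{\chi^2_{1-\alpha/2}}}\,s$$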
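Example
• In this example the subject is asked to type his computer password n = 6 times.
• Each time, $x_i$ = time to type the password is recorded. The data are tabulated below:
$i$ | 1 | 2 | 3 | 4 | 5 | 6 | $\sum x_i$ | $\sum x_i^2$
$x_i$ | 6.63 | 8.51 | 9.01 | 8.69 | 8.71 | 8.83 | 50.38 | 426.9062
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{50.38}{6} = 8.3967$$
$$s = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}} = \sqrt{\frac{426.9062 - \dfrac{(50.38)^2}{6}}{5}} = 0.881151$$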
95% confidence limits for the mean $\mu$
$$\bar{x} \pm t_{.025}\frac{s}{\sqrt{n}}$$
$$\text{or } 8.3967 \pm 2.571\,\frac{0.881151}{\sqrt{6}}, \quad t_{.025} = 2.571 \text{ for 5 d.f.}$$
$$8.3967 \pm 0.9249$$
7.472 to 9.322
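95% confidence limits for $\sigma$
$$\sqrt{\frac{n - 1}{\chi^2_{.025}}}\,s \text{ to } \sqrt{\frac{n - 1}{\chi^2_{.975}}}\,s$$
$$\chi^2_{.975} = 0.8312, \quad \chi^2_{.025} = 12.83 \text{ for 5 d.f.}$$
$$\sqrt{\frac{5}{12.83}}\,(0.881151) \text{ to } \sqrt{\frac{5}{0.8312}}\,(0.881151)$$
0.550 to 2.161
95% confidence limits for $\sigma^2$
$$\frac{(n - 1)s^2}{\chi^2_{.025}} \text{ to } \frac{(n - 1)s^2}{\chi^2_{.975}}$$
$$\frac{5(0.881151)^2}{12.83} \text{ to } \frac{5(0.881151)^2}{0.8312}$$
0.303 to 4.671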
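The three sets of confidence limits in this example can be recomputed from the raw data. This is a minimal sketch using Python's standard library. The critical values (2.571, 0.8312, 12.83) are the tabled values quoted in the slides for 5 d.f., and the variable names are illustrative.

```python
import math
from statistics import mean, variance

times = [6.63, 8.51, 9.01, 8.69, 8.71, 8.83]  # password typing times
t_025 = 2.571                      # t_{.025} for 5 d.f. (from the t table)
chi2_975, chi2_025 = 0.8312, 12.83  # chi-square critical values, 5 d.f.

n = len(times)
x_bar = mean(times)
s2 = variance(times)  # sample variance (divisor n - 1)
s = math.sqrt(s2)

# Limits for the mean use the t distribution
half_width = t_025 * s / math.sqrt(n)
mean_ci = (x_bar - half_width, x_bar + half_width)

# Limits for the variance use the chi-square distribution; the larger
# critical value gives the lower limit
var_ci = ((n - 1) * s2 / chi2_025, (n - 1) * s2 / chi2_975)
sd_ci = (math.sqrt(var_ci[0]), math.sqrt(var_ci[1]))

print(f"mean:     {mean_ci[0]:.3f} to {mean_ci[1]:.3f}")
print(f"variance: {var_ci[0]:.3f} to {var_ci[1]:.3f}")
print(f"std dev:  {sd_ci[0]:.3f} to {sd_ci[1]:.3f}")
```

Unlike the interval for the mean, the interval for $\sigma^2$ is not symmetric about $s^2$, because the $\chi^2$ distribution is skewed.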
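Testing Hypotheses for $\sigma^2$ and $\sigma$.
Suppose we want to test:
$H_0: \sigma^2 = \sigma_0^2$ against $H_A: \sigma^2 \neq \sigma_0^2$
The test statistic:
$$U = \frac{(n - 1)s^2}{\sigma_0^2}$$
If $H_0$ is true the test statistic, U, has a $\chi^2$ distribution with $n - 1$ degrees of freedom.
Thus we reject $H_0$ if
$$\frac{(n - 1)s^2}{\sigma_0^2} < \chi^2_{1-\alpha/2} \quad \text{or} \quad \frac{(n - 1)s^2}{\sigma_0^2} > \chi^2_{\alpha/2}$$
[Figure: density of the $\chi^2$ distribution over 0 to 20, with Reject regions of area $\alpha/2$ below $\chi^2_{1-\alpha/2}$ and above $\chi^2_{\alpha/2}$, and the Accept region between them]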
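One-tailed Tests for $\sigma^2$ and $\sigma$.
Suppose we want to test:
$H_0: \sigma^2 = \sigma_0^2$ against $H_A: \sigma^2 > \sigma_0^2$
The test statistic:
$$U = \frac{(n - 1)s^2}{\sigma_0^2}$$
We reject $H_0$ if
$$\frac{(n - 1)s^2}{\sigma_0^2} > \chi^2_{\alpha}$$
[Figure: density of the $\chi^2$ distribution over 0 to 20, with the Accept region below $\chi^2_{\alpha}$ and the Reject region in the upper tail]
Or suppose we want to test:
$H_0: \sigma^2 = \sigma_0^2$ against $H_A: \sigma^2 < \sigma_0^2$
The test statistic:
$$U = \frac{(n - 1)s^2}{\sigma_0^2}$$
We reject $H_0$ if
$$\frac{(n - 1)s^2}{\sigma_0^2} < \chi^2_{1-\alpha}$$
[Figure: density of the $\chi^2$ distribution over 0 to 20, with the Reject region in the lower tail below $\chi^2_{1-\alpha}$ and the Accept region above it]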
Example
• The current method for measuring blood alcohol content has the following properties. Measurements are
1. Normally distributed
2. Mean = true blood alcohol content
3. Standard deviation 1.2 units
• A new method is proposed that has the first two properties, and it is believed that the measurements will have a smaller standard deviation.
• We want to collect data to test this hypothesis.
• The experiment will be to collect n = 10 observations on a case where the true blood alcohol content is 6.0.
• The data are tabulated below:
$i$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | $\sum x_i$ | $\sum x_i^2$
$x_i$ | 5.21 | 6.90 | 5.69 | 5.05 | 5.75 | 5.90 | 6.92 | 6.48 | 6.85 | 5.80 | 60.55 | 370.9445
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = 6.0550$$
$$s = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}} = 0.692359, \quad s^2 = 0.479361$$
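To test:
$H_0: \sigma^2 = 1.2^2$ against $H_A: \sigma^2 < 1.2^2$
The test statistic:
$$U = \frac{(n - 1)s^2}{\sigma_0^2}$$
We reject $H_0$ if
$$U < \chi^2_{1-\alpha} = \chi^2_{0.95} = 3.325 \text{ for 9 d.f.}$$
$$U = \frac{9(0.479361)}{1.2^2} = 2.996$$
Since 2.996 is less than 3.325, we reject $H_0$ at $\alpha$ = 0.05.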
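The test in this example can be reproduced from the tabulated data. This is a minimal sketch using Python's standard library. The critical value 3.325 is the tabled $\chi^2_{0.95}$ for 9 d.f. quoted in the slides, and the variable names are illustrative.

```python
from statistics import variance

readings = [5.21, 6.90, 5.69, 5.05, 5.75, 5.90, 6.92, 6.48, 6.85, 5.80]
sigma_0 = 1.2      # standard deviation of the current method
chi2_crit = 3.325  # chi-square_{0.95} for 9 d.f. (from the table)

n = len(readings)
s2 = variance(readings)  # sample variance (divisor n - 1)
u_stat = (n - 1) * s2 / sigma_0 ** 2

# Lower-tailed test: small U is evidence that the variance has decreased
reject_h0 = u_stat < chi2_crit
print(f"s^2 = {s2:.6f}, U = {u_stat:.3f}, reject H0: {reject_h0}")
```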
Two sample Tests