STAT 651 Lecture 9, Copyright (c) Bani K. Mallick
Posted on 19-Dec-2015
Copyright (c) Bani K. Mallick 1
STAT 651
Lecture 9
Topics in Lecture #9
Comparing two population means
Output: detailed look
The t-test
Book Sections Covered in Lecture #9
Chapter 6.2
Relevant SPSS Tutorials
Transformations of Data
2-sample t-test
Paired t-test
Lecture 8 Review: Comparing Two Populations
There are two populations
Take a sample from each population
The sample sizes need not be the same
Population 1: sample size $n_1$
Population 2: sample size $n_2$
Lecture 8 Review: Comparing Two Populations
Each will have a sample standard deviation
Population 1: $s_1$
Population 2: $s_2$
Lecture 8 Review: Comparing Two Populations
Each sample will have a sample mean
Population 1: $\bar{X}_1$
Population 2: $\bar{X}_2$
Those are the statistics. What are the parameters?
Lecture 8 Review: Comparing Two Populations
Each population will have a population standard deviation
Population 1: $\sigma_1$
Population 2: $\sigma_2$
Lecture 8 Review: Comparing Two Populations
Each population will have a population mean
Population 1: $\mu_1$
Population 2: $\mu_2$
Lecture 8 Review: Comparing Two Populations
How do we compare the population means $\mu_1$ and $\mu_2$?
The usual way is to take their difference: $\mu_1 - \mu_2$
If the population means are equal, what is their difference?
Lecture 8 Review: Comparing Two Populations
The usual way is to take their difference: $\mu_1 - \mu_2$
If the population means are equal, their difference = 0
Suppose we form a confidence interval for the difference. From this we learn whether 0 is in the confidence interval, and hence can make decisions about the hypothesis that the population means are equal.
NHANES Comparison
Group Statistics: Log(Saturated Fat)
| Health Status | N | Mean | Std. Deviation | Std. Error Mean |
|---|---|---|---|---|
| Healthy | 60 | 2.9905 | 0.6173 | 0.07969 |
| Cancer | 59 | 2.6969 | 0.6423 | 0.08362 |
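As a quick check on the Group Statistics output, the "Std. Error Mean" column can be reproduced from the other columns, assuming it is the usual $s/\sqrt{n}$. The sketch below is illustrative only; the dictionary layout and function name are not part of the lecture.

```python
import math

# Summary statistics copied from the Group Statistics output.
groups = {
    "Healthy": {"n": 60, "sd": 0.6173},
    "Cancer": {"n": 59, "sd": 0.6423},
}


def std_error_of_mean(sd, n):
    """Standard error of a sample mean: s / sqrt(n)."""
    return sd / math.sqrt(n)


se = {name: std_error_of_mean(g["sd"], g["n"]) for name, g in groups.items()}
for name, value in se.items():
    print(f"{name}: {value:.5f}")
```

The printed values should agree with the SPSS column (7.969E-02 and 8.362E-02) up to rounding.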
NHANES Comparison: what the output looks like
Independent Samples Test: Log(Saturated Fat)
(F and Sig. belong to Levene's Test for Equality of Variances; the remaining columns belong to the t-test for Equality of Means)
| | F | Sig. | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% CI of the Difference, Lower | 95% CI of the Difference, Upper |
|---|---|---|---|---|---|---|---|---|---|
| Equal variances assumed | 0.186 | 0.667 | 2.543 | 117 | 0.012 | 0.2937 | 0.1155 | 0.06497 | 0.5223 |
| Equal variances not assumed | | | 2.542 | 116.627 | 0.012 | 0.2937 | 0.1155 | 0.06488 | 0.5224 |
NHANES Comparison: the variable
The variable analyzed in the Independent Samples Test output is Log(Saturated Fat)
NHANES Comparison: The method. If you think the variances are wildly different, try a transformation
The Independent Samples Test output has two rows: "Equal variances assumed" and "Equal variances not assumed". Levene's Test for Equality of Variances gives F = 0.186, Sig. = 0.667
NHANES Comparison: the p-value
Sig. (2-tailed) = 0.012 in both rows of the Independent Samples Test output
NHANES Comparison: the difference in sample means
Mean Difference = 0.2937
NHANES Comparison: the standard error of difference in sample means
Std. Error Difference = 0.1155
NHANES Comparison: the 95% confidence interval
95% Confidence Interval of the Difference (equal variances assumed): Lower = 0.0650, Upper = 0.5223
NHANES Comparison
The “Mean Difference” is 0.2937. Since the healthy group had the higher sample mean, this is
Mean(Healthy) – Mean(Cancer)
The 95% CI is from 0.0650 to 0.5223 (SPSS prints the lower limit as 6.497E-02)
What is this a CI for? The difference in population mean log(saturated fat) intake between healthy controls and cancer cases:
$\mu_{\text{Healthy}} - \mu_{\text{Cancer}}$
NHANES Comparison
Mean(Healthy) – Mean(Cancer)
The 95% CI is from 0.0650 to 0.5223
The null hypothesis of interest is that the population means are equal, i.e.,
$\mu_{\text{Healthy}} - \mu_{\text{Cancer}} = 0$
NHANES Comparison
Mean(Healthy) – Mean(Cancer)
The 95% CI is from 0.0650 to 0.5223
[Figure: number line with the hypothesized value 0 marked to the left of the confidence interval, which runs from 0.0650 to 0.5223]
NHANES Comparison
Mean(Healthy) – Mean(Cancer)
The 95% CI is from 0.0650 to 0.5223
Is the p-value p < 0.05 or p > 0.05?
Answer: p < 0.05 since the 95% CI does not cover zero.
NHANES Comparison
Mean(Healthy) – Mean(Cancer)
The 95% CI is from 0.0650 to 0.5223
Is the p-value p < 0.01 or p > 0.01?
Answer: You cannot tell from a 95% CI. However, from the SPSS output, p = 0.012, so p > 0.01. (see next slide)
NHANES Comparison: the 95% confidence interval
From the SPSS output (equal variances assumed): Sig. (2-tailed) = 0.012; 95% Confidence Interval of the Difference: Lower = 0.0650, Upper = 0.5223
NHANES Comparison
Mean(Healthy) – Mean(Cancer)
The 95% CI is from 0.0650 to 0.5223
What do we conclude from this confidence interval?
The population mean log(saturated fat) intake is greater in the Healthy group by between 0.0650 and 0.5223, with 95% confidence (exponentiate the limits to express the difference as a ratio on the original saturated fat scale)
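A minimal sketch of the exponentiation step, assuming natural logarithms were used for log(saturated fat). Because the interval is for a difference of mean logs, exponentiating its limits gives an interval for a ratio (of geometric means), not a difference in grams. The limits below are the SPSS values 6.497E-02 and .5223.

```python
import math

# 95% CI limits for the difference in mean log(saturated fat), Healthy minus Cancer.
lower_log, upper_log = 0.06497, 0.5223

# exp(difference of logs) = ratio on the original scale.
ratio_lower = math.exp(lower_log)
ratio_upper = math.exp(upper_log)

print(f"Healthy / Cancer ratio: {ratio_lower:.3f} to {ratio_upper:.3f}")
```

Under that assumption, a ratio above 1 corresponds to higher typical intake in the Healthy group.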
Comparing Two Population Means: the Formulas
The data: $\bar{X}_1, s_1, n_1$ and $\bar{X}_2, s_2, n_2$
The populations: $\mu_1, \sigma_1$ and $\mu_2, \sigma_2$
The aim: CI for $\mu_1 - \mu_2$
Comparing Two Populations
Does it matter which one you call population 1 and which one you call population 2?
Not at all. The key is to interpret the difference properly.
Comparing Two Populations
The aim: CI for $\mu_1 - \mu_2$
This is the difference in population means
The estimate of the difference in population means is the difference in sample means, $\bar{X}_1 - \bar{X}_2$
This is a random variable: it has sample-to-sample variability
Comparing Two Populations
Difference of sample means: $\bar{X}_1 - \bar{X}_2$
“Population” mean from repeated sampling is $\mu_1 - \mu_2$
The s.d. from repeated sampling is $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
Comparing Two Populations
Difference of sample means: $\bar{X}_1 - \bar{X}_2$
The s.d. from repeated sampling is $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
You need reasonably large samples from BOTH populations
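The "repeated sampling" idea can be illustrated with a small simulation. This is only a sketch: the population means and standard deviations below are hypothetical values picked for illustration (they are not the NHANES parameters, which are unknown), and normal populations are assumed.

```python
import math
import random
import statistics

random.seed(651)  # fixed seed so the run is repeatable

# Hypothetical population values, chosen only for illustration.
mu1, sigma1, n1 = 3.0, 0.60, 60
mu2, sigma2, n2 = 2.7, 0.65, 59


def one_difference():
    """Draw one sample from each population and return X1bar - X2bar."""
    x1 = [random.gauss(mu1, sigma1) for _ in range(n1)]
    x2 = [random.gauss(mu2, sigma2) for _ in range(n2)]
    return statistics.fmean(x1) - statistics.fmean(x2)


# Repeat the two-sample experiment many times.
diffs = [one_difference() for _ in range(5000)]

simulated_mean = statistics.fmean(diffs)
simulated_sd = statistics.stdev(diffs)
theoretical_sd = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)

print(f"mean of differences: {simulated_mean:.4f} (mu1 - mu2 = {mu1 - mu2:.4f})")
print(f"s.d. of differences: {simulated_sd:.4f} (formula: {theoretical_sd:.4f})")
```

The simulated mean and s.d. of the differences should land close to $\mu_1 - \mu_2$ and the square-root formula, respectively.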
Comparing Two Populations
If you can reasonably believe that the population sd’s are nearly equal, it is customary to pick the equal variance assumption and estimate the common standard deviation by
$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$
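As a worked instance of the pooled standard deviation, here is a short sketch applied to the NHANES summary statistics from the Group Statistics output. The function name is illustrative.

```python
import math


def pooled_sd(s1, n1, s2, n2):
    """Pooled estimate of a common standard deviation from two samples."""
    numerator = (n1 - 1) * s1**2 + (n2 - 1) * s2**2
    return math.sqrt(numerator / (n1 + n2 - 2))


# Healthy: s = 0.6173, n = 60; Cancer: s = 0.6423, n = 59.
s_p = pooled_sd(0.6173, 60, 0.6423, 59)
print(f"pooled s.d. = {s_p:.4f}")
```

The pooled value necessarily falls between the two sample standard deviations, since it is a weighted average of the two variances.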
Comparing Two Populations
The standard error of $\bar{X}_1 - \bar{X}_2$ is then the value $s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
The number of degrees of freedom is $n_1 + n_2 - 2$
Comparing Two Populations
A $(1-\alpha)100\%$ CI for $\mu_1 - \mu_2$ is
$\bar{X}_1 - \bar{X}_2 \pm t_{\alpha/2}(n_1 + n_2 - 2)\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
Note how the sample sizes determine the CI length
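The interval can be computed directly from summary statistics. The sketch below uses the NHANES numbers and takes the critical value $t_{.025}(117) \approx 1.98$ as given (it is the rounded value quoted in this lecture's NHANES t-test slide); the function name and return layout are illustrative.

```python
import math


def pooled_t_interval(mean1, s1, n1, mean2, s2, n2, t_crit):
    """CI for mu1 - mu2 under the equal-variance assumption.

    t_crit is the critical value t_{alpha/2}(n1 + n2 - 2), supplied by the caller.
    """
    df = n1 + n2 - 2
    s_p = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    std_error = s_p * math.sqrt(1 / n1 + 1 / n2)
    diff = mean1 - mean2
    half_width = t_crit * std_error
    return diff - half_width, diff + half_width, std_error


# Healthy (group 1) versus Cancer (group 2), from the Group Statistics output.
lower, upper, std_error = pooled_t_interval(
    2.9905, 0.6173, 60, 2.6969, 0.6423, 59, t_crit=1.98
)
print(f"SE = {std_error:.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```

Small discrepancies from the SPSS output in the last digit are expected, because the inputs here are rounded summary statistics and a rounded critical value.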
Comparing Two Populations
Generally, you should make your sample sizes nearly equal, or at least not wildly unequal. Consider a total sample size of 100
$\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \approx 1$ if $n_1 = 1$, $n_2 = 99$
$\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 0.20$ if $n_1 = 50$, $n_2 = 50$
Thus, in the former case, your CI would be about 5 times longer!
$\bar{X}_1 - \bar{X}_2 \pm t_{\alpha/2}(n_1 + n_2 - 2)\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
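To see how the split of a fixed total sample size affects CI length, the factor $\sqrt{1/n_1 + 1/n_2}$ can be tabulated. This sketch holds $s_p$ and the critical value fixed (both depend only on the data and on $n_1 + n_2$), so only this factor changes; the intermediate splits are added for illustration.

```python
import math


def allocation_factor(n1, n2):
    """The sample-size factor sqrt(1/n1 + 1/n2) in the CI half-width."""
    return math.sqrt(1 / n1 + 1 / n2)


total = 100
for n1 in (1, 10, 25, 50):
    n2 = total - n1
    print(f"n1 = {n1:2d}, n2 = {n2:2d}: factor = {allocation_factor(n1, n2):.3f}")

unbalanced = allocation_factor(1, 99)
balanced = allocation_factor(50, 50)
ratio = unbalanced / balanced
print(f"ratio of CI lengths (1/99 split vs 50/50 split): {ratio:.2f}")
```

The factor shrinks as the split moves toward 50/50, which is where it is smallest for a fixed total.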
Comparing Two Populations
The CI can of course be used to test hypotheses
$H_0: \mu_1 = \mu_2$ vs $H_a: \mu_1 \neq \mu_2$
This is the same as
$H_0: \mu_1 - \mu_2 = 0$ vs $H_a: \mu_1 - \mu_2 \neq 0$
So we just need to check whether 0 is in the interval, just as we have done
Comparing Two Populations: The t-test
$H_0: \mu_1 - \mu_2 = 0$ vs $H_a: \mu_1 - \mu_2 \neq 0$
There is something called a t-test, which tells you whether 0 is in the CI.
It does not tell you where the means lie, however, so it is of limited use. P-values tell you the same thing.
Comparing Two Populations: The t-test
The t-statistic is defined by
$t = \frac{\bar{X}_1 - \bar{X}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
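Since the denominator of the t-statistic is the standard error of the difference, the statistic can be recovered from two columns of the SPSS output. A minimal sketch, using the rounded NHANES values:

```python
# Values read from the Independent Samples Test output (equal variances assumed).
mean_difference = 0.2937  # "Mean Difference" column
std_error = 0.1155        # "Std. Error Difference" column

# t = (X1bar - X2bar) / (s_p * sqrt(1/n1 + 1/n2))
t_stat = mean_difference / std_error
print(f"t = {t_stat:.2f}")
```

SPSS reports t = 2.543 from unrounded inputs, so agreement is only to about two decimal places.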
Comparing Two Populations: The t-test
You reject equality of means if $|t| > t_{\alpha/2}(n_1 + n_2 - 2)$
In this case, is $p < \alpha$ or is $p > \alpha$?
Comparing Two Populations: The t-test
You reject equality of means if $|t| > t_{\alpha/2}(n_1 + n_2 - 2)$
In this case, $p < \alpha$
NHANES Comparison: the t-test
From the SPSS output (equal variances assumed): t = 2.543, df = 117, Sig. (2-tailed) = 0.012
$t_{\alpha/2}(n_1 + n_2 - 2) = t_{.025}(117) \approx 1.98$
$t = 2.543 > t_{\alpha/2}(n_1 + n_2 - 2) \approx 1.98$, hence reject the hypothesis that the population means are equal, for $\alpha = 0.05$
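The link between the t-statistic and the two-sided p-value can be checked numerically. The sketch below is one possible approach, not how SPSS computes it: it integrates the Student's t density with Simpson's rule using only the Python standard library. The function names and the step count are illustrative choices.

```python
import math


def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p_value(t_stat, df, steps=2000):
    """P(|T| > |t_stat|), via Simpson's rule on [0, |t_stat|] (steps must be even)."""
    b = abs(t_stat)
    h = b / steps
    total = t_density(0.0, df) + t_density(b, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_density(i * h, df)
    central_area = total * h / 3  # P(0 < T < |t_stat|)
    return 1.0 - 2.0 * central_area


p = two_sided_p_value(2.543, 117)      # observed t from the SPSS output
p_at_critical = two_sided_p_value(1.98, 117)  # quoted critical value

print(f"p-value for t = 2.543: {p:.4f}")
print(f"p-value at the critical value 1.98: {p_at_critical:.4f}")
print("reject equality of means at alpha = 0.05:", p < 0.05)
```

The p-value at the critical value comes out close to 0.05, which is the sense in which $|t| > t_{\alpha/2}$ and $p < \alpha$ are the same decision rule.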
Comparing Two Populations
SPSS Demonstrations: bluebonnets and Framingham Heart Disease and Blood Pressure, as time permits