
Copyright (c) Bani K. Mallick

STAT 651

Lecture 9


Topics in Lecture #9

Comparing two population means

Output: detailed look

The t-test


Book Sections Covered in Lecture #9

Chapter 6.2


Relevant SPSS Tutorials

Transformations of Data

2-sample t-test

Paired t-test


Lecture 8 Review: Comparing Two Populations

There are two populations

Take a sample from each population

The sample sizes need not be the same

Population 1: $n_1$

Population 2: $n_2$



Each will have a sample standard deviation

Population 1: $s_1$

Population 2: $s_2$



Each sample will have a sample mean

Population 1: $\bar{X}_1$

Population 2: $\bar{X}_2$

Those are the statistics. What are the parameters?



Each population will have a population standard deviation

Population 1: $\sigma_1$

Population 2: $\sigma_2$



Each population will have a population mean

Population 1: $\mu_1$

Population 2: $\mu_2$



How do we compare the population means $\mu_1$ and $\mu_2$?

The usual way is to take their difference: $\mu_1 - \mu_2$

If the population means are equal, what is their difference?



The usual way is to take their difference: $\mu_1 - \mu_2$

If the population means are equal, their difference = 0

Suppose we form a confidence interval for the difference. From this we learn whether 0 is in the confidence interval, and hence can make decisions about the hypothesis that the population means are equal


NHANES Comparison

Group Statistics: Log(Saturated Fat)

Health Status    N     Mean      Std. Deviation    Std. Error Mean
Healthy          60    2.9905    .6173             7.969E-02
Cancer           59    2.6969    .6423             8.362E-02
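As a check on the Group Statistics output, the "Std. Error Mean" column can be reproduced from the other two columns as s divided by the square root of n. The sketch below uses the values printed in the output; the dictionary layout and the function name are illustrative, not part of SPSS.

```python
import math

# Values copied from the Group Statistics output (Log(Saturated Fat)).
groups = {
    "Healthy": {"n": 60, "mean": 2.9905, "sd": 0.6173},
    "Cancer": {"n": 59, "mean": 2.6969, "sd": 0.6423},
}


def std_error_of_mean(sd, n):
    """Standard error of a sample mean: s / sqrt(n)."""
    return sd / math.sqrt(n)


for name, g in groups.items():
    se = std_error_of_mean(g["sd"], g["n"])
    print(f"{name}: std. error of mean = {se:.5f}")
```

The printed values should agree with 7.969E-02 and 8.362E-02 up to rounding.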


NHANES Comparison: what the output looks like

Independent Samples Test: Log(Saturated Fat)

Levene's Test for Equality of Variances: F = .186, Sig. = .667

t-test for Equality of Means

                               t        df         Sig. (2-tailed)    Mean Difference    Std. Error Difference    95% CI Lower    95% CI Upper
Equal variances assumed        2.543    117        .012               .2937              .1155                    6.497E-02       .5223
Equal variances not assumed    2.542    116.627    .012               .2937              .1155                    6.488E-02       .5224

(95% CI = 95% Confidence Interval of the Difference)


NHANES Comparison: the variable


The variable being compared is Log(Saturated Fat).



NHANES Comparison: the method

The output has two rows: "Equal variances assumed" and "Equal variances not assumed". Here they give nearly identical results (t = 2.543 on 117 df versus t = 2.542 on 116.627 df).

If you think the variances are wildly different, try a transformation.



NHANES Comparison: the p-value.


Sig. (2-tailed) = .012 (in both rows of the output)



NHANES Comparison: the difference in sample means


Mean Difference = .2937



NHANES Comparison: the standard error of difference in sample means

Std. Error Difference = .1155



NHANES Comparison: the 95% confidence interval


95% Confidence Interval of the Difference (equal variances assumed): Lower = 6.497E-02 (that is, 0.0650), Upper = .5223



NHANES Comparison

The "Mean Difference" is 0.2937. Since the healthy controls had a higher mean, this is

Mean(Healthy) - Mean(Cancer)

The 95% CI is from 0.0650 to 0.5223

What is this a CI for? The difference in population mean log(saturated fat) intake between healthy controls and cancer cases:

$\mu_{\text{Healthy}} - \mu_{\text{Cancer}}$


NHANES Comparison


The 95% CI is from 0.0650 to 0.5223

The null hypothesis of interest is that the population means are equal, i.e.,

$\mu_{\text{Healthy}} - \mu_{\text{Cancer}} = 0$


NHANES Comparison


The 95% CI is from 0.0650 to 0.5223

Is the p-value p < 0.05 or p > 0.05?


NHANES Comparison


The 95% CI is from 0.0650 to 0.5223

[Figure: number line with the hypothesized value 0 lying to the left of the confidence interval, which runs from 0.0650 to 0.5223]


NHANES Comparison


The 95% CI is from 0.0650 to 0.5223


Answer: p < 0.05 since the 95% CI does not cover zero.


NHANES Comparison


The 95% CI is from 0.0650 to 0.5223


Answer: You cannot tell from a 95% CI. However, from the SPSS output, p = 0.012. (see next slide)


NHANES Comparison: the 95% confidence interval


Sig. (2-tailed) = .012

95% Confidence Interval of the Difference (equal variances assumed): Lower = 6.497E-02 (that is, 0.0650), Upper = .5223



NHANES Comparison


The 95% CI is from 0.0650 to 0.5223

What do we conclude from this confidence interval?


NHANES Comparison


The 95% CI is from 0.0650 to 0.5223

What do we conclude from this confidence interval?

With 95% confidence, the population mean log(saturated fat) intake is greater in the Healthy group by between 0.0650 and 0.5223 (exponentiate to express this on the original scale of grams of saturated fat, where it becomes a ratio rather than a difference)
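The exponentiation step can be sketched as follows. This assumes the log is the natural log, which the slides do not state; with another base, replace `math.exp` accordingly. Because a difference of logs is the log of a ratio, the back-transformed interval is an interval for a ratio (Healthy relative to Cancer), not for a difference in grams. The lower limit used here is the 6.497E-02 printed in the SPSS output.

```python
import math

# 95% CI for the difference in mean log(saturated fat), Healthy minus Cancer,
# as printed in the SPSS output (equal variances assumed).
lower_log = 0.06497
upper_log = 0.5223

# ASSUMPTION: natural logarithm. exp(difference of logs) = ratio on the gram scale.
ratio_lower = math.exp(lower_log)
ratio_upper = math.exp(upper_log)

print(f"Ratio (Healthy / Cancer) is between {ratio_lower:.3f} and {ratio_upper:.3f}")
```

A ratio interval that excludes 1 corresponds to a log-scale interval that excludes 0.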


Comparing Two Population Means: the Formulas

The data: $\bar{X}_1, s_1, n_1$ and $\bar{X}_2, s_2, n_2$

The populations: $\mu_1, \sigma_1$ and $\mu_2, \sigma_2$

The aim: CI for $\mu_1 - \mu_2$


Comparing Two Populations

Does it matter which one you call population 1 and which one you call population 2?

Not at all. The key is to interpret the difference properly.



The aim: CI for $\mu_1 - \mu_2$

This is the difference in population means

The estimate of the difference in population means is the difference in sample means, $\bar{X}_1 - \bar{X}_2$

This is a random variable: it has sample to sample variability



Difference of sample means: $\bar{X}_1 - \bar{X}_2$

"Population" mean from repeated sampling is $\mu_1 - \mu_2$

The s.d. from repeated sampling is $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$



Difference of sample means: $\bar{X}_1 - \bar{X}_2$

The s.d. from repeated sampling is $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

You need reasonably large samples from BOTH populations
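The "repeated sampling" idea can be illustrated with a small simulation. The sketch below draws many pairs of samples from two normal populations and compares the observed mean and s.d. of the difference in sample means with the population mean difference and the formula for the s.d. The population values (means 3.0 and 2.7, both s.d. 0.6), the number of repetitions, and the seed are made up for illustration.

```python
import math
import random
import statistics


def theoretical_sd(sigma1, n1, sigma2, n2):
    """s.d. of Xbar1 - Xbar2 under repeated sampling."""
    return math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)


def simulate_difference(mu1, sigma1, n1, mu2, sigma2, n2, reps=2000, seed=651):
    """Return (mean, s.d.) of Xbar1 - Xbar2 over `reps` simulated pairs of samples."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        xbar1 = statistics.fmean([rng.gauss(mu1, sigma1) for _ in range(n1)])
        xbar2 = statistics.fmean([rng.gauss(mu2, sigma2) for _ in range(n2)])
        diffs.append(xbar1 - xbar2)
    return statistics.fmean(diffs), statistics.stdev(diffs)


# Hypothetical populations, with sample sizes matching the NHANES example.
sim_mean, sim_sd = simulate_difference(3.0, 0.6, 60, 2.7, 0.6, 59)
print(f"simulated mean of difference = {sim_mean:.4f} (population value 0.3)")
print(f"simulated s.d. of difference = {sim_sd:.4f}")
print(f"formula s.d.                 = {theoretical_sd(0.6, 60, 0.6, 59):.4f}")
```

The simulated values will not match the formula exactly, but they should be close and get closer as the number of repetitions grows.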



If you can reasonably believe that the population sd's are nearly equal, it is customary to pick the equal variance assumption and estimate the common standard deviation by

$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$
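A minimal sketch of the pooled standard deviation, applied to the NHANES summary statistics (n = 60, s = .6173 for Healthy; n = 59, s = .6423 for Cancer). The function name is illustrative.

```python
import math


def pooled_sd(s1, n1, s2, n2):
    """Pooled estimate of a common population standard deviation."""
    numerator = (n1 - 1) * s1**2 + (n2 - 1) * s2**2
    return math.sqrt(numerator / (n1 + n2 - 2))


# NHANES Log(Saturated Fat): Healthy is group 1, Cancer is group 2.
s_p = pooled_sd(0.6173, 60, 0.6423, 59)
print(f"pooled s.d. = {s_p:.4f}")
```

The pooled value always lies between the two sample standard deviations, since the pooled variance is a weighted average of the two sample variances.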



The standard error of $\bar{X}_1 - \bar{X}_2$ is then the value

$s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$

The number of degrees of freedom is $n_1 + n_2 - 2$
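Continuing with the NHANES summary statistics, the following sketch computes the standard error of the difference and the degrees of freedom. The results can be compared with the "Std. Error Difference" (.1155) and "df" (117) entries of the SPSS output. Function names are illustrative.

```python
import math


def pooled_sd(s1, n1, s2, n2):
    """Pooled estimate of a common population standard deviation."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


def std_error_of_difference(s1, n1, s2, n2):
    """Standard error of Xbar1 - Xbar2 under the equal variance assumption."""
    return pooled_sd(s1, n1, s2, n2) * math.sqrt(1 / n1 + 1 / n2)


n1, s1 = 60, 0.6173  # Healthy
n2, s2 = 59, 0.6423  # Cancer

se_diff = std_error_of_difference(s1, n1, s2, n2)
df = n1 + n2 - 2
print(f"std. error of difference = {se_diff:.4f}, df = {df}")
```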



A $(1-\alpha)100\%$ CI for $\mu_1 - \mu_2$ is

$\bar{X}_1 - \bar{X}_2 \pm t_{\alpha/2}(n_1 + n_2 - 2)\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$

Note how the sample sizes determine the CI length
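The interval can be sketched numerically for the NHANES example. To stay within the standard library, the critical value is not computed but taken as the rounded value 1.98 for 117 degrees of freedom used later in this lecture; with a statistics package you would look it up from the t distribution instead. The mean difference and standard error are the values printed in the SPSS output.

```python
def two_sample_ci(mean_diff, se_diff, t_crit):
    """CI for mu1 - mu2: estimate +/- critical value * standard error."""
    margin = t_crit * se_diff
    return mean_diff - margin, mean_diff + margin


mean_diff = 0.2937  # Mean(Healthy) - Mean(Cancer), from the SPSS output
se_diff = 0.1155    # Std. Error Difference, from the SPSS output
t_crit = 1.98       # ASSUMPTION: rounded t_{.025}(117)

lower, upper = two_sample_ci(mean_diff, se_diff, t_crit)
print(f"95% CI: {lower:.4f} to {upper:.4f}")
```

Up to rounding of the inputs, this should agree with the SPSS limits 6.497E-02 and .5223.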



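Generally, you should make your sample sizes nearly equal, or at least not wildly unequal. Consider a total sample size of 100

$\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \approx 1$ if $n_1 = 1$, $n_2 = 99$

$\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 0.20$ if $n_1 = 50$, $n_2 = 50$

Thus, in the former case, your CI would be 5 times longer!

The CI is $\bar{X}_1 - \bar{X}_2 \pm t_{\alpha/2}(n_1 + n_2 - 2)\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$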
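The effect of the split can be tabulated directly. The sketch below evaluates the sample-size factor of the CI length for a few ways of dividing a total of 100 observations; the particular splits, other than 1/99 and 50/50, are chosen only for illustration.

```python
import math


def length_factor(n1, n2):
    """The factor sqrt(1/n1 + 1/n2) that multiplies t * s_p in the CI."""
    return math.sqrt(1 / n1 + 1 / n2)


total = 100
for n1 in (1, 5, 10, 25, 50):
    n2 = total - n1
    print(f"n1 = {n1:2d}, n2 = {n2:2d}: factor = {length_factor(n1, n2):.3f}")

ratio = length_factor(1, 99) / length_factor(50, 50)
print(f"1/99 split versus 50/50 split: about {ratio:.1f} times longer")
```

The comparison holds the critical value and $s_p$ fixed, so it isolates the contribution of the sample sizes alone.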



The CI can of course be used to test hypotheses

$H_0: \mu_1 = \mu_2$ vs $H_a: \mu_1 \neq \mu_2$

This is the same as

$H_0: \mu_1 - \mu_2 = 0$ vs $H_a: \mu_1 - \mu_2 \neq 0$

So we just need to check whether 0 is in the interval, just as we have done
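The decision rule based on the interval is short enough to write out. In this sketch the function name and the second, made-up interval are illustrative; the first interval is the NHANES one from the SPSS output.

```python
def reject_equal_means(lower, upper, hypothesized=0.0):
    """Reject H0 when the hypothesized difference lies outside the CI."""
    return not (lower <= hypothesized <= upper)


# NHANES 95% CI for mu(Healthy) - mu(Cancer), from the SPSS output.
print(reject_equal_means(0.06497, 0.5223))

# A hypothetical interval that covers 0, for contrast.
print(reject_equal_means(-0.10, 0.30))
```

With a 95% interval, rejecting corresponds to a two-sided p-value below 0.05.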


Comparing Two Populations: The t-test

There is something called a t-test, which gives you the information as to whether 0 is in the CI.

It does not tell you where the means lie however, so it is of limited use. P-values tell you the same thing.

$H_0: \mu_1 - \mu_2 = 0$ vs $H_a: \mu_1 - \mu_2 \neq 0$



The t-statistic is defined by

$t = \frac{\bar{X}_1 - \bar{X}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
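As a worked instance, the t-statistic can be computed from the NHANES summary statistics alone. Because the printed means and standard deviations are rounded, the result should match the SPSS value of 2.543 only approximately. The function name is illustrative.

```python
import math


def two_sample_t(xbar1, s1, n1, xbar2, s2, n2):
    """Equal-variance two-sample t-statistic computed from summary statistics."""
    s_p = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    se_diff = s_p * math.sqrt(1 / n1 + 1 / n2)
    return (xbar1 - xbar2) / se_diff


# Healthy is group 1, Cancer is group 2 (values from the Group Statistics output).
t_stat = two_sample_t(2.9905, 0.6173, 60, 2.6969, 0.6423, 59)
print(f"t = {t_stat:.3f}")
```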



You reject equality of means if

$|t| > t_{\alpha/2}(n_1 + n_2 - 2)$

In this case, is $p < \alpha$ or is $p > \alpha$?



You reject equality of means if

$|t| > t_{\alpha/2}(n_1 + n_2 - 2)$

In this case, $p < \alpha$


NHANES Comparison: the t-test


t = 2.543, df = 117, Sig. (2-tailed) = .012, Mean Difference = .2937 (equal variances assumed)

$t_{\alpha/2}(n_1 + n_2 - 2) = t_{.025}(117) \approx 1.98$

$|t| = 2.543 > t_{\alpha/2}(n_1 + n_2 - 2) \approx 1.98$, hence reject the hypothesis that the population means are equal, for $\alpha = 0.05$
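The link between the rejection rule and the p-value can be checked numerically. The sketch below is one way to get a two-sided p-value with only the standard library: it integrates the t density with Simpson's rule. This is an illustrative approach, not how SPSS computes "Sig. (2-tailed)"; the function names and the number of integration steps are assumptions.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p_value(t_stat, df, steps=2000):
    """P(|T| > |t_stat|), using Simpson's rule on [0, |t_stat|] (steps must be even)."""
    b = abs(t_stat)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 < T < |t_stat|)
    return 2 * (0.5 - central_area)


t_stat, df, alpha, t_crit = 2.543, 117, 0.05, 1.98
p = two_sided_p_value(t_stat, df)
print(f"p = {p:.4f}")
print("reject by t rule:", abs(t_stat) > t_crit)
print("reject by p rule:", p < alpha)
```

Both rules should agree, and the p-value should round to the .012 reported by SPSS.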



SPSS Demonstrations: bluebonnets and Framingham Heart Disease and Blood Pressure, as time permits
