comparing means: confidence intervals and hypotheses tests for the difference between two population...
TRANSCRIPT
Comparing Means: Confidence Intervals and Hypothesis Tests for the Difference between Two
Population Means µ1 − µ2
Confidence Intervals for the Difference between Two Population Means µ1 - µ2: Independent Samples
• Two random samples are drawn from the two populations of interest.
• Because we compare two population means, we use the statistic $\bar{x}_1 - \bar{x}_2$.
               Population 1                   Population 2
Parameters:    µ1 and σ1² (values unknown)    µ2 and σ2² (values unknown)
Sample size:   n1                             n2
Statistics:    x̄1 and s1²                     x̄2 and s2²

Estimate µ1 − µ2 with x̄1 − x̄2.
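Sampling distribution model for $\bar{x}_1 - \bar{x}_2$?

$$ SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} $$

$$ df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2} $$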
A sometimes-used (though not always very good) estimate of the
degrees of freedom is
min(n1 − 1, n2 − 1).
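The two choices of degrees of freedom can be compared directly. The following is a minimal Python sketch; the function names `welch_df` and `conservative_df` and the summary statistics in the example call are illustrative assumptions, not values from these slides.

```python
def welch_df(s1, n1, s2, n2):
    """Approximate df for the two-sample t procedures (unequal variances)."""
    v1 = s1 ** 2 / n1  # estimated variance of the first sample mean
    v2 = s2 ** 2 / n2  # estimated variance of the second sample mean
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


def conservative_df(n1, n2):
    """Simpler, more conservative choice of df."""
    return min(n1 - 1, n2 - 1)


# Hypothetical summary statistics, chosen only for illustration.
df_formula = welch_df(s1=10.0, n1=12, s2=4.0, n2=20)
df_simple = conservative_df(12, 20)
print(f"formula df = {df_formula:.2f}, min rule df = {df_simple}")
```

The formula value always falls between min(n1 − 1, n2 − 1) and n1 + n2 − 2, so the min rule never overstates the df; it leads to a larger t* and a somewhat wider interval.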
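$$ E(\bar{x}_1 - \bar{x}_2) = \mu_1 - \mu_2, \qquad SD(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} $$

Shape? By the Central Limit Theorem,

$$ \bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2,\ \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right) $$

Estimate the unknown σ1 and σ2 using s1 and s2, and use the t distribution with df degrees of freedom:

$$ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} $$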
Two-sample t confidence interval with confidence level C
Practical use of t: t*
C is the area between −t* and t* under the t density curve.
If df is an integer, we can find the value of t* in the line of the t-table for the correct df and the column for confidence level C.
If df is not an integer, find the value of t* using technology.
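Confidence Interval for µ1 − µ2

$$ (\bar{x}_1 - \bar{x}_2) \pm t^*_{df} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} $$

where $t^*_{df}$ is the value from the t-table that corresponds to the confidence level, and

$$ df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2} $$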
Example: “Cameron Crazies”. Confidence interval for µ1 − µ2
Do the “Cameron Crazies” at Duke home games help the Blue Devils play better defense?
Below are the points allowed by Duke (men) at home and on the road in conference games from a recent season.

Pts allowed at home:  44 56 44 54 75 101 91 81
Pts allowed on road:  58 56 70 74 80 67 65 79

         x̄        s       n
home:    68.25    21.8    8
road:    68.63    8.9     8
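Calculate a 95% CI for µ1 − µ2, where
µ1 = mean points per game allowed by Duke at home
µ2 = mean points per game allowed by Duke on the road
• n1 = 8, n2 = 8; s1² = 475.36; s2² = 79.41 (computed from the unrounded standard deviations, s1 ≈ 21.80 and s2 ≈ 8.91)

$$ df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2} = \frac{\left(\frac{475.36}{8} + \frac{79.41}{8}\right)^2}{\frac{1}{7}\left(\frac{475.36}{8}\right)^2 + \frac{1}{7}\left(\frac{79.41}{8}\right)^2} = 9.27 $$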
• To use the t-table, let’s use df = 9; t9* = 2.2622
• The confidence interval estimator for the difference between two means is

$$ (\bar{x}_1 - \bar{x}_2) \pm t^*_9 \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (68.25 - 68.63) \pm 2.2622\sqrt{\frac{475.36}{8} + \frac{79.41}{8}} = -0.38 \pm 18.84 = (-19.22,\ 18.46) $$
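The interval can also be recomputed from the raw points-allowed data. This is a minimal sketch using only the Python standard library; the function name `welch_interval` is an illustrative choice, and t* = 2.2622 is taken from the t-table value quoted in the slides rather than computed.

```python
import math
import statistics

home = [44, 56, 44, 54, 75, 101, 91, 81]
road = [58, 56, 70, 74, 80, 67, 65, 79]


def welch_interval(sample1, sample2, t_star):
    """Two-sample t interval for mu1 - mu2, plus the approximate df."""
    n1, n2 = len(sample1), len(sample2)
    v1 = statistics.variance(sample1) / n1  # s1^2 / n1
    v2 = statistics.variance(sample2) / n2  # s2^2 / n2
    diff = statistics.mean(sample1) - statistics.mean(sample2)
    margin = t_star * math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return diff - margin, diff + margin, df


low, high, df = welch_interval(home, road, t_star=2.2622)
print(f"df = {df:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

Because the sketch works from unrounded means and standard deviations, its endpoints can differ from the hand calculation in the second decimal place.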
Interpretation
• The 95% CI for µ1 − µ2 is (−19.22, 18.46).
• Since the interval contains 0, there appears to be no significant difference between
µ1 = mean points per game allowed by Duke at home and
µ2 = mean points per game allowed by Duke on the road.
• The Cameron Crazies appear to have no effect on the ABILITY of the Duke men to play better defense.
How can this be?
Example: 95% confidence interval for µ1 − µ2
• Example
– Do people who eat high-fiber cereal for breakfast consume, on average, fewer calories for lunch than people who do not eat high-fiber cereal for breakfast?
– A sample of 150 people was randomly drawn. Each person was identified as a consumer or a non-consumer of high-fiber cereal.
– For each person the number of calories consumed at lunch was recorded.

Consumers   Non-consumers
568         705
498         819
589         706
681         509
540         613
646         582
636         601
739         608
539         787
596         573
607         428
529         754
637         741
617         628
633         537
555         748
…           …
Solution:
• The parameter to be tested is the difference between two means.
• The claim to be tested is: the mean caloric intake of consumers (µ1) is less than that of non-consumers (µ2).

n1 = 43, n2 = 107
x̄1 = 604.02, x̄2 = 633.239
s1² = 4103, s2² = 10670

$$ df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2} = 122.6 $$
• Let’s use df = 122.6; t122.6* = 1.9795
• The confidence interval estimator for the difference between two means is

$$ (\bar{x}_1 - \bar{x}_2) \pm t^*_{122.6} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (604.02 - 633.239) \pm 1.9795\sqrt{\frac{4103}{43} + \frac{10670}{107}} = -29.21 \pm 27.652 = (-56.862,\ -1.56) $$

Interpretation
• The 95% CI is (−56.862, −1.56).
• Since the interval is entirely negative (that is, does not contain 0), there is evidence from the data that µ1 is less than µ2. We estimate that non-consumers of high-fiber cereal consume on average between 1.56 and 56.862 more calories for lunch.
Example (cont.): confidence interval for µ1 − µ2 using min(n1 − 1, n2 − 1) to approximate the df
• Let’s use df = min(43 − 1, 107 − 1) = min(42, 106) = 42
• t42* = 2.0181
• The confidence interval estimator for the difference between two means is

$$ (\bar{x}_1 - \bar{x}_2) \pm t^*_{42} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (604.02 - 633.239) \pm 2.0181\sqrt{\frac{4103}{43} + \frac{10670}{107}} = -29.21 \pm 28.19 = (-57.40,\ -1.02) $$
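The effect of the df choice on the cereal interval can be checked with a few lines of Python. This is a sketch under the assumption that the two t* values (1.9795 for df = 122.6 and 2.0181 for df = 42) are taken as given from the slides; the helper name `interval` is illustrative.

```python
import math

n1, mean1, var1 = 43, 604.02, 4103      # consumers: n, mean, s^2
n2, mean2, var2 = 107, 633.239, 10670   # non-consumers: n, mean, s^2

diff = mean1 - mean2
se = math.sqrt(var1 / n1 + var2 / n2)   # standard error of the difference


def interval(t_star):
    """Confidence interval for mu1 - mu2 given a critical value t*."""
    margin = t_star * se
    return diff - margin, diff + margin


ci_formula = interval(1.9795)   # t* for df = 122.6
ci_min_rule = interval(2.0181)  # t* for df = min(42, 106) = 42

print("df = 122.6:", tuple(round(x, 2) for x in ci_formula))
print("df = 42:   ", tuple(round(x, 2) for x in ci_min_rule))
```

Both intervals use the same standard error; only t* changes, so the min rule gives the slightly wider interval. Small differences from the slide values come from rounding the difference of means to −29.21.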
Beware!! Common Mistake!!!
A common mistake is to calculate a one-sample confidence interval for µ1, a one-sample confidence interval for µ2, and then to conclude that µ1 and µ2 are equal if the confidence intervals overlap.
This is WRONG because the variability in the sampling distribution for $\bar{x}_1 - \bar{x}_2$ from two independent samples is more complex and must take into account variability coming from both samples. Hence the more complex formula for the standard error.

$$ SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} $$
INCORRECT
Two single-sample 95% confidence intervals: the confidence interval for the male mean and the confidence interval for the female mean overlap, suggesting no significant difference between the true mean for males and the true mean for females.

              Male    Female
mean          19.4    17.9
st. dev. s    2.52    3.39
n             50      50

Male interval: (18.68, 20.12)
Female interval: (16.94, 18.86)

CORRECT
The 2-sample 95% confidence interval of the form

$$ (\bar{y}_1 - \bar{y}_2) \pm t^*_{.025,\,df} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} $$

for the difference between the means $\mu_{male} - \mu_{female}$ is (.313, 2.69). The interval is entirely positive, suggesting a significant difference between the true mean for males and the true mean for females (evidence that the true male mean is larger than the true female mean).
[Number line: the interval from .313 to 2.69, centered at 1.5, lies entirely to the right of 0.]
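Reason for Contradictory Result
It's always true that $\sqrt{a + b} < \sqrt{a} + \sqrt{b}$ for positive a and b. Specifically,

$$ \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} < \frac{s_1}{\sqrt{n_1}} + \frac{s_2}{\sqrt{n_2}} $$

$$ SE(\bar{x}_1 - \bar{x}_2) < SE(\bar{x}_1) + SE(\bar{x}_2) $$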
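The inequality can be checked numerically with the male/female summary statistics from the overlap example. This is a small sketch; the variable names are illustrative.

```python
import math

s_male, n_male = 2.52, 50
s_female, n_female = 3.39, 50

se_male = s_male / math.sqrt(n_male)        # SE of the male sample mean
se_female = s_female / math.sqrt(n_female)  # SE of the female sample mean

# SE of the difference between the two independent sample means
se_diff = math.sqrt(se_male ** 2 + se_female ** 2)

print(f"SE(male) + SE(female) = {se_male + se_female:.4f}")
print(f"SE(difference)        = {se_diff:.4f}")
```

Checking whether two one-sample intervals overlap amounts to comparing the difference in means with a multiple of SE(male) + SE(female), which is larger than the correct SE(difference). That is why the intervals can overlap even though the two-sample interval excludes 0.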
Does smoking damage the lungs of children exposed
to parental smoking?
Forced vital capacity (FVC) is the volume (in milliliters) of
air that an individual can exhale in 6 seconds.
FVC was obtained for a sample of children not exposed to
parental smoking and a group of children exposed to
parental smoking.
We want to know whether parental smoking decreases
children’s lung capacity as measured by the FVC test.
Is the mean FVC lower in the population of children
exposed to parental smoking?
Parental smoking    FVC x̄    s       n
Yes                 75.5     9.3     30
No                  88.2     15.1    30

µ1 = mean FVC of children with a smoking parent;
µ2 = mean FVC of children without a smoking parent

95% confidence interval for (µ1 − µ2), with df = 48.23 and t* = 2.0104:

$$ (\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (75.5 - 88.2) \pm 2.0104\sqrt{\frac{9.3^2}{30} + \frac{15.1^2}{30}} = -12.7 \pm 2.0104(3.24) = -12.7 \pm 6.51 = (-19.21,\ -6.19) $$

$$ df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2} = 48.23 $$

We are 95% confident that lung capacity is between 6.19 and 19.21 milliliters LESS in children of smoking parents.
Do left-handed people have a shorter life expectancy than right-handed people?
Some psychologists believe that the stress of being left-handed in a right-handed world leads to earlier deaths among left-handers.
Several studies have compared the life expectancies of left-handers and right-handers. One such study resulted in the data shown in the table.
We will use the data to construct a confidence interval for the difference in mean life expectancies for left-handers and right-handers.
Is the mean life expectancy of left-handers less than the mean life expectancy of right-handers?

Handedness    Mean age at death x̄    s       n
Left          66.8                   25.3    99
Right         75.2                   15.1    888

[Photos: left-handed presidents; star left-handed quarterback Steve Young; the “Bambino”, left-handed Babe Ruth, baseball’s all-time best player.]

µ1 = mean life expectancy of left-handers;
µ2 = mean life expectancy of right-handers

95% confidence interval for (µ1 − µ2), with df = 105.92 and t* = 1.9826:

$$ (\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (66.8 - 75.2) \pm 1.9826\sqrt{\frac{(25.3)^2}{99} + \frac{(15.1)^2}{888}} = -8.4 \pm 1.9826(2.59) = -8.4 \pm 5.13 = (-13.53,\ -3.27) $$

We are 95% confident that the mean life expectancy for left-handers is between 3.27 and 13.53 years LESS than the mean life expectancy for right-handers.
Two-sample t-test
The null hypothesis H0 is that both population means µ1 and µ2 are equal, thus their difference is equal to zero.

Test statistic:

$$ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} $$

Because in a two-sample test H0 says (µ1 − µ2) = 0, the test statistic is

$$ t_0 = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} $$

H0: µ1 − µ2 = 0
HA: µ1 − µ2 > 0 (1-tail)    P-value = P(t > t0)
HA: µ1 − µ2 < 0 (1-tail)    P-value = P(t < t0)
HA: µ1 − µ2 ≠ 0 (2-tail)    P-value = 2P(t > |t0|)
Hypothesis test: does smoking damage the lungs of children exposed to parental smoking?
Is the mean FVC lower in the population of children exposed to parental smoking?

Parental smoking    FVC x̄    s       n
Yes                 75.5     9.3     30
No                  88.2     15.1    30

µ1 = mean FVC of children with a smoking parent;
µ2 = mean FVC of children without a smoking parent

H0: µ1 − µ2 = 0
Ha: µ1 − µ2 < 0

$$ df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2} = 48.23 $$

$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{75.5 - 88.2}{\sqrt{\frac{9.3^2}{30} + \frac{15.1^2}{30}}} = \frac{-12.7}{\sqrt{2.9 + 7.6}} = -3.9 $$

P-value = P(t < −3.9) ≈ .0001
Conclusion: Reject H0. Lung capacity is significantly impaired in children of smoking parents.
Recall the 95% CI for µ1 − µ2: (−19.21, −6.19), which lies entirely below 0.
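With a non-integer df the P-value has to come from technology. The following is a minimal standard-library sketch for the FVC test: it computes the test statistic and df from the summary table and approximates the lower-tail probability by numerically integrating the t density with Simpson's rule. The helper names `t_pdf` and `lower_tail` and the choice of numerical integration are assumptions made for illustration; a statistics package or calculator would normally be used instead.

```python
import math


def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def lower_tail(t0, df, steps=2000):
    """P(T < t0): Simpson's rule on [0, |t0|] plus symmetry about 0.

    steps must be even for Simpson's rule.
    """
    a = abs(t0)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 < T < |t0|)
    return 0.5 + area if t0 >= 0 else 0.5 - area


mean1, s1, n1 = 75.5, 9.3, 30    # children with a smoking parent
mean2, s2, n2 = 88.2, 15.1, 30   # children without a smoking parent

v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
t0 = (mean1 - mean2) / math.sqrt(v1 + v2)
df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
p_value = lower_tail(t0, df)     # Ha: mu1 - mu2 < 0, so use the lower tail

print(f"t = {t0:.2f}, df = {df:.2f}, P-value = {p_value:.5f}")
```

For the other alternatives, the same helper gives the upper tail as `1 - lower_tail(t0, df)` and the two-sided P-value as `2 * (1 - lower_tail(abs(t0), df))`.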
Can directed reading activities in the classroom help improve reading ability?
A class of 21 third-graders participates in these activities for 8 weeks while a
control classroom of 23 third-graders follows the same curriculum without the
activities. After 8 weeks, all children take a reading test (scores in table).
µ1 = mean test score of activities participants
µ2 = mean test score of controls

H0: µ1 − µ2 = 0
HA: µ1 − µ2 > 0

$$ t = \frac{51.48 - 41.52}{\sqrt{\frac{11.01^2}{21} + \frac{17.15^2}{23}}} = 2.31, \qquad df = 37.86 $$

P-value = P(t37.86 > 2.31) = .013
There is evidence that reading activities improve reading ability.
Robustness
The two-sample t procedures are more robust than the one-sample t procedures. They are the most robust when both
sample sizes are equal and both sample distributions are similar.
But even when we deviate from this, two-sample tests tend to
remain quite robust.
When planning a two-sample study, choose equal sample sizes
if you can.
As a guideline, a combined sample size (n1 + n2) of 40 or more
will allow you to work even with the most skewed distributions.
Pooled two-sample procedures
There are two versions of the two-sample t-test: one assuming
equal variance (“pooled 2-sample test”) and one not assuming
equal variance (“unequal” variance, as we have studied) for the
two populations. They have slightly different formulas and
degrees of freedom.
[Figure: two normally distributed populations with unequal variances]
The pooled (equal variance) two-sample t-test was often used before computers because, when both populations are normal with equal variances, its test statistic has exactly the t distribution with n1 + n2 − 2 degrees of freedom.
However, the assumption of equal variance is hard to check, and thus the unequal variance test is safer.
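Pooled two-sample procedures (cont.)
When both populations have the same standard deviation, the pooled estimator of σ² is:

$$ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} $$

The sampling distribution for $\bar{x}_1 - \bar{x}_2$, standardized using $s_p$, has exactly the t distribution with (n1 + n2 − 2) degrees of freedom.
A level C confidence interval for µ1 − µ2 is

$$ (\bar{x}_1 - \bar{x}_2) \pm t^* s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} $$

(with area C between −t* and t*)
To test the hypothesis H0: µ1 − µ2 = 0 against a one-sided or a two-sided alternative, compute the pooled two-sample t statistic

$$ t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} $$

for the t(n1 + n2 − 2) distribution.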
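As an illustration of the pooled procedure, the sketch below applies the standard pooled-variance formula, $s_p^2 = [(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2]/(n_1 + n_2 - 2)$, to the FVC summary statistics from the parental-smoking example. The function name `pooled_t` is an illustrative assumption, and using the pooled test on these data is only for comparison, since the two sample standard deviations (9.3 and 15.1) differ noticeably.

```python
import math


def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Pooled two-sample t statistic and its degrees of freedom."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df  # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


# FVC summary statistics: children with vs. without a smoking parent
t_pooled, df_pooled = pooled_t(75.5, 9.3, 30, 88.2, 15.1, 30)
print(f"pooled t = {t_pooled:.2f} with df = {df_pooled}")
```

When the two sample sizes are equal, as here, the pooled and unequal-variance t statistics coincide; the procedures then differ only in their degrees of freedom (58 versus 48.23).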
Matched pairs t procedures
Sometimes we want to compare treatments or conditions at the
individual level. These situations produce two samples that are not
independent — they are related to each other. The members of one
sample are identical to, or matched (paired) with, the members of the
other sample.
– Example: Pre-test and post-test studies look at data collected on the
same sample elements before and after some experiment is performed.
– Example: Twin studies often try to sort out the influence of genetic
factors by comparing a variable between sets of twins.
– Example: Using people matched for age, sex, and education in social
studies allows canceling out the effect of these potential lurking
variables.
Matched pairs t procedures
• The data:
– “before”: x11 x12 x13 … x1n
– “after”: x21 x22 x23 … x2n
• The data we deal with are the differences di of the paired values:
d1 = x11 – x21 d2 = x12 – x22 d3 = x13 – x23 … dn = x1n – x2n
• A confidence interval for matched pairs data is calculated just like a confidence interval for one-sample data:

$$ \bar{d} \pm t^*_{n-1} \frac{s_d}{\sqrt{n}} $$

• A matched pairs hypothesis test is just like a one-sample test:
H0: µdifference = 0; Ha: µdifference > 0 (or < 0, or ≠ 0)
Sweetening loss in colas
The sweetness loss due to storage was evaluated by 10 professional tasters (comparing the sweetness before and after storage):

Taster    Sweetness loss (before sweetness − after sweetness)
1         2.0
2         0.4
3         0.7
4         2.0
5         −0.4
6         2.2
7         −1.3
8         1.2
9         1.1
10        2.3

Summary stats: d̄ = 1.02, s = 1.196

95% confidence interval:
1.02 ± 2.2622(1.196/sqrt(10)) = 1.02 ± 2.2622(.3782) = 1.02 ± .8556 = (.1644, 1.8756)

We want to test if storage results in a loss of sweetness, thus:
H0: µdifference = 0 versus Ha: µdifference > 0
This is a pre-/post-test design and the variable is the cola sweetness before storage minus cola sweetness after storage.
A matched pairs test of significance is indeed just like a one-sample test.
Sweetening loss in colas hypothesis test
• H0: µdifference = 0 vs Ha: µdifference > 0
• Test statistic:

$$ t = \frac{1.02 - 0}{1.196/\sqrt{10}} = \frac{1.02}{.3782} = 2.6970 $$

• From t-table: for df = 9, 2.2622 < t = 2.6970 < 2.8214, so .01 < P-value < .025
• TI-83 gives P-value = .012263…
• Conclusion: reject H0 and conclude colas do lose sweetness in storage (note that the CI was entirely positive).
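The cola calculations can be reproduced from the ten sweetness-loss values with the Python standard library. This is a minimal sketch; t* = 2.2622 (df = 9, 95% confidence) is taken from the t-table value quoted in the slides, and the variable names are illustrative.

```python
import math
import statistics

# Sweetness loss (before minus after) for the 10 tasters
losses = [2.0, 0.4, 0.7, 2.0, -0.4, 2.2, -1.3, 1.2, 1.1, 2.3]

n = len(losses)
d_bar = statistics.mean(losses)   # mean difference
s_d = statistics.stdev(losses)    # sample standard deviation of the differences
se = s_d / math.sqrt(n)           # standard error of the mean difference

t_star = 2.2622                   # table value for df = n - 1 = 9
ci = (d_bar - t_star * se, d_bar + t_star * se)
t0 = (d_bar - 0) / se             # one-sample t statistic for H0: mu_difference = 0

print(f"mean = {d_bar:.2f}, s = {s_d:.3f}")
print(f"95% CI = ({ci[0]:.4f}, {ci[1]:.4f}), t = {t0:.3f}")
```

The paired analysis only ever touches the single list of differences, which is what makes it a one-sample procedure.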
Does lack of caffeine increase depression?
Individuals diagnosed as caffeine-dependent are
deprived of caffeine-rich foods and assigned
to receive daily pills. Sometimes, the pills
contain caffeine and other times they contain
a placebo. Depression was assessed (larger number means more depression).
– There are 2 data points for each subject, but we’ll only look at the difference.
– The sample distribution appears appropriate for a t-test.
Subject   Depression with caffeine   Depression with placebo   Placebo − caffeine
1         5                          16                        11
2         5                          23                        18
3         4                          5                         1
4         3                          7                         4
5         8                          14                        6
6         5                          24                        19
7         0                          6                         6
8         0                          3                         3
9         2                          15                        13
10        11                         12                        1
11        1                          0                         −1

11 “difference” data points.
[Figure: normal quantile plot of the differences (vertical axis: DIFFERENCE, from −5 to 20; horizontal axis: normal quantiles, from −2 to 2)]
Hypothesis Test: Does lack of caffeine increase depression?
For each individual in the sample, we have calculated a difference in depression score (placebo minus caffeine).
There were 11 “difference” points, thus df = n − 1 = 10.
We calculate that x̄ = 7.36; s = 6.92
H0: µdifference = 0; Ha: µdifference > 0

$$ t = \frac{\bar{x} - 0}{s/\sqrt{n}} = \frac{7.36}{6.92/\sqrt{11}} = 3.53 $$

For df = 10, 3.169 < t = 3.53 < 3.581, so 0.005 > p > 0.0025.
TI-83 gives P-value = .0027
Caffeine deprivation causes a significant increase in depression.
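Starting from the two columns of depression scores, the paired analysis first forms the per-subject differences and then treats them as a single sample. A minimal standard-library sketch follows; the list names are illustrative.

```python
import math
import statistics

# Depression scores for the 11 subjects, in subject order
caffeine = [5, 5, 4, 3, 8, 5, 0, 0, 2, 11, 1]
placebo = [16, 23, 5, 7, 14, 24, 6, 3, 15, 12, 0]

# Placebo minus caffeine, one difference per subject
diffs = [p - c for p, c in zip(placebo, caffeine)]

n = len(diffs)
d_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)
t0 = (d_bar - 0) / (s_d / math.sqrt(n))  # df = n - 1 = 10

print(f"differences = {diffs}")
print(f"mean = {d_bar:.2f}, s = {s_d:.2f}, t = {t0:.2f}")
```

The resulting t statistic is then compared with the t distribution with n − 1 = 10 degrees of freedom, as in the table lookup above.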
Which type of test? One sample, paired samples, two samples?
• Comparing vitamin content of bread
immediately after baking vs. 3 days
later (the same loaves are used on day
one and 3 days later).
Paired
• Comparing vitamin content of bread
immediately after baking vs. 3 days
later (tests made on independent
loaves).
Two samples
• Average fuel efficiency for 2005
vehicles is 21 miles per gallon. Is
average fuel efficiency higher in the
new generation “green vehicles”?
One sample
• Is blood pressure altered by use of
an oral contraceptive? Comparing a
group of women not using an oral
contraceptive with a group taking it.
Two samples
• Review insurance records for dollar
amount paid after fire damage in
houses equipped with a fire
extinguisher vs. houses without one.
Was there a difference in the
average dollar amount paid?
Two samples