Week 10
Comparing Two Means or Proportions
Generalising from sample
Individuals                  | Measurement               | Groups                 | Question
Children aged 10             | Mark in maths test        | Boys & girls           | Are male marks higher on average?
Plots in field               | Yield of wheat            | Varieties A & B        | Which gives higher yields?
Cars leaving production line | CO emissions from exhaust | Production lines 1 & 2 | Are both lines same?
Generalising from sample
Individuals                  | Measurement                   | Groups                 | Question
Children aged 10             | Pass/fail in maths test       | Boys & girls           | Are males more likely to pass?
Cabbages in field            | Infected by cabbage butterfly | Varieties A & B        | Which is less likely to be infected?
Cars leaving production line | Rattle in exhaust             | Production lines 1 & 2 | Do both lines have same chance of rattle?
Numerical measurements: means
What is the difference in average weight loss between those who diet and those who exercise to lose weight?
What difference is there between the mean foot lengths of men and women?
Population parameter $\mu_2 - \mu_1$ = difference between population means
Sample estimate $\bar{x}_2 - \bar{x}_1$ = difference between sample means
Categorical measurements: propns
What is the difference between the proportions that would quit smoking if taking the antidepressant bupropion (Zyban) versus wearing a nicotine patch?
What is the difference between the proportions who have heart disease among men who snore and men who don't snore?
Population parameter $\pi_2 - \pi_1$ = difference between population proportions
Sample estimate $p_2 - p_1$ = difference between sample proportions
Requirement: independent samples
Random samples taken separately from 2 populations
Randomised experiment with 2 treatments
One random sample, but a categorical variable splits individuals into 2 groups.
Two samples are called independent samples when the measurements in one sample are not related to the measurements in the other sample.
Model for numerical data
Sample 1 ~ population (mean $\mu_1$, s.d. $\sigma_1$)
Sample 2 ~ population (mean $\mu_2$, s.d. $\sigma_2$)
Estimation: estimate $(\mu_2 - \mu_1)$ with $(\bar{x}_2 - \bar{x}_1)$. Standard error? Confidence interval?
Testing: is $(\mu_2 - \mu_1)$ zero? p-value
Model for categorical data
Sample 1 ~ population (proportion $\pi_1$)
Sample 2 ~ population (proportion $\pi_2$)
Estimation: estimate $(\pi_2 - \pi_1)$ with $(p_2 - p_1)$. Standard error? Confidence interval?
Testing: is $(\pi_2 - \pi_1)$ zero? p-value
Distribution of difference
In both cases, we need to find the distribution of the difference, $(p_2 - p_1)$ or $(\bar{x}_2 - \bar{x}_1)$.
Independent samples, so each is a difference of independent random variables.
We already know the distns of the two parts; what is the distn of their difference?
Sum of 2 variables
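Same distns ($X_1$ and $X_2$ both have mean $\mu$ and s.d. $\sigma$)
Sample mean:
$$\frac{X_1 + X_2}{2} \sim \text{distn}\left(\mu,\ \frac{\sigma}{\sqrt{2}}\right)$$
Sample total:
$$X_1 + X_2 \sim \text{distn}\left(2\mu,\ \sqrt{2}\,\sigma\right)$$
Different distns
$$X_1 + X_2 \sim \text{distn}\left(\mu_1 + \mu_2,\ \sqrt{\sigma_1^2 + \sigma_2^2}\right)$$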
Difference between 2 variables
Same standard devn as sum
$$X_2 - X_1 \sim \text{distn}\left(\mu_2 - \mu_1,\ \sqrt{\sigma_1^2 + \sigma_2^2}\right)$$
If X1 and X2 are normal
$$X_2 - X_1 \sim \text{normal}\left(\mu_2 - \mu_1,\ \sqrt{\sigma_1^2 + \sigma_2^2}\right)$$
Remember that X1 and X2 must be independent
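Why the difference has the same s.d. as the sum: a short sketch using the standard rules for means and variances of independent variables (the derivation itself is not on the slides).

$$E(X_2 - X_1) = E(X_2) - E(X_1) = \mu_2 - \mu_1$$
$$\mathrm{Var}(X_2 - X_1) = \mathrm{Var}(X_2) + (-1)^2\,\mathrm{Var}(X_1) = \sigma_2^2 + \sigma_1^2$$
$$\text{s.d.}(X_2 - X_1) = \sqrt{\sigma_1^2 + \sigma_2^2}$$

The variances add because the factor $-1$ is squared. Without independence a covariance term would appear and this result would not hold.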
Example
Husband height ~ normal(1.85, 0.1)
Wife height ~ normal(1.7, 0.08)
Assume independent. (Probably not!!)
Prob that wife is taller than husband?
$$(\text{Husband} - \text{Wife}) \sim \text{normal}\left(0.15,\ \sqrt{0.1^2 + 0.08^2}\right) = \text{normal}(0.15,\ 0.1281)$$
Example
Husband height ~ normal(1.85, 0.1)
Wife height ~ normal(1.7, 0.08)
Husband - Wife ~ normal(0.15, 0.1281)
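P(diff ≤ 0) = area under the normal(0.15, 0.1281) curve to the left of 0
[Figure: normal curve for the difference, axis marked at -0.11, 0.02, 0.15, 0.28, 0.41]
$$z = \frac{0 - 0.15}{0.1281} = -1.171 \qquad \text{Prob} \approx 0.121$$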
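The probability that the wife is taller can be checked numerically. This is a minimal sketch that assumes the two heights are independent, as the slide does; `normal_cdf` is a helper name introduced here, built on the standard library's `math.erf`.

```python
import math

def normal_cdf(x, mean, sd):
    """P(X <= x) for X ~ normal(mean, sd), computed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

# Values from the slide: the second parameter is a standard deviation.
husband_mean, husband_sd = 1.85, 0.10
wife_mean, wife_sd = 1.70, 0.08

# Difference (Husband - Wife): means subtract, variances add.
diff_mean = husband_mean - wife_mean
diff_sd = math.sqrt(husband_sd ** 2 + wife_sd ** 2)

# The wife is taller exactly when the difference is at or below zero.
z = (0.0 - diff_mean) / diff_sd
prob_wife_taller = normal_cdf(0.0, diff_mean, diff_sd)

print(f"s.d. of difference = {diff_sd:.4f}")
print(f"z = {z:.3f}")
print(f"P(wife taller) = {prob_wife_taller:.3f}")
```

Working from the z-score and a normal table should give the same probability to within rounding.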
Difference between proportions
If X1 and X2 are independent,
$$X_2 - X_1 \sim \text{distn}\left(\mu_2 - \mu_1,\ \sqrt{\sigma_1^2 + \sigma_2^2}\right)$$
If p1 and p2 are independent,
$$p_2 - p_1 \sim \text{distn}\left(\pi_2 - \pi_1,\ \sqrt{\frac{\pi_1(1-\pi_1)}{n_1} + \frac{\pi_2(1-\pi_2)}{n_2}}\right)$$
For large samples, p1 and p2 are approx normal, so their difference is too.
Std error for difference in propns
Nicotine patches vs Antidepressant (Zyban)?
Study: n1 = n2 = 244 randomly assigned to each treatment
Zyban: 85 out of 244 quit smoking
Patch: 52 out of 244 quit smoking
$$\text{s.e.}(p_1 - p_2) = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$
So,
$$p_1 - p_2 = .348 - .213 = .135$$
and
$$\text{s.e.}(p_1 - p_2) = \sqrt{\frac{.348(1-.348)}{244} + \frac{.213(1-.213)}{244}} = .040$$
Approximate 95% C.I.
For sufficiently large samples, the interval
Estimate ± 2 × Standard error
is an approximate 95% C.I.
This is the best you can do for a difference between proportions.
For means, the C.I. can be improved by replacing '2' by a different value.
Patch vs Antidepressant
Study: n1 = n2 = 244 randomly assigned to each group
Zyban: 85 of the 244 Zyban users quit smoking, $\hat{p}_1 = .348$
Patch: 52 of the 244 patch users quit smoking, $\hat{p}_2 = .213$
So, $\hat{p}_1 - \hat{p}_2 = .348 - .213 = .135$ and $\text{s.e.}(\hat{p}_1 - \hat{p}_2) = .040$
Approx 95% C.I.: .135 ± 2(.040) => .135 ± .080 => .055 to .215
We are 95% confident that Zyban increases the probability of quitting smoking by between 5.5 and 21.5 percentage points compared with the patch.
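The standard error and the "estimate ± 2 × standard error" interval for the Zyban and patch counts can be reproduced in a few lines. This is a minimal sketch; the function name `diff_proportions_ci` and its `multiplier` parameter are introduced here for illustration.

```python
import math

def diff_proportions_ci(x1, n1, x2, n2, multiplier=2.0):
    """Estimate, standard error and approximate CI for p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    # Independent samples: the variances of the two sample proportions add.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, se, (diff - multiplier * se, diff + multiplier * se)

# Group 1 = Zyban (85 of 244 quit), group 2 = patch (52 of 244 quit).
diff, se, (lower, upper) = diff_proportions_ci(85, 244, 52, 244)

print(f"difference = {diff:.3f}")
print(f"s.e. = {se:.3f}")
print(f"approx 95% CI: {lower:.3f} to {upper:.3f}")
```

Because the code keeps unrounded proportions, the limits can differ from the hand calculation in the third decimal place.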
Difference between means
If X1 and X2 are independent,
$$X_2 - X_1 \sim \text{distn}\left(\mu_2 - \mu_1,\ \sqrt{\sigma_1^2 + \sigma_2^2}\right)$$
If $\bar{X}_1$ and $\bar{X}_2$ are independent,
$$\bar{X}_2 - \bar{X}_1 \sim \text{distn}\left(\mu_2 - \mu_1,\ \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right)$$
If both populations are normal, so is the difference.
Std error for difference in means
Lose More Weight by Diet or Exercise?
Study: n1 = 42 men on diet, n2 = 47 men on exercise routine
Diet: Lost an average of 7.2 kg with std dev of 3.7 kg
Exercise: Lost an average of 4.0 kg with std dev of 3.9 kg
$$\text{s.e.}(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
So, $\bar{x}_1 - \bar{x}_2 = 7.2 - 4.0 = 3.2$ kg
and
$$\text{s.e.}(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{(3.7)^2}{42} + \frac{(3.9)^2}{47}} = 0.81$$
Diet vs Exercise
Study: n1 = 42 men on diet, n2 = 47 men on exercise
Diet: Lost an average of 7.2 kg with std dev of 3.7 kg
Exercise: Lost an average of 4.0 kg with std dev of 3.9 kg
So, $\bar{x}_1 - \bar{x}_2 = 7.2 - 4.0 = 3.2$ kg and $\text{s.e.}(\bar{x}_1 - \bar{x}_2) = 0.81$ kg
Approximate 95% Confidence Interval: 3.2 ± 2(.81) => 3.2 ± 1.62 => 1.58 to 4.82 kg
We are 95% confident that those who diet lose on average 1.58 to 4.82 kg more than those who exercise.
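The diet vs exercise interval can be recomputed from the summary statistics. This is a minimal sketch: `diff_means_ci` is a name introduced here, and n2 = 47 is an assumption taken from the denominator used in the slide's standard-error calculation.

```python
import math

def diff_means_ci(mean1, sd1, n1, mean2, sd2, n2, multiplier=2.0):
    """Estimate, standard error and approximate CI for mu1 - mu2."""
    diff = mean1 - mean2
    # Independent samples: the variances of the two sample means add.
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return diff, se, (diff - multiplier * se, diff + multiplier * se)

# Group 1 = diet, group 2 = exercise (n2 = 47 assumed, see lead-in).
diff, se, (lower, upper) = diff_means_ci(7.2, 3.7, 42, 4.0, 3.9, 47)

print(f"difference = {diff:.1f} kg")
print(f"s.e. = {se:.2f} kg")
print(f"approx 95% CI: {lower:.2f} to {upper:.2f} kg")
```

The multiplier of 2 is the rough large-sample value; replacing it with a t* value gives the refined interval for means.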
Better C.I. for mean
A CI for the Difference Between Two Means (Independent Samples):
$$\bar{x}_1 - \bar{x}_2 \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
where t* is a value from t-tables.
d.f. = $\min(n_1 - 1,\ n_2 - 1)$
Welch's approx gives a different (higher) d.f. but is a complicated formula
t* is approx 1.96 if d.f. is high
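For reference, the Welch (Welch-Satterthwaite) approximation is usually written as below. It is not given on the slides and is included here only as a sketch of what statistical software computes.

$$\text{d.f.} \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{1}{n_1 - 1}\left(\dfrac{s_1^2}{n_1}\right)^2 + \dfrac{1}{n_2 - 1}\left(\dfrac{s_2^2}{n_2}\right)^2}$$

The result is generally not a whole number and is never smaller than $\min(n_1 - 1,\ n_2 - 1)$, which is why the simpler rule gives a slightly wider, more cautious interval.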
Effect of a stare on driving
Randomized experiment: Researchers either stared or did not stare at drivers stopped at a campus stop sign, then timed how long (sec) it took each driver to proceed from the sign to a mark on the other side of the intersection.
Estimate the difference between the mean crossing times.
No Stare Group (n = 14): 8.3, 5.5, 6.0, 8.1, 8.8, 7.5, 7.8, 7.1, 5.7, 6.5, 4.7, 6.9, 5.2, 4.7
Stare Group (n = 13): 5.6, 5.0, 5.7, 6.3, 6.5, 5.8, 4.5, 6.1, 4.8, 4.9, 4.5, 7.2, 5.8
Checking data
No outliers; no strong skewness.
Crossing times in stare group seem faster & less variable.
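Effect of stare on driving
Here 1 = no-stare, 2 = stare.
$$\bar{x}_1 - \bar{x}_2 = 6.63 - 5.59 = 1.04 \text{ sec}$$
$$\text{s.e.}(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{(1.36)^2}{14} + \frac{(0.822)^2}{13}} = 0.43$$
Using df = $\min(n_1 - 1,\ n_2 - 1)$ = 12 gives t* = 2.179
A 95% CI for $\mu_1 - \mu_2$ is
$$1.04 \pm 2.179 \times 0.43 = 0.10 \text{ to } 1.98 \text{ sec}$$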
Effect of stare on driving
Minitab
N.B. C.I. is based on df = 21 (Welch's approx)
A 95% CI for $\mu_1 - \mu_2$ (1 = no-stare, 2 = stare) is 0.17 to 1.91 sec
Slightly narrower C.I. than we got with d.f. = 12.
Interpretation
We are 95% confident that it takes drivers between 0.17 and 1.91 seconds less on average to cross the intersection if someone stares at them.
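The hand calculation for the stare data can be reproduced from the raw crossing times. This is a minimal sketch using only the standard library: the value t* = 2.179 is taken from the slides rather than computed, and `welch_df` is a helper name introduced here for the usual Welch-Satterthwaite approximation.

```python
import math
import statistics

no_stare = [8.3, 5.5, 6.0, 8.1, 8.8, 7.5, 7.8, 7.1, 5.7, 6.5, 4.7, 6.9, 5.2, 4.7]
stare = [5.6, 5.0, 5.7, 6.3, 6.5, 5.8, 4.5, 6.1, 4.8, 4.9, 4.5, 7.2, 5.8]

def welch_df(var1, n1, var2, n2):
    """Welch-Satterthwaite approximate degrees of freedom."""
    v1, v2 = var1 / n1, var2 / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

n1, n2 = len(no_stare), len(stare)
var1 = statistics.variance(no_stare)  # sample variance (divisor n - 1)
var2 = statistics.variance(stare)

diff = statistics.mean(no_stare) - statistics.mean(stare)
se = math.sqrt(var1 / n1 + var2 / n2)

# Conservative d.f. rule, with t* = 2.179 read from t-tables for 12 d.f.
df_simple = min(n1 - 1, n2 - 1)
t_star = 2.179
lower, upper = diff - t_star * se, diff + t_star * se

df_welch = welch_df(var1, n1, var2, n2)

print(f"difference = {diff:.2f} sec, s.e. = {se:.3f}")
print(f"95% CI with d.f. = {df_simple}: {lower:.2f} to {upper:.2f} sec")
print(f"Welch d.f. = {df_welch:.1f}")
```

The Welch d.f. is not a whole number; software that reports df = 21 appears to round it down.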
Testing two proportions
Hypotheses
H0: $\pi_1 - \pi_2 = 0$
HA: $\pi_1 - \pi_2 \neq 0$, or $\pi_1 - \pi_2 < 0$, or $\pi_1 - \pi_2 > 0$
Watch how Population 1 and 2 are defined.
Data requirements
Independent samples
$n_1 p_1$, $n_1(1-p_1)$, $n_2 p_2$, $n_2(1-p_2)$ all at least 5, preferably ≥ 10
Test statistic
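Based on $p_1 - p_2$
Standardise:
$$z = \frac{(p_1 - p_2) - 0}{\text{se}(p_1 - p_2)}$$
$$\text{se}(p_1 - p_2) = \sqrt{\frac{\pi_1(1-\pi_1)}{n_1} + \frac{\pi_2(1-\pi_2)}{n_2}} = \sqrt{\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\pi(1-\pi)} \quad \text{if } H_0 \text{ is true}$$
where $\pi$ is the common value of $\pi_1$ and $\pi_2$.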
Test statistic
If H0 is true, best estimate of $\pi$ is
$$p = \frac{x_1 + x_2}{n_1 + n_2}$$
So we use test statistic
$$z = \frac{(p_1 - p_2) - 0}{\sqrt{\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)p(1-p)}}$$
If H0 is true, this has standard normal distn
p-value from normal distn
Prevention of Ear Infections
Does the use of sweetener xylitol reduce the incidence of ear infections?
Randomized Experiment:
Of 165 children on placebo, 68 got ear infection.
Of 159 children on xylitol, 46 got ear infection.
Hypotheses: H0: $\pi_1 - \pi_2 = 0$, Ha: $\pi_1 - \pi_2 > 0$ (1 = placebo, 2 = xylitol)
Data check: At least 5 successes & failures in each group
$$\hat{p}_1 = \frac{68}{165} = .412, \quad \hat{p}_2 = \frac{46}{159} = .289, \quad \text{and } \hat{p}_1 - \hat{p}_2 = .123$$
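Prevention of Ear Infections
Overall propn getting infection
$$\hat{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2} = \frac{68 + 46}{165 + 159} = \frac{114}{324} = .35$$
Test statistic
$$z = \frac{\hat{p}_1 - \hat{p}_2 - 0}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{.123 - 0}{\sqrt{.35(1-.35)\left(\dfrac{1}{165} + \dfrac{1}{159}\right)}} = 2.32$$
p-value = 0.01
Conclusion: Strong evidence xylitol reduces chance of ear infection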
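The xylitol test can be reproduced with the pooled-proportion z statistic. This is a minimal sketch; `two_proportion_z` and `normal_upper_tail` are names introduced here, and the one-sided p-value assumes the alternative "placebo proportion is larger".

```python
import math

def normal_upper_tail(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: pi1 - pi2 = 0, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2 - 0) / se, pooled

# Group 1 = placebo (68 of 165 infected), group 2 = xylitol (46 of 159).
z, pooled = two_proportion_z(68, 165, 46, 159)
p_value = normal_upper_tail(z)  # upper tail because Ha is pi1 - pi2 > 0

print(f"pooled proportion = {pooled:.3f}")
print(f"z = {z:.2f}")
print(f"one-sided p-value = {p_value:.3f}")
```

The code keeps unrounded proportions, so z may differ slightly in the second decimal place from the hand calculation, which uses .123 and .35.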
Testing two means
Hypotheses
H0: $\mu_1 - \mu_2 = 0$
HA: $\mu_1 - \mu_2 \neq 0$, or $\mu_1 - \mu_2 < 0$, or $\mu_1 - \mu_2 > 0$
Watch how Population 1 and 2 are defined.
Data requirements
Fairly large n1 and n2 (say 30 or more), or
Not much skewness & no outliers (normal model reasonable)
Test statistic
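Based on $\bar{x}_1 - \bar{x}_2$
Standardise:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\text{se}(\bar{x}_1 - \bar{x}_2)}$$
$$\text{se}(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \approx \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$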
Test
Test statistic:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
If H0 is true, this has approx t-distn with d.f. = $\min(n_1 - 1,\ n_2 - 1)$
Same d.f. as CI for $\mu_1 - \mu_2$
p-value from t distn: Minitab or Excel
If n1 and n2 ≥ 30: use normal tables
Effect of a stare on driving
Randomized experiment: Researchers either stared or did not stare at drivers stopped at a campus stop sign, then timed how long (sec) it took each driver to proceed from the sign to a mark on the other side of the intersection.
Test whether a stare speeds up crossing times.
No Stare Group (n = 14): 8.3, 5.5, 6.0, 8.1, 8.8, 7.5, 7.8, 7.1, 5.7, 6.5, 4.7, 6.9, 5.2, 4.7
Stare Group (n = 13): 5.6, 5.0, 5.7, 6.3, 6.5, 5.8, 4.5, 6.1, 4.8, 4.9, 4.5, 7.2, 5.8
Checking data
Small sample sizes, but
No outliers; no strong skewness.
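Effect of stare on driving
Hypotheses
H0: $\mu_1 - \mu_2 = 0$
HA: $\mu_1 - \mu_2 > 0$
where 1 = no-stare, 2 = stare
$$\bar{x}_1 - \bar{x}_2 = 6.63 - 5.59 = 1.04 \text{ sec}$$
$$\text{s.e.}(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{(1.36)^2}{14} + \frac{(0.822)^2}{13}} = 0.429$$
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\text{se}(\bar{x}_1 - \bar{x}_2)} = \frac{1.04}{0.429} = 2.42$$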
Effect of stare on driving
Test statistic
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\text{se}(\bar{x}_1 - \bar{x}_2)} = \frac{1.04}{0.429} = 2.42$$
P-value
df = $\min(n_1 - 1,\ n_2 - 1)$ = 12
Upper tail area of t-distn (12 d.f.)
p = 0.016
Strong evidence that stare speeds up crossing
Effect of stare on driving
Minitab
N.B. Test is based on df = 21 (Welch's approx)
Very similar p-value and same conclusion
Strong evidence that stare speeds up crossing
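The t statistic and its upper-tail p-value can be checked from the raw data without statistical software. This is a minimal sketch using only the standard library: `t_pdf` and `t_upper_tail` are helper names introduced here, and the tail area is found by Simpson's rule integration of the t density rather than from tables.

```python
import math
import statistics

no_stare = [8.3, 5.5, 6.0, 8.1, 8.8, 7.5, 7.8, 7.1, 5.7, 6.5, 4.7, 6.9, 5.2, 4.7]
stare = [5.6, 5.0, 5.7, 6.3, 6.5, 5.8, 4.5, 6.1, 4.8, 4.9, 4.5, 7.2, 5.8]

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: 0.5 minus the Simpson's rule area from 0 to t."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

n1, n2 = len(no_stare), len(stare)
v1 = statistics.variance(no_stare) / n1  # s1^2 / n1
v2 = statistics.variance(stare) / n2     # s2^2 / n2

se = math.sqrt(v1 + v2)
t_stat = (statistics.mean(no_stare) - statistics.mean(stare)) / se

df_simple = min(n1 - 1, n2 - 1)
df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

p_simple = t_upper_tail(t_stat, df_simple)
p_welch = t_upper_tail(t_stat, df_welch)

print(f"t = {t_stat:.2f}")
print(f"p-value with d.f. = {df_simple}: {p_simple:.3f}")
print(f"p-value with Welch d.f. = {df_welch:.1f}: {p_welch:.3f}")
```

Both choices of d.f. give a small one-sided p-value, so the conclusion does not depend on which approximation is used.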
Paired data and 2-sample data
Make sure you distinguish between:
2 measurements on each individual (e.g. before & after)
Measurements from 2 independent groups
Different cars assessed for insurance claims in garages A and B: 2 independent samples
Same cars assessed by both garages: paired data