Chapter 13: Comparing Two Population Parameters

13.1 – Comparing Two Means

Comparative studies are more convincing than single-sample investigations, so one-sample inference is not as common as comparative (two-sample) inference. In a comparative study, we may want to compare two treatments, or we may want to compare two populations. In either case, the samples must be chosen randomly and independently in order to perform statistical inference.

How is this different from a matched pairs design? A matched pairs design compares two measurements taken on the same individuals, or on pairs of similar individuals, so the two sets of data are dependent. Here we compare two separate, independent samples, each given a different treatment!

Two-Sample inference:

Compare two treatments or two populations. The null hypothesis is that there is no difference between the two parameters.

$H_0: \mu_1 = \mu_2$ or $H_0: \mu_1 - \mu_2 = 0$

Review:

How do you subtract two means?

$\mu_1 - \mu_2$

How do you subtract two standard deviations?

$\sqrt{\sigma_1^2 + \sigma_2^2}$

Add their variances and take the square root!
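The rule that variances add, even though the means subtract, can be checked by brute force. The sketch below is an illustration only and is not part of the original slides: it assumes two independent fair dice (a hypothetical example) and uses exact fractions to compare the variance of the difference with the sum of the two variances.

```python
from fractions import Fraction
from itertools import product

def mean(values):
    """Mean of equally likely outcomes, kept exact with Fraction."""
    values = list(values)
    return sum(values, Fraction(0)) / len(values)

def variance(values):
    """Population variance of equally likely outcomes."""
    values = list(values)
    m = mean(values)
    return mean((v - m) ** 2 for v in values)

faces = [Fraction(f) for f in range(1, 7)]         # one fair die
diffs = [a - b for a, b in product(faces, faces)]  # all 36 equally likely differences

var_one = variance(faces)    # variance of a single die
var_diff = variance(diffs)   # variance of (die 1 - die 2)

print(var_one, var_diff)
```

The means subtract (the mean difference is 0), but the variance of the difference is twice the variance of one die, so the standard deviation of the difference is the square root of the summed variances.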

Conditions:

Two Sample Z ($\sigma$ is known):

• SRS
• Normality: population approx normal, or $n_1 + n_2 \ge 30$ by CLT
• Independence: $N \ge 10n$, and the two samples are independent

Two Sample T ($\sigma$ is not known):

• SRS
• Normality: population approx normal, or $n_1 + n_2 \ge 30$ by CLT, or $n_1 + n_2 < 30$ and data doesn't have strong skewness
• Independence: $N \ge 10n$, and the two samples are independent

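Note! The two-sample t statistic does not have an exact t-distribution.

The degrees of freedom are calculated differently:

$df = \dfrac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{1}{n_1 - 1}\left(\dfrac{s_1^2}{n_1}\right)^2 + \dfrac{1}{n_2 - 1}\left(\dfrac{s_2^2}{n_2}\right)^2}$

Your calculator will do this for you!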
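For readers without a calculator at hand, the degrees-of-freedom formula can be sketched in a few lines of Python. The function name `welch_df` and the input values are hypothetical and chosen only to show how the formula behaves.

```python
def welch_df(s1, n1, s2, n2):
    """Approximate degrees of freedom for the two-sample t procedures."""
    v1 = s1 ** 2 / n1  # s1^2 / n1
    v2 = s2 ** 2 / n2  # s2^2 / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

# Hypothetical inputs: equal spreads and equal sizes give df = n1 + n2 - 2.
df_equal = welch_df(3.0, 10, 3.0, 10)

# Very unequal spreads pull df down toward the smaller of n1 - 1 and n2 - 1.
df_unequal = welch_df(1.0, 10, 20.0, 10)

print(df_equal, df_unequal)
```

The result is usually not a whole number, which is why the slides leave the arithmetic to the calculator.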

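Confidence Interval:

estimate ± (critical value)(standard deviation of statistic)

Two Sample Z:

$(\bar{x}_1 - \bar{x}_2) \pm z^*\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$

Two Sample T:

$(\bar{x}_1 - \bar{x}_2) \pm t^*_{df}\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$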
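Both intervals share the same shape, so one small function can cover them. This is a minimal sketch using only the Python standard library; the name `two_sample_interval` and the summary statistics are hypothetical. The caller supplies the critical value: $z^*$ when $\sigma$ is known, or a $t^*$ read from a table when it is not.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_interval(x1, sd1, n1, x2, sd2, n2, critical):
    """(x1 - x2) +/- critical * sqrt(sd1^2/n1 + sd2^2/n2)."""
    estimate = x1 - x2
    margin = critical * sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return estimate - margin, estimate + margin

# z* for 95% confidence: the point with 2.5% in the upper tail.
z_star = NormalDist().inv_cdf(0.975)

# Hypothetical summary statistics, chosen only to exercise the function.
low, high = two_sample_interval(10.0, 2.0, 50, 9.0, 2.0, 50, z_star)

print(round(z_star, 3), round(low, 3), round(high, 3))
```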

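Hypothesis Test:

test statistic = (estimate – hypothesized value) / (standard deviation of statistic)

Two Sample Z:

$Z = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$

Two Sample T:

$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$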

Calculator Tip!

Two Sample Z: STAT-TESTS- 2-SampZTest and STAT-TESTS- 2-SampZInt

Two Sample T: STAT-TESTS- 2-SampTTest and STAT-TESTS- 2-SampTInt

Note: The only time you pool is when the standard deviations are the same. This almost never happens, so just don’t do it!

Example #1: Patients with heart-attack symptoms arrive at an emergency room either by ambulance or by self-transportation provided by themselves, family, or friends. When a patient arrives at the emergency room, the time of arrival is recorded. The time when the patient’s diagnostic treatment begins is also recorded. An administrator of a large hospital wanted to determine whether the mean wait time (time between arrival and diagnostic treatment) for patients with heart-attack symptoms differs according to the mode of transportation. A random sample of 150 patients with heart-attack symptoms who had reported to the emergency room was selected. For each patient, the mode of transportation and wait time were recorded. Summary statistics for each mode of transportation are shown in the table below.

Mode of Transportation    Sample Size    Mean Wait Time (in minutes)    Standard Deviation of Wait Time (in minutes)
Self                      73             8.30                           5.16
Ambulance                 77             6.04                           4.30

a. Use a 99% confidence interval to estimate the difference between the mean wait times for ambulance transported patients and self-transported patients at this emergency room.

P: μS = mean wait time for diagnostic treatment if traveled by self-transportation

μA = mean wait time for diagnostic treatment if traveled by ambulance

μD = μA - μS = Difference in wait times

A:

SRS (says so)

Normality: $n_A + n_S \ge 30$

73 + 77 ≥ 30

150 ≥ 30

By the CLT, ok to assume normality

Independence: $N \ge 10n$ (more than 1500 people with heart-attack symptoms)

Self-transported patients shouldn’t influence the wait time of ambulance-transported patients

N: Two-Sample t-interval

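I:

$(\bar{x}_A - \bar{x}_S) \pm t^*_{df}\sqrt{\dfrac{s_A^2}{n_A} + \dfrac{s_S^2}{n_S}}$

$df = \dfrac{\left(\dfrac{s_A^2}{n_A} + \dfrac{s_S^2}{n_S}\right)^2}{\dfrac{1}{n_A - 1}\left(\dfrac{s_A^2}{n_A}\right)^2 + \dfrac{1}{n_S - 1}\left(\dfrac{s_S^2}{n_S}\right)^2}$

$= \dfrac{\left(\dfrac{4.30^2}{77} + \dfrac{5.16^2}{73}\right)^2}{\dfrac{1}{77 - 1}\left(\dfrac{4.30^2}{77}\right)^2 + \dfrac{1}{73 - 1}\left(\dfrac{5.16^2}{73}\right)^2}$

$= \dfrac{0.36586}{0.0007587 + 0.00185}$

$= 140.3717611$

Using the table with df = 100:

$(\bar{x}_A - \bar{x}_S) \pm t^*_{100}\sqrt{\dfrac{s_A^2}{n_A} + \dfrac{s_S^2}{n_S}}$

$= (6.04 - 8.30) \pm 2.626\sqrt{\dfrac{4.30^2}{77} + \dfrac{5.16^2}{73}}$

$= -2.26 \pm 2.626(0.7777)$

$= -2.26 \pm 2.0423$

$= (-4.302, -0.218)$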

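Note: Using the calculator!

$(\bar{x}_A - \bar{x}_S) \pm t^*_{df}\sqrt{\dfrac{s_A^2}{n_A} + \dfrac{s_S^2}{n_S}}$

$= (6.04 - 8.30) \pm t^*_{140.37}\sqrt{\dfrac{4.30^2}{77} + \dfrac{5.16^2}{73}}$

$= (-4.2910, -0.2291)$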

C:

I am 99% confident that the true difference in mean wait times (ambulance minus self-transported) is between –4.2910 and –0.2291 minutes.
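The arithmetic in this example can be double-checked with a short Python sketch that uses only the summary statistics from the table. The variable names are hypothetical, and the critical value 2.626 is simply the table value the slides use for df = 100; the standard library has no t-distribution, so the calculator's $t^*_{140.37}$ is not recomputed here.

```python
from math import sqrt

# Summary statistics from the table in Example #1.
n_self, mean_self, s_self = 73, 8.30, 5.16
n_amb, mean_amb, s_amb = 77, 6.04, 4.30

v_amb = s_amb ** 2 / n_amb     # s_A^2 / n_A
v_self = s_self ** 2 / n_self  # s_S^2 / n_S

# Degrees of freedom from the formula in the notes.
df = (v_amb + v_self) ** 2 / (v_amb ** 2 / (n_amb - 1) + v_self ** 2 / (n_self - 1))

# Standard deviation of the statistic (x-bar_A minus x-bar_S).
std_error = sqrt(v_amb + v_self)

t_star = 2.626  # table value for 99% confidence with df = 100, as in the slides
estimate = mean_amb - mean_self
interval = (estimate - t_star * std_error, estimate + t_star * std_error)

print(round(df, 2), [round(end, 3) for end in interval])
```

This should reproduce a df of about 140.37 and the table-based interval of roughly (–4.302, –0.218). The calculator interval is slightly narrower because it uses the larger df.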

b. Based only on this confidence interval, do you think the difference in the mean wait times is statistically significant? Justify your answer.

Since 0 is not in the confidence interval, we can say that the ambulance wait times are statistically significantly shorter than the wait times for self-transported patients at the 99% confidence level.

Example #2: The following is a list of salary rates (per hour in dollars) for men and women with a high school diploma.

Women             Men
8       10.6      7.5     11.9
8.25    10.8      8.5     11.95
9       11        8.5     12
9.25    11.5      9.85    12
9.35    11.9      10.5    12
9.8     12.25     10.5    12.5
9.95    12.5      10.5    13
10      12.5      10.9    13.7
10      12.95     10.95   13.75
10      13.9      11      14.5
10.25   13.95     11      14.75
10.5    14.45     11.65   15
10.5    14.8      11.9    15.5

If the two samples are independent and are taken randomly, is there significant evidence that the men make more money than the women? Assume that from past experience $\sigma$ = 1.99 dollars for men and $\sigma$ = 2.01 dollars for women.

P: μM = mean dollars per hour for men with high school diploma

μW = mean dollars per hour for women with high school diploma

μD = μM - μW = Difference in dollars per hour

H:

$H_0: \mu_M = \mu_W$ or $H_0: \mu_M - \mu_W = 0$

$H_A: \mu_M > \mu_W$ or $H_A: \mu_M - \mu_W > 0$

A:

SRS (says so)

Normality: $n_M + n_W \ge 30$

26 + 26 ≥ 30

52 ≥ 30

By the CLT, ok to assume normality

Independence: $N \ge 10n$ (more than 520 people with a high school diploma)

Men’s salaries shouldn’t influence the salaries of women with a high school diploma. Also, the problem says the samples are independent.

N: Two-Sample Z-Test

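T:

$Z = \dfrac{(\bar{x}_M - \bar{x}_W) - (\mu_M - \mu_W)}{\sqrt{\dfrac{\sigma_M^2}{n_M} + \dfrac{\sigma_W^2}{n_W}}}$

$= \dfrac{(11.76153 - 11.075) - (0)}{\sqrt{\dfrac{1.99^2}{26} + \dfrac{2.01^2}{26}}}$

$= \dfrac{0.68653}{0.5547}$

$= 1.2376 \approx 1.24$

O:

P(Z > 1.24) = 1 – P(Z < 1.24) = 1 – 0.8925 = 0.1075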
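The same test can be run from the raw data rather than from pre-computed means. The sketch below is a check under stated assumptions, not the method the slides prescribe: it uses Python's standard library, the salary lists are transcribed from the table in this example, and the variable names are hypothetical. It uses the unrounded statistic, so the p-value differs slightly from the table lookup at 1.24.

```python
from math import sqrt
from statistics import NormalDist, mean

# Hourly salaries as read from the table in Example #2 (26 values each).
women = [8, 8.25, 9, 9.25, 9.35, 9.8, 9.95, 10, 10, 10, 10.25, 10.5, 10.5,
         10.6, 10.8, 11, 11.5, 11.9, 12.25, 12.5, 12.5, 12.95, 13.9, 13.95,
         14.45, 14.8]
men = [7.5, 8.5, 8.5, 9.85, 10.5, 10.5, 10.5, 10.9, 10.95, 11, 11, 11.65, 11.9,
       11.9, 11.95, 12, 12, 12, 12.5, 13, 13.7, 13.75, 14.5, 14.75, 15, 15.5]

sigma_men, sigma_women = 1.99, 2.01  # population SDs given in the problem

# Two-sample z statistic with hypothesized difference 0.
std_dev = sqrt(sigma_men ** 2 / len(men) + sigma_women ** 2 / len(women))
z = (mean(men) - mean(women) - 0) / std_dev

# One-sided p-value, because H_A says men earn more.
p_value = 1 - NormalDist().cdf(z)

print(round(mean(men), 5), round(mean(women), 3), round(z, 4), round(p_value, 4))
```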

M:

p-value = 0.1075 > 0.05 = $\alpha$

Fail to reject the null

S:

There is not enough evidence to say that men with a high school diploma make more money per hour than women with a high school diploma.

13.2 – Comparing Two Proportions

If we want to compare two populations or compare the responses to two treatments from independent samples, we look at a two-sample proportion:

$H_0: p_1 = p_2$ or $H_0: p_1 - p_2 = 0$

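Conditions for Proportion Interval:

• SRS
• Normality: $n_1\hat{p}_1 \ge 5$, $n_1(1 - \hat{p}_1) \ge 5$, $n_2\hat{p}_2 \ge 5$, $n_2(1 - \hat{p}_2) \ge 5$
• Independence: $N \ge 10(n_1 + n_2)$, and the two samples are independent

Confidence Interval:

estimate ± (critical value)(standard deviation of statistic)

$(\hat{p}_1 - \hat{p}_2) \pm z^*\sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$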

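Conditions for Proportion Test:

• SRS
• Normality: $n_1\hat{p}_C \ge 5$, $n_1(1 - \hat{p}_C) \ge 5$, $n_2\hat{p}_C \ge 5$, $n_2(1 - \hat{p}_C) \ge 5$
• Independence: $N \ge 10(n_1 + n_2)$, and the two samples are independent

$\hat{p}_C = \dfrac{\text{count of successes in both samples}}{\text{count of individuals from both samples}} = \dfrac{x_1 + x_2}{n_1 + n_2}$

Hypothesis Test:

test statistic = (estimate – hypothesized value) / (standard deviation of statistic)

$z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_C(1 - \hat{p}_C)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$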

Calculator Tip!

Confidence Interval: STAT-TESTS- 2-PropZInt

Hypothesis Test: STAT-TESTS- 2-PropZTest

Note: The only time you pool is when the standard deviations are the same. This almost never happens, so just don’t do it!

Example #1: An election is bitterly contested between two rivals. In a poll of 750 potential voters taken 4 weeks before the election, 420 indicated a preference for candidate Grumpy over candidate Dopey. Two weeks later, a new poll of 900 randomly selected potential voters found 465 who plan to vote for Grumpy. Dopey immediately began advertising that support for Grumpy was slipping drastically and that he was going to win the election. Statistically speaking (at the 0.05 level), how happy should Dopey be?

P: p1 = true proportion of people who want Grumpy to win in 1st poll

p2 = true proportion of people who want Grumpy to win in 2nd poll

pD = p1 - p2 = Difference in proportion of people who want Grumpy to win in 1st poll and 2nd poll

H:

$H_0: p_1 = p_2$ or $H_0: p_1 - p_2 = 0$

$H_A: p_1 > p_2$ or $H_A: p_1 - p_2 > 0$

A:

SRS (Says in second one only. Must assume the first)

Normality: $n_1\hat{p}_C \ge 5$, $n_1(1 - \hat{p}_C) \ge 5$, $n_2\hat{p}_C \ge 5$, $n_2(1 - \hat{p}_C) \ge 5$

$\hat{p}_C = \dfrac{\text{count of successes in both samples}}{\text{count of individuals from both samples}} = \dfrac{x_1 + x_2}{n_1 + n_2} = \dfrac{885}{1650} = 0.536$

(750)(0.536) ≥ 5, so 402.27 ≥ 5

(750)(1 – 0.536) ≥ 5, so 347.73 ≥ 5

(900)(0.536) ≥ 5, so 482.73 ≥ 5

(900)(1 – 0.536) ≥ 5, so 417.27 ≥ 5

Independence

Safe to assume there were more than 10(750+900), or 16,500 voters

The first poll might have influenced the second poll, proceed with caution!

N: 2-PropZTest

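T:

$z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_C(1 - \hat{p}_C)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$

$= \dfrac{0.56 - 0.5167}{\sqrt{0.536(1 - 0.536)\left(\dfrac{1}{750} + \dfrac{1}{900}\right)}}$

$= \dfrac{0.0433}{0.02466}$

$= 1.7576$

O:

Truncating to z = 1.75 for the table:

P(Z > 1.75) = 1 – P(Z < 1.75) = 1 – 0.9599 = 0.0401

Or, by calculator, with the unrounded statistic:

P(Z > 1.7576) = 0.03941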

M:

p-value = 0.03941 < 0.05 = $\alpha$

Reject the Null

S:

There is enough evidence to say that the proportion of voters that support Grumpy has dropped from the 1st poll to the second.

Dopey should be very happy!
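The poll example can be re-run as a short Python sketch using only the standard library. The variable names are hypothetical; the counts come straight from the problem statement. It follows the pooled two-proportion z test described in the notes and keeps full precision throughout, so it matches the calculator value rather than the table lookup.

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 420, 750  # first poll: supporters of Grumpy, sample size
x2, n2 = 465, 900  # second poll

p1_hat, p2_hat = x1 / n1, x2 / n2
p_pooled = (x1 + x2) / (n1 + n2)  # combined proportion, used because H0 says p1 = p2

# Standard deviation of the statistic under the null hypothesis.
std_error = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z = (p1_hat - p2_hat) / std_error

# One-sided p-value, because H_A says p1 > p2.
p_value = 1 - NormalDist().cdf(z)

print(round(p_pooled, 3), round(z, 4), round(p_value, 4))
```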

Example #2: Two groups of 40 randomly selected students took part in a study on drop-out rates. One group was enrolled in a counseling program designed to give them skills needed to succeed in school and the other group received no special counseling. Fifteen of the students who received counseling dropped out of school, and 23 of the students who did not receive counseling dropped out. Construct a 90% confidence interval for the true difference between the drop-out rates of the two groups.

P: pC = true proportion of students who drop out with counseling

pN = true proportion of students who drop out without any counseling

pD = pC - pN = Difference in proportion of students who drop out with counseling vs. without

A:

SRS (says in both groups)

Normality: $n_1\hat{p}_1 \ge 5$, $n_1(1 - \hat{p}_1) \ge 5$, $n_2\hat{p}_2 \ge 5$, $n_2(1 - \hat{p}_2) \ge 5$

(40)(0.375) ≥ 5, so 15 ≥ 5

(40)(1 – 0.375) ≥ 5, so 25 ≥ 5

(40)(0.575) ≥ 5, so 23 ≥ 5

(40)(1 – 0.575) ≥ 5, so 17 ≥ 5

Independence

Safe to assume there were more than 10(40+40), or 800 students

The drop-out rate of the group with counseling might influence the group without counseling. Proceed with caution!

N: 2-PropZInt

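I:

$(\hat{p}_C - \hat{p}_N) \pm z^*\sqrt{\dfrac{\hat{p}_C(1 - \hat{p}_C)}{n_C} + \dfrac{\hat{p}_N(1 - \hat{p}_N)}{n_N}}$

$= (0.375 - 0.575) \pm 1.645\sqrt{\dfrac{0.375(1 - 0.375)}{40} + \dfrac{0.575(1 - 0.575)}{40}}$

$= -0.2 \pm 1.645(0.1094)$

$= (-0.3799, -0.0201)$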

C:

I am 90% confident the true difference in the proportion of dropouts with counseling vs. without counseling is between –0.3799 and –0.0201.

It appears that the drop-out rate is lower for the group that got counseling than for the group that did not.
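As a final check, the drop-out interval can be recomputed in Python with the standard library. The variable names are hypothetical; the counts are from the problem statement. Unlike the test in the previous example, an interval does not pool the two samples, so each sample proportion keeps its own term under the square root.

```python
from math import sqrt
from statistics import NormalDist

x_c, n_c = 15, 40  # dropouts and group size, with counseling
x_n, n_n = 23, 40  # dropouts and group size, without counseling

p_c, p_n = x_c / n_c, x_n / n_n

# 90% confidence leaves 5% in each tail, so z* is the 95th percentile.
z_star = NormalDist().inv_cdf(0.95)

# Unpooled standard deviation of the statistic (p-hat_C minus p-hat_N).
std_error = sqrt(p_c * (1 - p_c) / n_c + p_n * (1 - p_n) / n_n)

estimate = p_c - p_n
low, high = estimate - z_star * std_error, estimate + z_star * std_error

print(round(z_star, 3), round(std_error, 4), round(low, 3), round(high, 3))
```

Both endpoints are negative, which agrees with the conclusion that the drop-out rate is lower with counseling.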
