
PSY 2010 Corty - Ch 10: Analysis of Variance

Situation

We wish to compare the means of THREE or more groups.

Example.

Suppose we are studying the effects of three different antibiotics for treatment of C. difficile infection. Ironically, C. difficile (C.Diff) infection most often follows the patient having taken an antibiotic for some other condition: that antibiotic killed the bacteria that normally keep C.Diff in check. But the appropriate treatment is another antibiotic, one that targets C.Diff. The issue is, “Which one?”

Suppose that three C.Diff-targeting antibiotics have been proposed by different pharmaceutical companies – A, B, and C. Suppose that a small scale study is proposed to see if there are any large differences in the effects of the three on the number of C.Diff bacteria.

Thirty patients, all of whom have been diagnosed with C.Diff, are identified at a group of hospitals. The first 10 patients are given antibiotic A. The second group of 10 is given antibiotic B. The third group of 10, you guessed it, is given antibiotic C.

After 14 days, let’s suppose that a standardized count of number of bacteria present is taken from each patient. This standardized count is on a scale of 0 to 100, with 0 representing complete absence of the C.Diff and 100 representing the greatest proportion of C.Diff possible. (The actual measures taken are more complicated than this.) Note that for many people there are always C.Diff bacteria present. The issue is that for most people there are not enough present to cause difficulty. They are kept in check by other, non-harmful bacteria. So the goal of treatment is to get the count of C.Diff down sufficiently that the C.Diff will not create future problems.

The hypothetical data are presented here. . . (The red bars were added to help you identify the groups.) Suppose that the average of the count variable was 60 prior to treatment.

Biderman’s P2010 One Way Analysis of Variance - 1 5/16/2023


The inferential situation

Group 1 is a sample from a population of persons with C.Diff who could have been given antibiotic A.

Group 2 is a sample from a population who could have been given B.

Group 3 is a sample from the antibiotic C population.

First question to answer

Are the means of the count variable equal in the three populations?

We begin with the null: Means of the three populations are equal.

Our alternative is: Means of the three populations are not equal.

Note: The null, as always, is about the populations, not the sample.

Implications of the hypothesis test.

If the population means are not different, the implication is that any of the antibiotics will work just as well as either of the others.

But if the null is rejected, then there are differences in the efficacy of the antibiotics.


Test Statistic: F Statistic

Equal sample size formula

$$F = \frac{\text{Common sample size} \times \text{Variance of sample means}}{\text{Mean of sample variances}} = \frac{n \, S^2_{\bar{X}}}{\left(S^2_1 + S^2_2 + \cdots + S^2_K\right)/K}$$

where

n = common sample size

K = No. of means being compared

$S^2_{\bar{X}}$ = Variance of sample means

$S^2_i$ = Variance of scores within group i

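Unequal Sample Size formula

$$F = \frac{\sum_{i=1}^{K} n_i \left(\bar{X}_i - \bar{\bar{X}}\right)^2 / (K - 1)}{\sum_{i=1}^{K} (n_i - 1)\, S^2_i / (N - K)}$$

where

$n_i$ = No. of scores in group i

$\bar{X}_i$ = Mean of the scores in group i

$S^2_i$ = Variance of scores within group i

$N = n_1 + n_2 + \cdots + n_K$ = Total no. of scores observed

$\bar{\bar{X}}$ = Mean of all the N scores

Numerator df = K - 1

Denominator df = N - K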

Luckily, we will not have to compute any of these by hand. We will have the computer do it for us.
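For readers who want to see what the computer is doing, here is a minimal Python sketch of the unequal sample size calculation described above. The function name `one_way_f` and the toy scores are made up for illustration; they are not data from this handout.

```python
from statistics import mean, variance


def one_way_f(groups):
    """Return (F, numerator df, denominator df) for a list of groups of scores."""
    k = len(groups)                                  # K = no. of means compared
    n_total = sum(len(g) for g in groups)            # N = total no. of scores
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-groups sum of squares: n_i * (group mean - grand mean)^2
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: (n_i - 1) * variance within group i
    ss_within = sum((len(g) - 1) * variance(g) for g in groups)

    df_between = k - 1
    df_within = n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Toy scores (hypothetical) with unequal group sizes 3, 4, and 3.
toy_groups = [[1, 2, 3], [2, 3, 4, 5], [6, 7, 8]]
f_toy, df1_toy, df2_toy = one_way_f(toy_groups)
print(f"F = {f_toy:.3f} with df = {df1_toy}, {df2_toy}")
```

When all groups have the same size, this calculation gives the same F as the equal sample size formula.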


More Than You Ever Wanted to Know about F

The F statistic compares the variability of the sample means with the variability of individual scores within the samples.

Because it is a comparison of variability, it’s called the Analysis Of Variance, or ANOVA.

ANOVA was developed by Ronald Fisher, a British statistician, in the 1920s.

The theory underlying F is beautiful. But it requires far more knowledge of mathematics than necessary for this course. So we’ll skip the theory for this semester.

Values of F expected if the Null Hypothesis is true

The F statistic is a ratio of two variances, so it can never be negative. If you see a negative value of F, something is wrong.

If the null hypothesis of no difference in population means is true, the value of F should be about equal to 1.

Values of F expected if the Null is false

If the null is false, F should be larger than 1.
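The claim that F hovers around 1 when the null is true can be checked with a small simulation. The sketch below is only an illustration under assumed conditions: three groups of 10 scores each, all drawn from one normal population whose mean (50) and standard deviation (10) are arbitrary choices, and a fixed random seed. On average F comes out slightly above 1, because the long-run mean of F is (denominator df) / (denominator df - 2).

```python
import random
from statistics import mean, variance

random.seed(2010)  # arbitrary seed so the run is repeatable


def equal_n_f(groups):
    """Equal sample size formula: n * variance of means / mean of variances."""
    n = len(groups[0])
    group_means = [mean(g) for g in groups]
    group_variances = [variance(g) for g in groups]
    return n * variance(group_means) / mean(group_variances)


def simulate_null_f(n_sims=2000, k=3, n=10):
    """Draw k groups of n scores from ONE population, n_sims times over."""
    f_values = []
    for _ in range(n_sims):
        groups = [[random.gauss(50, 10) for _ in range(n)] for _ in range(k)]
        f_values.append(equal_n_f(groups))
    return f_values


null_fs = simulate_null_f()
average_f = mean(null_fs)
# How often does chance alone give an F as large as the 10.341 from the SPSS output?
share_as_large = sum(f >= 10.341 for f in null_fs) / len(null_fs)
print(f"Average F when the null is true: {average_f:.2f}")
print(f"Share of simulated F values at or above 10.341: {share_as_large:.4f}")
```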

After the fact (Post hoc) tests conducted if the null is false.

If the null is false, a natural question to ask is, “Well, if the means are not equal, which means are different from which?”

This question has led statisticians to develop what are called Post Hoc tests.

These tests are carried out and referred to when the null hypothesis has been rejected.

Obviously, if the null (that the population means are equal) is retained, there is no need to ask, “Which means are different from which?” because we have no evidence that any of them are different.


Working out our problem in SPSS . . .

Recall the data . . .

Count is the standardized count of number of C.Diff bacteria after 14 days.

Condit is the antibiotic condition

1=A 2=B 3=C

There are 10 patients per condition.


The One-Way ANOVA dialog box.

There are a TON of Post Hoc tests from which to choose.

I prefer the Tukey’s-b test. We’ll use that here.

I’ll ask you to use Tukey’s-b for all of your submissions to me.


Options you should take . . .

Always take the opportunity to get

1) Descriptive statistics, and 2) a visual display of your analysis.

The results

Descriptives

count

                                                  95% Confidence Interval for Mean
          N    Mean   Std. Deviation  Std. Error    Lower Bound    Upper Bound    Minimum  Maximum
1 A      10    9.00        3.266         1.033          6.66          11.34           4       14
2 B      10   15.10        3.604         1.140         12.52          17.68           9       19
3 C      10   17.50        5.662         1.790         13.45          21.55           8       24
Total    30   13.87        5.526         1.009         11.80          15.93           4       24


ANOVA

count

                  Sum of Squares   df   Mean Square      F      Sig.
Between Groups        384.067       2     192.033     10.341    .000
Within Groups         501.400      27      18.570
Total                 885.467      29

The F value is MUCH larger than 1, suggesting that the null is false.

The p-value is zero to 3 decimal places, much less than .050.

So the chances of getting such large differences between sample means if the population means were equal are nearly 0.

This suggests we should reject the null hypothesis.
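As a check on the ANOVA table, the two Mean Squares and F can be reproduced from the Descriptives output alone, using the equal sample size formula. This is a sketch, not part of the SPSS procedure; because the means and standard deviations typed in below are the rounded values SPSS printed, the results agree with the table only to about two decimal places.

```python
from statistics import mean, variance

n = 10                              # common sample size
group_means = [9.00, 15.10, 17.50]  # means for A, B, C from Descriptives
group_sds = [3.266, 3.604, 5.662]   # standard deviations for A, B, C

# Numerator: common sample size * variance of the sample means
ms_between = n * variance(group_means)
# Denominator: mean of the sample variances
ms_within = mean([sd ** 2 for sd in group_sds])
f_from_summary = ms_between / ms_within

print(f"Mean Square Between = {ms_between:.3f}")
print(f"Mean Square Within  = {ms_within:.3f}")
print(f"F = {f_from_summary:.3f}")
```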

Post Hoc Tests

Homogeneous Subsets

count

Tukey B (a)

                  Subset for alpha = 0.05
condit     N          1          2
1 A       10        9.00
2 B       10                   15.10
3 C       10                   17.50

Means for groups in homogeneous subsets are displayed.

a. Uses Harmonic Mean Sample Size = 10.000.

Means Plots

[Means plot: mean count for each antibiotic condition]


Reading the Post Hoc results . . .

1. Means of groups in different columns are significantly different.

2. Means of groups in the same column are NOT significantly different.

So the mean for antibiotic A, 9.00, is significantly different from the means for B (15.10) and C (17.50).

But 15.10 and 17.50 are NOT significantly different from each other.

Since lower counts are better, it appears that antibiotic A works best.


Working out our problem in Excel . . .

Excel does NOT follow the convention used by all other statistical packages that all values to be analyzed are in the same column. Instead, it’s easiest in Excel to put the values in adjacent columns of the Excel Spreadsheet . . .
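The difference between the two layouts can be sketched in a few lines of Python. The "wide" layout has one column per group, as in Excel; the "long" layout has one row per patient with a group code and a score, as in the SPSS data with condit and count. The two counts per antibiotic below are placeholders for illustration, not the data from this example.

```python
# Wide layout (Excel style): one column of counts per antibiotic.
# The values are placeholders, not the handout's data.
wide = {
    "A": [4, 9],
    "B": [9, 15],
    "C": [8, 17],
}

# Codes used for the condit variable: 1=A 2=B 3=C
condit_codes = {"A": 1, "B": 2, "C": 3}

# Long layout (SPSS style): one (condit, count) row per patient.
long_rows = [
    (condit_codes[name], count)
    for name, counts in wide.items()
    for count in counts
]

for condit, count in long_rows:
    print(condit, count)
```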

The Excel Results . . .

Note that no Post Hoc tests are available in Excel.


Completing the Corty Hypothesis Testing Answer Sheet . . .

Give the name and the formula of the test statistic that will be employed to test the null hypothesis.

One-Way Analysis of Variance

Check the assumptions of the test

Distributions appear to be approximately US within each group.

Null Hypothesis: Means of the three populations are equal.

Alternative Hypothesis: Means of the three populations are not equal.

What significance level will you use to separate "likely" values from "unlikely" values of the test statistic?

Significance Level = .05

What is the value of the test statistic computed from your data and the p-value?

F = 10.341, p-value = .000 (from SPSS output)

What is your conclusion? Do you reject or not reject the null hypothesis?

Reject the null. p-value is less than .050.

What are the upper and lower limits of a 95% confidence interval appropriate for the problem? Present them in a sentence, with standard interpretive language.

Confidence intervals are not required for problems involving 3 or more populations.

State the implications of your conclusion for the problem you were asked to solve. That is, relate your statistical conclusion to the problem.

There are significant differences in mean bacteria counts between the three antibiotics.

Results of Post Hoc tests suggest that antibiotic A works best.


One Way Analysis of Variance: Second Worked Out Example

Problem

A professor teaches the same class to students from three different populations. The first is a population of "regular" day students. The second is a population of students attending at night. The third is a population of students working for a large corporation and meeting in a room provided by the corporation. The same test is given to all three classes. The professor wonders whether the mean final exam performance of students in the three populations will be equal.

Statement of Hypotheses

H0: µ1 = µ2 = µ3.

H1: At least 1 inequality.

Test statistic

F statistic for the One-Way Analysis of Variance.

Data

Regular:   58 69 67 80 91 86 94
Night:     79 89 93 96 83 90 99
Corporate: 72 85 89 75 79 80 94

Summary statistics

Group       Mean     SD
Regular     77.86   13.51
Night       89.86    7.03
Corporate   82.00    7.79

Variance of the sample means is $6.095^2 = 37.149$

Conclusion, worked out by hand. (Children – don’t try this at home.)

$$F = \frac{7 \times 6.095^2}{\left(13.508^2 + 7.034^2 + 7.789^2\right)/3} = \frac{260.043}{97.537} = 2.666$$
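The hand calculation can be checked with a few lines of Python using only the standard library. This is a sketch of the equal sample size formula applied to the data listed above; the variable names are my own.

```python
from statistics import mean, stdev, variance

scores = {
    "Regular":   [58, 69, 67, 80, 91, 86, 94],
    "Night":     [79, 89, 93, 96, 83, 90, 99],
    "Corporate": [72, 85, 89, 75, 79, 80, 94],
}

n = 7  # common sample size
class_means = {name: mean(values) for name, values in scores.items()}
class_sds = {name: stdev(values) for name, values in scores.items()}

# Numerator: common sample size * variance of the three sample means
variance_of_means = variance(list(class_means.values()))
numerator = n * variance_of_means
# Denominator: mean of the three sample variances
denominator = mean([sd ** 2 for sd in class_sds.values()])
f_by_hand = numerator / denominator

for name in scores:
    print(f"{name:<10} mean = {class_means[name]:.2f}  SD = {class_sds[name]:.2f}")
print(f"F = {numerator:.3f} / {denominator:.3f} = {f_by_hand:.3f}")
```

Small differences in the third decimal place from the hand calculation come from rounding the means and standard deviations before squaring.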

The following shows how SPSS was used to conduct the analysis.

The SPSS output reports the p-value associated with the F statistic.


One way analysis of variance using SPSS

Analyze -> Compare Means -> One-Way ANOVA


Click on the Options button to open the Options Dialog box.

Put the name of the variable being analyzed (the dependent variable) in this box.

Put the name of the variable which designates the groups being compared in this box.


Oneway

Descriptives

SCORE

                                                              95% Confidence Interval for Mean
CLASS Type of class    N     Mean    Std. Deviation  Std. Error    Lower Bound    Upper Bound    Minimum  Maximum
1.00 Regular           7   77.8571      13.5084        5.1057        65.3640        90.3502       58.00    94.00
2.00 Night             7   89.8571       7.0339        2.6586        83.3519        96.3624       79.00    99.00
3.00 Corporate         7   82.0000       7.7889        2.9439        74.7965        89.2035       72.00    94.00
Total                 21   83.2381      10.6673        2.3278        78.3824        88.0938       58.00    99.00

ANOVA

SCORE

                  Sum of Squares   df   Mean Square      F      Sig.
Between Groups        520.095       2     260.048      2.666    .097
Within Groups        1755.714      18      97.540
Total                2275.810      20

Means Plot

[Means plot: mean SCORE for each type of class]


The F statistic is larger than 1, but the p-value of .097 is larger than .050, which says that we could plausibly have gotten an F that big by chance alone.

So we’ll retain the null hypothesis of no-differences between the population means.
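The p-values SPSS reports can be checked without special software in these two examples, because both compare three groups and so have 2 numerator df. In that special case the probability of an F at least as large as the one observed reduces to (1 + 2F / denominator df) raised to the power -(denominator df) / 2. The sketch below uses that shortcut; it does NOT apply when more or fewer than three groups are compared, and the function name is made up for illustration.

```python
def p_value_three_groups(f, df_within):
    """Upper-tail p-value for F with 2 numerator df (three groups only)."""
    return (1 + 2 * f / df_within) ** (-df_within / 2)


alpha = 0.05

# Second example: F = 2.666 with df = 2, 18
p_classes = p_value_three_groups(2.666, 18)
decision_classes = "reject" if p_classes < alpha else "retain"

# First example: F = 10.341 with df = 2, 27
p_antibiotics = p_value_three_groups(10.341, 27)
decision_antibiotics = "reject" if p_antibiotics < alpha else "retain"

print(f"Classes:     p = {p_classes:.3f} -> {decision_classes} the null")
print(f"Antibiotics: p = {p_antibiotics:.4f} -> {decision_antibiotics} the null")
```

The first p-value rounds to .097 and the second is below .0005, which is why SPSS prints it as .000.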

The plot makes it appear as if there are huge differences between the means.

But the authors of the plotting algorithm adjust the vertical axis scale to always make the graph fill the plot. So these apparently huge differences are not significant.


Completing the Corty Hypothesis Testing Answer Sheet . . .

Give the name and the formula of the test statistic that will be employed to test the null hypothesis.

One-Way Analysis of Variance

Check the assumptions of the test

Distributions appear to be approximately US within each group.

Null Hypothesis: Means of the three populations are equal.

Alternative Hypothesis: Means of the three populations are not equal.

What significance level will you use to separate "likely" values from "unlikely" values of the test statistic?

Significance Level = .05

What is the value of the test statistic computed from your data and the p-value?

F = 2.666, p-value = .097 (from SPSS output)

What is your conclusion? Do you reject or not reject the null hypothesis?

Retain the null. p-value is larger than .050.

What are the upper and lower limits of a 95% confidence interval appropriate for the problem? Present them in a sentence, with standard interpretive language.

Confidence intervals are not required for problems involving 3 or more populations.

State the implications of your conclusion for the problem you were asked to solve. That is, relate your statistical conclusion to the problem.

There are no significant differences in means of scores of the three groups of students.

No Post Hoc tests were computed because there were no significant differences.