T-TESTS AND ANALYSIS OF VARIANCE

Jennifer Kensler

July 13, 2010

Fralin Auditorium, Virginia Tech

This presentation is annotated. Please click on the numbered yellow squares for more information.

This course was part of the LISA Short Course Series. Please visit www.lisa.stat.vt.edu for more information about LISA and past courses.

http://www.lisa.stat.vt.edu/

Laboratory for Interdisciplinary Statistical Analysis

Collaboration: From our website, request a meeting for personalized statistical advice. Great advice right now: meet with LISA before collecting your data.

Short Courses: Designed to help graduate students apply statistics in their research.

Walk-In Consulting: Monday to Friday* 12-2 PM for questions requiring less than 30 minutes.

*Monday to Thursday during the summer.

All services are FREE for VT researchers. We assist with research—not class projects or homework.

LISA helps VT researchers benefit from the use of Statistics

www.lisa.stat.vt.edu

Experimental Design • Data Analysis • Interpreting Results • Grant Proposals • Software (R, SAS, JMP, SPSS...)

ONE SAMPLE T-TEST

The one sample t-test is used to test whether the population mean is different from a specified value.

Example: Is the mean height of 12 year old girls different from 62 inches?

JK

The population includes all individuals of interest. A sample is the individuals about which you actually have data.

JK

In this case the population is all 12 year old girls. We could measure the height of every 12 year old girl in order to determine their mean height, but this strategy is impractical. Instead we would take a sample of, say, 100 girls aged 12 and measure their heights. Even if the mean height of all 12 year old girls is 62 inches, the sample mean will almost never be exactly 62 inches. Let's say we take a sample and the mean is 63 inches. The one sample t-test answers the question of whether a sample mean of 63 inches could plausibly have been observed by chance or is evidence that the population mean height is different from 62 inches.

STEP 1: FORMULATE THE HYPOTHESES

The population mean is not equal to a specified value.
H0: μ = μ0
Ha: μ ≠ μ0

The population mean is greater than a specified value.
H0: μ = μ0
Ha: μ > μ0

The population mean is less than a specified value.
H0: μ = μ0
Ha: μ < μ0

JK

There are two hypotheses in a hypothesis test: the null hypothesis, H0, and the alternative hypothesis, Ha. The null hypothesis is generally what is accepted by convention. The alternative hypothesis is what the researcher is trying to show.

Court Room Analogy:
H0: Defendant is innocent
Ha: Defendant is guilty

In court the burden of proof is on the prosecution; likewise, in hypothesis testing the burden of proof is on the alternative hypothesis. We will not reject the null hypothesis unless we have strong evidence (innocent until proven guilty).

JK

There are three basic formats for hypothesis tests: the two-sided test and the two one-sided tests. In the example with the heights of the 12 year old girls we have
H0: μ = 62
Ha: μ ≠ 62

STEP 2: CHECK THE ASSUMPTIONS

The sample is random.

The population from which the sample is drawn is normal, or the sample size is large.

JK

Whenever we perform a hypothesis test we must make sure that the assumptions of the test are met. If the assumptions are not met the results may be meaningless.

JK

Suppose there is a class of 100 students and we would like to select a random sample of 10 students. One way to do this randomly would be to write each student's name on a piece of paper, put the papers in a hat and pick 10 names out of the hat. A bad way to select a sample would be to pick the first 10 students who raise their hands.
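
The names-in-a-hat procedure can be sketched in Python. This is only an illustration: the roster of `student_001` to `student_100` and the seed value are made up here and are not part of the course materials.

```python
import random

# Hypothetical roster of 100 students (stand-ins for the names in the hat).
students = [f"student_{i:03d}" for i in range(1, 101)]

# A fixed seed makes the draw repeatable; omit it for a fresh draw each run.
rng = random.Random(2010)

# Draw 10 students without replacement; each student is equally likely to be picked.
sample = rng.sample(students, 10)
print(sample)
```

Picking the first 10 raised hands has no counterpart here on purpose: that selection depends on the students, not on chance.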

JK

One rule of thumb is that the one sample t-test is appropriate if the sample size is at least 30.

STEPS 3-5

Step 3: Calculate the test statistic:

$$t = \frac{\bar{y} - \mu_0}{s / \sqrt{n}}$$

where

$$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}$$

Step 4: Calculate the p-value based on the appropriate alternative hypothesis.

Step 5: Write a conclusion.

JK

For the one sample t-test the test statistic is the number of standard deviations the sample mean is from the hypothesized population mean.

JK

The p-value is the probability (assuming the null hypothesis is true) of obtaining results at least as extreme as those observed. Traditionally if the p-value is less than 0.05 the null hypothesis is rejected. If the p-value is greater than 0.05 we fail to reject the null hypothesis.
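
As a hedged supplement to this definition: when the assumptions of the one sample t-test hold and H0 is true, the test statistic follows a t distribution with n - 1 degrees of freedom. Writing T for a random variable with that distribution and t for the observed statistic (T is notation introduced here, not taken from the slides), "at least as extreme" translates into the following for the three alternatives:

$$H_a: \mu \neq \mu_0 \quad\Rightarrow\quad p = 2\,P(T \geq |t|)$$

$$H_a: \mu > \mu_0 \quad\Rightarrow\quad p = P(T \geq t)$$

$$H_a: \mu < \mu_0 \quad\Rightarrow\quad p = P(T \leq t)$$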

JK

The sample standard deviation, s, is a measure of spread. A larger standard deviation implies a larger spread.

JK

ybar is the sample mean.
mu_0 is the population mean under the null hypothesis.
n is the sample size.

JK

Either reject the null hypothesis (p-value less than 0.05) or fail to reject the null hypothesis (p-value greater than 0.05).
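
Step 3 can be sketched in Python using only the standard library. The function name `one_sample_t` and the height values are assumptions made for illustration; the course itself uses JMP, which also reports the p-value.

```python
import math
import statistics

def one_sample_t(values, mu0):
    """Return the one sample t statistic (ybar - mu0) / (s / sqrt(n))."""
    n = len(values)
    ybar = statistics.mean(values)   # sample mean
    s = statistics.stdev(values)     # sample standard deviation (n - 1 divisor)
    return (ybar - mu0) / (s / math.sqrt(n))

# Made-up heights in inches, for illustration only (not data from the course).
heights = [61.0, 63.5, 62.0, 64.0, 63.0, 62.5, 65.0, 61.5]
t_stat = one_sample_t(heights, mu0=62.0)
print(round(t_stat, 3))
```

The statistic is positive when the sample mean is above the hypothesized mean and negative when it is below.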

IRIS EXAMPLE

A researcher would like to know whether the mean sepal width of a variety of irises is different from 3.5 cm.

The researcher randomly selects 50 irises and measures the sepal width.

Step 1: Hypotheses
H0: μ = 3.5 cm
Ha: μ ≠ 3.5 cm

JMP Steps 2-4:

JMP Demonstration
Analyze > Distribution
Y, Columns: Sepal Width
Normal Quantile Plot
Test Mean
Specify Hypothesized Mean: 3.5

JK

JMP data sets can be found at www.lisa.stat.vt.edu/?q=node/906.

JMP OUTPUT

Step 5 Conclusion: The mean sepal width is not significantly different from 3.5 cm.

JK

The normality assumption is appropriate if most of the points on the Normal Quantile Plot fall within the dashed red lines.

TWO SAMPLE T-TEST

Two sample t-tests are used to determine whether the population mean of one group is equal to, larger than or smaller than the population mean of another group.

Example: Is the mean cholesterol of people taking drug A lower than the mean cholesterol of people taking drug B?

STEP 1: FORMULATE THE HYPOTHESES

The population means of the two groups are not equal.
H0: μ1 = μ2
Ha: μ1 ≠ μ2

The population mean of group 1 is greater than the population mean of group 2.
H0: μ1 = μ2
Ha: μ1 > μ2

The population mean of group 1 is less than the population mean of group 2.
H0: μ1 = μ2
Ha: μ1 < μ2

JK

There are three basic formats for the hypothesis test: the two-sided test and the two one-sided tests.

STEP 2: CHECK THE ASSUMPTIONS

The two samples are random and independent.

The populations from which the samples are drawn are normal, or the sample sizes are large.

The populations have the same standard deviation.

JK

Example of two independent samples: In a clinical trial to compare two cholesterol medications, half the participants are randomly assigned to receive medication A and half are randomly assigned to receive medication B.

Example of dependent samples: All participants receive both drugs (at different times). These samples are dependent because the two samples contain the same subjects.

JK

The two sample t-test is robust to differences between the standard deviations. As long as one of the sample standard deviations is not more than twice the other, the two-sample t-test should be okay.

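STEPS 3-5

Step 3: Calculate the test statistic:

$$t = \frac{\bar{y}_1 - \bar{y}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

where

$$s_p = \sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}}$$

Step 4: Calculate the appropriate p-value.

Step 5: Write a conclusion.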

JK

s_p is the pooled standard deviation

JK

y_1bar and y_2bar are the means of samples one and two respectively.
n_1 and n_2 are the sample sizes of samples one and two respectively.
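
The pooled test statistic, together with the "not more than twice the other" rule of thumb for the standard deviations, can be sketched as follows. The function names and the cholesterol values are hypothetical and chosen only for illustration.

```python
import math
import statistics

def sd_ratio_ok(sample1, sample2, max_ratio=2.0):
    """Rule of thumb: the larger sample SD is at most max_ratio times the smaller."""
    s1, s2 = statistics.stdev(sample1), statistics.stdev(sample2)
    return max(s1, s2) <= max_ratio * min(s1, s2)

def pooled_two_sample_t(sample1, sample2):
    """Return (t, s_p) for the pooled two sample t-test."""
    n1, n2 = len(sample1), len(sample2)
    s1_sq = statistics.variance(sample1)   # s_1 squared
    s2_sq = statistics.variance(sample2)   # s_2 squared
    s_p = math.sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2))
    diff = statistics.mean(sample1) - statistics.mean(sample2)
    t = diff / (s_p * math.sqrt(1 / n1 + 1 / n2))
    return t, s_p

# Made-up cholesterol readings for two independent groups (illustration only).
drug_a = [182.0, 175.0, 190.0, 168.0, 177.0, 185.0]
drug_b = [195.0, 188.0, 201.0, 179.0, 192.0, 198.0]

if sd_ratio_ok(drug_a, drug_b):
    t_stat, s_p = pooled_two_sample_t(drug_a, drug_b)
    print(round(t_stat, 3), round(s_p, 3))
```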

TWO SAMPLE EXAMPLE

A researcher would like to know whether the mean sepal width of setosa irises is different from the mean sepal width of versicolor irises.

The researcher randomly selects 50 setosa irises and 50 versicolor irises and measures their sepal widths.

Step 1: Hypotheses
H0: μsetosa = μversicolor
Ha: μsetosa ≠ μversicolor

JMP Steps 2-4:

JMP Demonstration
Analyze > Fit Y By X
Y, Response: Sepal Width
X, Factor: Species
Means/ANOVA/Pooled t
Normal Quantile Plot > Plot Actual by Quantile

JMP OUTPUT

Step 5 Conclusion: There is strong evidence (p-value < 0.0001) that the mean sepal widths for the two varieties are different.

[Figure: normal quantile plot of sepal width for setosa and versicolor; axis label: Normal Quantile]

JK

If the points follow the lines then the normality assumption is appropriate. If the two lines are relatively parallel, then the assumption of equal standard deviations is appropriate.

PAIRED T-TEST

The paired t-test is used to compare the population means of two groups when the samples are dependent.

Example: A researcher would like to determine if background noise causes people to take longer to complete math problems. The researcher gives 20 subjects two math tests, one in complete silence and one with background noise, and records the time each subject takes to complete each test.

STEP 1: FORMULATE THE HYPOTHESES

The population mean difference is not equal to zero.
H0: μdifference = 0
Ha: μdifference ≠ 0

The population mean difference is greater than zero.
H0: μdifference = 0
Ha: μdifference > 0

The population mean difference is less than zero.
H0: μdifference = 0
Ha: μdifference < 0

JK

Idea of the paired t-test: Calculate the difference between each pair. If there is no difference between the two populations then the differences should be close to zero. A one sample t-test on the differences is performed.

JK

Is the population mean of the differences different from zero?

STEP 2: CHECK THE ASSUMPTIONS

The sample is random.

The data are matched pairs.

The differences have a normal distribution or the sample size is large.

JK

Examples of matched pairs data: Examining participants' weights before and after a diet program. Testing the vision of the left eye and right eye for all participants.

JK

The subjects chosen to participate are randomly selected from the population.

STEPS 3-5

Step 3: Calculate the test statistic:

$$t = \frac{\bar{d} - 0}{s_d / \sqrt{n}}$$

where $\bar{d}$ is the mean of the differences and $s_d$ is the standard deviation of the differences.

Step 4: Calculate the p-value.

Step 5: Write a conclusion.

JK

n is the number of pairs.
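
The idea of running a one sample t-test on the differences can be sketched as below. The function name `paired_t` and the timing values are made up for illustration and are not data from the course.

```python
import math
import statistics

def paired_t(first, second):
    """Return the paired t statistic computed on the differences second - first."""
    if len(first) != len(second):
        raise ValueError("paired data need the same number of observations")
    diffs = [b - a for a, b in zip(first, second)]  # one difference per pair
    n = len(diffs)                                   # number of pairs
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)
    return (d_bar - 0) / (s_d / math.sqrt(n))

# Made-up completion times in minutes for the same five subjects (illustration only).
silence = [12.1, 10.4, 15.0, 11.2, 13.3]
noise = [13.0, 11.1, 15.4, 12.5, 13.1]
print(round(paired_t(silence, noise), 3))
```

The sign of the statistic depends on the direction of subtraction, so the direction should match the way the hypotheses are written.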

PAIRED T-TEST EXAMPLE

A researcher would like to determine whether a fitness program increases flexibility. The researcher measures the flexibility (in inches) of 12 randomly selected participants before and after the fitness program.

Step 1: Formulate the hypotheses
H0: μAfter - Before = 0
Ha: μAfter - Before > 0

Steps 2-4:

JMP Analysis
Create a new column of After - Before
Analyze > Distribution
Y, Columns: After - Before
Normal Quantile Plot
Test Mean
Specify Hypothesized Mean: 0

JMP OUTPUT

Step 5 Conclusion: There is no evidence that the fitness program increases flexibility.

ONE-WAY ANALYSIS OF VARIANCE

ONE-WAY ANOVA

ANOVA is used to determine whether three or more populations have different distributions.

[Figure: comparison of Medical Treatment groups A, B and C]

JK

The one-way ANOVA is an extension of the two-sample t-test. We test whether the population means of 3 or more groups are different.

ANOVA STRATEGY

The first step is to use the ANOVA F test to determine if there are any significant differences among the population means. If the ANOVA F test shows that the population means are not all the same, then follow-up tests can be performed to see which pairs of population means differ.

ONE-WAY ANOVA MODEL

$$y_{ij} = \mu_i + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim N(0, \sigma^2), \qquad i = 1, \dots, r, \quad j = 1, \dots, n_i$$

where $y_{ij}$ is the response of the jth trial on the ith factor level and $\mu_i$ is the mean of the ith group.

In other words, for each group the observed value is the group mean plus some random variation.

JK

r is the number of groups.
n_i is the sample size of the ith group.
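
To make "group mean plus some random variation" concrete, the sketch below simulates data from the one-way model. The group means, group sizes, sigma and seed are arbitrary values picked for illustration, and `simulate_one_way` is a hypothetical helper name.

```python
import random

def simulate_one_way(group_means, group_sizes, sigma, seed=0):
    """Simulate y_ij = mu_i + eps_ij with eps_ij drawn from N(0, sigma^2)."""
    rng = random.Random(seed)
    return [
        [mu_i + rng.gauss(0.0, sigma) for _ in range(n_i)]
        for mu_i, n_i in zip(group_means, group_sizes)
    ]

# Three groups (r = 3) with 20 observations each; all numbers are made up.
data = simulate_one_way([55.0, 60.0, 65.0], [20, 20, 20], sigma=5.0)
for i, group in enumerate(data, start=1):
    print(f"group {i}: n = {len(group)}, sample mean = {sum(group) / len(group):.2f}")
```

With sigma set to 0 every simulated observation equals its group mean, which is the model with the random variation switched off.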

ONE-WAY ANOVA HYPOTHESIS

Step 1: We test whether there is a difference in the population means.

$$H_0: \mu_1 = \mu_2 = \dots = \mu_r$$

$$H_a: \text{The } \mu_i \text{ are not all equal.}$$

STEP 2: CHECK ANOVA ASSUMPTIONS

The samples are random and independent of each other.

The populations are normally distributed.

The populations all have the same standard deviation.

The ANOVA F test is robust to moderate departures from the assumptions of normality and equal standard deviations.

STEP 3: ANOVA F TEST

Compare the variation within the samples to the variation between the samples.

[Figure: two panels comparing Medical Treatment groups A, B and C]

ANOVA TEST STATISTIC

$$F = \frac{\text{Variation between Groups}}{\text{Variation within Groups}} = \frac{MSG}{MSE}$$

Variation within groups small compared with variation between groups → Large F

Variation within groups large compared with variation between groups → Small F

MSG

$$MSG = \frac{SSG}{r - 1} = \frac{n_1 (\bar{y}_1 - \bar{y})^2 + n_2 (\bar{y}_2 - \bar{y})^2 + \dots + n_r (\bar{y}_r - \bar{y})^2}{r - 1}$$

The mean square for groups, MSG, measures the variability of the sample averages.

SSG stands for sum of squares for groups.

JK

y_1bar is the mean of sample 1.

JK

ybar is the mean of observations from all groups.

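MSE

$$MSE = \frac{SSE}{n - r} = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2 + \dots + (n_r - 1) s_r^2}{n - r}$$

where

$$s_i = \sqrt{\frac{\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2}{n_i - 1}}$$

Mean square error, MSE, measures the variability within the groups.

SSE stands for sum of squares for error.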

JK

s_i is the standard deviation for the ith group.
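
The F statistic can be computed directly from the MSG and MSE definitions. This is a minimal sketch: `one_way_anova_f` is a hypothetical helper name, n is taken to be the total number of observations across all groups, and the pain scores are made up.

```python
import statistics

def one_way_anova_f(groups):
    """Return (F, MSG, MSE) for a list of samples, one list per group."""
    r = len(groups)                          # number of groups
    n = sum(len(g) for g in groups)          # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    # SSG: squared distance of each group mean from the overall mean, weighted by n_i.
    ssg = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # SSE: (n_i - 1) * s_i^2 summed over the groups.
    sse = sum((len(g) - 1) * statistics.variance(g) for g in groups)
    msg = ssg / (r - 1)
    mse = sse / (n - r)
    return msg / mse, msg, mse

# Made-up pain scores for three treatments (illustration only).
pain = [
    [58.0, 61.0, 55.0, 60.0],
    [62.0, 64.0, 59.0, 63.0],
    [66.0, 63.0, 68.0, 65.0],
]
f_stat, msg, mse = one_way_anova_f(pain)
print(round(f_stat, 3))
```

A large F means the group means vary a lot relative to the variation inside each group.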

STEPS 4-5

Step 4: Calculate the p-value.

Step 5: Write a conclusion.

ANOVA EXAMPLE

A researcher would like to determine if three drugs provide the same relief from pain.

60 patients are randomly assigned to a treatment (20 people in each treatment).

Step 1: Formulate the Hypotheses
H0: μDrug A = μDrug B = μDrug C
Ha: The μi are not all equal.

STEPS 2-4

JMP demonstration
Analyze > Fit Y By X
Y, Response: Pain
X, Factor: Drug
Normal Quantile Plot > Plot Actual by Quantile
Means/ANOVA

JMP OUTPUT AND CONCLUSION

Step 5 Conclusion: There is strong evidence that the drugs are not all the same.

[Figure: Pain (vertical axis, 50 to 75) by Drug (Drug A, Drug B, Drug C), with a normal quantile plot for each drug; axis label: Normal Quantile]

JK

If the points follow the line the normality assumption is appropriate. If the lines are more or less parallel the assumption of equal standard deviations is appropriate. This normal quantile plot indicates that the assumptions of normality and equal standard deviations are appropriate.

FOLLOW-UP TEST

The p-value of the overall F test indicates that the level of pain is not the same for patients taking drugs A, B and C.

We would like to know which pairs of treatments are different.

One method is to use Tukey’s HSD (honestly significant differences).

TUKEY TESTS

Tukey’s test simultaneously tests

$$H_0: \mu_i = \mu_{i'} \quad \text{versus} \quad H_a: \mu_i \neq \mu_{i'}$$

for all pairs of factor levels. Tukey’s HSD controls the overall type I error.

JMP demonstration
Oneway Analysis of Pain By Drug > Compare Means > All Pairs, Tukey HSD

JK

A type I error is rejecting the null hypothesis even though the null hypothesis is true.

JMP OUTPUT

The JMP output shows that drugs A and C are significantly different.

Level   - Level  Difference  Std Err Dif  Lower CL  Upper CL  p-Value
Drug C  Drug A     5.850000     1.677665   1.81283  9.887173  0.0027*
Drug C  Drug B     3.600000     1.677665  -0.43717  7.637173  0.0897
Drug B  Drug A     2.250000     1.677665  -1.78717  6.287173  0.3786
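
One way to read this output is that a pair differs significantly when its confidence interval for the difference excludes zero. The sketch below applies that check to the confidence limits reported above. The assignment of values to columns follows one reading of the flattened JMP output and should be treated as an assumption.

```python
# (level, minus_level, difference, lower_cl, upper_cl), as read from the JMP output.
rows = [
    ("Drug C", "Drug A", 5.85, 1.81283, 9.887173),
    ("Drug C", "Drug B", 3.60, -0.43717, 7.637173),
    ("Drug B", "Drug A", 2.25, -1.78717, 6.287173),
]

def excludes_zero(lower, upper):
    """True when the interval lies entirely above or entirely below zero."""
    return lower > 0 or upper < 0

significant = [(a, b) for a, b, _diff, lower, upper in rows if excludes_zero(lower, upper)]
print(significant)  # [('Drug C', 'Drug A')]
```

This agrees with the p-values: only the Drug C versus Drug A comparison falls below 0.05.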

TWO-WAY ANALYSIS OF VARIANCE

TWO-WAY ANOVA

We are interested in the effect of two categorical factors on the response.

We are interested in whether either of the two factors has an effect on the response and whether there is an interaction effect.

An interaction effect means that the effect on the response of one factor depends on the level of the other factor.

JK

In the previous example we compared three drugs. We may also be interested in determining the effect of two doses.

INTERACTION

[Figure: two interaction plots of Response against Factor A (Low, High), with one line for Factor B Low and one for Factor B High. Panel titles: No Interaction; Interaction.]

JK

No Interaction: The effect of changing factor A from low to high is the same for both levels of factor B. (The lines are parallel.) Wire Example: Suppose factor A is the amount of an alloy used to make a wire, factor B is the cooling temperature and the response is the strength of the wire. No interaction means that increasing the amount of alloy has the same effect on the strength of the wire regardless of whether the low or high temperature is used.

JK

Interaction: The effect of changing factor A from low to high depends on the level of factor B. (The lines are not parallel.) Wire example: The interaction plot says that the effect on wire strength of changing from the low amount of the alloy to the high amount of the alloy depends on whether the temperature is at the high level or at the low level.
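
For a two-by-two layout, "parallel lines" can be expressed as a difference of differences: the effect of factor A at the high level of factor B minus the effect of factor A at the low level of factor B. The sketch below uses made-up cell means for the wire example, and `interaction_contrast` is a hypothetical helper. It describes sample means only and is not a significance test.

```python
def interaction_contrast(cell_means):
    """Difference of differences; keys are (factor A level, factor B level)."""
    effect_a_at_b_low = cell_means[("high", "low")] - cell_means[("low", "low")]
    effect_a_at_b_high = cell_means[("high", "high")] - cell_means[("low", "high")]
    return effect_a_at_b_high - effect_a_at_b_low

# Made-up mean wire strengths (illustration only).
parallel = {("low", "low"): 10.0, ("high", "low"): 14.0,
            ("low", "high"): 12.0, ("high", "high"): 16.0}
crossing = {("low", "low"): 10.0, ("high", "low"): 14.0,
            ("low", "high"): 12.0, ("high", "high"): 22.0}

print(interaction_contrast(parallel))  # 0.0: raising the alloy adds 4 at both temperatures
print(interaction_contrast(crossing))  # 6.0: raising the alloy adds 4 at low, 10 at high
```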

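TWO-WAY ANOVA MODEL

$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}, \qquad \varepsilon_{ijk} \sim N(0, \sigma^2)$$

$$i = 1, \dots, a, \quad j = 1, \dots, b, \quad k = 1, \dots, n_{ij}$$

where
$y_{ijk}$ is the response of the kth trial on the ith factor A level and the jth factor B level,
$\mu$ is the overall mean,
$\alpha_i$ is the main effect of the ith level of factor A,
$\beta_j$ is the main effect of the jth level of factor B,
$(\alpha\beta)_{ij}$ is the interaction effect of the ith level of factor A and the jth level of factor B.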

JK

a is the number of levels of factor A.
b is the number of levels of factor B.
n_ij is the sample size for the group with factor A at level i and factor B at level j.

TWO-WAY ANOVA EXAMPLE

We would like to determine the effect of two alloy amounts (low, high) and three cooling temperatures (low, medium, high) on the strength of a wire.

JMP demonstration
Analyze > Fit Model
Y: Strength
Highlight Alloy and Temp and click Macros > Factorial to Degree
Run Model

JMP OUTPUT

Conclusion: There is strong evidence of an interaction between alloy and temperature.

JK

This p-value says that the overall model is significant.

JK

These p-values indicate which effects are significant.

ANALYSIS OF COVARIANCE

ANALYSIS OF COVARIANCE (ANCOVA)

Covariates are variables that may affect the response but cannot be controlled.

Covariates are not of primary interest to the researcher.

We will look at an example with two covariates. The model is

$$y_{ij} = \mu_i + \text{covariates} + \varepsilon_{ij}$$

ANCOVA EXAMPLE

Consider the one-way ANOVA example where we tested whether the patients receiving different drugs reported different levels of pain. Age and gender may also influence the pain, so we can use age and gender as covariates.

JK

In this case we consider age and gender as covariates because we are not primarily interested in them. We are primarily interested in the drugs, but feel that we need to account for the effects of age and gender.

JMP INSTRUCTIONS

JMP demonstration
Analyze > Fit Model
Y: Pain
Add: Drug, Age, Gender
Run Model
Response Pain > Estimates > Show Prediction Expression

JMP OUTPUT

Drug and age had significant effects on pain, but gender did not.

CONCLUSION

The one sample t-test allows us to test whether the population mean of a group is equal to a specified value.

The two-sample t-test and paired t-test allow us to determine if the population means of two groups are different.

ANOVA and ANCOVA methods allow us to determine whether the population means of several groups are different.

SAS, SPSS AND R

For information about using SAS, SPSS and R to do ANOVA:

http://www.ats.ucla.edu/stat/sas/topics/anova.htm

http://www.ats.ucla.edu/stat/spss/topics/anova.htm

http://www.ats.ucla.edu/stat/r/sk/books_pra.htm

REFERENCES

Fisher’s Irises Data (used in one sample and two sample t-test examples).

Flexibility data (paired t-test example): Michael Sullivan III. Statistics: Informed Decisions Using Data. Upper Saddle River, New Jersey: Pearson Education, 2004: 602.
