T-TESTS AND ANALYSIS OF VARIANCE
Jennifer Kensler
July 13, 2010
Fralin Auditorium, Virginia Tech
This course was part of the LISA Short Course Series. Please visit www.lisa.stat.vt.edu for more information
about LISA and past courses.
Laboratory for Interdisciplinary Statistical Analysis
Collaboration
From our website, request a meeting for personalized statistical advice.
Great advice right now: Meet with LISA before collecting your data.
Short Courses
Designed to help graduate students apply statistics in their research.
Walk-In Consulting
Monday to Friday* 12-2 PM for questions requiring <30 mins
*Monday to Thursday during the summer
All services are FREE for VT researchers. We assist with research—not class projects or homework.
LISA helps VT researchers benefit from the use of Statistics
www.lisa.stat.vt.edu
Experimental Design • Data Analysis • Interpreting Results • Grant Proposals • Software (R, SAS, JMP, SPSS...)
ONE SAMPLE T-TEST
The one sample t-test is used to test whether the population mean is different from a specified value.
Example: Is the mean height of 12-year-old girls different from 62 inches?
STEP 1: FORMULATE THE HYPOTHESES
The population mean is not equal to a specified value.
H0: μ = μ0
Ha: μ ≠ μ0
The population mean is greater than a specified value.
H0: μ = μ0
Ha: μ > μ0
The population mean is less than a specified value.
H0: μ = μ0
Ha: μ < μ0
STEP 2: CHECK THE ASSUMPTIONS
The sample is random.
The population from which the sample is drawn is normal, or the sample size is large.
STEPS 3-5
Step 3: Calculate the test statistic:
$$t = \frac{\bar{y} - \mu_0}{s / \sqrt{n}}$$
where
$$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}$$
Step 4: Calculate the p-value based on the appropriate alternative hypothesis.
Step 5: Write a conclusion.
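For readers working outside JMP, the Step 3 calculation can be sketched in Python. This is a minimal illustration that uses only the standard library; the function name `one_sample_t` and the sample values are assumptions made for the example, not part of the course materials.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return (t, degrees of freedom) for testing H0: mu = mu0."""
    n = len(sample)
    y_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation, n - 1 in the denominator
    t = (y_bar - mu0) / (s / math.sqrt(n))
    return t, n - 1

# Hypothetical sepal widths in cm (illustrative values, not the course data)
widths = [3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1]
t_stat, df = one_sample_t(widths, mu0=3.5)
print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```

The p-value in Step 4 would then come from a t distribution with n - 1 degrees of freedom, which a statistics package or a t table can supply.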
IRIS EXAMPLE
A researcher would like to know whether the mean sepal width of a variety of irises is different from 3.5 cm.
The researcher randomly selects 50 irises and measures the sepal width.
Step 1: Hypotheses
H0: μ = 3.5 cm
Ha: μ ≠ 3.5 cm
JMP
Steps 2-4:
JMP Demonstration
Analyze > Distribution
Y, Columns: Sepal Width
Normal Quantile Plot
Test Mean
Specify Hypothesized Mean: 3.5
JMP OUTPUT
Step 5 Conclusion: The mean sepal width is not significantly different from 3.5 cm.
TWO SAMPLE T-TEST
Two sample t-tests are used to determine whether the population mean of one group is equal to, larger than, or smaller than the population mean of another group.
Example: Is the mean cholesterol of people taking drug A lower than the mean cholesterol of people taking drug B?
STEP 1: FORMULATE THE HYPOTHESES
The population means of the two groups are not equal.
H0: μ1 = μ2
Ha: μ1 ≠ μ2
The population mean of group 1 is greater than the population mean of group 2.
H0: μ1 = μ2
Ha: μ1 > μ2
The population mean of group 1 is less than the population mean of group 2.
H0: μ1 = μ2
Ha: μ1 < μ2
STEP 2: CHECK THE ASSUMPTIONS
The two samples are random and independent.
The populations from which the samples are drawn are normal, or the sample sizes are large.
The populations have the same standard deviation.
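STEPS 3-5
Step 3: Calculate the test statistic:
$$t = \frac{\bar{y}_1 - \bar{y}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
where
$$s_p = \sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}}$$
Step 4: Calculate the appropriate p-value.
Step 5: Write a conclusion.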
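The pooled test statistic can be sketched the same way. The code below is an illustration under the equal standard deviation assumption from Step 2; the function name `pooled_two_sample_t` and the data are hypothetical and are not taken from the course.

```python
import math
import statistics

def pooled_two_sample_t(sample1, sample2):
    """Return (t, degrees of freedom) for the pooled two sample t-test."""
    n1, n2 = len(sample1), len(sample2)
    y1_bar, y2_bar = statistics.mean(sample1), statistics.mean(sample2)
    s1_sq = statistics.variance(sample1)  # sample variance of group 1
    s2_sq = statistics.variance(sample2)  # sample variance of group 2
    # Pooled standard deviation: weighted average of the two sample variances
    s_p = math.sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2))
    t = (y1_bar - y2_bar) / (s_p * math.sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# Hypothetical sepal widths in cm for two varieties (illustrative values only)
group_1 = [3.5, 3.0, 3.2, 3.6, 3.9, 3.4]
group_2 = [3.2, 2.3, 2.8, 2.9, 3.0, 2.7]
t_stat, df = pooled_two_sample_t(group_1, group_2)
print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```

If the equal standard deviation assumption is doubtful, an unpooled (Welch) test is a common alternative, though it is not covered in these slides.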
TWO SAMPLE EXAMPLE
A researcher would like to know whether the mean sepal width of setosa irises is different from the mean sepal width of versicolor irises.
The researcher randomly selects 50 setosa irises and 50 versicolor irises and measures their sepal widths.
Step 1: Hypotheses
H0: μsetosa = μversicolor
Ha: μsetosa ≠ μversicolor
JMP
Steps 2-4:
JMP Demonstration:
Analyze > Fit Y By X
Y, Response: Sepal Width
X, Factor: Species
Means/ANOVA/Pooled t
Normal Quantile Plot > Plot Actual by Quantile
JMP OUTPUT
Step 5 Conclusion: There is strong evidence (p-value < 0.0001) that the mean sepal widths for the two varieties are different.
[Figure: normal quantile plot of sepal width for setosa and versicolor; horizontal axis: Normal Quantile]
PAIRED T-TEST
The paired t-test is used to compare the population means of two groups when the samples are dependent.
Example: A researcher would like to determine if background noise causes people to take longer to complete math problems. The researcher gives 20 subjects two math tests, one in complete silence and one with background noise, and records the time each subject takes to complete each test.
STEP 1: FORMULATE THE HYPOTHESES
The population mean difference is not equal to zero.
H0: μdifference = 0
Ha: μdifference ≠ 0
The population mean difference is greater than zero.
H0: μdifference = 0
Ha: μdifference > 0
The population mean difference is less than zero.
H0: μdifference = 0
Ha: μdifference < 0
STEP 2: CHECK THE ASSUMPTIONS
The sample is random.
The data are matched pairs.
The differences have a normal distribution, or the sample size is large.
STEPS 3-5
Step 3: Calculate the test statistic:
$$t = \frac{\bar{d} - 0}{s_d / \sqrt{n}}$$
where $\bar{d}$ is the mean of the differences and $s_d$ is the standard deviation of the differences.
Step 4: Calculate the p-value.
Step 5: Write a conclusion.
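The paired statistic is the one sample statistic applied to the within-pair differences. A minimal Python sketch follows; the function name `paired_t` and the measurements are hypothetical and serve only to illustrate the formula.

```python
import math
import statistics

def paired_t(before, after):
    """Return (t, degrees of freedom) for testing H0: mean difference = 0."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    # One difference per matched pair, taken as after minus before
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)  # standard deviation of the differences
    t = (d_bar - 0) / (s_d / math.sqrt(n))
    return t, n - 1

# Hypothetical flexibility measurements in inches (illustrative values only)
before = [18.5, 21.5, 16.5, 21.0, 20.0, 15.0]
after = [19.0, 21.0, 18.0, 21.5, 20.5, 16.5]
t_stat, df = paired_t(before, after)
print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```

The sign of t depends on the order of subtraction, so the alternative hypothesis should be written for the same difference (here, after minus before).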
PAIRED T-TEST EXAMPLE
A researcher would like to determine whether a fitness program increases flexibility. The researcher measures the flexibility (in inches) of 12 randomly selected participants before and after the fitness program.
Step 1: Formulate the Hypotheses
H0: μAfter - Before = 0
Ha: μAfter - Before > 0
PAIRED T-TEST EXAMPLE
Steps 2-4:
JMP Analysis:
Create a new column of After - Before
Analyze > Distribution
Y, Columns: After - Before
Normal Quantile Plot
Test Mean
Specify Hypothesized Mean: 0
JMP OUTPUT
Step 5 Conclusion: There is no evidence that the fitness program increases flexibility.
ONE-WAY ANALYSIS OF VARIANCE
ONE-WAY ANOVA
ANOVA is used to determine whether three or more populations have different distributions.
[Figure: groups A, B, and C along a Medical Treatment axis]
ANOVA STRATEGY
The first step is to use the ANOVA F test to determine if there are any significant differences among the population means. If the ANOVA F test shows that the population means are not all the same, then follow-up tests can be performed to see which pairs of population means differ.
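ONE-WAY ANOVA MODEL
$$y_{ij} = \mu_i + \varepsilon_{ij}$$
where
$y_{ij}$ is the response of the jth trial on the ith factor level,
$\mu_i$ is the mean of the ith group,
$\varepsilon_{ij} \sim N(0, \sigma^2)$,
$i = 1, \ldots, r$ and $j = 1, \ldots, n_i$.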
In other words, for each group the observed value is the group mean plus some random variation.
ONE-WAY ANOVA HYPOTHESIS
Step 1: We test whether there is a difference in the population means.
$$H_0: \mu_1 = \mu_2 = \cdots = \mu_r$$
$$H_a: \text{The } \mu_i \text{ are not all equal.}$$
STEP 2: CHECK ANOVA ASSUMPTIONS
The samples are random and independent of each other.
The populations are normally distributed.
The populations all have the same standard deviation.
The ANOVA F test is robust to moderate departures from the assumptions of normality and equal standard deviations.
STEP 3: ANOVA F TEST
Compare the variation within the samples to the variation between the samples.
[Figure: two panels, each showing groups A, B, and C along a Medical Treatment axis]
ANOVA TEST STATISTIC
$$F = \frac{\text{Variation between Groups}}{\text{Variation within Groups}} = \frac{MSG}{MSE}$$
Variation within groups small compared with variation between groups → Large F
Variation within groups large compared with variation between groups → Small F
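MSG
$$MSG = \frac{SSG}{r - 1} = \frac{n_1 (\bar{y}_1 - \bar{y})^2 + n_2 (\bar{y}_2 - \bar{y})^2 + \cdots + n_r (\bar{y}_r - \bar{y})^2}{r - 1}$$
The mean square for groups, MSG, measures the variability of the sample averages.
SSG stands for the sum of squares for groups.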
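MSE
$$MSE = \frac{SSE}{n - r} = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2 + \cdots + (n_r - 1) s_r^2}{n - r}$$
where
$$s_i = \sqrt{\frac{\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2}{n_i - 1}}$$
and n is the total number of observations.
Mean square error, MSE, measures the variability within the groups.
SSE stands for the sum of squares for error.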
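The MSG and MSE formulas combine into the F statistic as sketched below. This is an illustration only; the function name `one_way_anova_f` and the pain scores are hypothetical, and n is taken to be the total number of observations across all groups.

```python
import statistics

def one_way_anova_f(groups):
    """Return (F, numerator df, denominator df) for a one-way ANOVA."""
    r = len(groups)                    # number of groups
    n = sum(len(g) for g in groups)    # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    # SSG: variation of the group means around the grand mean
    ssg = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # SSE: variation of the observations around their own group mean
    sse = sum((len(g) - 1) * statistics.variance(g) for g in groups)
    msg = ssg / (r - 1)
    mse = sse / (n - r)
    return msg / mse, r - 1, n - r

# Hypothetical pain scores for three drugs (illustrative values only)
drug_a = [58.0, 61.5, 60.0, 63.5, 59.0]
drug_b = [62.0, 64.5, 60.5, 66.0, 63.0]
drug_c = [66.5, 68.0, 64.0, 70.5, 67.0]
f_stat, df_groups, df_error = one_way_anova_f([drug_a, drug_b, drug_c])
print(f"F = {f_stat:.3f} on ({df_groups}, {df_error}) degrees of freedom")
```

A large F relative to the F distribution with (r - 1, n - r) degrees of freedom gives a small p-value in Step 4.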
STEPS 4-5
Step 4: Calculate the p-value.
Step 5: Write a conclusion.
ANOVA EXAMPLE
A researcher would like to determine if three drugs provide the same relief from pain.
Sixty patients are randomly assigned to a treatment (20 people in each treatment).
Step 1: Formulate the Hypotheses
H0: μDrug A = μDrug B = μDrug C
Ha: The μi are not all equal.
STEPS 2-4
JMP demonstration
Analyze > Fit Y By X
Y, Response: Pain
X, Factor: Drug
Normal Quantile Plot > Plot Actual by Quantile
Means/ANOVA
JMP OUTPUT AND CONCLUSION
Step 5 Conclusion: There is strong evidence that the mean pain levels for the three drugs are not all the same.
[Figure: Pain (vertical axis, 50 to 75) by Drug (Drug A, Drug B, Drug C), with a normal quantile plot for each drug; horizontal axis of the quantile plot: Normal Quantile]
FOLLOW-UP TEST
The p-value of the overall F test indicates that the level of pain is not the same for patients taking drugs A, B and C.
We would like to know which pairs of treatments are different.
One method is to use Tukey’s HSD (honestly significant differences).
TUKEY TESTS
Tukey’s test simultaneously tests
$$H_0: \mu_i = \mu_{i'}$$
$$H_a: \mu_i \neq \mu_{i'}$$
for all pairs of factor levels. Tukey’s HSD controls the overall type I error.
JMP demonstration
Oneway Analysis of Pain By Drug > Compare Means > All Pairs, Tukey HSD
JMP OUTPUT
The JMP output shows that drugs A and C are significantly different.
Level    - Level   Difference   Std Err Dif   Lower CL   Upper CL   p-Value
Drug C   Drug A    5.850000     1.677665      1.81283    9.887173   0.0027*
Drug C   Drug B    3.600000     1.677665      -0.43717   7.637173   0.0897
Drug B   Drug A    2.250000     1.677665      -1.78717   6.287173   0.3786
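One way to read the Tukey output is through the confidence limits: a pair of drugs differs significantly when its interval for the difference excludes zero. The short Python sketch below applies that rule to the differences and limits reported in the JMP output; the variable and function names are illustrative assumptions.

```python
# Rows copied from the JMP Tukey HSD output:
# (level, minus_level, difference, lower_cl, upper_cl)
tukey_rows = [
    ("Drug C", "Drug A", 5.85, 1.81283, 9.887173),
    ("Drug C", "Drug B", 3.60, -0.43717, 7.637173),
    ("Drug B", "Drug A", 2.25, -1.78717, 6.287173),
]

def significant_pairs(rows):
    """Return the pairs whose confidence interval excludes zero."""
    return [
        (level, minus_level)
        for level, minus_level, _diff, lower, upper in rows
        if lower > 0 or upper < 0
    ]

print(significant_pairs(tukey_rows))
```

Only the Drug C versus Drug A interval lies entirely above zero, which agrees with the single p-value below 0.05 in the output.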
TWO-WAY ANALYSIS OF VARIANCE
TWO-WAY ANOVA
We are interested in the effect of two categorical factors on the response.
We are interested in whether either of the two factors has an effect on the response and whether there is an interaction effect.
An interaction effect means that the effect on the response of one factor depends on the level of the other factor.
INTERACTION
[Figure: two panels, "No Interaction" and "Interaction", each plotting Response against Factor A (Low, High) with separate lines for Factor B Low and Factor B High]
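TWO-WAY ANOVA MODEL
$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$$
where
$y_{ijk}$ is the response of the kth trial on the ith factor A level and the jth factor B level,
$\mu$ is the overall mean,
$\alpha_i$ is the main effect of the ith level of factor A,
$\beta_j$ is the main effect of the jth level of factor B,
$(\alpha\beta)_{ij}$ is the interaction effect of the ith level of factor A and the jth level of factor B,
$\varepsilon_{ijk} \sim N(0, \sigma^2)$,
$i = 1, \ldots, a$, $j = 1, \ldots, b$, and $k = 1, \ldots, n_{ij}$.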
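To make the terms of the two-way model concrete, the sketch below splits a table of cell means into an overall mean, main effects for each factor, and interaction effects. It assumes a balanced design and the usual sum-to-zero convention for the effects, neither of which is stated on the slides; the function name `decompose` and the cell means are hypothetical.

```python
import statistics

def decompose(cell_means):
    """Split cell means into overall mean, main effects, and interaction effects.

    cell_means[i][j] is the mean response at level i of factor A and level j
    of factor B. Assumes a balanced design with sum-to-zero effects.
    """
    mu = statistics.mean(value for row in cell_means for value in row)
    alpha = [statistics.mean(row) - mu for row in cell_means]        # factor A
    beta = [statistics.mean(col) - mu for col in zip(*cell_means)]   # factor B
    interaction = [
        [cell_means[i][j] - mu - alpha[i] - beta[j] for j in range(len(beta))]
        for i in range(len(alpha))
    ]
    return mu, alpha, beta, interaction

# Hypothetical mean strengths: rows = alloy (low, high),
# columns = temperature (low, medium, high). Illustrative values only.
cell_means = [
    [10.0, 12.0, 14.0],
    [11.0, 15.0, 22.0],
]
mu, alpha, beta, interaction = decompose(cell_means)
print("overall mean:", mu)
print("alloy effects:", alpha)
print("temperature effects:", beta)
print("interaction effects:", interaction)
```

When every interaction effect is zero, the effect of one factor is the same at each level of the other, which corresponds to the parallel lines in the "No Interaction" panel.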
TWO-WAY ANOVA EXAMPLE
We would like to determine the effect of two alloys (low, high) and three cooling temperatures (low, medium, high) on the strength of a wire.
JMP demonstration
Analyze > Fit Model
Y: Strength
Highlight Alloy and Temp and click Macros > Factorial to Degree
Run Model
JMP OUTPUT
Conclusion: There is strong evidence of an interaction between alloy and temperature.
ANALYSIS OF COVARIANCE
ANALYSIS OF COVARIANCE (ANCOVA)
Covariates are variables that may affect the response but cannot be controlled.
Covariates are not of primary interest to the researcher.
We will look at an example with two covariates. The model is
$$y_{ij} = \mu_i + \text{covariates} + \varepsilon_{ij}$$
ANCOVA EXAMPLE
Consider the one-way ANOVA example where we tested whether the patients receiving different drugs reported different levels of pain. Age and gender may also influence the pain, so we can use age and gender as covariates.
JMP INSTRUCTIONS
JMP demonstration
Analyze > Fit Model
Y: Pain
Add: Drug, Age, Gender
Run Model
Response Pain > Estimates > Show Prediction Expression
JMP OUTPUT
Drug and age had significant effects on pain, but gender did not.
CONCLUSION
The one sample t-test allows us to test whether the population mean of a group is equal to a specified value.
The two-sample t-test and paired t-test allow us to determine if the population means of two groups are different.
ANOVA and ANCOVA methods allow us to determine whether the population means of several groups are different.
SAS, SPSS AND R
For information about using SAS, SPSS and R to do ANOVA:
http://www.ats.ucla.edu/stat/sas/topics/anova.htm
http://www.ats.ucla.edu/stat/spss/topics/anova.htm
http://www.ats.ucla.edu/stat/r/sk/books_pra.htm
REFERENCES
Fisher’s Irises Data (used in one sample and two sample t-test examples).
Flexibility data (paired t-test example): Michael Sullivan III. Statistics: Informed Decisions Using Data. Upper Saddle River, New Jersey: Pearson Education, 2004: 602.