Locating Variance: Post-Hoc Tests

Dr James Betts

Developing Study Skills and Research Methods (HL20107)

Why do we use Hypothesis Testing?

• It is easy (i.e. data in, P value out)

• It provides the ‘Illusion of Scientific Objectivity’

• Everybody else does it.

Problems with Hypothesis Testing?

• P<0.05 is an arbitrary probability (P<0.06?)

• The size of the effect is not expressed

• The variability of this effect is not expressed

• Overall, hypothesis testing ignores ‘judgement’.

Lecture Outline:

• Influence of multiple comparisons on P

• Tukey’s HSD test

• Bonferroni Corrections

• Ryan-Holm-Bonferroni Adjustments.

[Figure: Time to Fatigue (min), axis 0 to 120, for Placebo, LGI, HGI and Glucose trials; Thomas et al. (1991). *P < 0.05 vs. Placebo, HGI & Glucose]

[Figure: Placebo, Lucozade, Gatorade and Powerade trials]

Why not multiple t-tests?

i.e.

• Placebo vs Lucozade

• Placebo vs Gatorade

• Placebo vs Powerade

• Lucozade vs Gatorade

• Lucozade vs Powerade

• Gatorade vs Powerade

• We accept ‘significance’ and reject the null hypothesis at P ≤ 0.05 (i.e. a 5% chance that we are wrong)

• Performing multiple tests therefore means that our overall chance of committing a type I error is >5%.
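The inflation described above can be illustrated with a short calculation. This is a minimal sketch that assumes the six t-tests are independent, which is a simplification (the real contrasts share data), so the figure is indicative only.

```python
from math import comb

alpha = 0.05           # per-test significance threshold from the slide
k = 4                  # Placebo, Lucozade, Gatorade, Powerade
m = comb(k, 2)         # number of pairwise t-tests

# Chance of at least one type I error across m tests,
# ASSUMING the tests are independent (a simplification).
familywise = 1 - (1 - alpha) ** m

print(f"{m} tests -> familywise error rate of about {familywise:.3f}")
```

With four trials there are six pairwise contrasts, and under this assumption the overall chance of at least one false positive is roughly 26% rather than 5%.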

Post-hoc Tests

• A popular solution is the Tukey HSD (Honestly Significant Difference) test

• This uses the omnibus error term from the ANOVA to determine which means are significantly different

$T = q \sqrt{\dfrac{\text{Error Variance}}{n}}$

ANOVA (Dependent Variable: TimetoFatigue)

Source            Sum of Squares    df    Mean Square    F        Sig.
Between Groups    3434.475           3    1144.825       6.364    .001
Within Groups     6476.500          36     179.903
Total             9910.975          39
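As a worked example, the Tukey formula can be applied to the error term in this ANOVA table. This is a rough sketch under two assumptions that are not stated on the slides: n = 10 per trial (inferred from 40 observations across 4 trials) and a critical q of about 3.81 (read from a standard studentized range table for 4 means and 36 error degrees of freedom at the .05 level).

```python
from math import sqrt

ms_within = 179.903   # error variance (Within Groups mean square) from the ANOVA table
n_per_group = 10      # ASSUMPTION: 40 observations split evenly over 4 trials
q_crit = 3.809        # ASSUMPTION: tabled q for k = 4 means, df = 36, alpha = .05

# T = q * sqrt(Error Variance / n)
hsd = q_crit * sqrt(ms_within / n_per_group)

print(f"Tukey HSD is about {hsd:.2f} min")
```

Under these assumptions, any two trial means would need to differ by roughly 16 minutes to be declared significantly different.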

q table for Tukey’s HSD

Multiple Comparisons (Tukey HSD)

Dependent Variable: TimetoFatigue

(I) Trial    (J) Trial    Mean Difference (I-J)    Std. Error    Sig.    95% CI Lower Bound    95% CI Upper Bound
Placebo      Lucozade     -20.00000*               5.99838       .010    -36.1550              -3.8450
Placebo      Gatorade       3.30000                5.99838       .946    -12.8550              19.4550
Placebo      Powerade     -11.40000                5.99838       .246    -27.5550               4.7550
Lucozade     Placebo       20.00000*               5.99838       .010      3.8450              36.1550
Lucozade     Gatorade      23.30000*               5.99838       .002      7.1450              39.4550
Lucozade     Powerade       8.60000                5.99838       .487     -7.5550              24.7550
Gatorade     Placebo       -3.30000                5.99838       .946    -19.4550              12.8550
Gatorade     Lucozade     -23.30000*               5.99838       .002    -39.4550              -7.1450
Gatorade     Powerade     -14.70000                5.99838       .086    -30.8550               1.4550
Powerade     Placebo       11.40000                5.99838       .246     -4.7550              27.5550
Powerade     Lucozade      -8.60000                5.99838       .487    -24.7550               7.5550
Powerade     Gatorade      14.70000                5.99838       .086     -1.4550              30.8550

* The mean difference is significant at the .05 level.
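The table above can be cross-checked by hand. The sketch below assumes n = 10 per trial (inferred from the ANOVA degrees of freedom, not stated on the slides) and takes the confidence-interval half-width directly from the table (36.1550 minus 20.0000) as the minimum difference needed for significance.

```python
from math import sqrt

ms_within = 179.903   # Within Groups mean square from the ANOVA
n_per_group = 10      # ASSUMPTION: balanced design, 40 observations over 4 trials

# Standard error of a difference between two independent means
se_diff = sqrt(2 * ms_within / n_per_group)

half_width = 16.155   # 95% CI half-width read from the table

# Unique pairwise mean differences (I - J) copied from the table
diffs = {
    ("Placebo", "Lucozade"): -20.0,
    ("Placebo", "Gatorade"): 3.3,
    ("Placebo", "Powerade"): -11.4,
    ("Lucozade", "Gatorade"): 23.3,
    ("Lucozade", "Powerade"): 8.6,
    ("Gatorade", "Powerade"): -14.7,
}

# A contrast is flagged when its interval excludes zero
significant = [pair for pair, d in diffs.items() if abs(d) > half_width]

print(f"Std. Error = {se_diff:.5f}")
print("Significant contrasts:", significant)
```

Only the two contrasts involving Lucozade versus Placebo and Lucozade versus Gatorade exceed the threshold, which matches the asterisks in the table.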

Tukey Test Critique

• As you learnt last week, the omnibus error term is not reflective of all contrasts if sphericity is violated

• So Tukey tests commit many type I errors with even a slight degree of asphericity.

[Figure: Placebo, Lucozade, Gatorade and Powerade trials]

Solution for Aspherical Data

• There are alternatives to the Tukey HSD test which use specific error terms for each contrast

– Fisher’s LSD (Least Significant Difference)

– Sidak

– Bonferroni

– Many others…e.g. Newman-Keuls, Scheffé, Duncan, Dunnett, Gabriel, R-E-G-W, etc.
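To show how these alternatives differ in strictness, the sketch below compares the per-contrast significance threshold each would apply to the six pairwise contrasts in the sports drink example. It illustrates the thresholds only and does not compute the contrast-specific error terms; the formulas used are the standard Bonferroni (alpha / m) and Sidak (1 - (1 - alpha)^(1/m)) adjustments.

```python
alpha = 0.05   # desired overall type I error rate
m = 6          # number of pairwise contrasts among 4 trials

lsd_alpha = alpha                            # Fisher's LSD: no correction per contrast
bonferroni_alpha = alpha / m                 # Bonferroni: divide alpha by m
sidak_alpha = 1 - (1 - alpha) ** (1 / m)     # Sidak: slightly less strict than Bonferroni

print(f"LSD        per-contrast alpha: {lsd_alpha:.4f}")
print(f"Bonferroni per-contrast alpha: {bonferroni_alpha:.4f}")
print(f"Sidak      per-contrast alpha: {sidak_alpha:.4f}")
```

Equivalently, each LSD P value can be multiplied by m and compared with .05, which is the form of Bonferroni correction used later in the lecture.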

[Figure: Fisher’s LSD and Bonferroni panels, Trials 1 to 4. Serum Insulin (pmol.l-1), axis 0 to 400, for CHO and CHO-PRO; time points Pre, 30min, 60min, 90min, 1h, 2h, 3h, 4h, 10min and Post, spanning Run 1, Recovery and Run 2]

Bonferroni Correction Critique

• Correction of LSD values successfully controls for type I errors following a 1-way ANOVA

• However, factorial designs often involve a larger number of contrasts, many of which may not be relevant.

[Figure: Recovery Supp. 1 and Recovery Supp. 2]

See also Perneger (1998) BMJ 316: 1236

Solution for Factorial Designs

• An adjustment to the standard Bonferroni correction can be applied for factorial designs

• This ‘Ryan-Holm-Bonferroni’ or ‘stepwise’ method involves returning to the P values of interest from our LSD test

• These P values are placed in numerical order and the most significant is Bonferroni corrected (i.e. P x m)

• However, all subsequent P values are multiplied by m minus the number of contrasts already corrected.
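The stepwise procedure described above might be sketched as follows. The P values are hypothetical and chosen only for illustration. The function also keeps adjusted values from decreasing down the ordered list, a common convention for Holm-adjusted P values that goes slightly beyond what the slide describes.

```python
def holm_adjust(p_values):
    """Return stepwise (Holm) adjusted P values in the original order."""
    m = len(p_values)
    # Indices of the P values from most to least significant
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for already_corrected, idx in enumerate(order):
        # Multiply by m minus the number of contrasts already corrected
        corrected = min(1.0, p_values[idx] * (m - already_corrected))
        # Keep adjusted values non-decreasing along the ordered list
        running_max = max(running_max, corrected)
        adjusted[idx] = running_max
    return adjusted


# HYPOTHETICAL uncorrected (LSD) P values for four contrasts of interest
raw = [0.030, 0.004, 0.450, 0.015]

holm = holm_adjust(raw)
bonferroni = [min(1.0, p * len(raw)) for p in raw]

for p, h, b in zip(raw, holm, bonferroni):
    print(f"raw {p:.3f}  stepwise {h:.3f}  standard Bonferroni {b:.3f}")
```

In this made-up example the contrast with a raw P of 0.015 stays below .05 after stepwise correction (0.045) but not after standard Bonferroni correction (0.060), which is the gain in power the method is meant to provide.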

Summary Post-Hoc Tests

• A Tukey test may be appropriate when sphericity can be assumed

• Multiple t-tests with a Bonferroni correction are more appropriate for aspherical data

• Stepwise correction of standard Bonferroni procedures maintains power with factorial designs

• Best option is to keep your study simple:

– Pre-planned contrast at a specific time point

– Summary statistics (e.g. rate of change, area under curve)

– Just make an informed decision based on the data available.
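As an illustration of the summary-statistic approach, the sketch below reduces one participant's repeated measurements to a single area under the curve using the trapezoid rule. The time points and values are hypothetical and are not taken from the lecture data.

```python
def trapezoid_auc(times, values):
    """Area under the curve by the trapezoid rule."""
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        mean_height = (values[i] + values[i - 1]) / 2
        area += width * mean_height
    return area


# HYPOTHETICAL serum insulin values (pmol/l) for one participant
times_min = [0, 30, 60, 90]
insulin = [50, 200, 150, 100]

auc = trapezoid_auc(times_min, insulin)
print(f"AUC = {auc:.0f} pmol/l x min")
```

Each participant then contributes one number per trial, so the trials can be compared with a single pre-planned test rather than many time-point contrasts.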

Further reading from this lecture…

• Atkinson, G. (2001) Analysis of repeated measurements in physical therapy research. Physical Therapy in Sport 2: p. 194-208.

• Atkinson, G. (2002) Analysis of repeated measurements in physical therapy research: multiple comparisons amongst level means and multi-factorial designs. Physical Therapy in Sport 3: p. 191-203.

Compulsory reading for next week’s lecture…

• Batterham, A. M. & Atkinson, G. (2005) How Big Does My Sample Need to Be? A primer on the Murky World of Sample Size Estimation. Physical Therapy in Sport 6: p. 153-163.
