
Why you need power analysis even if you think you don’t

Paul Johnson¹, Sarah Barry², Heather Ferguson¹, Pie Müller³

1: Boyd Orr Centre, IBAHCM, University of Glasgow

2: Robertson Centre for Biostatistics, University of Glasgow

3: Swiss Tropical and Public Health Institute

14/12/2015 @paulcdjo 1

Power

Narrow definition: The probability of a significant result given some effect size

• Applies only to null hypothesis significance testing (NHST)

Broad definition: The information we expect to gain from a study that uses statistical inference; “informativeness”

• Applies to all modes of statistical inference:

• Null hypothesis significance testing (NHST)

• Estimating an effect & confidence interval

• Information-theoretic (AIC, BIC, etc.)

• Bayesian
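A minimal sketch of the narrow definition, plus one estimation-based measure of informativeness, under the simplest possible design. It assumes two equal-sized groups compared with a two-sided z-test, with a known and equal standard deviation in each group, so the effect size d is the difference in means in SD units. This normal approximation to the two-sample t-test and the function names below are illustrative assumptions, not part of the talk.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, SD 1


def power_two_sample(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Probability of a significant two-sided z-test when the true effect is d.

    Assumes known, equal SDs (normal approximation to the two-sample t-test).
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Expected z statistic when the true standardized difference is d
    shift = d * (n_per_group / 2) ** 0.5
    # Probability of falling in either rejection region
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def ci_half_width(n_per_group: int, sd: float = 1.0, level: float = 0.95) -> float:
    """Half-width of the confidence interval for a difference in means (known SD)."""
    z = STD_NORMAL.inv_cdf(0.5 + level / 2)
    return z * sd * (2 / n_per_group) ** 0.5


def n_for_power(d: float, target: float = 0.8, alpha: float = 0.05) -> int:
    """Smallest per-group sample size whose power reaches the target."""
    n = 2
    while power_two_sample(d, n, alpha) < target:
        n += 1
    return n


print(f"power, d = 0.5, n = 20 per group: {power_two_sample(0.5, 20):.2f}")
print(f"n per group for 80% power at d = 0.5: {n_for_power(0.5)}")
print(f"95% CI half-width with n = 63 per group: {ci_half_width(63):.2f} SD")
```

Under these assumptions, 20 units per group give only about 35% power for a medium effect (d = 0.5), and about 63 per group are needed for 80% power. The exact t-test needs slightly more. The same design can also be described by how narrow its confidence interval is expected to be, which is the "informativeness" view that carries over to estimation.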


Reproducibility

“Reproducibility is the ability of an entire experiment or study to be duplicated… Reproducibility is one of the main principles of the scientific method.”

Wikipedia, 08/12/15

“non-reproducible single occurrences are of no significance to science”

Karl Popper, The Logic of Scientific Discovery, 1935


Low power means low reproducibility

• 1000 studies each test a null hypothesis (H0) at the 5% significance level with 30% power

• In 100 of them H0 is false: 30% × 100 = 30 results are significant and true

• In the other 900 H0 is true: 5% × 900 = 45 results are significant but false

• P(true | significant) = 30/(30 + 45) = 40%, so most significant results are false
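The arithmetic on this slide can be wrapped in a small function so the inputs can be varied. The sketch keeps the slide's own assumptions: 1000 studies, H0 false in 100 of them, and a 5% significance level. The 80% power comparison is an added illustration that uses a commonly quoted target, not a figure from the talk.

```python
def prob_true_given_significant(n_studies: int, n_h0_false: int,
                                alpha: float, power: float) -> float:
    """Fraction of significant results that come from studies where H0 is false."""
    n_h0_true = n_studies - n_h0_false
    true_positives = power * n_h0_false   # significant and true
    false_positives = alpha * n_h0_true   # significant but false
    return true_positives / (true_positives + false_positives)


# The slide's scenario: 30% power
print(round(prob_true_given_significant(1000, 100, alpha=0.05, power=0.30), 2))  # 0.4
# Same scenario with 80% power
print(round(prob_true_given_significant(1000, 100, alpha=0.05, power=0.80), 2))  # 0.64
```

In this scenario, raising power from 30% to 80% lifts the share of significant results that are true from 40% to 64%, without any change to the significance level.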


A crisis of irreproducibility?



• Human genetics

• Psychology

• Neuroscience


Irreproducibility in human genetics


Nature Genetics 29, 306-309 (2001)

Typical GWAS sample sizes from a current Nature Genetics issue:

• 23,000

• 27,000

• 116,000

Irreproducibility in psychology


Original study effect size versus replication effect size (correlation coefficients).

Open Science Collaboration

Science 2015;349:aac4716

Reproducibility Project: Psychology

• Aimed to replicate 100 studies

• Result: 39 of the 100 replicated the original result

Irreproducibility in neuroscience


Nature Reviews Neuroscience 14, 365-376 (2013)

“Our results indicate that the median statistical power in neuroscience is 21%.” “The consequences of this [very low power] include overestimates of effect size and low reproducibility of results. There are also ethical dimensions to this problem, as unreliable research is inefficient and wasteful. Improving reproducibility in neuroscience is a key priority and requires attention to well-established but often ignored methodological principles.”
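The claim that low power leads to overestimated effect sizes can be checked with a quick simulation. This sketch is an illustration, not the method of the quoted study. It assumes a true standardized effect of 0.5, unit SD in both groups, and a two-sided z-test. To keep it short, it draws each study's estimated difference directly from its sampling distribution instead of simulating raw observations.

```python
import random
from statistics import NormalDist, fmean


def winners_curse(true_d: float, n_per_group: int, n_sims: int = 20_000,
                  alpha: float = 0.05, seed: int = 1):
    """Return (power, mean estimate among significant results).

    Each simulated study's estimated mean difference is drawn from a normal
    distribution with mean true_d and variance 2 / n_per_group (SD = 1 per group).
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = (2 / n_per_group) ** 0.5  # standard error of the difference in means
    significant = []
    for _ in range(n_sims):
        estimate = rng.gauss(true_d, se)
        if abs(estimate) / se > z_crit:  # two-sided z-test
            significant.append(estimate)
    return len(significant) / n_sims, fmean(significant)


for n in (20, 200):
    power, mean_sig = winners_curse(0.5, n)
    print(f"n = {n} per group: power {power:.2f}, "
          f"mean significant estimate {mean_sig:.2f} (true effect 0.5)")
```

Under these assumptions, studies with 20 units per group reach significance only about a third of the time, and their significant estimates average roughly 0.68 rather than the true 0.5. With 200 per group, power is close to 1 and the significant estimates average close to 0.5.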



• Human genetics

• Psychology

• Neuroscience

• Ecology?


A crisis of irreproducibility in ecology?

• No study has directly assessed reproducibility across ecology

• Are known causes of irreproducibility prevalent in ecological research?


Low power in ecological research?

• Jennions & Møller (2003) A survey of the statistical power of research in behavioral ecology and animal behavior. Behavioral Ecology

    • Average power 40-47% for medium effect sizes

• Taborsky (2010) Sample size in the study of behaviour. Ethology

    • Power analysis rarely used in behavioural research

• Smith et al. (2011) Power rangers: no improvement in the statistical power of analyses published in Animal Behaviour. Animal Behaviour

    • Average power 23-26% for medium effect sizes

What can we do about it?


Obstacles to using power analysis and their solutions

Obstacle: Power analysis doesn’t apply to my study
Solution: It does if you are using statistical inference; broaden your definition of power analysis.

Obstacle: I don’t know what parameter values to assume (pilot data not available)
Solution: Assume a plausible worst-case scenario based on experience, or do a pilot study.

Obstacle: I don’t know the true effect size
Solution: You don’t need to; finding it out is the job of the study. Think instead of:
• The smallest effect worth detecting (for NHST)
• The desired precision of the estimate (for estimation)

Obstacle: My model is too complicated for power analysis
Solution:
• Simplify to the bare essentials
• Use simulations
• Ask for help

Obstacle: Power analysis is hard & no one is forcing me to do it
Solution: Power analysis should be promoted by:
• Journal editors
• Funders
• Group leaders
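A sketch of the simulation route, combined with the smallest effect worth detecting. The scenario is hypothetical. It assumes count data (for example, counts per trap) that follow a Poisson distribution with a mean of 10 in the control arm. It treats a 30% reduction (rate ratio 0.7) as the smallest effect worth detecting, and it tests each simulated dataset with a Wald z-test on the log rate ratio of the arm totals. Real ecological counts are often overdispersed, so a realistic analysis would use a different data generator and model. The general loop still applies: simulate, analyse, count significant results. It works for any model you can fit.

```python
import math
import random
from statistics import NormalDist


def rpois(rng: random.Random, lam: float) -> int:
    """Poisson draw via Knuth's multiplication method (fine for moderate lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulated_power(n_per_arm: int, control_mean: float = 10.0,
                    rate_ratio: float = 0.7, alpha: float = 0.05,
                    n_sims: int = 2000, seed: int = 42) -> float:
    """Share of simulated studies giving a significant two-sided Wald test."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    significant = 0
    for _ in range(n_sims):
        # 1. Simulate one study's data under the assumed (hypothetical) effect
        control = sum(rpois(rng, control_mean) for _ in range(n_per_arm))
        treated = sum(rpois(rng, control_mean * rate_ratio) for _ in range(n_per_arm))
        if control == 0 or treated == 0:
            continue  # log rate ratio undefined: count as non-significant
        # 2. Analyse it: Wald z-test on the log rate ratio of the arm totals
        log_rr = math.log(treated / control)
        se = math.sqrt(1 / treated + 1 / control)
        # 3. Record whether the result was significant
        if abs(log_rr / se) > z_crit:
            significant += 1
    return significant / n_sims


for n in (5, 10, 15, 20):
    print(f"{n} units per arm: estimated power {simulated_power(n):.2f}")
```

Under these assumptions, estimated power rises from well under 50% with 5 units per arm to roughly 90% with 20. Increasing `n_sims` reduces the Monte Carlo error. Running with `rate_ratio=1.0` is a useful check that the false-positive rate stays near `alpha`.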



Conclusions

• Underpowered studies are probably common in ecology, so irreproducibility is probably rife

• Time and money spent on underpowered research are wasted

• We should stop looking for excuses to avoid power analysis

• We should collaborate with statisticians who know how to do power analysis

• Journal editors, funders & group leaders should encourage power analysis

Funding acknowledgement: AvecNet - African Vector Control: New Tools
