
Research Skills Seminar Series 2019 | CAHS Research Education Program

ResearchEducationProgram.org

Statistical Tips for Interpreting Scientific Claims
Mark Jones, Statistician, Telethon Kids Institute

18 October 2019

Research Skills Seminar Series | CAHS Research Education Program
Department of Child Health Research | Child and Adolescent Health Service


Copyright to this material produced by the CAHS Research Education Program, Department of Child Health Research, Child and Adolescent Health Service, Western Australia, under the provisions of the Copyright Act 1968 (C’wth Australia). Apart from any fair dealing for personal, academic, research or non-commercial use, no part may be reproduced without written permission. The Department of Child Health Research is under no obligation to grant this permission. Please acknowledge the CAHS Research Education Program, Department of Child Health Research, Child and Adolescent Health Service when reproducing or quoting material from this source.

© CAHS Research Education Program, Department of Child Health Research, Child and Adolescent Health Service, WA 2019


Statistical Tips for Interpreting Scientific Claims

CONTENTS:

1 PRESENTATION

2 ARTICLE: TWENTY TIPS FOR INTERPRETING SCIENTIFIC CLAIMS, SUTHERLAND, SPIEGELHALTER & BURGMAN, 2013

3 ADDITIONAL RESOURCES – STATISTICAL TIPS FOR INTERPRETING SCIENTIFIC CLAIMS

3.1 STATISTICAL BIAS

3.2 CORRELATIONS

3.3 COGNITIVE BIAS

3.4 REFERENCES

1 PRESENTATION

RESEARCH SKILLS SEMINAR SERIES 2019 | CAHS Research Education Program

ResearchEducationProgram.org

Statistical Tips for Interpreting Scientific Claims
Mark Jones | Biostatistician, Telethon Kids Institute
[email protected]

On behalf of Dr Julie Marsh, Senior Research Fellow, Telethon Kids Institute

Research Skills Seminar Series | CAHS Research Education Program Department of Child Health Research | Child and Adolescent Health Service

Statistical Tips for Interpreting Scientific Claims

Sutherland, Spiegelhalter and Burgman (2013). Twenty tips for interpreting scientific claims. Nature 503, 335–337.


The 20 Tips

1. Differences and Chance Cause Variation
2. No Measurement is Exact
3. Bias is Rife
4. Bigger is usually Better for Sample Size
5. Correlation Does Not Imply Causation
6. Regression to the Mean Can Mislead
7. Extrapolating Beyond the Data is Risky
8. Beware the Base-Rate Fallacy
9. Controls are Important
10. Randomisation Minimises Bias
11. Seek Replication
12. Scientists are Human
13. Significance is Significant
14. Separate No Effect from Non-Significance
15. Effect Size Matters
16. Study Relevance Limits Generalisation
17. Feelings Influence Risk Perception
18. Dependencies Change the Risks
19. Data Can Be Dredged or Cherry Picked
20. Extreme Measurements May Mislead


The 20 tips group into three themes: Study Design, Statistical Insights and Human Factors.



Part 1: Study Design


Randomisation (minimises bias)
Tip: Randomization avoids bias

Diagram: from the population, a simple random sample is drawn; randomised allocation then assigns participants either to the treatment group (treatment) or to the control group (placebo). (Image source: Final Fantasy VII)

EXPERIMENTAL UNIT: the smallest 'thing' to which we allocate treatments.

Replication
Tip: Seek replication, not pseudoreplication

“Results consistent across many studies, replicated on independent populations, are more likely to be solid.”

“The results of several such experiments may be combined in a meta-analysis to provide an overarching view of the topic with potentially much greater statistical power than any of the individual studies.”


Controls are Important



Generalisability (aka external validity)
Tip: Study relevance limits generalizations

“At its best a trial shows what can be accomplished with a medicine under careful observation and certain restricted conditions. The same results will not invariably or necessarily be observed when the medicine passes into general use.”

Austin Bradford Hill, 1984

Part 2: Statistical Insights


Sample Size (adopt the goldilocks principle)
Tip: Bigger is usually better for sample size

Average efficacy can be more reliably and accurately estimated from a study with hundreds of participants than from a study with only a few participants.

Basic analyses often assume that participants are homogeneous.
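To make the sample-size point concrete, here is a minimal simulation sketch (not from the seminar; the effect size and numbers are illustrative): the spread of the efficacy estimates shrinks as the trial gets larger.

```python
import random
random.seed(1)

def estimate_efficacy(n, true_effect=0.3, sd=1.0):
    """Simulate n patient responses and return the sample mean."""
    return sum(random.gauss(true_effect, sd) for _ in range(n)) / n

# Repeat each trial size many times to see how much the estimate varies.
for n in (30, 3000):
    estimates = [estimate_efficacy(n) for _ in range(1000)]
    mean = sum(estimates) / len(estimates)
    spread = (sum((e - mean) ** 2 for e in estimates) / len(estimates)) ** 0.5
    print(f"n={n:5d}: mean estimate={mean:.3f}, spread of estimates={spread:.3f}")
```

The spread falls roughly as 1/sqrt(n), which is why the large trial pins down average efficacy so much more precisely.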



Figure: sample sizes can be too small, too big, or just right (the goldilocks principle).

Differences and Chance Cause Variation

“The main challenge of research is teasing apart the importance of the process of interest from the innumerable other sources of variation.”



Hypotheses, Effects and Significance
Tips: Significance is significant; Separate no effect from non-significance; Effect size matters

A hypothesis test is a statistical technique to assist with answering questions or claims about population parameters by sampling from the population of interest.

P-values give the probability that an observed association could have arisen solely as a result of chance in sampling, given the null hypothesis; the p-value depends on both sample size and variability.

A p-value of 0.01 means there is a 1-in-100 probability that what looks like an effect of the treatment could have occurred randomly when in truth there was no effect at all.
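One way to internalise this definition is to simulate it. The sketch below (illustrative numbers, not from the seminar) draws two groups from the same population, so the null hypothesis is true by construction, and counts how often a difference at least as large as some observed one arises by chance alone.

```python
import random
random.seed(42)

def mean(xs):
    return sum(xs) / len(xs)

def simulated_p_value(observed_diff, n=50, sims=10_000):
    """Fraction of null-hypothesis simulations (both groups drawn from the
    same distribution) whose absolute group difference is at least as
    large as the observed one."""
    hits = 0
    for _ in range(sims):
        a = [random.gauss(0, 1) for _ in range(n)]
        b = [random.gauss(0, 1) for _ in range(n)]
        if abs(mean(a) - mean(b)) >= observed_diff:
            hits += 1
    return hits / sims

print(simulated_p_value(0.5))   # a large difference: rarely arises by chance
print(simulated_p_value(0.05))  # a small difference: arises by chance often
```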


The ASA's six principles on p-values (Wasserstein & Lazar, 2016):

1. P-values can indicate how incompatible the data are with a specified statistical model.
2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
4. Proper inference requires full reporting and transparency.
5. A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
6. By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.


95% Confidence Interval
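As a rough sketch of what a 95% confidence interval is, here is a computation under a normal approximation for a sample mean (the sample values are hypothetical; for n this small a t critical value would be more appropriate than 1.96):

```python
import math
import statistics

# Hypothetical sample of measurements
sample = [4.1, 5.3, 4.8, 5.9, 4.4, 5.1, 4.7, 5.5, 5.0, 4.6]

m = statistics.mean(sample)
se = statistics.stdev(sample) / math.sqrt(len(sample))  # standard error

# 95% CI using the normal critical value 1.96 (a sketch only).
lo, hi = m - 1.96 * se, m + 1.96 * se
print(f"mean={m:.2f}, 95% CI=({lo:.2f}, {hi:.2f})")
```

If the sampling were repeated many times, about 95% of intervals constructed this way would contain the true population mean.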




Measurement Error
Tip: No measurement is exact

Measurement errors represent the limits of precision with which we can observe phenomena and are dictated by our instruments and operators.

Impacts: bias, power and masking

Bias is Rife

Sources:
https://data36.com/statistical-bias-types-explained/
https://www.statisticshowto.datasciencecentral.com/what-is-bias/
https://newonlinecourses.science.psu.edu/stat509/node/28/
https://en.wikipedia.org/wiki/Bias_(statistics)



Regression to the Mean
Tip: Regression to the mean can mislead

Think Conditionally
Tip: Dependencies change the risks



Part 3: Human Factors


Scientists are Human (as are you)

“Peer review is not infallible: journal editors might favour positive findings & newsworthiness.” 

Statistical Tools
Reporting Guidelines: CONSORT, TREND, STROBE, PRISMA, REMARK, STREGA


Base-Rate Fallacy
Tip: Beware the base-rate fallacy

Slide example: 15% green, 85% blue; the observer is 80% sure it was green.

Correlation Does Not Imply Causation

Examples: https://tylervigen.com/spurious-correlations
https://spice-spotlight.scot/2019/07/08/searching-for-causes-in-the-blue-water-schools/

Bradford Hill criteria (criteria for assessing evidence of causation)



Cognitive Bias
Tip: Feelings influence risk perception

“Broadly, risk can be thought of as the likelihood of an event occurring in some time frame, multiplied by the consequences should the event occur. People’s risk perception is influenced disproportionately by many things, including the rarity of the event, how much control they believe they have, the adverseness of the outcomes, and whether the risk is taken voluntarily or not.”

Source: https://www.visualcapitalist.com/18-cognitive-bias-examples-mental-mistakes/

Extreme Measurements may Mislead


That extreme measurements exist is a logical conclusion from random variation, regression to the mean, bias and measurement error.

Data can be Dredged or Cherry Picked

Source: IMDB & Wikipedia



Risky Extrapolation
Tip: Extrapolating beyond the data is risky

Summary

Reviewed the 20 Tips from Sutherland’s paper from the standpoint of study design, statistical insights and human factors.

Introduced some related material.

Drew some links between the common themes that underlie the 20 Tips.



Extra Slides Follow


Bias is Rife


Potential for Bias in RCT


Confounding example (potential for bias in an RCT): age (a confounder) influences both activity level (the independent variable) and weight gain (the dependent variable), creating an association between them.
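A hypothetical simulation of this diagram: age drives both activity level and weight gain, so the two are strongly correlated even though, in this toy model, activity has no causal effect on weight gain at all.

```python
import random
import statistics
random.seed(0)

n = 5000
age = [random.uniform(20, 70) for _ in range(n)]
# Older people are less active and gain more weight; activity itself
# has NO causal effect on weight gain in this toy model.
activity = [100 - a + random.gauss(0, 10) for a in age]
weight_gain = [0.1 * a + random.gauss(0, 2) for a in age]

# statistics.correlation requires Python 3.10+.
print("correlation(activity, weight gain):",
      round(statistics.correlation(activity, weight_gain), 2))  # strongly negative
```

Randomising the 'treatment' (here, activity) would break the arrow from age, which is exactly why randomisation minimises this kind of bias.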

Questions?

Upcoming Research Skills Seminars:
1 Nov: Grant Applications and Finding Funding, A/Prof Sue Skull
15 Nov: Qualitative Research Methods, Dr Shirley McGough
*Full 2019 seminar schedule in back of handouts

Please give us feedback! A survey is included in the back of your handout, or complete it online via:
www.surveymonkey.com/r/stattips2019

ResearchEducationProgram.org | [email protected]

2 ARTICLE: TWENTY TIPS FOR INTERPRETING SCIENTIFIC CLAIMS, SUTHERLAND, SPIEGELHALTER & BURGMAN, 2013

Twenty tips for interpreting scientific claims

This list will help non-scientists to interrogate advisers and to grasp the limitations of evidence, say William J. Sutherland, David Spiegelhalter and Mark A. Burgman.

Nature 503, 335–337 (21 November 2013). © 2013 Macmillan Publishers Limited.

Calls for the closer integration of science in political decision-making have been commonplace for decades. However, there are serious problems in the application of science to policy — from energy to health and environment to education.

One suggestion to improve matters is to encourage more scientists to get involved in politics. Although laudable, it is unrealistic to expect substantially increased political involvement from scientists. Another proposal is to expand the role of chief scientific advisers[1], increasing their number, availability and participation in political processes. Neither approach deals with the core problem of scientific ignorance among many who vote in parliaments.

Perhaps we could teach science to politicians? It is an attractive idea, but which busy politician has sufficient time? In practice, policy-makers almost never read scientific papers or books. The research relevant to the topic of the day — for example, mitochondrial replacement, bovine tuberculosis or nuclear-waste disposal — is interpreted for them by advisers or external advocates. And there is rarely, if ever, a beautifully designed double-blind, randomized, replicated, controlled experiment with a large sample size and unambiguous conclusion that tackles the exact policy issue.

In this context, we suggest that the immediate priority is to improve policy-makers' understanding of the imperfect nature of science. The essential skills are to be able to intelligently interrogate experts and advisers, and to understand the quality, limitations and biases of evidence. We term these interpretive scientific skills. These skills are more accessible than those required to understand the fundamental science itself, and can form part of the broad skill set of most politicians.

To this end, we suggest 20 concepts that should be part of the education of civil servants, politicians, policy advisers and journalists — and anyone else who may have to interact with science or scientists. Politicians with a healthy scepticism of scientific advocates might simply prefer to arm themselves with this critical set of knowledge.

We are not so naive as to believe that improved policy decisions will automatically follow. We are fully aware that scientific judgement itself is value-laden, and that bias and context are integral to how data are collected and interpreted. What we offer is a simple list of ideas that could help decision-makers to parse how evidence can contribute to a decision, and potentially to avoid undue influence by those with vested interests. The harder part — the social acceptability of different policies — remains in the hands of politicians and the broader political process.

Of course, others will have slightly different lists. Our point is that a wider understanding of these 20 concepts by society would be a marked step forward.

[Illustration: Dawid Ryski]

Differences and chance cause variation. The real world varies unpredictably. Science is mostly about discovering what causes the patterns we see. Why is it hotter this decade than last? Why are there more birds in some areas than others? There are many explanations for such trends, so the main challenge of research is teasing apart the importance of the process of interest (for example, the effect of climate change on bird populations) from the innumerable other sources of variation (from widespread changes, such as agricultural intensification and spread of invasive species, to local-scale processes, such as the chance events that determine births and deaths).

No measurement is exact. Practically all measurements have some error. If the measurement process were repeated, one might record a different result. In some cases, the measurement error might be large compared with real differences. Thus, if you are told that the economy grew by 0.13% last month, there is a moderate chance that it may actually have shrunk. Results should be presented with a precision that is appropriate for the associated error, to avoid implying an unjustified degree of accuracy.
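The economy example can be made concrete. Suppose, purely for illustration, that reported monthly growth carries a normally distributed measurement error with a standard deviation of 0.2 percentage points; then a reported +0.13% leaves a sizeable probability that true growth was negative.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Normal cumulative distribution function via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

reported = 0.13   # reported growth, %
error_sd = 0.20   # assumed measurement-error SD, % (illustrative)

# Probability the true value is below zero, treating the truth as
# normally distributed around the reported figure.
p_shrank = normal_cdf(0.0, mu=reported, sigma=error_sd)
print(f"P(economy actually shrank) = {p_shrank:.2f}")  # about 0.26
```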

Bias is rife. Experimental design or measuring devices may produce atypical results in a given direction. For example, determining voting behaviour by asking people on the street, at home or through the Internet will sample different proportions of the population, and all may give different results. Because studies that report 'statistically significant' results are more likely to be written up and published, the scientific literature tends to give an exaggerated picture of the magnitude of problems or the effectiveness of solutions. An experiment might be biased by expectations: participants provided with a treatment might assume that they will experience a difference and so might behave differently or report an effect. Researchers collecting the results can be influenced by knowing who received treatment. The ideal experiment is double-blind: neither the participants nor those collecting the data know who received what. This might be straightforward in drug trials, but it is impossible for many social studies. Confirmation bias arises when scientists find evidence for a favoured theory and then become insufficiently critical of their own results, or cease searching for contrary evidence.

Bigger is usually better for sample size. The average taken from a large number of observations will usually be more informative than the average taken from a smaller number of observations. That is, as we accumulate evidence, our knowledge improves. This is especially important when studies are clouded by substantial amounts of natural variation and measurement error. Thus, the effectiveness of a drug treatment will vary naturally between subjects. Its average efficacy can be more reliably and accurately estimated from a trial with tens of thousands of participants than from one with hundreds.

Correlation does not imply causation. It is tempting to assume that one pattern causes another. However, the correlation might be coincidental, or it might be a result of both patterns being caused by a third factor — a 'confounding' or 'lurking' variable. For example, ecologists at one time believed that poisonous algae were killing fish in estuaries; it turned out that the algae grew where fish died. The algae did not cause the deaths[2].

Regression to the mean can mislead. Extreme patterns in data are likely to be, at least in part, anomalies attributable to chance or error. The next count is likely to be less extreme. For example, if speed cameras are placed where there has been a spate of accidents, any reduction in the accident rate cannot be attributed to the camera; a reduction would probably have happened anyway.
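A sketch of the speed-camera effect with made-up numbers: sites are selected because of an extreme accident count in year 1; even with no intervention at all, their year-2 counts tend to fall.

```python
import random
random.seed(7)

n_sites = 10_000
true_rate = [random.uniform(2, 8) for _ in range(n_sites)]  # underlying accident rates

def observe(rate):
    """One year's accident count: the true rate plus chance variation."""
    return rate + random.gauss(0, 2)

year1 = [observe(r) for r in true_rate]
year2 = [observe(r) for r in true_rate]  # nothing changed between years

# Pick the 'worst' 5% of sites on year-1 counts, as a camera scheme might.
cutoff = sorted(year1)[int(0.95 * n_sites)]
selected = [i for i in range(n_sites) if year1[i] >= cutoff]

avg1 = sum(year1[i] for i in selected) / len(selected)
avg2 = sum(year2[i] for i in selected) / len(selected)
print(f"selected sites: year 1 avg = {avg1:.2f}, year 2 avg = {avg2:.2f}")
# Year 2 is lower purely through regression to the mean.
```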

Extrapolating beyond the data is risky. Patterns found within a given range do not necessarily apply outside that range. Thus, it is very difficult to predict the response of ecological systems to climate change, when the rate of change is faster than has been experienced in the evolutionary history of existing species, and when the weather extremes may be entirely new.

Beware the base-rate fallacy. The ability of an imperfect test to identify a condition depends upon the likelihood of that condition occurring (the base rate). For example, a person might have a blood test that is '99% accurate' for a rare disease and test positive, yet they might be unlikely to have the disease. If 10,001 people have the test, of whom just one has the disease, that person will almost certainly have a positive test, but so too will a further 100 people (1%) even though they do not have the disease. This type of calculation is valuable when considering any screening procedure, say for terrorists at airports.
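The article's worked example translates directly into Bayes' rule. The sketch below reproduces its numbers, assuming the test is 99% accurate in both directions (sensitivity and specificity):

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(disease | positive test) via Bayes' rule."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

ppv = positive_predictive_value(prevalence=1 / 10_001,
                                sensitivity=0.99,
                                specificity=0.99)
print(f"P(disease | positive) = {ppv:.3f}")  # about 0.010: roughly 1 in 100
```

Despite the '99% accurate' test, a positive result implies only about a 1% chance of disease, because the false positives from the 10,000 healthy people swamp the single true positive.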

Controls are important. A control group is dealt with in exactly the same way as the experimental group, except that the treatment is not applied. Without a control, it is difficult to determine whether a given treatment really had an effect. The control helps researchers to be reasonably sure that there are no confounding variables affecting the results. Sometimes people in trials report positive outcomes because of the context or the person providing the treatment, or even the colour of a tablet[3]. This underlies the importance of comparing outcomes with a control, such as a tablet without the active ingredient (a placebo).

[Photo caption: Science and policy have collided on contentious issues such as bee declines, nuclear power and the role of badgers in bovine tuberculosis. Credits: badger, Andy Rouse/Nature Picture Library; nuclear plant, Michael Kohaupt/Flickr/Getty; bee, Michael Durham/Minden/FLPA.]

Randomization avoids bias. Experiments should, wherever possible, allocate individuals or groups to interventions randomly. Comparing the educational achievement of children whose parents adopt a health programme with that of children of parents who do not is likely to suffer from bias (for example, better-educated families might be more likely to join the programme). A well-designed experiment would randomly select some parents to receive the programme while others do not.
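A minimal sketch of randomised allocation, using hypothetical participant IDs:

```python
import random
random.seed(2019)

participants = [f"P{i:03d}" for i in range(1, 21)]  # hypothetical IDs
shuffled = participants[:]
random.shuffle(shuffled)

# Split the shuffled list in half: chance alone decides who gets what.
half = len(shuffled) // 2
treatment, control = shuffled[:half], shuffled[half:]
print("treatment:", treatment)
print("control:  ", control)
```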

Seek replication, not pseudoreplication. Results consistent across many studies, replicated on independent populations, are more likely to be solid. The results of several such experiments may be combined in a systematic review or a meta-analysis to provide an overarching view of the topic with potentially much greater statistical power than any of the individual studies. Applying an intervention to several individuals in a group, say to a class of children, might be misleading because the children will have many features in common other than the intervention. The researchers might make the mistake of 'pseudoreplication' if they generalize from these children to a wider population that does not share the same commonalities. Pseudoreplication leads to unwarranted faith in the results. Pseudoreplication of studies on the abundance of cod in the Grand Banks in Newfoundland, Canada, for example, contributed to the collapse of what was once the largest cod fishery in the world[4].

Scientists are human. Scientists have a vested interest in promoting their work, often for status and further research funding, although sometimes for direct financial gain. This can lead to selective reporting of results and occasionally, exaggeration. Peer review is not infallible: journal editors might favour positive findings and newsworthiness. Multiple, independent sources of evidence and replication are much more convincing.

Significance is significant. Expressed as P, statistical significance is a measure of how likely a result is to occur by chance. Thus P = 0.01 means there is a 1-in-100 probability that what looks like an effect of the treatment could have occurred randomly, and in truth there was no effect at all. Typically, scientists report results as significant when the P-value of the test is less than 0.05 (1 in 20).

Separate no effect from non-significance. The lack of a statistically significant result (say a P-value > 0.05) does not mean that there was no underlying effect: it means that no effect was detected. A small study may not have the power to detect a real difference. For example, tests of cotton and potato crops that were genetically modified to produce a toxin to protect them from damaging insects suggested that there were no adverse effects on beneficial insects such as pollinators. Yet none of the experiments had large enough sample sizes to detect impacts on beneficial species had there been any[5].
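Non-significance from an underpowered study can be simulated. In the sketch below (illustrative numbers; a known-variance z-test is used to keep the code short), a real effect exists, yet a small trial usually fails to reach p < 0.05 while a larger one usually detects it.

```python
import math
import random
random.seed(3)

def detects_effect(n, effect=0.3, sd=1.0, sims=2000):
    """Fraction of simulated two-group trials whose z-test gives p < 0.05,
    when a real effect of the given size truly exists."""
    crit = 1.96  # two-sided 5% critical value for a known-variance z-test
    hits = 0
    for _ in range(sims):
        a = [random.gauss(0.0, sd) for _ in range(n)]
        b = [random.gauss(effect, sd) for _ in range(n)]
        diff = sum(b) / n - sum(a) / n
        z = diff / (sd * math.sqrt(2 / n))
        if abs(z) > crit:
            hits += 1
    return hits / sims

print("power with n=20 per arm:  ", detects_effect(20))   # often misses the effect
print("power with n=200 per arm: ", detects_effect(200))  # usually detects it
```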

Effect size matters. Small responses are less likely to be detected. A study with many replicates might result in a statistically significant result but have a small effect size (and so, perhaps, be unimportant). The importance of an effect size is a biological, physical or social question, and not a statistical one. In the 1990s, the editor of the US journal Epidemiology asked authors to stop using statistical significance in submitted manuscripts because authors were routinely misinterpreting the meaning of significance tests, resulting in ineffective or misguided recommendations for public-health policy[6].

Study relevance limits generalizations. The relevance of a study depends on how much the conditions under which it is done resemble the conditions of the issue under consideration. For example, there are limits to the generalizations that one can make from animal or laboratory experiments to humans.

Feelings influence risk perception. Broadly, risk can be thought of as the likelihood of an event occurring in some time frame, multiplied by the consequences should the event occur. People's risk perception is influenced disproportionately by many things, including the rarity of the event, how much control they believe they have, the adverseness of the outcomes, and whether the risk is taken voluntarily or not. For example, people in the United States underestimate the risks associated with having a handgun at home by 100-fold, and overestimate the risks of living close to a nuclear reactor by 10-fold[7].

Dependencies change the risks. It is possible to calculate the consequences of individual events, such as an extreme tide, heavy rainfall and key workers being absent. However, if the events are interrelated (for example, a storm causes a high tide, or heavy rain prevents workers from accessing the site) then the probability of their co-occurrence is much higher than might be expected[8]. The assurance by credit-rating agencies that groups of subprime mortgages had an exceedingly low risk of defaulting together was a major element in the 2008 collapse of the credit markets.
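A toy illustration with made-up probabilities: if a storm makes both a high tide and worker absence more likely, the joint probability far exceeds what the independence assumption predicts, even though each event on its own stays about as rare.

```python
import random
random.seed(11)

sims = 100_000
both_independent = 0
both_dependent = 0
for _ in range(sims):
    # Independent model: each event has probability 0.05 on any day.
    if random.random() < 0.05 and random.random() < 0.05:
        both_independent += 1
    # Dependent model: a storm (p=0.05) makes each event likely (p=0.8);
    # otherwise each event stays rare (p=0.01). Marginals remain ~0.05.
    storm = random.random() < 0.05
    p = 0.8 if storm else 0.01
    if random.random() < p and random.random() < p:
        both_dependent += 1

print("P(both) if independent: ", both_independent / sims)  # about 0.0025
print("P(both) if storm-linked:", both_dependent / sims)    # about 0.032
```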

Data can be dredged or cherry picked. Evidence can be arranged to support one point of view. To interpret an apparent association between consumption of yoghurt during pregnancy and subsequent asthma in offspring[9], one would need to know whether the authors set out to test this sole hypothesis, or happened across this finding in a huge data set. By contrast, the evidence for the Higgs boson specifically accounted for how hard researchers had to look for it — the 'look-elsewhere effect'. The question to ask is: 'What am I not being told?'
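Data dredging is easy to quantify: test enough true null hypotheses and 'significant' findings appear by chance. A short sketch at the 0.05 level:

```python
# Probability of at least one p < 0.05 among k independent tests
# when every null hypothesis is true.
for k in (1, 5, 20, 100):
    p_any = 1 - (1 - 0.05) ** k
    print(f"{k:3d} tests: P(at least one 'significant') = {p_any:.2f}")
# With 20 tests it is about 0.64: dredging a big data set almost
# guarantees something that looks like a finding.
```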

Extreme measurements may mislead. Any collation of measures (the effectiveness of a given school, say) will show variability owing to differences in innate ability (teacher competence), plus sampling (children might by chance be an atypical sample with complications), plus bias (the school might be in an area where people are unusually unhealthy), plus measurement error (outcomes might be measured in different ways for different schools). However, the resulting variation is typically interpreted only as differences in innate ability, ignoring the other sources. This becomes problematic with statements describing an extreme outcome ('the pass rate doubled') or comparing the magnitude of the extreme with the mean ('the pass rate in school x is three times the national average') or the range ('there is an x-fold difference between the highest- and lowest-performing schools'). League tables, in particular, are rarely reliable summaries of performance.
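The league-table point can be simulated with hypothetical numbers: give many schools the same underlying pass rate and small cohorts, and the 'top' and 'bottom' schools still differ dramatically, purely through sampling variation.

```python
import random
random.seed(5)

n_schools, pupils = 200, 30
true_rate = 0.7  # every school has the SAME underlying pass rate

# Observed pass rate per school: binomial sampling noise only.
observed = [sum(random.random() < true_rate for _ in range(pupils)) / pupils
            for _ in range(n_schools)]

print(f"best school:  {max(observed):.2f}")
print(f"worst school: {min(observed):.2f}")
# A league table would rank these schools as very different,
# yet every difference here is pure chance.
```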

William J. Sutherland is professor of conservation biology in the Department of Zoology, University of Cambridge, UK. David Spiegelhalter is at the Centre for Mathematical Sciences, University of Cambridge. Mark Burgman is at the Centre of Excellence for Biosecurity Risk Analysis, School of Botany, University of Melbourne, Parkville, Australia.
e-mail: [email protected]

1. Doubleday, R. & Wilsdon, J. Nature 485, 301–302 (2012).
2. Borsuk, M. E., Stow, C. A. & Reckhow, K. H. J. Water Res. Plan. Manage. 129, 271–282 (2003).
3. Huskisson, E. C. Br. Med. J. 4, 196–200 (1974).
4. Millar, R. B. & Anderson, M. J. Fish. Res. 70, 397–407 (2004).
5. Marvier, M. Ecol. Appl. 12, 1119–1124 (2002).
6. Fidler, F., Cumming, G., Burgman, M. & Thomason, N. J. Socio-Economics 33, 615–630 (2004).
7. Fischhoff, B., Slovic, P. & Lichtenstein, S. Am. Stat. 36, 240–255 (1982).
8. Billinton, R. & Allan, R. N. Reliability Evaluation of Power Systems (Plenum, 1984).
9. Maslova, E., Halldorsson, T. I., Strøm, M. & Olsen, S. F. J. Nutr. Sci. 1, e5 (2012).



3 ADDITIONAL RESOURCES – STATISTICAL TIPS FOR INTERPRETING SCIENTIFIC CLAIMS

3.1 Statistical Bias

Statistical Bias Types explained

https://data36.com/statistical-bias-types-explained/

What is Bias in Statistics?

https://www.statisticshowto.datasciencecentral.com/what-is-bias/

PennState Eberly College of Science, STAT 509: Clinical Biases

https://newonlinecourses.science.psu.edu/stat509/node/28/

https://en.wikipedia.org/wiki/Bias_(statistics)

The Cochrane Collaboration's tool for assessing risk of bias in randomised trials

https://www.bmj.com/content/343/bmj.d5928

Revised Cochrane risk-of-bias tool for randomised trials (RoB 2)

https://methods.cochrane.org/bias/resources/rob-2-revised-cochrane-risk-bias-tool-randomized-trials

3.2 Correlations

Spurious Correlations

https://tylervigen.com/spurious-correlations

Causal Diagrams: Draw your assumptions before your conclusions

https://online-learning.harvard.edu/course/causal-diagrams-draw-your-assumptions-your-conclusions

https://spice-spotlight.scot/2019/07/08/searching-for-causes-in-the-blue-water-schools/


3.3 Cognitive Bias

18 Cognitive Bias examples show why mental mistakes get made https://www.visualcapitalist.com/18-cognitive-bias-examples-mental-mistakes/

3.4 References

• Sutherland, W.J., Spiegelhalter, D. and Burgman, M.A. (2013). Twenty tips for interpreting scientific claims. Nature 503, 335–337.

• Kristine R. Broglio, Jason T. Connor & Scott M. Berry (2014) Not Too Big, Not Too Small: A Goldilocks Approach To Sample Size Selection, Journal of Biopharmaceutical Statistics, 24:3, 685-705, DOI: 10.1080/10543406.2014.888569

• van der Bles, A.M., van der Linden, S., Freeman, A.L.J., Mitchell, J., Galvao, A.B., Zaval, L. and Spiegelhalter, D.J. Communicating uncertainty about facts, numbers and science. Royal Society Open Science 6: 181870. http://doi.org/10.1098/rsos.181870

• Gigerenzer G. We need statistical thinking, not statistical rituals. Behavioral and Brain Sciences. 1998;21(2):199-200

• Codling, E., Plank, M., & Benhamou, S. (2008). Random walk models in biology. Journal of the Royal Society, Interface, 5(25), 813–834. https://doi.org/10.1098/rsif.2008.0014

• Dixon P. The p-value fallacy and how to avoid it. Canadian Journal of Experimental Psychology/Revue Canadienne de Psychologie Experimentale. 2003;57(3):189-202

• Ronald L. Wasserstein & Nicole A. Lazar (2016) The ASA Statement on p-Values: Context, Process, and Purpose, The American Statistician, 70:2, 129-133, DOI: 10.1080/00031305.2016.1154108

• Nuzzo, R. (2014), “Scientific Method: Statistical Errors,” Nature, 506, 150–152. Available at http://www.nature.com/news/scientific-method-statistical-errors-1.14700.

• Sackett, David L. "Bias in Analytic Research." Journal of Chronic Diseases, 32(1-2), 51–63.
• Pannucci, C. J., & Wilkins, E. G. (2010). Identifying and avoiding bias in research. Plastic and Reconstructive Surgery, 126(2), 619–625. doi:10.1097/PRS.0b013e3181de24bc
• https://www.understandinghealthresearch.org/
• https://training.cochrane.org/handbook/current
• https://sites.google.com/site/riskofbiastool/welcome/rob-2-0-tool/current-version-of-rob-2
• Barnett, A. G., van der Pols, J. C., & Dobson, A. J. (2005). Regression to the mean: what it is and how to deal with it. International Journal of Epidemiology, 34(1), 215–220. https://doi.org/10.1093/ije/dyh299

• Samuels, M. (1991). Statistical Reversion Toward the Mean: More Universal Than Regression Toward the Mean. The American Statistician, 45(4), 344-346. doi:10.2307/2684474

• Kahneman, D., & Tversky, A. (1972). Subjective probability: A judgment of representativeness. Cognitive Psychology, 3(3), 430–454. https://doi.org/10.1016/0010-0285(72)90016-3

• Bar-Hillel, M. (1980). The base-rate fallacy in probability judgments. Acta Psychologica, 44(3), 211–233. https://doi.org/10.1016/0001-6918(80)90046-3

• Rothwell, P. (2005). External validity of randomised controlled trials: “To whom do the results of this trial apply?” The Lancet., 365(9453), 82–93. https://doi.org/10.1016/S0140-6736(04)17670-8

• Kidholm, K., Gerke, O., Vondeling, H., & Dyrvig, A. (2014). Checklists for external validity: a systematic review. Journal of Evaluation in Clinical Practice., 20(6), 857–864. https://doi.org/10.1111/jep.12166


Research Skills Seminar Series 2019 | CAHS Research Education Program

ResearchEducationProgram.org | [email protected]

Research Skills Seminar Series | CAHS Research Education Program Department of Child Health Research | Child and Adolescent Health Service