
Dr Arul Earnest PhD, MSc, DLSHTM
Associate Professor
Department of Epidemiology & Preventive Medicine
School of Public Health & Preventive Medicine
Faculty of Medicine, Nursing & Health Sciences
Monash University
Feb 2015

How not to analyse and present data

Study design

Study power

Random sampling

Statistical assumptions

Mis-use of statistics

Examples (loads!)

Avoiding spins

Presenting data

Read the fine print ….. “children born after unplanned pregnancies tend to have a more limited vocabulary and poorer non-verbal and spatial abilities; however this is almost entirely explained by their disadvantaged circumstances”

http://www.theguardian.com/commentisfree/2011/aug/05/bad-science-adjusting-figures

1. RCTs: randomisation, control group, blinding
2. Cohort: prospective, rare exposures
3. Case-control: relatively faster and cheaper, rare outcomes
4. Cross-sectional: one time point, cheapest and most convenient
5. Before-after design: common in hospital intervention settings

• Looking for “absence of evidence” is different from “evidence of absence”
• Employing incorrect or hybrid designs, e.g. prospective case-control study
• Choosing the correct study design, but cutting some corners (e.g. selecting controls from a different population)
• Describing the study design incorrectly

Researchers wanted to examine the relationship between smoking and lung cancer (outcome). Case-control?

Smokers vs non-smokers
1. Recruit 50 in each group
2. Follow up over time
3. Determine the proportion with lung cancer (vs no lung cancer) in each group

Another case-control study...

Select 100 subjects in Waverley hospital with hypertension, and a further 100 patients without hypertension from the community in Glen Waverley. Examine their dietary habits

Hypertension vs no hypertension
1. Recruit 100 subjects in each group
2. Examine diet habits (poor diet vs good diet) retrospectively

Risk of cardiovascular events associated with selective COX-2 inhibitors. Mukherjee D, Nissen S, Topol E. JAMA 2001; 286:954-959

Suggested that use of Celebrex was associated with an increase in the occurrence of heart attacks

However, treatment arm of one RCT was compared against placebo arms of different trials!

• These are typically known as under-powered studies
• Too few patients are recruited, so the study may not be able to demonstrate a statistically significant result even when a real effect exists
• Low power implies a high type 2 error rate
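The link between sample size, power and type 2 error can be sketched with the normal approximation for comparing two proportions. The function name `power_two_proportions`, the 80% vs 60% event rates and the group sizes are illustrative assumptions, not figures from any trial cited here.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions.

    Assumes equal group sizes and the normal approximation, which is
    rough for very small samples.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_group)
    se_alt = sqrt(p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group)
    # Probability of rejecting in the direction of the true difference
    return z.cdf((abs(p1 - p2) - z_crit * se_null) / se_alt)


for n in (10, 50, 100, 200):
    power = power_two_proportions(0.80, 0.60, n)
    print(f"n per group = {n:>3}: power = {power:.2f}, type 2 error = {1 - power:.2f}")
```

Under these assumptions, a study with 10 patients per group would miss a true 20-point difference most of the time, which is what "under-powered" means in practice.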

Keen HI, Pile K, Hill CL. The prevalence of underpowered randomized clinical trials in rheumatology. J Rheumatol. 2005 Nov;32(11):2083-8.

A) 5%
B) 20%
C) 50%

| Comparing clinical outcomes across hospitals | Hospital A | Hospital B | p-value |
|---|---|---|---|
| Healthcare-associated Staphylococcus aureus bloodstream infections (rate per 100,000 bed-days) | 1.01 | 1.03 | <0.001 |
| Mean breast cancer surgery waiting times | 12.5 | 12.7 | <0.001 |
| Mean lung cancer surgery waiting times | 10.5 | 10.6 | <0.001 |
| Emergency department time (% leaving within 4 hrs) | 85% | 84% | <0.001 |

P-values seem to indicate that hospital B is performing worse than hospital A!

But, are the differences clinically meaningful???

| Hospital A | Hospital B | p-value | Statistical significance | Clinical significance |
|---|---|---|---|---|
| 80% | 30% | <0.001 | Yes | Yes |
| 80% | 79% | <0.001 | Yes | Unlikely |
| 80% | 60% | 0.893 | No | Yes |
| 80% | 78% | 0.932 | No | Unlikely |

Looking at nosocomial infection rates

| Hospital A | Hospital B | Sample size | p-value (Fisher's exact test) |
|---|---|---|---|
| 80% | 60% | 20 | 0.628 |
| 80% | 60% | 200 | 0.003 |
| 80% | 60% | 2000 | <0.001 |

As n increases, the p-value decreases
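The effect of sample size on the p-value can be reproduced with a small hand-written Fisher's exact test. This is a sketch under one assumption: the sample size on the slide is taken to be the total, split equally between the two hospitals. The function name `fisher_exact_two_sided` is illustrative.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c

    def weight(k):
        # Unnormalised hypergeometric weight of a table with top-left cell k
        return comb(row1, k) * comb(row2, col1 - k)

    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    observed = weight(a)
    extreme = sum(weight(k) for k in range(k_min, k_max + 1) if weight(k) <= observed)
    return extreme / comb(row1 + row2, col1)


# Same infection rates (80% vs 60%); only the sample size changes
for total in (20, 200, 2000):
    n = total // 2
    infected_a, infected_b = round(0.8 * n), round(0.6 * n)
    p = fisher_exact_two_sided(infected_a, n - infected_a, infected_b, n - infected_b)
    print(f"n = {total:>4}: p = {p:.3g}")
```

The observed difference is identical in every row; only the amount of data changes, so a p-value on its own says little about how large or important a difference is.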

The idea is to obtain a sample that is representative of the population.
• Simple random sampling
• Stratified random sampling
• ‘Man on the street’ surveys are bad
• Similarly, non-probability sampling should be avoided
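The two probability sampling schemes named above can be sketched as follows. The sampling frame is hypothetical: the role names, the stratum sizes and the "other" category are assumptions made for illustration, as are the function names.

```python
import random

random.seed(2015)  # fixed seed so the sketch is reproducible

# Hypothetical sampling frame: one (staff_id, role) record per staff member
strata_sizes = {"doctor": 600, "nurse": 4000, "allied health": 500, "other": 900}
frame = [(f"{role}-{i}", role) for role, size in strata_sizes.items() for i in range(size)]


def simple_random_sample(frame, n):
    """Every staff member has the same chance of selection."""
    return random.sample(frame, n)


def stratified_random_sample(frame, n):
    """Sample within each role in proportion to its share of the frame."""
    by_role = {}
    for record in frame:
        by_role.setdefault(record[1], []).append(record)
    sample = []
    for members in by_role.values():
        # Proportional allocation; rounding can shift the total slightly
        k = round(n * len(members) / len(frame))
        sample.extend(random.sample(members, k))
    return sample


def role_counts(sample):
    """Count how many sampled records fall in each role."""
    counts = {}
    for _, role in sample:
        counts[role] = counts.get(role, 0) + 1
    return counts


print("simple random:", role_counts(simple_random_sample(frame, 200)))
print("stratified:   ", role_counts(stratified_random_sample(frame, 200)))
```

Simple random sampling gives role counts that vary from draw to draw, whereas proportional stratified sampling fixes them in advance, which helps when a small group such as doctors must not be missed.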

A hospital staff satisfaction survey indicates that only 60% are satisfied with their job. Should the board be concerned about retention of doctors, for example?

| Demographic profile of hospital staff | Overall hospital (n=6000) | Sample (n=200) |
|---|---|---|
| Number of females (%) | 2500 (42%) | 140 (70%) |
| Number of doctors (%) | 600 (10%) | 5 (3%) |
| Number of nurses (%) | 4000 (67%) | 100 (50%) |
| Number of allied health professionals (%) | 500 (8%) | 75 (37.5%) |
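One quick check of representativeness is to compare each group's share of the sample with its share of the hospital. The sketch below uses the counts from the table; the function name `representation_ratio` is an illustrative assumption.

```python
POPULATION_N, SAMPLE_N = 6000, 200

# (count in hospital, count in sample), taken from the table above
groups = {
    "females": (2500, 140),
    "doctors": (600, 5),
    "nurses": (4000, 100),
    "allied health professionals": (500, 75),
}


def representation_ratio(pop_count, sample_count):
    """Sample share divided by population share (1.0 = proportionate)."""
    return (sample_count / SAMPLE_N) / (pop_count / POPULATION_N)


for name, (pop_count, sample_count) in groups.items():
    ratio = representation_ratio(pop_count, sample_count)
    print(
        f"{name:<28} hospital {pop_count / POPULATION_N:6.1%}  "
        f"sample {sample_count / SAMPLE_N:6.1%}  ratio {ratio:.2f}"
    )
```

Doctors make up 10% of staff but only 5 of the 200 respondents, so the 60% satisfaction figure says very little about doctors specifically.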

• Using parametric methods for small sample data
• Dependencies in the data are ignored
• Can lead to misleading conclusions

Let’s look at the following example comparing mean cholesterol between males and females using the independent-samples Student's t-test

Can you remember what the assumptions are?

The famous bell-shaped curve

Source: http://commons.wikimedia.org/wiki/File:The_Dome_Church_at_Les_Invalides_-_July_2006-3.jpg

Use & Mis-use of Statistics

The use of statistics in medical research, especially in publications, is widespread and increasingly popular.

However, the use of incorrect or inappropriate statistical methods is also not uncommon.

For example, a review by Schor and Karten (1966) of 295 papers published in 10 medical journals found that 28% of the papers were statistically acceptable, 68% were deficient and 5% were ‘unsalvageable’.


Mean vs Median

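| Patient | Asthma | Patient | COPD |
|---|---|---|---|
| 1 | 4 | 11 | 6 |
| 2 | 5 | 12 | 7 |
| 3 | 6 | 13 | 5 |
| 4 | 6 | 14 | 8 |
| 5 | 7 | 15 | 7 |
| 6 | 7 | 16 | 9 |
| 7 | 8 | 17 | 6 |
| 8 | 9 | 18 | 7 |
| 9 | 10 | 19 | 5 |
| 10 | 33 | 20 | 8 |
| Mean | 9.5 | Mean | 6.8 |
| Median | 7 | Median | 7 |

Table 1. Comparison of average length of stay (in days) among patients with asthma and COPD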


If we look at the mean length of stay, it would appear that asthma patients, on average, are staying longer than COPD patients (9.5 vs 6.8 days). However, on closer inspection, patient 10 stayed for a substantially longer period (33 days), and this has artificially inflated the mean length of stay.

The median would be a better indicator of average length of stay than the mean, as it is not influenced by outliers and works well with data that are not normally distributed. In fact, if we were to use the median, we would conclude that the length of stay among patients with asthma and COPD is similar (7 vs 7 days respectively).

One indication that the data are not normally distributed is when the mean and median differ, or when the standard deviation is large (8.4 for asthma versus 1.3 for COPD!)
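The figures quoted for Table 1 can be checked with Python's standard `statistics` module. The variable names are illustrative, and the last line drops the 33-day stay to show how much a single patient drives the mean.

```python
from statistics import mean, median, stdev

# Length of stay in days, from Table 1
asthma = [4, 5, 6, 6, 7, 7, 8, 9, 10, 33]
copd = [6, 7, 5, 8, 7, 9, 6, 7, 5, 8]

for label, stays in (("Asthma", asthma), ("COPD", copd)):
    print(
        f"{label}: mean = {mean(stays):.1f}, "
        f"median = {median(stays):.1f}, SD = {stdev(stays):.1f}"
    )

# Removing the single 33-day stay shows how strongly it drives the mean
asthma_without_outlier = [days for days in asthma if days != 33]
print(f"Asthma without patient 10: mean = {mean(asthma_without_outlier):.1f}")
```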


Correlation versus agreement

Substantially high correlation, yet a 5-point (30%) difference in the rating of pain score between physicians A and B

[Figure: pain score ratings by physician A and physician B]
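A small numerical sketch shows why correlation is not agreement. The pain scores below are hypothetical, constructed so that physician B always rates 5 points higher than physician A; `pearson_r` is a hand-written helper rather than a library call.

```python
from math import sqrt
from statistics import mean

# Hypothetical pain scores: physician B always rates 5 points higher than A
physician_a = [2, 4, 5, 7, 9, 11, 12]
physician_b = [score + 5 for score in physician_a]


def pearson_r(x, y):
    """Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


differences = [b - a for a, b in zip(physician_a, physician_b)]
print(f"Pearson r = {pearson_r(physician_a, physician_b):.2f}")
print(f"Mean difference (B - A) = {mean(differences):.1f} points")
```

The correlation is perfect even though the two physicians never give the same score, so agreement is better judged from the differences between paired ratings than from a correlation coefficient.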

Common Errors in Analysis

1. Using methods of analysis when the assumptions are not met

2. Analysing paired data ignoring the pairing

3. Using multiple paired comparisons instead of an analysis that considers all groups (e.g. ANOVA with Bonferroni correction)

4. Quoting confidence intervals that include impossible values

5. Using an incorrect statistical method to analyse data
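Error 3 can be made concrete by counting how quickly false positives accumulate when every pair of groups is tested separately. This is a sketch that assumes the tests are independent, which real pairwise comparisons are not exactly; the function names are illustrative.

```python
from math import comb


def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-comparison significance threshold under Bonferroni correction."""
    return alpha / n_tests


for n_groups in (3, 4, 5):
    n_pairs = comb(n_groups, 2)  # number of pairwise comparisons
    print(
        f"{n_groups} groups: {n_pairs} pairwise tests, "
        f"uncorrected familywise error = {familywise_error_rate(n_pairs):.2f}, "
        f"Bonferroni threshold = {bonferroni_threshold(n_pairs):.4f}"
    )
```

With four groups there are already six pairwise tests, so an uncorrected 5% threshold gives a far higher chance of at least one spurious finding than the nominal 5%.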

Common Errors in Presentation

1. Presenting standard errors instead of standard deviation

2. Presenting p-values as 0 or 1.

3. Presenting means and standard deviations instead of medians and ranges for skewed data


• The standard deviation (SD) describes the variability between individuals in a sample
• The standard error of the mean (SEM) describes how precisely the sample mean estimates the population mean
• The SEM is always lower than the SD (for n > 1), since SEM = SD / √n

Source: Nagele 2003
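The relationship between the SD and the SEM can be illustrated with the COPD length-of-stay data from Table 1; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Length of stay (days) for the ten COPD patients in Table 1
copd = [6, 7, 5, 8, 7, 9, 6, 7, 5, 8]

sd = stdev(copd)            # spread between individual patients
sem = sd / sqrt(len(copd))  # uncertainty of the sample mean

print(f"mean = {mean(copd):.1f}, SD = {sd:.2f}, SEM = {sem:.2f}")
```

Reporting the SEM in place of the SD makes the data look roughly three times less variable here, which is why the SD is the right summary when describing spread between patients.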


In 8 of 9 years from 2002 to 2010, the percentage of children who needed care right away for an illness, injury, or condition in the last 12 months who sometimes or never got care as soon as wanted was significantly lower for children with private insurance than for children with public insurance

National Healthcare Quality Report, 2013 . Source: http://www.ahrq.gov/research/findings/nhqrdr/nhqr13/chap5.html

Example from report...

Daycare centre and mind damage among young children?

http://www.theguardian.com/commentisfree/2011/sep/23/bad-science-ben-goldacre

This news story was based on a scientific paper by Sigman in The Biologist. It misrepresents individual studies and it cherry-picks the scientific literature, selectively referencing only the studies that support Sigman's view.

Spin strategy
• Focusing on within-group comparison and subgroup analyses in the Results section
• Interpreting a non-significant result as demonstrating a similar effect when the study was not designed to assess equivalence or noninferiority
• Using reporting strategies to highlight that the experimental treatment is beneficial, even though the results were not statistically significant

• One should also pay attention to careful interpretation of the results, and avoid incorrect conclusions or possible ‘spin’ in the results.
• In a review of parallel-group randomised controlled trial articles with a clearly identified primary outcome showing statistically nonsignificant results (Boutron et al. 2010), the authors found that 40% of the reports had spin in at least 2 of these sections in the main text.
• Spin was identified in the Results and Conclusions sections of the abstracts of 37.5% and 58.3% of the reports, respectively.

[Figure: two line charts of Drug A and Drug B over Day1 to Day5, showing the same data, one with a y-axis from 0 to 35 and one with a y-axis from 0 to 200]

Peakflow response by drug.

Changing the scales in the graph makes a huge difference in presentation!

Funnel plots looking at positive margin rates after radical prostatectomy

[Figure: two funnel plots of positive margin rate against number of operations (0 to 2500) for hospitals, one with a y-axis from 0 to 0.5 and one with a y-axis from 0.2 to 0.5]

DataVis. http://www.datavis.ca/gallery/lie-factor.php

• Understand the common mistakes people make in the design, analysis and reporting of data
• Acquire simple tips to critically evaluate the quality of reports/research
• Bad reporting is prevalent in conferences, reports, news articles and even peer-reviewed journals
• Get a biostatistician involved in your project, pronto.

1. Phillip I. Good and James W. Hardin. Common Errors in Statistics (and How to Avoid Them). John Wiley & Sons, 2003.
2. Robert Hooke. How to Tell the Liars from the Statisticians. Marcel Dekker, 1983.
3. Brian S. Everitt. Chance Rules: An Informal Guide to Probability, Risk and Statistics. Springer-Verlag, 1999.
4. Raymond Hubbard and R. Murray Lindsay. Why P Values Are Not a Useful Measure of Evidence in Statistical Significance Testing. Theory & Psychology 2008; 18: 69.
5. Say Beng Tan. Three Myths about Biostatisticians. Proceedings of Singapore Healthcare 2010; 19(1): 83.
6. P. Nagele. Misuse of standard error of the mean (SEM) when reporting variability of a sample. A critical evaluation of four anaesthesia journals. British Journal of Anaesthesia 2003; 90(4): 514-6.
