How not to analyse and present data
Dr Arul Earnest PhD, MSc, DLSHTM
Associate Professor
Department of Epidemiology & Preventive Medicine
School of Public Health & Preventive Medicine
Faculty of Medicine, Nursing & Health Sciences
Monash University
Feb 2015
Study design
Study power
Random sampling
Statistical assumptions
Mis-use of statistics
Examples (loads!)
Avoiding spins
Presenting data
Read the fine print ….. “children born after unplanned pregnancies tend to have a more limited vocabulary and poorer non-verbal and spatial abilities; however this is almost entirely explained by their disadvantaged circumstances”
http://www.theguardian.com/commentisfree/2011/aug/05/bad-science-adjusting-figures
1. RCTs - randomisation, control group, blinding
2. Cohort - prospective, rare exposures
3. Case-control - relatively faster and cheaper, rare outcomes
4. Cross-sectional - one time point, cheapest and most convenient
5. Before-after design - common in hospital intervention settings
• Looking for "absence of evidence" is different from "evidence of absence"
• Employing incorrect or hybrid designs, e.g. a prospective case-control study
• Choosing the correct study design but cutting corners (e.g. selecting controls from a different population)
• Describing the study design incorrectly
Researchers wanted to examine the relationship between smoking and lung cancer (outcome). Case-control?
(Diagram: Smokers vs Non-smokers. 1. Recruit 50 in each group. 2. Follow up over time (lung cancer / no lung cancer). 3. Determine the proportion with lung cancer in each group.)
Another case-control study...
Select 100 subjects with hypertension in Waverley hospital, and a further 100 people without hypertension from the community in Glen Waverley. Examine their dietary habits.
(Diagram: Hypertension vs No hypertension. 1. Recruit 100 subjects in each group. 2. Examine diet habits (poor diet / good diet) retrospectively.)
Risk of cardiovascular events associated with selective COX-2 inhibitors. Mukherjee D, Nissen S, Topol E. JAMA 2001; 286:954-959
Suggested that use of Celebrex was associated with an increase in the occurrence of heart attacks
However, treatment arm of one RCT was compared against placebo arms of different trials!
• These are typically known as under-powered studies
• Too few patients are recruited, so the study may not be able to demonstrate a statistically significant result even when a real effect exists
• Low power implies a high type II error rate (a high chance of missing a real effect)
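Power can be checked before a study starts. The sketch below uses the standard normal-approximation formula for comparing two independent proportions; it is an illustration only, the function names are hypothetical, and a real study should use dedicated sample-size software or a biostatistician. The 80% vs 60% figures are borrowed from the infection-rate example later in this talk.

```python
import math
from statistics import NormalDist

def power_two_proportions(p1: float, p2: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided test comparing two independent proportions.

    Normal approximation; ignores the negligible chance of rejecting in the wrong direction.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    se_null = math.sqrt(2 * p_bar * (1 - p_bar) / n_per_group)        # SE if H0 is true
    se_alt = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group)  # SE under H1
    return NormalDist().cdf((abs(p1 - p2) - z_alpha * se_null) / se_alt)

def n_per_group(p1: float, p2: float, power: float = 0.8, alpha: float = 0.05) -> int:
    """Smallest n per group giving at least the requested power (same approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return math.ceil((numerator / abs(p1 - p2)) ** 2)

# Hypothetical example: detecting 80% vs 60%
print(f"power with 10 per group: {power_two_proportions(0.8, 0.6, 10):.2f}")
print(f"n per group for 80% power: {n_per_group(0.8, 0.6)}")
```

With only 10 patients per group the power is well under 20%, so a non-significant result would say very little.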
Keen HI, Pile K, Hill CL. The prevalence of underpowered randomized clinical trials in rheumatology. J Rheumatol. 2005 Nov;32(11):2083-8.
A) 5%
B) 20%
C) 50%
Comparing clinical outcomes across hospitals
Outcome | Hospital A | Hospital B | p-value
Healthcare-associated Staphylococcus aureus bloodstream infections (rate per 100,000 bed-days) | 1.01 | 1.03 | <0.001
Mean breast cancer surgery waiting times | 12.5 | 12.7 | <0.001
Mean lung cancer surgery waiting times | 10.5 | 10.6 | <0.001
Emergency department time (% leaving within 4 hrs) | 85% | 84% | <0.001
P-values seem to indicate that hospital B is performing worse than hospital A!
But are the differences clinically meaningful?
Hospital A | Hospital B | p-value | Statistical significance | Clinical significance
80% | 30% | <0.001 | Yes | Yes
80% | 79% | <0.001 | Yes | Unlikely
80% | 60% | 0.893 | No | Yes
80% | 78% | 0.932 | No | Unlikely
Looking at nosocomial infection rates
Hospital A | Hospital B | Sample size | p-value (Fisher's exact test)
80% | 60% | 20 | 0.628
80% | 60% | 200 | 0.003
80% | 60% | 2000 | <0.001
As n increases, the p-value decreases, even though the difference between the hospitals stays the same.
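The shrinking p-value can be reproduced with a small implementation of the two-sided Fisher's exact test. This is a sketch, assuming the sample sizes in the table are totals split equally between the two hospitals (10, 100 and 1000 per hospital); under that assumption the n = 20 row gives the slide's p = 0.628. The function name is hypothetical, and in practice a library routine would normally be used.

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of every table with the same row and column totals
    that is no more likely than the observed table.
    """
    row1, col1 = a + b, a + c
    n = a + b + c + d

    # Unnormalised hypergeometric probability of the table whose top-left cell is k
    # (exact integers, so ties are compared without rounding error).
    def weight(k: int) -> int:
        return comb(col1, k) * comb(n - col1, row1 - k)

    observed = weight(a)
    k_min, k_max = max(0, row1 + col1 - n), min(row1, col1)
    extreme = 0
    for k in range(k_min, k_max + 1):
        w = weight(k)
        if w <= observed:
            extreme += w
    return extreme / comb(n, row1)

# Same 80% vs 60% split, with an equal number of patients at each hospital
for per_hospital in (10, 100, 1000):
    a = round(0.8 * per_hospital)  # Hospital A: patients with infection
    c = round(0.6 * per_hospital)  # Hospital B: patients with infection
    p = fisher_exact_two_sided(a, per_hospital - a, c, per_hospital - c)
    print(f"total n = {2 * per_hospital}: p = {p:.3g}")
```

The observed difference never changes; only the sample size does.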
The idea is to obtain a sample that is representative of the population:
• Simple random sampling
• Stratified random sampling
• 'Man on the street' surveys are unreliable
• Similarly, non-probability sampling should be avoided
A hospital staff satisfaction survey indicates that only 60% of staff are satisfied with their job. Should the board be concerned about the retention of doctors, for example?
Demographic profile of hospital staff
Characteristic | Overall hospital (n=6000) | Sample (n=200)
Number of females (%) | 2500 (42%) | 140 (70%)
Number of doctors (%) | 600 (10%) | 5 (2.5%)
Number of nurses (%) | 4000 (67%) | 100 (50%)
Number of allied health professionals (%) | 500 (8%) | 75 (37.5%)
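A stratified random sample would have fixed the share of each staff group in advance. The sketch below uses proportional allocation with largest-remainder rounding, then draws a simple random sample within each stratum. The function names are hypothetical, and the "other" group of 900 is an assumption: the table does not say who the remaining staff are, it is simply the gap between the listed groups and 6000.

```python
import random
from collections import Counter, defaultdict

def proportional_allocation(stratum_sizes: dict, n: int) -> dict:
    """Split n sample slots across strata in proportion to stratum size.

    Largest-remainder rounding makes the allocations add up to exactly n.
    """
    total = sum(stratum_sizes.values())
    alloc, remainder = {}, {}
    for name, size in stratum_sizes.items():
        alloc[name], remainder[name] = divmod(n * size, total)
    leftover = n - sum(alloc.values())
    for name in sorted(remainder, key=remainder.get, reverse=True)[:leftover]:
        alloc[name] += 1
    return alloc

def stratified_sample(population, stratum_of, n: int, seed=None) -> list:
    """Draw a simple random sample of the allocated size within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)
    alloc = proportional_allocation({k: len(v) for k, v in strata.items()}, n)
    sample = []
    for name, units in strata.items():
        sample.extend(rng.sample(units, alloc[name]))
    return sample

# Staff counts from the table; "other" (900) is an assumed remainder so groups sum to 6000.
staff = {"doctor": 600, "nurse": 4000, "allied health": 500, "other": 900}
allocation = proportional_allocation(staff, 200)
print(allocation)

population = [(role, i) for role, size in staff.items() for i in range(size)]
sample = stratified_sample(population, stratum_of=lambda unit: unit[0], n=200, seed=1)
print(Counter(role for role, _ in sample))
```

Under these assumptions a sample of 200 would include about 20 doctors, rather than the 5 who actually responded.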
• Using parametric methods for small-sample data
• Ignoring dependencies in the data
• Can lead to misleading conclusions
Let’s look at the following example comparing mean cholesterol between males and females using the independent-samples Student's t-test.
Can you remember what the assumptions are?
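For reference, Student's independent-samples t-test assumes independent observations, approximately normal data within each group, and equal population variances. The sketch below computes the pooled (Student) statistic next to Welch's version, which drops the equal-variance assumption. The cholesterol values are hypothetical, not the data from the slide, and the function names are illustrative.

```python
import math
from statistics import mean, variance

def student_t(x, y):
    """Pooled-variance (Student's) t statistic and degrees of freedom.

    Assumes independent observations, roughly normal data in each group,
    and equal population variances.
    """
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))
    return t, nx + ny - 2

def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite df (no equal-variance assumption)."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

# Hypothetical cholesterol values (mmol/L); not data from the slides.
males = [5.1, 5.4, 4.9, 6.2, 5.8, 5.0]
females = [4.8, 5.0, 5.2, 4.9, 7.9, 4.7, 5.1, 5.3]
print("Student:", student_t(males, females))
print("Welch:  ", welch_t(males, females))
```

Converting t to a p-value needs the t distribution; if SciPy is available, `scipy.stats.ttest_ind(x, y, equal_var=False)` performs the Welch test directly.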
The famous bell-shaped curve
Source: http://commons.wikimedia.org/wiki/File:The_Dome_Church_at_Les_Invalides_-_July_2006-3.jpg
Use & Mis-use of Statistics
The use of statistics in medical research, especially in publications, is widespread and increasingly popular.
However, the use of incorrect or inappropriate statistical methods is also not uncommon.
For example, a review by Schor and Karten (1966) of 295
papers published in 10 medical journals found that 28% of
the papers were statistically acceptable, 68% were deficient
and 5% were ‘unsalvageable’.
Mean vs Median
Table 1. Comparison of average length of stay (in days) among patients with asthma and COPD
Patient | Asthma | Patient | COPD
1 | 4 | 11 | 6
2 | 5 | 12 | 7
3 | 6 | 13 | 5
4 | 6 | 14 | 8
5 | 7 | 15 | 7
6 | 7 | 16 | 9
7 | 8 | 17 | 6
8 | 9 | 18 | 7
9 | 10 | 19 | 5
10 | 33 | 20 | 8
Mean | 9.5 | Mean | 6.8
Median | 7 | Median | 7
If we look at the mean length of stay, it would appear that asthma
patients, on average, are staying longer than COPD patients (9.5 vs 6.8
days). However, on closer inspection, it appears that patient 10 has
stayed for a substantially longer period (33 days) and this has artificially
inflated the mean length of stay.
The median would be a better indicator of average length of stay than
the mean, as it is not influenced by outliers and works well with data
that is not normally distributed. In fact, if we were to use the median,
we would conclude that the median length of stay among patients with
asthma and COPD is similar (7 vs 7 respectively).
One indication that the data is not normally distributed is when the
mean and median are not the same, or when the standard deviation is
large relative to the mean (8.4 for asthma versus 1.3 for COPD!)
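The summary statistics in Table 1 can be checked with Python's standard `statistics` module. The lists below are the length-of-stay values from the table; the second print line is an added illustration of how much a single outlier moves the mean compared with the median.

```python
from statistics import mean, median, stdev

# Length of stay (days) from Table 1
asthma = [4, 5, 6, 6, 7, 7, 8, 9, 10, 33]  # patients 1-10
copd = [6, 7, 5, 8, 7, 9, 6, 7, 5, 8]      # patients 11-20

for label, los in (("Asthma", asthma), ("COPD", copd)):
    print(f"{label}: mean = {mean(los):.1f}, median = {median(los):.1f}, SD = {stdev(los):.1f}")
# Asthma: mean = 9.5, median = 7.0, SD = 8.4
# COPD: mean = 6.8, median = 7.0, SD = 1.3

# Dropping the 33-day stay (patient 10) pulls the asthma mean down; the median stays at 7.
print(f"Asthma without patient 10: mean = {mean(asthma[:-1]):.1f}, median = {median(asthma[:-1]):.1f}")
```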
Correlation versus agreement
High correlation, yet a 5-point (30%) difference in pain score ratings between physicians A and B
(Figure: pain score rating, physician A vs physician B)
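Correlation measures how closely two sets of ratings follow a straight line, not whether they agree. A common way to quantify agreement is a Bland-Altman analysis: the mean difference (bias) and the limits of agreement (bias plus or minus 1.96 SD of the differences). The technique is named here as a standard choice, not necessarily the one used on the slide, and the scores are hypothetical.

```python
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson correlation: strength of the linear relationship only."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def limits_of_agreement(x, y):
    """Bland-Altman summary: bias (mean of y - x) and bias +/- 1.96 SD of the differences."""
    diffs = [b - a for a, b in zip(x, y)]
    bias, sd = mean(diffs), stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical pain scores given to the same six patients by two physicians
physician_a = [10, 12, 14, 16, 18, 20]
physician_b = [15, 17, 20, 21, 23, 25]
print(f"r = {pearson_r(physician_a, physician_b):.3f}")
print("bias and 95% limits of agreement:", limits_of_agreement(physician_a, physician_b))
```

Here r is above 0.99, yet physician B scores about 5 points higher than physician A for every patient.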
Common Errors in Analysis
1. Using methods of analysis when the assumptions are not met
2. Analysing paired data ignoring the pairing
3. Using multiple pairwise comparisons instead of an analysis that considers all groups (e.g. ANOVA with Bonferroni correction)
4. Quoting confidence intervals that include impossible values
5. Using an incorrect statistical method to analyse data
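Error 2 (ignoring the pairing) can be illustrated with before/after measurements on the same patients. The sketch computes the t statistic twice: once treating the two columns as independent groups, and once on the within-patient differences. The blood pressure values and function names are hypothetical.

```python
import math
from statistics import mean, stdev, variance

def independent_t(x, y):
    """Pooled two-sample t statistic: wrongly treats x and y as unrelated groups."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(y) - mean(x)) / math.sqrt(pooled * (1 / nx + 1 / ny))

def paired_t(before, after):
    """Paired t statistic: analyses the within-patient differences."""
    diffs = [b - a for a, b in zip(before, after)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

# Hypothetical systolic blood pressure (mmHg) for the same six patients before and after treatment
before = [120, 135, 140, 150, 160, 170]
after = [115, 131, 134, 146, 153, 166]
print(f"ignoring the pairing: t = {independent_t(before, after):.2f}")
print(f"paired analysis:      t = {paired_t(before, after):.2f}")
```

The mean change is the same in both analyses; ignoring the pairing leaves the large between-patient variation in the denominator, so a consistent fall in every patient looks like noise.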
Common Errors in Presentation
1. Presenting standard errors instead of standard deviations
2. Presenting p-values as 0 or 1 (e.g. p = 0.000 instead of p < 0.001)
3. Presenting means and standard deviations instead of medians and ranges for skewed data
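A small formatting helper avoids presenting p-values as 0 or 1. The convention below (three decimal places, with "< 0.001" and "> 0.999" at the extremes) is one common choice, assumed here for illustration; journals differ, and the function name is hypothetical.

```python
def format_p(p: float) -> str:
    """Report a p-value without ever printing it as 0 or 1.

    Assumed convention: three decimal places, '< 0.001' and '> 0.999' at the extremes.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"not a probability: {p}")
    if p < 0.001:
        return "p < 0.001"
    if p > 0.999:
        return "p > 0.999"
    return f"p = {p:.3f}"

for p in (0.0, 0.00004, 0.0432, 0.5, 1.0):
    print(format_p(p))
```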
• The standard deviation (SD) describes the variability between individuals in a sample
• The standard error of the mean (SEM) describes the uncertainty in how well the sample mean estimates the population mean
• The SEM is always smaller than the SD (SEM = SD/√n, for n > 1)
Source: Nagele 2003
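The relationship SEM = SD/√n is easy to check numerically. The sketch uses the COPD length-of-stay values from Table 1; the `sem` helper is a hypothetical name.

```python
import math
from statistics import stdev

def sem(values):
    """Standard error of the mean: SD / sqrt(n)."""
    return stdev(values) / math.sqrt(len(values))

copd = [6, 7, 5, 8, 7, 9, 6, 7, 5, 8]  # COPD length of stay (days) from Table 1
print(f"SD = {stdev(copd):.2f}, SEM = {sem(copd):.2f}")
```

The SD describes how much individual stays vary; the SEM only describes how precisely the mean is estimated, and it keeps shrinking as n grows, which is why reporting it in place of the SD makes data look less variable than they are.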
In 8 of 9 years from 2002 to 2010, the percentage of children who needed care right away for an illness, injury, or condition in the last 12 months who sometimes or never got care as soon as wanted was significantly lower for children with private insurance than for children with public insurance
National Healthcare Quality Report, 2013. Source: http://www.ahrq.gov/research/findings/nhqrdr/nhqr13/chap5.html
Example from report...
Daycare centre and mind damage among young children?
http://www.theguardian.com/commentisfree/2011/sep/23/bad-science-ben-goldacre
This news story was based on a scientific paper by Sigman in The Biologist. It misrepresents individual studies and it cherry-picks the scientific literature, selectively referencing only the studies that support Sigman's view.
Spin strategy
• Focusing on within-group comparisons and subgroup analyses in the Results section
• Interpreting a non-significant result as demonstrating a similar effect when the study was not designed to assess equivalence or non-inferiority
• Using reporting strategies to highlight that the experimental treatment is beneficial, even though the results were not statistically significant
• Results should be interpreted carefully, avoiding incorrect conclusions or possible 'spin'.
• In a review of parallel-group randomised controlled trial articles with a clearly identified primary outcome showing statistically non-significant results (Boutron et al. 2010), the authors found that 40% of the reports had spin in at least two sections of the main text.
• Spin was identified in the Results and Conclusions sections of the abstracts of 37.5% and 58.3% of the reports, respectively.
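"Not significantly different" is not the same as "equivalent". The sketch below makes the distinction explicit for two proportions: a 95% Wald confidence interval that excludes zero indicates a significant difference, while equivalence requires a 90% interval (the two one-sided tests approach at the 5% level) lying entirely inside a pre-specified margin. The 5-percentage-point margin, the trial counts and the function names are all hypothetical.

```python
import math
from statistics import NormalDist

def diff_ci(x1, n1, x2, n2, level=0.95):
    """Wald confidence interval for the difference in two proportions, p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

def read_result(x1, n1, x2, n2, margin):
    """Separate 'no significant difference' from 'shown to be equivalent'."""
    lo95, hi95 = diff_ci(x1, n1, x2, n2, 0.95)
    lo90, hi90 = diff_ci(x1, n1, x2, n2, 0.90)  # 90% CI corresponds to TOST at alpha = 0.05
    return {
        "significant": lo95 > 0 or hi95 < 0,
        "equivalent": -margin < lo90 and hi90 < margin,
    }

print(read_result(30, 50, 25, 50, margin=0.05))          # small trial: inconclusive
print(read_result(5000, 10000, 5000, 10000, margin=0.05))  # large trial: equivalent
```

In the small trial the result is neither significant nor equivalent: absence of evidence, not evidence of absence.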
Peakflow response by drug.
(Figure: Drug A vs Drug B over Day 1 to Day 5, shown twice, first with a y-axis from 0 to 35 and then with a y-axis from 0 to 200.)
Changing the scales in the graph makes a huge difference in presentation!
Funnel plots looking at positive margin rates after radical prostatectomy
(Figure: two funnel plots of positive margin rate against number of operations (0 to 2500), one point per hospital; the first y-axis runs from 0 to 0.5, the second is truncated to 0.2 to 0.5.)
DataVis. http://www.datavis.ca/gallery/lie-factor.php
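The curved lines on a funnel plot are control limits around a target rate, and they narrow as the number of operations grows. The sketch below computes normal-approximation limits for a proportion; the 20% target rate is hypothetical, and 95% and 99.8% are assumed here as commonly used limit levels rather than the ones used on the slide.

```python
import math
from statistics import NormalDist

def funnel_limits(target: float, n: int, level: float = 0.95):
    """Normal-approximation control limits for a proportion observed in n cases."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * math.sqrt(target * (1 - target) / n)
    return max(0.0, target - half_width), min(1.0, target + half_width)

target = 0.2  # hypothetical overall positive margin rate
for n in (50, 200, 1000, 2500):
    lo95, hi95 = funnel_limits(target, n, 0.95)
    lo998, hi998 = funnel_limits(target, n, 0.998)
    print(f"n={n:5d}  95%: {lo95:.3f} to {hi95:.3f}  99.8%: {lo998:.3f} to {hi998:.3f}")
```

Because the limits depend only on the target and n, truncating the y-axis does not change which hospitals fall outside them; it only changes how alarming the spread looks.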
• Understand the common mistakes people make in the design, analysis and reporting of data
• Acquire simple tips to critically evaluate the quality of reports/research
• Bad reporting is prevalent in conferences, reports, news articles and even peer-reviewed journals
• Get a biostatistician involved in your project, pronto!
1. Phillip I. Good and James W. Hardin. Common Errors in Statistics (and How to Avoid Them). John Wiley & Sons 2003.
2. Robert Hooke. How to tell the liars from the statisticians. Marcel Dekker 1983.
3. Brian S. Everitt. Chance rules- An informal guide to probability, risk and statistics. Springer-Verlag 1999.
4. Raymond Hubbard and R. Murray Lindsay. Why P Values Are Not a Useful Measure of Evidence in Statistical Significance Testing. Theory & Psychology 2008; 18: 69.
5. Say Beng Tan. Three Myths about Biostatisticians. Proceedings of Singapore Healthcare. Volume 19, Number 1, 2010: pp 83.
6. P. Nagele. Misuse of standard error of the mean (SEM) when reporting variability of a sample. A critical evaluation of four anaesthesia journals. British Journal of Anaesthesia 2003; 90(4): 514-6.