Systematic Reviews of Diagnostic Studies
Guides for appraisal
Acknowledgements: Paul Glasziou, Jon Deeks, Madhukar Pai, Patrick Bossuyt, and Matthias Egger.
Information Overload 20,000 biomedical periodicals (6M articles) 17,000 biomedical books annually 30,000 recognized diseases 15,000 therapeutic agents (250/yr)
MEDLINE– 4,000 journals surveyed– 11,000,000 citations– 1.27 million articles related to oncology– 35,000 articles related to ear, nose, or throat
surgery
What makes a Review “Systematic”?
Based on a clearly formulated question
Identifies relevant studies
Appraises quality of studies
Summarizes evidence by use of explicit methodology
Comments based on evidence gathered
Origin of Clinical Questions
Diagnosis: how to select and interpret diagnostic tests
Prognosis: how to anticipate the patient’s likely course
Therapy: how to select treatments that do more good than harm
Prevention: how to screen and reduce the risk for disease
ROADMAPFOR DIAGNOSTICREVIEWS
Steps in a Systematic Review
1 Framing the Question (Q)
2 Identifying relevant publications (F)
3 Assessing Study quality (A)
4 Summarising Evidence and interpreting finding (S)
Step 1- Framing the Question (Q)
Clear, unambiguous, structured question
Questions formulated re: PPICO
– Populations of interest
– Prior test(s) (if appropriate)
– Intervention
– Comparisons (if appropriate)
– Outcomes
Unstructured Question
Is cervicovaginal fetal fibronectin useful?
– For what?
– For whom?
– What is meant by “useful”?
Structured Question
Does a positive cervicovaginal fetal fibronectin test predict
spontaneous preterm birth in asymptomatic women?
Test (Intervention)
Outcome Patient
Step 2 – Identifying relevant publications (F) Wide search of medical/scientific databases
– Medline– Cochrane Reviews– Ovid
Relevance to focused question PPICO– Population– Prior test– Intervention– Comparator– Outcome
Publication and reporting biases
Health Technology Assessment, 2000; 4(10):1-115
All studies conducted
All studies published
Studies reviewedGrey
literature
• Positive Results Bias• Grey Literature Bias• Time-Lag Bias• Language and Country Bias• Multiple Publication Bias• Selective Citation Bias• Database Indexing Bias• Selective Outcome Reporting B.
Registered vs. Published Studies Ovarian Cancer chemotherapy: single v combined
Published Registered
No. studies 16 13
Survival ratio 1.16 1.05
95% CI 1.06-1.27 0.98-1.12
P-Value 0.02 0.25
Simes, J. Clin Oncol, 86, p1529
Sensitivity
1-Specificity0 .2 .4 .6 .8 1
0 .2 .4 .6 .8 1
0
.8
.9
1
0
.8
.9
1
Lucas Bachmann
No search filter: 39 studies retrieved
Search filters for diagnostic studies
Sensitivity
1-Specificity0 .2 .4 .6 .8 1
0 .2 .4 .6 .8 1
0
.8
.9
1
0
.8
.9
1
With search filter: 12 studies retrieved (27 missed)
Lucas Bachmann
Documenting &storing
Assessment of Study Quality (A)
Quality varies, therefore Standardized Assessment (?blind*)
Group/Rank by quality
Select a threshold, e.g. all prospective studies with blind reading of reference and index tests.
* assessment of quality blind to study outcome
Quality Score: Mammals example
In natural habitat (No = 0; Yes = 1)– Setting
Whole animals (No = 0; Yes = 1)– Complete information
Photographs (No = 0; Yes = 1)– Level of evidence
Exercise I: Study Quality
Exercise I: Study Quality
3
3
3
2
2
1
3
1
1
2
1
1
1
3
1
Assessing a Study of a Test (Jaeschke et al, JAMA, 1994, 271: 389-91)
Was an appropriate spectrum of patients included? – (Spectrum Bias)
All patients subjected to a Gold Standard?– (Verification Bias)
Was there an independent, "blind" comparison with a Gold Standard? – Observer Bias; Differential Reference Bias
Methods described so you could repeat test?
Diagnostic Accuracy Study: Basic Design
Series of patientsSeries of patients
Index testIndex test
Reference standardReference standard
Blinded cross-classificationBlinded cross-classification
Selected PatientsSelected Patients
Index testIndex test
Reference standardReference standard
Blinded cross-classificationBlinded cross-classification
Spectrum Bias
Series of patientsSeries of patients
Index testIndex test
Reference Reference standardstandard
Blinded cross-classificationBlinded cross-classification
Verification Bias
Series of patientsSeries of patients
Index testIndex test
Blinded cross-classificationBlinded cross-classification
Ref. Std ARef. Std A Ref. Std. BRef. Std. B
Differential Reference Bias
Series of patientsSeries of patients
Index testIndex test
Reference standardReference standard
Unblinded cross-classificationUnblinded cross-classification
Observer Bias
HF patientsHF patients
Index testIndex test
Blinded cross-classificationBlinded cross-classification
controlscontrols
“Case-control” design
Empirical Effects of Bias
0 1 2 3 4Relative Diagnostic Odds Ratio
no description reference
no description population
no description test
retrospective
non-consecutive
not blinded
partial verification
different reference tests
case-control
0.7 (0.6-0.9)
1.4 (1.1-1.7)
1.7 (1.1-2.5)
1.0 (0.7-1.4)
0.9 (0.7-1.1)
1.3 (1.0-1.9)
1.0 (0.8-1.3)
2.2 (1.5-3.3)
3.0 (2.0-4.5)
Lijmer JG et al. JAMA 1999;282:1062-1067
Step 4 – Summarising the Evidence (S)
Extracting data from trials
Combining data – Meta analysis
Does it make sense to combine?
What is a meta-analysis?
A way to calculate an average
Estimates an ‘average’ or ‘common’ effect
Improves the precision of an estimate by using all available data
What is a meta-analysis?
Optional part of a systematic review
Systematic reviews
Meta-analyses
Summary ROC Meta-analytical Display
0.2
.4.6
.81
sen
sitiv
ity
0.2.4.6.81specificity
for predicting spontaneous birth
Fetal fibronectin
Threshold effects
Increasing threshold increases specificity but decreases sensitivity
Decreasing threshold increases sensitivity but decreases specificity
0.2
.4.6
.81
sens
itivi
ty
0.2.4.6.81specificity
for predicting spontaneous birth
Fetal fibronectin
Accuracy effects
0.2
.4.6
.81
sen
sitiv
ity
0.2.4.6.81specificity
for predicting spontaneous birth
Fetal fibronectin
Over-estimation of accuracy e.g. spectrum bias
Under-estimation of accuracy e.g. poor reference standard
Spectrum effects
0.2
.4.6
.81
sen
sitiv
ity
0.2.4.6.81specificity
for predicting spontaneous birth
Fetal fibronectin
Variation in the non-diseased
study participants
Variation in the
diseased study
participants
Methods of Meta-analysis
Separate pooling of sensitivity and specificity (and likelihood ratios)
– Inappropriate when highly heterogeneous
– Underestimates if there is heterogeneity in threshold
Constant diagnostic odds ratio across thresholds
sensitivity specificity LR+ LR- DOR=LR+/LR-
99% 71% 3.44 0.01 231
97% 86% 6.95 0.03 231
94% 94% 15.19 0.07 231
86% 97% 33.21 0.14 231
71% 99% 67.09 0.29 231
Methods of Meta-analysis Creation of Summary ROC
– DOR often reasonably consistent across studies
– Deals with variation in threshold
– Moses/Littenberg – allows for trends in DOR with threshold
– Difficult to interpret a unique operating point
– More advanced methods (HSROC, bivariate normal, random effects) estimate variability and uncertainty in values
– Investigate why studies have different results
Variation in studies:Fetal fibronectin in asymptomatic women
SROC (95%CI) predicting 37 weeks’ gestation (28 studies)
Does it make sense to combine?
Do we need studies to be exactly the same?
When can we say we are measuring the same thing?
Are the studies consistent?
Are variations in results between studies consistent with chance?(Test of homogeneity: has low power)
If NO, then WHY?– Variation in study methods (biases)– Variation in intervention– Variation in outcome measure (e.g. timing)– Variation in population
Exercise II: Fetal fibronectin for predicting spontaneous preterm birth
Q – Clearly focused question?
Exercise II: Fetal fibronectin for predicting spontaneous preterm birth
Objective:
– To determine the accuracy with which a cervicovaginal fetal fibronectin test predicts spontaneous preterm birth in women with or without symptoms of preterm labour.
Q
Exercise II: Fetal fibronectin for predicting spontaneous preterm birth
Q – Clearly focused question?
F – Found all available evidence?
Exercise II: Fetal fibronectin for predicting spontaneous preterm birth
Electronic Search: Medline (1966 2000), Embase (1980 2000), PASCAL (1973 2001), BIOSIS (1969 2001), the Cochrane Library (2000:4), MEDION (1974 2000), National Research Register (2000:4), SCISEARCH (1974 2001), and conference papers (1973 2000).
Grey literature: Contacted individual experts and manufacturer of fetal fibronectin test.
Cross-checking: Checked reference lists of known reviews and primary articles toidentify cited articles not captured by electronic searches.
F
Exercise II: Fetal fibronectin for predicting spontaneous preterm birth
Q – Clearly focused question?
F – Found all available evidence?
A – Studies are critically appraised?
Exercise II: Fetal fibronectin for predicting spontaneous preterm birth
A
Exercise II: Fetal fibronectin for predicting spontaneous preterm birth
Q – Clearly focused question?
F – Found all available evidence?
A – Studies are critically appraised?
S – Results are adequately synthesised?
Exercise II: Fetal fibronectin for predicting spontaneous preterm birth Subgroups
– Asymptomatic women spontaneous preterm birth before 34 and 37 weeks' gestation
– Symptomatic women spontaneous preterm birth before 34 and 37 weeks' gestation, and within 7 10 days of testing
Quantitative summary:– Used SROC curves as measures of accuracy for all included
studies regardless of their thresholds. – Provided summary likelihood ratios (positive and negative)
Heterogeneity – Assessed heterogeneity of diagnostic odds ratios graphically
and statistically – Meta-regression to explored sources of heterogeneity – Sensitivity - estimated accuracy of the highest quality studies
S
Exercise II
Honest H, Bachmann LM, Gupta JK, Kleijnen J, Khan KS.
Accuracy of cervicovaginal fetal fibronectin test in predicting spontaneous preterm birth: systematic review.
BMJ 2002;325: 301-4
Final points
To Assess Systematic Reviews of Diagnostic Studies use - QFAS
Q– The question should be a structured one
PPICO
F– Finding studies of diagnostic tools is
generally more difficult than therapies.
Final points
A– Spectrum, verification, differential reference and
observer bias to be taken into account
S– Summaries affected by choices of:
• Threshold, Population, and Reference test
– Methods not as well researched as for Therapies– Heterogeneity analysis particularly important in
these reviews
Steering Committee: Bossuyt, Bruns, Gatsonsis, Glasziou, Irwig, Lijmer, Moher, Rennie, de Viet
Guidelines for Conducting SRinDS For diagnostic reviews:
– Cochrane Methods Group on Systematic Review of Screening and Diagnostic Tests: Recommended Methods. Cochrane Collaboration, 1996.
– Deville WL, Buntinx F, Bouter LM, Montori VM, De Vet HC, Van Der Windt DA, Bezemer P. Conducting systematic reviews of diagnostic studies: didactic guidelines. BMC Med Res Methodol 2002; 2(1): 9.
– Deeks JJ. Systematic reviews of evaluations of diagnostic and screening tests. In: Egger M, Smith GD, Altman DG, eds. Systematic reviews in health care. Meta-analysis in context. London: BMJ Publishing Group, 2001: 248–282.
– Irwig L, Macaskill P, Glasziou P, Fahey M. Meta-analytic methods for diagnostic test accuracy. J Clin Epidemiol 1995; 48: 119–30.
– The Bayes Library of Diagnostic Studies and Reviews. 2nd Edition, 2002. For diagnostic studies reporting:
– Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, Lijmer JG, Moher D, Rennie D, de Vet HC; Standards for Reporting of Diagnostic Accuracy steering group. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. BMJ 2003; 326: 41–44.
Databases/sources of studies
Electronic databases:• General: Cochrane CENTRAL, PubMed, Embase, etc.• Subject-specific: AIDSLINE, CANCERLIT, PsycInfo, MEDION, etc.
Reference lists of included studies (citation tracking) References lists of earlier reviews, commentaries
– CDSR, DARE, MEDION, PubMed search with filters for systematic reviews Personal communication with experts and authors Contacting drug/device companies Hand-searching of key, high-yield journals Grey literature
– Dissertation abstracts, reports, conference proceedings, etc. Sources of ongoing studies
– Trial registers, drug companies, contacting experts
Quality assessment
Criteria for validity of diagnostic studies:– Study design
• Cross-sectional study of a clinically indicated population or case-control
– Verification• Complete, different reference tests, or partial
– Blinding• Blinded or not
– Patient selection• Consecutive or random or nonconsecutive
– Data collection• Prospective or retrospective
– Appropriateness of reference standard– Description of test– Description of study population
Lijmer et al. Empiric evidence of design-related bias in studies of diagnostic studies. JAMA 1999;282:1061
Present absolute numbers for test results
Distribution of plasma concentrations of B type natriuretic peptide in normal elderly people and in those with left ventricular systolic dysfunction confirmed by echocardiography
BMJ, 2000; 320: 906-8.