
Systematic Reviews of Diagnostic Studies

Guides for appraisal

Acknowledgements: Paul Glasziou, Jon Deeks, Madhukar Pai, Patrick Bossuyt, and Matthias Egger.

Information Overload

– 20,000 biomedical periodicals (6M articles)
– 17,000 biomedical books annually
– 30,000 recognized diseases
– 15,000 therapeutic agents (250/yr)

MEDLINE
– 4,000 journals surveyed
– 11,000,000 citations
– 1.27 million articles related to oncology
– 35,000 articles related to ear, nose, or throat surgery

What makes a Review “Systematic”?

Based on a clearly formulated question

Identifies relevant studies

Appraises quality of studies

Summarizes evidence by use of explicit methodology

Draws conclusions based on the evidence gathered

Origin of Clinical Questions

Diagnosis: how to select and interpret diagnostic tests

Prognosis: how to anticipate the patient’s likely course

Therapy: how to select treatments that do more good than harm

Prevention: how to screen and reduce the risk for disease

ROADMAP FOR DIAGNOSTIC REVIEWS

Steps in a Systematic Review

1 Framing the Question (Q)

2 Identifying relevant publications (F)

3 Assessing Study quality (A)

4 Summarising evidence and interpreting findings (S)

Step 1- Framing the Question (Q)

Clear, unambiguous, structured question

Questions formulated re: PPICO

– Populations of interest

– Prior test(s) (if appropriate)

– Intervention

– Comparisons (if appropriate)

– Outcomes

Unstructured Question

Is cervicovaginal fetal fibronectin useful?

– For what?

– For whom?

– What is meant by “useful”?

Structured Question

Does a positive cervicovaginal fetal fibronectin test predict spontaneous preterm birth in asymptomatic women?

– Test (Intervention): cervicovaginal fetal fibronectin
– Outcome: spontaneous preterm birth
– Patient: asymptomatic women

Step 2 – Identifying relevant publications (F)

Wide search of medical/scientific databases
– Medline
– Cochrane Reviews
– Ovid

Relevance to focused question PPICO
– Population
– Prior test
– Intervention
– Comparator
– Outcome

Publication and reporting biases

Health Technology Assessment, 2000; 4(10):1-115

All studies conducted ⊃ all studies published ⊃ studies reviewed (grey literature lies outside the published set)

• Positive Results Bias
• Grey Literature Bias
• Time-Lag Bias
• Language and Country Bias
• Multiple Publication Bias
• Selective Citation Bias
• Database Indexing Bias
• Selective Outcome Reporting Bias

Registered vs. Published Studies
Ovarian cancer chemotherapy: single vs. combined

                 Published    Registered
No. studies      16           13
Survival ratio   1.16         1.05
95% CI           1.06–1.27    0.98–1.12
P-value          0.02         0.25

Simes, J Clin Oncol 1986, p. 1529

Search filters for diagnostic studies

[ROC plot: sensitivity vs. 1−specificity of the retrieved studies (Lucas Bachmann)]

No search filter: 39 studies retrieved

With search filter: 12 studies retrieved (27 missed)

Documenting & storing

Step 3 – Assessing Study Quality (A)

Quality varies, therefore use a standardized assessment (ideally blind*)

Group/Rank by quality

Select a threshold, e.g. all prospective studies with blind reading of reference and index tests.

* assessment of quality blind to study outcome

Quality Score: Mammals example

– Setting: in natural habitat (No = 0; Yes = 1)
– Complete information: whole animals (No = 0; Yes = 1)
– Level of evidence: photographs (No = 0; Yes = 1)

Exercise I: Study Quality


Assessing a Study of a Test (Jaeschke et al, JAMA, 1994, 271: 389-91)

Was an appropriate spectrum of patients included? – (Spectrum Bias)

All patients subjected to a Gold Standard?– (Verification Bias)

Was there an independent, "blind" comparison with a Gold Standard? – Observer Bias; Differential Reference Bias

Methods described so you could repeat test?

Diagnostic Accuracy Study: Basic Design

Series of patients → Index test → Reference standard → Blinded cross-classification

Spectrum Bias

Selected patients → Index test → Reference standard → Blinded cross-classification

Verification Bias

Series of patients → Index test → Reference standard (not all patients verified) → Blinded cross-classification

Differential Reference Bias

Series of patients → Index test → Ref. Std. A / Ref. Std. B → Blinded cross-classification

Observer Bias

Series of patients → Index test → Reference standard → Unblinded cross-classification

“Case-control” design

HF patients + controls → Index test → Blinded cross-classification
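The blinded cross-classification yields a 2×2 table from which the standard accuracy measures follow. A minimal sketch with hypothetical counts (not data from any study cited here):

```python
# Accuracy measures from a 2x2 cross-classification of index test
# against reference standard. Counts are hypothetical, for illustration.
TP, FP, FN, TN = 90, 30, 10, 70

sensitivity = TP / (TP + FN)               # true positive rate
specificity = TN / (TN + FP)               # true negative rate
lr_pos = sensitivity / (1 - specificity)   # positive likelihood ratio
lr_neg = (1 - sensitivity) / specificity   # negative likelihood ratio
dor = lr_pos / lr_neg                      # diagnostic odds ratio

print(f"sens={sensitivity:.2f}  spec={specificity:.2f}  "
      f"LR+={lr_pos:.2f}  LR-={lr_neg:.2f}  DOR={dor:.0f}")
```

Note that the DOR can equivalently be computed as (TP × TN) / (FP × FN).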

Empirical Effects of Bias

Relative diagnostic odds ratio (RDOR, 95% CI) by design feature:

no description of reference    0.7 (0.6–0.9)
no description of population   1.4 (1.1–1.7)
no description of test         1.7 (1.1–2.5)
retrospective                  1.0 (0.7–1.4)
non-consecutive                0.9 (0.7–1.1)
not blinded                    1.3 (1.0–1.9)
partial verification           1.0 (0.8–1.3)
different reference tests      2.2 (1.5–3.3)
case-control                   3.0 (2.0–4.5)

Lijmer JG et al. JAMA 1999;282:1062-1067

Step 4 – Summarising the Evidence (S)

Extracting data from trials

Combining data – Meta analysis

Does it make sense to combine?

What is a meta-analysis?

A way to calculate an average

Estimates an ‘average’ or ‘common’ effect

Improves the precision of an estimate by using all available data
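The ‘average’ is typically an inverse-variance weighted mean. The sketch below pools hypothetical study effects on the log scale using a fixed-effect model (illustrative values, not output from any review cited here):

```python
# Fixed-effect (inverse-variance) pooling of hypothetical study effects.
# Each study i contributes an estimate y_i (e.g. a log odds ratio) with
# within-study variance v_i; its weight is w_i = 1 / v_i.
ys = [1.8, 2.3, 2.0]      # study effect estimates (log scale, hypothetical)
vs = [0.10, 0.25, 0.15]   # within-study variances (hypothetical)

ws = [1 / v for v in vs]
pooled = sum(w * y for w, y in zip(ws, ys)) / sum(ws)
pooled_var = 1 / sum(ws)   # smaller than any single study's variance

print(f"pooled effect = {pooled:.3f}, SE = {pooled_var ** 0.5:.3f}")
```

Precision improves because the pooled variance, 1/Σwᵢ, is smaller than every individual vᵢ.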

What is a meta-analysis?

Optional part of a systematic review

Systematic reviews

Meta-analyses

Summary ROC Meta-analytical Display

[SROC plot: sensitivity vs. specificity. Fetal fibronectin for predicting spontaneous birth]

Threshold effects

Increasing threshold increases specificity but decreases sensitivity

Decreasing threshold increases sensitivity but decreases specificity
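This trade-off can be demonstrated by sweeping a cut-off across a simulated continuous marker; the Gaussian distributions below are arbitrary, purely illustrative:

```python
import random

random.seed(1)
# Simulated marker values: higher on average in diseased people
# (arbitrary Gaussian parameters, purely illustrative).
diseased = [random.gauss(2.0, 1.0) for _ in range(1000)]
healthy = [random.gauss(0.0, 1.0) for _ in range(1000)]

def sens_spec(threshold):
    """Call the test 'positive' when the marker is >= threshold."""
    sens = sum(x >= threshold for x in diseased) / len(diseased)
    spec = sum(x < threshold for x in healthy) / len(healthy)
    return sens, spec

for t in (0.0, 1.0, 2.0):   # raising the cut-off
    sens, spec = sens_spec(t)
    print(f"threshold={t:.1f}  sens={sens:.2f}  spec={spec:.2f}")
```

Raising the threshold trades sensitivity for specificity, tracing out the ROC curve.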

[SROC plot: sensitivity vs. specificity. Fetal fibronectin for predicting spontaneous birth]

Accuracy effects

[SROC plot: sensitivity vs. specificity. Fetal fibronectin for predicting spontaneous birth]

Over-estimation of accuracy e.g. spectrum bias

Under-estimation of accuracy e.g. poor reference standard

Spectrum effects

[SROC plot: sensitivity vs. specificity. Fetal fibronectin for predicting spontaneous birth]

Variation in the non-diseased study participants; variation in the diseased study participants.

Methods of Meta-analysis

Separate pooling of sensitivity and specificity (and likelihood ratios)

– Inappropriate when highly heterogeneous

– Underestimates if there is heterogeneity in threshold

Constant diagnostic odds ratio across thresholds

sensitivity   specificity   LR+     LR−     DOR = LR+/LR−
99%           71%           3.44    0.01    231
97%           86%           6.95    0.03    231
94%           94%           15.19   0.07    231
86%           97%           33.21   0.14    231
71%           99%           67.09   0.29    231
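Each row of the table follows from the constant-DOR assumption: fixing the DOR and a specificity determines the sensitivity, since odds(sens) = DOR × (1 − spec)/spec. A sketch reproducing the pattern (DOR = 231 is taken from the table; the code itself is illustrative):

```python
# For a constant diagnostic odds ratio, sensitivity is determined by
# specificity: odds(sens) = DOR * (1 - spec) / spec.
DOR = 231  # value from the table above

def sens_for_spec(spec, dor=DOR):
    odds = dor * (1 - spec) / spec
    return odds / (1 + odds)

for spec in (0.71, 0.86, 0.94, 0.97, 0.99):
    sens = sens_for_spec(spec)
    lr_pos = sens / (1 - spec)
    lr_neg = (1 - sens) / spec
    print(f"spec={spec:.2f}  sens={sens:.3f}  "
          f"LR+={lr_pos:.2f}  LR-={lr_neg:.3f}  DOR={lr_pos / lr_neg:.0f}")
```

The computed sensitivities match the table's rows to within rounding.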

Methods of Meta-analysis Creation of Summary ROC

– DOR often reasonably consistent across studies

– Deals with variation in threshold

– Moses/Littenberg – allows for trends in DOR with threshold

– Difficult to interpret a unique operating point

– More advanced methods (HSROC, bivariate normal, random effects) estimate variability and uncertainty in values

– Investigate why studies have different results
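The Moses/Littenberg method fits a straight line to D = logit(TPR) − logit(FPR) (the log DOR) against S = logit(TPR) + logit(FPR), a proxy for threshold, then back-transforms the fit into an SROC curve. A minimal sketch on hypothetical 2×2 counts (an unweighted fit; real analyses often weight by inverse variance):

```python
import math

def logit(p):
    return math.log(p / (1 - p))

# Hypothetical per-study 2x2 counts: (TP, FP, FN, TN),
# with 0.5 added to each cell as the usual continuity correction.
studies = [(45, 10, 5, 40), (30, 5, 10, 55), (60, 20, 15, 25)]

D, S = [], []
for tp, fp, fn, tn in studies:
    tpr = (tp + 0.5) / (tp + fn + 1)
    fpr = (fp + 0.5) / (fp + tn + 1)
    D.append(logit(tpr) - logit(fpr))   # log diagnostic odds ratio
    S.append(logit(tpr) + logit(fpr))   # proxy for threshold

# Unweighted least-squares fit of D = a + b*S (Moses/Littenberg).
n = len(studies)
s_mean, d_mean = sum(S) / n, sum(D) / n
b = sum((s - s_mean) * (d - d_mean) for s, d in zip(S, D)) / \
    sum((s - s_mean) ** 2 for s in S)
a = d_mean - b * s_mean

def sroc_sens(spec):
    """Expected sensitivity at a given specificity on the fitted SROC."""
    lf = logit(1 - spec)
    # Solve logit(tpr) from D = a + b*S with S = logit(tpr) + lf.
    lo_tpr = (a + (1 + b) * lf) / (1 - b)
    return 1 / (1 + math.exp(-lo_tpr))

print(f"a={a:.2f}, b={b:.2f}; SROC sensitivity at spec 0.90 = "
      f"{sroc_sens(0.9):.2f}")
```

A nonzero slope b indicates the DOR trends with threshold; b ≈ 0 recovers the symmetric constant-DOR curve.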

Variation in studies: fetal fibronectin in asymptomatic women

SROC (95% CI) for predicting birth before 37 weeks’ gestation (28 studies)

Does it make sense to combine?

Do we need studies to be exactly the same?

When can we say we are measuring the same thing?

Are the studies consistent?

Are variations in results between studies consistent with chance? (The test of homogeneity has low power.)

If NO, then WHY?
– Variation in study methods (biases)
– Variation in intervention
– Variation in outcome measure (e.g. timing)
– Variation in population
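The homogeneity test referred to above is typically Cochran's Q on the study estimates; a minimal sketch using hypothetical log DORs (the critical value 7.81 is the standard chi-square cut-off for df = 3, α = 0.05):

```python
# Cochran's Q test of homogeneity on hypothetical log DORs.
# Q = sum of w_i * (y_i - pooled)^2 follows a chi-square distribution
# with k-1 df under homogeneity; with few studies the test has low power.
ys = [2.1, 2.6, 1.2, 3.0]      # study log DORs (hypothetical)
vs = [0.20, 0.30, 0.25, 0.40]  # within-study variances (hypothetical)

ws = [1 / v for v in vs]
pooled = sum(w * y for w, y in zip(ws, ys)) / sum(ws)
Q = sum(w * (y - pooled) ** 2 for w, y in zip(ws, ys))
df = len(ys) - 1

# chi-square critical value at alpha = 0.05 for df = 3 is 7.81
print(f"Q = {Q:.2f} on {df} df (reject homogeneity if Q > 7.81)")
```

Here Q falls below the critical value, so the variation is consistent with chance even though the study estimates look quite spread out, illustrating the low power of the test.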

Exercise II: Fetal fibronectin for predicting spontaneous preterm birth

Q – Clearly focused question?

Exercise II: Fetal fibronectin for predicting spontaneous preterm birth

Objective:

– To determine the accuracy with which a cervicovaginal fetal fibronectin test predicts spontaneous preterm birth in women with or without symptoms of preterm labour.

Q

Exercise II: Fetal fibronectin for predicting spontaneous preterm birth

Q – Clearly focused question?

F – Found all available evidence?

Exercise II: Fetal fibronectin for predicting spontaneous preterm birth

Electronic Search: Medline (1966–2000), Embase (1980–2000), PASCAL (1973–2001), BIOSIS (1969–2001), the Cochrane Library (2000:4), MEDION (1974–2000), National Research Register (2000:4), SCISEARCH (1974–2001), and conference papers (1973–2000).

Grey literature: Contacted individual experts and manufacturer of fetal fibronectin test.

Cross-checking: Checked reference lists of known reviews and primary articles to identify cited articles not captured by electronic searches.

F

Exercise II: Fetal fibronectin for predicting spontaneous preterm birth

Q – Clearly focused question?

F – Found all available evidence?

A – Studies are critically appraised?

Exercise II: Fetal fibronectin for predicting spontaneous preterm birth

A

Exercise II: Fetal fibronectin for predicting spontaneous preterm birth

Q – Clearly focused question?

F – Found all available evidence?

A – Studies are critically appraised?

S – Results are adequately synthesised?

Exercise II: Fetal fibronectin for predicting spontaneous preterm birth

Subgroups
– Asymptomatic women: spontaneous preterm birth before 34 and 37 weeks' gestation
– Symptomatic women: spontaneous preterm birth before 34 and 37 weeks' gestation, and within 7–10 days of testing

Quantitative summary:
– Used SROC curves as measures of accuracy for all included studies regardless of their thresholds
– Provided summary likelihood ratios (positive and negative)

Heterogeneity:
– Assessed heterogeneity of diagnostic odds ratios graphically and statistically
– Meta-regression to explore sources of heterogeneity
– Sensitivity analysis: estimated accuracy of the highest-quality studies

S

Exercise II

Honest H, Bachmann LM, Gupta JK, Kleijnen J, Khan KS.

Accuracy of cervicovaginal fetal fibronectin test in predicting spontaneous preterm birth: systematic review.

BMJ 2002;325:301–4

Final points

To assess systematic reviews of diagnostic studies, use QFAS:

Q – The question should be a structured one (PPICO)

F – Finding studies of diagnostic tools is generally more difficult than for therapies.

Final points

A – Spectrum, verification, differential reference, and observer bias to be taken into account

S – Summaries affected by choices of:
• Threshold, Population, and Reference test
– Methods not as well researched as for therapies
– Heterogeneity analysis particularly important in these reviews

Steering Committee: Bossuyt, Bruns, Gatsonis, Glasziou, Irwig, Lijmer, Moher, Rennie, de Vet

Guidelines for Conducting Systematic Reviews of Diagnostic Studies

For diagnostic reviews:

– Cochrane Methods Group on Systematic Review of Screening and Diagnostic Tests: Recommended Methods. Cochrane Collaboration, 1996.

– Deville WL, Buntinx F, Bouter LM, Montori VM, De Vet HC, Van Der Windt DA, Bezemer P. Conducting systematic reviews of diagnostic studies: didactic guidelines. BMC Med Res Methodol 2002; 2(1): 9.

– Deeks JJ. Systematic reviews of evaluations of diagnostic and screening tests. In: Egger M, Smith GD, Altman DG, eds. Systematic reviews in health care. Meta-analysis in context. London: BMJ Publishing Group, 2001: 248–282.

– Irwig L, Macaskill P, Glasziou P, Fahey M. Meta-analytic methods for diagnostic test accuracy. J Clin Epidemiol 1995; 48: 119–30.

– The Bayes Library of Diagnostic Studies and Reviews. 2nd Edition, 2002.

For diagnostic studies reporting:

– Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, Lijmer JG, Moher D, Rennie D, de Vet HC; Standards for Reporting of Diagnostic Accuracy steering group. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. BMJ 2003; 326: 41–44.

Databases/sources of studies

Electronic databases:
• General: Cochrane CENTRAL, PubMed, Embase, etc.
• Subject-specific: AIDSLINE, CANCERLIT, PsycInfo, MEDION, etc.

Reference lists of included studies (citation tracking)

Reference lists of earlier reviews, commentaries
– CDSR, DARE, MEDION, PubMed search with filters for systematic reviews

Personal communication with experts and authors

Contacting drug/device companies

Hand-searching of key, high-yield journals

Grey literature
– Dissertation abstracts, reports, conference proceedings, etc.

Sources of ongoing studies
– Trial registers, drug companies, contacting experts

Quality assessment

Criteria for validity of diagnostic studies:
– Study design: cross-sectional study of a clinically indicated population, or case-control
– Verification: complete, different reference tests, or partial
– Blinding: blinded or not
– Patient selection: consecutive, random, or nonconsecutive
– Data collection: prospective or retrospective
– Appropriateness of reference standard
– Description of test
– Description of study population

Lijmer JG, et al. Empirical evidence of design-related bias in studies of diagnostic tests. JAMA 1999;282:1061–6

Present absolute numbers for test results

Distribution of plasma concentrations of B type natriuretic peptide in normal elderly people and in those with left ventricular systolic dysfunction confirmed by echocardiography

BMJ, 2000; 320: 906-8.
