
Page 1: Multiple Tests, Multivariable Decision Rules, and Studies of Diagnostic Test Accuracy Michael A. Kohn, MD, MPP 10/14/2004 Coursebook Chapter 5 – Multiple

Multiple Tests, Multivariable Decision Rules, and Studies of

Diagnostic Test Accuracy

Michael A. Kohn, MD, MPP

10/14/2004

Coursebook Chapter 5 – Multiple Tests and Multivariable Decision Rules

Coursebook Chapter 6 – Studies of Diagnostic Test Accuracy

Page 2

Outline of Topics

• Combining results of multiple tests: importance of test non-independence
• Recursive Partitioning
• Logistic Regression
• Published “rules” for combining test results: importance of validation separate from derivation
• Biases in studies of diagnostic tests:
– Overfitting bias
– Incorporation bias
– Referral bias
– Double gold standard bias
– Spectrum bias

Page 3

Warning: Different Example

Example of combining two tests in this talk: Exercise ECG and Nuclide Scan as dichotomous tests for CAD (assumed to be a dichotomous D+/D- disease)*

Example of combining two tests in Coursebook:

Premature birth (GA < 36 weeks) and low birth weight (BW < 2500 grams) as dichotomous tests for neonatal morbidity

*Sackett DL, Haynes RB, Guyatt GH, Tugwell P. Clinical Epidemiology: A Basic Science for Clinical Medicine. 2nd ed. Boston: Little, Brown; 1991.

Page 4

One Dichotomous Test

Exercise ECG CAD+ CAD- LR

Positive 299 44 6.80

Negative 201 456 0.44

Total 500 500

Do you see that LR(+) = 6.80 is (299/500)/(44/500)?

Review of Chapter 3: What are the sensitivity, specificity, PPV, and NPV of this test? (Be careful.)

Page 5

Clinical Scenario – One Test

Pre-Test Probability of CAD = 33%

E-ECG Positive

Pre-test prob: 0.33

Pre-test odds: 0.33/0.67 = 0.5

LR(+) = 6.80

Post-Test Odds = Pre-Test Odds x LR(+)

= 0.5 x 6.80 = 3.40

Post-Test prob = 3.40/(3.40 + 1) = 0.77
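The same odds arithmetic as a small Python sketch (the helper names are hypothetical). The slide's pre-test odds of 0.5 correspond to a probability of exactly 1/3 (0.33/0.67 is about 0.49), so the sketch uses 1/3.

```python
def prob_to_odds(p):
    """Convert a probability to odds: p / (1 - p)."""
    return p / (1 - p)


def odds_to_prob(odds):
    """Convert odds back to a probability: odds / (1 + odds)."""
    return odds / (1 + odds)


def post_test_prob(pre_test_prob, lr):
    """Post-Test Odds = Pre-Test Odds x LR, returned as a probability."""
    return odds_to_prob(prob_to_odds(pre_test_prob) * lr)


print(round(post_test_prob(1 / 3, 6.80), 2))  # 0.77
```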

Page 6

Clinical Scenario – One Test

Using Probabilities: Pre-Test Probability of CAD = 33% → E-ECG Positive → Post-Test Probability of CAD = 77%

Using Odds: Pre-Test Odds of CAD = 0.50 → E-ECG Positive (LR = 6.80) → Post-Test Odds of CAD = 3.40

Page 7

Clinical Scenario – One Test

Pre-Test Probability of CAD = 33%

E-ECG Positive

[Figure: arrow for E-ECG + (LR = 6.80) on a log(odds) scale, running from Odds = 0.50 (Prob = 0.33) to Odds = 3.40 (Prob = 0.77). Scale: Log(Odds) -2, -1.5, -1, -0.5, 0, 0.5, 1; Odds 1:100, 1:33, 1:10, 1:3, 1:1, 3:1, 10:1; Prob 0.01, 0.03, 0.09, 0.25, 0.5, 0.75, 0.91.]
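The log(odds) scale in the figure can be reproduced in a few lines. This sketch assumes base-10 logarithms, which match the tick labels (odds 10:1 sits at 1 and 1:100 at -2). On that scale, applying a test adds log10(LR), so the E-ECG + arrow has the same length wherever it starts.

```python
import math


def log10_odds(p):
    """Position of probability p on the figure's log(odds) axis (base-10 logs)."""
    return math.log10(p / (1 - p))


# Tick marks: probability -> log10(odds); e.g., Prob 0.09 is about odds 1:10, i.e., -1.
for prob in (0.01, 0.09, 0.25, 0.5, 0.75, 0.91):
    print(f"Prob {prob:.2f}: log(odds) = {log10_odds(prob):+.2f}")

# Multiplying the odds by an LR is the same as adding log10(LR) on this axis.
pre = log10_odds(1 / 3)          # odds 0.50
post = pre + math.log10(6.80)    # odds 3.40
print(f"E-ECG +: {pre:+.2f} -> {post:+.2f}, "
      f"post-test prob = {10**post / (1 + 10**post):.2f}")
```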

Page 8

Second Dichotomous Test

Nuclide Scan CAD+ CAD- LR

Positive 416 190 2.19

Negative 84 310 0.27

Total 500 500

Do you see that LR(+) = 2.19 is (416/500)/(190/500)?

Page 9

Clinical Scenario – Two Tests

Using Probabilities: Pre-Test Probability of CAD = 33% → E-ECG Positive → Post-E-ECG Probability of CAD = 77% → Nuclide Scan Positive → Post-Nuclide Probability of CAD = ?

Page 10

Clinical Scenario – Two Tests

Using Odds: Pre-Test Odds of CAD = 0.50 → E-ECG Positive (LR = 6.80) → Post-Test Odds of CAD = 3.40 → Nuclide Scan Positive (LR = 2.19?) → Post-Test Odds of CAD = 3.40 x 2.19?

= 7.44? (P = 7.44/(1 + 7.44) = 88%?)

Page 11

Clinical Scenario – Two Tests

Pre-Test Probability of CAD = 33%

E-ECG Positive

[Figure: log(odds) scale. E-ECG + (LR = 6.80) runs from Odds = 0.50 (Prob = 0.33) to Odds = 3.40 (Prob = 0.77); a Nuclide + arrow (LR = 2.19) is then laid end to end with it, reaching Odds = 7.44 (Prob = 0.88) for E-ECG + and Nuclide +. Can we do this?]

Page 12

Question

Can we use the post-test odds after a positive Exercise ECG as the pre-test odds for the positive nuclide scan?

i.e., can we combine the positive results by multiplying their LRs?

LR(E-ECG +, Nuclide +) = LR(E-ECG +) x LR(Nuclide +) ?

= 6.80 x 2.19 ?

= 14.88 ?

Page 13

Answer = No

E-ECG Nuclide CAD+ % CAD- % LR

Pos Pos 276 55% 26 5% 10.62

Pos Neg 23 5% 18 4% 1.28

Neg Pos 140 28% 164 33% 0.85

Neg Neg 61 12% 292 58% 0.21

Total Total 500 100% 500 100%  

LR(E-ECG +, Nuclide +) = 10.62, not 14.88
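To check the table's arithmetic, a short Python sketch (standard library only; `JOINT`, `joint_lr`, and `marginal_lr` are illustrative names). Summing the joint counts over one test recovers each single-test table (e.g., 276 + 23 = 299 CAD+ patients with a positive E-ECG), so the product of the single-test LRs can be compared with the LR measured directly for the E-ECG +/Nuclide + combination.

```python
# (E-ECG result, nuclide result) -> (CAD+ count, CAD- count), from the table above.
JOINT = {
    ("+", "+"): (276, 26),
    ("+", "-"): (23, 18),
    ("-", "+"): (140, 164),
    ("-", "-"): (61, 292),
}
N_POS = sum(pos for pos, _ in JOINT.values())  # 500 CAD+ patients
N_NEG = sum(neg for _, neg in JOINT.values())  # 500 CAD- patients


def joint_lr(eecg, nuclide):
    """LR for a combination of results, measured directly from the joint counts."""
    pos, neg = JOINT[(eecg, nuclide)]
    return (pos / N_POS) / (neg / N_NEG)


def marginal_lr(test_index, result):
    """LR for one test alone (0 = E-ECG, 1 = nuclide), summing over the other test."""
    pos = sum(p for key, (p, _) in JOINT.items() if key[test_index] == result)
    neg = sum(n for key, (_, n) in JOINT.items() if key[test_index] == result)
    return (pos / N_POS) / (neg / N_NEG)


product = marginal_lr(0, "+") * marginal_lr(1, "+")  # what independence would predict
actual = joint_lr("+", "+")
print(f"product of single-test LRs: {product:.2f}; measured joint LR: {actual:.2f}")
```

The product is about 14.88; the measured joint LR is about 10.62.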

Page 14

Non-Independence

A positive nuclide scan does not tell you as much if the patient has already had a positive exercise ECG.

Page 15

Clinical Scenario

Using Odds: Pre-Test Odds of CAD = 0.50 → E-ECG +/Nuclide Scan + (LR = 10.62) → Post-Test Odds of CAD = 0.50 x 10.62

= 5.31 (P = 5.31/(1 + 5.31) = 84%, not 88%)

Page 16

Non-Independence

[Figure: log(odds) scale. If the tests were independent, the E-ECG + and Nuclide + arrows could be laid end to end; since the tests are dependent, the single arrow for E-ECG + and Nuclide + together is shorter.]

Prob = 0.84

Page 17

Non-Independence

Instead of the nuclide scan, what if the second test were just a repeat exercise ECG?

A second positive E-ECG would do little to increase your certainty of CAD. If it was a false positive the first time around, it is likely to be a false positive the second time.

Page 18

Counterexamples: Possibly Independent Tests

For Venous Thromboembolism:

• CT Angiogram of Lungs and Doppler Ultrasound of Leg Veins

• Alveolar Dead Space and D-Dimer

• MRA of Lungs and MRV of leg veins

Page 19

Unless tests are independent (within the D+ group and within the D- group), we can’t combine results by multiplying LRs

Page 20

Ways to Combine Multiple Tests

On a group of patients (derivation set), perform the multiple tests and determine true disease status (apply the gold standard)

• Measure LR for each possible combination of results

• Recursive Partitioning

• Logistic Regression

Page 21

Determine LR for Each Result Combination

E-ECG Nuclide CAD+ % CAD- % LR Post-Test Prob*

Pos Pos 276 55% 26 5% 10.62 84%

Pos Neg 23 5% 18 4% 1.28 39%

Neg Pos 140 28% 164 33% 0.85 30%

Neg Neg 61 12% 292 58% 0.21 9%

Total Total 500 100% 500 100%  

*Assumes pre-test prob = 33%
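The Post-Test Prob column can be regenerated from the counts. A minimal sketch, assuming (as the footnote does) that the same 33% pre-test probability applies to every combination; the names are illustrative.

```python
COUNTS = {
    "E-ECG +, Nuclide +": (276, 26),
    "E-ECG +, Nuclide -": (23, 18),
    "E-ECG -, Nuclide +": (140, 164),
    "E-ECG -, Nuclide -": (61, 292),
}
N_CAD_POS = N_CAD_NEG = 500


def post_test_prob(pre_test_prob, lr):
    """Bayes in odds form: post-test odds = pre-test odds x LR."""
    post_odds = pre_test_prob / (1 - pre_test_prob) * lr
    return post_odds / (1 + post_odds)


def combination_table(pre_test_prob):
    """LR and post-test probability for each combination of E-ECG and nuclide results."""
    table = {}
    for combo, (cad_pos, cad_neg) in COUNTS.items():
        lr = (cad_pos / N_CAD_POS) / (cad_neg / N_CAD_NEG)
        table[combo] = (lr, post_test_prob(pre_test_prob, lr))
    return table


for combo, (lr, prob) in combination_table(0.33).items():
    print(f"{combo}: LR = {lr:.2f}, post-test prob = {prob:.0%}")
```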

Page 22

Determine LR for Each Result Combination

2 dichotomous tests: 4 combinations

3 dichotomous tests: 8 combinations

4 dichotomous tests: 16 combinations

Etc.

2 3-level tests: 9 combinations

3 3-level tests: 27 combinations

Etc.
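The counts above are the product of each test's number of possible results, which a short sketch with `itertools.product` makes concrete (the function name is our own). Each combination needs enough D+ and D- patients in the derivation set to estimate its own LR, which is why this approach quickly becomes impractical.

```python
from itertools import product


def result_combinations(levels_per_test):
    """Enumerate every combination of results for tests with the given numbers of levels."""
    return list(product(*(range(levels) for levels in levels_per_test)))


print(len(result_combinations([2, 2])))     # 2 dichotomous tests: 4
print(len(result_combinations([2] * 4)))    # 4 dichotomous tests: 16
print(len(result_combinations([3, 3, 3])))  # 3 three-level tests: 27
print(len(result_combinations([2] * 8)))    # 8 dichotomous tests: 256
```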

Page 23

Determine LR for Each Result Combination

How do you handle continuous tests?

Not practical for most groups of tests.

Page 24

Recursive Partitioning

Suspected CAD (P = 33%)
• Exercise ECG Negative: 18%
– Nuclide Scan Negative: 9%
– Nuclide Scan Positive: 30%
• Exercise ECG Positive: 77%
– Nuclide Scan Negative: 39%
– Nuclide Scan Positive: 84%
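One way to represent this tree in code; a sketch only, with a made-up nested-dictionary layout and hypothetical keys (`exercise_ecg`, `nuclide_scan`). Each node stores the probability of CAD after the results so far, so the walk can stop at whatever level the patient's testing stopped.

```python
# Hypothetical layout: internal nodes name the test they split on;
# "p" is the probability of CAD at that node (values from the tree above).
TREE = {
    "p": 0.33,
    "test": "exercise_ecg",
    "branches": {
        "neg": {"p": 0.18, "test": "nuclide_scan",
                "branches": {"neg": {"p": 0.09}, "pos": {"p": 0.30}}},
        "pos": {"p": 0.77, "test": "nuclide_scan",
                "branches": {"neg": {"p": 0.39}, "pos": {"p": 0.84}}},
    },
}


def walk(tree, results):
    """Follow a patient's results ("pos"/"neg") down the tree.

    Returns the probabilities of CAD at each node visited; stops early at a
    node whose test the patient has not had.
    """
    node = tree
    path = [node["p"]]
    while "test" in node and node["test"] in results:
        node = node["branches"][results[node["test"]]]
        path.append(node["p"])
    return path


print(walk(TREE, {"exercise_ecg": "pos", "nuclide_scan": "neg"}))  # [0.33, 0.77, 0.39]
print(walk(TREE, {"exercise_ecg": "neg"}))                         # [0.33, 0.18]
```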

Page 25

Recursive Partitioning

• Same as Classification and Regression Trees (CART)

• Don’t have to work out probabilities (or LRs) for all possible combinations of tests, because of “tree pruning”

Page 26

Tree Pruning: Goldman Rule*

8 “Tests” for Acute MI in ER Chest Pain Patients:
1. ST Elevation on ECG
2. CP < 48 hours
3. ST-T changes on ECG
4. Hx of ACI
5. Radiation of Pain to Neck/LUE
6. Longest pain > 1 hour
7. Age > 40 years
8. CP not reproduced by palpation

*Goldman L, Cook EF, Brand DA, et al. A computer protocol to predict myocardial infarction in emergency department patients with chest pain. N Engl J Med. 1988;318(13):797-803.

Page 27

[Figure: Goldman recursive-partitioning tree. The first split is on ST elevation; further splits use CP < 48 hrs, Hx of ACI, ST changes, and CP > 1 hr. The probabilities shown at the nodes range from 0% to 80%.]

8 dichotomous tests: 2^8 = 256 combinations

Page 28
Page 29

Recursive Partitioning

• Does not deal well with continuous test results

Page 30

Logistic Regression

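$$\ln(\mathrm{Odds}(D+)) = a + b_{\text{E-ECG}}\,\text{E-ECG} + b_{\text{Nuclide}}\,\text{Nuclide} + b_{\text{interact}}\,(\text{E-ECG})(\text{Nuclide})$$

where each test result is coded “+” = 1 and “-” = 0.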

More on this later in ATCR!
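For two dichotomous tests, the model above can be fitted without a statistics package. This sketch uses a closed-form shortcut rather than a general fitting routine: with an intercept, two main effects, and an interaction there are four parameters for the four result combinations, so maximum likelihood reproduces each combination's observed log odds exactly. Because the study sampled 500 CAD+ and 500 CAD- patients, those in-sample odds equal the LRs in the combination table. Names are illustrative.

```python
import math

# Joint counts keyed by (E-ECG, Nuclide), coded 1 = "+", 0 = "-": (CAD+ count, CAD- count).
COUNTS = {(1, 1): (276, 26), (1, 0): (23, 18), (0, 1): (140, 164), (0, 0): (61, 292)}


def log_odds(eecg, nuclide):
    """Observed ln(odds of CAD) among study patients with this pair of results."""
    cad_pos, cad_neg = COUNTS[(eecg, nuclide)]
    return math.log(cad_pos / cad_neg)


# Saturated model: solve for the coefficients cell by cell.
a = log_odds(0, 0)
b_eecg = log_odds(1, 0) - a
b_nuclide = log_odds(0, 1) - a
b_interact = log_odds(1, 1) - a - b_eecg - b_nuclide

print(f"a = {a:.2f}, b_E-ECG = {b_eecg:.2f}, b_Nuclide = {b_nuclide:.2f}, "
      f"b_interact = {b_interact:.2f}")
# Both tests positive: exp(a + b_E-ECG + b_Nuclide + b_interact) = 276/26.
print(f"odds (= LR here) for E-ECG +/Nuclide +: "
      f"{math.exp(a + b_eecg + b_nuclide + b_interact):.2f}")
```

If the tests were independent within the CAD+ group and within the CAD- group, b_interact would be zero. Here it comes out to about 0.71; judging whether that differs from zero beyond chance would need a standard error, which this sketch does not compute.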

Page 31

Logistic Regression Approach to the “R/O ACI patient”*

*Selker HP, Griffith JL, D'Agostino RB. A tool for judging coronary care unit admission appropriateness, valid for both real-time and retrospective use. A time-insensitive predictive instrument (TIPI) for acute cardiac ischemia: a multicenter study. Med Care. Jul 1991;29(7):610-627. For corrected coefficients, see http://medg.lcs.mit.edu/cardiac/cpain.htm

Coefficient MV Odds Ratio

Constant -3.93  

Presence of chest pain 1.23 3.42

Pain major symptom 0.88 2.41

Male Sex 0.71 2.03

Age 40 or less -1.44 0.24

Age > 50 0.67 1.95

Male over 50 years** -0.43 0.65

ST elevation 1.314 3.72

New Q waves 0.62 1.86

ST depression 0.99 2.69

T waves elevated 1.095 2.99

T waves inverted 1.13 3.10

T wave + ST changes** -0.314 0.73

Page 32

Clinical Scenario*

71 y/o man with 2.5 hours of CP, substernal, non-radiating, described as “bloating.” Cannot say if same as prior MI or worse than prior angina.

Hx of CAD, s/p CABG 10 yrs prior, stenting 3 years and 1 year ago. DM on Avandia.

ECG: RBBB, Qs inferiorly. No ischemic ST-T changes.

*Real patient seen by MAK 1 am 10/12/04

Page 33
Page 34

Variable Coefficient Clinical Scenario (1 = yes, 0 = no) Result (Coefficient x Clinical Scenario)

Constant -3.93 -3.93

Presence of chest pain 1.23 1 1.23

Pain major symptom 0.88 1 0.88

Male Sex 0.71 1 0.71

Age 40 or less -1.44 0 0

Age > 50 0.67 1 0.67

Male over 50 years -0.43 1 -0.43

ST elevation 1.314 0 0

New Q waves 0.62 0 0

ST depression 0.99 0 0

T waves elevated 1.095 0 0

T waves inverted 1.13 0 0

T wave + ST changes -0.314 0 0

Sum (log odds of ACI) -0.87

Odds of ACI = e^(-0.87) = 0.419

Probability of ACI = 0.419/(1 + 0.419) = 30%
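The arithmetic in the table, as a Python sketch. The snake_case variable names are our own labels for the rows, and the coefficients are the ones printed in the table; the citation notes that corrected coefficients exist, so this only illustrates the calculation.

```python
import math

# Coefficients as printed in the table (labels are hypothetical snake_case names).
COEFFICIENTS = {
    "chest_pain": 1.23,
    "pain_major_symptom": 0.88,
    "male": 0.71,
    "age_40_or_less": -1.44,
    "age_over_50": 0.67,
    "male_over_50": -0.43,
    "st_elevation": 1.314,
    "new_q_waves": 0.62,
    "st_depression": 0.99,
    "t_waves_elevated": 1.095,
    "t_waves_inverted": 1.13,
    "t_wave_and_st_changes": -0.314,
}
CONSTANT = -3.93


def probability_of_aci(findings):
    """ln(odds) = constant + sum(coefficient x finding), findings coded 1 (yes) / 0 (no)."""
    unknown = set(findings) - set(COEFFICIENTS)
    if unknown:
        raise KeyError(f"unknown findings: {sorted(unknown)}")
    log_odds = CONSTANT + sum(coef * findings.get(name, 0)
                              for name, coef in COEFFICIENTS.items())
    odds = math.exp(log_odds)
    return log_odds, odds, odds / (1 + odds)


# The 71-year-old man in the clinical scenario; findings not listed count as 0 (absent).
patient = {"chest_pain": 1, "pain_major_symptom": 1, "male": 1,
           "age_over_50": 1, "male_over_50": 1}
log_odds, odds, prob = probability_of_aci(patient)
print(f"ln(odds) = {log_odds:.2f}, odds = {odds:.3f}, probability = {prob:.0%}")
```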

Page 35

What Happened to Pre-test Probability?

Typically clinical decision rules report probabilities rather than likelihood ratios for combinations of results.

Can “back out” LRs if we know prevalence, p[D+], in the study dataset.

With logistic regression models, this “backing out” is known as a “prevalence offset.” (See Chapter 5A.)
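A minimal sketch of the odds arithmetic behind “backing out” an LR, assuming the rule reports a probability for patients like those in its derivation data and that the study prevalence is known. The Chapter 5A treatment of the prevalence offset for logistic models may differ in detail; the function names are hypothetical.

```python
def prob_to_odds(p):
    return p / (1 - p)


def odds_to_prob(odds):
    return odds / (1 + odds)


def backed_out_lr(prob_in_study, study_prevalence):
    """LR implied by a rule's probability, given the prevalence in the derivation data.

    Bayes in odds form: odds reported by the rule = (odds of D+ in the study) x LR.
    """
    return prob_to_odds(prob_in_study) / prob_to_odds(study_prevalence)


def reapply(prob_in_study, study_prevalence, new_pre_test_prob):
    """Replace the study's prevalence with a different pre-test probability."""
    lr = backed_out_lr(prob_in_study, study_prevalence)
    return odds_to_prob(prob_to_odds(new_pre_test_prob) * lr)


# E-ECG +/Nuclide + in the CAD data: 276 of 302 such patients had CAD (study prevalence 50%).
p_study = 276 / (276 + 26)
print(round(backed_out_lr(p_study, 0.5), 2))    # 10.62, the LR in the table
print(round(reapply(p_study, 0.5, 0.33), 2))    # 0.84 at a 33% pre-test probability
```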

Page 36

Need for Validation

Develop prediction rule by choosing a few tests and findings from a large number of possibilities.

Takes advantage of chance variations in the data.

Predictive ability of rule will probably disappear when you try to validate on a new dataset.

Can be referred to as “overfitting.”
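A small simulation makes the point. It is a sketch under made-up assumptions: disease status and 50 candidate yes/no findings are independent coin flips, so no finding has any real predictive value. Picking the finding that agrees best with disease status in a derivation set of 100 patients makes it look useful; measuring the same finding in a fresh validation set of 100 patients brings its accuracy back to about 50%.

```python
import random


def make_dataset(rng, n_patients, n_candidates):
    """Disease status and candidate findings, all independent fair coin flips."""
    disease = [rng.random() < 0.5 for _ in range(n_patients)]
    findings = [[rng.random() < 0.5 for _ in range(n_patients)]
                for _ in range(n_candidates)]
    return disease, findings


def accuracy(finding, disease):
    """Fraction of patients in whom the finding's presence matches disease status."""
    return sum(f == d for f, d in zip(finding, disease)) / len(disease)


def simulate(n_patients=100, n_candidates=50, n_reps=100, seed=1):
    """Average accuracy of the 'best' finding in derivation vs. validation data."""
    rng = random.Random(seed)
    derivation_total = validation_total = 0.0
    for _ in range(n_reps):
        d_disease, d_findings = make_dataset(rng, n_patients, n_candidates)
        v_disease, v_findings = make_dataset(rng, n_patients, n_candidates)
        # Data snooping: choose the candidate that looks best in the derivation set.
        scores = [accuracy(f, d_disease) for f in d_findings]
        best = max(range(n_candidates), key=scores.__getitem__)
        derivation_total += scores[best]
        # Validation: the same finding (same index) measured in new patients.
        validation_total += accuracy(v_findings[best], v_disease)
    return derivation_total / n_reps, validation_total / n_reps


derivation_acc, validation_acc = simulate()
print(f"derivation accuracy {derivation_acc:.2f}, validation accuracy {validation_acc:.2f}")
```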

Page 37

Need for Validation: Example*

Study of clinical predictors of bacterial diarrhea. Evaluated 34 historical items and 16 physical examination items.

3 questions (abrupt onset, > 4 stools/day, and absence of vomiting) best predicted a positive stool culture (sensitivity 86%; specificity 60% for all 3).

Would these 3 be the best predictors in a new dataset? Would they have the same sensitivity and specificity?

*DeWitt TG, Humphrey KF, McCarthy P. Clinical predictors of acute bacterial diarrhea in young children. Pediatrics. Oct 1985;76(4):551-556.

Page 38

VALIDATION

No matter what technique (CART or logistic regression) is used, the “rule” for combining multiple test results must be tested on a data set different from the one used to derive it.

Beware of “validation sets” that are just re-hashes of the “derivation set”.

(This begins our discussion of potential problems with studies of diagnostic tests.)

Page 39

Studies of Diagnostic Test Accuracy

Sackett, EBM, pg 68

1. Was there an independent, blind comparison with a reference (“gold”) standard of diagnosis?

2. Was the diagnostic test evaluated in an appropriate spectrum of patients (like those in whom we would use it in practice)?

3. Was the reference standard applied regardless of the diagnostic test result?

4. Was the test (or cluster of tests) validated in a second, independent group of patients?

Page 40

Studies of Diagnostic Tests

Overfitting Bias (“Data Snooping”)

Usually a problem for multi-test rules that use a few predictors chosen from a wide array of candidates.

But, in studies of single tests, beware of “data-snooped” cutoffs:

“A procalcitonin concentration of 3.9088 ng/ml is the best cutoff for predicting ventilator-associated pneumonia.”

“A CSF WBC:RBC ratio < 1:117 is a sensitive and specific predictor of ‘real’ meningitis vs. a traumatic puncture”

“A birth weight cutoff of 1625 grams accurately identifies newborns at high risk for neonatal morbidity and mortality.”

Page 41

Studies of Diagnostic Tests

Overfitting Bias

Problems with “Data-Snooped” Cutoffs

-- Dependent on the derivation set; they require independent validation

-- Fixed cutoffs assume a common prevalence or pre-test probability of disease (Recall our discussion in Chapter 4 about the undesirability of a fixed cutoff for a multi-level or continuous test)
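A sketch of how a “best” cutoff gets snooped, under made-up assumptions: a hypothetical marker with D+ values drawn from Normal(2, 1) and D- values from Normal(0, 1), and “best” defined as the observed value that maximizes Youden's J (sensitivity + specificity - 1), one common criterion chosen here for illustration. Two samples from the same populations give different, precise-looking cutoffs, and the derivation cutoff's J typically shrinks on new data. Maximizing J also weights sensitivity and specificity equally and ignores the pre-test probability, which is the fixed-cutoff problem in the second point above.

```python
import random


def youden_j(cutoff, d_pos, d_neg):
    """Sensitivity + specificity - 1 when 'value >= cutoff' counts as a positive test."""
    sens = sum(x >= cutoff for x in d_pos) / len(d_pos)
    spec = sum(x < cutoff for x in d_neg) / len(d_neg)
    return sens + spec - 1


def best_cutoff(d_pos, d_neg):
    """Data-snooped cutoff: the observed value that maximizes Youden's J in this sample."""
    candidates = sorted(set(d_pos) | set(d_neg))
    return max(candidates, key=lambda c: youden_j(c, d_pos, d_neg))


def simulated_sample(rng, n=50):
    """Hypothetical marker: D+ values ~ Normal(2, 1), D- values ~ Normal(0, 1)."""
    return ([rng.gauss(2.0, 1.0) for _ in range(n)],
            [rng.gauss(0.0, 1.0) for _ in range(n)])


rng = random.Random(42)
derivation = simulated_sample(rng)
validation = simulated_sample(rng)  # same underlying populations, new patients

c_derivation = best_cutoff(*derivation)
c_validation = best_cutoff(*validation)
print(f"'best' cutoff in derivation set: {c_derivation:.4f}")
print(f"'best' cutoff in validation set: {c_validation:.4f}")
print(f"J of derivation cutoff: {youden_j(c_derivation, *derivation):.2f} (derivation) "
      f"vs {youden_j(c_derivation, *validation):.2f} (validation)")
```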

Page 42

Studies of Diagnostic Tests

Sackett, EBM, pg 68

1. Was there an independent, blind comparison with a reference (“gold”) standard of diagnosis?

2. Was the diagnostic test evaluated in an appropriate spectrum of patients (like those in whom we would use it in practice)?

3. Was the reference standard applied regardless of the diagnostic test result?

4. Was the test (or cluster of tests) validated in a second, independent group of patients?

Page 43

Studies of Diagnostic Tests

Incorporation Bias

Consider a study of the usefulness of various findings for diagnosing pancreatitis. If the "Gold Standard" is a discharge diagnosis of pancreatitis, which in many cases will be based upon the serum amylase, then the study can't quantify the accuracy of the amylase for this diagnosis.

Page 44

Studies of Diagnostic Tests

Incorporation Bias

A study* of BNP in dyspnea patients as a diagnostic test for CHF also showed that the CXR performed extremely well in predicting CHF.

*Maisel AS, Krishnaswamy P, Nowak RM, McCord J, Hollander JE, Duc P, et al. Rapid measurement of B-type natriuretic peptide in the emergency diagnosis of heart failure. N Engl J Med 2002;347(3):161-7.

The two cardiologists who determined the final diagnosis of CHF were blinded to the BNP level but not to the CXR report, so the assessment of BNP should be unbiased, but the assessment of the CXR is not.

Page 45

Studies of Diagnostic Tests

Sackett, EBM, pg 68

1. Was there an independent, blind comparison with a reference (“gold”) standard of diagnosis?

2. Was the diagnostic test evaluated in an appropriate spectrum of patients (like those in whom we would use it in practice)?

3. Was the reference standard applied regardless of the diagnostic test result?

4. Was the test (or cluster of tests) validated in a second, independent group of patients?

Page 46

Studies of Diagnostic Tests

Referral Bias

The study population only includes those to whom the gold standard was applied, but patients with positive index tests are more likely to be referred for the gold standard.

Example: Swelling as a test for ankle fracture. The gold standard is a positive X-ray. Patients with swelling are more likely to be referred for X-ray. Only patients who had X-rays are included in the study.

Page 47

Studies of Diagnostic Tests

Referral Bias

Fracture No Fracture

Swelling a b

No Swelling c d

Sensitivity (a/(a+c)) is biased UP.

Specificity (d/(b+d)) is biased DOWN.
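A worked example of the direction of the bias, as a Python sketch with entirely hypothetical numbers: a cohort of 1,000 ankle injuries in which 90% of swollen ankles and 30% of non-swollen ankles are X-rayed, and referral depends only on swelling. Restricting the table to X-rayed patients raises sensitivity from 0.80 to about 0.92 and lowers specificity from about 0.67 to 0.40.

```python
def verified_sens_spec(a, b, c, d, referral_if_pos, referral_if_neg):
    """Sensitivity and specificity among only the patients referred for the gold standard.

    a, b, c, d: full-cohort counts (a = swelling & fracture, b = swelling & no fracture,
    c = no swelling & fracture, d = no swelling & no fracture).
    Assumes referral depends only on the index test result.
    """
    a_seen, b_seen = a * referral_if_pos, b * referral_if_pos
    c_seen, d_seen = c * referral_if_neg, d * referral_if_neg
    return a_seen / (a_seen + c_seen), d_seen / (b_seen + d_seen)


# Hypothetical cohort: 100 fractures (80 with swelling), 900 without (300 with swelling).
a, b, c, d = 80, 300, 20, 600
print(f"full cohort:  sens {a / (a + c):.2f}, spec {d / (b + d):.2f}")
sens, spec = verified_sens_spec(a, b, c, d, referral_if_pos=0.9, referral_if_neg=0.3)
print(f"X-rayed only: sens {sens:.2f}, spec {spec:.2f}")
```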

Page 48

Studies of Diagnostic Tests

Referral Bias Example*

Test: A-a O2 gradient

Disease: PE

Gold Standard: VQ scan or pulmonary angiogram

Study Population: Patients who had VQ scan or PA-gram

Results: A-a O2 gradient > 20 mm Hg had very high sensitivity (almost every patient with PE by VQ scan or PA-gram had a gradient > 20 mm Hg), but a very low specificity (lots of patients with negative PA-grams had gradients > 20 mm Hg).

*McFarlane MJ, Imperiale TF. Use of the alveolar-arterial oxygen gradient in the diagnosis of pulmonary embolism. Am J Med. 1994;96(1):57-62.

Page 49

Studies of Diagnostic Tests

Referral Bias

VQ Scan + VQ Scan -

A-a O2 > 20 mm Hg a b

A-a O2 < 20 mm Hg c d

Sensitivity (a/(a+c)) is biased UP.*

Specificity (d/(b+d)) is biased DOWN.

*The study still concluded that the test was not sensitive enough; since referral bias inflates sensitivity, it probably isn’t.

Page 50

Studies of Diagnostic Tests

Double Gold Standard

One gold standard (e.g., biopsy) is applied in patients with a positive index test; another gold standard (e.g., clinical follow-up) is applied in patients with a negative index test.

Page 51

Studies of Diagnostic Tests

Double Gold Standard

Test: A-a O2 gradient

Disease: PE

Gold Standard: VQ scan or pulmonary angiogram in patients who had one, clinical follow-up in patients who didn’t

Study Population: All patients presenting to the ED with dyspnea.

Some patients did not get VQ scan or PA-gram because of normal A-a O2 gradients but would have had positive studies. Instead they had negative clinical follow-up and were counted as true negatives.

Page 52

Studies of Diagnostic Tests

Double Gold Standard

PE No PE

A-a O2 > 20 a b

A-a O2 < 20 c d

Sensitivity (a/(a+c)) biased UP

Specificity (d/(b+d)) biased UP
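Why both go up, as a Python sketch with hypothetical numbers: 100 patients with PE (90 with a gradient > 20) and 400 without (200 with a gradient > 20). Patients with a high gradient get the definitive test, so they are classified correctly; in patients with a normal gradient, clinical follow-up is assumed to recognize only 30% of the PEs, and the rest are counted as true negatives. Sensitivity rises from 0.90 to about 0.97 and specificity from 0.50 to about 0.51.

```python
def observed_sens_spec(a, b, c, d, follow_up_detects):
    """Observed sensitivity/specificity when test-negatives get follow-up instead of imaging.

    a, b, c, d: true counts (a = test+ D+, b = test+ D-, c = test- D+, d = test- D-).
    Test-positives get the definitive gold standard, so a and b are classified correctly.
    follow_up_detects: hypothetical fraction of diseased test-negatives that follow-up
    identifies; the rest are counted as true negatives.
    """
    c_obs = c * follow_up_detects
    d_obs = d + c * (1 - follow_up_detects)
    return a / (a + c_obs), d_obs / (b + d_obs)


a, b, c, d = 90, 200, 10, 200
sens, spec = observed_sens_spec(a, b, c, d, follow_up_detects=0.3)
print(f"true:     sens {a / (a + c):.2f}, spec {d / (b + d):.2f}")
print(f"observed: sens {sens:.2f}, spec {spec:.2f}")
```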

Page 53

Studies of Diagnostic Tests

Sackett, EBM, pg 68

1. Was there an independent, blind comparison with a reference (“gold”) standard of diagnosis?

2. Was the diagnostic test evaluated in an appropriate spectrum of patients (like those in whom we would use it in practice)?

3. Was the reference standard applied regardless of the diagnostic test result?

4. Was the test (or cluster of tests) validated in a second, independent group of patients?

Page 54

Studies of Diagnostic Tests

Spectrum Bias

So far, we have said that PPV and NPV of a test depend on the population being tested, specifically on the prevalence of D+ in the population.

We said that sensitivity and specificity are properties of the test and independent of the prevalence and, by implication at least, the population being tested.

In fact, …

Page 55

Studies of Diagnostic Tests

Spectrum Bias

Sensitivity depends on the spectrum of disease in the population being tested.

Specificity depends on the spectrum of non-disease in the population being tested.

Page 56

Studies of Diagnostic Tests

Spectrum Bias

D+ and D- groups are not homogeneous.

D-/D+ really is D-, D+, D++, or D+++

D-/D+ really is (D1-, D2-, or D3-)/D+

Page 57

Studies of Diagnostic Tests

Spectrum Bias

Example: Pale Conjunctiva as Test for Iron Deficiency Anemia

Assume that conjunctival paleness always occurs at HCT < 25

Page 58

Pale Conjunctiva as a Test for Iron Deficiency

[Figure: probability distributions of hematocrit (x-axis, 0 to 45; y-axis, probability) for African iron-deficient children and US iron-deficient children, with regions labeled Pale Conjunctiva and Normal Conjunctiva.]

Page 59

Pale Conjunctiva as a Test for Iron Deficiency

Sensitivity is HIGHER in the population with more severe disease

Page 60

Pale Conjunctiva as a Test for Iron Deficiency

[Figure: probability distributions of hematocrit (x-axis, 10 to 60; y-axis, probability) for African children WITHOUT iron deficiency and US children WITHOUT iron deficiency, with regions labeled Pale Conjunctiva and Normal Conjunctiva.]

Page 61

Pale Conjunctiva as a Test for Iron Deficiency

Specificity is LOWER in the population with more severe non-disease. (Patients without the disease in question are more likely to have other diseases that can be confused with the disease in question.)
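A sketch of both effects with `statistics.NormalDist`, using the slides' assumption that paleness occurs exactly when HCT < 25. The hematocrit distributions (means and SDs) are invented for illustration and are not read off the figures: shifting the iron-deficient distribution toward lower hematocrits raises sensitivity, and a non-iron-deficient group with more anemia from other causes has lower specificity.

```python
from statistics import NormalDist

PALE_BELOW = 25  # slides' assumption: conjunctival paleness always occurs at HCT < 25


def fraction_pale(hct):
    """Fraction of children whose hematocrit lies below the paleness threshold."""
    return hct.cdf(PALE_BELOW)


# Hypothetical hematocrit distributions NormalDist(mean, SD), invented for illustration.
iron_deficient = {
    "more severe disease": NormalDist(22, 4),
    "milder disease": NormalDist(30, 4),
}
not_iron_deficient = {
    "other anemias common": NormalDist(32, 5),
    "mostly healthy": NormalDist(38, 4),
}

for label, hct in iron_deficient.items():
    print(f"sensitivity ({label}): {fraction_pale(hct):.2f}")
for label, hct in not_iron_deficient.items():
    print(f"specificity ({label}): {1 - fraction_pale(hct):.2f}")
```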

Page 62

Biases in Studies of Tests

• Overfitting Bias – “Data-snooped” cutoffs take advantage of chance variations in the derivation set, making the test look falsely good.

• Incorporation Bias – index test is part of the gold standard (Sensitivity Up, Specificity Up)

• Referral Bias – positive index test increases referral to the gold standard (Sensitivity Up, Specificity Down)

• Double Gold Standard – positive index test causes application of the definitive gold standard; negative index test results in clinical follow-up (Sensitivity Up, Specificity Up)

• Spectrum Bias
– D+ sickest of the sick (Sensitivity Up)
– D- wellest of the well (Specificity Up)

Page 63

Biases in Studies of Tests

Don’t just identify potential biases; figure out how the biases could affect the conclusions.

Studies concluding a test is worthless are not invalid if biases in the design would have led to the test looking BETTER than it really is.