evidence based diagnosis mark j. pletcher, md mph 6/28/2012 combining tests

Evidence Based Diagnosis

Mark J. Pletcher, MD MPH

6/28/2012

Combining Tests

Acknowledgements

For this lecture I’ve adapted a slide set from Mike Kohn

Combining Tests

Overview
• A case with 2 simple tests
• Test non-independence
• Approaches to combining tests
   • Looking at all possible combinations of results
   • Recursive partitioning, logistic regression, other
• Overfitting and validation in multitest panels

Combining Tests – A Case

Pregnant woman getting prenatal care, worried about Down’s Syndrome (Trisomy 21)

Chorionic Villus Sampling (CVS) is a definitive test, but there is a risk of miscarriage

Should she get this procedure?

Combining Tests – A Case

Age helps…

Risk goes up with age

Our patient is 41, so pretest risk is ~2%...

Ultrasound can help even more

• It’s harmless
• Several features predict Trisomy 21 (Down’s) at 11-14 weeks*
   • Nuchal translucency
   • Nasal bone absence


*Cicero, S., G. Rembouskos, et al. (2004). "Likelihood ratio for trisomy 21 in fetuses with absent nasal bone at the 11-14-week scan." Ultrasound Obstet Gynecol 23(3): 218-23.

How do we use these two features together?


First, nuchal translucency (NT)

Wider translucent “gap” here is predictive of Down’s

Nuchal Translucency Data

Cross-sectional study

5556 Pregnant Women undergoing CVS

333 (6%) with Trisomy 21 fetus

All had ultrasound at 11-14 weeks

[Figure: ROC curve for nuchal translucency cutoffs. x-axis: 1 - Specificity; y-axis: Sensitivity]

Cutoff         1 - Specificity   Sensitivity
> 95th perc.   37.9%             88.6%
> 3.5 mm       9.2%              63.7%
> 4.5 mm       3.5%              43.5%
> 5.5 mm       1.9%              31.2%

Dichotomize at 3.5 mm for purposes of illustration


Nuchal Translucency   Trisomy 21 D+   Trisomy 21 D-   Total
≥ 3.5 mm (+)          212             478             690
< 3.5 mm (-)          121             4745            4866
Total                 333             5223

Sensitivity and Specificity?

PPV and NPV?

Nuchal Translucency

• Sensitivity = 212/333 = 64%
• Specificity = 4745/5223 = 91%

and IF we assume that this cross-sectional sample represents our population of interest, then:

• Prevalence = 333/(333+5223) = 6%
• PPV = 212/(212 + 478) = 31%
• NPV = 4745/(121 + 4745) = 97.5%
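The arithmetic on this slide can be reproduced with a few lines of Python. This is only a sketch: the function name `two_by_two_metrics` and its argument names are illustrative choices, not part of the lecture, and the PPV/NPV it returns carry the same caveat as above (they assume the study sample represents the population of interest).

```python
def two_by_two_metrics(tp, fp, fn, tn):
    """Return basic accuracy measures from the four cells of a 2x2 table."""
    diseased = tp + fn      # D+ column total
    healthy = fp + tn       # D- column total
    return {
        "sensitivity": tp / diseased,
        "specificity": tn / healthy,
        "prevalence": diseased / (diseased + healthy),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Nuchal translucency dichotomized at 3.5 mm (counts from the table above)
nt = two_by_two_metrics(tp=212, fp=478, fn=121, tn=4745)
for name, value in nt.items():
    print(f"{name}: {value:.1%}")
```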


Nuchal Translucency   Trisomy 21 D+   Trisomy 21 D-   Total
≥ 3.5 mm              212             478             690
< 3.5 mm              121             4745            4866
Total                 333             5223

LR+ and LR-?


Nuchal Translucency   Trisomy 21 D+   Trisomy 21 D-   LR
≥ 3.5 mm              212             478
< 3.5 mm              121             4745
Total                 333             5223

LR+ = P(T+|D+)/P(T+|D-)

LR- = P(T-|D+)/P(T-|D-)


Nuchal Translucency   Trisomy 21 D+   Trisomy 21 D-   LR
≥ 3.5 mm              212             478             7.0
< 3.5 mm              121             4745            0.4
Total                 333             5223

LR+ = (212/333)/(478/5223) = 7.0

LR- = (121/333)/(4745/5223) = 0.4
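The same likelihood ratios can be checked in code. A minimal sketch, assuming the 2x2 counts above; the helper name `likelihood_ratios` is illustrative only.

```python
def likelihood_ratios(tp, fp, fn, tn):
    """LR+ = P(T+|D+)/P(T+|D-); LR- = P(T-|D+)/P(T-|D-)."""
    sens = tp / (tp + fn)
    spec = tn / (fp + tn)
    lr_pos = sens / (1 - spec)
    lr_neg = (1 - sens) / spec
    return lr_pos, lr_neg

# Nuchal translucency at the 3.5 mm cutoff
lr_pos, lr_neg = likelihood_ratios(tp=212, fp=478, fn=121, tn=4745)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.1f}")
```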

Back to the case…

Let’s apply this data to our case, with pre-test probability of 2%

Post-test risk using NT only

Pre-test prob: 0.02 at age 41

Pre-test odds: 0.02/0.98 = 0.0204

IF TEST IS POSITIVE - LR = 7.0

Post-Test Odds = Pre-Test Odds x LR(+)

= 0.0204 x 7.0 = 0.143

Post-Test prob = 0.143/(0.143 + 1) = 12.5%

Post-test risk using NT only

Pre-test prob: 0.02 at age 41

Pre-test odds: 0.02/0.98 = 0.0204

IF TEST IS NEGATIVE - LR = 0.4

Post-Test Odds = Pre-Test Odds x LR(-)

= 0.0204 x 0.4 = 0.0082

Post-Test prob = 0.0082/(0.0082 + 1) = 0.8%
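The probability-to-odds-to-probability steps used in these two calculations can be wrapped in one small function. This is a sketch; `post_test_probability` is a name chosen here for illustration.

```python
def post_test_probability(pre_test_prob, lr):
    """Convert probability to odds, multiply by the LR, convert back."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

p_pos = post_test_probability(0.02, 7.0)   # NT positive
p_neg = post_test_probability(0.02, 0.4)   # NT negative
print(f"NT positive: {p_pos:.1%}, NT negative: {p_neg:.1%}")
```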

Back to the case…

Is .8% risk low enough to not get CVS?

Is 12.5% risk high enough to risk CVS?

OTHER ultrasound features are also predictive

Nasal bone absence

Nasal Bone Seen: NBA = “No” (Neg for Trisomy 21)

Nasal Bone Absent: NBA = “Yes” (Pos for Trisomy 21)

Nasal Bone Absence Test Data

Nasal Bone Absent   Tri21+   Tri21-   LR
Yes                 229      129      27.8
No                  104      5094     0.32
Total               333      5223

Post-test risk using NBA only

Pre-test prob: 0.02 at age 41

Pre-test odds: 0.02/0.98 = 0.0204

IF TEST IS POSITIVE - LR = 27.8

Post-Test Odds = Pre-Test Odds x LR(+)

= 0.0204 x 27.8 = .567

Post-Test prob = .567/(.567 + 1) = 36%

Post-test risk using NBA only

Pre-test prob: 0.02 at age 41

Pre-test odds: 0.02/0.98 = 0.0204

IF TEST IS NEGATIVE - LR = 0.32

Post-Test Odds = Pre-Test Odds x LR(-)

= 0.0204 x 0.32 = 0.0065

Post-Test prob = 0.0065/(0.0065 + 1) = .6%

Back to the case…

NBA is a bit better than NT, but still important uncertainty…

Can we combine our NT results with NBA results and do even better?

How do we combine test results?

Combining tests

Approach #1 – Assume independence
• Knowing results of one test doesn’t influence how you interpret the next test
• We usually assume LR is independent of pre-test probability
   • This is what we did when we used a pre-test risk of 2% instead of 6% in our calculations
• If so, we can just do the calculations sequentially

Assuming test independence

First do NT, assume it’s positive (LR = 7)

Pre-test risk Post-test risk

2% 12.5%

Then do NBA, assume it’s also positive (LR = 27.8)

Pre-test risk Post-test risk

12.5% 80%


What’s the mathematical shortcut?

LR(1) * LR(2) = LR(1&2)

7 * 27.8 = 195


What’s the mathematical shortcut?

LR(1) * LR(2) = LR(1&2)

NT    NBA   LR
+     +     195
+     -     2.2
-     +     11.2
-     -     0.13
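Under the independence assumption, updating sequentially and multiplying the LRs first give the same answer. The sketch below illustrates this with the rounded LRs from the earlier slides (7.0 and 0.4 for NT, 27.8 and 0.32 for NBA); all variable names are illustrative, and the printed products may differ in the last digit from the table depending on rounding.

```python
from itertools import product

def post_test_probability(pre_test_prob, lr):
    """Convert probability to odds, multiply by the LR, convert back."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

nt_lr = {"+": 7.0, "-": 0.4}
nba_lr = {"+": 27.8, "-": 0.32}

# LR(1&2) = LR(1) * LR(2), valid ONLY if the tests are independent
combined_lr = {
    (nt, nba): nt_lr[nt] * nba_lr[nba]
    for nt, nba in product("+-", repeat=2)
}
for (nt, nba), lr in combined_lr.items():
    print(f"NT {nt}, NBA {nba}: combined LR = {lr:.3g}")

# Two sequential updates versus one update with the product
sequential = post_test_probability(post_test_probability(0.02, nt_lr["+"]), nba_lr["+"])
one_step = post_test_probability(0.02, combined_lr[("+", "+")])
print(f"sequential = {sequential:.1%}, one step = {one_step:.1%}")
```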


Slide rule approach (pre-test prob = 6%)

Line arrows up without shrinkage

Combining tests

Is it reasonable to assume independence?

Does nasal bone absence tell you as much if you already know that the nuchal translucency is >3.5 mm?

What can we do to figure this out?

Combining tests

Approach #2 – evaluate all possible test result combinations

Joint eval of 4 test result combinations

NT    NBA   Trisomy 21+   %      Trisomy 21-   %      LR     LR if tests were independent
Pos   Pos   158           47%    36            0.7%   69     195
Pos   Neg   54            16%    442           8.5%   1.9    2.2
Neg   Pos   71            21%    93            1.8%   12     11.2
Neg   Neg   50            15%    4652          89%    0.2    0.13
Totals      333           100%   5223          100%
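The joint LRs in this table come straight from the cell counts: each is the share of Trisomy 21+ fetuses with that result combination divided by the share of Trisomy 21- fetuses with it. A short sketch, with illustrative variable names:

```python
# (NT result, NBA result) -> (Trisomy 21+ count, Trisomy 21- count)
joint = {
    ("Pos", "Pos"): (158, 36),
    ("Pos", "Neg"): (54, 442),
    ("Neg", "Pos"): (71, 93),
    ("Neg", "Neg"): (50, 4652),
}

total_pos = sum(d_pos for d_pos, _ in joint.values())
total_neg = sum(d_neg for _, d_neg in joint.values())

# LR for a combination = P(result | D+) / P(result | D-)
joint_lr = {
    combo: (d_pos / total_pos) / (d_neg / total_neg)
    for combo, (d_pos, d_neg) in joint.items()
}
for combo, lr in joint_lr.items():
    print(combo, f"LR = {lr:.2g}")
```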

Combining tests

The Answer – the tests are NOT completely independent

So we CANNOT just multiply LR’s

What should we do in this case?

Use LR’s from the combination table

Joint eval of 4 test result combinations

NT    NBA   Trisomy 21+   %      Trisomy 21-   %      LR
Pos   Pos   158           47%    36            0.7%   69
Pos   Neg   54            16%    442           8.5%   1.9
Neg   Pos   71            21%    93            1.8%   12
Neg   Neg   50            15%    4652          89%    0.2
Totals      333           100%   5223          100%

Use these LRs!

Create ROC Table

NT    NBA   Tri21+   Sens    Tri21-   1 - Spec   LR
                     0%               0%
Pos   Pos   158      47%     36       0.7%       69
Neg   Pos   71       69%     93       2.5%       12
Pos   Neg   54       85%     442      11%        1.9
Neg   Neg   50       100%    4652     100%       0.2
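One way to build such a table is to sort the result combinations from most to least abnormal (highest LR first) and accumulate the counts. This sketch assumes that ordering rule; the names `rows`, `ordered` and `roc_points` are illustrative.

```python
# (NT, NBA, Trisomy 21+ count, Trisomy 21- count)
rows = [
    ("Pos", "Pos", 158, 36),
    ("Pos", "Neg", 54, 442),
    ("Neg", "Pos", 71, 93),
    ("Neg", "Neg", 50, 4652),
]
total_pos = sum(r[2] for r in rows)
total_neg = sum(r[3] for r in rows)

def lr(row):
    """Likelihood ratio of one result combination."""
    return (row[2] / total_pos) / (row[3] / total_neg)

# Highest LR first, so each step down the table loosens the cutoff
ordered = sorted(rows, key=lr, reverse=True)

roc_points = [(0.0, 0.0)]          # (1 - specificity, sensitivity)
cum_pos = cum_neg = 0
for nt, nba, d_pos, d_neg in ordered:
    cum_pos += d_pos
    cum_neg += d_neg
    roc_points.append((cum_neg / total_neg, cum_pos / total_pos))

for fpr, sens in roc_points:
    print(f"1 - spec = {fpr:.1%}, sens = {sens:.1%}")
```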

[Figure: ROC curve for the four NT/NBA result combinations. Axes: Sensitivity and 1 - Specificity]

AUROC = 0.896

Optimal Cutoff Analysis

NT    NBA   LR    Post-Test Prob
Pos   Pos   69    81%
Neg   Pos   12    43%
Pos   Neg   1.9   11%
Neg   Neg   0.2   1%

If we assume:

• Pre-test probability = 6%

• Threshold for CVS = 2%

Optimal algorithm is “any positive test → CVS”
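The cutoff analysis amounts to comparing each post-test probability with the treatment threshold. A sketch using the rounded joint LRs from the table and the slide's assumed 6% pre-test probability and 2% threshold; function and variable names are illustrative.

```python
def post_test_probability(pre_test_prob, lr):
    """Convert probability to odds, multiply by the LR, convert back."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

pre_test_prob = 0.06   # assumed on the slide
threshold = 0.02       # assumed CVS threshold on the slide

joint_lr = {
    ("Pos", "Pos"): 69,
    ("Neg", "Pos"): 12,
    ("Pos", "Neg"): 1.9,
    ("Neg", "Neg"): 0.2,
}

post = {combo: post_test_probability(pre_test_prob, lr) for combo, lr in joint_lr.items()}
do_cvs = {combo: p >= threshold for combo, p in post.items()}

for (nt, nba), p in post.items():
    action = "CVS" if do_cvs[(nt, nba)] else "No CVS"
    print(f"NT {nt}, NBA {nba}: {p:.0%} -> {action}")
```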

Non-independence

What does non-independence mean?

Non-independence

Slide rule approach (pre-test prob = 6%)

The total arrow length is NOT equal to the sum of its parts!

Non-independence

Technical definition of independence - must condition on disease status:

• In patients with disease, a false negative on Test 1 does not affect the probability of a false negative on Test 2.

• In patients without disease, a false positive on Test 1 does not affect the probability of a false positive on Test 2.

If this stringent definition is not met, the tests are non-independent
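This definition can be checked directly against the NT/NBA joint counts: within each disease group, the chance of an absent nasal bone should be the same whether NT is positive or negative. The sketch below suggests it is not; the helper name is illustrative.

```python
# (NT result, NBA result) -> (Trisomy 21+ count, Trisomy 21- count)
joint = {
    ("Pos", "Pos"): (158, 36),
    ("Pos", "Neg"): (54, 442),
    ("Neg", "Pos"): (71, 93),
    ("Neg", "Neg"): (50, 4652),
}

def p_nba_pos_given_nt(nt, column):
    """P(NBA positive | NT result) within one disease group.

    column 0 = Trisomy 21+, column 1 = Trisomy 21-.
    """
    nba_pos = joint[(nt, "Pos")][column]
    nba_neg = joint[(nt, "Neg")][column]
    return nba_pos / (nba_pos + nba_neg)

for column, label in ((0, "Trisomy 21+"), (1, "Trisomy 21-")):
    given_pos = p_nba_pos_given_nt("Pos", column)
    given_neg = p_nba_pos_given_nt("Neg", column)
    # Under conditional independence these two numbers would match
    print(f"{label}: P(NBA+ | NT+) = {given_pos:.1%}, P(NBA+ | NT-) = {given_neg:.1%}")
```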

Non-independence

Reasons for non-independence?

Tests measure the same aspect of disease.

Simple example: predicting pneumonia
• Cyanosis: LR = 5
• O2 sat 85%-90%: LR = 6
• Can’t just multiply these LR’s because they really just reflect the same physiologic state!

Non-independence

Reasons for non-independence?

Tests measure the same aspect of disease.

In our example:

One aspect of Down’s syndrome is slower fetal development; the NT decreases more slowly AND the nasal bone ossifies later. Chromosomally NORMAL fetuses that develop slowly will tend to have false positives on BOTH the NT Exam and the Nasal Bone Exam.

Non-independence

Other reasons for non-independence?

Disease is heterogeneous
• In severe pneumonia, all tests tend to be abnormal, so each individual test tells you less
• O2 sat and respiratory rate

Non-disease is heterogeneous
• In patients with cough but no pneumonia, abnormal tests may still track together
• O2 sat and respiratory rate also both abnormal with PE; and both are normal with viral URI

See EBD page 158

Back to the case…

Remember that we actually simplified the case: Nuchal translucency is really a continuous test.

How do we take into account actual continuous NT measurement and NBA (and age, race, fetal crown-rump length, etc)?

Back to the case…

Can’t do combination table for all possible combinations!

• 2 dichotomous tests = 4 combinations
• 4 dichotomous tests = 16 combinations
• 3 3-level tests = 27 combinations
• How do we deal with continuous tests?
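The counts above are just the product of the number of levels of each test, which is why the combination table grows so quickly. A tiny sketch (the function name is illustrative):

```python
from math import prod

def n_combinations(levels_per_test):
    """Number of distinct result combinations for a panel of tests."""
    return prod(levels_per_test)

print(n_combinations([2, 2]))        # 2 dichotomous tests
print(n_combinations([2, 2, 2, 2]))  # 4 dichotomous tests
print(n_combinations([3, 3, 3]))     # 3 three-level tests
```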

Combining tests

Approach #3: Recursive partitioning
• Repeatedly split the data to find optimal testing/decision algorithm
• “Prune” the tree
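As a rough illustration of how a first split might be chosen, the sketch below compares the two candidate tests by weighted Gini impurity. The Gini criterion is an assumption borrowed from CART-style software; the lecture does not say which splitting rule is used, and real packages also weigh misclassification costs and pruning.

```python
def gini(d_pos, d_neg):
    """Gini impurity of one node holding d_pos diseased and d_neg non-diseased."""
    n = d_pos + d_neg
    if n == 0:
        return 0.0
    p = d_pos / n
    return 2 * p * (1 - p)

def split_impurity(groups):
    """Size-weighted average impurity of the child nodes of a split."""
    total = sum(d_pos + d_neg for d_pos, d_neg in groups)
    return sum((d_pos + d_neg) / total * gini(d_pos, d_neg) for d_pos, d_neg in groups)

# Child nodes as (Trisomy 21+, Trisomy 21-) counts
nt_split = [(212, 478), (121, 4745)]    # NT >= 3.5 mm, NT < 3.5 mm
nba_split = [(229, 129), (104, 5094)]   # nasal bone absent, present

print(f"Split on NT first:  {split_impurity(nt_split):.4f}")
print(f"Split on NBA first: {split_impurity(nba_split):.4f}")
```

Under this assumed criterion, the lower impurity marks the split that separates diseased from non-diseased more cleanly.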

Combining tests

Approach #3: Recursive partitioning

Suspected Trisomy 21 (P = 6%)
   Nuchal Translucency < 3.5 mm: 2.5%
      Nasal Bone Present: 1%
      Nasal Bone Absent: 43%
   Nuchal Translucency ≥ 3.5 mm: 31%
      Nasal Bone Present: 11%
      Nasal Bone Absent: 81%


Non-optimal test ordering

Nasal Bone Present: 2%
   Nuchal Translucency < 3.5 mm: 1% (No CVS)
   Nuchal Translucency ≥ 3.5 mm: 11% (CVS)
Nasal Bone Absent: 64% (No NT, CVS)
   Nuchal Translucency < 3.5 mm: 43%
   Nuchal Translucency ≥ 3.5 mm: 81%

Combining tests


You might do nasal bone test first, then “prune”

Suspected Trisomy 21 (P = 6%)
   Nasal Bone Present: 2%
      Nuchal Translucency < 3.5 mm: 1% (No CVS)
      Nuchal Translucency ≥ 3.5 mm: 11% (CVS)
   Nasal Bone Absent: 64% (CVS)

Combining tests


Final algorithm: do Nasal Bone exam first, stop if absence and do CVS…

Combining tests


Sophisticated statistical algorithms optimize cutpoints

Combining tests


For classic example, see Figure 8.7: Chest pain workup algorithm (Goldman et al)

Combining tests


BUT: Still requires dichotomizing at cutpoints

Combining tests

Approach #4: Logistic regression
• Uses a statistical model to combine test results and predict disease
• Designed to account for non-independence
• Handles continuous test results
• Can produce a “score”
   • A single integrated continuous test result
   • Score subject to ROC curve, C-statistic, other standard continuous test analyses
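One way to see how a logistic model can absorb non-independence is the special case of two dichotomous tests with an interaction term. That saturated model has a closed-form fit (each cell's log-odds), so no fitting routine is needed. This is an illustrative sketch using the NT/NBA counts, not the lecture's own model; the coefficient names are assumptions, and predictions apply to the study sample (6% prevalence).

```python
from math import exp, log

# (NT positive?, NBA positive?) -> (Trisomy 21+ count, Trisomy 21- count)
cells = {
    (1, 1): (158, 36),
    (1, 0): (54, 442),
    (0, 1): (71, 93),
    (0, 0): (50, 4652),
}

def log_odds(cell):
    d_pos, d_neg = cells[cell]
    return log(d_pos / d_neg)

# Saturated model: logit(p) = b0 + b_nt*NT + b_nba*NBA + b_int*NT*NBA
b0 = log_odds((0, 0))
b_nt = log_odds((1, 0)) - b0
b_nba = log_odds((0, 1)) - b0
b_int = log_odds((1, 1)) - b0 - b_nt - b_nba   # zero if the effects simply add

def predicted_prob(nt, nba):
    """Predicted probability of Trisomy 21; the logit is the model's score."""
    score = b0 + b_nt * nt + b_nba * nba + b_int * nt * nba
    return 1 / (1 + exp(-score))

print(f"interaction coefficient = {b_int:.2f}")
print(f"P(Trisomy 21 | both positive) = {predicted_prob(1, 1):.0%}")
```

A negative interaction coefficient here means that two positive results together raise the log-odds by less than the sum of their separate effects.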

Combining tests

Approach #4: Logistic regression

For classic example, see Table 8.5: Predicting death in patients with pneumonia – The PORT score

Combining tests

Approach #5: Other fancy algorithms

• Neural networks
• Random forests
• Boosting
• Etc.

Combining tests

The Major Pitfall - Overfitting

What happens when you throw more variables into a model? Will the model perform better?

Combining tests

The Major Pitfall

What happens when you throw more variables into a model? Will the model perform better?

YES, in the “derivation” set (even random noise will look good!)

NO, when you try to apply in the real world!

Combining tests

The more complex your test algorithm, the more important it is to VALIDATE
• Split your sample into a “derivation set” and a “test set”
• 10-fold cross-validation, etc
• Validate in an EXTERNAL sample
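A small simulation can illustrate why a derivation/test split matters. In this sketch every "test" is pure random noise, unrelated to disease status; picking the best-looking one in the derivation half still gives an AUC above 0.5 there, while the held-out half is free to show no better than chance. The sample size, number of noise variables and seed are arbitrary assumptions.

```python
import random

def auc(scores, labels):
    """AUC as the probability that a diseased subject outscores a non-diseased one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

rng = random.Random(0)
n_subjects, n_noise_vars = 200, 50
labels = [i % 2 for i in range(n_subjects)]   # alternating disease status
noise_vars = [[rng.random() for _ in range(n_subjects)] for _ in range(n_noise_vars)]

derivation = range(0, n_subjects // 2)
test = range(n_subjects // 2, n_subjects)

def subset_auc(var, idx):
    return auc([var[i] for i in idx], [labels[i] for i in idx])

# "Derive" the rule: keep whichever noise variable looks best in the derivation set
best = max(noise_vars, key=lambda var: subset_auc(var, derivation))

naive_auc = subset_auc(best, derivation)   # optimistic, same data used to choose
test_auc = subset_auc(best, test)          # honest estimate on held-out data
print(f"naive AUC = {naive_auc:.3f}, test-set AUC = {test_auc:.3f}")
```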

Example 1 - predicting CAC with multiple risk factors

Should we do a heart scan for atherosclerosis?

Can we predict with clinical characteristics who has atherosclerosis without doing a heart scan?

Example 1 - predicting CAC with multiple risk factors

AUC-ROC

Model                                      Naïve*   Cross-validated
Age + sex + race                           .7754    .7743
“” + standard CHD RF’s                     .7901    .7881
“” + all possible race-sex interactions    .7941    .7843

Last model is most complex, highest “naïve” AUC-ROC, but NOT the highest cross-validated AUC-ROC because it is “over-fit”.

* - “Naïve” AUC-ROC refers to the AUC-ROC that you get when you estimate it within the same dataset from which the test algorithm was derived
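The gap between the naïve and cross-validated AUC-ROC (sometimes called optimism) can be tabulated directly from the three models above. A sketch with illustrative model labels:

```python
# model -> (naive AUC-ROC, cross-validated AUC-ROC), values from the table
models = {
    "age + sex + race": (0.7754, 0.7743),
    "+ standard CHD risk factors": (0.7901, 0.7881),
    "+ all race-sex interactions": (0.7941, 0.7843),
}

optimism = {name: naive - cv for name, (naive, cv) in models.items()}
best_naive = max(models, key=lambda name: models[name][0])
best_cv = max(models, key=lambda name: models[name][1])

for name, gap in optimism.items():
    print(f"{name}: optimism = {gap:.4f}")
print("highest naive AUC:", best_naive)
print("highest cross-validated AUC:", best_cv)
```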

Example 2 - predicting CAC with a proteomics “signal”

Proteomic analysis is an extreme example of combining test results: hundreds to thousands of signal peak heights, many just noise

Example 2: proteomics-CAC

Proteomics algorithm looks great in the derivation set!

Example 2: proteomics-CAC

But cross-validation shows that it was all just useless noise (AUC-ROC ~0.5)

VALIDATION

No matter what technique (CART or logistic regression) is used, the tests included in a model and the way in which their results are combined must be tested on a data set different from the one used to derive the rule.

Combining Tests Take home points

Test non-independence is the rule, not the exception, so you usually CAN’T just multiply LR’s together

In simple cases, look at LR’s for all possible test result combinations

Fancier methods often used, but look for validation analyses, especially when there are LOTS of tests being combined.
