Evidence-Based Diagnosis. Mark J. Pletcher, MD MPH. 6/28/2012. Combining Tests
TRANSCRIPT
Combining Tests
Overview
• A case with 2 simple tests
• Test non-independence
• Approaches to combining tests
  • Looking at all possible combinations of results
  • Recursive partitioning, logistic regression, other
• Overfitting and validation in multitest panels
Combining Tests – A Case
Case
Pregnant woman getting prenatal care, worried about Down’s Syndrome (Trisomy 21)
Chorionic Villus Sampling (CVS) is a definitive test, but there is a risk of miscarriage
Should she get this procedure?
Combining Tests – A Case
Age helps…
Risk goes up with age
Our patient is 41, so pretest risk is ~2%...
Ultrasound can help even more
• It’s harmless
• Several features predict Trisomy 21 (Down’s) at 11-14 weeks*
  • Nuchal translucency
  • Nasal bone absence
Combining Tests – A Case
*Cicero, S., G. Rembouskos, et al. (2004). "Likelihood ratio for trisomy 21 in fetuses with absent nasal bone at the 11-14-week scan." Ultrasound Obstet Gynecol 23(3): 218-23.
How do we use these two features together?
Nuchal Translucency Data
Cross-sectional study
5556 Pregnant Women undergoing CVS
333 (6%) with Trisomy 21 fetus
All had ultrasound at 11-14 weeks
[Figure: ROC curve for nuchal translucency; x-axis 1 - Specificity, y-axis Sensitivity. Labeled cutoffs as (1 - Specificity, Sensitivity):]
> 95th percentile: 37.9%, 88.6%
> 3.5 mm: 9.2%, 63.7%
> 4.5 mm: 3.5%, 43.5%
> 5.5 mm: 1.9%, 31.2%
Dichotomize here (3.5 mm) for purposes of illustration
Nuchal Translucency Data
                       Trisomy 21
Nuchal Translucency    D+     D-      Total
≥ 3.5 mm (+)           212    478     690
< 3.5 mm (-)           121    4745    4866
Total                  333    5223
Sensitivity and Specificity?
PPV and NPV?
Nuchal Translucency
• Sensitivity = 212/333 = 64%
• Specificity = 4745/5223 = 91%
and IF we assume that this cross-sectional sample represents our population of interest, then:
• Prevalence = 333/(333+5223) = 6%
• PPV = 212/(212 + 478) = 31%
• NPV = 4745/(121 + 4745) = 97.5%
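The calculations on this slide can be reproduced from the four cell counts. This is a minimal Python sketch; the function name `two_by_two_summary` and its argument names are illustrative and do not come from the lecture.

```python
def two_by_two_summary(tp, fp, fn, tn):
    """Summarize a 2x2 table of test result (rows) by disease status (columns)."""
    diseased = tp + fn
    healthy = fp + tn
    return {
        "sensitivity": tp / diseased,
        "specificity": tn / healthy,
        # Prevalence, PPV and NPV are only meaningful if the sample
        # represents the population of interest (the slide's caveat).
        "prevalence": diseased / (diseased + healthy),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Nuchal translucency dichotomized at 3.5 mm (counts from the slide).
nt = two_by_two_summary(tp=212, fp=478, fn=121, tn=4745)
for name, value in nt.items():
    print(f"{name}: {value:.1%}")
```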
Nuchal Translucency Data
                       Trisomy 21
Nuchal Translucency    D+     D-      Total
≥ 3.5 mm               212    478     690
< 3.5 mm               121    4745    4866
Total                  333    5223
LR+ and LR-?
Nuchal Translucency Data
                       Trisomy 21
Nuchal Translucency    D+     D-      LR
≥ 3.5 mm               212    478
< 3.5 mm               121    4745
Total                  333    5223
LR+ = P(T+|D+)/P(T+|D-)
LR- = P(T-|D+)/P(T-|D-)
Nuchal Translucency Data
                       Trisomy 21
Nuchal Translucency    D+     D-      LR
≥ 3.5 mm               212    478     7.0
< 3.5 mm               121    4745    0.4
Total                  333    5223
LR+ = (212/333)/(478/5223) = 7.0
LR- = (121/333)/(4745/5223) = 0.4
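The same likelihood ratios can be computed directly from the cell counts. This is a short sketch; `likelihood_ratios` is a hypothetical helper name.

```python
def likelihood_ratios(tp, fp, fn, tn):
    """Return (LR+, LR-) for a dichotomous test from 2x2 cell counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    lr_pos = sensitivity / (1 - specificity)   # P(T+|D+) / P(T+|D-)
    lr_neg = (1 - sensitivity) / specificity   # P(T-|D+) / P(T-|D-)
    return lr_pos, lr_neg

# Nuchal translucency at the 3.5 mm cutoff (counts from the slide).
lr_pos, lr_neg = likelihood_ratios(tp=212, fp=478, fn=121, tn=4745)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.1f}")
```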
Post-test risk using NT only
Pre-test prob: 0.02 at age 41
Pre-test odds: 0.02/0.98 = 0.0204
IF TEST IS POSITIVE - LR = 7.0
Post-Test Odds = Pre-Test Odds x LR(+)
= 0.0204 x 7.0 = 0.143
Post-Test prob = 0.143/(0.143 + 1) = 12.5%
Post-test risk using NT only
Pre-test prob: 0.02 at age 41
Pre-test odds: 0.02/0.98 = 0.0204
IF TEST IS NEGATIVE - LR = 0.4
Post-Test Odds = Pre-Test Odds x LR(-)
= 0.0204 x 0.4 = 0.0082
Post-Test prob = 0.0082/(0.0082 + 1) = 0.8%
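The probability to odds to probability steps used on these two slides can be wrapped in one function. This is a minimal sketch, assuming the 2% pre-test probability and the rounded LRs from the slides; the function name is illustrative.

```python
def post_test_probability(pre_test_prob, lr):
    """Convert probability to odds, multiply by the LR, convert back."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

print(f"NT positive: {post_test_probability(0.02, 7.0):.1%}")
print(f"NT negative: {post_test_probability(0.02, 0.4):.1%}")
```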
Back to the case…
Is 0.8% risk low enough to not get CVS?
Is 12.5% risk high enough to risk CVS?
OTHER ultrasound features are also predictive
• Nasal bone absence
Nasal Bone Absence Test Data
Nasal Bone Absent    Tri21+    Tri21-    LR
Yes                  229       129       27.8
No                   104       5094      0.32
Total                333       5223
Post-test risk using NBA only
Pre-test prob: 0.02 at age 41
Pre-test odds: 0.02/0.98 = 0.0204
IF TEST IS POSITIVE - LR = 27.8
Post-Test Odds = Pre-Test Odds x LR(+)
= 0.0204 x 27.8 = .567
Post-Test prob = .567/(.567 + 1) = 36%
Post-test risk using NBA only
Pre-test prob: 0.02 at age 41
Pre-test odds: 0.02/0.98 = 0.0204
IF TEST IS NEGATIVE - LR = 0.32
Post-Test Odds = Pre-Test Odds x LR(-)
= 0.0204 x 0.32 = 0.0065
Post-Test prob = 0.0065/(0.0065 + 1) = 0.6%
Back to the case…
NBA is a bit better than NT, but still important uncertainty…
Can we combine our NT results with NBA results and do even better?
How do we combine test results?
Combining tests
Approach #1 – Assume independence
• Knowing results of one test doesn’t influence how you interpret the next test
• We usually assume LR is independent of pre-test probability
  • This is what we did when we used a pre-test risk of 2% instead of 6% in our calculations
• If so, we can just do the calculations sequentially
Assuming test independence
First do NT, assume it’s positive (LR = 7)
Pre-test risk Post-test risk
2% 12.5%
Then do NBA, assume it’s also positive (LR = 27.8)
Pre-test risk Post-test risk
12.5% 80%
Assuming test independence
What’s the mathematical shortcut?
LR(1) * LR(2) = LR(1&2)
NT NBA LR
+ + 195
+ - 2.2
- + 11.2
- - 0.13
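The shortcut can be sketched in a few lines of Python. The individual LRs are the rounded values from the NT and NBA slides; the dictionary names are illustrative. The multiplication is only valid if the tests are independent, which the following slides question.

```python
from itertools import product

# LRs for each test taken alone (rounded values from the slides).
nt_lr = {"+": 7.0, "-": 0.4}
nba_lr = {"+": 27.8, "-": 0.32}

# Under the independence assumption, the LR for a pair of results
# is the product of the two individual LRs.
combined = {
    (nt, nba): nt_lr[nt] * nba_lr[nba]
    for nt, nba in product("+-", repeat=2)
}
for (nt, nba), lr in combined.items():
    print(f"NT {nt}  NBA {nba}  LR = {lr:.3g}")
```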
Combining tests
Is it reasonable to assume independence?
Does nasal bone absence tell you as much if you already know that the nuchal translucency is >3.5 mm?
What can we do to figure this out?
Joint eval of 4 test result combinations
NT     NBA    Trisomy 21+   %      Trisomy 21-   %      LR     Vs. LR if tests were independent
Pos    Pos    158           47%    36            0.7%   69     195
Pos    Neg    54            16%    442           8.5%   1.9    2.2
Neg    Pos    71            21%    93            1.8%   12     11.2
Neg    Neg    50            15%    4652          89%    0.2    .13
Totals        333           100%   5223          100%
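The LR for each result combination can be recomputed from the joint counts. This sketch uses the counts from the table; the variable names are illustrative.

```python
# (Trisomy 21+, Trisomy 21-) counts for each (NT, NBA) result pair.
counts = {
    ("Pos", "Pos"): (158, 36),
    ("Pos", "Neg"): (54, 442),
    ("Neg", "Pos"): (71, 93),
    ("Neg", "Neg"): (50, 4652),
}
total_pos = sum(d for d, _ in counts.values())
total_neg = sum(n for _, n in counts.values())

# LR for a combination = P(combination | D+) / P(combination | D-)
joint_lr = {
    combo: (d / total_pos) / (n / total_neg)
    for combo, (d, n) in counts.items()
}
for combo, lr in joint_lr.items():
    print(combo, f"{lr:.2g}")
```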
Combining tests
The Answer – the tests are NOT completely independent
So we CANNOT just multiply LR’s
What should we do in this case?
Use LR’s from the combination table
Joint eval of 4 test result combinations
NT     NBA    Trisomy 21+   %      Trisomy 21-   %      LR
Pos    Pos    158           47%    36            0.7%   69
Pos    Neg    54            16%    442           8.5%   1.9
Neg    Pos    71            21%    93            1.8%   12
Neg    Neg    50            15%    4652          89%    0.2
Totals        333           100%   5223          100%
Use these! (LR column)
Create ROC Table
NT     NBA    Tri21+   Sens    Tri21-   1 - Spec   LR
                       0%               0%
Pos    Pos    158      47%     36       0.70%      69
Neg    Pos    71       68%     93       3%         12
Pos    Neg    54       84%     442      11%        1.9
Neg    Neg    50       100%    4652     100%       0.2
[Figure: ROC curve for the four NT/NBA result combinations; x-axis 1 - Specificity, y-axis Sensitivity]
AUROC = 0.896
Optimal Cutoff Analysis
NT     NBA    LR     Post-Test Prob
Pos    Pos    69     81%
Neg    Pos    12     43%
Pos    Neg    1.9    11%
Neg    Neg    0.2    1%
If we assume:
• Pre-test probability = 6%
• Threshold for CVS = 2%
Optimal algorithm is “any positive test → CVS”
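The cutoff analysis can be sketched as a comparison of each post-test probability against the treatment threshold. The 6% pre-test probability and 2% threshold are the slide's assumptions; the names below are illustrative.

```python
def post_test_probability(pre_test_prob, lr):
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

PRE_TEST_PROB = 0.06   # assumed pre-test probability (slide)
CVS_THRESHOLD = 0.02   # assumed threshold for CVS (slide)

# LRs for each (NT, NBA) combination, from the combination table.
joint_lr = {
    ("Pos", "Pos"): 69,
    ("Neg", "Pos"): 12,
    ("Pos", "Neg"): 1.9,
    ("Neg", "Neg"): 0.2,
}

decisions = {}
for combo, lr in joint_lr.items():
    p = post_test_probability(PRE_TEST_PROB, lr)
    decisions[combo] = "CVS" if p > CVS_THRESHOLD else "No CVS"
    print(f"NT {combo[0]}, NBA {combo[1]}: {p:.0%} -> {decisions[combo]}")
```

Only the Neg/Neg combination falls below the threshold, which is why the rule reduces to "any positive test".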
Non-independence
Slide rule approach (pre-test prob = 6%)
The total arrow length is NOT equal to the sum of its parts!
Non-independence
Technical definition of independence - must condition on disease status:
• In patients with disease, a false negative on Test 1 does not affect the probability of a false negative on Test 2.
• In patients without disease, a false positive on Test 1 does not affect the probability of a false positive on Test 2.
If this stringent definition is not met, the tests are non-independent
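One way to check this definition against the case data is to compare, within each disease group, the observed number of double errors with the number expected if the two tests were independent. This is a sketch using the joint counts from the combination table; the helper name is hypothetical.

```python
# Joint counts by (NT, NBA) result within each disease group.
diseased = {("+", "+"): 158, ("+", "-"): 54, ("-", "+"): 71, ("-", "-"): 50}
healthy = {("+", "+"): 36, ("+", "-"): 442, ("-", "+"): 93, ("-", "-"): 4652}

def observed_vs_expected(cells, nt, nba):
    """Observed count for one result pair, and the count expected if
    NT and NBA were independent within this disease group."""
    n = sum(cells.values())
    p_nt = sum(c for (a, _), c in cells.items() if a == nt) / n
    p_nba = sum(c for (_, b), c in cells.items() if b == nba) / n
    return cells[(nt, nba)], p_nt * p_nba * n

# Double false negatives among fetuses with Trisomy 21.
fn_obs, fn_exp = observed_vs_expected(diseased, "-", "-")
# Double false positives among fetuses without Trisomy 21.
fp_obs, fp_exp = observed_vs_expected(healthy, "+", "+")

print(f"double false negatives: observed {fn_obs}, expected {fn_exp:.1f}")
print(f"double false positives: observed {fp_obs}, expected {fp_exp:.1f}")
```

In both groups the observed double errors exceed the expected count, so errors on the two tests travel together.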
Non-independence
Reasons for non-independence?
• Tests measure the same aspect of disease.
Simple example: predicting pneumonia
• Cyanosis: LR = 5
• O2 sat 85%-90%: LR = 6
• Can’t just multiply these LR’s because they really just reflect the same physiologic state!
Non-independence
Reasons for non-independence?
• Tests measure the same aspect of disease.
In our example:
• One aspect of Down’s syndrome is slower fetal development; the NT decreases more slowly AND the nasal bone ossifies later.
• Chromosomally NORMAL fetuses that develop slowly will tend to have false positives on BOTH the NT Exam and the Nasal Bone Exam.
Non-independence
Other reasons for non-independence?
• Disease is heterogeneous
  • In severe pneumonia, all tests tend to be abnormal, so each individual test tells you less
  • O2 sat and respiratory rate
• Non-disease is heterogeneous
  • In patients with cough but no pneumonia, abnormal tests may still track together
  • O2 sat and respiratory rate are also both abnormal with PE, and both are normal with viral URI
See EBD page 158
Back to the case…
Remember that we actually simplified the case: Nuchal translucency is really a continuous test.
How do we take into account actual continuous NT measurement and NBA (and age, race, fetal crown-rump length, etc)?
Back to the case…
Can’t do combination table for all possible combinations!
• 2 dichotomous tests = 4 combinations
• 4 dichotomous tests = 16 combinations
• 3 3-level tests = 27 combinations
• How do we deal with continuous tests?
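The growth in the number of table rows can be seen by enumerating the joint results. This is a small sketch; `result_combinations` is an illustrative name.

```python
from itertools import product

def result_combinations(levels_per_test):
    """Enumerate every joint result for tests with the given number of levels."""
    return list(product(*(range(k) for k in levels_per_test)))

print(len(result_combinations([2, 2])))        # 4
print(len(result_combinations([2, 2, 2, 2])))  # 16
print(len(result_combinations([3, 3, 3])))     # 27
```

Each row of a combination table needs enough patients to estimate its own LR, so the approach breaks down quickly as tests or levels are added.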
Combining tests
Approach #3: Recursive partitioning
• Repeatedly split the data to find optimal testing/decision algorithm
• “Prune” the tree
Combining tests
Approach #3: Recursive partitioning
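[Figure: decision tree, splitting on Nuchal Translucency first, then Nasal Bone]
Suspected Trisomy 21 (P = 6%)
  Nuchal Translucency < 3.5 mm: 2.5%
    Nasal Bone Present: 1%
    Nasal Bone Absent: 43%
  Nuchal Translucency ≥ 3.5 mm: 31%
    Nasal Bone Present: 11%
    Nasal Bone Absent: 81%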
Non-optimal test ordering
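[Figure: decision tree, splitting on Nasal Bone first, then Nuchal Translucency]
Suspected Trisomy 21 (P = 6%)
  Nasal Bone Present: 2%
    Nuchal Translucency < 3.5 mm: 1% (No CVS)
    Nuchal Translucency ≥ 3.5 mm: 11% (CVS)
  Nasal Bone Absent: 64% (No NT, CVS)
    Nuchal Translucency < 3.5 mm: 43%
    Nuchal Translucency ≥ 3.5 mm: 81%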
Combining tests
Approach #3: Recursive partitioning
You might do nasal bone test first, then “prune”
[Figure: pruned decision tree]
Suspected Trisomy 21 (P = 6%)
  Nasal Bone Absent: 64% (CVS)
  Nasal Bone Present: 2%
    Nuchal Translucency < 3.5 mm: 1% (No CVS)
    Nuchal Translucency ≥ 3.5 mm: 11% (CVS)
Combining tests
Approach #3: Recursive partitioning
Final algorithm: do Nasal Bone exam first, stop if absent and do CVS…
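Written as code, the pruned tree is a two-step rule. This is a sketch only: it assumes the 6% pre-test probability and 2% CVS threshold used earlier in the lecture, and the function and parameter names are hypothetical.

```python
def cvs_recommended(nasal_bone_absent, nt_mm=None):
    """Pruned tree: nasal bone exam first, NT only if the bone is present.

    nt_mm is optional because NT is not needed when the nasal bone is absent.
    """
    if nasal_bone_absent:
        return True        # about 64% risk: go straight to CVS
    if nt_mm is None:
        raise ValueError("NT measurement needed when the nasal bone is present")
    return nt_mm >= 3.5    # about 11% risk if >= 3.5 mm, about 1% otherwise

print(cvs_recommended(True))
print(cvs_recommended(False, nt_mm=4.0))
print(cvs_recommended(False, nt_mm=2.0))
```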
Combining tests
Approach #3: Recursive partitioning
Sophisticated statistical algorithms optimize cutpoints
Combining tests
Approach #3: Recursive partitioning
For classic example, see Figure 8.7: Chest pain workup algorithm (Goldman et al)
Combining tests
Approach #4: Logistic regression
• Uses a statistical model to combine test results and predict disease
• Designed to account for non-independence
• Handles continuous test results
• Can produce a “score”
  • A single integrated continuous test result
  • Score subject to ROC curve, C-statistic, other standard continuous test analyses
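To illustrate how a logistic model can absorb non-independence, the sketch below fits a saturated model (both tests plus their interaction) to the dichotomized NT and NBA counts from the case. For a saturated model the maximum-likelihood coefficients are just differences of the observed cell log-odds, so no fitting library is needed. This shortcut is an illustration only; a real model with continuous predictors would be fit iteratively, and all names here are assumptions.

```python
import math

# (Trisomy 21+, Trisomy 21-) counts by (NT positive, NBA positive).
cells = {(1, 1): (158, 36), (1, 0): (54, 442), (0, 1): (71, 93), (0, 0): (50, 4652)}

def log_odds(cell):
    d_pos, d_neg = cell
    return math.log(d_pos / d_neg)

# Saturated model: logit P(D) = b0 + b_nt*NT + b_nba*NBA + b_int*NT*NBA
b0 = log_odds(cells[(0, 0)])
b_nt = log_odds(cells[(1, 0)]) - b0
b_nba = log_odds(cells[(0, 1)]) - b0
b_int = log_odds(cells[(1, 1)]) - b0 - b_nt - b_nba

def predicted_probability(nt, nba):
    """The model's score, converted from log-odds to a probability."""
    score = b0 + b_nt * nt + b_nba * nba + b_int * nt * nba
    return 1 / (1 + math.exp(-score))

print(f"interaction coefficient: {b_int:.2f}")
print(f"P(Trisomy 21 | NT+, NBA+) = {predicted_probability(1, 1):.0%}")
```

The negative interaction term means two positive results together raise the odds by less than the product of their separate effects, which is the pattern seen in the combination table.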
Combining tests
Approach #4: Logistic regression
For classic example, see Table 8.5: Predicting death in patients with pneumonia – The PORT score
Combining tests
The Major Pitfall - Overfitting
What happens when you throw more variables into a model? Will the model perform better?
YES, in the “derivation” set (even random noise will look good!)
NO, when you try to apply in the real world!
Combining tests
The more complex your test algorithm, the more important it is to VALIDATE
• Split your sample into a “derivation set” and a “test set”
• 10-fold cross-validation, etc.
• Validate in an EXTERNAL sample
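The derivation/test split can be illustrated with simulated data. In the sketch below every "marker" is pure random noise (simulated, not data from the lecture); the rule is "derived" by picking the marker with the best AUC-ROC in the derivation set, then checked in the test set. Sample sizes, seed, and names are arbitrary assumptions.

```python
import random

def auc(scores, labels):
    """AUC-ROC as the probability that a random case outscores
    a random non-case (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

rng = random.Random(0)                        # fixed seed so the run is repeatable
n_patients, n_markers = 200, 500
labels = [i % 2 for i in range(n_patients)]   # half cases, half non-cases
markers = [[rng.gauss(0, 1) for _ in range(n_patients)] for _ in range(n_markers)]

# Split patients into derivation and test sets, keeping cases balanced.
derivation = [i for i in range(n_patients) if i % 4 < 2]
test = [i for i in range(n_patients) if i % 4 >= 2]

def auc_on(marker, rows):
    return auc([marker[i] for i in rows], [labels[i] for i in rows])

# "Derive" the rule: keep the noise marker that looks best in the derivation set.
best = max(markers, key=lambda m: auc_on(m, derivation))
naive_auc = auc_on(best, derivation)
validated_auc = auc_on(best, test)
print(f"naive AUC-ROC: {naive_auc:.2f}, validated AUC-ROC: {validated_auc:.2f}")
```

Because the winning marker was selected for its performance in the derivation set, its naive AUC-ROC is optimistic; the test-set value is the honest estimate.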
Example 1 - predicting CAC with multiple risk factors
Should we do a heart scan for atherosclerosis?
Can we predict with clinical characteristics who has atherosclerosis without doing a heart scan?
Example 1 - predicting CAC with multiple risk factors
                                            AUC-ROC
Model                                       Naïve*    Cross-validated
Age + sex + race                            .7754     .7743
“” + standard CHD RF’s                      .7901     .7881
“” + all possible race-sex interactions     .7941     .7843
Last model is most complex, highest “naïve” AUC-ROC, but NOT the highest cross-validated AUC-ROC because it is “over-fit”.
* - “Naïve” AUC-ROC refers to the AUC-ROC that you get when you estimate it within the same dataset from which the test algorithm was derived
Example 2 - predicting CAC with a proteomics “signal”
Proteomic analysis is an extreme example of combining test results: hundreds to thousands of signal peak heights, many just noise
Example 2: proteomics-CAC
But cross-validation shows that it was all just useless noise (AUC-ROC ~0.5)
VALIDATION
No matter what technique (CART or logistic regression) is used, the tests included in a model and the way in which their results are combined must be tested on a data set different from the one used to derive the rule.
Combining Tests Take home points
Test non-independence is the rule, not the exception, so usually CAN’T just multiply LR’s together
In simple cases, look at LR’s for all possible test result combinations
Fancier methods often used, but look for validation analyses, especially when there are LOTS of tests being combined.