Michael A. Kohn, MD, MPP
10/28/2010
Chapter 7 – Prognostic Tests
Chapter 8 – Combining Tests and Multivariable Decision Rules
Outline of Topics
• Prognostic Tests
– Differences from diagnostic tests
– Quantifying prediction: calibration and discrimination
– Value of prognostic information
– Comparing predictions
– Example: ABCD2 Score
• Combining Tests/Diagnostic Models
– Importance of test non-independence
– Recursive Partitioning
– Logistic Regression
– Variable (Test) Selection
– Importance of validation separate from derivation
Prognostic Tests (Ch 7)*
• Differences from diagnostic tests
• Validation/Quantifying Accuracy (calibration and discrimination)
• Assessing the value of prognostic information
• Comparing predictions by different people or different models
*Will not discuss time-to-event analysis or predicting continuous outcomes. (Covered in Chapter 7.)
Chance determines whether you get the disease
Spin the needle
Diagnostic Test
1) Spin needle to see if you develop disease.
2) Perform test for disease.
3) Gold standard determines true disease state. (Can calculate sensitivity, specificity, LRs.)
Prognostic Test
1) Perform test to predict the risk of disease.
2) Spin needle to see if you develop disease.
3) How do you assess the validity of the predictions?
Example: Mastate Cancer
• Once developed, always fatal.
• Can be prevented by mastatectomy.
• Two oncologists separately assign each of N individuals a risk for developing mastate cancer in the next 5 years.
Patient   Oncologist 1's          Oncologist 2's          Mastate Cancer
ID        Predicted Probability   Predicted Probability   within 5 years
1         20%                     20%                     0
2         50%                     20%                     0
3         35%                     20%                     0
4         50%                     20%                     1
5         35%                     20%                     0
6         20%                     20%                     0
7         20%                     20%                     0
8         20%                     20%                     0
9         35%                     20%                     1
10        50%                     20%                     0
11        50%                     20%                     1
12        35%                     20%                     0
How do you assess the validity of the predictions?
Oncologist 1 assigns risk of 50%
How many like this?
How many get mastate cancer?
Spin the needles.
Oncologist 1 assigns risk of 35%
How many like this?
How many get mastate cancer?
Spin the needles.
Oncologist 1 assigns risk of 20%
How many like this?
How many get mastate cancer?
Spin the needles.
Calibration*
How accurate are the predicted probabilities?
• Break the population into groups
• Compare actual and predicted probabilities for each group
*Related to Goodness-of-Fit and diagnostic model validation.
Calibration
Oncologist 1's Predicted Risk   Observed Proportion   Observed - Predicted
50%                             5/16 (31.3%)          -18.8%
35%                             3/16 (18.8%)          -16.3%
20%                             2/16 (12.5%)          -7.5%

Oncologist 2's Predicted Risk   Observed Proportion   Observed - Predicted
20%                             10/48 (20.8%)         +0.8%
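The calibration comparison in these tables can be reproduced with a few lines of Python. This is a minimal sketch, not part of the original slides: the dictionary layout and the function name `calibration_table` are illustrative assumptions, while the counts come from the tables above.

```python
# Grouped outcomes from the slides: predicted risk -> (cancers, patients).
# The data structure and function name are illustrative assumptions.
oncologist_1 = {0.50: (5, 16), 0.35: (3, 16), 0.20: (2, 16)}
oncologist_2 = {0.20: (10, 48)}


def calibration_table(groups):
    """Return (predicted, observed, observed - predicted) for each risk group."""
    rows = []
    for predicted, (events, n) in groups.items():
        observed = events / n
        rows.append((predicted, observed, observed - predicted))
    return rows


for name, groups in [("Oncologist 1", oncologist_1), ("Oncologist 2", oncologist_2)]:
    for predicted, observed, diff in calibration_table(groups):
        print(f"{name}: predicted {predicted:.1%}, "
              f"observed {observed:.1%}, difference {diff:+.1%}")
```

Oncologist 2's single group is close to the observed proportion, while every one of Oncologist 1's groups overestimates risk.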
Calibration
[Figure: calibration plot of Observed Proportion with Cancer (y-axis, 0% to 100%) against Predicted Probability of Cancer (x-axis, 0% to 100%) for Oncologist 1 and Oncologist 2.]
Discrimination
How well can the test separate subjects in the population from the mean probability to values closer to zero or 1?
• May be more generalizable
• Often measured with C-statistic (AUROC)
Discrimination
Oncologist 1   D+   % of D+   D-   % of D-
Risk = 50%     5    50%       11   29%
Risk = 35%     3    30%       13   34%
Risk = 20%     2    20%       14   37%
Total          10   100%      38   100%
Discrimination
[Figure: bar chart of Proportion in Risk Group (y-axis, 0% to 60%) by Risk Group (20%, 35%, 50%), Cancer versus No Cancer.]
Discrimination
[Figure: ROC curve, Sensitivity (y-axis) against 1 - Specificity (x-axis), both 0 to 1.]
AUROC = 0.63
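The AUROC for Oncologist 1 can be checked directly from the grouped counts. The sketch below uses the concordance definition (the chance that a randomly chosen D+ patient received a higher predicted risk than a randomly chosen D- patient, with ties counted as one half). The function name and tuple layout are assumptions made for illustration.

```python
def auroc_from_groups(groups):
    """AUROC from grouped data.

    groups: list of (score, n_diseased, n_nondiseased) tuples.
    Counts concordant D+/D- pairs, giving tied pairs half credit.
    """
    total_pos = sum(pos for _, pos, _ in groups)
    total_neg = sum(neg for _, _, neg in groups)
    concordant = 0.0
    for score_p, pos, _ in groups:
        for score_n, _, neg in groups:
            if score_p > score_n:
                concordant += pos * neg
            elif score_p == score_n:
                concordant += 0.5 * pos * neg
    return concordant / (total_pos * total_neg)


# Counts from the discrimination table: (predicted risk, D+, D-)
oncologist_1 = [(0.50, 5, 11), (0.35, 3, 13), (0.20, 2, 14)]
# Oncologist 2 gives everyone 20%, so all pairs are ties
oncologist_2 = [(0.20, 10, 38)]

print(round(auroc_from_groups(oncologist_1), 2))
print(round(auroc_from_groups(oncologist_2), 2))
```

Because Oncologist 2 assigns the same risk to everyone, that prediction cannot rank patients at all and its AUROC is 0.5, even though it is well calibrated.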
True Risk
Oncologist 1: 20%
Oncologist 2: 20%
True Risk: 11.1%
Oncologist 1: 35%
Oncologist 2: 20%
True Risk: 16.7%
Oncologist 1: 50%
Oncologist 2: 20%
True Risk: 33.3%
True Risk -- Calibration
True Risk   Observed Proportion   Observed - Predicted
33.3%       5/16 (31.3%)          -2.1%
16.7%       3/16 (18.8%)          +2.1%
11.1%       2/16 (12.5%)          +1.4%
True Risk -- Calibration
[Figure: calibration plot of Observed Proportion with Cancer (y-axis, 0% to 100%) against Predicted Probability of Cancer (x-axis, 0% to 100%) for True Risk and Oncologist 2.]
True Risk -- Discrimination
True Risk   D+   % of D+   D-   % of D-
33.3%       5    50%       11   29%
16.7%       3    30%       13   34%
11.1%       2    20%       14   37%
Total       10   100%      38   100%
True Risk -- Discrimination
[Figure: bar chart of Proportion in Risk Group (y-axis, 0% to 60%) by Risk Group (11.1%, 16.7%, 33.3%), Cancer versus No Cancer.]
True Risk -- Discrimination
[Figure: ROC curve, Sensitivity (y-axis) against 1 - Specificity (x-axis), both 0 to 1.]
AUROC = 0.63
ROC curve depends only on rankings, not calibration
Random event occurs AFTER prognostic test.
1) Perform test to predict the risk of disease.
2) Spin needle to see if you develop disease.
Only crystal ball allows perfect prediction.
Maximum AUROC
True Risk: 11.1%, True Risk: 16.7%, True Risk: 33.3%
Maximum AUROC = 0.65
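The ceiling of 0.65 can be reproduced by asking what AUROC the true risks themselves would achieve in a very large population. The sketch below assumes three equally sized risk groups (as in the 100-per-group slides) and uses the exact fractions 1/9, 1/6, and 1/3 behind the rounded 11.1%, 16.7%, and 33.3%. The function name `max_auroc` is hypothetical.

```python
def max_auroc(true_risks, weights=None):
    """Expected AUROC when the prediction equals the true risk.

    true_risks: true probability of disease in each group.
    weights: relative group sizes (assumed equal if omitted).
    """
    if weights is None:
        weights = [1.0] * len(true_risks)
    # Expected share of diseased and non-diseased patients in each group
    pos = [w * r for w, r in zip(weights, true_risks)]
    neg = [w * (1 - r) for w, r in zip(weights, true_risks)]
    area = 0.0
    for i, risk_i in enumerate(true_risks):
        for j, risk_j in enumerate(true_risks):
            if risk_i > risk_j:
                area += pos[i] * neg[j]
            elif risk_i == risk_j:
                area += 0.5 * pos[i] * neg[j]  # ties get half credit
    return area / (sum(pos) * sum(neg))


# Assumption: the slide's 11.1%, 16.7%, 33.3% are 1/9, 1/6, 1/3
print(round(max_auroc([1 / 9, 1 / 6, 1 / 3]), 2))
```

No predictor based on these three groups can do better, because the random event (the spin of the needle) happens after the prediction is made.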
Diagnostic versus Prognostic Tests
                                 Diagnostic Test              Prognostic Test
Purpose                          Identify Prevalent Disease   Predict Incident Disease/Outcome
Chance Event Occurs to Patient   Prior to Test                After Test
Study Design                     Cross-Sectional              Cohort
Test Result                      +/-, ordinal, continuous     Risk (Probability)
Maximum Obtainable AUROC         1 (gold standard)            <1 (not clairvoyant)
Value of Prognostic Information
Why do you want to know risk of mastate cancer?
To decide whether to do a mastatectomy.
Value of Prognostic Information
It is 4 times worse to die of mastate gland cancer than to have a mastatectomy.
Cdeath = 4 × Cmastatectomy
Should do mastatectomy when P × Cdeath > Cmastatectomy
P > Cmastatectomy / Cdeath
P > 1/4
Fine Point: If it is 4 times worse to die of mastate cancer than to live AND have a mastatectomy, then the NET cost of a death is 4C – C = 3C. Threshold odds equal C:B or 1:3.
Oncologist 1: 20%
< 25%
NO Mastatectomy
11 out of 100 die of mastate cancer, no mastatectomies
Oncologist 1: 35%
> 25%
Mastatectomy
83 out of 100 unnecessary; no mastate cancer deaths
Oncologist 1: 50%
> 25%
Mastatectomy
67 out of 100 unnecessary; no mastate cancer deaths
Value of Prognostic Information: 300 patients (100 per risk group)
Oncologist 2: 20%
< 25%
No Mastatectomy
11 out of 100 die of mastate cancer; no mastatectomies
Oncologist 2: 20%
< 25%
No Mastatectomy
17 out of 100 die; no mastatectomies
Oncologist 2: 20%
< 25%
No Mastatectomy
33 out of 100 die; no mastatectomies
Value of Prognostic Information: 300 patients (100 per risk group)
True Risk: 11%
< 25%
No Mastatectomy
11 out of 100 die of mastate cancer; no mastatectomies
True Risk: 17%
< 25%
No Mastatectomy
17 out of 100 die; no mastatectomies
True Risk: 33%
> 25%
Mastatectomy
67 out of 100 unnecessary; no mastate cancer deaths
Value of Prognostic Information: 300 patients (100 per risk group)
               Mastatectomies   Deaths from      Mastatectomy    Death
                                Mastate Cancer   “Equivalents”   “Equivalents”
Oncologist 1   200              11               244             61
Oncologist 2   0                61               244             61
True Risk      100              28               212             53
Value of True Risk Estimate Relative to Oncologists 1 and 2 = 33 “mastatectomy equivalents” and 8 “death equivalents.” (The 33 reflects unrounded risks; the rounded table entries differ by 32.)
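The table can be rebuilt from the decision rule (mastatectomy when predicted risk exceeds 1/4) and the true risks. This is a minimal sketch under stated assumptions: the true risks are taken as the exact fractions 1/9, 1/6, and 1/3, each group has 100 patients, and the function and variable names are illustrative.

```python
COST_RATIO = 4               # one death counts as 4 mastatectomies (from the slides)
THRESHOLD = 1 / COST_RATIO   # operate when predicted risk exceeds 25%
GROUP_SIZE = 100
TRUE_RISK = [1 / 9, 1 / 6, 1 / 3]   # assumed exact values behind 11.1%, 16.7%, 33.3%

predictions = {
    "Oncologist 1": [0.20, 0.35, 0.50],
    "Oncologist 2": [0.20, 0.20, 0.20],
    "True Risk": TRUE_RISK,
}


def outcomes(predicted):
    """Return (mastatectomies, expected deaths, mastatectomy equivalents)."""
    mastatectomies = 0
    deaths = 0.0
    for p_hat, p_true in zip(predicted, TRUE_RISK):
        if p_hat > THRESHOLD:
            mastatectomies += GROUP_SIZE      # everyone in the group is treated
        else:
            deaths += GROUP_SIZE * p_true     # untreated group: true risk applies
    equivalents = mastatectomies + COST_RATIO * deaths
    return mastatectomies, deaths, equivalents


for name, predicted in predictions.items():
    m, d, eq = outcomes(predicted)
    print(f"{name}: {m} mastatectomies, {d:.0f} deaths, "
          f"{eq:.0f} mastatectomy equivalents, {eq / COST_RATIO:.0f} death equivalents")
```

The two oncologists reach the same total cost by opposite routes: Oncologist 1 overtreats and Oncologist 2 undertreats.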
Value of Prognostic Information: 300 patients (100 per risk group)
Comparing Predictions
• Identify cohort.
• Obtain predictions (or information necessary for prediction) at inception.
• Provide uniform treatment to cohort or at least treat independent of (blinded to) prediction.
• Determine outcomes.
• Scenario: What would have happened if treatment were based on predicted risk?
Value of Prognostic Information
• Doctors and patients like prognostic information
• But hard to assess its value
• Most objective approach is decision-analytic. Consider: What decision is to be made? Costs of errors? Cost of test?
Common Problems with Studies of Prognostic Tests
See Chapter 7
Comparing Predictions
• Compare ROC Curves and AUROCs
• Reclassification Tables*, Net Reclassification Improvement (NRI), Integrated Discrimination Improvement (IDI)
• See Jan. 30, 2008 Issue of Statistics in Medicine* (? and EBD Edition 2 ?)
*Pencina et al. Stat Med. 2008 Jan 30;27(2):157-72.
ABCD2
Risk Factor                                                 Points
Age
  ≥ 60 years                                                1
Blood Pressure
  SBP ≥ 140 or DBP ≥ 90                                     1
Clinical features of TIA
  Unilateral weakness (with or without speech impairment)   2
  Speech impairment without unilateral weakness             1
Duration
  TIA duration ≥ 60 minutes                                 2
  TIA duration 10-59 minutes                                1
Diabetes
  Diabetes diagnosed by a physician                         1
Total ABCD2 Score                                           0 – 7
Johnston SC, et al. Lancet. 2007 Jan 27;369(9558):283-92.
ABCD2 (Calibration)
Johnston SC, et al. Lancet. 2007 Jan 27;369(9558):283-92.
Score   % of TIA Patients   90-Day Stroke Risk
0-3     34%                 3.1%
4-5     45%                 9.8%
6-7     21%                 17.8%
ABCD2 (Discrimination)
Johnston SC, et al. Lancet. 2007 Jan 27;369(9558):283-92.
Score   90-Day Stroke   No 90-Day Stroke   LR
6-7     40.6%           19.0%              2.14
4-5     47.9%           44.7%              1.07
0-3     11.5%           36.3%              0.32
Total   100.0%          100.0%
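The likelihood ratios in this table, and the AUROC quoted for the ABCD2 score, follow from the two percentage columns alone. The sketch below divides the columns to get each LR and applies the trapezoid rule to the cumulative percentages. The names `ABCD2`, `likelihood_ratios`, and `auroc_trapezoid` are illustrative assumptions.

```python
# (score band, share of stroke patients, share of non-stroke patients),
# ordered from highest to lowest score, taken from the table above
ABCD2 = [("6-7", 0.406, 0.190), ("4-5", 0.479, 0.447), ("0-3", 0.115, 0.363)]

# Interval likelihood ratio = P(result | stroke) / P(result | no stroke)
likelihood_ratios = {band: stroke / no_stroke for band, stroke, no_stroke in ABCD2}


def auroc_trapezoid(rows):
    """Area under the ROC curve built by lowering the cutoff one band at a time."""
    area = 0.0
    sensitivity = false_positive_rate = 0.0
    for _, stroke, no_stroke in rows:
        new_sensitivity = sensitivity + stroke
        new_fpr = false_positive_rate + no_stroke
        # Trapezoid between consecutive ROC points
        area += (new_fpr - false_positive_rate) * (sensitivity + new_sensitivity) / 2
        sensitivity, false_positive_rate = new_sensitivity, new_fpr
    return area


for band, lr in likelihood_ratios.items():
    print(f"ABCD2 {band}: LR = {lr:.2f}")
print(f"AUROC = {auroc_trapezoid(ABCD2):.2f}")
```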
ABCD2 (Discrimination)
Johnston SC, et al. Lancet. 2007 Jan 27;369(9558):283-92.
[Figure: bar chart of the percentage of patients in each ABCD2 score group (0-3, 4-5, 6-7; y-axis 0% to 60%), Stroke + versus Stroke -.]
ABCD2 (Discrimination)
Johnston SC, et al. Lancet. 2007 Jan 27;369(9558):283-92.
[Figure: ROC curve for the ABCD2 score with cutoffs ≥ 4 and ≥ 6 marked; axes are Sensitivity and 1 - Specificity, 0% to 100%.]
AUROC = 0.67
Better Discrimination
Replace This
Johnston SC, et al. Lancet. 2007 Jan 27;369(9558):283-92.
[Figure: bar chart of ABCD2 score groups (0-3, 4-5, 6-7), Stroke + versus Stroke -, y-axis 0% to 60%.]
With This
[Figure: bar chart of score groups (0-3, 4-5, 6-7), Stroke + versus Stroke -, y-axis 0% to 80%.]
Replace This
Johnston SC, et al. Lancet. 2007 Jan 27;369(9558):283-92.
[Figure: ROC curve with cutoffs ≥ 4 and ≥ 6 marked.]
AUROC = 0.67
With This
[Figure: ROC curve with cutoffs ≥ 4 and ≥ 6 marked.]
AUROC = 0.92
What to do with the ABCD2 score?
• Recommendation is to admit TIA patients with ABCD2 > 5, and consider admission for ABCD2 4-5.
• Could give tPA if they have a stroke.
• Accelerated work-up. (? evidence that accelerated work-up actually improves outcomes.)
Combining Tests/Diagnostic Models
• Importance of test non-independence
• Recursive Partitioning
• Logistic Regression
• Variable (Test) Selection
• Importance of validation separate from derivation (calibration and discrimination revisited)
Combining Tests: Example
Prenatal sonographic Nuchal Translucency (NT) and Nasal Bone Exam as dichotomous tests for Trisomy 21*
*Cicero, S., G. Rembouskos, et al. (2004). "Likelihood ratio for trisomy 21 in fetuses with absent nasal bone at the 11-14-week scan." Ultrasound Obstet Gynecol 23(3): 218-23.
If NT ≥ 3.5 mm Positive for Trisomy 21*
*What’s wrong with this definition?
[Figure: ROC curve for nuchal translucency, Sensitivity (y-axis) against 1 - Specificity (x-axis), 0% to 100%. Labeled cutoffs, given as 1 - Specificity, Sensitivity:]
> 95th percentile: 37.9%, 88.6%
> 3.5 mm: 9.2%, 63.7%
> 4.5 mm: 3.5%, 43.5%
> 5.5 mm: 1.9%, 31.2%
In general, don’t make multi-level tests like NT into dichotomous tests by choosing a fixed cutoff
I did it here to make the discussion of multiple tests easier
I arbitrarily chose to call ≥ 3.5 mm positive
One Dichotomous Test
Nuchal Translucency   Trisomy 21 D+   Trisomy 21 D-   LR
≥ 3.5 mm              212             478             7.0
< 3.5 mm              121             4745            0.4
Total                 333             5223
Do you see that this is (212/333)/(478/5223)?
Review of Chapter 3: What are the sensitivity, specificity, PPV, and NPV of this test? (Be careful.)
Nuchal Translucency
• Sensitivity = 212/333 = 64%
• Specificity = 4745/5223 = 91%
• Prevalence = 333/(333+5223) = 6%
(Study population: pregnant women about to undergo CVS, so high prevalence of Trisomy 21)
• PPV = 212/(212 + 478) = 31%
• NPV = 4745/(121 + 4745) = 97.5%*
*Not that great; prior to test P(D-) = 94%
Clinical Scenario – One Test
Pre-Test Probability of Down’s = 6%
NT Positive
Pre-test prob: 0.06
Pre-test odds: 0.06/0.94 = 0.064
LR(+) = 7.0
Post-Test Odds = Pre-Test Odds x LR(+) = 0.064 x 7.0 = 0.44
Post-Test prob = 0.44/(0.44 + 1) = 0.31
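The probability-to-odds-and-back calculation above is easy to wrap in helper functions. This is a small sketch, with function names chosen for illustration; the counts are those from the nuchal translucency table.

```python
def prob_to_odds(probability):
    """Convert a probability to odds."""
    return probability / (1 - probability)


def odds_to_prob(odds):
    """Convert odds back to a probability."""
    return odds / (1 + odds)


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Post-test odds = pre-test odds x LR, then convert back to a probability."""
    return odds_to_prob(prob_to_odds(pre_test_probability) * likelihood_ratio)


# LR(+) for NT >= 3.5 mm from the 2x2 table: sensitivity / (1 - specificity)
lr_positive = (212 / 333) / (478 / 5223)

print(f"LR(+) = {lr_positive:.1f}")
print(f"Post-test probability = {post_test_probability(0.06, lr_positive):.2f}")
```

Using the study's exact prevalence (333/5556) instead of the rounded 6% reproduces the PPV of 212/690 exactly, which is a useful consistency check.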
NT Positive
• Pre-test Prob = 0.06
• P(Result|Trisomy 21) = 0.64
• P(Result|No Trisomy 21) = 0.09
• Post-Test Prob = ?
http://www.quesgen.com/PostProbofDisease.php
Slide Rule
Nasal Bone Seen: NBA = “No”, Neg for Trisomy 21
Nasal Bone Absent: NBA = “Yes”, Pos for Trisomy 21
Second Dichotomous Test
Nasal Bone Absent   Tri21+   Tri21-   LR
Yes                 229      129      27.8
No                  104      5094     0.32
Total               333      5223
Do you see that this is (229/333)/(129/5223)?
Clinical Scenario – Two Tests
Using Probabilities
Pre-Test Probability of Trisomy 21 = 6%
NT Positive for Trisomy 21 (≥ 3.5 mm)
Post-NT Probability of Trisomy 21 = 31%
Nasal Bone Absent
Post-NBA Probability of Trisomy 21 = ?
Clinical Scenario – Two Tests
Using Odds
Pre-Test Odds of Tri21 = 0.064
NT Positive (LR = 7.0)
Post-Test Odds of Tri21 = 0.44
Nasal Bone Absent (LR = 27.8?)
Post-Test Odds of Tri21 = 0.44 x 27.8? = 12.4? (P = 12.4/(1+12.4) = 92.5%?)
Clinical Scenario – Two Tests
Pre-Test Probability of Trisomy 21 = 6%
NT ≥ 3.5 mm AND Nasal Bone Absent
Question
Can we use the post-test odds after a positive Nuchal Translucency as the pre-test odds for the positive Nasal Bone Examination?
i.e., can we combine the positive results by multiplying their LRs?
LR(NT+, NBE +) = LR(NT +) x LR(NBE +) ? = 7.0 x 27.8 ? = 194 ?
Answer = No
NT      NBE     Trisomy 21+   %      Trisomy 21-   %      LR
Pos     Pos     158           47%    36            0.7%   69
Pos     Neg     54            16%    442           8.5%   1.9
Neg     Pos     71            21%    93            1.8%   12
Neg     Neg     50            15%    4652          89%    0.2
Total           333           100%   5223          100%
LR(NT+, NBE+) = 69, not 194
158/(158 + 36) = 81%, not 92.5%
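The gap between the observed joint LR and the naive product can be computed from the four cells of this table. The sketch below also shows the source of the problem: among chromosomally normal fetuses, nasal bone absence is more common when the NT is enlarged. Variable and function names are illustrative assumptions.

```python
# (NT result, NBE result) -> (Trisomy 21 count, chromosomally normal count)
COUNTS = {
    ("Pos", "Pos"): (158, 36),
    ("Pos", "Neg"): (54, 442),
    ("Neg", "Pos"): (71, 93),
    ("Neg", "Neg"): (50, 4652),
}
TOTAL_POS = sum(pos for pos, _ in COUNTS.values())   # 333
TOTAL_NEG = sum(neg for _, neg in COUNTS.values())   # 5223


def likelihood_ratio(d_pos, d_neg):
    """LR = P(result | D+) / P(result | D-)."""
    return (d_pos / TOTAL_POS) / (d_neg / TOTAL_NEG)


# LR of each test on its own, collapsing over the other test
lr_nt = likelihood_ratio(158 + 54, 36 + 442)
lr_nbe = likelihood_ratio(158 + 71, 36 + 93)

naive_lr = lr_nt * lr_nbe                           # assumes independence
joint_lr = likelihood_ratio(*COUNTS[("Pos", "Pos")])  # observed directly

# False-positive rate of the nasal bone exam in normal fetuses, by NT result
fp_nbe_given_nt_pos = 36 / (36 + 442)
fp_nbe_given_nt_neg = 93 / (93 + 4652)

print(f"Naive product: {naive_lr:.0f}, observed joint LR: {joint_lr:.1f}")
print(f"NBE false positives: {fp_nbe_given_nt_pos:.1%} if NT+, "
      f"{fp_nbe_given_nt_neg:.1%} if NT-")
```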
Non-Independence
Absence of the nasal bone does not tell you as much if you already know that the nuchal translucency is ≥ 3.5 mm.
Clinical Scenario
Using Odds
Pre-Test Odds of Tri21 = 0.064
NT+/NBE+ (LR = 68.8)
Post-Test Odds = 0.064 x 68.8 = 4.40 (P = 4.40/(1+4.40) = 81%, not 92.5%)
Non-Independence
Non-Independence of NT and NBA
Apparently, even in chromosomally normal fetuses, enlarged NT and absence of the nasal bone are associated. A false positive on the NT makes a false positive on the NBE more likely. Of normal (D-) fetuses with NT < 3.5 mm only 2.0% had nasal bone absent. Of normal (D-) fetuses with NT ≥ 3.5 mm, 7.5% had nasal bone absent.
Some (but not all) of this may have to do with ethnicity. In this London study, chromosomally normal fetuses of “Afro-Caribbean” ethnicity had both larger NTs and more frequent absence of the nasal bone.
In Trisomy 21 (D+) fetuses, normal NT was associated with the presence of the nasal bone, so a false negative on the NT was associated with a false negative on the NBE.
Non-Independence
Instead of looking for the nasal bone, what if the second test were just a repeat measurement of the nuchal translucency?
A second positive NT would do little to increase your certainty of Trisomy 21. If it was false positive the first time around, it is likely to be false positive the second time.
Reasons for Non-Independence
Tests measure the same aspect of disease.
One aspect of Down’s syndrome is slower fetal development; the NT decreases more slowly and the nasal bone ossifies later. Chromosomally NORMAL fetuses that develop slowly will tend to have false positives on BOTH the NT Exam and the Nasal Bone Exam.
Reasons for Non-Independence
Heterogeneity of Disease (e.g. spectrum of severity)*.
Heterogeneity of Non-Disease.
(See EBD page 158.)
*In this example, Down’s syndrome is the only chromosomal abnormality considered, so disease is fairly homogeneous.
Unless tests are independent, we can’t combine results by multiplying LRs
Ways to Combine Multiple Tests
• On a group of patients (derivation set), perform the multiple tests and (independently*) determine true disease status (apply the gold standard)
• Measure LR for each possible combination of results
• Recursive Partitioning
• Logistic Regression
*Beware of incorporation bias
Determine LR for Each Result Combination
NT      NBA     Tri21+   %      Tri21-   %       LR    Post-Test Prob*
Pos     Pos     158      47%    36       0.7%    69    81%
Pos     Neg     54       16%    442      8.5%    1.9   11%
Neg     Pos     71       21%    93       1.8%    12    43%
Neg     Neg     50       15%    4652     89.1%   0.2   1%
Total           333      100%   5223     100%
*Assumes pre-test prob = 6%
Sort by LR (Descending)
NT    NBA   Tri21+   %     Tri21-   %        LR
Pos   Pos   158      47%   36       0.70%    69
Neg   Pos   71       21%   93       1.80%    12
Pos   Neg   54       16%   442      8.50%    1.9
Neg   Neg   50       15%   4652     89.10%   0.2
Apply Chapter 4 – Multilevel Tests
• Now you have a multilevel test (in this case, 4 levels).
• Have LR for each test result
• Can create ROC curve and calculate AUROC
• Given pre-test probability and treatment threshold probability (C/(B+C)), can find optimal cutoff.
Create ROC Table
NT    NBE   Tri21+   Sens   Tri21-   1 - Spec   LR
                     0%              0%
Pos   Pos   158      47%    36       0.70%      69
Neg   Pos   71       68%    93       3%         12
Pos   Neg   54       84%    442      11%        1.9
Neg   Neg   50       100%   4652     100%       0.2
[Figure: ROC curve for the four NT/NBE result combinations, Sensitivity against 1 - Specificity, 0% to 100%.]
AUROC = 0.896
Optimal Cutoff
NT    NBE   LR    Post-Test Prob
Pos   Pos   69    0.81
Neg   Pos   12    0.43
Pos   Neg   1.9   0.11
Neg   Neg   0.2   0.01
Assume
• Pre-test probability = 6%
• Threshold for CVS is 2%
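With the pre-test probability and threshold stated above, finding the optimal cutoff amounts to checking which result combinations push the post-test probability over 2%. The sketch below uses the rounded LRs from the table and assumes CVS is offered whenever the post-test probability exceeds the threshold. The names are illustrative.

```python
PRE_TEST_PROBABILITY = 0.06
CVS_THRESHOLD = 0.02

# (NT result, NBE result) -> rounded LR from the table, sorted descending
LIKELIHOOD_RATIOS = {
    ("Pos", "Pos"): 69,
    ("Neg", "Pos"): 12,
    ("Pos", "Neg"): 1.9,
    ("Neg", "Neg"): 0.2,
}


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Update a probability with an LR by working on the odds scale."""
    odds = pre_test_probability / (1 - pre_test_probability) * likelihood_ratio
    return odds / (1 + odds)


decisions = {}
for combination, lr in LIKELIHOOD_RATIOS.items():
    probability = post_test_probability(PRE_TEST_PROBABILITY, lr)
    decisions[combination] = probability > CVS_THRESHOLD
    action = "CVS" if decisions[combination] else "no CVS"
    print(f"NT {combination[0]}, NBE {combination[1]}: "
          f"post-test probability {probability:.2f} -> {action}")
```

Under these assumptions only the Neg/Neg combination falls below the threshold, so the cutoff sits between LR = 1.9 and LR = 0.2.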
Determine LR for Each Result Combination
2 dichotomous tests: 4 combinations
3 dichotomous tests: 8 combinations
4 dichotomous tests: 16 combinations
Etc.
2 3-level tests: 9 combinations
3 3-level tests: 27 combinations
Etc.
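The growth in the number of combinations is just the product of the number of levels of each test. A short sketch (the function name is an assumption) makes the counts above reproducible and shows how to enumerate the combinations themselves.

```python
from itertools import product
from math import prod


def n_combinations(levels_per_test):
    """Number of distinct result combinations for a set of tests."""
    return prod(levels_per_test)


print(n_combinations([2, 2]))        # 2 dichotomous tests
print(n_combinations([2, 2, 2, 2]))  # 4 dichotomous tests
print(n_combinations([3, 3, 3]))     # 3 three-level tests

# Enumerating the combinations for two dichotomous tests
for combination in product(["Pos", "Neg"], repeat=2):
    print(combination)
```

Each combination needs enough D+ and D- patients to estimate its LR, which is why this approach becomes impractical quickly.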
Determine LR for Each Result Combination
How do you handle continuous tests?
Not always practical for groups of tests.
Recursive Partitioning: Measure NT First
Suspected Trisomy 21 (P = 6%)
  Nuchal Translucency < 3.5 mm: 2.5%
    Nasal Bone Present: 1%
    Nasal Bone Absent: 43%
  Nuchal Translucency ≥ 3.5 mm: 31%
    Nasal Bone Present: 11%
    Nasal Bone Absent: 81%
Recursive Partitioning: Examine Nasal Bone First
Suspected Trisomy 21 (P = 6%)
  Nasal Bone Present: 2%
    Nuchal Translucency < 3.5 mm: 1%
    Nuchal Translucency ≥ 3.5 mm: 11%
  Nasal Bone Absent: 64%
    Nuchal Translucency < 3.5 mm: 43%
    Nuchal Translucency ≥ 3.5 mm: 81%
Do Nasal Bone Exam First
Better separates Trisomy 21 from chromosomally normal fetuses
If your threshold for CVS is between 11% and 43%, you can stop after the nasal bone exam
If your threshold is between 1% and 11%, you should do the NT exam only if the NBE is normal.
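The branch and leaf probabilities in both trees come straight from the four cell counts. The sketch below computes them for either test order. It is not a CART implementation, only the arithmetic behind the two hand-built trees, and the names are illustrative assumptions.

```python
# (NT >= 3.5 mm, nasal bone absent) -> (Trisomy 21 count, normal count)
COUNTS = {
    (True, True): (158, 36),
    (True, False): (54, 442),
    (False, True): (71, 93),
    (False, False): (50, 4652),
}
NT, NBA = 0, 1  # position of each test within the key tuples


def probability(first_test, first_result, second_result=None):
    """P(Trisomy 21) after the first test and, optionally, the second."""
    second_test = NBA if first_test == NT else NT
    d_pos = d_neg = 0
    for key, (pos, neg) in COUNTS.items():
        if key[first_test] != first_result:
            continue
        if second_result is not None and key[second_test] != second_result:
            continue
        d_pos += pos
        d_neg += neg
    return d_pos / (d_pos + d_neg)


for first_test, label in [(NT, "NT first"), (NBA, "Nasal bone first")]:
    for first_result in (False, True):
        branch = probability(first_test, first_result)
        leaves = [probability(first_test, first_result, r) for r in (False, True)]
        print(f"{label}, first test positive={first_result}: {branch:.1%}, "
              f"then {leaves[0]:.1%} / {leaves[1]:.1%}")
```

The leaves are the same four probabilities in either order; what changes is how far the first split moves patients from 6%, which is why the nasal bone exam is the better first test.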
Recursive Partitioning: Examine Nasal Bone First
CVS if P(Trisomy 21) > 5%
Suspected Trisomy 21 (P = 6%)
  Nasal Bone Present: 2%
    Nuchal Translucency < 3.5 mm: 1% (No CVS)
    Nuchal Translucency ≥ 3.5 mm: 11% (CVS)
  Nasal Bone Absent: 64% (No NT, CVS)
    Nuchal Translucency < 3.5 mm: 43%
    Nuchal Translucency ≥ 3.5 mm: 81%
Recursive Partitioning: Examine Nasal Bone First
CVS if P(Trisomy 21) > 5%
Suspected Trisomy 21 (P = 6%)
  Nasal Bone Present: 2%
    Nuchal Translucency < 3.5 mm: 1% (No CVS)
    Nuchal Translucency ≥ 3.5 mm: 11% (CVS)
  Nasal Bone Absent: 64% (CVS)
Recursive Partitioning
Same as Classification and Regression Trees (CART)
Don’t have to work out probabilities (or LRs) for all possible combinations of tests, because of “tree pruning”
Recursive Partitioning
• Does not deal well with continuous test results*
*when there is a monotonic relationship between the test result and the probability of disease
Logistic Regression
Ln(Odds(D+)) = a + b_NT × NT + b_NBA × NBA + b_interact × (NT)(NBA)
where “+” = 1 and “-” = 0
More on this later in ATCR!
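For two dichotomous tests with an interaction term, the logistic model has four coefficients and four cells, so it is saturated and the coefficients can be read off the cell log-odds without any fitting software. The sketch below does this by hand with the NT/NBA counts. It is an illustration of the equation's structure, not a general fitting routine, and the names are assumptions. The odds are those of the study sample (prevalence about 6%).

```python
from math import exp, log

# (NT, NBA) coded 1 for positive and 0 for negative -> (Trisomy 21, normal)
CELLS = {(0, 0): (50, 4652), (1, 0): (54, 442), (0, 1): (71, 93), (1, 1): (158, 36)}


def cell_log_odds(nt, nba):
    """Observed ln(odds of Trisomy 21) in one cell of the 2x2 layout."""
    d_pos, d_neg = CELLS[(nt, nba)]
    return log(d_pos / d_neg)


# Saturated model: coefficients follow directly from the four cell log-odds
a = cell_log_odds(0, 0)
b_nt = cell_log_odds(1, 0) - a
b_nba = cell_log_odds(0, 1) - a
b_interact = cell_log_odds(1, 1) - a - b_nt - b_nba


def predicted_probability(nt, nba):
    """Probability of Trisomy 21 from the logistic equation."""
    log_odds = a + b_nt * nt + b_nba * nba + b_interact * nt * nba
    return 1 / (1 + exp(-log_odds))


print(f"a = {a:.2f}, b_NT = {b_nt:.2f}, b_NBA = {b_nba:.2f}, "
      f"b_interact = {b_interact:.2f}")
print(f"P(Trisomy 21 | NT+, NBA+) = {predicted_probability(1, 1):.2f}")
```

The interaction coefficient comes out negative, which is the regression counterpart of non-independence: two positive results together add less log-odds than the sum of their separate effects.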
Why does logistic regression model log-odds instead of probability?
Related to why the LR Slide Rule’s log-odds scale helps us visualize combining test results.
Probability of Trisomy 21 vs. Maternal Age
Ln(Odds) of Trisomy 21 vs. Maternal Age
Combining 2 Continuous Tests
> 1% Probability of Trisomy 21
< 1% Probability of Trisomy 21
Choosing Which Tests to Include in the Decision Rule
Have focused on how to combine results of two or more tests, not on which of several tests to include in a decision rule.
Variable Selection Options include:
• Recursive partitioning
• Automated stepwise logistic regression
Choice of variables in derivation data set requires confirmation in a separate validation data set.
Variable Selection
Especially susceptible to overfitting
Need for Validation: Example*
• Study of clinical predictors of bacterial diarrhea.
• Evaluated 34 historical items and 16 physical examination questions.
• 3 questions (abrupt onset, > 4 stools/day, and absence of vomiting) best predicted a positive stool culture (sensitivity 86%; specificity 60% for all 3).
• Would these 3 be the best predictors in a new dataset? Would they have the same sensitivity and specificity?
*DeWitt TG, Humphrey KF, McCarthy P. Clinical predictors of acute bacterial diarrhea in young children. Pediatrics. Oct 1985;76(4):551-556.
Need for Validation
• Develop prediction rule by choosing a few tests and findings from a large number of candidates.
• Takes advantage of chance variations* in the data.
• Predictive ability of rule will probably disappear when you try to validate on a new dataset.
• Can be referred to as “overfitting.”
*e.g., low serum calcium in 12 children with hemolytic uremic syndrome and bad outcomes
VALIDATION
No matter what technique (CART or logistic regression) is used, the tests included in a model and the way in which their results are combined must be tested on a data set different from the one used to derive the rule.
Beware of studies that use a “validation set” to tweak the model. This is really just a second derivation step.
Prognostic Tests and Multivariable Diagnostic Models
Commonly express results in terms of a probability
-- risk of the outcome by a fixed time point (prognostic test)
-- posterior probability of disease (diagnostic model)
Need to assess both calibration and discrimination.
Validation Dataset
Measure all the variables needed for the model.
Determine disease status (D+ or D-) on all subjects.
VALIDATION: Calibration
-- Divide dataset into probability groups (deciles, quintiles, …) based on the model (no tweaking allowed).
-- In each group, compare actual D+ proportion to model-predicted probability.
VALIDATION: Discrimination
-- Test result is model-predicted probability of disease.
-- Use “Walking Man” to draw ROC curve and calculate AUROC.
Outline of Topics
• Prognostic Tests
– Differences from diagnostic tests
– Quantifying prediction: calibration and discrimination
– Comparing predictions
– Value of prognostic information
– Example: ABCD2
• Combining Tests/Diagnostic Models
– Importance of test non-independence
– Recursive Partitioning
– Logistic Regression
– Variable (Test) Selection
– Importance of validation separate from derivation