![Page 1: Statistical Methods for Evaluating Biomarkerscourses.washington.edu/epi573/janes session 11-10-08.pdf · 10/8/2011 · From The Statistical Evaluation of Medical Tests for Classification](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f8a457918b7353d0677b2ff/html5/thumbnails/1.jpg)
Statistical Methods for Evaluating Biomarkers
Holly Janes
Fred Hutchinson Cancer Research Center
November 10, 2008
Biomarkers for...
Diagnosis: disease versus non-disease
Screening: early diagnosis
Prognosis: predicting outcome

Examples
- clinical signs / symptoms
- laboratory tests
- gene expression technology
- proteomics
- combinations of any of the above
How to evaluate their accuracy?
Outline
1. Measures of biomarker accuracy
2. Evaluating incremental value
3. Phases of biomarker development
4. Study design issues
5. Advanced topics
6. Software
Measures of Accuracy for Binary Markers
Classification Probabilities
D = outcome (disease)
Y = binary marker

|       | D = 0          | D = 1          |
|-------|----------------|----------------|
| Y = 0 | True negative  | False negative |
| Y = 1 | False positive | True positive  |

false positive fraction = FPF = P[Y = 1|D = 0] = 1 - specificity
true positive fraction = TPF = P[Y = 1|D = 1] = sensitivity
Ideal test: TPF = 1 and FPF = 0
Classification Probabilities, cont’d
- condition on disease status
- describe test accuracy
- helpful to public health researchers: should the test be used?
- helpful to individual: should I have the test?
Predictive Values
positive predictive value = PPV = P[D = 1|Y = 1]
negative predictive value = NPV = P[D = 0|Y = 0]

Ideal test: PPV = 1 and NPV = 1

- condition on test result
- require cohort study to estimate
- depend on TPF, FPF, and prevalence (ρ)

$\mathrm{PPV} = \dfrac{\rho\,\mathrm{TPF}}{\rho\,\mathrm{TPF} + (1-\rho)\,\mathrm{FPF}}$

$\mathrm{NPV} = \dfrac{(1-\rho)(1-\mathrm{FPF})}{(1-\rho)(1-\mathrm{FPF}) + \rho\,(1-\mathrm{TPF})}$

- describe predictive capacity of test
- given my test result, how likely is it that I’m diseased?
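The dependence of the predictive values on prevalence can be sketched in a few lines of Python. The function name `predictive_values` and the example inputs (TPF = 0.80, FPF = 0.10, three prevalences) are illustrative assumptions, not values from the slides.

```python
def predictive_values(tpf: float, fpf: float, rho: float) -> tuple:
    """Return (PPV, NPV) from TPF, FPF and prevalence rho via Bayes' rule."""
    ppv = rho * tpf / (rho * tpf + (1 - rho) * fpf)
    npv = (1 - rho) * (1 - fpf) / ((1 - rho) * (1 - fpf) + rho * (1 - tpf))
    return ppv, npv


# Same hypothetical test (TPF = 0.80, FPF = 0.10) at three prevalences.
for rho in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(0.80, 0.10, rho)
    print(f"rho={rho:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

With TPF and FPF held fixed, PPV rises and NPV falls as ρ grows, which is why predictive values estimated in one population do not transfer to a population with a different prevalence.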
Example: Diagnosis of CAD
Y: exercise stress test
D: coronary artery disease

|       | D = 0 | D = 1 | Total |
|-------|-------|-------|-------|
| Y = 0 | 22.3% | 14.2% | 36.5% |
| Y = 1 | 7.8%  | 55.6% | 63.4% |
| Total | 30.1% | 69.8% | 100%  |

TPF = 0.797, FPF = 0.259, ρ = 0.698
PPV = 0.877, NPV = 0.611, τ = 0.634 (τ = P[Y = 1], the fraction testing positive)

- the exercise stress test detects 80% of diseased subjects and incorrectly identifies 26% of non-diseased as suspicious
- 88% of test positives and 39% of test negatives have disease

From The Statistical Evaluation of Medical Tests for Classification and Prediction by Margaret S. Pepe, Ph.D., Oxford University Press, 2003
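As a check on the arithmetic, the summary measures can be recomputed from the four joint percentages in the table. This is a minimal sketch; the dictionary layout and the function name `accuracy_from_joint` are assumptions made for illustration. The published cells sum to 99.9% because of rounding, so the marginal quantities can differ from the slide in the third decimal.

```python
# Joint percentages from the CAD example; keys are (Y, D).
cells = {(0, 0): 22.3, (0, 1): 14.2, (1, 0): 7.8, (1, 1): 55.6}


def accuracy_from_joint(cells: dict) -> dict:
    """Derive classification and predictive measures from a 2x2 joint table."""
    total = sum(cells.values())
    diseased = cells[(0, 1)] + cells[(1, 1)]
    non_diseased = cells[(0, 0)] + cells[(1, 0)]
    positives = cells[(1, 0)] + cells[(1, 1)]
    negatives = cells[(0, 0)] + cells[(0, 1)]
    return {
        "TPF": cells[(1, 1)] / diseased,      # condition on D = 1
        "FPF": cells[(1, 0)] / non_diseased,  # condition on D = 0
        "rho": diseased / total,              # prevalence
        "tau": positives / total,             # fraction testing positive
        "PPV": cells[(1, 1)] / positives,     # condition on Y = 1
        "NPV": cells[(0, 0)] / negatives,     # condition on Y = 0
    }


for name, value in accuracy_from_joint(cells).items():
    print(f"{name} = {value:.3f}")
```

Note that 1 − NPV is the share of test negatives who have disease, which is the 39% quoted above.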
Inappropriate Commonly Used Measures
- misclassification rate (MCR)
- odds ratio
MCR
- MCR $= P[Y \neq D] = P[Y = 1 \mid D = 0]\,P[D = 0] + P[Y = 0 \mid D = 1]\,P[D = 1] = \mathrm{FPF}\,(1 - \rho) + (1 - \mathrm{TPF})\,\rho$
- ignores differential importance of false negative and false positive errors
- depends on the prevalence (ρ)
  - e.g., if P[Y = 1|D = 1] = P[Y = 1|D = 0] = 0 with low ρ, MCR is low
- used a lot in statistics, not in medical settings
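The prevalence problem can be made concrete with a short sketch. The function name and the two example tests (one that never flags anyone, one with TPF = 0.80 and FPF = 0.10) are hypothetical choices for illustration, as is the prevalence ρ = 0.01.

```python
def misclassification_rate(tpf: float, fpf: float, rho: float) -> float:
    """MCR = FPF * (1 - rho) + (1 - TPF) * rho."""
    return fpf * (1 - rho) + (1 - tpf) * rho


rho = 0.01  # hypothetical low prevalence

# A test that never flags anyone: TPF = FPF = 0.
never_positive = misclassification_rate(tpf=0.0, fpf=0.0, rho=rho)

# A test that detects 80% of cases at a 10% false positive fraction.
informative = misclassification_rate(tpf=0.80, fpf=0.10, rho=rho)

print(f"never-positive test MCR = {never_positive:.3f}")
print(f"informative test MCR    = {informative:.3f}")
```

Under these inputs the test that detects no disease at all has the lower MCR, which is the sense in which MCR can mislead when disease is rare.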
Odds Ratio
- odds ratio $= \dfrac{\mathrm{TPF}\,(1-\mathrm{FPF})}{\mathrm{FPF}\,(1-\mathrm{TPF})}$
- measure of association, not classification
- good classification ⇒ huge odds ratios
- e.g., TPF = 0.80, FPF = 0.10 (a ‘good’ test)
  - odds ratio $= \dfrac{0.80\,(1-0.10)}{0.10\,(1-0.80)} = 36$
[Figure: TPF plotted against FPF, both axes from 0 to 1, with one curve for each odds ratio: 1.0, 1.5, 2.0, 3.0, 9.0, 16.0, and 36.0.]
(FPF, TPF) corresponding to different odds ratios

- large odds ratio does not imply good classifier
- need to report FPF and TPF separately
From Pepe et al. AJE 2004; 159:882–90.
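The curves in the figure can be traced by solving the odds ratio formula for TPF at a given FPF. The sketch below does this; the function names are assumptions, and the odds ratio of 3 is chosen only because it is a value commonly reported as a strong association.

```python
def odds_ratio(tpf: float, fpf: float) -> float:
    """Odds ratio relating a binary marker to disease."""
    return tpf * (1 - fpf) / (fpf * (1 - tpf))


def tpf_given_odds_ratio(fpf: float, oratio: float) -> float:
    """TPF implied by a given odds ratio at a given FPF.

    Obtained by solving OR = TPF(1 - FPF) / (FPF(1 - TPF)) for TPF.
    """
    return oratio * fpf / (1 - fpf + oratio * fpf)


print(odds_ratio(0.80, 0.10))           # the 'good' test from the slide
print(tpf_given_odds_ratio(0.10, 3.0))  # TPF at FPF = 0.10 when OR = 3
print(tpf_given_odds_ratio(0.10, 36.0))
```

At FPF = 0.10, an odds ratio of 3 corresponds to detecting only a quarter of cases, while recovering TPF = 0.80 takes an odds ratio of 36.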
Measures of Accuracy for Continuous Markers
Classification Accuracy for a Continuous Test
Continuous marker, Y
- most markers

The ROC curve generalizes (FPF, TPF) to continuous markers

- thresholding rule: ‘positive’ if Y ≥ c
- TPF(c) = P[Y ≥ c|D = 1]
  FPF(c) = P[Y ≥ c|D = 0]
- ROC(·) = {(FPF(c), TPF(c)), c ∈ (−∞, ∞)}
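The definition translates directly into an empirical ROC curve: sweep the threshold c over the observed marker values and record the pair (FPF(c), TPF(c)) each time. The sketch below uses only the Python standard library; the function name and the ten marker values are hypothetical.

```python
import math


def empirical_roc(cases, controls):
    """Return ROC points (FPF(c), TPF(c)) for the rule 'positive if Y >= c'."""
    # Every observed value is a candidate threshold; +inf gives the point (0, 0).
    thresholds = sorted(set(cases) | set(controls)) + [math.inf]
    points = []
    for c in thresholds:
        tpf = sum(y >= c for y in cases) / len(cases)
        fpf = sum(y >= c for y in controls) / len(controls)
        points.append((fpf, tpf))
    return sorted(points)  # ordered from (0, 0) to (1, 1)


# Hypothetical marker values.
cases = [2.1, 3.4, 1.7, 4.0, 2.9]
controls = [0.5, 1.9, 1.1, 2.3, 0.2]

points = empirical_roc(cases, controls)
for fpf, tpf in points:
    print(f"FPF = {fpf:.1f}, TPF = {tpf:.1f}")
```

Because only the ordering of values matters, any monotone increasing transformation of Y gives the same set of points.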
[Figure, shown in four successive frames: marker value Y on a horizontal axis from −5 to 5, alongside the ROC plot of TPF(c) against FPF(c) with both axes from 0 to 1.]
Attributes of the ROC
- shows entire range of possible performance
- puts different tests on a common relevant scale

[Figure: two panels, ABR and DPOAE 65 at 2 kHz, each showing the distribution of test results for normal and hearing impaired ears.]
Figure 4.3 Probability distributions of test results for the DPOAE and ABR tests among hearing impaired ears and normally hearing ears.
From The Statistical Evaluation of Medical Tests for Classification and Prediction by Margaret S. Pepe, Ph.D., Oxford University Press, 2003
[Figure: ROC(t) plotted against t, both axes from 0 to 1, with one curve each for DPOAE and ABR.]
Figure 4.4 ROC curves for the DPOAE and ABR tests.

- two tests have similar ability to distinguish between hearing-impaired and normal ears

From The Statistical Evaluation of Medical Tests for Classification and Prediction by Margaret S. Pepe, Ph.D., Oxford University Press, 2003
Choosing a Threshold
Formal decision theory:
$\text{Expected cost}(c) = \rho\,(1 - \mathrm{TPF}(c))\,C_D + (1 - \rho)\,\mathrm{FPF}(c)\,C_N$

$C_D$ is the cost of negatively classifying a diseased subject
$C_N$ is the cost of positively classifying a non-diseased subject

⇒ cost minimized at the threshold c where the slope of the ROC curve equals

$\dfrac{1 - \rho}{\rho} \cdot \dfrac{C_N}{C_D}$

- requires specifying costs $C_D$ and $C_N$ (tricky!)
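One way to see where the slope condition comes from is to differentiate the expected cost with respect to the threshold. The derivation below is a sketch that assumes TPF(c) and FPF(c) are differentiable in c and that the ROC curve is concave, so the stationary point is a minimum; neither assumption is stated on the slide.

```latex
\begin{align*}
\frac{d}{dc}\,\text{Expected cost}(c)
  &= -\rho\, C_D\, \mathrm{TPF}'(c) + (1-\rho)\, C_N\, \mathrm{FPF}'(c) = 0 \\
\Longrightarrow\quad
\frac{d\,\mathrm{TPF}}{d\,\mathrm{FPF}}
  = \frac{\mathrm{TPF}'(c)}{\mathrm{FPF}'(c)}
  &= \frac{1-\rho}{\rho}\cdot\frac{C_N}{C_D}
\end{align*}
```

The left-hand side of the last line is the slope of the ROC curve at the point (FPF(c), TPF(c)). Rare disease or costly false positives call for a steep slope, that is, a stringent threshold near the lower left of the curve.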
Choosing a Threshold, cont’d
Common informal practice:
- fix maximum tolerated FPF
  - e.g., must be very low (below 5%) for cancer screening test
  - f0 = FPF → threshold = 1 − f0 quantile among controls
- or fix minimum tolerated TPF
  - e.g., must be very high in most diagnostic settings
  - t0 = TPF → threshold = 1 − t0 quantile among cases
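The quantile rule for a fixed FPF can be sketched as follows. With finite data the 1 − f0 quantile is not unique, so this version takes the smallest observed control value whose empirical FPF does not exceed f0; that convention, the function names, and the data are all assumptions made for illustration.

```python
import math


def threshold_for_fpf(controls, f0):
    """Smallest c whose empirical FPF(c) = P[Y >= c | D = 0] is at most f0."""
    n = len(controls)
    for c in sorted(set(controls)):
        if sum(y >= c for y in controls) / n <= f0:
            return c
    return math.inf  # no observed value is stringent enough


def tpf_at_threshold(cases, c):
    """Empirical TPF(c) = P[Y >= c | D = 1]."""
    return sum(y >= c for y in cases) / len(cases)


controls = list(range(1, 21))              # hypothetical control values 1..20
cases = [12, 15, 18, 19, 20, 22, 25, 30]   # hypothetical case values

c = threshold_for_fpf(controls, f0=0.12)
tpf = tpf_at_threshold(cases, c)
print(f"threshold = {c}, TPF at that threshold = {tpf:.3f}")
```

The same idea applied to the cases, with the inequality reversed, gives the threshold for a fixed minimum TPF.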
Summary Measures of Classification Accuracy
- TPF = ROC(f0) at chosen FPF = f0
  - percent cases detected for fixed FPF
- FPF = $\mathrm{ROC}^{-1}(t_0)$ at chosen TPF = t0
  - FPF for fixed percent cases detected
- AUC $= \int_0^1 \mathrm{ROC}(f)\,df$
  - probability of correctly ordering a randomly chosen case and control observation
  - little clinical relevance
  - summarizes TPF over entire FPF range
- partial AUC $= \int_0^{f_0} \mathrm{ROC}(f)\,df$
  - restricted ROC region, but little clinical relevance
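The "correct ordering" interpretation of the AUC suggests a direct way to estimate it: compare every case with every control and count the fraction of pairs in which the case has the higher marker value. The sketch below counts ties as one half, a common convention that is an assumption here; the data are hypothetical.

```python
def auc(cases, controls):
    """Empirical AUC: P[Y_case > Y_control] + 0.5 * P[Y_case == Y_control]."""
    wins = 0.0
    for y_case in cases:
        for y_control in controls:
            if y_case > y_control:
                wins += 1.0
            elif y_case == y_control:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical marker values.
cases = [2.1, 3.4, 1.7, 4.0, 2.9]
controls = [0.5, 1.9, 1.1, 2.3, 0.2]

print(f"AUC = {auc(cases, controls):.2f}")
```

An uninformative marker gives a value near 0.5 and a perfectly separating marker gives 1, but the number says nothing about which part of the FPF range the separation occurs in.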
[Figure: ROC curve, ROC(t) = TPF(c) plotted against t = FPF(c), both axes from 0 to 1.]
Example: Pancreatic Cancer Data
- marker sought for screening for pancreatic cancer
- data on two markers: CA 19-9 and CA 125

[Figure: ROC curves (TPF against FPF, both axes from 0 to 1) for CA 19-9 and CA 125, with FPF = 0.2 marked.]
From The Statistical Evaluation of Medical Tests for Classification and Prediction by Margaret S. Pepe, Ph.D., Oxford University Press, 2003
AUC for CA 125 = 0.71
AUC for CA 19-9 = 0.89
p-value = 0.007
⇒ the probability of correct ordering is 0.18 higher with CA 19-9 (0.89 versus 0.71)

ROC(0.2) for CA 125 = 0.49
ROC(0.2) for CA 19-9 = 0.78
p-value = 0.04
⇒ CA 19-9 detects 29 percentage points more cancers (78% versus 49%) at the same FPF = 0.2

- conclusions about ROC(0.2) are more clinically important than those about AUC
Generalizing Predictive Values to Continuous Biomarkers

- a relatively new area of research; not well developed
Evaluating Incremental Value
Incremental Value
- how much classification accuracy does the new marker add to existing predictors?
- e.g., how much does CRP add to existing lipid measurements and risk factor information in discriminating between those who will and will not develop CVD?
How Best to Combine Markers?
- Y = (Y1, . . . , YP)
- the “best” combination is the risk score, R(Y) = P(D = 1|Y1, . . . , YP) (McIntosh and Pepe, Biometrics, 2002)
- “best” ⇒ no other combination of (Y1, . . . , YP) has a (FPF, TPF) point above its ROC curve
To Combine Markers
- Estimate R(Y) = P(D = 1|Y1, . . . , YP)
  - using logistic regression, neural networks, classification trees, support vector machines, Bayesian modelling, ...
  - logistic regression can be used with case-control data
- Calculate the ROC curve for R(Y) (it’s just another marker!)
  - avoid overoptimism due to fitting and evaluating model on same data
    - split into training and validation data
    - or use cross-validation
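The two steps can be sketched end to end with the standard library alone: fit a logistic model for the risk score on a training half, then evaluate the fitted score as a single marker on a validation half. Everything specific here is an assumption for illustration: the simulated two-marker data, the plain gradient-ascent fit (a stand-in for whatever logistic regression routine one would normally use), the step size, and the 50/50 split.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def risk_score(beta, x):
    """Estimated P(D = 1 | Y = x) under logit P = beta0 + beta1*Y1 + ..."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))


def fit_logistic(X, d, steps=500, lr=0.5):
    """Maximize the logistic log-likelihood by plain gradient ascent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(steps):
        grad = [0.0] * (p + 1)
        for x, di in zip(X, d):
            resid = di - risk_score(beta, x)
            grad[0] += resid
            for j, v in enumerate(x):
                grad[j + 1] += resid * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def auc(case_scores, control_scores):
    """Fraction of case-control pairs ordered correctly (ties count 1/2)."""
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in case_scores for b in control_scores)
    return wins / (len(case_scores) * len(control_scores))


def simulate(n):
    """Hypothetical data: two markers, each shifted upward in cases."""
    X, d = [], []
    for _ in range(n):
        di = int(random.random() < 0.5)
        X.append([random.gauss(1.0 * di, 1.0), random.gauss(0.5 * di, 1.0)])
        d.append(di)
    return X, d


random.seed(0)
X, d = simulate(400)
X_train, d_train = X[:200], d[:200]   # fit the combination here
X_valid, d_valid = X[200:], d[200:]   # evaluate it here

beta = fit_logistic(X_train, d_train)


def auc_of_score(X, d):
    scores = [risk_score(beta, x) for x in X]
    return auc([s for s, di in zip(scores, d) if di == 1],
               [s for s, di in zip(scores, d) if di == 0])


auc_train = auc_of_score(X_train, d_train)
auc_valid = auc_of_score(X_valid, d_valid)
print(f"training AUC = {auc_train:.3f}, validation AUC = {auc_valid:.3f}")
```

The validation figure is the one to report, since the training figure is computed on the same data used to choose the coefficients. With many candidate markers and few subjects the gap between the two can be large.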
Evaluating Incremental Value
- new marker Y*, baseline markers Y1, . . . , YP
- compare the ROC curves for P(D = 1|Y1, . . . , YP) and P(D = 1|Y1, . . . , YP, Y*)
- NOT quantified by β* in

$g(P(D = 1 \mid Y_1, \dots, Y_P, Y^*)) = \beta_0 + \beta_1 Y_1 + \dots + \beta_P Y_P + \beta^* Y^*$
Pancreatic Cancer Example
- Y1 = log CA 19-9, Y2 = log CA 125
- combination β1Y1 + β2Y2 from fitting

  logit P(D = 1|Y1, Y2) = α + β1Y1 + β2Y2

  exp(β2) = 2.54 (p = 0.002)

- Y2 strongly associated with D
ROC(0.05) = 0.68 for CA 19-9
ROC(0.05) = 0.71 for combination of CA 19-9 and CA 125

- extremely common phenomenon: a marker strongly associated with D adds little classification accuracy to an existing marker
Phases of Biomarker Development
| # | Phase | Objective | Design |
|---|-------|-----------|--------|
| 1 | Preclinical Exploratory | promising directions identified, assess test reproducibility | diverse and convenient cases and controls |
| 2 | Clinical Assay and Validation | clinical assay detects established disease, compare test with standard of practice, assess covariate effects | population based, cases with disease, controls without disease |
| 3 | Retrospective Longitudinal | biomarker detects disease early before it becomes clinical (for screening markers) | case-control study nested in a longitudinal cohort |
| 4 | Prospective Screening | extent and characteristics of disease detected by the test and the false referral rate are identified | cross-sectional cohort of people |
| 5 | Disease Control | impact of screening on reducing the burden of disease on the population is quantified | randomized trial (ideally) |
From: Pepe et al. Phases of biomarker development for early detection of cancer. JNCI 93(14):1054–61, 2001.
Study Design Issues
Matching in Case-Control Studies
- randomly sample cases
- select controls matched to cases with respect to confounders
- attempts to eliminate confounding
- e.g., Physicians’ Health Study
  - evaluate PSA as a screening tool for prostate cancer
  - for each case select 3 controls within 1 year of age of the case
  - cases tend to be older, older subjects tend to have higher PSA ⇒ age is a confounder
  - matching on age attempts to correct for this
Implications of Matching
- must adjust for matching covariates in analysis
  - unadjusted analysis is biased
  - more complicated analysis
- can’t assess incremental value of marker over matching covariates
- tends to increase efficiency
Selected Verification
- in prospective studies, may not be possible to obtain the outcome (disease status) for all individuals
  - too expensive (cost or resources)
  - not ethical (e.g., biopsy)
- often biomarker value determines whether disease status is verified
  - e.g., in study of PSA and DRE for prostate cancer screening, biopsy recommended if PSA > 2.5 or DRE+
- selective sampling can lead to biased estimates of accuracy: “verification bias” or “work-up bias”
Implications of Selected Verification
When comparing two binary biomarkers in paired study:
- those who test negative on both tests are not needed to estimate relative TPF, FPF

When evaluating one binary biomarker:
- naive TPF, FPF are biased
- there are methods for correcting for verification bias
  - all make untestable assumptions about the verification mechanism
  - verification may depend on unmeasured factors!
- lead to decreased precision of estimated TPF
- difficult to find settings with cost savings: reduction in number verified offset by increased total sample size
- avoid selected verification whenever possible
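A small numerical sketch shows both the bias and one correction. The correction below estimates P[D|Y] from verified subjects and P[Y] from everyone tested, then inverts with Bayes' rule, in the spirit of the Begg and Greenes approach listed in the references. It relies on the untestable assumption that verification depends only on the test result Y. The counts are hypothetical expected values for 1000 subjects with ρ = 0.10, TPF = 0.80, FPF = 0.20, where every test positive but only 10% of test negatives are verified.

```python
def naive_rates(verified):
    """TPF and FPF computed from verified subjects only; keys are (Y, D)."""
    tpf = verified[(1, 1)] / (verified[(1, 1)] + verified[(0, 1)])
    fpf = verified[(1, 0)] / (verified[(1, 0)] + verified[(0, 0)])
    return tpf, fpf


def corrected_rates(verified, n_tested):
    """Correct for verification that depends only on Y (an assumption).

    n_tested[y] is the number of ALL subjects, verified or not, with Y = y.
    """
    n_total = n_tested[0] + n_tested[1]
    joint = {}
    for y in (0, 1):
        n_verified = verified[(y, 0)] + verified[(y, 1)]
        for dz in (0, 1):
            # P[D = dz | Y = y] from the verified, times P[Y = y] from everyone.
            joint[(y, dz)] = (verified[(y, dz)] / n_verified) * (n_tested[y] / n_total)
    tpf = joint[(1, 1)] / (joint[(1, 1)] + joint[(0, 1)])
    fpf = joint[(1, 0)] / (joint[(1, 0)] + joint[(0, 0)])
    return tpf, fpf


# Hypothetical counts; keys are (Y, D).
verified = {(1, 1): 80, (1, 0): 180, (0, 1): 2, (0, 0): 72}
n_tested = {1: 260, 0: 740}

naive_tpf, naive_fpf = naive_rates(verified)
corr_tpf, corr_fpf = corrected_rates(verified, n_tested)
print(f"naive:     TPF = {naive_tpf:.3f}, FPF = {naive_fpf:.3f}")
print(f"corrected: TPF = {corr_tpf:.3f}, FPF = {corr_fpf:.3f}")
```

The naive estimates overstate both TPF and FPF because test negatives are underrepresented among the verified. The corrected TPF rests on only two verified diseased test negatives, which illustrates the loss of precision noted above.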
Advanced Topics
Covariate adjustment
- adjust for covariates that impact the marker distribution in controls
  - e.g., center effects in multicenter studies
  - analogous to covariate adjustment in studies of association
  - the accuracy of the marker in a population with fixed covariate value
ROC regression
- model covariate effects on biomarker accuracy
  - e.g., disease severity
- fit regression model for ROC curve, as function of covariates
Time-dependent ROC curves
- model biomarker accuracy as a function of time between marker measurement and disease
  - e.g., the accuracy of PSA may decline with increasing time lag between sample collection and disease
- define time-dependent versions of TPF, FPF
- model accuracy as a function of time
Imperfect reference test
- account for lack of gold standard for D
  - e.g., questionnaire to diagnose depression
- various statistical approaches ... but is this a statistical problem?
Software
On DABS Center website: http://www.fhcrc.org/labs/pepe/dabs
- Stata packages for ROC analysis and sample size calculations by Pepe et al.
- R programs for time-dependent ROC curves by Patrick Heagerty
Websites
http://www.fhcrc.org/labs/pepe/dabs
DABS Center website. Contains datasets, software,references...
http://faculty.washington.edu/~azhou/books/software.doc

Lists some free and commercial computer programs. Also available through the Wiley website for Statistical Methods in Diagnostic Medicine by Zhou, Obuchowski and McClish, 2002.

http://xray.bsd.uchicago.edu/krl/roc_soft.htm

Charles Metz and colleagues at University of Chicago are pioneers in ROC analysis software. Developed with a focus on applications in radiologic imaging.
References

Study design

- Baker SG, Kramer BS, McIntosh M, Patterson BH, Shyr Y, Skates S. Evaluating markers for the early detection of cancer: overview of study designs and methods. Clinical Trials 3:43-56, 2006.
- Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, Lijmer JG, Moher D, Rennie D, de Vet HCW. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. Annals of Internal Medicine 138:40-44, 2003.
- Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, Moher D, Rennie D, de Vet HCW, Lijmer JG. The STARD statement for reporting studies of diagnostic accuracy: explanation and elaboration. Annals of Internal Medicine 138(1):W1-12, 2003.
- Janes H, Pepe MS. The optimal ratio of cases to controls for estimating the classification accuracy of a biomarker. Biostatistics 7(3):456-68, 2006.
- Pepe MS, Etzioni R, Feng Z, Potter JD, Thompson M, Thornquist M, Winget M, Yasui Y. Phases of biomarker development for early detection of cancer. Journal of the National Cancer Institute 93(14):1054–61, 2001.
- Zhou XH, McClish DK, Obuchowski NA. Statistical Methods in Diagnostic Medicine. Wiley Press, 2002.

Combining markers

- McIntosh MW and Pepe MS. Combining several screening tests: Optimality of the risk score. Biometrics 58:657–64, 2002.
- Pepe MS, Cai T, Longton G. Combining predictors for classification using the area under the ROC curve. Biometrics 62:221–229, 2006.

Covariate adjustment

- Janes H, Pepe MS. Matching in studies of classification accuracy: Implications for analysis, efficiency, and assessment of incremental value. Biometrics 64:1-9, 2008.
- Janes H, Pepe MS. Adjusting for Covariate Effects on Classification Accuracy Using the Covariate-Adjusted ROC Curve. Biometrika (in press).

ROC regression

- Alonzo TA and Pepe MS. Distribution-free ROC analysis using binary regression techniques. Biostatistics 3:421–32, 2002.
- Cai T and Pepe MS. Semi-parametric ROC analysis to evaluate biomarkers for disease. Journal of the American Statistical Association 97:1099–1107, 2002.
- Cai T. Semiparametric ROC regression analysis. Biostatistics 5(1):45–60, 2004.
- Dodd L, Pepe MS. Partial AUC estimation and regression. Biometrics 59:614–623, 2003.
- Dodd L, Pepe MS. Semi-parametric regression for the area under the Receiver Operating Characteristic Curve. Journal of the American Statistical Association 98:409–417, 2003.
- Heagerty PJ and Pepe MS. Semiparametric estimation of regression quantiles with application to standardizing weight for height and age in US children. Applied Statistics 48:533–51, 1999.

Time-dependent ROCs

- Cai T, Pepe MS, Zheng Y, Lumley T, Jenny NS. The sensitivity and specificity of markers for event times. Biostatistics 7:182–197, 2006.
- Heagerty PJ, Zheng Y. Survival model predictive accuracy and ROC curves. Biometrics 61:92–105, 2005.
- Zheng Y, Heagerty PJ. Semi-parametric estimation of time-dependent ROC curves for longitudinal marker data. Biostatistics 5:615–632, 2004.

Imperfect reference test

- Albert PS, Dodd LE. A cautionary note on robustness of latent class models for estimating diagnostic error without a gold standard. Biometrics 60:427–35, 2004.
- Albert PS, McShane LM, Shih JH, et al. Latent class modeling approaches for assessing diagnostic error without a gold standard. Biometrics 57:610–19, 2001.
- Alonzo TA and Pepe MS. Using a combination of reference tests to assess the accuracy of a new diagnostic test. Statistics in Medicine 18:2897-3003, 1999.
- Pepe MS, Janes H. Insights into latent class analysis. Biostatistics 8:474-84, 2007.
- Vacek PM. The effect of conditional dependence on the evaluation of diagnostic tests. Biometrics 41:959-68, 1985.

Verification bias

- Alonzo TA. Verification bias-corrected estimators of the relative true and false positive rates of two binary screening tests. Statistics in Medicine 24:403–417, 2005.
- Alonzo TA and Pepe MS. Assessing accuracy of a continuous screening test in the presence of verification bias. Journal of the Royal Statistical Society: Applied Statistics 54:173–190, 2005.
- Alonzo TA, Braun TB, Moskowitz CS. Small sample estimation of relative accuracy for binary screening tests. Statistics in Medicine 23:21–34, 2004.
- Alonzo TA, Kittelson JC. A novel design for estimating relative accuracy of screening tests when complete disease verification is not feasible. Biometrics DOI:10.1111/j.1541-0420.2005.00445.x (early online access).
- Begg CB and Greenes RA. Assessment of diagnostic tests when disease verification is subject to selection bias. Biometrics 39:207-15, 1983.
- Kosinski AS and Barnhart HX. Accounting for nonignorable verification bias in assessment of diagnostic tests. Biometrics 59:163-71, 2003.
- Obuchowski NA, Zhou X. Prospective studies of diagnostic test accuracy when disease prevalence is low. Biostatistics 3:477-92, 2002.
- Pepe MS and Alonzo TA. Comparing disease screening tests when true disease status is ascertained only for screen positives. Biostatistics 2:1–12, 2001.
- Pepe MS and Alonzo TA. Reply to Letter to Editor regarding Alonzo TA and Pepe MS, Assessing the accuracy of a new diagnostic test when a gold standard does not exist. Statistics in Medicine 20:656–660, 2001.
- Punglia et al. Effect of verification bias on screening for prostate cancer by measurement of PSA. NEJM 349:335-42, 2003.