A graphic approach to the evaluation of the performance of diagnostic tests using the SAS® software.

Renza Cristofani, Edoardo Bracci, Marco Ferdeghini

CNR - Institute of Clinical Physiology and Department of Public Health and Biostatistics, University of Pisa
CNR - CNUCE Institute, Pisa
Department of Nuclear Medicine, University of Pisa

Introduction

Diagnostic tests are currently used to classify, as correctly as possible, symptomatic patients into specific categories of pathology.

The performance of these tests is defined by their sensitivity, specificity and predictive values, the latter being strictly related to the prevalence of the disease that the test is intended to identify. All this information, easily obtained from the frequencies of true positive (TP), true negative (TN), false negative (FN) and false positive (FP) results, can be summarized graphically to simplify the comparison among different tests.

When the results of a test arise from a continuous variable, Receiver Operating Characteristic (ROC) curves are frequently used to graphically display sensitivity as a function of false positive rate. Once an optimal cut-off value has been selected for a given test, a double ring diagram can be useful to show the test performance taking into account the actual proportion of both correct and incorrect classifications.

Methods

The ROC curve plots sensitivity against (1 - specificity) at all possible threshold levels separating results classified as normal from those classified as abnormal. A test with a ROC curve coinciding with the diagonal is worthless, whereas a test is perfect when the curve reaches the top left corner (corresponding to sensitivity and specificity equal to 1, i.e., no misclassification).

We used an empirical ROC curve, in which each plotted point resulted from a sensitivity/specificity pair corresponding to the percentiles computed over the entire range of observed results.
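The construction of the empirical curve can be sketched as follows. This is a minimal illustration in Python, not the authors' SAS code: the function name `empirical_roc`, the use of the 1st to 99th percentiles as cut-offs, and the convention that higher values are abnormal are all assumptions made for the example.

```python
from statistics import quantiles


def empirical_roc(values, diseased):
    """Return (cutoff, sensitivity, specificity) for each percentile cut-off.

    values   : observed test results (one per patient)
    diseased : matching booleans, True for patients with the disease
    Assumption: a result >= cutoff is classified as abnormal (test positive).
    """
    # Candidate cut-offs: percentiles over the entire range of observed results.
    cutoffs = sorted(set(quantiles(values, n=100, method="inclusive")))
    n_pos = sum(1 for d in diseased if d)
    n_neg = len(diseased) - n_pos
    points = []
    for c in cutoffs:
        tp = sum(1 for v, d in zip(values, diseased) if d and v >= c)
        tn = sum(1 for v, d in zip(values, diseased) if not d and v < c)
        points.append((c, tp / n_pos, tn / n_neg))
    return points


if __name__ == "__main__":
    # Hypothetical data, for illustration only.
    results = [1, 2, 3, 4, 10, 11, 12, 13]
    status = [False, False, False, False, True, True, True, True]
    for cutoff, sens, spec in empirical_roc(results, status)[:5]:
        print(cutoff, sens, 1 - spec)
```

Each returned triple gives one point of the curve, plotted as sensitivity against (1 - specificity).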

We also marked, on the curve, the 'best' cut-off value, selected as the sensitivity/specificity pair that maximizes the function:

$$\text{sensitivity} - m\,(1 - \text{specificity}),$$

where the weight m, defined by Zweig and Campbell as:

$$m = \frac{\text{false positive cost}}{\text{false negative cost}} \times \frac{1 - \text{Prevalence}}{\text{Prevalence}}$$

can be optionally entered by the user; its default value is 1. When m is 1, the maximized function corresponds to the Youden index (sensitivity + specificity - 1). The default value is appropriate only if an equal cost of both possible misclassification errors can be assumed and the prevalence is quite near 0.5 (in the present application we only used this default value).
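The selection rule can be illustrated with a short Python sketch. The function names `weight_m` and `best_cutoff` and the three ROC points are hypothetical; only the criterion, sensitivity - m(1 - specificity), and the definition of m come from the text.

```python
def weight_m(fp_cost, fn_cost, prevalence):
    """Weight m: (false positive cost / false negative cost) * (1 - P) / P."""
    return (fp_cost / fn_cost) * ((1.0 - prevalence) / prevalence)


def best_cutoff(roc_points, m=1.0):
    """Pick the (cutoff, sensitivity, specificity) triple that maximizes
    sensitivity - m * (1 - specificity). With m = 1 this is the point
    with the largest Youden index."""
    return max(roc_points, key=lambda p: p[1] - m * (1.0 - p[2]))


if __name__ == "__main__":
    # Hypothetical ROC points: (cutoff, sensitivity, specificity).
    points = [(1, 1.0, 0.2), (2, 0.8, 0.9), (3, 0.4, 1.0)]
    print(best_cutoff(points))           # default weight m = 1
    print(best_cutoff(points, m=5.0))    # false positives weighted more heavily
```

Raising m penalizes false positives more strongly, so the selected cut-off moves toward higher specificity.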

Sensitivity and specificity, and therefore ROC curves, are independent of the prevalence of disease; conversely, the number of both correct and incorrect classification results is strictly dependent on such prevalence.
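The dependence of the predictive values on prevalence follows from Bayes' theorem and can be checked numerically. The sketch below is an illustration, not part of the original application: the function name `predictive_values` is hypothetical, the CA125 counts are those of the study's two-way table, and the 5% prevalence is an arbitrary value chosen for comparison.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive values from Bayes' theorem."""
    tp = sensitivity * prevalence                  # P(T+ and D+)
    fp = (1.0 - specificity) * (1.0 - prevalence)  # P(T+ and D-)
    tn = specificity * (1.0 - prevalence)          # P(T- and D-)
    fn = (1.0 - sensitivity) * prevalence          # P(T- and D+)
    return tp / (tp + fp), tn / (tn + fn)


if __name__ == "__main__":
    sens, spec = 111 / 148, 431 / 475   # CA125 counts from the study table
    for prev in (148 / 623, 0.05):      # observed prevalence vs. an assumed 5%
        ppv, npv = predictive_values(sens, spec, prev)
        print(round(prev, 2), round(ppv, 2), round(npv, 2))
```

With sensitivity and specificity held fixed, lowering the prevalence lowers the positive predictive value and raises the negative one.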

Thus, we used, in addition to the ROC curves, a double-ring diagram, which shows the test results on the inner ring and the fraction of disease on the outer one (see Fig. 2).

The central sectors correspond to true positive (TP), true negative (TN), false positive (FP) and false negative (FN) results.

TN and TP, with a uniform fill pattern, identify correct classifications; FN and FP, with a streaked fill pattern, correspond to misclassification areas. The latter areas result from the superimposition of the fill pattern corresponding to the test result onto the background corresponding to the disease status.

The ratio between the black area (TP) and the clockwise adjacent one (FP) gives an estimate of the positive predictive value (TP/(TP+FP)), whereas the ratio of the TP sector to the counter-clockwise adjacent one (FN) graphically estimates the sensitivity of the test (TP/(TP+FN)).

Likewise, the ratio between the grey area (TN) and the clockwise adjacent one (FN) gives an estimate of the negative predictive value (TN/(TN+FN)), whereas the ratio of the TN sector to the counter-clockwise adjacent one (FP) graphically estimates the specificity of the test (TN/(TN+FP)).
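The geometry of the central sectors can be sketched in Python. The clockwise order TP, FP, TN, FN is inferred from the adjacency relations described above; the starting angle of 0 degrees and the function name `ring_sectors` are assumptions of this example, which computes angles only and does not reproduce the Annotate drawing.

```python
def ring_sectors(tp, fp, fn, tn):
    """Return (name, start_angle, end_angle) in degrees, going clockwise.

    Clockwise order TP -> FP -> TN -> FN, so that:
      * FP is clockwise-adjacent to TP (together they form T+),
      * FN is counter-clockwise-adjacent to TP (together they form D+),
      * FN is clockwise-adjacent to TN (together they form T-),
      * FP is counter-clockwise-adjacent to TN (together they form D-).
    """
    total = tp + fp + fn + tn
    sectors = []
    start = 0.0  # assumption: the TP sector starts at 0 degrees
    for name, count in (("TP", tp), ("FP", fp), ("TN", tn), ("FN", fn)):
        extent = 360.0 * count / total
        sectors.append((name, start, start + extent))
        start += extent
    return sectors


if __name__ == "__main__":
    # CA125 counts from the study's two-way table.
    for name, a0, a1 in ring_sectors(tp=111, fp=44, fn=37, tn=431):
        print(name, round(a0, 1), round(a1, 1))
```

In this layout the D+ arc of the outer ring spans the FN and TP sectors and therefore wraps around the starting angle.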

This diagram is drawn using the 'best' cut-off level computed by the foregoing algorithm and displayed on the ROC curve.

The same cut-off value is used to build the frequency table and to compute the performance variables reported in the output.
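The performance variables follow directly from the four cell counts of the frequency table. A minimal Python sketch, with the hypothetical function name `performance` and the CA125 counts from the study table (TP=111, FP=44, FN=37, TN=431) as input:

```python
def performance(tp, fp, fn, tn):
    """Performance variables computed from a 2x2 frequency table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "pv_pos": tp / (tp + fp),
        "pv_neg": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


if __name__ == "__main__":
    for name, value in performance(tp=111, fp=44, fn=37, tn=431).items():
        print(f"{name} = {value:.2f}")
```

For these counts the rounded values are 0.75, 0.91, 0.72, 0.92 and 0.87, in agreement with those reported for CA125.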

Study group

In a group of 623 consecutive women undergoing laparotomy for adnexal masses, three mucinous tumour markers (CA125, CA15.3, TAG72) were measured by immunoradiometric assays.

One hundred and forty-eight women were shown to have an epithelial ovarian cancer (EOC) and 475 a benign ovarian disease (BOD).

In order to evaluate the possible age-dependence of the concentration levels of these markers, we stratified EOC and BOD patients into two groups, i.e., subjects younger or older than 50 years.

User interface

In our application the user must first specify, in the main menu, the name of the variable (test) to be analyzed and the data set in which it is recorded (see Fig. 1). Figure 2 shows the output obtained from the standard run of the program. Each graphic element of the display can be enlarged by using the ZOOM push button. The GOBACK button returns the user to the main menu. If the user specifies more than one test (at most three in the present application), the type of output to be compared (Table, Ring, ROC) must be selected (Fig. 3). Figures 4-6 show the possible results. In addition, the user can optionally specify the name of a stratification variable, age in our example, and the lower and upper interval limits for each established class (maximum three strata); the user can then compare the results obtained separately for each of the subgroups so defined. Figure 7 shows the menu applied to two age classes (less than 50 and over 50 years). Figures 8-10 show the resulting outputs.

Results

The SAS program produces a menu-driven display of ROC curves, double ring diagrams, two-way frequency tables and performance variables in different possible combinations. The results were obtained in a Windows environment, using the Frame technology to build the user interface and the Annotate facility to create and/or customize the graphic outputs.

The example shows three different applications:

1) the capability of the CA125 test to discriminate between EOC and BOD in the preoperative evaluation of adnexal masses; the information included in Figure 2 allows a performance evaluation of the test;

2) the comparison of the capability of CA125, CA15.3 and TAG72 to discriminate between ovarian malignancy and BOD: in particular, the examination of the three ROC plots shown in Figure 4 identifies the test with the highest values of sensitivity and specificity and the corresponding cut-off value. Figures 5 and 6 show the data (FP, FN, TP, TN, etc.) corresponding to the selected cut-off value, displayed as both double ring graphs and contingency tables. These data, in addition to the performance characteristics (sensitivity, specificity and predictive values), can be used to compare test effectiveness;

3) the different performance characteristics of CA125 in patients younger than 50 years and in those older than 50 years. The information in Figures 8-10 is the same as for the previous application. In this case, the test appears to have better performance characteristics in the older age group, probably due to the small proportion of diseased patients in the younger group, which makes the estimates of sensitivity, specificity and predictive values unstable.
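The instability of estimates based on few diseased patients can be made concrete with a confidence interval. The sketch below uses the Wilson score interval, which is a choice made for this illustration and not a method used in the original application; the function name `wilson_interval` is hypothetical. The inputs are the sensitivity counts for CA125 in the younger group (16 true positives among 27 diseased patients) and in the whole study group (111 among 148).

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half = z * sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return center - half, center + half


if __name__ == "__main__":
    for label, tp, diseased in (("younger than 50", 16, 27), ("all patients", 111, 148)):
        low, high = wilson_interval(tp, diseased)
        print(f"{label}: sensitivity {tp / diseased:.2f}, "
              f"interval width {high - low:.2f}")
```

The interval for the younger group is considerably wider, which is consistent with the caution expressed above about comparing the two age groups.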

Conclusions

The intrinsic value of a test is expressed by its sensitivity and specificity determined with respect to an external reference value (gold standard). However, in practical applications, the positive (and negative) predictive values are important to estimate the probability of disease in subjects with a positive (or negative) test result. Once an optimal cut-off value has been defined (optionally adjusted for the prevalence of disease and/or differential costs of false positive/false negative results), the double ring graph may be helpful in giving an immediate and straightforward visualization of the test results. The main advantage of this graphic representation over standard frequency tables is the capability of evaluating, at a glance, the actual proportion of correctly classified results when comparing several tests or different groups of subjects.

References

1) SAS Institute Inc., SAS/GRAPH® Software: Reference, Version 6.08, First Edition, Cary, NC: SAS Institute Inc., 1993.

2) SAS Institute Inc., SAS/AF® Software: Frame Entry, Usage and Reference, Version 6, First Edition, Cary, NC: SAS Institute Inc., 1993.

3) WJ Youden: Index for rating diagnostic tests. Cancer: 3, 217-35, 1950.

4) RS Galen, SR Gambino: Beyond normality: the predictive value and efficiency of medical diagnoses. J Wiley & Sons, Inc., New York, 1975.

5) K Linnett: A review on the methodology for assessing diagnostic tests. Clin. Chem.: 34, 1379-86, 1988.

6) S Wieand, MH Gail, BR James, KL James: A family of nonparametric statistics for comparing diagnostic markers with paired or unpaired data. Biometrika: 76, 585-592, 1989.

7) MA Stefadouros: A new system of visual presentation of analysis of test performance: the "double ring" diagram. J. Clin. Epidemiol.: 46, 1151-1158, 1993.

8) MH Zweig, G Campbell: Receiver-Operating Characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine. Clin. Chem.: 39, 561-577, 1993.

9) P Fioretti, A Gadducci, M Ferdeghini, R Bianchi: Tumour marker association in ovarian and cervical cancer. In Updating on tumour markers in tissues and biological fluids: 737-750, Edizioni Minerva Medica, Torino, 1993.


Appendix

Definitions and abbreviations

P: Prevalence of the disease under study, assessed in a reference group.

D+: Patients with the disease as defined according to a gold standard.

D-: Patients without the disease as defined according to a gold standard.

T+: Patients with a positive test result.

T-: Patients with a negative test result.

FN: Patients with a false negative test result.

FP: Patients with a false positive test result.

TN: Patients with a true negative test result.

TP: Patients with a true positive test result.

Sensitivity: Fraction of patients with the disease who are test positive: TP/(TP+FN), or probability of a positive test result in diseased patients: P(T+ | D+).

Specificity: Fraction of patients without the disease who are test negative: TN/(TN+FP), or probability of a negative test result in patients without the disease: P(T- | D-).

PV+: Predictive value of a positive test result. Fraction of patients with a positive test who have the disease, or probability of disease in patients with a positive test value: P(D+ | T+).

PV-: Predictive value of a negative test result. Fraction of patients with a negative test who do not have the disease, or probability of absence of disease in patients with a negative test value: P(D- | T-).

Accuracy: Fraction of subjects correctly classified by the test: (TP+TN)/(TP+TN+FP+FN).

ROC curve: Receiver Operating Characteristic curve.

SAS System®, SAS/GRAPH and SAS/AF are registered trademarks of SAS Institute Inc., Cary, NC, USA.

For additional information:

Renza Cristofani, Istituto Fisiologia Clinica CNR, via Trieste 41, 56100 Pisa. Tel: -39-50-502771. EMAIL: [email protected]

Edoardo Bracci, CNUCE-CNR, via S. Maria 36, 56100 Pisa. Tel: -39-50-593223. EMAIL: [email protected]

Fig. 1: Main menu, with entry fields for the data set names (here SEUGI.BODEOC), the tests to be analyzed (here CA125), the stratification variable (VARIABLE, with FROM/TO limits for up to three strata) and the weight values.

Fig. 2: Standard output for a single test (Data set: SEUGI.BODEOC; Test: CA125; Weight: 1.0), showing the ROC curve (horizontal axis: 1 - Specificity), the double ring diagram (legend: FP, FN, TP, TN, T+, T-, D+, D-) and the two-way frequency table:

          D+    D-    Total
T+       111    44     155
T-        37   431     468
Total    148   475     623

Sensitivity=0.75  Specificity=0.91  PV+=0.72  PV-=0.92  Total accuracy=0.87

Fig. 3: Menu for the comparison of several tests (Test1: CA125, Test2: CA153, Test3: TAG72; Weight1, Weight2, Weight3: 1.0), with the selection of the type of graph: ROC Plot, Ring Graph, Table output.

Fig. 4: ROC plots of Test1, Test2 and Test3.

Figs. 5 and 6: Double ring graphs (legend: FP, FN, TP, TN, T+, T-, D+, D-) and two-way frequency tables for Test1 (CA125), Test2 (CA153) and Test3 (TAG72), each with weight 1.0.

Test1 (CA125):

          D+    D-    Total
T+       111    44     155
T-        37   431     468
Total    148   475     623

Sensitivity=0.75  Specificity=0.91  PV+=0.72  PV-=0.92  Total accuracy=0.87

Test2 (CA153): D+ = 123, D- = 408, total = 531 (individual cell counts and total accuracy not legible).

Sensitivity=0.73  Specificity=0.87  PV+=0.63  PV-=0.91

Test3 (TAG72):

          D+    D-    Total
T+        89    27     116
T-        44   399     443
Total    133   426     559

Sensitivity=0.67  Specificity=0.94  PV+=0.77  PV-=0.90  Total accuracy=0.87

Fig. 7: Main menu for a stratified analysis: data set SEUGI.BODEOC and test CA125 entered for each stratum, stratification variable AGE with FROM/TO limits for each class, and the weight values.

Fig. 8: ROC plots of CA125 in the two age classes (Test1: CA125, Weight1: 1.0, Age1: up to 49 years; Test2: CA125, Weight2: 1.0, Age2: 50 to 99 years).

Figs. 9 and 10: Double ring graphs (legend: FP, FN, TP, TN, T+, T-, D+, D-) and two-way frequency tables of CA125 in the two age classes (Test1: Age1, up to 49 years; Test2: Age2, 50 to 99 years; weights 1.0).

Test1 (CA125, Age1):

          D+    D-    Total
T+        16    32      48
T-        11   341     352
Total     27   373     400

Sensitivity=0.59  Specificity=0.91  PV+=0.33  PV-=0.97  Total accuracy=0.89

Test2 (CA125, Age2): D+ = 121, D- = 102, total = 223 (individual cell counts and total accuracy not legible).

Sensitivity=0.79  Specificity=0.89  PV+=0.90  PV-=0.78