INDIAN PEDIATRICS 277 VOLUME 48__APRIL 17, 2011
Diagnostic tests play a vital role in modern medicine, not only for confirming the presence of disease but also for ruling out disease in an individual patient. Diagnostic tests with two outcome categories, such as a positive test (+) and a negative test (−), are known as dichotomous, whereas those with more than two categories, such as positive, indeterminate, and negative, are called polytomous tests. The validity of a dichotomous test compared with the gold standard is determined by sensitivity and specificity. These two components measure the inherent validity of a test.
A test is called continuous when it yields numeric values, such as bilirubin level, and nominal when it yields categories, such as the Mantoux test. Sensitivity and specificity can be calculated in both cases, but the ROC curve is applicable only to continuous or ordinal tests.
When the response of a diagnostic test is continuous or on an ordinal scale (minimum 5 categories), sensitivity and specificity can be computed across all possible threshold values. Sensitivity is inversely related to specificity, in the sense that sensitivity increases as specificity decreases across the various thresholds. The receiver operating characteristic (ROC) curve is the plot that displays the full picture of the trade-off between sensitivity and (1 − specificity) across a series of cut-off points. The area under the ROC curve is considered an effective measure of the inherent validity of a diagnostic test. This curve is useful for (i) finding the optimal cut-off point that least misclassifies diseased or non-diseased subjects; (ii) evaluating the discriminatory ability of a test to correctly pick diseased and non-diseased subjects; (iii) comparing the efficacy of two or more tests for assessing the same disease; and (iv) comparing two or more observers measuring the same test (inter-observer variability).
Receiver Operating Characteristic (ROC) Curve for Medical Researchers

RAJEEV KUMAR AND ABHAYA INDRAYAN

From the Department of Biostatistics and Medical Informatics, University College of Medical Sciences, Delhi, India. Correspondence to: Mr Rajeev Kumar, Department of Biostatistics and Medical Informatics, University College of Medical Sciences, Delhi 110 095. [email protected]

Sensitivity and specificity are two components that measure the inherent validity of a diagnostic test for dichotomous outcomes against a gold standard. The receiver operating characteristic (ROC) curve is the plot that depicts the trade-off between sensitivity and (1 − specificity) across a series of cut-off points when the diagnostic test is continuous or on an ordinal scale (minimum 5 categories). This is an effective method for assessing the performance of a diagnostic test. The aim of this article is to provide a basic conceptual framework and interpretation of ROC analysis to help medical researchers use it effectively. The ROC curve and its important components, such as the area under the curve, sensitivity at specified specificity and vice versa, and the partial area under the curve, are discussed. Various other issues, such as the choice between parametric and non-parametric methods, biases that affect the performance of a diagnostic test, sample size for estimating sensitivity, specificity, and area under the ROC curve, and details of commonly used software for ROC analysis, are also presented.

Key words: Sensitivity, Specificity, Receiver operating characteristic curve, Sample size, Optimal cut-off point, Partial area under the curve.

INTRODUCTION

This article provides a simple introduction to measures of validity: sensitivity, specificity, and area under the ROC curve, with their meaning and interpretation. Some popular procedures to find the optimal threshold point, possible biases that can affect ROC analysis, the sample size required for estimating sensitivity, specificity, and area under the ROC curve, and commonly used statistical software for ROC analysis and their specifications are also discussed.
A PubMed search of pediatric journals reveals that the ROC curve is extensively used for clinical decisions. For example, it was used to determine the validity of biomarkers such as serum creatine kinase muscle-brain fraction and lactate dehydrogenase (LDH) for diagnosis of perinatal asphyxia in symptomatic neonates delivered non-institutionally: the area under the ROC curve for serum creatine kinase muscle-brain fraction recorded at 8 hours was 0.82 (95% CI 0.69-0.94) and a cut-off point above 92.6 U/L was found best for classifying the subjects, while the area under the ROC curve for LDH at 72 hours was 0.99 (95% CI 0.99-1.00) and a cut-off point above 580 U/L was found optimal for classifying perinatal asphyxia in symptomatic neonates [1]. It has similarly been used for parameters such as mid-arm circumference at birth for detection of low birth weight [2], and first-day total serum bilirubin value to predict subsequent hyperbilirubinemia [3]. It is also used for evaluating model accuracy and validation, such as predicting death and survival in children or neonates admitted to the PICU based on child characteristics [4], and for comparing the predictability of mortality in extremely preterm neonates by birth weight with predictability by gestational age and by the clinical risk index for babies score [5].
SENSITIVITY AND SPECIFICITY
Two popular indicators of the inherent statistical validity of a medical test are the probabilities of a correct diagnosis by the test among the true diseased subjects (D+) and the true non-diseased subjects (D−). For a dichotomous response, the results in terms of test positive (T+) or test negative (T−) can be summarized in a 2×2 contingency table (Table I). The columns represent the dichotomous categories of true disease status and the rows represent the test results. True status is assessed by a gold standard. This standard may be another, more expensive diagnostic method, a combination of tests, or may be available from clinical follow-up, surgical verification, biopsy, autopsy, or a panel of experts.

Sensitivity or true positive rate (TPR) is the conditional probability of correctly identifying the diseased subjects by the test: SN = P(T+|D+) = TP/(TP + FN); and specificity or true negative rate (TNR) is the conditional probability of correctly identifying the non-diseased subjects by the test: SP = P(T−|D−) = TN/(TN + FP). False positive rate (FPR) and false negative rate (FNR) are the two other common terms: the conditional probability of a positive test in non-diseased subjects, P(T+|D−) = FP/(FP + TN), and the conditional probability of a negative test in diseased subjects, P(T−|D+) = FN/(TP + FN), respectively.
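As a minimal sketch of these definitions (the counts below are hypothetical, not taken from the article), the four rates can be computed directly from the cells of the 2×2 table:

```python
# Hypothetical 2x2 counts for illustration (not the article's data)
TP, FP, FN, TN = 45, 10, 5, 40

sensitivity = TP / (TP + FN)   # P(T+|D+), true positive rate
specificity = TN / (TN + FP)   # P(T-|D-), true negative rate
fpr = FP / (FP + TN)           # P(T+|D-), false positive rate
fnr = FN / (TP + FN)           # P(T-|D+), false negative rate

print(sensitivity, specificity, fpr, fnr)  # 0.9 0.8 0.2 0.1
```

Note that FPR = 1 − specificity and FNR = 1 − sensitivity, which is why the ROC curve can be drawn from sensitivity and specificity alone.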
Calculation of the sensitivity and specificity of various values of mid-arm circumference (cm) for detecting low birth weight on the basis of hypothetical data is given in Table II as an illustration. The same data are used later to draw a ROC curve.
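In the same spirit as Table II, a threshold sweep over hypothetical mid-arm circumference values (the article's actual data are not reproduced here) can be sketched as:

```python
# Hypothetical measurements (cm); low values indicate low birth weight (disease)
low_bw    = [8.1, 8.4, 8.6, 8.9, 9.0, 9.3]
normal_bw = [9.1, 9.4, 9.6, 9.8, 10.0, 10.2]

def sens_spec(cutoff, diseased, non_diseased):
    """Call the test 'positive' when the value is <= cutoff."""
    sn = sum(x <= cutoff for x in diseased) / len(diseased)
    sp = sum(x > cutoff for x in non_diseased) / len(non_diseased)
    return sn, sp

for c in (8.5, 9.0, 9.5):
    sn, sp = sens_spec(c, low_bw, normal_bw)
    print(f"cutoff {c} cm: sensitivity {sn:.2f}, specificity {sp:.2f}")
```

Raising the cutoff increases sensitivity and decreases specificity, which is exactly the trade-off the ROC curve displays.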
ROC CURVE
The ROC curve is a graphical display of sensitivity (TPR) on the y-axis against (1 − specificity) (FPR) on the x-axis for varying cut-off points of the test values.
TABLE I DIAGNOSTIC TEST RESULTS IN RELATION TO TRUE DISEASE STATUS IN A 2×2 TABLE

Diagnostic test result | Disease present | Disease absent | Total
Present | True positive (TP) | False positive (FP) | All test positive (T+)
Absent | False negative (FN) | True negative (TN) | All test negative (T−)
Total | Total with disease (D+) | Total without disease (D−) | Total sample size
The ROC curve is generally depicted in a square box for convenience, with both axes running from 0 to 1. Figure 1 depicts a ROC curve and its important components, as explained later. The area under the curve (AUC) is an effective combined measure of sensitivity and specificity for assessing the inherent validity of a diagnostic test. The maximum AUC = 1, meaning the diagnostic test is perfect in differentiating diseased from non-diseased subjects. This implies that both sensitivity and specificity are one, and both errors (false positive and false negative) are zero. This can happen only when the distributions of test values in the diseased and non-diseased groups do not overlap, which is extremely unlikely in practice. An AUC closer to 1 indicates better performance of the test.
The diagonal joining the point (0, 0) to (1, 1) divides the square into two equal parts, each with an area equal to 0.5. When the ROC curve coincides with this line, there is overall a 50-50 chance that the test will correctly discriminate the diseased and non-diseased subjects. The minimum value of AUC should be considered 0.5 instead of 0, because AUC = 0 means the test classified all subjects with disease as negative and all non-diseased subjects as positive. If the test results are reversed, then area = 0 is transformed to area = 1; thus a perfectly inaccurate test can be transformed into a perfectly accurate test!
ADVANTAGES OF THE ROC CURVE
The ROC curve has the following advantages compared with a single value of sensitivity and specificity at a particular cut-off.

1. The ROC curve displays all possible cut-off points, and one can read off the optimal cut-off for correctly identifying diseased or non-diseased subjects as per the procedures given later.
TABLE II HYPOTHETICAL DATA SHOWING THE SENSITIVITY AND SPECIFICITY AT VARIOUS CUT-OFF POINTS OF MID-ARM CIRCUMFERENCE TO DETECT LOW BIRTH WEIGHT

Mid-arm circumference (cm) | Low birth weight | Normal birth weight | Sensitivity | Specificity
2. The ROC curve is independent of the prevalence of disease, since it is based on sensitivity and specificity, which are known to be independent of prevalence [6-7].

3. Two or more diagnostic tests can be visually compared simultaneously in one figure.

4. Sometimes sensitivity is more important than specificity, or vice versa; the ROC curve helps in finding the required value of sensitivity at a fixed value of specificity.

5. The empirical area under the ROC curve (explained later) is invariant with respect to the addition or subtraction of a constant, or to transformations such as log or square root [8]. The invariance under log or square-root transformation does not hold for the binormal ROC curve, which is also explained shortly.

6. Useful summary measures can be obtained for determining the validity of a diagnostic test, such as the AUC and the partial area under the curve.
NON-PARAMETRIC AND PARAMETRIC METHODS TO OBTAIN AREA UNDER THE ROC CURVE
Statistical software provides non-parametric and parametric methods for obtaining the area under the ROC curve. The user has to make a choice. The following details may help.
Non-parametric Approach
This does not require any distributional assumption about the test values, and the resulting area under the ROC curve is called empirical. The first such method uses the trapezoidal rule. It calculates the area by joining the points (1 − SP, SN) at each interval of the observed values of the continuous test and dropping straight lines to the x-axis. This forms several trapezoids (Fig. 2) whose areas can be easily calculated and summed. Figure 2 is drawn for the mid-arm circumference and low birth weight data in Table II. Another non-parametric method uses the Mann-Whitney statistic, also known as the Wilcoxon rank-sum statistic or the c-index, to calculate the area. Both these methods of estimating the AUC have been found equivalent [7].
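A sketch of both empirical approaches on hypothetical data (assuming higher values indicate disease) shows that they produce the same AUC:

```python
# Hypothetical test values; higher values indicate disease
diseased     = [3.1, 4.2, 4.8, 5.5, 6.0]
non_diseased = [1.9, 2.5, 3.0, 3.6, 4.5]

def roc_points(d, n):
    """Empirical ROC points (FPR, TPR) over all observed cut-offs."""
    pts = [(0.0, 0.0)]
    for c in sorted(set(d) | set(n), reverse=True):
        tpr = sum(x >= c for x in d) / len(d)
        fpr = sum(x >= c for x in n) / len(n)
        pts.append((fpr, tpr))
    return pts

def trapezoid_auc(pts):
    """Sum the trapezoids formed between successive ROC points."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

def mann_whitney_auc(d, n):
    """P(diseased value > non-diseased value), ties counted half."""
    score = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in d for y in n)
    return score / (len(d) * len(n))

print(trapezoid_auc(roc_points(diseased, non_diseased)),
      mann_whitney_auc(diseased, non_diseased))  # the two estimates agree
```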
Standard errors (SE) are needed to construct a confidence interval. Three methods have been suggested for estimating the SE of the empirical area under the ROC curve [7,9-10]. These have been found similar when the sample size is greater than 30 in each group, provided the test values are on a continuous scale [11]. For small sample sizes it is difficult to recommend any one method. For discrete ordinal outcomes, the Bamber method [9] and the DeLong method [10] give equally good results, and better than the Hanley and McNeil method [7].
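As a reference sketch, the Hanley and McNeil [7] standard error of an empirical AUC can be computed from the AUC and the two group sizes; the values passed below are illustrative assumptions, not the article's data:

```python
import math

def hanley_mcneil_se(auc, n_diseased, n_non_diseased):
    """SE of an empirical AUC by the Hanley-McNeil (1982) formula."""
    q1 = auc / (2 - auc)           # prob. two random diseased both exceed one non-diseased
    q2 = 2 * auc ** 2 / (1 + auc)  # prob. one diseased exceeds two random non-diseased
    var = (auc * (1 - auc)
           + (n_diseased - 1) * (q1 - auc ** 2)
           + (n_non_diseased - 1) * (q2 - auc ** 2)) / (n_diseased * n_non_diseased)
    return math.sqrt(var)

# Assumed AUC 0.90 with 50 subjects per group
print(round(hanley_mcneil_se(0.90, 50, 50), 4))
```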
Parametric Methods
These are used when the statistical distribution of diagnostic test values in the diseased and non-diseased groups is known. The binormal distribution is commonly used for this purpose. It is applicable when test values in both diseased and non-diseased subjects follow a normal distribution. If the data are actually binormal, or a transformation such as log, square root, or Box-Cox [12] makes them binormal, then the relevant parameters can be easily estimated from the means and variances of the test values in diseased and non-diseased subjects. Details are available elsewhere [13].
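Under the binormal model the AUC has a closed form, AUC = Φ((μD − μN)/√(σD² + σN²)), when higher values indicate disease. A minimal sketch with assumed parameters (not the article's data):

```python
from statistics import NormalDist

def binormal_auc(mu_d, sd_d, mu_n, sd_n):
    """AUC for normally distributed test values in the diseased (D) and
    non-diseased (N) groups, assuming higher values indicate disease."""
    return NormalDist().cdf((mu_d - mu_n) / (sd_d ** 2 + sd_n ** 2) ** 0.5)

# Assumed group means and SDs for illustration
print(round(binormal_auc(mu_d=10.0, sd_d=2.0, mu_n=7.0, sd_n=1.5), 3))  # 0.885
```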
Another parametric approach is to transform thetest results into an unknown monotone form whenboth the diseased and non-diseased populations
FIG. 2 Comparison of empirical and binormal ROC curves for the hypothetical neonatal data in Table II (x-axis: 1 − specificity; y-axis: sensitivity).
follow a binormal distribution [14]. This approach first discretizes the continuous data into a maximum of 20 categories, then uses the maximum likelihood method to estimate the parameters of the binormal distribution, and calculates the AUC and its standard error. The ROCKIT package, containing the ROCFIT method, uses this approach to draw the ROC curve, estimate the AUC, compare two tests, and calculate the partial area [15].
The choice of method to calculate the AUC for continuous test values essentially depends upon the availability of statistical software. The binormal and ROCFIT methods produce results similar to the non-parametric method when the distribution is binormal [16]. For unimodal skewed distributions, a Box-Cox transformation that makes the test values binormal, and the ROCFIT method, give results similar to the non-parametric method, but the former two approaches have the additional useful property of providing a smooth curve [16,17]. When software for both parametric and non-parametric methods is available, the conclusion should be based on the method that yields greater precision in estimating the AUC. However, for a bimodal distribution (having two peaks), which is rarely found in medical practice, Mann-Whitney gives more accurate estimates compared to parametric methods [16]. The parametric method gives a small bias for discrete test values compared to the non-parametric method [13].
In our data, the area under the curve of mid-arm circumference for indicating low birth weight is 0.9142 by the trapezoidal rule and 0.9144 by the Mann-Whitney U. The SE is also nearly equal by the three methods in these data: DeLong SE = 0.0128, Bamber SE = 0.0128, and Hanley and McNeil SE = 0.0130. For the parametric method, a smooth ROC curve was obtained under the binormal assumption (Fig. 2), and the area under the curve, calculated using the means and standard deviations of mid-arm circumference in normal and low birth weight neonates, is 0.9427 with SE = 0.0148 in this example. The binormal method showed a higher area than the non-parametric method, which might be due to violation of the binormal assumption in this case. The binormal ROC curve is initially above the empirical curve (Fig. 2), suggesting higher sensitivity compared to the empirical values in this range. When (1 − specificity) lies between 0.1 and 0.2, the binormal curve is below the empirical curve, suggesting comparatively lower sensitivity. When (1 − specificity) is greater than 0.2, the curves almost overlap, suggesting both methods give similar sensitivity. The AUC by the ROCFIT method is 0.9161 and its standard error is 0.0100. This AUC is similar to the non-parametric estimate, though the standard error is a little smaller than that of the non-parametric method. The data in our example have a unimodal skewed distribution, and the results agree with previous simulation studies [16,17] on such data. All calculations for this example were done using MS Excel and STATA statistical software.
Interpretation of ROC Curve
The total area under the ROC curve is a single index for measuring the performance of a test. The larger the AUC, the better is the overall performance of the diagnostic test in correctly picking up diseased and non-diseased subjects. Equal AUCs of two tests represent similar overall performance, but this does not necessarily mean that both curves are identical; they may cross each other. Three common interpretations of the area under the ROC curve are: (i) the average value of sensitivity over all possible values of specificity; (ii) the average value of specificity over all possible values of sensitivity [13]; and (iii) the probability that a randomly selected patient with disease has a test result indicating greater suspicion than a randomly selected patient without disease [10], when higher values of the test are associated with disease and lower values with non-disease. This last interpretation is the basis of the non-parametric Mann-Whitney U statistic for calculating the AUC.
Figure 3 depicts three different ROC curves. Considering the area under the curve, test A is better than both B and C, and its curve is closer to perfect discrimination. Test B has good validity and test C moderate validity.
Hypothetical ROC curves of three diagnostictests A, B, and C applied on the same subjects to
classify the same disease are shown in Fig. 4. Tests B (AUC = 0.686) and C (AUC = 0.679) have nearly equal areas but cross each other, whereas test A (AUC = 0.805) has a higher AUC than curves B and C. The overall performance of test A is better than that of test B as well as test C at all threshold points. Test C performs better than test B where high sensitivity is required, and test B performs better than C when high specificity is needed.
Sensitivity at Fixed Point and Partial Area Under the ROC Curve
The choice of a fixed specificity or a range of specificities depends upon the clinical setting. For example, to diagnose a serious disease such as cancer in a high-risk group, a test with higher sensitivity is preferred even if the false positive rate is high, because a test giving more false negatives is more dangerous. On the other hand, in screening a low-risk group, high specificity is required when the subsequent confirmatory test is invasive and costly, so that the false positive rate is low and patients do not unnecessarily suffer pain and expense. The cut-off point should be decided accordingly. Sensitivity at a fixed point of specificity (or vice versa) and the partial area under the curve are more suitable for determining the validity of a diagnostic test in the above-mentioned clinical settings, and also for comparing two diagnostic tests, applied to the same or independent patients, when their ROC curves cross each other.
The partial area is defined as the area between a range of false positive rates (FPR) or between two sensitivities. Both parametric (binormal assumption) and non-parametric methods are available in the literature [13,18], but most statistical software does not have an option to calculate the partial area under the ROC curve (Table III). STATA software has all the features for ROC analysis.
Standardization of the partial area by dividing it by the maximum possible area (equal to the width of the interval of the selected range of FPRs or sensitivities) has been recommended [19]. It can be interpreted as the average sensitivity for the range of selected specificities. This standardization makes the partial area more interpretable, and its maximum value is 1. Figure 5 shows that the partial area under the curve for FPR from 0.3 to 0.5 is 0.132 for test A and 0.140 for test B. After standardization, these become 0.660 and 0.700,
FIG. 3 Comparison of three smooth ROC curves with different areas.

FIG. 4 Three empirical ROC curves. Curves B and C cross each other but have nearly equal areas; curve A has a bigger area.
TABLE III SOME POPULAR ROC ANALYSIS SOFTWARE AND THEIR IMPORTANT FEATURES*

Name of ROC analysis software | Methods used to estimate AUC and its variance | Comparison of two or more ROC curves | Partial area: calculation | Partial area: comparison | Important by-products

Medcalc version 11.3; commercial software, trial version available at: www.medcalc.be | Non-parametric | Available | Not available | Not available | Sensitivity, specificity, LR+, LR− with 95% CI at each possible cut-point

SPSS version 17.0; commercial software | Non-parametric | Not available | Not available | Not available | Sensitivity and specificity at each cut-off point; no 95% CI

STATA version 11; commercial software | Non-parametric and parametric (Metz et al.) | Available for paired and unpaired subjects, with Bonferroni adjustment when 3 or more curves are to be compared | Available at specified specificity and sensitivity range | Only two partial AUCs can be compared | Sensitivity, specificity, LR+, LR− at each cut-off point, but no 95% CI

ROCKIT (beta version) (Metz et al.); free software, available at www.radiology.uchicago.edu | Parametric | Available for both paired and unpaired subjects | Available | Available | Sensitivity and specificity at each cut-point

Analyse-it; commercial software, add-on in MS Excel; trial version available at www.analyse-it.com | Non-parametric | Available; no option for paired and unpaired subjects | Not available | Not available | Sensitivity, specificity, LR+, LR− with 95% CI at each possible cut-point; various decision plots

XLstat 2010; commercial software, add-on in MS Excel; trial version available at: www.xlstat.com | Non-parametric | Available for both paired and unpaired subjects | Not available | Not available | Sensitivity, specificity, LR+, LR− with 95% CI at each possible cut-point; various decision plots

SigmaPlot; commercial software, add-on in MS Excel; trial version available at: www.sigmaplot.com | Non-parametric | Available for both paired and unpaired subjects | Not available | Not available | Sensitivity, specificity, LR+, LR− with 95% CI at each possible cut-point; various decision plots

*LR+ = positive likelihood ratio; LR− = negative likelihood ratio. Paired subjects means both diagnostic tests (test and gold) applied to the same subjects; unpaired subjects means diagnostic tests applied to different subjects. CI = confidence interval; AUC = area under curve; ROC = receiver operating characteristic.
respectively. The proportion of the partial area will depend on the range of FPRs selected by the researcher. It may lie on one side of the intersection point of the ROC curves or on both sides of it. In Figure 5, SN(A) and SN(B) are sensitivities at a specific value of FPR. For example, sensitivity at FPR = 0.3 is 0.545 for test A and 0.659 for test B. Similarly, sensitivity at fixed FPR = 0.5 is 0.76 for test A and 0.72 for test B. All these calculations were done with STATA (version 11) statistical software using the comproc command with option pcvmeth(empirical).
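The standardization described above is a one-line computation; using the partial areas quoted in the text (0.132 and 0.140 over the FPR range 0.3 to 0.5):

```python
def standardized_partial_auc(partial_area, fpr_low, fpr_high):
    """Divide the partial area by its maximum possible value (the width of
    the FPR interval); the result is the average sensitivity over that range."""
    return partial_area / (fpr_high - fpr_low)

print(round(standardized_partial_auc(0.132, 0.3, 0.5), 3))  # test A
print(round(standardized_partial_auc(0.140, 0.3, 0.5), 3))  # test B
```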
METHOD TO FIND THE OPTIMAL THRESHOLD POINT
The optimal threshold is the point that gives maximum correct classification. Three criteria are used to find the optimal threshold point from a ROC curve. The first two methods give equal weight to sensitivity and specificity and impose no ethical, cost, or prevalence constraints. The third criterion considers cost, which mainly includes the financial cost of correct and false diagnoses, the cost of discomfort caused to the person by treatment, and the cost of further investigation when needed. This method is rarely used in the medical literature because it is difficult to implement. These three criteria are known as the point on the curve closest to (0, 1), the Youden index, and the minimize-cost criterion, respectively.
The distance between the point (0, 1) and any point on the ROC curve is d² = (1 − SN)² + (1 − SP)². To obtain the optimal cut-off point to discriminate the diseased from the non-diseased subjects, calculate this distance for each observed cut-off point and locate the point where the distance is minimum. Most ROC analysis software (Table III) calculates the sensitivity and specificity at all observed cut-off points, allowing you to do this exercise.
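A sketch of this criterion over a list of (cutoff, sensitivity, specificity) triples (the values are hypothetical, not the article's):

```python
# Hypothetical (cutoff, sensitivity, specificity) triples for illustration
points = [(8.5, 0.40, 0.98), (9.0, 0.70, 0.92), (9.5, 0.85, 0.80), (10.0, 0.95, 0.55)]

def closest_to_corner(points):
    """Choose the cutoff minimising d^2 = (1 - SN)^2 + (1 - SP)^2."""
    return min(points, key=lambda p: (1 - p[1]) ** 2 + (1 - p[2]) ** 2)

print(closest_to_corner(points))  # the triple nearest the top-left corner (0, 1)
```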
The second is the Youden index [20], which maximizes the vertical distance from the line of equality to the point [x, y], as shown in Fig. 1. The x-axis represents (1 − specificity) and the y-axis represents sensitivity. In other words, the Youden index J corresponds to the point on the ROC curve which is farthest from the line of equality (the diagonal line). The main aim of the Youden index is to maximize the difference between TPR (SN) and FPR (1 − SP), and a little algebra yields J = max[SN + SP − 1]. The value of J for a continuous test can be located by searching plausible cut-offs for the point where the sum of sensitivity and specificity is maximum. The Youden index is the more commonly used criterion because it reflects the intention to maximize the correct classification rate and is easy to calculate. Many authors advocate this criterion [21]. The third method, which considers cost, is rarely used in the medical literature and is described in [13].
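The Youden criterion is an equally small computation; on the same style of hypothetical (cutoff, sensitivity, specificity) triples:

```python
# Hypothetical (cutoff, sensitivity, specificity) triples for illustration
points = [(8.5, 0.40, 0.98), (9.0, 0.70, 0.92), (9.5, 0.85, 0.80), (10.0, 0.95, 0.55)]

def youden_optimal(points):
    """Choose the cutoff maximising J = SN + SP - 1."""
    return max(points, key=lambda p: p[1] + p[2] - 1)

cutoff, sn, sp = youden_optimal(points)
print(cutoff, sn + sp - 1)  # optimal cutoff and its Youden index J
```

The Youden and closest-to-(0, 1) criteria can disagree on asymmetric curves, although on this small hypothetical set they select the same cutoff.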
BIASES THAT CAN AFFECT ROC CURVE RESULTS
In this section we describe the more prevalent biases that affect sensitivity and specificity and consequently may affect the area under the ROC curve. The interested researcher can find a detailed description of these and other biases, such as withdrawal bias, lost-to-follow-up bias, spectrum bias, and population bias, elsewhere [22,23].
1. Gold standard: The validity of the gold standard is important; ideally it should be error free, and the diagnostic test under review should be independent of the gold standard, as dependence can spuriously increase the area under the curve. The gold standard can be clinical follow-up, surgical verification, biopsy or autopsy, or in some cases
FIG. 5 The partial area under the curve and sensitivity at a fixed point of specificity (see text).
the opinion of a panel of experts. When the gold standard is imperfect, such as peripheral smear for malaria parasites [24], the sensitivity and specificity of the test are underestimated [22].

2. Verification bias: This occurs when not all subjects receive the same gold standard, for reasons such as economic constraints or clinical considerations. For example, in evaluating breast bone density as a screening test for breast cancer, only women with higher values of breast bone density are referred for biopsy, while those with lower but suspect values are followed clinically. In this case, verification bias would overestimate the sensitivity of the breast bone density test.

3. Selection bias: Selection of the right patients with and without disease is important, because some tests produce perfect results in a severely diseased group but fail to detect mild disease.

4. Test review bias: The clinician should be blind to the actual diagnosis while evaluating a test. Knowledge that a subject is disease-positive or disease-negative may influence the reading of the test result.

5. Inter-observer bias: In studies where observer abilities are important in diagnosis, such as bone density assessment through MRI, an experienced radiologist and a junior radiologist may differ. If both are used in the same study, observer bias is apparent.

6. Co-morbidity bias: Sometimes patients have other known or unknown diseases which may affect the positivity or negativity of the test. For example, NESTROFT (naked eye single tube red cell osmotic fragility test), used for screening for thalassemia in children, shows good sensitivity in patients without any other hemoglobin disorders but also produces positive results when other hemoglobin disorders are present [25].

7. Uninterpretable test results: This bias occurs when the test provides results that cannot be interpreted and the clinician excludes these subjects from the analysis. This results in overestimation of the validity of the test.

It is difficult to rule out all biases, but the researcher should be aware of them and try to minimize them.
TABLE IV SAMPLE SIZE FORMULAE FOR ESTIMATING SENSITIVITY, SPECIFICITY, AND AREA UNDER THE ROC CURVE

Problem | Formula | Description of symbols used

Estimating the sensitivity of a test | n = Z²_(1−α/2) SN(1 − SN) / (d² Prev) | SN = anticipated sensitivity; Prev = prevalence of disease in the population, obtainable from previous literature or a pilot study; d = required absolute precision on either side of the sensitivity

Estimating the specificity of a test | n = Z²_(1−α/2) SP(1 − SP) / (d² (1 − Prev)) | SP = anticipated specificity; Prev = prevalence of disease in the population; d = required absolute precision on either side of the specificity

Estimating the area under the ROC curve | nD = Z²_(1−α/2) V(AUC) / d²; total n = nD(1 + k) | V(AUC) = anticipated variance of the anticipated area under the ROC curve; nD = number of diseased subjects; k = ratio of non-diseased to diseased subjects; d = required absolute precision on either side of the area under the curve

Z_(1−α/2) is a standard normal value and 1 − α is the confidence level: Z_(1−α/2) = 1.645 for α = 0.10 and Z_(1−α/2) = 1.96 for α = 0.05.
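The first two formulae in Table IV translate directly into code; the anticipated values used below (SN = SP = 0.90, Prev = 0.20, d = 0.05, α = 0.05) are assumptions for illustration:

```python
import math

Z = 1.96  # Z_(1-alpha/2) for alpha = 0.05

def n_for_sensitivity(sn, prev, d, z=Z):
    """Total sample size to estimate sensitivity sn with absolute precision d."""
    return math.ceil(z ** 2 * sn * (1 - sn) / (d ** 2 * prev))

def n_for_specificity(sp, prev, d, z=Z):
    """Total sample size to estimate specificity sp with absolute precision d."""
    return math.ceil(z ** 2 * sp * (1 - sp) / (d ** 2 * (1 - prev)))

print(n_for_sensitivity(0.90, 0.20, 0.05))  # 692
print(n_for_specificity(0.90, 0.20, 0.05))  # 173
```

The Prev and (1 − Prev) terms in the denominators inflate the total sample so that enough diseased (or non-diseased) subjects are expected within it.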
SAMPLE SIZE
Adequate power of a study depends upon the sample size. Power is the probability that a statistical test will indicate a significant difference when a certain pre-specified difference is actually present. In a survey of eight leading journals, only two out of 43 diagnostic studies reported an a priori calculation of sample size [26]. In an estimation set-up, an adequate sample size ensures the study will yield the estimate with the desired precision. A small sample size produces an imprecise or inaccurate estimate, while an unnecessarily large sample size wastes resources, especially when the test is expensive. The sample size formula depends upon whether the interest is in estimation or in testing a hypothesis. Table IV provides the required formulae for estimating sensitivity, specificity, and AUC. These are based on the normal distribution or asymptotic assumption (large sample theory), which is generally used for sample size calculation.
The variance of the AUC, required in formula 3 (Table IV), can be obtained using either the parametric or the non-parametric method. It may also be available in the literature from previous studies. If no previous study is available, a pilot study is done to get workable estimates for calculating the sample size. For pilot study data, appropriate statistical software can provide an estimate of this variance.
Formulae of sample size for testing a hypothesis on sensitivity-specificity or on the AUC against a pre-specified value, and for comparisons on the same or different subjects, are complex; refer to [13] for details. A nomogram has been devised to read the sample size for anticipated sensitivity and specificity at the 90%, 95%, and 99% confidence levels [27].
There are many more topics for the interested reader to explore, such as combining multiple ROC curves for meta-analysis, ROC analysis to predict more than one alternative, ROC analysis in a clustered environment, and ROC analysis for tests repeated over time. For these, see [13,28]. For predictivity-based ROC, see [6].
Contributors: Both authors contributed to the concept, review of literature, and drafting of the paper, and approved the final version.
Funding: None.
Competing interests: None stated.
REFERENCES
1. Reddy S, Dutta S, Narang A. Evaluation of lactate dehydrogenase, creatine kinase and hepatic enzymes for the retrospective diagnosis of perinatal asphyxia among sick neonates. Indian Pediatr. 2008;45:144-7.
2. Sood SL, Saiprasad GS, Wilson CG. Mid-arm circumference at birth: A screening method for detection of low birth weight. Indian Pediatr. 2002;39:838-42.
3. Randev S, Grover N. Predicting neonatal hyperbilirubinemia using first day serum bilirubin levels. Indian J Pediatr. 2010;77:147-50.
4. Vasudevan A, Malhotra A, Lodha R, Kabra SK. Profile of neonates admitted in pediatric ICU and validation of score for neonatal acute physiology (SNAP). Indian Pediatr. 2006;43:344-8.
5. Khanna R, Taneja V, Singh SK, Kumar N, Sreenivas V,Puliyel JM. The clinical risk index of babies (CRIB)score in India. Indian J Pediatr. 2002;69:957-60.
6. Indrayan A. Medical Biostatistics (Second Edition).Boca Raton: Chapman & Hall/ CRC Press; 2008. p.263-7.
7. Hanley JA, McNeil BJ. The meaning and use of the areaunder a receiver operating characteristic (ROC) curve.Radiology. 1982;143:29-36.
8. Campbell G. Advances in statistical methodology forevaluation of diagnostic and laboratory tests. Stat Med.1994;13:499-508.
KEY MESSAGES
For sensitivity and specificity to be valid indicators, the gold standard should be nearly perfect.
For a continuous test, the optimal cut-off point to correctly pick up diseased and non-diseased cases is the point where the sum of sensitivity and specificity is maximum, when equal weight is given to both.
An ROC curve is obtained for a test on a continuous scale. The area under the ROC curve is an index of the overall inherent validity of a test; the curve is also used for comparing sensitivity and specificity at particular cut-offs of interest.
The area under the curve is not an accurate index for comparing two tests when their ROC curves cross each other.
Adequate sample size and proper design are required to yield valid and unbiased results.
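The optimal cut-off described above, which maximises sensitivity + specificity (equivalently, Youden's index J = sensitivity + specificity − 1 [20]), can be found by a simple sweep over the observed test values. A minimal sketch, assuming higher scores indicate disease; the function name and toy data are illustrative:

```python
def youden_optimal_cutoff(diseased, nondiseased):
    """Return the cut-off maximising Youden's index
    J = sensitivity + specificity - 1, together with J itself.

    Assumes higher test values indicate disease; a subject is
    classified as positive when the score is >= the cut-off.
    """
    best_cut, best_j = None, -1.0
    for cut in sorted(set(diseased) | set(nondiseased)):
        sens = sum(x >= cut for x in diseased) / len(diseased)
        spec = sum(x < cut for x in nondiseased) / len(nondiseased)
        j = sens + spec - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j

# Illustrative scores for diseased and non-diseased subjects.
cut, j = youden_optimal_cutoff([3, 4, 5, 6], [1, 2, 3, 4])
print(cut, j)
```

Because ties in J are possible (as in this toy example), practical software may report several candidate cut-offs; weighting sensitivity and specificity unequally changes the criterion accordingly.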
KUMAR AND INDRAYAN RECEIVER OPERATING CHARACTERISTIC CURVE
9. Bamber D. The area above the ordinal dominance graph and the area below the receiver operating characteristic graph. J Math Psychol. 1975;12:387-415.
10. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988;44:837-45.
11. Cleves MA. Comparative assessment of three common algorithms for estimating the variance of the area under the nonparametric receiver operating characteristic curve. Stata J. 2002;3:280-9.
12. Box GEP, Cox DR. An analysis of transformations. J Royal Statistical Society, Series B. 1964;26:211-52.
13. Zhou XH, Obuchowski NA, McClish DK. Statistical Methods in Diagnostic Medicine. New York: John Wiley and Sons, Inc; 2002.
14. Metz CE, Herman BA, Shen JH. Maximum likelihood estimation of receiver operating characteristic (ROC) curves from continuously distributed data. Stat Med. 1998;17:1033-53.
15. ROCKIT [Computer program]. Chicago: University of Chicago. Available from: www-radiology.uchicago.edu/krl/KRL_ROC/software_index6.htm. Accessed on February 27, 2007.
16. Faraggi D, Reiser B. Estimation of the area under the ROC curve. Stat Med. 2002;21:3093-106.
17. Hajian-Tilaki KO, Hanley JA, Joseph L, Collet JP. A comparison of parametric and nonparametric approaches to ROC analysis of quantitative diagnostic tests. Med Decis Making. 1997;17:94-102.
18. Pepe M, Longton G, Janes H. Estimation and comparison of receiver operating characteristic curves. Stata J. 2009;9:1.
19. Jiang Y, Metz CE, Nishikawa RM. A receiver operating characteristic partial area index for highly sensitive diagnostic tests. Radiology. 1996;201:745-50.
20. Youden WJ. An index for rating diagnostic tests. Cancer. 1950;3:32-5.
21. Perkins NJ, Schisterman EF. The inconsistency of optimal cut points obtained using two criteria based on the receiver operating characteristic curve. Am J Epidemiol. 2006;163:670-5.
22. Whiting P, Rutjes AW, Reitsma JB, Glas AS, Bossuyt PM, Kleijnen J. Sources of variation and bias in studies of diagnostic accuracy: a systematic review. Ann Intern Med. 2004;140:189-202.
23. Kelly S, Berry E, Roderick P, Harris KM, Cullingworth J, Gathercale L, et al. The identification of bias in studies of the diagnostic performance of imaging modalities. Br J Radiol. 1997;70:1028-35.
24. Malaria Site. Peripheral smear study for malaria parasites. Available from: www.malariasite.com/malaria/DiagnosisOfMalaria.htm. Accessed on July 05, 2010.
25. Thomas S, Srivastava A, Jeyaseelan L, Dennison D, Chandy M. NESTROFT as a screening test for the detection of thalassaemia & common haemoglobinopathies: an evaluation against a high performance liquid chromatographic method. Indian J Med Res. 1996;104:194-7.
26. Bachmann LM, Puhan MA, ter Riet G, Bossuyt PM. Sample sizes of studies on diagnostic accuracy: literature survey. BMJ. 2006;332:1127-9.
27. Malhotra RK, Indrayan A. A simple nomogram for sample size for estimating the sensitivity and specificity of medical tests. Indian J Ophthalmol. 2010;58:519-22.
28. Kester AD, Buntinx F. Meta-analysis of ROC curves. Med Decis Making. 2000;20:430-9.