04/08/23 Validity and reliability of Tests 1
VALIDITY AND RELIABILITY OF SCREENING TESTS
Rashida B Syed, Epidemiologist, Consultant Faculty, Field Epidemiology Training Program (FETP)-Pakistan
Objectives
Calculate and interpret measures of the validity of a screening test: sensitivity and specificity.
Understand the relationship between sensitivity and specificity.
Calculate and interpret measures of the performance (yield) of a screening test: predictive value positive (PV+) and predictive value negative (PV-).
Understand factors that influence PV+ and PV-.
Recognize issues and sources of bias in evaluating screening programs.
Purpose of screening
The early detection of disease in individuals who do not show any signs of disease.
Aims to reduce morbidity and mortality from disease among persons being screened.
Screening is the application of relatively simple, inexpensive tests, examinations, or other procedures to people.
It is a means of identifying persons at increased risk for the presence of disease, who warrant further evaluation.
Diagnosis vs. Screening
Screening tests can also often be used as diagnostic tests
Diagnosis involves confirmation of presence or absence of disease in someone suspected of or at risk for disease
Screening is generally done among individuals who are not suspected of having disease
Requirements
Is there a truly effective treatment available for the discovered disease?
Is that treatment more effective in screened than non-screened cases?
What are the side effects of the screening process?
How efficient is screening? Do we have the right threshold, i.e. how many people must be screened to find one case?
Natural History of Disease
Figure: timeline of stages: Susceptible Host, Subclinical Disease, Clinical Disease, Stage of Recovery, Disability, or Death.
Events marked on the timeline: Point of Exposure, Detectable sub-clinical disease, Screening, Onset of symptoms, Diagnosis sought.
Examples of Screening Tests
Questions
Clinical Examinations
Laboratory Tests
Genetic Tests
X-rays
Goel
Diseases for which screening has been recommended
Cervical cancer
Breast cancer
Prostate cancer
Colon cancer
Diabetes
Hypertension
Terminology
Validity is analogous to accuracy
The validity of a screening test is how well the given screening test reflects another test of known greater accuracy
Validity assumes that there is a gold standard to which a test can be compared
Paneth
Three key measures of validity
• Sensitivity
• Specificity
• Predictive value
Sensitivity and Specificity
Sensitivity tells us how well a positive test detects disease.
It is defined as the ability of the test to identify correctly as diseased, those who have the disease.
Specificity tells us how well a negative test detects non-disease.
It is defined as the ability of the test to identify correctly as test negative those who do not have the disease.
                     Disease
Screening Test    Present            Absent
Positive          True positives     False positives
Negative          False negatives    True negatives
                     Disease
Screening Test    Present    Absent    Total
Positive          a          b         a + b
Negative          c          d         c + d
Total             a + c      b + d     N
Sensitivity
Proportion of individuals who have the disease who test positive (true positive rate); tells us how well a “+” test picks up disease.
Sensitivity = a / (a + c)
                     Disease
Screening Test    yes      no       Total
+                 a        b        a + b
-                 c        d        c + d
Total             a + c    b + d    N
Specificity
Proportion of individuals who don’t have the disease who test negative (true negative rate); tells us how well a “-” test detects no disease.
Specificity = d / (b + d)
                     Disease
Screening Test    yes      no       Total
+                 a        b        a + b
-                 c        d        c + d
Total             a + c    b + d    N
Predictive value
Positive predictive value: the proportion of all those who test positive who actually have the condition.
Negative predictive value: the proportion of all those who test negative who do not have the condition.
Positive Predictive Value
Proportion of individuals who test positive who actually have the disease
P.P.V. = a / (a + b)
                     Disease
Screening Test    yes      no       Total
+                 a        b        a + b
-                 c        d        c + d
Total             a + c    b + d    N
Negative Predictive Value
Proportion of individuals who test negative who don’t have the disease
N.P.V. = d / (c + d)
                     Disease
Screening Test    yes      no       Total
+                 a        b        a + b
-                 c        d        c + d
Total             a + c    b + d    N
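The four measures defined on the preceding slides can be sketched in Python as follows. This is only an illustration and not part of the original slides: the class name `TwoByTwo` is a hypothetical helper and the counts are made up; the cell labels a, b, c, d follow the 2x2 table used throughout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Cells of a screening 2x2 table (hypothetical helper, not from the slides).

    a = test positive, disease present (true positives)
    b = test positive, disease absent  (false positives)
    c = test negative, disease present (false negatives)
    d = test negative, disease absent  (true negatives)
    """
    a: int
    b: int
    c: int
    d: int

    def sensitivity(self) -> float:
        # Diseased people who test positive, out of all diseased people.
        return self.a / (self.a + self.c)

    def specificity(self) -> float:
        # Non-diseased people who test negative, out of all non-diseased people.
        return self.d / (self.b + self.d)

    def ppv(self) -> float:
        # Test-positive people who are diseased, out of all test positives.
        return self.a / (self.a + self.b)

    def npv(self) -> float:
        # Test-negative people who are not diseased, out of all test negatives.
        return self.d / (self.c + self.d)


# Illustrative counts only (assumed for this sketch).
table = TwoByTwo(a=30, b=10, c=20, d=140)
print(f"Sensitivity = {table.sensitivity():.1%}")
print(f"Specificity = {table.specificity():.1%}")
print(f"PPV         = {table.ppv():.1%}")
print(f"NPV         = {table.npv():.1%}")
```

Sensitivity and specificity are computed down the columns of the table (within disease status), while the predictive values are computed across the rows (within test result).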
Determinants of predictive value
The predictive value of a test is determined by 3 factors:
1. Sensitivity
2. Specificity
3. Prevalence of the disease in the population being tested
Effect of prevalence on PPV
As prevalence rates decrease, the positive predictive value of a test also decreases
This explains why diagnostic tests which are developed in clinical populations (where the prevalence of the disease being tested is often high) often perform poorly in general population settings (where disease prevalence tends to be lower).
Exercise: prove it using our example.
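The dependence of PPV on prevalence can be checked numerically. The sketch below is not from the slides: it applies Bayes' theorem, PPV = (sensitivity × prevalence) / (sensitivity × prevalence + (1 − specificity) × (1 − prevalence)), and the function name `ppv`, the sensitivity of 90%, the specificity of 80%, and the three prevalence values are all assumptions chosen for illustration.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' theorem (illustrative helper)."""
    true_pos = sensitivity * prevalence               # P(test+ and diseased)
    false_pos = (1 - specificity) * (1 - prevalence)  # P(test+ and not diseased)
    return true_pos / (true_pos + false_pos)


# Assumed test characteristics and prevalences, for illustration only.
prevalences = [0.50, 0.10, 0.01]
ppvs = [ppv(0.90, 0.80, p) for p in prevalences]

for p, value in zip(prevalences, ppvs):
    print(f"Prevalence {p:.0%}: PPV = {value:.1%}")
```

With the test held fixed, the PPV drops as the prevalence drops, because false positives from the large non-diseased group come to outnumber the true positives.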
Scenarios
Tests with dichotomous results (positive or negative)
Tests with continuous results
Examples: systolic blood pressure (mm Hg); tuberculin reaction (induration diameter, mm)
Examples
In a sample of 200 people: 100 people have the disease Hypothyroidism, and 100 people do not have it.
In the same sample of 200 people: 110 people test positive for Hypothyroidism using a new diagnostic test, and 90 people test negative for Hypothyroidism using the same diagnostic test.
Of the 110 people who test positive, 90 do have the disease and 20 do not.
Of the 90 people who test negative, 10 do have the disease and 80 do not.
What are the sensitivity and specificity?
Solution
Sensitivity = TP / (TP + FN) = 90 / (90 + 10) = 90%
Specificity = TN / (TN + FP) = 80 / (80 + 20) = 80%
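As a quick check of this solution, the counts from the hypothyroidism example can be placed in a 2x2 layout and the margins verified before computing the two measures. The variable names below are chosen for this sketch only.

```python
# Counts taken from the hypothyroidism example.
tp, fp = 90, 20  # the 110 people who test positive: 90 diseased, 20 not
fn, tn = 10, 80  # the 90 people who test negative: 10 diseased, 80 not

# The margins should reproduce the totals stated in the example.
assert tp + fp == 110 and fn + tn == 90    # test positives / test negatives
assert tp + fn == 100 and fp + tn == 100   # diseased / not diseased

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)

print(f"Sensitivity = {sensitivity:.0%}")  # Sensitivity = 90%
print(f"Specificity = {specificity:.0%}")  # Specificity = 80%
```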
A test is used in 50 people with disease and 50 people without. These are the results.
                     Disease
Screening Test    Present    Absent    Total
Positive          48         3         51
Negative          2          47        49
Total             50         50        100
Paneth
For the same 2x2 table (48 true positives, 3 false positives, 2 false negatives, 47 true negatives; N = 100):
Sensitivity = 48/50 = 96%
Specificity = 47/50 = 94%
Positive Predictive Value = 48/51 ≈ 94.1%
Negative Predictive Value = 47/49 ≈ 95.9%
Paneth
So… you understand the accuracy of a screening test …
What is the next step?
Put screening to use in the population
Sensitive vs. Specific tests
A test with high levels of sensitivity is usually positive when disease is present and has few false negatives – useful when it is important not to miss a diagnosis (e.g. if the disease is dangerous but has an effective treatment)
A test with high levels of specificity is usually negative when disease is absent and has few false positives – useful when a false positive diagnosis would be harmful (e.g. if it resulted in unnecessary treatment)
Balancing sensitivity vs. specificity
A really good test would be highly sensitive and highly specific. In practice, this is often not the case.
Instead, there is often a trade-off between the sensitivity and the specificity of diagnostic tests
This occurs in cases where the test result is expressed on a continuous scale (e.g. blood pressure, blood sugar levels)
In such circumstances, a cut-point has to be chosen to define normal vs. abnormal
The decision for the cut point involves weighing the consequences of leaving cases undetected (false negatives) against erroneously classifying healthy persons as diseased (false positives).
Refer to Gordis
NET SENSITIVITY AND SPECIFICITY
Use of multiple tests
Refer to Gordis
Balancing sensitivity vs. specificity
Blood sugar level 2 hrs after eating (mg/100 ml)    Sensitivity (%)    Specificity (%)
70                                                  98.6               8.8
90                                                  94.3               47.6
110                                                 85.7               84.1
130                                                 64.3               96.9
170                                                 42.9               100.0
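One way to compare the cut-offs in this table is Youden's index, J = sensitivity + specificity − 1. The slides do not name this criterion; it is used here only as an assumed, commonly used summary that weights false negatives and false positives equally. The function name `youden_j` is likewise made up for this sketch.

```python
# (cut-off in mg/100 ml, sensitivity %, specificity %) from the blood sugar table.
rows = [
    (70, 98.6, 8.8),
    (90, 94.3, 47.6),
    (110, 85.7, 84.1),
    (130, 64.3, 96.9),
    (170, 42.9, 100.0),
]


def youden_j(sens_pct: float, spec_pct: float) -> float:
    """Youden's index on a 0-1 scale (assumed criterion, not from the slides)."""
    return sens_pct / 100 + spec_pct / 100 - 1


for cutoff, sens, spec in rows:
    print(f"Cut-off {cutoff:>3}: J = {youden_j(sens, spec):.3f}")

# Cut-off with the largest J among the five listed.
best = max(rows, key=lambda row: youden_j(row[1], row[2]))
print(f"Largest J at cut-off {best[0]} mg/100 ml")
```

Among the five listed cut-offs, J is largest at 110 mg/100 ml. A different cut-off may still be preferable when missing a case and raising a false alarm carry unequal consequences.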
ROC curves
One method for determining the best cut-off point is by constructing a ROC curve
ROC=receiver operating characteristic, a term that comes from radar science
ROC curves are constructed by plotting the sensitivity (or true positive rate) against the false positive rate (1-specificity)
ROC curve for blood sugar readings
Source: Fletcher, Fletcher and Wagner, Clinical epidemiology: the essentials (3rd ed)
The ROC curve shows the trade-off between sensitivity and specificity.
The closer the curve lies to the left-hand and top borders, the more accurate the test.
The slope of the tangent at a cut point gives the Likelihood Ratio (LR) for that value of the test.
The area under the curve is a measure of test accuracy.
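To make the construction concrete, the sketch below turns the five blood sugar cut-offs into ROC points (false positive rate, sensitivity) and estimates the area under the curve with the trapezoidal rule. It is a rough illustration only: with so few points the area is approximate, and the end points (0, 0) and (1, 1) are added as an assumption to close the curve.

```python
# (cut-off in mg/100 ml, sensitivity %, specificity %) from the blood sugar table.
rows = [
    (70, 98.6, 8.8),
    (90, 94.3, 47.6),
    (110, 85.7, 84.1),
    (130, 64.3, 96.9),
    (170, 42.9, 100.0),
]

# Each cut-off gives one ROC point: (1 - specificity, sensitivity).
points = [(1 - spec / 100, sens / 100) for _, sens, spec in rows]

# Assumed end points: a test that calls no one positive, and everyone positive.
points += [(0.0, 0.0), (1.0, 1.0)]
points.sort()

# Trapezoidal rule over consecutive points, ordered by false positive rate.
auc = 0.0
for (x0, y0), (x1, y1) in zip(points, points[1:]):
    auc += (x1 - x0) * (y0 + y1) / 2

print(f"Approximate area under the ROC curve = {auc:.2f}")
```

An area of 0.5 corresponds to the diagonal (no better than chance) and 1.0 to a perfect test.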
The Area under an ROC Curve
Good tests lie close to the upper left hand corner of the graph – where sensitivity and specificity are both high
Generally the best cut-off point lies at or near the “shoulder” of the curve*
The overall accuracy of the test is represented by the area under the curve
Tests that plot close to the diagonal across the middle of the graph are least useful, as this is where the test is no better than chance
ROC curves can also be used to compare different tests
*unless there are clinical reasons for preferring a highly sensitive or highly specific test
Sources of Bias in the Evaluation of Screening Programs
Lead time bias
Length bias
Volunteer bias
Lead time bias
Lead time: interval between the diagnosis of a disease at screening and the usual time of diagnosis (by symptoms)
Figure: timeline running from diagnosis by screening to diagnosis via symptoms; the interval between the two is the lead time.
Lead-Time Bias
Consider a condition whose natural history allows an earlier diagnosis, but where survival does not improve despite earlier identification.
With a screening program, survival will appear to increase, but in reality it is increased by exactly the amount of time by which the screening program advanced the diagnosis.
Thus there is no benefit to screening from a survival standpoint.
Lead time bias
Assumes survival is the time between screening and death.
Does not take into account the lead time between diagnosis at screening and usual diagnosis.
Diagnosis by screening in 1994
Death in 2008
Survival = 14 years
Lead time bias
Diagnosis by screening in 1994
Usual time of diagnosis via symptoms in 1998
Lead time = 4 years
Death in 2008
True survival = 10 years
Survival = 14 years
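The arithmetic behind this timeline can be written out as a short sketch. The years come from the example; the variable names are chosen for illustration, and the sketch assumes, as the example does, that screening does not change the date of death.

```python
# Years from the lead time example.
screen_diagnosis_year = 1994   # diagnosis by screening
symptom_diagnosis_year = 1998  # usual time of diagnosis via symptoms
death_year = 2008              # assumed unchanged by screening

lead_time = symptom_diagnosis_year - screen_diagnosis_year  # 4 years
apparent_survival = death_year - screen_diagnosis_year      # 14 years
true_survival = death_year - symptom_diagnosis_year         # 10 years

# The apparent gain in survival is exactly the lead time.
assert apparent_survival - true_survival == lead_time

print(f"Lead time         = {lead_time} years")
print(f"Apparent survival = {apparent_survival} years")
print(f"True survival     = {true_survival} years")
```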
Length Bias
Most chronic diseases, especially cancers, do not progress at the same rate in everyone.
Any group of diseased people will include some in whom the disease developed slowly and some in whom it developed rapidly.
Screening will preferentially pick up slowly developing disease (longer opportunity to be screened) which usually has a better prognosis
Paneth
Length bias
Figure: timelines for six individual cases plotted against time, with the time of screening marked.
Legend: O = biological onset of disease; P = disease detectable via screening; Y = symptoms begin; D = death.
Volunteer bias
Type of bias where those who choose to participate are likely to be different from those who don’t
Volunteers tend to:
Have better health
Have lower mortality
Be more likely to adhere to prescribed medical regimens
A worked example
The fecal occult blood (FOB) screen test is used in 203 people to look for bowel cancer:
                      Patients with bowel cancer (as confirmed on endoscopy)
FOB screen test    Present    Absent    Total
Positive           2          18        20
Negative           1          182       183
Total              3          200       203
False positive rate (α) = FP / (FP + TN) = 18 / (18 + 182) = 9% = 1 − specificity.
False negative rate (β) = FN / (TP + FN) = 1 / (2 + 1) = 33% = 1 − sensitivity.
Power = sensitivity = 1 − β
Hence, with large numbers of false positives and few false negatives, a positive FOB screen test is in itself poor at confirming cancer (PPV = 10%), and further investigations must be undertaken. It will, however, pick up 66.7% of all cancers (the sensitivity). As a screening test, a negative result is very good at reassuring that a patient does not have cancer (NPV = 99.5%), and this initial screen correctly identifies 91% of those who do not have cancer (the specificity).
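The figures quoted for the FOB example can be reproduced from the four cell counts that appear in the false positive and false negative rate formulas (TP = 2, FP = 18, FN = 1, TN = 182). The sketch below is only a check of that arithmetic; the variable names are chosen for illustration.

```python
# Cell counts of the FOB example (they sum to the 203 people screened).
tp, fp, fn, tn = 2, 18, 1, 182
assert tp + fp + fn + tn == 203

sensitivity = tp / (tp + fn)          # cancers picked up by the test
specificity = tn / (tn + fp)          # non-cancers correctly test negative
ppv = tp / (tp + fp)                  # positives who truly have cancer
npv = tn / (tn + fn)                  # negatives who truly have no cancer
false_positive_rate = fp / (fp + tn)  # alpha = 1 - specificity
false_negative_rate = fn / (tp + fn)  # beta = 1 - sensitivity

print(f"Sensitivity         = {sensitivity:.1%}")
print(f"Specificity         = {specificity:.1%}")
print(f"PPV                 = {ppv:.1%}")
print(f"NPV                 = {npv:.1%}")
print(f"False positive rate = {false_positive_rate:.1%}")
print(f"False negative rate = {false_negative_rate:.1%}")
```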
Reliability
Validity (accuracy)
Reliability (repeatability)
Refer to Epidemiology by Gordis
Review questions from Gordis
Likelihood-ratio positive = sensitivity / (1 − specificity) = 66.67% / (1 − 91%) = 7.4
Likelihood-ratio negative = (1 − sensitivity) / specificity = (1 − 66.67%) / 91% = 0.37
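Both likelihood ratios can be recomputed from the FOB counts (TP = 2, FP = 18, FN = 1, TN = 182) as a check on the rounding. This is a sketch of the arithmetic only; the variable names are chosen for illustration.

```python
# Cell counts of the FOB example.
tp, fp, fn, tn = 2, 18, 1, 182

sensitivity = tp / (tp + fn)  # 2/3, about 66.67%
specificity = tn / (tn + fp)  # 182/200 = 91%

# How much more likely a positive result is in diseased vs non-diseased people.
lr_positive = sensitivity / (1 - specificity)
# How much more likely a negative result is in diseased vs non-diseased people.
lr_negative = (1 - sensitivity) / specificity

print(f"Likelihood-ratio positive = {lr_positive:.1f}")
print(f"Likelihood-ratio negative = {lr_negative:.2f}")
```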