evidence-based laboratory medicine: diagnostic performance


Evidence-based Laboratory Medicine:
Diagnostic Performance and Clinical Utility of a Laboratory Test

Pieter Vermeersch, M.D. Ph.D.

Laboratory Medicine, UZ Leuven

Decide, Assess, Find

The Diagnostic Process

History and clinical examination

List of possible diagnoses

Decide which disorders to pursue

- Possibilistic: test for all possible diagnoses

- Probabilistic: first most likely diagnoses

- Prognostic: first most serious disorders if left undiagnosed

- Pragmatic: first most easily diagnosed disorders or responsive to treatment

EBM makes this diagnostic process more systematic and explicit, which allows evidence from clinical research to be incorporated.

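Test and treatment thresholds in the diagnostic process

[Figure: probability of disease on a scale from 0% to 100%, divided by a test threshold and a treatment threshold. Below the test threshold: no testing warranted. Between the two thresholds: further testing required. Above the treatment threshold: treatment commences.]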

The Diagnostic Process

Role for laboratory testing

Why are laboratory tests requested during the diagnostic process?

- Habit

- Unspoken concern for documentation or liability

- Simple curiosity

- To improve outcome


Test → Result → Prediction → Action → Outcome

A test is clinically useful and improves outcome when it provides information in the form of a prediction which then influences further patient management.
e.g. IgA anti-tTG to determine the probability that the patient has celiac disease
e.g. SPE, BM aspirate smear and Rx to estimate 10-year survival of a patient with MGUS

Test Operating Characteristics

Signal detection theory
- A test is considered to have 2 fixed characteristics: sensitivity and specificity
- There are 4 possible outcomes for a dichotomous result: TP, TN, FP and FN

                 Disease present   Disease absent
Test positive          TP                FP
Test negative          FN                TN

Sens = TP / (TP + FN)
Spec = TN / (TN + FP)

- Sensitivity is the probability that a patient having the disease will test positive.
- Specificity is the probability that a patient not having the disease will test negative.
- Sensitivity and specificity are independent of the prevalence of the disease since they are calculated within the group of diseased and non-diseased patients, respectively.


PPV = TP / (TP + FP)
NPV = TN / (TN + FN)

- Clinicians are, however, not interested in the probability that a patient having the disease will test positive. They want to know the probability that a patient has the disease if the test is positive.
- PPV estimates the probability that a patient has the disease if the test is positive. Since it is calculated across the whole group (diseased and non-diseased together), it depends on the prevalence and is not a fixed characteristic of the test.

Example
- Analysis of the diagnostic performance of IgA anti-tTG for celiac disease in 591 patients that had an intestinal biopsy.

                 Disease present   Disease absent
Test positive          41                42
Test negative           2               506

Sens = 95.3%   Spec = 92.3%


PPV = 49.3%   NPV = 99.6%

PPV is only 49.3% because the prevalence (pre-test probability) in our study group was “only” 7.3%!

PPV and NPV depend on the prevalence and are only valid for patients with a pre-test probability identical to that of the study group.
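The figures in this example can be recomputed directly from the four cells of the 2x2 table. The following is a minimal sketch in Python; the function name `two_by_two_metrics` and the dictionary keys are illustrative choices, not part of the original slides.

```python
def two_by_two_metrics(tp, fp, fn, tn):
    """Return basic diagnostic accuracy measures for a 2x2 table."""
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),    # within the diseased group
        "specificity": tn / (tn + fp),    # within the non-diseased group
        "ppv": tp / (tp + fp),            # within all positive results
        "npv": tn / (tn + fn),            # within all negative results
        "prevalence": (tp + fn) / total,  # pre-test probability in the study group
    }

# Counts from the IgA anti-tTG example (591 biopsied patients)
metrics = two_by_two_metrics(tp=41, fp=42, fn=2, tn=506)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Up to rounding, this reproduces the values on the slide; the PPV evaluates to 41/83, about 0.494.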

Pretest to posttest probability

Likelihood and likelihood ratio
- The likelihood (LH) is the proportion of patients with/without the disease that have a certain result. The sensitivity is the likelihood that a patient with the disease tests positive.
- The likelihood is independent of the prevalence since it is calculated within the group of patients with/without the disease.

                 Disease present   Disease absent   LR
Test positive         0.953             0.077       LR+ = 12.4
Test negative         0.047             0.923       LR- = 0.05

- The LR is the proportion of individuals with the disease that have a given test result divided by the proportion of non-diseased individuals with the same test result.
- The LR indicates the change from pretest to posttest probability associated with a specific result.
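For a dichotomous test, the two likelihood ratios follow from sensitivity and specificity. A minimal sketch, assuming the rounded values from the table above; the function name `likelihood_ratios` is illustrative, and the ratio for a negative result is written LR- as is conventional.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a dichotomous test."""
    # LH(positive | disease) / LH(positive | no disease)
    lr_pos = sensitivity / (1 - specificity)
    # LH(negative | disease) / LH(negative | no disease)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg

lr_pos, lr_neg = likelihood_ratios(sensitivity=0.953, specificity=0.923)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
```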


LR                 Interpretation
1                  No clinical value
2-5 or 0.2-0.5     Small difference, may be relevant in certain clinical settings
5-10 or 0.1-0.2    Modest, but substantial difference in pretest-posttest probability
>10 or <0.1        Clinically important differences in pretest-posttest probability

The LR allows the diagnostic performance of a laboratory test to be evaluated.

Calculating the posttest probability
- Posttest odds = pretest odds x LR (Bayes' theorem)
  pretest odds = pretest prob / (1 - pretest prob)
  posttest prob = posttest odds / (1 + posttest odds)
- Using the likelihood ratio, the posttest probability can be calculated for different pretest probabilities.
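The odds conversion above can be written as a short function. This is a sketch only: the name `posttest_probability` and the list of pretest probabilities are illustrative, while the likelihood ratios 12.4 and 0.05 are those of the IgA anti-tTG example.

```python
def posttest_probability(pretest_prob, lr):
    """Convert a pretest probability to a posttest probability via odds."""
    pretest_odds = pretest_prob / (1 - pretest_prob)
    posttest_odds = pretest_odds * lr
    return posttest_odds / (1 + posttest_odds)

# Likelihood ratios from the IgA anti-tTG example
LR_POSITIVE, LR_NEGATIVE = 12.4, 0.05

for pretest in (0.01, 0.073, 0.25, 0.50):
    after_pos = posttest_probability(pretest, LR_POSITIVE)
    after_neg = posttest_probability(pretest, LR_NEGATIVE)
    print(f"pretest {pretest:.1%}: positive -> {after_pos:.1%}, negative -> {after_neg:.1%}")
```

At the study prevalence of 7.3%, a positive result gives a posttest probability of about 0.49, which matches the PPV of the study group.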

[Figure: posttest probability of CD (y-axis, 0 to 1) versus pretest probability of CD (x-axis, 0 to 1) for tTG <7 and tTG ≥7. At the study prevalence of 7.3%, the posttest probability for tTG ≥7 is 0.49.]

Reference limits

Defining normality or excluding disease
- Reference limits are traditionally defined using healthy volunteers (e.g. blood donors).
- This approach is appropriate for routine clinical chemistry parameters such as AST, ALT, CK and LDH, which are not specific for a certain disease and are often determined as part of a routine work-up.
- The use of healthy volunteers can, however, cause problems for the interpretation of laboratory test results.


- Specificity is higher in healthy volunteers than in hospitalized patients.
  e.g. AxSYM anti-HCV EIA (product information sheet)

  Population               Specificity   PPV
  Healthy blood donors     99.84%        92.7%
  Hospitalized patients    99.60%        83.5%

  (PPV for an HCV prevalence of 2% and an AxSYM sensitivity of 99.6%)

- While specificity is independent of prevalence, it depends on the reference population tested.
=> When possible, the control group should include individuals in whom the target disease is suspected, but who lack this disease (“diseased control group”).
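The effect of a small drop in specificity on PPV at low prevalence can be checked with Bayes' theorem. A minimal sketch using the AxSYM figures quoted above (sensitivity 99.6%, HCV prevalence 2%); the function name `ppv` is illustrative.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from sensitivity, specificity and prevalence."""
    true_pos = sensitivity * prevalence               # fraction of all patients: diseased and positive
    false_pos = (1 - specificity) * (1 - prevalence)  # fraction of all patients: non-diseased and positive
    return true_pos / (true_pos + false_pos)

for population, spec in (("healthy blood donors", 0.9984), ("hospitalized patients", 0.9960)):
    print(f"{population}: PPV = {ppv(0.996, spec, 0.02):.3f}")
```

The two results are roughly 0.927 and 0.836, in line with the PPVs on the slide: a 0.24 percentage point difference in specificity changes the PPV by about nine points.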

Reference limits

Example
Reference limits for immunoglobulins in the cryoprecipitate

                              IgA (mg/L)   IgG (mg/L)   IgM (mg/L)
Diseased controls (n=110)
  95th percentile                  2           10           20
  97.5th percentile                2           11           26
  Maximum                          3           17           34
Healthy volunteers (n=33)
  Maximum                         <2           <6            9

- In contrast to AST and ALT, which are performed as part of almost every routine work-up, cryoglobulins are only requested in a limited number of patients with possible cryoglobulin-related symptoms.

A cut-off can be chosen for different reasons
1) To optimize sensitivity and specificity using ROC-curve analysis
   e.g. IgA anti-tTG for celiac disease
2) To optimize sensitivity (rule-out if negative)
   e.g. HIV screening test
3) To optimize specificity (rule-in if positive)
   e.g. HIV confirmation test
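The three criteria can be illustrated on a small data set. The sketch below is not taken from the slides: the test results are hypothetical, the helper `roc_points` is an illustrative name, and "optimize sensitivity" and "optimize specificity" are read here as "best remaining specificity at 100% sensitivity" and "best remaining sensitivity at 100% specificity", which is one simple interpretation.

```python
def roc_points(diseased, non_diseased, cutoffs):
    """Return (cutoff, sensitivity, specificity); results >= cutoff count as positive."""
    points = []
    for c in cutoffs:
        sens = sum(x >= c for x in diseased) / len(diseased)
        spec = sum(x < c for x in non_diseased) / len(non_diseased)
        points.append((c, sens, spec))
    return points

# Hypothetical results in arbitrary units, invented for illustration only
diseased = [6, 9, 15, 22, 40, 80, 120]
non_diseased = [1, 2, 2, 3, 4, 5, 6, 8, 11]
points = roc_points(diseased, non_diseased, cutoffs=[3, 5, 7, 10, 20])

# 1) Balance sensitivity and specificity: maximise sens x spec
balanced = max(points, key=lambda p: p[1] * p[2])
# 2) Rule-out: keep sensitivity at 100%, then take the best specificity
rule_out = max((p for p in points if p[1] == 1.0), key=lambda p: p[2])
# 3) Rule-in: keep specificity at 100%, then take the best sensitivity
rule_in = max((p for p in points if p[2] == 1.0), key=lambda p: p[1])

for label, (c, sens, spec) in (("balanced", balanced), ("rule-out", rule_out), ("rule-in", rule_in)):
    print(f"{label}: cut-off {c}, sens {sens:.2f}, spec {spec:.2f}")
```

With these invented data the three criteria select three different cut-offs, which is the point of the list above: the "best" cut-off depends on the intended use of the test.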

[Figure: ROC curve, sensitivity (y-axis, 0 to 1) versus 1-specificity (x-axis, 0 to 1). Legend: sens x spec.]

Likelihood ratios for non-dichotomous results
- For many tests, the probability that a patient has the disease is higher when the test result is more abnormal.
- This additional information is lost when sensitivity and specificity are used.
- Likelihoods and likelihood ratios, in contrast, allow this additional information to be taken into account.

Non-Dichotomous Outcome

[Figure: likelihood (left axis, 0.0 to 1.0; series LH CD+DH and LH non-CD) and likelihood ratio (right axis, 0 to 400) per result interval. LR = 0.05 for tTG <7, 2.0 for tTG 7-20, 64 for tTG >20-100 and 319 for tTG >100.]

Calculating posttest probability for non-dichotomous results
As for dichotomous results, the posttest probability can be calculated for different pretest probabilities using the likelihood ratio for the different non-dichotomous results.
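As a sketch of this calculation, the interval-specific likelihood ratios shown in the figure (0.05, 2.0, 64 and 319) can be applied to the study prevalence of 7.3%. The pairing of each value with a tTG interval is assumed from the order in which they appear in the figure, and the function and variable names are illustrative.

```python
def posttest_probability(pretest_prob, lr):
    """Posttest odds = pretest odds x LR, converted back to a probability."""
    pretest_odds = pretest_prob / (1 - pretest_prob)
    posttest_odds = pretest_odds * lr
    return posttest_odds / (1 + posttest_odds)

# LR per result interval, as read from the figure (pairing assumed from its order)
interval_lr = {
    "tTG <7": 0.05,
    "tTG 7-20": 2.0,
    "tTG >20-100": 64,
    "tTG >100": 319,
}

pretest = 0.073  # prevalence in the study group
for interval, lr in interval_lr.items():
    print(f"{interval}: posttest probability = {posttest_probability(pretest, lr):.3f}")
```

A weakly positive result (tTG 7-20) raises the probability only modestly, whereas a strongly positive result makes the disease very likely; a single sensitivity/specificity pair would treat both results as simply "positive".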

[Figure: posttest probability of CD (y-axis, 0 to 1) versus pretest probability of CD (x-axis, 0 to 1) for tTG <7, tTG 7-20, tTG >20-100 and tTG >100, with the study prevalence of 7.3% marked.]

Comparing different tests

Keep in mind that:
1) Comparing the area under the curve does not take into account whether the cut-off was correctly chosen.
2) Comparing sensitivity and specificity does not tell you anything about the “discriminating” power, i.e. the percentage of patients for which there is an important change in pretest to posttest probability.
3) An increase in analytical imprecision mainly affects specificity because of the larger number of non-diseased patients.

[Figure: ROC curves (sensitivity versus 1 - specificity) for the Innova and Euroimmune assays.]

Discriminating power

[Figure, two panels: likelihood in CD and non-CD patients (left axis, 0.0 to 1.0) and LR (right axis, 0 to 800) per result interval.
Intervals <15, 15-<25, 25-<50, >50: LR 0.20, 3.4, 11.6, 247.
Intervals <10, 10-<20, 20-30, >30: LR 0.13, 0.8, 7.7, 594.
Panel labels: Innova, Euroimmune. Sens: 84.4%, Spec: 99.3%. Sens: 76.0%, Spec: 98.7%.]

Based on: Price C.P. & Christenson R.H. EBLM: principles, practice, and outcomes, 2nd Ed.

Analytical imprecision

Effect of analytical imprecision on diagnostic performance
- An increase in analytical imprecision mainly affects specificity because of the larger number of non-diseased patients (e.g. modeling in 641 patients).
- From a population perspective, the effect of increased imprecision on ROC scores is in general limited because only a limited number of individuals have results close to the cut-off point. For the affected individuals, however, the effect is 100%!

Imprecision   FP   FN   Sens   Spec   FPR
0%             0    0   100%   100%    0%
20%            1    3    99%    99%    1%
50%           11    8    98%    93%    7%
90%           20   13    97%    87%   13%
100%          22   14    97%    86%   14%
110%          24   15    97%    85%   15%
150%          30   19    96%    80%   20%
200%          36   24    95%    77%   23%

Routine assay
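Why only results close to the cut-off are affected can be shown with a small model. The sketch below is not the modeling used for the table: it assumes Gaussian analytical error with a standard deviation proportional to the true value (a coefficient of variation, which may differ from the imprecision scale in the table), and the true values and the cut-off of 7 are hypothetical.

```python
import math

def prob_above_cutoff(true_value, cutoff, cv):
    """Probability that a measurement of true_value reads >= cutoff,
    assuming Gaussian analytical error with SD = cv * true_value."""
    sd = cv * true_value
    if sd == 0:
        return 1.0 if true_value >= cutoff else 0.0
    z = (cutoff - true_value) / sd
    # Upper tail of the standard normal distribution
    return 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))

# Hypothetical true results of non-diseased patients (arbitrary units)
non_diseased = [1.0, 2.0, 3.0, 5.0, 6.5]
cutoff = 7.0

for cv in (0.0, 0.05, 0.10, 0.20):
    expected_fp = sum(prob_above_cutoff(x, cutoff, cv) for x in non_diseased)
    print(f"CV {cv:.0%}: expected false positives = {expected_fp:.3f}")
```

In this model nearly all of the expected false positives come from the patient at 6.5, just below the cut-off; the patients far from the cut-off are essentially unaffected even at the highest imprecision.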

STARD Initiative

Standards for Reporting of Diagnostic Accuracy
- Aims to improve the accuracy and completeness of reporting of studies of diagnostic accuracy, to allow readers to assess the potential for bias in the study (internal validity) and to evaluate its generalisability (external validity).
- Diagnostic studies with specific design features have been shown to be associated with biased, optimistic estimates of diagnostic accuracy compared with studies without such deficiencies.
- Complete and accurate reporting should allow the reader to detect the potential for bias in the study and to assess the generalisability and applicability of the results.
- This includes a cross tabulation of the results of the index tests (including indeterminate and missing results) by the results of the reference standard and, for continuous results, the distribution of the test results by the results of the reference standard.

http://www.stard-statement.org/

Communicating Diagnostic Accuracy

Results of a controlled study (Steurer J et al. BMJ 2002;324:824-826)
- 263 general practitioners, questionnaire with multiple choice questions
- Most GPs recognized the correct definitions for sensitivity (76%) and PPV (61%), but did not apply them correctly to estimate the probability of disease.

Information provided
All GPs: prevalence 10% in women with abnormal bleeding
Group 1: no information on test accuracy
Group 2: test sens. (80%) and spec. (60%)
Group 3: positive LR of 2 in non-technical language (“a positive result is obtained twice as frequently in women with endometrial cancer than in women without this disease”)

Probability of endometrial cancer attributed by GPs to a positive result from transvaginal ultrasound scanning (“compatible with cancer”) in women with abnormal uterine bleeding
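For reference, the probability implied by the information given to the GPs (prevalence 10%, sensitivity 80%, specificity 60%) can be worked out by counting patients in a notional cohort. This counting approach and the function name `natural_frequencies` are illustrative additions, not part of the cited study.

```python
def natural_frequencies(prevalence, sensitivity, specificity, population=1000):
    """Count true and false positives in a notional cohort and return
    (true positives, false positives, probability of disease if positive)."""
    diseased = population * prevalence
    healthy = population - diseased
    true_pos = diseased * sensitivity
    false_pos = healthy * (1 - specificity)
    return true_pos, false_pos, true_pos / (true_pos + false_pos)

tp, fp, prob = natural_frequencies(prevalence=0.10, sensitivity=0.80, specificity=0.60)
print(f"Of 1000 women: about {tp:.0f} true positives and {fp:.0f} false positives")
print(f"Probability of cancer given a positive scan: {prob:.1%}")
```

Of 1000 women, 100 have cancer and 80 of them test positive, while 360 of the 900 without cancer also test positive, so the probability after a positive scan is 80/440, about 18%. The same figure follows from pretest odds of 1:9 multiplied by the positive LR of 2.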

What do physicians do?

Survey of 300 physicians (Reid M.C. et al. Am. J. Med. 1998;104:374-80)
- Bayesian transformation: 3% (pretest odds x (sens/(1-spec)) = posttest odds)
- ROC curve analysis: 1%
- Likelihood ratios: 1%

Specialty                    n    Bayesian   ROC curves   LR
General Internal Medicine    50      5           1         1
Internal Medicine subsp.     50      2           0         1
Pediatrics                   50      1           1         0
General surgery              50      0           1         0
Family practice              50      0           0         0
Obstetrics/Gynaecology       50      0           0         0

Publication bias??

Am. J. Med. is the official journal of The Association of Professors of Medicine, comprised of chairs of departments of internal medicine at more than 125 medical schools across the USA.

Impractical Lack of knowledge

Decide, Assess, Find

Questions?