2013 Summit on Translational Bioinformatics
Evaluation considerations for EHR-based phenotyping algorithms: A case study for Drug-Induced Liver Injury
Casey Overby, Chunhua Weng, Krystl Haerian, Adler Perotte, Carol Friedman, George Hripcsak
Department of Biomedical Informatics
Columbia University
AMIA TBI paper presentation, March 20th, 2013
Published Genome-Wide Associations through 09/2011: 1,596 published GWA at p ≤ 5×10⁻⁸ for 249 traits
NHGRI GWA Catalog www.genome.gov/GWAStudies
Success in part due to GWAS consortia to obtain needed sample sizes
Background and Motivation
There are added challenges to studying pharmacological traits
• Drug response is complex
  • Risk factors in pathogenesis of drug-induced liver injury (DILI)
• Sample sizes are small compared to typical association studies of common disease
  • Adverse drug events
  • Responder types
Source: Tujios & Fontana, Nat. Rev. Gastroenterol. Hepatol. 2011
Background and Motivation
Consortium recruitment approaches
• Recruit and phenotype participants prospectively
  • Protocol-driven recruitment
• Electronic health records (EHR) linked with DNA biorepositories
  • EHR phenotyping
Background and Motivation
Successes developing EHR algorithms within eMERGE
• Type II diabetes
• Peripheral arterial disease
• Atrial fibrillation
• Crohn disease
• Multiple sclerosis
• Rheumatoid arthritis
• High PPV
(Kho et al. JAMIA 2012; Kullo et al. JAMIA 2010; Ritchie et al. AJHG 2010; Denny et al. Circulation 2010; Peissig et al. JAMIA 2012)
Source: www.phekb.org
Background and Motivation
Unique characteristics of DILI
• Rare condition of low prevalence
• Complex condition
  • Drug is causal agent of liver injury
  • Different drugs can cause DILI
  • Pattern of liver injury varies between drugs
    • Pattern of liver injury based on liver enzyme elevations
  • No tests to confirm drug causality (some assessment tools exist)
• High PPV may be challenging
Background and Motivation
Why study?
• DILI accounts for up to 15% of acute liver failure cases in the U.S., of which 75% require liver transplantation or lead to death
• Most frequent reason for withdrawal of approved drugs from the market
• Lack of understanding of underlying mechanisms of DILI
• Computerized approaches can reduce the burden of traditional approaches to screening for rare conditions (Jha AK et al. JAMIA 1998; Thadani SR et al. JAMIA 2009)
Overview of EHR phenotyping process
[Diagram: Case definition (e.g., liver injury) → (Re-)Design EHR phenotyping algorithm (e.g., ICD-9 codes for acute liver injury, decreased liver function labs) → Implement EHR phenotyping algorithm → Evaluate EHR phenotyping algorithm → Disseminate EHR phenotyping algorithm if the algorithm is sufficient to be useful, or return to (Re-)Design if the algorithm needs improvement]
Background and Motivation
Overview of methods to develop & evaluate initial algorithm
[Diagram: DILI case definition (iSAEC) → Design EHR phenotyping algorithm → Implement EHR phenotyping algorithm → Evaluate EHR phenotyping algorithm → Disseminate EHR phenotyping algorithm. Alongside the evaluation step: Develop an evaluation framework → Report lessons learned; lessons inform evaluator approach and algorithm design changes]
Methods and Results
DILI case definition
1. Liver injury diagnosis (A1)
   a. Acute liver injury (C1–C4)
   b. New liver injury (B)
2. Caused by a drug
   a. New drug (A2)
   b. Not by another disease (D)
[Figure: Initial DILI EHR phenotyping algorithm, applied to the clinical data warehouse. Steps: A1. Diagnosed with liver injury? → B. New liver injury? (consider chronicity) → A2. Exposure to drug? → enzyme criteria C1. ALP ≥ 2× ULN, C2. ALT ≥ 5× ULN, C3. ALT ≥ 3× ULN, C4. bilirubin ≥ 2× ULN → D. Other diagnoses? Patients failing a step (or with other explanatory diagnoses at D) are excluded. Patient counts at successive steps: 18,423 → 13,972 → 2,375 → 1,264 → 560 patients meeting drug-induced liver injury criteria]
Initial DILI EHR phenotyping algorithm
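As a rough illustration, the screening funnel above can be expressed as a filter over patient records. This is a sketch only: the field names are invented, the actual algorithm ran as queries against a clinical data warehouse, and the exact way the C1–C4 enzyme criteria combine is an assumption modeled on the iSAEC-style case definition.

```python
# Illustrative sketch of the DILI screening funnel; NOT the actual
# implementation. All record field names are hypothetical, and the
# combination of C1-C4 is an assumption.

def meets_dili_criteria(p):
    """Apply steps A1, B, A2, C1-C4, and D to one patient record (a dict)."""
    if not p["liver_injury_dx"]:        # A1. Diagnosed with liver injury?
        return False
    if not p["new_liver_injury"]:       # B. New liver injury (chronicity check)?
        return False
    if not p["new_drug_exposure"]:      # A2. Exposure to a drug?
        return False
    # Enzyme elevations expressed as multiples of the upper limit of normal (ULN):
    c1 = p["alp"] >= 2 * p["alp_uln"]           # C1. ALP >= 2x ULN
    c2 = p["alt"] >= 5 * p["alt_uln"]           # C2. ALT >= 5x ULN
    c3 = p["alt"] >= 3 * p["alt_uln"]           # C3. ALT >= 3x ULN
    c4 = p["bilirubin"] >= 2 * p["bili_uln"]    # C4. bilirubin >= 2x ULN
    if not (c1 or c2 or (c3 and c4)):   # assumed combination of criteria
        return False
    if p["other_dx"]:                   # D. Other diagnoses explain the injury?
        return False
    return True
```

Running such a filter over a warehouse population would yield a funnel of counts like the ones in the flowchart (18,423 narrowing to 560 in the actual study).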
Methods and Results
Ref: Aithal, G.P., et al. Case definition and phenotype standardization in drug-induced liver injury. Clin Pharmacol Ther. 2011 Jun;89(6):806-15.
Initial algorithm results: 100 randomly selected for manual review from 560 patients
• TP: 27
• FP: 42
• NA: 30
• Estimated positive predictive value: PPV = TP/(TP+FP) = 27/(27+42) = 39%
• Preliminary kappa coefficient: 0.50 (moderate agreement)
• Interpretation of PPV is unclear given moderate agreement among reviewers
[Figure: allocation of the 100 cases in groups of 20 across Reviewers 1–4]
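The PPV arithmetic above is simple enough to check directly; a minimal sketch, assuming (as the slide does) that cases judged NA are dropped from the denominator:

```python
def ppv(tp: int, fp: int) -> float:
    """Positive predictive value over adjudicated cases (NA cases excluded)."""
    return tp / (tp + fp)

# Review counts from the 100-case sample: TP=27, FP=42 (NA=30 excluded).
estimate = ppv(27, 42)  # ~0.391, i.e., the 39% reported above
```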
Methods and Results
Included measurement and demonstration studies
• Measurement study goal
  • "to determine the extent and nature of the errors with which a measurement is made using a specific instrument."
  • Evaluator effectiveness
• Demonstration study goal
  • "establishes a relation - which may be associational or causal - between a set of measured variables."
  • Algorithm performance
Definitions from: "Evaluation methods in medical informatics," Friedman & Wyatt 2006
Methods and Results
Included quantitative and qualitative assessment
• Quantitative data
  • Inter-rater reliability assessment
  • PPV
• Qualitative data
  • Perceptions of evaluation approach effectiveness (e.g., review tool, artifacts reviewed)
  • Perceptions of benefit of results (e.g., correct for the case definition?)
Methods and Results
An evaluation framework and results
Measurement study (evaluator effectiveness):
• Quantitative results: kappa coefficient: 0.50
• Qualitative results: perceptions of evaluation approach effectiveness
  • Differences between evaluation platforms
  • Visualizing lab values
  • Availability of notes (discharge summary vs. other notes)
Demonstration study (algorithm performance):
• Quantitative results: TP: 27, FP: 42, NA: 30, PPV = TP/(TP+FP) = 39%
• Qualitative results: perceptions of benefit of results (themes in FPs)
  • Babies
  • Patients who died
  • Overdose patients
  • Patients who had a liver transplant
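The kappa coefficient in the measurement-study cell is a standard inter-rater agreement statistic. A minimal Cohen's kappa sketch for two raters follows (the study used four reviewers and does not specify its exact kappa variant, and the toy labels below are invented):

```python
def cohen_kappa(a, b):
    """Cohen's kappa for two raters' label sequences a and b (equal length)."""
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    labels = set(a) | set(b)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each rater's marginal label frequencies:
    p_expected = sum((a.count(lab) / n) * (b.count(lab) / n) for lab in labels)
    return (p_observed - p_expected) / (1 - p_expected)

# Toy example: two raters disagree on one of four cases -> kappa = 0.5,
# coincidentally matching the "moderate agreement" value reported above.
r1 = ["TP", "TP", "FP", "FP"]
r2 = ["TP", "FP", "FP", "FP"]
```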
Methods and Results
Lesson learned: What's correct for the algorithm may not be correct for the case definition
• Are we measuring what we mean to measure?
• Case definition: liver injury due to medication, not another disease
• Many FPs were transplant patients
  • Patients correct for the algorithm, but liver enzymes elevated due to the procedure
• Revision: exclude transplant patients
Lessons learned
Improved algorithm design given themes in FPs
• Added exclusions
  • Babies
  • Overdose patients
  • Patients who died
  • Transplant patients
Lessons learned ("A collaborative approach to develop an EHR phenotyping algorithm for DILI," in preparation)
Lesson learned: Evaluator effectiveness influences the ability to draw appropriate inferences about algorithm performance
• How does our evaluation approach influence performance estimations?
• Moderate agreement among algorithm reviewers, so interpretation of PPV unclear
• Revision: improve evaluator approach
Lessons learned
Improved evaluator approach
• Consensus among 4 reviewers
• Assign TP/FP status by:
  1. First-pass review of temporal relationship
     • Assign preliminary TP, FP, or unknown status
  2. Chart review
     • Confirm suspected TPs
     • Assign TP/FP if unknown status in first-pass review
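The two-pass assignment could be sketched as follows. This is a hypothetical helper, not the study's procedure: the slides do not say whether pass-1 FPs were re-examined, so treating them as final is an assumption.

```python
def assign_status(pass1: str, chart_confirms: bool) -> str:
    """Combine first-pass temporal review with chart review (illustrative).

    pass1: preliminary label from the temporal-relationship review
           ("TP", "FP", or "unknown").
    chart_confirms: whether chart review supports drug-induced liver injury.
    """
    if pass1 == "FP":
        return "FP"  # assumed final after the first pass
    # Suspected TPs are confirmed, and unknowns resolved, by chart review:
    return "TP" if chart_confirms else "FP"
```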
[Figure: example patient lab timelines as visualized during first-pass temporal review: alkaline phosphatase value, ALT value, and bilirubin IV value, each plotted over 2012-03-15 through 2012-07-19]
Summary of findings
• Lessons learned from applying our evaluation framework
  • What's correct for the algorithm may not be correct for the case definition (Are we measuring what we mean to measure?)
  • Evaluator effectiveness influences the ability to draw appropriate inferences about algorithm performance
• Demonstrated that our evaluation framework is useful
  • Informed improvements in algorithm design
  • Informed improvements in evaluator approach
  • Likely more useful for rare conditions
• Demonstrated EHR phenotyping algorithm development is an iterative process
  • Complexity of the algorithm may influence
Acknowledgments
• Dr. Yufeng Shen, Serious Adverse Event Consortium collaborator
• eMERGE collaborators
  • Mount Sinai (Drs. Omri Gottesman, Erwin Bottinger, and Steve Ellis)
  • Mayo Clinic (Drs. Jyotishman Pathak, Sean Murphy, Kevin Bruce, Stephanie Johnson, Jay Talwalker, Christopher G. Chute, Iftikhar J. Kullo)
  • Northwestern (Dr. Abel Kho)
  • Vanderbilt (Dr. Josh Denny)
• DILIN collaborator
  • UNC-CH (Dr. Ashraf Farrag)
• Columbia Training in Biomedical Informatics (NIH NLM #T15 LM007079)
• The eMERGE network U01 HG006380-01 (Mount Sinai)
Questions?
Casey L. Overby [email protected]