2013 Summit on Translational Bioinformatics
Evaluation considerations for EHR-based phenotyping algorithms: A case study for Drug-Induced Liver Injury
Casey Overby, Chunhua Weng, Krystl Haerian, Adler Perotte, Carol Friedman, George Hripcsak
Department of Biomedical Informatics
Columbia University
AMIA TBI paper presentation, March 20th, 2013
Published Genome-Wide Associations through 09/2011: 1,596 published GWA at p ≤ 5×10⁻⁸ for 249 traits
NHGRI GWA Catalog www.genome.gov/GWAStudies
Success in part due to GWAS consortia to obtain needed sample sizes
Background and Motivation
There are added challenges to studying pharmacological traits
• Drug response is complex
  • Risk factors in pathogenesis of drug-induced liver injury (DILI)
• Sample sizes are small compared to typical association studies of common disease
  • Adverse drug events
  • Responder types
Source: Tujios & Fontana, Nat. Rev. Gastroenterol. Hepatol. 2011
Background and Motivation
Consortium recruitment approaches
• Recruit and phenotype participants prospectively
  • Protocol-driven recruitment
• Electronic health records (EHR) linked with DNA biorepositories
  • EHR phenotyping
Background and Motivation
Successes developing EHR algorithms within eMERGE
• Type II diabetes
• Peripheral arterial disease
• Atrial fibrillation
• Crohn disease
• Multiple sclerosis
• Rheumatoid arthritis
• High PPV
(Kho et al. JAMIA 2012; Kullo et al. JAMIA 2010; Ritchie et al. AJHG 2010; Denny et al. Circulation 2010; Peissig et al. JAMIA 2012)
Source: www.phekb.org
Background and Motivation
Unique characteristics of DILI
• Rare condition of low prevalence
• Complex condition
  • Drug is causal agent of liver injury
  • Different drugs can cause DILI
  • Pattern of liver injury varies between drugs
    • Pattern of liver injury based on liver enzyme elevations
  • No tests to confirm drug causality (some assessment tools exist)
• High PPV may be challenging
Background and Motivation
Why study?
• DILI accounts for up to 15% of acute liver failure cases in the U.S., of which 75% require liver transplantation or lead to death
• Most frequent reason for withdrawal of approved drugs from the market
• Lack of understanding of underlying mechanisms of DILI
• Computerized approaches can reduce the burden of traditional approaches to screening for rare conditions (Jha AK et al. JAMIA 1998; Thadani SR et al. JAMIA 2009)
Overview of EHR phenotyping process
[Diagram: Case definition (e.g., liver injury) → (Re-)Design EHR phenotyping algorithm (e.g., ICD-9 codes for acute liver injury, decreased liver function labs) → Implement EHR phenotyping algorithm → Evaluate EHR phenotyping algorithm → Disseminate EHR phenotyping algorithm if the algorithm is sufficient to be useful, or return to (Re-)Design if the algorithm needs improvement]
Background and Motivation
Overview of methods to develop & evaluate initial algorithm
[Diagram: DILI case definition (iSAEC) → Design EHR phenotyping algorithm → Implement EHR phenotyping algorithm → Evaluate EHR phenotyping algorithm → Disseminate EHR phenotyping algorithm. Alongside the evaluation step: Develop an evaluation framework → Report lessons learned; lessons inform evaluator approach and algorithm design changes]
Methods and Results
DILI case definition
1. Liver injury diagnosis (A1)
   a. Acute liver injury (C1–C4)
   b. New liver injury (B)
2. Caused by a drug
   a. New drug (A2)
   b. Not by another disease (D)
[Figure: Initial DILI EHR phenotyping algorithm, applied to the clinical data warehouse. Steps: A1. Diagnosed with liver injury? → B. New liver injury? (consider chronicity) → A2. Exposure to drug? → enzyme criteria C1. ALP ≥ 2× ULN, C2. ALT ≥ 5× ULN, C3. ALT ≥ 3× ULN, C4. bilirubin ≥ 2× ULN → D. Other diagnoses? Patients failing a step (or with other explanatory diagnoses at D) are excluded. Patient counts at successive steps: 18,423 → 13,972 → 2,375 → 1,264 → 560 patients meeting drug-induced liver injury criteria]
Initial DILI EHR phenotyping algorithm
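As a rough illustration, the screening funnel above can be expressed as a filter over patient records. This is a sketch only: the field names are invented, the actual algorithm ran as queries against a clinical data warehouse, and the exact way the C1–C4 enzyme criteria combine is an assumption modeled on the iSAEC-style case definition.

```python
# Illustrative sketch of the DILI screening funnel; NOT the actual
# implementation. All record field names are hypothetical, and the
# combination of C1-C4 is an assumption.

def meets_dili_criteria(p):
    """Apply steps A1, B, A2, C1-C4, and D to one patient record (a dict)."""
    if not p["liver_injury_dx"]:        # A1. Diagnosed with liver injury?
        return False
    if not p["new_liver_injury"]:       # B. New liver injury (chronicity check)?
        return False
    if not p["new_drug_exposure"]:      # A2. Exposure to a drug?
        return False
    # Enzyme elevations expressed as multiples of the upper limit of normal (ULN):
    c1 = p["alp"] >= 2 * p["alp_uln"]           # C1. ALP >= 2x ULN
    c2 = p["alt"] >= 5 * p["alt_uln"]           # C2. ALT >= 5x ULN
    c3 = p["alt"] >= 3 * p["alt_uln"]           # C3. ALT >= 3x ULN
    c4 = p["bilirubin"] >= 2 * p["bili_uln"]    # C4. bilirubin >= 2x ULN
    if not (c1 or c2 or (c3 and c4)):   # assumed combination of criteria
        return False
    if p["other_dx"]:                   # D. Other diagnoses explain the injury?
        return False
    return True
```

Running such a filter over a warehouse population would yield a funnel of counts like the ones in the flowchart (18,423 narrowing to 560 in the actual study).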
Methods and Results
Ref: Aithal, G.P., et al. Case definition and phenotype standardization in drug-induced liver injury. Clin Pharmacol Ther. 2011 Jun;89(6):806-15.
Initial algorithm results: 100 randomly selected for manual review from 560 patients
• TP: 27
• FP: 42
• NA: 30
• Estimated positive predictive value: PPV = TP/(TP+FP) = 27/(27+42) = 39%
• Preliminary kappa coefficient: 0.50 (moderate agreement)
• Interpretation of PPV is unclear given moderate agreement among reviewers
[Figure: allocation of the 100 cases in groups of 20 across Reviewers 1–4]
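The PPV arithmetic above is simple enough to check directly; a minimal sketch, assuming (as the slide does) that cases judged NA are dropped from the denominator:

```python
def ppv(tp: int, fp: int) -> float:
    """Positive predictive value over adjudicated cases (NA cases excluded)."""
    return tp / (tp + fp)

# Review counts from the 100-case sample: TP=27, FP=42 (NA=30 excluded).
estimate = ppv(27, 42)  # ~0.391, i.e., the 39% reported above
```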
Methods and Results
Included measurement and demonstration studies
• Measurement study goal
  • "to determine the extent and nature of the errors with which a measurement is made using a specific instrument."
  • Evaluator effectiveness
• Demonstration study goal
  • "establishes a relation - which may be associational or causal - between a set of measured variables."
  • Algorithm performance
Definitions from: "Evaluation methods in medical informatics," Friedman & Wyatt 2006
Methods and Results
Included quantitative and qualitative assessment
• Quantitative data
  • Inter-rater reliability assessment
  • PPV
• Qualitative data
  • Perceptions of evaluation approach effectiveness (e.g., review tool, artifacts reviewed)
  • Perceptions of benefit of results (e.g., correct for the case definition?)
Methods and Results
An evaluation framework and results
Measurement study (evaluator effectiveness):
• Quantitative results: kappa coefficient: 0.50
• Qualitative results: perceptions of evaluation approach effectiveness
  • Differences between evaluation platforms
  • Visualizing lab values
  • Availability of notes (discharge summary vs. other notes)
Demonstration study (algorithm performance):
• Quantitative results: TP: 27, FP: 42, NA: 30, PPV = TP/(TP+FP) = 39%
• Qualitative results: perceptions of benefit of results (themes in FPs)
  • Babies
  • Patients who died
  • Overdose patients
  • Patients who had a liver transplant
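The kappa coefficient in the measurement-study cell is a standard inter-rater agreement statistic. A minimal Cohen's kappa sketch for two raters follows (the study used four reviewers and does not specify its exact kappa variant, and the toy labels below are invented):

```python
def cohen_kappa(a, b):
    """Cohen's kappa for two raters' label sequences a and b (equal length)."""
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    labels = set(a) | set(b)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each rater's marginal label frequencies:
    p_expected = sum((a.count(lab) / n) * (b.count(lab) / n) for lab in labels)
    return (p_observed - p_expected) / (1 - p_expected)

# Toy example: two raters disagree on one of four cases -> kappa = 0.5,
# coincidentally matching the "moderate agreement" value reported above.
r1 = ["TP", "TP", "FP", "FP"]
r2 = ["TP", "FP", "FP", "FP"]
```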
Methods and Results
Lesson learned: What's correct for the algorithm may not be correct for the case definition
• Are we measuring what we mean to measure?
• Case definition: liver injury due to medication, not another disease
• Many FPs were transplant patients
  • Patients correct for the algorithm, but liver enzymes elevated due to the procedure
• Revision: exclude transplant patients
Lessons learned
Improved algorithm design given themes in FPs
• Added exclusions
  • Babies
  • Overdose patients
  • Patients who died
  • Transplant patients
Lessons learned ("A collaborative approach to develop an EHR phenotyping algorithm for DILI," in preparation)
Lesson learned: Evaluator effectiveness influences the ability to draw appropriate inferences about algorithm performance
• How does our evaluation approach influence performance estimations?
• Moderate agreement among algorithm reviewers, so interpretation of PPV unclear
• Revision: improve evaluator approach
Lessons learned
Improved evaluator approach
• Consensus among 4 reviewers
• Assign TP/FP status by:
  1. First-pass review of temporal relationship
     • Assign preliminary TP, FP, or unknown status
  2. Chart review
     • Confirm suspected TPs
     • Assign TP/FP if unknown status in first-pass review
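The two-pass assignment could be sketched as follows. This is a hypothetical helper, not the study's procedure: the slides do not say whether pass-1 FPs were re-examined, so treating them as final is an assumption.

```python
def assign_status(pass1: str, chart_confirms: bool) -> str:
    """Combine first-pass temporal review with chart review (illustrative).

    pass1: preliminary label from the temporal-relationship review
           ("TP", "FP", or "unknown").
    chart_confirms: whether chart review supports drug-induced liver injury.
    """
    if pass1 == "FP":
        return "FP"  # assumed final after the first pass
    # Suspected TPs are confirmed, and unknowns resolved, by chart review:
    return "TP" if chart_confirms else "FP"
```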
[Figure: example patient lab timelines as visualized during first-pass temporal review: alkaline phosphatase value, ALT value, and bilirubin IV value, each plotted over 2012-03-15 through 2012-07-19]
Summary of findings
• Lessons learned from applying our evaluation framework
  • What's correct for the algorithm may not be correct for the case definition (Are we measuring what we mean to measure?)
  • Evaluator effectiveness influences the ability to draw appropriate inferences about algorithm performance
• Demonstrated that our evaluation framework is useful
  • Informed improvements in algorithm design
  • Informed improvements in evaluator approach
  • Likely more useful for rare conditions
• Demonstrated EHR phenotyping algorithm development is an iterative process
  • Complexity of the algorithm may influence
Acknowledgments
• Dr. Yufeng Shen, Serious Adverse Event Consortium collaborator
• eMERGE collaborators
  • Mount Sinai (Drs. Omri Gottesman, Erwin Bottinger, and Steve Ellis)
  • Mayo Clinic (Drs. Jyotishman Pathak, Sean Murphy, Kevin Bruce, Stephanie Johnson, Jay Talwalker, Christopher G. Chute, Iftikhar J. Kullo)
  • Northwestern (Dr. Abel Kho)
  • Vanderbilt (Dr. Josh Denny)
• DILIN collaborator
  • UNC-CH (Dr. Ashraf Farrag)
• Columbia Training in Biomedical Informatics (NIH NLM #T15 LM007079)
• The eMERGE network U01 HG006380-01 (Mount Sinai)
Questions?
Casey L. Overby [email protected]