TRANSCRIPT
Adventures in EHR Computable
Phenotypes: Lessons Learned from the
Southeastern Diabetes Initiative (SEDI)
PCORnet Best Practices Sharing Session
Wednesday, August 5, 2015
Introductions to the Round Table
Joseph Lucas, PhD
Associate Director, Health System Operations, Information Initiative at Duke
Adjunct Associate Research Professor of Statistical Science, Duke University
Ben Neely, MS
Biostatistician, Duke Clinical Research Institute
Rachel Richesson, PhD, MS, MPH, FACMI
Associate Professor, Duke University School of Nursing
Shelley Rusincovitch
Project Leader, Applied Informatics & Architecture
The round table on August 5, from left to right:
Host Kristin Newby, MD, MHS; Shelley Rusincovitch; Rachel
Richesson, PhD, MS, MPH, FACMI; Benjamin Neely, MS;
Joseph Lucas, PhD
Phenotyping Objectives
and Statistical Design
Joe Lucas, PhD
Population Health
• Who is the at-risk population?
• Can we identify them from the EMR?
– Intervention before secondary morbidity/mortality
– Appropriately identified patients lead to more accurate
treatments
– Improvement in accuracy of retrospective studies
• Financial incentives: Accountable care
– Can we intervene early to lower future cost?
– Better assessment of future risk (higher reimbursement
from payers)
Identifying patients: Diabetes
• What are the performance characteristics of an algorithm for identifying patients?
– Sample and compare to “truth”
• Disease status is not always clear in the EMR
• What is the “gold standard” truth?
Stability
• Suppose:
– We have 100,000 patients in the “all zeros” stratum
– We sample 40 patients from this stratum
– 1000 patients with disease in other strata
• Maximum likelihood estimate (MLE) of the number of patients with disease
– If 0/40 have disease: 1000
– If 1/40 have disease: 3500
• This has drastic consequences for
sensitivity estimates
• Bayesian approach, prior distributions
Don’t get real estimates of
sensitivity until we sample
at least one false negative
Sensitivity: tp / (tp + fn)
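The arithmetic behind the 1000 versus 3500 figures can be sketched as follows. This is a minimal illustration, not the presenters' code: the function name is hypothetical, and it assumes that all 1000 diseased patients in the other strata are flagged by the algorithm (so they count as true positives) and that every diseased patient in the "all zeros" stratum is a false negative.

```python
def estimate_sensitivity(true_positives, stratum_size, sampled, sampled_with_disease):
    """Return (estimated total with disease, estimated sensitivity).

    The maximum likelihood estimate scales the disease fraction seen in the
    reviewed sample up to the whole "all zeros" stratum.
    """
    false_negatives = stratum_size * sampled_with_disease / sampled
    total_with_disease = true_positives + false_negatives
    sensitivity = true_positives / total_with_disease
    return total_with_disease, sensitivity


# Numbers from the slide: 100,000 patients in the stratum, 40 charts sampled,
# 1000 diseased patients in the other strata (ASSUMED to be true positives).
for found in (0, 1):
    total, sens = estimate_sensitivity(1000, 100_000, 40, found)
    print(f"{found}/40 with disease: total = {total:.0f}, sensitivity = {sens:.3f}")
```

A single additional diseased chart in the sample of 40 moves the estimated total from 1000 to 3500, which is why the sensitivity estimate is unstable until at least one false negative has been sampled.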
Uniform Sampling for Uncommon Disease
• Test and disease positive in 3% of the population
• Odds ratio 50
– Odds of disease given a positive test over odds of disease given a negative test
– (tp/fp) / (fn/tn)
• We can improve estimates of PPV by over-sampling patients with positive tests
• Sensitivity depends on estimating false negatives
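To make the scenario concrete, the sketch below reconstructs a 2x2 table from the two numbers on the slide. It rests on an interpretation that the slide does not state outright: that 3% of the population is test positive and 3% is disease positive. The function name and the chart-review budget of 400 are hypothetical.

```python
import math


def solve_two_by_two(marginal, odds_ratio):
    """Find cell proportions (tp, fp, fn, tn) with equal test and disease
    marginals and the requested odds ratio (tp/fp) / (fn/tn).

    With fp = fn = marginal - tp and tn = 1 - 2*marginal + tp, the odds ratio
    condition becomes a quadratic in tp; the smaller root is the valid one.
    """
    a = odds_ratio - 1.0
    b = -(2.0 * odds_ratio * marginal + 1.0 - 2.0 * marginal)
    c = odds_ratio * marginal ** 2
    tp = (-b - math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    fp = marginal - tp
    fn = marginal - tp
    tn = 1.0 - tp - fp - fn
    return tp, fp, fn, tn


tp, fp, fn, tn = solve_two_by_two(marginal=0.03, odds_ratio=50.0)

# Expected cell counts if a HYPOTHETICAL budget of 400 charts is sampled uniformly.
budget = 400
for name, share in (("tp", tp), ("fp", fp), ("fn", fn), ("tn", tn)):
    print(f"{name}: {share:.4f} of population, about {budget * share:.1f} charts")
```

Under these assumptions a uniform sample places only a handful of charts in the tp, fp, and fn cells, which is the motivation for over-sampling test-positive patients.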
Stratified Sampling
• Suppose we instead sample preferentially from patients with positive test
– PPV = tp / (tp + fp) can be estimated well
– NPV and Specificity are dominated by a very low false negative rate
• We can trade sample size to get a better estimate of PPV
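The trade-off can be illustrated with the usual binomial standard error of a proportion. This is a rough sketch under stated assumptions: the review budget of 400 charts, the half-and-half allocation, and the PPV of 0.5 are hypothetical, and the 3% test-positive rate is taken from the uniform-sampling slide.

```python
import math


def ppv_standard_error(ppv, n_test_positive):
    """Approximate binomial standard error of a PPV estimate based on
    n_test_positive reviewed charts from test-positive patients."""
    return math.sqrt(ppv * (1.0 - ppv) / n_test_positive)


budget = 400          # HYPOTHETICAL total number of chart reviews
positive_rate = 0.03  # share of the population with a positive test
assumed_ppv = 0.5     # HYPOTHETICAL true PPV, used only to size the error

# Uniform sampling: test-positive charts appear at their population rate.
uniform_positives = budget * positive_rate
# Stratified sampling: ASSUME half of the budget goes to test-positive patients.
stratified_positives = budget / 2

se_uniform = ppv_standard_error(assumed_ppv, uniform_positives)
se_stratified = ppv_standard_error(assumed_ppv, stratified_positives)
print(f"uniform:    {uniform_positives:.0f} positives, SE = {se_uniform:.3f}")
print(f"stratified: {stratified_positives:.0f} positives, SE = {se_stratified:.3f}")
```

The charts moved into the test-positive stratum are charts no longer reviewed among test-negative patients, so the gain in PPV precision comes at the cost of precision for quantities that depend on false negatives.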
Multiple Computable Phenotypes
• Multiple definitions
– Stratify based on definitions
– At least one stratum contains patients not identified by any definition
• Sensitivity: tp / (tp + fn)
• True positive can be well estimated
• False negative is poorly estimated, but only in one of the strata
– All computable definitions have the same false negative rate in that stratum
• Example: Two definitions
• Hard to get an accurate estimate of false negatives because events are so rare in the 0,0 stratum
– However, the inaccuracy is shared by all definitions
                  Definition 1
                  0        1
Definition 2   0  94.4%    1.8%
               1  2.2%     1.6%
Hard to be accurate in this box (the 0,0 cell)
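One way to see why the shared inaccuracy matters less for comparisons is sketched below. The stratum shares come from the table; the disease rates inside each flagged stratum are hypothetical placeholders, since the transcript does not report them. The disease rate in the 0,0 stratum is varied to mimic the uncertainty in that box.

```python
# Share of the population in each (definition 1, definition 2) stratum, from the table.
stratum_share = {(0, 0): 0.944, (1, 0): 0.018, (0, 1): 0.022, (1, 1): 0.016}

# HYPOTHETICAL disease rates within the strata flagged by at least one definition.
disease_rate = {(1, 0): 0.50, (0, 1): 0.70, (1, 1): 0.95}


def sensitivities(rate_00):
    """Sensitivity of each definition for a given disease rate in the 0,0 stratum."""
    diseased = {s: stratum_share[s] * r for s, r in disease_rate.items()}
    diseased[(0, 0)] = stratum_share[(0, 0)] * rate_00
    total = sum(diseased.values())
    sens_def1 = (diseased[(1, 0)] + diseased[(1, 1)]) / total
    sens_def2 = (diseased[(0, 1)] + diseased[(1, 1)]) / total
    return sens_def1, sens_def2


for rate_00 in (0.0, 0.001, 0.005, 0.02):
    s1, s2 = sensitivities(rate_00)
    print(f"rate in 0,0 = {rate_00:.3f}: sens1 = {s1:.3f}, sens2 = {s2:.3f}, "
          f"ratio = {s2 / s1:.3f}")
```

Both sensitivities move substantially as the assumed 0,0 rate changes, but their ratio does not, because the poorly estimated false negatives enter both denominators identically.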
Comparing definitions
Our stratification makes comparing definitions possible because they
share false negative rates in the largest stratum.
Estimates of sensitivity are indistinguishable
Estimates of improvement in sensitivity clearly favor definition 2
Methods Overview
Rachel Richesson, PhD, MS, MPH, FACMI
The CPM-SEDI
Phenotype
Development
Process
Methods
• Blinded review by 2 reviewers with adjudication (S. Spratt, MD)
• Reviewers were diabetes experts (physicians and NPs) from DUHS
• Reviews conducted May – December 2014
• The Research Electronic Data Capture (REDCap) platform was
used for random assignment of charts to reviewers and the
collection of data for each review.
• Reviewers trained on chart review in MAESTRO Care and
REDCap (one-hour training session + Manual of Operations)
• The reviewers examined electronic charts for a defined time
range (2007 – 2011) to match the time period of the phenotype
queries.
Discussion of Results
Ben Neely, MS
(Unpublished results in manuscript preparation)
Live demo: Visualization of
False Positives
Ben Neely, MS
Next Horizons and
Improving Workflows
Joe Lucas, PhD
Ben Neely, MS
STEARNS
SequenTial EstimAtion with Redcap aNd Shiny.
Lessons Learned
Shelley Rusincovitch
Themes
1. Precision of language is important
2. Gold standard clinical definitions can be challenging and
nuanced
3. Reviewer concordance can be challenging and nuanced
4. Codes change
Precision of language is important
• “Revascularization”
– Coronary revascularization
– Coronary artery revascularization
– Myocardial revascularization
– Cerebral revascularization
– Revascularization of lower limb
– Revascularization of whole leg
– Revascularization of foot
– Revascularization of toe
Slide acknowledgement and thanks to Michelle Smerek
In Summary
• In order for the phenotypers to find good definitions (FIT),
it is essential that they know what we are looking for
(PURPOSE)
• This process is iterative! The clinicians and statisticians
give us initial requirements, we survey the landscape,
circle back with any questions and to get clarification, and
then resume the search.
• More regular communication among the parties will result
in phenotype definitions that better fit our purpose.
Slide acknowledgement and thanks to Michelle Smerek
Applicability, Broader
Context, and PCORnet
Considerations
Rachel Richesson, PhD, MS, MPH, FACMI
Benefits of Sharing Phenotypes…
• Development and conduct of new multi-site studies
(interventional and observational)
– Efficiencies of re-using definitions and code
• Comparability of EHR-derived data sets
• Comparison of study results and aggregation of evidence
• Reporting of data sets or results (e.g., ClinicalTrials.gov, NIH)
• Description of research populations in medical journals
Desirable Features – “URU + U”
• Understandable
• Reproducible
• Usable
• Useful
o Validation (results and methods)
o Use data elements and coding systems that are widely implemented in EHR systems
o Community acceptance – “standardized” across sites or research communities
essential for pragmatic trials...
essential for multi-site studies...
PCORnet: the National Patient-Centered Clinical
Research Network
PCORnet’s goal is to improve the nation’s capacity
to conduct comparative effectiveness research (CER) efficiently,
by creating a large, highly representative, national
patient-centered clinical research network for conducting
clinical outcomes research.
The vision is to support a learning US healthcare system, which would allow for large-scale research to be conducted with enhanced accuracy and efficiency.
Guiding principle: Make research
easier
• Analysis ready data
• Reusable analysis tools
• Administrative simplicity
• Simple, pragmatic studies integrated into routine care
• A national/regional resource to answer questions
important to patients, clinicians, and delivery system
leaders
• A foundation of the Learning Health System
PCORnet Approach to Phenotypes
• Networks share phenotypes with the Coordinating Center (CC)
• Strongly encourage harmonization across PCORnet
• Encourage public posting (PheKB) by researchers
The greatest challenge – part 1
“standard” phenotype definitions:
identify, store, promote, implement
The greatest challenge – part 2
Sufficient & appropriate documentation:
• Communication
– Clinical, scientific, statistical, data science, & technical experts
– Multiple users, stakeholders
– Research sponsors
– Disease and patient advocates
• Collaboration
Acknowledgements
We gratefully acknowledge the leadership of Susan Spratt,
MD, and thank the dedicated team of chart reviewers.
We acknowledge and
appreciate the individual
contributions from members
of the Center for Predictive
Medicine and our
collaborators in this work.
https://www.dcri.org/our-services/biostatistics/center-for-predictive-medicine
Acknowledgements, continued
The projects and the work described are supported in part
by grant number 1C1CMS331018-01-00 from the
Department of Health and Human Services, Centers for
Medicare & Medicaid Services, and in part by the Bristol-Myers
Squibb Foundation Together on Diabetes program.
These contents are solely the responsibility of the authors
and do not necessarily represent the official views of the
U.S. Department of Health and Human Services or any of its
agencies.
Contact Information
Joseph Lucas, PhD
Associate Director, Health System
Operations, Information Initiative at
Duke
Adjunct Associate Research Professor
of Statistical Science, Duke University
https://www.linkedin.com/in/joelucas1
Ben Neely, MS
Biostatistician
Duke Clinical Research Institute
https://github.com/benneely
Rachel Richesson, PhD, MS,
MPH, FACMI
Associate Professor,
Duke University School of Nursing
https://twitter.com/rrichesson
Shelley Rusincovitch
Project Leader in Applied
Informatics and Architecture
Duke Translational Research
Institute (DTRI)
https://twitter.com/Rusincovitch
Discussion