Square wheels: electronic medical records for discovery research in rheumatoid arthritis

Robert M. Plenge, M.D., Ph.D.
Harvard Medical School
October 30, 2009
NCRR-sponsored "Using EHR Data for Discovery Research"


Page 1

Page 2

Key questions

• What are the regulatory obstacles impacting your work?

• What are the resource needs required to replicate your work at other institutions?

• What are the priority short-term "translational" questions in your field that would represent the most rapid payoff on investment?

Page 3

Key questions

How can I implement your approach, and how much better is it?

Page 4

Page 5

genotype

phenotype

clinical care

Page 6

genotype

phenotype

clinical care

bottleneck

Page 7

Raychaudhuri et al., in press, Nature Genetics

October 2009: >30 RA risk loci

[Figure: timeline of RA risk-locus discovery, 1978 to 2009. Loci shown: HLA-DR4 and the "shared epitope" hypothesis; PADI4, CTLA4, PTPN22; TNFAIP3, STAT4, TRAF1-C5, IL2-IL21; CD40, CCL21, CD244, IL2RB, TNFRSF14, PRKCQ, PIP4K2C, IL2RA, AFF3; REL, BLK, TAGAP, CD28, TRAF6, PTPRC, FCGR2A, PRDM1, CD2-CD58]

Latest GWAS in 25,000 case-control samples, with replication in 20,000 additional samples: >10 new loci

Together, these loci explain ~35% of the genetic burden of disease.

Page 8

genotype

phenotype

clinical care

bottleneck

Page 9

Genetic predictors of response to anti-TNF therapy in RA

PTPRC/CD45 allele, n = 1,283 patients

P = 0.0001

Submitted to Arthritis & Rheumatism

Page 10

How can we collect DNA and detailed clinical data on >20,000 RA patients?

Page 11

What are the options for collecting clinical data and DNA for genetic studies?

Page 12

Options for clinical + DNA

Design         | Clinical data | DNA | Sample size | Cost
clinical trial | +++           | +++ | +           | $$$
registry       | ++            | +++ | ++          | $$
claims data    | +             | n/a | +++         | $
EMR            | ++            | +++ | +++         | $

Page 13

Content of EMRs

• Narrative data = free-form written text
  – info about symptoms, medical history, medications, exam, impression/plan
• Codified data = structured format
  – age, demographics, and billing codes

EMRs are increasingly utilized!

Page 14

Gabriel (1994) Arthritis and Rheumatism

This is not a new idea…

Sensitivity: 89%; PPV: 57%

Page 15

Gabriel (1994) Arthritis and Rheumatism

Conclusion: The sole reliance on such databases for the diagnosis of RA can result in substantial misdiagnosis.

…but EMR data are “dirty”

Page 16

Partners HealthCare: 4 million patients

Page 17

Partners HealthCare: linked by EMR

Page 18

Partners HealthCare: organized by i2b2

Page 19

4 million patients
↓ ICD9 RA and/or CCP checked (goal = high sensitivity)
31,171 patients
↓ classification algorithm (goal = high PPV)
3,585 RA patients
↓
clinical subsets; discarded blood for DNA
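The funnel pairs a cheap, high-sensitivity screen with a high-PPV classifier. The Python below is a minimal sketch of that two-stage design, not the group's implementation. It assumes a hypothetical record layout (`icd9_codes`, `ccp_checked`), treats any ICD9 code starting with `714` as a stand-in for "ICD9 RA", and uses a placeholder 0.5 probability cutoff and a toy model in place of the fitted classifier.

```python
from typing import Any, Callable, Dict, List

Patient = Dict[str, Any]  # hypothetical record layout

def screen(patients: List[Patient]) -> List[Patient]:
    """Stage 1 (goal = high sensitivity): keep anyone with an RA-like ICD9 code
    (assumed here: any code starting with '714') or an anti-CCP test on record."""
    return [
        p for p in patients
        if any(code.startswith("714") for code in p["icd9_codes"]) or p["ccp_checked"]
    ]

def classify(candidates: List[Patient],
             prob_ra: Callable[[Patient], float],
             threshold: float = 0.5) -> List[Patient]:
    """Stage 2 (goal = high PPV): keep candidates whose model probability of RA
    reaches the threshold (0.5 is an arbitrary placeholder)."""
    return [p for p in candidates if prob_ra(p) >= threshold]

def toy_model(p: Patient) -> float:
    # Hypothetical stand-in for a fitted classifier: more RA codes -> higher probability.
    return min(1.0, 0.3 * len(p["icd9_codes"]))

patients = [
    {"id": 1, "icd9_codes": ["714.0", "714.0"], "ccp_checked": True},
    {"id": 2, "icd9_codes": ["401.9"], "ccp_checked": False},
    {"id": 3, "icd9_codes": ["714.0"], "ccp_checked": False},
]
candidates = screen(patients)               # patients 1 and 3 pass the screen
ra_cases = classify(candidates, toy_model)  # only patient 1 reaches 0.5
```

Keeping the screen separate from the classifier means the expensive modeling step only runs on the ~31K screened subjects rather than all 4 million.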

Page 20

Our library of RA phenotypes

• Natural language processing (NLP)
  – disease terms (e.g., RA, lupus)
  – medications (e.g., methotrexate)
  – autoantibodies (e.g., CCP, RF)
  – radiographic erosions
• Codified data
  – ICD9 disease codes
  – prescription medications
  – laboratory autoantibodies

Qing Zeng

Concept/term        | Accuracy of concept
presence of erosion | 88%
seropositive        | 96%
CCP positive        | 98.7%
RF positive         | 99.3%
etanercept          | 100%
methotrexate        | 100%
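The NLP system behind these accuracy figures is not described on the slide. As a rough illustration of what "concept presence" extraction involves, here is a deliberately simplified Python sketch: a hypothetical dictionary of surface forms plus a NegEx-style check for negation cues in the preceding few tokens. The term lists, cue list, and window size are all assumptions.

```python
import re
from typing import Dict, List

# Hypothetical concept dictionary: concept -> single-token surface forms.
CONCEPTS: Dict[str, List[str]] = {
    "erosion": ["erosion", "erosions", "erosive"],
    "methotrexate": ["methotrexate", "mtx"],
    "etanercept": ["etanercept", "enbrel"],
}
# Simplified negation cues in the spirit of NegEx; not the full algorithm.
NEGATION_CUES = ["no", "denies", "without", "negative for"]
WINDOW = 5  # how many preceding tokens to scan for a negation cue

def extract_concepts(note: str) -> Dict[str, bool]:
    """Map each concept mentioned in the note to True (affirmed) or False (only negated)."""
    found: Dict[str, bool] = {}
    for sentence in re.split(r"[.;\n]", note.lower()):
        tokens = re.findall(r"[a-z0-9]+", sentence)
        for i, tok in enumerate(tokens):
            for concept, forms in CONCEPTS.items():
                if tok in forms:
                    context = " ".join(tokens[max(0, i - WINDOW):i])
                    negated = any(re.search(rf"\b{cue}\b", context) for cue in NEGATION_CUES)
                    # one affirmed mention anywhere outweighs negated ones
                    found[concept] = found.get(concept, False) or not negated
    return found

note = "Hand films show erosions. Started MTX 15 mg weekly. No history of Enbrel use."
result = extract_concepts(note)
```

Real clinical NLP also has to handle abbreviations, misspellings, family history, and hypotheticals, which is one reason the deck describes fine-tuning as iterative.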

Page 21

Our library of RA phenotypes

• Natural language processing (NLP)
  – disease terms (e.g., RA, lupus)
  – medications (e.g., methotrexate)
  – autoantibodies (e.g., CCP, RF)
  – radiographic erosions
• Codified data
  – ICD9 disease codes
  – prescription medications
  – laboratory autoantibodies

Shawn Murphy

Page 22

‘Optimal’ algorithm to classify RA:

NLP + codified data

Regression model with a penalty parameter (to avoid over-fitting)

Codified data NLP data

Tianxi Cai, Kat Liao
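The slide specifies only "a regression model with a penalty parameter", not the penalty type. As one concrete instance, the pure-Python sketch below fits an L1-penalized (lasso) logistic regression by proximal gradient descent, where soft-thresholding shrinks weak predictors to exactly zero. The features, toy data, penalty `lam`, and step size are hypothetical.

```python
import math
from typing import List, Tuple

def sigmoid(z: float) -> float:
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_l1_logistic(X: List[List[float]], y: List[int], lam: float = 0.1,
                    lr: float = 0.2, n_iter: int = 5000) -> Tuple[List[float], float]:
    """L1-penalized logistic regression fitted by proximal gradient descent (ISTA).
    Minimizes mean log-loss + lam * sum(|w_j|); the intercept b is not penalized."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        for j in range(p):
            z = w[j] - lr * grad_w[j]
            # soft-thresholding: the proximal step for the L1 penalty
            w[j] = math.copysign(max(abs(z) - lr * lam, 0.0), z)
    return w, b

def predict_proba(w: List[float], b: float, x: List[float]) -> float:
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

# Hypothetical toy features per patient:
# [log-count of RA ICD9 codes, scaled NLP mentions of RA, uninformative noise]
X = [[3.0, 2.0, 0.0], [2.5, 3.0, 1.0], [2.0, 2.5, 0.0], [3.0, 1.5, 1.0],
     [0.0, 0.5, 0.0], [0.5, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 0.5, 1.0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
w, b = fit_l1_logistic(X, y)
```

With real EMR data one would tune `lam` by cross-validation; a larger penalty trades fit for a sparser, less over-fitted model.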

Page 23

High PPV with adequate sensitivity

392 out of 400 (98%) had definite or possible RA!
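PPV is the fraction of algorithm-positive subjects confirmed on review. The slide reports only the point estimate. The sketch below also attaches a Wilson score interval to show how much uncertainty a 400-subject review leaves; choosing the Wilson interval is an assumption, not something the deck states.

```python
import math
from typing import Tuple

def ppv(true_pos: int, false_pos: int) -> float:
    """Positive predictive value: confirmed positives / all algorithm positives."""
    return true_pos / (true_pos + false_pos)

def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# From the slide: 392 of 400 had definite or possible RA
value = ppv(392, 400 - 392)
low, high = wilson_interval(392, 400)
print(f"PPV = {value:.3f}, 95% CI {low:.3f} to {high:.3f}")
```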

Page 24

This means more patients!

~25% more subjects with the complete algorithm:

3,585 subjects (3,334 with true RA)

3,046 subjects (2,680 with true RA)
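The "~25% more" figure can be checked directly from the two rows. The slide does not say which algorithm produced the 3,046-subject row, so it is labeled `other` below. Reading each row as (subjects selected, subjects with true RA) is an assumption based on the slide text.

```python
# Counts from the slide: (subjects selected, subjects with true RA)
complete = (3585, 3334)  # complete algorithm
other = (3046, 2680)     # comparison row (algorithm not named on the slide)

gain_true_ra = complete[1] / other[1] - 1  # relative gain in true RA cases
ppv_complete = complete[1] / complete[0]   # share of selected subjects with true RA
ppv_other = other[1] / other[0]
print(f"true-RA gain: {gain_true_ra:.1%}")
print(f"PPV: {ppv_complete:.1%} vs {ppv_other:.1%}")
```

The gain works out to about 24%, consistent with the slide's "~25%", and the complete algorithm selects more subjects without lowering the proportion with true RA.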

Page 25

4 million patients
↓ ICD9 RA and/or CCP checked (goal = high sensitivity)
31,171 patients
↓ classification algorithm (goal = high PPV)
3,585 RA patients
↓
discarded blood for DNA

Page 26

Linking the DataMart to Crimson

NLP data

Codified data

Page 27

Status of i2b2 Crimson collection

• Over 3,000 samples collected to date
  – cost = $10 per sample
• DNA extracted from >2,400 buffy coats
  – cost = $20 per sample
  – >90% had ≥1 µg of DNA
  – >99% had ≥5 µg of DNA after whole-genome amplification (WGA)

Genotyping of 384 SNPs (RA risk alleles, ancestry-informative markers (AIMs), other) is ongoing at the Broad Institute.
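The two per-sample costs support a quick back-of-envelope estimate for the >20,000-patient goal. This sketch assumes every collected sample is also extracted and excludes genotyping, staff, and informatics costs, none of which are priced on the slide.

```python
# Per-sample costs reported on the slide (USD)
COLLECTION_COST = 10
EXTRACTION_COST = 20

def biobank_cost(n_patients: int) -> int:
    """Collection + DNA extraction only (assumption: every sample is extracted).
    Genotyping, staff, and informatics are not included."""
    return n_patients * (COLLECTION_COST + EXTRACTION_COST)

for n in (3_000, 20_000):
    print(f"{n:>6,} patients: ${biobank_cost(n):,}")
```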

Page 28

Status of i2b2 Crimson collection

• Measured autoantibodies from plasma
  – 5 autoantibodies in ~380 RA patients
  – ~85% are CCP+, ~35% ANA+, ~15% TPO+
• Question: are non-RA autoantibodies present at increased frequency in RA patients vs. matched controls?

Stay tuned… more data soon!

Page 29

Key questions

How can I implement your approach, and how much better is it?

Page 30

Key questions

• What are the regulatory obstacles impacting your work?

• What are the resource needs required to replicate your work at other institutions?

• What are the priority short-term "translational" questions in your field that would represent the most rapid payoff on investment?

Page 31

Key questions

• What are the regulatory obstacles impacting your work?

• What are the resource needs required to replicate your work at other institutions?

• What are the priority short-term "translational" questions in your field that would represent the most rapid payoff on investment?

Page 32

Regulatory obstacles

• IRB approval

• De-identified vs truly anonymous

• Open question: sharing of genetic data

Page 33

Key questions

• What are the regulatory obstacles impacting your work?

• What are the resource needs required to replicate your work at other institutions?

• What are the priority short-term "translational" questions in your field that would represent the most rapid payoff on investment?

Page 34

Resources required

• Building a research DataMart
  – clinical EMR ≠ research EMR
  – multiple FTEs to build/maintain
• NLP expertise
  – open-source software available
  – iterative process for fine-tuning
• Clinical expertise
  – understand the nature of clinical data

Page 35

Resources required (cont.)

• Statistical expertise
  – a simple algorithm is not sufficient
  – prepare for the unexpected!
  – true for both narrative and codified data
• Biospecimen collection, DNA extraction
  – varies by institution
  – Crimson, Broad Institute

Page 36

Key questions

• What are the regulatory obstacles impacting your work?

• What are the resource needs required to replicate your work at other institutions?

• What are the priority short-term "translational" questions in your field that would represent the most rapid payoff on investment?

Page 37

4 million patients
↓ ICD9 RA and/or CCP checked (goal = high sensitivity)
31,171 patients
↓ classification algorithm (goal = high PPV)
3,585 RA patients
↓
clinical subsets; discarded blood for DNA

Page 38

Clinical features of patients

Characteristic | i2b2 RA     | CORRONA
Total number   | 3,585       | 7,971
Mean age (SD)  | 57.5 (17.5) | 58.9 (13.4)
Female (%)     | 79.9        | 74.5
Anti-CCP (%)   | 63          | N/A
RF (%)         | 74.4        | 72.1
Erosions (%)   | 59.2        | 59.7
MTX (%)        | 59.5        | 52.8
Anti-TNF (%)   | 32.6        | 22.6

CCP positivity has an odds ratio (OR) of 1.5 for predicting erosions.
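An odds ratio like this one comes from a 2×2 table of exposure (CCP status) by outcome (erosions). The slide gives only the OR, so the counts below are hypothetical and were picked so the arithmetic lands on 1.5. They are not the study's data. The Woolf (log-scale) interval is one common choice and is an assumption here.

```python
import math
from typing import Tuple

def odds_ratio(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio from a 2x2 table, with a Woolf (log-scale) confidence interval.
             erosions+  erosions-
    CCP+         a          b
    CCP-         c          d
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_, math.exp(math.log(or_) - z * se), math.exp(math.log(or_) + z * se)

# Hypothetical counts for illustration only (not the study's data)
estimate, lo, hi = odds_ratio(60, 40, 50, 50)
```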

Page 39

Subset patients in clinically meaningful ways: causes of mortality

NLP + codified data, together with statistical modeling, to define cardiovascular disease

Page 40

Non-responder to anti-TNF therapy

NLP + codified data, together with statistical modeling, to define treatment response

Page 41

Responder to anti-TNF therapy

NLP + codified data, together with statistical modeling, to define treatment response

Page 42

Post-marketing surveillance of adverse events

NLP + codified data, together with statistical modeling, to define treatment response

pharmacovigilance

Page 43

Conclusions

Page 44

Options for clinical + DNA

Design         | Clinical data | DNA | Sample size | Cost
clinical trial | +++           | +++ | +           | $$$
registry       | ++            | +++ | ++          | $$
claims data    | +             | n/a | +++         | $
EMR            | ++            | +++ | +++         | $

Conclusion: NLP + codified data, together with appropriate statistical modeling, can yield accurate clinical data.

Page 45

Options for clinical + DNA

Design         | Clinical data | DNA | Sample size | Cost
clinical trial | +++           | +++ | +           | $$$
registry       | ++            | +++ | ++          | $$
claims data    | +             | n/a | +++         | $
EMR            | ++            | +++ | +++         | $

Conclusion: We can collect DNA and plasma in a high-throughput manner.

Page 46

Options for clinical + DNA

Design         | Clinical data | DNA | Sample size | Cost
clinical trial | +++           | +++ | +           | $$$
registry       | ++            | +++ | ++          | $$
claims data    | +             | n/a | +++         | $
EMR            | ++            | +++ | +++         | $

Conclusion: The cost is reasonable...even for >20,000 RA patients!

Page 47

genotype

phenotype

clinical care

Page 48

Acknowledgments: Zak Kohane, Susanne Churchill, Vivian Gainer, Kat Liao, Tianxi Cai, Shawn Murphy, Qing Zeng, Soumya Raychaudhuri, Beth Karlson, Pete Szolovits, Lee-Jen Wei, Lynn Bry (Crimson), Sergey Goryachev, Barbara Mawn & many others!

Namaste!

Page 49

Page 50

Narrative data (NLP text extractions)

Codified data (ICD9 codes, etc.)

Page 51

Run specific queries

Page 52

Visualize results in a timeline

Page 53

Identifying RA patients in our i2b2 RA DataMart

[Figure: single-patient timeline, 1993 to 2008, with tracks for signs and symptoms; diseases that mimic RA; medications specific to RA; notes (including whether seen by a rheumatologist); diagnostic codes for RA]

Shawn Murphy, Vivian Gainer, others

Page 54

Identifying RA patients in our i2b2 RA DataMart

[Figure: single-patient timeline, 1993 to 2008, annotated: signs and symptoms c/w RA; RA without other diseases; specific RA meds, including MTX; seen by rheumatology; many diagnostic codes for RA]

Page 55

Probability of RA: all 31K subjects

[Figure: histogram of predicted probability of RA; x-axis: probability of RA; y-axis: frequency; groups: not RA vs. RA (n = 3,585)]

Page 56

ROC curves for algorithms

[Figure: ROC curves (sensitivity vs. 1 - specificity) for codified + NLP, NLP only, and codified only; 97% specificity marked]
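The ROC comparison is read at a fixed operating point (97% specificity). One way to compute the sensitivity an algorithm achieves at that point is to scan candidate thresholds and keep the best sensitivity among those meeting the specificity target. The Python below sketches that approach, assuming a subject is called positive when its score is at or above the threshold. The scores and labels are made-up toy values, and whether the group picked its operating point exactly this way is not stated.

```python
from typing import List

def sensitivity_at_specificity(scores: List[float], labels: List[int],
                               target_spec: float = 0.97) -> float:
    """Highest sensitivity among thresholds whose specificity is >= target_spec.
    A subject is called positive when score >= threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    best = 0.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, lab in zip(scores, labels) if s >= t and lab == 1)
        tn = sum(1 for s, lab in zip(scores, labels) if s < t and lab == 0)
        if tn / negatives >= target_spec:
            best = max(best, tp / positives)
    return best

# Hypothetical toy data: 34 non-RA subjects (one scored high), 4 RA subjects
scores = [i / 100 for i in range(1, 34)] + [0.8] + [0.9, 0.7, 0.5, 0.2]
labels = [0] * 34 + [1, 1, 1, 1]
sens = sensitivity_at_specificity(scores, labels)
```

Comparing `sens` across the three feature sets (codified + NLP, NLP only, codified only) at the same specificity is what the figure summarizes visually.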

Page 57

Other algorithms to classify RA

NLP only

Codified only

Portability!

Page 58

Classification of RA cases (and not RA)

[Figure: predicted probability of RA (0.00 to 1.00) by category (not RA, possible, yes RA); threshold = 0.29]

???

Page 59

Diagnosis = ankylosing spondylitis (but many RA codes)

A few signs and symptoms c/w RA

NLP with few mentions of RA

Specific meds

Visits to BWH/MGH

diagnostic codes for RA

Probability RA = 0.78

Page 60

Diagnosis = JRA (but many RA codes)

signs and symptoms c/w RA

NLP with “RA” and “JRA”

Specific meds

Visits to the RA Center at BWH

Many diagnostic codes for RA

Page 61

Probability RA = 0.33

Diagnosis not clear initially…

signs and symptoms c/w RA

NLP without much “RA”, few specific meds (MTX x 1)

…and few diagnostic codes for RA, despite multiple LMR notes, including visits to the BWH Arthritis Center

Page 62

Now the false negatives…

Page 63

Diagnosed in 1992, little follow-up

For some reason few RA diagnostic codes

Probability RA = 0.11

Page 64

Medications: codified data vs. NLP

Enbrel (etanercept)
• codified: 1,628
• NLP: 3,796
• overlap: 1,612 (99% of codified)

Note: review of 50 NLP occurrences showed that 38 of the 50 patients were actively on Enbrel.
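These counts can be combined to show how much each source adds. The sketch below assumes "overlap" means patients with etanercept recorded in both the codified data and the NLP output. The 76% rate comes from the 50-note review only, so it is a rough estimate.

```python
# Counts from the slide (patients with Enbrel/etanercept recorded)
codified = 1628
nlp = 3796
overlap = 1612  # assumed: patients found by both sources

share_codified_in_nlp = overlap / codified  # codified patients also found by NLP
nlp_only = nlp - overlap                    # patients only NLP finds
union = codified + nlp - overlap            # patients found by either source
active_rate = 38 / 50                       # chart review of 50 NLP mentions

print(f"codified also found by NLP: {share_codified_in_nlp:.1%}")
print(f"NLP-only patients: {nlp_only:,}; union: {union:,}")
print(f"NLP mentions reflecting active use: {active_rate:.0%}")
```

NLP finds more than twice as many patients as the codified data, but the review suggests roughly a quarter of NLP mentions do not reflect active use. That is why the deck combines both sources with statistical modeling instead of relying on either alone.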