using ontologies to make sense of unstructured medical data nigam shah, mbbs, phd [email protected]

28
Using ontologies to make sense of unstructured medical data Nigam Shah, MBBS, PhD [email protected]

Upload: dustin-sims

Post on 13-Jan-2016

216 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Using ontologies to make sense of unstructured medical data Nigam Shah, MBBS, PhD nigam@stanford.edu

Using ontologies to make sense of unstructured medical data

Nigam Shah, MBBS, [email protected]

Page 2: Using ontologies to make sense of unstructured medical data Nigam Shah, MBBS, PhD nigam@stanford.edu

NCBO: Key activities

• We create and maintain a library of biomedical ontologies.

• We build tools and Web services to enable the use of ontologies and their derivatives.

• We collaborate with scientific communities that develop and use ontologies.

Page 3: Using ontologies to make sense of unstructured medical data Nigam Shah, MBBS, PhD nigam@stanford.edu

http:

//re

st.b

ioon

tolo

gy.o

rghtt

p://

rest

.bio

onto

logy

.org

Ontology ServicesOntology Services

• Download• Traverse• Search• Comment

• Download• Traverse• Search• Comment

WidgetsWidgets• Tree-view• Auto-complete• Graph-view

• Tree-view• Auto-complete• Graph-view

AnnotationAnnotation

Data AccessData Access

Mapping ServicesMapping Services

• Create• Download• Upload

• Create• Download• Upload

ViewsViews

Term recognitionTerm recognition

Fetch “data” annotated with a given term

Fetch “data” annotated with a given term

http://bioportal.bioontology.orghttp://bioportal.bioontology.org

Page 4: Using ontologies to make sense of unstructured medical data Nigam Shah, MBBS, PhD nigam@stanford.edu

Annotation service

Process textual metadata to automatically tag text with as many ontology terms as possible.

90 million calls, ~700 GB of data90 million calls, ~700 GB of data

Page 5: Using ontologies to make sense of unstructured medical data Nigam Shah, MBBS, PhD nigam@stanford.edu

Resource index

Pubmed AbstractsPubmed Abstracts

Adverse Events (AERS)Adverse Events (AERS)

GEOGEO

::

Clinical TrialsClinical Trials

Drug BankDrug Bank

Won 1st prize at the 2010 Semantic Web Challenge @ ISWC

Won 1st prize at the 2010 Semantic Web Challenge @ ISWC

Page 6: Using ontologies to make sense of unstructured medical data Nigam Shah, MBBS, PhD nigam@stanford.edu

Creating Lexicons

Term – 1:::Term – n

Sentence in Clinical Note – 1:::Sentence in Clinical Note – m

tf df NN JJ … VP …

ID Term-1 150,879 90,000 0.90 0.05 … 0.03 …

ID :

ID Term-n

Syntactic typesSyntactic typesFrequencyFrequency

Frequency counter

Page 7: Using ontologies to make sense of unstructured medical data Nigam Shah, MBBS, PhD nigam@stanford.edu

Annotation AnalyticsAnnotation Analytics

Analyzing tagged data for hypothesis generation in bioinformatics

Page 8: Using ontologies to make sense of unstructured medical data Nigam Shah, MBBS, PhD nigam@stanford.edu

Genome

Generic GO based analysis routineGeneric GO based analysis routine

Reference set

Study Set• Get annotations for each

gene in a set

• Count the occurrence of each annotation term in the study set

• Count the occurrence of that term in some reference set (whole genome?)

• P-value for how surprising their overlap is.

Page 9: Using ontologies to make sense of unstructured medical data Nigam Shah, MBBS, PhD nigam@stanford.edu

Annotation Analytics Landscape

SNOMED-CT

Gene Ontology

Gene Sets

NCIT

ICD-9

Human Disease

Cell Type

MeSH

Drugs, Chemicals

Grant Sets

Paper Sets

Patient Sets

Drug Sets

: ??

Health Indicator Warehouse datasets

Page 10: Using ontologies to make sense of unstructured medical data Nigam Shah, MBBS, PhD nigam@stanford.edu

Open questions

1. Can we use something other than the GO?

2. Lack of annotations—even today, roughly 20% of genes lack any GO annotation.

3. Annotation bias—annotation with certain ontology terms is not independent of each other.

4. Lack of a systematic mechanism to define a level of abstraction.

Page 11: Using ontologies to make sense of unstructured medical data Nigam Shah, MBBS, PhD nigam@stanford.edu

Profiling a set of Aging genes

Disease Ontology

~ 30% of genome

261 Age-related genes

Genome

Page 12: Using ontologies to make sense of unstructured medical data Nigam Shah, MBBS, PhD nigam@stanford.edu

Using ontologies other than GO

ERCC6 nucleoplasmPARP1 protein N-terminus bindingERCC6 nucleoplasmPARP1 protein N-terminus binding

ERCC6 <disease term?> PARP1 <disease term?>ERCC6 <disease term?> PARP1 <disease term?>

Page 13: Using ontologies to make sense of unstructured medical data Nigam Shah, MBBS, PhD nigam@stanford.edu

ERCC6 GO:0005654 PMID:16107709ERCC6 GO:0008094 PMID:16107709PARP1 GO:0047485 PMID:16107709ERCC6 GO:0005730 PMID:16107709PARP1 GO:0003950 PMID:16107709

http://www.geneontology.org/GO.downloads.annotations.shtmlhttp://www.geneontology.org/GO.downloads.annotations.shtml

Enrichment Analysis with the DO

www.ncbi.nlm.nih.gov/pubmed/16107709www.ncbi.nlm.nih.gov/pubmed/16107709

NCBO Annotator:http://bioportal.bioontology.orgNCBO Annotator:http://bioportal.bioontology.org

{ERCC6, PARP1} PMID:16107709{ERCC6, PARP1} PMID:16107709

{ERCC6, PARP1} {Cockayne syndrome, DNA damage}{ERCC6, PARP1} {Cockayne syndrome, DNA damage}

Page 14: Using ontologies to make sense of unstructured medical data Nigam Shah, MBBS, PhD nigam@stanford.edu

Annotation Analytics on EMR dataAnnotation Analytics on EMR data

Analysis of tagged data from electronic health records

Page 15: Using ontologies to make sense of unstructured medical data Nigam Shah, MBBS, PhD nigam@stanford.edu

Profiling patient setsProfiling patient sets

86k patient Reports

ICD9 789.00 (Abdominal pain, unspecified site)

Patient records processed from U. Pittsburgh NLP Repository with IRB approval.

Page 16: Using ontologies to make sense of unstructured medical data Nigam Shah, MBBS, PhD nigam@stanford.edu

Annotation (Clinical Text)

Page 17: Using ontologies to make sense of unstructured medical data Nigam Shah, MBBS, PhD nigam@stanford.edu

Term – 1:::Term – nSyntactic typesSyntactic types

FrequencyFrequency

Term recognition tool NCBO Annotator

Term recognition tool NCBO Annotator

NegEx Patterns

NegEx Patterns

NegEx Rules – Negation detection

NegEx Rules – Negation detection

P1 ICD9 ICD9 ICD9 ICD9 ICD9 ICD9

P1 T1, T2, no T4

… T5, T4, T3

… T4, T3, T1

T8, T9, T4

… T6, T8, T10

T1, T2, no T4

P2

P2

P3

P3

:

:

Pn

PnTerms form a temporal series of tags

Coh

ort

of

Inte

rest

DiseasesDiseases

ProceduresProcedures

DrugsDrugs

BioPortal – knowledge graph

Creating clean lexicons

Annotation Workflow

Furt

her A

naly

sis

Text clinical note

Terms Recognized

Negation detection

Generation of tagged dataGeneration of tagged data

Page 18: Using ontologies to make sense of unstructured medical data Nigam Shah, MBBS, PhD nigam@stanford.edu

ROR of 2.058, CI of [1.804, 2.349]The X2 statistic has p-value < 10-7

ROR=1.524, CI=[0.872, 2.666] X2 p-value = 0.06816.

Detecting the Vioxx Risk SignalDetecting the Vioxx Risk Signal

Vioxx Patients (1,560)

RA Patients (14,079)

MI Patients (1,827)

VioxxMI (339)

p-value < 1.3x10-24

Page 19: Using ontologies to make sense of unstructured medical data Nigam Shah, MBBS, PhD nigam@stanford.edu

Detecting Adverse Events

Page 20: Using ontologies to make sense of unstructured medical data Nigam Shah, MBBS, PhD nigam@stanford.edu

Detecting Adverse Events

Linear Space Features Logarithmic Space Features

Drug frequency Drug frequency

Disease frequency Disease frequency

Observed drug-first fraction Observed co-mention count

Drug-first fraction z-score (fixed drug)

Co-mention count z-score (fixed drug)

Drug-first fraction z-score (fixed disease)

Co-mention count z-score (fixed disease)

Page 21: Using ontologies to make sense of unstructured medical data Nigam Shah, MBBS, PhD nigam@stanford.edu

Detecting Adverse Events

Page 22: Using ontologies to make sense of unstructured medical data Nigam Shah, MBBS, PhD nigam@stanford.edu

Detecting Off-label useDetecting Off-label use

Page 23: Using ontologies to make sense of unstructured medical data Nigam Shah, MBBS, PhD nigam@stanford.edu

Annotation Analytics Landscape

SNOMED-CT

Gene Ontology

Gene Sets

NCIT

ICD-9

Human Disease

Cell Type

MeSH

Drugs, Chemicals

Grant Sets

Paper Sets

AgingAging

Patient Sets

Drug Sets

:

EMRsEMRs

What questions

can we ask?

What questions

can we ask?

Health Indicator Warehouse datasets

Page 24: Using ontologies to make sense of unstructured medical data Nigam Shah, MBBS, PhD nigam@stanford.edu

Associations and outcomes

Gene Disease Drug Device Procedure Environment

Gene

Disease

Drug

Device

Procedure

Environment

Side effects

Off-label Indications

Enrichment

What questions

can we ask?

What questions

can we ask?

Page 25: Using ontologies to make sense of unstructured medical data Nigam Shah, MBBS, PhD nigam@stanford.edu

Acknowledgements

• Paea LePendu• Yi Liu• Srinivasan Iyer• Steve Racunas• Anna Bauer-Mehren• Clement Jonquet• Rong Xu

• Mark Musen• NIH – NCBO funding• Mayo Team

• Hongfang Liu• Stephen Wu

• Sylvia Holland• Alex Skrenchuk

Page 26: Using ontologies to make sense of unstructured medical data Nigam Shah, MBBS, PhD nigam@stanford.edu

Mining Annotations of Grants, Publications

Grants from 1972 to 2007 30 funding agencies

Publications from MedlineOnly “Journal articles”

Page 27: Using ontologies to make sense of unstructured medical data Nigam Shah, MBBS, PhD nigam@stanford.edu

Sponsorship and AllocationSponsorship and Allocation

Page 28: Using ontologies to make sense of unstructured medical data Nigam Shah, MBBS, PhD nigam@stanford.edu

Who funds whatWho funds what