Ontologies and data integration in biomedicine
What is in there for Public Health?

Olivier Bodenreider

Lister Hill National Center for Biomedical Communications
Bethesda, Maryland - USA

Public Health Informatics Fellowship Program
November 21, 2008

Why integrate data?

Lister Hill National Center for Biomedical Communications 3

Motivation

Multiple sources of data
  Each focusing on one aspect
  Annotated with specific vocabularies
  Specific format

Need to be integrated to infer new facts
  Translational medicine (“Bench to bedside”)
  Biosurveillance
  Translation into policy (Public health)

Example: Oncology

Cancer registries

Research data

Clinical repositories

Integrating subdomains

Cancer registries: ICD-O

Research data: NCI Thesaurus

Clinical repositories: SNOMED CT

Example: Oncology

Multiple data repositories

International Classification of Diseases-Oncology (ICD-O-3)
  Cancer registries
  Epidemiology, Public health

SNOMED CT
  Patient records
  Clinical care

NCI Thesaurus
  Annotation of research data

SNOMED CT

http://www.clinical-info.co.uk/

NCI Thesaurus

http://nciterms.nci.nih.gov/

ICD-O-3

Morphology
  […]
  814-838 Adenomas and adenocarcinomas
    8140/3 Adenocarcinoma, NOS

Anatomy
  […]
  C60-C63 Male genital organs
    C61 Prostate gland
      C61.9 Prostate, NOS

Prostate gland

Adenocarcinoma of prostate
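
The pairing shown on this slide, one anatomy code plus one morphology code describing a single tumor, can be sketched in Python. This is only an illustration under stated assumptions: the class name `IcdO3Diagnosis` and the two lookup tables are hypothetical and hold just the codes quoted above, not the ICD-O-3 code lists themselves.

```python
from dataclasses import dataclass

# Hypothetical mini lookup tables holding only the codes shown on the slide.
ANATOMY = {"C61.9": "Prostate, NOS"}
MORPHOLOGY = {"8140/3": "Adenocarcinoma, NOS"}


@dataclass(frozen=True)
class IcdO3Diagnosis:
    """A tumor described by one anatomy code and one morphology code."""
    anatomy: str      # e.g. "C61.9"
    morphology: str   # e.g. "8140/3"

    def histology(self) -> str:
        # The part before the slash identifies the cell type.
        return self.morphology.split("/")[0]

    def behavior(self) -> str:
        # The digit after the slash is the behavior code.
        return self.morphology.split("/")[1]

    def labels(self) -> tuple:
        # Return (morphology label, anatomy label) for display.
        return (MORPHOLOGY[self.morphology], ANATOMY[self.anatomy])


dx = IcdO3Diagnosis(anatomy="C61.9", morphology="8140/3")
print(dx.histology(), dx.behavior(), dx.labels())
```

Keeping the two axes as separate fields mirrors the slide: "Adenocarcinoma of prostate" is not a single code but a combination of a morphology code and an anatomy code.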

Terminology integration: NCI Metathesaurus

Cancer registries: ICD-O: C61.9 Prostate, NOS

Research data: NCI Thesaurus: C12410 Prostate gland

Clinical repositories: SNOMED CT: 41216001 Prostatic structure

NCI Meta.: C0033572
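
The idea on this slide, several source-specific codes resolving to one shared concept identifier, can be sketched as a lookup table. The codes are the ones shown above; the function names and the record identifiers (`registry-1`, `research-7`, `clinic-42`) are hypothetical, and a real terminology integration system holds far more than three rows.

```python
# Excerpt-style table: each (source vocabulary, local code) pair points to
# one shared concept identifier. Only the three codes from the slide appear.
CONCEPT_OF = {
    ("ICD-O", "C61.9"): "C0033572",           # Prostate, NOS
    ("NCI Thesaurus", "C12410"): "C0033572",  # Prostate gland
    ("SNOMED CT", "41216001"): "C0033572",    # Prostatic structure
}


def same_concept(a, b):
    """True when two (source, code) pairs map to the same shared concept."""
    ca, cb = CONCEPT_OF.get(a), CONCEPT_OF.get(b)
    return ca is not None and ca == cb


def group_by_concept(records):
    """Group records annotated with different vocabularies by shared concept."""
    groups = {}
    for rec in records:
        concept = CONCEPT_OF.get((rec["source"], rec["code"]))
        groups.setdefault(concept, []).append(rec["id"])
    return groups


# Hypothetical records, one per subdomain.
records = [
    {"id": "registry-1", "source": "ICD-O", "code": "C61.9"},
    {"id": "research-7", "source": "NCI Thesaurus", "code": "C12410"},
    {"id": "clinic-42", "source": "SNOMED CT", "code": "41216001"},
]
print(group_by_concept(records))
```

Records whose code is missing from the table end up under the key `None`, which makes unmapped codes easy to spot.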

Approaches to data integration

Warehousing
  Sources to be integrated are transformed into a common format and converted to a common vocabulary

Mediation
  Local schema (of the sources)
  Global schema (in reference to which the queries are made)
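
A minimal sketch of the warehousing approach, assuming two toy sources: every row is transformed into a common format and a common vocabulary before any query runs. The field names, years, and loader functions are invented for illustration; only the three codes come from the slides.

```python
# Hypothetical source extracts; field names and years are invented.
registry_rows = [{"site_code": "C61.9", "yr": 2007}]              # coded with ICD-O
clinic_rows = [{"finding_site": "41216001", "visit_year": 2008}]  # coded with SNOMED CT

# Assumed lookup from each local code to a common vocabulary identifier.
TO_COMMON = {"C61.9": "C0033572", "41216001": "C0033572"}


def load_registry(rows):
    """Transform registry rows into the common format and vocabulary."""
    return [
        {"concept": TO_COMMON[r["site_code"]], "year": r["yr"], "origin": "registry"}
        for r in rows
    ]


def load_clinic(rows):
    """Transform clinical rows into the common format and vocabulary."""
    return [
        {"concept": TO_COMMON[r["finding_site"]], "year": r["visit_year"], "origin": "clinic"}
        for r in rows
    ]


# The warehouse is a single store, populated ahead of any query.
warehouse = load_registry(registry_rows) + load_clinic(clinic_rows)


def count_by_concept(store):
    """Count warehouse records per common concept."""
    counts = {}
    for rec in store:
        counts[rec["concept"]] = counts.get(rec["concept"], 0) + 1
    return counts


print(count_by_concept(warehouse))
```

The cost of this design is that the transformation has to be rerun whenever a source changes; the benefit is that queries only ever see one format.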

Ontologies and warehousing

Role
  Provide a conceptualization of the domain
    Help define the schema
    Information model vs. ontology
  Provide value sets for data elements
  Enable standardization and sharing of data

Examples
  BioSense (original design)
  Repositories for translational research (CTSA)
  Clinical information systems

Ontologies and mediation

Role
  Reference for defining the global schema
  Map between local and global schemas

Examples
  BioSense (redesign)
  BioMediator
  OntoFusion
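
A minimal sketch of mediation, assuming two toy sources that stay in their own schema and vocabulary: a query is phrased against the global schema and rewritten for each source at query time. All field names, years, and the `query` function are hypothetical; this is not how BioSense, BioMediator, or OntoFusion are implemented.

```python
# Hypothetical local sources, left in their own schema and vocabulary.
SOURCES = {
    "registry": [{"site_code": "C61.9", "yr": 2007}],
    "clinic": [{"finding_site": "41216001", "visit_year": 2008}],
}

# Assumed mappings from the global schema fields to each local schema.
FIELD_MAP = {
    "registry": {"anatomic_site": "site_code", "year": "yr"},
    "clinic": {"anatomic_site": "finding_site", "year": "visit_year"},
}

# Assumed mappings from a global concept to each source's local code.
CODE_MAP = {
    "registry": {"C0033572": "C61.9"},
    "clinic": {"C0033572": "41216001"},
}


def query(anatomic_site):
    """Answer a global-schema query by rewriting it for each source."""
    results = []
    for name, rows in SOURCES.items():
        local_code = CODE_MAP[name].get(anatomic_site)
        if local_code is None:
            continue  # this source has no code for the requested concept
        site_field = FIELD_MAP[name]["anatomic_site"]
        year_field = FIELD_MAP[name]["year"]
        for row in rows:
            if row[site_field] == local_code:
                results.append(
                    {"source": name, "anatomic_site": anatomic_site, "year": row[year_field]}
                )
    return results


print(query("C0033572"))
```

Nothing is copied ahead of time: the two mapping tables play the role the slide assigns to ontologies, a reference for the global schema and a map between local and global schemas.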

Ontologies: Normalization

From the London Bills of Mortality…

Ontologies: Normalization

…to HITSP vocabulary specifications for interoperability

http://www.hitsp.org/

Ontologies: Inference

[Figure: an Entrez Gene record for a gene, linked to its gene name, sequence, interactions, Gene Ontology (GO) annotations, PubMed and OMIM; the example connects glycosyltransferase to congenital muscular dystrophy]

[Sahoo, Medinfo 2007]

From glycosyltransferase to congenital muscular dystrophy

EG:9215 (LARGE) has_molecular_function GO:0008375 (acetylglucosaminyltransferase)

GO:0008375 (acetylglucosaminyltransferase) isa GO:0016757 (glycosyltransferase), via GO:0008194 and GO:0016758

EG:9215 (LARGE) has_associated_phenotype MIM:608840 (Muscular dystrophy, congenital, type 1D)
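
The inference on this slide, going from a general molecular function to a disease by following isa links and gene annotations, can be sketched with plain dictionaries. The identifiers are the ones shown above; the exact isa edges among the four GO terms are an assumption made for this sketch, since the extracted slide only shows that the terms are connected, and the function names are hypothetical.

```python
# isa edges among GO terms (child -> parents). ASSUMPTION: the exact edges
# are guessed for illustration; the slide only shows the terms as connected.
ISA = {
    "GO:0008375": {"GO:0008194", "GO:0016758"},  # acetylglucosaminyltransferase
    "GO:0008194": {"GO:0016757"},
    "GO:0016758": {"GO:0016757"},                # GO:0016757 = glycosyltransferase
}

# Gene annotations taken from the slide (EG:9215 is the gene LARGE).
HAS_MOLECULAR_FUNCTION = {"EG:9215": {"GO:0008375"}}
HAS_ASSOCIATED_PHENOTYPE = {"EG:9215": {"MIM:608840"}}


def ancestors(term):
    """Return the term itself plus every term reachable through isa links."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in ISA.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def phenotypes_for_function(go_term):
    """Phenotypes of genes whose function is go_term or a more specific term."""
    found = set()
    for gene, functions in HAS_MOLECULAR_FUNCTION.items():
        if any(go_term in ancestors(f) for f in functions):
            found |= HAS_ASSOCIATED_PHENOTYPE.get(gene, set())
    return found


print(phenotypes_for_function("GO:0016757"))
```

The gene is annotated only with the specific term GO:0008375, so a query for the general term GO:0016757 succeeds only because the isa hierarchy is traversed; that traversal is the contribution of the ontology.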

Ontologies, data integration and public health

Case study: BioSense

BioSense: Integrating multiple datasets

Acute care data (emergency room)
  Demographic data
  Clinical data
    Chief complaints
    Initial / final diagnosis
    Laboratory data

Pharmacy data (national pharmacies)
Laboratory data (national labs)
Poison control center data
…

Case study: Materials and methods

Materials
  BioSense publications and presentations

Methods: look for mentions of
  Data integration approaches
  Use of ontologies / terminologies / vocabularies
    Normalization
    Inference

BioSense: Approaches to data integration

Initial approach: Warehouse

[Chambers, PHIN 2008]

BioSense: Approaches to data integration

Redesign: Mediation (Federation)

[Lenert, ASTHO Roundtable 2008]

BioSense: Ontologies

Normalized vocabulary for data exchange

BioSense: Ontologies

Normalized vocabulary for data exchange
  Link data elements to value sets from standard vocabularies

http://www.hitsp.org/

BioSense: Ontologies

Ontologies for aggregation and inference

[Lenert, ASTHO Roundtable 2008]

Public Health Reference Ontology

Summary

Summary

Integration is key to making sense of the many health data repositories
  E.g., Biosurveillance

Two major data integration approaches
  Data warehousing
  Mediation

Ontologies play a major role in data integration
  Normalization of vocabulary
  Supporting aggregation and inference

Current trends

Semantic Web technologies provide a convenient platform for integrating biomedical data

Standard tooling

Many biomedical ontologies available, ongoing harmonization

Standard vocabulary

Emerging semantic interoperability specifications (e.g., HITSP, BRIDG, caBIG, HL7, …)

Standard information models

Some persisting challenges

Reconcile data annotated to different ontologies
  Incomplete terminology integration systems (e.g., UMLS, RxNorm), limited harmonization between terminologies
  Lack of permanent identifiers for biomedical entities

Biomedical ontologies
  Availability
  Formalism (OWL, OBO, RRF, …)
  Quality

Medical Ontology Research

Olivier Bodenreider

Lister Hill National Center for Biomedical Communications
Bethesda, Maryland - USA

Contact: olivier@nlm.nih.gov
Web: mor.nlm.nih.gov
