semantic web technologies: a paradigm for medical informatics


Upload: deacon

Post on 15-Jan-2016


TRANSCRIPT

Page 1: Semantic Web Technologies: A Paradigm for Medical  Informatics

Semantic Web Technologies: A Paradigm for Medical Informatics
Chimezie Ogbuji (Owner, Metacognition LLC)

http://metacognition.info/presentations/SWTMedicalInformatics.pdf
http://metacognition.info/presentations/SWTMedicalInformatics.ppt


Who I am

Circa 2001: Introduced to web standards and Semantic Web technologies

2003-2011: Lead architect of the CCF (Cleveland Clinic Foundation) in-house clinical repository project

2006-2011: Member representative of CCF in the World Wide Web Consortium (W3C)
◦ Editor of various standards
◦ Semantic Web Health Care and Life Sciences Interest Group chair

2011-2012: Senior Research Associate at the CWRU Center for Clinical Investigations

2012-current: Started a business providing resource and data management software for home healthcare agencies (Metacognition LLC)


Medical Informatics Challenges

Semantic interoperability
◦ Exchange of data with a common meaning between sender and receiver

Most of the intended benefits of health information technology (HIT) depend on interoperability between systems

Difficulties integrating patient record systems with other information resources are among the major issues hampering their effectiveness
◦ Interoperability is a major goal for meaningful use of Electronic Health Records (EHR)

Rodrigues et al. 2013; Kadry et al. 2010; Shortliffe and Cimino 2006


Requirements and Solutions

Semantic interoperability requires:
◦ Structured data
◦ A common controlled vocabulary

Solutions emphasize the meaning of data rather than how they are structured
◦ "Semantic" paradigms


Registries and Research DBs

Patient registries and clinical research repositories capture data elements in a uniform manner

The structure of the underlying data needs to be able to evolve along with the investigations they support

Thus, schema extensibility is important


Querying Interfaces

Standardized interfaces for querying facilitate:
◦ Accessibility to clinical information systems
◦ Distributed querying of data from where they reside

Distributed querying requires:
◦ Semantically equivalent data structures

Alternatively, data are centralized in data warehouses

Austin et al. 2007, “Implementation of a query interface for a generic record server”


Biomedical Ontologies

Ontologies are artifacts that conceptualize a domain as a taxonomy of classes and constraints on relationships between their members

Represented in a particular formalism

Increasingly adopted as a foundation for the next generation of biomedical vocabularies

Construction involves representing a domain of interest independently of the behavior of the applications that use the ontology

Important means towards achieving semantic interoperability


Biomedical Ontology Communities

Prominent examples of adoption by life science and healthcare terminology communities:
◦ The Open Biological and Biomedical Ontologies (OBO) Foundry
◦ Gene Ontology (GO)
◦ National Center for Biomedical Ontology (NCBO) BioPortal
◦ International Health Terminology Standards Development Organization (IHTSDO)


Semantic Web and Technologies

The Semantic Web is a vision of how the existing infrastructure of the World Wide Web (WWW) can be extended such that machines can interpret the meaning of data on it

Semantic Web technologies are the standards and technologies that have been developed to achieve this vision


An Analogy

(Technological) singularity is a theoretical moment when artificial intelligence (AI) will have progressed to greater-than-human intelligence

Despite remaining in the realm of science fiction, it has motivated many useful developments along the way
◦ The use of ontologies for knowledge representation and IBM Watson capabilities, for example


Background: Graphs

Graphs are data structures comprising nodes and edges that connect them

The edges can be directional

Either the nodes, the edges, or both can be labeled

The labels provide meaning to the graphs (edge labels in particular)
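A labeled, directed graph like the one described above can be stored as a set of (source, edge label, target) tuples. The Python sketch below is illustrative only; the node and edge labels are hypothetical placeholders borrowed from the clinical example later in this deck. Because edges are directional, asking for a node's outgoing edges gives a different answer than asking for its incoming ones.

```python
from collections import defaultdict

# A directed, edge-labeled graph: each edge is a (source, label, target) tuple.
# The labels are hypothetical examples, not a published vocabulary.
edges = {
    ("Dr. X", "treats", "Chime"),
    ("Dr. X", "author", "dx1"),
    ("dx1", "subject of record", "Chime"),
}

def outgoing(graph, node):
    """Group the edges that leave `node` by their label."""
    result = defaultdict(list)
    for source, label, target in graph:
        if source == node:
            result[label].append(target)
    return dict(result)

def incoming(graph, node):
    """Group the edges that arrive at `node` by their label."""
    result = defaultdict(list)
    for source, label, target in graph:
        if target == node:
            result[label].append(source)
    return dict(result)

print(outgoing(edges, "Chime"))  # Chime has no outgoing edges
print(incoming(edges, "Chime"))  # but it has two incoming ones, with different labels
```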


Resource Description Framework

The Resource Description Framework (RDF) is a graph-based knowledge representation language for describing resources

Its edges are directional and both nodes and edges are labeled

It uses Uniform Resource Identifiers (URIs) for labeling

Foundation for Semantic Web technologies


RDF: Continued

The edges are statements (triples) that go from a subject to an object

Some objects are text values (literals)

Some subjects and objects can be left unlabeled (blank nodes)
◦ Anonymous resources: not important to label them uniquely

The URI of the edge is the predicate

Predicates used together for a common purpose are a vocabulary
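The three kinds of node label can be made concrete with N-Triples, the line-based W3C serialization of RDF, in which URIs are written in angle brackets, literals in double quotes, and blank nodes with a "_:" prefix. This is a minimal sketch assuming a hypothetical http://example.org/ namespace; its literal escaping covers only quotes and backslashes, whereas a full N-Triples writer also escapes line breaks and other control characters.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class URI:
    value: str

@dataclass(frozen=True)
class Literal:
    value: str   # a text value; only allowed in the object position

@dataclass(frozen=True)
class BlankNode:
    label: str   # a local label only, not a global identifier

Term = Union[URI, Literal, BlankNode]

def to_ntriples(term: Term) -> str:
    """Serialize one term in N-Triples syntax (minimal escaping only)."""
    if isinstance(term, URI):
        return f"<{term.value}>"
    if isinstance(term, BlankNode):
        return f"_:{term.label}"
    escaped = term.value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def triple(subject: Term, predicate: URI, obj: Term) -> str:
    """A statement is an edge from subject to object, labeled by a predicate URI."""
    if isinstance(subject, Literal):
        raise ValueError("a literal cannot be the subject of a statement")
    if not isinstance(predicate, URI):
        raise ValueError("the predicate must be a URI")
    return f"{to_ntriples(subject)} {to_ntriples(predicate)} {to_ntriples(obj)} ."

EX = "http://example.org/"  # hypothetical namespace
print(triple(URI(EX + "DrX"), URI(EX + "fullName"), Literal("Dr. X")))
print(triple(BlankNode("dx1"), URI(EX + "subjectOfRecord"), URI(EX + "Chime")))
```

The first line printed is <http://example.org/DrX> <http://example.org/fullName> "Dr. X" . and the second shows a blank node in the subject position.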


Subject: Dr. X (a URI)

Object: Chime

Predicate: treats

Vocabulary:
◦ treats, subject of record, author, and full name
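The example graph can be written down as triples. This sketch assumes a hypothetical namespace, since the slide does not give the actual URIs, and it reconstructs how the four vocabulary terms connect from the query later in the deck. For brevity, the blank node "_:dx1" and the literal "Dr. X" are both plain strings here. Collecting the predicates recovers the vocabulary.

```python
# Hypothetical namespace; the deck does not give the actual URIs.
EX = "http://example.org/clinical#"

graph = [
    (EX + "DrX", EX + "treats", EX + "Chime"),
    (EX + "DrX", EX + "fullName", "Dr. X"),          # object is a text value (literal)
    (EX + "DrX", EX + "author", "_:dx1"),            # object is a blank node
    ("_:dx1", EX + "subjectOfRecord", EX + "Chime"),
]

# The vocabulary: the predicates used together in this graph.
vocabulary = sorted({predicate for _, predicate, _ in graph})
print([p.removeprefix(EX) for p in vocabulary])
```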


RDF vocabularies

How meaning is interpreted from an RDF graph

There are vocabularies that constrain how predicates are used
◦ We want a sense of treats where the subject is a clinician and the object is a patient

There is a predicate relating resources to the classes they are members of (type)

There are vocabularies that define constraints on class hierarchies

These comprise a basic RDF Schema (RDFS) language

Represented as an RDF graph
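The rules behind these RDFS terms can be sketched as a small forward-chaining loop. The rdf:type, rdfs:subClassOf, rdfs:domain and rdfs:range URIs are the real W3C ones, and rdfs2, rdfs3, rdfs9 and rdfs11 are rule names from the W3C RDF Semantics specification; the clinical classes are hypothetical. In RDFS, domain and range declarations license new inferences (Chime is inferred to be a Patient). They do not reject data that lacks those types.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
SUBCLASS, DOMAIN, RANGE = RDFS + "subClassOf", RDFS + "domain", RDFS + "range"
EX = "http://example.org/clinical#"  # hypothetical vocabulary

graph = {
    # Schema, itself expressed as RDF triples
    (EX + "treats", DOMAIN, EX + "Clinician"),
    (EX + "treats", RANGE, EX + "Patient"),
    (EX + "Physician", SUBCLASS, EX + "Clinician"),
    (EX + "Clinician", SUBCLASS, EX + "Person"),
    # Instance data
    (EX + "DrX", RDF_TYPE, EX + "Physician"),
    (EX + "DrX", EX + "treats", EX + "Chime"),
}

def rdfs_closure(triples):
    """Naively apply four RDFS entailment rules until nothing new is derived."""
    g = set(triples)
    while True:
        new = set()
        for s, p, o in g:
            for s2, p2, o2 in g:
                if p2 == DOMAIN and s2 == p:                      # rdfs2
                    new.add((s, RDF_TYPE, o2))
                if p2 == RANGE and s2 == p:                       # rdfs3
                    new.add((o, RDF_TYPE, o2))
                if p == RDF_TYPE and p2 == SUBCLASS and s2 == o:  # rdfs9
                    new.add((s, RDF_TYPE, o2))
                if p == SUBCLASS and p2 == SUBCLASS and s2 == o:  # rdfs11
                    new.add((s, SUBCLASS, o2))
        if new <= g:
            return g
        g |= new

inferred = rdfs_closure(graph)

def types_of(resource):
    """Local names of all (asserted and inferred) classes of `resource`."""
    return sorted(o.removeprefix(EX) for s, p, o in inferred
                  if s == resource and p == RDF_TYPE)

print(types_of(EX + "DrX"))    # Physician, plus inferred Clinician and Person
print(types_of(EX + "Chime"))  # Patient, inferred from the range of treats
```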


Ontologies for RDF

The Web Ontology Language (OWL) is used to describe ontologies for RDF graphs

More sophisticated constraints than RDFS

Commonly expressed as an RDF graph

Defines the meaning of RDF statements through constraints:
◦ On their predicates
◦ On the classes that the resources they relate belong to
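Two of the constructs OWL adds beyond RDFS are owl:inverseOf and owl:TransitiveProperty (both real OWL vocabulary terms). The sketch below applies just those two rules until nothing new is derived. The properties treatedBy and partOf and the anatomy triples are hypothetical, and a real OWL reasoner handles far more of the language.

```python
OWL = "http://www.w3.org/2002/07/owl#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/clinical#"  # hypothetical vocabulary and data

graph = {
    (EX + "treatedBy", OWL + "inverseOf", EX + "treats"),
    (EX + "partOf", RDF_TYPE, OWL + "TransitiveProperty"),
    (EX + "DrX", EX + "treats", EX + "Chime"),
    (EX + "MitralValve", EX + "partOf", EX + "Heart"),
    (EX + "Heart", EX + "partOf", EX + "CardiovascularSystem"),
}

def owl_closure(triples):
    """Apply the owl:inverseOf and owl:TransitiveProperty rules to a fixpoint."""
    g = set(triples)
    while True:
        inverses = {(s, o) for s, p, o in g if p == OWL + "inverseOf"}
        transitive = {s for s, p, o in g
                      if p == RDF_TYPE and o == OWL + "TransitiveProperty"}
        new = set()
        for p1, p2 in inverses:
            for s, p, o in g:
                if p == p1:           # (s p1 o) implies (o p2 s)
                    new.add((o, p2, s))
                elif p == p2:         # and vice versa
                    new.add((o, p1, s))
        for s, p, o in g:
            if p in transitive:       # (s p o) and (o p o2) imply (s p o2)
                for s2, p2, o2 in g:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if new <= g:
            return g
        g |= new

inferred = owl_closure(graph)
print((EX + "Chime", EX + "treatedBy", EX + "DrX") in inferred)
print((EX + "MitralValve", EX + "partOf", EX + "CardiovascularSystem") in inferred)
```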


OWL Formats

Most common format for describing ontologies

Distribution format of ontologies in the NCBO BioPortal

SNOMED CT distributions include an OWL representation
◦ RDF graphs can describe medical content in a SNOMED CT-compliant way through the use of this vocabulary


Validation and Deduction

OWL is based on a formal, mathematical logic that can be used for validating the structure of an ontology and RDF data that conform to it (consistency checking)

It can also be used to deduce additional RDF statements implied by the meaning of a given RDF graph (logical inference)

Logical reasoners are used for this
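Consistency checking can be illustrated with one OWL construct, owl:disjointWith: no resource may belong to two classes declared disjoint, including through subclass inheritance. A minimal sketch with hypothetical classes; a real reasoner detects many more kinds of contradiction.

```python
OWL = "http://www.w3.org/2002/07/owl#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
EX = "http://example.org/clinical#"  # hypothetical classes and data

def all_types(graph, resource):
    """Asserted types of `resource` plus every superclass reachable via subClassOf."""
    pending = [o for s, p, o in graph if s == resource and p == RDF_TYPE]
    seen = set()
    while pending:
        cls = pending.pop()
        if cls not in seen:
            seen.add(cls)
            pending.extend(o for s, p, o in graph if s == cls and p == SUBCLASS)
    return seen

def disjointness_violations(graph):
    """(resource, class A, class B) for each resource in two disjoint classes."""
    disjoint = [(s, o) for s, p, o in graph if p == OWL + "disjointWith"]
    resources = {s for s, p, _ in graph if p == RDF_TYPE}
    return sorted((r, a, b) for r in resources for a, b in disjoint
                  if {a, b} <= all_types(graph, r))

schema = {
    (EX + "Physician", SUBCLASS, EX + "Clinician"),
    (EX + "Clinician", OWL + "disjointWith", EX + "Diagnosis"),
}
consistent = schema | {(EX + "DrX", RDF_TYPE, EX + "Physician")}
inconsistent = consistent | {(EX + "DrX", RDF_TYPE, EX + "Diagnosis")}

print(disjointness_violations(consistent))    # []
print(disjointness_violations(inconsistent))  # DrX is both a Clinician and a Diagnosis
```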


Inference

Can infer anatomical location from SNOMED CT definitions

Hypertension DX <-> 1201005 / “Benign essential hypertension (disorder)”
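A diagnosis class mapped to a SNOMED CT concept inherits what that concept's definition says, such as its finding site. The sketch below encodes an existential restriction the way OWL does in RDF (owl:Restriction, owl:onProperty, owl:someValuesFrom) and collects the restrictions a class inherits. Only the mapping of Hypertension DX to 1201005 comes from the slide, and http://snomed.info/id/ is assumed as the concept URI namespace. The intermediate class EssentialHypertension, the findingSite property and the SystemicCirculation filler are hypothetical simplifications, not the actual SNOMED CT definition.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
OWL = "http://www.w3.org/2002/07/owl#"
SCT = "http://snomed.info/id/"       # SNOMED CT concept URIs (assumed namespace)
EX = "http://example.org/clinical#"  # hypothetical local vocabulary

graph = {
    # Local class mapped to the SNOMED CT concept, as on the slide.
    (EX + "HypertensionDX", OWL + "equivalentClass", SCT + "1201005"),
    # HYPOTHETICAL, simplified stand-in for the concept's definition:
    (SCT + "1201005", SUBCLASS, EX + "EssentialHypertension"),
    (EX + "EssentialHypertension", SUBCLASS, "_:r1"),
    ("_:r1", RDF_TYPE, OWL + "Restriction"),
    ("_:r1", OWL + "onProperty", EX + "findingSite"),
    ("_:r1", OWL + "someValuesFrom", EX + "SystemicCirculation"),
}

def superclasses(g, cls):
    """Classes reachable via subClassOf (upwards) or equivalentClass (either way)."""
    found, pending = set(), [cls]
    while pending:
        c = pending.pop()
        if c in found:
            continue
        found.add(c)
        pending += [o for s, p, o in g if s == c and p in (SUBCLASS, OWL + "equivalentClass")]
        pending += [s for s, p, o in g if o == c and p == OWL + "equivalentClass"]
    return found

def inferred_sites(g, cls, prop):
    """Fillers of someValuesFrom restrictions on `prop` that `cls` inherits."""
    sites = set()
    for c in superclasses(g, cls):
        if (c, RDF_TYPE, OWL + "Restriction") in g and (c, OWL + "onProperty", prop) in g:
            sites |= {o for s, p, o in g if s == c and p == OWL + "someValuesFrom"}
    return sites

print(inferred_sites(graph, EX + "HypertensionDX", EX + "findingSite"))
```

The result reads as "every Hypertension DX has some finding site of this kind", which is the sort of anatomical location a reasoner derives from the concept's definition.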


Querying RDF Graphs

SPARQL is the W3C standard query language for RDF graphs

Comparable to relational query languages
◦ Primary difference: it queries RDF triples, whereas SQL queries tables with an arbitrary number of columns

Includes web protocols for querying RDF graphs

The foundation of SPARQL is the triple pattern

(?clinician, treats, ?patient)
◦ ?clinician and ?patient are variables (like a wildcard)
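Matching a single triple pattern amounts to comparing each position of a triple with the pattern: constants must be equal, and variables are bound to whatever value they meet (consistently, if the same variable appears twice). A minimal sketch, using plain strings in place of URIs and a hypothetical second clinician:

```python
def is_variable(term):
    """Variables start with '?', as in SPARQL."""
    return term.startswith("?")

def match(pattern, triple, binding=None):
    """Match one triple against a pattern, extending `binding`.
    Returns the extended bindings, or None if the triple does not match."""
    result = dict(binding or {})
    for p_term, t_term in zip(pattern, triple):
        if is_variable(p_term):
            if result.setdefault(p_term, t_term) != t_term:
                return None      # variable already bound to a different value
        elif p_term != t_term:
            return None          # constants must match exactly
    return result

graph = [
    ("Dr. X", "treats", "Chime"),
    ("Dr. Y", "treats", "Pat"),      # hypothetical extra data
    ("Dr. X", "author", "dx1"),
]
pattern = ("?clinician", "treats", "?patient")
solutions = [b for b in (match(pattern, t) for t in graph) if b is not None]
print(solutions)
```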


Which physicians have given essential hypertension diagnoses and to whom?

(?physician, author, ?dx)
(?physician, treats, ?patient)
(?dx, subject of record, ?patient)
(?dx, type, Hypertension DX)

?physician    ?patient    ?dx
Dr. X         Chime       …
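A query like this one is a basic graph pattern: its solutions are the variable bindings that satisfy every triple pattern at once, which a naive evaluator can find by joining the patterns one at a time. The roughly equivalent SPARQL is shown as a comment with a hypothetical ex: prefix. In the data, "dx1" is a hypothetical stand-in for the diagnosis identifier elided in the result table, and "Dr. Y" is hypothetical extra data that the joins filter out.

```python
# Roughly equivalent SPARQL (the ex: prefix is hypothetical):
#   SELECT ?physician ?patient ?dx WHERE {
#     ?physician ex:author ?dx .
#     ?physician ex:treats ?patient .
#     ?dx ex:subjectOfRecord ?patient .
#     ?dx a ex:HypertensionDX .
#   }

def extend(binding, pattern, triple):
    """Return `binding` extended so that `pattern` matches `triple`, or None."""
    result = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            if result.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return result

def evaluate(patterns, graph):
    """Naive nested-loop join: keep only bindings that satisfy every pattern."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in graph:
                extended = extend(binding, pattern, triple)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions

graph = [
    ("Dr. X", "author", "dx1"),
    ("Dr. X", "treats", "Chime"),
    ("dx1", "subject of record", "Chime"),
    ("dx1", "type", "Hypertension DX"),
    ("Dr. Y", "treats", "Chime"),   # treats Chime but authored no diagnosis
]
query = [
    ("?physician", "author", "?dx"),
    ("?physician", "treats", "?patient"),
    ("?dx", "subject of record", "?patient"),
    ("?dx", "type", "Hypertension DX"),
]
for row in evaluate(query, graph):
    print(row["?physician"], row["?patient"], row["?dx"])
```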


SPARQL over Relational Data

The most common implementations convert SPARQL to SQL and evaluate it over:
◦ a relational database designed for RDF storage
◦ an existing relational database

There are products for both approaches

The former requires native storage of RDF
◦ The relational structure doesn't change even as the RDF vocabulary does (schema extensibility)

Elliot et al. 2009, “A Complete Translation from SPARQL into Efficient SQL”
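The first approach can be illustrated with Python's built-in sqlite3 module: one generic triples table (a common, simplified layout; production RDF stores use more elaborate schemas and indexes) and a toy translator that gives each triple pattern its own table alias, turns constants into filters and turns shared variables into join conditions. This only illustrates the idea; it is not the translation described by Elliot et al., and it assumes variable names are simple SQL identifiers.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# A single generic table holds every statement; its columns never change.
con.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
con.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("Dr. X", "author", "dx1"),
    ("Dr. X", "treats", "Chime"),
    ("dx1", "subject of record", "Chime"),
    ("dx1", "type", "Hypertension DX"),
])
# A new (hypothetical) vocabulary term needs new rows, not ALTER TABLE.
con.execute("INSERT INTO triples VALUES ('dx1', 'status', 'confirmed')")

def bgp_to_sql(patterns):
    """Translate triple patterns into one SQL query over `triples`."""
    conditions, params, first_seen = [], [], {}
    for i, pattern in enumerate(patterns):
        for column, term in zip("spo", pattern):
            ref = f"t{i}.{column}"
            if term.startswith("?"):
                if term in first_seen:          # repeated variable: join condition
                    conditions.append(f"{ref} = {first_seen[term]}")
                else:
                    first_seen[term] = ref
            else:                               # constant: parameterized filter
                conditions.append(f"{ref} = ?")
                params.append(term)
    select = ", ".join(f"{ref} AS {var[1:]}" for var, ref in first_seen.items())
    tables = ", ".join(f"triples t{i}" for i in range(len(patterns)))
    sql = f"SELECT {select} FROM {tables}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params

query = [
    ("?physician", "author", "?dx"),
    ("?physician", "treats", "?patient"),
    ("?dx", "subject of record", "?patient"),
    ("?dx", "type", "Hypertension DX"),
]
sql, params = bgp_to_sql(query)
print(sql)
print(con.execute(sql, params).fetchall())
```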


SPARQL over Existing Relational Data

"Virtual RDF view"
◦ Translation to SQL follows a given mapping from existing relational structures to an RDF vocabulary
◦ Allows non-disruptive evolution of existing systems
◦ Well-suited as a standard querying interface over clinical data repositories
◦ The repositories can be queried with SPARQL, securely over encrypted HTTP
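In a virtual view, relational rows are never copied into a triple store; each triple pattern is rewritten into SQL against the existing table by following the mapping. A minimal sketch, assuming a hypothetical diagnosis table and a hand-written mapping dictionary; real systems typically use a declarative mapping language such as W3C R2RML.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# A pre-existing, hypothetical clinical table; it is queried in place, never copied.
con.execute("CREATE TABLE diagnosis (id INTEGER PRIMARY KEY, patient TEXT, code TEXT)")
con.execute("INSERT INTO diagnosis VALUES (1, 'Chime', '1201005')")

EX = "http://example.org/clinical#"   # hypothetical RDF vocabulary
# Hypothetical mapping from RDF predicates to the column supplying each object.
# Subject URIs are minted from the primary key (ex:dx/1, ...).
MAPPING = {
    EX + "subjectOfRecord": "patient",
    EX + "code": "code",
}

def pattern_to_sql(predicate, obj=None):
    """Rewrite the pattern (?dx, predicate, obj) as SQL on the existing table.
    obj=None means the object position is a variable."""
    column = MAPPING[predicate]       # column names come only from the mapping
    sql = f"SELECT id, {column} FROM diagnosis"
    params = []
    if obj is not None:
        sql += f" WHERE {column} = ?"
        params.append(obj)
    return sql, params

def virtual_triples(predicate, obj=None):
    """Yield RDF triples computed on the fly from the relational rows."""
    sql, params = pattern_to_sql(predicate, obj)
    for row_id, value in con.execute(sql, params):
        yield (f"{EX}dx/{row_id}", predicate, value)

print(list(virtual_triples(EX + "code", "1201005")))
print(list(virtual_triples(EX + "subjectOfRecord")))
```

Because only the mapping changes, the existing table and the applications that use it are left alone, which is what makes this approach non-disruptive.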


Example: Cleveland Clinic (SemanticDB)

Content repository and data production system released in Jan. 2008

80 million (native) RDF statements
◦ Uses vocabulary from a patient record OWL ontology for the registry

Based on:
◦ Existing registry of heart surgery and cardiovascular (CV) interventions
◦ 200,000 patient records
◦ Generating over 100 publications per year

Pierce et al. 2012, "SemanticDB: A Semantic Web Infrastructure for Clinical Research and Quality Reporting"


Cohort Identification

Interface developed in conjunction with Cycorp

Leverages their logical reasoning system (Cyc)
◦ Identifies cohorts using natural language (NL) sentence fragments
◦ Converts the fragments to SPARQL
◦ The SPARQL is evaluated against the RDF store


Example: Mayo Clinic (MCLSS)

Mayo Clinic Life Sciences System (MCLSS)
◦ Effort to represent Mayo Clinic EHR data as RDF graphs
◦ Patient demographics, diagnoses, procedures, lab results, and free-text notes
◦ Goal was to wrap the MCLSS relational database and expose it as read-only, queryable RDF graphs that conform to standard ontologies
◦ Virtual RDF view

Pathak et al. 2012, "Using Semantic Web Technologies for Cohort Identification from Electronic Health Records for Clinical Research"


Example: Mayo Clinic (CEM)

Clinical Element Model (CEM)
◦ Represents the logical structure of data in the EHR
◦ Goal: translate CEM definitions into OWL and patient (instance) data into conformant RDF
◦ Use tools (logical reasoners) to check the semantic consistency of the ontology and instance data, and to extract new knowledge via deduction
◦ Instance data validation: correct number of linked components, value within data range, existence of units, etc.

Tao et al. 2012, "A semantic-web oriented representation of the clinical element model for secondary use of electronic health records data"
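The instance-data checks listed above can be pictured as simple per-element rules. This sketch uses plain Python checks rather than an OWL reasoner, so it is a swapped-in technique and not the CEM/OWL approach of Tao et al. The Constraint fields, the systolic blood pressure element and its limits are hypothetical; "mm[Hg]" is the UCUM unit code.

```python
from dataclasses import dataclass, field

@dataclass
class Constraint:
    """Hypothetical per-element constraints mirroring the checks listed above."""
    min_components: int
    max_components: int
    min_value: float
    max_value: float
    allowed_units: set = field(default_factory=set)

def validate(instance, c):
    """Return a list of problems; an empty list means the instance passes."""
    problems = []
    n = len(instance.get("components", []))
    if not c.min_components <= n <= c.max_components:
        problems.append(f"expected {c.min_components}-{c.max_components} linked components, found {n}")
    value = instance.get("value")
    if value is None or not c.min_value <= value <= c.max_value:
        problems.append(f"value {value!r} outside range [{c.min_value}, {c.max_value}]")
    unit = instance.get("unit")
    if unit is None:
        problems.append("missing unit")
    elif unit not in c.allowed_units:
        problems.append(f"unexpected unit {unit!r}")
    return problems

# Hypothetical element: exactly one linked component (e.g., body position).
systolic = Constraint(1, 1, 0, 300, {"mm[Hg]"})
print(validate({"components": ["sitting"], "value": 150, "unit": "mm[Hg]"}, systolic))  # []
print(validate({"components": [], "value": 1500}, systolic))  # three problems
```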


Summary

Schema extensibility
◦ Use of RDF

Semantic interoperability
◦ Domain modeling using OWL and RDFS

Standardized query interfaces
◦ Querying over SPARQL

Incremental, non-disruptive adoption
◦ Virtual RDF views

Main challenge: highly disruptive innovation