Awakening Clinical Data: Semantics for Scalable Medical Research Informatics

Satya S. Sahoo
Division of Medical Informatics
Electrical Engineering and Computer Science Department
Case Western Reserve University
Cleveland, OH, USA

Big Picture of Data in Clinical Research

•  Patient Reports
•  Polysomnograms: 1-20GB each
•  Epilepsy Monitoring Unit (EMU) Data: 500-600MB per patient per stay in EMU
•  Pathology Reports, Tissue Bank
•  MRI, PET scans – MRI: 50-100MB, PET: 60-100MB
•  Wireless Health Data: ~5.6 billion wireless connections and growing
•  143,961 patients per year (e.g. Emory)
•  National Sleep Research Resource: 500 TB
•  Case Western EMU: 250 TB

Image sources: PRISM project, BME dept CWRU; NLM and Wikipedia; Physio-MIMI, PRISM CWRU; CWRU School of Engineering

•  Ultra large volume of data and growing rapidly
•  Data is Multi-modal, Heterogeneous
•  Heterogeneity: Syntactic, Structural, Semantic

Scalability in Medical Informatics: Beyond Volume

Exemplar: Sleep Medicine Research

[Figure: patient reports, polysomnograms, EMU data, pathology reports and tissue bank, wireless health data, MRI and PET scans]

•  Multi-Center Studies with differing administrative requirements – business logic
•  Dynamic data – grows over project duration
•  Data Semantics as foundation to support a wide spectrum of users – clinicians, nurse practitioners, research fellows

A Wish List for Scalable Clinical Data Management

•  Reconcile Data Heterogeneity – most critical to successful translational research
   o  Syntactic heterogeneity – less of a problem, data dictionaries help
   o  Structural heterogeneity – problematic, XML somewhat helpful
   o  Semantic heterogeneity – a huge problem, ontologies to the rescue?
•  Provenance – essential for data quality, compliance, insight
   o  Blood Oxygen Baseline: oxygen saturation during the first 15 or 30 seconds of sleep
   o  Patient's blood report from last month as the cause of a change in medication – Domain Provenance (not just tuple provenance)
•  Intuitive access to information – clinical trials eligibility, cohort identification
•  Scalable – data sources, research partners added or removed dynamically
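The Blood Oxygen Baseline item can be made concrete with a small sketch. It assumes the "15 or 30 seconds" refers to alternative definitions that different studies may use, which is a reading of the slide rather than something it states. The class name, function name, and study identifiers are hypothetical. The point is only that the derived value travels together with the definition that produced it.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass(frozen=True)
class BaselineWithProvenance:
    """A derived value bundled with the context needed to interpret it."""
    value: float             # mean SpO2 (%) over the baseline window
    window_seconds: int      # which definition was used, e.g. 15 or 30
    sampling_rate_hz: float  # sampling rate of the oximetry signal
    source_study: str        # hypothetical study identifier


def blood_oxygen_baseline(
    spo2: Sequence[float],
    sleep_onset_index: int,
    window_seconds: int,
    sampling_rate_hz: float,
    source_study: str,
) -> BaselineWithProvenance:
    """Average SpO2 over the first `window_seconds` of sleep."""
    n_samples = int(window_seconds * sampling_rate_hz)
    window = spo2[sleep_onset_index:sleep_onset_index + n_samples]
    if n_samples <= 0 or len(window) < n_samples:
        raise ValueError("recording too short for the requested baseline window")
    return BaselineWithProvenance(
        mean(window), window_seconds, sampling_rate_hz, source_study
    )


# Synthetic 1 Hz recording: 15 s at 98 %, then 15 s at 96 %.
recording = [98.0] * 15 + [96.0] * 15
baseline_15 = blood_oxygen_baseline(recording, 0, 15, 1.0, "study-A")
baseline_30 = blood_oxygen_baseline(recording, 0, 30, 1.0, "study-B")
print(baseline_15.value, baseline_30.value)  # 98.0 97.0
```

The same recording yields two different baselines depending on the window, so a pooled analysis needs `window_seconds` alongside each value to compare them.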

A “not to do” list for Clinical Data Management

•  No Linked Open Patient Data – HIPAA, HITECH Act (US), Data Protection Act (UK)
   o  De-identified data – IRB approval
•  Ontology as global schema – but no RDF
   o  Vast majority of data is in RDBs
   o  Practical issues with RDF – URIs cannot be institution-specific (privacy)

Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch

Physio-MIMI: Multi-Modality, Multi-Resource Environment for Physiological and Clinical Research

[Figure labels: Sleep Domain Ontology; FMA, OGMS, SNOMED-CT, …; any number of new centers; Clinical Researcher]

Physio-MIMI: Enabling Scalable Medical Research

•  NCRR-funded, multi-CTSA site project: Sleep medicine as exemplar
•  Federated data management – scalable, adapts to changing data access policies
•  Ontology-driven:
   o  Data mappings – Ontology class to data dictionary terms (manually curated)
   o  Drive query interface
   o  Manage provenance
•  Privacy aware, IRB-compliant
•  Collaboration among Case Western, U. of Michigan, Marshfield Clinic and U. of Wisconsin, Madison
   o  Now Harvard Medical School

Key Resource: Sleep Domain Ontology (SDO) https://mimi.case.edu/concepts
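A minimal sketch of the federated, ontology-driven idea, under stated assumptions: each site keeps its own relational schema, a curated mapping ties one ontology concept to a local table and column, and only aggregate counts leave a site. The site names, the concept name `ApneaHypopneaIndex`, and all table and column names are hypothetical. In-memory SQLite stands in for the sites' databases, and this is not the Physio-MIMI implementation.

```python
import sqlite3

# Hypothetical curated mappings: ontology concept -> (table, column) per site.
SITE_MAPPINGS = {
    "site_a": {"ApneaHypopneaIndex": ("sleep_studies", "ahi")},
    "site_b": {"ApneaHypopneaIndex": ("psg", "apnea_hypopnea_idx")},
}


def make_site(table, column, values):
    """Create an in-memory stand-in for one site's local database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE {table} (patient_id INTEGER, {column} REAL)")
    conn.executemany(
        f"INSERT INTO {table} VALUES (?, ?)", list(enumerate(values))
    )
    return conn


def federated_count(sites, concept, threshold):
    """Count matching patients per site; no row-level data is returned."""
    counts = {}
    for name, conn in sites.items():
        # Table and column names come from the curated mapping, not user input.
        table, column = SITE_MAPPINGS[name][concept]
        row = conn.execute(
            f"SELECT COUNT(*) FROM {table} WHERE {column} >= ?", (threshold,)
        ).fetchone()
        counts[name] = row[0]
    return counts


sites = {
    "site_a": make_site("sleep_studies", "ahi", [3.0, 17.5, 32.0]),
    "site_b": make_site("psg", "apnea_hypopnea_idx", [5.0, 15.0]),
}
counts = federated_count(sites, "ApneaHypopneaIndex", 15.0)
print(counts, sum(counts.values()))  # {'site_a': 2, 'site_b': 1} 3
```

Adding or removing a center in this sketch means adding or removing one entry in `sites` and `SITE_MAPPINGS`. The query written against the ontology concept stays unchanged.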

Data Mappings: SDO to Data Dictionary

Physio-Map Module
•  Visual interface
•  Stores mappings in XML – moving towards rules
•  Dynamically executed in response to user query

User Voting
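The slide does not show the XML layout Physio-Map uses, so the following is only a sketch with an invented schema. It loads concept-to-data-dictionary mappings from XML and resolves them at query time. The element names, attribute names, the concept `BodyMassIndex`, and the site names are all assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping document; the real Physio-Map schema is not shown.
MAPPING_XML = """
<mappings>
  <mapping concept="BodyMassIndex">
    <site name="site_a" table="demographics" column="bmi"/>
    <site name="site_b" table="visit" column="body_mass_idx"/>
  </mapping>
</mappings>
"""


def load_mappings(xml_text):
    """Parse the XML into {concept: {site: (table, column)}}."""
    root = ET.fromstring(xml_text.strip())
    mappings = {}
    for mapping in root.findall("mapping"):
        concept = mapping.get("concept")
        mappings[concept] = {
            site.get("name"): (site.get("table"), site.get("column"))
            for site in mapping.findall("site")
        }
    return mappings


def resolve(mappings, concept, site):
    """Look up the local table and column for a concept at query time."""
    try:
        return mappings[concept][site]
    except KeyError:
        raise LookupError(
            f"no mapping for concept {concept!r} at site {site!r}"
        ) from None


mappings = load_mappings(MAPPING_XML)
print(resolve(mappings, "BodyMassIndex", "site_b"))  # ('visit', 'body_mass_idx')
```

Because the mappings are resolved when the query runs, a curator can correct one without touching the query code.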

Provenance: Contextual Metadata for Clinical Research

Slide courtesy: Remo Mueller

Provenance: To Trace Variations in Data and Results

Slide courtesy: Remo Mueller

Modified from slide courtesy: Remo Mueller

Provenance: Source information for Patient Data

Slide courtesy: Remo Mueller

Intuitive Query Interface: Ontology (SDO)-driven Visual Aggregator and Explorer (VisAgE)

DataSets

Ontology Concept – Type of Query Widget

Physio-MIMI in National Sleep Research Resource

•  National Sleep Research Resource (NSRR) – scored and awaiting funding review
•  Collaboration between Harvard Medical School (domain experts) and Case Western (CS) with 15 projects
   o  50,000 sleep research studies – total size of 500TB
•  Semantic Data Integration – SDO and Sleep Provenance Ontology (extending the W3C PROV Ontology, PROV-O)

•  Signal processing tools – using a common format called European Data Format (EDF), XML-based

•  Domain analysis, cross-linking – secure Web access
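For readers unfamiliar with EDF: per the published EDF specification, a file opens with a 256-byte fixed-width ASCII header, followed by per-signal headers and the data records. The XML layer mentioned on the slide is not detailed there, so it is left out here. The sketch below parses only that first 256-byte block. The function name and the synthetic sample header are illustrative and are not part of the NSRR tools.

```python
# Fixed-width fields of the first 256 header bytes, per the EDF specification.
EDF_HEADER_FIELDS = [
    ("version", 8),
    ("patient_id", 80),
    ("recording_id", 80),
    ("start_date", 8),         # dd.mm.yy
    ("start_time", 8),         # hh.mm.ss
    ("header_bytes", 8),       # total header size: 256 + 256 * num_signals
    ("reserved", 44),
    ("num_data_records", 8),   # -1 if unknown
    ("record_duration_s", 8),
    ("num_signals", 4),
]


def parse_edf_fixed_header(raw: bytes) -> dict:
    """Decode the fixed 256-byte part of an EDF header into a dict."""
    if len(raw) < 256:
        raise ValueError("an EDF header needs at least 256 bytes")
    header = {}
    offset = 0
    for name, width in EDF_HEADER_FIELDS:
        header[name] = raw[offset:offset + width].decode("ascii").strip()
        offset += width
    for key in ("header_bytes", "num_data_records", "num_signals"):
        header[key] = int(header[key])
    header["record_duration_s"] = float(header["record_duration_s"])
    return header


# Synthetic header for illustration only (no real patient data).
values = ["0", "X X X X", "Startdate X X X X", "01.01.12", "22.30.00",
          "768", "", "1200", "30", "2"]
sample = b"".join(
    value.ljust(width).encode("ascii")
    for value, (_, width) in zip(values, EDF_HEADER_FIELDS)
)
header = parse_edf_fixed_header(sample)
print(header["num_signals"], header["num_data_records"])  # 2 1200
```

With 1200 records of 30 seconds each, this synthetic header describes a 10-hour recording, roughly one overnight polysomnogram.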

Challenges: Semantics in Large Scale Clinical Data

•  Incentives for adopting RDF in clinical data management – what is not already possible in RDB?
•  OWL2, RDFS reasoning – Privacy aware reasoning, semantics-aware access control (Nguyen et al. 2012)
•  Missing Semantics?
   o  Variable, missing provenance in original study – re-create provenance with (limited) provenance?
   o  Fine-level granularity for semantic annotation of signal data – currently not scalable
•  A little semantics does not go too far in clinical data
   o  Need for greater involvement of Semantic Web community in development of EHR systems

Acknowledgements
•  Guo-Qiang Zhang, Remo Mueller, Samden Lhatoo, Susan Redline, Alireza Bozorgi
•  Division of Medical Informatics: Lingyun Luo, Joe Teagno, Meng Zhao, Jake Luo, Licong Cui, Chien-Hung Chen, Catherine Jayapandian
•  Physio-MIMI Team: http://physiomimi.case.edu/
•  Contact Information: [email protected], http://cci.case.edu/cci/index.php/Satya_Sahoo