semantic technology empowering real world outcomes in biomedical research and clinical practices
DESCRIPTION
Talk at Case Western Reserve University: http://engineering.case.edu/eecs/node/392
TRANSCRIPT
Semantic technology empowering real world outcomes in
biomedical research and clinical practices
Amit Sheth
Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing
Wright State University, Dayton, Ohio http://knoesis.org
http://knoesis.org/amit/hcls
Special thanks: Sujan Perera
Talk presented at Case Western Reserve University on Nov 26, 2012
Integration
Semantics
Amit Sheth
Ashutosh Jadhav
Hemant Purohit
Vinh Nguyen
Lu Chen
Pavan Kapanipathi
Pramod Anantharam
Sujan Perera
Alan Smith
Pramod Koneru
Maryam Panahiazar
Sarasi Lalithsena
Prateek Jain
Cory Henson
Ajith Ranabahu
Kalpa Gunaratna
Delroy Cameron
Sanjaya Wijeratne
Wenbo Wang
Semantic Web
Objective
• Improve the machine understandability and processing of data of all types to
  • Improve Insight from Biomedical Data
  • Improve Clinical Decision Making
Challenges
• Vastness/Volume
• Velocity
• Variety/Heterogeneity
• Vagueness, Uncertainty, Inconsistency, Deceit
Approach
• Modeling and Background Knowledge
• Annotation
• Complex Querying/Analysis, Reasoning
Identifiers: URI
Character set: UNICODE
Syntax: XML
Data interchange: RDF
Querying: SPARQL
Taxonomies: RDFS
Ontologies: OWL
Rules: RIF/SWRL
Unifying logic
Proof
Trust
Cryptography
User interface and applications
Querying
Data/Knowledge Representation
Knowledge Representation
Applications
• Semantic Search and Browsing (Doozer++, SCOONER, iExplore)
• Semantics and Services enabled Problem Solving Environment for T.cruzi (SPSE)
• Active Semantic Electronic Medical Record (ASEMR)
• Mining and Analysis of EMR (ezFIND, ezMeasure)
• kHealth
• PREscription Drug abuse Online Surveillance and Epidemiology (PREDOSE)
Biomedical
Healthcare
Epidemiology
Insights
Better Understanding
Intuitive Browsing
Hypothesis Generation
Personalization
Knowledge Exploration
Doozer++
iExplore
SCOONER
Some of the semantic tools
Knowledge Acquisition – Doozer++
• Building ontology is costly
• Large volume of knowledge available in semi-structured/unstructured format
• No assurance for the credibility of such knowledge
Knowledge Acquisition – Doozer++
Circle of Knowledge
http://knoesis.org/node/71
Knowledge Acquisition – Doozer++
j.1:category_science
j.1:category_neuroscience
j.1:category_cognitive_science
j.1:category_psychology
j.1:category_behavior
j.1:category_philosophy_of_mind
j.1:category_brain
j.1:category_psycholinguistics
j.1:category_neurology
j.1:category_neurophysiology
10 classes…
Knowledge Acquisition – Doozer++
Doozer++ Demo
Knowledge Acquisition from Community-Generated Content
Continuous Semantics to Analyze Real-Time Data, IEEE Internet Computing (Volume 14)
• Identify Relationships
• Textual pattern-based extraction for known relationships
  • Facts available in background knowledge
  • Find evidence for such facts
• Combined evidence from many different patterns increases the certainty of a relationship between the entities
Beyond Hierarchy
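The slide says only that combined evidence from many patterns increases certainty; it does not give the combination rule. As a minimal sketch, assuming each pattern match carries an independent confidence in [0, 1], a noisy-OR combination shows the effect. The function name and the confidence values are hypothetical, not part of Doozer++.

```python
from math import prod


def combined_confidence(pattern_confidences):
    """Noisy-OR: chance that at least one independent pattern match is correct."""
    if not all(0.0 <= c <= 1.0 for c in pattern_confidences):
        raise ValueError("confidences must lie in [0, 1]")
    return 1.0 - prod(1.0 - c for c in pattern_confidences)


# Hypothetical confidences for one candidate fact matched by one or three patterns.
single = combined_confidence([0.6])
several = combined_confidence([0.6, 0.5, 0.4])

print(f"one pattern: {single:.2f}, three patterns: {several:.2f}")
```

Under this assumption, every additional matching pattern can only raise the combined score, which is the behaviour the slide describes.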
• Evaluating acquired knowledge
  • Explicit
    • User can vote for facts
    • Facts presented based on user interests
  • Implicit
    • User’s browsing history used as an indication of which propositions are correct and interesting
• Now it adds validated knowledge back to community
Validating Knowledge
Base Hierarchy from Wikipedia
SenseLab Neuroscience Ontologies
Meta Knowledgebase
PubMed Abstracts
Focused pattern based extraction
Initial KB creation
Enriched Knowledgebase
HPC Keywords
Kno.e.sis: NLP based triples
NLM: Rule based BKR triples
Building Human Performance & Cognition Ontology (HPCO)
Merge
Use Case for HPCO
• Number of Entities – 2 million
• Number of non-trivial facts – 3 million
• NLP Based*: calcium-binding protein S100B modulates long-term synaptic plasticity
• Pattern Based**: Olfactory Bulb has physical part of anatomic structure Mitral cell
* Joint Extraction of Compound Entities and Relationships from Biomedical Literature, Web Intel. 2008
* A Framework for Schema-Driven Relationship Discovery from Unstructured Text, ISWC 2006
** On Demand Creation of Focused Domain Models using Top-down and Bottom-up Information Extraction, Technical Report
Knowledge-based Browsing - SCOONER
• Knowledge-based browsing: relations window, inverse relations, creating trails
• Persistent Projects: Work bench, Browsing history, Comments, Filtering
• Collaboration: Comments, Dashboard, Exporting projects, Importing projects
SCOONER Demo
An Up-to-date Knowledge-Based Literature Search and Exploration Framework for Focused Bioscience Domains, IHI 2012 - 2nd ACM SIGHIT International Health Informatics Symposium
iExplore
Interactive Browsing and Exploring Biomedical Knowledge
Architecture
Generate Novel Hypothesis
iExplore video
iExplore Demo
Turning to Applications with End Users
Active Semantic Electronic Medical Record - ASEMR
• New Drugs
  • Adds interaction with current drugs
  • Changes possible procedures to treat an illness
• Insurance coverage changes
  • Will pay for drug X, but not Y
  • May need certain diagnosis before expensive tests
• Physicians are required to keep track of ever changing landscape
• A Document
• With semantic annotations
  • entities linked to ontology
  • terms linked to specialized lexicon
• With actionable information
  • rules over semantic annotations
  • rule violation indicated with alerts
Atrial fibrillation with prior stroke, currently on Pradaxa, doing well.
Mild glucose intolerance and hyperlipidemia, being treated by primary care.
ASEMR – Active Semantic Document
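To make "rules over semantic annotations" concrete, here is a minimal sketch of one possible rule: raise an alert when two annotated drugs are listed as interacting. The Annotation class, the concept identifiers, and the interaction table are all hypothetical placeholders; they do not reproduce the ASEMR rule language or any real drug knowledge.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    text: str      # surface text found in the document
    kind: str      # "drug" or "problem"
    concept: str   # identifier of the linked ontology concept (placeholder)


def interaction_alerts(annotations, interactions):
    """Return one alert per pair of annotated drugs listed as interacting."""
    drugs = sorted({a.concept for a in annotations if a.kind == "drug"})
    alerts = []
    for i, first in enumerate(drugs):
        for second in drugs[i + 1:]:
            if frozenset((first, second)) in interactions:
                alerts.append(f"ALERT: {first} interacts with {second}")
    return alerts


# Placeholder knowledge: "drug:A" and "drug:B" stand in for ontology concepts.
interactions = {frozenset(("drug:A", "drug:B"))}
note = [
    Annotation("Drug A", "drug", "drug:A"),
    Annotation("Drug B", "drug", "drug:B"),
    Annotation("hyperlipidemia", "problem", "problem:hyperlipidemia"),
]
alerts = interaction_alerts(note, interactions)
print(alerts)
```

Formulary or insurance rules could follow the same shape: a lookup keyed on annotated concepts that yields an alert when violated.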
• Type of ASD
• Three Ontologies
  • Practice: Information about practice such as patient/physician data
  • Drug: Information about drugs, interaction, formularies, etc.
  • ICD/CPT: Describes the relationships between CPT and ICD codes
ASEMR – Active Semantic Patient Record
encounter
ancillary
event
insurance_carrier
insurance
facility
insurance_plan
patient
person
practitioner
insurance_policy
owl:thing
ambulatory_episode
ASEMR – Practice Ontology Hierarchy
owl:thing
prescription_drug_brand_name
brandname_undeclared
brandname_composite
prescription_drug
monograph_ix_class
cpnum_group
prescription_drug_property
indication_property
formulary_property
non_drug_reactant
interaction_property
property
formulary
brandname_individual
interaction_with_prescription_drug
interaction
indication
generic_individual
prescription_drug_generic
generic_composite
interaction_with_non_drug_reactant
interaction_with_monograph_ix_class
ASEMR – Drug Ontology Hierarchy
ASEMR
[Figure: two panels, "Before ASEMR" and "After ASEMR". Each plots Charts, Same Day, and Back Log by Month/Year; the After ASEMR panel shows Sept 05, Nov 05, Jan 06, Mar 06.]
• Error Prevention
  • Patient care
  • Insurance
• Decision Support
  • Patient satisfaction
  • Reimbursement
• Efficiency/Time
  • Real-time chart completion
  • “semantic” and automated linking with billing
ASEMR - Benefits
ASEMR Demo
Active Semantic Electronic Medical Record, ISWC 2006
Semantics and Services enabled Problem Solving Environment for T.cruzi - SPSE
• Majority of experimental data reside in labs
• Integration of lab data facilitates new insights
• Formulating queries against such data requires deep technical knowledge
A Semantic Problem Solving Environment for Integrative Parasite Research: Identification of Intervention Targets for Trypanosoma cruzi, 2012
SPSE
• Data Sources
  • Internal Lab Data
  • External Database
• Ontological Infrastructure
  • Parasite Lifecycle
  • Parasite Experiment
• Query Processing
  • Cuebee
• Integrated internal data with external databases, such as KEGG, GO, and some datasets on TriTrypDB
• Developed semantic provenance framework and influenced W3C community
• SPSE supports complex biological queries that help find gene knockout, drug and/or vaccination targets. For example:
  • Show me proteins that are downregulated in the epimastigote stage and exist in a single metabolic pathway.
  • Give me the gene knockout summaries, both for plasmid construction and strain creation, for all gene knockout targets that are 2-fold upregulated in amastigotes at the transcript level and that have orthologs in Leishmania but not in Trypanosoma brucei.
SPSE
Complex queries can also include:
- on-the-fly Web services execution to retrieve additional data
- inference rules to make implicit knowledge explicit
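The first example query (proteins downregulated in the epimastigote stage that exist in a single metabolic pathway) can be illustrated over a toy set of subject-predicate-object triples. This is only a sketch in plain Python: it does not use Cuebee or SPARQL, and the predicate names and protein/pathway identifiers are invented placeholders rather than terms from the SPSE ontologies.

```python
from collections import defaultdict

# Hypothetical triples; every identifier here is a placeholder.
triples = [
    ("protein:P1", "downregulated_in", "stage:epimastigote"),
    ("protein:P1", "participates_in", "pathway:W1"),
    ("protein:P2", "downregulated_in", "stage:epimastigote"),
    ("protein:P2", "participates_in", "pathway:W1"),
    ("protein:P2", "participates_in", "pathway:W2"),
    ("protein:P3", "participates_in", "pathway:W2"),
]


def downregulated_single_pathway(triples, stage):
    """Proteins downregulated in `stage` that occur in exactly one pathway."""
    pathways = defaultdict(set)
    downregulated = set()
    for subject, predicate, obj in triples:
        if predicate == "participates_in":
            pathways[subject].add(obj)
        elif predicate == "downregulated_in" and obj == stage:
            downregulated.add(subject)
    return sorted(p for p in downregulated if len(pathways.get(p, ())) == 1)


hits = downregulated_single_pathway(triples, "stage:epimastigote")
print(hits)
```

In a real deployment the same constraints would be expressed against the integrated ontologies, so the biologist never writes this filtering logic by hand.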
SPSE
• So many ontologies
  • Rich in number of concepts
  • Mostly concentrated on taxonomical relationships
• Applications require domain relationships
  • A is_symptom_of B
  • C is_treated_with D
Knowledge Enrichment from Data
Data
Information
Knowledge
Knowledge Enrichment from Data
IntellegO
Background knowledge
Modified background knowledge
EMR
Knowledge Enrichment from Data
Data Driven Knowledge Acquisition Method for Domain Knowledge Enrichment in Healthcare, BIBM 2012
An Ontological Approach to Focusing Attention and Enhancing Machine Perception on the Web, Applied Ontology 2011
Knowledge Enrichment from Data
Diseases: atrial fibrillation, hypertension, diabetes
Symptoms (from EMR): chest pain, weight gain, discomfort in chest, rash skin, cough, weight loss, headache, edema, shortness of breath, fatigue, syncope
Symptoms (from KB): weight loss, chest pain, discomfort in chest, dizzy, shortness of breath, nausea, vomiting, headache, cough, weight gain
Is edema a symptom of atrial fibrillation? Is edema a symptom of hypertension? Is edema a symptom of diabetes?
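The questions about edema suggest one way to read this slide: a symptom seen in the EMR but not covered by the knowledge base for the patient's diseases becomes a candidate question for a domain expert. A minimal sketch of that step follows, assuming the symptom lists as read from the slide; the function name and the pairing of every unexplained symptom with every disease are assumptions, not the published method.

```python
diseases = ["atrial fibrillation", "hypertension", "diabetes"]

symptoms_from_emr = {
    "chest pain", "weight gain", "discomfort in chest", "rash skin", "cough",
    "weight loss", "headache", "edema", "shortness of breath", "fatigue",
    "syncope",
}
symptoms_from_kb = {
    "weight loss", "chest pain", "discomfort in chest", "dizzy",
    "shortness of breath", "nausea", "vomiting", "headache", "cough",
    "weight gain",
}


def candidate_questions(diseases, emr_symptoms, kb_symptoms):
    """Pair each EMR symptom missing from the KB with each candidate disease."""
    unexplained = sorted(emr_symptoms - kb_symptoms)
    return [(symptom, disease) for symptom in unexplained for disease in diseases]


questions = candidate_questions(diseases, symptoms_from_emr, symptoms_from_kb)
for symptom, disease in questions:
    print(f"Is {symptom} a symptom of {disease}?")
```

Answers confirmed by an expert would then be added to the background knowledge as new is_symptom_of relationships.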
Domains
Cardiology
Orthopedics
Oncology
Neurology
Etc…
No. of concepts: 1008161
  Problems (diseases, symptoms): 125778
  Procedures: 262360
  Medicines: 298993
  Medical Devices: 33124
Relationships: 77261
  is treated with (disease -> medication): 41182
  is relevant procedure (procedure -> disease): 3352
  is symptom of (symptom -> disease): 8299
  contraindicated drug (medication -> disease): 24428
Knowledge Enrichment from Data
with the above method
+UMLS
healthline.com
druglib.com
• 80% unstructured healthcare data
• Pose challenges in
  • Searching
  • Understanding
  • Mining
  • Knowledge discovery
  • Decision support
  • Evidence based medicine
• Federal policies promote meaningful use
Healthcare Challenge
Coding Complexity    ICD-9     ICD-10
Diagnostic Codes     14,000    69,000
Procedure Codes      3,800     72,000
Example: 821.01 is the ICD-9 code for “closed” Fractured Femur, or thigh bone. It translates to 36 codes in ICD-10 with details regarding the precise nature of the fracture, which thigh was fractured, whether a delay in healing occurred, etc.
Healthcare Challenge
• Traditional methods don’t work
• Understanding the context is crucial
Need to Do Better
Healthcare Challenge
Search Mining
Decision Support
Knowledge Discovery Evidence-based Medicine
NLP +
Semantics
Healthcare Challenge – The Solution
ezHealth
cTAKES
ezNLP
ezKB
<problem value="Asthma" cui="C0004096"/>
<med value="Losartan" code="52175:RXNORM" />
<med value="Spiriva" code="274535:RXNORM" />
<procedure value="EKG" cui="C1623258" />
ezFIND ezMeasure ezCDI ezCAC
www.ezdi.us
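The annotation elements shown for ezKB (problem, med, procedure) are easy to consume downstream. The sketch below parses them with Python's standard xml.etree.ElementTree; the wrapping "annotations" root element and the function name are assumptions added only so the fragment parses as one document, and nothing here describes how ezHealth itself processes its output.

```python
import xml.etree.ElementTree as ET

# The four elements as they appear on the slide.
fragment = (
    '<problem value="Asthma" cui="C0004096"/>'
    '<med value="Losartan" code="52175:RXNORM" />'
    '<med value="Spiriva" code="274535:RXNORM" />'
    '<procedure value="EKG" cui="C1623258" />'
)


def parse_annotations(fragment):
    """Return (kind, text, identifier) for each annotation element."""
    # Hypothetical wrapper element so the fragment has a single root.
    root = ET.fromstring(f"<annotations>{fragment}</annotations>")
    return [
        (el.tag, el.get("value"), el.get("cui") or el.get("code"))
        for el in root
    ]


annotations = parse_annotations(fragment)
for kind, text, identifier in annotations:
    print(kind, text, identifier)
```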
ezHealth - Benefits
• Advanced search
  • All hypertension patients with ejection fraction <40
  • All MI patients who are taking either beta-blockers or ACE Inhibitors
  • Patients diagnosed with Atrial Fibrillation on Coumadin or Lovenox
• Support core-measure initiative
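Once problems and measurements have been extracted into structured form, a search such as "all hypertension patients with ejection fraction <40" reduces to a simple filter. The sketch below assumes a hypothetical record layout (a set of normalized problems plus an optional ejection fraction value); the patient records are made up and the layout is not ezFIND's actual data model.

```python
# Made-up records in an assumed structured form.
patients = [
    {"id": "p1", "problems": {"hypertension"}, "ejection_fraction": 35},
    {"id": "p2", "problems": {"hypertension"}, "ejection_fraction": 55},
    {"id": "p3", "problems": {"asthma"}, "ejection_fraction": 30},
    {"id": "p4", "problems": {"hypertension"}, "ejection_fraction": None},
]


def hypertension_with_low_ef(patients, threshold=40):
    """IDs of hypertension patients whose recorded ejection fraction is below threshold."""
    return [
        p["id"]
        for p in patients
        if "hypertension" in p["problems"]
        and p["ejection_fraction"] is not None
        and p["ejection_fraction"] < threshold
    ]


matches = hypertension_with_low_ef(patients)
print(matches)
```

Patients with no recorded ejection fraction are excluded here rather than guessed at; a real system would have to decide how to report them.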
Error Detection
EMR:
1. “Sepsis due to urinary tract infection….”
2. “Her prognosis is poor both short term and long term, however, we will do everything possible to keep her alive and battle this infection.”
Without IntellegO: SNM:40733004_infection (Problem)
A syntax based NLP extractor (such as Medlee) can extract this term and annotate it as SNM:40733004_infection
With IntellegO: SNM:68566005_infection_urinary_tract (Problem)
By utilizing IntellegO and cardiology background knowledge, we can more accurately annotate the term as SNM:68566005_infection_urinary_tract
EMR: “The patient is to receive 2 fluid boluses.”
Without IntellegO: SNM:32457005_body_fluid (Problem)
A syntax based NLP extractor (such as Medlee) can extract this term and annotate it as SNM:32457005_body_fluid
With IntellegO: Treatment
Fluid is part of the bolus treatment, not a problem
By utilizing IntellegO and cardiology background knowledge, we can determine that this is an incorrect annotation.
Error Detection
The balance of evidence would suggest that his episode of atrial fibrillation seems to be an isolated event
He has had no documented atrial fibrillation since that time
Patient has atrial fibrillation
Patient does not have atrial fibrillation
NLP
NLP
Syncope Is_symptom_of Atrial Fibrillation
Warfarin, Atenolol, Aspirin Is_medication_for Atrial Fibrillation
Resolve Inconsistency
She denies any chest pain but is not really function due to leg stiffness, swelling an shortness of breath
Regarding the shortness of breath, we will send for a dobutamine stress echocardiogram
Patient does not have shortness of breath
Patient has shortness of breath
NLP
NLP
Shortness of Breath Is_symptom_of Obesity, Hypertension, Sleep Apnea Obstructive
Resolve Inconsistency
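In both examples, NLP produces contradictory assertions from two sentences, and background knowledge is used to decide between them. A minimal sketch of one possible decision rule follows: when assertions conflict, accept the condition if the rest of the record contains findings that background knowledge links to it. The rule, the function name, and the sample record are assumptions for illustration; the linked findings for atrial fibrillation are the ones named on the slide.

```python
def resolve_conflict(assertions, linked_findings, record_findings):
    """Decide a condition's status when NLP assertions may disagree.

    assertions: booleans from individual sentences (True = patient has it).
    linked_findings: symptoms and medications that background knowledge
        associates with the condition.
    record_findings: symptoms and medications found elsewhere in the record.
    Returns True or False when decided, None when evidence is insufficient.
    """
    values = set(assertions)
    if not values:
        return None
    if len(values) == 1:
        return values.pop()
    # Conflict: look for independent support elsewhere in the record.
    return True if linked_findings & record_findings else None


# Relationships named on the slide for atrial fibrillation.
af_links = {"syncope", "warfarin", "atenolol", "aspirin"}
# Hypothetical findings from the rest of one patient's record.
record = {"warfarin", "atenolol"}

decision = resolve_conflict([True, False], af_links, record)
print(decision)
```

Returning None when nothing in the record supports the condition keeps the conflict visible instead of silently picking a side.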
PREscription Drug abuse Online Surveillance and Epidemiology - PREDOSE
• Non-Medical Use of Prescription Drugs
  • Fastest growing drug issue in US
  • Escalating accidental overdose deaths
• Epidemiological Data Systems
  • Data collection practices
  • Data analysis limitations
    • Poor Scalability
    • Limited Reusability
    • Interoperability is challenging
    • Small sample size
Of course, junkie that I am, I decided to repeat the experiment. Today, after waiting 48 hours after my last bunk 4 mg injection, I injected 2 mg. There wasn't really any rush to speak of, but after 5 minutes I started to feel pretty damn good. So I injected another 1 mg. That was about half an hour ago. I feel great now.
PREDOSE
http://wiki.knoesis.org/index.php/PREDOSE
Describe drug user’s knowledge, attitudes, and behaviors related to illicit use of Prescription Drugs (Information extraction)
Describe temporal patterns of non-medical use of Prescription Drugs (Trend Detection)
PREDOSE
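The trend detection goal can be illustrated with the simplest possible temporal analysis: counting how many posts mention a drug in each month. The sketch below is an assumption-laden simplification (plain substring matching instead of ontology-driven entity extraction), and the posts are made-up placeholders, not forum data.

```python
from collections import Counter
from datetime import date


def monthly_mentions(posts, term):
    """Count posts per (year, month) whose text mentions `term`, case-insensitively."""
    counts = Counter()
    needle = term.lower()
    for posted_on, text in posts:
        if needle in text.lower():
            counts[(posted_on.year, posted_on.month)] += 1
    return dict(sorted(counts.items()))


# Placeholder posts: (date posted, text).
posts = [
    (date(2012, 1, 5), "placeholder post mentioning loperamide"),
    (date(2012, 1, 20), "another post about Loperamide"),
    (date(2012, 2, 3), "loperamide again"),
    (date(2012, 2, 9), "unrelated post"),
]

trend = monthly_mentions(posts, "loperamide")
print(trend)
```

An ontology would let the same count cover brand names and slang terms for one drug, which substring matching cannot do.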
[Figure: PREDOSE architecture]
Stages: Stage 1. Data Collection; Stage 2. Automatic Coding; Stage 3. Data Analysis and Interpretation
Components: Web Forums, Web Crawler, Informal Text, Data Store, Data Cleaning, Natural Language Processing, Semantics-based Techniques, Information Extraction Module (Entity, Relationship, Sentiment and Triple Extraction), Semantic Web Database, Triples/RDF Database, Temporal Analysis for Trend Detection, Semantic Web Tools (Cuebee, Scooner)
Outcome: Qualitative and Quantitative Analysis of Drug User Knowledge, Attitudes and Behaviors
Ontology: Schema (e.g. Opioid, Pain Pills), Instances (e.g. Suboxone, Subutex)
PREDOSE
Forum Y
Legend: Entity (pre), Entity (confirmed), +ve Sentiment, -ve Sentiment
PREDOSE
Legend: Entity, +ve Sentiment
Opiated Effect
Extra-medical Use of Loperamide
PREDOSE
All Forums, Forum X, Forum Y, Forum Z
PREDOSE
kHealth
Health information is now available from multiple sources
• medical records
• background knowledge
• social networks
• personal observations
• sensors
• etc.
Foursquare is an online application which integrates a person's physical location and social network.
Community of enthusiasts that share experiences of self-tracking and measurement.
FitBit Community allows the automated collection and sharing of health-related data, goals, and achievements
kHealth
Sensors, actuators, and mobile computing are playing an increasingly important role in providing data for early phases of the health-care life-cycle
This represents a fundamental shift:
• people are now empowered to monitor and manage their own health;
• and doctors are given access to more data about their patients
kHealth
kHealth
Personal Health Dashboard
kHealth
Personal Health Dashboard
1. Continuous Monitoring
2. Personal Assessment
3. Medical Service
Auxiliary Information – background knowledge, social/community support, personal context, personal medical history
kHealth
?
kHealth
kHealth – Key Ingredients
Background Knowledge
Social Network Input
Personal Observations
Personal Medical History
Abstractions
Observations
kHealth
kHealth - Technology
observes
inheres in
perceives
sends focus
sends observation
Observer
Quality
Entity
Perceiver
kHealth - Technology
Background Knowledge as
Bi-partite Graph
kHealth - Technology
Explanation: is the act of choosing the objects or events that best account for a set of observations; often referred to as hypothesis building
Discrimination: is the act of finding those properties that, if observed, would help distinguish between multiple explanatory features
kHealth - Technology
Explanatory Feature: a feature that explains the set of observed properties
ExplanatoryFeature ≡ ∃ssn:isPropertyOf⁻.{p1} ⊓ … ⊓ ∃ssn:isPropertyOf⁻.{pn}
elevated blood pressure
clammy skin
palpitations
Hypertension
Hyperthyroidism
Pulmonary Edema
Observed Property Explanatory Feature
Explanation
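The explanation step can be sketched over a bipartite graph that maps each feature to the properties it can account for: a feature is explanatory when it accounts for every observed property. The slide names the properties and features but the edges of its diagram did not survive extraction, so the edges below are illustrative assumptions, not clinical facts, and the function name is hypothetical.

```python
# Feature -> properties it can account for (illustrative edges only).
knowledge = {
    "Hypertension": {"elevated blood pressure"},
    "Hyperthyroidism": {"elevated blood pressure", "clammy skin", "palpitations"},
    "Pulmonary Edema": {"elevated blood pressure", "palpitations"},
}


def explanatory_features(knowledge, observed):
    """Features whose property set covers every observed property."""
    observed = set(observed)
    return sorted(
        feature
        for feature, properties in knowledge.items()
        if observed <= properties
    )


one_observation = explanatory_features(knowledge, {"elevated blood pressure"})
two_observations = explanatory_features(
    knowledge, {"elevated blood pressure", "palpitations"}
)
print(one_observation)
print(two_observations)
```

Each additional observation can only shrink the set of explanatory features, which is why choosing what to observe next matters.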
kHealth - Technology
Discrimination
Expected Property: would be explained by every explanatory feature
ExpectedProperty ≡ ∃ssn:isPropertyOf.{f1} ⊓ … ⊓ ∃ssn:isPropertyOf.{fn}
elevated blood pressure
clammy skin
palpitations
Hypertension
Hyperthyroidism
Pulmonary Edema
Expected Property Explanatory Feature
kHealth - Technology
Discrimination
Not Applicable Property: would not be explained by any explanatory feature
NotApplicableProperty ≡ ¬∃ssn:isPropertyOf.{f1} ⊓ … ⊓ ¬∃ssn:isPropertyOf.{fn}
elevated blood pressure
clammy skin
palpitations
Hypertension
Hyperthyroidism
Pulmonary Edema
Not Applicable Property Explanatory Feature
kHealth - Technology
Discrimination
Discriminating Property: is neither expected nor not-applicable
DiscriminatingProperty ≡ ¬ExpectedProperty ⊓ ¬NotApplicableProperty
elevated blood pressure
clammy skin
palpitations
Hypertension
Hyperthyroidism
Pulmonary Edema
Discriminating Property
Explanatory Feature
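The three discrimination definitions partition the known properties relative to the current explanatory features: expected (explained by every feature), not applicable (explained by none), and discriminating (everything else). A minimal set-based sketch follows; the feature-to-property edges are illustrative assumptions rather than clinical facts, and the function name is hypothetical.

```python
# Feature -> properties it can account for (illustrative edges only).
knowledge = {
    "Hypertension": {"elevated blood pressure"},
    "Hyperthyroidism": {"elevated blood pressure", "clammy skin", "palpitations"},
    "Pulmonary Edema": {"elevated blood pressure", "palpitations"},
}


def classify_properties(knowledge, features):
    """Split all known properties into expected, not-applicable, discriminating."""
    features = list(features)
    all_properties = set().union(*knowledge.values())
    expected = {
        p for p in all_properties if all(p in knowledge[f] for f in features)
    }
    not_applicable = {
        p for p in all_properties if not any(p in knowledge[f] for f in features)
    }
    discriminating = all_properties - expected - not_applicable
    return expected, not_applicable, discriminating


expected, not_applicable, discriminating = classify_properties(
    knowledge, ["Hypertension", "Pulmonary Edema"]
)
print("expected:", expected)
print("not applicable:", not_applicable)
print("discriminating:", discriminating)
```

Only a discriminating property is worth observing next: seeing it, or failing to see it, rules out at least one of the remaining explanations.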
kHealth Demo
kHealth
Thank You
Visit Us @ www.knoesis.org
with additional background at http://knoesis.org/amit/hcls