New York State Center of Excellence in Bioinformatics & Life Sciences
Biomedical Ontology in Buffalo
Part I: The Gene Ontology
Barry Smith and Werner Ceusters
Biomedical data is siloed
• Lab / pathology data
• Electronic Health Record data
• Clinical trial data
• Patient histories
• Medical imaging
• Microarray data
• Protein chip data
• Flow cytometry
• Genotype / SNP data
Biomedical data is siloed
• Data in Pittsburgh
• Data owned by Medicare
• Data owned by the NIH
• Data owned by HIV researchers
• Data owned by the Cleveland Clinic
• Data owned by regional health organizations
• Data owned by mouse biologists
• Data owned by Dr McFritz
NIH mandates for data reusability
Ontology: An antidote to silos
Department of Philosophy, 135 Park Hall, University at Buffalo, Buffalo NY 14260
promoting:
• information retrieval
• information consistency, and thus continuity and cumulation
• information integration
• reasoning
Uses of ‘ontology’ in PubMed abstracts
By far the most successful: GO (The Gene Ontology)
[Figure: selected gene tree for the gene list "all genes" (14010); axis labels: attacked, control, time. Cluster labels: puparial adhesion, molting cycle, hemocyanin; defense response, immune response; response to stimulus, Toll regulated genes; JAK-STAT regulated genes; immune response, Toll regulated genes; amino acid catabolism, lipid metabolism; peptidase activity, protein catabolism, immune response.]
Microarray data shows changed expression of thousands of genes.
How will you spot the patterns?
You’re interested in which of your hospital’s patient data is relevant to understanding how genes control heart muscle development
• Lab / pathology data
• EHR data
• Clinical trial data
• Family history data
• Medical imaging
• Microarray data
• Model organism data
• Flow cytometry
• Mass spec
• Genotype / SNP data
How will you spot the patterns?
How will you find the data you need?
GO provides a controlled system of 25,000 categories for use in annotating data
• multi-species (model organism research)
• multi-disciplinary
• open source
Hierarchical view representing relations between represented types
The GO categorizations are organized in a way which provides a tool for algorithmic reasoning
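To illustrate the kind of algorithmic reasoning a hierarchy supports, here is a minimal sketch in Python. The small is_a graph, the gene names, and the function names are illustrative assumptions, not real GO data. The idea shown is that an annotation to a specific category also counts for every more general category above it.

```python
# Toy is_a hierarchy: child category -> parent categories.
# The links and gene names are illustrative assumptions, not real GO data.
IS_A = {
    "immune response": ["defense response"],
    "defense response": ["response to stimulus"],
    "response to stimulus": ["biological process"],
    "biological process": [],
}

# Direct annotations: gene product -> most specific categories asserted.
ANNOTATIONS = {
    "geneA": {"immune response"},
    "geneB": {"defense response"},
    "geneC": {"response to stimulus"},
}


def ancestors(term, is_a):
    """Return the term together with every category reachable via is_a."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(is_a.get(current, []))
    return seen


def genes_for(category, annotations, is_a):
    """Genes annotated to the category or to any of its subcategories."""
    return sorted(
        gene
        for gene, terms in annotations.items()
        if any(category in ancestors(t, is_a) for t in terms)
    )


print(genes_for("defense response", ANNOTATIONS, IS_A))
```

A query for a general category thus also retrieves data annotated only with a more specific one, without the annotator having to list every level by hand.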
$100 million invested in literature curation using GO
over 11 million annotations relating gene products described in the UniProt, Ensembl and other databases to terms in the GO
experimental results reported in 52,000 scientific journal articles manually annotated by expert biologists using GO
One standard method
Sjöblom T, et al. analyzed 13,023 genes in 11 breast and 11 colorectal cancers
using baseline functional information captured by GO for given gene product types
identified 189 genes as being mutated at significant frequency and thus as providing targets for diagnostic and therapeutic intervention.
Science. 2006 Oct 13;314(5797):268-74.
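The slide does not spell out the statistics involved. One widely used way to exploit GO annotations on a selected gene list is an over-representation test, sketched below as a hypergeometric tail probability. The function name and the category counts in the example call are invented for illustration; only the figures 13,023 and 189 come from the slide, and nothing here is claimed to reproduce the cited analysis.

```python
from math import comb


def hypergeom_tail(population, annotated, sample, hits):
    """P(X >= hits) when `sample` genes are drawn without replacement from
    `population` genes, of which `annotated` carry the category."""
    total = comb(population, sample)
    upper = min(annotated, sample)
    favourable = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return favourable / total


# Hypothetical counts: 300 of the 13,023 genes carry some GO category,
# and 15 of the 189 selected genes do.
p_value = hypergeom_tail(population=13023, annotated=300, sample=189, hits=15)
print(f"p = {p_value:.3g}")
```

A small probability would suggest that the category is more common among the selected genes than chance alone would explain; in practice such tests are run over many categories and corrected for multiple testing.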
Uses of GO in studies of:
• Persistent changes in spinal cord gene expression after recovery from inflammatory hyperalgesia: a preliminary study on pain memory. PMID: 18366630
• Spinal cord transcriptional profile analysis reveals protein trafficking and RNA processing as prominent processes regulated by tactile allodynia. PMID: 17069981
• Immune system involvement in abdominal aortic aneurysms. PMID: 17634102
• Biomedical discovery acceleration, with applications to craniofacial development. PMID: 19325874
Ontology in Buffalo
Part 2: Problems of Clinical Ontologies
Source of all data
Reality!
Ultimate goal
A digital copy of the world
Requirements for this digital copy
• R1: A faithful representation of reality
• R2: … of everything that is digitally registered,
– what is generic: scientific theories
– what is specific: what individual entities exist and how they relate
• R3: … which is computable, in order to …
– allow queries over the world’s past and present
– make predictions (diagnostic support, early warnings …)
– fill in gaps
– identify mistakes
– ...
… the ultimate crystal ball
The ‘binding’ wall
How to do it right?
A cartoon of the world
“Better Information” must cover …
• EHR-EMR-ENR-…
• PHR
• Various modality-related databases
– Lab, imaging, …
• Textbooks
• Classification systems
• Terminologies
• Ontologies
Patient-specific information
Scientific “knowledge”
[Diagram labels: 1, 2, 3]
Key question
How to extend to clinical medicine the standard of quality of the GO and other ontologies based in biological science?
NCI Thesaurus (April 2008)
2
NCI Thesaurus (April 2008)
?
2
MeSH: some paths from top to Wolfram Syndrome
Wolfram Syndrome
All MeSH Categories
Diseases Category
Nervous System Diseases
Cranial Nerve Diseases
Optic Nerve Diseases
Optic Atrophy
Optic Atrophies, Hereditary
Neurodegenerative Diseases
Heredodegenerative Disorders, Nervous System
Eye Diseases
Eye Diseases, Hereditary
Optic Nerve Diseases
Male Urogenital Diseases
Urologic Diseases
Kidney Diseases
Diabetes Insipidus
Female Urogenital Diseases and Pregnancy Complications
Female Urogenital Diseases
2
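The headings above form a graph in which one heading can have several parents. The sketch below enumerates the paths from the top to Wolfram Syndrome. The parent-child links are an assumption reconstructed from the order in which the slide lists the headings and have not been checked against MeSH itself; the names `CHILDREN`, `paths_to`, `PATHS` and `IMPLIED` are hypothetical.

```python
# Parent -> children links, reconstructed from the order in which the slide
# lists the headings. They are an assumption, not checked against MeSH.
CHILDREN = {
    "All MeSH Categories": ["Diseases Category"],
    "Diseases Category": [
        "Nervous System Diseases",
        "Eye Diseases",
        "Male Urogenital Diseases",
        "Female Urogenital Diseases and Pregnancy Complications",
    ],
    "Nervous System Diseases": [
        "Cranial Nerve Diseases",
        "Neurodegenerative Diseases",
    ],
    "Cranial Nerve Diseases": ["Optic Nerve Diseases"],
    "Eye Diseases": ["Optic Nerve Diseases", "Eye Diseases, Hereditary"],
    "Optic Nerve Diseases": ["Optic Atrophy"],
    "Optic Atrophy": ["Optic Atrophies, Hereditary"],
    "Neurodegenerative Diseases": [
        "Heredodegenerative Disorders, Nervous System"
    ],
    "Heredodegenerative Disorders, Nervous System": [
        "Optic Atrophies, Hereditary"
    ],
    "Eye Diseases, Hereditary": ["Optic Atrophies, Hereditary"],
    "Optic Atrophies, Hereditary": ["Wolfram Syndrome"],
    "Male Urogenital Diseases": ["Urologic Diseases"],
    "Female Urogenital Diseases and Pregnancy Complications": [
        "Female Urogenital Diseases"
    ],
    "Female Urogenital Diseases": ["Urologic Diseases"],
    "Urologic Diseases": ["Kidney Diseases"],
    "Kidney Diseases": ["Diabetes Insipidus"],
    "Diabetes Insipidus": ["Wolfram Syndrome"],
}


def paths_to(target, node, children):
    """Return every path (a list of headings) from node down to target."""
    if node == target:
        return [[node]]
    found = []
    for child in children.get(node, []):
        for tail in paths_to(target, child, children):
            found.append([node] + tail)
    return found


PATHS = paths_to("Wolfram Syndrome", "All MeSH Categories", CHILDREN)
for path in PATHS:
    print(" > ".join(path))

# Every heading that lies on at least one path to the target.
IMPLIED = {heading for path in PATHS for heading in path}
```

If each link were read as a strict subtype relation, `IMPLIED` would place a single case of Wolfram Syndrome under both the male and the female urogenital headings, which shows why such hierarchies are hard to apply to individual patients.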
What would it mean if used in the context of a patient?
Wolfram Syndrome
All MeSH Categories
Diseases Category
Nervous System Diseases
Cranial Nerve Diseases
Optic Nerve Diseases
Optic Atrophy
Optic Atrophies, Hereditary
has
Neurodegenerative Diseases
Heredodegenerative Disorders, Nervous System
Eye Diseases
Eye Diseases, Hereditary
Optic Nerve Diseases
Female Urogenital Diseases and Pregnancy Complications
Female Urogenital Diseases
Male Urogenital Diseases
Urologic Diseases
Kidney Diseases
Diabetes Insipidus
???
…
has
3 ???
Biomedical Ontology in Buffalo
Part 3: What we do
The GO is amazingly successful in overcoming silo problems
but it covers only generic biological entities of three sorts:
– cellular components
– molecular functions
– biological processes
and it does not provide representations of diseases, symptoms, …
The core of biomedical ontology in Buffalo
– extending the methodology of high quality ontologies to other domains of biology and medicine, and to EHRs and coding systems
– combining ontology with referent tracking
Columns: RELATION TO TIME (CONTINUANT, split into INDEPENDENT and DEPENDENT; OCCURRENT). Rows: GRANULARITY.

GRANULARITY | CONTINUANT, INDEPENDENT | CONTINUANT, DEPENDENT | OCCURRENT
ORGAN AND ORGANISM | Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Biological Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO) |
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)

The Open Biomedical Ontologies (OBO) Foundry
NCBO
NIH Roadmap Center for Biomedical Computing
Collaboration of:
Stanford Biomedical Informatics Research
Mayo Clinic
University at Buffalo
National Center for Biomedical Ontology (NCBO)
National Center for Ontological Research (NCOR)
• Army Net-Centric Data Strategy Center of Excellence
– Biometrics Ontology
– Command and Control Ontology
– Universal Core Semantic Layer
Current funded biomedical ontology projects
• Protein Ontology (PRO) (NIH/NIGMS)
• Infectious Disease Ontology (IDO) (NIH/NIAID)
• Realism-Based Versioning for Biomedical Ontologies (SNOMED) (NIH/NLM)
• Ontology for Risks Against Patient Safety (RAPS) (EU)
• DSM Ontology (to support work on revision of the Diagnostic and Statistical Manual of Mental Disorders)
• Cleveland Clinic Semantic Database in Cardiothoracic Surgery
IDO Consortium
• MITRE, Mount Sinai, UT Southwestern – Influenza
• IMBB/VectorBase – Vector-borne diseases (A. gambiae, A. aegypti, I. scapularis, C. pipiens, P. humanus)
• Colorado State University – Dengue Fever
• Duke University – Tuberculosis
• Cleveland Clinic – Infective Endocarditis
• University of Michigan – Brucellosis
“Better Information” must cover …
• EHR-EMR-ENR-…
• PHR
• Various modality-related databases
– Lab, imaging, …
• Textbooks
• Classification systems
• Terminologies
• Ontologies
Patient-specific information
Scientific “knowledge”
[Diagram labels: 1, 2, 3]
Ontologies
Keeping track of what is general (diabetes, malaria, nasal bone, nose …)
Referent tracking
Keeping track of what is particular (this particular nasal bone, this particular fracture, this particular swimming pool, this particular image …)
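The contrast between what is general and what is particular can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `ReferentRegistry`, its method names, and the example descriptions are hypothetical, and the "#n" identifiers merely imitate the style of the IUIs used in the REMINE example.

```python
from itertools import count


class ReferentRegistry:
    """Hands out one identifier per particular and never reuses it.

    Hypothetical illustration; not the API of any existing system.
    """

    def __init__(self):
        self._numbers = count(1)
        self.particulars = {}

    def register(self, description, universal):
        """Record a particular and the general type it is an instance of."""
        iui = f"#{next(self._numbers)}"
        self.particulars[iui] = {
            "description": description,
            "instance_of": universal,
        }
        return iui


registry = ReferentRegistry()
bone_a = registry.register("nasal bone of patient A", "nasal bone")
bone_b = registry.register("nasal bone of patient B", "nasal bone")
fracture = registry.register(f"fracture of {bone_a}", "fracture")

for iui, record in registry.particulars.items():
    print(iui, record["description"], "instance_of", record["instance_of"])
```

Both nasal bones share one general type from the ontology, yet each keeps its own identifier, so later statements (such as the fracture) can refer unambiguously to one of them.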
Ontology for Risks Against Patient Safety
REMINE: RT-based adverse event analysis

IUI | Particular description | Properties
#1 | the patient who is treated | #1 member C1 since t2
#2 | #1’s treatment | #2 instance_of C3; #2 has_participant #1 since t2; #2 has_agent #3 since t2
#3 | the physician responsible for #2 | #3 member C4 since t2
#4 | #1’s arthrosis | #4 member C5 since t1
#5 | #1’s anti-inflammatory treatment | #5 part_of #2; #5 member C2 since t3
#6 | #1’s physiotherapy | #6 part_of #2
#7 | #1’s stomach | #7 member C6 since t2
#8 | #7’s structure integrity | #8 instance_of C8 since t0; #8 inheres_in #7 since t0
#9 | #1’s stomach ulcer | #9 part_of #7 since t3
#10 | coming into existence of #9 | #10 has_participant #9 at t3
#11 | change brought about by #9 | #11 has_agent #9 since t3; #11 has_participant #8 since t3; #11 instance_of C10 at t3
#12 | noticing the presence of #9 | #12 has_participant #9 at t3+x; #12 has_agent #3 at t3+x
#13 | cognitive representation in #3 about #9 | #13 is_about #9 since t3+x
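Because every property in the REMINE example is a statement linking identifiers, the example lends itself to simple queries. The sketch below transcribes the Properties column into tuples and looks up everything said about one particular. The tuple layout and the names `Assertion`, `about` and `parts_of` are hypothetical choices made for illustration, not part of REMINE.

```python
from collections import namedtuple

# Hypothetical record layout: subject, relation, target, time qualifier.
Assertion = namedtuple("Assertion", "subject relation target when")

# Transcribed from the Properties column; "" marks a property for which
# the slide gives no time qualifier.
ASSERTIONS = [
    Assertion("#1", "member", "C1", "since t2"),
    Assertion("#2", "instance_of", "C3", ""),
    Assertion("#2", "has_participant", "#1", "since t2"),
    Assertion("#2", "has_agent", "#3", "since t2"),
    Assertion("#3", "member", "C4", "since t2"),
    Assertion("#4", "member", "C5", "since t1"),
    Assertion("#5", "part_of", "#2", ""),
    Assertion("#5", "member", "C2", "since t3"),
    Assertion("#6", "part_of", "#2", ""),
    Assertion("#7", "member", "C6", "since t2"),
    Assertion("#8", "instance_of", "C8", "since t0"),
    Assertion("#8", "inheres_in", "#7", "since t0"),
    Assertion("#9", "part_of", "#7", "since t3"),
    Assertion("#10", "has_participant", "#9", "at t3"),
    Assertion("#11", "has_agent", "#9", "since t3"),
    Assertion("#11", "has_participant", "#8", "since t3"),
    Assertion("#11", "instance_of", "C10", "at t3"),
    Assertion("#12", "has_participant", "#9", "at t3+x"),
    Assertion("#12", "has_agent", "#3", "at t3+x"),
    Assertion("#13", "is_about", "#9", "since t3+x"),
]


def about(iui, assertions):
    """All assertions in which the particular is subject or target."""
    return [a for a in assertions if iui in (a.subject, a.target)]


def parts_of(iui, assertions):
    """Identifiers of the particulars asserted to be part_of the given one."""
    return sorted(
        a.subject
        for a in assertions
        if a.relation == "part_of" and a.target == iui
    )


# Everything recorded about the stomach ulcer (#9).
for a in about("#9", ASSERTIONS):
    print(a.subject, a.relation, a.target, a.when)

# The components of the overall treatment (#2).
print(parts_of("#2", ASSERTIONS))
```

Keeping the time qualifier on each statement is what allows a later analysis to order events, for example that the ulcer came into existence at t3 and was noticed only at t3+x.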