exploiting semantic technologies to build an application ontology
Post on 29-Jan-2016
44 Views
Preview:
DESCRIPTION
TRANSCRIPT
Exploiting semantic technologies to build an application ontology
James Malone PhD, Helen Parkinson PhD, Tomasz Adamusiak Phd, MD
Exploiting semantic technologies to build an application ontologymalone@ebi.ac.uk
Overview
• Motivation• Our use cases• Annotating HTP experimental data • Integrating clinical data
• Methodology for creating the ontology• Semi-automated mapping and manual curation
• Current ontology usage• Future use
Exploiting semantic technologies to build an application ontologymalone@ebi.ac.uk
22.04.233
Our Use Cases
• Query support (e.g, query for 'cancer' and get also 'leukemia')• Over-representation analysis in groups of samples (analogous to the
use of GO terms in over-representation analysis in groups of genes)• Data visualisation – e.g., presenting an ontology tree to the user of
what is in the database• Data integration by ontology terms – e.g., we assume that 'kidney' in
independent studies roughly means the same, so we can count how many kidney samples we have in the database
• Intelligent template generation for different experiment types in submission or data presentation
• Summary level data • Nonsense detection – e.g. telling us that something marked as
cancer can not be marked as healthy
Exploiting semantic technologies to build an application ontologymalone@ebi.ac.uk
4
Scope of Experimental Factor Ontology (EFO)
• Modelling all of the experimental factors that are currently present in the ArrayExpress repository
• Experimental factors are variable aspects of an experiment design which can be used to describe an experiment
• Scope is primarily determined by data currently held in ArrayExpress
species levelsample leveldevelopmental
stage
clinical conditions level (e.g. disease)
Exploiting semantic technologies to build an application ontologymalone@ebi.ac.uk
Developing an Experimental Factor Ontology22.04.235
‘Experimental Factors’
Exploiting semantic technologies to build an application ontologymalone@ebi.ac.uk
22.04.236
Annotating High Throughput Data
• Text mining at data acquisition• Ontology driven queries• Data mining
22.04.236
acquire
246,000
assays
Experiment queries > 200 species
Public/Private
Genes in ExptsRe-annotate
Gene level queries, 9 species
Public Only
ATLASSummarize
Ranked gene/
condition queries
Exploiting semantic technologies to build an application ontologymalone@ebi.ac.uk
Integrating Clinical Data
• Use cases include: • Homologizing clinical data for study designs (e.g. GWAS)• …
Exploiting semantic technologies to build an application ontologymalone@ebi.ac.uk
22.04.238
Building the Experimental Factor Ontology• Position of EFO in the ‘bigger picture’• Key is orthogonal coverage, reuse of existing resources
and shared frameworks
EFO
Disease Ontology Anatomy Reference Ontology
Cell Type Ontology
Chemical Entities of Biological Interest
(ChEBI)
Various Species Anatomy
Ontologies
Relation Ontology
Text mining
Exploiting semantic technologies to build an application ontologymalone@ebi.ac.uk
Semi-automated mapping text to ontology
• Following an evaluation from Tim Rayner we selected Double Metaphone algorithm
• Perform matching of our values in database to ontology class labels and definitions.
• Also perform mappings from EFO to other ontologies, so that EFO: cancer = NCI: cancer, DO: cancer et al.
• Sanity checking over mappings before adding to ontology
Exploiting semantic technologies to build an application ontologymalone@ebi.ac.uk
Mapping using Agent Technology
User
Bioontologies
Search Engines
Repositories
Querying Mechanism
Component 2: ontology mappings
Component 3: ontology discovery
Component 1: MAS architecture
Exploiting semantic technologies to build an application ontologymalone@ebi.ac.uk
What does agent technology buy us?
• Annotation consistency• EFO_1001214 is now inconsistent
because DO_15654 has new parent
• Richer mappings (hence annotations)• EFO_1000156 can have new mappings
because new cancer class found in MIT ontology
• New potentially relevant ontologies• New ontology found relating to molecular + pathways• Semantic web compatible (i.e. can be deployed as
standards compliant service)
Exploiting semantic technologies to build an application ontologymalone@ebi.ac.uk
EFO Axes
EFO
process
material
information
material property
site
Exploiting semantic technologies to build an application ontologymalone@ebi.ac.uk
Process
process
Exploiting semantic technologies to build an application ontologymalone@ebi.ac.uk
Information
information
Exploiting semantic technologies to build an application ontologymalone@ebi.ac.uk
Material
material
Exploiting semantic technologies to build an application ontologymalone@ebi.ac.uk
Material Property
material property
Exploiting semantic technologies to build an application ontologymalone@ebi.ac.uk
Using the ontology: Querying
• Public repository of gene expression data• Multiple sources – direct submissions, external databases• >200 species• 8400 experiments, 246,000 assays
22.04.2317
Exploiting semantic technologies to build an application ontologymalone@ebi.ac.uk
22.04.2318
Using the ontology: Atlas Querying
Exploiting semantic technologies to build an application ontologymalone@ebi.ac.uk
Using the ontology: Exposing data via external resources• NCBO Bioportal
Developing an Experimental Factor Ontology22.04.2319
Exploiting semantic technologies to build an application ontologymalone@ebi.ac.uk
Using the Ontology:Detecting Nonsense: Enforcing correctness
cell line (Hela)
organism part (cervix)
cell type (epithelial)
disease (cervical adenocarcinoma)
species (human)
Exploiting semantic technologies to build an application ontologymalone@ebi.ac.uk
Using the Ontology:Detecting Nonsense: Enforcing correctness
organism part (hair follicle)
disease (cardiovascular disease)
species (human)
Exploiting semantic technologies to build an application ontologymalone@ebi.ac.uk
Using Ontology: Integrating Clinical data for Study Design
Exploiting semantic technologies to build an application ontologymalone@ebi.ac.uk
Future Work for EFO
• Mapping in external ids on request – Snomed-CT, FMA, ChEBI, Brenda tissue ontology etc
• API development for serving external ids from AE• Working with external ontologies to produce cross products• Extensions for clinical data capture Gen2Phen, Engage• Extensions for mouse model of human disease queries• Addressing ‘temporal dimension’• Addition of units• Improving query implementation in ArrayExpress Atlas – GUI
changes• Addition of synonyms • Semantic clustering of experiments
Exploiting semantic technologies to build an application ontologymalone@ebi.ac.uk
Conclusion
• Ontology development for text mining, annotation, query Built with our needs in mind, however covers a wide range of experimental variables across a wide range of technologies, extensible, open source
• Xref’d to existing ontology resources when possible• Text mining works, reduces the workload• 1.0 is released on April 1st 2009• 0.10 version currently available in OLS and NCBO
bioportal• http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=EFO• http://www.ebi.ac.uk/microarray-srv/efo/
Exploiting semantic technologies to build an application ontologymalone@ebi.ac.uk
Acknowledgments• Ontology creation:
• James Malone, Helen Parkinson, Tomasz Adamusiak, Ele Holloway
• Mapping tools and text mining evaluation:• Tim Rayner, Holly Zheng
• External Specialist Review:• Trish Whetzel, Jonathan Bard
• AE Team:• Anna Farne, Ele Holloway, Margus Lukk, Eleanor Williams, Tony
Burdet, Alvis Brazma, Misha Kapushesky• EBI Rebholz Group (Whatizit text mining tool)• EC (Gen2Phen,FELICS,MUGEN, EMERALD, ENGAGE, SLING),
EMBL, NIH
Developing an Experimental Factor Ontology22.04.2325
top related