Gene Expression Data Annotation – an application of the cell type ontology
Helen Parkinson, PhD
19 May 2010
Use Cases
• Query support and expansion
• Data visualization and exploration
• Summary level data presentation
• Data integration via ontology terms
• Meta analysis – human and mouse
• Semantic distance queries across experiments
• Cross products between cell lines, tissues, cell types, diseases ...
• Users – curators, biologists, engineers
• Intelligent template generation for different experiment types in submission or data presentation
• Detection of annotation inconsistency
• Annotator support, term suggestion
• Text mining at acquisition/submission for GEO data and post-hoc
• Literature text mining
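To make the query expansion use case concrete, here is a minimal sketch in Python. It assumes a toy is-a hierarchy and a toy sample table; the term labels, sample IDs, and function names are hypothetical and are not taken from EFO, the Cell Type Ontology, or ArrayExpress. A query for a term is expanded to the term plus all of its descendants before the samples are matched.

```python
from collections import deque

# Hypothetical toy is-a hierarchy (child -> parents). Labels are illustrative
# only and are not copied from EFO or the Cell Type Ontology.
IS_A = {
    "T cell": ["lymphocyte"],
    "B cell": ["lymphocyte"],
    "lymphocyte": ["leukocyte"],
    "monocyte": ["leukocyte"],
    "leukocyte": ["hematopoietic cell"],
}

# Hypothetical sample annotations (sample ID -> cell type label).
SAMPLES = {"S1": "T cell", "S2": "monocyte", "S3": "hepatocyte"}


def expand_query(term, is_a=IS_A):
    """Return the term plus all of its descendants in the is-a hierarchy."""
    # Invert child -> parents into parent -> children.
    children = {}
    for child, parents in is_a.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)

    # Breadth-first walk down the hierarchy from the query term.
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def search(term, samples=SAMPLES):
    """Return sorted IDs of samples annotated with the term or a descendant."""
    wanted = expand_query(term)
    return sorted(s for s, label in samples.items() if label in wanted)
```

With these toy data, a search for "leukocyte" also retrieves samples annotated only as "T cell" or "monocyte", which a plain string match would miss.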
Integration challenges
• 1,000,000 sample annotations in ArrayExpress (Aug 2009)
• Seq DBs, tissues, metagenomics, reactions, etc.
• Cross database integration issues EGA/AE/ERA etc.
• Name value pairs ‘Disease’ = ‘cancer’, semi-controlled text, papers
• Algorithms, software, methods
• Parameter annotation e.g. Virtual Physiological Human
• Complex phenotypes, clinical information
• Embedded literature, PubMed abstracts, full text papers, supplemental information
• Most of the data relate to cell lines, tissues, disease samples, clinical information and phenotypes
• Millions of records, legacy data, since ~1985
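The name value pair problem mentioned above (‘Disease’ = ‘cancer’ as semi-controlled text) can be illustrated with a small Python sketch. It assumes a hand-made synonym table; the table contents, the target labels, and the function name are hypothetical and stand in for whatever curated mapping a real pipeline would use. Values that cannot be mapped are returned separately so a curator can review them.

```python
# Hypothetical synonym table: normalised free text -> preferred label.
# The entries are treated as synonyms purely for illustration.
SYNONYMS = {
    "cancer": "cancer",
    "tumour": "cancer",
    "tumor": "cancer",
    "hela": "HeLa",
    "hela cells": "HeLa",
}


def normalise(pairs, synonyms=SYNONYMS):
    """Split name/value annotations into mapped and unmapped dictionaries."""
    mapped, unmapped = {}, {}
    for name, value in pairs.items():
        # Lower-case and collapse whitespace before looking the value up.
        key = " ".join(value.lower().split())
        if key in synonyms:
            mapped[name] = synonyms[key]
        else:
            unmapped[name] = value  # keep the original text for curation
    return mapped, unmapped
```

Keeping the unmapped values, rather than discarding them, is a design choice that fits the curator-facing use cases listed earlier, such as term suggestion and detection of annotation inconsistency.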
www.ebi.ac.uk/efo
Diagram: EBI data resources (labels only)
• Phenotypes
• EBI Sample Database
• Molecular databases
• Genomes, genes: Ensembl
• Proteins: UniProt
• Chemicals: ChEBI
• Molecules
• Pathways (Reactome)
• Archives of supporting data
• European Nucleotide Archive
• Proteomics measurements: PRIDE
• Metabolomics experiments: a new database
• Transcript measurements: ArrayExpress
• Molecular Atlases
• European Sample database
Atlas Querying
• All genes under/over expressed in cell types per species, where cell type is annotated as a variable
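The Atlas query described above (genes under/over expressed in a cell type, per species) can be sketched as a simple filter and group-by. This is only an illustration in Python under stated assumptions: the record layout, gene names, and function name are hypothetical and do not reflect the actual Gene Expression Atlas schema or API.

```python
from typing import NamedTuple


class Result(NamedTuple):
    gene: str
    species: str
    cell_type: str   # value of the experimental variable "cell type"
    direction: str   # "up" (over expressed) or "down" (under expressed)


# Hypothetical differential expression records.
RESULTS = [
    Result("geneA", "human", "T cell", "up"),
    Result("geneB", "human", "T cell", "down"),
    Result("geneA", "mouse", "T cell", "up"),
    Result("geneC", "human", "hepatocyte", "up"),
]


def genes_by_species(cell_type, direction, results=RESULTS):
    """Group genes by species for one cell type and one direction of change."""
    grouped = {}
    for r in results:
        if r.cell_type == cell_type and r.direction == direction:
            grouped.setdefault(r.species, set()).add(r.gene)
    return grouped
```

In practice the cell type argument would first be expanded through the ontology, so that a query for a parent term also covers experiments annotated with more specific cell types.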
EFO Vital Statistics
• May 2010, release 2.3 (23 monthly releases), 2888 classes (832 no xrefs)
• Built in Protégé, OWL, uses DL converted to OBO
• Available via OLS, BioPortal, www.ebi.ac.uk/EFO
• Focus on diseases, cell types, cell lines, ‘mammalian anatomy’, plant terms, compound, experimental processes and hardware
• OWL tools available – ontology differ
• Mapped to 24 semantic resources:
  • Malaria Ontology (MALIDO) ver0.2b
  • Mammalian phenotype (MP) ver1.309
  • Medical Subject Headings (MSH) ver2009_2009_02_13
  • International Classification of Diseases (ICD-9) ver9
  • Phenotypic quality (PATO) ver1.188
  • CRISP Thesaurus Version 2.5.2.0
  • Mosquito gross anatomy (TGMA) ver1.10
  • Human disease (DOID) ver1.88
  • Chemical entities of biological interest (CHEBI) ver1.59
  • Drosophila gross anatomy (FBbt) ver1.30
  • Foundational Model of Anatomy (FMA) ver3.0
  • The Arabidopsis Information Resource (TAIR) (various dates)
  • The Jackson Lab mouse database
  • SNOMED Clinical Terms (SNOMEDCT) ver2009_01_31
  • Ontology for Biomedical Investigations (OBI) ver2009-11-06 Philly
  • Units of measurement (UO) ver1.21
  • Microarray experimental conditions (MO) ver1.3.1.1
  • Plant structure (PO)
  • Minimal anatomical terminology (MAT) ver1.1
  • NIFSTD (nif) ver1.4
  • NCI Thesaurus (NCIt) ver09.07
  • Cell type (CL) ver1.40
  • Zebrafish anatomy and development (ZFA) ver1.23
  • BRENDA tissue / enzyme source (BTO) ver1.3
• Relations ontology 1.2, BFO
Building the Experimental Factor Ontology
• Position of EFO in the ‘bigger picture’
• Key is orthogonal coverage, reuse of existing resources and shared frameworks
Diagram: EFO and the resources it draws on (labels only)
• EFO
• Disease Ontology
• Anatomy Reference Ontology
• Cell Type Ontology
• Chemical Entities of Biological Interest (ChEBI)
• Various Species Anatomy Ontologies
• Relation Ontology
• Text mining
Deploying EFO
• Text mining at data acquisition
• Ontology driven queries
• Data mining
• Data driven ontology development
• Term requests for source ontologies
Diagram: data flow (labels only)
• AE/GEO acquire
• 310,000 assays
• Experiment Archive
• Re-annotate, summarize, add semantics
• Gene Expression Atlas (ATLAS)
Desiderata for the Cell Type Ontology
• Release with hematopoietic cell types ASAP
• Mass deprecation release ASAP
• All leaf nodes defined in text/logically
• Cross products – anatomy, GO process
• Cell line x cell types
• More orthogonality - CTO as a definitive source
• MIREOT for appropriate terms
• EFO will import CTO name spaces (when?)
• Synonyms: non-exact = bad
VBO – Vertebrate bridging ontology
• Collaboration between ArrayExpress, MRC Harwell, Cambridge Anatomy and Genetics
• Scope: mouse, human, rat, teleosts
• FMA view creation – ‘mammalian view’ of anatomy
• Mapping to existing ontologies – single species, Uberon
• Modelling using ‘homologous to’ relationship
• Skeletal focus, adult stages
• Evidence for homology – literature, experts, phylogeny
• Workshop June 2010
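As a rough illustration of modelling with a ‘homologous to’ relationship and recording the evidence for each assertion, here is a minimal Python sketch. The assertions, the evidence labels, and the function name are hypothetical and are not taken from VBO. The sketch also assumes the relation is symmetric and deliberately does not treat it as transitive; a real bridging ontology may make different choices.

```python
# Hypothetical 'homologous to' assertions between (species, structure) terms,
# each with an evidence label (literature, expert, phylogeny).
HOMOLOGOUS_TO = [
    (("mouse", "femur"), ("human", "femur"), "literature"),
    (("human", "femur"), ("rat", "femur"), "expert"),
]


def homologs(term, assertions=HOMOLOGOUS_TO):
    """Return directly asserted homologs of a term, mapped to their evidence.

    Assumption: the relation is read as symmetric but not transitive, so only
    terms that share an explicit assertion with the query term are returned.
    """
    found = {}
    for left, right, evidence in assertions:
        if left == term:
            found[right] = evidence
        elif right == term:
            found[left] = evidence
    return found
```

Here the human term bridges to both mouse and rat, while the mouse term bridges only to human, because no direct mouse to rat assertion is recorded in the toy data.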