The European Bioinformatics Institute The European Bioinformatics Institute MIAME and Ontologies for Sample Description Helen Parkinson Microarray Informatics Team European Bioinformatics Institute EMBO Course, October 2001

MIAME and Ontologies for Sample Description

Helen Parkinson

Microarray Informatics Team

European Bioinformatics Institute

EMBO Course, October 2001

Talk Structure

ArrayExpress - a public database for microarray data and integration of ontologies

Ontologies for gene expression data

Submission and annotation tool

Problems of microarray data analysis

Size of the datasets

Different platforms - nylon, glass

Different technologies on platforms - oligo/spotted

Referencing external databases which are not stable

Sample annotation

Array annotation

Need for LIMS systems and the need for bioinformaticians

General MIAME principles

Recorded info should be sufficient to interpret and replicate the experiment

Information should be structured so that querying and automated data analysis and mining are feasible

A gene expression database from the data analyst’s point of view

[Figure: gene expression matrix with axes Samples and Genes, holding gene expression levels, with sample annotations and gene annotations attached]
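The analyst's view on this slide can be sketched as a small data structure. This is a minimal illustration, not the ArrayExpress schema: the class name, the field names, the choice of genes as rows and samples as columns, and all values are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ExpressionMatrix:
    """Expression matrix with annotations on both axes (illustrative only)."""
    genes: list[str]                  # row labels (assumed orientation)
    samples: list[str]                # column labels (assumed orientation)
    levels: list[list[float]]         # levels[i][j]: gene i measured in sample j
    gene_annotations: dict[str, dict[str, str]] = field(default_factory=dict)
    sample_annotations: dict[str, dict[str, str]] = field(default_factory=dict)

    def level(self, gene: str, sample: str) -> float:
        """Look up one expression level by gene and sample name."""
        return self.levels[self.genes.index(gene)][self.samples.index(sample)]

    def samples_where(self, key: str, value: str) -> list[str]:
        """Return the samples whose annotation `key` equals `value`."""
        return [s for s in self.samples
                if self.sample_annotations.get(s, {}).get(key) == value]


# Hypothetical placeholder data, not taken from any real experiment
matrix = ExpressionMatrix(
    genes=["geneA", "geneB"],
    samples=["s1", "s2", "s3"],
    levels=[[1.0, 2.5, 0.3], [0.8, 0.9, 4.1]],
    gene_annotations={"geneA": {"database": "EMBL"}},
    sample_annotations={
        "s1": {"organism part": "thymus"},
        "s2": {"organism part": "liver"},
        "s3": {"organism part": "thymus"},
    },
)
```

Selecting columns by annotation, for example `matrix.samples_where("organism part", "thymus")`, only works when the annotations are stored with the matrix and use consistent terms.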

Gene Annotation

Can be given by links to gene sequence databases; GO can be used on the analysis side (function, process, cell compartment)

MIAME is flexible: it allows many kinds of sequence identifiers, or even the sequence itself.

In some cases it’s more useful to include a real sequence than an inaccurate ID

In the end we will need a mapping from a gene list to all the spots on all arrays; this is non-trivial given the problems with names
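A minimal sketch of the gene-list-to-spots mapping, assuming each array design records one reported identifier per spot and that a synonym table is available to resolve name variants. The array IDs, spot IDs, gene names and the synonym table are all hypothetical.

```python
from collections import defaultdict

# Hypothetical array designs: array id -> {spot id -> identifier reported for the spot}
array_spots = {
    "array1": {"spot_001": "GENE1", "spot_002": "GENE2"},
    "array2": {"spot_A1": "gene1-alias", "spot_A2": "GENE3"},
}

# Hypothetical synonym table resolving reported names to one canonical name
synonyms = {"gene1-alias": "GENE1"}


def spots_for_genes(gene_list, array_spots, synonyms):
    """Map each requested gene to every (array, spot) pair that measures it."""
    wanted = set(gene_list)
    mapping = defaultdict(list)
    for array_id, spots in array_spots.items():
        for spot_id, reported in spots.items():
            # Fall back to the reported name when no synonym is known
            canonical = synonyms.get(reported, reported)
            if canonical in wanted:
                mapping[canonical].append((array_id, spot_id))
    return dict(mapping)


result = spots_for_genes(["GENE1", "GENE3"], array_spots, synonyms)
```

Without the synonym entry, the spot on `array2` would be missed, which is the naming problem the slide refers to.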

Sample annotation

Gene expression data only have meaning in the context of detailed sample descriptions

If the data is going to be interpreted by independent parties, sample information has to be searchable and in the database

Controlled vocabularies and ontologies (species, cell types, compound nomenclature, treatments, etc.) are needed for unambiguous sample description

Standardisation of microarray data and annotations - MGED group

The goal of the group is to facilitate the adoption of standards for DNA-array experiment annotation and data representation, as well as the introduction of standard experimental controls and data normalisation methods.

Includes most of the world's largest microarray laboratories and companies (TIGR, Affymetrix, Stanford, Sanger, Agilent, etc.)

www.mged.org

Sample annotation - what can be done?

Build an ontology for gene expression data (MGED)

Use existing ontologies and link them in

Incorporate the ontology into the database

Develop internal editing tools for the ontology

Develop browser or other interface for the ontology and link to LIMS

Some use of free text descriptions is unavoidable (curation workload)

Use case scenarios

Return a summary of all experiments that use a specified type of biosource (primary source).

Group the experiments according to treatment.

Return a summary of all experiments done examining effects of a specified treatment.

Group the experiments according to biosource.

Return a summary of all experiments measuring the expression of a specified gene.

Indicate when experiments confirm results, provide new information, or conflict.
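The first three use cases are filter-and-group queries over annotated experiments. The sketch below assumes experiments are available as plain records; the field names (`biosource`, `treatment`, `genes`) and the records themselves are hypothetical, and it does not attempt the last use case (detecting confirming or conflicting results).

```python
from collections import defaultdict

# Hypothetical experiment summaries; field names and values are illustrative
experiments = [
    {"id": "E1", "biosource": "thymus", "treatment": "dexamethasone", "genes": {"g1", "g2"}},
    {"id": "E2", "biosource": "thymus", "treatment": "heat shock", "genes": {"g2"}},
    {"id": "E3", "biosource": "liver", "treatment": "dexamethasone", "genes": {"g1"}},
]


def summarise(experiments, filter_key, filter_value, group_key):
    """Select experiments where filter_key == filter_value, grouped by group_key."""
    groups = defaultdict(list)
    for exp in experiments:
        if exp[filter_key] == filter_value:
            groups[exp[group_key]].append(exp["id"])
    return dict(groups)


def measuring_gene(experiments, gene):
    """Return ids of experiments that measured the given gene."""
    return [exp["id"] for exp in experiments if gene in exp["genes"]]


# Experiments on one biosource, grouped by treatment
by_treatment = summarise(experiments, "biosource", "thymus", "treatment")
# Experiments with one treatment, grouped by biosource
by_biosource = summarise(experiments, "treatment", "dexamethasone", "biosource")
# Experiments measuring one gene
with_g1 = measuring_gene(experiments, "g1")
```

The exact-match comparison only returns complete results when biosources and treatments are annotated with controlled terms rather than free text.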

MIAME – Minimum Information About a Microarray Experiment

6 parts of a microarray experiment

[Figure: diagram with boxes labelled Experiment, Array, Sample, Hybridisation, Data and Normalisation, plus Publication and external links for Gene (e.g., EMBL) and Source (e.g., Taxonomy)]

www.mged.org

MGED Biomaterial (sample) Ontology

Under construction by Chris Stoeckert – Using OILed (though other tools exist)

Motivated by MIAME and coordinated with the database model

We will extend classes, provide constraints, define terms, provide new terms and develop CVs for submissions (EBI)

Part of the MGED biomaterial ontology

class Age
documentation: The time period elapsed since an identifiable point in the life cycle of an organism. If a developmental stage is specified, the identifiable point would be the beginning of that stage. Otherwise the identifiable point must be specified such as planting.
type: primitive
superclasses: BiosourceProperty
constraints:
    slot-constraint has_measurement has-value Measurement
    slot-constraint initial_time_point has-value one-of (planting beginning_of_stage)
used in slots: initial_time_point
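To show how such constraints could be enforced in software, here is a minimal Python sketch of the Age class. It is not the MGED ontology or any tool's output: the slot names and the allowed values for initial_time_point come from the slide, while the fields of Measurement are hypothetical because the slide does not define that class.

```python
from dataclasses import dataclass

# Allowed values from the one-of constraint on initial_time_point
INITIAL_TIME_POINTS = ("planting", "beginning_of_stage")


@dataclass(frozen=True)
class Measurement:
    value: float  # hypothetical field; Measurement is not defined on the slide
    unit: str     # hypothetical field


@dataclass(frozen=True)
class Age:
    """Time period elapsed since an identifiable point in the life cycle."""
    has_measurement: Measurement
    initial_time_point: str

    def __post_init__(self):
        # Reject values outside the controlled list
        if self.initial_time_point not in INITIAL_TIME_POINTS:
            raise ValueError(
                f"initial_time_point must be one of {INITIAL_TIME_POINTS}, "
                f"got {self.initial_time_point!r}"
            )


age = Age(Measurement(3.0, "weeks"), "planting")
```

Constructing `Age(Measurement(3.0, "weeks"), "birth")` raises `ValueError`, because "birth" is not in the list of allowed values shown here.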

MIAME Section on Sample Source and Treatment

organism (NCBI taxonomy)
cell source - provider
cell type (if derived from primary source(s))
sex
age
growth conditions
development stage
organism part (tissue)
animal/plant strain or line
genetic variation (e.g., gene knockout, transgenic variation)
individual
individual genetic characteristics (e.g., disease alleles, polymorphisms)
disease state or normal
target cell type
cell line and source (if applicable)
in vivo treatments (organism or individual treatments)
in vitro treatments (cell culture conditions)
treatment type (e.g., small molecule, heat shock, cold shock, food deprivation)
compound
is additional clinical information available (link)
separation technique (e.g., none, trimming, microdissection, FACS)
laboratory protocol for sample treatment……

Examples of usable external ontologies

NCBI taxonomy database

Jackson Lab mouse strains and genes

Edinburgh mouse atlas anatomy

HUGO nomenclature for Human genes

Chemical and compound Ontologies - Merck index

TAIR

Flybase

GO

Excerpts from a Sample Description

courtesy of M. Hoffman, S. Schmidtke, Lion BioSciences

Organism: Mus musculus [ NCBI taxonomy browser ]
Cell source: in-house bred mice (contact: [email protected])
Sex: female [ MGED ]
Age: 3 - 4 weeks after birth [ MGED ]
Growth conditions: normal
    controlled environment
    20 - 22 °C average temperature
    housed in cages according to EU legislation
    specified pathogen free conditions (SPF)
    14 hours light cycle
    10 hours dark cycle
[Developmental stage]: stage 28 (juvenile (young) mice) [ GXD "Mouse Anatomical Dictionary" ]
Organism part: thymus [ GXD "Mouse Anatomical Dictionary" ]
Strain or line: C57BL/6 [International Committee on Standardized Genetic Nomenclature for Mice]
Genetic Variation: Inbr (J) 150. Origin: substrains 6 and 10 were separated prior to 1937. This substrain is now probably the most widely used of all inbred strains. Substrain 6 and 10 differ at the H9, Igh2 and Lv loci. Maint. by J,N, Ola. [International Committee on Standardized Genetic Nomenclature for Mice ]
Treatment: in vivo [MGED] [intraperitoneal] injection of [Dexamethasone] into mice, 10 microgram per 25 g bodyweight of the mouse
Compound: drug [MGED] synthetic [glucocorticoid] [dexamethasone], dissolved in PBS
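In the excerpt, each value is paired with the vocabulary it was taken from (shown in square brackets). A minimal sketch of that pairing follows; the values and sources are copied from the excerpt, while the `Term` type, the dictionary layout and the helper function are assumptions made for illustration.

```python
from typing import NamedTuple, Optional


class Term(NamedTuple):
    value: str
    source: Optional[str] = None  # vocabulary or ontology the value comes from


# Values and sources copied from the excerpt; the layout is an assumption
sample = {
    "Organism": Term("Mus musculus", "NCBI taxonomy browser"),
    "Sex": Term("female", "MGED"),
    "Age": Term("3 - 4 weeks after birth", "MGED"),
    "Organism part": Term("thymus", 'GXD "Mouse Anatomical Dictionary"'),
    "Strain or line": Term(
        "C57BL/6",
        "International Committee on Standardized Genetic Nomenclature for Mice",
    ),
    "Growth conditions": Term("normal"),  # no source given in the excerpt
}


def free_text_fields(sample):
    """Return fields with no vocabulary source, i.e. candidates for curation."""
    return [name for name, term in sample.items() if term.source is None]


uncontrolled = free_text_fields(sample)
```

Keeping the source next to each value lets a curator or a query tool tell controlled terms apart from free text.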

Introduction to the database

ArrayExpress is implemented in Oracle

The submission tool is a different implementation of the ArrayExpress model in MySQL

Faster, easier to update

Short term solution to the problem of data submission

ArrayExpress conceptual model

[Figure: diagram with boxes labelled Experiment, Array, Sample, Hybridisation, Data and Normalisation, plus Publication and external links for Gene (e.g., EMBL) and Source (e.g., Taxonomy)]

[Figure: ArrayExpress system overview. Labels: ArrayExpress Database (MAGE-OM Model); Curation Database; Submission tool; User Login; Array Submission; Protocol Sub.; Experiment submission; Browse Arrays; Browse Protocols; Large Scale Submissions (MAGE-ML format); Submitter LIMS; Query Interface for Public Data; Analysis Tools (Expression Profiler); Data File Export; External Applications; External Databases, EMBL, Ontology Resources… etc]

MIAMExpress

Based on MIAME concepts and questionnaire

Experiment, Array, Protocol submissions

CV/Ontology wherever possible

Future versions: organism-specific pages and related linked ontologies

Allow user driven ontology development

Will be developed according to user needs

Will also need to be an update tool

Design Considerations

Speed and ease of use, scalability

Need to browse existing protocols and array designs in ArrayExpress

Requirement for curator control over submissions

Submissions tracking

Future use as a LIMS

Flexibility

Features of MIAMExpress

Creates a user login account instead of on-the-fly submissions so sessions can be saved

Allows existing protocols to be copied and saved and linked to more than one hyb/expt

Forms the basis of a LIMS using the ArrayExpress model

Will be available as a stand alone tool for local installation

Is open source and free

Will be supported by curation staff and developers

Expected Users

Users with limited local bioinformatics support

Users of bought in arrays without LIMS

Small scale users with self made arrays who will need to provide a description

Descriptions of commercial arrays will be provided

Acknowledgments

Whole Microarray Informatics Team, EBI, esp. Alvis Brazma, Mohammad Shojatalab and Ugis Sarkans

Industry Support team, EBI

MGED steering committee

MIAME working group

Chris Stoeckert, U. Penn. and members of MGED

Demo Version of MIAMExpress

Coming soon to www.ebi.ac.uk/microarray

Beta tester recruitment