
The European Bioinformatics Institute

MIAME and Ontologies for Sample Description

Helen Parkinson

Microarray Informatics Team

European Bioinformatics Institute

EMBO Course, October 2001


Talk Structure

ArrayExpress - a public database for microarray data and integration of ontologies

Ontologies for gene expression data

Submission and annotation tool


Problems of microarray data analysis

Size of the datasets

Different platforms - nylon, glass

Different technologies on platforms - oligo/spotted

Referencing external databases which are not stable

Sample annotation

Array annotation

Need for LIMS systems and the need for bioinformaticians


General MIAME principles

Recorded info should be sufficient to interpret and replicate the experiment

Information should be structured so that querying and automated data analysis and mining are feasible


A gene expression database from the data analyst’s point of view

[Figure: gene expression matrix. Axes: Samples, Genes. Cells: gene expression levels. Attached to the axes: sample annotations, gene annotations.]
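A minimal sketch of the analyst's view described on this slide: a matrix of expression levels with annotations attached to both axes. The class name, the method names, the row/column orientation (genes as rows, samples as columns) and all data values are assumptions made for illustration; they are not taken from ArrayExpress.

```python
from dataclasses import dataclass, field


@dataclass
class ExpressionMatrix:
    """Expression levels indexed by (gene, sample), with annotations on both axes."""
    genes: list[str]
    samples: list[str]
    levels: list[list[float]]  # levels[i][j]: gene i measured in sample j (assumed layout)
    gene_annotations: dict[str, dict[str, str]] = field(default_factory=dict)
    sample_annotations: dict[str, dict[str, str]] = field(default_factory=dict)

    def level(self, gene: str, sample: str) -> float:
        """Look up one expression level by gene and sample name."""
        return self.levels[self.genes.index(gene)][self.samples.index(sample)]

    def samples_where(self, key: str, value: str) -> list[str]:
        """Samples whose annotation `key` equals `value`."""
        return [s for s in self.samples
                if self.sample_annotations.get(s, {}).get(key) == value]


# Made-up example data.
matrix = ExpressionMatrix(
    genes=["geneA", "geneB"],
    samples=["s1", "s2", "s3"],
    levels=[[1.0, 2.5, 0.4], [0.9, 0.1, 3.2]],
    gene_annotations={"geneA": {"process": "example process"}},
    sample_annotations={
        "s1": {"organism part": "thymus"},
        "s2": {"organism part": "liver"},
        "s3": {"organism part": "thymus"},
    },
)
thymus_samples = matrix.samples_where("organism part", "thymus")
```

Selecting samples by annotation, as in `samples_where`, only works when the annotation values are spelled consistently, which is the motivation for controlled vocabularies.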


Gene Annotation

Can be given by links to gene sequence databases; GO can be used on the analysis side (function, process, cell compartment)

MIAME is flexible: it allows many kinds of sequence identifiers, or even the sequence itself.

In some cases it is more useful to include a real sequence than an inaccurate ID

In the end we will need a mapping from a gene list to all the spots on all arrays; this is non-trivial given the problems with names
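The mapping from a gene list to spots can be sketched as follows. This is only an illustration under stated assumptions: the spot records, the alias table and the function name `spots_for_genes` are hypothetical, and real name resolution would be far messier than a dictionary lookup.

```python
from collections import defaultdict

# Hypothetical spot records: (array_id, spot_id, identifier as submitted).
SPOTS = [
    ("array1", "1.1", "GENE1"),
    ("array1", "1.2", "gene-1"),        # same gene under an alias
    ("array2", "7.3", "GENE2"),
    ("array2", "7.4", "unknown_clone"),  # no usable name
]

# Hypothetical alias table mapping submitted names to one canonical gene name.
ALIASES = {"GENE1": "GENE1", "gene-1": "GENE1", "GENE2": "GENE2"}


def spots_for_genes(gene_list, spots, aliases):
    """Return ({gene: [(array_id, spot_id), ...]}, unresolved spots)."""
    wanted = set(gene_list)
    hits = defaultdict(list)
    unresolved = []
    for array_id, spot_id, name in spots:
        canonical = aliases.get(name)
        if canonical is None:
            # The name problem: this spot cannot be tied to any gene.
            unresolved.append((array_id, spot_id, name))
        elif canonical in wanted:
            hits[canonical].append((array_id, spot_id))
    return dict(hits), unresolved


hits, unresolved = spots_for_genes(["GENE1"], SPOTS, ALIASES)
```

Returning the unresolved spots alongside the hits keeps the naming problem visible instead of silently dropping spots.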


Sample annotation

Gene expression data only have meaning in the context of detailed sample descriptions

If the data is going to be interpreted by independent parties, sample information has to be searchable and in the database

Controlled vocabularies and ontologies (species, cell types, compound nomenclature, treatments, etc) are needed for unambiguous sample description


Standardisation of microarray data and annotations - MGED group

The goal of the group is to facilitate the adoption of standards for DNA-array experiment annotation and data representation, as well as the introduction of standard experimental controls and data normalisation methods. It includes most of the world's largest microarray laboratories and companies (TIGR, Affymetrix, Stanford, Sanger, Agilent, etc.)

www.mged.org


Sample annotation - what can be done?

Build an ontology for gene expression data (MGED)

Use existing ontologies and link them in

Incorporate the ontology into the database

Develop internal editing tools for the ontology

Develop browser or other interface for the ontology and link to LIMS

Some use of free text descriptions is unavoidable (curation workload)


Use case scenarios

Return a summary of all experiments that use a specified type of biosource (primary source).

Group the experiments according to treatment.

Return a summary of all experiments done examining effects of a specified treatment

Group the experiments according to biosource.

Return a summary of all experiments measuring the expression of a specified gene.

Indicate when experiments confirm results, provide new information, or conflict.
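The first three use cases amount to filter-and-group queries, sketched below. The experiment records, their field names and both function names are hypothetical; the sketch does not attempt the last step of judging whether experiments confirm or conflict with each other.

```python
from collections import defaultdict

# Hypothetical experiment summaries; field names are illustrative only.
EXPERIMENTS = [
    {"id": "E1", "biosource": "thymus", "treatment": "dexamethasone",
     "genes": {"GENE1", "GENE2"}},
    {"id": "E2", "biosource": "thymus", "treatment": "heat shock",
     "genes": {"GENE2"}},
    {"id": "E3", "biosource": "liver", "treatment": "dexamethasone",
     "genes": {"GENE1"}},
]


def summarise(experiments, filter_key, filter_value, group_key):
    """Select experiments where filter_key == filter_value, grouped by group_key."""
    groups = defaultdict(list)
    for exp in experiments:
        if exp[filter_key] == filter_value:
            groups[exp[group_key]].append(exp["id"])
    return dict(groups)


def measuring_gene(experiments, gene):
    """Ids of experiments that measured the expression of `gene`."""
    return [exp["id"] for exp in experiments if gene in exp["genes"]]


# Experiments on a given biosource, grouped by treatment.
by_treatment = summarise(EXPERIMENTS, "biosource", "thymus", "treatment")
# Experiments on a given treatment, grouped by biosource.
by_biosource = summarise(EXPERIMENTS, "treatment", "dexamethasone", "biosource")
# Experiments measuring a given gene.
gene1_experiments = measuring_gene(EXPERIMENTS, "GENE1")
```

The equality tests in `summarise` are exactly where free-text annotation would fail, since "thymus" and "Thymus gland" would land in different groups.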


MIAME – Minimum Information About a Microarray Experiment

6 parts of a microarray experiment

[Figure: boxes labelled Experiment, Array, Sample, Hybridisation, Data and Normalisation, together with Publication and external links: Gene (e.g., EMBL), Source (e.g., Taxonomy).]

www.mged.org


MGED Biomaterial (sample) Ontology

Under construction by Chris Stoeckert – Using OILed (though other tools exist)

Motivated by MIAME and coordinated with the database model

We will extend classes, provide constraints, define terms, provide new terms and develop controlled vocabularies (CVs) for submissions (EBI)


Part of the MGED biomaterial ontology

class Age
documentation: The time period elapsed since an identifiable point in the life cycle of an organism. If a developmental stage is specified, the identifiable point would be the beginning of that stage. Otherwise the identifiable point must be specified, such as planting.
type: primitive
superclasses: BiosourceProperty
constraints:
    slot-constraint has_measurement has-value Measurement
    slot-constraint initial_time_point has-value one-of (planting beginning_of_stage)
used in slots: initial_time_point
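To show what the two slot constraints on Age would mean for a submitted value, here is a rough Python analogue. It is a simplification and not the OIL semantics: each `has-value` constraint is treated as a required, checked field. The `Measurement` fields `value` and `unit` are assumptions, since the excerpt does not define that class.

```python
from dataclasses import dataclass

# Allowed values taken from the one-of constraint in the excerpt.
ALLOWED_INITIAL_TIME_POINTS = {"planting", "beginning_of_stage"}


@dataclass(frozen=True)
class Measurement:
    value: float  # hypothetical field
    unit: str     # hypothetical field


@dataclass(frozen=True)
class Age:
    has_measurement: Measurement
    initial_time_point: str

    def __post_init__(self):
        # Simplified reading of: has_measurement has-value Measurement
        if not isinstance(self.has_measurement, Measurement):
            raise TypeError("has_measurement must be a Measurement")
        # Simplified reading of: initial_time_point has-value one-of (...)
        if self.initial_time_point not in ALLOWED_INITIAL_TIME_POINTS:
            allowed = sorted(ALLOWED_INITIAL_TIME_POINTS)
            raise ValueError(f"initial_time_point must be one of {allowed}")


age = Age(Measurement(3.5, "weeks"), "beginning_of_stage")

try:
    Age(Measurement(3.5, "weeks"), "unspecified")
    rejected = False
except ValueError:
    rejected = True
```

A submission form backed by such a check could offer the allowed time points as a pick list instead of a free-text box.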


MIAME Section on Sample Source and Treatment

organism (NCBI taxonomy)
cell source - provider
cell type (if derived from primary source(s))
sex
age
growth conditions
development stage
organism part (tissue)
animal/plant strain or line
genetic variation (e.g., gene knockout, transgenic variation)
individual
individual genetic characteristics (e.g., disease alleles, polymorphisms)
disease state or normal
target cell type
cell line and source (if applicable)
in vivo treatments (organism or individual treatments)
in vitro treatments (cell culture conditions)
treatment type (e.g., small molecule, heat shock, cold shock, food deprivation)
compound
is additional clinical information available (link)
separation technique (e.g., none, trimming, microdissection, FACS)
laboratory protocol for sample treatment……
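A field list like this one can drive a simple completeness check on a submitted sample description, sketched below. Only a subset of the field names is used, and the choice of which fields count as required is an assumption for this sketch; the slide does not say which are mandatory. The function name `check_sample` is hypothetical.

```python
# A subset of the field names from the MIAME sample section.
SAMPLE_FIELDS = [
    "organism", "cell source", "cell type", "sex", "age",
    "growth conditions", "development stage", "organism part",
    "strain or line", "genetic variation", "disease state",
    "treatment type", "compound", "separation technique",
]

# Assumption for illustration only: treat these as required.
REQUIRED = {"organism", "organism part", "sex", "age"}


def check_sample(description):
    """Return (missing required fields, fields not in the checklist)."""
    supplied = set(description)
    missing = sorted(REQUIRED - supplied)
    unknown = sorted(supplied - set(SAMPLE_FIELDS))
    return missing, unknown


missing, unknown = check_sample(
    {"organism": "Mus musculus", "sex": "female", "colour": "black"}
)
```

Reporting unknown fields separately lets a curator decide whether a new field should be added to the checklist or mapped onto an existing one.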


Examples of usable external ontologies

NCBI taxonomy database

Jackson Lab mouse strains and genes

Edinburgh mouse atlas anatomy

HUGO nomenclature for Human genes

Chemical and compound Ontologies - Merck index

TAIR

Flybase

GO


Excerpts from a Sample Description

courtesy of M. Hoffman, S. Schmidtke, Lion BioSciences

Organism: Mus musculus [NCBI taxonomy browser]
Cell source: in-house bred mice (contact: person@somewhere.ac.uk)
Sex: female [MGED]
Age: 3 - 4 weeks after birth [MGED]
Growth conditions: normal
    controlled environment
    20 - 22 °C average temperature
    housed in cages according to EU legislation
    specified pathogen free conditions (SPF)
    14 hours light cycle
    10 hours dark cycle
[Developmental stage]: stage 28 (juvenile (young) mice) [GXD "Mouse Anatomical Dictionary"]
Organism part: thymus [GXD "Mouse Anatomical Dictionary"]
Strain or line: C57BL/6 [International Committee on Standardized Genetic Nomenclature for Mice]
Genetic Variation: Inbr (J) 150. Origin: substrains 6 and 10 were separated prior to 1937. This substrain is now probably the most widely used of all inbred strains. Substrain 6 and 10 differ at the H9, Igh2 and Lv loci. Maint. by J, N, Ola. [International Committee on Standardized Genetic Nomenclature for Mice]
Treatment: in vivo [MGED] [intraperitoneal] injection of [Dexamethasone] into mice, 10 microgram per 25 g bodyweight of the mouse
Compound: drug [MGED] synthetic [glucocorticoid] [dexamethasone], dissolved in PBS
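The bracketed names in this excerpt record which vocabulary or ontology each value comes from. One way to hold that in a record is sketched below; the class and function names are hypothetical, and only a few fields from the excerpt are reproduced. A value with no source is treated here as free text that a curator may need to look at.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AnnotatedValue:
    name: str                     # field name, e.g. "Organism"
    value: str                    # the value as submitted
    source: Optional[str] = None  # vocabulary or ontology; None means free text


# A few fields from the excerpt, with their bracketed sources.
SAMPLE = [
    AnnotatedValue("Organism", "Mus musculus", "NCBI taxonomy"),
    AnnotatedValue("Cell source", "in-house bred mice"),
    AnnotatedValue("Sex", "female", "MGED"),
    AnnotatedValue("Organism part", "thymus", 'GXD "Mouse Anatomical Dictionary"'),
    AnnotatedValue("Growth conditions", "normal"),
]


def free_text_fields(sample):
    """Names of fields whose value is not tied to any vocabulary."""
    return [a.name for a in sample if a.source is None]


needs_curation = free_text_fields(SAMPLE)
```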


Introduction to the database

ArrayExpress is implemented in Oracle

The submission tool is a different implementation of the ArrayExpress model in MySQL

Faster, easier to update

Short term solution to the problem of data submission


ArrayExpress conceptual model

[Figure: boxes labelled Experiment, Hybridisation, Array, Sample, Data and Normalisation, together with Publication and external links: Source (e.g., Taxonomy), Gene (e.g., EMBL).]


[Figure: system overview. Labels: ArrayExpress Database (MAGE-OM Model); Curation Database; Submission tool (User Login, Array Submission, Protocol Submission, Experiment Submission); Browse Arrays; Browse Protocols; Query Interface for Public Data; Analysis Tools (Expression Profiler); Large Scale Submissions (MAGE-ML format); Submitter LIMS; Data File Export; External Applications; External Databases (EMBL, Ontology Resources, etc.).]


MIAMExpress

Based on MIAME concepts and questionnaire

Experiment, Array, Protocol submissions

CV/Ontology wherever possible

Future versions: organism specific pages and related linked ontologies

Allow user driven ontology development

Will be developed according to user needs

Will also need to be an update tool


Design Considerations

Speed and ease of use, scalability

Need to browse existing protocols and array designs in ArrayExpress

Requirement for curator control over submissions

Submissions tracking

Future use as a LIMS

Flexibility


Features of MIAMExpress

Creates a user login account instead of on-the-fly submissions so sessions can be saved

Allows existing protocols to be copied and saved and linked to more than one hyb/expt

Forms the basis of a LIMS using the ArrayExpress model

Will be available as a stand alone tool for local installation

Is open source and free

Will be supported by curation staff and developers


Expected Users

Users with limited local bioinformatics support

Users of bought in arrays without LIMS

Small scale users with self made arrays who will need to provide a description

Commercial array descriptions will be provided


Acknowledgments

Whole Microarray Informatics Team, EBI, esp. Alvis Brazma, Mohammad Shojatalab and Ugis Sarkans

Industry Support team, EBI

MGED steering committee

MIAME working group

Chris Stoeckert, U. Penn. and members of MGED


Demo Version of MIAMExpress

Coming soon to www.ebi.ac.uk.microarray

Beta tester recruitment
