Post on 03-Jan-2016
Content, Format, and Standards in Genomics Scale Data
The ILSI – EBI Collaboration
Wm. B. Mattes, PhD, DABT
Outline
Why do we need a database for toxicogenomics?
How is it envisioned that this will be developed?
What are the issues for such a database?
Who is involved in such development?
The ILSI – EBI Collaboration
Challenge of Genomics
“It’s the informatics, period!”
And it’s awfully tempting to take shortcuts!
Experiment
Biological Explanation
INFORMATICS
?
Why do we need a database?
Volume of data
• Traditional endpoints per animal
- <20 histopathology observations
- <10 gross measurements (e.g. weights, food)
- <25 serum measurements
- <10 urinalysis measurements
• Genomic endpoints per animal
- 5,000-10,000 transcripts !!!
Why do we need a database? (cont)
Influence of technology details
Influence of probe sequence
Many genes are “alternatively spliced” – such events may not be detected unambiguously by a microarray
For cDNA arrays, probes may hybridize to more than one sequence
A database that captures probe sequence is required to resolve discrepancies through automated bioinformatics
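A minimal sketch of the kind of automated check that stored probe sequences make possible. It assumes exact substring matching stands in for a real alignment step, and every identifier and sequence is a hypothetical toy value, not data from the ILSI studies.

```python
from collections import defaultdict


def find_ambiguous_probes(probes, transcripts):
    """Return probes whose sequence occurs in more than one transcript.

    Exact substring matching is an assumption made for illustration;
    a production check would use sequence alignment instead.
    """
    hits = defaultdict(list)
    for probe_id, probe_seq in probes.items():
        for transcript_id, transcript_seq in transcripts.items():
            if probe_seq in transcript_seq:
                hits[probe_id].append(transcript_id)
    # Keep only the probes that match more than one transcript.
    return {pid: sorted(tids) for pid, tids in hits.items() if len(tids) > 1}


# Hypothetical toy data: two splice variants share the probe_1 sequence.
probes = {"probe_1": "ACGTAC", "probe_2": "TTGGCC"}
transcripts = {
    "gene_A_v1": "GGACGTACTT",
    "gene_A_v2": "CCACGTACAA",
    "gene_B": "AATTGGCCAA",
}

ambiguous = find_ambiguous_probes(probes, transcripts)
print(ambiguous)
```

Without the probe sequence in the database, a discrepancy between two platforms for "gene A" could not be traced back to a probe that reports on both variants.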
How are databases being developed?
Microarray Gene Expression Data Society - MGED Society
MIAME - Minimum Information About a Microarray Experiment
• “the minimum information that should be reported about a microarray experiment to enable its unambiguous interpretation and reproduction”
Essentially, what should go into the database
How are databases being developed?
MIAME – Basic Areas
• Experiment Design
• Samples used, extract preparation and labeling
• Hybridization procedures and parameters
• Measurement data and specifications
• Array Design
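The basic areas above can be read as a completeness checklist for a submission. The sketch below is one way to express that; the field names are hypothetical labels chosen for this example and are not taken from the MIAME document or any MGED software.

```python
# Hypothetical field names, one per MIAME basic area listed above.
MIAME_AREAS = (
    "experiment_design",
    "samples",
    "hybridization",
    "measurements",
    "array_design",
)


def missing_miame_areas(record):
    """List the basic areas that are absent or empty in a submission."""
    return [area for area in MIAME_AREAS if not record.get(area)]


# Hypothetical submission that lacks hybridization details.
submission = {
    "experiment_design": "dose response, 3 doses, 2 time points",
    "samples": "rat liver, total RNA, Cy3/Cy5 labeling",
    "hybridization": "",
    "measurements": "raw and normalized intensity files",
    "array_design": "array design file reference",
}

missing = missing_miame_areas(submission)
print(missing)
```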
How are databases being developed? (cont)
MGED Society
MAGE
• Programming conventions and data structures to communicate Microarray Gene Expression data
- MAGE-OM Object Model
- MAGE-ML Markup Language
Essentially, how the data is exchanged / how the database is constructed
How are databases being developed? (cont)
MGED Society
Ontology working group
• Ontologies provide a vocabulary for representing and communicating knowledge about a topic, allowing interpretation and use by computers
• MGED Ontology will provide standard terms for the annotation of microarray experiments, allowing:
- structured queries
- unambiguous descriptions of experiments
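To illustrate why standard terms enable structured queries, here is a small sketch that maps free-text annotations onto a controlled vocabulary before filtering. The synonym table and experiment records are hypothetical and are not actual MGED Ontology terms.

```python
# Hypothetical synonym table: free-text annotation -> controlled term.
SYNONYMS = {
    "liver": "liver",
    "hepatic tissue": "liver",
    "kidney": "kidney",
    "renal tissue": "kidney",
}


def standardize(term):
    """Map a free-text annotation to its controlled term, or fail loudly."""
    key = term.strip().lower()
    if key not in SYNONYMS:
        raise ValueError(f"no controlled term for {term!r}")
    return SYNONYMS[key]


def experiments_for_organ(experiments, organ):
    """Structured query: select experiments by standardized organ term."""
    wanted = standardize(organ)
    return [e["id"] for e in experiments if standardize(e["organ"]) == wanted]


# Hypothetical annotations as they might arrive from different sites.
experiments = [
    {"id": "exp_1", "organ": "Liver"},
    {"id": "exp_2", "organ": "hepatic tissue"},
    {"id": "exp_3", "organ": "Kidney"},
]

liver_experiments = experiments_for_organ(experiments, "liver")
print(liver_experiments)
```

A plain text search for "liver" would miss exp_2; the query succeeds only because both annotations resolve to the same term.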
How are databases being developed? (cont)
MGED Society
Data Transformation and Normalization Working Group
• Standards for recording how microarray data are transformed and normalized.
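As an illustration of what recording a transformation can look like, the sketch below applies a log2 transform followed by median centering and returns a history of the steps alongside the values. Both the choice of method and the history format are assumptions made for this example, not the working group's standard.

```python
import math
from statistics import median


def log2_median_center(intensities):
    """Log2-transform intensities, subtract the median, and record each step."""
    logged = [math.log2(v) for v in intensities]
    center = median(logged)
    normalized = [v - center for v in logged]
    # The history travels with the data so the processing can be reproduced.
    history = [
        {"step": "log2_transform"},
        {"step": "median_center", "median_subtracted": center},
    ]
    return normalized, history


# Hypothetical raw intensities for three spots.
values, history = log2_median_center([2.0, 4.0, 8.0])
print(values)
print(history)
```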
What are the issues for a toxicogenomics database?
Scope of the ILSI effort:
• Genotoxicity Group
- 10 array platforms
- 11 compounds
- >2 time points, up to 10 doses / compound
• Nephrotoxicity Group
- 6 array platforms
- 3 compounds, 260 animals
What are the issues for a toxicogenomics database?
Scope of the ILSI effort:
• Hepatotoxicity Group
- 8 array platforms
- 2 compounds, 144 animals
- 2 in-life studies / compound
• ALL Groups
- Analysis of each sample at multiple sites
What are the issues for toxicogenomics databases? (cont)
Traditional toxicology endpoints are not currently covered by MAGE, MIAME, or the MGED Ontologies!
• Organ weights
• Clinical pathology
• Histopathology
• Etc.
What are the issues for toxicogenomics databases?
Traditional toxicology endpoints are not standardized in nomenclature
• Clinical pathology/chemistry
- AACC
- IUPAC
• Histopathology
- STP
- WHO/IARC/RITA
- NACAD
- SNOMED
- NTP, TDMS Database Pathology Code Table
Who is involved in database development
Private Companies
• Genelogic, Iconix, Curagen
MSU - dbZach
NIEHS - CEBS
NCTR - ArrayTrack
ILSI - EBI
ILSI-HESI and EBI collaboration
Establishment of database for toxicogenomics data
Capture, store and analyse gene expression data produced from many different toxicogenomic experiments, conducted in several different laboratories worldwide by the ILSI-HESI members
Interrogate the gene array data integrating information from genomic, experimental and toxicological domains
Gain knowledge of possible links between gene expression changes and toxicological endpoints
ILSI-HESI and EBI collaboration
Aims of the database and tools
• Provide a way to integrate the different domains
• Control the annotation to achieve data harmonization
• Centralize the information to ease data access and data sharing
• Improve array annotations as the genome assemblies are released
• ALLOW data comparison
ILSI-HESI and EBI collaboration
Main challenge
• Get internally consistent data to allow comparability among the experiments and run complex queries across and within domains
• Note: experiments were conducted at ~40 different sites, using different array platforms and terminologies, measuring parameters with different units and storing information in different formats!
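One small piece of that harmonization problem, sketched under stated assumptions: organ weights reported in different units by different sites are converted to grams before comparison. The record layout, site names, and values are hypothetical.

```python
# Conversion factors from each accepted unit to grams.
TO_GRAMS = {"mg": 0.001, "g": 1.0, "kg": 1000.0}


def harmonize_weight(value, unit):
    """Return a weight in grams, rejecting units that are not recognised."""
    try:
        factor = TO_GRAMS[unit.strip().lower()]
    except KeyError:
        raise ValueError(f"unrecognised unit: {unit!r}") from None
    return value * factor


# Hypothetical reports of the same endpoint from two sites.
site_reports = [
    {"site": "site_1", "liver_weight": 12.5, "unit": "g"},
    {"site": "site_2", "liver_weight": 11800, "unit": "mg"},
]

harmonized = [
    {
        "site": r["site"],
        "liver_weight_g": harmonize_weight(r["liver_weight"], r["unit"]),
    }
    for r in site_reports
]
print(harmonized)
```

Rejecting unknown units, rather than passing values through unchanged, keeps an unconverted number from silently entering a cross-site comparison.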
ILSI-HESI and EBI collaboration
‘Simple’ question:
• “Does expression of gene X go up after treatment with compound Y with biological endpoint Z in experiments from ILSI-HESI members A and B?”
‘Not simple’ question:
• “Which are the most reproducible gene expression changes (and what is the quantitative measure of this reproducibility) for all experiments on the rat arrays with biological endpoint X, which functional categories do these genes belong to, and which are their human homologues?”
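The ‘simple’ question can be sketched as a filter over harmonized records. The record fields, the use of a mean log2 ratio above zero as the meaning of "goes up", and all values below are assumptions made for illustration; they do not describe the ArrayExpress schema or query interface.

```python
from statistics import mean


def expression_goes_up(records, gene, compound, endpoint, members):
    """For each member, report whether the mean log2 ratio is above zero.

    Returns None for a member that has no matching measurements.
    """
    answer = {}
    for member in members:
        ratios = [
            r["log2_ratio"]
            for r in records
            if r["gene"] == gene
            and r["compound"] == compound
            and r["endpoint"] == endpoint
            and r["member"] == member
        ]
        answer[member] = (mean(ratios) > 0) if ratios else None
    return answer


# Hypothetical harmonized records (treated vs. control log2 ratios).
records = [
    {"gene": "X", "compound": "Y", "endpoint": "Z", "member": "A", "log2_ratio": 1.2},
    {"gene": "X", "compound": "Y", "endpoint": "Z", "member": "A", "log2_ratio": 0.8},
    {"gene": "X", "compound": "Y", "endpoint": "Z", "member": "B", "log2_ratio": -0.5},
    {"gene": "W", "compound": "Y", "endpoint": "Z", "member": "B", "log2_ratio": 2.0},
]

result = expression_goes_up(records, "X", "Y", "Z", ["A", "B", "C"])
print(result)
```

Even this query works only if gene, compound, and endpoint names have already been harmonized across members, which is the main challenge stated earlier.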
An international effort aiming to
• Share expertise
• Encourage harmonization
• Promote standardization initiatives
A call for community participation!
NIEHS-NCT
EMBL-EBI
Toxico-genomics ILSI-HESI
MIAME/Tox
MIAME/Tox objectives
Standard contextual information
• Establish worldwide scientific consensus on the minimal information descriptors for array-based toxicogenomics experiments
Data harmonization
• Encourage use of controlled vocabularies for the toxicological assessments
Data integration and data sharing
• Link data within a study
• Link several studies from one institution
• Exchange datasets among institutions
Data storage
• Facilitate development of MIAME/Tox compliant data management software and databases
- ArrayExpress @ EBI and CEBS @ NIEHS-NCT
MIAME/Tox document
Promote standard contextual information
• Defining the core common to most experiments
- Minimum/sufficient information
- Structured information
Promote data harmonization, data capture and communication
• MIAME/Tox is based on MIAME
Focus on toxicological domain
• Sample treatment and conventional toxicology information
- Clinical pathology, pathology, histopathology...
MIAME/Tox document
Available at the MGED Society and ILSI-HESI web sites
• Circulate for consensus
- Toxicogenomics, pharmacogenomics and ecotoxicogenomics communities
- Regulatory bodies
- MGED Meeting (AAAS, Denver, Feb 2003; MGED6, France, Sept 2003)
- Toxicology societies (SOT Meeting, Salt Lake City, March 2003)
• Review and publish
Work closely with the MGED working groups
• Ontology working group
- Identify controlled vocabularies for toxicological metadata
Data Input As a Key Step
1. Capture data in a standard manner: Tox-MIAMExpress
2. Store information domains in database: ArrayExpress
3. Compare/query across and within domains
Tox-MIAMExpress
Array designs
• A set of procedures for formatting the array design information into a standard referencing format (ADF)
• A set of procedures to re-annotate or update the array designs via a link to another database at EBI (EnsMart)
Tox-MIAMExpress
Experiment
• Experiment design, quality controls, publications
• Sample source and treatment
• Conventional toxicology test data
• Microarray hybridization data
ILSI-HESI and EBI collaboration
Status:
• Interface and database infrastructure developed
• Data input ongoing