EMBRACE – BioMart Developments & Future
Syed Haider
Rice Group - EBI
July 2008
EMBRACE
www.embracegrid.info
European Model for Bioinformatics Research and Community Education
Objective:
to integrate the major databases and software tools in bioinformatics
A Collaboration:
- European Bioinformatics Institute (EBI)
- Ontario Institute for Cancer Research (OICR)
BioMart
www.biomart.org
BioMart
A generic data management system with a particular focus on supporting biological research featuring:
- Built-in query optimisation for fast data retrieval
- Data Federation
- Easy to use interfaces and APIs
- Web Services and DAS
In a nutshell
Diagram: source data (MySQL, Oracle, Postgres), DB, Mart
Deploying BioMart
– STEP 1 - Transformation
– STEP 2 - Configuration
1. Transformation
1. Transformation: MartBuilder
2. Configuration
Mart
Mart
Mart
2. Configuration: MartEditor
User Interfaces
Concepts for End Users
1. Dataset
2. Filter
3. Attribute
Examples
- Name, chromosome, description of all rat genes located on chromosome 1, expressed in lungs
- Exon sequences in FASTA format of all mouse genes (ENSMUSG00000042351)
- Upstream sequences of all rat genes up-regulated in brain and associated with a QTL for a neurological disorder
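The three end-user concepts can be read as "which table, which rows, which columns". The sketch below illustrates that reading on a toy in-memory dataset. The gene records, the `tissue` field and the `run_query` helper are all hypothetical and are not part of BioMart.

```python
# Toy rows standing in for a dataset; every value here is made up for illustration.
GENES = [
    {"name": "geneA", "chromosome": "1", "tissue": "lung", "description": "hypothetical gene A"},
    {"name": "geneB", "chromosome": "2", "tissue": "lung", "description": "hypothetical gene B"},
    {"name": "geneC", "chromosome": "1", "tissue": "brain", "description": "hypothetical gene C"},
]


def run_query(dataset, filters, attributes):
    """Return the requested attributes for every row that passes all filters.

    dataset    -- the collection being queried (list of dicts)
    filters    -- restrictions on rows, as {field: required_value}
    attributes -- the fields to report for each matching row
    """
    rows = [row for row in dataset
            if all(row.get(field) == value for field, value in filters.items())]
    return [tuple(row[a] for a in attributes) for row in rows]


# "name, chromosome, description of all genes on chromosome 1, expressed in lungs"
result = run_query(
    GENES,
    filters={"chromosome": "1", "tissue": "lung"},
    attributes=["name", "chromosome", "description"],
)
print(result)
```

Filters narrow the rows and attributes pick the columns, which is the same split the XML queries later in the deck express with `Filter` and `Attribute` elements.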
Web Service Access
<Query>
  <Dataset name="hsapiens_gene_ensembl">
    <Filter name="chromosome_name" value="1"/>
    <Attribute name="ensembl_gene_id"/>
    <Attribute name="ensembl_transcript_id"/>
    <Attribute name="biotype"/>
  </Dataset>
</Query>
wget --post-data 'query=[XML query above]' 'http://www.biomart.org/biomart/martservice'
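The same request could be assembled programmatically. The following is a minimal sketch using only the Python standard library, assuming the service accepts the XML in a form field named `query`, as the wget command suggests. The helper names `build_query` and `post_query` are illustrative, and the URL is the one from the slide and may no longer resolve.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# URL as given on the slide; treat it as historical.
MARTSERVICE_URL = "http://www.biomart.org/biomart/martservice"


def build_query(dataset, filters, attributes):
    """Build the XML query shown on the slide as a string."""
    query = ET.Element("Query")
    ds = ET.SubElement(query, "Dataset", {"name": dataset})
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


def post_query(xml_query, url=MARTSERVICE_URL):
    """POST the query as the 'query' form field and return the response text."""
    data = urllib.parse.urlencode({"query": xml_query}).encode("ascii")
    with urllib.request.urlopen(url, data=data) as response:
        return response.read().decode("utf-8")


xml_query = build_query(
    "hsapiens_gene_ensembl",
    filters={"chromosome_name": "1"},
    attributes=["ensembl_gene_id", "ensembl_transcript_id", "biotype"],
)
print(xml_query)
# post_query(xml_query) would send it; it is left uncalled here because it needs network access.
```

Building the XML with `ElementTree` rather than string concatenation keeps attribute values correctly escaped.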
Web Service Access: XML Free URL
http://biomart.org/biomart/martview?VIRTUALSCHEMANAME=default&ATTRIBUTES=hsapiens_gene_ensembl.default.feature_page.ensembl_gene_id&FILTERS=hsapiens_gene_ensembl.default.filters.chromosome_name."1"
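The XML-free URL follows a regular pattern, so it can be generated from the same dataset, attribute and filter names. The sketch below reproduces the URL from the slide. Reading the `default` segment as an interface name and `feature_page` as an attribute page is an interpretation, and joining several entries with `|` is an assumption, since the slide shows only one attribute and one filter.

```python
MARTVIEW_URL = "http://biomart.org/biomart/martview"  # base URL from the slide


def martview_url(dataset, attributes, filters,
                 schema="default", interface="default", page="feature_page"):
    """Compose an XML-free martview URL.

    Assumption: multiple attributes or filters are separated by '|'.
    """
    attr_part = "|".join(
        f"{dataset}.{interface}.{page}.{name}" for name in attributes
    )
    filter_part = "|".join(
        f'{dataset}.{interface}.filters.{name}."{value}"'
        for name, value in filters.items()
    )
    return (f"{MARTVIEW_URL}?VIRTUALSCHEMANAME={schema}"
            f"&ATTRIBUTES={attr_part}&FILTERS={filter_part}")


url = martview_url(
    "hsapiens_gene_ensembl",
    attributes=["ensembl_gene_id"],
    filters={"chromosome_name": "1"},
)
print(url)
```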
BioMart DAS Access
http://www.YourBioMart.org/biomart/das/DATASET/features?segment=FILTERS
http://www.biomart.org/biomart/das/default__hsapiens_gene_ensembl__ensembl_das_chr/features?segment=1:1,100000
http://www.biomart.org/biomart/das/default__hsapiens_gene_ensembl__ensembl_das_gene/features?segment=ENSG00000197194
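The DAS URLs above share one template: a server base, a DAS source name and a `segment` value. Here is a small sketch of that template, with `das_features_url` as a hypothetical helper name. The source names and segment values are copied from the examples above.

```python
def das_features_url(base_url, das_source, segment):
    """Fill in the .../das/DATASET/features?segment=FILTERS template."""
    return f"{base_url.rstrip('/')}/das/{das_source}/features?segment={segment}"


BASE = "http://www.biomart.org/biomart"

# Region query: chromosome 1, positions 1 to 100000.
chr_url = das_features_url(
    BASE, "default__hsapiens_gene_ensembl__ensembl_das_chr", "1:1,100000")

# Gene query: the segment is a gene identifier.
gene_url = das_features_url(
    BASE, "default__hsapiens_gene_ensembl__ensembl_das_gene", "ENSG00000197194")

print(chr_url)
print(gene_url)
```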
Web based Access
How far has it gone?
Taverna
BiomaRt - BioConductor package
Cytoscape
Galaxy
Template Queries
Learn as you go...
Show URL Request
Show XML Query
Show Perl Script
Future
- Scalability: maintaining large databases and configurations
- Security: username/password based access for clinical and experimental data, etc.
- Multiple and custom GUIs
Future
- Beyond rows and columns
- Framework for Visualisations and Analysis Tools
Visualisation: Gene List Analysis & Clinical Significance
Gene List
Query
Visualisation
Gene list analysis
Clinical Significance
Response to therapy
Map genes onto genome
Map genes onto GO
Map genes onto Pathways
Map Genes onto Genome
Map Genes onto GO
GO
Biological process (32)
Cellular component (18)
Molecular Function (24)
Stem cell maintenance (7)
Positive regulation of developmental process (8)
Leukocyte mediated cytotoxicity (5)
Regulation of cell killing (12)
Developmental process (15)
Cell killing (17)
Map Genes onto Pathways
Reactome
Apoptosis (43)
Intrinsic pathway for apoptosis (26)
Signaling by Wnt (10)
Signaling by TGFβ (23)
Activation of BH3-only proteins (5)
Permeabilization of mitochondria (3)
Release of apoptotic factors from mitochondria (18)
Future - Summary Pages
Annotation for each gene
1. Entrez/Ensembl gene info
2. Gene ontology/pathways
3. Bibliography
4. Transcript & protein info, etc.
Genomic variations for each gene
1. for each cancer studied
Information for each patient
1. Demographics
2. History of cancer
3. Progress & outcome
4. Types of samples available
5. Histopathology of tumor
Submission support
Galaxy
BioMart Team
Arek Kasprzyk (OICR-Toronto)
Syed Haider (Rice Group-EBI)
Acknowledgements
Benoit Ballester (Ensembl) Richard Holland (Ensembl)
Andreas Kahari (Ensembl) Craig Melsopp (Ensembl)
Damian Smedley (Ensembl) Arne Stabenau (Ensembl)
Asif Kibria (EBI) Gulam Patel (EBI)
Stephen Robinson (EBI) Katerina Tzouvara (EBI)
Will Spooner (CSHL) Gudmundur Thorisson (CSHL)
Darin London (Duke University) Don Gilbert (Indiana University)
Steffen Durinck (NCI NIH) Eric Just (Northwestern University)
Paul Donlon (Unilever) Christina Yung (OICR)
Igor Antoshechkin (Caltech)
Credits
References
Thanks.
BioMart Central Portal – queries served
Chart: queries served per month, Sept'07 to Mar'08 (y-axis from 0 to 1,600,000)