BioMart 0.8 offers new tools, more interfaces, and increased flexibility through plugins Junjun Zhang BOSC 2011, Vienna, Austria July 15, 2011

Upload: bioinformatics-open-source-conference

Post on 04-Jul-2015

TRANSCRIPT

Page 1: B07-GenomeContent-Biomart

BioMart 0.8 offers new tools, more interfaces, and increased flexibility through plugins

Junjun Zhang BOSC 2011, Vienna, Austria

July 15, 2011

Page 2: B07-GenomeContent-Biomart

BioMart: an open source federated data management system

•  Widely used by public/private biological databases

•  Quickly make in-house data accessible online

•  User-friendly and flexible querying interfaces: web GUI and programmatic access APIs (REST, Perl, biomaRt, etc.)

•  Automated data conversion tool

•  Effortlessly federate in-house datasets with existing public BioMart datasets

www.biomart.org  

Page 3: B07-GenomeContent-Biomart

BioMart 0.8 new features

•  Integrated Java application makes it possible to build a BioMart data source, configure querying and presentation interfaces, and deploy a BioMart server from a single tool (MartConfigurator)

•  Supports more RDBMSs (MS SQL Server and DB2, in addition to MySQL, PostgreSQL, and Oracle)

•  Creates a ‘virtual mart’ from a 3NF-normalized source database without materialization

•  New diverse Web GUIs and APIs provide added flexibility and ease of use

•  Link indexing and parallel querying optimizations

•  Supports several security features (HTTPS, OpenID, and OAuth protocols) for managing sensitive data

•  Extensible plugin framework for analysis and visualization
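
The slide names parallel querying as an optimization but does not say how BioMart implements it. As a rough illustration of the general idea only, the Python sketch below queries two data sources concurrently and joins the results on a shared key. The two in-memory sources, the key names, and the functions `query_source` and `federated_query` are all hypothetical stand-ins, not part of BioMart.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for two remote data sources keyed by a gene ID.
SOURCE_A = {"g1": "symbol_1", "g2": "symbol_2"}
SOURCE_B = {"g1": "pathway_x", "g3": "pathway_y"}


def query_source(source, keys):
    """Return the entries of one source that match the requested keys."""
    return {k: source[k] for k in keys if k in source}


def federated_query(keys):
    """Query both sources concurrently, then join the results on the shared key."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_a = pool.submit(query_source, SOURCE_A, keys)
        future_b = pool.submit(query_source, SOURCE_B, keys)
        result_a, result_b = future_a.result(), future_b.result()
    # Keep only keys present in both sources (an inner join).
    return {k: (result_a[k], result_b[k]) for k in result_a if k in result_b}
```

With real remote sources, the time saved comes from waiting on both network calls at once rather than one after the other.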

Page 4: B07-GenomeContent-Biomart

Basic BioMart Concepts – the Power of Simplicity

Building or querying a BioMart data source only requires an understanding of a few basic concepts:

•  DataSource

•  DataMart

•  DataSet

•  Attribute

•  Filter

•  AccessPoint (new)

•  Analysis (new)

•  Parameter (new)

BioMart hides the complexity of the underlying database schema and federation mechanism.
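
Some of these concepts can be pictured as a small object model. The Python sketch below is a minimal illustration, not BioMart's actual classes: it covers only DataSet, Attribute, and Filter, and every field, the `build_query` helper, and the sample names (`genes`, `symbol`, `biotype`) are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Attribute:
    """A field the user can ask to have returned."""
    name: str


@dataclass(frozen=True)
class Filter:
    """A restriction the user can apply to the rows returned."""
    name: str


@dataclass
class DataSet:
    """A queryable unit that exposes attributes and filters."""
    name: str
    attributes: list = field(default_factory=list)
    filters: list = field(default_factory=list)


def build_query(dataset, attribute_names, filter_values):
    """Check a request against a dataset and return it as a plain dict."""
    known_attributes = {a.name for a in dataset.attributes}
    known_filters = {f.name for f in dataset.filters}
    unknown = [n for n in attribute_names if n not in known_attributes]
    unknown += [n for n in filter_values if n not in known_filters]
    if unknown:
        raise ValueError(f"unknown names for {dataset.name}: {unknown}")
    return {
        "dataset": dataset.name,
        "attributes": list(attribute_names),
        "filters": dict(filter_values),
    }


# Hypothetical dataset used only to exercise the model.
genes = DataSet("genes", [Attribute("symbol")], [Filter("biotype")])
query = build_query(genes, ["symbol"], {"biotype": "protein_coding"})
```

The point of the sketch is that a user only names a dataset, the attributes wanted, and the filters to apply; nothing in the request refers to tables or joins.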

Page 5: B07-GenomeContent-Biomart

A BioMart dataset is organized in a reverse star schema

Page 6: B07-GenomeContent-Biomart

A 3NF-normalized database can be converted to a reverse star schema

Source schema

Reverse star schema
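
To make the conversion concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It assumes a toy 3NF source with a link table and flattens it into one main table plus a dimension table keyed back to the main table. All table names, column names, and rows are invented placeholders, and the SQL is an illustration rather than what BioMart's conversion tool generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical 3NF source schema: entities plus a many-to-many link table.
CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, symbol TEXT);
CREATE TABLE pathway (pathway_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE gene_pathway (gene_id INTEGER, pathway_id INTEGER);
INSERT INTO gene VALUES (1, 'GENE_A'), (2, 'GENE_B');
INSERT INTO pathway VALUES (10, 'pathway_x'), (11, 'pathway_y');
INSERT INTO gene_pathway VALUES (1, 10), (2, 10), (2, 11);

-- Reverse star schema: one main table, with dimension tables that
-- carry the main table's key and already-resolved values.
CREATE TABLE gene__main AS
    SELECT gene_id AS gene_id_key, symbol FROM gene;
CREATE TABLE gene__pathway__dm AS
    SELECT gp.gene_id AS gene_id_key, p.name AS pathway_name
    FROM gene_pathway gp
    JOIN pathway p ON p.pathway_id = gp.pathway_id;
""")


def genes_in_pathway(pathway_name):
    """Filter on a dimension table and return symbols from the main table."""
    rows = conn.execute(
        """SELECT m.symbol
           FROM gene__main m
           JOIN gene__pathway__dm d ON d.gene_id_key = m.gene_id_key
           WHERE d.pathway_name = ?
           ORDER BY m.symbol""",
        (pathway_name,),
    )
    return [symbol for (symbol,) in rows]
```

After the conversion, a query touches the main table and at most one join per dimension, instead of walking the link tables of the source schema.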

Page 7: B07-GenomeContent-Biomart

BioMart system components

Query Engine / Plugin

Client-side Plugin

Page 8: B07-GenomeContent-Biomart

MartConfigurator – an integrated tool for setting up, configuring and managing a BioMart server

Page 9: B07-GenomeContent-Biomart

BioMart 0.8 provides several data querying GUIs

MartForm

Page 10: B07-GenomeContent-Biomart

BioMart 0.8 provides several data querying GUIs

MartWizard

Page 11: B07-GenomeContent-Biomart

BioMart 0.8 provides several data querying GUIs

MartExplorer

Page 12: B07-GenomeContent-Biomart

Programmatic access API query syntax at the click of a button

Page 13: B07-GenomeContent-Biomart

Special GUI - MartReport

Mutation frequencies from cancer projects with data distributed around the globe

Figure labels: Ensembl, KEGG, Reactome, COSMIC, Pancreatic Expression Database (PED), Breast Cancer Campaign Tissue Bank (BCCTB)

Page 14: B07-GenomeContent-Biomart

Special GUI - MartAnalysis

Most affected pathways

Page 15: B07-GenomeContent-Biomart

Special GUI – MartAnalysis

The sequence retrieval tool is implemented as a server-side analysis plugin

Genomic sequence retrieval tool

Page 16: B07-GenomeContent-Biomart

New query type - Analysis

Query against the ‘affected_pathways’ analysis:

<Query>
  <Analysis name="affected_pathways" dataset="gene_oicrPanc">
    <Parameter name="biotype" value="protein_coding"/>
    <Parameter name="file_type" value="png"/>
    <Parameter name="img_height" value="8000"/>
    <Parameter name="img_width" value="12000"/>
  </Analysis>
</Query>

Query against the ‘gene_sequence’ sequence retrieval tool:

<Query>
  <Analysis name="gene_sequence">
    <Parameter name="seq_type" value="gene_flank"/>
    <Parameter name="upstream_flank" value="500"/>
  </Analysis>
</Query>
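
Analysis queries of this shape can also be assembled programmatically. The Python sketch below builds the same `Query` / `Analysis` / `Parameter` structure with the standard library's `xml.etree.ElementTree`. The element and attribute names come from the slide; the function `build_analysis_query` is a hypothetical helper, and the sketch only produces the XML text. How the query is submitted to a server is not shown on the slide and is not covered here.

```python
import xml.etree.ElementTree as ET


def build_analysis_query(analysis_name, parameters, dataset=None):
    """Return an Analysis query as an XML string (hypothetical helper)."""
    query = ET.Element("Query")
    analysis_attrs = {"name": analysis_name}
    if dataset is not None:
        analysis_attrs["dataset"] = dataset
    analysis = ET.SubElement(query, "Analysis", analysis_attrs)
    # One Parameter element per name/value pair; values are sent as text.
    for name, value in parameters.items():
        ET.SubElement(analysis, "Parameter", {"name": name, "value": str(value)})
    return ET.tostring(query, encoding="unicode")


# Rebuild the second query from the slide.
xml_text = build_analysis_query(
    "gene_sequence",
    {"seq_type": "gene_flank", "upstream_flank": 500},
)
```

Building the document through an XML library rather than string concatenation also avoids quoting mistakes in attribute values.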

Page 17: B07-GenomeContent-Biomart

Several large collaborative projects are using BioMart for data management

•  BioMart Central Portal (http://central.biomart.org)

•  International Cancer Genome Consortium (http://dcc.icgc.org)

•  POPCURE (collaboration with Pfizer, controlled access)

Page 18: B07-GenomeContent-Biomart

BioMart Central Portal (central.biomart.org)

First-of-its-kind, community-driven effort to provide unified access to dozens of biological databases spanning genomics, proteomics, model organisms, cancer data, and more

Page 19: B07-GenomeContent-Biomart

BioMart Portal provides access to a collection of data sources

“Master/Slave” like

Page 20: B07-GenomeContent-Biomart

International Cancer Genome Consortium Data Portal

GOALS: To obtain a comprehensive description of genomic, transcriptomic, and epigenomic changes in 50 different tumor types and/or subtypes that are of clinical and societal importance across the globe. 500 tumor and matched control samples will be analyzed per tumor type. At present, 12 countries have joined ICGC. Data will be generated by institutions all over the world.

To make the data available rapidly and with minimal restrictions, in order to accelerate research into the causes and control of cancer.

AUSTRALIA: Ovarian cancer (Serous cystadenocarcinoma); Pancreatic cancer (Ductal adenocarcinoma); Prostate cancer

MEXICO: Multiple sub-types

FRANCE: Breast cancer (Subtype defined by an amplification of the HER2 gene); Liver cancer (Hepatocellular carcinoma) (Secondary to alcohol and adiposity); Prostate cancer (Adenocarcinoma)

EU / FRANCE: Renal cancer (Renal cell carcinoma) (Focus on but not limited to clear cell subtype)

CANADA: Pancreatic cancer (Ductal adenocarcinoma); Prostate cancer (Adenocarcinoma)

UNITED STATES: Bladder cancer; Blood cancer (Acute myeloid leukemia); Brain cancer (Glioblastoma multiforme/ lower grade glioma); Breast cancer (Ductal & lobular); Cervical cancer (Squamous); Colon cancer (Adenocarcinoma); Endometrial cancer (Uterine corpus endometrial carcinoma); Gastric cancer (Adenocarcinoma); Head and neck cancer (Squamous cell carcinoma/ Thyroid carcinoma); Renal cancer (Renal clear cell carcinoma/ Renal papillary carcinoma); Liver cancer (Hepatocellular carcinoma); Lung cancer (Adenocarcinoma/ squamous cell carcinoma); Ovarian cancer (Serous cystadenocarcinoma); Prostate cancer (Adenocarcinoma); Rectal cancer (Adenocarcinoma); Skin cancer (Cutaneous melanoma)

INDIA: Oral cancer (Gingivobuccal)

GERMANY: Malignant lymphoma (Germinal center B-cell derived lymphomas); Pediatric brain tumors (Medulloblastoma and Pediatric pilocytic astrocytoma); Prostate cancer (Early onset)

JAPAN: Liver cancer (Hepatocellular carcinoma) (Virus-associated)

CHINA: Gastric cancer (Intestinal- and diffuse-type)

UNITED KINGDOM: Bone cancer (Osteosarcoma/ chondrosarcoma/ rare subtypes); Breast cancer (Triple negative/lobular/ other); Chronic Myeloid Disorders (Myelodysplastic syndromes, myeloproliferative neoplasms and other chronic myeloid malignancies); Esophageal cancer; Prostate cancer

EU / UNITED KINGDOM: Breast cancer (ER positive, HER2 negative)

SPAIN: Chronic lymphocytic leukemia (CLL with mutated and unmutated IgVH)

ITALY: Rare pancreatic tumors (Enteropancreatic endocrine tumors and rare pancreatic exocrine tumors)

Page 21: B07-GenomeContent-Biomart

ICGC Data Portal Architecture

“Peer-to-Peer” like

Page 22: B07-GenomeContent-Biomart

(dcc.icgc.org)

Page 23: B07-GenomeContent-Biomart

Future Directions

•  Creation of BioMart Central Registry to improve coordination between BioMart servers. It will be a permanent resource where BioMart data providers can register their data models, data sources and services.

•  Enhancing the data transformation module for building BioMart databases from non-RDBMS data sources (e.g., flat data files or XML data files) with high scalability and flexibility.

•  Enhancing the plugin system to allow various forms of data analysis and visualization. Third parties are encouraged to develop plugins to extend the capabilities of the system.

Page 24: B07-GenomeContent-Biomart

The BioMart team

Joachim Baran, Anthony Cros, Jonathan Guberman, Jack Hsu, Yong Liang, Elena Rivkin, Brett Whitty, Marie Wong-Erasmus, Long Yao, Syed Haider, Junjun Zhang, Arek Kasprzyk

For support: [email protected]