biomart update

Posted on 27-May-2015

DESCRIPTION

Title: Biomart 2007
Author: Arek Kasprzyk

TRANSCRIPT

BioMart 2007

Arek Kasprzyk
European Bioinformatics Institute
BOSC, Vienna, July 2007

Data Flow

[Diagram: Source data, Mart, JAVA, PERL, DAS, WebGUI, Commandline, Desktop GUI, WebService]

Data Flow

[Diagram: Mart, JAVA, PERL, DAS, WebGUI, Commandline, Desktop GUI, WebService, Admin Tools]

Recent developments (0.4-0.6)

• MartBuilder

• MartView

• Web services

• API

• DAS

• Central Server

• More deployers

MartBuilder

MartView

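API

use strict;
use warnings;
use BioMart::Initializer;
use BioMart::Query;
use BioMart::QueryRunner;

# Path to the registry XML file describing the marts to query, taken from the command line here.
my $confFile = shift @ARGV;

my $initializer = BioMart::Initializer->new('registryFile' => $confFile);
my $registry = $initializer->getRegistry;
my $query = BioMart::Query->new('registry' => $registry, 'virtualSchemaName' => 'central_server_1');

# First dataset: Ensembl human genes on chromosome 1.
$query->setDataset("hsapiens_gene_ensembl");
$query->addFilter("chromosome_name", ["1"]);
$query->addAttribute("ensembl_gene_id");
$query->addAttribute("ensembl_transcript_id");
$query->addAttribute("ensembl_peptide_id");

# Second dataset: MSD structures determined by NMR.
$query->setDataset("msd");
$query->addFilter("experiment_type", ["NMR"]);
$query->addAttribute("pdb_id");
$query->addAttribute("resolution");
$query->addAttribute("release_date");
$query->addAttribute("header");

# Run the federated query and print the results.
my $query_runner = BioMart::QueryRunner->new();
$query_runner->execute($query);
$query_runner->printResults();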

Web service

<Query virtualSchemaName="central_server_1">
  <Dataset name="hsapiens_gene_ensembl">
    <Filter name="chromosome_name" value="1"/>
    <Attribute name="ensembl_gene_id"/>
    <Attribute name="ensembl_transcript_id"/>
    <Attribute name="ensembl_peptide_id"/>
  </Dataset>
  <Dataset name="msd">
    <Filter name="experiment_type" value="NMR"/>
    <Attribute name="pdb_id"/>
    <Attribute name="resolution"/>
    <Attribute name="release_date"/>
    <Attribute name="header"/>
  </Dataset>
</Query>
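The Web service XML carries the same content as the Perl API example: each setDataset call corresponds to a Dataset element, and each addFilter and addAttribute call to a Filter or Attribute element inside it. As a hedged illustration of that mapping, the core-Perl sketch below builds the same document from a plain data structure. build_query_xml and xml_escape are hypothetical helpers written for this example, not part of the BioMart API.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioMart): escape characters that are
# special inside an XML attribute value.
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;    # must run first so later entities are not re-escaped
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Hypothetical helper: one Dataset element per spec, in the order given,
# mirroring successive setDataset/addFilter/addAttribute calls.
sub build_query_xml {
    my ($schema, @datasets) = @_;
    my $xml = sprintf qq{<Query virtualSchemaName="%s">\n}, xml_escape($schema);
    for my $ds (@datasets) {
        $xml .= sprintf qq{  <Dataset name="%s">\n}, xml_escape($ds->{name});
        for my $filter (@{ $ds->{filters} || [] }) {
            my ($name, $value) = @$filter;
            $xml .= sprintf qq{    <Filter name="%s" value="%s"/>\n},
                xml_escape($name), xml_escape($value);
        }
        for my $attr (@{ $ds->{attributes} || [] }) {
            $xml .= sprintf qq{    <Attribute name="%s"/>\n}, xml_escape($attr);
        }
        $xml .= "  </Dataset>\n";
    }
    return $xml . "</Query>\n";
}

# The two-dataset query from the slide, expressed as data.
my $xml = build_query_xml(
    'central_server_1',
    {   name       => 'hsapiens_gene_ensembl',
        filters    => [ [ chromosome_name => '1' ] ],
        attributes => [qw(ensembl_gene_id ensembl_transcript_id ensembl_peptide_id)],
    },
    {   name       => 'msd',
        filters    => [ [ experiment_type => 'NMR' ] ],
        attributes => [qw(pdb_id resolution release_date header)],
    },
);
print $xml;
```

Keeping the query as data makes it easy to emit either form from one definition, but the real BioMart::Query object remains the supported way to run it from Perl.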

MartService

• Meta data (GET)
– Marts

– Datasets

– Configuration

• Queries (POST)

Meta data

http://www.mycompany.com/mypath/martservice?

• Marts: type=registry

• Datasets: type=datasets&mart=mymart

• Configuration: type=configuration&dataset=mydataset
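Metadata requests are plain GET URLs: the MartService endpoint plus a type parameter and, for datasets and configuration, the mart or dataset name. A minimal sketch of assembling them in core Perl, assuming the placeholder endpoint from the slide (www.mycompany.com/mypath/martservice is not a real server) and hypothetical helper names meta_url and uri_escape_component:

```perl
use strict;
use warnings;

# Placeholder endpoint copied from the slide; not a real server.
my $base = 'http://www.mycompany.com/mypath/martservice';

# Hypothetical helper: percent-encode one query-string component,
# leaving RFC 3986 unreserved characters untouched.
sub uri_escape_component {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Hypothetical helper: append ordered key/value pairs to the endpoint.
sub meta_url {
    my ($endpoint, @pairs) = @_;
    my @parts;
    while (my ($key, $value) = splice(@pairs, 0, 2)) {
        push @parts, uri_escape_component($key) . '=' . uri_escape_component($value);
    }
    return $endpoint . '?' . join('&', @parts);
}

# One URL per metadata level listed on the slide.
my $marts    = meta_url($base, type => 'registry');
my $datasets = meta_url($base, type => 'datasets',      mart    => 'mymart');
my $config   = meta_url($base, type => 'configuration', dataset => 'mydataset');
print "$_\n" for $marts, $datasets, $config;
```

The sketch makes no network calls; each URL can be fetched with wget, as on the Query slide, or with the core HTTP::Tiny module (HTTP::Tiny->new->get($url)).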

Query

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName="default" count="" softwareVersion="0.5">
  <Dataset name="hsapiens_gene_ensembl">
    <Attribute name="ensembl_gene_id"/>
    <Attribute name="ensembl_transcript_id"/>
    <Filter name="chromosome_name" value="1"/>
    <Filter name="band_end" value="p36.33"/>
    <Filter name="band_start" value="q44"/>
  </Dataset>
  <Dataset name="msd">
    <Attribute name="pdb_id"/>
    <Attribute name="experiment_type"/>
    <Filter name="experiment_type" value="NMR"/>
  </Dataset>
</Query>

wget -q 'http://www.biomart.org/biomart/martservice?query=

-O 5utr.dat
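The wget call on the slide is cut off after query=; the XML has to be URL-encoded before it can be appended there. The sketch below shows one way to do that in core Perl, assuming the public endpoint listed on the Central Server slide (www.biomart.org/biomart/martservice); uri_escape_component is a hypothetical helper, and the output file name is taken from the slide.

```perl
use strict;
use warnings;

# The Query document from the slide (whitespace between tags is not significant).
my $query_xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName="default" count="" softwareVersion="0.5">
  <Dataset name="hsapiens_gene_ensembl">
    <Attribute name="ensembl_gene_id"/>
    <Attribute name="ensembl_transcript_id"/>
    <Filter name="chromosome_name" value="1"/>
    <Filter name="band_end" value="p36.33"/>
    <Filter name="band_start" value="q44"/>
  </Dataset>
  <Dataset name="msd">
    <Attribute name="pdb_id"/>
    <Attribute name="experiment_type"/>
    <Filter name="experiment_type" value="NMR"/>
  </Dataset>
</Query>
XML

# Collapse the document onto one line: drop whitespace between tags and at the ends.
(my $compact = $query_xml) =~ s/>\s+</></g;
$compact =~ s/^\s+|\s+$//g;

# Hypothetical helper: percent-encode everything except RFC 3986 unreserved characters.
sub uri_escape_component {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Endpoint assumed from the "Central Server" slide.
my $endpoint = 'http://www.biomart.org/biomart/martservice';
my $url      = "$endpoint?query=" . uri_escape_component($compact);

# Single quotes are safe in the shell here: the encoder turns ' into %27.
my $cmd = "wget -q -O 5utr.dat '$url'";
print "$cmd\n";
```

Because the MartService slide lists queries as POST requests, the same parameter can instead be sent as a form body, for example with the core HTTP::Tiny module's post_form($endpoint, { query => $compact }), which avoids URL length limits on large queries.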

Results

• Ordered according to:
1. Datasets
2. Attributes

• Default format: TSV
– Can be altered by specifying a formatter
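Because results are ordered first by dataset and then by attribute, column positions can be predicted from the query itself. A minimal core-Perl sketch, assuming the attribute order of the Query slide (two hsapiens_gene_ensembl attributes, then two msd attributes), default TSV output with no header row, and a hypothetical helper parse_tsv; the sample input is a made-up placeholder, not real BioMart output:

```perl
use strict;
use warnings;

# Column order assumed from the Query slide: dataset order first, then the
# attribute order within each dataset. Filters produce no columns.
my @columns = qw(
    hsapiens_gene_ensembl.ensembl_gene_id
    hsapiens_gene_ensembl.ensembl_transcript_id
    msd.pdb_id
    msd.experiment_type
);

# Hypothetical helper: split TSV text (assumed to have no header row) into
# hashrefs keyed by "dataset.attribute".
sub parse_tsv {
    my ($text, @cols) = @_;
    my @rows;
    for my $line (split /\n/, $text) {
        next if $line eq '';
        my @fields = split /\t/, $line, -1;    # -1 keeps trailing empty fields
        die sprintf("expected %d columns, got %d\n", scalar @cols, scalar @fields)
            unless @fields == @cols;
        my %row;
        @row{@cols} = @fields;
        push @rows, \%row;
    }
    return @rows;
}

# Made-up placeholder input, only to show the call (not real BioMart output).
my @rows = parse_tsv("gene1\ttranscript1\tpdb1\tNMR\n", @columns);
print "$rows[0]{'msd.pdb_id'}\n";
```

Failing loudly on a column-count mismatch catches the case where a formatter other than the default TSV was requested.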

Genomic data

Uniprot, MSD, ArrayExpress

Model organism databases

Developmental models

Proteomics

Name  Fragment  Position  Alleles  Strand
SNP1  AL139258  1659852   T/A      1
SNP2  NT_25698  2569873   C/T      -1
SNP3  chr13     1125698   C/G      1

Data conversion and integration

Ensembl

HapMap

NCBI

UCSC

Proprietary data

Diabetes-Gene Association DataBase

Combined proprietary and public data

Genetics of Infectious and Autoimmune Diseases, Pasteur Institute, INSERM U730, Paris, France.

Target SNP selection for the study of type 1 diabetes (T1D), malaria and dengue

CAPRISA: understanding HIV pathogenesis and epidemiology as well as HIV/AIDS treatment and prevention

Clinical Data

MID

Cellular Immunity, Humoral Immunity, HLA Typing, Sequence & Sequence Related

Pipeline

Unilever

• Human study to evaluate Omics in assessing safety indicators

• Study of skin inflammation in response to detergent

• Skin samples taken and analyzed with multiple Omics techniques:
– Blood
– Skin biopsy
– Microdialysis

Use Example 1: All genes in the human genome up-regulated in Pancreatic Adenocarcinomas (PDACs) vs Normal Pancreas (ND)

1. Filter
2. Attributes
3. Results

Use Example 2: All upstream sequences for all genes on chromosome 1 up-regulated in Pancreatic Adenocarcinomas (PDACs) vs Normal Pancreas (ND)

1. Filter
2. Attributes
3. Results

Use Example 3: Just finished my experiment and would like to get the overlaps between my results and those reported in previous studies!

1. Filter
2. Attributes
3. Results

Web service

Perl

DAS

Bioconductor package biomaRt

Galaxy

Taverna

Central Server (www.biomart.org)

www.biomart.org/biomart/martservice

Future plans

New configuration system

• Normalized
– Based on a partition table concept
– Unified pointer system -> relational attribute
– Configuration merge - implicit federation
– Write to the db

• Run time slice and dice of a registry object rather than combinatorial pre-compilation

New configuration system

• Scalability
– Updates and maintenance of large configurations

– Run time server scalability (cache and memory)

– Scalable for multiple mart users (single instance - security)

– Scalable for alternative configurations (new MartGUI framework)

New MartGUI framework

• Components
– Alternative DS Configurations
– Alternative GUIs (MView, MQForm, MSForm, etc.)
– Alternative Analyzers/Visualizers (optional install)

• Extensible
– Custom extensions to the components

• Common interface
– Formatters, DAS, Analyzers, Visualizers
– Importable/Exportable pair interface

New GUI framework

• Old ‘GUI unit’:
– Full registry + MartView + default formatters
– Customization limited to colors and headers

• New ‘GUI unit’:
– RegistrySlice + MartGUI + Visualizer/Analyzer
– Combine units into your unique functional environment
– Functional level customization

New GUI framework

[Mockup of a data mining website: a site header, a Home page ("Welcome to my data mining website") and menu entries for Gene Id conversion, Functional annotation, Compare two gene lists, Analyze gene list, Draw distribution, Full search and Draw bla bla chart. The Gene Id conversion page has a "paste your ids here" box, a Submit button and ID types Hugo, GenBank, TrEMBL/UniProt and UniProt/Swiss-Prot.]

[Figures: Cytogenetic distribution of pancreatic cancer genes satisfying my query (histogram); Cytogenetic distribution of pancreatic cancer genes satisfying my query (ideogram); Cytogenetic distribution of chromosomal aberrations in pancreatic cancer]

New configuration tool

• MartConfigurator
– Handles a complete registry object

– Defines GUI units

– Automated service discovery

– Manual link override

– Automated updates for large configurations

– Improved user interaction

Credits

• Martians

– Syed Haider

– Richard Holland

– Damian Smedley

• Contributors
– Steffen Durinck (NCI, NIH)

– Eric Just (Northwestern University)

– Don Gilbert (Indiana University)

– Darin London (Duke University)

– Will Spooner (CSHL)

– Gudmundur Thorisson (CSHL)

– Benoit Ballester (Universite de la Mediterranee)

– James Smith (Ensembl)

– Arne Stabenau (Ensembl)

– Andreas Kahari (Ensembl)

– Craig Melsopp (Ensembl)

– Katerina Tzouvara (EBI)

– Paul Donlon (Unilever)
