a demonstration of the use of datagrid testbed and services for the biomedical community

WP10: A demonstration of the use of Datagrid testbed and services for the biomedical community

Biomedical applications work package

V. Breton, Y. Legré (CNRS/IN2P3), R. Météry (CS)

Credits: C. Blanchet, T. Contamine, S. Gadras, M. Joubert, A. Minne, J. Montagnat


Post on 12-Jan-2016



TRANSCRIPT

Page 1: A demonstration of the use of Datagrid testbed and services for the biomedical community

WP10

A demonstration of the use of Datagrid testbed and services for the biomedical community

Biomedical applications work package

V. Breton, Y. Legré (CNRS/IN2P3)

R. Météry (CS)

Credits: C. Blanchet, T. Contamine, S. Gadras, M. Joubert, A. Minne, J. Montagnat

Page 2: The Visual DataGrid Blast

• A graphical interface to enter query sequences and select the reference database

• A script to execute the BLAST algorithm on the grid

• A graphical interface to analyze results

Page 3: When and where do biologists use BLAST?

• When: as the first step in analysing new sequences, to compare DNA or protein sequences with others stored in personal or public databases

• Where: in a laboratory that keeps an up-to-date copy of the genomics and post-genomics data banks
– Requires equipment to store the databases and run the algorithms
– Requires manpower for system and network maintenance and for frequent database updates

• Most biologists use “integrated” web portals for their comparative genomics analysis, so they need not worry about biological file formats or method arguments

Page 4: Web portals for biologists under growing pressure

• The biologist enters sequences through a web interface

• Bio-informatics algorithms are executed as a pipeline
– Comparative genomics analysis
– Phylogenetics
– 2D and 3D molecular structure of proteins…

• The algorithms are executed on a local cluster
– Big labs have big clusters…
– But the pressure is growing

• More and more biologists
• compare larger and larger sequences (whole genomes)…
• to more and more genomes…
• with fancier and fancier algorithms!

Page 5: Executing BLAST on the grid

[Diagram: job flow for executing BLAST on the grid. Labelled components: UI (with JDL), Resource Broker, Information Service, Logging & Bookkeeping, Job Submission Service, Replica Catalog, Computing Element (BLAST), Storage Element (DB). The Input Sandbox carries the input sequences; the Output Sandbox carries the BLAST result. Arrow labels: Job Submit Event, Job Status.]

Credit: Fabio Hernandez
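The job flow on this slide starts from a JDL file written on the UI, with the query sequences shipped in the input sandbox and the BLAST result returned in the output sandbox. As a minimal sketch of what such a job description might look like, the Python helper below builds the JDL text. The wrapper script name `run_blast.sh`, the file names, and the argument order are assumptions made for illustration only; the presentation does not show the actual JDL used in the demonstration.

```python
def make_blast_jdl(wrapper, query_file, database, result_file="blast.out"):
    """Return the text of a JDL job description for one BLAST run.

    `wrapper` is a hypothetical shell script that runs BLAST on the
    worker node; it is not taken from the presentation.
    """
    def quoted_list(items):
        # JDL lists are written as {"a", "b", ...}
        return "{" + ", ".join(f'"{item}"' for item in items) + "}"

    lines = [
        f'Executable = "{wrapper}";',
        f'Arguments = "{query_file} {database} {result_file}";',
        'StdOutput = "std.out";',
        'StdError = "std.err";',
        # Input sandbox: files sent from the UI along with the job
        f"InputSandbox = {quoted_list([wrapper, query_file])};",
        # Output sandbox: files retrieved once the job has finished
        f"OutputSandbox = {quoted_list([result_file, 'std.out', 'std.err'])};",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_blast_jdl("run_blast.sh", "queries.fasta", "swissprot"))
```

The reference database is passed by name only, on the assumption that it already resides on a storage element, so that only the small query file travels in the sandbox.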

Page 6: Actual demonstration

[Diagram: the UI holds an input file of query sequences (Seq1, Seq2, …, Seqn); the sequences are distributed to three computing elements, each running BLAST against the DB; the outputs are gathered into a single RESULT.]
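The demonstration distributes the query sequences of one input file over several computing elements. A minimal sketch of that splitting step is given below, assuming the input is in standard FASTA format (header lines starting with `>`) and that sequences are dealt out round-robin; both points are assumptions, since the slide only shows a schematic input file.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, seq_lines = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any
            if header is not None:
                records.append((header, "".join(seq_lines)))
            header, seq_lines = line[1:].strip(), []
        elif header is not None:
            seq_lines.append(line)
    if header is not None:
        records.append((header, "".join(seq_lines)))
    return records


def split_records(records, n_jobs):
    """Deal records round-robin into at most n_jobs non-empty chunks."""
    chunks = [records[i::n_jobs] for i in range(n_jobs)]
    return [chunk for chunk in chunks if chunk]


if __name__ == "__main__":
    text = ">Seq1\nACGT\nAC\n>Seq2\nGGTT\n>Seq3\nTTAA\n"
    for job_id, chunk in enumerate(split_records(parse_fasta(text), 2)):
        print(job_id, [header for header, _ in chunk])
```

Each chunk would then be submitted as a separate grid job, and the per-job BLAST outputs collected on the UI into the final result.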

Page 7: The Grid impact on computing

• Swissprot vs Swissprot (100000 sequences)
– Running time on one CPU: 228 hours
– Tests at Institut de Biologie et Chimie des Protéines (quad-processor): ~49 hours
– Tests on DataGrid (cc-in2p3): 3 hours

• Impacts:
– Reduced pressure on local computing
– Ability to handle very large jobs
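To make the comparison concrete, the running times quoted on this slide can be turned into speed-up factors relative to the single-CPU run. The short calculation below uses only the three figures from the slide; the 49-hour value is approximate in the source, so the first ratio is approximate as well.

```python
def speedup(baseline_hours, run_hours):
    """Speed-up factor of a run relative to the single-CPU baseline."""
    return baseline_hours / run_hours


SINGLE_CPU_HOURS = 228  # Swissprot vs Swissprot on one CPU

runs = {
    "IBCP (quad-processor)": 49,  # approximate value in the source
    "DataGrid (cc-in2p3)": 3,
}

if __name__ == "__main__":
    for name, hours in runs.items():
        print(f"{name}: {speedup(SINGLE_CPU_HOURS, hours):.1f}x")
```

This gives a factor of about 4.7 for the local quad-processor test and 76 for the DataGrid test.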

Page 8: The grid impact on data handling

• DataGrid will allow mirroring of databases
– An alternative to the current costly replication mechanism
– Allowing web portals on the grid to access updated databases

[Diagram: a Biomedical Replica Catalog referencing Trembl (EBI) and Swissprot (Geneva)]
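The role of a replica catalog in this picture can be illustrated with a toy model: a table that maps a logical database name to the places where copies are stored, so that a job can pick a nearby copy. The sketch below is not the DataGrid replica catalog interface; the class, the site name "Lyon", and the `example.org` locations are hypothetical and only serve to show the idea.

```python
class ReplicaCatalog:
    """Toy catalog: logical database name -> list of (site, location)."""

    def __init__(self):
        self._replicas = {}

    def register(self, logical_name, site, location):
        """Record that a copy of logical_name is stored at location."""
        self._replicas.setdefault(logical_name, []).append((site, location))

    def lookup(self, logical_name):
        """Return all known replicas (empty list if none)."""
        return list(self._replicas.get(logical_name, []))

    def choose(self, logical_name, preferred_site):
        """Prefer a replica at preferred_site, else the first registered."""
        replicas = self.lookup(logical_name)
        if not replicas:
            raise KeyError(f"no replica registered for {logical_name!r}")
        for site, location in replicas:
            if site == preferred_site:
                return site, location
        return replicas[0]


# Hypothetical entries: master copies as on the slide, plus one mirror
catalog = ReplicaCatalog()
catalog.register("swissprot", "Geneva", "se.geneva.example.org:/db/swissprot")
catalog.register("swissprot", "Lyon", "se.lyon.example.org:/db/swissprot")
catalog.register("trembl", "EBI", "se.ebi.example.org:/db/trembl")

if __name__ == "__main__":
    print(catalog.choose("swissprot", "Lyon"))
    print(catalog.choose("trembl", "Lyon"))
```

A job running near the hypothetical Lyon mirror would read Swissprot locally and fall back to the EBI copy for Trembl.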

Page 9: This demo illustrates how grids can bring a revolution to genomics

• Grids expand the performance of genomics web portals
– Distributed execution of bio-informatics algorithms,
– even those requiring huge amounts of CPU
– Maintenance of up-to-date biological databases over the network

• Grids open new perspectives in large-scale genomics analysis
– Complete genome annotation
– Cross-genome analysis
– Data mining on distributed databases
– Pipelining of huge automatic bio-informatics analyses
– …