bioinformatics ontology for automatic workflow generation on web/grid services

Post on 10-Feb-2016

41 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

DESCRIPTION

Bioinformatics Ontology for Automatic Workflow Generation on Web/Grid Services . Konagaya Akihiko Project Director Advanced Genome Information Technology Research Group RIKEN Genomic Sciences Center. Contents. Role of Ontology Web Services for Bioinformatics Automatics Workflow Generation - PowerPoint PPT Presentation

TRANSCRIPT

A.Konagaya, ASTRENA-APBioNet Joint Meeting at 22nd APAN Singapore 18 July 2006

Bioinformatics Ontology for Automatic Workflow Generation

on Web/Grid Services Konagaya Akihiko

Project DirectorAdvanced Genome Information Technology

Research GroupRIKEN Genomic Sciences Center

A.Konagaya, ASTRENA-APBioNet Joint Meeting at 22nd APAN Singapore 18 July 2006

Contents

• Role of Ontology• Web Services for Bioinformatics• Automatics Workflow Generation• Lessons from our First Experience

A.Konagaya, ASTRENA-APBioNet Joint Meeting at 22nd APAN Singapore 18 July 2006

Role of Ontology

A.Konagaya, ASTRENA-APBioNet Joint Meeting at 22nd APAN Singapore 18 July 2006

Tacit and Explicit Knowledge

Michael Polanyi (1891-1976)    

We should start from the fact that 'we can know more than we can tell'.

Michael Polanyi, “The Tacit Dimension” 1967

A.Konagaya, ASTRENA-APBioNet Joint Meeting at 22nd APAN Singapore 18 July 2006

Rainbow ColorHow many colors can you see in rainbow?

A.Konagaya, ASTRENA-APBioNet Joint Meeting at 22nd APAN Singapore 18 July 2006

Ontology for Rainbow Colors#800080 RGB Value

#000080

#0000FF

#008000

#FFFF00

#FF8000

#FF0000 Red

Yellow

Green

Blue

Indigo

Purple

Orange

From 360 nm~ 400 nm

to 760 nm~ 830 nm

All the colors you can see with your own eyes!

A.Konagaya, ASTRENA-APBioNet Joint Meeting at 22nd APAN Singapore 18 July 2006

Which are Purple?

#800070

#800060

#500080

#800050 #700080

#600080#800080

A.Konagaya, ASTRENA-APBioNet Joint Meeting at 22nd APAN Singapore 18 July 2006

Representation by Elements and Constructor

#800070

#500080

#800050

#700080

#600080

#800080

#800060

Purple

BlueElement

RedElement

Purple

A.Konagaya, ASTRENA-APBioNet Joint Meeting at 22nd APAN Singapore 18 July 2006

Web Services for Bioinformatics

A.Konagaya, ASTRENA-APBioNet Joint Meeting at 22nd APAN Singapore 18 July 2006

Advantages of Web Services

TaskC

Input

OutputTask

ATask

BTask

D

   Web Services

computing computing computing computingWeb Services

DB X DB Y DB Z

•Liberating from the maintenance of biological databases and tools•Scalability of computational resources•High-level application programming interface

A.Konagaya, ASTRENA-APBioNet Joint Meeting at 22nd APAN Singapore 18 July 2006

Very Simple Work Flow

BLAST SearchSequenceUniProt

GetEntry

CLUSTAL W

UniProt

Hittable

Sequences

Tree View

Multiple Alignment

Phylogenetic tree

A.Konagaya, ASTRENA-APBioNet Joint Meeting at 22nd APAN Singapore 18 July 2006

Manual Workflow on Web Apps

A.Konagaya, ASTRENA-APBioNet Joint Meeting at 22nd APAN Singapore 18 July 2006

Web Service Programming

#!/usr/bin/perl

use SOAP::Lite;

# SOAP API # specify WSDL my $service = SOAP::Lite-> service('http://xml.nig.ac.jp/wsdl/GetEntry.wsdl');

# call web service $result = $service->getXML_DDBJEntry("AB000003");

# print result print $result;

http://www.xml.nig.ac.jp/perl.txt

A.Konagaya, ASTRENA-APBioNet Joint Meeting at 22nd APAN Singapore 18 July 2006

Why don’t we use workflow tools?

http://www.cyclonic.org/Taverna_and_myGrid.ppt

A.Konagaya, ASTRENA-APBioNet Joint Meeting at 22nd APAN Singapore 18 July 2006

Needs Automatic Workflow Generate Tool from

Very High Level Specification

apply Blastp to UniProt

GetEntry from UniProt

apply CLUSTALW

apply TreeView

Workflow for

Bioinformatics Web Services

AutomaticsGeneration

?

A.Konagaya, ASTRENA-APBioNet Joint Meeting at 22nd APAN Singapore 18 July 2006

Automatic Generation of Bioinformatics Workflow

A.Konagaya, ASTRENA-APBioNet Joint Meeting at 22nd APAN Singapore 18 July 2006

Task as Atomic Componentof Workflow

Output DataSpecification

sample{ddbjentry,flatfile}

Input DataSpecification

sample{aa_sequence,fasta}

Application

sample{blastp DAD}

A.Konagaya, ASTRENA-APBioNet Joint Meeting at 22nd APAN Singapore 18 July 2006

Workflow as a Sequence of Tasks

Input

Output

A.Konagaya, ASTRENA-APBioNet Joint Meeting at 22nd APAN Singapore 18 July 2006

Automatic Generation of Workflowfrom Given Input and Output Data Specification and Tasks

• Path Finding using Meta Information

Input

Output

TaskA

TaskB

TaskC

TaskD

A.Konagaya, ASTRENA-APBioNet Joint Meeting at 22nd APAN Singapore 18 July 2006

Meta Data for Output

sample{ddbjentry,flatfile}

{aablastentry,hittable}

Meta Information to Specifythe Functionality of Task

Meta Data for Database

samples {uniprot}

{nt}

Meta Datafor Input

samples{na_sequence,fasta}{aa_sequence,fast}

Meta Information for Command and

Options

{blastn}{getentry}

TASK

A.Konagaya, ASTRENA-APBioNet Joint Meeting at 22nd APAN Singapore 18 July 2006

Task Hierarchy (is_a)Abstract Task

Concrete Task

I

I

O

OI : Input TypeO: Output Type

S : Sequence or Sequence Name

V : Various Type

N : Nucleoside Sequence

A : Amino acid Sequence

id : Accession ID

E : Database Entry

Homology search

BLAST FASTA SSEARCH

rfastablastp

fasta

・・・

S V

S S S

N

V V V

A id

N

A

id

idN idblastx

idblastn

A.Konagaya, ASTRENA-APBioNet Joint Meeting at 22nd APAN Singapore 18 July 2006

Task Hierarchy (has_a)

Genome Annotation[glimmer2,blastn,getEntry]

Well Known/user defined Task

glimmer2 blastn getEntryN id idS S E

ES

A.Konagaya, ASTRENA-APBioNet Joint Meeting at 22nd APAN Singapore 18 July 2006

Prototype for ‘Proof of Concept’

• Language   tuProlog– Java to Prolog– Prolog to Java

• Web Service Interface through JAVA API

• Task Database– Prolog Clause Database

• Optimal Path Finding– Bidirectional Breadth First Search Algorithm

A.Konagaya, ASTRENA-APBioNet Joint Meeting at 22nd APAN Singapore 18 July 2006

System Overview

User

UserSpecification

Server

SPBIO

DDBJ

Workflow System

Knowledge Base

TaskDatabase

(1697)

Web ServiceInformation

(1596)

UIPrologEngine

tuProlog

Web ServiceLibrary(Java)

WorkflowLibrary(prolog)

Workflow

WorkflowExecution

UserData

Result

A.Konagaya, ASTRENA-APBioNet Joint Meeting at 22nd APAN Singapore 18 July 2006

Screen Snapshot(Workflow Generation Phase)

A.Konagaya, ASTRENA-APBioNet Joint Meeting at 22nd APAN Singapore 18 July 2006

Screen Snapshot (Workflow Execution Phase)

A.Konagaya, ASTRENA-APBioNet Joint Meeting at 22nd APAN Singapore 18 July 2006

Obtained Phylogenetic Treeby a generated workflow

when applying to a Human Insulin Sequence P61982[Mus musculus] P61981[Homo sapiens] Q5RC20[Pongo pygmaeus] P61983[Rattus norvegicus] Q5F3W6[Gallus gallus] P68252[Bos taurus] Q6PCG0[Xenopus laevis] Q6NRY9[Xenopus laevis] Q04917[Homo sapiens] P68509[Bos taurus] P68511[Rattus norvegicus] P68510[Mus musculus] Q6UFZ2[Oncorhynchus mykiss] Q6PC29[Brachydanio rerio] Q6UFZ3[Oncorhynchus mykiss]

A.Konagaya ASTRENA-APBioNet Joint Meeting at 22nd APAN Singapore 18 July 2006

A.Konagaya, ASTRENA-APBioNet Joint Meeting at 22nd APAN Singapore 18 July 2006

Lessons from our First Experience

A.Konagaya, ASTRENA-APBioNet Joint Meeting at 22nd APAN Singapore 18 July 2006

Task Database (prototype)

Web Service CallDDBJ BlastDDBJ SRSDDBJ GetEntryDDBJ ClustalWSPBIO Blast

453638

3862

405

Format TransformationData Selection

5645

In Total 1697

A.Konagaya, ASTRENA-APBioNet Joint Meeting at 22nd APAN Singapore 18 July 2006

Test Set of Specification

Format Type Format Typeblastp uniprotfilter num25getfasta_swissentrymultiplealignmentblastp uniprotfilter num25getfasta_swissentrymultiplealignmentalignmentsearchfiltergetentrymultiplealignmentfiltergetentrymultiplealignment

aamultiplealignment

4 fasta aasequence

3 fasta aasequence gde

aamultiplealignment

2

1 fasta aasequence gde

NoWorkflow

Input Output Aplications

A.Konagaya, ASTRENA-APBioNet Joint Meeting at 22nd APAN Singapore 18 July 2006

Differences of Generated WorkflowID Description

1052 alignment by blastp from UNIPROT102 identify data format and content.104 extract SeqIdentifier from ddbj BLAST result record109 extract Swiss-plot ACNumber from SequenceIdentifier

10005 idlist[25] from idlist[??]3107 Get SWISSPROT entry of FASTA Format by Accession Number.129 multi fasta format from fasta list4018 multiplealignment by clustalw with blosum1052 alignment by blastp from UNIPROT102 identify data format and content.104 extract SeqIdentifier from ddbj BLAST result record109 extract Swiss-plot ACNumber from SequenceIdentifier1005 idlist[25] from idlist[??]3107 Get SWISSPROT entry of FASTA Format by Accession Number.129 multi fasta format from fasta list4001 multiplealignment by clustalw with blosum1043 alignment by blastp from DAD102 identify data format and content.104 extract SeqIdentifier from ddbj BLAST result record109 extract Swiss-plot ACNumber from SequenceIdentifier

10001 idlist[25] from idlist[??]3107 Get SWISSPROT entry of FASTA Format by Accession Number.129 multi fasta format from fasta list4018 multiplealignment by clustalw with blosum1043 alignment by blastp from DAD102 identify data format and content.104 extract SeqIdentifier from ddbj BLAST result record109 extract Swiss-plot ACNumber from SequenceIdentifier

10001 idlist[5] from idlist[??]3107 Get SWISSPROT entry of FASTA Format by Accession Number.129 multi fasta format from fasta list4001 multiplealignment by clustalw with blosum

No. SolutionNum time(ms) First Match WebServiceCall

1 8 11266

2 41 46704

3 100 over 249906

4 100 over 25297 X?

InputDatabaseOutputFull Cmds

No inputDatabaseNo outputFull Cmds

inputNo DBNo outputPartial Cmds

InputNo DBOutputPartial Cmds

Meta Data

X?

A.Konagaya, ASTRENA-APBioNet Joint Meeting at 22nd APAN Singapore 18 July 2006

Why Failed?

Output

Output

HitTable

HitTable

Lack ofInteroperabilityBetween theWeb Services

UNIPROT

InputAmino AcidSequence

blastp

DAD

InputAmino AcidSequence

blastp

A.Konagaya, ASTRENA-APBioNet Joint Meeting at 22nd APAN Singapore 18 July 2006

Very Similar but not the Same FormatBlastp for UniProt

Blastp for DAD

sp|Q8HXV2|INS_PONPY Insulin precursor [Contains: Insulin B chain... 171 4e-43

L15440-1|AAA59179.1| 107|Homo sapiens insulin protein. 177 1e-43

A.Konagaya, ASTRENA-APBioNet Joint Meeting at 22nd APAN Singapore 18 July 2006

Conclusion

• Web Services have great potential to share Bioinformatics Data ant Tools in all over the world

• Needs Automatic Workflow Generation Tools to make full use of Web Services

• Bioinformatics Ontology is a key to establish Interoperability among Bioinformatics Web Services

A.Konagaya, ASTRENA-APBioNet Joint Meeting at 22nd APAN Singapore 18 July 2006

Acknowledgement

• Daisuke Shinbara Tokyo Institute of Technology (Hitachi, ltd.)

• Sumi Yoshikawa RIKEN GSC, TITECH

ReferencesAkihiko Konagaya: “Bioinformatics Ontology: Towards the Automatics Generation of Bioinformatics Workflow for Web Services,” in Proc. of Distributed Applications, Web Services, Tools and GRID Infrastructures for Bioinformatics (NETTAB2006),S. Margherita di Pula, Italy (http://www.nettab.org/2006/), pp.75-82 (2006)

Akihiko Konagaya: “OBIGrid: Towards the 'Ba' for Sharing Resources, Services and Knowledge for Bioinformatics”, in Proc. of Fourth International Workshop on Biomedical Computations on the Grid (BioGrid), Singapore (CCGRID 2006), 37 (2006)

A.Konagaya, ASTRENA-APBioNet Joint Meeting at 22nd APAN Singapore 18 July 2006ご静聴ありがとうございました。

Thank You for Listening

top related