Taverna: the story from up-above
Antoon Goderis, The University of Manchester, UK
http://www.mygrid.org.uk/taverna
http://www.omii.ac.uk
DART workshop, Brisbane, Australia, 14 December 2006
2
Overview
- The situation in -omics
- Creating new biology using Taverna
- Taverna
  - Key traits
  - Features on the OMII roadmap
    - Including today’s release
3
Bioinformaticians & co.
4
Open environment
Data, Data, Data
EBI
SeqHound
SRS
National Center for Biotechnology Information (USA)
Cambridge, UK
Tokyo, Japan
5
12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt
12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt
12301 gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct
12361 gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt
12421 taggtgactt gcctgttttt ttttaattgg gatcttaatt tttttaaatt attgatttgt
12481 aggagctatt tatatattct ggatacaagt tctttatcag atacacagtt tgtgactatt
12541 ttcttataag tctgtggttt ttatattaat gtttttattg atgactgttt tttacaattg
12601 tggttaagta tacatgacat aaaacggatt atcttaacca ttttaaaatg taaaattcga
12661 tggcattaag tacatccaca atattgtgca actatcacca ctatcatact ccaaaagggc
12721 atccaatacc cattaagctg tcactcccca atctcccatt ttcccacccc tgacaatcaa
12781 taacccattt tctgtctcta tggatttgcc tgttctggat attcatatta atagaatcaa
6
The situation in {genomics, transcriptomics, proteomics, metabolomics ..}
- Lots of data
- Lots of parameters to choose
- An analysis takes a long time
- The analysis services are unreliable
- Lots of analysis steps
- Need to record and explain your steps
7
Enter workflows
- Lots of data [high throughput]
- Lots of parameters to choose [best practice]
- An analysis takes a long time [long running]
- The analysis services are unreliable [fault tolerance]
- Lots of analysis steps [data and control flow]
- Need to record and explain your steps [provenance]
8
12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt
12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt
12301 gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct
12361 gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt
12421 taggtgactt gcctgttttt ttttaattgg
Workflow-based middleware
9
myGrid
- myGrid http://www.mygrid.org.uk
- UK e-Science pilot project since 2001
- Part of the Open Middleware Infrastructure Institute UK
- Build middleware for Life Scientists that enables them to undertake in silico experiments and share those experiments and their results.
- Individual scientists, in under-resourced labs, who use other people’s applications.
- Open source.
- Workflows & Semantic Technologies for metadata management.
- Data flows.
- Ad hoc & exploratory
11
?200
Microarray + QTL
Genes captured in microarray experiment and present in QTL region
Phenotypic response investigated using microarray in form of expressed genes or evidence provided through QTL mapping
Genotype Phenotype
[Andy Brass, Steve Kemp, Paul Fisher, 2006]
12
Key:
A – Retrieve genes in QTL region
B – Annotate genes with external database IDs
C – Cross-reference IDs with KEGG gene IDs
D – Retrieve microarray data from MaxD database
E – For each KEGG gene get the pathways it’s involved in
F – For each pathway get a description of what it does
G – For each KEGG gene get a description of what it does
[Andy Brass, Steve Kemp, Paul Fisher, 2006]
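The steps A to G in the key above can be read as a small data-flow pipeline. The sketch below is only an illustration of that chaining in Python: the lookup tables stand in for the real QTL, KEGG and MaxD services, and every function name, identifier and value in it is hypothetical rather than taken from the original workflow.

```python
# Toy stand-ins for the remote services; all names and values are hypothetical.
QTL_GENES = {"qtl_region_1": ["geneA", "geneB", "geneC"]}                 # step A
EXTERNAL_IDS = {"geneA": "ext:1", "geneB": "ext:2", "geneC": "ext:3"}     # step B
KEGG_IDS = {"ext:1": "kegg:g1", "ext:2": "kegg:g2"}                       # step C (ext:3 unmapped)
MICROARRAY_GENES = {"geneA", "geneC"}                                     # step D
PATHWAYS = {"kegg:g1": ["path:p1", "path:p2"], "kegg:g2": ["path:p2"]}    # step E
PATHWAY_DESC = {"path:p1": "toy pathway one", "path:p2": "toy pathway two"}  # step F
GENE_DESC = {"kegg:g1": "toy gene one", "kegg:g2": "toy gene two"}        # step G


def run_pipeline(region):
    """Chain steps A to G for one QTL region and return a report per gene."""
    report = {}
    for gene in QTL_GENES.get(region, []):                  # A: genes in the region
        ext_id = EXTERNAL_IDS.get(gene)                     # B: external database ID
        kegg_id = KEGG_IDS.get(ext_id)                      # C: cross-reference to KEGG
        if kegg_id is None:
            continue  # no cross-reference, so nothing further to look up
        pathways = PATHWAYS.get(kegg_id, [])                # E: pathways for the gene
        report[gene] = {
            "kegg_id": kegg_id,
            "in_microarray": gene in MICROARRAY_GENES,      # D: seen in microarray data
            "gene_description": GENE_DESC.get(kegg_id),     # G: gene description
            "pathways": {p: PATHWAY_DESC.get(p) for p in pathways},  # F: pathway descriptions
        }
    return report
```

Calling `run_pipeline("qtl_region_1")` on the toy tables returns entries for `geneA` and `geneB` only, because `geneC` has no KEGG cross-reference.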
13
Result
- Captured the pathways returned by QTL and Microarray workflows over the MaxD microarray database
- Identified a pathway whose correlating gene (Daxx) is believed to play a role in trypanosomiasis resistance.
- Manual analysis of the microarray and QTL data had failed to identify this gene as a candidate.
[Andy Brass, Steve Kemp, Paul Fisher, 2006]
14
Trichuris muris (mouse whipworm) infection
Identified the biological pathways involved in sex dependence in the mouse model, previously believed to be involved in the ability of mice to expel the parasite.
Manual experimentation: two-year study of candidate genes, processes unidentified
Workflows: the workflow from the trypanosomiasis cattle experiment was reused without change.
Analysis of the data by a biologist found the processes in a couple of days.
[Joanne Pennock, Paul Fisher, 2006]
15
Changing scientific practice
- Systematic and comprehensive automation.
  - Eliminated user bias and premature filtering of datasets and results leading to single-sided, expert-driven hypotheses
- Dry people hypothesise, wet people validate.
  - “make sense of this data” -> “does this make sense?”
- Workflow factories.
  - Different dataset, different result
- Accurate provenance.
17
User Uptake
- ~25000 downloads
- Systems biology
- Proteomics
- Gene/protein annotation
- Microarray data analysis
- Medical image analysis
- Heart simulations
- High throughput screening
- Phenotypical studies
- Plants, Mouse, Human
- Astronomy
- Dilbert Cartoons
18
Finding and Sharing Tools
[Figure: architecture diagram. Labels: Taverna Workbench; 3rd Party Applications and Portals; Workflow Enactor; Service Management; Results Management; Provenance log; Metadata; Default Data Store; Custom Store; DAS; KAVE; BAKLAVA; Feta; myExperiment; Utopia; Clients; LSIDs]
19
Taverna workbench
20
3000+ services
- Open domain services and resources, Third party.
- Enforce NO common data model.
- No common typing, Missing metadata.
Soaplab
InstantSoap
21
Services Landscape
22
User Interaction
- Allows a workflow to call out to an expert human user
- E.g. used to embed the Artemis annotation editor within an otherwise automated genome annotation pipeline
[University of Bergen]
23
Tools, Tools, Tools
Feta Search tool
Pedro Annotation tool
24
Capture and Curation Effort
Ontology and Annotation Curation Team
Franck Tanoh and Katy Wolstencroft
Community Service Providers
Community Scientists
25
Scufl Model
Simple Conceptual Unified Flow Language
- Nested workflows, automatic iterations, best guess data type handling
- Shielding & extensible plug-ins
[Figure: the Taverna Workbench and the Workflow Execution Application sit above the workflow enactor, which drives one processor per service type. Processor types shown: Plain Web Service, Soaplab, Local Java App, WF Enactor, BioMOBY, SeqHound, BioMART, WSRF, Beanshell]
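The idea of automatic iteration can be pictured as follows: when a processor that expects a single item is handed a list, it is invoked once per element and the results are collected in the same list shape. The sketch below is a simplified Python illustration of that idea, not Scufl or Taverna code; `invoke` and `to_upper` are hypothetical names.

```python
def invoke(processor, value):
    """Apply `processor` to a single item, or map it over (nested) lists."""
    if isinstance(value, list):
        # Iterate implicitly, preserving the list structure of the input.
        return [invoke(processor, item) for item in value]
    return processor(value)


def to_upper(sequence):
    """Hypothetical processor that works on one sequence string."""
    return sequence.upper()
```

With this helper the workflow author wires `to_upper` once and can feed it either a single sequence or a list of sequences, without writing an explicit loop.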
26
Service incompatibility
- Fix up the services to be compatible or...
- Shims: libraries of adapters.
- Automated data type matching using reasoning over a mismatch and service ontology
Duncan Hull, myGrid
Khalid Belhajjame, ISPIDER
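A shim can be thought of as a small adapter placed between two services whose output and input formats do not match. The sketch below assumes a hypothetical upstream service that emits FASTA text and a hypothetical downstream service that accepts only a bare sequence string; the function names are illustrative and do not come from any shim library mentioned in the talk.

```python
def fasta_to_raw_sequence(fasta_text):
    """Shim: drop FASTA header lines and join the remaining sequence lines."""
    lines = [line.strip() for line in fasta_text.splitlines()]
    return "".join(line for line in lines if line and not line.startswith(">"))


def gc_content(sequence):
    """Hypothetical downstream service that only accepts a raw sequence."""
    if not sequence:
        return 0.0
    return sum(base in "GCgc" for base in sequence) / len(sequence)
```

The shim carries no scientific logic of its own; it only reshapes data so that `gc_content(fasta_to_raw_sequence(text))` becomes a valid connection.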
27
Shim identification
Mismatch detection
28
Service failure?
- Most services are owned by other people
- No control over service failure
- Some are research level
- Workflows only as good as the services they connect.
  - Notify failures
  - Instigate retries
  - Set criticality
  - Substitute services
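The four coping strategies listed above (notify, retry, criticality, substitution) can be combined in one small wrapper. This is a minimal sketch under stated assumptions, not Taverna's fault-tolerance code: `ServiceError`, `call_with_retries` and its parameters are hypothetical names, and "notification" is reduced to recording each failure in a list.

```python
import time


class ServiceError(Exception):
    """Raised by a (hypothetical) service call that fails."""


def call_with_retries(services, payload, retries=2, delay=0.0, critical=True):
    """Try each service in order, retrying each before substituting the next.

    Returns the first successful result. If every service fails, raise for a
    critical step, or return None so the rest of the workflow can carry on.
    """
    failures = []
    for service in services:
        for attempt in range(1 + retries):
            try:
                return service(payload)
            except ServiceError as exc:
                # Notify: here the failure is only recorded.
                failures.append((service.__name__, attempt, str(exc)))
                time.sleep(delay)
    if critical:
        raise ServiceError(f"all services failed: {failures}")
    return None
```

Passing a list such as `[primary, mirror]` expresses service substitution, while `critical=False` marks a step whose failure should not stop the whole run.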
29
Provenance
- Collection
  - Observes events from the workflow engine
  - Populates an RDF triple store with information from these events
- Browse interface
  - Simple browser replicates Taverna’s existing result and status browser
  - Graphical browser
- ProQA Query API
[Figure: provenance graph. Legend: Data generated by services/workflows; Concepts; Services; literals; Properties. Node labels include urn:data1, urn:data2, urn:data12, urn:data:3, urn:data:f1, urn:data:f2, urn:hit1…, urn:hit2…, urn:hit5…, urn:hit8…, urn:hit10…, urn:hit50…, urn:BlastNInvocation3, urn:compareinvocation3, urn:invocation5, Blast_report, SwissProt_seq, Sequence_hit, Find similar sequence, New sequence, Missed sequence, DatumCollection, LSDatum. Edge labels: [input], [output], [instanceOf], [type], [hasHits], [hasName], [contains], [performsTask], [similar_sequence_to], [directlyDerivedFrom], [distantlyDerivedFrom]]
[Zhao et al 07 provenance challenge paper]
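The collection step described on this slide, in which workflow events are turned into triples, can be sketched with a tiny in-memory store. This is an assumption-laden illustration in plain Python, not the provenance component itself and not a real RDF library: `TripleStore`, `record_invocation` and the URNs used with them are hypothetical.

```python
class TripleStore:
    """Minimal in-memory stand-in for an RDF triple store."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return sorted triples matching the pattern; None is a wildcard."""
        return sorted(
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


def record_invocation(store, invocation, inputs, outputs):
    """Turn one 'service invoked' event into input and output triples."""
    for item in inputs:
        store.add(invocation, "input", item)
    for item in outputs:
        store.add(invocation, "output", item)
```

A browser or query layer would then ask pattern questions such as `store.match(predicate="output")` to list everything a run produced.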
31
Provenance Tracking
Which Ensembl gene does pathway mmu004620 come from?
32
Workflows over Results
Automatically backtrack through the data provenance graph
[Figure: identifiers Pathway_id, KEGG_id, Uniprot, Ensembl_gene_id and Entrez, linked by edges labelled dF]
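Backtracking through a provenance graph amounts to following derivation edges from a result back to its sources. The sketch below shows one way to do that with a depth-first walk; the edge table and identifiers are invented for illustration, and `backtrack` is a hypothetical helper rather than part of Taverna.

```python
def backtrack(derived_from, start):
    """Return every item reachable from `start` by following derivation edges."""
    seen = []
    stack = [start]
    while stack:
        node = stack.pop()
        for parent in derived_from.get(node, []):
            if parent not in seen:
                seen.append(parent)
                stack.append(parent)
    return seen


# Hypothetical derivation edges: each key was derived from the listed values.
DERIVED_FROM = {
    "pathway:p1": ["kegg:g1"],
    "kegg:g1": ["uniprot:u1", "entrez:e1"],
    "uniprot:u1": ["ensembl:gene1"],
    "entrez:e1": ["ensembl:gene1"],
}
```

Asking `backtrack(DERIVED_FROM, "pathway:p1")` answers the kind of question posed on the previous slide: the ancestors of the pathway include the Ensembl gene it ultimately came from.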
33
A workflow marketplace
34
webTaverna GUI - main
36
[Figure: release process diagram. Labels: Ingest; Evaluation; Prioritise & Plan; Production; myGrid Pre-release; myGrid Release; OMII-UK Release; Pioneers; Early adopters; Conservatives; Software Engineering; XP; Quality & Test; OMII Software Engineering; Applications & Professional Services; myGrid Alliance; Source-forge community]
37
Who are the OMII Users?
Increasing variation in requirements with the scientific domain.
Different scientific/research domains
Different activities:
- End Users
- Application Developers
- Service and Middleware Developers
- Middleware Deployers
- Systems Administrators
38
Taverna is now part of OMII-UK
- Taverna 1.5 – Today!
- Taverna 1.6
- myExperiment
39
Taverna 1.5
- Integrated provenance
- Raven release mechanism to simplify updates for the user
- +/- 300 semantic annotations for core services
- Patterns for using proxies for bulk data transactions
- Redeveloped plug-in and enactor framework, improved iteration events, data management
42
Taverna 1.5
- Integrated provenance
- Raven release mechanism to simplify updates for the user
- +/- 300 semantic annotations for core services

This workflow takes in Entrez gene ids then adds the string "ncbi-geneid:" to the start of each gene id. These gene ids are then cross-referenced to KEGG gene ids. Each KEGG gene id is then sent to the KEGG pathway database and its relevant pathways returned.

- Add_ncbi_to_string: beanshell script, need to ask Paul for more details
  - Input:
  - Output:
- Kegg_gene_ids_all_species (bconv): converts external IDs to KEGG IDs [mapping]
  - string: External ID, e.g. NCBI ID [Genebank_GI]
  - return: KEGG gene ID [KEGG_record_id]
- Get_pathways_by_genes: Search all pathways which include all the given genes [Searching]
  - Input: List of KEGG genes id [KEGG_gene_id]
  - Output: Return a list of pathway_id of specified KEGG genes_id
- Merge_pathways
  - Stringlist
  - Concatenated
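The annotated workflow on this slide (prefix each Entrez gene id, map it to a KEGG gene id, fetch its pathways, merge the lists) can be mimicked in a few lines. The sketch below uses hypothetical lookup tables in place of the KEGG services, and all identifiers in it are invented; merging is done by plain concatenation, following the "Concatenated" note on the slide.

```python
# Hypothetical lookup tables standing in for the KEGG services.
KEGG_GENE_IDS = {"ncbi-geneid:1001": "kegg:g1", "ncbi-geneid:1002": "kegg:g2"}
KEGG_PATHWAYS = {"kegg:g1": ["path:p1", "path:p2"], "kegg:g2": ["path:p2"]}


def add_ncbi_to_string(entrez_ids):
    """Prefix each Entrez gene id with 'ncbi-geneid:'."""
    return [f"ncbi-geneid:{gene_id}" for gene_id in entrez_ids]


def pathways_for_entrez_ids(entrez_ids):
    """Cross-reference to KEGG gene ids, fetch pathways, and concatenate them."""
    merged = []
    for external_id in add_ncbi_to_string(entrez_ids):
        kegg_id = KEGG_GENE_IDS.get(external_id)
        # Unknown ids map to None and contribute no pathways.
        merged.extend(KEGG_PATHWAYS.get(kegg_id, []))
    return merged
```

Because the merge is a concatenation, a pathway shared by several genes appears once per gene in the result.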
44
Taverna 1.6
- Due out Summer 2007
- Revised enactment core
- Native support for long running workflows
- Data proxy to deal with bulk data transactions
- Improved service discovery and provenance management
46
Obtaining Taverna
- Taverna is available under the LGPL from our project site on Sourceforge.net
  - http://taverna.sourceforge.net
- Win32, Solaris / Linux & OS-X
- Includes online and downloadable user manual, examples etc.
- Support via project mailing lists
47
Conclusions
- See plans for Taverna 2.0 on myGrid wiki
- Taverna development is user-driven
- Please keep in touch and tell us what you would like to see via the myGrid mailing lists: Taverna Users, Taverna Hackers
Taverna http://taverna.sourceforge.net
myGrid http://www.mygrid.org.uk
OMII-UK http://www.omii.ac.uk
48
Acknowledgements
- Phase 1 myGrid researchers; Phase 2 OMII-UK, myGrid Research Team
- Peter Li, Paul Fisher, Andy Brass, Robert Stevens, Mark Wilkinson
- EPSRC, Wellcome Foundation, EU