myGrid and Taverna: Now and in the Future
Dr. K. Wolstencroft, University of Manchester. Helsinki, June 2006.
![Page 1: my Grid and Taverna: Now and in the Future](https://reader035.vdocuments.us/reader035/viewer/2022062423/5681492a550346895db6615f/html5/thumbnails/1.jpg)
myGrid and Taverna: Now and in the Future
Dr. K. Wolstencroft
University of Manchester
Helsinki, June 2006
![Page 2: my Grid and Taverna: Now and in the Future](https://reader035.vdocuments.us/reader035/viewer/2022062423/5681492a550346895db6615f/html5/thumbnails/2.jpg)
Background
• myGrid middleware components to support in silico experiments in biology
• Originally designed to support bioinformatics
chemoinformatics
health informatics
medical imaging
integrative biology
![Page 3: my Grid and Taverna: Now and in the Future](https://reader035.vdocuments.us/reader035/viewer/2022062423/5681492a550346895db6615f/html5/thumbnails/3.jpg)
History
EPSRC funded UK eScience Program Pilot Project
![Page 4: my Grid and Taverna: Now and in the Future](https://reader035.vdocuments.us/reader035/viewer/2022062423/5681492a550346895db6615f/html5/thumbnails/4.jpg)
myGrid in OMII-UK
OGSA-DAI
myGrid
OMII Stack
March 2006
10 Developers
Dedicated design, implementation, testing and support team – moving towards production quality software
![Page 5: my Grid and Taverna: Now and in the Future](https://reader035.vdocuments.us/reader035/viewer/2022062423/5681492a550346895db6615f/html5/thumbnails/5.jpg)
Lots of Resources
NAR 2006 – over 850 databases
![Page 6: my Grid and Taverna: Now and in the Future](https://reader035.vdocuments.us/reader035/viewer/2022062423/5681492a550346895db6615f/html5/thumbnails/6.jpg)
The User Community
Bioinformatics is an open community
• Open access to data
• Open access to resources
• Open access to tools
• Open access to applications
Global in silico biological research
![Page 7: my Grid and Taverna: Now and in the Future](https://reader035.vdocuments.us/reader035/viewer/2022062423/5681492a550346895db6615f/html5/thumbnails/7.jpg)
The User Community Problems
• Everything is Distributed
– Data, Resources and Scientists
• Heterogeneous data
• Very few standards
– I/O formats, data representation, annotation
– Everything is a string!
Integration of data and interoperability of resources is difficult
![Page 8: my Grid and Taverna: Now and in the Future](https://reader035.vdocuments.us/reader035/viewer/2022062423/5681492a550346895db6615f/html5/thumbnails/8.jpg)
ID   MURA_BACSU     STANDARD;      PRT;   429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
DE   (EC 2.5.1.7) (ENOYLPYRUVATE TRANSFERASE) (UDP-N-ACETYLGLUCOSAMINE
DE   ENOLPYRUVYL TRANSFERASE) (EPT).
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
OC   BACTERIA; FIRMICUTES; BACILLUS/CLOSTRIDIUM GROUP; BACILLACEAE;
OC   BACILLUS.
KW   PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
FT   ACT_SITE    116    116       BINDS PEP (BY SIMILARITY).
FT   CONFLICT    374    374       S -> A (IN REF. 3).
SQ   SEQUENCE   429 AA;  46016 MW;  02018C5C CRC32;
     MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT IEGLPEISDI ETLRDLLKEI
     GGNVHFENGE MVVDPTSMIS MPLPNGKVKK LRASYYLMGA MLGRFKQAVI GLPGGCHLGP
     RPIDQHIKGF EALGAEVTNE QGAIYLRAER LRGARIYLDV VSVGATINIM LAAVLAEGKT
     IIENAAKEPE IIDVATLLTS MGAKIKGAGT NVIRIDGVKE LHGCKHTIIP DRIEAGTFMI
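The record above looks like a Swiss-Prot style flat-file entry, where each line starts with a two-letter line code (ID, DE, OS and so on). As a minimal sketch of what "everything is a string" means for a client, the Python below recovers structure from a shortened copy of that record. The function name `parse_flat_record` is hypothetical, and only the line codes shown here are handled.

```python
from collections import defaultdict

# A shortened copy of the record shown on the slide.
RECORD = """\
ID   MURA_BACSU     STANDARD;      PRT;   429 AA.
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
OC   BACTERIA; FIRMICUTES; BACILLUS/CLOSTRIDIUM GROUP; BACILLACEAE;
OC   BACILLUS.
KW   PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
"""

def parse_flat_record(text):
    """Group the lines of a flat-file record by their two-letter line code."""
    fields = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        code, value = line[:2], line[2:].strip()
        fields[code].append(value)
    # Continuation lines repeat the code, so join them back into one string.
    return {code: " ".join(values) for code, values in fields.items()}

record = parse_flat_record(RECORD)
entry_name = record["ID"].split()[0]
keywords = [k.strip() for k in record["KW"].rstrip(".").split(";")]
```

Every consumer of the record has to repeat this kind of ad hoc parsing, which is one reason integration across resources is hard.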
![Page 9: my Grid and Taverna: Now and in the Future](https://reader035.vdocuments.us/reader035/viewer/2022062423/5681492a550346895db6615f/html5/thumbnails/9.jpg)
myGrid Approach - Workflows
General technique for describing and enacting a process
describes what you want to do, not how you want to do it
Simple language specifies how bioinformatics processes fit together – processes are web services
- High level workflow diagram separated from any lower level coding – therefore, you don’t have to be a coder to build workflows
Figure: example workflow. Sequence in → RepeatMasker Web Service → GenScan Web Service → Blast Web Service → Predicted Genes out
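To illustrate "what, not how", here is a minimal Python sketch of the RepeatMasker, GenScan and Blast example written as a declarative description plus a tiny enactor. It is not Scufl and not the Taverna API. The three functions are hypothetical local placeholders for remote web services, and the enactor only handles a linear chain.

```python
# Hypothetical placeholders for remote web services (not real biology).
def repeat_masker(sequence):
    return sequence.replace("atat", "NNNN")

def gen_scan(masked):
    return [masked[i:i + 4] for i in range(0, len(masked), 4)]

def blast(genes):
    return {gene: f"hit-for-{gene}" for gene in genes}

# The workflow states WHAT is connected to what, not HOW each step runs.
WORKFLOW = {
    "processors": {
        "RepeatMasker": repeat_masker,
        "GenScan": gen_scan,
        "Blast": blast,
    },
    "links": [
        ("input", "RepeatMasker"),
        ("RepeatMasker", "GenScan"),
        ("GenScan", "Blast"),
        ("Blast", "output"),
    ],
}

def enact(workflow, value):
    """Follow the links from 'input' to 'output', invoking each processor."""
    next_of = dict(workflow["links"])
    node = next_of["input"]
    while node != "output":
        value = workflow["processors"][node](value)
        node = next_of[node]
    return value

result = enact(WORKFLOW, "ggccatatggcc")
```

Swapping one service for another only changes an entry in `processors`; the links stay the same.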
![Page 10: my Grid and Taverna: Now and in the Future](https://reader035.vdocuments.us/reader035/viewer/2022062423/5681492a550346895db6615f/html5/thumbnails/10.jpg)
Workflow Execution
Figure labels: Taverna Workbench; Scufl + Workflow Object Model; Freefluo workflow enactor
Processor types: Plain Web Service, Soaplab, Local App, BioMOBY, SeqHound, BioMART
SCUFL layers:
• Application data flow layer: Scufl graph + service introspection
• Execution flow layer: list management; implicit iteration mechanism; MIME & semantic type decoration; fault management; service alternates
• Processor invocation layer
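The execution flow layer lists an implicit iteration mechanism. A simplified Python sketch of the idea follows: when a processor that expects a single item receives a list, the call is repeated for each element. The helper names `depth` and `invoke` are hypothetical, and only the single-input case is shown.

```python
def depth(value):
    """Nesting depth: 0 for a single item, 1 for a list of items, and so on."""
    if not isinstance(value, list):
        return 0
    return 1 + (depth(value[0]) if value else 0)

def invoke(processor, value, expected_depth=0):
    """Call the processor directly if depths match, otherwise iterate."""
    if depth(value) > expected_depth:
        return [invoke(processor, item, expected_depth) for item in value]
    return processor(value)

def sequence_length(seq):
    # Stand-in for a service that accepts one sequence at a time.
    return len(seq)

single = invoke(sequence_length, "MEKLNIAGGD")
many = invoke(sequence_length, ["MEK", "SLNGT"])
nested = invoke(sequence_length, [["MEK"], ["SLNGT", "AK"]])
```

The workflow author connects a list output to a single-item input and the iteration happens without any explicit loop in the workflow.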
![Page 11: my Grid and Taverna: Now and in the Future](https://reader035.vdocuments.us/reader035/viewer/2022062423/5681492a550346895db6615f/html5/thumbnails/11.jpg)
Taverna Workflow Components
Scufl: Simple Conceptual Unified Flow Language
Taverna: Writing, running workflows & examining results
SOAPLAB: Makes applications available
Freefluo: Workflow engine to run workflows
Figure labels: Freefluo; SOAPLAB Web Service (Any Application); Web Service, e.g. DDBJ BLAST
![Page 12: my Grid and Taverna: Now and in the Future](https://reader035.vdocuments.us/reader035/viewer/2022062423/5681492a550346895db6615f/html5/thumbnails/12.jpg)
What Services we Support
![Page 13: my Grid and Taverna: Now and in the Future](https://reader035.vdocuments.us/reader035/viewer/2022062423/5681492a550346895db6615f/html5/thumbnails/13.jpg)
User Interaction Handling
• Interaction Service and corresponding Taverna processor allows a workflow to call out to an expert human user
• Used to embed the Artemis annotation editor within an otherwise automated genome annotation pipeline
Collaboration with the University of Bergen
Ref: Poster, Nettab 2005
• R for numerical analysis (microarray informatics amongst others)
![Page 14: my Grid and Taverna: Now and in the Future](https://reader035.vdocuments.us/reader035/viewer/2022062423/5681492a550346895db6615f/html5/thumbnails/14.jpg)
What shall I do when a service fails?
• Most services are owned by other people
• No control over service failure
• Some are research level
Workflows are only as good as the services they connect!
To help - Taverna can:
• Notify failures
• Instigate retries
• Set criticality
• Substitute services
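The four behaviours above can be sketched in a few lines of Python. This is an illustration of the pattern under stated assumptions, not Taverna's implementation: `call_with_fault_handling`, `ServiceFailure` and the two example services are hypothetical names.

```python
class ServiceFailure(Exception):
    """Raised by a service stub to signal a failed call."""

def call_with_fault_handling(services, payload, retries=2, critical=True, notify=print):
    """Try each service in order, retrying each before moving to an alternate."""
    for service in services:
        for attempt in range(1, retries + 2):
            try:
                return service(payload)
            except ServiceFailure as err:
                # Notify failures
                notify(f"{service.__name__} failed (attempt {attempt}): {err}")
    if critical:
        # Critical step: the whole workflow stops.
        raise ServiceFailure("all services failed; stopping the workflow")
    return None  # Non-critical step: the workflow continues without a result.

def flaky_primary(payload):
    raise ServiceFailure("timeout")

def mirror(payload):
    return payload.upper()

log = []
# One retry on the primary, then substitute the alternate service.
result = call_with_fault_handling(
    [flaky_primary, mirror], "acgt", retries=1, notify=log.append)
# No alternates and not critical: the failure is logged and skipped.
skipped = call_with_fault_handling(
    [flaky_primary], "acgt", retries=0, critical=False, notify=log.append)
```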
![Page 15: my Grid and Taverna: Now and in the Future](https://reader035.vdocuments.us/reader035/viewer/2022062423/5681492a550346895db6615f/html5/thumbnails/15.jpg)
myGrid Users
• ~20000 downloads
• Users in US, Singapore, UK, Europe, Australia
• Systems biology
• Proteomics
• Gene/protein annotation
• Microarray data analysis
• Medical image analysis
![Page 16: my Grid and Taverna: Now and in the Future](https://reader035.vdocuments.us/reader035/viewer/2022062423/5681492a550346895db6615f/html5/thumbnails/16.jpg)
Trypanosomiasis Study
Resistance to trypanosomiasis in cattle in Kenya
Andy Brass, Paul Fisher – University of Manchester
• Form of sleeping sickness in cattle, known as n’gana
• Caused by Trypanosoma brucei
![Page 17: my Grid and Taverna: Now and in the Future](https://reader035.vdocuments.us/reader035/viewer/2022062423/5681492a550346895db6615f/html5/thumbnails/17.jpg)
Study involves
Microarray data
QTL
SNPs
Metabolic pathway analysis
Need to access microarray data, genomic sequence information, pathway databases AND integrate the results
![Page 18: my Grid and Taverna: Now and in the Future](https://reader035.vdocuments.us/reader035/viewer/2022062423/5681492a550346895db6615f/html5/thumbnails/18.jpg)
![Page 19: my Grid and Taverna: Now and in the Future](https://reader035.vdocuments.us/reader035/viewer/2022062423/5681492a550346895db6615f/html5/thumbnails/19.jpg)
Workflow Reuse
Addison's disease
SNP design
Protein annotation
Microarray analysis
myGrid Workflow Repository
http://workflows.mygrid.org.uk/repository
![Page 20: my Grid and Taverna: Now and in the Future](https://reader035.vdocuments.us/reader035/viewer/2022062423/5681492a550346895db6615f/html5/thumbnails/20.jpg)
Components designed to work together
• Scufl Workflows + Taverna Workflow Workbench
• OGSA-Distributed Query Processing
• Results management: LSID, mIR
• e-Science coordination: e-Science mediator, e-Science process patterns, e-Science events, Notification service
• myGrid information model
• Metadata & provenance management using semantics: KAVE
• Service management; Publication and Discovery using semantics: Feta, Pedro, Ontology
• Portal & Application tools
![Page 21: my Grid and Taverna: Now and in the Future](https://reader035.vdocuments.us/reader035/viewer/2022062423/5681492a550346895db6615f/html5/thumbnails/21.jpg)
Data Management
• Workflows can generate vast amounts of data - how can we manage and track it?
• Data AND metadata AND experiment provenance
• LSIDs - to identify objects
• Semantic Web technologies (RDF, Ontologies)
– To store knowledge provenance
• Taverna workflow workbench & plugins
– Ensure automated recording
![Page 22: my Grid and Taverna: Now and in the Future](https://reader035.vdocuments.us/reader035/viewer/2022062423/5681492a550346895db6615f/html5/thumbnails/22.jpg)
KAVE Data and metadata management
• Life Science Identifiers (LSIDs)
• Information Model
• File management
• Support for custom database building
• Provenance metadata capture using RDF
• SRB integration
• OGSA-DAI integration
Figure: provenance graph captured as RDF.
Data generated by services/workflows: urn:data1, urn:data2, urn:data12, urn:data:3, urn:data:f1, urn:data:f2, and hits such as urn:hit1…, urn:hit2…., urn:hit5…, urn:hit8…., urn:hit10….., urn:hit50…..
Services: urn:compareinvocation3, urn:BlastNInvocation3, urn:invocation5; task label: Find similar sequence
Concepts: Blast_report, SwissProt_seq, Sequence_hit, New sequence, Missed sequence, DatumCollection, LSDatum
Properties: [input], [output], [instanceOf], [type], [hasHits], [contains], [hasName], [performsTask], [similar_sequence_to], [directlyDerivedFrom], [distantlyDerivedFrom]; names are held as literals
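As a rough Python illustration of identifiers plus provenance, the sketch below parses an LSID (general form `urn:lsid:authority:namespace:object[:revision]`) and walks derivation links stored as subject, predicate, object triples. The authority `example.org` and the particular arrangement of triples are assumptions made for the example; only the node and property labels come from the figure.

```python
from typing import NamedTuple, Optional

class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None

def parse_lsid(text):
    """Split an LSID of the form urn:lsid:authority:namespace:object[:revision]."""
    parts = text.split(":")
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    return LSID(*parts[2:])

# Hypothetical arrangement of provenance triples using labels from the figure.
TRIPLES = [
    ("urn:BlastNInvocation3", "input", "urn:data1"),
    ("urn:BlastNInvocation3", "output", "urn:data2"),
    ("urn:data2", "instanceOf", "Blast_report"),
    ("urn:data2", "directlyDerivedFrom", "urn:data1"),
    ("urn:compareinvocation3", "input", "urn:data2"),
    ("urn:compareinvocation3", "output", "urn:data12"),
    ("urn:data12", "directlyDerivedFrom", "urn:data2"),
]

def lineage(item, triples):
    """Follow directlyDerivedFrom links back towards the original inputs."""
    found = []
    for subject, predicate, obj in triples:
        if subject == item and predicate == "directlyDerivedFrom":
            found.append(obj)
            found.extend(lineage(obj, triples))
    return found

identifier = parse_lsid("urn:lsid:example.org:data:f2")
ancestors = lineage("urn:data12", TRIPLES)
```

With every data item identified and every invocation recorded, a question such as "which inputs did this result come from?" becomes a graph query.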
![Page 23: my Grid and Taverna: Now and in the Future](https://reader035.vdocuments.us/reader035/viewer/2022062423/5681492a550346895db6615f/html5/thumbnails/23.jpg)
Provenance Browsing in Taverna
New in Taverna 1.4
![Page 24: my Grid and Taverna: Now and in the Future](https://reader035.vdocuments.us/reader035/viewer/2022062423/5681492a550346895db6615f/html5/thumbnails/24.jpg)
Feta Semantic Discovery
Over 3000 services!
Find services by their function
Questions we can ask:
Find me all the services that perform a multiple sequence alignment and accept protein sequences in FASTA format as input
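A query like this can be sketched in Python as a filter over annotated service descriptions, with a small is-a hierarchy standing in for the ontology. This is not Feta's actual query interface. The service names, the annotation fields and the exact hierarchy are assumptions made for illustration.

```python
# Toy is-a hierarchy (child -> parent); an assumption, not the myGrid ontology.
IS_A = {
    "protein_sequence": "biological_sequence",
    "nucleotide_sequence": "biological_sequence",
    "DNA_sequence": "nucleotide_sequence",
    "biological_sequence": "sequence",
}

def is_kind_of(concept, ancestor):
    """True if concept equals ancestor or specialises it."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = IS_A.get(concept)
    return False

# Hypothetical annotated service descriptions.
SERVICES = [
    {"name": "align_proteins", "task": "multiple_sequence_alignment",
     "input": "protein_sequence", "format": "FASTA"},
    {"name": "align_dna", "task": "multiple_sequence_alignment",
     "input": "DNA_sequence", "format": "FASTA"},
    {"name": "align_any", "task": "multiple_sequence_alignment",
     "input": "biological_sequence", "format": "FASTA"},
    {"name": "search_proteins", "task": "similarity_search",
     "input": "protein_sequence", "format": "FASTA"},
]

def find_services(task, input_concept, fmt):
    """Services that perform the task and accept the given kind of input."""
    return [s["name"] for s in SERVICES
            if s["task"] == task
            and s["format"] == fmt
            and is_kind_of(input_concept, s["input"])]

matches = find_services("multiple_sequence_alignment", "protein_sequence", "FASTA")
```

The service annotated with the broader concept `biological_sequence` is found as well, which is the benefit of reasoning over a hierarchy instead of matching strings.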
![Page 25: my Grid and Taverna: Now and in the Future](https://reader035.vdocuments.us/reader035/viewer/2022062423/5681492a550346895db6615f/html5/thumbnails/25.jpg)
myGrid Ontology
Figure: Upper level ontology, Task ontology, Informatics ontology, Molecular Biology ontology, Bioinformatics ontology and Web Service ontology, linked by "Specialises" and "Contributes to" relations.
Example data concepts: sequence, biological_sequence, protein_sequence, nucleotide_sequence, DNA_sequence, protein_structure_feature
Example service concepts: Similarity Search Service, BLAST service, BLASTp service, InterProScan service
![Page 26: my Grid and Taverna: Now and in the Future](https://reader035.vdocuments.us/reader035/viewer/2022062423/5681492a550346895db6615f/html5/thumbnails/26.jpg)
Feta Architecture
Figure labels: Feta Engine; Service Semantic Discovery; Taverna Workbench; Feta GUI Client; DL Reasoner; Ontology Editor; Ontologist; User; Feta Descriptions
Figure steps: Build myGrid Domain Ontology; Classification in RDF(S); Obtain Classification; Obtain descriptions
![Page 27: my Grid and Taverna: Now and in the Future](https://reader035.vdocuments.us/reader035/viewer/2022062423/5681492a550346895db6615f/html5/thumbnails/27.jpg)
Annotations
• Feta has been available for ~1 year
• Not yet in the release
• Need critical mass of services before release
• Annotation experiments with users and domain experts
• Domain expert annotations much better
– hiring a full-time annotator
– see the myGrid website for details
![Page 28: my Grid and Taverna: Now and in the Future](https://reader035.vdocuments.us/reader035/viewer/2022062423/5681492a550346895db6615f/html5/thumbnails/28.jpg)
Gene annotation pipeline workflow
Integration and visualisation of GD annotation workflow results
Provenance Record
Custom Data Model
Input
Result
Results Integration
Smarter workflow design incorporating visualisation
VBI collaboration
![Page 29: my Grid and Taverna: Now and in the Future](https://reader035.vdocuments.us/reader035/viewer/2022062423/5681492a550346895db6615f/html5/thumbnails/29.jpg)
Utopia
SeqVista
Visualisation
![Page 30: my Grid and Taverna: Now and in the Future](https://reader035.vdocuments.us/reader035/viewer/2022062423/5681492a550346895db6615f/html5/thumbnails/30.jpg)
New Plans for Taverna 2.0
![Page 31: my Grid and Taverna: Now and in the Future](https://reader035.vdocuments.us/reader035/viewer/2022062423/5681492a550346895db6615f/html5/thumbnails/31.jpg)
Evolving challenges
• Long running data intensive workflows
• Manipulation of confidential or otherwise protected information
• Use with classical grid systems
• Interaction with users during workflows
![Page 32: my Grid and Taverna: Now and in the Future](https://reader035.vdocuments.us/reader035/viewer/2022062423/5681492a550346895db6615f/html5/thumbnails/32.jpg)
Development
• Development of Taverna 2.0
– reworking of the processor model to include dual execution semantics incorporating data and control flow
– enhanced support for long-running workflows
• fully distributed workflow enactment and authoring
• User steering
– large scale data transfer
![Page 33: my Grid and Taverna: Now and in the Future](https://reader035.vdocuments.us/reader035/viewer/2022062423/5681492a550346895db6615f/html5/thumbnails/33.jpg)
Enhanced Processor Model
• Modular dispatcher mechanism
– Dynamic service binding
– Recursive invocation
– Data filter implementation
– Retry, failover, back-off behaviours
• Transparent third party data transfers
• High throughput stream handling with implicit iteration semantics
![Page 34: my Grid and Taverna: Now and in the Future](https://reader035.vdocuments.us/reader035/viewer/2022062423/5681492a550346895db6615f/html5/thumbnails/34.jpg)
3rd Party Data Transfers
• Allows ‘in place’ referencing of data
– Large data sets no longer round-trip between workflow engine and data provider
– Allows restricted access to sensitive data
• Automatic de-reference when a reference type is linked to a value type within a workflow.
– Connecting a grid service to a web service
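A minimal Python sketch of the reference versus value idea: a reference is passed along untouched to a consumer that understands references, and is resolved only when it reaches a consumer that needs the actual data. The class `DataReference`, the `link` helper and the `grid://` location are hypothetical, and a dictionary stands in for the data provider.

```python
class DataReference:
    """Points at data held by a provider instead of carrying the data itself."""

    def __init__(self, location, store):
        self.location = location
        self._store = store

    def resolve(self):
        # In a real system this would be a transfer from the data provider.
        return self._store[self.location]

def link(value, consumer, accepts_references):
    """Pass references through when possible; de-reference otherwise."""
    if isinstance(value, DataReference) and not accepts_references:
        value = value.resolve()
    return consumer(value)

PROVIDER = {"grid://store/genome": "acgt" * 3}  # hypothetical data provider
ref = DataReference("grid://store/genome", PROVIDER)

def grid_service(ref_in):
    # Reference-aware: works with the location, the engine never sees the data.
    return ref_in.location

def web_service(text):
    # Value-only: needs the actual data.
    return len(text)

passed = link(ref, grid_service, accepts_references=True)
resolved = link(ref, web_service, accepts_references=False)
```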
![Page 35: my Grid and Taverna: Now and in the Future](https://reader035.vdocuments.us/reader035/viewer/2022062423/5681492a550346895db6615f/html5/thumbnails/35.jpg)
Streaming Data
• Allow execution of downstream workflow stages on partially complete results from upstream.
Service 1 Service 2 Service 3
Non streaming (Taverna 1), entire iteration must complete at each stage
Streamed data, Service 2 starts operating on partial results from Service 1
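The difference can be shown with Python generators. This is only an in-process analogy with hypothetical two-stage services: the batch version finishes Service 1 on every item before Service 2 starts, while the streamed version hands each partial result downstream as soon as it exists.

```python
events = []

def service_1(items):
    for item in items:
        events.append(f"s1:{item}")
        yield item * 2

def service_2(items):
    for item in items:
        events.append(f"s2:{item}")
        yield item + 1

def run_batch(items):
    """Non-streaming: each stage completes the whole list first."""
    stage_1 = list(service_1(items))
    return list(service_2(stage_1))

def run_streamed(items):
    """Streaming: Service 2 consumes partial results from Service 1."""
    return list(service_2(service_1(items)))

batch = run_batch([1, 2])
batch_events = events[:]
events.clear()
streamed = run_streamed([1, 2])
stream_events = events[:]
```

Both runs give the same results; only the order of the recorded events differs, with the streamed run interleaving the two services.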
![Page 36: my Grid and Taverna: Now and in the Future](https://reader035.vdocuments.us/reader035/viewer/2022062423/5681492a550346895db6615f/html5/thumbnails/36.jpg)
Recursive Invocation
• Dispatcher allowing recursive invocation to be plugged into per operation semantics.
Figure (flow chart) boxes: Receive Input, Invoke operation, Gather Result Set, Test for completion, Modify Input Set, Return Result
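One plausible reading of the flow chart, sketched in Python as a loop: invoke the operation, gather the results, test for completion, and if not complete modify the input and invoke again. The order of the steps, the function `invoke_until_complete` and the paged service are assumptions for illustration, not the Taverna 2.0 dispatcher.

```python
def invoke_until_complete(operation, input_set, is_complete, modify_input, max_rounds=10):
    """Invoke, gather, test, and re-invoke with a modified input until done."""
    results = []
    for _ in range(max_rounds):
        results.extend(operation(input_set))          # invoke + gather result set
        if is_complete(results):                      # test for completion
            return results                            # return result
        input_set = modify_input(input_set, results)  # modify input set
    raise RuntimeError("no completion within max_rounds")

# Hypothetical paged service: returns at most two records per call.
RECORDS = ["r1", "r2", "r3", "r4", "r5"]

def fetch_page(request):
    start = request["offset"]
    return RECORDS[start:start + 2]

all_records = invoke_until_complete(
    fetch_page,
    {"offset": 0},
    is_complete=lambda got: len(got) == len(RECORDS),
    modify_input=lambda req, got: {"offset": len(got)},
)
```

A typical use would be a service that returns results in pages, where the workflow author wants the complete result set without drawing the loop by hand.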
![Page 37: my Grid and Taverna: Now and in the Future](https://reader035.vdocuments.us/reader035/viewer/2022062423/5681492a550346895db6615f/html5/thumbnails/37.jpg)
Future Direction
• Enhancements to the Workflow Core
• Enhancements to user interface and experience
• Expanded use of semantic web technologies
• Code remains open source and always will
![Page 38: my Grid and Taverna: Now and in the Future](https://reader035.vdocuments.us/reader035/viewer/2022062423/5681492a550346895db6615f/html5/thumbnails/38.jpg)
Latest News
• See plans for Taverna 2.0 on myGrid wiki
• Taverna development is user-driven
– Please keep in touch and tell us what you would like to see via the myGrid mailing lists: Taverna Users, Taverna Hackers
• Bioinformatics curator for service annotation
Details on the myGrid website
![Page 39: my Grid and Taverna: Now and in the Future](https://reader035.vdocuments.us/reader035/viewer/2022062423/5681492a550346895db6615f/html5/thumbnails/39.jpg)
Acknowledgements
• The myGrid group – Past and Present
• OMII-UK
• Carole Goble
• Pinar Alper
• Tom Oinn
• Antoon Goderis
• Matthew Gamble
• Daniele Turi