Bioinformatics Workflows
Chris Wroe (based on material from the myGrid team &
May Tassabehji / Hannah Tipney
Medical Genetics, St Marys)
Bioinformatics pipelines on the web
• Copying and pasting from one web-based application to the next, annotating by hand
• Advantages: quick, easy access to distributed resources
• Disadvantages: time-consuming and error-prone; the procedure is tacit, so both the protocol and the results are difficult to share
[Diagram: RepeatMasker → BLASTn → Twinscan, run by hand through their web interfaces]
Automating pipelines
• Using Perl/Matlab scripts to implement a pipeline
• Advantages: automation, quick to write, significant community resources (e.g. BioPerl)
• Disadvantages: hard to explain, hard to relocate, hard to tinker with
Workflows
[Diagram: sequence in → RepeatMasker web service → BLASTn web service → Twinscan web service → predicted genes out]
• A simple scripting language specifies how the steps of a pipeline link together
• The high-level picture of the pipeline is kept separate from low-level fiddling
• Application logic and low-level fiddling are encapsulated in remote web services
• Advantages: automation, quick to write; easier to explain, share and relocate; provenance of results recorded in a standard way
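The idea of keeping the high-level pipeline separate from the service implementations, while recording provenance as each step runs, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Scufl or Taverna: `mask_repeats`, `search_similar` and `predict_genes` are hypothetical local toys standing in for the RepeatMasker, BLASTn and Twinscan web services, and the provenance record layout is invented for the example.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, List, Tuple

# Hypothetical placeholder "services". In the workflows described above each
# step is a remote web service; these local toys only mimic the data flow.
def mask_repeats(seq: str) -> str:
    # Toy stand-in for RepeatMasker: lower-case letters are treated as
    # repeats and replaced with 'N'.
    return "".join("N" if base.islower() else base for base in seq)

def search_similar(masked: str) -> str:
    # Toy stand-in for BLASTn: just tags its input.
    return f"hits-for:{masked}"

def predict_genes(hits: str) -> str:
    # Toy stand-in for Twinscan: just tags its input.
    return f"genes-from:{hits}"

SERVICES: Dict[str, Callable[[str], str]] = {
    "RepeatMasker": mask_repeats,
    "BLASTn": search_similar,
    "Twinscan": predict_genes,
}

# The workflow itself: only the order in which the steps link together.
WORKFLOW: List[str] = ["RepeatMasker", "BLASTn", "Twinscan"]

def run(workflow: List[str], services: Dict[str, Callable[[str], str]],
        data: str) -> Tuple[str, List[dict]]:
    """Feed each step the previous step's output, logging a provenance record."""
    provenance: List[dict] = []
    for step in workflow:
        output = services[step](data)
        provenance.append({
            "step": step,
            "input": data,
            "output": output,
            "finished": datetime.now(timezone.utc).isoformat(),
        })
        data = output
    return data, provenance

result, provenance = run(WORKFLOW, SERVICES, "ACGTacgtACGT")
print(result)  # genes-from:hits-for:ACGTNNNNACGT
```

Swapping a service means changing one entry in `SERVICES`; the `WORKFLOW` description and the provenance logging stay untouched.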
Workflow components in myGrid
• Scufl – Simple Conceptual Unified Flow Language
– Developed by myGrid members at EBI
– Designed to be as simple as possible, with just enough features to support bioinformatics workflows
• Taverna – a tool for writing and running workflows and examining their results (http://taverna.sourceforge.net)
• FreeFluo – workflow engine to run workflows (http://freefluo.sourceforge.net)
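A workflow engine such as FreeFluo has to run steps in an order that respects how their data is linked. The sketch below assumes a toy data model loosely based on the idea of processors connected by data links; it is not the Scufl format or FreeFluo's API. It uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) to execute a small graph in dependency order, and the processor names and functions are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Callable, Dict, List, Tuple

# A processor is a function plus the names of the nodes whose outputs feed
# its arguments, in order. "input" names the workflow's own input.
Processor = Tuple[Callable[..., str], List[str]]

def run_graph(processors: Dict[str, Processor], workflow_input: str) -> Dict[str, str]:
    # TopologicalSorter expects a mapping of node -> predecessors.
    graph = {name: set(deps) for name, (_, deps) in processors.items()}
    results: Dict[str, str] = {"input": workflow_input}
    for name in TopologicalSorter(graph).static_order():
        if name == "input":
            continue  # the workflow input is already available
        func, deps = processors[name]
        results[name] = func(*(results[dep] for dep in deps))
    return results

# Toy processors: the masked sequence feeds both the search step and a
# gene-prediction step that also takes the search hits.
processors: Dict[str, Processor] = {
    "mask": (lambda seq: "".join("N" if b.islower() else b for b in seq), ["input"]),
    "blast": (lambda masked: f"hits({masked})", ["mask"]),
    "predict": (lambda masked, hits: f"genes({masked}, {hits})", ["mask", "blast"]),
}

outputs = run_graph(processors, "acgtACGT")
print(outputs["predict"])  # genes(NNNNACGT, hits(NNNNACGT))
```

Because one output feeds two steps, the workflow is a graph rather than a simple chain; a topological order guarantees that each processor runs only after all of its inputs exist.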
Workflow use
• Newcastle University (Anil Wipat, Peter Li)
– Affymetrix microarray analysis workflow
– Gene annotation workflow
• Manchester University: May Tassabehji and PhD student Hannah Tipney, Medical Genetics, St Marys (Wellcome Trust funded)
– Gene alerting service (GAS) workflow
– Gene and protein annotation workflow
• And others
Workflow experience: positives
• Easy to get started with Taverna (a 1-2 hour tutorial)
• Sharing does happen
• Cuts the time taken to perform one pipeline from 2 weeks to 2 hours
Workflow experience: outstanding issues
• Early days: web services are rare, and wrapping applications as web services takes significant time (licensing, installation, maintenance)
– Soaplab and Gowlab try to help (http://industry.ebi.ac.uk/soaplab)
• Fiddly bits don’t go away: many ‘shim’ services are needed to make the output of one step fit the input expected by the next
• Automation produces many results in a short time, which raises issues of result management and display
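A ‘shim’ is often just a small format converter between two steps. As a hypothetical example, assume one service returns a single FASTA record while the next expects a bare sequence string, or the reverse. The two functions below are illustrative sketches, not services shipped with Soaplab or Taverna, and the default line width of 60 is an arbitrary choice.

```python
def fasta_to_sequence(fasta: str) -> str:
    """Shim: turn one FASTA record into the bare sequence string a service expects."""
    lines = [line.strip() for line in fasta.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA record starting with '>'")
    if any(line.startswith(">") for line in lines[1:]):
        raise ValueError("expected exactly one FASTA record")
    # Drop the header line and join the wrapped sequence lines.
    return "".join(lines[1:])

def sequence_to_fasta(seq: str, name: str = "query", width: int = 60) -> str:
    """Shim in the other direction: wrap a bare sequence as a FASTA record."""
    body = "\n".join(seq[i:i + width] for i in range(0, len(seq), width))
    return f">{name}\n{body}\n"

record = ">seq1 example\nACGTACGT\nTTGCA\n"
print(fasta_to_sequence(record))  # ACGTACGTTTGCA
```

Each shim is trivial on its own; the burden noted above comes from needing one for almost every pair of services whose formats do not line up.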
Other workflow systems
• Commercial bioinformatics – drug discovery
– Incogen VIBE
– TurboWorx
– Pipeline Pilot
• eScience
– DiscoveryNet (bioinformatics, proprietary)
– Kepler (US, ecology)
– Triana (UK, physics: astronomy, signal processing)
Workflow standards
• Can’t have enough of them! All currently come from the e-business community rather than the science community
• BPEL – Business Process Execution Language
• WS – Orchestration
• XML Process Definition Language (XPDL)
• Business Process Markup Language (BPML)