![Page 1: Workflow discovery in e-science](https://reader035.vdocuments.us/reader035/viewer/2022062314/56814992550346895db6d76d/html5/thumbnails/1.jpg)
Workflow discovery in e-science
Antoon Goderis, Peter Li, Carole Goble
University of Manchester, UK
www.cs.man.ac.uk/~goderisa
![Page 2: Workflow discovery in e-science](https://reader035.vdocuments.us/reader035/viewer/2022062314/56814992550346895db6d76d/html5/thumbnails/2.jpg)
Agenda
• Web services in science
• Workflow re-use
• Workflow discovery
– Is workflow discovery a new problem?
– How do people match up workflows?
– Can we replicate the behaviour with tools?
• Conclusions
![Page 3: Workflow discovery in e-science](https://reader035.vdocuments.us/reader035/viewer/2022062314/56814992550346895db6d76d/html5/thumbnails/3.jpg)
| Workflows | Web services |
|---|---|
| BPEL, SCUFL, MOML, VDL … descriptions | SOAP, WSDL description |
| Workflow engine | Readily invoked |
| Orchestrates (Web-) services | |
| Can be published as Web service | |
![Page 4: Workflow discovery in e-science](https://reader035.vdocuments.us/reader035/viewer/2022062314/56814992550346895db6d76d/html5/thumbnails/4.jpg)
Science is highly distributed and connected
![Page 5: Workflow discovery in e-science](https://reader035.vdocuments.us/reader035/viewer/2022062314/56814992550346895db6d76d/html5/thumbnails/5.jpg)
The Web has revolutionised science
![Page 6: Workflow discovery in e-science](https://reader035.vdocuments.us/reader035/viewer/2022062314/56814992550346895db6d76d/html5/thumbnails/6.jpg)
Web services about to do the same?
![Page 7: Workflow discovery in e-science](https://reader035.vdocuments.us/reader035/viewer/2022062314/56814992550346895db6d76d/html5/thumbnails/7.jpg)
Scientific workflows
• e-science = supporting scientists to encode, enact, explain and share experimental procedures featuring lots of specialised data
• Case study: bioinformatics
– Understanding the DNA to behaviour link
– 3000 bio-services via the Taverna workflow editor http://mygrid.org.uk/taverna
– Re-use and repurposing of workflows
– +/- 200 Taverna workflows shared at www.myExperiment.org
![Page 8: Workflow discovery in e-science](https://reader035.vdocuments.us/reader035/viewer/2022062314/56814992550346895db6d76d/html5/thumbnails/8.jpg)
![Page 10: Workflow discovery in e-science](https://reader035.vdocuments.us/reader035/viewer/2022062314/56814992550346895db6d76d/html5/thumbnails/10.jpg)
Manchester, CS dept
Manchester Biology dept
Newcastle, CS dept
![Page 11: Workflow discovery in e-science](https://reader035.vdocuments.us/reader035/viewer/2022062314/56814992550346895db6d76d/html5/thumbnails/11.jpg)
Scientific workflows
• e-science = supporting scientists to encode, enact, explain and share experimental procedures
• Case study: bioinformatics
– Understanding the DNA to life link
– 3000 bio-services via the Taverna workflow editor http://mygrid.org.uk/taverna
– Re-use and repurposing of workflow fragments
– +/- 200 Taverna workflows shared at www.myExperiment.org
![Page 12: Workflow discovery in e-science](https://reader035.vdocuments.us/reader035/viewer/2022062314/56814992550346895db6d76d/html5/thumbnails/12.jpg)
![Page 13: Workflow discovery in e-science](https://reader035.vdocuments.us/reader035/viewer/2022062314/56814992550346895db6d76d/html5/thumbnails/13.jpg)
One + Three questions
1. Can’t we just do it with ?
• Keyword search doesn’t seem to cut it
1. Is workflow discovery a new problem?
2. How do people match up workflows?
3. Can we replicate the behaviour with tools?
![Page 14: Workflow discovery in e-science](https://reader035.vdocuments.us/reader035/viewer/2022062314/56814992550346895db6d76d/html5/thumbnails/14.jpg)
my current workflow myExperiment.org
![Page 15: Workflow discovery in e-science](https://reader035.vdocuments.us/reader035/viewer/2022062314/56814992550346895db6d76d/html5/thumbnails/15.jpg)
my current workflow myExperiment.org
?
![Page 16: Workflow discovery in e-science](https://reader035.vdocuments.us/reader035/viewer/2022062314/56814992550346895db6d76d/html5/thumbnails/16.jpg)
1. Is workflow discovery a new problem?
| | Service discovery | Workflow discovery |
|---|---|---|
| Discovery goal | Encapsulate found service | Edit found workflow |
| Matching process | Match over signature | Match over signature and content (data and service flow) |
| Starting context | Service or data | Service or data or workflow |
Source: survey of 21 myGrid/Taverna users
![Page 17: Workflow discovery in e-science](https://reader035.vdocuments.us/reader035/viewer/2022062314/56814992550346895db6d76d/html5/thumbnails/17.jpg)
1. Is workflow discovery a new problem? Yes
| | Service discovery | Workflow discovery |
|---|---|---|
| Discovery goal | Encapsulate found service | Edit found workflow |
| Matching process | Match over signature | Match over signature and content (data and service flow) |
| Starting context | Service or data | Service or data or workflow |
Workflow discovery subsumes service discovery
![Page 18: Workflow discovery in e-science](https://reader035.vdocuments.us/reader035/viewer/2022062314/56814992550346895db6d76d/html5/thumbnails/18.jpg)
2. How do people match up workflows?
![Page 19: Workflow discovery in e-science](https://reader035.vdocuments.us/reader035/viewer/2022062314/56814992550346895db6d76d/html5/thumbnails/19.jpg)
3. Can we replicate the behaviour with tools?
![Page 20: Workflow discovery in e-science](https://reader035.vdocuments.us/reader035/viewer/2022062314/56814992550346895db6d76d/html5/thumbnails/20.jpg)
A user experiment with bioinformatics workflows
![Page 21: Workflow discovery in e-science](https://reader035.vdocuments.us/reader035/viewer/2022062314/56814992550346895db6d76d/html5/thumbnails/21.jpg)
Workflow discovery task
• Can I sensibly adapt an existing experimental procedure (workflow) with another one?
• Extend or replace
![Page 22: Workflow discovery in e-science](https://reader035.vdocuments.us/reader035/viewer/2022062314/56814992550346895db6d76d/html5/thumbnails/22.jpg)
Workflow corpus
• 66 similar workflows for Graves’ disease, created by a single author
• 1 + 5 workflows
• Workflow diagram
• No documentation
• No annotation
![Page 23: Workflow discovery in e-science](https://reader035.vdocuments.us/reader035/viewer/2022062314/56814992550346895db6d76d/html5/thumbnails/23.jpg)
By the experts, for the experts
• 9 bioinformaticians and 4 developers at a Taverna training day
![Page 24: Workflow discovery in e-science](https://reader035.vdocuments.us/reader035/viewer/2022062314/56814992550346895db6d76d/html5/thumbnails/24.jpg)
Matching strategies
• Matching input workflow with 5 others
![Page 25: Workflow discovery in e-science](https://reader035.vdocuments.us/reader035/viewer/2022062314/56814992550346895db6d76d/html5/thumbnails/25.jpg)
Human on-line matching strategies!
• Traits
• Scores of attraction
• Yes or no
![Page 26: Workflow discovery in e-science](https://reader035.vdocuments.us/reader035/viewer/2022062314/56814992550346895db6d76d/html5/thumbnails/26.jpg)
Matching strategy: traits
| Men want.. | Women want.. |
|---|---|
| Short term relationship | Long term relationship |
| Slim | Tall |
| Students, artists, musicians, veterinarians | Lawyers, financial execs, firemen |
| Blonde | Hair or shaved |
| Medium income | High income |

From an analysis of 30 000 profiles
![Page 27: Workflow discovery in e-science](https://reader035.vdocuments.us/reader035/viewer/2022062314/56814992550346895db6d76d/html5/thumbnails/27.jpg)
Matching strategy: scoring
Confidence level
Score
Percentile
www.AmIHotOrNot.com
![Page 28: Workflow discovery in e-science](https://reader035.vdocuments.us/reader035/viewer/2022062314/56814992550346895db6d76d/html5/thumbnails/28.jpg)
Matching strategy: yes or no
![Page 29: Workflow discovery in e-science](https://reader035.vdocuments.us/reader035/viewer/2022062314/56814992550346895db6d76d/html5/thumbnails/29.jpg)
Traits
• Predicted trait
– Biological subtask
– Biological supertask
– Shared inputs + outputs
– Same service type
– Shared service compositions
– Shared path between intermediary input and output
![Page 30: Workflow discovery in e-science](https://reader035.vdocuments.us/reader035/viewer/2022062314/56814992550346895db6d76d/html5/thumbnails/30.jpg)
Traits and score
• Predicted trait
– Biological subtask
– Biological supertask
– Shared inputs + outputs
– Same service type
– Shared service compositions
– Shared path between intermediary input and output
• Score of similarity, usefulness and confidence
– E.g. [1 Identical – 9 Not similar]
![Page 31: Workflow discovery in e-science](https://reader035.vdocuments.us/reader035/viewer/2022062314/56814992550346895db6d76d/html5/thumbnails/31.jpg)
The gold standard
• The collection of workflow similarity assessments
• Predictive traits, possibly interacting
[Diagram labels: 1 + 5; Traits/score]
![Page 32: Workflow discovery in e-science](https://reader035.vdocuments.us/reader035/viewer/2022062314/56814992550346895db6d76d/html5/thumbnails/32.jpg)
2. How do people match up workflows?
• Difficulty of task
– Biological relationship very difficult for 6 out of 9
– Shape similarity difficult for 4 out of 13
– Medium confidence
• Consistency
– Inter-participant disagreement on how to order biological similarity and shape similarity [Spearman rank order test]
• Predictive traits
– No single trait is dominant, either between or within participants [Levene homogeneity of variance test]
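The consistency check named above, a Spearman rank order comparison between participants, can be sketched as follows. This is a minimal illustration only: the participant names and scores are hypothetical, not data from the study, and the helper functions are written from the textbook definition (Pearson correlation of the ranks, with tied values sharing their mean rank).

```python
from itertools import combinations


def ranks(scores):
    """Return 1-based ranks for the scores; tied values share their mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with the item at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either list is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the two rank lists."""
    return pearson(ranks(a), ranks(b))


# Hypothetical similarity scores (1 = identical, 9 = not similar) that three
# participants might give to the five candidate workflows.
scores = {
    "P1": [2, 5, 7, 3, 9],
    "P2": [3, 4, 8, 2, 9],
    "P3": [8, 2, 3, 9, 1],
}

for p, q in combinations(scores, 2):
    print(p, q, round(spearman(scores[p], scores[q]), 2))
```

A coefficient near 1 means two participants order the workflows the same way; values near 0 or below indicate the kind of disagreement reported on this slide.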
![Page 33: Workflow discovery in e-science](https://reader035.vdocuments.us/reader035/viewer/2022062314/56814992550346895db6d76d/html5/thumbnails/33.jpg)
Can we do better?
• Simpler tasks and workflows
• Taverna experienced users
• Workflow documentation and annotation
• Other factors in use, e.g. size difference
– Fix allowed factors
– Adopt black box approach: yes/no matching
![Page 34: Workflow discovery in e-science](https://reader035.vdocuments.us/reader035/viewer/2022062314/56814992550346895db6d76d/html5/thumbnails/34.jpg)
Automated discovery technique
• Unattributed graph matcher implementation by Messmer and Bunke
– Sub-isomorphism detection; exponential time complexity
– DAGs and optimization for repository of graphs
• Workflows parsed as graphs
– Workflow input, workflow output and intermediate services as nodes
– Data links as edges
• Example workflow nodes: probeSetid, AffyMapper_seq, databaseid, Blastx, Results_Blastx
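The graph encoding on this slide can be sketched in Python. The node names are taken from the slide; the edges are an assumed reading of the diagram, and the matcher is a naive brute-force search written for illustration, not the Messmer and Bunke implementation. It ignores node labels (an unattributed match) and tries every injective mapping, which makes the exponential cost visible.

```python
from itertools import permutations

# Workflow inputs, outputs and intermediate services become nodes.
workflow_nodes = ["probeSetid", "AffyMapper_seq", "databaseid", "Blastx", "Results_Blastx"]

# Data links become directed edges. ASSUMPTION: this wiring is a guess at the
# slide's diagram, which is not recoverable from the text.
workflow_edges = {
    ("probeSetid", "AffyMapper_seq"),
    ("AffyMapper_seq", "Blastx"),
    ("databaseid", "Blastx"),
    ("Blastx", "Results_Blastx"),
}


def find_subgraph_match(pattern_nodes, pattern_edges, target_nodes, target_edges):
    """Return a mapping pattern node -> target node that preserves every
    pattern edge, or None if no such injective mapping exists.

    Labels are ignored; only the link structure is compared. All ordered
    selections of target nodes are tried, so the running time is exponential.
    """
    target_edges = set(target_edges)
    for image in permutations(target_nodes, len(pattern_nodes)):
        mapping = dict(zip(pattern_nodes, image))
        if all((mapping[u], mapping[v]) in target_edges for u, v in pattern_edges):
            return mapping
    return None


# Two small hypothetical query fragments.
chain = (["a", "b", "c"], [("a", "b"), ("b", "c")])    # a -> b -> c
fan_out = (["a", "b", "c"], [("a", "b"), ("a", "c")])  # a feeds both b and c

print(find_subgraph_match(*chain, workflow_nodes, workflow_edges))
print(find_subgraph_match(*fan_out, workflow_nodes, workflow_edges))
```

Under the assumed wiring, the chain fragment is found inside the workflow, while the fan-out fragment is not, because no node in the example feeds two successors.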
![Page 35: Workflow discovery in e-science](https://reader035.vdocuments.us/reader035/viewer/2022062314/56814992550346895db6d76d/html5/thumbnails/35.jpg)
Automated discovery technique
• Ranking based on
– shared nodes
– difference in size between input graph and repository graphs
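A minimal sketch of a ranking that uses the two criteria on this slide, shared nodes and size difference. The slides do not say how the two are combined, so the ordering below (most shared nodes first, then smallest size difference) is an assumption. The repository contents, workflow names and the `Extra_*` and `Other_*` node names are hypothetical.

```python
def rank_workflows(query_nodes, repository):
    """Rank repository workflows against a query workflow.

    Returns (name, shared_nodes, size_difference) tuples. ASSUMED ordering:
    more shared nodes first, then smaller size difference, then name.
    """
    query = set(query_nodes)
    scored = []
    for name, nodes in repository.items():
        nodes = set(nodes)
        shared = len(query & nodes)               # nodes present in both graphs
        size_diff = abs(len(query) - len(nodes))  # difference in node count
        scored.append((name, shared, size_diff))
    return sorted(scored, key=lambda t: (-t[1], t[2], t[0]))


# Query workflow: node names from the example in the slides.
query = ["probeSetid", "AffyMapper_seq", "databaseid", "Blastx", "Results_Blastx"]

# Hypothetical repository of workflows, each reduced to its node names.
repository = {
    "wf_a": ["probeSetid", "AffyMapper_seq", "Blastx", "Results_Blastx"],
    "wf_b": ["probeSetid", "AffyMapper_seq", "databaseid", "Blastx",
             "Results_Blastx", "Extra_service_1", "Extra_service_2"],
    "wf_c": ["Other_input", "Other_service"],
    "wf_d": ["databaseid", "Blastx", "Results_Blastx", "Extra_service_1"],
}

for name, shared, size_diff in rank_workflows(query, repository):
    print(name, "shared:", shared, "size difference:", size_diff)
```

Treating shared nodes as the primary key favours workflows that contain the query; a weighted score would be an equally plausible choice.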
![Page 36: Workflow discovery in e-science](https://reader035.vdocuments.us/reader035/viewer/2022062314/56814992550346895db6d76d/html5/thumbnails/36.jpg)
3. Can we replicate the behaviour with tools? Kind of..
Average similarity assessments across participants
[Diagram labels: 1 + 66; Traits/score]
![Page 37: Workflow discovery in e-science](https://reader035.vdocuments.us/reader035/viewer/2022062314/56814992550346895db6d76d/html5/thumbnails/37.jpg)
Current work
[Diagram labels: 12 + 21; Yes/no]
• Text clustering
• OWL workflow ontology
• Precision / recall
• Graph matching
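Precision and recall, listed on this slide as the evaluation measure for yes/no matching, can be computed as sketched below. The workflow identifiers and the two sets are hypothetical: `relevant` stands for workflows that human judges marked "yes", `retrieved` for workflows a discovery tool returned.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved items against a gold
    standard of relevant items. An empty denominator gives 0.0.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical gold standard (human "yes" judgements) and tool output.
relevant = {"wf1", "wf3", "wf5"}
retrieved = {"wf1", "wf2", "wf3", "wf4"}

precision, recall = precision_recall(retrieved, relevant)
print("precision:", round(precision, 2), "recall:", round(recall, 2))
```

Precision is the fraction of returned workflows that the judges accepted; recall is the fraction of accepted workflows that the tool returned.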
![Page 38: Workflow discovery in e-science](https://reader035.vdocuments.us/reader035/viewer/2022062314/56814992550346895db6d76d/html5/thumbnails/38.jpg)
Take home
• Scientists compose Web services for real, and share their results
• Workflow discovery is a real problem, which subsumes service discovery
• A range of matching strategies and techniques apply
• Evaluation is a challenge: gold standards are hard to build
• Come and play at myExperiment.org
• References at www.cs.man.ac.uk/~goderisa