converting scripts into reproducible workflow research objects

43
Converting Scripts into Reproducible Workflow Research Objects Lucas A. M. C. Carvalho, Khalid Belhajjame, Claudia Bauzer Medeiros [email protected] Baltimore, Maryland, USA October 23-26, 2016

Upload: khalid-belhajjame

Post on 13-Apr-2017

52 views

Category:

Education


0 download

TRANSCRIPT

Page 1: Converting scripts into reproducible workflow research objects

Converting Scripts into Reproducible Workflow Research Objects

Lucas A. M. C. Carvalho, Khalid Belhajjame, Claudia Bauzer Medeiros

[email protected]

Baltimore, Maryland, USA

October 23-26, 2016

Page 2: Converting scripts into reproducible workflow research objects

2

Background and Motivation

● Data-Intensive Experiments– Collection of scripts, programs and (big) data

Papers

Page 3: Converting scripts into reproducible workflow research objects

3

Background and Motivation

● Data-Intensive Experiments– Collection of scripts, programs and (big) data

Papers

How to understand, reproduce or reuse data and models of experiments?

Page 4: Converting scripts into reproducible workflow research objects

4

Background and Motivation

● Data-Intensive Experiments– Collection of scripts, programs and (big) data

Manual collection and organization of data provenance

Papers

How to understand, reproduce or reuse data and models of experiments?

Page 5: Converting scripts into reproducible workflow research objects

5

Background and Motivation

● Script-based experiments

What are the inputs and outputs?

How to change this local program for a similar web service?

Example of script code.

Difficult to understand, to reuse, and to reproduce.

Page 6: Converting scripts into reproducible workflow research objects

6

Background and Motivation

● Scientific Workflows

Example of Scientific Workflow Management System.

Page 7: Converting scripts into reproducible workflow research objects

7

CreateUnderstandReuseReproduce

Overview

Page 8: Converting scripts into reproducible workflow research objects

8

CreateUnderstandReuseReproduce

Overview

+

Page 9: Converting scripts into reproducible workflow research objects

9

CreateUnderstandReuseReproduce

Overview

+

Step 2

Step 1

Step 3

Step 4

Step 5

Methodology

Page 10: Converting scripts into reproducible workflow research objects

10

Related Work

● Script-language specific.● Workflow-engine specific.● A new language is needed.● Outcome is not an executable workflow.● Do not collect provenance data of the

conversion process.

Page 11: Converting scripts into reproducible workflow research objects

11

Two Kind of Experts

● Scientists– Domain experts who understand the experiment, and

the script (sometimes called user);

● Curators: – Scientists who are also familiar with workflow and

script programming or;

– Computer scientists who are familiar enough with the domain to be able to implement our methodology;

– Responsible for authoring, documenting and publishing workflows and associated resources.

Page 12: Converting scripts into reproducible workflow research objects

12

Requirements

● Produce workflow-like view of the script.● Create an executable workflow and compare

execution of workflow and script.● Modify the workflow resources.● Record provenance data.● Aggregate all resources to support

Reproducibility and Reuse.

1

2

3

4

5

Page 13: Converting scripts into reproducible workflow research objects

13

Requirements

● Produce workflow-like view of the script.1

Activity 1

Port 1 Port 2 Port 3

Port 1 Port 2

Activity 2

Port 3

Port 3

Activity n

Port n

Script-based experiment.

Abstract workflow.

Page 14: Converting scripts into reproducible workflow research objects

14

Requirements

● Create executable workflow and compare execution of workflow and script.

2

Executable workflow. Script-based experiment.

Page 15: Converting scripts into reproducible workflow research objects

15

Requirements

● Modify the workflow resources.3

Local

(a)

(b)

Algorithm A Algorithm B

Page 16: Converting scripts into reproducible workflow research objects

16

Requirements

● Record provenance data4

Activity 1

Output 1 Output 2

wasGeneratedBy wasGeneratedBy

Sampleused

“2012-06-01”

wasStartedAt

Activity 2used

LucasWorkflowRun

wasAssociatedWith

used

Page 17: Converting scripts into reproducible workflow research objects

17

Requirements

● Aggregate all resources to support Reproducibility and Reuse.

5

Abstract workflows

Concrete workflows

Annotations

Papers and Reports

Provenance

Authors

Scripts

Data

Page 18: Converting scripts into reproducible workflow research objects

18

Script

Generate Abstract Workflow

Generate Abstract Workflow

Create an executable workflow

Create an executable workflow

Refine workflowRefine workflow

Bundle Resources into a Research Object

Bundle Resources into a Research Object

Annotate and check qualityAnnotate and check quality

Abstract workflow

Concrete workflow

2

1

3

4

5

Methodology

Page 19: Converting scripts into reproducible workflow research objects

19

Workflow Research Object (WRO)

● Research Objects are semantically rich aggregations of resources that bring together data, methods and people in scientific investigations.

● WROs encapsulate scientific workflows and additional information regarding their context and resources. Research Object Model

Page 20: Converting scripts into reproducible workflow research objects

20

Running Example

● Molecular Dynamics Simulations– Many branches of material sciences, computational

engineering, physics and chemistry.

– Scripts (shell script), programs (NAMD, VMD, Fortran)

– Phases: set up, simulation and analysis of trajectories.

– Inputs: protein structure, simulation parameters and force field files.

– Output: trajectories and analysis results.

Page 21: Converting scripts into reproducible workflow research objects

21

StepGenerate Abstract Workflow

1

Script code.

Page 22: Converting scripts into reproducible workflow research objects

22

StepGenerate Abstract Workflow

1

Manuallyannotate

Script code.Annotated script code.

Page 23: Converting scripts into reproducible workflow research objects

23

StepGenerate Abstract Workflow

1

Manuallyannotate

Createworkflow-like

view

Script code.Annotated script code.

Abstract workflow.

Page 24: Converting scripts into reproducible workflow research objects

24

StepGenerate Abstract Workflow

1

code blocks

Input/ouput

YesWorkflowMcPhillips et. al, 2015

- Code comments- Tags: ● @begin● @end● @desc● @in● @out● ...

T. McPhillips et al. (2015), “Yesworkflow: A user-oriented, language-independent tool for recovering workflow information from scripts,” International Journal of Digital Curation, vol. 10, no. 1, pp. 298–313, 2015.

CreateWorkflow-like

view

Abstract workflow.

Annotated script code.

Page 25: Converting scripts into reproducible workflow research objects

25

StepGenerate Abstract Workflow

1

CreateWorkflow-like

view

Abstract workflow.

Annotated script code.

Page 26: Converting scripts into reproducible workflow research objects

26

StepCreate an executable workflow

2

Abstract workflow.

Page 27: Converting scripts into reproducible workflow research objects

27

StepCreate an executable workflow

2

Create implementationof activities

Copy code blocks from the script.

Abstract workflow.

Executable workflow.

Page 28: Converting scripts into reproducible workflow research objects

28

StepCreate an executable workflow

2

Create implementationof activities

Copy code blocks from the script.

Abstract workflow.

Executable workflow.

Page 29: Converting scripts into reproducible workflow research objects

29

StepCreate an executable workflow

2

Create implementationof activities

Copy code blocks from the script.

Abstract workflow.

Executable workflow.Script code.

Page 30: Converting scripts into reproducible workflow research objects

30

StepRefine executable workflow

3

Modify resources:● Algorithms● Data Sets● Parallelization● Web Services● ...

Executable workflow.New workflow version.

Page 31: Converting scripts into reproducible workflow research objects

31

StepRefine executable workflow

3

Create newversion

Modify resources:● Algorithms● Data Sets● Parallelization● Web Services● ...

Executable workflow.New workflow version.

Page 32: Converting scripts into reproducible workflow research objects

32

Steps

Record provenance data: execution traces.

2 3

wasEnactedBy

split

Output 1 Output 2wasGeneratedBy wasGeneratedBy

Sampleused

“2012-06-01”

wasStartedAt

psgen

used

LucasWorkflowRun

wasAssociatedWith

usedhasSpecification

W3C PROV

Executable workflow.

Page 33: Converting scripts into reproducible workflow research objects

33

Steps

Record provenance data: conversion process.

2 3

wasDerivedFrom

wasDerivedFrom

wasDerivedFrom

wasAssociatedWith

CuratorCurator

W3C PROV

Executable workflow.New workflow version.

Script code.

Page 34: Converting scripts into reproducible workflow research objects

34

StepAnnotate and check quality

● Annotations describing the workflow.● Use provenance data

– To check the quality of the conversion process.

● Run checks to verify the soundness of the workflow.

4

Page 35: Converting scripts into reproducible workflow research objects

35

StepAnnotate and check quality

4

Script code.

Executable workflow.

Page 36: Converting scripts into reproducible workflow research objects

36

StepAnnotate and check quality

4

Workflow version.

Initial Executable workflow.

Page 37: Converting scripts into reproducible workflow research objects

37

StepAnnotate and check quality

● Common mistakes during the conversion:– not clearly identified the main logical processing

units in the script;

– a mistake when migrating script code into the corresponding activity;

– not provided the correct input files and parameters;

– the coding of the workflow itself contained errors.

4

Page 38: Converting scripts into reproducible workflow research objects

38

StepBundle Resources into a Research Object

5

Script Abstract workflow

Concrete workflow(s)

Annotations

Paper

ProvenanceData

Attributions

Page 39: Converting scripts into reproducible workflow research objects

39

Contributions

● A methodology that guides curators in a principled manner to transform scripts into reproducible and reusable WRO;

● This addresses an important issue in the area of script provenance;

Page 40: Converting scripts into reproducible workflow research objects

40

Conclusions

● We addressed issues wrt understanding, reuse and reproducibility of script-based experiments.

● The methodology created was:– elaborated based on requirements;

– showcased via a real world use case from the field of Molecular Dynamics;

● We exploited tools and standards from the scientific community:– Scientific Workflows, YesWorkflow, Research Objects, the W3C

PROV recommendations and the Web Annotation Data Model.

● The bundle is available at http://w3id.org/w2share/s2rwro/

Page 41: Converting scripts into reproducible workflow research objects

41

Next Steps

● Evaluation using other case studies;● Evaluation of the cost of the effectiveness of

our methodology;● Extension of YesWorkflow to support the

semantic annotation of blocks;● Implementation of tools.

Page 42: Converting scripts into reproducible workflow research objects

42

Acknowledgments

● FAPESP (grant # 2014/23861-4)● CCES/CEPID (grant # 2013/08293-7)

– Center for Computational Engineering & Sciences

● LIS (Laboratory of Information Systems)● Prof. Munir Skaf and his group from Institute of

Chemistry - Unicamp.

Page 43: Converting scripts into reproducible workflow research objects

Converting Scripts into Reproducible Workflow Research Objects

Lucas A. M. C. Carvalho, Khalid Belhajjame, Claudia Bauzer Medeiros

[email protected]

Baltimore, Maryland, USA

October 23-26, 2016