marco roos: newton's ideas and methods are preserved forever: how about yours?

45
Newton's ideas and methods are preserved forever: how about yours? Marco Roos, Kristina Hettne, Jun Zhao, Mark Thompson Cloud and Workflows for Reproducible Bioinformatics Shenzhen, December 19, 2012

Upload: gigascience-bgi-hong-kong

Post on 28-Jan-2015

104 views

Category:

Documents


0 download

DESCRIPTION

Marco Roos talk at ISCB-Asia: Newton's ideas and methods are preserved forever: how about yours? December 19th 2012

TRANSCRIPT

Page 1: Marco Roos: Newton's ideas and methods are preserved forever: how about yours?

Newton's ideas and methods are preserved forever:

how about yours?

Marco Roos, Kristina Hettne, Jun Zhao, Mark Thompson

Cloud and Workflows for Reproducible Bioinformatics

Shenzhen, December 19, 2012

Page 2: Marco Roos: Newton's ideas and methods are preserved forever: how about yours?

Monday, April 10, 2023 Digital preservation for the modern scientist 2

Page 3: Marco Roos: Newton's ideas and methods are preserved forever: how about yours?

Monday, April 10, 2023 Towards preserving bioinformatics experiments

3

Reproduced workflows

Power & Mass Web Service

Mass

Acceleration

Force

Page 4: Marco Roos: Newton's ideas and methods are preserved forever: how about yours?

Genome Wide Association Studies

What is the genetic basis for the diseases associated with

Metabolic Syndrome?

Case studyBioinformatics analysis of Metabolic SyndromeKristina Hettne, Harish Dharuri

Page 5: Marco Roos: Newton's ideas and methods are preserved forever: how about yours?

Reproducible Science

Preservation for the wet laboratory scientist

From Van Roon-Mom et al., BMC Molecular Biology 2008 doi: 10.1186/1471-2199-9-84.

Page 6: Marco Roos: Newton's ideas and methods are preserved forever: how about yours?

Reproducible Science?

What is the digital equivalent?

Is it equally good?

Can we do better?- or worse?

Reproduced from Jelier et al., Schuemie et al., Hettne et al., Haagen et al.,http://biosemantics.org , myExperiment.org/workflows/2197

GroundHog DB

Page 7: Marco Roos: Newton's ideas and methods are preserved forever: how about yours?

Monday, April 10, 2023 Towards preserving bioinformatics experiments

7

Reproducible Science

What is our incentive?

Nobility

Good Reproducible Science

Greater Good

Serve the public

Page 8: Marco Roos: Newton's ideas and methods are preserved forever: how about yours?

Monday, April 10, 2023 Towards preserving bioinformatics experiments

8

Reproducible Science

What is our incentive?

Fame and Glory

Getting on with it...

I’ll be the first in Nature

Page 9: Marco Roos: Newton's ideas and methods are preserved forever: how about yours?

Monday, April 10, 2023 Towards preserving bioinformatics experiments

9

Stimulate preservation and reproducibility while speeding up the research process

CHALLENGE

Page 10: Marco Roos: Newton's ideas and methods are preserved forever: how about yours?

10

SW

Enhance the research cycle

What slows us down?

Research Question

Find Methods and Data, + their Owners

Get Methods and Data

Understand Methods and

Data

Format (Align)Data

Design the

Analysis

Interpret Results

PublishCompute

Page 11: Marco Roos: Newton's ideas and methods are preserved forever: how about yours?

Monday, April 10, 2023 Towards preserving bioinformatics experiments

11

Bottlenecks

• Loosing track of what you did

• Messy storage

• Preparing material for a publication

• Understanding the computational procedure

• Communication with (non-technical) colleagues

• Keeping tools working

• Getting credit for digital results outside of traditional publications

Page 12: Marco Roos: Newton's ideas and methods are preserved forever: how about yours?

Monday, April 10, 2023 Towards preserving bioinformatics experiments

12

Getting on with workflows

Page 13: Marco Roos: Newton's ideas and methods are preserved forever: how about yours?

Monolithic Tool → Web Services → Workflows → (Web) ToolExample: Anni 2.0 → Anni workflows

AnniWF

http://workflow.biosemantics.org/t2web/workflow/2725

Page 14: Marco Roos: Newton's ideas and methods are preserved forever: how about yours?

Digital RepositorymyExperiment.org

The recipes store

• Find workflows• Share workflows & files• Find people• Build communities• Publish packages• Tag workflows• Score, rate, comment

Page 15: Marco Roos: Newton's ideas and methods are preserved forever: how about yours?

Monday, April 10, 2023 Towards preserving bioinformatics experiments

15

Instructions for workflow authors10 Best Practices for creating workflows

1. Make a sketch workflow2. Use modules3. Think about the output4. Provide example inputs and outputs5. Annotate6. Test execution from outside local

environment7. Choose services carefully8. Reuse existing workflows9. Advertise10. Maintain

10/10

Page 16: Marco Roos: Newton's ideas and methods are preserved forever: how about yours?

Reproducible ScienceIs a workflow sufficient?

Useful Preservation =

Understandable Objects

Reproduce, Reuse, Repurpose, Repair, ...

Reproduced from Jelier et al., Schuemie et al., Hettne et al., Haagen et al.,http://biosemantics.org , myExperiment.org/workflows/2197

What is this doing?

Page 17: Marco Roos: Newton's ideas and methods are preserved forever: how about yours?

Monday, April 10, 2023 Towards preserving bioinformatics experiments

17

Useful preservation 1myExperiment Packs

Hypothesis document

MedLine abstracts until

2009

“Workflow to display supporting documents” “SNPs

from

GWAS”

“Experime

nt sketch”

Page 18: Marco Roos: Newton's ideas and methods are preserved forever: how about yours?

Monday, April 10, 2023 Towards preserving bioinformatics experiments

18

Useful preservationResearch Object Model

http://wf4ever.github.com/ro/

Research Object ModelAggregation and Annotation Model for Digital Methods

Page 19: Marco Roos: Newton's ideas and methods are preserved forever: how about yours?

Research Object (RO) Model

RO = ORE + AO + vocabulariesObject Re-use and Exchange (OAI-ORE)

Describes aggregations of resources:data, metadata, papers, etc.

Annotation Ontology (AO)Associates RDF metadata descriptions with resources

Generic and domain-specific vocabulariesUsed in annotation bodies to provide information

about resources (types, dependencies, descriptions, etc.)

Builds on RDF, leading to RDF as a natural implementation choice

Model specification: http://wf4ever.github.com/ro/

Page 20: Marco Roos: Newton's ideas and methods are preserved forever: how about yours?

Research Object Model

Page 21: Marco Roos: Newton's ideas and methods are preserved forever: how about yours?

Research Object: “Hello World”

https://github.com/wf4ever/ro-catalogue/tree/master/v0.1/HelloWorld

Page 22: Marco Roos: Newton's ideas and methods are preserved forever: how about yours?

22

Help organize the materials and methods of computational analysisResearch Object Portal

Materials & Methods ofMetabolic Syndrome

AnalysisKristina HettneHarish Dharuri

http://sandbox.wf4ever-project.org/portal

Page 23: Marco Roos: Newton's ideas and methods are preserved forever: how about yours?

Monday, April 10, 2023 Towards preserving bioinformatics experiments

23

Expected on myExperiment

Research Objects inside!

• Packs more prominent

• Start a pack when you upload a workflow

• Upload wizards, pack management, export

• Checklists, automated star ratings

• Add workflow runs and example data

• Sticky annotationsRO-enabled myExperiment mockup

Page 24: Marco Roos: Newton's ideas and methods are preserved forever: how about yours?

24

Fame and Glory

It was me, me,

me!What

I found

How I found

it

HDAC1 interacts with Parvb

Discovered by: mePublished by: me

Research Object

Page 25: Marco Roos: Newton's ideas and methods are preserved forever: how about yours?

Monday, April 10, 2023 Towards preserving bioinformatics experiments

25

Nanopublication ModelGetting credit for digital results

Assertion Provenance

Nanopublication ID

Supporting Attribution

assertion

opm:was

DerivedFrom

http://rdf.biosemantics.

org/…profiles_matching_1980_2010

opm:wasGen

e-ratedBy

Integrity Key

thisnanop

ub

dcterms:created

2012-03-28T11:32^̂ xsd:dat

eTime

pav:authore

d-By

associa-tion

issio:statis-

ticalAssociation

sio:has-measurementVal

ue

Association_1_p_value

is

Sio:probability-value

sio:has-value

6.56 e-5

^̂ xsd:float

sio:refers-to

http://bio2rdf.org/omim:210

600

researcherid.com/rB-

6035-2012

dcterms:

DOI

http://dx.doi.org/

….

…http://bio2rdf.org/geneid:558

35

Page 26: Marco Roos: Newton's ideas and methods are preserved forever: how about yours?

Monday, April 10, 2023 Towards preserving bioinformatics experiments

26

Nanopub.org

Page 27: Marco Roos: Newton's ideas and methods are preserved forever: how about yours?

Monday, April 10, 2023 Towards preserving bioinformatics experiments

27

Examples

Page 28: Marco Roos: Newton's ideas and methods are preserved forever: how about yours?

Monday, April 10, 2023 Towards preserving bioinformatics experiments

28

Examples in RDF format

Page 29: Marco Roos: Newton's ideas and methods are preserved forever: how about yours?

Monday, April 10, 2023 Towards preserving bioinformatics experiments

29

Validator

Page 30: Marco Roos: Newton's ideas and methods are preserved forever: how about yours?

Example: LOVD

Page 31: Marco Roos: Newton's ideas and methods are preserved forever: how about yours?

31

Nanopublications of Genetic Variations visualized on the genome

NanopublicationStore

Other Sources

Other Tools

Zuotian Tatum, Jesse van Dam

Page 32: Marco Roos: Newton's ideas and methods are preserved forever: how about yours?

32

Fame and Glory

It was me, me,

me! What I

found

How I found

it

Research Object

<CS7183> <associatedWith> <MetS>

Discovered by: mePublished by: me

Nanopublication

http://purl.org/nanopub/123http://purl.org/ResObj/345

Page 33: Marco Roos: Newton's ideas and methods are preserved forever: how about yours?

Monday, April 10, 2023 Towards preserving bioinformatics experiments

33

Summary (1/2)

• Preservation under the hood of digital research tools

• Research Object Model: annotated aggregates

• Nanopublication: fine-grained digital creditCheck Nanopub.org to stay updated

Page 34: Marco Roos: Newton's ideas and methods are preserved forever: how about yours?

Monday, April 10, 2023 Towards preserving bioinformatics experiments

34

Summary (2/2)

• Semantic Web for exchange and interoperability

• In progress: RO-enabling myExperimentWatch myExperiment.org in 2013!

• Plans to RO-enable Taverna, Galaxy, GenomeSpace

Page 35: Marco Roos: Newton's ideas and methods are preserved forever: how about yours?

Acknowledgements

35

EU Wf4Ever project (270129) funded under EU FP7 (ICT- 2009.4.1). (http://www.wf4ever-project.org)

Page 36: Marco Roos: Newton's ideas and methods are preserved forever: how about yours?

Thank you for your attention

36

http://biosemantics.org

Page 37: Marco Roos: Newton's ideas and methods are preserved forever: how about yours?

Reproducible Science

Preserved materials and methods for the ‘wet laboratory’ scientist

From Van Roon-Mom et al., BMC Molecular Biology 2008 doi: 10.1186/1471-2199-9-84.

Page 38: Marco Roos: Newton's ideas and methods are preserved forever: how about yours?

Reproducible Science?

What is the digital equivalent?

Is it equally good?

Can we do better?- or worse?

Reproduced from Jelier et al., Schuemie et al., Hettne et al., Haagen et al.,http://biosemantics.org , myExperiment.org/workflows/2197

Page 39: Marco Roos: Newton's ideas and methods are preserved forever: how about yours?

Reproducible Science

What is the digital equivalent?

Is it equally good?

Can we do better? – or worse?

Reproduced from Jelier et al., Schuemie et al., Hettne et al., Haagen et al.,http://biosemantics.org , myExperiment.org/workflows/2197

Can you tell what

this is doing?

Page 40: Marco Roos: Newton's ideas and methods are preserved forever: how about yours?

Monday, April 10, 2023 Towards preserving bioinformatics experiments

40

Reproducible Science

What is our incentive?

Nobility

Good Reproducible Science

Greater Good

Serve the public

Page 41: Marco Roos: Newton's ideas and methods are preserved forever: how about yours?

Monday, April 10, 2023 Towards preserving bioinformatics experiments

41

Reproducible Science

What is our incentive?

Fame and Glory

Getting on with it...

I’ll be the first in Nature

Page 42: Marco Roos: Newton's ideas and methods are preserved forever: how about yours?

Monday, April 10, 2023 Towards preserving bioinformatics experiments

42

Our aim

‘Useful’ preservation

Support reproducibility in tools and by guidelines that

speed up your research

get you acknowledgement

Page 43: Marco Roos: Newton's ideas and methods are preserved forever: how about yours?

Monday, April 10, 2023 Towards preserving bioinformatics experiments

43

Preservation

Nanopublication

Assertion

Provenance

Supporting

Attribution

Research Results

What?How?

Page 44: Marco Roos: Newton's ideas and methods are preserved forever: how about yours?

Monday, April 10, 2023 Towards preserving bioinformatics experiments

44

Preservation

Nanopublication

Assertion

Provenance

Supporting

Attribution

Research Results

What?How?

Deemed of

scientific value by

scientists

Valuable for

scientists

Deemed of

scientific value by

scientists

Digital Value

Page 45: Marco Roos: Newton's ideas and methods are preserved forever: how about yours?

■ Christine Chichester■ Kees Burger - NBIC■ Spyros Kotoulas - VU■ Antonis Loizou - VU■ Valery Tkachenko - RSC■ Andra Waagmeester -

Maastricht■ Sune Askjaer - Lundbeck■ Steve Pettifer -

Manchester■ Lee Harland - Pfizer/CD■ Carina Haupt - Fraunhofer■ Colin Batchelor - RSC■ Miguel Vazquez - CNIO■ José María Fernández -

CNIO■ Jahn Saito - Maastricht■ Andrew Gibson (Outside

Expert) - Amsterdam■ Louis Wich - DTU

■ Erik Schultes■ Andrew Gibson■ Reinout van Schouwen■ Kostas Karasavvas■ Kristina Hettne■ Harish Dharuri■ Eleni Mina■ Jesse van Dam■ Herman van Haagen■ Zuotian Tatum■ Johan den Dunnen■ Peter-Bram ‘t Hoen■ Barend Mons■ Gert-Jan van Ommen

Melton Foundation

■ Paul Groth■ Frank van

Harmelen

■ Erik van Mulligen■ Bharat Singh■ Jan Kors

Acknowledgementshttp://biosemantics.org/