research shared: researchobject.org

Research Shared. BOSC, July 11th 2015, Dublin. Norman Morrison, The University of Manchester. researchobject.org

Upload: norman-morrison

Post on 09-Aug-2015

TRANSCRIPT

Framework

A framework to bundle, exchange and link (scattered) resources about experiments.

Framework desiderata

       

•  Technology independent
•  The least possible
•  The simplest feasible
•  Graceful degradation
•  Standard tooling

How?

The Container
•  Packaging: Zip files, Docker images, BagIt, Web, …
•  Catalogues & Commons: FAIRDOM SEEK, Farr Commons CKAN, myExperiment, Zenodo, Figshare, …

Manifest
Describes the aggregated resources, their annotations and provenance

Manifest Construction
•  Identification – id, title, creator, status, …
•  Aggregates – list of ids/links to resources
•  Annotations – list of annotations about resources

Manifest Description
•  Checklists – what should be there
•  Provenance – where it came from
•  Versioning – its evolution
•  Dependencies – what else is needed
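The checklist idea ("what should be there") can be illustrated with a minimal sketch. It assumes a manifest held as a plain dictionary with an `aggregates` list of `id` entries, as in the slides. The function name `missing_resources` and the `/README.txt` entry are hypothetical, chosen for illustration only; this is not the API of any Research Object tool.

```python
from typing import Iterable, List


def missing_resources(manifest: dict, checklist: Iterable[str]) -> List[str]:
    """Return the checklist entries that the manifest does not aggregate."""
    aggregated = {entry["id"] for entry in manifest.get("aggregates", [])}
    return [required for required in checklist if required not in aggregated]


# Hypothetical manifest fragment, reusing identifiers shown in the slides.
manifest = {
    "id": "doi:10.000/zenodo.123",
    "aggregates": [
        {"id": "/sequence/specimen5.bam"},
        {"id": "http://www.myexperiment.org/workflows/3355"},
    ],
}

# Hypothetical checklist: /README.txt is not in the slides.
checklist = ["/sequence/specimen5.bam", "/README.txt"]

print(missing_resources(manifest, checklist))  # prints ['/README.txt']
```

A real checklist would likely test more than presence (for example formats or required annotations); this sketch only covers the simplest case.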

Manifest

id: doi:10.000/zenodo.123
createdOn: 2015-07-10T16:46:00Z
createdBy: http://orcid.org/0000-0001-9842-9718
aggregates:
  - id: /sequence/specimen5.bam
    conformsTo: http://gemrb.org/iesdp/file_formats/ie_formats/bam_v1.htm
  - id: http://example.com/blog/about-specimen5
    authoredBy: http://orcid.org/0000-0001-7066-3350
  - id: http://www.myexperiment.org/workflows/3355
    history: provenance/workflow-evolution.ttl
annotations:
  - about: /sequence/specimen5.bam
    content: annotations/specimen5-properties.jsonld
    createdBy: http://orcid.org/0000-0001-7066-3350
  - about: /sequence/specimen5.bam
    content: http://example.com/blog/about-specimen5
    oa:motivatedBy oa:questioning
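As a rough sketch, the identification, aggregates and annotations parts of the manifest shown above could be assembled in code and serialised as JSON. The helper name `build_manifest` is hypothetical, and the key names simply mirror the slide; consult the Research Object specifications for the normative vocabulary.

```python
import json


def build_manifest(ro_id, created_on, created_by, aggregates, annotations):
    """Assemble a manifest as a plain dictionary (keys mirror the slide)."""
    return {
        "id": ro_id,
        "createdOn": created_on,
        "createdBy": created_by,
        "aggregates": list(aggregates),
        "annotations": list(annotations),
    }


manifest = build_manifest(
    ro_id="doi:10.000/zenodo.123",
    created_on="2015-07-10T16:46:00Z",
    created_by="http://orcid.org/0000-0001-9842-9718",
    aggregates=[
        {"id": "/sequence/specimen5.bam"},
        {"id": "http://example.com/blog/about-specimen5",
         "authoredBy": "http://orcid.org/0000-0001-7066-3350"},
        {"id": "http://www.myexperiment.org/workflows/3355",
         "history": "provenance/workflow-evolution.ttl"},
    ],
    annotations=[
        {"about": "/sequence/specimen5.bam",
         "content": "annotations/specimen5-properties.jsonld",
         "createdBy": "http://orcid.org/0000-0001-7066-3350"},
    ],
)

# Serialise for storage alongside the aggregated resources.
manifest_json = json.dumps(manifest, indent=2)
print(manifest_json)
```

Keeping the manifest as ordinary JSON fits the "standard tooling" desideratum: any JSON parser can read it back without Research Object specific software.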

RO Principles

Use unique identifiers as names for things.

Use some mechanism of aggregation to group things together.

Provide metadata about those things & how they relate to each other.

Get tooled up https://github.com/ResearchObject

Real world examples

•  Reviewed to Reproduced
•  Workflow run (CWL)
•  Farr Commons
•  Capturing and describing Docker images for CERN Atlas analyses
•  FAIR-DOM http://fair-dom.org/
    – SEEK http://seek4science.org/
•  FAIR Publishing - RO to Figshare

Reviewed to Reproduced

From González-Beltrán et al. doi:10.1371/journal.pone.0127612

•  Reproducibility – Same data, same code
•  Systematic and extensible meta-data collection ✔

Workflow Run

[Diagram: contents of a workflow run]
•  workflowrun.prov.ttl (RDF)
•  inputA.txt
•  outputA.txt, outputC.jpg
•  outputB/ (1.txt, 2.txt, 3.txt)
•  intermediates/de/def2e58b-50e2-4949-9980-fd310166621a.txt
•  workflow, attribution, execution environment

Aggregating in Research Object

ZIP folder structure (RO Bundle)

mimetype application/vnd.wf4ever.robundle+zip    

.ro/manifest.json

URI references
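The ZIP folder structure above can be sketched with Python's standard `zipfile` module. Two points here are assumptions taken from the RO Bundle convention rather than from the slide: the `mimetype` entry is written first and stored uncompressed, and aggregated paths are stored without their leading slash. The function name `write_bundle` and the example manifest and file are hypothetical.

```python
import io
import json
import zipfile

MIMETYPE = "application/vnd.wf4ever.robundle+zip"


def write_bundle(target, manifest: dict, files: dict) -> None:
    """Write a ZIP holding mimetype, .ro/manifest.json and payload files."""
    with zipfile.ZipFile(target, "w") as bundle:
        # Assumption: mimetype goes first and is stored without compression.
        bundle.writestr("mimetype", MIMETYPE,
                        compress_type=zipfile.ZIP_STORED)
        bundle.writestr(".ro/manifest.json", json.dumps(manifest, indent=2),
                        compress_type=zipfile.ZIP_DEFLATED)
        for path, data in files.items():
            # ZIP entry names are relative, so drop the leading slash.
            bundle.writestr(path.lstrip("/"), data,
                            compress_type=zipfile.ZIP_DEFLATED)


# Hypothetical example content, written to memory instead of disk.
buffer = io.BytesIO()
write_bundle(
    buffer,
    manifest={"id": "/", "aggregates": [{"id": "/outputA.txt"}]},
    files={"/outputA.txt": b"example output\n"},
)

with zipfile.ZipFile(buffer) as bundle:
    names = bundle.namelist()
    stored_mimetype = bundle.read("mimetype").decode("ascii")
    mimetype_compression = bundle.getinfo("mimetype").compress_type

print(names)  # prints ['mimetype', '.ro/manifest.json', 'outputA.txt']
```

Because the result is an ordinary ZIP file, a recipient without Research Object tooling can still unzip it and read the files, which is the graceful degradation the framework aims for.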

•  Exchange
•  Reproducibility – Same data, same code
•  Systematic and extensible meta-data collection
•  Uses RO Model WF Extension - basis of CWL

ROs and sensitive data

Farr Commons

•  Exchange
•  Systematic and extensible meta-data collection ✔

Use case: ATLAS Collider Data Analytics
Charles Vardeman, Da Huo

[Diagram: converting a Docker image into a Research Object bundle]
•  Docker: portable, lightweight application runtime and packaging tool
•  Image: ATLAS and CMS detector data; all data and files of the execution + instructions
•  Convert the image to a bundle with a manifest
•  Manifest: relate files and layers, add provenance and annotations, link in other content
•  Run

•  Exchange
•  Reproducibility – Same data, same code, same run time environment
•  Systematic and extensible meta-data collection

FAIRDOM SEEK

[Diagram: FAIRDOM SEEK export]
•  Export as RO: Model, Data, SOP, Parameters
•  RO: unzip

•  Reproducibility
•  Versioning
•  Systematic and extensible meta-data collection

✔ ✔

FAIR Publishing

Research Objects

•  Reproducibility – Same data, same code, same run time environment
•  Versioning
•  Exchange
•  Systematic and extensible meta-data collection

Research Objects

Publish a digital record of your entire scientific enterprise

You can give it to someone else

You can get credit for it

People think you are a good person

You get a promotion

•  Why does this matter to Biologists?

Okay, but what does it cost?

Conclusion

•  Simple solution, addressing needs towards transparent FAIR principles
    – Findable, Accessible, Interoperable, Reusable
•  Adoption
    – Training
        •  Online tutorials
        •  Face to face
    – Need more tools that take advantage of the RO Framework and lower the cost (technological debt) of reproducibility
•  Work together

Acknowledgements

Carole Goble, Stian Soiland-Reyes, Matt Gamble, Rob Haines, Sean Bechhofer, Phil Crouch, Finn Bacall, Stuart Owen, Khalid Belhajjame, Graham Klyne, Jun Zhao, Daniel Garijo, Oscar Corcho, Esteban García Cuesta, Raul Palma

University of Manchester, University of Oxford, Lancaster University, UPM, iSOCO, PSNC, Paris 6

http://researchobject.org
http://fair-dom.org
http://www.seek4science.org
http://www.farrinstitute.org
http://www.wf4ever-project.org
http://myexperiment.org