
Data Publishing Workflows: Strategies and Standards Sünje Dallmeier-Tiessen (CERN) for many collaborators at CERN and in the RDA-WDS Data Publishing Workflows Group


Post on 23-Dec-2015


Page 1

Data Publishing Workflows: Strategies and Standards

Sünje Dallmeier-Tiessen (CERN)

for many collaborators at CERN and in the RDA-WDS Data Publishing Workflows Group

Page 2

Outline
• Policy pressure
• Solutions across disciplines
• Standards
  • Persistent Identifier
  • Data Citation
  • Quality Assurance, Peer Review
  • Licensing
• Examples in High-Energy Physics (CERN)
  • INSPIRE
  • Analysis Preservation Framework
  • Open Data Portal

Page 3

Research data is a first-class citizen

Royal Society, 1665 and 2012

Page 4

Towards Open Science

Open Source

Open Access

Open Data & Code

Open Science

We are here now

Slide provided by Patricia Herterich, CERN

Page 5

Policy pressure: STFC example

https://www.stfc.ac.uk/Resources/pdf/STFC_Scientific_Data_Policy.pdf

Page 6

Policy pressure: DOE example

DMPs should provide a plan for making all research data displayed in publications resulting from the proposed research open, machine-readable, and digitally accessible to the public at the time of publication.

…the underlying digital research data used to generate the displayed data should be made as accessible as possible to the public in accordance with the principles stated above. 

http://science.energy.gov/funding-opportunities/digital-data-management/

Page 7

Expectations: PLOS Data Policy

www.plos.org

Page 8

Concerns across disciplines

Datasets are…
• Not shared or lost
• Difficult to discover and access
• Difficult to understand (context missing)

Nature, 2009

Page 9

How this challenge is addressed

Page 10

Example: Dedicated Data Repositories

www.pangaea.de

Page 11

Preserving and promoting data reuse

www.pangaea.de

Page 12

International sharing and curation of data

www.icgc.org

Page 13

ICGC – Data Publication Timeline

Time limits for publication moratoriums:

All data shall become free of a publication moratorium when either the data is published by the ICGC member project or one year after a specified quantity of data (e.g. genome dataset from 100 tumors per project) has been released via the ICGC database or other public databases.

[…]

In all cases data shall be free of a publication moratorium two years after its initial release.

https://icgc.org/icgc/goals-structure-policies-guidelines/e3-publication-policy
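The moratorium rule quoted above amounts to taking the earliest of several candidate dates. The following is a minimal Python sketch that encodes only the three quoted conditions (the text elided by "[…]" is not modelled); the function and parameter names, the example dates, and the handling of 29 February are illustrative assumptions, not part of the ICGC policy.

```python
from datetime import date
from typing import Optional


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years; 29 Feb falls back to 28 Feb (assumption)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def moratorium_end(
    initial_release: date,
    member_publication: Optional[date] = None,
    threshold_release: Optional[date] = None,
) -> date:
    """Earliest date on which the publication moratorium lapses (hypothetical helper).

    initial_release:    first public release of the data
    member_publication: date the ICGC member project publishes the data, if any
    threshold_release:  date the specified quantity of data (e.g. 100 tumour
                        genomes) had been released, if reached
    """
    candidates = [add_years(initial_release, 2)]  # "in all cases" two-year cap
    if member_publication is not None:
        candidates.append(member_publication)
    if threshold_release is not None:
        candidates.append(add_years(threshold_release, 1))  # one year after threshold
    return min(candidates)


# Hypothetical dates: first release in March 2014, threshold reached in September 2014.
print(moratorium_end(date(2014, 3, 1), threshold_release=date(2014, 9, 1)))  # 2015-09-01
```

Taking the minimum reflects the "either … or" wording: whichever condition is met first frees the data, and the two-year cap always applies.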

Page 14

Zenodo – Data Repository

www.zenodo.org

Page 15

How to find a data repository

www.re3data.org

Page 16

Example: A dedicated data journal

www.nature.com/sdata/

Page 17

F1000

http://f1000research.com/

Page 18

Connecting articles and data

Tagged GenBank entry (genetic sequence)

Slide provided by H. Koers, Elsevier. Article: doi: 10.1016/j.biortech.2010.03.063

Page 19

Towards Open Science

Open Source

Open Access

Open Data & Code

Open Science

We are here now

Slide provided by Patricia Herterich

Page 20

Publish (Citable) Software

Page 21

More and more examples

Page 22

Published Software Papers

http://openresearchsoftware.metajnl.com/

Page 23

STANDARDS

Page 24

Licensing
• Enable others to reuse your data and software
• Choose licenses or public domain dedications accordingly
• Keep them as "open" as possible

Re-Use
• There are measures to require citation, so that reuse and the impact of your work can be tracked
• If you reuse data, cite the dataset yourself

Page 25

Digital Object Identifiers (DOI names) offer a solution

Most widely used identifier for scientific articles

Researchers, authors, publishers know how to use them

Put datasets on the same playing field as articles

Dataset: Yancheva et al. (2007). Analyses on sediment of Lake Maar. PANGAEA. doi:10.1594/PANGAEA.587840

URLs are not persistent

(e.g. Wren JD: URL decay in MEDLINE: a 4-year follow-up study. Bioinformatics. 2008 Jun 1;24(11):1381-5).

DOIs for datasets

Slides by courtesy of Dr. Jan Brase, DataCite
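Because a DOI name is resolved through a central proxy rather than tied to one web address, a citation such as the PANGAEA example above can always be turned into a working link. A minimal Python sketch, assuming the https://doi.org/ resolver and a simplified pattern for DOI names (a common heuristic, not the complete DOI syntax); the function names are hypothetical.

```python
import re

# Simplified DOI pattern: "10." + numeric registrant code, "/" + suffix.
# Heuristic only; the full DOI syntax allows more than this.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Prefixes under which DOI names commonly appear in citations and links.
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw: str) -> str:
    """Strip a resolver or 'doi:' prefix and return the bare DOI name."""
    s = raw.strip()
    for prefix in KNOWN_PREFIXES:
        if s.lower().startswith(prefix):
            s = s[len(prefix):].strip()
            break
    if not DOI_PATTERN.match(s):
        raise ValueError(f"not a DOI name: {raw!r}")
    return s


def resolver_url(raw: str) -> str:
    """Build the persistent, resolvable link for a DOI name."""
    return "https://doi.org/" + normalize_doi(raw)


print(resolver_url("doi:10.1594/PANGAEA.587840"))
# https://doi.org/10.1594/PANGAEA.587840
```

Plain URLs such as a repository's landing page are rejected here on purpose: only the DOI name, not the current location of the data, is meant to go into the citation.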

Page 26

ORCID iD

www.orcid.org
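An ORCID iD ends in a check character, so a mistyped author identifier can be caught before it is attached to a data record. A minimal Python sketch of the ISO 7064 MOD 11-2 checksum that ORCID uses for its iDs; the function names are hypothetical and the iDs below are sample values.

```python
import re

# Display format of an ORCID iD: four blocks of four characters,
# the last character being a check digit (0-9 or X).
ORCID_PATTERN = re.compile(r"^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$")


def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """True if the iD is well formed and its check character matches."""
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False (last digit altered)
```

A valid checksum only shows that the iD is well formed; whether it belongs to the intended person still has to be confirmed against the ORCID registry.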

Page 27

Force11: Data Citation Principles

Author, Publication Year, Dataset Title, Data Repository, Version, Unique Identifier

- should include a persistent method for identification that is machine actionable and globally unique

- should facilitate identification of, access to, and verification of the specific data that support a claim.  

www.force11.org
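The principles fix which elements a data citation carries and in what order, but not the punctuation. A minimal Python sketch that assembles the listed elements into one string, assuming a simple "Author (Year). Title. Repository. Version. Identifier" style and a DOI as the unique identifier; the class and field names are hypothetical. The example reuses the PANGAEA dataset from the DOI slide; no version is given there, so it is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetCitation:
    """Citation elements in the order listed by Force11 (hypothetical class)."""
    authors: str
    year: int
    title: str
    repository: str
    doi: str                       # bare DOI name, e.g. "10.1594/PANGAEA.587840"
    version: Optional[str] = None  # left out of the string when unknown

    def render(self) -> str:
        parts = [
            f"{self.authors} ({self.year}).",
            f"{self.title}.",
            f"{self.repository}.",
        ]
        if self.version:
            parts.append(f"Version {self.version}.")
        # Machine-actionable, globally unique identifier as a resolvable link.
        parts.append(f"https://doi.org/{self.doi}")
        return " ".join(parts)


example = DatasetCitation(
    authors="Yancheva et al.",
    year=2007,
    title="Analyses on sediment of Lake Maar",
    repository="PANGAEA",
    doi="10.1594/PANGAEA.587840",
)
print(example.render())
# Yancheva et al. (2007). Analyses on sediment of Lake Maar. PANGAEA. https://doi.org/10.1594/PANGAEA.587840
```

Writing the identifier as a full resolver link, rather than a bare "doi:" string, is one way to make it machine actionable as the second principle asks.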

Page 28

Data Citation in Practice

Page 29

Quality assurance for data: peer review

Products

• Data records in data repositories

• Data journals
• Data articles

Note: standalone vs. supporting materials

QA Workflows

• Standalone or integrated?

• Blind and invited peer review

• Open peer review
• Citable review reports

Page 30

How to publish your data

1. Decide which dataset should be preserved or which dataset might be of interest for others to study or reuse

2. Are there issues which restrict the publishing process, e.g. confidentiality for patient data?

3. Which data product?
   • Do I have enough material for a dedicated data article?
   • Which journal or repository works for me?

4. Prepare the documentation/metadata

5. Publish, and let others know you did

6. Cite the dataset in the resulting papers

7. Track who used and cited your data
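Step 4 is where most of the preparation lies. If the chosen repository registers DOIs through DataCite (an assumption; requirements differ between repositories), the record must at least carry the mandatory properties of the DataCite Metadata Schema. A minimal Python sketch of a pre-submission check; the flat key names are a simplification of the real nested schema, and the draft values are hypothetical.

```python
from typing import Any, Dict, List

# Mandatory properties of the DataCite Metadata Schema 4.x, flattened into
# simple keys for illustration (the real schema is nested XML/JSON).
MANDATORY_FIELDS = (
    "identifier",
    "creators",
    "titles",
    "publisher",
    "publicationYear",
    "resourceTypeGeneral",
)


def missing_fields(record: Dict[str, Any]) -> List[str]:
    """List mandatory fields that are absent or empty in a draft record."""
    return [key for key in MANDATORY_FIELDS if not record.get(key)]


# Hypothetical draft prepared at step 4. In many repositories the DOI is
# assigned only when the dataset is published (step 5).
draft = {
    "identifier": None,
    "creators": ["Doe, Jane"],
    "titles": ["Example measurement series"],
    "publisher": "Zenodo",
    "publicationYear": 2015,
    "resourceTypeGeneral": "Dataset",
}
print(missing_fields(draft))  # ['identifier']
```

Only the identifier is reported missing here, which is expected before publication; any other gap would need fixing before the record is submitted.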

Page 31

HEP: High-Energy Physics

Page 32

Research data in HEP

Page 33

Research Data on INSPIRE: starting from the paper

Page 34

The underlying datasets (HEPdata)

Page 35

Data Citation (Tracking)

Page 36

Referenced Data

arXiv:1311.1113

Page 37

Code snippets

Page 38

Code snippets

Page 39

… and who gets the credit for sharing data?

Page 40

Kyle’s profile on INSPIRE

Page 41

Using author IDs for attributing credit

Page 42

Excerpt from publication list on

Page 43

Excerpt from publication list on

Make data publications count - alongside your articles

Page 44

Focusing on reproducibility and reuse: two important new tools

Page 45

Capturing the complexity: Analysis Preservation Framework

Page 46

Open it up: CERN Open Data Portal

Page 47

How to publish your data

1. Decide which dataset should be preserved or which dataset might be of interest for others to study or reuse

2. Are there issues which restrict the publishing process, e.g. confidentiality for patient data?

3. Which data product?
   • Do I have enough material for a dedicated data article?
   • Which journal or repository works for me?

4. Prepare the documentation/metadata

5. Publish, and let others know you did

6. Cite the dataset in the resulting papers

7. Track who used and cited your data

Page 48

Conclusions
• Policy pressure nationally and globally: we need data publishing solutions
• Considerable advancements in many disciplines: we learn from best practices
• HEP is committed to data preservation and open data releases
• First tools are available to support data preservation and data publishing

Page 49

Towards Open Science

Open Source

Open Access

Open Data & Code

Open Science

We are here now

Slide provided by Patricia Herterich