recommendations for implementing open science · sharing sensitive data with confidence: the...

18
Recommendations for Implementing Open Science Keynote, Swiss Open Science Action Plan Lausanne, October 17, 2019 Mercè Crosas, Ph.D. Chief Data Science and Technology Officer, IQSS Harvard University’s Research Data Officer, HUIT @mercecrosas

Upload: others

Post on 24-Jul-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Recommendations for Implementing Open Science · Sharing Sensitive Data with Confidence: The DataTags System, Technology Science OpenDP: A New Project for Sensitive Data A community

Recommendations for Implementing Open Science

Keynote, Swiss Open Science Action Plan

Lausanne, October 17, 2019

Mercè Crosas, Ph.D.

Chief Data Science and Technology Officer, IQSS

Harvard University’s Research Data Officer, HUIT

@mercecrosas

Page 2: Recommendations for Implementing Open Science · Sharing Sensitive Data with Confidence: The DataTags System, Technology Science OpenDP: A New Project for Sensitive Data A community

An Open Science solution should be researcher-centric

@mercecrosas 2

Page 3: Recommendations for Implementing Open Science · Sharing Sensitive Data with Confidence: The DataTags System, Technology Science OpenDP: A New Project for Sensitive Data A community

National Academies of Sciences New Report:Open Science by Design (2018)

Researcher at the center:

• Researcher contributes to open science

• Researcher takes advantage of the open

science practices

• From data generation to validation,

dissemination, and preservation

@mercecrosas 3

Page 4: Recommendations for Implementing Open Science · Sharing Sensitive Data with Confidence: The DataTags System, Technology Science OpenDP: A New Project for Sensitive Data A community

OpenAire New White Paper: Achieving Open Science in the European Science Cloud (2019)

@mercecrosas 4

Researcher at the center:

• Researcher publishes all kinds of

scientific products (data, software,

workflows)

• Open as the default

• Services provide technology and

training for researchers

Page 5: Recommendations for Implementing Open Science · Sharing Sensitive Data with Confidence: The DataTags System, Technology Science OpenDP: A New Project for Sensitive Data A community

Data Sharing is key to Open Science

@mercecrosas 5

Page 6: Recommendations for Implementing Open Science · Sharing Sensitive Data with Confidence: The DataTags System, Technology Science OpenDP: A New Project for Sensitive Data A community

Dataverse: an open-source platform for sharing and archiving research data

• Launched at Harvard in

2006

• Used in 6 continents

• 48 Dataverse sites

• 5500 dataverses (branded

datasets collections)

• 120K datasets

• 10M data files downloads

• Vibrant open-source

community

@mercecrosas 6

Page 7: Recommendations for Implementing Open Science · Sharing Sensitive Data with Confidence: The DataTags System, Technology Science OpenDP: A New Project for Sensitive Data A community

A National Dataverse Site: DataverseNO

• 8 universities in Norway as members; the

other 3 to join soon

• Policies and guidelines common to all

DataverseNO members

• Global and local support

• Applied for Core Trust Seal certificate

• Dataverse Europe Workshop January 2020

• https://site.uit.no/dataverseno/

@mercecrosas 7

Page 8: Recommendations for Implementing Open Science · Sharing Sensitive Data with Confidence: The DataTags System, Technology Science OpenDP: A New Project for Sensitive Data A community

But data sharing only is not sufficient for Open Science

@mercecrosas 8

Page 9: Recommendations for Implementing Open Science · Sharing Sensitive Data with Confidence: The DataTags System, Technology Science OpenDP: A New Project for Sensitive Data A community

Open Science should include sharing of:

• Code associated with data to reproduce research results

• Workflows to describe research steps and data transformations

• Research software, algorithms, and tools for reuse

@mercecrosas 9

Page 10: Recommendations for Implementing Open Science · Sharing Sensitive Data with Confidence: The DataTags System, Technology Science OpenDP: A New Project for Sensitive Data A community

Data repositories should be integrated with computational environments

Data

+Code

Reproducible container;

research object

Data

+Code

Reproducibility

verification metric

Computational

& Reproducibility

Environments

Pull data from

DataverseArchive input, output, and

workflow to Dataverse

10

Page 11: Recommendations for Implementing Open Science · Sharing Sensitive Data with Confidence: The DataTags System, Technology Science OpenDP: A New Project for Sensitive Data A community

Sharing data and computing together helps create a Data Commons to collaborate

“…a data commons brings together (or co-locates) data with

cloud computing infrastructure and commonly used software

services, tools & applications for managing, analyzing and

sharing data to create an interoperable resource for a research

community”

Robert Grossman

Data scientist at the University of Chicago; Director of the Open Commons Consortium

https://medium.com/@rgrossman1/a-proposed-end-to-end-principle-for-data-commons-5872f2fa8a47

@mercecrosas 11

Page 12: Recommendations for Implementing Open Science · Sharing Sensitive Data with Confidence: The DataTags System, Technology Science OpenDP: A New Project for Sensitive Data A community

Open Science does not always mean fully open data

@mercecrosas 12

Page 13: Recommendations for Implementing Open Science · Sharing Sensitive Data with Confidence: The DataTags System, Technology Science OpenDP: A New Project for Sensitive Data A community

Metadata should be open for discoverability;But data restricted when needed.

@mercecrosas 13

Page 14: Recommendations for Implementing Open Science · Sharing Sensitive Data with Confidence: The DataTags System, Technology Science OpenDP: A New Project for Sensitive Data A community

DataTags: Standardized Data Policies

@mercecrosas 14

DataTag Data Access Authorization Data Use Agreement Encryption

Blue Public

Green Public + Register

Yellow Restricted + Approval

Needed

+ Click-thru DUA + Encrypted transit

Orange Restricted + Approval

Needed

+ Signed DUA + Encrypted transit

+ Encrypted storage

Red Restricted + Approval

Needed

+ Signed DUA

+ Two-factor Auth

+ Encrypted transit

+ Encrypted storage

Crimson Restricted + Approval

Needed

+ Signed DUA

+ Two-factor Auth

+ Encrypted transit

+ Multi-encrypted storage

Sweeney, Crosas, Bar-Sinai, 2015. Sharing Sensitive Data with Confidence: The DataTags System, Technology Science

Page 15: Recommendations for Implementing Open Science · Sharing Sensitive Data with Confidence: The DataTags System, Technology Science OpenDP: A New Project for Sensitive Data A community

OpenDP: A New Project for Sensitive Data

A community effort to build a trustworthy and open-source suite of differential

privacy tools that can be easily adopted by custodians of sensitive data to make it

available for statistical research.

• To be launched in 2020 with Sloan Foundation funding

• Initially led by Harvard Privacy Tools project (PIs: Vadhan, Honaker, King, Crosas)

A tool (algorithm) is differentially private if its output cannot reveal whether any

individual's data was included in the original dataset or not.

@mercecrosas 15

Page 16: Recommendations for Implementing Open Science · Sharing Sensitive Data with Confidence: The DataTags System, Technology Science OpenDP: A New Project for Sensitive Data A community

Dataverse + DataTags + OpenDP:Sharing and analyzing sensitive data

Sensitive

Data File

DataTags

Automated

Interview

Review Board

Approval

Sensitive

Data File

Direct

Access

Privacy

Preserving

Access

Register; request access;

approval needed with

signed-DUA

Get data summaries

and analysis with

Differential Privacy tool

16

DUA

Data in

Trusted

Remote

Storage

Page 17: Recommendations for Implementing Open Science · Sharing Sensitive Data with Confidence: The DataTags System, Technology Science OpenDP: A New Project for Sensitive Data A community

In conclusion, an Open Science implementation should include:

• Incentives to share data, software/code, and other research outputs

• Metadata and format standards for discovery and reuse

• Machine-readable data for management and usage by computers

• Sufficient information to reuse the data, software, workflows (all research outputs)

• Support for software licenses and data use agreements

• Public metadata (at a minimum for citation) even when data are restricted

• Integration of archival repositories with computational environments

• Solutions for collaborations that access sensitive, private data

@mercecrosas 17

Page 18: Recommendations for Implementing Open Science · Sharing Sensitive Data with Confidence: The DataTags System, Technology Science OpenDP: A New Project for Sensitive Data A community

Thanks

dataverse.org | dataversecommunity.global/ | scholar.harvard.edu/mercecrosas

@mercecrosas 18