Managing sensitive data in your repository

Natasha Simons
Sharing Health-y and Sensitive Data: Challenges and Solutions Workshop, Perth, 3 September 2015

Upload: australiannationaldataservice

Post on 15-Apr-2017



What is a data repository?

A research data repository is a managed environment capable of storing and sharing (largely) digital data. The data repository supports the process of curating, preserving, and sharing research data.

What kinds of data repositories are there?

Are repositories for open data only?

Yes and no: it depends on the repository's purpose and scope.

Repositories can support data that is:
1. Open access only
2. Mediated access only
3. Closed/private only

Most data repositories are a combination of 1 and 2.

Are there health data repositories?

Yes, many!

http://www.nlm.nih.gov/NIHbmic/nih_data_sharing_repositories.html

What’s the point of data repositories?

Data repositories assist researchers and the research community to:
1. Support data sharing, data discovery & reuse, data preservation
2. Comply with publisher requirements
3. Comply with funder requirements
4. Comply with institutional or government policy requirements
5. Support institutional goals

Illustration credit: Ainsley Seago. doi:10.1371/journal.pbio.1001779.g001

Can sensitive data be managed in a repository?

Yes!

Ask:

• Can the raw data be (de-identified and) made completely open? Or will access be restricted? Mediated?

• What licence should be applied to enable data reuse?

• What metadata elements, links (e.g. to publications) and identifiers (e.g. DOIs, ORCIDs) will aid discovery and reuse of the data?

Source: http://www.slideshare.net/WLSA_ORG/wh2014-workshop-health-data-consortium

Can sensitive data be managed in a repository?

Also ask:

• Can a citation element be added to support attribution and reuse tracking?

• Who or what will be the point of contact for the data?

• Are there other conditions that the data is subject to, e.g. release only after an embargo period?
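
The questions on these two slides map naturally onto fields of a dataset record. The following is a minimal sketch of that idea, not any particular repository's schema: the `DatasetRecord` class, its field names, the citation layout, and every sample value (DOI, contact address, licence choice) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import List, Optional


class Access(Enum):
    """The three access models named on the earlier slide."""
    OPEN = "open"
    MEDIATED = "mediated"
    CLOSED = "closed"


@dataclass
class DatasetRecord:
    # Hypothetical field names; real repositories use their own schemas.
    title: str
    creators: List[str]
    year: int
    publisher: str
    doi: str                                 # identifier that aids discovery
    access: Access                           # open, mediated or closed
    licence: Optional[str] = None            # licence enabling reuse
    contact: Optional[str] = None            # point of contact for the data
    embargo_until: Optional[date] = None     # release condition, if any
    related_publications: List[str] = field(default_factory=list)

    def citation(self) -> str:
        """Build a citation string to support attribution and reuse tracking."""
        names = "; ".join(self.creators)
        return (f"{names} ({self.year}): {self.title}. "
                f"{self.publisher}. https://doi.org/{self.doi}")

    def is_downloadable(self, today: date) -> bool:
        """Only open data past any embargo date can be downloaded directly."""
        if self.access is not Access.OPEN:
            return False
        return self.embargo_until is None or today >= self.embargo_until


# Illustrative values only; none of these refer to a real dataset.
record = DatasetRecord(
    title="Example de-identified cohort survey",
    creators=["Researcher, A."],
    year=2015,
    publisher="Example University",
    doi="10.0000/example",
    access=Access.OPEN,
    licence="CC BY 4.0",
    contact="data-manager@example.org",
    embargo_until=date(2016, 1, 1),
)

print(record.citation())
print(record.is_downloadable(date(2015, 9, 3)))
```

In this sketch, mediated and closed records are never directly downloadable; a request would go to the listed contact instead.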

What’s really challenging?

“Having longitudinal data on individuals is a part of many observational designs, and is needed for research into outcomes, efficacy and many mechanistic studies. Most repositories thus have longitudinal observations. To build such a database you need some way to link observations on the same identified person. Therefore most repositories contain personally identified data, but, because of privacy concerns, they often release only de-identified data. Difficulties in the de-identification process can cause some data to be omitted in a dataset. A lack of direct identifiers in a data collection or federation could prevent linking of data for some patients.”

From: Wade, T. Traits and Types of Health Data Repositories. Health Information Science and Systems 2014, 2:4. doi:10.1186/2047-2501-2-4. http://www.hissjournal.com/content/2/1/4
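
The tension described in the quote, linking observations on the same person while releasing only de-identified data, can be illustrated with a keyed hash used as a linkage key. This is a sketch of one common pseudonymisation technique, not a method taken from the slides or from Wade's paper. The function names, field names, secret and sample rows are all hypothetical, and replacing direct identifiers does not on its own guarantee that a dataset is de-identified.

```python
import hashlib
import hmac


def linkage_key(secret: bytes, person_id: str) -> str:
    """Derive a stable pseudonym from an identifier using HMAC-SHA256.

    The identifier is normalised first so that trivial formatting
    differences do not break the link between records.
    """
    normalised = person_id.strip().upper().encode("utf-8")
    return hmac.new(secret, normalised, hashlib.sha256).hexdigest()


def deidentify(rows, secret):
    """Drop direct identifiers and attach a linkage key to each row."""
    direct_identifiers = ("person_id", "name")  # assumed field names
    released = []
    for row in rows:
        cleaned = {k: v for k, v in row.items() if k not in direct_identifiers}
        cleaned["link_key"] = linkage_key(secret, row["person_id"])
        released.append(cleaned)
    return released


# Hypothetical secret held only by the data custodian, never released.
SECRET = b"replace-with-a-secret-held-by-the-data-custodian"

# Made-up longitudinal observations: two visits by one person, one by another.
visits = [
    {"person_id": "ab123", "name": "Pat Example", "year": 2013, "hba1c": 7.1},
    {"person_id": "AB123 ", "name": "Pat Example", "year": 2014, "hba1c": 6.8},
    {"person_id": "cd456", "name": "Sam Example", "year": 2014, "hba1c": 5.9},
]

released = deidentify(visits, SECRET)
for row in released:
    print(row["link_key"][:12], row["year"], row["hba1c"])
```

Rows that lack a usable identifier cannot be given a key at all, which is the linking failure the quote warns about.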

Small group exercise

Discovering sensitive health data in repositories

Acknowledgement

The Australian National Data Service is funded by the Commonwealth under the NCRIS Program.

31 August, 2015