Course Research Data Management Maarten van Bentum (Library & Archive)


Post on 12-Apr-2017


Page 1: Course Research Data Management

Course Research Data Management

Maarten van Bentum (Library & Archive)

Page 2: Course Research Data Management

Blackboard

UT website, employees page

ORG-AA-BA-RESDATAMAN: Course Research Data Management

Course material: presentations, links to information, DMP template, datasets

After the course-day: contact for support and feedback

Page 3: Course Research Data Management

Why research data management

• Importance of quality, reliability, replicability and verification of scientific research

• Better and more efficient access to research data

• Requirements of research funders with regard to data management

• Data management will become an issue in research assessments

Page 4: Course Research Data Management

Benefits research data management

• Improved research quality

• Improved efficiency

• Protection from data-related risks

• Enhanced reputation and prestige

Page 5: Course Research Data Management

Research Data Management: importance (1/2)

Scientific integrity (1), funder requirements (2) and developments in science (3)

(1) Fabrication, Falsification and Plagiarism (FFP) > RDM?

Neglect of basic preservation of data

Neglect of data management

No proper mechanism for quality control: no data or instruments for easy data reproduction means no possible check

See also: https://www.utwente.nl/en/organization/structure/management/good-management/

Netherlands Code of Conduct for Academic Practice: Verification section

Page 6: Course Research Data Management

Research Data Management: importance (2/2)

(2) NWO and EU Horizon 2020 data management pilots

Focus on open data and reuse

Data Management Plan

Data archived in data repository

NWO: http://www.nwo.nl/en/policies/open+science/data+management

EU H2020: http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf

(3) Development in science

Data intensive science (4th paradigm)

Data collections are future assets of research groups

Page 7: Course Research Data Management

What you will learn today

Data management planning: how to make a DMP, which issues to address and how to describe them (interactive)

Awareness of importance of managing data after research: data citation and publication (persistent identifiers) and proper data archiving

Knowledge about legal issues in data management

Page 8: Course Research Data Management

Programme

9:30 Introduction to Research Data Management

Dr. ir. Maarten van Bentum, data librarian UT - Library & Archive

9:45 Data Management Planning

Dr. ir. Maarten van Bentum, data librarian UT - Library & Archive

10:00 Small group assignment: Writing a DMP section (based on one of the research cases in the group)

Dr. ir. Maarten van Bentum, data librarian UT - Library & Archive

10:45 Break

11:00 Plenary presentations: Each group presents the section they have prepared, and the rest of the teams act as the EU review committee.

Dr. ir. Maarten van Bentum, data librarian UT - Library & Archive

12:30 Lunch

13:30 Data Citation: Claiming Data with DOIs (incl. small assignments)

Ellen Verbakel, data librarian TU Delft - 3TU.Datacentrum

14:00 Hands on Data CV, ORCID (participants individually) Ellen Verbakel, data librarian TU Delft - 3TU.Datacentrum

14:45 Data publications Ellen Verbakel, data librarian TU Delft - 3TU.Datacentrum

15:00 Break

15:15 Data archive, Dataseal, DIY/DIT Ellen Verbakel, data librarian TU Delft - 3TU.Datacentrum

15:30 Legal issues: Data retention, data protection, privacy, ownership

Drs. Heiko Tjalsma, legal advisor DANS

16:30 Evaluation form: tell us what you think about this course

16:45 Closure

Page 9: Course Research Data Management

Data Management Plan – a definition

Formal research project document about what and how data will be collected, stored, described, and archived and how access, reuse and linking to publications will be realised.

Page 10: Course Research Data Management

Data Management Plan - topics

Responsibility

Description of data

Methodology data collection

Documentation: metadata (standards)

Quality assurance

Storage and backup

Policies for access and sharing and provisions for appropriate protection/privacy

Policies and provisions for reuse, redistribution

Plans for archiving and preservation of access

From: National Science Foundation and University of California

Page 11: Course Research Data Management

Data Management Plan - templates

Information, templates and checklists

UT template: website RDM on Library & Archive

3TU.Datacentrum: template

DANS checklist

NWO form

Page 12: Course Research Data Management

Writing a DMP

6 small groups (data collection, data storage and backup, data documentation, data access, data sharing and reuse, data preservation and archiving)

Use UT template

Work with research case or dataset of one of the group members

Plenary presentations and discussion (15 min each)

Page 13: Course Research Data Management

DMP - Data collection (1/1)

Type of data > what else should be considered an object for management: software, models, scripts, instruments, questionnaires, informed consent, etc.

Legal and contractual regulations: Personal data? >

Dutch Personal Data Protection Act, http://www.utwente.nl/az/gegevensbescherming/ (in Dutch)

UT classification guideline for information and information systems (in Dutch)

Who collects data: third party? > contract about rights and licenses, example bankruptcy research agency (see later: data access)

Page 14: Course Research Data Management

DMP - Data storage and backup (1/4)

Criteria

Sustainability/reliability: backup frequency (offline / off-site?)

Dataset type: raw dataset, versions during processing and analysis, final datasets

Size dataset: capacity, costs, data transfer

Legal or contractual regulations

Access: individual, community, open

Page 15: Course Research Data Management

DMP - Data storage and backup (2/4)

Storage options

1. UT central storage

p- or m-disk (ICTS): http://www.utwente.nl/icts/diensten/catalogus/dataopslag_mw/storage/

2. Project, community or research institute storage

IGS Datalab: https://www.utwente.nl/igs/datalab/

3. Individual data storage (computer, DVD/CD, external hard disk, …)

4. Non-commercial cloud storage

Surfdrive: https://www.surfdrive.nl/en

DataverseNL: https://dataverse.nl/dvn/

5. Commercial cloud storage: Dropbox, OneDrive, …

Page 16: Course Research Data Management

DMP - Data storage and backup (3/4)

Storage solution: University of Twente (ICTS) central storage M: and P:
Advantages: full service; reliable, durable, secure; high speed data transfer
Disadvantages: no sharing outside UT
Suitable for: saving large data files; master copy of data; use encryption for sensitive and critical data; use SURFfilesender for encrypted data transfer

Storage solution: PC or laptop
Advantages: always available; portable; low cost; high speed data transfer
Disadvantages: sensitive to damage and loss (no automatic backup); no sharing
Suitable for: saving large data files; temporary storage; use encryption for sensitive and critical data

Storage solution: Personal storage devices (USB flash, external hard drive, DVD/CD)
Advantages: portable; low cost
Disadvantages: easily damaged or lost (no automatic backup); not for sensitive or critical data; difficult sharing
Suitable for: saving large data files; temporary storage of standard data

Storage solution: Non-commercial cloud services (for example, DataverseNL, SURFdrive)
Advantages: automatic synchronization on several devices; easy access; external sharing
Disadvantages: medium speed data transfer; not for sensitive or critical data (SURFdrive: when encrypted)
Suitable for: sharing standard data with external parties

Storage solution: Commercial cloud services (for example, Dropbox, Google Drive, OneDrive)
Advantages: automatic synchronization on several devices; easy access; external sharing
Disadvantages: medium speed data transfer; not for sensitive or critical data; unclear access to data; unclear privacy regulations
Suitable for: sharing standard data with external parties

Page 17: Course Research Data Management

DMP - Data storage and backup (4/4)

UT data policy

During the research the research data will be saved in a central repository which is available to at least the members of the research group/ institute and which is managed by this research group/ institute. Storage and access should be managed in accordance with legal regulations, any third party contractual requirements, etc.

Backup

3 copies (original, external/local, external/remote)

Local vs. remote depends on recovery time needed
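The three-copy rule above could be scripted roughly as follows. This is a minimal sketch under stated assumptions, not UT tooling: the function name `backup_three_copies` and both backup directories are hypothetical, and the "remote" copy is represented by an ordinary directory (for example a mounted network share).

```python
import filecmp
import shutil
from pathlib import Path


def backup_three_copies(original: Path, local_dir: Path, remote_dir: Path) -> list[Path]:
    """Copy `original` to a local and a remote backup directory.

    Together with the original this gives three copies. Both directories
    are hypothetical placeholders; the remote one is assumed to be mounted.
    """
    copies = []
    for target_dir in (local_dir, remote_dir):
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / original.name
        shutil.copy2(original, target)  # copy2 also preserves timestamps
        # Byte-by-byte comparison to confirm the copy matches the original.
        if not filecmp.cmp(original, target, shallow=False):
            raise IOError(f"Backup copy differs from original: {target}")
        copies.append(target)
    return copies
```

Whether the second copy lives on a local disk or a remote share only changes the directory passed in; the recovery-time trade-off mentioned on the slide is not modelled here.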

Data transfer

https://www.utwente.nl/icts/en/diensten/catalogus/filesender/

Page 18: Course Research Data Management

DMP - Data documentation (1/4)

Documentation during research of dynamic data sets (for yourself, fellow researchers in the project and/or group)

Documentation after research of static data sets (for discovery, verification, replication, and reuse)

Documentation: standard metadata schemes enhanced with specific descriptive elements necessary for verification, replication, and reuse

See list: http://www.dcc.ac.uk/resources/metadata-standards/list

See also 3TU.Datacentrum Data description and formats

Page 19: Course Research Data Management

DMP - Data documentation (2/4)

Title: name of the dataset or research project that produced it

Creator: names and addresses of the organization or people who created the data, including all significant contributors

Identifier: the identification number used to identify the data, even if it’s just an internal project reference number

Subject: keywords or phrases describing the subject or content of the data

Dates: key dates associated with the data, including: project start and end date; release date; other dates associated with the data lifespan, e.g., maintenance cycle, update schedule

Funders: organizations or agencies who funded the research

Language: language(s) of the intellectual content of the resource, when relevant

Location: where the data relates to a physical location, record information about its spatial coverage

Rights: description of any known intellectual property rights held for the data

List of file names and relationships: list of all digital files in the archive, with their names and file extensions (e.g., 'NWPalaceTR.WRL', 'stone.mov')
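As an illustration of how these descriptive fields might be recorded next to the data, here is a minimal sketch that writes them to a plain-text readme file. The function name `write_readme`, the file name `README.txt` and every metadata value are invented placeholders, not part of any UT or 3TU.Datacentrum template.

```python
from pathlib import Path

# Field names follow the slide; all values are invented placeholders.
EXAMPLE_METADATA = {
    "Title": "Example dataset",
    "Creator": "A. Researcher, Example Group",
    "Identifier": "PROJECT-0001",
    "Subject": "example; placeholder",
    "Dates": "project start 2017-01-01; project end 2017-12-31",
    "Funders": "Example funder",
    "Language": "English",
    "Location": "not applicable",
    "Rights": "not yet determined",
}


def write_readme(metadata: dict[str, str], data_dir: Path, name: str = "README.txt") -> Path:
    """Write the metadata fields plus a file listing into `data_dir`."""
    # List every file below data_dir except the readme itself.
    files = sorted(
        p.relative_to(data_dir).as_posix()
        for p in data_dir.rglob("*")
        if p.is_file() and p.name != name
    )
    lines = [f"{field}: {value}" for field, value in metadata.items()]
    lines.append("List of file names and relationships:")
    lines.extend(f"  {file_name}" for file_name in files)
    readme = data_dir / name
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme
```

Calling `write_readme(EXAMPLE_METADATA, Path("my_dataset"))` would place the readme in the same directory as the data files it describes.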

Page 20: Course Research Data Management

DMP - Data documentation (3/4)

Formats: format(s) of the data, e.g., FITS, SPSS, HTML, JPEG

Methodology: how the data was generated, including equipment or software used, experimental protocol, other things you would include in your lab notebook. Can reference a published article, if it covers everything

Workflows or analyses: to be able to reproduce your work

Sources: references to source material for data derived from other sources, including details of where the source data is held, how identified and accessed

Versions: date/time stamped, and use a separate ID (e.g., version number) for each version

Checksums: to test if your file has changed over time

Explanation of codes used in file names: brief explanation of any naming conventions or abbreviations used to label the files

List of codes used in files: list of any special values used in the data (e.g., codes for categorical survey responses, '999 indicates a "dummy" value in the data,' etc.)

Store metadata in a text file (such as a readme file or codebook) in the same directory as the data
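The checksum item above can be illustrated with a short sketch. The slide does not name an algorithm, so SHA-256 is an assumption here, and the function names `sha256_of` and `changed_files` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(data_dir: Path, recorded: dict[str, str]) -> list[str]:
    """Return the file names that are missing or whose checksum changed.

    `recorded` maps file names (relative to data_dir) to the digests
    noted earlier, for example in the readme file or codebook.
    """
    changed = []
    for name, expected in recorded.items():
        path = data_dir / name
        if not path.is_file() or sha256_of(path) != expected:
            changed.append(name)
    return changed
```

Recording the digests when a dataset version is frozen and re-running `changed_files` later gives a simple check that the files have not changed over time.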

Page 21: Course Research Data Management

DMP - Data documentation (4/4)

File naming conventions: http://guides.lib.purdue.edu/content.php?pid=440001&sid=4901667

Good directory structure:

Directory top-level should include

Project title

Unique identifier

Date (e.g. year)

Substructure should have clear, documented naming convention

e.g. each run of an experiment, each version of a dataset, each person in the group.
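One possible way to set up such a directory structure is sketched below. The function name `make_project_tree`, the underscore-separated naming pattern and the default subdirectories ("raw", "processed", "final") are assumptions chosen for illustration; any clear, documented convention would serve.

```python
from pathlib import Path


def make_project_tree(
    root: Path,
    title: str,
    identifier: str,
    year: int,
    subdirs: tuple[str, ...] = ("raw", "processed", "final"),
) -> Path:
    """Create a top-level project directory with a documented substructure.

    The top-level name combines project title, unique identifier and
    year; whitespace in the title is replaced by underscores.
    """
    safe_title = "_".join(title.split())
    top = root / f"{safe_title}_{identifier}_{year}"
    for sub in subdirs:
        (top / sub).mkdir(parents=True, exist_ok=True)
    return top
```

For example, `make_project_tree(Path("."), "My Project", "ID42", 2017)` would create `My_Project_ID42_2017` with the three subdirectories inside it.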

Page 22: Course Research Data Management

DMP - Data access (1/3)

- UT data policy?

- Funder requirements?

- Requirements other parties? Contracts?

- Open Access required? Possible? Dutch Personal Data Protection Act (UT Data Protection Officer)

Page 23: Course Research Data Management

DMP - Data access (2/3)

data access | M: drive (Home-directory) | P: drive (Group-permissions) | DataverseNL | Surfdrive | Commercial cloud (Dropbox, etc)

internal group/organization | no | yes | yes | yes | yes

external group/organization | no | no | yes | yes | yes

on request | no | no | yes | no | no

view/download rights management | no | yes | yes | yes | yes

edit rights management | no | yes | yes | yes | yes

collaborating on data | no | no | yes | yes | yes

Page 24: Course Research Data Management

DMP - Data access (3/3)

DataverseNL

dynamic data sets (file version control)

static data sets (release with persistent id)

access rights management

not for privacy sensitive data!

Page 25: Course Research Data Management

DMP - Data sharing and reuse (1/1)

Why share your data?

Replication / verification

Promote your research

Enable new discoveries (reuse)

"Open where possible, protected where needed"

See NWO policy http://www.nwo.nl/en/policies/open+science

After research: public, linked to publication(s) > DataverseNL, data centres

Page 26: Course Research Data Management

DMP - Data preservation and archiving (1/2)

UT data policy

Preferably during the research, but not later than 1 month after finishing the research, the research data are archived in a trusted repository (e.g. DANS or 3TU.Datacentrum). The research data are, taking legal regulations and any third party contractual conditions into account, preferably publicly available. This covers at least the research data that form the basis of publications about the research, but can also comprise the full set of raw and/or edited research data.

After the research all durably stored research data and the publications based on those data are linked. This is at least the case for PhD dissertations.

Page 27: Course Research Data Management

DMP – Data preservation and archiving (2/2)

Data centres:

3TU.Datacentrum

DANS

List of data repositories: Databib or Data repositories