Post on 18-Jan-2016
Data Management Plans
Jennifer L. Thoegersen, Data Curation Librarian
NURAMP Workshop Series
October 8, 2015
2
Jenny Thoegersen, Data Curation Librarian
University of Nebraska-Lincoln Libraries
jthoegersen2@unl.edu
3
Agenda
•Data Management
   •Definition
   •Importance
•Overview of Data Management Plans (DMPs)
•Components of a DMP
•Library Services
•DMP Activity
4
Acronyms
DMP: Data Management Plan
PII: Personally Identifiable Information
DOI: Digital Object Identifier
ARK: Archival Resource Key
URN: Uniform Resource Name
PURL: Persistent Uniform Resource Locator
CSV: Comma-Separated Values
TIF/TIFF: Tagged Image File (Format)
XML: eXtensible Markup Language
UNLDR: UNL Data Repository and Registry
5
Definition
“Data Management is the process of controlling the information generated during a research project”
Penn State University Libraries
6
Importance
Funding
Security
Validity
Publishing
7
Joint Data Archiving Policy (JDAP)
“…requires, as a condition for publication, that data supporting the results in the paper should be archived in an appropriate public archive…”
Dryad (2014-04-04). “Joint Data Archiving Policy (JDAP).” http://datadryad.org/pages/jdap
8
JDAP Journals
The American Naturalist
Biological Journal of the Linnean Society
Biology Letters
BMC Ecology
BMC Evolutionary Biology
BMJ
BMJ Open
Ecological Applications
Ecological Monographs
Ecology
Ecosphere
Evolution
Evolutionary Applications
Frontiers in Ecology and the Environment
Functional Ecology
Genetics
Heredity
Journal of Applied Ecology
Journal of Ecology
Journal of Evolutionary Biology
Journal of Fish and Wildlife Management
Journal of Heredity
Molecular Biology and Evolution
Molecular Ecology and Molecular Ecology Resources
Nature
Nucleic Acids Research
Paleobiology
PLOS
Science
9
DMPs for Proposals
•Follow guidelines provided by granting agency, directorate, division, and solicitation
•Keep the plan clear, complete, and concise
•Refer back to the project proposal, if necessary
•Start early!
•Recheck requirements for changes
10
NSF Basic DMP Requirements
1. Types of data
2. Standards for data and metadata
3. Policies for sharing and protection
4. Provisions for re-use
5. Plans for preservation
From the Grant Proposal Guide (http://www.nsf.gov/pubs/policydocs/pappguide/nsf13001/gpg_2.jsp)
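As a rough self-check while drafting, the five NSF topics above could be turned into a checklist. The sketch below is a hypothetical Python helper, not an NSF tool: the keyword lists in `NSF_TOPICS` are assumptions chosen for illustration, and a keyword match only shows that a topic is mentioned, not that it is adequately addressed.

```python
# Hypothetical DMP self-check: which of the five NSF topics does a draft mention?
# The keyword lists are illustrative assumptions, not NSF wording or review criteria.
NSF_TOPICS = {
    "Types of data": ["data type", "types of data", "dataset"],
    "Standards for data and metadata": ["metadata", "standard", "format"],
    "Policies for sharing and protection": ["sharing", "privacy", "confidential"],
    "Provisions for re-use": ["re-use", "reuse", "license"],
    "Plans for preservation": ["preserv", "archiv", "repository"],
}


def missing_topics(draft: str) -> list[str]:
    """Return the topics for which none of the keywords appear in the draft."""
    text = draft.lower()
    return [
        topic
        for topic, keywords in NSF_TOPICS.items()
        if not any(keyword in text for keyword in keywords)
    ]


if __name__ == "__main__":
    draft = (
        "We will produce CSV datasets described with EML metadata. "
        "Data will be archived in a public repository."
    )
    print(missing_topics(draft))
    # ['Policies for sharing and protection', 'Provisions for re-use']
```

A check like this can only flag gaps; the final plan still needs to follow the solicitation and directorate guidance.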
11
Data Management Components
Planning
Data
Metadata
Storage & Backup
Preservation & Sharing
Legal & Ethical Issues
12
Data
•Include ALL data to be produced/used in the project
•Explicitly match data types to formats
•Consider open, widely used formats for sharing/preservation
   •http://en.wikipedia.org/wiki/Open_format
•Library of Congress Recommended Formats Statement
   •http://www.loc.gov/preservation/resources/rfs/TOC.html
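CSV is a widely used open format for tabular data. As a minimal sketch, assuming the data are already held as Python dictionaries (the field names below are hypothetical), the standard-library `csv` module can write and read such a file:

```python
import csv
from pathlib import Path

# Hypothetical observation records; the field names are illustrative only.
records = [
    {"plot_id": "A1", "date": "2015-06-01", "arthropod_count": 42},
    {"plot_id": "A2", "date": "2015-06-01", "arthropod_count": 17},
]


def save_as_csv(rows: list[dict], path: Path) -> None:
    """Write dict rows to a UTF-8 CSV file with a header line."""
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


def load_csv(path: Path) -> list[dict]:
    """Read the file back; every value comes back as a string."""
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


if __name__ == "__main__":
    out = Path("observations.csv")
    save_as_csv(records, out)
    print(load_csv(out))
```

Because CSV stores everything as text, a data dictionary or metadata record describing column meanings and types is still needed alongside the file.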
13
Examples of Open Formats
•CSV
•XML
•tar
•HTML
•PNG
•FLAC
•MKV
•Plain text
•ePub
•LaTeX
•JSON
•OpenDocument
•PDF/A
14
Metadata
•Provides contextual and descriptive information
•Aids in discovery
•Should use standards, if possible
   •http://www.dcc.ac.uk/resources/metadata-standards
http://datadryad.org/resource/doi:10.5061/dryad.1321/1
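One way to see what a standards-based metadata record looks like is to build one programmatically. The sketch below assumes Dublin Core, a widely used general-purpose standard, rather than the EML used in the next example, and relies on Python's `xml.etree.ElementTree`; the wrapper element `metadata` and the example values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Dublin Core element namespace (DCMES 1.1); registered so output uses the "dc:" prefix.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields: dict[str, str]) -> str:
    """Build a small XML record whose children are Dublin Core elements.

    Keys must be Dublin Core element names such as title, creator, date, format.
    The <metadata> wrapper element is a hypothetical choice for this sketch.
    """
    root = ET.Element("metadata")
    for name, value in fields.items():
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(dublin_core_record({
        "title": "Hypothetical arthropod survey data",
        "creator": "Researcher, Example",
        "date": "2015",
        "format": "text/csv",
    }))
```

Discipline-specific standards such as EML carry far more structure than this; the point is only that standard element names make records easier for repositories and search tools to interpret.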
15
DMP Example: Metadata
The project will leverage existing metadata standards currently stored in Ecological Metadata Language (EML) format. We will add additional metadata entries for the arthropod community composition and arthropod stoichiometry; field notes taken during the time of collection will be recorded. Morpho software will be used to generate the metadata file in EML. We chose EML format for our metadata since it allows integration with existing NutNet data housed in the Knowledge Network for Biocomplexity (KNB) data repository.
16
Storage & Backup
•Identify where data will be stored during the project
•Specify access and security restrictions
•Explain how data will be backed up
   •What data will be backed up?
   •Where?
   •How often?
•Make it clear that you will maintain at least 3 copies of your data at all times, at least 1 of them remote
17
3-2-1 Backup
3 Copies
2 Media Types
1 Remote
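The 3-2-1 rule can be stated as a simple check. In this hypothetical sketch each copy of the data is described by its location, media type, and whether it is stored remotely; those field names and the example locations are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Copy:
    """One copy of the dataset (field names are illustrative assumptions)."""
    location: str
    media: str      # e.g. "internal disk", "external drive", "cloud"
    remote: bool    # stored away from the primary work site?


def meets_3_2_1(copies: list[Copy]) -> bool:
    """True if there are 3+ copies, on 2+ media types, with 1+ stored remotely."""
    return (
        len(copies) >= 3
        and len({c.media for c in copies}) >= 2
        and any(c.remote for c in copies)
    )


if __name__ == "__main__":
    plan = [
        Copy("office workstation", "internal disk", remote=False),
        Copy("desk drawer", "external drive", remote=False),
        Copy("Box", "cloud", remote=True),
    ]
    print(meets_3_2_1(plan))  # True
```

The check says nothing about how often copies are refreshed, so the DMP should still state the backup schedule.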
18
Storage & Backup Options at UNL
•Departmental servers
•Box
•Personal computer
•External hard drives
•Holland Computing Center
•Nsave backup by ITS
19
Cloud Considerations
•Research data can be stored in the cloud
•Must be conscious of security, privacy, confidentiality, legal, and access issues
   •HIPAA
   •Export controls
   •PII
•See box.unl.edu for details on allowable data
20
Ethical & Legal Considerations
•DMP must address whether there are any legal/ethical/IP considerations for data
•If human subjects, how will privacy be protected?
•Consider in terms of storage, access, and potential sharing
•State explicitly if not applicable
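For human-subjects data, one common first step before storage or sharing is removing direct identifiers. The sketch below assumes records are Python dictionaries and that the project has already decided which fields count as direct identifiers; the field names in `DIRECT_IDENTIFIERS` are hypothetical.

```python
# Hypothetical set of direct-identifier fields; a real project would define this
# from its IRB protocol and the PII rules that apply to its data.
DIRECT_IDENTIFIERS = frozenset({"name", "email", "phone", "street_address"})


def drop_identifiers(
    rows: list[dict], identifiers: frozenset[str] = DIRECT_IDENTIFIERS
) -> list[dict]:
    """Return new records with the direct-identifier fields removed."""
    return [
        {field: value for field, value in row.items() if field not in identifiers}
        for row in rows
    ]


if __name__ == "__main__":
    survey = [
        {"name": "Participant A", "email": "a@example.org", "age_group": "30-39", "response": 4},
    ]
    print(drop_identifiers(survey))
```

Dropping these fields alone does not make a dataset anonymous: combinations of the remaining fields (for example, age, ZIP code, and dates) can still identify people, so the DMP should describe the full protection approach.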
21
DMP Example: Legal & Ethical Considerations
This study will only collect non-sensitive data. No personal identifiers will be recorded or retained by the researchers in any form. There are no copyright or licensing issues associated with the data being submitted.
22
Preservation & Sharing
•Identify what data will be preserved and shared, when, where, and for how long
•May be able to embargo data for a time (usually 1-2 years)
•Data repositories are encouraged for preservation/sharing
   •Use re3data.org to search for repositories in your discipline
   •Consider cost, longevity, audience
   •Popular repositories include ICPSR, Dryad, and Figshare
23
Data identifiers
“Message error 404” by Roberto Zingales, https://www.flickr.com/photos/filicudi/2891898817 (CC BY 2.0)
•Persistent identifiers assist in citation and long-term access
•Common systems include:
   •DOI
   •ARK
   •URN
   •PURL
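DOIs such as the Dryad and UNLDR identifiers in this deck can be written in several forms (`doi:10.5061/dryad.1321/1`, `http://dx.doi.org/10.13014/K2RN35SZ`). A minimal sketch for normalizing them to the `https://doi.org/` resolver form might look like this; the regular expression is a simplified assumption and will not validate every possible DOI.

```python
import re

# Simplified DOI pattern: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def doi_to_url(raw: str) -> str:
    """Normalize 'doi:10...' or 'http://dx.doi.org/10...' to an https://doi.org/ link."""
    match = DOI_PATTERN.search(raw.strip())
    if match is None:
        raise ValueError(f"no DOI found in {raw!r}")
    return "https://doi.org/" + match.group(0)


if __name__ == "__main__":
    for raw in ["doi:10.5061/dryad.1321/1", "http://dx.doi.org/10.13014/K2RN35SZ"]:
        print(doi_to_url(raw))
    # https://doi.org/10.5061/dryad.1321/1
    # https://doi.org/10.13014/K2RN35SZ
```

Citing the resolver link rather than a repository page URL is what lets the identifier keep working if the repository reorganizes its site.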
24
DMP Example: Sharing
The data being submitted will be made publicly available through the {} by {}. There will be no additional restrictions or permissions required for accessing the data. Findings will be published by the researchers based on this data; the estimated date of publication is {}. The specified embargo period associated with the data being submitted extends from the projected conclusion date for initial research until six months after projected publication date for the findings. The embargo will be lifted by {}.
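The example ties the end of the embargo to six months after the projected publication date. Because the template leaves the dates blank, the sketch below uses a hypothetical publication date; it shows one way to do calendar-month arithmetic in Python, clamping to the last day of shorter months.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Shift a date by whole months, clamping to the last day of shorter months."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def embargo_end(publication: date, months_after: int = 6) -> date:
    """Date the embargo lifts, counted from the projected publication date."""
    return add_months(publication, months_after)


if __name__ == "__main__":
    # Hypothetical projected publication date.
    print(embargo_end(date(2016, 8, 31)))  # 2017-02-28
```

The same helper covers the longer embargoes mentioned earlier, e.g. `add_months(pub_date, 12)` for a one-year embargo.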
25
DMP Example: Preservation
The data collected during this study will be archived with { }. The data will be stored in a specific virtual archive and will be made publicly available through {}. This {} data archive is a well-established and trusted archive in the social science field. As a member of the Data Preservation Alliance for the Social Sciences (Data-PASS) and the Library of Congress National Digital Stewardship Alliance (NDSA), {} provides a strong archival and data distribution resource to the project.
26
Services at UNL Libraries
•Workshops
•Consultations
•UNL Data Repository & Registry
27
Potential Workshop Topics
•File organization
•Data management plans
•Storage and backup
•File formats
•Documentation and metadata
•Data repositories
28
Consultations
Every Project Is Unique
By Muhammad Rafizeldi (MRafizeldi) [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons
29
Roche DG, Lanfear R, Binning SA, Haff TM, Schwanz LE, et al. (2014) Troubleshooting Public Data Archiving: Suggestions to Increase Participation. PLoS Biol 12(1): e1001779. doi:10.1371/journal.pbio.1001779
UNLDR
•Deposit inactive datasets
•Data can be private or public (with embargo if needed)
•Assign DOIs to public data
•Preservation guaranteed for 20 years
•50GB free for UNL researchers
Bach, Roger and Batelaan, Herman (2015): Electron Double Slit and Talbot-Lau Interferometer. UNL Data Repository. Dataset. http://dx.doi.org/10.13014/K2RN35SZ
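To estimate whether a dataset fits within the 50GB free allowance, the file sizes can be totaled before deposit. This sketch assumes the dataset sits in a single local folder and reads 50GB as 50 billion bytes; the repository's own accounting may differ.

```python
from pathlib import Path

# Assumption: "50GB" interpreted as 50 * 10**9 bytes.
FREE_ALLOWANCE_BYTES = 50 * 10**9


def dataset_size(root: Path) -> int:
    """Total size in bytes of all regular files under root, recursively."""
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


def fits_free_allowance(root: Path, allowance: int = FREE_ALLOWANCE_BYTES) -> bool:
    """True if the folder's total file size is within the allowance."""
    return dataset_size(root) <= allowance


if __name__ == "__main__":
    folder = Path(".")  # hypothetical dataset folder
    print(dataset_size(folder), fits_free_allowance(folder))
```

Compressed archives (e.g. tar) change the total, so it is worth measuring the files in the form they will actually be deposited.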
31
Data Management Plan Activity
32
The grant solicitation will always include all of the guidelines for your
data management plan.
33
CSV, TIF, and XML are examples of open file formats.
34
All NSF grant proposals require a data management plan or an explanation of
why one is not necessary for the project.
35
You should maintain at least 4 copies of your data at all times.
36
UNL Libraries offer storage for active datasets.
37
ITS can provide backup services for your desktop and laptop devices.
38
DMPs should address any privacy, ethical, and intellectual property
concerns relevant to the project data.
39
Many journals require relevant data to be made publicly available as a
condition of publication.
40
Metadata provides context for data and aids in its discovery.
41
No research data should be stored in the cloud.
42
Review an NSF Data Management Plan
•Read through the DMP from a successful NSF grant proposal
•Consider:
   •What was done well?
   •Is there anything that was not done well?
•Discuss in groups
43
Resources
Overview: unl.libguides.com/datamanagement
Sample DMPs: go.unl.edu/sample_dmps
UNLDR: dataregistry.unl.edu
Email: jthoegersen2@unl.edu