research data management in the humanities and social sciences
Post on 24-Jan-2017
Research Data Management: Humanities and Social Sciences Edition
CC BY-NC
Celia Emmelhainz and Suzi Cole
August 11, 2015
Modified from presentation by Leslie Barnes, Dylanne Dearborn, Andrew Nicholson
at http://guides.library.utoronto.ca/RDM-intro
• All liaison librarians need a basic knowledge of research data management (RDM).
• RDM is part of the librarian’s toolkit for serving faculty research needs.
• We don’t all need to be data experts, just as we aren’t experts in many areas that we cover.
• RDM is one of many topics we discuss with faculty over time, like collections, instruction, course guides, and student research.
• Our faculty may not know RDM terms or may not understand what our institutional repository or other archives can do with data.
• Humanists may react negatively to the term “data.”
• (Optional): we can help faculty by reading their draft data management plans: if we don’t understand, reviewers won’t either.
• Knowing data concepts enhances our role & expands our visibility.
• Data collection and the data lifecycle are part of where we help with curation in the library.
• This is a new knowledge area for all academic librarians.
Our assumptions
Why do academic libraries help with data management?
• Library culture is to acquire, organize, and preserve information
• Logical extension of services we’ve traditionally been involved with
• Libraries bring people together across disciplinary differences & campuses
Reading: Coates (2014) Ensuring research integrity: the role of data management in current crises. C&RL News 75(11): 598-601.
After these sessions, you should…
● Know the concepts in data management
● Feel less anxious when talking about data
● Begin listening to faculty talk about their research process and outputs
● Know where to get more help with research data for faculty in your disciplines
But why liaisons?
Info: eScience Team presentation on liaison roles, Image: CC0 from pixabay.com
A logical extension of our role as connections between the library and teaching faculty
A great way to show faculty that we care about their research as well as teaching
Liaisons as natural point of “triage”
Liaisons – Learning Over Time
First Steps: Get comfortable with the idea of research data management.
Next Steps: Start a conversation with faculty about their needs, share resources, and direct them to data librarians for complex questions.
Moving Ahead: Take self-paced courses for librarians on the web. And try it out! Try managing data for one of your own projects.
Source: eScience Team presentation on liaison roles for data management
Our path…
Today
… introduction to data management
… types of research data you’ll encounter
… data formats and organization
Thursday
… intro to data storage
… intro to data sharing
… advising on data management plans
Q1: What is DATA?
Prompt: what materials do your faculty use to make sense of their research?
“Research data is collected, observed, or created
for purposes of analysis to produce original research results.”
- U Edinburgh
Q2: What are DATA in the humanities?
Textual data in the humanities could include:
- Scholarly editions
- Text corpora
- Text with markup
- Thematic collections
- Annotations
- Accompanying analysis
- Finding aids
Cf: guides.library.ucla.edu/c.php?g=180580&p=1187629, guide.dhcuration.org/intro/, image source: slideshare.net/ULCCEvents/the-humanities-and-data-management
Data in the qualitative social sciences could include:
• microfilms
• copies of old documents
• oral interviews
• video tapes
• hand-written records
from: www.nsf.gov/sbe/ses/common/archive.jsp
Humanities and arts data:
● Texts used for research
● Annotations
● Images and illustrations
● Citations
● Bibliographic information
● Contextual information
● Audio or video files
Health and Life Sciences data:
● Health indicators, vital signs
● Protein or genetic sequences
● Spectra and images
● Artifacts and samples
● Slides and specimens
Social Sciences data:
● Survey responses
● Focus groups and interviews
● Administrative records
● Demographic information
● Opinion polling
● Maps and geospatial data
● Websites, primary sources
Physical Sciences data:
● Sensor or lab measurements
● Computer modeling and simulations
● Observations and/or field notes
● Numerical measurements
Cf: Best Practices for Arts/Humanities Data Management Plans, CU-Boulder http://bit.ly/1MkKCIa
DigitalThoreau.org: On the left, the Princeton edition of Walden; right, original 1847 draft with changes marked up.
Text Encoding Initiative (TEI) is a markup language that records the structure of text (author, chapters, pages, quotes) for digital humanities/curation purposes.
Ask Yourself (#1):
Using a project summary, ask yourself:
- what is this research project about?
- what types of data are being collected?
- what types of data are being created?
Data (the stuff we do research with) are vital at every point in the research lifecycle.
Image: www.lib.uci.edu/dss/images/lifecycle.jpg
Example: data across the lifecycle (temperature data from a lake)
Raw, Processed, Analyzed, Finalized/Published
WHY manage data?
① for the researchers’ own current/future benefit
② for transparency and integrity
③ for sharing knowledge & how it was constructed
④ to meet grant requirements (NEH, NSF)
⑤ to comply with ethics requirements
⑥ to increase exposure to faculty research
2: Data Formats and Organization
CC image from pixabay.com/en/filing-cabinet-office-furniture-146160/
File Naming video
● Use meaningful names
● Avoid special characters
● Use caps or underscores, not spaces
● Choose a standard date format: YYYYMMDD or YYYY-MM-DD
● Label versions (v2, v15)
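The naming tips above can be sketched in a few lines of Python. This is only an illustration: the function name `make_file_name` and the example description are made up for this sketch, and the choice of capitalized words joined by underscores is one of several schemes that would satisfy the tips.

```python
import re
from datetime import date


def make_file_name(description, when, version, extension):
    """Build a name such as 'InterviewNotes_20141211_v2.txt'."""
    # Keep only runs of ASCII letters and digits: this drops spaces and
    # special characters (accented letters are dropped too in this sketch).
    words = re.findall(r"[A-Za-z0-9]+", description)
    # Use caps instead of spaces to separate the words.
    label = "".join(word[:1].upper() + word[1:] for word in words)
    # Standard date format: YYYYMMDD.
    stamp = when.strftime("%Y%m%d")
    # Label the version (v2, v15, ...).
    return f"{label}_{stamp}_v{version}.{extension.lstrip('.')}"


print(make_file_name("interview notes: Smith & Jones", date(2014, 12, 11), 2, ".txt"))
```

Names built this way sort by description first; putting the date stamp first instead would sort files chronologically.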
Data Structures video
Could organize by:
● Type of information
● Date and time
● Research project
● Theme or subject
frontispieces/20141211/images
images/frontispieces/20141211
Data Dictionaries and Codebooks
Explain what a dataset contains:
● Contents or organization of a file
● Glossary of key concepts or terms
● Definitions for each variable name
● Relationships between tables/files
● Codes that have been used to sort data
● Sampling or other methods used
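As a rough illustration of the idea, a codebook can itself be a small table that travels with the data. In the sketch below, the variable names, definitions, and codes are invented for a hypothetical interview dataset, and the helper names `codebook_to_csv` and `undocumented` are assumptions of this example, not part of any standard.

```python
import csv
import io

# Hypothetical codebook entries for an interview dataset.
codebook = [
    {"variable": "resp_id", "definition": "Anonymous respondent number", "codes": ""},
    {"variable": "interview_date", "definition": "Date of interview (YYYY-MM-DD)", "codes": ""},
    {"variable": "language", "definition": "Language of interview", "codes": "1=English; 2=French"},
]


def codebook_to_csv(entries):
    """Write the codebook as CSV text so it can be saved beside the data."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["variable", "definition", "codes"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


def undocumented(data_columns, entries):
    """List the data columns that have no definition in the codebook."""
    documented = {entry["variable"] for entry in entries}
    return [column for column in data_columns if column not in documented]


print(codebook_to_csv(codebook))
print(undocumented(["resp_id", "language", "age"], codebook))
```

The second helper is a quick check a researcher could run before depositing data: any column it returns still needs a definition.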
Use open formats when possible:
“open source” formats keep files accessible over time; proprietary formats may be lost if a company goes out of business. Open formats let future researchers access your data!
Video: .mov, .mpeg
Audio: .wav, .mp3
Data: .csv, .sas
Images: .tiff, JPEG 2000
Text: PDF/A, ASCII
Ask Yourself (#2):
Using the project summary, ask yourself:
- what file formats are the data now in?
- do they need conversion to open formats?
- are they well documented with metadata?
Intersession exercise:
Read the NEH guidelines for data management.
View any two data management libguides: Who is the audience? What services are offered? How does it connect to users?
Briefly review your chosen project summary, in preparation for the final class.
Research Data Management: Session Two!
CC BY-NC
Celia Emmelhainz and Suzi Cole
August 13, 2015
Modified from presentation by Leslie Barnes, Dylanne Dearborn, Andrew Nicholson
at http://guides.library.utoronto.ca/RDM-intro
3: Data Security and Sensitive Data
CC image: pixabay.com/en/computer-security-business-767784/
Don’t let this be you! (or your faculty, or your students…)
Image www.neatorama.com/2013/04/24/Backup-Your-Data/
Common options for data storage:
● Local hard drives (weak)
Ex: personal or office desktop, laptop computer
● External storage devices (weak)
Ex: USB drives, external hard drives
● Networked storage (okay)
Ex: university servers, but see Colby**
● Cloud storage services (okay)
Ex: Microsoft, RackSpace, Amazon, Google
Data Storage: Best Practices
● Back up all data frequently, especially after major changes
● Automate the backup process
● Use ‘versioning software’ (see ITS) or file names to track changes in team projects
The “Rule of 3”: Keep three copies of key data (original file, local backup, remote backup)
… in at least two different locations
… in at least one offline/offsite location
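One way to automate part of this is a short script that copies a file to each backup folder and confirms that every copy matches the original. The sketch below is illustrative only: the function names and folder layout are assumptions, and in practice the "remote" folder would be a mounted network or cloud location rather than a local directory.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(original, backup_dirs):
    """Copy the original into each backup folder and verify every copy."""
    original = Path(original)
    expected = sha256_of(original)
    copies = []
    for folder in backup_dirs:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / original.name
        shutil.copy2(original, target)  # copy2 also keeps file timestamps
        if sha256_of(target) != expected:
            raise IOError(f"Backup at {target} does not match the original")
        copies.append(target)
    return copies
```

Calling `back_up("notes.txt", ["local_backup", "remote_backup"])` with two hypothetical folders would leave three copies in total: the original plus two verified backups.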
Sensitive Data:
…is any data that, if released, could harm the people who participated in the research:
● Address, birth date, name, location
● Sensitive political opinions
● Sexual practices
● GPS data locating endangered species
● Coordinates for burial sites or sacred places
Sensitive data is treated with caution; few archiving options exist now.
Concepts in Sensitive Data
● Research ethics: protect identities of people interviewed; minimize risk of any leaks
● Confidentiality: how participants’ identifiable private information will be managed and disseminated
● Disclosure risk: increased with online accessibility of data or storage of documents
Sensitive Data: Best Practices
● Collect data without identifying information, if possible
● Strip sensitive or identifying information before archiving or sharing research data
● Encrypt your computer, and use secure connections and secure servers
● Place sensitive data in a restricted archive with an embargo (time delay) or ethics approval required for access
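Stripping identifying information before sharing can be sketched as dropping known identifying columns from a table. The column names below are assumptions for a hypothetical survey file; a real project would list its own fields, and removing direct identifiers alone does not guarantee that participants cannot be re-identified.

```python
import csv
import io

# Assumed column names; each project would define its own list.
IDENTIFYING_COLUMNS = {"name", "address", "birth_date", "gps_location"}


def strip_identifiers(csv_text, identifying=IDENTIFYING_COLUMNS):
    """Return the CSV text with the identifying columns removed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [column for column in (reader.fieldnames or []) if column not in identifying]
    out = io.StringIO()
    # extrasaction="ignore" silently skips the columns we are dropping.
    writer = csv.DictWriter(out, fieldnames=kept, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


sample = "resp_id,name,birth_date,answer\n1,Ana Diaz,1980-01-02,yes\n"
print(strip_identifiers(sample))
```

The original file with identifiers would still need to be stored securely or destroyed, according to the project's ethics approval.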
Ask Yourself (#3):
Using the project summary, ask yourself:
- where will data be stored?
- who is responsible for storage and backup?
- how will you manage access to sensitive data?
4: Data Retention & Preservation
image from datasupport.researchdata.nl/
“What data do I keep?”
It all depends on:
…whether data is irreplaceable
e.g. are there other copies of this book, document, version, image, interview?
…how much data is needed to verify or reanalyze a research project
…policies of funders, IRB, discipline
Best Practices: Data Preservation
● Use open-source, non-proprietary files
● Include all software needed, if possible
● Note all files and their relationship/structure
● Identify who is responsible for preservation
● Determine how long data should be held
● Budget time and money before starting a project to properly preserve and archive data at the end!
Ask Yourself (#4):
Using the project summary, ask yourself:
- Which data should be kept? Why?
- How long should data be kept for?
- Who is responsible for preserving the data?
5: Data Sharing and Publication
Fears in sharing data…
Often, researchers want to hide their data:
● Fear criticism of their methods/results
● Fear exposure of confidential data
● Fear political/legal ramifications
● Fear getting “scooped” on analysis
● Believe benefits are low, and the cost is high
CC image: pixabay.com/en/hands-holding-embracing-loving-718562/
But, sharing data…
● Is often required by journals and funders
● Reduces the costs of research by reducing project duplication
● Is a valuable check on methods and ethics
● Helps promote faculty discoveries
● Increases the impact of faculty work
● May support faculty tenure or salary increases!
Relevant data repositories:
and of course…
Data Papers:
Dataset Description
Reuse Potential
Methods
Overview/Context
Data as a Publication
● Data which has been shared can be cited:
Data citations involve: author, title, year, publisher / archive, version, URL or DOI for access.
● Data citations are a metric that can support tenure and promotion for our faculty!
● ORCiDs can help people find and cite data by a given researcher.
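To show how the citation elements listed above fit together, here is a small sketch that assembles them into one string. The ordering and punctuation are just one plausible layout, not an official citation style, and the author, title, archive, and DOI in the example are invented placeholders.

```python
def format_data_citation(author, title, year, archive, version, locator):
    """Assemble a data citation from the elements named on the slide."""
    return f"{author} ({year}). {title} (Version {version}) [Data set]. {archive}. {locator}"


# All values below are made-up placeholders for illustration.
print(
    format_data_citation(
        author="Doe, J.",
        title="Lake Temperature Readings",
        year=2015,
        archive="Example Data Archive",
        version="2",
        locator="https://doi.org/10.0000/example",
    )
)
```

Faculty should follow the format their archive or journal recommends; many repositories generate a suggested citation automatically.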
Best Practices in Data Sharing
● Find out who owns the data (researcher? university? funding organization?)
● Review legal issues such as copyright or publishers’ embargoes
● Consider ethical issues related to sensitive data or communities
● See publisher/funder requirements for sharing
Data Management Plans
CC image: pixabay.com/en/whiteboard-man-presentation-write-849812/
What’s in a Data Management Plan?
All the things we’ve discussed!
What’s in a Data Management Plan?
● What types of data will be created?
● Who will own, have access to, and be responsible for managing these data?
● What equipment or methods will capture, process and document the data?
● Where will data be stored during and after active research?
● How will the data be shared with current or future researchers?
Data Management Plans (DMPs) are a great way to…
● plan how you’ll handle research materials
● describe how you’ll document, store, and share data so that others can use it
● remain accountable for how you use and share research materials
● get funded on major research projects!
All research proposals sent to the National Science Foundation (NSF) must include a 2-page data management plan, showing how the data will be cared for and shared.
The NSF is a common source of research money in: anthropology, geography, psychology, economics, government, STS, and many interdisciplinary projects.
The NSF expects that all researchers:
“should be prepared to place their data in fully cleaned and documented form in a data archive or library within one year after the expiration of an award.
Before an award is made, investigators will be asked to specify in writing where they plan to deposit their data set”
- National Science Foundation guide for social and economic sciences at nsf.gov/sbe/ses/common/archive.jsp
For the NEH, data are “materials generated or collected during the course of conducting research.”
Humanities data such as “citations, software code, algorithms, digital tools, documentation... geospatial coordinates… reports, and articles” should be archived. Sensitive information can be excluded.
So, humanities faculty should also have a plan for how they’ll archive and share their research data! Source: neh.gov/files/grants/data_management_plans_2015.pdf
How do we actually make DMPs?
● Templates are a starting point:
● However, researchers still need to carefully think through data issues with grants officers, peers, or librarians
● http://libguides.colby.edu/data_mgmt
Sample DMPs
image: asphalttexas.com/wp-content/uploads/2014/06/Screen-Shot-2014-06-18-at-4.33.29-PM.png
Data management at Colby:
• Liaisons are first point of contact
• Suzi and Celia advise on further issues
• We are an ICPSR member; quantitative researchers can deposit data there.
• Images and data may be archived in Digital Commons/Shared Shelf; check with Marty.
cf. libguides.colby.edu/data_mgmt.
Question: What 3 things can you do this year with data management?
Image: http://www.dailymail.co.uk/news/article-2728736/Otter-aerobics-Large-group-spotted-going-paces-synchronised-exercise.html
More questions? Contact us!
Celia Emmelhainz: celia.emmelhainz@colby.edu
Suzi Cole: swcole@colby.edu
Thanks to New England Collaborative Data Management Curriculum for sharing their slides.
Many thanks to Leslie Barnes, Dylanne Dearborn, and Andrew Nicholson at University of Toronto for sharing their abbreviated slides (http://guides.library.utoronto.ca/RDM-intro), from which this presentation was adapted for the humanities.