Coping with Data for WHOI JP Students
DESCRIPTION
Data management best practices presentation for JP Students at Woods Hole Oceanographic Institution, 12 April 2014.
TRANSCRIPT
![Page 1: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/1.jpg)
Coping With Your Data: Tips & Tools
Carly Strasser, California Digital Library, [email protected]
WHOI, 10 April 2014
![Page 2: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/2.jpg)
Roadmap
1. Background
2. Best practices
3. Toolbox
![Page 3: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/3.jpg)
C. Strasser
![Page 4: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/4.jpg)
From Flickr by robertpaulyoung
Scientists are bad at data management.
![Page 5: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/5.jpg)
Many tables
![Page 6: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/6.jpg)
Embedded figures
![Page 7: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/7.jpg)
my spreadsheet
No headings
![Page 8: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/8.jpg)
my spreadsheet
![Page 9: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/9.jpg)
my spreadsheet
![Page 10: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/10.jpg)
![Page 11: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/11.jpg)
?
![Page 12: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/12.jpg)
From Flickr by ransomtech
• Didn’t share the data
• Didn’t document the data (metadata)
• Didn’t document provenance/workflow
![Page 13: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/13.jpg)
From Flickr by ransomtech
No reproducibility, transparency, or reuse
![Page 14: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/14.jpg)
From Flickr by johntrainor
Why should I care?
![Page 15: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/15.jpg)
Because they care:
From Flickr by Redden-McAllister
![Page 16: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/16.jpg)
The Truth
You need to know about:
• Data management
• Metadata
• Data repositories
• Data sharing
From sandierpastures.com
![Page 17: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/17.jpg)
… “Federal agencies investing in research and development (more than $100 million in annual expenditures) must have clear and coordinated policies for increasing public access to research products.”
Feb 2013
![Page 18: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/18.jpg)
1. Maximize free public access
2. Ensure researchers create data management plans
3. Allow costs for data preservation and access in proposal budgets
4. Ensure evaluation of data management plan merits
5. Ensure researchers comply with their data management plans
6. Promote data deposition into public repositories
7. Develop approaches for identification and attribution of datasets
8. Educate folks about data stewardship
From Flickr by Joe Crimmings Photography
![Page 19: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/19.jpg)
Data Management Best Practices
From Flickr by Big Swede Guy
![Page 20: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/20.jpg)
From Flickr by Mark Sardella
Plan before data collection
![Page 21: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/21.jpg)
Planning: Design sample naming scheme
• Create a key (data dictionary)
• Make sure names are unique
• Define codes
From Flickr by zebbie
![Page 22: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/22.jpg)
Planning: Design file naming scheme
PhDcomics.com
![Page 23: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/23.jpg)
Planning: Design file naming scheme
Use descriptive file names
• Unique
• Reflect contents
Bad: Mydata.xls, 2001_data.csv, best version.txt
Better*: Eaffinis_nanaimo_2010_counts.xls
• Eaffinis: study organism
• nanaimo: site name
• 2010: year
• counts: what was measured
*Not for everyone
From R Cook, ESA Best Practices Workshop 2010
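A naming scheme is easier to stick to when a small helper builds the names for you. The sketch below is one possible way to do that in Python, not something from the talk: the function name `build_filename` and the rule of stripping everything except letters and digits are assumptions made for illustration.

```python
import re


def build_filename(organism: str, site: str, year: int, measurement: str, ext: str = "csv") -> str:
    """Join organism, site, year and measurement with underscores and add an extension."""
    parts = [organism, site, str(year), measurement]
    # Assumed rule: strip spaces and punctuation so names stay portable across systems.
    cleaned = [re.sub(r"[^A-Za-z0-9]+", "", part) for part in parts]
    if not all(cleaned):
        raise ValueError("every part of the file name needs at least one letter or digit")
    return "_".join(cleaned) + "." + ext


print(build_filename("Eaffinis", "nanaimo", 2010, "counts", ext="xls"))
# prints: Eaffinis_nanaimo_2010_counts.xls
```

Building names from the same four parts every time keeps them unique and makes them sortable by organism, site, and year.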
![Page 24: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/24.jpg)
Planning: Design file organization
From S. Hampton
![Page 25: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/25.jpg)
Planning: Design file organization
Biodiversity
  Lake
    Experiments
      Biodiv_H20_heatExp_2005to2008.csv
      Biodiv_H20_predatorExp_2001to2003.csv
      …
    Field work
      Biodiv_H20_PlanktonCount_2001toActive.csv
      Biodiv_H20_ChlAprofiles_2003.csv
      …
  Grassland
From S. Hampton
Consider…
• Dependencies?
• File formats?
• Time of collection?
• Order of analysis?
Workflows!
![Page 26: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/26.jpg)
Planning: Design your spreadsheet
• Constrain entries
• Atomize
• Break down spreadsheets
From Flickr by Ulleskelf
![Page 27: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/27.jpg)
Planning: Consider a database
A relational database is
• A set of tables
• Relationships among the tables
• A language to specify & query the tables
An RDB provides
• Scalability: millions+ records
• Features for sub-setting, querying, sorting
• Reduced redundancy & entry errors
From Mark Schildhauer
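To make "tables, relationships, and a query language" concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table names, column names, and values are all hypothetical; they only illustrate how a site table and a data table relate through a shared key.

```python
import sqlite3

# In-memory database so the example leaves no files behind.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce the site_id link
conn.executescript("""
CREATE TABLE site (
    site_id   TEXT PRIMARY KEY,
    site_name TEXT NOT NULL
);
CREATE TABLE plankton_count (
    sample_id TEXT PRIMARY KEY,
    site_id   TEXT NOT NULL REFERENCES site(site_id),
    year      INTEGER NOT NULL,
    n_counted INTEGER NOT NULL CHECK (n_counted >= 0)
);
""")

# Hypothetical records: site names are stored once, not retyped on every row.
conn.executemany("INSERT INTO site VALUES (?, ?)",
                 [("S1", "North Basin"), ("S2", "South Basin")])
conn.executemany("INSERT INTO plankton_count VALUES (?, ?, ?, ?)",
                 [("A001", "S1", 2010, 42), ("A002", "S1", 2010, 17), ("A003", "S2", 2010, 8)])

# The query language does the sub-setting, joining and sorting.
rows = conn.execute("""
    SELECT s.site_name, SUM(c.n_counted)
    FROM plankton_count AS c
    JOIN site AS s ON s.site_id = c.site_id
    GROUP BY s.site_name
    ORDER BY s.site_name
""").fetchall()
print(rows)
```

The `CHECK` and `REFERENCES` constraints are where the reduced entry errors come from: a negative count or an unknown site is rejected at entry time.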
![Page 28: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/28.jpg)
Planning: Consider a database
You should invest time in learning databases if
• your data sets are large or complex
Consider investing time in learning databases if
• your data are small and humble
• you ever intend to share your data
• you are < 30 years old
From Mark Schildhauer
![Page 29: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/29.jpg)
Planning: Pick a data repository
Store your data in a repository
• Institutional archive
• Discipline/specialty archive
Ask a librarian
Repos of repos: databib.org, re3data.org
From Flickr by torkildr
![Page 30: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/30.jpg)
Planning: Decide on preservation/backup
• What software? What hardware? What personnel?
• How often? Set up reminders!
• Test system
From Flickr by sepa synod
From Flickr by taberandrew
From Flickr by withassociates
![Page 31: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/31.jpg)
Planning: Write a data management plan!
…document that describes what you will do with your data throughout the research project
From Flickr by Barbies Land
![Page 32: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/32.jpg)
Planning: DMP components
• What will be collected
• Methods
• Standards
• Metadata
• Sharing/access
• Long-term storage
But they all have different requirements and express them in different ways
From Flickr by Barbies Land
![Page 33: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/33.jpg)
Planning: dmptool.org
Step-by-step wizard for generating DMP
create | edit | re-use | share
Free & open to community
![Page 34: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/34.jpg)
During Data Collection & Entry
From Flickr by Julia Manzerova
![Page 35: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/35.jpg)
During collection: Keep raw data raw
Realistically:
• Archive .csv version of raw data
• Make a “raw” tab in working data file
• Do all work on other tabs
![Page 36: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/36.jpg)
During collection: Keep raw data raw
Ideally:
• Use scripts to process data
• Save them with data
Raw data as .csv
R script for processing & analysis
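The slide suggests an R script; the same idea can be sketched in Python. The example below is only an illustration under assumptions: the file names, the `temperature_c` column, and the cleaning rule (drop rows with a missing temperature) are all hypothetical. The point is that the script only reads the raw file and writes its results somewhere else.

```python
import csv
import tempfile
from pathlib import Path


def process_raw(raw_path: Path, out_path: Path) -> int:
    """Read the raw file without changing it and write cleaned rows to a separate file."""
    if raw_path.resolve() == out_path.resolve():
        raise ValueError("refusing to overwrite the raw data file")
    kept = 0
    with raw_path.open(newline="") as src, out_path.open("w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # Hypothetical cleaning rule: skip rows with a missing temperature.
            if row["temperature_c"].strip() == "":
                continue
            writer.writerow(row)
            kept += 1
    return kept


# Demo with a throwaway directory and made-up values.
work_dir = Path(tempfile.mkdtemp())
raw_file = work_dir / "ctd_raw.csv"
raw_file.write_text("station,temperature_c\nA,12.1\nB,\nC,11.8\n")
raw_before = raw_file.read_text()

kept_rows = process_raw(raw_file, work_dir / "ctd_clean.csv")
print(kept_rows, "rows kept; raw file unchanged:", raw_file.read_text() == raw_before)
```

Saving this script next to the data means the cleaned file can always be regenerated from the raw one.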
![Page 37: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/37.jpg)
During collection: Document your workflow
Workflow: how you get from the raw data to the final products of your research
Simple workflow: flow chart
Temperature data + Salinity data → Data import into Excel → Data in spreadsheet → Quality control & data cleaning → “Clean” T & S data → Analysis: mean, SD → Summary statistics → Graph production
![Page 38: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/38.jpg)
During collection: Document your workflow
Workflow: how you get from the raw data to the final products of your research
Simple workflow: commented script
• R, SAS, MATLAB…
• Well-documented code is
  • Easier to review
  • Easier to share
  • Easier to use for repeat analysis
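As a rough idea of what a commented script looks like, here is a short Python sketch of a temperature and salinity workflow (quality control, then mean and SD). The readings and the plausible ranges are made up for illustration; a real script would read them from the raw data file and use ranges appropriate to the instrument.

```python
import statistics

# Step 1: raw readings (hypothetical values). None marks a missing measurement.
raw = {
    "temperature": [12.1, 11.8, None, 12.4, 99.9],  # degrees C
    "salinity": [31.2, 31.5, 31.1, None, 31.4],
}

# Step 2: quality control. Drop missing values and values outside an
# assumed plausible range for each parameter.
plausible_range = {"temperature": (-2.0, 40.0), "salinity": (0.0, 42.0)}


def clean(values, low, high):
    """Keep only readings that are present and inside [low, high]."""
    return [v for v in values if v is not None and low <= v <= high]


cleaned = {name: clean(values, *plausible_range[name]) for name, values in raw.items()}

# Step 3: summary statistics (mean and sample standard deviation).
summary = {
    name: (statistics.mean(values), statistics.stdev(values))
    for name, values in cleaned.items()
}

# Step 4: report. A plotting step would follow here in a fuller script.
for name, (mean, sd) in summary.items():
    print(f"{name}: mean={mean:.2f}, sd={sd:.2f}, n={len(cleaned[name])}")
```

Each numbered comment corresponds to one step of the workflow, so a reviewer can follow the analysis without running it.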
![Page 39: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/39.jpg)
During collection: Document your workflow
Fancy schmancy workflows
Resulting output
https://kepler-project.org
![Page 40: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/40.jpg)
During collection: Document your workflow
Workflows enable
• Reproducibility
• Transparency
• Reuse
From Flickr by merlinprincesse
![Page 41: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/41.jpg)
During collection: Constrain data entries
• Excel lists
• Data validation
• Google docs forms
Modified from K. Vanderbilt
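The same constraint idea works outside a spreadsheet. The Python sketch below checks each entered row against lists of allowed values, much like a drop-down list would. The column names and allowed values are hypothetical.

```python
# Hypothetical controlled vocabularies, one per constrained column.
ALLOWED = {
    "site": {"nanaimo", "ladysmith"},
    "species": {"Eaffinis", "Atonsa"},
}


def validate_row(row: dict) -> list:
    """Return a list of problems found in one data-entry row (empty if none)."""
    problems = []
    for column, allowed in ALLOWED.items():
        if row.get(column) not in allowed:
            problems.append(f"{column}: {row.get(column)!r} is not one of {sorted(allowed)}")
    try:
        if int(row["count"]) < 0:
            problems.append("count: must not be negative")
    except (KeyError, TypeError, ValueError):
        problems.append("count: must be a whole number")
    return problems


print(validate_row({"site": "nanaimo", "species": "Eaffinis", "count": "12"}))
print(validate_row({"site": "Nanaimo ", "species": "Eaffinis", "count": "twelve"}))
```

The second row fails twice: the site has different capitalization plus a trailing space, and the count is spelled out. These are exactly the slips a constrained entry form prevents.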
![Page 42: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/42.jpg)
During collection: Atomize
One piece of information per cell
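For example, a cell holding "12.5 C" packs two pieces of information, a value and a unit. A minimal Python sketch of splitting such a cell is shown below; the function name and the accepted unit characters are assumptions for illustration.

```python
import re


def atomize_measurement(cell: str) -> dict:
    """Split a combined cell such as '12.5 C' into a numeric value and a unit."""
    match = re.fullmatch(r"\s*(-?\d+(?:\.\d+)?)\s*([A-Za-z%/]+)\s*", cell)
    if match is None:
        raise ValueError(f"cannot split {cell!r} into a value and a unit")
    return {"value": float(match.group(1)), "unit": match.group(2)}


print(atomize_measurement("12.5 C"))
print(atomize_measurement("31.2psu"))
```

Once value and unit live in separate columns, the values can be sorted, averaged, and checked without any text parsing.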
![Page 43: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/43.jpg)
During collection: Break down spreadsheets
Fake a relational database
• Create a site table
• Create parameter table
From doi:10.3334/ORNLDAAC/777
From R Cook, ESA Best Practices Workshop 2010
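Here is a minimal Python sketch of how a separate site table gets linked back to the measurements through a shared key. The tables are inlined as text so the example is self-contained; in practice each would be its own .csv file or spreadsheet tab. All IDs, names, and coordinates are made up.

```python
import csv
import io

# Hypothetical site table: one row per site, described once.
SITE_TABLE = """site_id,site_name,latitude,longitude
S1,North Basin,41.52,-70.67
S2,South Basin,41.50,-70.65
"""

# Hypothetical data table: each measurement refers to a site only by its ID.
DATA_TABLE = """sample_id,site_id,temperature_c
A001,S1,12.1
A002,S2,11.8
A003,S1,12.4
"""

# Index the site table by its key column.
sites = {row["site_id"]: row for row in csv.DictReader(io.StringIO(SITE_TABLE))}

joined = []
for row in csv.DictReader(io.StringIO(DATA_TABLE)):
    site = sites[row["site_id"]]  # a KeyError here flags a measurement with an unknown site
    joined.append({**row, "site_name": site["site_name"]})

for row in joined:
    print(row["sample_id"], row["site_name"], row["temperature_c"])
```

Because site details live in one place, correcting a coordinate or a site name is a one-row change instead of a search through every data row.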
![Page 44: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/44.jpg)
During collection: Create metadata
Why are you promoting Excel?
![Page 45: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/45.jpg)
During collection: Create metadata
Metadata: data reporting
• WHO created the data?
• WHAT is the content of the data set?
• WHEN was it created?
• WHERE was it collected?
• HOW was it developed?
• WHY was it developed?
From Flickr by /\/\ichael Patric|{
![Page 46: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/46.jpg)
During collection: Create metadata
Digital context
• Name of the data set
• The name(s) of the data file(s) in the data set
• Date the data set was last modified
• Example data file records for each data type file
• Pertinent companion files
• List of related or ancillary data sets
• Software (including version number) used to prepare/read the data set
• Data processing that was performed
Personnel & stakeholders
• Who collected
• Who to contact with questions
• Funders
Scientific context
• Scientific reason why the data were collected
• What data were collected
• What instruments (including model & serial number) were used
• Environmental conditions during collection
• Temporal & spatial resolution
• Standards or calibrations used
Information about parameters
• How each was measured or produced
• Units of measure
• Format used in the data set
• Precision & accuracy if known
Information about data
• Definitions of codes used
• Quality assurance & control measures
• Known problems that limit data use (e.g. uncertainty, sampling problems)
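Even before adopting a formal standard, the who/what/when/where/how/why questions can be captured in a small machine-readable record kept next to the data file. The Python sketch below is one informal way to do that; it does not follow EML, FGDC, or any other standard, and every field value is a placeholder.

```python
import json

# The six questions every metadata record should answer.
REQUIRED = ("who", "what", "when", "where", "how", "why")


def build_metadata(**fields) -> str:
    """Return a JSON metadata record, refusing to build one with unanswered questions."""
    missing = [key for key in REQUIRED if not fields.get(key)]
    if missing:
        raise ValueError(f"metadata is missing: {', '.join(missing)}")
    return json.dumps(fields, indent=2, sort_keys=True)


# Placeholder values for illustration only.
record = build_metadata(
    who="J. Student (contact: advisor's lab)",
    what="Zooplankton counts, one row per sample",
    when="2010 field season",
    where="Two stations, coordinates listed in the site table",
    how="Net tows, counted under a dissecting microscope",
    why="Thesis chapter on seasonal abundance",
    units={"count": "individuals per sample"},
)
print(record)
```

Saving the output as a text file alongside the data (for example with the same base name as the data file) keeps the description from getting separated from what it describes.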
![Page 47: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/47.jpg)
During collection: Create metadata
What is metadata?
Metadata standards…
• Provide structure to describe data: common terms | definitions | language | structure
• Come in many flavors: EML, FGDC, ISO19115, DarwinCore,…
• Can be met using software tools: Morpho (EML), Metavist (FGDC), NOAA MERMaid (CSDGM)
![Page 48: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/48.jpg)
During collection: Back up daily
Original, Near, Far
From Flickr by lippo
From Flickr by see phar
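A backup is only useful if the copies match the original. The Python sketch below copies a file to a "near" and a "far" location and compares checksums afterwards. The folder names are stand-ins: in real use they would be, say, an external drive and a remote or institutional server, and the script would be run by a daily scheduled task.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256(path: Path) -> str:
    """Checksum used to confirm a copy is byte-for-byte identical."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def back_up(original: Path, destinations: list) -> list:
    """Copy the original into each destination folder and verify every copy."""
    copies = []
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(original, folder / original.name))
        if sha256(copy) != sha256(original):
            raise IOError(f"checksum mismatch for {copy}")
        copies.append(copy)
    return copies


# Demo in a throwaway directory with a made-up data file.
root = Path(tempfile.mkdtemp())
original = root / "ctd_raw.csv"
original.write_text("station,temperature_c\nA,12.1\n")

copies = back_up(original, [root / "near_drive", root / "far_server"])
print([str(c.relative_to(root)) for c in copies])
```

The checksum comparison is a small version of "test system": it catches a failed or truncated copy on the day it happens rather than on the day the backup is needed.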
![Page 49: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/49.jpg)
During collection
Remember that data management plan?
Revisit, review, revise
From Flickr by Barbies Land
![Page 50: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/50.jpg)
During collection
Revisit, review, revise
Schedule a time each week or month
From Flickr by purplemattfish
![Page 51: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/51.jpg)
Toolbox
From Flickr by dipster1
![Page 52: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/52.jpg)
Write a DMP: dmptool.org
Step-by-step wizard for generating DMP
create | edit | re-use | share
Free & open to community
![Page 53: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/53.jpg)
Find a repository
Where should I put my data?
databib.org
![Page 54: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/54.jpg)
Get help
From Flickr by thewmatt
![Page 55: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/55.jpg)
Toolbox: Get help
DCXL blog: dcxl.cdlib.org
![Page 56: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/56.jpg)
From Flickr by North Carolina Digital Heritage Center
From Flickr by Madison Guy
Get help from your library
![Page 58: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/58.jpg)
Make a resolution
• Triage on current projects
• Get advisor, lab mates, collaborators on board
• Do better next time
From Flickr by Andy Graulund
![Page 59: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/59.jpg)
Culture Shift Ahead
From Flickr by twm1340
![Page 60: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/60.jpg)
science source notebook content access data government knowledge
From Flickr by cdsessums
![Page 61: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/61.jpg)
From Flickr by dotpolka
Doing science is a privilege. Data hoarding is science malpractice.
Manage & share your data!
![Page 62: Coping with Data for WHOI JP Students](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c633ed4a7959991a8b4676/html5/thumbnails/62.jpg)
Website: carlystrasser.net
Email: [email protected]
Twitter: @carlystrasser
Slides: slideshare.net/carlystrasser