research life cycle for geodata 2014
DESCRIPTION
Presentation on challenges for research data management and the data life cycle, for GeoData meeting in Boulder, 18 June 2014.
TRANSCRIPT
The Research Data Life Cycle
From Flickr by Velo Steve
Carly Strasser California Digital Library
GeoData 18 June 2014
Why don’t people share data?
Is data management being taught?
Do attitudes about sharing differ among disciplines?
What role can libraries play in data education?
How can we promote storing data in repositories?
What barriers to sharing can we eliminate?
NSF funded DataNet Project Office of Cyberinfrastructure
Enable data sharing
Encourage new incentives
Think about code sharing
Work with libraries, publishers and researchers
Explore new tools to help change system
Build tools
From Flickr by gsagostinho
Outreach Education
Assistance
You’re doing it wrong!
Back in the day…
Da Vinci
Curie Newton
classicalschool.blogspot.com
Darwin
Research has changed
Better
From wikimedia
Such Internet!
So many tools!
From Flickr by John Jobby
So much data!
Research has changed Worse
Digital data
From Flickr by Flickmor
From Flickr by US Army Environmental Command
From Flickr by DW0825
C. Strasser
Courtesy of WHOI
From Flickr by deltaMike
Digital data +
Complex workflows
From Flickr by ~Minnea~
Reproducibility Data management
Documentation
“Reproducibility Crisis”
“Digital Dark Age”
“Erosion of Trust”
“I own my data and you can’t have it.”
“Let me do my work.”
“I’m already too busy.”
“This takes away from research time.”
h/t Ted Hart, NEON
Data can’t be owned.
You can be the Guardian Steward Caretaker
Plan
Collect
Assure
Describe
Preserve
Discover
Integrate
Analyze
The Data Life Cycle
Discussion topics End game
Stakeholders & responsibilities Compliance
Costs Follow-up
Peer review Concrete steps
Liz Lyon: Dealing with Data 2008
UK funder expectations 2009
2009-10
DMPs: A Short History
Federal Funding Accountability and Transparency Act 2006
Across the Pond…
2010 2010 –present
DMPs: A Short History
… “Federal agencies investing in research and development (more than $100 million in annual expenditures) must have clear and coordinated policies for increasing public access to research products.”
Feb 2013
From Calisphere, Courtesy of UC Riverside, California Museum of Photography
What do researchers think?
They don’t know about policies.
John Kratz, CLIR/DLF Postdoc at CDL
They aren’t taught data management.
Quality control and quality assurance
The proper way to name computer files
Types of files and software to use
Metadata generation
Workflows
Protecting data
Databases and data archiving
Data re-use
Meta-analysis
Data sharing
Reproducibility
Notebook protocols (lab or field)
Strasser & Hampton 2013. “Undergraduates & Ecological Data Management Training in the US”. DOI:10.1890/ES12-00139.1
[Chart: “In Curriculum?”; axis 0 to 70; series labels BAS, RU]
They aren’t taught data management.
No one reads it anyway.
It’s an unfunded mandate.
I wrote it the night before.
They aren’t concerned.
What does success look like?
DMPs…
• are flexible
• are useful and used
• result in easily discoverable data
• linked to open data
• are created in partnership with institutional service providers
• are used as a/n (automated) compliance tool
• are part of the workflow of research
• include digital and non-digital materials (where relevant)
“Community-driven” But what if community doesn’t care (yet)?
“Generic, work for everyone” But community-specific standards
Current DMP tools
From Flickr by mhlradio
Step-by-step wizard for generating DMP
Create | edit | re-use | share | save | generate
Open to community
DMPonline: dmponline.dcc.ac.uk
Step-by-step wizard for generating DMP
Create | edit | re-use | share | save | generate
Open to community
DMPTool: dmptool.org
IEDA Data Management Plan Tool
dmptool.org
We want templates!
Plan
Collect
Assure
Describe
Preserve
Discover
Integrate
Analyze
The Data Life Cycle
Scientists are bad at data management.
still <
From Flickr by iowa_spirit_walker
• Cost
• Confusion about standards
• Lack of training
• Fear of lost rights or benefits
• No incentives
Data are being recognized as first class products of research
From Flickr by Richard Moross
NSF bio-sketches can include data
Data Publication
Data Citation
Journals Funders Peers
From Flickr by Eva Rinaldi Celebrity and Live Music Photographer
science source notebook content access data government knowledge
From Flickr by cdsessums
Plan
Collect
Assure
Describe
Preserve
Discover
Integrate
Analyze
The Data Life Cycle
“Data Publication”
John Kratz, CLIR Postdoc
What does “data publication” mean?
Data are
1. Available
2. Citable
3. Trustworthy*
*peer reviewed? certified?
Props to Sarah Callaghan & colleagues
Available | Citable | Trustworthy
Publish means to “make public”. You should not have to email the author. The data doesn’t have to be open access.
“Email me!” CC-0 on web
Simple case…
Data citations should be in reference list. Five-element citation: author, year, title, publisher, identifier
Available | Citable | Trustworthy
Boettiger C, Dushoff J, Weitz JS (2009). Data from: Fluctuation domains in adaptive evolution. Theoretical Population Biology. Published in Dryad. doi:10.5061/dryad.j8n0p7vc
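The five-element citation above can be sketched in code. This is only an illustration: the `DataCitation` class, its field names, and the punctuation layout in `render` are assumptions made for this example, not a format prescribed by the slides or by any repository. The example values are taken from the Dryad citation on the slide, leaving out the related journal name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCitation:
    """Hypothetical container for the five elements: author, year, title, publisher, identifier."""

    authors: str
    year: int
    title: str
    publisher: str
    identifier: str

    def render(self) -> str:
        # Assumed layout: Authors (Year). Title. Publisher. Identifier
        return (
            f"{self.authors} ({self.year}). {self.title}. "
            f"{self.publisher}. {self.identifier}"
        )


# Values copied from the example citation on the slide.
citation = DataCitation(
    authors="Boettiger C, Dushoff J, Weitz JS",
    year=2009,
    title="Data from: Fluctuation domains in adaptive evolution",
    publisher="Dryad",
    identifier="doi:10.5061/dryad.j8n0p7vc",
)

print(citation.render())
```

Keeping the identifier as its own field means a reference list can be checked automatically for entries that lack one.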
More complicated…
Deep data citation: what if you want to cite a subset? Dynamic data: how to create a reliable citation when a dataset is changing?
Available | Citable | Trustworthy
Technical VS. Scientific
Sometimes consider impact and/or novelty
Guidelines provided
Available | Citable | Trustworthy
From Flickr by Percival Lowell
1. Data as supplemental material
Data published alongside a traditional journal article. Available + citable. Review varies. Potential issues with long-term availability.
What does a data publication look like?
From Flickr by subsetsum
2. Data paper: Data + descriptive “data paper”
Most require data be in a trusted repository. All have a component of peer review.
Examples:
• Standalone journals: Nature Scientific Data, Geoscience Data Journal, Ecological Archives
• Journals that publish data papers: GigaScience, F1000 Research, Internet Archaeology
What does a data publication look like?
From Flickr by subsetsum
3. Standalone data
Data published without a related journal article. Rich metadata (structured or unstructured).
Examples:
• Open Context
• NASA PDS Peer Review Data
• figshare (but no validation)
What does a data publication look like?
From Flickr by subsetsum
“Publish”
“Paper”
“Peer review” “Sharing”
“Available”
“Article” “Publication”
From Flickr by Sandia Labs
C. Strasser
C. Strasser
World Bank Photo Collection From Flickr
What do researchers think of data publication?
We have our work cut out for us.
Okay, I’ll share it. Where do I put it?
Repositories for data
General content
Non-institutional
Publishers/for-profits
Other
Institutional
Discipline-specific
Repository choices…
Institutional
• All data associated with a paper
• Tells a story
• Clearinghouse for researcher’s works
Discipline-specific
• Some of the data for a given paper
• Discoverable
• Integrated systems
• Collection policies
? Both
Which should a researcher use?
Which is more important?
Depends
Repository choices…
Simplify data deposit for UC researchers
Branded for campus
Merritt under the hood
dash.berkeley.edu
github.com/cdluc3/dash/wiki
From Flickr by dotpolka
Hard work Shifting norms Exciting times
Website: carlystrasser.net
Email: [email protected]
Twitter: @carlystrasser
Slides: slideshare.net/carlystrasser