Research Life Cycle for GeoData 2014

The Research Data Life Cycle. Carly Strasser, California Digital Library. GeoData, 18 June 2014. Image from Flickr by Velo Steve.

Upload: carly-strasser

Post on 26-Jan-2015

DESCRIPTION

Presentation on challenges for research data management and the data life cycle, for GeoData meeting in Boulder, 18 June 2014.

TRANSCRIPT

Page 1: Research Life Cycle for GeoData 2014

The Research Data Life Cycle

From Flickr by Velo Steve

Carly Strasser California Digital Library

GeoData 18 June 2014

Page 2: Research Life Cycle for GeoData 2014

Why don’t people share data?

Is data management being taught?

Do attitudes about sharing differ among disciplines?

What role can libraries play in data education?

How can we promote storing data in repositories?

What barriers to sharing can we eliminate?

NSF-funded DataNet project, Office of Cyberinfrastructure

Page 3: Research Life Cycle for GeoData 2014
Page 4: Research Life Cycle for GeoData 2014

Enable data sharing

Encourage new incentives

Think about code sharing

Work with libraries, publishers and researchers

Explore new tools to help change system

Build tools

Page 5: Research Life Cycle for GeoData 2014

From Flickr by gsagostinho

Outreach Education

Assistance

You’re doing it wrong!

Page 6: Research Life Cycle for GeoData 2014

Back in the day…

Da Vinci

Curie Newton

classicalschool.blogspot.com

Darwin

Page 7: Research Life Cycle for GeoData 2014

Research has changed

Better

Page 8: Research Life Cycle for GeoData 2014

From wikimedia

Such Internet!

So many tools!

From Flickr by John Jobby

So much data!

Page 9: Research Life Cycle for GeoData 2014

Research has changed

Worse

Page 10: Research Life Cycle for GeoData 2014

Digital data

From Flickr by Flickmor

From Flickr by US Army Environmental Command

From Flickr by DW0825

C. Strasser

Courtesy of WHOI

From Flickr by deltaMike

Page 11: Research Life Cycle for GeoData 2014

Digital data +

Complex workflows

Page 12: Research Life Cycle for GeoData 2014

From Flickr by ~Minnea~

Reproducibility Data management

Documentation

Page 13: Research Life Cycle for GeoData 2014

“Reproducibility Crisis”

“Digital Dark Age”

“Erosion of Trust”

Page 14: Research Life Cycle for GeoData 2014

“I own my data and you can’t have it.”

“Let me do my work.”

“I’m already too busy.”

“This takes away from research time.”

Page 15: Research Life Cycle for GeoData 2014

h/t Ted Hart, NEON

Page 16: Research Life Cycle for GeoData 2014

Data can’t be owned.

You can be the Guardian, Steward, Caretaker

Page 17: Research Life Cycle for GeoData 2014

Plan

Collect

Assure

Describe

Preserve

Discover

Integrate

Analyze

The Data Life Cycle

Page 18: Research Life Cycle for GeoData 2014
Page 19: Research Life Cycle for GeoData 2014

Discussion topics End game

Stakeholders & responsibilities Compliance

Costs Follow-up

Peer review Concrete steps

Page 20: Research Life Cycle for GeoData 2014

Liz Lyon: Dealing with Data 2008

UK funder expectations 2009

2009-10

DMPs: A Short History

Page 21: Research Life Cycle for GeoData 2014

Federal Funding Accountability and Transparency Act 2006

Across the Pond…

2010, 2010–present

DMPs: A Short History

Page 22: Research Life Cycle for GeoData 2014

… “Federal agencies investing in research and development (more than $100 million in annual expenditures) must have clear and coordinated policies for increasing public access to research products.”

Feb 2013

Page 23: Research Life Cycle for GeoData 2014

From Calisphere, courtesy of UC Riverside, California Museum of Photography

What do researchers think?

Page 24: Research Life Cycle for GeoData 2014

They don’t know about policies.

John Kratz, CLIR/DLF Postdoc at CDL

Page 25: Research Life Cycle for GeoData 2014

They aren’t taught data management.

•  Quality control and quality assurance
•  The proper way to name computer files
•  Types of files and software to use
•  Metadata generation
•  Workflows
•  Protecting data
•  Databases and data archiving
•  Data re-use
•  Meta-analysis
•  Data sharing
•  Reproducibility
•  Notebook protocols (lab or field)

Strasser & Hampton 2013. “Undergraduates & Ecological Data Management Training in the US”. DOI:10.1890/ES12-00139.1

Page 26: Research Life Cycle for GeoData 2014

[Figure: bar chart titled “In Curriculum?”, axis from 0 to 70, series labeled BAS and RU]

They aren’t taught data management.

Page 27: Research Life Cycle for GeoData 2014

“No one reads it anyway.”

“It’s an unfunded mandate.”

“I wrote it the night before.”

They aren’t concerned.

Page 28: Research Life Cycle for GeoData 2014

What does success look like?

DMPs…
•  are flexible
•  are useful and used
•  result in easily discoverable data
•  are linked to open data
•  are created in partnership with institutional service providers
•  are used as a/n (automated) compliance tool
•  are part of the workflow of research
•  include digital and non-digital materials (where relevant)

Page 29: Research Life Cycle for GeoData 2014

“Community-driven” But what if community doesn’t care (yet)?

“Generic, work for everyone” But community-specific standards

Page 30: Research Life Cycle for GeoData 2014

Current DMP tools

From Flickr by mhlradio

Page 31: Research Life Cycle for GeoData 2014

Step-by-step wizard for generating DMP

Create | edit | re-use | share | save | generate

Open to community

DMPonline: dmponline.dcc.ac.uk

Page 32: Research Life Cycle for GeoData 2014

Step-by-step wizard for generating DMP

Create | edit | re-use | share | save | generate

Open to community

DMPTool: dmptool.org

Page 33: Research Life Cycle for GeoData 2014

IEDA Data Management Plan Tool

Page 34: Research Life Cycle for GeoData 2014

dmptool.org

Page 35: Research Life Cycle for GeoData 2014
Page 36: Research Life Cycle for GeoData 2014

We want templates!

Page 37: Research Life Cycle for GeoData 2014

Plan

Collect

Assure

Describe

Preserve

Discover

Integrate

Analyze

The Data Life Cycle

Page 38: Research Life Cycle for GeoData 2014

Scientists are bad at data management.

still <

Page 39: Research Life Cycle for GeoData 2014

From Flickr by iowa_spirit_walker

•  Cost
•  Confusion about standards
•  Lack of training
•  Fear of lost rights or benefits
•  No incentives

Page 40: Research Life Cycle for GeoData 2014
Page 41: Research Life Cycle for GeoData 2014

Data are being recognized as first class products of research

From Flickr by Richard Moross

NSF bio-sketches can include data

Data Publication

Data Citation

Page 42: Research Life Cycle for GeoData 2014
Page 43: Research Life Cycle for GeoData 2014

Journals Funders Peers

From Flickr by Eva Rinaldi Celebrity and Live Music Photographer

Page 44: Research Life Cycle for GeoData 2014

science source notebook content access data government knowledge

From Flickr by cdsessums

Page 45: Research Life Cycle for GeoData 2014

Plan

Collect

Assure

Describe

Preserve

Discover

Integrate

Analyze

The Data Life Cycle

Page 46: Research Life Cycle for GeoData 2014

“Data Publication”

Page 47: Research Life Cycle for GeoData 2014

John Kratz, CLIR Postdoc

Page 48: Research Life Cycle for GeoData 2014

What does “data publication” mean?

Data are
1. Available
2. Citable
3. Trustworthy*

*peer reviewed? certified?

Props to Sarah Callaghan & colleagues

Page 49: Research Life Cycle for GeoData 2014

Available | Citable | Trustworthy

Publish means to “make public”. You should not have to email the author. The data doesn’t have to be open access.

“Email me!” CC-0 on web

Page 50: Research Life Cycle for GeoData 2014

Simple case…

Data citations should be in reference list. Five-element citation: author, year, title, publisher, identifier

Available | Citable | Trustworthy

Boettiger C, Dushoff J, Weitz JS (2009). Data from: Fluctuation domains in adaptive evolution. Theoretical Population Biology. Published in Dryad. doi:10.5061/dryad.j8n0p7vc
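
The five elements listed on this slide can be sketched as a small record type. This is only an illustration: the class name `DataCitation`, its field names, and the output layout are assumptions made here, not a standard. The slide's own example also names the associated journal, which this minimal version leaves out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCitation:
    """Hypothetical container for the five elements named on the slide."""
    authors: str     # author
    year: int        # year
    title: str       # title
    publisher: str   # the repository holding the data
    identifier: str  # a persistent identifier, such as a DOI

    def render(self) -> str:
        # The layout is an assumption; repositories and journals set their own styles.
        return (
            f"{self.authors} ({self.year}). {self.title}. "
            f"{self.publisher}. {self.identifier}"
        )


# Values taken from the example citation on the slide.
citation = DataCitation(
    authors="Boettiger C, Dushoff J, Weitz JS",
    year=2009,
    title="Data from: Fluctuation domains in adaptive evolution",
    publisher="Dryad",
    identifier="doi:10.5061/dryad.j8n0p7vc",
)
print(citation.render())
```

Keeping the elements as separate fields, rather than one free-text string, would make it easier to place the citation in a reference list in whatever style a journal requires.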

Page 51: Research Life Cycle for GeoData 2014

More complicated…

Deep data citation: what if you want to cite a subset? Dynamic data: how to create a reliable citation when a dataset is changing?

Available | Citable | Trustworthy

Page 52: Research Life Cycle for GeoData 2014

Technical VS. Scientific

Sometimes consider impact and/or novelty

Guidelines provided

Available | Citable | Trustworthy

From Flickr by Percival Lowell

Page 53: Research Life Cycle for GeoData 2014

1.  Data as supplemental material

Data published alongside a traditional journal article. Available + citable. Review varies. Potential issues with long-term availability.

What does a data publication look like?

From Flickr by subsetsum

Page 54: Research Life Cycle for GeoData 2014

2.  Data paper: Data + descriptive “data paper”

Most require data be in a trusted repository. All have a component of peer review.

Examples:
•  Standalone journals: Nature Scientific Data, Geoscience Data Journal, Ecological Archives
•  Journals that publish data papers: GigaScience, F1000 Research, Internet Archaeology

What does a data publication look like?

From Flickr by subsetsum

Page 55: Research Life Cycle for GeoData 2014

3.  Standalone data

Data published without a related journal article. Rich metadata (structured or unstructured).

Examples:
•  Open Context
•  NASA PDS Peer Review Data
•  figshare (but no validation)

What does a data publication look like?

From Flickr by subsetsum

Page 56: Research Life Cycle for GeoData 2014

“Publish”

“Paper”

“Peer review” “Sharing”

“Available”

“Article” “Publication”

Page 57: Research Life Cycle for GeoData 2014

From Flickr by Sandia Labs

C. Strasser

C. Strasser

From Flickr by World Bank Photo Collection

What do researchers think of data publication?

Page 58: Research Life Cycle for GeoData 2014

We have our work cut out for us.

Page 59: Research Life Cycle for GeoData 2014

Okay, I’ll share it. Where do I put it?

Page 60: Research Life Cycle for GeoData 2014

Repositories for data

General content

Non-institutional

Publishers/for-profits

Other

Institutional

Discipline-specific

Repository choices…

Page 61: Research Life Cycle for GeoData 2014

Institutional
•  All data associated with a paper
•  Tells a story
•  Clearinghouse for researcher’s works

Discipline-specific
•  Some of data for a given paper
•  Discoverable
•  Integrated systems
•  Collection policies

Which should a researcher use? Both

Which is more important? Depends

Repository choices…

Page 62: Research Life Cycle for GeoData 2014

Simplify data deposit for UC researchers

Branded for campus

Merritt under the hood

Page 63: Research Life Cycle for GeoData 2014

dash.berkeley.edu

Page 64: Research Life Cycle for GeoData 2014

github.com/cdluc3/dash/wiki

Page 65: Research Life Cycle for GeoData 2014

From Flickr by dotpolka

Hard work Shifting norms Exciting times

Page 66: Research Life Cycle for GeoData 2014

Website: carlystrasser.net
Email: [email protected]
Twitter: @carlystrasser
Slides: slideshare.net/carlystrasser
