data and donuts: the impact of data management

Posted on 16-Apr-2017


The Impact of Data Management

C. Tobin Magle, PhD

Sept. 29, 2016, 9:00-10:00 a.m.

Morgan Library Computer Classroom 173

data management != data sharing

but the same principles apply to both

Why should I care about data management?

Rinehart, A. K. “Getting emotional about data.” College & Research Libraries News, vol. 76, no. 8, September 2015, pp. 437-440.

Everything* is digital

• Needs new skills

• Data are ephemeral

• Facilitates sharing

*OK, not everything, but most things

More researchers

https://www.nsf.gov/statistics/2016/nsf16300/digest/nsf16300.pdf

See arXiv:1402.4578 for details

[Figure: four panels, "Working email", "Response (if email working)", "Status of data (if response)", and "Data are extant (if status known)". Source: doi:10.1016/j.cub.2013.11.014]
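Read in order, the four panels ("Working email", "Response (if email working)", "Status of data (if response)", "Data are extant (if status known)") describe a chain: each step only matters if the previous one succeeded. One way to express that chain (our reading of the panel titles, not a formula quoted from the cited study) is as a product of conditional probabilities:

$$
\begin{aligned}
P(\text{data confirmed extant}) ={}& P(\text{email works}) \cdot P(\text{response} \mid \text{email works}) \\
&\cdot P(\text{status known} \mid \text{response}) \cdot P(\text{extant} \mid \text{status known})
\end{aligned}
$$

Because the factors multiply, losses compound: with purely hypothetical values of 0.8 for each factor, the product is $0.8^4 \approx 0.41$.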

We are losing vast amounts of data


Research funding is tight

http://www.bu.edu/research/articles/funding-for-scientific-research/

Funders want to do more with less

http://figshare.com/blog/2015_The_year_of_open_data_mandates/143

White House’s 2013 OSTP memo

“The Obama Administration is committed to the proposition that citizens deserve easy access to the results of research their tax dollars have paid for. That’s why, in a policy memorandum released today, OSTP Director John Holdren has directed Federal agencies with more than $100M in R&D expenditures to develop plans to make the results of federally funded research freely available to the public—generally within one year of publication.”

http://www.whitehouse.gov/blog/2013/02/22/expanding-public-access-results-federally-funded-research

NSF post-award requirements

“Investigators are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections and other supporting materials created or gathered in the course of work under NSF grants. Grantees are expected to encourage and facilitate such sharing.”

http://www.nsf.gov/pubs/policydocs/pappguide/nsf11001/aag_6.jsp#VID4

In other words…

It’s good for science

• Improves research reproducibility

• Improves efficiency

• Spurs innovation

It’s good for you

• You are the future data user

• Your data get used (and cited)

• Exposure to collaborators

• More competitive grants

But wait…

Barriers to data sharing

“But it’s mine, I don’t want to share!”

• Usually funded by public money
  • See White House statement

• If you work for CSU, the university actually owns your data
  • You are the steward
  • CSU promotes open data

“But my data are too small to be useful”

“But I work with sensitive/private data”

• CAN share deidentified data

• CAN share summary data
  • https://clinicaltrials.gov/

• Controlled access
  • See dbGaP @ NCBI re: NIH genomic data sharing policy

• Release metadata so people know the data exist and can ask for them

• Identifying personal genomes by surname inference
  • https://www.ncbi.nlm.nih.gov/pubmed/23329047
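To make the first two options concrete, here is a minimal Python sketch (standard library only) that drops direct identifier fields from a list of records and, separately, produces per-group summary counts with small counts suppressed. The field names (`name`, `email`, `site`, `outcome`), the helper names, and the small-cell threshold of 5 are all hypothetical choices for illustration. Removing names and emails is not the same as anonymization: the surname-inference study (PubMed 23329047) re-identified people from genomic data, so a real release should follow your IRB and funder requirements.

```python
from collections import Counter

# Hypothetical direct identifiers; a real project would define these with its IRB.
DIRECT_IDENTIFIERS = {"name", "email"}


def drop_direct_identifiers(records):
    """Return copies of the records with direct identifier fields removed."""
    return [
        {key: value for key, value in record.items() if key not in DIRECT_IDENTIFIERS}
        for record in records
    ]


def summarize(records, group_key, min_cell=5):
    """Count records per group, hiding counts smaller than min_cell.

    Small cells can point to individual people, so they are reported as None.
    min_cell=5 is an illustrative threshold, not a value taken from any policy.
    """
    counts = Counter(record[group_key] for record in records)
    return {group: (n if n >= min_cell else None) for group, n in sorted(counts.items())}


if __name__ == "__main__":
    # Made-up example records: six participants at "north", two at "south".
    records = [
        {"name": f"P{i}", "email": f"p{i}@example.org", "site": site, "outcome": i % 2}
        for i, site in enumerate(["north"] * 6 + ["south"] * 2)
    ]
    shareable = drop_direct_identifiers(records)
    print(shareable[0])                  # {'site': 'north', 'outcome': 0}
    print(summarize(shareable, "site"))  # {'north': 6, 'south': None}
```

The functions return new lists and dicts rather than editing the originals, so the full records stay available to the research team while only the reduced versions leave the lab.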

“But I’m planning to apply for a patent!”

• OK, data sharing isn’t right for you

• But good data management practices have benefits even if you don’t share!

• Can share later

What is data management?

The policies, practices and procedures needed to manage the storage, access and preservation of data produced from a research project

Where does data management fit into research?

Throughout the whole research cycle

The research cycle

[Figure built up over several slides. Elements, in order of appearance: Hypothesis; Experimental design; Data; Results; Article; Data Management Plans; Raw data and Tidy Data (replacing Data), with Cleaning and Analysis; Sharing, Open Data, Closed Data, and Archiving; Code and Reproducible Research; Reuse.]
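The raw data, cleaning, and tidy data steps of the cycle can be sketched in a few lines of Python (standard library only). This is a minimal illustration under assumed inputs: a hypothetical wide table with one row per sample and one column per day. "Tidy" here follows the common definition of one row per observation and one column per variable. The script only reads the raw table and derives a new one, so the cleaning can be rerun from scratch.

```python
import csv
import io

# Hypothetical raw export: one row per sample, one column per day (wide layout).
# In practice this would be a read-only file; a string keeps the sketch self-contained.
RAW_CSV = """sample,day1,day2,day3
A,1.2,1.5,
B,0.9,1.1,1.4
"""


def tidy(raw_text):
    """Reshape the wide table into tidy rows: one row per (sample, day) observation.

    Blank or absent cells are treated as missing measurements and skipped. The raw
    text is only read, never modified, so this cleaning step can be rerun any time.
    """
    rows = []
    for record in csv.DictReader(io.StringIO(raw_text)):
        sample = record.pop("sample")
        for column, value in record.items():
            if value is None or value.strip() == "":
                continue  # missing measurement
            rows.append({"sample": sample, "day": int(column[len("day"):]), "value": float(value)})
    return rows


def to_csv(rows):
    """Serialize tidy rows back to CSV text for sharing or archiving."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["sample", "day", "value"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    print(to_csv(tidy(RAW_CSV)), end="")
```

Run as a script, this prints a header line `sample,day,value` followed by five observations (sample A has no day-3 value). Keeping the raw file untouched and doing every cleaning step in code is what lets the tidy table be regenerated later, which is the connection the diagram draws between Code and Reproducible Research.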
