rdap14: an analysis and characterization of dmps in nsf proposals from the university of illinois

An Analysis and Characterization of DMPs in NSF Proposals from the University of Illinois. RDAP14 Research Data Access & Preservation Summit, March 26, 2014. William H. Mischo, Mary C. Schlembach, Megan A. O’Donnell. University of Illinois at Urbana-Champaign; Iowa State University.

Upload: asist

Post on 27-Jan-2015


DESCRIPTION

Research Data Access and Preservation Summit, 2014 San Diego, CA March 26-28, 2014 Lightning Talks William Mischo, University of Illinois at Urbana-Champaign

TRANSCRIPT

Page 1: RDAP14: An analysis and characterization of DMPs in NSF proposals from the University of Illinois

An Analysis and Characterization of DMPs in NSF Proposals from the University of Illinois

RDAP14 Research Data Access & Preservation Summit, March 26, 2014

William H. Mischo, Mary C. Schlembach, Megan A. O’Donnell

University of Illinois at Urbana-Champaign; Iowa State University

Page 2: RDAP14: An analysis and characterization of DMPs in NSF proposals from the University of Illinois

NSF Data Management Plans

• Data Management Plans (DMPs): required element in NSF proposals since January 2011

• July 2011: the Library, working with the campus Office of Sponsored Programs and Research Administration (OSPRA), began an analysis of DMPs in submitted NSF grant proposals

• To date, 1,600 grants have been examined, with 1,260 included in the analysis

Page 3: RDAP14: An analysis and characterization of DMPs in NSF proposals from the University of Illinois

Reasons for DMPs

• Make key research data available and sharable

• Allow the use of data for verification of results and reproducibility of research work

• Agency can show significant return on investment to justify funding

• We want to know storage venues and mechanisms for sharing and reuse

• We also want to know about the use of local templates and local campus resources such as IDEALS

Page 4: RDAP14: An analysis and characterization of DMPs in NSF proposals from the University of Illinois

Follow-on

• Develop campus-wide infrastructure (Research Data Service, RDS) to support UIUC researchers in managing their data

• Assist in compliance with federal agencies

• Develop important partnerships with campus units (CITES, NCSA, Colleges) and national entities

• Develop best practices and standard approaches

Page 5: RDAP14: An analysis and characterization of DMPs in NSF proposals from the University of Illinois

Analysis

• Analysis attempts to characterize and classify DMPs into categories

• A DMP can be assigned multiple categories

• 1,260 DMPs from July 2011 to November 2013

Page 6: RDAP14: An analysis and characterization of DMPs in NSF proposals from the University of Illinois

Categories

• PI Server – Servers and workstations that the PIs (and their students/staff) use to store project data. Examples: laboratory server, external hard drive, and group computer.

• PI Website – Websites edited or administered by the PI or a group they belong to. If a departmental URL was given, the DMP was also given the term “department.” Examples: lab website, project website, wiki, PI’s website.

Page 7: RDAP14: An analysis and characterization of DMPs in NSF proposals from the University of Illinois

Categories

• Campus – Services located at, operated by, or endorsed by UIUC. This includes IDEALS, netfiles and Box.net, NCSA, and Beckman.

• Department – Used when a department was specifically mentioned as providing a storage or hosting resource. Examples: departmental website, departmental server, departmental backup service, or a web address traced back to an academic department. DMPs in this category were also given the “campus” label.
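The labeling rule described here (a DMP can carry several categories, and a “department” label always implies a “campus” label) can be sketched in a few lines. This is only an illustration: the function names and the sample DMPs are hypothetical and are not taken from the study's actual coding procedure.

```python
from collections import Counter


def normalize_labels(labels):
    """Return the label set with "Campus" added whenever "Department" is present."""
    result = set(labels)
    if "Department" in result:
        result.add("Campus")
    return result


def tally(dmps):
    """Count how many DMPs carry each label after normalization."""
    counts = Counter()
    for labels in dmps:
        counts.update(normalize_labels(labels))
    return counts


# Hypothetical sample: three DMPs, each with one or more labels.
sample_dmps = [
    {"PI Server", "Department"},
    {"Campus", "Disciplinary"},
    {"PI Website"},
]
category_counts = tally(sample_dmps)
```

Because each DMP can contribute to several categories, the category counts can sum to more than the number of DMPs.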

Page 8: RDAP14: An analysis and characterization of DMPs in NSF proposals from the University of Illinois

Categories

• Remote – Services and sites not located on the UIUC campus. Examples: NASA, other campuses, collaborative projects, non-UIUC institutes

• Disciplinary – Disciplinary repositories. Many are open access but not all. Examples: GenBank, arXiv, ICPSR, SEAD, Nanohub, and Dryad

• Cloud – Storage services using cloud technology. Examples: Google Documents, Google Code, Box.net, Amazon, Microsoft, Dropbox

Page 9: RDAP14: An analysis and characterization of DMPs in NSF proposals from the University of Illinois

Categories

• Publication – Scholarly outputs including journal articles, workshops, and conference presentations or posters. Very few DMPs were explicit as to how their “publications” and data were related or separated.

• Analog – Physical records including lab notebooks, photographs, and files. Does not include specimens or artifacts.

• Specimens – Physical specimens, usually biological specimens or artifacts

Page 10: RDAP14: An analysis and characterization of DMPs in NSF proposals from the University of Illinois

Categories

• Optical Disc – DVD, CD, and Blu-ray discs. Often used as a backup mechanism.

• Not Specified – The DMP was not specific enough for us to record details.

• No Data – Indicated the proposal will produce no data products. Many were theoretical studies (math), travel grants, or workshop planning sessions.

• Local Template Used

Page 11: RDAP14: An analysis and characterization of DMPs in NSF proposals from the University of Illinois

All DMPs (including “no data”)

n = 1260

Category        Number   Percent
PI Server       503      39.9%
PI Website      529      41.9%
Campus          667      52.9%
Department      142      11.2%
Remote          353      28.0%
Disciplinary    275      21.8%
Publication     556      44.1%
Cloud           63       5.0%
Optical Disc    56       4.0%
Analog          131      10.4%
Specimens       111      8.8%
Not Specified   66       5.2%
Collaborative   164      13.0%
No Data         103      8.2%

Page 12: RDAP14: An analysis and characterization of DMPs in NSF proposals from the University of Illinois

Data Venue and Risk

Proposals since July 2011. Risk = risk of loss, corruption, or breach.

Data Location                       Submitted (n = 1260)   Risk             Funded (n = 298)   Risk
PI Server/Website                   64%                    High             61%                High
Departmental Server/Website         11.2%                  Medium to High   7%                 Medium to High
Campus-Wide Resource                52.9%                  Low              45%                Low
  IDEALS Institutional Repository   21.9%                  Low              19.8%              Low
  NCSA                              4.3%                   Low              16.4%              Low
Disciplinary Repository/Cloud       25.8%                  Medium to Low    21.4%              Medium to Low
Remote Repository                   28%                    Medium to High   22.8%              Medium to High
Optical Disk, Specimens, Analog     19.4%                  Out of Scope     11%                Out of Scope

Page 13: RDAP14: An analysis and characterization of DMPs in NSF proposals from the University of Illinois

Notables

• Funded: 298

• Used locally developed template: 254

• IDEALS: 275

• NCSA/XSEDE: 55

• Dryad: 22

• ICPSR: 17

• GenBank/Genetics Repository: 55

• ArX: 61

• Only 87 DMPs contained information about file types

Page 14: RDAP14: An analysis and characterization of DMPs in NSF proposals from the University of Illinois

Analysis

• Any differences in storage venue or technologies between the unfunded proposals and the funded proposals?

• Any differences between the proposals from the first year and the more current proposals?

• Can look at differences in any of the proposal categories between funded and unfunded

• 734 active NSF awards, $861.8 million

Page 15: RDAP14: An analysis and characterization of DMPs in NSF proposals from the University of Illinois

Analysis

• Use of IDEALS institutional repository: 62 funded, 197 not funded; chi-square: 0.17

• Storing data on PI server or website: 183 funded, 569 not funded; chi-square: 0.7

• Disciplinary or Cloud: 67 funded, 241 not funded; chi-square: 0.85

• Remote storage: 68 funded, 267 not funded; chi-square: 3.01
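For readers who want to run this kind of funded/unfunded comparison themselves, the following is a minimal sketch of a Pearson chi-square test on a 2x2 table. It assumes the totals stated elsewhere in the slides (298 funded proposals out of 1,260) and applies no continuity correction. The slides do not say which totals or which variant of the test the authors used, so the numbers this produces need not match the reported statistics. The function names are hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


def venue_table(funded_using, unfunded_using, funded_total=298, all_total=1260):
    """Build the 2x2 cells: (funded using, unfunded using, funded not using, unfunded not using).

    The default totals are an assumption taken from the slide figures.
    """
    unfunded_total = all_total - funded_total
    return (
        funded_using,
        unfunded_using,
        funded_total - funded_using,
        unfunded_total - unfunded_using,
    )


# Example: IDEALS use, 62 funded and 197 not funded (counts from the slide).
ideals_statistic = chi_square_2x2(*venue_table(62, 197))
```

With one degree of freedom, a statistic below 3.841 is not significant at the .05 level.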

Page 16: RDAP14: An analysis and characterization of DMPs in NSF proposals from the University of Illinois

Analysis

• Use of IDEALS before August 2012 = 108, after (through November 2013) = 166; chi-square: 4.59, p < .05

• Use of disciplinary or Cloud before August 2012 = 121, after = 182; chi-square: 4.33, p < .05
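The significance claims can be checked against the standard chi-square critical value. The sketch below assumes each test has one degree of freedom (a 2x2 comparison), which the slides do not state explicitly. The statistics are the ones reported on the slides; the dictionary keys are labels chosen here for illustration.

```python
# Critical value of the chi-square distribution for df = 1 at alpha = .05.
CRITICAL_DF1_ALPHA_05 = 3.841

# Statistics as reported on the slides (not recomputed here).
reported = {
    "IDEALS, funded vs. not funded": 0.17,
    "PI server or website, funded vs. not funded": 0.7,
    "Disciplinary or Cloud, funded vs. not funded": 0.85,
    "Remote storage, funded vs. not funded": 3.01,
    "IDEALS, before vs. after August 2012": 4.59,
    "Disciplinary or Cloud, before vs. after August 2012": 4.33,
}


def is_significant(statistic, critical=CRITICAL_DF1_ALPHA_05):
    """Return True when the statistic exceeds the critical value."""
    return statistic > critical


significant = {name: is_significant(value) for name, value in reported.items()}
```

Under that assumption, only the two before/after comparisons exceed the critical value, which agrees with the p < .05 annotations.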

Page 17: RDAP14: An analysis and characterization of DMPs in NSF proposals from the University of Illinois

Implications

• Conclusions: (1) no significant differences between funded and unfunded proposals in storage venues, so no apparent funding advantage in using IDEALS or disciplinary repositories; (2) more recent proposals include IDEALS and disciplinary repositories at a significantly higher rate

• What is the role of the library? The campus? The subject discipline?

• Connecting data to the literature is important