

Two-Stream Model: Toward Data Production for Sharing Field Science Data

Some Implications

Regarding data sharing:

•  Data selection for shareability based on flexibility, usability, and ability to preserve.

•  New divisions of labor needed for systematic, streamlined data production and to maintain focus of scientists on knowledge production.

Conceptualizing The Work of Data Production for Reuse

At present, researchers tend to manage their data to meet the end goal of knowledge production: the publication of results. However, research in the era of e-science needs to treat data as research products for dissemination (Baker and Millerand, 2010). This shift in data practices should be conceptualized not as data publication but as the production of data for “release” for future, unanticipated applications (Parsons and Fox, 2012). In our work with earth scientists and their data in the Site-Based Data Curation Project, we have conceptualized two separate processes for working with data in the course of research. The aim of the two-stream model is to help scientists consider future dissemination goals. The data production stream at the top of the two-stream figure represents an expanded conceptualization of data management. It draws attention to some of the new activities that need to be integrated into the research process for scientists who collect data in the field and conduct laboratory experiments.

Introduction

Current data sharing policies and technological capacity are not enough to realize the vision of data as open, accessible, and reusable resources. While data are central to the production of scientific publications, additional work is required to make field-oriented, earth science data available as stand-alone, functional research products.

Scientific Data Practices

Two Streams

(1) Knowledge Production: data optimized for internal use

Data for internal use are subject to a complex mix of processing, analysis, integration, and presentation strategies with the final form optimized for publication of papers.

(2) Data Production: data optimized for public reuse

Data for public reuse are prepared with a more standardized set of procedures to create well-described, parameter-based sets of data for release to a data repository and for reuse by others.

New Activities Involved in Data Production

•  Data gathering

•  Data description

•  Data packaging

•  Compliance checks with domain dictionaries and with repository requirements

•  Data release
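As an illustration of the compliance-check activity listed above, the following is a minimal sketch in Python and not the project's actual tooling. The dictionary contents, required field names, and the `check_package` function are all hypothetical and were invented for this example; a real domain dictionary and a real repository's requirements would replace them.

```python
# Hypothetical sketch: the vocabularies, field names, and example values
# below are assumptions made for illustration, not taken from the poster.

# Assumed domain dictionary: parameter name -> expected unit.
DOMAIN_DICTIONARY = {
    "temperature": "degC",
    "pH": "pH units",
    "flow_rate": "L/s",
}

# Assumed repository requirements: metadata fields every package must carry.
REQUIRED_FIELDS = ("title", "creator", "site", "collection_date")


def check_package(metadata, parameters):
    """Return a list of human-readable compliance problems (empty if none)."""
    problems = []

    # Repository check: every required field is present and non-empty.
    for field in REQUIRED_FIELDS:
        if not metadata.get(field):
            problems.append(f"missing metadata field: {field}")

    # Domain-dictionary check: each parameter is known and uses the expected unit.
    for name, unit in parameters.items():
        expected = DOMAIN_DICTIONARY.get(name)
        if expected is None:
            problems.append(f"unknown parameter: {name}")
        elif unit != expected:
            problems.append(f"unit mismatch for {name}: {unit} (expected {expected})")

    return problems


# Made-up package that fails both kinds of check.
example_metadata = {
    "title": "Hot spring transect",
    "creator": "Field team",
    "site": "Yellowstone National Park",
}
example_parameters = {"temperature": "degF", "pH": "pH units", "salinity": "PSU"}

report = check_package(example_metadata, example_parameters)
for line in report:
    print(line)
```

Returning a list of problems rather than raising on the first failure lets a data producer fix everything in one pass before release, which fits the feedback loop between compliance checks and data description.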

Acknowledgments

This work is supported by the Institute of Museum and Library Services (IMLS) through funding for two projects: the Site-Based Data Curation Project at Yellowstone National Park (SBDC; IMLS Award# LG-06-12-0706-12) and the Data Curation Education at Research Centers (DCERC; IMLS Award# RE-02-10-0004-10).

[Figure: two-stream model. Activity labels: Data Analysis; Paper Submission; File Naming; Data Organizing; Data Description; Data Packaging; Data Release]

Karen S. Baker¹, Carole L. Palmer¹, Andrea K. Thomer¹, Karen M. Wickett¹, Tim DiLauro³, Abigail Asangba², Bruce W. Fouke², G. Sayeed Choudhury³

¹ Center for Informatics Research in Science & Scholarship, Graduate School of Library & Information Science (GSLIS), University of Illinois at Urbana-Champaign (UIUC)

² Department of Geology, University of Illinois at Urbana-Champaign

³ Sheridan Libraries, Johns Hopkins University

[Figure: two-stream model. Stream labels: (1) Knowledge Production; (2) Data Production. Stage labels: C2. Archive-Based Data Grouping; D2. Repository Submission (digital files). Other labels: Data Gathering; Feed-Back Activities; Compliance Checks]

References

•  Baker, K. S., and F. Millerand (2010). Infrastructuring ecology: Challenges in achieving data sharing. In Collaboration in the New Life Sciences. Ashgate.

•  Palmer, C. L., and M. H. Cragin (2009). Scholarship and disciplinary practices. Annual Review of Information Science and Technology 42(1): 163-212.

•  Parsons, M., and P. Fox (in press). Is Data Publication the Right Metaphor? Data Science Journal.

[Figure: two-stream model. Stage labels: A. Data Collecting; B. Data Processing; C1. Project-Based Data Synthesis; D1. Journal Tables & Figures. Other label: Quality Control Continuing]