Bloomsbury Conference, UCL, London
6.25.10
Fourth Bloomsbury Conference on e-Publishing and e-Publications
Valued Resources: Roles and Responsibilities of Digital Curators and Publishers
Conceptualizing Library Data Curation and Publishing Services at Purdue University
D. Scott Brandt, Assoc Dean for Research
Purdue University Libraries
Charles Watkinson, Director
Purdue University Press
Structure of the Presentation
I. Some Background & Context
II. Exploring Library’s Role in the “Data Deluge”
III. Data Curation Profiles: what we’re learning
IV. What a Publisher can learn from the Profile
“Data curation is the activity of managing and promoting the use of data from the point of creation, to ensure its fitness for contemporary purposes and availability for discovery and reuse.”
Purdue University and Purdue Libraries
• ~38K students, ~1.8K faculty
• Strengths in science, technology, agriculture, & engineering
• 12 subject-oriented Libraries + units
• University press a unit (only 11% of US presses report within Libraries)
Directors of Office of Copyright, Finance, and the University Press
Assoc Dean for Digital Programs and Information Access
Assoc Dean for Planning & Administration
• secondary/tertiary resources
• published research (traditional)
• published research (non-traditional)
• unpublished research (non-traditional)
• published data/datasets
• analyzed data/datasets
• processed data/datasets
• “raw” data/datasets
Modified from: Brandt, D.S. “Scholarly Communication” (in To Stand the Test of Time: Long-Term Stewardship of Digital Data Sets in Science and Engineering: Final Report of Workshop New Collaborative Relationships: Academic Libraries in the Digital Data Universe. ARL, Washington, DC, September 2006.)
Analyzed data might need to be reviewed prior to publication, or in case of questions after publication. Publishers increasingly link to it as “supplementary data.”
Quite often data must be scrubbed/anonymized, or converted to a suitable format, prior to analysis; some disciplines (e.g., astronomy, physics) share this data widely within their communities.
Some raw data are shared readily (e.g., genetics), but quite often raw data are discarded, depending on discipline.
PUL response to “data deluge”
• Investigating research data needs and building relationships with faculty, in order to:
• Design, build, assess prototype infrastructure, tools and services to handle digital data.
• This approach recognizes the discipline-specific nature of faculty needs, though there is a tension between this and the practical requirements of building a sustainable suite of services/digital infrastructure.
Our organization to achieve this vision
Faculty Liaison: subject librarians
disciplinary faculty
Rights Management: University Copyright Office
Publishing: e-Pubs & Press
Data Management: D2C2
1. Investigating Research Data Needs
• Strategy 1: Embedding data scientists in research projects; D2C2 provides this expert consultancy.
• Strategy 2: Creating tools to structure conversations about data; Data Curation Profiles help liaison librarians structure their conversations.
Diagram labels: D2C2, DCP, librarians, researchers
2. Solving Problems and Developing Prototype Tools, Systems, Services
• Developing a Content Organization Framework for Regenstrief Center Healthcare Delivery Hub
• Enabling end-to-end geospatial data modeling workflows via INPort: The Isotope Networks Portal
• Ingest, Preservation and Access for Water Quality Datasets in an Institutional Repository
• Developing a Data Management and Curation Workflow for Camp Calcium
• Integrating Spatial Educational Experiences (ISEE) into Crop, Soil, and Environmental Science Curricula
• INTEROP: Developing Community-based DRought Information Network Protocols and Tools for Multi-disciplinary Regional Scale Applications
• Leveraging Relational Information in the HUBs using Linked Data
• Investigate and Implement Persistence for HUB Resources
• DataCite (founding member)
• Prototype publications linked to data through e-Pubs and Purdue University Press
Life cycle stages (diagram labels): Study Concept & Design; Data Collection; Data Processing; Analysis; Research Outcomes; Data Access & Dissemination
Adapted from: e-Science and the Life Cycle Model of Research, http://datalib.library.ualberta.ca/~humphrey/lifecycle-science060308.doc
Data Curation Profiles
Profiling Data
• Research Data Lifecycle (what’s the story of the data from the producer’s perspective)
• Data Management / Storage
• Disposition of the Data
• Data Dissemination and Sharing
• Data Preservation and Repositories
• Roles for Libraries, Librarians, and Publishers
Sample Profile link
Disposition of the Data
• Willingness / Motivations to share – feelings/reservations/willingness towards sharing
• Access control – need to restrict or control access to/from others
• Target data for sharing – stage in the lifecycle at which the data should be shared
• Value of the data – real or potential value, from their perspective
• Embargo (and reasons why/why not)
What data curators can learn
• Advancing university-based cyberinfrastructure is dependent on our understanding of how to support data practices and needs
• Sharing is at the heart of success: collecting, storing, and making use of data can only come after the means for sharing are in place
• We cannot collect and curate all data, particularly in a way that facilitates effective re-use
– We will need to work with researchers to develop selection and appraisal guidelines, and data services
from: M. Cragin. (2009) “Data Sharing, Small Science, and Institutional Repositories.” UK e-Science All Hands Meeting: Oxford, UK
Data Curation Proliferation
dataconservancy.org
DCP
12 workshops
What publishers can learn
• Researchers want to disseminate outputs, but these range in scope, format, and use
• They are generally willing to share data with others, but not without certain restrictions, or benefits for themselves
• They hold on to their data but do not do much to curate it; what is most easily or willingly shared is not always the data that has the most re-use value
Purdue UP lesson learned 1
“Researchers want to disseminate outputs, but these range in scope, format, and use”
• Print books and subscription-based journals, PUP’s traditional focus, are not enough
• PUP / Libraries need to offer a range of different channels to fit different needs
• PUP / Libraries need a venue to experiment with hybrid or new models
Figure: “A Continuum of Scholarly Content” in the IR (with thanks to J.G. Bankier, Berkeley Electronic Press)
Axes: Scholarly Impact of Content (Low to High); Source of scholarship (Faculty, Admin, Student, Unaffiliated)
Content types plotted: Book, Faculty Journal, Faculty Conference, Datasets/Primary research, Post Print, Pre Print, Dissertation, Masters Thesis, Honor Papers, Graduate Journal, Undergrad Journal, Undergrad Conference, Society Journal, Symposium, Research Finding, Research Reports, Policy Report, Historical Collection, Non-research output, Newsletter, Admin Report, Alumni Magazine, Commencement address, Committee Meetings
Legend: Red stars = Purdue UP? Blue stars = Purdue e-Pubs?
Purdue UP lesson learned 2
“Researchers willing to share data with others, but not without certain restrictions/benefits”
• PUP provides a layer of editorial services for credentialing that can incentivize data sharing
• PUP needs to make it easy to link to and cite data in publications (DataCite so important!)
• PUP / Libraries need to be nuanced in their Open Access messages (OA is not always the right strategy)
Follow in-text URLs to supplementary data
View spreadsheets on-site or download them from your personal computer
Read the full text of the book on your portable device
Purdue UP lesson learned 3
“What is most easily/willingly shared is not always data that has the most re-use value”
• Move away from producing data supplements for publications to producing supplementary publications to drive re-use of data
• Take advantage of being “inside the tent” to have deeper conversations with scholars about which data are most important for reuse
Next Steps
• Spreading the use of DCPs so that we can get a more complete picture of how faculty behavior around data varies
• More clearly defining library-based publishing services, and building relevant skills and tools in Libraries and Press
• Communicating to faculty the full range of library services they have access to, and changing their old views of what Purdue Libraries and Purdue UP “do”
Thank you!
D. Scott Brandt, [email protected]
Charles Watkinson, [email protected]