Research Data: Opportunities and Challenges for Universities and Information Producers
Online Information Conference 2013, “Research Data Management”, 20 November 2013
Alain Frey, Scientific & Scholarly Research
TODAY’S DISCUSSION
Research Data
• Big Data – the growth of data
• Opportunities and Challenges for Universities
• Response from Thomson Reuters: the Data Citation Index initiative
Sound discovery relies upon solid supporting data.
• In recent years there has been a tremendous acceleration in
the movement toward open sharing of research data – a
great and growing volume of research data is now available.
NATURE | News
Gene data to hit milestone*
With close to one million gene-expression data sets now in publicly accessible
repositories, researchers can identify disease trends without ever having to
enter a laboratory.
This article describes how the publicly available Gene Expression Omnibus research data
repository was used by Stanford investigators to lead them to identify a new drug target for
diabetes.
The investigators explain that the “beauty of analyzing data from multiple experiments” is that
biases should cancel out between data sets, stating that “there is safety in numbers”.
* Nature News, Nature Publishing Group, Jul 18, 2012.
Copyright © 2012, Rights Managed by Nature Publishing Group
RESEARCH DATA: PRICELESS, DIVERSE, DISPARATE
Research Data – Influences of Sharing
The impact of the National Science Foundation mandate
on the research community has been extended to the
library community, as information professionals find a
greater need to serve as trusted consultants in advising
university faculty on data management planning.
RESEARCH DATA: PRICELESS, DIVERSE, DISPARATE
Sound discovery relies upon solid supporting data.
• The value of available data to research communities across
all disciplines is massive given the potential for its re-use.
• However, gaining a clear understanding of what data exists,
and where, is a challenge.
• Research data repositories are numerous, separately
maintained, and varied in their schemes of organization
and search capabilities.
Research Data – Influences on Sharing
– In recent years a significant motivating factor for data sharing
has been the influence of funding organizations.
– In the United States, an event that would impact all aspects
of research data curation and sharing was a 2010 mandate
from the National Science Foundation –
– All funding proposals submitted or due on or after
January 18, 2011, must include a “Data Management Plan”
describing how the proposal will conform to NSF policy on
the dissemination and sharing of research results.
– Globally, other funding organizations have produced similar
mandates.
Scientific data: open access to research results will boost Europe's innovation capacity
• European Commission (July 2012) outlined
measures to improve access to scientific
information produced in Europe.
• Broader and more rapid access to scientific papers
and data will make it easier for researchers and
businesses to build on the findings of publicly funded
research.
• Boost Europe's innovation capacity and give
citizens quicker access to the benefits of scientific
discoveries.
Challenges for Universities to Manage Research Data
Data Management is one of the essential areas of
• Issues addressed at the university level
– Meet funding body requirements
– Ensure research integrity and the ability to repeat research
– Accurate, complete records; authentic and reliable
• Benefits for the university
– Increases university research efficiency and effectiveness
– Saves time and money in the long run
– Prevents unnecessary duplication of research by sharing
– Complies with international standards in research
practices
The Process for Universities
• Data Management Planning
• Documenting Data
• Issues for Data Storage
• Data Security Issues
Data Planning
• Type of Data to be housed in repository
• Audience for the data
• Control of data (deposit by PI, researchers,
university management)
• Sharing requirements – planning
• How long should data be available (impacts
storage and planning)
• Directory naming and conventions
• Strategies for backup
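The planning questions above can be captured in a small record. The following is only a minimal sketch: the `DataPlan` class, its field names, and the `available_until` helper are hypothetical and not part of any funder's or vendor's template. It shows how the "how long should data be available" decision turns into a concrete date that storage planning can work from.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DataPlan:
    """Hypothetical record of the planning decisions listed above."""
    data_type: str            # type of data to be housed in the repository
    audience: str             # who the data is intended for
    depositor: str            # who controls deposit: PI, researcher, or management
    sharing_requirement: str  # e.g. a funder's sharing policy
    retention_years: int      # how long the data should stay available
    naming_convention: str    # directory and file naming rule
    backup_strategy: str      # how and where copies are kept

    def available_until(self, deposited: date) -> date:
        """Date on which the retention period ends."""
        target_year = deposited.year + self.retention_years
        try:
            return deposited.replace(year=target_year)
        except ValueError:
            # Deposited on 29 February and the target year is not a leap year.
            return deposited.replace(year=target_year, day=28)


plan = DataPlan(
    data_type="survey data",
    audience="research community",
    depositor="PI",
    sharing_requirement="funder data management plan",
    retention_years=10,
    naming_convention="project_datatype_date_version",
    backup_strategy="networked drive, managed by IS",
)
print(plan.available_until(date(2013, 11, 20)))
```

Keeping the retention period as a plain number of years is an assumption made for brevity; a real plan may tie retention to project end dates or funder rules.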
Documenting Data
• Impacts retrieval and sharing
• Identify naming conventions at the beginning
– Is the data the result of research protocols, or survey data
where naming is essential for retrieval and use?
• Interoperability
• Administrative – preservation of data and rights
management
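As an illustration of why naming conventions are worth fixing at the beginning, the sketch below builds and parses file names under one made-up convention. The pattern `<project>_<datatype>_<YYYYMMDD>_v<NN>.<ext>` and the function names are assumptions chosen for this example, not a standard recommended in the talk.

```python
import re
from datetime import date

# Hypothetical convention: <project>_<datatype>_<YYYYMMDD>_v<NN>.<ext>
NAME_RE = re.compile(
    r"^(?P<project>[a-z0-9]+)_(?P<datatype>[a-z0-9]+)_"
    r"(?P<date>\d{8})_v(?P<version>\d{2})\.(?P<ext>[a-z0-9]+)$"
)


def build_name(project: str, datatype: str, collected: date,
               version: int, ext: str) -> str:
    """Build a file name and reject anything that breaks the convention."""
    name = (f"{project.lower()}_{datatype.lower()}_"
            f"{collected:%Y%m%d}_v{version:02d}.{ext.lower()}")
    if NAME_RE.match(name) is None:
        raise ValueError(f"name does not follow the convention: {name!r}")
    return name


def parse_name(name: str) -> dict:
    """Recover the descriptive parts from a conforming file name."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"name does not follow the convention: {name!r}")
    return match.groupdict()


name = build_name("Diabetes", "survey", date(2013, 11, 20), 1, "csv")
print(name)
print(parse_name(name))
```

Because every name can be parsed back into its parts, retrieval and sharing tools can rely on the file name alone, which is the point the slide makes about documentation affecting retrieval.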
Data Storage Issues
• Networked drive storage – eases backup
and management by IS
• Available when required
• Storage in a central repository
• Minimizes risks associated with data loss
Research Data – Diverse and Disparate Sources
Today there are many quality repositories
maintained for the purpose of providing access to
research data. Available mechanisms for research
data curation and sharing are proliferating.
However, gaining a clear understanding of what
exists can be a challenge – and repositories are
separately maintained, with varying schemes of
organization and search capabilities.
RESEARCH DATA: MAKING IT DISCOVERABLE, ACCESSIBLE, & CITABLE: The Information Provider’s Response
• The need for a multidisciplinary resource to bring this content
into the mainstream
• There is clearly a need for a single point of access to
quality research data from repositories across
disciplines and across the globe.
• This is the objective of Thomson Reuters’ Data Citation Index
• The Data Citation Index is a resource that resides on the
well-known Web of Knowledge platform alongside gold-standard
resources such as Web of Science and BIOSIS Citation
Index.
• As with all Thomson Reuters resources, quality is extremely
important. Therefore our approach with the Data Citation
Index is to identify, evaluate, and select key repository
content for inclusion.
REPOSITORY EVALUATION & SELECTION
For the first phase of the Data Citation Index we have identified
data repositories that have some of the most relevant, widely
applicable data and prioritized these for early stage inclusion.
• We will closely monitor usage trends and feedback from our
customers to ensure our content strategy aligns with users’
needs.
• For example, regional needs will be monitored to determine
the best way to meet the expectations of researchers
worldwide.
• Though at this time we cannot yet state growth “numbers”, we
have aggressive goals associated with indexing additional
data repositories going forward.
Thomson Reuters Indexing of Research Data Repositories
• TR takes a descriptive metadata feed from the repository
• Repository raw metadata is analyzed by TR
• TR adds metadata
• TR DCI record: data repository, data study, data set, microcitation
DOCUMENT TYPES
The Data Citation Index will include all records within repositories. More than 2
million records are included at launch.
These are organized within four Document Types:
Repository: The resource composed of data studies, data sets and/or
microcitations. Stores, presents, and provides access to the data.
Data Study: Descriptions of studies or experiments with associated data which
have been used in the data study. Includes serial or longitudinal
studies over time.
Data Set: A single or coherent set of data or a data file provided by the
repository, as part of a collection, data study, or experiment.
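The relationship between the document types defined in this section can be sketched as a small parent-child structure. This is an illustrative model only: the `DocumentType` and `Record` names are hypothetical and do not describe the actual Data Citation Index schema, and microcitations are left out because the slides name them without defining them.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DocumentType(Enum):
    REPOSITORY = "Repository"
    DATA_STUDY = "Data Study"
    DATA_SET = "Data Set"
    # "Microcitation" is named in the slides but not defined, so it is omitted.


@dataclass
class Record:
    """Hypothetical index record that points at the record containing it."""
    title: str
    doc_type: DocumentType
    parent: Optional["Record"] = None

    def root(self) -> "Record":
        """Follow parent links up to the top-level record."""
        node = self
        while node.parent is not None:
            node = node.parent
        return node


repo = Record("Example Repository", DocumentType.REPOSITORY)
study = Record("Example Study", DocumentType.DATA_STUDY, parent=repo)
data_set = Record("Example Data Set", DocumentType.DATA_SET, parent=study)

print(data_set.root().title)
```

Here a data set belongs to a data study, which belongs to a repository, so any record can be traced back to the repository that stores the data. A data set provided directly by a repository would simply take the repository as its parent.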
DATA CITATION INDEX
The Data Citation Index will expose important research
data and drive access to it through the Web of Science
platform
• In combination with Web of Science resources that provide
critical coverage of scholarly journals, books, and conference
proceedings, the Data Citation Index works to provide a
comprehensive view of scholarly research, bringing research
data into the same arena as the published literature that it
supports.
DATA CITATION INDEX: HOW IT LOOKS AND WORKS
The full record presents fundamental
information about this data study –
an abstract, data type,
miscellaneous descriptors, and basic
taxonomic data.
Through recommendation of a standard format for citing
research data we hope to impact the research community’s
citing practices – facilitating capture and unification of cites
to research data going forward.
Exposing this Data to the General Research Community
The full record serves as a central point
from which to collect information around
this data study, and link to related
information – such as the articles that have
referenced this Data Study.
Above all though – the Data
Citation Index is about getting
users to research data itself.
Link to the Data Set information
within the repository.
Remaining within the Data
Citation Index, link to all
records associated with this
data study -- or link out directly
to associated data sets.
Information may of course be
printed, e-mailed, or archived
within EndNote Web, EndNote, or
added to one’s ResearcherID
publications list.
Research Data – Opportunities and Challenges
As Research Data continues to grow exponentially, it is critical
to make this source available to the International Research
Community.
As policies change and universities adapt standards for
research storage and sharing of research results, it becomes
more critical for this data to be part of the continuum of
research.
Thank you