Developing a resource discovery proposition for scientific datasets at the British Library EBLIP6 30-06-11 Rachael Kotarski, Content Specialist – Datasets [email protected] Elizabeth Newbold, Content and Collections Leader [email protected]



British Library, Science and Technology: Collections and Content

Strengths of the collection:

• All aspects of science, technology and medicine – including a strong focus on industry and applications of science

• All material is of a high technical standard and ‘research relevance’ is a key factor in selecting material

• International in coverage and scope; material acquired from all of the major STM publishers

• Print monographs and serials - Extensive journal collections including trade magazines and newsletters

• Grey literature (conference proceedings, reports, theses, official publications); Patents and Maps

• National library of the United Kingdom

• Origins of the collections in science as a distinct resource for scientists and engineers date from 1850 and the Patent Office Library

• Science is an integral part of the British Library’s remit

• Serves business & industry, researchers, academics and students through dedicated reading rooms in London and our document supply services based in Boston Spa


Why data?

• Data are a vital part of the scientific record.

• Growing number of mandates and requirements from funders and publishers to make data available:

• In the UK: RCUK funders, Wellcome, CRUK

• Internationally: e.g. Genome Canada, NIH, NSF, DFG, INSERM

• But researchers in areas where this is a new requirement need advice, support and the appropriate tools and resources to ensure they can share, find and reuse data

• But what is/should be/will be the role of libraries in this changing landscape?

• Data as a format is very different from traditional library content, so are libraries equipped with the knowledge, technology and capacity to deal with it?

• How can libraries prepare for this?

We needed to look at the landscape of data and the services that the Library could provide to investigate our potential role further.

What do we mean by data?

By research datasets, we mean scientific information generated by experiments, observation or computation, which forms an evidence base for the work of researchers. That information may be stored in any digital form, including text, numbers, images, video, audio, software, algorithms and models.

Background and timeline

PHASE 1

• Late 2007: Consultancy reports; STM Strategy

• 2008: Content strategy; Dataset content specialist in post

• 2009: Scoping; DataCite metadata working group; Assess suitable Library systems

PHASE 2

• 2010: Low-key data discovery pilot; Promotion of pilot; Survey

• 2011: Extension of pilot; Expanding subject scope; Analysis


PHASE 1: Scoping of the ‘data’ landscape.

• We commissioned Key Perspectives and RAND to look at the datasets available and assess the kinds of services for data that the British Library would be best placed to provide.

• These were worked into the overall STM strategy.

• The Content Strategy for STM 2008-2011 was devised, with specific reference to datasets.

• Recruitment of an STM Datasets Content Specialist.

PHASE 2: Low-key pilot.

• To test the approach and gauge user interest and need for such a service.

• Analysis to judge sustainability and use.

PHASE 1: Scoping a role for the Library: Consultancy reports and STM Content Strategy

• Key Perspectives (KP) suggested four different approaches, which RAND explored further, fleshing out the options proposed by KP based on ‘supply’ and ‘demand’ characteristics of datasets.

• Both reports highlighted that providing discovery for datasets was an important avenue for the Library to investigate further.


• As a result, the focus on enabling and developing discovery of datasets was worked into the Library’s STM content strategy 2008-2011. In detail, points included:

• Develop and test selection criteria for reference datasets

• Develop relationships with data stakeholders

• Explore the role of Libraries in developing mechanisms to facilitate longer term access and persistence


How to test a discovery proposition?

• A service involving a ‘new’ material type would raise questions about:

• Users

• Selection

• Metadata

• Operational sustainability

• To build the evidence to answer these questions, we:

• incorporated datasets questions in ongoing research for other projects (UKPMC, RIC, Flooding project, PhD focus groups, life science case studies)

• sought out similar user research from the literature

• worked internally to draw out suitable processes and systems

• These would give us theoretical evidence, but to draw concrete conclusions, we needed to pilot a service.

Options for a pilot

Most importantly, we wanted to use the technical solutions that were already available in the Library. Options were drawn up for the shape of a discovery service. These were:

• BL webpage-based discovery: This service would be created and based within the content management system (CMS), Percussion. Similar to CISTI’s Scientific Data Gateway.

• BL Integrated Catalogue: This option would see data resources catalogued into Aleph. The records would then be surfaced via the Integrated Catalogue and Primo. Similar to TIB Catalogue’s inclusion of data.

• Themed Collection Catalogue: This entails a standalone database for discovery as well as storage, administration and editing of discovery metadata. Similar to ViFaBiO.

• Primo-based discovery: This option sees metadata indexed in Primo (from Ex Libris). Metadata can be stored anywhere, provided it can be ‘fed’ to Primo. Similar to Search Oxford Libraries Online.


Phase 2: The pilot


Collecting evidence: Metrics

• In order to measure the success of the pilot, we needed to engage our users. We took a survey approach.

• The survey needed to answer questions of user need, but also to capture users’ thoughts on the shape and direction of the pilot.

• We looked at earlier surveys to phrase questions for comparable results that would still be specific to the pilot.

• We also included profiling questions.

• We also looked at the actual use of the services through views of each record, and SFX click-through data from Search Our Catalogue (SoC) to the resource itself.

• We had to keep in mind that we had only included a limited set of records with limited scope.
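The two usage measures above (views of each record, and click-throughs from the record to the resource) can be combined into a single click-through rate. The following is a minimal sketch in Python, not the Library's actual reporting method: the `RecordUsage` structure, the record identifiers and all counts are hypothetical and invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class RecordUsage:
    """Usage counts for one dataset record (hypothetical structure)."""
    record_id: str        # invented identifier, not a real catalogue ID
    views: int            # times the record page was viewed
    click_throughs: int   # times a viewer followed the link to the dataset


def click_through_rate(usage: list[RecordUsage]) -> float:
    """Share of record views that led on to the resource itself."""
    total_views = sum(u.views for u in usage)
    total_clicks = sum(u.click_throughs for u in usage)
    # Avoid dividing by zero when no records have been viewed yet.
    return total_clicks / total_views if total_views else 0.0


# Invented sample figures, NOT results from the pilot.
sample = [
    RecordUsage("ds-001", views=40, click_throughs=10),
    RecordUsage("ds-002", views=60, click_throughs=15),
]
rate = click_through_rate(sample)
print(f"Click-through rate: {rate:.0%}")
```

Tracking a rate rather than raw counts helps when, as here, only a small set of records is available: the absolute numbers will be low, but the proportion of views that lead on to a dataset can still be followed over time.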


Promotion of the pilot and survey

• Dataset records that were made public in May 2010 would only be discoverable by accident, so we:

• created a webpage explaining the pilot, with Adobe Captivate videos demonstrating how it worked, and example records

• actively promoted it via JISC email lists, British Library newsletters, Facebook, Twitter and user training sessions

• The survey was released in October and promoted using the same methods, with the addition of the SoC homepage.

• Response to the survey was disappointing, possibly due to:

• General lack of use of SoC, which was still in ‘beta’ itself

• Lack of users with a current interest in research datasets

• Limited subject scope, which restricted the number of potentially interested users


So how did the pilot answer our questions? Users / Usage

Do researchers need to find data?

• 9% of respondents said they do not currently need to find data to reuse.

• But 100% of those expect to reuse data in the future.

Where are researchers getting their data?

• Pilot survey: Spread across all sources, but primarily Web searches and Colleagues.

• Our other surveys and case studies showed comparable results, although with a stronger bias towards literature and web searches, and colleagues and collaborators.

What kind of data are they looking for?

• Pilot survey: Non-specific, although respondents would prefer not to need to search again.

• Our other surveys showed a need for a wide variety of data types, including non-digital and supplementary data.

So how did the pilot answer our questions? USERS contd…

Will researchers use the services to find data?

• Our initial usage stats suggest yes.

• Use has remained stable: although the number of records viewed has decreased, the number of views that lead the user on to the dataset has held steady.

So how did the pilot answer our questions? USERS: Comparable usage

How does this compare with their use of other content?

• Compared to other resources accessed from Search Our Catalogue, the ratio of users who go on to access datasets remains high (when factoring for the number of records actually available) DESPITE the restricted subject scope of the records available.
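"Factoring for the number of records actually available" amounts to dividing accesses by collection size before comparing content types. The sketch below shows one way that normalisation could be done in Python; it is an assumption about the approach, not the Library's documented calculation, and the function name, category labels and every figure are hypothetical.

```python
def accesses_per_record(accesses: int, records_available: int) -> float:
    """Accesses normalised by the number of records available."""
    if records_available <= 0:
        raise ValueError("records_available must be positive")
    return accesses / records_available


# Invented figures for illustration only, NOT data from the pilot:
# content type -> (accesses, records available)
collections = {
    "datasets": (30, 100),
    "other content": (3000, 20000),
}

ratios = {
    name: accesses_per_record(accesses, available)
    for name, (accesses, available) in collections.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f} accesses per available record")
```

With these made-up numbers the small collection has far fewer accesses in total yet a higher rate per available record, which is the kind of comparison the slide describes.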

What have we learned?

• There is a role for libraries! Although many concentrate on libraries’ role in storage and preservation, we can quickly and easily start with how we enable discovery of data.

• It is achievable! The first year of our discovery pilot has been successful in demonstrating one of the options for enabling discovery of research data.

• Available library systems can handle the discovery of datasets, but work is needed to ensure staff understand the differences between data and traditional content.

• Many researchers aren’t currently able to define their needs for data, but this will change: we need to remain engaged to maintain understanding of these changing needs.

• You have to get involvement from a lot of people – the pilot involved people from every directorate of the Library.


Future direction

PHASE 3 will involve:

• Assessing sustainability, particularly time requirements of maintenance

• Harvesting and simplifying metadata

• Expanding subject scope for wider engagement

• On-going monitoring of usage

• Re-use in other projects: comparing approach, e.g. for subject portals


Thank You!

Any Questions?

Links and references

Links

Search Our Catalogue (soon to be ‘Explore the British Library’): http://search.bl.uk

STM@BL website: http://www.bl.uk/science

RAND report: http://www.rand.org/pubs/technical_reports/TR567.html

Refs

Sharing research data to improve public health: joint statement of purpose. (2011, January 10). Retrieved from http://www.wellcome.ac.uk/About-us/Policy/Spotlight-issues/Data-sharing/Public-health-and-epidemiology/WTDV030690.htm

Funders’ Data Policies. Retrieved 2011-06-23, from http://www.dcc.ac.uk/resources/policy-and-legal/funders-data-policies

Researchers and Discovery Services: Behaviour, Perceptions and Needs. A study commissioned by the Research Information Network. (November 2006). Research Information Network. Retrieved from http://www.rin.ac.uk/system/files/attachments/Researchers-discovery-services-report.pdf

Patterns of information use and exchange: case studies of researchers in the life sciences. (November 2009). Research Information Network. Retrieved from http://www.rin.ac.uk/system/files/attachments/Patterns_information_use-REPORT_Nov09.pdf

Cyberinfrastructure Vision for 21st Century Discovery. (March 2007). Retrieved from http://www.nsf.gov/pubs/2007/nsf0728/nsf0728.pdf
