British Library Datasets Programme, Feb 2011

British Library Datasets Programme. JISC RSP Winter School, February 2011. Max Wilkinson

Post on 26-Jan-2015

DESCRIPTION

Presentation to JISC Repository Support Programme Winter School, Feb 2011.

TRANSCRIPT

Page 1: British Library Datasets Programme Feb 2011

British Library Datasets Programme

JISC RSP Winter School

February 2011

Max Wilkinson

Page 2: British Library Datasets Programme Feb 2011

Today’s Talk

1. The British Library

2. Data in scholarly communication

3. The problem with data

4. The Datasets Programme: vision, strategy, activity (DataCite)

5. Other Projects

Page 3: British Library Datasets Programme Feb 2011

The British Library

Exists for everyone who wants to do research – for academic, personal, and commercial purposes.

Covers all subject areas – sciences, technology, medicine, arts, humanities, social sciences…

Receives a copy of every item published in the UK.

Holds over 150 million items, with 3 million items added each year.

Used by over 16,000 people each day (on site and online).

Page 4: British Library Datasets Programme Feb 2011

The British Library: some facts and figures

Helping people advance knowledge to enrich lives

GIA Funding 08/09: £94.8m operational, £12m capital

Other funding secured 07/08: c.£33m

National library of the UK.

Serves researchers, business, libraries, education & the general public

Collection includes over 2m sound recordings, 5m reports, theses and conference papers, the world’s largest patents collection (c.50m)

3 main sites in London and Yorkshire. Circa 2,000 staff

Business and IP Centre: Providing inspiration, and enabling protection of creative capital and business development

Generates value to the UK economy each year of 4.4 times public funding

Collection fills over 600km of shelving and grows at 11km per year

70 TB of digital material through voluntary deposit

British Library Act 1972: National centre for reference, study, bibliographical and other information services, in relation both to scientific and technological matters, and to the humanities.

Science and Innovation Investment Framework 2004-2014, H.M. Treasury (2004): UK research base must have ready and efficient access to information of all kinds – such as experimental data sets, journals, theses, conference proceedings and patents. This is the life blood of research and innovation.

The largest document supply service in the world. Secure e-delivery and ‘just in time’ digitisation enable desktop delivery within 2 hours

Page 5: British Library Datasets Programme Feb 2011

Who do we serve?

The Researcher – We provide access to research level materials to all sectors including academia, industry, government, charities and NGOs.

Business – The British Library also has a critical role supporting businesses of all sizes, from individual entrepreneurs through to major organisations.

The Learner – We have an important role to play in supporting education from primary schools to developing future researchers of any age.

The Library Community – We play a key role in supporting the wider UK Library Community and information network.

The General Public – The services we offer include exhibitions and events, tours and web services which digitally showcase our collection.

Page 6: British Library Datasets Programme Feb 2011

Modern science relies on good data

Page 7: British Library Datasets Programme Feb 2011

Scholarly record

[Diagram: Scholarly record, with labels Discovery, Access, Record, Permanence, Citation, Metadata Exposure, Trust Fabrics, Copyright]

Page 8: British Library Datasets Programme Feb 2011

The Foundation for Research

Data is a crucial component of the scholarly record.

Re-acquisition may be impossible

Datasets are essential to the British Library’s mission to advance the world’s knowledge.

Page 9: British Library Datasets Programme Feb 2011

Current Situation

No effective way to link between datasets and articles;

No widely used method to identify datasets;

No widely used method to cite datasets.

Page 10: British Library Datasets Programme Feb 2011

As a result…

Datasets are:

Difficult to discover

Difficult to access

In danger of being lost

Page 11: British Library Datasets Programme Feb 2011

Difficult to Discover. Good luck finding the data!

“Source: Committee on Climate Change”

Page 12: British Library Datasets Programme Feb 2011

Data are diverse in the Digital Landscape

Seismic measurements taken by a geologist.

An audio archive of birdsong created by an ornithologist.

Genetic data collected by a medical researcher.

A survey of public opinions collected by a sociologist.

Page 13: British Library Datasets Programme Feb 2011

Re-join the gap…

(No) effective way to link between articles and datasets

(No) widely used method to identify datasets

(No) widely used method to cite datasets

Articles

Underlying data

Page 14: British Library Datasets Programme Feb 2011

Datasets – first class citizens?

Data is difficult to manage after project funding ceases

Informal networks provide the primary means of sharing

Only 21% use a national or international facility

Datasets are not included in impact analysis

Good luck finding it or getting permission to use it (your discipline may vary)

Source: UKRDS Study: The Data Imperative. Managing the UK’s research data for future use (Feb 2009)

Page 15: British Library Datasets Programme Feb 2011

Scholarly record

[Diagram: Scholarly record, with labels Discovery, Access, Record, Permanence, Citation, Metadata Exposure, Trust Fabrics, Copyright]

Page 16: British Library Datasets Programme Feb 2011

Research training based on scholarly communication

[Diagram: Scholarly record, with labels Discovery, Access, Record, Permanence, Citation, Metadata Exposure, Trust Fabrics, Copyright]

Rarely includes data

Page 17: British Library Datasets Programme Feb 2011

Scholarly communication requires intellectual exchanges

[Diagram: Scholarly record, with labels Discovery, Access, Record, Permanence, Citation, Metadata Exposure, Trust Fabrics, Copyright]

No such data fabric

Page 18: British Library Datasets Programme Feb 2011

Scholarly discourse requires a record and provenance

[Diagram: Scholarly record, with labels Discovery, Access, Record, Permanence, Citation, Metadata Exposure, Trust Fabrics, Copyright]

Almost non-existent for data

Page 19: British Library Datasets Programme Feb 2011

The Datasets Programme

We envision a future where researchers can:

Discover, access, reuse, and reference datasets.

Track the impact of the data that they generate and receive appropriate credit.

Our approach is to:

Provide a focus for the community to establish needs, requirements and agreement.

Explore novel technology and creative solutions.

Page 20: British Library Datasets Programme Feb 2011

Two key concepts

INCENTIVE

SUSTAINABILITY

Page 21: British Library Datasets Programme Feb 2011

Projects and activities

www.bl.uk/datasets

Follow us on twitter @datasetsBL

Page 22: British Library Datasets Programme Feb 2011

A Key Component for Many Goals

[Diagram: goals that depend on Persistent Identification: Make Visible, Find, Access, Track Impact, Verify, Reuse, Cite]

Page 23: British Library Datasets Programme Feb 2011

Citation using Digital Object Identifiers (DOIs)

Dataset: G. Yancheva, N. R. Nowaczyk et al. (2007) Rock magnetism and X-ray fluorescence spectrometry analyses on sediment cores of the Lake Huguang Maar, Southeast China. PANGAEA

Article citation: G. Yancheva, N. R. Nowaczyk et al. (2007) Influence of the intertropical convergence zone on the East Asian monsoon. Nature 445, 74-77

How to reference

Published Article (Abstract or full text)

The DOI system offers an easy, internet-actionable way to connect the article with the underlying publication

But a complete scholarly record would also link to the evidential datasets and their location, e.g. PANGAEA

doi:10.1038/nature05431

Page 24: British Library Datasets Programme Feb 2011

doi:10.1038/nature05431 leads to a landing page

Page 25: British Library Datasets Programme Feb 2011

Digital Object Identifiers (DOIs) offer a solution

Most widely used identifier for scientific articles

Researchers, authors, publishers know how to use them

Put datasets on the same playing field as articles

Connecting an Article with the Underlying Data

Dataset: Yancheva et al. (2007). Analyses on sediment of Lake Maar. PANGAEA. doi:10.1594/PANGAEA.587840

URIs are commonly used but can decay

(e.g. Wren JD: URL decay in MEDLINE: a 4-year follow-up study. Bioinformatics. 2008 Jun 1;24(11):1381-5).

Page 26: British Library Datasets Programme Feb 2011

doi:10.1594/PANGAEA.587840

Page 27: British Library Datasets Programme Feb 2011

Dataset citation using Digital Object Identifiers (DOIs)

Dataset: G. Yancheva, N. R. Nowaczyk et al. (2007) Rock magnetism and X-ray fluorescence spectrometry analyses on sediment cores of the Lake Huguang Maar, Southeast China. PANGAEA

doi:10.1594/PANGAEA.587840

Article: G. Yancheva, N. R. Nowaczyk et al. (2007) Influence of the intertropical convergence zone on the East Asian monsoon. Nature 445, 74-77

doi:10.1038/nature05431

Data Citation

Scholarly record is complete
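The pairing of an article DOI with a dataset DOI on this slide can be sketched in Python. This is only an illustration: the `Citation` class, its field names, the `format` layout and the `scholarly_record` mapping are all hypothetical, and the citation layout is not a format recommended by DataCite or the British Library. The bibliographic details are the ones given on the slide.

```python
from dataclasses import dataclass


@dataclass
class Citation:
    """Hypothetical record for one citable item (article or dataset)."""
    authors: str
    year: int
    title: str
    source: str
    doi: str

    def format(self) -> str:
        # Illustrative layout only, not an official citation style.
        return f"{self.authors} ({self.year}) {self.title}. {self.source}. doi:{self.doi}"


article = Citation(
    authors="G. Yancheva, N. R. Nowaczyk et al.",
    year=2007,
    title="Influence of the intertropical convergence zone on the East Asian monsoon",
    source="Nature 445, 74-77",
    doi="10.1038/nature05431",
)
dataset = Citation(
    authors="G. Yancheva, N. R. Nowaczyk et al.",
    year=2007,
    title=("Rock magnetism and X-ray fluorescence spectrometry analyses on "
           "sediment cores of the Lake Huguang Maar, Southeast China"),
    source="PANGAEA",
    doi="10.1594/PANGAEA.587840",
)

# Map the article DOI to the DOIs of its underlying datasets.
scholarly_record = {article.doi: [dataset.doi]}

print(article.format())
print(dataset.format())
```

Because both items carry a DOI, the link between them is just a mapping from one identifier to another, which is the sense in which the slide calls the record complete.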

Page 28: British Library Datasets Programme Feb 2011

Projects – DataCite

DataCite is an international consortium which aims to:

Establish easier access to scientific research data on the Internet

Increase acceptance of research data as legitimate, citable contributions to the scientific record

Support data archiving that will permit results to be verified and re-purposed for future study.

Page 29: British Library Datasets Programme Feb 2011

DataCite

Support researchers by enabling them to locate, identify, and cite research datasets with confidence

Support data centres by providing persistent identifiers for datasets, workflows and standards for data publication

Support publishers by enabling research articles to be linked to the underlying data

DataCite : Data Centres :: CrossRef : Publishers

Page 30: British Library Datasets Programme Feb 2011

Digital Object Identifier (DOI)

doi:10.4124/0003.569 (prefix: 10.4124, suffix: 0003.569)

Page 31: British Library Datasets Programme Feb 2011

DOI prefix

doi:10.4124/0003.569 (prefix: 10.4124, suffix: 0003.569)

The British Library provides data centres with a unique prefix for DataCite DOIs

For example, Archaeology Data Service uses 10.5284

Page 32: British Library Datasets Programme Feb 2011

DOI suffix

doi:10.4124/0003.569 (prefix: 10.4124, suffix: 0003.569)

Suffix generated by the data centre

Guidelines for DOI syntax are being developed
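As a minimal sketch of the prefix and suffix structure described on these slides, the following Python splits a DOI at its first slash. The function name `split_doi` is hypothetical, and the check that a prefix starts with "10." is an assumption drawn from the example DOIs in this talk, not a rule stated on the slides.

```python
def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI into (prefix, suffix) at the first slash."""
    # Drop an optional "doi:" label and any surrounding whitespace.
    cleaned = doi.strip()
    if cleaned.lower().startswith("doi:"):
        cleaned = cleaned[4:].strip()
    # Only the first slash separates prefix from suffix; the suffix is
    # chosen by the data centre and may itself contain slashes.
    prefix, sep, suffix = cleaned.partition("/")
    if not sep or not suffix or not prefix.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


prefix, suffix = split_doi("doi:10.4124/0003.569")
print(prefix)  # 10.4124
print(suffix)  # 0003.569
```

Splitting on the first slash only keeps the data centre free to use any suffix scheme it likes, which matches the slide's point that the suffix is generated by the data centre.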

Page 33: British Library Datasets Programme Feb 2011

Resolving a DOI

doi:10.4124/0003.569 (prefix: 10.4124, suffix: 0003.569)

Resolving the DOI:

http://dx.doi.org/10.4124/0003.569
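The step from a DOI to its resolver URL can be sketched as follows. The function name `doi_to_url` and the constant `RESOLVER` are hypothetical; the resolver address is the one shown on the slide. The sketch only builds the URL and does not contact the network.

```python
from urllib.parse import quote

# Resolver address as shown on the slide.
RESOLVER = "http://dx.doi.org/"


def doi_to_url(doi: str) -> str:
    """Build the resolver URL for a DOI given with or without a 'doi:' label."""
    cleaned = doi.strip()
    if cleaned.lower().startswith("doi:"):
        cleaned = cleaned[4:].strip()
    # Percent-encode characters that are unsafe in a URL path, keeping "/".
    return RESOLVER + quote(cleaned, safe="/")


print(doi_to_url("doi:10.4124/0003.569"))
print(doi_to_url("doi:10.1594/PANGAEA.587840"))
```

Opening the resulting URL in a browser redirects to whatever location the data centre has registered for that DOI, so the citation keeps working even if the landing page moves.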

Page 34: British Library Datasets Programme Feb 2011

DOIs resolve to an open landing page

Page 35: British Library Datasets Programme Feb 2011

DataCite Service

Built a service for data centres to mint DOIs for datasets and store associated metadata (http://api.datacite.org)

British Library is trialling the service with several UK data centres, including:

Page 36: British Library Datasets Programme Feb 2011

Projects and activities

www.bl.uk/datasets

Page 37: British Library Datasets Programme Feb 2011

SageCite: Data citation in bioinformatics workflow

• Sage Bionetworks data capture and analysis workflow (Taverna: myExperiment)

• Data Citation service integration points and citation targets (e.g. data-models)

• Recommendations

• Benefits analysis

SageCite: Integration of data citation services into multi-contributor bio-informatics workflow. Establishing data attribution and credit mechanisms.

► INCENTIVE

Sage Bionetworks: Aggregating datasets from contributors to create massive coherent datasets that can be used for systems level analysis of disease

Page 38: British Library Datasets Programme Feb 2011

Dryad UK: Repository sustainability

• Expand publisher base

• Seamless integration into publisher workflow

• Sustainability models for datasets supplementary to publication

Dryad UK: Define a business case and pilot service integrating DataCite DOIs and dataset archiving into publisher workflows

► SUSTAINABILITY

Leveraging the Dryad Consortium, which is addressing the acquisition and storage of long tail supplementary data

Page 39: British Library Datasets Programme Feb 2011

For more information on the BL Datasets Programme

Max Wilkinson, Programme Manager, Datasets

Email: [email protected]

Email: [email protected]

Website: www.bl.uk/datasets

Follow us on twitter @datasetsBL