Mass digitisation? Astrid Verheusen Projectmanager Research & Development Division National library of the Netherlands LIBER-EBLIDA Workshop on Digitisation Royal Library, Copenhagen, Denmark 25 October 2007

Mass digitisation?

Astrid Verheusen
Project manager, Research & Development Division
National Library of the Netherlands

LIBER-EBLIDA Workshop on Digitisation

Royal Library, Copenhagen, Denmark
25 October 2007

Koninklijke Bibliotheek – National Library of the Netherlands

What is mass digitisation?

• Millions of books rather than millions of pages

• No selection/no collections (digitise everything!)

• Mainly books

• Exclusion of special collections

• Low quality standards

• Ignore copyright issues

• Ignore long term preservation issues

Koninklijke Bibliotheek - Digitisation in the past

• Experience with digitisation since 1995

• Webexpositions / highlights of collections

• Small-scale digitisation projects

• Mainly visually attractive images

• Emphasis on techniques / trial and error

• Exploration of possibilities

• Co-operation on a small scale

Koninklijke Bibliotheek - Digitisation 2000-2005

Shift in emphasis:

• From highlights to larger collections

• Project based

• (Inter)national co-operation

• Established methods and techniques

• Awareness of digital preservation

• More text material & audio/video

• Further exploration of possibilities

applications made with the digitised material

Memory of the Netherlands

Koninklijke Bibliotheek - present & future -1

• Strategic plan 2006-2009: “Development of a national programme for the mass digitisation of sources for research in the humanities”

• Target audience

• Scientific research

• Public at large

• Development of standards and services

• Particular attention to digital preservation

• Preservation imaging

• No commercial partners for funding

Koninklijke Bibliotheek - present & future -2

Text digitisation

• Until recently: on a small scale

• Printed and typed sources (not handwritten)

• Issues differ from images:

• Structure / navigation

• Conversion to full text (OCR)

• Scanning from microfilm

• Search & Retrieval

Koninklijke Bibliotheek - Projects 2007-2011

Project                                    Number of pages   Budget (M€)
Dutch parliamentary papers 1814-1995             2,300,000          10.5
Dutch daily newspapers 1618-1995                 8,000,000          12.5
Special collections – books before 1800          1,300,000           3.0
Radio news bulletins                             1,500,000           0.5
Metamorfoze - preservation imaging              28,000,000?         24
Atjeh                                              200,000           0.3
Memory of the Netherlands                          350,000           3.5
Total                                           42,150,150          54.3
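The per-project figures in the table can be turned into a rough cost per page. The sketch below is only an illustration: the variable names are made up, and the Metamorfoze page count is taken at face value even though the slide marks it with a question mark. Note that the rows as listed add up to 41,650,000 pages, slightly below the stated total of 42,150,150; both give about € 1.3 per page when divided into the 54.3 M€ budget.

```python
# Page counts and budgets (in million euro) as listed in the projects table.
projects = {
    "Dutch parliamentary papers 1814-1995": (2_300_000, 10.5),
    "Dutch daily newspapers 1618-1995": (8_000_000, 12.5),
    "Special collections - books before 1800": (1_300_000, 3.0),
    "Radio news bulletins": (1_500_000, 0.5),
    # The slide marks this page count with "?"; it is used here as given.
    "Metamorfoze - preservation imaging": (28_000_000, 24.0),
    "Atjeh": (200_000, 0.3),
    "Memory of the Netherlands": (350_000, 3.5),
}

total_pages = sum(pages for pages, _ in projects.values())
total_budget_meur = sum(budget for _, budget in projects.values())

# Average cost per page in euro over all projects.
cost_per_page = total_budget_meur * 1_000_000 / total_pages

for name, (pages, budget) in projects.items():
    print(f"{name}: EUR {budget * 1_000_000 / pages:.2f} per page")
print(f"All projects: EUR {cost_per_page:.2f} per page")
```

The per-project values differ widely, so the average says little about any single project.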

Koninklijke Bibliotheek - Issues

• Costs of digitisation: € 1.3 per page

• Costs of exploitation: millions per year from 2011 onwards

• Technical infrastructure

• Storage (1 PB needed)

• Processing 2 million files per month

• Search & retrieval is not effective enough

• Organisational infrastructure is not efficient

• The process is too slow; we want to digitise faster and more...
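To give the storage and throughput figures some scale, here is a back-of-the-envelope sketch. It rests on assumptions that the slides do not state: that 1 PB means 10^15 bytes, that the 1 PB covers exactly the 42,150,150 pages of the 2007-2011 projects, and that one file corresponds to one page.

```python
# Figures quoted on the slides.
TOTAL_PAGES = 42_150_150       # stated total in the projects table
STORAGE_BYTES = 10**15         # "1 PB needed"; decimal petabyte is an assumption
FILES_PER_MONTH = 2_000_000    # "Processing 2 million files per month"

# Assumption: the 1 PB is spread evenly over the pages in the table.
mb_per_page = STORAGE_BYTES / TOTAL_PAGES / 10**6

# Assumption: one file per page, processed at the quoted monthly rate.
months_needed = TOTAL_PAGES / FILES_PER_MONTH

print(f"Average storage per page: {mb_per_page:.1f} MB")
print(f"Months to process all pages: {months_needed:.1f}")
```

Under these assumptions each page takes a little under 24 MB on average and the full set takes roughly 21 months to process.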

We cannot slow down to make things perfect

The rising tide will lift all boats

Mass Digitization: Implications for Information Policy. Report from the “Scholarship and Libraries in Transition: A Dialogue about the Impacts of Mass Digitization Projects” symposium, held on March 10-11, 2006, University of Michigan, Ann Arbor

[Diagram: Content; Presentation; Search & Retrieval; Storage; Processing; Project management & Organization; Finance]

Content: Selection & Preparation

Old approaches

• Much effort spent on selection

• Ignorance of copyright issues…

• Minute assessment of missing material

• Replacement of torn pages

Content: Selection & Preparation

New approaches

• Less effort on the selection process (integral collections)

• Negotiation/co-operation with the publishing sector

• Limited effort on retrieving missing pages/issues

• Limited effort on restoration

Content: Digital imaging & metadata

Old approaches

• Very high quality images

• Capture as much detail from the original as possible

• Minimize damage to the original

• Master & access images

• Lossless compression (TIFF)

• Experiment with our own scanners

Content: Digital imaging & metadata

New approaches

• One format for both access and preservation

• New formats to save storage (JPEG2000)

• Outsource all imaging activities

• Consider .txt as a master…

Processing: Quality assurance

Old approaches

• High standards for quality assurance (often manual)

• Expensive Document Management System for quality control

Processing: Quality assurance

New approaches

• Not realistic to check quality for all files

• We need automatic quality assurance tools

• OCR often not involved in quality assurance

Search & Retrieval

Old approaches

• Find the best search engine

• Search in metadata

• Digitise text without OCR

• We decide what the user wants

Search & Retrieval

New approaches

• All text digitisation projects include OCR

• Search through millions of pages of text

• Experiment with tools for enhanced access & text mining

• Growing awareness that we have to involve our users

Storage

Old approaches

• Storage on CD-ROM and DVD

• Master files in e-Depot: 1 Petabyte needed

• Storage of all master files for the long term

• Access files are stored in a different system

Storage

New approaches

• Storage strategy which balances costs, access and preservation

• Alternative file formats to minimize storage costs & increase throughput for delivery and transfer

• Use one file both as master and access file

Finance

• All costs are now specified

• Division of budget

• 30 % Staff

• 10 % Hard- & software

• 10 % Research & Development

• 50 % Digitisation, OCR & metadata

• Exploitation costs are becoming ‘dramatic’

• New business models
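As a rough illustration of what the budget division means in absolute terms, the sketch below applies the percentages to the 54.3 M€ total from the projects table. Applying the division to that total is an assumption; the slide does not say which budget the percentages refer to.

```python
# Assumption: the division applies to the 54.3 M EUR total of the 2007-2011 projects.
TOTAL_BUDGET_MEUR = 54.3

# Percentages as listed on the Finance slide.
shares = {
    "Staff": 30,
    "Hard- & software": 10,
    "Research & Development": 10,
    "Digitisation, OCR & metadata": 50,
}

# The listed shares should cover the whole budget.
assert sum(shares.values()) == 100

amounts = {item: TOTAL_BUDGET_MEUR * pct / 100 for item, pct in shares.items()}

for item, amount in amounts.items():
    print(f"{item}: M EUR {amount:.2f}")
```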

Organisation

• All digitisation activities in R&D department

• Involvement of other parts of the library is necessary

• Digitisation & digital preservation are separate activities

• Integration is necessary

• Digitisation activities are all project based

• Integration with standing organisation is necessary

Conclusion

‘Holding out for an ideal solution is often not feasible; moreover, implementing less-than-perfect solutions can enable us to be flexible, modular, and nimble so that we can continue to refine our strategies as new options become available’.

Preservation in the Age of Large-scale Digitization: A White Paper, by Oya Y. Rieger, Council on Library and Information Resources

Thank you