
Digitisation and digital libraries in the NL of the Czech Republic

An overview

Jan Hutař, NL ČR

In the next 40 minutes…

• Intro
• Kramerius
• Manuscriptorium
• IOP/NDK

04/20/23 UISK Summer Seminar

Intro
• we started in the mid-1990s
• advantages of digitization – access etc.
• our main aim – preservation
• users were only a secondary focus
• one of the first European libraries with a real digitization programme
  – very active in the field
  – a leader in development
• today we are far behind the others…

Memoriae Mundi Series Bohemica

• MMSB started in 1992 as part of the UNESCO Memory of the World Programme > later continued on its own
• to provide access to Czech cultural heritage through digital copies (manuscripts first)
• VISK 6
• first steps – establishing a digitization department + testing of technology processes
  – structure, settings, compression, resolution, user interface etc.

MMSB cont.

• many NK documents were digitized completely

• Antifonář Sedlecký (Antiphonarium Sedlecense): one of the oldest Czech music books

• published on CD-ROM: the first completely digitized medieval manuscript in the world!

• CD > digital repository

Manuscriptorium

• outcome of the UNESCO "Memory of the World" initiative in 1992
• 1993 images on CD-ROM – impossible to use the images outside the CD
• 2001 images on the Internet – thumbnails only
• 2004 complete, high-quality images on the Internet (licensing)
• 2007 very low subscription fee
• 2005 Jikji UNESCO award
• 2007 ENRICH project

Manuscriptorium cont.

• financed by the NL (VISK 6), managed by AiP Beroun Ltd.
• more than 1,200,000 pages (5,000,000 in total with ENRICH)
• the richest digital manuscript resource in Europe
• DL of manuscripts, old printed books and other source documents
• virtual research environment for historical resources
• virtual digital library of image copies and full texts
• brings together information provided by collaborating partners (46)

Manuscriptorium cont.
• provides connectivity for all participating partners into:
  – JIB, EDLproject, M-CAST, TEL
• OAI-PMH, Z39.50
• MARC21, DC, MODS, OpenM …
• own catalogue (search adapted to the special features of old books)
• interface for reading/studying digital images (turning pages, OCR, notes etc.)
• higher visit rates for partners' local libraries
• for free

http://www.manuscriptorium.com/
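The OAI-PMH connectivity listed above comes down to plain HTTP requests returning XML. As a minimal sketch in Python (standard library only), the code below builds a ListRecords request for Dublin Core (oai_dc) records and extracts identifiers and titles from a response. BASE_URL and the sample record are hypothetical placeholders; the slides do not give Manuscriptorium's actual OAI-PMH address.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical endpoint: the slides name the protocol, not the service URL.
BASE_URL = "https://example.org/oai"

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it replaces metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Return [(identifier, title), ...] and the resumption token (or None)."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        title = rec.findtext(".//dc:title", default="", namespaces=NS)
        records.append((ident, title))
    # An empty or missing resumptionToken means the list is complete.
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    return records, token or None


# Hypothetical sample response, trimmed to the elements used above.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:ms-001</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Antiphonarium Sedlecense</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url(BASE_URL))
    records, token = parse_records(SAMPLE)
    print(records, token)
```

A real harvester would repeat the request with the returned token until no token comes back.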

ENRICH project
• eContentplus programme, 2007-2009
• to provide seamless and easy access to distributed digital representations of old documentary heritage from various European cultural institutions
• to create a shared virtual research environment, especially for the study of manuscripts, but also incunabula, rare old printed books, …
• builds on the Manuscriptorium Digital Library
• the project brings together almost 85% of the manuscripts currently digitized in European national libraries (5,000,000 pages)
• target user groups are content owners/holders, ALM (archives, libraries, museums), researchers and general-interest users

ENRICH partners
European Networking Resources and Information concerning Cultural Heritage

18 partners + many associated partners
• National Library of the Czech Republic, Prague – project coordinator
• Oxford University Computing Services, Oxford, United Kingdom
• NL of:
  – Spain, Lithuania, Italy, Iceland, Hungary, Portugal, Belgium, Sweden, Turkey, Malta, Serbia

http://enrich.manuscriptorium.com/

Kramerius project
• started in 1997 – NL + the 3 biggest regional libraries
• to save and provide access to Bohemical documents printed on acid paper
• periodicals first > monographs
• to preserve the information for the future (the paper carrier is not important)
• readers could not use the originals – poor condition of the documents
• first step: create the technology background > set up the processes > digitize
• this first phase was about microforms! hybrid method later
• from 2000 part of VISK 7 (support from MK, the Ministry of Culture – not only for the NL)
• not part of the NL budget!

VISK 7

• between 2001 and 2008, about 400 periodical titles (4,000,000 pages) from 35 institutions were digitized

• the NL archives the film masters and digital masters + copies etc.

• digitized docs available through the Kramerius DL
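To put the VISK 7 figures in perspective, the rounded totals above translate into rough averages. This is back-of-the-envelope arithmetic in Python, assuming the 2001-2008 period is counted inclusively (8 years).

```python
def averages(titles, pages, first_year, last_year):
    """Rough per-title and per-year averages from rounded programme totals."""
    years = last_year - first_year + 1  # inclusive count of years
    return pages / titles, pages / years


# VISK 7 figures from the slide: about 400 titles, 4,000,000 pages, 2001-2008.
per_title, per_year = averages(400, 4_000_000, 2001, 2008)
print(f"{per_title:,.0f} pages per title, {per_year:,.0f} pages per year")
```

About half a million pages a year is the scale behind the complaint that digitization is too slow.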

VISK 7 cont.

• the VISK 7 budget is too low every year > too many large-scale periodicals (1850 to 1950) > digitization is too slow

• in-house + outsourcing
• the same money pays not only for digitization but also for HW, SW + development
• a bit better in recent years (HW) ;-)

Norway grants

• NL and the Municipal Library of Prague, 2007-2009
• to save Bohemical non-periodical documents from the 19th century printed on acid paper
• hybrid method (not giving up microforms ;-)
• 2,400,000 pages / 16,000 books at the NL
• financial support from the EU – 1 million EUR

Kramerius system

• Kramerius system vs. Kramerius DL vs. project ;-)
• GNU GPL
• NL, AV + 10 other libraries
• system to provide access to digital documents
• since 2002 – NL, AV, Qbizm
• development money from VISK and MŠMT
• Java, Linux, Apache, Tomcat, PostgreSQL, Lucene

http://kramerius.qbizm.cz/

Kramerius system
• DTD for periodicals and monographs
  – a big problem for other document types and for changes
• OCR in TXT, metadata in XML (bibliographic, structural, preservation); DjVu, JPG, PNG, PDF
• Convera, Lucene for full-text search
• OAI-PMH even down to page level
• METS
• UUID + handle
• > moving to Fedora SW this year
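One way to picture the METS + UUID + handle combination listed above: each structural unit (here an issue and its pages) gets its own UUID, which can then be exposed as a persistent identifier through a handle. The Python sketch below builds a METS structMap fragment that way. The handle prefix 12345, the TYPE values and the uuid- ID format are illustrative assumptions, not Kramerius's actual conventions.

```python
import uuid
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)

# Hypothetical handle prefix; real prefixes are assigned by the Handle System.
HANDLE_PREFIX = "12345"


def handle_url(local_id):
    """Address an object through the global Handle proxy (hdl.handle.net)."""
    return f"https://hdl.handle.net/{HANDLE_PREFIX}/{local_id}"


def new_id():
    # XML IDs may not start with a digit, so the UUID gets a letter prefix.
    return f"uuid-{uuid.uuid4()}"


def build_struct_map(issue_label, page_count):
    """METS structMap fragment: one issue div holding ordered page divs, each
    with its own UUID-based ID so that single pages can be addressed."""
    struct_map = ET.Element(f"{{{METS_NS}}}structMap", TYPE="PHYSICAL")
    issue = ET.SubElement(struct_map, f"{{{METS_NS}}}div",
                          TYPE="issue", LABEL=issue_label, ID=new_id())
    for n in range(1, page_count + 1):
        ET.SubElement(issue, f"{{{METS_NS}}}div",
                      TYPE="page", ORDER=str(n), ID=new_id())
    return struct_map


if __name__ == "__main__":
    sm = build_struct_map("1885, No. 1", 3)  # hypothetical issue label
    print(ET.tostring(sm, encoding="unicode"))
    first_page = sm.find(f".//{{{METS_NS}}}div[@TYPE='page']")
    print(handle_url(first_page.get("ID")))
```

Giving pages their own identifiers is what makes page-level access (such as the page-level OAI-PMH mentioned on the slide) possible at all.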

Kramerius DL
• digital library at the NL – periodicals and monographs from the 19th century …
• connected to:
  – JIB
  – TEL www.theeuropeanlibrary.org
  – CZ union catalogue
  – Europeana

• English interface [x]

http://kramerius.nkp.cz/

Kramerius DL

• metadata always available
• metadata and full-text search
• documents under copyright (1900?) available only in the NL
• turning pages, saving, PDF etc.
• DjVu plug-in > JPEG2000 soon
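The access rule on this slide (metadata always public, possibly copyrighted documents only inside the NL) can be written as a small decision function. This is a hypothetical Python sketch: the slide gives the cutoff only as "1900?", so PUBLIC_DOMAIN_CUTOFF is a placeholder, and publication year stands in for real copyright status.

```python
from dataclasses import dataclass

# Placeholder: the slide gives the copyright cutoff only as "1900?".
PUBLIC_DOMAIN_CUTOFF = 1900


@dataclass
class Document:
    title: str
    year: int  # year of publication


def access_level(doc: Document, on_site: bool) -> str:
    """Return "full" or "metadata" for a request from inside/outside the NL."""
    if doc.year < PUBLIC_DOMAIN_CUTOFF:
        return "full"  # treated as out of copyright: open to everyone
    # Possibly in copyright: full content only on the library premises;
    # metadata stays available everywhere.
    return "full" if on_site else "metadata"


if __name__ == "__main__":
    print(access_level(Document("Periodical issue", 1875), on_site=False))
    print(access_level(Document("Periodical issue", 1925), on_site=False))
```

In the EU, copyright generally runs for 70 years after the author's death, so a production rule would need author data rather than a fixed publication year; the fixed cutoff only mirrors the slide's simplification.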

NDK and IOP project

• NDK (National Digital Library: general name for 3 current projects)
• IOP – EU structural funds for new EU members (2005)
• joint project of the NL and the Moravian State Library in Brno
• NDK is a strategic priority and a candidate for European funding under the Integrated Operational Programme IOP (Smart Administration)

Why!?

• the NL has a long tradition and good results (TEL, Europeana etc.)
• the NL was awarded the first UNESCO/Jikji Memory of the World Prize for its contribution to the preservation of and access to documentary heritage

BUT
• digitization and digital preservation in CZ are significantly hindered by a lack of money, which slows digitization and prevents us from building a truly trusted (and certified) digital repository

IOP

• two main goals:
  – acceleration of digitization (two digitization centres in Prague and Brno, mass digitization)
  – long-term preservation of and access to digital objects (TDR, 2 locations)

• the core of the Czech national cultural heritage is about 1.2 million documents / 350,000,000 pages
  – 300 years at the current speed
  – 20 years with the IOP project
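The 300-year versus 20-year comparison above implies a required yearly throughput. A quick Python check using the slide's rounded figures:

```python
def pages_per_year(total_pages, years):
    """Average digitization rate needed to finish in the given number of years."""
    return total_pages / years


CORE_PAGES = 350_000_000  # core of Czech national cultural heritage (slide figure)
CORE_DOCS = 1_200_000

current = pages_per_year(CORE_PAGES, 300)  # at the current speed
with_iop = pages_per_year(CORE_PAGES, 20)  # with the IOP project

print(f"current: {current:,.0f} pages/year")
print(f"with IOP: {with_iop:,.0f} pages/year ({with_iop / current:.0f}x faster)")
print(f"average document length: {CORE_PAGES / CORE_DOCS:.0f} pages")
```

In other words, the IOP plan assumes roughly a fifteenfold increase in throughput, to about 17.5 million pages a year.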

IOP possible results
• documents published in and since 1801:
  – 540,000 documents, 137 million pages (1,060 TB of raw digital data in one location, 60 TB with fast access for users)

• historical documents published until 1800:
  – 20,000 documents, 9 million pages (50 TB of raw digital data in one location, all with fast access for users)

• WebArchive:
  – harvesting and archiving of 5 billion files (221 TB of raw digital data in one location, all with fast access for users)

• trusted digital repository (certified by internal as well as external audits)
• user-friendly, customized access to digital content for various user groups
• the total budget of the project should be 29 million EUR
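Dividing the storage estimates by the page and file counts gives the implied average raw size per unit. This Python sketch assumes decimal units (1 TB = 10^12 bytes); the slide does not say whether TB or TiB is meant.

```python
TB = 10**12  # assumption: decimal terabytes
MB = 10**6
KB = 10**3


def bytes_per_unit(total_tb, units):
    """Average raw storage per page (or per file) in bytes."""
    return total_tb * TB / units


# Raw storage and page/file counts as given on the slide.
print(f"since 1801: {bytes_per_unit(1060, 137_000_000) / MB:.1f} MB per page")
print(f"until 1800: {bytes_per_unit(50, 9_000_000) / MB:.1f} MB per page")
print(f"WebArchive: {bytes_per_unit(221, 5_000_000_000) / KB:.1f} kB per file")
```

These are averages over very different material, so they say nothing definite about individual file formats.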

HW SW overview
Robotic scanners
  – 5x 4DigitalBooks
  – 5x Treventus

• manual scanners A2 + A0/A1
• workflow SW – Goobi, CCS or other

Archival SW (commercial)
  - DIAS (IBM)
  - Rosetta (Ex Libris)
  - SDB (Tessella)
  - analysis, on-site visits
  - LTP activity

Where are we now?

- our analysis (policies, functional requirements, new building project etc.) >
- feasibility study in June/July
- the project will start in September 2009

WebArchiv

- since 2000
- yearly harvests + topic harvests
- about 10 TB of data
- Heritrix, Wayback Machine
- IA, IIPC
- part of NDK
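Wayback-style replay tools address an archived page by a 14-digit capture timestamp (YYYYMMDDhhmmss, UTC) followed by the original URL. The Python sketch below composes such an address; WAYBACK_BASE is a hypothetical placeholder, not the WebArchiv's real replay endpoint.

```python
from datetime import datetime, timezone

# Hypothetical replay endpoint; the slide names the tool, not its address.
WAYBACK_BASE = "https://wayback.example.org/wayback"


def wayback_timestamp(dt: datetime) -> str:
    """14-digit capture timestamp in UTC, as used in Wayback-style URLs.
    Note: a naive datetime is interpreted as local time by astimezone()."""
    return dt.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")


def replay_url(original_url: str, captured: datetime) -> str:
    """Compose a replay address: base / timestamp / original URL."""
    return f"{WAYBACK_BASE}/{wayback_timestamp(captured)}/{original_url}"


if __name__ == "__main__":
    t = datetime(2009, 6, 1, 12, 0, tzinfo=timezone.utc)
    print(replay_url("http://kramerius.nkp.cz/", t))
```

Replay software typically resolves a request for an inexact timestamp to the nearest available capture, so the timestamp acts as a target date rather than an exact key.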

Thanks for your attention! Questions?

In case of later questions, don't hesitate to send me an email ;-)

[email protected]
