reinhard altenhöner
Post on 11-Jan-2016
| IFLA2010. Newspaper section | 2010-02-26
Changing preservation tasks for the German National Library: Some insights and preliminary remarks
IFLA International Newspaper Conference 2010 at IGNCA, New Delhi, India, 26 to 28 February 2010
"Digital Preservation and Access to news and views"
Reinhard Altenhöner
Table of contents
1. Starting situation / setting
2. Digital Preservation in DNB
3. Practical Example: E-Papers
DNB: Our task: collecting and archiving, providing permanent access
- Collecting publications issued in Germany since 1913
- Since June 22, 2006, online/net publications are covered by the new law
- Newspapers as well: ca. 450 newspapers (this means a selection!) are microfilmed every day
- About 9,000 datasets in the central database
- Some years ago we started brainstorming on alternatives to this microfilm (MF) approach:
  - collecting e-papers from the web
  - archiving print files
  - cooperation with media / clipping agencies
Characteristics of online newspapers
- Frequent update processes
- Dedicated publication workflow: database, content management system, presentation on the fly
- Web 2.0 facilities for comments, blogging and tagging
- Multiple forms of embedded advertising
- Complex navigation and search functions
- Harvesting is extremely difficult: some experiments (e.g. on newsletters), but no running workflow
„kopal“
- Co-operative development of a long-term digital information archive
- Start in 2004
- Task: development of a standardized long-term preservation solution that can also serve as a solution for other libraries / industries
- Basis: DIAS (Digital Information and Archiving System) of the Royal Dutch Library, condensed and extended with peripheral open-source components
- Enhancement for cooperative usage
- Development of a universal object scheme
- Hosting outside the library (remote access)
kopal: cooperation
- GWDG: hosting
- IBM: archiving software
- DNB: ingest/access software
- SUB: ingest/access software
- Common task: preservation planning
kopal: Structure & concept
Diagram: DIAS by IBM, hosted at GWDG (Göttingen), with separate accounts (Account 1, Account 2) for the partners DNB (Frankfurt) and SUB Göttingen and for further partners (nn), each connected through its own local software.
Packaging
- Submission Information Package in the Universal Object Format: object files plus METS 1.4 metadata
- LMER 1.2: Long-term preservation Metadata for Electronic Resources
- mets.xml sections: header, dmdSec, amdSec, file section, structural map
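To make the package structure more concrete, here is a minimal sketch that generates an empty `mets.xml` skeleton with the five sections listed above, using Python's standard `xml.etree.ElementTree`. The element names follow the public METS schema (`metsHdr`, `dmdSec`, `amdSec`, `fileSec`, `structMap`); the IDs, the file name `page001.tif` and the assumption that LMER-style technical metadata would sit in `amdSec` are illustrative only and do not reproduce kopal's actual Universal Object Format profile. The skeleton is not schema-valid as-is (the metadata sections are left empty).

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(tag: str) -> str:
    """Qualify a local tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets_skeleton(file_href: str) -> ET.Element:
    """Return a minimal, empty METS skeleton with the five main sections."""
    mets = ET.Element(q("mets"))
    # 1. Header
    ET.SubElement(mets, q("metsHdr"), CREATEDATE="2010-02-26T00:00:00")
    # 2. Descriptive metadata (left empty in this sketch)
    ET.SubElement(mets, q("dmdSec"), ID="DMD1")
    # 3. Administrative metadata; assumption: LMER-style technical data goes here
    amd = ET.SubElement(mets, q("amdSec"), ID="AMD1")
    ET.SubElement(amd, q("techMD"), ID="TECH1")
    # 4. File section pointing at the (hypothetical) content file
    file_grp = ET.SubElement(ET.SubElement(mets, q("fileSec")), q("fileGrp"))
    file_el = ET.SubElement(file_grp, q("file"), ID="FILE1", ADMID="TECH1")
    ET.SubElement(file_el, q("FLocat"),
                  {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": file_href})
    # 5. Structural map linking the logical unit to the file
    div = ET.SubElement(ET.SubElement(mets, q("structMap")), q("div"), DMDID="DMD1")
    ET.SubElement(div, q("fptr"), FILEID="FILE1")
    return mets


if __name__ == "__main__":
    print(ET.tostring(build_mets_skeleton("page001.tif"), encoding="unicode"))
```

In a real package the empty `dmdSec` and `techMD` elements would wrap actual descriptive and technical metadata records.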
koLibRI: administration interface, online archivist, machine interface
kopal preservation strategy
- Migrate the object with URN xxx into new format yyy
- Migrate all objects of format xxx and/or that were ingested before a certain date and/or that are larger than zzz MB into new format xyz (e.g. from TIFF to PNG)
- Implementation of emulation view paths
- No restriction on file size or file format/type: all known and unknown file formats are accepted (text, pictures, video, audio, executables, etc.)
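The second migration rule amounts to a selection query over the archive's technical metadata. A minimal Python sketch of that selection follows. The record fields (`urn`, `mime_type`, `ingest_date`, `size_mb`), the `urn:example:` identifiers and the `combine` switch that models the "and/or" wording are assumptions for illustration, not kopal, koLibRI or DIAS interfaces, and the code only selects candidate objects without converting anything.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ArchivedObject:
    # Hypothetical technical metadata record for one archived object.
    urn: str
    mime_type: str
    ingest_date: date
    size_mb: float


def select_for_migration(
    objects: Iterable[ArchivedObject],
    fmt: Optional[str] = None,
    ingested_before: Optional[date] = None,
    larger_than_mb: Optional[float] = None,
    combine: str = "and",
) -> List[str]:
    """Return URNs of objects matching the given criteria.

    Criteria left as None are ignored; the remaining ones are combined
    with logical AND ("and") or OR ("or").
    """
    if combine not in ("and", "or"):
        raise ValueError("combine must be 'and' or 'or'")
    selected = []
    for obj in objects:
        checks = []
        if fmt is not None:
            checks.append(obj.mime_type == fmt)
        if ingested_before is not None:
            checks.append(obj.ingest_date < ingested_before)
        if larger_than_mb is not None:
            checks.append(obj.size_mb > larger_than_mb)
        if not checks:
            continue  # no criteria given: select nothing rather than everything
        if (all(checks) if combine == "and" else any(checks)):
            selected.append(obj.urn)
    return selected


archive = [
    ArchivedObject("urn:example:1", "image/tiff", date(2005, 3, 1), 120.0),
    ArchivedObject("urn:example:2", "image/tiff", date(2009, 7, 1), 4.5),
    ArchivedObject("urn:example:3", "application/pdf", date(2005, 1, 1), 2.0),
]
# TIFF files ingested before 2008: prints ['urn:example:1']
print(select_for_migration(archive, fmt="image/tiff", ingested_before=date(2008, 1, 1)))
```

Selecting nothing when no criterion is given is a deliberate safety choice in this sketch: an accidental empty query should not trigger a migration of the whole archive.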
Digital newspapers in DNB
- Some results (collections) from digitisation projects:
  - simple access to graphics data in a dedicated system
  - including full-text OCR and access
- Online newspapers: some pre-studies on objects like „Spiegel“, but no running workflow
- Concentration on e-papers
Digitisation results in DNB 1
Digitisation results in DNB 2
E-papers in DNB
Preliminary thoughts: requirements
- Structured, normalised metadata set: article/photo, issue, newspaper
- Persistent identification of each unique object, linkage between the objects, citability
- Additional author/title information at article level is useful but not strictly necessary
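The three-level model (article/photo, issue, newspaper) with persistent, citable identifiers can be sketched as linked records. The identifier pattern used here (`urn:example:...`, with the issue date and an article number appended to the parent's identifier) is purely hypothetical and is not the DNB's URN scheme. It only shows how each unique object gets its own identifier plus a link to its parent, with author information kept optional.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Record:
    # One unique object: a newspaper, an issue, or an article/photo.
    identifier: str
    level: str                          # "newspaper", "issue" or "article"
    title: str
    parent: Optional["Record"] = field(default=None, repr=False, compare=False)
    children: List["Record"] = field(default_factory=list, compare=False)
    author: Optional[str] = None        # article-level extra: useful, not required

    def add_child(self, level: str, local_id: str, title: str,
                  author: Optional[str] = None) -> "Record":
        """Create a child whose identifier extends the parent's
        (hypothetical scheme) and link parent and child both ways."""
        child = Record(f"{self.identifier}-{local_id}", level, title,
                       parent=self, author=author)
        self.children.append(child)
        return child

    def citation_path(self) -> List[str]:
        """Identifiers from the newspaper down to this object."""
        path: List[str] = []
        node: Optional[Record] = self
        while node is not None:
            path.append(node.identifier)
            node = node.parent
        return list(reversed(path))


paper = Record("urn:example:daily", "newspaper", "Example Daily")
issue = paper.add_child("issue", "2010-02-26", "Issue of 26 February 2010")
article = issue.add_child("article", "a017", "Conference report")
print(article.citation_path())
```

Excluding `parent` from `repr` and comparison avoids infinite recursion between parent and child records.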
E-paper requirements
- Quantity: one newspaper produces ca. 150 articles per day, 900 per week, 47,000 per year; for all ca. 450 microfilmed newspapers this amounts to 21,150,000 articles per year
- Start modestly
- Retrodigitisation (the collection started in 1913) would extend this to more than 1 billion articles
- A challenge in terms of resources and technical capacities
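The volume figures can be reproduced with simple arithmetic. The sketch assumes six publishing days per week (the ratio implied by 150 articles per day and 900 per week), 52 weeks per year, the ca. 450 newspapers microfilmed daily, and retrodigitisation of every year from 1913 to 2009. These are simplifying assumptions for illustration, not DNB planning figures.

```python
ARTICLES_PER_DAY = 150
PUBLISHING_DAYS_PER_WEEK = 6        # assumption implied by 150/day vs. 900/week
WEEKS_PER_YEAR = 52
NEWSPAPERS = 450                    # newspapers microfilmed every day
FIRST_YEAR, LAST_YEAR = 1913, 2009  # assumed retrodigitisation range (inclusive)

per_week = ARTICLES_PER_DAY * PUBLISHING_DAYS_PER_WEEK  # 900
per_year_exact = per_week * WEEKS_PER_YEAR              # 46,800
per_year = 47_000                                       # rounded figure used above
all_papers_per_year = per_year * NEWSPAPERS             # 21,150,000
years = LAST_YEAR - FIRST_YEAR + 1                      # 97
retro_total = all_papers_per_year * years               # 2,051,550,000

print(f"{per_week=}, {per_year_exact=}, {all_papers_per_year=:,}, {retro_total=:,}")
```

Under these rough assumptions the retrospective total comes to about 2 billion articles, consistent with "more than 1 billion".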
E-paper project (recently started)
- In cooperation with a vendor selected through a tender procedure
- Ca. 20 important newspapers, starting with two
- Metadata to be delivered in ONIX
- Harvesting interface: OAI-PMH
- All data delivered in an XML file
- Integrated digital preservation in the kopal environment
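OAI-PMH harvesting is a loop of `ListRecords` requests that follows `resumptionToken` values until the repository returns an empty token. The sketch below builds the request URLs and parses a response with Python's standard library. The base URL `https://repository.example.org/oai` and the `onix` metadata prefix are hypothetical placeholders (the real prefix depends on what the vendor's repository advertises via `ListMetadataFormats`), and the network call is left out so the code runs offline.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, metadata_prefix: Optional[str] = None,
                     resumption_token: Optional[str] = None) -> str:
    """Build a ListRecords request. In OAI-PMH, resumptionToken is an
    exclusive argument, so it is sent without metadataPrefix."""
    if resumption_token is not None:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    elif metadata_prefix is not None:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    else:
        raise ValueError("need a metadataPrefix or a resumptionToken")
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text: str) -> Tuple[List[str], Optional[str]]:
    """Return record identifiers and the resumption token (None when done)."""
    root = ET.fromstring(xml_text)
    ids = [h.text for h in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example:epaper-1</identifier></header></record>
    <record><header><identifier>oai:example:epaper-2</identifier></header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

ids, token = parse_list_records(SAMPLE)
print(ids, list_records_url("https://repository.example.org/oai", resumption_token=token))
```

In a live harvester each URL would be fetched (for example with `urllib.request.urlopen`) and the loop repeated until `parse_list_records` returns `None` as the token.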
XML record for e-Papers
E-paper & access
- Principal question for access: integration into the portal environment or a dedicated (independent) search area
- Advanced requirements for text segmentation
- Direct link between portal (metadata) and text
- Navigation/browsing within the object, direct access to single chapters/pages
- Zooming, scrolling
- Integrated full-text search
- Print and store facilities
- DRM, IDM
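Integrated full-text search combined with direct access to single pages suggests an index that maps each term to the pages on which it occurs. The following is a minimal inverted-index sketch in Python. The tokeniser is deliberately naive and the (issue, page) keys such as `issue-2010-02-26` are hypothetical, so this only illustrates the idea and is not a production search engine.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple

PageKey = Tuple[str, int]  # (issue identifier, page number); hypothetical keys


def tokenize(text: str) -> List[str]:
    # Naive tokeniser: lower-cased runs of word characters (Unicode-aware).
    return re.findall(r"\w+", text.lower())


class PageIndex:
    def __init__(self) -> None:
        # term -> set of pages containing it
        self.postings: Dict[str, Set[PageKey]] = defaultdict(set)

    def add_page(self, key: PageKey, text: str) -> None:
        for term in tokenize(text):
            self.postings[term].add(key)

    def search(self, query: str) -> Set[PageKey]:
        """Pages containing all query terms (boolean AND)."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = PageIndex()
index.add_page(("issue-2010-02-26", 1), "Bibliothek und Archiv")
index.add_page(("issue-2010-02-26", 2), "Das digitale Archiv der Zukunft")
print(sorted(index.search("archiv")))
```

Because the hits are page keys rather than whole issues, a viewer could jump straight to the matching page.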
Diagram (CONTENTUS layer model): 1 digitisation; 2 automated optimisation; 3 content analysis (text, title, image, speakers, face, logo, person); 4 knowledge base with semantic relations; 5 open knowledge networks (core: professionals, e.g. media archives; mantle: automated learning; shell: end users, e.g. Wikipedia); 6 semantic multimedia search (e.g. film information such as actors, director, producers, music, year of production and rights, plus related books, internet links, music scores, films, songs and news).
Reuse of results from the CONTENTUS project
Data processing
- Automated page segmentation (headlines, images, tables)
- OCR + entity recognition
- Full-text search
- Semantic search interface
Based on:
- intellectually approved authority files
- statistical data analysis
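In its simplest form, entity recognition based on authority files can be approximated by looking up known name forms (preferred and variant) in the OCR text and mapping them to authority identifiers. The sketch below does exactly that with a tiny hypothetical authority file. The identifiers `auth:0001` and `auth:0002` are invented placeholders rather than real authority records, and the statistical analysis (e.g. for disambiguating ambiguous names) is not shown.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical authority file: identifier -> preferred and variant name forms.
AUTHORITY: Dict[str, List[str]] = {
    "auth:0001": ["Deutsche Nationalbibliothek", "DNB"],
    "auth:0002": ["Frankfurt am Main", "Frankfurt"],
}


def build_lookup(authority: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Flatten to (name form, identifier) pairs, longest names first so
    that 'Frankfurt am Main' is preferred over 'Frankfurt'."""
    pairs = [(name, ident) for ident, names in authority.items() for name in names]
    return sorted(pairs, key=lambda p: len(p[0]), reverse=True)


def recognize(text: str, authority: Dict[str, List[str]]) -> List[Tuple[int, str, str]]:
    """Return (offset, matched text, identifier) for non-overlapping matches."""
    taken = [False] * len(text)
    hits = []
    for name, ident in build_lookup(authority):
        # Word boundaries avoid matches inside longer words.
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            if not any(taken[m.start():m.end()]):
                for i in range(m.start(), m.end()):
                    taken[i] = True
                hits.append((m.start(), m.group(), ident))
    return sorted(hits)


sample = "Die DNB in Frankfurt am Main sammelt E-Paper."
print(recognize(sample, AUTHORITY))
```

A pure dictionary lookup cannot tell two places or persons with the same name apart, which is where statistical analysis would have to complement the authority files.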
Our solution currently
Integrated search and retrieval
Next step: Integrated E-papers
Integrated E-paper „ZEIT“ 1
Provision of freely available texts
Integrated E-paper „ZEIT“ 2
Integrated E-paper „ZEIT“ 3
Reinhard Altenhöner
mailto:r.altenhoener@d-nb.de
http://www.d-nb.de