Lider Roadmapping Workshop
Deutsche Nationalbibliothek – Software-supported Bibliographic Recording and Linked Data
Mark Zöpfgen
Leipzig, 02.09.2014
Overview
- DNB – German National Library
- Activities in Content Extraction and Semantic Web
- MACS
- PETRUS
- Open Linked Data
- Motivation/Challenges
Lider-Roadmapping-Workshop | Leipzig | 02.09.2014
DNB – the German National Library
- central archival library and national bibliographic center for the Federal Republic of Germany
- collects, permanently archives, comprehensively documents and bibliographically records all German and German-language publications since 1913, foreign publications about Germany, and translations of German works (German National Bibliography)
- produces, in collaboration with other institutions, the Integrated Authority File (GND, “Gemeinsame Normdatei”)
- makes these publications and data available to the public
- develops and maintains bibliographic rules and standards for Germany
- plays a significant role in the development of international library standards
Inventory: ~27.8 million bibliographic units; ~719,000 online publications (mainly PDF and EPUB)
Location in Leipzig
Location in Frankfurt
Activities in the Context of Content Extraction and Semantic Web:
Contentus
CrissCross
Culturegraph
Linked Data Service
MACS
Unidissen
VIAF
PETRUS
…
For more information see:
http://www.dnb.de/DE/Wir/Projekte/projekte_node.html
MACS – Multilingual Access to Subjects I/IV
- Creation of a multilingual retrieval vocabulary for searching bibliographic databases
- Links between subject headings of LCSH (Library of Congress Subject Headings), RAMEAU (Répertoire d'autorité-matière encyclopédique et alphabétique unifié) and GND (Gemeinsame Normdatei)
- In cooperation with SNB (Swiss National Library)
- Currently ~63,000 links, which have been imported into the GND records
Use Cases
- Make DNB data internationally available (search via LCSH/RAMEAU subject headings)
- Search in the Library of Congress / Bibliothèque nationale de France with GND subject headings
- Possibility to adopt subject headings from existing bibliographic records (e.g. in the case of translations)
- Link: http://www.dnb.de/DE/Wir/Kooperation/MACS/macs.html
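The idea of cross-vocabulary links can be illustrated with a minimal sketch. It assumes each link is a record holding one equivalent heading per vocabulary; the record layout, the function names and the sample headings are illustrative assumptions, not the actual MACS data model.

```python
# Hypothetical link records: each link connects equivalent subject headings
# across the three vocabularies. The headings are illustrative only.
LINKS = [
    {"GND": "Bibliothek", "LCSH": "Libraries", "RAMEAU": "Bibliothèques"},
    {"GND": "Musik", "LCSH": "Music", "RAMEAU": "Musique"},
]


def build_index(links):
    """Map every (vocabulary, heading) pair to the link that contains it."""
    index = {}
    for link in links:
        for vocabulary, heading in link.items():
            index[(vocabulary, heading.casefold())] = link
    return index


def translate(index, heading, source, target):
    """Return the equivalent heading in the target vocabulary, or None."""
    link = index.get((source, heading.casefold()))
    return link.get(target) if link else None


INDEX = build_index(LINKS)
print(translate(INDEX, "Libraries", "LCSH", "GND"))  # prints "Bibliothek"
```

With such an index, a search entered as an LCSH or RAMEAU heading could be rewritten as the linked GND heading before querying the catalogue, which is the first use case listed above.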
MACS – Multilingual Access to Subjects II/IV
– Maintenance: The links are created and updated using the LMI (Link Management Interface). The LMI provides a web interface; the data is stored in a central database.
– Data Export / Import: The links are exported via an OAI interface. The import into the CBS (Central Bibliographic Database) is currently done by a manually initiated script.
– Planned:
  - Integration into the search portal of TEL (The European Library)
  - Provision via the linked data service (currently not integrated)
  - Regular updates between LMI and CBS
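A manually initiated import script of this kind might start by harvesting the exported links. The sketch below assumes the OAI interface speaks OAI-PMH 2.0; the base URL, the metadata prefix, the record identifiers and the sample response are hypothetical and only show the request and parsing steps.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None)."""
    root = ET.fromstring(xml_text)
    path = ".//oai:record/oai:header/oai:identifier"
    identifiers = [element.text for element in root.findall(path, OAI_NS)]
    token_element = root.find(".//oai:resumptionToken", OAI_NS)
    has_token = token_element is not None and token_element.text
    return identifiers, (token_element.text if has_token else None)


# Hypothetical response body, shortened to the parts the parser reads.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>link:1</identifier></header></record>
    <record><header><identifier>link:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

ids, token = parse_list_records(SAMPLE)
print(ids, token)
print(list_records_url("https://example.org/oai", "marcxml", token))
```

A regular update, as listed under "Planned", would repeat the request with each returned resumption token until none is left and then write the harvested links into the CBS.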
MACS – Multilingual Access to Subjects III/IV
MACS – Multilingual Access to Subjects IV/IV
Petrus – Software-supported Bibliographical Recording I/V
Why software supported?
Growing number of online publications (see graphic below).
The German National Library is looking to reduce its traditional indexing operations
in areas which are no longer feasible due to the continually growing number of
publications, or are no longer necessary because of technological developments.
[Chart: online publications per year: 2007: 13,525; 2008: 17,651; 2009: 29,823; 2010: 112,766]
Petrus – Software-supported Bibliographical Recording II/V
Classification
Based on the DNB “Sachgruppen” (subject categories), roughly the first two levels of the DDC
Statistical procedure; training corpus of ~300,000 objects with known classes (full text and tables of contents). Each object is limited to 40,000 characters.
After stemming, the data model is generated. An SVM (support vector machine) is used as the classifier. After the model has been created, a 3-fold cross-validation is run to verify its quality.
The model can be transferred to an “endpoint”, which is a stand-alone application.
The endpoint communicates via a web service interface.
In use since January 2012; currently ~400 objects/day
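The preprocessing and validation steps can be sketched in a few lines. This is a minimal illustration, not the PETRUS implementation: the suffix-stripping stemmer is a toy, the classifier is a deliberately simple token-overlap stand-in rather than an SVM, and the sample texts and class labels are invented. Only the 40,000-character limit and the 3-fold validation come from the slide.

```python
from collections import Counter, defaultdict

MAX_CHARS = 40_000  # per-object limit stated on the slide


def preprocess(text):
    """Truncate, lowercase, tokenize and apply a toy suffix-stripping stem."""
    stemmed = []
    for token in text[:MAX_CHARS].lower().split():
        token = token.strip(".,;:!?()\"'")
        for suffix in ("ing", "es", "s"):  # toy stemmer, an assumption
            if token.endswith(suffix) and len(token) > len(suffix) + 2:
                token = token[: -len(suffix)]
                break
        if token:
            stemmed.append(token)
    return stemmed


def train(samples):
    """Stand-in model: token counts per class (NOT an SVM)."""
    model = defaultdict(Counter)
    for text, label in samples:
        model[label].update(preprocess(text))
    return model


def predict(model, text):
    """Choose the class whose training tokens overlap most with the text."""
    tokens = preprocess(text)
    return max(sorted(model), key=lambda label: sum(model[label][t] for t in tokens))


def k_fold_accuracy(samples, k=3):
    """Hold out every k-th sample in turn and average the fold accuracies."""
    scores = []
    for fold in range(k):
        held_out = samples[fold::k]
        training = [s for i, s in enumerate(samples) if i % k != fold]
        model = train(training)
        correct = sum(predict(model, text) == label for text, label in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / k


# Invented sample corpus with two illustrative class labels.
SAMPLES = [
    ("Algorithms and data structures for sorting", "004"),
    ("Symphonies and sonatas for orchestra", "780"),
    ("Sorting algorithms in software", "004"),
    ("Orchestra music and sonatas", "780"),
    ("Software data structures", "004"),
    ("Music for symphonies", "780"),
]

print(k_fold_accuracy(SAMPLES, k=3))
```

In a real setup the `train` and `predict` functions would wrap an SVM library, while the truncation, stemming and fold logic could stay largely the same.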
Petrus – Software-supported Bibliographical Recording III/V
Keywording
Linguistic text analysis: language recognition, identification of sentences, words, phrases etc.
Term matching against a dictionary based on the Integrated Authority File (72,000 subject headings), followed by disambiguation
Term ranking (dependent on position and frequency)
The keywording process can potentially be transferred to an “endpoint” (analogous to the classification model)
~80 objects/day
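Term matching and ranking can be sketched as follows. The dictionary is a hypothetical three-entry stand-in for the authority-file headings, and the weighting formula (earlier occurrences count more) is an assumption, since the slide only says that ranking depends on position and frequency. Language recognition, sentence detection and disambiguation are left out.

```python
import re
from collections import defaultdict

# Hypothetical mini-dictionary standing in for the authority-file headings.
DICTIONARY = {"library", "catalogue", "classification"}


def rank_terms(text, dictionary=DICTIONARY):
    """Score dictionary terms by frequency, weighting early occurrences higher."""
    tokens = re.findall(r"\w+", text.lower())
    scores = defaultdict(float)
    for position, token in enumerate(tokens):
        if token in dictionary:
            # Assumed weighting: 1.0 at the start, falling towards 0.5 at the end.
            scores[token] += 1.0 - 0.5 * position / len(tokens)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


TEXT = "The library maintains a catalogue. The library updates the classification."
for term, score in rank_terms(TEXT):
    print(term, round(score, 2))
```

Here "library" ranks first because it occurs twice and early, followed by "catalogue" and "classification".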
Petrus – Software-supported Bibliographical Recording IV/V
(1) List of publications to be processed
(2) Metadata to be imported from the bibliographic database
(3) (Full-text) objects to be imported from the repository
(4) Transfer via a web service interface
(5) Return of results
(6) Storage of the results in the bibliographic database
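Steps (1) to (6) amount to a small processing loop. The sketch below uses in-memory dictionaries in place of the bibliographic database and the repository, and a lambda in place of the web service call; all names, identifiers and values are hypothetical.

```python
def process_publications(ids, load_metadata, load_fulltext, classify, store):
    """Run steps (1) to (6) for each publication identifier."""
    for pub_id in ids:                         # (1) list of publications
        metadata = load_metadata(pub_id)       # (2) from the bibliographic database
        fulltext = load_fulltext(pub_id)       # (3) from the repository
        result = classify(metadata, fulltext)  # (4) transfer and (5) returned result
        store(pub_id, result)                  # (6) written back to the database


# In-memory stand-ins for the real systems (assumptions for illustration).
CATALOGUE = {"pub-1": {"title": "Example title"}}
REPOSITORY = {"pub-1": "Example full text"}
RESULTS = {}


def store_result(pub_id, result):
    """Keep the result in a dictionary instead of a real database."""
    RESULTS[pub_id] = result


process_publications(
    ids=["pub-1"],
    load_metadata=CATALOGUE.get,
    load_fulltext=REPOSITORY.get,
    classify=lambda metadata, fulltext: {"class": "000", "chars": len(fulltext)},
    store=store_result,
)
print(RESULTS)
```

Passing the loaders and the classifier in as functions keeps the loop independent of how the database, repository and endpoint are actually reached.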
Petrus – Software-supported Bibliographical Recording V/V
Appearance in the bibliographic record
Output returned by the classification software
Open Linked Data I/III
- DNB provides high quality, mainly intellectually created data.
- Authority file (GND) and National Bibliography are available in RDF format
- Data is published under the Creative Commons Zero (CC0) license
- Currently, the data can be accessed via the Portal (for single records) or
downloaded
http://datendienst.dnb.de/cgi-bin/mabit.pl?userID=opendata&pass=opendata&cmd=login
- Target groups are research facilities and non-profit organisations as well as
commercial service suppliers (e.g. search engines, knowledge management
systems)
Open Linked Data II/III
– Bibliographic data is highly reliable but has poor formal quality (free-text fields), so conversion requires considerable effort
– The data was converted using Metafacture, which was developed by culturegraph.org (www.culturegraph.org)
Open Linked Data III/III
Bibliographic record of „Winnetou“ leads to Karl May - detail
… leads to place of birth
Coordinates
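The chain of links on this slide can be pictured as a walk over subject-predicate-object triples. The sketch below uses made-up `ex:` identifiers and predicate names, not the actual GND URIs or ontology terms, and a placeholder node instead of real coordinate values.

```python
# Hypothetical triples; real records would use GND URIs and ontology terms.
TRIPLES = [
    ("ex:winnetou", "ex:creator", "ex:karl-may"),
    ("ex:karl-may", "ex:placeOfBirth", "ex:ernstthal"),
    ("ex:ernstthal", "ex:coordinates", "ex:coordinates-of-ernstthal"),
]


def follow(triples, start, path):
    """Follow a sequence of predicates from a start node; None if a hop is missing."""
    node = start
    for predicate in path:
        matches = [o for s, p, o in triples if s == node and p == predicate]
        if not matches:
            return None
        node = matches[0]
    return node


print(follow(TRIPLES, "ex:winnetou", ["ex:creator", "ex:placeOfBirth", "ex:coordinates"]))
```

Each hop corresponds to one of the steps on the slide: from the bibliographic record to the author, from the author to the place of birth, and from the place to its coordinates.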
Motivation of the DNB
- Encourage external parties to work with the RDF data, e.g. by linking it with other ontologies.
- Improve search: Access by themes, browsing, unveiling relations between cultural
entities.
Technical Challenges
- Improve the accessibility (e.g. by services such as MACS)
- Search: Integrate Portal (knowledge representation, user interaction) with search
engine and linked data
Questions?
Mark Zöpfgen
Deutsche Nationalbibliothek
Informationstechnik
Adickesallee 1
D-60322 Frankfurt am Main
Phone: +49-69-1525-1705
mailto: [email protected]
www.d-nb.de