culturegraph.org
Building a Hub for Linked Library Data
Markus M. Geipel, Adrian Pohl
Table of Contents
1. The Linked Data Challenge
2. Culturegraph Platform
   1. Resolving & Lookup
   2. Process & Technology
   3. RDF Modelling
3. Current State
Paradigm shift in modeling knowledge/data
From isolated tables to a network beyond organizational boundaries
From isolated Tables to a Semantic Network
A naïve Approach
1. Transform from Marc21/Mab2/Pica to RDF
2. Put everything into a Triplestore
3. SPARQL and Reasoner do the magic
What is wrong with this approach?
Format is not Content!
If you pour water into a wine glass, does it turn into wine?
How can you expect old Marc21 data to change into a semantically rich, reasoner-ready piece of information just by changing the data format to RDF?
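The point can be illustrated with a minimal sketch. The two records, the catalogue identifiers, and the field-to-property mapping below are hypothetical and only serve to show what a purely syntactic conversion produces: string literals, not links.

```python
# Hypothetical, heavily simplified records from two catalogues.
# Tags follow MARC 21 conventions (100 = main author, 245 = title, 250 = edition).
RECORDS = {
    "catalogueA:0001": {"100": "Goethe, Johann Wolfgang von", "245": "Faust", "250": "2. Aufl."},
    "catalogueB:9876": {"100": "Goethe, J. W. v.", "245": "Faust", "250": "2nd ed."},
}

# Assumed one-to-one mapping from field tag to an RDF property name.
FIELD_TO_PROPERTY = {"100": "dcterms:creator", "245": "dcterms:title", "250": "bibo:edition"}


def naive_triples(record_id, fields):
    """Turn every field into a (subject, predicate, literal) triple, nothing more."""
    return [(record_id, FIELD_TO_PROPERTY[tag], value) for tag, value in sorted(fields.items())]


def shared_objects(triples_a, triples_b, predicate):
    """Objects both records share for a predicate: the only 'connections' we get."""
    objects_a = {o for _, p, o in triples_a if p == predicate}
    objects_b = {o for _, p, o in triples_b if p == predicate}
    return objects_a & objects_b


triples_a = naive_triples("catalogueA:0001", RECORDS["catalogueA:0001"])
triples_b = naive_triples("catalogueB:9876", RECORDS["catalogueB:9876"])
```

Here `shared_objects(triples_a, triples_b, "dcterms:creator")` is empty, because the two author strings differ even though they denote the same person. The data is now RDF-shaped, but no reasoner can connect the two records.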
Connections don’t come for free
Some challenges …
1. No universally unique ID
2. Often no references to entities, just character strings
3. No controlled vocabulary
   - Example: 1.3 million different values for the edition field
4. Changing Cataloging Practices
5. Mistakes, Typos
Culturegraph as a signpost
A coherent picture of bibliographic data
[Figure labels: hidden duplicates, different services, different interfaces; Culturegraph as the signpost]
Culturegraph as a Platform to interlink Bibliographic Data
1. Open tools
   - Open algorithms and code; reuse
2. Integration into existing workflows
   - Synchronization of data
   - Integration of results into original data sources
3. Publication of results
   - Connections and views, not the entire aggregated data
   - Linked Open Data/RDF
4. Persistence of results
   - Integration into URN resolving infrastructure
5. Tracking provenance
First Project: Resolving & Lookup
Universally unique and persistent IDs
– Input: 6 main German bibliographic catalogues
– Objective: Bundling of manifestations
– Service:
  - Publication of bundles
  - Minting of URNs for approved bundles
  - Search bundles using established identifiers
– Part of the DDB ecosystem
  - Support for data aggregation
The Process
1. Translate into internal format
   1. Mapping of fields to properties
   2. Normalization, cleaning, regexp matching, etc., defined in XML
2. Database ingest
   - > 80 million records
   - > one billion properties
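The translation step can be sketched roughly as follows. The XML rule format, the field tags, and the property names are invented for this illustration and are not the project's actual schema; the sketch only shows the idea of keeping mapping and normalization rules in XML and applying them to raw records.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical rule format, invented for this sketch (NOT the project's real XML schema).
RULES_XML = r"""
<rules>
  <map field="020a" property="isbn">
    <replace pattern="[^0-9Xx]" with=""/>
  </map>
  <map field="260c" property="year">
    <extract pattern="([0-9]{4})"/>
  </map>
  <map field="245a" property="title">
    <replace pattern="\s+" with=" "/>
  </map>
</rules>
"""


def load_rules(xml_text):
    """Parse the rule file into {field: (property, [(operation, regex, replacement)])}."""
    rules = {}
    for node in ET.fromstring(xml_text).findall("map"):
        steps = [(step.tag, re.compile(step.get("pattern")), step.get("with", "")) for step in node]
        rules[node.get("field")] = (node.get("property"), steps)
    return rules


def translate(record, rules):
    """Map raw fields to normalized properties; fields without a rule are dropped."""
    properties = {}
    for field, raw in record.items():
        if field not in rules:
            continue
        name, steps = rules[field]
        value = raw.strip()
        for operation, pattern, replacement in steps:
            if operation == "replace":
                value = pattern.sub(replacement, value)
            elif operation == "extract":
                match = pattern.search(value)
                value = match.group(1) if match else ""
        if value:
            properties[name] = value
    return properties


rules = load_rules(RULES_XML)
raw_record = {"020a": "3-16-148410-0 (pbk.)", "260c": "c1998.", "245a": "  Faust   I ", "999": "local note"}
internal = translate(raw_record, rules)
```

With these assumed rules, `internal` holds a cleaned ISBN, a four-digit year, and a whitespace-normalized title, while the unmapped field `999` is ignored. Keeping the rules outside the code means a cataloguing expert can adjust them without touching the program.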
The Process
3. Generate unique properties (> 50 million*)
   - Combinations of properties defined in XML
4. Group by unique properties
5. Merge equivalent groups
   - ca. 18 million records* in groups
* For a first simple matching algorithm
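Steps 3 to 5 can be sketched as below. The key definitions are illustrative assumptions, and the union-find structure is one common way to merge overlapping groups; the slides do not say how the project implements the merge.

```python
from collections import defaultdict

# Assumed key definitions: each unique property combines several normalized properties.
# These combinations are illustrative, not the project's actual configuration.
KEY_DEFINITIONS = {
    "isbn+year": ("isbn", "year"),
    "title+author+year": ("title", "author", "year"),
}


def unique_properties(record):
    """Step 3: build one key per definition, skipping definitions with missing parts."""
    keys = []
    for name, parts in KEY_DEFINITIONS.items():
        if all(record.get(part) for part in parts):
            keys.append((name,) + tuple(record[part] for part in parts))
    return keys


def bundle(records):
    """Steps 4 and 5: group records by key, then merge groups that share a record."""
    parent = {record_id: record_id for record_id in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    groups = defaultdict(list)
    for record_id, record in records.items():
        for key in unique_properties(record):
            groups[key].append(record_id)

    for members in groups.values():
        for other in members[1:]:
            parent[find(other)] = find(members[0])

    bundles = defaultdict(set)
    for record_id in records:
        bundles[find(record_id)].add(record_id)
    return [members for members in bundles.values() if len(members) > 1]


records = {
    "A": {"isbn": "3161484100", "year": "1998", "title": "faust", "author": "goethe"},
    "B": {"isbn": "3161484100", "year": "1998"},
    "C": {"title": "faust", "author": "goethe", "year": "1998"},
    "D": {"isbn": "0000000000", "year": "2000"},
}
result = bundle(records)
```

Records B and C share no key with each other, but both share one with A, so all three end up in a single bundle. Record D matches nothing and is left out.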
The Process (next steps)
6. Check quality & mint persistent IDs
7. Publication as Linked Data
[Figure labels: Id1, Id2, Id3; http://]
Representing bundles of bibliographic records in RDF
Namespaces for Internal Bibliographic Description
rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
bibo: <http://purl.org/ontology/bibo/>
dcterms: <http://purl.org/dc/terms/>
frbr: <http://purl.org/vocab/frbr/core#>
foaf: <http://xmlns.com/foaf/0.1/>
cg: <http://culturegraph.org/vocab#> (not established yet)
...& others
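As a small sketch of how these prefixes are typically used, the listed namespaces can be kept in a lookup table to expand prefixed names into full IRIs or to emit a Turtle header. The helper names `expand` and `turtle_header` are hypothetical, and the `cg` namespace is the one the slide marks as not yet established.

```python
# Prefix table copied from the slide; "cg" is marked "not established yet" there.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "bibo": "http://purl.org/ontology/bibo/",
    "dcterms": "http://purl.org/dc/terms/",
    "frbr": "http://purl.org/vocab/frbr/core#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "cg": "http://culturegraph.org/vocab#",
}


def expand(prefixed_name):
    """Expand a prefixed name such as 'dcterms:title' into a full IRI."""
    prefix, _, local = prefixed_name.partition(":")
    if prefix not in PREFIXES:
        raise KeyError(f"unknown prefix: {prefix}")
    return PREFIXES[prefix] + local


def turtle_header():
    """Render the prefix table as Turtle @prefix declarations."""
    return "\n".join(f"@prefix {prefix}: <{iri}> ." for prefix, iri in PREFIXES.items())
```

For example, `expand("dcterms:title")` yields `http://purl.org/dc/terms/title`.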
Matching & Bundling
Different matching criteria to be discussed
Example: sameness of ISBN & year
Matching algorithms can be created and modified easily
Matched resources are bundled, and the underlying algorithm is indicated
Bundle Ontology: http://purl.org/net/bundle
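A rough sketch of how exchangeable matching criteria and the "algorithm indicated" requirement could fit together is given below. The registry, the algorithm names, and the dictionary layout of a bundle are assumptions for illustration; they do not reproduce the Bundle Ontology's actual terms. Only the ISBN-and-year criterion comes from the slide.

```python
from collections import defaultdict


def same_isbn_and_year(record):
    """Slide example: records match when both ISBN and year are equal."""
    if record.get("isbn") and record.get("year"):
        return (record["isbn"], record["year"])
    return None


def same_title_and_year(record):
    """Purely hypothetical second criterion, shown only as an extension point."""
    if record.get("title") and record.get("year"):
        return (record["title"].lower(), record["year"])
    return None


# Registry of criteria: adding or changing an algorithm means editing one entry.
ALGORITHMS = {
    "isbn-year": same_isbn_and_year,
    "title-year": same_title_and_year,
}


def make_bundles(records, algorithm):
    """Return bundles as dicts that name the algorithm which produced them."""
    key_of = ALGORITHMS[algorithm]
    groups = defaultdict(list)
    for record_id, record in records.items():
        key = key_of(record)
        if key is not None:
            groups[key].append(record_id)
    return [
        {"algorithm": algorithm, "members": sorted(members)}
        for members in groups.values()
        if len(members) > 1
    ]


records = {
    "a": {"isbn": "3161484100", "year": "1998", "title": "Faust"},
    "b": {"isbn": "3161484100", "year": "1998", "title": "FAUST"},
    "c": {"isbn": "3161484100", "year": "2001", "title": "Faust"},
}
bundles = make_bundles(records, "isbn-year")
```

Because every bundle carries the name of its algorithm, consumers can decide which criteria they trust, and bundles from different algorithms can be published side by side.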
Minting Über-Identifiers
In the last step IDs for bibliographic resources may be minted
urn:nbn:de:cg-12345678
http://culturegraph.org/urn:nbn:de:cg-12345678
Based on a reliable, agreed-upon algorithm
Record-resource linking by foaf:isPrimaryTopicOf
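The minting and linking step might look like the following sketch. The URN pattern and base URI follow the slide's example; the eight-digit counter, the function names, and the example record URI are assumptions, and the output is plain N-Triples text.

```python
FOAF_IS_PRIMARY_TOPIC_OF = "http://xmlns.com/foaf/0.1/isPrimaryTopicOf"
BASE_URI = "http://culturegraph.org/"  # taken from the slide's example URI


def mint_urn(number):
    """Format a URN like the slide's example; the 8-digit counter is an assumption."""
    if not 0 <= number < 10**8:
        raise ValueError("number must fit into eight digits")
    return f"urn:nbn:de:cg-{number:08d}"


def resource_uri(urn):
    """HTTP URI under which the minted resource would be published."""
    return BASE_URI + urn


def link_records(urn, record_uris):
    """N-Triples stating that each record is a document about the minted resource."""
    subject = resource_uri(urn)
    return [f"<{subject}> <{FOAF_IS_PRIMARY_TOPIC_OF}> <{uri}> ." for uri in record_uris]


urn = mint_urn(12345678)
triples = link_records(urn, ["http://example.org/record/1"])
```

`foaf:isPrimaryTopicOf` points from the thing described to the document describing it, so the bibliographic resource is the subject and each catalogue record is the object.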
Future prospects
– Workflow integration
  Share, enrich and reuse metadata right from the start
– New features/projects
  From concrete to visionary…
  1. Integration of GND references (from BEACON files and other sources)
  2. Computation of links to further resources (subject headings, geo coordinates, person names, Wikipedia)
  3. Authority file for works
  4. Crowdsourcing (enrich and correct descriptions of titles, works, persons, etc.)
Markus M. Geipel | culturegraph.org | 5 October 2011
Summary
– Culturegraph will
  - match the main German library catalogues
  - give each bibliographic resource a persistent ID
– State
  - Basic infrastructure up and running with good performance (80 million records matched in one hour)
  - All source code published on SourceForge
  - First demonstrator web portal at www.culturegraph.org
– Soon to come
  - January:
    - Operational web portal
    - Publication of first matching results (HTML, RDF, etc.)
  - Next year:
    - Persistent IDs
Appendix: Project Staff
– Daniel Schäfer (DNB): project lead
– Katja Mecklinger (DNB): deputy project lead, public relations
– Markus Geipel (DNB): head of architecture and development
– Adrian Pohl (hbz): public relations, ontology
– Pascal Christoph (hbz): architecture
– Julia Hauser (DNB): ontology
– Lars Svensson (DNB): ontology
– Jürgen Kett (DNB): project coordination, public relations