Linking Library Data using Fusepool

Linking Library Data with Fusepool Johannes Hercher (Free University Berlin) June 25, 2014 @jhercher


Page 1: Linking Library Data using Fusepool

Linking Library Data with Fusepool

Johannes Hercher (Free University Berlin) June 25, 2014

@jhercher

Page 2: Linking Library Data using Fusepool

Fusepool final public workshop, Brussels, June 25th – Johannes Hercher, Free University Berlin, University Library

Context

I care for metadata

Ugh!

Your OPAC sucks

We cooperate…

How to link library data with the "oceans" of the WWW?

German National Library published authority data

Page 3: Linking Library Data using Fusepool

Example

a search in subject index (with GND Identifiers)

a search in full text http://primo.fu-berlin.de

• GND = Thesaurus for subject indexing in Germany

• Search with GND is limited to local resources

Page 4: Linking Library Data using Fusepool

• search beyond the local holdings => easier, more reliable

• suggest content using semantic relations (GND is a thesaurus!)

You* should use identifiers

*publishers, authors, aggregators

Assigning IDs is time consuming

- Reality -

Assigning IDs is fun

- Vision -

Page 5: Linking Library Data using Fusepool

Questions & Tasks

• Could machines do the subject indexing? -> Use SMA to enrich DBpedia pages with GND IDs

• Can we support Librarians in subject indexing? -> Build Annotator Prototype https://github.com/jhercher/LEE/

Page 6: Linking Library Data using Fusepool

Demonstrator

AnnotatorApp: filters stopwords and displays library entities for your text

Page 7: Linking Library Data using Fusepool

Review concepts and start a search using concept IDs

https://github.com/jhercher/LEE

Page 8: Linking Library Data using Fusepool

How to Fusepool

Page 9: Linking Library Data using Fusepool

Workflow

1. Select a subset of GND Subject Headings using SPARQL

2. Import Subject Headings

3. Configure SMA dictionary component

4. Import documents (Graph)

5. Batch matching of documents with dictionaries using Fusepool's DLC

6. Review results and build services on top

Page 10: Linking Library Data using Fusepool

http://zbw.eu/beta/sparql/gnd

http://d-nb.info/standards/elementset/gnd

• NomenclatureInBiologyOrChemistry

• SubjectHeadingSensoStricto

• ProductNameOrBrandName

• HistoricSingleEventOrEra

• EthnographicName

• GroupOfPersons

• SubjectHeading

• Language
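Step 1 of the workflow (selecting a subset of GND subject headings with SPARQL) could look roughly like the sketch below. It only builds the query text and a GET URL; nothing is sent. Assumptions not taken from the slides: the `#` suffix on the GND element-set namespace, the use of `gndo:preferredNameForTheSubjectHeading` as the label property, and that the endpoint accepts a `query` parameter at the URL shown on the slide. The function names are hypothetical.

```python
from urllib.parse import urlencode

# Namespace from the slide; the trailing '#' is an assumption.
GNDO = "http://d-nb.info/standards/elementset/gnd#"
# Endpoint URL as shown on the slide; the real query path may differ.
ENDPOINT = "http://zbw.eu/beta/sparql/gnd"


def build_query(classes, limit=100):
    """Return a SPARQL SELECT for concepts typed with one of the given GND classes."""
    values = " ".join(f"gndo:{name}" for name in classes)
    return (
        f"PREFIX gndo: <{GNDO}>\n"
        "SELECT ?concept ?label WHERE {\n"
        f"  VALUES ?type {{ {values} }}\n"
        "  ?concept a ?type ;\n"
        # Label property is an assumption about the GND ontology.
        "           gndo:preferredNameForTheSubjectHeading ?label .\n"
        "}\n"
        f"LIMIT {limit}"
    )


def request_url(endpoint, query):
    """Encode the query as a GET request URL (SPARQL protocol style)."""
    return endpoint + "?" + urlencode({"query": query})


query = build_query(["SubjectHeadingSensoStricto", "HistoricSingleEventOrEra"])
url = request_url(ENDPOINT, query)
```

Restricting the `VALUES` list to a few classes keeps the dictionary small, which matters later because every imported label becomes a potential match.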

Page 11: Linking Library Data using Fusepool

http://localhost:8080/admin/graphs/

Page 12: Linking Library Data using Fusepool

Page 13: Linking Library Data using Fusepool

Page 14: Linking Library Data using Fusepool

Results

Page 15: Linking Library Data using Fusepool

<http://de.dbpedia.org/resource/Wilder_Streik_bei_Ford_(1973)> <http://purl.org/dc/elements/1.1/subject>
    <http://d-nb.info/gnd/7708211-4> , # Drug-eluting Stent (syn: DES)
    <http://d-nb.info/gnd/4302110-4> , # Ford
    <http://d-nb.info/gnd/4578282-9> , # sich ["self"@en]
    <http://d-nb.info/gnd/4248646-4> , # Spitzel ["spy"@en] (syn: IM)
    <http://d-nb.info/gnd/4389837-3> , # August (month)
    <http://d-nb.info/gnd/4291333-0> , # Niederlage ["defeat"@en]
    <http://d-nb.info/gnd/4002623-1> . # Arbeitnehmer ["employee"@en]

• GND Dictionary includes: articles, prepositions, adjectives…

• Acronyms ("IM", "DES") -> activate "Case Sensitivity"

• Not every match is useful in the context ("August", "Defeat")

http://localhost:8080/graph?name=urn:x-localinstance:/dlc/{yourDataset}/enhance.graph
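The acronym problem on this slide can be illustrated with a toy dictionary matcher. This is not Fusepool's SMA, only a minimal sketch: the GND URIs and labels are copied from the result above, while the sample sentence, the stopword list, and the `match` function are hypothetical.

```python
import re

# Surface form -> GND URI, taken from the slide's result listing.
DICTIONARY = {
    "Ford": "http://d-nb.info/gnd/4302110-4",
    "IM": "http://d-nb.info/gnd/4248646-4",    # synonym of "Spitzel"
    "DES": "http://d-nb.info/gnd/7708211-4",   # synonym of "Drug-eluting Stent"
    "Arbeitnehmer": "http://d-nb.info/gnd/4002623-1",
}


def match(text, dictionary, case_sensitive=True, stopwords=frozenset()):
    """Return the URIs of dictionary entries found in text, in order of occurrence."""
    if case_sensitive:
        index = dictionary
    else:
        index = {key.lower(): uri for key, uri in dictionary.items()}
    hits = []
    for token in re.findall(r"\w+", text):
        if token in stopwords:  # exact-form stopword filter
            continue
        key = token if case_sensitive else token.lower()
        if key in index:
            hits.append(index[key])
    return hits


# Made-up sentence containing the German function words "im" and "des".
text = "Der Streik im August endete mit einer Niederlage des Betriebsrats bei Ford."

loose = match(text, DICTIONARY, case_sensitive=False)
strict = match(text, DICTIONARY, case_sensitive=True)
filtered = match(text, DICTIONARY, case_sensitive=False, stopwords={"im", "des"})
```

Without case sensitivity, "im" and "des" are matched against the acronyms "IM" and "DES"; with case sensitivity, or with a stopword filter, only "Ford" remains.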

Page 16: Linking Library Data using Fusepool

Prototype: GND Annotator

Persons, Locations, Topics, Time

manual evaluation only for Topics

Per-suggestion labels: ok, ok, not relevant, false, not relevant, ok, not relevant

human (found in GND) = 1

SMA GND suggestions = 7

SMA correct = 3

SMA false = 1

precision = 33%

recall = 100%
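The slide does not say how its precision and recall figures were derived. For reference, the standard set-based definitions can be sketched as follows; the identifiers in the example are placeholders, not GND IDs, and the function name is hypothetical.

```python
def precision_recall(suggested, relevant):
    """Standard definitions: precision = |S ∩ R| / |S|, recall = |S ∩ R| / |R|."""
    suggested, relevant = set(suggested), set(relevant)
    true_positives = suggested & relevant
    precision = len(true_positives) / len(suggested) if suggested else 0.0
    recall = len(true_positives) / len(relevant) if relevant else 0.0
    return precision, recall


# Placeholder identifiers: four machine suggestions, three human-assigned subjects.
suggested = {"a", "b", "c", "d"}
relevant = {"a", "b", "e"}
precision, recall = precision_recall(suggested, relevant)
```

Here two of four suggestions are relevant (precision 0.5) and two of three relevant subjects were found (recall about 0.67). Whether "not relevant" suggestions count as errors changes the precision, so the counting rule should be stated alongside any reported figure.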

Page 17: Linking Library Data using Fusepool

Results (1)

Recall: 78%

Precision: 73%

Page 18: Linking Library Data using Fusepool

Results (2)

Recall: 90%

Precision: 72%

Page 19: Linking Library Data using Fusepool

Fusepool in the wild (1)

http://primo.kobv.de/docId=TN_thieme_articles10.1055/s-0029-1237743

Labels on the screenshot: no exact string match, chemical term, geographic, financial, education, too broad

Page 20: Linking Library Data using Fusepool

Fusepool in the wild (2)

Abstract, Reviews, TOC

ISBN: 9783642371103

Drawback: the quality of the annotations depends on the text input

Page 21: Linking Library Data using Fusepool

Feedback

Page 22: Linking Library Data using Fusepool

Why Fusepool?

1. Ready for the Semantic Web

• can handle graphs (Clerezza, TDB, …)

• Data i/o using REST

2. String Matching (SMA)

• Import & configuration of dictionaries (e.g. a Thesaurus)

• batch matching & annotation using Data Life Center (DLC)

3. Easy to install

• Builds at http://jenkins.fusepool.info

Page 23: Linking Library Data using Fusepool

Conclusion

• Fusepool: Infrastructure to build new services

• … better linking beyond the aquarium(s)

• TODO:

• build tailored interfaces for annotation, search, recommender

• improve the dictionaries

Page 24: Linking Library Data using Fusepool

Thank You!

twitter: @jhercher

github: https://github.com/jhercher/

mail: [email protected]