-
Vocabulary Matching for Book Indexing Suggestion in Linked Libraries: A Prototype Implementation & Evaluation
Antoine Isaac, Dirk Kramer, Lourens van der Meij, Shenghui Wang, Stefan Schlobach, Johan Stapel
-
Problem: subject indexing
Describing subjects of books
Using concepts from vocabularies (e.g. thesauri)
-
Problem: re-indexing
Describing a book that has already been described
With a new vocabulary
Fitting a different context (e.g., different libraries)
-
Why re-indexing at KB?
The Dutch National Library (KB) holds many books that are also in other Dutch public libraries
KB deposit uses Brinkman thesaurus for indexing
Public Libraries use Biblion thesaurus
[Diagram: the KB Deposit Collection (indexed with Brinkman) and the Dutch Public Libraries (indexed with Biblion), with an overlap between the two book collections.]
-
A wider issue
KB shares books with many other libraries
All having their own description practices
[Diagram: the KB and related book collections, with the KOSs used to describe them.
Collections: KB Deposit Coll., KB Scientific Coll., Dutch Public Libraries, LC (US Nat. Lib), BnF (French Nat. Lib), DNB (German Nat. Lib), Dutch Book-trade.
KOSs: Biblion, NUR, BISAC subject codes, Brinkman, GTT, NBC class., UNESCO class., KB Corporatie + Persoon, RAMEAU subject headings, LCSH subject headings, DDC (Dewey decimal class.), SWD subject headings, Personennamendatei, LC authority file, Autorités BNF, Doelgroep (audience).
Legend: other classifications; domain/discipline classifications; subject thesauri / subj. heading lists; book collection datasets; person/corporation data.
Lines show overlap between book collections (thickness indicates degree of overlap). Vertical alignment between a collection and KOSs denotes that those KOSs are used to describe that collection.]
-
Room for improvement?
Libraries devote large resources to indexing
  20 people at KB
  About 20,000 books per year
Leveraging already existing descriptions for re-indexing can be beneficial for both sides
-
Alignment and re-indexing
STITCH project
  Tackling semantic interoperability in Cultural Heritage
  Using ontology alignment
Mappings between concepts from different vocabularies can be used for re-indexing
  Basic idea: replace concepts in descriptions by conceptually equivalent concepts
-
Goal: a re-indexing prototype
Past: preliminary experiments with KB data
Now: building a prototype and
  plugging it into the KB production system
  having it evaluated by its potential users (indexers)
Prototype case: Dutch public libraries / KB
  Suggesting Brinkman subjects based on Biblion ones
-
Alignment and re-indexing: requirements
Subjects can be complex
  Mappings between groups of concepts
  "Travel guides" + "Spain" → "Spain; travel guides"
Concepts are used in descriptions
  Mappings taking into account extensional semantics
  "Building engineering" → "Learning material; building engineering"
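As an illustration of the requirement above, a mapping whose source side is a group of concepts can be applied by checking whether all of its source concepts occur in a book's existing description. This is a minimal sketch, not the prototype's actual code: the rule format, the `suggest` function, and the second rule with its score are assumptions made up for the example. Only the "Travel guides" + "Spain" mapping and its 0.982 score come from the slides.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

# Hypothetical rule format: (source concepts that must all be present,
# target subject, confidence score).
Rule = Tuple[FrozenSet[str], str, float]

RULES: List[Rule] = [
    # Mapping and score quoted in the slides.
    (frozenset({"Travel guides", "Spain"}), "Spain; travel guides", 0.982),
    # Made-up rule and score, for illustration only.
    (frozenset({"Spain"}), "Spain", 0.40),
]


def suggest(book_concepts: Iterable[str], rules: List[Rule]) -> List[Tuple[str, float]]:
    """Return (target subject, score) pairs for every rule whose source
    concepts are all present in the book's description, best score first."""
    present = set(book_concepts)
    best: Dict[str, float] = {}
    for source, target, score in rules:
        if source <= present:  # every source concept is in the description
            best[target] = max(score, best.get(target, 0.0))
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


suggestions = suggest(["Spain", "Travel guides"], RULES)
for target, score in suggestions:
    print(f"{score:.3f}  {target}")
```

Keeping the source side as a set lets one rule cover a complex subject, while single-concept rules still fire on their own at a lower rank.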
-
Obtaining re-indexing rules
Lexical alignments are not good enough
Probabilistic rules are calculated
  Using extension of concepts: existing indexing
  Simple probabilities, with ad hoc adjustment
  {"Travel guides", "Spain"} → "Spain; travel guides", 0.982
Not only based on Biblion subjects
  AUT: main authors of books
  KAR: characteristic
  DGP: intellectual level/target group
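One way to read "simple probabilities" computed from existing indexing is as a conditional frequency: among the dually indexed books whose description contains a given group of source concepts, the share that also carries a given target subject. The sketch below follows that reading. The toy book data, the `estimate_rules` function, and the limit of two source concepts per rule are assumptions, and the ad hoc adjustment mentioned on the slide is not specified there, so it is left out.

```python
from collections import Counter
from itertools import combinations

# Made-up dually indexed books: (Biblion concepts, Brinkman subjects).
BOOKS = [
    ({"Travel guides", "Spain"}, {"Spain; travel guides"}),
    ({"Travel guides", "Spain"}, {"Spain; travel guides"}),
    ({"Travel guides", "France"}, {"France; travel guides"}),
    ({"Spain", "History"}, {"Spain; history"}),
]


def estimate_rules(books, max_size=2):
    """Estimate P(target subject | source concept group) by counting
    co-occurrences over books indexed with both vocabularies."""
    source_count = Counter()
    joint_count = Counter()
    for source_concepts, target_subjects in books:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(source_concepts), size):
                source = frozenset(combo)
                source_count[source] += 1
                for target in target_subjects:
                    joint_count[(source, target)] += 1
    return {
        (source, target): joint / source_count[source]
        for (source, target), joint in joint_count.items()
    }


rules = estimate_rules(BOOKS)
pair = frozenset({"Travel guides", "Spain"})
print(rules[(pair, "Spain; travel guides")])
print(rules[(frozenset({"Spain"}), "Spain; travel guides")])
```

In this toy data the two-concept group predicts the target subject more reliably than "Spain" alone, which is the motivation for rules over groups of concepts.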
-
Demo
Doesn't work?
-
User study
Quantitative aspect
  How well does the tool compare to human subject indexing?
Qualitative aspect
  User satisfaction
  Improvement suggestions
-
Evaluation setting
6 indexers
6 weeks
284 books
Evaluation integrated in daily indexing work
Pre-evaluation briefing
Questionnaire during evaluation
Post-evaluation de-briefing & questionnaire
-
User study results
Top ranked mappings are indeed much better
Individual book satisfaction level > 70%
Suggestion class   # suggestions   precision   recall
blue                         308       72.7%    47.9%
purple                     1,188       10.7%    27.1%
red                        2,525       1.11%    5.98%
non suggested                 89                19.0%
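For reference, precision and recall of suggestions against human indexing can be computed as below. The slides do not say whether the figures were pooled over all books or averaged per book; this sketch assumes pooled counts, and the book identifiers, subjects, and function name are made up for illustration.

```python
def precision_recall(suggested_by_book, assigned_by_book):
    """Pool counts over all books, then compute precision and recall of
    the suggested subjects against the subjects assigned by indexers."""
    n_suggested = n_assigned = n_correct = 0
    for book_id, assigned in assigned_by_book.items():
        suggested = suggested_by_book.get(book_id, set())
        n_suggested += len(suggested)
        n_assigned += len(assigned)
        n_correct += len(suggested & assigned)
    precision = n_correct / n_suggested if n_suggested else 0.0
    recall = n_correct / n_assigned if n_assigned else 0.0
    return precision, recall


# Made-up example data, not taken from the user study.
SUGGESTED = {
    "book1": {"Spain; travel guides", "Spain"},
    "book2": {"Building engineering"},
}
ASSIGNED = {
    "book1": {"Spain; travel guides"},
    "book2": {"Building engineering", "Learning material; building engineering"},
}

precision, recall = precision_recall(SUGGESTED, ASSIGNED)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Under this reading, the share of human-assigned subjects that no suggestion class covers corresponds to the "non suggested" row.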
-
User study results (1)
But the general satisfaction is lower
  Only two out of six would use the tool as such
Quality of suggestions
  Lower-level suggestions are often not meaningful
Perception of suggestions' quality
  Long lists with wrong suggestions at the end are bad
  Ranking is appreciated, but it is not enough
-
User study results (2)
Suggestions were found promising
  Bridging the indexing gap between collections
  Different indexing strategies
  "Persian language" (Biblion) vs. "Iranian language and literature" (Brinkman)
Lots of suggestions for improvement
  More re-indexing!
  Suggesting concepts from other vocabularies
  More context metadata as input
-
Conclusions
Shows the potential of re-using data in a library network
Alignment approach fitting indexing practice
Concrete demonstration, in KB production environment
Technology transfer: KB wants to continue efforts
Flexibility: architecture ready to exploit other vocabularies
  Linked data & SKOS
-
Prototype components
[Architecture diagram. Components shown: Database; STITCH stylesheet (XSLT); WinIBW cataloguing interface; IE; GGC cataloguing system; LOD SPARQL endpoints; suggestion service (SWI-Prolog); vocabulary service (Java/Tomcat); STITCH script (Visual Basic); Indexer; lexical alignments (Sesame RDF store); Sesame SKOS RDF store.]
-
Linked libraries?
[Diagram: the KB and related book collections, with the KOSs used to describe them and the alignments between those KOSs.
Collections: KB Deposit Coll., KB Scientific Coll., Dutch Public Libraries, LC (US Nat. Lib), BnF (French Nat. Lib), DNB (German Nat. Lib), Dutch Book-trade.
KOSs and other datasets: Biblion, NUR, BISAC subject codes, Brinkman, GTT, NBC class., UNESCO class., KB Corporatie + Persoon, RAMEAU subject headings, LCSH subject headings, DDC (Dewey decimal class.), SWD subject headings, Personennamendatei, LC authority file, Autorités BNF, Doelgroep (audience), wikipedia.nl, wikipedia.de.
Legend: existing KOS alignment; potential KOS alignment of interest; other classifications; domain/discipline classifications; subject thesauri / subj. heading lists; book collection datasets; person/corporation data; others.
LCSH is marked as the currently available entry point to the LOD cloud.
Lines show overlap between book collections (thickness indicates degree of overlap). Vertical alignment between a collection and KOSs denotes that those KOSs are used to describe that collection.]
-
Thank you!
Questions?
-
Screenshots
-
WinIBW production tool
-
STITCH suggestion tool
-
Original metadata
-
Concept suggestions
-
Comparing with human re-indexing
-
Complement: lexical alignments
-
Adding subjects using thesaurus access
-
Concept suggestions
-
Saving and back to WinIBW