Controlled Vocabularies in TELPlus
Antoine ISAAC, Vrije Universiteit Amsterdam
EDLProject Workshop, 22-23 November 2007

Agenda
• TELPlus Context
• Improving subject access
  – 3 sub-tasks
• Services for TEL

TELPlus Context
• Started October 2007
• Running 27 months
• Content WPs
  – OCRing previously digitised material
  – Improving the usability of TEL through OAI-PMH compliancy
  – Improving Access
  – Integrating services with TEL portal
  – User personalisation services
  – Extending TEL to Bulgaria & Romania

WP3 – Improving Access
• Task 1: Indexing for usability
  – Review/test state-of-the-art semantic search engines
    • On content of documents
• Task 2: Improving subject access
• Task 3: FRBR aggregation, search and browsing
  – Create/exploit FRBR metadata repositories
• Task 4: Focus on users
  – Focus groups on prototypes

WP 3 Task 2 – Improving Subject Access
• Improving subject access via semantic alignment between subjects
• Search through collections
  – Using metadata
  – In a controlled setting
• Paving the way for enhanced usages
  – Advanced treatments mentioned in TELplus need conceptual structures and links between these structures
    • E.g. clustering

WP 3 Task 2 – Improving Subject Access
• Improving subject access via semantic alignment between subjects
• Reference: MACS project
  – Manually-built semantic equivalences between Rameau, SWD & LCSH headings

MACS: Querying Collections

MACS: Query Reformulation Options

WP 3 Task 2 – Improving Subject Access
• Improving subject access via semantic alignment between subjects
• Reference: MACS project
  – Manual equivalences between Rameau, SWD, LCSH headings
• Here: an experiment on deploying automatic alignment techniques
  – Determining possible strategies
  – Assessing feasibility and usefulness
  – MACS context

WP3.2 Sub-tasks
• 3.2.1. Converting the subjects to standard representation language
  – Semantic web format (SKOS)
• 3.2.2. Aligning the vocabularies
  – Semantic correspondences between subjects
• 3.2.3. Deploying the alignment knowledge obtained into TEL framework
  – E.g. using links to reformulate queries from one subject list to the other

Converting subjects to standard representation language
Goal: solving syntactic heterogeneity between vocabularies
• Enabling the use of standard tools
  – E.g. for query (re)formulation
• Paving the way for dealing with semantic heterogeneity
  – Definitions of concepts expressed according to a common model

Converting subjects to standard representation language
Approach: Semantic Web and SKOS
• Semantic Web
  – Knowledge objects as web resources (URIs)
  – Description by linking resources (RDF)
  – Description using shared formal vocabularies (ontologies)
• SKOS
  – A standard Semantic Web model (ontology)
  – For knowledge organization systems (thesauri, subject heading lists…)

SKOS: Example
(Diagram: RDF graph of an Iconclass concept.)
• Nodes: http://www.iconclass.nl/s_11F, http://www.iconclass.nl/s_11, http://www.iconclass.nl/, skos:Concept, skos:ConceptScheme
• Edges: rdf:type, skos:broader, skos:inScheme, skos:prefLabel “the Virgin Mary”@en, skos:prefLabel “la Vierge Marie”@fr
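
The graph on this slide can be written down as a handful of triples. The sketch below is only an illustration: the transcript lost the arrows of the diagram, so the edge directions, and the choice of s_11F as the node carrying the labels, are assumptions. The helper names (`term`, `to_turtle`) are hypothetical; the `rdf` and `skos` namespace URIs are the standard ones.

```python
# Standard namespaces for the prefixes used on the slide.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}

CONCEPT = "http://www.iconclass.nl/s_11F"
PARENT = "http://www.iconclass.nl/s_11"
SCHEME = "http://www.iconclass.nl/"

# Assumption: subject/object roles are inferred, the slide only lists the labels.
TRIPLES = [
    (CONCEPT, "rdf:type", "skos:Concept"),
    (CONCEPT, "skos:broader", PARENT),
    (CONCEPT, "skos:prefLabel", ("the Virgin Mary", "en")),
    (CONCEPT, "skos:prefLabel", ("la Vierge Marie", "fr")),
    (CONCEPT, "skos:inScheme", SCHEME),
    (SCHEME, "rdf:type", "skos:ConceptScheme"),
]


def term(value):
    """Render one triple component in Turtle-like syntax."""
    if isinstance(value, tuple):  # (text, language) literal
        text, lang = value
        return f'"{text}"@{lang}'
    if value.startswith("http"):  # full URI
        return f"<{value}>"
    return value  # prefixed name such as skos:Concept


def to_turtle(triples):
    """Serialise the triples as one statement per line, after the prefixes."""
    header = [f"@prefix {p}: <{ns}> ." for p, ns in PREFIXES.items()]
    body = [f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples]
    return "\n".join(header + body)


print(to_turtle(TRIPLES))
```

A real conversion would more likely rely on an RDF library; plain tuples are used here to keep the example self-contained.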

Converting subjects to standard representation language - Process
• Getting processable versions from owners
  – E.g. XML
• Analyzing the models
• Converting to SKOS
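
As a rough illustration of the last step, the sketch below turns a small XML export into SKOS-style triples. Everything about the input is hypothetical: the element and attribute names (`heading`, `label`, `id`, `broader`), the sample headings and the base URI `http://example.org/shl/` are invented for the example, since the slides do not show what the owners' XML looks like.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML export of a subject heading list.
SAMPLE_XML = """
<headings>
  <heading id="h1">
    <label lang="en">Literature</label>
  </heading>
  <heading id="h2" broader="h1">
    <label lang="en">Dutch literature</label>
    <label lang="nl">Nederlandse letterkunde</label>
  </heading>
</headings>
"""

BASE = "http://example.org/shl/"  # assumed base URI for the concept scheme


def xml_to_skos(xml_text, base=BASE):
    """Map each <heading> to a skos:Concept with labels and a broader link."""
    root = ET.fromstring(xml_text)
    triples = []
    for heading in root.findall("heading"):
        uri = base + heading.get("id")
        triples.append((uri, "rdf:type", "skos:Concept"))
        triples.append((uri, "skos:inScheme", base))
        for label in heading.findall("label"):
            triples.append((uri, "skos:prefLabel", (label.text, label.get("lang"))))
        parent = heading.get("broader")
        if parent:
            triples.append((uri, "skos:broader", base + parent))
    return triples


for triple in xml_to_skos(SAMPLE_XML):
    print(triple)
```

The "analyzing the models" step is what decides such a mapping in practice, for instance whether a source field becomes a preferred label, an alternative label or a note.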

WP3.2 Sub-tasks
• 3.2.1. Converting the subjects to standard representation language
  – Semantic web format (SKOS)
• 3.2.2. Aligning the vocabularies
  – Semantic correspondences between subjects
• 3.2.3. Deploying the alignment knowledge obtained into TEL framework
  – E.g. using links to reformulate queries from one subject list to the other

Vocabulary Alignment
• Specifying required alignment format (links)
  – Type of mapping links: equivalence, broader
  – Cardinality: one-to-one, one-to-many
  – Taking application context (TEL) into account
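
One possible way to make such a format concrete is a small record per link, plus a check of the cardinality that a set of links actually exhibits. This is a sketch under assumptions: the class and function names are hypothetical, the identifiers (`shl1:a`, `shl2:x`, ...) are placeholders, and cardinality is judged from the source side only.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class LinkType(Enum):
    """The two mapping link types named on the slide."""
    EQUIVALENCE = "equivalence"
    BROADER = "broader"


@dataclass(frozen=True)
class Mapping:
    """One alignment link from a source subject to a target subject."""
    source: str
    target: str
    link_type: LinkType


def cardinality(mappings):
    """Classify each source subject by how many distinct targets it maps to."""
    targets = defaultdict(set)
    for m in mappings:
        targets[m.source].add(m.target)
    return {
        source: "one-to-one" if len(found) == 1 else "one-to-many"
        for source, found in targets.items()
    }


# Placeholder identifiers, not real headings.
EXAMPLE = [
    Mapping("shl1:a", "shl2:x", LinkType.EQUIVALENCE),
    Mapping("shl1:a", "shl2:y", LinkType.BROADER),
    Mapping("shl1:b", "shl2:z", LinkType.EQUIVALENCE),
]

print(cardinality(EXAMPLE))
```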

Vocabulary Alignment
• Specifying required alignment format (links)
• Selecting (& running) alignment techniques/tools
  – Inspired by semantic web approaches

Vocabulary Alignment Techniques
• Similar to ontology alignment problem
• Existing approaches for (semi-)automatic ontology alignment
  – Using techniques from linguistics, computer science, statistics
• Problem: performance does not allow 100% automatic alignment
• Problem: multilingual case
  – Some techniques cannot be used

Potential Technique: Using Background Knowledge
• Using a shared conceptual reference to find links
(Diagram labels: SHL 1, SHL 2, Background knowledge, “Calendar”, “Publication”)
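
A minimal sketch of the idea, assuming each subject heading has already been anchored to a concept in a shared reference hierarchy. The reference identifiers, the anchoring tables and the assumption that "Calendar" sits below "Publication" in the reference are all hypothetical; the slide only shows the two terms.

```python
# Hypothetical background knowledge: child concept -> broader concept.
REFERENCE_BROADER = {"ref:calendar": "ref:publication"}

# Hypothetical anchoring of each list's terms onto the shared reference.
SHL1_ANCHORS = {"Calendar": "ref:calendar"}
SHL2_ANCHORS = {"Publication": "ref:publication"}


def ancestors(concept, broader):
    """Walk up the reference hierarchy, stopping if a cycle is met."""
    seen = []
    while concept in broader and broader[concept] not in seen:
        concept = broader[concept]
        seen.append(concept)
    return seen


def links_via_background(shl1, shl2, broader):
    """Derive links between two lists from their anchors in the reference."""
    by_ref = {}
    for term2, ref in shl2.items():
        by_ref.setdefault(ref, []).append(term2)
    links = []
    for term1, ref1 in shl1.items():
        # Same reference concept: treat the two terms as equivalent.
        for term2 in by_ref.get(ref1, []):
            links.append((term1, "equivalence", term2))
        # Target anchored on an ancestor: the target is broader.
        for ancestor in ancestors(ref1, broader):
            for term2 in by_ref.get(ancestor, []):
                links.append((term1, "broader", term2))
    return links


print(links_via_background(SHL1_ANCHORS, SHL2_ANCHORS, REFERENCE_BROADER))
```

The hard part in practice is the anchoring itself, which this sketch takes as given.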

Potential Technique: Statistical Alignment
• Object information (book indexing)
(Diagram labels: SHL 1, SHL 2, dually-indexed books, “Dutch Literature”, “Dutch”)
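
One common way to exploit dually-indexed books is to compare the sets of books attached to each pair of subjects. The sketch below uses the Jaccard overlap as the measure; this is an assumed choice, since the slides do not name a specific statistic, and the book data and the 0.5 threshold are invented for illustration.

```python
from collections import defaultdict
from itertools import product

# Hypothetical dually-indexed books: id -> (SHL 1 subjects, SHL 2 subjects).
BOOKS = {
    "book1": ({"Dutch Literature"}, {"Dutch"}),
    "book2": ({"Dutch Literature"}, {"Dutch"}),
    "book3": ({"Dutch Literature"}, {"Poetry"}),
    "book4": ({"History"}, {"Dutch"}),
}


def jaccard_alignment(books, threshold=0.5):
    """Score subject pairs by the overlap of the books they index."""
    extension1, extension2 = defaultdict(set), defaultdict(set)
    for book_id, (subjects1, subjects2) in books.items():
        for subject in subjects1:
            extension1[subject].add(book_id)
        for subject in subjects2:
            extension2[subject].add(book_id)
    scores = {}
    for s1, s2 in product(extension1, extension2):
        shared = extension1[s1] & extension2[s2]
        union = extension1[s1] | extension2[s2]
        score = len(shared) / len(union)
        if score >= threshold:
            scores[(s1, s2)] = score
    return scores


print(jaccard_alignment(BOOKS))
```

Because it relies on shared books rather than on labels, a measure of this kind does not depend on the language of the headings.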

Vocabulary Alignment
• Specifying required alignment format (links)
• Selection (& running) of tool/method
• Evaluation (& cleaning)
  – Considering application

Evaluation of Alignments
• MACS has produced mappings!
  – Possible gold standard
• But: has MACS produced all mappings?
  – Which proportion of the SHLs is covered?
  – Taking into account all indexing strings?
• Are MACS mappings the only interesting ones?
  – “Serendipity” mappings
    • Concepts that are not equivalent but could bring useful results when added to queries
  – Compensating for indexing variability
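
If the MACS mappings are used as a gold standard, the usual starting point is precision and recall of the automatic mappings, together with the share of a subject heading list that the gold standard covers. The sketch below uses placeholder identifiers and hypothetical function names; it is not a description of the project's actual evaluation procedure.

```python
def precision_recall(candidate, gold):
    """Compare candidate mappings with gold mappings, both as (source, target) pairs."""
    candidate, gold = set(candidate), set(gold)
    correct = candidate & gold
    precision = len(correct) / len(candidate) if candidate else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    return precision, recall


def coverage(gold, vocabulary):
    """Proportion of the vocabulary's subjects that appear as a source in the gold mappings."""
    mapped = {source for source, _ in gold}
    return len(mapped & set(vocabulary)) / len(vocabulary)


# Placeholder data, not real MACS mappings.
GOLD = {("a1", "b1"), ("a2", "b2")}
CANDIDATE = {("a1", "b1"), ("a2", "b3"), ("a3", "b4")}
VOCABULARY = ["a1", "a2", "a3", "a4"]

print(precision_recall(CANDIDATE, GOLD))
print(coverage(GOLD, VOCABULARY))
```

Note that every candidate absent from the gold standard counts as an error here, which is exactly the limitation the slide raises: an incomplete gold standard, or a useful "serendipity" mapping, lowers measured precision without being wrong for the application.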

Evaluation of Alignments
• Several scenarios for using and evaluating alignments
  – Concept-based search
  – Re-indexing
  – Integration of one SHL into the other
  – SHL Merging
  – Free-text search
  – Navigation

Evaluation of Alignments
• Several scenarios for using and evaluating alignments
  – Concept-based search
    • Retrieving books indexed by SHL1 using SHL2 concepts
  – Re-indexing
  – Integration of one SHL into the other
  – SHL Merging
  – Free-text search
    • Matching user search terms to both SHL1 and SHL2 concepts
  – Navigation
    • Browsing several collections using one SHL structure

Evaluation of Alignments
• Several settings for a single scenario
  – Fully automatic reformulation vs assisted reformulation (candidates)
• Different evaluation measures
  – Good mappings vs acceptable ones
  – Number of candidates for reformulation
  – Semantic closeness to original query

Vocabulary Alignment
• Specifying required alignment format (links)
• Selection (& running) of tool/method
• Evaluation (& cleaning)
• Assessment of the approach
  – Efforts required, quality, extendibility

WP3.2 Sub-tasks
• 3.2.1. Converting the subjects to standard representation language
  – Semantic web format (SKOS)
• 3.2.2. Aligning the vocabularies
  – Semantic correspondences between subjects
• 3.2.3. Deploying the alignment knowledge obtained into TEL framework
  – E.g. using links to reformulate queries from one subject list to the other
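
Query reformulation through alignment links can be pictured as a lookup from the subjects of one list to their counterparts in the other. The sketch below is an assumption-laden illustration: the mapping table, the identifiers and the function name are hypothetical, and it says nothing about how TEL would actually integrate the links.

```python
# Hypothetical alignment: SHL 1 subject -> mapped SHL 2 subjects.
MAPPINGS = {
    "shl1:dutch_literature": ["shl2:dutch"],
    "shl1:calendars": ["shl2:publications", "shl2:almanacs"],
}


def reformulate(query_subjects, mappings):
    """Translate query subjects, keeping track of those with no mapping."""
    reformulated, unmapped = [], []
    for subject in query_subjects:
        targets = mappings.get(subject)
        if targets:
            reformulated.extend(targets)
        else:
            unmapped.append(subject)
    return reformulated, unmapped


query = ["shl1:dutch_literature", "shl1:calendars", "shl1:maps"]
print(reformulate(query, MAPPINGS))
```

Returning the unmapped subjects separately leaves the calling application free to drop them, fall back to free-text search, or ask the user.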

Deploying the alignment knowledge obtained into TEL framework
• Observing integration of MACS data into TEL
  – Conceptual input for alignment requirements
• Integration of the obtained alignment in TEL
• Assessment of the alignment integration
  – Technical aspects, usage aspects

Reminder
• Alignment is a difficult problem
• Application-specific alignment pretty much unexplored in Semantic Web research
More a feasibility study than a complete solution to the problem
Practical goal: investigate how automatic techniques could help MACS-like initiatives
• Manual mapping is labour-intensive

Agenda
• TELPlus Context
• Improving subject access
  – 3 sub-tasks
• Services for TEL

WP4 – Integrating services with the European Library portal
Theo van Veen (KB)
Tasks:
• Identifying services that are going to give the user the greatest return
• Creating new services
• Integrating services within TEL…

WP4 – Some Services Mentioned
Preliminary inventory: no official commitment!
Services based on controlled vocabularies:
• Thesaurus and name authority service
  – Providing terms linked to query terms
• Semantic enrichment service
  – Users can annotate search results with terms
• Distance between terms and related terms

WP4 – Some Services Mentioned
Preliminary inventory: no official commitment!
Services based on controlled vocabularies:
• Thesaurus and name authority service
• Semantic enrichment service
• Distance between terms and related terms
Adding more value from controlled vocabularies and alignments between them

Thanks!