juliane stiller europeanatech 2015
TRANSCRIPT
Automatic Solutions to Improve Multilingual Access in Europeana
(There is a thin line between speaking 30 languages and gibberish)
Juliane Stiller
EuropeanaTech 2015, Paris, 12.02.2015
Dimensions of multilinguality
à Interface and portal display à Search
• Translation of query
• Translation of documents
• Enrichments with multilingual datasets
à Representation and refinement of search results • User needs to be able to determine relevance of documents
à Browsing
The challenges of query translation
Determine source language
Determine target language
Translate and construct query
• Queries are short • 39% of queries belong to more than
one language • 60% of queries are named entities
The detailed and long version of the process is in Péter Király: Query Translation in Europeana. Code4lib, Issue 27, http://journal.code4lib.org/articles/10285
Evaluation
à Corpus with 250 aligned queries in English, French and German
à Send to API to translate -> translations compared to baseline
Baseline From German to From English to From French to Stummfilm (DE) - Stummfilm Stummfilm
Silent film (EN) Silent film - Silent film
film muet (FR) cinéma muet cinéma muet -
Query translation – evaluation results
51%
19%
30%
English -> German
correct
incorrect
no translation
„Unarmed“ -> „unarmed – Best of 25th Anniversary“
„Bulgaria“ -> „Bulgarien (Begriffserklärung)“
For over 80% of the queries, the automatic solution is suitable
Enrichment types and vocabularies
Enrichment Type
Target vocabulary
Source metadata fields
Number of enriched objects
Places GeoNames dcterms:spatial, dc:coverage
7 M
Concepts GEMET, DBpedia,
dc:subject, dc:type
9,2 M
Agents DBpedia dc:creator, dc:contributor
144,000
Time Semium Time
dc:date, dc:coverage, dcterms:temporal, edm:year
10,2 M
Quality of enrichments
à Incorrect enrichments lead to • Devaluation of curated metadata • Loss of trust from providers • Propagation of errors to different languages • Irrelevant search results • Bad user experiences
Better understanding of impact of enrichments through evaluation needed à
Wrong vs. correct enrichments
78%
100%
92%
22%
8%
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
WHAT
WHO
WHERE
correct incorrect
Source: Stiller, J., Olensky, M., Petras, V.: A Framework for the Evaluation of Automatic Metadata Enrichments. MTSR 2014
To conclude
à Multilingual access is a continuous undertaking that needs to be adapted constantly
à Best practices need to be developed and fostered by Europeana
à Community and network are indispensible for tackling multilingual issues
• Task force on enrichment and evaluation with experts from the network just started their work