Exploring 57,624,717 artworks, artefacts, books, films and music from European museums, galleries, libraries and archives
Antoine Isaac, with slides from Hugo Manguinhas, Valentine Charles, Juliane Stiller, Mónica Marrero
HELDIG Digital Humanities Forum
Helsinki, 23 October 2019
Who is Europeana?
CC BY-SA
● A non-profit foundation
● A community of 2400 experts in digital heritage – the Europeana Network
● A shared mission: improve access to Europe's digital cultural heritage
We transform the world with culture. We build on Europe’s rich cultural heritage and make it easier for people to use for work, learning or pleasure. Our work contributes to an open, knowledgeable and creative society.
What is Europeana?
● The European Commission's digital platform for cultural heritage
● Providing access to over 57M objects from over 3500 museums, libraries, archives
What is Europeana?
● An Open Data platform providing several services
● Europeana Collections portal - http://europeana.eu
● Europeana APIs - https://pro.europeana.eu/resources/apis
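To give an idea of what the APIs look like, here is a minimal Python sketch that only builds a Search API request URL, without sending it. The endpoint and the `wskey`, `query` and `rows` parameters are assumptions based on the public API documentation at the time of writing, and `YOUR_API_KEY` is a placeholder; check the documentation linked above before relying on them.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Assumed endpoint of the Europeana Search API; check the current API documentation.
SEARCH_ENDPOINT = "https://api.europeana.eu/record/v2/search.json"


def build_search_url(query: str, api_key: str, rows: int = 12) -> str:
    """Return a Search API request URL for a free-text query (nothing is sent)."""
    params = {"wskey": api_key, "query": query, "rows": rows}
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    # "YOUR_API_KEY" is a placeholder for a personal API key.
    url = build_search_url("Mozart", "YOUR_API_KEY", rows=5)
    print(url)
    print(parse_qs(urlparse(url).query))
```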
Diagram: Europeana Research (Partnerships, Expertise, Research Grants Programme, Europeana Research Community), with connections to Europeana Collections, Europeana APIs and Europeana R&D Projects
Europeana & CLARIN
• 135,000 Europeana sources integrated into CLARIN’s Virtual Language Observatory (VLO)
• Selection based on quality, accessibility, processability and reusability
• Case study published at https://bit.ly/2J5w8jc
Building partnerships with research infrastructures
Research Grants Programme
Intended for events that bring together cultural heritage professionals and researchers
● 2019 theme: Digital Cultural Heritage for Open Science
● Budget: 25,000 €
● Deadline: 31 October 2019
Supporting new forms of collaboration
https://bit.ly/31ua48f
Research Task Force on Understanding Researchers’ Needs
• Goal: supporting cultural institutions in building digital collections that can meet researchers’ needs.
• Expected Outcomes: a set of recommendations addressed to cultural institutions and policymakers.
• Strategy: a survey addressed to social sciences and humanities scholars, and researchers working at cultural institutions. Deadline: 31 October 2019. https://bit.ly/2Me0odt
Europeana Research Community
How does it work?
France, Public Domain, 1914, National Library of France
Agence de presse Meurisse
Concours de cycles nautiques sur le lac d’Enghien : Berregent piloté par Austerling (Nautical cycle competition on Lake Enghien: Berregent piloted by Austerling)
What’s inside Europeana?
Europeana Essentials
● Descriptive and technical metadata: title, creator, subject, rights…
● Thumbnails
● Editorial content, like curated virtual exhibitions
● (recently started) user-generated metadata, incl. transcriptions, semantic annotations
As a rule, digitized content is served on our partners’ websites, except for some specific projects:
● Newspapers
● User-generated content, e.g. Europeana 1914-1918
Data flow in Europeana’s network
Data providers: cultural institutions that provide metadata and links to digitized content
Aggregators: organizations or projects that gather data from a specific country or domain (music, fashion, archaeology…)
Data Quality Issues in Cultural Heritage
Caveat: some examples have already been cleaned ☺
Sparseness of (meta)data
Heterogeneity
57M objects, from 3,500 institutions
● Many different themes and types of objects: books, newspapers, journals, letters, diaries, archival papers, paintings, maps, drawings, photographs, music, spoken word, radio broadcasts, film, newsreels, television, fashion, sculpture, 3D objects, and more
● Libraries, archives, museums have different ways to describe objects. Even within a sector, big differences can be observed
Multilingualism
57M objects, from 44 countries
● Officially we get metadata in 37 languages
● But there are more languages used in individual metadata fields
Work by Péter Kiraly (Göttingen Research alliance): http://144.76.218.178/europeana-qa/languages.php?collectionId=all&field=aggregated
Multilingualism
• Over 400 language codes, e.g., 6 values in x-aramaic-latn (not a valid code, by the way)
• The most common case is lack of language information!
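A minimal sketch of how such language-tag figures can be gathered, assuming every metadata value is available as a (text, language tag) pair, with `None` when no tag is given. The shape check is a rough, hypothetical rule (a 2-3 letter primary subtag, optionally followed by short subtags), not a full BCP 47 validator, and the sample values are made up.

```python
import re
from collections import Counter

# Rough shape check (an assumption, not full BCP 47): a 2-3 letter primary
# language subtag, optionally followed by "-" and 1-8 alphanumeric subtags.
TAG_SHAPE = re.compile(r"^[a-z]{2,3}(-[a-z0-9]{1,8})*$", re.IGNORECASE)


def profile_languages(values):
    """Count language tags over (text, tag) pairs; tag is None when missing."""
    counts = Counter()
    suspicious = Counter()
    for _text, tag in values:
        if tag is None or not tag.strip():
            counts["<missing>"] += 1
            continue
        tag = tag.strip().lower()
        counts[tag] += 1
        if not TAG_SHAPE.match(tag):
            suspicious[tag] += 1
    return counts, suspicious


# Made-up sample values.
sample = [
    ("Mona Lisa", "en"),
    ("La Joconde", "fr"),
    ("Gioconda", None),
    ("Some title", "x-aramaic-latn"),
    ("Porträt", "de"),
    ("Untitled", ""),
]
counts, suspicious = profile_languages(sample)
print(counts.most_common(1))  # [('<missing>', 2)]
print(dict(suspicious))       # {'x-aramaic-latn': 1}
```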
France, Public Domain, 1932, National Library of France
Agence de presse Mondial Photo-Presse
Tournoi royal de motos à Londres : changement d'une roue de side-car en marche (Royal motorcycle tournament in London: changing a sidecar wheel while moving)
How to get more homogeneous, richer & multilingual data?
Data modeling for interoperability and richer metadata
● Like many aggregators, we ask our providers to give metadata using one metadata model: the Europeana Data Model (EDM)
● But we cannot do whatever we like: we do not operate in isolation!
● Our approach must be
○ easy and rewarding for our partners
○ based on community-agreed best practices
A community sport
• Involving (technical) experts from libraries, archives, museums and academics – the EuropeanaTech community
• Adopting a collaborative, softer form of standardization
http://pro.europeana.eu/europeana-tech
Europeana Assembly General Meeting, Rijksmuseum, Amsterdam, 2015
Following Best Practices, such as the Linked Open Data principles
http://vimeo.com/36752317
• Active in 2014-2016
• To develop the open data ecosystem, facilitating better communication between developers and publishers;
• To provide guidance to publishers, promoting the re-use of data;
• To foster trust in the data among developers
• Linked Data, but not only!
Data on the Web Best Practices Working Group
https://www.w3.org/2013/dwbp/
• Accept that (OWL) semantics establish precise specs and can enable automated reasoning, but that complex vocabularies require more effort to produce and hamper reuse of data
• Minimize ontological commitment of your vocabulary – or seek to minimize the commitment of others’ vocabularies
• Check that inference does not produce too many statements that are unnecessary for target applications
• Check examples of “softer” specs, e.g. Schema.org or SKOS
BP 16: Choose the right formalization level
Data on the Web Best Practices W3C Recommendation
• Use terms from shared vocabularies, preferably standardized ones
• Check that classes, properties, terms, elements or attributes used to represent a dataset do not replicate those defined by vocabularies used for other datasets
  • e.g. using the Linked Open Vocabularies repository
• Or if you have to replicate, indicate mappings clearly
BP 15: Reuse vocabularies, preferably standardized ones
Data on the Web Best Practices W3C Recommendation
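Applied to a provider's own fields, BP 15 often comes down to a crosswalk: local field names mapped to shared terms, with the mapping stated explicitly. A minimal sketch; the local field names and the sample record are hypothetical, while the targets are Dublin Core terms that EDM also uses.

```python
# Hypothetical crosswalk from a provider's local field names to shared terms.
CROSSWALK = {
    "objectTitle": "dc:title",
    "maker": "dc:creator",
    "productionDate": "dcterms:created",
    "keywords": "dc:subject",
    "partOfSeries": "dcterms:isPartOf",
}


def apply_crosswalk(local_record, crosswalk=CROSSWALK):
    """Rename mapped fields to shared terms and list the fields with no mapping."""
    mapped, unmapped = {}, []
    for field, values in local_record.items():
        target = crosswalk.get(field)
        if target is None:
            unmapped.append(field)  # report instead of silently dropping
        else:
            mapped.setdefault(target, []).extend(values)
    return mapped, unmapped


# Hypothetical provider record.
record = {"objectTitle": ["Clavecin"], "maker": ["Bartolomeo Cristofori"], "inventoryNo": ["inv-001"]}
print(apply_crosswalk(record))
```

Reporting unmapped fields, rather than dropping them, keeps the mapping visible and reviewable, in the spirit of "indicate mappings clearly".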
Data modeling for interoperability and richer metadata
Clavecin (harpsichord), Bartolomeo Cristofori | Cité de la Musique, MIMO - Musical Instruments Museums Online | CC BY-NC-SA
Europeana Data Model example
Massive re-use of vocabularies
Plus: Web Annotation, RDA, WGS84, EBUCore, ccREL, ODRL, DOAP, SVCS, DCAT, ADMS… (sometimes only for one property!)
http://pro.europeana.eu/edm-documentation
EDM in Linked Open Vocabularies (LOV)
OAI-ORE FOAF
Massive re-use
Here are the properties for the edm:ProvidedCHO class (the cultural heritage object which is the subject of metadata submitted to Europeana):
Optional fields: dc:contributor, dc:creator, dc:date, dc:format, dc:identifier, dc:language, dc:publisher, dc:relation, dc:source, dcterms:alternative, dcterms:extent, dcterms:temporal, dcterms:medium, dcterms:created, dcterms:provenance, dcterms:issued, dcterms:conformsTo, dcterms:hasFormat, dcterms:isFormatOf, dcterms:hasVersion, dcterms:isVersionOf, dcterms:hasPart, dcterms:isPartOf, dcterms:isReferencedBy, dcterms:references, dcterms:isReplacedBy, dcterms:replaces, dcterms:isRequiredBy, dcterms:requires, dcterms:tableOfContents, edm:isNextInSequence, edm:isDerivativeOf, edm:currentLocation…
Mandatory fields:
● dc:title or dc:description
● One of dc:coverage, dc:subject, dc:type, dcterms:spatial
● edm:type with value of TEXT, IMAGE, VIDEO, SOUND or 3D
● dc:language for objects with edm:type value of TEXT
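The mandatory-field rules above can be checked mechanically. This sketch assumes a record is a plain dict from property names to lists of values, and that edm:type carries exactly one value; it is an illustration, not Europeana's own ingestion validation.

```python
EDM_TYPES = {"TEXT", "IMAGE", "VIDEO", "SOUND", "3D"}
SUBJECT_LIKE = ("dc:coverage", "dc:subject", "dc:type", "dcterms:spatial")


def has(record, prop):
    """True if the property is present with at least one non-empty value."""
    return any(str(v).strip() for v in record.get(prop, []))


def check_provided_cho(record):
    """Return a list of problems with the mandatory fields (empty list = OK)."""
    problems = []
    if not (has(record, "dc:title") or has(record, "dc:description")):
        problems.append("needs dc:title or dc:description")
    if not any(has(record, p) for p in SUBJECT_LIKE):
        problems.append("needs one of " + ", ".join(SUBJECT_LIKE))
    types = record.get("edm:type", [])
    if len(types) != 1 or types[0] not in EDM_TYPES:
        problems.append("edm:type must be one of " + ", ".join(sorted(EDM_TYPES)))
    elif types[0] == "TEXT" and not has(record, "dc:language"):
        problems.append("dc:language is required when edm:type is TEXT")
    return problems


book = {"dc:title": ["Kalevala"], "dc:subject": ["epic poetry"], "edm:type": ["TEXT"]}
print(check_provided_cho(book))  # ['dc:language is required when edm:type is TEXT']
```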
Enriching metadata
• EDM gives a base for (linking to) multilingual metadata
  • data as resources with web URIs, not only strings
• We encourage data providers to contribute their own links/data to local or external vocabularies
LOD Vocabularies currently recognized by Europeana in providers' metadata:
Vocabulary | URL
MIMO Concepts | http://www.mimo-db.eu/
MIMO Instrument makers | http://www.mimo-db.eu/
The Getty - Art & Architecture Thesaurus (AAT) | http://vocab.getty.edu/
The Getty - Union List of Artist Names (ULAN) | http://vocab.getty.edu/
Virtual International Authority File (VIAF) | http://viaf.org/viaf/
Geonames | http://sws.geonames.org/
IconClass | http://iconclass.org/
Gemeinsame Normdatei (GND) | http://d-nb.info/gnd
Israel Museum Jerusalem Concepts | http://www.imj.org.il/imagine/thesaurus/objects/
Partage Plus concepts | http://partage.vocnet.org/
data.europeana.eu WWI Concepts from Library of Congress Subject Headings (LCSH) | http://data.europeana.eu/concept/loc
Europeana Sounds Genres | http://data.europeana.eu/concept/soundgenres/
EAGLE Material & Object Type | http://www.eagle-network.eu/voc/
DISMARC Formats & Genres | http://purl.org/dismarc/ns/
UDC | http://udcdata.info/rdf/
UNESCO Thesaurus | http://vocabularies.unesco.org/thesaurus/
Since last week: the General Finnish Ontology YSO!
For example: https://www.europeana.eu/portal/en/record/2021007/_SLSA_1070_SLSA_1070_k36.html
Enriching metadata
• We are going to further develop crowdsourcing/"nichesourcing" of metadata
• In parallel, we apply automatic enrichment to link object metadata to reference datasets for places, persons, concepts
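In a very simplified form, automatic enrichment matches metadata strings against the labels of a reference dataset and replaces them with links. The sketch below uses exact, accent- and case-insensitive label lookup over a tiny hypothetical vocabulary with placeholder `example.org` URIs; it deliberately leaves ambiguous labels unlinked rather than guessing.

```python
import unicodedata

# Hypothetical mini-vocabulary: entity URI -> labels (the URIs are placeholders).
VOCABULARY = {
    "http://example.org/place/helsinki": ["Helsinki", "Helsingfors"],
    "http://example.org/agent/mozart": ["Wolfgang Amadeus Mozart", "Mozart"],
    "http://example.org/concept/harpsichord": ["harpsichord", "clavecin", "Cembalo"],
}


def normalize(text):
    """Case-fold and strip accents, so that 'Clavecín' and 'clavecin' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    return "".join(c for c in decomposed if not unicodedata.combining(c)).strip()


def build_label_index(vocabulary):
    """Invert the vocabulary: normalized label -> set of URIs (several if ambiguous)."""
    index = {}
    for uri, labels in vocabulary.items():
        for label in labels:
            index.setdefault(normalize(label), set()).add(uri)
    return index


def enrich(values, index):
    """Link each metadata string to a URI when its label matches exactly one entity."""
    links = {}
    for value in values:
        candidates = index.get(normalize(value), set())
        if len(candidates) == 1:  # leave unknown or ambiguous strings unlinked
            links[value] = next(iter(candidates))
    return links


index = build_label_index(VOCABULARY)
print(enrich(["Clavecin", "Mozart", "Paris"], index))
```

Skipping ambiguous labels trades recall for precision, which matters all the more when a reference dataset is large or generic.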
Enriching metadata
Enriching metadata – Contextual Entities
We are building an "Entity Collection"
• Centralized point of reference and access to data about contextual entities: places, agents (persons and organizations), concepts...
• Caching and curating data from the wider Linked Open Data cloud
• A sort of Europeana “knowledge graph”
• With a dedicated API
Use cases for the Entity Collection
Improve user experience on Europeana services
● Findability: users can search with and for people, places and subjects, not only objects, in many more languages and with less ambiguity
● Contextualization: users see contextual information about cultural heritage objects. Entity Pages group and present all assertions about an entity
…
Examples: Entity Pages, Semantic auto-completion
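Semantic auto-completion suggests entities rather than plain strings. A minimal in-memory sketch, with hypothetical entity ids and labels: a prefix typed in any language matches words in any of an entity's labels, and the suggestion is displayed in the user's language when available, falling back to English.

```python
# Hypothetical entity records: entity id -> {language code: preferred label}
ENTITIES = {
    "agent/mozart": {"en": "Wolfgang Amadeus Mozart"},
    "place/helsinki": {"en": "Helsinki", "sv": "Helsingfors"},
    "concept/harpsichord": {"en": "Harpsichord", "fr": "Clavecin", "de": "Cembalo"},
}


def autocomplete(prefix, user_lang, limit=5):
    """Suggest (entity id, label) pairs whose label, in any language, has a word
    starting with prefix; display the label in user_lang, else English, else any."""
    prefix = prefix.casefold()
    suggestions = []
    for entity_id, labels in sorted(ENTITIES.items()):
        words = (w for label in labels.values() for w in label.casefold().split())
        if any(w.startswith(prefix) for w in words):
            display = labels.get(user_lang) or labels.get("en") or next(iter(labels.values()))
            suggestions.append((entity_id, display))
    return suggestions[:limit]


print(autocomplete("helsingf", "fi"))  # [('place/helsinki', 'Helsinki')]
print(autocomplete("cla", "de"))       # [('concept/harpsichord', 'Cembalo')]
```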
Data currently in the Entity Collection
• Places: a subset of Geonames, corresponding to places which are part of European countries and of some specific feature classes.
• Agents: a subset of DBpedia corresponding to most of the instances of dbp:Artist with some exceptions, and integrated from 49 DBpedia language editions.
• Concepts: a subset of DBpedia corresponding to a selection of concepts matching the needs of Europeana Collections (e.g., WWI battles); Europeana Sounds music genres (obtained from Wikidata); Photo Consortium's photography vocabulary
• Organizations: extracted from Europeana's CRM and aligned to Wikidata when possible
216,302 resources
1,572 resources
165,005 resources
1,077 resources
Selecting data sources
• Availability and access: open license, published on the web as linked data
• Granularity, size and coverage: multilingual data, helping to answer key user needs for Europeana's cultural heritage collections. Datasets that are too generic or too large can create too much ambiguity for the simple processes we have (e.g. enrichment)
• Quality: intrinsic aspects like correctness of representation (data structures)
• Connectivity: good data sources are well-connected internally and externally to other datasets
An example: DBpedia resource for “Mozart” in our data
Coreference links to 6 other datasets (e.g. Freebase, Wikidata)
Inter-linking information
Preferred labels for 48 languages
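Coreference links are what let one entity be recognised whichever dataset a provider linked to. A minimal sketch, assuming each entity record lists its owl:sameAs URIs; the internal id `agent/mozart` is hypothetical, and the DBpedia and Wikidata URIs are given only to illustrate the pattern.

```python
# Hypothetical Entity Collection record; the sameAs URIs illustrate the pattern.
MOZART = {
    "id": "agent/mozart",
    "prefLabel": {"en": "Wolfgang Amadeus Mozart"},
    "sameAs": [
        "http://dbpedia.org/resource/Wolfgang_Amadeus_Mozart",
        "http://www.wikidata.org/entity/Q254",
    ],
}


def build_coreference_index(entities):
    """Map each entity's own id and every coreferent URI to that entity's id."""
    index = {}
    for entity in entities:
        index[entity["id"]] = entity["id"]
        for uri in entity.get("sameAs", []):
            index[uri] = entity["id"]
    return index


def resolve(uri, index):
    """Return the internal entity id for a URI used by a provider, or None."""
    return index.get(uri)


index = build_coreference_index([MOZART])
# Providers linking to DBpedia or to Wikidata reach the same entity.
print(resolve("http://www.wikidata.org/entity/Q254", index))
```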
An enrichment example
Links to contextual entities
And what it allows
The Entity Collection’s contribution to multilingual coverage
Entities effectively used to enrich Europeana Objects
Entities present in the Entity Collection
Multilingual enrichment is not easy!
Poisonous India or the Importance of a Semantic and Multilingual Enrichment Strategy. Marlies Olensky, Juliane Stiller, Evelyn Dröge, MTSR 2012. http://link.springer.com/chapter/10.1007%2F978-3-642-35233-1_25
We’re not really finished!
• Expand data coverage with new data sources
  • For existing types of contextual entities
  • For new types, e.g., events
• Better enrich Europeana object metadata
Automatic translation?
• First experiments with the European Commission’s eTranslation service, on curated content
• It needs to be slightly better, or be applied to other Europeana areas, like search
Diagram: a query and its search results (result ranked 1: French; ranked 2: Spanish; ranked 3: Polish), when searching only in the corresponding language
Approach 1: Query translation
Approach 2: Metadata/content translation
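Approach 1 can be sketched as follows: the query is translated into the languages of the collection, and each translation is searched as well. Here the translation step is a hard-coded, hypothetical dictionary standing in for a machine translation service such as eTranslation (no real service is called), and the records are made up.

```python
# Hypothetical records: (record id, language, metadata text)
RECORDS = [
    ("r1", "fr", "Concours de cycles nautiques sur le lac d'Enghien"),
    ("r2", "es", "Concurso de bicicletas acuáticas"),
    ("r3", "pl", "Zawody rowerów wodnych"),
    ("r4", "en", "Water cycle competition"),
]

# Hypothetical stand-in for a machine translation service:
# English query -> {target language: translated query}
TRANSLATIONS = {
    "water cycle": {"fr": "cycles nautiques", "es": "bicicletas acuáticas", "pl": "rowerów wodnych"},
}


def search(query, records):
    """Naive monolingual search: case-insensitive substring match on the text."""
    q = query.casefold()
    return [rid for rid, _lang, text in records if q in text.casefold()]


def search_with_query_translation(query, records):
    """Approach 1: search with the original query, then with each translation."""
    hits = search(query, records)
    for translated in TRANSLATIONS.get(query.casefold(), {}).values():
        for rid in search(translated, records):
            if rid not in hits:
                hits.append(rid)
    return hits


print(search("water cycle", RECORDS))                         # ['r4']
print(search_with_query_translation("water cycle", RECORDS))  # ['r4', 'r1', 'r2', 'r3']
```

Approach 2 would instead translate the metadata once at ingestion time, so that a monolingual search already finds it.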
Encouraging everyone on the way to improve their data
University of Edinburgh, CC BY
Roslin Glass Slides, creator unknown
Photograph of two men step cutting on the ice face of the Tasman Glacier, New Zealand, in the late 19th or early 20th century.
Challenges for working on quality improvement
● Methodological frameworks are not easy to apply
● Getting stakeholders interested is hard for us
● Communication lines are rather long
● It’s a sensitive area
● It’s hard to get (representatives of) users to contribute
A general effort on quality
We have set up a Data Quality Committee to analyze quality issues and make recommendations to the Europeana community about:
○ Mandatory metadata elements
○ Metadata checking and normalization
○ Multilingualism…
http://pro.europeana.eu/get-involved/europeana-tech/data-quality-committee
Trying to demonstrate impact
EPF Content: Europeana Publishing Framework since 2015 (EPF 1.0), covering Rights and Content Quality
EPF Metadata: Europeana Publishing Framework 2.0 (EPF 2.0), covering Language, Enabling Elements and Contextual Classes
https://pro.europeana.eu/post/publishing-framework
Europeana Publishing Framework: Metadata
● Language attributes: happy users (using Europeana Collections in their native language)
● Links to vocabularies: context (for users browsing Europeana Collections by persons, places, or concepts)
● Enabling elements: visibility (collections being findable along various dimensions: by subject, type, creator, date)
Helping FAIRification of Cultural Data
How do Europeana's data and services meet the FAIR requirements?
Findable
● The Europeana aggregation network partially homogenizes its data via a shared data model
● Providers and Europeana seek to enrich the data with multilingual, semantic resources
● We promote persistent identifiers and links across them
● Europeana provides a search engine
● Data is made findable through other platforms (e.g., CLARIN)
How do Europeana's data and services meet the FAIR requirements?
Accessible
● Data is published as (Linked Data) web resources
● Freely available, standard web APIs
Interoperable
● Europeana uses a community-based model
● Following best practices, such as mixing and re-using existing data models and vocabularies
● We promote more open content access protocols (IIIF)
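For readers unfamiliar with IIIF: the IIIF Image API addresses a region, size, rotation, quality and format of an image directly in the URL. A minimal sketch building such URLs; the server base URL and identifier are hypothetical, and the default size `max` follows version 3.0 of the Image API (older 2.x servers may expect `full`).

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}."""
    # The identifier is percent-encoded, including any "/" it contains.
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


# Hypothetical image server and identifier.
BASE = "https://iiif.example.org/iiif/3"
print(iiif_image_url(BASE, "glass-slides/tasman_glacier"))
# A 400-pixel-wide rendering of a region given as x,y,w,h in pixels.
print(iiif_image_url(BASE, "glass-slides/tasman_glacier", region="0,0,1000,800", size="400,"))
```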
How do Europeana's data and services meet the FAIR requirements?
Re-usable
● The conditions for re-using digitized content are made clear, using shared vocabularies (Creative Commons, RightsStatements.org)
● Metadata is fully open – CC0
● Data model seeks to bridge with other communities’ models, such as W3C Web Annotation, Schema.org
Is it perfect?
No, but we hope it's better than if we didn't exist!
Want to engage?
Reminder: Europeana Research calls for action
• Europeana Research Grants for events that bring together cultural heritage professionals and researchers
  • Theme: Digital Cultural Heritage for Open Science
  • Deadline: 31 October 2019. https://bit.ly/31ua48f
• Survey to better understand researchers’ needs
  • Deadline: 31 October 2019. https://bit.ly/2Me0odt
Join the Europeana Network and (one of) its communities!
https://pro.europeana.eu
@antoine_isaac