RTÉ Content Discovery Project - Christophe Debruyne
Presentation to the Metadata Developer Network Workshop 2014 (MDN Workshop 2014), 4th of June 2014, Geneva, Switzerland.
![Page 1: RTÉ Content Discovery Project - Christophe Debruyne](https://reader036.vdocuments.us/reader036/viewer/2022081400/55504d81b4c90580748b52c1/html5/thumbnails/1.jpg)
RTÉ Content Discovery Project
Christophe Debruyne
[email protected]
MDN Workshop -- 4th of June 2014
![Page 2: RTÉ Content Discovery Project - Christophe Debruyne](https://reader036.vdocuments.us/reader036/viewer/2022081400/55504d81b4c90580748b52c1/html5/thumbnails/2.jpg)
Outline
• Context
• Goal and Challenges of the RTÉ Content Discovery Project
• Tasks and Data Annotation
• EBU Core – Identification of problems
• Addressing the issues
• Using the ontology
• Conclusions and Recommendations
![Page 3: RTÉ Content Discovery Project - Christophe Debruyne](https://reader036.vdocuments.us/reader036/viewer/2022081400/55504d81b4c90580748b52c1/html5/thumbnails/3.jpg)
Context
RTÉ, Ireland's National Television and Radio Broadcaster
National trusted digital repository for Ireland's social and cultural data.
Centre for Data Analytics
Documents
Television
Radio
Stills
Linking and preserving data held by Irish Institutions with central internet access point.
• Standards
• Cataloguing
• Archiving
• Preservation
• Insight @ NUIG = DERI
• Semantic Technologies
• Linked Data
• Data Analytics Platform
![Page 4: RTÉ Content Discovery Project - Christophe Debruyne](https://reader036.vdocuments.us/reader036/viewer/2022081400/55504d81b4c90580748b52c1/html5/thumbnails/4.jpg)
Goal of the RTÉ Content Discovery Project
• Discover implicit knowledge
  • across the different archives
  • and the Web of Data
• To facilitate internal workflows (e.g., search)
• For wider reuse and repackaging RTÉ’s information
• Challenges
  • Heterogeneous databases
  • Different guidelines and practices
  • Legacy data (from previous systems)
  • …
Documents Television Radio Stills
“Linking Open Data cloud diagram, by R. Cyganiak and A. Jentzsch. http://lod-cloud.net/”
![Page 5: RTÉ Content Discovery Project - Christophe Debruyne](https://reader036.vdocuments.us/reader036/viewer/2022081400/55504d81b4c90580748b52c1/html5/thumbnails/5.jpg)
Part of a wider ambition …
![Page 6: RTÉ Content Discovery Project - Christophe Debruyne](https://reader036.vdocuments.us/reader036/viewer/2022081400/55504d81b4c90580748b52c1/html5/thumbnails/6.jpg)
OUTCOMES FOR RTÉ
![Page 7: RTÉ Content Discovery Project - Christophe Debruyne](https://reader036.vdocuments.us/reader036/viewer/2022081400/55504d81b4c90580748b52c1/html5/thumbnails/7.jpg)
RTÉ Content Discovery
Documents Television Radio Stills
• In this presentation we focus on Television and Radio archives
• The Television and Radio archives
  • Are maintained on two different instances of the same system
  • A system that is EBU Core “compatible”
  • Different content, different guidelines, …
![Page 8: RTÉ Content Discovery Project - Christophe Debruyne](https://reader036.vdocuments.us/reader036/viewer/2022081400/55504d81b4c90580748b52c1/html5/thumbnails/8.jpg)
Three main tasks
• Annotate the data.
  • Using relevant standards, ontologies and vocabularies.
  • Resource Description Framework (RDF).
• Obtain an integrated view of the different archives by creating links between the RDF representations of RTÉ’s archival assets across the different archives.
• Apply advanced methods for discovering related data for a given subject in external sources such as the Linked Data Cloud.
![Page 9: RTÉ Content Discovery Project - Christophe Debruyne](https://reader036.vdocuments.us/reader036/viewer/2022081400/55504d81b4c90580748b52c1/html5/thumbnails/9.jpg)
Data annotation
Television, Radio
Relational Database → D2RQ → RDF Dump → Triplestore
• Map symbols of database to predicates (relations and concepts) in chosen ontologies/vocabularies
• Use D2RQ to generate RDF dump
• Store RDF dump in adequate triple store (Jena TDB)
Which ontologies?
• Dublin Core, DC Terms
• FOAF
• EBU Core OWL
• …
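The mapping step can be illustrated with a minimal sketch. It is not D2RQ's mapping language and not the project's actual mapping: the table columns (`id`, `title`, `summary`), the `http://example.org/archive/` namespace, and the helper names are all hypothetical, chosen only to show how database symbols might be tied to Dublin Core predicates and serialised as N-Triples.

```python
# Minimal sketch of the mapping step: database rows -> RDF triples (N-Triples).
# All column names, URIs and helper names are hypothetical, for illustration only.

DCTERMS = "http://purl.org/dc/terms/"
BASE = "http://example.org/archive/"  # placeholder namespace, not RTÉ's

# Hypothetical mapping from database columns to predicates.
COLUMN_TO_PREDICATE = {
    "title": DCTERMS + "title",
    "summary": DCTERMS + "description",
}


def escape_literal(value):
    """Escape a string for use inside an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def row_to_triples(archive, row):
    """Turn one database row (a dict) into a list of N-Triples lines."""
    subject = "<%s%s/%s>" % (BASE, archive, row["id"])
    triples = []
    for column, predicate in COLUMN_TO_PREDICATE.items():
        value = row.get(column)
        if value is None:  # skip NULL or missing columns
            continue
        triples.append(
            '%s <%s> "%s" .' % (subject, predicate, escape_literal(value))
        )
    return triples


if __name__ == "__main__":
    example_row = {"id": 1, "title": "Evening news", "summary": None}
    for line in row_to_triples("television", example_row):
        print(line)
```

In the workflow on the slide, D2RQ would produce such a dump from its own mapping file, and the dump would then be loaded into the triple store.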
![Page 10: RTÉ Content Discovery Project - Christophe Debruyne](https://reader036.vdocuments.us/reader036/viewer/2022081400/55504d81b4c90580748b52c1/html5/thumbnails/10.jpg)
EBU Core OWL
• The RTÉ Content Discovery platform will rely on Semantic Web technologies to reason. Ontologies will therefore need to be correct.
• But … while adopting the EBU Core OWL ontology, several problems were identified.
• We contacted EBU to resolve these issues.
• We provide an overview of some of these problems.
![Page 11: RTÉ Content Discovery Project - Christophe Debruyne](https://reader036.vdocuments.us/reader036/viewer/2022081400/55504d81b4c90580748b52c1/html5/thumbnails/11.jpg)
Problems
• (1) Forgotten concept unions
  • The property ebucore:description has multiple domain axioms.
    <rdfs:domain rdf:resource="&ebu;BusinessObject"/>
    <rdfs:domain rdf:resource="&ebu;MediaResource"/>
  • Multiple domain axioms are read as an intersection, not a union, so the wrong implicit information can unintentionally be inferred: anything with a description is both a BusinessObject and a MediaResource.
• (2a) Property unsatisfiability – via class axioms
    <owl:Class rdf:about="&ebu;BusinessObject">
    … <owl:disjointWith rdf:resource="&ebu;Resource"/> …
    </owl:Class>
• Because of (1) and (2a), the property description could not be used
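A small sketch of how (1) and (2a) interact. The class names follow the slide; the subclass link from MediaResource to Resource is an assumption (it is not shown on the slide), and the function names are hypothetical. This is a toy check, not an OWL reasoner.

```python
# Sketch: two rdfs:domain axioms plus one disjointness axiom make a property unusable.
# Class names follow the slide; MediaResource being a subclass of Resource is an assumption.

DOMAINS = {"description": {"BusinessObject", "MediaResource"}}
SUBCLASS_OF = {"MediaResource": {"Resource"}}  # assumed, not shown on the slide
DISJOINT = {frozenset({"BusinessObject", "Resource"})}


def superclasses(cls):
    """Return cls together with all of its transitive superclasses."""
    seen, todo = set(), [cls]
    while todo:
        current = todo.pop()
        if current not in seen:
            seen.add(current)
            todo.extend(SUBCLASS_OF.get(current, ()))
    return seen


def inferred_types(prop):
    """Every subject of prop belongs to ALL declared domains (an intersection)."""
    types = set()
    for domain in DOMAINS.get(prop, ()):
        types |= superclasses(domain)
    return types


def clashes(prop):
    """Return the disjoint class pairs that any subject of prop would fall into."""
    types = inferred_types(prop)
    return [sorted(pair) for pair in DISJOINT if pair <= types]


if __name__ == "__main__":
    print(clashes("description"))
```

Under these assumptions, any resource that carries a description is inferred to be both a BusinessObject and a Resource, which the disjointness axiom forbids. Declaring the domain as an explicit union of the two classes would avoid the clash.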
![Page 12: RTÉ Content Discovery Project - Christophe Debruyne](https://reader036.vdocuments.us/reader036/viewer/2022081400/55504d81b4c90580748b52c1/html5/thumbnails/12.jpg)
Problems
• (2b) Property unsatisfiability – role hierarchies and datatypes
  • Duration has the range xsd:string
  • The subproperties of duration have other ranges (e.g., double in the case of duration in edit units)
  • Because each subproperty also inherits the range of the superproperty, every value of such a subproperty must be a string and a double at the same time. This type conflict results in a contradiction.
• With (2a) and (2b) we identified 40 properties that lead to problems.
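The range conflict in (2b) can be sketched in the same toy style. `duration` and its xsd:string range follow the slide; the subproperty identifier `durationEditUnit` is a hypothetical name for "duration in edit units", and the sketch assumes each property has at most one superproperty.

```python
# Sketch: a subproperty inherits the range of its superproperty, so ranges accumulate.
# "duration" follows the slide; "durationEditUnit" is a hypothetical identifier.

RANGES = {"duration": {"xsd:string"}, "durationEditUnit": {"xsd:double"}}
SUBPROPERTY_OF = {"durationEditUnit": "duration"}  # at most one parent assumed

# xsd:string and xsd:double have disjoint value spaces.
DISJOINT_DATATYPES = {frozenset({"xsd:string", "xsd:double"})}


def effective_ranges(prop):
    """Collect the ranges declared on prop and on all of its superproperties."""
    ranges = set()
    while prop is not None:
        ranges |= RANGES.get(prop, set())
        prop = SUBPROPERTY_OF.get(prop)
    return ranges


def is_unsatisfiable(prop):
    """True if values of prop would have to belong to two disjoint datatypes."""
    ranges = effective_ranges(prop)
    return any(pair <= ranges for pair in DISJOINT_DATATYPES)


if __name__ == "__main__":
    for name in ("duration", "durationEditUnit"):
        print(name, sorted(effective_ranges(name)), is_unsatisfiable(name))
```

In this sketch the superproperty itself stays usable; it is the subproperty that ends up with no possible value.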
![Page 13: RTÉ Content Discovery Project - Christophe Debruyne](https://reader036.vdocuments.us/reader036/viewer/2022081400/55504d81b4c90580748b52c1/html5/thumbnails/13.jpg)
Problems
• (3) Inconsistencies between formal and informal definitions
• BusinessObject is defined as: "An image, a document, an annotation […], a tag […], or an audiovisual media resource […]. Other types of BusinessObjects may be defined as subclasses."
• Resource is defined as: "A manifestation of a BusinessObject." and is disjoint with BusinessObject, meaning no individual can be a BusinessObject and a Resource at the same time.
• The domain of title is BusinessObject, yet its definition is: "Specifies the title or name given to the resource. […]"
![Page 14: RTÉ Content Discovery Project - Christophe Debruyne](https://reader036.vdocuments.us/reader036/viewer/2022081400/55504d81b4c90580748b52c1/html5/thumbnails/14.jpg)
Problems
• (4) User readable labels
  • Many different properties have the same human readable label, which could confuse the end user – e.g., when generating an interface.
  • E.g., there were 11 properties with the label “Name”
  • Some properties had empty labels
• (5) Roles – Loss of context
  • Agents were related to Business Objects (BO)
  • Agents were related to a Role
  • But … a role did not relate to agents in relationship with a BO
  • This led to a loss of context.
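A check for the label problems in (4) could look like the sketch below. The property identifiers and labels are made up for illustration (the real ontology had 11 properties labelled “Name”), and `label_report` is a hypothetical helper, not part of any EBU tooling.

```python
from collections import defaultdict

# Hypothetical (property, label) pairs, for illustration only.
LABELS = {
    "ex:agentName": "Name",
    "ex:locationName": "Name",
    "ex:title": "Title",
    "ex:note": "",
}


def label_report(labels):
    """Return (labels shared by several properties, properties with empty labels)."""
    by_label = defaultdict(list)
    empty = []
    for prop, label in sorted(labels.items()):
        text = label.strip()
        if text:
            # Compare case-insensitively: "Name" and "name" look alike to a user.
            by_label[text.lower()].append(prop)
        else:
            empty.append(prop)
    duplicates = {
        label: props for label, props in by_label.items() if len(props) > 1
    }
    return duplicates, empty


if __name__ == "__main__":
    shared, missing = label_report(LABELS)
    print("shared labels:", shared)
    print("empty labels:", missing)
```

Running such a report before generating an interface would flag the properties an end user could not tell apart.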
![Page 15: RTÉ Content Discovery Project - Christophe Debruyne](https://reader036.vdocuments.us/reader036/viewer/2022081400/55504d81b4c90580748b52c1/html5/thumbnails/15.jpg)
Addressing the issues
• Problems were addressed over email.
• The discussions are “lost”, traces are only known to us …
• The ontology-engineering activities of EBU Core should adopt appropriate methods and tools for collaboration.
  • Participation of others
  • Traceability (!)
• The ontology is still being developed as we go along, and we have been able to make (parts of it) work…
![Page 16: RTÉ Content Discovery Project - Christophe Debruyne](https://reader036.vdocuments.us/reader036/viewer/2022081400/55504d81b4c90580748b52c1/html5/thumbnails/16.jpg)
Using the ontology
![Page 17: RTÉ Content Discovery Project - Christophe Debruyne](https://reader036.vdocuments.us/reader036/viewer/2022081400/55504d81b4c90580748b52c1/html5/thumbnails/17.jpg)
![Page 18: RTÉ Content Discovery Project - Christophe Debruyne](https://reader036.vdocuments.us/reader036/viewer/2022081400/55504d81b4c90580748b52c1/html5/thumbnails/18.jpg)
Conclusions and Recommendations
• RTÉ Archives aims at a wider reuse and repackaging of their archival content on digital platforms through the innovative use of Semantic and Linked Data technologies.
• We adopted the EBU Core OWL ontology for annotating the television and radio archives, yet identified some issues.
• We collaborated on resolving those issues together with EBU
• However, we feel that appropriate collaborative methods and tools should be adopted to facilitate the ontology-engineering process and – more importantly – enable others to participate AND have visible traceability of the decisions.
![Page 19: RTÉ Content Discovery Project - Christophe Debruyne](https://reader036.vdocuments.us/reader036/viewer/2022081400/55504d81b4c90580748b52c1/html5/thumbnails/19.jpg)
References
• D2RQ, http://d2rq.org/
• Digital Repository of Ireland, http://www.dri.ie/
• Insight, http://www.insight-centre.org/
• Jena TDB, http://jena.apache.org/documentation/tdb/
• RTÉ Archives, http://www.rte.ie/archives