TIB|AV-Portal (icsti.org)
German National Library of Science and Technology (TIB)
• National library for all areas of engineering as well as architecture, chemistry, computer science, mathematics and physics
• Financed by the German Federal Government and all Federal States
• The TIB portal for science and technology offers more than 160 million data sets from professional databases, publishers' services and library catalogues
• Access to full texts, research data, audiovisual media and 3D models
• DOI registration agency for research data since 2005
Source: TIB
The significance of non-textual materials such as videos and 3D objects in research and teaching is steadily increasing.
Competence Centre for non-textual materials
The goal of the competence centre is to improve access to and use of non-textual material.
Scope
• Media-specific portal for scientific videos
• Automated video analysis with scene, speech, text and image recognition
• Cross-lingual retrieval (English/German)
• Persistent identifiers (DOI) and Media Fragment Identifiers (MFID)
Target Group
• Researchers, university staff, students
Content
• Research and teaching films covering topics from science and technology
• 2,500 videos / 900 hours (Jan. 2015)
• Most of it under open access licences
www.av.getinfo.de
Competence Centre for non-textual materials: TIB|AV-Portal
What is meant by scientific videos?
• Recordings of conferences and other scientific lectures, talks and discussions
• Recordings of experiments from the area of research and development
• Microscopic images, images taken using special cameras and techniques
• Documentation and presentation of research work and results
• Presentations of 3D models (such as architectural models) on film
• Recordings of university seminars and lectures
• eLearning material such as online tutorials, MOOCs, teaching experiments and experiments for private study
• Presentations of scientific organisations
• Documentaries, reports and portraits
Scientific Videos
Scientific videos are a valuable source
- to illustrate and share knowledge about findings, methodologies or procedures within the scientific community
- to make, e.g., lab experiments transparent and reproducible
- to visualize temporal components, e.g. by using zooming or stretching techniques
User needs
• High-quality content
• Free access to and use of videos
• Advanced findability on a permanent basis
• Citation of videos and video segments
• Links to videos and video segments
• Interlinkage with further research data
Video Portals
Can YouTube manage this? NO
• You can't find what you are looking for if it's not in the title
• You can't find a specific video without the URL
• It is unclear how long the content will remain available
• You can only cite the whole video, not a segment
• You can only link to the video as a whole, not to a segment
TIB Services for audiovisual media
• Access
• Citability
• Permanence
• Long-term preservation
• Searchability
• Curation
• Cataloguing
• Licensing quality content
• Generating recognition for the creators
1. Automatic Video Analysis of the TIB|AV-Portal
Skorupka, Sascha: Experiment der Woche, 2012
1. Spoken Language
2. Visual Concepts
3. Text Overlays
4. Structural Information (Video Segments)
2. Automatic Indexing of Videos on the Basis of the GND
• Text analysis: OCR of text overlays (e.g. on slides) → OCR transcript
• Audio analysis: speech-to-text → audio transcript
• Named Entity Recognition
- Reference vocabulary: subject headings of the Gemeinsame Normdatei (GND = Integrated Authority File)
- Analyzed text content: OCR transcript and audio transcript
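The NER step matches transcript text against GND subject headings. Below is a minimal sketch of that matching, assuming a simple dictionary (gazetteer) lookup of single-word labels. The `GND_LABELS` excerpt and all function names are hypothetical, and these slides do not describe how the portal's actual NER works.

```python
import re

# Hypothetical excerpt of the GND reference vocabulary: lowercased label -> GND URI.
# The URIs appear on these slides; a real vocabulary would hold all 63,356 headings.
GND_LABELS = {
    "wind": "http://d-nb.info/gnd/4066257-3",
    "winde": "http://d-nb.info/gnd/4742659-7",
    "advent": "http://d-nb.info/gnd/4000537-9",
}


def tokenize(text: str) -> list[str]:
    """Lowercased word tokens; Unicode word characters include umlauts."""
    return re.findall(r"\w+", text.lower())


def find_subject_headings(ocr_transcript: str, audio_transcript: str) -> dict[str, set[str]]:
    """Map each matched GND URI to the transcript tokens that triggered it.

    Only exact, single-word label matches are found (hypothetical simplification).
    """
    hits: dict[str, set[str]] = {}
    for token in tokenize(ocr_transcript) + tokenize(audio_transcript):
        uri = GND_LABELS.get(token)
        if uri is not None:
            hits.setdefault(uri, set()).add(token)
    return hits


# Usage: a slide text (OCR) plus part of the audio transcript quoted in section 5.
print(find_subject_headings("Advent", "Die Winde arbeitet nach dem Prinzip des Wellrads"))
```

Exact matching cannot tell whether "Winde" means a windlass or is the plural of "Wind"; resolving that needs context information, which is the problem addressed in section 5.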
Automatic Indexing on the Segment Level
• The individual segments of the video are automatically indexed with GND subject headings.
→ Pinpoint, segment-based searches within the videos
Skorupka, Sascha: Experiment der Woche, 2011
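Segment-level indexing can be pictured as an inverted index from GND subject headings to time ranges, with each hit returned as a link to the segment itself. This is a sketch under stated assumptions: the `Segment`/`SegmentIndex` names and the example video URL are hypothetical, and the links use the W3C Media Fragments URI temporal syntax (`#t=start,end`) as a stand-in, since the slides do not specify the portal's MFID format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    video_url: str  # hypothetical video URL
    start: float    # segment start in seconds
    end: float      # segment end in seconds

    def fragment_uri(self) -> str:
        # Temporal fragment in W3C Media Fragments URI syntax: #t=start,end (seconds)
        return f"{self.video_url}#t={self.start:g},{self.end:g}"


class SegmentIndex:
    """Inverted index: GND URI -> video segments indexed with that subject heading."""

    def __init__(self) -> None:
        self._postings: defaultdict[str, list[Segment]] = defaultdict(list)

    def add(self, segment: Segment, gnd_uris: set[str]) -> None:
        for uri in gnd_uris:
            self._postings[uri].append(segment)

    def search(self, gnd_uri: str) -> list[str]:
        """Fragment URIs of all segments indexed with `gnd_uri` (empty list if none)."""
        return [seg.fragment_uri() for seg in self._postings.get(gnd_uri, [])]


index = SegmentIndex()
video = "https://example.org/video/experiment"  # hypothetical
index.add(Segment(video, 0, 12), {"http://d-nb.info/gnd/4066257-3"})
index.add(Segment(video, 12, 47.5), {"http://d-nb.info/gnd/4742659-7"})
print(index.search("http://d-nb.info/gnd/4742659-7"))
# ['https://example.org/video/experiment#t=12,47.5']
```

A search for a subject heading then returns links that start playback at the matching segment, rather than at the beginning of the whole video.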
Problem: English Labels are Lacking in the GND
• The GND contains very few English labels for the subject headings used in the AV Portal knowledge base: 63,356 subject headings from the field of science and technology.¹
→ For the automatic indexing of English-language videos, an English indexing vocabulary is lacking.
→ No segment-based queries with subject headings are possible within English-language videos.
¹ In 2013, the results of the MACS project had not yet been published in the GND dump.
3. Mapping the GND Entities onto DBpedia and other Authority Files
Solving the Problem
• Gaining English labels for the GND entities of the AV Portal knowledge base (63,356 subject headings) by mapping the GND entities onto DBpedia and other authority files (LCSH, MACS results and the WTI Thesaurus)
DBpedia
• DBpedia contains structured information extracted from Wikipedia (infoboxes, tables, web links). This information is made available as Linked Data.
Mapping the GND Entities onto DBpedia²
Example: http://d-nb.info/gnd/4000537-9
1. Labels (German labels of the GND record): Advent, Vorweihnachtszeit, Adventszeit, Vorweihnachtsfest
2. Find DBpedia candidate(s): dbpde:Advent, dbpde:The_Advent, dbpde:Advent_Cornwall, dbpde:Advent_Creek, dbpde:Advent_Computers, dbpde:Advent_(Band), dbpde:Advent_(publisher), dbpde:Advent_Records, dbpde:Advent_International
3. Disambiguate using context: dbpde:Advent
4. Follow the language link: dbp:Advent
² Figure: Steinmetz, N. / Sack, H.: Cross-Lingual Semantic Mapping of Authority Files. SWIB 2013, Hamburg. Slide no. 36.
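The figure's four steps (labels, candidates, disambiguation, language link) can be sketched on toy data. This is a simplified illustration, not the algorithm of Steinmetz and Sack: candidates are found by exact label match (so the broader candidate list in the figure, such as dbpde:The_Advent, would not be produced), and disambiguation simply counts shared context terms. The context term sets and the `DBPEDIA_DE` slice are hypothetical stand-ins for data a real implementation would query from the GND and DBpedia.

```python
from typing import Optional

# Hypothetical toy data modelled on the "Advent" example; context terms are invented.
GND_ENTITY = {
    "uri": "http://d-nb.info/gnd/4000537-9",
    "labels": ["Advent", "Vorweihnachtszeit", "Adventszeit", "Vorweihnachtsfest"],
    "context": {"kirchenjahr", "weihnachten", "christentum"},
}

# Hypothetical German DBpedia slice: resource -> (label, context terms, English language link)
DBPEDIA_DE = {
    "dbpde:Advent": ("Advent", {"kirchenjahr", "weihnachten", "adventskranz"}, "dbp:Advent"),
    "dbpde:Advent_(Band)": ("Advent", {"band", "album", "musik"}, None),
    "dbpde:Advent_Computers": ("Advent", {"computer", "hersteller"}, None),
}


def find_candidates(labels: list[str], dbpedia: dict) -> list[str]:
    """Step 2: resources whose label equals any GND label (case-insensitive)."""
    wanted = {label.lower() for label in labels}
    return [res for res, (label, _, _) in dbpedia.items() if label.lower() in wanted]


def disambiguate(candidates: list[str], context: set[str], dbpedia: dict) -> Optional[str]:
    """Step 3: keep the candidate sharing most context terms; None if nothing overlaps."""
    best, best_score = None, 0
    for res in candidates:
        score = len(context & dbpedia[res][1])
        if score > best_score:
            best, best_score = res, score
    return best


def english_resource(entity: dict, dbpedia: dict) -> Optional[str]:
    """Step 4: follow the language link of the chosen German resource, if any."""
    match = disambiguate(find_candidates(entity["labels"], dbpedia), entity["context"], dbpedia)
    return dbpedia[match][2] if match is not None else None


print(english_resource(GND_ENTITY, DBPEDIA_DE))  # dbp:Advent
```

The English DBpedia resource's label then serves as the English label for the GND heading.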
4. Results: English Labels
GND subject headings | German DBpedia | English DBpedia
63,356 | 35,638 (56%) | 28,691 (45%)
• 28,691 GND subject headings could be "translated" into English by means of the GND/DBpedia mapping.
• Overall result of all mapping strategies (DBpedia, LCSH, MACS, WTI Thesaurus):
• 35,025 (55%) GND subject headings received at least one English label.
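The overall figure counts a GND heading once if any strategy supplies an English label, so it is a set union rather than a sum of per-strategy counts. A small sketch of both the union and the percentage arithmetic follows; the per-strategy mappings in the example are hypothetical, while the totals are the ones reported on this slide.

```python
def covered_headings(labels_by_strategy: dict[str, dict[str, set[str]]]) -> set[str]:
    """GND IDs that got at least one English label from any strategy (set union)."""
    covered: set[str] = set()
    for mapping in labels_by_strategy.values():
        covered |= {gnd_id for gnd_id, labels in mapping.items() if labels}
    return covered


def percent(part: int, total: int) -> int:
    """Whole-number percentage, as used on the results slide."""
    return round(100 * part / total)


# Hypothetical toy mappings: the same heading found by two strategies counts once.
toy = {
    "DBpedia": {"4000537-9": {"Advent"}},
    "LCSH": {"4000537-9": {"Advent"}, "4742659-7": {"windlass"}},
    "WTI Thesaurus": {"4066257-3": set()},
}
print(sorted(covered_headings(toy)))  # ['4000537-9', '4742659-7']

# Reported totals from the slide
TOTAL = 63_356
print(percent(35_638, TOTAL), percent(28_691, TOTAL), percent(35_025, TOTAL))  # 56 45 55
```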
Quality of the Results of the Mapping Algorithm
• Ground truth (evaluation data): 180 manual mappings of GND entities of the AV Portal knowledge base to the corresponding German DBpedia entities
• Recall: the algorithm mapped 90% of the GND entities of the ground truth.
• Precision: 92% of the mapped GND entities were mapped correctly.
(Figure: GND → German DBpedia)
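Read literally, the two measures on this slide are coverage (how many ground-truth entities received any mapping) and precision over the mapped entities. Below is a sketch of that evaluation, assuming both the ground truth and the algorithm output are dictionaries from GND ID to German DBpedia resource; the function name and the toy data are hypothetical.

```python
def evaluate(ground_truth: dict[str, str], predicted: dict[str, str]) -> tuple[float, float]:
    """Return (recall, precision) as defined on the slide.

    recall:    share of ground-truth GND entities that the algorithm mapped at all
    precision: share of those mapped entities whose target matches the ground truth
    """
    mapped = [gnd for gnd in ground_truth if gnd in predicted]
    correct = [gnd for gnd in mapped if predicted[gnd] == ground_truth[gnd]]
    recall = len(mapped) / len(ground_truth)
    precision = len(correct) / len(mapped) if mapped else 0.0
    return recall, precision


# Hypothetical toy data (the real ground truth has 180 manual mappings).
gold = {"g1": "r1", "g2": "r2", "g3": "r3", "g4": "r4"}
pred = {"g1": "r1", "g2": "r2", "g3": "r9"}
print(evaluate(gold, pred))  # (0.75, 0.6666666666666666)
```

Computing precision over mapped entities only keeps the two numbers independent: an algorithm that maps fewer entities lowers recall without automatically lowering precision.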
5. Additional Benefit of the GND/DBpedia Mapping
Problem
• The GND contains only a small amount of context information.
→ Often, GND entities cannot be reliably disambiguated during Named Entity Recognition.
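Disambiguation of GND Entities
Audio transcript: "… Die Winde arbeitet nach dem Prinzip des Wellrads, kombiniert mit einem Hebel ..." ("… The windlass works on the principle of the wheel and axle, combined with a lever ...")
Which GND entity should "Winde" be indexed with: <wind> or <windlass>?
• Wind (http://d-nb.info/gnd/4066257-3)
Context information: Luftströmung (air current), Meteorologie (meteorology), Klimatologie (climatology), Hochatmosphäre (upper atmosphere), Magnetosphäre (magnetosphere)
• Winde, i.e. windlass (http://d-nb.info/gnd/4742659-7)
Context information: Hebezeug (hoist), Hebel (lever), Wellrad (wheel and axle), Fahrzeugbau (vehicle construction), Fördertechnik (materials handling)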
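One simple way to use such context information is to count how many of each candidate's context terms appear in the transcript and pick the highest-scoring entity. The sketch below applies this overlap heuristic to the "Winde" example; it is an illustrative assumption, not the portal's NER, and it uses prefix matching so that the inflected form "Wellrads" still matches "Wellrad".

```python
import re
from typing import Optional

# Context terms of the two candidate GND entities, as listed on the slide.
CANDIDATES = {
    "http://d-nb.info/gnd/4066257-3": {  # Wind
        "luftströmung", "meteorologie", "klimatologie", "hochatmosphäre", "magnetosphäre",
    },
    "http://d-nb.info/gnd/4742659-7": {  # Winde (windlass)
        "hebezeug", "hebel", "wellrad", "fahrzeugbau", "fördertechnik",
    },
}


def context_score(transcript: str, context_terms: set[str]) -> int:
    """Number of context terms that occur as a prefix of some transcript word."""
    words = re.findall(r"\w+", transcript.lower())
    return sum(any(word.startswith(term) for word in words) for term in context_terms)


def disambiguate(transcript: str, candidates: dict[str, set[str]]) -> Optional[str]:
    """Candidate with the highest context overlap, or None if no term matches."""
    scores = {uri: context_score(transcript, terms) for uri, terms in candidates.items()}
    best = max(scores, key=scores.__getitem__)
    return best if scores[best] > 0 else None


transcript = "… Die Winde arbeitet nach dem Prinzip des Wellrads, kombiniert mit einem Hebel ..."
print(disambiguate(transcript, CANDIDATES))  # http://d-nb.info/gnd/4742659-7
```

With only a handful of GND context terms per entity, many transcripts match none of them, which is why richer context (such as the DBpedia WikiLinks discussed next) helps.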
Mapping the GND Entities onto DBpedia
Solving the Problem
• Extracting context information from DBpedia for the GND entities of the AV Portal knowledge base
→ Improving the disambiguation of GND entities
Context Information from DBpedia
• WikiLinks (internal Wikipedia links) of the DBpedia entities
• Wikipedia articles to which the DBpedia entities refer
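The WikiLinks can be retrieved from a DBpedia SPARQL endpoint. The sketch below only builds the query and the request URL. It assumes the `dbo:wikiPageWikiLink` property and the public endpoints listed in the code, both of which may change, so check the current DBpedia documentation; the function names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed public endpoints; availability and loaded datasets may change over time.
ENDPOINTS = {
    "de": "http://de.dbpedia.org/sparql",
    "en": "https://dbpedia.org/sparql",
}


def wikilinks_query(resource_uri: str, limit: int = 100) -> str:
    """SPARQL query for the internal Wikipedia links (dbo:wikiPageWikiLink) of one resource."""
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT ?linked WHERE {\n"
        f"  <{resource_uri}> dbo:wikiPageWikiLink ?linked .\n"
        f"}} LIMIT {limit}"
    )


def request_url(lang: str, resource_uri: str) -> str:
    """HTTP GET URL asking the endpoint for SPARQL JSON results."""
    params = {
        "query": wikilinks_query(resource_uri),
        "format": "application/sparql-results+json",
    }
    return f"{ENDPOINTS[lang]}?{urlencode(params)}"


print(request_url("de", "http://de.dbpedia.org/resource/Advent"))
```

Sending the request (e.g. with `urllib.request.urlopen`) and reading the JSON bindings is left out so that the sketch runs offline; the linked resources' labels would then serve as context terms for disambiguation.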
Results: Context Information
GND subject headings | German DBpedia | English DBpedia
63,356 | 35,638 (56%) | 28,691 (45%)
• For 35,638 GND subject headings, additional context information could be extracted from the German DBpedia (WikiLinks, Wikipedia articles)
→ Improving the disambiguation of the German-language NER
• For 28,691 GND subject headings, additional context information could be extracted from the English DBpedia (WikiLinks, Wikipedia articles)
→ Improving the disambiguation of the English-language NER
Next steps
• Improve search options
• Make results from video analysis editable
• Linked Open Data service via a SPARQL endpoint
• Enrich the TIB|AV-Portal with Linked Open Data
• Deal with videos as add-ons and with video abstracts
• Provide video metrics and statistics