Margret Plank 19th of January 2015 TACC Meeting TIB|AV-Portal



German National Library of Science and Technology (TIB)

German National Library of Science and Technology – for all areas of engineering as well as architecture, chemistry, computer science, mathematics and physics

Financed by German Federal Government and all Federal States

TIB portal for science and technology offers more than 160 million data sets from professional databases, publishers' services and library catalogues

Access to full texts, research data, audiovisual media and 3D models

DOI registration agency for research data since 2005

Source: TIB

Competence Centre for non-textual materials

The significance of non-textual materials such as videos and 3D objects in research and teaching is steadily increasing.

The goal of the competence centre is to improve access to and the use of non-textual materials.

Competence Centre for non-textual materials: TIB|AV-Portal

Scope
• Media-specific portal for scientific videos
• Automated video analysis with scene, speech, text and image recognition
• Cross-lingual retrieval (English / German)
• Persistent identifiers (DOI) and media fragment identifiers (MFID)

Target Group
• Researchers, university staff, students

Content
• Research and teaching films covering topics from science and technology
• 2500 videos / 900 hours (Jan. 2015)
• Most of it under open access licences

www.av.getinfo.de
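A link to a video segment, as implied by the media fragment identifiers listed under Scope, could look roughly like the sketch below. It assumes the temporal syntax of the W3C Media Fragments URI recommendation (`#t=start,end`, in seconds); whether the portal uses exactly this form is not stated on the slide, and both the base URL and the function name `segment_link` are hypothetical.

```python
def _seconds(value: float) -> str:
    """Format seconds without trailing zeros, e.g. 90.0 -> '90', 135.5 -> '135.5'."""
    return f"{value:.3f}".rstrip("0").rstrip(".")


def segment_link(base_url: str, start_s: float, end_s: float) -> str:
    """Append a temporal media fragment (#t=start,end) to a video URL."""
    if start_s < 0 or end_s <= start_s:
        raise ValueError("need 0 <= start < end")
    return f"{base_url}#t={_seconds(start_s)},{_seconds(end_s)}"


# Hypothetical landing-page URL; a DOI resolver URL would be extended the same way.
link = segment_link("https://example.org/media/1234", 90, 135.5)
print(link)
```

A citation can then point at the 90 s to 135.5 s segment instead of the whole video.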

What is meant by scientific videos?

• Recordings of conferences and other scientific lectures, talks and discussions
• Recordings of experiments from the area of research and development
• Microscopic images, images taken using special cameras and techniques
• Documentation and presentation of research work and results
• Presentations of 3D models (such as architectural models) on film
• Recordings of university seminars and lectures
• eLearning material such as online tutorials, MOOCs, teaching experiments and experiments for private study
• Presentations of scientific organisations
• Documentaries, reports and portraits

Scientific Videos

Scientific videos are a valuable source
- to illustrate and share knowledge concerning findings, methodologies or procedures within the scientific community
- to make e.g. lab experiments transparent and reproducible
- to visualize temporal components, e.g. by using zooming or stretching techniques

User needs
• High quality content
• Free access and usage of videos
• Advanced findability on a permanent basis
• Citation of videos and video segments
• Links to videos and video segments
• Interlinkage with further research data

Video Portals

Can YouTube manage this? NO

• You can't find what you are looking for if it's not in the title
• You can't find a specific video without the URL
• Unclear how long the content will be there
• You can only cite the whole video, not a segment
• You can only link to the title, not to a segment

Video Portals

Can other video portals manage this? To some extent.

TIB Services for audiovisual media

• Access
• Citability
• Permanence
• Long term preservation
• Searchability
• Curation
• Cataloging
• Licencing quality content
• Generate recognition for the creators


TIB|AV-Portal

1. Automatic Video Analysis of the TIB|AV-Portal

1. Spoken Language
2. Visual Concepts
3. Text Overlays
4. Structural Information (Video Segments)

Source: Skorupka, Sascha: Experiment der Woche, 2012

2. Automatic Indexing of Videos on the Basis of the GND

• Text Analysis: OCR analysis of text overlays (e.g. on slides) → OCR transcript
• Audio Analysis: speech to text → audio transcript
• Named Entity Recognition
  - Reference Vocabulary: subject headings of the Gemeinsame Normdatei (GND = Integrated Authority File)
  - Analyzed Text Content: OCR transcript and audio transcript
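The named entity recognition step above can be pictured as a lookup of transcript tokens in a reference vocabulary. The sketch below is only an illustration under that assumption: the slides do not describe the actual NER implementation, the label set is a tiny hand-made excerpt (the two GND URIs appear elsewhere in this talk), and both transcripts are invented.

```python
import re

# Hypothetical excerpt of the reference vocabulary: lower-cased label -> GND URI.
VOCABULARY = {
    "advent": "http://d-nb.info/gnd/4000537-9",
    "adventszeit": "http://d-nb.info/gnd/4000537-9",
    "wind": "http://d-nb.info/gnd/4066257-3",
}


def recognize_entities(text: str, vocabulary: dict[str, str]) -> set[str]:
    """Return the GND URIs whose labels occur as whole words in the text."""
    tokens = re.findall(r"\w+", text.lower())
    return {vocabulary[token] for token in tokens if token in vocabulary}


# Invented transcripts standing in for the OCR and speech-to-text output.
ocr_transcript = "Experiment der Woche: Wind und Wetter"
audio_transcript = "Im Advent weht oft ein kalter Wind"

found = recognize_entities(ocr_transcript + " " + audio_transcript, VOCABULARY)
print(sorted(found))
```

Exact token matching ignores inflected forms and ambiguous labels, which is the disambiguation problem the later slides address.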

Automatic Indexing on the Segment Level

• The individual segments of the video are automatically indexed with GND subject headings.

→ Pinpoint, segment-based searches within the videos

Source: Skorupka, Sascha: Experiment der Woche, 2011
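Segment-level indexing can be thought of as an inverted index from subject heading to video segments. The following is a minimal sketch of that idea, not the portal's data model: the `Segment` class, the video identifiers, the time spans and the assigned headings are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    video_id: str  # hypothetical identifier
    start_s: int   # segment start in seconds
    end_s: int     # segment end in seconds


def build_index(annotations):
    """Map each subject heading to the segments that were indexed with it."""
    index = defaultdict(list)
    for segment, headings in annotations:
        for heading in headings:
            index[heading].append(segment)
    return index


# Invented annotations: (segment, subject headings assigned to it).
annotations = [
    (Segment("video-1", 0, 60), {"Wind"}),
    (Segment("video-1", 60, 150), {"Winde", "Hebel"}),
    (Segment("video-2", 30, 90), {"Hebel"}),
]

index = build_index(annotations)
hits = index["Hebel"]
for hit in hits:
    print(hit.video_id, hit.start_s, hit.end_s)
```

A query for one heading returns the matching segments directly, so a player can jump to the start time rather than to the beginning of the video.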

Problem: English Labels are Lacking in the GND

• The GND contains only very few English labels for the subject headings used in the AV Portal knowledge base: 63,356 subject headings from the field of science and technology.¹

→ For the automatic indexing of English-language videos, an English indexing vocabulary is lacking

→ No segment-based queries within English-language videos using subject headings

¹ In 2013, the results of the MACS project had not yet been published in the GND dump.

3. Mapping the GND Entities onto the DBpedia and other Authority Files

Solving the Problem
• Gaining English labels for the GND entities of the AV Portal knowledge base (63,356 subject headings) by mapping the GND entities onto the DBpedia and other authority files (LCSH, MACS results, and the WTI Thesaurus)

DBpedia
• The DBpedia contains structured information from the Wikipedia (info boxes, tables, web links). This information is made available as Linked Data.

Mapping the GND Entities onto the DBpedia²

Example: http://d-nb.info/gnd/4000537-9

1. Labels: Advent, Vorweihnachtszeit, Adventszeit, Vorweihnachtsfest
2. Find DBpedia candidate(s): dbpde:Advent, dbpde:The_Advent, dbpde:Advent_Cornwall, dbpde:Advent_Creek, dbpde:Advent_Computers, dbpde:Advent_(Band), dbpde:Advent_(publisher), dbpde:Advent_Records, dbpde:Advent_International
3. Disambiguate (using context): dbpde:Advent
4. Language link: dbp:Advent

² Figure: Steinmetz, N. / Sack, H.: Cross-Lingual Semantic Mapping of Authority Files. SWIB 2013, Hamburg. Slide no. 36.
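The mapping steps in the figure (labels, candidate search, disambiguation, language link) can be sketched as a small pipeline. This is a toy illustration only: the slides do not specify how candidates are scored, so a simple context-overlap heuristic is swapped in here, and the context words, the in-memory dictionaries and all function names are invented.

```python
# Toy stand-in for the German DBpedia: candidate -> invented context words.
CANDIDATE_CONTEXT = {
    "dbpde:Advent": {"weihnachten", "kirchenjahr", "adventskranz"},
    "dbpde:Advent_(Band)": {"band", "musik", "album"},
    "dbpde:Advent_Computers": {"computer", "hersteller"},
}
# Toy language links from German to English DBpedia entities.
LANGUAGE_LINKS = {"dbpde:Advent": "dbp:Advent"}


def find_candidates(labels):
    """Step 2: candidates whose local name starts with one of the GND labels."""
    return [
        candidate
        for candidate in CANDIDATE_CONTEXT
        if any(candidate.split(":", 1)[1].startswith(label) for label in labels)
    ]


def disambiguate(candidates, context):
    """Step 3: pick the candidate that shares the most context words."""
    return max(candidates, key=lambda c: len(CANDIDATE_CONTEXT[c] & context))


def map_to_english(labels, context):
    """Steps 2 to 4; returns None if no candidate or no language link exists."""
    candidates = find_candidates(labels)
    if not candidates:
        return None
    return LANGUAGE_LINKS.get(disambiguate(candidates, context))


labels = ["Advent", "Vorweihnachtszeit", "Adventszeit", "Vorweihnachtsfest"]
english = map_to_english(labels, {"weihnachten", "kirchenjahr"})
print(english)
```

Entities without a candidate or without a language link stay unmapped, which is one reason a mapping like this cannot cover every subject heading.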

4. Results

English Labels

GND Subject Headings    German DBpedia    English DBpedia
63,356                  35,638 (56%)      28,691 (45%)

• 28,691 GND subject headings could be "translated" into English by means of the GND/DBpedia mapping

• Overall result of all mapping strategies (DBpedia; LCSH; MACS; WTI Thesaurus):

• 35,025 (55%) GND subject headings got (at least) one English label
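The percentages on this slide follow directly from the reported counts. The short check below recomputes them; the dictionary keys are only descriptive names chosen for this sketch.

```python
TOTAL_SUBJECT_HEADINGS = 63_356

# Counts as reported on the slide.
counts = {
    "German DBpedia": 35_638,
    "English DBpedia": 28_691,
    "at least one English label": 35_025,
}

# Share of all subject headings, rounded to whole percent.
shares = {
    name: round(100 * count / TOTAL_SUBJECT_HEADINGS)
    for name, count in counts.items()
}

for name, share in shares.items():
    print(f"{name}: {share}%")
```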

Quality of the Results of the Mapping Algorithm (GND → German DBpedia)

• Ground Truth (Evaluation Data)

180 manual mappings of GND entities of the AV Portal knowledge base to the corresponding German DBpedia entities

• Recall

The algorithm mapped 90% of the GND entities of the Ground Truth.

• Precision

92% of the GND entities were mapped correctly.
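The two measures can be computed as in the sketch below, which follows the wording of the slide: recall as the share of ground-truth entities that received any mapping, precision as the share of those mappings that agree with the ground truth. The data is synthetic. The counts of 162 mapped and 149 correct entities are back-calculated from the reported percentages and are an assumption, not figures from the talk.

```python
def evaluate(ground_truth: dict, predicted: dict) -> tuple[float, float]:
    """Return (recall, precision) in the sense used on the slide."""
    mapped = [entity for entity in ground_truth if predicted.get(entity) is not None]
    correct = [entity for entity in mapped if predicted[entity] == ground_truth[entity]]
    recall = len(mapped) / len(ground_truth)
    precision = len(correct) / len(mapped) if mapped else 0.0
    return recall, precision


# Synthetic evaluation set sized like the ground truth: 180 manual mappings.
ground_truth = {f"gnd:{i}": f"dbpde:{i}" for i in range(180)}

# Assumed outcome: 162 entities mapped, of which 149 match the ground truth.
predicted = {
    f"gnd:{i}": (f"dbpde:{i}" if i < 149 else "dbpde:wrong")
    for i in range(162)
}

recall, precision = evaluate(ground_truth, predicted)
print(f"recall={recall:.0%} precision={precision:.0%}")
```

Note that this recall counts every mapped entity, correct or not; a stricter definition would count only correct mappings.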

5. Additional Benefit of the GND/DBpedia Mapping: Context Information

Problem

• GND contains only a small amount of context information.

→ Often, a reliable disambiguation of GND entities in the process of the Named Entity Recognition is not possible.

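Disambiguation of GND Entities

Audio Transcript
"… Die Winde arbeitet nach dem Prinzip des Wellrads, kombiniert mit einem Hebel ..." (English: "… The windlass works on the principle of the wheel and axle, combined with a lever ...")

Candidate GND entities and their context information
• Wind (http://d-nb.info/gnd/4066257-3): Luftströmung, Meteorologie, Klimatologie, Hochatmosphäre, Magnetosphäre (air flow, meteorology, climatology, upper atmosphere, magnetosphere)
• Winde (http://d-nb.info/gnd/4742659-7): Hebezeug, Hebel, Wellrad, Fahrzeugbau, Fördertechnik (hoist, lever, wheel and axle, vehicle construction, conveying technology)

'Winde' is indexed as <wind> or <windlass>, depending on the context information.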
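The example on this slide can be reproduced with a very simple scoring rule: count how many context terms of each candidate entity occur in the transcript. The sketch assumes prefix matching as a crude substitute for proper German stemming (so that "Wellrads" matches "Wellrad"); the slides do not say how the portal scores candidates, and the function names are hypothetical.

```python
import re

# Context terms as listed on the slide for the two candidate GND entities.
CONTEXT = {
    "http://d-nb.info/gnd/4066257-3": [  # Wind
        "Luftströmung", "Meteorologie", "Klimatologie",
        "Hochatmosphäre", "Magnetosphäre",
    ],
    "http://d-nb.info/gnd/4742659-7": [  # Winde
        "Hebezeug", "Hebel", "Wellrad", "Fahrzeugbau", "Fördertechnik",
    ],
}


def score(transcript: str, terms: list[str]) -> int:
    """Count the context terms that at least one transcript token starts with."""
    tokens = re.findall(r"\w+", transcript.lower())
    return sum(
        any(token.startswith(term.lower()) for token in tokens) for term in terms
    )


def disambiguate(transcript: str) -> str:
    """Return the GND URI whose context terms best match the transcript."""
    return max(CONTEXT, key=lambda uri: score(transcript, CONTEXT[uri]))


transcript = (
    "Die Winde arbeitet nach dem Prinzip des Wellrads, "
    "kombiniert mit einem Hebel"
)
best = disambiguate(transcript)
print(best)
```

Here "Wellrads" and "Hebel" both support the windlass reading, while none of the meteorological terms occur, so the segment would be indexed with the entity Winde.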

Mapping the GND Entities onto the DBpedia

Solving the Problem
• Extracting context information from the DBpedia for the GND entities of the AV Portal knowledge base

→ Improving the disambiguation of GND entities

Context Information from the DBpedia
• WikiLinks (Internal Wikipedia Links) of the DBpedia entities
• Wikipedia articles to which the DBpedia entities refer
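Retrieving the WikiLinks of a DBpedia entity could be done with a SPARQL query along the following lines. This is a sketch under assumptions: it uses the DBpedia ontology property `dbo:wikiPageWikiLink`, assumes that `dbpde:Advent` expands to `http://de.dbpedia.org/resource/Advent` and that the German DBpedia endpoint is `http://de.dbpedia.org/sparql`. It only builds the request URL and sends nothing; how the talk's authors actually extracted the links is not stated.

```python
from urllib.parse import urlencode


def wikilinks_query(resource_uri: str) -> str:
    """Build a SPARQL query listing the pages a DBpedia resource links to."""
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        f"SELECT ?target WHERE {{ <{resource_uri}> dbo:wikiPageWikiLink ?target }}"
    )


def request_url(endpoint: str, query: str) -> str:
    """Build a GET URL with the query passed in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


# Assumed resource URI and endpoint (see the note above).
query = wikilinks_query("http://de.dbpedia.org/resource/Advent")
url = request_url("http://de.dbpedia.org/sparql", query)
print(query)
print(url)
```

The returned link targets would then serve as context terms for the corresponding GND entity.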


DBpedia

Results

Context Information

GND Subject Headings    German DBpedia    English DBpedia
63,356                  35,638 (56%)      28,691 (45%)

• For 35,638 GND subject headings, additional context information could be extracted from the German DBpedia (WikiLinks, Wikipedia articles)

→ Improving the disambiguation of the German-language NER

• For 28,691 GND subject headings, additional context information could be extracted from the English DBpedia (WikiLinks, Wikipedia articles)

→ Improving the disambiguation of the English-language NER

TIB|AV-Portal: Demonstration

http://av.getinfo.de/

TIB|AV-Portal

Next steps
• Improve search options
• Make results from video analysis editable
• Linked open data service via SPARQL endpoint
• Enrich TIB|AV-Portal by linked open data
• Deal with video as add-ons and with video abstracts
• Provide video metrics and statistics

Thank you for your attention!