The Automatic Generation of Formal Annotations in a MultiMedia Indexing and Searching Environment
Thierry Declerck, Peter Wittenburg and Hamish Cunningham
DFKI GmbH, Max-Planck-Institut für Psycholinguistik and University of Sheffield
ACL/EACL 2001 Workshop on Human Language Technology and Knowledge Management
The MUMIS Consortium
• CTIT University of Twente, Enschede, NL: NLP/IE
• TSI University of Nijmegen, Nijmegen, NL: ASR
• DFKI Saarbrücken, D: NLP/IE
• MPI Nijmegen, NL: Online SW
• DCS University of Sheffield, UK: NLP/IE
• ESTEAM Gothenburg, SE (location Athens, GR): Translation Software
• VDA Hilversum, NL: Dissemination
Objectives
• Technology development to automatically index (with formal annotations) lengthy multimedia recordings (off-line process)
Find and annotate relevant events, together with the involved entities and relations
• Technology development to exploit indexed multimedia archives (on-line process)
Search for interesting scenes and play them via Internet
Test Domain: Soccer Games / UEFA Tournament 2000
Off-line Task
• Automatic Speech Recognition (Radio/TV Broadcasts)
Automatically transforms the speech signals into texts (for 3 languages — Dutch, English and German)
• Natural Language Processing (Information Extraction)
Analyse all available textual documents (newspapers, speech transcripts, tickers, formal texts ...), identify and extract interesting entities, relations and events
• Merging all the annotations produced so far
• Create a database with formal annotations
The Generation of Formal Annotations
• Metadata (type of game, teams, date, final score, players etc.), as they can be used, among other things, for classifying and filtering videos in the MM digital archive
• Events (particular actions with time codes, involved entities and related events), as they can be extracted from the video sequences
• All Formal Annotations available in XML Standard
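As an illustration of what "formal annotations in XML" could look like, here is a minimal sketch that serialises metadata and events with Python's standard library. The element and attribute names (`game`, `metadata`, `events`, `event`, `entity`, `role`) are assumptions made for this example; the slides do not show the actual MUMIS schema. The sample values are taken from the goal example used elsewhere in the slides.

```python
import xml.etree.ElementTree as ET


def annotation_to_xml(metadata, events):
    """Serialise game metadata and a list of events into one XML string.

    The tag names used here are hypothetical, not the MUMIS schema.
    """
    game = ET.Element("game")

    # Metadata: one child element per key (e.g. final_score).
    meta = ET.SubElement(game, "metadata")
    for key, value in metadata.items():
        ET.SubElement(meta, key).text = str(value)

    # Events: type and time code as attributes, involved entities as children.
    events_node = ET.SubElement(game, "events")
    for event in events:
        node = ET.SubElement(
            events_node, "event",
            {"type": event["type"], "time": str(event["time"])},
        )
        for role, filler in event.get("entities", {}).items():
            ET.SubElement(node, "entity", {"role": role}).text = filler

    return ET.tostring(game, encoding="unicode")


xml_text = annotation_to_xml(
    {"final_score": "1:0"},
    [{"type": "goal", "time": 18,
      "entities": {"player": "Basler", "team": "Germany"}}],
)
print(xml_text)
```

Keeping the time code as an attribute of each event is one possible design; it makes it easy to jump from an annotation to the matching position in the video.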
The Event Table
Related to domain ontology and multilingual terminology. Guiding the generation of formal annotations.

Event | ID | Time | Subcat/Modification | Metadata
Final whistle | # | 90>t>120 | Subj=referee, score etc… | Final score
Goal kick | # | 0>t>120 | Subj=pl, loc=loc, cons=cons, … |
Dribbling | # | 0>t>120 | Subj=pl, loc=loc, … |
Substitution | # | 0>t>120 | Subj=pl, I.obj=pl, cause=c, … | Team (adding pl)
Red Card | # | 0>t>120 | Subj=ref, I.obj=pl, cause=c, … | Team (red at t)
Goal | # | 0>t>=pen | Subj=pl, I.obj=team, score=s, … | Order of goal
… | | | |
Off-line Task
Events indexed in video recording
[Figure: timeline of indexed events. Event labels: Defense, Pass, Goal, Freekick (at 28 min, 24 min, 18 min, 17 min); Dribbling, Freekick, Foul. Players: Campbell, Matthäus, Basler, Neville, Scholl. Distances: 60 m, 25 m, 25 m. Score: 1:0. Input sources: audio commenting (TV, radio; 3 languages), newspaper texts (3 languages), closed captions (3 languages) => multilingual IE => event tables.]
Merging of Annotations
Annotations extracted from the individual sources:
• Event = goal; Player = Basler; Dist. = 25 m; Time = 18; Score = 1:0
• Event = goal; Type = Freekick; Player = Basler; Dist. = 25 m; Time = 17; Score: leading
• Event = goal; Player = Basler; Team = Germany; Time = 18; Score = 1:0; Final score = 1:0
Merged annotation:
• Event = goal; Type = Freekick; Player = Basler; Team = Germany; Time = 18; Score = 1:0; Final score = 1:0; Distance = 25 m
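The merging example above can be reproduced with a small sketch. It assumes that the three source annotations have already been identified as describing the same event, and it resolves conflicting slot values (Time = 17 vs. 18, Score = leading vs. 1:0) by simple majority vote. The voting rule and the alias table are assumptions made for illustration; the slides do not state how MUMIS resolves conflicts.

```python
from collections import Counter

# Hypothetical alias table: sources may name the same slot differently.
SLOT_ALIASES = {"Dist.": "Distance", "Finalscore": "Final score"}


def normalize(annotation):
    """Map source-specific slot names onto one canonical name."""
    return {SLOT_ALIASES.get(slot, slot): value
            for slot, value in annotation.items()}


def merge_annotations(annotations):
    """Merge annotations of the same event, slot by slot.

    Each slot takes the value proposed by most sources; ties go to the
    value seen first (assumed rule, not taken from the slides).
    """
    votes = {}
    for annotation in annotations:
        for slot, value in normalize(annotation).items():
            votes.setdefault(slot, Counter())[value] += 1
    return {slot: counter.most_common(1)[0][0]
            for slot, counter in votes.items()}


sources = [
    {"Event": "goal", "Player": "Basler", "Dist.": "25 m",
     "Time": 18, "Score": "1:0"},
    {"Event": "goal", "Type": "Freekick", "Player": "Basler",
     "Dist.": "25 m", "Time": 17, "Score": "leading"},
    {"Event": "goal", "Player": "Basler", "Team": "Germany",
     "Time": 18, "Score": "1:0", "Finalscore": "1:0"},
]
merged = merge_annotations(sources)
print(merged)
```

A real merging component would also have to decide whether two annotations refer to the same event at all, for example by allowing a small tolerance on the time slot.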
The Role of IE in MUMIS
• Information Extraction (IE) is the task of identifying, collecting and normalizing relevant information for a specific application or user.
• The relevant information is typically represented in the form of predefined “templates”, which are filled by means of Natural Language (NL) analysis (Template = Event Table in MUMIS)
• IE combines pattern matching mechanisms, (shallow) NLP and domain knowledge (terminology and ontology).
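To make the idea of template filling concrete, here is a deliberately tiny sketch that combines a surface pattern with a one-entry lexicon. The ticker line format ("18 min: Basler scores from 25 m"), the regular expression and the `PLAYERS` lexicon are all hypothetical and only serve the illustration; a real IE system would rely on shallow NLP and a much richer terminology and ontology.

```python
import re

# Hypothetical domain lexicon: player name -> team.
PLAYERS = {"Basler": "Germany"}

# Hypothetical surface pattern for a goal report in a ticker line.
GOAL_PATTERN = re.compile(
    r"(?P<time>\d+)\s*min:\s*(?P<player>\w+)\s+scores"
    r"(?:\s+from\s+(?P<dist>\d+)\s*m)?"
)


def fill_template(line):
    """Fill a goal template from one ticker line, or return None."""
    match = GOAL_PATTERN.search(line)
    if match is None:
        return None
    template = {
        "Event": "goal",
        "Player": match["player"],
        "Time": int(match["time"]),
    }
    if match["dist"]:
        template["Distance"] = f"{match['dist']} m"
    # Domain knowledge completes slots the text does not mention.
    team = PLAYERS.get(match["player"])
    if team:
        template["Team"] = team
    return template


print(fill_template("18 min: Basler scores from 25 m"))
```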
Extension of our IE system in MUMIS
• Multilingual and multisource IE. Incremental information building
• Cross-document co-reference resolution
• Combine Metadata and event extraction => better organisation and dynamic updating of information (KM)
• Multiple presentation of results: Template, Event table and Hyperlinks (Named Entities, rel. to KM)
Example of Processing Formal Texts
• Formal Text
• The Formal Text annotated with domain-specific information
Example of Processing Semi-Formal Texts
• Semi-Formal Text
• The Semi-Formal Text annotated with domain-specific information
Merging Component
• Acting on the generated formal annotations (Metadata and Events), but also interleaving with the generation process of those
• Checking consistency, eliminating redundancy (Template Merging), in accordance with domain ontology
• Completing the information with domain knowledge and an inference machine
On-line Tasks
Searching and Displaying
• Search for interesting events with formal queries
Example: "Give me all goals that Overmars scored with his head in the first half."
Event=Goal; Player=Overmars; Time<=45; Previous-Event=Headball
• Indicate hits by thumbnails & let user select scene
• Play scene via the Internet & allow scrolling etc. Of course: slow motion, fast play, start/stop, etc.
• User Guidance (Lexica and Ontology)
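The formal query shown on this slide can be evaluated against a list of event annotations with a few lines of code. The sketch below parses the `slot=value; slot<=value` syntax and filters events held in memory. The parser, the supported operators and the sample events (other than the slot names from the slide) are assumptions for illustration; the actual system queries a relational database.

```python
import operator
import re

# Comparison operators supported by this sketch.
OPERATORS = {"<=": operator.le, ">=": operator.ge, "=": operator.eq}


def parse_query(query):
    """Turn 'Event=Goal; Time<=45' into (slot, function, value) triples."""
    conditions = []
    for clause in query.split(";"):
        clause = clause.strip()
        if not clause:
            continue
        match = re.fullmatch(r"([\w-]+)\s*(<=|>=|=)\s*(.+)", clause)
        if match is None:
            raise ValueError(f"cannot parse clause: {clause!r}")
        slot, op, value = match.groups()
        # Numeric values are compared as integers, everything else as text.
        value = int(value) if value.isdigit() else value
        conditions.append((slot, OPERATORS[op], value))
    return conditions


def search(events, query):
    """Return the events that satisfy every condition of the query."""
    conditions = parse_query(query)
    return [
        event for event in events
        if all(slot in event and op(event[slot], value)
               for slot, op, value in conditions)
    ]


# Hypothetical annotations; times and previous events are made up.
events = [
    {"Event": "Goal", "Player": "Overmars", "Time": 30,
     "Previous-Event": "Headball"},
    {"Event": "Goal", "Player": "Overmars", "Time": 70,
     "Previous-Event": "Headball"},
    {"Event": "Goal", "Player": "Basler", "Time": 18,
     "Previous-Event": "Freekick"},
]
hits = search(
    events,
    "Event=Goal; Player=Overmars; Time<=45; Previous-Event=Headball",
)
print(hits)
```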
On-line Tasks
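Knowledge Guided User Interface & Search Engine
[Figure: prototype demo. Game list: München - Ajax 1998, München - Porto 1996, Deutschland - Brasilien 1998. Action: Play Movie Fragment of that Game. Timeline of the selected game with event labels Defense, Pass, Goal, Freekick (at 28 min, 24 min, 18 min, 17 min); Dribbling, Freekick, Foul; players Campbell, Matthäus, Basler, Neville, Scholl; distances 60 m, 25 m, 25 m; score 1:0.]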
More about MPEG (Moving Picture Experts Group)
• MPEG-1: low-level media encoding and compression format (VHS quality, ~ 2-3 Mbps - good for streaming)
• MPEG-2: improved media encoding and compression format (S-VHS quality, ~ 5-10 Mbps, digital TV and DVD standard)
• MPEG-4: Codes content as objects and enables those objects to be manipulated at the client, optimized compression
On-line SW Architecture
[Figure: software architecture. Components: Client Applet (JMF), WWW Server, Java Server, Media Servers (MPEG1), DB Server (rDBMS), File Server. Protocols: HTTP, RMI, RMI (RTP, RTSP), JDBC. Object layers: Client Objects, Query Engine Objects, Media Server Objects. Data: Metadata, Annotations, Keyframes, MPEG Movies, Lexica, Ontology.]
Query interface:
• automatic pre-selection
• guided by domain knowledge
• interactive, visual feedback
Client-Server structure:
• fully distributed
• JMF media presentation
• RMI-based interaction
On-line HW Architecture
• efficient & reliable storage management (near-line capacity, media change, second location)
• high storage capacity (n TB, 1 h MPEG1 = 1 GB)
• powerful media servers / powerful network
[Figure: hardware architecture. Labels: RAID, Tape Library, FC Switch, Media Servers, GB Switch, 1 Gbps Gb-Switch, Router, Internet.]
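The rule of thumb "1 h MPEG1 = 1 GB" can be checked with a short calculation. The sketch below converts a constant bitrate into storage per hour, using decimal units (1 GB = 10^9 bytes, 1 TB = 1000 GB); the function names are made up for this example. At the 2 to 3 Mbps quoted for MPEG-1 on the MPEG slide, one hour takes roughly 0.9 to 1.35 GB, which is consistent with the rule of thumb.

```python
def gigabytes_per_hour(mbps):
    """Storage needed for one hour of video at a constant bitrate in Mbps."""
    bits = mbps * 1_000_000 * 3600       # bits produced in one hour
    return bits / 8 / 1_000_000_000      # bits -> bytes -> GB (decimal)


def hours_per_terabyte(mbps):
    """Hours of video that fit into 1 TB (decimal) at the given bitrate."""
    return 1000 / gigabytes_per_hour(mbps)


for rate in (2, 3):
    print(rate, "Mbps:", gigabytes_per_hour(rate), "GB per hour,",
          round(hours_per_terabyte(rate)), "hours per TB")
```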
UI / Annotation
• UI optimization
  • thumbnails not that informative
  • which thumbnail? (several around time mark)
  • automatic thumbnail adjustment?
  • seamless operation for user
  • lexicon/ontology guidance
  • user driven input
• Manual annotation tools
  • MediaTagger
  • EUDICO
Gain - User Group
• What gets lost? Is it necessary?
• Potential: direct Internet Service, fewer dependencies
Current Procedure:
• Manual Video Annotation
• Integration Central DB
• Query via PC
• Results on PC
• Contact Video Archive
• Get Video Tapes
• Search on Tape on VCR
• Segment & Play
MUMIS Procedure:
• Automatic Video Annotation and DB Integration
• Query via PC
• Results on PC and Select & Play