
MMSEM background

Dr Ioannis Pratikakis

Institute of Informatics & Telecommunications

NCSR “Demokritos”, Athens, Greece

MMSEM – F2F meeting

Amsterdam, 10 July 2006


NCSR “Demokritos” - Athens, GREECE

The largest self-governing research organisation, under the supervision of the Greek Government

It is composed of the following Institutes:

• Biology

• Materials Science

• Microelectronics

• Informatics & Telecommunications

• Nuclear Technology & Radiation Protection

• Nuclear Physics

• Radioisotopes & Radiodiagnostic Products

• Physical Chemistry


Institute of Informatics and Telecommunications (IIT)

CIL: Computational Intelligence Laboratory

SKEL: Software & Knowledge Engineering Laboratory

Informatics Section


SKEL profile

Information Integration

User-friendly information access

Ontology Creation and Maintenance

SKEL researchers aim to develop knowledge technologies that will enable the efficient, cost-effective and user-adaptive management and presentation of information


Basic Research

• Grammar induction

• Active learning of classifiers

• Focused crawling

• Wrapper induction

• Information extraction

• Natural language generation

• Evolving summarization

• Ontology population and enrichment

• Web usage mining


Applied Research

– The general-purpose language engineering platform Ellogon (http://www.ellogon.org/)

– Language processing tools and resources

– The i-DIP platform for developing web content collection and extraction systems

– The QUATRO proxy server, for validating RDF labels of web resources

– The FILTRON e-mail filter, which blocks unsolicited commercial e-mail (spam messages)

– The FilterX Web proxy filter, which blocks obscene Web content

– Tools for creating and maintaining ontologies

– The PServer general-purpose server for personalization

– The KOINOTHTES system for knowledge discovery from web usage data

– An authoring tool for porting language generation systems to new domains and languages


CIL profile

Computational Intelligence - Pattern recognition background

• Neural Networks

• Biologically inspired modelling

• Bayesian networks

• Support Vector Machines

Multimedia Information Processing, Semantic analysis & Retrieval

• Image

• Video

• 3D Graphics

Multimedia Semantic Model


CIL: Platform for intelligent information processing

• Preprocessing and feature extraction methods

• Machine learning (neural networks, statistical, support vector machines)

• Novel algorithm development and testing

• Biologically inspired algorithms and architectures


CIL: Processing and Recognition of old manuscripts

Feature extraction Recognition


Camera Based Document Analysis & Recognition

Text Identification in Web images

Page Segmentation

Table Detection


CIL: Word spotting-Image based search in early handwritten and printed documents


CIL: Content Based Image Retrieval

Query view

Results and relative similarity to the query


CIL: 3-D Graphics retrieval based on shape

Query 3D Model

First 12 answers


CIL: Human Tracking

• Tracker initialisation through

– Face detection

– Separation from background

– Motion field calculation

• Tracking methods

– CAMSHIFT

– Snakes

• Features to use for tracking:

– Skin color

– Clothing color - texture


CIL: Human Behavior Analysis

• Behavior modeling using

– Bayesian Networks

– Hidden Markov Models

• Application case: Violence detection in video

Automatic violence detection:


BOEMIE: Bootstrapping Ontology Evolution with Multimedia Information Extraction

Dr Ioannis Pratikakis

Institute of Informatics & Telecommunications

NCSR “Demokritos”, Athens, Greece

MMSEM – F2F meeting

Amsterdam, 10 July 2006


Contents

• Consortium

• Motivation

• BOEMIE proposal

• Application scenario

• Concluding remarks


BOEMIE project

• Bootstrapping Ontology Evolution with Multimedia Information Extraction

• STRP, IST-2004-2.4.7 “Semantic-based Knowledge and Content Systems”

– Started: 01/03/2006, Duration: 36 months

• Consortium

– Inst. of Informatics & Telecommunications, NCSR “Demokritos” (SKEL & CIL), Greece (Coordinator)

– Fraunhofer Institute for Media Communication (NetMedia), Germany

– Dip. di Informatica e Comunicazione, University of Milano (ISLab), Italy

– Inst. of Telematics and Informatics CERTH (IPL), Greece

– Hamburg University of Technology (STS), Germany

– Tele Atlas, Belgium


Multimedia Content Analysis - I

• Multimedia content is growing at an increasing rate

• Hard to provide semantic indexing of multimedia content

• Significant advances in automatic extraction of low-level features from visual content

• Little progress in the identification of high-level semantic features


Multimedia Content Analysis - II

• Analysis of single modalities is inadequate

• Little progress in the effective combination of semantic features from different modalities.

• Significant effort in producing ontologies for the Semantic Web.

• Hard to build and maintain domain-specific multimedia ontologies.


Existing approaches - I

• Combination of modalities may serve as a verification method, a method compensating for inaccuracies, or as an additional information source

• Combination methods may be iterated allowing for incremental use of context

• Major open issues in combination concern

– the efficient utilization of prior knowledge,

– the specification of an open architecture for the integration of information from multiple sources, and

– the use of inference tools


Existing approaches - II

• Most of the extraction approaches are based on machine learning methods

• With the advent of promising methodologies in multimedia ontology engineering

– knowledge-based approaches are expected to gain in popularity and

– be combined with the machine learning methods


Existing approaches – III

• Use of ontologies to “drive” the information extraction process

– providing high-level semantic information that helps disambiguate the labels assigned to MM objects

• Major open issues in building and maintaining MM ontologies concern

– automatic mapping between low-level audio-visual features and high-level domain concepts,

– automated population and enrichment from unconstrained content,

– employing ontology coordination techniques when multiple ontologies are present


Existing approaches - IV

• Synergy between information extraction and ontology learning through a bootstrapping process

– to improve both the conceptual model and the extraction system through iterative refinement

• Applied so far in knowledge acquisition from textual content

– bootstrapping starts with an information extraction system that uses a domain ontology, or

– bootstrapping starts with a seed ontology, usually small


BOEMIE proposal - I

• Driven by domain-specific multimedia ontologies, BOEMIE systems will be able to identify high-level semantic features in image, video, audio and text and fuse these features for optimal extraction.

• The ontologies will be continuously populated and enriched using the extracted semantic content.

• This is a bootstrapping process, since the enriched ontologies will in turn be used to drive the multimedia information extraction system.
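The bootstrapping loop described above can be illustrated with a toy sketch. It is not the BOEMIE system: the ontology is reduced to two hypothetical concepts ("Athlete", "Event"), the corpus to co-occurrence pairs, and the names `extract` and `bootstrap` are assumptions made for illustration only. The point is the cycle: the current ontology drives extraction, and whatever is extracted enriches the ontology for the next round.

```python
def extract(ontology, corpus):
    """Ontology-driven extraction (toy rule): an (athlete, event) pair yields
    a new instance only if its other half is already known to the ontology."""
    found = set()
    for athlete, event in corpus:
        if athlete in ontology["Athlete"]:
            found.add(("Event", event))
        if event in ontology["Event"]:
            found.add(("Athlete", athlete))
    return found


def bootstrap(seed, corpus, max_rounds=10):
    """Alternate extraction and population until nothing new is learned."""
    ontology = {concept: set(items) for concept, items in seed.items()}
    for _ in range(max_rounds):
        new = {(c, i) for c, i in extract(ontology, corpus)
               if i not in ontology[c]}
        if not new:
            break  # fixed point: the ontology no longer grows
        for concept, instance in new:
            ontology[concept].add(instance)  # population step
    return ontology


# Hypothetical seed ontology and corpus, for illustration only.
seed = {"Athlete": {"A"}, "Event": set()}
corpus = [("A", "high jump"), ("B", "high jump"),
          ("B", "pole vault"), ("C", "marathon")]
evolved = bootstrap(seed, corpus)
```

Starting from a single seed athlete, the loop reaches athlete "B" and two events over successive rounds, while "C" and "marathon" stay unknown because nothing links them to the seed.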


BOEMIE proposal - II

[Figure: BOEMIE architecture diagram. Labelled components:]

• Ontologies: INITIAL ONTOLOGY, INTERMEDIATE ONTOLOGY, EVOLVED ONTOLOGY, OTHER ONTOLOGIES

• ONTOLOGY EVOLUTION: POPULATION & ENRICHMENT, COORDINATION

• ONTOLOGY EVOLUTION TOOLKIT: LEARNING TOOLS, REASONING ENGINE, MATCHING TOOLS, ONTOLOGY MANAGEMENT TOOL

• ONTOLOGY INITIALIZATION AND CONTENT MANAGEMENT TOOL

• EVENTS DATABASE, MAPS DATABASE, MAP ANNOTATION INTERFACE

• SEMANTICS EXTRACTION: FROM VISUAL CONTENT, FROM NON-VISUAL CONTENT, FROM FUSED CONTENT

• SEMANTICS EXTRACTION TOOLKIT: TEXT EXTRACTION TOOLS, AUDIO EXTRACTION TOOLS, VISUAL EXTRACTION TOOLS, INFORMATION FUSION TOOLS

• SEMANTICS EXTRACTION RESULTS

• MULTIMEDIA CONTENT

• Content Collection (crawlers, spiders, etc.)


BOEMIE proposal - III

• Semantics extraction

– Emphasis on visual content, from images and video, due to its richness and the difficulty of extracting useful information.

– Non-visual content, audio/speech and text, will provide supportive evidence, to improve extraction precision.

– Fusing information from multiple media sources is needed since

• no single modality is powerful enough to encompass all aspects of the content and identify concepts precisely.
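One simple way to picture how weak evidence from several modalities can add up is late fusion of per-modality confidence scores. The sketch below uses a noisy-OR combination, which assumes the modalities are independent; this is a generic technique chosen for illustration, not the fusion method of BOEMIE, and all scores and labels are made up.

```python
from math import prod


def noisy_or(scores):
    """Combine independent confidences in [0, 1] as 1 - prod(1 - p)."""
    return 1.0 - prod(1.0 - p for p in scores)


def fuse(per_modality):
    """per_modality maps a modality name to {label: confidence}.
    Returns the best fused label and the fused score of every label."""
    by_label = {}
    for label_scores in per_modality.values():
        for label, p in label_scores.items():
            by_label.setdefault(label, []).append(p)
    fused = {label: noisy_or(ps) for label, ps in by_label.items()}
    best = max(fused, key=fused.get)
    return best, fused


# Hypothetical confidences: vision alone can barely separate the two labels.
evidence = {
    "visual": {"high jump": 0.60, "pole vault": 0.55},
    "audio": {"high jump": 0.50},
    "text": {"pole vault": 0.10},
}
best_label, fused_scores = fuse(evidence)
```

Here the visual scores are nearly tied, and the supporting audio evidence is what separates "high jump" from "pole vault" after fusion.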


BOEMIE proposal - IV

• Multimedia Semantic Model

– development of a unifying representation, a “multimedia semantic model” to integrate:

• a multimedia ontology which

– describes the structure of multimedia content (content objects, such as a segment in a static image, a time window in audio, a video shot, ...),

– describes visual characteristics of content objects in terms of low-level features (colour, shape, texture, motion, …)

• a domain ontology which contains knowledge about the selected application domain, and

• a geographic ontology which contains additional knowledge about the locations to be used
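As a rough picture of what such a unifying representation has to connect, the sketch below models the three parts as plain data classes and links them through annotations. All class and field names are hypothetical; an actual multimedia semantic model would be expressed in an ontology language rather than in Python objects.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ContentObject:
    """Multimedia ontology side: a piece of content plus low-level features."""
    media: str            # e.g. "image", "audio", "video"
    locator: str          # e.g. a region, a time window, or a shot id
    features: tuple = ()  # (feature name, value) pairs such as colour or motion


@dataclass(frozen=True)
class DomainConcept:
    """Domain ontology side: a concept of the application domain."""
    name: str


@dataclass(frozen=True)
class Location:
    """Geographic ontology side: a named place (coordinates are placeholders)."""
    name: str
    lat: float
    lon: float


@dataclass
class SemanticModel:
    """Holds the links between content objects, concepts and locations."""
    links: list = field(default_factory=list)

    def annotate(self, obj, concept, place=None):
        self.links.append((obj, concept, place))

    def objects_for(self, concept_name):
        return [o for o, c, _ in self.links if c.name == concept_name]


model = SemanticModel()
shot = ContentObject("video", "shot-17", (("motion", "upward"),))
model.annotate(shot, DomainConcept("HighJump"), Location("Venue Y", 0.0, 0.0))
```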


BOEMIE proposal – V

• Ontology evolution involves

– ontology population and enrichment, i.e., addition of concepts, relations, properties and instances,

– coordination of

• homogeneous ontologies, e.g., when more than one ontology for the same domain is available, and

• heterogeneous ontologies, e.g., updating the links between a modified domain ontology and a multimedia descriptor ontology,

– maintenance of semantic consistency

• any of the above changes may generate inconsistencies in other parts of the same ontology, in the linked ontologies or in the annotated content base.


Application scenario - I

• Enrichment of digital maps with semantic information

– Domain: sport events in a given area (big cities)

• Sub-domain initially selected: athletics (running, jumping and throwing events)

• Cities will be selected taking into account: number and frequency of sports events, availability of multimedia coverage in English of these events, availability of map and landmark data for the city

– BOEMIE will collect multimedia coverage for sport events and strive to extract as much knowledge from the extracted features as possible, using and evolving the corresponding domain ontologies

– The identified entities and their properties will be linked to geographical locations and stored in a content server

– The user will be provided with immediate access to the annotated content


Application scenario - II

• Querying

– The prototype will perform reasoning using knowledge from the domain ontology and geographical knowledge to deduce further information and answer user queries.

– The user will be able to perform the following queries:

• events in a time frame

• events of a particular type

• events at a certain location

• persons related to events

• events similar to a given one

• events at nearby venues

• points of interest near a venue

• combinations of the above
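Several of the query types listed above, and their combinations, can be pictured as optional filters over a store of annotated events. The sketch below is a minimal in-memory illustration; the `Event` fields, the `query` function and the sample data are all hypothetical, and similarity or proximity queries are left out.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    name: str
    kind: str    # event type, e.g. "high jump"
    venue: str
    day: date


def query(events, start=None, end=None, kind=None, venue=None):
    """Return events matching every criterion that is not None."""
    def keep(e):
        return ((start is None or e.day >= start)
                and (end is None or e.day <= end)
                and (kind is None or e.kind == kind)
                and (venue is None or e.venue == venue))
    return [e for e in events if keep(e)]


# Placeholder events, invented for the example.
EVENTS = [
    Event("Event 1", "high jump", "Venue Y", date(2001, 8, 5)),
    Event("Event 2", "marathon", "Venue Z", date(2001, 8, 6)),
    Event("Event 3", "high jump", "Venue Z", date(2003, 6, 1)),
]

in_2001 = query(EVENTS, start=date(2001, 1, 1), end=date(2001, 12, 31))
high_jump_at_z = query(EVENTS, kind="high jump", venue="Venue Z")
```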


Application scenario - III

• Querying: an example

– Find out the location of the venues in which Athlete A has participated in a high jump competition in the city X.

• From transcribed radio commentary, the BOEMIE system knows that in 2001, the World Championships in Athletics were held in city X in venue Y. From the geographical data, it knows the exact location of venue Y in city X.

• It has further analyzed a video snippet and identified it as a high jump event. From the metadata of the video, the system knows its date of recording in 2001, and in the audio of this snippet, the keywords “X” and A's name were spotted.

• Therefore, the system can deduce that A has indeed participated in a high jump competition in city X, namely the World Championships in Athletics 2001.

• As a result, the BOEMIE system presents all used multimedia assets as “proof” for its answer and gives the exact location of the venue where the World Championships in Athletics took place.
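The chain of deductions in this example can be traced with a small sketch. It hard-codes the facts from the slide as plain records and joins them with an ordinary loop; a real system would use an ontology reasoner. The record types, the function `venues_for`, the asset ids and the coordinates are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Championship:      # fact taken from transcribed radio commentary
    name: str
    city: str
    venue: str
    year: int


@dataclass(frozen=True)
class Snippet:           # fact taken from video analysis and audio spotting
    asset_id: str
    discipline: str
    year: int
    keywords: frozenset


def venues_for(athlete, discipline, city, championships, snippets, locations):
    """Return (venue, location, supporting asset ids) for each championship in
    `city` that a matching snippet ties to `athlete` and `discipline`."""
    answers = []
    for champ in championships:
        if champ.city != city:
            continue
        evidence = [s.asset_id for s in snippets
                    if s.discipline == discipline
                    and s.year == champ.year
                    and {athlete, city} <= s.keywords]
        if evidence:
            location = locations[(champ.city, champ.venue)]
            answers.append((champ.venue, location, evidence))
    return answers


championships = [Championship("World Championships in Athletics", "X", "Y", 2001)]
snippets = [
    Snippet("video-1", "high jump", 2001, frozenset({"X", "A"})),
    Snippet("video-2", "long jump", 2001, frozenset({"X", "A"})),
]
locations = {("X", "Y"): (0.0, 0.0)}  # placeholder coordinates from map data

answers = venues_for("A", "high jump", "X", championships, snippets, locations)
```

The returned evidence list plays the role of the multimedia assets presented as proof: only the high jump snippet supports the answer, so the long jump snippet is not included.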


Concluding remarks - I

• BOEMIE work aims to initiate a discussion on the problem of knowledge acquisition and the synergy of information extraction and ontology evolution

• Several open issues:

– the role of ontology in fusing information from multiple media

– ways to learn the optimal combination of features derived from MM content

– how existing ontology languages can be extended to tackle the requirements of MM content analysis

– the application of existing ontology learning and inference techniques in the context of MM content

– the application of the coordination task in a new context which involves not only homogeneous ontologies, but also heterogeneous ones


Concluding remarks - II

• The main measurable objective of the BOEMIE initiative is to significantly improve the performance of existing single-modality approaches in terms of scalability and precision.

• Towards that goal, our aim is to

– develop a new methodology for extraction and evolution, using a rich multimedia semantic model, and

– realize it as an open architecture that will be coupled with the appropriate set of tools.


BOEMIE Bootstrapping Ontology Evolution with

Multimedia Information Extraction

http://www.boemie.org

THANK YOU !!!