2011 9th IEEE International Workshop on Content-Based Multimedia Indexing (CBMI), Madrid, Spain
A Semantic-Based and Adaptive Architecture for Automatic Multimedia Retrieval Composition
Daniela Giordano, Isaak Kavasidis, Carmelo Pino, Concetto Spampinato
Department of Electrical, Electronics and Informatics Engineering
University of Catania, Viale Andrea Doria, 6 - 95125 Catania, Italy
{dgiordan, ikavasidis, cpino, cspampin}@diit.unict.it
Abstract
In this paper we present a domain-independent multimedia retrieval (MMR) platform. The use of MMR systems across different domains is currently limited mainly by their poor flexibility and adaptability to different domains and user requirements. We propose a semantic-based platform that uses ontologies to describe not only the application domain but also the processing workflow to be followed for retrieval, according to the user's requirements and the domain's characteristics. In detail, an ontological model (the domain-processing ontology) that integrates domain peculiarities with processing algorithms allows the retrieval mechanism to adapt itself to the specified application domain. From the instances generated for each user request, our platform generates the appropriate graphical user interface (GUI) for the specified application domain (e.g. music, sport video, medical images) through a procedure guided by the defined domain-processing ontology. A use case on content-based music retrieval is presented to show how the proposed platform also eases the implementation of multimedia retrieval systems.
1. Introduction
During the last twenty years, with the growth of the Internet and the advances in computing power, network bandwidth, information storage, and signal/image/video processing techniques, huge collections of multimedia data (audio, video, images, 3D models, etc.) have been created and shared on the web. Content-based multimedia retrieval [1] supports users in the hard task of accessing and searching these multimedia collections.
Despite the sustained effort in developing multimedia retrieval systems that address different aspects such as the semantic gap, feature extraction, similarity functions, browsing, summarization, indexing, and evaluation, it is commonly believed that there is still room for research, especially regarding user-centered approaches (whose first aim is to satisfy the user's needs), which have so far been underemphasized. The lack of such approaches is mainly due to the gap between users' needs and media content representation. Indeed, users with different backgrounds and domains perceive multimedia content differently, or they are interested only in specific types of information [2]. Moreover, most existing multimedia systems, such as Informedia [3], Greenstone [4], and VULDA [5], cope with only one domain (e.g. sport, medicine, law) or deal with only one specific representation of multimedia content (audio, video, 3D model, etc.). Therefore, many of these systems lack flexibility and adaptability to different domains and to different multimedia content, i.e., they are not able to provide customized views to users for different domains and media content.
This flexibility has been partially achieved by using ontologies for annotating and representing multimedia content [6], [7], [8]. Generally, these ontologies integrate domain knowledge into multimedia data for annotation purposes. Despite the initial enthusiasm, the main problem is that they target only one specific domain or media data type. To address all these needs and to ensure effective utilization of the available multimedia collections by a variety of users, in this paper we propose a multimedia retrieval platform independent of domain and media content. In detail, a flexible mechanism that employs ontologies allows the platform to adapt to the user's requirements, which involve the domain (e.g. perform a query on multimedia content in the law domain), the type of media (e.g. videos) and the performance (e.g. find the law videos in the shortest time, i.e. use fast feature-matching algorithms that guarantee average, not best, accuracy in terms of relevance of the retrieved results; or find all the videos related to law with high accuracy, neglecting time performance, i.e. use the feature-matching algorithms that guarantee the best accuracy in terms of relevance of the retrieved results even if they require more time than the other implemented algorithms). To accomplish this, our platform relies on several domain-independent multimedia ontologies (image, video, audio, etc.) that are integrated with domain ontologies, to provide domain-specific perspectives of multimedia content, and with processing ontologies, to address the user's performance requirements. This allows us to provide multiple domain views of the same media, e.g. image retrieval in sports or image retrieval in medicine, and to satisfy user needs such as "give me images on baseball in the shortest time". Therefore, the major difference between our approach and the existing ones is the separation between domain ontologies, multimedia ontologies and processing ontologies; in existing approaches such as [2] each ontology is a complete module by itself, whereas in our approach the domain and processing ontologies are integrated "at run-time", thus giving flexibility to the platform. From the instances generated by this ontology integration, our platform semi-automatically generates the interfaces (GUIs) of the multimedia retrieval system for the specified domain and media type.
The remainder of the paper is organized as follows: Section 2 presents the proposed platform, describing in detail each module composing it. Section 3 describes a use case related to music retrieval. Finally, concluding remarks are given in the last section.
2. The Proposed Platform
The proposed platform implements a domain- and media-independent multimedia retrieval (MMR) system whose functionalities can be customized to provide the user with multiple domain-specific views of multimedia content and to provide the developer with tools to easily create multimedia retrieval systems for the domain and media type of interest. The basic idea behind this work is to model the retrieval process of every domain through the integration of three ontologies: a domain ontology, a media ontology and a processing ontology (the algorithms for media processing). In detail, the integration of the domain and processing ontologies (called the domain-processing ontology) adapts the processing workflow for the media content, taking into account the constraints defined by the users for a specific domain, whereas the media ontology and the domain-processing ontology are responsible for the definition of the system's interfaces. Usually, the retrieval process of a generic MMR system is based on pre-defined steps, resulting in a static workflow for content processing regardless of the application domain, whereas different domains require different feature extraction algorithms depending on their characteristics.
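The adaptation idea can be made concrete with a minimal sketch: candidate algorithms per processing step are annotated with a speed/accuracy profile, and the user's performance constraint selects one algorithm per step. All names and numbers below are illustrative assumptions, not the platform's actual API.

```python
# Illustrative registry: per (domain, media type), each processing step has
# candidate algorithms annotated with an assumed accuracy/time profile.
REGISTRY = {
    ("medicine", "image"): {
        "feature_extraction": [
            {"name": "fast_histogram", "accuracy": 0.70, "seconds": 0.1},
            {"name": "sift_dense",     "accuracy": 0.92, "seconds": 2.5},
        ],
        "feature_matching": [
            {"name": "l2_linear_scan", "accuracy": 0.75, "seconds": 0.2},
            {"name": "emd_match",      "accuracy": 0.90, "seconds": 3.0},
        ],
    }
}

def build_workflow(domain, media, criterion):
    """Pick one algorithm per step: 'speed' minimizes time, 'accuracy'
    maximizes relevance, 'balanced' trades accuracy against time."""
    steps = REGISTRY[(domain, media)]
    workflow = []
    for step, candidates in steps.items():
        if criterion == "speed":
            chosen = min(candidates, key=lambda a: a["seconds"])
        elif criterion == "accuracy":
            chosen = max(candidates, key=lambda a: a["accuracy"])
        else:  # balanced
            chosen = max(candidates, key=lambda a: a["accuracy"] / (1 + a["seconds"]))
        workflow.append((step, chosen["name"]))
    return workflow

print(build_workflow("medicine", "image", "speed"))
# → [('feature_extraction', 'fast_histogram'), ('feature_matching', 'l2_linear_scan')]
```

In the actual platform this selection is not a hand-written lookup but is inferred by the reasoner over the domain-processing ontology; the sketch only illustrates the input/output of that step.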
As previously stated, the platform has been conceived to accommodate the needs of both users and developers. From the developer's point of view, our platform provides a flexible mechanism supporting the design of domain-specific multimedia retrieval systems: for a specific domain, the developer designs and inserts into the platform the domain ontology and a processing ontology, whereas the media ontology is already present in the platform. An ontology integration module is then responsible for integrating the domain and processing ontologies. Moreover, the platform also enables developers to easily upload and integrate the developed algorithms. A self-guided mechanism assists interface design and associates the developed algorithms with the ontologies' terms.
From the user's point of view, our platform provides several interfaces depending on the type of media to be retrieved; these interfaces are related to a specific application domain and allow users to specify constraints. Each interface is also bound to the media ontology model and to the domain-processing ontology that correlates the concepts with the algorithms for the specific application domain.
Fig. 1 shows the architecture of the proposed platform, consisting of three levels: user level, developer level and repository level.
2.1. User Level
This level allows the user to interact with the platform and carry out a media retrieval process. It uses three connected layers: the Domain Interface Layer, the Ontology Layer and the Processing Layer, which are also shared with the developer level. The user-level workflow is:
• The platform asks the user to choose the application domain among those available, and the media type; for example, image retrieval in medicine for multiple sclerosis. According to this choice the platform provides the interface suitable for the selected domain. Afterwards, the user chooses the constraints that define the MMR system's performance in terms of the processing strategy for the query content.
• These constraints are forwarded to the ontology layer, which employs an OWL-based reasoner (Racer [9]) to create the sequence of algorithms to be executed, after checking its consistency. In detail, it receives the user's constraints and provides an instance of the domain-processing ontology representing the sequence of steps to follow (fig. 2).
• The sequence of steps to be executed, derived by the reasoner, is then sent to the processing module, which contains the algorithms implemented as services; i.e., the steps required for the retrieval are invoked and applied in order to provide as output the media relevant to the query performed by the user. More in detail, the sequence provided by the reasoner regards feature extraction and feature matching. The feature extraction algorithm indicated by the reasoner extracts the low-level features from the media passed by the user to query the system. These features are then compared with the media features contained in the repository, using the matching algorithm and the metrics selected by the reasoner. According to these metrics, the results are ranked and returned to the user.

Figure 1. The Platform's Architecture

Figure 2. Example of an instance of the domain-processing ontology generated by the reasoner
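The execution of a reasoner-derived step sequence can be sketched as a chain of service invocations. The service names and their outputs below are assumptions made up for illustration; in the platform the steps are actual web services.

```python
# Hypothetical service registry: each step of the reasoner's sequence maps
# to a callable service; the output of one step feeds the next.
SERVICES = {
    "ExtractMFCC": lambda media: {"features": [0.1, 0.4, 0.2], "source": media},
    "MatchL2":     lambda data: {**data, "matches": ["clip_12", "clip_07"]},
    "RankResults": lambda data: sorted(data["matches"]),
}

def run_pipeline(step_sequence, query_media):
    """Invoke each service of the reasoner's sequence in order."""
    data = query_media
    for step in step_sequence:
        data = SERVICES[step](data)
    return data

print(run_pipeline(["ExtractMFCC", "MatchL2", "RankResults"], "query.wav"))
# → ['clip_07', 'clip_12']
```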
To clarify this step, an example is useful: a user selects the domain (e.g. medicine) and then the media type (e.g. image). According to these choices, our platform displays one of the available interfaces. The user then performs a query in the form: "find all the images similar to the uploaded one, as soon as possible". These constraints are sent to the ontology layer, which fetches the medicine-image processing ontology (i.e. the domain-processing ontology) from the ontology repositories and, according to the required time constraints (in this example, "as soon as possible"), selects the most efficient algorithms for feature extraction and feature matching. The feature extraction algorithm is then executed on the query image, and the extracted low-level features are matched against those stored in the feature repository. The results are then ranked and finally displayed to the user through the selected interface. Note that the indexing of the media is performed at the developer level, i.e., when the developer adds new algorithms for a specific medium, these become available to the user once the indexing of all the media of the specific domain has been completed.
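The matching-and-ranking step of this example can be sketched as follows; the stored feature vectors, file names and the metric table are illustrative assumptions, not the platform's actual data.

```python
import math

# Hypothetical feature repository: per-image low-level feature vectors.
REPOSITORY = {
    "mri_001.png": [0.90, 0.10, 0.30],
    "mri_002.png": [0.20, 0.80, 0.50],
    "mri_003.png": [0.85, 0.15, 0.25],
}

# The reasoner selects one of the available distance metrics.
METRICS = {
    "euclidean": lambda a, b: math.dist(a, b),
    "manhattan": lambda a, b: sum(abs(x - y) for x, y in zip(a, b)),
}

def rank(query_features, metric_name):
    """Compare the query features against the repository and rank by distance."""
    metric = METRICS[metric_name]
    scored = [(metric(query_features, feats), name)
              for name, feats in REPOSITORY.items()]
    return [name for _, name in sorted(scored)]

print(rank([0.9, 0.1, 0.3], "euclidean"))
# → ['mri_001.png', 'mri_003.png', 'mri_002.png']
```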
2.2. Developer Level
This level allows the developer to create a domain-specific multimedia retrieval system through the following steps:
• The developer inserts the specifications for the MMR system by uploading (or editing, if already present in the platform) the domain ontology, the processing ontology and the algorithms. In detail, the Interface Composition Module proposes to the developer a list of algorithms and the ontological models for specific domains stored in the system. The developer can choose to create a new model or to modify an existing one. If a domain ontology already exists in the current version of the platform, the developer can only upload new algorithms (thereby enriching the domain-processing ontology with new processing terms) and is not allowed to modify the domain ontology. Otherwise, the developer uploads the domain and processing ontologies, which are then integrated in the ontology models section (repository layer).
• After that, an interface allows the developer to associate the created or edited ontological model with the algorithms, as shown in fig. 3.
Figure 3. Interface to bind the domain-processing ontology with the algorithms
• The domain-processing ontology, expressed in OWL, can be drawn either using software such as Protégé1 or directly through the functionality of the ontology layer, implemented as web services. The created ontology is then validated by the Racer reasoner to ensure the absence of ambiguities in the constraints.
• Afterwards, the interface composition module semi-automatically composes a new interface related to the drawn ontology.
• If the developer uploads new feature extraction algorithms, the media related to the chosen domain are indexed according to the new algorithms. When the indexing is done for all the media (which are stored in a repository section related to the specific domain), the interface of the new model becomes available to the user level.
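The binding and re-indexing steps above can be sketched as follows. The class, its methods and the `domain#Term` naming convention are hypothetical, invented for this illustration; the platform's actual interfaces are not specified at this level of detail.

```python
class AlgorithmRegistry:
    """Toy model of the developer-level workflow: binding an algorithm to an
    ontology term invalidates the domain's index, and the domain's interface
    is exposed to users only after re-indexing completes."""

    def __init__(self):
        self.bindings = {}            # ontology term -> algorithm callable
        self.indexed_domains = set()  # domains whose media are fully indexed

    def bind(self, ontology_term, algorithm):
        self.bindings[ontology_term] = algorithm
        # Assumed convention: the term is "domain#Name"; a new binding
        # invalidates that domain's index until re-indexing is done.
        self.indexed_domains.discard(ontology_term.split("#")[0])

    def reindex(self, domain, media_items):
        # In the real platform this runs the new feature extractors over
        # every stored media item of the domain.
        for term, algo in self.bindings.items():
            if term.startswith(domain):
                for item in media_items:
                    algo(item)
        self.indexed_domains.add(domain)

    def available_to_users(self, domain):
        return domain in self.indexed_domains

reg = AlgorithmRegistry()
reg.bind("music#ExtractDCT", lambda media: f"dct({media})")
print(reg.available_to_users("music"))   # False: indexing not yet done
reg.reindex("music", ["song1.wav", "song2.wav"])
print(reg.available_to_users("music"))   # True: interface now exposed
```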
1. http://protege.stanford.edu
The processing and interface composition modules have been implemented using the .NET framework, whereas the ontology layer has been implemented as web services in Java.
2.3. Repository Layer
This layer stores all data in specific repositories; both the levels above save and retrieve information from it. The repositories contain information about the interfaces present and usable through the platform, the ontological models inserted by the developers, the algorithms for feature extraction, feature matching and indexing, and the media content.
The media content repository is currently divided into four main sections: image, video, audio and 3D model. Each section is then divided according to the specific domain; for instance, we have images for law, sport, medicine, biology, etc. The annotation of each medium is provided by the developer when a novel multimedia retrieval system is inserted into the platform. When a developer creates a new interface, it is stored in the interface repository (and hence made available to the user) after all the media related to the specific domain have been indexed according to the proposed algorithms.
The repository layer has been implemented using RDF, i.e., the data are stored as triples and the interaction with the other levels is done through a SPARQL formalizer. An RDF schema has been defined for each RDF repository. The RDF storage interface has been implemented using SemWeb.NET2, a library written in C# that also provides mechanisms for querying repositories.
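The triple-store idea behind this layer can be sketched in a few lines without any RDF library: triples are (subject, predicate, object) tuples, and a SPARQL triple pattern is a tuple with wildcards. The predicates and resource names below are invented for illustration; the real layer uses SemWeb.NET and actual SPARQL.

```python
# Minimal, self-contained sketch of an RDF-style triple store.
# None plays the role of a SPARQL variable in a triple pattern.
triples = {
    ("mri_001.png", "rdf:type",   "med:Image"),
    ("mri_001.png", "med:domain", "medicine"),
    ("song1.wav",   "rdf:type",   "mus:Audio"),
    ("song1.wav",   "mus:genre",  "jazz"),
}

def match(pattern):
    """Return all triples matching (s, p, o), where None is a wildcard."""
    s, p, o = pattern
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Analogue of: SELECT ?s WHERE { ?s med:domain "medicine" }
print(sorted(t[0] for t in match((None, "med:domain", "medicine"))))
# → ['mri_001.png']
```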
3. A Use Case: Content-Based Music Retrieval System
An example of use of the proposed architecture has been implemented for music retrieval, thus addressing the interaction with audio content in the music domain. According to the architecture described in the previous section, the steps to be performed by a developer to compose a content-based music retrieval system are: domain ontology definition, domain-processing ontology creation, algorithm definition, ontology-algorithm binding and, finally, interface definition. In our case we have used the music ontology3 for specifying the domain. This ontology has been suitably modified by adding a new level (the processing level) to be integrated with the domain-processing ontology. In detail, the music ontology in its original form contains only three levels: the first level deals with editorial information, the second level describes events involving the composition of a musical work, and the third level describes the event decomposition. Our modified version contains a fourth level that includes the parameters used for the processing, i.e. audio length, audio resolution, sample rate, bytes per second, etc. An example of the modified version of the music ontology (with related URIs) is shown in fig. 4. The core of the application is the domain-processing ontology, which has been designed as shown in fig. 5. This ontology presents different nodes, where the root node defines the main concept, i.e. Audio. The children nodes describe different concepts, i.e. audio pre-processing, processing and performance criteria. The latter allows the user to specify constraints on the system's performance. Indeed, according to the user's requirements, the elaboration can be oriented either to accuracy (high precision), to execution time (fast processing), or balanced between accuracy and processing time. The performance criteria concept implies that only one of the features belonging to the lower level can be selected. The Pre-processing and Elaboration features are OrFeatures, i.e. one can select none, one or more features. The leaf nodes represent the features to extract; each feature is associated with a specific algorithm. The constraints are expressed through directed edges, where the letter I denotes the inclusion relationship. Clearly, other types of constraints would have been possible (exclusion, default and avoid), but in this context they are not needed. According to this ontology the system implements the interface shown in fig. 6.

2. http://razor.occams.info/code/semweb/
3. http://musicontology.com/
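The constraint semantics just described can be sketched as follows: the performance-criteria node admits exactly one selection, and an inclusion edge (I) forces a feature in whenever its source is selected. The node names echo fig. 5, but the constraint tables are illustrative guesses at the model, not the paper's exact encoding.

```python
# Hypothetical encoding of the fig. 5 constraints.
EXACTLY_ONE = {"performance": {"Accuracy", "Speed", "Balanced"}}
INCLUDES = {  # inclusion edges: selecting the key forces the listed features
    "Balanced": {"OnWindow", "MakeWindow", "BigOverlap", "DCT", "MedCoeff"},
}

def resolve(selection):
    """Validate the performance choice and apply inclusion edges."""
    chosen_perf = selection & EXACTLY_ONE["performance"]
    if len(chosen_perf) != 1:
        raise ValueError("exactly one performance criterion must be selected")
    resolved = set(selection)
    for node in list(resolved):
        resolved |= INCLUDES.get(node, set())
    return resolved

print(sorted(resolve({"Balanced"})))
# → ['Balanced', 'BigOverlap', 'DCT', 'MakeWindow', 'MedCoeff', 'OnWindow']
```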
In the reported figure, the user has asked to retrieve a part of the displayed song and the option "Balanced Accuracy" is chosen (the slider is positioned in the middle between Accuracy and Speed). This implies the selection of the algorithms OnWindow, MakeWindow, Big Overlap, DCT and MedCoeff (the inference of these algorithms by the reasoner is highlighted in fig. 5). In fact, these algorithms achieve good accuracy in audio processing without significantly affecting speed performance [10].
The algorithms for music processing, previewing and content filtering are implemented using two libraries: AudioLab4 and IrrKlang5.
4. Concluding Remarks
In this paper we have proposed a semantic-based and adaptive architecture for the automatic composition of multimedia retrieval systems according to the domain's features and the user's requirements. The proposed system also represents a novel architecture for general-purpose automatic multimedia retrieval, based on some key technologies, i.e. ontologies, feature modeling and automatic interface composition. Such integrated systems, resulting in a semantically rich and flexible mechanism for automatically generating any type of multimedia retrieval system, must be investigated in depth to realize the full potential of multimedia retrieval.

4. http://www.mitov.com/html/audiolab.html
5. www.ambiera.com/irrklang

Figure 4. Extended Music Ontology with the added level 4, which contains the information necessary for audio processing
We are currently working on improving the mechanism for ontology integration and matching, to support collaborative refinement and evolution of the already created ontologies, and on providing the developer level with a module assisting the use of algorithms across domains. Moreover, the automatic interface composition module should also be expanded to take into account methods for intelligent workflow composition, such as the one proposed in [11].
References
[1] M. S. Lew, N. Sebe, C. Djeraba, and R. Jain, "Content-based multimedia information retrieval: State of the art and challenges," ACM Trans. Multimedia Comput. Commun. Appl., vol. 2, pp. 1–19, February 2006.
[2] A. Dong and H. Li, "Multi-ontology based multimedia annotation for domain-specific information retrieval," in Proceedings of the IEEE International Conference on Sensor Networks, Ubiquitous, and Trustworthy Computing - Workshops - Volume 02, (Washington, DC, USA), pp. 158–165, IEEE Computer Society, 2006.
Figure 5. Domain-Processing Ontology for the developed music media retrieval system. The algorithms selected by the reasoner according to the user's requirements set in fig. 6 are highlighted
Figure 6. Interface generated by the proposed system for the implemented multimedia retrieval system. Each section is related both to the domain ontology (music ontology, on the right side of the image) and to the domain-processing ontology shown in fig. 5
[3] A. G. Hauptmann, J. J. Wang, W.-H. Lin, J. Yang, and M. Christel, "Efficient search: the Informedia video retrieval system," in Proceedings of the 2008 International Conference on Content-based Image and Video Retrieval, CIVR '08, (New York, NY, USA), pp. 543–544, ACM, 2008.
[4] A. Hinze, G. Buchanan, D. Bainbridge, and I. Witten, "Semantics in Greenstone," in Semantic Digital Libraries (S. R. Kruk and B. McDaniel, eds.), pp. 163–176, Springer Berlin Heidelberg, 2009.
[5] U. Rashid, I. A. Niaz, and M. A. Bhatti, "Unified multimodal search framework for multimedia information retrieval," in Advanced Techniques in Computing Sciences and Software Engineering (K. Elleithy, ed.), pp. 129–136, Springer Netherlands, 2010.
[6] X. Liu, Z. Shao, and J. Liu, "Ontology-based image retrieval with SIFT features," in International Conference on Pervasive Computing, Signal Processing and Applications, pp. 464–467, 2010.
[7] V. Mezaris and M. G. Strintzis, "Object segmentation and ontologies for MPEG-2 video indexing and retrieval," in CIVR, pp. 573–581, 2004.
[8] S. Abdallah, Y. Raimond, and M. Sandler, "An ontology-based approach to information management for music analysis systems," in Audio Engineering Society Convention 120, May 2006.
[9] V. Haarslev and R. Möller, "Racer: An OWL reasoning agent for the semantic web," in Proc. of the International Workshop on Applications, Products and Services of Web-based Support Systems, in conjunction with the 2003 IEEE/WIC International Conference on Web Intelligence, vol. 13, pp. 91–95, 2003.
[10] J. Foote, M. L. Cooper, and U. Nam, "Audio retrieval by rhythmic similarity," in ISMIR, 2002.
[11] G. Nadarajan, "Planning for automatic video processing using ontology-based workflow," in ICAPS '07, 2007.