2011 9th IEEE International Workshop on Content-Based Multimedia Indexing (CBMI), Madrid, Spain
A Semantic-Based and Adaptive Architecture for Automatic Multimedia Retrieval Composition
Daniela Giordano, Isaak Kavasidis, Carmelo Pino, Concetto Spampinato
Department of Electrical, Electronics and Informatics Engineering
University of Catania, Viale Andrea Doria, 6 - 95125 Catania, Italy
{dgiordan, ikavasidis, cpino, cspampin}@diit.unict.it
Abstract
In this paper we present a domain-independent multimedia retrieval (MMR) platform. The use of MMR systems across different domains is currently limited mainly by their poor flexibility and adaptability to different domains and user requirements. We propose a semantic-based platform that uses ontologies to describe not only the application domain but also the processing workflow to be followed for retrieval, according to the user's requirements and the domain's characteristics. In detail, an ontological model (the domain-processing ontology) that integrates domain peculiarities with processing algorithms allows the retrieval mechanism to adapt itself to the specified application domain. From the instances generated for each user request, our platform generates the appropriate graphical user interface (GUI) for the specified application domain (e.g. music, sport video, medical images) through a procedure guided by the defined domain-processing ontology. A use case on content-based music retrieval is presented to show how the proposed platform also eases the implementation of multimedia retrieval systems.
1. Introduction
During the last twenty years, with the growth of the Internet and the advances in computing power, network bandwidth, information storage, and signal/image/video processing techniques, huge collections of multimedia data (audio, video, images, 3D models, etc.) have been created and shared on the web. Content-based multimedia retrieval [1] supports users in the hard task of accessing and searching these multimedia collections.
Despite the sustained effort in developing multimedia retrieval systems that address different aspects such as the semantic gap, feature extraction, similarity functions, browsing, summarization, indexing, and evaluation, it is commonly believed that there is still room for research, especially regarding user-centered approaches (whose first aim is to satisfy the user's needs), which have so far been underemphasized. The lack of such approaches is mainly due to the gap between users' needs and media content representation. Indeed, users with different backgrounds and domains perceive multimedia content differently, or they are interested only in specific types of information [2]. Moreover, most existing multimedia systems, such as Informedia [3], Greenstone [4], and VULDA [5], cope with only one domain (e.g. sport, medicine, law) or deal with only one specific representation of multimedia content (audio, video, 3D model, etc.). Therefore, many of these systems lack flexibility and adaptability to different domains and to different multimedia content, i.e., they are not able to provide customized views to users for different domains and media content.
This flexibility has been partially achieved by using ontologies for annotating and representing multimedia content [6], [7], [8]. Generally, these ontologies integrate domain knowledge into multimedia data for annotation purposes. Despite the initial enthusiasm, the main problem is that they target only one specific domain or media data type. To address all these needs and to ensure effective utilization of the available multimedia collections by a variety of users, in this paper we propose a multimedia retrieval platform independent of domain and media content. In detail, a flexible mechanism that employs ontologies allows the platform to adapt to the user's requirements, which involve the domain (e.g. perform a query on multimedia content in the law domain), the type of media (e.g. videos) and the performance (e.g. find the law videos in the shortest time, i.e. use fast feature-matching algorithms that guarantee average, not best, accuracy in terms of relevance of the retrieved results; or find all the videos related to law with high accuracy, neglecting time performance, i.e. use the feature-matching algorithms that guarantee the best accuracy in terms of relevance of the retrieved results even if they require more time than the other implemented algorithms). To accomplish this, our platform relies on several domain-independent multimedia ontologies (image, video, audio, etc.) that are integrated with domain ontologies, to provide domain-specific perspectives of multimedia content, and with processing ontologies, to address the user's performance requirements. This allows us to provide multiple domain views of the same media, e.g. image retrieval in sports or image retrieval in medicine, and to satisfy user needs such as "give me images on baseball in the shortest time". Therefore, the major difference between our approach and the existing ones is the separation between domain ontologies, multimedia ontologies and processing ontologies; in existing approaches such as [2] each ontology is a complete module by itself, whereas in our approach the domain and processing ontologies are integrated "at run-time", thus giving flexibility to the platform. From the instances generated by this ontology integration, our platform semi-automatically generates the interfaces (GUIs) of the multimedia retrieval system for the specified domain and media type.
The remainder of the paper is organized as follows: Section 2 presents the proposed platform, describing in detail each module composing it. Section 3 describes a use case related to music retrieval. Finally, concluding remarks are given in the last section.
2. The Proposed Platform
The proposed platform implements a domain- and media-independent multimedia retrieval (MMR) system whose functionalities can be customized to provide the user with multiple domain-specific views of multimedia content and to provide the developer with tools to easily create multimedia retrieval systems for the domain and media type of interest. The basic idea behind this work is to model the retrieval process of every domain through the integration of three ontologies: a domain ontology, a media ontology and a processing ontology (the algorithms for media processing). In detail, the integration of the domain and processing ontologies (called the domain-processing ontology) adapts the processing workflow for the media content, taking into account the constraints defined by the users for a specific domain, whereas the media ontology and the domain-processing ontology are responsible for the definition of the system's interfaces. Usually, the retrieval process of a generic MMR system is based on pre-defined steps, resulting in a static workflow for content processing regardless of the application domain, whereas different domains require different feature extraction algorithms depending on their characteristics.
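The adaptation idea can be made concrete with a minimal sketch: candidate algorithms per processing step are annotated with a speed/accuracy profile, and the user's performance constraint selects one algorithm per step. All names and numbers below are illustrative assumptions, not the platform's actual API.

```python
# Illustrative registry: per (domain, media type), each processing step has
# candidate algorithms annotated with an assumed accuracy/time profile.
REGISTRY = {
    ("medicine", "image"): {
        "feature_extraction": [
            {"name": "fast_histogram", "accuracy": 0.70, "seconds": 0.1},
            {"name": "sift_dense",     "accuracy": 0.92, "seconds": 2.5},
        ],
        "feature_matching": [
            {"name": "l2_linear_scan", "accuracy": 0.75, "seconds": 0.2},
            {"name": "emd_match",      "accuracy": 0.90, "seconds": 3.0},
        ],
    }
}

def build_workflow(domain, media, criterion):
    """Pick one algorithm per step: 'speed' minimizes time, 'accuracy'
    maximizes relevance, 'balanced' trades accuracy against time."""
    steps = REGISTRY[(domain, media)]
    workflow = []
    for step, candidates in steps.items():
        if criterion == "speed":
            chosen = min(candidates, key=lambda a: a["seconds"])
        elif criterion == "accuracy":
            chosen = max(candidates, key=lambda a: a["accuracy"])
        else:  # balanced
            chosen = max(candidates, key=lambda a: a["accuracy"] / (1 + a["seconds"]))
        workflow.append((step, chosen["name"]))
    return workflow

print(build_workflow("medicine", "image", "speed"))
# → [('feature_extraction', 'fast_histogram'), ('feature_matching', 'l2_linear_scan')]
```

In the actual platform this selection is not a hand-written lookup but is inferred by the reasoner over the domain-processing ontology; the sketch only illustrates the input/output of that step.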
As previously stated, the platform has been conceived to accommodate the needs of both users and developers. From the developer's point of view, our platform provides a flexible mechanism supporting the design of domain-specific multimedia retrieval systems: for a specific domain, the developer designs and inserts into the platform the domain ontology and a processing ontology, whereas the media ontology is already present in the platform. An ontology integration module is then responsible for integrating the domain and processing ontologies. Moreover, the platform also enables developers to easily upload and integrate the developed algorithms. A self-guided mechanism assists interface design and associates the developed algorithms with the ontologies' terms.
From the user's point of view, our platform provides several interfaces depending on the type of media to be retrieved; these interfaces are related to a specific application domain and allow users to specify constraints. Each interface is also bound to the media ontology model and to the domain-processing ontology that correlates the concepts with the algorithms for the specific application domain.
Fig. 1 shows the architecture of the proposed platform, consisting of three levels: user level, developer level and repository level.
2.1. User Level
This level allows the user to interact with the platform and carry out a media retrieval process. It uses three connected layers: the Domain Interface Layer, the Ontology Layer and the Processing Layer, which are also shared with the developer level. The user-level workflow is:
• The platform asks the user to choose the application domain among those available, and the media type; for example, image retrieval in medicine for multiple sclerosis. According to this choice the platform provides the interface suitable for the selected domain. Afterwards, the user chooses the constraints that define the MMR system's performance in terms of the processing strategy for the query content.
• These constraints are forwarded to the ontology layer, which employs an OWL-based reasoner (Racer [9]) to create the sequence of algorithms to be executed, after checking its consistency. In detail, it receives the user's constraints and provides an instance of the domain-processing ontology representing the sequence of steps to follow (fig. 2).
• The sequence of steps to be executed, derived by the reasoner, is then sent to the processing module, which contains the algorithms implemented as services; i.e., the steps required for the retrieval are invoked and applied in order to provide as output the media relevant to the query performed by the user. More in detail, the sequence provided by the reasoner regards feature extraction and feature matching. The feature extraction algorithm indicated by the reasoner extracts the low-level features from the media passed by the user to query the system. These features are then compared with the media features contained in the repository, using the matching algorithm and the metrics selected by the reasoner. According to these metrics, the results are ranked and returned to the user.

Figure 1. The Platform's Architecture

Figure 2. Example of an instance of the domain-processing ontology generated by the reasoner
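The execution of a reasoner-derived step sequence can be sketched as a chain of service invocations. The service names and their outputs below are assumptions made up for illustration; in the platform the steps are actual web services.

```python
# Hypothetical service registry: each step of the reasoner's sequence maps
# to a callable service; the output of one step feeds the next.
SERVICES = {
    "ExtractMFCC": lambda media: {"features": [0.1, 0.4, 0.2], "source": media},
    "MatchL2":     lambda data: {**data, "matches": ["clip_12", "clip_07"]},
    "RankResults": lambda data: sorted(data["matches"]),
}

def run_pipeline(step_sequence, query_media):
    """Invoke each service of the reasoner's sequence in order."""
    data = query_media
    for step in step_sequence:
        data = SERVICES[step](data)
    return data

print(run_pipeline(["ExtractMFCC", "MatchL2", "RankResults"], "query.wav"))
# → ['clip_07', 'clip_12']
```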
To clarify this step, an example is useful: a user selects the domain (e.g. medicine) and then the media type (e.g. image). According to these choices, our platform displays one of the available interfaces. The user then performs a query in the form: "find all the images similar to the uploaded one, as soon as possible". These constraints are sent to the ontology layer, which fetches the medicine-image processing ontology (i.e. the domain-processing ontology) from the ontology repositories and, according to the required time constraints (in this example, "as soon as possible"), selects the most efficient algorithms for feature extraction and feature matching. The feature extraction algorithm is then executed on the query image, and the extracted low-level features are matched against those stored in the feature repository. The results are then ranked and finally displayed to the user through the selected interface. Note that the indexing of the media is performed at the developer level, i.e., when the developer adds new algorithms for a specific medium, these become available to the user once the indexing of all the media of the specific domain has been completed.
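The matching-and-ranking step of this example can be sketched as follows; the stored feature vectors, file names and the metric table are illustrative assumptions, not the platform's actual data.

```python
import math

# Hypothetical feature repository: per-image low-level feature vectors.
REPOSITORY = {
    "mri_001.png": [0.90, 0.10, 0.30],
    "mri_002.png": [0.20, 0.80, 0.50],
    "mri_003.png": [0.85, 0.15, 0.25],
}

# The reasoner selects one of the available distance metrics.
METRICS = {
    "euclidean": lambda a, b: math.dist(a, b),
    "manhattan": lambda a, b: sum(abs(x - y) for x, y in zip(a, b)),
}

def rank(query_features, metric_name):
    """Compare the query features against the repository and rank by distance."""
    metric = METRICS[metric_name]
    scored = [(metric(query_features, feats), name)
              for name, feats in REPOSITORY.items()]
    return [name for _, name in sorted(scored)]

print(rank([0.9, 0.1, 0.3], "euclidean"))
# → ['mri_001.png', 'mri_003.png', 'mri_002.png']
```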
2.2. Developer Level
This level allows the developer to create a domain-specific multimedia retrieval system through the following steps:
• The developer inserts the specifications for the MMR system by uploading (or editing, if already present in the platform) the domain ontology, the processing ontology and the algorithms. In detail, the Interface Composition Module proposes to the developer a list of algorithms and the ontological models for specific domains stored in the system. The developer can choose to create a new model or to modify an existing one. If a domain ontology already exists in the current version of the platform, the developer can only upload new algorithms (thereby enriching the domain-processing ontology with new processing terms) and is not allowed to modify the domain ontology. Otherwise, the developer uploads the domain and processing ontologies, which are then integrated in the ontology models section (repository layer).
• After that, an interface allows the developer to associate the created or edited ontological model with the algorithms, as shown in fig. 3.
Figure 3. Interface to bind the domain-processing ontology with the algorithms
• The domain-processing ontology, expressed in OWL, can be drawn either using software such as Protégé1 or directly through the functionality of the ontology layer, implemented as web services. The created ontology is then validated by the Racer reasoner to ensure the absence of ambiguities in the constraints.
• Afterwards, the interface composition module semi-automatically composes a new interface related to the drawn ontology.
• If the developer uploads new feature extraction algorithms, the media related to the chosen domain are indexed according to the new algorithms. When the indexing is done for all the media (which are stored in a repository section related to the specific domain), the interface of the new model becomes available to the user level.
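The binding and re-indexing steps above can be sketched as follows. The class, its methods and the `domain#Term` naming convention are hypothetical, invented for this illustration; the platform's actual interfaces are not specified at this level of detail.

```python
class AlgorithmRegistry:
    """Toy model of the developer-level workflow: binding an algorithm to an
    ontology term invalidates the domain's index, and the domain's interface
    is exposed to users only after re-indexing completes."""

    def __init__(self):
        self.bindings = {}            # ontology term -> algorithm callable
        self.indexed_domains = set()  # domains whose media are fully indexed

    def bind(self, ontology_term, algorithm):
        self.bindings[ontology_term] = algorithm
        # Assumed convention: the term is "domain#Name"; a new binding
        # invalidates that domain's index until re-indexing is done.
        self.indexed_domains.discard(ontology_term.split("#")[0])

    def reindex(self, domain, media_items):
        # In the real platform this runs the new feature extractors over
        # every stored media item of the domain.
        for term, algo in self.bindings.items():
            if term.startswith(domain):
                for item in media_items:
                    algo(item)
        self.indexed_domains.add(domain)

    def available_to_users(self, domain):
        return domain in self.indexed_domains

reg = AlgorithmRegistry()
reg.bind("music#ExtractDCT", lambda media: f"dct({media})")
print(reg.available_to_users("music"))   # False: indexing not yet done
reg.reindex("music", ["song1.wav", "song2.wav"])
print(reg.available_to_users("music"))   # True: interface now exposed
```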
1. http://protege.stanford.edu
The processing and interface composition modules have been implemented using the .NET framework, whereas the ontology layer has been implemented as web services in Java.
2.3. Repository Layer
This layer stores all data in specific repositories; both the levels above save and retrieve information from it. The repositories contain information about the interfaces present and usable through the platform, the ontological models inserted by the developers, the algorithms for feature extraction, feature matching and indexing, and the media content.
The media content repository is currently divided into four main sections: image, video, audio and 3D model. Each section is then divided according to the specific domain; for instance, we have images for law, sport, medicine, biology, etc. The annotation of each medium is provided by the developer when a novel multimedia retrieval system is inserted into the platform. When a developer creates a new interface, it is stored in the interface repository (and hence made available to the user) after all the media related to the specific domain have been indexed according to the proposed algorithms.
The repository layer has been implemented using RDF, i.e., the data are stored as triples and the interaction with the other levels is done through a SPARQL formalizer. An RDF schema has been defined for each RDF repository. The RDF storage interface has been implemented using SemWeb.NET2, a library written in C# that also provides mechanisms for querying repositories.
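The triple-store idea behind this layer can be sketched in a few lines without any RDF library: triples are (subject, predicate, object) tuples, and a SPARQL triple pattern is a tuple with wildcards. The predicates and resource names below are invented for illustration; the real layer uses SemWeb.NET and actual SPARQL.

```python
# Minimal, self-contained sketch of an RDF-style triple store.
# None plays the role of a SPARQL variable in a triple pattern.
triples = {
    ("mri_001.png", "rdf:type",   "med:Image"),
    ("mri_001.png", "med:domain", "medicine"),
    ("song1.wav",   "rdf:type",   "mus:Audio"),
    ("song1.wav",   "mus:genre",  "jazz"),
}

def match(pattern):
    """Return all triples matching (s, p, o), where None is a wildcard."""
    s, p, o = pattern
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Analogue of: SELECT ?s WHERE { ?s med:domain "medicine" }
print(sorted(t[0] for t in match((None, "med:domain", "medicine"))))
# → ['mri_001.png']
```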
3. A Use Case: Content-Based Music Retrieval System
An example of use of the proposed architecture has been implemented for music retrieval, thus addressing the interaction with audio content in the music domain. According to the architecture described in the previous section, the steps to be performed by a developer to compose a content-based music retrieval system are: domain ontology definition, domain-processing ontology creation, algorithm definition, ontology-algorithm binding and, finally, interface definition. In our case we have used the music ontology3 for specifying the domain. This ontology has been suitably modified by adding a new level (the processing level) to be integrated with the domain-processing ontology. In detail, the music ontology in its original form contains only three levels: the first level deals with editorial information, the second level describes events involving the composition of a musical work, and the third level describes the event decomposition. Our modified version contains a fourth level that includes the parameters used for the processing, i.e. audio length, audio resolution, sample rate, bytes per second, etc. An example of the modified version of the music ontology (with related URIs) is shown in fig. 4. The core of the application is the domain-processing ontology, which has been designed as shown in fig. 5. This ontology presents different nodes, where the root node defines the main concept, i.e. Audio. The children nodes describe different concepts, i.e. audio pre-processing, processing and performance criteria. The latter allows the user to specify constraints on the system's performance. Indeed, according to the user's requirements, the elaboration can be oriented either to accuracy (high precision), to execution time (fast processing), or balanced between accuracy and processing time. The performance criteria concept implies that only one of the features belonging to the lower level can be selected. The Pre-processing and Elaboration features are OrFeatures, i.e. one can select none, one or more features. The leaf nodes represent the features to extract; each feature is associated with a specific algorithm. The constraints are expressed through directed edges, where the letter I denotes the inclusion relationship. Clearly, other types of constraints would have been possible (exclusion, default and avoid), but in this context they are not needed. According to this ontology the system implements the interface shown in fig. 6.

2. http://razor.occams.info/code/semweb/
3. http://musicontology.com/
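The constraint semantics just described can be sketched as follows: the performance-criteria node admits exactly one selection, and an inclusion edge (I) forces a feature in whenever its source is selected. The node names echo fig. 5, but the constraint tables are illustrative guesses at the model, not the paper's exact encoding.

```python
# Hypothetical encoding of the fig. 5 constraints.
EXACTLY_ONE = {"performance": {"Accuracy", "Speed", "Balanced"}}
INCLUDES = {  # inclusion edges: selecting the key forces the listed features
    "Balanced": {"OnWindow", "MakeWindow", "BigOverlap", "DCT", "MedCoeff"},
}

def resolve(selection):
    """Validate the performance choice and apply inclusion edges."""
    chosen_perf = selection & EXACTLY_ONE["performance"]
    if len(chosen_perf) != 1:
        raise ValueError("exactly one performance criterion must be selected")
    resolved = set(selection)
    for node in list(resolved):
        resolved |= INCLUDES.get(node, set())
    return resolved

print(sorted(resolve({"Balanced"})))
# → ['Balanced', 'BigOverlap', 'DCT', 'MakeWindow', 'MedCoeff', 'OnWindow']
```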
In the reported figure, the user has asked to retrieve a part of the displayed song and the option "Balanced Accuracy" is chosen (the slider is positioned in the middle between Accuracy and Speed). This implies the selection of the algorithms OnWindow, MakeWindow, Big Overlap, DCT and MedCoeff (the inference of these algorithms by the reasoner is highlighted in fig. 5). In fact, these algorithms achieve good accuracy in audio processing without significantly affecting speed performance [10].
The algorithms for music processing, previewing and content filtering are implemented using two libraries: AudioLab4 and IrrKlang5.
4. Concluding Remarks
In this paper we have proposed a semantic-based and adaptive architecture for the automatic composition of multimedia retrieval systems according to the domain's features and the user's requirements. The proposed system also represents a novel architecture for general-purpose automatic multimedia retrieval, based on some key technologies, i.e. ontologies, feature modeling and automatic interface composition. Such integrated systems, resulting in a semantically rich and flexible mechanism for automatically generating any type of multimedia retrieval system, must be investigated in depth to realize the full potential of multimedia retrieval.

4. http://www.mitov.com/html/audiolab.html
5. www.ambiera.com/irrklang

Figure 4. Extended Music Ontology with the added level 4, which contains the information necessary for audio processing
We are currently working on improving the mechanism for ontology integration and matching, to support collaborative refinement and evolution of the already created ontologies, and on providing the developer level with a module assisting the use of algorithms across domains. Moreover, the automatic interface composition module should also be expanded to take into account methods for intelligent workflow composition, such as the one proposed in [11].
References
[1] M. S. Lew, N. Sebe, C. Djeraba, and R. Jain, "Content-based multimedia information retrieval: State of the art and challenges," ACM Trans. Multimedia Comput. Commun. Appl., vol. 2, pp. 1–19, February 2006.
[2] A. Dong and H. Li, "Multi-ontology based multimedia annotation for domain-specific information retrieval," in Proceedings of the IEEE International Conference on Sensor Networks, Ubiquitous, and Trustworthy Computing - Workshops - Volume 02, (Washington, DC, USA), pp. 158–165, IEEE Computer Society, 2006.
Figure 5. Domain-Processing Ontology for the developed music media retrieval system. The algorithms selected by the reasoner according to the user's requirements set in fig. 6 are highlighted
Figure 6. Interface generated by the proposed system for the implemented multimedia retrieval system. Each section is related both to the domain ontology (music ontology, on the right side of the image) and to the domain-processing ontology shown in fig. 5
[3] A. G. Hauptmann, J. J. Wang, W.-H. Lin, J. Yang, and M. Christel, "Efficient search: the Informedia video retrieval system," in Proceedings of the 2008 International Conference on Content-based Image and Video Retrieval, CIVR '08, (New York, NY, USA), pp. 543–544, ACM, 2008.
[4] A. Hinze, G. Buchanan, D. Bainbridge, and I. Witten, "Semantics in Greenstone," in Semantic Digital Libraries (S. R. Kruk and B. McDaniel, eds.), pp. 163–176, Springer Berlin Heidelberg, 2009.
[5] U. Rashid, I. A. Niaz, and M. A. Bhatti, "Unified multimodal search framework for multimedia information retrieval," in Advanced Techniques in Computing Sciences and Software Engineering (K. Elleithy, ed.), pp. 129–136, Springer Netherlands, 2010.
[6] X. Liu, Z. Shao, and J. Liu, "Ontology-based image retrieval with SIFT features," in International Conference on Pervasive Computing, Signal Processing and Applications, pp. 464–467, 2010.
[7] V. Mezaris and M. G. Strintzis, "Object segmentation and ontologies for MPEG-2 video indexing and retrieval," in CIVR, pp. 573–581, 2004.
[8] S. Abdallah, Y. Raimond, and M. Sandler, "An ontology-based approach to information management for music analysis systems," in Audio Engineering Society Convention 120, May 2006.
[9] V. Haarslev and R. Möller, "Racer: An OWL reasoning agent for the semantic web," in Proc. of the International Workshop on Applications, Products and Services of Web-based Support Systems, in conjunction with the 2003 IEEE/WIC International Conference on Web Intelligence, vol. 13, pp. 91–95, 2003.
[10] J. Foote, M. L. Cooper, and U. Nam, "Audio retrieval by rhythmic similarity," in ISMIR, 2002.
[11] G. Nadarajan, "Planning for automatic video processing using ontology-based workflow," in ICAPS '07, 2007.