A Hierarchical Semantically Enhanced
Multimedia Data Warehouse
Andrei Vanea, Rodica Potolea
Technical University of Cluj-Napoca
Cluj-Napoca, [email protected], [email protected]
Abstract - Data warehouses are used in many domains. Their
purpose is to store historical data and to assist in the decision
making process. Multimedia data warehouses are used for
storing files which contain texts, graphics, videos and sound.
These kinds of files are produced in large quantities, in fields
such as medicine or space research. We propose a framework
for building such a data warehouse, structuring the data in a
way that is familiar to a warehouse user. We present a hierarchical
way of structuring the data and the information extracted. We
also propose a method of semantically enhancing the data and
the information extraction process with the use of hierarchical
metadata.
I. INTRODUCTION
A data warehouse is a repository of electronic data,
designed for storing, aggregating and summarizing the data.
Operational databases are used for storing daily business
transactions, preserving data integrity and providing fast
access to data. The data model used with success in
operational databases is the relational model, which follows
Codd’s normalization rules. Data warehouses focus on
business processes and the entities that describe them. Data
warehouses store historical data from all the areas of a
business, rather than from a single department, the
way operational databases do. As an example,
operational databases store data produced at a specific store,
in a chain of stores, and a data warehouse stores all the data
produced at all the stores of that particular store chain.
Another use for data warehouses is in the decision making
process, therefore data warehouses are optimized for a fast
analysis of data. Usually, dimensional modeling is used in
data warehouses. Dimensional modeling is a logical design
technique, in which many tables, called dimensions,
describe a central table, called the fact table, i.e. the central table
references them. A fact table is the primary table in a
dimensional model where the numerical performance
measurements of the business are stored. Dimension tables
are integral companions to a fact table and contain the
textual descriptors of the business [1]. There is almost
always a way of describing the data in the data warehouse,
and this is done by using metadata, which is data that
describes (central) data [2]. Sometimes, in order to ensure
the significance (meaning) of the data, some meta-metadata
is used, to describe the metadata itself [3]. Multimedia data
is complex data in different formats - texts, graphs, videos,
sounds [4, 5]. These multimedia files capture different
events or different descriptions of the same event.
Therefore, the multimedia data needs to be stored so that it
can later be processed and analyzed.
Although data warehouse technology for numerical and
symbolic data is considered to be mature [5], there is much
to do in regard to complex, multimedia data warehousing
[6]. We humans can relatively easily extract information from
different types of multimedia objects, such as text files,
images and sounds. However, systematic information
extraction through an automated process needs adequate
techniques, specific to each type of multimedia object
stored, from which knowledge is extracted. But why, if we
can do it ourselves, do we have to bring computers into the
knowledge extraction process? The answer is simple:
because of the large amount of multimedia data that is
created/captured. We humans do not have the ability to
process in depth such complex data nor to detect and extract
knowledge that might be hidden in multimedia data. This
can only be done with the aid of knowledge discovery
processes, which are systematic and (semi) automated.
Many complex fields, such as medicine, space research
or weather forecast, acquire data in many formats: text,
audio, video. There is a need to store, retrieve and process
this complex data. One of the best solutions to accomplish
this is through the use of data warehouses. Traditional DBMSs
are not really suited for complex, multimedia data.
This is because relational databases require that the data
they store have structure, whereas multimedia data is often
semi-structured. In this case, new ways of storing and
processing this multimedia data had to be developed and
adopted. Because of the way in which XML can represent
any kind of structure, it was the self-evident way in which
multimedia data could be stored and handled. Some DBMSs
support XML files and others are XML-native, such as eXist.
The existence of XML-based languages, such as XPath and
XQuery, has further improved the use of XML as a
multimedia data storing technology.
II. RELATED WORK
Data warehouses are widely used now in many fields,
from economics to medicine and weather forecast [1]. At
the beginning, there were mostly numerical and textual
data warehouses, primarily in business operations such as
economics, marketing and sales. The large amount of data
generated by these businesses was successfully integrated
using the dimensional model. The relational model, used for
regular storage in OLTP databases, was not suited for the
decisions that had to be pulled from the existing data. This
is because it is not optimized for aggregating the large
amounts of data stored in data warehouses. The
fact tables store summarized, aggregated data, which is later
used by OLAP tools to aid in the decision making process.
Symbolic objects are also widespread in data
warehouse environments. These objects are mostly character
strings. Such objects may be found in surveys and
questionnaires [7, 8].
In [6] the authors describe a data warehouse for complex
objects. The semi-structured format of these objects is
captured via XML files, which are then parsed and validated
against a minimum requirements pattern. The
communication between the user and the data warehouse is
accomplished using the XQuery language, a query language
for XML-structured data. A medical data warehouse
focused on ECG signal recordings, containing image data
and symbolic data, is described in [5].
Retrieving multimedia data from a database or data
warehouse can be done in two ways: by content or by
description [5]. Description based retrieval uses attribute
descriptions of data (color, audio/video duration, number of
instances a particular word is used), while content based
retrieval uses the actual data inside the files (clouds, ideas,
theories). When dealing with multimedia, it is helpful to
separate the types of media files during data retrieval or
processing. Storing these files in a hierarchical way is a
solution, as presented in [9]. It is important to understand
the significance of the data stored in the warehouse, so the
use of metadata is a crucial part of the data warehouse
system.
Current trends are to semantically enhance the
representation of data stored in warehouses. In [10] the
authors propose a method to semantically translate
conceptual models into their platform-specific counterparts,
by using an OLAP algebra. The authors of [11] have built a
data warehouse which has two ontologies, one for the
specific business terms and one for the technical terms,
specific to the aggregation and knowledge extraction tools.
This requires a one-time collaboration between the business
experts and data warehouse designers, to produce a mapping
between the two ontologies. As a result, whenever a new
query is requested by the business analysts, the warehouse
administrator can quickly create the appropriate data mart,
without the need of long and repetitive meetings between
the two expert teams.
In [12] the authors implement a system in which they
analyze multimedia data, medical in nature, in order to
extract knowledge from it and to assist the physicians.
III. THE PROPOSED MODEL
One of the problems that is more frequent in multimedia
data warehouses than in numerical and symbolic
warehouses is answering questions like “which are the
entities that have some particular features?”, which in
complex object warehouses, focusing on medical records for
example, could translate to something like “which are the
people that have had a heart attack?”.
Our work focuses on creating a multimedia data
warehouse which can represent the data in a familiar, top-
down way, and process the data stored by knowing their
connections and dependencies. The system must minimize
human intervention in creating needed facts and
dimensions that were not anticipated during the design and
implementation processes. To instantiate our system we
built a medical data warehouse.
The proposed model takes into consideration the fact
that metadata for complex objects refers to information and
descriptions of such things as file format, size, location,
number of words, number of lines, width, height,
(video) length and so forth, thus needing a whole new way of
representing metadata and the connections between data and
the metadata describing it. Also, the model aims at
improving the way a question like the one already presented
can be answered.
Another goal of our model is to ensure the semantic
value of the metadata. Most data warehouses use metadata to
help the user to understand the data stored, making it easier
for them to select the appropriate tools for summarizing,
reporting or analyzing the data. As an example, consider a
(multimedia) data warehouse used in a complex field, such
as finance or medicine. If the beneficiaries want to get some
results that are not designed to be extracted by the system,
they need to contact the data warehouse administrator(s).
But most of the time, the administrators are not experts in
the field of economics, nor in that of medicine. They are
specialized in the field of computer science, or at least in
database/data warehouse maintenance. So, a lot of time is spent
before the administrator understands the needs of the
beneficiaries and, by using the metadata, creates the
appropriate queries or data marts. With the use of rich
semantic metadata, the system can automatically resolve
such requests.
A. SYSTEM ARCHITECTURE
We structured our data warehouse in five blocks: the
ETL tools block, the warehouse block, the semantic
metadata block, the processing and metadata maintenance
block and the query processor block (Fig. 1).
Figure 1. The system architecture, containing the five blocks: ETL Tools,
Data Warehouse, Metadata, Query Processor and Processing and Metadata Maintenance Tools.
1) The ETL Tools Block
The ETL tools block acquires the data and afterwards
prepares it to be stored in the warehouse. It checks the type
of the file that will be stored and gathers specific data
(meta)features such as name, file length or format. The
information acquired in this step is loaded in XML files
which store the dimensions and characteristics of the files.
2) The Metadata Block
To assure the semantics of the data, we use two
repositories of metadata: one describing the terms specific
to the business domain and one describing the technical
terms that the system can extract and process. We also store
the mappings between the business terms and the technical
terms. These are gathered by both business and technical
specialists, at implementation time. All the information
within these two repositories is represented in a hierarchical
manner, using XML files and XML elements. Lower level
items are nested in upper level items. In this way, the query
processor can resolve a query on a high level item by
breaking it into lower level items. By finding out which items
characterize (influence) other items, the query processor
can access the corresponding fact tables. If the desired fact
table does not exist, the query processor checks to see if
another existing fact table (or tables) can provide significant
data for the computation of the query result. If not, the query
builder can create the appropriate fact table, using the
semantic metadata provided in the repositories. This creates
a dynamic environment, which does not need the
intervention of the data warehouse designer or
administrator.
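As an illustration of this nesting, a minimal sketch follows; the element and attribute names, and the example terms, are assumptions rather than the actual repository content. It shows how a query on a high level item can be broken into the lower level items nested inside it.

import xml.etree.ElementTree as ET

# Hypothetical fragment of the business (medical) terms repository:
# lower level terms are nested inside the higher level term they describe.
BUSINESS_TERMS = """
<terms>
  <term name="lung function">
    <term name="forced vital capacity"/>
    <term name="peak expiratory flow"/>
  </term>
</terms>
"""

root = ET.fromstring(BUSINESS_TERMS)
# Breaking the high level item into its lower level items:
parent = root.find(".//term[@name='lung function']")
print([child.get("name") for child in parent.findall("term")])
# -> ['forced vital capacity', 'peak expiratory flow']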
3) The Warehouse Block
We further propose a similar construct for the actual data
warehouse, by modifying the classical dimensional model.
Our model relies on the hierarchical composition of
features. The block is composed of two other blocks, one
containing the dimensions of the system and one containing
the facts.
The facts block communicates with the dimensional
block, which provides the dimensional data needed to
extract the fact data. It contains one data mart for each type
of multimedia data that is managed by the system: text,
image, video, audio and database. Each data mart contains a
hierarchy of facts, from level 1 to level n. So, in a way
similar to metadata attributes which depend on the attributes
at a lower level, each fact table may be referenced by the
fact table at the next level (Fig. 2). This means that each fact
table becomes a dimension table for the upper level fact
table, and the dimension tables become support tables. Each
fact table is linked with the corresponding level of metadata,
to allow faster access to the right fact table(s) that is (are)
needed to answer the query.
Figure 2. The layering of fact tables. Some facts may depend on other facts,
i.e. they see other facts as dimensions (left), but some may not (right).
4) The Processing and Metadata Maintenance Tools
Block
The processing and maintenance block is made up of
the tools needed to compute aggregations of the data and to
manipulate the metadata. Aggregation tools operate on the
different types of media supported by the data warehouse.
The metadata maintenance tools allow editing the metadata
repositories and also the mappings between the business
repository and the technical one. The mappings are directly
influenced by what the processing tools can compute and
extract in technical terms.
5) The Query Processor
The query processor acts like a controller. It is
connected to the other three blocks and it resolves the
semantics of the query that the user inputs. After the
dependencies of the query are computed, it selects the
corresponding aggregation tools, based on technical
mappings.
IV. MODEL IMPLEMENTATION
We built our system according to the architecture and
the model proposed. The particular domain for which we
instantiated the implementation is the medical one, more
specifically pneumology (pulmonology). The data acquired so
far is represented in two main formats: symbolic (text) and
images. Symbolic data is used for storing information about
the name, id, the patient’s date of birth, their gender,
whether they are smokers, non-smokers or former smokers,
but it may also store the physician’s comments about the
medical state of the patient. Two types of time series are
stored in the data warehouse: the first one contains the
amount (volume) of air exhaled by the patient over time,
and the second one contains the flow of the air volume.
The data warehouse user can view single patient data or
submit a query. The query parameters that need to be
specified are the type of aggregation requested, the domain
specific term on which the aggregation function is to be
applied and the characteristics of the data that is going to be
processed.
The ETL tools in the ETL block of the system get the
operational data and transform it into the structure used by the
data warehouse. Each dimension is stored in one XML file,
so for every new patient or new data for an existing patient,
a new record - i.e. XML node - is appended to the
appropriate XML dimension. If a new image containing a
graphical representation of the functional respiratory tests of
a patient is extracted, its characteristics such as file name,
format, size, path, type of time series and corresponding
patient are stored in specific dimensions.
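A minimal sketch of this loading step is given below; the file layout, element names and parameters are illustrative assumptions, not the exact structure used by our ETL tools.

import xml.etree.ElementTree as ET

def append_image_record(dimension_file, file_name, fmt, size, path,
                        series_type, patient_id):
    # Append one record (an XML node) for a newly acquired image to the
    # image dimension file, e.g. a hypothetical "image_dimension.xml"
    # whose root element collects the records of that dimension.
    tree = ET.parse(dimension_file)
    root = tree.getroot()
    record = ET.SubElement(root, "record")
    for tag, value in (("file_name", file_name), ("format", fmt),
                       ("size", str(size)), ("path", path),
                       ("series_type", series_type),
                       ("patient_id", patient_id)):
        ET.SubElement(record, tag).text = value
    tree.write(dimension_file, encoding="utf-8")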
All the parameters that the user can select from to create
the query are already defined in the metadata repository.
The medical terms repository stores the terms that are
supported by the warehouse in an XML file. Every medical
term can be used to describe another medical term, i.e. it
influences that term. This is accomplished by nesting
medical terms (i.e. XML nodes) inside upper level terms.
Every medical term has either a direct mapping to a
technical term or an indirect one, via transitivity. The same
property holds for the technical terms. The structures of
the technical and medical metadata repositories are similar.
An XML file contains the direct mappings between two
terms, belonging to each domain. Each XML node
represents a mapping and has two child elements,
representing the matching between medical and technical
terms. This mapping is viewed as the way in which a
particular medical (i.e. business) term is represented as
information. Therefore, knowing the medical term (issue)
that the user is interested in, and how it is represented, the
system can select the appropriate internal functions for
analyzing the existing data.
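A hedged sketch of such a mappings file and its lookup is shown below; the element names and the technical term identifiers are assumptions, only the medical terms come from the domain described here.

import xml.etree.ElementTree as ET

MAPPINGS_XML = """
<mappings>
  <mapping>
    <medical>peak expiratory flow</medical>
    <technical>pef_value</technical>
  </mapping>
  <mapping>
    <medical>flow-volume line concavity</medical>
    <technical>fv_area_difference</technical>
  </mapping>
</mappings>
"""

def load_mappings(xml_text):
    # One <mapping> node per direct correspondence, with a medical child
    # element and a technical child element, as described above.
    root = ET.fromstring(xml_text)
    return {m.findtext("medical"): m.findtext("technical")
            for m in root.findall("mapping")}

print(load_mappings(MAPPINGS_XML)["peak expiratory flow"])  # -> pef_value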
Each fact level of each data mart is stored in a different
XML file. In this way, we achieve the hierarchical structure
of the data marts, in which a fact table may become a
dimension for another fact table. Such a fact file contains
the result of the aggregation and at least one set of
references for each dimension used in the aggregation
process. To speed up the process of automatically creating
new queries, an XML file is created, containing technical
terms associated with every existing fact table.
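A hypothetical layout for one such fact file is sketched below; the element and attribute names and the identifiers are assumptions. It keeps the aggregated value together with references to the dimension (or lower level fact) records used, so an upper level fact can later treat it as a dimension, and a separate index file associates technical terms with existing fact tables.

# Hypothetical level-1 fact file: the aggregation result plus one set of
# references for each dimension used to compute it.
FACT_LEVEL_1 = """
<fact level="1" technical_term="mean_pef_percentage_6_10">
  <result>87.5</result>
  <dimension_refs dimension="patient" records="p12 p37 p64"/>
  <dimension_refs dimension="air_flow_image" records="i101 i118 i130"/>
</fact>
"""

# Hypothetical index used to speed up query building: which technical
# terms already have a stored fact table.
FACT_INDEX = """
<fact_index>
  <entry technical_term="mean_pef_percentage_6_10" fact_file="facts_l1_pef.xml"/>
</fact_index>
"""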
After the query parameters are submitted by the user, the
query processor checks the medical term for existing
technical mapping, in the mappings XML file. If a mapping
is not found, the query processor checks for other medical
terms that describe the current medical term and then checks
for a mapping for all these lower level terms. This process is
repeated until all the medical terms that describe the current
query have a mapping with a technical term. After this step,
the query processor checks whether the technical mappings found
in the previous step have an associated fact table, i.e.
a fact table that already stores the needed aggregation. If
no satisfactory fact table is found, it then checks for existing
aggregation tools which can extract the aggregation from the data. A
similar process is implemented for the technical terms as
with the medical ones. The difference is that the query
processor searches in an XML file containing a mapping
between a technical term and an existing processing
procedure. Once the technical mappings have been fully
resolved, the query processor computes the aggregation and,
if no similar fact table exists where it can store the result, it
automatically creates the fact table. The computed result is
presented to the user.
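The resolution strategy described above can be summarized by the following sketch; the dictionaries stand in for the XML repositories and the fact index, and all names are illustrative assumptions rather than the actual implementation.

def resolve_technical_terms(medical_term, med_to_tech, term_children):
    # Return the technical terms a medical term maps to, descending into
    # the lower level medical terms that describe it when there is no
    # direct mapping.
    if medical_term in med_to_tech:
        return [med_to_tech[medical_term]]
    technical = []
    for child in term_children.get(medical_term, []):
        technical.extend(resolve_technical_terms(child, med_to_tech,
                                                 term_children))
    return technical

def answer_query(medical_term, med_to_tech, term_children, fact_index, tools):
    # Reuse an existing fact table when one already stores the needed
    # aggregation; otherwise run the mapped aggregation tool and store
    # the new fact so later queries can reuse it.
    results = {}
    for tech in resolve_technical_terms(medical_term, med_to_tech,
                                        term_children):
        if tech in fact_index:
            results[tech] = fact_index[tech]
        elif tech in tools:
            results[tech] = fact_index[tech] = tools[tech]()
    return results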
V. EXPERIMENTAL RESULTS
A. DESCRIBING THE DATA
Figure 3. (a) A typical air flow graph for a healthy patient; (b) a typical
air flow graph for a sick patient: the PEF and FVC are not reached and the
FV line is concave.
In defining a system to assist physicians in taking better
medical decisions, an important step is to correctly
understand the particularities of the problem under
investigation, and identify the features that trigger a
decision. Therefore, we should first detect the end user
needs, and transform them into possible knowledge that our
data warehouse could offer. Such knowledge is extracted
from basic information about patients and lung performance
test results, stored by the system. This knowledge could
help in identifying existing lung problems, via the data
mining process. Such lung problems are encoded in the
recorded images on respiratory tests. Fig. 3a shows a record
of a flow - volume test on a healthy patient and Fig. 5a
shows a record of a volume over time test, also on a healthy
patient. With the domain expert (i.e. lung physician) we
identified the significant aspects of the image that are good
indicators of the lung health. They are transformed into
features characterizing the image, which represent the raw
data stored in the data warehouse. Moreover, it allowed
creating the list of mappings between the medical and
technical terms. Automating the information extraction
process is beneficial because: (1) manual extraction is time
consuming and (2) a domain expert barely handles large and
complex data sets and might not (easily) see hidden relations
among their components.
Our first investigations led us to a number of 35
features: 23 numeric, 7 Boolean and 5 nominal.
The data recorded consists of image data, symbolic and
numeric data. The basic symbolic and numeric data relates
to the known information about the patients, data that is
usually found in medical records. In our particular problem
under investigation, the specified medical records are:
patient id, patient name, date of birth, weight, height, sex,
smoking history and race (sometimes, the race influences
the normal values that a test on a healthy patient should
produce).
To identify patterns, help in determining the diagnosis
and propose a treatment plan, a classification process is
required. In order to do so, a set of features should be
extracted from the images. The image data (Fig. 3a and 5a)
presents graphs indicating inhaled and exhaled air flow
related features.
Figure 4. The dimensions of the system.
The first image type (Fig. 3a) plots the flow of air of a
patient over the volume of exhaled and inhaled air. These
features are presented below and they are all stored as
numerical values:
- the angle formed by the ordinate and the first section of
the graph;
- the Peak Expiratory Flow (PEF): the maximal flow (or
speed) achieved during the maximally forced expiration
initiated at full inspiration, measured in liters per second;
- the Normal Peak Expiratory Flow (NPEF): the computed
normal value of the PEF for a healthy patient, given
particular features such as weight, height, age, etc.;
- the angle between the first initial curve of the exhaled air
and the second portion of the graph;
- the Forced Expiratory Flow at 25-75% (FEF25-75): the
average flow (or speed) of air coming out of the lung during
the middle portion of the expiration, measured in liters per
second;
- the Normal Forced Expiratory Flow at 25-75%
(NFEF25-75): the computed normal value of the FEF25-75
for a healthy patient, given particular features such as
weight, height, age, etc.;
- the Forced Vital Capacity (FVC): the volume of air that
can forcibly be blown out after a full inspiration, measured
in liters;
- the Normal Forced Vital Capacity (NFVC): the computed
normal value of the FVC for a healthy patient, given
particular features such as weight, height, age, etc.;
- the Flow-Volume Line (FV line): plots the way the air is
exhaled, from the PEF to the FVC.
Regarding the second type of image (Fig. 5a) which
plots the air volume over time, the features that reveal
interest for the diagnosis process, in combination with the
existing air flow image features, are these numerical values:
- the Forced Expiratory Volume in one second (FEV1): the
maximum volume of air that can be forcibly blown out in
the first second of the FVC maneuver;
- the Normal Forced Expiratory Volume in one second
(NFEV1): the computed normal value of the FEV1 for a
healthy patient, given particular features such as weight,
height, age, etc.
The FVC, FEV1 and PEF have specific normal values,
computed for each (healthy) individual,
according to height, age, sex, and sometimes race and
weight. These characteristics are stored in the dimensions of
the data warehouse (Fig. 4). The normal values are present
in the images acquired by the ETL tools and stored in the
data warehouse. Another interest is in determining whether
the PEF and FVC features have reached the computed
normal values (i.e. the corresponding values for the
specified features on a healthy patient, with the given
characteristics). In the ETL step these conditions are
checked and the results are stored in a dimension, particular
to the air flow images. If the computed normal values are
not reached, we compute the percentage of the measured
value with respect to the predicted one.
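A plain reading of this computation (the exact formula is not stated in the text) is:

$\mathrm{percentage} = 100 \cdot \dfrac{\mathrm{measured\ value}}{\mathrm{computed\ normal\ value}}$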
Figure 5. (a) A typical air volume graph for a healthy patient; (b) a
typical air volume graph for a sick patient: the FEV1 is not reached.
Fig. 3a presents a typical air flow graph for a healthy
patient. The angles are very small, the computed
normal values for the FVC and PEF are reached, and the FV
line is straight. In Fig. 5a we can notice that the value for the
FEV1 is also reached.
Fig. 3b presents a typical lung problem air flow image. One
can observe that the PEF and the FVC were not reached
and the FV line is concave, which indicates signs of illness
and lung problems. There are three types of lung problems
that could be identified from such images:
- obstructive lung disease: concave FV line and FEV1 not
reached;
- restrictive lung disease: FVC and FEV1 not reached;
- mixed lung disease: PEF, FEV1 and FVC not reached.
Some factors that are technical in nature might influence
the relevance and interpretability of the images containing
the air flow. An incorrect test might not yield the most
accurate information about the patient’s health. An example
is the way in which the patient blows out the air. If they
blow out the air in one continuous stream, then the test is
successful, but if they stop exhaling and inhale for short
periods, the test is not accurate. This is indicated by the
shape of the FV line, i.e. if it is smooth (continuous
exhaling) or presents many spikes (exhaling and inhaling
intermittently). If it is not smooth, then the test is not as
relevant as it should be.
For features such as whether the computed values are
reached, the smoothness of the curve of the exhaled air and
the concavity of the graph, we build specific technical tools
which we map to the corresponding technical terms. These
technical terms are then mapped to their corresponding
medical terms. The mappings to the medical terms were
done with the help of the medical specialist.
The angles are
computed by using the direct line between the origin and the
PEF and the direct line between the PEF and the FVC. In
order to determine the concavity, we use areas. We compute
the area determined by the actual graph stored in the image,
and the area of the graph resulting from considering the FV
line as the straight line which connects the PEF and FVC.
This second area represents the area of a perfect lung exam
result, in which the patient is healthy. We then subtract the
actual area from the second area. If the difference
is positive, then the FV line is concave.
Otherwise, the FV line is convex.
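A sketch of this area-based test is given below, using the trapezoidal rule; the extraction of the (volume, flow) points from the image is outside the scope of the snippet and the function name and arguments are illustrative assumptions.

import numpy as np

def fv_line_concavity(volumes, flows):
    # volumes, flows: the points of the measured FV line between the PEF
    # (first point) and the FVC (last point), as extracted from the image.
    volumes = np.asarray(volumes, dtype=float)
    flows = np.asarray(flows, dtype=float)
    # Area under the actual FV line (trapezoidal rule).
    actual_area = np.sum(0.5 * (flows[1:] + flows[:-1]) * np.diff(volumes))
    # Area under the straight line connecting the PEF and the FVC points,
    # i.e. the "perfect" exhalation the graph is compared against.
    ideal_area = 0.5 * (flows[0] + flows[-1]) * (volumes[-1] - volumes[0])
    # Positive difference -> concave FV line, negative -> convex,
    # following the sign convention of the text.
    return ideal_area - actual_area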
B. EXTRACTING AND STORING KNOWLEDGE
The physicians need both data and knowledge from such
a system, in order to be assisted in the decision making
process. Therefore, intelligent queries to aggregate existing
(legacy) data and extract significant information from them
increase the impact of such systems.
The power and the speed of the system are improved
through the existence of such information as mean values for
the different results captured by the respiratory tests, on
different patients, both healthy and sick, with particularities
such as height, weight or age.
For the problem under investigation, which is to
determine if a patient is suffering from some kind of lung
problem, knowledge of interest is represented by: the mean
value of the percentage of the measured PEF value with
respect to the normal PEF (NPEF) value, for a given age
group; the mean value of the measured FVC value with respect
to the NFVC (Fig. 6a); the mean value for the concavity
degree of the FV line; the number of patients that have a
concave FV line of a degree lower than the mean; and the
number of patients with restrictive lung disease which reach
at least 70% of the NFVC.
Figure 6. (a) An air flow image of a patient, containing the mean
percentage of the PEF (horizontal) and FVC (vertical) with respect to their
normal values; (b) an air flow image of a patient containing the percentage
of lung problem diagnosed patients with a similar FV line.
We identified the relevant medical aspects, transformed
them into queries, allowing us to store relevant information,
ready to be used by the physicians. We populated the
warehouse with significant information, before deploying it
(but also in a dynamic fashion, afterwards) to the end user.
This represents an enhancement, difficult to obtain by
the warehouse administrator, who is not a (medical) domain
specialist. It also reduces the response time of the system.
The first type of query, for the mean percentage of the
measured PEF with respect to the NPEF for the 6 – 10 years
age group, offers the physician basic knowledge about how
much air is exhaled at those particular ages. The result was
stored in a fact table which corresponds to that particular
age group. This particular query type is important and
relevant for all age groups. Therefore, the knowledge stored
in the system was enhanced with the computed mean for the
age group 11 – 20, the age group 6 – 20, and so forth (i.e.
ranges suggested by the domain expert).
Figure 7. Sample taken from the technical metadata repository.
When the first query was submitted, the query processor
checked for mappings for the medical term and then
checked for existing results for the query. Because it did not
find any existing results, already stored in the data
warehouse, it selected the corresponding records, and
computed the requested mean. The same was done for the
second query. For the third query (6 – 20 mean) the query
processor found in the technical metadata repository that a
mean for the age group 6 – 20 can be computed using the
means of the two age groups 6 – 10 and 11 – 20 (Fig. 7).
After finding this information, the query processor looked
for records containing those two means. After finding them,
it decided to use them to resolve the query, instead of
retrieving the required data for all the patients between 6
and 20 years, and computing the mean.
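A sketch of how the two stored means could be combined is shown below; weighting by the number of records in each age group is our assumption (the text only states that the two stored means are reused), and the example values are hypothetical.

def combine_means(mean_a, count_a, mean_b, count_b):
    # Weighted mean of two disjoint age groups, e.g. 6-10 and 11-20,
    # giving the mean for the combined 6-20 group.
    return (mean_a * count_a + mean_b * count_b) / (count_a + count_b)

# e.g. combine_means(82.4, 35, 88.1, 52) -> combined mean for ages 6-20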
Because the system deals with images, querying the
warehouse based on raw image data needs to be one of the
supported features of the system. Inputting an image as a
query, to get aggregate data relevant to the medical
investigation, retrieves relevant knowledge from the system.
Such knowledge is the percentage of patients diagnosed with
some kind of lung problem, with a similar FV line (Fig. 6b).
The similarity threshold is submitted with the query
parameters. When the image was submitted, the concavity
degree was computed and used in selecting the
corresponding images of the sick patients.
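A hedged sketch of this image-driven query follows; treating two FV lines as similar when their concavity degrees differ by at most the submitted threshold is our assumption, since the text only says that the concavity degree of the submitted image is used for the selection.

def percentage_of_sick_with_similar_fv_line(query_concavity, threshold,
                                            patients):
    # patients: iterable of (concavity_degree, diagnosed_with_lung_problem).
    similar = [(c, sick) for c, sick in patients
               if abs(c - query_concavity) <= threshold]
    if not similar:
        return 0.0
    sick_count = sum(1 for _, sick in similar if sick)
    return 100.0 * sick_count / len(similar)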
VI. CONCLUSIONS AND FUTURE WORK
We have presented an architecture for a multimedia data
warehouse which aims at a semantically rich environment.
We also presented a data model for representing facts and
dimensions according to the hierarchical structure of
entities captured in multimedia objects. The metadata model
we proposed is also based on hierarchies and can be easily
extended, to provide better performance.
As for the future work, we plan to expand the medical
and technical terms lists and to improve the way in which
they are mapped. We plan to develop more powerful ETL
tools which can extract more information from the files
before they are loaded into the warehouse, to improve the
speed of query processing.
REFERENCES
[1] R. Kimball, The Data Warehouse Toolkit, Wiley and
Sons, 2nd Edition, 2002
[2] P. Vassiliadis, Data Warehouse Metadata,
Encyclopedia of Database Systems, Springer, 2009
[3] Object Management Group, Common Warehouse
Metamodel (CWM) Specification, 2003
[4] A. Tanasescu, O. Boussaid, F. Bentayeb, Towards
Complex Data Warehousing: A new approach for
integrating and modeling complex data, 5th
International Conference on Modeling, Computation
and Optimization in Information Systems and
Management Sciences, France, 2004
[5] A. M. Arigon, M. Miquel, A. Tchounikine, Multimedia
data warehouses: a multiversion model and a medical application, Multimedia Tools and Applications, vol.
35, 2007
[6] H. Mahboubi, J.C. Ralaivao, S. Loudcher, O. Boussaid,
F. Bentayeb, J. Darmont, X-WACoDa: An XML-based
approach for Warehousing and Analyzing Complex
Data, Advances in Data Warehousing and Mining, IGI
Publishing, 2009
[7] E. Diday, L. Billard, Symbolic Data Analysis:
Definitions and Examples, 2002
[8] S. E. G. Cisaro, H. O. Nigro, Architecture for Symbolic
Object Warehouse, Encyclopedia of Data Warehousing
and Mining, 2nd Edition, IGI Global, 2009
[9] J. You, Q. Li, On hierarchical content-based image
retrieval by dynamic indexing and guided search,
Proceedings of the 8th IEEE International Conference
on Cognitive Informatics, 2009
[10] J. Pardillo, J. N. Mazón, J. Trujillo, Bridging the Semantic Gap in OLAP Model: Platform-independent
Queries, Proceedings of the ACM 11th International
Workshop on Data Warehousing and OLAP, 2008
[11] G. Xie, Y. Yang, S. Liu, Z. Qiu, Y. Pan, X. Zhou,
EIAW: Towards a Business-friendly Data Warehouse
Using Semantic Web Technologies, The Semantic Web,
6th International Semantic Web Conference, 2nd Asian
Semantic Web Conference, ISWC 2007 + ASWC 2007,
2007
[12] M. L. Antonie, O. R. Zaiane, A. Coman, Application of
Data Mining Techniques for Medical Image
Classification, Proceedings of the Second International
Workshop on Multimedia Data Mining, 2001