Fuzzy Clustering of Time-variant and invariant Features:
Application to Sepsis Outcome Prediction
Marta C. Ferreira*
* Technical University of Lisbon, Instituto Superior Técnico, Dept. of
Mechanical Engineering, CIS/IDMEC – LAETA, Av. Rovisco Pais, 1049-001 Lisbon, Portugal 2014.
ABSTRACT
This dissertation proposes a novel clustering method based on fuzzy c-means, which is capable of
handling information from time variant and invariant features. The new method, Mixed Clustering,
shows the advantages of successfully aggregating both data components to identify systems in a wide
range of application domains, such as Medical, Management or Energy Systems.
The flexible formulation of the proposed methodology adapts to data sets with multivariate time
series and to different distance-based similarity measures. In addition to the Euclidean
distance, a distance based on the popular Dynamic Time Warping method is used for time series
similarity search, since it can overcome the temporal misalignment between series that is commonly
found in these applications.
The contribution of the Mixed Clustering approach is demonstrated for forecasting and
classification problems, the first being achieved through its application to a meteorological system for
temperature and humidity forecasting based on geographical location. The method’s performance as a
binary classifier is demonstrated with a Medical application, where the goal is to predict the outcome
of a patient diagnosed with septic shock from physiological variables measured
during a sampling period and from the patient’s demographic data, which remain constant during the stay in an Intensive
Care Unit. The machine learning process is tested under unsupervised and supervised alternatives.
The results showed that when the temporal information of the patient is poorer, the
demographic information can improve the classifier’s performance.
Keywords: Data Mining, Machine Learning, Clustering, Time Series Analysis, Mixed Data, Septic Shock
1. Introduction
1.1. Knowledge Discovery in Databases
Present developments in data warehousing enable the
storage of increasingly large data sets, leading to a
growth in the amount of information available regarding
any given system, as well as in the analytical possibilities
they provide.
The Knowledge Discovery in Databases (KDD) process
focuses on methodologies for extracting useful
knowledge from the information available in databases
(Fayyad, Piatetsky-Shapiro, & Smyth, 1996).
First, the data relevant to the system under identification
(the target data) are selected from the available database.
The target data are then pre-processed: the information is
cleaned, missing values are handled and the data are adapted to
the requirements of the analysis.
Figure 1-1 KDD Process
The data are then transformed, that is, consolidated into
structures appropriate for the data mining method to be
applied, in this case Mixed Clustering, which
identifies patterns in the data.
The mined patterns are then
interpreted in the field of the original system, finally
yielding the desired useful knowledge.
The focus of this dissertation is the proposal of a new,
efficient data mining method based on clustering, designated
Mixed Clustering, for databases combining time variant and
invariant features. The method is valid for forecasting and
classification problems and applicable to a diverse range of
application domains, from medical problems and climate
analysis to power management and economic studies.
1.2. Time Series Data Mining
This innovative data mining method searches for
patterns and similarities in both data components, time
variant and invariant, combining the extracted
information to better characterize the data objects.
The mining of time series, and particularly their
clustering, attracts considerable research interest. The complexity of this type of data requires
careful examination of the proposed algorithms (Rani
& Sikka, 2012). While the time invariant features are
easily compared by a common and simple distance
function, the Euclidean Distance, the time variant
features, represented by time series, require a more
complex analysis (Rani & Sikka, 2012).
Thus, a more flexible measure, Dynamic Time
Warping (DTW), is also implemented for the similarity search of time series.
Figure 1-2 Euclidean and DTW matching of Time Series
This similarity measure can handle
temporally misaligned time series, identifying similar
trends and patterns even when they are out of phase, that is, shifted in their time of
occurrence.
This measure has been successfully applied in areas
such as handwriting and online signature matching,
time series database search, computer vision,
surveillance and signal processing, (Gaudin &
Nicoloyannis, 2006).
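To make the contrast between the two measures concrete, the following is a minimal Python sketch of the classic DTW recursion next to the point-by-point squared Euclidean distance. The function names, the squared-difference local cost and the absence of a warping window are assumptions made for illustration; the text does not specify the DTW variant used in this work.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two univariate series.

    Assumed variant: squared difference as local cost, classic
    three-neighbour recursion, no warping window.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = local + min(cost[i - 1][j],      # a advances, b repeats
                                     cost[i][j - 1],      # b advances, a repeats
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def squared_euclidean(a, b):
    """Point-by-point squared Euclidean distance (equal lengths only)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))
```

For two copies of the same bump shifted by one sample, such as `[0, 0, 1, 2, 1, 0, 0]` and `[0, 1, 2, 1, 0, 0, 0]`, the point-by-point distance is non-zero, whereas DTW can align the bumps and report no difference.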
1.3. Outline
This work is structured as follows: in section 2, the
mixed clustering concept is described and the
methodology presented. In section 3, the use of the
method’s outputs to solve a forecasting problem is
presented and applied to a Meteorological System,
followed by a demonstration and discussion of the
results. The method’s contribution to a classification
problem is demonstrated in section 4, and applied to a
Medical System, followed by a demonstration and
discussion of the results achieved. Finally, in section 5
the results of the different applications are reviewed and
compared to previous works on the subject, concluding
with a set of suggestions for further developing the study
as future work.
2. Clustering
2.1. Concept
Clustering is a data mining technique that aims to
group similar data objects, based on identified patterns,
while separating objects with distinct behaviours. It
divides the data into clusters so that intra-group
differences are smaller than inter-group differences. This
concept is useful in a wide range of applications, from
image analysis, wireless sensor network
applications and population segmentation to
bioinformatics (Liao, 2005).
Often, the information that describes a system is not
all represented by the same type of data: there are
categorical, numerical and text features, as well as constant and
time-varying features. In such cases, a clustering
method capable of reconciling distinct data types
becomes necessary.
In (Izakian, Witold, & Jamal, 2013), a clustering
method to handle spatiotemporal systems is proposed.
These systems are characterized not only by temporal
features but also by the spatial location at which they
were measured. Geography, climatology and
epidemiology systems are examples of applications
relying on spatiotemporal data for their identification.
The methodology proposed in (Izakian, Witold, &
Jamal, 2013) expands the Fuzzy C-Means (FCM)
Clustering technique (Bezdek, Ehrlich, & Full, 1984) to
handle spatiotemporal data by adding a weighting
parameter 𝜆 that sets the importance given to the
temporal component. This parameter greatly benefits
the algorithm’s flexibility, allowing it to search for the
best combination of temporal and spatial
contributions.
The aim of this dissertation is to extend this notion of
spatiotemporal data to any dataset containing different
types of data, constant and time-varying, that may
require specific treatment, by generalizing the
spatiotemporal clustering methodology to databases
with mixed data and multivariate time series.
We will show that there are benefits in successfully
combining both data components to model systems in a
wide range of application domains, such as Medical
Care, Finance, Management and Energy Systems.
2.2. Mixed Clustering Methodology
When working with a database with time variant and
invariant features, the input data is considered as a
concatenation of both data components:
$$x_i = [x_i^s \mid x_i^t], \quad i = 1, \dots, n \tag{2.1}$$
The invariant component, represented by numeric
values, is structured as follows:
$$x_i^s = [x_{i,1}^s, \dots, x_{i,r}^s] \tag{2.2}$$
where r is the number of invariant features.
The time variant data component, represented by
multivariate time series, is structured as a three-dimensional
array:
$$x^t = [x_{i,j,k}^t] \in \mathbb{R}^{n \times q \times f} \tag{2.3}$$
In this format, each value is defined by 3 coordinates:
𝑖 = 1, … , 𝑛, indicating the sample number,
𝑗 = 1, … , 𝑞, the sampling point,
and 𝑘 = 1, … , 𝑓, the feature.
The clustering method defines a set of prototypes, or
centers, one for each of the c clusters, each comprising a variant
and an invariant component.
The invariant component’s prototypes are determined
by:
$$v_l^s = \frac{\sum_{i=1}^{n} u_{l,i}^m x_i^s}{\sum_{i=1}^{n} u_{l,i}^m} \tag{2.4}$$
The time-variant prototypes require an expansion to
deal with the dimensionality increase of the data. A
three-dimensional structure was defined, with dimensions
[𝑐 × 𝑞 × 𝑓]:
$$v_{l,k}^t = \frac{\sum_{i=1}^{n} u_{l,i}^m x_{i,k}^t}{\sum_{i=1}^{n} u_{l,i}^m} \tag{2.5}$$
where the fuzziness parameter, m, makes the partition
fuzzier (larger m) or crisper (m closer to 1).
The value 𝑢𝑙,𝑖 is an element of the partition matrix, U,
that defines the degree to which each sample belongs to
each cluster. Being a fuzzy clustering method, the
membership of a sample i in a cluster l is a value in the
interval $u_{l,i} \in [0,1]$, with $\sum_{l=1}^{c} u_{l,i} = 1$ and $0 < \sum_{i=1}^{n} u_{l,i} < n$.
The similarity between a sample and a cluster is
then measured by the sample’s augmented distance
to the cluster’s center, given by:
$$d_\lambda^2(v_l, x_i) = \|v_l^s - x_i^s\|^2 + \lambda \sum_{k=1}^{f} \delta(v_{l,k}^t, x_{i,k}^t) \tag{2.6}$$
where $\delta(v_{l,k}^t, x_{i,k}^t)$ is the distance between the k-th
feature of prototype 𝑙 and sample 𝑖, calculated by the
distance function used, and 𝜆 is a parameter that defines
the influence given to the time variant features. The
optimal value of this parameter is determined by
sequential runs of the clustering process for different
values of 𝜆, choosing the one that yields the best
performance.
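As an illustration of Eq. (2.6), the sketch below computes the squared augmented distance between one prototype and one sample. The data layout (one list per time variant feature) and the function names are assumptions chosen for readability; `delta` stands for whichever series distance is plugged in, for example the squared Euclidean distance or DTW.

```python
def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def augmented_distance_sq(vs, vt, xs, xt, lam, delta):
    """Squared augmented distance of Eq. (2.6) for one prototype and one sample.

    vs, xs : invariant parts (sequences of r numbers)
    vt, xt : time variant parts, one series per feature (f sequences)
    lam    : weight of the time variant component (lambda in the text)
    delta  : distance between two series (assumed to be supplied by the caller)
    """
    invariant_term = squared_euclidean(vs, xs)
    temporal_term = sum(delta(pv, px) for pv, px in zip(vt, xt))
    return invariant_term + lam * temporal_term
```

With `lam = 0` only the invariant features drive the distance, and increasing `lam` shifts the weight towards the time series, which is what the sequential runs over 𝜆 explore.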
By adding the distances of all features for each
sample, the matrix of distances maintains its dimension
[𝑐 × 𝑛], resulting in a meaningful partition matrix
defined, as for a univariate time-series system, by:
$$u_{l,i} = \frac{1}{\sum_{o=1}^{c} \left( \frac{d_\lambda(v_l, x_i)}{d_\lambda(v_o, x_i)} \right)^{2/(m-1)}} \tag{2.7}$$
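The membership update of Eq. (2.7) can be sketched as follows, starting from a [c × n] matrix of squared augmented distances. Because the input holds squared distances, the exponent 2/(m−1) becomes 1/(m−1). The handling of a sample that coincides exactly with a prototype is an assumption (a common convention), since the text does not discuss that case.

```python
def update_memberships(d2, m=2.0):
    """Partition matrix of Eq. (2.7) from squared distances d2[l][i] (c x n)."""
    c, n = len(d2), len(d2[0])
    u = [[0.0] * n for _ in range(c)]
    p = 1.0 / (m - 1.0)  # exponent applied to ratios of squared distances
    for i in range(n):
        col = [d2[l][i] for l in range(c)]
        zeros = [l for l in range(c) if col[l] == 0.0]
        if zeros:
            # Assumed convention: a sample lying on one or more prototypes
            # shares its full membership among them.
            for l in zeros:
                u[l][i] = 1.0 / len(zeros)
            continue
        for l in range(c):
            u[l][i] = 1.0 / sum((col[l] / col[o]) ** p for o in range(c))
    return u
```

Each column of the returned matrix sums to one, in line with the constraints on the partition matrix.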
Since the objective function 𝐽 only has direct
dependency on the distances and membership degrees,
it can be defined as for a univariate time-series system:
$$J = \sum_{l=1}^{c} \sum_{i=1}^{n} u_{l,i}^m \, d_\lambda^2(v_l, x_i) \tag{2.8}$$
The clustering process continues until the objective
function converges or the maximum number of
iterations is reached.
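Putting Eqs. (2.4) to (2.8) together, the alternating optimisation can be sketched in plain Python as below. This is a dependency-free illustration under stated assumptions, not the implementation used in this work: the random initialisation, the `seed` argument, the small floor on distances and the data layout `xt[i][k]` (series of feature k for sample i) are all choices made here.

```python
import random


def sq_euclid(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def mixed_fcm(xs, xt, c, lam, m=2.0, delta=sq_euclid,
              eps=1e-5, max_iter=100, seed=0):
    """Sketch of the Mixed Clustering loop.

    xs[i]    : invariant features of sample i
    xt[i][k] : time series of feature k for sample i
    Returns (u, vs, vt, J), with u[l][i] the membership of sample i in cluster l.
    """
    rng = random.Random(seed)
    n, r, f = len(xs), len(xs[0]), len(xt[0])
    # Assumed initialisation: random partition matrix, columns summing to 1.
    u = [[rng.random() + 1e-9 for _ in range(n)] for _ in range(c)]
    for i in range(n):
        s = sum(u[l][i] for l in range(c))
        for l in range(c):
            u[l][i] /= s
    prev_j = float("inf")
    for _ in range(max_iter):
        # Prototypes as membership-weighted means, Eqs. (2.4) and (2.5).
        vs, vt = [], []
        for l in range(c):
            w = [u[l][i] ** m for i in range(n)]
            tot = sum(w)
            vs.append([sum(wi * x[j] for wi, x in zip(w, xs)) / tot
                       for j in range(r)])
            vt.append([[sum(wi * x[k][j] for wi, x in zip(w, xt)) / tot
                        for j in range(len(xt[0][k]))] for k in range(f)])
        # Squared augmented distances, Eq. (2.6).
        d2 = [[sq_euclid(vs[l], xs[i])
               + lam * sum(delta(vt[l][k], xt[i][k]) for k in range(f))
               for i in range(n)] for l in range(c)]
        # Objective for the current memberships and prototypes, Eq. (2.8).
        j_val = sum(u[l][i] ** m * d2[l][i]
                    for l in range(c) for i in range(n))
        # Membership update, Eq. (2.7); the floor avoids division by zero.
        for i in range(n):
            col = [max(d2[l][i], 1e-12) for l in range(c)]
            for l in range(c):
                u[l][i] = 1.0 / sum((col[l] / col[o]) ** (1.0 / (m - 1.0))
                                    for o in range(c))
        if abs(prev_j - j_val) < eps:
            break
        prev_j = j_val
    return u, vs, vt, j_val
```

The optimal 𝜆 would then be sought by calling such a routine for a grid of `lam` values and keeping the one with the best performance under the chosen criterion.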
3. Forecasting Problem – Meteorological System
3.1. Modelling
The Alberta Agriculture and Rural Development
organization provides current and historical weather
data from approximately 340 meteorological stations
located across the Canadian province of Alberta, mapped in
Figure 3-1. The meteorological variables available
include temperature, humidity, precipitation and solar
radiation, and are of great interest to users such as
epidemiologists seeking to better understand, for
instance, the relationships between measures of
environmental health and those of animal health. This
platform, available at (ARD), is also valuable for
environmental or agricultural analysis.
Figure 3-1 Map of the province of Alberta, Canada, the area
where the meteorological stations are located
The Alberta province covers areas with different
geographical and meteorological profiles that
characterize these locations, including mountains,
valleys, lakes and arid areas.
For these experiments, the daily average temperature
and daily average humidity records were
considered, taken from 1/1/2009 to 12/31/2009, forming
the time variant input features. The time invariant
features consisted of the latitude and longitude
coordinates of the station at which they were
measured.
All stations in which all features were available and
had no missing values were considered, resulting in 168
samples.
The time series were represented by the Discrete
Fourier Transform (DFT). The remaining settings were:
Fuzziness parameter: 𝑚 = 2
Number of samples: 𝑛 = 249
Number of time invariant features: 𝑟 = 2
Number of time variant features: 𝑓 = 2
Time variant feature’s length: 𝑞 = 365
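A frequency-domain representation of a series can be sketched as below, using a naive DFT so that the example needs only the standard library. The text does not state how many coefficients were retained or how they were encoded, so `n_coeffs` and the real/imaginary flattening are hypothetical choices for illustration.

```python
import cmath


def dft(series):
    """Naive O(q^2) Discrete Fourier Transform of a real-valued series."""
    q = len(series)
    return [sum(x * cmath.exp(-2j * cmath.pi * h * j / q)
                for j, x in enumerate(series))
            for h in range(q)]


def dft_features(series, n_coeffs):
    """Real-valued vector built from the first n_coeffs DFT coefficients.

    n_coeffs is a hypothetical parameter: keeping only the low-frequency
    coefficients shortens the representation of the series.
    """
    out = []
    for z in dft(series)[:n_coeffs]:
        out.extend([z.real, z.imag])
    return out
```

For a daily series of length 𝑞 = 365, a fast transform from a numerical library would be the practical choice; the quadratic version above is only meant to show what the representation contains.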
3.2. Experimental Setup
The Mixed Clustering methodology
was applied to the Meteorological System
under two distinct criteria. The first, the Reconstruction
Criterion (RC), evaluates the cluster validity, while the
Prediction Criterion (PC) evaluates the method’s
forecasting ability.
Reconstruction Criterion
The RC assesses the quality of the clusters
constructed by attempting to recreate the original data.
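Defining $\hat{x}$ as the reconstructed data, its invariant and
variant components are respectively defined as
$$\hat{x}_i^s = \frac{\sum_{l=1}^{c} u_{l,i}^m v_l^s}{\sum_{l=1}^{c} u_{l,i}^m} \tag{3.1}$$
$$\hat{x}_{i,k}^t = \frac{\sum_{l=1}^{c} u_{l,i}^m v_{l,k}^t}{\sum_{l=1}^{c} u_{l,i}^m}, \quad k \in [1, f] \tag{3.2}$$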
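The Average Reconstruction Error (ARE) is
calculated as:
$$ARE(\lambda) = \frac{1}{n} \left( \frac{1}{r} \sum_{i=1}^{n} \sum_{j=1}^{r} \frac{(x_{i,j}^s - \hat{x}_{i,j}^s)^2}{\sigma_j^2} + \frac{1}{f \times q} \sum_{i=1}^{n} \sum_{k=1}^{f} \sum_{j=1}^{q} \frac{(x_{i,j,k}^t - \hat{x}_{i,j,k}^t)^2}{\sigma_j^2} \right) \tag{3.3}$$
where $\sigma_j^2$ is the variance of the j-th feature.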
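The reconstruction step and the variance-normalised error can be sketched as follows. The helper names are illustrative, and the sketch assumes that the variance is taken per column of the data matrix passed in (for the time variant part, each sample’s series would first be flattened into one row); the text only states that the variance is that of the j-th feature.

```python
def reconstruct(u_col, protos, m=2.0):
    """Eqs. (3.1)/(3.2): membership-weighted blend of prototype vectors.

    u_col  : memberships of one sample in each of the c clusters
    protos : c prototype vectors of equal length
    """
    w = [ul ** m for ul in u_col]
    tot = sum(w)
    return [sum(wl * v[j] for wl, v in zip(w, protos)) / tot
            for j in range(len(protos[0]))]


def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def normalised_error(x, x_hat):
    """Mean over samples and columns of (x - x_hat)^2 / column variance.

    Assumes every column of x has non-zero variance.
    """
    n, d = len(x), len(x[0])
    total = 0.0
    for j in range(d):
        var_j = variance([row[j] for row in x])
        total += sum((x[i][j] - x_hat[i][j]) ** 2 for i in range(n)) / var_j
    return total / (n * d)
```

Under these assumptions, the ARE of Eq. (3.3) is the sum of `normalised_error` evaluated on the invariant part and on the flattened time variant part.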
Prediction Criterion
The aim of the PC is to predict the temporal
component of the data by using the available spatial
component of the data, minimizing the resulting error
by adjusting the temporal influence parameter 𝜆.
A partition matrix is estimated from the invariant data
and prototypes:
$$\tilde{u}_{l,i} = \frac{1}{\sum_{o=1}^{c} \left( \frac{\|v_l^s - x_i^s\|}{\|v_o^s - x_i^s\|} \right)^{2/(m-1)}} \tag{3.4}$$
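The Average Prediction Error (APE) is then calculated
as:
$$APE(\lambda) = \frac{1}{n \times f \times q} \sum_{i=1}^{n} \sum_{k=1}^{f} \sum_{j=1}^{q} \frac{(x_{i,j,k}^t - \tilde{x}_{i,j,k}^t)^2}{\sigma_j^2} \tag{3.5}$$
where $\tilde{x}^t$ is the time variant component predicted with the estimated partition matrix $\tilde{u}$.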
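The prediction step can be sketched in two small functions: memberships estimated from the invariant features alone, as in Eq. (3.4), and a series forecast obtained by blending the time variant prototypes with those memberships, following the form of Eq. (3.2). Function names and the treatment of a sample that lies exactly on a prototype are assumptions.

```python
def spatial_memberships(xs_i, vs, m=2.0):
    """Eq. (3.4): memberships of one sample from its invariant features only."""
    d2 = [sum((a - b) ** 2 for a, b in zip(v, xs_i)) for v in vs]
    hits = [l for l, d in enumerate(d2) if d == 0.0]
    if hits:
        # Assumed convention when the sample coincides with a prototype.
        return [1.0 / len(hits) if l in hits else 0.0 for l in range(len(vs))]
    p = 1.0 / (m - 1.0)  # squared distances, so 2/(m-1) becomes 1/(m-1)
    return [1.0 / sum((dl / do) ** p for do in d2) for dl in d2]


def predict_series(xs_i, vs, vt_k, m=2.0):
    """Predicted series of one feature for a sample with invariant part xs_i.

    vs   : invariant prototypes (c vectors)
    vt_k : prototype series of the feature being forecast (c series)
    """
    w = [ul ** m for ul in spatial_memberships(xs_i, vs, m)]
    tot = sum(w)
    return [sum(wl * v[j] for wl, v in zip(w, vt_k)) / tot
            for j in range(len(vt_k[0]))]
```

A station located midway between two prototypes receives equal memberships, so its forecast is the average of the two prototype series.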
The stopping criteria for the clustering algorithm in this
experiment were the following:
Minimal variation of the objective function:
$|\Delta J| < \varepsilon = 10^{-5}$
Maximum number of iterations: 𝑚𝑎𝑥𝑖𝑡 = 100
3.3. Results and Discussion
Reconstruction Criterion
The RC was applied to each time variant feature,
humidity or temperature, individually and to the
combination of both in a multivariate approach, each
using a number of clusters between 2 and 5, with the
Euclidean Distance and the DTW for similarity search.
It was observed that the multivariate alternative did
not improve the quality of the data clusters
created, according to this criterion, and that the best
results were obtained for the temperature feature, with
5 clusters and using the Euclidean Distance. Figure 3-2
shows a plot of the analysed stations according to their
geographical location, coloured according to the cluster
in which they have the highest membership degree, under the
best RC conditions. Four stations in different regions
are highlighted.
Figure 3-2 Geographical Distribution under best RC
conditions, c=5
The figure shows that the method was capable of recognizing
and distinguishing the areas with the most distinct
climatic profiles.
Prediction Criterion
The PC was applied under the same experimental
conditions as the RC: multivariate and univariate time
series, with the Euclidean distance and DTW as
similarity measures, but for a number of clusters between 2
and 10.
The best result was obtained using the
multivariate approach, with the Euclidean distance and
8 clusters.
These conditions were used to forecast the
temperature and humidity. The samples were
separated into training and testing sets:
$$x_{train} = [x_{train}^s \mid x_{train}^t] \tag{3.6}$$
and
$$x_{test} = [x_{test}^s \mid x_{test}^t] \tag{3.7}$$
The procedure followed is described in Figure 3-3.
Figure 3-3 Workflow representing process for temporal
forecasting of test set
In this experiment, around 70% of the samples were
used as the training set, 𝑛𝑇𝑟𝑎𝑖𝑛 = 117, while the rest were
used as the test set. The forecasting results of humidity and
temperature for one exemplary test sample, under the
best conditions, are shown in Figure 3-4 and Figure 3-5,
respectively.
Figure 3-4 Humidity Predicting under best PC conditions
Figure 3-5 Temperature Predicting under best PC conditions
In the forecasting problem, the DTW did not
improve on the Euclidean distance as a similarity
measure. The multivariate approach achieved the best
forecasts of temperature and humidity during 2009 at
the selected stations.
4. Classification Problem – Medical System
An analogy was drawn from the spatiotemporal
concept, where the geographical location becomes, in
medical applications, a patient’s demography: age,
weight, height and sex, among other possibilities. In this
equivalence, the temporal component comprises all
time-varying features that characterize the system, such
as heart rate, blood pressure and body temperature,
measured over a period of time and
represented as time series.
4.1. Modelling
Septic shock is a medical emergency that can occur as
a reaction of the immune system to, for example, an
operation. It is estimated to affect about 12% of patients
in an Intensive Care Unit (ICU) and has a high mortality
rate, which is reported to depend on the patient’s age
and overall health.
The database used, MEDAN, comprises several
physiological features of patients diagnosed with
abdominal septic shock, uniformly sampled during the
whole period while the patient was at the ICU, (Paetz,
2003). This database was pre-processed by (Marques,
Moutinho, Vieira, & Sousa, 2011), who analysed the
most determinant features for outcome prediction,
creating a sub dataset of patients with measurements of
12 of the available features.
These data were further processed, resulting in
a data set with 100 samples, each comprising:
2 time invariant features: the patient’s age and
weight, each represented by a numeric value;
12 time variant features representing
physiological variables by time series with a
sampling time of 24 hours, over the last 10 days of
the patient’s stay in an Intensive Care Unit;
1 outcome represented by a binary value, where 0
represents the patient’s survival and 1 the patient’s
death.
4.2. Experimental Setup
The concept of classification based on clustering
assumes that similar objects will share outcomes, and
uses this knowledge to predict an object’s classification.
The classification approach proposed in this work is
based on this concept and defines an object as
belonging to a cluster if its membership degree is higher
than a certain threshold. It then assumes that objects
grouped together must share the same outcome. Thus,
this concept is only valid for binary classifiers using
two clusters, c=2.
To evaluate the method’s ability to predict an object’s
outcome, a 5 fold Cross Validation was performed.
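A 5-fold split of sample indices can be sketched as below. The function name is illustrative and the folds are contiguous blocks; whether the samples were shuffled or stratified by outcome before splitting is not stated in the text, so neither is done here.

```python
def k_fold_indices(n, k=5):
    """Split indices 0..n-1 into k contiguous folds of near-equal size.

    Returns a list of (train_indices, test_indices) pairs, one per fold.
    """
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        folds.append((train, test))
        start += size
    return folds
```

With the 100 samples of the data set, each fold holds out 20 samples for testing and keeps 80 for training.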
At each fold, the training set is clustered to determine the
optimal 𝜆∗ and the resulting cluster prototypes 𝑣∗. The
membership degrees of each test set sample are then
determined from its distance to each cluster
prototype, and the predicted outcome is assigned
according to the highest membership degree.
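The decision rule can be sketched as follows. How each cluster is associated with an outcome is not spelled out in the text, so the membership-weighted majority vote over the training outcomes used here is an assumption, as are the function names.

```python
def cluster_outcomes(u_train, y_train):
    """Assumed rule: label each cluster with its membership-weighted majority outcome.

    u_train : c x n_train membership matrix (one row per cluster)
    y_train : binary outcomes of the training samples (0 or 1)
    """
    labels = []
    for row in u_train:
        weight_pos = sum(u for u, y in zip(row, y_train) if y == 1)
        weight_neg = sum(u for u, y in zip(row, y_train) if y == 0)
        labels.append(1 if weight_pos > weight_neg else 0)
    return labels


def classify(u_test_col, labels):
    """Predicted outcome: label of the cluster with the highest membership."""
    best = max(range(len(u_test_col)), key=lambda l: u_test_col[l])
    return labels[best]
```

The memberships of a test sample would be obtained from its augmented distance to the prototypes 𝑣∗, using the same membership formula as in training.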
The experiments described in this section share the
following experimental conditions:
Clustering Conditions:
o Minimal variation of the objective function:
$|\Delta J| < \varepsilon = 10^{-8}$
o Maximum number of iterations: maxit = 500
o Fuzziness parameter: m = 2
Classification Conditions:
o 5-fold Cross Validation
o Class Distribution: 44%/56%
The Mixed Clustering methodology was applied
under two learning approaches: unsupervised and
supervised. The first partitions the data without
knowledge of its outcome, while the second uses
labelled samples for training, following the steps:
i Unsupervised Clustering of Train set to
determine 𝜆∗;
ii Supervised Clustering of Train set using 𝜆∗ to
obtain prototypes 𝑣∗;
iii Unsupervised Classification of Test set using 𝑣∗.
The criteria implemented to evaluate the quality of
the outcome prediction are frequently used in health
care problems (Lavrač, 1999):
Accuracy: the proportion of correct
classifications out of all samples classified;
Sensitivity: the proportion of correct
positive classifications out of all positive samples;
Specificity: the proportion of correct
negative classifications out of all negative samples.
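These three criteria can be computed from the confusion counts, as in the short sketch below. The function name and the dictionary output are illustrative choices; label 1 is taken as the positive class, which in this application corresponds to the patient’s death.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for 0/1 labels (1 = positive).

    Assumes y_true contains at least one positive and one negative sample.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }
```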
4.3. Results and Discussion
The experiments performed with the Mixed
Clustering include data representations in the
time domain (raw data) and in the frequency domain (DFT), and the
Euclidean Distance and the DTW as similarity
measures. In addition to the mixed clustering, an
alternative clustering, designated
Temporal Clustering, was tested using only the time
variant features, to assess the actual benefit of
combining both information components.
A Forward Feature Selection method was used to
assess the quality of each time variant feature, under all
combinations of conditions described.
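A greedy forward selection loop can be sketched as below. The text does not give the selection criterion or stopping rule, so the `score` callable (for example a cross-validated accuracy) and the rule of stopping when no candidate improves the score are assumptions made for illustration.

```python
def forward_selection(features, score):
    """Greedy sequential forward selection.

    features : hashable candidate feature identifiers
    score    : hypothetical callable mapping a list of features to a number,
               higher being better (e.g. cross-validated accuracy)
    Returns the selected features, in order of inclusion, and their score.
    """
    selected, best = [], float("-inf")
    remaining = list(features)
    while remaining:
        # Score every candidate when added to the current selection.
        scores = {f: score(selected + [f]) for f in remaining}
        cand = max(scores, key=scores.get)
        if scores[cand] <= best:
            break  # no candidate improves the current score
        selected.append(cand)
        remaining.remove(cand)
        best = scores[cand]
    return selected, best
```

The first pass of the loop ranks the features individually, which is the view used when comparing single time variant features.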
It was observed that the superiority of a similarity
measure or time series representation method depended
on the feature.
The benefit of the mixed clustering over the temporal
clustering was also not observed for every feature. It was
verified that when the time variant features, by
themselves, were rich enough, the addition of the
patient’s demography misled the algorithm, leading to
weaker results. However, when the temporal feature
was weaker, it benefited from the mixed clustering
approach.
The best overall Unsupervised Mixed Clustering
result was obtained using the Euclidean Distance with
the DFT and a single time variant feature, no. 6,
representing the Central Venous Pressure.
Figure 4-1 shows the differences between the
temporal and mixed alternatives, under unsupervised
learning, for the best feature and for an example of a
weaker temporal feature that benefited from the mixed
clustering approach, feature 8: pH.
Figure 4-1 Unsupervised Mixed and Temporal Clustering
Accuracy for features 6 and 8
The figure shows that while the addition of the patient’s
demography did not increase the performance of feature
6, the weaker feature 8 benefited from the additional
information it provided.
In Figure 4-2, the equivalent results are shown, for
the Supervised learning alternative.
Figure 4-2 Supervised Mixed and Temporal Clustering
Accuracy for features 6 and 8
The best result under Supervised clustering was also
achieved for feature 6, using the DTW and DFT. It is
also shown that, for these features, the supervised
clustering alternative improved on the results of
the unsupervised alternative. This effect was not
verified for all features; overall, however, supervised
learning increased the performance of the features that
were also the best under the unsupervised alternative,
suggesting that the features most related to the outcome
benefit from its inclusion in the learning process.
5. Conclusions and Future Work
A new expanded clustering algorithm was formulated
to mine databases represented by both time variant and
invariant features, combining the information extracted
to further characterize a given system. The results of the
data mining and pattern recognition process were
applied to machine learning purposes, where distinct
methodologies were proposed to solve Forecasting and
Classification problems, the first with a Meteorological
System and the second with a Medical application,
demonstrating its wide applicability.
Different measures were implemented for similarity
search between time series, the commonly used
Euclidean Distance and the increasingly popular
Dynamic Time Warping. The benefit of the joint
clustering of different types of data was also
demonstrated, by comparing it to the clustering of
individual data types.
Table 5-1 compares the best Mixed Clustering results with the best results obtained in
previous work on the same database.
It should be noted that these results are not directly
comparable, since the studies performed different
processing on the input data and the methods used are
different. The authors of (Cismondi, et al., 2012) used
multi-criteria Feature Selection with Fuzzy Models
(FM) and Neural Networks (NN) to predict the patient’s
outcome.
While the FM produced the best ACC,
the Mixed Clustering produced comparable results
using four times fewer features, 2 of which were numerical
values, significantly easier to measure and process.
Table 5-1 Best Mixed Clustering and best previous work results
Reference                 Method                          No. features  ACC (%)  Sens. (%)  Spec. (%)
(Cismondi, et al., 2012)  NN, Max. Sens.                  12            72.78    84.27      65.53
(Cismondi, et al., 2012)  NN, Max. Spec.                  12            75.74    60.00      85.48
(Cismondi, et al., 2012)  NN, Parallel                    12            79.25    85.21      69.64
(Cismondi, et al., 2012)  FM, Max. Sens.                  12            74.45    84.18      68.81
(Cismondi, et al., 2012)  FM, Max. Spec.                  12            79.17    63.53      88.83
(Cismondi, et al., 2012)  FM, Parallel                    12            81.72    85.15      71.21
This work                 Mixed Clustering, Unsupervised  3*            76.00    88.83      66.09
This work                 Mixed Clustering, Supervised    3*            77.00    82.33      73.24
* The Mixed Clustering used two constant features, the patient’s age
and weight, combined with one time variant feature.
In addition, the Mixed Clustering method has the
highest sensitivity, or true positive rate, crucial since the
positive class represents a deceased patient.
As future work, it would be interesting to expand the
clustering possibilities to any number of partitions and
to databases with any number of classes.
Since the DTW method is able to compare time series
of different lengths, extending the method to form
prototypes of variable length would expand the
applicability of the mixed clustering method to
databases with time series of different lengths.
Also, a reformulation of the method should include
the possibility of using a different similarity measure for
each feature, as well as a different influence for each,
through the implementation of separate temporal
influence parameters 𝜆𝑖, where 𝑖 = 1,2, … , 𝑓.
Even though one of the great advantages of the data
mining and soft computing techniques analysis is their
ability to read any problem specific to a given field as a
generalized system, the final step in the KDD approach
would be the interpretation of the results, bringing the
problem back to its field and enabling practical
conclusions. Thus, the medical system application
demonstrated would benefit from further analysis over
the best features that resulted from the feature selection
algorithms, possibly bringing awareness of the
importance of a feature to the medical community. In
this context, a feature sensitivity study could also be
performed on the time variant and invariant features,
pre-assessing the quality of the knowledge they contain.
The causes of septic shock are not yet fully understood;
however, some risk factors have been studied (Fink,
Abraham, Vincent, & Kochanek, 2005) and could be
inserted in the Mixed Clustering method as time invariant
features.
Finally, the validation of the mixed clustering
methodology requires its application to problems from
different domains and fields, such as Financial, Power
Consumption or Surveillance Applications. The use of
benchmark databases can demonstrate its value against
different techniques. However, due to the specific
characteristics of the mixed clustering’s inputs, there is
a shortage of available databases (Keogh & Kasetty,
2003).
References
ARD. (n.d.). Current and Historical Alberta Weather
Station Data Viewer. Retrieved May 2014,
from http://agriculture.alberta.ca/acis/alberta-
weather-data-viewer.jsp
Bezdek, J. C., Ehrlich, R., & Full, W. (1984). FCM:
The fuzzy c-means clustering algorithm.
Computers & Geosciences, 10, 191-203.
Cismondi, F., Horn, A. L., Fialho, A. S., Vieira, S. M.,
Reti, S. R., Sousa, J. M., et al. (2012). Multi-
stage Modeling Using Fuzzy Multi-criteria
Feature Selection to Improve Survival
Prediction of ICU Septic Shock Patients.
Expert Systems with Applications, 39, 12332-12339.
Devijver, P. A., & Kittler, J. (1982). Pattern
Recognition: A Statistical Approach. Prentice-
Hall.
Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996).
From data mining to knowledge discovery in
databases. AI Magazine, 17, 37-54.
Fink, M., Abraham, E., Vincent, J., & Kochanek, P. M.
(2005). Septic Shock. In Textbook of Critical
Care (5th ed.). Saunders Elsevier.
Gaudin, R., & Nicoloyannis, N. (2006). An Adaptable
Time Warping Distance for Time Series
Learning. 5th International Conference on
Machine Learning and Applications (ICMLA
06). Orlando, USA.
Han, J., & Kamber, M. (2006). Data Mining: Concepts
and Techniques (2nd ed.). Morgan Kaufmann
Publishers.
Izakian, H., Witold, P., & Jamal, I. (2013, October).
Clustering Spatiotemporal Data: An
Augmented Fuzzy C-Means. IEEE
Transactions on Fuzzy Systems, 21.
Keogh, E., & Kasetty, S. (2003, October). On the Need
for Time Series Data Mining Benchmarks: A
Survey and Empirical Demonstration. Data
Mining and Knowledge Discovery, 7, pp. 349-
371.
Lavrač, N. (1999). Artificial Intelligence in Medicine:
Machine Learning for Data Mining in
Medicine (Vol. 1620).
Liao, T. W. (2005, November). Clustering of time series
data - a survey. Pattern Recognition, 1857-
1874.
Marques, F. J., Moutinho, A., Vieira, S. M., & Sousa, J.
M. (2011). Preprocessing of Clinical Databases
to improve classification accuracy of patient
diagnosis. World Congress, (pp. 14121-
14126).
Paetz, J. (2003). Knowledge-based approach to septic
shock patient data using a neural network with
trapezoidal activation functions. Artificial
Intelligence in Medicine, 28, 207-230.
Rani, S., & Sikka, G. (2012). Recent Techniques of
Clustering of Time Series Data: A Survey.
International Journal of Computer
Applications, 52(15).