Philippe Biela – Journée ClasSpec - EMD – 6/07/2007
ABSTRACT
The paper presents a general framework for time series clustering based on spectral decomposition of the affinity matrix.
A Gaussian function is used to construct the affinity matrix, and a gradient-based method is developed for self-tuning the variance of the Gaussian function.
The approach can be used to cluster both constant-length and variable-length time series.
The algorithm is able to discover the optimal number of clusters automatically.
Experimental results are presented to show the effectiveness of the method.
Theoretical Background
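We consider a set of M time series of the same length d: $x_1, x_2, \dots, x_M \in \mathbb{R}^d$.
The data matrix of the time series is: $X = [x_1, x_2, \dots, x_M] \in \mathbb{R}^{d \times M}$.
Assuming there are K clusters, there exists a permutation matrix E such that: $XE = [A_1, A_2, \dots, A_K]$,
where $A_i \in \mathbb{R}^{d \times s_i}$ represents the i-th cluster and $s_i$ is the number of data objects in the i-th cluster.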
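A minimal NumPy sketch of this notation, assuming the cluster labels are already known (for illustration only; `group_by_cluster` is a hypothetical helper, not from the paper). It builds the permutation matrix E and the blocks $A_k$ so that $XE = [A_1, \dots, A_K]$:

```python
import numpy as np

def group_by_cluster(X, labels):
    """Return E and [A_1, ..., A_K] such that X @ E == np.hstack([A_1, ..., A_K]).

    X      : (d, M) data matrix, one time series per column.
    labels : length-M NumPy array of cluster indices in {0, ..., K-1} (assumed known).
    """
    M = X.shape[1]
    order = np.argsort(labels, kind="stable")        # column indices grouped by cluster
    E = np.eye(M)[:, order]                          # column j of E is the unit vector e_{order[j]}
    K = int(labels.max()) + 1
    blocks = [X[:, labels == k] for k in range(K)]   # A_k has shape (d, s_k)
    return E, blocks
```

The stable sort keeps the original column order inside each cluster, which is what makes `X @ E` coincide exactly with the concatenated blocks.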
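We consider the within-cluster scatter matrix of cluster k: $S_w^{(k)} = \sum_{x \in A_k} (x - m_k)(x - m_k)^T$,
where $m_k$ is the mean vector of the k-th cluster.
The total within-cluster scatter matrix is $S_w = \sum_{k=1}^{K} S_w^{(k)}$.
The goal of clustering is to achieve high within-cluster similarity and low between-cluster similarity; that is, we should minimize $\operatorname{trace}(S_w)$ and maximize $\operatorname{trace}(S_b)$, where $S_b = \sum_{k=1}^{K} s_k (m_k - m)(m_k - m)^T$ is the between-cluster scatter matrix and m is the mean vector of all the data.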
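As an illustration, the scatter matrices can be computed directly from their definitions. This NumPy sketch assumes the labels are given; `scatter_matrices` is a hypothetical name. It also returns the total scatter matrix $S_t = \sum_i (x_i - m)(x_i - m)^T$ for comparison:

```python
import numpy as np

def scatter_matrices(X, labels):
    """Within-cluster (S_w), between-cluster (S_b) and total (S_t) scatter matrices.

    X      : (d, M) data matrix, one time series per column.
    labels : length-M NumPy array of cluster indices (assumed given).
    """
    d = X.shape[0]
    m = X.mean(axis=1, keepdims=True)             # overall mean vector, shape (d, 1)
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for k in np.unique(labels):
        A_k = X[:, labels == k]                   # data of cluster k, shape (d, s_k)
        m_k = A_k.mean(axis=1, keepdims=True)     # mean vector of cluster k
        C = A_k - m_k
        S_w += C @ C.T                            # adds S_w^(k)
        S_b += A_k.shape[1] * (m_k - m) @ (m_k - m).T
    C = X - m
    S_t = C @ C.T                                 # total scatter, independent of labels
    return S_w, S_b, S_t
```

For any labeling, `np.trace(S_w) + np.trace(S_b)` equals `np.trace(S_t)` up to round-off, while a labeling that matches the true groups gives a smaller `np.trace(S_w)`.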
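Maximization of $\operatorname{trace}(S_b)$ is equivalent to minimization of $\operatorname{trace}(S_w)$, because $\operatorname{trace}(S_w) + \operatorname{trace}(S_b)$ equals the trace of the total scatter matrix, which does not depend on the clustering.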
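The equivalence follows from the standard decomposition of the total scatter matrix, with m the mean of all M time series and $S_b = \sum_k s_k (m_k - m)(m_k - m)^T$. Writing $x - m = (x - m_k) + (m_k - m)$, the cross terms vanish because $\sum_{x \in A_k} (x - m_k) = 0$:

```latex
\begin{aligned}
S_t &= \sum_{i=1}^{M} (x_i - m)(x_i - m)^T
     = \sum_{k=1}^{K} \sum_{x \in A_k} \big[(x - m_k) + (m_k - m)\big]\big[(x - m_k) + (m_k - m)\big]^T \\
    &= \underbrace{\sum_{k=1}^{K} \sum_{x \in A_k} (x - m_k)(x - m_k)^T}_{S_w}
     + \underbrace{\sum_{k=1}^{K} s_k (m_k - m)(m_k - m)^T}_{S_b}, \\
\operatorname{trace}(S_w) &+ \operatorname{trace}(S_b) = \operatorname{trace}(S_t).
\end{aligned}
```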
Then the optimization criterion becomes: $\min \operatorname{trace}(S_w)$, over all partitions of the data into K clusters.
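Consider the block-diagonal matrix $Q = \operatorname{diag}(e_1/\sqrt{s_1}, e_2/\sqrt{s_2}, \dots, e_K/\sqrt{s_K}) \in \mathbb{R}^{M \times K}$,
where $e_k$ is a column vector containing $s_k$ ones.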
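We can show that
$$\operatorname{trace}(S_w) = \operatorname{trace}(X^T X) - \operatorname{trace}\big(Q^T E^T X^T X E Q\big).$$
Since $\operatorname{trace}(X^T X)$ does not depend on the clustering, maximizing $\operatorname{trace}(Y^T X^T X Y)$ is equivalent to the preceding criterion if we consider $Y = EQ$ (which satisfies $Y^T Y = I_K$) and we relax the constraint that Y must have this cluster-indicator structure, keeping only $Y^T Y = I_K$.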
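To check the identity $\operatorname{trace}(S_w) = \operatorname{trace}(X^T X) - \operatorname{trace}(Q^T X^T X Q)$ numerically, the sketch below builds Q from the cluster sizes and compares both sides. It assumes the columns of X are already ordered by cluster (so E = I); all function names are hypothetical:

```python
import numpy as np

def indicator_matrix(sizes):
    """Block-diagonal Q = diag(e_1/sqrt(s_1), ..., e_K/sqrt(s_K)), shape (M, K)."""
    Q = np.zeros((sum(sizes), len(sizes)))
    start = 0
    for k, s in enumerate(sizes):
        Q[start:start + s, k] = 1.0 / np.sqrt(s)   # column k holds e_k / sqrt(s_k)
        start += s
    return Q

def trace_sw_direct(X, sizes):
    """trace(S_w) from its definition; columns of X are ordered by cluster."""
    total, start = 0.0, 0
    for s in sizes:
        A_k = X[:, start:start + s]
        total += np.sum((A_k - A_k.mean(axis=1, keepdims=True)) ** 2)
        start += s
    return total

def trace_sw_via_q(X, sizes):
    """trace(S_w) = trace(X^T X) - trace(Q^T X^T X Q)."""
    G = X.T @ X                                    # Gram matrix, shape (M, M)
    Q = indicator_matrix(sizes)
    return np.trace(G) - np.trace(Q.T @ G @ Q)
```

Both functions agree for any data and any cluster sizes, and `Q.T @ Q` is the K×K identity, which is the constraint kept after the relaxation.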
The optimal relaxed solution can be obtained by taking the top K eigenvectors of $X^T X$ (Ky Fan theorem).
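The relaxed problem can be sketched in NumPy: over all M×K matrices Y with orthonormal columns, $\operatorname{trace}(Y^T X^T X Y)$ is maximized by the top K eigenvectors of $X^T X$, and the maximum equals the sum of the K largest eigenvalues. The hypothetical helper `relaxed_solution` below returns both:

```python
import numpy as np

def relaxed_solution(X, K):
    """Top-K eigenvectors of the Gram matrix X^T X and the relaxed optimum.

    Returns (Y, best) where Y is (M, K) with orthonormal columns and
    best = max trace(Y^T X^T X Y) = sum of the K largest eigenvalues.
    """
    G = X.T @ X
    eigvals, eigvecs = np.linalg.eigh(G)           # symmetric G: real eigenvalues, ascending
    top = np.argsort(eigvals)[::-1][:K]            # indices of the K largest eigenvalues
    return eigvecs[:, top], eigvals[top].sum()
```

The returned value upper-bounds $\operatorname{trace}(Q^T X^T X Q)$ for every cluster-indicator matrix Q. Since the eigenvectors are continuous rather than indicator vectors, a post-processing step (for example k-means on the rows) is typically still needed to recover discrete clusters.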
Then the optimization problem becomes:
where $A_i$ represents the data in cluster i.
The normalization makes :
If we assume that the data objects are ordered by cluster as $X = [A_1, A_2, \dots, A_K]$,
the similarity matrix S and the normalized similarity matrix S' become block-diagonal.
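The normalization formula is not recoverable from the slide; a common choice, assumed here, is $S' = D^{-1/2} S D^{-1/2}$, with D the diagonal matrix of row sums of S. For an exactly block-diagonal S whose K blocks have positive entries, S' is block-diagonal too and its largest eigenvalue, 1, has multiplicity K. A small NumPy check (`normalized_similarity` is a hypothetical name):

```python
import numpy as np

def normalized_similarity(S):
    """S' = D^{-1/2} S D^{-1/2}, D = diag(row sums of S) (assumed normalization)."""
    inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1))
    return S * inv_sqrt[:, None] * inv_sqrt[None, :]   # scales rows and columns

# Two clusters: objects {0, 1} and {2, 3, 4}, no similarity across clusters.
S = np.zeros((5, 5))
S[:2, :2] = [[1.0, 0.8], [0.8, 1.0]]
S[2:, 2:] = 0.5 + 0.5 * np.eye(3)
S_norm = normalized_similarity(S)
print(np.round(np.sort(np.linalg.eigvalsh(S_norm))[::-1], 3))
```

In this ideal case, the top K eigenvectors of S' span the cluster indicator vectors (scaled by $D^{1/2}$), which is what makes the eigenvector-based relaxation recover the clusters.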
To find a "good" similarity matrix, that is, one which is almost block-diagonal,
we use the Gaussian function
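A minimal sketch of a Gaussian affinity between equal-length time series, assuming the common form $S_{ij} = \exp(-\|x_i - x_j\|^2 / (2\sigma^2))$ (the exact expression on the slide is not recoverable). Here σ is fixed by hand; the paper's gradient-based self-tuning of σ is not reproduced, and `gaussian_affinity` is a hypothetical name:

```python
import numpy as np

def gaussian_affinity(X, sigma):
    """S_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)) for the columns x_i of X (assumed form)."""
    sq_norms = np.sum(X ** 2, axis=0)
    D2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X.T @ X   # squared Euclidean distances
    D2 = np.maximum(D2, 0.0)                     # clip tiny negatives caused by round-off
    return np.exp(-D2 / (2.0 * sigma ** 2))
```

Nearby series get affinities close to 1 and distant ones close to 0, so a well-chosen σ yields an almost block-diagonal matrix. Too small a σ isolates every series, while too large a σ merges clusters, which is why tuning σ matters.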
Then we consider :
Algorithm: Clustering Time Series via Spectral Decomposition
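The algorithm itself is only available as an image. As a rough stand-in, the sketch below chains the steps discussed above in the style of the Ng–Jordan–Weiss spectral clustering algorithm: Gaussian affinity, normalization $D^{-1/2} S D^{-1/2}$, top-K eigenvectors, row normalization, then k-means. This is a swapped-in standard recipe, not necessarily the paper's exact procedure. In particular, K and σ are supplied by hand, whereas the paper tunes σ with a gradient-based method and discovers K automatically. `spectral_cluster` is a hypothetical name:

```python
import numpy as np

def spectral_cluster(X, K, sigma, n_iter=100):
    """Sketch: Gaussian affinity -> normalized similarity -> top-K eigenvectors -> k-means.

    X : (d, M) matrix, one equal-length time series per column.
    K, sigma : number of clusters and Gaussian width, assumed given.
    """
    sq = np.sum(X ** 2, axis=0)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X.T @ X, 0.0)
    S = np.exp(-D2 / (2.0 * sigma ** 2))                  # Gaussian affinity
    inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1))
    S_norm = S * inv_sqrt[:, None] * inv_sqrt[None, :]    # D^{-1/2} S D^{-1/2}
    eigvals, eigvecs = np.linalg.eigh(S_norm)
    Y = eigvecs[:, np.argsort(eigvals)[::-1][:K]]         # top-K eigenvectors, (M, K)
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)      # unit-length rows

    # k-means on the rows of Y with deterministic farthest-point initialization.
    centers = [Y[0]]
    for _ in range(1, K):
        dist = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(d2, axis=1)                    # assign rows to nearest center
        new = np.array([Y[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
                        for k in range(K)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels
```

On well-separated groups of series, rows of Y from the same cluster collapse onto nearly the same unit vector and rows from different clusters are nearly orthogonal, so the final k-means step is easy.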
In this experiment, the authors use a real EEG dataset extracted from the 2nd Wadsworth BCI dataset of the BCI Competition 2003.
The data objects come from 3 classes: EEG signals evoked by flashes containing targets, EEG signals evoked by flashes adjacent to targets, and other EEG signals.
All the data objects have the same length, 144.
50 EEG signals are randomly chosen from each class. Since all the time series have the same length, the Euclidean distance is used to measure the pairwise distances between time series.
The results are compared with those of hierarchical agglomerative clustering (HAC). There are 3 kinds of HAC approaches, depending on the similarity measure between clusters:
Complete-linkage HAC (CHAC)
Single-linkage HAC (SHAC)
Average-linkage HAC (AHAC)
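For a small-scale comparison, the three HAC variants can be sketched with a naive $O(M^3)$ implementation on Euclidean distances (the hypothetical `hac` below). In practice, `scipy.cluster.hierarchy.linkage` with `method="complete"`, `"single"` or `"average"`, followed by `fcluster`, provides the same linkages. The slides do not state which implementation the authors used:

```python
import numpy as np

def hac(X, K, linkage="average"):
    """Naive agglomerative clustering of the columns of X into K clusters.

    linkage: "complete" (CHAC), "single" (SHAC) or "average" (AHAC).
    Uses Euclidean distances between equal-length time series; small M only.
    """
    sq = np.sum(X ** 2, axis=0)
    D = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * X.T @ X, 0.0))
    link_fn = {"complete": np.max, "single": np.min, "average": np.mean}[linkage]
    clusters = [[i] for i in range(X.shape[1])]        # start from singletons
    while len(clusters) > K:
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dist = link_fn(D[np.ix_(clusters[a], clusters[b])])
                if dist < best:
                    best, pair = dist, (a, b)
        a, b = pair
        merged = clusters[a] + clusters[b]
        del clusters[b]                                # b > a, so index a is unaffected
        clusters[a] = merged
    labels = np.empty(X.shape[1], dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels
```

Only the cluster-to-cluster distance changes between the variants: the maximum pairwise distance for CHAC, the minimum for SHAC, and the mean for AHAC.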