Philippe Biela – Journée ClasSpec - EMD – 6/07/2007


ABSTRACT

The paper presents a general framework for time series clustering based on spectral decomposition of the affinity matrix.

A Gaussian function is used to construct the affinity matrix, and a gradient-based method is developed for self-tuning the variance of the Gaussian function.

The approach can be used to cluster both constant-length and variable-length time series.

The algorithm is able to discover the optimal number of clusters automatically.

Experimental results are presented to show the effectiveness of the method.


Theoretical Background

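We consider a set of M time series of the same length d: $x_1, x_2, \dots, x_M \in \mathbb{R}^d$.

The data matrix of the time series is $X = [x_1, x_2, \dots, x_M] \in \mathbb{R}^{d \times M}$.

Assuming there are K clusters, we can suppose that a permutation matrix E exists such that $XE = [A_1, A_2, \dots, A_K]$,

where $A_i \in \mathbb{R}^{d \times s_i}$ represents the i-th cluster and $s_i$ the number of data objects in the i-th cluster.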
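As a minimal sketch of this notation (assuming NumPy, with hypothetical cluster labels and a hypothetical helper `order_by_cluster`), the code below stacks M equal-length series as the columns of X and builds a permutation matrix E so that XE lists the clusters $A_1, \dots, A_K$ as contiguous column blocks; $s_i$ is then the column count of each block.

```python
import numpy as np

def order_by_cluster(series, labels):
    """Return X (d x M), the permutation matrix E, and the blocks A_1..A_K.

    `series` is a list of M one-dimensional arrays of equal length d and
    `labels` gives the (hypothetical) cluster index of each series.
    """
    X = np.column_stack(series)                  # data matrix, one series per column
    labels = np.asarray(labels)
    M = X.shape[1]
    order = np.argsort(labels, kind="stable")    # column indices sorted by cluster
    E = np.zeros((M, M))
    E[order, np.arange(M)] = 1.0                 # column j of E selects series order[j]
    XE = X @ E                                   # clusters are now contiguous blocks
    blocks = [XE[:, labels[order] == k] for k in np.unique(labels)]
    return X, E, blocks

rng = np.random.default_rng(0)
series = [rng.normal(size=5) for _ in range(6)]
labels = [1, 0, 1, 2, 0, 1]
X, E, blocks = order_by_cluster(series, labels)
print([A.shape for A in blocks])  # [(5, 2), (5, 3), (5, 1)]
```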


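We consider the within-cluster scatter matrix of cluster k: $S_k = \sum_{x \in A_k} (x - m_k)(x - m_k)^{\top}$,

where $m_k$ is the mean vector of the k-th cluster.

The total within-cluster scatter matrix is $S_w = \sum_{k=1}^{K} S_k$.

The goal of clustering is to achieve high within-cluster similarity and low between-cluster similarity; that is, we should minimize $\operatorname{trace}(S_w)$ and maximize $\operatorname{trace}(S_b)$, where $S_b$ is the between-cluster scatter matrix.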
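The slide names $S_b$ without defining it. Assuming the usual definitions from discriminant analysis, with $m$ the mean of all M series (a symbol not used on the slide), the between-cluster and total scatter matrices are:

```latex
m = \frac{1}{M} \sum_{i=1}^{M} x_i,
\qquad
S_b = \sum_{k=1}^{K} s_k \,(m_k - m)(m_k - m)^{\top},
\qquad
S_t = \sum_{i=1}^{M} (x_i - m)(x_i - m)^{\top} = S_w + S_b .
```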


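Since the total scatter matrix $S_t = S_w + S_b$ does not depend on the partition, maximizing $\operatorname{trace}(S_b)$ is equivalent to minimizing $\operatorname{trace}(S_w)$.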
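The equivalence rests on the identity $\operatorname{trace}(S_t) = \operatorname{trace}(S_w) + \operatorname{trace}(S_b)$. A minimal NumPy check on random data, assuming the standard definitions $S_k = \sum_{x \in A_k}(x - m_k)(x - m_k)^{\top}$ and $S_b = \sum_k s_k (m_k - m)(m_k - m)^{\top}$ with $m$ the overall mean (the helper `scatter_traces` is hypothetical), illustrates it on two arbitrary partitions:

```python
import numpy as np

def scatter_traces(X, labels):
    """Return trace(S_w), trace(S_b), trace(S_t) for the columns of X."""
    labels = np.asarray(labels)
    m = X.mean(axis=1, keepdims=True)                 # overall mean m
    d = X.shape[0]
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for k in np.unique(labels):
        Ak = X[:, labels == k]                        # series in cluster k
        mk = Ak.mean(axis=1, keepdims=True)           # cluster mean m_k
        Sw += (Ak - mk) @ (Ak - mk).T                 # adds S_k
        Sb += Ak.shape[1] * ((mk - m) @ (mk - m).T)   # adds s_k (m_k - m)(m_k - m)^T
    St = (X - m) @ (X - m).T                          # total scatter S_t
    return np.trace(Sw), np.trace(Sb), np.trace(St)

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 12))                          # 12 series of length 4
for labels in ([0] * 6 + [1] * 6, rng.integers(0, 3, size=12)):
    tw, tb, tt = scatter_traces(X, labels)
    print(abs(tw + tb - tt) < 1e-9)                   # True for both partitions
```

Because $\operatorname{trace}(S_t)$ is fixed by the data alone, any decrease of $\operatorname{trace}(S_w)$ is matched by an equal increase of $\operatorname{trace}(S_b)$.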

Then the optimization criterion becomes:

If we define the block-diagonal matrix Q as:

where $e_k$ is a column vector containing $s_k$ ones,

we can show that

is equivalent to the previous criterion if we take:

and relax the constraint on Q.
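A numerical sketch of this equivalence, assuming the standard construction $Q = \operatorname{diag}(e_1/\sqrt{s_1}, \dots, e_K/\sqrt{s_K})$ for data already ordered by cluster (the slide's own formulas are not recoverable): for such a Q, $\operatorname{trace}(S_w) = \operatorname{trace}(X^{\top}X) - \operatorname{trace}(Q^{\top}X^{\top}XQ)$, so minimizing $\operatorname{trace}(S_w)$ means maximizing $\operatorname{trace}(Q^{\top}X^{\top}XQ)$, and Q satisfies $Q^{\top}Q = I_K$. The helper `indicator_Q` is hypothetical.

```python
import numpy as np

def indicator_Q(sizes):
    """Block-diagonal Q whose k-th block is e_k / sqrt(s_k) (data ordered by cluster)."""
    M, K = sum(sizes), len(sizes)
    Q = np.zeros((M, K))
    start = 0
    for k, s in enumerate(sizes):
        Q[start:start + s, k] = 1.0 / np.sqrt(s)      # e_k / sqrt(s_k)
        start += s
    return Q

rng = np.random.default_rng(2)
sizes = [3, 5, 4]
X = rng.normal(size=(6, sum(sizes)))                  # columns already ordered by cluster
Q = indicator_Q(sizes)
G = X.T @ X                                           # Gram matrix X^T X

# trace(S_w) computed directly: squared distances of each series to its cluster mean.
trace_Sw, start = 0.0, 0
for s in sizes:
    Ak = X[:, start:start + s]
    trace_Sw += np.sum((Ak - Ak.mean(axis=1, keepdims=True)) ** 2)
    start += s

print(np.isclose(trace_Sw, np.trace(G) - np.trace(Q.T @ G @ Q)))  # True
print(np.allclose(Q.T @ Q, np.eye(len(sizes))))                   # True
```

The relaxation keeps only $Q^{\top}Q = I_K$ and drops the requirement that Q have this discrete block structure.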


The optimal Q is obtained by taking the top K eigenvectors of the matrix appearing in the relaxed criterion.

The optimization problem then becomes:
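Under the relaxed constraint $Q^{\top}Q = I_K$, the Ky Fan theorem states that, for a symmetric matrix G, the maximum of $\operatorname{trace}(Q^{\top}GQ)$ equals the sum of the K largest eigenvalues of G and is attained by the corresponding eigenvectors. A NumPy sketch, assuming $G = X^{\top}X$ (the matrix on the slide is not recoverable) and using a hypothetical helper `top_k_eigvecs`:

```python
import numpy as np

def top_k_eigvecs(G, K):
    """Eigenvalues/eigenvectors of the symmetric matrix G for its K largest eigenvalues."""
    vals, vecs = np.linalg.eigh(G)        # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:K]      # indices of the K largest
    return vals[idx], vecs[:, idx]

rng = np.random.default_rng(3)
X = rng.normal(size=(5, 10))
G = X.T @ X
K = 3
vals, Q_relaxed = top_k_eigvecs(G, K)

# The relaxed optimum equals the sum of the K largest eigenvalues ...
print(np.isclose(np.trace(Q_relaxed.T @ G @ Q_relaxed), vals.sum()))  # True

# ... and bounds trace(Q^T G Q) for any other orthonormal Q, e.g. a random one.
Q_rand, _ = np.linalg.qr(rng.normal(size=(10, K)))
print(np.trace(Q_rand.T @ G @ Q_rand) <= vals.sum() + 1e-9)           # True
```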


where $A_i$ represents the data in cluster i.

The normalization gives:

If we assume that the data objects are ordered by cluster as:

the similarity matrix S and the normalized similarity matrix S' become block-diagonal.
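A small check of this property, assuming the normalization $S' = D^{-1/2} S D^{-1/2}$ with D the diagonal matrix of row sums of S (a common choice in spectral clustering; the slide's exact formula is not recoverable): when S is block-diagonal with K strictly positive blocks, S' has eigenvalue 1 with multiplicity K and every other eigenvalue is strictly smaller, which is why its top K eigenvectors reveal the clusters. The similarity values below are random and hypothetical.

```python
import numpy as np

def normalize_similarity(S):
    """Return S' = D^{-1/2} S D^{-1/2}, where D holds the row sums of S."""
    inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1))
    return inv_sqrt[:, None] * S * inv_sqrt[None, :]

# Hypothetical block-diagonal similarity: K = 3 clusters of sizes 4, 3 and 5,
# with strictly positive symmetric similarities inside each block.
rng = np.random.default_rng(4)
sizes = [4, 3, 5]
M = sum(sizes)
S = np.zeros((M, M))
start = 0
for s in sizes:
    B = rng.uniform(0.5, 1.0, size=(s, s))
    S[start:start + s, start:start + s] = (B + B.T) / 2
    start += s

vals = np.sort(np.linalg.eigvalsh(normalize_similarity(S)))[::-1]  # descending
print(np.allclose(vals[:3], 1.0), vals[3] < 0.9)  # True True
```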


To find a "good" similarity matrix, that is, one that is almost block-diagonal, we use the Gaussian function:

Then we consider:
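A sketch of the Gaussian affinity, assuming the common form $S_{ij} = \exp(-\lVert x_i - x_j \rVert^2 / (2\sigma^2))$ with a zero diagonal and a fixed $\sigma$. The paper's gradient-based self-tuning of $\sigma$ is not reproduced here, and the function name `gaussian_affinity` is hypothetical. With two tight groups of series, affinities are large within a group and essentially zero across groups, so S is nearly block-diagonal.

```python
import numpy as np

def gaussian_affinity(X, sigma):
    """Gaussian affinity between the columns of X (one series per column).

    Assumes S_ij = exp(-||x_i - x_j||^2 / (2 * sigma^2)) and sets S_ii = 0.
    """
    sq_norms = np.sum(X ** 2, axis=0)
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b, clipped at 0
    # to absorb small negative values caused by floating-point rounding.
    D2 = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2 * X.T @ X, 0.0)
    S = np.exp(-D2 / (2 * sigma ** 2))
    np.fill_diagonal(S, 0.0)
    return S

rng = np.random.default_rng(5)
group_a = [np.zeros(8) + 0.01 * rng.normal(size=8) for _ in range(3)]
group_b = [np.full(8, 5.0) + 0.01 * rng.normal(size=8) for _ in range(3)]
X = np.column_stack(group_a + group_b)
S = gaussian_affinity(X, sigma=1.0)
print(S[np.triu_indices(3, k=1)].min() > 0.5, S[:3, 3:].max() < 1e-6)  # True True
```

The choice of $\sigma$ controls how quickly affinity decays with distance, which is why the paper tunes it rather than fixing it as done here.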


Clustering Time Series Algorithm via Spectral Decomposition
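The algorithm itself is not recoverable from this transcript. As a hedged sketch of a generic spectral clustering pipeline in the spirit of the preceding slides, the code below chains a Gaussian affinity, the normalization $D^{-1/2} S D^{-1/2}$, the top eigenvectors and a k-means step. Assumptions: a fixed $\sigma$ instead of the paper's gradient-based self-tuning, the eigengap heuristic as a stand-in for the paper's automatic choice of K, unit-length rows of the eigenvector matrix, and a plain k-means with farthest-point initialization. All function names and data are hypothetical.

```python
import numpy as np

def kmeans(Y, K, n_iter=100):
    """Plain k-means on the rows of Y, with farthest-point initialization."""
    centers = [Y[0]]
    for _ in range(1, K):
        # Next center: the row farthest from all centers chosen so far.
        d = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(dist, axis=1)
        new = np.array([Y[labels == k].mean(axis=0) if np.any(labels == k)
                        else centers[k] for k in range(K)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def spectral_cluster(X, sigma, K_max=10):
    """Cluster the columns of X (equal-length series); returns (labels, K)."""
    sq = np.sum(X ** 2, axis=0)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X.T @ X, 0.0)
    S = np.exp(-D2 / (2 * sigma ** 2))                  # Gaussian affinity, fixed sigma
    np.fill_diagonal(S, 0.0)
    inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1))
    S_norm = inv_sqrt[:, None] * S * inv_sqrt[None, :]  # D^{-1/2} S D^{-1/2}
    vals, vecs = np.linalg.eigh(S_norm)
    vals, vecs = vals[::-1], vecs[:, ::-1]              # largest eigenvalues first
    K_max = min(K_max, len(vals) - 1)
    gaps = vals[:K_max] - vals[1:K_max + 1]
    K = int(np.argmax(gaps)) + 1                        # eigengap heuristic (assumption)
    Y = vecs[:, :K]
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)    # unit-length rows
    return kmeans(Y, K), K

# Hypothetical data: 3 groups of 10 noisy series of length 20.
rng = np.random.default_rng(6)
shapes = [np.zeros(20), np.full(20, 3.0), np.linspace(-3, 3, 20)]
X = np.column_stack([p + 0.1 * rng.normal(size=20) for p in shapes for _ in range(10)])
labels, K = spectral_cluster(X, sigma=1.0)
print(K, labels)
```

On well-separated groups the normalized matrix has K eigenvalues close to 1 followed by a clear drop, which is what the eigengap step detects.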


In this experiment, the authors use a real EEG dataset extracted from the second Wadsworth BCI dataset of the BCI2003 competition.

The data objects belong to 3 classes: EEG signals evoked by flashes containing targets, EEG signals evoked by flashes adjacent to targets, and other EEG signals.

All data objects have the same length, 144.


50 EEG signals are randomly chosen from each class. Since all the time series have the same length, the Euclidean distance is used to measure the pairwise distances between them.

The results are compared with those of hierarchical agglomerative clustering (HAC). There are three kinds of HAC approaches, depending on the inter-cluster similarity measure:

Complete-linkage HAC (CHAC)
Single-linkage HAC (SHAC)
Average-linkage HAC (AHAC)
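For comparison, a naive sketch of the three HAC variants on pairwise Euclidean distances between equal-length series. It runs in roughly cubic time and is for illustration only; in practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` would be used. The function name `hac` and the data are hypothetical; the linkage names follow the slide.

```python
import numpy as np

def hac(X, n_clusters, linkage="average"):
    """Naive agglomerative clustering of the columns of X (one series per column).

    linkage: "complete" (CHAC), "single" (SHAC) or "average" (AHAC).
    Returns one integer label per series.
    """
    # Pairwise Euclidean distances between equal-length series.
    D = np.sqrt(((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0))
    link = {"complete": np.max, "single": np.min, "average": np.mean}[linkage]
    clusters = [[i] for i in range(X.shape[1])]        # start from singletons
    while len(clusters) > n_clusters:
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Inter-cluster distance under the chosen linkage.
                d = link(D[np.ix_(clusters[a], clusters[b])])
                if d < best:
                    best, pair = d, (a, b)
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]        # merge the closest pair
        del clusters[b]
    labels = np.empty(X.shape[1], dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels

# Hypothetical data: 3 groups of 5 noisy constant series of length 10.
rng = np.random.default_rng(7)
centers = [np.zeros(10), np.full(10, 4.0), np.full(10, -4.0)]
X = np.column_stack([c + 0.2 * rng.normal(size=10) for c in centers for _ in range(5)])
for method in ("complete", "single", "average"):
    print(method, hac(X, 3, method))
```

The three variants differ only in how the distance between two clusters is reduced from the pairwise distances: maximum, minimum or mean.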