Compression-based Unsupervised Clustering of Spectral Signatures
D. Cerra, J. Bieniarz, J. Avbelj, P. Reinartz, and R. Mueller
WHISPERS, Lisbon, 8.06.2011
Slide 2

Contents
- Introduction
- Compression-based Similarity Measures
  - How to quantify information?
  - Normalized Compression Distance
- CBSM as Spectral Distances
  - Traditional spectral distances
  - NCD as spectral distance
Slide 3

Contents: Introduction
Slide 4

Introduction

Many applications in hyperspectral remote sensing rely on quantifying the similarity between two pixels, each represented by a spectrum:
- Classification / Segmentation
- Target Detection
- Spectral Unmixing

Spectral distances are mostly based on vector processing. Is there any different (and effective) similarity measure out there?

[Figure: example pairs of spectra, labelled "Similar!" and "Not similar!"]
Slide 5

Contents: Compression-based Similarity Measures
Slide 6

How to quantify information? Two approaches: probabilistic (classic) vs. algorithmic.

Probabilistic: information as uncertainty (Shannon entropy)
- $H(X) = -\sum_x p(x) \log p(x)$
- Related to a random variable X with probability mass function p(x)
- Measures the average uncertainty in X, i.e. the average number of bits required to describe X
- Computable

Algorithmic: information as complexity (Kolmogorov complexity)
- $K(x) = \min_{q \in Q_x} l(q)$
- Related to a single object (a string) x
- Length of the shortest program q among the $Q_x$ programs which output the string x
- Measures how difficult it is to describe x from scratch
- Uncomputable
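The probabilistic notion is directly computable from data. A minimal Python sketch (not part of the original slides) estimating the empirical entropy of a byte string from its symbol frequencies:

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Empirical Shannon entropy in bits per symbol,
    H(X) = sum_x p(x) * log2(1 / p(x))."""
    counts = Counter(data)
    n = len(data)
    return sum((c / n) * math.log2(n / c) for c in counts.values())

print(shannon_entropy(b"aaaaaaaa"))  # 0.0 bits: a single symbol, no uncertainty
print(shannon_entropy(b"abcdefgh"))  # 3.0 bits: 8 equiprobable symbols
```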
Slide 7

Mutual Information in Shannon/Kolmogorov: probabilistic (classic) vs. algorithmic.

(Statistic) Mutual Information
- Measures, in bits, the amount of information a random variable X has about another variable Y: $I(X;Y) = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)} = H(X) + H(Y) - H(X,Y)$
- The joint entropy H(X,Y) is the entropy of the pair (X,Y) with joint distribution p(x,y)
- Symmetric, non-negative
- If I(X;Y) = 0 then H(X,Y) = H(X) + H(Y), and X and Y are statistically independent

Algorithmic Mutual Information
- Measures the amount of computational resources shared by the shortest programs which output the strings x and y: $I_w(x:y) = K(x) + K(y) - K(x,y)$
- The joint Kolmogorov complexity K(x,y) is the length of the shortest program which outputs x followed by y
- Symmetric, non-negative: $I_w(x:y) \ge 0$
- If $I_w(x:y) = 0$ then K(x,y) = K(x) + K(y), and x and y are algorithmically independent
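Again, the statistical side is computable. A small sketch (my addition, assuming a joint pmf given as a dictionary) that evaluates I(X;Y) directly from the definition:

```python
import math

def mutual_information(joint):
    """I(X;Y) in bits, from a joint pmf given as {(x, y): p(x, y)}."""
    px, py = {}, {}
    for (x, y), p in joint.items():          # marginalize to get p(x), p(y)
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# Perfectly correlated bits share 1 bit; independent bits share 0.
print(mutual_information({(0, 0): 0.5, (1, 1): 0.5}))                  # 1.0
print(mutual_information({(0, 0): 0.25, (0, 1): 0.25,
                          (1, 0): 0.25, (1, 1): 0.25}))                # 0.0
```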
Slide 8

Normalized Information Distance (NID)

Normalized length of the shortest program that computes x knowing y, as well as computing y knowing x (Li and Vitányi):

$NID(x,y) = \frac{K(x,y) - \min\{K(x), K(y)\}}{\max\{K(x), K(y)\}}$

- Similarity metric: NID(x,y) = 0 iff x = y; NID(x,y) = 1 means maximum distance between x and y
- The NID minimizes all normalized admissible distances
Slide 9

Compression: Approximating Kolmogorov Complexity

Big problem: the Kolmogorov complexity K(x) is uncomputable! K(x) represents a lower bound for what an off-the-shelf compressor can achieve when compressing x. What if we use the approximation $K(x) \approx C(x)$, where C(x) is the size of the file obtained by compressing x with a standard lossless compressor (such as Gzip)?

[Figure: two images of equal original size, 65 Kb. Image A compresses to 47 Kb; image B compresses to 2 Kb.]
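A minimal sketch of the idea (my addition, using Python's zlib in place of Gzip): two inputs of identical original size can have very different compressed sizes, reflecting their different complexities.

```python
import os
import zlib

def c(x: bytes) -> int:
    """C(x): size in bytes of x after lossless compression (zlib here)."""
    return len(zlib.compress(x, 9))

a = os.urandom(65536)        # complex content: barely compressible
b = b"\x00" * 65536          # redundant content: compresses massively
print(c(a))                  # close to 65536
print(c(b))                  # a few hundred bytes at most
```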
Slide 10

Normalized Compression Distance (NCD)

Approximate the NID by replacing complexities with compression factors:

$NID(x,y) = \frac{K(x,y) - \min\{K(x), K(y)\}}{\max\{K(x), K(y)\}} \quad \rightarrow \quad NCD(x,y) = \frac{C(xy) - \min\{C(x), C(y)\}}{\max\{C(x), C(y)\}}$

Here x and y are each fed to the coder separately to obtain C(x) and C(y), and their concatenation xy is fed to the coder to obtain C(xy). If two objects compress better together than separately, it means they share common patterns and are similar!

Advantages:
- Basically parameter-free (data-driven)
- Applicable with any off-the-shelf compressor to diverse datatypes
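A direct implementation is only a few lines. The sketch below is my addition (the slides do not prescribe a compressor) and uses zlib:

```python
import os
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance, approximated with zlib."""
    cx, cy = len(zlib.compress(x, 9)), len(zlib.compress(y, 9))
    cxy = len(zlib.compress(x + y, 9))
    return (cxy - min(cx, cy)) / max(cx, cy)

a = b"the quick brown fox jumps over the lazy dog " * 20
b = b"the quick brown fox leaps over the lazy cat " * 20
r = os.urandom(len(a))
print(ncd(a, b))   # small: the two strings share most of their patterns
print(ncd(a, r))   # close to 1: nothing in common with random bytes
```

One practical caveat of this choice: DEFLATE-based compressors like zlib only detect patterns within a 32 KB window, so for large objects a stronger compressor is preferable.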
Slide 11

Evolution of CBSM
- 1993, Ziv & Merhav: first use of relative entropy to classify texts
- 2000, Frank et al., Khmelev: first compression-based experiments on text categorization
- 2001, Benedetto et al.: intuitively defined a compression-based relative entropy; caused a rise of interest in compression-based methods
- 2002, Watanabe et al.: Pattern Representation based on Data Compression (PRDC); first in classifying general data, with a first step of conversion into strings
- 2004, NCD: solid theoretical foundations (Algorithmic Information Theory)
- 2005-2010, many things came next: Chen-Li Metric for DNA classification (Chen & Li, 2005); Compression-based Dissimilarity Measure (Keogh et al., 2006); Cosine Similarity (Sculley & Brodley, 2006); Dictionary Distance (Macedonas et al., 2008); Fast Compression Distance (Cerra & Datcu, 2010)
Slide 12

Compression-Based Similarity Measures: Applications

Clustering and classification of:
- Simple texts
- Dictionaries from different languages
- Music
- DNA genomes
- Volcanology
- Chain letters
- Authorship attribution
- Images
- ...
Slide 13

How to visualize a distance matrix?

An unsupervised clustering of a distance matrix related to a dataset can be carried out with a dendrogram (binary tree). A dendrogram represents a distance matrix in two dimensions: it recursively splits the dataset into two groups containing similar objects, and the most similar objects appear as siblings.

      a     b     c     d     e     f
a     0     1     1     1     1     1
b     1     0     0.1   0.3   0.4   0.6
c     1     0.1   0     0.4   0.4   0.7
d     1     0.3   0.4   0     0.2   0.5
e     1     0.4   0.4   0.2   0     0.5
f     1     0.6   0.7   0.5   0.5   0
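As an illustration (my addition), SciPy can build and draw the dendrogram for exactly this matrix; average linkage is an assumption, since the slides do not specify a linkage criterion:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform

labels = list("abcdef")
D = np.array([
    [0.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    [1.0, 0.0, 0.1, 0.3, 0.4, 0.6],
    [1.0, 0.1, 0.0, 0.4, 0.4, 0.7],
    [1.0, 0.3, 0.4, 0.0, 0.2, 0.5],
    [1.0, 0.4, 0.4, 0.2, 0.0, 0.5],
    [1.0, 0.6, 0.7, 0.5, 0.5, 0.0],
])

# squareform() condenses the symmetric matrix; linkage() builds the tree.
Z = linkage(squareform(D), method="average")
dendrogram(Z, labels=labels)   # b-c and d-e appear as sibling pairs
plt.show()
```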
Slide 14

An all-purpose method: application to DNA genomes

[Figure: dendrogram of DNA genomes, clustered by group; rodents and primates form separate clusters]
Slide 15

Volcanology: separating explosions (ex) from landslides (Ls), Stromboli Volcano

[Figure: clustering of seismic events; explosions and landslides form separate groups]
Slide 16

Optical Images: Hierarchical Clustering

60 Spot 5 subsets, spatial resolution 5 m.

[Figure: dendrogram grouping the subsets into forest, desert, city, fields, clouds, and sea]
Slide 17

SAR Scene: Hierarchical Clustering

32 TerraSAR-X subsets, acquired over Paris, spatial resolution 1.8 m.

[Figure: dendrogram of the subsets, with one false alarm]
Slide 18

Contents: CBSM as Spectral Distances
- Traditional spectral distances
- NCD as spectral distance
Slide 19

Rocks Categorization

41 spectra from the Aster 2.0 Spectral Library, belonging to three classes: mafic, felsic, and shale. Spectra belonging to different rocks may present a similar behaviour or overlap.
Slide 20

Some well-known spectral distances:
- Euclidean Distance
- Spectral Angle
- Spectral Correlation
- Spectral Information Divergence
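For reference, minimal NumPy sketches of the four measures (my addition; exact normalizations vary in the literature, and the SID form below assumes strictly positive reflectance values):

```python
import numpy as np

def euclidean_distance(x, y):
    return np.linalg.norm(x - y)

def spectral_angle(x, y):
    """Angle in radians between the spectra viewed as vectors (SAM)."""
    cos = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def spectral_correlation(x, y):
    """Distance derived from the Pearson correlation of the spectra."""
    return 1.0 - np.corrcoef(x, y)[0, 1]

def spectral_information_divergence(x, y):
    """Symmetrized Kullback-Leibler divergence between the spectra,
    each normalized to a probability mass function."""
    p, q = x / x.sum(), y / y.sum()
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

x = np.array([0.12, 0.15, 0.31, 0.28])   # toy reflectance spectra
y = np.array([0.10, 0.16, 0.27, 0.30])
print(spectral_angle(x, y), spectral_information_divergence(x, y))
```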
Slide 21

Results

Evaluation of the dendrogram through visual inspection: is it possible to cut the dendrogram to separate the classes? How many objects would be misplaced given the best cuts?
[Figure: dendrograms obtained with the different distances, with numbered nodes marking the cuts]
Slide 22

Conclusions

The NCD can be employed as a spectral distance, and may provide surprising results. Why?
- The NCD is resistant to noise: differences between minerals of the same class may be regarded as noise
- The NCD implicitly focuses on the relevant information within the data: we conjecture that the analysis benefits from considering the general behaviour of the spectra

Drawbacks:
- Computationally intensive (spectra have to be analyzed sequentially)
- Dependent to some extent on the compressor used: in every case, the compressor that best approximates the Kolmogorov complexity of the data at hand should be used
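To make the idea concrete, one plausible way to use the NCD as a spectral distance is to quantize each spectrum to a byte string and reuse ncd() from the sketch above. This encoding is my assumption, not a step specified in the slides:

```python
import numpy as np

def spectrum_to_bytes(spectrum, levels=256):
    """Quantize a reflectance spectrum to one byte per band.
    Hypothetical encoding: the slides do not specify this step."""
    s = np.asarray(spectrum, dtype=float)
    q = (s - s.min()) / (s.max() - s.min() + 1e-12) * (levels - 1)
    return q.astype(np.uint8).tobytes()

# Distance between two spectra s1, s2, reusing ncd() defined earlier:
# d = ncd(spectrum_to_bytes(s1), spectrum_to_bytes(s2))
```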