Compression-based Unsupervised Clustering of Spectral Signatures
D. Cerra, J. Bieniarz, J. Avbelj, P. Reinartz, and R. Mueller
WHISPERS, Lisbon, 8.06.2011
Slide 2

Contents
- Introduction
- Compression-based Similarity Measures
  - How to quantify information?
  - Normalized Compression Distance
- CBSM as Spectral Distances
  - Traditional spectral distances
  - NCD as spectral distance
Slide 3

Contents: Introduction
Slide 4

Introduction

Many applications in hyperspectral remote sensing rely on quantifying the similarity between two pixels, each represented by a spectrum:
- Classification / Segmentation
- Target Detection
- Spectral Unmixing

Spectral distances are mostly based on vector processing. Is there any different (and effective) similarity measure out there?

[Figure: example pairs of spectra, labelled "Similar!" and "Not similar!"]
Slide 5

Contents: Compression-based Similarity Measures
Slide 6

How to quantify information? Two approaches: probabilistic (classic) vs. algorithmic.

Probabilistic: information as uncertainty (Shannon entropy)
- $H(X) = -\sum_x p(x) \log p(x)$
- Related to a random variable X with probability mass function p(x)
- Measures the average uncertainty in X, i.e. the average number of bits required to describe X
- Computable

Algorithmic: information as complexity (Kolmogorov complexity)
- $K(x) = \min_{q \in Q_x} l(q)$
- Related to a single object (a string) x
- Length of the shortest program q among the $Q_x$ programs which output the string x
- Measures how difficult it is to describe x from scratch
- Uncomputable
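The probabilistic notion is directly computable from data. A minimal Python sketch (not part of the original slides) estimating the empirical entropy of a byte string from its symbol frequencies:

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Empirical Shannon entropy in bits per symbol,
    H(X) = sum_x p(x) * log2(1 / p(x))."""
    counts = Counter(data)
    n = len(data)
    return sum((c / n) * math.log2(n / c) for c in counts.values())

print(shannon_entropy(b"aaaaaaaa"))  # 0.0 bits: a single symbol, no uncertainty
print(shannon_entropy(b"abcdefgh"))  # 3.0 bits: 8 equiprobable symbols
```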
Slide 7

Mutual Information in Shannon/Kolmogorov: probabilistic (classic) vs. algorithmic.

(Statistic) Mutual Information
- Measures, in bits, the amount of information a random variable X has about another variable Y: $I(X;Y) = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)} = H(X) + H(Y) - H(X,Y)$
- The joint entropy H(X,Y) is the entropy of the pair (X,Y) with joint distribution p(x,y)
- Symmetric, non-negative
- If I(X;Y) = 0 then H(X,Y) = H(X) + H(Y), and X and Y are statistically independent

Algorithmic Mutual Information
- Measures the amount of computational resources shared by the shortest programs which output the strings x and y: $I_w(x:y) = K(x) + K(y) - K(x,y)$
- The joint Kolmogorov complexity K(x,y) is the length of the shortest program which outputs x followed by y
- Symmetric, non-negative: $I_w(x:y) \ge 0$
- If $I_w(x:y) = 0$ then K(x,y) = K(x) + K(y), and x and y are algorithmically independent
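Again, the statistical side is computable. A small sketch (my addition, assuming a joint pmf given as a dictionary) that evaluates I(X;Y) directly from the definition:

```python
import math

def mutual_information(joint):
    """I(X;Y) in bits, from a joint pmf given as {(x, y): p(x, y)}."""
    px, py = {}, {}
    for (x, y), p in joint.items():          # marginalize to get p(x), p(y)
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# Perfectly correlated bits share 1 bit; independent bits share 0.
print(mutual_information({(0, 0): 0.5, (1, 1): 0.5}))                  # 1.0
print(mutual_information({(0, 0): 0.25, (0, 1): 0.25,
                          (1, 0): 0.25, (1, 1): 0.25}))                # 0.0
```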
Slide 8

Normalized Information Distance (NID)

Normalized length of the shortest program that computes x knowing y, as well as computing y knowing x (Li and Vitányi):

$NID(x,y) = \frac{K(x,y) - \min\{K(x), K(y)\}}{\max\{K(x), K(y)\}}$

- Similarity metric: NID(x,y) = 0 iff x = y; NID(x,y) = 1 means maximum distance between x and y
- The NID minimizes all normalized admissible distances
Slide 9

Compression: Approximating Kolmogorov Complexity

Big problem: the Kolmogorov complexity K(x) is uncomputable! K(x) represents a lower bound for what an off-the-shelf compressor can achieve when compressing x. What if we use the approximation $K(x) \approx C(x)$, where C(x) is the size of the file obtained by compressing x with a standard lossless compressor (such as Gzip)?

[Figure: two images of equal original size, 65 Kb. Image A compresses to 47 Kb; image B compresses to 2 Kb.]
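A minimal sketch of the idea (my addition, using Python's zlib in place of Gzip): two inputs of identical original size can have very different compressed sizes, reflecting their different complexities.

```python
import os
import zlib

def c(x: bytes) -> int:
    """C(x): size in bytes of x after lossless compression (zlib here)."""
    return len(zlib.compress(x, 9))

a = os.urandom(65536)        # complex content: barely compressible
b = b"\x00" * 65536          # redundant content: compresses massively
print(c(a))                  # close to 65536
print(c(b))                  # a few hundred bytes at most
```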
Slide 10

Normalized Compression Distance (NCD)

Approximate the NID by replacing complexities with compression factors:

$NID(x,y) = \frac{K(x,y) - \min\{K(x), K(y)\}}{\max\{K(x), K(y)\}} \quad \rightarrow \quad NCD(x,y) = \frac{C(xy) - \min\{C(x), C(y)\}}{\max\{C(x), C(y)\}}$

Here x and y are each fed to the coder separately to obtain C(x) and C(y), and their concatenation xy is fed to the coder to obtain C(xy). If two objects compress better together than separately, it means they share common patterns and are similar!

Advantages:
- Basically parameter-free (data-driven)
- Applicable with any off-the-shelf compressor to diverse datatypes
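A direct implementation is only a few lines. The sketch below is my addition (the slides do not prescribe a compressor) and uses zlib:

```python
import os
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance, approximated with zlib."""
    cx, cy = len(zlib.compress(x, 9)), len(zlib.compress(y, 9))
    cxy = len(zlib.compress(x + y, 9))
    return (cxy - min(cx, cy)) / max(cx, cy)

a = b"the quick brown fox jumps over the lazy dog " * 20
b = b"the quick brown fox leaps over the lazy cat " * 20
r = os.urandom(len(a))
print(ncd(a, b))   # small: the two strings share most of their patterns
print(ncd(a, r))   # close to 1: nothing in common with random bytes
```

One practical caveat of this choice: DEFLATE-based compressors like zlib only detect patterns within a 32 KB window, so for large objects a stronger compressor is preferable.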
Slide 11

Evolution of CBSM
- 1993, Ziv & Merhav: first use of relative entropy to classify texts
- 2000, Frank et al., Khmelev: first compression-based experiments on text categorization
- 2001, Benedetto et al.: intuitively defined a compression-based relative entropy; caused a rise of interest in compression-based methods
- 2002, Watanabe et al.: Pattern Representation based on Data Compression (PRDC); first in classifying general data, with a first step of conversion into strings
- 2004, NCD: solid theoretical foundations (Algorithmic Information Theory)
- 2005-2010, many things came next: Chen-Li Metric for DNA classification (Chen & Li, 2005); Compression-based Dissimilarity Measure (Keogh et al., 2006); Cosine Similarity (Sculley & Brodley, 2006); Dictionary Distance (Macedonas et al., 2008); Fast Compression Distance (Cerra & Datcu, 2010)
Slide 12

Compression-Based Similarity Measures: Applications

Clustering and classification of:
- Simple texts
- Dictionaries from different languages
- Music
- DNA genomes
- Volcanology
- Chain letters
- Authorship attribution
- Images
- ...
Slide 13

How to visualize a distance matrix?

An unsupervised clustering of a distance matrix related to a dataset can be carried out with a dendrogram (binary tree). A dendrogram represents a distance matrix in two dimensions: it recursively splits the dataset into two groups containing similar objects, and the most similar objects appear as siblings.

      a     b     c     d     e     f
a     0     1     1     1     1     1
b     1     0     0.1   0.3   0.4   0.6
c     1     0.1   0     0.4   0.4   0.7
d     1     0.3   0.4   0     0.2   0.5
e     1     0.4   0.4   0.2   0     0.5
f     1     0.6   0.7   0.5   0.5   0
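As an illustration (my addition), SciPy can build and draw the dendrogram for exactly this matrix; average linkage is an assumption, since the slides do not specify a linkage criterion:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform

labels = list("abcdef")
D = np.array([
    [0.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    [1.0, 0.0, 0.1, 0.3, 0.4, 0.6],
    [1.0, 0.1, 0.0, 0.4, 0.4, 0.7],
    [1.0, 0.3, 0.4, 0.0, 0.2, 0.5],
    [1.0, 0.4, 0.4, 0.2, 0.0, 0.5],
    [1.0, 0.6, 0.7, 0.5, 0.5, 0.0],
])

# squareform() condenses the symmetric matrix; linkage() builds the tree.
Z = linkage(squareform(D), method="average")
dendrogram(Z, labels=labels)   # b-c and d-e appear as sibling pairs
plt.show()
```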
Slide 14

An all-purpose method: application to DNA genomes

[Figure: dendrogram of DNA genomes, clustered by group; rodents and primates form separate clusters]
Slide 15

Volcanology: separating explosions (ex) from landslides (Ls), Stromboli Volcano

[Figure: clustering of seismic events; explosions and landslides form separate groups]
Slide 16

Optical Images: Hierarchical Clustering

60 Spot 5 subsets, spatial resolution 5 m.

[Figure: dendrogram grouping the subsets into forest, desert, city, fields, clouds, and sea]
Slide 17

SAR Scene: Hierarchical Clustering

32 TerraSAR-X subsets, acquired over Paris, spatial resolution 1.8 m.

[Figure: dendrogram of the subsets, with one false alarm]
Slide 18

Contents: CBSM as Spectral Distances
- Traditional spectral distances
- NCD as spectral distance
Slide 19

Rocks Categorization

41 spectra from the Aster 2.0 Spectral Library, belonging to three classes: mafic, felsic, and shale. Spectra belonging to different rocks may present a similar behaviour or overlap.
Slide 20

Some well-known spectral distances:
- Euclidean Distance
- Spectral Angle
- Spectral Correlation
- Spectral Information Divergence
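For reference, minimal NumPy sketches of the four measures (my addition; exact normalizations vary in the literature, and the SID form below assumes strictly positive reflectance values):

```python
import numpy as np

def euclidean_distance(x, y):
    return np.linalg.norm(x - y)

def spectral_angle(x, y):
    """Angle in radians between the spectra viewed as vectors (SAM)."""
    cos = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def spectral_correlation(x, y):
    """Distance derived from the Pearson correlation of the spectra."""
    return 1.0 - np.corrcoef(x, y)[0, 1]

def spectral_information_divergence(x, y):
    """Symmetrized Kullback-Leibler divergence between the spectra,
    each normalized to a probability mass function."""
    p, q = x / x.sum(), y / y.sum()
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

x = np.array([0.12, 0.15, 0.31, 0.28])   # toy reflectance spectra
y = np.array([0.10, 0.16, 0.27, 0.30])
print(spectral_angle(x, y), spectral_information_divergence(x, y))
```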
Slide 21

Results

Evaluation of the dendrogram through visual inspection: is it possible to cut the dendrogram to separate the classes? How many objects would be misplaced given the best cuts?
[Figure: dendrograms obtained with the different distances, with numbered nodes marking the cuts]
Slide 22

Conclusions

The NCD can be employed as a spectral distance, and may provide surprising results. Why?
- The NCD is resistant to noise: differences between minerals of the same class may be regarded as noise
- The NCD implicitly focuses on the relevant information within the data: we conjecture that the analysis benefits from considering the general behaviour of the spectra

Drawbacks:
- Computationally intensive (spectra have to be analyzed sequentially)
- Dependent to some extent on the compressor used: in every case, the compressor that best approximates the Kolmogorov complexity of the data at hand should be used
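To make the idea concrete, one plausible way to use the NCD as a spectral distance is to quantize each spectrum to a byte string and reuse ncd() from the sketch above. This encoding is my assumption, not a step specified in the slides:

```python
import numpy as np

def spectrum_to_bytes(spectrum, levels=256):
    """Quantize a reflectance spectrum to one byte per band.
    Hypothetical encoding: the slides do not specify this step."""
    s = np.asarray(spectrum, dtype=float)
    q = (s - s.min()) / (s.max() - s.min() + 1e-12) * (levels - 1)
    return q.astype(np.uint8).tobytes()

# Distance between two spectra s1, s2, reusing ncd() defined earlier:
# d = ncd(spectrum_to_bytes(s1), spectrum_to_bytes(s2))
```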