clustering introduction preprocessing: dimensional reduction with svd clustering methods: k-means,...
Post on 15-Jan-2016
Clustering
Introduction
Preprocessing: dimensional reduction with SVD
Clustering methods: K-means, FCM
Hierarchical methods
Model based methods (at the end)
Competitive NN (SOM) (not shown here)
SVC, QC
Applications
COMPACT
(an ill-defined problem)
What Is Clustering?
Why? To help understand the natural grouping or structure in a data set.
When? Used either as a stand-alone tool to get insight into the data distribution or as a preprocessing step for other algorithms, e.g., to discover classes.
Not classification!
Clustering is partitioning of data into meaningful (?) groups called clusters. A cluster is a collection of objects that are "similar" to one another. … what is similar? Unsupervised learning: no predefined classes.
Clustering Applications
Operations Research: the Facility Location Problem. Locate fire stations so as to minimize the maximum/average distance a fire truck must travel.
Signal Processing: Vector Quantization. Transmit large files (e.g., video, speech) by computing quantizers.
Astronomy: SkyCat clustered 2x10^9 sky objects into stars, galaxies, quasars, etc., based on radiation emitted in different spectrum bands.
Clustering Applications
Marketing: segmentation of customers for target marketing; segmentation of customers based on online clickstream data.
Web: to discover categories of content; search results.
Bioinformatics: gene expression. Finding groups of individuals (sick vs. healthy); finding groups of genes; motif search; …
In practice, clustering is one of the most widely used data mining techniques: association rule algorithms produce too many rules, and other machine learning algorithms require labeled data.
Points/Metric Space
Points could be in R^d, {0,1}^d, …
Metric Space: dist(x,y) is a distance metric if it is
Reflexive: dist(x,y) = 0 iff x = y
Symmetric: dist(x,y) = dist(y,x)
Triangle Inequality: dist(x,y) ≤ dist(x,z) + dist(z,y)
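The three axioms can be spot-checked numerically on a finite set of points; a minimal Python sketch (the helper `is_metric` is ours, not from the slides):

```python
import numpy as np

def is_metric(dist, points, tol=1e-12):
    """Spot-check the three metric axioms on a finite sample of points."""
    for x in points:
        for y in points:
            d = dist(x, y)
            # Reflexive: dist(x, y) == 0 iff x == y
            if np.allclose(x, y):
                if d > tol:
                    return False
            elif d <= tol:
                return False
            # Symmetric: dist(x, y) == dist(y, x)
            if abs(d - dist(y, x)) > tol:
                return False
            # Triangle inequality: dist(x, y) <= dist(x, z) + dist(z, y)
            for z in points:
                if d > dist(x, z) + dist(z, y) + tol:
                    return False
    return True

euclid = lambda a, b: float(np.linalg.norm(np.asarray(a) - np.asarray(b)))
pts = [np.array(p, float) for p in [(0, 0), (1, 0), (0, 1), (2, 2)]]
print(is_metric(euclid, pts))  # True
```

This only checks the axioms on the sample points, of course; it cannot prove them for the whole space.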
Example of Distance Metrics
The distance between x = ⟨x1,…,xn⟩ and y = ⟨y1,…,yn⟩ is:
L2 norm (Euclidean): $\sqrt{(x_1-y_1)^2 + \cdots + (x_n-y_n)^2}$
Manhattan Distance (L1 norm): $|x_1-y_1| + \cdots + |x_n-y_n|$
Documents: cosine measure. This is a similarity, i.e., more similar → close to 1, less similar → close to 0.
Not a metric space, but 1 − cos is.
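The three measures above as a small Python sketch (function names are ours):

```python
import numpy as np

def l2(x, y):
    """Euclidean (L2) distance."""
    return float(np.sqrt(np.sum((x - y) ** 2)))

def l1(x, y):
    """Manhattan (L1) distance."""
    return float(np.sum(np.abs(x - y)))

def cosine_sim(x, y):
    """Cosine similarity: 1 = same direction, 0 = orthogonal."""
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
print(l2(x, y))          # sqrt(1 + 4 + 9)
print(l1(x, y))          # 1 + 2 + 3 = 6.0
print(cosine_sim(x, y))  # 1.0: same direction, so 1 - cos = 0
```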
Correlation
We might care more about the overall shape of expression profiles rather than the actual magnitudes
That is, we might want to consider genes similar when they are “up” and “down” together
When might we want this kind of measure? What experimental issues might make this appropriate?
Pearson Linear Correlation
We’re shifting the expression profiles down (subtracting the means) and scaling by the standard deviations (i.e., making the data have mean = 0 and std = 1)
$$\rho(x, y) = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}, \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \quad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$
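A direct transcription of the Pearson formula, plus the dissimilarity d_p = (1 − ρ)/2 used below; a sketch with function names of our choosing:

```python
import numpy as np

def pearson(x, y):
    """Pearson linear correlation: subtract means, scale by std devs."""
    xc, yc = x - x.mean(), y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2)))

def plc_dissimilarity(x, y):
    """Turn the similarity into a dissimilarity: d_p = (1 - rho) / 2."""
    return (1.0 - pearson(x, y)) / 2.0

x = np.array([1.0, 2.0, 3.0, 4.0])
print(pearson(x, 3 * x + 5))     # 1.0: invariant to scaling and shifting
print(plc_dissimilarity(x, -x))  # 1.0: perfectly anti-correlated
```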
Pearson Linear Correlation Pearson linear correlation (PLC) is a measure that is
invariant to scaling and shifting (vertically) of the expression values
Always between –1 and +1 (perfectly anti-correlated and perfectly correlated)
This is a similarity measure, but we can easily make it into a dissimilarity measure:
$$d_p = \frac{1 - \rho(x, y)}{2}$$
PLC (cont.)
PLC only measures the degree of a linear relationship between two expression profiles!
If you want to measure other relationships, there are many other possible measures (see Jagota book and project #3 for more examples)
ρ(x, y) = 0.0249, so d_p = (1 − 0.0249)/2 ≈ 0.4876
The green curve is the square of the blue curve – this relationship is not captured with PLC
More correlation examples
What do you think the correlation is here? Is this what we want?
How about here? Is this what we want?
Missing Values A common problem w/ microarray data One approach with Euclidean distance or
PLC is just to ignore missing values (i.e., pretend the data has fewer dimensions)
There are more sophisticated approaches that use information such as continuity of a time series or related genes to estimate missing values – better to use these if possible
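The "pretend the data has fewer dimensions" idea can be sketched as follows; rescaling the sum to the full dimensionality is one simple convention (our choice, not prescribed here) that keeps distances comparable across pairs with different numbers of missing values:

```python
import numpy as np

def euclidean_ignore_nan(x, y):
    """Euclidean distance over the dimensions present in both profiles,
    rescaled to the full dimensionality."""
    mask = ~(np.isnan(x) | np.isnan(y))   # dimensions observed in both
    d2 = np.sum((x[mask] - y[mask]) ** 2)
    return float(np.sqrt(d2 * len(x) / mask.sum()))

x = np.array([1.0, 2.0, np.nan, 4.0])
y = np.array([1.0, 2.0, 3.0, 0.0])
d = euclidean_ignore_nan(x, y)  # only 3 of 4 dimensions used
```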
Preprocessing
For methods that are not applicable in very high dimensions you may want to apply
- Dimensional reduction, e.g. consider the first few SVD components (truncate S at r dimensions) and use the remaining values of the U or V matrices
- Dimensional reduction + normalization: after applying dimensional reduction normalize all resulting vectors to unit length (i.e. consider angles as proximity measures)
- Feature selection, e.g. consider only features that have large variance. More on feature selection in the future.
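The first two options can be sketched with NumPy; the shapes and the value of r are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))   # rows = objects, columns = features

# Dimensional reduction: keep the first r SVD components
r = 3
U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_reduced = U[:, :r] * s[:r]     # objects in r-dimensional SVD space

# Optional normalization: scale every vector to unit length,
# so only angles (cosine proximity) matter afterwards
norms = np.linalg.norm(X_reduced, axis=1, keepdims=True)
X_unit = X_reduced / norms

print(X_reduced.shape)                                   # (100, 3)
print(np.allclose(np.linalg.norm(X_unit, axis=1), 1.0))  # True
```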
Clustering Types
Exclusive vs. Overlapping Clustering Hierarchical vs. Global Clustering Formal vs. Heuristic Clustering
First two examples:
K-Means: exclusive, global, heuristic
FCM (fuzzy c-means): overlapping, global, heuristic
Two classes of data described by (o) and (*). The objective is to reproduce the two classes by K=2 clustering.
[Figure: tiles data, log(intensity) 557 Hz vs. log(intensity) 475 Hz; o = whole tiles, * = cracked tiles, x = centres]
1. Place two cluster centres (x) at random.
2. Assign each data point (* and o) to the nearest cluster centre (x).
[Figure: tiles data, log(intensity) 557 Hz vs. log(intensity) 475 Hz; o = whole tiles, * = cracked tiles, x = centres]
[Figure: tiles data, log(intensity) 557 Hz vs. log(intensity) 475 Hz; o = whole tiles, * = cracked tiles, x = centres]
1. Compute the new centre of each class.
2. Move the crosses (x).
Iteration 2
[Figure: tiles data, log(intensity) 557 Hz vs. log(intensity) 475 Hz; o = whole tiles, * = cracked tiles, x = centres]
Iteration 3
[Figure: tiles data, log(intensity) 557 Hz vs. log(intensity) 475 Hz; o = whole tiles, * = cracked tiles, x = centres]
Iteration 4 (then stop, because no visible change).
Each data point belongs to the cluster defined by the nearest centre.
[Figure: tiles data, log(intensity) 557 Hz vs. log(intensity) 475 Hz; o = whole tiles, * = cracked tiles, x = centres]
The membership matrix M:
1. The last five data points (rows) belong to the first cluster (column).
2. The first five data points (rows) belong to the second cluster (column).
M =
0.0000 1.0000
0.0000 1.0000
0.0000 1.0000
0.0000 1.0000
0.0000 1.0000
1.0000 0.0000
1.0000 0.0000
1.0000 0.0000
1.0000 0.0000
1.0000 0.0000
Membership matrix M

$$m_{ik} = \begin{cases} 1 & \text{if } \|u_k - c_i\|^2 \le \|u_k - c_j\|^2 \text{ for all } j \ne i \\ 0 & \text{otherwise} \end{cases}$$

where $\|u_k - c_i\|$ is the distance from data point $u_k$ to cluster centre $c_i$, and $c_j$ is any other cluster centre.
Results of K-means depend on the starting point of the algorithm. Repeat it several times to get a better feeling whether the results are meaningful.
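The assign/update loop illustrated above can be sketched as follows; we use a deterministic farthest-first start instead of the random placement on the slides, purely so the example is reproducible:

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Minimal K-means sketch with a deterministic farthest-first start."""
    centres = [X[0]]
    for _ in range(1, k):
        # next centre = point farthest from all centres chosen so far
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centres], axis=0)
        centres.append(X[np.argmax(d)])
    centres = np.array(centres)
    for _ in range(n_iter):
        # 1. Assign each data point to the nearest cluster centre
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # 2. Move each centre to the mean of its assigned points
        new = np.array([X[labels == i].mean(axis=0) if np.any(labels == i)
                        else centres[i] for i in range(k)])
        if np.allclose(new, centres):   # stop: no visible change
            break
        centres = new
    return labels, centres

# Two well-separated blobs, in the spirit of the tiles example
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(5, 0.3, (20, 2))])
labels, centres = kmeans(X, 2)
```

As the slide warns, with random initialization the result can change from run to run; repeating the algorithm several times is the usual remedy.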
c-partition

$$\bigcup_{i=1}^{c} C_i = U$$
$$C_i \cap C_j = \varnothing \quad \text{for all } i \ne j$$
$$\varnothing \subset C_i \subset U \quad \text{for all } i$$
$$2 \le c \le K$$

All clusters $C_i$ together fill the whole universe U. Clusters do not overlap. A cluster $C_i$ is never empty and is smaller than the whole universe U. There must be at least 2 clusters in a c-partition, and at most as many as the number of data points K.
Objective function

$$J = \sum_{i=1}^{c} J_i = \sum_{i=1}^{c} \sum_{k,\,u_k \in C_i} \|u_k - c_i\|^2$$

Minimise the total sum of all distances.
Algorithm: fuzzy c-means (FCM)
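A minimal sketch of the standard FCM update equations (fuzzifier m = 2 and a deterministic farthest-first start are our illustrative choices):

```python
import numpy as np

def fcm(X, c, m=2.0, n_iter=50, eps=1e-9):
    """Minimal fuzzy c-means: alternate membership and centre updates."""
    centres = [X[0]]
    for _ in range(1, c):
        d = np.min([np.linalg.norm(X - v, axis=1) for v in centres], axis=0)
        centres.append(X[np.argmax(d)])
    centres = np.array(centres)
    for _ in range(n_iter):
        # Memberships m_ik in (0, 1); each row sums to 1
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2) + eps
        inv = d ** (-2.0 / (m - 1.0))
        M = inv / inv.sum(axis=1, keepdims=True)
        # Centres = means weighted by M^m
        w = M ** m
        centres = (w.T @ X) / w.sum(axis=0)[:, None]
    return M, centres

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(5, 0.3, (20, 2))])
M, centres = fcm(X, 2)   # each point belongs to both clusters to a degree
```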
Each data point belongs to two clusters to different degrees
[Figure: tiles data, log(intensity) 557 Hz vs. log(intensity) 475 Hz; o = whole tiles, * = cracked tiles, x = centres]
1. Place two cluster centres
2. Assign a fuzzy membership to each data point depending on distance
[Figure: tiles data, log(intensity) 557 Hz vs. log(intensity) 475 Hz; o = whole tiles, * = cracked tiles, x = centres]
1. Compute the new centre of each class.
2. Move the crosses (x).
[Figure: tiles data, log(intensity) 557 Hz vs. log(intensity) 475 Hz; o = whole tiles, * = cracked tiles, x = centres]
Iteration 2
[Figure: tiles data, log(intensity) 557 Hz vs. log(intensity) 475 Hz; o = whole tiles, * = cracked tiles, x = centres]
Iteration 5
[Figure: tiles data, log(intensity) 557 Hz vs. log(intensity) 475 Hz; o = whole tiles, * = cracked tiles, x = centres]
Iteration 10
[Figure: tiles data, log(intensity) 557 Hz vs. log(intensity) 475 Hz; o = whole tiles, * = cracked tiles, x = centres]
Iteration 13 (then stop, because no visible change).
Each data point belongs to the two clusters to a degree.
[Figure: tiles data, log(intensity) 557 Hz vs. log(intensity) 475 Hz; o = whole tiles, * = cracked tiles, x = centres]
The membership matrix M:
1. The last five data points (rows) belong mostly to the first cluster (column).
2. The first five data points (rows) belong mostly to the second cluster (column).
M =
0.0025 0.9975
0.0091 0.9909
0.0129 0.9871
0.0001 0.9999
0.0107 0.9893
0.9393 0.0607
0.9638 0.0362
0.9574 0.0426
0.9906 0.0094
0.9807 0.0193
Hard Classifier (HCM)
[Figure: classification map; each column of cells shows exactly one of the classes Ok / light / moderate / severe]
A cell is either one or the other class, defined by a colour.
Fuzzy Classifier (FCM)
[Figure: classification map; each column of cells may mix the classes Ok / light / moderate / severe]
A cell can belong to several classes to a degree, i.e., one column may have several colours.
Hierarchical Clustering
• Greedy
• Agglomerative vs. Divisive
Dendrograms allow us to visualize the clustering; the visualization is not unique!
Tends to be sensitive to small changes in the data.
Provides clusters of every size: where to "cut" is user-determined.
Large storage demand.
Running time: O(n^2 · |levels|) = O(n^3). Depends on: distance measure, linkage method.
Hierarchical Agglomerative Clustering
We start with every data point in a separate cluster
We keep merging the most similar pairs of data points/clusters until we have one big cluster left
This is called a bottom-up or agglomerative method
Hierarchical Clustering (cont.)
This produces a binary tree or dendrogram.
The final cluster is the root and each data item is a leaf.
The height of the bars indicates how close the items are.
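The bottom-up merging can be sketched with a naive agglomerative implementation; single linkage is our choice of linkage method for the example, and the brute-force search is fine only for tiny n:

```python
import numpy as np

def single_linkage(X, k):
    """Naive agglomerative clustering with single linkage:
    start with every point as its own cluster, repeatedly merge the
    two closest clusters, and stop when k clusters remain."""
    clusters = [[i] for i in range(len(X))]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    while len(clusters) > k:
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single linkage: distance between the closest members
                d = min(D[i, j] for i in clusters[a] for j in clusters[b])
                if d < best:
                    best, pair = d, (a, b)
        a, b = pair
        clusters[a].extend(clusters.pop(b))
    labels = np.empty(len(X), int)
    for c, members in enumerate(clusters):
        labels[members] = c
    return labels

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.2, (8, 2)), rng.normal(4, 0.2, (8, 2))])
labels = single_linkage(X, 2)
```

Running the loop to a single cluster instead of stopping at k, and recording each merge distance, is what produces the dendrogram.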
Hierarchical Clustering Demo
Hierarchical Clustering Issues Distinct clusters are not produced –
sometimes this can be good, if the data has a hierarchical structure w/o clear boundaries
There are methods for producing distinct clusters, but these usually involve specifying somewhat arbitrary cutoff values
What if data doesn’t have a hierarchical structure? Is HC appropriate?
Support Vector Clustering
Given points x in data space, define images in Hilbert space.
Require all images to be enclosed by a minimal sphere in Hilbert space.
Reflection of this sphere in data space defines cluster boundaries.
Two parameters: width of Gaussian kernel and fraction of outliers
Ben-Hur, Horn, Siegelmann & Vapnik. JMLR 2 (2001) 125-137
Variation of q allows for clustering solutions on various scales
q = 1, 20, 24, 48
Example that allows for SV clustering only in the presence of outliers. Procedure: limit β < C = 1/(pN), where p = fraction of assumed outliers in the data.
q=3.5, p=0 and q=1, p=0.3
Similarity to the scale-space approach for high values of q and p. Probability distribution obtained from R(x).
q=4.8, p=0.7
From Scale-space to Quantum Clustering
Parzen window approach: estimate the probability density by kernel functions (Gaussians) located at data points.
$$P(x) = C\,\psi(x) = C \sum_{i=1}^{N} e^{-(x-x_i)^2/(2\sigma^2)}, \qquad \sigma = 1/\sqrt{2q}$$
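The Parzen estimate above as a short NumPy sketch (unnormalized, i.e., dropping the constant C):

```python
import numpy as np

def parzen_density(x, data, sigma):
    """Unnormalized Parzen window estimate psi(x):
    one Gaussian of width sigma centred at each data point."""
    x = np.atleast_2d(x)
    sq = np.sum((x[:, None, :] - data[None, :, :]) ** 2, axis=2)
    return np.exp(-sq / (2.0 * sigma ** 2)).sum(axis=1)

data = np.array([[0.0], [0.1], [5.0], [5.1]])
p = parzen_density(np.array([[0.05], [2.5]]), data, sigma=0.5)
# density is high near the data points, low in the gap between them
```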
Quantum Clustering
View P ∝ ψ as the solution of the Schrödinger equation:

$$H\psi \equiv \left(-\frac{\sigma^2}{2}\nabla^2 + V(x)\right)\psi = E\psi$$

with the potential V(x) responsible for attraction to cluster centers and the Laplacian (kinetic) term causing the spread.

Solving for V(x):

$$V(x) = E + \frac{\sigma^2}{2}\frac{\nabla^2\psi}{\psi} = E - \frac{d}{2} + \frac{1}{2\sigma^2\psi}\sum_i \|x - x_i\|^2\, e^{-\|x-x_i\|^2/(2\sigma^2)}$$
Horn and Gottlieb, Phys. Rev. Lett. 88 (2002) 018702
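The potential can be evaluated directly from the formula, up to the additive constant E − d/2; a NumPy sketch:

```python
import numpy as np

def qc_potential(x, data, sigma):
    """Quantum-clustering potential up to the constant E - d/2:
    v(x) = (1 / (2 sigma^2 psi)) * sum_i ||x - x_i||^2 * g_i(x),
    where g_i(x) = exp(-||x - x_i||^2 / (2 sigma^2)) and psi = sum_i g_i."""
    x = np.atleast_2d(x)
    sq = np.sum((x[:, None, :] - data[None, :, :]) ** 2, axis=2)
    g = np.exp(-sq / (2.0 * sigma ** 2))
    return (sq * g).sum(axis=1) / (2.0 * sigma ** 2 * g.sum(axis=1))

data = np.array([[0.0], [0.2], [-0.2], [5.0], [5.2], [4.8]])
v = qc_potential(np.array([[0.0], [2.5]]), data, sigma=0.6)
# the potential is low at a cluster centre and high between clusters
```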
The Crabs Example (from Ripley's textbook): 4 classes, 50 samples each, d=5
A topographic map of the probability distribution for the crab data set with σ=1/2 using principal components 2 and 3. There exists only one maximum.
The Crabs Example: the QC potential exhibits four minima identified with cluster centers
A topographic map of the potential for the crab data set with σ=1/2 using principal components 2 and 3. The four minima are denoted by crossed circles. The contours are set at values V=cE for c=0.2,…,1.
The Crabs Example - Contd.
A three dimensional plot of the potential for the crab data set with σ=1/3 using principal components 2 and 3.
The Crabs Example - Contd.
A three dimensional plot of the potential for the crab data set with σ=1/2 using principal components 2 and 3.
Identifying Clusters
Local minima of the potential are identified with cluster centers.
Data points are assigned to clusters according to:
- minimal distance from centers, or
- sliding points down the slopes of the potential with gradient descent until they reach the centers.
The Iris Example: 3 classes, each containing 50 samples, d=4
A topographic map of the potential for the iris data set with σ=0.25 using principal components 1 and 2. The three minima are denoted by crossed circles. The contours are set at values V=cE for c=0.2,…,1.
The Iris Example - Gradient Descent Dynamics
The Iris Example - Using Raw Data in 4D
There are only 5 misclassifications. σ=0.21.
Example – Yeast cell cycle
Yeast cell cycle data were studied by several groups who have applied SVD (Spellman et al., Molecular Biology of the Cell, 9, Dec. 2000). We use it to test clustering of genes, whose classification into groups was investigated by Spellman et al.
The gene/sample matrix that we start from has dimensions of 798x72, using the same selection as made by Shamir, R. and Sharan, R. (2002).
We truncate it to r=4 and obtain, once again, our best results for σ=0.5, where four clusters follow from the QC algorithm.
Example – Yeast cell cycle
The five gene families as represented in two coordinates of our r=4 dimensional space.
Example – Yeast cell cycle
Cluster assignments of genes for QC with σ=0.46, as compared to the classification by Spellman into five classes, shown as alternating gray and white areas.
Yeast cell cycle in normalized 2 dimensions
Hierarchical Quantum Clustering (HQC)
Start with the raw data matrix containing the gene expression profiles of the samples.
Apply SVD and truncate to r-space by selecting the first r significant eigenvectors.
Apply QC in r dimensions, starting at a small scale and obtaining many clusters. Move data points to cluster centers and reiterate the process at higher σ. This produces a hierarchical clustering that can be represented by a dendrogram.
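The SVD truncation in the second step can be sketched in a few lines (a generic NumPy illustration, not the actual HQC/COMPACT code; the function name and toy matrix are ours):

```python
import numpy as np

def truncate_svd(X, r):
    """Project the rows of X onto the first r singular directions.

    A sketch of the SVD preprocessing step: keep only the r largest
    singular values, so each row of X gets r coordinates.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Coordinates of each row in the truncated r-dimensional space.
    return U[:, :r] * s[:r]

# Toy example standing in for a genes-by-samples matrix, reduced to r=4.
X = np.random.default_rng(0).normal(size=(100, 20))
Y = truncate_svd(X, 4)
```

Keeping the full rank reproduces the original row geometry exactly; truncation trades that for a compact, denoised representation.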
Example – Clustering of human cancer cells
The NCI60 set is a gene expression profile of ~8000 genes in 60 human cancer cells.
NCI60 includes cell lines derived from cancers of colorectal, renal, ovarian, breast, prostate, lung and central nervous system origin, as well as leukemias and melanomas.
After application of selective filters the number of gene spots is reduced to a 1,376-gene subset (Scherf et al., Nature Genetics 24, 2000).
We applied HQC with r=5 dimensions.
Example – Clustering of human cancer cells
Dendrogram of 60 cancer cell samples. The clustering was done in 5 truncated dimensions. The first 2 letters in each sample represent the tissue/cancer type.
Example - Projection onto the unit sphere
Representation of data of four classes of cancer cells on two dimensions of the truncated space. The circles denote the locations of the data points before this normalization was applied.
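The projection onto the unit sphere amounts to normalizing each point to unit Euclidean norm; a minimal sketch (assuming rows are data points in the truncated space):

```python
import numpy as np

def project_to_unit_sphere(X):
    """Normalize each data point (row) to unit Euclidean norm.

    After this step all points lie on the unit sphere of the
    truncated space, so only their directions matter.
    """
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / norms

X = np.array([[3.0, 4.0], [0.0, 2.0]])
Y = project_to_unit_sphere(X)
```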
COMPACT – a comparative package for clustering assessment
COMPACT is a GUI Matlab tool that provides an easy and intuitive way to compare several clustering methods.
COMPACT is a five-step wizard that contains basic Matlab clustering methods as well as the quantum clustering algorithm, and provides a flexible and customizable interface for clustering high-dimensional data.
COMPACT allows both textual and graphical display of the clustering results.
How to Install?
COMPACT is a self-extracting package. In order to install and run the GUI tool, follow these three easy steps:
Download the COMPACT.zip package to your local drive.
Add the COMPACT destination directory to your Matlab path.
Within Matlab, type ‘compact’ at the command prompt.
Steps – 1
Input parameters
Steps – 1
Selecting variables
Steps – 2
Determining the matrix shape and vectors to cluster
Steps – 3
Preprocessing procedures
Components' variance graphs
Preprocessing parameters
Steps – 4
Points distribution preview
and clustering method selection
Steps – 5
Parameters for clustering algorithms: K-means
Steps – 5
Parameters for clustering algorithms: FCM
Steps – 5
Parameters for clustering algorithms: NN
Steps – 5
Parameters for clustering algorithms: QC
Steps – 6
COMPACT results
Steps – 6
Results
Clustering Methods: Model-Based

Data are generated from a mixture of underlying probability distributions.
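This generative view can be illustrated by sampling from such a mixture: each observation first picks a component k with probability τk, then is drawn from that component's distribution. A univariate-normal sketch (the helper function is ours, not from the slides):

```python
import numpy as np

def sample_mixture(n, means, sigmas, weights, seed=None):
    """Draw n points from a univariate Gaussian mixture.

    Each point first selects a component k with probability
    weights[k], then is sampled from N(means[k], sigmas[k]**2).
    """
    rng = np.random.default_rng(seed)
    ks = rng.choice(len(weights), size=n, p=weights)
    return rng.normal(np.take(means, ks), np.take(sigmas, ks)), ks

# E.g. two equal-proportion components with common variance 1.
x, labels = sample_mixture(1000, means=[1.0, 4.0], sigmas=[1.0, 1.0],
                           weights=[0.5, 0.5], seed=0)
```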
Some Examples

Two univariate normal components, with equal proportions and common variance σ² = 1, shown for means μ = 1, 2, 3, 4.
Two univariate normal components, with proportions 0.75 and 0.25 and common variance σ² = 1, shown for means μ = 1, 2, 3, 4.
and some more
Probability Models

Classification likelihood:

$$L_C(\theta_1,\dots,\theta_G;\,\gamma_1,\dots,\gamma_n \mid x) = \prod_{i=1}^{n} f_{\gamma_i}(x_i \mid \theta_{\gamma_i})$$

where θk is the set of parameters of cluster k, and γi = k if xi belongs to cluster k.

Mixture likelihood:

$$L_M(\theta_1,\dots,\theta_G;\,\tau_1,\dots,\tau_G \mid x) = \prod_{i=1}^{n} \sum_{k=1}^{G} \tau_k f_k(x_i \mid \theta_k)$$

where τk ≥ 0 and $\sum_{k=1}^{G} \tau_k = 1$; τk is the probability that an observation belongs to cluster k.
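The mixture likelihood can be evaluated directly for the univariate normal case; a sketch with our own function names, following the slide's notation (τk, θk = (μk, σk²)):

```python
import math

def normal_pdf(x, mu, sigma2):
    """Univariate normal density f(x | mu, sigma^2)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def mixture_log_likelihood(xs, taus, mus, sigma2s):
    """log L_M = sum_i log sum_k tau_k f_k(x_i | theta_k)."""
    ll = 0.0
    for x in xs:
        ll += math.log(sum(t * normal_pdf(x, m, s2)
                           for t, m, s2 in zip(taus, mus, sigma2s)))
    return ll

# Two equal-weight components with common variance 1.
ll = mixture_log_likelihood([0.0, 3.0], [0.5, 0.5], [0.0, 3.0], [1.0, 1.0])
```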
Probability Models (Cont.)

The most widely used model is the multivariate normal distribution, where Θk has a mean vector μk and a covariance matrix Σk:

$$f_k(x_i \mid \mu_k, \Sigma_k) = \frac{\exp\!\left(-\tfrac{1}{2}(x_i-\mu_k)^T \Sigma_k^{-1} (x_i-\mu_k)\right)}{(2\pi)^{p/2}\,|\Sigma_k|^{1/2}}$$

How is the covariance matrix Σk calculated?
Calculating the covariance matrix Σk

The idea: parameterize the covariance matrix as

$$\Sigma_k = \lambda_k D_k A_k D_k^T$$

Dk – orthogonal matrix of eigenvectors; determines the orientation of the principal components of Σk
Ak – diagonal matrix whose elements are proportional to the eigenvalues of Σk; determines the shape of the density contours
λk – scalar; determines the volume of the corresponding ellipsoid
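One way to recover λk, Dk and Ak from a given Σk (a sketch under the common convention det(Ak) = 1, so that λk alone carries the volume; names are ours):

```python
import numpy as np

def volume_shape_orientation(Sigma):
    """Decompose a covariance matrix as Sigma = lam * D @ A @ D.T.

    D holds the eigenvectors (orientation), A is diagonal with
    det(A) = 1 (shape), and the scalar lam carries the volume.
    """
    eigvals, D = np.linalg.eigh(Sigma)
    lam = np.prod(eigvals) ** (1.0 / len(eigvals))  # geometric mean of eigenvalues
    A = np.diag(eigvals / lam)                      # normalized so det(A) == 1
    return lam, D, A

# Example: an axis-aligned ellipsoid stretched 4:1.
Sigma = np.array([[4.0, 0.0], [0.0, 1.0]])
lam, D, A = volume_shape_orientation(Sigma)
```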
Σk Definition Determines the Model

Σk = λI: spherical, equal volumes (SOS criterion)
Σk = λDADᵀ: all ellipsoids are equal
How is Θk computed? EM algorithm

The complete-data log-likelihood (*):

$$l(\theta_k, \tau_k, z_{ik} \mid x) = \sum_{i=1}^{n} \sum_{k=1}^{G} z_{ik} \log\left[\tau_k f_k(x_i \mid \theta_k)\right]$$

where

$$z_{ik} = \begin{cases} 1 & \text{if } x_i \text{ belongs to group } k \\ 0 & \text{otherwise} \end{cases}$$

The density of an observation xi given zi is $\prod_{k=1}^{G} f_k(x_i \mid \theta_k)^{z_{ik}}$.

$\hat{z}_{ik} = E[z_{ik} \mid x_i, \theta_1, \dots, \theta_G]$ is the conditional expectation of zik given xi and Θ1,…,ΘG.
E step: calculate

$$\hat{z}_{ik} = \frac{\hat{\tau}_k\, f_k(x_i \mid \hat{\mu}_k, \hat{\Sigma}_k)}{\sum_{j=1}^{G} \hat{\tau}_j\, f_j(x_i \mid \hat{\mu}_j, \hat{\Sigma}_j)}$$

M step: given the ẑik, maximize (*):

$$\hat{\mu}_k = \frac{\sum_{i=1}^{n} \hat{z}_{ik}\, x_i}{n_k}, \qquad \hat{\tau}_k = \frac{n_k}{n}, \qquad n_k = \sum_{i=1}^{n} \hat{z}_{ik}$$

$\hat{\Sigma}_k$ depends on the model.
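The E and M steps above can be sketched for a univariate normal mixture (a minimal illustration with our own quantile-based initialization heuristic, not a production implementation):

```python
import numpy as np

def em_gmm_1d(x, G, iters=50):
    """Minimal EM for a univariate G-component normal mixture.

    z[i, k] is the responsibility tau_k f_k(x_i) / sum_j tau_j f_j(x_i);
    the M step re-estimates tau, mu and the variances from z.
    """
    n = len(x)
    # Initialization heuristic (ours): spread means over data quantiles.
    mu = np.quantile(x, (np.arange(G) + 0.5) / G)
    var = np.full(G, x.var())
    tau = np.full(G, 1.0 / G)
    for _ in range(iters):
        # E step: responsibilities, shape (n, G).
        dens = tau * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) \
               / np.sqrt(2 * np.pi * var)
        z = dens / dens.sum(axis=1, keepdims=True)
        # M step: maximize the complete-data log-likelihood given z.
        nk = z.sum(axis=0)
        tau = nk / n
        mu = (z * x[:, None]).sum(axis=0) / nk
        var = (z * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return tau, mu, var

# Two well-separated clusters are recovered easily.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(10.0, 1.0, 200)])
tau, mu, var = em_gmm_1d(x, 2)
```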
Limitations of the EM Algorithm

Low rate of convergence: you should start with good starting points and hope for well-separated clusters…
Not practical for a large number of clusters (i.e., of probabilities to estimate)
"Crashes" when a covariance matrix becomes singular
Problems when there are few observations in a cluster
EM must not be asked for more clusters than exist in nature…
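A common workaround for the singular-covariance crash (a standard regularization trick, not part of the algorithm description above) is to add a small ridge to each covariance estimate so it stays invertible:

```python
import numpy as np

def regularize_cov(Sigma, eps=1e-6):
    """Add a small ridge eps*I to keep the covariance invertible.

    Guards against the failure mode where a cluster collapses onto
    few (or identical) observations and its covariance turns singular.
    """
    return Sigma + eps * np.eye(Sigma.shape[0])

Sigma = np.zeros((2, 2))            # degenerate: all points identical
R = regularize_cov(Sigma)
Rinv = np.linalg.inv(R)             # now invertible
```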