Clustering Methods: Part 2d
Pasi Fränti
31.3.2014
Speech & Image Processing Unit, School of Computing
University of Eastern Finland, Joensuu, FINLAND
Swap-based algorithms
Part I:
Random Swap algorithm
P. Fränti and J. Kivijärvi, "Randomised local search algorithm for the clustering problem", Pattern Analysis and Applications, 3 (4), 358-369, 2000.
Pseudo code of Random Swap
RandomSwap(X) → C, P
  C ← SelectRandomRepresentatives(X);
  P ← OptimalPartition(X, C);
  REPEAT T times
    (Cnew, j) ← RandomSwap(X, C);
    Pnew ← LocalRepartition(X, Cnew, P, j);
    (Cnew, Pnew) ← Kmeans(X, Cnew, Pnew);
    IF f(Cnew, Pnew) < f(C, P) THEN
      (C, P) ← (Cnew, Pnew);
  RETURN (C, P);
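The pseudocode above translates almost line for line into Python. Below is a minimal sketch (not the authors' reference implementation; for brevity, LocalRepartition is done as a full repartition, and helper names are illustrative):

```python
import numpy as np

def random_swap(X, M, T=50, kmeans_iters=2, seed=0):
    """Random Swap: trial swap + repartition + a few K-means iterations;
    keep the new solution only if it lowers the MSE."""
    rng = np.random.default_rng(seed)
    N = len(X)

    def partition(C):                      # nearest-centroid assignment
        return np.argmin(((X[:, None] - C) ** 2).sum(-1), axis=1)

    def mse(C, P):
        return ((X - C[P]) ** 2).sum() / X.size

    def kmeans(C, P, iters):               # fine-tuning by K-means
        C = C.copy()
        for _ in range(iters):
            for j in range(M):
                if np.any(P == j):
                    C[j] = X[P == j].mean(axis=0)
            P = partition(C)
        return C, P

    C = X[rng.choice(N, M, replace=False)].copy()   # SelectRandomRepresentatives
    P = partition(C)                                # OptimalPartition
    f = mse(C, P)
    for _ in range(T):
        Cnew = C.copy()
        j = rng.integers(M)
        Cnew[j] = X[rng.integers(N)]                # RandomSwap: cj <- xi
        Pnew = partition(Cnew)                      # (full) repartition
        Cnew, Pnew = kmeans(Cnew, Pnew, kmeans_iters)
        fnew = mse(Cnew, Pnew)
        if fnew < f:                                # accept only improvements
            C, P, f = Cnew, Pnew, fnew
    return C, P
```

With well-separated data, a few dozen iterations typically suffice to escape a bad initialization.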
Demonstration of the algorithm
Two centroids, but only one cluster.
One centroid, but two clusters.
Centroid swap
Swap is made from centroid-rich area to centroid-poor area.
Local repartition
Fine-tuning by K-means: iterations 1, 2, 3, …, 16, 17, 18, 19; final result after 25 iterations.
Implementation of the swap

1. Random swap:
   c_j ← x_i,  j = random(1, M),  i = random(1, N)

2. Re-partition vectors from old cluster:
   p_i ← arg min_{1≤k≤M} d(x_i, c_k)²,  ∀i: p_i = j

3. Create new cluster:
   p_i ← arg min_{k ∈ {p_i, j}} d(x_i, c_k)²,  ∀i: 1 ≤ i ≤ N
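The three steps can be written with NumPy as follows (a sketch with illustrative names; step 2 handles the orphans of the removed centroid, step 3 lets every vector test the new centroid):

```python
import numpy as np

def swap_and_repartition(X, C, P, j, i):
    """Move centroid j onto data vector i, then repartition locally:
    orphaned vectors of old cluster j search all M centroids, and
    every vector checks whether the new centroid j is now closer."""
    C = C.copy(); P = P.copy()
    C[j] = X[i]                                   # 1. random swap: cj <- xi
    d2 = ((X[:, None] - C) ** 2).sum(-1)          # d(xi, ck)^2 for all i, k
    orphans = (P == j)                            # vectors of the old cluster j
    P[orphans] = np.argmin(d2[orphans], axis=1)   # 2. re-partition old cluster
    rows = np.arange(len(X))
    P[d2[rows, j] < d2[rows, P]] = j              # 3. create new cluster
    return C, P
```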
Random swap as local search
Study neighbor solutions
Select one and move
Random swap as local search
Fine-tune solution by hill-climbing technique!
Role of K-means
Consider only local optima!
Role of K-means
Effective search space
Role of swap: reduce search space
Chain reaction by K-means after swap
Independency of initialization: results for T = 5000 iterations (Bridge, MSE):
  K-means: 176.53; Random + RS: 163.93; K-means + RS: 163.63; Split + RS: 163.51; Ward + RS: 163.08.
Whether started from the worst or the best initial solution, Random Swap converges to practically the same result.
Part II:
Efficiency of Random Swap
Probability of good swap
• Select a proper centroid for removal:
  – There are M clusters in total: p_removal = 1/M.
• Select a proper new location:
  – There are N choices: p_add = 1/N
  – Only M are significantly different: p_add = 1/M
• In total:
  – M² significantly different swaps.
  – Probability of each different swap is p_swap = 1/M²
  – Open question: how many of these are good?
Number of neighbors
Open question: what is the size of the neighborhood (α)?
(Figure: Voronoi neighbors vs. neighbors by distance.)
Observed number of neighbors (data set S2): histogram over 1–9 neighbors; average number of neighbors = 3.9.
• Probability of not finding good swap in T iterations:
  q = (1 − α²/M²)^T
• Estimated number of iterations for a given failure probability q:
  T = log q / log(1 − α²/M²)
Estimated number of iterations depending on T

            Observed q-values               Estimated iterations (T)
            S1      S2      S3      S4      S1    S2    S3    S4
q = 10%     19%     14%     22%     22%     53    47    39    37
q = 1%      3.1%    1.2%    1.0%    3.6%    106   93    78    74
q = 0.1%    0.1%    0.1%    0.2%    1.1%    159   140   117   111
Expected    72      56      55      48      23    21    17    16

Observed = number of iterations needed in practice.
Estimated = estimate of the number of iterations needed for given q.
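The estimate T(q) = log q / log(1 − α²/M²) and its closed-form bound are easy to evaluate; a small sketch (α and M are inputs; the worst case α = 2 is used in the example):

```python
import math

def expected_iterations(q, M, alpha=2.0):
    """Iterations T needed so that the failure probability
    (1 - alpha^2/M^2)^T drops below q."""
    p = (alpha / M) ** 2                 # probability of a good swap
    return math.ceil(math.log(q) / math.log(1 - p))

def upper_bound(q, M, alpha=2.0):
    """Closed-form bound (M/alpha)^2 * ln(1/q), from ln(1-x) <= -x."""
    return (M / alpha) ** 2 * math.log(1 / q)

# e.g. M = 15 clusters (as in data sets S1-S4), failure probability 1%:
print(expected_iterations(0.01, 15))     # 257 iterations in the worst case
print(round(upper_bound(0.01, 15), 1))   # bound: 259.0
```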
Probability of success (p) depending on T: p rises from 0 toward 1 over 0–300 iterations (figure).
Probability of failure (q) depending on T: q decays log-linearly from 1 down to 10⁻⁹ over 0–300 iterations (figure).
Observed probabilities depending on dimensionality: observed q for the target levels q = 10%, 1% and 0.1%, across dimensionality 16–1024 (figure).
Bounds for the number of iterations

Upper limit:
  T = ln q / ln(1 − α²/M²) ≤ ln q / (−α²/M²) = (M²/α²) · ln(1/q)

Lower limit similarly; resulting in:
  T = Θ((M²/α²) · ln(1/q))
Multiple swaps (w)

Probability for performing less than w swaps:
  q = Σ_{i=0}^{w−1} C(T, i) · (α²/M²)^i · (1 − α²/M²)^{T−i}

Expected number of iterations:
  T̂ = Σ_{i=1}^{w} M²/α² = w · (M²/α²)
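The binomial tail above can be evaluated directly (a sketch; each iteration is modelled as an independent success with probability α²/M²):

```python
import math

def failure_prob(T, w, M, alpha=2.0):
    """Probability that fewer than w good swaps occur within T iterations."""
    p = (alpha / M) ** 2
    return sum(math.comb(T, i) * p**i * (1 - p)**(T - i) for i in range(w))
```

With w = 1 this reduces to the single-swap failure probability (1 − α²/M²)^T.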
K-means clustering result (3 swaps needed)
Final clustering result
Number of swaps needed: example from image quantization
Efficiency of the random swap

Total time to find correct clustering:
  Time per iteration × Number of iterations

Time complexity of a single step:
  – Swap: O(1)
  – Remove cluster: 2MN/M = O(N)
  – Add cluster: 2N = O(N)
  – Centroids: 2(2N/M) + 2 + 2 = O(N/M)
  – (Fast) K-means iteration: 4N = O(N)*
*See Fast K-means for analysis.
Time complexity and the observed number of steps

Observed number of steps at iteration:
Step                 Time complexity   50        100       500
Centroid swap        2                 2         2         2
Cluster removal      2N                7,526     8,448     10,137
Cluster addition     2N                8,192     8,192     8,192
Update centroids     4N/M + 2 + 1      53        61        60
K-means iterations   4N                300,901   285,555   197,327
Total                O(N)              316,674   302,258   215,718
Time spent by K-means iterations (Bridge): processing time per iteration, split into local repartition, 1st and 2nd K-means iteration, over 500 iterations (figure).
Effect of K-means iterations (Bridge): time vs. MSE curves for 1–5 K-means iterations per swap (figure).
The version with one iteration seems to be weakest all the time; versions with other amounts of iterations are pretty even.
Total time complexity

Number of iterations needed (T):
  T(q) = (M²/α²) · ln(1/q)

Time complexity of a single step (t):
  t = O(αN)

Total time:
  T · t = O(αN · (M²/α²) · ln(1/q)) = O((NM²/α) · ln(1/q))
Time complexity: conclusions

1. Logarithmic dependency on q
2. Linear dependency on N
3. Quadratic dependency on M (with a large number of clusters, can be too slow)
4. Inverse dependency on α (worst case α = 2): the higher the dimensionality and the cluster overlap, the faster the method.

  T · t = O((NM²/α) · ln(1/q))
Time-distortion performance (Bridge): Random Swap reaches lower MSE than Repeated k-means; MSE 160–190 over 0.1–1000 s (figure).
Time-distortion performance (Missa1): Random Swap vs. Repeated k-means; MSE 5.00–6.50 over 1–1000 s (figure).
Time-distortion performance (Birch1): Random Swap vs. Repeated k-means; MSE 400–600 millions over 1–10000 s (figure).
Time-distortion performance (Birch2): Random Swap vs. Repeated k-means; MSE 0–10 millions over 1–1000 s (figure).
Time-distortion performance (Europe): Random Swap vs. Repeated k-means; MSE 2–16 millions over 1–1000 s (figure).
Time-distortion performance (KDD-Cup04 Bio): Random Swap vs. Repeated k-means; MSE 7.42–7.60 millions over 100–100000 s (figure).
References

Random swap algorithm:
• P. Fränti and J. Kivijärvi, "Randomised local search algorithm for the clustering problem", Pattern Analysis and Applications, 3 (4), 358-369, 2000.
• P. Fränti, J. Kivijärvi and O. Nevalainen, "Tabu search algorithm for codebook generation in VQ", Pattern Recognition, 31 (8), 1139-1148, August 1998.

Pseudo code:
• http://cs.joensuu.fi/sipu/soft/

Efficiency of Random swap algorithm:
• P. Fränti, O. Virmajoki and V. Hautamäki, "Efficiency of random swap based clustering", IAPR Int. Conf. on Pattern Recognition (ICPR'08), Tampa, FL, Dec 2008.
Part III:
Example when 4 swaps needed
MSE = 4.2 × 10⁹   MSE = 3.4 × 10⁹
1st swap
MSE = 3.1 × 10⁹   MSE = 3.0 × 10⁹
2nd swap
MSE = 2.3 × 10⁹   MSE = 2.1 × 10⁹
3rd swap
MSE = 1.9 × 10⁹   MSE = 1.7 × 10⁹
4th swap
MSE = 1.3 × 10⁹
Final result
Part IV:
Deterministic Swap
(Figure: data set with clusters numbered 1–15.)
Two centroids, but only one cluster.
One centroid, but two clusters.
Deterministic swap
Costs for the swap:

Cluster   Removal   Addition
1         0.80      0.39
2         1.04      0.64
3         5.48      1.09
4         5.66      0.92
5         6.50      0.76
6         7.67      1.01
7         8.47      0.45
8         9.10      0.75
9         9.90      1.42
10        11.09     1.26
11        11.47     0.61
12        12.17     4.70
13        14.61     0.94
14        16.41     0.93
15        16.68     1.41
From where to where?
• Merge two existing clusters [Frigui 1997, Kaukoranta 1998] following the spirit of agglomerative clustering.
• Local optimization: remove the prototype that increases the cost function value least [Fritzke 1997, Likas 2003, Fränti 2006].
• Smart swap: find two nearest prototypes, and remove one of them randomly [Chen, 2010].
• Pairwise swap: locate a pair of inconsistent prototypes in two solutions [Zhao, 2012].
Cluster removal:
1. Select an existing cluster
   – Depending on strategy: 1..M choices.
   – Each choice takes O(N) time to test.
Cluster addition:
2. Select a location within this cluster
   – Add new prototype
   – Consider only existing points
Select the cluster
• Cluster with the biggest MSE
  – Intuitive heuristic [Fritzke 1997, Chen 2010]
  – Computationally demanding:
• Local optimization
  – Try all clusters for the addition [Likas et al, 2003]
  – Computationally demanding: O(NM)–O(N²)
Select the location
1. Current prototype + ε [Fritzke 1997]
2. Furthest vector [Fränti et al 1997]
3. Any other split heuristic [Fränti et al, 1997]
4. Random location
5. Every possible location [Likas et al, 2003]
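Heuristics 1 and 2 above can be sketched as follows (an illustrative helper, not the published implementations; the cluster to split is taken to be the one with the largest within-cluster squared error):

```python
import numpy as np

def deterministic_add_location(X, C, P, heuristic="furthest", eps=1e-3, seed=0):
    """Choose the cluster with the biggest SSE and place the new prototype
    at its furthest vector, or at the current prototype + a small epsilon."""
    M = len(C)
    sse = np.array([((X[P == j] - C[j]) ** 2).sum() for j in range(M)])
    j = int(sse.argmax())                         # cluster with biggest MSE
    members = X[P == j]
    if heuristic == "furthest":                   # 2. furthest vector
        d2 = ((members - C[j]) ** 2).sum(-1)
        return members[d2.argmax()]
    rng = np.random.default_rng(seed)             # 1. current prototype + eps
    return C[j] + eps * rng.standard_normal(C.shape[1])
```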
Complexity of swaps

(Figure: prototype removed; new prototype added at the furthest point of the selected cluster.)

Smart swap:
• Initialization: O(MN)
• Swap iteration:
  – Finding nearest pair: O(M²)
  – Calculating distortion: O(N)
  – Sorting clusters: O(M·log M)
  – Evaluation of result: O(N)
  – Repartition and fine-tuning: O(N)
  – Total: O(MN + M² + I·N)
• Number of iterations expected: < 2·M
• Estimated total time: O(2M²N)

(Figure: nearest prototypes; cluster with largest distortion.)
Smart swap pseudo code

SmartSwap(X, M) → C, P
  C ← InitializeCentroids(X);
  P ← PartitionDataset(X, C);
  MaxOrder ← log2 M;
  order ← 1;
  WHILE order < MaxOrder
    (ci, cj) ← FindNearestPair(C);
    S ← SortClustersByDistortion(P, C);
    cswap ← RandomSelect(ci, cj);
    clocation ← S[order];
    Cnew ← Swap(cswap, clocation);
    Pnew ← LocalRepartition(P, Cnew);
    KmeansIteration(Pnew, Cnew);
    IF f(Cnew) < f(C) THEN
      order ← 1; C ← Cnew;
    ELSE
      order ← order + 1;
  KmeansIteration(P, C);
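Two helpers of the Smart swap pseudocode, FindNearestPair and SortClustersByDistortion, can be sketched directly (illustrative NumPy versions; the pair search is the plain O(M²) scan):

```python
import numpy as np

def find_nearest_pair(C):
    """O(M^2) search for the two closest prototypes (swap candidates)."""
    d2 = ((C[:, None] - C) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)          # ignore self-distances
    i, j = np.unravel_index(d2.argmin(), d2.shape)
    return int(i), int(j)

def sort_clusters_by_distortion(X, C, P):
    """Cluster indices ordered by decreasing within-cluster squared error."""
    sse = np.array([((X[P == j] - C[j]) ** 2).sum() for j in range(len(C))])
    return np.argsort(-sse)
```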
Pairwise swap
• Unpaired prototypes
• Nearest neighbors of each other
• Nearest neighbor of the other set further than in the same set
→ Subject to swap
Combinations of random and deterministic swap

Variant   Removal                        Addition
RR        Random                         Random
RD        Random                         Deterministic
DR        Deterministic                  Random
DD        Deterministic                  Deterministic
D2R       Deterministic + data update    Random
D2D       Deterministic + data update    Deterministic
Summary of the time complexities

              RR      RD      DR      DD      D2R     D2D
Removal       O(1)    O(1)    O(MN)   O(MN)   O(αN)   O(αN)
Addition      O(1)    O(N)    O(1)    O(N)    O(1)    O(N)
Repartition   O(N)    O(N)    O(N)    O(N)    O(N)    O(N)
K-means       O(αN)   O(αN)   O(αN)   O(αN)   O(αN)   O(αN)
Total         O(αN)   O(αN)   O(MN)   O(MN)   O(αN)   O(αN)

(RR, RD: random removal; DR, DD, D2R, D2D: deterministic removal.)
Profiles of the processing time (Bridge and Birch2): time per iteration for RR, RD, DR, DD, D2R and D2D, split into swap, repartition, K-means and others (bar charts).
Test data sets

Data set        Type of data set           Vectors (N)   Clusters (M)   Dimension (d)
Bridge          Gray-scale image           4086          256            16
House*          RGB image                  34112         256            3
Miss America    Residual vectors           6480          256            16
Europe          Differential coordinates   169673                       2
BIRCH1–BIRCH3   Synthetically generated    100000        100            2
S1–S4           Synthetically generated    5000          15             2
Dim32–1024      Synthetically generated    1000          256            32–1024
Data set S1 Data set S2 Data set S3 Data set S4
Birch data sets
Birch1 Birch2 Birch3
Experiments: Bridge. Time vs. MSE (160–185 over 1–100 s) for Random Swap and the RR, RD, DR, DD variants (figure).
Experiments: Bridge. Time vs. MSE (165–190 over 0.1–100 s) for Random Swap, Repeated k-means, DR and D2R (figure).
Experiments: Birch2. Time vs. MSE (2–5 × 10⁶ over 10–100 s) for Random Swap and the RR, RD, DR, DD variants (figure).
Experiments: Miss America. Time vs. MSE (5.3–6.5 over 1–100 s) for Random Swap, Repeated k-means, DR and D2R (figure).
Quality comparisons (MSE) with 10 second time constraint

Method             Birch2   Birch1   Europe   Miss      House   Bridge
                   (×10⁶)   (×10⁸)   (×10⁷)   America
RD-variant         2.78     5.11     1.02     5.58      6.10    171.20
Random Swap        4.43     5.70     1.26     5.85      6.41    174.08
Repeated K-means   4.10     5.49     1.52     5.92      6.58    177.66
Repeated Random    22.35    13.10    2.37     8.34      12.12   251.32

Average speed-up from RR to RD: 18:1, 4:1, 6:1, 5:1, 4:1, 12:1
Literature

1. P. Fränti and J. Kivijärvi, "Randomised local search algorithm for the clustering problem", Pattern Analysis and Applications, 3 (4), 358-369, 2000.
2. P. Fränti, J. Kivijärvi and O. Nevalainen, "Tabu search algorithm for codebook generation in VQ", Pattern Recognition, 31 (8), 1139-1148, August 1998.
3. P. Fränti, O. Virmajoki and V. Hautamäki, "Efficiency of random swap based clustering", IAPR Int. Conf. on Pattern Recognition (ICPR'08), Tampa, FL, Dec 2008.
4. P. Fränti, M. Tuononen and O. Virmajoki, "Deterministic and randomized local search algorithms for clustering", IEEE Int. Conf. on Multimedia and Expo (ICME'08), Hannover, Germany, 837-840, June 2008.
5. P. Fränti and O. Virmajoki, "On the efficiency of swap-based clustering", Int. Conf. on Adaptive and Natural Computing Algorithms (ICANNGA'09), Kuopio, Finland, LNCS 5495, 303-312, April 2009.
6. J. Chen, Q. Zhao and P. Fränti, "Smart swap for more efficient clustering", Int. Conf. Green Circuits and Systems (ICGCS'10), Shanghai, China, 446-450, June 2010.
7. B. Fritzke, "The LBG-U method for vector quantization - an improvement over LBG inspired from neural networks", Neural Processing Letters, 5 (1), 35-45, 1997.
8. P. Fränti and O. Virmajoki, "Iterative shrinking method for clustering problems", Pattern Recognition, 39 (5), 761-765, May 2006.
9. T. Kaukoranta, P. Fränti and O. Nevalainen, "Iterative split-and-merge algorithm for VQ codebook generation", Optical Engineering, 37 (10), 2726-2732, October 1998.
10. H. Frigui and R. Krishnapuram, "Clustering by competitive agglomeration", Pattern Recognition, 30 (7), 1109-1119, July 1997.
11. A. Likas, N. Vlassis and J.J. Verbeek, "The global k-means clustering algorithm", Pattern Recognition, 36, 451-461, 2003.
12. PAM (Kaufman and Rousseeuw, 1987)
13. CLARA (Kaufman and Rousseeuw, 1990)
14. CLARANS (A Clustering Algorithm based on Randomized Search) (Ng and Han, 1994)
15. R.T. Ng and J. Han, "CLARANS: A method for clustering objects for spatial data mining", IEEE Transactions on Knowledge and Data Engineering, 14 (5), September/October 2002.