Random Swap algorithm - Itä-Suomen yliopisto (University of Eastern Finland), cs.uef.fi/pages/franti/cluster/randomswap.pdf · 2018
TRANSCRIPT
Random Swap algorithm
Pasi Fränti, 24.4.2018
Definitions and data
Set of N data points: X = {x1, x2, …, xN}
Set of k cluster prototypes (centroids): C = {c1, c2, …, ck}
Partition of the data: P = {p1, p2, …, pN}

Clustering problem

Objective function: MSE = (1/N) · Σ_{i=1..N} d(xi, c_{pi})²
Optimality of partition: pi = arg min_{1≤j≤k} d(xi, cj)², ∀ i ∈ [1, N]
Optimality of centroid: cj = Σ_{pi=j} xi / Σ_{pi=j} 1, ∀ j ∈ [1, k]
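These definitions map directly to code. A minimal NumPy sketch (the helper names are mine, not from the slides; every cluster is assumed non-empty, and MSE is the mean squared distance to the assigned centroid, as defined above):

```python
import numpy as np

def mse(X, C, P):
    # objective: MSE = (1/N) * sum_i d(x_i, c_{p_i})^2
    return np.mean(np.sum((X - C[P]) ** 2, axis=1))

def optimal_partition(X, C):
    # p_i = argmin_j d(x_i, c_j)^2 : nearest centroid for each point
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def optimal_centroids(X, P, k):
    # c_j = mean of the points assigned to cluster j
    return np.array([X[P == j].mean(axis=0) for j in range(k)])
```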
K-means algorithm
X = data set, C = cluster centroids, P = partition

K-Means(X, C) → (C, P)
REPEAT
  Cprev ← C;
  FOR i = 1 TO N DO                // optimal partition
    pi ← FindNearest(xi, C);
  FOR j = 1 TO k DO                // optimal centroids
    cj ← average of all xi with pi = j;
UNTIL C = Cprev;
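The pseudocode above can be turned into a short runnable sketch (a plain NumPy implementation written for illustration; an empty cluster simply keeps its previous centroid):

```python
import numpy as np

def kmeans(X, C):
    """Alternate optimal partition and optimal centroids
    until the centroids no longer change."""
    C = C.astype(float).copy()
    while True:
        C_prev = C.copy()
        # optimal partition: assign each point to its nearest centroid
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        P = d2.argmin(axis=1)
        # optimal centroids: mean of each cluster
        for j in range(len(C)):
            members = X[P == j]
            if len(members):
                C[j] = members.mean(axis=0)
        if np.allclose(C, C_prev):
            return C, P
```

Starting from a poor initialization this loop converges to a local minimum only; that limitation is exactly what the swap strategy addresses.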
Problems of k-means
Swapping strategy
Pigeon hole principle

CI = Centroid index:

P. Fränti, M. Rezaei and Q. Zhao, "Centroid index: cluster level similarity measure", Pattern Recognition, 47 (9), 3034-3045, September 2014.

Aim of the swap
Two centroids, but only one cluster. One centroid, but two clusters.
[Illustration on data set S2.]
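The centroid index compares two solutions at the cluster level: map every centroid of one solution to its nearest counterpart in the other and count the "orphan" centroids that receive no mapping; CI is the maximum over both directions. A small sketch (helper names are mine; Euclidean distance assumed):

```python
import numpy as np

def orphans(C1, C2):
    # map every centroid of C1 to its nearest centroid in C2,
    # then count the centroids of C2 that received no mapping
    d2 = ((C1[:, None, :] - C2[None, :, :]) ** 2).sum(axis=2)
    mapped = set(d2.argmin(axis=1).tolist())
    return len(C2) - len(mapped)

def centroid_index(C1, C2):
    # CI = 0 means the two solutions share the same cluster structure
    return max(orphans(C1, C2), orphans(C2, C1))
```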
Random Swap algorithm

RandomSwap(X) → (C, P)
  C ← SelectRandomRepresentatives(X);
  P ← OptimalPartition(X, C);
  REPEAT T times
    (Cnew, j) ← RandomSwap(X, C);
    Pnew ← LocalRepartition(X, Cnew, P, j);
    (Cnew, Pnew) ← KMeans(X, Cnew, Pnew);
    IF f(Cnew, Pnew) < f(C, P) THEN
      (C, P) ← (Cnew, Pnew);
  RETURN (C, P);
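A compact, self-contained Python version of the pseudocode (my own sketch: it uses a full repartition instead of the O(N/k) local repartition, and a couple of k-means iterations per trial swap):

```python
import random
import numpy as np

def partition(X, C):
    # nearest-centroid assignment for every point
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)

def update_centroids(X, P, C):
    C = C.copy()
    for j in range(len(C)):
        members = X[P == j]
        if len(members):
            C[j] = members.mean(axis=0)
    return C

def mse(X, C, P):
    return np.mean(np.sum((X - C[P]) ** 2, axis=1))

def random_swap(X, k, T=200, kmeans_iters=2, seed=0):
    rng = random.Random(seed)
    C = X[rng.sample(range(len(X)), k)].astype(float)
    P = partition(X, C)
    best = mse(X, C, P)
    for _ in range(T):
        Cn = C.copy()
        # random swap: replace a random centroid by a random data point
        Cn[rng.randrange(k)] = X[rng.randrange(len(X))]
        Pn = partition(X, Cn)
        for _ in range(kmeans_iters):   # brief k-means refinement
            Cn = update_centroids(X, Pn, Cn)
            Pn = partition(X, Cn)
        # accept the swap only if the objective improves
        f = mse(X, Cn, Pn)
        if f < best:
            C, P, best = Cn, Pn, f
    return C, P
```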
Steps of the swap

1. Random swap: cj ← xi, j = random(1, k), i = random(1, N)
2. Re-allocate vectors from old cluster: pi ← arg min_{1≤j'≤k} d(xi, cj')², ∀ i : pi = j
3. Create new cluster: pi ← arg min_{j'∈{pi, j}} d(xi, cj')², ∀ i ∈ [1, N]

Swap is made from a centroid-rich area to a centroid-poor area.
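The two re-allocation steps can be written down directly (a sketch; C is assumed to already contain the newly swapped centroid at index j):

```python
import numpy as np

def local_repartition(X, C, P, j):
    P = P.copy()
    # step 2: re-allocate the vectors of the removed cluster j
    members = np.where(P == j)[0]
    if len(members):
        d2 = ((X[members, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        P[members] = d2.argmin(axis=1)
    # step 3: create the new cluster - any vector closer to the
    # new centroid c_j than to its current centroid defects to j
    d_new = np.sum((X - C[j]) ** 2, axis=1)
    d_cur = np.sum((X - C[P]) ** 2, axis=1)
    P[d_new < d_cur] = j
    return P
```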
Swap
Local re-partition: re-allocate vectors, create new cluster
Iterate by k-means: 1st, 2nd, 3rd, …, 19th iteration
Final result: 25 iterations
Extreme example (Bridge)
[Bar chart, final MSE: K-means 176.53; Random + RS 163.93; K-means + RS 163.63; Split + RS 163.51; Ward + RS 163.08.]
Dependency on initial solution
Data sets
Data sets visualized: Images

Data set       Dim    Source
Bridge         16-d   4x4 blocks (gray image)
House          3-d    RGB color
Miss America   16-d   4x4 blocks, frame differential
Europe         2-d    differential coordinates
Data sets visualized: Artificial
S1, S2, S3, S4; Unbalance; DIM32; G2-2-30, G2-2-50, G2-2-70; A1, A2, A3
Time complexity

Efficiency of the random swap
Total time to find correct clustering = time per iteration × number of iterations

Time complexity of a single iteration:
• Swap: O(1)
• Remove cluster: 2kN/k = O(N)
• Add cluster: 2N = O(N)
• Centroids: 2N/k + 2N/k + 2 = O(N/k)
• K-means: IkN = O(IkN)  (bottleneck!)
Efficiency of the random swap (with fast k-means)

Same breakdown as above, except the k-means step:
• (Fast) K-means: 4N = O(N), 2 iterations only!

T. Kaukoranta, P. Fränti and O. Nevalainen, "A fast exact GLA based on code vector activity detection", IEEE Trans. on Image Processing, 9 (8), 1337-1342, August 2000.
Estimated and observed steps

Processing time profile (Bridge)
N = 4096, k = 256, N/k = 16, α = 8
[Chart: time (ms) per component; 1st k-means iteration ≈ 53 %, 2nd ≈ 39 %, local repartition the remainder.]
Processing time profile (Birch1)
N = 100,000, k = 100, α = 5.5
[Chart: time (ms) per component until CI=0 is reached; 1st k-means iteration ≈ 49-50 %, 2nd ≈ 44-48 %, local repartition the remainder.]
Effect of K-means iterations
[Plot (Bridge): MSE (160-190) vs. time (0.01-100 s, log scale), one curve per number of k-means iterations, 1-5.]
[Plot (Birch2): MSE (0-12) vs. time (0.1-100 s, log scale), curves for 1-5 iterations.]
How many swaps?

Three types of swaps:
• Trial swap
• Accepted swap: MSE improves
• Successful swap: CI improves

Example: before swap MSE = 20.12 (CI=2); accepted swap MSE = 20.09 (CI=2); successful swap MSE = 15.87 (CI=1).

Accepted and successful swaps
[Figure: examples with CI=4 and CI=9.]
Number of swaps needed: example with 35 clusters (A3)
K-means clustering result (3 swaps needed)
[Figure: centroids marked for removal and locations where centroids are added.]
Final clustering result
Number of swaps needed: example from image quantization
[Histogram of swaps needed (0-100): Max = 322, Average = 35, Median = 26, 90 % of cases < 70.]
Statistical observations
S1: N = 5000, k = 15, d = 2, N/k = 333, α = 4.1
S4: N = 5000, k = 15, d = 2, N/k = 333, α = 4.8
[Histogram of swaps needed for S4 (0-100): Max = 248, Average = 25, Median = 18, 90 % of cases < 50.]
Theoretical estimation

Probability of good swap
• Select a proper prototype to remove: there are k clusters in total: p_removal = 1/k
• Select a proper new location: there are N choices (p_add = 1/N), but only k are significantly different: p_add = 1/k
• Both happen at the same time: k² significantly different swaps; probability of each different swap is p_swap = 1/k²
• Open question: how many of these are good? p = (α/k)·(α/k) = O(α/k)²
• Probability of not finding a good swap in T iterations: q = (1 − α²/k²)^T

Expected number of iterations
• Estimated number of iterations: T = log q / log(1 − α²/k²)
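The estimate is easy to evaluate numerically (a sketch; p = (α/k)² per the assumption above, function name is mine):

```python
import math

def expected_iterations(q, k, alpha):
    """Number of trial swaps T so that the probability of never
    finding a good swap falls to q, with p = (alpha/k)^2 per trial."""
    p = (alpha / k) ** 2
    return math.log(q) / math.log(1.0 - p)
```

For S-type data (k = 15, α ≈ 4), a 1 % failure level needs only a few dozen trial swaps.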
Probability of failure (q) depending on T
[Plot: q on a log scale (10⁻⁹ … 1) vs. iterations T (0-300).]
Observed probability (%) of fail (S1-S4)
N = 5000, k = 15, d = 2, N/k = 333, α = 4.5
[Chart: observed failure probability versus predicted levels q = 0.1 %, 1 %, 10 % on data sets S1-S4.]
Observed probability (%) of fail (DIM 32-128)
N = 1024, k = 16, d = 32-128, N/k = 64, α = 1.1
[Chart: observed failure probability versus predicted levels q = 0.1 %, 1 %, 10 % for dimensions 32-1024.]
Bounds for the iterations

Upper limit:
T = ln q / ln(1 − α²/k²) ≤ −ln q · k²/α² = ln(1/q) · k²/α²

Lower limit similarly; resulting in:
T = Θ((k²/α²) · ln(1/q))
Multiple swaps (w)

Probability of performing fewer than w swaps in T iterations (binomial sum):

q = Σ_{i=0}^{w−1} C(T, i) · (α²/k²)^i · (1 − α²/k²)^{T−i}

Expected number of iterations: while i of the w swaps are still missing, the chance of finding one is about i·α²/k², so

T̂ = Σ_{i=1}^{w} (1/i)·(k²/α²) ≈ (k²/α²) · log w
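The binomial tail above can be checked numerically (a sketch with the same p = (α/k)² assumption; function name is mine):

```python
import math

def prob_fewer_than_w(T, w, k, alpha):
    """Probability that fewer than w good swaps occur in
    T independent trials with success probability (alpha/k)^2."""
    p = (alpha / k) ** 2
    return sum(math.comb(T, i) * p ** i * (1 - p) ** (T - i)
               for i in range(w))
```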
Expected time complexity

1. Linear dependency on N
2. Quadratic dependency on k (with a large number of clusters, it can be too slow)
3. Logarithmic dependency on w (close to constant)
4. Inverse dependency on α (the higher the dimensionality, the faster the method)

t̂(N, k) = O((k²/α²) · N · log w)
Linear dependency on N
N < 100,000, k = 100, d = 2, N/k = 1000, α = 3.1
[Plot (Birch2): iterations (400-1800) vs. data size (0-100 thousand); 25 % and 75 % quantile curves.]
Quadratic dependency on k
N < 100,000, k = 100, d = 2, N/k = 1000, α = 3.1
[Plot (Birch2): iterations (0-1400) vs. number of clusters (0-100); 25 % and 75 % quantile curves with quadratic fit.]
Logarithmic dependency on w
Theory vs. reality
Neighborhood size

How much is α?

Voronoi neighbors vs. neighbors by distance
[Figure: a prototype with its Voronoi neighbors and its nearest prototypes by distance, numbered.]

2-dim: 2(3k − 6)/k = 6 − 12/k
D-dim: 2k^(D/2)/k = O(2k^(D/2−1))
Upper limit: α ≤ k
Observed number of neighbors (data set S2)
[Histogram: frequency (0-45 %) vs. number of neighbours (1-9); average = 3.9.]
Estimate of α
• Five iterations of random swap clustering
• For each pair of prototypes A and B:
  1. Calculate the half point HP = (A+B)/2
  2. Find the nearest prototype C for HP
  3. If C = A or C = B, they are potential neighbors.
• Analyze potential neighbors:
  1. Calculate all vector distances across A and B
  2. Select the nearest pair (a, b)
  3. If d(a, b) < min(d(a, C(a)), d(b, C(b))) then accept
• α = number of pairs found / k
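The pair test above in code (my own sketch; squared distances are used throughout, which preserves all the comparisons):

```python
import numpy as np

def estimate_alpha(X, C, P):
    k = len(C)
    pairs = 0
    for a in range(k):
        for b in range(a + 1, k):
            hp = (C[a] + C[b]) / 2                      # half point
            nearest = np.sum((C - hp) ** 2, axis=1).argmin()
            if nearest not in (a, b):
                continue                                # not potential neighbors
            Xa, Xb = X[P == a], X[P == b]
            if not len(Xa) or not len(Xb):
                continue
            # nearest cross pair (x_a, x_b) between the two clusters
            d = np.sum((Xa[:, None, :] - Xb[None, :, :]) ** 2, axis=2)
            ia, ib = np.unravel_index(d.argmin(), d.shape)
            if d[ia, ib] < min(np.sum((Xa[ia] - C[a]) ** 2),
                               np.sum((Xb[ib] - C[b]) ** 2)):
                pairs += 1                              # accept the pair
    return pairs / k
```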
Observed values of α
Optimality

Multiple optima (= plateaus)
• Very similar results (< 0.3 % difference in MSE)
• CI-values significantly high (9 %)
• Finds one of the near-optimal solutions
Experiments

Time-versus-distortion (Bridge)
N = 4096, k = 256, d = 16, N/k = 16, α = 5.4
[Plot: MSE (160-190) vs. time (s, log scale): single k-means, repeated k-means (10,000 repeats), and Random Swap with swap counts annotated along the curve.]
Time-versus-distortion (Miss America)
N = 6480, k = 256, d = 16, N/k = 25, α = 17.1
[Plot: MSE (5.00-6.50) vs. time (log scale): single k-means, repeated k-means (10,000 repeats), and Random Swap with swap counts annotated.]
Time-versus-distortion (Birch1)
N = 100,000, k = 100, d = 2, N/k = 1000, α = 5.8
[Plot: MSE (4.50-7.00) vs. time (log scale): single k-means, repeated k-means (2500 repeats), and Random Swap.]
Time-versus-distortion (Birch2)
N = 100,000, k = 100, d = 2, N/k = 1000, α = 3.1
[Plot: MSE (0-10) vs. time (log scale): single k-means, repeated k-means (2500 repeats), and Random Swap.]
Time-versus-distortion (Europe)
N = 169,673, k = 256, d = 2, N/k = 663, α = 6.3
[Plot: MSE (0-16) vs. time (log scale): single k-means, repeated k-means (500 repeats), and Random Swap.]
Time-versus-distortion (Unbalance)
N = 6500, k = 8, d = 2, N/k = 821, α = 2.3
[Plot: MSE (0-140) vs. time (log scale): single k-means, repeated k-means, and Random Swap.]
Variation of results (Bridge)
[Histogram of final MSE (160-185): random swap centered at 164.69, repeated k-means at 179.82.]

Variation of results (Birch1)
[Histogram of final MSE (4.0-6.5): random swap at 4.64, repeated k-means at 5.48.]
Comparison of algorithms
• k-means (KM)
• k-means++
• repeated k-means (RKM)
• x-means
• agglomerative clustering (AC)
• random swap (RS)
• global k-means
• genetic algorithm
D. Arthur and S. Vassilvitskii, "K-means++: the advantages of careful seeding", ACM-SIAM Symp. on Discrete Algorithms (SODA'07), New Orleans, LA, 1027-1035, January 2007.
D. Pelleg and A. Moore, "X-means: extending k-means with efficient estimation of the number of clusters", Int. Conf. on Machine Learning (ICML'00), Stanford, CA, USA, June 2000.
P. Fränti, T. Kaukoranta, D.-F. Shen and K.-S. Chang, "Fast and memory efficient implementation of the exact PNN", IEEE Trans. on Image Processing, 9 (5), 773-777, May 2000.
A. Likas, N. Vlassis and J.J. Verbeek, "The global k-means clustering algorithm", Pattern Recognition, 36, 451-461, 2003.
P. Fränti, "Genetic algorithm with deterministic crossover for vector quantization", Pattern Recognition Letters, 21 (1), 61-68, January 2000.
Processing time
Clustering quality
Conclusions

What did we learn?
1. Random swap is an efficient algorithm.
2. It does not converge to a sub-optimal result.
3. Expected processing time has these dependencies:
• Linear O(N) on the size of the data
• Quadratic O(k²) on the number of clusters
• Inverse O(1/α²) on the neighborhood size
• Logarithmic O(log w) on the number of swaps
References
• P. Fränti, "Efficiency of random swap clustering", Journal of Big Data, 5:13, 1-29, 2018.
• P. Fränti and J. Kivijärvi, "Randomised local search algorithm for the clustering problem", Pattern Analysis and Applications, 3 (4), 358-369, 2000.
• P. Fränti, J. Kivijärvi and O. Nevalainen, "Tabu search algorithm for codebook generation in VQ", Pattern Recognition, 31 (8), 1139-1148, August 1998.
• P. Fränti, O. Virmajoki and V. Hautamäki, "Efficiency of random swap based clustering", IAPR Int. Conf. on Pattern Recognition (ICPR'08), Tampa, FL, Dec 2008.
• Pseudo code: http://cs.uef.fi/pages/franti/research/rs.txt
Supporting material

Implementations available (C, Matlab, Java, Javascript, R and Python):
http://www.uef.fi/web/machine-learning/software

Interactive animation: http://cs.uef.fi/sipu/animator/

Clusterator: http://cs.uef.fi/sipu/clusterator