KCK-means: A Clustering Method based on Kernel Canonical Correlation Analysis
Dr. Yingjie Tian
Outline
• Motivation & Challenges
• KCCA, Kernel Canonical Correlation Analysis
• Our method: KCK-means
• Experiments
• Conclusions
Motivation
• Previous similarity metrics:
  – Euclidean distance
  – Squared Mahalanobis distance
  – Mutual neighbor distance
  – …
• These fail when there are non-linear correlations between attributes

Motivation
• In some interesting application domains, attributes can be naturally split into two subsets, either of which suffices for learning
• Intuitively, there may be some projections that can reveal the ground truth in these two views
• KCCA is a technique that can extract common features from a pair of multivariate data sets
• It is the most promising candidate
Canonical Correlation Analysis(1/2)
• X = {x1, x2, …, xl} and Y = {y1, y2, …, yl} denote two views
• CCA finds projection vectors $w_x$ and $w_y$ that maximize the correlation coefficient between $w_x^T X$ and $w_y^T Y$
• That is:

$$(w_x, w_y) = \arg\max_{w_x, w_y} \frac{w_x^T C_{xy} w_y}{\sqrt{w_x^T C_{xx} w_x}\,\sqrt{w_y^T C_{yy} w_y}}$$

$$\text{s.t.}\quad w_x^T C_{xx} w_x = 1, \qquad w_y^T C_{yy} w_y = 1$$

• $C_{xy}$ is the between-sets covariance matrix of X and Y; $C_{xx}$ and $C_{yy}$ are the within-sets covariance matrices.
Canonical Correlation Analysis(2/2)
• If $C_{yy}$ is invertible, then solving the generalized eigenproblem

$$C_{xy} C_{yy}^{-1} C_{yx} w_x = \lambda^2 C_{xx} w_x$$

for the generalized eigenvectors gives the sequence of $w_x$'s; the corresponding $w_y$'s are then found using

$$w_y = \frac{1}{\lambda} C_{yy}^{-1} C_{yx} w_x$$
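As a sanity check, the two formulas above can be implemented directly with NumPy. The following is a minimal sketch; the function name `cca` and the small ridge term `reg` (added for numerical stability) are assumptions made here, not part of the slides:

```python
import numpy as np

def cca(X, Y, reg=1e-8):
    """Linear CCA sketch: X is (l, dx), Y is (l, dy), one row per instance.

    Solves C_xy C_yy^-1 C_yx w_x = lambda^2 C_xx w_x for the top
    generalized eigenvector, then w_y = (1/lambda) C_yy^-1 C_yx w_x.
    `reg` is a small ridge term for numerical stability (an assumption).
    """
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    l = X.shape[0]
    Cxx = Xc.T @ Xc / l + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / l + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / l
    # Fold the generalized eigenproblem into an ordinary one:
    # Cxx^-1 Cxy Cyy^-1 Cyx w_x = lambda^2 w_x
    M = np.linalg.solve(Cxx, Cxy @ np.linalg.solve(Cyy, Cxy.T))
    vals, vecs = np.linalg.eig(M)
    top = np.argmax(vals.real)
    lam = np.sqrt(max(vals.real[top], 0.0))   # canonical correlation
    wx = vecs.real[:, top]
    wy = np.linalg.solve(Cyy, Cxy.T @ wx) / lam
    return lam, wx, wy
```

On two views that share a common latent variable, the top canonical correlation found this way approaches 1.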
Why Kernel CCA
• Why use a kernel extension of CCA?
  – CCA may not extract useful descriptors of the data because of its linearity
  – In order to find nonlinearly correlated projections
• KCCA maps $x_i$ and $y_i$ to $\phi(x_i)$ and $\phi(y_i)$, giving $S_x = (\phi(x_1), \phi(x_2), \ldots, \phi(x_l))$ and $S_y = (\phi(y_1), \phi(y_2), \ldots, \phi(y_l))$
• Then $\phi(x_i)$ and $\phi(y_i)$ are treated as instances to run the CCA routine.
KCCA
• Objective function:

$$\max_{\alpha, \beta} \frac{\alpha^T K_x K_y \beta}{\sqrt{\alpha^T K_x^2 \alpha}\,\sqrt{\beta^T K_y^2 \beta}}$$

where $\alpha$ and $\beta$ are the two desired projections, and $K_x = S_x^T S_x$ and $K_y = S_y^T S_y$ are the two kernel matrices
• We use Partial Gram–Schmidt Orthogonalisation (PGSO) to approximate the kernel matrices
How to solve KCCA
• $\alpha$ can be solved from the eigenproblem

$$(K_x + \kappa I)^{-1} K_y (K_y + \kappa I)^{-1} K_x \alpha = \lambda^2 \alpha$$

where $\kappa$ is used for regularization; $\beta$ can then be obtained from

$$\beta = \frac{1}{\lambda} (K_y + \kappa I)^{-1} K_x \alpha$$

• A number of $\alpha$'s and $\beta$'s (with their corresponding $\lambda$'s) can be found
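A direct NumPy sketch of the two equations above; the function name `kcca` and the use of plain `numpy.linalg` instead of the PGSO approximation are simplifications made here for brevity:

```python
import numpy as np

def kcca(Kx, Ky, kappa=0.1):
    """KCCA sketch: Kx, Ky are (l, l) kernel matrices, kappa regularizes.

    Solves (Kx + kappa I)^-1 Ky (Ky + kappa I)^-1 Kx alpha = lambda^2 alpha
    for the top eigenvector, then beta = (1/lambda)(Ky + kappa I)^-1 Kx alpha.
    """
    I = np.eye(Kx.shape[0])
    M = np.linalg.solve(Kx + kappa * I, Ky @ np.linalg.solve(Ky + kappa * I, Kx))
    vals, vecs = np.linalg.eig(M)
    top = np.argmax(vals.real)
    lam = np.sqrt(max(vals.real[top], 0.0))
    alpha = vecs.real[:, top]
    beta = np.linalg.solve(Ky + kappa * I, Kx @ alpha) / lam
    return lam, alpha, beta
```

The full method keeps a number of the leading (α, β) pairs, not just the first.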
Project into ground truth
• Two kernel functions are defined as

$$\kappa_x(x_i, x_j) = \phi_x(x_i)^T \phi_x(x_j), \qquad \kappa_y(y_i, y_j) = \phi_y(y_i)^T \phi_y(y_j)$$

• For any $x^*$ and $y^*$, their projections can be obtained by

$$P(x^*) = \kappa_x(x^*, X)\,\alpha \quad \text{and} \quad P(y^*) = \kappa_y(y^*, Y)\,\beta$$

for the two views respectively
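The projection rule P(x*) = κ_x(x*, X) α is just an α-weighted sum of kernel evaluations against the training instances. A minimal sketch; the function name and the free choice of `kernel` are assumptions:

```python
import numpy as np

def kcca_project(x_new, X_train, alpha, kernel):
    """Project a new instance: evaluate the kernel between x_new and every
    training instance, then take the alpha-weighted sum,
    i.e. P(x*) = kappa_x(x*, X) alpha."""
    k_vec = np.array([kernel(x_new, xi) for xi in X_train])
    return k_vec @ alpha
```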
Why use other pairs of projections?
• According to (Zhou, Z.-H., et al.), if the two views are conditionally independent given the class label, the biggest α and β should be in accordance with the ground truth.
• However, in real-world data such conditional independence rarely holds, so the information conveyed by the other pairs of correlated projections should not be omitted
Similarity measure based on KCCA
μ is a parameter which regulates the proportion between the distance of the original instances and the distance of their projections:

$$f_{sim}(x_i, x_j) = \mu \,\|x_i - x_j\|^2 + \sum_{k=1}^{m} \|P_k(x_i) - P_k(x_j)\|^2$$
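Read as a squared distance (smaller means more similar), the measure above can be sketched as follows; the names are assumptions, and `proj_i`/`proj_j` stack the m KCCA projections of each instance:

```python
import numpy as np

def f_sim(xi, xj, proj_i, proj_j, mu=0.1):
    """mu * ||x_i - x_j||^2 + sum_k ||P_k(x_i) - P_k(x_j)||^2.

    mu trades off the original-space distance against the
    projection-space distance, as described above."""
    d_orig = np.sum((np.asarray(xi) - np.asarray(xj)) ** 2)
    d_proj = np.sum((np.asarray(proj_i) - np.asarray(proj_j)) ** 2)
    return mu * d_orig + d_proj
```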
KCK-means for 2-views
• Our method is based on K-means
• In fact, we simply extend K-means by adding the process of solving fsim
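Since f_sim is a μ-weighted squared Euclidean distance over the original features plus the projections, running K-means with f_sim is equivalent to standard K-means on the augmented features [√μ·x, P(x)]. That equivalence, and every name below, is an observation made here rather than a claim from the slides:

```python
import numpy as np

def kck_means(X, proj, k, mu=0.1, iters=50, seed=0):
    """KCK-means sketch: Lloyd's K-means on the augmented features
    [sqrt(mu) * x, P(x)], so squared Euclidean distance in this
    space equals f_sim in the original one."""
    Z = np.hstack([np.sqrt(mu) * X, proj])
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(iters):
        # Assign each instance to its nearest center under f_sim.
        dists = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        # Recompute centers; keep an old center if its cluster emptied.
        new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels
```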
KCK-means for 1-view
• However, two-view data sets are rare in the real world
• (Nigam, K., et al.) point out that if there is sufficient redundancy among the features, we are able to identify a fairly reasonable division of them
• Similarly, we randomly split a 1-view data set into two parts and treat them as the two views of the original data set to perform KCK-means.
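The random split can be sketched in a few lines (the function name is an assumption):

```python
import numpy as np

def random_two_views(X, seed=0):
    """Randomly split the feature set of a 1-view data set (rows =
    instances, columns = features) into two halves and return them
    as two pseudo-views for KCK-means."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(X.shape[1])
    half = X.shape[1] // 2
    return X[:, idx[:half]], X[:, idx[half:]]
```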
Evaluation Metrics
• Pair-Precision:

$$accuracy = \frac{num(\text{correct decisions})}{n(n-1)/2}$$

• Mutual Information:

$$MI(A, B) = H(A) + H(B) - H(A, B), \qquad H(A) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i)$$

• Intuitive-Precision:

$$P(A_i) = \frac{1}{|A_i|} \max_j \left|\{\, x \in A_i \mid label(x) = C_j \,\}\right|$$
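The metrics are standard and easy to reproduce. A sketch of the first two (intuitive-precision is analogous; all names here are assumptions):

```python
import math
from collections import Counter
from itertools import combinations

def pair_precision(pred, true):
    """Fraction of the n(n-1)/2 instance pairs on which the clustering
    agrees with the ground truth about same- vs. different-cluster."""
    n = len(pred)
    correct = sum((pred[i] == pred[j]) == (true[i] == true[j])
                  for i, j in combinations(range(n), 2))
    return correct / (n * (n - 1) / 2)

def entropy(labels):
    """H(A) = -sum_i p(x_i) log2 p(x_i) over the label distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    """MI(A, B) = H(A) + H(B) - H(A, B)."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))
```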
Results on 2-view and 1-view data sets
Influence of η
• There is a precision parameter (or stopping criterion) η in the PGSO algorithm
• The dimensionality of the projections depends on η
• We also investigate its influence on the performance of KCK-means
Influence of η (2-views)
[Figures: P-Precision, Intuitive-Precision, and Mutual Information vs. η (0.1 to 1) on the 2-view data, comparing Kmeans, Agglom, and KCK-means on View1 and View2]
Influence of η (1-view)
[Figures: P-Precision, Intuitive-Precision, and Mutual Information vs. η on the 1-view data, comparing Kmeans, Agglom, and KCK-means]
Conclusions (1/2)
• The results show that KCK-means yields clusters of much better quality than those obtained from K-means and agglomerative hierarchical clustering
• We also note that the performance of KCK-means is best when μ is set very small, or even to zero
• This means that, using the projections obtained from KCCA, the similarity between instances can already be measured well enough
Conclusions (2/2)
• However, when the dimensionality of the projections obtained from KCCA is very small, the performance of KCK-means drops sharply, even below that of the two traditional clustering algorithms.
• This means that, in real-world applications, the information conveyed by the other pairs of correlated projections should also be considered
• All in all, the projections used in KCK-means must have enough dimensions
Thank You !