KCK-means: A Clustering Method based on Kernel Canonical Correlation Analysis
Dr. Yingjie Tian
Outline
• Motivation & Challenges
• KCCA, Kernel Canonical Correlation Analysis
• Our method: KCK-means
• Experiments
• Conclusions
Motivation
• Previous similarity metrics:
  – Euclidean distance
  – Squared Mahalanobis distance
  – Mutual neighbor distance
  – …
• These fail when there are non-linear correlations between attributes

Motivation
• In some interesting application domains, attributes can be naturally split into two subsets, either of which suffices for learning
• Intuitively, there may be some projections that can reveal the ground truth in these two views
• KCCA is a technique that can extract common features from a pair of multivariate data sets
• It is the most promising candidate
Canonical Correlation Analysis(1/2)
• X = {x1, x2, …, xl} and Y = {y1, y2, …, yl} denote two views
• CCA finds projection vectors $w_x$ and $w_y$ that maximize the correlation coefficient between $w_x^T X$ and $w_y^T Y$
• That is:

$$(w_x, w_y) = \arg\max_{w_x, w_y} \frac{w_x^T C_{xy} w_y}{\sqrt{w_x^T C_{xx} w_x}\,\sqrt{w_y^T C_{yy} w_y}}$$

$$\text{s.t.}\quad w_x^T C_{xx} w_x = 1, \qquad w_y^T C_{yy} w_y = 1$$

• $C_{xy}$ is the between-sets covariance matrix of X and Y; $C_{xx}$ and $C_{yy}$ are the within-sets covariance matrices.
Canonical Correlation Analysis(2/2)
• If $C_{yy}$ is invertible, then solving the generalized eigenproblem

$$C_{xy} C_{yy}^{-1} C_{yx} w_x = \lambda^2 C_{xx} w_x$$

for the generalized eigenvectors gives the sequence of $w_x$'s; the corresponding $w_y$'s are then found using

$$w_y = \frac{1}{\lambda} C_{yy}^{-1} C_{yx} w_x$$
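As a sanity check, the two formulas above can be implemented directly with NumPy. The following is a minimal sketch; the function name `cca` and the small ridge term `reg` (added for numerical stability) are assumptions made here, not part of the slides:

```python
import numpy as np

def cca(X, Y, reg=1e-8):
    """Linear CCA sketch: X is (l, dx), Y is (l, dy), one row per instance.

    Solves C_xy C_yy^-1 C_yx w_x = lambda^2 C_xx w_x for the top
    generalized eigenvector, then w_y = (1/lambda) C_yy^-1 C_yx w_x.
    `reg` is a small ridge term for numerical stability (an assumption).
    """
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    l = X.shape[0]
    Cxx = Xc.T @ Xc / l + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / l + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / l
    # Fold the generalized eigenproblem into an ordinary one:
    # Cxx^-1 Cxy Cyy^-1 Cyx w_x = lambda^2 w_x
    M = np.linalg.solve(Cxx, Cxy @ np.linalg.solve(Cyy, Cxy.T))
    vals, vecs = np.linalg.eig(M)
    top = np.argmax(vals.real)
    lam = np.sqrt(max(vals.real[top], 0.0))   # canonical correlation
    wx = vecs.real[:, top]
    wy = np.linalg.solve(Cyy, Cxy.T @ wx) / lam
    return lam, wx, wy
```

On two views that share a common latent variable, the top canonical correlation found this way approaches 1.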
Why Kernel CCA
• Why use a kernel extension of CCA?
  – CCA may not extract useful descriptors of the data because of its linearity
  – In order to find nonlinearly correlated projections
• KCCA maps $x_i$ and $y_i$ to $\phi(x_i)$ and $\phi(y_i)$, giving $S_x = (\phi(x_1), \phi(x_2), \ldots, \phi(x_l))$ and $S_y = (\phi(y_1), \phi(y_2), \ldots, \phi(y_l))$
• Then $\phi(x_i)$ and $\phi(y_i)$ are treated as instances to run the CCA routine.
KCCA
• Objective function:

$$\max_{\alpha, \beta} \frac{\alpha^T K_x K_y \beta}{\sqrt{\alpha^T K_x^2 \alpha}\,\sqrt{\beta^T K_y^2 \beta}}$$

where $\alpha$ and $\beta$ are the two desired projections, and $K_x = S_x^T S_x$ and $K_y = S_y^T S_y$ are the two kernel matrices
• We use Partial Gram–Schmidt Orthogonalisation (PGSO) to approximate the kernel matrices
How to solve KCCA
• $\alpha$ can be solved from the eigenproblem

$$(K_x + \kappa I)^{-1} K_y (K_y + \kappa I)^{-1} K_x \alpha = \lambda^2 \alpha$$

where $\kappa$ is used for regularization; $\beta$ can then be obtained from

$$\beta = \frac{1}{\lambda} (K_y + \kappa I)^{-1} K_x \alpha$$

• A number of $\alpha$'s and $\beta$'s (with their corresponding $\lambda$'s) can be found
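A direct NumPy sketch of the two equations above; the function name `kcca` and the use of plain `numpy.linalg` instead of the PGSO approximation are simplifications made here for brevity:

```python
import numpy as np

def kcca(Kx, Ky, kappa=0.1):
    """KCCA sketch: Kx, Ky are (l, l) kernel matrices, kappa regularizes.

    Solves (Kx + kappa I)^-1 Ky (Ky + kappa I)^-1 Kx alpha = lambda^2 alpha
    for the top eigenvector, then beta = (1/lambda)(Ky + kappa I)^-1 Kx alpha.
    """
    I = np.eye(Kx.shape[0])
    M = np.linalg.solve(Kx + kappa * I, Ky @ np.linalg.solve(Ky + kappa * I, Kx))
    vals, vecs = np.linalg.eig(M)
    top = np.argmax(vals.real)
    lam = np.sqrt(max(vals.real[top], 0.0))
    alpha = vecs.real[:, top]
    beta = np.linalg.solve(Ky + kappa * I, Kx @ alpha) / lam
    return lam, alpha, beta
```

The full method keeps a number of the leading (α, β) pairs, not just the first.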
Project into ground truth
• Two kernel functions are defined as

$$\kappa_x(x_i, x_j) = \phi_x(x_i)^T \phi_x(x_j), \qquad \kappa_y(y_i, y_j) = \phi_y(y_i)^T \phi_y(y_j)$$

• For any $x^*$ and $y^*$, their projections can be obtained by

$$P(x^*) = \kappa_x(x^*, X)\,\alpha \quad \text{and} \quad P(y^*) = \kappa_y(y^*, Y)\,\beta$$

for the two views respectively
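The projection rule P(x*) = κ_x(x*, X) α is just an α-weighted sum of kernel evaluations against the training instances. A minimal sketch; the function name and the free choice of `kernel` are assumptions:

```python
import numpy as np

def kcca_project(x_new, X_train, alpha, kernel):
    """Project a new instance: evaluate the kernel between x_new and every
    training instance, then take the alpha-weighted sum,
    i.e. P(x*) = kappa_x(x*, X) alpha."""
    k_vec = np.array([kernel(x_new, xi) for xi in X_train])
    return k_vec @ alpha
```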
Why use other pairs of projections?
• According to (Zhou, Z.-H., et al.), if the two views are conditionally independent given the class label, the biggest α and β should be in accordance with the ground truth.
• However, in real-world data such conditional independence rarely holds, so the information conveyed by the other pairs of correlated projections should not be omitted
Similarity measure based on KCCA
μ is a parameter which regulates the proportion between the distance of the original instances and the distance of their projections:

$$f_{sim}(x_i, x_j) = \mu \,\|x_i - x_j\|^2 + \sum_{k=1}^{m} \|P_k(x_i) - P_k(x_j)\|^2$$
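Read as a squared distance (smaller means more similar), the measure above can be sketched as follows; the names are assumptions, and `proj_i`/`proj_j` stack the m KCCA projections of each instance:

```python
import numpy as np

def f_sim(xi, xj, proj_i, proj_j, mu=0.1):
    """mu * ||x_i - x_j||^2 + sum_k ||P_k(x_i) - P_k(x_j)||^2.

    mu trades off the original-space distance against the
    projection-space distance, as described above."""
    d_orig = np.sum((np.asarray(xi) - np.asarray(xj)) ** 2)
    d_proj = np.sum((np.asarray(proj_i) - np.asarray(proj_j)) ** 2)
    return mu * d_orig + d_proj
```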
KCK-means for 2-views
• Our method is based on K-means
• In fact, we simply extend K-means by adding the process of solving fsim
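Since f_sim is a μ-weighted squared Euclidean distance over the original features plus the projections, running K-means with f_sim is equivalent to standard K-means on the augmented features [√μ·x, P(x)]. That equivalence, and every name below, is an observation made here rather than a claim from the slides:

```python
import numpy as np

def kck_means(X, proj, k, mu=0.1, iters=50, seed=0):
    """KCK-means sketch: Lloyd's K-means on the augmented features
    [sqrt(mu) * x, P(x)], so squared Euclidean distance in this
    space equals f_sim in the original one."""
    Z = np.hstack([np.sqrt(mu) * X, proj])
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(iters):
        # Assign each instance to its nearest center under f_sim.
        dists = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        # Recompute centers; keep an old center if its cluster emptied.
        new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels
```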
KCK-means for 1-view
• However, two-view data sets are rare in the real world
• (Nigam, K., et al.) point out that if there is sufficient redundancy among the features, we are able to identify a fairly reasonable division of them
• Similarly, we randomly split a 1-view data set into two parts and treat them as the two views of the original data set to perform KCK-means.
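The random split can be sketched in a few lines (the function name is an assumption):

```python
import numpy as np

def random_two_views(X, seed=0):
    """Randomly split the feature set of a 1-view data set (rows =
    instances, columns = features) into two halves and return them
    as two pseudo-views for KCK-means."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(X.shape[1])
    half = X.shape[1] // 2
    return X[:, idx[:half]], X[:, idx[half:]]
```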
Evaluation Metrics
• Pair-Precision:

$$accuracy = \frac{num(\text{correct decisions})}{n(n-1)/2}$$

• Mutual Information:

$$MI(A, B) = H(A) + H(B) - H(A, B), \qquad H(A) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i)$$

• Intuitive-Precision:

$$P(A_i) = \frac{1}{|A_i|} \max_j \left|\{\, x \in A_i \mid label(x) = C_j \,\}\right|$$
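The metrics are standard and easy to reproduce. A sketch of the first two (intuitive-precision is analogous; all names here are assumptions):

```python
import math
from collections import Counter
from itertools import combinations

def pair_precision(pred, true):
    """Fraction of the n(n-1)/2 instance pairs on which the clustering
    agrees with the ground truth about same- vs. different-cluster."""
    n = len(pred)
    correct = sum((pred[i] == pred[j]) == (true[i] == true[j])
                  for i, j in combinations(range(n), 2))
    return correct / (n * (n - 1) / 2)

def entropy(labels):
    """H(A) = -sum_i p(x_i) log2 p(x_i) over the label distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    """MI(A, B) = H(A) + H(B) - H(A, B)."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))
```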
Results on 2-view and 1-view data sets
Influence of η
• There is a precision parameter (or stopping criterion) η in the PGSO algorithm
• The dimensionality of the projections depends on η
• We also investigate its influence on the performance of KCK-means
Influence of η (2-views)
[Figures: P-Precision, Intuitive-Precision, and Mutual Information vs. η (0.1 to 1) on the 2-view data, comparing Kmeans, Agglom, and KCK-means on View1 and View2]
Influence of η (1-view)
[Figures: P-Precision, Intuitive-Precision, and Mutual Information vs. η on the 1-view data, comparing Kmeans, Agglom, and KCK-means]
Conclusions (1/2)
• The results show that KCK-means yields clusters of much better quality than those obtained from K-means and agglomerative hierarchical clustering
• We also note that the performance of KCK-means is best when μ is set very small, or even to zero
• This means that, using the projections obtained from KCCA, the similarity between instances can already be measured well enough
Conclusions (2/2)
• However, when the dimensionality of the projections obtained from KCCA is very small, the performance of KCK-means drops sharply, even below that of the two traditional clustering algorithms.
• This means that, in real-world applications, the information conveyed by the other pairs of correlated projections should also be considered
• All in all, the projections used in KCK-means must have enough dimensions
Thank You !