Collaborative Co-clustering across Multiple Social Media
Fengjiao Wang, Shuyang Lin, Philip S. Yu
Department of Computer Science, University of Illinois at Chicago
Main activities in location-based social networks
• share location ("check-in")
• review
• find nearby friends
• social
• discover POI
Aligned networks
[Slide diagrams of two pairs of networks aligned by the same user:
• Twitter & Foursquare: post tweet (Twitter), check-in (Foursquare)
• Kickstarter & Twitter: create / back projects (Kickstarter), post tweet with included link (Twitter)]
Challenges (Twitter & Foursquare)
• Challenge I: multiple sources
• Challenge II: multiple types of objects
[Slide diagram: multiple sources, each containing multiple types of objects]
Problem formalization
[Slide diagram: each source (source1, source2, source3) contributes information in a relationship view and one or more feature views]
Co-regularized collaborative co-clustering
[Slide diagram: spectral co-clustering on the relationship view and spectral clustering on the feature views, tied together by co-regularization]
Co-regularized collaborative co-clustering
Objective function

…the other one into two feature matrices. We follow the notation of spectral co-clustering. Assume there are m objects of type A and n objects of type B, and that the relationship matrix between them is E. The feature matrices in the feature views are converted to kernel matrices using a Gaussian kernel. We denote the kernel matrix of object A in view w as K_A^(w), and the kernel matrix of object B in view p as K_B^(p). Our goal is to find the co-clustering result X, where X can be solved for using E, K_A^(w), and K_B^(p).
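As an aside, the Gaussian-kernel construction of a feature view's kernel matrix might be sketched as follows. This is a minimal illustration, not the paper's code: the function name, the bandwidth `sigma`, and the toy feature matrix `F_A` are all assumptions.

```python
import numpy as np

def gaussian_kernel(F, sigma=1.0):
    """Gaussian (RBF) kernel matrix for a feature matrix F (rows = objects):
    K[i, j] = exp(-||F[i] - F[j]||^2 / (2 * sigma^2))."""
    sq_norms = np.sum(F ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * F @ F.T
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives from round-off
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

# Toy feature view for m = 4 objects of type A with 3 features each
F_A = np.random.default_rng(0).normal(size=(4, 3))
K_A = gaussian_kernel(F_A)  # symmetric, with ones on the diagonal
```

The resulting K_A plays the role of the similarity/affinity matrix that spectral clustering operates on for that view.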
2.3 Co-regularized Collaborative Co-clustering Model

In the proposed multi-view setting, objects are grouped into co-clusters and clusters under different views. To achieve a better co-clustering by leveraging both the relationship view and the feature views, we cast the problem as finding maximum agreement between the different views with regard to the same types of objects. The disagreement between the co-clustering and the clusterings is then measured by a co-regularization term defined on their eigenvector matrices. Eigenvector matrices are used for two reasons. First, the eigenvector matrices in spectral clustering/co-clustering represent the graph partition rules, which are essentially the discriminative information of the clusters. Second, the eigenvector matrix over the two types of objects in spectral co-clustering can be decomposed into two eigenvector matrices, one per object type. This allows us to construct the co-regularization term for each type of object under different views.
The Frobenius norm is employed to realize this co-regularization. Given two eigenvector matrices U^(a) and U^(b), the disagreement between them is denoted D(U^(a), U^(b)), where

    D(U^(a), U^(b)) = -Tr(U^(a) U^(a)T U^(b) U^(b)T)

Consequently, maximizing the agreement between two views amounts to minimizing -Tr(U^(a) U^(a)T U^(b) U^(b)T).
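The disagreement measure above is straightforward to compute; the following is a small numpy sketch (the toy matrices and the function name are illustrative assumptions). Note that for an orthonormal U with k columns, D(U, U) = -Tr(U U^T U U^T) = -k, the smallest attainable value.

```python
import numpy as np

def disagreement(U_a, U_b):
    """Co-regularization disagreement D(U_a, U_b) = -Tr(U_a U_a^T U_b U_b^T)."""
    return -np.trace(U_a @ U_a.T @ U_b @ U_b.T)

# Two orthonormal eigenvector matrices for 6 objects and k = 2 clusters
rng = np.random.default_rng(1)
U_a, _ = np.linalg.qr(rng.normal(size=(6, 2)))
U_b, _ = np.linalg.qr(rng.normal(size=(6, 2)))

# A subspace agrees perfectly with itself: D(U, U) = -k (here -2.0);
# any other orthonormal matrix of the same shape gives a value >= -k.
d_self = disagreement(U_a, U_a)
d_other = disagreement(U_a, U_b)
```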
2.3.1 Co-regularized collaborative co-clustering

We formulate the collaborative co-clustering problem as an optimization problem that seeks optimal graph partitionings for spectral co-clustering and spectral clustering over multiple views while maximizing the agreement between the relationship view and the feature views. Concretely, we first perform spectral co-clustering on the relationship view, obtaining a partition result represented as U^(v). Spectral clustering is also performed on each feature view of A and B, yielding U_A^(w) for 1 ≤ w ≤ W and U_B^(p) for 1 ≤ p ≤ P. Then U^(v), U_A^(w), and U_B^(p) are used as initial values for the optimization algorithm. The objective function is shown in Equation 1. It is composed of three parts: the co-clustering objective on the relationship view, the clustering objectives on the feature views, and the co-regularization objective. In particular, the first term is the spectral co-clustering objective. The second and third terms are the objectives under the feature views of A and B, respectively; since multiple feature views exist for both A and B, each is a sum over views. The fourth and fifth terms take the form of the Frobenius-norm disagreement and therefore constitute the co-regularization objective. Because these objectives are combined into a single function, we gain the merit of simultaneous clustering, co-clustering, and collaborative optimization.
    min_{U^(v), U_A^(w), U_B^(p)}  tr(U^(v)T L^(v) U^(v))
        + Σ_{1≤w≤W} tr(U_A^(w)T L_A^(w) U_A^(w))
        + Σ_{1≤p≤P} tr(U_B^(p)T L_B^(p) U_B^(p))
        - λ Σ_{1≤w≤W} tr(T_1 U^(v) U^(v)T T_1^T U_A^(w) U_A^(w)T)
        - λ Σ_{1≤p≤P} tr(T_2 U^(v) U^(v)T T_2^T U_B^(p) U_B^(p)T)        (1)

    where  U_A^(w)T U_A^(w) = I, for 1 ≤ w ≤ W
           U_B^(p)T U_B^(p) = I, for 1 ≤ p ≤ P

           L^(v) = [ D_r^(v)   -E     ]
                   [ -E^T      D_c^(v) ]

           L_A^(w) = {D_A^(w)}^{-1/2} K_A^(w) {D_A^(w)}^{-1/2}, for 1 ≤ w ≤ W
           L_B^(p) = {D_B^(p)}^{-1/2} K_B^(p) {D_B^(p)}^{-1/2}, for 1 ≤ p ≤ P
U^(v) is the eigenvector matrix in view v, covering the two types of objects A and B. U_A^(w) and U_B^(p) are the eigenvector matrices in view w and view p, related to object A and object B respectively. L^(v) is the Laplacian matrix of the co-clustering in view v. L_A^(w) and L_B^(p) are the Laplacian matrices of the clusterings in views w and p. The matrices D_r^(v), D_c^(v), D_A^(w), and D_B^(p) are diagonal, with [D_r^(v)]_ii = Σ_j E_ij, [D_c^(v)]_ii = Σ_j E_ji, [D_A^(w)]_ii = Σ_j [K_A^(w)]_ij, and [D_B^(p)]_ii = Σ_j [K_B^(p)]_ij.
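The matrices defined above can be assembled in a few lines of numpy; the following is a minimal sketch in which the sizes, the random relationship matrix E, and the stand-in kernel matrix K_A are illustrative assumptions.

```python
import numpy as np

m, n = 4, 3                                  # toy sizes: m A-objects, n B-objects
rng = np.random.default_rng(2)
E = rng.random((m, n))                       # relationship matrix between A and B

# Diagonal degree matrices: [D_r]_ii = sum_j E_ij, [D_c]_ii = sum_j E_ji
D_r = np.diag(E.sum(axis=1))
D_c = np.diag(E.sum(axis=0))

# Co-clustering Laplacian L^(v) = [[D_r, -E], [-E^T, D_c]]
L_v = np.block([[D_r, -E], [-E.T, D_c]])

# Normalized kernel for one feature view of A:
# L_A^(w) = {D_A^(w)}^{-1/2} K_A^(w) {D_A^(w)}^{-1/2}
K_A = rng.random((m, m))
K_A = (K_A + K_A.T) / 2                      # stand-in symmetric kernel matrix
D_A_inv_sqrt = np.diag(1.0 / np.sqrt(K_A.sum(axis=1)))
L_A = D_A_inv_sqrt @ K_A @ D_A_inv_sqrt
```

As a sanity check, L^(v) built this way is symmetric and its rows sum to zero, as expected of a bipartite-graph Laplacian.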
In the co-regularization, to make the eigenvector matrices U^(v) and U_A^(w)/U_B^(p) compatible with each other, we define the transition matrices T_1 = [I_{m×m} 0_{m×n}] and T_2 = [0_{n×m} I_{n×n}], which transfer U^(v) into a matrix that only contains the rows for the same type of objects as U_A^(w)/U_B^(p). The eigenvector matrix U^(v) can be split into two stacked matrices, U^(v) = [U_A^(v); U_B^(v)], where U_A^(v) and U_B^(v) are the eigenvector matrices corresponding to object A and object B in view v, respectively. The hyperparameter λ trades off the original clusterings against the co-regularization term.

The proposed problem is a non-convex optimization problem, since two non-convex co-regularization terms are introduced into the objective function. We solve it with the alternating minimization technique, since alternating minimization provides a useful framework for iterative optimization of non-convex problems. In detail, in each iteration every eigenvector matrix is updated in turn while the other eigenvector matrices are held fixed. Since an analytical solution can be found for each eigenvector matrix during alternating minimization, repeating this process iteratively converges asymptotically in general; proving this property, however, is not the aim of this paper. We also give an intuitive interpretation of the proposed algorithm. Taking object A as an example, the final clusters of object A should preserve the original relationship with the other type of object B and also be refined by the clusters in the other views. To avoid a clustering result that is either too close to the original co-clustering or too close to the clusters in the other views
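The alternating scheme described above can be sketched as follows. This is a schematic reconstruction, not the authors' exact algorithm: it assumes each sub-problem is solved by taking the eigenvectors of the smallest eigenvalues of the corresponding fixed-point matrix, and the function names, λ value, iteration count, and toy data are all assumptions.

```python
import numpy as np

def smallest_eigvecs(M, k):
    """Orthonormal eigenvectors for the k smallest eigenvalues of symmetric M."""
    _, vecs = np.linalg.eigh(M)              # eigh returns ascending eigenvalues
    return vecs[:, :k]

def alternate(L_v, L_A, L_B, T1, T2, k, lam=0.5, iters=10):
    """Alternating minimization: update each eigenvector matrix with the others
    fixed; each sub-problem is a trace minimization solved exactly by an
    eigendecomposition of the corresponding regularized matrix."""
    U_v = smallest_eigvecs(L_v, k)           # initial values from each
    U_A = smallest_eigvecs(L_A, k)           # single-view problem
    U_B = smallest_eigvecs(L_B, k)
    for _ in range(iters):
        # U^(v) step: L^(v) - lam*(T1^T U_A U_A^T T1 + T2^T U_B U_B^T T2)
        M_v = L_v - lam * (T1.T @ U_A @ U_A.T @ T1 + T2.T @ U_B @ U_B.T @ T2)
        U_v = smallest_eigvecs(M_v, k)
        # Feature-view steps: subtract lam * U_A^(v) U_A^(v)T (resp. for B)
        U_A = smallest_eigvecs(L_A - lam * (T1 @ U_v) @ (T1 @ U_v).T, k)
        U_B = smallest_eigvecs(L_B - lam * (T2 @ U_v) @ (T2 @ U_v).T, k)
    return U_v, U_A, U_B

# Toy problem: m = 4 A-objects, n = 3 B-objects, k = 2 clusters
rng = np.random.default_rng(3)
m, n, k = 4, 3, 2
E = rng.random((m, n))
L_v = np.block([[np.diag(E.sum(1)), -E], [-E.T, np.diag(E.sum(0))]])
sym = lambda M: (M + M.T) / 2
L_A, L_B = sym(rng.random((m, m))), sym(rng.random((n, n)))
T1 = np.hstack([np.eye(m), np.zeros((m, n))])
T2 = np.hstack([np.zeros((n, m)), np.eye(n)])
U_v, U_A, U_B = alternate(L_v, L_A, L_B, T1, T2, k)
```

Each update is an exact minimizer of the objective in one block of variables with the others fixed, which is what makes the monotone alternating scheme applicable.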
[Slide diagram: co-clustering, clustering, and co-regularization]
Experiment setup: datasets
• Reuters Multilingual dataset
  • 1200 documents, 6 categories with regard to topics. Documents written in 5 different languages, corresponding to 5 views.
• WebKB dataset (Cornell, Texas, Washington, Wisconsin)
  • 4 sub-datasets of webpages extracted from the computer science departments of 4 universities: Cornell, Texas, Washington, and Wisconsin. Webpages are classified into 5 categories (student, project, course, staff, faculty).
• Foursquare+Twitter dataset
  • 881 Foursquare users who checked in 10,285 times at 780 venues.
Reuters dataset
[Figure 2: Reuters Dataset. (a) KL divergence (KL_doc, KL_word, KL_total) for CoClust, ITCC, MssrIcc, MssrIIcc, and Co-CoClust; (b) NMI, (c) RI, and (d) Accuracy vs. cluster number (15, 20, 25, 30) for CoClust, ITCC, MssrIcc, MssrIIcc, Semi, Best_view, Feature_concat, Multi-view_pair, Multi-view_centroid, and Co-CoClust.]
[Figure 3: Cornell Dataset. (a) KL divergence; (b) NMI, (c) RI, and (d) Accuracy vs. cluster number (3, 4, 5, 6); same methods as Figure 2.]
Cornell dataset
Texas dataset
[Figure 4: Texas Dataset. (a) KL divergence; (b) NMI, (c) RI, and (d) Accuracy vs. cluster number (3, 4, 5, 6); same methods as Figure 2.]
[Figure 5: Washington Dataset. (a) KL divergence; (b) NMI, (c) RI, and (d) Accuracy vs. cluster number (4, 5, 6, 7); same methods as Figure 2.]
Washington dataset
Wisconsin dataset
[Figure 6: Wisconsin Dataset. (a) KL divergence; (b) NMI, (c) RI, and (d) Accuracy vs. cluster number (2, 3, 4, 5); same methods as Figure 2.]
over the best of the baselines. Figure 2(c) gives the RI results, where Co-CoClust shows the best performance in all four cases. Co-CoClust also achieves good overall Accuracy: when the cluster number equals 15 or 20, Co-CoClust achieves a 28% improvement over the best of the baselines, and when the number of clusters equals 25, Co-CoClust obtains the second-best result.
Evaluation results on the Cornell dataset are shown in Figure 3. Figure 3(a) shows the KL divergence of the document and word clusters, where Co-CoClust achieves much better results than the other baselines. The quality of the document clusters is evaluated by the other metrics with the number of clusters varying from 3 to 6, as shown in Figures 3(b)-(d). Co-CoClust achieves the best results in three cases and the second-best result when the number of clusters equals 3 in Figure 3(b). In Figure 3(c) and Figure 3(d), Co-CoClust achieves the best results on RI and Accuracy.
Figure 4 presents the evaluation results on the Texas dataset. Again, the proposed approach performs well on the KL divergence metric in Figure 4(a). We can also observe from Figures 4(b), 4(c), and 4(d) that Co-CoClust consistently achieves better results. For instance, in Figure 4(d), Co-CoClust improves on the second-best method by 10%, averaged over the different cluster numbers.
Evaluation results on the Washington dataset are summarized in Figure 5. Co-CoClust achieves consistently better results than the baselines by a significant margin on KL divergence. In the evaluation of NMI, RI, and Accuracy, Co-CoClust performs better than the co-clustering-based methods and the multi-view-based methods in most cases.
Figure 6 illustrates the performance of the proposed approach and the 9 baselines on the Wisconsin dataset. Co-CoClust obtains the best KL divergence results in Figure 6(a), and it also achieves better overall results on NMI, RI, and Accuracy compared with the baseline algorithms.

To sum up, the proposed Co-CoClust performs consistently better than the single-view clustering, co-clustering, and multi-view clustering algorithms on both the social network and document-word datasets. This shows that the proposed model steadily outperforms most state-of-the-art algorithms at combining multiple sources of information for co-clustering problems.
3.2.2 Social network dataset

This dataset serves as a case study for a practical application in social networks. Unlike the Reuters and WebKB datasets, which have ground truth, the social network dataset does not provide any ground truth for user clusters or clusters of social media objects. Moreover, it is hard to manually label users or social media objects with high quality. Therefore, we employ a different evaluation strategy to show the performance of the proposed Co-CoClust on the social network dataset. The idea is to assess the efficacy and robustness of the proposed approach at utilizing partially observed information for the task of clustering places. As an increasing percentage of information is randomly removed, we evaluate how well the proposed algorithm compensates for the loss via multi-source learning. Figure 7 shows comparison results when the percentage of information loss ranges from 10% to 60%. The "ground truth" is produced by the k-means algorithm on the full place information. Figure 7(a) depicts the NMI of the proposed approach and the baselines. In general, all of the other algorithms suffer more degradation as more information is hidden. However, Co-CoClust consistently performs best in terms of NMI and Accuracy, as shown in Figure 7(a) and Figure 7(c), which demonstrates the robustness of Co-CoClust.
Foursquare+Twitter dataset
[Figure 7: Foursquare+Twitter. (a) NMI, (b) RI, and (c) Accuracy vs. percentage of held-out information (10% to 60%) for CoClust, ITCC, MssrIcc, MssrIIcc, Semi, Best_view, Feature_concat, Multi-view_pair, Multi-view_centroid, and Co-CoClust.]
Co-CoClust obtains the second-best results when evaluated by RI, as shown in Figure 7(b). Overall, Co-CoClust outperforms the other algorithms at combining multi-source information for co-clustering problems.
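The hold-out protocol above (randomly hiding a percentage of the relationship information and comparing the resulting clustering against a k-means "ground truth" via NMI) can be sketched as follows. The `nmi` and `hold_out` helpers are illustrative implementations, not the authors' evaluation code, and the toy sizes are assumptions.

```python
import numpy as np

def nmi(labels_a, labels_b):
    """Normalized mutual information between two flat cluster-label arrays."""
    a, b = np.asarray(labels_a), np.asarray(labels_b)
    ca, cb = np.unique(a), np.unique(b)
    # Joint label distribution P(a, b) and its marginals
    P = np.array([[np.mean((a == i) & (b == j)) for j in cb] for i in ca])
    Pa, Pb = P.sum(axis=1), P.sum(axis=0)
    nz = P > 0
    mi = np.sum(P[nz] * np.log(P[nz] / (Pa[:, None] * Pb[None, :])[nz]))
    h = lambda p: -np.sum(p[p > 0] * np.log(p[p > 0]))   # entropy
    return mi / np.sqrt(h(Pa) * h(Pb))

def hold_out(E, frac, rng):
    """Randomly zero out a fraction of the relationship-matrix entries."""
    mask = rng.random(E.shape) < frac
    return np.where(mask, 0.0, E)

# Identical (or consistently relabeled) clusterings give NMI = 1
labels = np.array([0, 0, 1, 1, 2, 2])
rng = np.random.default_rng(4)
E_partial = hold_out(rng.random((50, 40)), 0.3, rng)  # 30% information loss
```

A clustering run on E_partial would then be scored by `nmi` against the labels obtained from k-means on the full information, as in Figure 7.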
4. RELATED WORK

Co-clustering algorithms have attracted much attention from the data mining and machine learning communities due to their ability to cluster two types of objects, or objects and features, simultaneously [9, 10, 16]. Co-clustering has also proven to be a powerful data mining technique in practical applications such as text mining, social recommendation, and network mining. The spectral co-clustering algorithm proposed by Dhillon et al. ([9]) uses graph partitioning to co-cluster the bipartite graph of documents and words. It has attracted much attention because its objective function is well formulated and can be solved as an eigenvector problem. Other co-clustering algorithms apply different techniques to clustering two types of objects simultaneously ([10]). Recently, researchers have developed many new co-clustering algorithms that add constraints ([16, 18]) or side information ([15, 20, 22]). [16] integrates additional information as constraints into a semi-supervised co-clustering algorithm. [18] proposes an information-theoretic co-clustering framework for text mining. In [15], the authors claim that using metadata as a constraint in co-clustering achieves better performance than metadata-driven and metadata-injected approaches. [12] works on co-clustering multiple domains of objects, achieving clusterings of multiple types of objects by linear combination. However, research on developing a reasonable yet flexible method for handling additional information, other than using it as constraints, remains limited.

The rise of multi-representation data creates an opportunity for multi-view learning, and many multi-view clustering algorithms ([3, 5, 11, 13, 14]) have been proposed. [3] describes extensions of the classical k-means and EM algorithms to the multi-view setting. [5] uses Canonical Correlation Analysis to extract relevant features from multiple views and then applies a classical clustering algorithm. Non-negative matrix factorization has also been exploited in the multi-view setting ([11, 14]). [14] assumes that not all examples are present in all views and proposes a non-negative matrix factorization based model for clustering under partially observed data. Multi-view clustering has also been applied to web 2.0 items ([13]). Like that work, this paper uses social media objects in multi-view clustering, but we perform co-clustering on them.

Recently, several multi-view algorithms have been proposed for spectral clustering ([1, 2, 8, 19, 21, 23]). [23] generalizes the single-view normalized cut approach to multiple views, obtaining a graph cut through a random-walk-based formulation. In [8], the authors focus on the two-view case of multi-view clustering by creating a bipartite graph, to which spectral clustering is applied. Instead of working on the original features, [1] takes the different clusterings coming from different sources as input and reconciles them into a final clustering; the authors suggest that better performance can be achieved by working directly on the clusterings rather than on the original features of the multiple sources. [2] also works on clusterings instead of original features, employing co-regularization to force the clusterings learned from different views of the data to agree with each other. Working on clusterings instead of original features shows good performance for clustering one type of objects; inspired by this success, in this paper we co-cluster two types of objects over multiple sources of information.

Other multi-view clustering algorithms ([6, 17]) also utilize co-regularization. [17] implements multi-view regularization of unlabeled examples to perform semi-supervised learning. [4] clusters multiple types of objects using their relationship matrices, which are used to compute co-similarity matrices; the different co-similarity matrices for the same type of objects are then combined to generate the clustering result. There are two major differences between [4] and our work. First, they transform the co-clustering problem over multiple types of objects into clustering problems via co-similarity matrices, whereas we propose a direct co-clustering framework that simultaneously clusters multiple types of objects in the original space. Second, [4] implements the idea of combining multiple sources of information by com-
Conclusion
• Observations
  • Data sparseness on LBSNs.
  • Co-clustering users and places could benefit location recommendation.
• Idea
  • Cast the problem of learning user and place patterns from multiple sources (aligned Foursquare and Twitter networks) as a multi-view learning problem.
  • Co-regularize clusters of the same type of objects across different sources to learn the co-clustering in the target source.
  • Utilize spectral co-clustering and spectral clustering to learn the co-clustering (target source) and the clusterings (other sources) simultaneously.