CoNMF: Exploiting User Comments for Clustering Web2.0 Items
Presenter: He Xiangnan
28 June 2013Email: [email protected]
School of Computing
National University of Singapore
Xiangnan He
Introduction
• Motivations:
  – Users comment on items based on their own interests.
  – Most users' interests are limited.
  – The categories of items can therefore be inferred from the comments.
• Proposed problem:
  – Clustering items by exploiting user comments.
• Applications:
  – Improving search diversity.
  – Automatic tag generation from comments.
  – Group-based recommendation.

WING, NUS
Challenges
• Traditional solution:
  – Represent items in a feature space.
  – Apply any clustering algorithm, e.g. k-means.
• Key challenges:
  – Items have heterogeneous features:
    1. Their own features (e.g. words for articles, pixels for images)
    2. Comments: usernames and textual contents
  – Simply concatenating all features does not perform well.
  – How can we meaningfully combine the heterogeneous views to produce better clustering (i.e. multi-view clustering)?
Proposed solution
• Extend NMF (Nonnegative Matrix Factorization) to support multi-view clustering…
NMF (Non-negative Matrix Factorization)
• Factorize the data matrix V (#docs × #words) as

  V ≈ W H

  – where W is #docs × k, H is k × #words, and every entry of W and H is non-negative.
• Goal: minimize the objective function

  J = ‖V − W H‖²

  – where ‖·‖ denotes the Frobenius norm.
• Alternating optimization:
  – With Lagrange multipliers, differentiate with respect to W and H respectively.
  – This yields a local optimum, not a global one!
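The alternating scheme above is usually implemented with the classic multiplicative update rules. The sketch below is a minimal illustration of standard NMF; the function name and parameters are mine, not from the talk.

```python
import numpy as np

def nmf(V, k, n_iter=1000, eps=1e-9, seed=0):
    """Factorize a non-negative matrix V (#docs x #words) as V ~ W @ H,
    minimizing the Frobenius objective ||V - W H||^2 via multiplicative
    updates. Reaches a local optimum only (depends on the random init)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k))   # #docs x k, non-negative
    H = rng.random((k, m))   # k x #words, non-negative
    for _ in range(n_iter):
        # The updates keep W and H non-negative because every factor in
        # the numerators and denominators is non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

For clustering, item i is typically assigned to cluster argmax over j of W[i, j].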
Characteristics of NMF
• Matrix factorization with a non-negativity constraint.
• Reduces the dimension of the data; derives a latent space.
• Differences from SVD (LSI):

| Characteristic | SVD | NMF |
| --- | --- | --- |
| Orthogonal basis | Yes | No |
| Negative entries | Yes | No |
| Post-clustering step needed | Yes | No |

• Theoretically shown suitable for clustering (Ding et al. 2005).
• Empirically shown superior to SVD and k-means in document clustering (Xu et al. 2003).
Extensions of NMF
• Relationships with other clustering algorithms:
  – K-means: orthogonal NMF = K-means
  – PLSI: KL-divergence NMF = PLSI
  – Spectral clustering
• Extensions:
  – Tri-factorization (V = W S H) (Ding et al. 2006)
  – NMF with sparseness constraints (Hoyer 2004)
  – NMF with graph regularization (Cai et al. 2011)
  – However, studies on NMF-based multi-view clustering are quite limited (Liu et al. 2013).
• My proposal:
  – Extend NMF to support multi-view clustering.
Proposed solution - CoNMF
• Idea:
  – Couple the factorization processes of NMF across views.
• Example:
  – Single NMF:
    Factorization: V ≈ W H
    Objective: J = ‖V − W H‖²
    Constraints: all entries of W and H are non-negative.
  – 2-view CoNMF:
    Factorization: V₁ ≈ W₁ H₁ and V₂ ≈ W₂ H₂
    Objective: J = λ₁‖V₁ − W₁ H₁‖² + λ₂‖V₂ − W₂ H₂‖² + R(W₁, W₂), where R is a coupling regularizer that ties the item factors of the two views together.
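A sketch of how the coupled updates could look for the 2-view case. The λ weights and the specific point-wise regularizer R(W₁, W₂) = λ_c‖W₁ − W₂‖² are illustrative assumptions here, not necessarily the talk's exact formulation; the update rules follow from the same derivation as plain NMF.

```python
import numpy as np

def conmf_point(V1, V2, k, lam=(1.0, 1.0), lam_c=0.1, n_iter=500, eps=1e-9, seed=0):
    """Sketch of a 2-view coupled NMF: factorize V1 ~ W1 @ H1 and
    V2 ~ W2 @ H2 while penalizing ||W1 - W2||_F^2, so both views share
    a similar item-factor matrix. Non-negativity is preserved because
    every term in the numerators and denominators is non-negative."""
    rng = np.random.default_rng(seed)
    n = V1.shape[0]                       # both views index the same items
    W1, W2 = rng.random((n, k)), rng.random((n, k))
    H1 = rng.random((k, V1.shape[1]))
    H2 = rng.random((k, V2.shape[1]))
    l1, l2 = lam
    for _ in range(n_iter):
        H1 *= (W1.T @ V1) / (W1.T @ W1 @ H1 + eps)
        H2 *= (W2.T @ V2) / (W2.T @ W2 @ H2 + eps)
        # Coupling term pulls W1 toward W2 and vice versa.
        W1 *= (l1 * (V1 @ H1.T) + lam_c * W2) / (l1 * (W1 @ H1 @ H1.T) + lam_c * W1 + eps)
        W2 *= (l2 * (V2 @ H2.T) + lam_c * W1) / (l2 * (W2 @ H2 @ H2.T) + lam_c * W2 + eps)
    return W1, H1, W2, H2
```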
CoNMF Framework
• Couples the factorization processes of multiple matrices (i.e. views) via regularization.
• Objective function:
  – A similar alternating optimization with Lagrange multipliers can solve it.
• Different options for the regularization:
  – Centroid-based (Liu et al. 2013)
  – Mutual-based:
    Point-wise
    Cluster-wise
Experiments
• Last.fm dataset:

| #Items | #Users | #Comments | #Clusters |
| --- | --- | --- | --- |
| 9,694 | 131,898 | 2,500,271 | 21 |

• 3 views:

| View | #Items | #Features | Token type |
| --- | --- | --- | --- |
| Item-description words | 9,694 | 14,076 | TF-IDF |
| Item-comment words | 9,694 | 31,172 | TF-IDF |
| Item-users | 9,694 | 131,898 | Boolean |

• Ground truth:
  – Music type of each artist, as provided by Last.fm.
• Evaluation metrics:
  – Accuracy and F1.
• All results are averaged over 20 runs.
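Clustering accuracy needs a mapping from predicted clusters to ground-truth classes. A common choice (assumed here; the slides do not spell it out) is the best one-to-one mapping found with the Hungarian algorithm, available in SciPy:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(true_labels, pred_clusters):
    """Fraction of items correctly assigned under the best one-to-one
    matching between predicted clusters and ground-truth classes."""
    true_labels = np.asarray(true_labels)
    pred_clusters = np.asarray(pred_clusters)
    classes = np.unique(true_labels)
    clusters = np.unique(pred_clusters)
    # contingency[i, j] = #items in cluster i whose true class is j
    cont = np.zeros((len(clusters), len(classes)), dtype=int)
    for ci, c in enumerate(clusters):
        for ki, kl in enumerate(classes):
            cont[ci, ki] = np.sum((pred_clusters == c) & (true_labels == kl))
    row, col = linear_sum_assignment(-cont)  # negate to maximize matched mass
    return cont[row, col].sum() / len(true_labels)
```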
Statistics of datasets
(Figures: distributions of #items per user and #clusters per user.)

With T the number of clusters a user comments on: P(T ≤ 3) = 0.6229, P(T ≤ 5) = 0.8474, P(T ≤ 10) = 0.9854.

This verifies our assumption: each user usually comments on a limited number of music types.
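The cumulative probabilities above can be computed directly from per-user cluster counts. A trivial helper (illustrative only; the real dataset is not reproduced here):

```python
def cluster_coverage_cdf(user_cluster_counts, thresholds=(3, 5, 10)):
    """Given, for each user, the number T of distinct clusters (music
    types) that user commented on, return P(T <= t) per threshold t."""
    n = len(user_cluster_counts)
    return {t: sum(c <= t for c in user_cluster_counts) / n for t in thresholds}
```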
Experimental results (Accuracy)
| Initialization | Method | Desc. | Comm. | Users | Combined |
| --- | --- | --- | --- | --- | --- |
| Random | k-means | 0.25 | 0.28 | 0.34 | 0.415 |
| – | SVD | 0.29 | 0.31 | 0.28 | 0.294 |
| Random | NMF | 0.24 | 0.27 | 0.32 | 0.313 |
| K-means | NMF | 0.26 | 0.32 | 0.40 | 0.417 |
| K-means | CoNMF – point | – | – | – | 0.460 |
| K-means | CoNMF – cluster | – | – | – | 0.420 |
| NMF | Multi-NMF (SDM'13) | – | – | – | 0.369 |
| Random | MM-LDA (WSDM'09) | – | – | – | 0.366 |

Observations:
1. For k-means: Users > Comm. > Desc., and combining all views is best.
2. SVD performs badly on the users view (non-textual).
3. For randomly initialized NMF: Users > Comm. > Desc., but combining performs worse than the best single view.
4. Initialization is important for NMF.
5. CoNMF-point performs best overall.
6. The last two rows are state-of-the-art baselines.
Experimental results (F1)
| Initialization | Method | Desc. | Comm. | Users | Combined |
| --- | --- | --- | --- | --- | --- |
| Random | k-means | 0.15 | 0.16 | 0.15 | 0.254 |
| – | SVD | 0.25 | 0.25 | 0.24 | 0.249 |
| Random | NMF | 0.13 | 0.18 | 0.21 | 0.216 |
| K-means | NMF | 0.15 | 0.21 | 0.27 | 0.298 |
| K-means | CoNMF – point | – | – | – | 0.320 |
| K-means | CoNMF – cluster | – | – | – | 0.284 |
| NMF | Multi-NMF (SDM'13) | – | – | – | 0.265 |
| Random | MM-LDA (WSDM'09) | – | – | – | 0.286 |
Conclusions
• Comments benefit clustering.
• Mining different views from the comments is important:
  – The two views (commenting words and commenting users) contribute differently to clustering.
  – For this Last.fm dataset, the users view is more useful.
  – Combining all views works best.
• For NMF-based methods, initialization is important.
Ongoing
• Run more experiments on other datasets.
• Improve the CoNMF framework by adding sparseness constraints.
• Study the influence of normalization on CoNMF.
Thanks!
QA?
References(I)
• Chris Ding, Xiaofeng He, and Horst D. Simon. 2005. On the equivalence of nonnegative matrix factorization and spectral clustering. In Proc. of SDM 2005.
• Wei Xu, Xin Liu, and Yihong Gong. 2003. Document clustering based on non-negative matrix factorization. In Proc. of SIGIR 2003.
• Chris Ding, Tao Li, and Wei Peng. 2006. Orthogonal nonnegative matrix tri-factorizations for clustering. In Proc. of SIGKDD 2006.
• Patrik O. Hoyer. 2004. Non-negative matrix factorization with sparseness constraints. Journal of Machine Learning Research 2004.
• Deng Cai, Xiaofei He, Jiawei Han, and Thomas S. Huang. 2011. Graph regularized nonnegative matrix factorization for data representation. IEEE Trans. Pattern Anal. Mach. Intell. 2011.
• Jialu Liu, Chi Wang, Jing Gao, and Jiawei Han. 2013. Multi-view clustering via joint nonnegative matrix factorization. In Proc. of SDM 2013.