Learning to Recommend with User
Generated Content
Yueshen Xu1, Zhiyuan Chen2, Jianwei Yin1, Zizheng
Wu1 and Taojun Yao1
1School of Computer Science and Technology, Zhejiang University
2University of Illinois at Chicago
2015/6/9 1Zhejiang University
Junxiang Wang
Yueshen Xu, WAIM, 2015
Outline
Background
Introduction
Related Work
Recommendation with UGC in User Side
Matrix Factorization
Topic Analysis for Items through Topic Modeling
User Interest Distribution
User Topic Regularization
Recommendation with UGC in Item Side
Item Topic Regularization
Experiment and Evaluation
Reference
Keywords: Recommendation, User Generated Content, Topic Modeling, Matrix Factorization
Background
Recommendation in General
Collaborative Filtering (CF)
− Matrix Factorization (MF)
Content-based approach
− Pandora music genome project
User Generated Content (UGC)
social tag, review, question answer, blog, tweet, etc
tag-based / review-based recommendation
Problems in existing works
not every web site has all kinds of UGC
the item-word / user-word space is highly sparse
synonym & polysemy
most works only focus on a single kind of UGC
        item1  item2  item3  item4
user1   r11
user2          r22
user3
user4   r41                  r44
user5                 r53
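The sparse rating matrix above can be held as a dictionary of observed entries. The following is a minimal sketch, assuming 0-based indices and made-up rating values (the slide only gives the symbols r11, r22, and so on); the helper names are hypothetical.

```python
# Observed entries of the 5 x 4 rating matrix from the table above.
# Keys are 0-based (user, item) pairs; the rating values are made-up placeholders.
ratings = {
    (0, 0): 4.0,  # r11
    (1, 1): 3.0,  # r22
    (3, 0): 5.0,  # r41
    (3, 3): 2.0,  # r44
    (4, 2): 4.0,  # r53
}
NUM_USERS, NUM_ITEMS = 5, 4


def indicator(user, item):
    """I_ij: 1 if the user rated the item, otherwise 0."""
    return 1 if (user, item) in ratings else 0


def density():
    """Fraction of the user-item matrix that is observed."""
    return len(ratings) / (NUM_USERS * NUM_ITEMS)


def users_without_ratings():
    """Users with an empty row, for whom rating-only CF has nothing to learn from."""
    rated = {u for (u, _) in ratings}
    return [u for u in range(NUM_USERS) if u not in rated]
```

Only 5 of the 20 cells are observed here, and user3 has no ratings at all, which is the kind of sparsity that motivates bringing in UGC as side information.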
Background
Other related work
social / trust-based recommendation: helpful but limited
− no social relationships on Amazon, eBay, Newegg, Jingdong, Expedia, etc.
− but these sites do have UGC (√)
Description/profile-based recommendation
− static content
− fails to distinguish different items
− unrelated to a user’s preference
UGC, in contrast:
emphasizes an item’s features
− through the words the item receives frequently
increases dynamically
is associated with a user’s preference / interested topics
− e.g., "I like science fiction films, so I wrote a lot of movie reviews that contain words like fiction, tech, super, hero, robotic, machine"
provides natural chunking (social tags)
Contribution
Main contributions
We study UGC in learning user interests and learning item features
We propose a novel user-oriented collaborative filtering model and a
novel item-oriented collaborative filtering model
We propose a way to utilize different types of UGC in a unified way in
recommender systems
We expand an existing dataset by crawling new data, and conduct
extensive experiments on three real-world datasets, which attest to the
effectiveness of the proposed models.
Recommendation with UGC in User Side
Topic analysis for items through topic modeling
Terms in UGC are combined to compose the term set W
each item owns an aggregated term list
any topic model can be used: pLSA / LDA / HDP / nCRP / PAM
$\Theta = \{\theta_j\}$, where $\theta_j = (\theta_{j1}, \theta_{j2}, \ldots, \theta_{jK})$ is the topic/aspect distribution of document j (i.e., item j): this is what we need
User Interest Distribution
Cluster items into groups according to the similarity of their topics (K-Means / GMM / K-Medoids all work)
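The two steps on this slide can be sketched as follows. This is only an illustration under stated assumptions: the UGC records, the item names, and the topic distributions `theta` are invented, the topic model itself is not implemented (any of the models listed above would supply `theta`), and the clustering is a plain k-means with a deterministic start.

```python
from collections import defaultdict


def aggregate_terms(ugc_records):
    """Merge every (user, item, terms) record into one term list per item."""
    item_terms = defaultdict(list)
    for _user, item, terms in ugc_records:
        item_terms[item].extend(terms)
    # The term set W is the union of all terms seen in the UGC.
    vocabulary = sorted({t for terms in item_terms.values() for t in terms})
    return dict(item_terms), vocabulary


def kmeans(points, k, iterations=20):
    """Plain k-means on topic distributions; centroids start at the first k points."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for idx, p in enumerate(points):
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            assignment[idx] = dists.index(min(dists))
        # Update step: mean of the points assigned to each centroid.
        for q in range(k):
            members = [points[idx] for idx in range(len(points)) if assignment[idx] == q]
            if members:
                centroids[q] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Hypothetical UGC: (user, item, terms written by that user about that item).
records = [
    ("u1", "movie_a", ["fiction", "robot"]),
    ("u2", "movie_a", ["fiction", "hero"]),
    ("u1", "book_b", ["cooking"]),
]
item_terms, W = aggregate_terms(records)

# Hypothetical topic distributions theta_j (K = 2) that a topic model might return.
theta = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
clusters = kmeans(theta, k=2)
```

Items 0 and 1 end up in one cluster and items 2 and 3 in the other, regardless of which product category they belong to.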
Recommendation with UGC in User Side
User Interest Distribution (cont.)
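Intuition: find items with similar topics even though they fall in different categories, e.g., clothes, gadgets, books, toys, and DVDs that are all about Harry Potter
Aggregate each user’s consumption records on each cluster $C_q$ to obtain the user’s interest distribution
$Sim(i, l)$: similarity between the interest distributions of users $i$ and $l$ (PCC, cosine, or KL divergence)
the weight of $l$ as one of user $i$’s neighbors: $e_{il} = \frac{Sim(i, l)}{\sum_{l' \in L(i)} Sim(i, l')}$, where $L(i)$ is the set of user $i$’s neighbors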
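A small sketch of this step, assuming cosine similarity (one of the three options named above) and an interest distribution defined as the fraction of a user's consumed items in each cluster. The item-to-cluster map, the consumption records, and the function names are all hypothetical.

```python
import math


def interest_distribution(consumed_items, item_cluster, num_clusters):
    """Fraction of a user's consumed items that falls in each cluster C_q."""
    counts = [0] * num_clusters
    for item in consumed_items:
        counts[item_cluster[item]] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * num_clusters


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def neighbor_weights(user, neighbors, dist):
    """e_il = Sim(i, l) / sum over l' in L(i) of Sim(i, l')."""
    sims = {l: cosine(dist[user], dist[l]) for l in neighbors}
    total = sum(sims.values())
    return {l: (s / total if total else 0.0) for l, s in sims.items()}


# Hypothetical clustering result and consumption records.
item_cluster = {"a": 0, "b": 0, "c": 1, "d": 1}
consumed = {"u1": ["a", "b", "c"], "u2": ["a", "b"], "u3": ["c", "d"]}
dist = {u: interest_distribution(items, item_cluster, 2) for u, items in consumed.items()}
weights = neighbor_weights("u1", ["u2", "u3"], dist)
```

Here u1 mostly consumes cluster-0 items, so u2 (all cluster 0) receives twice the weight of u3 (all cluster 1), and the weights sum to 1.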
A novel regularization: user topic regularization (UTR)
$$\min \sum_{i=1}^{M} \left\| U_i - \sum_{l \in L(i)} e_{il} U_l \right\|_F^2$$
Intuition: users with similar topics of interest tend to have similar latent features
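To make the regularizer concrete, the sketch below evaluates the UTR sum for a tiny set of latent vectors. The vectors and weights are invented, and `utr_penalty` is a hypothetical helper name; a user with no neighbors contributes the squared norm of their own vector, which is what the formula gives when the inner sum is empty.

```python
def utr_penalty(U, e):
    """sum_i || U_i - sum_{l in L(i)} e[i][l] * U_l ||^2 for a list-of-lists U."""
    total = 0.0
    for i, u_i in enumerate(U):
        # Weighted average of the neighbors' latent vectors.
        avg = [0.0] * len(u_i)
        for l, weight in e.get(i, {}).items():
            for d in range(len(u_i)):
                avg[d] += weight * U[l][d]
        total += sum((a - b) ** 2 for a, b in zip(u_i, avg))
    return total


# Hypothetical neighbor weights e[i][l]; each row sums to 1.
e = {0: {1: 1.0}, 1: {0: 1.0}, 2: {0: 0.5, 1: 0.5}}

aligned = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
outlier = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

penalty_aligned = utr_penalty(aligned, e)
penalty_outlier = utr_penalty(outlier, e)
```

The penalty is zero when every user sits at the weighted average of their neighbors and grows as user 2 drifts away from users 0 and 1.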
Recommendation with UGC in User Side
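A new MF model (UTR-MF)
$$\min_{U,V} L = \frac{1}{2}\sum_{i=1}^{M}\sum_{j=1}^{N} I_{ij}\left(R_{ij} - U_i^T V_j\right)^2 + \frac{\lambda_U}{2}\|U\|_F^2 + \frac{\lambda_V}{2}\|V\|_F^2 + \frac{\alpha}{2}\sum_{i=1}^{M}\left\| U_i - \sum_{l \in L(i)} e_{il} U_l \right\|_F^2$$
solved by gradient descent / coordinate descent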
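Gradient Descent
$$\frac{\partial L}{\partial U_i} = \sum_{j=1}^{N} I_{ij}\left(R_{ij} - U_i^T V_j\right)(-V_j) + \lambda_U U_i + \alpha\left(U_i - \sum_{l \in L(i)} e_{il} U_l\right) + \alpha \sum_{g \in G(i)} \left(U_g - \sum_{l' \in L(g)} e_{gl'} U_{l'}\right)(-e_{gi})$$
$$\frac{\partial L}{\partial V_j} = \sum_{i=1}^{M} I_{ij}\left(R_{ij} - U_i^T V_j\right)(-U_i) + \lambda_V V_j$$
$G(i)$ is the set of users whose neighborhoods include user $i$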
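The gradients above can be checked numerically. The following sketch implements the UTR-MF loss and both gradients in plain Python and compares them with central finite differences. It assumes a 1/2 factor on the squared-error term (which is what the stated gradients imply) and that no user is their own neighbor; the ratings, weights, and hyperparameter values are made up.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def neighbor_avg(U, e, i):
    """sum_{l in L(i)} e[i][l] * U_l (zero vector if user i has no neighbors)."""
    avg = [0.0] * len(U[i])
    for l, w in e.get(i, {}).items():
        for d in range(len(avg)):
            avg[d] += w * U[l][d]
    return avg


def loss(U, V, R, e, lam_u, lam_v, alpha):
    sq = sum((r - dot(U[i], V[j])) ** 2 for (i, j), r in R.items())
    reg_u = sum(x * x for row in U for x in row)
    reg_v = sum(x * x for row in V for x in row)
    utr = 0.0
    for i in range(len(U)):
        avg = neighbor_avg(U, e, i)
        utr += sum((a - b) ** 2 for a, b in zip(U[i], avg))
    return 0.5 * sq + 0.5 * lam_u * reg_u + 0.5 * lam_v * reg_v + 0.5 * alpha * utr


def grad_U(U, V, R, e, lam_u, alpha, i):
    K = len(U[i])
    g = [lam_u * x for x in U[i]]
    # Rating term: sum_j I_ij (R_ij - U_i^T V_j)(-V_j)
    for (a, j), r in R.items():
        if a == i:
            err = r - dot(U[i], V[j])
            for d in range(K):
                g[d] -= err * V[j][d]
    # User i's own UTR term: alpha * (U_i - sum_l e_il U_l)
    avg = neighbor_avg(U, e, i)
    for d in range(K):
        g[d] += alpha * (U[i][d] - avg[d])
    # Terms from every user in G(i), i.e. users whose neighborhoods include i
    for other, nbrs in e.items():
        if i in nbrs:
            avg_o = neighbor_avg(U, e, other)
            for d in range(K):
                g[d] -= alpha * nbrs[i] * (U[other][d] - avg_o[d])
    return g


def grad_V(U, V, R, lam_v, j):
    g = [lam_v * x for x in V[j]]
    for (i, b), r in R.items():
        if b == j:
            err = r - dot(U[i], V[j])
            for d in range(len(g)):
                g[d] -= err * U[i][d]
    return g


def numeric_grad(f, row, eps=1e-6):
    """Central finite differences of f() over the entries of row (restored afterwards)."""
    out = []
    for d in range(len(row)):
        old = row[d]
        row[d] = old + eps
        up = f()
        row[d] = old - eps
        down = f()
        row[d] = old
        out.append((up - down) / (2 * eps))
    return out


random.seed(0)
K = 2
U = [[random.random() for _ in range(K)] for _ in range(3)]
V = [[random.random() for _ in range(K)] for _ in range(2)]
R = {(0, 0): 4.0, (1, 1): 3.0, (2, 0): 5.0}                   # made-up ratings
e = {0: {1: 0.6, 2: 0.4}, 1: {0: 1.0}, 2: {0: 0.5, 1: 0.5}}  # made-up weights
lam_u, lam_v, alpha = 0.1, 0.1, 0.5
f = lambda: loss(U, V, R, e, lam_u, lam_v, alpha)

max_err = 0.0
for i in range(len(U)):
    analytic, numeric = grad_U(U, V, R, e, lam_u, alpha, i), numeric_grad(f, U[i])
    max_err = max(max_err, max(abs(a - n) for a, n in zip(analytic, numeric)))
for j in range(len(V)):
    analytic, numeric = grad_V(U, V, R, lam_v, j), numeric_grad(f, V[j])
    max_err = max(max_err, max(abs(a - n) for a, n in zip(analytic, numeric)))
```

A small `max_err` indicates that the analytic gradients, including the extra term over $G(i)$, agree with the loss; a gradient descent step would then subtract a learning rate times these vectors from each $U_i$ and $V_j$.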
Recommendation with UGC in Item Side
Intuition for items: similar UGC → similar topic distribution → similar latent features
$Sim(j, h)$: similarity between items $j$ and $h$ (PCC, cosine, or KL divergence)
$w_{jh} = \frac{Sim(j, h)}{\sum_{h' \in H(j)} Sim(j, h')}$
A novel regularization: item topic regularization (ITR)
$$\min \sum_{j=1}^{N} \left\| V_j - \sum_{h \in H(j)} w_{jh} V_h \right\|_F^2$$
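The item-side weights and penalty can be sketched in the same way. How the neighbor set $H(j)$ is chosen is not stated on the slide; the code below assumes the top-k most similar items under cosine similarity, with invented topic distributions and latent vectors, so treat it as one possible reading.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def item_neighbor_weights(theta, k):
    """For each item j, take the k most similar items as H(j) and normalize to w_jh."""
    weights = {}
    for j, t_j in enumerate(theta):
        sims = [(cosine(t_j, t_h), h) for h, t_h in enumerate(theta) if h != j]
        top = sorted(sims, reverse=True)[:k]
        total = sum(s for s, _ in top)
        weights[j] = {h: (s / total if total else 0.0) for s, h in top}
    return weights


def itr_penalty(V, w):
    """sum_j || V_j - sum_{h in H(j)} w[j][h] * V_h ||^2."""
    total = 0.0
    for j, v_j in enumerate(V):
        avg = [0.0] * len(v_j)
        for h, w_jh in w[j].items():
            for d in range(len(v_j)):
                avg[d] += w_jh * V[h][d]
        total += sum((a - b) ** 2 for a, b in zip(v_j, avg))
    return total


# Hypothetical item topic distributions (K = 2) and two candidate latent matrices.
theta = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
w = item_neighbor_weights(theta, k=1)

V_consistent = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
V_scrambled = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
```

Latent vectors that agree with the topic-based neighborhoods (`V_consistent`) incur no penalty, while `V_scrambled` is penalized for every item.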
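A new MF model (ITR-MF):
$$\min_{U,V} L = \frac{1}{2}\sum_{i=1}^{M}\sum_{j=1}^{N} I_{ij}\left(R_{ij} - U_i^T V_j\right)^2 + \frac{\lambda_U}{2}\|U\|_F^2 + \frac{\lambda_V}{2}\|V\|_F^2 + \frac{\alpha}{2}\sum_{j=1}^{N}\left\| V_j - \sum_{h \in H(j)} w_{jh} V_h \right\|_F^2$$
A natural combination: UTR + ITR
solved by gradient descent / coordinate descent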
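The slides do not spell out the ITR-MF gradients or the combined objective. By symmetry with the user-side model, they would plausibly take the form below. This is a sketch under assumptions: a 1/2 factor on the squared-error term, no item in its own neighborhood, $P(j)$ as a hypothetical name for the set of items whose neighborhoods include item $j$, and a separate weight $\beta$ for the item term in the combined model.

```latex
% ITR-MF gradients (derived by symmetry with UTR-MF; P(j) is a hypothetical symbol)
\frac{\partial L}{\partial U_i}
  = \sum_{j=1}^{N} I_{ij}\,(R_{ij} - U_i^T V_j)(-V_j) + \lambda_U U_i

\frac{\partial L}{\partial V_j}
  = \sum_{i=1}^{M} I_{ij}\,(R_{ij} - U_i^T V_j)(-U_i) + \lambda_V V_j
  + \alpha \Bigl( V_j - \sum_{h \in H(j)} w_{jh} V_h \Bigr)
  + \alpha \sum_{p \in P(j)} \Bigl( V_p - \sum_{h' \in H(p)} w_{ph'} V_{h'} \Bigr)(-w_{pj})

% One possible combined objective (UTR + ITR); beta is an assumed second weight
\min_{U,V} L
  = \frac{1}{2} \sum_{i=1}^{M} \sum_{j=1}^{N} I_{ij}\,(R_{ij} - U_i^T V_j)^2
  + \frac{\lambda_U}{2} \|U\|_F^2 + \frac{\lambda_V}{2} \|V\|_F^2
  + \frac{\alpha}{2} \sum_{i=1}^{M} \Bigl\| U_i - \sum_{l \in L(i)} e_{il} U_l \Bigr\|_F^2
  + \frac{\beta}{2} \sum_{j=1}^{N} \Bigl\| V_j - \sum_{h \in H(j)} w_{jh} V_h \Bigr\|_F^2
```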
Experiment and Evaluation
Real-world datasets
− MovieLens (social tag + rating)
− Last.fm (expanded, social tag + rating)
− Yelp (review + rating)
Evaluation metrics: RMSE and MAE
Compared baseline models: UserCF, ItemCF, PMF, TF-IDF MF, CTR
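For reference, the two metrics can be computed as below. The rating values are illustrative only and are not taken from the experiments.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error over paired predictions and true ratings."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def mae(predicted, actual):
    """Mean absolute error over paired predictions and true ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Made-up held-out ratings and predictions.
actual = [4.0, 3.0, 5.0, 2.0]
predicted = [3.5, 3.0, 4.0, 3.0]

print(rmse(predicted, actual))  # 0.75
print(mae(predicted, actual))   # 0.625
```

Lower is better for both; RMSE penalizes large individual errors more heavily than MAE.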
In social tag case:
Experiment and Evaluation
Experimental results (cont.)
UTR-MF and ITR-MF outperform the other baselines in all cases
For example, on the Last.fm dataset, ITR-MF achieves a 14% improvement over PMF and an 8% improvement over CTR
ITR-MF performs better than UTR-MF, which suggests that a user’s preference is harder to infer than an item’s features, probably because a user’s preference can change dynamically
Experiment and Evaluation
Experimental results (cont.)
In the review case, the improvement is similar to that in the social tag case
UTR-MF and ITR-MF outperform the other baselines in all cases
ITR-MF performs better than UTR-MF: a user’s preference is harder to infer
The improvements are significant according to the paired t-test ($p < 0.001$)
For more details, please refer to our paper
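As a reminder of what the paired t-test measures, the sketch below computes the test statistic from per-run errors of two models evaluated on the same splits. The numbers are invented for illustration and are not the reported results; turning the statistic into a p-value additionally needs the Student t distribution with the returned degrees of freedom, which is left out here.

```python
import math


def paired_t_statistic(errors_a, errors_b):
    """t statistic and degrees of freedom for paired samples (same splits, two models)."""
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator).
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n), n - 1


# Made-up RMSE values on five splits for a baseline and a proposed model.
baseline_rmse = [0.95, 0.97, 0.94, 0.96, 0.98]
proposed_rmse = [0.90, 0.91, 0.90, 0.91, 0.92]

t_stat, dof = paired_t_statistic(baseline_rmse, proposed_rmse)
```

A large positive statistic means the baseline's error is consistently higher than the proposed model's across the paired runs.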
Conclusion
We demonstrate that different types of UGC can be integrated
into the MF model in a unified way
User preferences and item features can be learned from UGC
text
Our two novel regularization terms are effective for modeling user
preferences and item features
Our two MF-extended models achieve statistically significant improvements over the baselines
Future Work
Study other types of UGC, such as tweets and blogs, to learn user
preferences and influential events in SNS
Thank you!
Q&A