Post on 12-Jan-2016

HyPER: A Flexible and Extensible Probabilistic Framework for Hybrid Recommender Systems

Pigi Kouki, Shobeir Fakhraei, James Foulds, Magdalini Eirinaki, Lise Getoor

University of California, Santa Cruz; University of Maryland, College Park; San Jose State University

2

Motivation

• Increasing amount of data useful for recommendations
– content
– social
– demographic
– ratings

3

Multiple Data Sources

• Content
– [Gunawardana and Meek, RecSys 2009]
– [Forbes and Zhu, RecSys 2011]
– [de Campos et al., IJAR 51(7) 2010]

• Social relationships
– [Ma et al., WSDM 2011]
– [Liu et al., DSS 55(3) 2013]

Combining ratings with other data sources improves performance

6

Multiple Data Sources

• Review text
– [McAuley & Leskovec, RecSys 2013]
– [Ling et al., RecSys 2014]

• Tags and labels (e.g. #cool #neat #ok #sucks)
– [Guy et al., SIGIR 2010]

• Feedback
– [Sedhain et al., RecSys 2014]

Combining ratings with other data sources improves performance

9

Multiple Recommenders

Combining predictions of multiple recommenders also improves performance

“Predictive accuracy is substantially improved when blending multiple predictors” [Bell et al., The BellKor Solution to the Netflix Prize, 2007]

See also:
• [Jahrer et al., KDD 2010]
• [Burke, In The Adaptive Web, 2007]

10

Desiderata for Hybrid Systems

• To get the best performance, we should make use of all available data sources and algorithms

• We need a framework that is:
– General
   • Combines arbitrary data modalities
   • Combines multiple recommenders
   • Problem- and data-agnostic
– Extensible to new information sources/recommenders
– Scalable to large data sets

13

General Hybrid Recommenders in the Literature

• Existing hybrid systems, though powerful, typically fall short on either generality, extensibility, or scalability
– Often combine collaborative and/or content-based methods with each other or just one other data modality (cf. previous slides)
– Some systems can leverage heterogeneous data
   • [Gemmell et al. 2012, Burke et al. 2014, Yu et al. 2014]

• Probabilistic graphical modeling approaches are typically more general, less scalable
– Bayesian networks [de Campos et al., IJAR 51(7) 2010]
– Markov logic networks [Hoxha & Rettinger, ICMLA 2013]

15

Our Approach

• A general, extensible, scalable recommender framework

• Leverages advances in statistical relational learning
– Probabilistic soft logic [Bach et al., UAI 2013, ArXiv 2015]

• Inspired by recent work in drug-target interaction prediction [Fakhraei et al., Transactions on Computational Biology and Bioinformatics 11(5) 2014]

We propose HyPER: Hybrid Probabilistic Extensible Recommender

18

Hybrid Modeling with HyPER

[Diagram labels: Data Source 1, Data Source 2, …, Data Source N; Recommender 1, Recommender 2, …, Recommender M; HyPER; Predicted Ratings (example values 3, 4)]

19

HyPER: High-Level Approach

• User-item ratings viewed as a weighted bipartite graph

• Build hybrid model by adding links to encode additional information
– multiple user and item similarities, social information, …

• Predict ratings by reasoning over the graph, via a graphical model

22

Extended Recommendation Graph

27

Modeling and Reasoning over the Graph

• Hinge-loss Markov random fields (HL-MRFs) [Bach et al., UAI 2013]
– Exact, efficient, and scalable inference
– Continuous random variables
– Models defined by PSL programs

• Probabilistic Soft Logic (PSL) [Bach et al., ArXiv 2015]
– Statistical relational learning system
– Logical probabilistic programming interface
– Templating language for HL-MRFs

35

Hinge-loss Markov Random Fields

• Conditional random field over continuous random variables between 0 and 1
• Feature functions are hinge-loss functions
• Hinge losses encode the distance to satisfaction for each instantiated rule

[Equation image not captured in the transcript; surviving annotations: "Linear function", "2"]
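
The density shown on this slide did not survive extraction. For reference, the standard HL-MRF form from the cited work [Bach et al., UAI 2013] can be written as below. The symbol names (Y for the unobserved variables, X for the observed ones, w_j for rule weights, ℓ_j for the linear functions, p_j for the exponents) are notation assumed here, not taken from the slide; the "Linear function" and "2" annotations plausibly refer to ℓ_j and to the optional squaring.

```latex
P(\mathbf{Y} \mid \mathbf{X})
  = \frac{1}{Z(\mathbf{w}, \mathbf{X})}
    \exp\Big( -\sum_{j=1}^{m} w_j \, \phi_j(\mathbf{Y}, \mathbf{X}) \Big),
\qquad
\phi_j(\mathbf{Y}, \mathbf{X})
  = \big( \max\{ \ell_j(\mathbf{Y}, \mathbf{X}),\, 0 \} \big)^{p_j},
\quad p_j \in \{1, 2\},
\quad \mathbf{Y} \in [0, 1]^n
```

Each \ell_j is linear in Y and X and each w_j is non-negative, so the exponent is a convex function of Y.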

36

Efficient Inference in HL-MRFs

• Energy function is convex, can find a global MAP state

• The alternating direction method of multipliers (ADMM) is used for efficient and scalable inference

37

Probabilistic Soft Logic

• Statistical relational learning language
• Uses first-order logical rules
• Templates HL-MRFs

w : LikesGenre(U, G) && IsGenre(M, G) → Rating(U, M)

[Rule annotations: weight (w); predicates (LikesGenre, IsGenre, Rating); logical operators (&&, →)]

42

Probabilistic Soft Logic

• Converts rules to hinge-loss potentials
• PSL program = rules + data
• Open source: http://psl.umiacs.umd.edu

Rule: LikesGenre(U, G) && IsGenre(M, G) → Rating(U, M)

Hinge-loss: max{LikesGenre(U, G) + IsGenre(M, G) - Rating(U, M) - 1, 0}
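
The conversion from a rule to a hinge-loss potential can be checked numerically. The sketch below is plain Python, not the PSL API: the function names and the toy truth values are hypothetical. It uses the Lukasiewicz conjunction for the rule body, which reproduces the max{...} expression above.

```python
def lukasiewicz_and(*truth_values):
    """Lukasiewicz conjunction of soft truth values in [0, 1]."""
    return max(0.0, sum(truth_values) - (len(truth_values) - 1))


def distance_to_satisfaction(body, head):
    """Hinge loss for the rule body -> head.

    body: list of soft truth values of the conjoined body atoms
    head: soft truth value of the head atom
    """
    return max(0.0, lukasiewicz_and(*body) - head)


# Hypothetical ground atoms for one (user, genre, movie) triple.
likes_genre, is_genre, rating = 0.9, 1.0, 0.4

# Equals max{0.9 + 1.0 - 0.4 - 1, 0} up to floating-point rounding.
penalty = distance_to_satisfaction([likes_genre, is_genre], rating)
print(round(penalty, 3))  # 0.5
```

The penalty is zero whenever the head is at least as true as the body, and it grows linearly with the size of the violation.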

44

Recommendations with HyPER

• Similar items get similar ratings from a user
– e.g. cosine, adjusted cosine, Pearson, content

SimilarItems(i1, i2)
Rating(u, i1) = 5
Rating(u, i2) = ?

SimilarItems_sim(i1, i2) && Rating(u, i1) → Rating(u, i2)
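
As an illustration of one of the similarity measures named above, here is a minimal cosine-similarity sketch over item rating vectors. The dict-based layout, the user ids, and the convention that unrated entries count as 0 are assumptions made for this example; the slides do not say how the similarities were computed or thresholded.

```python
import math


def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity between two items' rating vectors.

    Each argument maps user id -> rating; unrated entries count as 0.
    """
    common = set(ratings_a) & set(ratings_b)
    dot = sum(ratings_a[u] * ratings_b[u] for u in common)
    norm_a = math.sqrt(sum(r * r for r in ratings_a.values()))
    norm_b = math.sqrt(sum(r * r for r in ratings_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an item with no ratings is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical toy ratings, already scaled to [0, 1].
item1 = {"ann": 1.0, "bob": 0.5}
item2 = {"ann": 0.8, "cat": 0.6}
sim = cosine_similarity(item1, item2)
```

A value like `sim` could then serve as the observed truth value of SimilarItems_cosine(i1, i2) in the rule above.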

45

Recommendations with HyPER

• Similar users give similar ratings to an item
– e.g. cosine, Pearson

SimilarUsers(u1, u2)
Rating(u1, i) = 4
Rating(u2, i) = ?

SimilarUsers_sim(u1, u2) && Rating(u1, i) → Rating(u2, i)

49

Recommendations with HyPER

• Mean-centering priors
   AverageUserRating(u) → Rating(u, i)
   AverageItemRating(i) → Rating(u, i)

• Social network links
   Friends(u1, u2) && Rating(u1, i) → Rating(u2, i)

• Leveraging existing recommenders (e.g. matrix factorization, item-based)
   RatingRecommender(u, i) → Rating(u, i)

Extensible to new data/algorithms – just add rules!
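
To see how several weighted rules jointly determine one unknown rating, the toy sketch below minimizes the summed hinge losses for a single Rating(u, i). Two assumptions are not on the slides: each rule body → Rating is paired with a negated counterpart !body → !Rating (otherwise a rating of 1 would satisfy every rule trivially), and a simple grid search stands in for the ADMM inference that PSL uses. The weights and body truth values are made up.

```python
def hinge(x):
    return max(0.0, x)


def energy(r, signals, squared=False):
    """Weighted distance to satisfaction for a candidate rating r.

    signals: list of (weight, body_truth_value) pairs, one per ground rule.
    """
    p = 2 if squared else 1
    total = 0.0
    for weight, body in signals:
        total += weight * hinge(body - r) ** p  # body -> Rating
        total += weight * hinge(r - body) ** p  # !body -> !Rating (assumed pair)
    return total


def map_rating(signals, squared=False, steps=1000):
    """Grid-search minimizer of the convex energy over r in [0, 1]."""
    grid = [k / steps for k in range(steps + 1)]
    return min(grid, key=lambda r: energy(r, signals, squared))


# Hypothetical evidence: user average, a friend's rating, an MF prediction.
signals = [(1.0, 0.8), (1.0, 0.6), (2.0, 0.7)]
linear_estimate = map_rating(signals)
squared_estimate = map_rating(signals, squared=True)
```

With linear hinges the minimizer is a weighted median of the body values; with squared hinges it is their weighted mean. Both equal 0.7 for this toy input. Adding a new data source only appends another (weight, value) pair, which mirrors the "just add rules" point.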

50

Balancing the Rules

• Balancing done through weights w_j

• Higher w_j indicates a more important rule

• Weights learned by approximating a gradient step in the conditional log-likelihood
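
The equation on this slide was not captured. A hedged reconstruction, following the weight-learning approach described for HL-MRFs in [Bach et al., ArXiv 2015], is given below. The notation is assumed: \phi_j is the hinge-loss potential of rule j, Y the training ratings, X the observed data, and Y^{\star} the MAP state under the current weights, which stands in for the intractable expectation.

```latex
\frac{\partial \log P(\mathbf{Y} \mid \mathbf{X})}{\partial w_j}
  = \mathbb{E}_{\mathbf{w}}\big[ \phi_j(\mathbf{Y}, \mathbf{X}) \big]
    - \phi_j(\mathbf{Y}, \mathbf{X})
  \;\approx\;
    \phi_j(\mathbf{Y}^{\star}, \mathbf{X}) - \phi_j(\mathbf{Y}, \mathbf{X})
```

Read this way, a rule's weight increases when the training data violates the rule less than the current model's MAP prediction does.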

51

Experimental Validation

• Yelp academic dataset (https://www.yelp.com/academic_dataset)
– ~34k users, ~3.6k items, ~99k ratings
– ~81k friendships
– 514 business categories

• Last.fm (http://grouplens.org/datasets/hetrec-2011/)
– ~1.8k users, ~17k items, ~92k ratings
– ~12k friendships
– ~9.7k artist tags

• Evaluation metrics: RMSE, MAE
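
For readers unfamiliar with the two metrics, a minimal sketch follows. The sample predictions are invented for illustration and are not results from the talk.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def mae(predicted, actual):
    """Mean absolute error between two equal-length rating lists."""
    absolute_errors = [abs(p - a) for p, a in zip(predicted, actual)]
    return sum(absolute_errors) / len(absolute_errors)


# Hypothetical predictions against held-out ratings on a 1-5 scale.
predicted = [3.0, 4.0, 5.0]
actual = [3.0, 5.0, 3.0]
print(mae(predicted, actual))             # 1.0
print(round(rmse(predicted, actual), 3))  # 1.291
```

RMSE penalizes large errors more heavily than MAE, which is why the two are usually reported together.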

52

Baselines

• Collaborative filtering systems
– Item-based, cf. [Ning et al., In Recommender Systems Handbook, 2015]
– Matrix factorization (MF), cf. [Koren et al., IEEE Computer 42(8) 2009]
– Bayesian probabilistic matrix factorization (BPMF) [Salakhutdinov & Mnih, ICML 2008]

• Hybrid systems
– Naïve hybrid (averaged predictions)
– BPMF with social relations and content (BPMF-SRIC) [Liu et al., DSS 55(3) 2013]

53

HyPER vs Baselines

• HyPER outperforms all other models in both datasets
• Results are statistically significant

54

HyPER Submodels: Mean-centering

• HyPER combined model beats individual rules

55

HyPER Submodels: User-based

• HyPER combined model beats/matches best individual rules
• Similar story for item-based, content & social

56

Combining the Baselines

• HyPER can combine different recommenders effectively
• Improvements are statistically significant

57

HyPER (All Rules)

• Combining all rules achieves the best performance in both datasets

58

Scaling to Large Datasets

• Parallel implementation for inference and learning based on ADMM [Bach et al., UAI 2013]

• Scaling to big-data applications:
– perform inference in parallel on densely connected subgraphs of the original graph
– fully distributed implementation of ADMM

59

Conclusions

• HyPER is a general-purpose, extensible framework for hybrid recommender systems

• With HyPER, practitioners can define custom hybrid models that use all available data/algorithms, via logical rules in PSL

• HyPER outperforms existing techniques on two popular datasets

Thank you for your attention!

61

HyPER Submodels – Item-based, Content & Social

62

References

X. Ning, C. Desrosiers, and G. Karypis. A comprehensive survey of neighborhood-based recommendation methods. In Recommender Systems Handbook, 2nd edition. Springer, 2015.

S. Fakhraei, B. Huang, L. Raschid, and L. Getoor. Network-based drug-target interaction prediction with probabilistic soft logic. Transactions on Computational Biology and Bioinformatics, 11(5), 2014.

J. Liu, C. Wu, and W. Liu. Bayesian probabilistic matrix factorization with social relations and item contents for recommendation. Decision Support Systems, 55(3), 2013.

R. Salakhutdinov and A. Mnih. Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. In ICML, 2008.

Y. Koren, R. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. IEEE Computer, 42(8), 2009.

A. Gunawardana and C. Meek. A unified approach to building hybrid recommender systems. In RecSys, 2009.

R. Burke. Hybrid web recommender systems. In The Adaptive Web. Springer, 2007.

L. de Campos, J. Fernandez-Luna, J. Huete, and M. Rueda-Morales. Combining content-based and collaborative recommendations: A hybrid approach based on Bayesian networks. International Journal of Approximate Reasoning, 51(7), 2010.

M. Jahrer, A. Toscher, and R. Legenstein. Combining predictions for accurate recommender systems. In KDD, 2010.

63

References

J. Hoxha and A. Rettinger. First-order probabilistic model for hybrid recommendations. In ICMLA, 2013.

S. H. Bach, B. Huang, B. London, and L. Getoor. Hinge-loss Markov random fields: Convex inference for structured prediction. In UAI, 2013.

S. H. Bach, M. Broecheler, B. Huang, and L. Getoor. Hinge-loss Markov random fields and probabilistic soft logic. ArXiv:1505.04406 [cs.LG], 2015.

A. P. Forbes and M. Zhu. Content-boosted matrix factorization for recommender systems: Experiments with recipe recommendation. In RecSys, 2011.

J. Chen, G. Chen, H. Zhang, J. Huang, and G. Zhao. Social recommendation based on multi-relational analysis. In WI-IAT, 2012.

R. Burke, F. Vahedian, and B. Mobasher. Hybrid recommendation in heterogeneous networks. In User Modeling, Adaptation, and Personalization. Springer, 2014.

J. Gemmell, T. S., B. Mobasher, and R. Burke. Resource recommendation in social annotation systems: A linear-weighted hybrid approach. Journal of Computer and System Sciences, 78(4), 2012.

X. Yu, X. Ren, Y. Sun, Q. Gu, B. Sturt, U. Khandelwal, B. Norick, and J. Han. Personalized entity recommendation: A heterogeneous information network approach. In WSDM, 2014.

H. Ma, D. Zhou, C. Liu, M. R. Lyu, and I. King. Recommender systems with social regularization. In WSDM, 2011.

J. McAuley and J. Leskovec. Hidden factors and hidden topics: Understanding rating dimensions with review text. In RecSys, 2013.

G. Ling, M. R. Lyu, and I. King. Ratings meet reviews, a combined approach to recommend. In RecSys, 2014.

I. Guy, N. Zwerdling, I. Ronen, D. Carmel, and E. Uziel. Social media recommendation based on people and tags. In SIGIR, 2010.

S. Sedhain, S. Sanner, D. Braziunas, L. Xie, and J. Christensen. Social collaborative filtering for cold-start recommendations. In RecSys, 2014.
