recommendation systems=1for large data setsjakehofman.com/tmp/cu_recsys_20110404.pdf ·...

38
Recommendation systems 1 Jake Hofman Yahoo! Research April 4, 2011 1 For large data sets @jakehofman (Yahoo! Research) Recommendation systems April 4, 2011 1 / 31

Upload: others

Post on 03-May-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Recommendation systems=1For large data setsjakehofman.com/tmp/cu_recsys_20110404.pdf · 2011-04-13 · •Alternating least squares: update entire matrix P, holding Q fixed, and

Recommendation systems1

Jake Hofman

Yahoo! Research

April 4, 2011

1For large data sets@jakehofman (Yahoo! Research) Recommendation systems April 4, 2011 1 / 31

Page 2: Recommendation systems=1For large data setsjakehofman.com/tmp/cu_recsys_20110404.pdf · 2011-04-13 · •Alternating least squares: update entire matrix P, holding Q fixed, and

Personalized recommendations

@jakehofman (Yahoo! Research) Recommendation systems April 4, 2011 2 / 31

Page 3: Recommendation systems=1For large data setsjakehofman.com/tmp/cu_recsys_20110404.pdf · 2011-04-13 · •Alternating least squares: update entire matrix P, holding Q fixed, and

Personalized recommendations

@jakehofman (Yahoo! Research) Recommendation systems April 4, 2011 3 / 31

Page 4: Recommendation systems=1For large data setsjakehofman.com/tmp/cu_recsys_20110404.pdf · 2011-04-13 · •Alternating least squares: update entire matrix P, holding Q fixed, and

http://netflixprize.com

@jakehofman (Yahoo! Research) Recommendation systems April 4, 2011 4 / 31

Page 5: Recommendation systems=1For large data setsjakehofman.com/tmp/cu_recsys_20110404.pdf · 2011-04-13 · •Alternating least squares: update entire matrix P, holding Q fixed, and

http://netflixprize.com/rules

@jakehofman (Yahoo! Research) Recommendation systems April 4, 2011 5 / 31

Page 6: Recommendation systems=1For large data setsjakehofman.com/tmp/cu_recsys_20110404.pdf · 2011-04-13 · •Alternating least squares: update entire matrix P, holding Q fixed, and

http://netflixprize.com/faq

@jakehofman (Yahoo! Research) Recommendation systems April 4, 2011 6 / 31

Page 7: Recommendation systems=1For large data setsjakehofman.com/tmp/cu_recsys_20110404.pdf · 2011-04-13 · •Alternating least squares: update entire matrix P, holding Q fixed, and

Netflix prize: results

http://en.wikipedia.org/wiki/Netflix Prize

@jakehofman (Yahoo! Research) Recommendation systems April 4, 2011 7 / 31

Page 8: Recommendation systems=1For large data setsjakehofman.com/tmp/cu_recsys_20110404.pdf · 2011-04-13 · •Alternating least squares: update entire matrix P, holding Q fixed, and

Netflix prize: results

See [TJB09] and [Kor09] for more gory details.

@jakehofman (Yahoo! Research) Recommendation systems April 4, 2011 8 / 31

Page 9: Recommendation systems=1For large data setsjakehofman.com/tmp/cu_recsys_20110404.pdf · 2011-04-13 · •Alternating least squares: update entire matrix P, holding Q fixed, and

Recommendation systems

High-level approaches:

• Content-based methods(e.g., wgenre: thrillers = +2.3, wdirector: coen brothers = +1.7)

• Collaborative methods(e.g., “Users who liked this also liked”)

@jakehofman (Yahoo! Research) Recommendation systems April 4, 2011 9 / 31

Page 10: Recommendation systems=1For large data setsjakehofman.com/tmp/cu_recsys_20110404.pdf · 2011-04-13 · •Alternating least squares: update entire matrix P, holding Q fixed, and

Netflix prize: data

(userid, movieid, rating, date)

@jakehofman (Yahoo! Research) Recommendation systems April 4, 2011 10 / 31

Page 11: Recommendation systems=1For large data setsjakehofman.com/tmp/cu_recsys_20110404.pdf · 2011-04-13 · •Alternating least squares: update entire matrix P, holding Q fixed, and

Netflix prize: data

(movieid, year, title)

@jakehofman (Yahoo! Research) Recommendation systems April 4, 2011 10 / 31

Page 12: Recommendation systems=1For large data setsjakehofman.com/tmp/cu_recsys_20110404.pdf · 2011-04-13 · •Alternating least squares: update entire matrix P, holding Q fixed, and

Recommendation systems

High-level approaches:

• Content-based methods(e.g., wgenre: thrillers = +2.3, wdirector: coen brothers = +1.7)

• Collaborative methods(e.g., “Users who liked this also liked”)

@jakehofman (Yahoo! Research) Recommendation systems April 4, 2011 11 / 31

Page 13: Recommendation systems=1For large data setsjakehofman.com/tmp/cu_recsys_20110404.pdf · 2011-04-13 · •Alternating least squares: update entire matrix P, holding Q fixed, and

Collaborative filtering

Memory-based(e.g., k-nearest neighbors)

Model-based(e.g., matrix factorization)

http://research.yahoo.com/pub/2859

@jakehofman (Yahoo! Research) Recommendation systems April 4, 2011 12 / 31

Page 14: Recommendation systems=1For large data setsjakehofman.com/tmp/cu_recsys_20110404.pdf · 2011-04-13 · •Alternating least squares: update entire matrix P, holding Q fixed, and

Problem statement

• Given a set of past ratings Rui that user u gave item i• Users may explicitly assign ratings, e.g., Rui ∈ [1, 5] is number

of stars for movie rating• Or we may infer implicit ratings from user actions, e.g.

Rui = 1 if u purchased i ; otherwise Rui = ?

• Make recommendations of several forms• Predict unseen item ratings for a particular user• Suggest items for a particular user• Suggest items similar to a particular item• . . .

• Compare to natural baselines• Guess global average for item ratings• Suggest globally popular items

@jakehofman (Yahoo! Research) Recommendation systems April 4, 2011 13 / 31

Page 15: Recommendation systems=1For large data setsjakehofman.com/tmp/cu_recsys_20110404.pdf · 2011-04-13 · •Alternating least squares: update entire matrix P, holding Q fixed, and

Problem statement

• Given a set of past ratings Rui that user u gave item i• Users may explicitly assign ratings, e.g., Rui ∈ [1, 5] is number

of stars for movie rating• Or we may infer implicit ratings from user actions, e.g.

Rui = 1 if u purchased i ; otherwise Rui = ?

• Make recommendations of several forms• Predict unseen item ratings for a particular user• Suggest items for a particular user• Suggest items similar to a particular item• . . .

• Compare to natural baselines• Guess global average for item ratings• Suggest globally popular items

@jakehofman (Yahoo! Research) Recommendation systems April 4, 2011 13 / 31

Page 16: Recommendation systems=1For large data setsjakehofman.com/tmp/cu_recsys_20110404.pdf · 2011-04-13 · •Alternating least squares: update entire matrix P, holding Q fixed, and

Problem statement

• Given a set of past ratings Rui that user u gave item i• Users may explicitly assign ratings, e.g., Rui ∈ [1, 5] is number

of stars for movie rating• Or we may infer implicit ratings from user actions, e.g.

Rui = 1 if u purchased i ; otherwise Rui = ?

• Make recommendations of several forms• Predict unseen item ratings for a particular user• Suggest items for a particular user• Suggest items similar to a particular item• . . .

• Compare to natural baselines• Guess global average for item ratings• Suggest globally popular items

@jakehofman (Yahoo! Research) Recommendation systems April 4, 2011 13 / 31

Page 17: Recommendation systems=1For large data setsjakehofman.com/tmp/cu_recsys_20110404.pdf · 2011-04-13 · •Alternating least squares: update entire matrix P, holding Q fixed, and

k-nearest neighbors

Key intuition:Take a local popularity vote amongst “similar” users

@jakehofman (Yahoo! Research) Recommendation systems April 4, 2011 14 / 31

Page 18: Recommendation systems=1For large data setsjakehofman.com/tmp/cu_recsys_20110404.pdf · 2011-04-13 · •Alternating least squares: update entire matrix P, holding Q fixed, and

k-nearest neighborsUser similarity

Quantify similarity as a function of users’ past ratings, e.g.

• Fraction of items u and v have in common

Suv =|Ru ∩ Rv ||Ru ∪ Rv |

=

∑i RuiRvi∑

i (Rui + Rvi − RuiRvi )(1)

Retain top-k most similar neighbors v for each user u

@jakehofman (Yahoo! Research) Recommendation systems April 4, 2011 15 / 31

Page 19: Recommendation systems=1For large data setsjakehofman.com/tmp/cu_recsys_20110404.pdf · 2011-04-13 · •Alternating least squares: update entire matrix P, holding Q fixed, and

k-nearest neighborsUser similarity

Quantify similarity as a function of users’ past ratings, e.g.

• Angle between rating vectors

Suv =Ru · Rv

|Ru| |Rv |=

∑i RuiRvi√∑

i R2ui

∑j R2

vj

(1)

Retain top-k most similar neighbors v for each user u

@jakehofman (Yahoo! Research) Recommendation systems April 4, 2011 15 / 31

Page 20: Recommendation systems=1For large data setsjakehofman.com/tmp/cu_recsys_20110404.pdf · 2011-04-13 · •Alternating least squares: update entire matrix P, holding Q fixed, and

k-nearest neighborsPredicted ratings

Predict unseen ratings R̂ui as a weighted vote over u’s neighbors’ratings for item i

R̂ui =

∑v RviSuv∑

v Suv(2)

@jakehofman (Yahoo! Research) Recommendation systems April 4, 2011 16 / 31

Page 21: Recommendation systems=1For large data setsjakehofman.com/tmp/cu_recsys_20110404.pdf · 2011-04-13 · •Alternating least squares: update entire matrix P, holding Q fixed, and

k-nearest neighborsPractical notes

We expect most users have nothing in common, so calculatesimilarities as:

for each item i :for all pairs of users u, v that have rated i :

calculate Suv (if not already calculated)

@jakehofman (Yahoo! Research) Recommendation systems April 4, 2011 17 / 31

Page 22: Recommendation systems=1For large data setsjakehofman.com/tmp/cu_recsys_20110404.pdf · 2011-04-13 · •Alternating least squares: update entire matrix P, holding Q fixed, and

k-nearest neighborsPractical notes

Alternatively, we can make recommendations using an item-basedapproach [LSY03]:

• Compute similarities Sij between all pairs of items

• Predict ratings with a weighted vote R̂ui =∑

j RujSij/∑

j Sij

@jakehofman (Yahoo! Research) Recommendation systems April 4, 2011 17 / 31

Page 23: Recommendation systems=1For large data setsjakehofman.com/tmp/cu_recsys_20110404.pdf · 2011-04-13 · •Alternating least squares: update entire matrix P, holding Q fixed, and

k-nearest neighborsPractical notes

Several (relatively) simple ways to scale:

• Sample a subset of ratings for each user (by, e.g., recency)

• Use MinHash to cluster users [DDGR07]

• Distribute calculations with MapReduce

@jakehofman (Yahoo! Research) Recommendation systems April 4, 2011 17 / 31

Page 24: Recommendation systems=1For large data setsjakehofman.com/tmp/cu_recsys_20110404.pdf · 2011-04-13 · •Alternating least squares: update entire matrix P, holding Q fixed, and

Matrix factorization

Key intuition:Model item attributes as belonging to a set of unobserved “topics

and user preferences across these “topics”

@jakehofman (Yahoo! Research) Recommendation systems April 4, 2011 18 / 31

Page 25: Recommendation systems=1For large data setsjakehofman.com/tmp/cu_recsys_20110404.pdf · 2011-04-13 · •Alternating least squares: update entire matrix P, holding Q fixed, and

Matrix factorizationLinear model

Start with a simple linear model:

R̂ui = b0︸︷︷︸global average

+ bu︸︷︷︸user bias

+ bi︸︷︷︸item bias

(3)

@jakehofman (Yahoo! Research) Recommendation systems April 4, 2011 19 / 31

Page 26: Recommendation systems=1For large data setsjakehofman.com/tmp/cu_recsys_20110404.pdf · 2011-04-13 · •Alternating least squares: update entire matrix P, holding Q fixed, and

Matrix factorizationLinear model

For example, we might predict that a harsh critic would score apopular movie as

R̂ui = 3.6︸︷︷︸global average

+ −0.5︸︷︷︸user bias

+ 0.8︸︷︷︸item bias

(3)

= 3.9 (4)

@jakehofman (Yahoo! Research) Recommendation systems April 4, 2011 19 / 31

Page 27: Recommendation systems=1For large data setsjakehofman.com/tmp/cu_recsys_20110404.pdf · 2011-04-13 · •Alternating least squares: update entire matrix P, holding Q fixed, and

Matrix factorizationLow-rank approximation

Add an interaction term:

R̂ui = b0︸︷︷︸global average

+ bu︸︷︷︸user bias

+ bi︸︷︷︸item bias

+ Wui︸︷︷︸user-item interaction

(5)

where Wui = Pu · Qi =∑

k PukQik

• Puk is user u’s preference for topic k

• Qik is item i ’s association with topic k

@jakehofman (Yahoo! Research) Recommendation systems April 4, 2011 20 / 31

Page 28: Recommendation systems=1For large data setsjakehofman.com/tmp/cu_recsys_20110404.pdf · 2011-04-13 · •Alternating least squares: update entire matrix P, holding Q fixed, and

Matrix factorizationLoss function

Measure quality of model fit with squared-loss:

L =∑(u,i)

(R̂ui − Rui

)2(6)

=∑(u,i)

([PQ]ui − Rui )2 (7)

@jakehofman (Yahoo! Research) Recommendation systems April 4, 2011 21 / 31

Page 29: Recommendation systems=1For large data setsjakehofman.com/tmp/cu_recsys_20110404.pdf · 2011-04-13 · •Alternating least squares: update entire matrix P, holding Q fixed, and

Matrix factorizationOptimization

The loss is non-convex in (P,Q), so no global minimum exists

Instead we can optimize L iteratively, e.g.:

• Alternating least squares: update entire matrix P, holding Qfixed, and vice-versa

• Stochastic gradient descent: update individual rows Pu andQi for each observed Rui

@jakehofman (Yahoo! Research) Recommendation systems April 4, 2011 22 / 31

Page 30: Recommendation systems=1For large data setsjakehofman.com/tmp/cu_recsys_20110404.pdf · 2011-04-13 · •Alternating least squares: update entire matrix P, holding Q fixed, and

Matrix factorizationAlternating least squares

L is convex in P (Q fixed), so take partials and solve linear systemfor updates:

0 =∂L∂P

→ P̂ = RQ[QTQ

]−1(8)

0 =∂L∂Q

→ Q̂ = RTP[PTP

]−1(9)

@jakehofman (Yahoo! Research) Recommendation systems April 4, 2011 23 / 31

Page 31: Recommendation systems=1For large data setsjakehofman.com/tmp/cu_recsys_20110404.pdf · 2011-04-13 · •Alternating least squares: update entire matrix P, holding Q fixed, and

Matrix factorizationStochastic gradient descent

Alternatively, we can avoid inverting matrices by taking steps inthe direction of the negative gradient for each observed rating:

Pu ← Pu − η∂L∂Pu

= Pu +(Rui − R̂ui

)Qi (10)

Qi ← Qi − η∂L∂Qi

= Qi +(Rui − R̂ui

)Pu (11)

for some step-size η

@jakehofman (Yahoo! Research) Recommendation systems April 4, 2011 24 / 31

Page 32: Recommendation systems=1For large data setsjakehofman.com/tmp/cu_recsys_20110404.pdf · 2011-04-13 · •Alternating least squares: update entire matrix P, holding Q fixed, and

Matrix factorizationPractical notes

Several ways to scale:

• Distribute matrix operations with MapReduce [GHNS11]

• Parallelize stochastic gradient descent [ZWSL10]

• Expectation-maximization for pLSI with MapReduce[DDGR07]

@jakehofman (Yahoo! Research) Recommendation systems April 4, 2011 25 / 31

Page 33: Recommendation systems=1For large data setsjakehofman.com/tmp/cu_recsys_20110404.pdf · 2011-04-13 · •Alternating least squares: update entire matrix P, holding Q fixed, and

Datasets

• Movielenshttp://www.grouplens.org/node/12

• Reddithttp://bit.ly/redditdata

• CU “million songs”http://labrosa.ee.columbia.edu/millionsong/

• Yahoo Music KDDcuphttp://kddcup.yahoo.com/

• AudioScrobblerhttp://bit.ly/audioscrobblerdata

• Delicioushttp://bit.ly/deliciousdata

• . . .

@jakehofman (Yahoo! Research) Recommendation systems April 4, 2011 26 / 31

Page 34: Recommendation systems=1For large data setsjakehofman.com/tmp/cu_recsys_20110404.pdf · 2011-04-13 · •Alternating least squares: update entire matrix P, holding Q fixed, and

Photo recommendations

http://koala.sandbox.yahoo.com

@jakehofman (Yahoo! Research) Recommendation systems April 4, 2011 27 / 31

Page 35: Recommendation systems=1For large data setsjakehofman.com/tmp/cu_recsys_20110404.pdf · 2011-04-13 · •Alternating least squares: update entire matrix P, holding Q fixed, and

Thanks. Questions?2

[email protected]@jakehofman (Yahoo! Research) Recommendation systems April 4, 2011 28 / 31

Page 36: Recommendation systems=1For large data setsjakehofman.com/tmp/cu_recsys_20110404.pdf · 2011-04-13 · •Alternating least squares: update entire matrix P, holding Q fixed, and

References I

RM Bell, Y Koren, and C Volinsky.The bellkor 2008 solution to the netflix prize.Statistics Research Department at AT&T Research, 2008.

AS Das, M Datar, A Garg, and S Rajaram.Google news personalization: scalable online collaborativefiltering.page 280, 2007.

R Gemulla, PJ Haas, E Nijkamp, and Y Sismanis.Large-scale matrix factorization with distributed stochasticgradient descent.2011.

JL Herlocker, JA Konstan, LG Terveen, and JT Riedl.Evaluating collaborative filtering recommender systems.ACM Transactions on Information Systems, 22(1):5–53, 2004.

@jakehofman (Yahoo! Research) Recommendation systems April 4, 2011 29 / 31

Page 37: Recommendation systems=1For large data setsjakehofman.com/tmp/cu_recsys_20110404.pdf · 2011-04-13 · •Alternating least squares: update entire matrix P, holding Q fixed, and

References II

T Hofmann.Latent semantic models for collaborative filtering.ACM Transactions on Information Systems, 22(1):89–115,2004.

Yehuda Koren, Robert Bell, and Chris Volinsky.Matrix factorization techniques for recommender systems.Computer, 42(8):30–37, 2009.

Yehuda Koren.The bellkor solution to the netflix grand prize.pages 1–10, Aug 2009.

G Linden, B Smith, and J York.Amazon. com recommendations: Item-to-item collaborativefiltering.IEEE Internet computing, 7(1):76–80, 2003.

@jakehofman (Yahoo! Research) Recommendation systems April 4, 2011 30 / 31

Page 38: Recommendation systems=1For large data setsjakehofman.com/tmp/cu_recsys_20110404.pdf · 2011-04-13 · •Alternating least squares: update entire matrix P, holding Q fixed, and

References III

A Toscher, M Jahrer, and RM Bell.The bigchaos solution to the netflix grand prize.2009.

M. Zinkevich, M. Weimer, A. Smola, and L. Li.Parallelized stochastic gradient descent.In Neural Information Processing Systems (NIPS), 2010.

@jakehofman (Yahoo! Research) Recommendation systems April 4, 2011 31 / 31