TRANSCRIPT
Recommender Systems: Backgrounds &
Advances in Collaborative Filtering
Changsung Moon
Department of Computer Science
North Carolina State University
Amazon.com
Netflix
The Long Tail
Source: http://www.wired.com/2004/10/tail/
Information Overload
• Recommender systems help to match users with items
- Ease information overload
- Sales assistance (guidance, advisory, profit increase, ...)
Recommender Problem
• Recommender systems are a subclass of information filtering systems that seek to predict the ‘rating’ or ‘preference’ that a user would give to an item – Wikipedia
Recommender Trends
Data Mining Methods
• Recommender systems typically apply techniques and methodologies from general data mining
Types of Input
• Explicit Feedback
- Feedback that users directly report on their interest in items
- e.g. star ratings for movies
- e.g. thumbs-up/down for TV shows
• Implicit Feedback
- Feedback that indirectly reflects opinions, inferred by observing user behavior
- e.g. purchase history, browsing history, or search patterns
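As a minimal sketch of turning implicit feedback into usable input (the event log and item names are hypothetical), events such as purchases can be aggregated into a user-item interaction matrix:

```python
from collections import Counter

# Hypothetical implicit-feedback log: each event is (user, item)
events = [
    ("alice", "laptop"), ("alice", "mouse"), ("alice", "mouse"),
    ("bob", "laptop"), ("bob", "keyboard"),
]

# Aggregate repeated events into counts: a higher count is a stronger signal
interactions = Counter(events)

users = sorted({u for u, _ in events})
items = sorted({i for _, i in events})

# Dense user-item matrix of implicit-feedback counts
matrix = [[interactions[(u, i)] for i in items] for u in users]
```

Unlike star ratings, these counts carry no negative feedback; they only indicate presence and strength of interaction.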
Collaborative Filtering
[Figure: users connected by similarity scores (0.95, 0.75, 0.55, 0.50); items liked by the most similar users are recommended to the active user]
Pros of Collaborative Filtering
• Requires minimal knowledge-engineering effort
• Does not need to consider the content of items
• Produces good-enough results in most cases
• Serendipity of results
Challenges for Collaborative Filtering
• Sparsity
- Usually the vast majority of ratings are unknown
- e.g. 99% of ratings are missing in Netflix data
• Scalability
- Nearest neighbor techniques require computation that grows with both the number of users
and the number of items
• Cold Start Problem
- New items and new users can cause the cold-start problem, as there will be insufficient data
for CF to work accurately
Challenges for Collaborative Filtering
• Popularity Bias
- Tends to recommend popular items
• Synonyms
- The same or very similar items having different names or entries
- Topic modeling such as LDA could solve this by grouping different words belonging to the same topic
• Shilling Attacks
- People may give positive ratings to their own items and negative ratings to their competitors' items
Content-based Recommendation
• Based on information about the item itself, usually keywords or phrases occurring in the item
• Similarity between two items is measured as the similarity between their term vectors
• A user's profile can be built by analyzing the set of content the user has interacted with
• This makes it possible to compute the similarity between a user and an item
[Figure: an item the user bought and a similar item; the similar item is recommended]
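As a sketch of the term-vector similarity described above (the term vectors are hypothetical toy data), cosine similarity works for item-item and user-item comparisons alike, since user profiles live in the same term space:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency vectors ({term: count})."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# An item's term vector and a user profile aggregated from liked items
item = {"space": 3, "robot": 2}
user_profile = {"space": 1, "robot": 1, "comedy": 2}

score = cosine_similarity(user_profile, item)
```

Real systems would typically use TF-IDF weights rather than raw counts, but the similarity computation is the same.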
Pros/Cons of Content-based Approach
• Pros
- No need for data on other users: no cold-start or sparsity problems
- Able to recommend to users with unique tastes
- Able to recommend new and unpopular items
- Provides explanations by listing content features
• Cons
- In certain domains (e.g., music, blogs, and videos), it is complicated to generate the features for items
- Difficult to achieve serendipity
- Users only receive recommendations that are very similar to items they already liked or preferred
Hybrid Methods
• Weighted
- Outputs from several techniques are combined with different weights
• Switching
- Depending on situation, the system changes from one technique to another
• Mixed
- Outputs from several techniques are presented at the same time
• Cascade
- The output from one technique is used as input of another that refines the results
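A minimal sketch of the weighted variant (the component scorers and weights are hypothetical stand-ins): each component recommender scores a (user, item) pair, and the outputs are combined linearly.

```python
def weighted_hybrid(scorers, weights, user, item):
    """Combine the outputs of several recommenders with fixed weights."""
    return sum(w * score(user, item) for score, w in zip(scorers, weights))

# Stand-ins for a collaborative-filtering and a content-based recommender,
# each returning a score in [0, 1]
cf_score = lambda user, item: 0.8
content_score = lambda user, item: 0.4

# 70% weight on CF, 30% on content-based
final = weighted_hybrid([cf_score, content_score], [0.7, 0.3], "joe", "avengers")
# 0.7 * 0.8 + 0.3 * 0.4 = 0.68
```

In practice the weights themselves can be tuned on held-out data, which shades into the switching and meta-level variants described here.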
Hybrid Methods
• Feature Combination
- Features from different recommendation sources are combined as input to a single technique
• Feature Augmentation
- The output from one technique is used as input features to another
• Meta-level
- The model learned by one recommender is used as input to another
Two Main Techniques of CF
• Neighborhood Approach
- Relationships between items or between users
• Latent Factor Models
- Transforming both items and users to the same latent factor space
- Characterizing both items and users on factors inferred from user feedback
- pLSA
- neural networks
- Latent Dirichlet Allocation
- Matrix factorization (e.g. SVD-based models)
- ...
Latent Factor Models
• Find features that describe the characteristics of rated objects
• Item characteristics and user preferences are described with numerical factor values
[Figure: items and users placed on two example factors, such as Action vs. Comedy]
Latent Factor Models
• Items and users are associated with factor vectors
- Each item i is associated with a vector q_i
- Each user u is associated with a vector p_u
• The dot product q_i^T p_u captures user u's estimated interest in item i
• Challenge – How to compute a mapping of items and users to factor vectors?
• Approaches
- Matrix Factorization Models
- e.g. Singular Value Decomposition (SVD)
SVD
• R: N × M ratings matrix (N users, M movies)
• U: N × k matrix (N users, k factors)
• Σ: k × k diagonal matrix with the k largest singular values
• V^T: k × M matrix (k factors, M movies)
SVD
• Example: rank-2 factorization R ≈ U Σ V^T

R (4 users × 3 movies):
   5   5   1
   5   4   2
   1   2   2
   1   3   5

U (users × factors f1, f2):
  -0.63  -0.44
  -0.60  -0.23
  -0.25   0.25
  -0.43   0.83

Σ = diag(10.96, 4.39)

V^T (factors × movies):
  -0.62  -0.67  -0.41
  -0.52  -0.03   0.85
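The decomposition of the slide's toy rating matrix can be reproduced with NumPy (a sketch; only the two largest singular values are kept for the rank-2 approximation):

```python
import numpy as np

# The slide's toy rating matrix: 4 users x 3 movies
R = np.array([[5, 5, 1],
              [5, 4, 2],
              [1, 2, 2],
              [1, 3, 5]], dtype=float)

U, s, Vt = np.linalg.svd(R, full_matrices=False)

k = 2                                   # keep the two largest singular values
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(np.round(s[:k], 2))               # leading singular values (~10.96, ~4.39)
print(np.round(R_hat, 1))               # rank-2 approximation of R
```

Note that each singular vector is only determined up to a simultaneous sign flip of a column of U and the corresponding row of V^T, so individual signs may differ from the slide while U Σ V^T stays the same.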
SVD
[Figure: users and movies plotted in the two-dimensional factor space (f1, f2)]
SVD - Problems
• Conventional SVD has difficulties due to the high proportion of missing values in the user-item ratings matrix
• Imputation to fill in missing ratings
- Imputation can be very expensive as it significantly increases the amount of data
- Inaccurate imputation might distort the data
Matrix Factorization for Rating Prediction
• Modeling directly the observed ratings only
- K is the set of (u,i) pairs for which the rating r_ui is known
- Predicted rating: r̂_ui = q_i^T p_u
• To learn the factor vectors p_u and q_i, we minimize the squared error:
min_{q,p} Σ_{(u,i)∈K} (r_ui − q_i^T p_u)^2
Regularization
• To avoid overfitting, the model is regularized
- Learn the factor vectors p_u and q_i by minimizing:
min_{q,p} Σ_{(u,i)∈K} (r_ui − q_i^T p_u)^2 + λ(‖q_i‖^2 + ‖p_u‖^2)
- The constant λ, which controls the extent of regularization, is usually determined by cross-validation
- Minimization is typically performed by either stochastic gradient descent or alternating least squares
Learning Algorithms
• Stochastic gradient descent
- Modification of parameters (p_u, q_i) relative to the prediction error
- Error: e_ui = r_ui − q_i^T p_u (actual rating minus predicted rating)
- Updates: q_i ← q_i + γ(e_ui · p_u − λ · q_i),  p_u ← p_u + γ(e_ui · q_i − λ · p_u)
• Alternating least squares
- Allows massive parallelization
- Better for densely filled matrices
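A minimal sketch of stochastic gradient descent for the plain (bias-free) factor model, using the deck's toy rating matrix; the hyperparameters (k, γ, λ, epoch count) are illustrative choices, not tuned values:

```python
import numpy as np

def train_mf_sgd(ratings, n_users, n_items, k=2, gamma=0.01, lam=0.02, epochs=1000):
    """Learn user factors P and item factors Q from (u, i, r) triples by SGD."""
    rng = np.random.default_rng(0)
    P = rng.normal(scale=0.1, size=(n_users, k))   # p_u vectors
    Q = rng.normal(scale=0.1, size=(n_items, k))   # q_i vectors
    for _ in range(epochs):
        for u, i, r in ratings:
            e = r - Q[i] @ P[u]                    # prediction error e_ui
            p_u = P[u].copy()                      # update both with old values
            P[u] += gamma * (e * Q[i] - lam * P[u])
            Q[i] += gamma * (e * p_u - lam * Q[i])
    return P, Q

# Observed ratings from the toy matrix, flattened into (user, item, rating)
ratings = [(u, i, float(r))
           for u, row in enumerate([[5, 5, 1], [5, 4, 2], [1, 2, 2], [1, 3, 5]])
           for i, r in enumerate(row)]
P, Q = train_mf_sgd(ratings, n_users=4, n_items=3)
rmse = np.sqrt(np.mean([(r - Q[i] @ P[u]) ** 2 for u, i, r in ratings]))
```

With k=2 the learned factors approximate the rank-2 SVD fit of this matrix; real systems iterate over millions of ratings in random order per epoch.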
Simplified Illustration
First Two Vectors from Matrix Decomposition
Extended MF (Adding Biases)
• Biases
- Much of the variation in ratings is due to effects associated with either users or items, independently of their interactions
- e.g., some users tend to give higher ratings than others
- e.g., some items tend to receive higher ratings than others
- A baseline prediction for an unknown rating is denoted by b_ui = μ + b_i + b_u
- μ: the overall average rating over all items
- b_u and b_i: the observed deviations of user u and item i from the average
Extended MF (Adding Biases)
• Suppose that the average rating over all movies, μ, is 3.9 stars
• Joe tends to rate 0.2 stars lower than the average
• Avengers tends to be rated 0.5 stars above the average
• Avengers movie's predicted rating by Joe:
b_ui = μ + b_i + b_u = 3.9 − 0.2 + 0.5 = 4.2
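The baseline parameters can be estimated directly from data; below is a simple unregularized sketch (regularized estimates are also common) on hypothetical toy ratings:

```python
from collections import defaultdict

# Hypothetical toy ratings: (user, item, rating)
ratings = [("alice", "x", 5.0), ("alice", "y", 3.0),
           ("bob", "x", 4.0), ("bob", "y", 2.0)]

mu = sum(r for _, _, r in ratings) / len(ratings)   # overall average rating

# Item bias b_i: average deviation of item i's ratings from mu
by_item = defaultdict(list)
for _, i, r in ratings:
    by_item[i].append(r - mu)
b_i = {i: sum(d) / len(d) for i, d in by_item.items()}

# User bias b_u: average residual deviation after removing mu and b_i
by_user = defaultdict(list)
for u, i, r in ratings:
    by_user[u].append(r - mu - b_i[i])
b_u = {u: sum(d) / len(d) for u, d in by_user.items()}

def baseline(u, i):
    """Baseline prediction b_ui = mu + b_i + b_u."""
    return mu + b_i[i] + b_u[u]
```

Estimating item biases before user biases, as here, is one common convention; solving for both jointly by least squares is another.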
Extended MF (Adding Biases)
• Adding biases
- A rating is predicted by adding the biases to the interaction term: r̂_ui = μ + b_i + b_u + q_i^T p_u
• Objective Function
- In order to learn the parameters (b_u, b_i, q_i, and p_u) we minimize the regularized squared error:
min Σ_{(u,i)∈K} (r_ui − μ − b_i − b_u − q_i^T p_u)^2 + λ(‖q_i‖^2 + ‖p_u‖^2 + b_u^2 + b_i^2)
- Minimization is typically performed by either stochastic gradient descent or alternating least squares
Extended MF (Temporal Dynamics)
• Ratings may be affected by temporal effects
- Popularity of an item may change
- User's identity and preferences may change
• Modeling temporal effects can improve accuracy significantly
• Rating predictions become a function of time: r̂_ui(t) = μ + b_i(t) + b_u(t) + q_i^T p_u(t)
SVD++
• Prediction accuracy can be improved by also considering implicit feedback
• N(u) denotes the set of items for which user u expressed an implicit preference
• A new set of item factors is necessary, where item i is associated with y_i
• A user is characterized by normalizing the sum of these factor vectors:
r̂_ui = μ + b_i + b_u + q_i^T (p_u + |N(u)|^(−1/2) Σ_{j∈N(u)} y_j)
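The SVD++ prediction rule can be sketched as follows (the factor values, biases, and item identifiers are hypothetical toy data):

```python
import numpy as np

def svdpp_predict(mu, b_u, b_i, p_u, q_i, N_u, Y):
    """r_hat = mu + b_i + b_u + q_i . (p_u + |N(u)|^(-1/2) * sum_{j in N(u)} y_j)"""
    implicit = sum(Y[j] for j in N_u) / np.sqrt(len(N_u))
    return mu + b_i + b_u + q_i @ (p_u + implicit)

# Toy item factors y_j (k = 2) and an implicit-feedback set N(u) of two items
Y = {"a": np.array([0.1, 0.0]), "b": np.array([0.3, 0.2])}
r_hat = svdpp_predict(mu=3.9, b_u=-0.2, b_i=0.5,
                      p_u=np.array([1.0, 0.5]),
                      q_i=np.array([0.8, 0.4]),
                      N_u=["a", "b"], Y=Y)
```

The |N(u)|^(−1/2) normalization keeps the implicit term comparable in scale between users with few and many interactions.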
SVD++
• Several types of implicit feedback can be introduced into the model simultaneously
- For example, N^1(u) is the set of items that user u rented, and N^2(u) is the set of items that reflect a different type of implicit feedback, such as browsed items
Experimental Results
References
1. Koren, Y. and Bell, R., Advances in collaborative filtering. In Recommender systems handbook, pp.
145-186, Springer US, 2011
2. Amatriain, X., Jaimes, A., Oliver, N. and Pujol, J.M., Data mining methods for recommender
systems. In Recommender systems handbook, pp. 39-71, Springer US, 2011
3. Koren, Y., Bell, R. and Volinsky, C., Matrix factorization techniques for recommender systems. IEEE Computer, 42(8), pp. 30-37, 2009
4. Jannach, D. and Friedrich, G., Tutorial: Recommender Systems. Proc. International Joint Conference on Artificial Intelligence (IJCAI 13), Beijing, 2013
References
5. Amatriain, X. and Mobasher, B., The recommender problem revisited: morning tutorial. In
Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data
mining, pp. 1971-1971, ACM, 2014
6. Bobadilla, J., Ortega, F., Hernando, A. and Gutiérrez, A., Recommender systems survey. Knowledge-Based Systems, 46, pp. 109-132, 2013
7. Moon, C., Recommender systems survey. SlideShare, 2014
(http://www.slideshare.net/ChangsungMoon/summary-of-rs-survey-ver-07-20140915)
8. Freitag, M. and Schwarz, J., Matrix factorization techniques for recommender systems.
Presentation Slides in Hasso Plattner Institut, 2011
(http://hpi.de/fileadmin/user_upload/fachgebiete/naumann/lehre/SS2011/Collaborative_Filtering/
pres1-matrixfactorization.pdf)