TRANSCRIPT
Recommender Systems: Backgrounds &
Advances in Collaborative Filtering
Changsung Moon
Department of Computer Science
North Carolina State University
Amazon.com
Netflix
The Long Tail
Source: http://www.wired.com/2004/10/tail/
Information Overload
• Recommender systems help to match users with items
- Ease information overload
- Sales assistance (guidance, advisory, profit increase, ...)
Recommender Problem
• Recommender systems are a subclass of information filtering systems that seek to predict the ‘rating’ or ‘preference’ that a user would give to an item – Wikipedia
Recommender Trends
Data Mining Methods
• Recommender systems typically apply techniques and methodologies from general data mining
Types of Input
• Explicit Feedback
- Feedback that users directly report on their interest in items
- e.g. star ratings for movies
- e.g. thumbs-up/down for TV shows
• Implicit Feedback
- Feedback that indirectly reflects opinions, inferred by observing user behavior
- e.g. purchase history, browsing history, or search patterns
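As a minimal sketch of turning implicit feedback into usable input (the event log and item names are hypothetical), events such as purchases can be aggregated into a user-item interaction matrix:

```python
from collections import Counter

# Hypothetical implicit-feedback log: each event is (user, item)
events = [
    ("alice", "laptop"), ("alice", "mouse"), ("alice", "mouse"),
    ("bob", "laptop"), ("bob", "keyboard"),
]

# Aggregate repeated events into counts: a higher count is a stronger signal
interactions = Counter(events)

users = sorted({u for u, _ in events})
items = sorted({i for _, i in events})

# Dense user-item matrix of implicit-feedback counts
matrix = [[interactions[(u, i)] for i in items] for u in users]
```

Unlike star ratings, these counts carry no negative feedback; they only indicate presence and strength of interaction.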
Collaborative Filtering
[Figure: users connected by similarity scores (0.95, 0.75, 0.55, 0.50); items liked by the most similar users are recommended to the active user]
Pros of Collaborative Filtering
• Requires minimal knowledge-engineering effort
• Does not need to consider the content of items
• Produces good-enough results in most cases
• Serendipity of results
Challenges for Collaborative Filtering
• Sparsity
- Usually the vast majority of ratings are unknown
- e.g. 99% of ratings are missing in Netflix data
• Scalability
- Nearest neighbor techniques require computation that grows with both the number of users
and the number of items
• Cold Start Problem
- New items and new users can cause the cold-start problem, as there will be insufficient data
for CF to work accurately
Challenges for Collaborative Filtering
• Popularity Bias
- Tends to recommend popular items
• Synonyms
- The same or very similar items having different names or entries
- Topic modeling such as LDA could solve this by grouping different words belonging to the same topic
• Shilling Attacks
- People may give positive ratings to their own items and negative ratings to their competitors' items
Content-based Recommendation
• Based on information about the item itself, usually keywords or phrases occurring in the item
• Similarity between two items is measured as the similarity between their term vectors
• A user's profile can be built by analyzing the set of content the user has interacted with
• This makes it possible to compute the similarity between a user and an item
[Figure: an item the user bought and a similar item; the similar item is recommended]
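As a sketch of the term-vector similarity described above (the term vectors are hypothetical toy data), cosine similarity works for item-item and user-item comparisons alike, since user profiles live in the same term space:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency vectors ({term: count})."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# An item's term vector and a user profile aggregated from liked items
item = {"space": 3, "robot": 2}
user_profile = {"space": 1, "robot": 1, "comedy": 2}

score = cosine_similarity(user_profile, item)
```

Real systems would typically use TF-IDF weights rather than raw counts, but the similarity computation is the same.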
Pros/Cons of Content-based Approach
• Pros
- No need for data on other users: no cold-start or sparsity problems
- Able to recommend to users with unique tastes
- Able to recommend new and unpopular items
- Provides explanations by listing content features
• Cons
- In certain domains (e.g., music, blogs, and videos), it is complicated to generate the features for items
- Difficult to achieve serendipity
- Users only receive recommendations that are very similar to items they already liked or preferred
Hybrid Methods
• Weighted
- Outputs from several techniques are combined with different weights
• Switching
- Depending on situation, the system changes from one technique to another
• Mixed
- Outputs from several techniques are presented at the same time
• Cascade
- The output from one technique is used as input of another that refines the results
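A minimal sketch of the weighted variant (the component scorers and weights are hypothetical stand-ins): each component recommender scores a (user, item) pair, and the outputs are combined linearly.

```python
def weighted_hybrid(scorers, weights, user, item):
    """Combine the outputs of several recommenders with fixed weights."""
    return sum(w * score(user, item) for score, w in zip(scorers, weights))

# Stand-ins for a collaborative-filtering and a content-based recommender,
# each returning a score in [0, 1]
cf_score = lambda user, item: 0.8
content_score = lambda user, item: 0.4

# 70% weight on CF, 30% on content-based
final = weighted_hybrid([cf_score, content_score], [0.7, 0.3], "joe", "avengers")
# 0.7 * 0.8 + 0.3 * 0.4 = 0.68
```

In practice the weights themselves can be tuned on held-out data, which shades into the switching and meta-level variants described here.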
Hybrid Methods
• Feature Combination
- Features from different recommendation sources are combined as input to a single technique
• Feature Augmentation
- The output from one technique is used as input features to another
• Meta-level
- The model learned by one recommender is used as input to another
Two Main Techniques of CF
• Neighborhood Approach
- Relationships between items or between users
• Latent Factor Models
- Transforming both items and users to the same latent factor space
- Characterizing both items and users on factors inferred from user feedback
- pLSA
- neural networks
- Latent Dirichlet Allocation
- Matrix factorization (e.g. SVD-based models)
- ...
Latent Factor Models
• Find features that describe the characteristics of rated objects
• Item characteristics and user preferences are described with numerical factor values
[Figure: items and users placed on two example factors, such as Action vs. Comedy]
Latent Factor Models
• Items and users are associated with factor vectors
- Each item i is associated with a vector q_i
- Each user u is associated with a vector p_u
• The dot product q_i^T p_u captures user u's estimated interest in item i
• Challenge – How to compute a mapping of items and users to factor vectors?
• Approaches
- Matrix Factorization Models
- e.g. Singular Value Decomposition (SVD)
SVD
• R: N × M ratings matrix (N users, M movies)
• U: N × k matrix (N users, k factors)
• Σ: k × k diagonal matrix with the k largest singular values
• V^T: k × M matrix (k factors, M movies)
SVD
• Example: rank-2 factorization R ≈ U Σ V^T

R (4 users × 3 movies):
   5   5   1
   5   4   2
   1   2   2
   1   3   5

U (users × factors f1, f2):
  -0.63  -0.44
  -0.60  -0.23
  -0.25   0.25
  -0.43   0.83

Σ = diag(10.96, 4.39)

V^T (factors × movies):
  -0.62  -0.67  -0.41
  -0.52  -0.03   0.85
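The decomposition of the slide's toy rating matrix can be reproduced with NumPy (a sketch; only the two largest singular values are kept for the rank-2 approximation):

```python
import numpy as np

# The slide's toy rating matrix: 4 users x 3 movies
R = np.array([[5, 5, 1],
              [5, 4, 2],
              [1, 2, 2],
              [1, 3, 5]], dtype=float)

U, s, Vt = np.linalg.svd(R, full_matrices=False)

k = 2                                   # keep the two largest singular values
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(np.round(s[:k], 2))               # leading singular values (~10.96, ~4.39)
print(np.round(R_hat, 1))               # rank-2 approximation of R
```

Note that each singular vector is only determined up to a simultaneous sign flip of a column of U and the corresponding row of V^T, so individual signs may differ from the slide while U Σ V^T stays the same.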
SVD
[Figure: users and movies plotted in the two-dimensional factor space (f1, f2)]
SVD - Problems
• Conventional SVD has difficulties due to the high proportion of missing values in the user-item ratings matrix
• Imputation to fill in missing ratings
- Imputation can be very expensive as it significantly increases the amount of data
- Inaccurate imputation might distort the data
Matrix Factorization for Rating Prediction
• Modeling directly the observed ratings only
- K is the set of (u,i) pairs for which the rating r_ui is known
- Predicted rating: r̂_ui = q_i^T p_u
• To learn the factor vectors p_u and q_i, we minimize the squared error:
min_{q,p} Σ_{(u,i)∈K} (r_ui − q_i^T p_u)^2
Regularization
• To avoid overfitting, the model is regularized
- Learn the factor vectors p_u and q_i by minimizing:
min_{q,p} Σ_{(u,i)∈K} (r_ui − q_i^T p_u)^2 + λ(‖q_i‖^2 + ‖p_u‖^2)
- The constant λ, which controls the extent of regularization, is usually determined by cross-validation
- Minimization is typically performed by either stochastic gradient descent or alternating least squares
Learning Algorithms
• Stochastic gradient descent
- Modification of parameters (p_u, q_i) relative to the prediction error
- Error: e_ui = r_ui − q_i^T p_u (actual rating minus predicted rating)
- Updates: q_i ← q_i + γ(e_ui · p_u − λ · q_i),  p_u ← p_u + γ(e_ui · q_i − λ · p_u)
• Alternating least squares
- Allows massive parallelization
- Better for densely filled matrices
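A minimal sketch of stochastic gradient descent for the plain (bias-free) factor model, using the deck's toy rating matrix; the hyperparameters (k, γ, λ, epoch count) are illustrative choices, not tuned values:

```python
import numpy as np

def train_mf_sgd(ratings, n_users, n_items, k=2, gamma=0.01, lam=0.02, epochs=1000):
    """Learn user factors P and item factors Q from (u, i, r) triples by SGD."""
    rng = np.random.default_rng(0)
    P = rng.normal(scale=0.1, size=(n_users, k))   # p_u vectors
    Q = rng.normal(scale=0.1, size=(n_items, k))   # q_i vectors
    for _ in range(epochs):
        for u, i, r in ratings:
            e = r - Q[i] @ P[u]                    # prediction error e_ui
            p_u = P[u].copy()                      # update both with old values
            P[u] += gamma * (e * Q[i] - lam * P[u])
            Q[i] += gamma * (e * p_u - lam * Q[i])
    return P, Q

# Observed ratings from the toy matrix, flattened into (user, item, rating)
ratings = [(u, i, float(r))
           for u, row in enumerate([[5, 5, 1], [5, 4, 2], [1, 2, 2], [1, 3, 5]])
           for i, r in enumerate(row)]
P, Q = train_mf_sgd(ratings, n_users=4, n_items=3)
rmse = np.sqrt(np.mean([(r - Q[i] @ P[u]) ** 2 for u, i, r in ratings]))
```

With k=2 the learned factors approximate the rank-2 SVD fit of this matrix; real systems iterate over millions of ratings in random order per epoch.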
Simplified Illustration
First Two Vectors from Matrix Decomposition
Extended MF (Adding Biases)
• Biases
- Much of the variation in ratings is due to effects associated with either users or items, independently of their interactions
- e.g., some users tend to give higher ratings than others
- e.g., some items tend to receive higher ratings than others
- A baseline prediction for an unknown rating is denoted by b_ui = μ + b_i + b_u
- μ: the overall average rating over all items
- b_u and b_i: the observed deviations of user u and item i from the average
Extended MF (Adding Biases)
• Suppose that the average rating over all movies, μ, is 3.9 stars
• Joe tends to rate 0.2 stars lower than the average
• Avengers tends to be rated 0.5 stars above the average
• Avengers movie's predicted rating by Joe:
b_ui = μ + b_i + b_u = 3.9 − 0.2 + 0.5 = 4.2
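The baseline parameters can be estimated directly from data; below is a simple unregularized sketch (regularized estimates are also common) on hypothetical toy ratings:

```python
from collections import defaultdict

# Hypothetical toy ratings: (user, item, rating)
ratings = [("alice", "x", 5.0), ("alice", "y", 3.0),
           ("bob", "x", 4.0), ("bob", "y", 2.0)]

mu = sum(r for _, _, r in ratings) / len(ratings)   # overall average rating

# Item bias b_i: average deviation of item i's ratings from mu
by_item = defaultdict(list)
for _, i, r in ratings:
    by_item[i].append(r - mu)
b_i = {i: sum(d) / len(d) for i, d in by_item.items()}

# User bias b_u: average residual deviation after removing mu and b_i
by_user = defaultdict(list)
for u, i, r in ratings:
    by_user[u].append(r - mu - b_i[i])
b_u = {u: sum(d) / len(d) for u, d in by_user.items()}

def baseline(u, i):
    """Baseline prediction b_ui = mu + b_i + b_u."""
    return mu + b_i[i] + b_u[u]
```

Estimating item biases before user biases, as here, is one common convention; solving for both jointly by least squares is another.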
Extended MF (Adding Biases)
• Adding biases
- A rating is predicted by adding the biases to the interaction term: r̂_ui = μ + b_i + b_u + q_i^T p_u
• Objective Function
- In order to learn the parameters (b_u, b_i, q_i, and p_u) we minimize the regularized squared error:
min Σ_{(u,i)∈K} (r_ui − μ − b_i − b_u − q_i^T p_u)^2 + λ(‖q_i‖^2 + ‖p_u‖^2 + b_u^2 + b_i^2)
- Minimization is typically performed by either stochastic gradient descent or alternating least squares
Extended MF (Temporal Dynamics)
• Ratings may be affected by temporal effects
- Popularity of an item may change
- User's identity and preferences may change
• Modeling temporal effects can improve accuracy significantly
• Rating predictions become a function of time: r̂_ui(t) = μ + b_i(t) + b_u(t) + q_i^T p_u(t)
SVD++
• Prediction accuracy can be improved by also considering implicit feedback
• N(u) denotes the set of items for which user u expressed an implicit preference
• A new set of item factors is necessary, where item i is associated with y_i
• A user is characterized by normalizing the sum of these factor vectors:
r̂_ui = μ + b_i + b_u + q_i^T (p_u + |N(u)|^(−1/2) Σ_{j∈N(u)} y_j)
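The SVD++ prediction rule can be sketched as follows (the factor values, biases, and item identifiers are hypothetical toy data):

```python
import numpy as np

def svdpp_predict(mu, b_u, b_i, p_u, q_i, N_u, Y):
    """r_hat = mu + b_i + b_u + q_i . (p_u + |N(u)|^(-1/2) * sum_{j in N(u)} y_j)"""
    implicit = sum(Y[j] for j in N_u) / np.sqrt(len(N_u))
    return mu + b_i + b_u + q_i @ (p_u + implicit)

# Toy item factors y_j (k = 2) and an implicit-feedback set N(u) of two items
Y = {"a": np.array([0.1, 0.0]), "b": np.array([0.3, 0.2])}
r_hat = svdpp_predict(mu=3.9, b_u=-0.2, b_i=0.5,
                      p_u=np.array([1.0, 0.5]),
                      q_i=np.array([0.8, 0.4]),
                      N_u=["a", "b"], Y=Y)
```

The |N(u)|^(−1/2) normalization keeps the implicit term comparable in scale between users with few and many interactions.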
SVD++
• Several types of implicit feedback can be introduced into the model simultaneously
- For example, N^1(u) is the set of items that user u rented, and N^2(u) is the set of items that reflect a different type of implicit feedback, such as browsed items
Experimental Results
References
1. Koren, Y. and Bell, R., Advances in collaborative filtering. In Recommender systems handbook, pp.
145-186, Springer US, 2011
2. Amatriain, X., Jaimes, A., Oliver, N. and Pujol, J.M., Data mining methods for recommender
systems. In Recommender systems handbook, pp. 39-71, Springer US, 2011
3. Koren, Y., Bell, R. and Volinsky, C., Matrix factorization techniques for recommender systems. IEEE Computer, 42(8), pp. 30-37, 2009
4. Jannach, D. and Friedrich, G., Tutorial: Recommender Systems. Proc. International Joint Conference on Artificial Intelligence (IJCAI 13), Beijing, 2013
References
5. Amatriain, X. and Mobasher, B., The recommender problem revisited: morning tutorial. In
Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data
mining, pp. 1971-1971, ACM, 2014
6. Bobadilla, J., Ortega, F., Hernando, A. and Gutiérrez, A., Recommender systems survey. Knowledge-Based Systems, 46, pp. 109-132, 2013
7. Moon, C., Recommender systems survey. SlideShare, 2014
(http://www.slideshare.net/ChangsungMoon/summary-of-rs-survey-ver-07-20140915)
8. Freitag, M. and Schwarz, J., Matrix factorization techniques for recommender systems.
Presentation Slides in Hasso Plattner Institut, 2011
(http://hpi.de/fileadmin/user_upload/fachgebiete/naumann/lehre/SS2011/Collaborative_Filtering/
pres1-matrixfactorization.pdf)