1
Intro to RecSys and CCF
Brian Ackerman
2
Roadmap
• Introduction to Recommender Systems & Collaborative Filtering
• Collaborative Competitive Filtering
3
Introduction to Recommender Systems & Collaborative Filtering
4
Motivation
• Netflix has over 20,000 movies, but you may only be interested in a small number of these movies
• Recommender systems can provide personalized suggestions based on a large set of items such as movies
– Can be done in a variety of ways; the most popular is collaborative filtering
5
Collaborative Filtering
• If two users rate a subset of items similarly, then they might rate other items similarly as well
Item A Item B Item C Item D Item E
User 1 ? 3 4 5 3
User 2 1 3 4 5 ?
6
Roadmap (RS-CF)
• Motivation
• Problem
• Main CF Types
– Memory-based – User-based
– Model-based – Regularized SVD
7
Problem Setting
• Set of users, U
• Set of items, I
• Users can rate items, where rui is user u’s rating on item i
• Ratings are often stored in a rating matrix, R|U|×|I|
8
Sample Rating Matrix
Item A Item B Item C Item D Item E Item F Item G Item H Item I
User 1 - 5 - 3 - - 2 - -
User 2 4 - 5 - - 4 - 1 -
User 3 - 4 - 3 - - 2 - -
User 4 1 2 - - - 5 - 3 -
User 5 - - 3 - 4 - - 2 -
User 6 - 2 - - 1 - - 2 -
User 7 4 - - 5 - - 4 - 1
# is a user rating, - means a null entry, not rated
9
Problem
• Input
– Rating matrix, R|U|×|I|
– Active user, a (the user interacting with the system)
• Output
– Predictions for all null entries of the active user
10
Roadmap (RS-CF)
• Motivation
• Problem
• Main CF Types
– Memory-based – User-based
– Model-based – Regularized SVD
11
Main Types
• Memory-based
– User-based* [Resnick et al. 1994]
– Item-based [Sarwar et al. 2001]
– Similarity Fusion (User/Item-based) [Wang et al. 2006]
• Model-based
– SVD (Singular Value Decomposition) [Sarwar et al. 2000]
– RSVD (Regularized SVD)* [Funk 2006]
12
User-based
• Find similar users
– KNN or threshold
• Make prediction
Item A Item B Item C Item D Item E Item F Item G Item H Item I
Active ? 5 ? 3 ? ? 2 ? ?
User 2 4 - 5 - - 4 - 1 -
User 3 - 4 - 3 - - 2 - -
User 4 1 2 - - - 5 - 3 -
User 5 - - 3 - 4 - - 2 -
User 6 - 2 - - 1 - - 2 -
User 7 4 - - 5 - - 4 - 1
13
User-based – Similar Users
• Consider each user (row) to be a vector
• Compare the vectors to find the similarity between two users
– Let a be the vector for the active user and u3 be the vector for user 3
– Cosine similarity can be used to compare the vectors
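The vector comparison described above can be sketched in Python. This is a minimal illustration (not from the slides): ratings are stored as dicts, and only items rated by both users enter the computation, which is one common convention.

```python
import math

def cosine_similarity(a, u):
    """Cosine similarity between two sparse rating vectors.

    `a` and `u` map item -> rating; missing keys mean "not rated".
    Only co-rated items contribute (a common convention; the slides
    do not pin this detail down).
    """
    common = set(a) & set(u)
    if not common:
        return 0.0
    dot = sum(a[i] * u[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    return dot / (norm_a * norm_u)

# Active user vs. User 3, using their ratings from the sample matrix
active = {"B": 5, "D": 3, "G": 2}
user3 = {"B": 4, "D": 3, "G": 2}
print(round(cosine_similarity(active, user3), 3))  # → 0.994
```

The high similarity here matches the intuition from the matrix: User 3 rated the same items as the active user with nearly the same values.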
14
User-based – Similar Users
• KNN (k-nearest neighbors, or top-k)
– Keep only the k most similar users
• Threshold
– Keep all users whose similarity is at least θ
Item A Item B Item C Item D Item E Item F Item G Item H Item I
User 1 ? 5 - 3 - - 2 - -
User 2 4 - 5 - - 4 - 1 -
User 3 - 4 - 3 - - 2 - -
User 4 1 2 - - - 5 - 3 -
User 5 - - 3 - 4 - - 2 -
User 6 - 2 - - 1 - - 2 -
User 7 4 - - 5 - - 4 - 1
15
User-based – Make Prediction
• Weighted by similarity
– Weight each similar user’s rating by that user’s similarity to the active user
(Equation not preserved in the transcript: the prediction for the active user on item i is a sum over the similar users.)
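The prediction formula itself did not survive the transcript; the following is a hedged sketch of the similarity-weighted average it describes. Mean-centering (as in Resnick et al. 1994) is omitted for brevity, and `predict` and its arguments are illustrative names, not from the slides.

```python
def predict(similar_users, item):
    """Predict the active user's rating on `item` as a
    similarity-weighted average of the similar users' ratings.

    `similar_users` is a list of (similarity, ratings_dict) pairs;
    users who did not rate `item` are skipped.
    """
    num = 0.0
    den = 0.0
    for sim, ratings in similar_users:
        if item in ratings:
            num += sim * ratings[item]
            den += abs(sim)
    return num / den if den else None  # None: no neighbor rated the item

# Two hypothetical neighbors with their similarities and ratings
neighbors = [(0.99, {"A": 4, "C": 5}), (0.95, {"A": 1, "C": 3})]
print(round(predict(neighbors, "A"), 2))  # → 2.53
```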
16
Main Types
• Memory-based
– User-based* [Resnick et al. 1994]
– Item-based [Sarwar et al. 2001]
– Similarity Fusion (User/Item-based) [Wang et al. 2006]
• Model-based
– SVD (Singular Value Decomposition) [Sarwar et al. 2000]
– RSVD (Regularized SVD)* [Funk 2006]
17
Regularized SVD
• The Netflix data set has 8.5 billion possible entries, based on 17 thousand movies and 0.5 million users
• Only 100 million ratings are present
– About 1.1% of all possible ratings
• Why do we need to operate on such a large matrix?
18
Regularized SVD – Setup
• Let each user and item be represented by a feature vector of length k
– E.g., item A may be the vector Ak = [a1 a2 a3 … ak]
• Imagine the features for items were fixed
– E.g., items are movies and each feature is a genre such as comedy, drama, etc.
• The features of the user vector are how well the user likes each feature
19
Regularized SVD – Setup
• Consider the movie Die Hard
– Its feature vector may be i = [1 0 0] if the features are action, comedy, and drama
• Maybe the user has the feature vector u = [3.87 2.64 1.32]
• We can try to predict the user’s rating using the dot product of these two vectors
– r′ui = u ∙ i = [3.87 2.64 1.32] ∙ [1 0 0] = 3.87
20
Regularized SVD – Goal
• Try to find values for each item vector that work for all users
• Try to find values for each user vector that reproduce the actual ratings when taking the dot product with the item vectors
• Minimize the difference between the actual and predicted (dot-product) ratings
21
Regularized SVD – Setup
• In reality, we cannot choose k to be large enough for a fixed set of features
– There are too many to consider (e.g., genre, actors, directors, etc.)
• Usually k is only 25 to 50, which reduces the total size of the matrices to only roughly 25 million to 50 million entries (compared to 8.5 billion)
• Because of the size of k, the values in the vectors are NOT directly tied to any feature
22
Regularized SVD – Goal
• Let u be a user, i be an item, and rui a rating by user u on item i, where R is the set of all ratings and φu, φi are the feature vectors
• At first thought, it seems simple to use the following optimization goal
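The equation itself is not preserved in the transcript; given the symbols defined above, the straightforward goal described here is presumably the squared-error objective (a reconstruction):

```latex
\min_{\varphi} \; \sum_{(u,i) \in R} \left( r_{ui} - \varphi_u \cdot \varphi_i \right)^2
```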
23
Regularized SVD – Overfitting
• The problem is overfitting of the features
– Solved by regularization
24
Regularized SVD – Regularization
• Introduce a new optimization goal that includes a term for regularization
• Minimize the magnitude of the feature vectors
– Controlled by fixed parameters λu and λi
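The regularized goal was likewise lost in transcription; the standard RSVD form consistent with this description adds penalty terms on the vector magnitudes, weighted by λu and λi (a reconstruction, matching the symbols used above):

```latex
\min_{\varphi} \; \sum_{(u,i) \in R} \left( r_{ui} - \varphi_u \cdot \varphi_i \right)^2
  + \lambda_u \|\varphi_u\|^2 + \lambda_i \|\varphi_i\|^2
```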
25
Regularized SVD
• Many improvements have been proposed to the regularized optimization goal
– RSVD2/NSVD1/NSVD2 [Paterek 2007]: added a term for user bias and a term for item bias, and reduced the number of parameters
– Integrated Neighborhood SVD++ [Koren 2008]: combined a neighborhood-based approach with RSVD
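The baseline RSVD training that these improvements extend can be sketched as a Funk-style stochastic gradient descent loop. This is a minimal illustration; the hyperparameters, toy data, and a single shared λ (rather than separate λu, λi) are simplifying assumptions, not from the slides.

```python
import random

def train_rsvd(ratings, n_users, n_items, k=25, lr=0.01, lam=0.05, epochs=500):
    """Minimal Funk-style SGD for the regularized squared-error objective.

    `ratings` is a list of (user, item, rating) triples with 0-based ids.
    Returns the learned user matrix P and item matrix Q.
    """
    rng = random.Random(0)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - lam * pu)  # user-vector gradient step
                Q[i][f] += lr * (err * pu - lam * qi)  # item-vector gradient step
    return P, Q

# Toy data: two users, two items (ids are 0-based)
data = [(0, 0, 5.0), (0, 1, 1.0), (1, 0, 4.0)]
P, Q = train_rsvd(data, n_users=2, n_items=2, k=4)
pred = sum(P[0][f] * Q[0][f] for f in range(4))  # should land near the true rating 5.0
```

Regularization pulls the prediction slightly below the observed 5.0, which is exactly the overfitting control the slides describe.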
26
Roadmap
• Introduction to Recommender Systems & Collaborative Filtering
• Collaborative Competitive Filtering
27
Collaborative Competitive Filtering: Learning Recommender Using Context of User Choice
Georgia Tech and Yahoo! Labs
Best Student Paper at SIGIR’11
28
Motivation
• A user may be given 5 random movies and chooses Die Hard
– This tells us the user prefers action movies
• A user may be given 5 action movies and chooses Die Hard over Rocky and Terminator
– This tells us the user prefers Bruce Willis
29
Roadmap (CCF)
• Motivation
• Problem Setting & Input
• Techniques
• Extensions
30
Problem Setting
• Set of users, U
• Set of items, I
• Each user interaction has an offer set O and a decision set D
• Each user interaction is stored as a tuple (u, O, D), where D is a subset of O
31
CCF Input
Item A Item B Item C Item D Item E Item F Item G Item H Item I
U1-S1 1 - - -
U1-S2 - - 1 -
U1-S3 - - - 1
U2-S1 - 1 - - -
U2-S2 - 1 - -
U3-S1 - - - 1
U3-S2 - - - 1
1 means the user chose the item (it is in the decision set); - means the item was in the offer set but not chosen
32
Roadmap (CCF)
• Motivation
• Problem Setting & Input
• Techniques
• Extensions
33
Local Optimality of User Choice
• Each item has a potential revenue to the user, rui
• Users also consider the opportunity cost (OC) when evaluating potential revenue
– OC is what the user gives up by making a given decision
• OC is cui = max{ rui′ : i′ ∈ O \ {i} }
• Profit is πui = rui − cui
34
Local Optimality of User Choice
• A user interaction is an opportunity give-and-take process
– The user is given a set of opportunities
– The user makes a decision to select one of the many opportunities
– Each opportunity comes with some revenue (utility or relevance)
35
Collaborative Competitive Filtering
• Local optimality constraint
– Each item in the decision set has a revenue higher than those not in the decision set
– The problem becomes intractable with only this constraint; there is no unique solution
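The local optimality constraint can be written compactly (a reconstruction from the description above; the slide's own equation is not in the transcript):

```latex
r_{ui} \;\ge\; r_{uj} \qquad \text{for all } i \in D,\; j \in O \setminus D
```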
36
CCF – Hinge Model
• Optimization goal
– Minimize error (ξ, a slack variable) & model complexity
37
CCF – Hinge Model
• Find the average potential utility
– The average utility of the non-chosen items
• Constraints
– Chosen items must have a higher utility
– ξui is an error (slack) term
38
CCF – Hinge Model
• Optimization goal (equation not preserved in the transcript)
– Assume ξ is 0: the chosen item’s relevance is then compared against the average relevance of the non-chosen items
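The hinge-model objective itself was lost in transcription. A plausible reconstruction from the surrounding description (slack variables ξ plus a complexity penalty, with each chosen item's utility required to exceed the average utility of the non-chosen items by a margin) is:

```latex
\min_{\varphi,\,\xi} \;\; \sum_{(u,O,D)} \xi_{uO}
  \;+\; \lambda \Big( \sum_u \|\varphi_u\|^2 + \sum_i \|\varphi_i\|^2 \Big)
\quad \text{s.t.} \quad
\varphi_u \cdot \varphi_i
  \;-\; \frac{1}{|O \setminus D|} \sum_{j \in O \setminus D} \varphi_u \cdot \varphi_j
  \;\ge\; 1 - \xi_{uO}
\;\; \text{for } i \in D, \qquad \xi_{uO} \ge 0
```

The exact margin, indexing of the slack variables, and regularizer weighting may differ from the paper's formulation.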
39
CCF – How to use results
• We can predict the relevance of all items from the user and item vectors
– A threshold can be set if more than one item can be chosen (e.g., θ > .9 implies action)
Item User Action Predicted Relevance
A 1 .98
B - .93
C - .56
D - .25
E - .11
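The threshold rule can be applied directly to the table's predicted relevances (values copied from the slide, with θ = .9 as in the example):

```python
# Predicted relevances from the slide's table
relevance = {"A": 0.98, "B": 0.93, "C": 0.56, "D": 0.25, "E": 0.11}
theta = 0.9  # threshold from the slide

# Items whose predicted relevance clears the threshold are predicted actions
predicted_actions = [item for item, r in relevance.items() if r > theta]
print(predicted_actions)  # → ['A', 'B']
```

Note the threshold predicts both A and B as likely actions, even though only A was actually chosen; this is the "more than one item can be chosen" case the slide mentions.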
40
Roadmap (CCF)
• Motivation
• Problem Setting & Input
• Techniques
• Extensions
41
Extensions
• Sessions without a response– User does not take any opportunity
• Adding content features– Fixed features for each item rather than a limited
number of parameters to improve accuracy of new item prediction