SVD and the Netflix Dataset

SVD Applied to Collaborative Filtering

~ URUG 7-12-07 ~

Recommendation System

Answers the question: What do I want next?!?


Very consumer driven.

Must provide good results or a user may not trust the system in the future.


Collaborative Filtering

Base a user's recommendations on:

The user's past history.

The history of like-minded users.

View the data as a product × user matrix.

Find a “neighborhood” of users similar to that user.

Return the top-N recommendations.

Early Approaches

Goldberg et al. (1992), Using Collaborative Filtering to Weave an Information Tapestry.

Konstan et al. (1997), Applying Collaborative Filtering to Usenet News.

Measure user similarity with Pearson correlation or cosine similarity to form neighborhoods.
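The neighborhood method these slides describe (find similar users, then return the top-N products) can be sketched in C++. This is a minimal illustration, not the method of either cited paper: it assumes a dense array stored user-major (the transpose of the product × user view) where 0 means "not rated", computes Pearson correlation only over co-rated products, keeps the k most correlated neighbors, and scores unrated products by a similarity-weighted average. The names `pearson` and `recommendTopN` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// ratings[user][product]; a value of 0 means "not rated" (assumption of this sketch).
using Ratings = std::vector<std::vector<double>>;
using Scored = std::pair<double, std::size_t>;  // (score, index)

// Pearson correlation between two users, computed over co-rated products only.
double pearson(const std::vector<double>& a, const std::vector<double>& b) {
    double sumA = 0, sumB = 0;
    int n = 0;
    for (std::size_t p = 0; p < a.size(); ++p)
        if (a[p] > 0 && b[p] > 0) { sumA += a[p]; sumB += b[p]; ++n; }
    if (n < 2) return 0.0;  // too little overlap to correlate
    double meanA = sumA / n, meanB = sumB / n, cov = 0, varA = 0, varB = 0;
    for (std::size_t p = 0; p < a.size(); ++p) {
        if (a[p] > 0 && b[p] > 0) {
            double da = a[p] - meanA, db = b[p] - meanB;
            cov += da * db;
            varA += da * da;
            varB += db * db;
        }
    }
    if (varA == 0 || varB == 0) return 0.0;  // constant ratings: correlation undefined
    return cov / std::sqrt(varA * varB);
}

bool byScoreDesc(const Scored& x, const Scored& y) { return x.first > y.first; }

// Top-N unrated products for `user`, scored by a similarity-weighted average of
// the ratings given by its k most correlated neighbors (positive correlations only).
std::vector<std::size_t> recommendTopN(const Ratings& ratings, std::size_t user,
                                       std::size_t k, std::size_t topN) {
    assert(user < ratings.size());
    // 1. Neighborhood: other users ranked by Pearson correlation with `user`.
    std::vector<Scored> neighbors;
    for (std::size_t v = 0; v < ratings.size(); ++v)
        if (v != user) neighbors.push_back({pearson(ratings[user], ratings[v]), v});
    std::sort(neighbors.begin(), neighbors.end(), byScoreDesc);
    if (neighbors.size() > k) neighbors.resize(k);

    // 2. Score every product the user has not rated yet.
    std::vector<Scored> scores;
    for (std::size_t p = 0; p < ratings[user].size(); ++p) {
        if (ratings[user][p] > 0) continue;
        double num = 0, den = 0;
        for (const Scored& nb : neighbors) {
            double sim = nb.first;
            double r = ratings[nb.second][p];
            if (sim > 0 && r > 0) { num += sim * r; den += sim; }
        }
        if (den > 0) scores.push_back({num / den, p});
    }

    // 3. Return the N highest-scoring products.
    std::sort(scores.begin(), scores.end(), byScoreDesc);
    std::vector<std::size_t> result;
    for (std::size_t i = 0; i < scores.size() && i < topN; ++i) result.push_back(scores[i].second);
    return result;
}
```

Requiring at least two co-rated products makes the sparsity problem visible in code: with little overlap between users, most correlations come back as 0.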

Early CF Challenges


Sparsity - When users have rated few products in common, no reliable correlation between them can be found, so coverage drops.

Scalability - The computation time of nearest-neighbor algorithms grows with the number of users and products.

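Synonymy - Similar products with different names or IDs are treated as unrelated, so the system cannot link their ratings.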

Dimensionality Reduction

Latent Semantic Indexing (LSI)

An algorithm from the information retrieval (IR) community (late 1980s to early 1990s).

Addresses the problems of synonymy, polysemy, sparsity, and scalability for large datasets.

Reduces the dimensionality of a dataset and captures its latent relationships.

Easily maps to CF!

Framing LSI for CF

Use a products × users matrix instead of terms × documents.

Netflix Dataset

480,189 users, 17,770 movies, only ~100 million ratings.

A 17,770 × 480,189 matrix that is 99% sparse!

About 8.5 billion potential ratings.
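The dataset figures fit together as follows, using the slide's round count of about 100 million ratings:

```latex
17{,}770 \times 480{,}189 = 8{,}532{,}958{,}530 \approx 8.5 \times 10^{9}\ \text{potential ratings}

\frac{\approx 1.0 \times 10^{8}\ \text{ratings}}{8.53 \times 10^{9}\ \text{cells}} \approx 0.0117
\quad\Rightarrow\quad \approx 1.2\%\ \text{filled},\ \approx 98.8\%\ \text{empty}
```

So "99% sparse" is the rounded version of roughly 98.8% empty cells.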

SVD - The Math Behind LSI

Singular Value Decomposition

Any M × N matrix A of rank r can be decomposed as:

$$A = U \Sigma V^T$$

U is an M × M orthogonal matrix. V is an N × N orthogonal matrix. Σ is an M × N diagonal matrix whose first r diagonal entries are the nonzero singular values of A, in decreasing order:

$$\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > \sigma_{r+1} = \dots = \sigma_{\min(M,N)} = 0$$

Related to eigenvalue decomposition (PCA)

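The columns of U are orthonormal eigenvectors of AA^T. They span the column space of A and are called the left singular vectors.

The columns of V are orthonormal eigenvectors of A^TA. They span the row space of A and are called the right singular vectors.

The nonzero singular values are the square roots of the nonzero eigenvalues of AA^T (equivalently, of A^TA).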
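The eigenvalue connection gives a simple way to get the leading singular triplet: power iteration on A^TA. The C++ sketch below is a minimal illustration, not a production SVD. Dense row-major storage, the fixed iteration count, the all-ones starting vector, and the names `topSingularTriplet`, `multiply`, and `multiplyTransposed` are all assumptions. Real code would call a linear algebra library.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // row-major: A[i][j], M rows by N columns

struct Triplet { double sigma; Vec u; Vec v; };

// Scale x to unit Euclidean length in place; return its original length.
double normalize(Vec& x) {
    double n = 0;
    for (double xi : x) n += xi * xi;
    n = std::sqrt(n);
    if (n > 0)
        for (double& xi : x) xi /= n;
    return n;
}

// y = A x
Vec multiply(const Mat& A, const Vec& x) {
    Vec y(A.size(), 0.0);
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t j = 0; j < x.size(); ++j) y[i] += A[i][j] * x[j];
    return y;
}

// y = A^T x
Vec multiplyTransposed(const Mat& A, const Vec& x) {
    Vec y(A[0].size(), 0.0);
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t j = 0; j < y.size(); ++j) y[j] += A[i][j] * x[i];
    return y;
}

// Leading singular triplet via power iteration on A^T A:
// v converges to the top eigenvector of A^T A (a right singular vector),
// sigma = ||A v|| = sqrt(top eigenvalue), and u = A v / sigma (a left singular vector).
Triplet topSingularTriplet(const Mat& A, int iterations = 200) {
    assert(!A.empty() && !A[0].empty());
    Vec v(A[0].size(), 1.0);  // start vector; must not be orthogonal to the top eigenvector
    normalize(v);
    for (int it = 0; it < iterations; ++it) {
        v = multiplyTransposed(A, multiply(A, v));  // v <- A^T A v
        normalize(v);
    }
    Vec u = multiply(A, v);
    double sigma = normalize(u);
    return {sigma, u, v};
}
```

Further triplets could be found by subtracting σuv^T from A and repeating (deflation), but that converges slowly when singular values are close.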

Reducing Dimensionality

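A_k keeps only the k largest singular values and their singular vectors (the first k columns of U and V):

$$A_k = U_k \Sigma_k V_k^T$$

A_k is the closest rank-k approximation to A: it minimizes the Frobenius norm of the error over all rank-k matrices.

$$\|A - A_k\|_F = \min_{\operatorname{rank}(B) \le k} \|A - B\|_F$$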
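Once the singular values and vectors are known, building A_k and measuring its error is direct. The C++ sketch below assumes the factors are already computed and stored as separate vectors `u[i]` and `v[i]` rather than matrix columns. By the Eckart-Young theorem, the Frobenius error should equal the square root of the sum of the discarded σ_i², which the example checks. `rankK` and `frobeniusDiff` are hypothetical helper names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // row-major

// A_k = U_k Sigma_k V_k^T = sum over i < k of sigma_i * u_i * v_i^T,
// where u[i] and v[i] are the i-th left and right singular vectors.
Mat rankK(const std::vector<Vec>& u, const Vec& sigma, const std::vector<Vec>& v, std::size_t k) {
    assert(k <= sigma.size() && k <= u.size() && k <= v.size());
    std::size_t M = u[0].size(), N = v[0].size();
    Mat Ak(M, Vec(N, 0.0));
    for (std::size_t i = 0; i < k; ++i)
        for (std::size_t r = 0; r < M; ++r)
            for (std::size_t c = 0; c < N; ++c)
                Ak[r][c] += sigma[i] * u[i][r] * v[i][c];
    return Ak;
}

// ||A - B||_F: square root of the sum of squared entry differences.
double frobeniusDiff(const Mat& A, const Mat& B) {
    double s = 0;
    for (std::size_t r = 0; r < A.size(); ++r)
        for (std::size_t c = 0; c < A[r].size(); ++c)
            s += (A[r][c] - B[r][c]) * (A[r][c] - B[r][c]);
    return std::sqrt(s);
}
```

For example, with singular values 3 and 1, dropping the second one leaves an error of exactly 1, and dropping both leaves √10.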

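Making Recommendations

Cosine similarity is a common way to find a neighborhood:

$$\cos(i, j) = \frac{i \cdot j}{\|i\|_2 \, \|j\|_2}$$

Recommendations are then based on that neighborhood and its users, for example the products the neighbors rated highly that the user has not yet rated.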
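A minimal C++ sketch of cosine similarity and a nearest-neighbor lookup. In the SVD setting, the vectors compared would be users' rows in the reduced k-dimensional space (for example rows of U_k S_k^{1/2}, an assumption here). Returning 0 for an all-zero vector and the names `cosine` and `nearestNeighbor` are choices of this sketch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// cos(i, j) = (i . j) / (||i||_2 * ||j||_2); defined here as 0 if either vector is all zeros.
double cosine(const std::vector<double>& i, const std::vector<double>& j) {
    assert(i.size() == j.size());
    double dot = 0, ni = 0, nj = 0;
    for (std::size_t t = 0; t < i.size(); ++t) {
        dot += i[t] * j[t];
        ni += i[t] * i[t];
        nj += j[t] * j[t];
    }
    if (ni == 0 || nj == 0) return 0.0;
    return dot / (std::sqrt(ni) * std::sqrt(nj));
}

// Index of the row most similar to rows[target], excluding target itself.
// Each row is one user's vector in the reduced space.
std::size_t nearestNeighbor(const std::vector<std::vector<double>>& rows, std::size_t target) {
    std::size_t best = target;
    double bestSim = -2.0;  // below the smallest possible cosine (-1)
    for (std::size_t r = 0; r < rows.size(); ++r) {
        if (r == target) continue;
        double s = cosine(rows[target], rows[r]);
        if (s > bestSim) { bestSim = s; best = r; }
    }
    return best;
}
```

Working in k dimensions instead of over all 17,770 products is what makes this comparison cheaper after the SVD.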

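Predictions for individual products can also be made with a simple dot product once the singular values are split between the two sets of singular vectors (here S_k denotes Σ_k):

$$CP_{prod} = C_{avg} + U_k S_k^{1/2}(c) \cdot S_k^{1/2} V_k^T(p)$$

Here $C_{avg}$ is customer c's average rating, $U_k S_k^{1/2}(c)$ is the row of $U_k S_k^{1/2}$ for customer c, and $S_k^{1/2} V_k^T(p)$ is the column of $S_k^{1/2} V_k^T$ for product p. This formula treats customers as rows, i.e. the transpose of the products × users layout.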
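A minimal C++ sketch of that prediction. It assumes the SVD was computed on the customers × products matrix after subtracting each customer's average rating (which is why the average is added back), with U_k stored as customers × k and V_k as products × k. `predictCP` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<double>>;

// CP = C_avg + (U_k S_k^{1/2})(c) . (S_k^{1/2} V_k^T)(p)
// Uk: customers x k, Vk: products x k (rows of V_k), s: the k singular values.
double predictCP(double customerAvg, const Mat& Uk, const std::vector<double>& s,
                 const Mat& Vk, std::size_t c, std::size_t p) {
    assert(c < Uk.size() && p < Vk.size());
    double dot = 0;
    for (std::size_t f = 0; f < s.size(); ++f) {
        double root = std::sqrt(s[f]);
        // Row c of U_k S_k^{1/2} times column p of S_k^{1/2} V_k^T, one factor at a time.
        dot += (Uk[c][f] * root) * (root * Vk[p][f]);
    }
    return customerAvg + dot;
}
```

In practice the two scaled factor matrices would be precomputed offline, so each online prediction costs only a k-term dot product.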

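Challenges with SVD

Scalability - Once again, compute time grows with the number of users and products: a full SVD of an m × m matrix takes O(m^3) time.

Offline stage: compute the SVD (expensive).

Online stage: make predictions from the precomputed factors (cheap).

For very large datasets, even computing the SVD offline is impractical, so other methods are needed.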

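Incremental SVD

Folding-in: since A = UΣV^T gives U_k = A V_k Σ_k^{-1}, a new row u (for example a new customer's ratings, with customers as rows) can be projected into the existing k-dimensional space without recomputing the SVD:

$$u_k = u^T V_k \Sigma_k^{-1}$$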
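A minimal C++ sketch of folding-in. It assumes V_k is stored as an N × k array (`Vk[j][f]` is entry j of the f-th right singular vector) and that all k singular values are nonzero. `foldIn` is a hypothetical name. Folding in a row that is already in A reproduces that row of U_k, which the example uses as a check.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<double>>;

// u_k = u^T V_k Sigma_k^{-1}: project a new row u (length N) into k dimensions.
std::vector<double> foldIn(const std::vector<double>& u, const Mat& Vk,
                           const std::vector<double>& sigma) {
    assert(u.size() == Vk.size());
    std::size_t k = sigma.size();
    std::vector<double> uk(k, 0.0);
    for (std::size_t f = 0; f < k; ++f) {
        for (std::size_t j = 0; j < u.size(); ++j) uk[f] += u[j] * Vk[j][f];  // u^T V_k
        uk[f] /= sigma[f];                                                    // times Sigma_k^{-1}
    }
    return uk;
}
```

The cost is O(Nk) per new row, but the basis is never updated, so accuracy degrades as many new users are folded in; periodic full recomputation offline is the usual remedy.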

Incremental SVD Results

GHA (Generalized Hebbian Algorithm) for SVD

Gorrell (2006), GHA for Incremental SVD in NLP.

Based on Sanger's (1989) GHA for eigendecomposition.

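For a paired training sample (a, b), the i-th left and right singular vector estimates $c_a^i$ and $c_b^i$ are updated as:

$$\Delta c_a^i = c_b^i \cdot b \left(a - \sum_{j<i} (a \cdot c_a^j)\, c_a^j\right)$$

$$\Delta c_b^i = c_a^i \cdot a \left(b - \sum_{j<i} (b \cdot c_b^j)\, c_b^j\right)$$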

GHA extended by Funk

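```cpp
// Funk's per-rating update for the feature currently being trained.
// Assumes `real` (a floating-point typedef), `lrate`, the userValue[] and
// movieValue[] arrays, and predictRating() are defined elsewhere.
void train(int user, int movie, real rating) {
    real err = lrate * (rating - predictRating(movie, user));
    real uv = userValue[user];  // keep the pre-update value so both updates use the same state
    userValue[user] += err * movieValue[movie];
    movieValue[movie] += err * uv;
}
```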
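For a runnable end-to-end version, the sketch below wraps the same per-rating update in a small model that trains one feature at a time on toy data. The `Rating` struct, `FunkModel`, `trainingRmse`, the 0.1 initial value, and the rule that only features 0..f contribute to the prediction while feature f is trained are all assumptions for illustration. Funk's actual code also used refinements such as regularization that are omitted here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical toy rating record (not the Netflix file format).
struct Rating { int user; int movie; double value; };

// Minimal Funk-style model: latent features are trained one at a time with the
// same per-rating update as train(). No regularization, clipping, or baseline.
struct FunkModel {
    std::vector<std::vector<double>> userValue;   // [feature][user]
    std::vector<std::vector<double>> movieValue;  // [feature][movie]

    FunkModel(int users, int movies, int features, double init = 0.1)
        : userValue(features, std::vector<double>(users, init)),
          movieValue(features, std::vector<double>(movies, init)) {}

    // Sum of userValue * movieValue over features 0..last (inclusive).
    double predictUpTo(int user, int movie, std::size_t last) const {
        double sum = 0;
        for (std::size_t f = 0; f <= last && f < userValue.size(); ++f)
            sum += userValue[f][user] * movieValue[f][movie];
        return sum;
    }

    double predict(int user, int movie) const {
        return predictUpTo(user, movie, userValue.size() - 1);
    }

    // While feature f is trained, only features 0..f contribute to the prediction.
    void fit(const std::vector<Rating>& data, int epochs, double lrate) {
        for (std::size_t f = 0; f < userValue.size(); ++f)
            for (int e = 0; e < epochs; ++e)
                for (const Rating& r : data) {
                    double err = lrate * (r.value - predictUpTo(r.user, r.movie, f));
                    double uv = userValue[f][r.user];  // pre-update value
                    userValue[f][r.user] += err * movieValue[f][r.movie];
                    movieValue[f][r.movie] += err * uv;
                }
    }
};

// Root mean squared error of the model on a set of ratings.
double trainingRmse(const FunkModel& model, const std::vector<Rating>& data) {
    assert(!data.empty());
    double se = 0;
    for (const Rating& r : data) {
        double d = r.value - model.predict(r.user, r.movie);
        se += d * d;
    }
    return std::sqrt(se / data.size());
}
```

Only the known ratings are visited, so the missing 99% of the matrix never has to be filled in, which is the main practical advantage over a classical SVD here.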

Netflix Results

Best RMSEs: 0.9283 and 0.9212.

Blended to get 0.9189, 3.42% better than Netflix's own Cinematch system.
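A minimal C++ sketch of the metric and the improvement figure. The baseline is assumed to be Cinematch's published quiz-set RMSE of 0.9514, which is not on the slide; with that value, (0.9514 - 0.9189) / 0.9514 gives the quoted 3.42%. `rmse` and `improvementPercent` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Root mean squared error between predicted and actual ratings.
double rmse(const std::vector<double>& predicted, const std::vector<double>& actual) {
    assert(predicted.size() == actual.size() && !actual.empty());
    double se = 0;
    for (std::size_t i = 0; i < actual.size(); ++i) {
        double d = predicted[i] - actual[i];
        se += d * d;
    }
    return std::sqrt(se / actual.size());
}

// Relative improvement of newRmse over baselineRmse, in percent.
double improvementPercent(double baselineRmse, double newRmse) {
    return 100.0 * (baselineRmse - newRmse) / baselineRmse;
}
```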

Summary

SVD provides an elegant, automatic recommendation system that has the potential to scale.

Many different algorithms can compute, or at least approximate, the SVD; they can be used in the offline stage by websites that need CF.

Every dataset is different and requires experimentation to get the best results.
