Large-scale Parallel Collaborative Filtering for the Netflix Prize · 2019/01/17 · Alternating Least Squares
TRANSCRIPT
Large-scale Parallel Collaborative Filtering for the Netflix Prize Yunhong Zhou, Dennis Wilkinson, Robert Schreiber and Rong Pan
TALK LEAD: PRESTON ENGSTROM
FACILITATORS: SERENA MCDONNELL, OMAR NADA
Agenda
Historical Background
Problem Formulation
Alternating Least Squares For Collaborative Filtering
Results
Discussion
The Netflix Prize
In 2006, Netflix offered 1 million dollars for an algorithm that could beat their own algorithm, "Cinematch", by 10% in terms of RMSE
100MM ratings
(User, Item, Date, Rating)
Training data and holdout were split by date
… Training data: 1996–2005
… Test data: 2006
The Netflix Prize: Solutions
Recommendation Systems: Collaborative Filtering
Generate recommendations of relevant items from the aggregate behavior/tastes of a large number of users
Contrasted against Content Filtering, which uses attributes (text, tags, user age, demographics) to find related items and users
Notation
$n_u$ : Number of users
$n_m$ : Number of items
$R$ : $n_u \times n_m$ interaction matrix
$r_{ij}$ : Rating provided by user $i$ for item $j$
$I_i$ : Set of movies user $i$ has rated
$n_{u_i}$ : Cardinality of $I_i$
$I_j$ : Set of users who rated movie $j$
$n_{m_j}$ : Cardinality of $I_j$
Problem Formulation
Given a set of users and items, estimate how a user would rate an item they have had no interaction with
Available information:
… List of (User ID, Item ID, Rating, …)
Low Rank Approximation
… Populate a matrix with the known interactions, factor it into two smaller matrices, and use them to predict the unknown interactions
Problem Formulation

[Figure: the rating matrix $R$ factored into a user feature matrix $U$ and a movie feature matrix $M$, with $n_f = 3$]

Let $U = [u_i]$ be the user feature matrix, where $u_i \in \mathbb{R}^{n_f}$ for all $i = 1 \dots n_u$

Let $M = [m_j]$ be the movie feature matrix, where $m_j \in \mathbb{R}^{n_f}$ for all $j = 1 \dots n_m$

$n_f$ is the dimension of the feature space. This gives $(n_u + n_m) \times n_f$ parameters to be learned

If the full interaction matrix $R$ were known, and $n_f$ sufficiently large, we could expect that $r_{ij} = \langle u_i, m_j \rangle \ \forall i, j$
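As a sketch of this factorization (toy sizes and random features, purely illustrative, not the paper's data):

```python
import numpy as np

n_u, n_m, n_f = 4, 5, 3          # toy sizes, not the Netflix dimensions
rng = np.random.default_rng(0)
U = rng.normal(size=(n_f, n_u))  # user feature matrix, one column u_i per user
M = rng.normal(size=(n_f, n_m))  # movie feature matrix, one column m_j per movie
R_hat = U.T @ M                  # every prediction is r_ij = <u_i, m_j>
```

Each entry of `R_hat` is the inner product of one user column and one movie column, which is exactly the low-rank prediction above.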
Problem Formulation

Mean square loss function for a single rating:
$$l^2(r, u, m) = (r - \langle u, m \rangle)^2$$

Total loss for a given $U$ and $M$, over the set of known ratings $I$:
$$L^{emp}(R, U, M) = \frac{1}{n} \sum_{(i,j) \in I} l^2(r_{ij}, u_i, m_j)$$

Finding a good low rank approximation:
$$(U, M) = \operatorname*{argmin}_{(U, M)} L^{emp}(R, U, M)$$

Many parameters and few ratings make overfitting very likely, so a regularization term should be introduced:
$$L^{reg}_{\lambda}(R, U, M) = L^{emp}(R, U, M) + \lambda \left( \|U \Gamma_U\|^2 + \|M \Gamma_M\|^2 \right)$$
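A minimal sketch of this loss, assuming a dict of known ratings and taking the regularization matrices $\Gamma_U, \Gamma_M$ as identity (the plain $\lambda(\|U\|^2 + \|M\|^2)$ form):

```python
import numpy as np

def l2(r, u, m):
    """Squared loss for a single rating: (r - <u, m>)^2."""
    return (r - u @ m) ** 2

def loss_reg(ratings, U, M, lam):
    """Empirical loss over the known ratings plus lam * (||U||^2 + ||M||^2)."""
    n = len(ratings)
    emp = sum(l2(r, U[:, i], M[:, j]) for (i, j), r in ratings.items()) / n
    return emp + lam * (np.sum(U ** 2) + np.sum(M ** 2))

# Tiny example: one user, one movie, one known rating.
ratings = {(0, 0): 3.0}
U = np.array([[1.0], [0.0]])                # u_0 = (1, 0)
M = np.array([[2.0], [0.0]])                # m_0 = (2, 0)
total = loss_reg(ratings, U, M, lam=0.1)    # (3 - 2)^2 + 0.1 * (1 + 4) = 1.5
```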
Singular Value Decomposition
Singular Value Decomposition (SVD) is a common method for approximating a matrix $R$ by the product of two rank-$k$ matrices, $\tilde{R} = U^T M$

Because the algorithm operates over the entire matrix, minimizing the Frobenius norm $\|R - \tilde{R}\|$ over every entry, standard SVD can fail to find $U$ and $M$ when most entries of $R$ are unknown
Alternating Least Squares
Alternating Least Squares is one method used to find $U$ and $M$ while only considering the known ratings

By treating each update of $U$ and $M$ as a least squares problem, we are able to update one matrix by fixing the other, and using the known entries of $R$
ALS with Weighted-λ-Regularization
Step 1: Initialize $M$ as an $n_f \times n_m$ matrix, assigning the average rating of each movie to the first row (and small random values to the rest)

Step 2: Fix $M$, solve for $U$ by minimizing the objective function

Step 3: Fix $U$, solve for $M$ by minimizing the objective function

Step 4: Repeat steps 2 and 3 until a set number of iterations is reached, or the improvement in RMSE on the probe dataset falls below 0.0001
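Steps 1–4 can be sketched on a single machine as follows. The dict-of-ratings format is an assumption for illustration, and the probe-based stopping test is replaced by a fixed iteration budget:

```python
import numpy as np

def als_wr(ratings, n_u, n_m, n_f, lam, n_iters=10):
    """ratings: dict mapping (user i, movie j) -> known rating r_ij."""
    rng = np.random.default_rng(0)
    M = rng.normal(scale=0.01, size=(n_f, n_m))
    for j in range(n_m):                     # Step 1: first row of M holds
        rs = [r for (_, jj), r in ratings.items() if jj == j]
        if rs:                               # each movie's average rating
            M[0, j] = np.mean(rs)
    U = np.zeros((n_f, n_u))
    for _ in range(n_iters):                 # Step 4: fixed budget here
        U = _solve_side(M, ratings, n_u, lam, users=True)    # Step 2
        M = _solve_side(U, ratings, n_m, lam, users=False)   # Step 3
    return U, M

def _solve_side(F, ratings, n_cols, lam, users):
    """Ridge-solve each column of the free matrix against fixed features F."""
    n_f = F.shape[0]
    out = np.zeros((n_f, n_cols))
    for k in range(n_cols):
        pairs = [((j if users else i), r) for (i, j), r in ratings.items()
                 if (i if users else j) == k]
        if not pairs:
            continue
        idx = [p for p, _ in pairs]
        r = np.array([v for _, v in pairs])
        Fk = F[:, idx]                       # features of the rated items only
        A = Fk @ Fk.T + lam * len(idx) * np.eye(n_f)
        out[:, k] = np.linalg.solve(A, Fk @ r)
    return out
```

The per-column solve here is the closed-form ridge update derived on the following slides; each update only touches the known entries of $R$.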
ALS with Weighted-λ-Regularization

The selected loss function uses Tikhonov regularization to penalize large parameters. This scheme was found to never overfit on test data when the number of features $n_f$ or the number of iterations was increased.

$$f(U, M) = \sum_{(i,j) \in I} (r_{ij} - u_i^T m_j)^2 + \lambda \left( \sum_i n_{u_i} \|u_i\|^2 + \sum_j n_{m_j} \|m_j\|^2 \right)$$

$\lambda$ : Regularization weight
$n_{u_i}$ : Cardinality of $I_i$
$n_{m_j}$ : Cardinality of $I_j$
Solving for $U$ given $M$

$$\frac{1}{2} \frac{\partial f}{\partial u_{ki}} = 0, \ \forall i, k$$

$$\Rightarrow \sum_{j \in I_i} (u_i^T m_j - r_{ij}) m_{kj} + \lambda n_{u_i} u_{ki} = 0, \ \forall i, k$$

$$\Rightarrow \sum_{j \in I_i} m_{kj} m_j^T u_i + \lambda n_{u_i} u_{ki} = \sum_{j \in I_i} m_{kj} r_{ij}, \ \forall i, k$$

$$\Rightarrow \left( M_{I_i} M_{I_i}^T + \lambda n_{u_i} E \right) u_i = M_{I_i} R^T(i, I_i), \ \forall i$$

$$\Rightarrow u_i = A_i^{-1} V_i, \ \forall i$$

where $A_i = M_{I_i} M_{I_i}^T + \lambda n_{u_i} E$, $V_i = M_{I_i} R^T(i, I_i)$, $E$ is the $n_f \times n_f$ identity matrix, $M_{I_i}$ is the sub-matrix of $M$ restricted to the columns $j \in I_i$, and $R(i, I_i)$ is the row vector of user $i$'s known ratings.
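A quick numerical check of the closed form on toy data: at $u_i = A_i^{-1} V_i$, the gradient from the derivation should vanish.

```python
import numpy as np

rng = np.random.default_rng(2)
n_f, lam = 3, 0.05
M_Ii = rng.normal(size=(n_f, 4))     # columns m_j for the 4 movies user i rated
r_i = rng.uniform(1.0, 5.0, size=4)  # user i's known ratings of those movies
n_ui = M_Ii.shape[1]                 # n_{u_i} = |I_i|

A_i = M_Ii @ M_Ii.T + lam * n_ui * np.eye(n_f)
V_i = M_Ii @ r_i
u_i = np.linalg.solve(A_i, V_i)      # u_i = A_i^{-1} V_i

# Gradient of f w.r.t. u_i, from the second line of the derivation; ~0 at u_i.
grad = M_Ii @ (M_Ii.T @ u_i - r_i) + lam * n_ui * u_i
```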
Update Procedure (Single Machine)
Parallelization
ALS-WR can be parallelized by distributing the updates of $U$ and $M$ over multiple computers

When updating $U$, each computer need only hold a portion of $U$, and the corresponding portion of $R$. However, a complete copy of $M$ is needed locally on each computer

Once all computers have updated their local partition of $U$, the results can be gathered, and distributed out to allow the following update of $M$
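A sketch of the data layout only (the "workers" here are just index blocks, not real processes):

```python
import numpy as np

def partition_users(n_u, n_workers):
    """Split user indices into contiguous, disjoint blocks, one per worker."""
    bounds = np.linspace(0, n_u, n_workers + 1).astype(int)
    return [list(range(bounds[k], bounds[k + 1])) for k in range(n_workers)]

parts = partition_users(n_u=10, n_workers=3)
# Each worker w would update U[:, i] for i in parts[w], using its local rows of
# R and a full replicated copy of M; the updated slices of U are then gathered
# before the symmetric movie-side update of M begins.
```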
Post Processing
Global mean shift:
… Given a prediction $P$, if the mean of $P$ is not equal to the mean of the test dataset, all predictions are shifted by a constant $\tau = \operatorname{mean}(test) - \operatorname{mean}(P)$
… This is shown to strictly reduce RMSE
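A sketch of the shift; it helps to note that $RMSE^2(P + \tau) = RMSE^2(P) - \tau^2$, which is why the correction never hurts. Toy arrays, purely illustrative:

```python
import numpy as np

def rmse(pred, actual):
    return np.sqrt(np.mean((pred - actual) ** 2))

def mean_shift(pred, actual):
    """Add tau = mean(actual) - mean(pred) to every prediction."""
    return pred + (np.mean(actual) - np.mean(pred))

pred = np.array([3.0, 3.5, 4.0, 2.5])    # toy predictions, biased low by 0.5
actual = np.array([3.5, 4.0, 4.5, 3.0])
shifted = mean_shift(pred, actual)
```

Here the bias is exactly constant, so the shift removes the error entirely; in general it removes only the $\tau^2$ term.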
Linear combinations of predictors:
… Given two predictors $P_0$ and $P_1$, a new family of predictors $P_x = (1 - x) P_0 + x P_1$ can be formed, and $x^*$ determined by minimizing $RMSE(P_x)$
… This yields a predictor $P_{x^*}$ which is at least as good as $P_0$ or $P_1$
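Since $RMSE^2(P_x)$ is quadratic in $x$, the minimizer has a closed form; a sketch (function and variable names are mine):

```python
import numpy as np

def rmse(pred, actual):
    return np.sqrt(np.mean((pred - actual) ** 2))

def best_blend(p0, p1, actual):
    """Closed-form x* minimizing RMSE of p_x = (1 - x) p0 + x p1."""
    d = p1 - p0
    denom = float(d @ d)
    x = 0.0 if denom == 0.0 else float((actual - p0) @ d) / denom
    return x, p0 + x * d
```

Because $x = 0$ and $x = 1$ are in the family, $P_{x^*}$ can be no worse than either $P_0$ or $P_1$.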
2 Minute Break
Hyperparameter Selection: λ
Hyperparameter Selection: $n_f$
Experimental Results
Where does ALS-WR stack up against the top Netflix Prize winners?

The best RMSE score is an improvement of 5.56% over Cinematch

| $n_f$ | $\lambda$ | RMSE | Best Convergence |
| --- | --- | --- | --- |
| 50 | 0.065 | 0.9114 | No |
| 150 | 0.065 | 0.9066 | No |
| 300 | 0.065 | 0.9017 | Yes |
| 400 | 0.065 | 0.9006 | Yes |
| 500 | 0.065 | 0.9000 | Yes |
| 1000 | 0.065 | 0.8985 | No |
Combining ALS-WR With Other Methods
The top performing Netflix Prize contributions were all composed of ensembles of models. Knowing this, combined with the apparent diminishing returns of increasing $n_f$ and $\lambda$, ALS-WR was combined with two other common methods:
… Restricted Boltzmann Machines (RBM)
… K-nearest Neighbours (kNN)
Both methods can be parallelized in a manner similar to ALS-WR, and provide a modest improvement in performance.

Using a linear blend of ALS, RBM and kNN, an RMSE of 0.8952 (a 5.91% improvement over Cinematch) was obtained
Discussion
While RMSE is the error metric both Netflix and the researchers chose, it's not necessarily the most appropriate. What other metrics could have been useful?

It is stated that the regularization scheme will stop overfitting if either the number of features or the number of iterations is increased. Does this mean that if both are increased, it will overfit? Is the claim that the model never overfits valid?