dataengconf: building a music recommender system from scratch with spotify data team
TRANSCRIPT
November 14, 2015
Building a Music Recommender from Scratch
Vidhya Murali (@vid052)
Vidhya Murali
Who Am I? 2
•Areas of Interest: Data & Machine Learning
•Data Science Engineer @Spotify
•Masters student from the University of Wisconsin-Madison
aka Happy Badger for life!
“Torture the data, and it will confess!”
3
– Ronald Coase, Nobel Prize Laureate
Music Recommendations at Spotify
Features: Discover, Discover Weekly, Moments, Radio, Related Artists
4
5
30 million tracks… What to recommend?
6
Approaches
•Manual Curation by Experts
•Editorial Tagging
•Metadata (e.g. label-provided data, NLP over news and blogs)
•Audio Signals
•Collaborative Filtering Model
6
•Manual Curation by Experts
•Editorial Tagging
•Metadata (e.g. Label provided data, NLP over News, Blogs)
•Audio Signals
•Collaborative Filtering Model
Approaches
Definition of CF
7
Hey, I like tracks P, Q, R, S!
Well, I like tracks Q, R, S, T!
Then you should check out track P!
Nice! Btw try track T!
Legacy Slide of Erik Bernhardsson
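The exchange above is just set difference over two overlapping listening histories; a toy Python sketch (the track names are the ones from the slide):

```python
# Two users with overlapping taste, as in the dialogue above.
user_a = {"P", "Q", "R", "S"}
user_b = {"Q", "R", "S", "T"}

# Each user is recommended the other's tracks minus what they already know.
rec_for_b = user_a - user_b   # track P
rec_for_a = user_b - user_a   # track T

print(sorted(rec_for_a), sorted(rec_for_b))
```

Real collaborative filtering generalizes this idea from exact set overlap to patterns learned over millions of users.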
Collaborative Filtering Model 8
•Find patterns from users' past behavior to generate recommendations
•Domain independent
•Scalable
•Accuracy (Collaborative Model) >= Accuracy (Content Based Model)
Construct Big Matrix! 9
[Matrix diagram: rows are Users (m), columns are Artists (n); example row: Vidhya, example column: Ellie Goulding]
Order of Millions!
Latent Factor Models 10
•Use a "small" representation for each user and item (artist): f-dimensional vectors
[Diagram: the User-Artist Matrix (m x n) is approximated by the product of a User Vector Matrix X (m x f) and an Artist Vector Matrix Y (n x f); here, f = 2]
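A minimal numpy sketch of these shapes (the sizes here are illustrative, not Spotify's):

```python
import numpy as np

# m users, n artists, f latent dimensions (f = 2 on the slide).
m, n, f = 5, 8, 2

rng = np.random.default_rng(0)
X = rng.standard_normal((m, f))  # user vector matrix: one f-dim row per user
Y = rng.standard_normal((n, f))  # artist vector matrix: one f-dim row per artist

# The full (m x n) user-artist matrix is approximated by all inner products.
R_hat = X @ Y.T
print(R_hat.shape)               # (m, n)

# A single predicted user-artist affinity is a dot product of two f-dim vectors.
score = X[0] @ Y[3]
```

The win is storage and compute: m*f + n*f numbers instead of m*n, with each prediction costing only f multiplications.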
Why Vectors? 11
•Vectors encode higher order dependencies
•Users and Items in the same vector space!
•Use vector similarity to compute:
  •Item-Item similarities
  •User-Item recommendations
•Linear complexity: order of number of latent factors
•Easy to scale up
Explicit Matrix Factorization 12
•User explicitly rates a subset of the music catalog
•Goal: Predict how users will rate new music
•How: Approximate the ratings matrix by the inner product of 2 smaller matrices, X (Users) and Y (Artists), by minimizing the RMSE (root mean squared error):

min_{X,Y} Σ_{(u,i) observed} ( r_{ui} − x_u^T y_i − b_u − b_i )^2 + λ ( ||x_u||^2 + ||y_i||^2 + b_u^2 + b_i^2 )

•r_{ui} = user rating for item
•x_u = user latent factor vector
•y_i = item latent factor vector
•b_u = bias for user
•b_i = bias for item
•λ = regularization parameter
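One common way to minimize this objective is stochastic gradient descent; the talk does not specify an optimizer for the explicit case, so the sketch below is illustrative (all hyperparameters are made up):

```python
import numpy as np

def factorize(ratings, m, n, f=2, lr=0.01, lam=0.1, epochs=100, seed=0):
    """SGD for explicit matrix factorization with user/item biases.

    ratings: list of (user, item, rating) triples for the observed subset.
    Returns user vectors X (m x f), item vectors Y (n x f), biases bu, bi.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(scale=0.1, size=(m, f))   # user latent factor vectors
    Y = rng.normal(scale=0.1, size=(n, f))   # item latent factor vectors
    bu = np.zeros(m)                         # user biases
    bi = np.zeros(n)                         # item biases
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - (X[u] @ Y[i] + bu[u] + bi[i])
            xu = X[u].copy()                 # use pre-update value for Y's step
            X[u] += lr * (err * Y[i] - lam * X[u])
            Y[i] += lr * (err * xu - lam * Y[i])
            bu[u] += lr * (err - lam * bu[u])
            bi[i] += lr * (err - lam * bi[i])
    return X, Y, bu, bi
```

Each update nudges the factors along the gradient of one squared-error term, with the λ terms shrinking factors and biases toward zero.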
Matrix Factorization using Implicit Feedback 13
•User-Artist Play Count Matrix
•User-Artist Preference Matrix — Binary label: 1 => played, 0 => not played
•Weights Matrix — Weights based on play count and smoothing
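The three matrices above can be derived from play counts alone; the log smoothing and the alpha value below are one common choice, not necessarily what Spotify used:

```python
import numpy as np

# Hypothetical user-artist play counts: 2 users, 3 artists.
plays = np.array([
    [12, 0, 3],
    [0, 45, 0],
])

# Binary preference matrix: 1 => played, 0 => not played.
preference = (plays > 0).astype(float)

# Weights matrix: confidence grows with play count, smoothed by log1p.
alpha = 40.0
weights = 1.0 + alpha * np.log1p(plays)
```

Unplayed cells keep weight 1 rather than 0: a zero is weak evidence of dislike (the user may simply never have seen the artist), so it counts, but far less than a heavily streamed artist.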
Equation(s) Alert! 14
Implicit Matrix Factorization 15
[Binary preference matrix of 0s and 1s: rows are Users, columns are Artists, approximated by X x Y]
•Aggregate all (user, artist) streams into a large matrix
•Goal: Approximate the binary preference matrix by the inner product of 2 smaller matrices, X (Users) and Y (Artists), by minimizing the weighted RMSE (root mean squared error) using a function of total plays as weight:

min_{X,Y} Σ_{u,i} c_{ui} ( p_{ui} − x_u^T y_i − b_u − b_i )^2 + λ ( ||x_u||^2 + ||y_i||^2 + b_u^2 + b_i^2 )

•Why?: Once learned, the top recommendations for a user are the top inner products between their latent factor vector in X and the artist latent factor vectors in Y.
•p_{ui} = 1 if user streamed artist, else 0
•c_{ui} = confidence weight, a function of play count
•x_u = user latent factor vector
•y_i = item latent factor vector
•b_u = bias for user
•b_i = bias for item
•λ = regularization parameter
Alternating Least Squares 16
[Same binary preference matrix, factor matrices X (Users) and Y (Artists), objective, and legend as the previous slide]
•Fix artists: hold Y constant and solve for each user vector in X
•Fix users: hold X constant and solve for each artist vector in Y
•Repeat until convergence…
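The alternation works because with one side fixed, each remaining vector has a closed-form weighted least squares solution. A minimal sketch in the spirit of Hu, Koren & Volinsky's implicit ALS (no bias terms, illustrative sizes and hyperparameters, and a dense per-user solve that a production system would heavily optimize):

```python
import numpy as np

def als_half_step(P, C, Y, lam):
    """Solve for every row vector on one side, with the other side Y fixed.

    P: (m, n) binary preference matrix; C: (m, n) confidence weights.
    Each row of the returned X solves a weighted regularized least squares.
    """
    f = Y.shape[1]
    X = np.zeros((P.shape[0], f))
    for u in range(P.shape[0]):
        Cu = np.diag(C[u])                    # per-user confidence weights
        A = Y.T @ Cu @ Y + lam * np.eye(f)    # (f x f) normal equations
        b = Y.T @ Cu @ P[u]
        X[u] = np.linalg.solve(A, b)
    return X

def als(P, C, f=2, lam=0.1, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.normal(scale=0.1, size=(P.shape[0], f))
    Y = rng.normal(scale=0.1, size=(P.shape[1], f))
    for _ in range(iters):
        X = als_half_step(P, C, Y, lam)       # fix artists, solve for users
        Y = als_half_step(P.T, C.T, X, lam)   # fix users, solve for artists
    return X, Y
```

Each half-step is embarrassingly parallel across users (or artists), which is what makes ALS attractive at Spotify scale.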
Vectors
•“Compact” representation for users and items (artists) in the same space

Recommendations via Cosine Similarity 23
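With users and artists in one vector space, item-item similarity and user-item recommendation both reduce to cosine similarity. A small sketch (the vectors below are made-up stand-ins for learned latent factors):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def top_k(query, vectors, k=3):
    """Indices of the k rows of `vectors` most similar to `query`."""
    scores = np.array([cosine(query, v) for v in vectors])
    return np.argsort(-scores)[:k]

artist_vectors = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
user_vector = np.array([1.0, 0.0])
recs = top_k(user_vector, artist_vectors, k=2)   # most similar artists first
```

Brute-force scoring like this is linear in the catalog size per query, which is exactly the cost the next slide's approximate approach avoids.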
Annoy
•70 million users, at least 4 million candidate tracks per user
•Brute Force Approach: O(70M x 4M x 10) ~= O(3 peta-operations)!
• Approximate Nearest Neighbor Oh Yeah!
• Uses Locality-Sensitive Hashing
• Clone: https://github.com/spotify/annoy
25
Thank You!
You can reach me @
Email: [email protected]
Twitter: @vid052