Machine Learning and Big Data for Music Discovery at Spotify
Machine Learning & Big Data for Music Discovery
Galvanize NYC, Mar 9th, 2017
Vidhya Murali @vid052
Ching-Wei Chen @cweichen
100M users in 60 markets
50M subscribers
Over 30M songs, and 2B playlists
$5B paid to rightsholders
Spotify
Music for everyone.
30 Million Songs...
What to recommend?
Discover
Discover Weekly
How to recommend?
Many flavors of recommendations
Radio
Daily Mix
This Is:
Recommended Songs
Recommendation approaches
‣ Editorial
‣ Algorithmic
○ Content-based
■ Metadata
■ Audio Signals
○ Collaborative
■ Usage based
‣ Algotorial
Collaborative Filtering
‣ Find patterns in users' past behavior to generate recommendations.
‣ Domain independent
‣ Scalable
Latent Factor Models
Compact representation for each user and item (song): f-dimensional vectors
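The latent factor idea can be sketched in a few lines. This is a minimal illustration, assuming a tiny made-up interaction list and plain stochastic gradient descent on squared error; the function names, the hyperparameters, and the toy data are all hypothetical, and the slides do not say which training method Spotify uses in production.

```python
import random

def factorize(interactions, n_users, n_items, f=2, epochs=500, lr=0.05, reg=0.01, seed=0):
    """Learn f-dimensional user and item vectors by SGD on squared error."""
    rng = random.Random(seed)
    users = [[rng.uniform(-0.1, 0.1) for _ in range(f)] for _ in range(n_users)]
    items = [[rng.uniform(-0.1, 0.1) for _ in range(f)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, value in interactions:
            pred = sum(a * b for a, b in zip(users[u], items[i]))
            err = value - pred
            for k in range(f):
                uk, ik = users[u][k], items[i][k]
                # Move both vectors to reduce the error, with L2 shrinkage.
                users[u][k] += lr * (err * ik - reg * uk)
                items[i][k] += lr * (err * uk - reg * ik)
    return users, items

def score(users, items, u, i):
    """Predicted affinity: dot product of the user vector and the song vector."""
    return sum(a * b for a, b in zip(users[u], items[i]))

# Hypothetical toy data: (user, song, 1.0 = listened, 0.0 = skipped).
interactions = [
    (0, 0, 1.0), (0, 1, 1.0), (0, 2, 0.0),
    (1, 0, 1.0), (1, 1, 1.0), (1, 3, 0.0),
    (2, 2, 1.0), (2, 3, 1.0), (2, 0, 0.0),
    (3, 2, 1.0), (3, 3, 1.0), (3, 1, 0.0),
]
users, items = factorize(interactions, n_users=4, n_items=4)
```

Once trained, `score(users, items, u, i)` can rank songs a user has not interacted with yet, which is where the recommendations come from.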
NLP Models
Context & Co-occurrence is key!
Document : Playlist
Word : Song
NLP Models work great on playlists!
Generating Song Vectors
[Figure: a playlist as a sequence of songs w1, w2, w3, ..., wn, with one song marked "?"]
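To make the playlist-as-document analogy concrete, here is a small sketch. It uses a count-based stand-in, not a trained NLP model: each song is represented by how often other songs appear near it in playlists. The song names, the `window` parameter, and both helper functions are hypothetical.

```python
from collections import defaultdict
from math import sqrt

def cooccurrence_vectors(playlists, window=2):
    """Represent each song by counts of the songs appearing near it in playlists."""
    vectors = defaultdict(lambda: defaultdict(float))
    for playlist in playlists:
        for pos, song in enumerate(playlist):
            lo = max(0, pos - window)
            hi = min(len(playlist), pos + window + 1)
            # Context = neighbors on both sides, excluding the song itself.
            for ctx in playlist[lo:pos] + playlist[pos + 1:hi]:
                vectors[song][ctx] += 1.0
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b.get(k, 0.0) for k in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical playlists: each playlist is a "document", each song a "word".
playlists = [
    ["jazz_a", "jazz_b", "jazz_c"],
    ["jazz_b", "jazz_c", "jazz_a"],
    ["metal_a", "metal_b", "metal_c"],
    ["metal_c", "metal_a", "metal_b"],
]
vectors = cooccurrence_vectors(playlists)
same_genre = cosine(vectors["jazz_a"], vectors["jazz_b"])
cross_genre = cosine(vectors["jazz_a"], vectors["metal_a"])
```

Songs that share playlist context end up with similar vectors (`same_genre` is larger than `cross_genre`), which is the property the word-embedding approach relies on.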
Music in Latent Space
Semantic Regularities
Music + Math = Epic
Songs as vectors
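With songs as vectors, "music + math" reduces to vector arithmetic and nearest-neighbor lookup. The sketch below assumes hand-picked 3-dimensional vectors purely for illustration; real song vectors are learned, and the song names and helper functions are hypothetical.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def most_similar(query, vectors, exclude=()):
    """Name of the song whose vector is closest to the query vector."""
    candidates = [(cosine(query, v), name) for name, v in vectors.items() if name not in exclude]
    return max(candidates)[1]

# Hypothetical vectors; dimensions loosely read as (rock-ness, pop-ness, acoustic-ness).
song_vectors = {
    "rock_studio": [1.0, 0.1, 0.0],
    "rock_acoustic": [1.0, 0.1, 1.0],
    "pop_studio": [0.1, 1.0, 0.0],
    "pop_acoustic": [0.1, 1.0, 1.0],
    "jazz_studio": [0.6, 0.6, 0.0],
    "jazz_acoustic": [0.6, 0.6, 1.0],
}

# "Acoustic" direction = rock_acoustic - rock_studio, applied to pop_studio.
offset = [a - s for a, s in zip(song_vectors["rock_acoustic"], song_vectors["rock_studio"])]
query = [p + o for p, o in zip(song_vectors["pop_studio"], offset)]
match = most_similar(query, song_vectors, exclude={"pop_studio", "rock_acoustic", "rock_studio"})
```

Here the nearest remaining song to the query is the acoustic pop track, a toy version of the semantic regularities mentioned on the slide.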
Recommendations
User Profile:
● Aggregation over user interactions on Spotify
● Clustering to capture distinct user tastes/contexts
● Time-sensitive profiling
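One possible reading of the aggregation and time-sensitivity bullets is a recency-weighted average of the vectors of songs a user played. The sketch below assumes exponential decay with a hypothetical `half_life_days` parameter and made-up listening events; it does not cover the clustering step, and it is not claimed to be how Spotify builds profiles.

```python
def user_profile(events, song_vectors, now, half_life_days=30.0):
    """Average song vectors, weighting recent listens more than old ones."""
    dim = len(next(iter(song_vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for song, day in events:
        # A listen loses half its weight every half_life_days.
        weight = 0.5 ** ((now - day) / half_life_days)
        for k, x in enumerate(song_vectors[song]):
            total[k] += weight * x
        weight_sum += weight
    return [t / weight_sum for t in total] if weight_sum else total

# Hypothetical data: three old metal listens, one recent jazz listen.
song_vectors = {"metal": [1.0, 0.0], "jazz": [0.0, 1.0]}
events = [("metal", 0), ("metal", 0), ("metal", 0), ("jazz", 90)]
profile = user_profile(events, song_vectors, now=90)
```

Even though metal has more listens, the single recent jazz listen dominates the profile because the older events have decayed.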
Challenges unique to Spotify
‣ Scale of catalog
● 30M tracks; 2B playlists
● Training
○ 25B data points
○ 100M users
○ 60 countries represented
Data Pipelines
Bigtable
GCS
Dataflow
Pub/Sub
Scio
Challenges unique to Spotify
‣ Scale of catalog
‣ Cold-Start
○ New Users
○ New Music
Learning from sound
What’s in a sound?
Amplitude
Time
Frequencies
Loudness
Melody
Beats
Chords
Voices
Instruments
Lyrics
Popularity
Era
Region
Genre
Mood
Purpose
* Some information isn’t encoded in the signal itself, but within the cultural context around the music
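The step from amplitude over time to frequencies can be illustrated with a small magnitude spectrogram. This is a deliberately naive sketch in pure Python (a direct DFT per frame with a Hann window); the function name, frame size, hop, and the synthetic 1000 Hz test tone are all assumptions for illustration, and real pipelines would use an FFT library.

```python
import cmath
import math

def spectrogram(samples, frame_size=256, hop=128):
    """Magnitude spectrogram: one list of frequency-bin magnitudes per frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size) for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        bins = []
        for k in range(frame_size // 2 + 1):
            # Direct DFT of the windowed frame at frequency bin k.
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            bins.append(abs(acc))
        frames.append(bins)
    return frames

# Synthetic signal: a pure 1000 Hz tone sampled at 8000 Hz.
sample_rate = 8000
tone_hz = 1000.0
samples = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(1024)]

spec = spectrogram(samples)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 256
```

Each row of `spec` is one time slice and each column one frequency band, so the strongest bin in this example sits at the tone's frequency.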
Supervised Machine Learning
http://www.nltk.org/
Deep Learning
1. No feature extraction necessary
2. LOTS of simple learning nodes in many layers
3. Propagate errors backwards to learn optimal weights
4. Needs LOTS of data
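Points 2 and 3 can be sketched with a very small network. The example below assumes one hidden layer of sigmoid units trained by per-sample gradient descent on the XOR problem; the layer size, learning rate, and epoch count are arbitrary illustrative choices, and it is far smaller than anything that would count as deep learning in practice.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(data, hidden=4, epochs=5000, lr=0.5, seed=0):
    """Train a one-hidden-layer network; return the mean loss after each epoch."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # The last weight in each row multiplies a constant 1.0 (the bias).
    w1 = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(hidden)]
    w2 = [rng.uniform(-1, 1) for _ in range(hidden + 1)]
    losses = []
    for _ in range(epochs):
        loss = 0.0
        for x, target in data:
            # Forward pass through the hidden layer and the output node.
            xb = list(x) + [1.0]
            h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w1]
            hb = h + [1.0]
            y = sigmoid(sum(w * v for w, v in zip(w2, hb)))
            loss += 0.5 * (y - target) ** 2
            # Backward pass: push the output error back through each layer.
            delta_out = (y - target) * y * (1 - y)
            delta_hidden = [delta_out * w2[j] * h[j] * (1 - h[j]) for j in range(hidden)]
            for j in range(hidden + 1):
                w2[j] -= lr * delta_out * hb[j]
            for j in range(hidden):
                for i in range(n_in + 1):
                    w1[j][i] -= lr * delta_hidden[j] * xb[i]
        losses.append(loss / len(data))
    return losses

xor_data = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
losses = train(xor_data)
```

Comparing `losses[0]` with `losses[-1]` shows the error shrinking as the weights are adjusted, with no hand-designed features involved.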
Convolutional Neural Networks
Typical Convolutional Neural Network
Deep Learning on Audio at Spotify
Sander Dieleman: http://benanne.github.io/2014/08/05/spotify-cnns.html
Input: Audio spectrogram
Output: Latent Space Vector
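The spectrogram-in, vector-out shape of such a network can be sketched as a convolution along the time axis followed by pooling over time, loosely following the structure described in the linked post. The filters below are random and untrained, the input is random numbers standing in for a spectrogram, and all names and sizes are hypothetical; the point is only that clips of different lengths map to vectors of the same size.

```python
import random

def conv1d_time(spec, filters, bias):
    """spec: [time][freq]; filters: [n_filters][width][freq]. ReLU output: [time'][n_filters]."""
    width = len(filters[0])
    n_freq = len(spec[0])
    out = []
    for t in range(len(spec) - width + 1):
        row = []
        for f, b in zip(filters, bias):
            z = b + sum(f[dt][k] * spec[t + dt][k] for dt in range(width) for k in range(n_freq))
            row.append(max(0.0, z))  # ReLU
        out.append(row)
    return out

def global_pool(features):
    """Collapse the time axis: mean and max of each filter's activation."""
    n = len(features[0])
    mean = [sum(row[j] for row in features) / len(features) for j in range(n)]
    peak = [max(row[j] for row in features) for j in range(n)]
    return mean + peak

rng = random.Random(0)
n_freq, n_filters, width = 8, 3, 4
filters = [[[rng.uniform(-1, 1) for _ in range(n_freq)] for _ in range(width)]
           for _ in range(n_filters)]
bias = [0.0] * n_filters

# Stand-ins for spectrograms of a short clip and a longer clip.
short_clip = [[rng.random() for _ in range(n_freq)] for _ in range(10)]
long_clip = [[rng.random() for _ in range(n_freq)] for _ in range(25)]
vec_short = global_pool(conv1d_time(short_clip, filters, bias))
vec_long = global_pool(conv1d_time(long_clip, filters, bias))
```

In a trained model the filter weights would be learned so that the output vector lands near the track's position in the latent space; that training step is not shown here.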
Audio vector space
Cold Start? Problem solved! *
* Not completely, of course!
Recommending new music
Release Radar
Fresh Finds
Recommendations at Spotify
Recommended Songs
This Is:
Daily Mix
Radio
Discover Weekly
Release Radar
What’s next?
Join the band! www.spotify.com/jobs
Ching-Wei (@cweichen): [email protected]
Vidhya (@vid052): [email protected]