Deezer - Big Data as a Streaming Service
Big Data as a Streaming Service
Julie Knibbe
Product Manager – Deezer
@julieknibbe
Manuel Moussalam
R&D – Deezer
Product Manager
Defines features that meet users' needs
Based on:
• Market research
• Product data analytics
• User feedback
• Competitive analysis
• Creativity
The Leanback Experience Team at Deezer
• Product Manager
• Project Manager
• R&D developers
• Big Data developers
• Web developers (front/back)
• Mobile developers
• QA
Deezer
Active users 30M
Countries 180+
Tracks in catalog 35M
Artists in catalog 1M
Music providers 1K+
The recommendation problem
No one wants to hear music they don’t like
The recommendation problem
No one wants to hear the same 200 tracks over and over again
The recommendation problem
You need to hear a song 1 to 7 times before you like it
The recommendation problem
Parameters and variables:
• Mood
• Tastes
• Habits
• Openness
• Sociological profile
• …
Dimensions:
• 35M tracks
• 1M artists
• 30M users
User Profile – Implicit / Explicit feedback
Adaptation:
• Add new information
• Forget old interests
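The adaptation idea on this slide (add new information, forget old interests) could be sketched as an exponentially decayed score per artist. The feedback weights, the 30-day half-life and the `update_profile` helper below are illustrative assumptions, not values or code from Deezer.

```python
# Hypothetical weights: explicit feedback (like, ban) counts more than
# implicit feedback (listen, skip).
FEEDBACK_WEIGHTS = {"listen": 1.0, "skip": -0.5, "like": 3.0, "ban": -5.0}
HALF_LIFE_DAYS = 30.0  # assumed: an interest loses half its weight in 30 days


def update_profile(profile, events, now_day):
    """Return artist -> (score, day), decaying old scores and adding events.

    profile: dict artist -> (score, day_of_last_update)
    events: list of (artist, feedback_type, day)
    """
    updated = {}
    # Forget old interests: decay every stored score up to `now_day`.
    for artist, (score, last_day) in profile.items():
        decay = 0.5 ** ((now_day - last_day) / HALF_LIFE_DAYS)
        updated[artist] = (score * decay, now_day)
    # Add new information: each event is weighted and decayed by its own age.
    for artist, feedback, day in events:
        decay = 0.5 ** ((now_day - day) / HALF_LIFE_DAYS)
        score, _ = updated.get(artist, (0.0, now_day))
        updated[artist] = (score + FEEDBACK_WEIGHTS[feedback] * decay, now_day)
    return updated


profile = {"The Who": (4.0, 0)}
events = [("Stromae", "like", 30), ("Stromae", "listen", 30)]
profile = update_profile(profile, events, now_day=30)
```

In this toy run the older interest is halved after one half-life, while the fresh explicit and implicit feedback is counted at full weight.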
Music Recommendation
Given a listening profile for user X, what music should we recommend?
Recommendation system – adapting to user types
User types: Savants, Enthusiasts, Casuals, Indifferents
Recommendation axis: riskier recommendations to popular recommendations
Finding the right mix between novelty, familiarity and relevance
Recommendation system – adapting to user types
Sources:
http://alchemi.co.uk/archives/mus/groups_and_beha.html
http://musicmachinery.com/2014/01/14/the-zero-button-music-player-2/
At Deezer
Mixing collaborative filtering with semi-supervised approaches
• Curation: Deezer Editors
• Multi-layered graph structure of tracks & artists
• Usage monitoring
Based on Hadoop + ElasticSearch + Spark
Collaborative Filtering: Matching
Collaborative filtering: "User X listened to the Rolling Stones. Users who listen to the Rolling Stones usually also listen to the Who, so let's suggest the Who to user X."
Popularized by the Netflix Prize
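The Rolling Stones / the Who reasoning above can be illustrated with a simple co-occurrence count. This is only a minimal sketch on made-up listening histories; the `recommend` helper and the user names are hypothetical and do not describe Deezer's system.

```python
from collections import Counter

# Toy listening histories (made-up data for illustration).
histories = {
    "user_a": {"The Rolling Stones", "The Who", "The Kinks"},
    "user_b": {"The Rolling Stones", "The Who"},
    "user_c": {"The Rolling Stones", "Stromae"},
    "user_x": {"The Rolling Stones"},
}


def recommend(user, histories, top_n=1):
    """Suggest artists that co-occur most often with the user's artists."""
    seen = histories[user]
    counts = Counter()
    for other, artists in histories.items():
        if other == user or not (seen & artists):
            continue  # skip the user and users with nothing in common
        counts.update(artists - seen)  # count only artists new to the user
    return [artist for artist, _ in counts.most_common(top_n)]


suggestions = recommend("user_x", histories)
```

Here two of the three users who share the Rolling Stones with user X also listen to the Who, so the Who comes out first.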
Collaborative Filtering
Compute similarity between users, between items, or both
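As a rough illustration of the two options, cosine similarity can be computed on the rows of a play-count matrix (user-user) or on its columns (item-item). Cosine is one common choice and is assumed here; the slides do not say which similarity measure is used, and the play counts are made up.

```python
import math

# Rows are users, columns are artists; values are play counts (made up).
artists = ["Rolling Stones", "The Who", "Stromae"]
plays = {
    "u1": [10, 8, 0],
    "u2": [7, 9, 0],
    "u3": [0, 1, 12],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# User-user similarity compares rows.
user_sim = cosine(plays["u1"], plays["u2"])

# Item-item similarity compares columns.
columns = {
    name: [row[i] for row in plays.values()] for i, name in enumerate(artists)
}
item_sim = cosine(columns["Rolling Stones"], columns["The Who"])
item_sim_far = cosine(columns["Rolling Stones"], columns["Stromae"])
```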
Collaborative filtering: Exemplar based
Association rules
• Market basket analysis
• Apriori algorithm
• …
But:
• Scalability issues
• Hub and island issues (Stromae example)
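To make the association-rule idea concrete, the sketch below computes the support and confidence of a single rule on made-up "baskets" of artists. It is not the full Apriori algorithm (there is no candidate pruning), and the `rule` helper is a hypothetical name.

```python
from collections import Counter
from itertools import combinations

# Each "basket" is the set of artists one user listened to (made-up data).
baskets = [
    {"Rolling Stones", "The Who"},
    {"Rolling Stones", "The Who", "The Kinks"},
    {"Rolling Stones", "Stromae"},
    {"Stromae"},
]

item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)


def rule(antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    pair = tuple(sorted((antecedent, consequent)))
    support = pair_counts[pair] / len(baskets)
    confidence = pair_counts[pair] / item_counts[antecedent]
    return support, confidence


support, confidence = rule("Rolling Stones", "The Who")
```

Counting every pair grows quadratically with basket size, which gives a feel for the scalability issue listed above.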
Collaborative filtering: Model based
Matrix factorization: $A_{n \times m} = U_{n \times k} \times I_{k \times m}$
• U is a low-dimensional model on users
• I is a low-dimensional model on items
Recommended items are the missing entries of A
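A minimal sketch of model-based collaborative filtering: factor a tiny, partly observed matrix A into U and I with stochastic gradient descent, then read a missing entry from the product. The data, the latent dimension k = 2, the learning rate and the regularization are all assumptions for illustration; the slides do not say how the factorization is actually computed.

```python
import random

random.seed(0)

# Observed (user, item, value) entries of A; all other entries are missing.
observed = [
    (0, 0, 5.0), (0, 1, 4.0),
    (1, 0, 4.0), (1, 2, 1.0),
    (2, 1, 1.0), (2, 2, 5.0),
]
n, m, k = 3, 3, 2  # users, items, latent dimension

U = [[random.uniform(0.1, 0.5) for _ in range(k)] for _ in range(n)]  # n x k
I = [[random.uniform(0.1, 0.5) for _ in range(m)] for _ in range(k)]  # k x m


def predict(u, i):
    """Entry (u, i) of the product U x I."""
    return sum(U[u][f] * I[f][i] for f in range(k))


def mean_squared_error():
    return sum((a - predict(u, i)) ** 2 for u, i, a in observed) / len(observed)


error_before = mean_squared_error()
learning_rate, reg = 0.01, 0.02  # assumed hyperparameters
for _ in range(2000):
    for u, i, a in observed:
        err = a - predict(u, i)
        for f in range(k):
            u_f, i_f = U[u][f], I[f][i]
            U[u][f] += learning_rate * (err * i_f - reg * u_f)
            I[f][i] += learning_rate * (err * u_f - reg * i_f)
error_after = mean_squared_error()

# A missing entry of A (user 0, item 2) now gets a predicted score.
missing_score = predict(0, 2)
```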
Collaborative Filtering: Limitations
• Cold start problem
• Sparse user-item matrix (1% coverage)
• Only based on social behaviors
• Popularity bias ("the rich get richer")
Content-based filtering: Limitations
• Cold start problem
• Users with atypical tastes
• Lack of novelty
• Subjectivity not taken into account
Content Similarity
Clustering tracks, artists, albums…
Methods:
• Matrix factorization techniques
• Spectral clustering
• Musical feature extraction
• Louvain algorithm
• …
Cleaning
• Mislabeled data: different sources say different things about songs, artists, albums
• No universally adopted music ontology
• Subjectivity
• Outlier detection: cross-checking several sources and models
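One simple way to cross-check several sources is a majority vote on a label, flagging the sources that disagree. The sketch below assumes this approach for illustration only; the source names, the genre labels, the `min_agreement` threshold and the `cross_check` helper are hypothetical.

```python
from collections import Counter

# Hypothetical genre labels for one track, as reported by different sources.
labels_by_source = {
    "provider_a": "rock",
    "provider_b": "rock",
    "provider_c": "classical",
    "editor": "rock",
}


def cross_check(labels_by_source, min_agreement=0.5):
    """Return (consensus label or None, sources that disagree with it)."""
    counts = Counter(labels_by_source.values())
    label, votes = counts.most_common(1)[0]
    if votes / len(labels_by_source) <= min_agreement:
        return None, sorted(labels_by_source)  # no clear consensus
    outliers = sorted(
        source for source, value in labels_by_source.items() if value != label
    )
    return label, outliers


consensus, outliers = cross_check(labels_by_source)
```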
Algorithms A/B Testing
Deezer users are split between Algo A and Algo B
Observe results:
• Daily Active Users
• Streams per user
• Satisfaction
• …
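An A/B test like the one above needs a stable split of users and a way to compare a metric between the two groups. The sketch below assumes a hash-based 50/50 assignment and a two-proportion z-test on a daily-activity rate; the counts are made up, and both helper functions are hypothetical rather than taken from Deezer's tooling.

```python
import hashlib
import math


def assign_variant(user_id):
    """Deterministically assign a user to Algo A or Algo B (50/50 split)."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"


def two_proportion_z(successes_a, total_a, successes_b, total_b):
    """z statistic for the difference between two proportions (B minus A)."""
    p_a, p_b = successes_a / total_a, successes_b / total_b
    pooled = (successes_a + successes_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    return (p_b - p_a) / std_err


# Made-up numbers: users active on the measured day, per variant.
z = two_proportion_z(
    successes_a=4000, total_a=10000, successes_b=4200, total_b=10000
)
significant = abs(z) > 1.96  # two-sided test at the 5% level
```

Hashing the user ID keeps each user in the same variant across sessions, so the observed metrics are not blurred by users switching algorithms.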
Algorithms A/B Testing: Example
Test: Are new users (with no profile data) more likely to be satisfied with chart items or with new items?