more like this: machine learning approaches to music similarity
DESCRIPTION
The slides from my dissertation talk. Thesis available at http://cseweb.ucsd.edu/~bmcfee/papers/bmcfee_dissertation.pdf
TRANSCRIPT
More Like This: Machine Learning Approaches to Music Similarity
Brian McFee
Computer Science & Engineering
University of California, San Diego
Music discovery in days of yore...
Music discovery 2.0: the present
• ~20 million songs available
• Discovery is still largely human-powered
A Google for music?
• Standard text search can work with meta-data
• Can we predict meta-data from audio?
 - [Turnbull, 2008], [Barrington, 2011]
Query by example
• Natural, user-friendly alternative to text search
This talk
• Learning algorithms for QBE, geared toward music discovery
• We'll look at two consumption models:
 - Passive listening (playlist generation)
 - Active browsing (search & ranking)
• Evaluation derived from user behavior
Learning similarity
Defining similarity: semantics?
• Song similarity = tag similarity?
• Drawbacks:
 - Choosing and weighting a vocabulary is surprisingly difficult
 - Hard to maintain quality at scale
Defining similarity: human judgements?
• Which is more similar?
• Drawbacks: ambiguity, subjectivity, scale
[M. & Lanckriet, 2009, 2011]
Collaborative filter similarity
• Collect listening histories for (lots of!) users
• Song similarity = portion of users in common
• Collaborative filters perform well...
 - ... for tagging [Kim, Tomasik, & Turnbull, 2009]
 - ... and playlisting [Barrington, Oda, & Lanckriet, 2009]
 - ... and recommendation (Yahoo, Last.fm, iTunes...)
• Implicit feedback requires no additional effort from users
• ... but fails on unpopular items: the cold start problem!
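One way to read "song similarity = portion of users in common" is as the Jaccard index between the sets of users who listened to each song. The sketch below assumes that reading; the listening histories and the function name `cf_similarity` are made up for illustration. It also shows why an item with no listeners gets no useful similarity (the cold start problem).

```python
def cf_similarity(users_a, users_b):
    """Jaccard index: shared listeners divided by all listeners of either song."""
    users_a, users_b = set(users_a), set(users_b)
    union = users_a | users_b
    if not union:
        return 0.0  # neither song has any listeners
    return len(users_a & users_b) / len(union)

# Hypothetical listening histories: song -> set of user ids.
listeners = {
    "song_a": {"u1", "u2", "u3"},
    "song_b": {"u2", "u3", "u4"},
    "new_song": set(),  # unpopular item with no history yet
}

print(cf_similarity(listeners["song_a"], listeners["song_b"]))    # 2 shared / 4 total
print(cf_similarity(listeners["song_a"], listeners["new_song"]))  # cold start: always 0
```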
Learning from a collaborative filter [M., Barrington, & Lanckriet, 2010, 2012]
[Figure: three-step diagram (step text not recoverable), ending with: rankings in audio space = rankings in CF space]
Metric learning to rank
• The goal: ranking by (learned) distance = target rankings
• Optimize a linear transformation for ranking
[M. & Lanckriet, 2010]
Structure prediction: nearest neighbors
• Setup: database $\mathcal{X}$, rankings $y_q$
• PSD matrix $W$ transforms features
• Order by distance from $q$: $\|q - x\|_W$
• $\psi(q, y)$ encodes each (query, ranking) pair
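Ranking by a learned distance can be sketched as follows. This is a toy illustration, not the MLR implementation: it assumes the PSD matrix is factored as W = LᵀL, so the learned distance is the squared Euclidean distance after applying L. The names `L`, `database`, and `query` and all the numbers are hypothetical.

```python
def transform(L, x):
    """Apply the linear map L (a list of rows) to the feature vector x."""
    return [sum(l_ij * x_j for l_ij, x_j in zip(row, x)) for row in L]

def learned_distance(L, q, x):
    """Squared distance (q - x)^T W (q - x) with W = L^T L, which is PSD by construction."""
    diff = [qi - xi for qi, xi in zip(q, x)]
    return sum(v * v for v in transform(L, diff))

def rank_database(L, q, database):
    """Return database keys ordered by increasing learned distance from q."""
    return sorted(database, key=lambda name: learned_distance(L, q, database[name]))

# Toy data: the second feature is treated as noise and down-weighted by L.
database = {"a": [1.0, 5.0], "b": [3.0, 0.0], "c": [0.5, -4.0]}
query = [0.0, 0.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
learned = [[1.0, 0.0], [0.0, 0.1]]

print(rank_database(identity, query, database))  # plain Euclidean order
print(rank_database(learned, query, database))   # order after the transformation
```

Changing L changes the ranking without touching the features themselves, which is what makes the transformation something a learning algorithm can optimize against target rankings.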
Metric learning to rank (MLR)
• Score for target ranking > score for any other ranking + prediction error
• Supported losses Δ: AUC, KNN, MAP, MRR, NDCG, Prec@k
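Written out, the constraint on this slide has the usual structural-SVM margin form. The following is a sketch under assumed notation, since the symbols do not survive in the transcript: $W \succeq 0$ is the learned metric, $\psi(q, y)$ the encoding of a (query, ranking) pair, $y_q$ the target ranking for query $q$, $\Delta$ the ranking loss, $\xi_q$ a slack variable, and $C$ a trade-off parameter. The trace regularizer is also an assumption here.

```latex
\begin{aligned}
\min_{W \succeq 0,\ \xi \ge 0} \quad
  & \operatorname{tr}(W) + \frac{C}{n} \sum_{q} \xi_q \\
\text{s.t.} \quad
  & \langle W, \psi(q, y_q) \rangle \;\ge\;
    \langle W, \psi(q, y) \rangle + \Delta(y_q, y) - \xi_q
  \qquad \forall q,\ \forall y \ne y_q
\end{aligned}
```

In a 1-slack variant, the per-query slacks $\xi_q$ are replaced by a single shared slack and the constraints are averaged over queries.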
MLR solver
• Cutting-plane algorithm based on 1-slack Structural SVM [Joachims, et al. 2009]
• Repeat until convergence:
 - Constraint generation (DP)
 - Semi-definite programming (sequence of QPs)
• Multiple kernel extensions: [Galleguillos, M., Belongie, & Lanckriet 2011]
Audio pipeline
Audio signal
1. Feature extraction: bag of ΔMFCCs
2. Vector quantization: codeword histogram
3. Probability product kernel (PPK)
• PPK (features) + CF similarity (supervision) → MLR
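The vector quantization and kernel steps of the pipeline can be sketched in a few lines. This is a minimal illustration under stated assumptions: the frames and the codebook are hypothetical 2-D toy data (real frames would be ΔMFCC vectors), and the probability product kernel is taken with exponent 1/2, which reduces to the Bhattacharyya coefficient between histograms. The talk does not state which exponent was used.

```python
import math

def quantize(frames, codebook):
    """Map each frame to its nearest codeword and return a normalized histogram."""
    counts = [0] * len(codebook)
    for frame in frames:
        dists = [sum((f - c) ** 2 for f, c in zip(frame, word)) for word in codebook]
        counts[dists.index(min(dists))] += 1
    total = sum(counts)
    return [c / total for c in counts]

def ppk(hist_p, hist_q):
    """Probability product kernel with rho = 1/2 (assumed): sum of sqrt(p_i * q_i)."""
    return sum(math.sqrt(p * q) for p, q in zip(hist_p, hist_q))

# Hypothetical 2-D "frames" and a 3-word codebook.
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
song_a = [[0.1, 0.0], [0.9, 1.1], [1.2, 0.8], [0.0, 0.2]]
song_b = [[4.8, 5.1], [5.2, 4.9], [1.0, 0.9], [5.0, 5.0]]

hist_a = quantize(song_a, codebook)
hist_b = quantize(song_b, codebook)
print(hist_a, hist_b)
print(ppk(hist_a, hist_a), ppk(hist_a, hist_b))
```

With this choice of kernel, a histogram compared with itself scores 1, and two songs that share no codewords score 0.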
Evaluation: CAL10K
• Last.fm collaborative filter [Celma, 2008]
 - 360K users, 186K artists
• CAL10K songs [Tingle, Turnbull, & Kim, 2010]
 - 5.4K songs, 2K artists (after CF matching)
• Evaluation:
 - Split artists into train/val/test
 - Target rankings: top-10 most similar train artists
Evaluation: comparison
• Gaussian mixture models + KL divergence
 - 8-component, diagonal-covariance GMM per song
• Auto-tags: predict 149 semantic tags from audio [Turnbull, 2008]
• [Our method] VQ+MLR: 1024 codewords
• Expert tags: 1053 tags from Pandora [Tingle, Turnbull, & Kim, 2010]
Similarity learning: results
[Figure: AUC (axis from 0.65 to 0.95) for GMM (KL), Auto-tags, Auto-tags + MLR, Audio VQ, Audio VQ + MLR, Expert tags (cos), and Expert tags + MLR]
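The results are reported as AUC. For a single query, the AUC of a ranking can be computed as the fraction of (relevant, irrelevant) pairs that are ordered correctly. The sketch below assumes binary relevance; the function name `ranking_auc` and the toy ranking are hypothetical.

```python
def ranking_auc(ranked_items, relevant):
    """Fraction of (relevant, irrelevant) pairs in which the relevant item is ranked first."""
    relevant = set(relevant)
    n_rel = sum(1 for item in ranked_items if item in relevant)
    n_irr = len(ranked_items) - n_rel
    if n_rel == 0 or n_irr == 0:
        raise ValueError("AUC needs at least one relevant and one irrelevant item")
    correct_pairs = 0
    irrelevant_seen = 0
    for item in ranked_items:
        if item in relevant:
            # Every irrelevant item not yet seen is ranked below this relevant one.
            correct_pairs += n_irr - irrelevant_seen
        else:
            irrelevant_seen += 1
    return correct_pairs / (n_rel * n_irr)

ranking = ["a", "x", "b", "y", "z"]       # best match first
print(ranking_auc(ranking, {"a", "b"}))   # 5 of 6 pairs are ordered correctly
```

A score of 0.5 corresponds to a random ordering and 1.0 to a ranking that puts every relevant item first.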
Example playlists
The Ramones - Go Mental
• Def Leppard - Promises
• The Buzzcocks - Harmony In My Head
• Los Lonely Boys - Roses
• Wolfmother - Colossal
• Judas Priest - Diamonds and Rust (live)
MLR:
• The Buzzcocks - Harmony In My Head
• Mötley Crüe - Same Ol' Situation
• The Offspring - Gotta Get Away
• The Misfits - Skulls
• AC/DC - Who Made Who (live)
Example playlists
Fats Waller - Winter Weather
• Dizzy Gillespie - She's Funny That Way
• Enrique Morente - Solea
• Chet Atkins - In the Mood
• Rachmaninov - Piano Concerto #4
• Eluvium - Radio Ballet

• Chet Atkins - In the Mood
• Charlie Parker - What Is This Thing Called Love?
• Bud Powell - Oblivion
• Bob Wills & His Texas Playboys - Lyla Lou
• Bob Wills & His Texas Playboys - Sittin' On Top of the World
Scaling up: fast retrieval
• Audio similarity search for a million songs?
• Idea: Index data with spatial trees
• 100-NN search over 900K songs:
 - Brute force: 2.4s
 - 50% recall: 0.14s (17x speedup)
 - 20% recall: 0.02s (120x speedup)
[M. & Lanckriet, 2011]
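The speed/recall trade-off of a spatial tree can be illustrated with a small example. The talk does not say which tree was used, so this sketch assumes a basic KD-tree with a search that descends to a single leaf and never backtracks. The data is synthetic and all names are hypothetical.

```python
import random

def build_tree(points, ids, leaf_size=32, depth=0):
    """Recursively split on the median of one coordinate (a basic KD-tree)."""
    if len(ids) <= leaf_size:
        return {"leaf": ids}
    axis = depth % len(points[0])
    ordered = sorted(ids, key=lambda i: points[i][axis])
    mid = len(ordered) // 2
    return {
        "axis": axis,
        "threshold": points[ordered[mid]][axis],
        "left": build_tree(points, ordered[:mid], leaf_size, depth + 1),
        "right": build_tree(points, ordered[mid:], leaf_size, depth + 1),
    }

def candidates(tree, query):
    """Descend to a single leaf and return its point ids (no backtracking)."""
    while "leaf" not in tree:
        side = "left" if query[tree["axis"]] < tree["threshold"] else "right"
        tree = tree[side]
    return tree["leaf"]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn(points, ids, query, k):
    """Exact k nearest neighbors of query among the given ids."""
    return sorted(ids, key=lambda i: sq_dist(points[i], query))[:k]

random.seed(0)
points = [[random.random() for _ in range(4)] for _ in range(2000)]
all_ids = list(range(len(points)))
tree = build_tree(points, all_ids, leaf_size=200)

query = [random.random() for _ in range(4)]
exact = knn(points, all_ids, query, k=10)       # brute force over everything
shortlist = candidates(tree, query)             # one leaf only
approx = knn(points, shortlist, query, k=10)    # brute force over the shortlist
recall = len(set(exact) & set(approx)) / len(exact)
print(len(shortlist), recall)
```

The search only scores the points in one leaf, so it is faster than brute force but may miss true neighbors that fall on the other side of a split. Smaller leaves mean more speed and less recall.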
Similarity learning: summary
• Collaborative filters provide user-centric music similarity
• CF similarity can be approximated by audio features
• Audio search can be done quickly at large scale
Playlist generation
• Goal: generate a "good" song sequence
 - Music auto-pilot (given context)
• Many existing algorithms, but no standard evaluation
• What makes one algorithm better than another?
Playlist evaluation 1: Human survey
• Idea: generate playlists, ask for opinions
• Impractical at large scale:
 - Huge search space
 - User taste, expertise can be problematic
 - Slow, expensive
• Does not facilitate rapid evaluation and optimization
Playlist evaluation 2: Information retrieval
• Idea:
 - Define "good" and "bad" playlists
 - Predict the next song, measure accuracy
• But what makes a bad playlist?
• Do users agree on good/bad?
A generative approach
• Playlist algorithm = distribution over playlists
• Don't evaluate synthetic playlists
• Do evaluate the likelihood of generating real playlists
[M. & Lanckriet, 2011b]
The playlist collection: AOTM-2011
• Art of the Mix
 - 13 years of playlists
 - ~210K playlist segments
 - ~100K songs from MSD
• Top 25 playlist categories:
 - Genre: Punk, Hip-hop, Reggae...
 - Context: Road trip, Break-up, Sleep...
 - Other: Mixed genre, Alternating DJ...
A simple playlist model
1. Start with a set of songs
2. Select a subset (e.g., jazz songs)
3. Select a song
4. Select a new subset
5. Select a new song
6. Repeat...
Connecting the dots...
• Random walk on a hypergraph
 - Vertices = songs
 - Edges = subsets
• Edges derived from:
 - Audio clusters, tags, lyrics, era, popularity, CF
 - or combinations/intersections
• Goal: optimize edge weights from example playlists
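A random walk on a weighted hypergraph can be sketched as below. The assumptions are: the probability of moving to a song is proportional to the total weight of the edges that contain both the current and the next song, immediate repeats are excluded, and a "uniform" edge covering all songs keeps every transition possible. The edges, weights, and song ids are toy values, not learned ones.

```python
import math
import random

def transition_probs(current, edges, weights, songs):
    """P(next | current), proportional to the weight of edges containing both songs."""
    scores = {}
    for song in songs:
        if song == current:
            continue  # assumption: no immediate repeats
        scores[song] = sum(w for name, w in weights.items()
                           if current in edges[name] and song in edges[name])
    total = sum(scores.values())  # positive as long as a uniform edge is present
    return {song: s / total for song, s in scores.items() if s > 0}

def playlist_log_likelihood(playlist, edges, weights, songs):
    """Sum of log transition probabilities (the first song is taken as given)."""
    ll = 0.0
    for current, nxt in zip(playlist, playlist[1:]):
        ll += math.log(transition_probs(current, edges, weights, songs)[nxt])
    return ll

def sample_playlist(start, length, edges, weights, songs, rng):
    """Generate a playlist by repeatedly sampling the next song."""
    playlist = [start]
    while len(playlist) < length:
        probs = transition_probs(playlist[-1], edges, weights, songs)
        names = list(probs)
        playlist.append(rng.choices(names, weights=[probs[n] for n in names])[0])
    return playlist

songs = ["s1", "s2", "s3", "s4"]
edges = {
    "jazz": {"s1", "s2"},
    "1960s": {"s2", "s3"},
    "uniform": set(songs),  # contains every song
}
weights = {"jazz": 3.0, "1960s": 1.0, "uniform": 1.0}

print(transition_probs("s1", edges, weights, songs))
print(playlist_log_likelihood(["s1", "s2", "s3"], edges, weights, songs))
print(sample_playlist("s1", 5, edges, weights, songs, random.Random(0)))
```

Raising the weight of an edge makes transitions within that subset more likely, which is the quantity the learning step adjusts from example playlists.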
Playlist model
[Equation: objective over the edge weights, combining an exponential prior with the likelihood of the transitions in the example playlists]
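The labels on this slide (exp. prior, playlists, transitions, edge weights) suggest a MAP objective of roughly the following shape. This is a sketch under assumed notation, not a transcription of the slide: $w$ is the vector of edge weights, $\lambda$ the rate of the exponential prior, $\mathcal{S}$ the set of training playlists, and $x_t$ the $t$-th song of playlist $s$. The term for the first song of each playlist is omitted.

```latex
\hat{w} = \arg\max_{w \ge 0} \;
  \underbrace{-\lambda \sum_{e} w_e}_{\text{exp. prior}}
  \;+\; \underbrace{\sum_{s \in \mathcal{S}}}_{\text{playlists}} \;
  \underbrace{\sum_{t} \log P(x_{t+1} \mid x_t; w)}_{\text{transitions}},
\qquad
P(x_{t+1} \mid x_t; w) \;\propto\; \sum_{e} w_e \, [x_t \in e] \, [x_{t+1} \in e]
```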
Playlist generation: evaluation
• Setup:
 - Split playlist collection into train/test
 - Learn edge weights on training playlists
 - Evaluate average likelihood of test playlists
• Train per category, or all together
• Compare against uniform shuffle baseline
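The evaluation protocol can be sketched end to end with a stand-in model. The assumptions here: a smoothed bigram model replaces the hypergraph model, the baseline picks the next song uniformly at random, and the gain is measured as the relative improvement in average log-likelihood. The exact definition of the gain behind the reported percentages is not given in the slides, and the playlists are toy data.

```python
import math
from collections import Counter

def train_bigram(playlists, songs, smoothing=1.0):
    """Stand-in model: smoothed transition counts from the training playlists."""
    counts = Counter()
    for playlist in playlists:
        counts.update(zip(playlist, playlist[1:]))
    def prob(current, nxt):
        row_total = sum(counts[(current, s)] for s in songs)
        return (counts[(current, nxt)] + smoothing) / (row_total + smoothing * len(songs))
    return prob

def uniform_prob(songs):
    """Baseline: every song is equally likely to come next."""
    return lambda current, nxt: 1.0 / len(songs)

def average_log_likelihood(playlists, prob):
    """Mean log-probability per transition over a playlist collection."""
    logs = [math.log(prob(a, b)) for p in playlists for a, b in zip(p, p[1:])]
    return sum(logs) / len(logs)

songs = ["s1", "s2", "s3", "s4"]
train = [["s1", "s2", "s3"], ["s1", "s2", "s4"], ["s2", "s3", "s4"]]
test = [["s1", "s2", "s3"], ["s2", "s3", "s4"]]

model_ll = average_log_likelihood(test, train_bigram(train, songs))
base_ll = average_log_likelihood(test, uniform_prob(songs))
gain = (model_ll - base_ll) / abs(base_ll)  # assumed definition of "gain"
print(model_ll, base_ll, gain)
```

No synthetic playlists are generated or judged. The model is scored only on how likely it finds playlists that people actually made.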
Random walk results
[Figure: log-likelihood gain over random shuffle (axis from 0% to 25%), global model vs. category-specific models, for ALL and for each category: Mixed, Theme, Rock-pop, Alternating DJ, Indie, Single artist, Romantic, Road trip, Punk, Depression, Break up, Narrative, Hip-hop, Sleep, Electronic, Dance-house, R&B, Country, Cover songs, Hardcore, Rock, Jazz, Folk, Reggae, Blues]
Stationary model results
[Figure: log-likelihood gain over random shuffle (axis from -15% to 20%), global model vs. category-specific models, for ALL and for each category: Mixed, Theme, Rock-pop, Alternating DJ, Indie, Single artist, Romantic, Road trip, Punk, Depression, Break up, Narrative, Hip-hop, Sleep, Electronic, Dance-house, R&B, Country, Cover songs, Hardcore, Rock, Jazz, Folk, Reggae, Blues]
Example playlists
Rhythm & Blues
• Edges: 70s & soul; Audio #14 & funk; DECADE 1965 & soul
• Songs: Lyn Collins - Think; Isaac Hayes - No Name Bar; Michael Jackson - My Girl
Electronic music
• Edges: Audio #11 & downtempo; DECADE 1990 & trip-hop; Audio #11 & electronica
• Songs: Everything But The Girl - Blame; Massive Attack - Spying Glass; Björk - Hunter
Playlist generation summary
• Generative approach simplifies evaluation
• AOTM-2011 collection facilitates learning and evaluation
• Robust, efficient and transparent feature integration
The future
Directions for future work
• Audio features: coding, dynamics and rhythm
• Playlist models: mixtures, long-range interactions
• UI models: interactive, context-aware, diversity
Personalized recommendation
• The Million Song Dataset Challenge
• Listening histories for 1.1M users, 380K songs
• Task: personalized song recommendation
[M., Bertin-Mahieux, Ellis, & Lanckriet, 2012]
Conclusion
• MLR can optimize distance metrics for ranking, QBE retrieval
• Audio similarity can approximate a collaborative filter
• Generative playlist model integrates data, models dynamics
• User-centric evaluation makes it all possible
Thanks!
Metric partial order feature
• Score is large when distances match ranking
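For reference, a partial-order feature of this kind is commonly written as follows. This is a sketch under assumed notation (the slide's own formula does not survive in the transcript): $\mathcal{X}_q^+$ and $\mathcal{X}_q^-$ are the relevant and irrelevant items for query $q$, and $y_{ij}$ records the order in which ranking $y$ places items $i$ and $j$.

```latex
\psi(q, y) = \sum_{i \in \mathcal{X}_q^+} \sum_{j \in \mathcal{X}_q^-}
  y_{ij} \, \frac{\phi(q, x_i) - \phi(q, x_j)}{|\mathcal{X}_q^+| \cdot |\mathcal{X}_q^-|},
\qquad
\phi(q, x) = -(q - x)(q - x)^{\mathsf{T}},
\qquad
y_{ij} = \begin{cases} +1 & \text{if } y \text{ ranks } i \text{ before } j \\ -1 & \text{otherwise} \end{cases}
```

Since $\langle W, \phi(q, x) \rangle = -\|q - x\|_W^2$, the score $\langle W, \psi(q, y) \rangle$ grows whenever an item that $y$ ranks earlier is also closer to $q$ under $W$.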
Playlist weights: 6390 edges
[Figure: learned edge weights by feature type (Audio, CF, Era, Familiarity, Lyrics, Tags, Uniform) for ALL and for each playlist category]
• Audio & CF: k-means (16/64/256)
• Era: year, decade, decade+5
• Familiarity: high/med/low
• Lyrics: LDA (k=32, top-1/3/5)
• Tags: Last.fm top-10
• Conjunctions
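Building edges from metadata and taking conjunctions can be sketched as below. This is an illustration only: the songs, years, and tags are made up, only the decade and tag features are shown, and conjunctions are assumed to be pairwise intersections of edges from two feature types.

```python
def era_edges(years):
    """Group songs by decade: one hyperedge per decade label."""
    edges = {}
    for song, year in years.items():
        label = "DECADE %d" % (year // 10 * 10)
        edges.setdefault(label, set()).add(song)
    return edges

def tag_edges(tags):
    """One hyperedge per tag, containing every song that carries it."""
    edges = {}
    for song, song_tags in tags.items():
        for tag in song_tags:
            edges.setdefault(tag, set()).add(song)
    return edges

def conjunctions(edges_a, edges_b):
    """Pairwise intersections of edges from two feature types, dropping empty ones."""
    out = {}
    for name_a, set_a in edges_a.items():
        for name_b, set_b in edges_b.items():
            both = set_a & set_b
            if both:
                out["%s & %s" % (name_a, name_b)] = both
    return out

# Hypothetical metadata.
years = {"s1": 1967, "s2": 1972, "s3": 1968, "s4": 1994}
tags = {"s1": ["soul"], "s2": ["soul", "funk"], "s3": ["funk"], "s4": ["trip-hop"]}

era = era_edges(years)
tag = tag_edges(tags)
for name, members in sorted(conjunctions(era, tag).items()):
    print(name, sorted(members))
```

Each conjunction is a narrower subset than either of its parents, which is how labels of the form "decade & tag" arise.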