more like this: machine learning approaches to music similarity

More Like This: Machine Learning Approaches to Music Similarity

Brian McFee

Computer Science & Engineering, University of California, San Diego

Music discovery in days of yore...

Music discovery 2.0: the present

• ~20 million songs available

• Discovery is still largely human-powered

A Google for music?

• Standard text search can work with meta-data

• Can we predict meta-data from audio?
  - [Turnbull, 2008], [Barrington, 2011]

Query by example

• Natural, user-friendly alternative to text search

This talk

• Learning algorithms for QBE, geared toward music discovery

• We'll look at two consumption models:
  - Passive listening (playlist generation)
  - Active browsing (search & ranking)

• Evaluation derived from user behavior

Learning similarity

Defining similarity: semantics?

• Song similarity = tag similarity?

• Drawbacks:
  - Choosing, weighting vocabulary is surprisingly difficult
  - Hard to maintain quality at scale

Defining similarity: human judgements?

• Which is more similar? [M. & Lanckriet, 2009, 2011]

• Drawbacks: ambiguity, subjectivity, scale

Collaborative filter similarity

• Collect listening histories for (lots of!) users

• Song similarity = portion of users in common

• Collaborative filters perform well...
  - ... for tagging [Kim, Tomasik, & Turnbull, 2009]
  - ... and playlisting [Barrington, Oda, & Lanckriet, 2009]
  - ... and recommendation (Yahoo, Last.fm, iTunes...)

• Implicit feedback requires no additional effort from users

• ... but fails on unpopular items: the cold start problem!
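
As a rough illustration of "song similarity = portion of users in common", the sketch below scores two songs by the Jaccard index of their listener sets. The Jaccard normalization, the function names, and the toy listening histories are assumptions made for this example; the slides do not specify the exact formula. The song with no listeners shows the cold start problem: its similarity to everything is zero.

```python
def jaccard(users_a, users_b):
    """Fraction of users shared by two songs (0.0 when neither has listeners)."""
    union = users_a | users_b
    if not union:
        return 0.0
    return len(users_a & users_b) / len(union)


def rank_by_cf(query, listeners):
    """Order all other songs by decreasing CF similarity to the query song."""
    others = [s for s in listeners if s != query]
    return sorted(others,
                  key=lambda s: jaccard(listeners[query], listeners[s]),
                  reverse=True)


# Hypothetical listening histories: song -> set of user ids.
listeners = {
    "song_a": {"u1", "u2", "u3"},
    "song_b": {"u2", "u3", "u4"},
    "song_c": {"u5"},
    "new_song": set(),  # cold start: nobody has played it yet
}

print(jaccard(listeners["song_a"], listeners["song_b"]))
print(rank_by_cf("song_a", listeners))
```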

Learning from a collaborative filter [M., Barrington, & Lanckriet, 2010, 2012]

[Figure: numbered steps 1 to 3 (step text not recovered); rankings in audio space = rankings in CF space]

Metric learning to rank

• The goal: ranking by (learned) distance = target rankings

• Optimize a linear transformation for ranking

[M. & Lanckriet, 2010]

Structure prediction: nearest neighbors

• Setup: a database of songs and target rankings

• A PSD matrix transforms features

• Order the database by distance from the query

• A feature map encodes each (query, ranking) pair
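
The nearest-neighbor setup above can be sketched in a few lines. This is a minimal illustration, not the MLR implementation: the function names, the factorization W = LᵀL (used here only so that W is PSD by construction), and the toy feature vectors are all assumptions.

```python
def mat_vec(L, v):
    """Multiply matrix L (list of rows) by vector v."""
    return [sum(l_ij * v_j for l_ij, v_j in zip(row, v)) for row in L]


def learned_distance(L, q, x):
    """Squared distance (q - x)^T W (q - x), with W = L^T L (PSD by construction)."""
    diff = [qi - xi for qi, xi in zip(q, x)]
    z = mat_vec(L, diff)
    return sum(zi * zi for zi in z)


def rank_database(L, q, database):
    """Return database keys ordered by increasing learned distance from q."""
    return sorted(database, key=lambda name: learned_distance(L, q, database[name]))


# Hypothetical 2-d features for two songs and a query.
database = {"a": [0.1, 2.0], "b": [1.0, 0.1]}
query = [0.0, 0.0]

identity = [[1.0, 0.0], [0.0, 1.0]]    # plain Euclidean distance
reweighted = [[3.0, 0.0], [0.0, 0.1]]  # stresses feature 0, downplays feature 1

print(rank_database(identity, query, database))
print(rank_database(reweighted, query, database))
```

Changing the transformation changes the ranking, which is what makes it possible to optimize the transformation against target rankings.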

Metric learning to rank (MLR)

• Constraint: score for target ranking > score for any other ranking + prediction error

• Supported losses Δ: AUC, KNN, MAP, MRR, NDCG, Prec@k
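
To make a few of the ranking measures listed above concrete, here is a small sketch for a single query. It assumes binary relevance labels listed in ranked order (1 = relevant); the function names are illustrative, and treating the loss Δ as one minus the score is an assumption, not something stated on the slide.

```python
def precision_at_k(rel, k):
    """Fraction of the top-k results that are relevant."""
    return sum(rel[:k]) / k


def reciprocal_rank(rel):
    """1 / rank of the first relevant result (0.0 if there is none)."""
    for i, r in enumerate(rel, start=1):
        if r:
            return 1.0 / i
    return 0.0


def average_precision(rel):
    """Mean of precision values taken at the rank of each relevant result."""
    hits, total = 0, 0.0
    for i, r in enumerate(rel, start=1):
        if r:
            hits += 1
            total += hits / i
    return total / hits if hits else 0.0


def auc(rel):
    """Fraction of (relevant, irrelevant) pairs placed in the correct order."""
    n_pos = sum(rel)
    n_neg = len(rel) - n_pos
    if n_pos == 0 or n_neg == 0:
        return 0.0
    correct, neg_below = 0, 0
    for r in reversed(rel):      # walk from the bottom of the ranking upward
        if r:
            correct += neg_below  # irrelevant items ranked below this one
        else:
            neg_below += 1
    return correct / (n_pos * n_neg)


ranking = [1, 0, 1, 0, 0]  # hypothetical relevance labels in ranked order
for name, score in [("Prec@2", precision_at_k(ranking, 2)),
                    ("RR", reciprocal_rank(ranking)),
                    ("AP", average_precision(ranking)),
                    ("AUC", auc(ranking))]:
    print(name, round(score, 4), "loss:", round(1.0 - score, 4))
```

MAP and MRR are the means of AP and RR over all queries.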

MLR solver

• Cutting-plane algorithm based on 1-slack Structural SVM [Joachims, et al. 2009]

• Repeat until convergence:
  - Constraint generation (DP)
  - Semi-definite programming (sequence of QPs)

• Multiple kernel extensions: [Galleguillos, M., Belongie, & Lanckriet 2011]

Audio pipeline

1. Feature extraction: audio signal → bag of ΔMFCCs

2. Vector quantization: bag of ΔMFCCs → codeword histogram

3. Probability product kernel (PPK)

• MLR inputs: PPK (features), CF similarity (supervision)
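
Steps 2 and 3 of the audio pipeline (vector quantization and the probability product kernel) might look roughly like the sketch below. It assumes feature frames have already been extracted, uses a tiny hand-written codebook in place of a learned one, and fixes the PPK exponent at 1/2 (the Bhattacharyya coefficient); all of these choices, and the function names, are assumptions for illustration.

```python
import math


def nearest_codeword(frame, codebook):
    """Index of the codeword closest to the frame in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda k: sum((f - c) ** 2 for f, c in zip(frame, codebook[k])))


def codeword_histogram(frames, codebook):
    """Normalized histogram of codeword assignments (assumes at least one frame)."""
    counts = [0] * len(codebook)
    for frame in frames:
        counts[nearest_codeword(frame, codebook)] += 1
    return [c / len(frames) for c in counts]


def ppk(p, q):
    """Probability product kernel with exponent 1/2 between two histograms."""
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))


# Hypothetical 2-d frames and a 2-word codebook (the talk uses 1024 codewords).
codebook = [[0.0, 0.0], [1.0, 1.0]]
frames_a = [[0.1, 0.0], [0.9, 1.0], [0.0, 0.2], [1.1, 1.0]]
frames_b = [[0.0, 0.1], [0.2, 0.0]]

hist_a = codeword_histogram(frames_a, codebook)
hist_b = codeword_histogram(frames_b, codebook)
print(hist_a, hist_b, ppk(hist_a, hist_b))
```

With this exponent, a histogram compared with itself scores 1, and histograms with disjoint support score 0.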

Evaluation: CAL10K

• Last.fm collaborative filter [Celma, 2008]
  - 360K users, 186K artists

• CAL10K songs [Tingle, Turnbull, & Kim, 2010]
  - 5.4K songs, 2K artists (after CF matching)

• Evaluation:
  - Split artists into train/val/test
  - Target rankings: top-10 most similar train artists

Evaluation: comparison

• Gaussian mixture models + KL divergence
  - 8-component, diagonal-covariance GMM per song

• Auto-tags: predict 149 semantic tags from audio [Turnbull, 2008]

• [Our method] VQ+MLR: 1024 codewords

• Expert tags: 1053 tags from Pandora [Tingle, Turnbull, & Kim, 2010]

Similarity learning: results

[Figure: bar chart of AUC (axis 0.65 to 0.95) for GMM (KL), Auto-tags, Auto-tags + MLR, Audio VQ, Audio VQ + MLR, Expert tags (cos), Expert tags + MLR]

Example playlists

The Ramones - Go Mental

• Def Leppard - Promises
• The Buzzcocks - Harmony In My Head
• Los Lonely Boys - Roses
• Wolfmother - Colossal
• Judas Priest - Diamonds and Rust (live)

MLR:

• The Buzzcocks - Harmony In My Head
• Mötley Crüe - Same Ol' Situation
• The Offspring - Gotta Get Away
• The Misfits - Skulls
• AC/DC - Who Made Who (live)

Example playlists

Fats Waller - Winter Weather

• Dizzy Gillespie - She's Funny That Way
• Enrique Morente - Solea
• Chet Atkins - In the Mood
• Rachmaninov - Piano Concerto #4
• Eluvium - Radio Ballet

• Chet Atkins - In the Mood
• Charlie Parker - What Is This Thing Called Love?
• Bud Powell - Oblivion
• Bob Wills & His Texas Playboys - Lyla Lou
• Bob Wills & His Texas Playboys - Sittin' On Top of the World

Scaling up: fast retrieval

• Audio similarity search for a million songs?

• Idea: Index data with spatial trees

• 100-NN search over 900K songs:
  - Brute force: 2.4s
  - 50% recall: 0.14s (17x speedup)
  - 20% recall: 0.02s (120x speedup)

[M. & Lanckriet, 2011]
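
The recall/speed trade-off above comes from searching only part of a spatial index. The sketch below is one simple way to see it, assuming a KD-tree-style split and a "defeatist" search that descends to a single leaf without backtracking; the slides say only "spatial trees", so the tree type, the function names, and the random toy data are assumptions.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_tree(points, ids, leaf_size):
    """Recursively split on the coordinate with the largest spread."""
    if len(ids) <= leaf_size:
        return {"leaf": ids}
    dim = max(range(len(points[0])),
              key=lambda d: max(points[i][d] for i in ids) - min(points[i][d] for i in ids))
    ordered = sorted(ids, key=lambda i: points[i][dim])
    mid = len(ordered) // 2
    return {"dim": dim,
            "split": points[ordered[mid]][dim],
            "left": build_tree(points, ordered[:mid], leaf_size),
            "right": build_tree(points, ordered[mid:], leaf_size)}


def leaf_candidates(tree, q):
    """Follow one root-to-leaf path for q and return the ids stored in that leaf."""
    while "leaf" not in tree:
        tree = tree["left"] if q[tree["dim"]] < tree["split"] else tree["right"]
    return tree["leaf"]


def knn(points, candidates, q, k):
    """Exact k-NN restricted to the given candidate ids."""
    return sorted(candidates, key=lambda i: sq_dist(points[i], q))[:k]


random.seed(0)
points = [[random.random() for _ in range(4)] for _ in range(2000)]
tree = build_tree(points, list(range(len(points))), leaf_size=200)

q, k = points[0], 10
exact = knn(points, range(len(points)), q, k)        # brute force over everything
candidates = leaf_candidates(tree, q)                # small subset of the database
approx = knn(points, candidates, q, k)
recall = len(set(exact) & set(approx)) / k
print(len(candidates), "candidates searched, recall =", recall)
```

Smaller leaves mean fewer distance computations but lower recall; larger leaves move back toward brute force.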

Similarity learning: summary

• Collaborative filters provide user-centric music similarity

• CF similarity can be approximated by audio features

• Audio search can be done quickly at large scale

Playlist generation

• Goal: generate a "good" song sequence
  - Music auto-pilot (given context)

• Many existing algorithms, but no standard evaluation

• What makes one algorithm better than another?

Playlist evaluation 1: Human survey

• Idea: generate playlists, ask for opinions

• Impractical at large scale:
  - Huge search space
  - User taste, expertise can be problematic
  - Slow, expensive

• Does not facilitate rapid evaluation and optimization

Playlist evaluation 2: Information retrieval

• Idea:
  - Define "good" and "bad" playlists
  - Predict the next song, measure accuracy

• But what makes a bad playlist?

• Do users agree on good/bad?

A generative approach

• Playlist algorithm = distribution over playlists

• Don't evaluate synthetic playlists

• Do evaluate the likelihood of generating real playlists

[M. & Lanckriet, 2011b]

The playlist collection: AOTM-2011

• Art of the Mix
  - 13 years of playlists
  - ~210K playlist segments
  - ~100K songs from MSD

• Top 25 playlist categories:
  - Genre: Punk, Hip-hop, Reggae...
  - Context: Road trip, Break-up, Sleep...
  - Other: Mixed genre, Alternating DJ...

A simple playlist model

1. Start with a set of songs


2. Select a subset (e.g., jazz songs)


3. Select a song


4. Select a new subset


5. Select a new song


6. Repeat...

Connecting the dots...

• Random walk on a hypergraph
  - Vertices = songs
  - Edges = subsets

• Edges derived from:
  - Audio clusters, tags, lyrics, era, popularity, CF
  - or combinations/intersections

• Goal: optimize edge weights from example playlists
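
One way to read the random walk above: from the current song, pick an edge (subset) that contains it with probability proportional to the edge weight, then pick the next song uniformly from that edge. The sketch below computes the resulting transition distribution. The exact parameterization (uniform choice within an edge, repeats allowed), the function name, and the toy edges and weights are assumptions, not the model as published.

```python
def transition_probs(current, edges, weights):
    """P(next song | current song) for a weighted hypergraph random walk.

    Assumes strictly positive weights. Returns an empty dict when no edge
    contains the current song.
    """
    containing = [e for e in edges if current in edges[e]]
    total_w = sum(weights[e] for e in containing)
    probs = {}
    for e in containing:
        p_edge = weights[e] / total_w            # choose the edge
        for song in edges[e]:                    # then a song uniformly within it
            probs[song] = probs.get(song, 0.0) + p_edge / len(edges[e])
    return probs


# Hypothetical hypergraph: edge name -> set of songs, plus one weight per edge.
edges = {"jazz": {"a", "b"}, "1950s": {"b", "c", "d"}}
weights = {"jazz": 2.0, "1950s": 1.0}

for song, p in sorted(transition_probs("b", edges, weights).items()):
    print(song, round(p, 4))
```

Raising the weight of an edge makes transitions within that subset more likely, which is what learning the weights from example playlists adjusts.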

Playlist model

[Equation: objective over edge weights, with annotated terms: exp. prior, playlists, transitions, edge weights]

Playlist generation: evaluation

• Setup:
  - Split playlist collection into train/test
  - Learn edge weights on training playlists
  - Evaluate average likelihood of test playlists

• Train per category, or all together

• Compare against uniform shuffle baseline
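
The evaluation described above can be sketched as follows: score held-out playlists by their average log-likelihood under a model, then compare with a uniform shuffle that gives every song probability 1/N. The toy genre-based model, the relative-gain formula, and all names here are assumptions for illustration; they stand in for a learned playlist model.

```python
import math


def avg_log_likelihood(playlists, first_prob, next_prob):
    """Average per-playlist log-likelihood under a first-order model.

    Raises ValueError if the model assigns zero probability to an observed song.
    """
    total = 0.0
    for pl in playlists:
        ll = math.log(first_prob(pl[0]))
        for prev, nxt in zip(pl, pl[1:]):
            ll += math.log(next_prob(prev, nxt))
        total += ll
    return total / len(playlists)


songs = ["a", "b", "c", "d"]
genre = {"a": "jazz", "b": "jazz", "c": "punk", "d": "punk"}  # hypothetical labels


def uniform_first(song):
    return 1.0 / len(songs)


def uniform_next(prev, nxt):
    return 1.0 / len(songs)


def toy_next(prev, nxt):
    # Hypothetical model: stay within the genre with probability 0.8 (0.4 per
    # song), otherwise switch (0.1 per song); the four values sum to 1.
    return 0.4 if genre[prev] == genre[nxt] else 0.1


test_playlists = [["a", "b", "a"], ["c", "d"]]

ll_uniform = avg_log_likelihood(test_playlists, uniform_first, uniform_next)
ll_model = avg_log_likelihood(test_playlists, uniform_first, toy_next)
gain = (ll_model - ll_uniform) / abs(ll_uniform)
print("uniform:", ll_uniform, "model:", ll_model, "relative gain:", gain)
```

Because only real playlists are scored, no judgment about "bad" playlists is needed.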

Random walk results

[Figure: log-likelihood gain over random shuffle (axis 0% to 25%), global model vs. category-specific models, for each category: ALL, Mixed, Theme, Rock-pop, Alternating DJ, Indie, Single artist, Romantic, Road trip, Punk, Depression, Break up, Narrative, Hip-hop, Sleep, Electronic, Dance-house, R&B, Country, Cover songs, Hardcore, Rock, Jazz, Folk, Reggae, Blues]

Stationary model results

[Figure: log-likelihood gain over random shuffle (axis -15% to 20%), global model vs. category-specific models, for each category: ALL, Mixed, Theme, Rock-pop, Alternating DJ, Indie, Single artist, Romantic, Road trip, Punk, Depression, Break up, Narrative, Hip-hop, Sleep, Electronic, Dance-house, R&B, Country, Cover songs, Reggae, Blues]

Example playlists

Rhythm & Blues
• Edges: 70s & soul; Audio #14 & funk; DECADE 1965 & soul
• Songs: Lyn Collins - Think; Isaac Hayes - No Name Bar; Michael Jackson - My Girl

Electronic music
• Edges: Audio #11 & downtempo; DECADE 1990 & trip-hop; Audio #11 & electronica
• Songs: Everything But The Girl - Blame; Massive Attack - Spying Glass; Björk - Hunter

Playlist generation summary

• Generative approach simplifies evaluation

• AOTM-2011 collection facilitates learning and evaluation

• Robust, efficient and transparent feature integration

The future

Directions for future work

• Audio features: coding, dynamics and rhythm

• Playlist models: mixtures, long-range interactions

• UI models: interactive, context-aware, diversity

Personalized recommendation

• The Million Song Dataset Challenge

• Listening histories for 1.1M users, 380K songs

• Task: personalized song recommendation

[M., Bertin-Mahieux, Ellis, & Lanckriet, 2012]

Conclusion

• MLR can optimize distance metrics for ranking, QBE retrieval

• Audio similarity can approximate a collaborative filter

• Generative playlist model integrates data, models dynamics

• User-centric evaluation makes it all possible

Thanks!

Metric partial order feature

• Score is large when distances match ranking

Playlist weights: 6390 edges

[Figure: playlist edge weights grouped by feature type (Audio, CF, Era, Familiarity, Lyrics, Tags, Uniform) for each category: ALL, Mixed, Theme, Rock-pop, Alternating DJ, Indie, Single artist, Romantic, Road trip, Punk, Depression, Break up, Narrative, Hip-hop, Sleep, Electronic music, Dance-house, Rhythm and Blues, Country, Cover, Reggae, Blues]

• Audio & CF: k-means (16/64/256)
• Era: year, decade, decade+5
• Familiarity: high/med/low
• Lyrics: LDA (k=32, top-1/3/5)
• Tags: Last.fm top-10
• Conjunctions
