mining large-scale music data sets - columbia universitydpwe/talks/ita-2012-02.pdftrkuypw128f92e1fc0...

15
LabROSA Overview - Dan Ellis 2012-02-09 /15 1 1. Music Audio Representations 2. The Million Song Dataset 3. Finding Similar Items (Cover Songs) 4. Current Work Mining Large-Scale Music Data Sets Dan Ellis & Thierry Bertin-Mahieux Laboratory for Recognition and Organization of Speech and Audio Dept. Electrical Eng., Columbia Univ., NY USA {dpwe,thierry}@ee.columbia.edu http://labrosa.ee.columbia.edu/

Upload: others

Post on 17-Mar-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

LabROSA Overview - Dan Ellis 2012-02-09 /151

1. Music Audio Representations2. The Million Song Dataset3. Finding Similar Items (Cover Songs)4. Current Work

Mining Large-ScaleMusic Data Sets

Dan Ellis & Thierry Bertin-MahieuxLaboratory for Recognition and Organization of Speech and Audio

Dept. Electrical Eng., Columbia Univ., NY USA

{dpwe,thierry}@ee.columbia.edu http://labrosa.ee.columbia.edu/

LabROSA Overview - Dan Ellis 2012-02-09 /15

The Problem

2

One Million SongsQuery track

“Similar” tracks

Time

Frequency

0 1 2 3 4 5 6 7 8 9 100

1000

2000

3000

4000

Time

Frequency

0 1 2 3 4 5 6 7 8 9 100

1000

2000

3000

4000

Time

Frequency

0 1 2 3 4 5 6 7 8 9 100

1000

2000

3000

4000

Time

Frequency

0 1 2 3 4 5 6 7 8 9 100

1000

2000

3000

4000

LabROSA Overview - Dan Ellis 2012-02-09 /15

1. Music Audio Representations

• We need a description of the audio that contains “suitable” detail

• Speech Recognition uses Mel-Frequency Cepstral Coefficients (MFCCs)

3

freq

/ Hz

Coef

ficien

tfre

q / H

z

Let It Be (LIB-1) - log-freq specgram

32913965915

MFCCs

2468

1012

time / sec

Noise excited MFCC resynthesis (LIB-2)

0 5 10 15 20 25

32913965915

LabROSA Overview - Dan Ellis 2012-02-09 /15

Chroma Features

• We’d like to preserve the notesat least within one octave

4

Let It Be - log-freq specgram (LIB-1)

Chroma features

CDEGAB

Shepard tone resynthesis of chroma (LIB-3)

MFCC-filtered shepard tones (LIB-4)

freq

/ Hz

30014006000

freq

/ Hz

chro

ma b

in

30014006000

freq

/ Hz

30014006000

time / sec0 5 10 15 20 25

War

ren

et a

l. 200

3

LabROSA Overview - Dan Ellis 2012-02-09 /15

Beat-Synchronous Chroma

• Perform Beat Tracking.. and record one chroma vector per beatcompact representation of harmonies

5

Let It Be - log-freq specgram (LIB-1)

Onset envelope + beat times

Beat-synchronous chroma

Beat-synchronous chroma + Shepard resynthesis (LIB-6)

freq

/ Hz

30014006000

freq

/ Hz

30014006000

CDEGAB

chro

ma

bin

time / sec0 5 10 15 20 25

LabROSA Overview - Dan Ellis 2012-02-09 /15

2. Million Song Dataset (MSD)

• Commercial-scale dataset available to MIR researchers1M pop songs250 GB of features(6 years of listening)

6

• Many facets:Features,Lyrics, Tags, Covers, Listeners ...

http://labrosa.ee.columbia.edu/millionsong

Thierry Bertin-Mahieux

LabROSA Overview - Dan Ellis 2012-02-09 /15

MSD Audio Features

• Use Echo Nest “Analyze” features

segment audio into variable-length “events”

represent by 12 chroma + 12 “timbre”

supports a crude resynthesis:

7

240

761

2416

CDE

GAB

time / sec

freq

/ Hz

freq

/ Hz

0 2 4 6 8 10 12 14

240

761

2416

TRKUYPW128F92E1FC0 - Tori Amos - Smells Like Teen Spirit

Original

Resynth

EN Features

LabROSA Overview - Dan Ellis 2012-02-09 /15

MSD Metadata

8

artist: 'Tori Amos' release: 'LIVE AT MONTREUX' title: 'Smells Like Teen Spirit' id: 'TRKUYPW128F92E1FC0' key: 5 mode: 0 loudness: -16.6780 tempo: 87.2330time_signature: 4 duration: 216.4502 sample_rate: 22050 audio_md5: '8' 7digitalid: 5764727 familiarity: 0.8500 year: 1992

%5489,4468, Smells Like Teen SpiritTRTUOVJ128E078EE10 NirvanaTRFZJOZ128F4263BE3 Weird Al YankovicTRJHCKN12903CDD274 Pleasure BeachTRELTOJ128F42748B7 The Flying PicketsTRJKBXL128F92F994D Rhythms Del Mundo feat. ShanadeTRIHLAW128F429BBF8 The Bad PlusTRKUYPW128F92E1FC0 Tori Amos

12 hello 11 i 10 a 9 and 7 it 6 are 6 we 6 now

6 here 6 us 6 entertain 4 the 4 feel 4 yeah 3 to 3 my

3 is 3 with 3 oh 3 out 3 an 3 light 3 less 3 danger

100.0 – cover57.0 – covers43.0 – female vocalists42.0 – piano34.0 – alternative14.0 – singer-songwriter11.0 – acoustic8.0 – tori amos7.0 – beautiful6.0 – rock6.0 – pop6.0 – Nirvana6.0 – female vocalist6.0 – 90s5.0 – out of genre covers

5.0 – cover songs4.0 – soft rock4.0 – nirvana cover4.0 – Mellow4.0 – alternative rock3.0 – chick rock3.0 – Ballad3.0 – Awesome Covers2.0 – melancholic2.0 – k00l ch1x2.0 – indie2.0 – female vocalistist2.0 – female2.0 – cover song2.0 – american

EN Metadata

SHS Covers MxM Lyric Bag-of-Words

Last.fm Tags

LabROSA Overview - Dan Ellis 2012-02-09 /15

3. Finding Similar Items

• If it’s really the same, we can use fingerprinting (e.g. Shazam):

9

Query audio

0

500

1000

1500

2000

2500

3000

3500

4000

Match: 05!Full Circle at 0.032 sec

0.5 1 1.5 2 2.5 3 3.5 4 4.5 50

500

1000

1500

2000

2500

3000

3500

4000

time / sec

freq

/ Hz

find local “landmarks”quantize by pairsinverted index

Avery Wang ’03

LabROSA Overview - Dan Ellis 2012-02-09 /15

Dynamic Programming Alignment

• We can match the chroma representations by DP alignment

robust to timing changesquite efficientbest performer in MIREX Cover Song evaluations

10

Serra, Gomez et al. ’08

LabROSA Overview - Dan Ellis 2012-02-09 /15

Large-Scale Cover Matching

• How can we find covers in 1M songs?@ 1 sec / comparison, one search = 11.5 CPU-daysfull N2 mining = 16,000 CPU-years

• Hashing?landmarks from chroma patches (like Shazam)

represent item by distribution of contour fragmentsone stage in a filtering hierarchy...

11

Bertin-Mahieux & Ellis ’11

LabROSA Overview - Dan Ellis 2012-02-09 /15

Euclidean Cover Matching

• Reduce cover matching to nearest-neighbor search in fixed-size beat-chroma patches

• Right “distance”?principal componentslearned weightings

• Alignment/segmentation?music segmentationmultiple probesframing-insensitive representation

12

Ellis & Poliner ’08Bertin-Mahieux & Ellis ’12

Between the Bars ! Elliot Smith ! pwr .25

120 130 140 150 160 1702468

1012

Between the Bars ! Glenn Phillips ! pwr .25 ! !17 beats ! trsp 2

120 130 140 150 160 1702468

1012

pointwise product (sum(12x300) = 19.34)

time / beats

chro

ma

bin

120 130 140 150 160 1702468

1012

LabROSA Overview - Dan Ellis 2012-02-09 /15

4. Other Work: Pattern Mining

• Cluster beat-synchronous chroma patches

13

#32273 - 13 instances #51917 - 13 instances

#65512 - 10 instances #9667 - 9 instances

#10929 - 7 instances #61202 - 6 instances

#55881 - 5 instances

5 10 15 20

#68445 - 5 instances

5 10 15 time / beats

GFDCA

GFDCA

GFDCA

GFDCA

chro

ma bi

nsa20-top10x5-cp4-4p0

LabROSA Overview - Dan Ellis 2012-02-09 /15

Other Work: MSD Challenge

• Using “Taste Profile” dataUser-Song-playcount

• Challenge: Rank 1M songs to complete half a playlistusing audio, metadata, lyrics, whatever

• Kaggle.com based contestself-service evaluation, leader board

• Results at MIREX, AdMIRe

14

b80344d063b5ccb3212f76538f3d9e43d87dca9e SOAKIMP12A8C130995 1b80344d063b5ccb3212f76538f3d9e43d87dca9e SOAPDEY12A81C210A9 1b80344d063b5ccb3212f76538f3d9e43d87dca9e SOBBMDR12A8C13253B 2 b80344d063b5ccb3212f76538f3d9e43d87dca9e SOCNMUH12A6D4F6E6D 1b80344d063b5ccb3212f76538f3d9e43d87dca9e SODACBL12A8C13C273 1b80344d063b5ccb3212f76538f3d9e43d87dca9e SODDNQT12A6D4F5F7E 5

with Brian McFee

LabROSA Overview - Dan Ellis 2012-02-09 /15

Summary• Finding Musical Similarity at Large Scale

• Beat Chroma Representation

• Million Song Dataset

• MSD Challenge

15