LabROSA Overview - Dan Ellis 2012-02-09
Mining Large-Scale Music Data Sets
Dan Ellis & Thierry Bertin-Mahieux
Laboratory for Recognition and Organization of Speech and Audio
Dept. Electrical Eng., Columbia Univ., NY USA
{dpwe,thierry}@ee.columbia.edu http://labrosa.ee.columbia.edu/
1. Music Audio Representations
2. The Million Song Dataset
3. Finding Similar Items (Cover Songs)
4. Current Work
The Problem
[Figure: a query track is compared against One Million Songs to retrieve “similar” tracks; each track is shown as a spectrogram (Time vs. Frequency, 0 to 4000).]
1. Music Audio Representations
• We need a description of the audio that contains “suitable” detail
• Speech Recognition uses Mel-Frequency Cepstral Coefficients (MFCCs)
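As a rough illustration of what MFCCs compute, here is a minimal NumPy sketch of the textbook pipeline (windowed power spectrum, triangular mel filterbank, log, DCT). The frame size, hop, filter count and coefficient count are illustrative assumptions, not the settings behind the slides.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    bin_hz = np.arange(n_fft // 2 + 1) * sr / n_fft
    fb = np.zeros((n_mels, bin_hz.size))
    for i in range(n_mels):
        lo, mid, hi = edges_hz[i:i + 3]
        rising = (bin_hz - lo) / (mid - lo)
        falling = (hi - bin_hz) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def mfcc(x, sr, n_fft=512, hop=256, n_mels=40, n_mfcc=13):
    """Return an (n_frames, n_mfcc) array; all parameter values are assumptions."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    log_mel = np.log(power @ mel_filterbank(n_mels, n_fft, sr).T + 1e-10)
    # DCT-II basis (unnormalized) decorrelates the log mel energies
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_mfcc), n + 0.5) / n_mels)
    return log_mel @ dct.T
```

Keeping only the first few coefficients retains the broad spectral envelope and discards fine pitch detail.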
[Figure: three panels against time / sec (0 to 25): Let It Be (LIB-1) - log-freq specgram (freq / Hz); MFCCs (coefficient); Noise excited MFCC resynthesis (LIB-2) (freq / Hz).]
Chroma Features
• We’d like to preserve the notes, at least within one octave
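One simple way to get such a representation is to fold spectrum bins onto the 12 pitch classes. The sketch below assumes a single magnitude spectrum as input, A4 = 440 Hz tuning, and a nearest-semitone assignment; the frequency limits are illustrative choices, and real chroma extractors are more careful.

```python
import numpy as np

def chroma_from_spectrum(mag, sr, n_fft, fmin=55.0, fmax=2000.0, a4=440.0):
    """Fold one magnitude spectrum (n_fft // 2 + 1 bins) into 12 pitch classes.

    Index 0 is C and index 9 is A. fmin, fmax and a4 are assumed values.
    """
    freqs = np.arange(mag.size) * sr / n_fft
    keep = (freqs >= fmin) & (freqs <= fmax)
    midi = 69.0 + 12.0 * np.log2(freqs[keep] / a4)   # MIDI note 69 is A4
    pitch_class = np.round(midi).astype(int) % 12    # discard the octave
    chroma = np.zeros(12)
    np.add.at(chroma, pitch_class, mag[keep])        # sum energy per class
    return chroma
```

Because the octave is discarded, A3, A4 and A5 all land in the same bin.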
[Figure: four panels against time / sec (0 to 25): Let It Be - log-freq specgram (LIB-1) (freq / Hz); Chroma features (chroma bin, C to B); Shepard tone resynthesis of chroma (LIB-3) (freq / Hz); MFCC-filtered shepard tones (LIB-4) (freq / Hz).]
Warren et al. 2003
Beat-Synchronous Chroma
• Perform Beat Tracking and record one chroma vector per beat
compact representation of harmonies
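Given beat times from some beat tracker (assumed here as an input, since the tracker itself is out of scope), the per-beat chroma can be sketched as an average over the frames falling inside each beat interval. The function name and argument layout are hypothetical.

```python
import numpy as np

def beat_sync(chroma, frame_times, beat_times):
    """Average frame-level chroma within each beat interval.

    chroma: (12, n_frames); frame_times: (n_frames,) in seconds;
    beat_times: (n_beats,) in seconds, assumed to come from a beat tracker.
    Returns a (12, n_beats - 1) array, one column per inter-beat interval.
    """
    out = np.zeros((chroma.shape[0], len(beat_times) - 1))
    for b in range(len(beat_times) - 1):
        in_beat = (frame_times >= beat_times[b]) & (frame_times < beat_times[b + 1])
        if in_beat.any():
            out[:, b] = chroma[:, in_beat].mean(axis=1)
    return out
```

The result has one column per beat regardless of tempo, which is what makes it compact and tempo-normalized.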
[Figure: four panels against time / sec (0 to 25): Let It Be - log-freq specgram (LIB-1) (freq / Hz); Onset envelope + beat times; Beat-synchronous chroma (chroma bin, C to B); Beat-synchronous chroma + Shepard resynthesis (LIB-6) (freq / Hz).]
2. Million Song Dataset (MSD)
• Commercial-scale dataset available to MIR researchers
1M pop songs
250 GB of features
(6 years of listening)
• Many facets: Features, Lyrics, Tags, Covers, Listeners ...
http://labrosa.ee.columbia.edu/millionsong
Thierry Bertin-Mahieux
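The headline numbers imply rough per-song averages. This is back-of-envelope arithmetic, assuming 1 GB = 1e9 bytes and 365-day years.

```python
N_SONGS = 1_000_000
FEATURE_BYTES = 250e9        # "250 GB of features", assuming 1 GB = 1e9 bytes
LISTENING_YEARS = 6          # "6 years of listening", assuming 365-day years

bytes_per_song = FEATURE_BYTES / N_SONGS
seconds_per_song = LISTENING_YEARS * 365 * 24 * 3600 / N_SONGS

print(f"{bytes_per_song / 1e3:.0f} KB of features per song")
print(f"{seconds_per_song:.1f} s of audio per song on average")
```

That works out to about 250 KB of features and a little over three minutes of audio per song.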
MSD Audio Features
• Use Echo Nest “Analyze” features
segment audio into variable-length “events”
represent by 12 chroma + 12 “timbre”
supports a crude resynthesis:
[Figure: TRKUYPW128F92E1FC0 - Tori Amos - Smells Like Teen Spirit, three panels against time / sec (0 to 14): Original (freq / Hz); EN Features (chroma, C to B); Resynth (freq / Hz).]
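To make the segment-based representation concrete, the sketch below holds each variable-length segment's 12 chroma + 12 timbre values constant over its duration and samples the result on a uniform time grid. The array layout and function name are hypothetical; this is not the Echo Nest API.

```python
import numpy as np

def segments_to_frames(starts, chroma, timbre, duration, hop=0.1):
    """Sample variable-length segments on a uniform grid (piecewise constant).

    starts: (n_seg,) sorted segment start times in seconds;
    chroma, timbre: (n_seg, 12) each. This layout is an assumption.
    Returns (times, frames) with frames of shape (n_frames, 24).
    """
    feats = np.hstack([chroma, timbre])
    times = np.arange(0.0, duration, hop)
    # index of the last segment starting at or before each grid time
    idx = np.searchsorted(starts, times, side="right") - 1
    idx = np.clip(idx, 0, len(starts) - 1)
    return times, feats[idx]
```

A uniform grid like this is convenient for plotting or for driving a crude resynthesis from the per-segment values.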
MSD Metadata
EN Metadata
artist: 'Tori Amos'
release: 'LIVE AT MONTREUX'
title: 'Smells Like Teen Spirit'
id: 'TRKUYPW128F92E1FC0'
key: 5
mode: 0
loudness: -16.6780
tempo: 87.2330
time_signature: 4
duration: 216.4502
sample_rate: 22050
audio_md5: '8'
7digitalid: 5764727
familiarity: 0.8500
year: 1992

SHS Covers
%5489,4468, Smells Like Teen Spirit
TRTUOVJ128E078EE10 Nirvana
TRFZJOZ128F4263BE3 Weird Al Yankovic
TRJHCKN12903CDD274 Pleasure Beach
TRELTOJ128F42748B7 The Flying Pickets
TRJKBXL128F92F994D Rhythms Del Mundo feat. Shanade
TRIHLAW128F429BBF8 The Bad Plus
TRKUYPW128F92E1FC0 Tori Amos

MxM Lyric Bag-of-Words
12 hello 11 i 10 a 9 and 7 it 6 are 6 we 6 now
6 here 6 us 6 entertain 4 the 4 feel 4 yeah 3 to 3 my
3 is 3 with 3 oh 3 out 3 an 3 light 3 less 3 danger

Last.fm Tags
100.0 – cover; 57.0 – covers; 43.0 – female vocalists; 42.0 – piano; 34.0 – alternative
14.0 – singer-songwriter; 11.0 – acoustic; 8.0 – tori amos; 7.0 – beautiful; 6.0 – rock
6.0 – pop; 6.0 – Nirvana; 6.0 – female vocalist; 6.0 – 90s; 5.0 – out of genre covers
5.0 – cover songs; 4.0 – soft rock; 4.0 – nirvana cover; 4.0 – Mellow; 4.0 – alternative rock
3.0 – chick rock; 3.0 – Ballad; 3.0 – Awesome Covers; 2.0 – melancholic; 2.0 – k00l ch1x
2.0 – indie; 2.0 – female vocalistist; 2.0 – female; 2.0 – cover song; 2.0 – american
3. Finding Similar Items
• If it’s really the same, we can use fingerprinting (e.g. Shazam):
find local “landmarks”
quantize by pairs
inverted index
[Figure: Query audio spectrogram (freq / Hz, 0 to 4000, against time / sec) with matching landmarks; Match: 05-Full Circle at 0.032 sec]
Avery Wang ’03
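The three steps (landmarks, pairs, inverted index) can be sketched as follows. Peak picking is assumed to have been done already, so each track is a time-sorted list of (time_frame, freq_bin) peaks. The fan-out and maximum time gap are illustrative values, and this is a simplified sketch in the spirit of the approach, not Shazam's actual implementation.

```python
from collections import Counter, defaultdict

def landmark_hashes(peaks, fan_out=3, max_dt=50):
    """Pair each peak with up to fan_out later peaks; hash = (f1, f2, dt).

    peaks: list of (time_frame, freq_bin) tuples sorted by time (assumed given).
    """
    hashes = []
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt > max_dt:
                break
            if dt > 0:
                hashes.append(((f1, f2, dt), t1))
                paired += 1
                if paired == fan_out:
                    break
    return hashes

def build_index(tracks):
    """Inverted index: hash -> list of (track_id, time of the anchor peak)."""
    index = defaultdict(list)
    for track_id, peaks in tracks.items():
        for h, t in landmark_hashes(peaks):
            index[h].append((track_id, t))
    return index

def match(index, query_peaks):
    """Vote for (track, time offset); a true match piles up at one offset."""
    votes = Counter()
    for h, tq in landmark_hashes(query_peaks):
        for track_id, tr in index.get(h, []):
            votes[(track_id, tr - tq)] += 1
    if not votes:
        return None
    (track_id, offset), count = votes.most_common(1)[0]
    return track_id, offset, count
```

Lookup cost depends on the number of query hashes rather than on the size of the collection, which is why this scales. It only works when the query really is the same recording.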
Dynamic Programming Alignment
• We can match the chroma representations by DP alignment
robust to timing changes
quite efficient
best performer in MIREX Cover Song evaluations
Serra, Gomez et al. ’08
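For intuition, here is a plain dynamic time warping sketch over two beat-chroma sequences with cosine distance as the local cost. It is a generic textbook DTW, not the cited authors' method, and it ignores key transposition.

```python
import numpy as np

def dtw_cost(a, b):
    """Normalized DTW alignment cost between chroma sequences.

    a: (n, 12), b: (m, 12), one row per beat. Lower means more similar.
    """
    an = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-9)
    bn = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-9)
    local = 1.0 - an @ bn.T                      # cosine distance, (n, m)
    n, m = local.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = local[i - 1, j - 1] + min(
                acc[i - 1, j - 1],               # advance both sequences
                acc[i - 1, j],                   # repeat a frame of b
                acc[i, j - 1],                   # repeat a frame of a
            )
    return acc[n, m] / (n + m)
```

The repeat moves absorb local tempo differences, which is where the robustness to timing changes comes from. Each comparison costs O(nm).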
Large-Scale Cover Matching
• How can we find covers in 1M songs?
@ 1 sec / comparison, one search = 11.5 CPU-days
full N^2 mining = 16,000 CPU-years
• Hashing?
landmarks from chroma patches (like Shazam)
represent item by distribution of contour fragments
one stage in a filtering hierarchy...
Bertin-Mahieux & Ellis ’11
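The CPU-time estimates can be checked with a few lines of arithmetic. The sketch assumes 365-day years and counts each unordered pair once for the full mining figure, which is the reading that reproduces roughly 16,000 CPU-years.

```python
N = 1_000_000
SEC_PER_COMPARISON = 1.0

# One query compared against every song in the collection
one_search_days = N * SEC_PER_COMPARISON / 86_400

# Every unordered pair of songs compared once: N(N-1)/2 comparisons
all_pairs = N * (N - 1) // 2
all_pairs_years = all_pairs * SEC_PER_COMPARISON / (86_400 * 365)

print(f"one search: {one_search_days:.1f} CPU-days")
print(f"all pairs:  {all_pairs_years:,.0f} CPU-years")
```

This gives about 11.6 CPU-days per search and about 15,900 CPU-years for all pairs, in line with the rounded figures on the slide.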
Euclidean Cover Matching
• Reduce cover matching to nearest-neighbor search in fixed-size beat-chroma patches
• Right “distance”?
principal components
learned weightings
• Alignment/segmentation?
music segmentation
multiple probes
framing-insensitive representation
Ellis & Poliner ’08
Bertin-Mahieux & Ellis ’12
[Figure: three beat-chroma patches (chroma bin against time / beats, beats 120 to 170): Between the Bars - Elliott Smith - pwr .25; Between the Bars - Glenn Phillips - pwr .25 - -17 beats - trsp 2; pointwise product (sum(12x300) = 19.34).]
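One way to make a fixed-size patch insensitive to framing is to keep only the magnitude of its 2D Fourier transform, which does not change under circular shifts in time or in chroma (transposition). The sketch below uses that as an assumed choice of representation, since the slide does not specify one. The 0.25 power-law compression mirrors the "pwr .25" label in the figure.

```python
import numpy as np

def patch_signature(patch, power=0.25):
    """Map a (12, n_beats) non-negative beat-chroma patch to a unit vector.

    The 2D FFT magnitude is an assumed framing-insensitive representation.
    """
    compressed = patch ** power
    sig = np.abs(np.fft.fft2(compressed)).ravel()
    return sig / (np.linalg.norm(sig) + 1e-9)

def nearest(query_sig, database_sigs):
    """Brute-force Euclidean nearest neighbor; returns (index, distance)."""
    d = np.linalg.norm(database_sigs - query_sig, axis=1)
    return int(np.argmin(d)), float(d.min())
```

With every item reduced to a fixed-length vector, the brute-force search could be swapped for any standard nearest-neighbor index. Principal components or learned weightings would be applied to the signature before the distance is computed.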
4. Other Work: Pattern Mining
• Cluster beat-synchronous chroma patches
[Figure: eight example clusters of beat-chroma patches (chroma bins against time / beats): #32273 - 13 instances; #51917 - 13 instances; #65512 - 10 instances; #9667 - 9 instances; #10929 - 7 instances; #61202 - 6 instances; #55881 - 5 instances; #68445 - 5 instances. Plot label: a20-top10x5-cp4-4p0]
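A minimal clustering sketch: flatten each patch to a vector and run k-means. The slide does not name the clustering algorithm, so k-means and the deterministic farthest-point initialization are assumptions made for illustration.

```python
import numpy as np

def farthest_point_init(X, k):
    """Deterministic init: start at X[0], then repeatedly take the farthest point."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    return np.array(centers, dtype=float)

def kmeans(X, k, n_iter=20):
    """Plain k-means on the rows of X (one flattened patch per row)."""
    centers = farthest_point_init(X, k)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return centers, labels
```

Counting how many patches fall into each cluster gives "instances" tallies of the kind shown in the figure.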
Other Work: MSD Challenge
• Using “Taste Profile” data
User-Song-playcount
• Challenge: Rank 1M songs to complete half a playlist
using audio, metadata, lyrics, whatever
• Kaggle.com based contest
self-service evaluation, leader board
• Results at MIREX, AdMIRe
b80344d063b5ccb3212f76538f3d9e43d87dca9e SOAKIMP12A8C130995 1
b80344d063b5ccb3212f76538f3d9e43d87dca9e SOAPDEY12A81C210A9 1
b80344d063b5ccb3212f76538f3d9e43d87dca9e SOBBMDR12A8C13253B 2
b80344d063b5ccb3212f76538f3d9e43d87dca9e SOCNMUH12A6D4F6E6D 1
b80344d063b5ccb3212f76538f3d9e43d87dca9e SODACBL12A8C13C273 1
b80344d063b5ccb3212f76538f3d9e43d87dca9e SODDNQT12A6D4F5F7E 5
with Brian McFee
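To show how the user-song-playcount triplets might be used, here is a sketch of a popularity baseline: rank songs by number of distinct listeners and recommend the top ones a user has not yet played. This is a simple illustrative baseline, not the challenge's evaluation code or any participant's method. The function names are hypothetical.

```python
from collections import Counter, defaultdict

def parse_triplets(lines):
    """Parse whitespace-separated 'user song playcount' lines."""
    plays = defaultdict(dict)
    for line in lines:
        user, song, count = line.split()
        plays[user][song] = int(count)
    return plays

def popularity_ranking(plays):
    """Songs ordered by number of distinct listeners (ties broken by song id)."""
    listeners = Counter()
    for songs in plays.values():
        listeners.update(songs.keys())
    return [s for s, _ in sorted(listeners.items(), key=lambda kv: (-kv[1], kv[0]))]

def recommend(plays, user, ranking, n=3):
    """Top-n songs from the ranking that the user has not already played."""
    seen = plays.get(user, {})
    return [s for s in ranking if s not in seen][:n]
```

Audio, metadata or lyrics features would then be judged by how much they improve on a ranking like this one.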