Wednesday, January 21st, 2008 Comp. Sci. Colloquium
The Problem with Music:
Modeling Distance Distributions of Large Music Collections
Prof. Michael Casey
Program in Digital Musics
Dartmouth College, Hanover, NH
a.k.a. The Problem with Multimedia:
• Music
• Music Videos
• Videos
• Images
Scalable Similarity
• 8M tracks in commercial collection
• 6B Images on WWW
• Require scalable nearest-neighbor methods
• Increase scale, decrease search complexity
Example: Remixing / Sampling in Yahoo! Music
Original Track, Remix 1, Remix 2, Remix 3
Specificity
• Partial document (sub-track) retrieval
• Alternate versions: remix, cover, live, album
• Task is mid-high specificity
Audio Shingles
A shingle is defined as the concatenation of l frames of m-dimensional features.
• Shingles provide contextual information about features
• Originally used for Internet search engines:
  • Andrei Z. Broder, Steven C. Glassman, Mark S. Manasse, Geoffrey Zweig: “Syntactic Clustering of the Web”. Computer Networks 29(8-13): 1157-1166 (1997)
• Related to N-grams, overlapping sequences of features
• Applied to audio domain by Casey and Slaney:
  • Casey, M., Slaney, M. “The Importance of Sequences in Musical Similarity”, in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2006
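The shingle construction can be sketched in a few lines. This is a minimal illustration, not the talk's implementation: the function name `make_shingles` and the `hop` parameter (how many frames to advance between shingles) are assumptions introduced here, and the toy frames stand in for real audio features.

```python
from typing import List, Sequence


def make_shingles(frames: Sequence[Sequence[float]], l: int, hop: int = 1) -> List[List[float]]:
    """Concatenate each run of l consecutive m-dimensional frames into one l*m vector."""
    if l <= 0 or hop <= 0:
        raise ValueError("l and hop must be positive")
    shingles = []
    # The last valid start index is len(frames) - l; shorter inputs yield no shingles.
    for start in range(0, len(frames) - l + 1, hop):
        shingle: List[float] = []
        for frame in frames[start:start + l]:
            shingle.extend(frame)
        shingles.append(shingle)
    return shingles


# Toy input: 5 frames with m = 2 features each (hypothetical values).
frames = [[float(t), float(t) + 0.5] for t in range(5)]
shingles = make_shingles(frames, l=3)
print(len(shingles), len(shingles[0]))
```

With a hop of one frame, consecutive shingles overlap in all but one frame, which is what makes them behave like N-grams over the feature sequence.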
Audio Shingle Similarity
Definitions:
• a query shingle drawn from a query track {Q}
• a database of audio tracks indexed by n
• a database shingle from track n
Shingles are normalized to unit vectors, so the squared Euclidean distance between two shingles depends only on their inner product.
For shingles with M dimensions (M = l·m); m = 12, 20; l = 30, 40
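For unit vectors, the squared Euclidean distance equals 2 minus twice the inner product. The sketch below checks that identity numerically; the helper names and the three-dimensional toy vectors are assumptions made for illustration (real shingles would have M = l·m dimensions).

```python
import math


def normalize(v):
    """Scale v to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in v]


def squared_distance(a, b):
    """Squared Euclidean distance computed directly."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def squared_distance_unit(a, b):
    """Squared distance for unit vectors: ||a - b||^2 = 2 - 2 * <a, b>."""
    return 2.0 - 2.0 * sum(x * y for x, y in zip(a, b))


q = normalize([3.0, 4.0, 0.0])   # hypothetical query shingle
x = normalize([0.0, 4.0, 3.0])   # hypothetical database shingle
print(squared_distance(q, x), squared_distance_unit(q, x))
```

One consequence is that squared distances between normalized shingles always lie in the interval [0, 4].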
Whole-track similarity
• Often want to know which tracks are similar
• Similarity depends on specificity of task
  • Distortion / filtering / re-encoding (high)
  • Remix with new audio material (mid)
  • Cover song: same song, different artist (mid)
Whole-track resemblance: radius-bounded search
Compute the number of shingle collisions between two tracks.
• Requires a threshold for considering shingles to be related
• Need a way to estimate relatedness (threshold) for data set
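Radius-bounded collision counting can be sketched as follows. The exact formula from the slide did not survive extraction, so this version is an assumption: it counts every (query shingle, track shingle) pair whose squared distance is at most `xthresh`, and ranks tracks by that count. The function names and the toy database are hypothetical.

```python
def count_collisions(query_shingles, track_shingles, xthresh):
    """Count (query, track) shingle pairs with squared distance <= xthresh."""
    count = 0
    for q in query_shingles:
        for x in track_shingles:
            d2 = sum((a - b) ** 2 for a, b in zip(q, x))
            if d2 <= xthresh:
                count += 1
    return count


def rank_tracks(query_shingles, database, xthresh):
    """Return (track name, collision count) pairs, most collisions first."""
    scores = {
        name: count_collisions(query_shingles, shingles, xthresh)
        for name, shingles in database.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy unit-norm shingles in two dimensions (hypothetical data).
query = [[1.0, 0.0], [0.0, 1.0]]
database = {
    "remix": [[1.0, 0.0], [0.0, 1.0]],
    "other": [[-1.0, 0.0]],
}
ranking = rank_tracks(query, database, xthresh=0.1)
print(ranking)
```

The brute-force double loop is only for clarity; its cost is what motivates the hashing approach later in the talk.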
SCALE
• Mazurkas: 10,000 tracks
  • 10-100ms features
  • 3s clips (30 – 300 frames per vector)
  • 12d – 20d features (360 – 600d vectors)
• Yahoo! Music
  • 6M tracks
  • 1000 vectors per track
  • (6M x 1k)^2 search for near neighbours
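To make the size of the (6M x 1k)^2 search concrete, the arithmetic can be worked through directly. The variable names below are illustrative only; the figures are the ones quoted on the slide.

```python
tracks = 6_000_000          # tracks in the collection
vectors_per_track = 1_000   # shingle vectors per track

total_vectors = tracks * vectors_per_track
pairwise = total_vectors ** 2   # all-pairs comparisons for exhaustive search

print(f"{total_vectors:.1e} vectors, {pairwise:.1e} pairwise comparisons")
# prints: 6.0e+09 vectors, 3.6e+19 pairwise comparisons
```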
Approximate near neighbors
• In many applications we need only near neighbors
• We can exploit this by allowing a degree of approximation in retrieval
Hashing
• Types of hashes
  • String: put Bash vs Bush in different bins
  • Locality sensitive: close matches in same bin
  • High-dimensional and probabilistic
• Nearest Neighbor implementations
  • Pair-wise distance computation
    • 1,000,000,000,000 comparisons in 2M song database
  • Hash bucket collisions
    • 1,000,000,000 hash projections
Exact matching via hashing
• Audio fingerprinting
  • Shazam, etc.
• Make the feature robust
• Use exact matching on integer hash
• Find a sequence of hashes to identify specific recording or image
• Drawback: only exact matches possible
Locality-Sensitive Hashing (Indyk-Motwani’98)
• Hash functions are locality-sensitive if, for a random hash function h and any pair of points p, q, we have:
  – Pr[h(p)=h(q)] is “high” if p is “close” to q
  – Pr[h(p)=h(q)] is “low” if p is “far” from q
Random Projections
• Random projections estimate distance
• Multiple projections improve estimate
• h’s are locality-sensitive
• Pr[h(p)=h(q)] = (1 − D(p,q)/d)^k
• We can vary the probability by changing k
[Figure: collision probability Pr versus distance, shown for k=1 and k=2]
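The collision probability (1 − D(p,q)/d)^k can be checked on a small example. The sketch below assumes one specific locality-sensitive family for which the formula holds exactly: binary vectors of length d, Hamming distance D, and a hash that reads k coordinates chosen uniformly at random with replacement. That choice of family, and all function names, are assumptions for illustration rather than the construction used in the talk.

```python
import itertools
import random


def hamming(p, q):
    """Number of positions where p and q differ."""
    return sum(a != b for a, b in zip(p, q))


def make_hash(d, k, rng):
    """Pick k coordinates at random (with replacement); hash = bits at those coordinates."""
    coords = [rng.randrange(d) for _ in range(k)]
    return lambda p: tuple(p[i] for i in coords)


def exact_collision_probability(p, q, k):
    """Fraction of all d**k coordinate choices on which p and q hash identically."""
    d = len(p)
    hits = sum(
        all(p[i] == q[i] for i in coords)
        for coords in itertools.product(range(d), repeat=k)
    )
    return hits / d ** k


p = (0, 1, 1, 0, 1, 0, 0, 1)
q = (0, 1, 0, 0, 1, 1, 0, 1)   # differs from p in 2 of 8 positions

for k in (1, 2, 3):
    predicted = (1 - hamming(p, q) / len(p)) ** k
    print(k, exact_collision_probability(p, q, k), predicted)

# One randomly drawn hash function maps identical points to the same bucket.
h = make_hash(len(p), k=2, rng=random.Random(0))
print(h(p) == h(p))
```

Raising k makes the probability fall off faster with distance, so far-apart points collide less often at the cost of also missing some near neighbors.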
Statistical approaches to modeling distance distributions
Distribution of minimum distances
Database: 1.4 million shingles. The left bump is the distribution of minimum distances between 1000 randomly selected query shingles and this database. The right bump is a small sample (1/98,000,000) of the full histogram of all distances.
Radius-bounded retrieval performance: cover song (opus task)
• Performance depends critically on xthresh, the collision threshold
• Want to estimate xthresh automatically from unlabelled data
Order Statistics
• Minimum-value distribution is analytic
  • Estimate the distribution parameters
  • Substitute into minimum value distribution
  • Define a threshold in terms of false-positive (FP) rate
  • This gives an estimate of xthresh
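The order-statistics step can be written out explicitly. This is a sketch under the assumption that the N distances between a query shingle and the database are independent draws from a background distribution with CDF F; the symbols N, F, F_min and ε (the target FP rate) are introduced here for illustration and do not come from the slides.

```latex
\begin{align}
F_{\min}(x) &= \Pr\Big[\min_{1 \le i \le N} X_i \le x\Big]
             = 1 - \big[1 - F(x)\big]^{N} \\
F_{\min}(x_{\mathrm{thresh}}) = \epsilon
  \;&\Longrightarrow\;
  F(x_{\mathrm{thresh}}) = 1 - (1 - \epsilon)^{1/N}
\end{align}
```

Under that assumption, once the parameters of F are estimated, xthresh follows by inverting F at the value on the right-hand side.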
Estimating xthresh from unlabelled data
• Use theoretical statistics
• Null Hypothesis:
  • H0: shingles are drawn from unrelated tracks
• Assume elements i.i.d., normally distributed
• M-dimensional shingles, d effective degrees of freedom
Squared distance distribution for H0
Maximum likelihood (ML) for background distribution
• Likelihood for N data points (distances squared)
• d = effective degrees of freedom
• M = shingle dimensionality
Background distribution parameters
• Likelihood for N data points (distances squared)
• d = effective degrees of freedom
• M = shingle dimensionality
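Estimating the background parameters from squared distances can be sketched as below. Two things here are assumptions, since the slide's likelihood expression did not survive extraction: the background model is taken to be a scaled chi-squared, x ~ σ²·χ²(d), and the fit uses the method of moments rather than the maximum-likelihood estimator named on the slide. The function name `fit_scaled_chi2` and the synthetic data are hypothetical.

```python
import math
import random


def fit_scaled_chi2(sq_distances):
    """Method-of-moments fit of x ~ sigma2 * chi-squared(d).

    Under that model, mean = d * sigma2 and variance = 2 * d * sigma2**2,
    which gives d = 2 * mean**2 / variance and sigma2 = variance / (2 * mean).
    """
    n = len(sq_distances)
    mean = math.fsum(sq_distances) / n
    var = math.fsum((x - mean) ** 2 for x in sq_distances) / (n - 1)
    d = 2.0 * mean * mean / var
    sigma2 = var / (2.0 * mean)
    return d, sigma2


# Synthetic background: sigma2 * chi-squared(d) is Gamma(shape=d/2, scale=2*sigma2).
rng = random.Random(0)
true_d, true_sigma2 = 20.0, 0.05
samples = [rng.gammavariate(true_d / 2.0, 2.0 * true_sigma2) for _ in range(50_000)]

d_hat, sigma2_hat = fit_scaled_chi2(samples)
print(round(d_hat, 2), round(sigma2_hat, 4))
```

The estimated d plays the role of the effective degrees of freedom, which can be much smaller than the shingle dimensionality M when feature elements are correlated.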
Unlabelled data experiment
• Unlabelled data set
• Known to contain:
  • Cover songs (same work, different performer)
  • Near duplicate recordings (misattribution, encoding)
• Estimate background distance distribution
• Estimate minimum value distribution
• Set xthresh so FP rate is <= 1%
• Whole-track retrieval based on shingle collisions
Misattributions
• Joyce Hatto: 100% of known misattributions in first rank
• Sergio Fiorentino
Eleven out of twenty-six Mazurkas performances on another Concert Artists/Fidelio disc, issued under the name of Sergio Fiorentino, are in fact copies of recordings by other artists. This is the first time that such practices have been found in the Concert Artists/Fidelio recordings issued other than under the name of Joyce Hatto, and prompts speculation as to how much more misattributed material remains to be found in the Concert Artists/Fidelio catalogue.
Scaling
• Locality sensitive hashing
  • Trade-off approximate NN for time complexity
  • 3 to 4 orders of magnitude speed-up
  • No noticeable degradation in performance
    • For optimal radius threshold
• Open source: google: “audioDB”
  • Management of tracks, sequences, salience
  • Automatic indexing parameters
  • OMRAS2, Yahoo!, AWAL, CHARM, more…
  • Web-services interface (SOAP / JSON)
  • Implementation of LSH for large N ~ 1B
  • 1-10 ms whole-track retrieval from 1B vectors
AudioDB: Shingle Nearest Neighbor Search
Current deployment
• Large commercial collections
  • AWAL ~ 100,000 tracks
  • Yahoo! 2M+ tracks, related song classifier
  • Flickr 1B+ Images
• AudioDB: open-source, international consortium of developers
  • Google: “audioDB”
Conclusions
• Radius-bounded retrieval model for tracks
  • Shingles preserve temporal information, high d
  • Implements mid-to-high specificity search
• Optimal radius threshold from order statistics
  • Null hypothesis: shingles are drawn from unrelated tracks
• LSH requires radius bound, automatic estimate
• Scales to 1B+ shingles using LSH
Thanks
• Malcolm Slaney, Yahoo! Research Inc.
• Christophe Rhodes, Goldsmiths, U. of London
• Michela Magas, Goldsmiths, U. of London
• Funding: EPSRC: EP/E02274X/1