chord recognition and segmentation using em-trained …dpwe/talks/ismir-2003-10.pdfchord recognition...
TRANSCRIPT
Chord Recognition - Sheh & Ellis 2003-10-29 - p. /191
• Chord Recognition• EM-trained HMMs• Experiments & Results
Chord Recognition and Segmentation using EM-trained
Hidden Markov Models
Alex Sheh & Dan Ellis {asheh79,dpwe}@ee.columbia.eduLabROSA, Columbia University
Chord Recognition - Sheh & Ellis 2003-10-29 - p. /192
• Basic problem:Recover chord sequence labels from audio
Easier than note transcription ?More relevant to listener perception ?
Chord Transcription
Chord Recognition - Sheh & Ellis 2003-10-29 - p. /193
• Enharmonicity:Chord labels can be ambiguousC# vs Db
• Many different chord classesmajor, minor, 6th, 9th, ...fold into 7 ‘main’ classes:maj, min, maj7, min7, dom7, aug, dim
• Acoustic variabilitychords are the same regardless of instrumentation
Difficulties with Chord Transcription
Chord Recognition - Sheh & Ellis 2003-10-29 - p. /194
• Note transcription, then note→chord ruleslike labeling chords in MIDI transcripts
• Spectrum→chord rulesi.e. find harmonic peaks, use knowledge of likely notes in each chord
• Trained classifierdon’t use any “expert knowledge”instead, learn patterns from labeled examples
Approaches to Chord Transcription
Chord Recognition - Sheh & Ellis 2003-10-29 - p. /195
• Use labeled training examples to estimate for features and chord class
•• Use Bayes Rule to get posterior probabilities for each class given features:
Statistical-Pattern-Recognition Chord Recognizer
p(x|wi) wix
Labeledtraining
examples{xn,ωxn}
Estimateconditional pdf
for class ω1
p(x|ω1)
Sortaccordingto class
p(wi|x) =p(x|wi) · p(wi)
 j p(x|w j) · p(w j)
Chord Recognition - Sheh & Ellis 2003-10-29 - p. /196
• We need (lots of) examples of audio segments and the appropriate chord labelsNot widely available!Even when you can get it, there is very little
• We could hand-mark a training setpainfully time-consuming!
• Can we generate it automatically?we could, if we already had the chord transcriptions system working...
Training Data Sources
Chord Recognition - Sheh & Ellis 2003-10-29 - p. /197
• Chord recognition is like word recognitionwe are trying to recover a sequence of ‘exclusive’ labels associated with an audio streamwe have a lot of potential training audio, but no time labels
• Can we do what they do in ASR?.. i.e. iterative re-estimation of models and labels using Expectation-MaximizationNeed only label sequence, not timings(i.e. words or chords in order, no times)
Speech Recognition Analogy
Chord Recognition - Sheh & Ellis 2003-10-29 - p. /198
• Estimate ‘soft’ labels using current models• Update model parameters from new labels• Repeat until convergence to local maximum
EM HMM Re-Estimation
ae1 ae2 ae3
dh1 dh2
Model inventory
Uniforminitializationalignments
Initializationparameters
Repeat until convergence
E-step:probabilitiesof unknowns
M-step:maximize viaparameters
Labelled training datadh ax k ae ts ae t aa n
dh
dh
s ae t aa nax
ax
k
k
ae
ae
tΘinit
p(qn|X1,Θold)i N
Θ : max E[log p(X,Q | Θ)]
Chord Recognition - Sheh & Ellis 2003-10-29 - p. /199
• All we need are the chord sequences for our training examples
• OLGA Tab archives (http://www.olga.net/)?multiple authors, unreliable quality
• Hal Leonard “Paperback Song Series”many Beatles songs, consistent detailmanually retyped for 20 songs:“Beatles for Sale”, “Help”, “Hard Day’s Night”
• Issues: repeats, intros, weird bits in the middle
Chord Sequence Data Sources
Chord Recognition - Sheh & Ellis 2003-10-29 - p. /1910
• Preliminary investigations to see if this workssmall database to get startedcompare different feature setsdifferent ways to evaluate resultscan we reintroduce a little high-level knowledge?
Experiments
Chord Recognition - Sheh & Ellis 2003-10-29 - p. /1911
• Two training sets:Train on 18 songs, test on 2 held-out songs(fair test)Train on all 20 songs, test on 2 songs from training set (cheating, rewards over-fitting, upper bound)
• Two evaluation measuresRecognition: transcribe unknown chordsForced alignment: find time boundaries given correct chord sequenceScore frame-level accuracy against hand-markings
Training/Test Conditions
Chord Recognition - Sheh & Ellis 2003-10-29 - p. /1912
• Can try any features with EM training...• Use MFCCs as baseline
“the” feature in ASR, also useful in music IRalso try deltas, double deltas as in ASRcapture ‘formants’, not pitch - straw man
• “Pitch Class Profile” features (Fujishima’99)collapse FFT bin energies into (24) chroma binsa/k/a Chroma Spectrum (Bartsch’01) ...
Features
FFT bin (up to 2000 Hz)
Chro
ma
bin
(qua
rter t
ones
)
Spectrum-to-chroma mapping weights
20 40 60 80 100
5
10
15
20
Chord Recognition - Sheh & Ellis 2003-10-29 - p. /1913
• Statistical system learns separate models for each of (7 chord types) x (21 roots)models are means, variances of feature vectorsonly 32 actually appear in our training seteven so, many have few training instances
• Expect similarity between e.g. Amaj & Bmajsame chord, just shifted in frequencyshift = rotation of 24-bin chroma space
• Can align & average all transpositions of same chord after each training iterationthen rotate back to starting chroma, continue...
Averaging Rotated PCP Models
Chord Recognition - Sheh & Ellis 2003-10-29 - p. /1914
• Models are not adequately discriminant:
Results: Recognition
(random ~3%)
MFCCs are poor(can overtrain)
PCPs a little better(ROT helps
generalization)
Feature Recogtrain18 train20
MFCC 5.9 16.77.7 19.6
MFCC D 15.8 7.61.5 6.9
PCP 10.0 23.618.2 26.4
PCP ROT 23.3 23.120.1 13.1
Percent Frame Accuracy: Recognition
Eight Days a WeekEvery Little Thing
Chord Recognition - Sheh & Ellis 2003-10-29 - p. /1915
• Some glimmers of hope!
Results: Alignment
Feature Aligntrain18 train20
MFCC 27.0 20.914.5 23.0
MFCC D 24.1 13.119.9 19.7
PCP 26.3 41.046.2 53.7
PCP ROT 68.8 68.383.3 83.8
Percent Frame Accuracy: Forced Alignment
PCP_ROT best accuracy
& best generalization
Eight Days a WeekEvery Little Thing
Chord Recognition - Sheh & Ellis 2003-10-29 - p. /1916
• Major/minor confusions• C# is relative minor (shared notes)
Chord ConfusionsXIRTAMNOISUFNOC"ROJAME"
keeWasyaDthgiE
jaM niM 7jaM 7niM 7mDo Aug Dim
bF/E 851 511 . 9 . . .
F/#E 9 . . . . . .
bG/#F . . . 11 . . .
G 3 . . . . . .
bA/#G . . . . . . .
A 9 . . . 1 . .
bB/#A . . . . . . .
bC/B 8 . . . , . .
C/#B . . . . . . .
bD/#C . 02 . . . . .
D . 41 . . . . .
bE/#D . . . . . . .
Chord Recognition - Sheh & Ellis 2003-10-29 - p. /1917
• Chord model centers (means) indicate chord ‘templates’:
What did the models learn?
0 5 10 15 20 250
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4PCP_ROT family model means (train18)
DIMDOM7MAJMINMIN7
C D E F G A B C (for C-root chords)
Chord Recognition - Sheh & Ellis 2003-10-29 - p. /1918
• A flavor of the features & results:
Recognition/Alignment Example
true E G D Bm G
align E G DBm G
recog E G Bm Am Em7 Bm Em7
16.27 24.84time / sec
intensity
A
#
B
C
#
D
#
E
F
#
G
#
pitc
h cla
ssBeatles - Beatles For Sale - Eight Days a Week (4096pt)
20
0
40
60
80
100
120
Chord Recognition - Sheh & Ellis 2003-10-29 - p. /1919
• ASR-style EM is a viable approach for learning chord modelsmore training data needed
• Better featurescapture ‘global’ properties of chordsrobust to fine tuning issues?
• Better representation for training labelsi.e. allow for extra repeats?
• More ways to reintroduce music knowledge
Conclusions & Future Work