![Page 1: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/1.jpg)
MIRtoolbox: Sound and music analysis of audio recordings using Matlab
MUS4831, Olivier Lartillot
Part II, 9.11.2017
![Page 2: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/2.jpg)
Part 2
• Rhythm, metrical structure
• Tonal analysis
• Segmentation, structure
• Statistics
• Music & emotion
• Advanced use
![Page 3: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/3.jpg)
mirevents event detection function
[Figure: Onset curve (Envelope); x-axis: time (s), y-axis: amplitude]
• mirevents('ragtime.wav')
  • mirpeaks(mirsum(mirspectrum(…, 'Frame')))
• mirevents('ragtime.wav', 'Filter')
  • mirpeaks(mirsum(mirenvelope(mirfilterbank(…, 'NbChannels', 40))))
• mirevents('ragtime.wav', 'SpectralFlux')
  • mirpeaks(mirflux(…, 'Inc', 'Halfwave'))
mirevents(…, 'Contrast', …)
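The idea behind a half-wave-rectified spectral flux onset curve can be sketched in plain Python. This is a toolbox-independent illustration, not MIRtoolbox's implementation: the function names `halfwave_flux` and `local_maxima` are hypothetical, and the frames are assumed to be magnitude spectra already computed elsewhere.

```python
import math

def halfwave_flux(frames):
    """Return one flux value per pair of successive spectral frames.

    Only increases in magnitude contribute (half-wave rectification),
    so energy that appears counts and energy that fades does not.
    """
    flux = []
    for prev, cur in zip(frames, frames[1:]):
        increases = [max(c - p, 0.0) for p, c in zip(prev, cur)]
        flux.append(math.sqrt(sum(d * d for d in increases)))
    return flux

def local_maxima(curve, threshold=0.0):
    """Indices of samples above the threshold and above both neighbours."""
    return [i for i in range(1, len(curve) - 1)
            if curve[i] > threshold and curve[i - 1] < curve[i] > curve[i + 1]]

# Toy magnitude spectra: a new partial appears at frame 2, then fades.
frames = [[1.0, 0.0, 0.0],
          [1.0, 0.0, 0.0],
          [1.0, 2.0, 0.0],
          [1.0, 1.0, 0.0],
          [1.0, 0.5, 0.0]]
flux = halfwave_flux(frames)
onsets = local_maxima(flux)
```

Here `flux[k]` describes the transition from frame `k` to frame `k + 1`, so the single detected peak marks the moment the new partial enters; the fade-out that follows produces no flux at all.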
![Page 4: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/4.jpg)
mirevents event detection function
C.P.E. Bach, Concerto for cello in A major, WQ 172, 3rd mvt
• 'Envelope', 'Filter': changes in dynamics
• 'SpectralFlux': global spectral changes
• 'Emerge': local changes in particular frequency regions
[Figure: three panels, each titled "Onset curve (Envelope)"; x-axis: time (s), y-axis: amplitude]
![Page 5: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/5.jpg)
mirevents event detection function
J.S. Bach, Orchestral Suite No. 3 in D major, BWV 1068, Aria
• 'Envelope', 'Filter': changes in dynamics
• 'SpectralFlux': global spectral changes
• 'Emerge': local changes in particular frequency regions
[Figure: three panels titled "Onset curve (Envelope)" and one panel titled "Envelope"; x-axis: time (s), y-axis: amplitude]
![Page 6: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/6.jpg)
mirevents(..., 'Attack')
[Figure: Onset curve (Envelope); x-axis: time (s), y-axis: amplitude]
![Page 7: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/7.jpg)
mirattackslope average slope of note attacks
• o = mirevents('ragtime.wav', 'Attacks')
• mirattackslope(o)
[Figure: Envelope (centered); x-axis: time (s), y-axis: amplitude]
![Page 8: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/8.jpg)
mirattackleap amplitude of note attacks
• o = mirevents('ragtime.wav', 'Attacks')
• mirattackleap(o)
[Figure: Envelope (centered); x-axis: time (s), y-axis: amplitude]
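The two attack descriptors can be illustrated with a small plain-Python sketch. It assumes the start and end indices of the attack phase are already known and that the envelope is sampled at a fixed rate; the helper names and the toy values are hypothetical, and the toolbox may define and estimate these quantities differently.

```python
def attack_leap(envelope, start, end):
    """Amplitude difference between the end and the start of the attack."""
    return envelope[end] - envelope[start]

def attack_slope(envelope, start, end, sample_rate):
    """Average slope: amplitude gained per second over the attack phase."""
    duration = (end - start) / sample_rate
    return attack_leap(envelope, start, end) / duration

# Toy envelope; the attack is assumed to run from index 1 to index 4.
envelope = [0.0, 0.1, 0.5, 1.5, 2.0, 1.8, 1.2]
sample_rate = 10.0  # envelope samples per second (assumed)

leap = attack_leap(envelope, 1, 4)                 # amplitude rise of the attack
slope = attack_slope(envelope, 1, 4, sample_rate)  # rise divided by 0.3 s
```

A short, steep attack yields a large slope even when its leap is modest, which is why the two descriptors are reported separately.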
![Page 9: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/9.jpg)
Tempo estimation?
• ‘george.wav’
• What tempo in BPM? You can tap on the beat while listening:
• http://www.all8.com/tools/bpm.htm
• How to estimate tempo using the MIRtoolbox operators presented last week?
![Page 10: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/10.jpg)
mirtempo tempo (in beats per minute)
[Figure: four panels labelled o, do, ac, pa: Onset curve (Envelope) and Onset curve (Differentiated envelope), x-axis: time (s), y-axis: amplitude; Envelope autocorrelation (two panels), x-axis: lag (s), y-axis: coefficients]
Roughly:
• o = mirevents('file.wav', 'Filter')
• do = mirevents(o, 'Diff')
• ac = mirautocor(do, 'Resonance')
• pa = mirpeaks(ac, 'Total', 1)
In short:
• [t, pa] = mirtempo('file.wav')
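The chain above (onset curve, autocorrelation, strongest peak) can be sketched without the toolbox. The following plain-Python example is a simplified illustration under stated assumptions: it skips the differentiation step and the resonance weighting, restricts the search to a BPM range chosen here for illustration, and uses hypothetical function names.

```python
def autocorrelation(x, max_lag):
    """Unnormalised autocorrelation of x for lags 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def tempo_bpm(onset_curve, frame_rate, min_bpm=40.0, max_bpm=200.0):
    """Pick the lag with the highest autocorrelation in the BPM range."""
    min_lag = max(1, int(round(frame_rate * 60.0 / max_bpm)))
    max_lag = int(round(frame_rate * 60.0 / min_bpm))
    max_lag = min(max_lag, len(onset_curve) - 1)
    ac = autocorrelation(onset_curve, max_lag)
    best_lag = max(range(min_lag, max_lag + 1), key=lambda lag: ac[lag])
    # A period of best_lag samples corresponds to 60 * rate / lag beats per minute.
    return 60.0 * frame_rate / best_lag

# Toy onset curve sampled at 100 Hz with an impulse every 0.5 s.
frame_rate = 100.0
onset_curve = [1.0 if i % 50 == 0 else 0.0 for i in range(1000)]
bpm = tempo_bpm(onset_curve, frame_rate)
```

Because the autocorrelation is unnormalised, shorter lags are slightly favoured over their multiples; a resonance curve is one way to weight lags according to perceived tempo instead.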
![Page 11: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/11.jpg)
mirtempo resonance curves
[Figure: Envelope autocorrelation, shown for mirautocor and for mirautocor(…, 'Resonance'); x-axis: lag (s), y-axis: coefficients]
• mirautocor(…, 'Resonance', 'Toiviainen') (Toiviainen & Snyder, 2003)
• mirautocor(…, 'Resonance', 'vanNoorden') (van Noorden & Moelants, 2001)
• Emphasis on the best perceived tempi
[Figure: resonance curves; x-axis: lag, marked at 0, 0.5 s (= 120 BPM) and 1 s (= 60 BPM); y-axis: amplitude]
![Page 12: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/12.jpg)
mirtempo tempo estimation
• How to estimate tempo for such an audio excerpt?
• ‘czardas.wav’
![Page 13: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/13.jpg)
mirtempo tempo (temporal evolution)
[Figure: panels labelled f, pa, t, including Onset curve (Differentiated envelope), x-axis: time (s), y-axis: amplitude; and Tempo, x-axis: temporal location of events (in s.), y-axis: coefficient value (in bpm)]
• o = mirevents('mysong', 'Detect', 'No')
• do = mirevents(o, 'Diff')
• f = mirframe(do)
• ac = mirautocor(f, 'Resonance')
• pa = mirpeaks(ac, 'Total', 1)
In short:
• [t, pa] = mirtempo('mysong', 'Frame')
![Page 14: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/14.jpg)
mirtempo
Switch from one metrical level to another!
Try for instance: mirtempo('george.wav', 'Frame')
![Page 15: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/15.jpg)
mirmetre tracking all metrical levels
[Figure: metrical levels labelled 0.5, 1 (dominant), 2, 2.5, 3, 4, 5, 6, 7, 8]
• Level N’s period is N times level 1’s period.
Try for instance: mirmetre('george.wav')
![Page 16: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/16.jpg)
mirmetre tracking all metrical levels
• The tempo curve is associated with one particular metrical level.
• m = mirmetre(…)
• t = mirtempo(m)
or:
• [t m] = mirtempo(…, 'Metre')
![Page 17: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/17.jpg)
mirmetre tracking all metrical levels
Influence of the onset detection method
M. Bruch, Violin Concerto No.1 in G minor, op.26, Finale (Allegro energico)
[Figure: two panels, 'Emerge' and 'SpectralFlux']
![Page 18: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/18.jpg)
Beat/rhythmic/metrical strength, clarity
• Subjective judgment:
• How easily I can perceive the underlying pulsation in music.
[Figure: three examples A, B, C with ratings high, medium, low; two of them annotated "(tempo = 148 BPM)"] (Lartillot et al., 2008)
![Page 19: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/19.jpg)
mirpulseclarity beat strength
[Figure: Envelope autocorrelation; x-axis: lag (s), y-axis: coefficients]
[Figure: autocorrelation curve annotated with the descriptors below; x-axis: lag (0, .5 s, 1 s; 240 BPM), y-axis: amplitude]
• 'MaxAutocor', 'MinAutocor', 'KurtosisAutocor', 'TempoAutocor', 'EntropyAutocor'
(Lartillot et al., 2008)
• Characterisation of the autocorrelation function
• cf. also beat strength (Tzanetakis, Essl & Cook, 2002): variability of the autocorrelation curve throughout time
![Page 20: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/20.jpg)
mirchromagram energy distribution along pitches
• s = mirspectrum(a)
• c = mirchromagram(s, 'Wrap', 'no')
• c = mirchromagram(c, 'Wrap', 'yes')
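Wrapping folds the energy found at each pitch, across all octaves, onto the 12 pitch classes. A minimal plain-Python sketch of that folding step follows; it assumes pitches are indexed by MIDI note number (an assumption made here for illustration, not taken from the slides), and `wrap_chromagram` is a hypothetical name.

```python
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F",
                 "F#", "G", "G#", "A", "A#", "B"]

def wrap_chromagram(unwrapped):
    """Fold {midi_pitch: energy} into 12 pitch-class bins, summing octaves."""
    wrapped = [0.0] * 12
    for pitch, energy in unwrapped.items():
        wrapped[pitch % 12] += energy
    return wrapped

# Toy unwrapped chromagram: C4 (60), E4 (64), G4 (67) and C5 (72).
unwrapped = {60: 1.0, 64: 0.5, 67: 0.75, 72: 0.25}
wrapped = wrap_chromagram(unwrapped)
strongest = PITCH_CLASSES[wrapped.index(max(wrapped))]
```

The two C notes an octave apart end up in the same bin, which is the point of the wrapped representation: it describes pitch-class content regardless of register.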
![Page 21: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/21.jpg)
mirkeystrength probability of key candidates
[Diagram: the mirchromagram output is cross-correlated with one profile per key (for example the C major profile): C major, C# major, D major, …, C minor, C# minor, D minor, …]
![Page 22: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/22.jpg)
mirkeystrength probability of key candidates
• Chromagram compared to typical chromagrams representing each possible key (or mode).
• Detection of the most probable key (or mode).
[Diagram: candidate keys C major, C# major, D major, C minor, C# minor, D minor]
Krumhansl, Cognitive foundations of musical pitch. Oxford UP, 1990.
Gomez, “Tonal description of polyphonic audio for music content processing,” INFORMS Journal on Computing, 18-3, pp. 294–304, 2006.
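The comparison of a chromagram against one profile per key can be sketched as a correlation against rotated templates. The plain-Python example below is an illustration only: it uses the Krumhansl-Kessler probe-tone profiles as templates and Pearson correlation as the comparison, which may differ from the profiles and the measure used inside mirkeystrength. The function names are hypothetical.

```python
import math

# Krumhansl-Kessler probe-tone profiles, given for a tonic of C.
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def key_strengths(chroma):
    """Correlate a 12-bin chromagram with all 24 major and minor key profiles."""
    strengths = {}
    for tonic in range(12):
        # Rotate the C-based profile so that its first entry lands on the tonic.
        major = [MAJOR[(pc - tonic) % 12] for pc in range(12)]
        minor = [MINOR[(pc - tonic) % 12] for pc in range(12)]
        strengths[NAMES[tonic] + " major"] = pearson(chroma, major)
        strengths[NAMES[tonic] + " minor"] = pearson(chroma, minor)
    return strengths

# Toy chromagram with equal energy on C, E and G only.
chroma = [1.0, 0, 0, 0, 1.0, 0, 0, 1.0, 0, 0, 0, 0]
strengths = key_strengths(chroma)
best_key = max(strengths, key=strengths.get)
```

Picking the maximum of the 24 values gives a single key estimate, while the full set of values shows how close the competing candidates are.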
![Page 23: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/23.jpg)
mirkey tonality estimation
[Diagram: mirkeystrength, followed by mirpeaks(mirkeystrength(...))]
• [k c s] = mirkey(...)
![Page 24: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/24.jpg)
mirkey tonality estimation
• [k c s] = mirkey(..., 'Frame')
[Figure: k: Key (maj/min), x-axis: temporal location of events (in s.), y-axis: coefficient value (key names)]
[Figure: c: Key clarity, x-axis: temporal location of events (in s.), y-axis: coefficient value]
[Figure: s: Key strength, x-axis: temporal location of beginning of frame (in s.), y-axis: tonal center (CM to BM, Cm to Bm)]
![Page 25: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/25.jpg)
mirkey tonality estimation
monteverdi.wav tiersen.wav
beethoven9.wav schoenberg.wav
[Figure: Key panels (maj/min) titled "Key, Umile01" (twice), "Key, son5.wav", "Key, son27.wav" and "Key, son32.wav"; x-axis: temporal location of events (in s.), y-axis: coefficient value (key names)]
![Page 26: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/26.jpg)
• [k c s] = mirkey(..., 'Frame')
Tchaikovsky, Swan Lake, Act one
[Figure: Key clarity, son31.wav; x-axis: temporal location of events (in s.), y-axis: coefficient value, from unclear to clear tonality]
Prokofiev, Violin concerto No. in D major, Scherzo: Vivacissimo
[Figure: Key clarity, son24.wav; x-axis: temporal location of events (in s.), y-axis: coefficient value, from unclear to clear]
![Page 27: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/27.jpg)
mirmode mode estimation
![Page 28: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/28.jpg)
mirmode mode estimation
Schubert, Piano Trio No.2 in E-flat major, Andante con moto
[Figure: Mode, son28.wav; x-axis: temporal location of events (in s.), y-axis: coefficient value (minor, major)]
Mozart, Piano Concerto No.23 in A major, Adagio
[Figure: Mode, son22.wav; x-axis: temporal location of events (in s.), y-axis: coefficient value (minor, major)]
![Page 29: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/29.jpg)
mirkeysom self-organizing map projection of chromagram
Toiviainen & Krumhansl, “Measuring and modeling real-time responses to music: The dynamics of tonality induction”, Perception 32-6, pp. 741–766, 2003.
![Page 30: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/30.jpg)
mirsimatrix similarity matrix
• Any musical feature as input x:
• Spectrum
• Timbre (MFCC, ...)
• Tonality (Chroma, key strength, ...)
• Rhythm, etc.
• Frames are compared using a distance measure (such as cosine distance)
[Figure: Similarity matrix; both axes: temporal location of frame centers (in s.); annotated segments A, B, A1, A2, A2a, B1, B2, B2a]
![Page 31: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/31.jpg)
mirsimatrix dissimilarity matrix
mirspectrum(a, 'Frame')
mirsimatrix(a, 'Dissimilarity')
mirsimatrix(..., 'Distance', 'cosine')
[Figure: Dissimilarity matrix; both axes: temporal location of frame centers (in s.)]
2. SIMILARITY ANALYSIS FOR MULTIMEDIA
2.1. Constructing the similarity matrix
Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio. For this, we calculate spectral features from the short-time Fourier transform (STFT) of 0.05 second non-overlapping windows in the source audio. We have also experimented with alternative parameterizations including Mel Frequency Cepstral Coefficients (MFCCs), and subspace representations computed using principal components analysis of spectrogram data. The window size may also be varied, though in our experience, robust audio analysis requires resolution on the order of 20 Hz. Here, the parameterization is optimized for analysis, rather than for compression, transmission, or reconstruction. The sole requirement is that similar source audio samples produce similar features.
The analysis proceeds by comparing all pairwise combinations of audio windows using a quantitative similarity measure. Represent the B-dimensional spectral data computed for N windows of a digital audio file by the vectors $\{v_i : i = 1, \dots, N\} \subset \mathbb{R}^B$. Using a generic similarity measure, $d : \mathbb{R}^B \times \mathbb{R}^B \to \mathbb{R}$, we embed the resulting similarity data in a matrix, S, as illustrated in the right panel of Figure 1. The elements of S are
$$S(i, j) = d(v_i, v_j), \quad i, j = 1, \dots, N. \tag{1}$$
The time axis runs on the horizontal (left to right) and vertical (top to bottom) axes of S and along its main diagonal, where self-similarity is maximal. A common similarity measure is the cosine distance. Given vectors $v_i$ and $v_j$ representing the spectrograms for sample times i and j, respectively,
$$d_{\cos}(v_i, v_j) = \frac{\langle v_i, v_j \rangle}{|v_i|\,|v_j|}. \tag{2}$$
There are many possible choices for the similarity measure including measures based on the Euclidean vector norm or non-negative exponential measures, such as
$$d_{\exp}(v_i, v_j) = \exp\left(d_{\cos}(v_i, v_j) - 1\right). \tag{3}$$
The right panel of Figure 1 shows a similarity matrix computed from the song “Wild Honey” by U2 using the cosine distance measure (see (2)) and low frequency spectrograms. Pixels are colored brighter with increasing similarity, so that segments of similar audio samples appear as bright squares along the main diagonal. Brighter rectangular regions off the main diagonal indicate similarity between segments.
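Equations (1) to (3) translate almost directly into code. The following plain-Python sketch builds S from a list of feature vectors using the cosine measure of (2) or the exponential measure of (3); the function names and the toy vectors are illustrative assumptions, and real feature vectors would come from an STFT or another parameterization.

```python
import math

def cosine_similarity(u, v):
    """Equation (2): inner product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def exponential_similarity(u, v):
    """Equation (3): exp(d_cos - 1), which is always positive and at most 1."""
    return math.exp(cosine_similarity(u, v) - 1.0)

def similarity_matrix(frames, measure=cosine_similarity):
    """Equation (1): S[i][j] = d(v_i, v_j) for every pair of frames."""
    return [[measure(u, v) for v in frames] for u in frames]

# Toy features: frames 0 and 1 point the same way, frame 2 is orthogonal.
frames = [[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]]
S = similarity_matrix(frames)
S_exp = similarity_matrix(frames, exponential_similarity)
```

Frames 0 and 1 differ only in magnitude, so their cosine similarity is maximal, which shows that this measure ignores overall level and compares spectral shape only.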
2.2. Audio Segmentation
Segmentation is a crucial first step in many audio analysis approaches. The structure of S suggests a straightforward approach to segmentation [6]. Consider a simple “song” having two successive notes of different pitch, for example a cuckoo call. When visualized, S for this example will exhibit a 2 × 2 checkerboard pattern. White squares on the diagonal correspond to the notes, which have high self-similarity; black squares on the off-diagonals correspond to regions of low cross-similarity. The instant when the notes change corresponds to the center of the checkerboard. More generally, the boundary between two coherent audio segments will also produce a checkerboard pattern. The two segments will exhibit high within-segment (self) similarity, producing adjacent square regions of high similarity along the main diagonal of S. The two segments will also produce rectangular regions of low between-segment (cross) similarity off the main diagonal. The boundary is the crux of this checkerboard pattern.
To identify these patterns in S, we take a matched-filter approach. To detect segment boundaries in the audio, a Gaussian-tapered “checkerboard” kernel is correlated along the main diagonal of the similarity matrix. Peaks in the correlation indicate locally novel audio, thus we refer to the correlation as a novelty score. An example kernel appears in the left panel of Figure 2. The right panel shows the novelty score computed from the similarity matrix in Figure 1. Large peaks are detected in the resulting time-indexed correlation and labelled as segment boundaries.
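The matched-filter step can be sketched in plain Python as follows. This is a minimal illustration of a Gaussian-tapered checkerboard kernel slid along the main diagonal; the kernel size, the choice of sigma as half of the half-width, and the function names are assumptions made for this example rather than values taken from the paper.

```python
import math

def checkerboard_kernel(half_width, sigma=None):
    """Gaussian-tapered checkerboard of size 2*half_width by 2*half_width."""
    if sigma is None:
        sigma = half_width / 2.0  # assumed taper width
    kernel = []
    for r in range(2 * half_width):
        row = []
        for c in range(2 * half_width):
            # Offsets from the kernel centre, which lies between samples.
            y = r - half_width + 0.5
            x = c - half_width + 0.5
            # +1 in the two within-segment quadrants, -1 in the cross quadrants.
            sign = 1.0 if (x < 0) == (y < 0) else -1.0
            taper = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            row.append(sign * taper)
        kernel.append(row)
    return kernel

def novelty_score(S, half_width):
    """Correlate the kernel along the diagonal; scores[t] rates a boundary before frame t."""
    kernel = checkerboard_kernel(half_width)
    n = len(S)
    scores = [0.0] * n
    for t in range(half_width, n - half_width + 1):
        total = 0.0
        for r in range(2 * half_width):
            for c in range(2 * half_width):
                total += kernel[r][c] * S[t - half_width + r][t - half_width + c]
        scores[t] = total
    return scores

# Toy similarity matrix: frames 0-3 form one segment, frames 4-7 another.
segment = [0, 0, 0, 0, 1, 1, 1, 1]
S = [[1.0 if a == b else 0.0 for b in segment] for a in segment]
scores = novelty_score(S, half_width=2)
boundary = scores.index(max(scores))
```

Inside a homogeneous segment the positive and negative quadrants cancel, so the score stays near zero; it peaks only where the kernel's crux sits on the change between the two segments.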
Foote, Cooper. “Media Segmentation using Self-Similarity Decomposition”, SPIE Storage and Retrieval for Multimedia Databases, 5021, 167-75.
![Page 32: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/32.jpg)
mirsimatrix similarity matrix
mirsimatrix(a, 'Similarity', 'exponential')
[Figure: Similarity matrix and Dissimilarity matrix; both axes: temporal location of frame centers (in s.)]
2. SIMILARITY ANALYSIS FOR MULTIMEDIA2.1. Constructing the similarity matrixSelf-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams.The first step is to parameterize the source audio. For this, we calculate spectral features from the shorttime Fourier transform (STFT) of 0.05 second non-overlapping windows in the source audio. We have alsoexperimented with alternative parameterizations including Mel Frequency Cepstral Coefficients (MFCCs), andsubspace representations computed using principal components analysis of spectrogram data. The window sizemay also be varied, though in our experience, robust audio analysis requires resolution on the order of 20 Hz.Here, the parameterization is optimized for analysis, rather than for compression, transmission, or reconstruction.The sole requirement is that similar source audio samples produce similar features.
The analysis proceeds by comparing all pairwise combinations of audio windows using a quantitative similaritymeasure. Represent the B-dimensional spectral data computed for N windows of a digital audio file by the vectors{vi : i = 1, · · · , N} ⊂ IRB. Using a generic similarity measure, d : IRB × IRB #→ IR, we embed the resultingsimilarity data in a matrix, S as illustrated in the right panel of Figure 1. The elements of S are
S(i, j) = d(vi, vj) i, j = 1, · · · , N . (1)
The time axis runs on the horizontal (left to right) and vertical (top to bottom) axes of S and along its maindiagonal, where self-similarity is maximal. A common similarity measure is the cosine distance. Given vectorsvi and vj representing the spectrograms for sample times i and j, respectively,
dcos(vi, vj) =< vi, vj >
|vi||vj |. (2)
There are many possible choices for the similarity measure including measures based on the Euclidean vectornorm or non-negative exponential measures, such as
dexp(vi, vj) = exp (dcos(vi, vj) −1) . (3)
The right panel of Figure 1 shows a similarity matrix computed from the song “Wild Honey”by U2 using thecosine distance measure (see (2)) and low frequency spectrograms. Pixels are colored brighter with increasingsimilarity, so that segments of similar audio samples appear as bright squares along the main diagonal. Brighterrectangular regions off the main diagonal indicate similarity between segments.
2.2. Audio SegmentationSegmentation is a crucial first step in many audio analysis approaches. The structure of S suggests a straight-forward approach to segmentation.6 Consider a simple “song”having two successive notes of different pitch,for example a cuckoo call. When visualized, S for this example will exhibit a 2 × 2 checkerboard pattern.White squares on the diagonal correspond to the notes, which have high self-similarity; black squares on theoff-diagonals correspond to regions of low cross-similarity. The instant when the notes change corresponds tothe center of the checkerboard. More generally, the boundary between two coherent audio segments will alsoproduce a checkerboard pattern. The two segments will exhibit high within-segment (self) similarity, producingadjacent square regions of high similarity along the main diagonal of S. The two segments will also producerectangular regions of low between-segment (cross) similarity off the main diagonal. The boundary is the cruxof this checkerboard pattern.
To identify these patterns in S, we take a matched-filter approach. To detect segment boundaries in theaudio, a Gaussian-tapered “checkerboard”kernel is correlated along the main diagonal of the similarity matrix.Peaks in the correlation indicate locally novel audio, thus we refer to the correlation as a novelty score. Anexample kernel appears in the left panel of Figure 2. The right panel shows the novelty score computed from thesimilarity matrix in Figure 1. Large peaks are detected in the resulting time-indexed correlation and labelled assegment boundaries.
2. SIMILARITY ANALYSIS FOR MULTIMEDIA2.1. Constructing the similarity matrixSelf-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams.The first step is to parameterize the source audio. For this, we calculate spectral features from the shorttime Fourier transform (STFT) of 0.05 second non-overlapping windows in the source audio. We have alsoexperimented with alternative parameterizations including Mel Frequency Cepstral Coefficients (MFCCs), andsubspace representations computed using principal components analysis of spectrogram data. The window sizemay also be varied, though in our experience, robust audio analysis requires resolution on the order of 20 Hz.Here, the parameterization is optimized for analysis, rather than for compression, transmission, or reconstruction.The sole requirement is that similar source audio samples produce similar features.
The analysis proceeds by comparing all pairwise combinations of audio windows using a quantitative similaritymeasure. Represent the B-dimensional spectral data computed for N windows of a digital audio file by the vectors{vi : i = 1, · · · , N} ⊂ IRB. Using a generic similarity measure, d : IRB × IRB #→ IR, we embed the resultingsimilarity data in a matrix, S as illustrated in the right panel of Figure 1. The elements of S are
S(i, j) = d(vi, vj) i, j = 1, · · · , N . (1)
The time axis runs on the horizontal (left to right) and vertical (top to bottom) axes of S and along its maindiagonal, where self-similarity is maximal. A common similarity measure is the cosine distance. Given vectorsvi and vj representing the spectrograms for sample times i and j, respectively,
dcos(vi, vj) =< vi, vj >
|vi||vj |. (2)
There are many possible choices for the similarity measure including measures based on the Euclidean vectornorm or non-negative exponential measures, such as
dexp(vi, vj) = exp (dcos(vi, vj) −1) . (3)
The right panel of Figure 1 shows a similarity matrix computed from the song “Wild Honey”by U2 using thecosine distance measure (see (2)) and low frequency spectrograms. Pixels are colored brighter with increasingsimilarity, so that segments of similar audio samples appear as bright squares along the main diagonal. Brighterrectangular regions off the main diagonal indicate similarity between segments.
2.2. Audio SegmentationSegmentation is a crucial first step in many audio analysis approaches. The structure of S suggests a straight-forward approach to segmentation.6 Consider a simple “song”having two successive notes of different pitch,for example a cuckoo call. When visualized, S for this example will exhibit a 2 × 2 checkerboard pattern.White squares on the diagonal correspond to the notes, which have high self-similarity; black squares on theoff-diagonals correspond to regions of low cross-similarity. The instant when the notes change corresponds tothe center of the checkerboard. More generally, the boundary between two coherent audio segments will alsoproduce a checkerboard pattern. The two segments will exhibit high within-segment (self) similarity, producingadjacent square regions of high similarity along the main diagonal of S. The two segments will also producerectangular regions of low between-segment (cross) similarity off the main diagonal. The boundary is the cruxof this checkerboard pattern.
To identify these patterns in S, we take a matched-filter approach. To detect segment boundaries in theaudio, a Gaussian-tapered “checkerboard”kernel is correlated along the main diagonal of the similarity matrix.Peaks in the correlation indicate locally novel audio, thus we refer to the correlation as a novelty score. Anexample kernel appears in the left panel of Figure 2. The right panel shows the novelty score computed from thesimilarity matrix in Figure 1. Large peaks are detected in the resulting time-indexed correlation and labelled assegment boundaries.
2. SIMILARITY ANALYSIS FOR MULTIMEDIA
2.1. Constructing the similarity matrix
Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio. For this, we calculate spectral features from the short-time Fourier transform (STFT) of 0.05 second non-overlapping windows in the source audio. We have also experimented with alternative parameterizations including Mel Frequency Cepstral Coefficients (MFCCs), and subspace representations computed using principal components analysis of spectrogram data. The window size may also be varied, though in our experience, robust audio analysis requires resolution on the order of 20 Hz. Here, the parameterization is optimized for analysis, rather than for compression, transmission, or reconstruction. The sole requirement is that similar source audio samples produce similar features.
The analysis proceeds by comparing all pairwise combinations of audio windows using a quantitative similarity measure. Represent the B-dimensional spectral data computed for N windows of a digital audio file by the vectors $\{v_i : i = 1, \dots, N\} \subset \mathbb{R}^B$. Using a generic similarity measure, $d : \mathbb{R}^B \times \mathbb{R}^B \mapsto \mathbb{R}$, we embed the resulting similarity data in a matrix, S, as illustrated in the right panel of Figure 1. The elements of S are
$$S(i, j) = d(v_i, v_j), \qquad i, j = 1, \dots, N. \tag{1}$$
The time axis runs on the horizontal (left to right) and vertical (top to bottom) axes of S and along its main diagonal, where self-similarity is maximal. A common similarity measure is the cosine distance. Given vectors $v_i$ and $v_j$ representing the spectrograms for sample times i and j, respectively,
$$d_{\cos}(v_i, v_j) = \frac{\langle v_i, v_j \rangle}{|v_i|\,|v_j|}. \tag{2}$$
There are many possible choices for the similarity measure including measures based on the Euclidean vector norm or non-negative exponential measures, such as
$$d_{\exp}(v_i, v_j) = \exp\left(d_{\cos}(v_i, v_j) - 1\right). \tag{3}$$
The right panel of Figure 1 shows a similarity matrix computed from the song “Wild Honey” by U2 using the cosine distance measure (see (2)) and low frequency spectrograms. Pixels are colored brighter with increasing similarity, so that segments of similar audio samples appear as bright squares along the main diagonal. Brighter rectangular regions off the main diagonal indicate similarity between segments.
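As a rough illustration of equations (1) to (3), the following base-MATLAB sketch builds a similarity matrix from a small synthetic feature matrix. The matrix `V` and its values are assumptions made for the example (real input would be STFT magnitudes or MFCCs, one column per window); it is not the toolbox's own implementation.

```matlab
% Sketch: cosine similarity matrix S(i,j) = <v_i, v_j> / (|v_i| |v_j|), eq. (2).
% V is a hypothetical B-by-N feature matrix (one column per analysis window),
% filled here with synthetic data instead of real spectral features.
V = [repmat([1; 0.5; 0; 0], 1, 3), ...   % three windows of a first "note"
     repmat([0; 0; 1; 0.5], 1, 3)];      % three windows of a second "note"

norms = sqrt(sum(V .^ 2, 1));            % 1-by-N Euclidean norms |v_i|
norms(norms == 0) = eps;                 % guard against silent (all-zero) windows
S_cos = (V' * V) ./ (norms' * norms);    % N-by-N matrix of eq. (2)
S_exp = exp(S_cos - 1);                  % exponential variant, eq. (3)

disp(S_cos);                             % two 3-by-3 blocks of ones on the diagonal
```

With these two orthogonal "notes", within-segment entries equal 1 and cross-segment entries equal 0, which is the 2 × 2 checkerboard structure that the segmentation step relies on.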
2.2. Audio Segmentation
Segmentation is a crucial first step in many audio analysis approaches. The structure of S suggests a straightforward approach to segmentation [6]. Consider a simple “song” having two successive notes of different pitch, for example a cuckoo call. When visualized, S for this example will exhibit a 2 × 2 checkerboard pattern. White squares on the diagonal correspond to the notes, which have high self-similarity; black squares on the off-diagonals correspond to regions of low cross-similarity. The instant when the notes change corresponds to the center of the checkerboard. More generally, the boundary between two coherent audio segments will also produce a checkerboard pattern. The two segments will exhibit high within-segment (self) similarity, producing adjacent square regions of high similarity along the main diagonal of S. The two segments will also produce rectangular regions of low between-segment (cross) similarity off the main diagonal. The boundary is the crux of this checkerboard pattern.
To identify these patterns in S, we take a matched-filter approach. To detect segment boundaries in the audio, a Gaussian-tapered “checkerboard” kernel is correlated along the main diagonal of the similarity matrix. Peaks in the correlation indicate locally novel audio, thus we refer to the correlation as a novelty score. An example kernel appears in the left panel of Figure 2. The right panel shows the novelty score computed from the similarity matrix in Figure 1. Large peaks are detected in the resulting time-indexed correlation and labelled as segment boundaries.
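The matched-filter idea can be sketched in base MATLAB as follows. The synthetic matrix `S`, the kernel half-width `L` and the taper width `sigma` are assumptions chosen for the example, and the zero-padding at the edges is one possible convention among several; the text does not specify these details.

```matlab
% Sketch: novelty score from a Gaussian-tapered checkerboard kernel.
% S is a synthetic 40-by-40 similarity matrix with a single boundary
% between frames 20 and 21; L and sigma are assumed values.
N = 40;
S = zeros(N);
S(1:20, 1:20) = 1;                       % first segment: high self-similarity
S(21:N, 21:N) = 1;                       % second segment: high self-similarity

L = 8;                                   % kernel half-width in frames (assumed)
sigma = L / 2;                           % width of the Gaussian taper (assumed)
x = (-L + 0.5) : (L - 0.5);              % 2L offsets centered on a boundary
[X, Y] = meshgrid(x, x);
K = sign(X) .* sign(Y) .* exp(-(X .^ 2 + Y .^ 2) / (2 * sigma ^ 2));

Spad = zeros(N + 2 * L);                 % zero-pad so the kernel fits at the edges
Spad(L + 1 : L + N, L + 1 : L + N) = S;

novelty = zeros(1, N + 1);               % novelty(k): boundary between frames k-1 and k
for k = 1 : N + 1
    idx = k : k + 2 * L - 1;             % 2L-by-2L window centered on that boundary
    novelty(k) = sum(sum(K .* Spad(idx, idx)));
end
[~, kmax] = max(novelty);
fprintf('Largest novelty at the boundary before frame %d\n', kmax);
```

Inside a homogeneous segment the positive and negative quadrants of the kernel cancel, so the score stays near zero; it peaks where the window straddles the segment boundary. The zero-padding produces smaller spurious values at the very start and end of the signal.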
![Page 33: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/33.jpg)
mirsimatrix similarity matrix
• Observe the structure of this excerpt along different musical dimensions:
• ‘george.wav’
• For instance:
• s = mirspectrum(…, ‘Frame’, …)
• mirsimatrix(s)
![Page 34: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/34.jpg)
mirnovelty novelty, kernel method
• Convolution with checkerboard kernel along the diagonal of mirsimatrix
• Peaks indicate transitions between phases
[Figure: Similarity matrix; both axes: temporal location of frame centers (in s.)]
[Figure: Novelty; x-axis: temporal location of events (in s.), y-axis: coefficient value]
![Page 35: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/35.jpg)
mirnovelty novelty, kernel method
• Observe the structure of this excerpt using different kernel sizes:
• ‘george.wav’
• mirnovelty(…, ‘Kernel’, N)
• where N is the kernel size
![Page 36: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/36.jpg)
mirnovelty novelty, kernel method
Kernel size:
• 64 frames
• 16 frames
[Figure: Similarity matrix; both axes: temporal location of frame centers (in s.); annotated segments: A, B, A1, A2, A2a, B1, B2, B2a]
- end of intro & start of A1
- start of A2a
- start of B1
- start of B2a
- transition A1a/A1b
![Page 37: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/37.jpg)
mirsegment segmentation
• nv = mirnovelty(sm)
• p = mirpeaks(nv)
• sg = mirsegment(‘mysong’, p)
• mirplay(sg)
• s = mirmfcc(sg)
• sm = mirsimatrix(s)
[Figure panels: nv, Novelty (x-axis: temporal location of events (in s.), y-axis: coefficient value); sg, Audio waveform (x-axis: time (s), y-axis: amplitude); s, MFCC (x-axis: time axis (in s.), y-axis: coefficient ranks); sm]
![Page 38: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/38.jpg)
Statistics
• mirstat
• mean
• standard deviation
• slope
• periodicity
• mirhisto
• distribution histograms
• moments
• mircentroid
• mirspread
• mirskewness
• mirkurtosis
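To make the four moments concrete, here is a minimal base-MATLAB sketch using the standard statistical definitions. The bin positions `x` and weights `p` are made-up values, and the toolbox operators listed above may use different normalisation conventions, so this is an illustration of the concepts, not of the toolbox code.

```matlab
% Sketch: centroid, spread, skewness and kurtosis of a distribution,
% using the standard moment definitions (an assumption; the operators
% named on the slide may normalise differently).
x = [100 200 300 400 500];      % hypothetical bin positions (e.g. frequencies in Hz)
p = [1 2 4 2 1];                % hypothetical non-negative weights (e.g. magnitudes)
p = p / sum(p);                 % normalise the weights into a distribution

centroid = sum(x .* p);                                   % first moment (mean)
spread   = sqrt(sum((x - centroid) .^ 2 .* p));           % standard deviation
skew     = sum((x - centroid) .^ 3 .* p) / spread ^ 3;    % third standardised moment
kurt     = sum((x - centroid) .^ 4 .* p) / spread ^ 4;    % fourth standardised moment

fprintf('centroid %.1f, spread %.1f, skewness %.2f, kurtosis %.2f\n', ...
        centroid, spread, skew, kurt);
```

Because the example distribution is symmetric around 300, its skewness is zero; an asymmetric set of weights would move it away from zero.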
![Page 39: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/39.jpg)
mirfeatures batch of features
• mirzerocross
• mircentroid
• mirbrightness
• mirspread
• mirskewness
• mirkurtosis
• mirrolloff
• mirentropy
• mirflatness
• mirroughness
• mirregularity
• mirinharmonicity
• mirmfcc
• mirfluctuation
• mirattacktime
• mirattackslope
• mirlowenergy
• mirflux
• mirpitch
• mirchromagram
• mirkeystrength
• mirkey
• mirmode
• mirhcdf
• mirtempo
• mirpulseclarity
mirfeatures(‘Folder’, ‘Stat’)
![Page 40: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/40.jpg)
mirexport exportation of statistical data to files
• mirexport(filename, ...) adding one or several data from MIRtoolbox operators.
• mirexport(‘result.txt’, ...) saved in a text file.
• mirexport(‘result.arff’, ...) exported to WEKA for data-mining.
• mirexport(‘Workspace’, ...) saved in a Matlab variable.
![Page 41: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/41.jpg)
Stimulus set
[Figure: stimulus set plotted along the axes valence, activity and tension; emotion categories: happy, sad, tender, fear, anger]
Eerola & Vuoskoski. A comparison of the discrete and dimensional models of emotion in music. Psychology of Music.
![Page 42: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/42.jpg)
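Stimulus set

| | 3D: R² | 3D: β (v) | 3D: β (a) | 3D: β (t) | 2D: R² | 2D: β (v) | 2D: β (a) |
|---|---|---|---|---|---|---|---|
| happy | .89 | .93 | .79 | -.35 | .89 | .85 | .49 |
| sad | .63 | -.20 | -.84 | -.22 | .63 | -.05 | -.69 |
| tender | .77 | .33 | -.45 | -.58 | .77 | .50 | -.51 |
| fear | .87 | -.83 | .07 | .63 | .87 | -.90 | .24 |
| anger | .64 | -.52 | .32 | .35 | .64 | -.55 | .35 |
| mean | .76 | | | | .76 | | |

[Figure: axes valence, activity, tension]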
Eerola & Vuoskoski. A comparison of the discrete and dimensional models of emotion in music. Psychology of Music.
![Page 43: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/43.jpg)
T. Eerola, O. Lartillot, P. Toiviainen, "Prediction of Multidimensional Emotional Ratings in Music From Audio Using Multivariate Regression Models", ISMIR, Kobe, 2009.
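miremotion

| R² | h | s | t | f | a |
|---|---|---|---|---|---|
| MLR | .55 | .56 | .42 | .51 | .54 |
| PCA | .34 | .40 | .23 | .13 | .27 |
| PLS | .59 | .64 | .52 | .60 | .57 |
| MLR | .62 | .58 | .44 | .60 | .63 |
| PCA | .38 | .49 | .47 | .45 | .62 |
| PLS | .65 | .61 | .55 | .64 | .67 |

happy

| Feature | β |
|---|---|
| fluctuation peaks | -.40 |
| sp. centroid | .13 |
| sp. spread | -.19 |
| chrom. peaks | -.05 |
| majorness | .03 |

sad

| Feature | β |
|---|---|
| roughness | .12 |
| register | -.08 |
| register var | .09 |
| majorness | .02 |
| harm. change | -.03 |

tender

| Feature | β |
|---|---|
| RMS var | -.42 |
| sp. centroid | .14 |
| key clarity | .11 |
| harm. change | -.10 |
| tonal novelty | -.01 |

fear

| Feature | β |
|---|---|
| RMS var | -.79 |
| fluctuation peaks | -.21 |
| key clarity | -.09 |
| harm. change | .08 |
| tonal novelty | -.02 |

anger

| Feature | β |
|---|---|
| RMS var | .44 |
| pulse clarity | -.13 |
| sp. centroid | -.13 |
| key clarity | -.04 |
| tonal novelty | .02 |

Box-Cox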
![Page 44: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/44.jpg)
Part 2
• Rhythm, metrical structure
• Tonal analysis
• Segmentation, structure
• Statistics
• Music & emotion
• Advanced use
![Page 45: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/45.jpg)
mirgetdata return data in Matlab format
s = mirspectrum(‘file’);
• s: encapsulated data (numerical data, related sampling rates, related file name, etc.)
• mirgetdata(s): vector
![Page 46: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/46.jpg)
mirgetdata return data in Matlab format
s = mirspectrum(‘file’, ‘Frame’);
• s: encapsulated data (grouped numerical data, related sampling rates, related file name, etc.)
• mirgetdata(s): matrix
![Page 47: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/47.jpg)
mirgetdata return data in Matlab format
f = mirfilterbank(‘file’, ‘Frame’);
• f: encapsulated data (grouped numerical data, related sampling rates, related file name, etc.)
• mirgetdata(f): 3D-matrix
![Page 48: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/48.jpg)
mirgetdata return data in Matlab format
sg = mirsegment(‘file’)
s = mirspectrum(sg, ‘Frame’);
• s: encapsulated data (grouped numerical data, related sampling rates, related file name, etc.)
• mirgetdata(s): cell array (3D-matrix, 3D-matrix, ...)
![Page 49: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/49.jpg)
mirgetdata return data in Matlab format
sg = mirsegment(‘Folder’)
s = mirspectrum(sg, ‘Frame’);
• s: encapsulated data (grouped numerical data, related sampling rates, related file name, etc.)
• mirgetdata(s): cell array containing one cell array per file (file1, file2, ...), each holding one matrix per segment (segm1, segm2, ...)
![Page 50: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/50.jpg)
mirgetdata return data in Matlab format
p = mirpeaks(a)
• p: encapsulated data (grouped numerical data, related sampling rates, related file name, etc.)
• peak positions: cell array of cell arrays
• peak values: cell array of cell arrays
• mirgetdata(p): matrix
![Page 51: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/51.jpg)
get returns fields of encapsulated data
• get(a, ‘xName’)
• get(a, ‘xData’)
• get(a, ‘yName’)
• get(a, ‘yData’)
• get(a, ‘yUnit’)
• get(a, ‘FramePos’)
• get(a, ‘Sampling’)
• get(a, ‘NBits’)
• get(a, ‘Title’)
• get(a, ‘FileName’)
• get(a, ‘Label’)
• get(a, ‘Channels’)
• get(a, ‘xPeakSample’)
• get(a, ‘xPeakUnit’)
• get(a, ‘xPeakInterpol’)
• get(a, ‘yPeak’)
• get(a, ‘yPeakInterpol’)
Encapsulated data: a = miraudio(‘ragtime’);
![Page 52: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/52.jpg)
get returns fields of encapsulated data
• get(s, ‘Frequency’) = get(s, ‘xData’)
• get(s, ‘Magnitude’) = get(s, ‘yData’)
• get(s, ‘Phase’)
• get(s, ‘xScale’) (= ‘Freq’, ‘Mel’, ‘Bark’)
• get(s, ‘Power’)
• get(s, ‘dB’)
• etc.
Encapsulated data: s = mirspectrum(‘ragtime’);
![Page 53: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/53.jpg)
memory management
mirenvelope(‘hugefile’);
[Diagram: hugefile is automatically split into chunks, which are passed to mirenvelope]
![Page 54: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/54.jpg)
mirchunklim chunk size limitation
mirenvelope(‘hugefile’);
If memory overflow problems occur, decrease mirchunklim:
• mirchunklim: by default, 500 000 samples
• mirchunklim(50000): set to 50 000 samples
[Diagram: hugefile, chunk, mirenvelope]
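The general idea behind a chunk size limit can be sketched in base MATLAB: a long signal is processed piece by piece, and only a partial result is carried from one chunk to the next. The synthetic signal, the `chunkSize` value and the RMS computation are illustrative assumptions; how the toolbox splits and recombines chunks internally is not described on the slide.

```matlab
% Sketch: chunk-wise processing of a long signal. The signal, chunkSize
% and the RMS computation are illustrative assumptions, not toolbox internals.
x = sin(2 * pi * (0:99999) / 100);   % synthetic "huge" signal, 100 000 samples
chunkSize = 30000;                   % hypothetical limit, in samples

sumSquares = 0;
for first = 1 : chunkSize : numel(x)
    last = min(first + chunkSize - 1, numel(x));   % the last chunk may be shorter
    chunk = x(first:last);                         % only this part is processed now
    sumSquares = sumSquares + sum(chunk .^ 2);     % accumulate the partial result
end
rmsChunked = sqrt(sumSquares / numel(x));
rmsDirect  = sqrt(mean(x .^ 2));                   % reference: all samples at once
fprintf('chunked RMS %.6f, direct RMS %.6f\n', rmsChunked, rmsDirect);
```

A smaller chunk size lowers the peak memory needed per step at the cost of more iterations, which is the trade-off behind decreasing the limit when memory runs out.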
![Page 55: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/55.jpg)
avoid useless call to miraudio
Instead of:
• a = miraudio(‘hugefile’);
• c = mirmfcc(a)
write:
• c = mirmfcc(‘hugefile’)
![Page 56: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/56.jpg)
avoid useless call to miraudio
Instead of:
• a = miraudio(‘Folder’);
• c = mirmfcc(a)
write:
• c = mirmfcc(‘Folder’)
[Diagram: file1, file2, file3, file4]
![Page 57: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/57.jpg)
avoid useless call to mirframe
Instead of:
• f = mirframe(‘hugefile’);
• c = mirmfcc(f)
write:
• c = mirmfcc(‘hugefile’, ‘Frame’)
![Page 58: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/58.jpg)
what if miraudio (or mirframe) is really necessary?
• a = miraudio(‘hugefile’, ‘Sampling’, 11025);
• c = mirmfcc(a)
[Diagram: hugefile, chunk, miraudio, a, c]
![Page 59: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/59.jpg)
mireval flowchart design and evaluation
• a = miraudio(‘Design’, ‘Sampling’, 11025);
• c = mirmfcc(a);
• mireval(c, ‘hugefile’)
[Diagram: each chunk of hugefile passes through miraudio (a), then mirmfcc (c)]
![Page 60: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/60.jpg)
mireval flowchart design and evaluation
• s = mirspectrum(‘Design’, ‘Frame’);
• c = mircentroid(s);
• mireval(c, ‘hugefile’)
[Diagram: each chunk of hugefile passes through mirspectrum (s), then mircentroid (c)]
![Page 61: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/61.jpg)
mireval flowchart design and evaluation
• s = mirspectrum(‘Design’, ‘Frame’);
• c = mircentroid(s);
• mireval(c, ‘Folder’)
[Diagram: file1, file2, file3 and file4 each pass through mirspectrum (s), then mircentroid (c)]
![Page 62: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/62.jpg)
mireval flowchart evaluation?
• s = mirspectrum(‘Design’, ‘Frame’);
• cent = mircentroid(s);
• cent = mireval(cent, ‘Folder’);
• ceps = mircepstrum(s);
• ceps = mireval(ceps, ‘Folder’);
[Diagram: s feeds both cent and ceps]
s is evaluated twice!
![Page 63: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/63.jpg)
mirstruct complex flowchart
• myflow = mirstruct;
• myflow.tmp.s = mirspectrum(‘Design’, ‘Frame’);
• myflow.cent = mircentroid(myflow.tmp.s);
• myflow.ceps = mircepstrum(myflow.tmp.s);
• res = mireval(myflow, ‘Folder’);
[Diagram: myflow.tmp.s feeds myflow.cent and myflow.ceps; the results are res.cent and res.ceps]
![Page 64: MIRtoolbox · Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio](https://reader034.vdocuments.us/reader034/viewer/2022042112/5e8e30a7d1672a351c55f19e/html5/thumbnails/64.jpg)
complex flowchart
• f = mirstruct;
• f.dynamics.rms = mirrms(‘Design’, ‘Frame’)
• f.tmp.onsets = mironsets(‘Design’);
• f.rhythm.tempo = mirtempo(f.tmp.onsets, ‘Frame’);
• f.tmp.attacks = mironsets(f.tmp.onsets, ‘Attacks’);
• f.rhythm.attack.time = mirattacktime(f.tmp.attacks);
• f.rhythm.attack.slope = mirattackslope(f.tmp.attacks);
[Diagram: f contains dynamics (rms), rhythm (tempo; attack: time, slope), tmp.onsets and tmp.attacks]