MIRtoolbox: Sound and music analysis of audio recordings using Matlab

MUS4831, Olivier Lartillot

Part II, 9.11.2017



Page 2

Part 2

• Rhythm, metrical structure

• Tonal analysis

• Segmentation, structure

• Statistics

• Music & emotion

• Advanced use

Page 3

mirevents: event detection function

[Figure: Onset curve (Envelope); amplitude vs. time (s)]

• mirevents('ragtime.wav')

  • mirpeaks(mirsum(mirspectrum(…, 'Frame')))

• mirevents('ragtime.wav', 'Filter')

  • mirpeaks(mirsum(mirenvelope(mirfilterbank(…, 'NbChannels', 40))))

• mirevents('ragtime.wav', 'SpectralFlux')

  • mirpeaks(mirflux(…, 'Inc', 'Halfwave'))

• mirevents(…, 'Contrast', …)

Page 4

mirevents: event detection function

C.P.E. Bach, Concerto for cello in A major, WQ 172, 3rd mvt

• 'Envelope', 'Filter': changes in dynamics

• 'SpectralFlux': global spectral changes

• 'Emerge': local changes in particular frequency regions

[Figures: three plots, each titled "Onset curve (Envelope)"; amplitude vs. time (s)]

Page 5

mirevents: event detection function

J.S. Bach, Orchestral suite No.3 in D major, BWV 1068, Aria

• 'Envelope', 'Filter': changes in dynamics

• 'SpectralFlux': global spectral changes

• 'Emerge': local changes in particular frequency regions

[Figures: three plots titled "Onset curve (Envelope)" and one plot titled "Envelope"; amplitude vs. time (s)]

Page 6

mirevents(..., 'Attack')

[Figure: Onset curve (Envelope); amplitude vs. time (s)]

Page 7

mirattackslope: average slope of note attacks

• o = mirevents('ragtime.wav', 'Attacks')

• mirattackslope(o)

[Figure: Envelope (centered); amplitude vs. time (s)]

Page 8

mirattackleap: amplitude of note attacks

• o = mirevents('ragtime.wav', 'Attacks')

• mirattackleap(o)

[Figure: Envelope (centered); amplitude vs. time (s)]

Page 9

Tempo estimation?

• ‘george.wav’

• What tempo in BPM? You can tap on the beat while listening:

• http://www.all8.com/tools/bpm.htm

• How to estimate tempo using the MIRtoolbox operators presented last week?

Page 10

mirtempo: tempo (in beats per minute)

[Figure: four panels labelled o, do, ac and pa. Panel titles: Onset curve (Envelope) and Onset curve (Differentiated envelope), amplitude vs. time (s); Envelope autocorrelation (shown twice), coefficients vs. lag (s)]

Roughly:

• o = mirevents('file.wav', 'Filter')

• do = mirevents(o, 'Diff')

• ac = mirautocor(do, 'Resonance')

• pa = mirpeaks(ac, 'Total', 1)

In short:

• [t, pa] = mirtempo('file.wav')
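The chain above (onset curve, differentiation, autocorrelation, peak picking) can be sketched outside MIRtoolbox. The plain-Python sketch below is an illustration under stated assumptions, not MIRtoolbox's implementation: the function names, the 40 to 200 BPM search range and the toy onset curve are hypothetical, and the 'Resonance' weighting is left out.

```python
def differentiate(x):
    """First difference of the onset curve (the 'do' step)."""
    return [b - a for a, b in zip(x, x[1:])]


def autocorrelation(x, max_lag):
    """Normalised autocorrelation of x for lags 0..max_lag (the 'ac' step)."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    energy = sum(v * v for v in centered)
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / energy
        for lag in range(max_lag + 1)
    ]


def estimate_tempo(onset_curve, frame_rate, min_bpm=40.0, max_bpm=200.0):
    """Pick the strongest autocorrelation lag in the BPM range (the 'pa' step)
    and convert it to beats per minute. The BPM range is an assumption."""
    min_lag = max(1, int(round(frame_rate * 60.0 / max_bpm)))
    max_lag = int(round(frame_rate * 60.0 / min_bpm))
    ac = autocorrelation(differentiate(onset_curve), max_lag)
    best_lag = max(range(min_lag, max_lag + 1), key=lambda lag: ac[lag])
    # A lag of best_lag frames lasts best_lag / frame_rate seconds.
    return 60.0 * frame_rate / best_lag


# Toy onset curve (hypothetical data): one impulse every 0.5 s,
# sampled at 100 frames per second, i.e. a pulse at 120 BPM.
frame_rate = 100
onset_curve = [1.0 if i % 50 == 0 else 0.0 for i in range(1000)]
tempo_bpm = estimate_tempo(onset_curve, frame_rate)
```

The conversion in the last line of estimate_tempo is the same one used on the resonance-curve slide: a lag of 0.5 s corresponds to 120 BPM and a lag of 1 s to 60 BPM.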

Page 11

mirtempo: resonance curves

• mirautocor(…, 'Resonance', 'Toiviainen') (Toiviainen & Snyder, 2003)

• mirautocor(…, 'Resonance', 'vanNoorden') (van Noorden & Moelants, 2001)

• Emphasis on the best perceived tempi

[Figure: resonance curve, amplitude vs. lag, with lag 0, .5 s (= 120 BPM) and 1 s (= 60 BPM) marked]

[Figures: Envelope autocorrelation, coefficients vs. lag (s), for mirautocor and for mirautocor(…, 'Resonance')]

Page 12

mirtempo: tempo estimation

• How to estimate tempo for such an audio excerpt?

• ‘czardas.wav’

Page 13

mirtempo: tempo (temporal evolution)

[Figure: panels labelled f, pa and t. Panel titles: Onset curve (Differentiated envelope), amplitude vs. time (s); Tempo, coefficient value (in bpm) vs. temporal location of events (in s.)]

• o = mirevents('mysong', 'Detect', 'No')

• do = mirevents(o, 'Diff')

• f = mirframe(do)

• ac = mirautocor(f, 'Resonance')

• pa = mirpeaks(ac, 'Total', 1)

In short:

• [t, pa] = mirtempo('mysong', 'Frame')

Page 14

mirtempo

Switch from one metrical level to another!

Try for instance: mirtempo('george.wav', 'Frame')

Page 15

mirmetre: tracking all metrical levels

Try for instance: mirmetre('george.wav')

[Figure: metrical levels, labelled 0.5, 1 (dominant), 2, 2.5, 3, 4, 5, 6, 7, 8]

• Level N's period is N times level 1's period.

Page 16

mirmetre: tracking all metrical levels

• The tempo curve is associated with one particular metrical level.

• m = mirmetre(…)

• t = mirtempo(m)

or:

• [t m] = mirtempo(…, 'Metre')

Page 17

mirmetre: tracking all metrical levels

Influence of the onset detection method

M. Bruch, Violin Concerto No.1 in G minor, op.26, Finale (Allegro energico)

[Figures: metrical analyses obtained with 'Emerge' and with 'SpectralFlux']

Page 18

Beat/rhythmic/metrical strength, clarity

• Subjective judgment:

  • How easily can I perceive the underlying pulsation in the music?

[Figure: three examples labelled A, B and C, with ratings high, medium and low; two of the examples are annotated "(tempo = 148 BPM)"]

(Lartillot et al., 2008)

Page 19

mirpulseclarity: beat strength

• Characterisation of the autocorrelation function (Lartillot et al., 2008)

• cf. also beat strength (Tzanetakis, Essl & Cook, 2002): variability of the autocorrelation curve throughout time

[Figure: Envelope autocorrelation, coefficients vs. lag (s)]

[Figure: schematic autocorrelation curve, amplitude vs. lag, with marks at lag 0, .5 s, 1 s and 240 BPM, annotated with the options 'MaxAutocor', 'MinAutocor', 'KurtosisAutocor', 'TempoAutocor' and 'EntropyAutocor']

Page 20

mirchromagram: energy distribution along pitches

• s = mirspectrum(a)

• c = mirchromagram(s, 'Wrap', 'no')

• c = mirchromagram(c, 'Wrap', 'yes')

Page 21

mirkeystrength: probability of key candidates

[Figure: the output of mirchromagram is cross-correlated with a profile for each key candidate (C major, C# major, D major, ..., C minor, C# minor, D minor, ...); the C Major profile is shown as an example]

Page 22

mirkeystrength: probability of key candidates

• Chromagram compared to typical chromagrams representing each possible key (or mode).

• Detection of the most probable key (or mode).

[Figure: key candidates C major, C# major, D major, ..., C minor, C# minor, D minor, ...]

Krumhansl, Cognitive foundations of musical pitch. Oxford UP, 1990.

Gomez, “Tonal description of polyphonic audio for music content processing,” INFORMS Journal on Computing, 18-3, pp. 294–304, 2006.
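The comparison of a chromagram with one typical profile per key can be sketched as follows. This is an illustration under assumptions, not MIRtoolbox's code: the profile values are the Krumhansl-Kessler key profiles as commonly quoted from Krumhansl (1990), whereas the profiles actually used by mirkeystrength may differ; the function names and the test chromagram are hypothetical.

```python
from math import sqrt

KEY_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Assumed profiles (Krumhansl-Kessler), given for a tonic of C.
MAJOR_PROFILE = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR_PROFILE = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53,
                 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def rotate(profile, tonic):
    """Shift a C-based profile so that its first value lands on `tonic`."""
    return [profile[(i - tonic) % 12] for i in range(12)]


def key_strengths(chroma):
    """Correlate a 12-bin chromagram with the 24 major and minor profiles."""
    strengths = {}
    for tonic, name in enumerate(KEY_NAMES):
        strengths[name + " major"] = pearson(chroma, rotate(MAJOR_PROFILE, tonic))
        strengths[name + " minor"] = pearson(chroma, rotate(MINOR_PROFILE, tonic))
    return strengths


# Hypothetical chromagram: the major profile transposed to D.
chroma = rotate(MAJOR_PROFILE, 2)
strengths = key_strengths(chroma)
best_key = max(strengths, key=strengths.get)
```

Picking the maximum of the 24 correlations plays the role of the peak picking applied to the key strength curve; the correlation value of the winning key gives a rough idea of how clear the tonality is.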

Page 23

mirkey: tonality estimation

• [k c s] = mirkey(...)

[Figure: outputs of mirkeystrength and of mirpeaks(mirkeystrength(...))]

Page 24

mirkey: tonality estimation

• [k c s] = mirkey(..., 'Frame')

[Figure, panel k: Key; tonal centre (D# to A#, maj/min) vs. temporal location of events (in s.)]

[Figure, panel c: Key clarity; coefficient value vs. temporal location of events (in s.)]

[Figure, panel s: Key strength; tonal center (CM, C#M, ..., BM, Cm, C#m, ..., Bm) vs. temporal location of beginning of frame (in s.)]

Page 25

mirkey: tonality estimation

[Figures: Key plots (tonal centre, maj/min, vs. temporal location of events (in s.)) titled "Key, Umile01" (shown twice), "Key, son5.wav", "Key, son27.wav" and "Key, son32.wav"]

Examples: monteverdi.wav, tiersen.wav, beethoven9.wav, schoenberg.wav

Page 26

• [k c s] = mirkey(..., 'Frame')

Tchaïkovski, Swan Lake, Act one

[Figure: Key clarity, son31.wav; coefficient value vs. temporal location of events (in s.); regions annotated "unclear" and "clear tonality"]

Prokofiev, Violin concerto No. in D major, Scherzo: Vivacissimo

[Figure: Key clarity, son24.wav; coefficient value vs. temporal location of events (in s.); regions annotated "unclear" and "clear"]

Page 27

mirmode: mode estimation

Page 28

mirmode: mode estimation

Schubert, Piano Trio No.2 in E-flat major, Andante con moto

[Figure: Mode, son28.wav; coefficient value (negative: minor, positive: major) vs. temporal location of events (in s.)]

Mozart, Piano Concerto No.23 in A major, Adagio

[Figure: Mode, son22.wav; coefficient value (negative: minor, positive: major) vs. temporal location of events (in s.)]

Page 29

mirkeysom: self-organizing map projection of chromagram

Toiviainen & Krumhansl, “Measuring and modeling real-time responses to music: The dynamics of tonality induction”, Perception 32-6, pp. 741–766, 2003.

Page 30

mirsimatrix: similarity matrix

• Any musical feature as input x:

  • Spectrum

  • Timbre (MFCC, ...)

  • Tonality (Chroma, key strength, ...)

  • Rhythm, etc.

• Frames are compared using a distance measure (such as cosine distance)

[Figure: Similarity matrix; both axes: temporal location of frame centers (in s.); segment labels A, B, A1, A2, A2a, B1, B2, B2a]

Page 31

mirsimatrix: dissimilarity matrix

mirspectrum(a, 'Frame')

mirsimatrix(a, 'Dissimilarity')

mirsimatrix(..., 'Distance', 'cosine')

[Figure: Dissimilarity matrix; both axes: temporal location of frame centers (in s.)]

2. SIMILARITY ANALYSIS FOR MULTIMEDIA

2.1. Constructing the similarity matrix

Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio. For this, we calculate spectral features from the short-time Fourier transform (STFT) of 0.05 second non-overlapping windows in the source audio. We have also experimented with alternative parameterizations including Mel Frequency Cepstral Coefficients (MFCCs), and subspace representations computed using principal components analysis of spectrogram data. The window size may also be varied, though in our experience, robust audio analysis requires resolution on the order of 20 Hz. Here, the parameterization is optimized for analysis, rather than for compression, transmission, or reconstruction. The sole requirement is that similar source audio samples produce similar features.

The analysis proceeds by comparing all pairwise combinations of audio windows using a quantitative similarity measure. Represent the B-dimensional spectral data computed for N windows of a digital audio file by the vectors $\{v_i : i = 1, \dots, N\} \subset \mathbb{R}^B$. Using a generic similarity measure, $d : \mathbb{R}^B \times \mathbb{R}^B \mapsto \mathbb{R}$, we embed the resulting similarity data in a matrix, S, as illustrated in the right panel of Figure 1. The elements of S are

$$S(i, j) = d(v_i, v_j), \quad i, j = 1, \dots, N. \tag{1}$$

The time axis runs on the horizontal (left to right) and vertical (top to bottom) axes of S and along its main diagonal, where self-similarity is maximal. A common similarity measure is the cosine distance. Given vectors $v_i$ and $v_j$ representing the spectrograms for sample times i and j, respectively,

$$d_{\cos}(v_i, v_j) = \frac{\langle v_i, v_j \rangle}{|v_i|\,|v_j|}. \tag{2}$$

There are many possible choices for the similarity measure including measures based on the Euclidean vector norm or non-negative exponential measures, such as

$$d_{\exp}(v_i, v_j) = \exp\left(d_{\cos}(v_i, v_j) - 1\right). \tag{3}$$

The right panel of Figure 1 shows a similarity matrix computed from the song “Wild Honey” by U2 using the cosine distance measure (see (2)) and low frequency spectrograms. Pixels are colored brighter with increasing similarity, so that segments of similar audio samples appear as bright squares along the main diagonal. Brighter rectangular regions off the main diagonal indicate similarity between segments.
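Equations (1) to (3) can be tried on a handful of feature vectors. The sketch below is plain Python written for illustration, not code from the paper or from mirsimatrix; the function names and the four two-dimensional "frames" are hypothetical.

```python
from math import exp, sqrt


def cosine_similarity(u, v):
    """Equation (2): inner product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def exponential_similarity(u, v):
    """Equation (3): exp(d_cos - 1), which is always positive."""
    return exp(cosine_similarity(u, v) - 1.0)


def similarity_matrix(frames, measure):
    """Equation (1): S(i, j) = d(v_i, v_j) for all pairs of frames."""
    return [[measure(u, v) for v in frames] for u in frames]


# Hypothetical features: two frames of one "note", then two of another.
frames = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 3.0]]
S = similarity_matrix(frames, cosine_similarity)
S_exp = similarity_matrix(frames, exponential_similarity)
```

With these frames, S contains two blocks of ones on the main diagonal and zeros elsewhere, because the measure in (2) ignores the length of the vectors and only compares their direction.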

2.2. Audio Segmentation

Segmentation is a crucial first step in many audio analysis approaches. The structure of S suggests a straightforward approach to segmentation [6]. Consider a simple “song” having two successive notes of different pitch, for example a cuckoo call. When visualized, S for this example will exhibit a 2 × 2 checkerboard pattern. White squares on the diagonal correspond to the notes, which have high self-similarity; black squares on the off-diagonals correspond to regions of low cross-similarity. The instant when the notes change corresponds to the center of the checkerboard. More generally, the boundary between two coherent audio segments will also produce a checkerboard pattern. The two segments will exhibit high within-segment (self) similarity, producing adjacent square regions of high similarity along the main diagonal of S. The two segments will also produce rectangular regions of low between-segment (cross) similarity off the main diagonal. The boundary is the crux of this checkerboard pattern.

To identify these patterns in S, we take a matched-filter approach. To detect segment boundaries in the audio, a Gaussian-tapered “checkerboard” kernel is correlated along the main diagonal of the similarity matrix. Peaks in the correlation indicate locally novel audio, thus we refer to the correlation as a novelty score. An example kernel appears in the left panel of Figure 2. The right panel shows the novelty score computed from the similarity matrix in Figure 1. Large peaks are detected in the resulting time-indexed correlation and labelled as segment boundaries.

Foote, Cooper. “Media Segmentation using Self-Similarity Decomposition”, SPIE Storage and Retrieval for Multimedia Databases, 5021, 167-75.
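The matched-filter step of section 2.2 can be sketched in a few lines. This is an illustration under assumptions rather than the authors' implementation: the kernel size, the Gaussian width (set equal to the kernel half-width), the function names and the toy matrix are all hypothetical choices.

```python
from math import exp


def checkerboard_kernel(half_width):
    """Gaussian-tapered checkerboard kernel of size 2*half_width squared.
    Quadrants on the diagonal are positive, off-diagonal ones negative.
    The taper width sigma = half_width is an assumption."""
    sigma = float(half_width)
    size = 2 * half_width
    kernel = []
    for r in range(size):
        row = []
        for c in range(size):
            # Offsets from the kernel centre, which falls between samples.
            dr = r - half_width + 0.5
            dc = c - half_width + 0.5
            sign = 1.0 if (dr < 0) == (dc < 0) else -1.0
            taper = exp(-(dr * dr + dc * dc) / (2.0 * sigma * sigma))
            row.append(sign * taper)
        kernel.append(row)
    return kernel


def novelty_score(S, half_width):
    """Correlate the kernel along the main diagonal of S.
    scores[t] rates a boundary between frames t-1 and t; positions
    where the kernel does not fit inside S are left at 0."""
    n = len(S)
    kernel = checkerboard_kernel(half_width)
    scores = [0.0] * n
    for t in range(half_width, n - half_width + 1):
        total = 0.0
        for r in range(2 * half_width):
            for c in range(2 * half_width):
                total += kernel[r][c] * S[t - half_width + r][t - half_width + c]
        scores[t] = total
    return scores


# Toy similarity matrix (hypothetical): two segments of four frames each,
# similarity 1 within a segment and 0 between segments.
S = [[1.0 if (i < 4) == (j < 4) else 0.0 for j in range(8)] for i in range(8)]
scores = novelty_score(S, half_width=2)
boundary = max(range(len(scores)), key=lambda t: scores[t])
```

On the toy matrix the score peaks where the kernel is centred on the change of segment (between frames 3 and 4) and is close to zero where the kernel lies entirely inside one segment, which is the checkerboard behaviour the text describes.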


Page 32

mirsimatrix: similarity matrix

mirsimatrix(a, 'Similarity', 'exponential')

[Figures: Similarity matrix and Dissimilarity matrix; both axes: temporal location of frame centers (in s.)]

2. SIMILARITY ANALYSIS FOR MULTIMEDIA2.1. Constructing the similarity matrixSelf-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams.The first step is to parameterize the source audio. For this, we calculate spectral features from the shorttime Fourier transform (STFT) of 0.05 second non-overlapping windows in the source audio. We have alsoexperimented with alternative parameterizations including Mel Frequency Cepstral Coefficients (MFCCs), andsubspace representations computed using principal components analysis of spectrogram data. The window sizemay also be varied, though in our experience, robust audio analysis requires resolution on the order of 20 Hz.Here, the parameterization is optimized for analysis, rather than for compression, transmission, or reconstruction.The sole requirement is that similar source audio samples produce similar features.

The analysis proceeds by comparing all pairwise combinations of audio windows using a quantitative similaritymeasure. Represent the B-dimensional spectral data computed for N windows of a digital audio file by the vectors{vi : i = 1, · · · , N} ⊂ IRB. Using a generic similarity measure, d : IRB × IRB #→ IR, we embed the resultingsimilarity data in a matrix, S as illustrated in the right panel of Figure 1. The elements of S are

S(i, j) = d(vi, vj) i, j = 1, · · · , N . (1)

The time axis runs on the horizontal (left to right) and vertical (top to bottom) axes of S and along its maindiagonal, where self-similarity is maximal. A common similarity measure is the cosine distance. Given vectorsvi and vj representing the spectrograms for sample times i and j, respectively,

dcos(vi, vj) =< vi, vj >

|vi||vj |. (2)

There are many possible choices for the similarity measure including measures based on the Euclidean vectornorm or non-negative exponential measures, such as

dexp(vi, vj) = exp (dcos(vi, vj) −1) . (3)

The right panel of Figure 1 shows a similarity matrix computed from the song “Wild Honey”by U2 using thecosine distance measure (see (2)) and low frequency spectrograms. Pixels are colored brighter with increasingsimilarity, so that segments of similar audio samples appear as bright squares along the main diagonal. Brighterrectangular regions off the main diagonal indicate similarity between segments.

2.2. Audio SegmentationSegmentation is a crucial first step in many audio analysis approaches. The structure of S suggests a straight-forward approach to segmentation.6 Consider a simple “song”having two successive notes of different pitch,for example a cuckoo call. When visualized, S for this example will exhibit a 2 × 2 checkerboard pattern.White squares on the diagonal correspond to the notes, which have high self-similarity; black squares on theoff-diagonals correspond to regions of low cross-similarity. The instant when the notes change corresponds tothe center of the checkerboard. More generally, the boundary between two coherent audio segments will alsoproduce a checkerboard pattern. The two segments will exhibit high within-segment (self) similarity, producingadjacent square regions of high similarity along the main diagonal of S. The two segments will also producerectangular regions of low between-segment (cross) similarity off the main diagonal. The boundary is the cruxof this checkerboard pattern.

To identify these patterns in S, we take a matched-filter approach. To detect segment boundaries in theaudio, a Gaussian-tapered “checkerboard”kernel is correlated along the main diagonal of the similarity matrix.Peaks in the correlation indicate locally novel audio, thus we refer to the correlation as a novelty score. Anexample kernel appears in the left panel of Figure 2. The right panel shows the novelty score computed from thesimilarity matrix in Figure 1. Large peaks are detected in the resulting time-indexed correlation and labelled assegment boundaries.

2. SIMILARITY ANALYSIS FOR MULTIMEDIA2.1. Constructing the similarity matrixSelf-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams.The first step is to parameterize the source audio. For this, we calculate spectral features from the shorttime Fourier transform (STFT) of 0.05 second non-overlapping windows in the source audio. We have alsoexperimented with alternative parameterizations including Mel Frequency Cepstral Coefficients (MFCCs), andsubspace representations computed using principal components analysis of spectrogram data. The window sizemay also be varied, though in our experience, robust audio analysis requires resolution on the order of 20 Hz.Here, the parameterization is optimized for analysis, rather than for compression, transmission, or reconstruction.The sole requirement is that similar source audio samples produce similar features.

The analysis proceeds by comparing all pairwise combinations of audio windows using a quantitative similaritymeasure. Represent the B-dimensional spectral data computed for N windows of a digital audio file by the vectors{vi : i = 1, · · · , N} ⊂ IRB. Using a generic similarity measure, d : IRB × IRB #→ IR, we embed the resultingsimilarity data in a matrix, S as illustrated in the right panel of Figure 1. The elements of S are

S(i, j) = d(vi, vj) i, j = 1, · · · , N . (1)

The time axis runs on the horizontal (left to right) and vertical (top to bottom) axes of S and along its maindiagonal, where self-similarity is maximal. A common similarity measure is the cosine distance. Given vectorsvi and vj representing the spectrograms for sample times i and j, respectively,

dcos(vi, vj) =< vi, vj >

|vi||vj |. (2)

There are many possible choices for the similarity measure including measures based on the Euclidean vectornorm or non-negative exponential measures, such as

dexp(vi, vj) = exp (dcos(vi, vj) −1) . (3)

The right panel of Figure 1 shows a similarity matrix computed from the song “Wild Honey”by U2 using thecosine distance measure (see (2)) and low frequency spectrograms. Pixels are colored brighter with increasingsimilarity, so that segments of similar audio samples appear as bright squares along the main diagonal. Brighterrectangular regions off the main diagonal indicate similarity between segments.

2.2. Audio SegmentationSegmentation is a crucial first step in many audio analysis approaches. The structure of S suggests a straight-forward approach to segmentation.6 Consider a simple “song”having two successive notes of different pitch,for example a cuckoo call. When visualized, S for this example will exhibit a 2 × 2 checkerboard pattern.White squares on the diagonal correspond to the notes, which have high self-similarity; black squares on theoff-diagonals correspond to regions of low cross-similarity. The instant when the notes change corresponds tothe center of the checkerboard. More generally, the boundary between two coherent audio segments will alsoproduce a checkerboard pattern. The two segments will exhibit high within-segment (self) similarity, producingadjacent square regions of high similarity along the main diagonal of S. The two segments will also producerectangular regions of low between-segment (cross) similarity off the main diagonal. The boundary is the cruxof this checkerboard pattern.

To identify these patterns in S, we take a matched-filter approach. To detect segment boundaries in theaudio, a Gaussian-tapered “checkerboard”kernel is correlated along the main diagonal of the similarity matrix.Peaks in the correlation indicate locally novel audio, thus we refer to the correlation as a novelty score. Anexample kernel appears in the left panel of Figure 2. The right panel shows the novelty score computed from thesimilarity matrix in Figure 1. Large peaks are detected in the resulting time-indexed correlation and labelled assegment boundaries.

2. SIMILARITY ANALYSIS FOR MULTIMEDIA2.1. Constructing the similarity matrixSelf-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams.The first step is to parameterize the source audio. For this, we calculate spectral features from the shorttime Fourier transform (STFT) of 0.05 second non-overlapping windows in the source audio. We have alsoexperimented with alternative parameterizations including Mel Frequency Cepstral Coefficients (MFCCs), andsubspace representations computed using principal components analysis of spectrogram data. The window sizemay also be varied, though in our experience, robust audio analysis requires resolution on the order of 20 Hz.Here, the parameterization is optimized for analysis, rather than for compression, transmission, or reconstruction.The sole requirement is that similar source audio samples produce similar features.

The analysis proceeds by comparing all pairwise combinations of audio windows using a quantitative similaritymeasure. Represent the B-dimensional spectral data computed for N windows of a digital audio file by the vectors{vi : i = 1, · · · , N} ⊂ IRB. Using a generic similarity measure, d : IRB × IRB #→ IR, we embed the resultingsimilarity data in a matrix, S as illustrated in the right panel of Figure 1. The elements of S are

S(i, j) = d(vi, vj) i, j = 1, · · · , N . (1)

The time axis runs on the horizontal (left to right) and vertical (top to bottom) axes of S and along its maindiagonal, where self-similarity is maximal. A common similarity measure is the cosine distance. Given vectorsvi and vj representing the spectrograms for sample times i and j, respectively,

dcos(vi, vj) =< vi, vj >

|vi||vj |. (2)

There are many possible choices for the similarity measure including measures based on the Euclidean vectornorm or non-negative exponential measures, such as

2. SIMILARITY ANALYSIS FOR MULTIMEDIA

2.1. Constructing the similarity matrix

Self-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams. The first step is to parameterize the source audio. For this, we calculate spectral features from the short time Fourier transform (STFT) of 0.05 second non-overlapping windows in the source audio. We have also experimented with alternative parameterizations including Mel Frequency Cepstral Coefficients (MFCCs), and subspace representations computed using principal components analysis of spectrogram data. The window size may also be varied, though in our experience, robust audio analysis requires resolution on the order of 20 Hz. Here, the parameterization is optimized for analysis, rather than for compression, transmission, or reconstruction. The sole requirement is that similar source audio samples produce similar features.

The analysis proceeds by comparing all pairwise combinations of audio windows using a quantitative similarity measure. Represent the B-dimensional spectral data computed for N windows of a digital audio file by the vectors $\{v_i : i = 1, \dots, N\} \subset \mathbb{R}^B$. Using a generic similarity measure, $d : \mathbb{R}^B \times \mathbb{R}^B \to \mathbb{R}$, we embed the resulting similarity data in a matrix, S, as illustrated in the right panel of Figure 1. The elements of S are

$$S(i, j) = d(v_i, v_j), \qquad i, j = 1, \dots, N . \tag{1}$$

The time axis runs on the horizontal (left to right) and vertical (top to bottom) axes of S and along its main diagonal, where self-similarity is maximal. A common similarity measure is the cosine distance. Given vectors $v_i$ and $v_j$ representing the spectrograms for sample times i and j, respectively,

$$d_{\cos}(v_i, v_j) = \frac{\langle v_i, v_j \rangle}{|v_i|\,|v_j|} . \tag{2}$$

There are many possible choices for the similarity measure including measures based on the Euclidean vector norm or non-negative exponential measures, such as

$$d_{\exp}(v_i, v_j) = \exp\left(d_{\cos}(v_i, v_j) - 1\right) . \tag{3}$$

The right panel of Figure 1 shows a similarity matrix computed from the song “Wild Honey” by U2 using the cosine distance measure (see (2)) and low frequency spectrograms. Pixels are colored brighter with increasing similarity, so that segments of similar audio samples appear as bright squares along the main diagonal. Brighter rectangular regions off the main diagonal indicate similarity between segments.
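As a minimal sketch of equations (1) to (3), the following plain Matlab code builds a cosine similarity matrix without relying on MIRtoolbox. The feature matrix `V` and its values are made up for illustration; in practice each column would hold the spectral features of one analysis window.

```matlab
% Toy feature matrix: B = 3 spectral bins (rows) by N = 4 frames (columns).
% The values are invented for illustration only.
V = [1 1 0 0;
     0 0 1 1;
     1 1 0 0];

% Normalise every frame (column) to unit Euclidean length.
norms = sqrt(sum(V.^2, 1));
Vn = V ./ repmat(norms, size(V, 1), 1);

% Equation (2) for all pairs at once: S(i,j) = <v_i, v_j> / (|v_i| |v_j|).
S = Vn' * Vn;

% Exponential variant, equation (3).
Sexp = exp(S - 1);

disp(S);
```

Frames 1 and 2 share one spectrum and frames 3 and 4 share another, so `S` shows two bright 2 × 2 blocks on the diagonal and zeros elsewhere.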

2.2. Audio Segmentation

Segmentation is a crucial first step in many audio analysis approaches. The structure of S suggests a straightforward approach to segmentation [6]. Consider a simple “song” having two successive notes of different pitch, for example a cuckoo call. When visualized, S for this example will exhibit a 2 × 2 checkerboard pattern. White squares on the diagonal correspond to the notes, which have high self-similarity; black squares on the off-diagonals correspond to regions of low cross-similarity. The instant when the notes change corresponds to the center of the checkerboard. More generally, the boundary between two coherent audio segments will also produce a checkerboard pattern. The two segments will exhibit high within-segment (self) similarity, producing adjacent square regions of high similarity along the main diagonal of S. The two segments will also produce rectangular regions of low between-segment (cross) similarity off the main diagonal. The boundary is the crux of this checkerboard pattern.

To identify these patterns in S, we take a matched-filter approach. To detect segment boundaries in the audio, a Gaussian-tapered “checkerboard” kernel is correlated along the main diagonal of the similarity matrix. Peaks in the correlation indicate locally novel audio, thus we refer to the correlation as a novelty score. An example kernel appears in the left panel of Figure 2. The right panel shows the novelty score computed from the similarity matrix in Figure 1. Large peaks are detected in the resulting time-indexed correlation and labelled as segment boundaries.
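The matched-filter idea can be sketched in plain Matlab as follows. This is an illustration under stated assumptions, not the code used by the paper or by MIRtoolbox: the similarity matrix is a toy example with two homogeneous segments, and the kernel half-width `L` and the taper width (sigma equal to `L`) are arbitrary choices.

```matlab
% Toy similarity matrix with two homogeneous segments: frames 1-4 and 5-8.
N = 8;
S = zeros(N);
S(1:4, 1:4) = 1;
S(5:8, 5:8) = 1;

% Gaussian-tapered checkerboard kernel of half-width L (size 2L x 2L).
L = 2;                                    % assumed half-width, in frames
x = [-L:-1, 1:L];                         % offsets around the candidate boundary
[X, Y] = meshgrid(x, x);
taper = exp(-(X.^2 + Y.^2) / (2 * L^2));  % assumed taper width: sigma = L
K = sign(X) .* sign(Y) .* taper;          % positive on-diagonal quadrants, negative off-diagonal

% Correlate the kernel along the main diagonal of S.
novelty = zeros(1, N);
for n = (L + 1):(N - L + 1)
    idx = (n - L):(n + L - 1);            % frames covered by the kernel
    novelty(n) = sum(sum(K .* S(idx, idx)));
end

% The largest value marks the first frame of the new segment.
[~, boundary] = max(novelty);
fprintf('Novelty peak at frame %d\n', boundary);
```

Inside a homogeneous region the positive and negative quadrants cancel, so the score is zero there and peaks only where the kernel straddles the change between frames 4 and 5.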

Page 33

mirsimatrix similarity matrix

• Observe the structure of this excerpt along different musical dimensions:

• ‘george.wav’

• For instance:

• s = mirspectrum(…, ‘Frame’, …)

• mirsimatrix(s)

Page 34

mirnovelty novelty, kernel method

• Convolution with checkerboard kernel along the diagonal of mirsimatrix

• Peaks indicate transitions between phases

[Figure: similarity matrix (both axes: temporal location of frame centers, in s.) and novelty curve (x-axis: temporal location of events, in s.; y-axis: coefficient value)]

Page 35

mirnovelty novelty, kernel method

• Observe the structure of this excerpt using different kernel sizes:

• ‘george.wav’

• mirnovelty(…, ‘Kernel’, N)

• where N is the kernel size

Page 36

mirnovelty novelty, kernel method

Kernel size:

• 64 frames

• 16 frames

[Figure: similarity matrix (both axes: temporal location of frame centers, in s.), annotated with the section labels A, A1, A1a/A1b, A2, A2a, B, B1, B2, B2a]

Annotated boundaries:

- end of intro & start of A1

- start of A2a

- start of B1

- start of B2a

- transition

Page 37

mirsegment segmentation

• nv = mirnovelty(sm)

• p = mirpeaks(nv)

• sg = mirsegment(‘mysong’, p)

• mirplay(sg)

• s = mirmfcc(sg)

• sm = mirsimatrix(s)

[Figures: novelty curve nv (coefficient value vs. temporal location of events, in s.); segmented audio waveform sg (amplitude vs. time, in s.); MFCC s (coefficient ranks vs. time axis, in s.); similarity matrix sm]
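The step from a novelty curve to a set of segments can be illustrated in plain Matlab, without MIRtoolbox. This is only a sketch of the idea: the novelty values, the frame rate, the threshold and the sampling rate are all invented, and the peak picking is a simple local-maximum test rather than whatever mirpeaks does internally.

```matlab
% Toy novelty curve sampled at an assumed rate of 20 frames per second.
frameRate = 20;
novelty = [0 0.1 0.9 0.2 0.1 0.3 0.8 0.1 0];

% A frame is a peak if it exceeds both neighbours and an assumed threshold.
threshold = 0.5;
mid = novelty(2:end-1);
isPeak = mid > novelty(1:end-2) & mid > novelty(3:end) & mid > threshold;
peakFrames = find(isPeak) + 1;            % +1 restores the original indexing

% Convert peak frames to boundary times in seconds (frame 1 starts at 0 s).
boundaries = (peakFrames - 1) / frameRate;

% Cut a toy signal at those times.
fs = 100;                                 % assumed audio sampling rate in Hz
signal = 1:45;                            % placeholder samples
edges = [1, round(boundaries * fs) + 1, numel(signal) + 1];
segments = cell(1, numel(edges) - 1);
for k = 1:numel(segments)
    segments{k} = signal(edges(k):edges(k + 1) - 1);
end
```

Each cell of `segments` then holds the samples of one segment, which is the kind of per-segment data a later feature extraction step would operate on.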

Page 38

Statistics

• mirstat

• mean

• standard deviation

• slope

• periodicity

• mirhisto

• distribution histograms

• moments

• mircentroid

• mirspread

• mirskewness

• mirkurtosis
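For reference, the four moment descriptors listed above can be sketched in plain Matlab using their textbook definitions. The distribution below is invented, and the conventions of the MIRtoolbox operators (for example whether kurtosis is reported as excess kurtosis) may differ from this sketch.

```matlab
% Toy distribution: magnitudes y observed at positions x (e.g. frequency bins).
x = [1 2 3 4 5];
y = [1 2 4 2 1];

p = y / sum(y);                                      % normalise to unit mass
centroid = sum(x .* p);                              % first moment (mean position)
spread   = sqrt(sum((x - centroid).^2 .* p));        % standard deviation
skew     = sum((x - centroid).^3 .* p) / spread^3;   % third standardised moment
kurt     = sum((x - centroid).^4 .* p) / spread^4;   % fourth standardised moment

fprintf('centroid %.2f, spread %.2f, skewness %.2f, kurtosis %.2f\n', ...
        centroid, spread, skew, kurt);
```

The toy distribution is symmetric around 3, so the centroid is 3 and the skewness is zero.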

Page 39

mirfeatures batch of features

• mirzerocross

• mircentroid

• mirbrightness

• mirspread

• mirskewness

• mirkurtosis

• mirrolloff

• mirentropy

• mirflatness

• mirroughness

• mirregularity

• mirinharmonicity

• mirmfcc

• mirfluctuation

• mirattacktime

• mirattackslope

• mirlowenergy

• mirflux

• mirpitch

• mirchromagram

• mirkeystrength

• mirkey

• mirmode

• mirhcdf

• mirtempo

• mirpulseclarity

mirfeatures(‘Folder’, ‘Stat’)

Page 40

mirexport exportation of statistical data to files

• mirexport(filename, ...) exports one or several results computed by MIRtoolbox operators.

• mirexport(‘result.txt’, ...) saved in a text file.

• mirexport(‘result.arff’, ...) exported to WEKA for data-mining.

• mirexport(‘Workspace’, ...) saved in a Matlab variable.

Page 41

Stimulus set

[Figure: axes valence, activity and tension; emotion labels happy, sad, tender, fear, anger]

Eerola & Vuoskoski. A comparison of the discrete and dimensional models of emotion in music. Psychology of Music.

Page 42

Stimulus set

R² and β for the 3D (v, a, t) and 2D (v, a) models; the axis labels on the slide are valence, activity and tension:

         3D                            2D
         R²    β(v)   β(a)   β(t)     R²    β(v)   β(a)
happy    .89    .93    .79   -.35     .89    .85    .49
sad      .63   -.20   -.84   -.22     .63   -.05   -.69
tender   .77    .33   -.45   -.58     .77    .50   -.51
fear     .87   -.83    .07    .63     .87   -.90    .24
anger    .64   -.52    .32    .35     .64   -.55    .35
mean     .76                          .76

Eerola & Vuoskoski. A comparison of the discrete and dimensional models of emotion in music. Psychology of Music.

Page 43

miremotion

T. Eerola, O. Lartillot, P. Toiviainen, "Prediction of Multidimensional Emotional Ratings in Music From Audio Using Multivariate Regression Models", ISMIR, Kobe, 2009.

R² per emotion (h = happy, s = sad, t = tender, f = fear, a = anger):

        h     s     t     f     a
MLR    .55   .56   .42   .51   .54
PCA    .34   .40   .23   .13   .27
PLS    .59   .64   .52   .60   .57
MLR    .62   .58   .44   .60   .63
PCA    .38   .49   .47   .45   .62
PLS    .65   .61   .55   .64   .67

(slide label next to the table: Box-Cox)

happy
Feature              β
fluctuation peaks   -.40
sp. centroid         .13
sp. spread          -.19
chrom. peaks        -.05
majorness            .03

sad
Feature              β
roughness            .12
register            -.08
register var         .09
majorness            .02
harm. change        -.03

tender
Feature              β
RMS var             -.42
sp. centroid         .14
key clarity          .11
harm. change        -.10
tonal novelty       -.01

fear
Feature              β
RMS var             -.79
fluctuation peaks   -.21
key clarity         -.09
harm. change         .08
tonal novelty       -.02

anger
Feature              β
RMS var              .44
pulse clarity       -.13
sp. centroid        -.13
key clarity         -.04
tonal novelty        .02

Page 44

Part 2

• Rhythm, metrical structure

• Tonal analysis

• Segmentation, structure

• Statistics

• Music & emotion

• Advanced use

Page 45

mirgetdata return data in Matlab format

s = mirspectrum(‘file’);

[Diagram: s (encapsulated data: numerical data, related sampling rates, related file name, etc.) → mirgetdata(s) → vector]

Page 46

mirgetdata return data in Matlab format

s = mirspectrum(‘file’, ‘Frame’);

[Diagram: s (encapsulated data: grouped numerical data, related sampling rates, related file name, etc.) → mirgetdata(s) → matrix]

Page 47

mirgetdata return data in Matlab format

f = mirfilterbank(‘file’, ‘Frame’);

[Diagram: f (encapsulated data: grouped numerical data, related sampling rates, related file name, etc.) → mirgetdata(f) → 3D-matrix]

Page 48

mirgetdata return data in Matlab format

sg = mirsegment(‘file’)

s = mirspectrum(sg, ‘Frame’);

[Diagram: s (encapsulated data: grouped numerical data, related sampling rates, related file name, etc.) → mirgetdata(s) → cell array (3D-matrix, 3D-matrix, ...)]

Page 49

mirgetdata return data in Matlab format

sg = mirsegment(‘Folder’)

s = mirspectrum(sg, ‘Frame’);

[Diagram: s (encapsulated data: grouped numerical data, related sampling rates, related file name, etc.) → mirgetdata(s) → cell array with one cell array per file (file1, file2, ...), each holding one matrix per segment (segm1, segm2, ...)]

Page 50

mirgetdata return data in Matlab format

p = mirpeaks(a)

[Diagram: p (encapsulated data: grouped numerical data, related sampling rates, related file name, etc.) → mirgetdata(p) → peak positions and peak values, each drawn as a cell array of cell arrays; the label “matrix” also appears in the diagram]

Page 51

get returns fields of encapsulated data

• get(a, ‘xName’)

• get(a, ‘xData’)

• get(a, ‘yName’)

• get(a, ‘yData’)

• get(a, ‘yUnit’)

• get(a, ‘FramePos’)

• get(a, ‘Sampling’)

• get(a, ‘NBits’)

• get(a, ‘Title’)

• get(a, ‘FileName’)

• get(a, ‘Label’)

• get(a, ‘Channels’)

• get(a, ‘xPeakSample’)

• get(a, ‘xPeakUnit’)

• get(a, ‘xPeakInterpol’)

• get(a, ‘yPeak’)

• get(a, ‘yPeakInterpol’)

Encapsulated data: a = miraudio(‘ragtime’);

Page 52

get returns fields of encapsulated data

• get(s, ‘Frequency’) = get(s, ‘xData’)

• get(s, ‘Magnitude’) = get(s, ‘yData’)

• get(s, ‘Phase’)

• get(s, ‘xScale’) (= ‘Freq’, ‘Mel’, ‘Bark’)

• get(s, ‘Power’)

• get(s, ‘dB’)

etc.

Encapsulated data: s = mirspectrum(‘ragtime’);

Page 53

memory management

mirenvelope(‘hugefile’);

[Diagram: automatic decomposition of hugefile into chunks, each passed to mirenvelope]

Page 54

mirchunklim chunk size limitation

mirenvelope(‘hugefile’);

In case of memory overflow problems, decrease mirchunklim:

• mirchunklim by default: 500 000 samples

• mirchunklim(50000) set to 50 000 samples

[Diagram labels: hugefile, chunk, mirenvelope]
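The general idea behind chunk decomposition can be illustrated in plain Matlab. This is not MIRtoolbox's implementation, only a sketch of why a bounded chunk size limits memory use: the statistic (a global RMS), the toy signal and the chunk length are all assumptions chosen for the example.

```matlab
% Toy signal standing in for a long recording.
signal = sin(2 * pi * (0:9999) / 50);
chunkSize = 3000;                         % hypothetical chunk length in samples

% Accumulate the sum of squares chunk by chunk, so that only one chunk
% has to be processed at a time.
sumSquares = 0;
for start = 1:chunkSize:numel(signal)
    stop = min(start + chunkSize - 1, numel(signal));
    chunk = signal(start:stop);
    sumSquares = sumSquares + sum(chunk.^2);
end
rmsChunked = sqrt(sumSquares / numel(signal));

% Reference value computed in a single pass over the whole signal.
rmsDirect = sqrt(mean(signal.^2));
fprintf('chunked %.6f, direct %.6f\n', rmsChunked, rmsDirect);
```

A smaller chunk length lowers the peak memory needed per step at the cost of more loop iterations, which is the trade-off behind lowering the chunk limit.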

Page 55

avoid useless call to miraudio

• a = miraudio(‘hugefile’);

• c = mirmfcc(a)

Instead:

• c = mirmfcc(‘hugefile’)

Page 56

avoid useless call to miraudio

• a = miraudio(‘Folder’);

• c = mirmfcc(a)

Instead:

• c = mirmfcc(‘Folder’)

[Diagram labels: file1, file2, file3, file4]

Page 57

avoid useless call to mirframe

• f = mirframe(‘hugefile’);

• c = mirmfcc(f)

Instead:

• c = mirmfcc(‘hugefile’, ‘Frame’)

Page 58

What if miraudio (or mirframe) is really necessary?

• a = miraudio(‘hugefile’, ‘Sampling’, 11025);

• c = mirmfcc(a)

[Diagram labels: hugefile, chunk, miraudio, a, c]

Page 59

mireval flowchart design and evaluation

• a = miraudio(‘Design’, ‘Sampling’, 11025);

• c = mirmfcc(a);

• mireval(c, ‘hugefile’)

[Diagram labels: hugefile, chunk, miraudio, a, mirmfcc, c]

Page 60

mireval flowchart design and evaluation

• s = mirspectrum(‘Design’, ‘Frame’);

• c = mircentroid(s);

• mireval(c, ‘hugefile’)

[Diagram labels: hugefile, chunk, mirspectrum, s, mircentroid, c]

Page 61

mireval flowchart design and evaluation

• s = mirspectrum(‘Design’, ‘Frame’);

• c = mircentroid(s);

• mireval(c, ‘Folder’)

[Diagram labels: file1, file2, file3, file4, mirspectrum, s, mircentroid, c]

Page 62

mireval flowchart evaluation?

• s = mirspectrum(‘Design’, ‘Frame’);

• cent = mircentroid(s);

• cent = mireval(cent, ‘Folder’);

• ceps = mircepstrum(s);

• ceps = mireval(ceps, ‘Folder’);

s is evaluated twice!

[Diagram labels: s, cent, ceps]

Page 63

mirstruct complex flowchart

• myflow = mirstruct;

• myflow.tmp.s = mirspectrum(‘Design’, ‘Frame’);

• myflow.cent = mircentroid(myflow.tmp.s);

• myflow.ceps = mircepstrum(myflow.tmp.s);

• res = mireval(myflow, ‘Folder’);

[Diagram: myflow.tmp.s feeds myflow.cent and myflow.ceps; the results are res.cent and res.ceps]

Page 64

complex flowchart

• f = mirstruct;

• f.dynamics.rms = mirrms(‘Design’, ‘Frame’);

• f.tmp.onsets = mironsets(‘Design’);

• f.rhythm.tempo = mirtempo(f.tmp.onsets, ‘Frame’);

• f.tmp.attacks = mironsets(f.tmp.onsets, ‘Attacks’);

• f.rhythm.attack.time = mirattacktime(f.tmp.attacks);

• f.rhythm.attack.slope = mirattackslope(f.tmp.attacks);

[Diagram: structure f with fields dynamics (rms) and rhythm (tempo; attack: time, slope), plus temporary fields tmp.onsets and tmp.attacks]