mirtoolbox · self-similarity analysis is a non-parametric technique for studying the global...

MIRtoolbox: Sound and music analysis of

audio recordings using MatlabMUS4831, Olivier Lartillot

Part II, 9.11.2017

Part 2

• Rhythm, metrical structure

• Tonal analysis

• Segmentation, structure

• Statistics

• Music & emotion

• Advanced use

mirevents event detection function

time (s)0.5 1 1.5 2 2.5 3 3.5 4

ampl

itude

0.2

0.4

0.6

0.8

1Onset curve (Envelope)

• mirevents(‘ragtime.wav’)• mirpeaks(mirsum(mirspectrum(…, ‘Frame’)))

• mirevents(‘ragtime.wav’, ‘Filter’)• mirpeaks(mirsum(mirenvelope(mirfilterbank(…, ‘NbChannels’, 40))))

• mirevents(‘ragtime.wav’, ‘SpectralFlux’)• mirpeaks(mirflux(…, ‘Inc’, ‘Halfwave’))

mirevents(…, ‘Contrast’,

…)

C.P.E. Bach, Concerto for cello in A major, WQ 172, 3rd mvt• ‘Envelope’, ‘Filter’: changes in dynamics

• ‘SpectralFlux’: global spectral changes

• ‘Emerge’: local changes in particular frequency regions

0.5 1 1.5 2 2.5 3 3.5 4

0.2

0.4

0.6

0.8

Onset curve (Envelope)

time (s)

ampl

itude

0.5 1 1.5 2 2.5 3 3.5 4

0.2

0.4

0.6

0.8


time (s)

ampl

itude

0.5 1 1.5 2 2.5 3 3.5 4−0.2

0

0.2

0.4

0.6

0.8


time (s)

ampl

itude


J.S. Bach, Orchestral suite No.3 in D minor, BWV 1068, Aria

0.5 1 1.5 2 2.5 3 3.5 4

0.2

0.4

0.6

0.8


time (s)

ampl

itude

0.5 1 1.5 2 2.5 3 3.5 4

0

0.2

0.4

0.6

0.8


time (s)

ampl

itude

0.5 1 1.5 2 2.5 3 3.5 4

0.2

0.4

0.6

0.8


time (s)

ampl

itude

0.5 1 1.5 2 2.5 3 3.5 4 4.50

2

4

6

8

10

12Envelope

time (s)

ampl

itude


• ‘Envelope’, ‘Filter’: changes in dynamics

• ‘SpectralFlux’: global spectral changes

• ‘Emerge’: local changes in particular frequency regions

mirevents(..., ‘Attack’)

time (s)0.5 1 1.5 2 2.5 3 3.5 4

ampl

itude

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9


mirattackslope average slope of note attacks

• o = mirevents(‘ragtime.wav’, ‘Attacks’)

• mirattackslope(o)

4 5 6 7 8 9

!0.5

0

0.5

1

1.5

2

2.5

Envelope (centered)

time (s)

am

plit

ude

mirattackleap amplitude of note attacks

• o = mirevents(‘ragtime.wav’, ‘Attacks’)

• mirattackleap(o)

4 5 6 7 8 9

!0.5

0

0.5

1

1.5

2

2.5

Envelope (centered)

time (s)

am

plit

ude

Tempo estimation?

• ‘george.wav’

• What tempo in BPM? You can tap on the beat while listening:

• http://www.all8.com/tools/bpm.htm

• How to estimate tempo using the MIRtoolbox operators presented last week?

http://www.all8.com/tools/bpm.htm

mirtempo tempo (in beats per minute)

0.2 0.4 0.6 0.8 1 1.2 1.4 1.6!0.4

!0.2

0

0.2

0.4

0.6Envelope autocorrelation

lag (s)

coeffic

ients

0 1 2 3 4 50

0.02

0.04

0.06

0.08Onset curve (Envelope)

time (s)

am

plit

ud

e

0 1 2 3 4 5!0.05

0

0.05

0.1

0.15Onset curve (Differentiated envelope)

time (s)

am

plit

ud

e

0.2 0.4 0.6 0.8 1 1.2 1.4 1.6!0.4

!0.2

0

0.2

0.4

0.6Envelope autocorrelation

lag (s)

co

eff

icie

nts

o

do

ac

pa

Roughly:

• o = mirevents(‘file.wav’, ‘Filter’)

• do = mirevents(o, ‘Diff ’)

• ac = mirautocor(do, ‘Resonance’)

• pa = mirpeaks(ac, ’Total’, 1)

In short:

• [t, pa] = mirtempo(‘file.wav’)

0.2 0.4 0.6 0.8 1 1.2 1.4 1.6−0.4

−0.2

0

0.2

0.4

0.6

0.8

1Envelope autocorrelation

lag (s)

coef

ficie

nts

• mirautocor(…, ‘Resonance’, ‘Toiviainen’) (Toiviainen & Snyder, 2003)

• mirautocor(…, ‘Resonance’, ‘vanNoorden’) (van Noorden & Moelants, 2001)

• Emphasis on the best perceived tempi

0.2 0.4 0.6 0.8 1 1.2 1.4 1.6−0.4

−0.2

0

0.2

0.4

0.6

0.8

1

1.2

Lag 0

Am

plitu

de

.5 s(= 120 BPM)

1 s(= 60 BPM)

mirtempo resonance curves

Lag 0

Am

plitu

de

.5 s 1 s

mirautocor mirautocor(…, ’Resonance’)

• How to estimate tempo for such audio excerpt?

• ‘czardas.wav’

mirtempo tempo estimation

0 2 4 $ % 10 12 14!1

0

1

2Onset curve (Differentiated envelope)

time (s)

am

plit

ude

0 2 4 6 8 10 12130

140

150

160

170

180Tempo

Temporal location of events (in s.)

coeffic

ient valu

e (

in b

pm

)

f

pa

t

mirtempo tempo (temporal evolution)

• o = mirevents(’mysong’, ‘Detect’, ‘No’)

• do = mirevents(o, ‘Diff ’)

• f = mirframe(do)

• ac = mirautocor(f, ‘Resonance’)

• pa = mirpeaks(ac, ’Total’, 1)

In short:

• [t, pa] = mirtempo(’mysong’, ‘Frame’)

Switch from one metrical level to

another!

mirtempoTry for instance: mirtempo(‘george.wav’, ‘Frame’)

Metrical levels:

234

5678

0.5

2.5

1(dominant)

• Level N’s period is N times level 1’s period.

mirmetre tracking all metrical levels

Try for instance: mirmetre(‘george.wav’)

• The tempo curve is associated to one particular metrical level.


• m = mirmetre(…)

• t = mirtempo(m)

or:

• [t m] = mirtempo(…, ‘Metre’)

‘Emerge’

M. Bruch, Violin Concerto No.1 in G minor,

op.26, Finale (Allegro energico)

‘SpectralFlux’

1=

.16=


Influence of the onset detection

method

• Subjective judgment:

• How easily I can perceive the underlying pulsation in music.

high

medium

low

(tempo = 148 BPM)

(tempo = 148 BPM)

A

B

C(Lartillot et al., 2008)

Beat/rhythmic/metrical strength, clarity

mirpulseclaritybeat strength

0.2 0.4 0.6 0.8 1 1.2 1.4 1.6−0.4

−0.2

0

0.2

0.4

0.6

0.8

1Envelope autocorrelation

lag (s)

coef

ficie

nts

Lag 0

Am

plitu

de

.5 s 1 s240 BPM 0

‘KurtosisAutocor’‘TempoAutocor’‘EntropyAutocor’

‘MaxAutocor’ ‘MinAutocor’

(Lartillot et al., 2008)

• Characterisation of the autocorrelation function

• cf. also beat strength (Tzanetakis, Essl & Cook, 2002): variability of the autocorrelation curve throughout time

mirchromagram energy distribution along pitches

• s = mirspectrum(a)

• c = mirchromagram(s,

‘Wrap’, ‘no’)

• c = mirchromagram(c,

‘Wrap’, ‘yes’)

C Major profile

mirkeystrength probability of key candidates

mirchromagram

C major

C# major

D major

C minor

C# minor

D minor

Cross-Correlations

mirkeystrength probability of key candidates

• Chromagram compared to typical chromagrams representing each possible key (or mode).

• Detection of the most probably key (or mode).

C major

C# major

D major

C minor

C# minor

D minor

Krumhansl, Cognitive foundations of musical pitch. Oxford UP, 1990. Gomez, “Tonal description of polyphonic audio for music content processing,” INFORMS Journal on Computing, 18-3, pp. 294–304, 2006.

mirkey tonality estimation

mirpeaks(mirkeystrength(...))mirkeystrength

• [k c s] = mirkey(...)


• [k c s] = mirkey(..., ‘Frame’)

0 0.5 1 1.5 2 2.5 3 3.5D#

E

F

F#

G

G#

A

A#


co

eff

icie

nt

va

lue

Key

min

maj

k

0 0.5 1 1.5 2 2.5 3 3.50.55

0.6

0.65

0.7

0.75

0.8Key clarity


co

eff

icie

nt

va

lue

c

0 0.5 1 1.5 2 2.5 3 3.5

CMC#M

DMD#M

EMFM

F#MGM

G#MAM

A#MBMCm

C#mDm

D#mEmFm

F#mGm

G#mAm

A#mBm

Key strength

temporal location of beginning of frame (in s.)

ton

al ce

nte

r

s

Key

s

5 10 15 20 25 30

CC#

DD#

EF

F#G

G#A


coef

ficie

nt v

alue

Key, Umile01

majmin

5 10 15 20 25 30

CC#

DD#

EF

F#G

G#A


coef

ficie

nt v

alue

Key, Umile01

majmin

20 40 60 80 100 120 140 160

CC#

DD#

EF

F#G

G#A

A#B


coef

ficie

nt v

alue

Key, son5.wav

20 40 60 80 100 120 140

CC#

DD#

EF

F#G

G#A

A#B


coef

ficie

nt v

alue

Key, son27.wav

20 40 60 80 100 120C#

DD#

EF

F#G

G#A

A#B


coef

ficie

nt v

alue

Key, son32.wav


monteverdi.wav tiersen.wav

beethoven9.wav schoenberg.wav

• [k c s] = mirkey(..., ‘Frame’)

20 40 60 80 100 120 1400

0.2

0.4

0.6

0.8

1Key clarity, son31.wav


coef

ficie

nt v

alue

Tchaïkovski, Swan Lake, Act one

unclear

clear tonality

20 40 60 80 100 120 140 1600

0.2

0.4

0.6

0.8

1Key clarity, son24.wav


coef

ficie

nt v

alue

Prokofiev, Violon concerto No. in D major, Scherzo: Vivacissimo

unclear

clear

mirmode mode estimation

20 40 60 80 100 120 140−0.4

−0.2

0

0.2

0.4Mode, son28.wav


coef

ficie

nt v

alue

Schubert, Piano Trio No.2 in E-flat major, Andante con moto

minor

major

Mozart, Piano Concerto No.23 in A major, Adagio20 40 60 80 100 120 140

−0.4

−0.2

0

0.2

0.4Mode, son22.wav


coef

ficie

nt v

alue

minor

major

mirmode mode estimation

mirkeysom self-organizing map projection of chromagram

Toiviainen & Krumhansl, “Measuring and modeling real-time responses to music: The dynamics of tonality in- duction”, Perception 32-6, pp. 741–766, 2003.

• Any musical feature as input x:

• Spectrum

• Timbre(MFCC, ...)

• Tonality (Chroma, key strength, ...)

• Rhythm, etc.

• Frames are compared using a distance measure (such as cosine distance)

Similarity matrix

temporal location of frame centers (in s.)

tem

pora

l loc

atio

n of

fram

e ce

nter

s (in

s.)

20 40 60 80 100 120 140 160

20

40

60

80

100

120

140

160

A

B

A1

A2

B1

B2a B2

A2a

mirsimatrix similarity matrix

mirsimatrix dissimilarity matrix

mirspectrum(a, ‘Frame’)

mirsimatrix(a,‘Dissimilarity’)

mirsimatrix(...,‘Distance’,‘cosine’)

Dissimilarity matrix


tem

pora

l lo

cation o

f fr

am

e c

ente

rs (

in s

.)

0.5 1 1.5 2 2.5 3 3.5 4

0.5

1

1.5

2

2.5

3

3.5

4

2. SIMILARITY ANALYSIS FOR MULTIMEDIA2.1. Constructing the similarity matrixSelf-similarity analysis is a non-parametric technique for studying the global structure of time-ordered streams.The first step is to parameterize the source audio. For this, we calculate spectral features from the shorttime Fourier transform (STFT) of 0.05 second non-overlapping windows in the source audio. We have alsoexperimented with alternative parameterizations including Mel Frequency Cepstral Coefficients (MFCCs), andsubspace representations computed using principal components analysis of spectrogram data. The window sizemay also be varied, though in our experience, robust audio analysis requires resolution on the order of 20 Hz.Here, the parameterization is optimized for analysis, rather than for compression, transmission, or reconstruction.The sole requirement is that similar source audio samples produce similar features.

The analysis proceeds by comparing all pairwise combinations of audio windows using a quantitative similaritymeasure. Represent the B-dimensional spectral data computed for N windows of a digital audio file by the vectors{vi : i = 1, · · · , N} ⊂ IRB. Using a generic similarity measure, d : IRB × IRB #→ IR, we embed the resultingsimilarity data in a matrix, S as illustrated in the right panel of Figure 1. The elements of S are

S(i, j) = d(vi, vj) i, j = 1, · · · , N . (1)

The time axis runs on the horizontal (left to right) and vertical (top to bottom) axes of S and along its maindiagonal, where self-similarity is maximal. A common similarity measure is the cosine distance. Given vectorsvi and vj representing the spectrograms for sample times i and j, respectively,

dcos(vi, vj) =< vi, vj >

|vi||vj |. (2)

There are many possible choices for the similarity measure including measures based on the Euclidean vectornorm or non-negative exponential measures, such as

dexp(vi, vj) = exp (dcos(vi, vj) −1) . (3)

The right panel of Figure 1 shows a similarity matrix computed from the song “Wild Honey”by U2 using thecosine distance measure (see (2)) and low frequency spectrograms. Pixels are colored brighter with increasingsimilarity, so that segments of similar audio samples appear as bright squares along the main diagonal. Brighterrectangular regions off the main diagonal indicate similarity between segments.

2.2. Audio SegmentationSegmentation is a crucial first step in many audio analysis approaches. The structure of S suggests a straight-forward approach to segmentation.6 Consider a simple “song”having two successive notes of different pitch,for example a cuckoo call. When visualized, S for this example will exhibit a 2 × 2 checkerboard pattern.White squares on the diagonal correspond to the notes, which have high self-similarity; black squares on theoff-diagonals correspond to regions of low cross-similarity. The instant when the notes change corresponds tothe center of the checkerboard. More generally, the boundary between two coherent audio segments will alsoproduce a checkerboard pattern. The two segments will exhibit high within-segment (self) similarity, producingadjacent square regions of high similarity along the main diagonal of S. The two segments will also producerectangular regions of low between-segment (cross) similarity off the main diagonal. The boundary is the cruxof this checkerboard pattern.

To identify these patterns in S, we take a matched-filter approach. To detect segment boundaries in theaudio, a Gaussian-tapered “checkerboard”kernel is correlated along the main diagonal of the similarity matrix.Peaks in the correlation indicate locally novel audio, thus we refer to the correlation as a novelty score. Anexample kernel appears in the left panel of Figure 2. The right panel shows the novelty score computed from thesimilarity matrix in Figure 1. Large peaks are detected in the resulting time-indexed correlation and labelled assegment boundaries.

Foote, Cooper. “Media Segmentation using Self-Similarity Decomposition”, SPIE Storage and Retrieval for Multimedia Databases, 5021, 167-75.

v i

v j


Similarity matrix


tem

po

ral lo

ca

tio

n o

f fr

am

e c

en

ters

(in

s.)

0.5 1 1.5 2 2.5 3 3.5 4

0.5

1

1.5

2

2.5

3

3.5

4

Dissimilarity matrix


tem

pora

l lo

cation o

f fr

am

e c

ente

rs (

in s

.)

0.5 1 1.5 2 2.5 3 3.5 4

0.5

1

1.5

2

2.5

3

3.5

4

Foote, Cooper. “Media Segmentation using Self-Similarity Decomposition”, SPIE Storage and Retrieval for Multimedia Databases, 5021, 167-75.

mirsimatrix(a, ‘Similarity’, ‘exponential’)



S(i, j) = d(vi, vj) i, j = 1, · · · , N . (1)



|vi||vj |. (2)








S(i, j) = d(vi, vj) i, j = 1, · · · , N . (1)



|vi||vj |. (2)








S(i, j) = d(vi, vj) i, j = 1, · · · , N . (1)



|vi||vj |. (2)








S(i, j) = d(vi, vj) i, j = 1, · · · , N . (1)



|vi||vj |. (2)







• Observe the structure of this excerpt along different musical dimensions:


• For instance:

• s = mirspectrum(…, ‘Frame’, …)

• mirsimatrix(s)

1 2 3 4 5 6 7

0.2

0.4

0.6

0.8

1

Novelty


coeffic

ient valu

e

• Convolution with checkerboard kernel along the diagonal of mirsimatrix

• Peaks indicate transitions between phases

Similarity matrix


tem

po

ral lo

ca

tio

n o

f fr

am

e c

en

ters

(in

s.)

1 2 3 4 5 6 7

1

2

3

4

5

6

7

20

40

60

80

100

120

20

40

60

80

100

120

mirnovelty novelty, kernel method


• Observe the structure of this excerpt using different kernel sizes:


• mirnovelty(…, ‘Kernel’, N)

• where N is the kernel size

Kernel size:

• 64 frames

• 16 frames

Similarity matrix


tem

pora

l loc

atio

n of

fram

e ce

nter

s (in

s.)

20 40 60 80 100 120 140 160

20

40

60

80

100

120

140

160

A

B

A1

A2

B1

B2a B2

A2a

- end of intro & start of A1

- start of A2a- start of B1- start of B2a- transition

A1a/A1b


• nv = mirnovelty(sm)

• p = mirpeaks(nv)

• sg = mirsegment(‘mysong’, p)

• mirplay(sg)

• s = mirmfcc(sg)

• sm = mirsimatrix(s)

1 2 3 4 5 6 7

0.2

0.4

0.6

0.8

1

Novelty


coef

ficie

nt v

alue

nv

0 1 2 3 4 5 6 7

!0.5

0

0.5

1

Audio waveform

time (s)

am

plit

ude

sg

0 1 2 3 4 5 6 7

5

10

15

20

MFCC

time axis (in s.)

coeffic

ient ra

nks

s

sm

mirsegment segmentation

Statistics• mirstat

• mean

• standard deviation

• slope

• periodicity

• mirhisto

• distribution histograms

• moments

• mircentroid

• mirspread

• mirskewness

• mirkurtosis

mirfeatures batch of features

• mirzerocross

• mircentroid

• mirbrightness

• mirspread

• mirskewness

• mirkurtosis

• mirrolloff

• mirentropy

• mirflatness

• mirroughness

• mirregularity

• mirinharmonicity

• mirmfcc

• mirfluctuation

• mirattacktime

• mirattackslope

• mirlowenergy

• mirflux

• mirpitch

• mirchromagram

• mirkeystrength

• mirkey

• mirmode

• mirhcdf

• mirtempo

• mirpulseclarity

mirfeatures(‘Folder’, ‘Stat’)

mirexport exportation of statistical data to files

• mirexport(filename, ...) adding one or several data from MIRtoolbox operators.

• mirexport(‘result.txt’, ...) saved in a text file.

• mirexport(‘result.arff’, ...) exported to WEKA for data-mining.

• mirexport(‘Workspace’, ...) saved in a Matlab variable.

Stimulus set

activity

tens

ion

valence

happy

sad

tender

fear

anger

Eerola & Vuoskoski. A comparison of the discrete and dimensional models of emotion in music. Psychology of Music.

valence

Stimulus set

3D 2D

R²β

R²β

v a t v a

happy .89 .93 .79 -.35 .89 .85 .49sad .63 -.20 -.84 -.22 .63 -.05 -.69

tender .77 .33 -.45 -.58 .77 .50 -.51fear .87 -.83 .07 .63 .87 -.90 .24

anger .64 -.52 .32 .35 .64 -.55 .35mean .76 .76

activity

tens

ion

Eerola & Vuoskoski. A comparison of the discrete and dimensional models of emotion in music. Psychology of Music.

T. Eerola, O. Lartillot, P. Toiviainen, "Prediction of Multidimensional Emotional Ratings in Music From Audio Using Multivariate Regression Models", ISMIR, Kobe, 2009.

miremotionR² h s t f a

MLR .55 .56 .42 .51 .54PCA .34 .40 .23 .13 .27PLS .59 .64 .52 .60 .57MLR .62 .58 .44 .60 .63PCA .38 .49 .47 .45 .62PLS .65 .61 .55 .64 .67

happyFeature β

fluctuation peaks

-.40sp. centroid .13sp. spread -.19

chrom. peaks -.05majorness .03

sadFeature β

roughness .12register -.08

register var .09majorness .02

harm. change -.03

tenderFeature βRMS var -.42

sp. centroid .14key clarity .11

harm. change -.10tonal novelty -.01

fearFeature βRMS var -.79

fluctuation peaks

-.21key clarity -.09

harm. change .08tonal novelty -.02

angerFeature βRMS var .44

pulse clarity -.13sp. centroid -.13key clarity -.04

tonal novelty .02

Box-

Cox

http://ismir2009.ismir.net/program.html

Part 2

• Rhythm, metrical structure

• Tonal analysis

• Segmentation, structure

• Statistics

• Music & emotion

• Advanced use

mirgetdata return data in Matlab format

Encapsulated datanumerical data,

related sampling rates,related file name,

etc.

s = mirspectrum(‘file’);

s

vector

mirgetdata(s)


Encapsulated datagrouped numerical data,related sampling rates,

related file name,etc.

s = mirspectrum(‘file’, ‘Frame’);

s

matrix

mirgetdata(s)




f = mirfilterbank(‘file’, ‘Frame’);

f

mirgetdata(f)

3D-matrix




s

cell array

mirgetdata(s)

...

sg = mirsegment(‘file’) s = mirspectrum(sg,

‘Frame’);

3D-matrix 3D-matrix




s

mirgetdata(s)

sg = mirsegment(‘Folder’) s = mirspectrum(sg, ‘Frame’);

cell array

...

cell array

...

matrix matrix

file1segm1 segm2

cell array

...

matrix matrix

file2segm1 segm2




p

mirgetdata(p)

p = mirpeaks(a)

peak values

... ... ...

cell arraycell array

peak positions

... ... ...

cell arraycell array

matrix

get returns fields of encapsulated data

• get(a, ‘xName’)

• get(a, ‘xData’)

• get(a, ‘yName’)

• get(a, ‘yData’)

• get(a, ‘yUnit’)

• get(a, ‘FramePos’)

• get(a, ‘Sampling’)

• get(a, ‘NBits’)

• get(a, ‘Title’)

• get(a, ‘FileName’)

• get(a, ‘Label’)

• get(a, ‘Channels’)

• get(a, ‘xPeakSample’)

• get(a, ‘xPeakUnit’)

• get(a, ‘xPeakInterpol’)

• get(a, ‘yPeak’)

• get(a, ‘yPeakInterpol’)

Encapsulated dataa = miraudio(‘ragtime’);

get returns fields of encapsulated data

• get(s, ‘Frequency’) = get(s, ‘xData’)

• get(s, ‘Magnitude’) = get(s, ‘yData’)

• get(s, ‘Phase’)

• get(s, ‘xScale’) (= ‘Freq’, ‘Mel’, ‘Bark’)

• get(s, ‘Power’)

• get(s, ‘dB’)

Encapsulated datas = mirspectrum(‘ragtime’);

etc.

memory managementmirenvelope(‘hugefile’);

hugefile

mirenvelope

Automatichugefile

chunk

mirchunklim chunk size limitation

mirenvelope(‘hugefile’);

chunk

mirenvelope

mirchunklim by default: 500 000 samples

mirchunklim(50000) set to 50 000 samples

hugefile

If memory overflow problems, decrease mirchunklim:

• a = miraudio(‘hugefile’);

• c = mirmfcc(a)a

c

avoid useless call to miraudio

hugefilemiraudio

• c = mirmfcc(‘hugefile’)

mirmfcchugefile

• a = miraudio(‘Folder’);

• c = mirmfcc(a)

miraudio

a

c

• c = mirmfcc(‘Folder’)

mirmfcc

avoid useless call to miraudio

file1 file2 file3 file4


• f = mirframe(‘hugefile’);

• c = mirmfcc(f)

mirframe

f

c

• c = mirmfcc(‘hugefile’, ‘Frame’)

mirmfcc

avoid useless call to mirframe

hugefile

hugefile

?• a = miraudio(‘hugefile’, ‘Sampling’, 11025);

• c = mirmfcc(a)

chunk

miraudio

a

c

what if miraudio (or mirframe) really necessary?

hugefile

mireval flowchart design and evaluation

• a = miraudio(‘Design’, ‘Sampling’, 11025);

• c = mirmfcc(a);

• mireval(c, ‘hugefile’)

chunk

mirmfcc

miraudio

a

c

hugefile


• s = mirspectrum(‘Design’, ‘Frame’);

• c = mircentroid(s);

• mireval(c, ‘hugefile’)

chunk

mircentroid

mirspectrum

s

c

hugefile



• c = mircentroid(s);

• mireval(c, ‘Folder’)

mircentroid

mirspectrum

s

c


mireval flowchart evaluation?


• cent = mircentroid(s);

• cent = mireval(cent, ‘Folder’);

• ceps = mircepstrum(s);

• ceps = mireval(ceps, ‘Folder’);

s

cent

ceps

cent

ceps

s is evaluated twice!

myflow

mirstruct complex flowchart

• myflow = mirstruct;

• myflow.tmp.s = mirspectrum(‘Design’, ‘Frame’);

• myflow.cent = mircentroid(myflow.tmp.s);

• myflow.ceps = mircepstrum(myflow.tmp.s);

• res = mireval(myflow, ‘Folder’);

myflow.tmp.s

myflow.cent

myflow.ceps

res.cent res.ceps

• f = mirstruct;

• f.dynamics.rms = mirrms(‘Design’, ‘Frame’)

• f.tmp.onsets = mironsets(‘Design’);

• f.rhythm.tempo = mirtempo(f.tmp.onsets, ‘Frame’);

• f.tmp.attacks = mironsets(f.tmp.onsets, ‘Attacks’);

• f.rhythm.attack.time = mirattacktime(f.tmp.attacks);

• f.rhythm.attack.slope = mirattackslope(f.tmp.attacks);

f rms

dynamics tmp.onsets

tmp.attacksattack

time slope

complex flowchart

rhythm

tempo

mirtoolbox · self-similarity analysis is a non-parametric technique for studying the global...

Documents