

[Figure: feature distributions of piano and flute (axes: spectral centroid; decayed / not decayed), annotated with the F0-dependent mean function, which captures the pitch dependency (i.e. the position of the distribution at each F0), and the F0-normalized covariance, which captures the non-pitch dependency.]

Musical Instrument Identification based on F0-dependent Multivariate Normal Distribution

Tetsuro Kitahara*, Masataka Goto** and Hiroshi G. Okuno*

(*Graduate School of Informatics, Kyoto University, Japan, **PRESTO JST / National Institute of Advanced Industrial Science and Technology, Japan)

1. What is musical instrument identification?

Musical instrument identification is the task of obtaining the names of musical instruments from sounds (acoustic signals).

It is a kind of pattern recognition. It is useful for various applications, e.g. automatic music transcription, music information retrieval, MPEG-7 annotation, human-robot interaction via music, and many entertainment applications.

Research on it began relatively recently (in the 1990s).

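[Figure: identification pipeline. Feature extraction (e.g. decay speed, spectral centroid) → likelihoods $p(X \mid w_{\text{flute}})$, $p(X \mid w_{\text{piano}})$ → decision → output <inst>piano</inst>]

$$\hat{w} = \arg\max_{w} p(w \mid X) = \arg\max_{w} p(X \mid w)\, p(w)$$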
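A minimal sketch of this decision rule, assuming a single hypothetical 1-D feature whose class-conditional likelihood $p(X \mid w)$ is normal; the means, variances and priors below are toy values, not from the poster, and the function names are illustrative:

```python
import math


def gaussian_log_pdf(x, mean, var):
    """Log-density of a 1-D normal distribution N(mean, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def bayes_decide(x, models, priors):
    """Return the class w maximizing log p(x|w) + log p(w)."""
    return max(models, key=lambda w: gaussian_log_pdf(x, *models[w]) + math.log(priors[w]))


# Hypothetical (mean, variance) of one toy feature for each instrument.
models = {"piano": (0.8, 0.04), "flute": (0.1, 0.04)}
priors = {"piano": 0.5, "flute": 0.5}
print(bayes_decide(0.7, models, priors))  # piano
```

Working with log-probabilities turns the product $p(X \mid w)\,p(w)$ into a sum, which avoids numerical underflow when many feature dimensions are involved.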

2. What is difficult in musical instrument identification?

The pitch dependency of timbre.

e.g. A low-pitch piano sound decays slowly, whereas a high-pitch piano sound decays quickly.

[Figure: waveforms of piano tones (amplitude from -0.5 to 0.5 vs. time [s], 0 to 3 s): (a) pitch = C2 (65.5Hz); (b) pitch = C6 (1048Hz)]

In previous studies, the pitch dependency of timbre was pointed out but was NOT dealt with explicitly.
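Why this matters can be checked numerically. In the toy sketch below (hypothetical values, not from the poster), one instrument's decay-speed feature is tight at each pitch, but pooling tones of different pitches inflates its variance by the spread of the pitch-dependent means (the law of total variance), so a model that ignores pitch sees a much broader distribution:

```python
import statistics

# Hypothetical decay-speed values for ONE instrument (toy numbers).
low_pitch = [0.9, 1.0, 1.1]   # low pitch: slow decay
high_pitch = [4.9, 5.0, 5.1]  # high pitch: fast decay

# Spread of the feature when the pitch is held fixed (average within-pitch variance).
within = (statistics.pvariance(low_pitch) + statistics.pvariance(high_pitch)) / 2
# Spread caused only by the pitch-dependent shift of the mean.
between = statistics.pvariance([statistics.mean(low_pitch), statistics.mean(high_pitch)])
# Spread seen by a model that ignores pitch and pools all tones together.
pooled = statistics.pvariance(low_pitch + high_pitch)

print(f"within={within:.4f} between={between:.4f} pooled={pooled:.4f}")
```

Here the pooled variance (about 4.007) is roughly 600 times the within-pitch variance (about 0.0067); nearly all of it comes from the pitch-dependent shift of the mean.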

3. How is the pitch dependency coped with?

1. Approximate the pitch dependency of each feature as a function of the fundamental frequency (F0).

2. Estimate the feature distribution at each F0 using this function.

The resulting model is the F0-dependent multivariate normal distribution.

[Figure: The pitch dependency of timbre and its function approximation]

4. F0-dependent multivariate normal distribution

This is a distribution for representing musical sound features as a function of pitch.

It has the following two parameters:
- F0-dependent mean function: obtained by approximating the pitch dependency of each feature as a function of F0.
- F0-normalized covariance: the covariance of the features after they are normalized by the F0-dependent mean (i.e. after the mean at each F0 is subtracted).

Estimating these parameters separates the pitch dependency of timbre (captured by the mean function) from the non-pitch dependency (captured by the covariance).
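Written out for a feature vector $X$ of dimension $d$ and F0 $f$, the density is the ordinary multivariate normal with a pitch-dependent mean, $\log p(X \mid w; f) = -\tfrac{1}{2}\left[d \log 2\pi + \log|\Sigma_w| + (X - \mu_w(f))^\top \Sigma_w^{-1} (X - \mu_w(f))\right]$, where $\mu_w(f)$ is the F0-dependent mean function and $\Sigma_w$ the F0-normalized covariance. A minimal pure-Python sketch follows; representing $\mu_w(f)$ as per-dimension polynomial coefficients and the function names are illustrative assumptions:

```python
import math


def poly(coeffs, f):
    """Evaluate c0 + c1*f + c2*f**2 + ... (coefficients in ascending order)."""
    return sum(c * f ** k for k, c in enumerate(coeffs))


def cholesky(a):
    """Lower-triangular L with L L^T = a; a must be symmetric positive definite."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def f0_dependent_log_pdf(x, f, mean_coeffs, cov):
    """log N(x; mu(f), cov) with mu(f)[i] = poly(mean_coeffs[i], f).

    Only the mean depends on f; the F0-normalized covariance is shared by all pitches.
    """
    d = len(x)
    diff = [x[i] - poly(mean_coeffs[i], f) for i in range(d)]
    L = cholesky(cov)
    # Forward substitution solves L z = diff, so z.z = diff^T cov^-1 diff.
    z = []
    for i in range(d):
        z.append((diff[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    mahalanobis_sq = sum(v * v for v in z)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(d))
    return -0.5 * (d * math.log(2.0 * math.pi) + log_det + mahalanobis_sq)
```

Because only the mean moves with $f$, the shape of the distribution is the same at every pitch; the distribution simply slides along the mean function.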

5. A musical instrument identification method using the F0-dependent multivariate normal distribution

1st step: Feature extraction
129 features, defined by consulting the literature, are extracted, e.g. spectral centroid (which captures the brightness of tones) and decay speed of power.
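The spectral centroid is commonly defined as the magnitude-weighted mean frequency of the spectrum. The poster does not give the exact definitions of its 129 features, so the sketch below (a naive DFT in pure Python, with a hypothetical function name) illustrates only that common definition:

```python
import math


def spectral_centroid(signal, sample_rate):
    """Magnitude-weighted mean frequency (Hz) of a real signal, via a naive DFT.

    Uses bins 0..N/2 only (the non-negative frequencies of a real signal).
    """
    n = len(signal)
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        re = sum(s * math.cos(2 * math.pi * k * t / n) for t, s in enumerate(signal))
        im = -sum(s * math.sin(2 * math.pi * k * t / n) for t, s in enumerate(signal))
        freqs.append(k * sample_rate / n)
        mags.append(math.hypot(re, im))
    total = sum(mags)
    # A silent signal has no meaningful centroid; return 0.0 by convention.
    return sum(f * m for f, m in zip(freqs, mags)) / total if total > 0 else 0.0


# A 4 Hz tone plus a weaker 12 Hz partial: the partial raises the centroid above 4 Hz.
n, sr = 64, 64
tone = [math.sin(2 * math.pi * 4 * t / n) + 0.5 * math.sin(2 * math.pi * 12 * t / n)
        for t in range(n)]
print(round(spectral_centroid(tone, sr), 3))
```

More energy in higher partials moves the centroid upward, which is why it serves as a brightness measure.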

2nd step: Dimensionality reduction
First: PCA (principal component analysis), 129 dimensions → 79 dimensions (cumulative proportion of variance: 99%)
Second: LDA (linear discriminant analysis), 79 dimensions → 18 dimensions
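The two dimension counts follow simple rules, sketched below with hypothetical helper names: PCA keeps the smallest number of components whose eigenvalues reach the target cumulative proportion, and LDA can produce at most C - 1 discriminant axes for C classes (the between-class scatter matrix has rank at most C - 1), so 19 instruments give at most 18 dimensions, matching the 79 → 18 reduction:

```python
def n_components_for_proportion(eigenvalues, proportion=0.99):
    """Smallest k whose top-k eigenvalues explain >= proportion of the total variance."""
    vals = sorted(eigenvalues, reverse=True)
    total = sum(vals)
    acc = 0.0
    for k, v in enumerate(vals, start=1):
        acc += v
        if acc / total >= proportion:
            return k
    return len(vals)


def lda_max_dims(n_classes, input_dims):
    """LDA yields at most min(C - 1, d) discriminant axes."""
    return min(n_classes - 1, input_dims)


print(lda_max_dims(19, 79))  # 18
```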

3rd step: Parameter estimation of the F0-dependent multivariate normal distribution
First: the F0-dependent mean function is approximated as a cubic polynomial.
Second: the F0-normalized covariance is obtained by normalizing the features with the F0-dependent mean.
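A minimal sketch of the 3rd step for one instrument, assuming ordinary least squares for the cubic fit and maximum-likelihood (divide-by-N) normalization for the covariance; the poster specifies neither, and the function names are hypothetical. F0 should be on a moderate scale (e.g. rescaled to [0, 1]) so that the normal equations stay well conditioned:

```python
def poly(coeffs, f):
    """Evaluate c0 + c1*f + ... (coefficients in ascending order)."""
    return sum(c * f ** k for k, c in enumerate(coeffs))


def fit_polynomial(fs, ys, degree=3):
    """Least-squares coefficients [c0, ..., c_degree] of y ~ sum_k c_k f**k.

    Solves the normal equations A c = b, A[j][k] = sum f**(j+k), b[j] = sum y f**j,
    by Gaussian elimination with partial pivoting.
    """
    m = degree + 1
    A = [[sum(f ** (j + k) for f in fs) for k in range(m)] for j in range(m)]
    b = [sum(y * f ** j for f, y in zip(fs, ys)) for j in range(m)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            factor = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    coeffs = [0.0] * m
    for r in range(m - 1, -1, -1):
        coeffs[r] = (b[r] - sum(A[r][c] * coeffs[c] for c in range(r + 1, m))) / A[r][r]
    return coeffs


def estimate_f0_dependent_mvn(features, f0s, degree=3):
    """Fit one polynomial per feature dimension, then the covariance of the residuals.

    features: list of feature vectors (one per training tone); f0s: their F0 values.
    Returns (mean_coeffs, cov) with cov normalized by N (maximum likelihood).
    """
    d = len(features[0])
    mean_coeffs = [fit_polynomial(f0s, [x[i] for x in features], degree) for i in range(d)]
    # Subtracting mu(f) removes the pitch dependency before the covariance is computed.
    resid = [[x[i] - poly(mean_coeffs[i], f) for i in range(d)] for x, f in zip(features, f0s)]
    n = len(resid)
    cov = [[sum(r[i] * r[j] for r in resid) / n for j in range(d)] for i in range(d)]
    return mean_coeffs, cov
```

Since each feature dimension gets its own cubic, the mean function can follow a different pitch trend in every dimension, while the residual covariance keeps the correlations between dimensions.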

Final step: Applying the Bayes decision rule
The instrument $w$ satisfying

$$\hat{w} = \arg\max_{w} \left[ \log p(X \mid w; f) + \log p(w; f) \right]$$

is determined as the result, where $f$ is the F0 of the input sound.
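A sketch of the final step for a single hypothetical 1-D feature. The form of $p(w; f)$ is an assumption here: uniform over the instruments whose pitch range covers $f$ and zero elsewhere. The ranges, means and variances are toy values, and $f$ is a hypothetical note number:

```python
import math


def poly(coeffs, f):
    """Evaluate c0 + c1*f + ... (coefficients in ascending order)."""
    return sum(c * f ** k for k, c in enumerate(coeffs))


def log_likelihood(x, f, model):
    """1-D F0-dependent normal: log N(x; mu(f), var), mu(f) = poly(model["mean"], f)."""
    mu = poly(model["mean"], f)
    var = model["var"]
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def in_range(f, model):
    lo, hi = model["range"]
    return lo <= f <= hi


def identify(x, f, models):
    """argmax_w [log p(x|w; f) + log p(w; f)] under the assumed range-based prior."""
    candidates = [w for w in models if in_range(f, models[w])]
    if not candidates:
        return None  # no instrument covers this F0 under the assumed prior
    log_prior = -math.log(len(candidates))  # uniform over instruments covering f
    # Instruments outside their range have p(w; f) = 0 (log prior = -inf), so they
    # can never win; only the candidates need to be scored.
    return max(candidates, key=lambda w: log_likelihood(x, f, models[w]) + log_prior)


models = {
    "piano": {"mean": [2.0, -0.02], "var": 0.01, "range": (21, 108)},
    "flute": {"mean": [0.5], "var": 0.01, "range": (60, 96)},
}
print(identify(0.5, 72, models))  # flute
print(identify(0.5, 40, models))  # piano (outside the assumed flute range)
```

Under this assumed prior the log prior is the same for every candidate, so it affects the result only by excluding instruments that cannot produce the observed F0.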

[Figure annotation: eliminating the pitch dependency]

6. Experiments

Experimental conditions:
- Database: a subset of RWC-MDB-I-2001
- It consists of solo tones of 19 real instruments covering their entire pitch ranges.
- It contains 3 individuals and 3 intensities for each instrument.
- It contains normal articulation only.
- The total number of sounds is 6,247.
- 10-fold cross-validation is used.
- Performance is evaluated both at the individual-instrument level and at the category level.

Experimental results (recognition rates):
The proposed method improved recognition rates:
- 75.73% → 79.73% at the individual-instrument level (error reduction rate: 16.48%)
- 88.20% → 90.65% at the category level (error reduction rate: 20.67%)
- Recognition rates of 6 instruments improved by more than 7%.
- The piano showed the largest improvement (74.21% → 83.27%) because it has a wide pitch range.

The Bayes decision rule vs. the k-NN rule:
- PCA+LDA+Bayes achieved the best performance.
- LDA improved the performance.
- Bayes with 79 dimensions performed poorly (the amount of training data is insufficient).
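Assuming the usual definition of the error reduction rate (the fraction of the baseline's errors that the proposed method removes), the individual-level figure can be reproduced from the two recognition rates; the function name is hypothetical:

```python
def error_reduction_rate(baseline_acc, proposed_acc):
    """Percentage of the baseline's errors removed; accuracies given in %."""
    baseline_err = 100.0 - baseline_acc
    proposed_err = 100.0 - proposed_acc
    return 100.0 * (baseline_err - proposed_err) / baseline_err


print(f"{error_reduction_rate(75.73, 79.73):.2f}%")  # 16.48%
```

A 4-point gain on a 24.27% error rate removes about one sixth of the errors, which is why the error reduction rate is much larger than the raw gain in recognition rate.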

Category        Instruments
Piano           Piano
Guitars         Classical Guitar, Ukulele, Acoustic Guitar
Strings         Violin, Viola, Cello
Brass           Trumpet, Trombone
Saxophones      Soprano Sax, Alto Sax, Tenor Sax, Baritone Sax
Double Reeds    Oboe, Fagotto
Clarinet        Clarinet
Air Reeds       Piccolo, Flute, Recorder

This categorization is used for evaluating performance at the category level.
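Category-level evaluation counts a result as correct whenever the predicted instrument falls in the same category as the true one. A minimal sketch using the categorization above; the function name and the label strings are illustrative assumptions:

```python
# Instrument -> category, following the categorization table above.
CATEGORY = {
    "Piano": "Piano",
    "Classical Guitar": "Guitars", "Ukulele": "Guitars", "Acoustic Guitar": "Guitars",
    "Violin": "Strings", "Viola": "Strings", "Cello": "Strings",
    "Trumpet": "Brass", "Trombone": "Brass",
    "Soprano Sax": "Saxophones", "Alto Sax": "Saxophones",
    "Tenor Sax": "Saxophones", "Baritone Sax": "Saxophones",
    "Oboe": "Double Reeds", "Fagotto": "Double Reeds",
    "Clarinet": "Clarinet",
    "Piccolo": "Air Reeds", "Flute": "Air Reeds", "Recorder": "Air Reeds",
}


def recognition_rates(true_labels, predicted_labels):
    """Return (individual-level %, category-level %) for parallel lists of instrument names."""
    pairs = list(zip(true_labels, predicted_labels))
    individual = sum(t == p for t, p in pairs)
    category = sum(CATEGORY[t] == CATEGORY[p] for t, p in pairs)
    return 100.0 * individual / len(pairs), 100.0 * category / len(pairs)


# Violin->Viola is wrong individually but right at the category level (both Strings).
print(recognition_rates(["Violin", "Flute", "Piano", "Oboe"],
                        ["Viola", "Flute", "Piano", "Clarinet"]))  # (50.0, 75.0)
```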

[Figure: recognition rates (%) at the category and individual levels, proposed method vs. baseline]

7. Conclusions

To cope with the pitch dependency of timbre in musical instrument identification, the F0-dependent multivariate normal distribution is proposed. Experimental results of identifying 6,247 solo tones of 19 instruments show that the proposed method improved the recognition rate (75.73% → 79.73%). Future work includes evaluation on mixtures of sounds and the development of application systems using the proposed method.

[Figure: recognition rates (%) of six configurations: Bayes (18 dim; PCA+LDA), marked "We adopted"; Bayes (18 dim; PCA only); Bayes (79 dim; PCA only); 3-NN (18 dim; PCA+LDA); 3-NN (18 dim; PCA only); 3-NN (79 dim; PCA only)]
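For comparison with the Bayes decision rule, the k-NN rule (k = 3 above) labels a tone by majority vote among its nearest training vectors. A minimal sketch; the Euclidean distance and the tie-breaking behavior are assumptions, since the poster states neither, and the function name is hypothetical:

```python
import math
from collections import Counter


def knn_classify(x, train_x, train_y, k=3):
    """k-NN rule: majority label among the k training vectors nearest to x (Euclidean).

    Counter.most_common keeps first-seen order among equal counts, and neighbors are
    counted nearest first, so a tied vote goes to the tied label whose member is nearest.
    """
    order = sorted(range(len(train_x)), key=lambda i: math.dist(x, train_x[i]))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy 2-D feature vectors (hypothetical).
train_x = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
train_y = ["piano"] * 3 + ["flute"] * 3
print(knn_classify((0.1, 0.1), train_x, train_y))  # piano
```

Unlike the Bayes rule, k-NN estimates no distribution parameters, which makes it a useful baseline when judging whether the dimensionality is too high for the available training data.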

The 4th IEEE Int’l Conf. on Multimedia & Expo (6th-9th July 2003 in Baltimore, MD, USA)