Feature Subset Selection for Automatically Classifying Anuran Calls Using Sensor Networks

Juan G. Colonna, Afonso D. Ribas, Eduardo F. Nakamura, Eulanda M. dos Santos

Institute of Computing (IComp), Federal University of Amazonas (UFAM)

Introduction - Environmental Motivation

The study of environmental conditions allows us to:

maintain the quality of life, and

preserve species.

The loss of species is an irreversible process!

Monitoring variations in species populations makes it possible to:

identify environmental problems in the early stages, and

establish strategies for the conservation of biological diversity.

Introduction - Environmental Motivation

Variations in amphibian populations are related to pollution, deforestation, urbanization, etc.

Frogs can be used as indicators for detecting environmental stress.

Figure: Percentage of threatened species in the red list. Figure adapted from [Stuart et al., 2004].

Introduction - Objectives

Classify frog species of tropical forests based on their vocalizations, using wireless sensor networks and machine learning techniques.*

* Consideration: restrictions on the hardware.

Introduction - Challenges

Develop a method that does not need human intervention.

Characterize the frequency spectrum of frog calls.

Extract and select the optimal set of features.

Define the classification technique.

Get the minimum set of features using a genetic algorithm.

Obtain the processing cost of each feature.

Correlate the processing cost with the success rate.

Maximize the benefit-cost ratio.


WSN and Machine Learning

Related Work

| Author | Animal | Features | Classifier | Results | WSN |
|---|---|---|---|---|---|
| Taylor et al. [1996] | Bufo marinus | Spectrogram | C4.5 | 60% | No |
| Hu et al. [2005] | Bufo marinus | Spectrogram | C4.5 | 60% | Yes |
| Yen & Fu [2002]* | 4 frog | Wavelet, Fisher's | MLP | 71% | No |
| Clemins [2005] | elephant | MFCCs, PLP | HMM, DTW | 69%, 73% | No |
| Cai et al. [2007] | 14 bird | MFCCs | ANN | 81% - 86% | Yes |
| Huang et al. [2009]* | 5 frog | S - B - ZC | k-NN, SVM | 83% - 100%, 82% - 100% | No |
| Vaca-Castaño & Rodriguez [2010]* | 10 bird, 20 frog | MFCCs, PCA | k-NN | 86%, 91% | Yes |
| Han et al. [2011]* | 9 frog | S - Hs - Hr | k-NN | 83% - 100% | No |

* Work implemented and used in the comparisons.

Our approach


Figure: Parametrization of vocalizations.

Figure: Anuran classification stages.

Figure: Pre-processing steps.

Features


Figure: Mel-Frequency Cepstral Coefficients (MFCCs).

Figure: Wavelet Transform with Lifting Scheme.
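The lifting scheme named in the figure can be illustrated with one level of the Haar transform. This is only a minimal sketch: the function names are hypothetical, it uses the unnormalized Haar variant (pair difference and pair average), and the slides do not specify which variant or how many decomposition levels the authors used.

```python
def haar_lifting_step(signal):
    """One level of an unnormalized Haar transform via lifting (assumed variant)."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    # Split: separate even-indexed and odd-indexed samples.
    even = signal[0::2]
    odd = signal[1::2]
    # Predict: each odd sample is predicted by its even neighbour;
    # the prediction error is the detail coefficient.
    detail = [o - e for e, o in zip(even, odd)]
    # Update: correct the even samples so they hold the pair average.
    approx = [e + d / 2 for e, d in zip(even, detail)]
    return approx, detail


def haar_lifting_inverse(approx, detail):
    """Undo haar_lifting_step by reversing the update and predict steps."""
    even = [a - d / 2 for a, d in zip(approx, detail)]
    odd = [d + e for e, d in zip(even, detail)]
    out = []
    for e, o in zip(even, odd):
        out.extend([e, o])
    return out
```

Each lifting step works in place on half of the samples using only additions and a halving, which is part of why the scheme is attractive on constrained sensor hardware.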

Obtain the features


Figure: Feature extraction.

Spectrogram


Figure: Audio sample (waveform and spectrogram) for Adenomera andreae.


Features

| Feature | Complexity order | Computational cost |
|---|---|---|
| Pitch | O(L) | 3L − 1 |
| B | O(N log(N)) | 2M + 2M + N log(N) |
| 12 MFCCs | O(N log(N)) | N log(N) + N + mR |
| S | O(N log(N)) | 2M + N log(N) |
| H1 | O(L) | L + i |
| H2 | O(L) | L + i |
| ZC | O(L) | L |
| E | O(L) | L |
| Pw | O(L) | L |
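To make the O(L) rows concrete, the sketch below computes three time-domain features in a single pass over a frame of L samples. It assumes that ZC, E and Pw stand for zero crossings, energy and average power; the slides do not define these abbreviations, and the function names are hypothetical.

```python
def zero_crossings(frame):
    """Count sign changes between consecutive samples: one pass, O(L)."""
    count = 0
    for prev, cur in zip(frame, frame[1:]):
        if (prev >= 0) != (cur >= 0):
            count += 1
    return count


def energy(frame):
    """Sum of squared samples: one multiply-accumulate per sample, O(L)."""
    return sum(x * x for x in frame)


def power(frame):
    """Average power: the energy divided by the frame length L."""
    return energy(frame) / len(frame)
```

None of these needs a Fourier transform, which is consistent with their linear cost in the table, in contrast to the N log(N) term of the spectral features.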

Comparison between MFCCs and Wavelet


| Features | k-NN, alpha = 0.4 | k-NN, alpha = 0.5 | k-NN, alpha = 0.6 |
|---|---|---|---|
| Wavelet features, Daubechies transform | 96.35%(3) | 97.86%(1) | 98.22%(1) |
| Wavelet features, Haar transform | 96.70%(1) | 97.90%(1) | 98.38%(1) |
| MFCCs | 99.19%(9) | 99.36%(2) | 99.19%(1) |

Table: Success rate in relation to alpha, using 10-fold cross-validation.

Applying the Wilcoxon test at the 95% confidence level (α = 0.5), we conclude that the MFCCs perform better.
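The evaluation protocol behind the table (k-NN scored with 10-fold cross-validation) can be sketched as follows. This is an illustrative stand-in, not the authors' code: the Euclidean distance, the interleaved fold assignment and the function names are all assumptions.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Label x by majority vote among its k nearest training points."""
    dists = sorted((math.dist(t, x), label) for t, label in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(X, y, k=1, folds=10):
    """Fraction of samples classified correctly when each fold is held out once."""
    n = len(X)
    correct = 0
    for f in range(folds):
        # Assumed fold layout: sample i belongs to fold i % folds.
        test_idx = set(range(f, n, folds))
        train_X = [X[i] for i in range(n) if i not in test_idx]
        train_y = [y[i] for i in range(n) if i not in test_idx]
        for i in test_idx:
            if knn_predict(train_X, train_y, X[i], k) == y[i]:
                correct += 1
    return correct / n
```

Here `X` would hold one feature vector per syllable (for example 12 MFCCs) and `y` the species labels; running the function once per feature set gives success rates that a paired test such as Wilcoxon's can then compare.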



Objective: To determine the optimal subset of features by applying GA.



| Features | Classification before GA | Crossover 50%, Mutation 40% | Success rate | Crossover 60%, Mutation 20% | Success rate |
|---|---|---|---|---|---|
| 9 features with Db | 97.86%(1) | 1,2,3,5 | 93.73% | 1,2,3,4,5,6,8,9 | 96.83% |
| 9 features with Haar | 97.90%(1)* | 2,3,4,5,6,8,9 | 96.47% | 1,2,3,4,5,6,7,8,9 | 97.90%* |
| 12 MFCCs | 99.36%(2)* | 1,2,3,4,5,6,7,11 | 99.08% | 1,2,3,4,5,6,7,8,9,11,12 | 99.33%* |
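A genetic search over feature subsets, of the kind summarized in the table, can be sketched with a bit mask per individual. Everything below is an assumption made for illustration: the slides give only the crossover and mutation percentages, so the tournament selection, one-point crossover, single-bit mutation, elitism, population size and the `fitness` callback are hypothetical choices.

```python
import random


def ga_select(n_features, fitness, pop_size=20, generations=30,
              crossover_rate=0.6, mutation_rate=0.2, seed=0):
    """Return the 1-based indices of the best feature subset found."""
    rng = random.Random(seed)
    # Each individual is a mask: True means the feature is kept.
    pop = [[rng.random() < 0.5 for _ in range(n_features)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def pick():
        # Binary tournament: the fitter of two random individuals wins.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask always survives
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            child = p1[:]
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_features)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            if rng.random() < mutation_rate:
                j = rng.randrange(n_features)  # flip one random bit
                child[j] = not child[j]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return [i + 1 for i, keep in enumerate(best) if keep]
```

In practice `fitness` would train and score the classifier on the masked features, possibly minus a penalty for processing cost, so that the search trades success rate against the cost of each feature.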

Case Study

fs = 44.1 kHz

fs = 5.5 kHz

fs = 11 kHz

Conclusions

We showed that the best set of features is the 12 MFCCs.

You can optimize costs by using 8 MFCCs, although the method loses generality.

The MFCCs have:

✔ Better success rate;

✔ Constant cost, regardless of hardware, and

✔ Immunity to environmental and quantization noise.


Questions?


Thanks
