Automatic detection of microchiroptera echolocation calls from field recordings
using machine learning algorithms
Mark D. Skowronski and John G. Harris
Computational Neuro-Engineering Lab
Electrical and Computer Engineering
University of Florida, Gainesville, FL, USA
May 19, 2005
Overview
• Motivations for acoustic bat detection
• Machine learning paradigm
• Detection experiments
• Conclusions
Bat detection motivations
• Bats are among the most diverse yet least studied mammals (~25% of all mammal species are bats).
• Bats affect agriculture and carry diseases (directly or through parasites).
• The acoustical domain is significant for echolocating bats and is non-invasive.
• Recorded data can be voluminous, so automated algorithms for objective and repeatable detection & classification are desired.
Conventional methods
• Conventional bat detection/classification parallels the acoustic-phonetic paradigm of automatic speech recognition (ASR) from the 1970s.
• Characteristics of acoustic phonetics:
– Originally mimicked human expert methods
– First, boundaries between regions are determined
– Second, features for each region are extracted
– Third, features are compared using decision trees, DFA
• Limitations:
– Boundaries are ill-defined and sensitive to noise
– Many feature extraction algorithms, with varying degrees of noise robustness
Machine learning
• Acoustic phonetics gave way to machine learning for ASR in the 1980s.
• Advantages:
– Decisions based on more information
– Mature statistical foundation for algorithms
– Frame-based features, informed by expert knowledge
– Improved noise robustness
• For bats: increased detection range
Detection experiments
• Database of bat calls
– 7 different recording sites, 8 species
– 1265 hand-labeled calls (from spectrogram readings)
• Detection experiment design
– Discrete events: 20-ms bins
– Discrete outcomes: Yes or No (does a bin contain any part of a bat call?)
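The binning scheme above can be sketched as follows; `label_bins` is a hypothetical helper, and the call intervals and recording length are made-up values for illustration only.

```python
def label_bins(call_intervals, duration, bin_width=0.020):
    """Return one Yes/No label per 20-ms bin: does the bin overlap any call?

    call_intervals: list of (start, end) times in seconds for hand-labeled calls
    duration: total recording length in seconds
    """
    n_bins = round(duration / bin_width)
    labels = [False] * n_bins
    for start, end in call_intervals:
        first = int(start / bin_width)
        last = min(int(end / bin_width), n_bins - 1)
        for i in range(first, last + 1):
            labels[i] = True  # bin contains some part of a call
    return labels

# Example: one 30-ms call starting at 25 ms in a 100-ms recording
# covers parts of bins 1 and 2 -> [False, True, True, False, False]
print(label_bins([(0.025, 0.055)], 0.100))
```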
Detectors
• Baseline
– Threshold on frame energy
• Gaussian mixture model (GMM)
– Models the probability distribution of call features
– Threshold on model output probability
• Hidden Markov model (HMM)
– Similar to GMM, but includes temporal constraints through piecewise-stationary states
– Threshold on model output probability along the Viterbi path
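A minimal sketch of the GMM detector, assuming scikit-learn is available; the synthetic feature arrays, mixture size, and decision threshold are placeholders for illustration, not the authors' actual configuration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
call_features = rng.normal(loc=5.0, size=(500, 6))  # frames from labeled calls
test_features = rng.normal(loc=0.0, size=(10, 6))   # frames to be scored

# Fit a mixture model to the probability distribution of call features
gmm = GaussianMixture(n_components=4, random_state=0).fit(call_features)

# Per-frame log-likelihood under the call model
log_likelihood = gmm.score_samples(test_features)

# Detection decision: threshold the model output probability
threshold = -20.0  # placeholder; tuned on held-out data in practice
detections = log_likelihood > threshold
```

The HMM detector follows the same pattern but scores frame sequences along the Viterbi path rather than frames independently.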
Feature extraction
• Baseline
– Normalization: session noise floor set to 0 dB
– Feature: frame power
• Machine learning
– Blackman window, zero-padded FFT
– Normalization: log amplitude mean subtraction
• From ASR: analogous to cepstral mean subtraction
• Removes the transfer function of the recording environment
• Mean taken across time for each FFT bin
– Features:
• Maximum FFT amplitude, dB
• Frequency at maximum amplitude, Hz
• First and second temporal derivatives (slope, concavity)
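The machine-learning feature chain above can be sketched with NumPy; the frame length, FFT size, and sample rate are illustrative assumptions, not the authors' exact settings.

```python
import numpy as np

def extract_features(frames, fs, nfft=1024):
    """frames: (n_frames, frame_len) array of audio frames; fs: sample rate, Hz."""
    window = np.blackman(frames.shape[1])
    spec = np.fft.rfft(frames * window, n=nfft, axis=1)   # zero-padded FFT
    log_amp = 20.0 * np.log10(np.abs(spec) + 1e-12)       # log amplitude, dB
    log_amp -= log_amp.mean(axis=0)                       # mean subtraction across time, per FFT bin
    power = log_amp.max(axis=1)                           # maximum FFT amplitude, dB
    freq = log_amp.argmax(axis=1) * fs / nfft             # frequency at maximum, Hz
    d_power, d_freq = np.gradient(power), np.gradient(freq)        # first derivatives (slope)
    dd_power, dd_freq = np.gradient(d_power), np.gradient(d_freq)  # second derivatives (concavity)
    return np.column_stack([power, freq, d_power, d_freq, dd_power, dd_freq])
```

Each frame thus yields the six features listed above: power, frequency, and their first and second temporal derivatives.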
Feature extraction examples
Six features: power (P), frequency (F), and their first and second temporal derivatives (ΔP, ΔF, ΔΔP, ΔΔF)
Detection example
Experiment results
Conclusions
• Machine learning algorithms improve detection when specificity is high (>0.6).
• The HMM is slightly superior to the GMM and uses more temporal information, but is slower to train/test.
• Hand labels were determined using spectrograms and are biased towards high-power calls.
• Machine learning models are applicable to other species.
Bioacoustic applications
• To apply machine learning to other species:
– Determine ground-truth training data through expert hand labels
– Extract relevant frame-based features, considering domain-specific noise sources (echoes, propeller noise, other biological sources)
– Train models of the features from hand-labeled data
– Consider training “silence” models for discriminant detection/classification
Further information
• http://www.cnel.ufl.edu/~markskow
• [email protected]
Acknowledgements
Bat data kindly provided by:
Brock Fenton, U. of Western Ontario, Canada