“wow!”bayesiansurpriseforsalientacousticeventdetectionbschauer/...“wow!”bayesiansurpriseforsalientacousticeventdetection...

1
“Wow!” Bayesian Surprise for Salient Acoustic Event Detection Boris Schauerte & Rainer Stiefelhagen Karlsruhe Institute of Technology {boris.schauerte, rainer.stiefelhagen}@kit.edu A BSTRACT We propose the use of Bayesian surprise to detect arbitrary, salient acoustic events. We use Gaussian or Gamma distributions to model the spectrogram distribution and use the Kullback-Leibler divergence of the posterior and prior distribution to calculate how “unexpected” and thus surprising newly observed audio samples are. This way, we efficiently detect arbitrary surprising/salient acoustic events. M OTIVATION identify subsets within sensory inputs that are likely to contain important information focus complex processing operations on the selected, potentially relevant information in general, drastically reduce the computa- tional requirements to process data real-time processing and reflex-like reac- tions despite computational restrictions P RINCIPLE an observed spectrogram element G(t, ω ) is “surprising” if the updated (using Bayes’ rule) distribution P ω post differs significantly from the prior distribution P ω prior S A (t, ω )= D KL (P ω post ||P ω prior ) (1) = Z P ω post log P ω post P ω prior dg (2) with Kullback-Leiber divergence D KL surprise at time t over all frequencies S A (t)= 1 |Ω| X ωΩ S A (t, ω ) (3) the unit of surprise is called “wow” [4] M ODELS Gaussian: S A (t, ω )= 1 2 [log |Σ ω prior | |Σ ω post | + Tr h Σ ω -1 prior Σ ω post i - I D + (4) (μ ω post - μ ω prior ) T Σ ω -1 prior (μ ω post - μ ω prior )] mean μ and variance Σ of the data in the considered time window, i.e. history advantage: exact closed form solution exists (highly efficient to calculate) Gamma: S A (t, ω )= α 0 log β β 0 + log Γ(α 0 ) Γ(α) (5) +β 0 α β +(α - α 0 )ψ (α) (6) α, β > 0, and Gamma function Γ and Digamma function ψ advantage: better control over the history using the decay/forgetting factor 0 <ζ< 1 and update rule α 0 = ζα + G(t, ω ) (7) β 0 = ζβ +1 (8) E VALUATION Idea: we can not simply observe humans to pro- vide a measure of acoustic saliency pragmatic, application-oriented approach: use existing acoustic event detection and classification datasets salient acoustic event detection has to sup- press “uninteresting” audio data while highlighting potentially relevant and thus salient data segments CLEAR2007 acoustic event detection dataset: recordings of meetings in a smart room a human user marked and classified (14 classes) acoustic events not all events could be classified by the hu- man user (i.e., “unknown” class) F β score as evaluation measure: F β = (1 + β 2 ) · precision · recall (β 2 · precision) + recall (9) we want to detect all prominent events, i.e. a high recall is most important we can tolerate false positives as long as we achieve a net run-time benefit when focus- ing subsequent algorithms, i.e. a high preci- sion is of secondary interest β times as much importance to recall as precision” Results: F 1 F 2 F 4 STFT + Gamma 0.7668 0.8924 0.9665 STCT + Gamma 0.7658 0.8916 0.9655 MDCT + Gamma 0.7644 0.8894 0.9647 STFT + Gaussian 0.7604 0.8832 0.9531 STCT + Gaussian 0.7612 0.8813 0.9529 MDCT + Gaussian 0.7613 0.8805 0.9538 A PPLICATIONS Robotics [1]: the robot can efficiently detect, investigate, and react on arbitrary, unexpected - i.e., in- teresting - events focus and make better use of the robot’s lim- ited computational ressources Intensive Care [2]: use Gaussian surprise to detect (sudden) patient agitation based on facial features B IOLOGICAL M OTIVATION spectrogram basilar membrane [3] surprise early sensory neurons [4] C ODE ? Gamma and Gaussian surprise implemen- tation public (BSD license) at http:// bit.ly/ZjzXqr comes with a ready to go audio example References [1] B. Schauerte, B. Kühn, K. Kroschel, R. Stiefelhagen, Multimodal Saliency-based Attention for Object-based Scene Analysis, in IROS, 2011 [2] M. Martinez, R. Stiefelhagen, Automated Multi-Camera System for Long Term Behavioral Monitoring in Intensive Care Units, in MVA, 2013 [3] Schnupp J., Nelken I., King A, Auditory Neuroscience, MIT Press, Cambridge, MA, 2011. [4] L. Itti and P. F. Baldi, Bayesian surprise attracts human attention, in NIPS, 2006. Acknowledgment This work was partially funded by the French state agency for innovation (OSEO) within the Quaero programme and the German research foundation (DFG) within the collab- orative research program “Humanoide Roboter”.

Upload: others

Post on 20-May-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: “Wow!”BayesianSurpriseforSalientAcousticEventDetectionbschauer/...“Wow!”BayesianSurpriseforSalientAcousticEventDetection Boris Schauerte & Rainer Stiefelhagen Karlsruhe Institute

“Wow!” BayesianSurprise forSalientAcousticEventDetectionBoris Schauerte & Rainer Stiefelhagen

Karlsruhe Institute of Technologyboris.schauerte, [email protected]

ABSTRACT

We propose the use of Bayesian surprise to detect arbitrary, salient acoustic events.We use Gaussian or Gamma distributions to model the spectrogram distribution anduse the Kullback-Leibler divergence of the posterior and prior distribution to calculatehow “unexpected” and thus surprising newly observed audio samples are. This way,we efficiently detect arbitrary surprising/salient acoustic events.

MOTIVATION• identify subsets within sensory inputs that

are likely to contain important information

• focus complex processing operations on theselected, potentially relevant information

• in general, drastically reduce the computa-tional requirements to process data

• real-time processing and reflex-like reac-tions despite computational restrictions

PRINCIPLE• an observed spectrogram element G(t, ω) is

“surprising” if the updated (using Bayes’rule) distribution Pωpost differs significantlyfrom the prior distribution Pωprior

SA(t, ω) = DKL(Pωpost||Pωprior) (1)

=

∫Pωpost log

Pωpost

Pωprior

dg (2)

with Kullback-Leiber divergence DKL

• surprise at time t over all frequencies

SA(t) =1

|Ω|∑ω∈Ω

SA(t, ω) (3)

• the unit of surprise is called “wow” [4]

MODELSGaussian:

SA(t, ω) =1

2[log|Σωprior||Σωpost|

+ Tr[Σω

−1

priorΣωpost

]− ID + (4)

(µωpost − µωprior)TΣω

−1

prior(µωpost − µωprior)]

• mean µ and variance Σ of the data in theconsidered time window, i.e. history

• advantage: exact closed form solution exists(highly efficient to calculate)

Gamma:

SA(t, ω) = α′ logβ

β′ + logΓ(α′)

Γ(α)(5)

+β′ α

β+ (α− α′)ψ(α) (6)

• α, β > 0, and Gamma function Γ andDigamma function ψ

• advantage: better control over the historyusing the decay/forgetting factor 0 < ζ < 1and update rule

α′ = ζα+G(t, ω) (7)

β′ = ζβ + 1 (8)

EVALUATIONIdea:

• we can not simply observe humans to pro-vide a measure of acoustic saliency

• pragmatic, application-oriented approach:use existing acoustic event detection andclassification datasets

• salient acoustic event detection has to sup-press “uninteresting” audio data whilehighlighting potentially relevant and thussalient data segments

CLEAR2007 acoustic event detection dataset:

• recordings of meetings in a smart room

• a human user marked and classified (14classes) acoustic events

• not all events could be classified by the hu-man user (i.e., “unknown” class)

Fβ score as evaluation measure:

Fβ = (1 + β2) ·precision · recall

(β2 · precision) + recall(9)

• we want to detect all prominent events, i.e.a high recall is most important

• we can tolerate false positives as long as weachieve a net run-time benefit when focus-ing subsequent algorithms, i.e. a high preci-sion is of secondary interest

• “β times as much importance to recall asprecision”

Results:F1 F2 F4

STFT + Gamma 0.7668 0.8924 0.9665STCT + Gamma 0.7658 0.8916 0.9655MDCT + Gamma 0.7644 0.8894 0.9647STFT + Gaussian 0.7604 0.8832 0.9531STCT + Gaussian 0.7612 0.8813 0.9529MDCT + Gaussian 0.7613 0.8805 0.9538

APPLICATIONSRobotics [1]:

• the robot can efficiently detect, investigate,and react on arbitrary, unexpected - i.e., in-teresting - events

• focus and make better use of the robot’s lim-ited computational ressources

Intensive Care [2]:

• use Gaussian surprise to detect (sudden)patient agitation based on facial features

BIOLOGICAL MOTIVATION• spectrogram ∼ basilar membrane [3]

• surprise ∼ early sensory neurons [4]

CODE?• Gamma and Gaussian surprise implemen-

tation public (BSD license) at http://bit.ly/ZjzXqr

• comes with a ready to go audio example

References[1] B. Schauerte, B. Kühn, K. Kroschel, R. Stiefelhagen,

Multimodal Saliency-based Attention for Object-basedScene Analysis, in IROS, 2011

[2] M. Martinez, R. Stiefelhagen, Automated Multi-CameraSystem for Long Term Behavioral Monitoring in IntensiveCare Units, in MVA, 2013

[3] Schnupp J., Nelken I., King A, Auditory Neuroscience,MIT Press, Cambridge, MA, 2011.

[4] L. Itti and P. F. Baldi, Bayesian surprise attracts humanattention, in NIPS, 2006.

AcknowledgmentThis work was partially funded by the French state

agency for innovation (OSEO) within the Quaero programmeand the German research foundation (DFG) within the collab-orative research program “Humanoide Roboter”.

1