![Page 1: Duraid Y. Mohammed Philip J. Duncan Francis F. Li. School of Computing Science and Engineering, University of Salford UK Audio Content Analysis in The](https://reader034.vdocuments.us/reader034/viewer/2022051401/5697bf761a28abf838c80b19/html5/thumbnails/1.jpg)
Audio Content Analysis in the Presence of Overlapped Classes - A Non-Exclusive Segmentation Approach to
Mitigate Information Losses
Global Summit and Expo on Multimedia & Applications
August 10-11, 2015 Birmingham, UK
![Page 2: Duraid Y. Mohammed Philip J. Duncan Francis F. Li. School of Computing Science and Engineering, University of Salford UK Audio Content Analysis in The](https://reader034.vdocuments.us/reader034/viewer/2022051401/5697bf761a28abf838c80b19/html5/thumbnails/2.jpg)
The increasing volume of digital media archives is leading to increased demand for these goals.
Introduction
![Page 3: Duraid Y. Mohammed Philip J. Duncan Francis F. Li. School of Computing Science and Engineering, University of Salford UK Audio Content Analysis in The](https://reader034.vdocuments.us/reader034/viewer/2022051401/5697bf761a28abf838c80b19/html5/thumbnails/3.jpg)
Classification- Challenge and Solution
[Figure: overlapping classes labelled Speech, SE and Music]
Classical classification problems are logically exclusive, i.e. an element is assumed to be a member of one class and of that class only. This hinders some practical uses in audio information mining, since a segment of a soundtrack can contain speech, music, event sounds or any combination of them (a fuzzy element).
Non-exclusive classification can mitigate information losses.
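The difference between the two labelling schemes can be illustrated with a minimal sketch. The class names, the per-class scores and the 0.5 threshold are assumptions made for illustration, not values from the presentation.

```python
# Hypothetical class list; the slides name speech, music and event sounds.
CLASSES = ("speech", "music", "sound_event")

def exclusive_label(scores):
    """Classical (exclusive) decision: keep only the highest-scoring class."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return CLASSES[best]

def non_exclusive_labels(scores, threshold=0.5):
    """Non-exclusive decision: keep every class whose score passes the threshold."""
    return [name for name, score in zip(CLASSES, scores) if score >= threshold]

# A segment with speech over music: both scores are high.
scores = [0.8, 0.7, 0.1]
print(exclusive_label(scores))       # the music label is lost
print(non_exclusive_labels(scores))  # both overlapping classes are kept
```

With exclusive labelling the segment is tagged as speech only, so the music information is discarded; the non-exclusive version keeps both labels.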
![Page 4: Duraid Y. Mohammed Philip J. Duncan Francis F. Li. School of Computing Science and Engineering, University of Salford UK Audio Content Analysis in The](https://reader034.vdocuments.us/reader034/viewer/2022051401/5697bf761a28abf838c80b19/html5/thumbnails/4.jpg)
A system integration approach to audio information mining can hypothetically be built upon successes in the following diverse areas.
To re-deploy these tools, it is essential that a pre-processor should effectively identify where speech, music and audio events of interest occur. These audio segments can then be processed by dedicated algorithms to obtain further information.
The Concept
[Figure: example segment labels "Hello" and "Door Knock"]
![Page 5: Duraid Y. Mohammed Philip J. Duncan Francis F. Li. School of Computing Science and Engineering, University of Salford UK Audio Content Analysis in The](https://reader034.vdocuments.us/reader034/viewer/2022051401/5697bf761a28abf838c80b19/html5/thumbnails/5.jpg)
Universal Open Architecture
![Page 6: Duraid Y. Mohammed Philip J. Duncan Francis F. Li. School of Computing Science and Engineering, University of Salford UK Audio Content Analysis in The](https://reader034.vdocuments.us/reader034/viewer/2022051401/5697bf761a28abf838c80b19/html5/thumbnails/6.jpg)
Spectral Subtraction Algorithm
A noise reduction technique.
A VAD is employed to detect musical speech and musical segments.
Calculate the spectral magnitude of the musical and musical speech segments.
Estimate the clean speech through the following formula:
$$\hat{s}(i) = \left[\,|x(i)| - |\hat{d}(i)|\,\right] e^{j\varphi(d(i))}$$
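The magnitude subtraction step can be sketched for a single frame as follows. This is a minimal illustration rather than the authors' implementation: the naive DFT, the flooring of negative magnitudes at zero and the reuse of the mixed frame's phase (the common textbook choice) are all assumptions, and the function names are hypothetical.

```python
import cmath
import math

def dft(frame):
    """Naive discrete Fourier transform of a real-valued frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Naive inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def spectral_subtract(mixed_frame, interference_magnitude):
    """Subtract an estimated interference magnitude from one mixed frame.

    Assumptions: negative magnitudes are floored at zero and the phase of
    the mixed frame is reused for resynthesis.
    """
    cleaned = []
    for x_k, d_k in zip(dft(mixed_frame), interference_magnitude):
        magnitude = max(abs(x_k) - d_k, 0.0)
        cleaned.append(cmath.rect(magnitude, cmath.phase(x_k)))
    return idft(cleaned)

# Toy frame: a "speech" tone in bin 2 plus a "music" tone in bin 5.
n = 16
speech = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
music = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
mixed = [s + m for s, m in zip(speech, music)]

# Magnitude estimate taken from a music-only frame, as a VAD would allow.
music_magnitude = [abs(v) for v in dft(music)]
estimate = spectral_subtract(mixed, music_magnitude)
```

In this toy case the two tones occupy different frequency bins, so the estimate closely matches the speech tone; real speech and music overlap in frequency and the result is only approximate.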
![Page 7: Duraid Y. Mohammed Philip J. Duncan Francis F. Li. School of Computing Science and Engineering, University of Salford UK Audio Content Analysis in The](https://reader034.vdocuments.us/reader034/viewer/2022051401/5697bf761a28abf838c80b19/html5/thumbnails/7.jpg)
![Page 8: Duraid Y. Mohammed Philip J. Duncan Francis F. Li. School of Computing Science and Engineering, University of Salford UK Audio Content Analysis in The](https://reader034.vdocuments.us/reader034/viewer/2022051401/5697bf761a28abf838c80b19/html5/thumbnails/8.jpg)
Feature Spaces
Data reduction.
Extract characteristic features.
Mel Frequency Cepstrum Coefficients (MFCCs).
STFT: temporal pattern analysis.
ZCR, RMS 'loudness', entropy, short-term energy.
Optimized feature space for speech and music detection.
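The simpler time-domain features listed above can be sketched per frame as follows. This is an illustrative sketch only: the exact definitions used in the presentation are not given, so the formulas below (in particular the block-energy entropy and its four sub-blocks) are assumptions, and MFCC and STFT features are left out.

```python
import math

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (ZCR)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def short_term_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(v * v for v in frame) / len(frame)

def rms(frame):
    """Root mean square level, a rough proxy for loudness."""
    return math.sqrt(short_term_energy(frame))

def energy_entropy(frame, blocks=4):
    """Shannon entropy (bits) of the energy distribution over sub-blocks."""
    size = len(frame) // blocks
    energies = [sum(v * v for v in frame[i * size:(i + 1) * size])
                for i in range(blocks)]
    total = sum(energies)
    if total == 0:
        return 0.0
    return -sum((e / total) * math.log2(e / total) for e in energies if e > 0)

def feature_vector(frame):
    """Reduce a frame of samples to a small vector of characteristic features."""
    return [zero_crossing_rate(frame), rms(frame),
            short_term_energy(frame), energy_entropy(frame)]

print(feature_vector([1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]))
```

Each frame of samples is reduced to four numbers, which is the data reduction step that precedes classification.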
![Page 9: Duraid Y. Mohammed Philip J. Duncan Francis F. Li. School of Computing Science and Engineering, University of Salford UK Audio Content Analysis in The](https://reader034.vdocuments.us/reader034/viewer/2022051401/5697bf761a28abf838c80b19/html5/thumbnails/9.jpg)
• Music Analysis Retrieval and SYnthesis for Audio Signals.
• Open source framework for audio processing by George Tzanetakis, University of Victoria, Canada.
• Development of real time audio analysis and synthesis tools.
• Audio processing system with specific emphasis on MIR.
• Implemented for exclusive classification (Speech or Music).
• Music genre organisation.
![Page 10: Duraid Y. Mohammed Philip J. Duncan Francis F. Li. School of Computing Science and Engineering, University of Salford UK Audio Content Analysis in The](https://reader034.vdocuments.us/reader034/viewer/2022051401/5697bf761a28abf838c80b19/html5/thumbnails/10.jpg)
Speech and Music classes are involved as a starting point.
Toward generalization, different styles of samples were included in the training set.
Speech samples (children, male, female, speakers of different languages, loud speech, speech with laughter).
Music: all genres are added (jazz, pop, classical, rock, …).
All speech and music samples were mixed together after normalizing them to produce speech-over-music samples.
Training Database Building
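| Sample type | Speech | Music |
|---|---|---|
| Pure Speech | 100% | 0% |
| Mix Samples | 90% | 10% |
| Mix Samples | 80% | 20% |
| Mix Samples | 70% | 30% |
| Mix Samples | 60% | 40% |
| Mix Samples | 50% | 50% |
| Mix Samples | 40% | 60% |
| Mix Samples | 30% | 70% |
| Pure Music | 0% | 100% |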
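The mixing procedure can be sketched as below. The slides do not say how samples were normalized or how the percentages were applied, so peak normalization and linear amplitude weighting are assumptions here, and the function names are hypothetical.

```python
# Speech proportions taken from the training table (music is the remainder).
SPEECH_RATIOS = (1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.0)

def peak_normalize(signal):
    """Scale a signal so its largest absolute sample is 1 (assumed scheme)."""
    peak = max(abs(v) for v in signal)
    return [v / peak for v in signal] if peak > 0 else list(signal)

def mix(speech, music, speech_ratio):
    """Weighted sum of normalized speech and music, truncated to the shorter one."""
    s = peak_normalize(speech)
    m = peak_normalize(music)
    length = min(len(s), len(m))
    return [speech_ratio * s[i] + (1.0 - speech_ratio) * m[i]
            for i in range(length)]

def build_training_mixes(speech, music):
    """One mixed sample per speech/music proportion in the table."""
    return {ratio: mix(speech, music, ratio) for ratio in SPEECH_RATIOS}

mixes = build_training_mixes([0.5, -0.25], [2.0, 2.0])
print(mixes[0.5])
```

The dictionary is keyed by speech proportion, so `mixes[1.0]` is the pure speech sample and `mixes[0.0]` is the pure music sample.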
![Page 11: Duraid Y. Mohammed Philip J. Duncan Francis F. Li. School of Computing Science and Engineering, University of Salford UK Audio Content Analysis in The](https://reader034.vdocuments.us/reader034/viewer/2022051401/5697bf761a28abf838c80b19/html5/thumbnails/11.jpg)
Toolbox Demonstration
![Page 12: Duraid Y. Mohammed Philip J. Duncan Francis F. Li. School of Computing Science and Engineering, University of Salford UK Audio Content Analysis in The](https://reader034.vdocuments.us/reader034/viewer/2022051401/5697bf761a28abf838c80b19/html5/thumbnails/12.jpg)
Results Comparison Before and After Speech Enhancement
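| Audio class | MARSYAS Fr | MARSYAS Fa | MARSYAS ERD | ED UOA Fr | ED UOA Fa | ED UOA ERD | Length (seconds) |
|---|---|---|---|---|---|---|---|
| Speech | 45.56% | 7.03% | 26.30% | 2.49% | 8.45% | 5.47% | 1580 |
| Music | 7.70% | 45.56% | 26.63% | 11.76% | 1.53% | 6.65% | 2115 |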
$$\text{Error Detection Rate (ERD)} = (Fr + Fa)/2$$
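As a quick check, the ERD column can be recomputed from the Fr and Fa columns of the results table. The dictionary layout and names below are illustrative only; the figures are the ones reported on the slide.

```python
def error_detection_rate(fr, fa):
    """ERD as the mean of the Fr and Fa rates, in percent."""
    return (fr + fa) / 2

# (Fr, Fa, reported ERD) in percent, copied from the results table.
reported = {
    ("MARSYAS", "speech"): (45.56, 7.03, 26.30),
    ("MARSYAS", "music"): (7.70, 45.56, 26.63),
    ("ED UOA", "speech"): (2.49, 8.45, 5.47),
    ("ED UOA", "music"): (11.76, 1.53, 6.65),
}

for (system, audio_class), (fr, fa, erd) in reported.items():
    computed = error_detection_rate(fr, fa)
    print(f"{system:8s} {audio_class:7s} computed={computed:.3f} reported={erd:.2f}")
```

The recomputed values agree with the reported ERD figures to within rounding.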
![Page 13: Duraid Y. Mohammed Philip J. Duncan Francis F. Li. School of Computing Science and Engineering, University of Salford UK Audio Content Analysis in The](https://reader034.vdocuments.us/reader034/viewer/2022051401/5697bf761a28abf838c80b19/html5/thumbnails/13.jpg)
Open structure and common interfaces toward a general classifier.
Redeployment of currently available techniques.
Encourage third party contributions.
Rapid prototyping of UOA Audio Information Mining system.
Summary and Conclusions
![Page 14: Duraid Y. Mohammed Philip J. Duncan Francis F. Li. School of Computing Science and Engineering, University of Salford UK Audio Content Analysis in The](https://reader034.vdocuments.us/reader034/viewer/2022051401/5697bf761a28abf838c80b19/html5/thumbnails/14.jpg)
Thank you for Listening
![Page 15: Duraid Y. Mohammed Philip J. Duncan Francis F. Li. School of Computing Science and Engineering, University of Salford UK Audio Content Analysis in The](https://reader034.vdocuments.us/reader034/viewer/2022051401/5697bf761a28abf838c80b19/html5/thumbnails/15.jpg)
Audio Routing
![Page 16: Duraid Y. Mohammed Philip J. Duncan Francis F. Li. School of Computing Science and Engineering, University of Salford UK Audio Content Analysis in The](https://reader034.vdocuments.us/reader034/viewer/2022051401/5697bf761a28abf838c80b19/html5/thumbnails/16.jpg)
Machine Learning
![Page 17: Duraid Y. Mohammed Philip J. Duncan Francis F. Li. School of Computing Science and Engineering, University of Salford UK Audio Content Analysis in The](https://reader034.vdocuments.us/reader034/viewer/2022051401/5697bf761a28abf838c80b19/html5/thumbnails/17.jpg)
Sound Events Detections
![Page 18: Duraid Y. Mohammed Philip J. Duncan Francis F. Li. School of Computing Science and Engineering, University of Salford UK Audio Content Analysis in The](https://reader034.vdocuments.us/reader034/viewer/2022051401/5697bf761a28abf838c80b19/html5/thumbnails/18.jpg)
ASR
![Page 19: Duraid Y. Mohammed Philip J. Duncan Francis F. Li. School of Computing Science and Engineering, University of Salford UK Audio Content Analysis in The](https://reader034.vdocuments.us/reader034/viewer/2022051401/5697bf761a28abf838c80b19/html5/thumbnails/19.jpg)
Role of MIR in UOA