Universitat Politècnica de Catalunya
Master Thesis
Impact of audio degradation on music classification
Author: Francesc Capó Clar
Supervisor: Dr. Andreas Rauber
A thesis submitted in fulfilment of the requirements for the degree of Telecommunications Engineering
in the
Departament de Teoria del Senyal i Comunicacions
Escola Tècnica Superior d'Enginyeria de Telecomunicació de Barcelona
July 2014
Declaration of Authorship
I, Francesc Capó Clar, declare that this thesis titled, ’Impact of audio degradation on
music classification’ and the work presented in it are my own. I confirm that:
• This work was done wholly or mainly while in candidature for a research degree at this University.
• Where any part of this thesis has previously been submitted for a degree or any other qualification at this University or any other institution, this has been clearly stated.
• Where I have consulted the published work of others, this is always clearly attributed.
• Where I have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely my own work.
• I have acknowledged all main sources of help.
• Where the thesis is based on work done by myself jointly with others, I have made clear exactly what was done by others and what I have contributed myself.
Signed:
Date:
“Music expresses that which cannot be said and on which it is impossible to be silent.”
Victor Hugo
UNIVERSITAT POLITÈCNICA DE CATALUNYA
Escola Tècnica Superior d'Enginyeria de Telecomunicació de Barcelona
Departament de Teoria del Senyal i Comunicacions
Telecommunications Engineering
Impact of audio degradation on music classification
by Francesc Capó Clar
Abstract
Music genre classification is an important task within the domain of Music Information Retrieval that seeks algorithms and audio-analysis methods to classify audio tracks into different musical styles or genres, such as classical, rock or electronic, among others. The benchmark collections used to train the classifiers tend to be based on high-quality studio recordings. Nevertheless, we sometimes need to classify audio that comes from other recording sources or from low-quality recordings, and this can have a direct impact on genre classification.

In this document we present extensive studies of the impact of distorted audio on music genre classification. We aim to identify the most robust and the weakest attributes of several feature sets against a variety of distortions applied in controlled settings, as well as their effect on the resulting classification.
Keywords: Music genre classification, audio degradation, Music Information Retrieval,
automatic classification, music features robustness
Acknowledgements
First of all, I would like to thank my supervisor, Andreas Rauber, for suggesting this interesting project to me and for his help, understanding and patience with me during my learning and work.

Next, thanks to my home university supervisor, Antonio Bonafonte, for resolving all my questions and doubts.

In particular, I would like to thank my parents and my family for giving me the opportunity to study this degree, for their unconditional support during all these years and for helping me through the hardest moments.

Further, my thanks to all my lifelong friends and the “telecos” for encouraging me at every moment and for making this journey more enjoyable and pleasant. Especially, thanks to Xavi Bernat, Pere Guillem Mas and Bernat Orell for their technical help, and to Maria Antònia Orell for her linguistic support.

Finally, thanks to the Technische Universität Wien for receiving me as a student, and to the city of Vienna for making my Erasmus an unrepeatable experience.

Xesc
Agraïments

First of all, I would like to thank my supervisor, Andreas Rauber, for suggesting this interesting project to me and for his help, understanding and patience with me during my learning and work.

Also, my thanks to my home university supervisor, Antonio Bonafonte, for resolving all my questions and doubts.

In particular, I would like to thank my parents and my family for giving me the opportunity to study this degree, for their unconditional support during all these years and for helping me through the hardest moments.

Also, special thanks to my lifelong friends and the “telecos” for encouraging me at every moment and for making this journey much more fun and pleasant. In particular, thanks to Xavi Bernat, Pere Guillem Mas and Bernat Orell for their technical help, as well as to Maria Antònia Orell for her linguistic support.

Finally, thanks to the Technische Universität Wien for welcoming me as a student, as well as to the city of Vienna for making my Erasmus an unrepeatable experience.

Xesc
Contents
Declaration of Authorship
Abstract
Acknowledgements / Agraïments
Contents
List of Figures
List of Tables
Abbreviations

1 Introduction
  1.1 Music Information Retrieval
  1.2 Music genre classification
  1.3 Motivation
  1.4 Thesis outline

2 Related Work
  2.1 Music classification
  2.2 Audio degradation
  2.3 Noise models

3 Experimental Set-up
  3.1 Data Sets
  3.2 Audio Degradation Toolbox
    3.2.1 Synthetic Distortions
    3.2.2 Real World Distortions
  3.3 Feature Sets
  3.4 Machine Learning Software: Weka

4 Impact of Degradations
  4.1 Effect on features
    4.1.1 Feature processing
    4.1.2 Feature differences
  4.2 Effect on classification
    4.2.1 Creation of 10-CV folds
    4.2.2 Analysis of classification results

5 Results classifying with mixed degradations
  5.1 Creation of training and mixed test sets
  5.2 Training and classifying with all attributes
  5.3 Attribute selection
    5.3.1 Attribute selection process
    5.3.2 Possible results with attribute selection
    5.3.3 Training and classifying with most robust attributes
    5.3.4 Training with all attributes and classifying missing the weakest attributes

6 Summary and Further Work
  6.1 Summary
  6.2 Further work

A Worst degradations - Attribute selection
  A.1 Mean differences of ISMIR worst degradations
  A.2 Variance differences of ISMIR worst degradations
  A.3 Mean differences of GTZAN worst degradations
  A.4 Variance differences of GTZAN worst degradations

B Classification of mixed degradations
  B.1 Complete classification results of Section 5.3.3
  B.2 Complete classification results of Section 5.3.4

C Attached files

Bibliography
List of Figures

1.1 Spotify Radio Genres
2.1 Weighting Loudness Filter Curves
2.2 Auditory Masking
2.3 Dolby System
3.1 Feature extraction process
3.2 10-fold cross validation
4.1 Mean and variance differences calculation
4.2 Folder structure for the features processing
4.3 RP mean differences with Smartphone Recording degradation
4.4 RH mean differences with Vinyl degradation
4.5 SSD mean differences with Low Pass Filtering degradation
4.6 MVD mean differences with Smartphone Playback degradation
5.1 Creation of mixed degraded test set
5.2 Creation of worst degradation
5.3 ISMIR MVD worst degradation and attribute selection
5.4 ISMIR RP worst degradation and attribute selection
5.5 ISMIR RH worst degradation and attribute selection
5.6 ISMIR SSD worst degradation and attribute selection
5.7 ISMIR TSSD worst degradation and attribute selection
5.8 ISMIR TRH worst degradation and attribute selection
A.1 ISMIR worst degradations variances and attribute selection
A.2 GTZAN worst degradations means and attribute selection
A.3 GTZAN worst degradations variances and attribute selection
List of Tables

2.1 Central frequency (in Hz) of frequency bands of the Bark Scale
4.1 Correct mean classification percentage of clean data sets
4.2 GTZAN data set: vinyl degradation classification
4.3 GTZAN data set: harmonic distortion classification
4.4 ISMIR data set: low pass filtering degradation classification
5.1 Mixed degraded classification
5.2 ISMIR attribute selection
5.3 GTZAN attribute selection
5.4 ISMIR, mean, most robust attributes, strong selection
5.5 GTZAN, mean, most robust attributes, strong selection
5.6 ISMIR, mean, missing weakest attributes, strong selection
5.7 GTZAN, mean, missing weakest attributes, strong selection
B.1 ISMIR, mean, most robust attributes, tolerant selection
B.2 ISMIR, variance, most robust attributes, tolerant selection
B.3 ISMIR, variance, most robust attributes, strong selection
B.4 GTZAN, mean, most robust attributes, tolerant selection
B.5 GTZAN, variance, most robust attributes, tolerant selection
B.6 GTZAN, variance, most robust attributes, strong selection
B.7 ISMIR, mean, missing weakest attributes, tolerant selection
B.8 ISMIR, variance, missing weakest attributes, tolerant selection
B.9 ISMIR, variance, missing weakest attributes, strong selection
B.10 GTZAN, mean, missing weakest attributes, tolerant selection
B.11 GTZAN, variance, missing weakest attributes, tolerant selection
B.12 GTZAN, variance, missing weakest attributes, strong selection
Abbreviations
MIR Music Information Retrieval
ISMIR International Society for Music Information Retrieval
MFCC Mel-Frequency Cepstral Coefficients
ADT Audio Degradation Toolbox
RP Rhythm Patterns
RH Rhythm Histogram
SSD Statistical Spectrum Descriptor
TSSD Temporal Statistical Spectrum Descriptor
MVD Modulation frequency Variance Descriptor
TRH Temporal Rhythm Histograms
10-CV 10-fold Cross Validation
SVM Support Vector Machines
KNN K-Nearest Neighbours
Chapter 1
Introduction
This project was carried out in the Music Information Retrieval Group¹ of the Institute of Software Technology and Interactive Systems² at Technische Universität Wien³, under the supervision of Dr. Andreas Rauber.
1.1 Music Information Retrieval
Music Information Retrieval, or MIR, is a growing field of research concerned, as the name itself suggests, with retrieving information from music. Researchers in this field combine a mathematical and scientific background with knowledge of musicology, psychology and academic music studies. Some of the most important applications are recommender systems, track separation and music recognition, automatic music transcription, music generation and automatic categorization, to which this thesis is related.
Every year, the International Society for Music Information Retrieval holds the ISMIR conference [1], where MIR researchers from around the world present their studies and share advances in the different areas of the field.
The methods used in MIR studies are common to most of them: a specific data source (used for training or as a benchmark); feature extraction, processing and representation (which capture the information extracted from each music track); and finally statistics and machine learning (to classify and obtain the results). This document focuses on music genre classification.

¹ http://www.ifs.tuwien.ac.at/mir/index.html
² http://www.ifs.tuwien.ac.at/
³ http://www.tuwien.ac.at/
1.2 Music genre classification
Nowadays we can access several vast musical databases thanks to widespread internet services (e.g. Spotify, Grooveshark, iTunes), creating a need for methods to search and organise these databases. One way to do this is to classify the audio tracks according to their genre or musical style (e.g. Figure 1.1). This information can be extracted directly from the metadata present in several of the newer audio formats, such as MP3 files, but the genre specified there may not fit our own classification scheme, i.e. it could be more general (e.g. classical, electronic) or more specific (e.g. baroque, dance). Moreover, other audio formats, such as audio CDs, carry no metadata at all. Thus, we need an automatic, objective process to classify audio tracks into a specific list of available genres.

Figure 1.1: Genre classification used on Spotify Radio

In order to perform this classification we need a benchmark data set, which has to contain several audio tracks already classified into the genres into which we want to classify future audio files. Thus, the bigger the benchmark data set, the larger the classification task it can support.
1.3 Motivation
Normally, the benchmark collections used to train the classifiers tend to be based on a single source, or on high-quality studio recordings. Indeed, this is not a problem if the audio that we need to classify also comes from a high-quality recording, i.e. the audio used as a benchmark and the audio that we want to classify share a homogeneous recording quality.

However, some users compile their own audio collections from different data sources, i.e. combining recordings from vinyl, smartphones, live recordings, and ethnic or historical recordings, which can result in a mixture of recordings with different encoding qualities and several distortions. This can have a direct impact on genre classification, because the degraded audio has altered musical features and may therefore be assigned to the wrong genre. The main goal of this thesis is to study exactly what the effect of these degradations on genre classification is, as well as their effect on prominent musical features. However, we are not interested in an absolute performance improvement of the correct classification, in complexity classes, or in real-world collection sizes.
1.4 Thesis outline
We start by reviewing related work in the field of audio classification, including human and machine-learning classification, as well as the different degradation and noise models that we could find in low-quality recordings, in Chapter 2.

In Chapter 3 we present our experimental set-up, including all the software, features and classifiers used in the different experiments.

The degradations have an impact on the features, as well as on the classification; this is presented in Chapter 4.

In Chapter 5 we present the main results of this study, as well as two ways of trying to increase our correct classification percentage.

Finally, in Chapter 6 we present our conclusions and a summary of our experiments, and we also propose new experiments to continue working in this field of study.
Chapter 2
Related Work
Music classification is one of the dominant areas of MIR research. This study concerns music genre classification, but there are also studies on mood or even author classification, which have to be very precise. In addition, there are several studies on audio degradation and noise models that can be connected to classification, which is the goal of this thesis. In this section we review several studies related to these topics.
2.1 Music classification
One of the data sets used in this thesis (more details in Section 3.1) was collected by G. Tzanetakis in order to carry out an extensive study of automatic genre classification [2], using feature sets different from the ones we use in our study. The features he uses are Timbral Texture Features (including spectral centroid, spectral rolloff, spectral flux, time-domain zero crossings, MFCCs, an analysis and texture window, and a low-energy feature), a Timbral Texture Feature Vector (consisting of several statistical measures of the Timbral Texture Features), Rhythmic Content Features (a beat histogram built from the correlation of several envelopes extracted from the octave frequency bands of the discrete wavelet transform) and Pitch Content Features (based on multiple pitch detection techniques). The classifiers used in this study are Simple Gaussian, Gaussian Mixture Model, Expectation-Maximization and K-Nearest Neighbour.

Regarding classification, they first perform a general classification (across 10 genres), where they achieve a maximum correct classification of 60% of the instances. They also perform a sub-genre classification of the classical genre (between choir, orchestra, piano and string quartet), achieving a maximum correct classification of 88% of the instances, as well as a sub-genre classification of the jazz genre (between big band, cool, fusion, piano, quartet and swing), achieving a maximum correct classification of 68%. In addition, they present the confusion matrices for the different classifications. They also carry out a classification by humans in order to compare both: humans achieve 53% correctly classified instances with only 250 ms of each audio track and 70% with 3 seconds.
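Two of the timbral texture features named above can be sketched in a few lines. The following Python sketch (illustrative only; the implementation in [2] differs in framing and windowing details) computes the spectral centroid and the zero-crossing rate of a single frame, using a toy sampling rate chosen so the test tone falls exactly on an FFT bin:

```python
import numpy as np

def spectral_centroid(frame, sr):
    """Centre of mass of the magnitude spectrum of one audio frame (in Hz)."""
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return np.sum(freqs * mag) / np.sum(mag)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    return np.mean(np.abs(np.diff(np.signbit(frame).astype(int))))

sr = 1024                                # toy rate so the tone sits on a bin
t = np.arange(sr) / sr                   # one second of samples
frame = np.sin(2 * np.pi * 100 * t)      # pure 100 Hz tone
print(round(float(spectral_centroid(frame, sr)), 1))   # → 100.0
```

For a pure tone the centroid coincides with the tone frequency; for broadband or noisy frames it moves towards the middle of the spectrum, which is why it is useful as a brightness descriptor.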
Another related study was carried out by A. Schindler and A. Rauber [3], in which they perform a classification of four different data sets, two of them also used in our study, using several feature sets: Echonest features, Marsyas features and psychoacoustic features. They conclude that the best classification performance in most cases is achieved with the Temporal Echonest Features, which are based on all the statistical moments of Segments Pitches, Segments Timbre, Segments Loudness Max, Segments Loudness Max Time and the lengths of segments calculated from Segments Start. Furthermore, this feature set has only 224 dimensions, so the computational cost required is not as high as for other feature sets that can have more than 1000 dimensions.
Support Vector Machines and K-Nearest Neighbours are two classifiers frequently used in the MIR community. In [4], the authors carry out a study of automatic performer classification over a group of 18 performers, showing better classification using SVM than K-NN. In this study they use conventional features such as MFCCs and their related statistical measures, taking the whole duration of each audio track to extract the features. They also report that, in the case of performer classification, the percentage of correctly classified instances is higher when the songs in the training and testing sets come from the same album than from different albums: they achieve 84% correctly classified instances with songs from the same album and 69% with songs from different albums, both using SVM. The improvement over K-NN in this case is about 15%.
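As an illustration of the simpler of these two classifiers, a minimal K-nearest-neighbour classifier fits in a few lines of Python (a toy sketch with made-up two-dimensional "features", not the set-up of [4]):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)      # Euclidean distances
    nearest = y_train[np.argsort(dists)[:k]]         # labels of k closest
    values, counts = np.unique(nearest, return_counts=True)
    return values[np.argmax(counts)]

# Hypothetical feature vectors for two "genres" (classes 0 and 1).
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],   # class 0
                    [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])  # class 1
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([0.05, 0.1])))   # → 0
print(knn_predict(X_train, y_train, np.array([1.05, 0.95])))  # → 1
```

K-NN needs no training phase beyond storing the data, which is one reason it is a common baseline against which SVMs are compared.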
2.2 Audio degradation
In a study related to software used in this thesis (the Audio Degradation Toolbox), the authors carry out an extensive study of the most common ways of degrading audio tracks [5]. Apart from the instructions and features of the software, they perform experiments on the impact of several audio degradations on standard music informatics methods applied to suitable audio data: the Audio Identification Service provided by EchoNest, score-to-audio alignment, beat tracking and chord detection. In the study they test several real-world degradations (more information about them in Section 3.2.2), comparing their impact on each service.
Another example of a study on audio degradation is [6], in which the authors study the effect of degradation on audio track classification, i.e. for each of the music items in the database there is one class. They use three features in their study: Loudness, which belongs to the category of intensity sensations; the Spectral Flatness Measure (SFM), which is related to the tonality of the audio signal and is used as a discriminating criterion between different audio tracks; and the Spectral Crest Factor (SCF), which is similar to the SFM but uses the maximum values of the audio signal instead of the mean values used in the SFM. All the features are extracted individually from the different frequency bands. Their experiment consists of degrading the audio tracks in several ways, then extracting the features from the degraded tracks as well as from the original tracks, and finally classifying the degraded audio features against the original audio. The degradations performed are: time shifts, cropping, volume change (although the features are designed to be independent of the volume level, so no separate test results are given for it), encoding at 96 kbps MPEG-1/2 Layer-3, equalisation (with adjacent band attenuations set to -6 dB and +6 dB in an alternating fashion), band limiting (low-pass filtering), dynamic range compression, noise addition and loudspeaker-microphone transmission. The results of the experiments differ depending on the frequency band selection, but overall, SFM and SCF are more robust than Loudness against audio degradation.
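The SFM and SCF described above can be sketched as follows (Python; an illustrative version computed on a whole magnitude spectrum rather than per frequency band as in [6]):

```python
import numpy as np

def sfm(mag):
    """Spectral Flatness Measure: geometric mean / arithmetic mean of the
    magnitude spectrum (-> 1 for noise-like, -> 0 for tonal spectra)."""
    mag = np.asarray(mag, dtype=float) + 1e-12    # avoid log(0)
    return np.exp(np.mean(np.log(mag))) / np.mean(mag)

def scf(mag):
    """Spectral Crest Factor: maximum / arithmetic mean of the spectrum."""
    mag = np.asarray(mag, dtype=float)
    return np.max(mag) / np.mean(mag)

flat = np.ones(64)                     # white-noise-like spectrum
tonal = np.zeros(64); tonal[10] = 1.0  # single sinusoid

print(round(sfm(flat), 3), round(sfm(tonal), 3))   # → 1.0 0.0
print(round(scf(flat), 3), round(scf(tonal), 3))   # → 1.0 64.0
```

The two measures move in opposite directions: a flat spectrum gives SFM near 1 and SCF near 1, while a single spectral peak drives SFM towards 0 and SCF towards the number of bins.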
It is sometimes also useful to perform a genre classification by humans in order to compare its results with those of the automatic classification; audio degradation can have a direct impact on human classification as well. In [7], the authors discuss the effects of audio degradation on genre classification by music students. The audio is degraded with 3 different levels of timbre or rhythm alterations. The timbre alterations consist of replacing 3 different frequency bands (the 3rd, 6th and 12th octaves) with Gaussian noise of the same spectral power in that band. The rhythmic degradation, on the other hand, consists of shuffling the frames over three different durations (125, 250 and 500 ms) while preserving the global average and the timbre information. The results show that the rhythm degradations are not significant, whereas the timbre degradations cause a very important deterioration, in the case of classical music sometimes reaching 70%.
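The timbre alteration described in [7], replacing a frequency band with Gaussian noise of the same spectral power, can be sketched via the FFT (Python; an illustrative approximation, not the authors' exact procedure):

```python
import numpy as np

def replace_band_with_noise(signal, sr, f_lo, f_hi, seed=0):
    """Replace the [f_lo, f_hi) band of `signal` with complex Gaussian
    noise scaled to the same mean spectral power in that band."""
    rng = np.random.default_rng(seed)
    spec = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    band = (freqs >= f_lo) & (freqs < f_hi)
    power = np.mean(np.abs(spec[band]) ** 2)        # band power to preserve
    noise = rng.normal(size=band.sum()) + 1j * rng.normal(size=band.sum())
    noise *= np.sqrt(power / np.mean(np.abs(noise) ** 2))
    spec[band] = noise
    return np.fft.irfft(spec, n=len(signal))

sr = 8000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)          # tone inside the degraded band
y = replace_band_with_noise(x, sr, 400, 500)
print(np.allclose(np.std(x), np.std(y), rtol=0.5))   # → True
```

Because the noise is scaled to the original band power, the overall signal level is roughly preserved while the tonal content of the band is destroyed, which is exactly what makes such degradations damaging to timbre perception.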
The MSc thesis by J. Mansen [8] performs a complete feature extraction covering the whole spectrum of standard features used in MIR. The features are organized by the toolboxes used for their extraction: Binaural Cue Selection, Music Analysis, Chroma, ISP, MIR, PSY and YAAFE. The author also discusses the effect of MP3 encoding and of resampling on feature extraction. According to the results, MP3 compression at 192 kbps and above is acceptable for musical data, whereas lower bit rates can significantly affect feature extraction. As far as frequency resampling is concerned, although it may have little audible effect on the signal, feature extraction can still be affected due to the exchange of information between frequency bands, and in some cases, such as down-sampling, it can even lead to the removal of a few attributes from some feature sets.
2.3 Noise models
Humans perceive audio differently from machines. The human auditory system is not linear across frequencies, due to the sizes of the different parts of the ear; consequently, audio noise does not affect humans with uniform magnitude either. To align noise measurements in audio systems with the human auditory response, it is common to use weighting filters [9]. Several weighting filters exist for different applications, but the most prominent one used in noise studies and measurements is the A-weighting. It emphasises the frequencies around 3-6 kHz while attenuating the low and very high frequencies, matching the sensitivity of the ear; accordingly, the unit used to measure loudness with it is called dBA. There are also the B, C (used for louder sounds) and D (used for loud aircraft noise) weightings (Figure 2.1).
Figure 2.1: Weighting loudness filter curves: A-weighting (blue), B-weighting (yellow), C-weighting (red) and D-weighting (black)
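The A-weighting curve has a standard analytic form (IEC 61672), normalised so that the gain at 1 kHz is 0 dB; it can be evaluated directly (Python sketch):

```python
import math

def a_weight_db(f):
    """A-weighting gain in dB at frequency f (analogue formula, IEC 61672)."""
    f2 = f * f
    ra = (12194.0 ** 2 * f2 ** 2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20.0 * math.log10(ra) + 2.00   # offset pins 1 kHz to 0 dB

for f in (100, 1000, 3000, 10000):
    print(f, round(a_weight_db(f), 1))
```

Evaluating the formula at a few frequencies shows the shape described above: strong attenuation at 100 Hz (about -19 dB), 0 dB at 1 kHz, a slight boost around 3 kHz and renewed attenuation at 10 kHz.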
Another property of the human auditory system relevant to noise distortion is auditory masking. This effect appears when a sound with high loudness in a specific frequency range makes other, quieter sounds in the same critical frequency band (Table 2.1) or in neighbouring ones (Figure 2.2) imperceptible. This can be a problem in noisy environments, but also in some music mixes, where some instruments or vocals can be masked by others.
100 200 300 400 510 630 770 920 1080 1270 1480 1720
2000 2320 2700 3150 3700 4400 5300 6400 7700 9500 12000 15500
Table 2.1: Central frequency (in Hz) of frequency bands of the Bark Scale
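Given the centre frequencies in Table 2.1, a frequency can be assigned to an approximate critical band by a nearest-centre lookup (Python sketch; real implementations use band edges or the analytic Bark formula rather than this simple heuristic):

```python
# Centre frequencies (Hz) of the Bark critical bands, from Table 2.1.
BARK_CENTRES = [100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480,
                1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700,
                9500, 12000, 15500]

def bark_band(freq_hz):
    """1-based index of the critical band whose centre frequency is
    closest to freq_hz."""
    return min(range(len(BARK_CENTRES)),
               key=lambda i: abs(BARK_CENTRES[i] - freq_hz)) + 1

print(bark_band(440))    # → 4  (centre 400 Hz)
print(bark_band(3000))   # → 16 (centre 3150 Hz)
```

This kind of mapping is what groups FFT bins into the psychoacoustically motivated bands used by the feature sets of Section 3.3.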
Related to both of the previous psychoacoustic facts, there is a study of a noise model that can be useful for reducing masking by noise in audio signals [10]. In it, the author explains that in some cases an audio track may contain a sinusoidal noise (e.g. 50 Hz interference from the power grid) that causes an auditory masking effect. In some cases the interference may even lie in the frequency bands where the human ear is most sensitive, producing stronger masking over a larger range of the frequency spectrum. One solution proposed by the author is to locate the region that produces the masking, using knowledge of human auditory masking, in order to filter the noise and try to eliminate it.
Figure 2.2: Auditory masking of a sound between 125 and 250 Hz at 40 dB, masked by a sound between 250 and 500 Hz at 70 dB
Another system designed to avoid noise distortion and give cassette recordings more robustness against this degradation is the Dolby system [11]. It is based on encoder and decoder circuits called companders, which compress the range between the loud and soft parts of the audio that we want to record on the cassette and expand the range back again on playback, thereby reducing the tape noise (Figure 2.3).
Figure 2.3: Dolby system used in the cassettes recordings
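The compress-then-expand idea behind such companders can be illustrated with a simple power-law pair (a toy Python model, not the actual Dolby circuitry):

```python
import numpy as np

def compress(x, ratio=0.5):
    """Encoder: shrink the loud/soft range before recording
    (|x| -> |x|**ratio, sign preserved)."""
    return np.sign(x) * np.abs(x) ** ratio

def expand(y, ratio=0.5):
    """Decoder: restore the original range on playback
    (|y| -> |y|**(1/ratio), sign preserved)."""
    return np.sign(y) * np.abs(y) ** (1.0 / ratio)

x = np.linspace(-1.0, 1.0, 101)       # clean signal values
recorded = compress(x)                # soft parts raised above the tape noise
restored = expand(recorded)          # playback undoes the compression
print(np.allclose(restored, x))       # → True
```

Tape noise is added between the two stages, so soft passages, which the compressor has raised above the noise floor, are attenuated again by the expander together with the noise, improving the effective signal-to-noise ratio.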
Chapter 3
Experimental Set-up
This study is based on repeating several processes: degradation of all the data sets, extraction of features, and finally genre classification of the different audio sets and of combinations of them. As these are repetitive actions that we need to perform on several data subsets, we use Matlab [12] to run the different feature-extraction and classification software and to plot graphs of the results. Matlab runs on an Ubuntu 13.10 remote server.
3.1 Data Sets
The data sets are the collections of audio tracks used in our study. Due to copyright terms, we are not allowed to use our own audio collection, so we use two prominent data sets that have been studied intensively over time in the MIR community. Each data set has its own list of musical genres into which the audio tracks are previously classified. We follow the same procedure for both data sets, but without mixing any data between them, so they remain two separate compilations; as we will see from their characteristics, they have different audio formats and quality. The two data sets used are:
• GTZAN: Data set collected by G. Tzanetakis and P. Cook [2]. It consists of 1000
audio tracks, each 30 seconds long, equally distributed across 10 musical genres:
blues, classical, country, disco, hiphop, pop, jazz, metal, reggae and rock. All
tracks are 22050 Hz Mono 16-bit in .au format.
• ISMIR: Data set created for the ISMIR 2004 genre classification evaluation campaign [13]. It consists of 1458 full-length files distributed across 6 musical genres: classical, electronic, jazz&blues, metal&punk, rock&pop and world. All tracks are 44 kHz stereo 128 kbps files in .mp3 format.
The GTZAN data set is classified across 10 musical genres, whereas the ISMIR data set is classified across only 6: the latter is more general and also merges similar genres, e.g. jazz&blues or rock&pop. Thus, the percentage of correctly genre-classified tracks should be higher for ISMIR than for GTZAN. We will verify this in the following sections.
3.2 Audio Degradation Toolbox
In order to study the differences between clean and distorted audio, we need to compare all the clean audio tracks with their respective degraded versions. The Audio Degradation Toolbox 0.2 (ADT) is used to create the different degraded versions of the clean audio in controlled settings [5].

The ADT is implemented in Matlab. The software creates a degraded version of every audio track on its input list. We can choose between 14 distortion units, in which case the software applies only a single type of degradation to the audio track, or we can combine several distortions at the same time. In addition, all the parameters of the distortions can be configured. In this study we use the predefined options, which consist of 12 distortion units that we call synthetic distortions and 6 further distortions that we call real-world distortions. All the distortions are applied across all the audio tracks, after which the audio signal is normalized to a maximum amplitude of 0.999. The clean audio is also normalized to a maximum amplitude of 0.999.
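The final normalization step applied to both the degraded and the clean audio can be written as a simple peak scaling (Python sketch; the ADT itself is implemented in Matlab):

```python
import numpy as np

def peak_normalize(x, peak=0.999):
    """Scale the signal so that its maximum absolute sample equals `peak`."""
    m = np.max(np.abs(x))
    return x if m == 0 else x * (peak / m)

x = np.array([0.1, -0.4, 0.25])
print(round(float(np.max(np.abs(peak_normalize(x)))), 3))   # → 0.999
```

Normalizing both versions to the same peak ensures that the feature differences measured in Chapter 4 reflect the distortions themselves rather than trivial level differences.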
3.2.1 Synthetic Distortions
These are isolated distortions that would only be added to an audio track deliberately, either for an audio processing study or as a musical effect. The 12 distortions that we use are:
• Add pink noise: Adds pink noise with a final SNR = 10 dB. The noise is implemented as white noise passed through a filter.
• Add background sound: Adds a background sound with a final SNR = 10 dB.
The sound used is a noise from a restaurant environment [14].
• Aliasing: The signal is down-sampled to a 4000 Hz sampling rate without low-pass filtering, deliberately violating the Nyquist-Shannon sampling theorem. Then the original sampling rate is restored using a regular re-sampling method with filtering.
• Clipping: Normalizes the audio signal so that 10% of the samples lie outside the interval [-1, 1]; each such sample x is clipped to sign(x).
• Dynamic range compression: Applies a signal-dependent normalization to the audio signal, reducing the energy differences between soft and loud parts of the signal. The parameters of this distortion are: forgetting time = 0.1 seconds, compressor threshold = -40 dB, compressor slope = 0.9, attack time = 0.01 seconds, release time = 0.01 seconds and delay time = 0.01 seconds.
• Harmonic distortion: Applies a quadratic distortion to the audio signal with 5
iterative applications.
• Low quality MP3 compression: The audio signal is compressed to MP3 with a constant bit rate of 32 kbps and then decompressed back.
• Speed up: The audio signal is resampled in order to speed the music up by 5%.
• Wow re-sampling: Applies a time-dependent resampling of the audio signal with intensity of change = 3 and frequency of change = 0.5, imitating the non-constant playback speed of some analogue players.
• Delay: Pads the beginning of the audio signal with 22050 zero samples.
• High-pass filtering: Applies a linear high-pass filtering using a Hamming window
with the stop frequency = 1000 Hz.
• Low-pass filtering: Analogous to the high-pass filtering, applies a low-pass filtering with the stop frequency = 800 Hz.
In all of these cases the distortions are applied with a prominent presence, in order to observe the maximum effect that each distortion can produce on the clean audio tracks.
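To make the unit degradations above concrete, here is a minimal Python sketch of two of them, written by us for illustration (not ADT code): mixing in noise at a target SNR, and the 10%-of-samples clipping rule. The function names are our own.

```python
import math

def add_noise_at_snr(signal, noise, snr_db):
    """Scale `noise` so the mixture has the requested SNR (in dB), then add it."""
    p_sig = sum(s * s for s in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    target = p_sig / (10 ** (snr_db / 10))        # required noise power
    g = math.sqrt(target / p_noise)               # gain applied to the noise
    return [s + g * n for s, n in zip(signal, noise)]

def clip_fraction(signal, fraction=0.10):
    """Rescale so `fraction` of samples exceed [-1, 1], then hard-clip them."""
    ranked = sorted(abs(s) for s in signal)
    idx = min(len(ranked) - 1, round((1 - fraction) * len(ranked)))
    thresh = ranked[idx]                           # this amplitude becomes full scale
    if thresh == 0:
        return list(signal)
    scaled = [s / thresh for s in signal]
    return [max(-1.0, min(1.0, s)) for s in scaled]
```

The SNR formula is the standard power ratio; the ADT's actual pink-noise and clipping units share this logic but operate on real audio buffers.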
3.2.2 Real World Distortions
These distortions are examples of audio degradations that can be found in real-life recording conditions. They are built as combinations of synthetic distortions with specific parameters that emulate a typical recording device or environment. The 6 real-world distortions used are:
• Live recording: Applies the impulse response of a reverberation effect called Great Hall (taken from [15] but included in the ADT) and adds pink noise with SNR = 40 dB. As the name suggests, it simulates a live recording of a musical concert on an open stage.
• Strong MP3 compression: The audio signal is compressed to MP3 with a constant bit rate of 64 kbps. This bit rate is double that of the low quality MP3 compression used as a synthetic distortion, which makes the difference with respect to the uncompressed audio almost imperceptible to the human ear.
• Vinyl recording: Applies the impulse response of a vinyl effect extracted from a plug-in 1 (included in the ADT), adds the sound of a vinyl player from the same plug-in with SNR = 40 dB, applies a Wow resampling distortion with intensity of change = 1.3 and frequency of change = 33/60 (vinyl speed = 33 rpm) and finally adds pink noise with SNR = 40 dB. It imitates a recording from a vinyl player with its typical fluctuations.
• Radio broadcast: Applies a dynamic range compression with forgetting time = 0.3 seconds, compressor threshold = -40 dB, compressor slope = 0.6, attack time = 0.2 seconds, release time = 0.2 seconds and delay time = 0.2 seconds, and applies a speed-up of +2%. It imitates the loudness characteristic of many radio stations and the speed-up used to shorten music and create more advertisement time.

1 https://www.izotope.com/fr/products/effects-instruments/vinyl/
• Smart phone recording: Applies an impulse response from a Google Nexus One front microphone (included in the ADT); applies a dynamic range compression with forgetting time = 0.2 seconds, compressor threshold = -35 dB, compressor slope = 0.5, attack time = 0.01 seconds, release time = 0.01 seconds and delay time = 0.01 seconds; also applies a clipping distortion with 0.3% of the samples clipped; and finally adds pink noise with SNR = 35 dB.
• Smart phone playback: Applies an impulse response from a Google Nexus One front speaker (included in the ADT), which acts as a high-pass filter with a 500 Hz cut-off, and adds pink noise with SNR = 40 dB.
At the end of the procedure, all output audio tracks are compressed to MP3 in order to store the distorted tracks in MP3 format. The compressor used is LAME 2, with the parameters set to the highest quality encoding, a constant bit rate of 256 kbps and joint stereo.
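The real-world presets are, as described, ordered chains of unit degradations. The composition idea can be sketched in Python (our own toy illustration; `speed_up` uses naive linear interpolation rather than the ADT's resampler, and the compression stage is a placeholder):

```python
def speed_up(samples, factor=1.02):
    """Naive resampling by linear interpolation; factor > 1 shortens the track."""
    n_out = int(len(samples) / factor)
    out = []
    for i in range(n_out):
        pos = i * factor                 # fractional read position in the input
        j = int(pos)
        frac = pos - j
        nxt = samples[j + 1] if j + 1 < len(samples) else samples[j]
        out.append(samples[j] * (1 - frac) + nxt * frac)
    return out

def apply_chain(samples, stages):
    """A real-world degradation is an ordered chain of unit degradations."""
    for stage in stages:
        samples = stage(samples)
    return samples

# Toy 'radio broadcast': a compression placeholder followed by a 2% speed-up.
radio = [
    lambda s: s,                         # stand-in for the dynamic range compression
    lambda s: speed_up(s, 1.02),
]
```

A 2% speed-up shortens a track by roughly 2% of its duration, which matches the radio-broadcast preset above.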
3.3 Feature Sets
In order to perform audio genre classification we need to extract information from the tracks as a manageable set of values. These values are called music features, and there are several kinds of them, e.g. MFCC (Mel-Frequency Cepstral Coefficients), which measure the timbre of music, and Chroma features, which relate to the harmony and chords of music, among others. In this study we focus on several subsets of psycho-acoustic features [16], which can be extracted with the Audio Feature Extraction Software v.6411 using a Matlab implementation 3. These features are called this way because they describe rhythmic structures over a variety of frequency bands, considering psycho-acoustic phenomena of the human perception of music and sound. The features that we use are:
• Rhythm Patterns (RP): Represent the modulation amplitudes for a range of modulation frequencies on the critical bands of the Bark scale (Table 2.1), according to the human auditory range perception and the loudness sensation per band. The whole extraction process is shown in Figure 3.1. The resulting feature is a 1440-dimensional value vector (60 bins for each of the 24 critical bands).

2 http://lame.sourceforge.net
3 http://www.ifs.tuwien.ac.at/mir/audiofeatureextraction.html
• Rhythm Histograms (RH): A histogram of 60 bins based on the average over the 24 critical bands (Table 2.1) computed on the RP. It captures the modulation between 0 and 10 Hz, which represents the general rhythm characteristics of the audio track. The extraction process is shown in Figure 3.1. The feature is a 60-dimensional value vector.
• Statistical Spectrum Descriptor (SSD): Computes several statistical measures on each of the 24 critical bands, with the audio track already adapted to human auditory perception. The statistical measures computed are: mean, median, variance, skewness, kurtosis, minimum value and maximum value. The extraction process is shown in Figure 3.1. The resulting feature is a 168-dimensional value vector, organized first by the seven statistical measures; within each subgroup we find the value for each frequency band.
• Modulation frequency Variance Descriptors (MVD): Represent the variations over the critical bands (Table 2.1) for a specific modulation frequency. Similar to the SSD, it calculates the 7 statistical measures for each of the 60 fluctuation bins over all 24 critical bands and then averages the statistical measures over all the critical bands, resulting in a 420-dimensional value vector.
• Temporal Statistical Spectrum Descriptor (TSSD): In order to incorporate the time-series aspect, it takes the SSD over 7 different parts of the audio track, describing the variations over time. The resulting vector has a dimensionality of 1176 values, i.e. 7 times the dimension of the SSD.
• Temporal Rhythm Histograms (TRH): Takes the RH over 7 different parts of the audio track and describes the difference in the fluctuations between 0 and 10 Hz in the different parts of the audio track. The dimensionality of the resulting vector is 420 values, i.e. 7 times the dimension of the RH.
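As a rough illustration of the SSD layout described above, the following Python sketch (our own, not the rp_extract implementation; skewness and kurtosis use plain moment estimators) builds a 168-value vector grouped measure-by-measure:

```python
import statistics as st

def _moment(xs, k):
    mu = st.mean(xs)
    return sum((x - mu) ** k for x in xs) / len(xs)

def skewness(xs):
    m2 = _moment(xs, 2)
    return _moment(xs, 3) / m2 ** 1.5 if m2 else 0.0

def kurtosis(xs):
    m2 = _moment(xs, 2)
    return _moment(xs, 4) / m2 ** 2 if m2 else 0.0

def ssd_like(bands):
    """bands: 24 lists of per-frame loudness values, one per critical band.
    Returns a 168-value vector grouped measure-by-measure, as in the SSD."""
    measures = [st.mean, st.median, st.pvariance, skewness, kurtosis, min, max]
    return [m(band) for m in measures for band in bands]
```

The ordering (all means, then all medians, and so on) matches the "organized first by statistical measure" layout of the SSD.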
All the features are written to a text file per feature set, with the name of the feature as the file extension. Each feature set file has a header specifying information about the extracted feature: the dimension of the feature (including the dimensions of its subsets), the number of tracks extracted and further information about the software version.
Figure 3.1: Feature extraction process for the RP, RH and SSD
After the header come the different values, called attributes; i.e. the dimension of each feature equals its number of attributes. Each attribute is a piece of music information extracted from the audio track and has a specific meaning, e.g. the mean of the signal in a particular frequency band, or the fluctuation of frequencies in a frequency band, among others. The attributes of one feature can be compared with each other but not with the attributes of other features, because they are not related to the same musical characteristic. This is an example of an RH feature extraction text file (comments are written after %):
$TYPE vec
$DATA_TYPE audio-rh
$DATA_DIM 1x60 %1 group of 60 attributes per audio track
$EXTRACTOR Matlab rp_extract v 0.6411 by tml
$XDIM 100 % number of audio tracks analyzed
$YDIM 1
$VEC_DIM 60 % total dimension of the feature
Val_Attr1_Track1 Val_Attr2_Track1 ... Val_Attr60_Track1 Name_Track1
Val_Attr1_Track2 Val_Attr2_Track2 ... Val_Attr60_Track2 Name_Track2
...
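A file in this layout can be parsed with a few lines of Python; the following sketch (our own, assuming `%` only ever introduces a comment) mirrors the example above:

```python
def read_feature_file(lines):
    """Parse the rp_extract-style output sketched above: '$KEY value'
    header lines, then one row of space-separated values per track,
    ending with the track name."""
    header, rows = {}, {}
    for line in lines:
        line = line.split('%')[0].strip()   # drop trailing comments
        if not line:
            continue
        if line.startswith('$'):
            key, _, value = line.partition(' ')
            header[key[1:]] = value
        else:
            *values, name = line.split()
            rows[name] = [float(v) for v in values]
    return header, rows
```

The header dictionary can then be checked against `VEC_DIM` before any further processing.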
3.4 Machine Learning Software: Weka
The last step of the genre classification is to process all the value vectors coming from the feature extraction. The most common way to perform genre classification is machine learning, a branch of artificial intelligence concerned with the construction and study of systems, in this case genre classifiers, that can learn from data.
In genre classification systems there are two parts: the training set and the test set. The training set is a representative collection of audio tracks already labelled by genre, which must contain all the genres into which we want to classify. With the training set we build a model using a classifier. Finally, we take the test set, which is also already labelled by genre, and classify it using the model constructed before. The machine learning software then compares the genre classification produced by the trained model with the pre-existing labels of the test set. The results are several statistical measures comparing both classifications, together with the confusion matrix. In this study we will only use the percentage of correctly classified instances, which is the most general measure for genre classification.
In order to obtain a good genre classification, the training set should contain more audio tracks than the test set (a common proportion is 90% training and 10% test). This is because the model needs more statistical information about the benchmark data in order to be accurate.
One of the most widely used procedures in classification studies is 10-fold cross validation (10-CV). It is a way of using the whole data set under study both as training set and as test set. It consists in running 10 independent classification experiments: in each one, a test set is created from 10% of the data set and the remaining 90% is used to build the model that classifies that test set. The experiment is then repeated, selecting a different 10% of the audio tracks as test set and taking the remaining 90% as training set. This is done 10 times, so that every track is used once as test data, and the results are averaged. An example is shown in Figure 3.2.
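The fold construction just described can be sketched as follows (a simple index-based split of our own, not Weka's internal mechanism):

```python
def kfold_indices(n, k=10):
    """Split indices 0..n-1 into k disjoint test folds; each item appears
    in exactly one test set and in k-1 training sets."""
    folds = []
    for f in range(k):
        test = list(range(f, n, k))                 # every k-th item, offset f
        train = [i for i in range(n) if i % k != f]
        folds.append((train, test))
    return folds
```

With n = 1000 and k = 10, each fold trains on 900 tracks and tests on the remaining 100, exactly the 90%/10% proportion mentioned above.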
Figure 3.2: Example of 10-fold cross validation
The software that we use in this study to run our experiments is Weka (Waikato Environment for Knowledge Analysis) [17]. It is a popular open-source machine learning suite written in Java; since it can be invoked through Java, it can also be called from Matlab. The program includes a graphical interface, but we will only use it through command-line calls. Weka reads *.arff files, text files containing all the data values of each instance that we want to process. These files allow several kinds of information (see the full *.arff specification 4), but our files will always have the same structure:
@Relation Name_of_the_file
@Attribute term1 numeric
@Attribute term2 numeric
...
@Attribute class {list_of_different_genres}
@Data
Value_term1_Audio1, Value_term2_Audio1, ... , genre_Audio1
Value_term1_Audio2, Value_term2_Audio2, ... , genre_Audio2
...

4 http://www.cs.waikato.ac.nz/ml/weka/arff.html
We need to convert the files produced by the feature extraction into *.arff files because, even if the structure is similar, Weka can only read files in this format. We also need to merge the files from the different genres of the data set, placing the genre of each audio track at the end of its attribute-value line instead of the track name.
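The conversion step can be sketched in Python (`to_arff` is a hypothetical helper of our own; it emits the structure shown above):

```python
def to_arff(relation, genres, tracks):
    """tracks: list of (attribute_values, genre) pairs.
    Returns ARFF text with numeric attributes and a nominal class."""
    n_attr = len(tracks[0][0])
    lines = [f"@Relation {relation}"]
    lines += [f"@Attribute term{i + 1} numeric" for i in range(n_attr)]
    lines.append("@Attribute class {" + ",".join(genres) + "}")
    lines.append("@Data")
    for values, genre in tracks:
        lines.append(", ".join(str(v) for v in values) + ", " + genre)
    return "\n".join(lines)
```

One such file is generated per data set, feature set and degradation, with the genre replacing the track name at the end of each data line.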
When the *.arff file is ready, we can run the genre classification. To do so, we have to choose which classifier to use. Weka offers several classifiers, each of which also has several parameters to specify. In this study we use the classifiers most commonly used in MIR research [3]:
• Naive Bayes: A popular probabilistic classifier based on Bayes' theorem:

P(A|B) = P(B|A) · P(A) / P(B)

Its main characteristic is that it assumes all attributes to be independent of each other. This classifier is efficient, robust against noisy data and has a simple structure. It can also work well with fewer training files than the other classifiers.
• Support Vector Machines (SVM): This classifier constructs a set of hyperplanes in a high-dimensional attribute space and then chooses the one with the largest margin between the different genres. We use two versions of the SVM classifier: linear PolyKernel and RBFKernel (RBF), both with penalty parameter = 1, RBF gamma = 0.01 and c = 0.1 (default parameters).
• J48: The open-source Java implementation of the C4.5 decision tree. It works using the concept of information entropy. It is very useful in genre classification because it is relatively quick to train. We use it with a confidence factor for pruning of 0.25 and a minimum of 2 instances per leaf.
• Random Forest: Constructs a multitude of decision trees at training time, requiring more time but achieving higher precision than J48. The parameters that we use for this classifier are unlimited tree depth, 10 generated trees and the number of attributes used in random selection set to 0.
• K-Nearest Neighbours (KNN): A popular non-parametric classifier. It is based on lazy learning, where the function is only approximated locally and all computation is deferred until classification. In our study we use the Euclidean distance (L2) as well as the Manhattan distance (L1), both with k = 1.
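As an illustration of the KNN variants with k = 1 (a sketch of our own, not Weka's IBk implementation):

```python
def knn_predict(train, query, metric="euclidean"):
    """train: list of (feature_vector, genre) pairs.
    Returns the genre of the single nearest neighbour (k = 1)."""
    def dist(a, b):
        if metric == "manhattan":                              # L1
            return sum(abs(x - y) for x, y in zip(a, b))
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5  # L2
    return min(train, key=lambda item: dist(item[0], query))[1]
```

With k = 1 the prediction is simply the label of the closest training vector, which is why noisy or shifted attributes can directly flip the result.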
Chapter 4
Impact of Degradations
4.1 Effect on features
After degrading all data sets and extracting the features, we can evaluate the effect of each degradation on the different feature sets. In order to see the differences between clean and distorted audio, we need to process the feature values.
4.1.1 Feature processing
Since the feature extraction writes the values to a text file per feature set, genre and degradation, we created a Matlab script that reads the values from the different text files and loads them into a Matlab matrix used for the subsequent computations. We want to see the differences between clean and degraded audio for each feature set depending on the degradation; in this part we do not discriminate between genres. The absolute difference is computed for each attribute across the whole feature set; then the mean and the variance of each attribute difference are calculated over all audio tracks degraded with the same distortion. The whole process is illustrated in Figure 4.1.
Mean and variance differences are calculated over all tracks for each degradation, in order to study the differences independently of the genre, although this is done for each data set separately. This requires a new folder structure, shown in Figure 4.2.
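The per-attribute computation described above can be sketched in Python (our Matlab script follows the same logic; the names here are our own):

```python
import statistics as st

def attribute_differences(clean, degraded):
    """clean, degraded: aligned lists of feature vectors (one per track).
    Returns, for every attribute, the mean and the variance of the
    absolute difference |clean - degraded| over all tracks."""
    diffs = [[abs(c - d) for c, d in zip(cv, dv)]
             for cv, dv in zip(clean, degraded)]
    n_attr = len(diffs[0])
    means = [st.mean(row[a] for row in diffs) for a in range(n_attr)]
    varis = [st.pvariance([row[a] for row in diffs]) for a in range(n_attr)]
    return means, varis
```

Each (mean, variance) pair corresponds to one point in the per-degradation plots discussed in the next subsection.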
Figure 4.1: Computation of the mean and variance differences. First step: differences between attributes from clean and degraded audio for each audio track; second step: mean and variance of all the attributes from the same degradation.
Figure 4.2: Folder structure before (separated by genre) and after (all genres together for each degradation, plus a new subfolder with mean and variance plots) the feature processing.
4.1.2 Feature differences
Based on the feature processing described above, we can now analyse the robustness and the weakness of the attributes against all the applied degradations, per attribute. Our results are presented as one plot per degradation and feature, showing the mean and variance differences over the whole feature set. Conceptually, a small mean difference for an attribute means that the attribute is hardly affected by the respective degradation, making it a robust attribute in an audio collection containing both clean audio and audio with this degradation, i.e. the mean attribute values are similar between clean and degraded audio. Conversely, a high mean difference means that the attribute is strongly shifted by the degradation, making it a weak attribute, i.e. the mean attribute values differ between clean and degraded audio. Likewise, a high variance difference also indicates a weak attribute, because it implies a high dispersion of the attribute differences, which could also lead to wrong classifications. In summary: robust attributes have small mean and variance differences, whereas weak attributes have a high mean difference and/or a high variance difference.
The differences observed between clean and degraded audio follow a similar pattern for each feature set, mainly in the mean differences, although differences remain between the individual degradations. The two data sets used in this study behave very similarly, mainly in the mean differences, diverging only in some isolated cases in the variance differences. All mean and variance attribute differences are provided in the attached files (Appendix C); in this section we analyse some relevant results of the main feature differences.
Regarding the RP differences, shown in Figure 4.3, they are irregularly distributed over the whole feature set due to its composition. RP is a large feature set organized in 60 groups corresponding to the amplitude modulation, where each group includes information on the 24 frequency bands, resulting in a dimensionality of 1440. The degradation effect on each frequency band differs per degradation, as well as per group, so we cannot simply select a frequency band in order to separate the more robust and the weaker attributes. High variances of the attribute differences are irregularly distributed as well, but less prominent than the mean differences.
Figure 4.3: Rhythm Patterns mean attribute differences between clean audio and the Smartphone Recording degradation of the ISMIR data set.
In the case of RH, classifying the attributes is simpler, as the high shifts are located in the lower part of the 60 bins that make up the feature set. As this feature set describes the amplitude modulation of the aggregation of all frequency bands, degradations affect slow rhythm features more than fast ones. An example of the feature differences is shown in Figure 4.4. The variance of the attribute differences, on the other hand, is not very important, except for degradations such as Clipping and Harmonic Distortion, which lead to a high variance in the first attribute differences.
Regarding the SSD differences, in most cases the most significant shifts are found in the skewness measures, mainly in the highest frequency bands, as can be seen in Figure 4.5, which shows the differences for the Low-pass filtering degradation. In that case the low frequency bands are barely affected compared to the high frequency bands due to the nature of the degradation: the audio signal in the lowest frequency range should not be affected by it, so neither are the features belonging to that range.
The highest shifts observed in the MVD case are also found in the skewness measure group, but for this feature the skewness differences have a similar value over all 60 bins, as can be seen in the example in Figure 4.6. The second measure group with higher differences
Figure 4.4: Rhythm Histograms mean attribute differences between clean audio and the Vinyl Recording degradation of the GTZAN data set.
Figure 4.5: Statistical Spectrum Descriptor mean attribute differences between clean audio and the Low Pass Filtering degradation of the ISMIR data set. Dashed red lines separate the statistical measure groups, each containing one measure for each of the 24 frequency bands (Table 2.1).
in this feature is the variance between all bins corresponding to each frequency band, resulting in a group of 60 bins (not to be confused with the variance of the differences across tracks). For some degradations we can also see another notable difference in the first bins of the maximum measure group, but these shifts are not as relevant as the two discussed above.
Figure 4.6: Modulation frequency Variance Descriptor mean attribute differences between clean audio and the Smartphone Playback degradation of the ISMIR data set. Dashed red lines separate the statistical measure groups, each containing the 60 bins of the amplitude modulation aggregated over all frequency bands (Table 2.1).
Regarding the temporal features (TSSD and TRH), the differences between degraded and clean audio are not very prominent, except for a small attribute range, so we do not expect the genre classification to be strongly affected by them either.
To summarize this section: the most affected features are RP and RH, over the whole attribute range, whereas for SSD and MVD the high shifts are located in a clearly delimited range of the feature; TSSD and TRH are not especially affected by the degradations. After this extensive analysis of the degradation effects on the features we use, we expect an important effect on the genre classification.
4.2 Effect on classification
In this section we analyse the classification results obtained on both data sets degraded with the different degradations explained in Sections 3.2.1 and 3.2.2, and compare them with the classification of the same clean data sets. First we need to create a complete 10-CV environment consisting of different folds with training and test sets. It is important to emphasize that, in this section, the same single degradation is applied to the whole data set, i.e. to both the training and the test sets.
4.2.1 Creation of 10-CV folds
As explained in Section 3.4, Weka uses feature values to perform the classification. We have to create an individual *.arff file per data set, feature and degradation. The files follow the model described in that section, containing the attribute values of each audio track of all the genres of the data set, with the genre it belongs to at the end of each line. The same procedure is repeated for every degradation and every feature set. For instance, in the GTZAN case:
GTZAN data set = 100 audio tracks per genre · 10 genres = 1000 audio tracks
6 feature sets · (clean + 12 synthetic degr. + 6 real-world degr.) = 114 arff files
Each of these 114 *.arff files contains information about all 1000 audio tracks of the GTZAN data set, for a particular feature set and degradation. The same operation is repeated for the ISMIR data set, resulting in the same number of *.arff files, but containing information about its 1458 audio tracks. Once the *.arff files are created, we can build the 10-CV environment needed to perform the classification.
In order to use the information of all audio files as test data, we apply the 10-CV technique (Figure 3.2). Weka has an option that constructs its own 10-CV environment, with the different folds containing training and test sets, but we do not use it because we want more information about the individual folds than this mechanism can provide. Therefore, we construct our own 10-CV model using Weka's filtering options, with which we create the different folds from the whole *.arff file; then we perform a single validation per fold, obtaining the percentage of correctly classified instances for each fold. Finally we calculate the mean and variance of the correctly classified instances.
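The final averaging step can be sketched as follows (a trivial helper of our own, assuming one (correct, total) pair per fold):

```python
import statistics as st

def summarize_folds(fold_results):
    """fold_results: list of (correct, total) pairs, one per 10-CV fold.
    Returns the mean and the variance of the percentage of correctly
    classified instances across the folds."""
    pcts = [100.0 * c / t for c, t in fold_results]
    return st.mean(pcts), st.pvariance(pcts)
```

The variance reported here is the fold-to-fold spread discussed in the next subsection, not the per-attribute variance of Section 4.1.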
4.2.2 Analysis of classification results
Before analysing the classification obtained with the different degradations, we perform an extra classification of the clean audio sets using the same 10-CV procedure explained above, so that both classifications, clean and degraded, can be compared. Table 4.1 shows the classification results for both clean data sets (GTZAN & ISMIR). The maximum mean values achieved are around 80% on ISMIR and 75% on GTZAN. The classifier with the best mean results is the Support Vector Machine trained by Sequential Minimal Optimization with Polynomial Kernel (SMO PolyKernel), and the feature with the best results is SSD, in both data sets. As noted before, the ISMIR classification is better than the GTZAN one because of the number of genres over which the data sets are distributed, i.e. it is harder to achieve good results on data sets split into a large number of genres than on less fine-grained ones. The second-best classifiers are the KNN variants (Euclidean and Manhattan), and the second-best feature is TSSD, which is directly related to SSD. Regarding the variance between the classifications of the different folds, it is around 25% for the GTZAN data set and around 10% for ISMIR, which means that the percentage of correctly classified instances is more consistent across folds for ISMIR than for GTZAN.
In most cases, applying a single degradation to both the training and test sets does not have a prominent impact on genre classification, i.e. the average mean percentages are similar to those obtained on clean audio. As can be seen in the example in Table 4.2 for the vinyl degradation, the values differ by around ±2%, so the differences between the two classifications are not relevant. If we look back at the RH differences studied in Figure 4.4 (the same case), even though we see differences over all the attributes of the feature, we still obtain similar classification results.
(a) GTZAN data set: 10 genres
GTZAN, mean, original (clean audio):

Classifier / Feature    RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes             34.70%   49.00%   37.40%   52.30%   54.60%   35.80%
SMO PolyKernel          42.60%   65.80%   49.10%   74.40%   67.50%   38.90%
SMO RBFKernel           28.60%   59.60%   36.40%   52.10%   64.40%   37.60%
J48                     32.90%   36.40%   33.60%   49.60%   47.20%   29.50%
RandomForest            35.30%   43.90%   37.90%   61.60%   59.40%   37.90%
KNN Euclidean           40.40%   51.60%   40.80%   66.10%   51.60%   30.80%
KNN Manhattan           40.20%   53.60%   42.80%   66.20%   61.30%   35.40%
Difference original / degradations: positive = improvement of the classification with the degradation; negative = deterioration with the degradation.

liveRecording, degraded classification:

Classifier / Feature    RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes             33.00%   49.10%   38.70%   51.00%   48.30%   35.10%
SMO PolyKernel          40.40%   66.10%   45.10%   65.20%   59.90%   33.40%
SMO RBFKernel           27.60%   61.10%   33.80%   50.90%   59.60%   32.80%
J48                     28.00%   34.20%   30.90%   46.30%   43.90%   27.30%
RandomForest            30.90%   39.50%   37.80%   55.40%   53.50%   32.70%
KNN Euclidean           37.00%   53.20%   39.00%   58.90%   49.20%   30.80%
KNN Manhattan           35.20%   53.90%   39.70%   60.60%   56.40%   35.40%

liveRecording, difference to original:

Classifier / Feature    RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes             -1.70%   0.10%    1.30%    -1.30%   -6.30%   -0.70%
SMO PolyKernel          -2.20%   0.30%    -4.00%   -9.20%   -7.60%   -5.50%
SMO RBFKernel           -1.00%   1.50%    -2.60%   -1.20%   -4.80%   -4.80%
J48                     -4.90%   -2.20%   -2.70%   -3.30%   -3.30%   -2.20%
RandomForest            -4.40%   -4.40%   -0.10%   -6.20%   -5.90%   -5.20%
KNN Euclidean           -3.40%   1.60%    -1.80%   -7.20%   -2.40%   0.00%
KNN Manhattan           -5.00%   0.30%    -3.10%   -5.60%   -4.90%   0.00%
strongMp3Compression, degraded classification:

Classifier / Feature    RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes             34.80%   48.30%   38.60%   53.10%   54.70%   35.60%
SMO PolyKernel          42.10%   65.10%   48.80%   73.50%   68.70%   38.90%
SMO RBFKernel           27.40%   59.60%   36.50%   53.60%   65.90%   37.00%
J48                     30.60%   36.50%   30.40%   49.10%   47.00%   30.10%
RandomForest            35.50%   43.20%   39.80%   59.70%   58.00%   34.50%
KNN Euclidean           40.10%   51.60%   41.40%   66.30%   51.60%   30.70%
KNN Manhattan           40.90%   53.10%   42.40%   66.40%   60.90%   36.00%

strongMp3Compression, difference to original:

Classifier / Feature    RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes             0.10%    -0.70%   1.20%    0.80%    0.10%    -0.20%
SMO PolyKernel          -0.50%   -0.70%   -0.30%   -0.90%   1.20%    0.00%
SMO RBFKernel           -1.20%   0.00%    0.10%    1.50%    1.50%    -0.60%
J48                     -2.30%   0.10%    -3.20%   -0.50%   -0.20%   0.60%
RandomForest            0.20%    -0.70%   1.90%    -1.90%   -1.40%   -3.40%
KNN Euclidean           -0.30%   0.00%    0.60%    0.20%    0.00%    -0.10%
KNN Manhattan           0.70%    -0.50%   -0.40%   0.20%    -0.40%   0.60%
vinylRecording, degraded classification:

Classifier / Feature    RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes             35.20%   49.60%   38.50%   49.70%   50.90%   37.00%
SMO PolyKernel          40.10%   64.30%   49.50%   72.80%   65.10%   37.70%
SMO RBFKernel           29.80%   60.10%   37.90%   49.50%   63.10%   39.30%
J48                     30.40%   36.40%   31.50%   46.90%   42.60%   29.90%
RandomForest            36.60%   41.30%   37.30%   54.70%   54.20%   34.50%
KNN Euclidean           38.50%   51.50%   41.60%   62.00%   48.90%   31.90%
KNN Manhattan           38.50%   53.40%   42.30%   62.70%   60.20%   36.40%

vinylRecording, difference to original:

Classifier / Feature    RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes             0.50%    0.60%    1.10%    -2.60%   -3.70%   1.20%
SMO PolyKernel          -2.50%   -1.50%   0.40%    -1.60%   -2.40%   -1.20%
SMO RBFKernel           1.20%    0.50%    1.50%    -2.60%   -1.30%   1.70%
J48                     -2.50%   0.00%    -2.10%   -2.70%   -4.60%   0.40%
RandomForest            1.30%    -2.60%   -0.60%   -6.90%   -5.20%   -3.40%
KNN Euclidean           -1.90%   -0.10%   0.80%    -4.10%   -2.70%   1.10%
KNN Manhattan           -1.70%   -0.20%   -0.50%   -3.50%   -1.10%   1.00%
radioBroadcastClassifier/Feature RH RP MVD SSD TSSD TRH Classifier/Feature RH RP MVD SSD TSSD TRHNaive Bayes 35,00% 52,50% 39,90% 53,00% 52,60% 34,50% Naive Bayes 0,30% 3,50% 2,50% 0,70% L2,00% L1,30%SMO PolyKernel 41,30% 64,90% 45,10% 72,40% 67,50% 38,00% SMO PolyKernel L1,30% L0,90% L4,00% L2,00% 0,00% L0,90%SMO RBFKernel 26,50% 62,80% 36,80% 53,30% 66,00% 37,60% SMO RBFKernel L2,10% 3,20% 0,40% 1,20% 1,60% 0,00%J48 29,70% 33,30% 31,40% 48,70% 48,50% 28,60% J48 L3,20% L3,10% L2,20% L0,90% 1,30% L0,90%RandomForest 37,40% 41,30% 36,70% 58,70% 55,70% 34,60% RandomForest 2,10% L2,60% L1,20% L2,90% L3,70% L3,30%KNN Euclidean 37,50% 50,80% 38,40% 63,40% 54,00% 28,70% KNN Euclidean L2,90% L0,80% L2,40% L2,70% 2,40% L2,10%KNN Manhattan 39,20% 53,30% 39,60% 64,00% 60,00% 34,90% KNN Manhattan L1,00% L0,30% L3,20% L2,20% L1,30% L0,50%
smartPhoneRecordingClassifier/Feature RH RP MVD SSD TSSD TRH Classifier/Feature RH RP MVD SSD TSSD TRHNaive Bayes 39,00% 53,10% 42,00% 51,70% 52,80% 36,50% Naive Bayes 4,30% 4,10% 4,60% L0,60% L1,80% 0,70%SMO PolyKernel 44,50% 66,70% 47,50% 68,60% 63,70% 37,00% SMO PolyKernel 1,90% 0,90% L1,60% L5,80% L3,80% L1,90%SMO RBFKernel 28,90% 61,10% 43,10% 52,50% 61,00% 39,40% SMO RBFKernel 0,30% 1,50% 6,70% 0,40% L3,40% 1,80%J48 32,30% 35,50% 34,00% 45,10% 45,40% 28,80% J48 L0,60% L0,90% 0,40% L4,50% L1,80% L0,70%RandomForest 39,90% 36,90% 40,80% 52,60% 49,10% 35,60% RandomForest 4,60% L7,00% 2,90% L9,00% L10,30% L2,30%KNN Euclidean 40,70% 52,30% 41,40% 60,60% 49,80% 32,20% KNN Euclidean 0,30% 0,70% 0,60% L5,50% L1,80% 1,40%KNN Manhattan 39,60% 52,20% 43,00% 62,80% 56,90% 35,20% KNN Manhattan L0,60% L1,40% 0,20% L3,40% L4,40% L0,20%
smartPhonePlaybackClassifier/Feature RH RP MVD SSD TSSD TRH Classifier/Feature RH RP MVD SSD TSSD TRHNaive Bayes 32,20% 47,80% 34,50% 42,80% 46,20% 35,20% Naive Bayes L2,50% L1,20% L2,90% L9,50% L8,40% L0,60%SMO PolyKernel 38,20% 63,00% 42,10% 65,30% 56,20% 35,50% SMO PolyKernel L4,40% L2,80% L7,00% L9,10% L11,30% L3,40%SMO RBFKernel 25,70% 55,70% 35,60% 43,70% 58,30% 34,30% SMO RBFKernel L2,90% L3,90% L0,80% L8,40% L6,10% L3,30%J48 27,60% 33,30% 27,80% 44,40% 43,10% 28,00% J48 L5,30% L3,10% L5,80% L5,20% L4,10% L1,50%RandomForest 36,30% 39,80% 33,90% 53,30% 50,80% 32,70% RandomForest 1,00% L4,10% L4,00% L8,30% L8,60% L5,20%KNN Euclidean 38,90% 52,20% 36,20% 53,60% 42,30% 30,20% KNN Euclidean L1,50% 0,60% L4,60% L12,50% L9,30% L0,60%KNN Manhattan 36,70% 52,10% 37,30% 54,40% 52,40% 37,60% KNN Manhattan L3,50% L1,50% L5,50% L11,80% L8,90% 2,20%
unit_addNoiseClassifier/Feature RH RP MVD SSD TSSD TRH Classifier/Feature RH RP MVD SSD TSSD TRHNaive Bayes 33,80% 45,80% 36,80% 51,40% 52,50% 33,90% Naive Bayes L0,90% L3,20% L0,60% L0,90% L2,10% L1,90%SMO PolyKernel 41,00% 64,00% 46,10% 72,80% 62,50% 38,00% SMO PolyKernel L1,60% L1,80% L3,00% L1,60% L5,00% L0,90%SMO RBFKernel 25,30% 61,00% 36,30% 52,60% 62,80% 36,30% SMO RBFKernel L3,30% 1,40% L0,10% 0,50% L1,60% L1,30%J48 29,60% 35,40% 28,00% 45,50% 47,40% 28,60% J48 L3,30% L1,00% L5,60% L4,10% 0,20% L0,90%RandomForest 35,80% 41,40% 38,00% 59,50% 53,20% 33,60% RandomForest 0,50% L2,50% 0,10% L2,10% L6,20% L4,30%KNN Euclidean 40,10% 51,70% 39,70% 64,50% 46,50% 31,80% KNN Euclidean L0,30% 0,10% L1,10% L1,60% L5,10% 1,00%KNN Manhattan 40,10% 52,60% 38,70% 64,00% 57,80% 35,20% KNN Manhattan L0,10% L1,00% L4,10% L2,20% L3,50% L0,20%
unit_addSoundClassifier/Feature RH RP MVD SSD TSSD TRH Classifier/Feature RH RP MVD SSD TSSD TRHNaive Bayes 34,80% 48,40% 38,40% 52,20% 54,50% 35,80% Naive Bayes 0,10% L0,60% 1,00% L0,10% L0,10% 0,00%SMO PolyKernel 41,70% 65,50% 46,00% 72,70% 65,00% 35,60% SMO PolyKernel L0,90% L0,30% L3,10% L1,70% L2,50% L3,30%SMO RBFKernel 29,40% 61,30% 37,00% 53,00% 63,50% 36,00% SMO RBFKernel 0,80% 1,70% 0,60% 0,90% L0,90% L1,60%J48 27,60% 36,10% 29,20% 45,70% 45,10% 28,60% J48 L5,30% L0,30% L4,40% L3,90% L2,10% L0,90%RandomForest 36,00% 40,60% 36,50% 56,20% 51,50% 32,70% RandomForest 0,70% L3,30% L1,40% L5,40% L7,90% L5,20%KNN Euclidean 37,80% 51,20% 41,80% 64,70% 52,90% 31,80% KNN Euclidean L2,60% L0,40% 1,00% L1,40% 1,30% 1,00%KNN Manhattan 37,70% 53,90% 43,40% 63,30% 58,10% 35,50% KNN Manhattan L2,50% 0,30% 0,60% L2,90% L3,20% 0,10%
Mean&of&all&folds&with&10&cross&validation
(b) ISMIR data set: 6 genres
ISMIR MEAN, original (clean) audio:

Classifier / Feature    RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes             57,07%   63,38%   61,25%   61,04%   52,81%   60,01%
SMO PolyKernel          63,65%   75,11%   70,51%   79,29%   80,31%   66,12%
SMO RBFKernel           54,94%   68,79%   63,99%   63,86%   73,12%   64,40%
J48                     58,23%   59,26%   60,84%   68,11%   66,67%   55,01%
RandomForest            65,98%   68,86%   68,79%   75,31%   74,69%   64,54%
KNN Euclidean           60,35%   73,25%   63,04%   78,81%   76,68%   63,79%
KNN Manhattan           61,66%   71,33%   64,81%   78,61%   77,78%   63,44%
Difference Original / Degradations (positive = improvement of classification with degradation; negative = deterioration with degradation).

[Table data: ISMIR classification accuracies and their differences from the clean baseline under the degradations liveRecording, strongMp3Compression, vinylRecording, radioBroadcast, smartPhoneRecording, smartPhonePlayback, unit_addNoise and unit_addSound, for the same classifiers and feature sets as above. Mean of all folds with 10-fold cross-validation.]
Table 4.1: Correct mean classification percentage of the clean data sets and the number of genres across which they are distributed.
(a) Mean percentage of correctly classified instances (highlighted values mean an improvement with respect to clean audio classification)
GTZAN MEAN, original (clean) audio:

Classifier / Feature    RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes             34,70%   49,00%   37,40%   52,30%   54,60%   35,80%
SMO PolyKernel          42,60%   65,80%   49,10%   74,40%   67,50%   38,90%
SMO RBFKernel           28,60%   59,60%   36,40%   52,10%   64,40%   37,60%
J48                     32,90%   36,40%   33,60%   49,60%   47,20%   29,50%
RandomForest            35,30%   43,90%   37,90%   61,60%   59,40%   37,90%
KNN Euclidean           40,40%   51,60%   40,80%   66,10%   51,60%   30,80%
KNN Manhattan           40,20%   53,60%   42,80%   66,20%   61,30%   35,40%
Difference Original / Degradations (positive = improvement of classification with degradation; negative = deterioration with degradation).

vinylRecording, accuracy:

Classifier / Feature    RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes             35,20%   49,60%   38,50%   49,70%   50,90%   37,00%
SMO PolyKernel          40,10%   64,30%   49,50%   72,80%   65,10%   37,70%
SMO RBFKernel           29,80%   60,10%   37,90%   49,50%   63,10%   39,30%
J48                     30,40%   36,40%   31,50%   46,90%   42,60%   29,90%
RandomForest            36,60%   41,30%   37,30%   54,70%   54,20%   34,50%
KNN Euclidean           38,50%   51,50%   41,60%   62,00%   48,90%   31,90%
KNN Manhattan           38,50%   53,40%   42,30%   62,70%   60,20%   36,40%

vinylRecording, difference with respect to the clean baseline:

Classifier / Feature    RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes             +0,50%   +0,60%   +1,10%   -2,60%   -3,70%   +1,20%
SMO PolyKernel          -2,50%   -1,50%   +0,40%   -1,60%   -2,40%   -1,20%
SMO RBFKernel           +1,20%   +0,50%   +1,50%   -2,60%   -1,30%   +1,70%
J48                     -2,50%    0,00%   -2,10%   -2,70%   -4,60%   +0,40%
RandomForest            +1,30%   -2,60%   -0,60%   -6,90%   -5,20%   -3,40%
KNN Euclidean           -1,90%   -0,10%   +0,80%   -4,10%   -2,70%   +1,10%
KNN Manhattan           -1,70%   -0,20%   -0,50%   -3,50%   -1,10%   +1,00%

[Table data for the remaining degradations (liveRecording, strongMp3Compression, radioBroadcast, smartPhoneRecording, smartPhonePlayback, unit_addNoise, unit_addSound) follows the same layout.]

Mean of all folds with 10-fold cross-validation.
(b) Mean percentage differences of correctly classified instances between clean and degraded audio (positive values = improvement, negative values = deterioration; a darker highlighted value means a better improvement)
Table 4.2: Classification of the GTZAN data set degraded by the vinyl degradation
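Every value in these tables is the mean accuracy over 10 cross-validation folds ("mean of all folds"). The thesis ran this protocol in Weka; the following self-contained Python sketch, with a synthetic two-"genre" data set and a 1-NN classifier standing in for the real audio features and classifiers, shows the same fold-and-average scheme:

```python
import random
from statistics import mean

def ten_fold_accuracy(data, labels, classify, k=10):
    """Mean accuracy over k cross-validation folds."""
    idx = list(range(len(data)))
    random.Random(0).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]          # k disjoint test folds
    accs = []
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]  # train on the other k-1 folds
        correct = sum(
            classify([data[i] for i in train],
                     [labels[i] for i in train],
                     data[j]) == labels[j]
            for j in fold)
        accs.append(correct / len(fold))
    return mean(accs)

def nearest_neighbour(train_x, train_y, query):
    """1-NN with Euclidean distance (cf. the 'KNN Euclidean' rows)."""
    best = min(range(len(train_x)),
               key=lambda i: sum((a - b) ** 2
                                 for a, b in zip(train_x[i], query)))
    return train_y[best]

# Two well-separated synthetic 'genres' as a stand-in for real feature vectors.
rng = random.Random(1)
data = [[rng.gauss(c, 0.3), rng.gauss(c, 0.3)] for c in (0, 3) for _ in range(50)]
labels = [c for c in (0, 3) for _ in range(50)]
acc = ten_fold_accuracy(data, labels, nearest_neighbour)
print(f"mean 10-fold accuracy: {acc:.2%}")
```

With such clearly separated classes the mean fold accuracy is near 100%; on real genre data it drops to the levels reported in the tables.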
On the other hand, and surprisingly, classification of some degraded data sets improved, as in Table 4.3, which shows the classification of the GTZAN data set degraded by harmonic distortion. In this case, the RH feature set improves by around 4% across all classifiers, reaching a maximum improvement of 6,10% with the Naive Bayes classifier. TRH also improves by around 4%, although with more variance between classifiers, again reaching its maximum (7,30%) with Naive Bayes. Other features such as RP, MVD and SSD show improvements in isolated cases, whereas for the TSSD feature set all classifiers deteriorate by around 3%.
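The difference columns in these tables are plain subtractions of the clean-baseline accuracy from the degraded-audio accuracy. A minimal sketch in Python, using the Naive Bayes row of the harmonic-distortion results as example data (the thesis itself produced these tables with Weka and a spreadsheet):

```python
# Clean-baseline and degraded accuracies (%, GTZAN, Naive Bayes row),
# values copied from the tables above.
clean = {"RH": 34.70, "RP": 49.00, "MVD": 37.40,
         "SSD": 52.30, "TSSD": 54.60, "TRH": 35.80}
harmonic = {"RH": 40.80, "RP": 52.50, "MVD": 41.70,
            "SSD": 52.40, "TSSD": 53.20, "TRH": 43.10}

# Positive = improvement with degradation, negative = deterioration.
diff = {feat: round(harmonic[feat] - clean[feat], 2) for feat in clean}
print(diff)  # RH improves by 6.1 points, TRH by 7.3
```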
There are more cases with classification improvements, mainly in the GTZAN data set, such as the smartphone recording degradation with the Naive Bayes classifier or the clipping degradation with the Naive Bayes and SMO RBFKernel classifiers, among others. Although this is not the main goal of this study, we think it could be an interesting research topic to review the changes that these degradations make to the audio files and then apply them before the actual classification in order to improve its results.
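As a rough illustration of that idea, the sketch below applies a simple soft-clipping waveshaper — an assumed stand-in for a harmonic-distortion unit, not the degradation toolbox's actual implementation — to a synthetic tone and adds the degraded copy to a toy training set, the augmentation scheme suggested above:

```python
import math

def harmonic_distortion(samples, drive=4.0):
    """Soft-clipping waveshaper: adds odd harmonics to the signal.
    Hypothetical stand-in for the harmonic-distortion degradation."""
    return [math.tanh(drive * s) / math.tanh(drive) for s in samples]

# A 440 Hz test tone (1,000 samples at 8 kHz) standing in for a track.
sr = 8000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(1000)]

clean_training_set = [("tone", tone)]
# Augment: train on the clean track plus its degraded copy.
augmented = clean_training_set + [
    (name + "_distorted", harmonic_distortion(x))
    for name, x in clean_training_set
]
print(len(augmented))  # 2 training examples instead of 1
```

Whether such augmented training actually reproduces the improvements seen in Table 4.3 would have to be verified experimentally.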
(a) Mean percentage of correctly classified instances (highlighted values mean an improvement with respect to clean audio classification)
[Table data: GTZAN classification accuracies and their differences from the clean baseline for unit_applyAliasing, unit_applyClippingAlternative and unit_applyDynamicRangeCompression, same layout as the following table.]

unit_applyHarmonicDistortion, accuracy:

Classifier / Feature    RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes             40,80%   52,50%   41,70%   52,40%   53,20%   43,10%
SMO PolyKernel          47,90%   65,70%   47,00%   71,90%   64,70%   39,30%
SMO RBFKernel           34,50%   62,70%   41,40%   55,20%   63,00%   41,60%
J48                     34,40%   37,00%   33,40%   52,00%   46,80%   31,20%
RandomForest            38,60%   42,00%   40,40%   58,40%   54,80%   38,10%
KNN Euclidean           43,50%   51,90%   40,50%   61,90%   51,00%   34,80%
KNN Manhattan           41,50%   54,40%   42,00%   63,50%   58,50%   37,90%

unit_applyHarmonicDistortion, difference with respect to the clean baseline:

Classifier / Feature    RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes             +6,10%   +3,50%   +4,30%   +0,10%   -1,40%   +7,30%
SMO PolyKernel          +5,30%   -0,10%   -2,10%   -2,50%   -2,80%   +0,40%
SMO RBFKernel           +5,90%   +3,10%   +5,00%   +3,10%   -1,40%   +4,00%
J48                     +1,50%   +0,60%   -0,20%   +2,40%   -0,40%   +1,70%
RandomForest            +3,30%   -1,90%   +2,50%   -3,20%   -4,60%   +0,20%
KNN Euclidean           +3,10%   +0,30%   -0,30%   -4,20%   -0,60%   +4,00%
KNN Manhattan           +1,30%   +0,80%   -0,80%   -2,70%   -2,80%   +2,50%

[Table data: GTZAN classification accuracies and their differences from the clean baseline for unit_applyMp3Compression, unit_applySpeedup, unit_applyWowResampling and unit_applyDelay, same layout as above.]
unit_applyHighpassFilterClassifier/Feature RH RP MVD SSD TSSD TRH Classifier/Feature RH RP MVD SSD TSSD TRHNaive Bayes 31,90% 36,60% 39,90% 39,80% 44,50% 35,80% Naive Bayes L2,80% L12,40% 2,50% L12,50% L10,10% 0,00%SMO PolyKernel 37,60% 61,50% 46,10% 63,40% 56,40% 35,00% SMO PolyKernel L5,00% L4,30% L3,00% L11,00% L11,10% L3,90%SMO RBFKernel 22,90% 53,80% 40,20% 46,10% 56,60% 33,50% SMO RBFKernel L5,70% L5,80% 3,80% L6,00% L7,80% L4,10%J48 27,80% 34,60% 29,20% 43,40% 41,90% 27,70% J48 L5,10% L1,80% L4,40% L6,20% L5,30% L1,80%RandomForest 32,30% 41,60% 36,90% 55,40% 52,70% 29,90% RandomForest L3,00% L2,30% L1,00% L6,20% L6,70% L8,00%KNN Euclidean 35,80% 49,20% 36,10% 54,40% 46,90% 32,30% KNN Euclidean L4,60% L2,40% L4,70% L11,70% L4,70% 1,50%KNN Manhattan 37,80% 49,80% 39,60% 58,50% 56,70% 35,90% KNN Manhattan L2,40% L3,80% L3,20% L7,70% L4,60% 0,50%
unit_applyLowpassFilterClassifier/Feature RH RP MVD SSD TSSD TRH Classifier/Feature RH RP MVD SSD TSSD TRHNaive Bayes 38,30% 32,80% 36,90% 27,60% 25,10% 39,10% Naive Bayes 3,60% L16,20% L0,50% L24,70% L29,50% 3,30%SMO PolyKernel 44,10% 56,80% 37,90% 60,90% 55,90% 35,20% SMO PolyKernel 1,50% L9,00% L11,20% L13,50% L11,60% L3,70%SMO RBFKernel 31,30% 48,50% 36,10% 43,50% 48,90% 39,40% SMO RBFKernel 2,70% L11,10% L0,30% L8,60% L15,50% 1,80%J48 34,40% 31,10% 33,70% 43,70% 39,40% 33,70% J48 1,50% L5,30% 0,10% L5,90% L7,80% 4,20%RandomForest 36,30% 38,60% 37,20% 52,70% 46,70% 35,40% RandomForest 1,00% L5,30% L0,70% L8,90% L12,70% L2,50%KNN Euclidean 38,60% 47,80% 30,50% 49,20% 36,60% 29,00% KNN Euclidean L1,80% L3,80% L10,30% L16,90% L15,00% L1,80%KNN Manhattan 40,10% 48,50% 34,70% 51,00% 41,90% 34,40% KNN Manhattan L0,10% L5,10% L8,10% L15,20% L19,40% L1,00%
(b) Mean percentage differences of correctly classified instances between clear and degradedaudio (positive values = improvement, negative values = deterioration; a darker highlightedvalue means a better improvement)
unit_applyAliasingClassifier/Feature RH RP MVD SSD TSSD TRH Classifier/Feature RH RP MVD SSD TSSD TRHNaive Bayes 37,40% 48,20% 41,70% 44,80% 45,80% 38,50% Naive Bayes 2,70% L0,80% 4,30% L7,50% L8,80% 2,70%SMO PolyKernel 43,80% 62,10% 47,80% 65,70% 57,90% 37,50% SMO PolyKernel 1,20% L3,70% L1,30% L8,70% L9,60% L1,40%SMO RBFKernel 29,60% 56,20% 41,80% 46,10% 57,00% 39,40% SMO RBFKernel 1,00% L3,40% 5,40% L6,00% L7,40% 1,80%J48 33,20% 32,80% 33,60% 44,30% 42,00% 30,70% J48 0,30% L3,60% 0,00% L5,30% L5,20% 1,20%RandomForest 36,70% 40,00% 37,40% 52,80% 49,10% 35,40% RandomForest 1,40% L3,90% L0,50% L8,80% L10,30% L2,50%KNN Euclidean 40,70% 47,80% 43,10% 54,20% 45,00% 33,30% KNN Euclidean 0,30% L3,80% 2,30% L11,90% L6,60% 2,50%KNN Manhattan 40,70% 48,70% 42,50% 54,80% 51,30% 37,70% KNN Manhattan 0,50% L4,90% L0,30% L11,40% L10,00% 2,30%
unit_applyClippingAlternativeClassifier/Feature RH RP MVD SSD TSSD TRH Classifier/Feature RH RP MVD SSD TSSD TRHNaive Bayes 37,60% 52,10% 40,40% 54,10% 54,80% 38,30% Naive Bayes 2,90% 3,10% 3,00% 1,80% 0,20% 2,50%SMO PolyKernel 41,50% 66,30% 46,80% 71,10% 66,00% 37,50% SMO PolyKernel L1,10% 0,50% L2,30% L3,30% L1,50% L1,40%SMO RBFKernel 29,20% 62,40% 39,40% 55,90% 66,60% 39,60% SMO RBFKernel 0,60% 2,80% 3,00% 3,80% 2,20% 2,00%J48 32,10% 36,70% 29,30% 49,40% 48,30% 30,80% J48 L0,80% 0,30% L4,30% L0,20% 1,10% 1,30%RandomForest 36,50% 42,60% 38,90% 59,80% 56,00% 34,20% RandomForest 1,20% L1,30% 1,00% L1,80% L3,40% L3,70%KNN Euclidean 39,70% 50,10% 37,80% 62,20% 53,70% 30,50% KNN Euclidean L0,70% L1,50% L3,00% L3,90% 2,10% L0,30%KNN Manhattan 40,40% 52,10% 37,50% 64,10% 61,20% 35,30% KNN Manhattan 0,20% L1,50% L5,30% L2,10% L0,10% L0,10%
unit_applyDynamicRangeCompressionClassifier/Feature RH RP MVD SSD TSSD TRH Classifier/Feature RH RP MVD SSD TSSD TRHNaive Bayes 35,60% 52,60% 37,40% 53,20% 53,40% 36,70% Naive Bayes 0,90% 3,60% 0,00% 0,90% L1,20% 0,90%SMO PolyKernel 38,80% 66,20% 48,10% 72,30% 63,50% 34,90% SMO PolyKernel L3,80% 0,40% L1,00% L2,10% L4,00% L4,00%SMO RBFKernel 25,80% 63,20% 38,20% 55,00% 63,90% 36,00% SMO RBFKernel L2,80% 3,60% 1,80% 2,90% L0,50% L1,60%J48 29,80% 34,80% 32,40% 45,70% 41,20% 28,00% J48 L3,10% L1,60% L1,20% L3,90% L6,00% L1,50%RandomForest 34,50% 40,30% 38,10% 57,00% 53,80% 34,00% RandomForest L0,80% L3,60% 0,20% L4,60% L5,60% L3,90%KNN Euclidean 35,20% 52,70% 37,80% 61,30% 50,00% 27,70% KNN Euclidean L5,20% 1,10% L3,00% L4,80% L1,60% L3,10%KNN Manhattan 35,50% 53,50% 40,50% 61,40% 57,90% 34,30% KNN Manhattan L4,70% L0,10% L2,30% L4,80% L3,40% L1,10%
unit_applyHarmonicDistortionClassifier/Feature RH RP MVD SSD TSSD TRH Classifier/Feature RH RP MVD SSD TSSD TRHNaive Bayes 40,80% 52,50% 41,70% 52,40% 53,20% 43,10% Naive Bayes 6,10% 3,50% 4,30% 0,10% L1,40% 7,30%SMO PolyKernel 47,90% 65,70% 47,00% 71,90% 64,70% 39,30% SMO PolyKernel 5,30% L0,10% L2,10% L2,50% L2,80% 0,40%SMO RBFKernel 34,50% 62,70% 41,40% 55,20% 63,00% 41,60% SMO RBFKernel 5,90% 3,10% 5,00% 3,10% L1,40% 4,00%J48 34,40% 37,00% 33,40% 52,00% 46,80% 31,20% J48 1,50% 0,60% L0,20% 2,40% L0,40% 1,70%RandomForest 38,60% 42,00% 40,40% 58,40% 54,80% 38,10% RandomForest 3,30% L1,90% 2,50% L3,20% L4,60% 0,20%KNN Euclidean 43,50% 51,90% 40,50% 61,90% 51,00% 34,80% KNN Euclidean 3,10% 0,30% L0,30% L4,20% L0,60% 4,00%KNN Manhattan 41,50% 54,40% 42,00% 63,50% 58,50% 37,90% KNN Manhattan 1,30% 0,80% L0,80% L2,70% L2,80% 2,50%
unit_applyMp3CompressionClassifier/Feature RH RP MVD SSD TSSD TRH Classifier/Feature RH RP MVD SSD TSSD TRHNaive Bayes 34,30% 44,60% 39,20% 50,20% 51,70% 35,90% Naive Bayes L0,40% L4,40% 1,80% L2,10% L2,90% 0,10%SMO PolyKernel 41,60% 63,80% 47,10% 70,90% 64,20% 38,10% SMO PolyKernel L1,00% L2,00% L2,00% L3,50% L3,30% L0,80%SMO RBFKernel 28,80% 58,80% 37,10% 50,70% 62,30% 37,80% SMO RBFKernel 0,20% L0,80% 0,70% L1,40% L2,10% 0,20%J48 30,30% 34,20% 31,10% 47,90% 44,90% 28,30% J48 L2,60% L2,20% L2,50% L1,70% L2,30% L1,20%RandomForest 38,50% 40,80% 38,50% 55,50% 52,50% 36,00% RandomForest 3,20% L3,10% 0,60% L6,10% L6,90% L1,90%KNN Euclidean 39,60% 51,60% 42,10% 61,90% 49,00% 29,40% KNN Euclidean L0,80% 0,00% 1,30% L4,20% L2,60% L1,40%KNN Manhattan 39,90% 52,20% 44,00% 62,80% 58,20% 36,30% KNN Manhattan L0,30% L1,40% 1,20% L3,40% L3,10% 0,90%
unit_applySpeedupClassifier/Feature RH RP MVD SSD TSSD TRH Classifier/Feature RH RP MVD SSD TSSD TRHNaive Bayes 34,20% 49,10% 36,70% 51,20% 50,60% 35,00% Naive Bayes L0,50% 0,10% L0,70% L1,10% L4,00% L0,80%SMO PolyKernel 41,40% 65,30% 47,30% 72,30% 64,80% 37,40% SMO PolyKernel L1,20% L0,50% L1,80% L2,10% L2,70% L1,50%SMO RBFKernel 28,80% 59,30% 36,20% 52,00% 64,70% 36,70% SMO RBFKernel 0,20% L0,30% L0,20% L0,10% 0,30% L0,90%J48 30,10% 36,90% 33,00% 47,60% 47,70% 31,70% J48 L2,80% 0,50% L0,60% L2,00% 0,50% 2,20%RandomForest 38,10% 41,70% 37,90% 57,80% 54,90% 35,20% RandomForest 2,80% L2,20% 0,00% L3,80% L4,50% L2,70%KNN Euclidean 39,90% 51,30% 41,20% 63,90% 51,50% 30,00% KNN Euclidean L0,50% L0,30% 0,40% L2,20% L0,10% L0,80%KNN Manhattan 37,60% 50,80% 43,10% 63,30% 60,90% 34,60% KNN Manhattan L2,60% L2,80% 0,30% L2,90% L0,40% L0,80%
unit_applyWowResamplingClassifier/Feature RH RP MVD SSD TSSD TRH Classifier/Feature RH RP MVD SSD TSSD TRHNaive Bayes 35,50% 48,50% 38,20% 51,70% 52,90% 36,50% Naive Bayes 0,80% L0,50% 0,80% L0,60% L1,70% 0,70%SMO PolyKernel 42,30% 67,10% 49,10% 74,50% 68,10% 37,50% SMO PolyKernel L0,30% 1,30% 0,00% 0,10% 0,60% L1,40%SMO RBFKernel 27,90% 60,80% 37,30% 53,70% 66,20% 36,40% SMO RBFKernel L0,70% 1,20% 0,90% 1,60% 1,80% L1,20%J48 28,70% 35,80% 31,90% 50,40% 45,10% 27,60% J48 L4,20% L0,60% L1,70% 0,80% L2,10% L1,90%RandomForest 38,10% 44,10% 37,20% 58,60% 56,20% 34,70% RandomForest 2,80% 0,20% L0,70% L3,00% L3,20% L3,20%KNN Euclidean 40,20% 52,90% 42,80% 65,00% 50,60% 31,00% KNN Euclidean L0,20% 1,30% 2,00% L1,10% L1,00% 0,20%KNN Manhattan 39,90% 54,50% 43,30% 64,70% 59,80% 36,80% KNN Manhattan L0,30% 0,90% 0,50% L1,50% L1,50% 1,40%
unit_applyDelayClassifier/Feature RH RP MVD SSD TSSD TRH Classifier/Feature RH RP MVD SSD TSSD TRHNaive Bayes 33,90% 48,30% 36,60% 51,60% 53,60% 34,50% Naive Bayes L0,80% L0,70% L0,80% L0,70% L1,00% L1,30%SMO PolyKernel 42,40% 66,90% 47,30% 73,00% 70,20% 39,40% SMO PolyKernel L0,20% 1,10% L1,80% L1,40% 2,70% 0,50%SMO RBFKernel 28,10% 59,50% 37,20% 52,10% 66,40% 40,50% SMO RBFKernel L0,50% L0,10% 0,80% 0,00% 2,00% 2,90%J48 29,30% 35,90% 29,70% 48,10% 44,70% 30,20% J48 L3,60% L0,50% L3,90% L1,50% L2,50% 0,70%RandomForest 36,20% 42,30% 38,60% 60,60% 58,90% 36,30% RandomForest 0,90% L1,60% 0,70% L1,00% L0,50% L1,60%KNN Euclidean 40,40% 51,80% 42,20% 64,10% 54,20% 34,10% KNN Euclidean 0,00% 0,20% 1,40% L2,00% 2,60% 3,30%KNN Manhattan 38,30% 53,00% 42,70% 64,50% 63,30% 37,80% KNN Manhattan L1,90% L0,60% L0,10% L1,70% 2,00% 2,40%
unit_applyHighpassFilterClassifier/Feature RH RP MVD SSD TSSD TRH Classifier/Feature RH RP MVD SSD TSSD TRHNaive Bayes 31,90% 36,60% 39,90% 39,80% 44,50% 35,80% Naive Bayes L2,80% L12,40% 2,50% L12,50% L10,10% 0,00%SMO PolyKernel 37,60% 61,50% 46,10% 63,40% 56,40% 35,00% SMO PolyKernel L5,00% L4,30% L3,00% L11,00% L11,10% L3,90%SMO RBFKernel 22,90% 53,80% 40,20% 46,10% 56,60% 33,50% SMO RBFKernel L5,70% L5,80% 3,80% L6,00% L7,80% L4,10%J48 27,80% 34,60% 29,20% 43,40% 41,90% 27,70% J48 L5,10% L1,80% L4,40% L6,20% L5,30% L1,80%RandomForest 32,30% 41,60% 36,90% 55,40% 52,70% 29,90% RandomForest L3,00% L2,30% L1,00% L6,20% L6,70% L8,00%KNN Euclidean 35,80% 49,20% 36,10% 54,40% 46,90% 32,30% KNN Euclidean L4,60% L2,40% L4,70% L11,70% L4,70% 1,50%KNN Manhattan 37,80% 49,80% 39,60% 58,50% 56,70% 35,90% KNN Manhattan L2,40% L3,80% L3,20% L7,70% L4,60% 0,50%
unit_applyLowpassFilterClassifier/Feature RH RP MVD SSD TSSD TRH Classifier/Feature RH RP MVD SSD TSSD TRHNaive Bayes 38,30% 32,80% 36,90% 27,60% 25,10% 39,10% Naive Bayes 3,60% L16,20% L0,50% L24,70% L29,50% 3,30%SMO PolyKernel 44,10% 56,80% 37,90% 60,90% 55,90% 35,20% SMO PolyKernel 1,50% L9,00% L11,20% L13,50% L11,60% L3,70%SMO RBFKernel 31,30% 48,50% 36,10% 43,50% 48,90% 39,40% SMO RBFKernel 2,70% L11,10% L0,30% L8,60% L15,50% 1,80%J48 34,40% 31,10% 33,70% 43,70% 39,40% 33,70% J48 1,50% L5,30% 0,10% L5,90% L7,80% 4,20%RandomForest 36,30% 38,60% 37,20% 52,70% 46,70% 35,40% RandomForest 1,00% L5,30% L0,70% L8,90% L12,70% L2,50%KNN Euclidean 38,60% 47,80% 30,50% 49,20% 36,60% 29,00% KNN Euclidean L1,80% L3,80% L10,30% L16,90% L15,00% L1,80%KNN Manhattan 40,10% 48,50% 34,70% 51,00% 41,90% 34,40% KNN Manhattan L0,10% L5,10% L8,10% L15,20% L19,40% L1,00%
Table 4.3: Classification of GTZAN data set degraded by harmonic distortion
Referring to classification deteriorations due to degraded data sets, the two most affected
degradations are Low pass filtering degradation (Tables 4.4) and High pass filtering
degradation. In Low pass filtering case, deteriorations are around 7% with a maximum
deterioration of 27,57% in TSSD feature with Naive Bayes classifier. In both filtering
degradations, this significant degradation is due to the removing of some frequency bands
caused by the filtering, i.e. feature information belonging to removed frequency bands is
lost as well, hindering the classification performance.
Another important classification deterioration is caused by the Smartphone playback,
where the deterioration is around 7% in GTZAN data set and 9% in ISMIR data set. In
this case, the deterioration is due to the degradation characteristics: high pass filtering,
cut-off and bad SNR. This means a high degradation of the audio signal, changing the
extracted features and hindering a correct classification as well.
Chapter 4. Impact of Degradations 31
(a) Mean percentage of correctly classified instances (highlighted values means an improve-ment respect clean audio classification)
unit_applyAliasingClassifier/Feature RH RP MVD SSD TSSD TRH Classifier/Feature RH RP MVD SSD TSSD TRHNaive Bayes 53,85% 59,26% 57,48% 59,95% 31,62% 59,12% Naive Bayes L3,22% L4,12% L3,77% L1,10% L21,20% L0,89%SMO PolyKernel 61,46% 72,22% 68,52% 74,69% 75,86% 62,83% SMO PolyKernel L2,20% L2,88% L1,99% L4,60% L4,46% L3,29%SMO RBFKernel 50,96% 67,35% 63,03% 63,17% 67,56% 62,28% SMO RBFKernel L3,98% L1,44% L0,96% L0,69% L5,56% L2,13%J48 54,61% 59,67% 57,06% 66,73% 65,56% 54,94% J48 L3,63% 0,42% L3,78% L1,38% L1,11% L0,07%RandomForest 63,51% 67,70% 67,01% 73,12% 71,06% 62,69% RandomForest L2,47% L1,17% L1,78% L2,19% L3,63% L1,85%KNN Euclidean 60,15% 69,21% 60,02% 73,46% 70,91% 59,87% KNN Euclidean L0,20% L4,05% L3,02% L5,35% L5,76% L3,91%KNN Manhattan 59,54% 69,48% 61,93% 75,10% 74,28% 61,05% KNN Manhattan L2,12% L1,85% L2,88% L3,50% L3,50% L2,40%
unit_applyClippingAlternativeClassifier/Feature RH RP MVD SSD TSSD TRH Classifier/Feature RH RP MVD SSD TSSD TRHNaive Bayes 53,77% 63,58% 58,30% 63,44% 52,74% 55,63% Naive Bayes L3,29% 0,20% L2,95% 2,40% L0,07% L4,39%SMO PolyKernel 59,12% 73,53% 67,63% 77,92% 78,33% 62,96% SMO PolyKernel L4,53% L1,58% L2,88% L1,38% L1,99% L3,15%SMO RBFKernel 49,66% 70,44% 63,44% 66,53% 74,49% 56,31% SMO RBFKernel L5,28% 1,65% L0,55% 2,68% 1,37% L8,09%J48 53,29% 58,78% 57,68% 67,42% 67,29% 53,84% J48 L4,94% L0,47% L3,16% L0,69% 0,62% L1,17%RandomForest 61,04% 69,35% 67,01% 74,35% 71,74% 62,41% RandomForest L4,94% 0,48% L1,78% L0,96% L2,95% L2,13%KNN Euclidean 60,70% 72,50% 64,47% 77,23% 75,58% 58,78% KNN Euclidean 0,35% L0,75% 1,44% L1,58% L1,10% L5,01%KNN Manhattan 58,71% 71,95% 64,34% 78,05% 76,96% 61,59% KNN Manhattan L2,95% 0,61% L0,48% L0,55% L0,82% L1,85%
unit_applyDynamicRangeCompressionClassifier/Feature RH RP MVD SSD TSSD TRH Classifier/Feature RH RP MVD SSD TSSD TRHNaive Bayes 47,12% 64,47% 56,58% 61,38% 49,58% 31,48% Naive Bayes L9,95% 1,09% L4,67% 0,34% L3,23% L28,53%SMO PolyKernel 58,57% 73,53% 67,42% 77,98% 76,95% 62,90% SMO PolyKernel L5,08% L1,58% L3,09% L1,31% L3,36% L3,22%SMO RBFKernel 47,46% 71,05% 62,48% 64,74% 72,15% 55,42% SMO RBFKernel L7,48% 2,26% L1,51% 0,89% L0,96% L8,98%J48 50,55% 57,61% 56,93% 63,86% 62,42% 51,30% J48 L7,68% L1,64% L3,91% L4,26% L4,25% L3,71%RandomForest 60,70% 64,68% 65,16% 70,78% 70,71% 61,11% RandomForest L5,28% L4,19% L3,63% L4,53% L3,98% L3,43%KNN Euclidean 57,48% 70,37% 62,69% 76,61% 74,28% 59,05% KNN Euclidean L2,88% L2,88% L0,34% L2,20% L2,40% L4,73%KNN Manhattan 57,82% 70,58% 61,32% 76,95% 75,72% 60,01% KNN Manhattan L3,84% L0,75% L3,49% L1,65% L2,06% L3,43%
unit_applyHarmonicDistortionClassifier/Feature RH RP MVD SSD TSSD TRH Classifier/Feature RH RP MVD SSD TSSD TRHNaive Bayes 54,46% 65,84% 60,63% 62,49% 53,91% 57,06% Naive Bayes L2,61% 2,46% L0,62% 1,44% 1,10% L2,95%SMO PolyKernel 59,67% 73,46% 69,21% 78,33% 78,12% 63,17% SMO PolyKernel L3,98% L1,65% L1,30% L0,96% L2,19% L2,95%SMO RBFKernel 51,30% 70,78% 64,48% 66,32% 74,69% 62,28% SMO RBFKernel L3,63% 1,99% 0,48% 2,47% 1,58% L2,12%J48 54,32% 59,47% 59,12% 68,18% 66,46% 55,69% J48 L3,91% 0,21% L1,72% 0,07% L0,21% 0,68%RandomForest 63,45% 67,22% 66,94% 74,07% 74,01% 63,71% RandomForest L2,54% L1,65% L1,85% L1,24% L0,68% L0,83%KNN Euclidean 60,29% 71,54% 62,62% 77,78% 76,40% 60,15% KNN Euclidean L0,06% L1,71% L0,41% L1,03% L0,27% L3,64%KNN Manhattan 57,54% 72,02% 63,85% 77,64% 77,64% 61,38% KNN Manhattan L4,12% 0,68% L0,96% L0,97% L0,14% L2,06%
unit_applyMp3CompressionClassifier/Feature RH RP MVD SSD TSSD TRH Classifier/Feature RH RP MVD SSD TSSD TRHNaive Bayes 56,11% 61,94% 58,44% 60,77% 58,84% 59,46% Naive Bayes L0,96% L1,44% L2,81% L0,28% 6,03% L0,55%SMO PolyKernel 63,31% 74,90% 69,21% 76,96% 79,22% 65,29% SMO PolyKernel L0,34% L0,21% L1,30% L2,34% L1,10% L0,82%SMO RBFKernel 54,53% 69,34% 63,44% 62,55% 72,77% 63,99% SMO RBFKernel L0,41% 0,55% L0,55% L1,31% L0,34% L0,41%J48 54,74% 58,57% 57,13% 68,80% 67,56% 57,13% J48 L3,49% L0,69% L3,70% 0,69% 0,89% 2,13%RandomForest 64,75% 68,65% 67,63% 73,12% 71,13% 64,47% RandomForest L1,23% L0,21% L1,17% L2,19% L3,56% L0,07%KNN Euclidean 62,00% 71,40% 62,56% 77,65% 74,90% 61,18% KNN Euclidean 1,65% L1,85% L0,48% L1,16% L1,78% L2,61%KNN Manhattan 61,45% 70,58% 63,86% 76,76% 77,44% 62,35% KNN Manhattan L0,21% L0,75% L0,96% L1,85% L0,34% L1,10%
unit_applySpeedupClassifier/Feature RH RP MVD SSD TSSD TRH Classifier/Feature RH RP MVD SSD TSSD TRHNaive Bayes 56,58% 64,41% 61,66% 61,93% 36,62% 60,70% Naive Bayes L0,48% 1,03% 0,41% 0,89% L16,19% 0,68%SMO PolyKernel 63,85% 72,98% 70,37% 79,57% 80,45% 66,46% SMO PolyKernel 0,20% L2,13% L0,14% 0,27% 0,14% 0,34%SMO RBFKernel 55,01% 68,93% 62,69% 65,43% 72,09% 63,58% SMO RBFKernel 0,07% 0,13% L1,30% 1,57% L1,03% L0,82%J48 55,42% 62,14% 59,26% 67,97% 69,41% 57,06% J48 L2,82% 2,89% L1,58% L0,14% 2,74% 2,06%RandomForest 64,41% 69,00% 67,22% 75,65% 73,67% 64,75% RandomForest L1,58% 0,13% L1,57% 0,34% L1,02% 0,21%KNN Euclidean 61,87% 70,92% 63,17% 79,97% 78,33% 61,80% KNN Euclidean 1,51% L2,33% 0,13% 1,16% 1,65% L1,99%KNN Manhattan 60,56% 71,40% 64,95% 78,60% 79,02% 63,78% KNN Manhattan L1,10% 0,07% 0,14% 0,00% 1,24% 0,34%
unit_applyWowResamplingClassifier/Feature RH RP MVD SSD TSSD TRH Classifier/Feature RH RP MVD SSD TSSD TRHNaive Bayes 56,72% 64,06% 60,70% 61,53% 31,55% 59,74% Naive Bayes L0,34% 0,68% L0,55% 0,48% L21,27% L0,27%SMO PolyKernel 63,38% 75,38% 69,76% 79,22% 81,62% 67,49% SMO PolyKernel L0,28% 0,27% L0,75% L0,07% 1,31% 1,37%SMO RBFKernel 56,24% 69,14% 63,65% 65,02% 71,68% 64,68% SMO RBFKernel 1,31% 0,34% L0,34% 1,16% L1,44% 0,28%J48 55,00% 61,05% 59,94% 68,04% 68,58% 56,51% J48 L3,23% 1,79% L0,90% L0,07% 1,91% 1,50%RandomForest 65,02% 69,76% 69,48% 75,58% 73,19% 66,53% RandomForest L0,96% 0,89% 0,69% 0,28% L1,50% 1,99%KNN Euclidean 60,83% 72,29% 64,41% 79,36% 76,40% 63,31% KNN Euclidean 0,48% L0,96% 1,37% 0,55% L0,27% L0,48%KNN Manhattan 60,01% 71,81% 65,70% 78,54% 77,64% 63,38% KNN Manhattan L1,65% 0,48% 0,89% L0,07% L0,14% L0,07%
unit_applyDelayClassifier/Feature RH RP MVD SSD TSSD TRH Classifier/Feature RH RP MVD SSD TSSD TRHNaive Bayes 56,79% 63,86% 61,45% 61,53% 47,94% 60,49% Naive Bayes L0,27% 0,48% 0,21% 0,48% L4,87% 0,48%SMO PolyKernel 63,86% 74,21% 68,87% 79,29% 79,84% 67,97% SMO PolyKernel 0,20% L0,89% L1,64% 0,00% L0,48% 1,85%SMO RBFKernel 55,42% 68,66% 63,79% 63,65% 73,05% 64,33% SMO RBFKernel 0,48% L0,14% L0,20% L0,21% L0,07% L0,07%J48 57,00% 61,39% 60,29% 69,14% 70,85% 55,28% J48 L1,24% 2,13% L0,55% 1,03% 4,18% 0,27%RandomForest 64,96% 69,41% 68,32% 75,86% 74,56% 64,82% RandomForest L1,02% 0,55% L0,47% 0,55% L0,13% 0,28%KNN Euclidean 61,11% 73,46% 62,76% 78,81% 76,54% 63,38% KNN Euclidean 0,75% 0,21% L0,28% 0,00% L0,14% L0,41%KNN Manhattan 62,07% 70,92% 64,41% 78,19% 77,85% 65,02% KNN Manhattan 0,41% L0,41% L0,41% L0,41% 0,07% 1,57%
unit_applyHighpassFilterClassifier/Feature RH RP MVD SSD TSSD TRH Classifier/Feature RH RP MVD SSD TSSD TRHNaive Bayes 48,90% 56,45% 55,56% 49,72% 32,44% 52,60% Naive Bayes L8,16% L6,93% L5,69% L11,32% L20,37% L7,41%SMO PolyKernel 58,37% 71,33% 66,19% 75,52% 76,33% 64,06% SMO PolyKernel L5,28% L3,78% L4,32% L3,78% L3,98% L2,06%SMO RBFKernel 46,43% 66,53% 59,74% 61,18% 68,31% 53,50% SMO RBFKernel L8,50% L2,26% L4,25% L2,68% L4,80% L10,91%J48 51,09% 62,34% 55,35% 67,49% 64,89% 51,58% J48 L7,14% 3,09% L5,49% L0,62% L1,78% L3,43%RandomForest 61,93% 68,66% 66,05% 73,18% 71,88% 61,80% RandomForest L4,05% L0,21% L2,74% L2,13% L2,81% L2,74%KNN Euclidean 58,37% 71,33% 59,05% 74,62% 70,72% 58,03% KNN Euclidean L1,99% L1,92% L3,98% L4,19% L5,96% L5,76%KNN Manhattan 56,92% 71,33% 62,21% 75,72% 74,49% 58,23% KNN Manhattan L4,74% 0,00% L2,61% L2,88% L3,29% L5,22%
unit_applyLowpassFilterClassifier/Feature RH RP MVD SSD TSSD TRH Classifier/Feature RH RP MVD SSD TSSD TRHNaive Bayes 50,28% 51,24% 57,20% 52,74% 25,24% 53,98% Naive Bayes L6,79% L12,14% L4,05% L8,30% L27,57% L6,03%SMO PolyKernel 58,30% 69,21% 62,21% 71,12% 71,81% 63,58% SMO PolyKernel L5,35% L5,90% L8,30% L8,17% L8,51% L2,53%SMO RBFKernel 49,25% 63,37% 60,36% 60,29% 65,22% 57,47% SMO RBFKernel L5,69% L5,42% L3,64% L3,57% L7,89% L6,93%J48 51,99% 58,58% 55,15% 63,85% 64,07% 50,68% J48 L6,24% L0,68% L5,69% L4,26% L2,60% L4,33%RandomForest 60,42% 65,23% 61,60% 69,47% 68,25% 59,74% RandomForest L5,56% L3,64% L7,19% L5,83% L6,44% L4,80%KNN Euclidean 56,31% 65,02% 53,63% 68,59% 66,12% 56,10% KNN Euclidean L4,05% L8,23% L9,40% L10,22% L10,56% L7,69%KNN Manhattan 56,59% 65,84% 55,00% 70,16% 68,79% 55,97% KNN Manhattan L5,07% L5,49% L9,81% L8,44% L8,98% L7,48%
(b) Mean percentage differences of correctly classified instances between clear and degradedaudio (positive values = improvement, negative values = deterioration)
unit_applyAliasingClassifier/Feature RH RP MVD SSD TSSD TRH Classifier/Feature RH RP MVD SSD TSSD TRHNaive Bayes 53,85% 59,26% 57,48% 59,95% 31,62% 59,12% Naive Bayes L3,22% L4,12% L3,77% L1,10% L21,20% L0,89%SMO PolyKernel 61,46% 72,22% 68,52% 74,69% 75,86% 62,83% SMO PolyKernel L2,20% L2,88% L1,99% L4,60% L4,46% L3,29%SMO RBFKernel 50,96% 67,35% 63,03% 63,17% 67,56% 62,28% SMO RBFKernel L3,98% L1,44% L0,96% L0,69% L5,56% L2,13%J48 54,61% 59,67% 57,06% 66,73% 65,56% 54,94% J48 L3,63% 0,42% L3,78% L1,38% L1,11% L0,07%RandomForest 63,51% 67,70% 67,01% 73,12% 71,06% 62,69% RandomForest L2,47% L1,17% L1,78% L2,19% L3,63% L1,85%KNN Euclidean 60,15% 69,21% 60,02% 73,46% 70,91% 59,87% KNN Euclidean L0,20% L4,05% L3,02% L5,35% L5,76% L3,91%KNN Manhattan 59,54% 69,48% 61,93% 75,10% 74,28% 61,05% KNN Manhattan L2,12% L1,85% L2,88% L3,50% L3,50% L2,40%
unit_applyClippingAlternativeClassifier/Feature RH RP MVD SSD TSSD TRH Classifier/Feature RH RP MVD SSD TSSD TRHNaive Bayes 53,77% 63,58% 58,30% 63,44% 52,74% 55,63% Naive Bayes L3,29% 0,20% L2,95% 2,40% L0,07% L4,39%SMO PolyKernel 59,12% 73,53% 67,63% 77,92% 78,33% 62,96% SMO PolyKernel L4,53% L1,58% L2,88% L1,38% L1,99% L3,15%SMO RBFKernel 49,66% 70,44% 63,44% 66,53% 74,49% 56,31% SMO RBFKernel L5,28% 1,65% L0,55% 2,68% 1,37% L8,09%J48 53,29% 58,78% 57,68% 67,42% 67,29% 53,84% J48 L4,94% L0,47% L3,16% L0,69% 0,62% L1,17%RandomForest 61,04% 69,35% 67,01% 74,35% 71,74% 62,41% RandomForest L4,94% 0,48% L1,78% L0,96% L2,95% L2,13%KNN Euclidean 60,70% 72,50% 64,47% 77,23% 75,58% 58,78% KNN Euclidean 0,35% L0,75% 1,44% L1,58% L1,10% L5,01%KNN Manhattan 58,71% 71,95% 64,34% 78,05% 76,96% 61,59% KNN Manhattan L2,95% 0,61% L0,48% L0,55% L0,82% L1,85%
unit_applyDynamicRangeCompressionClassifier/Feature RH RP MVD SSD TSSD TRH Classifier/Feature RH RP MVD SSD TSSD TRHNaive Bayes 47,12% 64,47% 56,58% 61,38% 49,58% 31,48% Naive Bayes L9,95% 1,09% L4,67% 0,34% L3,23% L28,53%SMO PolyKernel 58,57% 73,53% 67,42% 77,98% 76,95% 62,90% SMO PolyKernel L5,08% L1,58% L3,09% L1,31% L3,36% L3,22%SMO RBFKernel 47,46% 71,05% 62,48% 64,74% 72,15% 55,42% SMO RBFKernel L7,48% 2,26% L1,51% 0,89% L0,96% L8,98%J48 50,55% 57,61% 56,93% 63,86% 62,42% 51,30% J48 L7,68% L1,64% L3,91% L4,26% L4,25% L3,71%RandomForest 60,70% 64,68% 65,16% 70,78% 70,71% 61,11% RandomForest L5,28% L4,19% L3,63% L4,53% L3,98% L3,43%KNN Euclidean 57,48% 70,37% 62,69% 76,61% 74,28% 59,05% KNN Euclidean L2,88% L2,88% L0,34% L2,20% L2,40% L4,73%KNN Manhattan 57,82% 70,58% 61,32% 76,95% 75,72% 60,01% KNN Manhattan L3,84% L0,75% L3,49% L1,65% L2,06% L3,43%
unit_applyHarmonicDistortionClassifier/Feature RH RP MVD SSD TSSD TRH Classifier/Feature RH RP MVD SSD TSSD TRHNaive Bayes 54,46% 65,84% 60,63% 62,49% 53,91% 57,06% Naive Bayes L2,61% 2,46% L0,62% 1,44% 1,10% L2,95%SMO PolyKernel 59,67% 73,46% 69,21% 78,33% 78,12% 63,17% SMO PolyKernel L3,98% L1,65% L1,30% L0,96% L2,19% L2,95%SMO RBFKernel 51,30% 70,78% 64,48% 66,32% 74,69% 62,28% SMO RBFKernel L3,63% 1,99% 0,48% 2,47% 1,58% L2,12%J48 54,32% 59,47% 59,12% 68,18% 66,46% 55,69% J48 L3,91% 0,21% L1,72% 0,07% L0,21% 0,68%RandomForest 63,45% 67,22% 66,94% 74,07% 74,01% 63,71% RandomForest L2,54% L1,65% L1,85% L1,24% L0,68% L0,83%KNN Euclidean 60,29% 71,54% 62,62% 77,78% 76,40% 60,15% KNN Euclidean L0,06% L1,71% L0,41% L1,03% L0,27% L3,64%KNN Manhattan 57,54% 72,02% 63,85% 77,64% 77,64% 61,38% KNN Manhattan L4,12% 0,68% L0,96% L0,97% L0,14% L2,06%
unit_applyMp3CompressionClassifier/Feature RH RP MVD SSD TSSD TRH Classifier/Feature RH RP MVD SSD TSSD TRHNaive Bayes 56,11% 61,94% 58,44% 60,77% 58,84% 59,46% Naive Bayes L0,96% L1,44% L2,81% L0,28% 6,03% L0,55%SMO PolyKernel 63,31% 74,90% 69,21% 76,96% 79,22% 65,29% SMO PolyKernel L0,34% L0,21% L1,30% L2,34% L1,10% L0,82%SMO RBFKernel 54,53% 69,34% 63,44% 62,55% 72,77% 63,99% SMO RBFKernel L0,41% 0,55% L0,55% L1,31% L0,34% L0,41%J48 54,74% 58,57% 57,13% 68,80% 67,56% 57,13% J48 L3,49% L0,69% L3,70% 0,69% 0,89% 2,13%RandomForest 64,75% 68,65% 67,63% 73,12% 71,13% 64,47% RandomForest L1,23% L0,21% L1,17% L2,19% L3,56% L0,07%KNN Euclidean 62,00% 71,40% 62,56% 77,65% 74,90% 61,18% KNN Euclidean 1,65% L1,85% L0,48% L1,16% L1,78% L2,61%KNN Manhattan 61,45% 70,58% 63,86% 76,76% 77,44% 62,35% KNN Manhattan L0,21% L0,75% L0,96% L1,85% L0,34% L1,10%
unit_applySpeedupClassifier/Feature RH RP MVD SSD TSSD TRH Classifier/Feature RH RP MVD SSD TSSD TRHNaive Bayes 56,58% 64,41% 61,66% 61,93% 36,62% 60,70% Naive Bayes L0,48% 1,03% 0,41% 0,89% L16,19% 0,68%SMO PolyKernel 63,85% 72,98% 70,37% 79,57% 80,45% 66,46% SMO PolyKernel 0,20% L2,13% L0,14% 0,27% 0,14% 0,34%SMO RBFKernel 55,01% 68,93% 62,69% 65,43% 72,09% 63,58% SMO RBFKernel 0,07% 0,13% L1,30% 1,57% L1,03% L0,82%J48 55,42% 62,14% 59,26% 67,97% 69,41% 57,06% J48 L2,82% 2,89% L1,58% L0,14% 2,74% 2,06%RandomForest 64,41% 69,00% 67,22% 75,65% 73,67% 64,75% RandomForest L1,58% 0,13% L1,57% 0,34% L1,02% 0,21%KNN Euclidean 61,87% 70,92% 63,17% 79,97% 78,33% 61,80% KNN Euclidean 1,51% L2,33% 0,13% 1,16% 1,65% L1,99%KNN Manhattan 60,56% 71,40% 64,95% 78,60% 79,02% 63,78% KNN Manhattan L1,10% 0,07% 0,14% 0,00% 1,24% 0,34%
unit_applyWowResamplingClassifier/Feature RH RP MVD SSD TSSD TRH Classifier/Feature RH RP MVD SSD TSSD TRHNaive Bayes 56,72% 64,06% 60,70% 61,53% 31,55% 59,74% Naive Bayes L0,34% 0,68% L0,55% 0,48% L21,27% L0,27%SMO PolyKernel 63,38% 75,38% 69,76% 79,22% 81,62% 67,49% SMO PolyKernel L0,28% 0,27% L0,75% L0,07% 1,31% 1,37%SMO RBFKernel 56,24% 69,14% 63,65% 65,02% 71,68% 64,68% SMO RBFKernel 1,31% 0,34% L0,34% 1,16% L1,44% 0,28%J48 55,00% 61,05% 59,94% 68,04% 68,58% 56,51% J48 L3,23% 1,79% L0,90% L0,07% 1,91% 1,50%RandomForest 65,02% 69,76% 69,48% 75,58% 73,19% 66,53% RandomForest L0,96% 0,89% 0,69% 0,28% L1,50% 1,99%KNN Euclidean 60,83% 72,29% 64,41% 79,36% 76,40% 63,31% KNN Euclidean 0,48% L0,96% 1,37% 0,55% L0,27% L0,48%KNN Manhattan 60,01% 71,81% 65,70% 78,54% 77,64% 63,38% KNN Manhattan L1,65% 0,48% 0,89% L0,07% L0,14% L0,07%
unit_applyDelayClassifier/Feature RH RP MVD SSD TSSD TRH Classifier/Feature RH RP MVD SSD TSSD TRHNaive Bayes 56,79% 63,86% 61,45% 61,53% 47,94% 60,49% Naive Bayes L0,27% 0,48% 0,21% 0,48% L4,87% 0,48%SMO PolyKernel 63,86% 74,21% 68,87% 79,29% 79,84% 67,97% SMO PolyKernel 0,20% L0,89% L1,64% 0,00% L0,48% 1,85%SMO RBFKernel 55,42% 68,66% 63,79% 63,65% 73,05% 64,33% SMO RBFKernel 0,48% L0,14% L0,20% L0,21% L0,07% L0,07%J48 57,00% 61,39% 60,29% 69,14% 70,85% 55,28% J48 L1,24% 2,13% L0,55% 1,03% 4,18% 0,27%RandomForest 64,96% 69,41% 68,32% 75,86% 74,56% 64,82% RandomForest L1,02% 0,55% L0,47% 0,55% L0,13% 0,28%KNN Euclidean 61,11% 73,46% 62,76% 78,81% 76,54% 63,38% KNN Euclidean 0,75% 0,21% L0,28% 0,00% L0,14% L0,41%KNN Manhattan 62,07% 70,92% 64,41% 78,19% 77,85% 65,02% KNN Manhattan 0,41% L0,41% L0,41% L0,41% 0,07% 1,57%
unit_applyHighpassFilterClassifier/Feature RH RP MVD SSD TSSD TRH Classifier/Feature RH RP MVD SSD TSSD TRHNaive Bayes 48,90% 56,45% 55,56% 49,72% 32,44% 52,60% Naive Bayes L8,16% L6,93% L5,69% L11,32% L20,37% L7,41%SMO PolyKernel 58,37% 71,33% 66,19% 75,52% 76,33% 64,06% SMO PolyKernel L5,28% L3,78% L4,32% L3,78% L3,98% L2,06%SMO RBFKernel 46,43% 66,53% 59,74% 61,18% 68,31% 53,50% SMO RBFKernel L8,50% L2,26% L4,25% L2,68% L4,80% L10,91%J48 51,09% 62,34% 55,35% 67,49% 64,89% 51,58% J48 L7,14% 3,09% L5,49% L0,62% L1,78% L3,43%RandomForest 61,93% 68,66% 66,05% 73,18% 71,88% 61,80% RandomForest L4,05% L0,21% L2,74% L2,13% L2,81% L2,74%KNN Euclidean 58,37% 71,33% 59,05% 74,62% 70,72% 58,03% KNN Euclidean L1,99% L1,92% L3,98% L4,19% L5,96% L5,76%KNN Manhattan 56,92% 71,33% 62,21% 75,72% 74,49% 58,23% KNN Manhattan L4,74% 0,00% L2,61% L2,88% L3,29% L5,22%
unit_applyLowpassFilterClassifier/Feature RH RP MVD SSD TSSD TRH Classifier/Feature RH RP MVD SSD TSSD TRHNaive Bayes 50,28% 51,24% 57,20% 52,74% 25,24% 53,98% Naive Bayes L6,79% L12,14% L4,05% L8,30% L27,57% L6,03%SMO PolyKernel 58,30% 69,21% 62,21% 71,12% 71,81% 63,58% SMO PolyKernel L5,35% L5,90% L8,30% L8,17% L8,51% L2,53%SMO RBFKernel 49,25% 63,37% 60,36% 60,29% 65,22% 57,47% SMO RBFKernel L5,69% L5,42% L3,64% L3,57% L7,89% L6,93%J48 51,99% 58,58% 55,15% 63,85% 64,07% 50,68% J48 L6,24% L0,68% L5,69% L4,26% L2,60% L4,33%RandomForest 60,42% 65,23% 61,60% 69,47% 68,25% 59,74% RandomForest L5,56% L3,64% L7,19% L5,83% L6,44% L4,80%KNN Euclidean 56,31% 65,02% 53,63% 68,59% 66,12% 56,10% KNN Euclidean L4,05% L8,23% L9,40% L10,22% L10,56% L7,69%KNN Manhattan 56,59% 65,84% 55,00% 70,16% 68,79% 55,97% KNN Manhattan L5,07% L5,49% L9,81% L8,44% L8,98% L7,48%
Table 4.4: Classification of ISMIR data set degraded by low pass filtering degradation
To summarize this section, when a whole environment, training and test sets alike, is
degraded by the same single degradation, three outcomes are possible: a slight improvement
of the results, produced only by a small group of degradations in some specific cases; a
notable deterioration, caused by degradations that remove a frequency band of the audio
signal and/or introduce a high noise level; and, most commonly, no prominent impact on
the classification, because the model is built from files degraded by the same degradation
as the training data set. This last observation suggests a way to improve the classification
of audio collections that come from a similar source or degradation.
Chapter 5
Results classifying with mixed
degradations
So far, we have studied the effect of different degradations on music features and on genre
classification, but always for each degradation in isolation, i.e. we have not created any
training or test set composed of audio tracks degraded by different degradation systems,
although that is what real-world cases look like. Therefore, in this chapter we will build
an environment whose training sets come from clean audio data sets and whose test sets
come from mixed real-world degraded data sets (Section 3.2.2); then we will analyse the
results achieved by the genre classification and compare them with the results achieved
on clean audio; finally, we will propose some experiments that try to improve the
classification results for the created environment.
5.1 Creation of training and mixed test sets
To create the new 10-CV fold versions described before, we will use the procedure
explained in Section 4.2.1 with a few modifications. First of all, we create a 10-CV
environment from the features extracted from the clean data sets; then we create only the
test set folds of all the real-world degradations, using exactly the same order of extracted
features as in the degraded data sets; finally, we exchange *.arff test set file lines
(each line holds the features of one audio track) between the clean audio files and the
mixed degraded audio files, as in the example of Figure 5.1. It is important to ensure
that the created test set folds have the same order in the clean sets and in the degraded
sets, because features must be exchanged between lines originating from the same file, in
order to avoid missing or repeating any track.
[Figure: one clean test set fold (*.arff file), three single degraded test set folds
(Real World Degradations I, II and III, *.arff files) and the resulting mixed degraded
test set fold (*.arff file), each drawn as a column of audio track attribute lines.]
Figure 5.1: Creation of the mixed degraded test set file from one fold of the 10-CV,
picking one audio track's features from each single degraded test set circularly along the
whole test set. Each table cell corresponds to one *.arff line, holding the feature
attribute values and the audio track genre at the end of the line.
This process creates 10 training set folds (clean audio) and 10 test set folds (mixed
degraded audio) for each feature and data set (6 features × 2 data sets = 12), resulting
in 12 complete 10-CV environments. The classification is then performed fold by fold,
as in the previous classifications, using these environments. With the results, we can
compute the mean and variance of the percentage of correctly classified instances across
the folds of the same 10-CV.
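A minimal sketch of this line-interleaving step (the function name and file layout are our own; a real *.arff fold also carries a header section, passed here as header_lines):

```python
from itertools import cycle

def build_mixed_test_fold(single_degraded_paths, output_path, header_lines):
    """Build one mixed degraded test fold by picking audio-track feature
    lines circularly from each single-degradation test fold.

    All input folds must list the same tracks in the same order, so that
    line i always describes track i regardless of the source file.
    """
    # Read the data lines (one line = one track's attribute values + genre).
    folds = [open(p).read().splitlines() for p in single_degraded_paths]
    n_tracks = len(folds[0])
    assert all(len(f) == n_tracks for f in folds), "folds must be aligned"

    source = cycle(range(len(folds)))  # degradation I, II, III, I, II, ...
    mixed = [folds[next(source)][i] for i in range(n_tracks)]

    with open(output_path, "w") as out:
        out.write("\n".join(header_lines + mixed) + "\n")
```

Because every fold lists the tracks in the same order, the resulting file still contains each track exactly once, only with its features coming from alternating degradations.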
5.2 Training and classifying with all attributes
As expected, classifying mixed degraded audio instead of clean audio, using the clean
data set for training, clearly deteriorates the results for all data sets, classifiers and
features. This is due to the modifications that the different degradations introduce into
the audio, which confuse the classification model built from clean audio. Table 5.1 shows,
for the ISMIR data set, the classification results and the deterioration compared to the
classification of clean audio. The most affected feature is SSD for all classifiers, except
SMO RBFKernel, whose highest deterioration occurs on the TSSD feature. The deterioration
of SSD is around 20%, which is an important loss. The second most affected feature is
TSSD, because it is directly related to SSD. They are followed by the RP and MVD features,
with a deterioration around 15%, and finally RH and TRH, with a deterioration around 12%.
Regarding the variance, neither classification setup shows a prominent result. For the
GTZAN data set, the maximum deterioration is also located on the SSD features, being
slightly higher than on the ISMIR data set (around 25%). RH is again the least affected
feature on the GTZAN data set, with average values of 11%.
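Each deterioration entry in Table 5.1 is simply the clean-audio accuracy minus the mixed-degraded accuracy for the same classifier and feature, each averaged over the 10 folds. A one-line check with the SSD / SMO PolyKernel figures from the table:

```python
# Deterioration = clean-audio accuracy - mixed-degraded accuracy (fold means).
clean_acc = 79.29   # SSD + SMO PolyKernel, clean training and test (ISMIR), in %
mixed_acc = 57.82   # SSD + SMO PolyKernel, clean training, mixed degraded test, in %

deterioration = clean_acc - mixed_acc
print(round(deterioration, 2))  # 21.47, matching the table entry
```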
(a) Mean percentage of correctly classified instances of the mixed degraded data set classification.
MEAN: CLEAN audio vs. MIXED DEGRADED audio (ISMIR)

Clean audio classification:
Classifier       RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes      57,07%   63,38%   61,25%   61,04%   52,81%   60,01%
SMO PolyKernel   63,65%   75,11%   70,51%   79,29%   80,31%   66,12%
SMO RBFKernel    54,94%   68,79%   63,99%   63,86%   73,12%   64,40%
J48              58,23%   59,26%   60,84%   68,11%   66,67%   55,01%
RandomForest     65,98%   68,86%   68,79%   75,31%   74,69%   64,54%
KNN Euclidean    60,35%   73,25%   63,04%   78,81%   76,68%   63,79%
KNN Manhattan    61,66%   71,33%   64,81%   78,61%   77,78%   63,44%
Mixed degraded audio classification:
Classifier       RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes      43,21%   49,18%   47,19%   43,00%   40,32%   45,89%
SMO PolyKernel   51,03%   60,02%   56,65%   57,82%   58,92%   55,76%
SMO RBFKernel    45,68%   56,79%   51,51%   49,79%   57,75%   50,41%
J48              46,71%   45,34%   46,71%   48,63%   51,44%   46,37%
RandomForest     51,30%   53,50%   49,11%   54,25%   57,20%   50,27%
KNN Euclidean    48,28%   55,41%   47,05%   54,60%   57,40%   51,24%
KNN Manhattan    48,69%   55,63%   48,42%   56,44%   57,82%   52,67%

Deterioration = Clean - Mixed degraded:
Classifier       RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes      13,86%   14,20%   14,06%   18,05%   12,49%   14,13%
SMO PolyKernel   12,62%   15,09%   13,86%   21,47%   21,40%   10,36%
SMO RBFKernel    9,26%    12,01%   12,49%   14,06%   15,37%   13,99%
J48              11,52%   13,92%   14,13%   19,48%   15,23%   8,64%
RandomForest     14,68%   15,37%   19,68%   21,06%   17,49%   14,27%
KNN Euclidean    12,07%   17,84%   15,99%   24,21%   19,28%   12,55%
KNN Manhattan    12,97%   15,71%   16,39%   22,16%   19,96%   10,77%
Mixed degraded audio classification: classification using the clean audio data set for training and the mixed degraded audio data set as test set.
Clean audio classification: classification using the clean audio data set both for training and as test set.
(b) Mean deterioration of the classification of mixed degradations compared to the classification of clean audio, using the same clean audio data set as training set (a darker highlighted value means a higher deterioration).
Table 5.1: Mixed degraded classification of the ISMIR data set, using the clean data set as training set.
The values analysed above confirm our expectation about the main question of this thesis:
degradations have an important impact on the classification results because of their effect
on the different features used in this study. On the other hand, there is no specific
feature or frequency band that is clearly more affected than the others, so it will not be
easy to counter the effect of the degradations and restore the results achieved by the
clean audio classification. The experiments we have in mind are based on selecting the
most robust features and removing the weakest ones from the classification process.
5.3 Attribute selection
In order to mitigate the degradation effect, we will select the most robust feature
attributes for the classification process, i.e. we will use attributes whose mean and
variance shift least when degradations are applied to the clean audio tracks, so that the
difference between degraded and clean audio is slight and should not lead to a wrong
classification result. Although the mixed degradations involve only real-world
degradations, the attribute selection takes into account all the degradations studied in
this thesis (both synthetic, Section 3.2.1, and real-world, Section 3.2.2), because this
extends our study to other real-world degradations that could arise as combinations of
the different synthetic degradations studied.
5.3.1 Attribute selection process
Before explaining the attribute selection process, we have to introduce an important
concept used in it: the worst degradation. This is a nonexistent degradation constructed
from the highest attribute mean and variance differences among the different degradations
discussed in this study. We call it the worst degradation because, if it existed, it would
be the degradation with the highest impact on the audio features, and consequently on the
classification process as well. One worst degradation is created for each data set and
feature, using all synthetic and real-world degradations for the mean and variance
differences.
The creation of each worst degradation uses the mean and variance differences calculated
in Section 4.1.1; then, for each mean and variance difference value, the highest shift
among all the degradations belonging to the same data set is selected; in the end we
obtain the worst-case mean and variance differences of every attribute, for each data set
and feature. The creation process is illustrated in Figure 5.2.
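Numerically, the worst degradation is just the element-wise maximum of the per-attribute shift magnitudes over all degradations; a minimal sketch (array names and the helper are illustrative, not code from the thesis):

```python
import numpy as np

def worst_degradation(mean_diffs, var_diffs):
    """Construct the hypothetical 'worst degradation' for one data set
    and feature.

    mean_diffs, var_diffs: arrays of shape (n_degradations, n_attributes)
    holding, per degradation, the absolute difference between clean and
    degraded audio of each attribute's mean (resp. variance).

    Returns, for every attribute, the largest mean and variance shift
    observed over all degradations.
    """
    worst_mean = np.max(np.abs(mean_diffs), axis=0)
    worst_var = np.max(np.abs(var_diffs), axis=0)
    return worst_mean, worst_var
```

Note that each attribute of the result may come from a different degradation, which is why this degradation cannot exist as a single audio process.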
[Figure: per-attribute value differences between clean and degraded audio for
Degradations I, II and III; taking the highest difference per attribute yields the
worst degradation.]
Figure 5.2: Example of the construction of the worst degradation: a nonexistent degradation made of the highest differences over all studied degradations for each attribute.
After creating the different worst degradations, we perform the attribute selection, which
consists of cutting off the attributes with the highest differences in mean and variance.
We set the highest attribute difference as the 100% level for each data set, feature set,
mean and variance. We then define two levels of selectivity: the tolerant level, which is
softer on the attribute selection, and the strong level, which is more selective; the
tolerant level is always higher than the strong level. It is important to note that the
cut-off percentage does not refer to the number of attributes, but to the level of the
highest attribute difference of the worst degradation on which we are performing the
attribute selection, this one being the 100% level. Under this criterion, an attribute
whose difference lies above the established level is considered a weak attribute, whereas
one whose difference lies below the level is considered robust. Regarding the interaction
of mean and variance, if an attribute is selected as weak in either measure, it is
considered weak for the subsequent experiments, i.e. an attribute has to be robust in mean
as well as in variance to be considered robust.
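Under these conventions, the cut-off can be sketched as follows (the level is a fraction of the largest shift in the worst degradation, and an attribute survives only if it is robust in both mean and variance; the function name is ours):

```python
import numpy as np

def select_robust_attributes(worst_mean, worst_var, level):
    """Return a boolean mask of robust attributes.

    worst_mean, worst_var: per-attribute shifts of the worst degradation.
    level: cut-off as a fraction of the maximum shift (e.g. 0.90 for the
    tolerant selection on MVD/ISMIR, 0.35 for the strong one).

    An attribute is weak if its shift exceeds level * max(shift) in EITHER
    the mean or the variance plot; only attributes robust in both survive.
    """
    robust_mean = worst_mean <= level * worst_mean.max()
    robust_var = worst_var <= level * worst_var.max()
    return robust_mean & robust_var
```

Lowering the level (tolerant → strong) tightens the threshold and therefore removes more attributes, which matches the attribute counts reported below.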
Here we will analyse the selection process specifically for the ISMIR data set, although
the feature selection for ISMIR and GTZAN is very similar. The complete worst degradations
and attribute selections are presented in Appendix A.
The selection levels are set by inspecting the worst degradations resulting from the
previous process. We try to establish the two levels by looking for a possible threshold
between the several attribute groups of each feature, while taking care not to remove too
many attributes, so that the classifier keeps enough values to perform a correct
classification.
Figure 5.3(a) shows the mean differences of the worst degradation of the MVD feature on
the ISMIR data set. On the right side of the plot, red dashed lines indicate the different
levels: the 100% level corresponds to the highest attribute difference, the 90% level to
the tolerant selection level and the 35% level to the strong selection level. Figure
5.3(b) shows the same degradation after applying the tolerant selection, where 60
attributes are removed. In this case, all the attributes belonging to the skewness
measures of the MVD are removed, due to the effect of low pass filtering on the mean
differences. Figure 5.3(c) shows the same degradation after applying the strong selection,
where 128 attributes are removed: all the attributes belonging to the variance measures,
the first bin of the median measure, the first 7 bins of the maximum measures, and all the
attributes already removed in the tolerant selection. This is due to the effect of several
degradations, again including the low pass filtering degradation.
For RP (Figure 5.4), the worst degradation does not have its weakest attributes located
in a continuous range, so in this case our level-based selection method is more useful
than other methods such as attribute range selection. We set the levels at 75% for the
tolerant selection and 60% for the strong one. In this case, the number of attributes
(a) MVD worst degradation: maximum attribute = 100% [mvd mean and variance difference plots with the 100%, 90% and 35% levels marked]
(b) Modified MVD worst degradation after applying the tolerant cut-off (dotted line) = 90%
(c) Modified MVD worst degradation after applying the strong cut-off (dashed line) = 35%
Figure 5.3: MVD worst degradation and attribute selection on the ISMIR data set: tolerant cut-off (dotted line) and strong cut-off (dashed line).
removed is 122 and 325, respectively. The attributes removed in this case are unequally
distributed across the whole feature set.
[Figure panels: rp mean and variance difference plots (original, first selection, second selection).]
Figure 5.4: RP worst degradation and attribute selection on the ISMIR data set: tolerant cut-off (dotted line) and strong cut-off (dashed line).
Regarding the RH selection process (Figure 5.5), the worst degradation has the shape of a
decreasing curve, with its highest shifts in the lower part of the feature. Here the
tolerant selection level is at 40%, resulting in the removal of the first 7 attributes of
the feature set, while the strong selection level is at 25%, resulting in the removal of
the first 17 attributes. In this case the removal is caused by several degradations, most
of which share the same decreasing curve shape.
[Figure panels: rh mean and variance difference plots (original, first selection, second selection).]
Figure 5.5: RH worst degradation and attribute selection on the ISMIR data set: tolerant cut-off (dotted line) and strong cut-off (dashed line).
With respect to the attribute removal for the SSD features (Figure 5.6), the worst
degradation has its highest differences in the skewness part, so the weakest attributes
are those from 73 to 96, the skewness region of the SSD features. The tolerant level is
set at 50%, resulting in the removal of 12 attributes, whereas the strong level is set at
20%, resulting in the removal of 24 attributes (the whole skewness measure group). The
removal of these attributes is due to the effect of high pass filtering (attributes 73 to
80) and low pass filtering (attributes 81 to 96).
[Figure panels: ssd mean and variance difference plots (original, first selection, second selection).]
Figure 5.6: SSD worst degradation and attribute selection on the ISMIR data set: tolerant cut-off (dotted line) and strong cut-off (dashed line).
For the TSSD features (Figure 5.7), the worst degradation has a range of high shifts
between attributes 241 and 264; the degradation effect on the remaining attributes is
negligible. However, the magnitude of the difference varies among the affected attributes
and is not uniform. The tolerant selection level is set at 53%, resulting in the removal
of 11 attributes, whereas the strong selection level is set at 10%, resulting in the
removal of 24 attributes.
[Figure panels: tssd mean and variance difference plots (original, first selection, second selection).]
Figure 5.7: TSSD worst degradation and attribute selection on the ISMIR data set: tolerant cut-off (dotted line) and strong cut-off (dashed line).
Regarding the TRH feature (Figure 5.8), its worst degradation contains several separated
groups reminiscent of the RH shape, repeated along the whole TRH feature. This is due to
the structure of TRH, which is essentially the RH feature model repeated several times
over the track recording, producing similar differences between degraded and clean audio
repeated over the whole temporal feature. The tolerant selection level is set at 20% and
the strong one at 10%; the numbers of attributes removed are 4 and 19, respectively. At
the tolerant level, the first bins of two of the mentioned RH groups are removed; at the
strong level, the first bins of four of those RH groups are removed as well.
[Figure panels: trh mean and variance difference plots (original, first selection, second selection).]
Figure 5.8: TRH worst degradation and attribute selection on the ISMIR data set: tolerant cut-off (dotted line) and strong cut-off (dashed line).
With respect to the variance of the differences between clean and degraded audio, there
are only very isolated cases in which we found a high variance without a high mean
difference, but we nevertheless performed a selection on the variance graphs, using
exactly the same procedure as for the mean graphs. As stated before, if an attribute is
weak either in mean or in variance, we consider it a weak attribute and remove it. All the
graphs of the worst degradations, with their mean and variance difference values and
selection levels, are presented in Appendix A.
(a) Tolerant cut-off

Attribute selection (ISMIR), tolerant cut-off:
Feature                                  RH       RP       MVD      SSD      TSSD     TRH
Original dimensionality                  60       1440     420      168      1176     420
Num. attributes removed                  7        122      60       12       11       4
Num. attributes remaining                53       1318     360      156      1165     416
Removed relative attribute set size      11,67%   8,47%    14,29%   7,14%    0,94%    0,95%
Remaining relative attribute set size    88,33%   91,53%   85,71%   92,86%   99,06%   99,05%

(b) Strong cut-off

Attribute selection (ISMIR), strong cut-off:
Feature                                  RH       RP       MVD      SSD      TSSD     TRH
Original dimensionality                  60       1440     420      168      1176     420
Num. attributes removed                  17       325      128      24       24       19
Num. attributes remaining                43       1115     292      144      1152     401
Removed relative attribute set size      28,33%   22,57%   30,48%   14,29%   2,04%    4,52%
Remaining relative attribute set size    71,67%   77,43%   69,52%   85,71%   97,96%   95,48%
Table 5.2: Number of attributes selected on ISMIR data set
As a summary, Table 5.2 lists the different counts and ratios of the attribute selection
for the ISMIR data set. For most features, the number of removed attributes roughly
doubles from the tolerant to the strong selection. The feature with the strongest
selectivity is MVD, with 30% of its attributes removed in the strong selection, whereas
the features with the fewest removed attributes are the temporal features TSSD and TRH,
for which the selection could even have a negligible effect on the classification.
(a) Tolerant cut-off

Attribute selection (GTZAN), tolerant cut-off:
Feature                                  RH       RP       MVD      SSD      TSSD     TRH
Original dimensionality                  60       1440     420      168      1176     420
Num. attributes removed                  12       102      60       15       13       2
Num. attributes remaining                48       1338     360      153      1163     418
Removed relative attribute set size      20,00%   7,08%    14,29%   8,93%    1,11%    0,48%
Remaining relative attribute set size    80,00%   92,92%   85,71%   91,07%   98,89%   99,52%

(b) Strong cut-off

Attribute selection (GTZAN), strong cut-off:
Feature                                  RH       RP       MVD      SSD      TSSD     TRH
Original dimensionality                  60       1440     420      168      1176     420
Num. attributes removed                  22       311      126      23       23       18
Num. attributes remaining                38       1129     294      145      1153     402
Removed relative attribute set size      36,67%   21,60%   30,00%   13,69%   1,96%    4,29%
Remaining relative attribute set size    63,33%   78,40%   70,00%   86,31%   98,04%   95,71%
Table 5.3: Number of attributes selected on GTZAN data set
Regarding the GTZAN data set (Table 5.3), the number of removed attributes is close to
that of the ISMIR selection, except for RH, where the difference between the two data sets
is about 10%: since its curve is less pronounced, the attribute selection is harsher than
on the ISMIR data set. For the remaining features, the differences between both data sets
are around 1%.
5.3.2 Possible results with attribute selection
Using all the information about the feature selection, we will carry out two different
experiments in order to try to achieve the same results as the classification of clean
audio. We expect three possible results:
• An improvement of the classification results with respect to those achieved by the
classification of mixed degraded audio using all features: this could mean that we removed
the attributes weakest against the degradations without affecting the correct genre
classification of the different audio tracks.
• Similar classification results with respect to those achieved by the classification of
mixed degraded audio using all features: in this case, the removed attributes may simply
not be important or useful for the classification of mixed degraded audio tracks. This
could motivate a new experiment studying the effect of removing the same attributes in the
classification of clean audio, in order to reduce the dimensionality of the studied
attribute sets.
• A deterioration of the classification results with respect to those achieved by the
classification of mixed degraded audio using all features: removing the weakest attributes
harms the genre classification of mixed degraded audio files and is therefore not a viable
option.
5.3.3 Training and classifying with most robust attributes
This is the first experiment we will perform to improve the genre classification results
of mixed degraded audio using clean audio as training set. We know that, in the best case,
we will achieve the same results as in the classification of clean audio.
In this experiment we remove the weakest attributes from both the training and the test
sets, using the two selections obtained in the previous section. The procedure is to copy
and modify the different *.arff files, which contain the attribute values of each audio
track, removing the header declarations of the attributes we want to discard, as well as
the corresponding values from each line. We have to repeat this procedure for each fold
(training and test sets) of the 12 environments of the 10-CV, using for each feature the
attributes selected for removal in the previous procedure, and we have to do it for the
tolerant selection as well as for the strong one (24 complete 10-CV environments). Then,
the classification is performed as in the other sections, yielding a mean and variance
over the 10 folds of the classification.
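A sketch of this *.arff pruning step (a simplified parser of our own; it assumes one @attribute declaration per line and comma-separated data rows, with the genre as the last value, which is never among the removed indices):

```python
def remove_weak_attributes(arff_text, weak_indices):
    """Remove weak attributes from an ARFF file's text.

    Drops the @attribute header line of every weak attribute and the
    corresponding comma-separated value from every data row, so training
    and test folds stay consistent after the selection.
    """
    weak = set(weak_indices)
    out_lines, attr_idx, in_data = [], 0, False
    for line in arff_text.splitlines():
        stripped = line.strip().lower()
        if in_data and line.strip() and not line.startswith("%"):
            # Data row: drop the values of the weak attribute columns.
            values = line.split(",")
            out_lines.append(",".join(
                v for i, v in enumerate(values) if i not in weak))
        elif stripped.startswith("@attribute"):
            if attr_idx not in weak:  # keep only robust declarations
                out_lines.append(line)
            attr_idx += 1
        else:
            if stripped.startswith("@data"):
                in_data = True
            out_lines.append(line)
    return "\n".join(out_lines)
```

Applying the same index set to every training and test fold guarantees that the classifier sees identical attribute layouts in both.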
Table 5.4 shows the classification results of the aforementioned experiment using the
attribute selections of the ISMIR data set. The results are similar to those achieved in
Section 5.2, so we obtain the outcome described in the second point of Section 5.3.2.
It is surprisingly positive that the MVD feature, the one with the highest percentage of
removed attributes (30%), achieves the best classification marks: with the KNN Euclidean
classifier, its percentage of correct instances is 2,26% higher than in the classification
without attribute removal. This means that, for the classification of mixed degraded audio
files, the removed MVD attributes are entirely dispensable. In addition, for RP and SSD,
even with their significant percentage of removed attributes, the results remain very
similar to the mixed degraded classification, with a maximum deterioration of 1,51% across
all classifiers. For the RP features the classification is slightly worse, but still very
similar to the other mentioned classification. For TSSD and TRH the results are not
significant, due to the low ratio of removed attributes.
(a) Mean percentage of correctly classified instances (highlighted values mean an improvement with respect to the mixed degraded classification without attribute selection).
MEAN of ISMIR classification with training: 1st scenario vs. 2nd scenario

1st scenario: using all the attributes:
Classifier       RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes      43,21%   49,18%   47,19%   43,00%   40,32%   45,89%
SMO PolyKernel   51,03%   60,02%   56,65%   57,82%   58,92%   55,76%
SMO RBFKernel    45,68%   56,79%   51,51%   49,79%   57,75%   50,41%
J48              46,71%   45,34%   46,71%   48,63%   51,44%   46,37%
RandomForest     51,30%   53,50%   49,11%   54,25%   57,20%   50,27%
KNN Euclidean    48,28%   55,41%   47,05%   54,60%   57,40%   51,24%
KNN Manhattan    48,69%   55,63%   48,42%   56,44%   57,82%   52,67%
Difference = 2nd scenario - 1st scenario (positive = improvement with the 2nd scenario, negative = deterioration with the 2nd scenario).

2nd scenario: most robust attributes (1st selection):
Classifier       RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes      43,48%   49,25%   46,91%   42,93%   40,05%   45,82%
SMO PolyKernel   48,90%   60,02%   56,44%   58,16%   58,85%   55,62%
SMO RBFKernel    47,26%   55,69%   51,65%   49,38%   57,75%   50,62%
J48              44,17%   45,75%   47,26%   48,63%   52,20%   45,96%
RandomForest     48,42%   55,01%   49,24%   53,98%   54,94%   51,37%
KNN Euclidean    45,33%   55,28%   48,49%   54,66%   57,33%   51,10%
KNN Manhattan    44,78%   55,14%   48,08%   56,79%   57,88%   51,23%

Difference with respect to the 1st scenario (1st selection):
Classifier       RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes      +0,28%   +0,07%   -0,27%   -0,07%   -0,28%   -0,07%
SMO PolyKernel   -2,13%   0,00%    -0,21%   +0,34%   -0,07%   -0,14%
SMO RBFKernel    +1,58%   -1,10%   +0,14%   -0,41%   0,00%    +0,21%
J48              -2,54%   +0,41%   +0,55%   0,00%    +0,76%   -0,41%
RandomForest     -2,88%   +1,51%   +0,13%   -0,27%   -2,26%   +1,10%
KNN Euclidean    -2,95%   -0,14%   +1,44%   +0,07%   -0,07%   -0,14%
KNN Manhattan    -3,91%   -0,48%   -0,34%   +0,34%   +0,07%   -1,44%

2nd scenario: most robust attributes (2nd selection):
Classifier       RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes      42,94%   49,18%   46,02%   43,55%   40,66%   45,88%
SMO PolyKernel   46,91%   61,39%   55,28%   58,85%   59,12%   55,69%
SMO RBFKernel    45,82%   55,49%   49,86%   49,03%   57,75%   50,21%
J48              46,09%   46,29%   46,91%   47,12%   51,86%   45,20%
RandomForest     46,56%   53,16%   49,18%   53,43%   56,65%   50,00%
KNN Euclidean    44,37%   54,46%   49,31%   54,87%   57,33%   50,21%
KNN Manhattan    44,71%   54,11%   49,45%   56,17%   57,54%   50,27%

Difference with respect to the 1st scenario (2nd selection):
Classifier       RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes      -0,27%   0,00%    -1,17%   +0,55%   +0,34%   0,00%
SMO PolyKernel   -4,12%   +1,37%   -1,37%   +1,03%   +0,20%   -0,07%
SMO RBFKernel    +0,14%   -1,30%   -1,64%   -0,76%   0,00%    -0,21%
J48              -0,62%   +0,96%   +0,20%   -1,51%   +0,41%   -1,16%
RandomForest     -4,73%   -0,34%   +0,07%   -0,81%   -0,55%   -0,27%
KNN Euclidean    -3,91%   -0,96%   +2,26%   +0,27%   -0,07%   -1,03%
KNN Manhattan    -3,98%   -1,51%   +1,03%   -0,27%   -0,27%   -2,40%
2nd%SCENARIO:%Classification%by%10%cross%validation,%usign%clean%data%for%training,%and%mixed%degraded%audio%for%testing,%using%the%most%robust%attributes%in%both%cases.%Two%levels%of%selectivity.
1st%SCENARIO:%Classification%using%all%the%attributes,%by%10%cross%validation,%using%clean%data%for%training,%and%mixed%degraded%audio%for%testing.
(b) Mean differences in the classification of mixed degradations using the most robust attributes, compared to the classification of mixed degraded data sets without attribute selection, using the clean audio data set as training set (positive values = improvement, negative values = deterioration; a darker highlighted value means a larger improvement).

Difference = 2nd Scenario - 1st Scenario (1st selection)
Classifier/Feature   RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes          +0,28%   +0,07%   -0,27%   -0,07%   -0,28%   -0,07%
SMO PolyKernel       -2,13%    0,00%   -0,21%   +0,34%   -0,07%   -0,14%
SMO RBFKernel        +1,58%   -1,10%   +0,14%   -0,41%    0,00%   +0,21%
J48                  -2,54%   +0,41%   +0,55%    0,00%   +0,76%   -0,41%
RandomForest         -2,88%   +1,51%   +0,13%   -0,27%   -2,26%   +1,10%
KNN Euclidean        -2,95%   -0,14%   +1,44%   +0,07%   -0,07%   -0,14%
KNN Manhattan        -3,91%   -0,48%   -0,34%   +0,34%   +0,07%   -1,44%

Difference = 2nd Scenario - 1st Scenario (2nd selection)
Classifier/Feature   RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes          -0,27%    0,00%   -1,17%   +0,55%   +0,34%    0,00%
SMO PolyKernel       -4,12%   +1,37%   -1,37%   +1,03%   +0,20%   -0,07%
SMO RBFKernel        +0,14%   -1,30%   -1,64%   -0,76%    0,00%   -0,21%
J48                  -0,62%   +0,96%   +0,20%   -1,51%   +0,41%   -1,16%
RandomForest         -4,73%   -0,34%   +0,07%   -0,81%   -0,55%   -0,27%
KNN Euclidean        -3,91%   -0,96%   +2,26%   +0,27%   -0,07%   -1,03%
KNN Manhattan        -3,98%   -1,51%   +1,03%   -0,27%   -0,27%   -2,40%
Table 5.4: Mixed degraded classification of ISMIR data set, using clean data set asa training set, and using only the most robust attributes of training and test sets with
strong selection.
In Table 5.5 we present the classification results of the same experiment on the GTZAN data set. In this case, the best feature is also MVD, achieving the best result, an improvement of 2,10% in correctly classified instances, with the KNN Euclidean classifier: the same classifier that achieved the best results in the ISMIR classification. Regarding RP and SSD, the results achieved are also similar to the classification of mixed degraded audio, even with the high ratio of removed attributes. On the other hand, RH shows a deterioration of around 5%, higher than in the ISMIR case. This is due to the different ratio of removed
(a) Mean percentage of correctly classified instances (highlighted values mean an improvement with respect to the mixed degraded classification without attribute selection).

1st Scenario: using all the attributes
Classifier/Feature   RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes          24,60%   34,30%   27,50%   32,60%   37,30%   25,00%
SMO PolyKernel       27,90%   45,30%   35,20%   42,20%   42,70%   29,50%
SMO RBFKernel        19,90%   40,40%   27,40%   33,50%   42,50%   24,50%
J48                  21,00%   24,50%   22,80%   27,40%   29,90%   18,70%
RandomForest         22,50%   28,00%   25,90%   36,40%   37,00%   24,40%
KNN Euclidean        28,60%   36,50%   27,40%   38,20%   34,70%   20,30%
KNN Manhattan        28,30%   36,90%   28,30%   40,90%   41,00%   24,50%

2nd Scenario: most robust attributes (1st selection)
Classifier/Feature   RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes          21,50%   34,80%   28,00%   33,10%   37,20%   24,90%
SMO PolyKernel       23,30%   46,30%   36,60%   42,90%   42,70%   29,80%
SMO RBFKernel        16,50%   40,00%   27,50%   33,80%   42,50%   25,30%
J48                  20,30%   23,50%   22,30%   27,90%   30,10%   18,70%
RandomForest         21,30%   27,80%   24,90%   37,10%   36,30%   22,80%
KNN Euclidean        24,70%   36,40%   28,80%   39,00%   34,70%   20,50%
KNN Manhattan        25,20%   35,80%   27,40%   40,60%   41,20%   24,20%

2nd Scenario: most robust attributes (2nd selection)
Classifier/Feature   RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes          19,50%   34,40%   25,50%   33,30%   37,00%   24,20%
SMO PolyKernel       19,70%   44,90%   34,50%   42,70%   42,60%   25,80%
SMO RBFKernel        16,90%   37,90%   24,10%   34,00%   42,50%   23,60%
J48                  19,40%   23,00%   22,20%   26,90%   29,90%   18,20%
RandomForest         20,80%   27,30%   27,10%   38,10%   34,00%   22,40%
KNN Euclidean        22,70%   35,50%   29,50%   39,10%   34,80%   19,40%
KNN Manhattan        23,40%   35,10%   29,50%   40,40%   41,10%   23,00%

1st Scenario: classification using all the attributes, by 10-fold cross-validation, using clean data for training and mixed degraded audio for testing.
2nd Scenario: classification by 10-fold cross-validation, using clean data for training and mixed degraded audio for testing, using the most robust attributes in both cases; two levels of selectivity.
(b) Mean differences in the classification of mixed degradations using the most robust attributes, compared to the classification of mixed degraded data sets without attribute selection, using the clean audio data set as training set (positive values = improvement, negative values = deterioration; a darker highlighted value means a larger improvement).

Difference = 2nd Scenario - 1st Scenario (1st selection)
Classifier/Feature   RH        RP       MVD      SSD      TSSD     TRH
Naive Bayes          -3,10%    +0,50%   +0,50%   +0,50%   -0,10%   -0,10%
SMO PolyKernel       -4,60%    +1,00%   +1,40%   +0,70%    0,00%   +0,30%
SMO RBFKernel        -3,40%    -0,40%   +0,10%   +0,30%    0,00%   +0,80%
J48                  -0,70%    -1,00%   -0,50%   +0,50%   +0,20%    0,00%
RandomForest         -1,20%    -0,20%   -1,00%   +0,70%   -0,70%   -1,60%
KNN Euclidean        -3,90%    -0,10%   +1,40%   +0,80%    0,00%   +0,20%
KNN Manhattan        -3,10%    -1,10%   -0,90%   -0,30%   +0,20%   -0,30%

Difference = 2nd Scenario - 1st Scenario (2nd selection)
Classifier/Feature   RH        RP       MVD      SSD      TSSD     TRH
Naive Bayes          -5,10%    +0,10%   -2,00%   +0,70%   -0,30%   -0,80%
SMO PolyKernel       -8,20%    -0,40%   -0,70%   +0,50%   -0,10%   -3,70%
SMO RBFKernel        -3,00%    -2,50%   -3,30%   +0,50%    0,00%   -0,90%
J48                  -1,60%    -1,50%   -0,60%   -0,50%    0,00%   -0,50%
RandomForest         -1,70%    -0,70%   +1,20%   +1,70%   -3,00%   -2,00%
KNN Euclidean        -5,90%    -1,00%   +2,10%   +0,90%   +0,10%   -0,90%
KNN Manhattan        -4,90%    -1,80%   +1,20%   -0,50%   +0,10%   -1,50%
Table 5.5: Mixed degraded classification of GTZAN data set, using clean data set asa training set, and using only the most robust attributes of training and test sets with
strong selection.
attributes (37% in the GTZAN case versus 28% in the ISMIR case). The TSSD and TRH features are in the same situation as in the classification of the ISMIR data set.
5.3.4 Training with all attributes and classifying with the weakest attributes missing
In this section we perform an experiment in which we train the model using all the attributes from clean audio, and mark the weakest attributes as missing in the test set of mixed degraded audio.
The procedure of this experiment is similar to the other experiments performed and reuses several files that we have already created. The *.arff files for the training folds used in this section are exactly the same training folds created in Section 5.1, which contain all the feature attributes extracted from clean audio tracks. On the other hand, we modify a copy of the *.arff fold files created in the same section, replacing, in each line of the *.arff file, all the values that belong to weak attributes (depending on the level of selectivity and the feature case) with a question mark (?). The Weka software interprets this sign as a missing value, which changes the behaviour of each classifier's algorithm and therefore the result achieved. We have to repeat the whole procedure for the
tolerant selection level as well as for the strong selection level, ending up with 24 complete 10-CV environments (the same number of environments as in the previous section).
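The masking step described above can be sketched as follows. This is an illustrative snippet, not the script used in the experiments; the data line and the attribute indices are hypothetical examples.

```python
# Sketch: replace the values of the weakest attributes in an ARFF data
# line with '?', which Weka reads as a missing value.
def mask_weak_attributes(arff_line, weak_indices):
    """Replace the values at the given 0-based positions with '?'.

    Only feature-attribute indices should be passed in weak_indices;
    the class label (last field) is left untouched.
    """
    fields = arff_line.strip().split(",")
    for i in weak_indices:
        fields[i] = "?"
    return ",".join(fields)

# Example data line: four feature values followed by a genre label.
line = "0.12,0.87,0.33,0.54,classical"
print(mask_weak_attributes(line, [1, 3]))  # -> 0.12,?,0.33,?,classical
```

Applying this function to every data line of a copied fold file yields the modified test sets used in this section.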
In Table 5.6 we present the classification results achieved by the experiment just described. In this case, the results are worse than in the previous section, and in fewer cases do we see a slight improvement. The worst deterioration occurs with the KNN Euclidean classifier, reaching deteriorations of up to 40,87% for the RP feature and 34,64% for RH. With this classifier, the deterioration is directly related to the ratio of removed attributes, which means that this classifier does not handle missing values well in its classification algorithm. KNN Manhattan also produces poor classification results, likewise related to the number of removed attributes. On the other hand, the Naive Bayes and J48 classifiers achieve results quite similar to the classification obtained using all the attributes.
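The sensitivity of the nearest-neighbour classifiers can be illustrated with a small sketch. As a simplifying assumption (roughly mirroring how Weka's normalised distance functions penalise missing values), a missing attribute is counted as the maximum normalised difference of 1.0, so every masked attribute inflates the distance regardless of the true values.

```python
import math

# Sketch: Euclidean distance over [0, 1]-normalised attributes, where
# None marks a missing value and contributes the maximum difference 1.0.
def euclidean_with_missing(a, b):
    total = 0.0
    for x, y in zip(a, b):
        d = 1.0 if x is None or y is None else x - y
        total += d * d
    return math.sqrt(total)

clean = [0.20, 0.40, 0.60]
masked = [0.20, None, 0.60]   # second attribute masked as 'weak'
print(euclidean_with_missing(clean, clean))    # -> 0.0
print(euclidean_with_missing(clean, masked))   # -> 1.0
```

Under this convention, the more attributes are masked, the larger every pairwise distance becomes, which is consistent with the deterioration growing with the ratio of removed attributes.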
(a) Mean percentage of correctly classified instances (highlighted values mean an improvement with respect to the mixed degraded classification without attribute selection).

1st Scenario: using all the attributes
Classifier/Feature   RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes          43,21%   49,18%   47,19%   43,00%   40,32%   45,89%
SMO PolyKernel       51,03%   60,02%   56,65%   57,82%   58,92%   55,76%
SMO RBFKernel        45,68%   56,79%   51,51%   49,79%   57,75%   50,41%
J48                  46,71%   45,34%   46,71%   48,63%   51,44%   46,37%
RandomForest         51,30%   53,50%   49,11%   54,25%   57,20%   50,27%
KNN Euclidean        48,28%   55,41%   47,05%   54,60%   57,40%   51,24%
KNN Manhattan        48,69%   55,63%   48,42%   56,44%   57,82%   52,67%

3rd Scenario: weakest attributes missing (1st selection)
Classifier/Feature   RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes          43,48%   49,25%   46,91%   42,93%   40,05%   45,82%
SMO PolyKernel       44,79%   58,10%   53,77%   57,68%   59,12%   55,69%
SMO RBFKernel        45,75%   55,49%   51,44%   49,04%   57,68%   50,75%
J48                  43,08%   45,33%   47,26%   49,38%   51,51%   46,30%
RandomForest         45,67%   52,74%   49,59%   53,01%   56,79%   49,99%
KNN Euclidean        42,66%   40,47%   35,67%   32,78%   57,33%   46,23%
KNN Manhattan        45,95%   53,63%   39,10%   56,72%   57,95%   50,48%

3rd Scenario: weakest attributes missing (2nd selection)
Classifier/Feature   RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes          42,94%   49,18%   46,02%   43,55%   40,66%   45,88%
SMO PolyKernel       42,18%   54,74%   49,17%   57,61%   59,12%   54,32%
SMO RBFKernel        45,54%   54,18%   50,00%   48,42%   57,68%   50,27%
J48                  42,59%   46,09%   47,12%   48,56%   51,51%   46,16%
RandomForest         44,85%   53,56%   48,83%   53,42%   57,00%   49,17%
KNN Euclidean        13,65%   14,54%   28,88%   24,96%   56,85%   41,29%
KNN Manhattan        39,50%   46,98%   40,67%   55,21%   57,75%   47,80%

1st Scenario: classification using all the attributes, by 10-fold cross-validation, using clean data for training and mixed degraded audio for testing.
3rd Scenario: classification by 10-fold cross-validation, using clean data for training and mixed degraded audio for testing, with the weakest attributes missing in the test files; two levels of selectivity.
(b) Mean differences in the classification of mixed degradations with the weakest attributes missing, compared to the classification of mixed degraded data sets without attribute selection, using the clean audio data set as training set (positive values = improvement, negative values = deterioration; a darker highlighted value means a larger improvement).

Difference = 3rd Scenario - 1st Scenario (1st selection)
Classifier/Feature   RH        RP        MVD       SSD       TSSD     TRH
Naive Bayes          +0,28%    +0,07%    -0,27%    -0,07%    -0,28%   -0,07%
SMO PolyKernel       -6,24%    -1,92%    -2,88%    -0,14%    +0,21%   -0,07%
SMO RBFKernel        +0,07%    -1,30%    -0,07%    -0,76%    -0,07%   +0,34%
J48                  -3,63%     0,00%    +0,55%    +0,75%    +0,07%   -0,07%
RandomForest         -5,62%    -0,75%    +0,48%    -1,23%    -0,41%   -0,28%
KNN Euclidean        -5,62%    -14,95%   -11,38%   -21,82%   -0,07%   -5,01%
KNN Manhattan        -2,74%    -1,99%    -9,32%    +0,28%    +0,14%   -2,20%

Difference = 3rd Scenario - 1st Scenario (2nd selection)
Classifier/Feature   RH        RP        MVD       SSD       TSSD     TRH
Naive Bayes          -0,27%     0,00%    -1,17%    +0,55%    +0,34%    0,00%
SMO PolyKernel       -8,85%    -5,28%    -7,48%    -0,21%    +0,21%   -1,44%
SMO RBFKernel        -0,14%    -2,60%    -1,51%    -1,37%    -0,07%   -0,14%
J48                  -4,12%    +0,75%    +0,42%    -0,07%    +0,07%   -0,20%
RandomForest         -6,45%    +0,07%    -0,27%    -0,82%    -0,21%   -1,10%
KNN Euclidean        -34,64%   -40,87%   -18,17%   -29,63%   -0,55%   -9,95%
KNN Manhattan        -9,19%    -8,64%    -7,75%    -1,24%    -0,07%   -4,87%
Table 5.6: Mixed degraded classification of ISMIR data set, using clean data sets asa training set, and using all the attributes of training but missing weakest attributes of
test sets.
With the ISMIR data set, we cannot match the experiment results to a single one of the outcomes anticipated in Section 5.3.2, because the outcome depends on the classifier used in each case. For the Naive Bayes, SMO RBFKernel, J48 and Random Forest classifiers, we achieved the result anticipated in the second point, where the removed attributes are not useful for the classification process. On the other hand, for the SMO PolyKernel, KNN Euclidean and KNN Manhattan classifiers, we achieved worse results than with the first classification using all
the attributes, so attribute selection using missing values is not recommended with these classifiers.
(a) Mean percentage of correctly classified instances (highlighted values mean an improvement with respect to the mixed degraded classification without attribute selection).

1st Scenario: using all the attributes
Classifier/Feature   RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes          24,60%   34,30%   27,50%   32,60%   37,30%   25,00%
SMO PolyKernel       27,90%   45,30%   35,20%   42,20%   42,70%   29,50%
SMO RBFKernel        19,90%   40,40%   27,40%   33,50%   42,50%   24,50%
J48                  21,00%   24,50%   22,80%   27,40%   29,90%   18,70%
RandomForest         22,50%   28,00%   25,90%   36,40%   37,00%   24,40%
KNN Euclidean        28,60%   36,50%   27,40%   38,20%   34,70%   20,30%
KNN Manhattan        28,30%   36,90%   28,30%   40,90%   41,00%   24,50%

3rd Scenario: weakest attributes missing (1st selection)
Classifier/Feature   RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes          21,50%   34,80%   28,00%   33,10%   37,20%   24,90%
SMO PolyKernel       19,40%   46,70%   32,70%   42,40%   42,80%   29,50%
SMO RBFKernel        16,90%   40,00%   27,30%   33,60%   42,60%   24,30%
J48                  19,90%   23,90%   23,50%   27,90%   29,90%   18,90%
RandomForest         19,40%   27,60%   26,10%   36,50%   37,00%   24,30%
KNN Euclidean        13,20%   21,70%   13,20%   23,20%   35,70%   19,70%
KNN Manhattan        17,10%   34,60%   24,10%   40,70%   41,00%   24,00%

3rd Scenario: weakest attributes missing (2nd selection)
Classifier/Feature   RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes          19,50%   34,40%   25,50%   33,30%   37,00%   24,20%
SMO PolyKernel       16,80%   36,90%   32,50%   42,40%   42,80%   22,00%
SMO RBFKernel        14,70%   35,30%   24,50%   34,10%   42,30%   23,20%
J48                  17,00%   23,90%   24,00%   27,50%   29,80%   16,40%
RandomForest         17,00%   28,60%   25,00%   36,00%   36,50%   22,10%
KNN Euclidean        12,20%   17,90%   11,40%   22,90%   36,20%   17,60%
KNN Manhattan        13,50%   30,80%   15,70%   39,20%   40,80%   21,40%

1st Scenario: classification using all the attributes, by 10-fold cross-validation, using clean data for training and mixed degraded audio for testing.
3rd Scenario: classification by 10-fold cross-validation, using clean data for training and mixed degraded audio for testing, with the weakest attributes missing in the test files; two levels of selectivity.
(b) Mean differences in the classification of mixed degradations with the weakest attributes missing, compared to the classification of mixed degraded data sets without attribute selection, using the clean audio data set as training set (positive values = improvement, negative values = deterioration; a darker highlighted value means a larger improvement).

Difference = 3rd Scenario - 1st Scenario (1st selection)
Classifier/Feature   RH        RP        MVD       SSD       TSSD     TRH
Naive Bayes          -3,10%    +0,50%    +0,50%    +0,50%    -0,10%   -0,10%
SMO PolyKernel       -8,50%    +1,40%    -2,50%    +0,20%    +0,10%    0,00%
SMO RBFKernel        -3,00%    -0,40%    -0,10%    +0,10%    +0,10%   -0,20%
J48                  -1,10%    -0,60%    +0,70%    +0,50%     0,00%   +0,20%
RandomForest         -3,10%    -0,40%    +0,20%    +0,10%     0,00%   -0,10%
KNN Euclidean        -15,40%   -14,80%   -14,20%   -15,00%   +1,00%   -0,60%
KNN Manhattan        -11,20%   -2,30%    -4,20%    -0,20%     0,00%   -0,50%

Difference = 3rd Scenario - 1st Scenario (2nd selection)
Classifier/Feature   RH        RP        MVD       SSD       TSSD     TRH
Naive Bayes          -5,10%    +0,10%    -2,00%    +0,70%    -0,30%   -0,80%
SMO PolyKernel       -11,10%   -8,40%    -2,70%    +0,20%    +0,10%   -7,50%
SMO RBFKernel        -5,20%    -5,10%    -2,90%    +0,60%    -0,20%   -1,30%
J48                  -4,00%    -0,60%    +1,20%    +0,10%    -0,10%   -2,30%
RandomForest         -5,50%    +0,60%    -0,90%    -0,40%    -0,50%   -2,30%
KNN Euclidean        -16,40%   -18,60%   -16,00%   -15,30%   +1,50%   -2,70%
KNN Manhattan        -14,80%   -6,10%    -12,60%   -1,70%    -0,20%   -3,10%
Table 5.7: Mixed degraded classification of GTZAN data set, using clean data setsas a training set, and using all the attributes of training but missing weakest attributes
of test sets.
In Table 5.7 we show the results of the experiment described in this section for the classification of the GTZAN data set. In this case, the results are similar to those achieved for the ISMIR data set: the classification depends on the classifier used rather than on the feature used. KNN Euclidean is still the worst classifier, but with less deterioration than on ISMIR, reaching a maximum deterioration of 18,60% on RP features. Except for the KNN Euclidean classifier, the results for SSD are similar to those achieved in the classification without attribute selection, as happened with the ISMIR data set.
In summary, the second experiment, using all attributes on the training set and missing values on the test set, achieved worse results than the first experiment (Section 5.3.3). However, the results achieved in the first experiment did not improve the classification as we expected, yielding results similar to the classification of mixed degraded audio using all the attributes, on both data sets.
Chapter 6
Summary and Further Work
6.1 Summary
Several studies address musical genre classification systems, but they usually deal with the classification of collections of rather consistent recording quality, using training data sets of the same recording quality as well. On the other hand, we can find situations where the audio tracks that we want to classify come from different sources and are recorded with different qualities. This has a direct impact on genre classification, because different audio recording degradations affect the audio features.
The study reported in this thesis evaluates the impact of degradations produced in controlled environments as well as degradations that we could find in real-world audio recordings coming from several popular sources. We discussed this impact by comparing psychoacoustic features extracted from clean audio tracks and from several degraded versions of the same tracks. Since genre classification relies on these feature sets to perform its work, the classification is affected as well, decreasing the percentage of correctly classified instances.
Our hypothesis was that we could select the features most robust against an aggregation of the most common degradations, and then proceed with a new classification process using only the selected robust features, thus minimizing the negative impact of the degradations on genre classification.
Regarding the impact of degradations on psychoacoustic features, we observed that the effect of degradations is not evenly spread across different feature sets, i.e. some attributes suffer more strongly than others, depending also on the degradation applied. In addition, some feature sets have their weaker attributes in a contiguous range (such as RH or MVD), while for others (such as RP and SSD) the effect is not located in a focused range. For this reason, the best way to select the more robust features is to set a threshold on the attribute differences between clean and degraded audio for each feature set.
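The threshold-based selection just described can be sketched as follows: keep the attributes whose mean absolute difference between clean and degraded feature vectors stays below a chosen threshold. The threshold value and the array shapes here are illustrative, not the ones used in the experiments.

```python
import numpy as np

# Sketch: select robust attributes by thresholding the mean absolute
# clean-vs-degraded difference per attribute.
def robust_attributes(clean, degraded, threshold):
    """clean, degraded: (n_tracks, n_attributes) arrays of feature values.
    Returns the indices of attributes considered robust."""
    diff = np.mean(np.abs(clean - degraded), axis=0)
    return np.where(diff < threshold)[0]

rng = np.random.default_rng(0)
clean = rng.random((100, 6))
degraded = clean.copy()
degraded[:, [1, 4]] += 0.5          # two attributes strongly affected
keep = robust_attributes(clean, degraded, threshold=0.25)
print(keep)                          # the four unaffected attributes
```

In practice the threshold would be chosen per feature set, since the scale of the attribute differences varies between RH, RP, SSD, and the other feature sets.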
Regarding the effect of degradations on classification, we observed that if the training and test sets are both formed by audio degraded by the same kind of degradation, the percentage of correct classification is similar to the percentages achieved when training and testing with the original, i.e. high-quality or clean, audio. In some cases the classification is even slightly better, such as for the Harmony Distortion degradation. However, for Low-pass and High-pass filtering, the classification is significantly worse because several frequency bands are removed.
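The severity of filtering degradations is easy to see in a minimal sketch: once the upper bands are discarded, any feature attribute computed from them carries no information. This numpy-only illustration removes bands in the frequency domain; the sample rate and cutoff are arbitrary example values, not those of the degradation toolbox used in the experiments.

```python
import numpy as np

# Sketch: an idealised low-pass degradation that zeroes all FFT bins
# above a cutoff, discarding the upper frequency bands entirely.
def lowpass_fft(signal, cutoff_hz, fs):
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spectrum[freqs > cutoff_hz] = 0.0      # remove upper bands
    return np.fft.irfft(spectrum, n=len(signal))

fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 3000 * t)        # a 3 kHz component
filtered = lowpass_fft(tone, cutoff_hz=1000, fs=fs)
print(np.abs(filtered).max() < 1e-6)       # -> True: the tone is gone
```

Any attribute derived from the 1-4 kHz bands of this signal would be zeroed after the degradation, which is why filter-based degradations hurt the classification more than additive ones.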
In the classification of mixed degradations using high-quality audio as the training set, the results achieved are worse than those for collections of rather consistent recording quality, as we expected from the beginning. On the other hand, the strategies proposed to mitigate this effect by relying only on the most robust attributes failed, because the results achieved with the proposed models are similar to, or even worse than, those obtained without attribute selection. Thus, classifiers may be trained using high-quality recordings and then used to classify mixed degraded audio collections without taking special precautions regarding attribute selection, at least for the rather broad range of degradations studied.
6.2 Further work
One interesting fact that we observed during our study is that genre classification executed entirely in a degraded environment, i.e. with training set and test set both degraded by the same distortion, yields results similar to those obtained in a clean environment. This can be useful when we intend to classify an audio collection stemming predominantly or purely from one specific degradation, e.g. when all the audio files that we want to classify are mobile phone recordings. In this case, we could define a training set of
audio coming from the same degradation or degraded a clean audio collection, in order
to get a degraded one.
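As an example of degrading a clean collection, a Low-pass filtering degradation (one of the degradations studied) can be simulated with a Butterworth filter; the cutoff and order below are illustrative choices, not the exact settings used in the experiments:

```python
import numpy as np
from scipy.signal import butter, lfilter

def lowpass_degrade(signal, sr, cutoff_hz=3000.0, order=4):
    """Apply a low-pass Butterworth filter, removing the upper
    frequency bands of a clean recording."""
    b, a = butter(order, cutoff_hz / (sr / 2.0), btype="low")
    return lfilter(b, a, signal)

# One second of a clean test signal: a 440 Hz tone plus an 8 kHz tone.
sr = 22050
t = np.arange(sr) / sr
clean = np.sin(2 * np.pi * 440 * t) + np.sin(2 * np.pi * 8000 * t)
degraded = lowpass_degrade(clean, sr)  # the 8 kHz component is attenuated
```

Running every file of a clean collection through such a function yields a consistently degraded training set for the scenario described above.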
In addition, the results review showed an improvement in the correct classification of
degraded audio for some specific degradations. It would be interesting to investigate
how these degradations modify the feature attributes so as to yield a better
classification, and then apply those findings to traditional genre classification systems.
Regarding attribute selection, for each song with a known degradation we could select
only the attributes most robust to that specific degradation, removing the weaker ones
(as in the experiment in Section 5.3.3) or marking their values as missing (as in the
experiment in Section 5.3.4).
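The two strategies can be sketched as follows; `drop_weak` and `mask_weak` are hypothetical helpers, and the masked variant assumes a classifier that can handle missing values, as the Weka classifiers can:

```python
import numpy as np

def drop_weak(features, weak_idx):
    """Remove the weakest attributes entirely (cf. Section 5.3.3)."""
    keep = [i for i in range(features.shape[1]) if i not in set(weak_idx)]
    return features[:, keep]

def mask_weak(features, weak_idx):
    """Mark the weakest attributes as missing (NaN) instead of removing
    them, keeping the vector length fixed (cf. Section 5.3.4)."""
    masked = features.astype(float).copy()
    masked[:, weak_idx] = np.nan
    return masked

feats = np.arange(12.0).reshape(3, 4)  # 3 tracks, 4 attributes
weak = [1, 3]                          # hypothetical weak attributes
reduced = drop_weak(feats, weak)       # only attributes 0 and 2 remain
masked = mask_weak(feats, weak)        # NaN in columns 1 and 3
```

Dropping changes the dimensionality per degradation, while masking keeps one fixed feature layout across the whole collection.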
On the other hand, combining several feature sets, taking into account a different weight
for each attribute per feature set, could improve the classification based on attribute
selection.
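One possible form of such a combination is a weighted concatenation of the feature sets; the weights here are hypothetical and would have to be tuned per attribute or per feature set:

```python
import numpy as np

def combine_feature_sets(feature_sets, weights):
    """Concatenate several feature sets extracted from the same tracks
    (e.g. RH and SSD blocks), scaling each block by its weight."""
    return np.hstack([w * f for w, f in zip(weights, feature_sets)])

rh = np.ones((2, 3))       # toy "RH" block: 2 tracks, 3 attributes
ssd = 2 * np.ones((2, 4))  # toy "SSD" block: 2 tracks, 4 attributes
combined = combine_feature_sets([rh, ssd], weights=[0.5, 1.0])
```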
Finally, another interesting study would be to perform our experiments using traditional
features such as MFCCs or Chroma features, among others, and then analyse the impact
of the different degradations on them and the results achieved with the different
attribute selection experiments.
Appendix A
Worst degradations - Attribute
selection
A.1 Mean differences of ISMIR worst degradations
Mean differences between the ISMIR worst degraded and clean audio, with both attribute
selection levels, for all the features studied are presented in Chapter 5:
• RP features: 5.4
• RH features: 5.5
• SSD features: 5.6
• MVD features: 5.3
• TSSD features: 5.7
• TRH features: 5.8
A.2 Variance differences of ISMIR worst degradations
(a) RP. Tolerant selection level: 28%; Strong selection level: 14%.
(b) RH. Tolerant selection level: 19%; Strong selection level: 13%.
(c) SSD. Tolerant selection level: 82%; Strong selection level: 8%.
(d) MVD. Tolerant selection level: 60%; Strong selection level: 20%.
(e) TSSD. Tolerant selection level: 72,5%; Strong selection level: 7,5%.
(f) TRH. Tolerant selection level: 19%; Strong selection level: 3%.
Figure A.1: Variance differences between ISMIR worst degraded and clean audio with both attribute selection levels of all the features studied. [Plots omitted in this transcript.]
A.3 Mean differences of GTZAN worst degradations
(a) RP. Tolerant selection level: 66%; Strong selection level: 53%.
(b) RH. Tolerant selection level: 41%; Strong selection level: 30%.
(c) SSD. Tolerant selection level: 40%; Strong selection level: 13%.
(d) MVD. Tolerant selection level: 80%; Strong selection level: 32%.
(e) TSSD. Tolerant selection level: 43%; Strong selection level: 10%.
(f) TRH. Tolerant selection level: 37%; Strong selection level: 15%.
Figure A.2: Mean differences between GTZAN worst degraded and clean audio with both attribute selection levels of all the features studied. [Plots omitted in this transcript.]
A.4 Variance differences of GTZAN worst degradations
(a) RP. Tolerant selection level: 37%; Strong selection level: 30%.
(b) RH. Tolerant selection level: 31%; Strong selection level: 25%.
(c) SSD. Tolerant selection level: 41%; Strong selection level: 4%.
(d) MVD. Tolerant selection level: 50%; Strong selection level: 10%.
(e) TSSD. Tolerant selection level: 50%; Strong selection level: 2,5%.
(f) TRH. Tolerant selection level: 35%; Strong selection level: 7%.
Figure A.3: Variance differences between GTZAN worst degraded and clean audio with both attribute selection levels of all the features studied. [Plots omitted in this transcript.]
Appendix B
Classification of mixed degradations
In this Appendix we present all the mean and variance classification results, for both data
sets used in this study, belonging to the classification experiments of Sections 5.3.3 and
5.3.4. The meaning of the different tables and of their highlighted colours is the same for
each group:
• Mean classification results: mean percentage of correctly classified instances over
all the folds, for the setup indicated in the caption of the table. Highlighted values
mark an improvement with respect to mixed degraded classification without attribute
selection.
• Difference mean classification results: mean differences in the percentage of correctly
classified instances over all the folds, compared to the same classification without
attribute selection. Positive values mean improvement, negative values mean
deterioration; a darker highlighted value means a larger improvement.
• Variance classification results: variance of the percentage of correctly classified
instances over all the folds, for the setup indicated in the caption of the table. A
darker highlighted value means a higher variance.
• Difference variance classification results: differences in the variance of the percentage
of correctly classified instances over all the folds, compared to the same classification
without attribute selection. Positive values mean a higher variance in the experiment
with attribute selection, negative values a higher variance in the experiment without
attribute selection.
B.1 Complete classification results of Section 5.3.3
(a) Mean classification results

1st Scenario: using all the attributes.
Classifier/Feature | RH | RP | MVD | SSD | TSSD | TRH
Naive Bayes | 43,21% | 49,18% | 47,19% | 43,00% | 40,32% | 45,89%
SMO PolyKernel | 51,03% | 60,02% | 56,65% | 57,82% | 58,92% | 55,76%
SMO RBFKernel | 45,68% | 56,79% | 51,51% | 49,79% | 57,75% | 50,41%
J48 | 46,71% | 45,34% | 46,71% | 48,63% | 51,44% | 46,37%
RandomForest | 51,30% | 53,50% | 49,11% | 54,25% | 57,20% | 50,27%
KNN Euclidean | 48,28% | 55,41% | 47,05% | 54,60% | 57,40% | 51,24%
KNN Manhattan | 48,69% | 55,63% | 48,42% | 56,44% | 57,82% | 52,67%

2nd Scenario: most robust attributes (1st selection).
Classifier/Feature | RH | RP | MVD | SSD | TSSD | TRH
Naive Bayes | 43,48% | 49,25% | 46,91% | 42,93% | 40,05% | 45,82%
SMO PolyKernel | 48,90% | 60,02% | 56,44% | 58,16% | 58,85% | 55,62%
SMO RBFKernel | 47,26% | 55,69% | 51,65% | 49,38% | 57,75% | 50,62%
J48 | 44,17% | 45,75% | 47,26% | 48,63% | 52,20% | 45,96%
RandomForest | 48,42% | 55,01% | 49,24% | 53,98% | 54,94% | 51,37%
KNN Euclidean | 45,33% | 55,28% | 48,49% | 54,66% | 57,33% | 51,10%
KNN Manhattan | 44,78% | 55,14% | 48,08% | 56,79% | 57,88% | 51,23%

2nd Scenario: most robust attributes (2nd selection).
Classifier/Feature | RH | RP | MVD | SSD | TSSD | TRH
Naive Bayes | 42,94% | 49,18% | 46,02% | 43,55% | 40,66% | 45,88%
SMO PolyKernel | 46,91% | 61,39% | 55,28% | 58,85% | 59,12% | 55,69%
SMO RBFKernel | 45,82% | 55,49% | 49,86% | 49,03% | 57,75% | 50,21%
J48 | 46,09% | 46,29% | 46,91% | 47,12% | 51,86% | 45,20%
RandomForest | 46,56% | 53,16% | 49,18% | 53,43% | 56,65% | 50,00%
KNN Euclidean | 44,37% | 54,46% | 49,31% | 54,87% | 57,33% | 50,21%
KNN Manhattan | 44,71% | 54,11% | 49,45% | 56,17% | 57,54% | 50,27%

(b) Difference mean classification results (Difference = 2nd Scenario - 1st Scenario; positive values mean the classification improves in the 2nd Scenario)

Most robust attributes (1st selection):
Classifier/Feature | RH | RP | MVD | SSD | TSSD | TRH
Naive Bayes | 0,28% | 0,07% | -0,27% | -0,07% | -0,28% | -0,07%
SMO PolyKernel | -2,13% | 0,00% | -0,21% | 0,34% | -0,07% | -0,14%
SMO RBFKernel | 1,58% | -1,10% | 0,14% | -0,41% | 0,00% | 0,21%
J48 | -2,54% | 0,41% | 0,55% | 0,00% | 0,76% | -0,41%
RandomForest | -2,88% | 1,51% | 0,13% | -0,27% | -2,26% | 1,10%
KNN Euclidean | -2,95% | -0,14% | 1,44% | 0,07% | -0,07% | -0,14%
KNN Manhattan | -3,91% | -0,48% | -0,34% | 0,34% | 0,07% | -1,44%

Most robust attributes (2nd selection):
Classifier/Feature | RH | RP | MVD | SSD | TSSD | TRH
Naive Bayes | -0,27% | 0,00% | -1,17% | 0,55% | 0,34% | 0,00%
SMO PolyKernel | -4,12% | 1,37% | -1,37% | 1,03% | 0,20% | -0,07%
SMO RBFKernel | 0,14% | -1,30% | -1,64% | -0,76% | 0,00% | -0,21%
J48 | -0,62% | 0,96% | 0,20% | -1,51% | 0,41% | -1,16%
RandomForest | -4,73% | -0,34% | 0,07% | -0,81% | -0,55% | -0,27%
KNN Euclidean | -3,91% | -0,96% | 2,26% | 0,27% | -0,07% | -1,03%
KNN Manhattan | -3,98% | -1,51% | 1,03% | -0,27% | -0,27% | -2,40%

1st Scenario: classification using all the attributes, by 10-fold cross-validation, using clean data for training and mixed degraded audio for testing.
2nd Scenario: classification by 10-fold cross-validation, using clean data for training and mixed degraded audio for testing, using only the most robust attributes (two levels of selectivity).

Table B.1: ISMIR mean classification results of mixed degraded audio, using clean audio as training set, and using only the most robust attributes with tolerant selection
(a) Variance classification results

1st Scenario: using all the attributes.
Classifier/Feature | RH | RP | MVD | SSD | TSSD | TRH
Naive Bayes | 0,10% | 0,11% | 0,03% | 0,14% | 0,17% | 0,10%
SMO PolyKernel | 0,04% | 0,20% | 0,16% | 0,08% | 0,06% | 0,10%
SMO RBFKernel | 0,03% | 0,07% | 0,08% | 0,06% | 0,08% | 0,11%
J48 | 0,13% | 0,28% | 0,07% | 0,27% | 0,08% | 0,20%
RandomForest | 0,20% | 0,08% | 0,04% | 0,20% | 0,06% | 0,13%
KNN Euclidean | 0,08% | 0,13% | 0,11% | 0,07% | 0,12% | 0,12%
KNN Manhattan | 0,07% | 0,14% | 0,03% | 0,05% | 0,07% | 0,18%

2nd Scenario: most robust attributes (1st selection).
Classifier/Feature | RH | RP | MVD | SSD | TSSD | TRH
Naive Bayes | 0,12% | 0,05% | 0,04% | 0,14% | 0,21% | 0,09%
SMO PolyKernel | 0,05% | 0,15% | 0,14% | 0,07% | 0,07% | 0,10%
SMO RBFKernel | 0,01% | 0,09% | 0,08% | 0,07% | 0,08% | 0,10%
J48 | 0,21% | 0,23% | 0,07% | 0,28% | 0,10% | 0,17%
RandomForest | 0,10% | 0,15% | 0,11% | 0,15% | 0,07% | 0,04%
KNN Euclidean | 0,10% | 0,26% | 0,15% | 0,07% | 0,12% | 0,09%
KNN Manhattan | 0,17% | 0,13% | 0,07% | 0,05% | 0,06% | 0,13%

2nd Scenario: most robust attributes (2nd selection).
Classifier/Feature | RH | RP | MVD | SSD | TSSD | TRH
Naive Bayes | 0,11% | 0,08% | 0,04% | 0,17% | 0,19% | 0,06%
SMO PolyKernel | 0,07% | 0,15% | 0,08% | 0,11% | 0,06% | 0,12%
SMO RBFKernel | 0,01% | 0,06% | 0,04% | 0,11% | 0,09% | 0,10%
J48 | 0,09% | 0,29% | 0,06% | 0,24% | 0,08% | 0,21%
RandomForest | 0,19% | 0,13% | 0,10% | 0,16% | 0,11% | 0,08%
KNN Euclidean | 0,12% | 0,20% | 0,13% | 0,08% | 0,12% | 0,06%
KNN Manhattan | 0,17% | 0,19% | 0,20% | 0,04% | 0,06% | 0,12%

(b) Difference variance classification results (Difference = 2nd Scenario - 1st Scenario; positive values mean a higher variance in the 2nd Scenario)

Most robust attributes (1st selection):
Classifier/Feature | RH | RP | MVD | SSD | TSSD | TRH
Naive Bayes | 0,02% | -0,06% | 0,01% | 0,00% | 0,04% | -0,01%
SMO PolyKernel | 0,01% | -0,06% | -0,02% | 0,00% | 0,00% | 0,00%
SMO RBFKernel | -0,02% | 0,02% | 0,00% | 0,01% | 0,00% | -0,01%
J48 | 0,08% | -0,04% | 0,00% | 0,01% | 0,02% | -0,03%
RandomForest | -0,10% | 0,07% | 0,07% | -0,05% | 0,01% | -0,10%
KNN Euclidean | 0,02% | 0,12% | 0,04% | 0,00% | 0,00% | -0,03%
KNN Manhattan | 0,10% | -0,01% | 0,04% | 0,00% | -0,01% | -0,04%

Most robust attributes (2nd selection):
Classifier/Feature | RH | RP | MVD | SSD | TSSD | TRH
Naive Bayes | 0,00% | -0,03% | 0,01% | 0,04% | 0,02% | -0,04%
SMO PolyKernel | 0,03% | -0,05% | -0,08% | 0,03% | 0,00% | 0,02%
SMO RBFKernel | -0,02% | 0,00% | -0,04% | 0,06% | 0,01% | -0,01%
J48 | -0,04% | 0,01% | -0,01% | -0,03% | 0,00% | 0,01%
RandomForest | -0,01% | 0,05% | 0,07% | -0,04% | 0,04% | -0,06%
KNN Euclidean | 0,04% | 0,06% | 0,03% | 0,01% | 0,00% | -0,06%
KNN Manhattan | 0,10% | 0,05% | 0,17% | -0,01% | -0,01% | -0,06%

1st Scenario: classification using all the attributes, by 10-fold cross-validation, using clean data for training and mixed degraded audio for testing.
2nd Scenario: classification by 10-fold cross-validation, using clean data for training and mixed degraded audio for testing, using only the most robust attributes (two levels of selectivity).

Table B.2: ISMIR variance classification results of mixed degraded audio, using clean audio as training set, and using only the most robust attributes with tolerant selection
• ISMIR mean classification results of mixed degraded audio, using clean audio as the
training set and using only the most robust attributes with strong selection, are
presented in Table 5.4 (Chapter 5).
(a) Variance classification results

1st Scenario: using all the attributes.
Classifier/Feature | RH | RP | MVD | SSD | TSSD | TRH
Naive Bayes | 0,10% | 0,11% | 0,03% | 0,14% | 0,17% | 0,10%
SMO PolyKernel | 0,04% | 0,20% | 0,16% | 0,08% | 0,06% | 0,10%
SMO RBFKernel | 0,03% | 0,07% | 0,08% | 0,06% | 0,08% | 0,11%
J48 | 0,13% | 0,28% | 0,07% | 0,27% | 0,08% | 0,20%
RandomForest | 0,20% | 0,08% | 0,04% | 0,20% | 0,06% | 0,13%
KNN Euclidean | 0,08% | 0,13% | 0,11% | 0,07% | 0,12% | 0,12%
KNN Manhattan | 0,07% | 0,14% | 0,03% | 0,05% | 0,07% | 0,18%

2nd Scenario: most robust attributes (1st selection).
Classifier/Feature | RH | RP | MVD | SSD | TSSD | TRH
Naive Bayes | 0,12% | 0,05% | 0,04% | 0,14% | 0,21% | 0,09%
SMO PolyKernel | 0,05% | 0,15% | 0,14% | 0,07% | 0,07% | 0,10%
SMO RBFKernel | 0,01% | 0,09% | 0,08% | 0,07% | 0,08% | 0,10%
J48 | 0,21% | 0,23% | 0,07% | 0,28% | 0,10% | 0,17%
RandomForest | 0,10% | 0,15% | 0,11% | 0,15% | 0,07% | 0,04%
KNN Euclidean | 0,10% | 0,26% | 0,15% | 0,07% | 0,12% | 0,09%
KNN Manhattan | 0,17% | 0,13% | 0,07% | 0,05% | 0,06% | 0,13%

2nd Scenario: most robust attributes (2nd selection).
Classifier/Feature | RH | RP | MVD | SSD | TSSD | TRH
Naive Bayes | 0,11% | 0,08% | 0,04% | 0,17% | 0,19% | 0,06%
SMO PolyKernel | 0,07% | 0,15% | 0,08% | 0,11% | 0,06% | 0,12%
SMO RBFKernel | 0,01% | 0,06% | 0,04% | 0,11% | 0,09% | 0,10%
J48 | 0,09% | 0,29% | 0,06% | 0,24% | 0,08% | 0,21%
RandomForest | 0,19% | 0,13% | 0,10% | 0,16% | 0,11% | 0,08%
KNN Euclidean | 0,12% | 0,20% | 0,13% | 0,08% | 0,12% | 0,06%
KNN Manhattan | 0,17% | 0,19% | 0,20% | 0,04% | 0,06% | 0,12%

(b) Difference variance classification results (Difference = 2nd Scenario - 1st Scenario; positive values mean a higher variance in the 2nd Scenario)

Most robust attributes (1st selection):
Classifier/Feature | RH | RP | MVD | SSD | TSSD | TRH
Naive Bayes | 0,02% | -0,06% | 0,01% | 0,00% | 0,04% | -0,01%
SMO PolyKernel | 0,01% | -0,06% | -0,02% | 0,00% | 0,00% | 0,00%
SMO RBFKernel | -0,02% | 0,02% | 0,00% | 0,01% | 0,00% | -0,01%
J48 | 0,08% | -0,04% | 0,00% | 0,01% | 0,02% | -0,03%
RandomForest | -0,10% | 0,07% | 0,07% | -0,05% | 0,01% | -0,10%
KNN Euclidean | 0,02% | 0,12% | 0,04% | 0,00% | 0,00% | -0,03%
KNN Manhattan | 0,10% | -0,01% | 0,04% | 0,00% | -0,01% | -0,04%

Most robust attributes (2nd selection):
Classifier/Feature | RH | RP | MVD | SSD | TSSD | TRH
Naive Bayes | 0,00% | -0,03% | 0,01% | 0,04% | 0,02% | -0,04%
SMO PolyKernel | 0,03% | -0,05% | -0,08% | 0,03% | 0,00% | 0,02%
SMO RBFKernel | -0,02% | 0,00% | -0,04% | 0,06% | 0,01% | -0,01%
J48 | -0,04% | 0,01% | -0,01% | -0,03% | 0,00% | 0,01%
RandomForest | -0,01% | 0,05% | 0,07% | -0,04% | 0,04% | -0,06%
KNN Euclidean | 0,04% | 0,06% | 0,03% | 0,01% | 0,00% | -0,06%
KNN Manhattan | 0,10% | 0,05% | 0,17% | -0,01% | -0,01% | -0,06%

1st Scenario: classification using all the attributes, by 10-fold cross-validation, using clean data for training and mixed degraded audio for testing.
2nd Scenario: classification by 10-fold cross-validation, using clean data for training and mixed degraded audio for testing, using only the most robust attributes (two levels of selectivity).

Table B.3: ISMIR variance classification results of mixed degraded audio, using clean audio as training set, and using only the most robust attributes with strong selection
(a) Mean classification results
MEAN%of%GTZAN%CLASSIFICATION%WITH%TRAINING%4444>%1st%Scenario%Vs.%2nd%Scenario1st$Scenario:$Using$all$the$attributesClassifier/Feature RH RP MVD SSD TSSD TRHNaive Bayes 24,60% 34,30% 27,50% 32,60% 37,30% 25,00%SMO PolyKernel 27,90% 45,30% 35,20% 42,20% 42,70% 29,50%SMO RBFKernel 19,90% 40,40% 27,40% 33,50% 42,50% 24,50%J48 21,00% 24,50% 22,80% 27,40% 29,90% 18,70%RandomForest 22,50% 28,00% 25,90% 36,40% 37,00% 24,40%KNN Euclidean 28,60% 36,50% 27,40% 38,20% 34,70% 20,30%KNN Manhattan 28,30% 36,90% 28,30% 40,90% 41,00% 24,50%
Difference%=%2nd%Scen%4%1st%Scen2nd$Scenario:$Most$robust$attributes$(1st$selection)Classifier/Feature RH RP MVD SSD TSSD TRH Classifier/Feature RH RP MVD SSD TSSD TRHNaive Bayes 21,50% 34,80% 28,00% 33,10% 37,20% 24,90% Naive Bayes 43,10% 0,50% 0,50% 0,50% 40,10% 40,10%SMO PolyKernel 23,30% 46,30% 36,60% 42,90% 42,70% 29,80% SMO PolyKernel 44,60% 1,00% 1,40% 0,70% 0,00% 0,30%SMO RBFKernel 16,50% 40,00% 27,50% 33,80% 42,50% 25,30% SMO RBFKernel 43,40% 40,40% 0,10% 0,30% 0,00% 0,80%J48 20,30% 23,50% 22,30% 27,90% 30,10% 18,70% J48 40,70% 41,00% 40,50% 0,50% 0,20% 0,00%RandomForest 21,30% 27,80% 24,90% 37,10% 36,30% 22,80% RandomForest 41,20% 40,20% 41,00% 0,70% 40,70% 41,60%KNN Euclidean 24,70% 36,40% 28,80% 39,00% 34,70% 20,50% KNN Euclidean 43,90% 40,10% 1,40% 0,80% 0,00% 0,20%KNN Manhattan 25,20% 35,80% 27,40% 40,60% 41,20% 24,20% KNN Manhattan 43,10% 41,10% 40,90% 40,30% 0,20% 40,30%
2nd$Scenario:$Most$robust$attributes$(2nd$selection)Classifier/Feature RH RP MVD SSD TSSD TRH Classifier/Feature RH RP MVD SSD TSSD TRHNaive Bayes 19,50% 34,40% 25,50% 33,30% 37,00% 24,20% Naive Bayes 45,10% 0,10% 42,00% 0,70% 40,30% 40,80%SMO PolyKernel 19,70% 44,90% 34,50% 42,70% 42,60% 25,80% SMO PolyKernel 48,20% 40,40% 40,70% 0,50% 40,10% 43,70%SMO RBFKernel 16,90% 37,90% 24,10% 34,00% 42,50% 23,60% SMO RBFKernel 43,00% 42,50% 43,30% 0,50% 0,00% 40,90%J48 19,40% 23,00% 22,20% 26,90% 29,90% 18,20% J48 41,60% 41,50% 40,60% 40,50% 0,00% 40,50%RandomForest 20,80% 27,30% 27,10% 38,10% 34,00% 22,40% RandomForest 41,70% 40,70% 1,20% 1,70% 43,00% 42,00%KNN Euclidean 22,70% 35,50% 29,50% 39,10% 34,80% 19,40% KNN Euclidean 45,90% 41,00% 2,10% 0,90% 0,10% 40,90%KNN Manhattan 23,40% 35,10% 29,50% 40,40% 41,10% 23,00% KNN Manhattan 44,90% 41,80% 1,20% 40,50% 0,10% 41,50%
1st%SCENARIO:%Classification%using%all%the%attributes,%by%10%cross%validation,%using%clean%data%for%training,%and%mixed%degraded%audio%for%testing.
2nd%SCENARIO:%Classification%by%10%cross%validation,%usign%clean%data%for%training,%and%mixed%degraded%audio%for%testing,%using%the%most%robust%attributes%in%both%cases.%Two%levels%of%selectivity.
Positive%%=%improving%of%the%classification%with%2nd%scenario
(b) Difference mean classification results
MEAN%of%GTZAN%CLASSIFICATION%WITH%TRAINING%4444>%1st%Scenario%Vs.%2nd%Scenario1st$Scenario:$Using$all$the$attributesClassifier/Feature RH RP MVD SSD TSSD TRHNaive Bayes 24,60% 34,30% 27,50% 32,60% 37,30% 25,00%SMO PolyKernel 27,90% 45,30% 35,20% 42,20% 42,70% 29,50%SMO RBFKernel 19,90% 40,40% 27,40% 33,50% 42,50% 24,50%J48 21,00% 24,50% 22,80% 27,40% 29,90% 18,70%RandomForest 22,50% 28,00% 25,90% 36,40% 37,00% 24,40%KNN Euclidean 28,60% 36,50% 27,40% 38,20% 34,70% 20,30%KNN Manhattan 28,30% 36,90% 28,30% 40,90% 41,00% 24,50%
Difference = 2nd Scenario - 1st Scenario (1st selection)

Classifier/Feature    RH        RP        MVD       SSD       TSSD      TRH
Naive Bayes           -3,10%     0,50%     0,50%     0,50%    -0,10%    -0,10%
SMO PolyKernel        -4,60%     1,00%     1,40%     0,70%     0,00%     0,30%
SMO RBFKernel         -3,40%    -0,40%     0,10%     0,30%     0,00%     0,80%
J48                   -0,70%    -1,00%    -0,50%     0,50%     0,20%     0,00%
RandomForest          -1,20%    -0,20%    -1,00%     0,70%    -0,70%    -1,60%
KNN Euclidean         -3,90%    -0,10%     1,40%     0,80%     0,00%     0,20%
KNN Manhattan         -3,10%    -1,10%    -0,90%    -0,30%     0,20%    -0,30%

Difference = 2nd Scenario - 1st Scenario (2nd selection)

Classifier/Feature    RH        RP        MVD       SSD       TSSD      TRH
Naive Bayes           -5,10%     0,10%    -2,00%     0,70%    -0,30%    -0,80%
SMO PolyKernel        -8,20%    -0,40%    -0,70%     0,50%    -0,10%    -3,70%
SMO RBFKernel         -3,00%    -2,50%    -3,30%     0,50%     0,00%    -0,90%
J48                   -1,60%    -1,50%    -0,60%    -0,50%     0,00%    -0,50%
RandomForest          -1,70%    -0,70%     1,20%     1,70%    -3,00%    -2,00%
KNN Euclidean         -5,90%    -1,00%     2,10%     0,90%     0,10%    -0,90%
KNN Manhattan         -4,90%    -1,80%     1,20%    -0,50%     0,10%    -1,50%
Table B.4: GTZAN mean classification results of mixed degraded audio, using clean audio as training set, and using only the most robust attributes with tolerant selection
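The "Difference" panels throughout these tables are simply the element-wise gap between each restricted-attribute scenario and the 1st-scenario baseline. A minimal sketch of that computation, reusing only the top-left 2x2 block of Table B.4's mean results as example data:

```python
# Each "Difference" cell is the restricted-scenario accuracy minus the
# 1st-scenario (all attributes) accuracy for the same classifier/feature pair.
# Example values: top-left 2x2 block of Table B.4 (percent accuracy),
# rows = (Naive Bayes, SMO PolyKernel), cols = (RH, RP).
scenario1 = [[24.6, 34.3], [27.9, 45.3]]  # 1st scenario: all attributes
scenario2 = [[21.5, 34.8], [23.3, 46.3]]  # 2nd scenario: most robust attributes

difference = [[round(b - a, 2) for a, b in zip(row1, row2)]
              for row1, row2 in zip(scenario1, scenario2)]
print(difference)  # [[-3.1, 0.5], [-4.6, 1.0]]
```

A positive cell means the attribute selection improved classification of the degraded audio, matching the "Positive = improvement" note under each table.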
Appendix B. Classification of mixed degradations 57
(a) Variance classification results
VARIANCE of GTZAN CLASSIFICATION WITH TRAINING: 1st Scenario vs. 2nd Scenario

1st Scenario: Using all the attributes

Classifier/Feature    RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes           0,04%    0,11%    0,09%    0,21%    0,06%    0,05%
SMO PolyKernel        0,11%    0,20%    0,08%    0,14%    0,28%    0,27%
SMO RBFKernel         0,06%    0,08%    0,12%    0,12%    0,40%    0,13%
J48                   0,22%    0,25%    0,11%    0,19%    0,16%    0,12%
RandomForest          0,13%    0,27%    0,10%    0,17%    0,10%    0,09%
KNN Euclidean         0,11%    0,31%    0,12%    0,18%    0,11%    0,25%
KNN Manhattan         0,04%    0,14%    0,11%    0,18%    0,11%    0,20%
2nd Scenario: Most robust attributes (1st selection)

Classifier/Feature    RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes           0,08%    0,11%    0,10%    0,16%    0,07%    0,05%
SMO PolyKernel        0,11%    0,16%    0,12%    0,11%    0,25%    0,30%
SMO RBFKernel         0,03%    0,13%    0,13%    0,12%    0,37%    0,12%
J48                   0,21%    0,09%    0,11%    0,09%    0,15%    0,12%
RandomForest          0,12%    0,21%    0,13%    0,09%    0,16%    0,12%
KNN Euclidean         0,10%    0,19%    0,24%    0,17%    0,10%    0,23%
KNN Manhattan         0,18%    0,18%    0,17%    0,13%    0,14%    0,21%

2nd Scenario: Most robust attributes (2nd selection)

Classifier/Feature    RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes           0,09%    0,14%    0,05%    0,16%    0,07%    0,05%
SMO PolyKernel        0,04%    0,38%    0,07%    0,06%    0,24%    0,16%
SMO RBFKernel         0,04%    0,05%    0,09%    0,16%    0,40%    0,13%
J48                   0,11%    0,06%    0,08%    0,08%    0,18%    0,03%
RandomForest          0,09%    0,12%    0,09%    0,05%    0,31%    0,03%
KNN Euclidean         0,13%    0,24%    0,08%    0,16%    0,10%    0,14%
KNN Manhattan         0,11%    0,23%    0,06%    0,15%    0,14%    0,11%
1st SCENARIO: Classification using all the attributes, by 10-fold cross-validation, using clean data for training and mixed degraded audio for testing.
2nd SCENARIO: Classification by 10-fold cross-validation, using clean data for training and mixed degraded audio for testing, keeping only the most robust attributes in both cases. Two levels of selectivity.
Positive = lower variance in the 1st scenario.
(b) Difference variance classification results
Difference = 2nd Scenario - 1st Scenario (1st selection)

Classifier/Feature    RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes           0,03%    0,00%    0,00%   -0,05%    0,01%    0,00%
SMO PolyKernel       -0,01%   -0,04%    0,05%   -0,03%   -0,03%    0,03%
SMO RBFKernel        -0,03%    0,05%    0,02%    0,00%   -0,03%    0,00%
J48                  -0,01%   -0,16%    0,00%   -0,10%    0,00%    0,00%
RandomForest         -0,01%   -0,06%    0,03%   -0,08%    0,06%    0,03%
KNN Euclidean         0,00%   -0,12%    0,12%   -0,01%   -0,02%   -0,01%
KNN Manhattan         0,14%    0,04%    0,05%   -0,05%    0,03%    0,01%

Difference = 2nd Scenario - 1st Scenario (2nd selection)

Classifier/Feature    RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes           0,04%    0,04%   -0,04%   -0,05%    0,01%    0,00%
SMO PolyKernel       -0,07%    0,17%    0,00%   -0,07%   -0,04%   -0,11%
SMO RBFKernel        -0,02%   -0,03%   -0,03%    0,04%    0,00%    0,00%
J48                  -0,11%   -0,19%   -0,03%   -0,11%    0,02%   -0,09%
RandomForest         -0,04%   -0,14%   -0,02%   -0,12%    0,21%   -0,07%
KNN Euclidean         0,03%   -0,07%   -0,04%   -0,01%   -0,02%   -0,11%
KNN Manhattan         0,07%    0,09%   -0,06%   -0,03%    0,03%   -0,09%
Table B.5: GTZAN variance classification results of mixed degraded audio, using clean audio as training set, and using only the most robust attributes with tolerant selection
• GTZAN mean classification results of mixed degraded audio, using clean audio as the training set and using only the most robust attributes with strong selection, are presented in Table 5.5 (Chapter 5).
(a) Variance classification results
VARIANCE of GTZAN CLASSIFICATION WITH TRAINING: 1st Scenario vs. 2nd Scenario

1st Scenario: Using all the attributes

Classifier/Feature    RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes           0,04%    0,11%    0,09%    0,21%    0,06%    0,05%
SMO PolyKernel        0,11%    0,20%    0,08%    0,14%    0,28%    0,27%
SMO RBFKernel         0,06%    0,08%    0,12%    0,12%    0,40%    0,13%
J48                   0,22%    0,25%    0,11%    0,19%    0,16%    0,12%
RandomForest          0,13%    0,27%    0,10%    0,17%    0,10%    0,09%
KNN Euclidean         0,11%    0,31%    0,12%    0,18%    0,11%    0,25%
KNN Manhattan         0,04%    0,14%    0,11%    0,18%    0,11%    0,20%
2nd Scenario: Most robust attributes (1st selection)

Classifier/Feature    RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes           0,08%    0,11%    0,10%    0,16%    0,07%    0,05%
SMO PolyKernel        0,11%    0,16%    0,12%    0,11%    0,25%    0,30%
SMO RBFKernel         0,03%    0,13%    0,13%    0,12%    0,37%    0,12%
J48                   0,21%    0,09%    0,11%    0,09%    0,15%    0,12%
RandomForest          0,12%    0,21%    0,13%    0,09%    0,16%    0,12%
KNN Euclidean         0,10%    0,19%    0,24%    0,17%    0,10%    0,23%
KNN Manhattan         0,18%    0,18%    0,17%    0,13%    0,14%    0,21%

2nd Scenario: Most robust attributes (2nd selection)

Classifier/Feature    RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes           0,09%    0,14%    0,05%    0,16%    0,07%    0,05%
SMO PolyKernel        0,04%    0,38%    0,07%    0,06%    0,24%    0,16%
SMO RBFKernel         0,04%    0,05%    0,09%    0,16%    0,40%    0,13%
J48                   0,11%    0,06%    0,08%    0,08%    0,18%    0,03%
RandomForest          0,09%    0,12%    0,09%    0,05%    0,31%    0,03%
KNN Euclidean         0,13%    0,24%    0,08%    0,16%    0,10%    0,14%
KNN Manhattan         0,11%    0,23%    0,06%    0,15%    0,14%    0,11%
1st SCENARIO: Classification using all the attributes, by 10-fold cross-validation, using clean data for training and mixed degraded audio for testing.
2nd SCENARIO: Classification by 10-fold cross-validation, using clean data for training and mixed degraded audio for testing, keeping only the most robust attributes in both cases. Two levels of selectivity.
Positive = lower variance in the 1st scenario.
(b) Difference variance classification results
Difference = 2nd Scenario - 1st Scenario (1st selection)

Classifier/Feature    RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes           0,03%    0,00%    0,00%   -0,05%    0,01%    0,00%
SMO PolyKernel       -0,01%   -0,04%    0,05%   -0,03%   -0,03%    0,03%
SMO RBFKernel        -0,03%    0,05%    0,02%    0,00%   -0,03%    0,00%
J48                  -0,01%   -0,16%    0,00%   -0,10%    0,00%    0,00%
RandomForest         -0,01%   -0,06%    0,03%   -0,08%    0,06%    0,03%
KNN Euclidean         0,00%   -0,12%    0,12%   -0,01%   -0,02%   -0,01%
KNN Manhattan         0,14%    0,04%    0,05%   -0,05%    0,03%    0,01%

Difference = 2nd Scenario - 1st Scenario (2nd selection)

Classifier/Feature    RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes           0,04%    0,04%   -0,04%   -0,05%    0,01%    0,00%
SMO PolyKernel       -0,07%    0,17%    0,00%   -0,07%   -0,04%   -0,11%
SMO RBFKernel        -0,02%   -0,03%   -0,03%    0,04%    0,00%    0,00%
J48                  -0,11%   -0,19%   -0,03%   -0,11%    0,02%   -0,09%
RandomForest         -0,04%   -0,14%   -0,02%   -0,12%    0,21%   -0,07%
KNN Euclidean         0,03%   -0,07%   -0,04%   -0,01%   -0,02%   -0,11%
KNN Manhattan         0,07%    0,09%   -0,06%   -0,03%    0,03%   -0,09%
Table B.6: GTZAN variance classification results of mixed degraded audio, using clean audio as training set, and using only the most robust attributes with strong selection
B.2 Complete classification results of Section 5.3.4
(a) Mean classification results
MEAN of ISMIR CLASSIFICATION WITH TRAINING: 1st Scenario vs. 3rd Scenario

1st Scenario: Using all the attributes

Classifier/Feature    RH        RP        MVD       SSD       TSSD      TRH
Naive Bayes           43,21%    49,18%    47,19%    43,00%    40,32%    45,89%
SMO PolyKernel        51,03%    60,02%    56,65%    57,82%    58,92%    55,76%
SMO RBFKernel         45,68%    56,79%    51,51%    49,79%    57,75%    50,41%
J48                   46,71%    45,34%    46,71%    48,63%    51,44%    46,37%
RandomForest          51,30%    53,50%    49,11%    54,25%    57,20%    50,27%
KNN Euclidean         48,28%    55,41%    47,05%    54,60%    57,40%    51,24%
KNN Manhattan         48,69%    55,63%    48,42%    56,44%    57,82%    52,67%
3rd Scenario: Most robust attributes (1st selection)

Classifier/Feature    RH        RP        MVD       SSD       TSSD      TRH
Naive Bayes           43,48%    49,25%    46,91%    42,93%    40,05%    45,82%
SMO PolyKernel        44,79%    58,10%    53,77%    57,68%    59,12%    55,69%
SMO RBFKernel         45,75%    55,49%    51,44%    49,04%    57,68%    50,75%
J48                   43,08%    45,33%    47,26%    49,38%    51,51%    46,30%
RandomForest          45,67%    52,74%    49,59%    53,01%    56,79%    49,99%
KNN Euclidean         42,66%    40,47%    35,67%    32,78%    57,33%    46,23%
KNN Manhattan         45,95%    53,63%    39,10%    56,72%    57,95%    50,48%

3rd Scenario: Most robust attributes (2nd selection)

Classifier/Feature    RH        RP        MVD       SSD       TSSD      TRH
Naive Bayes           42,94%    49,18%    46,02%    43,55%    40,66%    45,88%
SMO PolyKernel        42,18%    54,74%    49,17%    57,61%    59,12%    54,32%
SMO RBFKernel         45,54%    54,18%    50,00%    48,42%    57,68%    50,27%
J48                   42,59%    46,09%    47,12%    48,56%    51,51%    46,16%
RandomForest          44,85%    53,56%    48,83%    53,42%    57,00%    49,17%
KNN Euclidean         13,65%    14,54%    28,88%    24,96%    56,85%    41,29%
KNN Manhattan         39,50%    46,98%    40,67%    55,21%    57,75%    47,80%
1st SCENARIO: Classification using all the attributes, by 10-fold cross-validation, using clean data for training and mixed degraded audio for testing.
3rd SCENARIO: Classification by 10-fold cross-validation, using clean data for training and mixed degraded audio for testing, with the weakest attributes missing from the test files. Two levels of selectivity.
(b) Difference mean classification results
Difference = 3rd Scenario - 1st Scenario (1st selection)

Classifier/Feature    RH        RP        MVD       SSD       TSSD      TRH
Naive Bayes            0,28%     0,07%    -0,27%    -0,07%    -0,28%    -0,07%
SMO PolyKernel        -6,24%    -1,92%    -2,88%    -0,14%     0,21%    -0,07%
SMO RBFKernel          0,07%    -1,30%    -0,07%    -0,76%    -0,07%     0,34%
J48                   -3,63%     0,00%     0,55%     0,75%     0,07%    -0,07%
RandomForest          -5,62%    -0,75%     0,48%    -1,23%    -0,41%    -0,28%
KNN Euclidean         -5,62%   -14,95%   -11,38%   -21,82%    -0,07%    -5,01%
KNN Manhattan         -2,74%    -1,99%    -9,32%     0,28%     0,14%    -2,20%

Difference = 3rd Scenario - 1st Scenario (2nd selection)

Classifier/Feature    RH        RP        MVD       SSD       TSSD      TRH
Naive Bayes           -0,27%     0,00%    -1,17%     0,55%     0,34%     0,00%
SMO PolyKernel        -8,85%    -5,28%    -7,48%    -0,21%     0,21%    -1,44%
SMO RBFKernel         -0,14%    -2,60%    -1,51%    -1,37%    -0,07%    -0,14%
J48                   -4,12%     0,75%     0,42%    -0,07%     0,07%    -0,20%
RandomForest          -6,45%     0,07%    -0,27%    -0,82%    -0,21%    -1,10%
KNN Euclidean        -34,64%   -40,87%   -18,17%   -29,63%    -0,55%    -9,95%
KNN Manhattan         -9,19%    -8,64%    -7,75%    -1,24%    -0,07%    -4,87%

Positive = improvement of the classification in the 3rd scenario.
Table B.7: ISMIR mean classification results of mixed degraded audio, using clean audio as training set, and missing the weakest attributes with tolerant selection
(a) Variance classification results
VARIANCE of ISMIR CLASSIFICATION WITH TRAINING: 1st Scenario vs. 3rd Scenario

1st Scenario: Using all the attributes

Classifier/Feature    RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes           0,10%    0,11%    0,03%    0,14%    0,17%    0,10%
SMO PolyKernel        0,04%    0,20%    0,16%    0,08%    0,06%    0,10%
SMO RBFKernel         0,03%    0,07%    0,08%    0,06%    0,08%    0,11%
J48                   0,13%    0,28%    0,07%    0,27%    0,08%    0,20%
RandomForest          0,20%    0,08%    0,04%    0,20%    0,06%    0,13%
KNN Euclidean         0,08%    0,13%    0,11%    0,07%    0,12%    0,12%
KNN Manhattan         0,07%    0,14%    0,03%    0,05%    0,07%    0,18%
3rd Scenario: Most robust attributes (1st selection)

Classifier/Feature    RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes           0,12%    0,05%    0,04%    0,14%    0,21%    0,09%
SMO PolyKernel        0,08%    0,08%    0,08%    0,09%    0,07%    0,11%
SMO RBFKernel         0,01%    0,08%    0,06%    0,07%    0,08%    0,10%
J48                   0,13%    0,16%    0,09%    0,17%    0,09%    0,21%
RandomForest          0,10%    0,07%    0,04%    0,14%    0,09%    0,14%
KNN Euclidean         0,05%    0,13%    0,23%    0,10%    0,11%    0,08%
KNN Manhattan         0,05%    0,15%    0,18%    0,04%    0,07%    0,10%

3rd Scenario: Most robust attributes (2nd selection)

Classifier/Feature    RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes           0,11%    0,08%    0,04%    0,17%    0,19%    0,06%
SMO PolyKernel        0,08%    0,10%    0,22%    0,07%    0,06%    0,07%
SMO RBFKernel         0,01%    0,05%    0,01%    0,10%    0,09%    0,07%
J48                   0,10%    0,11%    0,10%    0,10%    0,08%    0,23%
RandomForest          0,09%    0,08%    0,13%    0,13%    0,12%    0,12%
KNN Euclidean         0,18%    0,01%    0,06%    0,10%    0,09%    0,11%
KNN Manhattan         0,04%    0,10%    0,08%    0,11%    0,06%    0,08%
1st SCENARIO: Classification using all the attributes, by 10-fold cross-validation, using clean data for training and mixed degraded audio for testing.
3rd SCENARIO: Classification by 10-fold cross-validation, using clean data for training and mixed degraded audio for testing, with the weakest attributes missing from the test files. Two levels of selectivity.
(b) Difference variance classification results
Difference = 3rd Scenario - 1st Scenario (1st selection)

Classifier/Feature    RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes           0,02%   -0,06%    0,01%    0,00%    0,04%   -0,01%
SMO PolyKernel        0,04%   -0,12%   -0,08%    0,01%    0,01%    0,01%
SMO RBFKernel        -0,02%    0,01%   -0,02%    0,01%    0,00%   -0,01%
J48                   0,00%   -0,11%    0,02%   -0,10%    0,01%    0,01%
RandomForest         -0,09%   -0,01%    0,01%   -0,06%    0,03%    0,01%
KNN Euclidean        -0,04%    0,00%    0,12%    0,04%    0,00%   -0,04%
KNN Manhattan        -0,02%    0,01%    0,15%   -0,01%    0,00%   -0,08%

Difference = 3rd Scenario - 1st Scenario (2nd selection)

Classifier/Feature    RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes           0,00%   -0,03%    0,01%    0,04%    0,02%   -0,04%
SMO PolyKernel        0,04%   -0,11%    0,06%   -0,01%    0,00%   -0,03%
SMO RBFKernel        -0,02%   -0,02%   -0,07%    0,05%    0,00%   -0,04%
J48                  -0,03%   -0,17%    0,03%   -0,17%    0,00%    0,03%
RandomForest         -0,11%   -0,01%    0,09%   -0,07%    0,06%   -0,01%
KNN Euclidean         0,10%   -0,12%   -0,05%    0,03%   -0,03%   -0,01%
KNN Manhattan        -0,03%   -0,04%    0,04%    0,06%   -0,01%   -0,10%

Positive = lower variance in the 1st scenario.
Table B.8: ISMIR variance classification results of mixed degraded audio, using clean audio as training set, and missing the weakest attributes with tolerant selection
• ISMIR mean classification results of mixed degraded audio, using clean audio as the training set and missing the weakest attributes with strong selection, are presented in Table 5.6 (Chapter 5).
(a) Variance classification results
VARIANCE of ISMIR CLASSIFICATION WITH TRAINING: 1st Scenario vs. 3rd Scenario

1st Scenario: Using all the attributes

Classifier/Feature    RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes           0,10%    0,11%    0,03%    0,14%    0,17%    0,10%
SMO PolyKernel        0,04%    0,20%    0,16%    0,08%    0,06%    0,10%
SMO RBFKernel         0,03%    0,07%    0,08%    0,06%    0,08%    0,11%
J48                   0,13%    0,28%    0,07%    0,27%    0,08%    0,20%
RandomForest          0,20%    0,08%    0,04%    0,20%    0,06%    0,13%
KNN Euclidean         0,08%    0,13%    0,11%    0,07%    0,12%    0,12%
KNN Manhattan         0,07%    0,14%    0,03%    0,05%    0,07%    0,18%
3rd Scenario: Most robust attributes (1st selection)

Classifier/Feature    RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes           0,12%    0,05%    0,04%    0,14%    0,21%    0,09%
SMO PolyKernel        0,08%    0,08%    0,08%    0,09%    0,07%    0,11%
SMO RBFKernel         0,01%    0,08%    0,06%    0,07%    0,08%    0,10%
J48                   0,13%    0,16%    0,09%    0,17%    0,09%    0,21%
RandomForest          0,10%    0,07%    0,04%    0,14%    0,09%    0,14%
KNN Euclidean         0,05%    0,13%    0,23%    0,10%    0,11%    0,08%
KNN Manhattan         0,05%    0,15%    0,18%    0,04%    0,07%    0,10%

3rd Scenario: Most robust attributes (2nd selection)

Classifier/Feature    RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes           0,11%    0,08%    0,04%    0,17%    0,19%    0,06%
SMO PolyKernel        0,08%    0,10%    0,22%    0,07%    0,06%    0,07%
SMO RBFKernel         0,01%    0,05%    0,01%    0,10%    0,09%    0,07%
J48                   0,10%    0,11%    0,10%    0,10%    0,08%    0,23%
RandomForest          0,09%    0,08%    0,13%    0,13%    0,12%    0,12%
KNN Euclidean         0,18%    0,01%    0,06%    0,10%    0,09%    0,11%
KNN Manhattan         0,04%    0,10%    0,08%    0,11%    0,06%    0,08%
1st SCENARIO: Classification using all the attributes, by 10-fold cross-validation, using clean data for training and mixed degraded audio for testing.
3rd SCENARIO: Classification by 10-fold cross-validation, using clean data for training and mixed degraded audio for testing, with the weakest attributes missing from the test files. Two levels of selectivity.
(b) Difference variance classification results
Difference = 3rd Scenario - 1st Scenario (1st selection)

Classifier/Feature    RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes           0,02%   -0,06%    0,01%    0,00%    0,04%   -0,01%
SMO PolyKernel        0,04%   -0,12%   -0,08%    0,01%    0,01%    0,01%
SMO RBFKernel        -0,02%    0,01%   -0,02%    0,01%    0,00%   -0,01%
J48                   0,00%   -0,11%    0,02%   -0,10%    0,01%    0,01%
RandomForest         -0,09%   -0,01%    0,01%   -0,06%    0,03%    0,01%
KNN Euclidean        -0,04%    0,00%    0,12%    0,04%    0,00%   -0,04%
KNN Manhattan        -0,02%    0,01%    0,15%   -0,01%    0,00%   -0,08%

Difference = 3rd Scenario - 1st Scenario (2nd selection)

Classifier/Feature    RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes           0,00%   -0,03%    0,01%    0,04%    0,02%   -0,04%
SMO PolyKernel        0,04%   -0,11%    0,06%   -0,01%    0,00%   -0,03%
SMO RBFKernel        -0,02%   -0,02%   -0,07%    0,05%    0,00%   -0,04%
J48                  -0,03%   -0,17%    0,03%   -0,17%    0,00%    0,03%
RandomForest         -0,11%   -0,01%    0,09%   -0,07%    0,06%   -0,01%
KNN Euclidean         0,10%   -0,12%   -0,05%    0,03%   -0,03%   -0,01%
KNN Manhattan        -0,03%   -0,04%    0,04%    0,06%   -0,01%   -0,10%

Positive = lower variance in the 1st scenario.
Table B.9: ISMIR variance classification results of mixed degraded audio, using clean audio as training set, and missing the weakest attributes with strong selection
(a) Mean classification results
MEAN of GTZAN CLASSIFICATION WITH TRAINING: 1st Scenario vs. 3rd Scenario

1st Scenario: Using all the attributes

Classifier/Feature    RH        RP        MVD       SSD       TSSD      TRH
Naive Bayes           24,60%    34,30%    27,50%    32,60%    37,30%    25,00%
SMO PolyKernel        27,90%    45,30%    35,20%    42,20%    42,70%    29,50%
SMO RBFKernel         19,90%    40,40%    27,40%    33,50%    42,50%    24,50%
J48                   21,00%    24,50%    22,80%    27,40%    29,90%    18,70%
RandomForest          22,50%    28,00%    25,90%    36,40%    37,00%    24,40%
KNN Euclidean         28,60%    36,50%    27,40%    38,20%    34,70%    20,30%
KNN Manhattan         28,30%    36,90%    28,30%    40,90%    41,00%    24,50%
3rd Scenario: Most robust attributes (1st selection)

Classifier/Feature    RH        RP        MVD       SSD       TSSD      TRH
Naive Bayes           21,50%    34,80%    28,00%    33,10%    37,20%    24,90%
SMO PolyKernel        19,40%    46,70%    32,70%    42,40%    42,80%    29,50%
SMO RBFKernel         16,90%    40,00%    27,30%    33,60%    42,60%    24,30%
J48                   19,90%    23,90%    23,50%    27,90%    29,90%    18,90%
RandomForest          19,40%    27,60%    26,10%    36,50%    37,00%    24,30%
KNN Euclidean         13,20%    21,70%    13,20%    23,20%    35,70%    19,70%
KNN Manhattan         17,10%    34,60%    24,10%    40,70%    41,00%    24,00%

3rd Scenario: Most robust attributes (2nd selection)

Classifier/Feature    RH        RP        MVD       SSD       TSSD      TRH
Naive Bayes           19,50%    34,40%    25,50%    33,30%    37,00%    24,20%
SMO PolyKernel        16,80%    36,90%    32,50%    42,40%    42,80%    22,00%
SMO RBFKernel         14,70%    35,30%    24,50%    34,10%    42,30%    23,20%
J48                   17,00%    23,90%    24,00%    27,50%    29,80%    16,40%
RandomForest          17,00%    28,60%    25,00%    36,00%    36,50%    22,10%
KNN Euclidean         12,20%    17,90%    11,40%    22,90%    36,20%    17,60%
KNN Manhattan         13,50%    30,80%    15,70%    39,20%    40,80%    21,40%
1st SCENARIO: Classification using all the attributes, by 10-fold cross-validation, using clean data for training and mixed degraded audio for testing.
3rd SCENARIO: Classification by 10-fold cross-validation, using clean data for training and mixed degraded audio for testing, with the weakest attributes set as missing in the test files. Two levels of selectivity.
(b) Difference mean classification results
Difference = 3rd Scenario - 1st Scenario (positive = the classification improves in the 3rd scenario)

1st selection

Classifier/Feature   RH        RP        MVD       SSD       TSSD      TRH
Naive Bayes          -3.10%    0.50%     0.50%     0.50%     -0.10%    -0.10%
SMO PolyKernel       -8.50%    1.40%     -2.50%    0.20%     0.10%     0.00%
SMO RBFKernel        -3.00%    -0.40%    -0.10%    0.10%     0.10%     -0.20%
J48                  -1.10%    -0.60%    0.70%     0.50%     0.00%     0.20%
RandomForest         -3.10%    -0.40%    0.20%     0.10%     0.00%     -0.10%
KNN Euclidean        -15.40%   -14.80%   -14.20%   -15.00%   1.00%     -0.60%
KNN Manhattan        -11.20%   -2.30%    -4.20%    -0.20%    0.00%     -0.50%

2nd selection

Classifier/Feature   RH        RP        MVD       SSD       TSSD      TRH
Naive Bayes          -5.10%    0.10%     -2.00%    0.70%     -0.30%    -0.80%
SMO PolyKernel       -11.10%   -8.40%    -2.70%    0.20%     0.10%     -7.50%
SMO RBFKernel        -5.20%    -5.10%    -2.90%    0.60%     -0.20%    -1.30%
J48                  -4.00%    -0.60%    1.20%     0.10%     -0.10%    -2.30%
RandomForest         -5.50%    0.60%     -0.90%    -0.40%    -0.50%    -2.30%
KNN Euclidean        -16.40%   -18.60%   -16.00%   -15.30%   1.50%     -2.70%
KNN Manhattan        -14.80%   -6.10%    -12.60%   -1.70%    -0.20%    -3.10%
1st SCENARIO: Classification using all the attributes, by 10-fold cross-validation, using clean data for training and mixed degraded audio for testing.
3rd SCENARIO: Classification by 10-fold cross-validation, using clean data for training and mixed degraded audio for testing, with the weakest attributes set as missing in the test files. Two levels of selectivity.
Table B.10: GTZAN mean classification results of mixed degraded audio, using clean audio as the training set and setting the weakest attributes as missing, with tolerant selection
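The difference panels in these tables are obtained by plain element-wise subtraction of the two scenarios' accuracy matrices. The following sketch illustrates that computation in Python (rather than the MATLAB/Weka tooling used in this thesis); the dictionaries hold a small excerpt of Table B.10, re-typed here purely for illustration:

```python
# Element-wise difference between two scenario result matrices
# (accuracies in percent; rows = classifiers, columns = feature sets).
# The values are a small excerpt of Table B.10, re-typed for illustration.

scenario1 = {
    "Naive Bayes":    {"RH": 24.60, "RP": 34.30},
    "SMO PolyKernel": {"RH": 27.90, "RP": 45.30},
}
scenario3 = {
    "Naive Bayes":    {"RH": 21.50, "RP": 34.80},
    "SMO PolyKernel": {"RH": 19.40, "RP": 46.70},
}

def difference(a, b):
    """Return b - a for every classifier/feature cell (positive = b improves)."""
    return {clf: {feat: round(b[clf][feat] - a[clf][feat], 2)
                  for feat in a[clf]}
            for clf in a}

diff = difference(scenario1, scenario3)
print(diff["Naive Bayes"]["RH"])   # -3.1, matching the difference panel
```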
(a) Variance classification results
VARIANCE of GTZAN CLASSIFICATION WITH TRAINING ----> 1st Scenario vs. 3rd Scenario

1st Scenario: Using all the attributes

Classifier/Feature   RH      RP      MVD     SSD     TSSD    TRH
Naive Bayes          0.04%   0.11%   0.09%   0.21%   0.06%   0.05%
SMO PolyKernel       0.11%   0.20%   0.08%   0.14%   0.28%   0.27%
SMO RBFKernel        0.06%   0.08%   0.12%   0.12%   0.40%   0.13%
J48                  0.22%   0.25%   0.11%   0.19%   0.16%   0.12%
RandomForest         0.13%   0.27%   0.10%   0.17%   0.10%   0.09%
KNN Euclidean        0.11%   0.31%   0.12%   0.18%   0.11%   0.25%
KNN Manhattan        0.04%   0.14%   0.11%   0.18%   0.11%   0.20%

3rd Scenario: Most robust attributes (1st selection)

Classifier/Feature   RH      RP      MVD     SSD     TSSD    TRH
Naive Bayes          0.08%   0.11%   0.10%   0.16%   0.07%   0.05%
SMO PolyKernel       0.04%   0.16%   0.05%   0.10%   0.28%   0.26%
SMO RBFKernel        0.03%   0.12%   0.06%   0.14%   0.37%   0.14%
J48                  0.17%   0.18%   0.13%   0.16%   0.17%   0.12%
RandomForest         0.11%   0.13%   0.08%   0.08%   0.11%   0.13%
KNN Euclidean        0.11%   0.14%   0.09%   0.08%   0.11%   0.09%
KNN Manhattan        0.11%   0.29%   0.11%   0.19%   0.14%   0.16%

3rd Scenario: Most robust attributes (2nd selection)

Classifier/Feature   RH      RP      MVD     SSD     TSSD    TRH
Naive Bayes          0.09%   0.14%   0.05%   0.16%   0.07%   0.05%
SMO PolyKernel       0.05%   0.12%   0.14%   0.09%   0.28%   0.18%
SMO RBFKernel        0.08%   0.14%   0.10%   0.13%   0.39%   0.14%
J48                  0.14%   0.41%   0.18%   0.14%   0.18%   0.35%
RandomForest         0.22%   0.13%   0.11%   0.11%   0.13%   0.12%
KNN Euclidean        0.03%   0.05%   0.05%   0.17%   0.09%   0.14%
KNN Manhattan        0.14%   0.05%   0.07%   0.18%   0.13%   0.06%
1st SCENARIO: Classification using all the attributes, by 10-fold cross-validation, using clean data for training and mixed degraded audio for testing.
3rd SCENARIO: Classification by 10-fold cross-validation, using clean data for training and mixed degraded audio for testing, with the weakest attributes set as missing in the test files. Two levels of selectivity.
(b) Difference variance classification results
Difference = 3rd Scenario - 1st Scenario (positive = lower variance in the 1st scenario)

1st selection

Classifier/Feature   RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes          0.03%    0.00%    0.00%    -0.05%   0.01%    0.00%
SMO PolyKernel       -0.07%   -0.04%   -0.02%   -0.03%   0.00%    -0.01%
SMO RBFKernel        -0.04%   0.04%    -0.05%   0.02%    -0.03%   0.01%
J48                  -0.05%   -0.07%   0.02%    -0.03%   0.01%    0.00%
RandomForest         -0.02%   -0.13%   -0.02%   -0.09%   0.01%    0.04%
KNN Euclidean        0.00%    -0.17%   -0.03%   -0.10%   0.00%    -0.15%
KNN Manhattan        0.08%    0.15%    -0.01%   0.01%    0.03%    -0.04%

2nd selection

Classifier/Feature   RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes          0.04%    0.04%    -0.04%   -0.05%   0.01%    0.00%
SMO PolyKernel       -0.07%   -0.08%   0.06%    -0.04%   0.00%    -0.08%
SMO RBFKernel        0.01%    0.07%    -0.02%   0.01%    -0.01%   0.01%
J48                  -0.08%   0.16%    0.07%    -0.06%   0.03%    0.23%
RandomForest         0.09%    -0.13%   0.01%    -0.06%   0.03%    0.03%
KNN Euclidean        -0.08%   -0.26%   -0.07%   -0.01%   -0.02%   -0.11%
KNN Manhattan        0.10%    -0.09%   -0.04%   0.01%    0.02%    -0.14%
1st SCENARIO: Classification using all the attributes, by 10-fold cross-validation, using clean data for training and mixed degraded audio for testing.
3rd SCENARIO: Classification by 10-fold cross-validation, using clean data for training and mixed degraded audio for testing, with the weakest attributes set as missing in the test files. Two levels of selectivity.
Table B.11: GTZAN variance classification results of mixed degraded audio, using clean audio as the training set and setting the weakest attributes as missing, with tolerant selection
• GTZAN mean classification results of mixed degraded audio, using clean audio as the training set and setting the weakest attributes as missing with strong selection, are presented in Table 5.7 (Chapter 5).
(a) Variance classification results
VARIANCE of GTZAN CLASSIFICATION WITH TRAINING ----> 1st Scenario vs. 3rd Scenario

1st Scenario: Using all the attributes

Classifier/Feature   RH      RP      MVD     SSD     TSSD    TRH
Naive Bayes          0.04%   0.11%   0.09%   0.21%   0.06%   0.05%
SMO PolyKernel       0.11%   0.20%   0.08%   0.14%   0.28%   0.27%
SMO RBFKernel        0.06%   0.08%   0.12%   0.12%   0.40%   0.13%
J48                  0.22%   0.25%   0.11%   0.19%   0.16%   0.12%
RandomForest         0.13%   0.27%   0.10%   0.17%   0.10%   0.09%
KNN Euclidean        0.11%   0.31%   0.12%   0.18%   0.11%   0.25%
KNN Manhattan        0.04%   0.14%   0.11%   0.18%   0.11%   0.20%

3rd Scenario: Most robust attributes (1st selection)

Classifier/Feature   RH      RP      MVD     SSD     TSSD    TRH
Naive Bayes          0.08%   0.11%   0.10%   0.16%   0.07%   0.05%
SMO PolyKernel       0.04%   0.16%   0.05%   0.10%   0.28%   0.26%
SMO RBFKernel        0.03%   0.12%   0.06%   0.14%   0.37%   0.14%
J48                  0.17%   0.18%   0.13%   0.16%   0.17%   0.12%
RandomForest         0.11%   0.13%   0.08%   0.08%   0.11%   0.13%
KNN Euclidean        0.11%   0.14%   0.09%   0.08%   0.11%   0.09%
KNN Manhattan        0.11%   0.29%   0.11%   0.19%   0.14%   0.16%

3rd Scenario: Most robust attributes (2nd selection)

Classifier/Feature   RH      RP      MVD     SSD     TSSD    TRH
Naive Bayes          0.09%   0.14%   0.05%   0.16%   0.07%   0.05%
SMO PolyKernel       0.05%   0.12%   0.14%   0.09%   0.28%   0.18%
SMO RBFKernel        0.08%   0.14%   0.10%   0.13%   0.39%   0.14%
J48                  0.14%   0.41%   0.18%   0.14%   0.18%   0.35%
RandomForest         0.22%   0.13%   0.11%   0.11%   0.13%   0.12%
KNN Euclidean        0.03%   0.05%   0.05%   0.17%   0.09%   0.14%
KNN Manhattan        0.14%   0.05%   0.07%   0.18%   0.13%   0.06%
1st SCENARIO: Classification using all the attributes, by 10-fold cross-validation, using clean data for training and mixed degraded audio for testing.
3rd SCENARIO: Classification by 10-fold cross-validation, using clean data for training and mixed degraded audio for testing, with the weakest attributes set as missing in the test files. Two levels of selectivity.
(b) Difference variance classification results
Difference = 3rd Scenario - 1st Scenario (positive = lower variance in the 1st scenario)

1st selection

Classifier/Feature   RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes          0.03%    0.00%    0.00%    -0.05%   0.01%    0.00%
SMO PolyKernel       -0.07%   -0.04%   -0.02%   -0.03%   0.00%    -0.01%
SMO RBFKernel        -0.04%   0.04%    -0.05%   0.02%    -0.03%   0.01%
J48                  -0.05%   -0.07%   0.02%    -0.03%   0.01%    0.00%
RandomForest         -0.02%   -0.13%   -0.02%   -0.09%   0.01%    0.04%
KNN Euclidean        0.00%    -0.17%   -0.03%   -0.10%   0.00%    -0.15%
KNN Manhattan        0.08%    0.15%    -0.01%   0.01%    0.03%    -0.04%

2nd selection

Classifier/Feature   RH       RP       MVD      SSD      TSSD     TRH
Naive Bayes          0.04%    0.04%    -0.04%   -0.05%   0.01%    0.00%
SMO PolyKernel       -0.07%   -0.08%   0.06%    -0.04%   0.00%    -0.08%
SMO RBFKernel        0.01%    0.07%    -0.02%   0.01%    -0.01%   0.01%
J48                  -0.08%   0.16%    0.07%    -0.06%   0.03%    0.23%
RandomForest         0.09%    -0.13%   0.01%    -0.06%   0.03%    0.03%
KNN Euclidean        -0.08%   -0.26%   -0.07%   -0.01%   -0.02%   -0.11%
KNN Manhattan        0.10%    -0.09%   -0.04%   0.01%    0.02%    -0.14%
1st SCENARIO: Classification using all the attributes, by 10-fold cross-validation, using clean data for training and mixed degraded audio for testing.
3rd SCENARIO: Classification by 10-fold cross-validation, using clean data for training and mixed degraded audio for testing, with the weakest attributes set as missing in the test files. Two levels of selectivity.
Table B.12: GTZAN variance classification results of mixed degraded audio, using clean audio as the training set and setting the weakest attributes as missing, with strong selection
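The mean and variance entries in Tables B.10–B.12 summarize the accuracies of the repeated cross-validation runs. A minimal Python sketch of that aggregation (the run accuracies below are hypothetical, and the use of the population variance rather than the sample variance is an assumption):

```python
from statistics import mean, pvariance

# Accuracies (in percent) from repeated classification runs of one
# classifier/feature combination; the values are hypothetical.
runs = [24.2, 25.1, 24.9, 24.6, 24.2]

mean_acc = mean(runs)      # the kind of value reported in the "mean" tables
var_acc = pvariance(runs)  # the kind of value reported in the "variance" tables

print(round(mean_acc, 2), round(var_acc, 2))  # 24.6 0.13
```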
Appendix C
Attached files
The set of attached files contains all the extra plots related to Section 4.1, showing
mean and variance differences between each degradation used in this study and clean
audio. The plots are presented in two different ways:
• Small-scale plots with normalized difference values on the Y-axis (normalized by
the attribute with the highest difference across the whole set of degradations), so
that different degradations of the same files can be compared directly.
• Individual high-resolution plots for each degradation, without joint normalization.
In addition, we attach two files related to the classification of degraded audio studied
in Section 4.2, containing all the tables of mean and variance classification results for
every degradation used in this study.
Bibliography
[1] ISMIR. International society for music information retrieval, 2014. URL http:
//www.ismir.net/. [Online; accessed May-2014].
[2] George Tzanetakis and Perry Cook. Musical genre classification of audio signals.
IEEE Transactions on Speech and Audio Processing, 10(5):293–302, 2002.
[3] Alexander Schindler and Andreas Rauber. Capturing the temporal domain in Echonest
features for improved classification effectiveness. In Adaptive Multimedia Retrieval,
Lecture Notes in Computer Science, Copenhagen, Denmark, October 24-25
2012. Springer.
[4] Michael I. Mandel and Daniel P. W. Ellis. Song-level features and support vector
machines for music classification. In Proceedings of the 6th International Conference
on Music Information Retrieval (ISMIR 2005), pages 594–599, London, UK,
September 2005.
[5] Matthias Mauch and Sebastian Ewert. The audio degradation toolbox and its application
to robustness evaluation. In Proceedings of the 14th International Society
for Music Information Retrieval Conference (ISMIR 2013), Curitiba, Brazil, 2013.
[6] Eric Allamanche, Jürgen Herre, Oliver Hellmuth, Bernhard Fröba, Thorsten Kastner,
and Markus Cremer. Content-based identification of audio material using
MPEG-7 low level description. In ISMIR, 2001.
[7] Enric Guaus and Perfecto Herrera. Music genre categorization in humans and ma-
chines. In Audio Engineering Society Convention 121. Audio Engineering Society,
2006.
[8] Jens Madsen. Modeling of emotions expressed in music using audio features. pages
117–142, 2011.
[9] Lindos electronics. A-weighting in detail. URL http://www.lindos.co.
uk/cgi-bin/FlexiData.cgi?SOURCE=Articles&VIEW=full&id=2. Last visited:
12/06/2014.
[10] Steven Van De Par, Armin Kohlrausch, Ghassan Charestan, and Richard Heusdens.
A new psychoacoustical masking model for audio coding applications. In Acoustics,
Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on,
volume 2, pages II–1805. IEEE, 2002.
[11] Center for Computer Research in Music and Acoustics, Stanford. Dolby
B, C, and S noise reduction systems: Making cassettes sound better.
URL http://www.dolby.com/uploadedFiles/English_(US)/Professional/
Technical_Library/Technologies/Dolby_A-type_NR/212_Dolby_B,_C_and_S_
Noise_Reduction_Systems.pdf. Last visited: 12/06/2014.
[12] MATLAB. version 7.12.0.635 (R2011a). The MathWorks Inc., Natick, Mas-
sachusetts, 2011.
[13] ISMIR 2004, 5th International Conference on Music Information Retrieval,
Barcelona, Spain, October 10-14, 2004, Proceedings, 2004. URL http://ismir2004.
ismir.net/genre_contest/index.html.
[14] D. Giannoulis, E. Benetos, D. Stowell, and M. D. Plumbley. Public dataset for
scene classification task. IEEE AASP Challenge on Detection and Classification of
Acoustic Scenes and Events, 2012.
[15] Rebecca Stewart and Mark B. Sandler. Database of omnidirectional and b-format
room impulse responses. In ICASSP, pages 165–168. IEEE, 2010. ISBN 978-
1-4244-4296-6. URL http://dblp.uni-trier.de/db/conf/icassp/icassp2010.
html#StewartS10.
[16] Thomas Lidy and Andreas Rauber. Evaluation of feature extractors and psycho-
acoustic transformations for music genre classification. In Proceedings of the Sixth
International Conference on Music Information Retrieval, pages 34–41, 2005. ISBN
0-9551179-0-9.
[17] Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann,
and Ian H. Witten. The weka data mining software: an update. SIGKDD Explor.
Newsl., 11(1):10–18, 2009. ISSN 1931-0145. doi: 10.1145/1656274.1656278. URL
http://dx.doi.org/10.1145/1656274.1656278.