
Universidade de São Paulo

2015-12

Unsupervised change detection in data streams: an application in music analysis

Progress in Artificial Intelligence, Heidelberg, v. 4, n. 1, p. 1-10, Dec. 2015
http://www.producao.usp.br/handle/BDPI/50701

Downloaded from: Biblioteca Digital da Produção Intelectual - BDPI, Universidade de São Paulo


Prog Artif Intell (2015) 4:1–10
DOI 10.1007/s13748-015-0063-z

REGULAR PAPER

Unsupervised change detection in data streams: an application in music analysis

Rosane M. M. Vallim¹ · Rodrigo F. de Mello¹

Received: 17 March 2015 / Accepted: 4 November 2015 / Published online: 17 November 2015
© Springer-Verlag Berlin Heidelberg 2015

Abstract The mining of data streams has been attracting much attention in recent years, especially from Machine Learning researchers. One important task in learning from data streams is to correctly detect changes in data characteristics over time, since this is critical for modeling data behavior correctly. Because many applications generate unlabeled streams, different algorithms have been proposed for unsupervised change detection. These algorithms implement different strategies, from simple incremental methods that monitor data statistics to more advanced techniques based on divergences between clustering models. Recent studies, however, pointed out that these algorithms lack learning guarantees, meaning that the results they produce could be due to model parameterization rather than to the data. These observations led to the development of a new stability concept that is suitable for unsupervised streams. This stability concept motivated a new change detection algorithm which ensures that model modifications correspond to actual data changes. Previous results on artificial scenarios have confirmed this algorithm’s ability to correctly detect changes. However, its performance on real-world data remained to be assessed, which is essential to understanding the algorithm’s capabilities. Motivated by this observation, this work applies the algorithm to the domain of audio analysis, more specifically to music change detection. Results obtained on different music tracks provide interesting insights into the types of changes that have a more significant impact on the algorithm’s decisions, allowing for a better understanding of its underlying dynamics.

This work was supported by FAPESP (São Paulo Research Foundation), Brazil, under Grants No. 2014/13323-5, 2013/16480-1 and 2011/51305-0, and by CNPq (The National Council for Scientific and Technological Development), Brazil, under Grants No. 303280/2011-5, 303051/2014-0 and 441583/2014-8. Rosane M. M. Vallim has received a research grant from FAPESP and Rodrigo F. de Mello has received research grants from FAPESP and CNPq. The authors declare that they have no conflict of interest.

Rodrigo F. de Mello (corresponding author) · [email protected]

Rosane M. M. Vallim · [email protected]

¹ Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos, Brazil

Keywords Machine Learning · Data streams · Stable change detection · Music change detection

1 Introduction

Data streams are ordered, open-ended sequences of data produced at high volumes and rates over time [5]. In contrast to the batch-learning scenario [7], which dominated the Machine Learning (ML) research community in past years, the unbounded nature of streams poses an entirely new set of challenges for learning algorithms. An algorithm designed to process a data stream must respect stricter memory and processing-time limits than a batch-learning technique, which makes batch techniques unsuitable for the stream domain.

Several real-world phenomena produce streams of data, such as climate sensing, industrial sensing, Internet traffic monitoring and deforestation analysis [4]. Although much of the effort by the Data Stream Mining (DSM) community has focused on the design of supervised techniques, most real-world applications unfortunately do not provide labeled information. This lack of a priori knowledge when inducing models has prompted a call for unsupervised learning techniques better suited to the data stream domain.


One important aspect in this direction is the development of algorithms capable of unsupervised change detection. Change detection is an important task when learning from streams, because data characteristics can evolve over time: the probability distribution generating the observations in a stream cannot be guaranteed to remain fixed as time elapses. This evolving aspect has motivated several studies on algorithms that detect data changes, allowing for efficient and effective model reinduction in the presence of new data behaviors [1,2,6,8]. However, these algorithms do not provide learning guarantees, so they may detect changes that do not correspond to actual modifications in data characteristics.

In an attempt to overcome this gap, Vallim and Mello [13] relied on previous results by Carlsson and Memoli [3] to design a new stability concept for unsupervised data streams, providing learning guarantees for algorithms in this scenario. The order-invariant stability concept by Carlsson and Memoli [3] considered stability in the sense of model divergences when the input data are subject to ordering perturbations. However, the data independence assumption used by Carlsson and Memoli [3] is not appropriate for the data stream scenario: permuting the observations would change the inherent order of examples in the stream, jeopardizing learning when the input has some level of dependency over time. Therefore, the stability concept proposed by Vallim and Mello [13] assumes that observations in a stream are not necessarily independent and, because of that, that a data stream can be represented as a time series. The mapping to a time series allows a data stream to present different levels of data dependency, from a purely stochastic behavior, e.g., Gaussian noise, to a totally deterministic one, in which the next observation depends only on past ones. This gives a data stream the ability to represent a broader range of behaviors.

The time series representation of a data stream allowed Vallim and Mello [13] to design the surrogate stability concept. This concept considers model divergences when surrogate data, which maintain the frequency and amplitude characteristics of the original data, are presented to the unsupervised learning algorithm. Vallim and Mello [13] also proposed a change detection algorithm that produces models for consecutive data windows, which are then compared to detect data divergences. By following the surrogate stability concept, this algorithm guarantees that relevant model divergences truly correspond to changes in data stream behavior. This algorithm was assessed in synthetic scenarios, with good results. Despite these results, it remained necessary to apply the algorithm to real-world scenarios and to understand its behavior under different kinds of data change. This motivated the present paper, which evaluates the algorithm’s outcomes and inner workings in the domain of audio analysis.

The audio domain, and more specifically the sub-domain of music processing, was chosen as the application scenario in this work. Our focus is on detecting changes that occur over time in music streams. First, since the change detection algorithm by Vallim and Mello [13] uses a frequency-domain representation as its learning model, analyzing audio streams simplifies the identification of which sources of data change have a greater influence on model divergences. Second, audio waves have generally been understood and studied in the frequency domain by several researchers [9,10]. Finally, the public availability of the music streams analyzed in this work makes it easier for other researchers to reproduce our experiments. It is important to note that several other audio scenarios besides music analysis could benefit from this algorithm, such as monitoring and detecting patterns in animal communication (for example, dolphins and whales), delimiting audio for copyright reasons (for example, in audio streaming applications), and detecting alterations to produce noise-canceling mechanisms in airplanes, among others. The approach applied by the change detection algorithm could also be used to create music signatures for content-based retrieval [9,10]. This application niche, however, is not the focus of this work.

We evaluated the change detection algorithm proposed in [13] using three different music tracks. These tracks present different characteristics, which allowed us to analyze the results obtained by the algorithm in different situations. The results suggest which sources of change in these tracks produce the most expressive model alterations, consequently allowing for a better understanding of the underlying dynamics of the studied algorithm. Our findings suggest that the change detection approach used is an interesting alternative for the data stream domain, providing good change estimates with theoretical guarantees of stability and, in addition, a time complexity consistent with the requirements of this scenario.

The rest of this paper is organized as follows. In Sect. 2, we give the reader some background on the techniques available in the literature for change detection in unsupervised data streams. Next, Sect. 3 reviews the stability concept proposed by Vallim and Mello [13], together with the change detection algorithm they proposed that holds the surrogate stability property. Section 4 presents the application of the change detection algorithm to the domain of audio analysis, comprising detailed explanations and discussions of the audio streams considered, the experimental setup and, finally, the results obtained, with a corresponding analysis of why those results were observed. Final conclusions from our studies are presented in Sect. 5, along with directions for future research.


2 Unsupervised change detection in data streams

The change detection problem has motivated different strategies, ranging from statistical algorithms based solely on summary measures such as mean and variance to more advanced techniques using clustering and novelty detection.

Page [8] proposes the Page–Hinkley Test, an incremental change detection approach that monitors the cumulative difference between the observations in the data stream and their mean. This method issues a change when the monitored statistic departs from its running extreme by more than a user-defined threshold value. Adaptive Windowing (ADWIN) [2] employs a similar strategy, relying on the mean to detect changes in data: ADWIN computes the means of time-shifted windows of data, compares them, and issues a change when their difference is greater than a given threshold.
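For concreteness, the following Python sketch implements one common incremental formulation of the Page–Hinkley test for an increase in the mean. The tolerance `delta`, the threshold `lam` and the restart-after-alarm policy are illustrative assumptions, not details taken from [8]; detecting a decrease mirrors the test by tracking the maximum of the cumulative sum instead of its minimum.

```python
def page_hinkley(stream, delta=0.005, lam=50.0):
    """Indices where a Page-Hinkley test flags an increase in the mean.

    Assumed (incremental) formulation: m_T accumulates x_t - mean_t - delta,
    M_T is the minimum of m_t so far, and an alarm is raised when
    m_T - M_T > lam. The test restarts after each alarm.
    """
    alarms = []
    n, mean, cum, cum_min = 0, 0.0, 0.0, 0.0
    for i, x in enumerate(stream):
        n += 1
        mean += (x - mean) / n        # running mean of the current run
        cum += x - mean - delta       # cumulative deviation m_T
        cum_min = min(cum_min, cum)   # smallest cumulative value so far, M_T
        if cum - cum_min > lam:
            alarms.append(i)
            n, mean, cum, cum_min = 0, 0.0, 0.0, 0.0  # restart after a change
    return alarms


stream = [0.0] * 100 + [5.0] * 100    # the mean jumps from 0 to 5 at index 100
print(page_hinkley(stream))           # [110]
```

The alarm arrives a few observations after the jump because the cumulative deviation needs time to exceed `lam`; a smaller threshold reacts faster but is more sensitive to noise.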

The change detection problem has also been addressed by applying incremental clustering algorithms and evaluating model divergences at different time instants, i.e., as new observations are included. The Grow When Required (GWR) neural network [6] issues changes when new neurons are added to the model or when a neuron that has not been used recently is activated, characterizing an unexpected observation or the return to a previous one. Therefore, GWR can detect changes based on a single new observation, for example, an outlier. Another neural-network-based algorithm is the Self-Organizing Novelty Detection Neural Network (SONDE) [1], which uses Shannon’s entropy to quantify the level of novelty introduced into the model after receiving a new observation; the algorithm therefore considers that model divergences correspond to data changes. GWR and SONDE are better suited to detecting outliers than behavior changes, because both algorithms compute models based on the last received observation. Recently, Vallim and Mello [13] proposed M-DBScan, a change detection algorithm based on a density-based clustering method and on two different entropy measures, one temporal and one spatial, to estimate model divergences. By considering both temporal and spatial entropies, M-DBScan can issue changes due to modifications in the order of observations and also due to alterations in the data distribution. The design of M-DBScan considers that a change in a data stream should be seen as a sequence of novel events, which makes it more robust to outliers than previous methods.

As pointed out by Vallim and Mello [13], despite the good results achieved by the aforementioned algorithms, they do not provide learning guarantees, i.e., it is impossible to conclude whether the changes they detect are due to modifications in data characteristics or to influences generated by model parameterization. Aiming to overcome this gap, Vallim and Mello [13] built on the work by Carlsson and Memoli [3] to propose a new stability concept for unsupervised learning algorithms that is suitable for the data stream scenario.

The studies by Carlsson and Memoli [3] considered stability in the sense of model divergences when the order of the input data is modified. However, although the work of Carlsson and Memoli [3] provides relevant learning guarantees for the unsupervised learning scenario, the data independence assumption on which it was formalized does not permit its direct application to the data stream domain. Although data independence could be assumed for some streams (e.g., a purely stochastic source producing observations), in many others time plays an important role in the relationship among observations; removing such relationships therefore completely changes the characteristics of the problem.

While also based on model divergences, the stability concept proposed by Vallim and Mello [13] assumes that observations are not necessarily independent and identically distributed (i.i.d.). Since the observations in a data stream may have different levels of dependency, the authors assume that a stream can be seen as a time series. Their stability concept is therefore based on model divergences when surrogate data [12] are used as input to the learning algorithm. A surrogate is a new time series generated from the original one by changing the phases of its Fourier components while maintaining both the frequencies and the amplitudes of the original series. The so-called surrogate stability states that an algorithm is stable if it produces the same models for both the original and the surrogate series.
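Surrogate generation of this kind can be sketched in a few lines. The Python/NumPy function below is a minimal phase-randomization sketch in the spirit of Theiler’s method [12], not necessarily the exact variant used in [13]: it keeps every Fourier amplitude, draws new uniform random phases, and leaves the zero-frequency coefficient (and, for even lengths, the Nyquist coefficient) unchanged so that the result is a real series with the same mean. The function name and the `seed` parameter are ours.

```python
import numpy as np


def phase_randomized_surrogate(x, seed=None):
    """Phase-randomized surrogate of a real series (sketch after Theiler et al. [12]).

    The amplitude of every Fourier coefficient is kept and its phase is replaced
    by a uniform random angle, so the surrogate has the same amplitude spectrum as x.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = x.size
    coeffs = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=coeffs.size)
    new = np.abs(coeffs) * np.exp(1j * phases)
    new[0] = coeffs[0]            # keep the mean: the zero-frequency term is real
    if n % 2 == 0:
        new[-1] = coeffs[-1]      # the Nyquist term is real for even n
    return np.fft.irfft(new, n=n)


t = np.arange(512)
x = np.sin(2 * np.pi * 8 * t / 512) + 0.3 * np.random.default_rng(0).normal(size=512)
s = phase_randomized_surrogate(x, seed=1)
print(np.allclose(np.abs(np.fft.rfft(s)), np.abs(np.fft.rfft(x))))  # True: same amplitudes
print(np.allclose(s, x))                                            # False: a different series
```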

Any algorithm that meets the surrogate stability condition therefore guarantees that model divergences truly correspond to changing data characteristics. Based on this fact, Vallim and Mello [13] also proposed a new stable change detection algorithm for the unsupervised data stream scenario, which compares consecutive Power Spectrum (PS) graphs obtained by applying the Fourier Transform to the input data. The PS graph is a stable model according to the surrogate stability, because different phases in the input data do not alter the resulting graph.

Based on the good results reported in [13], this work applies the change detection algorithm to the audio domain, more specifically to the task of identifying changing characteristics in music over time. In the next sections, we review the change detection algorithm presented in [13], followed by the experiments conducted and the results obtained, showing which sources of change in the music stream have a greater influence on the algorithm’s ability to detect changes.

3 Stable change detection approach

Given the importance of establishing guarantees for unsupervised learning algorithms, and the difficulties associated with applying the stability concept by Carlsson and Memoli [3] to scenarios containing temporal data dependencies, Vallim and Mello [13] proposed a new stability concept that is suitable for the data stream scenario. To develop the new concept, the authors assumed that data streams can be represented as time series. This assumption allows for the representation of data streams composed only of deterministic behavior (where the dependencies among observations over time define new observations), only of stochastic behavior (where only a component with random characteristics is present) and, finally, mixtures of deterministic and stochastic behaviors.

According to Vallim and Mello [13], for an algorithm to be stable, it needs to generate models that are invariant to alterations in the phase of a time series. As proved by the authors, the PS graph meets this requirement because an original series f(t) and its surrogate s(t), generated by Theiler’s method [12], present the same PS graph. This is because Theiler’s method for surrogate generation only modifies the phases of the Fourier coefficients; therefore, the surrogate series maintains the same frequencies and amplitudes as the original series.
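The phase-invariance argument can be checked numerically. The Python/NumPy sketch below assumes that the PS graph is the magnitude of the discrete Fourier coefficients (squared magnitudes would behave the same way). Instead of Theiler’s random phases it uses a circular shift, which multiplies each coefficient by a unit-modulus phase factor: the series changes, but its PS graph does not.

```python
import numpy as np


def power_spectrum(x):
    """One-sided amplitude spectrum |F(xi)| of a real series (assumed PS graph)."""
    return np.abs(np.fft.rfft(np.asarray(x, dtype=float)))


t = np.arange(1024)
f = np.sin(2 * np.pi * 5 * t / 1024) + 0.5 * np.sin(2 * np.pi * 40 * t / 1024)
g = np.roll(f, 100)   # circular shift: only the phases of the coefficients change

print(np.allclose(power_spectrum(f), power_spectrum(g)))  # True: same PS graph
print(np.allclose(f, g))                                   # False: different series
```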

Making use of the new surrogate stability concept, the authors proposed a change detection algorithm that decomposes portions of a data stream by means of the Fourier transform, generates a PS graph for each corresponding portion, and computes a divergence between these graphs using the Euclidean distance. This divergence between consecutive models is used to detect changes in the input data. Because the algorithm is stable, its detections are not an artifact of model parameterization.

The change detection algorithm, as it appears in Vallim and Mello [13], is reproduced as follows:

Algorithm for change detection in a data stream: construct two different time-shifted windows containing observations of series f(t), respectively W1 and W2, where the timestamps of the observations in W1 are smaller than the timestamps in W2.¹ Denoting the series contained in W1 and W2, respectively, by f(t) and g(t), apply the Fourier Transform to both series, obtaining two PS graphs, denoted respectively as $A_{f(t)}(\xi)$ and $A_{g(t)}(\xi)$, where $\xi$ represents frequencies. Next, calculate the difference between $A_{f(t)}(\xi)$ and $A_{g(t)}(\xi)$, expressed as $D(A_{f(t)}(\xi), A_{g(t)}(\xi))$. The distance $D(\cdot) \in [0, \infty)$ can be obtained by computing the Euclidean distance between $A_{f(t)}(\xi)$ and $A_{g(t)}(\xi)$, as defined in Eq. 1,

$$D\left(A_{f(t)}(\xi), A_{g(t)}(\xi)\right) = \sqrt{\frac{1}{N}\sum_{\xi=1}^{N}\left(A_{f(t)}(\xi) - A_{g(t)}(\xi)\right)^{2}} \qquad (1)$$

1 Windows can also overlap each other.

where $N$ represents the maximum discrete frequency considered.
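Eq. (1) is the Euclidean distance scaled by $1/\sqrt{N}$, i.e., the root-mean-square difference between the two PS graphs. A direct Python transcription, assuming both graphs are given as arrays holding the same $N$ amplitudes (the function name `ps_divergence` is ours):

```python
import numpy as np


def ps_divergence(a_f, a_g):
    """Eq. (1): root-mean-square difference between two PS graphs of equal length N."""
    a_f = np.asarray(a_f, dtype=float)
    a_g = np.asarray(a_g, dtype=float)
    if a_f.shape != a_g.shape:
        raise ValueError("both PS graphs must cover the same N frequencies")
    return float(np.sqrt(np.mean((a_f - a_g) ** 2)))


# Worked example with N = 4: sqrt((3^2 + 0^2 + 4^2 + 0^2) / 4) = sqrt(6.25)
print(ps_divergence([3.0, 0.0, 4.0, 0.0], [0.0, 0.0, 0.0, 0.0]))  # 2.5
```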

If $D(A_{f(t)}(\xi), A_{g(t)}(\xi)) \le \delta$, where $\delta$ is a threshold, then no change is detected in the behavior of f(t). Otherwise, a divergence is detected between W1 and W2, which reflects data instability. Window W1 then receives the observations stored in window W2, W2 receives new observations from f(t), and the process is repeated.

According to Vallim and Mello [13], if no change happens in two consecutive time windows, the two corresponding subseries will present very similar characteristics in terms of frequencies and amplitudes. Because of this similarity, the subseries inside window W2 can be seen as a surrogate of the series inside W1. If a change happens in the input data, its frequencies and amplitudes will be modified accordingly, which corresponds to divergences in the PS graphs of the two consecutive subseries. In this case, the subseries contained in W2 cannot be considered a surrogate of the subseries inside W1.
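The loop described above can be sketched as follows. This is our Python reading of the procedure, not the authors’ R code: the names `detect_changes`, `window` and `delta`, the use of non-overlapping windows, and taking the PS graph as the magnitude of all real-FFT bins (including the zero-frequency bin) are assumptions.

```python
import numpy as np


def detect_changes(stream, window, delta):
    """Sketch of the consecutive-window detector of Sect. 3.

    Non-overlapping windows are slid over the stream; each one is summarized by
    its amplitude spectrum (assumed PS graph) and compared with the previous one
    through Eq. (1). Returns the start index of every W2 whose divergence from
    W1 exceeds delta.
    """
    x = np.asarray(stream, dtype=float)
    changes = []
    prev = None                                              # PS graph of W1
    for start in range(0, x.size - window + 1, window):
        ps = np.abs(np.fft.rfft(x[start:start + window]))   # PS graph of W2
        if prev is not None:
            d = np.sqrt(np.mean((prev - ps) ** 2))           # Eq. (1)
            if d > delta:
                changes.append(start)
        prev = ps                                            # W1 <- W2, then read a new W2
    return changes


# Synthetic stream: a 50 Hz sine that becomes a 120 Hz sine at sample 2000 (1 kHz sampling).
sr = 1000
t = np.arange(4000) / sr
x = np.where(t < 2.0, np.sin(2 * np.pi * 50 * t), np.sin(2 * np.pi * 120 * t))
print(detect_changes(x, window=500, delta=1.0))  # [2000]
```

Only the first window of the new regime is flagged: afterwards W1 and W2 again hold subseries with the same frequencies and amplitudes, which is exactly the surrogate relationship described above.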

4 Application in audio analysis

This section presents the details of applying the change detection algorithm (Sect. 3) to the domain of audio analysis. More specifically, we applied the algorithm to the task of detecting changing characteristics in different music tracks. In the following subsections, we present the audio tracks chosen for analysis, the configuration of the experiments conducted and the corresponding results obtained. For all experiments conducted in this work, the R language for statistical analysis² was used as the implementation tool.

4.1 Audio data set

Three different music tracks were selected for our experiments. These tracks were chosen because they present changes over time that can be easily noticed by humans.

The first one is Mozart’s “Alla Turca”, the last movement of the “Sonata in A Major” for piano. The track used was the one available in the album “The Best of Mozart” from 1997. This track was selected because it is a well-known piece of classical music and also because it is performed by only one instrument, in this case the piano, which makes it easier to associate any changes detected with the information in the score.

The second one is a song called “A day in the life”, by the Beatles. The version used is the one from the album “Sgt. Pepper’s Lonely Hearts Club Band”. Unlike the previous one, this track is a pop song performed by several musicians playing different instruments. It also has three well-defined moments where the audio characteristics change, which are easily perceived by the human ear. Finally, “Chop Suey”, by System of a Down, was the last track used in these experiments. The version used was the one available in the album “Toxicity”. This audio track presents several alterations over time, with a collection of instruments that is more diverse than in the track by the Beatles. Also, System of a Down is a band characterized by lead and backing vocals, which is an observable feature in “Chop Suey”.

2 http://www.r-project.org.

All three tracks were originally recorded with a sampling rate of 44,100 Hz. To conduct our experiments, we converted all three tracks to WAV files.
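A minimal sketch of loading such a file into a numeric series, using Python’s standard `wave` module plus NumPy. How the stereo channels were handled in this work is not stated; averaging them into one mono series, scaling to [-1, 1), and restricting to 16-bit PCM are assumptions of this sketch, and `read_wav_mono` is a hypothetical helper name.

```python
import wave

import numpy as np


def read_wav_mono(path):
    """Load a 16-bit PCM WAV file as a mono float series in [-1, 1).

    Assumption: channels are averaged (down-mixed). Returns (samples, rate).
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_channels = wav.getnchannels()
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    samples = np.frombuffer(raw, dtype="<i2").astype(float) / 32768.0
    samples = samples.reshape(-1, n_channels).mean(axis=1)   # interleaved frames -> mono
    return samples, rate
```

For example, `samples, rate = read_wav_mono("track.wav")` (a hypothetical file name) should report `rate == 44100` for files converted from the tracks described above.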

4.2 Experimental setup

We applied the change detection algorithm presented in Sect. 3 to the three selected music tracks. As described there, the algorithm computes the divergence (the Euclidean distance of Eq. 1) only between successive time-shifted windows of observations of the series. For the analysis conducted here, however, we calculated the divergence between every possible pair of windows, allowing for the identification of cross-correlated subseries, i.e., similar fragments occurring at different times.
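The all-pairs analysis can be sketched by computing one PS graph per window and then Eq. (1) for every pair of windows. This is an illustrative Python sketch under stated assumptions (non-overlapping windows, trailing partial window discarded, PS graph taken as the magnitude of the real FFT), not the R implementation used in the experiments. It fills the matrix one row at a time, because materializing all pairwise differences at once would require gigabytes of memory for 1 s windows of a multi-minute track.

```python
import numpy as np


def divergence_matrix(x, window):
    """Eq. (1) divergence between every pair of non-overlapping windows of x.

    Entry [i, j] compares window i with window j; the matrix is symmetric with
    a zero diagonal.
    """
    x = np.asarray(x, dtype=float)
    n_win = x.size // window                           # trailing partial window dropped
    frames = x[: n_win * window].reshape(n_win, window)
    ps = np.abs(np.fft.rfft(frames, axis=1))           # one PS graph per row
    out = np.empty((n_win, n_win))
    for i in range(n_win):                             # one row at a time bounds memory
        out[i] = np.sqrt(np.mean((ps - ps[i]) ** 2, axis=1))
    return out


sr = 1000
t = np.arange(4000) / sr
x = np.where(t < 2.0, np.sin(2 * np.pi * 50 * t), np.sin(2 * np.pi * 120 * t))
m = divergence_matrix(x, window=500)
print(m.shape)                   # (8, 8)
print((m > 1.0).astype(int))     # low-divergence blocks for each half, high elsewhere
```

Plotted as a heat map, such a matrix gives figures of the kind discussed below: repeated fragments appear as low-divergence blocks away from the diagonal.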

We chose two different configurations for the time window length. The first configuration uses a time window containing 44,100 observations, which corresponds to 1 s of audio, while the second uses a window length of 4410 observations, corresponding to 1/10 of a second of audio.
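The window lengths translate into time and frequency resolution with simple arithmetic: for a window of L samples at 44,100 Hz, window index i starts at i·L/44,100 s, and the DFT bins of each window are 44,100/L Hz apart. The helper names below are ours.

```python
SAMPLE_RATE = 44_100  # Hz, the rate reported for all three tracks


def window_start_seconds(index, window):
    """Start time, in seconds, of window `index` (0-based, non-overlapping windows)."""
    return index * window / SAMPLE_RATE


def bin_spacing_hz(window):
    """Frequency resolution of a window's PS graph: bins are SAMPLE_RATE / window Hz apart."""
    return SAMPLE_RATE / window


for window in (44_100, 4_410):
    print(window, window / SAMPLE_RATE, bin_spacing_hz(window))
# 44100 1.0 1.0   -> 1 s windows, 1 Hz between frequency bins
# 4410 0.1 10.0   -> 1/10 s windows, 10 Hz between frequency bins
print(window_start_seconds(400, 4_410))  # 40.0: decisecond index 400 is second 40
```

The shorter window therefore trades frequency resolution (10 Hz bins instead of 1 Hz) for time resolution.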

4.3 Results and discussion

Our first experiment applied the change detection algorithm to Mozart’s “Alla Turca”, the last movement of the “Sonata in A Major”. This movement, like the whole sonata, has A as its tonic (keynote), and the sonata is therefore considered a homotonal composition. Homotonal is a technical musical term that refers to the tonal structure of multi-movement compositions: a multi-movement composition is considered homotonal if all its movements have the same tonic note. In the particular case of the “Alla Turca”, Mozart alternates between the major and minor modes during the movement. At particular time intervals, the composer also combines two consecutive octaves, a technique that is common in other classical compositions.

Graphical representations of the results obtained are presented in Figs. 1 and 2, which correspond, respectively, to the configuration using a time window of 44,100 observations and the one using 4410 observations. In both graphs, green areas correspond to high divergence, while red areas indicate the minimum divergence observed.

As can be seen in Fig. 1, the algorithm detected four major areas of divergence in the music. These areas occur at the following approximate time intervals: 40–55 s, 1 min 35 s to 1 min 50 s (95–110 s), 2 min 30 s to 2 min 55 s (150–175 s) and, finally, 3 min to 3 min 10 s (180–190 s).

Fig. 1 Divergences observed between every possible pair of windows for Mozart’s “Alla Turca”, using a window of 44,100 observations. Both axes represent the window index. Because the window size used corresponds to exactly 1 s of audio, both axes can also be interpreted as the time in seconds from the beginning of the audio to its end

Fig. 2 Divergences observed between every possible pair of windows for Mozart’s “Alla Turca”, using a window of 4410 observations. Both axes represent the window index. Because the window size used corresponds to 1/10 of a second of audio, both axes can also be interpreted as the time in deciseconds from the beginning of the audio to its end

Listening to the audio, one can easily notice that the first, second and third intervals are repetitions of the same fragment. More specifically, both the second and the third intervals are exact repetitions of the first one. The exception is the fourth interval, which is not an exact repetition but still shares similar characteristics with the previously marked fragments.

Fig. 3 Two different PS graphs: (a) PS graph produced for the first second of audio; (b) PS graph produced for the 41st second of audio

An analysis of the score of the Sonata revealed that all four intervals marked by the algorithm correspond to measures in which the pianist is required to play a sequence of notes spanning two consecutive octaves, while in the rest of the piece Mozart used only one octave. This explains the results observed in Fig. 1. The presence of two octaves results in a PS graph with two sets of dominant frequencies (considering the most expressed frequencies, i.e., the ones presenting the highest amplitude values): one set for the notes played in one octave and another for the notes played in the next octave. On the other hand, fragments played in only one octave produce PS graphs containing only one set of frequencies. Therefore, when these two types of PS graphs are compared, the divergence is higher, resulting in the four highest-divergence areas marked in Fig. 1. Comparisons between fragments of the music where only one octave is played produce a low divergence, which can be more easily observed in the reddish rectangular areas between the four divergence intervals marked in Fig. 1. Figure 3 presents two PS graphs produced using 1 s of audio extracted, respectively, from the beginning of the Sonata and immediately after the first marked change. These PS graphs clearly show distinct sets of frequencies, illustrating the previous discussion.
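The octave argument can be reproduced on synthetic tones. The sketch below is a toy illustration, not an analysis of the recording: it assumes pure sine tones for a hypothetical three-note chord (frequencies near A4, C♯5 and E5, rounded to whole hertz so that each falls on a single 1 Hz bin of a 1 s window) and compares it with the same chord doubled one octave higher. Real piano notes carry harmonics and spectral leakage, which this ignores.

```python
import numpy as np

SR = 44_100
t = np.arange(SR) / SR           # one 1 s window (44,100 samples): bins are 1 Hz apart
NOTES = (440.0, 554.0, 659.0)    # hypothetical chord near A4, C#5, E5, rounded to whole Hz


def ps(x):
    """Assumed PS graph: magnitude of the one-sided DFT."""
    return np.abs(np.fft.rfft(x))


def eq1(a, b):
    """Eq. (1) divergence between two PS graphs."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


def peaks(a, rel=0.5):
    """Bins (equal to Hz here) whose amplitude exceeds rel times the largest amplitude."""
    return np.flatnonzero(a > rel * a.max()).tolist()


one_octave = sum(np.sin(2 * np.pi * f * t) for f in NOTES)
two_octaves = one_octave + sum(np.sin(2 * np.pi * 2 * f * t) for f in NOTES)

print(peaks(ps(one_octave)))     # [440, 554, 659]
print(peaks(ps(two_octaves)))    # [440, 554, 659, 880, 1108, 1318]
print(round(eq1(ps(one_octave), ps(two_octaves))))  # 257
```

The doubled chord shows a second set of peaks at twice each frequency, and this new set is what drives the Eq. (1) divergence between the two PS graphs.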

Also, according to the score of the complete Sonata, the beginning of the first detected interval marks a change in mode (from A minor to A major). This is also observed at the beginning of the third detected interval. For the second detected interval, the change in mode happens at the end of the interval (from A major to A minor). From our results, however, we observed that the change detection algorithm was more influenced by the addition of another octave than by the mode changes. Because changing from A minor to A major (or from A major to A minor) produces only small changes in tonality, the PS graph produced for a music fragment written in A major will not be very different from the PS graph produced for fragments written in A minor. Indeed, the difference between these PS graphs will be characterized by a small shift along the frequency axis. This shift in frequencies produces a smaller divergence than the one observed when a new set of frequencies corresponding to a new octave is included in the PS graph. The same observations can be drawn from Fig. 2, in which a smaller window size is used.

To give the reader a more detailed idea of Mozart’s “Alla Turca”, Fig. 4 shows a snippet of the score for the first part of the piece, while Fig. 5 presents a snippet from the interval starting at 40 s and finishing at 55 s.

We also analyzed the results at a finer level of detail. Figure 6 presents the results obtained only in the interval starting at 130 s and finishing at 150 s; this figure is a portion of Fig. 1, zoomed in on that interval. Figure 6 shows three high-divergence points, at 134, 138 and 148 s. All three marks correspond to a note (or sequence of notes) played with significant or increasing intensity, which can be observed by listening to the audio track or by looking at the score. Increasing the intensity of a note (or sequence of notes) directly impacts the amplitude value at the corresponding frequency, which consequently impacts the divergences obtained using such a PS graph. These observations show that the change detection algorithm is also capable of detecting smaller and much more specific types of changes in audio tracks.
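This toy sketch illustrates only the amplitude part of the argument, assuming a pure synthetic sine rather than a real piano note (whose timbre also changes with dynamics): scaling the waveform by a loudness factor k scales every PS amplitude by k, so the Eq. (1) divergence grows as |k − 1| times the root-mean-square value of the original PS graph.

```python
import numpy as np

SR = 44_100
t = np.arange(SR) / SR
note = np.sin(2 * np.pi * 440 * t)     # hypothetical sustained A4 filling a 1 s window


def ps(x):
    """Assumed PS graph: magnitude of the one-sided DFT."""
    return np.abs(np.fft.rfft(x))


def eq1(a, b):
    """Eq. (1) divergence between two PS graphs."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


base = ps(note)
rms = float(np.sqrt(np.mean(base ** 2)))   # Eq. (1) distance of `base` from an all-zero graph
for k in (1.0, 1.5, 2.0):                  # k = loudness factor applied to the waveform
    print(k, round(eq1(base, ps(k * note)) / rms, 3))
# 1.0 0.0
# 1.5 0.5
# 2.0 1.0
```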

The second song analyzed was “A day in the life” by the Beatles. This song has three well-marked moments with very distinct characteristics, occurring at approximately 2 min (120 s), 3 min (180 s) and 4 min (240 s) of the audio. Figures 7 and 8 present the results obtained by applying the change detection approach.

As can be seen in Fig. 7, the algorithm detected three major moments of divergence in the song. These correspond to the following intervals: 2 min to 2 min 15 s, 3 min 4 s to 3 min 17 s, and 4 min 5 s to 4 min 20 s. As mentioned earlier, the song drastically changes its melody at approximately these times. Based on the score available at http://www.jellynote.com,³ we observed that this song keeps the same tonality from beginning to end.

We observed that, in the marked intervals, the song is played as a sequence of notes with increasing frequency, as if it were traversing a musical scale starting at the tonic (the first note of the scale) and finishing at its last note. This pattern characterizes an increase in frequencies in the PS graph, which explains the high divergences observed for these intervals. After these fragments, the audio returns to its original octave, which is represented by the reddish areas between the three marked intervals in Fig. 7. The same results can be observed when a smaller window size is used (Fig. 8).

3 https://www.jellynote.com/en/sheet-music-tabs/the-beatles/a-day-in-the-life/504a0c49d2235a3ff94a83e6#tabs:%23score_A.

Fig. 4 A snippet of the score of Mozart’s “Alla Turca”, corresponding to part of the interval between 0 and 40 s of audio. Score obtained from The Mutopia Project [11]

Fig. 5 A snippet of the score of Mozart’s “Alla Turca”, corresponding to part of the interval starting at 40 s and finishing at 55 s. Score obtained from The Mutopia Project [11]

Fig. 6 Divergences observed between every possible pair of windows for Mozart’s “Alla Turca”, considering only the interval starting at 130 s and finishing at 150 s (using a window of 44,100 observations). Both axes represent the window index. Because the window size used corresponds to exactly 1 s of audio, both axes can also be interpreted as the time in seconds, representing the audio interval from 130 to 150 s

Fig. 7 Divergences observed between every possible pair of windows for the Beatles’ “A day in the life”, using a window of 44,100 observations. Both axes represent the window index. Because the window size used corresponds to exactly 1 s of audio, both axes can also be interpreted as the time in seconds from the beginning of the audio to its end

Fig. 8 Divergences observed between every possible pair of windows for the Beatles’ “A day in the life”, using a window of 4410 observations. Both axes represent the window index. Because the window size used corresponds to 1/10 of a second of audio, both axes can also be interpreted as the time in deciseconds from the beginning of the audio to its end

The last experiment applied the change detection algorithm to the song “Chop Suey” by System of a Down. “Chop Suey” is a song whose characteristics vary considerably over time. The version analyzed contains a large number of instruments, including guitars, piano, and violins, among others, which are used intermittently. There are also large variations in the lead and backing vocal performances throughout the song. The graphical representation of the results can be observed in Figs. 9 and 10.

According to the sheet music available at http://www.jellynote.com (footnote 4), this song is played in the same scale from beginning to end.

The change detection algorithm marked three main areas of divergence in the song, as can be observed in both Figs. 9 and 10. These correspond to the approximate intervals 1 min 14 s to 1 min 26 s (74–86 s), 1 min 53 s to 2 min 8 s (113–128 s), and 2 min 40 s to 3 min 20 s (160–200 s). Listening to the audio track, we observed that these intervals correspond to distinct variations in the lead and backing vocals. The graph also shows rectangular areas of very low divergence, corresponding to the intervals 44 s to 1 min (44–60 s) and 1 min 22 s to 1 min 40 s (82–100 s). In the audio track, these intervals continuously repeat the same musical structure, which explains the low divergence observed when comparing windows within them.

4 https://www.jellynote.com/en/sheet-music-tabs/system-of-a-down/chop-suey/504a0cfcd2235a3ff94a8b7c#tabs:%23score_C.

Fig. 9 Divergences observed between every possible pair of windows for System of a Down’s “Chop Suey”, using a window of 44,100 observations. Both axes represent the window index. Because the window size used corresponds to exactly 1 s of audio, both axes can also be interpreted as the time in seconds from the beginning of the audio to its end

Fig. 10 Divergences observed between every possible pair of windows for System of a Down’s “Chop Suey”, using a window of 4410 observations. Both axes represent the window index. Because the window size used corresponds to 1/10 of a second of audio, both axes can also be interpreted as the time in deciseconds from the beginning of the audio to its end
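Figs. 6 to 10 plot the divergence between every pair of windows. The following is a minimal sketch of how such a matrix could be computed, not the authors' implementation. It assumes non-overlapping windows, a power spectrum (PS) obtained with NumPy's `rfft` and truncated to the first N = w/2 bins, and the Euclidean distance between PS graphs; the function names `power_spectrum` and `divergence_matrix` and the synthetic A-B-A test signal are hypothetical illustrations.

```python
import numpy as np

def power_spectrum(window: np.ndarray) -> np.ndarray:
    """PS graph of a real-valued window, keeping N = len(window) // 2 bins (assumption)."""
    coeffs = np.fft.rfft(window)
    n = len(window) // 2
    return np.abs(coeffs[:n]) ** 2

def divergence_matrix(signal: np.ndarray, window_size: int) -> np.ndarray:
    """Euclidean distance between the PS graphs of every pair of
    non-overlapping windows (hypothetical helper for illustration)."""
    n_windows = len(signal) // window_size
    spectra = np.stack([
        power_spectrum(signal[i * window_size:(i + 1) * window_size])
        for i in range(n_windows)
    ])
    # Broadcast (W, 1, N) - (1, W, N) to get all pairwise differences at once
    diffs = spectra[:, None, :] - spectra[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1))

# Synthetic A-B-A signal: section A repeats, so A-vs-A windows diverge ~0
fs = 8000                                      # toy sampling rate, not the paper's
t = np.arange(fs) / fs                         # 1 s per section
a = np.sin(2 * np.pi * 440 * t)                # section A: a single tone
b = a + np.sin(2 * np.pi * 880 * t)            # section B: adds the octave above
signal = np.concatenate([a, b, a])
D = divergence_matrix(signal, window_size=800)  # 30 windows of 0.1 s each
```

Plotting `D` as an image shows low-divergence square blocks wherever the same material repeats (windows 0 to 9 against 20 to 29) and high divergence between A and B windows, which is the kind of block structure the text attributes to repetition in “Chop Suey”.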



5 Conclusion

The mining of continuous data streams is a research topic that has attracted interest from many research groups in recent years. This interest stems from the understanding that today’s applications generate unprecedented amounts of data, making traditional mining techniques inappropriate for extracting useful knowledge. Therefore, new techniques capable of addressing the many challenges associated with this scenario need to be developed. Apart from memory use and processing time requirements, which are at the core of any stream learning algorithm, the non-stationary nature of many streams poses yet another difficulty that these algorithms must take into account. Evolving data characteristics directly affect the accuracy of previously learned models which, if not appropriately handled, degrade over time. Correctly detecting if and when data are changing is thus very important in this domain.

Recognizing that many applications generate data streams devoid of label information, researchers have proposed many algorithms for unsupervised change detection in data streams, ranging from statistical methods [2,8] to more advanced techniques derived from the domains of clustering and novelty detection [1,6]. However, as noted by Vallim and Mello [13], there is a lack of theoretical results that give guarantees on the learning efficiency of these methods. In an attempt to close this gap, Vallim and Mello [13] proposed a new stability concept designed for the unsupervised stream domain, providing a new direction in the development of learning algorithms whose results are not artifacts of parameterization. With this new tool, they also developed a stable change detection algorithm based on consecutive comparisons of learned models, and presented a series of synthetic experiments showing how the algorithm works and its ability to correctly detect changing data characteristics.

The results in [13], although promising, are limited to artificial data. This motivated the present work to assess the performance of the change detection algorithm on real-world data streams. Our application domain of choice was audio analysis, more specifically the task of detecting changes in music streams. To do so, we selected three music tracks containing different sources of change, which were then processed by the algorithm. Analysis of the results provided some insights into the types of change that most influenced the algorithm’s change detection strategy. The algorithm was clearly influenced by the addition of new sets of frequencies to the audio tracks. In Mozart’s “Alla Turca”, this addition was sudden, caused by the abrupt inclusion of a new octave at specific moments of the piece. In the Beatles song, on the other hand, the increase in frequencies happened smoothly over a sequence of new notes. Whether the increase in frequencies was sudden or gradual, the algorithm detected both situations. It is important to note that the removal of sets of frequencies leads to similar results.

Changes in amplitude were a second source of influence on the algorithm’s decision to issue a change. This type of change is caused by an increase in the intensity of the notes played by certain instruments or in the intensity of the vocals, and was observed in both the “Alla Turca” and the “Chop Suey” experiments.
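The two sources of change discussed above, added frequencies and increased amplitude, can be illustrated on synthetic tones. This sketch assumes, as before, that a window is represented by its PS graph and compared with the Euclidean distance; the helper names and the 440/880 Hz tones are hypothetical and are not data from the experiments.

```python
import numpy as np

def power_spectrum(window):
    """PS graph keeping the first len(window) // 2 bins (assumption)."""
    coeffs = np.fft.rfft(window)
    return np.abs(coeffs[: len(window) // 2]) ** 2

def ps_distance(x, y):
    """Euclidean distance between two PS graphs (hypothetical helper)."""
    return float(np.linalg.norm(power_spectrum(x) - power_spectrum(y)))

fs = 8000                                        # toy sampling rate
t = np.arange(800) / fs                          # one 0.1 s window
base = np.sin(2 * np.pi * 440 * t)               # reference tone
with_octave = base + np.sin(2 * np.pi * 880 * t)  # new set of frequencies added
louder = 2.0 * base                              # same frequencies, doubled amplitude

d_same = ps_distance(base, base.copy())   # unchanged window
d_octave = ps_distance(base, with_octave)
d_louder = ps_distance(base, louder)
```

Both modifications move the PS graph away from the reference, while an unchanged window gives a distance of zero. Because the PS uses squared magnitudes, doubling the amplitude quadruples the power in the affected bins, so under this representation an amplitude change can produce a larger distance than adding a new tone of the same amplitude.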

In the experiments conducted, smaller changes in tonality were not sufficient to produce substantial divergences. The results for “Alla Turca” did not indicate considerable divergences caused by a mode change (i.e., from A minor to A major, or vice versa), which, we concluded, was because such changes only produced a small shift along the frequency axis of the corresponding PS graph. Therefore, the studied algorithm is not appropriate when the changes embedded in the data are of this kind.

It is important to point out that any change issued by the algorithm depends on the threshold value used. Smaller values make the algorithm less conservative when deciding whether model divergences are significant, whereas greater values might lead the algorithm to miss important changes. Unlike previous change detection algorithms for the stream scenario, however, choosing a smaller threshold does not directly translate into more false positives. Because the change detection algorithm proposed by Vallim and Mello [13] holds the surrogate stability property, lowering the threshold only decreases the degree of divergence between two models that is required to issue a change. This means that the algorithm will consider data changes at a finer level of detail. However, for any threshold value greater than zero, the algorithm issues no change if the input data characteristics remain the same.
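The threshold behavior described above can be illustrated with a deliberately simplified decision rule. This is a hypothetical sketch, not the algorithm of [13]: it assumes each window’s model is its PS graph, compares consecutive windows with the Euclidean distance, and issues a change whenever that distance exceeds the threshold. It does not implement the surrogate-stability machinery; it only shows why identical input yields no change for any positive threshold.

```python
import numpy as np

def power_spectrum(window):
    """PS graph keeping the first len(window) // 2 bins (assumption)."""
    coeffs = np.fft.rfft(window)
    return np.abs(coeffs[: len(window) // 2]) ** 2

def detect_changes(signal, window_size, threshold):
    """Hypothetical rule: report the index of each window whose PS graph
    diverges from the previous window's by more than `threshold`."""
    n_windows = len(signal) // window_size
    changes, prev = [], None
    for i in range(n_windows):
        ps = power_spectrum(signal[i * window_size:(i + 1) * window_size])
        if prev is not None and np.linalg.norm(ps - prev) > threshold:
            changes.append(i)
        prev = ps
    return changes

fs = 8000
t = np.arange(800) / fs
win_a = np.sin(2 * np.pi * 440 * t)
win_b = win_a + np.sin(2 * np.pi * 880 * t)
stationary = np.tile(win_a, 20)                              # unchanged characteristics
piecewise = np.concatenate([np.tile(win_a, 10), np.tile(win_b, 10)])

print(detect_changes(stationary, 800, threshold=1e-9))  # → []
print(detect_changes(piecewise, 800, threshold=1.0))    # → [10]
```

Even a tiny threshold issues nothing on the stationary signal, because identical windows have a divergence of exactly zero; raising the threshold above the divergence between the two regimes (here, around 1.6 × 10⁵) would instead suppress the real change at window 10.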

The window size influences the algorithm’s delay in detecting data changes: shorter windows lead to faster detection, while longer ones increase the detection delay.

Parameter N, the maximum discrete frequency considered when calculating the distance between two PS graphs, is defined as half the window size. This is because the discrete Fourier transform of a real-valued signal produces conjugate-symmetric complex coefficients, so only half of them carry independent information.
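The symmetry argument can be checked numerically. The sketch below assumes NumPy and a random real-valued window of 4410 observations (made-up data). It verifies that X[k] equals the complex conjugate of X[w − k], keeps N = w/2 bins for the PS graph, and converts the two window sizes used in the experiments into durations, assuming a 44,100 Hz sampling rate, which is consistent with the figure captions (44,100 observations correspond to 1 s of audio).

```python
import numpy as np

rng = np.random.default_rng(0)
w = 4410                        # window size in observations
x = rng.standard_normal(w)      # any real-valued window
X = np.fft.fft(x)

# Conjugate symmetry of a real signal's DFT: X[k] == conj(X[w - k]) for k = 1..w-1
k = np.arange(1, w)
assert np.allclose(X[k], np.conj(X[w - k]))

N = w // 2                      # maximum discrete frequency kept, as in the text
ps = np.abs(X[:N]) ** 2         # PS graph over bins 0..N-1

fs = 44_100                     # assumed sampling rate (Hz)
for size in (44_100, 4_410):
    print(size, "observations ->", size / fs, "s per window")
```

The loop prints 1.0 s and 0.1 s per window, which is why the 4410-observation window reacts with a finer time granularity, in line with the remark above about window size and detection delay.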

The experiments and analysis conducted in this paper were intended to better understand the scenarios to which the algorithm’s change detection capabilities are best suited. The insights obtained from the experiments with audio data will guide future developments to extend the algorithm proposed in [13], making it more robust to different data changing scenarios as well as to different real-world problems. Such extensions include experimenting with strategies for comparing PS graphs other than the Euclidean distance used by Vallim and Mello [13]. One possibility would be, for example, Dynamic Time Warping (DTW), which would make the algorithm robust to shifts in frequency as well as re-scaling in amplitudes. When comparing PS graphs, even in the presence of frequency shifts or amplitude re-scaling, the algorithm would then consider that both PS graphs present the same behavior and would not issue changes, which can be interesting in applications such as cover song identification (covers might be performed after some tonal transposition, i.e., changes in harmony). As future work, we also intend to compare this algorithm, and possible extensions using different divergence measures, with other change detection algorithms from the literature that are not based on any stability theory, to demonstrate that the results obtained by those methods are affected by changing model parameters, while the algorithm studied in this paper remains robust. A natural follow-up would focus on modifying the current algorithm to make it applicable to the multi-dimensional data stream domain.
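To illustrate why DTW is proposed as an alternative to the Euclidean distance, the sketch below uses a textbook DTW with an absolute-difference local cost on two toy PS graphs (made-up values, not data from the experiments): one peak, and the same peak shifted by two frequency bins. It only covers the frequency-shift case; amplitude re-scaling is not illustrated here, and the helper name `dtw_distance` is hypothetical.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping with absolute-difference cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

# Toy PS graphs: one spectral peak, then the same peak two bins higher
ps_a = np.array([0, 0, 1, 5, 1, 0, 0, 0, 0, 0], dtype=float)
ps_b = np.roll(ps_a, 2)   # [0, 0, 0, 0, 1, 5, 1, 0, 0, 0]

euclidean = np.linalg.norm(ps_a - ps_b)  # sqrt(52), about 7.21: the shift looks like a change
dtw = dtw_distance(ps_a, ps_b)           # 0.0: warping realigns the shifted peak
```

Under the Euclidean distance the shifted peak registers as a substantial divergence, whereas DTW aligns the two peaks and reports no divergence, which is the behavior the text anticipates for applications such as cover song identification.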

References

1. Albertini, M.K., Mello, R.F.: A Self-Organizing Neural Network to Approach Novelty Detection. In: Intelligent Systems for Automated Learning and Adaptation: Emerging Trends and Applications, pp. 49–71. IGI Global (2010)

2. Bifet, A., Gavaldà, R.: Learning from Time-Changing Data with Adaptive Windowing. In: SIAM International Conference on Data Mining, Minneapolis, Minnesota, USA, pp. 443–448 (2007)

3. Carlsson, G., Memoli, F.: Characterization, stability and convergence of hierarchical clustering methods. J. Mach. Learn. Res. 11, 1425–1470 (2010)

4. Ceccherini, G., Gobron, N., Migliavacca, M.: European vegetation dynamics from remote sensing: phenological timing and phenoregion mapping. IEEE Trans. Geosci. Remote Sens. (2014)

5. Gama, J., Rodrigues, P.P.: Data stream processing. In: Learning from Data Streams: Processing Techniques in Sensor Networks, pp. 25–38. Springer (2007)

6. Marsland, S., Shapiro, J., Nehmzow, U.: A self-organising network that grows when required. Neural Netw. 15(8–9), 1041–1058 (2002)

7. Mitchell, T.M.: Machine Learning, 1st edn. McGraw-Hill Inc., New York (1997)

8. Page, E.S.: Continuous inspection schemes. Biometrika 41, 100–115 (1954)

9. Salamon, J., Gómez, E., Ellis, D.P.W., Richard, G.: Melody extraction from polyphonic music signals: approaches, applications, and challenges. IEEE Signal Process. Mag. 31(2), 118–134 (2014)

10. Serrà, J., Serra, X., Andrzejak, R.G.: Cross recurrence quantification for cover song identification. New J. Phys. 11, 093017 (2009). doi:10.1088/1367-2630/11/9/093017

11. The Mutopia Project: Rondo alla turca. http://www.mutopiaproject.org/ftp/MozartWA/KV331/KV331_3_RondoAllaTurca/KV331_3_RondoAllaTurca-a4 (2014)

12. Theiler, J., Eubank, S., Longtin, A., Galdrikian, B., Farmer, J.D.: Testing for nonlinearity in time series: the method of surrogate data. Physica D: Nonlinear Phenom. 58, 77–94 (1992)

13. Vallim, R.M.M., Mello, R.F.: Proposal of a new stability concept to detect changes in unsupervised data streams. Expert Syst. Appl. 41(16), 7350–7360 (2014)
