
In Search of the Horowitz Factor

Gerhard Widmer, Simon Dixon, Werner Goebl, Elias Pampalk, and Asmir Tobudic

Copyright © 2003, American Association for Artificial Intelligence. All rights reserved.

The article introduces the reader to a large interdisciplinary research project whose goal is to use AI to gain new insight into a complex artistic phenomenon. We study fundamental principles of expressive music performance by measuring performance aspects in large numbers of recordings by highly skilled musicians (concert pianists) and analyzing the data with state-of-the-art methods from areas such as machine learning, data mining, and data visualization. The article first introduces the general research questions that guide the project and then summarizes some of the most important results achieved to date, with an emphasis on the most recent and still rather speculative work. A broad view of the discovery process is given, from data acquisition through data visualization to inductive model building and pattern discovery, and it turns out that AI plays an important role in all stages of such an ambitious enterprise. Our current results show that it is possible for machines to make novel and interesting discoveries even in a domain such as music and that even if we might never find the “Horowitz Factor,” AI can give us completely new insights into complex artistic behavior.

The potential of AI, particularly machine learning and automated discovery, for making substantial discoveries in various branches of science has been convincingly demonstrated in recent years, mainly in the natural sciences ([bio]chemistry, genetics, physics, and so on) (Hunter 1993; King et al. 1992; Muggleton, King, and Sternberg 1992; Shavlik, Towell, and Noordewier 1992; Valdés-Pérez 1999, 1996, 1995). However, can AI also facilitate substantial discoveries in less easily quantifiable domains such as the arts?

In this article, we want to demonstrate that it can. We report on the latest results of a long-term interdisciplinary research project that uses AI technology to investigate one of the most fascinating—and at the same time highly elusive—phenomena in music: expressive music performance.1 We study how skilled musicians (concert pianists, in particular) make music come alive, how they express and communicate their understanding of the musical and emotional content of the pieces by shaping various parameters such as tempo, timing, dynamics, and articulation.

Our starting point is recordings of musical pieces by actual pianists. These recordings are analyzed with intelligent data analysis methods from the areas of machine learning, data mining, and pattern recognition with the aim of building interpretable quantitative models of certain aspects of performance. In this way, we hope to obtain new insight into how expressive performance works and what musicians do to make music sound like music to us. The motivation is twofold: (1) by discovering and formalizing significant patterns and regularities in the artists’ musical behavior, we hope to make new contributions to the field of musicology, and (2) by developing new data visualization and analysis methods, we hope to extend the frontiers of the field of AI-based scientific discovery.

In this article, we take the reader on a grand tour of this complex discovery enterprise, from the intricacies of data gathering—which already require new AI methods—through novel approaches to data visualization all the way to automated data analysis and inductive learning.


We show that even a seemingly intangible phenomenon such as musical expression can be transformed into something that can be studied formally and that the computer can indeed discover some fundamental (and sometimes surprising) principles underlying the art of music performance. It turns out that AI plays an important role in each step of this complex, multistage discovery project.

The title of the article refers to the late Vladimir Horowitz (1903–1989), Russian pianist, legendary virtuoso, and one of the most famous and popular pianists of the twentieth century, who symbolizes, like few others, the fascination that great performers hold for the general audience.

Formally explaining the secret behind the art and magic of such a great master would be an extremely exciting feat. Needless to say, it is not very likely, and no “Horowitz Factor” will be revealed in this article. Still, we do hope the following description of the project and its results will capture the reader’s imagination.

Expressive Music Performance

Expressive music performance is the art of shaping a musical piece by continuously varying important parameters such as tempo and dynamics. Human musicians do not play a piece of music mechanically, with constant tempo or loudness, exactly as written in the printed music score. Rather, they speed up at some places, slow down at others, stress certain notes or passages by various means, and so on. The most important parameter dimensions available to a performer (a pianist, in particular) are timing and continuous tempo changes, dynamics (loudness variations), and articulation (the way successive notes are connected). The precise parameter changes are not specified in the written score, but at the same time, they are absolutely essential for the music to be effective and engaging. The expressive nuances added by an artist are what makes a piece of music come alive (and what makes some performers famous).

Expressive variation is more than just a deviation from, or a distortion of, the original (notated) piece of music. In fact, the opposite is the case: The notated music score is but a small part of the actual music. Not every intended nuance can be captured in a limited formalism such as common music notation, and the composers were and are well aware of this. The performing artist is an indispensable part of the system, and expressive music performance plays a central role in our musical culture. That is what makes it a central object of study in the field of musicology (see Gabrielsson [1999] for an excellent overview of pertinent research in the field).

Our approach to studying this phenomenon is data driven: We collect recordings of performances of pieces by skilled musicians;2 measure aspects of expressive variation (for example, the detailed tempo and loudness changes applied by the musicians); and search for patterns in these tempo, dynamics, and articulation data. The goal is to find interpretable models that characterize and explain consistent regularities and patterns, if such should indeed exist. This requires methods and algorithms from machine learning, data mining, and pattern recognition as well as novel methods of intelligent music processing.

Our research is meant to complement recent work in contemporary musicology that has largely been hypothesis driven (for example, Friberg [1995]; Sundberg [1993]; Todd [1992, 1989]; Windsor and Clarke [1997]), although some researchers have also taken real data as the starting point of their investigations (for example, Palmer [1988]; Repp [1999, 1998, 1992]). In the latter kind of research, statistical methods were generally used to verify hypotheses in the data. We give the computer a more autonomous role in the discovery process by using machine learning and related techniques.

Using machine learning in the context of expressive music performance is not new. For example, there have been experiments with case-based learning for generating expressive phrasing in jazz ballads (Arcos and López de Mántaras 2001; López de Mántaras and Arcos 2002). The goal of that work was somewhat different from ours; the target was to produce phrases of good musical quality, so the system makes use of musical background knowledge wherever possible. In our context, musical background knowledge should be introduced with care because it can introduce biases in the data analysis process. In our own previous research (Widmer 1998, 1995), we (re-)discovered a number of basic piano performance rules with inductive learning algorithms. However, these attempts were extremely limited in terms of empirical data and, thus, made it practically impossible to establish the significance of the findings in a statistically well-founded way. Our current investigations, which are described here, are the most data-intensive empirical studies ever performed in the area of musical performance research (computer-based or otherwise) and, as such, probably add a new kind of quality to research in this area.


Inductive Learning of Classification Rules and the PLCG Algorithm

The induction of classification rules is one of the major classes of learning scenarios investigated in machine learning. Given a set of examples, each described by a well-defined set of descriptors and labeled as belonging to one of n disjoint classes c1 ... cn, the task is to induce a general model that is (more or less) consistent with the training examples and can predict the class of new, previously unseen examples. One class of models is classification rules of the form class = ci IF <condition1> AND <condition2> AND .... If there are only two classes a and b, one of which (a, say) is the class we want to find a definition for, one usually speaks of concept learning and refers to instances of a as positive examples and instances of b as negative examples. Sets of classification rules are commonly referred to as theories.

The most common strategy for learning classification rules in machine learning is known as sequential covering, or separate-and-conquer (Fürnkranz 1999). The strategy involves inducing rules one by one and, after having learned a rule, removing all the examples covered by the rule so that the following learning steps will focus on the still-uncovered examples. In the simplest case, this process is repeated until no positive examples are left that are not covered by any rule. A single rule is usually learned by starting with the most general rule (a rule with no conditions), which would cover all given examples, positive and negative, and then refining the rule step by step by adding one condition at a time so that many negative examples are excluded and many positive ones remain covered. The process of selecting conditions is usually guided by heuristics such as weighted information gain (Quinlan 1990) that assess the discriminatory potential of competing conditions. In the simplest case, rule refinement stops when the current rule is pure, that is, covers no more negative examples. However, in real life, data can be noisy (that is, contain errors), and the given data and rule representation languages might not even permit the formulation of perfectly consistent theories; so, good rule learning algorithms perform some kind of pruning to avoid overfitting. They usually learn rules that are not entirely pure, and they stop before all positive examples have been covered. The short presentation given here is necessarily simplified in various ways, and the reader who wishes to learn the whole story is referred to Fürnkranz (1999).
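To make the covering strategy concrete, here is a minimal sketch in Python. It is not the learner used in the project; all names are illustrative, examples are assumed to be dictionaries of attribute values, and a simple purity measure stands in for heuristics such as weighted information gain:

    def covers(rule, example):
        """A rule is a list of (attribute, value) conditions that must all hold."""
        return all(example[attr] == val for attr, val in rule)

    def purity(rule, pos, neg):
        """Placeholder quality heuristic; real learners use richer measures."""
        p = sum(covers(rule, e) for e in pos)
        n = sum(covers(rule, e) for e in neg)
        return p / (p + n + 1e-9)

    def learn_one_rule(pos, neg, candidates):
        """Refine greedily, adding one condition at a time."""
        rule = []
        while neg:
            best = max(candidates, key=lambda c: purity(rule + [c], pos, neg))
            new_neg = [e for e in neg if covers(rule + [best], e)]
            if len(new_neg) == len(neg):      # no condition excludes anything: stop
                break
            rule.append(best)
            pos = [e for e in pos if covers(rule, e)]
            neg = new_neg
        return rule

    def separate_and_conquer(pos, neg, candidates):
        """Learn rules one by one, removing the positives each rule covers."""
        theory = []
        while pos:
            rule = learn_one_rule(pos, neg, candidates)
            covered = [e for e in pos if covers(rule, e)]
            if not covered:                   # no useful rule found: give up
                break
            theory.append(rule)
            pos = [e for e in pos if not covers(rule, e)]
        return theory

Note that this simplest variant refines until purity or stagnation; as described above, practical learners prune instead of refining to purity.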

At the heart of our new learning algorithm PLCG is such a sequential covering algorithm. However, “wrapped around” this simple rule learning algorithm is a metaalgorithm that essentially uses the underlying rule learner to induce several partly redundant theories and then combines these theories into one final rule set. In this sense, PLCG is an example of what is known in machine learning as ensemble methods (Dietterich 2000). The PLCG algorithm proceeds in several stages (figure A); it is here that we can explain the acronym: PLCG stands for partition + learn + cluster + generalize. In a first step, the training examples are partitioned into several subsets (partition). From each of these subsets, a set of classification rules is induced (learn). These rule sets are then merged into one large set, and a hierarchical clustering of the rules into a tree of rule sets is performed (cluster), where each set contains rules that are somehow similar. Each of these rule sets is then replaced with the least general generalization of all the rules in the set (generalize). The result is a tree of rules of varying degrees of generality. Finally, a heuristic algorithm selects the most promising rules from this generalization tree and joins them into the final rule set that is then returned.
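As a rough, self-contained illustration of these stages, the toy sketch below represents a rule as a frozenset of (attribute, value) conditions, so that the least general generalization of a set of conjunctive rules is simply the intersection of their condition sets. The clustering and selection heuristics are deliberately crude placeholders for the richer ones in the actual algorithm:

    def jaccard_distance(r1, r2):
        """Distance between two rules = 1 minus the overlap of their conditions."""
        return 1.0 - len(r1 & r2) / len(r1 | r2) if (r1 | r2) else 0.0

    def agglomerate(rules):
        """Greedy bottom-up clustering; returns every cluster formed (the tree)."""
        clusters = [frozenset([r]) for r in set(rules)]
        tree = list(clusters)
        while len(clusters) > 1:
            pairs = [(i, j) for i in range(len(clusters))
                     for j in range(i + 1, len(clusters))]
            i, j = min(pairs, key=lambda p: min(
                jaccard_distance(a, b)
                for a in clusters[p[0]] for b in clusters[p[1]]))
            merged = clusters[i] | clusters[j]
            clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
            clusters.append(merged)
            tree.append(merged)
        return tree

    def plcg(examples, base_learner, k=5, min_conditions=2):
        subsets = [examples[i::k] for i in range(k)]                    # partition
        rules = [r for s in subsets for r in base_learner(s)]           # learn + merge
        tree = agglomerate(rules)                                       # cluster
        generalized = {frozenset.intersection(*node) for node in tree}  # generalize
        # select: placeholder keeping generalizations that are not overly general
        return [g for g in generalized if len(g) >= min_conditions]

Here base_learner could be the separate_and_conquer() sketch above; the real selection stage evaluates the generalized rules heuristically rather than by a simple length threshold.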

The motivation for this rather complex procedure is that in this way the advantage of ensemble learning—improved prediction accuracy by selectively combining the expertise of several classifiers—is combined with the benefit of inducing comprehensible theories. In contrast to most common ensemble learning methods such as bagging (Breiman 1996), stacking (Wolpert 1992), or boosting (Freund and Schapire 1996), which only combine the predictions of classifiers to improve prediction accuracy, PLCG combines the theories directly and produces one final rule set that can be interpreted. This is important in the context of our project, where the focus is on the discovery of interpretable patterns.

Systematic large-scale experiments have shown that PLCG consistently learns more precise theories than state-of-the-art rule-learning algorithms such as RIPPER (Cohen 1995), but these theories also tend to be much simpler and more specific; that is, they cover, or explain, fewer of the positive examples. To put it in simple terms, PLCG only learns rules for those parts of the target concept where it is quite sure it can competently predict; this feature is quite desirable in our discovery context. More detail on the algorithm and experimental results can be found in Widmer (2003).


Figure A. The PLCG Learning Algorithm: Main Stages (partition, learn, merge, cluster, generalize, select).


Two Basic Questions: Commonalities and Differences

The starting points for the following presentation are two generic types of questions regarding expressive music performance. First, are there general, fundamental principles of music performance that can be discovered and characterized? Are there general (possibly unconscious and definitely unwritten) rules that all or most performers adhere to? In other words, to what extent can a performer’s expressive actions be predicted? Second, is it possible to formally characterize and quantify aspects of individual artistic style? Can we formally describe what makes the special art of a Vladimir Horowitz, for example?

The first set of questions thus relates to similarities or commonalities between different performances and different performers, and the second focuses on the differences. The following project presentation is structured according to these two types of questions. The section entitled “Studying Commonalities” focuses on the commonalities and briefly recapitulates some of our recent work on learning general performance rules from data. The major part of this article is presented in “Studying Differences,” which describes currently ongoing (and very preliminary) work on the discovery of stylistic characteristics of great artists.

Both of these lines of research are complex enterprises and comprise a number of important steps—from the acquisition and measuring of pertinent data to computer-based discovery proper. As we will see, AI plays an important role in all these steps.

Studying Commonalities: Searching for Fundamental Principles of Music Performance

The question we turn to first is the search for commonalities between different performances and performers. Are there consistent patterns that occur in many performances and point to fundamental underlying principles? We are looking for general rules of music performance, and the methods used will come from the area of inductive machine learning. This section is kept rather short and only points to the most important results because most of this work has already been published elsewhere (Widmer 2003, 2002b, 2001; Widmer and Tobudic 2003).

Data Acquisition: Measuring Expressivity in Performances

The first problem is data acquisition. What we require are precise measurements of the tempo, timing, dynamics, and articulation in a performance of a piece by a musician. In principle, we need to measure exactly when, how long, and how loud each individual note was played and how these measurements deviated from the nominal values prescribed in the written musical score. Extracting this information with high precision from sound recordings is not possible for basic signal processing reasons. Instead, our main source of information is special pianos that precisely record each action by a performer. In particular, the Bösendorfer SE290 is a full-concert grand piano with a special mechanism that measures every key and pedal movement with high precision and stores this information in a format similar to MIDI. (The piano also features a mechanical reproduction facility that can reproduce a recorded performance with very high accuracy.) From these measurements, and by comparing them to the notes as specified in the written score, every expressive nuance applied by a pianist can be computed.

These nuances can be represented as expression curves. For example, figure 1 shows dynamics curves—the dynamics patterns produced by three different pianists in performing the same piece. More precisely, each point represents the relative loudness with which a particular melody note was played (relative to an averaged standard loudness); a purely mechanical, unexpressive rendition of the piece would correspond to a perfectly flat horizontal line at y = 1.0. Variations in tempo and articulation can be represented in an analogous way.
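As a minimal sketch of how such curves can be computed, assuming score-matched data (for each melody note, its nominal score onset in beats, its measured onset in seconds, and its measured loudness; the function and field names are ours, not the project's):

    import numpy as np

    def dynamics_curve(loudness):
        """Loudness of each melody note relative to the average loudness.
        A purely mechanical rendition would give a flat line at y = 1.0."""
        loudness = np.asarray(loudness, dtype=float)
        return loudness / loudness.mean()

    def tempo_curve(score_onsets, played_onsets):
        """Relative local tempo per inter-onset interval: values above 1.0
        mean faster than the piece's average tempo, below 1.0 slower."""
        score_ioi = np.diff(score_onsets)      # nominal intervals (beats)
        played_ioi = np.diff(played_onsets)    # measured intervals (seconds)
        avg_sec_per_beat = played_ioi.sum() / score_ioi.sum()
        return (score_ioi * avg_sec_per_beat) / played_ioi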

Figure 1 exhibits some clear common patterns and tendencies in the three performances. Despite individual differences between the recordings, there seem to be common strategies, or “rules,” that are followed by the pianists, consciously or unconsciously. Obviously, there is hope for automated discovery algorithms to find some general principles.

Induction of Note-Level Performance Rules

Some such general principles have indeed been discovered with the help of a new inductive rule-learning algorithm named PLCG (Widmer 2003) (see the sidebar “Inductive Learning of Classification Rules and the PLCG Algorithm”). PLCG was applied to the task of learning note-level performance rules; by note level, we mean rules that predict how a pianist is going to play a particular note in a piece—slower or faster than notated, louder or softer than its predecessor, staccato or legato. Such rules should be contrasted with higher-level expressive strategies such as the shaping of an entire musical phrase (for example, with a gradual slowing toward the end), which is addressed later. The training data used for the experiments consisted of recordings of 13 complete piano sonatas by W. A. Mozart (K.279–284, 330–333, 457, 475, and 533), performed by the Viennese concert pianist Roland Batik.

Page 5: In Search of the Horowitz Factor - TEMIDAbojan/MPS/materials/Widmer_In_search_of.pdf · In Search of the Horowitz Factor ... a number of basic piano performance rules with ... famous

Articles

FALL 2003 115

The resulting data set comprises more than 106,000 performed notes and represents some 4 hours of music. The experiments were performed on the melodies (usually the soprano parts) only, which gives an effective training set of 41,116 notes. Each note was described by 29 attributes (10 numeric, 19 discrete) that represent both intrinsic properties (such as scale degree, duration, metrical position) and some aspects of the local context (for example, melodic properties such as the size and direction of the intervals between the note and its predecessor and successor notes, rhythmic properties such as the durations of surrounding notes and so on, and some abstractions thereof).

From these 41,116 examples of played notes, PLCG learned a small set of 17 quite simple classification rules that predict a surprisingly large number of the note-level choices of the pianist. The rules have been published in the musicological literature (Widmer 2002b) and have created some interest. The surprising aspect is the high number of note-level actions that can be predicted by very few (and mostly very simple) rules. For example, 4 rules were discovered that together correctly predict almost 23 percent of all the situations where the pianist lengthened a note relative to how it was notated (which corresponds to a local slowing of the tempo).3

To give the reader an impression of the simplicity and generality of the discovered rules, here is an extreme example:

    Rule TL2:
    abstract_duration_context = equal-longer & metr_strength ≤ 1
    ⇒ lengthen

“Given two notes of equal duration followed by a longer note, lengthen the note (that is, play it more slowly) that precedes the final, longer one if this note is in a metrically weak position (metrical strength ≤ 1).”

This is an extremely simple principle that turns out to be surprisingly general: Rule TL2 correctly predicts 1,894 cases of local note lengthening, which is 14.12 percent of all the instances of significant lengthening observed in the training data. The number of incorrect predictions is 588 (2.86 percent of all the counterexamples), which gives a precision (percentage of correct predictions) of .763. It is remarkable that one simple principle like this is sufficient to predict such a large proportion of observed note lengthenings in complex music such as Mozart sonatas.
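Stated as an executable predicate (a sketch; the attribute names follow the rule text, but the value encoding is our assumption, not taken from the project's code), together with the reported statistics:

    def rule_tl2(note):
        """Rule TL2: predict 'lengthen' for a note in a metrically weak
        position whose duration context is equal-equal-longer."""
        return (note["abstract_duration_context"] == "equal-longer"
                and note["metr_strength"] <= 1)

    # Reported coverage on the Mozart data: 1,894 correct predictions (true
    # positives) against 588 incorrect ones, giving the quoted precision.
    tp, fp = 1894, 588
    print(f"precision = {tp / (tp + fp):.3f}")   # -> precision = 0.763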

Figure 1. Dynamics Curves (relating to melody notes) of Performances of the Same Piece (Frédéric Chopin, Etude op.10 no.3, E major) by Three Different Viennese Pianists (computed from recordings on a Bösendorfer 290SE computer-monitored grand piano).


To give the reader an impression of just how effective a few simple rules can be in predicting a pianist’s behavior in certain cases, figure 2 compares the tempo variations predicted by our rules to the pianist’s actual timing in a performance of the well-known Mozart Sonata K.331 in A major (first movement, first section). In fact, it is just two simple rules (one for note lengthening, one for shortening) that produce the system’s timing curve.4

Experiments also showed that most of these rules are highly general and robust: They carry over to other performers and even music of different styles with virtually no loss of coverage and precision. In fact, when the rules were tested on performances of quite different music (Chopin), they exhibited significantly higher coverage and prediction accuracy than on the original (Mozart) data they had been learned from. What the machine has discovered here really seems to be a set of fundamental performance principles.

A detailed discussion of the rules, as well as a quantitative evaluation of their coverage and precision, can be found in Widmer (2002b); the learning algorithm PLCG is described and analyzed in Widmer (2003).

Multilevel Learning of Performance Strategies

As already mentioned, not all a performer’s decisions regarding tempo or dynamics can be predicted on a local, note-to-note basis. Musicians understand the music in terms of a multitude of more abstract patterns and structures (for example, motifs, groups, phrases), and they use tempo and dynamics to “shape” these structures, for example, by applying a gradual crescendo (growing louder) or decrescendo (growing softer) to entire passages. Music performance is a multilevel phenomenon, with musical structures and performance patterns at various levels embedded in each other.

Accordingly, the set of note-level performance rules described earlier is currently being augmented with a multilevel learning strategy where the computer learns to predict elementary tempo and dynamics “shapes” (like a gradual crescendo-decrescendo) at different levels of the hierarchical musical phrase structure and combines these predictions with local timing and dynamics predicted by learned note-level models. Preliminary experiments, again with performances of Mozart sonatas, yielded very promising results (Widmer and Tobudic 2003). Just to give an idea, figure 3 shows the predictions of the integrated learning algorithm on part of a test piece after learning from other Mozart sonatas. As can be seen in the lower part of the figure, the system manages to predict not only local patterns but also higher-level trends (for example, gradual increases of overall loudness) quite well.

The curve shown in figure 3 is from a computer-generated performance of the Mozart piano sonata K.280 in F major. A recording of this performance was submitted to an International Computer Piano Performance Rendering Contest (RENCON’02) in Tokyo in September 2002,5 where it won second prize behind a rule-based rendering system that had carefully been tuned by hand. The rating was done by a jury of human listeners.

Figure 2. Mozart Sonata K.331, First Movement, First Part, as Played by Pianist and Learner. The curve plots the relative tempo at each note; notes above the 1.0 line are shortened relative to the tempo of the piece, and notes below 1.0 are lengthened. A perfectly regular performance with no timing deviations would correspond to a straight line at y = 1.0.


Although this result in no way implies that a machine will ever be able to learn to play music like a human artist, we do consider it a nice success for a machine learning system.

Studying Differences: Trying to Characterize Individual Artistic Style

The second set of questions guiding our research concerns the differences between individual artists. Can one characterize formally what is special about the style of a particular pianist? Contrary to the research on common principles described earlier, where we mainly used performances by local (though highly skilled) pianists, here we are explicitly interested in studying famous artists. Can we find the “Horowitz Factor”?

This question might be the more intriguing one for the general audience because it involves famous artists. However, the reader must be warned that the question is difficult. The following is work in progress, and the examples given should be taken as indications of the kinds of things that we hope to discover rather than as truly significant, established discovery results.

Data Acquisition: Measuring Expressivity in Audio Recordings

The first major difficulty is again data acquisition.

Figure 3. Learner’s Predictions for the Dynamics Curve of Mozart Sonata K.280, Third Movement, mm. 25–50. Top: dynamics shapes predicted for phrases at four levels. Bottom: composite predicted dynamics curve resulting from phrase-level shapes and note-level predictions (gray) versus pianist’s actual dynamics (black). Line segments at the bottom of each plot indicate hierarchical phrase structure. Axes: score position (bars) versus relative dynamics.


With famous pianists, the only source of data is audio recordings, that is, records and music CDs (we cannot very well invite them all to Vienna to perform on the Bösendorfer SE290 piano). Unfortunately, it is impossible, with current signal-processing methods, to extract precise performance information (start and end times, loudness, and so on) about each individual note directly from audio data. Thus, it will not be possible to perform studies at the same level of detail as those based on MIDI data. In particular, we cannot study how individual notes are played.

What is currently possible is to extract tempo and dynamics at the level of the beat.6 That is, we extract from the audio recordings those time points that correspond to beat locations. From the (varying) time intervals between these points, the beat-level tempo and its changes can be computed. Beat-level dynamics is also computed from the audio signal as the overall loudness (amplitude) of the signal at the beat times.
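The arithmetic of these beat-level measurements is simple; a small worked sketch follows (a crude RMS amplitude stands in for the perceptual loudness measure, in sone, that the project actually uses):

    import numpy as np

    def beat_tempo(beat_times):
        """Tempo in beats per minute from interbeat intervals: an interval
        of 0.5 s gives 120 bpm, 0.6 s gives 100 bpm."""
        return 60.0 / np.diff(beat_times)

    def beat_loudness(signal, sr, beat_times, win=0.05):
        """Crude loudness proxy: RMS amplitude in a window around each beat."""
        rms = []
        for t in beat_times:
            seg = signal[max(int((t - win) * sr), 0):int((t + win) * sr)]
            rms.append(float(np.sqrt(np.mean(seg ** 2))) if len(seg) else 0.0)
        return rms

    print(beat_tempo([0.0, 0.5, 1.0, 1.6]))   # -> [120. 120. 100.]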

The hard problem here is automatically detecting and tracking the beat in audio recordings. Indeed, this is an open research problem that forced us to develop a novel beat-tracking algorithm called BEATROOT (Dixon 2001c; Dixon and Cambouropoulos 2000). Beat tracking, in a sense, is what human listeners do when they listen to a piece and tap their foot in time with the music. As with many other perception and cognition tasks, what seems easy and natural for a human turns out to be extremely difficult for a machine.

The main problems to be solved are (1) detecting the onset times of musical events (notes, chords, and so on) in the audio signal, (2) deciding which of these events carry the beat (that includes determining the basic tempo, that is, the basic rate at which beats are expected to occur), and (3) tracking the beat through tempo changes. Tracking the beat is extremely difficult in classical music, where the performer can change the tempo drastically—a slowing down by 50 percent within 1 second is nothing unusual. It is difficult for a machine to decide whether an extreme change in interbeat intervals is because of the performer’s expressive timing or whether it indicates that the algorithm’s beat hypothesis was wrong.

After dealing with the onset-detection problem with rather straightforward signal-processing methods, BEATROOT models the perception of beat by two interacting processes: The first finds the rate of the beats (tempo induction), and the second synchronizes a pulse sequence with the music (beat tracking). At any time, multiple hypotheses can exist regarding each of these processes; these are modeled by a multiple-agent architecture in which agents representing each hypothesis compete and cooperate to find the best solution (Dixon 2001c).
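A heavily simplified sketch of this two-process, multiple-agent idea, assuming a list of detected onset times in seconds (the real BEATROOT creates, scores, and merges agents far more carefully):

    from collections import Counter

    def tempo_hypotheses(onsets, resolution=0.02, lo=0.25, hi=1.5):
        """Tempo induction: cluster inter-onset intervals into beat periods."""
        intervals = [b - a for i, a in enumerate(onsets) for b in onsets[i + 1:]
                     if lo <= b - a <= hi]
        histogram = Counter(round(iv / resolution) for iv in intervals)
        return [k * resolution for k, _ in histogram.most_common(5)]

    def track(onsets, period, tol=0.1):
        """One agent: predict beats at a fixed period, snapping to nearby onsets."""
        beats, t, score = [onsets[0]], onsets[0], 0
        while t + period <= onsets[-1]:
            t += period
            near = [o for o in onsets if abs(o - t) <= tol * period]
            if near:                      # an onset supports the prediction
                t = min(near, key=lambda o: abs(o - t))
                score += 1
            beats.append(t)
        return beats, score

    def beat_track(onsets):
        """Agents for all tempo hypotheses compete; the best-scoring one wins."""
        agents = [track(onsets, p) for p in tempo_hypotheses(onsets)]
        return max(agents, key=lambda a: a[1])[0]

This toy agent cannot follow the drastic tempo changes described above; handling those is exactly what makes the real system's agent scoring and competition nontrivial.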

Experimental evaluations showed that the BEATROOT algorithm is probably one of the best beat-tracking methods currently available (Dixon 2001a). In systematic experiments with expressive performances of 13 complete piano sonatas by W. A. Mozart played by a Viennese concert pianist, the algorithm achieved a correct detection rate of more than 90 percent. However, for our investigations, we needed a tracking accuracy of 100 percent, so we opted for a semiautomatic, interactive procedure. The beat-tracking algorithm was integrated into an interactive computer program that takes a piece of music (a sound file); tries to track the beat;7 displays its beat hypotheses visually on the screen (figure 4); allows the user to listen to selected parts of the tracked piece and modify the beat hypothesis by adding, deleting, or moving beat indicators; and then attempts to retrack the piece based on the updated information. This process is still laborious, but it is much more efficient than manual beat tracking.

After a recording has been processed in this way, tempo and dynamics at the beat level can easily be computed. The resulting series of tempo and dynamics values is the input data to the next processing step.

Data Visualization: The Performance Worm

An important first step in the analysis of complex data is data visualization. Here we draw on an original idea and method developed by the musicologist Jörg Langner, who proposed to represent the joint development of tempo and dynamics over time as a trajectory in a two-dimensional tempo-loudness space (Langner and Goebl 2002). To provide for a visually appealing display, smoothing is applied to the originally measured series of data points by sliding a Gaussian window across the series of measurements. Various degrees of smoothing can highlight regularities or performance strategies at different structural levels (Langner and Goebl 2003). Of course, smoothing can also introduce artifacts that have to be taken into account when interpreting the results.
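For instance, a sliding Gaussian window over a measured tempo or loudness series might look like the following sketch, where the window width is the knob that trades local detail against higher-level trends:

    import numpy as np

    def gaussian_smooth(values, sigma=2.0):
        """Convolve a 1-D series of measurements with a normalized Gaussian."""
        radius = int(3 * sigma)
        x = np.arange(-radius, radius + 1)
        kernel = np.exp(-0.5 * (x / sigma) ** 2)
        kernel /= kernel.sum()
        # mode="same" keeps the series length; note the boundary effects it causes
        return np.convolve(values, kernel, mode="same")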

We have developed a visualization program called the PERFORMANCE WORM (Dixon, Goebl, and Widmer 2002) that displays animated tempo-loudness trajectories in synchrony with the music. A movement to the right signifies an increase in tempo, a crescendo causes the trajectory to move upward, and so on. The trajectories are computed from the beat-level tempo and dynamics measurements we make with the help of BEATROOT; they can be stored to file and used for systematic quantitative analysis (see discussion later).


To give a simple example, figure 5 shows snapshots of the WORM as it displays performances of the same piece of music (the first eight bars of Robert Schumann’s Von fremden Ländern und Menschen, from Kinderszenen, op.15) by three different pianists.8 Considerable smoothing was applied here to highlight the higher-level developments within this extended phrase.

It is immediately obvious from figure 5 that Horowitz and Kempff have chosen a similar interpretation. Both essentially divide the phrase into two four-bar parts, where the first part is played more or less with an accelerando (the worm moves to the right) and the second part with a ritardando, interrupted by a local speeding up in bar 6 (more pronounced in the Kempff performance). Their dynamics strategies are highly similar too: a general crescendo (interweaved, in Horowitz’s case, with two local reductions in loudness) to the loudness climax around the end of bar five, followed by a decrescendo toward the end of the phrase. Martha Argerich’s trajectory, however, betrays a different strategy. In addition to a narrower dynamics range and a slower global tempo, what sets her apart from the others is that, relatively speaking, she starts fast and then plays the entire phrase with one extended ritardando, interrupted only by a short speeding up between bars six and seven.9 Also, there is no really noticeable loudness climax in her interpretation.

The point here is not to discuss the musical or artistic quality of these three performances—to do this, one would have to see (and hear) the phrase in the context of the entire piece—but simply to indicate the kinds of things one can see in such a visualization. Many more details can be seen when less smoothing is applied to the measured data.

Figure 4. Screen Shot of the Interactive Beat-Tracking System BEATROOT Processing the First Five Seconds of a Mozart Piano Sonata. Shown are the audio wave form (bottom), the spectrogram derived from it (black “clouds”), the detected note onsets (short vertical lines), the system’s current beat hypothesis (long vertical lines), and the interbeat intervals in milliseconds (top).


We are currently investing substantial effort in measuring tempo and dynamics in recordings of different pieces by different famous pianists, using the interactive BEATROOT system. Our current collection (as of January 2003) amounts to more than 500 recordings by pianists such as Martha Argerich, Vladimir Horowitz, Artur Rubinstein, Maurizio Pollini, Sviatoslav Richter, and Glenn Gould, playing music by composers such as Mozart, Beethoven, Schubert, Chopin, Schumann, and Rachmaninov. Precisely measuring these recordings took some 12 person-months of hard work.

Figure 6 shows a complete tempo-loudness trajectory representing a performance of a Chopin Ballade by Artur Rubinstein. The subsequent analysis steps will be based on this kind of representation.

Transforming the Problem: Segmentation, Clustering, Visualization

The smoothed tempo-loudness trajectories provide intuitive insights, which might make visualization programs such as the WORM a useful tool for music education. However, the goal of our research is to go beyond informal observations and derive objective, quantitative conclusions from the data.

Instead of analyzing the raw tempo-loudness trajectories—which, mathematically speaking, are bivariate time series—directly, we chose to pursue an alternative route, namely, to transform the data representation and, thus, the entire discovery problem into a form that is accessible to common inductive machine learning and data-mining algorithms. To this end, the performance trajectories are cut into short segments of fixed length, for example, two beats. The segments are optionally subjected to various normalization operations (for example, mean and/or variance normalization to abstract away from absolute tempo and loudness and/or absolute pattern size, respectively). The resulting segments are then grouped into classes of similar patterns using clustering. For each of the resulting clusters, a prototype is computed. These prototypes represent a set of typical elementary tempo-loudness patterns that can be used to approximately reconstruct a “full” trajectory (that is, a complete performance). In this sense, they can be seen as a simple alphabet of performance, restricted to tempo and dynamics. Figure 7 displays a set of prototypical patterns computed from a set of Mozart sonata recordings by different artists.
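A sketch of this transformation, with k-means standing in for the clustering step (the project used a self-organizing map, discussed next); each performance is assumed to be an array of beat-level (tempo, loudness) pairs:

    import numpy as np
    from sklearn.cluster import KMeans

    def segments(trajectory, length=2):
        """Cut an (n, 2) tempo-loudness trajectory into fixed-length segments."""
        n = (len(trajectory) // length) * length
        return trajectory[:n].reshape(-1, length, 2)

    def normalize(seg):
        """Mean/variance normalization: abstract from absolute tempo and size."""
        return (seg - seg.mean(axis=0)) / (seg.std(axis=0) + 1e-9)

    def performance_alphabet(trajectories, n_prototypes=24):
        """Cluster all normalized segments; the centroids are the 'letters'."""
        segs = np.concatenate([segments(t) for t in trajectories])
        flat = np.array([normalize(s).ravel() for s in segs])
        km = KMeans(n_clusters=n_prototypes, n_init=10).fit(flat)
        return km.cluster_centers_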

The particular clustering shown in figure 7 was generated by a self-organizing map (SOM) algorithm (Kohonen 2001).

Figure 5. Performance Trajectories over the First Eight Bars of Von fremden Ländern und Menschen (from Kinderszenen, op.15, by Robert Schumann), as Played by Martha Argerich (top), Vladimir Horowitz (center), and Wilhelm Kempff (bottom). Horizontal axis: tempo in beats per minute (bpm); vertical axis: loudness in sone (Zwicker and Fastl 2001). The largest point represents the current instant; instants further in the past appear smaller and fainter. Black circles mark the beginnings of bars. Movement to the upper right indicates a speeding up (accelerando) and loudness increase (crescendo) and so on. Note that the y axes are scaled differently. The musical score of this excerpt is shown at the bottom.


A SOM produces a geometric layout of the clusters on a two-dimensional grid or map, attempting to place similar clusters close to each other. This property, which is quite evident in figure 7, facilitates a simple, intuitive visualization method. The basic idea, named smoothed data histograms (SDHs), is to visualize the distribution of cluster members in a given data set by estimating the probability density of the high-dimensional data on the map (see Pampalk, Rauber, and Merkl [2002] for details). Figure 8 shows how SDHs can be used to visualize the frequencies with which certain pianists use elementary expressive patterns (trajectory segments) from the various clusters.

Looking at these SDHs with the corresponding cluster map (figure 7) in mind gives us an impression of which types of patterns are preferably used by different pianists. Note that generally, the overall distributions of pattern usage are quite similar (figure 8, left). Obviously, there are strong commonalities, basic principles that all artists adhere to. Most obvious are the common bright areas in the upper right and lower left corners; these correspond to a coupling of tempo and dynamics increases and decreases, respectively (see figure 7). That is, a speeding up often goes along with an increase in loudness, and vice versa. This performance principle is well known and is well in accordance with current performance models in musicology.
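The SDH idea can be sketched in a few lines; the voting scheme below only loosely follows Pampalk, Rauber, and Merkl (2002), and the names are ours: each data point votes for its s closest map units, with closer units receiving more votes:

    import numpy as np

    def smoothed_data_histogram(data, codebook, s=3):
        """data: (n, d) segment vectors; codebook: (m, d) SOM unit weights.
        Returns one normalized vote mass per map unit."""
        votes = np.zeros(len(codebook))
        weights = np.arange(s, 0, -1, dtype=float)  # s for closest, s-1 next, ...
        for x in data:
            dists = np.linalg.norm(codebook - x, axis=1)
            for w, unit in zip(weights, np.argsort(dists)[:s]):
                votes[unit] += w
        return votes / votes.sum()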

The differences between pianists emerge when we filter out the commonalities. The right half of figure 8 shows the same pianists after the commonalities (the joint SDH of all pianists) have been subtracted from each individual SDH. Now we can see clear stylistic differences.

Figure 6. A Complete WORM: Smoothed Tempo-Loudness Trajectory Representing a Performance of Frédéric Chopin’s Ballade op.47 in A Flat Major by Artur Rubinstein. Horizontal axis: tempo in beats per minute (bpm); vertical axis: loudness in sone (Zwicker and Fastl 2001).


Witness, for example, the next-to-rightmost cluster in the bottom row, which represents a slowing down associated with an increase in loudness (a movement of the trajectory toward the upper left). This pattern is found quite frequently in performances by Horowitz, Schiff, and Batik but much less so in performances by Barenboim, Uchida, and (especially) Pires. An analogous observation can be made in the next-to-leftmost cluster in the top row, which also represents a decoupling of tempo and dynamics (speeding up but growing softer). Again, Pires is the pianist who does this much less frequently than the other pianists. Overall, Maria João Pires and András Schiff appear to be particularly different from each other, and this impression is confirmed when we listen to the Mozart recordings of these two pianists.

Structure Discovery in Musical Strings

The SDH cluster visualization method gives some insight into very global aspects of performance style—the relative frequency with which different artists tend to use certain stylistic patterns. It does show that there are systematic differences between the pianists, but we want to get more detailed insight into characteristic patterns and performance strategies. To this end, another (trivial) transformation is applied to the data.

We can take the notion of an alphabet literally and associate each prototypical elementary tempo-dynamics shape (that is, each cluster prototype) with a letter. For example, the prototypes in figure 7 could be named A, B, C, and so on. A full performance—a complete trajectory in tempo-dynamics space—can be approximated by a sequence of elementary prototypes and thus be represented as a sequence of letters, that is, a string. Figure 9 shows a part of a performance of a Mozart sonata movement coded in terms of such an alphabet.
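Given the prototype vectors from the earlier clustering sketch, the encoding itself is a nearest-prototype lookup (the letter assignment is arbitrary; segments() and normalize() are reused from the sketch above):

    import numpy as np

    def encode(trajectory, prototypes):
        """Approximate a performance as a string: one letter per segment,
        naming the nearest cluster prototype."""
        letters = []
        for seg in segments(trajectory):
            flat = normalize(seg).ravel()
            nearest = int(np.argmin(np.linalg.norm(prototypes - flat, axis=1)))
            letters.append(chr(ord("A") + nearest))
        return "".join(letters)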

This final transformation step, trivial as it might be, makes it evident that our original musical problem has now been transferred into a quite different world: the world of string analysis.

Figure 7. A Mozart Performance Alphabet (cluster prototypes) Computed by Segmentation, Mean and Variance Normalization, and Clustering from Performances of Mozart Piano Sonatas by Six Pianists (Daniel Barenboim, Roland Batik, Vladimir Horowitz, Maria João Pires, András Schiff, and Mitsuko Uchida). To indicate directionality, dots mark the end points of segments. Shaded regions indicate the variance within a cluster.


Figure 8. A Smoothed Data Histogram (SDH) Visualization of the Mozart Performances, Grouped by Pianist (see figure 7 for the corresponding cluster map). Left: “plain” SDHs showing the relative frequency of pattern use by the individual pianists. Right: to emphasize the differences between the artists, the joint SDH of all pianists was subtracted from each individual SDH. Bright areas indicate high frequency of pattern use.

CDSRWHGSNMBDSOMEXOQVWOQQHHSRQVPHJFATGFFUVPLDTPNMECDOVTOMECDSPFXPOFAVHDTPNFEVHHDXTPMARIFFUHHGIEEARWTTLJEEEARQDNIBDSQIETPPMCDTOMAWOFVTNMHHDNRRVPHHDUQIFEUTPLXTORQIEBXTORQIECDHFVTOFARBDXPKFURMHDTTPDTPJARRQWLGFCTPNMEURQIIBDJCGRQIEFFEDTTOMEIFFAVTTPKIFARRTPPPNRRMIEECHDSRRQEVTTTPORMCGAIEVLGFWLHGARRVLXTOQRWPRRLJFUTPPLSRQIFFAQIFARRLHDSOQIEBGAWTOMEFETTPKECTPNIETPOIIFAVLGIECDRQFAVTPHSTPGFEJAWPORRQICHDDDTPJFEEDTTPJFAVTOQBHJRQIBDNIFUTPPLDHXOEEAIEFFECXTPRQIFECPOVLFAVPTTPPPKIEEFRWPNNIFEEDTTPJFAXTPQIBDNQIECOIEWTPPGCHXOEEUIEFFICDSOQMIFEEBDTPJURVTTPPNMFEARVTTFFFFRVTLAMFFARQBXSRWPHGFBDTTOU...............................

Figure 9. Beginning of a Performance by Daniel Barenboim (W. A. Mozart piano sonata K.279 in C major) Coded in Terms of a 24-Letter Performance Alphabet Derived from a Clustering of Performance Trajectory Segments.


The fields of pattern recognition, machine learning, data mining, and so on have developed a rich set of methods that can find structure in strings and that could now profitably be applied to our musical data.10

There are a multitude of questions one might want to ask of these musical strings. For example, one might search for letter sequences (substrings) that are characteristic of a particular pianist (see later discussion). One might search for general, frequently occurring substrings that are typical components of performances—stylistic clichés, so to speak. Using such frequent patterns as building blocks, one might try to use machine learning algorithms to induce—at least partial—grammars of musical performance style (for example, Nevill-Manning and Witten 1997). One might also investigate whether a machine can learn to identify performers based on characteristics of their performance trajectories.

We are working along several of these lines. For example, we are currently experimenting with classification algorithms that learn to recognize famous pianists based on aspects of their performance trajectories, both at the level of the raw trajectories (numeric values) and at the level of performance strings. First experimental results are quite encouraging (Stamatatos and Widmer 2002; Zanon and Widmer 2003): there seem to be artist-specific patterns in the performances that a machine can identify.

In the following discussion, we look at a more direct attempt at discovering stylistic patterns typical of different artists. Let’s take the previous performance strings and ask the following question: Are there substrings in these strings that occur much more frequently in performances by a particular pianist than in others?

Such a question is a data-mining one. We do not want to bore the reader with a detailed mathematical and algorithmic account of the problem. Essentially, what we are looking for are letter sequences with a certain minimum frequency that occur only in one class of strings (that is, in performances by one pianist). In data-mining terms, these could be called discriminative frequent sequences. In reality, patterns that perfectly single out one pianist from the others will be highly unlikely, so instead of requiring uniqueness of a pattern to a particular pianist, we will be searching for patterns that exhibit a certain level of discriminatory power.

Data mining has developed a multitude of methods for discovering frequent subsets or sequences in huge sequences of events or items (for example, Agrawal and Srikant [1994] and Mannila, Toivonen, and Verkamo [1997]). We extended one of these basic methods—the levelwise search algorithm for finding frequent item sets (Agrawal and Srikant 1994)—toward being able to find frequent subsequences that are also discriminative, where discriminatory potential is related to the level of certainty with which one can predict the pianist after having observed a particular pattern. Technically, this method involves computing the entropies of the distribution of the pattern occurrences across the pianists and selecting patterns with low entropy (Widmer 2002a).
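In sketch form (ours, not the actual extension of the levelwise algorithm), counting substring occurrences per pianist and ranking by entropy looks like this:

    from collections import Counter
    from math import log2

    def substring_counts(strings_by_pianist, length):
        """Map each substring of the given length to per-pianist counts."""
        counts = {}
        for pianist, strings in strings_by_pianist.items():
            for s in strings:
                for i in range(len(s) - length + 1):
                    counts.setdefault(s[i:i + length], Counter())[pianist] += 1
        return counts

    def discriminative_sequences(strings_by_pianist, length=4, min_freq=5):
        """Frequent substrings, ranked so that low-entropy (pianist-specific)
        patterns come first."""
        ranked = []
        for sub, c in substring_counts(strings_by_pianist, length).items():
            total = sum(c.values())
            if total < min_freq:
                continue
            entropy = -sum(n / total * log2(n / total) for n in c.values())
            ranked.append((entropy, sub, dict(c)))
        return sorted(ranked)

A pattern such as FAVT, with seven occurrences in one pianist's strings and only a few elsewhere, would receive a low entropy under this scheme and thus rank near the top.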

In a first experiment with recordings of Mozart piano sonatas—5 sonatas, 54 sections, 6 pianists (Daniel Barenboim, Roland Batik, Glenn Gould, Maria João Pires, András Schiff, and Mitsuko Uchida)—a number of sequences were discovered that are discriminative according to our definition and also look like they might be musically interesting. For example, in one particular alphabet, the sequence FAVT came up as a typical Barenboim pattern, with seven occurrences in Barenboim’s Mozart performances, two in Pires, one in Uchida, and none in the other pianists (see also figure 9).

Now what is FAVT? To find out whether a letter sequence codes any musically interesting or interpretable behavior, we can go back to the original data (the tempo-loudness trajectories) and identify the corresponding trajectory segments in the recordings that are coded by the various occurrences of the sequence. As the left part of figure 10 shows, what is coded by the letter sequence FAVT in Daniel Barenboim’s performances of Mozart is an increase in loudness (a crescendo), followed by a slight tempo increase (accelerando), followed by a decrease in loudness (decrescendo) with more or less constant tempo. This pattern is, indeed, rather unusual. In our experience to date, it is quite rare to see a pianist speed up during a loudness maximum. Much more common in such situations are slowings down (ritardandi), which give a characteristic counterclockwise movement of the WORM; for example, the right half of figure 10 shows instantiations of a pattern that seems characteristic of the style of Mitsuko Uchida (8 occurrences versus 0 in all the other pianists). This “Barenboim pattern” might, thus, really be an interesting discovery that deserves more focused study.

What about Vladimir Horowitz, the reader might ask. Where is the Horowitz factor? This is a fair question, given the rather grandiose title of this article.


Horowitz in an experiment with Chopinrecordings by various famous pianists, includ-ing Horowitz, Rubinstein, and Richter. Thetempo-loudness trajectory corresponding tothe pattern describes a slowing down with adecrescendo—a movement from the upperright to the lower left—followed by a littlespeedup—the loop—followed again by a slow-ing down, now with a crescendo, an increasein loudness. If nothing else, this pattern cer-tainly looks nice. Instantiations of the pattern,distorted in various forms, were found inHorowitz recordings of music by Mozart,Chopin, and even Beethoven (see figure 12).

Is this a characteristic Horowitz performance pattern, a graphic illustration of Horowitz's individuality?

A closer analysis shows that the answer is no.11 When we actually listen to the corresponding sections from the recordings, we find that most of these patterns are not really perceivable in such detail. In particular, the little interspersed accelerando in the bottom left corner that makes the pattern look so interesting is so small in most cases (it is essentially the result of a temporal displacement of a single melody note) that we do not hear it as a speedup. Moreover, it turns out that some of the occurrences of the pattern are artifacts, caused by the transition from the end of one sonata section to the start of the next, for example.

One must be careful not to be carried away by the apparent elegance of such discoveries. The current data situation is still too limited to draw serious conclusions. The absolute numbers (8 or 10 occurrences of a supposedly typical pattern in recordings by a pianist) are too small to support claims regarding statistical significance. Also, we cannot say with certainty that similar patterns do not occur in the performances by the other pianists just because they do not show up as substrings; they might be coded by a slightly different character sequence! Moreover, many alternative performance alphabets could be computed; we currently have no objective criteria for choosing the optimal one in any sense. Finally, even if we can show that some of these patterns are statistically significant, we will still have to establish their musical relevance, as the Horowitz example clearly shows. The ultimate test is listening, and that is a very time-consuming activity.
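One way to make the first reservation concrete (this is an illustration, not part of the study itself) is a simple Monte Carlo test of how easily such a concentration of occurrences could arise by chance. The function name, its arguments, and the equal-data-share assumption are all hypothetical.

```python
import random

def concentration_p_value(occurrences, data_share, trials=10000, seed=0):
    """Estimate how often random scattering of a pattern's occurrences
    across pianists is at least as concentrated as the observed counts.

    occurrences -- observed count per pianist, e.g. {'Barenboim': 7, ...}
    data_share  -- fraction of the corpus contributed by each pianist
                   (by chance, occurrences should scatter in these proportions)"""
    rng = random.Random(seed)
    pianists = list(data_share)
    weights = [data_share[name] for name in pianists]
    total = sum(occurrences.values())
    observed_max = max(occurrences.values())
    hits = 0
    for _ in range(trials):
        counts = dict.fromkeys(pianists, 0)
        for name in rng.choices(pianists, weights=weights, k=total):
            counts[name] += 1
        if max(counts.values()) >= observed_max:
            hits += 1
    return hits / trials

# Assuming (hypothetically) that all six pianists contribute equal amounts
# of data.  Even if this returns a small value for a 7-2-1-0-0-0 split,
# thousands of candidate patterns and several alphabets are screened, so any
# single p value must survive a multiple-testing correction, which is
# exactly the reservation expressed above.
p_value = concentration_p_value(
    {'Barenboim': 7, 'Pires': 2, 'Uchida': 1,
     'Batik': 0, 'Gould': 0, 'Schiff': 0},
    data_share={name: 1 / 6 for name in
                ['Barenboim', 'Batik', 'Gould', 'Pires', 'Schiff', 'Uchida']})
```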

Thus, this section appropriately ends with a word of caution. The reader should not take any of the patterns shown here too literally. They are only indicative of the kinds of things we hope to discover with our methods.

[Figure 10. Two sets of (instantiations of) performance patterns, plotted as tempo [bpm] against loudness [sone]: the FAVT sequence typical of Daniel Barenboim, alphabet t2_norm_mean_var, 7 occurrences (top), and an SC pattern in a different alphabet, t4_norm_mean, from Mitsuko Uchida, 8 occurrences (bottom). To indicate directionality, a dot marks the end point of a segment.]


[Figure 11. A typical Horowitz pattern? Tempo [bpm] against loudness [sone]; panel title: LS Horowitz (Mozart t4 mean var smoothed).]

Conclusion

It should have become clear by now that expressive music performance is a complex phenomenon, especially at the level of world-class artists, and the "Horowitz Factor," if there is such a thing, will most likely not be explained in an explicit model, AI-based or otherwise, in the near future. What we have discovered to date are only tiny parts of a big mosaic. Still, we do feel that the project has produced a number of results that are interesting and justify this computer-based discovery approach.

On the musical side, the main results to date were a rule-based model of note-level timing, dynamics, and articulation with surprising generality and predictivity (Widmer 2002b); a model of multilevel phrasing that even won a prize in a computer music performance contest (Widmer and Tobudic 2003); completely new ways of looking at high-level aspects of performance and visualizing differences in performance style (Dixon, Goebl, and Widmer 2002); and a first set of performance patterns that look like they might be characteristic of particular artists and that might deserve more detailed study. Whether these findings will indeed be musically relevant and artistically interesting can only be hoped for at the moment.

Along the way, a number of novel methods and tools of potentially general benefit were developed: the beat-tracking algorithm and the interactive tempo-tracking system (Dixon 2001c), the PERFORMANCE WORM (Dixon et al. 2002) (with possible applications in music education and analysis), and the PLCG rule-learning algorithm (Widmer 2003), to name a few.

However, the list of limitations and open problems is much longer, and it seems to keep growing with every step forward. Here, we discuss only two main problems related to the level of the analysis:

First, the investigations should look at much more detailed levels of expressive performance, which is currently precluded by fundamental measuring problems. At the moment, it is only possible to extract rather crude and global information from audio recordings; we cannot get at details such as timing, dynamics, and articulation of individual voices or individual notes. It is precisely at this level, in the minute details of voicing, intervoice timing, and so on, that many of the secrets of what music connoisseurs refer to as a pianist's unique "sound" are hidden. Making these effects measurable is a challenge for audio analysis and signal processing, one that is currently outside our own area of expertise.

Second, this research must be taken to higher structural levels. It is in the global organization of a performance, the grand dramatic structure, the shaping of an entire piece, that great artists express their interpretation and understanding of the music, and the differences between artists can be dramatic. These high-level "master plans" do not reveal themselves at the level of local patterns we studied earlier. Important structural aspects and global performance strategies will only become visible at higher abstraction levels. These questions promise to be a rich source of challenges for sequence analysis and pattern discovery; for one, we may have to develop appropriate high-level pattern languages.

In view of these and many other problems, this project will most likely never be finished. However, much of the beauty of research is in the process, not in the final results, and we do hope that our (current and future) sponsors share this view and will keep supporting what we believe is an exciting research adventure.


Finding the Beat with BEATROOT

Beat tracking involves identifying the basic rhythmic pulse of a piece of music and determining the sequence of times at which the beats occur. The BEATROOT (Dixon 2001b, 2001c) system architecture is illustrated in figure A. Audio or MIDI data are processed to detect the onsets of notes, and the timing of these onsets is analyzed in the tempo induction subsystem to generate hypotheses of the tempo at various metric levels. Based on these tempo hypotheses, the beat-tracking subsystem performs a multiple-hypothesis search that finds the sequence of beat times fitting best to the onset times.

For audio input, the onsets of events are found using a time-domain method that detects local peaks in the slope of the amplitude envelope in various frequency bands of the signal. The tempo-induction subsystem then proceeds by calculating the interonset intervals (IOIs) between pairs of (possibly nonadjacent) onsets, clustering the intervals to find common durations, and then ranking the clusters according to the number of intervals they contain and the relationships between different clusters to produce a ranked list of basic tempo hypotheses.
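The following sketch illustrates the IOI-clustering idea in Python. It is a deliberately simplified stand-in for the real tempo-induction subsystem: it omits the reinforcement between related clusters (for example, intervals that are integer multiples of each other), and the parameter values are assumptions, not those of BEATROOT.

```python
def tempo_hypotheses(onsets, max_ioi=2.5, cluster_width=0.025):
    """Toy tempo induction: compute interonset intervals (IOIs) between
    pairs of (possibly nonadjacent) onsets, group similar intervals into
    clusters, and rank clusters by the number of intervals they contain.
    onsets: note onset times in seconds, in ascending order."""
    # All pairwise IOIs below max_ioi seconds, sorted ascending.
    iois = sorted(t2 - t1
                  for i, t1 in enumerate(onsets)
                  for t2 in onsets[i + 1:]
                  if t2 - t1 <= max_ioi)
    # Greedy clustering: an interval joins the current cluster if it is
    # within cluster_width of the cluster mean, else it starts a new one.
    clusters = []
    for ioi in iois:
        if clusters and ioi - sum(clusters[-1]) / len(clusters[-1]) <= cluster_width:
            clusters[-1].append(ioi)
        else:
            clusters.append([ioi])
    # Rank by cluster size; convert each mean interval to beats per minute.
    ranked = sorted(clusters, key=len, reverse=True)
    return [(60.0 / (sum(c) / len(c)), len(c)) for c in ranked]
```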

These hypotheses are the starting point for the beat-tracking algorithm, which uses a multiple-agent architecture to test the different tempo and phase hypotheses simultaneously and finds the agent whose predicted beat times most closely match those implied by the data. Each hypothesis is handled by a beat-tracking agent, characterized by its state and history. The state is the agent's current hypothesis of the beat frequency and phase, and the history is the sequence of beat times selected to date by the agent. Based on their current state, agents predict beat times and match them to the detected note onsets, using deviations from predictions to adjust the hypothesized current beat rate and phase or create a new agent when there is more than one reasonable path of action. The agents assess their performance by evaluating the continuity, regularity, and salience of the matched beats.
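A toy version of such an agent might look as follows; this is only a sketch under the assumptions noted in the comments, and the real agents additionally adjust phase, spawn new agents at ambiguous onsets, and use a richer score than a simple match count.

```python
class BeatAgent:
    """Highly simplified beat-tracking agent holding one tempo hypothesis.
    Assumes onsets arrive sorted in time; the match tolerance and the
    tempo-adjustment factor below are invented for illustration."""

    def __init__(self, period, first_beat, tolerance=0.05):
        self.period = period          # hypothesized beat interval (seconds)
        self.history = [first_beat]   # beat times selected so far
        self.tolerance = tolerance
        self.score = 0                # number of onsets matched as beats

    def track(self, onsets):
        for onset in onsets:
            if onset <= self.history[-1]:
                continue
            prediction = self.history[-1] + self.period
            # Interpolate beats the agent predicted but no onset confirmed.
            while onset > prediction + self.tolerance:
                self.history.append(prediction)
                prediction += self.period
            if abs(onset - prediction) <= self.tolerance:
                # Matched: use the deviation from the prediction to nudge
                # the tempo hypothesis, then accept the onset as a beat.
                self.period += 0.2 * (onset - prediction)
                self.history.append(onset)
                self.score += 1
        return self

def best_beat_track(onsets, hypotheses):
    """Run one agent per tempo hypothesis (as returned by a function such as
    tempo_hypotheses above) and keep the best-scoring beat sequence."""
    agents = [BeatAgent(60.0 / bpm, onsets[0]).track(onsets)
              for bpm, _ in hypotheses]
    return max(agents, key=lambda a: a.score).history
```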

[Figure A. BEATROOT architecture: audio input feeds event detection; the tempo induction subsystem (IOI clustering, then cluster grouping) produces tempo hypotheses for the beat-tracking subsystem (beat-tracking agents, then agent selection), which outputs the beat track.]


Acknowledgments

The project is made possible by a very generous START Research Prize by the Austrian Federal Government, administered by the Austrian Fonds zur Förderung der Wissenschaftlichen Forschung (FWF) (project no. Y99-INF). Additional support for our research on AI, machine learning, scientific discovery, and music is provided by the European project HPRN-CT-2000-00115 (MOSART) and the European Union COST Action 282 (Knowledge Exploration in Science and Technology). The Austrian Research Institute for Artificial Intelligence acknowledges basic financial support by the Austrian Federal Ministry for Education, Science, and Culture. Thanks to Johannes Fürnkranz for his implementation of the substring discovery algorithm. Finally, we would like to thank David Leake and the anonymous reviewers for their very helpful comments.

Notes

1. See also the project web page at www.oefai.at/music.

2. At the moment, we restrict ourselves to classical tonal music and the piano.

3. It should be clear that a coverage of close to 100 percent is totally impossible, not only because expressive music performance is not a perfectly deterministic, predictable phenomenon but also because the level of individual notes is clearly insufficient as a basis for a complete model of performance; musicians think not (only) in terms of single notes but also in terms of higher-level musical units such as motifs and phrases; see the subsection entitled Multilevel Learning of Performance Strategies.

4. To be more precise, the rules predict whether a note should be lengthened or shortened; the precise numeric amount of lengthening or shortening is predicted by a k-nearest-neighbor algorithm (with k = 3) that uses only instances for prediction that are covered by the matching rule, as proposed in Weiss and Indurkhya (1995) and Widmer (1993).

5. Yes, there is such a thing….

6. The beat is an abstract concept related to the metrical structure of the music; it corresponds to a kind of quasiregular pulse that is perceived as such and that structures the music. Essentially, the beat corresponds to the time points at which listeners would tap their foot along with the music. Tempo, then, is the rate or frequency of the beat and is usually specified in terms of beats per minute.

7. The program has been made publicly available and can be downloaded at www.oefai.at/~simon.

8. Martha Argerich, Deutsche Grammophon, 410 653-2, 1984; Vladimir Horowitz, CBS Records (Masterworks), MK 42409, 1962; Wilhelm Kempff, DGG 459 383-2, 1973.

9. Was this unintentional? A look at the complete trajectory over the entire piece reveals that quite in contrast to the other two pianists, she never returns to the starting tempo again. Could it be that she started the piece faster than she wanted to?

[Figure 12. The alleged Horowitz pattern in Horowitz recordings of music by Mozart (top), Chopin (center), and Beethoven (bottom); each panel plots tempo [bpm] against loudness [sone] (panel titles: LS Horowitz, t4 mean var smoothed). W. A. Mozart: Piano Sonata K281, B-flat major, 1st movement, recorded 1989; F. Chopin: Etude op. 10 no. 3, E major, and Ballade no. 4, op. 52, F minor, recorded 1952; L. van Beethoven: Piano Sonata op. 13 in C minor ("Pathétique"), 2nd movement, recorded 1963.]

10. However, it is also clear that through this long sequence of transformation steps (smoothing, segmentation, normalization, replacing individual elementary patterns by a prototype) a lot of information has been lost. It is not clear at this point whether this reduced data representation still permits truly significant discoveries. In any case, whatever kinds of patterns might be found in this representation will have to be tested for musical significance in the original data.

11. That would have been too easy, wouldn’t it?

Bibliography

Agrawal, R., and Srikant, R. 1994. Fast Algorithms for Mining Association Rules. Paper presented at the Twentieth International Conference on Very Large Data Bases (VLDB), 12–15 September, Santiago, Chile.

Arcos, J. L., and López de Mántaras, R. 2001. An Interactive CBR Approach for Generating Expressive Music. Journal of Applied Intelligence 14(1): 115–129.

Breiman, L. 1996. Bagging Predictors. Machine Learning 24:123–140.

Cohen, W. 1995. Fast Effective Rule Induction. Paper presented at the Twelfth International Conference on Machine Learning, 9–12 July, Tahoe City, California.

Dietterich, T. G. 2000. Ensemble Methods in Machine Learning. In First International Workshop on Multiple Classifier Systems, eds. J. Kittler and F. Roli, 1–15. New York: Springer.

Dixon, S. 2001a. An Empirical Comparison of Tempo Trackers. Paper presented at the VIII Brazilian Symposium on Computer Music (SBCM'01), 31 July–3 August, Fortaleza, Brazil.

Dixon, S. 2001b. An Interactive Beat Tracking and Visualization System. In Proceedings of the International Computer Music Conference (ICMC'2001), La Habana, Cuba.

Dixon, S. 2001c. Automatic Extraction of Tempo and Beat from Expressive Performances. Journal of New Music Research 30(1): 39–58.

Dixon, S., and Cambouropoulos, E. 2000. Beat Tracking with Musical Knowledge. Paper presented at the Fourteenth European Conference on Artificial Intelligence (ECAI-2000), 20–25 August, Berlin.

Dixon, S.; Goebl, W.; and Widmer, G. 2002. The PERFORMANCE WORM: Real-Time Visualization of Expression Based on Langner's Tempo-Loudness Animation. Paper presented at the International Computer Music Conference (ICMC'2002), 16–21 September, Göteborg, Sweden.

Freund, Y., and Schapire, R. E. 1996. Experiments with a New Boosting Algorithm. Paper presented at the Thirteenth International Machine Learning Conference, 3–6 July, Bari, Italy.

Friberg, A. 1995. A Quantitative Rule System for Musical Performance. Ph.D. dissertation, Royal Institute of Technology (KTH).

Fürnkranz, J. 1999. Separate-and-Conquer Rule Learning. Artificial Intelligence Review 13(1): 3–54.

Gabrielsson, A. 1999. The Performance of Music. In The Psychology of Music, ed. D. Deutsch, 501–602. 2d ed. San Diego: Academic Press.

Hunter, L., ed. 1993. Artificial Intelligence and Molecular Biology. Menlo Park, Calif.: AAAI Press.

King, R. D.; Muggleton, S.; Lewis, R. A.; and Sternberg, M. J. E. 1992. Drug Design by Machine Learning: The Use of Inductive Logic Programming to Model the Structure-Activity Relationship of Trimethoprim Analogues Binding to Dihydrofolate Reductase. In Proceedings of the National Academy of Sciences 89, 11322–11326. Washington, D.C.: National Academy Press.

Kohonen, T. 2001. Self-Organizing Maps. 3d ed. Berlin: Springer Verlag.

Langner, J., and Goebl, W. 2003. Visualizing Expressive Performance in Tempo-Loudness Space. Computer Music Journal. Forthcoming.

Langner, J., and Goebl, W. 2002. Representing Expressive Performance in Tempo-Loudness Space. Paper presented at the ESCOM Conference on Musical Creativity, 5–8 April, Liège, Belgium.

López de Mántaras, R., and Arcos, J. L. 2002. AI and Music: From Composition to Expressive Performances. AI Magazine 23(3): 43–57.

Mannila, H.; Toivonen, H.; and Verkamo, A. I. 1997. Discovery of Frequent Episodes in Event Sequences. Data Mining and Knowledge Discovery 1(3): 259–289.

Muggleton, S.; King, R. D.; and Sternberg, M. J. E. 1992. Protein Secondary Structure Prediction Using Logic-Based Machine Learning. Protein Engineering 5(7): 647–657.

Nevill-Manning, C. G., and Witten, I. H. 1997. Identifying Hierarchical Structure in Sequences: A Linear-Time Algorithm. Journal of Artificial Intelligence Research 7:67–82.

Palmer, C. 1988. Timing in Skilled Piano Performance. Ph.D. dissertation, Cornell University.

Pampalk, E.; Rauber, A.; and Merkl, D. 2002. Using Smoothed Data Histograms for Cluster Visualization in Self-Organizing Maps. Paper presented at the International Conference on Artificial Neural Networks (ICANN'2002), 27–30 August, Madrid, Spain.

Quinlan, J. R. 1990. Learning Logical Definitions from Relations. Machine Learning 5:239–266.

Repp, B. H. 1999. A Microcosm of Musical Expression: II. Quantitative Analysis of Pianists' Dynamics in the Initial Measures of Chopin's Etude in E Major. Journal of the Acoustical Society of America 105(3): 1972–1988.

Repp, B. H. 1998. A Microcosm of Musical Expression: I. Quantitative Analysis of Pianists' Timing in the Initial Measures of Chopin's Etude in E Major. Journal of the Acoustical Society of America 104(2): 1085–1100.

Repp, B. 1992. Diversity and Commonality in Music Performance: An Analysis of Timing Microstructure in Schumann's Träumerei. Journal of the Acoustical Society of America 92(5): 2546–2568.

Shavlik, J. W.; Towell, G.; and Noordewier, M. 1992. Using Neural Networks to Refine Biological Knowledge. International Journal of Genome Research 1(1): 81–107.

Stamatatos, E., and Widmer, G. 2002. Music Performer Recognition Using an Ensemble of Simple Classifiers. Paper presented at the Fifteenth European Conference on Artificial Intelligence (ECAI'2002), 21–26 July, Lyon, France.

Sundberg, J. 1993. How Can Music Be Expressive? Speech Communication 13:239–253.

Todd, N. 1992. The Dynamics of Dynamics: A Model of Musical Expression. Journal of the Acoustical Society of America 91(6): 3540–3550.

Todd, N. 1989. Towards a Cognitive Theory of Expression: The Performance and Perception of Rubato. Contemporary Music Review 4:405–416.

Valdés-Pérez, R. E. 1999. Principles of Human-Computer Collaboration for Knowledge Discovery in Science. Artificial Intelligence 107(2): 335–346.

Valdés-Pérez, R. E. 1996. A New Theorem in Particle Physics Enabled by Machine Discovery. Artificial Intelligence 82(1–2): 331–339.

Valdés-Pérez, R. E. 1995. Machine Discovery in Chemistry: New Results. Artificial Intelligence 74(1): 191–201.

Weiss, S., and Indurkhya, N. 1995. Rule-Based Machine Learning Methods for Functional Prediction. Journal of Artificial Intelligence Research 3:383–403.

Widmer, G. 2003. Discovering Simple Rules in Complex Data: A Meta-Learning Algorithm and Some Surprising Musical Discoveries. Artificial Intelligence 146(2): 129–148.

Widmer, G. 2002a. In Search of the Horowitz Factor: Interim Report on a Musical Discovery Project. In Proceedings of the Fifth International Conference on Discovery Science (DS'02), 13–32. Berlin: Springer Verlag.

Widmer, G. 2002b. Machine Discoveries: A Few Simple, Robust Local Expression Principles. Journal of New Music Research 31(1): 37–50.

Widmer, G. 2001. Using AI and Machine Learning to Study Expressive Music Performance: Project Survey and First Report. AI Communications 14(3): 149–162.

Widmer, G. 1998. Applications of Machine Learning to Music Research: Empirical Investigations into the Phenomenon of Musical Expression. In Machine Learning, Data Mining, and Knowledge Discovery: Methods and Applications, eds. R. S. Michalski, I. Bratko, and M. Kubat, 269–293. Chichester, United Kingdom: Wiley.

Widmer, G. 1995. Modeling the Rational Basis of Musical Expression. Computer Music Journal 19(2): 76–96.

Widmer, G. 1993. Combining Knowledge-Based and Instance-Based Learning to Exploit Qualitative Knowledge. Informatica 17(4): 371–385.

Widmer, G., and Tobudic, A. 2003. Playing Mozart by Analogy: Learning Multi-Level Timing and Dynamics Strategies. Journal of New Music Research 32(3). Forthcoming.

Windsor, L., and Clarke, E. 1997. Expressive Timing and Dynamics in Real and Artificial Musical Performances: Using an Algorithm as an Analytical Tool. Music Perception 15(2): 127–152.

Wolpert, D. H. 1992. Stacked Generalization. Neural Networks 5(2): 241–259.

Zanon, P., and Widmer, G. 2003. Learning to Recognize Famous Pianists with Machine Learning Techniques. Paper presented at the Third Decennial Stockholm Music Acoustics Conference (SMAC'03), 6–9 August, Stockholm, Sweden.

Zwicker, E., and Fastl, H. 2001. Psychoacoustics: Facts and Models. Berlin: Springer Verlag.

Gerhard Widmer is an associate professor in the Department of Medical Cybernetics and Artificial Intelligence at the University of Vienna and head of the Machine Learning and Data Mining Research Group at the Austrian Research Institute for Artificial Intelligence, Vienna, Austria. He holds M.Sc. degrees from the University of Technology, Vienna, and the University of Wisconsin at Madison and a Ph.D. in computer science from the University of Technology, Vienna. His research interests and current work are in the fields of machine learning, data mining, and intelligent music processing. In 1998, he was awarded one of Austria's highest research prizes, the START Prize, for his work on AI and music. His e-mail address is gerhard@ai.univie.ac.at.

Simon Dixon is a research scientist at the Austrian Research Institute for Artificial Intelligence, with research interests in beat tracking, automatic transcription, and analysis of musical expression. He obtained B.Sc. (1989) and Ph.D. (1994) degrees from the University of Sydney in computer science and an LMusA (1988) in classical guitar. From 1994 to 1999, he worked as a lecturer in computer science at Flinders University of South Australia.

Werner Goebl is a member of the Music Group at the Austrian Research Institute for Artificial Intelligence, Vienna, Austria. He is a Ph.D. student at the Institute for Musicology at Graz University. With his research on expressive music performance, he combines experience as a performing classical pianist with insight from musicology.

Elias Pampalk is a Ph.D. student at the Vienna University of Technology. As a member of the Music Group of the Austrian Research Institute for Artificial Intelligence, he is developing and applying data-mining techniques for music analysis and music information retrieval.

Asmir Tobudic obtained an M.Sc. in electrical engineering from the University of Technology, Vienna. He joined the Music Group at the Austrian Research Institute for Artificial Intelligence in September 2001, where he currently works as a Ph.D. student. His research interests concentrate on developing machine learning techniques for building quantitative models of musical expression.
