Research Article
CaRo 2.0: An Interactive System for Expressive Music Rendering

Sergio Canazza, Giovanni De Poli, and Antonio Rodà

Department of Information Engineering, University of Padova, Via Gradenigo 6/A, 35131 Padova, Italy

Correspondence should be addressed to Sergio Canazza; canazza@dei.unipd.it

Received 6 August 2014; Revised 14 December 2014; Accepted 27 December 2014

Academic Editor: Francesco Bellotti

Copyright © 2015 Sergio Canazza et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Hindawi Publishing Corporation, Advances in Human-Computer Interaction, Volume 2015, Article ID 850474, 13 pages. http://dx.doi.org/10.1155/2015/850474

In several application contexts in the multimedia field (educational, extreme gaming), the interaction with the user requires that the system is able to render music in an expressive way. Expressiveness is the added value of a performance and is part of the reason that music is interesting to listen to. Understanding and modeling expressive content communication is important for many engineering applications in information technology (e.g., Music Information Retrieval, as well as several applications in the affective computing field). In this paper, we present an original approach to modify the expressive content of a performance in a gradual way, applying a smooth morphing among performances with different expressive content in order to adapt the audio expressive character to the user's desires. The system won the final stage of Rencon 2011. This performance RENdering CONtest is a research project that organizes contests for computer systems generating expressive musical performances.

1. Introduction

In the last years, several services based on Web 2.0 technologies have been developed, proposing new modalities of social interaction for music creation and fruition [1]. Notwithstanding differences among the systems, all these services tend to divide users into two categories: on the one hand, the large group of music listeners, which mainly have the job of evaluating and recommending music; on the other hand, the restricted group of music content creators, which are required to have skills in the field of music composition or music performance. This partition limits the participation and interaction with the music content. Applications like Guitar Hero try to fill this gap, giving also to non-musically trained users the possibility to experiment with music performance. Unfortunately, up to now game designers did not consider a very important aspect of music, that is, the player's expressiveness: a performance is rated "perfect" only if it is played exactly like the musical score, without any interpretation, any humanity.

Our studies on music expressiveness [2, 3] led us to develop a tool for the auralisation of multimedia objects for Web-based applications [4]. Our audio authoring tool allows a user to manage audio expressive content, applying a smooth morphing among different expressive intentions in music performances and adapting the audio-expressive character to their taste. The audio-authoring tool lets the user associate different expressive characters with various multimedia objects.

In this paper we present a system, namely, CaRo 2.0 (CAnazza-ROdà, from the names of the two main authors; besides, caro in Italian means "dear"), that allows an active experience of music listening. In the CaRo 2.0 system, the Valence-Arousal emotional space and the Energy-Kinetics sensory space are generalised into the idea of an abstract control space, where expressive intentions are associated with positions and objects. Each user can invent or select their own expressive concepts and then design the mapping of these concepts to positions and movements on this space. The emotion and sensory control metaphors are implemented as particular and significant cases. When the user accesses a musical content, (s)he can act on the control space by selecting a point or drawing a trajectory, thus changing the character of the music. The system allows the user to engage in an active and personalised experience by the interactive control of the expressive nuances of the music (s)he is listening to.

2. Modeling Expressive Music Performance

2.1. Expressive Deviations. The contribution of the performer to expression communication has two aspects: to clarify the composer's message, enlightening the musical structure, and to add his personal interpretation of the piece. A mechanical performance of a score is perceived as lacking musical meaning and is considered dull and inexpressive, as a text read without any prosodic inflection. Indeed, human performers never respect tempo, timing, and loudness notations in a mechanical way when they play a score: some deviations are always introduced, even if the performer explicitly wants to play mechanically [5].

Most studies on musical expression aim at understanding the systematic presence of deviations from the musical notation as a communication means between musician and listener. Deviations introduced by technical constraints (such as fingering) or by imperfect performer skill are not normally considered part of expression communication and are thus often filtered out as noise.

At a physical information level, the main parameters considered in the models of musical expression, called expressive parameters, are related to timing of musical events and tempo (fast or slow), dynamics (loudness variation), and articulation (the way successive notes are connected). These parameters are particularly relevant for keyboard instruments. Moreover, they are the basic parameters of the MIDI protocol (Musical Instrument Digital Interface; see http://midi.org/aboutmidi/tut_history.php) and thus are easily measurable on digital music instruments and can be used as control signals for rendering a music performance. In some instruments and in the singing voice, other acoustic parameters are taken into account, such as vibrato and micro-intonation, or pedalling at the piano. In contemporary music, timbre is often an essential expressive parameter; sometimes also the virtual space location or movement of the sound source is used as an expression feature.
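To make the relation between these expressive parameters and MIDI-like note data concrete, the following Python sketch (purely illustrative; it is not part of CaRo and all names and values are assumptions) applies a tempo change, a dynamics offset, and an articulation ratio to a list of nominal note events:

import copy

def apply_expressive_deviations(notes, tempo_factor, velocity_offset, articulation):
    """notes: list of dicts with 'pitch', 'onset'/'duration' in seconds, MIDI 'velocity'.
    tempo_factor > 1 slows down, < 1 speeds up; velocity_offset changes the dynamics;
    articulation is the duration/inter-onset-interval ratio (legato ~ 1.0, staccato ~ 0.5)."""
    rendered = []
    for i, note in enumerate(notes):
        # inter-onset interval to the next note (the last note keeps its nominal duration)
        ioi = notes[i + 1]["onset"] - note["onset"] if i + 1 < len(notes) else note["duration"]
        rendered.append({
            "pitch": note["pitch"],
            "onset": note["onset"] * tempo_factor,
            "duration": ioi * articulation * tempo_factor,
            "velocity": max(1, min(127, note["velocity"] + velocity_offset)),
        })
    return rendered

nominal = [{"pitch": 60, "onset": 0.0, "duration": 0.5, "velocity": 64},
           {"pitch": 62, "onset": 0.5, "duration": 0.5, "velocity": 64}]
print(apply_expressive_deviations(nominal, tempo_factor=1.1, velocity_offset=15, articulation=0.7))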

The analysis of these systematic deviations has led to the formulation of several models that try to describe their structure, with the aim to explain where, how, and why a performer modifies, sometimes unconsciously, what is indicated by the notation in the score. It should be noticed that, although deviations are only the external surface of something deeper and often not directly accessible, they are quite easily measurable and thus widely used to develop computational models for performance understanding and generation.

2.2. Expressive Intentions. In general, musical expression refers both to the means used by the performer to convey the composer's message and to his/her own contribution to enrich the musical message. Expressiveness related to the musical structure may depend on the dramatic narrative developed by the performer, on the stylistic expectation based on cultural norms (e.g., jazz versus classical music), and on the actual performance situation (e.g., audience engagement). Recently, more interest has been given to the expressive component due to the personal interpretation of the performer [6, 7].

Many studies (see, e.g., [8–13]) demonstrated that it is possible to communicate expressive content at an abstract level, changing the interpretation of a musical piece. In fact, the musician organises acoustical or perceptual changes in sound, communicating different feelings, images, or emotions to the listener. The same piece of music can be performed differently by the same musician (trying to convey a specific interpretation of the score or the situation) by adding mutable expressive intentions.

Notice that sometimes the expressive intentions the performer tries to convey can be in contrast with the character of the musical piece. A slightly broader interpretation of expression as kansei (a Japanese term indicating sensibility, feeling, and sensitivity) [14, 15] or affective communication [16] is proposed in some Japanese or American studies. We prefer the broader term expressive intention, which includes emotion, affect, and other sensorial and descriptive words or actions. Furthermore, this term evidences the explicit intent of the performer in communicating expression. Most music performances would involve some intention from the performer's side regarding what the music should express to the listeners. Consequently, interpretation involves assigning some kind of meaning to the music.

When we talk of deviation, it is important to define which reference is used for computing the deviation. Very often the score is taken as reference, both for theoretical (the score represents the music structure) and for practical (it is easily available) reasons. However, the choice depends on the problem one wants to focus on. When we studied how a performer plays a piece according to different expressive intentions, we found that a clearer interpretation and the best results in simulation are obtained by using a neutral performance as reference [17]. By "neutral" we mean a human performance without any specific expressive intention. In other studies, the mean performance (i.e., the mathematical mean across different performances by the same or many performers) was used as the reference to investigate stylistic choices and preferences (e.g., [18, 19]).

2.3. Categorical versus Dimensional. Most people have an informal understanding of musical expression. While its importance is generally acknowledged, the basic constituents are less clear. Often the simple range expressive-inexpressive is used. Regarding the affective component of music expression, the two theoretical traditions that have most strongly determined past research in this area are the categorical (also called discrete) and the dimensional emotion theories.

The assumption of the categorical approach is that people experience emotions as categories that are distinct from each other. Theorists in this tradition propose the existence of a small number of basic or fundamental emotions that are characterised by very specific response patterns. From these basic emotions, all other emotional states can be derived. The focus is on the characteristics that distinguish the emotions from one another. There is a reasonable agreement on four basic emotions: happiness, sadness, anger, and fear. The next most common two are disgust and surprise. In this case, expression information is represented by labels indicating the expression category and possibly a number or adjective indicating the degree of that expression.

The focus of the dimensional approach is on the identification of emotions based on their placement in a continuous space with a small number of dimensions. This space is derived from similarity judgements analysed using factor analysis or multidimensional scaling. The dimensional approach provides a way of describing expression which is more tractable than using words, but which can be translated into and out of verbal descriptions. Translation is possible because emotion-related words can be understood, at least to a first approximation, as referring to positions in the dimensional space [20]. Moreover, this representation is useful for capturing the continuous change in expression during a piece of music [21].

Figure 1: Kinetics-Energy space as a mid-level representation of expressive intentions (the sensorial adjectives Hard, Heavy, Dark, Bright, Light, and Soft are placed in the space, with both axes ranging from -1.0 to 1.0).

The most used representation in music research is the two-dimensional Valence-Arousal (V-A) space, even if other dimensions were explored in several studies (e.g., see [22]). The V-A space organises emotions in terms of affect appraisal (pleasant-unpleasant) and physiological reaction (high-low arousal). For example, happiness is described as a feeling of excitement (high arousal) combined with positive affect (high valence). A problem with this approach is that specifying the quality of a feeling only in terms of valence and arousal does not allow a very high degree of differentiation, especially in research on music, where one may expect a somewhat reduced range of both the unpleasantness and the arousal of the states produced by music [23].

Unlike the studies on music and emotions, the authors focused on expressive intentions described by sensorial adjectives [2], which are frequently used in music performance, that is, light, heavy, soft, hard, bright, dark, and others. Each of these descriptors has an opposite (soft versus hard) and provokes contrasting performances by the musician. Some performances played according to the different expressive intentions were evaluated in listening experiments. Factor analysis using the performances as variables found a two-dimensional space (Figure 1). The first factor is characterised by bright/light versus heavy performances, the second one by soft versus hard performances. Acoustical analysis of the performances (Table 1) showed that the first factor is closely correlated with Tempo and can be interpreted as the Kinetics (or Velocity) factor, while the second one is related to legato/staccato and Intensity and can be interpreted as the Energy factor. By legato/staccato we refer to how much the notes are detached and distinctly separated, computed as the ratio between the duration of a given note (i.e., the measure of time between note onset and note offset) and the interonset interval (i.e., the measure of time between two consecutive note onsets) which occurs between its subsequent notes.

Table 1: Correlation between coordinate axes and acoustic parameters.

                      Tempo    Legato    Intensity
Dim 1 (kinetics)       0.65    -0.28     -0.25
Dim 2 (energy)         0.33    -0.72      0.73
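As a rough, hypothetical illustration of how the correlations in Table 1 can be read (this is not the authors' factor-analysis model, and the gain and parameter encoding are invented), the sketch below nudges the three acoustic parameters in the directions suggested by the table when a performance is moved along the Kinetics or Energy axis:

LOADINGS = {            #  Tempo   Legato  Intensity  (signs and values taken from Table 1)
    "kinetics": (0.65, -0.28, -0.25),
    "energy":   (0.33, -0.72,  0.73),
}

def adjust(params, kinetics, energy, gain=0.5):
    """params: dict with 'tempo', 'legato', 'intensity' normalised around 1.0.
    kinetics, energy: coordinates in [-1, 1] of the desired expressive intention."""
    deltas = [kinetics * k + energy * e
              for k, e in zip(LOADINGS["kinetics"], LOADINGS["energy"])]
    names = ("tempo", "legato", "intensity")
    return {name: params[name] * (1.0 + gain * d) for name, d in zip(names, deltas)}

print(adjust({"tempo": 1.0, "legato": 1.0, "intensity": 1.0}, kinetics=0.8, energy=-0.4))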

We can use this interpretation of the Kinetics-Energy space as an indication of how listeners organised the performances in their own minds when focusing on sensory aspects. The robustness of this representation was confirmed by synthesising different and varying expressive intentions in a musical performance. We can notice that this representation is at an abstraction level which lies between the semantic one (such as emotions) and the physical one (such as timing deviations) and can thus be more effective in representing the listener's evaluation criteria [24].

3. Computer Systems for Expressive Music Performance

While the models described in the previous section were mainly developed for analysis and understanding purposes, they are also often used for synthesis purposes. Starting from models of musical expression, several software systems for rendering expressive musical performances were developed (see [25] for a review).

The systems for automatic expressive performance can be grouped into three general categories: (i) autonomous, (ii) feedforward, and (iii) feedback systems. Examples for each category are presented in the following.

Given a score, the purpose of all these systems is to calculate the so-called expressive deviations to be applied to each note, in order to change the duration and the intensity of the musical events notated in the traditional music score. A performance without expressive deviations is indicated by the term "nominal performance".

(1) Autonomous Systems. An Autonomous System (AS) is a collection of rules obtained automatically, in which the musical score is processed without user participation and then played in an automatic way.

As an example, the YQX system [26] uses a rule set obtained by means of machine learning algorithms developed in the research field of Artificial Intelligence. Widmer [27] developed an algorithm for the automatic extraction of rules, specifically designed to analyse musical performances. By applying the algorithm to a collection of performances of Mozart's piano sonatas, played using a Bösendorfer 290 SE grand piano, a rule set has been extracted that suggests some expressive deviations in the rhythmic and melodic structures. For example, the staccato rule requires that, if two successive notes have the same pitch but the second has a longer temporal duration, then the first note should be performed staccato; and the delay-next rule states that, if two notes of the same duration are followed by a longer one, then the last note should be played with a slight delay. As mentioned before, these rules were obtained from the training sets provided; in this sense, the rules can change if the system is trained on a different piano repertoire or with music performed by different pianists.

(2) Feedforward Systems. The systems characterised by a feedforward behaviour allow the user to annotate the score and to set the specific parameters of the system depending on the music score to be played, but they do not allow a real-time control during the performance.

A historical example is Director Musices, a rule-based system for automatic performance developed by the Kungliga Tekniska Högskolan (KTH) in Stockholm, born in the 1980s [28] with the aim of improving the synthesis quality of computer systems for the singing voice. Starting from the work on musical structure analysis by Ipolyi [29], and in collaboration with the violinist Lars Frydén, Johan Sundberg formalised the set of relations between the musical structures and the expressive deviations. The method used to formalise most rules is called "analysis by synthesis" [30]. As a first step, a model based on experience and intuition is developed, and it is subsequently applied to different musical examples. Performances generated by this model are evaluated by listening, to identify additional changes to the model. The process is reiterated several times, until a stable definition of the model is obtained. At this point, perceptual tests on a larger population are carried out in order to assess the rule system. The Director Musices input is a score in MIDI format; applying all the rules (or a subset of them), it computes the expressive deviations for every single note, saving the results to a new MIDI file. The user can choose which rules to apply and how to weight them through numerical coefficients. The deviations calculated by each rule are added together, giving rise to complex profiles, as shown in Figure 2.
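The additive combination of rule profiles can be sketched as follows (a hypothetical illustration in the spirit of Figure 2, not the actual Director Musices code; the per-rule deviation values are made up):

def combine_rules(rule_profiles, weights):
    """rule_profiles: {rule_name: [dev_note_1, dev_note_2, ...]} (percent IOI deviation per note).
    weights: {rule_name: k coefficient}. Returns the summed per-note deviation profile."""
    n_notes = len(next(iter(rule_profiles.values())))
    total = [0.0] * n_notes
    for name, profile in rule_profiles.items():
        k = weights.get(name, 1.0)
        total = [t + k * d for t, d in zip(total, profile)]
    return total

profiles = {
    "phrase_arch":       [5, 2, 0, -2, -5, -2, 0, 2],
    "duration_contrast": [0, 3, -3, 0, 0, 3, -3, 0],
    "melodic_charge":    [1, 1, 4, 1, 1, 4, 1, 1],
}
print(combine_rules(profiles, weights={"phrase_arch": 1.0,
                                       "duration_contrast": 0.5,
                                       "melodic_charge": 2.0}))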

Shunji [31] is an algorithm for automatic performance that uses the case-based reasoning paradigm, the well-known artificial intelligence technique that learns from examples, alternative to the one used by the software YQX. Unlike the latter, however, Shunji is a supervised system: it does not provide a solution to the proposed problem in an autonomous way, but interacts with the user, proposing possible solutions and receiving controls to improve the result. Shunji is based on a hierarchical segmentation algorithm [32] inspired by the Generative Theory of Tonal Music by Lerdahl and Jackendoff [33]. The performances provided as examples to the algorithm are segmented using the hierarchical model, and the expressive deviations of each segment are stored in a database. When the system input is a score that had not been processed before, the software segments the score and then, for each segment, searches the database for the most similar segment in order to extract the performance characteristics. Then it applies the expressive deviations thus obtained to the new score. The user can interact with the software by controlling the segmentation of the score, by providing new examples, by changing the parameters used to define the similarity between two musical phrases, and by choosing the examples from which to extract the performance characteristics among those that the system has considered to be more similar. This approach also works using a reduced number of examples, if they are well chosen.

Figure 2: Example of the application of the KTH rule system (adapted from [30]). The figure shows the resulting interonset interval (IOI, i.e., the time from one tone's onset to that of the next) deviations obtained by applying four rules (phrase arch at phrase and subphrase level, duration contrast, melodic charge, and punctuation) to the Swedish nursery tune Ekorrn satt i granen.

(3) Feedback Systems. The systems described above work independently or require user intervention to define some performance details before replay. On the contrary, in the feedback systems the user can control the performance in real time during replay. As the orchestral conductor controls the musicians' performance, the user can modify the expressive deviations in an interactive way.

In VirtualPhilharmony [34], the user is given a score. Before starting the performance, and with the help of some tools, (s)he has to choose an expressive model, that is, a set of expressive deviations related to the articulation and agogic aspects of each single note. During the performance, a sensor measures the arm movements of the user, who mimics the gestures of an orchestral conductor with a wand in his/her hand. The system then modifies the temporal and dynamic characteristics of the performance in a consistent way, following the user's intentions and taking into account particular aspects of the conductor-musician interaction, including the latency time between the conductor's gesture and the musician's execution and the prediction of the next beat, which allows the musician to maintain the correct tempo between a conductor's gesture and the next gesture.

Figure 3: System architecture. CaRo 2.0 receives as input an annotated score in MusicXML format and generates in real time the messages to play a MIDI controlled piano (the score is annotated off-line in an editor; the real-time engine comprises the Naturalizer and Expressivizer modules, driven by the user's expressive intention).

The differences between the systems described above include both the algorithms for computing the expressive deviations and the aspects related to the user's interaction. In the autonomous systems, the user can interact with the system only in the selection of the training set and, as a consequence, of the performance style that the system will learn. The feedforward systems allow a deeper interaction with the model parameters: the user can set the parameters, listen to the results, and then fine-tune the parameters again, until the results are satisfying. The feedback systems allow a real-time control of the parameters: the user can change the parameters of the performance while (s)he is listening to it, in a similar way as a human musician does. Since the models for music performance usually have numerous parameters, a crucial aspect of the feedback systems is how to allow the user to simultaneously control all these parameters in real time. VirtualPhilharmony allows the user to control in real time only two parameters (intensity and tempo); the other ones are defined offline by means of a so-called performance template.

The system developed by the authors and described in the next sections is also a feedback system. As explained later, the real-time control of the parameters is allowed by means of a control space based on a semantic description of different expressive intentions. Starting from the trajectories drawn by the user on this control space, the system maps concepts such as emotions or sensations into the low-level parameters of the model.

4. The CaRo 2.0 System

4.1. Architecture. CaRo 2.0 simulates the tasks of a pianist who reads a musical score, decodes its symbols, plans the expressive choices, and finally executes the actions in order to actually play the instrument. Usually pianists analyse the score very carefully before the performance, and they add annotations and cues to the musical sheet in order to emphasise rhythmic-melodic structures, section subdivisions, and other interpretative indications. As shown in Figure 3, CaRo 2.0 receives a musical score as input, compliant with the MusicXML standard [35]. Any music editor able to export in MusicXML format (e.g., Finale (http://www.finalemusic.com) or MuseScore (http://musescore.org)) can be used to write the score and add annotations. CaRo 2.0 is able to decode and process a set of expressive cues, as specified in the next section. The slurs can be hierarchically structured to individuate sections, motifs, and short musical cells. Moreover, textual labels can be added to specify the interpretative choices.

The specificity of CaRo 2.0, in comparison to the systems presented in Section 3, is that it is based on a model that explicitly distinguishes the expressive deviations related to the musical structure from those related to other musical aspects, such as the performer's expressive intentions. The computation is done in two steps. The first module, called Naturalizer, computes the deviations starting from the expressive cues annotated in the score. As the cues usually indicate particular elements of the score, such as a musical climax or the end of a section, the deviations have shapes that reflect the structure of the score and the succession of motifs and musical cells (see, e.g., Figure 12). The deviations computed by the Naturalizer correspond to a performance that is musically correct but without a particular expressive emphasis. The second module, called Expressivizer, modifies the deviations computed in the previous step in order to characterize the performance according to the user's expressive intentions.

4.2. Rendering. The graphical interface of CaRo 2.0 (see Figure 11) consists of a main window hosting the Kinetics-Energy space (see Section 2.3), used for the interactive control of the expressive intentions, and of a main menu, through which the user can load an annotated musical score in MusicXML format and launch the execution of the score. When the user loads the MusicXML score, the file is parsed and three data structures are generated: note events, expressive cues, and hierarchical structure. At this level, the musical score is a collection of separate events EV[ID], with the event index ID = 1, ..., N. For instance, a score can be described as a collection of notes and rests. Each event is characterised by a time reference and a list of attributes (such as pitch and duration).

The first structure is a list of note events, each one described by the following fields: ID number, onset (ticks), duration (ticks), bar position (ticks), bar length (beats), voice number, grace (boolean), and pitch, where ticks and beats are common unit measures for representing musical duration (for the complete detailed MIDI specification, see http://midi.org/techspecs/midispec.php). The last field is a pointer to a list of pitches, each one represented by a string of three characters following the schema XYZ, where X is the name (from A to G) of the note, Y is the octave (from 0 to 8), and Z is the alteration (-1 stands for flat ♭ and +1 for sharp ♯). A chord (more tones played simultaneously) is represented by a list of pitches with more than one entry. An empty list indicates a music rest.

Table 2: Correspondence among graphical expressive cues, XML representation, and rendering parameters.

Graphical symbol    MusicXML code                                                                                  Event value
∙ (staccato)        <articulations><staccato default-x="3" default-y="13" placement="above"/></articulations>     DR[n]
> (accent)          <articulations><accent default-x="-1" default-y="13" placement="above"/></articulations>      KV[n]
- (tenuto)          <articulations><tenuto default-x="1" default-y="14" placement="above"/></articulations>       DR[n]
breath mark         <articulations><breath-mark default-x="16" default-y="18" placement="above"/></articulations> O[n]
slur                <notations><slur number="1" placement="above" type="start"/></notations>                      O[n], DR[n], KV[n]
pedal               <direction-type><pedal default-y="-91" line="no" type="start"/></direction-type>              KV[n]

The second structure is a list of the expressive cues specified in the score. Each entry is defined by: type, ID of the linked event, and voice number. Among the many different ways to represent expressive cues in a score, the system is currently able to recognise and render the expressive cues listed in Table 2.

The third data structure describes the hierarchical structure of the piece, that is, its subdivision, top to bottom, in periods, phrases, subphrases, and motifs. Each section is specified by an entry composed of begin (ticks), end (ticks), and hierarchical level number.
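A minimal sketch of the three data structures, assuming field names that follow the description above (the class names and the example values are hypothetical, not taken from the CaRo source code):

from dataclasses import dataclass, field
from typing import List

@dataclass
class NoteEvent:                 # first structure: one entry per note/rest event
    id: int
    onset: int                   # ticks
    duration: int                # ticks
    bar_position: int            # ticks
    bar_length: int              # beats
    voice: int
    grace: bool = False
    pitches: List[str] = field(default_factory=list)  # XYZ strings; empty list = rest

@dataclass
class ExpressiveCue:             # second structure: staccato, accent, tenuto, slur, pedal, ...
    type: str
    event_id: int                # ID of the linked note event
    voice: int

@dataclass
class Section:                   # third structure: hierarchical segmentation of the piece
    begin: int                   # ticks
    end: int                     # ticks
    level: int                   # hierarchical level (period, phrase, subphrase, motif)

score = ([NoteEvent(1, 0, 480, 0, 4, 1, pitches=["C40"])],  # "C40": C, octave 4, no alteration
         [ExpressiveCue("staccato", 1, 1)],
         [Section(0, 1920, 1)])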

For clarity reasons, it is useful to express the duration of the musical events in seconds instead of metric units such as ticks and beats. This conversion is possible taking the tempo marking written in the score, if any, or normalising the total score length to the performance duration. This representation is called nominal time t_nom; the event onset is called nominal onset time O_nom[n], the event length is called nominal duration DR_nom[n], and the distance between two adjacent events is called nominal interonset interval:

    IOI_nom[n] = O_nom[n + 1] - O_nom[n].   (1)
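The conversion from ticks to nominal time and the interonset intervals of equation (1) can be sketched as follows (illustrative values; the resolution and metronome marking are assumptions):

def nominal_onsets(onsets_ticks, ticks_per_beat, bpm):
    # one tick lasts 60 / (bpm * ticks_per_beat) seconds
    seconds_per_tick = 60.0 / (bpm * ticks_per_beat)
    return [t * seconds_per_tick for t in onsets_ticks]

def nominal_iois(onsets_seconds):
    # IOI_nom[n] = O_nom[n+1] - O_nom[n], as in equation (1)
    return [b - a for a, b in zip(onsets_seconds, onsets_seconds[1:])]

o_nom = nominal_onsets([0, 480, 720, 960], ticks_per_beat=480, bpm=120)
print(o_nom)                # [0.0, 0.5, 0.75, 1.0]
print(nominal_iois(o_nom))  # [0.5, 0.25, 0.25]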

When the user selects the command to start the execution of the score, a timer is instantiated to have a temporal reference for the processing of the musical events. Let t_p be the time passed from the beginning of the performance. Since the system is interactive and the results depend on the user's input at the time t_p, the expressive deviations (and therefore the values of each note event) have to be computed just in time, before they are sent to the MIDI interface. The deviations are calculated by the PlayExpressive() function, which is periodically called by the timer with a time interval T. This interval is a critical parameter of the system, and its value depends on a trade-off between two conditions. First, the time interval must be large enough to allow the processing and playing of the note events, that is, T > t_C + t_M, where t_C is the time for calculating the expressive deviations, generally negligible, and t_M is the time to transmit the note events by means of the MIDI protocol, a serial protocol with a transfer rate of 31250 bps. Since a note event is generally coded with three bytes (and each byte, including the start and stop bits, is composed of 10 bits), the time to transfer a note event is about 1 ms. Therefore, assuming that a piano score can have at most 10 simultaneous notes, T should be greater than 10 ms. The second condition is that T should be small enough to give the user the feeling of real-time control of the expressive intentions, that is, T < t_E, where t_E is the time the user needs to recognise that the expressive intention has changed. Estimating t_E is not a trivial task, because many aspects have to be considered, such as the cognitive processes and the cultural and musical context. However, experimental evidence suggests that certain affective nuances can be recognised also in musical excerpts with a length of 250 ms [36]. A preliminary test showed that a period T = 100 ms guarantees the on-time processing of the musical events and has been judged sufficient to satisfy the real-time requirement in this context. Figure 4 shows how the PlayExpressive() function works.
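A minimal sketch of the timer-driven scheduling described above (hypothetical code, not the original implementation): a PlayExpressive-style callback is invoked every T = 100 ms, a period chosen so that T > t_C + t_M (about 10 ms for a ten-note chord over MIDI) and T < t_E.

import time

T = 0.100  # seconds

def play_expressive(t_p):
    # placeholder: compute and send the events with onset in (t_p, t_p + T]
    print(f"processing events in ({t_p:.2f}, {t_p + T:.2f}] s")

def run(duration=1.0):
    start = time.monotonic()
    next_call = start
    while next_call - start < duration:
        play_expressive(next_call - start)
        next_call += T
        time.sleep(max(0.0, next_call - time.monotonic()))

run()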

Figure 4: Schema of the PlayExpressive() function (the original diagram shows the Naturalizer and Expressivizer blocks, with steps such as selecting the notes to be played, reading the expressive cues, instantiating new modifiers, applying synchronization, voice-leading, constraint, and deviation rules, reading the user input, processing the expressive intention, and sending the notes to the player queue).

(1) Naturalizer. The Naturalizer computes the deviations that are applied to the nominal parameters to obtain a natural performance. Each time the PlayExpressive() function is called by the timer, the note events list, which contains all the notes to be played, is searched for the events with an onset time O_nom[n] such that t_p < O_nom[n] <= t_p + T. If no event is found, the function ends; otherwise the expressive cues list is searched for cues linked to the notes to be played. For each expressive cue that is found, a modifier is instantiated. The modifier is a data structure that contains the deviations to be applied to the expressive parameters due to the expressive cue written in the score. The parameters influenced by the different expressive cues are listed in Table 2. Depending on the type of cue, the modifier can work on a single event only, identified by its ID number, or on a sequence of events, identified by a temporal interval (the modifier's scope) and a voice number. The expressive deviations can be additive or proportional, as a function of the type of cue and of the expressive parameters. The deviations are applied by the successive module, named deviation rules. This module combines the deviations defined by all the modifiers within a temporal scope between t_p and t_p + T, following a set of rules. The output is, for each note, a set of expressive parameters which define how the note is played. Although CaRo 2.0 can control several different musical instruments (e.g., violin, flute, and clarinet) with a variable number of parameters, this paper is focused on piano performances, and the parameters computed by the rules system are IOI_nat, DR_nat, and KV_nat, where the subscript nat identifies the parameters of the natural performance and KV is the Key-Velocity, a parameter of the MIDI protocol related to the sound intensity.
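The following sketch illustrates the Naturalizer step under stated assumptions: the data layout follows the description above, while the actual deviation amounts (how much a staccato shortens a note, how much an accent raises the Key-Velocity) are invented placeholders, since Table 2 only specifies which parameters each cue influences.

T = 0.100

def naturalize(note_events, cues, t_p):
    # select the events with onset in (t_p, t_p + T]
    to_play = [ev for ev in note_events if t_p < ev["onset_nom"] <= t_p + T]
    rendered = []
    for ev in to_play:
        ioi, dr, kv = ev["ioi_nom"], ev["dr_nom"], ev["kv_nom"]
        for cue in cues:
            if cue["event_id"] != ev["id"]:
                continue
            # illustrative modifiers: staccato shortens the duration (proportional),
            # an accent raises the Key-Velocity (additive)
            if cue["type"] == "staccato":
                dr *= 0.5
            elif cue["type"] == "accent":
                kv = min(127, kv + 12)
        rendered.append({"id": ev["id"], "ioi_nat": ioi, "dr_nat": dr, "kv_nat": kv})
    return rendered

events = [{"id": 1, "onset_nom": 0.05, "ioi_nom": 0.5, "dr_nom": 0.45, "kv_nom": 64}]
print(naturalize(events, [{"event_id": 1, "type": "staccato"}], t_p=0.0))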

(2) Expressivizer. The next step of the rendering algorithm is called Expressivizer, and its aim is to modify the expressive deviations computed by the Naturalizer in order to render the user's different expressive intentions. It is based on the multilayer model shown in Figure 5: the user specifies a point or a trajectory in an abstract control space; the coordinates of the control space are mapped on a set of expressive features; finally, the features are used to compute the deviations that must be applied to the score for conveying a particular expressive intention. As an example, Figure 7 shows a mapping surface between points of the control space and the Intensity feature. Based on the movements of the pointer on the x-y plane, the values of the Intensity feature are thus computed. The user can select a single expressive intention, and the system will play the entire score with that nuance. Trajectories can also be drawn in the space to experience the dynamic aspects of the expressive content.

Figure 5: Multilayer model. The semantic control spaces (Kinetics-Energy space, Valence-Arousal space) are mapped onto features such as Tempo, Intensity, and Articulation (with the corresponding deviations), which in turn determine the physical events O[n], DR[n], KV[n].

The abstract control space S is represented as a set of N triples S_i = (l_i, p_i, e_i), with i = 1, ..., N, where l_i is a verbal label that semantically identifies an expressive intention (e.g., bright, light, and soft), p_i ∈ R^2 represents the coordinates in the control space of the expressive intention l_i, and e_i ∈ R^k is a vector of features which characterise the expressive intention l_i.

S defines the values of the expressive features for the N points of coordinates p_i. To have a continuous control space, in which each point of the control space is associated with a feature vector, a mapping function f : R^2 -> R^k is defined as

    f(x) = e_i                        if x = p_i,
    f(x) = sum_{i=1}^{N} a_i · e_i    elsewhere,   (2)

with

    1 / a_i = ||x - p_i||^2 · sum_{j=1}^{N} 1 / ||x - p_j||^2.   (3)

Let x be a point in the control space; the mapping function calculates the corresponding features vector y = f(x) as a linear combination of the N features vectors, one for each expressive intention defined in S. The features vectors are weighted inversely proportionally to the square of the Euclidean distance between their corresponding expressive intention and x. Finally, the resulting features vector can be used to compute the expressive deviations, represented by the physical layer in Figure 5. For example, the Tempo feature is used to compute the parameters of the n-th physical event from the output of the Naturalizer module, according to the following equations:

    O_exp[n] = O_nat[n - 1] + (O_nat[n] - O_nat[n - 1]) · Tempo,
    DR_exp[n] = DR_nat[n] · Tempo,   (4)

where O_exp and O_nat are the onset times of the expressive and natural performance, respectively, and DR_exp and DR_nat are the event durations.

Figure 6: The options window (designed for musicians) that allows the fine tuning of the model. For the general public, it is possible to load preset configurations (see the button "load parameters from file").

Figure 7: Map between the x-y coordinates of the control space and the Intensity feature (the surface spans the positions of the expressive intentions Sad, Natural, Happy, and Angry).
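A sketch of this mapping is given below: map_features implements the inverse-square-distance weighting of equations (2) and (3), and apply_tempo implements equation (4). The function names, the feature values, and the example control space are hypothetical illustrations, not the values shipped with CaRo 2.0.

import math

def map_features(x, points):
    """x: (x, y) pointer position; points: list of (p_i, e_i), e_i being a feature dict.
    All e_i are assumed to share the same feature keys."""
    for p, e in points:
        if p == x:                      # exact hit: first case of equation (2)
            return dict(e)
    inv_sq = [1.0 / math.dist(x, p) ** 2 for p, _ in points]   # 1 / ||x - p_i||^2
    norm = sum(inv_sq)                                          # equation (3) normalisation
    keys = points[0][1].keys()
    return {k: sum(w / norm * e[k] for w, (_, e) in zip(inv_sq, points)) for k in keys}

def apply_tempo(o_nat, dr_nat, tempo):
    """Equation (4); the first onset is kept as the reference."""
    o_exp = [o_nat[0]]
    for n in range(1, len(o_nat)):
        o_exp.append(o_nat[n - 1] + (o_nat[n] - o_nat[n - 1]) * tempo)
    return o_exp, [d * tempo for d in dr_nat]

space = [((-1.0, 1.0), {"Tempo": 1.2, "Intensity": 0.7}),   # e.g. a "soft" intention
         (( 1.0, 1.0), {"Tempo": 0.8, "Intensity": 1.3})]   # e.g. a "hard" intention
features = map_features((0.2, 0.9), space)
print(features)
print(apply_tempo([0.0, 0.5, 1.0], [0.4, 0.4, 0.4], features["Tempo"]))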

4.3. User-Defined Control Space for Interaction. The abstract control space constitutes the interface between the user's concepts of expression and their internal representation in the system. The user, following his/her own preferences, can create the semantic space which controls the expressive intention of the performance. Each user can select his/her own expressive ideas and then design the mapping of these concepts to positions and movements on this space.

Figure 8: A control space designed for an educational context. The position of the emoticons follows the results of [37].

Figure 9: The control space used to change interactively the sound comment of a multimedia product (in this case, a cartoon from the movie "The Good, the Bad and the Ugly" by Sergio Leone). The author can associate different expressive intentions with the characters (here, see the blue labels 1, 2, and 3).

A preferences window (see Figure 6) allows the user to define the set of performance styles by specifying, for the i-th style, a verbal semantic label l_i, the coordinates in the control space p_i, and the features vector e_i.

As a particular and significant case, the emotional and sensory control metaphors can be used. For example, a musician may find it more convenient to use the Kinetics-Energy space (see Figure 1), because it is based on terms such as bright, dark, or soft that are commonly used to describe musical nuances. On the contrary, a nonmusician or a child may prefer the Valence-Arousal space, emotions being a very common experience for every listener. The sensory space can be defined by placing the semantic labels (with their associated features) Hard, Soft, Dark, and Bright in four orthogonal points of the space. Similarly, the Valence-Arousal space can be obtained by placing the emotional labels Happy, Calm, Sad, and Angry at the four corners of the control space. These spaces are implemented as presets of the system, which can be selected by the user (see, e.g., Figure 8).
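Such presets can be expressed directly as the triples S_i = (l_i, p_i, e_i) introduced above; a sketch follows, in which the feature values are invented placeholders, not the ones actually shipped with CaRo 2.0.

# Sensory preset: Hard/Soft and Bright/Dark on orthogonal points of the space.
KINETICS_ENERGY_PRESET = [
    ("hard",   ( 1.0,  0.0), {"Tempo": 0.95, "Intensity": 1.30, "Articulation": 0.60}),
    ("soft",   (-1.0,  0.0), {"Tempo": 1.05, "Intensity": 0.70, "Articulation": 1.00}),
    ("bright", ( 0.0,  1.0), {"Tempo": 0.85, "Intensity": 1.10, "Articulation": 0.75}),
    ("dark",   ( 0.0, -1.0), {"Tempo": 1.15, "Intensity": 0.85, "Articulation": 0.95}),
]

# Emotional preset: Happy, Angry, Sad, Calm at the four corners (valence x, arousal y).
VALENCE_AROUSAL_PRESET = [
    ("happy", ( 1.0,  1.0), {"Tempo": 0.85, "Intensity": 1.20, "Articulation": 0.70}),
    ("angry", (-1.0,  1.0), {"Tempo": 0.90, "Intensity": 1.35, "Articulation": 0.65}),
    ("sad",   (-1.0, -1.0), {"Tempo": 1.20, "Intensity": 0.70, "Articulation": 1.05}),
    ("calm",  ( 1.0, -1.0), {"Tempo": 1.10, "Intensity": 0.80, "Articulation": 1.00}),
]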

Specific and new expressive intentions can be associated with objects/avatars which are placed on the screen, and a movement of the pointer from one object to another changes continuously the resulting music expression. For example, a different type of music can be associated with the characters represented in the image. Also, the object, with its expressive label and parameters, can be moved, and then the control space can dynamically vary (see, e.g., Figure 9).

Figure 10: The music excerpt used to evaluate the CaRo 2.0 software: Ludwig van Beethoven (1770-1827), Piano Sonata No. 16, opus 31 No. 1, 3rd movement (Rondo, Allegretto). The score has been transcribed in MusicXML format, including all the expressive indications (slurs, articulation marks, and dynamic cues).

Figure 11: The graphical user interface of CaRo 2.0. The dotted line shows the trajectory drawn for rendering the hbls performance.

More generally, the system allows a creative design of abstract spaces according to the artist's fantasies and needs, by inventing new expressions and their spatial relations.

5. Results

This section reports the results of some expressive music renderings obtained with CaRo 2.0, with the aim of showing how the system works. The beginning of the 3rd movement of the Piano Sonata No. 16 by L. van Beethoven (see Figure 10) was chosen as a case study, because it is characterised by a large number of expressive nuances and, at the same time, it is written with a quite simple musical texture (four voices at most), which makes it easier to visualise and understand the results. Seven performances of the same Beethoven Sonata were rendered: five performances are characterised by only one expressive intention (bright, hard, heavy, light, and soft) and were obtained keeping the mouse fixed on the corresponding adjective (see the GUI of Figure 11); one performance, named hbls, has been obtained drawing a trajectory on the two-dimensional control space, going from hard to bright, light, and soft (see the dotted line of Figure 11); finally, a nominal performance was rendered applying no expressive rule or transformation to the music score.

The system rendered the different performances correctly, playing the sequences of notes written in the score. Moreover, expressive deviations were computed depending on the user's preferences, specified by means of the control space. Figure 12 shows the piano roll representation of the nominal performance compared to the performances characterised by the expressive intentions heavy and light. It can be noted that the onset, duration, and Key-Velocity of the notes change as a function of the expressive intention. (A piano-roll representation is a method of representing a musical stimulus or score for later computational analyses. It can be conceived as a two-dimensional graph: the vertical axis is a digital representation of the different notes; the horizontal axis is a continuous representation of time. When a note is played, a horizontal line is drawn on the piano-roll representation. The height of this line represents which note was being played, the beginning of the line represents the note's onset, the length of the line represents the note's duration, and the end of the line represents the note's offset [38].) The analysis of the main expressive parameters (see Figures 12 and 13) shows that these values match the corresponding parameters of real performances analyzed in [39]. Moreover, Figure 14 shows how the expressive parameters change gradually, following the user's movements between different expressive intentions.

Figure 12: Piano roll representation (pitch and Key-Velocity versus time) of the performances characterized by the heavy (b) and light (c) expressive intentions. The nominal performance (a) has been added as a reference.
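A minimal sketch of such a piano-roll plot, in the sense described above (illustrative data; it assumes matplotlib is available and is not the plotting code used for Figure 12):

import matplotlib.pyplot as plt

notes = [  # (MIDI pitch, onset in seconds, duration in seconds)
    (60, 0.0, 0.45), (64, 0.5, 0.45), (67, 1.0, 0.90), (72, 2.0, 0.40),
]

fig, ax = plt.subplots()
for pitch, onset, duration in notes:
    # each note is a horizontal line: height = pitch, start = onset, length = duration
    ax.hlines(pitch, onset, onset + duration, linewidth=4)
ax.set_xlabel("Time (s)")
ax.set_ylabel("Pitch (MIDI note number)")
plt.show()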

6. Assessment

The most important public forum worldwide in which computer systems for expressive music performance are assessed is the Performance RENdering CONtest (Rencon), which was initiated in 2002 by Haruhiro Katayose and colleagues as a competition among different systems (http://renconmusic.org/). During the years, the Rencon contest evolved toward a more structured format. It is an annual international competition in which entrants present computer systems they have developed for generating expressive musical performances, which audience members and organisers judge. One of the goals of Rencon is to solve a common problem of researchers, the difficulty in evaluating/grading the performances generated by their own computer systems within their individual endeavours, by having entrants meet together at the same site [40]. An assessment (from the musical performance expressiveness point of view) of CaRo 2.0 was carried out by participating in Rencon 2011. Our system was classified for the final stage, where the participating systems were presented during a dedicated workshop. The performances played were openly evaluated by a large (77 participants) and qualified audience of musicians and scientists, under the same conditions, in the manner of a competition. Each of the five entrant systems was required to generate two different performances of the same music piece, in order to evaluate the adaptability of the system. After listening to each performance, the listeners were asked to rate it from the viewpoint of "how much do you give applause to the performance" on a 10-point scale (1 = nothing, 10 = ovation). The CaRo 2.0 system won that final stage (see Table 3). The results, as well as the comments by the experts, are analysed in [41]. The performances played during Rencon 2011 can be listened to at http://smc.dei.unipd.it/advances_hci.

Table 3: Results of the final stage of the Rencon 2011 contest. Each system was required to render two different performances of the same music piece. Each performance has been evaluated both by the audience attending the contest live and by people connected online through a streaming service.

System                    Performance A                Performance B                Total score
                          Live    Online    Total      Live    Online    Total
CaRo 2.0                  464     56        520        484     61        545        1065
YQX                       484     64        548        432     65        497        1045
VirtualPhilharmony        452     46        498        428     32        460        958
Director Musices          339     33        372        371     13        384        756
Shunji System             277     24        301        277     20        297        598

Figure 13: The Key-Velocity values of the performances characterised by the different expressive intentions (bright, hard, heavy, light, and soft). Only the melodic line is reported, in order to have a more readable graph.

7. Conclusion

CaRo 2.0 could be integrated in the browser engines of music community services. In this way, (i) the databases keep track of the expressiveness of the songs played; (ii) the users create/browse playlists based not only on the song title or the artist name, but also on the music expressiveness. This allows that (i) in rhythm games, the performance will be rated on the basis of the expressive intentions of the user, with advantages for the educational value and the user's involvement in the game; (ii) in the music community, the user will be able to search for happy or sad music according to his/her affective state or preferences.

Figure 14: The values of the three main expressive parameters (normalised log IOI, log L, and log KV) of the performance obtained drawing the trajectory of Figure 11.

In rhythm games, despite their big commercial success (often bigger than that of the original music albums), the gameplay is almost entirely oriented around the player's interactions with a musical score or individual songs, by means of pressing specific buttons or activating controls on a specialized game controller. In these games, the reaction of the virtual audience (i.e., the score rated) and the result of the battle/cooperative mode are based on the performance of the player as judged by the PC. Up to now, the game designers did not consider the player's expressiveness. As future work, we intend to insert the CaRo system in these games, so that the music performance is rated "perfect" not only if it is played exactly like the score, but considering also its interpretation, its expressiveness. In this way, these games can be used profitably also in the educational field.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] N. Bernardini and G. De Poli, "The sound and music computing field: present and future," Journal of New Music Research, vol. 36, no. 3, pp. 143–148, 2007.
[2] S. Canazza, G. De Poli, A. Rodà, and A. Vidolin, "An abstract control space for communication of sensory expressive intentions in music performance," Journal of New Music Research, vol. 32, no. 3, pp. 281–294, 2003.
[3] S. Canazza, G. De Poli, A. Rodà, and A. Vidolin, "Expressiveness in music performance: analysis, models, mapping, encoding," in Structuring Music through Markup Language: Designs and Architectures, J. Steyn, Ed., pp. 156–186, IGI Global, 2012.
[4] S. Canazza, G. De Poli, C. Drioli, A. Rodà, and A. Vidolin, "Audio morphing different expressive intentions for multimedia systems," IEEE Multimedia, vol. 7, no. 3, pp. 79–83, 2000.
[5] D. Fabian, R. Timmers, and E. Schubert, Eds., Expressiveness in Music Performance: Empirical Approaches Across Styles and Cultures, Oxford University Press, Oxford, UK, 2014.
[6] G. De Poli, "Methodologies for expressiveness modelling of and for music performance," Journal of New Music Research, vol. 33, no. 3, pp. 189–202, 2004.
[7] G. Widmer and P. Zanon, "Automatic recognition of famous artists by machine," in Proceedings of the 16th European Conference on Artificial Intelligence (ECAI '04), pp. 1109–1110, Valencia, Spain, 2004.
[8] A. Gabrielsson, "Expressive intention and performance," in Music and the Mind Machine, R. Steinberg, Ed., pp. 35–47, Springer, Berlin, Germany, 1995.
[9] P. Juslin, "Emotional communication in music performance: a functionalist perspective and some data," Music Perception, vol. 14, no. 4, pp. 383–418, 1997.
[10] S. Canazza, G. De Poli, and A. Vidolin, "Perceptual analysis of the musical expressive intention in a clarinet performance," in Music, Gestalt, and Computing, M. Leman, Ed., pp. 441–450, Springer, Berlin, Germany, 1997.
[11] G. De Poli, A. Rodà, and A. Vidolin, "Note-by-note analysis of the influence of expressive intentions and musical structure in violin performance," Journal of New Music Research, vol. 27, no. 3, pp. 293–321, 1998.
[12] R. Bresin and A. Friberg, "Emotional coloring of computer-controlled music performances," Computer Music Journal, vol. 24, no. 4, pp. 44–63, 2000.
[13] F. B. Baraldi, G. De Poli, and A. Rodà, "Communicating expressive intentions with a single piano note," Journal of New Music Research, vol. 35, no. 3, pp. 197–210, 2006.
[14] S. Hashimoto, "KANSEI as the third target of information processing and related topics in Japan," in Proceedings of the International Workshop on Kansei-Technology of Emotion, pp. 101–104, 1997.
[15] S. Hashimoto, "Humanoid robots for kansei communication: computer must have body," in Machine Intelligence: Quo Vadis?, P. Sincak, J. Vascak, and K. Hirota, Eds., pp. 357–370, World Scientific, 2004.
[16] R. Picard, Affective Computing, MIT Press, Cambridge, Mass, USA, 1997.
[17] S. Canazza, G. De Poli, C. Drioli, A. Rodà, and A. Vidolin, "Modeling and control of expressiveness in music performance," Proceedings of the IEEE, vol. 92, no. 4, pp. 686–701, 2004.
[18] B. H. Repp, "Diversity and commonality in music performance: an analysis of timing microstructure in Schumann's 'Träumerei'," The Journal of the Acoustical Society of America, vol. 92, no. 5, pp. 2546–2568, 1992.
[19] B. H. Repp, "The aesthetic quality of a quantitatively average music performance: two preliminary experiments," Music Perception, vol. 14, no. 4, pp. 419–444, 1997.
[20] L. Mion, G. De Poli, and E. Rapana, "Perceptual organization of affective and sensorial expressive intentions in music performance," ACM Transactions on Applied Perception, vol. 7, no. 2, article 14, 2010.
[21] S. Dixon and W. Goebl, "The 'air worm': an interface for real-time manipulation of expressive music performance," in Proceedings of the International Computer Music Conference (ICMC '05), Barcelona, Spain, 2005.
[22] A. Rodà, S. Canazza, and G. De Poli, "Clustering affective qualities of classical music: beyond the valence-arousal plane," IEEE Transactions on Affective Computing, vol. 5, no. 4, pp. 364–376, 2014.
[23] P. N. Juslin and J. A. Sloboda, Music and Emotion: Theory and Research, Oxford University Press, 2001.
[24] A. Camurri, G. De Poli, M. Leman, and G. Volpe, "Communicating expressiveness and affect in multimodal interactive systems," IEEE Multimedia, vol. 12, no. 1, pp. 43–53, 2005.
[25] A. Kirke and E. R. Miranda, "A survey of computer systems for expressive music performance," ACM Computing Surveys, vol. 42, no. 1, article 3, pp. 1–41, 2009.
[26] G. Widmer, S. Flossmann, and M. Grachten, "YQX plays Chopin," AI Magazine, vol. 30, no. 3, pp. 35–48, 2009.
[27] G. Widmer, "Discovering simple rules in complex data: a meta-learning algorithm and some surprising musical discoveries," Artificial Intelligence, vol. 146, no. 2, pp. 129–148, 2003.
[28] J. Sundberg, A. Askenfelt, and L. Frydén, "Musical performance: a synthesis-by-rule approach," Computer Music Journal, vol. 7, no. 1, pp. 37–43, 1983.
[29] I. Ipolyi, "Innforing i musiksprakets opprinnelse och struktur," TMH-QPSR, vol. 48, no. 1, pp. 35–43, 1952.
[30] A. Friberg, R. Bresin, and J. Sundberg, "Overview of the KTH rule system for musical performance," Advances in Cognitive Psychology, Special Issue on Music Performance, vol. 2, no. 2-3, pp. 145–161, 2006.
[31] S. Tanaka, M. Hashida, and H. Katayose, "Shunji: a case-based performance rendering system attached importance to phrase expression," in Proceedings of the Sound and Music Computing Conference (SMC '11), F. Avanzini, Ed., pp. 1–2, 2011.
[32] M. Hamanaka, K. Hirata, and S. Tojo, "Implementing 'A Generative Theory of Tonal Music'," Journal of New Music Research, vol. 35, no. 4, pp. 249–277, 2006.
[33] F. Lerdahl and R. Jackendoff, A Generative Theory of Tonal Music, MIT Press, Cambridge, Mass, USA, 1983.
[34] T. Baba, M. Hashida, and H. Katayose, "A conducting system with heuristics of the conductor 'VirtualPhilharmony'," in Proceedings of the New Interfaces for Musical Expression Conference (NIME '10), pp. 263–270, Sydney, Australia, June 2010.
[35] M. Good, "MusicXML for notation and analysis," in The Virtual Score: Representation, Retrieval, Restoration, W. B. Hewlett and E. Selfridge-Field, Eds., pp. 113–124, MIT Press, Cambridge, Mass, USA, 2001.
[36] I. Peretz, L. Gagnon, and B. Bouchard, "Music and emotion: perceptual determinants, immediacy, and isolation after brain damage," Cognition, vol. 68, no. 2, pp. 111–141, 1998.
[37] R. Plutchik, Emotion: A Psychoevolutionary Synthesis, Harper & Row, New York, NY, USA, 1980.
[38] D. Temperley, The Cognition of Basic Musical Structures, MIT Press, Cambridge, Mass, USA, 2001.
[39] S. Canazza, G. De Poli, S. Rinaldin, and A. Vidolin, "Sonological analysis of clarinet expressivity," in Music, Gestalt, and Computing: Studies in Cognitive and Systematic Musicology, vol. 1317 of Lecture Notes in Computer Science, pp. 431–440, Springer, Berlin, Germany, 1997.
[40] H. Katayose, M. Hashida, G. De Poli, and K. Hirata, "On evaluating systems for generating expressive music performance: the Rencon experience," Journal of New Music Research, vol. 41, no. 4, pp. 299–310, 2012.
[41] G. De Poli, S. Canazza, A. Rodà, and E. Schubert, "The role of individual difference in judging expressiveness of computer assisted music performances by experts," ACM Transactions on Applied Perception, vol. 11, no. 4, article 22, 2014.


the composer's message, enlightening the musical structure, and to add his personal interpretation of the piece. A mechanical performance of a score is perceived as lacking musical meaning and is considered dull and inexpressive, like a text read without any prosodic inflection. Indeed, human performers never respect tempo, timing, and loudness notations in a mechanical way when they play a score: some deviations are always introduced, even if the performer explicitly wants to play mechanically [5].

Most studies on musical expression aim at understanding the systematic presence of deviations from the musical notation as a communication means between musician and listener. Deviations introduced by technical constraints (such as fingering) or by imperfect performer skill are not normally considered part of expression communication and are thus often filtered out as noise.

At a physical information level, the main parameters considered in the models of musical expression, called expressive parameters, are related to timing of musical events and tempo (fast or slow), dynamics (loudness variation), and articulation (the way successive notes are connected). These parameters are particularly relevant for keyboard instruments. Moreover, they are the basic parameters of the MIDI protocol (Musical Instrument Digital Interface; see http://midi.org/aboutmidi/tut_history.php) and thus are easily measurable on digital music instruments and can be used as control signals for rendering a music performance. In some instruments and in the singing voice, other acoustic parameters are taken into account, such as vibrato and micro-intonation, or pedalling at the piano. In contemporary music, timbre is often an essential expressive parameter; sometimes the virtual space location or movement of the sound source is also used as an expression feature.

The analysis of these systematic deviations has led to the formulation of several models that try to describe their structure, with the aim of explaining where, how, and why a performer modifies, sometimes unconsciously, what is indicated by the notation in the score. It should be noticed that, although deviations are only the external surface of something deeper and often not directly accessible, they are quite easily measurable and thus widely used to develop computational models for performance understanding and generation.

2.2. Expressive Intentions. In general, musical expression refers both to the means used by the performer to convey the composer's message and to his/her own contribution to enrich the musical message. Expressiveness related to the musical structure may depend on the dramatic narrative developed by the performer, on the stylistic expectation based on cultural norms (e.g., jazz versus classical music), and on the actual performance situation (e.g., audience engagement). Recently, more interest has been given to the expressive component due to the personal interpretation of the performer [6, 7].

Many studies (see, e.g., [8–13]) demonstrated that it is possible to communicate expressive content at an abstract level by changing the interpretation of a musical piece. In fact, the musician organises acoustical or perceptual changes in sound, communicating different feelings, images, or emotions to the listener. The same piece of music can be performed differently by the same musician (trying to convey a specific interpretation of the score or the situation) by adding mutable expressive intentions.

Notice that sometimes the expressive intentions the performer tries to convey can be in contrast with the character of the musical piece. A slightly broader interpretation of expression as kansei (a Japanese term indicating sensibility, feeling, and sensitivity) [14, 15] or affective communication [16] is proposed in some Japanese or American studies. We prefer the broader term expressive intention, which includes emotion, affect, and other sensorial and descriptive words or actions. Furthermore, this term evidences the explicit intent of the performer in communicating expression. Most music performances involve some intention from the performer's side regarding what the music should express to the listeners. Consequently, interpretation involves assigning some kind of meaning to the music.

When we talk of deviation, it is important to define which reference is used for computing the deviation. Very often the score is taken as reference, both for theoretical (the score represents the music structure) and practical (it is easily available) reasons. However, the choice depends on the problem one wants to focus on. When we studied how a performer plays a piece according to different expressive intentions, we found that a clearer interpretation and the best results in simulation are obtained by using a neutral performance as reference [17]. By "neutral" we mean a human performance without any specific expressive intention. In other studies, the mean performance (i.e., the mathematical mean across different performances by the same or many performers) was used as reference to investigate stylistic choices and preferences (e.g., [18, 19]).

2.3. Categorical versus Dimensional. Most people have an informal understanding of musical expression. While its importance is generally acknowledged, its basic constituents are less clear. Often the simple range expressive-inexpressive is used. Regarding the affective component of music expression, the two theoretical traditions that have most strongly determined past research in this area are the categorical (also called discrete) and the dimensional emotion theories.

The assumption of the categorical approach is that people experience emotions as categories that are distinct from each other. Theorists in this tradition propose the existence of a small number of basic or fundamental emotions that are characterised by very specific response patterns; from these basic emotions all other emotional states can be derived. The focus is on characteristics that distinguish emotions from one another. There is reasonable agreement on four basic emotions: happiness, sadness, anger, and fear; the next most common two are disgust and surprise. In this case, expression information is represented by labels indicating the expression category and possibly a number or adjective indicating the degree of that expression.

The focus of the dimensional approach is on the identification of emotions based on their placement on a continuous space with a small number of dimensions. This space is derived from similarity judgements analysed using factor analysis or multidimensional scaling. The dimensional approach provides a way of describing expression which is more tractable than using words, but which can be translated into and out of verbal descriptions. Translation is possible because emotion-related words can be understood, at least to a first approximation, as referring to positions in the dimensional space [20]. Moreover, this representation is useful for capturing the continuous change in expression during a piece of music [21].

Figure 1: Kinetics-Energy space as mid-level representation of expressive intentions (axes: Kinetics and Energy; labelled regions: Hard, Heavy, Dark, Bright, Light, Soft).

The most used representation in music research is the two-dimensional Valence-Arousal (V-A) space, even if other dimensions were explored in several studies (e.g., see [22]). The V-A space organises emotions in terms of affect appraisal (pleasant-unpleasant) and physiological reaction (high-low arousal). For example, happiness is described as a feeling of excitement (high arousal) combined with positive affect (high valence). A problem with this approach is that specifying the quality of a feeling only in terms of valence and arousal does not allow a very high degree of differentiation, especially in research on music, where one may expect a somewhat reduced range of both the unpleasantness and the arousal of the states produced by music [23].

Unlike the studies on music and emotions, the authors focused on expressive intentions described by sensorial adjectives [2] which are frequently used in music performance, that is, light, heavy, soft, hard, bright, dark, and others. Each of these descriptors has an opposite (e.g., soft versus hard) and provokes contrasting performances by the musician. Some performances played according to the different expressive intentions were evaluated in listening experiments. Factor analysis using the performances as variables found a two-dimensional space (Figure 1). The first factor is characterised by bright/light versus heavy performances, the second one by soft versus hard performances. Acoustical analysis of the performances (Table 1) showed that the first factor is closely correlated with Tempo and can be interpreted as the Kinetics (or Velocity) factor, while the second one is related to legato/staccato and Intensity and can be interpreted as the Energy factor. By legato/staccato we refer to how much the notes are detached and distinctly separated, measured as the ratio between the duration of a given note (i.e., the time between note onset and note offset) and the interonset interval (i.e., the time between two consecutive note onsets) that separates it from the subsequent note.

Table 1: Correlation between coordinate axes and acoustic parameters.

                    Tempo    Legato   Intensity
Dim 1 (kinetics)     0.65    -0.28    -0.25
Dim 2 (energy)       0.33    -0.72     0.73
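For illustration, the articulation (legato/staccato) measure defined above can be computed as the ratio DR[n]/IOI[n]. The following minimal sketch assumes notes are given as (onset, offset) pairs in seconds; the data and function names are hypothetical and not taken from the CaRo 2.0 implementation:

```python
# Sketch: articulation (legato/staccato) ratio per note, assuming notes are
# given as (onset, offset) pairs in seconds, sorted by onset.
def articulation_ratios(notes):
    """Return DR[n] / IOI[n] for every note except the last one.

    Values close to 1 indicate legato playing; values well below 1
    indicate staccato playing.
    """
    ratios = []
    for (onset, offset), (next_onset, _) in zip(notes, notes[1:]):
        duration = offset - onset    # DR[n]
        ioi = next_onset - onset     # IOI[n], inter-onset interval
        ratios.append(duration / ioi)
    return ratios

# Example: three quarter notes at 120 bpm, the middle one played staccato.
notes = [(0.0, 0.45), (0.5, 0.70), (1.0, 1.45)]
print(articulation_ratios(notes))  # -> [0.9, 0.4]
```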

We can use this interpretation of the Kinetics-Energy space as an indication of how listeners organised the performances in their own minds when focusing on sensory aspects. The robustness of this representation was confirmed by synthesising different and varying expressive intentions in a musical performance. We can notice that this representation is at an abstraction level which lies between the semantic one (such as emotion) and the physical one (such as timing deviations), and can thus be more effective in representing the listener's evaluation criteria [24].

3. Computer Systems for Expressive Music Performance

While the models described in the previous section were mainly developed for analysis and understanding purposes, they are also often used for synthesis. Starting from models of musical expression, several software systems for rendering expressive musical performances were developed (see [25] for a review).

The systems for automatic expressive performance can be grouped into three general categories: (i) autonomous, (ii) feedforward, and (iii) feedback systems. Examples of each category are presented below.

Given a score, the purpose of all these systems is to calculate the so-called expressive deviations to be applied to each note, in order to change the duration and the intensity of the musical events notated in the traditional music score. A performance without expressive deviations is indicated by the term "nominal performance".

(1) Autonomous Systems. An Autonomous System (AS) is based on a collection of rules obtained automatically; the musical score is processed without user participation and then played in an automatic way.

As an example, the YQX system [26] uses a rule set obtained by means of machine learning algorithms developed in the research field of Artificial Intelligence. Widmer [27] developed an algorithm for the automatic extraction of rules, specifically designed to analyse musical performances. By applying the algorithm to a collection of performances of Mozart's piano sonatas, played on a Bösendorfer 290 SE grand piano, a rule set has been extracted that suggests some deviations in the expressive rhythmic and melodic structures. For example, the staccato rule requires that, if two successive notes have the same pitch but the second has a longer temporal duration, then the first note should be performed staccato; and the delay-next rule states that, if two notes of the same duration are followed by a longer one, then the last note should be played with a slight delay. As mentioned before, these rules were obtained from the training sets provided; in this sense, the rules can change if the system is trained on a different piano repertoire or with music performed by different pianists.
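As an illustration of the rule format only (the rule set actually learned in [27] is larger and data-dependent), the two example rules quoted above could be checked on a note list as follows; note names and data are invented:

```python
# Sketch of the two example rules described above, applied to a list of
# (pitch, duration) pairs. An illustration of the rule format, not the
# rule set actually learned in [27].
def staccato_rule(notes, i):
    """First of two equal-pitch notes is played staccato if the second is longer."""
    (p1, d1), (p2, d2) = notes[i], notes[i + 1]
    return p1 == p2 and d2 > d1

def delay_next_rule(notes, i):
    """Third note is slightly delayed if two equal-duration notes precede a longer one."""
    (_, d1), (_, d2), (_, d3) = notes[i], notes[i + 1], notes[i + 2]
    return d1 == d2 and d3 > d1

notes = [("C4", 0.5), ("C4", 1.0), ("D4", 0.5), ("E4", 0.5), ("F4", 1.0)]
print([i for i in range(len(notes) - 1) if staccato_rule(notes, i)])    # [0]
print([i for i in range(len(notes) - 2) if delay_next_rule(notes, i)])  # [2]
```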

(2) Feedforward Systems. The systems characterised by a feedforward behaviour allow the user to annotate the score and to set the specific parameters of the system depending on the music score to be played, but they do not allow real-time control during the performance.

A historical example is Director Musices, a rule-based system for automatic performance developed by the Kungliga Tekniska Högskolan (KTH) in Stockholm, born in the 1980s [28] with the aim of improving the synthesis quality of computer systems for the singing voice. Starting from the work on musical structure analysis by Ipolyi [29], and in collaboration with the violinist Lars Frydén, Johan Sundberg formalised the set of relations between musical structures and expressive deviations. The method used to formalise most rules is called "analysis by synthesis" [30]: as a first step, a model based on experience and intuition is developed, and it is subsequently applied to different musical examples. Performances generated by this model are evaluated by listening, in order to identify additional changes to the model. The process is reiterated several times until a stable definition of the model is obtained. At this point, perceptual tests on a larger population are carried out in order to assess the rule system. The Director Musices input is a score in MIDI format; applying all the rules (or a subset of them), it computes the expressive deviations for every single note, saving the results to a new MIDI file. The user can choose which rules to apply and how to weight them through numerical coefficients. The deviations calculated by each rule are added together, giving rise to complex profiles, as shown in Figure 2.
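A minimal sketch of this additive combination scheme is shown below; the rule functions and weights are invented for illustration and are not the actual KTH rules:

```python
# Sketch of the additive rule combination described above: each rule maps a
# note index to an IOI deviation (in percent), rules are weighted by
# user-chosen coefficients and summed. Rules and weights are illustrative only.
def phrase_arch(i, n):
    # slow down towards the phrase boundaries (toy shape)
    return 10.0 if i in (0, n - 1) else -2.0

def duration_contrast(i, n):
    return -3.0 if i % 2 else 3.0

rules = {phrase_arch: 1.0, duration_contrast: 0.5}

def ioi_deviation_profile(n_notes, rules):
    """Percent IOI deviation per note = weighted sum of all rule outputs."""
    return [sum(w * rule(i, n_notes) for rule, w in rules.items())
            for i in range(n_notes)]

print(ioi_deviation_profile(6, rules))
# -> [11.5, -3.5, -0.5, -3.5, -0.5, 8.5]
```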

Figure 2: Example of the application of the KTH rule system (adapted from [30]); panels: melodic charge (k = 2), duration contrast, punctuation, phrase arch (subphrase level), phrase arch (phrase level), and resulting IOI variations. The figure shows the interonset interval (IOI, i.e., the time from one tone's onset to that of the next) deviations obtained by applying four rules (phrase arch, duration contrast, melodic charge, and punctuation) to the Swedish nursery tune Ekorrn satt i granen.

Shunji [31] is an algorithm for automatic performance that uses the case-based reasoning paradigm, the well-known artificial intelligence technique that learns from examples, as an alternative to the techniques used by the software YQX. Unlike the latter, however, Shunji is a supervised system: it does not provide a solution to the proposed problem in an autonomous way, but interacts with the user, proposing possible solutions and receiving controls to improve the result. Shunji is based on a hierarchical segmentation algorithm [32] inspired by the Generative Theory of Tonal Music by Lerdahl and Jackendoff [33]. The performances provided as examples to the algorithm are segmented using the hierarchical model, and the expressive deviations of each segment are stored in a database. When the system input is a score that has not been processed before, the software segments the score and then, for each segment, searches the database for the most similar segment in order to extract its performance characteristics. It then applies the expressive deviations thus obtained to the new score. The user can interact with the software by controlling the segmentation of the score, by providing new examples, by changing the parameters used to define the similarity between two musical phrases, and by choosing the examples from which to extract the performance characteristics among those that the system has considered most similar. This approach also works with a reduced number of examples, provided they are well chosen.
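The following sketch illustrates the case-based idea (retrieve the stored segment most similar to a new one and reuse its deviations); the similarity measure and data layout are simplifications of ours, not the actual Shunji algorithm:

```python
# Sketch of case-based retrieval in the spirit of Shunji: for each segment of
# the new score, find the stored example segment with the most similar melodic
# contour and reuse its deviation profile. The similarity measure is a toy choice.
def contour(segment):
    """Melodic contour as successive pitch differences (in semitones)."""
    return [b - a for a, b in zip(segment, segment[1:])]

def distance(seg_a, seg_b):
    ca, cb = contour(seg_a), contour(seg_b)
    n = min(len(ca), len(cb))
    return sum(abs(x - y) for x, y in zip(ca[:n], cb[:n])) + abs(len(ca) - len(cb))

# Case base: (pitch segment, expressive deviations learned for it)
case_base = [
    ([60, 62, 64, 65], {"tempo": 0.95, "intensity": 1.10}),
    ([72, 71, 69, 67], {"tempo": 1.05, "intensity": 0.90}),
]

def retrieve_deviations(new_segment):
    best = min(case_base, key=lambda case: distance(case[0], new_segment))
    return best[1]

print(retrieve_deviations([62, 64, 66, 67]))  # closest to the ascending case
```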

(3) Feedback Systems. The systems described above work autonomously or require user intervention to define some performance details before replay. On the contrary, in feedback systems the user can control the performance in real time during the replay: as an orchestral conductor controls the musicians' performance, the user can modify the expressive deviations in an interactive way.

In VirtualPhilharmony [34], the user is given a score. Before starting the performance, and with the help of some tools, (s)he has to choose an expressive model, that is, a set of expressive deviations related to the articulation and agogic aspects of each single note. During the performance, a sensor measures the arm movements of the user, who mimics the gestures of an orchestral conductor with a wand in his/her hand. The system then modifies the temporal and dynamic characteristics of the performance in a consistent way, following the user's intentions and taking into account particular aspects of the conductor-musician interaction, including the latency between the conductor's gesture and the musician's execution, and the prediction of the next beat, which allows the musician to maintain the correct tempo between a conductor's gesture and the next one.

Figure 3: System architecture. CaRo 2.0 receives as input an annotated score in MusicXML format and generates in real time the messages to play a MIDI controlled piano (off-line: score editor and annotations; real time: Naturalizer and Expressivizer modules driven by the user's expressive intention).

The differences between the systems described above concern both the algorithms for computing the expressive deviations and the aspects related to user interaction. In autonomous systems the user can intervene only in the selection of the training set and, as a consequence, of the performance style that the system will learn. Feedforward systems allow a deeper interaction with the model parameters: the user can set the parameters, listen to the results, and then fine-tune the parameters again until the results are satisfying. Feedback systems allow real-time control of the parameters: the user can change the parameters of the performance while (s)he is listening to it, in a similar way as a human musician does. Since the models for music performance usually have numerous parameters, a crucial aspect of feedback systems is how to allow the user to simultaneously control all these parameters in real time. VirtualPhilharmony allows the user to control only two parameters (intensity and tempo) in real time; the other ones are defined offline by means of a so-called performance template.

The system developed by the authors and described in the next sections is also a feedback system. As explained later, the real-time control of the parameters is enabled by a control space based on a semantic description of different expressive intentions. Starting from the trajectories drawn by the user on this control space, the system maps concepts such as emotions or sensations onto the low-level parameters of the model.

4. The CaRo 2.0 System

4.1. Architecture. CaRo 2.0 simulates the tasks of a pianist who reads a musical score, decodes its symbols, plans the expressive choices, and finally executes the actions needed to actually play the instrument. Usually pianists analyse the score very carefully before the performance, and they add annotations and cues to the musical sheet in order to emphasise rhythmic-melodic structures, section subdivisions, and other interpretative indications. As shown in Figure 3, CaRo 2.0 receives as input a musical score compliant with the MusicXML standard [35]. Any music editor able to export in MusicXML format (e.g., Finale, http://www.finalemusic.com, or MuseScore, http://musescore.org) can be used to write the score and add annotations. CaRo 2.0 is able to decode and process a set of expressive cues, as specified in the next section. The slurs can be hierarchically structured to individuate sections, motifs, and short musical cells. Moreover, textual labels can be added to specify the interpretative choices.

The specificity of CaRo 2.0, in comparison with the systems presented in Section 3, is that it is based on a model that explicitly distinguishes the expressive deviations related to the musical structure from those related to other musical aspects, such as the performer's expressive intentions. The computation is done in two steps. The first module, called Naturalizer, computes the deviations starting from the expressive cues annotated in the score. As the cues usually indicate particular elements of the score, such as a musical climax or the end of a section, the deviations have shapes that reflect the structure of the score and the succession of motifs and musical cells (see, e.g., Figure 12). The deviations computed by the Naturalizer correspond to a performance that is musically correct but without a particular expressive emphasis. The second module, called Expressivizer, modifies the deviations computed in the previous step in order to characterize the performance according to the user's expressive intentions.

4.2. Rendering. The graphical interface of CaRo 2.0 (see Figure 11) consists of a main window hosting the Kinetics-Energy space (see Section 2.3), used for the interactive control of the expressive intentions, and of a main menu through which the user can load an annotated musical score in MusicXML format and launch the execution of the score. When the user loads the MusicXML score, the file is parsed and three data structures are generated: note events, expressive cues, and hierarchical structure. At this level the musical score is a collection of separate events EV[ID], with the event index ID = 1, ..., N. For instance, a score can be described as a collection of notes and rests. Each event is characterised by a time reference and a list of attributes (such as pitch and duration).

The first structure is a list of note events, each one described by the following fields: ID number, onset (ticks), duration (ticks), bar position (ticks), bar length (beats), voice number, grace (boolean), and pitch, where ticks and beats are common unit measures for representing musical duration (for the complete detailed MIDI specification see http://midi.org/techspecs/midispec.php). The last field is a pointer to a list of pitches, each one represented by a string of three characters following the schema XYZ, where X is the name of the note (from A to G), Y is the octave (from 0 to 8), and Z is the alteration (−1 stands for flat ♭ and +1 for sharp ♯). A chord (more tones played simultaneously) is represented by a list of pitches with more than one entry; an empty list indicates a musical rest.

Table 2: Correspondence among graphical expressive cues, XML representation, and rendering parameters.

Graphical symbol   MusicXML code                                                                                Event value
∙ (staccato)       <articulations><staccato default-x="3" default-y="13" placement="above"/></articulations>     DR[n]
> (accent)         <articulations><accent default-x="-1" default-y="13" placement="above"/></articulations>      KV[n]
– (tenuto)         <articulations><tenuto default-x="1" default-y="14" placement="above"/></articulations>       DR[n]
breath mark        <articulations><breath-mark default-x="16" default-y="18" placement="above"/></articulations> O[n]
slur               <notations><slur number="1" placement="above" type="start"/></notations>                      O[n], DR[n], KV[n]
pedal              <direction-type><pedal default-y="-91" line="no" type="start"/></direction-type>              KV[n]
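A minimal sketch of such a note-event record and of the three-character pitch encoding is given below; field names mirror the description above but are hypothetical and not taken from the CaRo 2.0 sources:

```python
# Sketch of the note-event record described above; names are hypothetical.
from dataclasses import dataclass, field
from typing import List

@dataclass
class NoteEvent:
    event_id: int
    onset: int           # ticks
    duration: int        # ticks
    bar_position: int    # ticks
    bar_length: int      # beats
    voice: int
    grace: bool
    pitches: List[str] = field(default_factory=list)  # "XYZ" strings; empty list = rest

def decode_pitch(xyz: str):
    """Decode the 'XYZ' schema: name A..G, octave 0..8, alteration -1/0/+1."""
    name = xyz[0]
    octave = int(xyz[1])
    alteration = int(xyz[2:]) if len(xyz) > 2 else 0
    return name, octave, alteration

ev = NoteEvent(1, 0, 480, 0, 4, 1, False, ["C4+1"])    # a C#4 quarter note (480 ticks)
rest = NoteEvent(2, 480, 480, 480, 4, 1, False, [])    # empty pitch list = rest
print(decode_pitch(ev.pitches[0]))  # ('C', 4, 1)
```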

The second structure is a list of the expressive cues specified in the score. Each entry is defined by type, ID of the linked event, and voice number. Among the many different ways to represent expressive cues in a score, the system is currently able to recognise and render the expressive cues listed in Table 2.

The third data structure describes the hierarchical structure of the piece, that is, its subdivision, top to bottom, into periods, phrases, subphrases, and motifs. Each section is specified by an entry composed of begin (ticks), end (ticks), and hierarchical level number.

For clarity reasons, it is useful to express the duration of the musical events in seconds instead of metric units such as ticks and beats. This conversion is possible by taking the tempo marking written in the score, if any, or by normalising the total score length to the performance duration. This representation is called nominal time t_nom; the event onset is called nominal onset time O_nom[n], the event length is called nominal duration DR_nom[n], and the distance between two adjacent events is called nominal interonset interval:

$$\mathrm{IOI}_{\mathrm{nom}}[n] = O_{\mathrm{nom}}[n+1] - O_{\mathrm{nom}}[n]. \qquad (1)$$

When the user selects the command to start the execution of the score, a timer is instantiated to provide a temporal reference for the processing of the musical events. Let t_p be the time elapsed from the beginning of the performance. Since the system is interactive and the results depend on the user's input at time t_p, the expressive deviations (and therefore the values of each note event) have to be computed just in time, before they are sent to the MIDI interface. The deviations are calculated by the PlayExpressive() function, which is periodically called by the timer with a time interval T. This interval is a critical parameter of the system, and its value depends on a trade-off between two conditions. First, the time interval must be large enough to allow the processing and playing of the note events, that is, T > t_C + t_M, where t_C is the time for calculating the expressive deviations, generally negligible, and t_M is the time to transmit the note events by means of the MIDI protocol, a serial protocol with a transfer rate of 31250 bps. Since a note event is generally coded with three bytes (and each byte, including the start and stop bits, is composed of 10 bits), the time to transfer a note event is about 1 ms. Therefore, assuming that a piano score can have at most 10 simultaneous notes, T should be greater than 10 ms. The second condition is that T should be small enough to give the user the feeling of real-time control of the expressive intentions, that is, T < t_E, where t_E is the time the user needs to recognise that the expressive intention has changed. Estimating t_E is not a trivial task, because many aspects have to be considered, such as the cognitive processes and the cultural and musical context. However, experimental evidence suggests that certain affective nuances can be recognised even in musical excerpts with a length of 250 ms [36]. A preliminary test showed that a period T = 100 ms guarantees the on-time processing of the musical events and has been judged sufficient to satisfy the real-time requirement in this context. Figure 4 shows how the PlayExpressive() function works.
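A simplified sketch of this scheduling loop, assuming a hypothetical play_expressive(t_p, T) callback and the T = 100 ms period mentioned above, could look like this:

```python
# Sketch of the periodic scheduling described above: a callback is invoked
# every T = 100 ms with the time elapsed since the start of the performance.
# play_expressive() is only a placeholder for the PlayExpressive() function.
import time

T = 0.100  # seconds; must exceed worst-case compute + MIDI transfer time (~10 ms)

def play_expressive(t_p, window):
    print(f"processing events with onset in ({t_p:.2f}, {t_p + window:.2f}] s")

def run_performance(total_duration=0.5):
    start = time.monotonic()
    t_p = 0.0
    while t_p < total_duration:
        play_expressive(t_p, T)
        t_p += T
        # sleep until the next tick, compensating for processing time
        time.sleep(max(0.0, start + t_p - time.monotonic()))

run_performance()
```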

(1) Naturalizer. The Naturalizer computes the deviations that are applied to the nominal parameters to obtain a natural performance. Each time the PlayExpressive() function is called by the timer, the note events list, which contains all the notes to be played, is searched for the events with an onset time O_nom[n] such that t_p < O_nom[n] ≤ t_p + T. If no event is found, the function ends; otherwise, the expressive cues list is searched for cues linked to the notes to be played. For each expressive cue that is found, a modifier is instantiated. The modifier is a data structure that contains the deviations to be applied to the expressive parameters as a consequence of the expressive cue written in the score. The parameters influenced by the different expressive cues are listed in Table 2. Depending on the type of cue, the modifier can work on a single event only, identified by its ID number, or on a sequence of events, identified by a temporal interval (the modifier's scope) and a voice number. The expressive deviations can be additive or proportional, as a function of the type of cue and of the expressive parameters. The deviations are applied by the successive module, named deviation rules. This module combines the deviations defined by all the modifiers within a temporal scope between t_p and t_p + T, following a set of rules. The output is, for each note, a set of expressive parameters which define how the note is played. Although CaRo 2.0 can control several different musical instruments (e.g., violin, flute, and clarinet) with a variable number of parameters, this paper is focused on piano performances, and the parameters computed by the rule system are IOI_nat, DR_nat, and KV_nat, where the subscript nat identifies the parameters of the natural performance and KV is the Key-Velocity, a parameter of the MIDI protocol related to the sound intensity.

Figure 4: Schema of the PlayExpressive() function (Naturalizer: notes to be played, expressive cues, new modifiers, synchronisation and voice-leading rules, deviation rules; Expressivizer: reading of the user input, expressive intention processing, constraint rules; the resulting notes are sent to the player queue).
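The following sketch illustrates, under a simplified data layout of our own, how such modifiers could be accumulated over the events falling in the current window; it mirrors the description above rather than the actual CaRo 2.0 code:

```python
# Sketch of the Naturalizer step described above: select the events whose
# nominal onset falls in (t_p, t_p + T], then apply additive or proportional
# modifiers triggered by the expressive cues. Data layout is hypothetical.
def naturalize(events, modifiers, t_p, T):
    """events: dicts with 'id', 'onset', 'ioi', 'dr', 'kv' (nominal values).
    modifiers: dicts with 'scope' (set of event ids), 'param', 'kind', 'value'.
    Returns the parameters of the natural performance for the current window."""
    out = []
    for ev in events:
        if not (t_p < ev["onset"] <= t_p + T):
            continue
        nat = {"ioi": ev["ioi"], "dr": ev["dr"], "kv": ev["kv"]}
        for m in modifiers:
            if ev["id"] in m["scope"]:
                if m["kind"] == "additive":
                    nat[m["param"]] += m["value"]
                else:  # proportional
                    nat[m["param"]] *= m["value"]
        out.append((ev["id"], nat))
    return out

events = [{"id": 1, "onset": 0.05, "ioi": 0.5, "dr": 0.45, "kv": 64}]
mods = [{"scope": {1}, "param": "dr", "kind": "proportional", "value": 0.6},  # staccato
        {"scope": {1}, "param": "kv", "kind": "additive", "value": 12}]       # accent
print(naturalize(events, mods, 0.0, 0.1))
# -> [(1, {'ioi': 0.5, 'dr': 0.27, 'kv': 76})] (up to float rounding)
```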

(2) Expressivizer. The next step of the rendering algorithm is called Expressivizer, and its aim is to modify the expressive deviations computed by the Naturalizer in order to render the user's different expressive intentions. It is based on the multilayer model shown in Figure 5: the user specifies a point or a trajectory in an abstract control space; the coordinates of the control space are mapped onto a set of expressive features; finally, the features are used to compute the deviations that must be applied to the score for conveying a particular expressive intention. As an example, Figure 7 shows a mapping surface between points of the control space and the Intensity feature: based on the movements of the pointer on the x-y plane, the values of the Intensity feature are computed. The user can select a single expressive intention, and the system will play the entire score with that nuance. Trajectories can also be drawn in the space to experience the dynamic aspects of the expressive content.

Figure 5: Multilayer model (semantic spaces, such as the kinetics-energy and valence-arousal spaces, are mapped onto features such as Tempo, Intensity, and Articulation, and finally onto the physical events O[n], DR[n], KV[n]).

The abstract control space S is represented as a set of N triples S_i = (l_i, p_i, e_i), with i = 1, ..., N, where l_i is a verbal label that semantically identifies an expressive intention (e.g., bright, light, and soft), p_i ∈ R² represents the coordinates in the control space of the expressive intention l_i, and e_i ∈ R^k is a vector of features which characterise the expressive intention l_i.

S defines the values of the expressive features for the N points of coordinates p_i. To obtain a continuous control space, in which each point of the control space is associated with a feature vector, a mapping function f: R² → R^k is defined as

$$f(x) = \begin{cases} e_i & \text{if } x = p_i \\ \displaystyle\sum_{i=1}^{N} a_i \cdot e_i & \text{elsewhere} \end{cases} \qquad (2)$$

with

$$\frac{1}{a_i} = \left\lVert x - p_i \right\rVert^2 \cdot \sum_{j=1}^{N} \frac{1}{\left\lVert x - p_j \right\rVert^2}. \qquad (3)$$

Figure 6: The options window (designed for musicians), which allows the fine tuning of the model. For the general public, it is possible to load preset configurations (see the button "load parameters from file").

Figure 7: Map between the x-y coordinates of the control space and the Intensity feature (example surface over the labels Sad, Natural, Happy, and Angry).

Let x be a point in the control space; the mapping function calculates the corresponding feature vector y = f(x) as a linear combination of the N feature vectors, one for each expressive intention defined in S. The feature vectors are weighted inversely proportionally to the square of the Euclidean distance between their corresponding expressive intention and x. Finally, the resulting feature vector can be used to compute the expressive deviations, represented by the physical layer in Figure 5. For example, the Tempo feature is used to compute the parameters of the n-th physical event from the output of the Naturalizer module, according to the following equations:

$$O_{\exp}[n] = O_{\mathrm{nat}}[n-1] + (O_{\mathrm{nat}}[n] - O_{\mathrm{nat}}[n-1]) \cdot \mathrm{Tempo},$$
$$\mathrm{DR}_{\exp}[n] = \mathrm{DR}_{\mathrm{nat}}[n] \cdot \mathrm{Tempo}, \qquad (4)$$

where O_exp and O_nat are the onset times of the expressive and natural performance, respectively, and DR_exp and DR_nat are the corresponding event durations.
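Putting equations (2)–(4) together, a compact sketch of the mapping could be written as follows; the labels, positions, feature values, and note data are invented for the example and are not the tuning used in CaRo 2.0:

```python
# Sketch of equations (2)-(4): inverse-square-distance interpolation of the
# feature vectors defined at the labelled points of the control space, and the
# use of the resulting Tempo feature to warp onsets and durations.
from math import dist

# Hypothetical control space: label -> (position p_i, features e_i)
space = {
    "hard":  ((-0.8,  0.8), {"Tempo": 1.10, "Intensity": 1.20}),
    "soft":  (( 0.8, -0.8), {"Tempo": 0.90, "Intensity": 0.80}),
    "light": (( 0.8,  0.8), {"Tempo": 1.05, "Intensity": 0.85}),
}

def map_features(x):
    """Equations (2)-(3): weights proportional to 1 / ||x - p_i||^2, normalised."""
    for p_i, e_i in space.values():
        if dist(x, p_i) == 0.0:          # x coincides with a labelled point
            return dict(e_i)
    inv = {k: 1.0 / dist(x, p) ** 2 for k, (p, _) in space.items()}
    norm = sum(inv.values())
    return {feat: sum(inv[k] * space[k][1][feat] for k in space) / norm
            for feat in ("Tempo", "Intensity")}

def apply_tempo(o_nat, dr_nat, tempo):
    """Equation (4): O_exp[n] = O_nat[n-1] + (O_nat[n] - O_nat[n-1]) * Tempo."""
    o_exp = [o_nat[0]]
    for n in range(1, len(o_nat)):
        o_exp.append(o_nat[n - 1] + (o_nat[n] - o_nat[n - 1]) * tempo)
    return o_exp, [d * tempo for d in dr_nat]

features = map_features((0.0, 0.0))
print(features)
print(apply_tempo([0.0, 0.5, 1.0], [0.45, 0.45, 0.9], features["Tempo"]))
```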

4.3. User-Defined Control Space for Interaction. The abstract control space constitutes the interface between the user's concepts of expression and their internal representation in the system. The user, following his/her own preferences, can create the semantic space which controls the expressive intention of the performance. Each user can select his/her own expressive ideas and then design the mapping of these concepts onto positions and movements in this space.

Figure 8: A control space designed for an educational context. The position of the emoticons follows the results of [37].

Figure 9: The control space used to interactively change the sound comment of a multimedia product (in this case, a cartoon from the movie "The Good, the Bad and the Ugly" by Sergio Leone). The author can associate different expressive intentions with the characters (see the blue labels 1, 2, and 3).

A preferences window (see Figure 6) allows the user to define the set of performance styles by specifying, for the i-th style, a verbal semantic label l_i, the coordinates p_i in the control space, and the features vector e_i.

As a particular and significant case, the emotional and sensory control metaphors can be used. For example, a musician may find it more convenient to use the Kinetics-Energy space (see Figure 1), because it is based on terms such as bright, dark, or soft that are commonly used to describe musical nuances. On the contrary, a nonmusician or a child may prefer the Valence-Arousal space, emotions being a very common experience for every listener. The sensory space can be defined by placing the semantic labels (with their associated features) Hard, Soft, Dark, and Bright at four orthogonal points of the space. Similarly, the Valence-Arousal space can be obtained by placing the emotional labels Happy, Calm, Sad, and Angry at the four corners of the control space. These spaces are implemented as presets of the system, which can be selected by the user (see, e.g., Figure 8).
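As an illustration, the two presets could be described with the same triple structure (l_i, p_i, e_i) introduced in Section 4.2; the coordinates and feature values below are invented placeholders, not the tuning actually shipped with the system:

```python
# Sketch of the two preset control spaces described above, expressed as
# (label, position, features) triples. All numeric values are placeholders.
SENSORY_PRESET = [
    ("hard",   (-1.0,  0.0), {"Tempo": 1.00, "Intensity": 1.25, "Articulation": 0.70}),
    ("soft",   ( 1.0,  0.0), {"Tempo": 0.95, "Intensity": 0.80, "Articulation": 1.10}),
    ("dark",   ( 0.0, -1.0), {"Tempo": 0.85, "Intensity": 0.90, "Articulation": 1.05}),
    ("bright", ( 0.0,  1.0), {"Tempo": 1.15, "Intensity": 1.05, "Articulation": 0.80}),
]

EMOTIONAL_PRESET = [
    ("happy", ( 1.0,  1.0), {"Tempo": 1.20, "Intensity": 1.10, "Articulation": 0.85}),
    ("calm",  ( 1.0, -1.0), {"Tempo": 0.90, "Intensity": 0.85, "Articulation": 1.10}),
    ("sad",   (-1.0, -1.0), {"Tempo": 0.80, "Intensity": 0.75, "Articulation": 1.15}),
    ("angry", (-1.0,  1.0), {"Tempo": 1.15, "Intensity": 1.30, "Articulation": 0.75}),
]

def load_preset(preset):
    """Return the control space as a dict label -> (position, features)."""
    return {label: (pos, feats) for label, pos, feats in preset}

print(sorted(load_preset(EMOTIONAL_PRESET)))  # ['angry', 'calm', 'happy', 'sad']
```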

Specific and new expressive intentions can be associated with objects/avatars placed on the screen, so that a movement of the pointer from one object to another continuously changes the resulting music expression. For example, a different type of music can be associated with each of the characters represented in an image. Moreover, an object, with its expressive label and parameters, can be moved, and then the control space can vary dynamically (see, e.g., Figure 9).

Figure 10: The music excerpt used to evaluate the CaRo 2.0 software: Ludwig van Beethoven (1770-1827), Piano Sonata No. 16, op. 31 No. 1, 3rd movement (Rondo, Allegretto). The score has been transcribed in MusicXML format, including all the expressive indications (slurs, articulation marks, and dynamic cues).

Figure 11: The graphical user interface of CaRo 2.0. The dotted line shows the trajectory drawn for rendering the hbls performance.

More generally, the system allows a creative design of abstract spaces according to the artist's fantasies and needs, by inventing new expressions and their spatial relations.

5. Results

This section reports the results of some expressive music renderings obtained with CaRo 2.0, with the aim of showing how the system works. The beginning of the 3rd movement of the Piano Sonata No. 16 by L. van Beethoven (see Figure 10) was chosen as a case study, because it is characterised by a large number of expressive nuances and, at the same time, it is written with a quite simple musical texture (four voices at most), which makes it easier to visualise and understand the results. Seven performances of the same Beethoven Sonata were rendered: five performances are characterised by only one expressive intention (bright, hard, heavy, light, and soft) and were obtained by keeping the mouse fixed on the corresponding adjective (see the GUI of Figure 11); one performance, named hbls, was obtained by drawing a trajectory on the two-dimensional control space going from hard to bright, light, and soft (see the dotted line of Figure 11); finally, a nominal performance was rendered by applying no expressive rule or transformation to the music score.

The system rendered the different performances correctly, playing the sequences of notes written in the score. Moreover, the expressive deviations were computed depending on the user's preferences, specified by means of the control space. Figure 12 shows the piano-roll representation of the nominal performance compared to the performances characterised by the expressive intentions heavy and light. It can be noted that the onset, duration, and Key-Velocity of the notes change as a function of the expressive intention. (A piano-roll representation is a method of representing a musical stimulus or score for later computational analyses. It can be conceived as a two-dimensional graph: the vertical axis is a digital representation of the different notes, and the horizontal axis is a continuous representation of time. When a note is played, a horizontal line is drawn on the piano-roll representation. The height of this line represents which note was being played, the beginning of the line represents the note's onset, the length of the line represents the note's duration, and the end of the line represents the note's offset [38].) The analysis of the main expressive parameters (see Figures 12 and 13) shows that these values match the corresponding parameters of the real performances analysed in [39]. Moreover, Figure 14 shows how the expressive parameters change gradually, following the user's movements between different expressive intentions.

Figure 12: Piano-roll representation (pitch and velocity versus time) of the performances characterized by the heavy (b) and light (c) expressive intentions. The nominal performance (a) has been added as a reference.
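A piano-roll view like the one in Figure 12 can be produced from the rendered note events with a few lines of matplotlib; the sketch below assumes notes given as (onset, duration, pitch, velocity) tuples and uses invented data:

```python
# Sketch: drawing a piano roll (horizontal bars: x = time, y = MIDI pitch,
# length = duration) plus a velocity plot, as in Figure 12. Assumes matplotlib
# is available; the note data below is invented.
import matplotlib.pyplot as plt

def plot_piano_roll(notes, title="Piano roll"):
    """notes: iterable of (onset_s, duration_s, midi_pitch, velocity 0-127)."""
    fig, (ax_roll, ax_vel) = plt.subplots(2, 1, sharex=True, figsize=(8, 4))
    for onset, duration, pitch, velocity in notes:
        ax_roll.broken_barh([(onset, duration)], (pitch - 0.4, 0.8))
        ax_vel.vlines(onset, 0, velocity)
    ax_roll.set_ylabel("Pitch")
    ax_roll.set_title(title)
    ax_vel.set_ylabel("Velocity")
    ax_vel.set_xlabel("Time (s)")
    plt.tight_layout()
    plt.show()

notes = [(0.0, 0.45, 60, 70), (0.5, 0.30, 64, 55), (1.0, 0.90, 67, 90)]
plot_piano_roll(notes, "Toy example")
```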

6. Assessment

The most important public forum worldwide in which computer systems for expressive music performance are assessed is the Performance RENdering CONtest (Rencon), which was initiated in 2002 by Haruhiro Katayose and colleagues as a competition among different systems (http://renconmusic.org/). Over the years, the Rencon contest evolved toward a more structured format. It is an annual international competition in which entrants present computer systems they have developed for generating expressive musical performances, which audience members and organisers judge. One of the goals of Rencon is to solve a common problem for researchers, namely the difficulty of evaluating/grading the performances generated by their own computer systems within their individual endeavours, by having the entrants meet together at the same site [40]. An assessment of CaRo 2.0 (from the point of view of musical performance expressiveness) was carried out by participating in Rencon 2011. Our system qualified for the final stage, where the participating systems were presented during a dedicated workshop. The performances played were openly evaluated by a large (77 participants) and qualified audience of musicians and scientists, under the same conditions, in the manner of a competition. Each of the five entrant systems was required to generate two different performances of the same music piece, in order to evaluate the adaptability of the system. After listening to each performance, the listeners were asked to rate it from the viewpoint of "how much applause would you give to the performance" on a 10-point scale (1 = nothing, 10 = ovation). The CaRo 2.0 system won that final stage (see Table 3). The results, as well as the comments by the experts, are analysed in [41]. The performances played during Rencon 2011 can be listened to at http://smc.dei.unipd.it/advances_hci.

Table 3: Results of the final stage of the Rencon 2011 contest. Each system was required to render two different performances of the same music piece. Each performance has been evaluated both by the audience attending the contest live and by people connected online through a streaming service.

System               Performance A              Performance B              Total score
                     Live   Online   Total      Live   Online   Total
CaRo 2.0             464    56       520        484    61       545        1065
YQX                  484    64       548        432    65       497        1045
VirtualPhilharmony   452    46       498        428    32       460        958
Director Musices     339    33       372        371    13       384        756
Shunji System        277    24       301        277    20       297        598

Figure 13: The Key-Velocity values of the performances characterised by the different expressive intentions (bright, hard, heavy, light, and soft). Only the melodic line is reported, in order to have a more readable graph.

7. Conclusion

CaRo 2.0 could be integrated into the browser engines of music community services. In this way, (i) the databases keep track of the expressiveness of the songs played and (ii) the users can create/browse playlists based not only on the song title or the artist name, but also on the music expressiveness. As a result, (i) in rhythm games the performance can be rated on the basis of the expressive intentions of the user, with advantages for the educational value and the user's involvement in the game, and (ii) in music communities the user can search for happy or sad music according to his/her affective state or preferences.

Figure 14: The values of the three main expressive parameters (logIOI, logL, logKV) of the performance obtained by drawing the trajectory shown in Figure 11.

Regarding rhythm games, despite their big commercial success (often bigger than that of the original music albums), the gameplay is, almost entirely, oriented around the player's interactions with a musical score or individual songs, by means of pressing specific buttons or activating controls on a specialized game controller. In these games, the reaction of the virtual audience (i.e., the rated score) and the result of the battle/cooperative mode are based on the performance of the player as judged by the PC. Up to now, game designers did not consider the player's expressiveness. As future work, we intend to insert the CaRo system into these games, so that a music performance is rated "perfect" not only if it is played exactly like the score, but also considering its interpretation and its expressiveness. In this way, these games can also be used profitably in the educational field.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] N. Bernardini and G. De Poli, "The sound and music computing field: present and future," Journal of New Music Research, vol. 36, no. 3, pp. 143–148, 2007.

[2] S. Canazza, G. De Poli, A. Rodà, and A. Vidolin, "An abstract control space for communication of sensory expressive intentions in music performance," Journal of New Music Research, vol. 32, no. 3, pp. 281–294, 2003.

[3] S. Canazza, G. De Poli, A. Rodà, and A. Vidolin, "Expressiveness in music performance: analysis, models, mapping, encoding," in Structuring Music through Markup Language: Designs and Architectures, J. Steyn, Ed., pp. 156–186, IGI Global, 2012.

[4] S. Canazza, G. De Poli, C. Drioli, A. Rodà, and A. Vidolin, "Audio morphing different expressive intentions for multimedia systems," IEEE Multimedia, vol. 7, no. 3, pp. 79–83, 2000.

[5] D. Fabian, R. Timmers, and E. Schubert, Eds., Expressiveness in Music Performance: Empirical Approaches Across Styles and Cultures, Oxford University Press, Oxford, UK, 2014.

[6] G. De Poli, "Methodologies for expressiveness modelling of and for music performance," Journal of New Music Research, vol. 33, no. 3, pp. 189–202, 2004.

[7] G. Widmer and P. Zanon, "Automatic recognition of famous artists by machine," in Proceedings of the 16th European Conference on Artificial Intelligence (ECAI '04), pp. 1109–1110, Valencia, Spain, 2004.

[8] A. Gabrielsson, "Expressive intention and performance," in Music and the Mind Machine, R. Steinberg, Ed., pp. 35–47, Springer, Berlin, Germany, 1995.

[9] P. Juslin, "Emotional communication in music performance: a functionalist perspective and some data," Music Perception, vol. 14, no. 4, pp. 383–418, 1997.

[10] S. Canazza, G. De Poli, and A. Vidolin, "Perceptual analysis of the musical expressive intention in a clarinet performance," in Music, Gestalt, and Computing, M. Leman, Ed., pp. 441–450, Springer, Berlin, Germany, 1997.

[11] G. De Poli, A. Rodà, and A. Vidolin, "Note-by-note analysis of the influence of expressive intentions and musical structure in violin performance," Journal of New Music Research, vol. 27, no. 3, pp. 293–321, 1998.

[12] R. Bresin and A. Friberg, "Emotional coloring of computer-controlled music performances," Computer Music Journal, vol. 24, no. 4, pp. 44–63, 2000.

[13] F. B. Baraldi, G. De Poli, and A. Rodà, "Communicating expressive intentions with a single piano note," Journal of New Music Research, vol. 35, no. 3, pp. 197–210, 2006.

[14] S. Hashimoto, "KANSEI as the third target of information processing and related topics in Japan," in Proceedings of the International Workshop on Kansei-Technology of Emotion, pp. 101–104, 1997.

[15] S. Hashimoto, "Humanoid robots for kansei communication: computer must have body," in Machine Intelligence: Quo Vadis?, P. Sincak, J. Vascak, and K. Hirota, Eds., pp. 357–370, World Scientific, 2004.

[16] R. Picard, Affective Computing, MIT Press, Cambridge, Mass, USA, 1997.

[17] S. Canazza, G. De Poli, C. Drioli, A. Rodà, and A. Vidolin, "Modeling and control of expressiveness in music performance," Proceedings of the IEEE, vol. 92, no. 4, pp. 686–701, 2004.

[18] B. H. Repp, "Diversity and commonality in music performance: an analysis of timing microstructure in Schumann's 'Träumerei'," The Journal of the Acoustical Society of America, vol. 92, no. 5, pp. 2546–2568, 1992.

[19] B. H. Repp, "The aesthetic quality of a quantitatively average music performance: two preliminary experiments," Music Perception, vol. 14, no. 4, pp. 419–444, 1997.

[20] L. Mion, G. De Poli, and E. Rapanà, "Perceptual organization of affective and sensorial expressive intentions in music performance," ACM Transactions on Applied Perception, vol. 7, no. 2, article 14, 2010.

[21] S. Dixon and W. Goebl, "The 'air worm': an interface for real-time manipulation of expressive music performance," in Proceedings of the International Computer Music Conference (ICMC '05), Barcelona, Spain, 2005.

[22] A. Rodà, S. Canazza, and G. De Poli, "Clustering affective qualities of classical music: beyond the valence-arousal plane," IEEE Transactions on Affective Computing, vol. 5, no. 4, pp. 364–376, 2014.

[23] P. N. Juslin and J. A. Sloboda, Music and Emotion: Theory and Research, Oxford University Press, 2001.

[24] A. Camurri, G. De Poli, M. Leman, and G. Volpe, "Communicating expressiveness and affect in multimodal interactive systems," IEEE Multimedia, vol. 12, no. 1, pp. 43–53, 2005.

[25] A. Kirke and E. R. Miranda, "A survey of computer systems for expressive music performance," ACM Computing Surveys, vol. 42, no. 1, article 3, pp. 1–41, 2009.

[26] G. Widmer, S. Flossmann, and M. Grachten, "YQX plays Chopin," AI Magazine, vol. 30, no. 3, pp. 35–48, 2009.

[27] G. Widmer, "Discovering simple rules in complex data: a meta-learning algorithm and some surprising musical discoveries," Artificial Intelligence, vol. 146, no. 2, pp. 129–148, 2003.

[28] J. Sundberg, A. Askenfelt, and L. Frydén, "Musical performance: a synthesis-by-rule approach," Computer Music Journal, vol. 7, no. 1, pp. 37–43, 1983.

[29] I. Ipolyi, "Innforing i musiksprakets opprinnelse och struktur," TMH-QPSR, vol. 48, no. 1, pp. 35–43, 1952.

[30] A. Friberg, R. Bresin, and J. Sundberg, "Overview of the KTH rule system for musical performance," Advances in Cognitive Psychology, Special Issue on Music Performance, vol. 2, no. 2-3, pp. 145–161, 2006.

[31] S. Tanaka, M. Hashida, and H. Katayose, "Shunji: a case-based performance rendering system attached importance to phrase expression," in Proceedings of the Sound and Music Computing Conference (SMC '11), F. Avanzini, Ed., pp. 1–2, 2011.

[32] M. Hamanaka, K. Hirata, and S. Tojo, "Implementing 'A Generative Theory of Tonal Music'," Journal of New Music Research, vol. 35, no. 4, pp. 249–277, 2006.

[33] F. Lerdahl and R. Jackendoff, A Generative Theory of Tonal Music, MIT Press, Cambridge, Mass, USA, 1983.

[34] T. Baba, M. Hashida, and H. Katayose, "A conducting system with heuristics of the conductor 'VirtualPhilharmony'," in Proceedings of the New Interfaces for Musical Expression Conference (NIME '10), pp. 263–270, Sydney, Australia, June 2010.

[35] M. Good, "MusicXML for notation and analysis," in The Virtual Score: Representation, Retrieval, Restoration, W. B. Hewlett and E. Selfridge-Field, Eds., pp. 113–124, MIT Press, Cambridge, Mass, USA, 2001.

[36] I. Peretz, L. Gagnon, and B. Bouchard, "Music and emotion: perceptual determinants, immediacy, and isolation after brain damage," Cognition, vol. 68, no. 2, pp. 111–141, 1998.

[37] R. Plutchik, Emotion: A Psychoevolutionary Synthesis, Harper & Row, New York, NY, USA, 1980.

[38] D. Temperley, The Cognition of Basic Musical Structures, MIT Press, Cambridge, Mass, USA, 2001.

[39] S. Canazza, G. De Poli, S. Rinaldin, and A. Vidolin, "Sonological analysis of clarinet expressivity," in Music, Gestalt, and Computing: Studies in Cognitive and Systematic Musicology, vol. 1317 of Lecture Notes in Computer Science, pp. 431–440, Springer, Berlin, Germany, 1997.

[40] H. Katayose, M. Hashida, G. De Poli, and K. Hirata, "On evaluating systems for generating expressive music performance: the Rencon experience," Journal of New Music Research, vol. 41, no. 4, pp. 299–310, 2012.

[41] G. De Poli, S. Canazza, A. Rodà, and E. Schubert, "The role of individual difference in judging expressiveness of computer assisted music performances by experts," ACM Transactions on Applied Perception, vol. 11, no. 4, article 22, 2014.

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 3: Research Article CaRo 2.0: An Interactive System for

Advances in Human-Computer Interaction 3

Figure 1: Kinetics-Energy space as mid-level representation of expressive intentions (axes: Kinetics and Energy; the adjectives hard, heavy, dark, bright, light, and soft are placed in the space).

space with a small number of dimensions. This space is derived from similarity judgements analysed using factor analysis or multidimensional scaling. The dimensional approach provides a way of describing expression which is more tractable than using words, but which can be translated into and out of verbal descriptions. Translation is possible because emotion-related words can be understood, at least to a first approximation, as referring to positions in the dimensional space [20]. Moreover, this representation is useful for capturing the continuous change in expression during a piece of music [21].

The most used representation in music research is the two-dimensional Valence-Arousal (V-A) space, even if other dimensions were explored in several studies (e.g., see [22]). The V-A space organises emotions in terms of affect appraisal (pleasant-unpleasant) and physiological reaction (high-low arousal). For example, happiness is described as a feeling of excitement (high arousal) combined with positive affect (high valence). A problem with this approach is that specifying the quality of a feeling only in terms of valence and arousal does not allow a very high degree of differentiation, especially in research on music, where one may expect a somewhat reduced range of both the unpleasantness and the arousal of the states produced by music [23].

Table 1: Correlation between coordinate axes and acoustic parameters.

                      Tempo    Legato    Intensity
Dim. 1 (Kinetics)      0.65    -0.28     -0.25
Dim. 2 (Energy)        0.33    -0.72      0.73

Unlike the studies on music and emotions, the authors focused on expressive intentions described by sensorial adjectives [2], which are frequently used in music performance, that is, light, heavy, soft, hard, bright, and dark, among others. Each of these descriptors has an opposite (soft versus hard) and provokes contrasting performances by the musician. Some performances played according to the different expressive intentions were evaluated in listening experiments. Factor analysis using the performances as variables found a two-dimensional space (Figure 1). The first factor is characterised by bright/light versus heavy performances, the second one by soft versus hard performances. Acoustical analysis of the performances (Table 1) showed that the first factor is closely correlated with Tempo and can be interpreted as the Kinetics (or Velocity) factor, while the second one is related to legato/staccato and Intensity and can be interpreted as the Energy factor. By legato/staccato we refer to how much the notes are detached and distinctly separated, measured as the ratio between the duration of a given note (i.e., the time between note onset and note offset) and the interonset interval (i.e., the time between two consecutive note onsets) which separates it from the subsequent note.
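In symbols, using the note-event notation introduced in Section 4 (O[n] for onsets and DR[n] for durations), this articulation ratio for note n can be written as follows (the name Leg is ours, introduced only for illustration):

\[ \mathrm{Leg}[n] = \frac{\mathrm{DR}[n]}{\mathrm{IOI}[n]} = \frac{\mathrm{DR}[n]}{O[n+1] - O[n]}, \]

so that values close to 1 correspond to legato playing and values well below 1 to staccato.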

We can use this interpretation of the Kinetics-Energy space as an indication of how listeners organised the performances in their own minds when focusing on sensory aspects. The robustness of this representation was confirmed by synthesising different and varying expressive intentions in a musical performance. We can notice that this representation lies at an abstraction level between the semantic one (such as emotion) and the physical one (such as timing deviations) and can thus be more effective in representing the listener's evaluation criteria [24].

3. Computer Systems for Expressive Music Performance

While the models described in the previous section were mainly developed for analysis and understanding purposes, they are also often used for synthesis. Starting from models of musical expression, several software systems for rendering expressive musical performances have been developed (see [25] for a review).

The systems for automatic expressive performance can be grouped into three general categories: (i) autonomous, (ii) feedforward, and (iii) feedback systems. Examples of each category are presented below.

Given a score, the purpose of all these systems is to calculate the so-called expressive deviations to be applied to each note, in order to change the duration and the intensity of the musical events notated in the traditional music score. A performance without expressive deviations is indicated by the term "nominal performance".

(1) Autonomous Systems. An Autonomous System (AS) is a collection of rules, obtained automatically, by which the musical score is processed without user participation and then played automatically.

As an example, the YQX system [26] uses a rule set obtained by means of machine learning algorithms developed in the research field of Artificial Intelligence. Widmer [27] developed an algorithm for the automatic extraction of rules, specifically designed to analyse musical performances. By applying the algorithm to a collection of performances of Mozart's piano sonatas, played on a Bösendorfer 290 SE grand piano, a rule set has been extracted that suggests expressive deviations in the rhythmic and melodic structures. For example, the staccato rule requires that, if two successive notes have the same pitch but the second has a longer temporal duration, then the first note should be performed staccato; the delay-next rule states that, if two notes of the same duration are followed by a longer one, then the last note should be played with a slight delay. As mentioned before, these rules were obtained from the training sets provided; in this sense, the rules can change if the system is trained on a different piano repertoire or with music performed by different pianists.

(2) Feedforward Systems. The systems characterised by a feedforward behaviour allow the user to annotate the score and to set the specific parameters of the system depending on the music score to be played, but they do not allow real-time control during the performance.

A historical example is Director Musices, a rule-based system for automatic performance developed at the Kungliga Tekniska Högskolan (KTH) in Stockholm, born in the 1980s [28] with the aim of improving the synthesis quality of computer systems for the singing voice. Starting from the work on musical structure analysis by Ipolyi [29], and in collaboration with the violinist Lars Fryden, Johan Sundberg formalised the set of relations between the musical structures and the expressive deviations. The method used to formalise most rules is called "analysis by synthesis" [30]. As a first step, a model based on experience and intuition is developed; it is subsequently applied to different musical examples, and the performances generated by this model are evaluated by listening, in order to identify additional changes to the model. The process is reiterated several times until a stable definition of the model is obtained. At this point, perceptual tests on a larger population are carried out in order to assess the rule system. The Director Musices input is a score in MIDI format; applying all the rules (or a subset of them), the system computes the expressive deviations for every single note, saving the results to a new MIDI file. The user can choose which rules to apply and how to weight them through numerical coefficients. The deviations calculated by each rule are added together, giving rise to complex profiles, as shown in Figure 2.

Figure 2: Example of the application of the KTH rule system (adapted from [30]). The figure shows the interonset interval (IOI, i.e., the time from one tone's onset to that of the next) deviations obtained by applying four rules (phrase arch at phrase and subphrase level, duration contrast, melodic charge, and punctuation) to the Swedish nursery tune Ekorrn satt i granen.

Shunji [31] is an algorithm for automatic performance that uses the case-based reasoning paradigm, a well-known artificial intelligence technique that learns from examples, alternative to the one used by the software YQX. Unlike the latter, however, Shunji is a supervised system: it does not provide a solution to the proposed problem in an autonomous way, but interacts with the user, proposing possible solutions and receiving controls to improve the result. Shunji is based on a hierarchical segmentation algorithm [32] inspired by the Generative Theory of Tonal Music by Lerdahl and Jackendoff [33]. The performances provided as examples to the algorithm are segmented using the hierarchical model, and the expressive deviations of each segment are stored in a database. When the system input is a score that has not been processed before, the software segments the score and then, for each segment, searches the database for the most similar segment in order to extract its performance characteristics. It then applies the expressive deviations thus obtained to the new score. The user can interact with the software by controlling the segmentation of the score, by providing new examples, by changing the parameters used to define the similarity between two musical phrases, and by choosing, among those that the system has considered most similar, the examples from which to extract the performance characteristics. This approach also works with a reduced number of examples, if they are well chosen.

(3) Feedback Systems. The systems described above work autonomously or require user intervention to define some performance details before replay. On the contrary, in feedback systems the user can control the performance in real time during replay: as an orchestral conductor controls the musicians' performance, the user can modify the expressive deviations in an interactive way.

In VirtualPhilharmony [34] the user is given a score. Before starting the performance, and with the help of some tools, (s)he has to choose an expressive model, that is, a set of expressive deviations relating to the articulation and agogic aspects of each single note. During the performance, a sensor measures the arm movements of the user, who mimics the gestures of an orchestral conductor with a wand in his/her hand. The system then modifies the temporal and dynamic characteristics of the performance in a consistent way, following the user's intentions and taking into account particular aspects of the conductor-musician interaction, including the latency between the conductor's gesture and the musician's execution, and the prediction of the next beat, which allows the musician to maintain the correct tempo between a conductor's gesture and the next one.

Figure 3: System architecture. CaRo 2.0 receives as input an annotated score in MusicXML format (prepared off-line with a score editor) and generates in real time, through the Naturalizer and Expressivizer modules driven by the user's expressive intention, the messages to play a MIDI controlled piano.

The differences between the systems described above concern both the algorithms for computing the expressive deviations and the aspects related to the user's interaction. In the autonomous systems, the user can interact with the system only through the selection of the training set and, as a consequence, of the performance style that the system will learn. The feedforward systems allow a deeper interaction with the model parameters: the user can set the parameters, listen to the results, and then fine-tune the parameters again until the results are satisfying. The feedback systems allow real-time control of the parameters: the user can change the parameters of the performance while (s)he is listening to it, in a similar way as a human musician does. Since the models for music performance usually have numerous parameters, a crucial aspect of the feedback systems is how to allow the user to simultaneously control all these parameters in real time. VirtualPhilharmony allows the user to control in real time only two parameters (intensity and tempo); the other ones are defined offline by means of a so-called performance template.

The system developed by the authors and described in the next sections is also a feedback system. As explained later, the real-time control of the parameters is achieved by means of a control space based on a semantic description of different expressive intentions. Starting from the trajectories drawn by the user on this control space, the system maps concepts such as emotions or sensations onto the low-level parameters of the model.

4. The CaRo 2.0 System

4.1. Architecture. CaRo 2.0 simulates the tasks of a pianist who reads a musical score, decodes its symbols, plans the expressive choices, and finally executes the actions needed to actually play the instrument. Usually pianists analyse the score very carefully before the performance, and they add annotations and cues to the musical sheet in order to emphasise rhythmic-melodic structures, section subdivisions, and other interpretative indications. As shown in Figure 3, CaRo 2.0 receives as input a musical score compliant with the MusicXML standard [35]. Any music editor able to export in MusicXML format

(e.g., Finale (http://www.finalemusic.com) or MuseScore (http://musescore.org)) can be used to write the score and add annotations. CaRo 2.0 is able to decode and process a set of expressive cues, as specified in the next section. The slurs can be hierarchically structured to individuate sections, motifs, and short musical cells. Moreover, textual labels can be added to specify the interpretative choices.

The specificity of CaRo 2.0, in comparison to the systems presented in Section 3, is that it is based on a model that explicitly distinguishes the expressive deviations related to the musical structure from those related to other musical aspects, such as the performer's expressive intentions. The computation is done in two steps. The first module, called Naturalizer, computes the deviations starting from the expressive cues annotated in the score. As the cues usually indicate particular elements of the score, such as a musical climax or the end of a section, the deviations have shapes that reflect the structure of the score and the succession of motifs and musical cells (see, e.g., Figure 12). The deviations computed by the Naturalizer correspond to a performance that is musically correct but without a particular expressive emphasis. The second module, called Expressivizer, modifies the deviations computed in the previous step in order to characterize the performance according to the user's expressive intentions.

4.2. Rendering. The graphical interface of CaRo 2.0 (see Figure 11) consists of a main window hosting the Kinetics-Energy space (see Section 2.3), used for the interactive control of the expressive intentions, and of a main menu through which the user can load an annotated musical score in MusicXML format and launch its execution. When the user loads the MusicXML score, the file is parsed and three data structures are generated: note events, expressive cues, and hierarchical structure. At this level, the musical score is seen as a collection of separate events EV[ID], with the event index ID = 1, ..., N. For instance, a score can be described as a collection of notes and rests. Each event is characterised by a time reference and a list of attributes (such as pitch and duration).

Table 2: Correspondence among graphical expressive cues, MusicXML representation, and rendering parameters.
- ∙ (staccato): <articulations><staccato default-x="3" default-y="13" placement="above"/></articulations> → DR[n]
- > (accent): <articulations><accent default-x="-1" default-y="13" placement="above"/></articulations> → KV[n]
- – (tenuto): <articulations><tenuto default-x="1" default-y="14" placement="above"/></articulations> → DR[n]
- breath mark: <articulations><breath-mark default-x="16" default-y="18" placement="above"/></articulations> → O[n]
- slur: <notations><slur number="1" placement="above" type="start"/></notations> → O[n], DR[n], KV[n]
- pedal: <direction-type><pedal default-y="-91" line="no" type="start"/></direction-type> → KV[n]

The first structure is a list of note events, each one described by the following fields: ID number, onset (ticks), duration (ticks), bar position (ticks), bar length (beats), voice number, grace (boolean), and pitch, where ticks and beats are common unit measures for representing musical durations (for the complete MIDI specification see http://midi.org/techspecs/midispec.php). The last field is a pointer to a list of pitches, each one represented by a string of three characters following the schema XYZ, where X is the name (from A to G) of the note, Y is the octave (from 0 to 8), and Z is the alteration (-1 stands for flat and +1

for sharp, ♯). A chord (more tones played simultaneously) is represented by a list of pitches with more than one entry; an empty list indicates a musical rest.

The second structure is a list of the expressive cues specified in the score. Each entry is defined by: type, ID of the linked event, and voice number. Among the many different ways to represent expressive cues in a score, the system is currently able to recognise and render the expressive cues listed in Table 2.

The third data structure describes the hierarchical structure of the piece, that is, its subdivision, top to bottom, into periods, phrases, subphrases, and motifs. Each section is specified by an entry composed of: begin (ticks), end (ticks), and hierarchical level number.
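As a rough illustration, the three data structures could be sketched as follows (a Python sketch; class and field names are ours, not the original CaRo 2.0 code):

from dataclasses import dataclass, field
from typing import List

@dataclass
class NoteEvent:
    id: int
    onset: int           # ticks
    duration: int        # ticks
    bar_position: int    # ticks
    bar_length: int      # beats
    voice: int
    grace: bool
    pitches: List[str] = field(default_factory=list)  # XYZ strings; empty list = rest

@dataclass
class ExpressiveCue:
    cue_type: str        # e.g. "staccato", "accent", "slur-start" (see Table 2)
    event_id: int        # ID of the linked note event
    voice: int

@dataclass
class Section:
    begin: int           # ticks
    end: int             # ticks
    level: int           # hierarchical level number

# example: a staccato note C5 (natural) in voice 1
ev = NoteEvent(id=1, onset=0, duration=240, bar_position=0,
               bar_length=4, voice=1, grace=False, pitches=["C50"])
cue = ExpressiveCue(cue_type="staccato", event_id=ev.id, voice=1)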

For clarity reasons, it is useful to express the duration of the musical events in seconds instead of metric units such as ticks and beats. This conversion is possible by taking the tempo marking written in the score, if any, or by normalising the total score length to the performance duration. This representation is called nominal time t_nom; the event onset is called nominal onset time O_nom[n], the event length is called nominal duration DR_nom[n], and the distance between two adjacent events is called nominal interonset interval:

\[ \mathrm{IOI}_{\mathrm{nom}}[n] = O_{\mathrm{nom}}[n+1] - O_{\mathrm{nom}}[n]. \tag{1} \]

When the user selects the command to start the execution of the score, a timer is instantiated to provide a temporal reference for the processing of the musical events. Let t_p be the time elapsed from the beginning of the performance. Since the system is interactive and the results depend on the user's input at time t_p, the expressive deviations (and therefore the values of each note event) have to be computed just in time, before they are sent to the MIDI interface. The deviations are calculated by the PlayExpressive() function, which is periodically called by the timer with a time interval T. This interval is a critical parameter of the system and its value depends on

a trade-off between two conditions. First, the time interval must be large enough to allow the processing and playing of the note events, that is, T > t_C + t_M, where t_C is the time for calculating the expressive deviations, generally negligible, and t_M is the time to transmit the note events by means of the MIDI protocol, a serial protocol with a transfer rate of 31250 bps. Since a note event is generally coded with three bytes (and each byte, including the start and stop bits, is composed of 10 bits), the time to transfer a note event is about 1 ms. Therefore, assuming that a piano score can have at most 10 simultaneous notes, T should be greater than 10 ms. The second condition is that T should be small enough to give the user the feeling of real-time control of the expressive intentions, that is, T < t_E, where t_E is the time the user needs to recognise that the expressive intention has changed. Estimating t_E is not a trivial task, because many aspects have to be considered, such as the cognitive processes and the cultural and musical context. However, experimental evidence suggests that certain affective nuances can be recognised even in musical excerpts with a length of 250 ms [36]. A preliminary test showed that a period T = 100 ms guarantees the on-time processing of the musical events and was judged sufficient to satisfy the real-time requirement in this context. Figure 4 shows how the PlayExpressive() function works.
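The timing budget can be checked with a quick back-of-the-envelope computation (a sketch; the constants simply restate the assumptions made above):

MIDI_BAUD = 31250            # bits per second
BITS_PER_BYTE = 10           # 8 data bits plus start and stop bits
BYTES_PER_NOTE_EVENT = 3     # typical MIDI channel message

t_event = BYTES_PER_NOTE_EVENT * BITS_PER_BYTE / MIDI_BAUD   # ~0.96 ms per note event
t_M = 10 * t_event           # at most 10 simultaneous notes -> ~9.6 ms
t_C = 0.0                    # computation time, assumed negligible

T = 0.100                    # the period adopted by the system (100 ms)
assert t_C + t_M < T < 0.250 # above the MIDI bound, below the 250 ms perceptual bound [36]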

(1) Naturalizer. The Naturalizer computes the deviations that are applied to the nominal parameters to obtain a natural performance. Each time the PlayExpressive() function is called by the timer, the note events list, which contains all the notes to be played, is searched for the events with an onset time O_nom[n] such that t_p < O_nom[n] ≤ t_p + T. If no event is found, the function ends; otherwise, the expressive cues list is searched for cues linked to the notes to be played. For each expressive cue that is found, a modifier is instantiated. The modifier is a data structure that contains the deviations to be applied to the expressive parameters as a consequence of the expressive cue written in the score. The parameters influenced by the different expressive cues are listed in Table 2. Depending on the type of cue, the modifier can work on a single event only, identified by its ID number, or on a sequence of events, identified by a temporal interval (the modifier's scope) and a voice number. The expressive deviations can be additive or proportional, depending on the type of cue and on the expressive parameters. The deviations are applied by the successive module, named deviation rules. This module combines the deviations defined by all the modifiers whose temporal scope falls between t_p and t_p + T, following a set of rules. The output is, for each note, a set of expressive parameters which define how the note is played. Although CaRo 2.0 can control several different musical instruments (e.g., violin, flute, and clarinet) with a variable number of parameters, this paper is focused on piano performances, and the parameters computed by the rule system are IOI_nat, DR_nat, and KV_nat, where the subscript nat identifies the parameters of the natural performance and KV is the Key-Velocity, a parameter of the MIDI protocol related to the sound intensity.

Figure 4: Schema of the PlayExpressive() function (blocks: notes to be played, expressive cues, new modifier, sync rules, voice-leading rules, and deviation rules for the Naturalizer; read user input, expressive intention processing, and constraint rules for the Expressivizer; the resulting notes are sent to the player queue).
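A highly simplified, self-contained sketch of one PlayExpressive() cycle is given below. It is ours, not the original CaRo 2.0 code: note events are plain dictionaries, only two cue types are handled, and the numeric deviations are invented for illustration; only the overall flow (Naturalizer followed by Expressivizer, then the player queue) follows the description above.

from queue import Queue

def play_expressive(t_p, T, note_events, cue_by_event, features, player_queue):
    # Naturalizer: collect the events whose nominal onset falls in (t_p, t_p + T]
    window = [ev for ev in note_events if t_p < ev["onset_nom"] <= t_p + T]
    for ev in window:
        dr, kv = ev["dr_nom"], ev["kv_nom"]
        cue = cue_by_event.get(ev["id"])
        if cue == "staccato":                       # cue-driven (natural) deviation
            dr *= 0.5
        elif cue == "accent":
            kv = min(127, kv + 15)
        # Expressivizer: warp the natural values with the features mapped
        # from the current position on the control space
        dr = dr * features["tempo"]
        kv = int(min(127, kv * features["intensity"]))
        player_queue.put((ev["id"], dr, kv))        # send the note to the player queue

# usage sketch
q = Queue()
events = [{"id": 1, "onset_nom": 0.05, "dr_nom": 0.4, "kv_nom": 80}]
play_expressive(0.0, 0.1, events, {1: "staccato"}, {"tempo": 1.2, "intensity": 0.9}, q)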

(2) Expressivizer. The next step of the rendering algorithm is called Expressivizer; its aim is to modify the expressive deviations computed by the Naturalizer in order to render the user's different expressive intentions. It is based on the multilayer model shown in Figure 5: the user specifies a point or a trajectory in an abstract control space; the coordinates of the control space are mapped onto a set of expressive features; finally, the features are used to compute the deviations that must be applied to the score for conveying a particular expressive intention. As an example, Figure 7 shows a mapping surface between points of the control space and the feature Intensity: based on the movements of the pointer on the x-y plane, the values of the Intensity feature are computed. The user can select a single expressive intention, and the system will play the entire score with that nuance. Trajectories can also be drawn in the space to experience the dynamic aspects of the expressive content.

Figure 5: Multilayer model. The semantic spaces (Kinetics-Energy space, Valence-Arousal space) are mapped onto the features (Tempo and Δtempo, Intensity and Δintensity, Articulation and Δarticulation, ...), which in turn determine the physical events O[n], DR[n], KV[n].

The abstract control space S is represented as a set of N triples S_i = (l_i, p_i, e_i), with i = 1, ..., N, where l_i is a verbal label that semantically identifies an expressive intention (e.g., bright, light, and soft), p_i ∈ R^2 represents the coordinates in the control space of the expressive intention l_i, and e_i ∈ R^k is a vector of features which characterise the expressive intention l_i.

S defines the values of the expressive features for the N points of coordinates p_i. To have a continuous control space, in which each point of the control space is associated with a feature vector, a mapping function f: R^2 → R^k is defined as

\[
f(x) =
\begin{cases}
e_i, & \text{if } x = p_i,\\[4pt]
\displaystyle\sum_{i=1}^{N} a_i \cdot e_i, & \text{elsewhere,}
\end{cases}
\tag{2}
\]
with
\[
\frac{1}{a_i} = \left\| x - p_i \right\|^2 \cdot \sum_{j=1}^{N} \frac{1}{\left\| x - p_j \right\|^2}.
\tag{3}
\]

Figure 6: The options window (designed for musicians) that allows the fine tuning of the model. For the general public it is possible to load preset configurations (see the button "load parameters from file").

Figure 7: Map between the x-y coordinates of the control space and the Intensity feature, with the expressive intentions Happy, Natural, Sad, and Angry placed in the space.

Let x be a point in the control space; the mapping function calculates the corresponding feature vector y = f(x) as a linear combination of the N feature vectors, one for each expressive intention defined in S. The feature vectors are weighted inversely proportionally to the square of the Euclidean distance between their corresponding expressive intention and x. Finally, the resulting feature vector can be used to compute the expressive deviations, represented by the physical layer in Figure 5. For example, the Tempo feature is used to compute the parameters of the nth physical event from the output of the Naturalizer module, according to the following equations:

\[
O_{\mathrm{exp}}[n] = O_{\mathrm{nat}}[n-1] + \left( O_{\mathrm{nat}}[n] - O_{\mathrm{nat}}[n-1] \right) \cdot \mathrm{Tempo},
\qquad
\mathrm{DR}_{\mathrm{exp}}[n] = \mathrm{DR}_{\mathrm{nat}}[n] \cdot \mathrm{Tempo},
\tag{4}
\]

where O_exp and O_nat are the onset times of the expressive and natural performance, respectively, and DR_exp and DR_nat are the corresponding event durations.
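The following sketch (ours, not the original implementation; NumPy assumed, feature values invented) shows how equations (2)-(4) can be put together: the feature vector of a point x is the inverse-square-distance weighted average of the vectors e_i, and the resulting Tempo value warps the natural onsets and durations.

import numpy as np

def map_features(x, points, features):
    # x: (2,) query point; points: (N, 2) coordinates p_i; features: (N, k) vectors e_i
    d2 = np.sum((points - x) ** 2, axis=1)
    if np.any(d2 == 0):               # x coincides with some p_i: return e_i exactly
        return features[np.argmin(d2)]
    w = 1.0 / d2
    a = w / w.sum()                   # the weights a_i of equation (3)
    return a @ features               # equation (2)

def apply_tempo(onsets_nat, durations_nat, tempo):
    # Equation (4); the first onset is kept unchanged (our assumption)
    onsets_exp = [onsets_nat[0]] + [
        onsets_nat[n - 1] + (onsets_nat[n] - onsets_nat[n - 1]) * tempo
        for n in range(1, len(onsets_nat))
    ]
    durations_exp = [dr * tempo for dr in durations_nat]
    return onsets_exp, durations_exp

# usage sketch: two intentions with (Tempo, Intensity) feature vectors
p = np.array([[1.0, 1.0], [-1.0, -1.0]])
e = np.array([[1.2, 1.1], [0.8, 0.7]])
tempo, intensity = map_features(np.array([0.2, 0.4]), p, e)
print(apply_tempo([0.0, 0.5, 1.0], [0.4, 0.4, 0.8], tempo))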

4.3. User-Defined Control Space for Interaction. The abstract control space constitutes the interface between the user's concepts of expression and their internal representation in the system. The user, following his/her own preferences, can create the semantic space which controls the expressive intention of the performance. Each user can select his/her own expressive ideas and then design the mapping of these concepts to positions and movements on this space.

Figure 8: A control space designed for an educational context. The position of the emoticons follows the results of [37].

Figure 9: The control space is used to change interactively the sound comment of the multimedia product (in this case, a cartoon from the movie "The Good, the Bad and the Ugly" by Sergio Leone). The author can associate different expressive intentions with the characters (see the blue labels 1, 2, and 3).

A preferences window (see Figure 6) allows the user to define the set of performance styles by specifying, for the ith style, a verbal semantic label l_i, the coordinates in the control space p_i, and the feature vector e_i.

As a particular and significant case, the emotional and sensory control metaphors can be used. For example, a musician may find it more convenient to use the Kinetics-Energy space (see Figure 1), because it is based on terms such as bright, dark, or soft that are commonly used to describe musical nuances. On the contrary, a nonmusician or a child may prefer the Valence-Arousal space, emotions being a very common experience for every listener. The sensory space can be defined by placing the semantic labels (with their associated features) Hard, Soft, Dark, and Bright in four orthogonal points of the space. Similarly, the Valence-Arousal space can be obtained by placing the emotional labels Happy, Calm, Sad, and Angry at the four corners of the control space. These spaces are implemented as presets of the system, which can be selected by the user (see, e.g., Figure 8).
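In terms of the triples S_i = (l_i, p_i, e_i) introduced above, a preset could be declared as in the following sketch (corner assignments and feature values are invented for illustration and do not come from the paper):

VALENCE_AROUSAL_PRESET = [
    # label    p_i (x, y)       e_i = (Tempo, Intensity, Articulation)
    ("Happy", ( 1.0,  1.0),    (1.15, 1.10, 0.95)),
    ("Angry", (-1.0,  1.0),    (1.20, 1.30, 0.70)),
    ("Sad",   (-1.0, -1.0),    (0.80, 0.70, 1.10)),
    ("Calm",  ( 1.0, -1.0),    (0.90, 0.85, 1.05)),
]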

Specific and new expressive intentions can be associated with objects/avatars placed on the screen, and a movement of the pointer from one object to another changes continuously the resulting music expression. For example, a different type of music can be associated with each of the characters represented in the image. Moreover, the object, with its expressive label and parameters, can be moved, so that the control space can vary dynamically (see, e.g., Figure 9).

Figure 10: The music excerpt used to evaluate the CaRo 2.0 software (Rondo, Allegretto, from the 3rd movement of Beethoven's Piano Sonata No. 16, Op. 31 No. 1). The score has been transcribed in MusicXML format, including all the expressive indications (slurs, articulation marks, and dynamic cues).

Figure 11: The graphical user interface of CaRo 2.0. The dotted line shows the trajectory drawn for rendering the hbls performance.

More generally, the system allows a creative design of abstract spaces according to the artist's fantasies and needs, by inventing new expressions and their spatial relations.

5. Results

This section reports the results of some expressive music renderings obtained with CaRo 2.0, with the aim of showing how the system works. The beginning of the 3rd movement of the Piano Sonata No. 16 by L. van Beethoven (see Figure 10) was chosen as a case study because it is characterised by a large number of expressive nuances and, at the same time, it is written with a quite simple musical texture (four voices at most), which makes it easier to visualise and understand the results. Seven performances of the same Beethoven Sonata were rendered: five performances are characterised by only one expressive intention (bright, hard, heavy, light, and soft) and were obtained keeping the mouse fixed on the corresponding adjective (see the GUI of Figure 11); one performance, named hbls, was obtained by drawing a trajectory on the two-dimensional control space going from hard to bright, light, and soft (see the dotted line of Figure 11); finally, a nominal performance was rendered applying no expressive rule or transformation to the music score.

Figure 12: Piano-roll representation (pitch and key velocity over time) of the performances characterized by the heavy (b) and light (c) expressive intentions. The nominal performance (a) has been added as a reference.

The system rendered the different performances correctly, playing the sequences of notes written in the score. Moreover, the expressive deviations were computed depending on the user's preferences, specified by means of the control space. Figure 12 shows the piano-roll representation of the nominal performance compared to the performances characterised by the expressive intentions heavy and light. It can be noted that the onset, duration, and Key-Velocity of the notes change as a function of the expressive intention. (A piano-roll representation is a method of representing a musical stimulus or score for later computational analyses. It can be conceived as a two-dimensional graph: the vertical axis is a digital representation of the different notes, and the horizontal axis is a continuous representation of time. When a note is played, a horizontal line is drawn on the piano-roll representation. The height of this line represents which note was being played, the beginning of the line represents the note's onset, the length of the line represents the note's duration, and the end of the line represents the note's offset [38].) The analysis of the main expressive parameters (see Figures 12 and 13) shows that these values match the corresponding parameters of the real performances analysed in [39]. Moreover, Figure 14 shows how the expressive parameters change gradually, following the user's movements between different expressive intentions.
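As a toy illustration of this representation (ours; matplotlib assumed, data invented), a piano-roll view with the associated key velocities can be drawn from a list of (onset, duration, pitch, velocity) tuples:

import matplotlib.pyplot as plt

notes = [(0.0, 0.50, 60, 80), (0.5, 0.25, 64, 95), (0.75, 0.25, 67, 70)]

fig, (ax_roll, ax_vel) = plt.subplots(2, 1, sharex=True)
for onset, dur, pitch, vel in notes:
    ax_roll.hlines(pitch, onset, onset + dur, linewidth=3)   # one horizontal segment per note
    ax_vel.vlines(onset, 0, vel)                             # key velocity at each onset
ax_roll.set_ylabel("Pitch")
ax_vel.set_ylabel("Velocity")
ax_vel.set_xlabel("Time (s)")
plt.show()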

6. Assessment

The most important public forum worldwide in which computer systems for expressive music performance are assessed is the Performance RENdering CONtest (Rencon), which was initiated in 2002 by Haruhiro Katayose and colleagues as a competition among different systems (http://renconmusic.org). Over the years, the Rencon contest evolved toward a more structured format. It is an annual international competition in which entrants present computer systems they have developed for generating expressive musical performances, which audience members and organisers judge. One of the goals of Rencon is to address a problem common to researchers, namely the difficulty of evaluating/grading the performances generated by one's own computer system within an individual endeavour, by having the entrants meet together at the same site [40]. An assessment (from the musical performance expressiveness point of view) of CaRo 2.0 was carried out by participating in Rencon 2011. Our system qualified for the final stage, where the participating systems were presented during a dedicated workshop. The performances played were openly evaluated by a large (77 participants) and qualified audience of musicians and scientists, under the same conditions, in the manner of a competition. Each of the five entrant systems was required to generate two different performances of the same music piece, in order to evaluate the adaptability of the system. After listening to each performance, the listeners were asked to rate it from the viewpoint of "how much do you give applause to the performance" on a 10-point scale (1 = nothing, 10 = ovation). The CaRo 2.0 system won that final stage (see Table 3). The results, as well as the comments by the experts, are analysed in [41]. The performances played during Rencon 2011 can be listened to at http://smc.dei.unipd.it/advances_hci.

Table 3: Results of the final stage of the Rencon 2011 contest. Each system was required to render two different performances of the same music piece. Each performance was evaluated both by the audience attending the contest live and by people connected online through a streaming service.

System               | Perf. A: Live / Online / Total | Perf. B: Live / Online / Total | Total score
CaRo 2.0             | 464 / 56 / 520                 | 484 / 61 / 545                 | 1065
YQX                  | 484 / 64 / 548                 | 432 / 65 / 497                 | 1045
VirtualPhilharmony   | 452 / 46 / 498                 | 428 / 32 / 460                 | 958
Director Musices     | 339 / 33 / 372                 | 371 / 13 / 384                 | 756
Shunji System        | 277 / 24 / 301                 | 277 / 20 / 297                 | 598

Figure 13: The Key-Velocity values, per note number, of the performances characterised by the different expressive intentions (bright, hard, heavy, light, and soft). Only the melodic line is reported in order to have a more readable graph.

Figure 14: The values of the three main expressive parameters (log IOI, log L, and log KV, normalised, per note number) of the performance obtained by drawing the hard-bright-light-soft (hbls) trajectory shown in Figure 11.

7. Conclusion

CaRo 2.0 could be integrated into the browser engines of music community services. In this way, (i) the databases keep track of the expressiveness of the songs played and (ii) the users create/browse playlists based not only on the song title or the artist name, but also on the music expressiveness. This allows that (i) in rhythm games, the performance will be rated on the basis of the expressive intentions of the user, with advantages for the educational value and the user's involvement in the game, and (ii) in the music community, the user will be able to search for happy or sad music according to his/her affective state or preferences.

Rhythm games, despite their big commercial success (often bigger than that of the original music albums), have a gameplay that is almost entirely oriented around the player's interactions with a musical score or individual songs, by means of pressing specific buttons or activating controls on a specialized game controller. In these games, the reaction of the virtual audience (i.e., the score awarded) and the result of the battle/cooperative mode are based on the performance of the player as judged by the PC. Up to now, game designers have not considered the player's expressiveness. As future work, we intend to insert the CaRo system in these games, so that a music performance is rated "perfect" not only if it is played exactly like the score, but considering also its


interpretation, its expressiveness. In this way, these games can be used profitably also in the educational field.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] N. Bernardini and G. De Poli, "The sound and music computing field: present and future," Journal of New Music Research, vol. 36, no. 3, pp. 143–148, 2007.

[2] S. Canazza, G. De Poli, A. Rodà, and A. Vidolin, "An abstract control space for communication of sensory expressive intentions in music performance," Journal of New Music Research, vol. 32, no. 3, pp. 281–294, 2003.

[3] S. Canazza, G. De Poli, A. Rodà, and A. Vidolin, "Expressiveness in music performance: analysis, models, mapping, encoding," in Structuring Music through Markup Language: Designs and Architectures, J. Steyn, Ed., pp. 156–186, IGI Global, 2012.

[4] S. Canazza, G. De Poli, C. Drioli, A. Rodà, and A. Vidolin, "Audio morphing different expressive intentions for multimedia systems," IEEE Multimedia, vol. 7, no. 3, pp. 79–83, 2000.

[5] D. Fabian, R. Timmers, and E. Schubert, Eds., Expressiveness in Music Performance: Empirical Approaches Across Styles and Cultures, Oxford University Press, Oxford, UK, 2014.

[6] G. De Poli, "Methodologies for expressiveness modelling of and for music performance," Journal of New Music Research, vol. 33, no. 3, pp. 189–202, 2004.

[7] G. Widmer and P. Zanon, "Automatic recognition of famous artists by machine," in Proceedings of the 16th European Conference on Artificial Intelligence (ECAI '04), pp. 1109–1110, Valencia, Spain, 2004.

[8] A. Gabrielsson, "Expressive intention and performance," in Music and the Mind Machine, R. Steinberg, Ed., pp. 35–47, Springer, Berlin, Germany, 1995.

[9] P. Juslin, "Emotional communication in music performance: a functionalist perspective and some data," Music Perception, vol. 14, no. 4, pp. 383–418, 1997.

[10] S. Canazza, G. De Poli, and A. Vidolin, "Perceptual analysis of the musical expressive intention in a clarinet performance," in Music, Gestalt, and Computing, M. Leman, Ed., pp. 441–450, Springer, Berlin, Germany, 1997.

[11] G. De Poli, A. Rodà, and A. Vidolin, "Note-by-note analysis of the influence of expressive intentions and musical structure in violin performance," Journal of New Music Research, vol. 27, no. 3, pp. 293–321, 1998.

[12] R. Bresin and A. Friberg, "Emotional coloring of computer-controlled music performances," Computer Music Journal, vol. 24, no. 4, pp. 44–63, 2000.

[13] F. B. Baraldi, G. De Poli, and A. Rodà, "Communicating expressive intentions with a single piano note," Journal of New Music Research, vol. 35, no. 3, pp. 197–210, 2006.

[14] S. Hashimoto, "KANSEI as the third target of information processing and related topics in Japan," in Proceedings of the International Workshop on Kansei-Technology of Emotion, pp. 101–104, 1997.

[15] S. Hashimoto, "Humanoid robots for kansei communication: computer must have body," in Machine Intelligence: Quo Vadis?, P. Sincak, J. Vascak, and K. Hirota, Eds., pp. 357–370, World Scientific, 2004.

[16] R. Picard, Affective Computing, MIT Press, Cambridge, Mass, USA, 1997.

[17] S. Canazza, G. De Poli, C. Drioli, A. Rodà, and A. Vidolin, "Modeling and control of expressiveness in music performance," Proceedings of the IEEE, vol. 92, no. 4, pp. 686–701, 2004.

[18] B. H. Repp, "Diversity and commonality in music performance: an analysis of timing microstructure in Schumann's 'Traumerei'," The Journal of the Acoustical Society of America, vol. 92, no. 5, pp. 2546–2568, 1992.

[19] B. H. Repp, "The aesthetic quality of a quantitatively average music performance: two preliminary experiments," Music Perception, vol. 14, no. 4, pp. 419–444, 1997.

[20] L. Mion, G. De Poli, and E. Rapana, "Perceptual organization of affective and sensorial expressive intentions in music performance," ACM Transactions on Applied Perception, vol. 7, no. 2, article 14, 2010.

[21] S. Dixon and W. Goebl, "The 'air worm': an interface for real-time manipulation of expressive music performance," in Proceedings of the International Computer Music Conference (ICMC '05), Barcelona, Spain, 2005.

[22] A. Rodà, S. Canazza, and G. De Poli, "Clustering affective qualities of classical music: beyond the valence-arousal plane," IEEE Transactions on Affective Computing, vol. 5, no. 4, pp. 364–376, 2014.

[23] P. N. Juslin and J. A. Sloboda, Music and Emotion: Theory and Research, Oxford University Press, 2001.

[24] A. Camurri, G. De Poli, M. Leman, and G. Volpe, "Communicating expressiveness and affect in multimodal interactive systems," IEEE Multimedia, vol. 12, no. 1, pp. 43–53, 2005.

[25] A. Kirke and E. R. Miranda, "A survey of computer systems for expressive music performance," ACM Computing Surveys, vol. 42, no. 1, article 3, pp. 1–41, 2009.

[26] G. Widmer, S. Flossmann, and M. Grachten, "YQX plays Chopin," AI Magazine, vol. 30, no. 3, pp. 35–48, 2009.

[27] G. Widmer, "Discovering simple rules in complex data: a meta-learning algorithm and some surprising musical discoveries," Artificial Intelligence, vol. 146, no. 2, pp. 129–148, 2003.

[28] J. Sundberg, A. Askenfelt, and L. Fryden, "Musical performance: a synthesis-by-rule approach," Computer Music Journal, vol. 7, no. 1, pp. 37–43, 1983.

[29] I. Ipolyi, "Innforing i musiksprakets opprinnelse och struktur," TMH-QPSR, vol. 48, no. 1, pp. 35–43, 1952.

[30] A. Friberg, R. Bresin, and J. Sundberg, "Overview of the KTH rule system for musical performance," Advances in Cognitive Psychology, Special Issue on Music Performance, vol. 2, no. 2-3, pp. 145–161, 2006.

[31] S. Tanaka, M. Hashida, and H. Katayose, "Shunji: a case-based performance rendering system attached importance to phrase expression," in Proceedings of the Sound and Music Computing Conference (SMC '11), F. Avanzini, Ed., pp. 1–2, 2011.

[32] M. Hamanaka, K. Hirata, and S. Tojo, "Implementing 'a generative theory of tonal music'," Journal of New Music Research, vol. 35, no. 4, pp. 249–277, 2006.

[33] F. Lerdahl and R. Jackendoff, A Generative Theory of Tonal Music, MIT Press, Cambridge, Mass, USA, 1983.

[34] T. Baba, M. Hashida, and H. Katayose, "A conducting system with heuristics of the conductor 'virtualphilharmony'," in Proceedings of the New Interfaces for Musical Expression Conference (NIME '10), pp. 263–270, Sydney, Australia, June 2010.

[35] M. Good, "MusicXML for notation and analysis," in The Virtual Score: Representation, Retrieval, Restoration, W. B. Hewlett and E. Selfridge-Field, Eds., pp. 113–124, MIT Press, Cambridge, Mass, USA, 2001.

[36] I. Peretz, L. Gagnon, and B. Bouchard, "Music and emotion: perceptual determinants, immediacy, and isolation after brain damage," Cognition, vol. 68, no. 2, pp. 111–141, 1998.

[37] R. Plutchik, Emotion: A Psychoevolutionary Synthesis, Harper & Row, New York, NY, USA, 1980.

[38] D. Temperley, The Cognition of Basic Musical Structures, MIT Press, Cambridge, Mass, USA, 2001.

[39] S. Canazza, G. De Poli, S. Rinaldin, and A. Vidolin, "Sonological analysis of clarinet expressivity," in Music, Gestalt, and Computing: Studies in Cognitive and Systematic Musicology, vol. 1317 of Lecture Notes in Computer Science, pp. 431–440, Springer, Berlin, Germany, 1997.

[40] H. Katayose, M. Hashida, G. De Poli, and K. Hirata, "On evaluating systems for generating expressive music performance: the Rencon experience," Journal of New Music Research, vol. 41, no. 4, pp. 299–310, 2012.

[41] G. De Poli, S. Canazza, A. Rodà, and E. Schubert, "The role of individual difference in judging expressiveness of computer assisted music performances by experts," ACM Transactions on Applied Perception, vol. 11, no. 4, article 22, 2014.

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 4: Research Article CaRo 2.0: An Interactive System for

4 Advances in Human-Computer Interaction

structures For example the staccato rule requires that iftwo successive notes have the same pitch but the secondhas a longer temporal duration then the first note shouldbe performed staccato and the delay-next rule states thatif two notes of the same duration are followed by a longerone then the last note should be played with a slight delayAs mentioned before these rules were obtained from thetraining sets provided in this sense the rules can change ifthe system is trained on a different piano repertoire or withmusic performed by different pianists

(2) Feedforward Systems The systems characterised by afeedforward behaviour allow the user to annotate the scoreand to set the specific parameters of the system depending onthemusic score to be played but they do not allow a real-timecontrol during the performance

A historical example is Director Musices a rule basedsystem for automatic performance developed by the KungligaTekniska Hogskolan (KTH) in Stockholm born in the 1980s[28] with the aim of improving the synthesis quality ofcomputer systems for the singing voice Starting from thework of musical structures analysis by Ipolyi [29] and incollaboration with the violinist Fryden Lars Johan Sundbergformalised the relation set between the musical structuresand the expressive deviations The method used to formalisemost rules is called ldquoanalysis by synthesisrdquo [30] As a firststep a model based on experience and intuition is developedand it is subsequently applied to different musical examplesPerformances generated by this model are evaluated bylistening to identify additional changes to the model Theprocess is reiterated several times until a stable definitionof the model is obtained At this point perceptual tests ona larger population are carried out in order to assess therule system The Director Musices input is a score in MIDIformat applying all the rules (or a subset of them) Thenit computes the expressive deviations for every single notesaving the results to a new MIDI file The user can choosewhich rules to apply and how toweight them through numer-ical coefficients The deviations calculated by each rule areadded together giving rise to complex profiles as shown inFigure 2

The Shunji [31] is an algorithm for automatic performancethat uses the case-based reasoning paradigm the well-knownartificial intelligence technique that learns from examplesalternative to those used by the software YQX Unlike thelatter however Shunji is a supervised system not providinga solution to the problem proposed in an autonomous waybut interactingwith the user proposing possible solutions andreceiving controls to improve the result Shunji is based ona hierarchical segmentation algorithm [32] inspired by theGenerative Theory of Tonal Music by Lerdahl and Jackendoff[33] The performances provided as an example to thealgorithm are segmented using the hierarchical model andthe expressive deviations of each segment are stored in adatabase When the system input is a score that had not beenprocessed before the software segments the score and thenfor each segment searches the database for the most similarsegment in order to extract the performance characteristicsThen it applies the expressive deviations thus obtained to

Melodic charge (k = 2)

Duration contrast

Punctation

Phrase arch (subphrase level)

Phrase arch (phrase level)

Resulting IOI variations

15105

minus5

minus10

1520

105

minus5minus10minus15

105

minus5minus10

105

minus5

5

minus5

minus10

1015

5

minus5

1 2 3 4 5 6 7 8 9 10 11 12 13 14

1 2 3 4 5 6 7 8 9 10 11 12 13 14

1 2 3 4 5 6 7 8 9 10 11 12 13 14

1 2 3 4 5 6 7 8 9 10 11 12 13 14

1 2 3 4 5 6 7 8 9 10 11 12 13 14

1 2 3 4 5 6 7 8 9 10 11 12 13 14

Figure 2 Example of the rules system by KTH application (adaptedfrom [30]) In particular the figure shows the resulting interonsetinterval (IOI ie the time from one tonersquos onset to that of thenext) deviations by applying of four rules (phrase arch durationcontrast melodic charge and punctuation) to the Swedish nurserytune Ekorrn satt i granen

the new score The user can interact with the software bycontrolling the segmentation of the score by providing newexamples by changing the parameters used to define thesimilarity between two musical phrases and by choosing theexamples to extract the performance characteristics amongthose that the system has considered to be more similar Thisapproach also works using a reduced number of examples ifthey are well chosen

(3) Feedback Systems The systems described above workindependently or require user intervention to define someperformance details before replay On the contrary in thefeedback systems the user can control the performancein real time during replay As the orchestral conductorcontrols the musiciansrsquo performance the user canmodify theexpressive deviations in an interactive way

In the VirtualPhilharmony [34] the user is given a scoreBefore starting the performance and with the help of sometools (s)he has to choose an expressive model that is a setof expressive deviations in relation to the articulation andagogic aspects of each single note During the performancea sensor measures the userrsquos arm movements who mimicsthe gestures of an orchestral conductor with a wand inhisher hand The system then modifies the temporal anddynamic characteristics of the performance in a consistentway following the userrsquos intentions taking into account theparticular aspects of the interaction conductor-musicianincluding the latency time between the conductorrsquos gestureand the musician execution and the prediction of the next

Advances in Human-Computer Interaction 5

Music score

EditorAnnotations

piano

protocol

Naturalizer Expressivizer

CaRo

Off-line Real-time

intention

controlledMIDI

MIDI

Userrsquos expressive

MusicXMLfile

Figure 3 System architecture CaRo 20 receives as input an annotated score in MusicXML format and generates in real-time messages toplay a MIDI controlled piano

beat which allows the musician to maintain the correcttempo between a conductorrsquos gesture and the next gesture

The differences between the systems described aboveinclude both the algorithms for computing the expressivedeviations and the aspects related to the userrsquos interactionIn the autonomous systems the user can interact with thesystem only in the selection of the training set and as aconsequence the performance style that the systemwill learnThe feedforward systems allow a deeper interaction with themodel parameters the user can set the parameters listen tothe results and then fine-tune the parameters again until theresults are satisfying The feedback systems allow a real-timecontrol of the parameters the user can change the parametersof the performance while (s)he is listening to it in a similarway as a human musician does Since the models for musicperformance usually have numerous parameters a crucialaspect of the feedback systems is how to allow the userto simultaneously control all these parameters in real-timeVirtualPhilharmony allows the user to control in real-timeonly two parameters (intensity and tempo) the other ones aredefined offline by mean of a so-called performance template

Also the system developed by the authors and describedin the next sections is a feedback system As explained laterthe real-time control of the parameters is allowed by meansof a control space based on a semantic description of differentexpressive intentions Starting from the trajectories drawn bythe user on this control space the systemmaps concepts suchas emotions or sensations in the low level parameters of themodel

4. The CaRo 2.0 System

4.1. Architecture. CaRo 2.0 simulates the tasks of a pianist who reads a musical score, decodes its symbols, plans the expressive choices, and finally executes the actions in order to actually play the instrument. Usually pianists analyse the score very carefully before the performance, and they add annotations and cues to the musical sheet in order to emphasise rhythmic-melodic structures, section subdivisions, and other interpretative indications. As shown in Figure 3, CaRo 2.0 receives a musical score as input, compliant with the MusicXML standard [35]. Any music editor able to export in MusicXML format

(e.g., Finale (http://www.finalemusic.com) or MuseScore (http://musescore.org)) can be used to write the score and add annotations. CaRo 2.0 is able to decode and process a set of expressive cues, as specified in the next section. The slurs can be hierarchically structured to individuate sections, motifs, and short musical cells. Moreover, textual labels can be added to specify the interpretative choices.

The specificity of CaRo 2.0, in comparison to the systems presented in Section 3, is that it is based on a model that explicitly distinguishes the expressive deviations related to the musical structure from those related to other musical aspects, such as the performer's expressive intentions. The computation is done in two steps. The first module, called Naturalizer, computes the deviations starting from the expressive cues annotated in the score. As the cues usually indicate particular elements of the score, such as a musical climax or the end of a section, the deviations have shapes that reflect the structure of the score and the succession of motifs and musical cells (see, e.g., Figure 12). The deviations computed by the Naturalizer correspond to a performance that is musically correct but without a particular expressive emphasis. The second module, called Expressivizer, modifies the deviations computed in the previous step in order to characterize the performance according to the user's expressive intentions.

4.2. Rendering. The graphical interface of CaRo 2.0 (see Figure 11) consists of a main window, hosting the Kinetics-Energy space (see Section 2.3) used for the interactive control of the expressive intentions, and of a main menu, through which the user can load an annotated musical score in MusicXML format and launch the execution of the score. When the user loads the MusicXML score, the file is parsed and three data structures are generated: note events, expressive cues, and hierarchical structure. At this level, the musical score is represented as a collection of separate events EV[ID], with the event index ID = 1, ..., N. For instance, a score can be described as a collection of notes and rests. Each event is characterised by a time reference and a list of attributes (such as pitch and duration).

The first structure is a list of note events, each one described by the following fields: ID number, onset (ticks), duration (ticks), bar position (ticks), bar length (beats),


Table 2: Correspondence among graphical expressive cues, XML representation, and rendering parameters.

Graphical symbol | MusicXML code | Event value
∙ (staccato) | <articulations><staccato default-x="3" default-y="13" placement="above"/></articulations> | DR[n]
> (accent) | <articulations><accent default-x="-1" default-y="13" placement="above"/></articulations> | KV[n]
– (tenuto) | <articulations><tenuto default-x="1" default-y="14" placement="above"/></articulations> | DR[n]
(breath mark) | <articulations><breath-mark default-x="16" default-y="18" placement="above"/></articulations> | O[n]
(slur) | <notations><slur number="1" placement="above" type="start"/></notations> | O[n], DR[n], KV[n]
(pedal) | <direction-type><pedal default-y="-91" line="no" type="start"/></direction-type> | KV[n]

voice number, grace (boolean), and pitch, where ticks and beats are common unit measures for representing musical duration (for the complete detailed MIDI specification see http://midi.org/techspecs/midispec.php). The last field is a pointer to a list of pitches, each one represented by a string of three characters following the schema XYZ, where X is the name (from A to G) of the note, Y is the octave (from 0 to 8), and Z is the alteration (−1 stands for flat and +1 for sharp ♯). A chord (more tones played simultaneously) is represented by a list of pitches with more than one entry. An empty list indicates a music rest.
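To fix ideas, the note-event record and the pitch encoding described above could look as follows. This is a minimal Python sketch: the class name, the field names, and the concrete pitch-string layout are chosen by us for illustration and are not taken from the CaRo 2.0 code base.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class NoteEvent:
    """One entry of the note-event list; field names are illustrative."""
    event_id: int            # ID number
    onset: int               # onset (ticks)
    duration: int            # duration (ticks)
    bar_position: int        # bar position (ticks)
    bar_length: int          # bar length (beats)
    voice: int               # voice number
    grace: bool              # grace-note flag
    pitches: List[str] = field(default_factory=list)  # empty list = music rest

def encode_pitch(name: str, octave: int, alteration: int = 0) -> str:
    """Pitch as name (A-G), octave (0-8), alteration (-1 flat, +1 sharp).
    The concrete string layout is an assumption of this sketch."""
    suffix = {-1: "b", 0: "", +1: "#"}[alteration]
    return f"{name}{octave}{suffix}"

# A C-sharp minor chord (C#4, E4, G#4) and a rest, each one quarter note long (480 ticks):
chord = NoteEvent(1, onset=0, duration=480, bar_position=0, bar_length=4,
                  voice=1, grace=False,
                  pitches=[encode_pitch("C", 4, +1), "E4", encode_pitch("G", 4, +1)])
rest = NoteEvent(2, onset=480, duration=480, bar_position=0, bar_length=4,
                 voice=1, grace=False)
```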

The second structure is a list of expressive cues, as specified in the score. Each entry is defined by type, ID of the linked event, and voice number. Among the many different ways to represent expressive cues in a score, the system is currently able to recognise and render the expressive cues listed in Table 2.

The third data structure describes the hierarchical structure of the piece, that is, its subdivision, top to bottom, in periods, phrases, subphrases, and motifs. Each section is specified by an entry composed of begin (ticks), end (ticks), and hierarchical level number.

For clarity reasons, it is useful to express the duration of the musical events in seconds instead of metric units such as ticks and beats. This conversion is possible taking the tempo marking written in the score, if any, or normalising the total score length to the performance duration. This representation is called nominal time t_nom: the event onset is called nominal onset time O_nom[n], the event length is called nominal duration DR_nom[n], and the distance between two adjacent events is called nominal inter-onset interval:

IOI_nom[n] = O_nom[n + 1] − O_nom[n].    (1)
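A minimal sketch of this conversion, assuming a constant tempo marking in beats per minute and the usual ticks-per-quarter-note resolution (the function names are ours):

```python
def ticks_to_seconds(ticks: int, bpm: float, ticks_per_quarter: int = 480) -> float:
    """Convert a metric value in ticks to nominal time in seconds."""
    seconds_per_tick = 60.0 / (bpm * ticks_per_quarter)
    return ticks * seconds_per_tick

def nominal_ioi(onsets_ticks, bpm, ticks_per_quarter=480):
    """Nominal inter-onset intervals, eq. (1): IOI_nom[n] = O_nom[n+1] - O_nom[n]."""
    onsets = [ticks_to_seconds(t, bpm, ticks_per_quarter) for t in onsets_ticks]
    return [b - a for a, b in zip(onsets, onsets[1:])]

# Example: four quarter-note onsets at 120 BPM give IOIs of 0.5 s each.
print(nominal_ioi([0, 480, 960, 1440], bpm=120))  # [0.5, 0.5, 0.5]
```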

When the user selects the command to start the execution of the score, a timer is instantiated to have a temporal reference for the processing of the musical events. Let t_p be the time passed from the beginning of the performance. Since the system is interactive and the results depend on the user's input at the time t_p, the expressive deviations (and therefore the values of each note event) have to be computed just in time, before they are sent to the MIDI interface. The deviations are calculated by the PlayExpressive() function, which is periodically called by the timer with a time interval T. This interval is a critical parameter of the system, and its value depends on a trade-off between two conditions. First, the time interval must be large enough to allow the processing and playing of the note events, that is, T > t_C + t_M, where t_C is the time for calculating the expressive deviations, generally negligible, and t_M is the time to transmit the note events by means of the MIDI protocol, a serial protocol with a transfer rate of 31250 bps. Since a note event is generally coded with three bytes (and each byte, including the start and stop bits, is composed of 10 bits), the time to transfer a note event is about 1 ms. Therefore, assuming that a piano score can have at most 10 simultaneous notes, T should be greater than 10 ms. Another condition is that T should be small enough to give the user the feeling of real-time control of the expressive intentions, that is, T < t_E, where t_E is the time the user needs to recognise that the expressive intention has changed. Estimating t_E is not a trivial task, because many aspects have to be considered, such as the cognitive processes and the cultural and musical context. However, experimental evidence suggests that certain affective nuances can be recognised also in musical excerpts with a length of 250 ms [36]. A preliminary test showed that a period T = 100 ms guarantees the on-time processing of the musical events and has been judged sufficient to satisfy the real-time requirement in this context. Figure 4 shows how the PlayExpressive() function works.
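The timing budget can be restated in a few lines of arithmetic; the constants below simply encode the figures given in the text (t_C, the computation time, is treated as negligible, and the variable names are ours):

```python
MIDI_BAUD = 31250                  # bits per second on the serial MIDI line
BITS_PER_BYTE = 10                 # 8 data bits plus start and stop bits
BYTES_PER_NOTE_EVENT = 3           # a typical channel message (e.g. Note On)

t_note = BYTES_PER_NOTE_EVENT * BITS_PER_BYTE / MIDI_BAUD  # ~0.96 ms per note event
t_M = 10 * t_note                  # worst case: 10 simultaneous piano notes -> ~9.6 ms
t_E = 0.250                        # affective nuances recognisable in ~250 ms excerpts [36]
T = 0.100                          # period chosen for PlayExpressive()

# Lower bound from MIDI throughput, upper bound from the perception of expressive change.
assert t_M < T < t_E
print(f"t_note = {t_note*1000:.2f} ms, t_M = {t_M*1000:.1f} ms, T = {T*1000:.0f} ms")
```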

(1) Naturalizer. The Naturalizer computes the deviations that are applied to the nominal parameters to obtain a natural performance. Each time the PlayExpressive() function is called by the timer, the note events list, which contains all the notes to be played, is searched for the events with an onset time O_nom[n] such that t_p < O_nom[n] ≤ t_p + T. If no event is found, the function ends; otherwise, the expressive cues list is searched for cues linked to the notes to be played. For each expressive cue that is found, a modifier is instantiated. The modifier is a data structure that contains the deviations to be applied to the expressive parameters due to the expressive cue written in the score. The parameters influenced by the different expressive cues are listed in Table 2. Depending on the type of cue, the modifier can work on a single event only, identified by its ID number, or on a sequence of events, identified by a temporal interval (the modifier's scope) and a voice number. The expressive deviations can be additive or proportional, depending on the type of cue and the expressive parameters. The deviations are


Figure 4: Schema of the PlayExpressive() function. In the Naturalizer branch, the notes to be played and the linked expressive cues are retrieved, new modifiers are instantiated, and the sync, voice-leading, and deviation rules are applied; in the Expressivizer branch, the user input is read and the expressive intention is processed under the constraint rules, before the notes are sent to the player queue.

applied by the successive module, named deviation rules. This module combines the deviations defined by all the modifiers within a temporal scope between t_p and t_p + T, following a set of rules. The output is, for each note, a set of expressive parameters which define how the note is played. Although CaRo 2.0 can control several different musical instruments (e.g., violin, flute, and clarinet) with a variable number of parameters, this paper is focused on piano performances, and the parameters computed by the rules system are IOI_nat, DR_nat, and KV_nat, where the subscript nat identifies the parameters of the natural performance and KV is the Key-Velocity, a parameter of the MIDI protocol related to the sound intensity.
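Putting the last two paragraphs together, the Naturalizer step of PlayExpressive() can be sketched as follows. The data structures, attribute names, and deviation values are assumptions of the example, not the actual CaRo 2.0 implementation; events are assumed to carry their nominal onset time in seconds (onset_nom) and nominal parameters.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class Modifier:
    """Deviations implied by one expressive cue (values are illustrative)."""
    cue_type: str               # e.g. "staccato", "accent", "slur" (see Table 2)
    scope_ids: Tuple[int, ...]  # IDs of the events the modifier acts on
    dr_factor: float = 1.0      # proportional deviation on the duration DR
    kv_offset: float = 0.0      # additive deviation on the Key-Velocity KV

def select_events(note_events, t_p, T):
    """Events to be played in the next period: t_p < O_nom[n] <= t_p + T."""
    return [ev for ev in note_events if t_p < ev.onset_nom <= t_p + T]

def instantiate_modifiers(events, expressive_cues):
    """Create one modifier per cue linked to an event selected for this period."""
    ids = {ev.event_id for ev in events}
    return [Modifier(c.cue_type, (c.linked_event_id,))
            for c in expressive_cues if c.linked_event_id in ids]

def deviation_rules(event, modifiers):
    """Combine all modifiers in scope into the natural parameters of the event."""
    ioi, dr, kv = event.ioi_nom, event.dr_nom, event.kv_default
    for m in modifiers:
        if event.event_id in m.scope_ids:
            dr *= m.dr_factor   # proportional (e.g. staccato shortens DR)
            kv += m.kv_offset   # additive (e.g. accent raises KV)
    return ioi, dr, kv          # IOI_nat, DR_nat, KV_nat for this note
```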

(2) Expressivizer. The next step of the rendering algorithm is called Expressivizer, and its aim is to modify the expressive deviations computed by the Naturalizer in order to render the user's different expressive intentions. It is based on the multilayer model shown in Figure 5: the user specifies a point or a trajectory in an abstract control space; the coordinates of the control space are mapped on a set of expressive features; finally, the features are used to compute the deviations that must be applied to the score for conveying a particular expressive intention. As an example, Figure 7 shows a mapping surface between points of the control space and the feature Intensity. Based on the movements of the pointer on the x-y plane, the values of the Intensity feature are thus computed. The user can select a single expressive intention, and the system will play the entire score with that nuance. Trajectories can also be drawn in the space to experience the dynamic aspects of the expressive content.

The abstract control space S is represented as a set of N triples S_i = (l_i, p_i, e_i), with i = 1, ..., N, where l_i is a verbal label that semantically identifies an expressive intention (e.g., bright, light, and soft), p_i ∈ R² represents the coordinates in the control space of the expressive intention l_i, and e_i ∈ R^k is

Figure 5: Multilayer model. The semantic spaces (kinematic/energy space, valence/arousal space) are mapped onto the features (Tempo and Δtempo, Intensity and Δintensity, Articulation and Δarticulation), which in turn determine the physical events O[n], DR[n], KV[n].

a vector of features which characterise the expressive intention l_i.

S defines the values of the expressive features for the N points of coordinates p_i. To have a continuous control space, in which each point of the control space is associated with a feature vector, a mapping function f: R² → R^k is defined as

f(x) = e_i, if x = p_i;
f(x) = Σ_{i=1}^{N} a_i · e_i, elsewhere,    (2)

with

1/a_i = ‖x − p_i‖² · Σ_{j=1}^{N} (1/‖x − p_j‖²).    (3)

Let x be a point in the control space; the mapping function calculates the corresponding features vector y = f(x) as a linear combination of the N features vectors, one for each expressive intention defined in S. The features vectors are weighted inversely proportionally to the square of


Figure 6: The options window (designed for musicians) that allows the fine tuning of the model. For the general public, it is possible to load preset configurations (see the button "load parameters from file").

Figure 7: Map between the x-y coordinates in the control space and the Intensity feature; the surface interpolates the values associated with the expressive intentions Sad, Natural, Happy, and Angry.

the Euclidean distance between their corresponding expressive intention and x. Finally, the resulting features vector can be used to compute the expressive deviations, represented by the physical layer in Figure 5. For example, the Tempo feature is used to compute the parameters of the n-th physical event from the output of the Naturalizer module, according to the following equations:

O_exp[n] = O_nat[n − 1] + (O_nat[n] − O_nat[n − 1]) · Tempo,
DR_exp[n] = DR_nat[n] · Tempo,    (4)

where O_exp and O_nat are the onset times of the expressive and natural performance, respectively, and DR_exp and DR_nat are the event durations.
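Equations (2)-(4) translate almost directly into code. The sketch below assumes the control space is given as parallel lists of positions p_i and feature vectors e_i, and that the Tempo value has already been extracted from the interpolated feature vector; the function names are ours.

```python
import numpy as np

def map_control_point(x, positions, feature_vectors, eps=1e-9):
    """Inverse-square-distance interpolation of the feature vectors, eqs. (2)-(3)."""
    x = np.asarray(x, dtype=float)
    d2 = np.array([np.sum((x - np.asarray(p, dtype=float)) ** 2) for p in positions])
    hit = np.flatnonzero(d2 < eps)
    if hit.size:                        # x coincides with one of the defined intentions p_i
        return np.asarray(feature_vectors[hit[0]], dtype=float)
    a = (1.0 / d2) / np.sum(1.0 / d2)   # a_i from eq. (3), normalised so that sum(a_i) = 1
    return a @ np.asarray(feature_vectors, dtype=float)

def apply_tempo(o_nat, dr_nat, tempo):
    """Eq. (4): O_exp[n] = O_nat[n-1] + (O_nat[n] - O_nat[n-1]) * Tempo,
    DR_exp[n] = DR_nat[n] * Tempo."""
    o_exp = [o_nat[0]] + [o_nat[n - 1] + (o_nat[n] - o_nat[n - 1]) * tempo
                          for n in range(1, len(o_nat))]
    dr_exp = [d * tempo for d in dr_nat]
    return o_exp, dr_exp
```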

4.3. User-Defined Control Space for Interaction. The abstract control space constitutes the interface between the user's concepts of expression and their internal representation into the system. The user, following his/her own preferences, can create the semantic space which controls the expressive intention of the performance. Each user can select his/her own expressive ideas and then design the mapping of these concepts to positions and movements on this space.

Figure 8: A control space designed for an educational context. The position of the emoticons follows the results of [37].

Figure 9: The control space is used to change interactively the sound comment of the multimedia product (in this case, a cartoon from the movie "The Good, the Bad and the Ugly" by Sergio Leone). The author can associate different expressive intentions with the characters (here, see the blue labels 1, 2, and 3).

A preferences window (see Figure 6) allows the user to define the set of performance styles by specifying, for the i-th style, a verbal semantic label l_i, the coordinates in the control space p_i, and the features vector e_i.

As a particular and significant case, the emotional and sensory control metaphors can be used. For example, a musician may find it more convenient to use the Kinetics-Energy space (see Figure 1), because it is based on terms such as bright, dark, or soft, which are commonly used to describe musical nuances. On the contrary, a nonmusician or a child may prefer the Valence-Arousal space, emotions being a very common experience for every listener. The sensory space can be defined by placing the semantic labels (with their associated features) Hard, Soft, Dark, and Bright in four orthogonal points of the space. Similarly, the Valence-Arousal space can be obtained by placing the emotional labels Happy, Calm, Sad, and Angry at the four corners of the control space. These spaces are implemented as presets of the system, which can be selected by the user (see, e.g., Figure 8).
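Such a preset boils down to a small table of triples (l_i, p_i, e_i). A hypothetical configuration is sketched below; both the coordinates and the feature values (here ordered as Tempo, Intensity, Articulation) are placeholders chosen for illustration and are not those shipped with CaRo 2.0.

```python
# Each entry: label l_i, position p_i in the control space, feature vector e_i.
# All numeric values are placeholders for the sake of the example.
SENSORY_PRESET = [
    ("Hard",   (-1.0,  0.0), (1.1, 1.3, 0.8)),
    ("Soft",   ( 1.0,  0.0), (0.9, 0.7, 1.2)),
    ("Dark",   ( 0.0, -1.0), (0.8, 0.9, 1.1)),
    ("Bright", ( 0.0,  1.0), (1.2, 1.1, 0.7)),
]

VALENCE_AROUSAL_PRESET = [
    ("Happy", ( 1.0,  1.0), (1.2, 1.1, 0.8)),
    ("Angry", (-1.0,  1.0), (1.3, 1.4, 0.7)),
    ("Sad",   (-1.0, -1.0), (0.7, 0.6, 1.2)),
    ("Calm",  ( 1.0, -1.0), (0.9, 0.8, 1.1)),
]
```

A preset of this form provides exactly the labels, positions, and feature vectors consumed by the mapping function of Section 4.2 (2) when the user moves the pointer on the control space.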

Specific and new expressive intentions can be associated with objects/avatars placed on the screen, and a movement of the pointer from one object to another changes the resulting music expression continuously. For example, a different type of music can be associated with the characters represented in the image. Also, the object with its


Figure 10: The music excerpt used to evaluate the CaRo 2.0 software: Ludwig van Beethoven (1770-1827), Piano Sonata No. 16, Op. 31 No. 1, 3rd movement (Rondo, Allegretto). The score has been transcribed in MusicXML format, including all the expressive indications (slurs, articulation marks, and dynamic cues).

Figure 11: The graphical user interface of CaRo 2.0. The dotted line shows the trajectory drawn for rendering the hbls performance.

expressive label and parameters can be moved, and then the control space can dynamically vary (see, e.g., Figure 9).

More generally, the system allows a creative design of abstract spaces according to the artist's fantasies and needs, by inventing new expressions and their spatial relations.

5. Results

This section reports the results of some expressive music renderings obtained with CaRo 2.0, with the aim of showing how the system works. The beginning of the 3rd movement of the Piano Sonata No. 16 by L. van Beethoven (see Figure 10) was chosen as a case study, because it is characterised by a large number of expressive nuances and, at the same time, it is written with a quite simple musical texture (four voices at most), which makes it easier to visualise and understand the results. Seven performances of the same Beethoven Sonata were rendered: five performances are characterised by only one expressive intention (bright, hard, heavy, light, and soft) and were obtained keeping the mouse fixed on the corresponding adjective (see the GUI of Figure 11); one performance, named hbls, was obtained drawing a trajectory on the two-dimensional control space going from hard to bright, light, and soft (see the dotted line of Figure 11); finally, a nominal performance was rendered applying no expressive rule or transformation to the music score.

The system rendered the different performances correctly, playing the sequences of notes written in the score. Moreover, expressive deviations were computed depending on the user's preferences specified by means of the control space. Figure 12 shows the piano roll representation of the nominal performance compared to the performances


Figure 12: Piano roll representation (pitch and Key-Velocity versus time, in seconds) of the performances characterized by the heavy (b) and light (c) expressive intentions. The nominal performance (a) has been added as a reference.

characterised by the expressive intentions heavy and light. It can be noted that the onset, duration, and Key-Velocity of the notes change as a function of the expressive intention. (A piano-roll representation is a method of representing a musical stimulus or score for later computational analyses. It can be conceived as a two-dimensional graph: the vertical axis is a digital representation of different notes; the horizontal axis is a continuous representation of time. When a note is played, a horizontal line is drawn on the piano-roll representation. The height of this line represents which note was being played, the beginning of the line represents the note's onset, the length of the line represents the note's duration, and the end of the line represents the note's offset [38].) The analysis of the main expressive parameters (see Figures 12 and 13) shows that these values match the corresponding parameters of the real performances analyzed in [39]. Moreover, Figure 14 shows how the expressive parameters change gradually, following the user's movements between different expressive intentions.
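The piano-roll description above maps directly onto a few plotting calls. A minimal matplotlib sketch is given below, with onsets and durations in seconds, pitches as MIDI note numbers, and Key-Velocity values per note; it reproduces the layout of Figure 12 only in spirit, not in its exact styling.

```python
import matplotlib.pyplot as plt

def piano_roll(onsets, durations, pitches, velocities):
    """Draw a piano roll (one horizontal line per note) plus a Key-Velocity panel."""
    fig, (ax_roll, ax_vel) = plt.subplots(2, 1, sharex=True)
    for onset, dur, pitch in zip(onsets, durations, pitches):
        ax_roll.hlines(pitch, onset, onset + dur)   # height = pitch, length = duration
    ax_roll.set_ylabel("Pitch (MIDI number)")
    ax_vel.stem(onsets, velocities)                 # sound intensity of each note
    ax_vel.set_xlabel("Time (s)")
    ax_vel.set_ylabel("Velocity")
    plt.show()

# Example: three notes starting one second apart.
piano_roll([0.0, 1.0, 2.0], [0.8, 0.4, 0.9], [60, 64, 67], [80, 95, 70])
```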

6. Assessment

The most important public forum worldwide in which computer systems for expressive music performance are assessed is the Performance RENdering CONtest (Rencon), which was initiated in 2002 by Haruhiro Katayose and colleagues as a competition among different systems (http://renconmusic.org). Over the years, the Rencon contest evolved toward a more structured format. It is an annual international competition in which entrants present computer systems they have developed for generating expressive musical performances, which audience members and organisers judge. One of the goals of Rencon is to solve a common problem of researchers, namely, the difficulty in evaluating/grading the performances generated by their own computer systems within their individual endeavours, by having the entrants meet together at the same site [40]. An assessment (from the musical performance expressiveness point of view) of


Table 3: Results of the final stage of the Rencon 2011 contest. Each system was required to render two different performances of the same music piece. Each performance has been evaluated both by the audience attending the contest live and by people connected online through a streaming service.

System | Performance A (Live / Online / Total) | Performance B (Live / Online / Total) | Total score
CaRo 2.0 | 464 / 56 / 520 | 484 / 61 / 545 | 1065
YQX | 484 / 64 / 548 | 432 / 65 / 497 | 1045
VirtualPhilharmony | 452 / 46 / 498 | 428 / 32 / 460 | 958
Director Musices | 339 / 33 / 372 | 371 / 13 / 384 | 756
Shunji System | 277 / 24 / 301 | 277 / 20 / 297 | 598

Figure 13: The Key-Velocity values, per note number, of the performances characterised by the different expressive intentions (bright, hard, heavy, light, and soft). Only the melodic line is reported, in order to have a more readable graph.

CaRo 2.0 was carried out by participating in Rencon 2011. Our system qualified for the final stage, where the participating systems were presented during a dedicated workshop. The performances played were openly evaluated by a large (77 participants) and qualified audience of musicians and scientists, under the same conditions, in the manner of a competition. Each of the five entrant systems was required to generate two different performances of the same music piece, in order to evaluate the adaptability of the system. After listening to each performance, the listeners were asked to rate it from the viewpoint of how much do you give applause to the performance, on a 10-point scale (1 = nothing, 10 = ovation). The CaRo 2.0 system won that final stage (see Table 3). The results, as well as the comments by the experts, are analysed in [41]. The performances played during Rencon 2011 can be listened to at http://smc.dei.unipd.it/advances_hci.

7. Conclusion

CaRo 2.0 could be integrated in the browser engines of the music community services. In this way, (i) the databases

Figure 14: The values of the three main expressive parameters (log IOI, log L, and log KV, as normalized values per note number) of the performance obtained by drawing the trajectory of Figure 11, moving from hard to bright, light, and soft.

keep track of the expressiveness of the songs played, and (ii) the users create/browse playlists based not only on the song title or the artist name but also on the music expressiveness. As a consequence, (i) in rhythm games, the performance will be rated on the basis of the expressive intentions of the user, with advantages for the educational value and the user's involvement in the game; (ii) in the music community, the user will be able to search happy or sad music, according to the user's affective state or preferences.

Rhythm games, despite their big commercial success (often bigger than that of the original music albums), have a gameplay that is almost entirely oriented around the player's interactions with a musical score or individual songs, by means of pressing specific buttons or activating controls on a specialized game controller. In these games, the reaction of the virtual audience (i.e., the score rated) and the result of the battle/cooperative mode are based on the performance of the player as judged by the PC. Up to now, game designers did not consider the player's expressiveness. As future work, we intend to insert the CaRo system in these games, so that the music performance is rated "perfect" not merely when it is played exactly like the score, but also considering its


interpretation, its expressiveness. In this way, these games can be used profitably also in the educational field.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] N. Bernardini and G. De Poli, "The sound and music computing field: present and future," Journal of New Music Research, vol. 36, no. 3, pp. 143–148, 2007.

[2] S. Canazza, G. De Poli, A. Rodà, and A. Vidolin, "An abstract control space for communication of sensory expressive intentions in music performance," Journal of New Music Research, vol. 32, no. 3, pp. 281–294, 2003.

[3] S. Canazza, G. De Poli, A. Rodà, and A. Vidolin, "Expressiveness in music performance: analysis, models, mapping, encoding," in Structuring Music through Markup Language: Designs and Architectures, J. Steyn, Ed., pp. 156–186, IGI Global, 2012.

[4] S. Canazza, G. De Poli, C. Drioli, A. Rodà, and A. Vidolin, "Audio morphing different expressive intentions for multimedia systems," IEEE Multimedia, vol. 7, no. 3, pp. 79–83, 2000.

[5] D. Fabian, R. Timmers, and E. Schubert, Eds., Expressiveness in Music Performance: Empirical Approaches Across Styles and Cultures, Oxford University Press, Oxford, UK, 2014.

[6] G. De Poli, "Methodologies for expressiveness modelling of and for music performance," Journal of New Music Research, vol. 33, no. 3, pp. 189–202, 2004.

[7] G. Widmer and P. Zanon, "Automatic recognition of famous artists by machine," in Proceedings of the 16th European Conference on Artificial Intelligence (ECAI '04), pp. 1109–1110, Valencia, Spain, 2004.

[8] A. Gabrielsson, "Expressive intention and performance," in Music and the Mind Machine, R. Steinberg, Ed., pp. 35–47, Springer, Berlin, Germany, 1995.

[9] P. Juslin, "Emotional communication in music performance: a functionalist perspective and some data," Music Perception, vol. 14, no. 4, pp. 383–418, 1997.

[10] S. Canazza, G. De Poli, and A. Vidolin, "Perceptual analysis of the musical expressive intention in a clarinet performance," in Music, Gestalt, and Computing, M. Leman, Ed., pp. 441–450, Springer, Berlin, Germany, 1997.

[11] G. De Poli, A. Rodà, and A. Vidolin, "Note-by-note analysis of the influence of expressive intentions and musical structure in violin performance," Journal of New Music Research, vol. 27, no. 3, pp. 293–321, 1998.

[12] R. Bresin and A. Friberg, "Emotional coloring of computer-controlled music performances," Computer Music Journal, vol. 24, no. 4, pp. 44–63, 2000.

[13] F. B. Baraldi, G. De Poli, and A. Rodà, "Communicating expressive intentions with a single piano note," Journal of New Music Research, vol. 35, no. 3, pp. 197–210, 2006.

[14] S. Hashimoto, "KANSEI as the third target of information processing and related topics in Japan," in Proceedings of the International Workshop on Kansei-Technology of Emotion, pp. 101–104, 1997.

[15] S. Hashimoto, "Humanoid robots for kansei communication: computer must have body," in Machine Intelligence: Quo Vadis?, P. Sincak, J. Vascak, and K. Hirota, Eds., pp. 357–370, World Scientific, 2004.

[16] R. Picard, Affective Computing, MIT Press, Cambridge, Mass, USA, 1997.

[17] S. Canazza, G. De Poli, C. Drioli, A. Rodà, and A. Vidolin, "Modeling and control of expressiveness in music performance," Proceedings of the IEEE, vol. 92, no. 4, pp. 686–701, 2004.

[18] B. H. Repp, "Diversity and commonality in music performance: an analysis of timing microstructure in Schumann's 'Traumerei'," The Journal of the Acoustical Society of America, vol. 92, no. 5, pp. 2546–2568, 1992.

[19] B. H. Repp, "The aesthetic quality of a quantitatively average music performance: two preliminary experiments," Music Perception, vol. 14, no. 4, pp. 419–444, 1997.

[20] L. Mion, G. De Poli, and E. Rapana, "Perceptual organization of affective and sensorial expressive intentions in music performance," ACM Transactions on Applied Perception, vol. 7, no. 2, article 14, 2010.

[21] S. Dixon and W. Goebl, "The 'air worm': an interface for real-time manipulation of expressive music performance," in Proceedings of the International Computer Music Conference (ICMC '05), Barcelona, Spain, 2005.

[22] A. Rodà, S. Canazza, and G. De Poli, "Clustering affective qualities of classical music: beyond the valence-arousal plane," IEEE Transactions on Affective Computing, vol. 5, no. 4, pp. 364–376, 2014.

[23] P. N. Juslin and J. A. Sloboda, Music and Emotion: Theory and Research, Oxford University Press, 2001.

[24] A. Camurri, G. De Poli, M. Leman, and G. Volpe, "Communicating expressiveness and affect in multimodal interactive systems," IEEE Multimedia, vol. 12, no. 1, pp. 43–53, 2005.

[25] A. Kirke and E. R. Miranda, "A survey of computer systems for expressive music performance," ACM Computing Surveys, vol. 42, no. 1, article 3, pp. 1–41, 2009.

[26] G. Widmer, S. Flossmann, and M. Grachten, "YQX plays Chopin," AI Magazine, vol. 30, no. 3, pp. 35–48, 2009.

[27] G. Widmer, "Discovering simple rules in complex data: a meta-learning algorithm and some surprising musical discoveries," Artificial Intelligence, vol. 146, no. 2, pp. 129–148, 2003.

[28] J. Sundberg, A. Askenfelt, and L. Fryden, "Musical performance: a synthesis-by-rule approach," Computer Music Journal, vol. 7, no. 1, pp. 37–43, 1983.

[29] I. Ipolyi, "Innforing i musiksprakets opprinnelse och struktur," TMH-QPSR, vol. 48, no. 1, pp. 35–43, 1952.

[30] A. Friberg, R. Bresin, and J. Sundberg, "Overview of the KTH rule system for musical performance," Advances in Cognitive Psychology, Special Issue on Music Performance, vol. 2, no. 2-3, pp. 145–161, 2006.

[31] S. Tanaka, M. Hashida, and H. Katayose, "Shunji: a case-based performance rendering system attached importance to phrase expression," in Proceedings of the Sound and Music Computing Conference (SMC '11), F. Avanzini, Ed., pp. 1–2, 2011.

[32] M. Hamanaka, K. Hirata, and S. Tojo, "Implementing 'a generative theory of tonal music'," Journal of New Music Research, vol. 35, no. 4, pp. 249–277, 2006.

[33] F. Lerdahl and R. Jackendoff, A Generative Theory of Tonal Music, MIT Press, Cambridge, Mass, USA, 1983.

[34] T. Baba, M. Hashida, and H. Katayose, "A conducting system with heuristics of the conductor 'VirtualPhilharmony'," in Proceedings of the New Interfaces for Musical Expression Conference (NIME '10), pp. 263–270, Sydney, Australia, June 2010.

[35] M. Good, "MusicXML for notation and analysis," in The Virtual Score: Representation, Retrieval, Restoration, W. B. Hewlett and E. Selfridge-Field, Eds., pp. 113–124, MIT Press, Cambridge, Mass, USA, 2001.

[36] I. Peretz, L. Gagnon, and B. Bouchard, "Music and emotion: perceptual determinants, immediacy, and isolation after brain damage," Cognition, vol. 68, no. 2, pp. 111–141, 1998.

[37] R. Plutchik, Emotion: A Psychoevolutionary Synthesis, Harper & Row, New York, NY, USA, 1980.

[38] D. Temperley, The Cognition of Basic Musical Structures, MIT Press, Cambridge, Mass, USA, 2001.

[39] S. Canazza, G. De Poli, S. Rinaldin, and A. Vidolin, "Sonological analysis of clarinet expressivity," in Music, Gestalt, and Computing: Studies in Cognitive and Systematic Musicology, vol. 1317 of Lecture Notes in Computer Science, pp. 431–440, Springer, Berlin, Germany, 1997.

[40] H. Katayose, M. Hashida, G. De Poli, and K. Hirata, "On evaluating systems for generating expressive music performance: the Rencon experience," Journal of New Music Research, vol. 41, no. 4, pp. 299–310, 2012.

[41] G. De Poli, S. Canazza, A. Rodà, and E. Schubert, "The role of individual difference in judging expressiveness of computer assisted music performances by experts," ACM Transactions on Applied Perception, vol. 11, no. 4, article 22, 2014.


Page 5: Research Article CaRo 2.0: An Interactive System for

Advances in Human-Computer Interaction 5

Music score

EditorAnnotations

piano

protocol

Naturalizer Expressivizer

CaRo

Off-line Real-time

intention

controlledMIDI

MIDI

Userrsquos expressive

MusicXMLfile

Figure 3 System architecture CaRo 20 receives as input an annotated score in MusicXML format and generates in real-time messages toplay a MIDI controlled piano

beat which allows the musician to maintain the correcttempo between a conductorrsquos gesture and the next gesture

The differences between the systems described aboveinclude both the algorithms for computing the expressivedeviations and the aspects related to the userrsquos interactionIn the autonomous systems the user can interact with thesystem only in the selection of the training set and as aconsequence the performance style that the systemwill learnThe feedforward systems allow a deeper interaction with themodel parameters the user can set the parameters listen tothe results and then fine-tune the parameters again until theresults are satisfying The feedback systems allow a real-timecontrol of the parameters the user can change the parametersof the performance while (s)he is listening to it in a similarway as a human musician does Since the models for musicperformance usually have numerous parameters a crucialaspect of the feedback systems is how to allow the userto simultaneously control all these parameters in real-timeVirtualPhilharmony allows the user to control in real-timeonly two parameters (intensity and tempo) the other ones aredefined offline by mean of a so-called performance template

Also the system developed by the authors and describedin the next sections is a feedback system As explained laterthe real-time control of the parameters is allowed by meansof a control space based on a semantic description of differentexpressive intentions Starting from the trajectories drawn bythe user on this control space the systemmaps concepts suchas emotions or sensations in the low level parameters of themodel

4 The CaRo 20 System

41 Architecture CaRo 20 simulates the tasks of a pianistwho reads a musical score decodes its symbols plansthe expressive choices and finally executes the actionsin order to actually play the instrument Usually pianistsanalyse the score very carefully before the performanceand they add annotations and cues to the musical sheetin order to emphasise rhythmic-melodic structures sec-tion subdivisions and other interpretative indications Asshown in Figure 3 CaRo 20 receives a musical scoreas input compliant with the MusicXML standard [35]Any music editor able to export in MusicXML format

(eg Finale (httpwwwfinalemusiccom) or MuseScore(httpmusescoreorg)) can be used to write the score andadd annotations CaRo 20 is able to decode and process aset of expressive cues as specified in the next section Theslurs can be hierarchically structured to individuate sectionsmotifs and short musical cells Moreover textual labels canbe added to specify the interpretative choices

The specificity of CaRo 20 in comparison to the systemspresented in Section 3 is that it is based on a model thatexplicitly distinguishes the expressive deviations related tothe musical structure from those related to other musi-cal aspects such as the performerrsquos expressive intentionsThe computation is done in two steps The first modulecalled Naturalizer computes the deviations starting from theexpressive cues annotated in the score As the cues usuallyindicate particular elements of the score such as a musicalclimax or the end of a section the deviations have shapesthat reflect the structure of the score and the succession ofmotif and musical cells (see eg Figure 12) The deviationscomputed by the Normalizer correspond to a performancethat is musically correct but without a particular expressiveemphasis The second module called Expressivizer modifiesthe deviations computed in the previous step in order to char-acterize the performance according to the userrsquos expressiveintentions

42 Rendering The graphical interface of CaRo 20 (seeFigure 11) consists in a main window hosting the Kinetics-Energy space (see Section 23) used for the interactive controlof the expressive intentions and in a main menu throughwhich the user can load an annotated musical score inMusicXML format and launch the execution of the scoreWhen the user loads the MusicXML score the file is parsedand three data structures are generated note events expres-sive cues and hierarchical structure At this level the musicalscore is as a collection of separate events EV[ID] with theevent index ID = 1 119873 For instance a score can bedescribed as a collection of notes and rests Each event ischaracterised by a time reference and a list of attributes (suchas pitch and duration)

The first structure is a list of note events each onedescribed by the following fields ID number onset (ticks)duration (ticks) bar position (ticks) bar length (beats)

6 Advances in Human-Computer Interaction

Table 2 Correspondence among graphical expressive cues XML representation and rendering parameters

Graphicalsymbol MusicXML code Event value

∙ ltarticulations gtltstaccato default-x=ldquo3rdquo default-y=ldquo13rdquo placement=ldquoaboverdquogtltarticulations gt DR[119899]

gt ltarticulationsgtltaccent default-x=ldquominus1rdquo default-y=ldquo13rdquo placement=ldquoaboverdquogtltarticulationsgt KV[119899]

ndash ltarticulationsgtlttenuto default-x=ldquo1rdquo default-y=ldquo14rdquo placement=ldquoaboverdquogtltarticulationsgt DR[119899]

ltarticulationsgtltbreath-mark default-x=ldquo16rdquo default-y=ldquo18rdquo placement=ldquoaboverdquogtltarticulationsgt 119874[119899]

ltnotations gtltslur number=ldquo1rdquo placement=ldquoaboverdquo type=ldquostartrdquogtltnotationsgt 119874[119899] DR[119899]KV[119899]

ltdirection-typegtltpedal default-y=ldquominus91rdquo line=ldquonordquo type=ldquostartrdquogtltdirection-typegt KV[119899]

voice number grace (boolean) and pitch where ticks andbeats are common unit measures for representing musicalduration (for the complete MIDI detailed specification seehttpmidiorgtechspecsmidispecphp) The last field is apointer to a list of pitches each one represented by a stringof three characters following the schema 119883119884119885 where 119883 isthe name (from 119860 to 119866) of the note 119884 is the octave (from0 to 8) and 119885 is the alteration (minus1 stands for flat and +1

for sharp ♯) A chord (more tones played simultaneously) isrepresented by a list of pitches with more than one entry Anempty list indicates a music rest

The second structure is a list of expressive cues asspecified in the score Each entry is defined by type ID ofthe linked event and voice number Among many differentways to represent expressive cues in a score the system iscurrently able to recognise and render the expressive cueslisted in Table 2

The third data structure describes the hierarchical struc-ture of the piece that is its subdivision top to bottom inperiods phrases subphrases and motifs Each section isspecified by an entry composed by begin (ticks) end (ticks)and hierarchical level number

For clarity reasons it is useful to express the durationof the musical event in seconds instead of metric units suchas ticks and beats This conversion is possible taking thetempo marking written in the score if any or normalisingthe total score length to the performance duration Thisrepresentation is called nominal time 119905nom the event onsetis called nominal onset time 119874nom[119899] the event length iscalled nominal durationDRnom[119899] and distance between twoadjacent events is called nominal inter onset interval

IOInom [119899] = 119874nom [119899 + 1] minus 119874nom [119899] (1)

When the user selects the command to start execution ofthe score a timer is instantiated to have a temporal referencefor the processing of the musical events Let 119905119901 be the timepast from the beginning of the performance Since the systemis interactive and the results depend on the userrsquos input at thetime 119905119901 the expressive deviations (and therefore the values ofeach note event) have to be computed just in time before theyare sent to the MIDI interface The deviations are calculatedby the PlayExpressive() function which is periodicallycalled by the timer with a time interval 119879 This interval isa critical parameter of the system and its value depends on

a trade-off between two conditions First the time intervalmust be large enough to allow the processing and playing ofthe note events that is 119879 lt 119905119862 + 119905119872 where 119905119862 is the timefor calculating the expressive deviations generally negligibleand 119905119872 is the time to transmit the note events by meansof the MIDI protocol a serial protocol with a transfer rateof 31250 bps Since a note event is generally coded withthree bytes (and each byte including the start and stop bitsis composed by 10 bits) the time to transfer a note eventis about 1ms Therefore assuming that a piano score canhave at most 10 simultaneous notes 119879 should be greaterthan 10ms Another condition is that 119879 should be smallenough to give the user the feeling of real-time control ofthe expressive intentions that is 119879 lt 119905119864 where 119905119864 is thetime the user needs to recognise that the expressive intentionhas changed Estimating 119905119864 is not a trivial task becausemany aspects have to be considered such as the cognitiveprocesses and the cultural and musical context Howeverexperimental evidence suggests that certain affective nuancescan be recognised also in musical excerpts with a lengthof 250ms [36] A preliminary test showed that a period119879 = 100ms guarantees the on-time processing of themusical events and has been judged sufficient to satisfy thereal-time requirement in this context Figure 4 shows howthe PlayExpressive() function works

(1) NaturalizerThe Naturalizer computes the deviations thatare applied to the nominal parameters to obtain a naturalperformance Each time the PlayExpressive() functionis called by the timer the note events listmdashwhich containsall the notes to be playedmdashis searched for the events withan onset time 119874nom[119899] such that 119905119901 lt 119874nom[119899] le 119905119901 +

119879 If no event is found the function ends otherwise theexpressive cues list is searched for cues linked to the notesto be played For each expressive cue that is found amodifieris instantiated The modifier is a data structure that containsthe deviations to be applied to the expressive parameters dueto the expressive cue written in the score The parametersinfluenced by the different expressive cues are listed inTable 2 Depending on the type of cue the modifier canwork on a single event only identified by its ID number oron a sequence of events identified by a temporal interval(the modifierrsquos scope) and a voice number The expressivedeviations can be additive or proportional in function of thetype of cue and the expressive parameters The deviations are

Advances in Human-Computer Interaction 7

ExitNotes to beplayed

Expressivecues

Newmodifier Sync rules

Voiceleading

rules

Readuser

input

Expressiveintention

processingConstraints

rules

Deviationsrules

No

No

Yes

Yes

Send notesto player

queue

Begin

Exit

Naturalizer

Expressivizer

Figure 4 Schema of the PlayExpressive() function

applied by the successivemodule named deviation rulesThismodule combines the deviations defined by all the modifierswithin a temporal scope between 119905119901 and 119905119901 + 119879 following aset of rules The output is for each note a set of expressiveparameters which define how the note is played AlthoughCaRo 20 can control several different musical instruments(eg violin flute and clarinet) with a variable number ofparameters this paper is focused on piano performancesand the parameters computed by the rules system are IOInatDRnat and KVnat where the subscript nat identifies theparameters of the natural performance and KV is the Key-Velocity a parameter of the MIDI protocol related to thesound intensity

(2) Expressivizer The next step of the rendering algorithm iscalled Expressivizer and its aim is to modify the expressivedeviations computed by the Naturalizer in order to renderthe userrsquos different expressive intentions It is based onthe multilayer model shown in Figure 5 the user specifiesa point or a trajectory in an abstract control space thecoordinates of the control space are mapped on a set ofexpressive features finally the features are used to computethe deviations that must be applied to the score for conveyinga particular expressive intention As an example Figure 7shows a mapping surface between points of the control spaceand the feature Intensity Based on the movements of thepointer on the 119909-119910 plane the values of the Intensity featureare thus computed The user can select a single expressiveintention and the system will play the entire score withthat nuance Trajectories can also be drawn in the space toexperience the dynamic aspects of the expressive content

The abstract control space 119878 is represented as a set of 119873

triples 119878119894 = (119897119894 119901119894 119890119894) with 119894 = 1 119873 where 119897119894 is a verballabel that semantically identifies an expressive intention (egbright light and soft) 119901119894 isin R2 represents the coordinates inthe control space of the expressive intention 119897119894 and 119890119894 isin R119896 is

Semantic spacesKinematicenergy spaceValencearousal space

Features

Physical events O[n] DR[n] KV[n]

middot middot middot

Tempo ΔtempoIntensity Δintensity

Articulation Δarticulation

Figure 5 Multilayer model

a vector of features which characterise the expressive inten-tion 119897119894

119878 defines the values of the expressive features for the 119873

points of coordinates 119901119894 To have a continuos control spacein which each point of the control space is associated with afeature vector a mapping function 119891 R2 rarr R119896 is definedas

119891 (119909) =

119890119894 if 119909 = 119901119894119899

sum

119894=1

119886119894 sdot 119890119894 elsewhere (2)

with

1

119886119894

=

1003817

1003817

1003817

1003817

119909 minus 119901119894

1003817

1003817

1003817

1003817

2sdot

119873

sum

119895=1

1

1003817

1003817

1003817

1003817

1003817

119909 minus 119901119895

1003817

1003817

1003817

1003817

1003817

2 (3)

Let 119909 be a point in the control space the mappingfunction calculates the corresponding features vector 119910 =

119891(119909) as a linear combination of 119873 features vectors onefor each expressive intention defined in 119878 The featuresvectors are weighted inversely proportional to the square of

8 Advances in Human-Computer Interaction

Figure 6The options window (designed for musicians) that allowsthe fine tuning of the model For general public it is possible to loadpreset configurations (see the button ldquoload parameters from filerdquo)

15

1

05

Inte

nsity

1

05

0

minus05

minus1

105

0minus05

minus1Sad

HappyNatural

Angry

Figure 7Map between the 119909-119910 coordinates in the control space andthe Intensity feature

the euclidean distance between their corresponding expres-sive intention and 119909 Finally the resulting features vector canbe used to compute the expressive deviations represented bythe physical layer in Figure 5 For example the Tempo featureis used to compute the parameters of the 119899th physical eventfrom the output of the Naturalizer module according thefollowing equations

119874exp [119899] = 119874nat [119899 minus 1] + (119874nat [119899] minus 119874nat [119899 minus 1]) sdot Tempo

DRexp [119899] = DRnat [119899] sdot Tempo

(4)

where 119874exp and 119874nat are the onset times of the expressive andnatural performance respectively and DRexp and DRnat arethe event durations

43 User-Defined Control Space for Interaction The abstractcontrol space constitutes the interface between the userconcepts of expression and their internal representation intothe system The user following hisher own preferencescan create the semantic space which controls the expressiveintention of the performance Each user can select hisherown expressive ideas and then design the mapping ofthese concepts to positions and movements on this space

Figure 8 A control space designed for an educational context Theposition of the emoticons follows the results of [37]

Figure 9The control space is used to change interactively the soundcomment of the multimedia product (a cartoon from the movieldquoThe Good The Bad and The Uglyrdquo by Sergio Leone in this case)The author can associate different expressive intentions with thecharacters (here see the blue labels 1 2 and 3)

A preferences window (see Figure 6) allows the user to definethe set of performance styles by specifying for the 119894th style averbal semantic label 119897119894 the coordinates in the control space119901119894 and the features vector 119890119894

As a particular and significant case the emotional andsensory control metaphors can be used For example amusician may find it more convenient to use the Kinetics-Energy space (see Figure 1) because it is based on terms suchas bright dark or soft that are commonly used to describemusical nuances On the contrary a nonmusician or a childmay prefer the Valence-Arousal space with emotions beinga very common experience for every listener The sensoryspace can be defined by placing the semantic labels (withtheir associated features) Hard Soft Dark and Bright in fourorthogonal points of the space Similarly the Valence-Arousalspace can be obtained by placing the emotional labels HappyCalm Sad andAngry at the four corners of the control spaceThese spaces are implemented as presets of the system whichcan be selected by the user (see eg Figure 8)

Specific and new expressive intentions can be associatedwith objectsavatars which are placed in the screen and amovement of the pointer from an object to another onechanges continuously the resulting music expression Forexample a different type of music can be associated with thecharacters represented in the image Also the object with its

Advances in Human-Computer Interaction 9

Piano

RondoAllegretto

5

cresc

9

cresc

13

Sonate No 163rd movementopus 31 No 1

(1770-1827)Ludwig van Beethoven

Figure 10 The music excerpt used to evaluate the CaRo 20 software The score has been transcribed in MusicXML format including all theexpressive indications (slurs articulation marks and dynamic cues)

Figure 11 The graphical user interface of CaRo 20 The dotted lineshows the trajectory drawn for rendering the hbls performance

expressive label and parameters can be moved and then thecontrol space can dynamically vary (see eg Figure 9)

More generally the system allows a creative design ofabstract spaces according to artistrsquos fantasies and needs byinventing new expressions and their spatial relations

5 Results

This section reports the results of some expressive music ren-derings obtainedwith CaRo 20 with the aim of showing howthe system works The beginning of the 3rd Movement of thePiano Sonata number 16 by L van Beethoven (see Figure 10)was chosen as a case study because it is characterised by alarge number of expressive nuances and at the same timeit is written with a quite simple musical texture (four voicesat most) which makes it easier to visualise and understandthe results Seven performances of the same BeethovenrsquosSonata were rendered five performances are characterisedby only one expressive intention (bright hard heavy lightand soft) and were obtained keeping the mouse fixed onthe corresponding adjective (see the GUI of Figure 11) oneperformance named hbls has been obtained drawing atrajectory on the two-dimensional control space going fromhard to bright light and soft (see the dotted line of Figure 11)finally a nominal performance was rendered applying noexpressive rule or transformation to the music score

The system rendered the different performances cor-rectly playing the sequences of notes written in the scoreMoreover expressive deviations were computed dependingon the userrsquos preferences specified by means of the con-trol space Figure 12 shows the piano roll representation ofthe nominal performance compared to the performances

10 Advances in Human-Computer Interaction

0 2 4 6 8 10 12 14 16 18 20 22Time (s)

0 2 4 6 8 10 12 14 16 18 20 22Time (s)

36414651566166717681

Pitc

h

020406080

100120

Velo

city

(a) Nominal

0 2 4 6 8 10 12 14 16 18 20 2236414651566166717681

Time (s)

0 2 4 6 8 10 12 14 16 18 20 22Time (s)

Pitc

h

020406080

100120

Velo

city

(b) Heavy

0 2 4 6 8 10 12 14 16 18 20 22Time (s)

0 2 4 6 8 10 12 14 16 18 20 22Time (s)

36414651566166717681

Pitc

h

020406080

100120

Velo

city

(c) Light

Figure 12 Piano roll representation of the performances characterized by heavy (b) and light (c) expressive intention The nominalperformance (a) has been added as a reference

characterised by the expressive intentions heavy and light Itcan be noted that the onset duration and Key-Velocity ofthe notes change in function of the expressive intention (Apiano-roll representation is a method of representing a musi-cal stimulus or score for later computational analyses It canbe conceived as a two-dimensional graph the vertical axis is adigital representation of different notes the horizontal axis isa continuous representation of time When a note is played ahorizontal line is drawn on the piano-roll representationTheheight of this line represents which note was being played thebeginning of the line represents the notersquos onset the lengthof the line represents the notersquos duration and the end ofthe line represents the notes offset [38]) The analysis of themain expressive parameters (see Figures 12 and 13) shows thatthese values match with the correspondent parameters of realperformances analyzed in [39] Moreover Figure 14 showshow the expressive parameters change gradually followingthe userrsquosmovements between different expressive intentions

6 Assessment

The most important public forum worldwide in whichcomputer systems for expressive music performance areassessed is the Performance RENdering CONtest (Ren-con) which was initiated in 2002 by Haruhiro Katayoseand colleagues as a competition among different systems(httprenconmusicorg) During the years the Rencon con-test evolved toward a more structured format It is anannual international competition in which entrants presentcomputer systems they have developed for generating expres-sive musical performances which audience members andorganisers judge One of the goals of Rencon is to solvethe researchersrsquo common problem difficulty in evaluat-inggrading performances generated by hisher computersystem within their individual endeavours by entrants meet-ing together at the same site [40] An assessment (fromthe musical performance expressiveness point of view) of


Table 3: Results of the final stage of the Rencon 2011 contest. Each system was required to render two different performances of the same music piece. Each performance has been evaluated both by the audience attending the contest live and by people connected online through a streaming service.

System              | Performance A (Live / Online / Total) | Performance B (Live / Online / Total) | Total score
CaRo 2.0            | 464 / 56 / 520                        | 484 / 61 / 545                        | 1065
YQX                 | 484 / 64 / 548                        | 432 / 65 / 497                        | 1045
VirtualPhilharmony  | 452 / 46 / 498                        | 428 / 32 / 460                        | 958
Director Musices    | 339 / 33 / 372                        | 371 / 13 / 384                        | 756
Shunji System       | 277 / 24 / 301                        | 277 / 20 / 297                        | 598

Figure 13: The Key-Velocity values of the performances characterised by the different expressive intentions (bright, hard, heavy, light, and soft), plotted against the note number. Only the melodic line is reported in order to have a more readable graph.

An assessment of CaRo 2.0 (from the point of view of musical performance expressiveness) was carried out by participating in Rencon 2011. Our system qualified for the final stage, where the participating systems were presented during a dedicated workshop. The performances played were openly evaluated, under the same conditions and in the manner of a competition, by a large (77 participants) and qualified audience of musicians and scientists. Each of the five entrant systems was required to generate two different performances of the same music piece, in order to evaluate the adaptability of the system. After listening to each performance, the listeners were asked to rate it from the viewpoint of how much applause they would give to the performance, on a 10-point scale (1 = nothing, 10 = ovation). The CaRo 2.0 system won that final stage (see Table 3). The results, as well as the comments by the experts, are analysed in [41]. The performances played during Rencon 2011 can be listened to at http://smc.dei.unipd.it/advances_hci.
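
The aggregation behind Table 3 is simply additive: each performance's total is the sum of its live and online applause ratings, and a system's overall score is the sum of its two performance totals. The few lines below restate that arithmetic for the CaRo 2.0 row.

```python
# Aggregation of the Rencon 2011 ratings for the CaRo 2.0 row of Table 3.
performance_a = {"live": 464, "online": 56}
performance_b = {"live": 484, "online": 61}

total_a = performance_a["live"] + performance_a["online"]  # 520
total_b = performance_b["live"] + performance_b["online"]  # 545
total_score = total_a + total_b                            # 1065
print(total_a, total_b, total_score)
```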

7. Conclusion

CaRo 2.0 could be integrated into the browser engines of music community services. In this way, (i) the databases would keep track of the expressiveness of the songs played, and (ii) the users could create and browse playlists based not only on the song title or the artist name but also on the music expressiveness. This would allow, (i) in rhythm games, the performance to be rated on the basis of the expressive intentions of the user, with advantages for the educational value of the game and for the user's involvement, and (ii) in music communities, the user to search for happy or sad music according to his/her affective state or preferences.

Figure 14: The values of the three main expressive parameters (normalised log IOI, log L, and log KV, per note number) of the performance obtained by drawing the trajectory of Figure 11.
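
As a small illustration of the expressiveness-based browsing envisioned above, the sketch below filters a song catalogue by an expressive label attached to each entry; the data model, field names, and example entries are purely illustrative assumptions, not part of any existing service.

```python
# Illustrative only: searching a music collection by its rendered expressive character.
catalogue = [
    {"title": "Song A", "artist": "Artist 1", "expressiveness": "bright"},
    {"title": "Song B", "artist": "Artist 2", "expressiveness": "soft"},
    {"title": "Song C", "artist": "Artist 1", "expressiveness": "hard"},
]

def playlist_by_expressiveness(songs, label):
    """Return the songs whose stored expressive intention matches `label`."""
    return [song for song in songs if song["expressiveness"] == label]

print(playlist_by_expressiveness(catalogue, "soft"))
```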

In rhythm games, despite their big commercial success (often bigger than that of the original music albums), the gameplay is almost entirely oriented around the player's interactions with a musical score or with individual songs, by means of pressing specific buttons or activating controls on a specialized game controller. In these games, the reaction of the virtual audience (i.e., the score awarded) and the result of the battle/cooperative mode are based on the performance of the player as judged by the PC. Up to now, game designers have not considered the player's expressiveness. As future work, we intend to insert the CaRo system into these games, so that a music performance is rated as "perfect" not only if it is played exactly like the score, but also considering its interpretation, its expressiveness. In this way, these games can also be used profitably in the educational field.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] N. Bernardini and G. De Poli, "The sound and music computing field: present and future," Journal of New Music Research, vol. 36, no. 3, pp. 143–148, 2007.
[2] S. Canazza, G. De Poli, A. Rodà, and A. Vidolin, "An abstract control space for communication of sensory expressive intentions in music performance," Journal of New Music Research, vol. 32, no. 3, pp. 281–294, 2003.
[3] S. Canazza, G. De Poli, A. Rodà, and A. Vidolin, "Expressiveness in music performance: analysis, models, mapping, encoding," in Structuring Music through Markup Language: Designs and Architectures, J. Steyn, Ed., pp. 156–186, IGI Global, 2012.
[4] S. Canazza, G. De Poli, C. Drioli, A. Rodà, and A. Vidolin, "Audio morphing different expressive intentions for multimedia systems," IEEE Multimedia, vol. 7, no. 3, pp. 79–83, 2000.
[5] D. Fabian, R. Timmers, and E. Schubert, Eds., Expressiveness in Music Performance: Empirical Approaches Across Styles and Cultures, Oxford University Press, Oxford, UK, 2014.
[6] G. De Poli, "Methodologies for expressiveness modelling of and for music performance," Journal of New Music Research, vol. 33, no. 3, pp. 189–202, 2004.
[7] G. Widmer and P. Zanon, "Automatic recognition of famous artists by machine," in Proceedings of the 16th European Conference on Artificial Intelligence (ECAI '04), pp. 1109–1110, Valencia, Spain, 2004.
[8] A. Gabrielsson, "Expressive intention and performance," in Music and the Mind Machine, R. Steinberg, Ed., pp. 35–47, Springer, Berlin, Germany, 1995.
[9] P. Juslin, "Emotional communication in music performance: a functionalist perspective and some data," Music Perception, vol. 14, no. 4, pp. 383–418, 1997.
[10] S. Canazza, G. De Poli, and A. Vidolin, "Perceptual analysis of the musical expressive intention in a clarinet performance," in Music, Gestalt, and Computing, M. Leman, Ed., pp. 441–450, Springer, Berlin, Germany, 1997.
[11] G. De Poli, A. Rodà, and A. Vidolin, "Note-by-note analysis of the influence of expressive intentions and musical structure in violin performance," Journal of New Music Research, vol. 27, no. 3, pp. 293–321, 1998.
[12] R. Bresin and A. Friberg, "Emotional coloring of computer-controlled music performances," Computer Music Journal, vol. 24, no. 4, pp. 44–63, 2000.
[13] F. B. Baraldi, G. De Poli, and A. Rodà, "Communicating expressive intentions with a single piano note," Journal of New Music Research, vol. 35, no. 3, pp. 197–210, 2006.
[14] S. Hashimoto, "KANSEI as the third target of information processing and related topics in Japan," in Proceedings of the International Workshop on Kansei-Technology of Emotion, pp. 101–104, 1997.
[15] S. Hashimoto, "Humanoid robots for kansei communication: computer must have body," in Machine Intelligence: Quo Vadis?, P. Sincak, J. Vascak, and K. Hirota, Eds., pp. 357–370, World Scientific, 2004.
[16] R. Picard, Affective Computing, MIT Press, Cambridge, Mass, USA, 1997.
[17] S. Canazza, G. De Poli, C. Drioli, A. Rodà, and A. Vidolin, "Modeling and control of expressiveness in music performance," Proceedings of the IEEE, vol. 92, no. 4, pp. 686–701, 2004.
[18] B. H. Repp, "Diversity and commonality in music performance: an analysis of timing microstructure in Schumann's 'Träumerei'," The Journal of the Acoustical Society of America, vol. 92, no. 5, pp. 2546–2568, 1992.
[19] B. H. Repp, "The aesthetic quality of a quantitatively average music performance: two preliminary experiments," Music Perception, vol. 14, no. 4, pp. 419–444, 1997.
[20] L. Mion, G. De Poli, and E. Rapanà, "Perceptual organization of affective and sensorial expressive intentions in music performance," ACM Transactions on Applied Perception, vol. 7, no. 2, article 14, 2010.
[21] S. Dixon and W. Goebl, "The 'air worm': an interface for real-time manipulation of expressive music performance," in Proceedings of the International Computer Music Conference (ICMC '05), Barcelona, Spain, 2005.
[22] A. Rodà, S. Canazza, and G. De Poli, "Clustering affective qualities of classical music: beyond the valence-arousal plane," IEEE Transactions on Affective Computing, vol. 5, no. 4, pp. 364–376, 2014.
[23] P. N. Juslin and J. A. Sloboda, Music and Emotion: Theory and Research, Oxford University Press, 2001.
[24] A. Camurri, G. De Poli, M. Leman, and G. Volpe, "Communicating expressiveness and affect in multimodal interactive systems," IEEE Multimedia, vol. 12, no. 1, pp. 43–53, 2005.
[25] A. Kirke and E. R. Miranda, "A survey of computer systems for expressive music performance," ACM Computing Surveys, vol. 42, no. 1, article 3, pp. 1–41, 2009.
[26] G. Widmer, S. Flossmann, and M. Grachten, "YQX plays Chopin," AI Magazine, vol. 30, no. 3, pp. 35–48, 2009.
[27] G. Widmer, "Discovering simple rules in complex data: a meta-learning algorithm and some surprising musical discoveries," Artificial Intelligence, vol. 146, no. 2, pp. 129–148, 2003.
[28] J. Sundberg, A. Askenfelt, and L. Frydén, "Musical performance: a synthesis-by-rule approach," Computer Music Journal, vol. 7, no. 1, pp. 37–43, 1983.
[29] I. Ipolyi, "Innforing i musiksprakets opprinnelse och struktur," TMH-QPSR, vol. 48, no. 1, pp. 35–43, 1952.
[30] A. Friberg, R. Bresin, and J. Sundberg, "Overview of the KTH rule system for musical performance," Advances in Cognitive Psychology, Special Issue on Music Performance, vol. 2, no. 2-3, pp. 145–161, 2006.
[31] S. Tanaka, M. Hashida, and H. Katayose, "Shunji: a case-based performance rendering system attached importance to phrase expression," in Proceedings of the Sound and Music Computing Conference (SMC '11), F. Avanzini, Ed., pp. 1–2, 2011.
[32] M. Hamanaka, K. Hirata, and S. Tojo, "Implementing 'a generative theory of tonal music'," Journal of New Music Research, vol. 35, no. 4, pp. 249–277, 2006.
[33] F. Lerdahl and R. Jackendoff, A Generative Theory of Tonal Music, MIT Press, Cambridge, Mass, USA, 1983.
[34] T. Baba, M. Hashida, and H. Katayose, "A conducting system with heuristics of the conductor 'VirtualPhilharmony'," in Proceedings of the New Interfaces for Musical Expression Conference (NIME '10), pp. 263–270, Sydney, Australia, June 2010.
[35] M. Good, "MusicXML for notation and analysis," in The Virtual Score: Representation, Retrieval, Restoration, W. B. Hewlett and E. Selfridge-Field, Eds., pp. 113–124, MIT Press, Cambridge, Mass, USA, 2001.
[36] I. Peretz, L. Gagnon, and B. Bouchard, "Music and emotion: perceptual determinants, immediacy, and isolation after brain damage," Cognition, vol. 68, no. 2, pp. 111–141, 1998.
[37] R. Plutchik, Emotion: A Psychoevolutionary Synthesis, Harper & Row, New York, NY, USA, 1980.
[38] D. Temperley, The Cognition of Basic Musical Structures, MIT Press, Cambridge, Mass, USA, 2001.
[39] S. Canazza, G. De Poli, S. Rinaldin, and A. Vidolin, "Sonological analysis of clarinet expressivity," in Music, Gestalt, and Computing: Studies in Cognitive and Systematic Musicology, vol. 1317 of Lecture Notes in Computer Science, pp. 431–440, Springer, Berlin, Germany, 1997.
[40] H. Katayose, M. Hashida, G. De Poli, and K. Hirata, "On evaluating systems for generating expressive music performance: the Rencon experience," Journal of New Music Research, vol. 41, no. 4, pp. 299–310, 2012.
[41] G. De Poli, S. Canazza, A. Rodà, and E. Schubert, "The role of individual difference in judging expressiveness of computer assisted music performances by experts," ACM Transactions on Applied Perception, vol. 11, no. 4, article 22, 2014.

5 Results

This section reports the results of some expressive music ren-derings obtainedwith CaRo 20 with the aim of showing howthe system works The beginning of the 3rd Movement of thePiano Sonata number 16 by L van Beethoven (see Figure 10)was chosen as a case study because it is characterised by alarge number of expressive nuances and at the same timeit is written with a quite simple musical texture (four voicesat most) which makes it easier to visualise and understandthe results Seven performances of the same BeethovenrsquosSonata were rendered five performances are characterisedby only one expressive intention (bright hard heavy lightand soft) and were obtained keeping the mouse fixed onthe corresponding adjective (see the GUI of Figure 11) oneperformance named hbls has been obtained drawing atrajectory on the two-dimensional control space going fromhard to bright light and soft (see the dotted line of Figure 11)finally a nominal performance was rendered applying noexpressive rule or transformation to the music score

The system rendered the different performances cor-rectly playing the sequences of notes written in the scoreMoreover expressive deviations were computed dependingon the userrsquos preferences specified by means of the con-trol space Figure 12 shows the piano roll representation ofthe nominal performance compared to the performances

10 Advances in Human-Computer Interaction

0 2 4 6 8 10 12 14 16 18 20 22Time (s)

0 2 4 6 8 10 12 14 16 18 20 22Time (s)

36414651566166717681

Pitc

h

020406080

100120

Velo

city

(a) Nominal

0 2 4 6 8 10 12 14 16 18 20 2236414651566166717681

Time (s)

0 2 4 6 8 10 12 14 16 18 20 22Time (s)

Pitc

h

020406080

100120

Velo

city

(b) Heavy

0 2 4 6 8 10 12 14 16 18 20 22Time (s)

0 2 4 6 8 10 12 14 16 18 20 22Time (s)

36414651566166717681

Pitc

h

020406080

100120

Velo

city

(c) Light

Figure 12 Piano roll representation of the performances characterized by heavy (b) and light (c) expressive intention The nominalperformance (a) has been added as a reference

characterised by the expressive intentions heavy and light Itcan be noted that the onset duration and Key-Velocity ofthe notes change in function of the expressive intention (Apiano-roll representation is a method of representing a musi-cal stimulus or score for later computational analyses It canbe conceived as a two-dimensional graph the vertical axis is adigital representation of different notes the horizontal axis isa continuous representation of time When a note is played ahorizontal line is drawn on the piano-roll representationTheheight of this line represents which note was being played thebeginning of the line represents the notersquos onset the lengthof the line represents the notersquos duration and the end ofthe line represents the notes offset [38]) The analysis of themain expressive parameters (see Figures 12 and 13) shows thatthese values match with the correspondent parameters of realperformances analyzed in [39] Moreover Figure 14 showshow the expressive parameters change gradually followingthe userrsquosmovements between different expressive intentions

6 Assessment

The most important public forum worldwide in whichcomputer systems for expressive music performance areassessed is the Performance RENdering CONtest (Ren-con) which was initiated in 2002 by Haruhiro Katayoseand colleagues as a competition among different systems(httprenconmusicorg) During the years the Rencon con-test evolved toward a more structured format It is anannual international competition in which entrants presentcomputer systems they have developed for generating expres-sive musical performances which audience members andorganisers judge One of the goals of Rencon is to solvethe researchersrsquo common problem difficulty in evaluat-inggrading performances generated by hisher computersystem within their individual endeavours by entrants meet-ing together at the same site [40] An assessment (fromthe musical performance expressiveness point of view) of

Advances in Human-Computer Interaction 11

Table 3 Results of the final stage of the Rencon 2011 contest Each system was required to render two different performances of the samemusic piece Each performance has been evaluated both by the audience attending the contest live and by people connected online througha streaming service

System Performance A Performance B Total scoreLive Online Total Live Online Total

CaRo 20 464 56 520 484 61 545 1065YQX 484 64 548 432 65 497 1045VirtualPhilharmony 452 46 498 428 32 460 958Director Musices 339 33 372 371 13 384 756Shunji System 277 24 301 277 20 297 598

0

20

40

60

80

100

120

1 5 9 13 17 21 25 29 33 37 41 45 49 53 57 61 65 69 73

Key

velo

city

Note number

BrHaHe

LiSo

Figure 13 The Key-Velocity values of the performances charac-terised by the different expressive intentions (bright hard heavylight and soft) Only the melodic line is reported in order to havea more readable graph

CaRo 20 was carried out participating in Rencon 2011 Oursystem was classified for the final stage where the partici-pating systems were presented during a dedicated workshopThe performances played were openly evaluated by a large(77 participants) and qualified audience of musicians andscientists under the same conditions in the manner of acompetition Each of the five entrant systems was requiredto generate two different performances of the same musicpiece in order to evaluate the adaptability of the system Afterlistening to each performance the listeners were asked to rateit from the viewpoint of howmuchdo you give applause to theperformance on a 10-point scale (1 = nothing 10 = ovation)The CaRo 20 system won that final stage (see Table 3) Theresults as well as the comments by the experts are analysedin [41] The performances played during Rencon 2011 can belistened to in httpsmcdeiunipditadvances hci

7 Conclusion

The CaRo 20 could be integrated in the browser engines ofthe music community services In this way (i) the databases

minus15

minus1

minus05

0

05

1

15

Nor

mal

ized

val

ues

logIOInlogL

logKV

1 5 9 13 17 21 25 29 33 37 41 45 49 53 57 61 65 69 73Note number

Hard Bright Light Soft

Figure 14The values of the threemain expressive parameters of theperformance obtained drawing the trajectory of Figure 14

keep track of the expressiveness of the songs played (ii) theusers createbrowse playlists based not only on the song titleor the artist name but also on the music expressiveness Thisallows that (i) in the rhythms games the performance will berated on the basis of the expressive intentions of the user withadvantages for the educational skill and the users involvementof the game (ii) in themusic community the user will be ableto search happy or sadmusic accordingly to the user affectivestate or preferences

In the rhythm games despite their big commercialsuccess (often bigger than the original music albums) thegameplay ismdashalmostmdashentirely oriented around the playerrsquosinteractions with a musical score or individual songs bymeans of pressing specific buttons or activating controls ona specialized game controller In these games the reaction ofthe virtual audience (ie the score rated) and the result ofthe battlecooperative mode are based on the performance ofthe player judged by the PC Up to now the game designersdid not consider the playerrsquos expressiveness As future workwe intend to insert the CaRo system in these games sothat the music performance is rated ldquoperfectrdquo not if it isonly played exactly like the score but considering also its

12 Advances in Human-Computer Interaction

interpretation its expressiveness In this way these games canbe used profitably also in the educational field

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

References

[1] N Bernardini andGDe Poli ldquoThe sound andmusic computingfield present and futurerdquo Journal of NewMusic Research vol 36no 3 pp 143ndash148 2007

[2] S Canazza G De Poli A Roda and A Vidolin ldquoAn abstractcontrol space for communication of sensory expressive inten-tions inmusic performancerdquo Journal ofNewMusic Research vol32 no 3 pp 281ndash294 2003

[3] S Canazza GDe Poli A Roda andAVidolin ldquoExpressivenessin music performance analysis models mapping encodingrdquoin Structuring Music through Markup Language Designs andArchitectures J Steyn Ed pp 156ndash186 IGI Global 2012

[4] S Canazza G De Poli C Drioli A Roda and A VidolinldquoAudiomorphing different expressive intentions formultimediasystemsrdquo IEEE Multimedia vol 7 no 3 pp 79ndash83 2000

[5] D Fabian R Timmers and E Schubert Eds Expressivenessin Music Performance Empirical Approaches Across Styles andCultures Oxford University Press Oxford UK 2014

[6] G De Poli ldquoMethodologies for expressiveness modelling of andfor music performancerdquo Journal of NewMusic Research vol 33no 3 pp 189ndash202 2004

[7] G Widmer and P Zanon ldquoAutomatic recognition of famousartists by machinerdquo in Proceedings of the 16th European Confer-ence on Artificial Intelligence (ECAI rsquo04) pp 1109ndash1110 ValenciaSpain 2004

[8] A Gabrielsson ldquoExpressive intention and performancerdquo inMusic and the Mind Machine R Steinberg Ed pp 35ndash47Springer Berlin Germany 1995

[9] P Juslin ldquoEmotional communication in music performance afunctionalist perspective and some datardquoMusic Perception vol14 no 4 pp 383ndash418 1997

[10] S Canazza G De Poli and A Vidolin ldquoPerceptual analysisof the musical expressive intention in a clarinet performancerdquoin Music Gestalt and Computing M Leman Ed pp 441ndash450Springer Berlin Germany 1997

[11] G De Poli A Roda and A Vidolin ldquoNoteminusbyminusnote analysis ofthe influence of expressive intentions and musical structure inviolin performancerdquo Journal of New Music Research vol 27 no3 pp 293ndash321 1998

[12] R Bresin and A Friberg ldquoEmotional coloring of computer-controlled music performancesrdquo Computer Music Journal vol24 no 4 pp 44ndash63 2000

[13] F B Baraldi G De Poli and A Roda ldquoCommunicatingexpressive intentions with a single piano noterdquo Journal of NewMusic Research vol 35 no 3 pp 197ndash210 2006

[14] S Hashimoto ldquoKANSEI as the third target of informationprocessing and related topics in Japanrdquo in Proceedings of theInternational Workshop on Kansei-Technology of Emotion pp101ndash104 1997

[15] S Hashimoto ldquoHumanoid robots for kansei communicationcomputer must have bodyrdquo inMachine Intelligence Quo Vadis

P Sincak J Vascak and K Hirota Eds pp 357ndash370 WorldScientific 2004

[16] R Picard Affective Computing MIT Press Cambridge MassUSA 1997

[17] S Canazza G De Poli C Drioli A Roda and A VidolinldquoModeling and control of expressiveness in music perfor-mancerdquo Proceedings of the IEEE vol 92 no 4 pp 686ndash7012004

[18] B H Repp ldquoDiversity and commonality in music perfor-mance an analysis of timing microstructure in SchumannrsquoslsquoTraumereirsquordquo The Journal of the Acoustical Society of Americavol 92 no 5 pp 2546ndash2568 1992

[19] B H Repp ldquoThe aesthetic quality of a quantitatively averagemusic performance two preliminary experimentsrdquo Music Per-ception vol 14 no 4 pp 419ndash444 1997

[20] L Mion G De Poli and E Rapana ldquoPerceptual organizationof affective and sensorial expressive intentions in music perfor-mancerdquo ACM Transactions on Applied Perception vol 7 no 2article 14 2010

[21] S Dixon and W Goebl ldquoThe rsquoair wormrsquo an interface forreal-time manipulation of expressive music performancerdquo inProceedings of the International Computer Music Conference(ICMC rsquo05) Barcelona Spain 2005

[22] A Roda S Canazza and G De Poli ldquoClustering affectivequalities of classical music beyond the valence-arousal planerdquoIEEE Transactions on Affective Computing vol 5 no 4 pp 364ndash376 2014

[23] P N Juslin and J A Sloboda Music and Emotion Theory andResearch Oxford University Press 2001

[24] A Camurri G De Poli M Leman and G Volpe ldquoCommu-nicating expressiveness and affect in multimodal interactivesystemsrdquo IEEE Multimedia vol 12 no 1 pp 43ndash53 2005

[25] A Kirke and E R Miranda ldquoA survey of computer systems forexpressive music performancerdquo ACM Computing Surveys vol42 no 1 article 3 pp 1ndash41 2009

[26] G Widmer S Flossmann and M Grachten ldquoYQX playsChopinrdquo AI Magazine vol 30 no 3 pp 35ndash48 2009

[27] G Widmer ldquoDiscovering simple rules in complex data a meta-learning algorithm and some surprising musical discoveriesrdquoArtificial Intelligence vol 146 no 2 pp 129ndash148 2003

[28] J Sundberg AAskenfelt and L Fryden ldquoMusical performancea synthesis-by-rule approachrdquo Computer Music Journal vol 7no 1 pp 37ndash43 1983

[29] I Ipolyi ldquoInnforing i musiksprakets opprinnelse och strukturrdquoTMHQPSR vol 48 no 1 pp 35ndash43 1952

[30] A Friberg R Bresin and J Sundberg ldquoOverview of the KTHrule system for musical performancerdquo Advances in CognitivePsychology Special Issue on Music Performance vol 2 no 2-3pp 145ndash161 2006

[31] S Tanaka M Hashida and H Katayose ldquoShunji a case-basedperformance rendering system attached importance to phraseexpressionrdquo in Proceedings of Sound and Music ComputingConference (SMC rsquo11) F Avanzini Ed pp 1ndash2 2011

[32] M Hamanaka K Hirata and S Tojo ldquoImplementing lsquoa genera-tive theory of tonal musicrsquordquo Journal of New Music Research vol35 no 4 pp 249ndash277 2006

[33] F Lerdahl and R Jackendoff A Generative Theory of TonalMusic MIT Press Cambridge Mass USA 1983

[34] T Baba M Hashida and H Katayose ldquoA conducting systemwith heuristics of the conductor lsquovirtualphilharmonyrsquordquo in Pro-ceedings of the New Interfaces for Musical Expression Conference(NIME rsquo10) pp 263ndash270 Sydney Australia June 2010

Advances in Human-Computer Interaction 13

[35] M Good ldquoMusicXML for notation and analysisrdquo inTheVirtualScore Representation Retrieval Restoration W B Hewlett andE SelfridgeField Eds pp 113ndash124 MIT Press CambridgeMass USA 2001

[36] I Peretz L Gagnon and B Bouchard ldquoMusic and emotionperceptual determinants immediacy and isolation after braindamagerdquo Cognition vol 68 no 2 pp 111ndash141 1998

[37] R Plutchik Emotion A Psychoevolutionary Synthesis Harper ampRow New York NY USA 1980

[38] D Temperley The Cognition of Basic Musical Structures MITPress Cambridge Mass USA 2001

[39] S Canazza G de Poli S Rinaldin and A Vidolin ldquoSonologicalanalysis of clarinet expressivityrdquo in Music Gestalt and Com-puting Studies in Cognitive and Systematic Musicology vol 1317of Lecture Notes in Computer Science pp 431ndash440 SpringerBerlin Germany 1997

[40] H Katayose M Hashida G de Poli and K Hirata ldquoOn evalu-ating systems for generating expressive music performance therencon experiencerdquo Journal of New Music Research vol 41 no4 pp 299ndash310 2012

[41] G de Poli S Canazza A Roda and E Schubert ldquoThe roleof individual difference in judging expressiveness of computerassisted music performances by expertsrdquo ACM Transactions onApplied Perception vol 11 no 4 article 22 2014

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 7: Research Article CaRo 2.0: An Interactive System for

Advances in Human-Computer Interaction 7

ExitNotes to beplayed

Expressivecues

Newmodifier Sync rules

Voiceleading

rules

Readuser

input

Expressiveintention

processingConstraints

rules

Deviationsrules

No

No

Yes

Yes

Send notesto player

queue

Begin

Exit

Naturalizer

Expressivizer

Figure 4 Schema of the PlayExpressive() function

applied by the successivemodule named deviation rulesThismodule combines the deviations defined by all the modifierswithin a temporal scope between 119905119901 and 119905119901 + 119879 following aset of rules The output is for each note a set of expressiveparameters which define how the note is played AlthoughCaRo 20 can control several different musical instruments(eg violin flute and clarinet) with a variable number ofparameters this paper is focused on piano performancesand the parameters computed by the rules system are IOInatDRnat and KVnat where the subscript nat identifies theparameters of the natural performance and KV is the Key-Velocity a parameter of the MIDI protocol related to thesound intensity

(2) Expressivizer The next step of the rendering algorithm iscalled Expressivizer and its aim is to modify the expressivedeviations computed by the Naturalizer in order to renderthe userrsquos different expressive intentions It is based onthe multilayer model shown in Figure 5 the user specifiesa point or a trajectory in an abstract control space thecoordinates of the control space are mapped on a set ofexpressive features finally the features are used to computethe deviations that must be applied to the score for conveyinga particular expressive intention As an example Figure 7shows a mapping surface between points of the control spaceand the feature Intensity Based on the movements of thepointer on the 119909-119910 plane the values of the Intensity featureare thus computed The user can select a single expressiveintention and the system will play the entire score withthat nuance Trajectories can also be drawn in the space toexperience the dynamic aspects of the expressive content

The abstract control space 119878 is represented as a set of 119873

triples 119878119894 = (119897119894 119901119894 119890119894) with 119894 = 1 119873 where 119897119894 is a verballabel that semantically identifies an expressive intention (egbright light and soft) 119901119894 isin R2 represents the coordinates inthe control space of the expressive intention 119897119894 and 119890119894 isin R119896 is

Semantic spacesKinematicenergy spaceValencearousal space

Features

Physical events O[n] DR[n] KV[n]

middot middot middot

Tempo ΔtempoIntensity Δintensity

Articulation Δarticulation

Figure 5 Multilayer model

a vector of features which characterise the expressive inten-tion 119897119894

119878 defines the values of the expressive features for the 119873

points of coordinates 119901119894 To have a continuos control spacein which each point of the control space is associated with afeature vector a mapping function 119891 R2 rarr R119896 is definedas

119891 (119909) =

119890119894 if 119909 = 119901119894119899

sum

119894=1

119886119894 sdot 119890119894 elsewhere (2)

with

1

119886119894

=

1003817

1003817

1003817

1003817

119909 minus 119901119894

1003817

1003817

1003817

1003817

2sdot

119873

sum

119895=1

1

1003817

1003817

1003817

1003817

1003817

119909 minus 119901119895

1003817

1003817

1003817

1003817

1003817

2 (3)

Let 119909 be a point in the control space the mappingfunction calculates the corresponding features vector 119910 =

119891(119909) as a linear combination of 119873 features vectors onefor each expressive intention defined in 119878 The featuresvectors are weighted inversely proportional to the square of

8 Advances in Human-Computer Interaction

Figure 6The options window (designed for musicians) that allowsthe fine tuning of the model For general public it is possible to loadpreset configurations (see the button ldquoload parameters from filerdquo)

15

1

05

Inte

nsity

1

05

0

minus05

minus1

105

0minus05

minus1Sad

HappyNatural

Angry

Figure 7Map between the 119909-119910 coordinates in the control space andthe Intensity feature

the euclidean distance between their corresponding expres-sive intention and 119909 Finally the resulting features vector canbe used to compute the expressive deviations represented bythe physical layer in Figure 5 For example the Tempo featureis used to compute the parameters of the 119899th physical eventfrom the output of the Naturalizer module according thefollowing equations

119874exp [119899] = 119874nat [119899 minus 1] + (119874nat [119899] minus 119874nat [119899 minus 1]) sdot Tempo

DRexp [119899] = DRnat [119899] sdot Tempo

(4)

where 119874exp and 119874nat are the onset times of the expressive andnatural performance respectively and DRexp and DRnat arethe event durations

43 User-Defined Control Space for Interaction The abstractcontrol space constitutes the interface between the userconcepts of expression and their internal representation intothe system The user following hisher own preferencescan create the semantic space which controls the expressiveintention of the performance Each user can select hisherown expressive ideas and then design the mapping ofthese concepts to positions and movements on this space

Figure 8 A control space designed for an educational context Theposition of the emoticons follows the results of [37]

Figure 9The control space is used to change interactively the soundcomment of the multimedia product (a cartoon from the movieldquoThe Good The Bad and The Uglyrdquo by Sergio Leone in this case)The author can associate different expressive intentions with thecharacters (here see the blue labels 1 2 and 3)

A preferences window (see Figure 6) allows the user to definethe set of performance styles by specifying for the 119894th style averbal semantic label 119897119894 the coordinates in the control space119901119894 and the features vector 119890119894

As a particular and significant case the emotional andsensory control metaphors can be used For example amusician may find it more convenient to use the Kinetics-Energy space (see Figure 1) because it is based on terms suchas bright dark or soft that are commonly used to describemusical nuances On the contrary a nonmusician or a childmay prefer the Valence-Arousal space with emotions beinga very common experience for every listener The sensoryspace can be defined by placing the semantic labels (withtheir associated features) Hard Soft Dark and Bright in fourorthogonal points of the space Similarly the Valence-Arousalspace can be obtained by placing the emotional labels HappyCalm Sad andAngry at the four corners of the control spaceThese spaces are implemented as presets of the system whichcan be selected by the user (see eg Figure 8)

Specific and new expressive intentions can be associatedwith objectsavatars which are placed in the screen and amovement of the pointer from an object to another onechanges continuously the resulting music expression Forexample a different type of music can be associated with thecharacters represented in the image Also the object with its

Advances in Human-Computer Interaction 9

Piano

RondoAllegretto

5

cresc

9

cresc

13

Sonate No 163rd movementopus 31 No 1

(1770-1827)Ludwig van Beethoven

Figure 10 The music excerpt used to evaluate the CaRo 20 software The score has been transcribed in MusicXML format including all theexpressive indications (slurs articulation marks and dynamic cues)

Figure 11 The graphical user interface of CaRo 20 The dotted lineshows the trajectory drawn for rendering the hbls performance

expressive label and parameters can be moved and then thecontrol space can dynamically vary (see eg Figure 9)

More generally the system allows a creative design ofabstract spaces according to artistrsquos fantasies and needs byinventing new expressions and their spatial relations

5 Results

This section reports the results of some expressive music ren-derings obtainedwith CaRo 20 with the aim of showing howthe system works The beginning of the 3rd Movement of thePiano Sonata number 16 by L van Beethoven (see Figure 10)was chosen as a case study because it is characterised by alarge number of expressive nuances and at the same timeit is written with a quite simple musical texture (four voicesat most) which makes it easier to visualise and understandthe results Seven performances of the same BeethovenrsquosSonata were rendered five performances are characterisedby only one expressive intention (bright hard heavy lightand soft) and were obtained keeping the mouse fixed onthe corresponding adjective (see the GUI of Figure 11) oneperformance named hbls has been obtained drawing atrajectory on the two-dimensional control space going fromhard to bright light and soft (see the dotted line of Figure 11)finally a nominal performance was rendered applying noexpressive rule or transformation to the music score

The system rendered the different performances cor-rectly playing the sequences of notes written in the scoreMoreover expressive deviations were computed dependingon the userrsquos preferences specified by means of the con-trol space Figure 12 shows the piano roll representation ofthe nominal performance compared to the performances

10 Advances in Human-Computer Interaction

0 2 4 6 8 10 12 14 16 18 20 22Time (s)

0 2 4 6 8 10 12 14 16 18 20 22Time (s)

36414651566166717681

Pitc

h

020406080

100120

Velo

city

(a) Nominal

0 2 4 6 8 10 12 14 16 18 20 2236414651566166717681

Time (s)

0 2 4 6 8 10 12 14 16 18 20 22Time (s)

Pitc

h

020406080

100120

Velo

city

(b) Heavy

0 2 4 6 8 10 12 14 16 18 20 22Time (s)

0 2 4 6 8 10 12 14 16 18 20 22Time (s)

36414651566166717681

Pitc

h

020406080

100120

Velo

city

(c) Light

Figure 12 Piano roll representation of the performances characterized by heavy (b) and light (c) expressive intention The nominalperformance (a) has been added as a reference

characterised by the expressive intentions heavy and light Itcan be noted that the onset duration and Key-Velocity ofthe notes change in function of the expressive intention (Apiano-roll representation is a method of representing a musi-cal stimulus or score for later computational analyses It canbe conceived as a two-dimensional graph the vertical axis is adigital representation of different notes the horizontal axis isa continuous representation of time When a note is played ahorizontal line is drawn on the piano-roll representationTheheight of this line represents which note was being played thebeginning of the line represents the notersquos onset the lengthof the line represents the notersquos duration and the end ofthe line represents the notes offset [38]) The analysis of themain expressive parameters (see Figures 12 and 13) shows thatthese values match with the correspondent parameters of realperformances analyzed in [39] Moreover Figure 14 showshow the expressive parameters change gradually followingthe userrsquosmovements between different expressive intentions

6 Assessment

The most important public forum worldwide in whichcomputer systems for expressive music performance areassessed is the Performance RENdering CONtest (Ren-con) which was initiated in 2002 by Haruhiro Katayoseand colleagues as a competition among different systems(httprenconmusicorg) During the years the Rencon con-test evolved toward a more structured format It is anannual international competition in which entrants presentcomputer systems they have developed for generating expres-sive musical performances which audience members andorganisers judge One of the goals of Rencon is to solvethe researchersrsquo common problem difficulty in evaluat-inggrading performances generated by hisher computersystem within their individual endeavours by entrants meet-ing together at the same site [40] An assessment (fromthe musical performance expressiveness point of view) of

Advances in Human-Computer Interaction 11

Table 3: Results of the final stage of the Rencon 2011 contest. Each system was required to render two different performances of the same music piece. Each performance has been evaluated both by the audience attending the contest live and by people connected online through a streaming service.

System               | Performance A (Live / Online / Total) | Performance B (Live / Online / Total) | Total score
CaRo 2.0             | 464 / 56 / 520                        | 484 / 61 / 545                        | 1065
YQX                  | 484 / 64 / 548                        | 432 / 65 / 497                        | 1045
VirtualPhilharmony   | 452 / 46 / 498                        | 428 / 32 / 460                        |  958
Director Musices     | 339 / 33 / 372                        | 371 / 13 / 384                        |  756
Shunji System        | 277 / 24 / 301                        | 277 / 20 / 297                        |  598

Figure 13: The Key-Velocity values of the performances characterised by the different expressive intentions (bright, hard, heavy, light, and soft). Only the melodic line is reported, in order to have a more readable graph. (Key velocity, 0-120, plotted against note number; curves labelled Br, Ha, He, Li, and So.)

An assessment of CaRo 2.0 (from the point of view of musical performance expressiveness) was carried out by participating in Rencon 2011. Our system qualified for the final stage, where the participating systems were presented during a dedicated workshop. The performances played were openly evaluated, under the same conditions and in the manner of a competition, by a large (77 participants) and qualified audience of musicians and scientists. Each of the five entrant systems was required to generate two different performances of the same music piece, in order to evaluate the adaptability of the system. After listening to each performance, the listeners were asked to rate it from the viewpoint of "how much applause would you give to the performance" on a 10-point scale (1 = nothing, 10 = ovation). The CaRo 2.0 system won that final stage (see Table 3). The results, as well as the comments by the experts, are analysed in [41]. The performances played during Rencon 2011 can be listened to at http://smc.dei.unipd.it/advances_hci.
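The totals in Table 3 are simple sums: each performance's total is its live score plus its online score, and a system's overall score is the sum of its two performance totals. A minimal sketch (ours, using the figures as printed in Table 3) that reproduces the ranking:

```python
# (live, online) applause scores of performances A and B, as printed in Table 3.
scores = {
    "CaRo 2.0":           ((464, 56), (484, 61)),
    "YQX":                ((484, 64), (432, 65)),
    "VirtualPhilharmony": ((452, 46), (428, 32)),
    "Director Musices":   ((339, 33), (371, 13)),
    "Shunji System":      ((277, 24), (277, 20)),
}

for system, (perf_a, perf_b) in scores.items():
    total_a = sum(perf_a)   # e.g. CaRo 2.0: 464 + 56 = 520
    total_b = sum(perf_b)   # e.g. CaRo 2.0: 484 + 61 = 545
    print(f"{system:<20} {total_a:>4} {total_b:>4} {total_a + total_b:>5}")
# CaRo 2.0 comes first with 1065 points, followed by YQX with 1045.
```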

7. Conclusion

The CaRo 2.0 system could be integrated in the browser engines of music community services. In this way, (i) the databases keep track of the expressiveness of the songs played; (ii) the users create/browse playlists based not only on the song title or the artist name but also on the music expressiveness. As a consequence, (i) in rhythm games the performance can be rated on the basis of the expressive intentions of the user, with benefits for the educational value of the game and for the user's involvement; (ii) in the music community the user can search for happy or sad music according to his/her affective state or preferences.

Figure 14: The values of the three main expressive parameters (log IOI, log L, and log KV, normalized) of the performance obtained by drawing the trajectory of Figure 11, plotted against the note number; the segments correspond to the intentions hard, bright, light, and soft.

In rhythm games, despite their big commercial success (often bigger than that of the original music albums), the gameplay is almost entirely oriented around the player's interactions with a musical score or individual songs, by means of pressing specific buttons or activating controls on a specialized game controller. In these games, the reaction of the virtual audience (i.e., the score awarded) and the result of the battle/cooperative mode are based on the performance of the player as judged by the PC. Up to now, game designers did not consider the player's expressiveness. As future work, we intend to insert the CaRo system in these games, so that a music performance is rated "perfect" not only if it is played exactly like the score, but also considering its interpretation, its expressiveness. In this way these games can be used profitably in the educational field as well.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] N. Bernardini and G. De Poli, "The sound and music computing field: present and future," Journal of New Music Research, vol. 36, no. 3, pp. 143–148, 2007.

[2] S. Canazza, G. De Poli, A. Rodà, and A. Vidolin, "An abstract control space for communication of sensory expressive intentions in music performance," Journal of New Music Research, vol. 32, no. 3, pp. 281–294, 2003.

[3] S. Canazza, G. De Poli, A. Rodà, and A. Vidolin, "Expressiveness in music performance: analysis, models, mapping, encoding," in Structuring Music through Markup Language: Designs and Architectures, J. Steyn, Ed., pp. 156–186, IGI Global, 2012.

[4] S. Canazza, G. De Poli, C. Drioli, A. Rodà, and A. Vidolin, "Audio morphing different expressive intentions for multimedia systems," IEEE Multimedia, vol. 7, no. 3, pp. 79–83, 2000.

[5] D. Fabian, R. Timmers, and E. Schubert, Eds., Expressiveness in Music Performance: Empirical Approaches Across Styles and Cultures, Oxford University Press, Oxford, UK, 2014.

[6] G. De Poli, "Methodologies for expressiveness modelling of and for music performance," Journal of New Music Research, vol. 33, no. 3, pp. 189–202, 2004.

[7] G. Widmer and P. Zanon, "Automatic recognition of famous artists by machine," in Proceedings of the 16th European Conference on Artificial Intelligence (ECAI '04), pp. 1109–1110, Valencia, Spain, 2004.

[8] A. Gabrielsson, "Expressive intention and performance," in Music and the Mind Machine, R. Steinberg, Ed., pp. 35–47, Springer, Berlin, Germany, 1995.

[9] P. Juslin, "Emotional communication in music performance: a functionalist perspective and some data," Music Perception, vol. 14, no. 4, pp. 383–418, 1997.

[10] S. Canazza, G. De Poli, and A. Vidolin, "Perceptual analysis of the musical expressive intention in a clarinet performance," in Music, Gestalt, and Computing, M. Leman, Ed., pp. 441–450, Springer, Berlin, Germany, 1997.

[11] G. De Poli, A. Rodà, and A. Vidolin, "Note-by-note analysis of the influence of expressive intentions and musical structure in violin performance," Journal of New Music Research, vol. 27, no. 3, pp. 293–321, 1998.

[12] R. Bresin and A. Friberg, "Emotional coloring of computer-controlled music performances," Computer Music Journal, vol. 24, no. 4, pp. 44–63, 2000.

[13] F. B. Baraldi, G. De Poli, and A. Rodà, "Communicating expressive intentions with a single piano note," Journal of New Music Research, vol. 35, no. 3, pp. 197–210, 2006.

[14] S. Hashimoto, "KANSEI as the third target of information processing and related topics in Japan," in Proceedings of the International Workshop on Kansei-Technology of Emotion, pp. 101–104, 1997.

[15] S. Hashimoto, "Humanoid robots for kansei communication: computer must have body," in Machine Intelligence: Quo Vadis?, P. Sincak, J. Vascak, and K. Hirota, Eds., pp. 357–370, World Scientific, 2004.

[16] R. Picard, Affective Computing, MIT Press, Cambridge, Mass, USA, 1997.

[17] S. Canazza, G. De Poli, C. Drioli, A. Rodà, and A. Vidolin, "Modeling and control of expressiveness in music performance," Proceedings of the IEEE, vol. 92, no. 4, pp. 686–701, 2004.

[18] B. H. Repp, "Diversity and commonality in music performance: an analysis of timing microstructure in Schumann's 'Träumerei'," The Journal of the Acoustical Society of America, vol. 92, no. 5, pp. 2546–2568, 1992.

[19] B. H. Repp, "The aesthetic quality of a quantitatively average music performance: two preliminary experiments," Music Perception, vol. 14, no. 4, pp. 419–444, 1997.

[20] L. Mion, G. De Poli, and E. Rapanà, "Perceptual organization of affective and sensorial expressive intentions in music performance," ACM Transactions on Applied Perception, vol. 7, no. 2, article 14, 2010.

[21] S. Dixon and W. Goebl, "The 'air worm': an interface for real-time manipulation of expressive music performance," in Proceedings of the International Computer Music Conference (ICMC '05), Barcelona, Spain, 2005.

[22] A. Rodà, S. Canazza, and G. De Poli, "Clustering affective qualities of classical music: beyond the valence-arousal plane," IEEE Transactions on Affective Computing, vol. 5, no. 4, pp. 364–376, 2014.

[23] P. N. Juslin and J. A. Sloboda, Music and Emotion: Theory and Research, Oxford University Press, 2001.

[24] A. Camurri, G. De Poli, M. Leman, and G. Volpe, "Communicating expressiveness and affect in multimodal interactive systems," IEEE Multimedia, vol. 12, no. 1, pp. 43–53, 2005.

[25] A. Kirke and E. R. Miranda, "A survey of computer systems for expressive music performance," ACM Computing Surveys, vol. 42, no. 1, article 3, pp. 1–41, 2009.

[26] G. Widmer, S. Flossmann, and M. Grachten, "YQX plays Chopin," AI Magazine, vol. 30, no. 3, pp. 35–48, 2009.

[27] G. Widmer, "Discovering simple rules in complex data: a meta-learning algorithm and some surprising musical discoveries," Artificial Intelligence, vol. 146, no. 2, pp. 129–148, 2003.

[28] J. Sundberg, A. Askenfelt, and L. Frydén, "Musical performance: a synthesis-by-rule approach," Computer Music Journal, vol. 7, no. 1, pp. 37–43, 1983.

[29] I. Ipolyi, "Innforing i musiksprakets opprinnelse och struktur," TMH-QPSR, vol. 48, no. 1, pp. 35–43, 1952.

[30] A. Friberg, R. Bresin, and J. Sundberg, "Overview of the KTH rule system for musical performance," Advances in Cognitive Psychology, Special Issue on Music Performance, vol. 2, no. 2-3, pp. 145–161, 2006.

[31] S. Tanaka, M. Hashida, and H. Katayose, "Shunji: a case-based performance rendering system attached importance to phrase expression," in Proceedings of the Sound and Music Computing Conference (SMC '11), F. Avanzini, Ed., pp. 1–2, 2011.

[32] M. Hamanaka, K. Hirata, and S. Tojo, "Implementing 'a generative theory of tonal music'," Journal of New Music Research, vol. 35, no. 4, pp. 249–277, 2006.

[33] F. Lerdahl and R. Jackendoff, A Generative Theory of Tonal Music, MIT Press, Cambridge, Mass, USA, 1983.

[34] T. Baba, M. Hashida, and H. Katayose, "A conducting system with heuristics of the conductor 'VirtualPhilharmony'," in Proceedings of the New Interfaces for Musical Expression Conference (NIME '10), pp. 263–270, Sydney, Australia, June 2010.

[35] M. Good, "MusicXML for notation and analysis," in The Virtual Score: Representation, Retrieval, Restoration, W. B. Hewlett and E. Selfridge-Field, Eds., pp. 113–124, MIT Press, Cambridge, Mass, USA, 2001.

[36] I. Peretz, L. Gagnon, and B. Bouchard, "Music and emotion: perceptual determinants, immediacy, and isolation after brain damage," Cognition, vol. 68, no. 2, pp. 111–141, 1998.

[37] R. Plutchik, Emotion: A Psychoevolutionary Synthesis, Harper & Row, New York, NY, USA, 1980.

[38] D. Temperley, The Cognition of Basic Musical Structures, MIT Press, Cambridge, Mass, USA, 2001.

[39] S. Canazza, G. De Poli, S. Rinaldin, and A. Vidolin, "Sonological analysis of clarinet expressivity," in Music, Gestalt, and Computing: Studies in Cognitive and Systematic Musicology, vol. 1317 of Lecture Notes in Computer Science, pp. 431–440, Springer, Berlin, Germany, 1997.

[40] H. Katayose, M. Hashida, G. De Poli, and K. Hirata, "On evaluating systems for generating expressive music performance: the Rencon experience," Journal of New Music Research, vol. 41, no. 4, pp. 299–310, 2012.

[41] G. De Poli, S. Canazza, A. Rodà, and E. Schubert, "The role of individual difference in judging expressiveness of computer-assisted music performances by experts," ACM Transactions on Applied Perception, vol. 11, no. 4, article 22, 2014.



5 Results

This section reports the results of some expressive music ren-derings obtainedwith CaRo 20 with the aim of showing howthe system works The beginning of the 3rd Movement of thePiano Sonata number 16 by L van Beethoven (see Figure 10)was chosen as a case study because it is characterised by alarge number of expressive nuances and at the same timeit is written with a quite simple musical texture (four voicesat most) which makes it easier to visualise and understandthe results Seven performances of the same BeethovenrsquosSonata were rendered five performances are characterisedby only one expressive intention (bright hard heavy lightand soft) and were obtained keeping the mouse fixed onthe corresponding adjective (see the GUI of Figure 11) oneperformance named hbls has been obtained drawing atrajectory on the two-dimensional control space going fromhard to bright light and soft (see the dotted line of Figure 11)finally a nominal performance was rendered applying noexpressive rule or transformation to the music score

The system rendered the different performances cor-rectly playing the sequences of notes written in the scoreMoreover expressive deviations were computed dependingon the userrsquos preferences specified by means of the con-trol space Figure 12 shows the piano roll representation ofthe nominal performance compared to the performances

10 Advances in Human-Computer Interaction

0 2 4 6 8 10 12 14 16 18 20 22Time (s)

0 2 4 6 8 10 12 14 16 18 20 22Time (s)

36414651566166717681

Pitc

h

020406080

100120

Velo

city

(a) Nominal

0 2 4 6 8 10 12 14 16 18 20 2236414651566166717681

Time (s)

0 2 4 6 8 10 12 14 16 18 20 22Time (s)

Pitc

h

020406080

100120

Velo

city

(b) Heavy

0 2 4 6 8 10 12 14 16 18 20 22Time (s)

0 2 4 6 8 10 12 14 16 18 20 22Time (s)

36414651566166717681

Pitc

h

020406080

100120

Velo

city

(c) Light

Figure 12 Piano roll representation of the performances characterized by heavy (b) and light (c) expressive intention The nominalperformance (a) has been added as a reference

characterised by the expressive intentions heavy and light Itcan be noted that the onset duration and Key-Velocity ofthe notes change in function of the expressive intention (Apiano-roll representation is a method of representing a musi-cal stimulus or score for later computational analyses It canbe conceived as a two-dimensional graph the vertical axis is adigital representation of different notes the horizontal axis isa continuous representation of time When a note is played ahorizontal line is drawn on the piano-roll representationTheheight of this line represents which note was being played thebeginning of the line represents the notersquos onset the lengthof the line represents the notersquos duration and the end ofthe line represents the notes offset [38]) The analysis of themain expressive parameters (see Figures 12 and 13) shows thatthese values match with the correspondent parameters of realperformances analyzed in [39] Moreover Figure 14 showshow the expressive parameters change gradually followingthe userrsquosmovements between different expressive intentions

6 Assessment

The most important public forum worldwide in whichcomputer systems for expressive music performance areassessed is the Performance RENdering CONtest (Ren-con) which was initiated in 2002 by Haruhiro Katayoseand colleagues as a competition among different systems(httprenconmusicorg) During the years the Rencon con-test evolved toward a more structured format It is anannual international competition in which entrants presentcomputer systems they have developed for generating expres-sive musical performances which audience members andorganisers judge One of the goals of Rencon is to solvethe researchersrsquo common problem difficulty in evaluat-inggrading performances generated by hisher computersystem within their individual endeavours by entrants meet-ing together at the same site [40] An assessment (fromthe musical performance expressiveness point of view) of

Advances in Human-Computer Interaction 11

Table 3 Results of the final stage of the Rencon 2011 contest Each system was required to render two different performances of the samemusic piece Each performance has been evaluated both by the audience attending the contest live and by people connected online througha streaming service

System Performance A Performance B Total scoreLive Online Total Live Online Total

CaRo 20 464 56 520 484 61 545 1065YQX 484 64 548 432 65 497 1045VirtualPhilharmony 452 46 498 428 32 460 958Director Musices 339 33 372 371 13 384 756Shunji System 277 24 301 277 20 297 598

0

20

40

60

80

100

120

1 5 9 13 17 21 25 29 33 37 41 45 49 53 57 61 65 69 73

Key

velo

city

Note number

BrHaHe

LiSo

Figure 13 The Key-Velocity values of the performances charac-terised by the different expressive intentions (bright hard heavylight and soft) Only the melodic line is reported in order to havea more readable graph

CaRo 20 was carried out participating in Rencon 2011 Oursystem was classified for the final stage where the partici-pating systems were presented during a dedicated workshopThe performances played were openly evaluated by a large(77 participants) and qualified audience of musicians andscientists under the same conditions in the manner of acompetition Each of the five entrant systems was requiredto generate two different performances of the same musicpiece in order to evaluate the adaptability of the system Afterlistening to each performance the listeners were asked to rateit from the viewpoint of howmuchdo you give applause to theperformance on a 10-point scale (1 = nothing 10 = ovation)The CaRo 20 system won that final stage (see Table 3) Theresults as well as the comments by the experts are analysedin [41] The performances played during Rencon 2011 can belistened to in httpsmcdeiunipditadvances hci

7 Conclusion

The CaRo 20 could be integrated in the browser engines ofthe music community services In this way (i) the databases

minus15

minus1

minus05

0

05

1

15

Nor

mal

ized

val

ues

logIOInlogL

logKV

1 5 9 13 17 21 25 29 33 37 41 45 49 53 57 61 65 69 73Note number

Hard Bright Light Soft

Figure 14The values of the threemain expressive parameters of theperformance obtained drawing the trajectory of Figure 14

keep track of the expressiveness of the songs played (ii) theusers createbrowse playlists based not only on the song titleor the artist name but also on the music expressiveness Thisallows that (i) in the rhythms games the performance will berated on the basis of the expressive intentions of the user withadvantages for the educational skill and the users involvementof the game (ii) in themusic community the user will be ableto search happy or sadmusic accordingly to the user affectivestate or preferences

In the rhythm games despite their big commercialsuccess (often bigger than the original music albums) thegameplay ismdashalmostmdashentirely oriented around the playerrsquosinteractions with a musical score or individual songs bymeans of pressing specific buttons or activating controls ona specialized game controller In these games the reaction ofthe virtual audience (ie the score rated) and the result ofthe battlecooperative mode are based on the performance ofthe player judged by the PC Up to now the game designersdid not consider the playerrsquos expressiveness As future workwe intend to insert the CaRo system in these games sothat the music performance is rated ldquoperfectrdquo not if it isonly played exactly like the score but considering also its

12 Advances in Human-Computer Interaction

interpretation its expressiveness In this way these games canbe used profitably also in the educational field

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

References

[1] N Bernardini andGDe Poli ldquoThe sound andmusic computingfield present and futurerdquo Journal of NewMusic Research vol 36no 3 pp 143ndash148 2007

[2] S Canazza G De Poli A Roda and A Vidolin ldquoAn abstractcontrol space for communication of sensory expressive inten-tions inmusic performancerdquo Journal ofNewMusic Research vol32 no 3 pp 281ndash294 2003

[3] S Canazza GDe Poli A Roda andAVidolin ldquoExpressivenessin music performance analysis models mapping encodingrdquoin Structuring Music through Markup Language Designs andArchitectures J Steyn Ed pp 156ndash186 IGI Global 2012

[4] S Canazza G De Poli C Drioli A Roda and A VidolinldquoAudiomorphing different expressive intentions formultimediasystemsrdquo IEEE Multimedia vol 7 no 3 pp 79ndash83 2000

[5] D Fabian R Timmers and E Schubert Eds Expressivenessin Music Performance Empirical Approaches Across Styles andCultures Oxford University Press Oxford UK 2014

[6] G De Poli ldquoMethodologies for expressiveness modelling of andfor music performancerdquo Journal of NewMusic Research vol 33no 3 pp 189ndash202 2004

[7] G Widmer and P Zanon ldquoAutomatic recognition of famousartists by machinerdquo in Proceedings of the 16th European Confer-ence on Artificial Intelligence (ECAI rsquo04) pp 1109ndash1110 ValenciaSpain 2004

[8] A Gabrielsson ldquoExpressive intention and performancerdquo inMusic and the Mind Machine R Steinberg Ed pp 35ndash47Springer Berlin Germany 1995

[9] P Juslin ldquoEmotional communication in music performance afunctionalist perspective and some datardquoMusic Perception vol14 no 4 pp 383ndash418 1997

[10] S Canazza G De Poli and A Vidolin ldquoPerceptual analysisof the musical expressive intention in a clarinet performancerdquoin Music Gestalt and Computing M Leman Ed pp 441ndash450Springer Berlin Germany 1997

[11] G De Poli A Roda and A Vidolin ldquoNoteminusbyminusnote analysis ofthe influence of expressive intentions and musical structure inviolin performancerdquo Journal of New Music Research vol 27 no3 pp 293ndash321 1998

[12] R Bresin and A Friberg ldquoEmotional coloring of computer-controlled music performancesrdquo Computer Music Journal vol24 no 4 pp 44ndash63 2000

[13] F B Baraldi G De Poli and A Roda ldquoCommunicatingexpressive intentions with a single piano noterdquo Journal of NewMusic Research vol 35 no 3 pp 197ndash210 2006

[14] S Hashimoto ldquoKANSEI as the third target of informationprocessing and related topics in Japanrdquo in Proceedings of theInternational Workshop on Kansei-Technology of Emotion pp101ndash104 1997

[15] S Hashimoto ldquoHumanoid robots for kansei communicationcomputer must have bodyrdquo inMachine Intelligence Quo Vadis

P Sincak J Vascak and K Hirota Eds pp 357ndash370 WorldScientific 2004

[16] R Picard Affective Computing MIT Press Cambridge MassUSA 1997

[17] S Canazza G De Poli C Drioli A Roda and A VidolinldquoModeling and control of expressiveness in music perfor-mancerdquo Proceedings of the IEEE vol 92 no 4 pp 686ndash7012004

[18] B H Repp ldquoDiversity and commonality in music perfor-mance an analysis of timing microstructure in SchumannrsquoslsquoTraumereirsquordquo The Journal of the Acoustical Society of Americavol 92 no 5 pp 2546ndash2568 1992

[19] B H Repp ldquoThe aesthetic quality of a quantitatively averagemusic performance two preliminary experimentsrdquo Music Per-ception vol 14 no 4 pp 419ndash444 1997

[20] L Mion G De Poli and E Rapana ldquoPerceptual organizationof affective and sensorial expressive intentions in music perfor-mancerdquo ACM Transactions on Applied Perception vol 7 no 2article 14 2010

[21] S Dixon and W Goebl ldquoThe rsquoair wormrsquo an interface forreal-time manipulation of expressive music performancerdquo inProceedings of the International Computer Music Conference(ICMC rsquo05) Barcelona Spain 2005

[22] A Roda S Canazza and G De Poli ldquoClustering affectivequalities of classical music beyond the valence-arousal planerdquoIEEE Transactions on Affective Computing vol 5 no 4 pp 364ndash376 2014

[23] P N Juslin and J A Sloboda Music and Emotion Theory andResearch Oxford University Press 2001

[24] A Camurri G De Poli M Leman and G Volpe ldquoCommu-nicating expressiveness and affect in multimodal interactivesystemsrdquo IEEE Multimedia vol 12 no 1 pp 43ndash53 2005

[25] A Kirke and E R Miranda ldquoA survey of computer systems forexpressive music performancerdquo ACM Computing Surveys vol42 no 1 article 3 pp 1ndash41 2009

[26] G Widmer S Flossmann and M Grachten ldquoYQX playsChopinrdquo AI Magazine vol 30 no 3 pp 35ndash48 2009

[27] G Widmer ldquoDiscovering simple rules in complex data a meta-learning algorithm and some surprising musical discoveriesrdquoArtificial Intelligence vol 146 no 2 pp 129ndash148 2003

[28] J Sundberg AAskenfelt and L Fryden ldquoMusical performancea synthesis-by-rule approachrdquo Computer Music Journal vol 7no 1 pp 37ndash43 1983

[29] I Ipolyi ldquoInnforing i musiksprakets opprinnelse och strukturrdquoTMHQPSR vol 48 no 1 pp 35ndash43 1952

[30] A Friberg R Bresin and J Sundberg ldquoOverview of the KTHrule system for musical performancerdquo Advances in CognitivePsychology Special Issue on Music Performance vol 2 no 2-3pp 145ndash161 2006

[31] S Tanaka M Hashida and H Katayose ldquoShunji a case-basedperformance rendering system attached importance to phraseexpressionrdquo in Proceedings of Sound and Music ComputingConference (SMC rsquo11) F Avanzini Ed pp 1ndash2 2011

[32] M Hamanaka K Hirata and S Tojo ldquoImplementing lsquoa genera-tive theory of tonal musicrsquordquo Journal of New Music Research vol35 no 4 pp 249ndash277 2006

[33] F Lerdahl and R Jackendoff A Generative Theory of TonalMusic MIT Press Cambridge Mass USA 1983

[34] T Baba M Hashida and H Katayose ldquoA conducting systemwith heuristics of the conductor lsquovirtualphilharmonyrsquordquo in Pro-ceedings of the New Interfaces for Musical Expression Conference(NIME rsquo10) pp 263ndash270 Sydney Australia June 2010

Advances in Human-Computer Interaction 13

[35] M Good ldquoMusicXML for notation and analysisrdquo inTheVirtualScore Representation Retrieval Restoration W B Hewlett andE SelfridgeField Eds pp 113ndash124 MIT Press CambridgeMass USA 2001

[36] I Peretz L Gagnon and B Bouchard ldquoMusic and emotionperceptual determinants immediacy and isolation after braindamagerdquo Cognition vol 68 no 2 pp 111ndash141 1998

[37] R Plutchik Emotion A Psychoevolutionary Synthesis Harper ampRow New York NY USA 1980

[38] D Temperley The Cognition of Basic Musical Structures MITPress Cambridge Mass USA 2001

[39] S Canazza G de Poli S Rinaldin and A Vidolin ldquoSonologicalanalysis of clarinet expressivityrdquo in Music Gestalt and Com-puting Studies in Cognitive and Systematic Musicology vol 1317of Lecture Notes in Computer Science pp 431ndash440 SpringerBerlin Germany 1997

[40] H Katayose M Hashida G de Poli and K Hirata ldquoOn evalu-ating systems for generating expressive music performance therencon experiencerdquo Journal of New Music Research vol 41 no4 pp 299ndash310 2012

[41] G de Poli S Canazza A Roda and E Schubert ldquoThe roleof individual difference in judging expressiveness of computerassisted music performances by expertsrdquo ACM Transactions onApplied Perception vol 11 no 4 article 22 2014

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 9: Research Article CaRo 2.0: An Interactive System for

Advances in Human-Computer Interaction 9

Piano

RondoAllegretto

5

cresc

9

cresc

13

Sonate No 163rd movementopus 31 No 1

(1770-1827)Ludwig van Beethoven

Figure 10 The music excerpt used to evaluate the CaRo 20 software The score has been transcribed in MusicXML format including all theexpressive indications (slurs articulation marks and dynamic cues)

Figure 11 The graphical user interface of CaRo 20 The dotted lineshows the trajectory drawn for rendering the hbls performance

expressive label and parameters can be moved and then thecontrol space can dynamically vary (see eg Figure 9)

More generally the system allows a creative design ofabstract spaces according to artistrsquos fantasies and needs byinventing new expressions and their spatial relations

5 Results

This section reports the results of some expressive music ren-derings obtainedwith CaRo 20 with the aim of showing howthe system works The beginning of the 3rd Movement of thePiano Sonata number 16 by L van Beethoven (see Figure 10)was chosen as a case study because it is characterised by alarge number of expressive nuances and at the same timeit is written with a quite simple musical texture (four voicesat most) which makes it easier to visualise and understandthe results Seven performances of the same BeethovenrsquosSonata were rendered five performances are characterisedby only one expressive intention (bright hard heavy lightand soft) and were obtained keeping the mouse fixed onthe corresponding adjective (see the GUI of Figure 11) oneperformance named hbls has been obtained drawing atrajectory on the two-dimensional control space going fromhard to bright light and soft (see the dotted line of Figure 11)finally a nominal performance was rendered applying noexpressive rule or transformation to the music score

The system rendered the different performances cor-rectly playing the sequences of notes written in the scoreMoreover expressive deviations were computed dependingon the userrsquos preferences specified by means of the con-trol space Figure 12 shows the piano roll representation ofthe nominal performance compared to the performances

10 Advances in Human-Computer Interaction

0 2 4 6 8 10 12 14 16 18 20 22Time (s)

0 2 4 6 8 10 12 14 16 18 20 22Time (s)

36414651566166717681

Pitc

h

020406080

100120

Velo

city

(a) Nominal

0 2 4 6 8 10 12 14 16 18 20 2236414651566166717681

Time (s)

0 2 4 6 8 10 12 14 16 18 20 22Time (s)

Pitc

h

020406080

100120

Velo

city

(b) Heavy

0 2 4 6 8 10 12 14 16 18 20 22Time (s)

0 2 4 6 8 10 12 14 16 18 20 22Time (s)

36414651566166717681

Pitc

h

020406080

100120

Velo

city

(c) Light

Figure 12 Piano roll representation of the performances characterized by heavy (b) and light (c) expressive intention The nominalperformance (a) has been added as a reference

characterised by the expressive intentions heavy and light Itcan be noted that the onset duration and Key-Velocity ofthe notes change in function of the expressive intention (Apiano-roll representation is a method of representing a musi-cal stimulus or score for later computational analyses It canbe conceived as a two-dimensional graph the vertical axis is adigital representation of different notes the horizontal axis isa continuous representation of time When a note is played ahorizontal line is drawn on the piano-roll representationTheheight of this line represents which note was being played thebeginning of the line represents the notersquos onset the lengthof the line represents the notersquos duration and the end ofthe line represents the notes offset [38]) The analysis of themain expressive parameters (see Figures 12 and 13) shows thatthese values match with the correspondent parameters of realperformances analyzed in [39] Moreover Figure 14 showshow the expressive parameters change gradually followingthe userrsquosmovements between different expressive intentions

6 Assessment

The most important public forum worldwide in whichcomputer systems for expressive music performance areassessed is the Performance RENdering CONtest (Ren-con) which was initiated in 2002 by Haruhiro Katayoseand colleagues as a competition among different systems(httprenconmusicorg) During the years the Rencon con-test evolved toward a more structured format It is anannual international competition in which entrants presentcomputer systems they have developed for generating expres-sive musical performances which audience members andorganisers judge One of the goals of Rencon is to solvethe researchersrsquo common problem difficulty in evaluat-inggrading performances generated by hisher computersystem within their individual endeavours by entrants meet-ing together at the same site [40] An assessment (fromthe musical performance expressiveness point of view) of

Advances in Human-Computer Interaction 11

Table 3 Results of the final stage of the Rencon 2011 contest Each system was required to render two different performances of the samemusic piece Each performance has been evaluated both by the audience attending the contest live and by people connected online througha streaming service

System Performance A Performance B Total scoreLive Online Total Live Online Total

CaRo 20 464 56 520 484 61 545 1065YQX 484 64 548 432 65 497 1045VirtualPhilharmony 452 46 498 428 32 460 958Director Musices 339 33 372 371 13 384 756Shunji System 277 24 301 277 20 297 598

0

20

40

60

80

100

120

1 5 9 13 17 21 25 29 33 37 41 45 49 53 57 61 65 69 73

Key

velo

city

Note number

BrHaHe

LiSo

Figure 13 The Key-Velocity values of the performances charac-terised by the different expressive intentions (bright hard heavylight and soft) Only the melodic line is reported in order to havea more readable graph

CaRo 20 was carried out participating in Rencon 2011 Oursystem was classified for the final stage where the partici-pating systems were presented during a dedicated workshopThe performances played were openly evaluated by a large(77 participants) and qualified audience of musicians andscientists under the same conditions in the manner of acompetition Each of the five entrant systems was requiredto generate two different performances of the same musicpiece in order to evaluate the adaptability of the system Afterlistening to each performance the listeners were asked to rateit from the viewpoint of howmuchdo you give applause to theperformance on a 10-point scale (1 = nothing 10 = ovation)The CaRo 20 system won that final stage (see Table 3) Theresults as well as the comments by the experts are analysedin [41] The performances played during Rencon 2011 can belistened to in httpsmcdeiunipditadvances hci

7 Conclusion

The CaRo 20 could be integrated in the browser engines ofthe music community services In this way (i) the databases

minus15

minus1

minus05

0

05

1

15

Nor

mal

ized

val

ues

logIOInlogL

logKV

1 5 9 13 17 21 25 29 33 37 41 45 49 53 57 61 65 69 73Note number

Hard Bright Light Soft

Figure 14The values of the threemain expressive parameters of theperformance obtained drawing the trajectory of Figure 14

keep track of the expressiveness of the songs played (ii) theusers createbrowse playlists based not only on the song titleor the artist name but also on the music expressiveness Thisallows that (i) in the rhythms games the performance will berated on the basis of the expressive intentions of the user withadvantages for the educational skill and the users involvementof the game (ii) in themusic community the user will be ableto search happy or sadmusic accordingly to the user affectivestate or preferences

In the rhythm games despite their big commercialsuccess (often bigger than the original music albums) thegameplay ismdashalmostmdashentirely oriented around the playerrsquosinteractions with a musical score or individual songs bymeans of pressing specific buttons or activating controls ona specialized game controller In these games the reaction ofthe virtual audience (ie the score rated) and the result ofthe battlecooperative mode are based on the performance ofthe player judged by the PC Up to now the game designersdid not consider the playerrsquos expressiveness As future workwe intend to insert the CaRo system in these games sothat the music performance is rated ldquoperfectrdquo not if it isonly played exactly like the score but considering also its

12 Advances in Human-Computer Interaction

interpretation its expressiveness In this way these games canbe used profitably also in the educational field

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

References

[1] N Bernardini andGDe Poli ldquoThe sound andmusic computingfield present and futurerdquo Journal of NewMusic Research vol 36no 3 pp 143ndash148 2007

[2] S Canazza G De Poli A Roda and A Vidolin ldquoAn abstractcontrol space for communication of sensory expressive inten-tions inmusic performancerdquo Journal ofNewMusic Research vol32 no 3 pp 281ndash294 2003

[3] S Canazza GDe Poli A Roda andAVidolin ldquoExpressivenessin music performance analysis models mapping encodingrdquoin Structuring Music through Markup Language Designs andArchitectures J Steyn Ed pp 156ndash186 IGI Global 2012

[4] S Canazza G De Poli C Drioli A Roda and A VidolinldquoAudiomorphing different expressive intentions formultimediasystemsrdquo IEEE Multimedia vol 7 no 3 pp 79ndash83 2000

[5] D Fabian R Timmers and E Schubert Eds Expressivenessin Music Performance Empirical Approaches Across Styles andCultures Oxford University Press Oxford UK 2014

[6] G De Poli ldquoMethodologies for expressiveness modelling of andfor music performancerdquo Journal of NewMusic Research vol 33no 3 pp 189ndash202 2004

[7] G Widmer and P Zanon ldquoAutomatic recognition of famousartists by machinerdquo in Proceedings of the 16th European Confer-ence on Artificial Intelligence (ECAI rsquo04) pp 1109ndash1110 ValenciaSpain 2004

[8] A Gabrielsson ldquoExpressive intention and performancerdquo inMusic and the Mind Machine R Steinberg Ed pp 35ndash47Springer Berlin Germany 1995

[9] P Juslin ldquoEmotional communication in music performance afunctionalist perspective and some datardquoMusic Perception vol14 no 4 pp 383ndash418 1997

[10] S Canazza G De Poli and A Vidolin ldquoPerceptual analysisof the musical expressive intention in a clarinet performancerdquoin Music Gestalt and Computing M Leman Ed pp 441ndash450Springer Berlin Germany 1997

[11] G De Poli A Roda and A Vidolin ldquoNoteminusbyminusnote analysis ofthe influence of expressive intentions and musical structure inviolin performancerdquo Journal of New Music Research vol 27 no3 pp 293ndash321 1998

[12] R Bresin and A Friberg ldquoEmotional coloring of computer-controlled music performancesrdquo Computer Music Journal vol24 no 4 pp 44ndash63 2000

[13] F B Baraldi G De Poli and A Roda ldquoCommunicatingexpressive intentions with a single piano noterdquo Journal of NewMusic Research vol 35 no 3 pp 197ndash210 2006

[14] S Hashimoto ldquoKANSEI as the third target of informationprocessing and related topics in Japanrdquo in Proceedings of theInternational Workshop on Kansei-Technology of Emotion pp101ndash104 1997

[15] S Hashimoto ldquoHumanoid robots for kansei communicationcomputer must have bodyrdquo inMachine Intelligence Quo Vadis

P Sincak J Vascak and K Hirota Eds pp 357ndash370 WorldScientific 2004

[16] R Picard Affective Computing MIT Press Cambridge MassUSA 1997

[17] S Canazza G De Poli C Drioli A Roda and A VidolinldquoModeling and control of expressiveness in music perfor-mancerdquo Proceedings of the IEEE vol 92 no 4 pp 686ndash7012004

[18] B H Repp ldquoDiversity and commonality in music perfor-mance an analysis of timing microstructure in SchumannrsquoslsquoTraumereirsquordquo The Journal of the Acoustical Society of Americavol 92 no 5 pp 2546ndash2568 1992

[19] B H Repp ldquoThe aesthetic quality of a quantitatively averagemusic performance two preliminary experimentsrdquo Music Per-ception vol 14 no 4 pp 419ndash444 1997

[20] L Mion G De Poli and E Rapana ldquoPerceptual organizationof affective and sensorial expressive intentions in music perfor-mancerdquo ACM Transactions on Applied Perception vol 7 no 2article 14 2010

[21] S Dixon and W Goebl ldquoThe rsquoair wormrsquo an interface forreal-time manipulation of expressive music performancerdquo inProceedings of the International Computer Music Conference(ICMC rsquo05) Barcelona Spain 2005

[22] A Roda S Canazza and G De Poli ldquoClustering affectivequalities of classical music beyond the valence-arousal planerdquoIEEE Transactions on Affective Computing vol 5 no 4 pp 364ndash376 2014

[23] P N Juslin and J A Sloboda Music and Emotion Theory andResearch Oxford University Press 2001

[24] A Camurri G De Poli M Leman and G Volpe ldquoCommu-nicating expressiveness and affect in multimodal interactivesystemsrdquo IEEE Multimedia vol 12 no 1 pp 43ndash53 2005

[25] A Kirke and E R Miranda ldquoA survey of computer systems forexpressive music performancerdquo ACM Computing Surveys vol42 no 1 article 3 pp 1ndash41 2009

[26] G Widmer S Flossmann and M Grachten ldquoYQX playsChopinrdquo AI Magazine vol 30 no 3 pp 35ndash48 2009

[27] G Widmer ldquoDiscovering simple rules in complex data a meta-learning algorithm and some surprising musical discoveriesrdquoArtificial Intelligence vol 146 no 2 pp 129ndash148 2003

[28] J Sundberg AAskenfelt and L Fryden ldquoMusical performancea synthesis-by-rule approachrdquo Computer Music Journal vol 7no 1 pp 37ndash43 1983

[29] I Ipolyi ldquoInnforing i musiksprakets opprinnelse och strukturrdquoTMHQPSR vol 48 no 1 pp 35ndash43 1952

[30] A Friberg R Bresin and J Sundberg ldquoOverview of the KTHrule system for musical performancerdquo Advances in CognitivePsychology Special Issue on Music Performance vol 2 no 2-3pp 145ndash161 2006

[31] S Tanaka M Hashida and H Katayose ldquoShunji a case-basedperformance rendering system attached importance to phraseexpressionrdquo in Proceedings of Sound and Music ComputingConference (SMC rsquo11) F Avanzini Ed pp 1ndash2 2011

[32] M Hamanaka K Hirata and S Tojo ldquoImplementing lsquoa genera-tive theory of tonal musicrsquordquo Journal of New Music Research vol35 no 4 pp 249ndash277 2006

[33] F Lerdahl and R Jackendoff A Generative Theory of TonalMusic MIT Press Cambridge Mass USA 1983

[34] T Baba M Hashida and H Katayose ldquoA conducting systemwith heuristics of the conductor lsquovirtualphilharmonyrsquordquo in Pro-ceedings of the New Interfaces for Musical Expression Conference(NIME rsquo10) pp 263ndash270 Sydney Australia June 2010

Advances in Human-Computer Interaction 13

[35] M Good ldquoMusicXML for notation and analysisrdquo inTheVirtualScore Representation Retrieval Restoration W B Hewlett andE SelfridgeField Eds pp 113ndash124 MIT Press CambridgeMass USA 2001

[36] I Peretz L Gagnon and B Bouchard ldquoMusic and emotionperceptual determinants immediacy and isolation after braindamagerdquo Cognition vol 68 no 2 pp 111ndash141 1998

[37] R Plutchik Emotion A Psychoevolutionary Synthesis Harper ampRow New York NY USA 1980

[38] D Temperley The Cognition of Basic Musical Structures MITPress Cambridge Mass USA 2001

[39] S Canazza G de Poli S Rinaldin and A Vidolin ldquoSonologicalanalysis of clarinet expressivityrdquo in Music Gestalt and Com-puting Studies in Cognitive and Systematic Musicology vol 1317of Lecture Notes in Computer Science pp 431ndash440 SpringerBerlin Germany 1997

[40] H Katayose M Hashida G de Poli and K Hirata ldquoOn evalu-ating systems for generating expressive music performance therencon experiencerdquo Journal of New Music Research vol 41 no4 pp 299ndash310 2012

[41] G de Poli S Canazza A Roda and E Schubert ldquoThe roleof individual difference in judging expressiveness of computerassisted music performances by expertsrdquo ACM Transactions onApplied Perception vol 11 no 4 article 22 2014

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 10: Research Article CaRo 2.0: An Interactive System for

10 Advances in Human-Computer Interaction

0 2 4 6 8 10 12 14 16 18 20 22Time (s)

0 2 4 6 8 10 12 14 16 18 20 22Time (s)

36414651566166717681

Pitc

h

020406080

100120

Velo

city

(a) Nominal

0 2 4 6 8 10 12 14 16 18 20 2236414651566166717681

Time (s)

0 2 4 6 8 10 12 14 16 18 20 22Time (s)

Pitc

h

020406080

100120

Velo

city

(b) Heavy

0 2 4 6 8 10 12 14 16 18 20 22Time (s)

0 2 4 6 8 10 12 14 16 18 20 22Time (s)

36414651566166717681

Pitc

h

020406080

100120

Velo

city

(c) Light

Figure 12 Piano roll representation of the performances characterized by heavy (b) and light (c) expressive intention The nominalperformance (a) has been added as a reference

characterised by the expressive intentions heavy and light Itcan be noted that the onset duration and Key-Velocity ofthe notes change in function of the expressive intention (Apiano-roll representation is a method of representing a musi-cal stimulus or score for later computational analyses It canbe conceived as a two-dimensional graph the vertical axis is adigital representation of different notes the horizontal axis isa continuous representation of time When a note is played ahorizontal line is drawn on the piano-roll representationTheheight of this line represents which note was being played thebeginning of the line represents the notersquos onset the lengthof the line represents the notersquos duration and the end ofthe line represents the notes offset [38]) The analysis of themain expressive parameters (see Figures 12 and 13) shows thatthese values match with the correspondent parameters of realperformances analyzed in [39] Moreover Figure 14 showshow the expressive parameters change gradually followingthe userrsquosmovements between different expressive intentions

6 Assessment

The most important public forum worldwide in whichcomputer systems for expressive music performance areassessed is the Performance RENdering CONtest (Ren-con) which was initiated in 2002 by Haruhiro Katayoseand colleagues as a competition among different systems(httprenconmusicorg) During the years the Rencon con-test evolved toward a more structured format It is anannual international competition in which entrants presentcomputer systems they have developed for generating expres-sive musical performances which audience members andorganisers judge One of the goals of Rencon is to solvethe researchersrsquo common problem difficulty in evaluat-inggrading performances generated by hisher computersystem within their individual endeavours by entrants meet-ing together at the same site [40] An assessment (fromthe musical performance expressiveness point of view) of

Advances in Human-Computer Interaction 11

Table 3 Results of the final stage of the Rencon 2011 contest Each system was required to render two different performances of the samemusic piece Each performance has been evaluated both by the audience attending the contest live and by people connected online througha streaming service

System Performance A Performance B Total scoreLive Online Total Live Online Total

CaRo 20 464 56 520 484 61 545 1065YQX 484 64 548 432 65 497 1045VirtualPhilharmony 452 46 498 428 32 460 958Director Musices 339 33 372 371 13 384 756Shunji System 277 24 301 277 20 297 598

0

20

40

60

80

100

120

1 5 9 13 17 21 25 29 33 37 41 45 49 53 57 61 65 69 73

Key

velo

city

Note number

BrHaHe

LiSo

Figure 13 The Key-Velocity values of the performances charac-terised by the different expressive intentions (bright hard heavylight and soft) Only the melodic line is reported in order to havea more readable graph

CaRo 20 was carried out participating in Rencon 2011 Oursystem was classified for the final stage where the partici-pating systems were presented during a dedicated workshopThe performances played were openly evaluated by a large(77 participants) and qualified audience of musicians andscientists under the same conditions in the manner of acompetition Each of the five entrant systems was requiredto generate two different performances of the same musicpiece in order to evaluate the adaptability of the system Afterlistening to each performance the listeners were asked to rateit from the viewpoint of howmuchdo you give applause to theperformance on a 10-point scale (1 = nothing 10 = ovation)The CaRo 20 system won that final stage (see Table 3) Theresults as well as the comments by the experts are analysedin [41] The performances played during Rencon 2011 can belistened to in httpsmcdeiunipditadvances hci

7. Conclusion

The CaRo 2.0 system could be integrated in the browser engines of music community services.

[Figure 14: normalised values (from −1.5 to 1.5) of the parameters log IOI, log L, and log KV plotted against note number (1–73); regions labelled Hard, Bright, Light, Soft.]

Figure 14: The values of the three main expressive parameters of the performance obtained by drawing a trajectory across the control space.

In this way, (i) the databases keep track of the expressiveness of the songs played and (ii) the users can create/browse playlists based not only on the song title or the artist name but also on the music expressiveness. As a consequence, (i) in rhythm games the performance will be rated on the basis of the expressive intentions of the user, with advantages for the educational value and the user's involvement in the game; (ii) in the music community, the user will be able to search for happy or sad music according to his/her affective state or preferences.
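A minimal sketch of such an expressiveness-aware search is given below; the catalogue structure and the tags are hypothetical and serve only to illustrate filtering a music database by expressive character rather than by title or artist.

```python
# Hypothetical catalogue in which each song stores the expressive intentions
# it has been rendered or played with.
catalogue = [
    {"title": "Song A", "artist": "X", "expressiveness": {"happy", "bright"}},
    {"title": "Song B", "artist": "Y", "expressiveness": {"sad", "soft"}},
    {"title": "Song C", "artist": "X", "expressiveness": {"heavy", "dark"}},
]

def search_by_expressiveness(tag):
    """Return songs whose stored expressive character matches the requested tag."""
    return [song for song in catalogue if tag in song["expressiveness"]]

print(search_by_expressiveness("sad"))  # e.g. music matching a sad affective state
```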

In rhythm games, despite their great commercial success (often greater than that of the original music albums), the gameplay is almost entirely oriented around the player's interactions with a musical score or individual songs, by means of pressing specific buttons or activating controls on a specialized game controller. In these games, the reaction of the virtual audience (i.e., the score awarded) and the result of the battle/cooperative mode are based on the performance of the player as judged by the computer. Up to now, game designers have not considered the player's expressiveness. As future work, we intend to insert the CaRo system into these games, so that the music performance is rated "perfect" not only if it is played exactly like the score, but considering also its


interpretation, its expressiveness. In this way, these games can also be used profitably in the educational field.
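A possible way of rating a performance on its expressiveness rather than on literal note accuracy is sketched below; the target profiles, parameters, and scoring rule are hypothetical and are not taken from the CaRo system, but they illustrate the idea of comparing the player's expressive deviations with a target expressive intention.

```python
# Hypothetical expressive profiles: mean tempo deviation, articulation
# (legato ratio), and loudness (normalised key velocity).
TARGETS = {
    "bright": {"tempo_dev": 0.05,  "legato": 0.70, "loudness": 0.80},
    "soft":   {"tempo_dev": -0.05, "legato": 0.95, "loudness": 0.40},
}

def expressive_score(player, intention):
    """Score in [0, 10]: the closer the player's profile is to the target, the higher."""
    target = TARGETS[intention]
    distance = sum(abs(player[k] - target[k]) for k in target)
    return max(0.0, 10.0 - 10.0 * distance)

player_profile = {"tempo_dev": 0.03, "legato": 0.75, "loudness": 0.78}
print(expressive_score(player_profile, "bright"))  # high score: expressively close
print(expressive_score(player_profile, "soft"))    # lower score: wrong character
```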

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] N. Bernardini and G. De Poli, "The sound and music computing field: present and future," Journal of New Music Research, vol. 36, no. 3, pp. 143–148, 2007.
[2] S. Canazza, G. De Poli, A. Rodà, and A. Vidolin, "An abstract control space for communication of sensory expressive intentions in music performance," Journal of New Music Research, vol. 32, no. 3, pp. 281–294, 2003.
[3] S. Canazza, G. De Poli, A. Rodà, and A. Vidolin, "Expressiveness in music performance: analysis, models, mapping, encoding," in Structuring Music through Markup Language: Designs and Architectures, J. Steyn, Ed., pp. 156–186, IGI Global, 2012.
[4] S. Canazza, G. De Poli, C. Drioli, A. Rodà, and A. Vidolin, "Audio morphing different expressive intentions for multimedia systems," IEEE Multimedia, vol. 7, no. 3, pp. 79–83, 2000.
[5] D. Fabian, R. Timmers, and E. Schubert, Eds., Expressiveness in Music Performance: Empirical Approaches Across Styles and Cultures, Oxford University Press, Oxford, UK, 2014.
[6] G. De Poli, "Methodologies for expressiveness modelling of and for music performance," Journal of New Music Research, vol. 33, no. 3, pp. 189–202, 2004.
[7] G. Widmer and P. Zanon, "Automatic recognition of famous artists by machine," in Proceedings of the 16th European Conference on Artificial Intelligence (ECAI '04), pp. 1109–1110, Valencia, Spain, 2004.
[8] A. Gabrielsson, "Expressive intention and performance," in Music and the Mind Machine, R. Steinberg, Ed., pp. 35–47, Springer, Berlin, Germany, 1995.
[9] P. Juslin, "Emotional communication in music performance: a functionalist perspective and some data," Music Perception, vol. 14, no. 4, pp. 383–418, 1997.
[10] S. Canazza, G. De Poli, and A. Vidolin, "Perceptual analysis of the musical expressive intention in a clarinet performance," in Music, Gestalt, and Computing, M. Leman, Ed., pp. 441–450, Springer, Berlin, Germany, 1997.
[11] G. De Poli, A. Rodà, and A. Vidolin, "Note-by-note analysis of the influence of expressive intentions and musical structure in violin performance," Journal of New Music Research, vol. 27, no. 3, pp. 293–321, 1998.
[12] R. Bresin and A. Friberg, "Emotional coloring of computer-controlled music performances," Computer Music Journal, vol. 24, no. 4, pp. 44–63, 2000.
[13] F. B. Baraldi, G. De Poli, and A. Rodà, "Communicating expressive intentions with a single piano note," Journal of New Music Research, vol. 35, no. 3, pp. 197–210, 2006.
[14] S. Hashimoto, "KANSEI as the third target of information processing and related topics in Japan," in Proceedings of the International Workshop on Kansei-Technology of Emotion, pp. 101–104, 1997.
[15] S. Hashimoto, "Humanoid robots for kansei communication: computer must have body," in Machine Intelligence: Quo Vadis?, P. Sincak, J. Vascak, and K. Hirota, Eds., pp. 357–370, World Scientific, 2004.
[16] R. Picard, Affective Computing, MIT Press, Cambridge, Mass, USA, 1997.
[17] S. Canazza, G. De Poli, C. Drioli, A. Rodà, and A. Vidolin, "Modeling and control of expressiveness in music performance," Proceedings of the IEEE, vol. 92, no. 4, pp. 686–701, 2004.
[18] B. H. Repp, "Diversity and commonality in music performance: an analysis of timing microstructure in Schumann's 'Träumerei'," The Journal of the Acoustical Society of America, vol. 92, no. 5, pp. 2546–2568, 1992.
[19] B. H. Repp, "The aesthetic quality of a quantitatively average music performance: two preliminary experiments," Music Perception, vol. 14, no. 4, pp. 419–444, 1997.
[20] L. Mion, G. De Poli, and E. Rapanà, "Perceptual organization of affective and sensorial expressive intentions in music performance," ACM Transactions on Applied Perception, vol. 7, no. 2, article 14, 2010.
[21] S. Dixon and W. Goebl, "The 'air worm': an interface for real-time manipulation of expressive music performance," in Proceedings of the International Computer Music Conference (ICMC '05), Barcelona, Spain, 2005.
[22] A. Rodà, S. Canazza, and G. De Poli, "Clustering affective qualities of classical music: beyond the valence-arousal plane," IEEE Transactions on Affective Computing, vol. 5, no. 4, pp. 364–376, 2014.
[23] P. N. Juslin and J. A. Sloboda, Music and Emotion: Theory and Research, Oxford University Press, 2001.
[24] A. Camurri, G. De Poli, M. Leman, and G. Volpe, "Communicating expressiveness and affect in multimodal interactive systems," IEEE Multimedia, vol. 12, no. 1, pp. 43–53, 2005.
[25] A. Kirke and E. R. Miranda, "A survey of computer systems for expressive music performance," ACM Computing Surveys, vol. 42, no. 1, article 3, pp. 1–41, 2009.
[26] G. Widmer, S. Flossmann, and M. Grachten, "YQX plays Chopin," AI Magazine, vol. 30, no. 3, pp. 35–48, 2009.
[27] G. Widmer, "Discovering simple rules in complex data: a meta-learning algorithm and some surprising musical discoveries," Artificial Intelligence, vol. 146, no. 2, pp. 129–148, 2003.
[28] J. Sundberg, A. Askenfelt, and L. Frydén, "Musical performance: a synthesis-by-rule approach," Computer Music Journal, vol. 7, no. 1, pp. 37–43, 1983.
[29] I. Ipolyi, "Innforing i musiksprakets opprinnelse och struktur," TMH-QPSR, vol. 48, no. 1, pp. 35–43, 1952.
[30] A. Friberg, R. Bresin, and J. Sundberg, "Overview of the KTH rule system for musical performance," Advances in Cognitive Psychology, Special Issue on Music Performance, vol. 2, no. 2-3, pp. 145–161, 2006.
[31] S. Tanaka, M. Hashida, and H. Katayose, "Shunji: a case-based performance rendering system attached importance to phrase expression," in Proceedings of the Sound and Music Computing Conference (SMC '11), F. Avanzini, Ed., pp. 1–2, 2011.
[32] M. Hamanaka, K. Hirata, and S. Tojo, "Implementing 'a generative theory of tonal music'," Journal of New Music Research, vol. 35, no. 4, pp. 249–277, 2006.
[33] F. Lerdahl and R. Jackendoff, A Generative Theory of Tonal Music, MIT Press, Cambridge, Mass, USA, 1983.
[34] T. Baba, M. Hashida, and H. Katayose, "A conducting system with heuristics of the conductor 'VirtualPhilharmony'," in Proceedings of the New Interfaces for Musical Expression Conference (NIME '10), pp. 263–270, Sydney, Australia, June 2010.
[35] M. Good, "MusicXML for notation and analysis," in The Virtual Score: Representation, Retrieval, Restoration, W. B. Hewlett and E. Selfridge-Field, Eds., pp. 113–124, MIT Press, Cambridge, Mass, USA, 2001.
[36] I. Peretz, L. Gagnon, and B. Bouchard, "Music and emotion: perceptual determinants, immediacy, and isolation after brain damage," Cognition, vol. 68, no. 2, pp. 111–141, 1998.
[37] R. Plutchik, Emotion: A Psychoevolutionary Synthesis, Harper & Row, New York, NY, USA, 1980.
[38] D. Temperley, The Cognition of Basic Musical Structures, MIT Press, Cambridge, Mass, USA, 2001.
[39] S. Canazza, G. De Poli, S. Rinaldin, and A. Vidolin, "Sonological analysis of clarinet expressivity," in Music, Gestalt, and Computing: Studies in Cognitive and Systematic Musicology, vol. 1317 of Lecture Notes in Computer Science, pp. 431–440, Springer, Berlin, Germany, 1997.
[40] H. Katayose, M. Hashida, G. De Poli, and K. Hirata, "On evaluating systems for generating expressive music performance: the Rencon experience," Journal of New Music Research, vol. 41, no. 4, pp. 299–310, 2012.
[41] G. De Poli, S. Canazza, A. Rodà, and E. Schubert, "The role of individual difference in judging expressiveness of computer assisted music performances by experts," ACM Transactions on Applied Perception, vol. 11, no. 4, article 22, 2014.

