

Mickey the Music Stand: A Novel Approach to Instructional Feedback in a Recreational Performance Setting

Yasmin Halwani
University of British Columbia
Electrical and Computer Engineering
Vancouver, BC, Canada
[email protected]

Samantha Sterling
University of British Columbia
Biomedical Engineering
Vancouver, BC, Canada
[email protected]

ABSTRACT
Music comprises a major component of human culture, and with the advent of technology, both musical instruction and musical performance are becoming more and more accessible to the general population. To that end, we present Mickey, a robotic music stand designed to enhance the musical performance experience by providing real-time expressive feedback based on a singer's pitch and rhythm accuracy for a chosen song. We study the interaction of human users with Mickey, taking note of how well Mickey's expressions are understood and how useful the users believe Mickey is as a feedback tool. Results show that users of the Mickey system have an increased sense of enjoyment while singing; results also indicate that even in the case of more complicated songs, users will maintain focus on Mickey because of his perceived helpfulness.

Keywords
Humanized feedback; Music Evaluation; Human-Robot Interaction

1. INTRODUCTION
Music has fascinated the human race for centuries. Almost everyone is able to listen to music through some venue or another, but musicians, those people responsible for creating music, have historically been a select few. Private lessons, whether through a school or through the local community, are costly and rigidly scheduled. The advent of games such as Rock Band and Guitar Hero has allowed a much larger population the opportunity to experience the feeling of making music in the privacy and comfort of their own homes, but such games translate poorly to real-world instruments.

The goal of this paper is to explore the utility of a humanized music stand in evaluating and providing abstract, real-time feedback designed to enhance the performance experience of the user. To that effect, this paper aims to answer the following research questions:

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists requires specific permission and/or a fee. HIT2016, Vancouver, BC, Canada. © UBC 2016.

HIT '16, April 19, 2016, Vancouver, BC, Canada
© 2016 ACM. ISBN xxx-xxxx-xx-xxx/xx/xx. . . $15.00

DOI: xx.xxx/xxx x

Figure 1: Mickey: an expressive robot for vocal performance evaluation

1. Can users correctly interpret the robotic expressions the music stand is giving them as feedback?

2. Can the robotic feedback be given in such a way that users can still maintain primary focus on the music and lyrics?

3. Does the user see the feedback from the music stand as helpful in terms of maintaining accuracy in pitch and rhythm?

2. BACKGROUND AND RELATED WORK
For those who would like to learn about music but are unwilling or unable to get a dedicated tutor, a variety of autonomous teaching and feedback programs exist. These programs fall within two main spheres: those that evaluate pitch, fingering, hand positions, and other technical aspects of music performance and provide traditional feedback in the form of numbers and charts, and those that employ more novel forms of feedback.

2.1 Musical Technique Evaluation
Attempts to make music instruction more accessible can be seen in a wide variety of research projects. Examples range from the piano instruction software created by Gorodnichy and Yogeswaran [3], which transmits a player's hand


positions to a remote human instructor, to an autonomous music stand from Grosshauser and Hermann [5], which uses an accelerometer and a gyroscope to monitor the position and angle of a violinist's bow, to the Piano Tutor software from [1], which tailors a series of lessons based on the musician's skill level. However, such systems generally require a solid foundation in music theory, lack autonomy, or provide only traditional forms of feedback.

2.2 Abstract Music Interpretation
Systems that explore more novel forms of feedback include those that push the boundaries of musical performance. A fair amount of research has been done on ways to augment a musical performance with a visual element. The researchers in [9] develop a method to map music to a chromatic (color) index, while [2] takes it a step further by designing a system which allows a user to create music and kaleidoscope-like imagery in a virtual environment. The performer plays virtual drums, their movements captured by a video camera and position sensors. Each drum pad is mapped to a theoretically correct musical note (according to an analysis based on improvisational jazz theory), and the projected colors change based on the relationship between the background jazz music and the notes being played via the virtual drums.

Both of the previous examples have fairly abstract visual elements, but investigators have also created virtual dancers whose movements are controlled or informed by the music being played. Researchers in [4] created "Cindy," a virtual figure whose movements are controlled by two human players working collaboratively. Similarly, Andante, created by Xiao et al. [10], is a graphical representation of music as a visual projection of an animated character that walks on the keys that are being played. It is useful to note that the vast majority of research has incorporated the visual element virtually, while there is a lack of investigation into bringing the visual element into the solid physical realm.

2.3 Expressive Interactive Robotics
Socially interactive robots can provide physical manifestations of visual feedback on musical expression. Anthropomorphizing an existing object allows a user to interact with a familiar item in an intuitive fashion, as investigated by Osawa and Mukai [8]. The robot created in [6] displays two different kinds of emotional behavior, fear and happiness, while the researchers evaluate whether human volunteers can identify both in the robotic agent. Similarly, the work done in [7] focuses on the robotic agent providing emotional feedback based on the user's behavior. A team of researchers from Stanford created a set of robotic drawers designed to work collaboratively with a human in order to complete an assembly task. The drawers' interaction works along the dimensions of proactivity and expressivity of emotions, as they are designed to perform animations in reaction to human behavior. These investigations of user-interpreted emotional feedback provide the foundation for this paper.

From a review of current literature, it is apparent that quite a bit of research exists on ways to quantitatively evaluate musical performance for use as a rehearsal tool, as well as ways to synthesize a visual representation of the music being played in order to enhance the performance experience. However, the bulk of this feedback consists of graphs and numbers, generally not intuitive to the user. In addition, the visual elements tend to exist primarily in the

Figure 2: An initial attempt at exploring the possible movements of a music stand

virtual realm (on the screen, for example). To close this gap, the investigators have created a tool that not only gives the user intuitive, real-time feedback on the accuracy of their performance, allowing for immediate user reaction, but also draws on the field of expressive robotics to create a method of communication that exists with the user in the physical space.

3. SYSTEM OVERVIEW
"Mickey", as shown in figure 1, is an anthropomorphized music stand created to assist in the investigation of the research questions. He evaluates a singer's accuracy in terms of pitch and rhythm and provides intuitive, expressive feedback on whether the singer is doing well or poorly. Mickey is designed to assist a singer who is at least somewhat familiar with the concepts of pitch and rhythm and who would like some additional, instructional feedback in a recreational performance setting.

3.1 Design Approach
Two preliminary questionnaires were created and distributed in order to inform the design of an expressive and understandable robot. The first questionnaire assessed the general musical background of our potential users (generally young adults and UBC students) as well as what features they would like to see in a potential music instruction program. Eighteen volunteers, most with extensive musical backgrounds, completed this initial survey, which indicated that users as a whole were more comfortable learning and performing a song on their own, without an instructor in the room. However, responses almost unanimously indicated a preference for receiving actual feedback from a human instructor as opposed to a computer program, prompting the consideration of a humanized design with recognizable features (such as arms and eyebrows).

A second questionnaire featured video recordings of movements created with a low-fidelity prototype, designed this time to evaluate whether certain movements conveyed specific expressions better than others and, if so, which expressions


were associated with which movements. Specifically, "arms" and "eyebrows" were constructed out of yellow paper and affixed to a music stand; survey takers were then asked what expressions they associated with the recorded movements, as seen in figure 2. The ten volunteers who completed this second survey agreed that the addition of eyebrows to the music stand increased the stand's range of expressions. However, the addition of arm movements only served to indicate a request for attention, no matter which movement was shown; the arms made no contribution in terms of clarifying expressions or feedback. While arms were initially included in the robotic design, results from this survey indicated that they would not serve any meaningful purpose. This led to a design informed by the survey responses, focused primarily on the expressivity of eyebrows. To expand Mickey's repertoire of expressions, the decision was made to include an additional facial feature: a mouth. The final design explores the interplay between these two expressive elements.

3.2 System Design

3.2.1 System Interaction
As explained in earlier sections, the system is designed to detect pitch and rhythm and provide feedback to the user in real time, so the user can adjust their performance throughout the session. The system keeps a list of tunes with predefined pitches and rhythms against which to compare the user's performance. Once the user selects one of these tunes and starts singing, the system continuously compares each pitch and rhythm performed by the user against the corresponding reference pitches and rhythms programmed in the system.

Pitch and Rhythm Evaluation. To accommodate a wide range of vocal capabilities, the system does not compare the pitch performed by the user to the absolute pitch programmed in the system. Instead, the system allows for relative note comparison as long as the notes remain internally consistent. For example, if a user accidentally changes keys when singing the second verse, that verse will be evaluated in the context of the new key.
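The paper does not spell out the comparison algorithm; one way to realize this kind of key-tolerant matching is to compare successive intervals (in semitones) rather than absolute pitches. The following Python sketch is our illustration, not the authors' implementation; the function names and the half-semitone tolerance are assumptions:

```python
import math

def semitones(freq_hz, ref_hz=440.0):
    """Frequency -> (fractional) semitone offset from A4 (440 Hz)."""
    return 12 * math.log2(freq_hz / ref_hz)

def compare_relative(sung_hz, reference_hz, tolerance=0.5):
    """Compare successive intervals instead of absolute pitches, so a
    transposed (or accidentally key-shifted) performance still passes
    as long as the notes stay internally consistent."""
    sung_iv = [semitones(b) - semitones(a) for a, b in zip(sung_hz, sung_hz[1:])]
    ref_iv = [semitones(b) - semitones(a) for a, b in zip(reference_hz, reference_hz[1:])]
    return [abs(s - r) <= tolerance for s, r in zip(sung_iv, ref_iv)]

# A performance transposed two semitones down (C major -> B-flat major)
# still matches the C-major reference, interval for interval.
reference = [261.63, 261.63, 392.00, 392.00]   # C4 C4 G4 G4
transposed = [233.08, 233.08, 349.23, 349.23]  # Bb3 Bb3 F4 F4
```

Because only interval differences are scored, a verse sung consistently in the "wrong" key is still judged correct, matching the behavior described above.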

In music, rhythm is the music's pattern in time; programmatically speaking, it is the time period between one frequency (note) and the next. The system compares the time separation between the notes performed by the user with the reference time period between the notes programmed in the system, and provides feedback accordingly.
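The rhythm comparison described above can be illustrated by comparing inter-onset intervals directly (a hypothetical Python sketch; the `tolerance_s` parameter is our assumption):

```python
def compare_rhythm(sung_onsets, ref_onsets, tolerance_s=0.1):
    """Compare inter-onset intervals (the time in seconds between one
    note's start and the next) against the reference rhythm pattern."""
    sung_ioi = [b - a for a, b in zip(sung_onsets, sung_onsets[1:])]
    ref_ioi = [b - a for a, b in zip(ref_onsets, ref_onsets[1:])]
    return [abs(s - r) <= tolerance_s for s, r in zip(sung_ioi, ref_ioi)]

# Four quarter notes at 60 BPM, sung slightly late on the later onsets.
reference_onsets = [0.0, 1.0, 2.0, 3.0]
sung_onsets = [0.0, 1.02, 2.05, 3.01]
```

Small timing deviations pass, while halving or doubling a note's duration fails the corresponding interval.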

Scoring. A simple set of scoring criteria enables real-time evaluation of vocal performance based on cumulative scoring in a predictable and understandable manner. The system listens to the user and adjusts the score after every measure. For each measure, the system records the sequence of notes hit (a melody line consisting of a sequence of pitches) and the time separation between the notes (rhythm); this is then compared with the corresponding reference measure. If both pitch and rhythm are correct for a measure, the system increases the score by 1; if the user makes at least one mistake in pitch, rhythm, or both, the system decreases the score by 1. Scores are clamped to the range -3 to 3. The score always starts at zero, then changes after each measure based on the user's performance.
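The scoring rule above can be sketched directly (a Python sketch of the stated rule; the function name is ours):

```python
def update_score(score, pitch_ok, rhythm_ok):
    """Per-measure cumulative scoring as described: +1 when both pitch
    and rhythm are correct, -1 otherwise, clamped to [-3, 3]."""
    delta = 1 if (pitch_ok and rhythm_ok) else -1
    return max(-3, min(3, score + delta))

# Three correct measures, then one with a pitch mistake: 0 -> 1 -> 2 -> 3 -> 2.
score = 0
for pitch_ok, rhythm_ok in [(True, True), (True, True), (True, True), (False, True)]:
    score = update_score(score, pitch_ok, rhythm_ok)
```

The clamp means a long run of good (or bad) measures saturates quickly, so recent performance always remains visible in Mickey's reaction.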

Figure 3: The range of facial expressions Mickey is capable of: (a) neutral, (b) happy, (c) very happy, (d) sad, (e) upset, (f) angry

Feedback and Expressions Mapping. A different facial expression is linked to each score, gradually changing from upset to happy. Figure 3 shows 6 of Mickey's 7 facial expressions: angry, upset, sad, neutral, happy, very happy, and excited. The excited expression is simply a twitch in Mickey's smile while maintaining the same eyebrow position as very happy. After each measure performed by the user, the system increases or decreases the overall score, triggering a change in facial expression that tells the user whether they are doing better or worse.
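With seven score values and seven expressions, the mapping might look like the following sketch. The exact pairing of scores to expressions is our assumption; the paper states only that each score has its own expression on a gradient from negative to positive:

```python
# Assumed score-to-expression assignment (one expression per score, -3..3).
EXPRESSIONS = {
    -3: "angry", -2: "upset", -1: "sad", 0: "neutral",
    1: "happy", 2: "very happy", 3: "excited",
}

def expression_for(score):
    """Return the facial expression for a (clamped) cumulative score."""
    return EXPRESSIONS[max(-3, min(3, score))]
```

Clamping inside the lookup keeps the mapping total even if a caller passes an out-of-range score.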

3.2.2 System Architecture
The system is composed of the following basic blocks, as shown in figure 4:

1. Input: The sound signal received by the system through the microphone and passed to the processing unit.

2. Processing: The unit performing the signal processing required for pitch and rhythm evaluation and subsequent score calculation.

3. Actuation: A secondary processing unit that translates the score into physical movement by the servo motors.

4. Output: A total of 3 servo motors attached to, and responsible for the rotation of, the eyebrows and mouth of the robotic music stand.
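The dataflow through these four blocks can be sketched as a single per-measure loop body (hypothetical Python; the `detect`, `evaluate`, and `actuate` callables stand in for the processing and actuation units, which are not specified in code in the paper):

```python
def run_measure(audio_frame, reference, score, detect, evaluate, actuate):
    """One pass through the four blocks for a single measure:
    input (audio_frame) -> processing (detect + evaluate + score update)
    -> actuation (actuate drives the servos)."""
    notes = detect(audio_frame)                   # processing: pitch/rhythm detection
    pitch_ok, rhythm_ok = evaluate(notes, reference)
    delta = 1 if (pitch_ok and rhythm_ok) else -1
    score = max(-3, min(3, score + delta))        # cumulative clamped score
    actuate(score)                                # actuation: secondary unit, servos
    return score

# Stub components illustrating the dataflow only.
moves = []
score = run_measure(
    audio_frame=[0.0] * 16,
    reference=None,
    score=0,
    detect=lambda frame: frame,
    evaluate=lambda notes, ref: (True, True),
    actuate=moves.append,
)
```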

3.3 Implementation


Figure 4: A block diagram of the main system components

Figure 5: The sheet music for the first line of "Twinkle Twinkle Little Star"

3.3.1 Pitch and Rhythm Detection
A simple prototype was built as a proof of concept for pitch and rhythm detection, which evaluates the first line of "Twinkle Twinkle Little Star". This line was selected for its relative simplicity, as it contains only 4 measures, 6 distinct notes (C, G, A, F, E, D), and notes of two different durations (quarter notes and half notes), as shown in figure 5.

Given that each music note has a unique frequency associated with it, the program simply has to detect the sound frequencies delivered through the input channel and the duration of the time periods between those frequencies. In terms of signal processing, edge detection serves both to identify the distinct notes and to record the time period between the distinct musical edges, which is important when evaluating rhythm. The system then performs a basic Fourier transform between every two edges in order to detect the frequencies. If at least one of the detected frequencies matches the desired frequency, the system counts it as a pass.
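The detection step between two edges can be illustrated with a naive discrete Fourier transform (a dependency-free Python sketch, not the prototype's code; the 10 Hz tolerance is our assumption):

```python
import math

def dominant_frequency(samples, sample_rate):
    """Estimate the dominant frequency of a window from the magnitude
    spectrum of a naive O(n^2) DFT (no external libraries)."""
    n = len(samples)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2):  # skip the DC bin
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        mag = re * re + im * im
        if mag > best_mag:
            best_bin, best_mag = k, mag
    return best_bin * sample_rate / n  # bin index -> frequency in Hz

def note_passes(samples, sample_rate, target_hz, tolerance_hz=10.0):
    """Count the window as a pass if its dominant frequency is within
    tolerance of the reference note's frequency."""
    return abs(dominant_frequency(samples, sample_rate) - target_hz) <= tolerance_hz

# 0.1 s of a 440 Hz (A4) sine sampled at 4 kHz.
sr = 4000
tone = [math.sin(2 * math.pi * 440 * i / sr) for i in range(400)]
```

A real implementation would use an FFT; the naive DFT is shown only to make the bin-to-frequency arithmetic explicit.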

3.3.2 Actuation and Facial Expressions
As explained in the design section, the system increases or decreases the score depending on the number of errors the user makes in pitch and/or rhythm, which translates into a change in facial expression. Three servo motors work in conjunction to form a given expression: one motor for the right eyebrow, one for the left eyebrow, and one for the mouth. The first and second motors are initially positioned with each eyebrow at a 0-degree angle. Depending on the expression, the eyebrows (servo motors) move gradually in opposite directions, to a maximum of 40 degrees. The motor controlling the mouth only moves from 0 degrees (happy smile) to 180 degrees (frown); it is also capable of "twitching" while in the 0-degree position to show excitement.
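A possible mapping from score to servo angles is sketched below. The sign convention and the intermediate eyebrow angles are our assumptions; the paper gives only the 0-40 degree eyebrow range, the opposite-direction movement, and the 0/180-degree mouth positions:

```python
def servo_angles(score):
    """Hypothetical mapping from the clamped score (-3..3) to the three
    servo angles (left eyebrow, right eyebrow, mouth)."""
    s = max(-3, min(3, score))
    brow = round(abs(s) / 3 * 40)   # eyebrow tilt grows with |score|: 0..40 deg
    left = brow if s < 0 else -brow  # sign flips with valence (assumed convention)
    right = -left                    # eyebrows move in opposite directions
    mouth = 180 if s < 0 else 0      # frown for negative scores, smile otherwise
    return left, right, mouth
```

The "excited" twitch at score 3 would be an oscillation of the mouth servo around 0 degrees rather than a distinct angle.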

3.3.3 Limitations
We believe that the current prototype is sufficient for a foundational study in terms of the output it can provide to the user (7 distinct facial expressions) and in terms of the general algorithm used for real-time calculation of scores. However, there are significant inaccuracies in edge detection which limit the usability of the implemented system in terms of signal processing, as shown in figure 6. Given the challenges in detecting pitch duration for vocal signals and the requirement that our system provide real-time feedback, it is not feasible to accurately detect all the edges within a measure. Thus, we modified the system to detect the frequencies present at a rate of 1 note per second. In other words, the system does not actively perform edge detection and only performs a Fourier transform once per second to find the frequencies within a one-second window. This modification in turn limits the tempo to 60 beats per minute, or in musical terms, quarter note = 60. This poses an additional usability challenge, as many tunes do not follow this (painfully) slow beat. Although these signal processing limitations could be addressed with filters to make detection more robust, due to time limitations, and because our investigation prioritized the interaction aspect of Mickey over the technical aspect, we decided to follow a Wizard-of-Oz approach for our user study, as explained in the next section.
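The one-second windowing fallback described above can be sketched as follows (hypothetical Python; the stub detector stands in for the Fourier-transform step):

```python
def notes_per_second(samples, sample_rate, detect_hz):
    """The fallback described above: no edge detection; instead slice
    the signal into one-second windows and detect one note per window,
    which fixes the usable tempo at quarter note = 60."""
    window = sample_rate  # one second of audio per window
    return [detect_hz(samples[i:i + window])
            for i in range(0, len(samples) - window + 1, window)]

# With a stub detector that just reports the window's first sample,
# a 3-second signal yields three "notes".
stub_signal = [262, 262, 262, 262, 392, 392, 392, 392, 440, 440, 440, 440]
detected = notes_per_second(stub_signal, sample_rate=4, detect_hz=lambda w: w[0])
```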

4. USER STUDY
A total of fifteen graduate students from UBC (11 female and 4 male, aged 24 to 34) with varying musical backgrounds were recruited for this study. The data collected from the first participant was discarded, as that session was considered a pilot test of the system. Three participants had no formal music education and played no instruments, 8 participants had anywhere from a few years of to extensive exposure to music education and theory, and 4 participants casually play musical instruments but did not indicate whether they had received any formal music education.

Our Wizard-of-Oz experimental design allowed Mickey's reactions to be controlled by the investigators via a simple user interface implemented in MATLAB. The investigator operating the controls had an extensive musical background, including training in music theory and vocal performance, and followed a predefined list of rules governing which events triggered which reactions, as explained in the design section. The operator also employed a "partial credit" system: if a mistake was made (for instance, the singer changed keys halfway through), the remainder of the song would be judged within the new key, much like how a student


Figure 6: Results from the edge detection algorithm for the first 2 measures of "Twinkle Twinkle Little Star"

taking a calculus exam could make an addition error partway through and arrive at the wrong answer, but still receive partial credit because the reasoning and methodology were correct.

We took great care to design the experimental setup in a way that would support the deception of participants. As we were focused on how the user interacted with Mickey, we wanted to ensure that the user believed Mickey to be completely autonomous and was not influenced by feeling judged by the investigators. Mickey was placed against the wall so that no one could get a close look at the electrical components; participants were instructed to sing into a small microphone whose cord was simply tucked behind the stand rather than being connected to anything; and the consent forms and project descriptions all stated that participants would be automatically judged by Mickey.

Each participant went through a thirty-minute session consisting of three separate phases. Before beginning, participants gave written consent for the investigators to use survey responses and video recordings made during the session for further data analysis. The participant then filled out a brief survey on their musical background. For the first phase, we asked the participant to sing a simple song chosen from a predefined list. The participant was allowed a "practice session," during which they sang the song while observing Mickey's reactions in order to get a feel for the system. The participant then sang the same song while we video-recorded Mickey's reaction; in this way, our sole focus during the session could be on the accurate operation of Mickey in real time, while the tape could be analyzed later to determine which errors specifically prompted which reactions. Finally, the participant was asked to complete a survey about their experience.

For the second phase, the participant had to choose from a selection of more complex songs. If they did not recognize any of the songs in the second list, they were allowed to choose a second song from the first list. The practice round in this case consisted of the participant watching an online video with karaoke lyrics for their chosen song. Mickey did not react in this practice round, but the operator did sing along with the participant to ensure the user had a solid understanding of what the song sounded like. The "real" round was again video-recorded, but unlike phase one, in which the participant sang alone, in phase two the participant sang along with the karaoke track. The participant then answered the same set of questions as in phase one, but about their experience in phase two.

After each of the first two phases in a session, each participant was asked the following questions:

1. On a scale of 1 to 5, how much do you think Mickey enjoyed your performance?

2. Where was your attention mainly focused?

3. On a scale of 1 to 5, how would you rate your own performance?

4. Did you find Mickey’s expressions distracting?

5. Did you find Mickey helpful in evaluating your performance?

The participants were then fully debriefed and the reason for the deception was explained. As the final phase of the session, the participant was asked to try singing the first line of "Twinkle Twinkle Little Star" while being evaluated by our autonomous program.

5. RESULTS AND DISCUSSION

5.1 Results

5.1.1 Quantitative Results
To answer our research questions, we focused on analyzing relevant data such as the participants' overall scores, their understanding of Mickey's expressions, their evaluation of their own performance, their attention, and the perceived helpfulness of Mickey's feedback. It is also worth mentioning that Mickey's scores were adjusted to match the scale on which the participants recorded Mickey's evaluation and their personal evaluation, as asked in questions 1 and 3 after each of the first two phases of a session. A total of 4 repeated-measures two-way ANOVA tests were used to analyze the results. The first test compared how participants perceived Mickey's feedback (as asked in question 1) against their actual overall scores. The participants (N = 14) perceived significantly different feedback from Mickey across the two phases, making the perception of feedback a main effect in our study: the mean perceived score in the first phase (M = 3.36, SD = 1.08) is significantly higher than in the second phase (M = 2.64, SD = 1.15), F(1, 13) = 17.936, p < .001, partial η² = .58. On the other hand, the mean of the actual overall scores evaluated by the system in the first phase (M = 4.29, SD = 1.33) is higher than, but not significantly different from, the overall scores in the second phase (M = 3.5, SD = 1.29), F(1, 13) = 4.38, ns.

The second test compared how participants perceived their own performance against their overall scores. Similar to their perception of Mickey's feedback, the participants' perception of their own performance differed significantly across the two phases, where the mean for the first phase (M = 2.64, SD = 1.08) is significantly


Figure 7: A bar graph showing the participants' overall score, perception of Mickey's score, and their own evaluation of their performance for phases 1 and 2

Figure 8: A bar graph showing the participants' distribution of attention in phases 1 and 2

higher than for the second phase (M = 2.42, SD = 1.01), F(1, 13) = 18.92, p < .001, partial η² = .593.

The bar charts presented in figure 7 show the decreasing trend in scores between the two phases, which is mirrored by the participants' perception of their scores based on Mickey's evaluation and by the participants' personal evaluation of their own performance.

The third test compared the participants' attention against their overall scores. The participants' attention shifted significantly towards Mickey's facial expressions from the first phase (M = 2.71, SD = 1.20) to the second phase (M = 3.89, SD = 0.99), F(1, 13) = 13.44, p = 0.003, partial η² = .508. There was no interaction between the participants' attention and their score, F(1, 13) = 1.86, ns.

The last test compared the participants' attention against their evaluation of their own performance. As in the third test, there was no interaction between the participants' attention and their evaluation of their own performance, F(1, 13) = 1.88, ns.

To further understand the results, descriptive statistics and histograms were generated for the relevant data. Figure 8 shows the distribution of the participants' attention within the first two phases. The bar graphs suggest that the participants' attention shifted towards Mickey and away from the lyrics in the second phase, even though the songs presented in the second phase are generally harder, with longer lyrics.

Figure 9: A bar graph showing the helpfulness of Mickey as perceived by the participants for phases 1 and 2

Figure 10: A bar graph showing the perceived distraction caused by Mickey for phases 1 and 2

In figure 9, we see that participants perceived Mickey as more helpful in the second phase than in the first. We hypothesise that this could be linked to the fact that their attention shifted more towards Mickey, so they received more help than before.

Looking at individual scores, the level of distraction varied from one phase to the next for a given participant. Surprisingly, though, the overall level of distraction caused by Mickey remained the same over the two phases, as shown in figure 10. We note that the majority of the participants (64.3%) did not find Mickey distracting at all, and only 2 participants found Mickey's expressions distracting at some points.

The pie chart in figure 11 shows the percentage of scores that improved between the first two phases. Most of the scores decreased, which could be due to a number of reasons; for example, the increased difficulty of the songs presented in phase 2 or the presence of a karaoke track playing in the background.

Analysis of the third-phase data for all 14 participants was not possible, as we later discovered a bug in the scoring section of the automatic evaluation prototype that affected the data collected from 6 participants, leaving only 9 participants' data unaffected. Therefore, the graph in figure 12 shows the average scores from the data collected from these 9 participants over the 16 seconds (one note detected per second) in which they sang the first line of "Twinkle Twinkle Little Star".

Figure 11: A pie chart showing the percentages of participants whose scores increased, decreased, and stayed the same

Figure 12: A bar chart showing the average scores and standard deviations per second for 9 participants singing the first line of "Twinkle Twinkle Little Star"

5.1.2 Qualitative Results and Comments from Participants

Overall, participants seemed to really enjoy Mickey and found him fun to interact with. Many participants wanted to sing songs they were more familiar with instead of selecting from a predefined list. When asked what additional information they would like Mickey to provide, one participant noted that it would be helpful to indicate "whether pitch or rhythm is off during each song". A number of other participants agreed: when they received feedback from Mickey, they did not know whether they had made a mistake in pitch or in rhythm. Another participant suggested providing "how far away the pitch was from the one needed".

Another participant suggested that it would be helpful to have some sort of indicator showing that Mickey is actually listening to the user, for example by humanizing the robot further with blinking eyes. The same participant suggested using Mickey to evaluate instrumental performance as well.

In terms of user study design, one participant suggested that we could have recruited users ahead of time and asked them which song they wanted to sing, so that they would come to the study prepared and already familiar with it. Although this seems like a great suggestion, since we faced some trouble with users being unfamiliar with the predefined list of songs, it might bias the results: a participant who had learned the song beforehand would not use Mickey as a learning and evaluation platform, but solely for entertainment.

5.2 Discussion

Looking at figure 7, we see that the participants actually had a sense of their overall declining performance from phase 1 to phase 2: as shown in the first two analyses of variance, there were significant mean differences between the two phases in both their perception of Mickey's evaluation and their evaluation of their own performance. From these results, we conclude that users do understand, to some degree, the expressions provided by Mickey, although some noted that the neutral face was sometimes confused with the happy face, as it also had a smile.

On the other hand, it is encouraging to see a significant shift of focus towards Mickey's expressions and away from the lyrics from phase 1 to phase 2, even though the lyrics in phase 2 were generally longer and required more attention. This suggests that participants do perceive valuable feedback from Mickey and generally find it helpful, as shown in figure 9 regarding the perceived helpfulness of Mickey's expressions.

As for the third phase, we notice that the participants' performance declines over time. However, this could be because the program is not robust enough to readjust to rhythm errors: a single rhythmic error at the beginning of the song ruins the evaluation for the rest of it, as the system evaluates only once per second against a specific set of expected notes. Although participants listened to a middle C before they began singing in order to tune to it, many could not tune correctly and instead sang in a relative pitch. Had their singing been evaluated per the rules followed in our Wizard-of-Oz experiment, many users would have had much higher scores, since those rules are more robust to rhythm errors and perform a relative rather than an exact note comparison. As a point of discussion, it is worth noting that it takes a fair amount of practice to "learn" how to give Mickey the correct notes at the right time when running the programmatic trial. If humans have to work so hard to adjust to something the computer can understand, at what point does it become worthwhile to make humans conform to the computer, rather than the other way around?
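The gap between exact and relative note comparison can be illustrated with a short sketch. This is not the prototype's actual code: the function names, the MIDI note numbers, and the one-note-per-second framing are all hypothetical, chosen to show why a singer whose melody is internally correct but whose starting pitch is shifted scores zero under exact matching yet perfectly under interval-based matching:

```python
def exact_score(sung, reference):
    """Fraction of seconds where the detected note equals the expected note,
    mirroring a strict per-second comparison."""
    return sum(s == r for s, r in zip(sung, reference)) / len(reference)

def relative_score(sung, reference):
    """Fraction of matching melodic intervals: tolerant of a singer who is
    in tune with themselves but started on the wrong pitch."""
    sung_iv = [b - a for a, b in zip(sung, sung[1:])]
    ref_iv = [b - a for a, b in zip(reference, reference[1:])]
    return sum(s == r for s, r in zip(sung_iv, ref_iv)) / len(ref_iv)

# Opening of "Twinkle Twinkle Little Star" as MIDI note numbers,
# starting on middle C (60): C C G G A A G
reference = [60, 60, 67, 67, 69, 69, 67]
# The same melody sung a whole tone too high (started on D, not C)
sung = [n + 2 for n in reference]

print(exact_score(sung, reference))     # 0.0: every note counted wrong
print(relative_score(sung, reference))  # 1.0: the melody's contour is perfect
```

Under the exact rule, one wrong starting pitch (or one rhythmic shift, which misaligns every subsequent second) zeroes the rest of the song, while the relative rule credits the intact melody; this is consistent with the higher scores users would have received under the Wizard-of-Oz rules.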

6. CONCLUSION AND FUTURE WORK

The work presented explored the idea of anthropomorphising an existing object to provide real-time feedback on vocal performance in a recreational setting. We chose a music stand because it is the most common piece of equipment a musician needs. We humanized the music stand by giving it facial expressions that change as a form of natural feedback to the user on their vocal performance. The results show that users can easily interpret the expressions provided by Mickey, the humanized music stand, and keep their primary focus on his feedback, which they generally do not find very distracting.

As pointed out by some participants, future work on this system will include extending the evaluation beyond vocal performance. Participants also noted that they want more information about the source of errors in their performance. Given that we are designing our system with humanized feedback in mind, one interesting idea is to explore the use of arms that convey pitch errors by gesturing whether the user is flat or sharp while singing.

Another area to explore in the future would be enhancing the scoring and feedback schemes so that Mickey provides both real-time feedback and holistic feedback after the user is done singing. Examples of holistic feedback could include clapping, cheering, or getting very upset and walking away!

Indeed, Mickey's expressions, signal processing, and scoring schemes require further enhancement, but the work presented in this paper was meant to serve as a foundation for understanding a humanized system for music performance evaluation, one that helps improve the user's vocal pitch and rhythm accuracy while providing an entertaining and engaging experience.

7. REFERENCES

[1] R. B. Dannenberg, M. Sanchez, A. Joseph, R. Joseph, R. Saul, and P. Capell. Results from the piano tutor project. In Proceedings of the Fourth Biennial Arts and Technology Symposium, pages 143–150, 1993.

[2] S. Fels, K. Nishimoto, and K. Mase. Musikalscope: A graphical musical instrument. In Multimedia Computing and Systems '97. Proceedings., IEEE International Conference on, pages 55–62. IEEE, 1997.

[3] D. Gorodnichy and A. Yogeswaran. Detection and tracking of pianist hands and fingers. 2006.

[4] M. Goto and Y. Muraoka. A virtual dancer "Cindy": Interactive performance of a music-controlled CG dancer. In Proc. Lifelike Computer Characters, volume 65, 1996.

[5] T. Grosshauser and T. Hermann. The sonified music stand: An interactive sonification system for musicians. In Proceedings of the 6th Sound and Music Computing Conference, pages 233–238. Casa da Musica, Porto, Portugal, 2009.

[6] G. Lakatos, M. Gacsi, V. Konok, I. Bruder, B. Bereczky, P. Korondi, and A. Miklosi. Emotion attribution to a non-humanoid robot in different social situations. PloS one, 9(12):e114207, 2014.

[7] B. K.-J. Mok, S. Yang, D. Sirkin, and W. Ju. A place for every tool and every tool in its place: Performing collaborative tasks with interactive robotic drawers. In Robot and Human Interactive Communication (RO-MAN), 2015 24th IEEE International Symposium on, pages 700–706. IEEE, 2015.

[8] H. Osawa, J. Mukai, and M. Imai. Anthropomorphization framework for human-object communication. JACIII, 11(8):1007–1014, 2007.

[9] D. Politis, D. Margounakis, and K. Mokos. Visualizing the chromatic index of music. In Web Delivering of Music, 2004. WEDELMUSIC 2004. Proceedings of the Fourth International Conference on, pages 102–109. IEEE, 2004.

[10] X. Xiao, B. Tome, and H. Ishii. Andante: Walking figures on the piano keyboard to visualize musical motion. In NIME, pages 629–632, 2014.