building a bridge between aesthetics and acoustics

Research Studies in Music Education (http://rsm.sagepub.com/)

Evangelos T. Himonides and Graham F. Welch. Building A Bridge Between Aesthetics And Acoustics With New Technology: A Proposed Framework For Recording Emotional Response To Sung Performance Quality. Research Studies in Music Education, 2005, 24: 58. DOI: 10.1177/1321103X050240010501

The online version of this article can be found at: http://rsm.sagepub.com/content/24/1/58

Published by: http://www.sagepublications.com, on behalf of: Society for Education, Music and Psychology Research

Additional services and information for Research Studies in Music Education can be found at: Email Alerts: http://rsm.sagepub.com/cgi/alerts; Subscriptions: http://rsm.sagepub.com/subscriptions; Reprints: http://www.sagepub.com/journalsReprints.nav; Permissions: http://www.sagepub.com/journalsPermissions.nav; Citations: http://rsm.sagepub.com/content/24/1/58.refs.html

Version of Record: Jun 1, 2005

Downloaded from rsm.sagepub.com at Australian National University on April 2, 2014



Number 24, 2005 Research Studies in Music Education

58

© 2005 Callaway Centre

Building A Bridge Between Aesthetics And Acoustics With New Technology: A Proposed Framework For Recording Emotional Response To Sung Performance Quality

Evangelos T. Himonides & Graham F. Welch

Abstract

The paper describes a framework for the incorporation of an interactive-multimedia computer application into a four-year multimethod research project that investigates the psychoacoustic interpretation of vocal performance quality. Initial research, based on the analysis of semi-structured interviews with people from a wide range of backgrounds, followed by an extensive questionnaire survey, indicates different perceptual features as salient for individual listeners of singing performances. A synthesis of these reportedly significant perceptual features is now in the process of being critically evaluated by individual expert listeners using a novel experimental procedure that embraces the application of new multimedia technology. The new technology is designed to act as a real-time monitoring system along a ‘like/dislike’ continuum of perceived quality, whilst also collecting real-time pressure data that is outside the listener’s conscious awareness. The paper will discuss the overall research methodology, and the implementation and application of this technology, as well as possible implications for the teaching of singing.

The study of aesthetics is a characteristic feature of many different and diverse cultures across the world over several thousand years. This branch of philosophy embraces concepts of taste and beauty and appears to be particularly applicable (but not solely) to the arts, including music. Within the field of music, it is part of the human condition to be moved by the human voice in performance. The bases for this are believed to lie in our first experiences, perceptual and emotional, of maternal vocal sound, both pre-birth and subsequently across the first twelve months of life (Malloch, 1999; Welch, 2005). And although music-making, participation and/or any other form of involvement, in general, is the emanation of numerous processes that allow every individual with normal development to be capable of ‘musiking’, it has been argued that singing activity stands above all other types of musical behaviour as the most effective and powerful means of expressing a wide range of human thought and feelings (Durrant, 2003). From both sociopsychological and cross-cultural perspectives, singing appears to reflect an intrinsic human need that transcends any morphological variation between and within different cultures, social contexts and eras (Himonides, 1997; Durrant & Himonides, 1998). The unique nature of singing as a human activity is also revealed in recent findings of neuropsychobiological research. Singing is the only activity that combines text and musical form and there is evidence (Peretz et al., 2004) that verbal production (either sung or spoken) is processed by the same language output system, but through two distinct pathways, one for singing a given melody and the other for delivering the text of songs.

Available definitions imply that our perception of beauty is a daedal model that involves the combined correspondence of a number of factors. Nevertheless, the delineation and quantification of beauty, whether in singing or in other forms of artistic expression, is often contentious (such as the linked ‘Response to…’ nature of articles in recent editions of the Philosophy of Music Education Review). The nature of any links between music and emotion is similarly contested, even though there may be general agreement amongst researchers that there is an intersection between the two (Jorgensen, 2004). Accordingly, Figure 1 offers a graphical representation of a suggested theoretical framework that embraces the elements that are implicated in the construction of our perception of beauty in singing.

Musical performance embraces affective acoustic cues, such as changes in time, cadence, sound intensity, intonation, vocalisation, timbre, modulation, dynamics and even silence to convey emotions such as joy, awe, sadness, fear and anger (Juslin & Sloboda, 2001). Analyses of recorded performances indicate that virtually every performance variable is affected in ways specific to each emotion (Gabrielsson, 2003).

On a morphological basis, both musical content and musical structure are perceived to be key elements upon which aesthetic response is based. Although the interpretation of the term ‘aesthetic response’ remains opaque, Ockelford (2005) advocates that this might be in close accord with repetition in the design and cognition of music. Thus, provided that familiarity with a given style exists, repetition of micro and macro interrelationships of core musical elements is one of the driving forces in the experience of listening to a piece of music.

In addition to these neuropsychobiological and musical elements, research literature indicates that perception of beauty in singing is likely to be related to the listeners’ phase of musical development and experience (e.g. Welch, 2005), their acquaintance with the specific musical genre (Sloboda & O’Neil, 2001; Deutsch, 2003), the context for the listening (Himonides, 1997), the degree of performance skill demonstrated by the singer (cf. Sundberg, 2000) and the extent to which the perceived sound has been manipulated by the sound technician, recording or broadcast engineer (Howard & Angus, 2001; Cook, 2001).

Furthermore, the existence of different cultural, social and epochal standards or expectations concerning the nature of quality in singing (Potter, 2000) affects the way that singers are inducted and educated towards the production of beautiful singing performances. Yet, in general, singing pedagogy is relatively poorly documented in relation to systematic, theoretically founded research. Teaching is often seen as highly idiosyncratic, based on semitransparent methods and driven by a craft-knowledge that draws on inherited empirical methods and that is mediated by sociocultural fashion (Callaghan, 2000; Welch et al., 2004).

Miller offers germane considerations regarding the problems that today’s singing artist and/or apprentice might be facing that are predominantly grounded upon a proliferation of information. He claims:

The ambient instructional air that today’s singer breathes is full of pedagogic pollutants, often based either on imagery that has no link to acoustic or physical function, or on pseudoscientific assumptions that have no relationship to professional vocal sound (2000, p. 160).

It seems logical, therefore, for music education and pedagogy to be concerned with the challenges of both understanding and fostering artistic beauty in vocal performance. Improved specificity and research-based agreement in our pedagogic language for singing appears to be an essential requirement if we are to suggest robust improvements in the educational development of singers (Howard et al., 2004). Within this need for a greater depth of understanding there is a continued awareness to interrogate the essence of beauty in sung performance. Arguably, if this quality could better be defined or captured in some way, it may be possible to facilitate improved student performance in the studio, school and/or conservatoire setting.


Figure 1 Theoretical elements that ‘construct’ the perception of beauty in a singing performance (with indicative references)

The prime purpose of this paper is to share ongoing research into a clarification of the elements that are perceived to constitute a ‘beautiful’ singing performance (see Figure 1) and to determine the extent to which particular elements are salient and quantifiable for an individual listener.

The vocal instrument

Other evidence in support of a view that the human voice is perceived to be a unique performance instrument surfaces from music reviews, critiques and analyses in the popular press. Comments from a recent (2005) Sunday broadsheet supplement are typical: “Cooder’s voice, earthier than ever…”, “Martin’s favourite song [on the new Coldplay album] he sings, sounding like he’s been up all night crying” and “Eno’s voice is contemplative and unassuming…” (The Sunday Times, 22 May 2005). It appears to be common for a review of a vocal performance to make reference to the instrument (the performer’s voice), but this is rarely the case for reports of instrumental performances. We are disposed to take the sound quality of the (non-voice) instrument as a given. However, somewhat paradoxically, research studies continue to investigate the sound qualities of supposedly superior musical instruments, incorporating perceptual testing, physical modelling and a comparison of the actual physical properties/characteristics of the instruments (cf. Lukasik, 2003). The human voice seems conspicuous by its absence in such studies. The basis for the voice as an essential component in our species-wide communication, including in musical performance, lies in a common vocal anatomy and physiology that is shaped by biological development and interfaced with experience, cultural imperative and tradition (Welch, 2005).

The vocal instrument comprises three fundamental components: (i) the respiratory system which produces the energy source for the voice, (ii) the vocal folds within the laryngeal assembly which vibrate in the airstream to generate the basic sound and (iii) the vocal tract (the spaces above the larynx: the pharyngeal space within the neck and the oral cavity, complemented by the nasal cavity) which shapes the sound (cf. Titze, 2000; Welch & Sundberg, 2002). In order to vocalise, the respiratory system compresses the lungs to generate an upward flowing airstream which sets the edges of the vocal folds in vibratory motion, resulting in pulsed sound waves that travel (mainly) through the vocal tract, to be radiated outwards from the lips (Welch et al., 2004).

Voiced sounds are acoustically rich, having many harmonics above the fundamental frequency. Accordingly, vocal pitch is essentially a product of patterns of vocal fold vibration, vocal loudness relates to changes in air pressure from the lungs and vocal colouring is generated by the interface between vocal fold vibration and the configuration of the elements of the vocal tract (Welch & Sundberg, 2002). This ‘branding’ of the sound within the vocal tract results in a rich and, importantly, unique product, the human voice (cf. Himonides, 2005). This distinctiveness is what makes the voice one of the key specialties in the science of biometrics (European Commission, 2005). However, voice scientists’ perspectives on the nature of quality in voice production are more problematic. For example, Titze and Story rehearse some of the ongoing challenges in matching conventional labels with scientific explanation:

Descriptions of voice quality have traditionally consisted of qualitative terms such as warm, shrill, twangy, creaky, shrieky, breathy, yawny, gravelly, hoarse, ringing, dull, nasal, resonant, rough, and pressed. While commonly used in both clinical and non-clinical situations, the acoustic and articulatory correlates of these terms have not been well defined. In comparison, the characteristics of vocal registers have been somewhat better defined and are often given the generally accepted labels of modal, fry, and falsetto in speech, and chest, head (or mixed), falsetto and whistle in singing. Work is now ongoing to address a few of these voice qualities on a physiologic and acoustic level (2002, p. 3).

The labels in general use for voice quality in speech are contentious and, notwithstanding the above comparison that suggests that singing generates a greater level of consensus, there continues to be controversy surrounding definitions of vocal registers (including their nomenclature and number) (Thurman et al., 2004). Consequently, given the imprecision in our language for vocal qualities, the lack of definition of ‘quality’ in sung performance and the wide range of elements that are theorised to contribute to our perceptions of beauty in sung performance, the research outlined in this paper is being undertaken in order to:

1) explore the possibility of measuring moment-by-moment listener emotional response whilst listening to recordings of sung performances;

2) understand more clearly which of the elements that constitute a vocal performance are more strongly inferred by and interrelated with the perception of a beautiful singing performance, including its acoustic variations and musicological features;

3) determine the degree of correspondence (if any) between listeners’ reported perceptions of beauty in sung performances in comparison with measures taken during their listening to the same pieces.

Methodology

A review of key literature from the fields of music psychology, acoustics and musicology has generated a multifaceted perspective concerning the perception of beauty in singing performance (Figure 1). The fieldwork phase of the research methodology is seeking to understand more clearly the nature of these constructs and their relationships with each other. A five-phased research pathway has been constructed to gain comparative insights into the similarities and differences between participants’ reported perceptions and their actual moment-by-moment responses.


These five phases (Figure 2) have been designed to develop iteratively as a multimethod framework (Robson, 2002). In overview, the research instrument consists of:

• semi-structured interviews;

• questionnaires (eliciting a mixture of quantitative and qualitative answers);

• the collection and analysis of real-time perceptual and conceptual listener response data to specially selected auditory stimuli;

• analyses (dynamic and spectro/temporal) of the selected recorded performances that are being used as the auditory stimuli;

• analyses, comparison and triangulation of the above data-sets.

The lifespan of the research fieldwork has been segmented into five consecutive activities, phases I to V. Pragmatically, some of these overlap in time, as it has been possible to continue the questionnaire study (phase II) whilst developing the protocol and research instrument for perceptual testing (phases III and IV). A brief overview of each of these phases is given below, with a particular focus on the challenges that have needed to be addressed in phase IV as an example of the complexities involved in this study.

Figure 2 An overview of the research project design

The phases of the research

Phase I

The purpose of this phase has been to suggest key elements for further investigation related to possible correspondences between the aesthetic and emotional aspects of the listening experience. This has been undertaken through a group of exploratory interviews which are also being used to report the views of experienced professionals regarding the singer’s voice and vocal performance quality. During this initial phase of the research, nine individuals were interviewed. Of the nine people that participated:

• one was a world-class choral conductor;

• two were professional opera singers;

• one was a music educator, professional pianist and researcher;

• one was a semi-professional opera singer and music educator;

• one was a critic of music performances (live and commercially recorded) as well as an Assistant Professor of Ethnomusicology;

• one was a classical music enthusiast, and high fidelity (Hi-Fi) audiophile; and, finally,

• two were professional teachers of singing, one of them specialising in classical singing and the other in popular (pop) singing.

The basis for the selection of the interviewees was to gather an initial range of opinions from a small number of individuals who had a declared specialist interest (professional or otherwise) in sung performance. During the semi-structured interview sessions, the participants engaged in conversations that focused on their interpretation of voice quality in general, vocal performance quality, their own listening experiences and any perceived differences between their listening experience in a live concert hall context compared to that of a recorded audio performance. Interviews were digitally recorded and transcribed, prior to being analysed using ATLAS.ti (a qualitative analysis package for large bodies of textual, graphical, audio and video data). The analyses confirmed the major factors that surfaced from the literature review (Figure 1), whilst also indicating differences in the degree to which they were regarded as salient by the individual interviewees. Comments focused principally on features related to the production of the sound (such as tone quality, vibrato, power and intonation) and also on the effects of sociocultural contexts (including interpretation, diction and acting). Less awareness was evidenced of the possible effects of the structure of the music on perceptions of vocal beauty or of how recorded sound might have been manipulated in the recording studio (or live broadcast) by the sound engineer (cf. Howard & Angus, 2001; Nair, 1999). In addition, these initial interviews demonstrated that there are important intersections between factors (such as the listener’s sociocultural background and expectation and/or links between individual musical development and the perceived production of the acoustic signal).

Phase II

Following the initial stage of the research (phase I), the purpose of phase II was essentially complementary and focused on accessing the views of a wider range of listener backgrounds, including those who did not profess to have a professional background in music. Phase II embodied the design, piloting and distribution of a standardised open-ended questionnaire. The questionnaire consisted of three sections. The first section sought demographic information regarding the respondents’ sex, age group, profession, country of origin and field of specialist knowledge (if any). In the second section, respondents were asked to nominate a singer that they regarded as one of the ‘world’s best’ and to provide a brief explanation as justification for their choice. The third and last section was also designed to be open-ended. Participants were asked to provide a self-generated list of the most significant factors for the evaluation of quality in a sung performance. For this purpose, five separate, blank text-boxes were provided. Next to each of the text-boxes the respondents were asked to rate the importance of each of their presented factors by assigning a number from a seven-point Likert scale (with ties permitted).

Three different groups/audiences have completed the questionnaire. The first group of participants were delegates at the 2nd International Conference on the Physiology and Acoustics of Singing, held in Denver, Colorado during the second week of October 2004. The respondents came from a wide range of backgrounds and countries of origin and included performers, voice educators, voice scientists, medical doctors and behavioural therapy specialists, drawn from the United States, Australia, Brazil, Canada, England, France, Israel, the Netherlands, Norway, Portugal and Sweden. Seventy hard copies of the questionnaire were produced and distributed to the delegates and forty-nine completed questionnaires (N1=49) were collected (response rate = 70%). The second participant group completed an online electronic form that was visually identical to the hard-copy version. The electronic address to this online survey was circulated amongst students and staff of the Royal College of Music (London), the Institute of Education (London), the University of Sheffield, the Royal Academy of Music (London) and, finally, individual members of the British Voice Association. So far, N2=180 responses have been received.i The third and final group surveyed were students and staff of the Irish World Music Centre at the University of Limerick (Ireland), as well as undergraduate and postgraduate students and teaching staff of the same university’s Mary Immaculate College. The response from this final survey group was N3=145 completed questionnaires.
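As a quick arithmetic check on the figures reported for the three groups, the following minimal Python sketch recomputes the Denver response rate and the combined number of completed questionnaires. The variable names and group labels are illustrative only and are not part of the study.

```python
# Figures quoted in the text for the three respondent groups.
distributed_denver = 70  # hard copies handed to conference delegates
completed = {
    "Denver conference (N1)": 49,
    "Online survey (N2, so far)": 180,
    "Limerick (N3)": 145,
}

# Response rate for the only group where the number distributed is reported.
response_rate = completed["Denver conference (N1)"] / distributed_denver

# Combined number of completed questionnaires across all three groups.
total = sum(completed.values())

print(f"Denver response rate: {response_rate:.0%}")
print(f"Total completed questionnaires: {total}")
```

The response rate comes to 70% and the combined total to 374 completed questionnaires.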

The questionnaire was designed to be a hybrid survey instrument in that it was intended to serve as a large-scale structured interview substitute (see Gafni et al., 2003). Although the qualitative distillation of the responses is ongoing (and currently numbers 374), certain trends in the data are already emerging. One of the most striking is that participants rarely, if ever, make reference in their ‘list of important features’ to any of the characteristics by which they define their previous selection for a ‘world’s best’ singer. As an example, one respondent nominated “Jon Bon Jovi” as one of the world’s best singers because “...His songs are heartfelt and he has a voice that is very easy to listen to...” On the other hand, the same respondent named Strong voice (rated 7), Tune of voice (rated 6), Range of voice (rated 6), Good volume of performance (rated 5) and Projection of voice (rated 4). It is as if the participant has two co-existing conceptions of beauty in singing, namely beauty as encapsulated by a particular vocal performer and (separately) self-generated criteria of vocal beauty in the abstract. The three most common criteria that are evidenced so far in the latter category are tone quality, range and strength of voice. This suggests that there is a potential bipolarity between different components in the suggested listener construction of singing beauty (Figure 1), namely between features in the production of the acoustic signal by the performer (tone quality) and the neuropsychobiological & sociocultural processing of the performance (“…songs are heartfelt”).

The analyses of the diversity and somewhat paradoxical nature of the questionnaire responses concerning perceptions of vocal beauty will be critiqued in the light of evidence arising from the real-time data generated within the forthcoming perceptual testing phase (phase IV).

Phase III

A case-study approach is being piloted for the main part of this research (phases III and IV) in order to investigate how best to gain a measure of real-time emotional responses during listening sessions of recorded performances. The third phase of the research has been organised into two main sub-phases, namely a pre-testing stage, followed by perceptual data gathering on a case-by-case basis.

Sub-phase one: a pre-testing stage

Dido’s Lament from Purcell’s ‘Dido and Aeneas’ was selected as a relatively well-known, high-status piece of repertoire from the classical western music canon. Fifteen different, commercially available versions of this passage were identified initially. On the basis of initial piloting with a small group of volunteer listeners, this number was then reduced down to five versions in order to ensure a contrasting diversity of stimuli, whilst also reducing the possibility of listener fatigue during the real-time task.

In order to conduct the short-listing of the five finalist pieces and to reduce the risk of contamination of the research procedure and outcomes by the lead author’s personal taste, an interactive multimedia application was created using Macromedia’s ‘Flash MX’ and the ‘ActionScript’ programming language. This application comprised a screen that hosted a graphical representation of the fifteen commercial versions in a track-per-line fashion. Five different versions of the same application were created where the order of appearance of each track was algorithmically randomised and stored in an encrypted file for future reference, in order to minimise any ordering effects of the sequence of presentation. For convenience of use in relation to HCI (human-computer interaction) (Himonides, 2003) and to allow the listener/evaluator the facility to jump effortlessly between different versions, but also different key moments of the ‘Lament’ score, six marker-points were created for each of the fifteen versions that corresponded to bars 1, 11, 21, 28, 32 and 38 of the score. Bars 1 and 11 corresponded to the beginning of the ‘Lament’ and its repeat respectively, beginning with “...when I am laid...” (Figure 3).

Figure 3 Bars 1-4 and 11-14 [repeat]

Bars 21 and 32 [for its repeat] respectively mark the piano beginning of Dido’s sorrowful appeal to be remembered (Figure 4).

Figure 4 Bars 21-24 and 32-35 [repeat; differs on bar 35]

Bars 28 and 38 [for its repeat] correspond to the crescendi, the climax [and repeat] where the performer is required to sing a high G on a difficult /ĕ/ vowel [“...remĕmber me...”] (Figure 5).


Figure 5 Bars 28-32 and 38-end [repeat: last note omitted]
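The randomised presentation orders and the paired marker-points described above can be sketched as follows. This is a minimal Python illustration and not the authors' Flash/ActionScript application: the track labels, the fixed seed and the function name are assumptions, and the encryption of the stored order is omitted.

```python
import random

# Fifteen commercial versions, labelled 1..15 (hypothetical labels).
TRACKS = list(range(1, 16))

# Marker bars as paired in the text: each passage and its repeat.
MARKERS = {
    "opening": (1, 11),
    "appeal to be remembered": (21, 32),
    "climax": (28, 38),
}

def make_presentation_orders(n_versions=5, tracks=TRACKS, seed=2005):
    """Return one independently shuffled track order per application version."""
    rng = random.Random(seed)  # fixed seed (an assumption) so orders can be regenerated
    orders = []
    for _ in range(n_versions):
        order = list(tracks)
        rng.shuffle(order)  # in-place uniform shuffle of this version's order
        orders.append(order)
    return orders

orders = make_presentation_orders()
for version, order in enumerate(orders, start=1):
    print(f"Version {version}: {order}")
```

Keeping each order as a stored list plays the role of the reference file: it is what allows a listener's position-based scores to be mapped back to the underlying recordings afterwards.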

Therefore, the listener/evaluator was provided with the means to navigate amidst all fifteen different commercial versions, as well as the facility to be able to jump to one of the six distinct moments of the score at will. For usability purposes, a ‘roll-over’ script was utilised so that the listener would only have to hover over the control buttons on the monitor screen (rather than having to click on them) in case s/he desired to switch rapidly between the different moments of the versions (Figure 6). Finally, the applications, by means of the five randomised versions, were saved onto mixed mode CD-ROMs that could also play the fifteen individual audio-tracks on normal commercial compact disc audio players. The unique randomised sequence for each of the five versions was kept to match that of the multimedia version on each CD-ROM. Each of the five CD-ROMs was given to individuals for the pre-screening session. The participants were asked to rate the overall performances by assigning one unique score from 1 to 15 to each performance, with the perceived ‘best’ performance being rated with the score of 15 and, consequently, the perceived ‘least best/worst’ performance to be rated with the score 1. All performances had to be rated and none of the participants were instructed with regards to any emphasis on specific aspects of the performances. The listener’s score was marked in a space provided on the back of the CD-ROM jewel cases.

Upon the collection of the five CD-ROMs and the decryption of the encrypted files that contained the information regarding the order of appearance of the pieces, the final scores and rankings for each of the fifteen versions were calculated.

Thus the conclusion of the pre-testing stage of phase III was marked by five performances being chosen as the qualifiers for phase IV: those rated as the ‘best’ two, the ‘worst’ two and the performance that came in eighth place (and thus categorised as ‘relatively neutral’).
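The selection step can be illustrated with a short Python sketch. The text does not state how the five listeners' scores were combined, so summing the scores per recording is an assumption here, as are the tie-breaking rule, the function name and the toy data.

```python
def select_finalists(orders, scores_by_position):
    """Combine per-listener scores and pick best two, eighth place and worst two.

    orders[d][i] is the track presented at position i on disc d;
    scores_by_position[d][i] is the score (1..15) given to that position.
    """
    totals = {}
    for order, scores in zip(orders, scores_by_position):
        # Each listener must use every score from 1 to 15 exactly once.
        if sorted(scores) != list(range(1, len(order) + 1)):
            raise ValueError("scores must be a permutation of 1..N")
        for track, score in zip(order, scores):
            totals[track] = totals.get(track, 0) + score  # assumed: simple sum
    # Highest total first; ties broken by track label (an arbitrary, assumed rule).
    ranking = sorted(totals, key=lambda t: (-totals[t], t))
    finalists = {
        "best": ranking[:2],
        "neutral": [ranking[7]],  # eighth place out of fifteen
        "worst": ranking[-2:],
    }
    return finalists, ranking

# Toy data (invented): disc d presents the tracks rotated by d places,
# and every listener happens to give track t the score t.
TRACKS = list(range(1, 16))
orders = [TRACKS[d:] + TRACKS[:d] for d in range(5)]
scores_by_position = [list(order) for order in orders]

finalists, ranking = select_finalists(orders, scores_by_position)
print(finalists)  # {'best': [15, 14], 'neutral': [8], 'worst': [2, 1]}
```

With fifteen recordings, eighth place is the exact midpoint of the ranking, which is consistent with its being treated as ‘relatively neutral’.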


Figure 6 The interactive interface that has been developed for the ‘ranking’ and selection of performances

Sub-phase two: personal a cappella recordings

The beautification and flattery of sound, alongside the manipulation of audio in recording, are part of the recording engineer’s and producer’s art. Nowadays, modern recordings and modern audio productions rarely capture and reproduce singing and other musical performances as they might sound to the untutored ear. Reality becomes a product of the perceptual taste and experience of particular recording teams as they seek to improve the acoustic quality of the source output. For the purposes of this study, such manipulation is both unseen (and therefore unquantifiable) and also represents an unknown acoustic biasing in the various recorded versions.

Accordingly, it was decided that a parallel set of perceptual test material should be designed and created. For this parallel set, five recordings of the same piece (Dido’s Lament) were performed by invited soloists using the following recording benchmarks:

• All of the recordings were performed a cappella by different professional, semi-professional and student operatic singers (female sopranos). Unaccompanied singing permitted the possibility of future acoustic analyses without the interference of the accompanying instruments.

• Each recording took place in the same recording venue (Logan Hall Theatre, Institute of Education, University of London) in order to have a common acoustic setting with regard to the contribution of the recording space/venue to the branding and/or colouration of the captured sound.


• All of the recordings were made using the same technology, namely a professional vacuum-tube large diaphragm microphone, an FMR Audio RNP8380 professional microphone pre-amplifier, an RME ‘Multiface’ professional recording interface set up for recording at 24 bit/96 kHz,ii and finally Cakewalk ‘SONAR 3 producer’ digital audio production software. This explicit delineation and commonality was seen as necessary because the chain of technology used for recording constitutes an extremely important factor in the final branding of the sound.

• Recent research (Nordenberg & Sundberg, 2003) suggests that variation in vocal loudness can affect the contour of the long term average spectrum in a non-uniform manner.iii For that reason, all of the recordings were calibrated in terms of sound pressure level (SPL) so that it would be possible in the future to perform accurate comparative analysis of the individual takes. For this purpose, a commercial SPL meter was used.

These five recordings will be evaluated perceptually, in addition to the five commercial recordings, using the protocol outlined in phase IV.
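The calibration procedure itself is not described in detail. One common approach, sketched below purely as an assumption, is to record a reference signal whose level at the microphone has been read from the SPL meter, and to use the difference between that reading and the digital level (in dB relative to full scale) as an offset for every take made with the same, unchanged recording chain. The function names are hypothetical.

```python
import math


def rms_dbfs(samples):
    """RMS level of float samples (full scale = 1.0) in dB re full scale."""
    if not samples:
        raise ValueError("no samples")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)


def calibration_offset(reference_samples, meter_reading_db_spl):
    """dB to add to a dBFS level to obtain dB SPL for this recording chain."""
    return meter_reading_db_spl - rms_dbfs(reference_samples)


def estimate_spl(samples, offset_db):
    """Estimated SPL of a take, valid only if the gain chain is unchanged."""
    return rms_dbfs(samples) + offset_db
```

The offset is meaningful only while microphone position, pre-amplifier gain and interface settings stay fixed between the reference recording and the takes, which is consistent with the commonality of venue and equipment listed above.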

Phase IV

Perceptual testing and measurement of experience in mainstream non-clinical psychological research has tended to be undertaken either through an interview schedule or through a ranking of the experience on a continuum or numbered scale. In both cases, the testing has usually occurred ‘post-hoc’ at the end of the listening session (cf. Berliner et al., 1978; Sederholm et al., 1993; Granqvist, 2003). In a few cases, the listener might be asked to perform a number of ratings during the listening session (see Figure 7).

Figure 7 A schematic of customary response measurement methods. The listener is either interviewed post-hoc or evaluates the performance using a Likert scale. Sometimes, a response is required whilst actually listening.

The development and introduction of (then) revolutionary technology known as ‘the cruddy’ (CRDI: continuous response digital interface) by the Center for Music Research in Florida opened new horizons for perceptual testing (see among others: Madsen et al., 1991). Since then, a large number of research studies have been conducted using this technology, and there is a general acceptance of its positive contribution to response measurement (Capperella, 1989; Johnson, 1992). The listener is asked to rate different elements whilst recording his/her experience on a circular dial (which looks like a volume control on a stereo amplifier). All the distinct locations of the dial are recorded for subsequent analysis (Figure 8).


Figure 8 Previous enhanced technological methods. The listener ‘rates’ the performance using a digital dial-like interface. The position of the dial corresponds to a particular ‘value’ until it is changed by further manipulation. The output data thus does not necessarily provide conscious rating information all the time.

In the research detailed in this paper, a new apparatus for perceptual testing is proposed and is currently under evaluation (CReMA: continuous response measurement apparatus). The CReMA device has been adapted from modern analogue synthesiser control technology as it is believed to offer additional benefits to those provided by the CRDI. This new interface acts as a more intuitive linear control system, thus not requiring the user to jump to a new location on a circular potentiometer. In this way, a one-to-one analogy to linear scoring (graded scales, Likert scales, and scoring continua) is provided in an attempt to retain more closely the ‘like-dislike’ n-point scale linear domain. This new technology has additional, innovative features (Figure 9). In addition to left-right hand movement, the controller is able to capture real-time pressure data—an aspect of the listening experience which is likely to be outside the listener’s conscious awareness. (The pressure data is hypothesised as a correlate of the emotional response and will be subject to validation subsequently by the addition of real-time physiological measurements during the listening experience.)
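The one-to-one analogy between ribbon position and linear scoring can be illustrated with a short sketch. It assumes that position arrives as a 7-bit controller value (0 to 127, the usual range of a MIDI control change message) and that higher values lie towards the 'like' end; the actual data format and orientation of the CReMA device are not stated in the text, and the function names are hypothetical.

```python
def position_to_scale(value, points=7, max_value=127):
    """Map a raw controller position (0..max_value) onto a 1..points scale.

    The range is split into `points` equal-width bins, so each raw position
    corresponds to exactly one point of an n-point 'like-dislike' scale.
    """
    if points < 2:
        raise ValueError("a scale needs at least two points")
    if not 0 <= value <= max_value:
        raise ValueError("position out of range")
    return int(value * points / (max_value + 1)) + 1


def position_to_continuum(value, max_value=127):
    """Map a raw position onto a continuum from -1.0 (dislike) to +1.0 (like)."""
    if not 0 <= value <= max_value:
        raise ValueError("position out of range")
    return 2.0 * value / max_value - 1.0
```

The first function corresponds to quantised output (a graded or Likert-type scale), the second to unquantised output (a scoring continuum).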

Figure 9 The proposed method for real-time listening using a CReMA (continuous response measurement apparatus) linear controller. The listener uses an intuitive interface and rates the experience on a linear ‘like-dislike’ continuum. The device is capable of exporting true (quantised or not) position data as well as pressure data to either any MIDI capable sequencer or any professional device capable of capturing analogue control voltage (CV).

In Figure 10, we can see a schematic representation of the perceptual testing that is based on the CRDI technological approach. Although the monitoring is continuous, a potential weakness of this system can be observed. The listener turns the dial whilst listening and the dial pointer’s location is recorded. However, unless the movement is continuously changing, there will be periods of apparent stasis between responses (see dotted line). These stationary moments may or may not be intended by the listener.

Figure 10 Data analysis using existing response capturing technologies, such as the CRDI

The above-mentioned issue, however, is largely absent in real-time scoring using the CReMA technology. The interface records true positional data only when the listener’s finger is in contact with the surface of the ribbon (Figure 11).

Figure 11 Data output using the proposed CReMA technology
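The difference between the two kinds of output can be made concrete with a small sketch. It assumes that responses are logged on a common grid of integer time ticks; the data structures and function names are hypothetical and are not taken from either device.

```python
def dial_series(events, n_ticks):
    """Dial-style output: the last set value persists until it is changed.

    events: dict mapping a tick to the value the dial was moved to.
    Ticks before the first movement carry None.
    """
    series, current = [], None
    for tick in range(n_ticks):
        if tick in events:
            current = events[tick]
        series.append(current)
    return series


def ribbon_series(touches, n_ticks):
    """Ribbon-style output: a value exists only while the finger is in contact.

    touches: dict mapping a tick to the position touched at that tick.
    """
    return [touches.get(tick) for tick in range(n_ticks)]


def coverage(series):
    """Fraction of ticks that carry a recorded value."""
    return sum(v is not None for v in series) / len(series)
```

Given the same two responses at ticks 0 and 4 over six ticks, the dial-style series reports a value at every tick, including the periods of apparent stasis, whereas the ribbon-style series reports values at the two ticks of contact only.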

The hypothesised meaningfulness of the additional set of data (pressure data) is currently being piloted in terms of the significance of the output data to the perceptual testing of emotional response to perceived vocal beauty.

Phase V

During the final phase of the research project, the focus will be on the triangulation of the data from the previous four phases. Phases I and II have provided qualitative data concerning participants’ reported listening preferences, as well as an insight into their post-hoc interpretation and reasoning with regard to their preferences. Phase III has generated two sets of ingredients (elements to be tested) for case studies that require participant listeners to experience commercial and specially recorded sung performances, the latter with known acoustic and recording variables (cf. Baken, 1996). In this final phase, all the data sets will be brought together to generate a composite perspective of the nature of the perception of ‘beauty’ in sung performance.


Conclusions / implications for practice

The nature of perceived beauty in sung performance appears to be complex, yet it is hoped that this multimethod approach will be able to illuminate key aspects of a relatively common human response (“She is my favourite singer because…”). If so, we should be able to apply such knowledge in the singing studio or classroom to foster critical listening and also improved sung performance quality and singing teaching. Consequently, the purpose of the research is not to attempt to create a technology that will somehow detect quality in sung performance, but rather to design a system that provides additional insights into how we are responding moment by moment when we perceive singing to be particularly beautiful. Outside a performance hall, where additional factors (i.e. being a member of an audience, having visual feedback, experiencing a live performance) are absent, what we like or dislike is encapsulated in a captured sound-file. Every single piece of information that our ears receive when we are listening to our favourite vocal performance in the comfort of our living room can be isolated and analysed. The classical good CD & DVD guide 2005 (Roberts, 2004, p. 772) mentions: “Catherine Bott is a fine Dido, even-voiced across the range and powerfully expressive if occasionally a touch free with the rhythms”; is this information identifiable in the recorded sonic material? Might this particular performance be perceived as ‘better’ than another because of the timbre of Catherine Bott’s voice or maybe because of her vibrato rate or the variability of amplitude? The proposed research framework will attempt to see whether we are paying attention to specific ingredients that constitute a whole performance.

References

Baken, R. J. (1996). Clinical measurement of speech and voice. San Diego, California: Singular Publishing Group, Inc.

Berliner, J. E., Durlach, N. I., & Braida, L. D. (1978). Intensity perception. IX. Effect of a fixed standard on resolution in identification. Journal of the Acoustical Society of America, 64(2), 687-689.

Callaghan, J. (2000). Singing and voice science. San Diego, California: Singular Publishing Group.

Capperella, D. A. (1989). Reliability of the continuous response digital interface for data collection in the study of auditory perception. Southeastern Journal of Music Education, 1, 19-32.

Cook, P. R. (2001). Music, cognition, and computerized sound: an introduction to psychoacoustics. Massachusetts: MIT Press.

Deutsch, D. (2003). Ear and brain: how we make sense of sounds. New York: Copernicus Books.

Durrant, C. (2003). Choral conducting: philosophy and practice. New York: Routledge.

Durrant, C., & Himonides, E. (1998). What makes people sing together? Socio-psychological and cross-cultural perspectives on the choral phenomenon. International Journal of Music Education, 32, 61-71.

European Commission (2005). Biometrics at the frontiers: assessing the impact on society: for the European Parliament Committee on Citizens’ Freedoms and Rights, Justice and Home Affairs (LIBE) (No. EUR 21585 EN). Brussels: European Commission, Joint Research Centre (DG JRC), Institute for Prospective Technological Studies.

Gabrielsson, A. (2003). Music performance research at the millennium. Psychology of Music, 31(3), 221-272.


Gafni, N., Moshinsky, A., & Kapitulnik, J. (2003). A standardized open-ended questionnaire as a substitute for a personal interview in dental admissions. Journal of Dental Education, 67(3), 348-353.

Granqvist, S. (2003). Computer methods for voice analysis. Stockholm: Royal Institute of Technology, Speech, Music and Hearing.

Himonides, E. (1997). What makes people sing together? Socio-psychological and cross-cultural perspectives on the singing phenomenon. Unpublished MA dissertation, University of Surrey, Guildford, UK.

Himonides, E. (2003). Towards a new praxis of context-sensitive human-computer interaction: a study on the evaluation of current HCI principles in interactive multimedia learning. (Unpublished report [Star First-Class] for accreditation with the British Computing Society). London: Middlesex University.

Himonides, E. (2005). The use of interactive-multimedia technology for the recording of emotional response to vocal performance quality: implications for the teaching of singing. Proceedings of the Fourth International Research in Music Education Conference. Exeter, UK: University of Exeter.

Howard, D. M., & Angus, J. (2001). Acoustics and psychoacoustics (music technology) (2nd edn). Oxford: Focal Press.

Howard, D. M., Welch, G. F., Brereton, J., Himonides, E., De Costa, M., Williams, J., & Howard, A. W. (2004). WinSingad: a real-time display for the singing studio. Logopedics Phoniatrics Vocology, 29(3), 135-144.

Johnson, C. M. (1992). Use of the continuous response digital interface in the evaluation of live musical performance. Journal of Educational Technology Systems, 20, 261-271.

Jorgensen, E. (2004). Editorial. Philosophy of Music Education Review, 12(1), 1-3.

Juslin, P. N. (2001). Communicating emotion in music performance: a review and a theoretical framework. In P. N. Juslin & J. N. Sloboda (Eds), Music and emotion: theory and research (pp. 309-337). Oxford: Oxford University Press.

Juslin, P. N., & Sloboda, J. (Eds) (2001). Music and emotion: theory and research (series in affective science). Oxford: Oxford University Press.

Lukasik, E. (2003). AMATI: A multimedia database of violin sounds. Paper presented at the Stockholm Music Acoustics Conference 2003, Stockholm, Sweden.

Madsen, C. K., Geringer, J. M., & Heller, J. (1991). Comparison of good versus bad intonation of accompanied and unaccompanied vocal and string performances using a continuous response digital interface (CRDI). Canadian Music Educator: Special Research Edition, 33, 123-130.

Malloch, S. N. (1999). Mothers and infants and communicative musicality. Musicae Scientiae, Special Issue, 29-57.

Miller, R. (2000). Training soprano voices. New York: Oxford University Press.

Nair, G. (1999). Voice-tradition and technology: a state-of-the-art studio. San Diego, California: Singular Publishing Group, Inc.

Nordenberg, M., & Sundberg, J. (2003). Effect on LTAS of vocal loudness variation. TMH-QPSR, KTH, 45, 93-100.

Ockelford, A. (2005). Repetition in music: theoretical and metatheoretical perspectives. London: Ashgate.

Peretz, I., Gagnon, L., Hébert, S., & Macoir, J. (2004). Singing in the brain: insights from cognitive neuropsychology. Music Perception, 21(3), 373-390.


Potter, J. (2000). The Cambridge companion to singing. Cambridge: Cambridge UP.

Roberts, D. (Ed.) (2004). The classical good CD & DVD guide 2005. Teddington, Middlesex: Gramophone Publications Limited.

Robson, C. (2002). Real world research: a resource for social scientists and practitioner-researchers (2nd edn). Malden, Mass: Blackwell Publishers.

Scherer, K. R., & Zentner, M. R. (2001). Emotional effects of music: production rules. In P. N. Juslin & J. N. Sloboda (Eds), Music and emotion: theory and research (pp. 361-392). Oxford: Oxford University Press.

Sederholm, E., McAllister, A., Sundberg, J., & Dalkvist, J. (1993). Perceptual analysis of child hoarseness using continuous scales. Scandinavian Journal of Logopedics and Phoniatrics, 18, 73-82.

Sloboda, J. N., & O’Neil, S. A. (2001). Emotions in everyday listening to music. In P. N. Juslin & J. N. Sloboda (Eds), Music and emotion: theory and research (pp. 361-392). Oxford: Oxford University Press.

Sundberg, J. (1987). The science of the singing voice. Dekalb, Illinois: Northern Illinois University Press.

Sundberg, J. (2000). Emotive transforms. Phonetica, 57, 95-112.

Thurman, L., Welch, G. F., Theimer, A., & Klitzke, C. (2004). Addressing vocal register discrepancies: an alternative, science-based theory of register phenomena. Paper presented at the Second International Physiology and Acoustics of Singing Conference (PAS04), Denver, Colorado.

Titze, I. R. (2000). Principles of voice production (2nd edn). Iowa City: National Center for Voice and Speech.

Titze, I. R., & Story, B. H. (2002). Voice quality: What is most characteristic about “you”. Echoes, 12(4), 3-4.

Welch, G. F. (2005). Singing as communication. In D. Miell, R. A. R. MacDonald & D. J. Hargreaves (Eds), Musical communication (pp. 239-259). New York: Oxford University Press.

Welch, G., Himonides, E., Howard, D. M., & Brereton, J. (2004). VOXed: technology as a meaningful teaching aid in the singing studio. Paper presented at the CIM04 Conference on Interdisciplinary Musicology, Graz, Austria.

Welch, G. F., & Sundberg, J. (2002). Solo voice. In R. Parncutt & G. McPherson (Eds), The science and psychology of music performance: creative strategies for teaching and learning (pp. 253-268). New York: Oxford University Press.

i The online survey is still active and can be found at: <http://www.sonustech.com/question/question.html>.

ii Bit depth (or word length) and sampling rate respectively, implying very high quality/resolution (CD audio standard is 16 bit/44.1 kHz).

iii A long term average spectrum (LTAS) represents the power spectral density as a function of frequency (see: <http://www.praat.org>) and is regarded as an efficient method for voice analysis, revealing both voice source and formant characteristics (see Nordenberg & Sundberg, 2003).

About the Authors

Evangelos Himonides is a lecturer in Music Technology Education and Research Officer in the School of Arts and Humanities, Institute of Education, University of London. His doctoral research is focused on the voice and psychoacoustics. His multidisciplinary research background is reflected in his two undergraduate degrees (music; and information technology (multimedia), the latter with first class honours) and his MA in Choral Education. Current research projects and related publications embrace the teaching of singing, music technology, voice science, human-computer interaction and virtual learning environments with multimedia. When time is available, Evangelos works as a software developer, freelance recording engineer, session electric guitarist and vocalist.

Professor Graham Welch holds the Institute of Education, University of London Established Chair of Music Education and is Head of the Institute’s School of Arts and Humanities. He is Chair of the Society for Education, Music and Psychology Research (SEMPRE) and recent past co-chair of the Research Commission of the International Society for Music Education (ISME). He also holds Visiting Professorships at the universities of Sydney, Limerick and Roehampton, the Open University and the Sibelius Academy in Finland. Research and publications number over 175 and embrace a variety of aspects of musical development and music education, teacher education, psychology of music, singing and voice science, special education and disability.
