

Papers CHI 2000 • 1-6 April 2000

Programming and Enjoying Music with Your Eyes Closed

Steffen Pauws, Don Bouwhuis
IPO, Center for User-System Interaction
Den Dolech 2, 5600 MB Eindhoven, the Netherlands
+31 40 2475250
{S.C.Pauws, D.G.Bouwhuis}@tue.nl

Berry Eggen
Philips Research Laboratories Eindhoven
Prof. Holstlaan 4, 5656 AA Eindhoven, the Netherlands
+31 40 2745160
berry.eggen@philips.com

ABSTRACT Design and user evaluation of a multimodal interaction style for music programming is described. User requirements were instant usability and optional use of a visual display. The interaction style consists of a visual roller metaphor. User control of the rollers proceeds by manipulating a force feedback trackball. Tactual and auditory cues strengthen the roller impression and support use without a visual display. The evaluation investigated task performance and procedural learning when performing music programming tasks with and without a visual display. No procedural instructions were provided. Tasks could be completed successfully with and without a visual display, though programming without a display needed more time to complete. Prior experience with a visual display did not improve performance without a visual display. When working without a display, procedures have to be acquired and remembered explicitly, as more procedures were remembered after working without a visual display. It is demonstrated that multimodality provides new ways to interact with music.

Keywords multimodal interaction, nonvisual interaction, interface design, user evaluation, interactive music system

INTRODUCTION Considering the wide assortment of music available, instant access to a large music collection and, particularly, the task of music programming become increasingly important. Music listeners are already tempted to organize their music electronically and prefer extended play facilities without the need to handle physical storage media. For instance, music encoded in computer files, jukeboxes and portable players are becoming increasingly popular. However, it takes time to explore a music collection and select the favourite recordings with only sequential access to the music. In addition, current jukebox players intended for home use are inconvenient to operate [1,2]. They generally contain numerous control elements. They often have inadequate visual displays that lack relevant information required for music programming, or that are poorly legible in dimly lit situations or from a large viewing distance. Multimodality may enhance interaction with music.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. CHI '2000 The Hague, Amsterdam. Copyright ACM 2000 1-58113-216-6/00/04...$5.00

This paper describes the design, implementation and a user evaluation of a multimodal interaction style for music programming. Music programming is defined here as the serial selection of multiple music recordings from a music collection. The evaluation was focused on assessing whether or not the user requirements were met by the interaction style.

REQUIREMENTS The most important user requirements for the multimodal interaction style are instant usability and optional use of a visual display.

Instant usability Usually, first-time users of a home device attempt to operate the device immediately without the aid of procedural instructions. A user manual is sometimes never consulted or is simply lost. Since users of home devices have no opportunities for training or are not willing to take these opportunities, learnability is considered a fundamental usability criterion for home devices [3]. Therefore, an interaction style for a home device should be as transparent, intuitive or self-explanatory as possible, meaning that users are able to perceive at a glance the most effective and efficient ways to use the interaction style without procedural instructions. In other words, rather than learnability, a home device should allow instant usability.

Some methods to design for instant usability are consistency of operation, minimality of features, the use of a small number of control elements and the use of a conceptual metaphor for interaction. Consistency of operation (or similarity of protocols [3]) means that the same pattern of actions can be used in different situations, allowing users to learn such a pattern only once. Minimality of features means that infrequently used or more complex functions are implemented in such a way that they do not interfere with initial learning. A small number of control elements allows users to learn the meaning of only a small

376 · CHI Letters, volume 2, issue 1


Figure 1: Visual roller metaphor of the interaction style. The music programme contains two recordings. The currently selected music style is 'postbop' and a recording of Miles Davis is playing. Three visible recommendations are linked to this recording.

set of controls or actions, for instance, the meaning of a few buttons. The use of a conceptual metaphor may form a starting point to understand an interaction style. It may aid learning domain knowledge [4], may explain the expression and meaning of actions [5] or may facilitate the learning of procedures [6,7].

Optional use of a visual display Current CD jukebox players have an inadequate presentation of visual information [1,2]. Some visual displays lack the presentation of relevant information required to operate the jukebox effectively. They are often too small or do not have enough contrast to be legible in dim light or from a large viewing distance. Portable players even lack a convenient visual display. The need for visual inspection of information can also be less desirable, for instance, when relaxing while going through a music collection. It is for these reasons that the use of a visual display should be optional in a music programming task, without sacrificing instant usability.

DESIGN The interaction style comprises a manual input modality and various output modalities: the visual, tactual and auditory modalities. User control proceeds entirely by manipulating a force feedback trackball. The auditory modality consists of three different audio streams: synthetic speech, non-speech audio and music audio.

Visual roller metaphor The concept of music programming for the purpose of selecting favourite music from a collection is generally self-evident, irrespective of whether it is a request or carried out at will. However, first-time users of a home device often find it difficult to transfer declarative domain knowledge and a task purpose into procedural knowledge on how to achieve the desired result, though they generally like to know how to use a device first. Without procedural instructions, the initial problem that first-time users have to overcome is to discover what exactly constitutes an action, what consequences can be expected from an action, and whether or not these consequences are effective with respect to their purpose [8]. In order to partly solve this initial problem for users, a metaphor is used as a conceptual interaction model.

During an iterative design process, several metaphors starting from a spherical object were considered as a conceptual interaction model. The use of a spherical object was prompted by the use of a trackball as input device (see Figure 3). While adding actions that are essential to navigation and selection, the model was assessed on appropriateness. The main criteria for assessment were consistency of operation, a minimum number of required actions and compliance with the envisaged metaphor. Secondary criteria were implementation feasibility and predicted computational resources required.

As shown in Figure 1, the final interaction model resembles a fruit machine consisting of four rollers on which the title and artist of music recordings are projected. The left-hand roller represents a music programme that the user creates. A counter is positioned over this roller and displays the number of recordings added to the programme. The next roller represents the music styles in the collection. Music styles are arranged in chronological order, that is, in the order in which the music styles (in this case, jazz styles) emerged in time. Next, the music collection roller displays all available recordings in the collection or just the subset of recordings that belongs to a particular music style. Figure 1 shows that this roller has input focus and is therefore highlighted. Recordings on this roller are first ordered alphabetically by artist, then grouped by album and placed in the order in which they came out on the album. Finally, albums are ordered chronologically by year of publication. The numbers displayed on this roller indicate the number of recordings available in a music style. The right-hand roller contains a list of music recommendations that corresponds to the music recording at the front of the music collection roller. Music recommendations are intended to ease and speed up the music selection process.
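The ordering on the music collection roller can be sketched as a compound sort key. The following is a minimal illustration in Python, assuming hypothetical record field names; the paper does not describe its actual data model:

```python
# Hypothetical recording entries; field names are illustrative only.
recordings = [
    {"artist": "Miles Davis", "album": "Kind of Blue", "year": 1959,
     "track": 2, "title": "Blue in Green"},
    {"artist": "Miles Davis", "album": "Kind of Blue", "year": 1959,
     "track": 1, "title": "So What"},
    {"artist": "John Coltrane", "album": "Giant Steps", "year": 1960,
     "track": 1, "title": "Giant Steps"},
]

# Order: alphabetically by artist, then albums chronologically by year,
# then tracks in the order in which they came out on the album.
ordered = sorted(recordings,
                 key=lambda r: (r["artist"], r["year"], r["album"], r["track"]))
```

Because Python compares tuples element by element, the single key expresses the whole nested ordering described above.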


Figure 2: An indefinitely long list of music recordings tapered along a ball.

In order to make any rotation of the trackball coincide with a proportional turn of the roller on the screen, the list of items (music recordings and styles) is virtually tapered round a ball, analogous to wrapping a sheet of paper around a cylindrical object (see Figure 2). In that configuration, a forward ball rotation turns a roller clockwise or upward.
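Under this wrapped-list configuration, the number of items moved past the front of the roller is proportional to the arc length traced on the ball's surface. A hedged sketch, where the ball radius follows from the 57 mm diameter of the trackball and the per-item height is an assumed value:

```python
def items_advanced(rotation_rad, ball_radius_mm=28.5, item_height_mm=15.0):
    """Number of list items moved past the front of the roller for a
    given ball rotation. The list is wrapped around the ball like paper
    around a cylinder, so the list travels the same distance as the arc
    traced on the ball surface. item_height_mm is an illustrative value,
    not a parameter from the paper.
    """
    arc_mm = rotation_rad * ball_radius_mm  # arc length = angle * radius
    return arc_mm / item_height_mm
```

With these example values, a forward quarter turn (about 1.57 rad) moves roughly three items past the front of the roller.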

Force feedback trackball The sole control element of the interaction style is the IPO force feedback trackball device [9]. This trackball is a ground-based device (see Figure 3). A user can freely rotate a hard-plastic ball (diameter: 57 mm), which is placed in a housing. Force feedback is mediated by two motor-driven wheels which touch the ball at its x and y axes. The force can be made dependent on the current context of interaction. Besides ball rotations, a contact switch, which is mounted underneath the ball bearing system, notifies ball presses.

Figure 3: The IPO force feedback trackball device.

In current graphical user interfaces, the hand and eyes are decoupled. For instance, working with a trackball requires continuous visual attention to inspect the state of interaction, rather than attention to the wrist, hand and fingers that control the device. A trackball is also notorious for its inefficient performance in target acquisition tasks [10], which makes moving a mediating cursor back and forth in a complex navigation task an effortful activity.

In order to diminish these problems, actions on the trackball are directly mapped to roller behaviour. There is, for instance, no mediating cursor. Rolling the ball back or forth can be directly felt and directly corresponds to a proportional rotation of a roller in the same direction. Thus, user actions have direct meaning, which may increase usability. The use of touch feedback is intended to guide users in performing an action, even without using a visual display. It has been demonstrated that the addition of touch feedback to the trackball used makes users perform actions more efficiently [9]. As a result, a user may experience the trackball device as a means to feel and act directly on rollers by instantaneous hand-finger movements, rather than as a means to roll and point to rollers.

In the interaction style, contextual force feedback is specified by a spatial arrangement of force fields. These fields generate a force dependent on the current ball position. The force is mediated by a motor-driven ball rotation. For instance, the force field of a 'hole' evokes a directional pulling force towards the hole's centre when the ball enters the region of the hole. This feels like being captured in a region when rolling the ball, and some additional hand force is needed to leave this region.

Figure 4: Spatial arrangement of force fields.

The layout of force fields in the interaction style is shown in Figure 4. Each ball movement starts with the ball at the centre of the layout. A circular force field pushes the ball slightly towards the centre. Grooves, placed in a cross, guide ball movement along a straight line (forward, backward and lateral movements). The ends of the grooves are marked by raised edges; these edges slightly hold back a continuous roll movement. When moving across an edge, the ball will be captured by a small hole which typifies the conclusion of a roll movement. This configuration of force fields mimics the felt sensation of a 'mechanical click'. If the ball is moved across an edge with some more hand force, the ball will miss the first hole and will end in the next hole. A small forward (backward) hand movement brings the next (previous) item to the front. Moving the ball with some more hand force skips the next or previous two items. Lateral ball movements correspond to hopping from one roller to another. Again, a small movement makes a hop to the next roller, and a movement with some more hand force skips a roller.
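The pulling force of a 'hole' field can be sketched as a position-dependent vector that is non-zero only inside the hole's region, assumed circular here. The radius and strength constants are illustrative assumptions, not values from the paper:

```python
import math

def hole_force(ball_xy, hole_xy, radius=1.0, strength=0.8):
    """Directional pulling force towards a hole's centre, active only
    when the ball position lies inside the hole's (assumed circular)
    region. Units and constants are illustrative."""
    dx = hole_xy[0] - ball_xy[0]
    dy = hole_xy[1] - ball_xy[1]
    dist = math.hypot(dx, dy)
    if dist == 0 or dist > radius:
        return (0.0, 0.0)        # at the centre or outside the field
    scale = strength / dist       # unit vector towards centre, scaled
    return (dx * scale, dy * scale)
```

Rolling the ball out of the region then requires the hand to overcome this inward pull, which produces the "captured" sensation described above.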

It may be clear that a user can, for instance, first choose a particular music style on the music styles roller and continue searching for music recordings within that style by going to the music collection and recommendation rollers. A user can either quickly go to familiar recordings by skipping irrelevant ones or proceed sequentially through all recordings on the roller. Double-pressing the trackball adds or removes a music recording to or from the music programme.
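The mapping from completed ball movements to navigation actions can be summarised as a small dispatch function. This is a sketch only: the force threshold and the exact skip distances are our reading of the description, not values from the paper:

```python
def interpret_gesture(axis, displacement, small_max=1.0):
    """Map a completed ball movement to a navigation action.

    axis is 'vertical' (along a roller) or 'lateral' (across rollers);
    displacement is signed, and |displacement| <= small_max counts as a
    small movement. The threshold and skip distances are assumptions
    made for illustration.
    """
    direction = 1 if displacement > 0 else -1
    forceful = abs(displacement) > small_max
    if axis == 'vertical':
        # small roll: next/previous item; forceful roll: skip the next
        # two items (landing three ahead, on one reading of the text)
        return ('move_item', direction * (3 if forceful else 1))
    # small lateral move: hop to the adjacent roller; forceful move:
    # skip a roller (landing two rollers away)
    return ('hop_roller', direction * (2 if forceful else 1))
```

A double press, adding or removing the front recording, would be handled by a separate ball-press event rather than by this displacement mapping.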


Auditory feedback

Speech synthesis Speech output is used to convey information about the current state of interaction, primarily intended to allow the interaction style to be used without a visual display. For instance, after hopping from one roller to another, spoken feedback indicates which roller was moved to. When entering the music programme roller, it indicates how many music recordings have been added to the programme. When the user rolls through the music styles roller, it tells the user which music style is left and which is entered (e.g., 'from bebop to hardbop'). By pressing the ball once, the user can ask what the currently selected music style is or to what music style a selected recording belongs.

Non-speech sound Every user action produces a software-generated sound known as an auditory icon. Auditory icons are probably best described as caricatures of naturally occurring sounds. A benefit of using auditory icons is that they may have an intuitive appeal to the listener [11]. A special class of auditory icons is used, namely, impact sounds of material being struck. Each roller and user action has its own set of qualitatively different sounds intended to distinguish the rollers and the actions by sound alone. The music programme roller features sounds that are best described as those produced when a glass-like object is struck by another solid object. The music styles roller produces sounds like hitting a metal object. The music collection roller produces wood-like sounds, and the recommendations roller sounds like rubber. Valid parameters to generate these material-like sounds were determined by user experiments [12], though it was not intended that users should be able to tell what the roller would be made of by the sound that it makes. When a roller is rotated step-wise (by a small hand force) a single click is produced. Rolling more vigorously produces a spectrally broader click followed by a rattling sound to represent the scroll of the roller. Adding or removing a music recording to or from a music programme produces a 'creaking' or a 'bouncing' sound, respectively.
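The assignment of material families to rollers and sound types to actions amounts to a small lookup, sketched here with placeholder sound names; the actual synthesis parameters were determined by the user experiments cited above:

```python
# Material family per roller, as described in the text.
ROLLER_MATERIAL = {
    'programme': 'glass',
    'styles': 'metal',
    'collection': 'wood',
    'recommendations': 'rubber',
}

def auditory_icon(roller, action, vigorous=False):
    """Select a placeholder sound name for a user action on a roller.
    Sound names are illustrative labels, not actual asset names."""
    material = ROLLER_MATERIAL[roller]
    if action == 'rotate':
        # step-wise turn: single click; vigorous roll: broader click
        # followed by a rattle representing the scroll
        return f"{material}_rattle" if vigorous else f"{material}_click"
    if action == 'add':
        return "creak"    # adding a recording to the programme
    if action == 'remove':
        return "bounce"   # removing a recording from the programme
    raise ValueError(f"unknown action: {action}")
```

Because every roller draws from a distinct material family, the roller being manipulated can in principle be identified by ear alone, which is what the design intends.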

IMPLEMENTATION As shown in Figure 5, a component-based software architecture based on Microsoft ActiveX technology was used to implement the interaction style. The graphical notation used is adopted from the Unified Modeling Language [13]. Music recommendations are generated by an in-house developed music recommender system [14]. The manual input and tactual output of the force feedback trackball is controlled by three components: LightHole, TacServer, and TacServer Extension. Software control of the trackball is distributed over two serially connected PC platforms. Data of the music collection is contained in a database component. Music is played back by an MPEG Audio Player. Non-speech audio is generated by a component Impact Sound; this component contains an in-house implementation of the Constrained Additive Synthesis technique for generating impact sounds [11]. Synthetic speech is generated by a Text-to-speech component. The interaction style requires simultaneous streaming and mixing of audio from different sources and audio output formats with low latency and maximum control. Therefore, an Audio Device component for all audio output streaming services encapsulates and extends on Microsoft's DirectSound technology. Visualisation of the rollers is implemented in the Roller component. All components start their own threads, which can be interrupted or stopped when other services are asked from them or the state of the interaction changes. However, threads between components are not synchronized.


Figure 5: Component-based software architecture. The 'lollipops' represent software interfaces.

USER EVALUATION A user evaluation was carried out to assess instant usability of the interaction style with and without a visual display. Particularly, task performance and the learning of procedures with and without a visual display were compared. After three minutes of free exploration of the interaction style, participants performed pre-defined music programming tasks. They were not given procedural instructions on how to use the interaction style. Task performance was measured in real-time. Procedural learning was measured by a post-task questionnaire.

Hypotheses As tactual and auditory feedback are transient in the interaction style and visual feedback is continuous and persistent, the critical factor between visual and nonvisual interaction, that is, only tactual and auditory interaction, is human memory. Hence, it is hypothesized that:

(i) Visual interaction is more efficient, that is, requires less time and fewer actions, than nonvisual interaction, while leaving the level of auditory and tactual feedback in both conditions constant.

Nonvisual interaction requires the explicit acquisition and remembering of procedures. Consequently, it is likely that users who interact nonvisually develop a substantial body of procedural knowledge. It is therefore hypothesized that:

(ii) Users who have worked without a visual display have a higher score on procedural knowledge than users who have worked with a visual display, while leaving the level of auditory and tactual feedback in both conditions constant.

Something that is learnt while doing tasks with a visual display (e.g., imagery of the visual display) is likely to be carried over ('transferred') to the same type of tasks without a visual display, which may facilitate a more efficient


performance of the latter tasks. It is therefore hypothesized that:

(iii) Users who worked with a visual display in one condition and are subsequently transferred to another condition without a visual display perform more efficiently, that is, need to spend less time and fewer actions, than users who start working without a visual display.

Measures Task performance Two task performance measures were defined: number of actions and time on task. They were measured from the first action to the last action of the task.

Questionnaire score Procedural knowledge was measured using two versions of a 20-item randomised questionnaire; one was handed out half-way through the experiment, the other at the end of the experiment. The order in which both versions were handed out was counter-balanced. The two versions contained a relatively large overlap of questions (16) as well as four distinct questions. Questions asked what sequence of actions, that is, procedure, most efficiently and successfully transformed a given initial state of interaction into a given final state of interaction. Half of the questions concerned a single action (single-step interaction). The other half concerned a sequence of two or three actions (multiple-step interaction). Answers were judged correct if the responded action sequence successfully transformed the initial state of interaction into the final one and the action sequence was of minimal length, that is, was most efficient. Answers were otherwise judged incorrect. All correct answers were added up to arrive at a questionnaire score (maximum: 20).
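The scoring rule, an answer is correct only if its action sequence reaches the final state and is of minimal length, can be sketched against any state-transition function. The toy state machine below is purely illustrative and unrelated to the study's software:

```python
def answer_correct(actions, initial, final, transition, shortest_len):
    """Judge a questionnaire answer: the action sequence must transform
    the initial state into the final state AND be of minimal length."""
    state = initial
    for action in actions:
        state = transition(state, action)
    return state == final and len(actions) == shortest_len

# Toy state machine: states are integers, 'next'/'prev' move by one.
def step(state, action):
    return state + (1 if action == 'next' else -1)

answers = [
    ['next', 'next'],                  # reaches the goal, minimal
    ['next', 'prev', 'next', 'next'],  # reaches the goal, not minimal
]
score = sum(answer_correct(a, 0, 2, step, 2) for a in answers)
```

Here only the first answer counts as correct, since the second reaches the goal state but is not the most efficient sequence, mirroring the paper's judging criterion.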

Method

Instruction Participants read a short text about the music programming domain to provide them with the necessary amount of declarative knowledge. No reference was made to the interaction style; no instructions about procedures or the roller metaphor were given. Participants were asked to rephrase the given text in their own words. Any misconception of the music programming domain was corrected by the test supervisor.

At the outset of each music programming task, participants received a written task description. They were asked to rephrase the task instruction to avoid any misconceptions of the task.

Task The task was to select 10 distinct music recordings, equally drawn from two pre-defined music styles, as quickly as possible while paying no attention to personal preferences or to the order of the selection process. Four music programming tasks were defined and their order of presentation was counterbalanced. The tasks were designed to be equally difficult; a successful and most efficient task completion demanded 23 actions.

Design Four conditions were applied: two control and two experimental conditions. In one control condition, denoted by VAT (Visual, Auditory, Tactual feedback), participants completed four tasks by using the complete multimodal interaction style. The four consecutive tasks were denoted by task repetition. In the other control condition, denoted by AT (Auditory, Tactual feedback), participants worked with the interaction style without a visual display for all four tasks; the monitor was physically removed. In the two experimental conditions, participants worked with both interaction styles, one after the other. In the experimental condition denoted by VAT → AT, there was no visual display for the last two tasks. In the other experimental condition, denoted by AT → VAT, this was reversed.

Test material and equipment A music collection comprising 480 one-minute excerpts of jazz music recordings (MPEG-1 Part 2 Layer II 128 Kbps stereo) from 12 different music styles served as test material. The interaction style was implemented on a PC, running under Windows 95. MPEG data was stored on the hard disk. Real-time MPEG decoding was done by software. Music was amplified by a mid-range audio amplifier (Philips Integrated Digital Amplifier DFA888) and played through a pair of high-quality loudspeakers (Philips 9818 multi-linear 4-way). A second PC was used for controlling the force feedback trackball.

Participants were seated in a comfortable chair in a non- reverberant studio. The visual display was a 17-inch colour monitor. They could adjust the audio volume to a preferred level. The trackball was placed on a small table next to the chair, in such a way that the participant could control the trackball in a comfortable way.

Procedure Twenty-four participants performed two experimental sessions on two separate days. They were randomly assigned to one of the four conditions: four participants were assigned to each of the control conditions, and eight participants were assigned to each of the experimental conditions. At the first session only, a 15-minute familiarisation phase let participants become accustomed to the force feedback and the fine motor skills required to control the trackball. Subsequently, participants were informed about the music programming domain. In both sessions, they were allowed to freely explore the interaction style (with or without a visual display) for three minutes. Then the tasks were executed.

The participants completed two music programming tasks during each session. Then, participants completed a version of the questionnaire at a desk from which it was impossible to view the test equipment. The questionnaire was completed in a dialogue with the test supervisor.

Participants The average age of the 24 participants (18 males, 6 females) was 28 (min.: 21, max.: 45). Half of them were recruited by advertisements and got a fixed fee for expenses (16 Dutch guilders = 7.24 euro). The other half were colleague researchers who participated voluntarily. All participants


had completed higher vocational education. They were not selected based on their musical preferences or musical education.

Results

Task performance data of one participant in the AT → VAT condition were excluded from the analyses; values for time on task and number of actions were three times as high as the mean values of other participants. The participant admitted that he had selected music by taking his personal preferences into account, which was not in accordance with the task instruction.

Number of actions

The results on number of actions are shown in Figure 6. In order to compare the difference between working with and without a visual display in the first two tasks, a new variable called visual display condition was created. In this variable, the performance in the VAT and VAT → AT conditions (task repetitions 1 and 2) was separated from the performance in the AT and AT → VAT conditions.

[Figure 6: plots of number of actions (0-100) against task repetition (1-4); panels: (a) control, (b) experimental.]

Figure 6: Mean number of actions and standard error. The left-hand panel (a) shows means for the four tasks in the control conditions. The right-hand panel (b) shows means for the four tasks in the experimental conditions. The minimal number of actions to complete the task is 23.

An ANOVA with repeated measures was used with task repetition (2) as a within-subject independent variable and visual display condition (2) as a between-subject independent variable, and number of actions as the dependent variable. No significant effects were found. The mean number of actions performed in the first two tasks was 52.7.

In order to compare the differences over all four tasks, a MANOVA with repeated measures was conducted in which task repetition (4) was a within-subject independent variable, condition (4) was a between-subject independent variable and number of actions was the dependent variable. Only a significant main effect for task repetition was found (F(3,57) = 3.79, p < 0.05). A linear trend in the data was significant (F(1,19) = 7.92, p < 0.05). Participants performed fewer actions for each successive task (mean number of actions across successive tasks: 55.7, 49.8, 44.6 and 40.0).
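The repeated-measures analyses reported here involve mixed designs; as a simpler, purely illustrative sketch, a one-way repeated-measures ANOVA (within-subject factor only) can be computed in plain Python as follows. The function and the data layout are our own illustration, not the study's analysis code:

```python
def rm_anova(data):
    """One-way repeated-measures ANOVA.

    data[s][c] holds the score of subject s under repeated condition c.
    Returns (F, df_condition, df_error).
    """
    n, k = len(data), len(data[0])              # subjects, conditions
    grand = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[c] for row in data) / n for c in range(k)]
    subj_means = [sum(row) / k for row in data]
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_cond - ss_subj     # residual after removing subject effect
    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    f = (ss_cond / df_cond) / (ss_error / df_error)
    return f, df_cond, df_error
```

With task repetition as the within-subject factor, `data` would contain one row per participant and one column per task; for 20 participants and four tasks this design yields degrees of freedom (3, 57), consistent with the F(3,57) values reported in the text.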

In order to assess the 'transfer' effect for number of actions, transfer was defined as the comparison between the number of actions performed in task repetitions 3 and 4 in the VAT → AT condition and the number of actions performed in task repetitions 1 and 2 in both the AT and AT → VAT conditions.

An ANOVA with repeated measures was carried out in which transfer was treated as a between-subject variable and task repetition (2) as a within-subject variable. Number of actions was the dependent variable. A main effect for task repetition was found to be just not significant (F(1,17) = 4.11, p = 0.06). A significant transfer effect was found (F(1,17) = 8.81, p < 0.01). It appeared that almost 17 fewer actions were required for participants who had done two previous tasks with a visual display than for those who started to work without a visual display. This suggests that performance without a visual display improved when two previous tasks with a visual display had been done.

However, performance improvement can also be caused by mere practice. Therefore, the number of actions performed in task repetitions 3 and 4 in the VAT → AT condition was compared with the number of actions performed in task repetitions 3 and 4 in the AT condition. In this ANOVA, no significant effects were found; the improvement resulted from practice.

Time on task

The results on time on task are shown in Figure 7. Similar to the number of actions analysis, a new variable visual display condition was created.

[Figure 7: plots of time on task in minutes (0-5) against task repetition (1-4); panels: (a) control, (b) experimental.]

Figure 7: Mean time on task (minutes) and standard error. The left-hand panel (a) shows means for the four tasks of the two control conditions. The right-hand panel (b) shows means for the four tasks of the two experimental conditions.

An ANOVA with repeated measures was used with task repetition (2) as a within-subject independent variable and visual display condition (2) as a between-subject independent variable, and time on task as the dependent variable. A main effect for task repetition was found to be nearly significant (F(1,21) = 4.32, p = 0.05). Less time was required to perform the second task. A significant main effect for visual display condition was found (F(1,21) = 9.78, p < 0.01). Participants who worked without a visual display required almost twice as much time as participants who worked with a visual display (mean: 2 min. 53 sec. (without display), 1 min. 36 sec. (with display)).

In order to compare differences over all four tasks, a MANOVA with repeated measures was used with task repetition (4) as a within-subject independent variable and condition (4) as a between-subject independent variable. Time on task was the dependent variable. A significant main effect for task repetition was found (F(3,57) = 3.64, p < 0.05). A linear trend in the data was significant (F(1,19) = 5.97, p < 0.05). Participants needed less time to compile each successive music programme (mean time on task across successive tasks: 2 min. 25 sec., 2 min. 4 sec., 1 min. 53 sec. and 1 min. 41 sec.). A significant task repetition by condition interaction effect was found (F(9,57) = 3.26, p < 0.005). When means were compared, it was found that task repetition 1 in the AT → VAT condition required more time than in the other conditions (F(3,19) = 4.17, p < 0.05).

The 'transfer' effect was investigated in the same way as for the number of actions analysis. An ANOVA with repeated measures was carried out in which transfer was treated as a between-subject variable and task repetition (2) as a within-subject variable. Time on task was the dependent variable. A significant main effect for task repetition was found (F(1,17) = 10.12, p < 0.01). Participants needed less time to complete task repetitions 2 and 4 than task repetitions 1 and 3, respectively. A main effect for transfer was found to be just not significant (F(1,17) = 4.20, p = 0.06); however, participants who had worked with the visual display before needed 48 fewer seconds in a nonvisual task than participants who started to work without a visual display. Because no significant 'transfer' effect was found, any performance improvement resulted from practice.

Questionnaire score

Data on questionnaire score in the four conditions were recoded to create two different visual display conditions. The results are shown in Figure 8. An ANOVA with repeated measures was used with kind of question (single-step and multiple-step) as a within-subject independent variable, and visual display condition (2) and experimental session (2) as between-subject independent variables. Questionnaire score was the dependent variable.

[Figure 8: bar plots of questionnaire score (0-10) for single-step and multiple-step questions, without and with display; panels: (a) 1st questionnaire, (b) 2nd questionnaire.]

Figure 8: Mean questionnaire score and standard error. The left-hand panel (a) shows means, divided into scores for single-step and multiple-step interactions (maximum: 10 each), across visual display conditions after two tasks. The right-hand panel (b) shows means after four tasks.

A significant main effect for experimental session was found (F(1,44) = 5.96, p < 0.05). Participants were better at completing the questionnaire the second time (mean questionnaire score across successive sessions: 13.4, 15.0). A significant main effect for kind of question was found (F(1,44) = 17.76, p < 0.001). Participants were better at answering the questions concerning the multiple-step interactions than the questions concerning the single-step interactions (mean questionnaire score for kind of question: 6.7 (single-step), 7.6 (multiple-step)). An interaction effect for kind of question by visual display condition was found to be significant (F(1,44) = 23.07, p < 0.001). Participants who had worked without a visual display were better at answering questions concerning multiple-step interactions (mean questionnaire score concerning multiple-step interactions for visual display conditions: 8.2 (without display), 7 (with display)).

Discussion

Participants were instructed to program music as efficiently as possible without taking notice of personal music preferences. All twenty-four participants were able to perform a music programming task successfully, right from the start, with or without a visual display. They were given three minutes of free exploration with the interaction style before the tasks started. They had received no procedural instruction on how to work with the interaction style.

According to Hypothesis (i), music programming with a visual display should be more efficient than programming without a visual display. The results showed that programming without a visual display needed significantly more time to complete. However, it did not require the execution of significantly more actions. On the basis of these results, Hypothesis (i) cannot be rejected.

The results also showed that participants were fast learners; they needed increasingly less time for each successive task.

According to Hypothesis (ii), participants who have worked without a visual display should have a higher score on procedural knowledge than participants who have worked with a visual display. The results showed that participants who had worked without a visual display were better at answering questions about multiple-step interactions. On the basis of these results, Hypothesis (ii) cannot be rejected. Procedures had to be acquired and remembered explicitly when no visual display was available.

According to Hypothesis (iii), participants who worked with a visual display in one condition and were subsequently transferred to another condition without a visual display should perform more efficiently than participants who started working without a visual display. The results showed that performance improvement was caused primarily by practice, and not by previous experience with a visual display. In other words, participants who started to work without a display performed less efficiently than participants who had earlier experience with the display only because they had performed fewer tasks. On the basis of these results, Hypothesis (iii) must be rejected.

In the first task, it also appeared that programming without a visual display took more time per action (mean time per action: 3.2 sec. (without display), 1.9 sec. (with display); t = -7.08, p < 0.001) and more lateral trackball movements (mean number of lateral movements: 19.1 (without display), 7.9 (with display); t = -3.11, p < 0.01), whereas other types of actions did not differ across visual display conditions. Lateral movements were actions to hop from one roller to another and hence were suitable for exploring the spatial relations between rollers. It is therefore likely that the extra time and extra lateral ball movements are linked to the acquisition of spatial and procedural knowledge when there is no visual display available. When there is no visual display and the user is unfamiliar with the interaction style, it is suggested that interaction follows a time-consuming dead-reckoning process in which each current state of interaction has to be inspected explicitly, using means other than vision, before further action can be taken. If, on the other hand, a visual display is available, which shows all information for interaction, actions can be planned ahead, which is more efficient.
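The per-action time comparisons above are t tests; a minimal plain-Python sketch of an independent-samples t statistic with pooled variance, using hypothetical data rather than the study's, would be:

```python
from math import sqrt

def pooled_t(a, b):
    """Independent-samples t statistic with pooled variance.

    Returns (t, degrees_of_freedom); |t| is compared against the
    t distribution with len(a) + len(b) - 2 degrees of freedom.
    """
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ssa = sum((x - ma) ** 2 for x in a)         # within-group sums of squares
    ssb = sum((x - mb) ** 2 for x in b)
    sp2 = (ssa + ssb) / (na + nb - 2)           # pooled variance estimate
    t = (ma - mb) / sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2
```

A hypothetical call would compare, for example, the per-action times of the without-display group against those of the with-display group.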

CONCLUSION

Instant usability is particularly important for interactive home devices, as these devices are typically used intermittently and without training. Therefore, any interaction style for music programming should be self-explanatory. Selecting and listening to music should also be possible without a visual display. Think of, for instance, the use of remote controls or portable devices, a dimly lit room during a party, or the desire to enjoy music without the need for visual inspection of information.

The presented interaction style combines a manual input modality with various output modalities: the visual, tactual and auditory modalities. It consists of a self-explanatory visual roller metaphor which is controlled by a force feedback trackball. Tactual and auditory cues are used to strengthen the impression of rollers, but also to enable music programming without a visual display.
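As a rough illustration only (our own hypothetical sketch, not the authors' implementation), the roller metaphor can be modelled as a set of rollers over which lateral trackball movements hop and which vertical movements rotate:

```python
class Rollers:
    """Hypothetical model of the roller metaphor: one column of items per
    roller; lateral moves hop between rollers, vertical moves rotate the
    focused roller. In a real implementation, tactual and auditory cues
    would be attached to each transition."""

    def __init__(self, rollers):
        self.rollers = rollers                  # list of item lists
        self.col = 0                            # focused roller
        self.rows = [0] * len(rollers)          # selected item per roller

    def move_lateral(self, step):
        # hop to a neighbouring roller (wrapping, for simplicity)
        self.col = (self.col + step) % len(self.rollers)

    def rotate(self, step):
        # rotate the focused roller by `step` items
        items = self.rollers[self.col]
        self.rows[self.col] = (self.rows[self.col] + step) % len(items)

    def current(self):
        return self.rollers[self.col][self.rows[self.col]]
```

For example, with one roller of genres and one of tracks, `rotate(1)` advances the focused roller one item and `move_lateral(1)` hops to the next roller; each such transition is where nonvisual feedback would be rendered.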

The user evaluation assessed instant usability of the interaction style with and without a visual display. Users were able to complete a given music programming task successfully with or without a visual display after three minutes of free exploration and without procedural instruction. Over time, they learned to perform the tasks more efficiently; less time and fewer actions were required for each successive task.

It is obvious that purposeful nonvisual interaction grows out of knowledge about the interaction space. For the presented interaction style, this knowledge includes declarative, procedural and spatial knowledge. Therefore, working without a visual display involves explicit acquisition and remembering of procedures and of the spatial relationships of task objects. This acquisition limits efficient first-time use of the interaction style, but does not impede successful completion of the task. Participants who worked with both versions of the interaction style preferred a visual display just for convenience. Only one participant preferred the displayless interface; he commented that 'it contains all attractive features that a novel has and a movie picture lacks; one is free to make an own interpretation and devise an own world.'

In summary, it has been demonstrated that users who work with only a tactual and auditory interface are able to operate a new interaction style successfully, though less efficiently in terms of time. Tactual and auditory feedback makes interaction possible in contexts-of-use in which information on a visual display is poorly legible or even absent. Working without a visual display is not a commonly preferred method of operation, due to the need for explicit acquisition and remembering of procedures. Concluding, the multimodal interaction style for music programming meets its user requirements on instant usability and optional use of a visual display. If hand-held trackball devices with force feedback mature into fully usable, low-cost input devices consuming only little power, this interface to music programming may be a desirable feature on, for instance, jukeboxes, portable players, remote controls and car audio equipment.

ACKNOWLEDGMENTS

We would like to thank Henk Korteweg for helping us with creating Figures 2 and 4.

REFERENCES

1. Consumer Reports (1997). Jukebox time? Consumer Reports, 62, 2, February 1997, 34-38.

2. Kumin, D. (1994). Ch-ch-changers! Stereo Review, 59, 60-66.

3. Eggen, J.H., Haakma, R., and Westerink, J.H.D.M. (1996). Layered Protocols: hands-on experience. International Journal of Human-Computer Studies, 44, 45-72.

4. Veer, G.C., van der (1990). Human computer interaction: learning, individual differences, and design recommendations. Doctoral Thesis, Vrije Universiteit, Amsterdam, the Netherlands.

5. Hutchins, E. (1989). Metaphors for Interface Design. In: Taylor, M.M., Noël, F., and Bouwhuis, D.G. (Eds.). The Structure of Multimodal Dialogue, Amsterdam: North-Holland, 11-28.

6. Kieras, D.E., and Bovair, S. (1984). The role of a mental model in learning to operate a device. Cognitive Science, 8, 255-273.

7. Payne, S.J., (1988). Metaphorical instruction and the early learning of an abbreviated-command computer system, Acta Psychologica, 69, 207-230.

8. Shrager, J., and Klahr, D. (1986). Instructionless learning about a complex device: the paradigm and observations. International Journal of Man-Machine Studies, 25, 153-189.

9. Engel, F.L., Goossens, P.H., and Haakma, R., (1994). Improved efficiency through I- and E-feedback: a trackball with contextual force feedback, International Journal of Human-Computer Studies, 41, 949-974.

10. MacKenzie, I.S., Sellen, A., and Buxton, W. (1991). A comparison of input devices in elemental pointing and dragging tasks, Human Factors in Computing Systems: CHI'91 Conference Proceedings. New York: ACM.

11. Gaver, W. (1993). What in the world do we hear? An ecological approach to auditory event perception. Ecological Psychology, 5, 4, 285-313.

12. Hermes, D.J. (1998). Auditory material perception. IPO Annual Progress Report, 33, 95-102.

13. Rumbaugh, J., Jacobson, I., and Booch, G. (1999). The Unified Modeling Language Reference Manual. Amsterdam: Addison-Wesley.

14. Pauws, S.C. (2000). Music and choice: Adaptive systems and multimodal interaction, Doctoral Thesis, Eindhoven University of Technology, the Netherlands.
