Earwitness Testimony: Never Mind the Variety, Hear the Length

SUSAN COOK* and JOHN WILDING
Psychology Department, Royal Holloway, University of London, UK

SUMMARY

Three aspects of voice recognition were investigated in the study reported here: memory for familiar voices, memory for the words spoken, and the relative effects of length and variation in a voice extract on long- and short-term memory. In Experiment 1, recognition memory for the briefly heard voice of a stranger was superior with longer extracts (p<0.01), but increasing vowel variety did not improve performance. This pattern was repeated for short-term memory (p<0.01) in Experiment 2. Scores for the above task correlated significantly (p<0.05) with scores for recognizing well-known voices. In a further test of well-known voice memory in Experiment 3, a weak and non-significant positive correlation (r=0.29) was found between memory for well-known voices and memory for a once-heard voice. Memory for the words spoken did not correlate significantly with memory for the unknown voice itself. The possibilities of a memory-for-voices general ability, and forensic applications of the findings, are discussed. © 1997 by John Wiley & Sons, Ltd.

INTRODUCTION

‘The major problem with identification evidence is that, unlike verbal testimony, which can be submitted to the scrutiny of cross-examination for internal consistency and general plausibility, and can be compared with other testimony and circumstantial evidence, the witness in an identification simply asserts that the accused is the person . . . there is no story to dissect, no inconsistencies to reconcile . . .’ (Shepherd, Ellis and Davies, 1982).

Witness identification evidence is widely admissible in courts in the common law jurisdictions, including North America and the UK, despite the amount of psychological research in the past twenty years that has cast doubt on the reliability and stability of such evidence (e.g. Wells and Loftus, 1984). Witness testimony can involve the aural as well as the visual modality. In an attempt to define more ways to assess the validity of the witness testimony, a body of psychological work has developed on both sides of the Atlantic (see Clifford, 1983; and Yarmey, 1994, for useful reviews). This work has the potential to be useful in court cases involving aural identification evidence. There is a major difficulty in interpreting the literature, as certain studies have been carried out on the recognition of previously familiar voices (see, for example, the voices of colleagues in Bricker and Pruzansky, 1966, or classmates in Murry and Cort, 1971), and others on once-heard voices (such as those reported in Legge, Grosmann and Pieper, 1984, and Roebuck and Wilding, 1993). The authors could find no prior research comparing the two, and consequently no indication of how, and whether, results from familiar voice research might be relevant in assessing ability to recognize a once-heard voice again. The experiments reported, therefore, address the split between memory for a once-heard voice and well-known voice, the relative roles of length and variety of what is spoken, and the relative roles of memory for the words spoken and memory for the voice itself.

APPLIED COGNITIVE PSYCHOLOGY, VOL. 11, 95–111 (1997)

CCC 0888–4080/97/020095–17$17.50 © 1997 by John Wiley & Sons, Ltd.
Received 3 April 1996. Accepted 1 July 1996.

*Address correspondence to: Susan Cook, Psychology Department, Royal Holloway, University of London, Egham Hill, Egham, Surrey, TW20 0EX, UK.
The work described was carried out with the help of ESRC grant R0042943069 to the first author. The authors thank the National Sound Archives for their help in obtaining voice samples. Many thanks are also due to the schools and colleges who provided participants in these experiments. Most importantly, thanks also go to John Valentine for his statistical advice.

Length or variety

One area of interest is the effects of the duration of the witnessing episode. The commonsense view is that the length of time for which you have heard someone speaking should predict your ability to recognize their voice again (see, for example, Clifford, 1980, 1983, and Read and Craik, 1995). Unfortunately the length of speech sample is often confounded with its phonemic variety. The relative roles of utterance variety and utterance length in predicting memory for a voice have been probed a little by earlier workers (e.g. Bricker and Pruzansky, 1966; Murry and Cort, 1971; Pollack, Pickett and Sumby, 1954). From earlier work, therefore, it seems that the variety of what is said may matter more than the length of the utterance, at least in recognizing familiar voices.

The roles of length and variety may be different with a familiar voice and when a once-heard voice is to be committed to memory. Legge et al. (1984) manipulated both speech sample length and retention interval, working with the voices of strangers, and found that length mattered. They found that subjects who had heard 60 s of speech could correctly identify which of two speakers was the one previously heard on 70% of trials, whereas subjects who had heard only 6 s of speech performed below chance level at this task. In a more naturalistic study, Orchard and Yarmey (1995) found a strongly significant effect of utterance duration, using 30-s and 8-min durations with a retention interval of 2 days. Roebuck and Wilding (1993) found an effect of variety rather than length per se. This seems to lend support to the idea that it is utterance variety rather than utterance length of the target sentence that matters for recognizing a once-heard voice again. Roebuck and Wilding investigated the length versus variety question with unknown voices more systematically than earlier workers had done. The current study aimed to replicate the Roebuck and Wilding finding with a longer delay between initial listening and the voice recognition test. The Devlin Report on Identification Evidence found that in over 70% of cases of witness identification, the lineup in the police station took place within 9 days of the original statement. A 1-week time delay was of much more direct application forensically than the no (or minimal) delay condition used by Roebuck and Wilding.


There was also a concern with the original study, as with the work by McGehee (1937), that the subjects were listening to multiple voices, which may have set up some interference effects between trials (see Thompson, 1985, for detailed criticism of McGehee's pioneering studies). In fact the subjects were listening to so many voices as to make the legal parallel difficult to draw. In the present study, therefore, it was decided that subjects should only be required to recognize two voices from lineups: one male and one female. The loss of statistical power consequent on taking such a small number of measurements from each subject was felt to be a smaller problem than the loss of psychological credibility in a task asking subjects to recognize a large number of items after a substantial delay. It was hypothesized that, as with Roebuck and Wilding (1993), recognition accuracy for once-heard voices would differ according to the length and the variety of the utterance heard.

Sex differences in voice identification have been looked for sporadically by experimental psychologists, and found only in some cases. In her classic (1937) study, McGehee found that men responded more accurately than women, whereas Thompson (1985) found no substantial sex effects in his study of unknown voices. Roebuck and Wilding (1993) found that male voices were recognized better than female voices, and a ‘same sex’ interaction, where subjects performed better when they were of the same sex as the speaker. In the current study, therefore, differences in performance between different genders of speaker and listener were examined. No clear prediction in either direction was made, in view of the earlier contradictory findings.

Known and unknown voices

Reading review articles in the area (Clifford, 1980, 1983; Yarmey, 1994, 1995) reveals that there are many dichotomies along which voice recognition memory studies could be split: for example, experimental and naturalistic; those using lineups and those using paired stimuli; those using multiple targets and those using a single target; and those using target absent and those using target present lineups. The authors considered the division between those using once-heard voices, and those using the voices of colleagues, classmates or a previously learned bank of voices, to raise the question of whether these two tasks were in any sense the same. The split in the voice literature between known and unknown voice memory indicated that an assessment of the similarities and differences was overdue. A clearer understanding of the relation between memory for known and unknown voices might indicate how the two strands of the literature relate. It seemed possible that some initial indication of a general aptitude for voice memory tasks could arise from comparing memory for known and unknown voices. Early work suggested that certain groups of individuals may be better at recognizing voices than others: for example, musicians (McGehee, 1944) or blind people (Bull, Rathborn and Clifford, 1983). Stevens, Williams, Carbonell and Woods (1968) remarked on the large difference in subjects' abilities to identify voices. The question therefore arises as to whether there might be a general ability for voice recognition, and consequently a relation between subjects' performance on different voice memory tasks merits investigation. Experimental results on voice memory suggest that studies where the task is speaker identification of someone you already know (e.g. Pollack et al., 1954) obtain higher correct recognition rates than experiments where the task is to select a once-heard voice from a lineup (e.g. Roebuck and Wilding, 1993). The question arising is whether the recognition memory is merely operating at a depressed level because of the task difficulty of recognizing a voice after only one learning exposure, or whether recognizing the voice of a person you know becomes a different sort of task altogether.

The current study therefore used famous voices to look for a general ability at voice recognition tasks. The hypothesis was that subjects who performed well at recognizing a stranger's voice, and selecting it from a lineup, should also perform well on a test of recalling the names or detailed biographical information of famous people from hearing their voices.

Memory for words spoken

The third area addressed in this study was the relationship between memory for a voice and memory for the verbal information uttered by that voice. Work from a theoretical standpoint has raised questions about whether surface information, such as the nature and qualities of the voice speaking, is totally lost when the meaning of the incoming speech signal is comprehended by the listener. This is also a question that would be of applied interest in a legal setting, where a subject may lose credibility if they claim that they can recall the voice speaking but not the words spoken. Some experimental work does suggest that memory for the voice speaking may be integrated into the memory code for the words themselves (Craik and Kirsner, 1974; Palmeri, Goldinger and Pisoni, 1993). This work therefore points to a connection between memory for voice and memory for words in explicitly retrievable long-term memory. Because of the likely applied interest in the question, the relationship between the memory for the words spoken and memory for the voice speaking them was explored in the current study. The hypothesis was that a moderate positive correlation would exist between memory for the words spoken and for the voice speaking those words.

EXPERIMENT 1

Method

Design
For the main aspect of this study (henceforth known as the Unknown Voices Test) an independent groups design was used. The amount of information present in the sentence was manipulated, with three conditions: short low variation sentences (henceforth called Short Unvaried); short high variation sentences (Short Varied) and longer high variation sentences (Long Varied). Vowel sounds were selected as an index of variability, as in Roebuck and Wilding (1993), because they contain more energy than consonants and are more open to variation (see Gelfand, 1990; O'Connor, 1973; Roebuck and Wilding, 1993). The sentences used were a subset of those used by Roebuck and Wilding. All sentences were recorded by both a male and a female speaker, and three different men and women were used in all as targets. Each speaker recorded the three sentences used in each condition, i.e. each speaker was a target for one-third of the subjects. Thirty subjects served in each condition, approximately half of whom were female.


Additionally a correlational design was run, in which all subjects served. Subjects participated in an identification test of famous voices (henceforth known as the Famous Voices Test), and the scores on this were correlated with the correct identifications above on the Unknown Voices Test.

A further correlational design was run, in which all subjects served (henceforth known as the Words Memory Test). Scores for free recall of the text spoken in the Unknown Voices Test were correlated with scores for the correct identification of voices from that task.

Subjects
Ninety subjects participated in this study, of whom 41 were male and 49 were female. All were students in local schools and sixth-form colleges, recruited through contacts with various teachers. The age range was 15:1–19:0 years, with an average age of 17:0. Subjects were tested individually by one experimenter. None of the subjects knew any of the speakers on the Unknown Voices Test tape.

Materials
For the Unknown Voices Test, sample voices were recorded digitally at 22 kHz sampling rate with 16-bit resolution using Media Vision Pro Audio Studio 16 sound card and waveform editor on an IBM compatible personal computer with 486 architecture. Tape playback during testing was carried out on a Sony portable cassette player.

Short Unvaried sentences contained no more than 3 different vowel sounds, with an average of 2.33 vowel sounds (e.g. These black sheep seem fat); Short Varied sentences contained between 5 and 6 different vowel sounds, with an average of 5.66 (e.g. They still teach good songs); Long Varied sentences were formed of similar vowel variety but had a greater number of syllables than those in the Short Varied condition, with an average of 6.0 different vowel sounds, and 11.33 syllables (e.g. They will sit on mats on the sand at the clean beach).

Procedure

Tape preparation: famous voices. For the Famous Voices Test, a pilot study was conducted with 18 first-year psychology undergraduates, aged between 18 and 19 years, who indicated, out of a list of 37 names of actors, broadcasters and politicians, those who were known to them. From those known to all of the pilot subjects, a tape of suitable spoken materials was compiled with the help of the National Sound Archives. Copyright clearance was obtained through the BBC to use the extracts. The Famous Voices Test was assembled on high-quality audio tape, comprising 6 extracts with a mean length of 11 syllables (i.e. as long as the Long Varied sentences), spoken by 6 celebrities (3 female and 3 male). (See Appendix 1 for a list of speakers.)

Tape preparation: unknown voices. Voices of volunteers from within and around the Psychology Department were used. Speakers were all aged between 18 and 25, and none had a marked regional accent. Voices were recorded in a soundproof room. Each speaker was recorded saying all nine of the test sentences. Voices were rated by six independent raters along ten criteria selected from the ‘Voice Criteria Check List’ developed by Handkins and Cross, 1985 (Yarmey, 1991). Only voices that were not rated as being extreme along the critical dimensions of pitch or rate of speaking were eligible for use in lineups. Lineups were constructed using the editing facilities on the Pro Audio computer setup and were then transferred to high-quality audio tape. Lineups were of six voices, all of the same gender as the target voice, and with mild southeast England accents, saying the same sentence as had been uttered by the target voice. Fifteen different male voices were used as foils to construct the 9 male lineups (3 target voices by 3 conditions), and 15 different female ones to construct the female lineups. The target voice was positioned second, third, fourth or fifth in each lineup (i.e. not first or last) in a pseudo-random manner. Each target voice was used in one of the three sentences for each of the three conditions (i.e. each voice appeared as a target on three occasions across the design, and was only heard at most once by any given subject). Each target was set up with the same five voices in the lineup each time it was used. No target absent lineups were prepared due to the experimental complexity already inherent in the design.
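The positioning rule for the target voice can be illustrated with a short Python sketch. It is not the authors' procedure (the lineups were assembled by hand in a waveform editor); the function name, the voice labels and the seeded random generator are hypothetical, and only the constraints come from the text: six voices per lineup, five foils, and the target in second to fifth position.

```python
import random

LINEUP_SIZE = 6                          # six voices per lineup, as described above
ALLOWED_TARGET_POSITIONS = (2, 3, 4, 5)  # 1-based positions; never first or last


def build_lineup(target, foils, rng):
    """Return (lineup, position) with the target placed in position 2 to 5.

    The foils keep the order in which they are supplied; only the target's
    slot is chosen pseudo-randomly.
    """
    if len(foils) != LINEUP_SIZE - 1:
        raise ValueError("expected exactly five foil voices")
    position = rng.choice(ALLOWED_TARGET_POSITIONS)
    lineup = list(foils)
    lineup.insert(position - 1, target)  # convert 1-based position to list index
    return lineup, position


# Hypothetical voice labels; a fixed seed keeps the sketch repeatable.
rng = random.Random(0)
foils = ["foil_a", "foil_b", "foil_c", "foil_d", "foil_e"]
lineup, position = build_lineup("target", foils, rng)
print(position, lineup)
```

Reusing the same `foils` list for a given target mirrors the statement that each target was set up with the same five voices each time it was used.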

Exposure and testing. The Famous Voices Test was administered, using standard instructions and test forms, individually in a quiet room. Subjects were informed that they were participating in a ‘hearing’ experiment, but were not alerted to the precise nature of the test questions that they would have to answer the following week. The exposure to the stimuli of the Unknown Voices Test was then performed. For half of the subjects this order was reversed, and the Unknown Voices Test was administered before the Famous Voices Test. The subject was then thanked for his/her help, and reminded to attend at the same time and place the following week. One week later the same experimenter returned to complete the Unknown Voices Test. The subject then wrote down anything that they could recall of the content of the Unknown Voices Test, for the Words Memory Test. The lineup phase of the Unknown Voices Test was administered. Standard instructions told the subjects that the target voice may or may not be present in the lineup (although in fact no target absent lineups were used). The lineup for the unknown woman (or man in the case of 50% of the subjects) was then administered in the same way.

Test marking. The Unknown Voices Test was marked by giving one mark for each correct identification of the target voice from a lineup (i.e. maximum score=2).

The Words Memory Test was marked by giving one half mark for each word or idea that was present in the original sentence, up to a maximum of six.

The Famous Voices Test was marked by giving one mark for each correct identification of a speaker, and a half mark for each identification that made it fairly sure that the voice was recognized without providing the name (e.g. ‘That man from the “Carry On” films’ for Kenneth Williams). The maximum possible score was six.
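The marking rule for the Famous Voices Test can be written out as a small Python sketch. The response labels ("named", "described", "wrong") and the function name are hypothetical coding choices made here for illustration; only the mark values and the six-item maximum are taken from the text.

```python
# Marks per response type; the labels are illustrative, not the authors' coding scheme.
MARKS = {
    "named": 1.0,      # speaker correctly identified by name
    "described": 0.5,  # speaker clearly recognized, but no name given
    "wrong": 0.0,      # incorrect or no identification
}
NUMBER_OF_EXTRACTS = 6  # six celebrity extracts, so the maximum score is six


def score_famous_voices(responses):
    """Sum the marks for one subject's six coded responses."""
    if len(responses) != NUMBER_OF_EXTRACTS:
        raise ValueError("expected one coded response per extract")
    return sum(MARKS[response] for response in responses)


# A hypothetical subject: two names, one partial identification, three misses.
example = ["named", "described", "wrong", "wrong", "named", "wrong"]
print(score_famous_voices(example))
```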

Results

Results of the three tests carried out are reported below. The test of vowel variety versus length is reported as Long-Term Unknown Voices Test. Sex differences are examined separately. Associations between memory for known and unknown voices are reported as Famous Voices Test. Finally, associations between memory for the voice speaking and the words spoken are reported as Words Memory Test.


Long-Term Unknown Voices Test
Performance was generally quite poor, with a mean score across conditions of 38%, although still reliably above chance (χ²=23.35, df=1, p<0.005).

Performances in the short unvaried condition and in the short varied condition were not as high as in the long condition (see Table 1).

A Kruskal–Wallis one-way analysis of variance showed a main effect of sentence type (H=8.18, n=90, p<0.002).

The long sentences led to more correct identifications of the target in the lineup phase of the Long-Term Unknown Voices Test than either of the other two conditions. A planned comparison, conducted according to the method of orthogonal contrasts described by Marascuilo and McSweeney (1977), of both short conditions against the long varied condition, showed this to be a significant effect of length (χ²=6.89, df=1, p<0.01). The alternative planned comparison of the short varied condition against the short unvaried condition showed no significant effect of variety (χ²=1.14, df=1, p>0.05).

Gender differences. There did appear to be some differences relating to gender in the data from the Long-Term Unknown Voices Test (see Table 2).

Log linear modelling was used to analyse the data on gender differences. This showed a two-way association between sex of listener and gender of the voice (χ²=5.40, df=1, p=0.01). Performance was better on the same-sex voices.
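As a purely descriptive check on the direction of this association (not a reproduction of the log linear analysis), the counts in Table 2 can be turned into proportions correct. The sketch below assumes, as the design states, that every listener completed one female and one male lineup, so each cell's denominator is the number of listeners of that sex; the variable names are illustrative.

```python
# Correct identifications from Table 2, keyed by (listener sex, voice sex).
correct = {
    ("female", "female"): 22,
    ("female", "male"): 18,
    ("male", "female"): 9,
    ("male", "male"): 20,
}
# Each listener heard one female and one male lineup.
listeners = {"female": 49, "male": 41}

# Proportion correct in each listener-by-voice cell.
rates = {
    (listener, voice): hits / listeners[listener]
    for (listener, voice), hits in correct.items()
}

same_sex_hits = sum(hits for (listener, voice), hits in correct.items() if listener == voice)
cross_sex_hits = sum(hits for (listener, voice), hits in correct.items() if listener != voice)
trials_per_type = sum(listeners.values())  # 90 same-sex and 90 cross-sex lineups

same_sex_rate = same_sex_hits / trials_per_type
cross_sex_rate = cross_sex_hits / trials_per_type

for (listener, voice), rate in rates.items():
    print(f"{listener} listeners, {voice} voice: {rate:.1%}")
print(f"same-sex lineups:  {same_sex_rate:.1%}")
print(f"cross-sex lineups: {cross_sex_rate:.1%}")
```

On these counts, 42 of the 90 same-sex lineups and 27 of the 90 cross-sex lineups produced a correct identification.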

Famous Voices Test
Scores were also quite low in the Famous Voices Test, with a mean score of 1.81 (standard deviation 1.17) out of a possible maximum score of six. The highest actual score of any individual subject was five out of six, and the lowest was zero.

Three Spearman's rank correlations were carried out between the scores on the Famous Voices Test and those on each condition of the Unknown Voices Test (see Table 3). It was necessary to carry out three tests and then combine the results (see Fisher, 1954, for this technique) because of the significant difference between the means of the treatment groups in the Unknown Voices scores.


Table 1. Means and ranges of scores in the Long-Term Unknown Voices Test (n=30 for each condition, maximum score=2)

Condition         Mean    Range
Short unvaried    0.72    0–2
Short varied      0.43    0–2
Long varied       1.03    0–2

Table 2. Numbers of correct identifications by gender of speaker and listener

                    Female voice    Male voice    Total correct
Female listeners    22              18            40 (n=49)
Male listeners      9               20            29 (n=41)
Total correct       31              38

The combination of the three correlations showed a relation between the tests looking at memory for known and unknown voices (χ²=13.03, df=6, p<0.05). That is to say that there was a tendency for the subjects who scored well on the Unknown Voices Test to score well on the Famous Voices Test also.
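The combined statistic can be recovered from the three one-tailed p-values in Table 3 using Fisher's method, in which the sum of -2 ln p over k independent tests is referred to a chi-square distribution with 2k degrees of freedom. The following Python sketch uses only the standard library; the function name is illustrative, and the closed-form tail probability it uses holds only for even degrees of freedom, which is always the case here.

```python
import math


def fisher_combined(p_values):
    """Fisher's method: return (statistic, degrees of freedom, combined p-value)."""
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    df = 2 * k
    # For even df = 2k the chi-square upper tail has the closed form
    # exp(-x/2) * sum over i < k of (x/2)**i / i!
    half = statistic / 2.0
    tail = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))
    return statistic, df, tail


# One-tailed p-values for the three conditions, taken from Table 3.
statistic, df, p_combined = fisher_combined([0.043, 0.230, 0.15])
print(f"chi-square = {statistic:.2f}, df = {df}, p = {p_combined:.3f}")
```

With these inputs the statistic rounds to 13.03 on 6 degrees of freedom, in agreement with the value reported above, and the combined p-value falls just below 0.05.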

Words Memory Test
Scores on the Words Memory Test were also quite low, with a mean of 1.21 and a standard deviation of 1.63, although this was expected with what was, in effect, a test of verbatim recall after a week.

Scores were correlated with scores on the Long-Term Unknown Voices Test by three Spearman's rank correlations. When the three were combined they showed only a very weak and non-significant positive correlation between the two (p>0.05).

Discussion

Length or variety
In contrast to Roebuck and Wilding (1993), there was a clear effect of length rather than variety in enhancing long-term memory for a once-heard voice. Additional words enhanced performance even though they did not add vowel variety. It may be that the Roebuck and Wilding result was heavily influenced by the extra delay and interference necessarily present in longer speech extracts over shorter speech extracts. This was not such a problem in the current study, because the time lag between presentation and test was already 7 days, so it is scarcely plausible that the additional seconds affected the result.

The subjects were slightly older in the Roebuck and Wilding study (average age of 21 years instead of 17 years), but it is hard to conceive of a mechanism by which the result should reverse so dramatically from one of length to one of variety with 4 years of development. Another obvious difference between this study and the Roebuck and Wilding work was the use of a week-long gap between exposure and testing instead of a few seconds. It is not clear why this should cause a reversal of the result, but a retest of the current design with a short time lag would seem to be called for. This was the rationale for Experiment 2.

The data on gender did not support the idea that there is a consistent advantage in discriminability of one sex of speaker over the other. There was, however, weak support for the same-sex interaction hypothesis. This states that speakers are more readily recognized by listeners of the same sex than the opposite, and this was borne out in the case of female (but not male) speakers.


Table 3. Correlations of scores for each condition on the Unknown Voices Test with scores on the Famous Voices Test (all one-tailed)

          Short unvaried    Short varied    Long varied
Famous    r=0.318           r=0.137         r=0.2
          p=0.043           p=0.230         p=0.15
          n=30              n=30            n=29

Memory for words spoken
There was only a weak and non-significant correlation between scores on the Words Memory Test and those on the Unknown Voices Test. The stimuli were devised to control for vowel variety and sentence length; consequently, they did not much resemble naturally occurring speech. This may have accounted for the lack of correlation between the recognition memory for the voices and the recall of the words. There is, however, no reason to suppose that the words spoken in the commission of a crime are necessarily (depending on the crime) conversational. A recall test was used for the words spoken, and a recognition one for the voice speaking, which did not improve the prospects of finding a strong relationship. These somewhat inconsistent measures were chosen, however, because in a forensic situation a witness is likely to have to recall what was said and then recognize a suspect, possibly from a lineup.

Known and unknown voices
Scores on the Famous Voices Test correlated significantly and positively with scores on the Long-Term Unknown Voices Test. This supported the hypothesis that these two tests would demonstrate some general propensity for voice recognition memory. If supported more generally, this connection could be of use forensically, showing a general level of competence at voice recognition that could be used to modify the credibility of an individual's earwitness testimony (in either direction). Further work would be needed to find a voice memory test with enough predictive power to state something about an individual's voice memory ability. In a recent Court of Appeal case in England (Appeal in the case of Robert McCheyne Robb, 1991), a phonetics expert was held to be an expert in comparing the voices on two tapes despite the dependence on

‘auditory techniques alone . . . By that he meant that he listened, carefully and repeatedly, to the disputed tapes and the control tape. He paid close attention to the features of voice quality and voice pitch and the pronunciation of vowels and consonants to see if there were any significant discrepancies between the disputed tapes and the control tape.’ (1991) 93 Criminal Appeal Reports.

This legal ruling that one man's ability to listen to a taped voice could be so superior to the general public's ability to listen to that same voice as to constitute expert evidence raises the important psychological questions of familiarity and a general memory-for-voices ability. The correlation in the present study, although significant at the 0.05 level, was not huge, accounting for approximately 7.3% of the variance. It was appreciated from the outset that there was a slight problem with the task consistency in looking for a relationship between the task of correctly identifying which of six speakers you have heard before and the task of supplying a name for the famous person who is speaking. The first involves recognition, which need only be mediated by familiarity or feeling of knowing, whereas the second requires recognition plus the recall of personal information such as name. In view of the applied stance being taken this was considered to be unavoidable because the legal parallel that was being drawn would be of a witness who had heard someone speaking in the commission of a crime, and was giving evidence as to whether they recognized the voice of a suspect as being the one they had heard before. It was hoped that if a general memory-for-voices ability could be established by looking at performance on a famous voice recognition task, then this could also be of use in assessing the voice memory abilities of a witness in a legal setting. There is good reason to hope that a test with a greater number of items (and consequent increased power) may result in an even higher level of correlation. This points to the idea that there may be a memory-for-voices ability that is transferable across at least some different types of memory test, and could be distinguishable from at least some types of verbal memory. The importance of this idea to at least one legal ruling and the encouraging nature of this initial study form the rationale for extending the famous voices research in Experiment 3. There was also still reason to relate experimental findings using known and unknown voices to each other.

In conclusion, therefore, the Famous Voices Test appears to offer promising avenues for further practical and theoretical research. The Unknown Voices Test seems to require further short-term memory replication to clarify its interpretation.

EXPERIMENT 2

The results of the Unknown Voices Test in Experiment 1 had not corroborated the earlier findings of Roebuck and Wilding (1993). In fact the result showed a strongly significant effect in the opposite direction (p<0.01): that is, that length of utterance was far more important for accurate voice recognition memory than variety in the sentences used. There were differences from the original study both in the actual voice materials used (the original speakers no longer being available) and in the number of trials that each subject had to perform. In the original Roebuck and Wilding study subjects performed 14 trials, whereas there were only 2 in Experiment 1 (1 male voice and 1 female voice). The other and most major difference was the use of a week's delay (in Experiment 1) rather than immediate test (in the study by Roebuck and Wilding), and Experiment 1 was therefore repeated with short- rather than long-term memory.

Method

Subjects
Fifty-four subjects participated in Experiment 2. There were roughly equal numbers of men and women with a mean age of 20 years 2 months. All subjects were attenders at Open Days at the Psychology Department at Royal Holloway College. None of the speakers on the test tapes were personally known to the subjects.

Materials
The voices and sentences used were as in the Unknown Voices Test of Experiment 1.

Procedure
The procedure was as in the Unknown Voices Test of Experiment 1. The subjects were tested in small groups, of approximately five, instead of singly. Subjects also heard the lineup immediately after the target voices, in this Short-Term Unknown Voices Test.


Results

Performance was generally better than in the Long-Term Unknown Voices Test, with a mean score across conditions of 75%. Performances in the Short Unvaried Condition and in the Short Varied Condition were not as high as in the Long Varied Condition (see Table 4).

A Kruskal–Wallis one-way analysis of variance showed a main effect of sentence type (H=7.04, n=54, p<0.01).

The long sentences led to more correct identifications of the target in the lineup phase of the Unknown Voices Test than either of the other two conditions, and a planned comparison of short versus long groups showed this to be a significant effect of length (χ²=8.85, df=1, p<0.005). The alternative contrast comparing short varied and short unvaried groups showed no significant effect of variety (χ²=0.59, df=1, p>0.05).

Discussion

Experiment 2 confirmed the finding of the Long-Term Unknown Voices Test in Experiment 1, and showed that length of utterance enhanced performance at voice recognition but additional vowel variety did not. This ruled out the possibility that the discrepancy between the result on the Unknown Voices Test and the findings of Roebuck and Wilding (1993) was brought about by the change from short-term to long-term memory work.

Another major and possibly important design difference between the Roebuck and Wilding study and Experiment 1 was the number of voices that each participant heard. In view of the well-documented additional difficulty of recognizing voices after longer time lags (e.g. Legge et al., 1984), and the criticism of the use of multiple voices in the McGehee studies (e.g. Thompson, 1985), subjects heard fewer voices in the present experiment than they had in the Roebuck and Wilding work. In the Roebuck and Wilding (1993) study each subject heard 7 male and 7 female targets, and 14 lineups of 6 (the target and 5 distractors). The authors state that of the 16 male and 16 female voices that they used:

'No single voice was used more than five times in any condition, or appeared as a target more than once in any condition'.

This means that participants were hearing voices repeatedly across sentences, and up to 5 times during the 15 min of testing. This amount of interference is highly likely to influence the performance on the task, and it seems most plausible that the variety of hearing a new vowel sound could release the subject from interference and enhance performance on that set of stimuli. Another important feature of the procedure used by Roebuck and Wilding is that subjects could have a target voice that had previously served as a foil on an earlier lineup. This means that a subject could not rely on a feeling of familiarity alone in completing the Roebuck and Wilding task, in the same way that they could do in the Unknown Voices Task. In the Roebuck and Wilding task, a subject would have to be able to distinguish the particular voice that they had just heard from other voices that they had also heard at some point in the test phase. It seems plausible that hearing something for longer would improve its familiarity, but that hearing a greater variety of vowels in the speech would improve its ease of discrimination. This could be one potentially important reason for the discrepancy between the relative effects of length and variety of utterance in the two tasks.

Table 4. Mean and range of scores in the Short-Term Unknown Voices Test (n=18 for each condition, maximum score=2)

Condition        Mean   Range
Short unvaried   1.39   0–2
Short varied     1.28   0–2
Long varied      1.89   1–2

EXPERIMENT 3

The results of the Famous Voices Test in Experiment 1 gave weight to the idea that there was a link between memory for known and unknown voices, and, therefore, another famous voices experiment seemed necessary.

One possible drawback with the Famous Voices Test in Experiment 1 was the discrepancy in the nature of the scores being correlated. The Famous Voices Test scores reflected recall that had to include the recall of some personal data or a name, whereas the Unknown Voices Test scores reflected recognition that need only include a feeling of familiarity that the subject had heard that voice before. In terms of the Burton, Bruce and Johnston (1990) Interactive Activation model (IAC), as revised by Burton and Bruce (1992), recognition therefore required no access to Person Identity Nodes (PINs) or Semantic Information Units (SIUs). It must be conceded at this point that the IAC model does not make explicit reference to the existence of Voice Recognition Units (presumably VRUs) as well as Face Recognition Units (FRUs), but an earlier version of this family of models does make explicit the assumption of the existence of such a voice recognition route (Bruce and Young, 1986).

A new task was therefore designed so that the subject was required to say which of several voices were the familiar ones. The scores on this Famous Voices Test II were expected to correlate positively and moderately with the scores on the Unknown Voices Test.

Method

Subjects
Twenty-seven sixth-form students were used, with a mean age of 16.66 years. The majority of these (21) were female.

Materials
The voices and sentences used for the Unknown Voices tape were as used in Experiments 1 and 2. The tape used for the Famous Voices Test II was prepared as below.


Procedure

Tape preparation. A new Famous Voices Test Tape (Famous Voices Test II) was prepared. The voices of seven celebrities, previously recorded from the National Sound Archive with copyright approval from the BBC as obtained in Experiment 1, were used. Voices of seven foils were collected on high-quality audio tape. The foils were selected so as to have a similarly wide range of ages, accents, pitch and speaking speed as the famous voices. The two sets of voices were edited together into a lineup of 14. Order was established randomly (see Appendix 2 for identities of speakers).

Testing. The procedure for the Unknown Voices Test was as in Experiments 1 and 2. Only long varied sentences were used, as there was no need to replicate the length effect that had been found twice. The subjects were tested in large groups of around 14. The Unknown Voices lineup was heard after a delay of 1 week. After the Unknown Voices Test, subjects completed the Famous Voices Test II (order was reversed for 50% of subjects). Standard instructions were used. Subjects indicated whether each of 14 voices was famous or unknown. It was ascertained that all subjects had been able to hear the tapes, that all subjects had heard of all of the celebrities whose voices were used, and that none of them had recognized an unknown voice as belonging to a friend or neighbour.

Test marking. The Famous Voices Test II was marked in a way so as to allow the use of signal detection theory. All correct answers were counted (i.e. correct rejections of non-famous targets and hits of famous targets). All 'famous' responses were counted for each subject. The correct famous responses (hits) were expressed as a proportion of the total number of famous targets (7). The incorrect famous responses (false alarms) were expressed as a proportion of the total number of the non-famous targets (7). The probabilities were then looked up in summarized tables of d′ and β (Wilding, 1982).

The Unknown Voices Test was marked as in Experiment 1 (maximum score=2, minimum=0).
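The conversion from hit and false-alarm proportions to d′ and β described under Test marking can be sketched as follows. The study used printed tables; this sketch instead computes the values from the standard equal-variance Gaussian model. The function names are hypothetical, and the half-response adjustment for proportions of exactly 0 or 1 is an assumption, since the text does not say how such scores were treated.

```python
from math import exp
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def bounded_rate(count, n_items):
    """Proportion of 'famous' responses, kept away from exactly 0 or 1.

    Rates of 0 or 1 have no finite z-score, so they are moved in by half
    a response (1 / 2N). This adjustment is an assumption, not a detail
    taken from the study.
    """
    floor = 1 / (2 * n_items)
    return min(max(count / n_items, floor), 1 - floor)


def d_prime_and_beta(hits, false_alarms, n_famous=7, n_foils=7):
    """Return (d', beta) under the equal-variance Gaussian model."""
    z_hit = STANDARD_NORMAL.inv_cdf(bounded_rate(hits, n_famous))
    z_fa = STANDARD_NORMAL.inv_cdf(bounded_rate(false_alarms, n_foils))
    d_prime = z_hit - z_fa                      # sensitivity
    beta = exp((z_fa ** 2 - z_hit ** 2) / 2)    # likelihood-ratio criterion
    return d_prime, beta


# Hypothetical subject: 6 of 7 famous voices called famous, 1 of 7 foils.
d_prime, beta = d_prime_and_beta(hits=6, false_alarms=1)
print(f"d' = {d_prime:.2f}, beta = {beta:.2f}")
```

A subject whose hit and false-alarm proportions are equal obtains d′ of zero, which is why d′ rather than the raw number correct separates sensitivity from a bias towards answering 'famous'.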

Results

The mean correct number of responses on the Famous Voices Test II was 10.04 (range=7–13) out of a possible 14. The mean score on the Unknown Voices Test was 0.93 (range=0–2) out of a possible 2.

The d′ scores on the Famous Voices Test II were correlated with the Unknown Voices scores using Spearman's rank correlation, giving a weak positive correlation that narrowly failed to achieve statistical significance (r=0.29, n=27, p=0.068, one-tailed).
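Spearman's rank correlation, as used above, is the Pearson correlation computed on ranks, with tied scores given the mean of their rank positions. The sketch below illustrates that definition; the function names and the paired scores are hypothetical and are not the study data.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 upwards, giving ties the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / sqrt(spread_x * spread_y)


def spearman_rho(x, y):
    """Spearman's rank correlation, valid when scores contain ties."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired scores: d' on the famous test, 0-2 on the unknown test.
famous_d_prime = [0.4, 1.1, 2.1, 0.9, 1.6, 0.0]
unknown_score = [0, 1, 2, 1, 1, 0]
print(f"rho = {spearman_rho(famous_d_prime, unknown_score):.2f}")
```

Because the Unknown Voices score can only be 0, 1 or 2, ties are unavoidable, so the sketch avoids the shortcut formula based on squared rank differences, which is exact only without ties.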

Discussion

The results of Experiment 3 showed that there was a similarly sized positive correlation between memory for famous and once-heard voices in this experiment as in Experiment 1. The small sample size (27) did not give the test a lot of power, and the correlation was not significant with this sample, but the size and direction of the correlation do not suggest that the Famous Voices Test II is producing radically different results to the first Famous Voices Test.
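The point about power can be made concrete with a rough sketch. It assumes the common t approximation for testing a correlation coefficient, which may not be the procedure the authors used, so it is not expected to reproduce the reported p value exactly. The function names are hypothetical; 1.708 is the tabled one-tailed 5% critical value of t for 25 degrees of freedom.

```python
from math import sqrt


def t_from_r(r, n):
    """Approximate t statistic (df = n - 2) for a correlation r from n pairs."""
    return r * sqrt((n - 2) / (1 - r * r))


def r_needed(t_critical, n):
    """Smallest correlation that reaches a given critical t with n pairs."""
    return t_critical / sqrt(t_critical ** 2 + (n - 2))


T_CRITICAL_DF25 = 1.708  # one-tailed, alpha = 0.05, df = 25

observed_t = t_from_r(0.29, 27)
threshold_r = r_needed(T_CRITICAL_DF25, 27)
print(f"t = {observed_t:.2f} against a critical value of {T_CRITICAL_DF25}")
print(f"r would need to reach about {threshold_r:.2f} with n = 27")
```

Under this approximation the observed correlation falls a little short of the critical value, which is consistent with the description of a result that narrowly failed to reach significance in a sample of 27.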


GENERAL DISCUSSION

This set of experiments suggests that there is an important role of length of utterance regardless of variety in both long-term and short-term memory for a once-heard voice. It is worth stressing that the experiments conducted here are reflections of the commonest forensic situation: that of a witness who attends a lineup of voices a few days after first hearing a stranger's voice (Devlin Report, 1976). The situation reflected in the Roebuck and Wilding (1993) study was so short-term as to be unrealistic forensically (see, for example, Hollien, 1990) and also involved such a great number of target voices (seven male and seven female) as to stretch the credibility of any witness. McGehee's (1937) ground-breaking study has been criticized on the grounds of interference effects (e.g. Thompson, 1985), and Roebuck and Wilding's study is also open to criticism on this front. The two target results (one male and one female) presented here, therefore, seem to be preferable for both forensic and psychological credibility.

The other main finding suggests that there is a connection, but not an identity, between the tasks of remembering a once-heard and a well-known voice. The failure to find large correlations between scores on the Unknown and Famous Voices Tests may indicate that a fundamental change takes place when a voice becomes known. More work is needed on how voice memory is affected by the presence of more information about the person, to establish how building up theoretical concepts, such as the Person Identity Node, may affect the working of the voice recognition system in practice.

The question of how to reconcile the two strands of the voice memory literature, based on familiar and unfamiliar voices, remains, and seems to be the key to understanding models of voice-mediated person recognition. Much of the work from which variety was supposed to be more important than length per se in voice recognition was based on the recognition of friends and relatives (e.g. Bricker and Pruzansky, 1966; Compton, 1963; Doehring and Ross, 1972; Murry and Cort, 1971; Pollack et al., 1954; Stevens et al., 1968, who all used familiar voices as their materials). Legge et al. (1984) did use unfamiliar voices as their materials, but their smallest target set was still rather large (at five voices). The two-alternative forced-choice recognition test that they used is also hard to compare to the forensic situation (Yarmey, Yarmey and Yarmey, 1994). The study that used the most legally compelling format, a single voice with a long-term memory test, found a significant effect of utterance length (Orchard and Yarmey, 1995), but the relative roles of length and variety were not among the questions that work sought to address.

The issue of how to compare familiar and unfamiliar voice memory is also of importance forensically. In the Court of Appeal Ruling in Regina v Turnbull (1976), which has required all subsequent trial judges to warn juries in England and Wales about the dangers of convicting on identification evidence (specifically dealing with visual identification), the distinction between recognizing a friend and a stranger is made explicit.

'Recognition may be more reliable than identification of a stranger; but even when the witness is purporting to recognise someone whom he knows, the jury should be reminded that mistakes in recognition of close relatives and friends are sometimes made.'
Criminal Appeal Report, Regina v Turnbull (1976)


The authors feel that after 20 years it is time for psychologists to offer more information on the similarities and relative reliability of the two tasks.

REFERENCES

Baddeley, A. & Woodhead, M. (1982). Depth of processing and face recognition. Canadian Journal of Psychology, 36, 148–164.

Bricker, P. D. & Pruzansky, S. (1966). Effects of stimulus content and duration on talker identification. The Journal of the Acoustical Society of America, 40, 1441–1449.

Bruce, V. & Young, A. (1986). Understanding face recognition. British Journal of Psychology, 77, 305–327.

Bull, R., Rathborn, H. & Clifford, B. R. (1983). The voice-recognition accuracy of blind listeners. Perception, 12, 223–226.

Burton, A. M. & Bruce, V. (1992). I recognize your face but I can't remember your name: a simple explanation? British Journal of Psychology, 83, 45–60.

Burton, A. M., Bruce, V. & Johnston, R. A. (1990). Understanding face recognition with an interactive activation model. British Journal of Psychology, 81, 361–380.

Clifford, B. R. (1980). Voice identification by human listeners: on earwitness reliability. Law and Human Behaviour, 4, 373–394.

Clifford, B. R. (1983). Memory for voices: the feasibility and quality of earwitness evidence. In S. M. A. Lloyd-Bostock and B. R. Clifford (Eds.), Evaluating witness evidence (pp. 189–218). Chichester: Wiley.

Compton, A. J. (1963). Effects of filtering and vocal duration upon the identification of speakers, aurally. The Journal of the Acoustical Society of America, 35, 1748–1752.

Craik, F. I. M. & Kirsner, K. (1974). The effect of speaker's voices on word recognition. Quarterly Journal of Experimental Psychology, 26, 274–284.

Devlin, Lord P. (1976). Report to the Secretary of State for the Home Department of the Departmental Committee on the Evidence of Identification in Criminal Cases. HMSO.

Doehring, D. G. & Ross, R. W. (1972). Voice recognition by matching to sample. Journal of Psycholinguistic Research, 1, 233–242.

Fisher, R. A. (1954). Statistical Methods for Research Workers. London: Oliver and Boyd.

Gelfand, S. A. (1990). Hearing: an introduction to psychological and physiological acoustics. New York: Marcel Dekker.

Hollien, H. (1990). The Acoustics of Crime: the new science of forensic phonetics. New York: Plenum.

Legge, G. E., Grosmann, C. & Pieper, C. M. (1984). Learning unfamiliar voices. Journal of Experimental Psychology: Learning Memory and Cognition, 10, 298–303.

Marascuilo, L. A. & McSweeney, M. (1977). Non-parametric and distribution-free Methods for the Social Sciences. Monterey: Brooks/Coles.

McGehee, F. (1937). The reliability of the identification of the human voice. The Journal of General Psychology, 17, 249–271.

McGehee, F. (1944). An experimental study of voice recognition. The Journal of General Psychology, 31, 53–65.

Murry, T. & Cort, S. (1971). Aural identification of children's voices. The Journal of Auditory Research, 11, 260–262.

O'Connor, J. D. (1973). Phonetics. Harmondsworth: Penguin.

Orchard, T. L. & Yarmey, A. D. (1995). The effects of whispers, voice-sample duration, and voice distinctiveness on criminal speaker identification. Applied Cognitive Psychology, 9, 249–260.

Palmeri, T. J., Goldinger, S. D. & Pisoni, D. B. (1993). Episodic encoding of voice attributes and recognition for spoken words. Journal of Experimental Psychology: Learning Memory and Cognition, 19, 309–328.

Pollack, I., Pickett, J. M. & Sumby, W. H. (1954). On the identification of speakers by voice. The Journal of the Acoustical Society of America, 26, 403–406.

R. v. Robb (1991) 93 Criminal Appeal Report 161.

R. v. Turnbull [1977] 1. QB. 224.

Read, D. & Craik, F. I. M. (1995). Earwitness identification: some influences on voice recognition. Journal of Experimental Psychology: Applied, 1, 6–18.

Roebuck, R. & Wilding, J. (1993). Effects of vowel variety and sample length on identification of a speaker in a line-up. Applied Cognitive Psychology, 7, 475–481.

Shepherd, J. W., Ellis, H. D. & Davies, G. M. (1982). Identification Evidence: A psychological evaluation. Aberdeen: Aberdeen University Press.

Stevens, K. N., Williams, C. E., Carbonell, J. R. & Woods, B. (1968). Speaker authentication and identification: a comparison of spectrographic and auditory presentations of speech material. The Journal of the Acoustical Society of America, 44, 1596–1607.

Thompson, C. P. (1985). Voice identification: speaker identifiability and a correction of the record regarding sex effects. Human Learning, 4, 19–27.

Wells, G. L. & Loftus, E. F. (Eds.). (1984). Eyewitness testimony: Psychological perspective. Cambridge: Cambridge University Press.

Wilding, J. W. (1982). Perception, from sense to object. London: Hutchinson.

Yarmey, A. D. (1991). Descriptions of distinctive and non-distinctive voices over time. Journal of the Forensic Science Society, 31, 421–428.

Yarmey, A. D. (1994). Earwitness evidence: memory for a perpetrator's voice. In D. F. Ross, J. D. Read, and M. P. Toglia (Eds.), Adult eyewitness testimony: current trends and developments (pp. 100–124). Cambridge: Cambridge University Press.

Yarmey, A. D. (1995). Earwitness speaker identification. Psychology, Public Policy, and Law, 1, 792–816.

Yarmey, A. D., Yarmey, A. L. & Yarmey, M. J. (1994). Face and voice identifications in showups and lineups. Applied Cognitive Psychology, 8, 453–464.

APPENDIX 1

Famous Voices Test Tape I.

KENNETH WILLIAMS (Comedy Actor)
"I think the first time I ever went on a cruise was '63 . . ."

FELICITY KENDALL (Actress)
"Ummm, yes, and the people are all supposedly very modern and some of them are very interesting and complicated . . ."

JOHN CLEESE (Comedian)
"Not quite so much in the judgemental sense, but because it becomes such a pleasure when you get somebody like the girl who sold me a pair of shoes two weeks ago who really knew what she was doing . . ."

PRINCESS DIANA (English Royal family)
"In Liverpool, for instance, it was the family of a profoundly handicapped teenager . . ."

GLORIA HUNNIFORD (Broadcaster)
"But was it always in a way Elvis that you had at the back of your mind in terms of presentation? . . ."

JOHN MAJOR (Prime Minister)
"Yes it was, it was certainly that, it was very interesting . . ."

APPENDIX 2

Famous Voices Test Tape II.

KENNETH WILLIAMS
UNKNOWN MALE 1 (in his twenties, S.E. accent)
UNKNOWN FEMALE 1 (in her thirties, S.E. accent)
PRINCESS DIANA
JOHN MAJOR
UNKNOWN MALE 2 (in his forties, London accent)
FELICITY KENDALL
JOHN CLEESE
UNKNOWN FEMALE 2 (in her thirties, S.E. accent)
GLORIA HUNNIFORD
UNKNOWN MALE 3 (in his fifties, London accent)
UNKNOWN FEMALE 3 (in her twenties, Irish accent)
MARGARET THATCHER
UNKNOWN FEMALE 4 (in her forties, S.E. accent)

