
Documentation of the Danish Emotional Speech Database

DES

Inger Samsø Engberg & Anya Varnich Hansen
e-mail: [email protected] & [email protected]

Center for PersonKommunikation
Department of Communication Technology
Institute of Electronic Systems
Aalborg University
Denmark


Preface

This report documents the design, recording and verification of the Danish Emotional Speech database (DES). The database was recorded for Center for PersonKommunikation (CPK), Aalborg University, Denmark, as part of the VAESS project (Voices, Attitudes and Emotions in Speech Synthesis). The report was written by Anya Varnich Hansen and Inger Samsø Engberg, CPK. It also draws on the work done by Gudrun Klasmeyer, TUB (Technical University of Berlin), at CPK from July to October 1995 and described in the report "Emotions in Speech" [Klas].

Questions regarding the contents of the report can be addressed to the authors:

Inger S. Engberg & Anya V. Hansen
[email protected] or [email protected]

Center for PersonKommunikation
Aalborg University
Fredrik Bajers Vej 7 A2
DK 9220 Aalborg Øst

Anya V. Hansen & Inger S. Engberg, Aalborg September 1996


Table of Contents

PREFACE

TABLE OF CONTENTS

1. VAESS-PROJECT REQUIREMENTS TO THE DATABASE

2. CONSIDERATIONS ABOUT CHOICE OF SPEAKERS AND PROMPTING TEXT
   2.1 SPEAKERS
   2.2 PROMPTING TEXT

3. RECORDING OF THE DATABASE
   3.1 PROMPTING TEXT
   3.2 EMOTIONS
   3.3 RECORDING CONDITIONS

4. LISTENING TEST
   4.1 PERFORMING THE LISTENING TEST
   4.2 RESULTS FROM THE LISTENING TEST

5. PROCESSING THE DES DATABASE
   5.1 LABELLING
   5.2 TRANSFERRING THE SPEECH DATA

6. REFERENCES

APPENDICES

A. UTTERANCES TO BE RECORDED
   A.1 SINGLE WORDS
   A.2 SENTENCES
   A.3 PASSAGES
   A.4 UTTERANCES FOR THE TARGET VOICES
   A.5 UTTERANCES ONLY FOR THE FEMALE TARGET VOICE

B. RECORDING CONDITIONS
   B.1 RECORDING PROCEDURE
   B.2 RECORDING ENVIRONMENT
   B.3 RECORDING EQUIPMENT

C. SPEAKER PROFILE

D. LISTENING TEST
   D.1 PROCEDURE FOR THE LISTENING TEST
   D.2 QUESTIONNAIRE FOR THE LISTENING TEST

E. THE DES DATA BASE CD-ROM
   E.1 DIRECTORY STRUCTURE OF THE CD-ROM
   E.2 THE FILE NAMING

F. LABEL STATISTICS


1. VAESS-project Requirements to the Database

The aim of the VAESS-project is to improve the quality and range of synthetic voices, including a range of emotions [Tide]. The improved voices will be included in a small, powerful handheld personal communicator with which speech-disabled people can control the voice quality and the emotions produced by the speech synthesiser.

One of the objectives of the VAESS-project is to provide sufficient speech data with complete and accurate labels to allow a systematic study of inter-speaker and inter-attitude variations in speech [Tide]. The Danish EUROM.1 database contains 60 speakers and has an almost even distribution regarding gender, age and phonemes [Lind]. The existing EUROM.1 database thus offers sufficient data to study inter-speaker variations. In order to study inter-attitude variations, a Danish Emotional Speech database (DES) should be recorded. The DES database should contain 4 speakers (2 male and 2 female) expressing 5 emotions (neutral, surprise, happiness, sadness and anger), each for 30 sec, thus totalling 10 min. of Danish emotional speech.

In the project two new synthetic voices must be developed; these are called target voices. The DES database must contain sufficient data for each target voice, both for training and for evaluating the synthesiser. In order to ensure sufficient data for the target voices, additional recordings with a neutral voice must be performed.

We have chosen to combine the recordings of the emotional speech and the recordings of the target voices. When one of the actors' voices is chosen as the target voice, direct comparison between natural and synthetic emotional speech also becomes possible.


2. Considerations about Choice of Speakers and Prompting Text

2.1 Speakers

For parameter analysis, undistorted speech signals without background noise are required. In order to investigate emotional speech as a deviation from neutral speech, it is necessary to record the same utterance in different emotional situations. Consequently, the recording has to be done systematically under laboratory conditions.

In some psychological experiments, attempts have been made to induce specific emotions in test persons [Murr], but for ethical reasons it is undesirable to induce negative emotions in test persons. As a consequence, the emotional speech had to be spoken by actors. That emotions simulated by actors are a good approximation to true emotional speech is shown in [Will]. In [Will], recordings of a speaker reporting from a dramatic event were compared with recordings of an actor simulating the reporter's emotional state during the event. Differences between the recordings were found, but in general the mode of speaking and the fundamental frequency range and variation were alike.

It is, however, not advisable to use stage actors, because they tend to exaggerate some features to make the emotional content very clear, which makes the utterances sound unnatural. Four actors familiar with radio theatre were employed for the recording of DES, see Table I. The speaker profile for each of the actors is given in Appendix C.

Initials  Gender  Age
DHC       Female  34
KLA       Female  52
JZB       Male    38
HO        Male    52

Table I. Gender and age for the four actors used in collecting DES.


2.2 Prompting Text

Emotions can be recognised with confidence in very short utterances like "Yes" or "No" [Klas]. This means that short sentences or even single words are appropriate for analysing emotional features in speech. It can, however, also be interesting to analyse passages of "fluent" speech to study pauses and specific emotional sounds like laughter or sighs. It is best to choose utterances which appear often in everyday communication.

It is very unclear how emotions are perceived by listeners. In normal communication situations the listener does not make truly conscious decisions about the speaker's emotional state. This has to be taken into consideration when a listening test with native listeners is designed. If they are asked to judge the emotional content of an utterance, a conscious decision is necessary. The listeners will first try to make this decision from the semantic meaning. If this is not possible, they try to remember situations in which the utterance could appear. If this is not possible either, it becomes more difficult for the listeners to make confident decisions. Therefore the prompting text should be semantically neutral, that is, it should not imply a specific emotional meaning.

In order to provide enough speech data for the two target voices, additional recordings similar to the ones in EUROM.1 will be performed. Each target voice should speak at least 8 passages and 10 sentences. A passage consists of 5 task-related sentences, whereas the blocks of 5 sentences are designed to compensate for the uneven phoneme distribution in the passages. The sentences within a compensating block are not task related. These additional passages and sentences are taken from EUROM.1 [Lind], where the design of the compensating sentences was guided by an analysis of the diphone distribution of phonotypical transcriptions of the passages.


3. Recording of the Database

Based on the requirements to the recording of the database stated in [Tide] and described in Section 1, together with the considerations above, it was decided to record and test the database as described here.

3.1 Prompting Text

For the DES the following is recorded:

- 2 single words
- 9 sentences and
- 2 passages of fluent speech.

The utterances above are spoken by the four actors once for each of the five emotions.
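The size of the emotional part of the database follows directly from these counts. The sketch below enumerates every (actor, emotion, utterance) combination; the utterance identifiers such as `word_1` are hypothetical placeholders chosen for illustration, not the file names used on the CD-ROM.

```python
from itertools import product

ACTORS = ["DHC", "KLA", "JZB", "HO"]
EMOTIONS = ["neutral", "surprise", "happiness", "sadness", "anger"]

# Hypothetical identifiers for the 2 words, 9 sentences and 2 passages.
UTTERANCES = (
    [f"word_{i}" for i in range(1, 3)]
    + [f"sentence_{i}" for i in range(1, 10)]
    + [f"passage_{i}" for i in range(1, 3)]
)


def emotional_recordings():
    """Return every (actor, emotion, utterance) combination, each spoken once."""
    return list(product(ACTORS, EMOTIONS, UTTERANCES))


recordings = emotional_recordings()
per_actor = len(recordings) // len(ACTORS)
print(len(UTTERANCES), per_actor, len(recordings))
```

Under these counts each actor contributes 13 utterances in 5 emotions, which is the figure of 65 utterances per actor used later in the listening test.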

The two target voices should also record:

- 8 passages
- 10 sentences

spoken with a neutral voice.

The words, sentences and passages can be found in Appendix A. A phonotypical transcription of the Danish text is also included in Appendix A together with a translation. The phonotypical transcription was done by Tom Brøndsted, CPK.

Besides the extra neutral utterances spoken by the two target voices HO and DHC, additional recordings were performed. With the female target voice DHC, 12 extra passages and 15 extra sentences were recorded. The extra passages and sentences were also taken from EUROM.1 and can be found in Appendix A.5.

With the other male voice, JZB, the 8 passages and 10 sentences for the target voices were also recorded. The other female voice, KLA, made no additional recordings.


3.2 Emotions

Each utterance should be spoken with each of the emotions under investigation. In the VAESS-project the following emotions will be investigated:

- Neutral
- Surprise
- Happiness
- Sadness
- Anger

It is advisable to read all the utterances with one emotion and then change the emotion and start over again. In that way the actors will not have to change emotions more than five times.

3.3 Recording Conditions

DES was recorded in an acoustically damped sound studio at Aarhus theatre [Aarh]. A high quality microphone was used, which did not influence the spectral amplitude or phase characteristics of the speech signal. A window between the operator room and the recording room allowed the actors and the operators to see each other at all times. The operators could contact the actor via an intercommunication system. In addition, the operators were continuously listening to the actors via the recording chain. Two operators were present at all times during the recordings to reduce fatigue effects. During the recordings one operator listened to the actor (aiming to discover speaking errors) while the other controlled the recordings and the equipment. The recording procedure and equipment are described in more detail in Appendix B.


4. Listening Test

As mentioned in Section 2.2, it is very important to collect speech with unambiguous emotional content. This is ensured by a listening test, in which listeners evaluate the emotional content of the recorded utterances.

There are considerable differences in how well different emotions can be recognised in spoken utterances. The results from a study published by Scherer in 1995 using 14 different emotions varied from 78% recognition for hot anger down to 15% for disgust [Sche].

In a Swedish listening test described in [Öste], 6 emotions were correctly identified in 81% of the cases. In [Öste] only 2 actors were used, speaking 6 different emotions on 6 sentences. This results in a much smaller amount of data, from which 72 sentences were selected for the listening test.

4.1 Performing the Listening Test

A listening test was performed to test whether listeners could identify the emotional content of the recorded utterances. 20 normal-hearing listeners (10 of each gender), mainly staff at CPK, were used. They had an average age of 38 years, ranging from 18 to 59 years, see Table II.

Four listening tests, one for each of the actors, were designed. Each listening test consisted of 13 (2+9+2) utterances spoken with 5 different emotions, resulting in 4 listening tests of 65 utterances each, see Appendix D.2. The listening test was prepared on a DAT-tape, and the listeners performed the listening test one by one, see Appendix D.

Initials  Gender  Age  Date   Succession       HO  DHC  JZB  KLA  Correct
AVH       Female  28   1-8    HO-DHC-JZB-KLA   48  48   52   50   76%
SVA       Male    27   8-8    HO-DHC-JZB-KLA   45  44   48   44   70%
OA        Male    33   9-8    JZB-KLA-HO-DHC   36  49   46   49   69%
PLE       Male    26   9-8    HO-DHC-JZB-KLA   35  38   45   40   61%
HE        Male    59   14-8   JZB-KLA-HO-DHC   33  38   44   29   55%
JPM       Male    34   15-8   JZB-KLA-HO-DHC   49  55   55   50   80%
ILW       Female  57   21-8   JZB-KLA-HO-DHC   41  50   46   46   70%
HC        Female  26   2-9    HO-DHC-JZB-KLA   40  44   44   43   66%
HEB       Male    45   4-9    DHC-JZB-KLA-HO   40  40   45   45   65%
POR       Male    41   5-9    KLA-HO-DHC-JZB   38  42   44   40   63%
GE        Female  57   5-9    KLA-HO-DHC-JZB   37  53   47   45   70%
BLR       Female  36   6-9    HO-DHC-JZB-KLA   42  46   54   49   73%
PE        Male    39   9-9    DHC-JZB-KLA-HO   43  45   46   41   67%
VJP       Female  39   10-9   JZB-KLA-HO-DHC   47  45   48   47   72%
SPJ       Male    47   10-9   KLA-HO-DHC-JZB   35  42   47   40   63%
JEB       Female  53   11-9   DHC-JZB-KLA-HO   33  42   43   36   59%
PED       Male    18   11-9   KLA-HO-DHC-JZB   43  40   46   39   65%
JFV       Female  24   12-9   KLA-HO-DHC-JZB   49  44   45   40   68%
IHH       Female  19   18-9   DHC-JZB-KLA-HO   32  40   40   44   60%
JT        Female  47   20-9   DHC-JZB-KLA-HO   45  43   53   48   73%
Average           38                           41  44   47   43   67%

Table II. Age, gender and scores for the 20 native listeners. The scores in the columns under the actors' initials are the numbers of correctly identified emotions out of 65. The column called Correct contains the average percentage of correctly identified emotions for that particular listener.
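The Correct column of Table II can be recomputed from the four per-actor scores. The sketch below assumes that the percentage is simply the sum of the four scores divided by the 4 x 65 utterances a listener judged; the function name `percent_correct` is introduced here for illustration only.

```python
UTTERANCES_PER_TEST = 65  # 13 utterances in 5 emotions for one actor


def percent_correct(scores):
    """Percentage of correctly identified emotions over all listening tests taken."""
    return 100 * sum(scores) / (UTTERANCES_PER_TEST * len(scores))


# Scores for HO, DHC, JZB and KLA, copied from Table II.
avh = percent_correct([48, 48, 52, 50])
he = percent_correct([33, 38, 44, 29])
print(round(avh), round(he))
```

For the listeners AVH and HE this gives values that round to the 76% and 55% listed in the table.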

No training session was offered to the subjects before the test, and they were not given any feedback during the session. The listening test did not always start with the same actor, but the cyclic order of the actors was always the same. The listeners were urged to ask for breaks when needed, and a break was always made after each actor.

The listeners were asked to judge the emotional content of each utterance by forced choice. They were allowed to hear the utterance several times before deciding on the emotional category. They were, however, not allowed to go back to compare with earlier utterances or to change an earlier choice. After each listening test, the listeners were asked to state whether they found it very easy, easy, neither easy nor difficult, difficult or very difficult to identify the emotional content. Then they were asked to state the factors that made them choose the different emotions. Finally, they were asked if they had anything else to report concerning the listening test of this speaker. The questionnaire for the listening test is shown in Appendix D.2 together with a translation.

4.2 Results from the Listening Test

The emotions were correctly identified in 67% of the cases, with individual listeners ranging from 55% to 80%, see Table II. Surprise and happiness were often confused, as were neutral and sadness. The confusions between the emotions are shown in Table III, where the rows are the emotions the actors tried to express and the columns are the emotions interpreted by the listeners.

Since the listeners were not given a training set before the listening test, it was tested whether they scored higher on the last 20 utterances than on the first 20 utterances. 63% of the first 20 utterances were perceived correctly, against 73% of the last 20 utterances. It should be mentioned, though, that not all emotions for each actor were represented in the first and last 20 utterances. The difference of 10 percentage points between the score of the first 20 utterances and that of the last 20 indicates that the listeners had adapted to the voice in question.

It was also tested whether female listeners were better listeners than male listeners. The female listeners perceived the correct emotions in 69% of the cases, whereas the male listeners perceived the correct emotions in 66% of the cases. At a significance level of 5%, no difference in their performance can be found in this test [Ross].
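The report does not state which statistical test was applied. As one plausible reading, the sketch below runs a pooled two-sample t-test on the per-listener percentages from the Correct column of Table II; both the choice of test and the use of the rounded percentages are assumptions made here, not details taken from the source.

```python
from math import sqrt
from statistics import mean, variance

# Correct column of Table II (rounded percentages), split by listener gender.
female = [76, 70, 66, 70, 73, 72, 59, 68, 60, 73]
male = [70, 69, 61, 55, 80, 65, 63, 67, 63, 65]


def pooled_t(a, b):
    """Student's two-sample t statistic with a pooled variance estimate."""
    dof = len(a) + len(b) - 2
    pooled_var = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / dof
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / len(a) + 1 / len(b)))


t = pooled_t(female, male)
T_CRITICAL = 2.101  # two-sided 5% critical value of Student's t, 18 degrees of freedom
print(f"t = {t:.2f}, significant at 5%: {abs(t) > T_CRITICAL}")
```

With these inputs the statistic stays well below the critical value, which agrees with the conclusion that no gender difference can be established at the 5% level.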

DES contains single words, sentences and passages. The type of utterance was found to be important for the score. The passages were the easiest to identify emotions from (they were perceived correctly in 68% of the cases). This could be because the passages, being the longest utterances, contain the most data. Between the single words (perceived correctly in 65% of the cases) and the sentences (perceived correctly in 76% of the cases) there was little difference.

Listeners' response in % (columns) for each emotion expressed by the actors (rows)

        Neu.               Sur.               Hap.               Sad.               Ang.
Neu.    60.8 (57.8-63.7)   2.6 (1.8-3.8)      0.1 (0.0-0.6)      31.7 (29.0-34.6)   4.8 (3.7-6.3)
Sur.    10.0 (8.3-12.0)    59.1 (56.1-62.1)   28.7 (26.0-31.5)   1.0 (0.5-1.8)      1.3 (0.7-2.1)
Hap.    8.3 (6.8-10.1)     29.8 (27.1-32.7)   56.4 (53.4-59.4)   1.7 (1.1-2.7)      3.8 (2.8-5.1)
Sad.    12.6 (10.7-14.8)   1.8 (1.2-2.8)      0.1 (0.02-0.6)     85.2 (82.9-87.2)   0.3 (0.1-0.9)
Ang.    10.2 (8.5-12.2)    8.5 (6.9-10.3)     4.5 (3.4-6.0)      1.7 (1.1-2.7)      75.1 (72.4-77.6)
Total   20.4 (19.3-21.5)   20.4 (19.3-21.5)   18.0 (16.9-19.0)   24.3 (23.1-25.5)   17.0 (16.0-18.1)

Table III. Confusions between the emotions for all speakers and listeners, with the associated 95% confidence intervals in parentheses. Neu. is short for neutral, Sur. for surprise, Hap. for happy, Sad. for sadness and Ang. for angry. Total shows how often the different emotions were chosen by the listeners.
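The report does not say how the confidence intervals in Table III were obtained. As an illustration, the sketch below computes a Wilson score interval for a proportion, assuming 1040 judgements per expressed emotion (20 listeners x 4 actors x 13 utterances); both the method and the sample size are assumptions made here.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(p_hat, n, confidence=0.95):
    """Wilson score interval for a binomial proportion p_hat observed over n trials."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Assumed number of judgements per expressed emotion.
TRIALS_PER_EMOTION = 20 * 4 * 13

neutral_low, neutral_high = wilson_interval(0.608, TRIALS_PER_EMOTION)
sad_low, sad_high = wilson_interval(0.852, TRIALS_PER_EMOTION)
print(f"neutral: {100 * neutral_low:.1f}-{100 * neutral_high:.1f}")
print(f"sadness: {100 * sad_low:.1f}-{100 * sad_high:.1f}")
```

Under these assumptions the intervals for correctly identified neutral and sadness agree with the tabulated 57.8-63.7 and 82.9-87.2 to one decimal place; the cells close to 0% need not match as closely.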

4 of the 9 sentences in DES were inquiring. It was tested whether the inquiring sentences were interpreted as surprised more often than the non-inquiring sentences. To this end, the relative representation of surprise in inquiring sentences, $R_{\mathrm{inq}}$, and the relative representation of surprise in non-inquiring sentences, $R_{\mathrm{non}}$, were calculated as:

$$R_{\mathrm{inq}} = \frac{\text{inquiring sentences interpreted as surprised}}{\text{all inquiring sentences that were surprised}}$$

$$R_{\mathrm{non}} = \frac{\text{non-inquiring sentences interpreted as surprised}}{\text{all non-inquiring sentences that were surprised}}$$

The relative representation of surprise in inquiring sentences, $R_{\mathrm{inq}}$, was 168%. In contrast, the relative representation of surprise in non-inquiring sentences, $R_{\mathrm{non}}$, was 67%. The test shows that the inquiring sentences were often interpreted as surprised even when another emotion was intended. In future, questions should not be used as semantically neutral sentences.

After each listening test, the subjects were asked to state whether they found the task very easy, easy, neither easy nor difficult, difficult or very difficult. They were further asked to describe the factors that made them choose the different emotions. Finally, they were asked for any further comments they might have on the listening test of this particular speaker.

75% of the listeners found it difficult or neither easy nor difficult to identify the emotions. With the exception of one listener, the actors DHC and JZB were rated between difficult and easy, whereas HO and KLA, again with the exception of one listener, were rated between very difficult and neither easy nor difficult. The judged difficulty together with the actual score for the different actors can be seen in Table IV.

In the final remarks that the listeners were asked to give about the listening test, some listeners stated that HO's emotional state was difficult to hear because of his deep voice. One listener also remarked on the fact that DHC was younger than KLA.

The listeners differed considerably in opinion as to whether the actors had exaggerated or underplayed the emotions. Some stated that it was like stage acting, especially for JZB, but others stated that the emotions were not clear enough. The listeners found it easiest to judge JZB, who also got the highest score, see Table II. All in all, the impression of the difficulties corresponded well with the actual score. In general the emotions of the two younger actors were judged to be easier to identify than the emotions of the older actors. This may indicate that emotions expressed by younger people are easier to identify.

Initials        HO      DHC     JZB     KLA     Total
Score           62.4%   68.3%   72.1%   66.5%   67.3%
very difficult  3       1       -       3       7
difficult       11      4       4       10      29
neither / nor   5       9       8       6       28
easy            -       5       5       1       11
very easy       -       -       1       -       1
Total           19      19      18      20      76

Table IV. The judged difficulty together with the actual score for the different actors. "Neither / nor" stands for "neither easy nor difficult", and a "-" means that no one chose this option.


5. Processing the DES Database

5.1 Labelling

The Hidden Markov Model Toolkit [Youn] was used for labelling the DES database. Three-state models, trained on EUROM.1 [Lind], were used.

The label format is XWAVES (the label format used in xwaves from Entropic). The format is based on ASCII text files, so it should be fairly easy to convert to another label file format. The format of the XWAVES label file is given in Table V. For an example of an XWAVES label file, see Table VI.

Appendix F shows label statistics for DES together with label statistics for EUROM.1.

5.2 Transferring the Speech Data

The emotional speech data was first transferred from DAT-tape to files by means of a UNIX user command, NARECORD. The data was transferred as linearly encoded data with 16 bits/sample. The sampled data was stored as 20 kHz consecutive packed binary data with the most significant byte first. Only one channel was transferred, resulting in a mono signal.

For the transfer a Panasonic Professional Digital Audio Tape Recorder SV-4100 was used together with a Townshend Computer Tools DAT-Link+ Digital Audio Interface.

The recorded material was then transferred from files to a CD-ROM. The structure of the DES CD-ROM is shown in Appendix E, where the file naming is also explained.
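The sample format described in Section 5.2 (16-bit linear samples, most significant byte first, 20 kHz, mono) can be decoded with a few lines of Python. The sketch below assumes that the samples are signed and that a file contains nothing but samples, with no header; neither point is stated explicitly in this report, and the function names are introduced here for illustration.

```python
import struct

SAMPLE_RATE_HZ = 20_000  # sampling frequency stated in Section 5.2


def decode_samples(raw: bytes) -> list:
    """Decode packed 16-bit big-endian samples (assumed signed) into integers."""
    if len(raw) % 2:
        raise ValueError("expected an even number of bytes (16 bits per sample)")
    count = len(raw) // 2
    # ">" selects big-endian byte order, "h" a signed 16-bit integer.
    return list(struct.unpack(f">{count}h", raw))


def duration_seconds(samples) -> float:
    """Duration of a mono signal sampled at SAMPLE_RATE_HZ."""
    return len(samples) / SAMPLE_RATE_HZ


example = decode_samples(bytes([0x00, 0x01, 0xFF, 0xFF, 0x7F, 0xFF]))
print(example, duration_seconds(example))
```

A real file would be read with `open(path, "rb").read()` and passed to `decode_samples`; the path and file naming are explained in Appendix E.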

    #
    end ccode label_name
    end ccode label_name
    ...
    end ccode label_name

Table V. XWAVES label format, where end is the boundary of the label label_name and ccode is a color code used by xwaves. end is specified in seconds. Any value may be used for the ccode, but a value of 121 is often used.

    #
    0.052436 121 SIL
    0.165000 121 f
    0.264999 121 A
    0.310750 121 v
    0.346749 121 SIL

Table VI. An example of an XWAVES label file.
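Because the label files are plain ASCII, they are straightforward to read programmatically. The sketch below parses the layout of Tables V and VI into segments; it assumes that each label starts where the previous one ends (with the first starting at 0 s), and the names `Segment` and `parse_xwaves_labels` are introduced here for illustration.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    start: float  # seconds; assumed to equal the previous label's end time
    end: float    # seconds; the boundary given in the file
    ccode: int    # colour code used by xwaves
    label: str


def parse_xwaves_labels(text: str) -> List[Segment]:
    """Parse 'end ccode label_name' lines that follow the '#' separator line."""
    lines = text.splitlines()
    body_start = 0
    for i, line in enumerate(lines):
        if line.strip() == "#":
            body_start = i + 1  # skip everything up to and including '#'
            break
    segments = []
    previous_end = 0.0
    for line in lines[body_start:]:
        if not line.strip():
            continue
        end_text, ccode_text, label = line.split(None, 2)
        end = float(end_text)
        segments.append(Segment(previous_end, end, int(ccode_text), label.strip()))
        previous_end = end
    return segments


EXAMPLE = """#
0.052436 121 SIL
0.165000 121 f
0.264999 121 A
0.310750 121 v
0.346749 121 SIL
"""

for segment in parse_xwaves_labels(EXAMPLE):
    print(f"{segment.start:.6f} {segment.end:.6f} {segment.label}")
```

Converting to another label format then amounts to writing out the `start`, `end` and `label` fields in the layout the target tool expects.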


6. References

[Aarh] Aarhus Teaters Lydstudie, Skolegade 9, 3. sal, 8000 Århus C, Denmark.

[Klas] G. Klasmeyer 'Emotions in Speech' Institut für Kommunikationswissenschaft, Technical University of Berlin, 1995.

[Lind] B. Lindberg & H. Christensen (1995) 'Documentation of the Danish EUROM.1 Database', Esprit project 2589 (SAM) Multi-lingual speech input/output assessment, methodology and standardisation. CPK, Denmark.

[Murr] Murray & Arnott (1992) 'Towards the simulation of emotion in synthetic speech: A review of the literature on human vocal emotion' JASA 93(2) p.1100

[Olse] J. Ø. Olsen (1995) Quality Assessment of the Danish EUROM-1 Corpus, Center for PersonKommunikation, Aalborg University, November 28, 1995.

[Ross] Ross, Sheldon M (1987), Introduction to Probability and Statistics for Engineers and Scientists, John Wiley & Sons, Inc.

[Sche] Scherer 1995. Further information will follow!

[Tide] 'Tide Project: TP1174-VAESS, Technical Annex', 26-6-1995.

[Voet] Jan Voetmann, Lydteknisk Institut (Institute of sound techniques), Bygning 356, Akademivej, 2800 Lyngby, Denmark.

[Well] J. C. Wells 'Computer-coded Phonemic Notation of Individual Languages of the European Community' Journal of the International Phonetic Association (1989) 19(1), 31-54.

[Will] Williams, C.E & Stevens, K.L (1972) Emotions and speech: Some acoustical correlates, Journal of Acoustic Soc. Am. 52:4, part 2, pp. 1238-1250.

[Youn] Steve Young et al. (1996) HTK - Hidden Markov Model Toolkit (2.0), Entropic Cambridge Research Laboratory.

[Öste] Öster, Anne-Marie & Risberg, Arne (1986) The identification of the mood of a speaker by hearing impaired listeners, Speech Transmission Lab. - QPSR 4.


A. Utterances to be Recorded

This Appendix gives a translation of each utterance together with a phonotypical transcription written in SAM-PA [Well]. For each word (written in italic) there may be several phonotypical transcriptions (separated by a comma), since the transcription covers all possible pronunciations. The phonotypical transcription was made by Tom Brøndsted, CPK.

A.1 Single Words

1. Ja. Yes.Ja ja

2. Nej. No.Nej nAj

A.2 Sentences

1. Du er en sød dreng.
   You are a nice boy.
   Du [du] er [aQ] en [en] sød [s2D] dreng [dRaN]

2. Jeg er ikke sulten.
   I am not hungry.
   Jeg [jAj, jA] er [aQ] ikke [eg@, eg] sulten [suldn, suld@n]

3. Jeg ved det heller ikke.
   I don't know either.
   Jeg [jAj, jA] ved [ve, veD] det [de] heller [hElQ] ikke [eg@, eg]

4. Hvad er det?
   What is this?
   Hvad [va, vaD] er [aQ] det [de]

5. Hvor er du?
   Where are you?
   Hvor [v@] er [aQ] du [du]

6. Hvor skal du hen?
   Where are you going?
   Hvor [v@] skal [sga, sgal] du [du] hen [hEn]

7. Kom med mig!
   Come with me!
   Kom [kQm] med [mED, mE] mig [mAj, mA]

8. Kommer du her igen?
   Is it you again?
   Kommer [kQmQ] du [du] her [hEQ, heQ, hA] igen [igEn]

9. Jeg synes vi mangler nogle, som er lidt længere.
   I think we need some that are a little longer.
   Jeg [jAj, jA] synes [syns] vi [vi] mangler [mANlQ] nogle [no@n] som [sQm] er [aQ] lidt [led] længere [lENQ]

A.3 Passages

1. Hej! Kom ind! Jeg må fortælle dig, hvad der skete idag. Da jeg tog til stationen for at møde Peter, så jeg en gammel dame, der solgte blomster tæt ved markedspladsen. ...

Hi! Come in! I have to tell you, what happened today. When I went to the station to meet Peter, I saw an old lady selling flowers near the marketplace. ...

Hej [hAj] Kom [kQm] ind [en] Jeg [jAj, jA] må [mO] fortælle [fQtEl@] dig [dAj, dA] hvad [va, vaD] der [dA, d{:Q] skete [sged, sged@] idag [id{:] Da [da] jeg [jAj, jA] tog [to] til [te, tel] stationen [sdaSon@n, sdaSon] for [fQ] at [Q] møde [m2D@, m2D] Peter [pedQ] så [sO] jeg [jAj, jA] en [en] gammel [gAm@l, gAml] dame [dam@, dam] der [dA, d{:Q] solgte [sQld@] blomster [blQmsdQ] tæt [tEd] ved [veD, ve] markedspladsen [mAk@Dsplas@n, m"Ak@Dsplasn]

2. I det følgende er gengivet den opsummering af projektet, som findes i den tekniske projektbeskrivelse. Opsummeringen af projektet er gengivet her, for at sætte det arbejde, der er udført hos CPK ind i et større perspektiv. Hos CPK vil den største vægt blive lagt på emner markeret med en stjerne. Fordelingen af mande-måneder på de forskellige arbejdsopgaver er givet i afsnit 1.2.

The following is the project summary as it is given in the Technical Annex. The project summary is reproduced here to set the work done at CPK into a larger perspective. At CPK the major effort will be spent on the subjects marked with a star. The manpower distribution on the different work packages is given in section 1.2.

I [i] det [de] følgende [f2lj@n@] er [aQ] gengivet [gEngi@D, gEngiuD] den [dEn] opsummering [QbsumeQeN] af [a] projektet [proSEgd@d, proS{:gd@D] som [sQm] findes [fen@s, fens] i [i] den [dEn, dn] tekniske [tEgnisg@] projektbeskrivelse [proSEgdbesgRiuls@] Opsummeringen [QbsumeQeN@n] af [a] projektet [proSEgd@d, proS{:gd@D] er [aQ] gengivet [gEngi@D, gEngiuD] her [hEQ, heQ, hA] for [fQ] at [Q] sætte [sEd@, sEd] det [de] arbejde [Ab{:jd@, Ad{:id] der [dA, d{:Q] er [aQ] udført [uDf2Qd] hos [hOs] CPK [sepekO] ind [en] i [i] et [ed] større [sd&Q@] perspektiv [p{:QsbEgtiu] Hos [hOs] CPK [sepekO] vil [ve, vel] den [dEn, dn] største [sd&Qsd@] vægt [vEgd] blive [bli, bliu] lagt [lAgd] på [pO] emner [EmnQ] markeret [mAkeQ@D, mAke@D] med [mE, mED] en [en] stjerne [sdjaQn@, sdjaQn] Fordelingen [fQtdeleN@n] af [a] mande-måneder [man@mOn@DQ] på [pO] de [di] forskellige [fQsgEli:, fQsgEli@] arbejdsopgaver [AbAjdsQbgavQ] er [aQ] givet [giuD, giv@D, giv@d] i [i] afsnit [Ausnid] 1 [ed] 2 [to]

A.4 Utterances for the Target Voices

The following utterances together with the numbering are taken from the Danish EUROM.1 database. A phonotypical transcription in SAM-PA [Well] can be found in the description of EUROM.1 [Lind].

A1O0 Jeg har et problem med min vandvarmer. Vandstanden er for høj, så vandet bliver ved med at løbe over. Ku' I sende en mand ud at kigge på det tirsdag formiddag? Det er den eneste dag, hvor jeg kan være hjemme i denne uge. Jeg vil gerne have, at I ringer tilbage og fortæller mig, om det kan blive som aftalt.

A2P0 I går aftes åbnede jeg hoveddøren for at lukke katten ud. Det var en skøn aften, så jeg slentrede ned gennem haven for at få lidt frisk luft. I det samme hørte jeg entredøren smække. Jeg forstod straks, at jeg var smækket ude. For at gøre det endnu værre blev jeg arresteret, da jeg forsøgte at bryde ind i mit eget hus.

A3Q0 Vejmeldingen for Storkøbenhavn i dag tirsdag den 28. november. Der er isglatte veje i hele regionen, der er blevet saltet siden klokken 4. Trafikken er tiltagende på alle indfaldsvejene. Der rapporteres om begyndende kødannelse ved Hans Knudsens Plads. Frakørslen på Helsingørmotorvejen ved Jægersborgvej er spærret på grund af et større harmonikasammenstød.

A4R0 Vi har en udmærket sekretær ansat hos os. Desværre har hun sagt op og rejser med udgangen af næste måned. Familien flytter til New Zealand, de skal rejse via Malaysia og Thailand. Vi kommer allesammen til at savne hende. Hun er den type, der altid kan få andre i godt humør.

A5F0 Tøvejret forvandlede hurtigt snefnuggene til grå pytter på pladsen mellem hytten og pigtrådshegnet.

Dværgpilene og tjørnebuskene langs det faldefærdige markhegn myldrede af kvidrende småfugleyngel.

Tjeneren fortalte undskyldende, at menukortet desværre kun bød på kylling, klipfisk og kørvelsuppe med æg.

Z0 Da jeg gik i gymnasiet, spiste jeg altid en stor portion ymer med mysli hver morgen.

Bjarne - det er ham den nyrige med klubbens smarteste golftøj og det elendigste golfhandicap.

B1O1 Jeg vil gerne tale med vandforsyningen. Reparationsarbejdet på hovedledningen udenfor min ejendom er endnu ikke færdigt, og min kælder er oversvømmet. Den sidste, jeg talte med, var temmelig uforskammet. Han fortalte mig, at alle reparationshold var optaget i de næste to uger. Vil det sige, at jeg skal bruge min kælder som svømmebassin indtil da?

B2P1 Jeg har altid syntes, at det var svært at sove i toget. For det første kan jeg aldrig finde mig tilrette i sædet. For det andet taler de øvrige passagerer for højt, eller snorker. Og for det tredie så er der den konstante støj fra hjulene. Hvis jeg endelig falder i søvn, så bliver jeg søreme vækket af billetkontrolløren.

B3Q1 Jeg vil gerne tale med serviceafdelingen. Mit fjernsyn har været til reparation i næsten tre uger nu, og jeg vil vide, hvornår det er færdigt. De hentede det den 13. og lovede, at det skulle være færdigt i løbet af en uge. Jeg er klar over, at der har været problemer med reservedele, men nu har det varet længe nok. Kan jeg få en endelig dato?

B4R1 Jeg hader mandag morgener, især når det regner. Gaderne er fedtede, og jeg er nødt til at gå meget forsigtigt til stationen. Jeg ville meget gerne kunne tage en taxi, men jeg har ikke råd. Min løn er så ringe, at jeg knap nok har penge til sko! Bare jeg ville vinde en million, så kunne jeg købe en bil.

B5F1 Gertrud Olsens pakkasse indeholdt en rusten mjødøse, en persianer muffe og en antik fonografmaskine.

Chefkonsulenten blev vred, da hans grønne MG punkterede på vej hjem fra Bellevue.

Jeg hørte på konventet, at den grønlandske rødørn er ukendt for de fleste.

Z1 Monopoltilsynets nye minimalpriser på relækasser og giropapir virker pjattede.

Kongens triumftog blev hurtigt afbrudt af fyråb fra demonstranter i grønlige flyjakker.

A.5 Utterances only for the Female Target Voice

The following utterances together with the numbering are taken from the Danish EUROM.1 database. A phonotypical transcription in SAM-PA [Well] can be found in the description of EUROM.1 [Lind].

K1O2 Hvad vil 1992 i virkeligheden betyde for det almindelige menneske? Det bliver naturligvis lettere at krydse grænserne for at søge arbejde. Man bliver i stand til at uddanne sig i ét land og arbejde i et andet. Det bliver måske endda muligt at starte på studierne et sted og tage eksamen et andet sted. Jeg er spændt på, hvad der vil ske med prisniveauet?

K2P2 Det er brandvæsenet. Vi forsøger at lokalisere et alarmopkald, der er kommet igennem uden adresseangivelse. Opkaldet er fra en telefonboks. Manden er desperat, og det eneste vi kan tyde, er de sidste fire cifre: 70 og 78. Vil I prøve at spore samtalen med det samme.

K3Q2 Jeg er ked af, jeg måtte melde afbud til festen i lørdags. Jeg havde glædet mig til at se jer igen. Men desværre havde jeg et uheld, lige før jeg skulle ud af døren. Jeg skulle ned i kælderen efter noget og gad ikke tænde lyset. På vej op igen snublede jeg på trappen og forstuvede min ankel.

K4R2 Kan de anbefale en af restauranterne her i nabolaget. Jeg er lige ankommet her i eftermiddags. Jeg er interesseret i noget virkelig exotisk. En polynesisk eller indonesisk restaurant for eksempel. Det skal helst ikke være udelukkende vegetarisk.

K5F2 Pøj, hvor de smagte de røde pølser, vi fik ved pølsevognen på trafikpladsen i Tønder.

Ifølge landbrugskonsulenten skal man ikke rynke på næsen af gotlandsfårenes kødydelse.

Sypigens vulgære knækorte kjole gjorde naturligvis lykke ved prinsernes sidste hofbal.

Z2 Den lokalpatriotiske prorektor røbede, at han er tidligere jysk juniormester i langrend.

Chefstewardessens smukke opalring funklede om kap med stjernerne på det dybblå himmelhvælv.

D1O3 Min søster er forfærdelig mørkeræd. Hun nægter at gå alene ud efter mørkets frembrud. Hun vil altid have én eller anden til at følge sig. Min far har foreslået hende at tage hunden med ud. Den vil kunne forsvare hende og gø, hvis en truende situation skulle opstå.

D2P3 Jeg vil gerne bestille en taxa til i morgen tidlig. Navnet er Jokumsen, Amalie Allé nummer 7, Charlottenlund. Jeg skal være i Kastrup Lufthavn inden klokken 6:45, og jeg skal have ekstra meget bagage med. De må love mig at være her senest klokken 5:30.

D3Q3 Der er genvej gennem mosen til mit sommerhus. Nogle af de lokale beboere påstår, at det spøger dernede. Der er ikke mange, der er begejstrede for at gå den vej efter mørkets frembrud. Naturligvis tror jeg ikke på sådan noget overtroisk vrøvl. Jeg holder af turen og nyder den maleriske udsigt.

D4R3 Min kone har et meget kompliceret rejseprogram i næste måned. Kunne De give mig nogle råd om den mest økonomiske løsning. Hun skal til en række møder fra klokken 9 til klokken 13 i Paris, Brügge, Frankfurt, Rom og Hamburg i løbet af fem dage. Kan De finde nogle passende aftenfly og hotelarrangementer? Min kone vil helst undgå store upersonlige hoteller.

D5F3 Høtyvene og alle de nyindkøbte frøpakker flød hulter til bulter i de snavsede vandpytter.

Der er flere af hans jødiske bekendte, der stadigvæk kan tale lidt jiddish til husbehov.

Man siger, at madelskere er vilde med så simple ting som hønsekødssuppe og Høng camenbert.

Z3 Med skælvende hænder og knastør hals fjernede Sonja den utætte toppakning på motorblokken.

Folkemasserne blev urolige, da nationalrådet erklærede al nationalpoesi for bandlyst.

E1O4 Det New Zealandske rugby hold kaldes "De helt sorte". De er alle meget høje og brede. De spillede mod Danmark i søndags. Selvom vi spillede vældigt godt, tabte vi med 17 mål. Der var ingen rigtige slåskampe. New Zealænderne tabte aldrig bolden og skød altid lige midt i målet. Jeg synes, det var en god kamp.

E2P4 Den gamle fisker var en stor mand med mørkt krøllet hår og buskede øjenbryn. Han havde sin helt bestemte plads på kajen. På stille dage sad han og underholdt tilhørerne med sine skipperskrøner fra de syv have. Men når bådene kom ind, var han som forandret. Hans ru stemme og vilde gestikuleren holdt alle havnearbejderne i fuld gang.

E3Q4 Jeg prøver at komme i kontakt med Jørgen Juliussen i Ribe. Han er flyttet fra Jernbanegade nummer 17 til en anden adresse i Ribe. Kan De oplyse mig om hans nye nummer? Han er flyttet for ca. tre måneder siden. Så vidt jeg ved, har han ikke fået hemmeligt nummer.

E4R4 Hej, jeg har en skøn ferie her i Lønstrup. Vejret er varmt, solen skinner, og havet er bare ubeskriveligt. I går gik jeg en tur oppe langs klinterne. Det blæste temmelig meget, og jeg var nær blæst ned. Jeg er blevet meget solbrændt, men man kan tydelig se, at jeg spiser for meget is.

E5F4 Joachim Palms sangrøst har skaffet ham adskillige fine præmier ved mange skolesangdyster.

Vølund-vaskemaskinernes nypris er steget vanvittigt i løbet af de sidste fire et halvt år.

Vi er omsider nået til det punkt, hvor den konkrete plan nødvendigvis må offentliggøres i radio og T.V.

Z4 Hønsene skreg og vred sig voldsomt for at slippe væk fra genboens nytjærede halvtag.

Naboens yngste tøs kylede fnisende mine Lacoste golfsko i svømmepølen med et stort plask.

B. Recording Conditions

B.1 Recording Procedure

Prior to the recordings the speakers were given a very brief introduction to the recordings together with the prompting text on paper. The session started by giving the actors a more thorough description of the experiment. The actors were then given a few minutes to re-examine the text and ask for explanations regarding pronunciation.

The actor was then taken to the recording room and seated in a chair at a table on which the microphone was placed. The prompting text was placed on the table. Before the recording, the actor was asked to speak in different emotions so that the recording level could be set. The actors were urged to ask for a break at any time when needed, with the aim of creating a non-stressful atmosphere for the actors.

Monologues from different actors were recorded in separate sessions, so that the inter-speaker variations can be investigated. The actors were urged to use their own everyday emotional expression, and not the exaggerated emotional expression known from stage acting. All the utterances were spoken with one emotion, and then the utterances were repeated with the next emotion. The actors were advised to take some time to imagine a specific emotional situation for each utterance, until they really were in a specific emotional state; this is important for collecting natural-sounding emotional speech.

B.2 Recording Environment

DES was recorded in an acoustically damped sound studio at Aarhus Theatre [Aarh]. The studio was built in 1989 with Jan Voetmann [Voet] as advisor. The studio is “floating” on rubber absorbers, as is the operator room. Between the studio and the operator room there is an angled window with three layers of glass of different thicknesses.

The operators could contact the actors via an intercommunication system. In addition, the operators were continuously listening to the actors via the recording chain. The actors were seated at a table with their arms on the table in order to keep the same distance to the microphone during the whole recording.

B.3 Recording Equipment

For the recordings, which were made on the 20th of June 1996 at Aarhus Theatre [Aarh], the same fixed configuration was used for all the actors:

- AKG 414 ULS microphone
- Amek Angela 36 ch. in-line mixer desk
- PANASONIC DAT SV 3500

During the recordings a sound blanket was drawn in the studio.

In the mixer a 5 Hz high-pass filter cut off the lowest frequencies before the whole session was recorded on a DAT tape at 48 kHz sampling frequency and in stereo. The DAT tape recorder was started when the gain was set and was only stopped when the actor had a break or left the studio. In this way not only the required utterances were recorded but also the communication between the operators and the actor. The two operators present during the recordings were Andreas Bertelsen, working at the sound studio [Aarh], and Inger S. Engberg, CPK.

C. Speaker Profile

First name : Henning

Initials : HO

Age : 52 years

Sex : Male

For how many years have you worked as an actor? : 27 years

Height : 178 cm

Weight : 85 kg

Smoker : Yes

First name : Dorthe

Initials : DHC

Age : 34 years

Sex : Female

For how many years have you worked as an actor? : 7 years

Height : 178 cm

Weight : 63 kg

Smoker : Yes

First name : Jens

Initials : JSB

Age : 38 years

Sex : Male

For how many years have you worked as an actor? : 13 years

Height : 184 cm

Weight : 88 kg

Smoker : Yes

First name : Karen-Lis

Initials : KLA

Age : 52 years

Sex : Female

For how many years have you worked as an actor? : -

Height : 170 cm

Weight : 79 kg

Smoker : Yes

D. Listening Test

D.1 Procedure for the Listening Test

Thirteen utterances (two single words, nine sentences and two passages; see Appendix A) spoken with five different emotions by four different speakers result in 260 utterances, for which a listening test is designed. The utterances are divided into four groups, one for each speaker, and within each group the utterances are presented in random order. Before each utterance is played to the listener, a number corresponding to the number in the questionnaire shown in Section D.2 is spoken. This is done to help the test person concentrate on the emotional content of the utterances and nothing else.

When a test person arrives he or she is given four questionnaires, one for each of the four speakers (see Section D.2). The listener is told to decide which emotion, including neutral, each presented utterance is spoken in. The test person is also told that he or she may ask for utterances to be repeated, for breaks, and for whatever else is needed. In any case there will be breaks between the different speakers.

The listening test described will be performed with each of the 20 listeners.
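The grouping and randomisation described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the procedure actually used to prepare the test material: the speaker, emotion and utterance codes are borrowed from the file naming in Appendix E.2, and the function name build_playlists and the random seed are hypothetical.

```python
import random
from itertools import product

# Codes borrowed from the file naming convention in Appendix E.2.
SPEAKERS = ["ho", "dhc", "jzb", "kla"]
EMOTIONS = ["neu", "sur", "hap", "sad", "ang"]
UTTERANCES = ["yes", "no"] + [f"se{i}" for i in range(1, 10)] + ["pa1", "pa2"]


def build_playlists(seed=0):
    """Return one randomly ordered, numbered playlist per speaker.

    Hypothetical helper: each entry is (questionnaire number, wave file name).
    """
    rng = random.Random(seed)  # fixed seed so the order can be reproduced
    playlists = {}
    for speaker in SPEAKERS:
        names = [f"w{emo}_{utt}.{speaker}"
                 for emo, utt in product(EMOTIONS, UTTERANCES)]
        rng.shuffle(names)  # random order inside the speaker group
        playlists[speaker] = list(enumerate(names, start=1))
    return playlists


playlists = build_playlists(seed=1996)
total = sum(len(entries) for entries in playlists.values())
print(total)  # 13 utterances x 5 emotions x 4 speakers = 260
```

Each speaker group contains 13 x 5 = 65 numbered items, so every listener hears 260 utterances in total.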

D.2 Questionnaire for the Listening Test

Spørgeskema til lyttetest
Questionnaire for the Listening Test

Lyttetest af DES (Danish Emotional Speech) Databasen, Aalborg Universitet, 1996.
Listening test of the DES (Danish Emotional Speech) Database, Aalborg University, 1996.

Lytterens navn: ________________________________
The listener's name

Køn (Gender): ________ Alder (Age): ________ Initialer (Initials): ________ Dato (Date): ________

Du kan få gentaget sætningen, det antal gange du finder nødvendigt, for at træffe en beslutning. Træf venligst en beslutning selvom du ikke er sikker!
You can have the sentence repeated as many times as you wish in order to make a decision. Please make a decision even if you are not sure.

# The following is similar for all four actors: HO, DHC, JSB and KLA, and therefore not shown here.#

Lyttetest af taler HO:
Listening test of speaker HO

Hvordan var det at bedømme taler HO's følelser?
Did you find the task of identifying the emotions:

Meget let (Very easy): ______ Let (Easy): ______ Hverken let eller svært (Neither easy nor difficult): ______ Svært (Difficult): ______ Meget svært (Very difficult): ______

Hvilke faktorer fik dig til at tro, at taleren var neutral:
Which factors made you believe that the speaker was neutral:
________________________________________________________________

Hvilke faktorer fik dig til at tro, at taleren var overrasket:
Which factors made you believe that the speaker was surprised:
________________________________________________________________

Hvilke faktorer fik dig til at tro, at taleren var glad:
Which factors made you believe that the speaker was happy:
________________________________________________________________

Hvilke faktorer fik dig til at tro, at taleren var bedrøvet:
Which factors made you believe that the speaker was sad:
________________________________________________________________

Hvilke faktorer fik dig til at tro, at taleren var vred:
Which factors made you believe that the speaker was angry:
________________________________________________________________

Yderligere bemærkninger til lyttetesten af taler HO:
Additional remarks on the listening test with HO:
________________________________________________________________

E. The DES data base CD-ROM

E.1 Directory Structure of the CD-ROM

[Figure: directory tree of the CD-ROM. Directories shown in the tree: DES; DOC; MALE, FEMALE; HO, JZB, DHC, KLA; ANN, WAV; ANGRY, SAD, HAPPY, SURPRISE, NEUTRAL, TARGET]

Figure E.I. The directory structure of the DES data base. The ANN directory has a structure similar to the WAV directory. The target directory is not present for actor KLA, as no additional neutral speech was recorded for her.

E.2 The File Naming

The file names are constructed as shown in Figure E.II.

<file type><emotion>_<utterance>.<initials>

File type: w, a
Emotion: neu, sur, hap, sad, ang, tar
Utterance: yes, no, se#, pa#, a#, b##
Initials: ho, dhc, jzb, kla

Figure E.II. Construction of the file names.

Each file name starts with either a w or an a, where w stands for wave, indicating that the file contains the speech signal, and a stands for annotation, indicating that the file contains the label file.

The next three letters give the emotion to which the file corresponds: neu stands for neutral, sur for surprise, hap for happy, sad for sad, ang for angry and tar for target. Target is also neutral, but the utterances in target are not represented in the other emotions, and they are not the same, or even present, for all actors.

After the underscore, _ , three characters are available for specifying the utterance. The two single words are named yes and no. The sentences are named se followed by a number from 1 to 9. The numbers correspond to the numbers on the prompting sheet, see Appendix A. Passages are shortened to pa followed by the number 1 or 2, which also corresponds to the numbering on the prompting sheet.

The utterances in the target directory are called a1, a2... a4, e1, e2... e4 and so on. There is also an a5, but it is split into 5 sentences, which are named a51 for the first sentence in a5, and so on.

After the dot, . , the initials of the actor are used: ho, dhc, jzb and kla. When a field is shorter than three characters, as with ho, no and a2, the remaining space is left unused; no underscore is added.

An example of a file name is whap_se5.ho, which contains the wave file of actor ho speaking sentence 5 with a happy emotion. Similarly, atar_b52.dhc contains the label file for actor dhc speaking line 2 of utterance b5 with a neutral emotion, recorded as a target voice.
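The naming convention can be checked mechanically. The following Python sketch splits a file name into its four fields; it is an illustration only and is not part of the database. The names parse_des_name and DesName are hypothetical, the utterance field is only checked for length (two or three characters), and folding names to lower case is an assumption about how the files appear on disk.

```python
import re
from typing import NamedTuple

FILE_TYPES = {"w": "wave", "a": "annotation"}
EMOTIONS = {"neu": "neutral", "sur": "surprise", "hap": "happy",
            "sad": "sad", "ang": "angry", "tar": "target"}

# <file type><emotion>_<utterance>.<initials>, e.g. whap_se5.ho
_PATTERN = re.compile(
    r"^(?P<ftype>[wa])"
    r"(?P<emotion>neu|sur|hap|sad|ang|tar)"
    r"_(?P<utt>[a-z0-9]{2,3})"          # yes, no, se5, pa1, a2, b52, ...
    r"\.(?P<actor>ho|dhc|jzb|kla)$"
)


class DesName(NamedTuple):
    file_type: str
    emotion: str
    utterance: str
    actor: str


def parse_des_name(name: str) -> DesName:
    """Split a DES file name into file type, emotion, utterance and actor."""
    match = _PATTERN.match(name.strip().lower())  # assumption: case is irrelevant
    if match is None:
        raise ValueError(f"not a DES file name: {name!r}")
    return DesName(FILE_TYPES[match["ftype"]], EMOTIONS[match["emotion"]],
                   match["utt"], match["actor"])


print(parse_des_name("whap_se5.ho"))
print(parse_des_name("atar_b52.dhc"))
```

A stricter version could list the valid utterance codes per actor, since the target utterances are not the same for all actors.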

F. Label Statistics

In [Olse] a study to assess the quality of the Danish EUROM.1 corpus was undertaken. The average duration of the phonemes for the passages and sentences of the many-speaker part of EUROM.1 is listed in Table F.I together with the average duration of the phonemes in the DES database. The table shows that the average phoneme durations in DES are close to those in EUROM.1.

The SIL model was used for modelling initial and final silence periods, and the SP model for optional between-word silences.

         EUROM.1 many speaker       DES total
Label    Count    AvrDur [ms]       Count    AvrDur [ms]
SIL       1106      563,4            1632      322,4
SP        3804      321,9            1053       81,4
p          776       83,1             329       82,9
b         1548       64,8             394       63,1
t         1505       94,8             397       95,3
d         4396       52,6            1548       54,2
k          989       92,3             309      104,2
g         2430       58,4             765       60,8
f         1625       92,7             411       80,3
s         4945       95,0            1394       96,3
v         1558       57,7             451       58,4
D         1867       75,9             550       73,7
j         1274       58,4             464       61,2
h         1203       62,0             334       50,3
m         2457       75,8             789       66,8
n         5416       67,0            1421       66,0
N          648       88,1             194       87,0
l         3440       53,5             794       43,7
R         1671       68,5             350       58,4
i         3420       74,8             908       71,8
e         3379       65,8            1064       65,8
E         2424       69,7             817       71,0
{:         551      124,4             154      124,4
a         3681       75,2            1110       84,2
A         2727       87,8             916       90,6
@         3182       57,2             978       44,4
y          490       89,7             114       84,6
2          695       93,3             183       81,3
9          579       91,6             121      101,0
u         1689       94,6             462       84,6
o         1032      109,8             289       91,1
O         1133      100,2             292       88,3
Q         5381       82,4            1494       68,7

Table F.I. Label statistics for EUROM.1 (the many-speaker part) [Olse] and the DES database. Count is the number of occurrences, AvrDur the average duration in milliseconds.
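To put a number on how close the two sets of durations are, the relative difference between the DES and EUROM.1 averages can be computed per label. The Python sketch below does this for a handful of rows copied from Table F.I (decimal commas written as points); the function name relative_difference and the choice of rows are illustrative assumptions, not part of the original analysis.

```python
# (label, EUROM.1 AvrDur [ms], DES AvrDur [ms]), copied from Table F.I
ROWS = [
    ("p", 83.1, 82.9),
    ("t", 94.8, 95.3),
    ("s", 95.0, 96.3),
    ("l", 53.5, 43.7),
    ("@", 57.2, 44.4),
    ("SP", 321.9, 81.4),
]


def relative_difference(eurom_ms, des_ms):
    """Difference of the DES average from the EUROM.1 average, in percent."""
    return 100.0 * (des_ms - eurom_ms) / eurom_ms


for label, eurom_ms, des_ms in ROWS:
    diff = relative_difference(eurom_ms, des_ms)
    print(f"{label:>3}  {diff:+6.1f} %")
```

For these rows the stops and fricatives differ by about one percent or less, l and @ are roughly a fifth shorter in DES, and the between-word silence SP is much shorter in DES than in EUROM.1.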