Post on 18-Dec-2015
![Page 1: Preliminary F0 Statistics for Young Swedish Males and Forensic Phonetics Jonas Lindh – jonas.lindh@ling.gu.se jonas Department](https://reader035.vdocuments.us/reader035/viewer/2022062515/56649d245503460f949fa93b/html5/thumbnails/1.jpg)
Preliminary F0 Statistics for Young Swedish Males and Forensic Phonetics
Jonas Lindh – jonas.lindh@ling.gu.se
http://www.ling.gu.se/~jonas
Department of Linguistics, Göteborg University and GSLT (Graduate School of Language Technology)
IAFPA 2006
Outline
• Background and Introduction
– F0 and Forensic Phonetics
– Modulation theory of speech
• Hypotheses
• Methods
• Results
– F0 statistics for young Swedish males
– Robustness test
– Vocal effort test
– Liveliness illustration
• Conclusions
• Future Work
Background and Introduction
• F0 is a reliable parameter for speaker identification (French, 1990; Hollien, 1990; Künzel, 1987; Nolan, 1983, in Braun, 1995).
• Technical, physiological and psychological factors (Braun, 1995).
• Fundamental frequency measures.
• Some previous studies and results.
Background and Introduction (Braun, 1995)
• Technical factors
– Tape speed unfortunately still a problem.
– Sample durations (50, 75, 14, 120 s?).
• Physiological factors
– Age, smoking, operations.
– Larynx size, shape and mass.
– Between-speaker variation.
• Psychological factors
– Noise level, emotions, time of day.
– Vocal effort, speaking rate, F0 dynamics, voice quality.
– Within-speaker variation.
Background and Introduction
• Fundamental frequency measures
– Average
– Standard deviation
– Median
– Interquartile range
– F0 mode
– Base value! Modulation theory of speech.
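The measures listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input is a list of voiced-frame F0 values in Hz, the F0 mode is taken as the centre of the fullest histogram bin, and the 5 Hz bin width, the function name `f0_summary` and the sample track are hypothetical choices, not taken from the slides. The base value is left out here because its formula is given on the F0 baseline slide.

```python
import statistics

def f0_summary(f0_hz, bin_width_hz=5.0):
    """Summarise voiced-frame F0 values (Hz) with common long-term measures."""
    values = sorted(f0_hz)
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    # F0 mode: centre of the fullest histogram bin (bin width is an assumption).
    counts = {}
    for v in values:
        index = int(v // bin_width_hz)
        counts[index] = counts.get(index, 0) + 1
    fullest = max(counts, key=counts.get)
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "median": statistics.median(values),
        "iqr": q3 - q1,
        "mode": (fullest + 0.5) * bin_width_hz,
    }

# Hypothetical F0 track; the 240 Hz value mimics an octave-jump outlier.
track = [100, 105, 110, 110, 112, 115, 120, 125, 240]
summary = f0_summary(track)
print(summary)
```

In this toy track the single outlier pulls the mean well above the median, which is the kind of sensitivity the hypotheses later in the talk address.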
Modulation theory of speech
• The theory /…/ considers speech signals as the result of allowing conventional gestures to modulate a carrier signal that has the personal characteristics of the speaker. This implies that in general the conventional information can only be retrieved by demodulation. In order to perceive the phonetic quality of a speech signal, listeners evaluate the deviations of the properties of the signal (F0, formant frequencies, etc.) from those they expect of a neutral vocalization produced by the speaker with properties given by his age, sex, vocal effort, speech rate, etc. (part of abstract, Traunmüller, 1994)
F0 Liveliness
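| Liveliness class | European lang. SD | European lang. N | Chinese lang. SD | Chinese lang. N |
|---|---|---|---|---|
| (4) Very high | 4.8 | + + | | |
| (3) High | 4.0 | + – – | | |
| (2) Moderate | 2.8 | – + – – – | 4.0 | – – |
| (1) Low | 2.1 | – | | |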
Average F0‑variation (SD in semitones) as a function of the type of speech as classified in.
Under ‘Type’, the speech samples are classified according to their expected liveliness (Traunmüller & Eriksson, 1995).
F0 Mean, SD and ‘liveliness’
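| Investigation | Type | n | Sex | Age | F0 (Hz) | SD (semitones) |
|---|---|---|---|---|---|---|
| Rappaport (1958), German | 1 | 190 | m | | 129 | 2.3 |
| Chevrie-Muller et al. (1967), French | 2 | 21 | m | 20–61 | 145 | 2.5 |
| Boë et al. (1975), French | 2 | 30 | m | | 118 | 2.8 |
| Takefuta et al. (1972), English | 4 | 24 | m | | 127 | 3.8 |
| Chen (1974), Mandarin Chinese | 2 | 2 | m | 30–50 | 108 | 4.1 |
| Rose (1991), Wú | 2 | 4 | m | 25–62 | 170 | 4.1 |
| Kitzing (1979), Swedish | 2 | 51 | m | 21–70 | 110 | 3.0 |
| Pegoraro Krook (1988), Swedish | 2 | 198 | m | 20–79 | 113 | 2.6 |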
F0 Mean, SD and ‘liveliness’
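| Investigation | Type | n | Sex | Age | F0 (Hz) | SD (semitones) |
|---|---|---|---|---|---|---|
| Johns-Lewis (1986), English: Conversation | 2 | 5 | m | 24–49 | 101 | 3.4 |
| Johns-Lewis (1986), English: Reading | 3 | 5 | m | 24–49 | 128 | 4.35 |
| Johns-Lewis (1986), English: Acting | 4 | 5 | m | 24–49 | 142 | 4.85 |
| Graddol (1986), English: Reading passage A | 2 | 12 | m | 25–40 | 119 | 3.6 |
| Graddol (1986), English: Reading passage B | 3 | 12 | m | 25–40 | 131 | 4.55 |
| Average/investigation | | 10 | m | | 124 | 3.4 |
| Average/balanced speaker | | 471 | m | | 119 | 2.8 |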
F0 Liveliness (Traunmüller & Eriksson, 1995)
• The SD of F0 increases with increasing ‘liveliness’ of the discourse.
• The SD of F0 seems to be larger in tone languages than in non‑tone languages.
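Since the liveliness figures are SDs in semitones rather than Hz, a short sketch of the conversion may help. It assumes the usual logarithmic definition (12 semitones per octave); the 100 Hz reference, the function names and the two sample tracks are hypothetical and chosen only for illustration.

```python
import math
import statistics

def hz_to_semitones(f0_hz, ref_hz=100.0):
    """Convert F0 values in Hz to semitones relative to ref_hz."""
    return [12.0 * math.log2(f / ref_hz) for f in f0_hz]

def sd_semitones(f0_hz):
    """Sample SD of F0 on the semitone scale; the reference cancels out."""
    return statistics.stdev(hz_to_semitones(f0_hz))

# Two hypothetical voices with the same contour, one an octave higher.
low_voice = [90, 100, 110, 120, 130]
high_voice = [2 * f for f in low_voice]

print(round(statistics.stdev(low_voice), 2), round(statistics.stdev(high_voice), 2))
print(round(sd_semitones(low_voice), 2), round(sd_semitones(high_voice), 2))
```

The SD in Hz doubles for the higher voice while the SD in semitones stays the same, which is why a semitone scale makes variation comparable across speakers with different F0 levels.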
F0 baseline (Traunmüller & Eriksson, 1995)
• Fb = Fmean – k·σ(F), where σ(F) is the standard deviation of F0.
• k is a constant (approx. 1.43).
• Approx. 5% of F0 values fall below Fb.
• Different liveliness, same Fb.
• Tested by changing the factor and not Fb when resynthesizing natural speech.
• ke = 0.156, 0.414, 0.704, 1.000, 1.290, 1.566, 1.830
• “Det finns folkstammar som äter både kattkött och hundkött” (“There are tribes that eat both cat meat and dog meat”).
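The baseline formula and the "different liveliness, same Fb" point can be illustrated with a small sketch. Two things here are assumptions rather than facts from the slides: F0 excursions are scaled linearly in Hz around Fb (the resynthesis may have used another scale), and the sample track and function names are hypothetical.

```python
import statistics

K = 1.43  # constant k quoted on the slide

def baseline(f0_hz, k=K):
    """Base value Fb = mean(F0) - k * SD(F0)."""
    return statistics.mean(f0_hz) - k * statistics.stdev(f0_hz)

def scale_excursions(f0_hz, fb, factor):
    """Scale every excursion from Fb by `factor` (assumed linear in Hz)."""
    return [fb + factor * (f - fb) for f in f0_hz]

# Hypothetical F0 track in Hz.
track = [95.0, 104.0, 110.0, 118.0, 121.0, 127.0, 133.0, 140.0, 152.0]
fb = baseline(track)

for ke in (0.156, 1.000, 1.830):
    scaled = scale_excursions(track, fb, ke)
    # The SD follows the factor, while the recomputed base value stays put.
    print(ke, round(statistics.stdev(scaled), 2), round(baseline(scaled), 2))
```

Under this assumption the mean and the SD both change with the factor, but they change in a way that leaves Fb fixed, so the base value behaves as a property of the carrier rather than of the liveliness.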
Hypotheses concerning F0 for young Swedish males
• The F0 median is more robust than the F0 mean when it comes to technical factors, i.e. less sensitive to outliers.
• The base value shows the least within-speaker variation of the presented measures within a voice modality (creaky voice, shouting or raising one’s voice being different modalities).
• The 5% limit frequency (alternative baseline) is more robust than the base value when the technical factor involves positive octave jumps.
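A toy simulation can make these hypotheses concrete. It is a sketch only: the flat ramp of F0 values, the three doubled frames standing in for positive octave jumps, and the use of the 5th percentile as the "5% limit frequency" are assumptions made for illustration, not the procedure used in the study.

```python
import statistics

K = 1.43  # base-value constant from the F0 baseline slide

def measures(f0_hz):
    """Mean, median, base value and 5% limit frequency of an F0 track (Hz)."""
    mean = statistics.mean(f0_hz)
    return {
        "mean": mean,
        "median": statistics.median(f0_hz),
        "base": mean - K * statistics.stdev(f0_hz),
        # First of 19 cut points = 5th percentile (assumed 5% limit frequency).
        "limit_5pct": statistics.quantiles(f0_hz, n=20)[0],
    }

# Hypothetical clean track: 41 frames from 100 to 140 Hz.
clean = [float(f) for f in range(100, 141)]

# Same track with three frames doubled, mimicking positive octave jumps.
jumped = list(clean)
for i in (10, 20, 30):
    jumped[i] *= 2.0

before, after = measures(clean), measures(jumped)
shift = {name: after[name] - before[name] for name in before}
for name, value in shift.items():
    print(f"{name}: {value:+.1f} Hz")
```

In this example the jumps raise the mean more than the median, and they lower the base value because the SD is inflated, while the 5% limit frequency does not move since the doubled frames all land at the top of the distribution.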
Methods
• The software Praat (Boersma & Weenink, 2005) was used to automatically extract F0 data from 109 young male speakers (20-30 years old).
– The group exists as such in the Swedia database.
– 62% of convicted criminals in Sweden 2004 (25-35).
• The recordings were taken from the Swedia database (<http://www.swedia.nu>) – spontaneous speech.
• Mean duration of 52.3 sec.
Methods
• Edited out interviewer.
• Manual check of octave jumps.
• Collection of the 5% limit frequency, F0 mode (histograms for each speaker’s F0 distribution) and interquartile range is ongoing.
Methods
• A small robustness test was made by measuring F0 for simultaneous recording on four different devices (material Livijn, 2004).
– The North wind and the sun (in Swedish).
– MCA, Cassette, Mobile and digital (Reference).
Methods
• Vocal effort test.
• 5 male speakers from Traunmüller & Eriksson (2000)
• High quality recordings.
• 5 distances/subject outdoors (0.3, 1.5, 7.5, 37.5 and 187.5 m)
– “Jag tog ett violett, åtta svarta och sex vita.” (“I took one violet, eight black and six white.”)
Methods
• A liveliness illustration
• Recordings of a simulated carrier signal + a neutral, happy, sad and angry voice.
Results
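Mean distribution of F0 for young males (YM)
| Bin label (Hz) | 70 | 80 | 90 | 100 | 110 | 120 | 130 | 140 | 150 | 160 | 170 | More |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| N speakers | 0 | 0 | 1 | 8 | 21 | 28 | 22 | 14 | 10 | 1 | 4 | 0 |
• Mean of means: 120.8 Hz; 65% between 100 and 130 Hz.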
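The 65% figure can be checked against the bar counts read off the histogram. The sketch below assumes each axis label is the upper edge of a 10 Hz bin (a common spreadsheet convention, not stated on the slide); the function name is hypothetical.

```python
# Axis labels and bar counts as read from the mean-F0 histogram.
bins = [70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, "More"]
counts = [0, 0, 1, 8, 21, 28, 22, 14, 10, 1, 4, 0]

def share_between(bins, counts, low_hz, high_hz):
    """Percentage of speakers in bins whose upper edge lies in (low_hz, high_hz]."""
    inside = sum(
        c for b, c in zip(bins, counts)
        if isinstance(b, int) and low_hz < b <= high_hz
    )
    return 100.0 * inside / sum(counts)

total = sum(counts)
share = share_between(bins, counts, 100, 130)
print(total, round(share, 1))
```

Under that reading the counts add up to the 109 speakers mentioned in the methods, and the share between 100 and 130 Hz rounds to the 65% quoted on the slide.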
Results
[Figure: F0 mean trend. x-axis: Speakers (0 to 110); y-axis: F0 mean (Hz), 70 to 180.]
Results
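Median distribution of F0 for young males (YM)
| Bin label (Hz) | 70 | 80 | 90 | 100 | 110 | 120 | 130 | 140 | 150 | 160 | 170 | More |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| N speakers | 0 | 0 | 5 | 10 | 31 | 22 | 21 | 10 | 6 | 2 | 2 | 0 |
• Mean of medians: 115.8 Hz; 68% between 100 and 130 Hz.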
Results
[Figure: F0 median trend. x-axis: Speakers (0 to 110); y-axis: Medians (Hz), 70 to 170.]
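Results
Standard deviations of F0 for young males (YM)
| Bin label (Hz) | 5 | 10 | 15 | 20 | 25 | 30 | 35 | 40 | 45 | 50 | 55 | More |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| N speakers | 0 | 2 | 15 | 27 | 19 | 14 | 15 | 11 | 4 | 1 | 1 | 0 |
• Mean of SDs: 24.1 Hz; 56% between 10 and 25 Hz.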
Results
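Baseline frequencies for young males (YM)
| Bin label (Hz) | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 100 | 110 | 120 | 130 | More |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| N speakers | 0 | 1 | 1 | 1 | 15 | 16 | 31 | 27 | 13 | 3 | 1 | 0 |
• Mean of baselines: 86.3 Hz; 68% between 70 and 100 Hz.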
Results
[Figure: F0 baseline trend. x-axis: Speakers (0 to 110); y-axis: Baselines (Hz), 40 to 140.]
Results
[Figure: F0 measure robustness. x-axis: Recording device (REF, REF_band, MOB, MOB_band, MCA, MCA_band, CAS, CAS_band); y-axis: Frequency (Hz), 20 to 140; series: Mean, STD, Base, Median, Alt-IQ-base, Alt-base.]
Results
[Figure: F0 measures of modal to shout. x-axis: Speakers (Harald, Henrik, Niclas, Peter, Prefekt, Stark), each at effort levels 1 to 5; y-axis: Hz, 5 to 345; series: Mean, STD, Base, Median, Alt-IQ-base, Alt-base.]
Results
[Figure: Liveliness illustration. x-axis: Liveliness (carrier, neutral, happy, sad, angry); y-axis: F0 (Hz), 0 to 110; series: Mean, STD, Base, Median, Alt-IQ-base, Alt-base.]
Conclusions
• The median is more robust than the mean when it comes to technical factors, i.e. less sensitive to outliers.
– Yes. Manual check and results confirm this.
• The base value shows the least within-speaker variation of the presented measures within a voice modality.
– Yes. Shouting or raising one’s voice can mean raising one’s base value.
– 68% within 30 Hz, same as median.
• The 5% limit frequency is more robust than the base value when the technical factor involves positive octave jumps.
– Yes. Robustness test.
Conclusions
• F0 should be measured in case work.
• If baseline values differ, there should be a reasonable explanation for why the difference does not indicate different speakers.
– Such as differences in ‘voice modality’ (creak, shout etc.).
Future work
• F0 mode (ongoing) and individual histograms.
• More measures on different “liveliness” levels for same and different speakers on different recording devices.
• Sample size vs. content.
• Authentic case material.
• Separate study of creaky voice.
References
Boersma, P. & Weenink, D. (2005) Praat: doing phonetics by computer (Version 4.3.27) [Computer program]. Retrieved October 7, 2005, from http://www.praat.org/
Braun, A. (1995) Fundamental frequency – how speaker-specific is it?, in Braun and Köster (eds) (1995): 9-23.
Brottsförebyggande Rådet: [www] Retrieved November 26, 2005, from http://www.bra.se/
Bruce, G. (1982) Developing the Swedish Intonation Model. In Working Papers 22 (Lund University, Dep of Linguistics), 51-116.
Jassem, W., Steffen-Batog, S., and Czajka, M. (1973) Statistical characteristics of short-term average F0 distributions as personal voice features, in W. Jassem (ed.) (1973) Speech Analysis and Synthesis vol. 3: 209-25, Warsaw: Polish Academy of Science.
Kitzing, P. (1979) Glottografisk frekvensindikering: En undersökningsmetod för mätning av röstläge och röstomfång samt framställning av röstfrekvensdistributionen (Lund University, Malmö).
Nolan, F. (1983) The Phonetic Bases of Speaker Recognition, Cambridge: Cambridge University Press.
Traunmüller, H. (1994) Conventional, biological, and environmental factors in speech communication: A modulation theory. Phonetica 51: 170-183.
Traunmüller, H. & Eriksson, A. (1995) The frequency range of the voice fundamental in the speech of male and female adults. Unpublished manuscript (can be retrieved from http://www.ling.su.se/staff/hartmut/aktupub.htm).
Traunmüller, H. & Eriksson, A. (1995) The perceptual evaluation of F0-excursions in speech as evidenced in liveliness estimations. J. Acoust. Soc. Am. 97: 1905-1915.
Traunmüller, H. & Eriksson, A. (2000) Acoustic effects of variation in vocal effort by men, women, and children. J. Acoust. Soc. Am. 107: 3438-3451.
Rose, P. (2002) Forensic Speaker Identification. New York: Taylor & Francis.
Rose, P. (1991) How effective are long term mean and standard deviation as normalisation parameters for tonal fundamental frequency? Speech Communication 10: 229-247.