The Reliability of Formant Measurements in High Quality Audio Data: The Effect of
Agreeing Measurement Procedures
Martin Duckworth, Kirsty McDougall, Gea de Jong, Linda Shockey
Introduction
• Formant measurement implicitly required legally in the UK in speaker comparison cases
• Measurements on analogue spectrograms had to be by hand and eye
• Measurements on digital spectrograms can be assisted by formant trackers; LPC is common
Introduction
• How replicable are measurements by eye on digital spectrograms?
• If LPC tracking is used, what can lead to variability?
− Software settings
− Point at which data is extracted
Study Aims
• What is required in order to make measurements more replicable?
• If software (but not method) is held constant and data is high quality, can different laboratories make the same F1-3 measurements?
• If the method of analysis is the same, does this lead to statistically improved reliability between laboratories?
Aims continued
• We are aiming to find a reliable means of obtaining formant values
• We are examining reliability, not validity
Data
• read speech from Cambridge DyViS database
• male
• Standard Southern British English
• aged 18-25
• 40 speakers: Set 1 (20 speakers), Set 2 (20 speakers)
Data
• 6 monophthongs: / iː, æ, ɑː, ɔː, ʊ, uː /
• 6 repetitions per vowel per speaker
• elicited in hVd contexts in sentences:
− It’s a warning we’d better HEED today.
− It’s only one loaf, but it’s all Peter HAD today.
− We worked rather HARD today.
− We built up quite a HOARD today.
− He insisted on wearing a HOOD today.
− He hates contracting words, but he said a WHO’D today.
Measurements
• Analysts from 3 labs – Cambridge, Plymouth, Reading
• Task: to measure F1, F2, F3 for each vowel token using Praat
• Set 1 – using individual, but constrained, methods
• Set 2 – after a meeting at which a single method is agreed
Set 1 Methods
• Measure the formants at a relatively early point in the vowel
• Measure formants over no more than 5 glottal pulses
• Use either:
− LPC tracking checked against the spectrogram or
− hand/eye measures
Set 2 Method
• Measure towards the start of the vowel
• Measure in a relatively steady early part of the vowel
• Measure around the vowel's maximum intensity
• Use a single time slice
Set 2 Method (continued)
• Use the LPC formant tracker adjusted for best visual fit
• When values generated by Praat are judged by visual inspection to be incorrect, replace them by correct values from a time-slice immediately preceding or following the slice being measured.
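The Set 2 procedure above can be sketched in code. This is a minimal illustration, not the labs' actual workflow: the `Slice` record, the `early_fraction` cut-off and the `looks_correct` flag (standing in for the analyst's visual judgement) are all assumptions introduced here, and the slices are assumed to have been exported from a formant and intensity analysis beforehand.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Slice:
    """One analysis time slice of a vowel (hypothetical structure)."""
    time: float                           # seconds from vowel onset
    intensity: float                      # dB
    formants: Tuple[float, float, float]  # tracker output for F1, F2, F3 in Hz
    looks_correct: bool = True            # analyst's visual check against the spectrogram


def measure_vowel(slices: List[Slice], vowel_duration: float,
                  early_fraction: float = 0.5) -> Optional[Tuple[float, float, float]]:
    """Pick a single slice at maximum intensity in the early part of the vowel.

    If the tracker values at that slice were judged incorrect, fall back to
    the immediately preceding or following slice that was judged correct.
    """
    # Restrict to the early portion of the vowel (the cut-off is an assumption).
    early = [i for i, s in enumerate(slices)
             if s.time <= early_fraction * vowel_duration]
    if not early:
        return None
    # Single time slice at the intensity maximum within that portion.
    best = max(early, key=lambda i: slices[i].intensity)
    # Try the chosen slice first, then its immediate neighbours.
    for i in (best, best - 1, best + 1):
        if 0 <= i < len(slices) and slices[i].looks_correct:
            return slices[i].formants
    return None  # no usable slice: leave for manual measurement


# Illustrative values only, not data from the study.
example = [
    Slice(0.02, 60.0, (650.0, 1500.0, 2500.0)),
    Slice(0.03, 66.0, (700.0, 1550.0, 2550.0), looks_correct=False),
    Slice(0.04, 64.0, (710.0, 1560.0, 2560.0)),
    Slice(0.09, 70.0, (720.0, 1570.0, 2570.0)),
]
print(measure_vowel(example, vowel_duration=0.12))
```

In this example the intensity maximum within the early portion falls on the slice flagged as incorrect, so the values from the preceding slice are returned instead.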
Results: HAD, F1
[Figure: F1 of HAD by lab (Lab1, Lab2, Lab3), Set 1 and Set 2]
Statistical Analysis
• 3 formants × 6 vowels × 2 datasets = 36 tests
• Two-way ANOVA
- repeated measures on the factor Lab (3)
- between-groups factor Speaker (20)
• If Lab significant at p < 0.05: pairwise comparisons with Sidak correction
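For reference, the Sidak correction applied to the pairwise comparisons can be worked through in a few lines. This is a generic sketch of the standard formula, not the authors' analysis script; the function and variable names are introduced here for illustration. With three labs there are three pairwise comparisons per test.

```python
from itertools import combinations


def sidak_alpha(alpha: float, m: int) -> float:
    """Per-comparison significance level that keeps the family-wise rate at alpha."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


def sidak_adjust(p: float, m: int) -> float:
    """Sidak-adjusted p-value for one of m comparisons."""
    return 1.0 - (1.0 - p) ** m


labs = ["Lab1", "Lab2", "Lab3"]
pairs = list(combinations(labs, 2))  # Lab1-Lab2, Lab1-Lab3, Lab2-Lab3
n_tests = 3 * 6 * 2                  # formants x vowels x datasets

per_comparison_alpha = sidak_alpha(0.05, len(pairs))
print(pairs)
print(n_tests, per_comparison_alpha)
```

An unadjusted pairwise p-value would therefore need to fall below roughly 0.017 to count as significant at a family-wise level of 0.05.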
Results: HAD, F1
[Figure: F1 of HAD by lab (Lab1, Lab2, Lab3), Set 1 and Set 2]
• Set 1: Lab significant; pairwise comparisons: p = 0.001, 0.000, 0.000
• Set 2: Lab significant, but pairwise comparisons NS
Results: HAD, F2
[Figure: F2 of HAD by lab (Lab1, Lab2, Lab3), Set 1 and Set 2]
• Set 1: Lab not significant; pairwise comparisons NS
• Set 2: Lab not significant; pairwise comparisons NS
Results: HAD, F3
[Figure: F3 of HAD by lab (Lab1, Lab2, Lab3), Set 1 and Set 2]
• Set 1: Lab significant; pairwise comparisons: NS, 0.000, 0.000
• Set 2: Lab not significant; pairwise comparisons NS
Summary - HAD
          Set 1          Set 2
          F1   F2   F3   F1   F2   F3
Lab       sig  NS   sig  sig  NS   NS
1 vs 2    sig  NS   NS   NS   NS   NS
1 vs 3    sig  NS   sig  NS   NS   NS
2 vs 3    sig  NS   sig  NS   NS   NS
• Lab row: main effect; 1 vs 2, 1 vs 3, 2 vs 3: pairwise comparisons
• Improvement: F1 pairwise comparisons and F3 go from sig in Set 1 to NS in Set 2
• Set 2: good news
Effect of Lab - 6 vowels
          Set 1          Set 2
          F1   F2   F3   F1   F2   F3
heed      sig  NS   sig  sig  NS   sig
had       sig  NS   sig  sig  NS   NS
hard      sig  sig  sig  NS   sig  sig
hoard     sig  sig  sig  sig  sig  NS
who’d     sig  sig  NS   sig  sig  sig
hood      sig  sig  sig  NS   sig  NS
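The table can be tallied to compare the two sets at a glance. The following is a small sketch with names introduced here for illustration; the sig/NS entries are transcribed from the slide, each tuple holding the results for F1, F2 and F3.

```python
# Main effect of Lab per vowel, as (F1, F2, F3), transcribed from the table.
set1 = {
    "heed":  ("sig", "NS",  "sig"),
    "had":   ("sig", "NS",  "sig"),
    "hard":  ("sig", "sig", "sig"),
    "hoard": ("sig", "sig", "sig"),
    "who'd": ("sig", "sig", "NS"),
    "hood":  ("sig", "sig", "sig"),
}
set2 = {
    "heed":  ("sig", "NS",  "sig"),
    "had":   ("sig", "NS",  "NS"),
    "hard":  ("NS",  "sig", "sig"),
    "hoard": ("sig", "sig", "NS"),
    "who'd": ("sig", "sig", "sig"),
    "hood":  ("NS",  "sig", "NS"),
}


def count_sig(results):
    """Number of vowel-by-formant tests with a significant Lab effect."""
    return sum(cell == "sig" for row in results.values() for cell in row)


def count_sig_by_formant(results):
    """Significant Lab effects counted separately for F1, F2 and F3."""
    return tuple(sum(row[k] == "sig" for row in results.values()) for k in range(3))


print(count_sig(set1), count_sig(set2))
print(count_sig_by_formant(set1), count_sig_by_formant(set2))
```

On this tally the Lab effect is significant in 15 of 18 tests for Set 1 and in 11 of 18 for Set 2, with the reduction confined to F1 and F3.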
Influence of Speaker
• Lab × Speaker interaction significant (p < 0.05) for F1-F3 of all 6 vowels in both Set 1 and Set 2
− certain speakers lead to measurement differences among labs
for example…
F3 of HARD (Set 2) means by speaker
[Figure: F3 of HARD (Set 2), means by speaker]
• Agreement across labs in most cases, but certain individuals lead to measurement differences among labs
Difficult cases: subject 42 F3
• Subject 42 HARD6 F3 = 3325 Hz
• Subject 42 HARD4 F3 = 2219 Hz
• Subject 42 HARD2 F3 = 2579 Hz
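To give a sense of scale, the three F3 values shown for Subject 42's HARD tokens can be compared directly. The sketch below only computes their spread; the 500 Hz flagging threshold is an arbitrary assumption for illustration, not a criterion from the study.

```python
# F3 values in Hz for three HARD tokens of Subject 42, as shown on the slide.
subject_42_f3 = {"HARD2": 2579, "HARD4": 2219, "HARD6": 3325}


def spread_hz(values):
    """Difference between the highest and lowest value."""
    values = list(values)
    return max(values) - min(values)


THRESHOLD_HZ = 500  # hypothetical cut-off for flagging a speaker for closer inspection

spread = spread_hz(subject_42_f3.values())
flagged = spread > THRESHOLD_HZ
print(spread, flagged)
```

A spread of this size across repetitions of the same vowel by one speaker is the kind of case that would call for checking the tracker output against the spectrogram.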
Difficult cases: subject 43 F3
• Subject 43 HARD2 F3? Subject 43 HARD1 F3?
Visual inspection vs formant tracker
[Figure: Subject 43 HARD2 and HARD1, F3 by visual inspection and by tracker]
The effect of intraspeaker variability, possibly voice quality
• This can affect:
− The visibility of formants
− The functioning of the LPC tracker
for example…
The effect of intraspeaker variability
[Figure: Subject 37, “..had today.”: HAD1 (F1 = ??) and HAD6 (F1)]
Discussion: Laboratory Effects
• Do different laboratories produce different formant values? YES
• Does replicating the measurement method reduce these differences? YES
• Could these be reduced further? YES
Other sources of variability
• Settings (e.g. No. of poles; No. of Formants in Praat)
• The exact point in the vowel at which the measure is taken
• The ‘readability’ of the spectrogram, which can be affected by speaker characteristics
Conclusion
• Developing standard ways of collecting formant values could assist comparisons between experts in casework
• If records are kept of time points, software and settings, then the measurement process can be replicated
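One way to keep such records is a small structured entry per measurement. The sketch below is an assumption about what a record might contain, not a format proposed in the talk: every field name is hypothetical and the values are illustrative rather than taken from the study.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class MeasurementRecord:
    """Everything needed to repeat one formant measurement (hypothetical fields)."""
    speaker: str
    token: str
    time_point_s: float    # time of the measured slice within the recording
    software: str
    software_version: str
    settings: dict         # tracker settings in force when the value was taken
    f1_hz: float
    f2_hz: float
    f3_hz: float


# Illustrative values only.
record = MeasurementRecord(
    speaker="S01",
    token="HAD1",
    time_point_s=0.045,
    software="Praat",
    software_version="unknown",
    settings={"number_of_formants": 5, "max_formant_hz": 5000},
    f1_hz=700.0,
    f2_hz=1550.0,
    f3_hz=2550.0,
)

# Store as JSON so another analyst can reload the record and repeat the measurement.
serialized = json.dumps(asdict(record), sort_keys=True)
restored = MeasurementRecord(**json.loads(serialized))
print(serialized)
```

Keeping the settings alongside the time point means a second analyst can reproduce both where and how a value was taken.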
Acknowledgements
• IAFPA Research Grant for travel expenses
• Economic and Social Research Council UK for funding the DyViS Project ‘Dynamic Variability in Speech: A Forensic Phonetic Study of British English’ [RES-000-23-1248]
• Other members of the DyViS project – Francis Nolan and Toby Hudson