The Reliability of Formant Measurements in High Quality Audio Data: The Effect of Agreeing Measurement Procedures

Martin Duckworth, Kirsty McDougall, Gea de Jong, Linda Shockey

Introduction

• Formant measurement is implicitly a legal requirement in UK speaker comparison cases

• Measurements on analogue spectrograms had to be by hand and eye

• Measurements on digital spectrograms can be assisted by formant trackers; LPC is common

• How replicable are measurements by eye on digital spectrograms?

• If LPC tracking is used, what can lead to variability?

− Software settings

− Point at which data is extracted

Study Aims

• What is required in order to make measurements more replicable?

• If software (but not method) is held constant and the data are high quality, can different laboratories make the same F1–F3 measurements?

• If the method of analysis is the same, does this lead to statistically improved reliability between laboratories?

Aims continued

• We are aiming to find a reliable means of obtaining formant values

• We are examining reliability, not validity

Data

• read speech from Cambridge DyViS database

• male

• Standard Southern British English

• aged 18-25

• 40 speakers: Set 1 (20 speakers), Set 2 (20 speakers)

• 6 monophthongs: /iː, æ, ɑː, ɔː, ʊ, uː/

• 6 repetitions per vowel per speaker

• elicited in hVd contexts in sentences:

− It’s a warning we’d better HEED today.

− It’s only one loaf, but it’s all Peter HAD today.

− We worked rather HARD today.

− We built up quite a HOARD today.

− He insisted on wearing a HOOD today.

− He hates contracting words, but he said a WHO’D today.

Measurements

• Analysts from 3 labs – Cambridge, Plymouth, Reading

• Task: to measure F1, F2, F3 for each vowel token using Praat

• Set 1 – using individual, but constrained, methods

• Set 2 – after a meeting at which a single method is agreed

Set 1 Methods

• Measure the formants at a relatively early point in the vowel

• Measure formants over no more than 5 glottal pulses

• Use either:

− LPC tracking checked against the spectrogram, or

− hand/eye measures

Set 2 Method

• Measure towards the start of the vowel

• Measure in a relatively steady early part of the vowel

• Measure around the vowel's maximum intensity

• Use a single time slice

Set 2 Method (continued)

• Use the LPC formant tracker adjusted for best visual fit

• When values generated by Praat are judged by visual inspection to be incorrect, replace them by correct values from a time-slice immediately preceding or following the slice being measured.
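The Set 2 procedure can be sketched roughly as follows. This is a minimal illustration, not the analysts' actual workflow: the function names, the frame values and the plausibility range are all hypothetical, and the `looks_correct` callback merely stands in for the analyst's visual judgement. Trying the preceding slice before the following one is an arbitrary choice here; the agreed method allows either.

```python
from typing import Callable, Optional, Sequence


def max_intensity_index(intensity_db: Sequence[float]) -> int:
    """Return the index of the frame with the highest intensity."""
    return max(range(len(intensity_db)), key=lambda i: intensity_db[i])


def measure_with_fallback(
    tracked_hz: Sequence[float],
    index: int,
    looks_correct: Callable[[float], bool],
) -> Optional[float]:
    """Take the tracked value at `index`; if it is rejected, try the
    immediately preceding slice, then the immediately following one."""
    for i in (index, index - 1, index + 1):
        if 0 <= i < len(tracked_hz) and looks_correct(tracked_hz[i]):
            return tracked_hz[i]
    return None  # no acceptable slice; left for the analyst to resolve


# Hypothetical frames from the early part of one vowel token.
intensity = [61.0, 66.5, 70.2, 69.8, 67.1]            # dB per time slice
f3_track = [2550.0, 2570.0, 3900.0, 2585.0, 2600.0]   # slice 2 is a tracking error

peak = max_intensity_index(intensity)
# Placeholder check standing in for visual inspection of the spectrogram.
f3 = measure_with_fallback(f3_track, peak, lambda hz: 2000.0 <= hz <= 3500.0)
print(peak, f3)
```

Here the slice at maximum intensity carries an implausible tracked value, so the value from the preceding slice is used instead.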

Results: HAD, F1

[Figure: HAD F1 measurements from Lab1, Lab2 and Lab3; panels: Set 1, Set 2]

Statistical Analysis

• 3 formants × 6 vowels × 2 datasets = 36 tests

• Two-way ANOVA

- repeated measures on the factor Lab (3)

- between-groups factor Speaker (20)

• If Lab significant at p < 0.05: pairwise comparisons with Sidak correction
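As a rough illustration of the test count and of the Sidak correction for the three pairwise lab comparisons, the sketch below uses the standard Sidak adjustment, 1 − (1 − p)^m. The raw p-values are hypothetical, and whether the statistics package used in the study computes the adjustment in exactly this form is an assumption.

```python
from itertools import product

FORMANTS = ["F1", "F2", "F3"]
VOWELS = ["heed", "had", "hard", "hoard", "hood", "who'd"]
DATASETS = ["Set 1", "Set 2"]

# One ANOVA per formant x vowel x dataset combination.
tests = list(product(FORMANTS, VOWELS, DATASETS))


def sidak_adjust(p_values):
    """Sidak-adjusted p-values for a family of m comparisons."""
    m = len(p_values)
    return [1.0 - (1.0 - p) ** m for p in p_values]


# Hypothetical raw p-values for Lab 1 vs 2, 1 vs 3, 2 vs 3.
raw = [0.004, 0.020, 0.300]
adjusted = sidak_adjust(raw)
significant = [p < 0.05 for p in adjusted]

print(len(tests))
print([round(p, 4) for p in adjusted], significant)
```

With three comparisons, a raw p of 0.020 no longer passes the 0.05 threshold after adjustment, which is how a significant main effect of Lab can coexist with non-significant pairwise comparisons.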

Results: HAD, F1

[Figure: HAD F1 by lab for Set 1 and Set 2, annotated with pairwise comparison p-values]

Set 1: Lab significant; pairwise comparisons: 0.001, 0.000, 0.000

Set 2: Lab significant, but pairwise comparisons NS (NS, NS, NS)

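Results: HAD, F2

[Figure: HAD F2 by lab for Set 1 and Set 2, annotated with pairwise comparison results]

Set 1: Lab not significant; pairwise comparisons: NS, NS, NS

Set 2: Lab not significant; pairwise comparisons: NS, NS, NS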

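Results: HAD, F3

[Figure: HAD F3 by lab for Set 1 and Set 2, annotated with pairwise comparison p-values]

Set 1: Lab significant; pairwise comparisons: NS, 0.000, 0.000

Set 2: Lab not significant; pairwise comparisons: NS, NS, NS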

Summary - HAD

                        Set 1              Set 2
                     F1   F2   F3       F1   F2   F3
Lab (main effect)    sig  NS   sig      sig  NS   NS
Pairwise comparisons
  1 vs 2             sig  NS   NS       NS   NS   NS
  1 vs 3             sig  NS   sig      NS   NS   NS

Set 2: good news

Effect of Lab - 6 vowels

               Set 1              Set 2
            F1   F2   F3       F1   F2   F3
heed        sig  NS   sig      sig  NS   sig
had         sig  NS   sig      sig  NS   NS
hard        sig  sig  sig      NS   sig  sig
hoard       sig  sig  sig      sig  sig  NS
who’d       sig  sig  NS       sig  sig  sig
hood        sig  sig  sig      NS   sig  NS
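The table can be tallied with a short script. This is only a count of the "sig" cells transcribed from the table above; it adds no new results, and the variable names are illustrative.

```python
# Effect of Lab per vowel, as [F1, F2, F3], transcribed from the table.
SET1 = {
    "heed":  ["sig", "NS", "sig"],
    "had":   ["sig", "NS", "sig"],
    "hard":  ["sig", "sig", "sig"],
    "hoard": ["sig", "sig", "sig"],
    "who'd": ["sig", "sig", "NS"],
    "hood":  ["sig", "sig", "sig"],
}
SET2 = {
    "heed":  ["sig", "NS", "sig"],
    "had":   ["sig", "NS", "NS"],
    "hard":  ["NS", "sig", "sig"],
    "hoard": ["sig", "sig", "NS"],
    "who'd": ["sig", "sig", "sig"],
    "hood":  ["NS", "sig", "NS"],
}


def count_sig(table):
    """Total number of tests with a significant Lab effect."""
    return sum(row.count("sig") for row in table.values())


def count_sig_by_formant(table):
    """Number of vowels with a significant Lab effect, per formant."""
    return [sum(row[i] == "sig" for row in table.values()) for i in range(3)]


print(count_sig(SET1), count_sig_by_formant(SET1))  # 15 [6, 4, 5]
print(count_sig(SET2), count_sig_by_formant(SET2))  # 11 [4, 4, 3]
```

Out of 18 tests per set, the Lab effect is significant in 15 for Set 1 and 11 for Set 2, with the reduction falling on F1 and F3.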

Influence of Speaker

• Interaction Lab × Speaker significant (p < 0.05) for F1-F3 of all 6 vowels for both Set 1 and Set 2

for example…

[Figure: F3 of HARD (Set 2), means by speaker]

Agreement across labs in most cases, but certain individuals lead to measurement differences among labs

Difficult cases: subject 42 F3

• Subject 42 HARD2: F3 = 2579 Hz

• Subject 42 HARD4: F3 = 2219 Hz

• Subject 42 HARD6: F3 = 3325 Hz

Difficult cases: subject 43 F3

• Subject 43 HARD1: F3?

• Subject 43 HARD2: F3?

Visual inspection vs formant tracker

[Figure: spectrograms of Subject 43 HARD1 and HARD2, comparing visual inspection with the tracker]

The effect of intraspeaker variability, possibly voice quality

• This can affect:

− The visibility of formants

− The functioning of the LPC tracker

for example…

[Figure: spectrograms of "..had today." for Subject 37: HAD1 (F1 = ??) and HAD6 (F1)]

Discussion: Laboratory Effects

• Do different laboratories produce different formant values? YES

• Does replicating the measurement method reduce these differences? YES

• Could these be reduced further? YES

Other sources of variability

• Settings (e.g. number of poles; number of formants in Praat)

• The exact point in the vowel at which the measure is taken

• The ‘readability’ of the spectrogram, which can be affected by speaker characteristics

Conclusion

• Developing standard ways of collecting formant values could assist comparisons between experts in case work

• If records are kept relating to time points, software and settings then the measurement process can be replicated
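One way such records might be kept is sketched below. The record layout, field names, settings keys and all values are hypothetical; the only point is that the time point, software and settings are stored alongside each measurement so that it can be repeated.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class FormantRecord:
    """One measurement plus what is needed to replicate it (hypothetical layout)."""
    speaker: str
    token: str
    time_s: float     # time point of the measured slice
    software: str
    settings: dict    # tracker settings in force when the value was taken
    f1_hz: float
    f2_hz: float
    f3_hz: float


# Hypothetical example values, not data from the study.
record = FormantRecord(
    speaker="S01",
    token="HAD1",
    time_s=1.2345,
    software="Praat",
    settings={"number_of_formants": 5, "max_formant_hz": 5000},
    f1_hz=700.0,
    f2_hz=1500.0,
    f3_hz=2500.0,
)

# Serialise to one JSON line and read it back unchanged.
line = json.dumps(asdict(record), sort_keys=True)
restored = FormantRecord(**json.loads(line))
print(restored == record)
```

Storing one such line per token gives a second analyst the slice time and settings needed to re-measure the same point.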

Acknowledgements

• IAFPA Research Grant for travel expenses

• Economic and Social Research Council UK for funding the DyViS Project ‘Dynamic Variability in Speech: A Forensic Phonetic Study of British English’ [RES-000-23-1248]

• Other members of the DyViS project – Francis Nolan and Toby Hudson
