states and times or cycles and frequencies?
TRANSCRIPT
Phonology and Phonetics of Rhythm:
States and Timesor
Cycles and Frequencies?
Dafydd Gibbon
Bielefeld UniversityJinan University
13th Chinese Phonetics ConferenceGuangzhou, November 2018
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 2
Theses: YARD (Yet Another Discussion of Rhythm)
In speech (and in music):1) Rhythms are oscillations, not sequences of isochronous states2) Rhythmic oscillations are essentially linear (‘finite state’)3) Rhythms occur in time domains of different sizes, each of them
linear4) The time domains are associated with ranks of units of
speech/language from discourse to phoneme
So rhythms appear to be hierarchical, but the hierarchy● is layered● has limited depth● has linear layers with iterative cycles● is not a general recursive hierarchy
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 3
Linear Layered Discourse Rhythms
57 seconds of a BBC news broadcast, from Aix-MARSEC corpus (A0101B)
Two Frequency zones:1) Approximately 120 – 220 Hz2) Approximately 300 – 400 Hz (on ‘paratone’ onsets)
Finite depth linear 2-cycle iterative layered structure(FS, cf. Pierrhumbert, not a recursive hierarchy)
● Two independently motivated but structurally dependent linear layers(like hours & minutes on a clock)
● Each gradually declining and periodically resetting
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 4
1. Denotation, Reference
2.symbols, Icons,
Indices inSpace-Time
semiotic relation
3.Categorial
simple and structured
forms
Multimodal Interpretation
hierarchicalpatterning
FocusContrast
Emphasis
linearpatterns
Three ‘hierarchies’, not one, with a ternary semiotic basis for signs at all ranks.
Semantic Interpretation
The Multilinear Rank Interpretation Model
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 5
Discourse: Monologue, Dialogue
Utterance: turn, IPU, ...
Sentence, clause, phrase
Word: simple, inflected, compound, derived
Multimodal
Rank
Interpretation
Architecture
MultilinearCategory Ranks
Ranks: Categories and their Interpretations
The Multilinear Rank Interpretation Model
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 6
The discourse dimension: people communicating with their multimodal rank interpretation architectures
The Multilinear Rank Interpretation Model
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 7
States and Times or Cycles and Frequencies? - Overview
1.Rhythm Communities
2. Phonological cycles: abstract, iterative rhythm
3. Phonetic states: sequences of isochronous states as rhythm
4.Phonetic cycles: rhythm as oscillation:– Production as amplitude and frequency modulation– Perception as amplitude and frequency demodulation
5. Pitch cycles: the role of F0 in rhythm– AM and FM similarities– Some problems with F0 estimators (aka ‘pitch trackers’)
6. More roles of F0 in discourse rhythms
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 8
Rhythm Communities – a Selection
Conversation Analysis
Linguistic Analysis
Phonetic Annotation Analysis
PhoneticSignal
Analysis
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 9
Rhythm Communities – a Selection
Conversation Analysis
‘Authentic’ everyday dialogue dataEthnomethodology: task scenarioSystematic but subjective judgmentsFocus on interpretation:
● discourse grammar: framing● semantics: topic development● pragmatics: speaker, addressee
Linguistic Analysis
Constructed data paradigmsRules for verbal categoriesAbstract stress position patterns:
● linear structures, grids● hierarchical structures, trees
● words● sentences
Phonetic Annotation Analysis
Standardised reading data● sentences● stories (The North Wind…)
Annotation:● manual, semi-automatic● syllables, feet, C or V chunks
Focus on isochrony
Phonetic Signal Analysis
Smoothing models● splines, polynomials
Oscillation models● production oriented● perception oriented
● Amplitude Modulation Spectrum● Frequency Modulation Spectrum
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 10
Rhythm Communities – a Selection
Conversation Analysis
‘Authentic’ everyday dialogue dataEthnomethodology: task scenarioSystematic but subjective judgmentsFocus on interpretation:
● discourse grammar: framing● semantics: topic development● pragmatics: speaker, addressee
Linguistic Analysis
Constructed data paradigmsRules for verbal categoriesAbstract stress position patterns:
● linear structures, grids● hierarchical structures, trees
● words● sentences
Phonetic Annotation Analysis
Standardised reading data● sentences● stories (The North Wind…)
Annotation:● manual, semi-automatic● syllables, feet, C or V chunks
Focus on isochrony
Phonetic Signal Analysis
Smoothing models● splines, polynomials
Oscillation models● production oriented● perception oriented
● Amplitude Modulation Spectrum● Frequency Modulation Spectrum
Tree Structures
Linguistics: deductive, top-down● paradigmatic (classificatory)● syntagmatic (compositional)
Phonetics: inductive, bottom-up● classificatory (CART etc.)● compositional
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 11
Phonological Iteration as Abstract Oscillation:
Iterative intonation
Iterative tonal sandhi (Niger Congo)
Iterative tonal sandhi (Tianjin dialect)
Note: by ‘cycle’ I do not mean tone cycles in theparadigmatic , classificatory sense,
but syntagmatic, compositional iterations
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 12
Empirical overgeneration1) Accents in a sequence tend to be all H* or all L*2) Global contours tend to be rising with L*
accents, falling with H* accents3) Global contours may span more than 1 turn
Empirical undergeneration1) Paratone hierarchy not included2) No time constraints
Pierrehumbert’s regular grammar / finite state transition network
Phonological iteration as a layered hierarchyof linear abstract oscillations
Not the first (cf. Reich,’t Hart et al., Fujisaki, …)
But linguistically the most interesting.
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 13
From an allotonic point of view:
● 3 cycles● 1-tape (1-level) transition
network
Niger-Congo Iterative Tonal Sandhi (the most general case)
Phonological iteration as a layered hierarchyof linear abstract oscillations
At the most abstract level, just one node with H and L cycling around it.
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 14
From an allotonic point of view:
● 3 cycles● 2-tape (= 2-level) transition
network
Niger-Congo Iterative Tonal Sandhi (the most general case)
Phonological iteration as a layered hierarchyof linear abstract oscillations
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 15
From phonetic signal processing point of view:
● 3 cycles● 3-tape (= 3-level) transition
network
Phonological iteration as a layered hierarchyof linear abstract oscillations
Niger-Congo Iterative Tonal Sandhi (the most general case)
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 16
Martin Jansche 1998Tianjin Mandarin tone sandhi
Tianjin Dialect Iterative Tonal Sandhi
Phonological iteration as a layered hierarchyof linear abstract oscillations
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 17
Phonetics, the Search for Rhythm I
States and Times: the Isochrony Approach
1-dimensional2-dimensional3-dimensional
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 18
One-dimensional annotation mining of time-stamp durations
One-dimensional because the result of the analysis is a single scale. The results are all comparable to variance or standard deviation, but differ in detail.
For example, with the nPVI, subtraction is between neighbouring data in a moving window (so a kind of AMDF, Average Magnitude Difference Function), not between mean and data, thus factoring out tempo variations to some extent.
(or Standard Deviation)
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 19
Wagner, Petra (2007). “Visualizing levels of rhythmic organisation.” Proc. International Congress of Phonetic Sciences, Saarbrücken 2007, pp. 1113-1116, 2007
Two-dimensional because duration relations are represented in a z-scored scatter plot, not as a single scale.Result, visualising the scale in two dimensions:
Mandarin: means are scattered relatively evenly around the centreEnglish: e.g. count(short-short) > count(long-long), not binary!
LONG-LONG
LONG-SHORTSHORT-SHORT
SHORT-LONG
Two-dimensional annotation mining of time-stamp durations
LONG-LONGSHORT-LONG
LONG-SHORTSHORT-SHORT
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 20
Gibbon, Dafydd. 2006. “Time types and time trees: Prosodic mining and alignment of temporally annotated data”. In: Stefan Sudhoff, et al., eds. Methods in Empirical Prosody Research. Berlin: Walter de Gruyter, pp. 281–209, 2006.
Three-dimensional because alternative trees are possible, depending on the algorithm settings:
- binary/nonbinary, lower/higher percolated- related to phrasal and discourse patterns
Induction of compositional tree structures
Two-dimensional annotation mining of time-stamp durations
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 21
All these approaches
● are based on annotated time-stamps, not the signal● model static durations● use pairwise relations● focus on isochrony, equal timing of durations
● completely ignore the essential property of rhythm, which is alternation of units with approximately equal duration
● in other words: oscillation
Time Stamps:
AnnotationMining
Time Stamps:
1D Isochrony
Time Stamps:
2DRelations
Time Stamps:
3DTime Trees
Time Stamps:
AnnotationMining
Time Stamps:
1D Isochrony
Time Stamps:
2DRelations
Time Stamps:
3DTime Trees
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 22
Phonetics, The Search for Rhythm II
Cycles and Frequencies: Oscillation Approaches
Rhythm Production as Amplitude Modulation (AM)Rhythmic Amplitude Envelope + Carrier Frequency
There are many studies of oscillation in production
I will concentrate on ...
Rhythm Perception as Amplitude DemodulationRhythmic Amplitude Envelope – Carrier Frequency
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 23
Selected Work on Amplitude Envelope Demodulation Spectra[1] Cummins, Fred, Felix Gers and Jürgen Schmidhuber. “Language identification from prosody
without explicit features.” Proc. Eurospeech. 1999.[2] He, Lei and Volker Dellwo. “A Praat-Based Algorithm to Extract the Amplitude Envelope and
Temporal Fine Structure Using the Hilbert Transform.” In: Proc. Interspeech 2016, San Francisco, pp. 530-534, 2016.
[3] Hermansky, Hynek. “History of modulation spectrum in ASR.” Proc. ICASSP 2010.[4] Leong, Victoria and Usha Goswami. “Acoustic-Emergent Phonology in the Amplitude Envelope of
Child-Directed Speech.” PLoS One 10(12), 2015.[5] Leong, Victoria, Michael A. Stone, Richard E. Turner, and Usha Goswami. “A role for amplitude
modulation phase relationships in speech rhythm perception.” JAcSocAm, 2014.[6] Liss, Julie M., Sue LeGendre, and Andrew J. Lotto. “Discriminating Dysarthria Type From
Envelope Modulation Spectra.” Journal of Speech, Language and Hearing Research 53(5):1246–1255, 2010.
[7] Ludusan, Bogdan Antonio Origlia, Francesco Cutugno. “On the use of the rhythmogram for automatic syllabic prominence detection.” Proc. Interspeech, pp. 2413-2416, 2011.
[8] Ojeda, Ariana, Ratree Wayland, and Andrew Lotto. “Speech rhythm classification using modulation spectra (EMS).” Poster presentation at the 3rd Annual Florida Psycholinguistics Meeting, 21.10.2017, U Florida. 2017.
[9] Tilsen Samuel and Keith Johnson. “Low-frequency Fourier analysis of speech rhythm.” Journal of the Acoustical Society of America. 2008; 124(2):EL34–EL39. [PubMed: 18681499]
[10] Tilsen, Samuel and Amalia Arvaniti. “Speech rhythm analysis with decomposition of the amplitude envelope: Characterizing rhythmic patterns within and across languages.” The Journal of the Acoustical Society of America 134, p. 628 .2013.
[11] Todd, Neil P. McAngus and Guy J. Brown. “A computational model of prosody perception.” Proc. ICSLP 94, pp. 127-130, 1994.
[12] Varnet, Léo, Maria Clemencia Ortiz-Barajas, Ramón Guevara Erra, Judit Gervain, and Christian Lorenzi. “A cross-linguistic study of speech modulation spectra.” JAcSocAm 142 (4), 1976–1989, 2017.
☛
☛
☛
☛
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 24
INFORMATION
AMPLITUDEMODULATION
CARRIERFREQUENCY
NOISEFREQUENCIES
+ ✕ FILTERCOEFFICIENTS
INFORMATION
FREQUENCYMODULATION
SPECTRALANALYSESin different
frequency zones,COORDINATION
AMPLITUDE DEMODULATIONin different time zonesrectification, LP filtering
envelope detection
FREQUENCYDEMODULATION
in different frequency zonespitch tracking, formant tracking
AM Multiple
Oscillators,productionemulation
Rhythmis
oscillationand iteration
AM Spectral Zones,
perceptionemulation
FMdiscourse
turnsemotion
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 25
Modulation Regularities – Maybe the Long-Term Spectrum?
Rhythm: thelong-term spectrum?
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 26
Modulation Regularities – Maybe the Long-Term Spectrum?
F0
Rhythm: thelong-term spectrum?
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 27
Modulation Regularities – Maybe the Long-Term Spectrum?
F0 F0 F0harmonics
Long-term spectrum:
not so useful
Rhythm: thelong-term spectrum?
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 28
Modulation Regularities – Maybe the Long-Term Spectrum?
?
What is happening
here?
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 29
Modulation Regularities – Maybe F0?
And what is the role of
F0?
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 30
Rhythm as iteration
The Reality of Rhythm!
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 31Amplitude Envelope Modulation Spectrum (AEMS, AMS, EMS) Frequency Zones
Amplitude Envelope Modulation
↓Amplitude Envelope Demodulation
absolute value of Hilbert transform(or rectification & peak-picking / LP filtering)
↓Spectral slice (FFT)
↓Spectral Zone Edge Detection
Amplitude Modulation, Demodulation and Envelope Spectrum
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 32
Amplitude Demodulation and the Amplitude Envelope Spectrum
Amplitude ModulationSpectrum
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 33
AM Spectrum and the AM Difference Spectrum
Amplitude ModulationDifference Spectrum
(pioneered by Wiktor Jassem)
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 34
The Role of F0
If a spectrum can be derived from the AM envelope,
why not derive a spectrum from the FM track
and see whether they correlate?
Preliminary answer:
Yes, they do correlate to some extent, but not overwhelmingly strongly!This is not very surprising, of course, since they are partly co-extensive locally and globally, though locally not too similar.
I will look at both AM and FM spectra.
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 35
Mandarin, female30 sec, < 1Hz
AM and FM Demodulation
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 36
Mandarin, female30 sec, < 1Hz
0.2 Hz / 5s:the rhythm ofinterpausal units?
AM and FM Demodulation
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 37
AM and FM Demodulation and Spectrum Analysis: Calibration
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 38
English (RP)Edinburgh corpus
“The North Wind and the Sun”
Beijing MandarinYu corpus
“bei3 feng1 gen1 tai4 yang2”
Envelope Demodulation: Extending to Discourse Spectra
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 39
English (RP)Edinburgh corpus
“The North Wind and the Sun”
Beijing MandarinYu corpus
“bei3 feng1 gen1 tai4 yang2”
Short phrases
Short IPUs
ParatoneIPUs
IPU layers
PhrasesIPUs
1 Hz
Envelope Demodulation: Extending to Discourse Spectra
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 40
SpectralFrequency
ZoneBoundaries EnglishNewsreading
EnglishStory
Extending to Discourse Spectra: English Genres
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 41
Contrast
elderly male Englishyoung female Mandarin
Extending to discourse spectra: English-Mandarin
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 42
Rhythms in Syntagmatic, Compositional Spectral Trees
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 43
L-strong, > L-strong, < R-strong, > R-strong, <
Frequency Domain Trees: Spectral Zone Hierarchies
EnglishNewsreading
AM and FM Demodulation and Spectral Tree Induction
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 44
L-strong, <
AEMS Frequency Tree
EnglishNewsreading
AM and FM Demodulation and Spectral Tree Induction
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 45
L-strong, <
AEMS Frequency Tree
EnglishNorth Wind & Sun
AM and FM Demodulation and Spectral Tree Induction
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 46
L-strong,
<
AEMS Frequency TreeMandarinNorth Wind &
Sun
AM and FM Demodulation and Spectral Tree Induction
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 47
Rhythms in Paradigmatic, Classificatory Spectral Trees
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 48
Data:Chunks from “The North Wind and the Sun”
Male, English: 40s
Female, Mandarin: 40s
Method:Comparison of non-overlapping adjacent 5s audio chunks– offsets into recording: 0, 5, 10, 15, 20, 25, 30, 35– AEMS for each chunk– Inter-speaker comparison (AEMS pointwise means, r=0.82)– Classification by hierarchical similarity / distance:
● Pearson Distance: 1 – r , range 0 … 2● comparison of 7 classifiers● similar classification methods used in dialectometry,
stylometry, language typology
Consistency of AM Spectra: Rhythm Classification
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 49
Similarity criterion:>3 adjacent same-speaker settings
Largest per speaker score: (4+3)/16
Largest cluster:4/16
Consistency of Spectra: Rhythm Classification
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 50
Highest speaker-specific total:1 Nrst.Pt. (4+3+3)/10
Largest cluster:5 UPGMC 5/10
Consistency of Spectra: Rhythm Classification
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 51
Discourse Rhythms: Long FM contours
Thesis: in evolution,– frequency modulation and rhythm came first
● emotional cries● turn-taking came before grammar,
Levinson, “Turn-taking in Human Communication – Origins and Implications for Language Processing”, 2015
Note: in infant speech,– frequency modulation and rhythm also come first
● emotional criesWermke, Sebastian-Galles
● turn-takingcf. the ‘bootstrapping’ literature
the infant ‘twin-talk’ videos on YouTube ☺
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 52
Answer: falling utterance contour
Question+Answer: rising-falling adjacency pair contoursyntagmatic entrainment
Question: rising utterance contour
Discourse Rhythms: Long FM contours
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 53
Answer: falling utterance contour
L* sequence, global rise, final rise
H* sequence, global fall
H* sequence, global fall
L* sequence, global fall, final rise
Response Continuation
Interview start Question Question
Discourse Rhythms: Long FM contours
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 54
But there are Methodological Problems in F0 / Pitch estimation
1. Terminology:– articulation rate (production)– F0 (acoustic transmission)– pitch (perception)
2. Measurement:– F0 estimation implementations yield slightly different results
● Autocorrelation● Normalised Cross-correlation● Average Magnitude Difference Function (AMDF)● FFT peak detection● Cepstrum
– Environment differences● Preprocessing: low-pass filter; centre-clipping● Postprocessing: moving median
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 55
RAPT (Robust Algorithm for Pitch Tracking)
RAPTC binary
David Talkin
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 56
RAPT (Python emulation)
RAPTPython
emulation
Daniel Gaspari
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 57
Reaper (Robust Epoch And Pitch EstimatoR)
Reapercompiled C
binary
David Talkin
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 58
Praat
Praatscript+binary
Paul Boersma
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 59
YIN (as opposed to YANG, Python emulation)
YINPython
emulation
Patrice Guyot
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 60
YAAPT (Yet Another Algorithm for Pitch Tracking, Python emulation)
YAAPTPython
emulation
Bernardo J. B. Schmitt
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 61
SWIPE (Square Wave Inspired Pitch Estimator, Python emulation)
SWIPEPython
emulation
Disha Garg
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 62
F0 – Pitch: Methodological Problems
1. Terminology:– articulation rate (production)– F0 (acoustic transmission)– pitch (perception)
2. Measurement:– F0 estimation implementations yield slightly different results
● Autocorrelation● Normalised Cross-correlation● Average Magnitude Difference Function (AMDF)● FFT peak detection● Cepstrum
– Environment differences● Preprocessing: low-pass filter; centre-clipping● Postprocessing: moving median
So let’s do somthing about it
● do-it-yourself F0 estimation● with full control over all parameters.
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 63
F0 – Pitch: A Constructive Do-It-Yourself Strategy
Time domain:
AMDF
A kind of auto-correlation, but with subtraction minima not correlation maxima
Preprocessing:● centre-clipper● low-pass filterPostprocessing:● moving median
Simple: no● voice detection● candidate weighting
(code on GitHub)
Frequency domain:
FFT+spectrum peak-picking
Finding the lowest frequency spectral peak in the Fourier transform
Preprocessing:● centre-clipper● low-pass filterPostprocessing:● moving median
Simple - no● voice detection● candidate weighting
(code on GitHub)
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 64
How about AMDF (Average Magnitude Difference Function, Python)
Dafydd Gibbon
AMDFnaive difference minima, Python
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 65
FFTpeak (Simple F0 Tracker, Python)
S0FTnaive FFT
peak, Python
Dafydd Gibbon*
* inspired by a snippet from ‘Jonathan’, gist.github.com/endolith/255291
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 66
Comparing F0 estimators with RAPT as ‘gold standard’
Benchmark against standard F0 estimators
Encouraging for FFTpeak (problems remain, of course):● correlation ignores some relevant properties such as
overall difference in pitch height● (slightly positively biased) idea of the relationship● not enough test data● not as robust as RAPT● but suggests that RAPT is fit for purpose
F0 estimator pair Correlation
RAPT:PyRAPT 0,8902RAPT:Praat 0,8657RAPT:FFTpeak 0,9096RAPT:f0AMDF 0,8605
FFTpeak:RAPT 0,9096FFTpeak:Praat 0,8409FFTpeak:PyRAPT 0,8352FFTpeak:f0AMDF 0,8016
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 67
RAPT (Robust Algorithm for Pitch Tracking)
RAPTC binary
David Talkin
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 68
Continuing with FM anyway: Emotive Rhythms
Thesis 1:In the evolutionary time domain: emotive ‘animal’ modulations came before structural modulations
Thesis 2:In the beginning was “Wow!” (Or “Aaah!”)
Thesis 3:Or the wolf whistle (it’s not simply ‘cat-calling’)
Thesis 4:Other primates wowed, aahed and whistled first – we humans continued the custom
Is this why in some societies there is a taboo on whistling?
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 69
Many Uses of Frequency Modulation in Discourse Rhythms
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 70
Alibaba FM
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 71
哇
Cantonese region (Guangzhou) Wu region (Shanghai)
‘Tone 6’ ☺Tone 4
Emotive Exclamations
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 72
啊
Cantonese region (Shenzhen)
Emotive Exclamations
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 73
FM Rhythms in Teleglossia
13th Chinese Phonetics Conference, 2018 D. Gibbon, States and Times or Cycles and Frequencies 74
Thank you!谢谢 !