Automatic Pitch Tracking
September 18, 2014
The Digitization of Pitch
• The blue line represents the fundamental frequency (F0) of the speaker’s voice.
• Also known as a pitch track
• How can we automatically “track” F0 in a sample of speech?
• Praat can give us a representation of speech that looks like:
Pitch Tracking
• Voicing:
• Air flows through the vocal folds.
• The folds open and close rapidly; the Bernoulli effect helps draw them shut.
• Each cycle sends an acoustic shockwave through the vocal tract…
• …which takes the form of a complex wave.
• The rate at which the vocal folds open and close becomes the fundamental frequency (F0) of a voiced sound.
Voicing Bars
Individual glottal pulses
Voicing = Complex Wave
• Note: voicing is not perfectly periodic.
• …always some random variation from one cycle to the next.
• How can we measure the fundamental frequency of a complex wave?
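To make the idea of a complex, not-quite-periodic wave concrete, here is a minimal Python sketch that builds a synthetic "voiced" signal one cycle at a time. It is a toy, not a model of real glottal pulses: each cycle is assumed to be a sum of three harmonics of F0 (the amplitudes 1, 0.5 and 0.25 are arbitrary), and cycle-to-cycle variation is modeled by randomly stretching or shrinking each period by up to a fraction `jitter`. The function name `synthetic_voicing`, the 16 kHz sampling rate and the 200 Hz F0 are all illustrative assumptions.

```python
import math
import random


def synthetic_voicing(f0_hz, fs_hz, n_cycles, harmonics=(1.0, 0.5, 0.25),
                      jitter=0.0, seed=0):
    """Return a toy voiced waveform as a list of samples.

    Each cycle is a sum of harmonics of F0 (an illustrative assumption).
    If jitter > 0, each cycle's period is scaled by a random factor in
    [1 - jitter, 1 + jitter], mimicking cycle-to-cycle variation.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_cycles):
        period_s = (1.0 / f0_hz) * (1.0 + rng.uniform(-jitter, jitter))
        n = max(1, round(period_s * fs_hz))  # samples in this cycle
        for k in range(n):
            phase = 2.0 * math.pi * k / n
            samples.append(sum(a * math.sin((h + 1) * phase)
                               for h, a in enumerate(harmonics)))
    return samples


# Perfectly periodic: at 16 kHz a 200 Hz cycle is exactly 80 samples long.
steady = synthetic_voicing(200, 16000, n_cycles=10)
print(len(steady))                                           # 800
print(all(steady[i] == steady[i + 80] for i in range(720)))  # True

# With 5% jitter the cycle lengths vary, so the wave no longer repeats exactly.
jittery = synthetic_voicing(200, 16000, n_cycles=10, jitter=0.05)
print(len(jittery))
```

With `jitter=0` the signal repeats exactly every 80 samples; with jitter, no single period fits every cycle, which is why a pitch tracker looks for the best match rather than an exact one.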
• The basic idea: figure out the period between successive cycles of the complex wave.
• Fundamental frequency = 1 / period
duration = ???
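Once the period is known, the conversion is one division. The sketch below assumes the period has already been measured, either directly in seconds or as a count of samples at a known sampling rate; the helper names and the 16 kHz rate are hypothetical.

```python
def f0_from_period(period_s):
    """Fundamental frequency (Hz) = 1 / period (s)."""
    if period_s <= 0:
        raise ValueError("period must be positive")
    return 1.0 / period_s


def period_from_samples(n_samples, fs_hz):
    """Convert a period measured in samples to seconds."""
    return n_samples / fs_hz


print(f0_from_period(0.005))                           # 5 ms period -> 200.0 Hz
print(f0_from_period(period_from_samples(80, 16000)))  # 80 samples at 16 kHz -> 200.0 Hz
```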
Measuring F0
• To figure out where one cycle ends and the next begins…
• The basic idea is to find how well successive “chunks” of a waveform match up with each other.
• One period = the length of the chunk that matches up best with the next chunk.
• Automatic Pitch Tracking parameters to think about:
1. Window size (i.e., chunk size)
2. Step size
3. Frequency range (= period range)
Window (Chunk) Size
Here’s an example of a small window
Window (Chunk) Size
Here’s an example of a large(r) window
Initial window of the waveform is compared to another window (of the same duration) at a later point in the waveform
Matching
The waveforms in the two windows are compared to see how well they match up.
Correlation = measure of how well the two windows match
Autocorrelation
• The measure of correlation =
• Sum of the point-by-point products of the two chunks.
• The technical name for this is autocorrelation…
• because two parts of the same wave are being matched up against each other.
• (“auto” = self)
Autocorrelation Example
• Ex: consider window x, with n samples…
• What’s its correlation with window y?
• (Note: window y must also have n samples)
• x1 = first sample of window x
• x2 = second sample of window x
• …
• xn = nth (final) sample of window x
• y1 = first sample of window y, etc.
• Correlation (R) = x1·y1 + x2·y2 + … + xn·yn
• The larger R is, the better the correlation.
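The correlation R defined above is a plain sum of products. A minimal Python version follows; it computes the raw, unnormalized measure exactly as defined on this slide (many practical pitch trackers normalize it, which this sketch does not attempt). The name `correlation` is illustrative.

```python
def correlation(x, y):
    """R = x1*y1 + x2*y2 + ... + xn*yn for two windows of equal length."""
    if len(x) != len(y):
        raise ValueError("windows x and y must have the same number of samples")
    return sum(xi * yi for xi, yi in zip(x, y))


print(correlation([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32
```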
By the Numbers

| Sample | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| x | .8 | .3 | -.2 | -.5 | .4 | .8 |
| y | -.3 | -.1 | .1 | .3 | .1 | -.1 |
| product | -.24 | -.03 | -.02 | -.15 | .04 | -.08 |

Sum of products = -.48
• These two chunks are poorly correlated with each other.
By the Numbers, part 2

| Sample | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| x | .8 | .3 | -.2 | -.5 | .4 | .8 |
| z | .7 | .4 | -.1 | -.4 | .1 | .4 |
| product | .56 | .12 | .02 | .2 | .04 | .32 |

Sum of products = 1.26
• These two chunks are well correlated with each other.
(or at least better than the previous pair)
• Note: matching peaks count for more than matches close to 0.
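The two worked examples can be checked with a few lines of Python, using the sample values from the tables (the helper `correlation` is a hypothetical name for the sum of products). Printing the individual products for x and z shows the point made in the last bullet: the two matching peaks (.56 and .32) contribute most of the total.

```python
x = [.8, .3, -.2, -.5, .4, .8]
y = [-.3, -.1, .1, .3, .1, -.1]
z = [.7, .4, -.1, -.4, .1, .4]


def correlation(a, b):
    """Sum of point-by-point products of two equal-length windows."""
    if len(a) != len(b):
        raise ValueError("windows must have the same number of samples")
    return sum(ai * bi for ai, bi in zip(a, b))


r_xy = correlation(x, y)
r_xz = correlation(x, z)
print(round(r_xy, 2), round(r_xz, 2))          # -0.48 1.26
print([round(a * b, 2) for a, b in zip(x, z)])  # [0.56, 0.12, 0.02, 0.2, 0.04, 0.32]
```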
Back to (Digital) Reality
The waveforms in the two windows are compared to see how well they match up.
Correlation = measure of how well the two windows match
These two windows are poorly correlated
Next: the pitch tracking algorithm moves further down the waveform and grabs a new window
The distance the algorithm moves forward in the waveform is called the step size
“step”
Matching, again
The next window gets compared to the original.
These two windows are also poorly correlated
The algorithm keeps chugging and, eventually…
another “step”
Matching, again
The best match is found.
These two windows are highly correlated
The fundamental period can be determined by calculating the length of time between the start of window 1 and the start of (well correlated) window 2.
period
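The search walked through on these slides can be sketched in Python as below. The assumptions are illustrative rather than Praat's exact procedure: the comparison window advances one sample per step, the raw sum-of-products correlation is used without normalization, and the test signal is a synthetic, perfectly periodic 200 Hz complex wave sampled at 16 kHz. The 75 to 600 Hz search range and the window of three longest periods follow the values given elsewhere in this lecture; `periodic_wave` and `best_period` are hypothetical helpers.

```python
import math


def periodic_wave(period_samples, n_samples, harmonics=(1.0, 0.5, 0.25)):
    """Toy voiced signal: one cycle (a sum of harmonics) repeated exactly."""
    cycle = [sum(a * math.sin(2 * math.pi * (h + 1) * k / period_samples)
                 for h, a in enumerate(harmonics))
             for k in range(period_samples)]
    return [cycle[i % period_samples] for i in range(n_samples)]


def best_period(signal, fs_hz, start, window, f_min_hz, f_max_hz):
    """Slide a comparison window along the signal and return the lag
    (in samples) whose window correlates best with the initial window."""
    first = signal[start:start + window]
    min_lag = math.ceil(fs_hz / f_max_hz)   # shortest period checked
    max_lag = math.floor(fs_hz / f_min_hz)  # longest period checked
    best_lag, best_r = None, float("-inf")
    for lag in range(min_lag, max_lag + 1):  # one "step" per sample
        later = signal[start + lag:start + lag + window]
        r = sum(a * b for a, b in zip(first, later))
        if r > best_r:  # strict '>' keeps the shortest of equally good lags
            best_lag, best_r = lag, r
    return best_lag


fs = 16000
window = round(3 * fs / 75)  # 3 x longest period = 640 samples
signal = periodic_wave(period_samples=80, n_samples=1000)  # 200 Hz
lag = best_period(signal, fs, start=0, window=window, f_min_hz=75, f_max_hz=600)
period_s = lag / fs
print(lag, period_s, 1 / period_s)  # 80 0.005 200.0
```

In this idealized signal a lag of 160 samples (two periods) matches exactly as well as 80; the strict `>` comparison keeps the shorter lag. Ties and near-ties between multiples of the period are closely related to the pitch-halving and pitch-doubling errors discussed later.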
Mopping up
• Frequency is 1 / period.
• Q: How many possible periods does the algorithm need to check?
• A: That depends on the frequency range (default in Praat: 75 to 600 Hz).
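The frequency range fixes which periods (lags) have to be checked: the highest frequency gives the shortest period and the lowest frequency gives the longest. A minimal sketch, assuming a 16 kHz sampling rate and a comparison window that moves one sample at a time (both assumptions for illustration; `lag_range` is a hypothetical helper):

```python
import math


def lag_range(fs_hz, f_min_hz, f_max_hz):
    """Return (shortest, longest) candidate period, in samples."""
    shortest = math.ceil(fs_hz / f_max_hz)  # highest F0 -> shortest period
    longest = math.floor(fs_hz / f_min_hz)  # lowest F0 -> longest period
    return shortest, longest


lo, hi = lag_range(16000, 75, 600)
print(lo, hi, hi - lo + 1)  # 27 213 187

lo, hi = lag_range(16000, 100, 300)
print(lo, hi, hi - lo + 1)  # 54 160 107
```

So with the 75 to 600 Hz range, this toy tracker would compare 187 candidate periods per frame; narrowing the range to 100 to 300 Hz cuts that to 107.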
Moving on
• Another comparison window is selected and the whole process starts over again.
Figure: pitch track of “Uhm, I would like a flight to Seattle from Albuquerque” (y-axis: F0 in Hz, ticks at 200, 300, 400; x-axis: time in s, ticks at 1 to 4).
• The algorithm ultimately spits out a pitch track.
• This one shows you the F0 value at each step.
Thanks to Chilin Shih for making these materials available
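A complete (toy) pitch tracker just repeats the period search at every step and records 1 / period each time. The Python sketch below makes these illustrative assumptions: raw sum-of-products correlation, a synthetic signal that is 200 Hz for its first 0.2 s and 125 Hz for the next 0.2 s, a 16 kHz sampling rate, a 75 to 600 Hz range, a window of three longest periods, and a 10 ms step size. It makes no voiced/unvoiced decision and does no smoothing, so frames that straddle the change in F0 can return unreliable values. All function names are hypothetical.

```python
import math


def periodic_wave(period_samples, n_samples, harmonics=(1.0, 0.5, 0.25)):
    """Toy voiced signal: one cycle (a sum of harmonics) repeated exactly."""
    cycle = [sum(a * math.sin(2 * math.pi * (h + 1) * k / period_samples)
                 for h, a in enumerate(harmonics))
             for k in range(period_samples)]
    return [cycle[i % period_samples] for i in range(n_samples)]


def pitch_track(signal, fs_hz, f_min_hz=75.0, f_max_hz=600.0, step_s=0.01):
    """Return a list of (time in s, F0 in Hz) pairs, one per step."""
    window = round(3 * fs_hz / f_min_hz)    # 3 x longest possible period
    min_lag = math.ceil(fs_hz / f_max_hz)   # shortest period checked
    max_lag = math.floor(fs_hz / f_min_hz)  # longest period checked
    step = round(step_s * fs_hz)            # step size in samples
    track = []
    start = 0
    while start + max_lag + window <= len(signal):
        first = signal[start:start + window]
        best_lag, best_r = None, float("-inf")
        for lag in range(min_lag, max_lag + 1):
            later = signal[start + lag:start + lag + window]
            r = sum(a * b for a, b in zip(first, later))
            if r > best_r:  # keep the shortest of equally good lags
                best_lag, best_r = lag, r
        track.append((start / fs_hz, fs_hz / best_lag))  # F0 = 1 / period
        start += step
    return track


fs = 16000
# 0.2 s at 200 Hz (period 80 samples), then 0.2 s at 125 Hz (period 128 samples).
signal = periodic_wave(80, 3200) + periodic_wave(128, 3200)
track = pitch_track(signal, fs)
print(len(track))  # 35
print(track[0])    # (0.0, 200.0)
print(track[-1])   # (0.34, 125.0)
```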
Pitch Tracking in Praat
• Play with the F0 range.
• Create a Pitch object.
• Also go To Manipulation…Pitch.
• Also check out:
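Summing Up
• Pitch tracking uses three parameters:
1. Window size
• Ensures reliability: the window must be long enough to hold at least a few periods, even at the lowest expected F0.
• In Praat's autocorrelation method (default settings), the window size is three times the longest possible period.
• E.g.: 3 × 1/(75 Hz) = 0.04 s.
2. Step size
• Determines temporal precision (how often an F0 value is reported).
3. Frequency range
• Limits the number of candidate periods to check, which reduces computational load.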
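The window-size rule can be checked numerically. This sketch simply encodes "three times the longest possible period", where the longest period is 1 / (pitch floor), and converts the result to samples at an assumed 16 kHz sampling rate; `window_size_s` is a hypothetical name.

```python
def window_size_s(pitch_floor_hz, periods=3):
    """Window length (s) = periods x longest possible period (1 / pitch floor)."""
    return periods / pitch_floor_hz


w = window_size_s(75)
print(w)                  # 0.04
print(round(w * 16000))   # 640 samples at 16 kHz
print(window_size_s(50))  # 0.06
```

Lowering the pitch floor lengthens the window, so each F0 estimate averages over a longer stretch of speech.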
Deep Thought Questions
• What might happen if:
• The shortest period checked is longer than the fundamental period?
• AND two fundamental periods fit inside a window?
• Potential Problem #1: Pitch Halving
• The pitch tracker thinks the fundamental period is twice as long as it is in reality.
• It estimates F0 to be half of its actual value
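Pitch halving can be reproduced with a toy autocorrelation tracker. The sketch assumes raw sum-of-products correlation and a synthetic, perfectly periodic 200 Hz complex wave at 16 kHz (true period: 80 samples). When the shortest period checked is raised to 100 samples (that is, a ceiling of 160 Hz), the true period is no longer a candidate, but the 640-sample window still holds several periods, so the lag of two periods (160 samples) matches perfectly and the estimate drops to 100 Hz. The helper names are hypothetical.

```python
import math


def periodic_wave(period_samples, n_samples, harmonics=(1.0, 0.5, 0.25)):
    """Toy voiced signal: one cycle (a sum of harmonics) repeated exactly."""
    cycle = [sum(a * math.sin(2 * math.pi * (h + 1) * k / period_samples)
                 for h, a in enumerate(harmonics))
             for k in range(period_samples)]
    return [cycle[i % period_samples] for i in range(n_samples)]


def best_lag(signal, window, min_lag, max_lag):
    """Lag (samples) whose window best matches the initial window.
    max() returns the first (shortest) lag among equal scores."""
    first = signal[:window]
    scores = {lag: sum(a * b for a, b in zip(first, signal[lag:lag + window]))
              for lag in range(min_lag, max_lag + 1)}
    return max(scores, key=scores.get)


fs = 16000
signal = periodic_wave(80, 1000)  # true F0 = 200 Hz (period 80 samples)

ok = best_lag(signal, window=640, min_lag=27, max_lag=213)       # 75-600 Hz
halved = best_lag(signal, window=640, min_lag=100, max_lag=213)  # shortest period too long
print(fs / ok, fs / halved)  # 200.0 100.0
```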
Pitch Halving
pitch is halved
Check out the normal file in Praat.
More Deep Thoughts
• What might happen if:
• The shortest period checked is less than half of the fundamental period?
• AND the second half of the fundamental cycle is very similar to the first?
• Potential Problem #2: Pitch doubling
• The pitch tracker thinks the fundamental period is half as long as it actually is.
• It estimates the F0 to be twice as high as it is in reality.
Pitch Doubling
pitch is doubled
Microperturbations
• Another problem:
• Speech waveforms are partly shaped by the type of segment being produced.
• Pitch tracking can become erratic at the juncture of two segments.
• In particular:
• voiced to voiceless segments
• sonorants to obstruents
• These discontinuities in F0 are known as microperturbations.
• Also: transitions between modal and creaky voicing tend to be problematic.