![Page 1: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters](https://reader036.vdocuments.us/reader036/viewer/2022062802/56649e995503460f94b9ca21/html5/thumbnails/1.jpg)
ECE 598: The Speech ECE 598: The Speech ChainChain
Lecture 7: Fourier Transform;Lecture 7: Fourier Transform;
Speech Sources and FiltersSpeech Sources and Filters
![Page 2: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters](https://reader036.vdocuments.us/reader036/viewer/2022062802/56649e995503460f94b9ca21/html5/thumbnails/2.jpg)
TodayToday Fourier Series (Discrete Fourier Transform)Fourier Series (Discrete Fourier Transform)
Sawtooth wave exampleSawtooth wave example Finding the coefficientsFinding the coefficients
SpectrogramSpectrogram Frequency ResponseFrequency Response Speech Source-Filter TheorySpeech Source-Filter Theory Speech SourcesSpeech Sources
TransientTransient FricationFrication AspirationAspiration VoicingVoicing Formant TransitionsFormant Transitions
![Page 3: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters](https://reader036.vdocuments.us/reader036/viewer/2022062802/56649e995503460f94b9ca21/html5/thumbnails/3.jpg)
Topic #1: Discrete Fourier Topic #1: Discrete Fourier TransformTransform
![Page 4: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters](https://reader036.vdocuments.us/reader036/viewer/2022062802/56649e995503460f94b9ca21/html5/thumbnails/4.jpg)
Sawtooth WaveSawtooth Wave
Sawtooth Sawtooth first first harmonic:harmonic: Amplitude: Amplitude:
0.640.64 Phase: Phase:
0.520.52 radiansradians
![Page 5: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters](https://reader036.vdocuments.us/reader036/viewer/2022062802/56649e995503460f94b9ca21/html5/thumbnails/5.jpg)
Sawtooth Second HarmonicSawtooth Second Harmonic
Sawtooth Sawtooth second second harmonic:harmonic: Amplitude: Amplitude:
0.320.32 Phase: Phase:
0.530.53 radiansradians
![Page 6: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters](https://reader036.vdocuments.us/reader036/viewer/2022062802/56649e995503460f94b9ca21/html5/thumbnails/6.jpg)
Sawtooth Third HarmonicSawtooth Third Harmonic
Sawtooth Sawtooth third third harmonic:harmonic: Amplitude: Amplitude:
0.210.21 Phase: Phase:
0.550.55 radiansradians
![Page 7: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters](https://reader036.vdocuments.us/reader036/viewer/2022062802/56649e995503460f94b9ca21/html5/thumbnails/7.jpg)
Sawtooth: Calculating the Sawtooth: Calculating the Fourier TransformFourier Transform
Average over Average over one full one full period of period of (125Hz (125Hz sawtooth * sawtooth * 125Hz 125Hz cosine) = -cosine) = -0.030.03
![Page 8: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters](https://reader036.vdocuments.us/reader036/viewer/2022062802/56649e995503460f94b9ca21/html5/thumbnails/8.jpg)
Sawtooth: Calculating the Sawtooth: Calculating the Fourier TransformFourier Transform
Average Average over one full over one full period of period of (125Hz (125Hz sawtooth * sawtooth * 125Hz sine) 125Hz sine) = 0.636= 0.636
![Page 9: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters](https://reader036.vdocuments.us/reader036/viewer/2022062802/56649e995503460f94b9ca21/html5/thumbnails/9.jpg)
Sawtooth: Amplitude and Sawtooth: Amplitude and Phase of the First HarmonicPhase of the First Harmonic
First Fourier Coefficient of the Sawtooth Wave:First Fourier Coefficient of the Sawtooth Wave: -0.03 + 0.636j-0.03 + 0.636j |-0.03 + 0.636j| = 0.64|-0.03 + 0.636j| = 0.64 phase(-0.03 + 0.636j) = 0.52phase(-0.03 + 0.636j) = 0.52 XX11 = -0.03+0.636j = 0.64 e = -0.03+0.636j = 0.64 ej0.52j0.52
First Harmonic:First Harmonic: hh11(t) = 0.64 e(t) = 0.64 ej(250j(250t+0.52t+0.52))
Re{h1(t)} = 0.64 cos(250Re{h1(t)} = 0.64 cos(250t+0.52t+0.52))
![Page 10: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters](https://reader036.vdocuments.us/reader036/viewer/2022062802/56649e995503460f94b9ca21/html5/thumbnails/10.jpg)
Sawtooth: Calculating the Sawtooth: Calculating the Fourier TransformFourier Transform
Average over Average over one full one full period of period of (125Hz (125Hz sawtooth * sawtooth * 250Hz 250Hz cosine) = -cosine) = -0.030.03
![Page 11: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters](https://reader036.vdocuments.us/reader036/viewer/2022062802/56649e995503460f94b9ca21/html5/thumbnails/11.jpg)
Sawtooth: Calculating the Sawtooth: Calculating the Fourier TransformFourier Transform
Average Average over one full over one full period of period of (125Hz (125Hz sawtooth * sawtooth * 250Hz sine) 250Hz sine) = -0.3173= -0.3173
![Page 12: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters](https://reader036.vdocuments.us/reader036/viewer/2022062802/56649e995503460f94b9ca21/html5/thumbnails/12.jpg)
Sawtooth: Amplitude and Sawtooth: Amplitude and Phase of the Second HarmonicPhase of the Second Harmonic
Second Fourier Coefficient of the Sawtooth Second Fourier Coefficient of the Sawtooth Wave:Wave: -0.03 + 0.3173j-0.03 + 0.3173j |-0.03 + 0.3173j| = 0.32|-0.03 + 0.3173j| = 0.32 phase(-0.03 + 0.3173j) = 0.53phase(-0.03 + 0.3173j) = 0.53 XX11 = -0.03+0.3173j = 0.32 e = -0.03+0.3173j = 0.32 ej0.53j0.53
First Harmonic:First Harmonic: hh22(t) = 0.32 e(t) = 0.32 ej(500j(500t+0.53t+0.53))
Re{hRe{h22(t)} = 0.64 cos(250(t)} = 0.64 cos(250t+0.53t+0.53))
![Page 13: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters](https://reader036.vdocuments.us/reader036/viewer/2022062802/56649e995503460f94b9ca21/html5/thumbnails/13.jpg)
How to Reconstruct a Sawtooth How to Reconstruct a Sawtooth from its Harmonicsfrom its Harmonics
At sampling frequency of FS=8000Hz At sampling frequency of FS=8000Hz (8000 samples/second), the period of a (8000 samples/second), the period of a 125Hz sawtooth is 125Hz sawtooth is (8000 samples/second)/(125 cycles/second) = (8000 samples/second)/(125 cycles/second) =
64 samples64 samples The “0”th harmonic is DC (0*125Hz = 0 The “0”th harmonic is DC (0*125Hz = 0
Hz)Hz) So we need to add up 63 harmonics:So we need to add up 63 harmonics:
x(t) = hx(t) = h11(t) + h(t) + h22(t) + … + h(t) + … + h6363(t) (t)
![Page 14: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters](https://reader036.vdocuments.us/reader036/viewer/2022062802/56649e995503460f94b9ca21/html5/thumbnails/14.jpg)
Why Does This Work???Why Does This Work??? It’s a property of sines and cosinesIt’s a property of sines and cosines They are a They are a BASIS SETBASIS SET, which means that…, which means that… Let <> mean “average over one full period:”Let <> mean “average over one full period:”
<cos<cos22(m(m00t)> = 0.5 for any integer mt)> = 0.5 for any integer m
<cos(m<cos(m00t)cos(nt)cos(n00t)> = 0 if m ≠ nt)> = 0 if m ≠ n
above two properties also hold for sineabove two properties also hold for sine
<cos(m<cos(m00t) sin(nt) sin(n00t)> = 0t)> = 0
00 = 2 = 2FF00 = 2 = 2/T/T00 is the “fundamental frequency” is the “fundamental frequency” in radians/secondin radians/second
![Page 15: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters](https://reader036.vdocuments.us/reader036/viewer/2022062802/56649e995503460f94b9ca21/html5/thumbnails/15.jpg)
Topic #2: SpectrogramTopic #2: Spectrogram
![Page 16: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters](https://reader036.vdocuments.us/reader036/viewer/2022062802/56649e995503460f94b9ca21/html5/thumbnails/16.jpg)
SpectrogramSpectrogram
Time (seconds)
Frequ
ency
(H
z)
![Page 17: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters](https://reader036.vdocuments.us/reader036/viewer/2022062802/56649e995503460f94b9ca21/html5/thumbnails/17.jpg)
How to Create a How to Create a SpectrogramSpectrogram
Divide the waveform into overlapping windowDivide the waveform into overlapping window Example: window length = 6msExample: window length = 6ms Example: window spacing = one per 2msExample: window spacing = one per 2ms Result: neighboring windows overlap by 4msResult: neighboring windows overlap by 4ms
Fourier transform each frame to create the “short-Fourier transform each frame to create the “short-time Fourier transform” Xtime Fourier transform” Xkk(t)(t) Read: kth frequency bin, taken from the frame at time t Read: kth frequency bin, taken from the frame at time t
secondsseconds Could also write X(t,f) = Fourier coefficient from the frame Could also write X(t,f) = Fourier coefficient from the frame
at time=t seconds, from the frequency bin centered at f at time=t seconds, from the frequency bin centered at f Hz.Hz.
In pixel (k,t), plot 20*log10(abs(XIn pixel (k,t), plot 20*log10(abs(Xkk(t))(t)) The Fourier transform expressed in decibels!The Fourier transform expressed in decibels!
![Page 18: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters](https://reader036.vdocuments.us/reader036/viewer/2022062802/56649e995503460f94b9ca21/html5/thumbnails/18.jpg)
Reading a SpectrogramReading a Spectrogram
Time (seconds)
Frequ
ency
(H
z)
Vowel
Turbulence(Frication orAspiration)
Stop
Glide(ExtremeFormant Frequencies)
Each vertical stripe is a glottal closure instant(BANG – high energy at all frequencies)Inter-bang period = pitch period ≈ 10 ms
Nasals
![Page 19: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters](https://reader036.vdocuments.us/reader036/viewer/2022062802/56649e995503460f94b9ca21/html5/thumbnails/19.jpg)
Topic #3: Frequency Topic #3: Frequency ResponseResponse
andandSpeech Source-Filter TheorySpeech Source-Filter Theory
![Page 20: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters](https://reader036.vdocuments.us/reader036/viewer/2022062802/56649e995503460f94b9ca21/html5/thumbnails/20.jpg)
Frequency ResponseFrequency Response Any sound can be windowedAny sound can be windowed
Call the window length “N”Call the window length “N” Its windows can be Fourier transformed:Its windows can be Fourier transformed:
x(t) = hx(t) = h11(t) + h(t) + h22(t) + …(t) + …
x(t) = Xx(t) = X11eejj00tt + X + X22ee2j2j00tt + … + …
What happens when you filter a cosine?What happens when you filter a cosine?Input = eInput = ejj00tt --- Output = H( --- Output = H(00)e)ejj00tt
Input = eInput = e2j2j00tt --- Output = H(2 --- Output = H(200)e)e2j2j00tt
……
![Page 21: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters](https://reader036.vdocuments.us/reader036/viewer/2022062802/56649e995503460f94b9ca21/html5/thumbnails/21.jpg)
Frequency ResponseFrequency Response
If x(t) goes through any filter:If x(t) goes through any filter:x(t) x(t) → ANY FILTER →→ ANY FILTER → x(t) x(t)
Then we can find y(t) using freq Then we can find y(t) using freq response!response!
x(t) = Xx(t) = X11eejj00tt + X + X22ee2j2j00tt + … + …
y(t) = H(y(t) = H(00)X)X11eejj00tt + H(2 + H(200)X)X22ee2j2j00tt + … + …
![Page 22: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters](https://reader036.vdocuments.us/reader036/viewer/2022062802/56649e995503460f94b9ca21/html5/thumbnails/22.jpg)
Frequency ResponseFrequency Response
In general, we can writeIn general, we can write Y(Y() = H() = H()X()X())
![Page 23: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters](https://reader036.vdocuments.us/reader036/viewer/2022062802/56649e995503460f94b9ca21/html5/thumbnails/23.jpg)
Examples of FiltersExamples of Filters Low-Pass Filter:Low-Pass Filter:
Eliminates all frequency components above some cutoff Eliminates all frequency components above some cutoff frequencyfrequency
Result sounds muffledResult sounds muffled High-Pass Filter:High-Pass Filter:
Eliminates all frequency components below some cutoff Eliminates all frequency components below some cutoff frequencyfrequency
Result sounds fizzyResult sounds fizzy Band-Pass Filter:Band-Pass Filter:
Eliminates all components except those between fEliminates all components except those between f11 and f and f22
Result: sounds like an old radio broadcastResult: sounds like an old radio broadcast Band-Stop Filter:Band-Stop Filter:
Eliminates only the frequency components between f1 and Eliminates only the frequency components between f1 and f2f2
Result: usually hard to hear that anything’s happened!Result: usually hard to hear that anything’s happened!
![Page 24: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters](https://reader036.vdocuments.us/reader036/viewer/2022062802/56649e995503460f94b9ca21/html5/thumbnails/24.jpg)
Examples of FiltersExamples of Filters A Room:A Room:
Adds echoes to the signalAdds echoes to the signal A Quarter-Wave Resonator:A Quarter-Wave Resonator:
Emphasizes frequency components at f=(c/4L)+Emphasizes frequency components at f=(c/4L)+(nc/2L), for all integer n(nc/2L), for all integer n
Components at other frequencies are passed Components at other frequencies are passed without any emphasiswithout any emphasis
A Vocal Tract:A Vocal Tract: Emphasizes any part of the input that is near a Emphasizes any part of the input that is near a
formant frequencyformant frequency Any energy at other frequencies is passed Any energy at other frequencies is passed
without emphasiswithout emphasis
![Page 25: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters](https://reader036.vdocuments.us/reader036/viewer/2022062802/56649e995503460f94b9ca21/html5/thumbnails/25.jpg)
The Source-Filter Model of The Source-Filter Model of Speech ProductionSpeech Production
Input: Weak SourcesInput: Weak Sources Glottal closureGlottal closure Turbulence (frication, aspiration)Turbulence (frication, aspiration)
Filter: The Vocal TractFilter: The Vocal Tract Energy near formant frequency is emphasized Energy near formant frequency is emphasized
by a factor of 10 or a factor of 100by a factor of 10 or a factor of 100 Energy at other frequencies is passed without Energy at other frequencies is passed without
emphasisemphasis Speech = (Vocal Tract Transfer Function) X Speech = (Vocal Tract Transfer Function) X
(Excitation)(Excitation) S(S() = T() = T() E() E())
![Page 26: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters](https://reader036.vdocuments.us/reader036/viewer/2022062802/56649e995503460f94b9ca21/html5/thumbnails/26.jpg)
Topic #4: Speech Sources.Topic #4: Speech Sources.Presented as:Presented as:
Events in the Release of a Events in the Release of a Stop ConsonantStop Consonant
![Page 27: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters](https://reader036.vdocuments.us/reader036/viewer/2022062802/56649e995503460f94b9ca21/html5/thumbnails/27.jpg)
Events in the Release of a Events in the Release of a StopStop
“Burst” = transient + frication (the part of the spectrogram whose transfer function has poles only at the front cavity resonance frequencies, not at the back cavity resonances).
![Page 28: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters](https://reader036.vdocuments.us/reader036/viewer/2022062802/56649e995503460f94b9ca21/html5/thumbnails/28.jpg)
Pre-voicing during ClosurePre-voicing during ClosureTo make a voiced stop in most European languages:
Tongue root is relaxed, allowing it to expandm so that vocal folds can continue to vibrating for a little while after oral closure.
Result is a low-frequency “voice bar” that may continue well into closure.
In English, closure voicing is typical of read speech, but not casual speech.
“the bug”
![Page 29: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters](https://reader036.vdocuments.us/reader036/viewer/2022062802/56649e995503460f94b9ca21/html5/thumbnails/29.jpg)
Burst = Transient and FricationBurst = Transient and FricationFrication = Turbulence at a Frication = Turbulence at a
ConstrictionConstriction
Front cavity resonance frequency: FR = c/4Lf
Turbulence striking an obstacle makes noise
The fricative “sh” (this ship)
![Page 30: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters](https://reader036.vdocuments.us/reader036/viewer/2022062802/56649e995503460f94b9ca21/html5/thumbnails/30.jpg)
Aspiration = Turbulence at the Glottis Aspiration = Turbulence at the Glottis (like /h/). Transfer Function During (like /h/). Transfer Function During
Aspiration…Aspiration…
![Page 31: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters](https://reader036.vdocuments.us/reader036/viewer/2022062802/56649e995503460f94b9ca21/html5/thumbnails/31.jpg)
Formant Formant Transitions: Transitions:
Labial Labial ConsonantsConsonants
“the mom”
“the bug”
![Page 32: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters](https://reader036.vdocuments.us/reader036/viewer/2022062802/56649e995503460f94b9ca21/html5/thumbnails/32.jpg)
Formant Formant Transitions: Transitions:
Alveolar Alveolar ConsonantsConsonants
“the tug”
“the supper”
![Page 33: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters](https://reader036.vdocuments.us/reader036/viewer/2022062802/56649e995503460f94b9ca21/html5/thumbnails/33.jpg)
Formant Formant Transitions: Transitions: Post-alveolar Post-alveolar ConsonantsConsonants
“the shoe”
“the zsazsa”
![Page 34: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters](https://reader036.vdocuments.us/reader036/viewer/2022062802/56649e995503460f94b9ca21/html5/thumbnails/34.jpg)
Formant Formant Transitions: Transitions:
Velar Velar ConsonantsConsonants
“the gut”
“sing a song”
![Page 35: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters](https://reader036.vdocuments.us/reader036/viewer/2022062802/56649e995503460f94b9ca21/html5/thumbnails/35.jpg)
Formant Transitions: A Formant Transitions: A Perceptual StudyPerceptual Study
The study: (1) Synthesize speech with different formant patterns, (2) recordsubject responses. Delattre, Liberman and Cooper, J. Acoust. Soc. Am. 1955.
![Page 36: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters](https://reader036.vdocuments.us/reader036/viewer/2022062802/56649e995503460f94b9ca21/html5/thumbnails/36.jpg)
Perception of Formant Perception of Formant TransitionsTransitions