![Page 1: Speech Processing Final Project](https://reader033.vdocuments.us/reader033/viewer/2022061605/56815168550346895dbf99fb/html5/thumbnails/1.jpg)
Speech Processing Final Project
Estimation of pole and zero model in voiced speech
by Rafael A Alvarez
![Page 2: Speech Processing Final Project](https://reader033.vdocuments.us/reader033/viewer/2022061605/56815168550346895dbf99fb/html5/thumbnails/2.jpg)
Introduction
This presentation shows the results of estimating speech with a pole-zero model. The pole-zero model was derived using linear predictive coding (LPC) and homomorphic filtering. The combination of the two methods is called homomorphic prediction. Several signals are analyzed, and the problems encountered in each case are presented.
![Page 3: Speech Processing Final Project](https://reader033.vdocuments.us/reader033/viewer/2022061605/56815168550346895dbf99fb/html5/thumbnails/3.jpg)
Objectives
This project attempts to address the following areas:
• Modeling of speech using a pole-zero model
• Modeling of speech using linear prediction and homomorphic filtering.
• Results of estimating poles using homomorphic prediction.
• Results of estimating zeros using inverse filtering and homomorphic prediction.
• Problems estimating the model.
• Other possible applications.
![Page 4: Speech Processing Final Project](https://reader033.vdocuments.us/reader033/viewer/2022061605/56815168550346895dbf99fb/html5/thumbnails/4.jpg)
Speech modeling
Block diagram of the complete discrete-time speech production model. Blocks: speech source, gain, mixer, vocal tract and lip radiation.
![Page 5: Speech Processing Final Project](https://reader033.vdocuments.us/reader033/viewer/2022061605/56815168550346895dbf99fb/html5/thumbnails/5.jpg)
Speech modeling
• Periodic (voiced) speech can be modeled with:
$$X(z) = A\,G(z)\,V(z)\,R(z)$$
where $X(z)$ is the speech signal, $A$ the gain, $G(z)$ the glottal flow, $V(z)$ the vocal tract (poles and zeros), and $R(z)$ the radiation impedance.
![Page 6: Speech Processing Final Project](https://reader033.vdocuments.us/reader033/viewer/2022061605/56815168550346895dbf99fb/html5/thumbnails/6.jpg)
Speech modeling
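• Complete transfer function of the speech signal for voiced sound:
$$X(z) = A\,\frac{(1-\alpha z^{-1})\prod_{k=1}^{M_i}(1-a_k z^{-1})\prod_{k=1}^{M_o}(1-b_k z)}{(1-\beta z)^{2}\prod_{k=1}^{C_i}(1-c_k z^{-1})(1-c_k^{*} z^{-1})}$$
Numerator: radiation impedance $(1-\alpha z^{-1})$ and vocal tract zeros (minimum phase $a_k$, maximum phase $b_k$).
Denominator: glottal flow $(1-\beta z)^{2}$ and vocal tract poles $c_k$, $c_k^{*}$.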
![Page 7: Speech Processing Final Project](https://reader033.vdocuments.us/reader033/viewer/2022061605/56815168550346895dbf99fb/html5/thumbnails/7.jpg)
Linear Prediction Analysis
• Linear predictive coding approximates the system using an all-pole model.
• The zero produced by radiation at the lips is approximated by a long set of poles (not efficient).
• The resulting transfer function:
$$H(z) = \frac{S(z)}{U_g(z)} = \frac{A}{1-\sum_{k=1}^{p} a_k z^{-k}}$$
![Page 8: Speech Processing Final Project](https://reader033.vdocuments.us/reader033/viewer/2022061605/56815168550346895dbf99fb/html5/thumbnails/8.jpg)
Linear prediction analysis
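• Linear combination of past values
$$S(z)\left[1-\sum_{k=1}^{p} a_k z^{-k}\right] = S(z) - \sum_{k=1}^{p} a_k S(z)\,z^{-k} = A\,U_g(z)$$
• Time-domain representation
$$s[n] = \sum_{k=1}^{p} a_k\,s[n-k] + A\,u_g[n]$$
• Between excitation pulses, where the train of unit samples $u_g[n] = 0$, the above equation reduces to:
$$s[n] = \sum_{k=1}^{p} a_k\,s[n-k]$$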
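The time-domain difference equation above can be run directly as a small synthesizer. The sketch below is only an illustration under stated assumptions: the function name `synthesize_voiced`, the coefficient values, the gain, and the pitch period are all hypothetical and are not taken from the slides.

```python
def synthesize_voiced(a, gain, pitch_period, n_samples):
    """Run s[n] = sum_k a_k s[n-k] + A u_g[n] with a unit-sample train as u_g[n].

    a            -- predictor coefficients a_1..a_p (illustrative values only)
    gain         -- the gain A
    pitch_period -- spacing of the unit samples, in samples (assumed)
    """
    s = []
    for n in range(n_samples):
        u_g = 1.0 if n % pitch_period == 0 else 0.0
        # Linear combination of the p most recent output samples.
        prediction = sum(a[k] * s[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        s.append(prediction + gain * u_g)
    return s


if __name__ == "__main__":
    # Two complex-conjugate poles inside the unit circle (stable resonance).
    signal = synthesize_voiced([1.3, -0.8], gain=0.5, pitch_period=50, n_samples=200)
    print(signal[:5])
```

Between the unit samples the excitation is zero, so every output sample is exactly the linear combination of the previous `p` samples, which is the case linear prediction exploits.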
![Page 9: Speech Processing Final Project](https://reader033.vdocuments.us/reader033/viewer/2022061605/56815168550346895dbf99fb/html5/thumbnails/9.jpg)
Linear prediction analysis
• Two implementations of this analysis are:
• Covariance method
– Uses the actual sample values outside the window.
• Autocorrelation method
– Considers the values outside the window to be zero.
– Uses a tapering window such as the Hamming window.
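A minimal sketch of the autocorrelation method follows. It is not the code behind the slides: the names `autocorr` and `lpc_autocorrelation` and the `use_window` flag are hypothetical, and solving the normal equations with the Levinson-Durbin recursion is an assumption (the slides do not say which solver was used). Only the Python standard library is used.

```python
import math


def autocorr(x, max_lag):
    """r[0..max_lag]; samples outside the frame are treated as zero."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def lpc_autocorrelation(frame, order, use_window=True):
    """Return (a, err) such that s[n] ~ sum_k a[k-1] * s[n-k].

    Assumes a non-silent frame (r[0] > 0). Solver: Levinson-Durbin (assumed).
    """
    n = len(frame)
    if use_window:
        # Hamming window, as mentioned on the slide.
        frame = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                 for i, s in enumerate(frame)]
    r = autocorr(frame, order)
    a = []        # predictor coefficients of the current model order
    err = r[0]    # prediction error energy
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - 1 - j] for j in range(i - 1))
        k = acc / err                                   # reflection coefficient
        a = [a[j] - k * a[i - 2 - j] for j in range(i - 1)] + [k]
        err *= 1.0 - k * k
    return a, err


if __name__ == "__main__":
    # Impulse response of a known two-pole system (illustrative values).
    h = []
    for n in range(200):
        h.append((1.0 if n == 0 else 0.0)
                 + (1.3 * h[n - 1] if n >= 1 else 0.0)
                 - (0.8 * h[n - 2] if n >= 2 else 0.0))
    print(lpc_autocorrelation(h, 2, use_window=False))
```

With the window disabled and a fully decayed impulse response, the recursion recovers the generating coefficients; with the Hamming window enabled the estimate is smoothed and no longer exact, which is the usual trade-off of the autocorrelation method.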
![Page 10: Speech Processing Final Project](https://reader033.vdocuments.us/reader033/viewer/2022061605/56815168550346895dbf99fb/html5/thumbnails/10.jpg)
Linear Prediction analysis
• Results of LPC using the autocorrelation method
[Figure: Frequency response of the original and approximated signals using 8 poles. Vertical axis: log base 10 magnitude.]
![Page 11: Speech Processing Final Project](https://reader033.vdocuments.us/reader033/viewer/2022061605/56815168550346895dbf99fb/html5/thumbnails/11.jpg)
Homomorphic filtering
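• Based on the concept of superposition
– Can easily separate linearly combined systems.
• Generalized superposition
– Can separate non-linearly combined systems.
– The following properties must apply:
$$H(x_1[n] \,\square\, x_2[n]) = H(x_1[n]) \circ H(x_2[n])$$
$$H(c : x[n]) = c \cdot H(x[n])$$
• Canonical formulation of a homomorphic system:
$x[n] \rightarrow D \rightarrow \hat{x}[n] \rightarrow L \rightarrow \hat{y}[n] \rightarrow D^{-1} \rightarrow y[n]$
where $D$ maps the input combination rule to addition, $L$ is a linear system, and $D^{-1}$ maps addition back to the output combination rule.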
![Page 12: Speech Processing Final Project](https://reader033.vdocuments.us/reader033/viewer/2022061605/56815168550346895dbf99fb/html5/thumbnails/12.jpg)
Homomorphic filtering
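• Homomorphic system for convolution
• Applying the previous results to signals combined by convolution gives:
$x[n] \rightarrow \mathcal{Z} \rightarrow X(z) \rightarrow \log \rightarrow \hat{X}(z) \rightarrow \mathcal{Z}^{-1} \rightarrow \hat{x}[n]$
$\hat{y}[n] \rightarrow \mathcal{Z} \rightarrow \hat{Y}(z) \rightarrow \exp \rightarrow Y(z) \rightarrow \mathcal{Z}^{-1} \rightarrow y[n]$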
![Page 13: Speech Processing Final Project](https://reader033.vdocuments.us/reader033/viewer/2022061605/56815168550346895dbf99fb/html5/thumbnails/13.jpg)
Homomorphic filtering
• Algorithm to combine homomorphic filtering and LPC:
$s[n] \times w[n] \rightarrow$ cepstrum $\rightarrow$ liftering $\rightarrow$ inverse cepstrum $\rightarrow \hat{h}[n] \rightarrow$ LPC
• With the liftering operation (filtering in the quefrency domain), convolutionally combined signals can be separated.
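The cepstrum and liftering steps can be sketched as below. This is an illustration under assumptions, not the presentation's implementation: the names `dft`, `real_cepstrum`, and `lifter` are hypothetical, a naive O(N²) DFT stands in for an FFT so that only the standard library is needed, and the test signal (a unit sample plus one echo) is made up to show a clear quefrency peak.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (stand-in for an FFT routine)."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
        out.append(acc / n if inverse else acc)
    return out


def real_cepstrum(x):
    """Inverse DFT of the log magnitude spectrum."""
    log_mag = [math.log(abs(v) + 1e-12) for v in dft(x)]  # small floor avoids log(0)
    return [v.real for v in dft(log_mag, inverse=True)]


def lifter(cep, cutoff, keep="low"):
    """Keep quefrencies |n| < cutoff ("low") or |n| >= cutoff ("high")."""
    n = len(cep)
    out = []
    for i, c in enumerate(cep):
        is_low = min(i, n - i) < cutoff   # index n - i holds quefrency -i
        out.append(c if is_low == (keep == "low") else 0.0)
    return out


if __name__ == "__main__":
    x = [0.0] * 64
    x[0], x[8] = 1.0, 0.5                 # direct path plus an echo 8 samples later
    cep = real_cepstrum(x)
    peak = max(range(1, 33), key=lambda i: cep[i])
    print("strongest quefrency:", peak)
    print("high-liftered value at the peak:", lifter(cep, 4, "high")[peak])
```

The echo delay shows up as a peak in the cepstrum, and the two lifters split the cepstrum into a low-quefrency part and a high-quefrency part that sum back to the original.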
![Page 14: Speech Processing Final Project](https://reader033.vdocuments.us/reader033/viewer/2022061605/56815168550346895dbf99fb/html5/thumbnails/14.jpg)
Homomorphic filtering
• Homomorphic filtering combined with LPC (homomorphic prediction)
[Figure: Frequency response of the approximated signals and original signal. Horizontal axis: frequency in hertz; vertical axis: log base 10 magnitude. Curves: homomorphic LPC, homomorphic, original signal.]
![Page 15: Speech Processing Final Project](https://reader033.vdocuments.us/reader033/viewer/2022061605/56815168550346895dbf99fb/html5/thumbnails/15.jpg)
Homomorphic prediction
• Combining the previous methods, we can derive a pole-zero model estimation method.
– Recall that the pole-zero model of speech was given by
$$S(z) = A\,\frac{(1-\alpha z^{-1})\prod_{k=1}^{M_i}(1-a_k z^{-1})\prod_{k=1}^{M_o}(1-b_k z)}{(1-\beta z)^{2}\prod_{k=1}^{C_i}(1-c_k z^{-1})(1-c_k^{*} z^{-1})}$$
– Remember that multiplication in the frequency domain transforms into convolution in the time domain.
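The multiplication-to-convolution property can be checked numerically on short sequences. The sketch below is only a sanity check with made-up sequences; `convolve` and `z_transform` are hypothetical helper names, and the evaluation point `z0` is arbitrary.

```python
import cmath


def convolve(p, h):
    """Direct time-domain convolution of two finite sequences."""
    out = [0.0] * (len(p) + len(h) - 1)
    for i, p_i in enumerate(p):
        for j, h_j in enumerate(h):
            out[i + j] += p_i * h_j
    return out


def z_transform(x, z):
    """Evaluate X(z) = sum_n x[n] z^(-n) for a finite sequence."""
    return sum(x_n * z ** (-n) for n, x_n in enumerate(x))


if __name__ == "__main__":
    p = [1.0, 0.0, 0.0, 1.0]      # two pulses, three samples apart (illustrative)
    h = [1.0, -0.5, 0.25]         # short impulse response (illustrative)
    s = convolve(p, h)
    z0 = 0.9 * cmath.exp(0.7j)    # arbitrary point in the z-plane
    print(s)
    print(abs(z_transform(s, z0) - z_transform(p, z0) * z_transform(h, z0)))
```

The printed difference is at floating-point round-off level, which is the property homomorphic prediction relies on when it treats $S(z)$ as a product of factors.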
![Page 16: Speech Processing Final Project](https://reader033.vdocuments.us/reader033/viewer/2022061605/56815168550346895dbf99fb/html5/thumbnails/16.jpg)
Homomorphic prediction
• From the previous equation we have:
$$s[n] = p[n] * h[n]$$
$$S(z) = P(z)\,H(z)$$
$$S(z) = \frac{P(z)\,B(z)}{A(z)}$$
$$\hat{H}(z) = \frac{1}{A(z)}$$
then
$$B(z)\,P(z) = S(z)\,A(z)$$
S(z) = original signal
P(z) = glottal flow train
B(z) = vocal tract zeros
A(z) = vocal tract poles
* The system zeros and glottal flow poles can be obtained by filtering the signal with the inverse of the vocal tract poles (inverse filtering).
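Inverse filtering amounts to running the signal through the FIR filter $A(z) = 1 - \sum_k a_k z^{-k}$. The sketch below assumes the pole coefficients are already known; the names `all_pole_filter` and `inverse_filter`, the coefficient values, and the test excitation (a pulse train shaped by a single zero) are hypothetical.

```python
def all_pole_filter(excitation, a):
    """s[n] = e[n] + sum_k a[k-1] * s[n-k], i.e. filtering by 1/A(z)."""
    s = []
    for n, e in enumerate(excitation):
        s.append(e + sum(a[k] * s[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0))
    return s


def inverse_filter(s, a):
    """e[n] = s[n] - sum_k a[k-1] * s[n-k], i.e. multiplication by A(z)."""
    return [s[n] - sum(a[k] * s[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
            for n in range(len(s))]


if __name__ == "__main__":
    # Excitation B(z)P(z): pulses every 40 samples, each shaped by the zero (1 - 0.5 z^-1).
    excitation = [0.0] * 160
    for start in range(0, 160, 40):
        excitation[start] += 1.0
        excitation[start + 1] -= 0.5
    a = [1.3, -0.8]                                   # illustrative pole coefficients
    speech = all_pole_filter(excitation, a)
    residual = inverse_filter(speech, a)
    print(max(abs(r - e) for r, e in zip(residual, excitation)))
```

When the poles are estimated exactly, the residual equals the zero-shaped pulse train; in practice the quality of the residual depends on how well the all-pole estimate matches the true poles.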
![Page 17: Speech Processing Final Project](https://reader033.vdocuments.us/reader033/viewer/2022061605/56815168550346895dbf99fb/html5/thumbnails/17.jpg)
Homomorphic prediction
• An algorithm to estimate poles and zeros can be derived.
– First we obtain an approximation of the vocal tract poles, as presented before:
$s[n] \times w[n] \rightarrow$ LPC $\rightarrow \hat{h}[n]$
• $w[n]$ must window an area free of zeros and glottal flow poles.
– The resulting impulse response should represent all the poles in the system. This result can then be used to inverse filter the original signal.
![Page 18: Speech Processing Final Project](https://reader033.vdocuments.us/reader033/viewer/2022061605/56815168550346895dbf99fb/html5/thumbnails/18.jpg)
Homomorphic prediction
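• Zero and glottal flow deconvolution.
– Separate the glottal flow from the zeros.
– Separate min and max phase zeros.
$S(z) \rightarrow$ inverse filtering with $\hat{A}(z) \rightarrow B(z)P(z)$
$B(z)P(z) \rightarrow$ cepstrum $\rightarrow$ high liftering $\rightarrow$ inverse cepstrum $\rightarrow P(z)$
$B(z)P(z) \rightarrow$ cepstrum $\rightarrow$ low liftering $\rightarrow$ inverse cepstrum $\rightarrow B(z)$
$B(z) \rightarrow$ high and low liftering $\rightarrow$ inverse cepstrum $\rightarrow B_{\min}(z),\ B_{\max}(z)$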
![Page 19: Speech Processing Final Project](https://reader033.vdocuments.us/reader033/viewer/2022061605/56815168550346895dbf99fb/html5/thumbnails/19.jpg)
Homomorphic prediction
• Example of “quefrency” domain signal of a voiced signal.
[Figure: Real cepstrum of windowed original signal. Horizontal axis: quefrency (ms). Annotations: "produced by glottal flow train" and "vocal tract poles and zeros".]
![Page 20: Speech Processing Final Project](https://reader033.vdocuments.us/reader033/viewer/2022061605/56815168550346895dbf99fb/html5/thumbnails/20.jpg)
Results
• Signal #1
• First we examine a simple case of a synthesized signal
[Figure: Original waveform. Horizontal axis: msec; vertical axis: amplitude. The pitch period is marked.]
![Page 21: Speech Processing Final Project](https://reader033.vdocuments.us/reader033/viewer/2022061605/56815168550346895dbf99fb/html5/thumbnails/21.jpg)
Homomorphic prediction
• Zero free area of original signal, pitch-synchronized method
[Figure: Extracted "zero free" area of original signal. Horizontal axis: msec; vertical axis: amplitude. Regions L, M, and I are marked.]
L = glottal width (19)
M = number of vocal tract zeros (2)
I = zero-free area of vocal tract (2 poles)
![Page 22: Speech Processing Final Project](https://reader033.vdocuments.us/reader033/viewer/2022061605/56815168550346895dbf99fb/html5/thumbnails/22.jpg)
Results
• Frequency response of the estimated poles of the vocal tract
[Figure: Spectrum of "zero free" extracted vocal tract (2 poles) and original signal. Horizontal axis: frequency (Hz); vertical axis: log base 10 magnitude.]
![Page 23: Speech Processing Final Project](https://reader033.vdocuments.us/reader033/viewer/2022061605/56815168550346895dbf99fb/html5/thumbnails/23.jpg)
Results
• Inverse-filtered signal
– After filtering, only the glottal flow train and zeros remain.
[Figure: Inverse-filtered signal.]
![Page 24: Speech Processing Final Project](https://reader033.vdocuments.us/reader033/viewer/2022061605/56815168550346895dbf99fb/html5/thumbnails/24.jpg)
Results
• Cepstrum of inverse-filtered signal
– By liftering (filtering) the high and low parts of the cepstrum, the glottal flow can be separated from the zeros.
[Figure: Complex cepstrum of inverse-filtered signal (one period).]
![Page 25: Speech Processing Final Project](https://reader033.vdocuments.us/reader033/viewer/2022061605/56815168550346895dbf99fb/html5/thumbnails/25.jpg)
Results
• Approximated glottal flow and zeros
[Figure: Two panels: "Zero estimate" and "Glottal pulse estimate".]
![Page 26: Speech Processing Final Project](https://reader033.vdocuments.us/reader033/viewer/2022061605/56815168550346895dbf99fb/html5/thumbnails/26.jpg)
Results
• Signal #2
• Second, a more realistic signal.
[Figure: Original waveform. Horizontal axis: msec; vertical axis: amplitude. The pitch period is marked.]
![Page 27: Speech Processing Final Project](https://reader033.vdocuments.us/reader033/viewer/2022061605/56815168550346895dbf99fb/html5/thumbnails/27.jpg)
Results
• Signal #2.
Problems:
• How many zeros?
• What is the length of the glottal flow?
• How many poles?
[Figure: Extracted "zero free" area of original signal. Horizontal axis: msec; vertical axis: amplitude.]
![Page 28: Speech Processing Final Project](https://reader033.vdocuments.us/reader033/viewer/2022061605/56815168550346895dbf99fb/html5/thumbnails/28.jpg)
Results
• Signal #2
[Figure: Inverse-filtered signal. Horizontal axis: msec.]
Why are the results so different?
[Figure: Spectrum of "zero free" extracted vocal tract (4 poles) and original signal. Horizontal axis: frequency (Hz); vertical axis: log base 10 magnitude.]
![Page 29: Speech Processing Final Project](https://reader033.vdocuments.us/reader033/viewer/2022061605/56815168550346895dbf99fb/html5/thumbnails/29.jpg)
Results
• Signal #2
• Signal #1
[Figure: Two panels, each titled "Autocorrelation of original signal", one for Signal #2 and one for Signal #1.]
The autocorrelation function of Signal #2 shows aliasing. The method will not work if the signal was not sampled at a high enough frequency.
![Page 30: Speech Processing Final Project](https://reader033.vdocuments.us/reader033/viewer/2022061605/56815168550346895dbf99fb/html5/thumbnails/30.jpg)
Results
• Signal #3: voice recorded at 20 kHz
[Figure: Recorded signal at 20 kHz. Horizontal axis scaled by 10^4. The area extracted for processing is marked.]
![Page 31: Speech Processing Final Project](https://reader033.vdocuments.us/reader033/viewer/2022061605/56815168550346895dbf99fb/html5/thumbnails/31.jpg)
Results
• Extract area free of zeros
[Figure: Extracted "zero free" area of original signal. Horizontal axis: msec; vertical axis: amplitude.]
Problems:
• How many zeros?
– 6 zeros
• What is the length of the glottal flow?
– 38, since the signal is sampled at 20 kHz
• How many poles?
– 10 poles
![Page 32: Speech Processing Final Project](https://reader033.vdocuments.us/reader033/viewer/2022061605/56815168550346895dbf99fb/html5/thumbnails/32.jpg)
Results
• Spectrum of zero-free area, all-pole approximation and original signal
[Figure: Spectrum of "zero free" extracted vocal tract (8 poles) and original signal. Horizontal axis: frequency (Hz), scaled by 10^4; vertical axis: log base 10 magnitude.]
![Page 33: Speech Processing Final Project](https://reader033.vdocuments.us/reader033/viewer/2022061605/56815168550346895dbf99fb/html5/thumbnails/33.jpg)
Results
• Resulting inverse-filtered signal
[Figure: Inverse-filtered signal. Horizontal axis: msec. The original zero-free area is marked.]
• The area should be flat if all the poles were approximated accurately.
• Compared with other areas, it seems flat.
![Page 34: Speech Processing Final Project](https://reader033.vdocuments.us/reader033/viewer/2022061605/56815168550346895dbf99fb/html5/thumbnails/34.jpg)
Possible enhancements
• Implement an iterative algorithm that optimizes the results by trying different combinations of values for the variables:
– Length of glottal pulse
– Number of zeros
– Number of poles
• Try different approaches using the HF and LPC tools to get a better approximation:
– Use homomorphic filtering to remove the zeros and/or glottal flow first.
– Use a pitch estimation algorithm to better establish the pitch period.
– Establish a better relationship between the zeros and poles in the quefrency domain.
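One simple starting point for the pitch-estimation enhancement is to pick the lag that maximizes the autocorrelation within a plausible range. The sketch below is an assumption about how this could be done, not a method from the presentation: the names `autocorrelation` and `estimate_pitch_period`, the lag bounds, and the synthetic test signal are hypothetical.

```python
def autocorrelation(x, lag):
    """Autocorrelation at one lag; samples outside the frame count as zero."""
    return sum(x[i] * x[i + lag] for i in range(len(x) - lag))


def estimate_pitch_period(x, min_lag, max_lag):
    """Return the lag (in samples) with the largest autocorrelation in [min_lag, max_lag]."""
    return max(range(min_lag, max_lag + 1), key=lambda lag: autocorrelation(x, lag))


if __name__ == "__main__":
    # Synthetic voiced-like signal: a decaying pulse repeated every 40 samples.
    x = [0.8 ** (n % 40) for n in range(400)]
    print(estimate_pitch_period(x, min_lag=20, max_lag=80))
```

The lag bounds matter: too wide a range lets multiples of the true period compete, and too narrow a range can exclude it, so they would need to be chosen for the expected pitch and sampling rate.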
![Page 35: Speech Processing Final Project](https://reader033.vdocuments.us/reader033/viewer/2022061605/56815168550346895dbf99fb/html5/thumbnails/35.jpg)
Problems
• Problems in the method:
– Requires a good approximation of the area free of zeros
– Requires a good approximation of the number of zeros, poles, and length of glottal flow
– Requires a good all-pole approximation
– Requires a high sampling rate for the original signal
– May not work for high-pitched voices
![Page 36: Speech Processing Final Project](https://reader033.vdocuments.us/reader033/viewer/2022061605/56815168550346895dbf99fb/html5/thumbnails/36.jpg)
Applications
• Possible applications include:
– Speech synthesis: recreating the human voice
– Speech processing: human-machine interaction
– Speaker recognition: extraction of key features of the speaker