Download - Lecture 16 (104 Slides)
-
8/12/2019 Lecture 16 (104 Slides)
1/104
Digital Speech ProcessingDigital Speech Processing
Speech Coding Methods BasedSpeech Coding Methods Basedon Speech Modelson Speech Models
1
-
8/12/2019 Lecture 16 (104 Slides)
2/104
Waveform Coding versus BlockWaveform Coding versus BlockProcessingProcessing
sample-by-sample matching of waveforms coding quality measured using SNR
Source modeling (block processing) block processing of signal => vector of outputs
every block overlapped blocks
Block 1
2
Block 2
Block 3
-
8/12/2019 Lecture 16 (104 Slides)
3/104
-- weve carried waveform coding based on optimizing and maximizing
achieved bit rate reductions on the order of 4:1 (i.e., from 128 KbpsPCM to 32 Kbps ADPCM) at the same time achieving toll quality SNR for telephone-bandwidth speech
o ower ra e ur er w ou re uc ng speec qua y, we nee oexploit features of the speech production model, including:
source modeling s ectrum modelin use of codebook methods for coding efficiency
we also need a new way of comparing performance of differentwaveform and model-based coding methods
, , -based coders since they operate on blocks of speech and dont followthe waveform on a sample-by-sample basis
new subjective measures need to be used that measure user-perceived
3
, ,
-
8/12/2019 Lecture 16 (104 Slides)
4/104
Enhancements for ADPCM Coders
pitch prediction noise shaping
Analysis-by-Synthesis Speech Coders multipulse linear prediction coder (MPLPC) code-excited linear prediction (CELP)
Open-Loop Speech Coders two-state excitation model LPC vocoder residual-excited linear predictive coder
m xe exc a on sys ems speech coding quality measures - MOS speech coding standards
4
-
8/12/2019 Lecture 16 (104 Slides)
5/104
Differential QuantizationDifferential Quantization
n x nd nd nc
][~ n x ][ n xP : simple predictor ofvocal tract response
][ nc ][ nd ][ n x
5= = p
k
k
z zP 1)( ][~
n x
-
8/12/2019 Lecture 16 (104 Slides)
6/104
Issues with Differential uantizationIssues with Differential uantization difference signal retains the character of
the excitation si nal switches back and forth between quasi-periodic and noise-like signals
pre c on ura on even w en us ng
p =20) is order of 2.5 msec (for sampling predictor is predicting vocal tract response
not the excitation period (for voiced sounds)
Solution incorporate two stages ofprediction, namely a short-time predictor
6
-
time predictor for pitch period
-
8/12/2019 Lecture 16 (104 Slides)
7/104
Pitch PredictionPitch Prediction+ + ++ +
+ ++
+
+
+ )( zP = M z zP )(
[ ] x n
++
)(2 zP )(1 1 zP =
=
p
k
k
k z zP 12 )(
)( zPc
first stage pitch predictor:
res uaTransmitter Receiver
second sta e linear redictor vocal tract redictor :
1 ( ) = M P z z
721( )
== p
k
k k P z z
-
8/12/2019 Lecture 16 (104 Slides)
8/104
first stage pitch predictor:
i
M 1
this predictor model assumes that the pitch period, , is an integer number of samples and is a gain constant allowing
i M
for variations in pitch period over time (for unvoiced or background frames, values of and are irrelevant) i
M
1
is of the form:
+ = + +M M P z z z z 1
1 =M M k z 1
this more advanced form provides a way to handle non-integer pitch period through interpolation around the nearest integer
=i
k
8
pitch period value, M
-
8/12/2019 Lecture 16 (104 Slides)
9/104
The combined inverse system is the cascade in the decoder system:i
1 2
1 1 1( ) 1 ( ) 1 ( ) 1 ( )c c
H z P z P z P z= =
[ ][ ]1 21 ( ) 1 ( ) 1 ( ) 1 1 -
cP z P z P z = =
i
[ ]1 2 1( ) ( ) ( )P z P z P z
1 2 1
1
which is implemented as a parallel combination of two predictors:
and
c
P z P z P z
i
[ ]
The prediction signal, can be expressed as x ni
p
9
1
k k =
-
8/12/2019 Lecture 16 (104 Slides)
10/104
-
8/12/2019 Lecture 16 (104 Slides)
11/104
-
8/12/2019 Lecture 16 (104 Slides)
12/104
first search for M that maximizes [M ] opt
Solve for more accurate pitch predictor bym n m z ng e var ance o e expan epitch predictor
Solve for optimum vocal tract predictorcoefficients, k , k=1,2,,p
12
-
8/12/2019 Lecture 16 (104 Slides)
13/104
Pitch PredictionPitch Prediction
Vocal tractprediction
Pitch and
prediction
13
-
8/12/2019 Lecture 16 (104 Slides)
14/104
Noise Shaping in DPCMNoise Shaping in DPCMSystemsSystems
14
-
8/12/2019 Lecture 16 (104 Slides)
15/104
-
8/12/2019 Lecture 16 (104 Slides)
16/104
Noise ShapingNoise Shaping
Basic ADPCM encoder and decoder
Equivalent ADPCM encoder and decoder
16Noise shaping ADPCM encoder and decoder
-
8/12/2019 Lecture 16 (104 Slides)
17/104
Noise Sha inNoise Sha in[ ] [ ] [ ] ( ) ( ) ( )
The equivalence of parts (b) and (a) is shown by the following:
From art a we see that:
x n x n e n X z X z E z= + = +
i
i
( ) ( ) ( ) ( ) [1 ( )] ( ) ( ) ( )
with
D z X z P z X z P z X z P z E z= =
i
( ) ( ) ( ) [1 ( )] ( ) [1 ( )] ( )
showing the equivalence of parts (b) and (a). Further, since
D z D z E z P z X z P z E z= + = +
i
( ) ( ) ( ) ( )1 ( )
1([1 ( )] ( ) [1 ( )] ( ))
X z H z D z D zP z
P z X z P z E z
= =
= +
( ) ( )
Fee X z E z
= +
i
ding back the quantization error through the predictor,
17
[ ] [ ]
, ,, by the quantization error, , incurred in quantizing the
difference signal
x n e n
[ ]., d n
-
8/12/2019 Lecture 16 (104 Slides)
18/104
Shaping the Quantization NoiseShaping the Quantization Noise( )
( ),
To shape the quantization noise we simply replace bya different system function, giving the reconstructed signal as:
P z
F z
i
'( ) ( ) '( ) '( )1 ( )
1
X z H z D z D zP z
= =
'
1 ( )P z
1 ( )( ) '( )
F z X z E z
= +
[ ]Thus if is coded by the encoder, the -transform of thereconstructed signal at the receiver is:
x n zi
'( ) ( ) '( )1 ( )
'( ) '( ) ( ) '(1
X z X z E zF z
E z E z z E zP z
= + = = )
181 ( )
( ) 1 ( ) where is the effective noise shaping filter F z
z P z
=
i
-
8/12/2019 Lecture 16 (104 Slides)
19/104
Noise shaping filter options:i
. an we assume no se as a a spec rum, en
the noise and speech spectrum have the same shape 2. then the e uivalent s stem is the standard
z
F z P z
=
= DPCM
1
'( ) '( ) ( )system where with flat noise spectrum
'' '' p
k k
E z E z E z
= =
= =1
.
to ''hide'' the noise beneath the spectral peaks of the speech signal;k =
zeros have the same angles in the -plane, but with a radius that isdivided by .
z
19
-
8/12/2019 Lecture 16 (104 Slides)
20/104
Noise Sha in Filter Noise Sha in Filter
20
-
8/12/2019 Lecture 16 (104 Slides)
21/104
Noise Shaping Filter Noise Shaping Filter 2'
2 /2 / 2
' '
,
1 ( )
power of then the power spectrum of the shaped noise is of the form:
S
S
e
j F F j F F F eP e
=
1 ( )S P e
spectrum
above
speech
21
-
8/12/2019 Lecture 16 (104 Slides)
22/104
Fully Quantized AdaptiveFully Quantized AdaptivePredictive Coder Predictive Coder
22
-
8/12/2019 Lecture 16 (104 Slides)
23/104
Input is x [n]
P 2 (z ) is the short-term (vocal tract) predictor
Signal v [n] is the short-term prediction error
Goal of encoder is to obtain a quantized representation of this excitation
23
, .
-
8/12/2019 Lecture 16 (104 Slides)
24/104
Total bit rate for ADPCM coder:i
where is the number of bits for the quantization of thedifference signal, is the number of bits for encoding the
ADPCM S P P
B B
= + +i
step size at frame ra ,te and is the total number of bits
allocated to the predictor coefficients (both long and short-term)
with frame rate
P
P
F B
F
8000 1 4Typically and even with bits, we need between
8000 and 3200S F B= i
bps for quantization of difference signal Typically we need about 3000-4000 bps for the side informationi
(step size and predictor coefficients) Overall we need between 11,000 and 36,000 bps for a fui lly quantized system
24
-
8/12/2019 Lecture 16 (104 Slides)
25/104
Bit Rate for LP CodingBit Rate for LP Coding
speech and residual sampling rate: F s=8 kHz LP analysis frame rate: F
=F
P = 50-100 frames/sec
quantizer stepsize: 6 bits/frame predictor parameters:
M (pitch period): 7 bits/frame vocal tract predictor coefficients: PARCORs 16-20, 46-
50 bits/frame
prediction residual: 1-3 bits/sample total bit rate:
25
BR = 72*F P + F s (minimum)
-
8/12/2019 Lecture 16 (104 Slides)
26/104
TwoTwo--Level (B=1 bit) Quantizer Level (B=1 bit) Quantizer re ct on res ua
Quantizer in ut
Quantizer output
Reconstructed pitch
Original pitch residual
econs ruc e speec
26
-
8/12/2019 Lecture 16 (104 Slides)
27/104
ThreeThree- -Level Center Level Center- -Clipped Quantizer Clipped Quantizer re ct on res ua
Quantizer in ut
Quantizer output
Reconstructed pitch
Original pitch residual
econs ruc e speec
27
-
8/12/2019 Lecture 16 (104 Slides)
28/104
Summary of Using LP in SpeechSummary of Using LP in Speechodingoding
the redictor can be more so histicated than a
vocal tract response predictorcan utilizeperiodicity (for voiced speech frames) t e quant zat on no se spectrum can e s ape
by noise feedback
the formant peaks in the speech, thereby utilizing theperceptual masking power of the human auditory
we now move on to more advanced LP coding ofspeech using Analysis-by-Synthesis methods
28
-
8/12/2019 Lecture 16 (104 Slides)
29/104
Analysis Analysis- -byby--SynthesisSynthesisSpeech CodersSpeech Coders
29
-
8/12/2019 Lecture 16 (104 Slides)
30/104
-- --
closed-loop adaptive predictive coder was
input/excitation to the vocal tract model) to
rates while maintaining very high quality at
30
-
8/12/2019 Lecture 16 (104 Slides)
31/104
A A--bb--S Speech CodingS Speech Coding
Replace quantizer for generating excitation signal with anoptimization process (denoted as Error Minimization above)
31
w ere y e exc a on s gna , n s cons ruc e ase onminimization of the mean-squared value of the synthesis error,
d [n]= x [n]- x [n]; utilizes Perceptual Weighting filter.~
-
8/12/2019 Lecture 16 (104 Slides)
32/104
A A--bb--S Speech CodingS Speech Coding
[ ]
as c opera on o eac oop o c ose - oop - - sys em: 1. at the beginning of each loop (and only once each loop), the
speech signal, , is used to generate an optimum order th x n p
i
11 1( )
1 ( )
p
i
H zP z
z = =
[ ] [ ] [ ]
[ ],
2. the difference signal, , based on an initial estimateof the speech signal, is perceptually
d n x n x n
x n
=
=
weighted by a speech-adaptive
filter of the form:1 ( )
( )1 ( )
(see next vugraph)
3. the error minimization box and the excitation generator create a sequence
P zW z
P z
=
[ ]
of error signals that iteratively (once per loop) improve the match to theweighted error signal
4. the resulting excitation signal, , which is an improved estimate of thed n
32
ac ua pre c on error s gna or eac oop era on, s ue o exc e e LPC filter and the loop processing is iterated until the resulting error signal
meets some criterion for stopping the closed-loop iterations.
-
8/12/2019 Lecture 16 (104 Slides)
33/104
-
8/12/2019 Lecture 16 (104 Slides)
34/104
-
8/12/2019 Lecture 16 (104 Slides)
35/104
Implementation of A-B-S Speechoding
the vocal tract filter that produces high qualitysynthetic output, while maintaining a structuredrepresentation that makes it easy to code the
excitation at low data rates SolutionSolution :: use a set of basis functions whichallow you to iteratively build up an optimal
,basis function at each iteration in the A-b-S
35
-
8/12/2019 Lecture 16 (104 Slides)
36/104
Implementation of A-B-S Speech Coding
1 2{ [ ], [ ],..., [ ]}, 0 1
Assume we are given a set of basis functions of the form:
and each basis function is 0 outside the defining interval.Q
Q
f n f n f n n L =
i
At each iteration of the A-b-S loop, wei
:
select the basis function
from that maximially reduces the perceptually weighted mean-squared error, E
( )1 2
0
[ ] [ ] [ ] [ ]
[ ] [ ]where and are the VT and perceptual weighting filte
L
n
E x n d n h n w n
h n w n
=
=
rs.[ ],
[ ] [ ]
.
We denote the optimal basis function at the iteration as
giving the excitation signal where is the optimal
wei htin coefficient for basis function
k
k
th
k k k
k f n
d n f n
n
=
i
The A-b-Sk
i iteration continues until the perceptually weighted error falls below some desired threshold, or until a maximum number of iterations is reached ivin the final excitation si nal as: N d n
36
1
[ ] [ ]k
N
k k
d n f n =
=
-
8/12/2019 Lecture 16 (104 Slides)
37/104
- -
Closed Loo Coder Closed Loo Coder
37Reformulated Closed Loop Coder Reformulated Closed Loop Coder
-
8/12/2019 Lecture 16 (104 Slides)
38/104
-
8/12/2019 Lecture 16 (104 Slides)
39/104
- -
, ,...,
[ ])
e now eg n e era on o e - - oop,
We optimally select one of the basis set (call this anddetermine the am litude ivin :
k f n
=i
i
[ ] [ ], 1, 2,...,
W
k k k d n f n k N = =
i
e then form the new perceptually weighted error as:1
1
' [ ] ' [ ] [ ] '[ ]
' [ ] ' [ ]
k k k k
k k k
th
e n e n f n h n
e n y n
=
=
( )1
2
0
' [ ]
- L
k k n
E e n
=
=
i
( )1
2
10
' [ ] ' [ ] L
k k k n
e n y n
=
=
39
-
8/12/2019 Lecture 16 (104 Slides)
40/104
-
8/12/2019 Lecture 16 (104 Slides)
41/104
- - Our final results are the relations:
N N
i
1 121 1
2
'[ ] [ ] '[ ] ' [ ]
' ' ' '
k k k k
k k
L L N
x n f n h n y n = =
= =
0 0 1
21
2 '[ ] ' [ ] ' [ ]
k k n n k
L N
k k j
E x n y n y n
= = =
= 0 1
where then k j = =
1 1
' ' ' '
re-optimized 's satisfy the relation:k L N L
0 1 0
, , ,...,
[ ]
At receiver use set of along with to create excitation:k
j k k jn k n
k f n = = =
= =
i
411[ ] [ ]
k k k
x n f n =
= [ ]h n
A l iA l i bb S h i C diS h i C di
-
8/12/2019 Lecture 16 (104 Slides)
42/104
Analysis Analysis- -byby--Synthesis CodingSynthesis Coding
[ ] [ ] 0 1 1 u pu se near pre c ve co ng
= = f n n Q L
-
. . a an . . em e, ,
Proc. IEEE Conf. Acoustics, Speech and Signal Proc ., 1982.
[ ] 1 2
vector of white Gaussian noise, = =M f n Q
M. R. Schroeder and B. S. Atal, Code-excited linear prediction
Self-excited linear predictive vocoder (SEV)(CELP), Proc. IEEE Conf. Acoustics, Speech and Signal Proc ., 1985.
1 2, s e vers ons oprevious excitation source = n n
42
R. C. Rose and T. P. Barnwell, The self-excited vocoder,Proc. IEEE Conf. Acoustics, Speech and Signal Proc ., 1986.
-
8/12/2019 Lecture 16 (104 Slides)
43/104
43
-
8/12/2019 Lecture 16 (104 Slides)
44/104
Multipulse LP Coder Multipulse LP Coder
2
u pu se uses mpu ses as e as s unc ons; us e as cerror minimization reduces to:
i
0 1
[ ] [ ] k k n k
E x n h n = =
=
44
-
8/12/2019 Lecture 16 (104 Slides)
45/104
1 11. find best and for single pulse solution
2. subtract out the effect of this impulse from the speech
waveform and repeat the process3. do this until desired minimum error is obtained -
perceptually close to the original
45
M lti l A l iM lti l A l i
-
8/12/2019 Lecture 16 (104 Slides)
46/104
Multipulse AnalysisMultipulse Analysis
Output fromprevious
B. S. Atal and J. R. Remde, A new model of LPC excitation-
frame
46
, .IEEE Conf. Acoustics, Speech and Signal Proc ., 1982.
-
8/12/2019 Lecture 16 (104 Slides)
47/104
Examples of Multipulse LPCExamples of Multipulse LPC
B. S. Atal and J. R. Remde, A new model of LPC Excitation-
47
, .IEEE Conf. Acoustics, Speech and Signal Proc ., 1982.
-
8/12/2019 Lecture 16 (104 Slides)
48/104
-- =
impulses/sec X 9 bits/impulse => 7200 bps need 2400 bps for A(z) => total bit rate of
9600 bps code pulse locations differentially ( i = N i N i-1 ) to reduce range of variable
amplitudes normalized to reduce dynamicran e
48
-
8/12/2019 Lecture 16 (104 Slides)
49/104
basic idea is that primary pitch pulses are correlated and predictable over
, . .,s [n] s [n-M ]
break correlation of speech into short term component (used to providespectral estimates) and long term component (used to provide pitch pulse
first remove short-term correlation by short-term prediction, followed byremoving long-term correlation by long-term predictions
49
-
8/12/2019 Lecture 16 (104 Slides)
50/104
( ) 1 ( ) 1
prediction error filter
= = p
k A z P z z 1
( )short term residual, , includes primary pitch pulses that can be removedby long-term predictor of the form
=
k
u n
( ) 1 giving
=
M B z bz
( ) ( ) ( ) = v n u n bu n M ( )with fewer large pulses to code than in u n
50
A l iA l i bb S h iS h i
-
8/12/2019 Lecture 16 (104 Slides)
51/104
Analysis Analysis- -byby--SynthesisSynthesis
mpu ses se ecte to represent t e output o t e ong term pre ctor, rat er t an t e output o t e s ort
term predictor most impulses still come in the vicinity of the primary pitch pulse
=> result is hi h ualit s eech codin at 8-9.6 Kb s
51
-
8/12/2019 Lecture 16 (104 Slides)
52/104
Code Excited LinearCode Excited LinearPrediction (CELP)Prediction (CELP)
52
-
8/12/2019 Lecture 16 (104 Slides)
53/104
Code Excited LPCode Excited LP basic idea is to represent the residual after long-term (pitch
period) and short-term (vocal tract) prediction on each frame- ,
by multiple pulses replaced residual generator in previous design by a
frame at 8 kHz sampling rate
can use either deterministic or stochastic codebook10
deterministic codebooks are derived from a training set ofvectors => problems with channel mismatch conditions
histogram of the residual from the long-term predictorroughly is Gaussian pdf => construct codebook from whiteGaussian random numbers with unit variance
53
CELP used in STU-3 at 4800 bps, cellular coders at 800 bps
Code Excited LPCode Excited LP
-
8/12/2019 Lecture 16 (104 Slides)
54/104
Code Excited LPCode Excited LP
54
amplitude distribution of the residual from the long-term pitch predictoroutput is roughly identical to a Gaussian distribution with the same meanand variance.
CELP EncoderCELP Encoder
-
8/12/2019 Lecture 16 (104 Slides)
55/104
CELP Encoder CELP Encoder
55
-
8/12/2019 Lecture 16 (104 Slides)
56/104
For each of the excitation VQ codebook vectors
the following operations occur: the codebook vector is scaled by the LPC gain
, , the error signal, e [n], is used to excite the long-term
pitch predictor, yielding the estimate of the speechsignal, x [n], for the current codebook vector the signal, d [n], is generated as the difference
between the s eech si nal x n and the estimated
speech signal, x [n] the difference signal is perceptually weighted and the
-
~
56
Stochastic Code (CELP)Stochastic Code (CELP)
-
8/12/2019 Lecture 16 (104 Slides)
57/104
Stochastic Code (CELP)Stochastic Code (CELP)Excitation AnalysisExcitation Analysis
57
-
8/12/2019 Lecture 16 (104 Slides)
58/104
58
-
8/12/2019 Lecture 16 (104 Slides)
59/104
-
8/12/2019 Lecture 16 (104 Slides)
60/104
Goal is to suppress noise below the maskingthreshold at all fre uencies usin a filter of the form:
i
111
1
p
k k k
k z
=
21
1
p pk k
k k
z z
z
=
=
where the typical ranges of the pai
0.2 0.4
rameters are:
1
2
. .
0.8 0.9
i
60
in the valleys without distorting the speech.
-
8/12/2019 Lecture 16 (104 Slides)
61/104
61
-
8/12/2019 Lecture 16 (104 Slides)
62/104
-
array of Gaussian random numbers, where mostof the samples between adjacent codewordswere identical
Such overlapping codebooks typically use shiftsof one or two samples, and provide largecomplexity reductions for storage and
given frame
62
Overlapped StochasticOverlapped Stochastic
-
8/12/2019 Lecture 16 (104 Slides)
63/104
Overlapped StochasticOverlapped Stochasticodebookodebook
Two codewords which are identical except for a shift of
63
wo samp es
-
8/12/2019 Lecture 16 (104 Slides)
64/104
64
CELP WaveformsCELP Waveforms
-
8/12/2019 Lecture 16 (104 Slides)
65/104
a or g na speec
b s nthetic s eechoutput
residual
(d) reconstructed LPCresidual
(e) prediction residualafter pitch prediction
65
10-bit randomcodebook
-
8/12/2019 Lecture 16 (104 Slides)
66/104
66
-
8/12/2019 Lecture 16 (104 Slides)
67/104
67
FSFS--1016 Encoder/Decoder 1016 Encoder/Decoder
-
8/12/2019 Lecture 16 (104 Slides)
68/104
68
-
8/12/2019 Lecture 16 (104 Slides)
69/104
-
8/12/2019 Lecture 16 (104 Slides)
70/104
-- Three sets of features are produced by
the encoding system, namely:1. the LPC spectral parameters (coded as a
set of 10 LSP parameters) for each 30 msecframe
2. the codeword and gain of the adaptivecodebook vector for each 7.5 msec sub-
3. the codeword and gain of the stochastic
70
. -frame
-
8/12/2019 Lecture 16 (104 Slides)
71/104
--
71
-
8/12/2019 Lecture 16 (104 Slides)
72/104
72
-
8/12/2019 Lecture 16 (104 Slides)
73/104
Total delay of any coder is the time taken by the input
speec samp e o e processe , ransm e , an
decoded, plus any transmission delay, including: buffering delay at the encoder (length of analysis frame window)-~ -
processing delay at the encoder (compute and encode all coderparameters)-~20-40 msec
buffering delay at the decoder (collect all parameters for a frameof speech)-~20-40 msec processing delay at the decoder (time to compute a frame of
output using the speech synthesis model)-~10-20 msec ,
of signals, forward error correction, etc.) is order of 70-130 msec
73
-
8/12/2019 Lecture 16 (104 Slides)
74/104
,
large due to forward adaptation forestimatin the vocal tract and itchparameters backward adaptive methods generally
produced poor quality speech Chen showed how a backward adaptive
as a conventional forward adaptive coder atbit rates of 8 and 16 kbps
74
Low Delay (LD) CELP Coder Low Delay (LD) CELP Coder
-
8/12/2019 Lecture 16 (104 Slides)
75/104
1 ( / 0.9)P z
1 ( / 0.4)
=
P z
75
-
8/12/2019 Lecture 16 (104 Slides)
76/104
-- only the excitation sequence is transmitted to thereceiver; the long and short-term predictors arecom ne n o one or er pre c or w osecoefficients are updated by performing LPC analysis onthe previously quantized speech signal
e exc a on ga n s up a e y us ng e ga ninformation embedded in the previously quantizedexcitation
- , ,bits/sample at an 8 kHz rate; using a codeword length of5 samples, each excitation vector is coded using a 10-bit
- -
codebook) a closed loop optimization procedure is used to populatethe sha e codebook usin the same wei hted error
76
criterion as is used to select the best codeword in theCELP coder
16 kbps LD CELP Characteristics16 kbps LD CELP Characteristics
-
8/12/2019 Lecture 16 (104 Slides)
77/104
8 kHz sam lin rate 2 bits/sample for coding residual
5 samples per frame are encoded by VQ using a10-bit gain-shape codebook 3 bits (2 bits and sign) for gain (backward adaptive on
7 bits for wave shape recursive autocorrelation method used to
compute autocorrelation values from pastsynthetic speech.
77
-voice
LDLD--CELP Decoder CELP Decoder
-
8/12/2019 Lecture 16 (104 Slides)
78/104
a pre c or an ga n va ues are er ve romcoded speech as at the encoder
1
1 110 11 ( )
= M P z
78
3 1110 21 ( )
p P z
-
8/12/2019 Lecture 16 (104 Slides)
79/104
-
8/12/2019 Lecture 16 (104 Slides)
80/104
- -
used to derive an excitation signal that
while being efficient to code
code-excited LPC
the CELP idea
80
-
8/12/2019 Lecture 16 (104 Slides)
81/104
81
--
-
8/12/2019 Lecture 16 (104 Slides)
82/104
82
-
8/12/2019 Lecture 16 (104 Slides)
83/104
d n x n
83
ModelModel--Based CodingBased Codingassume we model the vocal tract transfer function as
-
8/12/2019 Lecture 16 (104 Slides)
84/104
( )( )( ) ( ) 1 ( )
= = =
X z G GH z S z A z P z
1
( )
LPC coder 100 frames/sec, 13 parameters/frame (p 10 LPC
=
=
=> =
k k k
P z a z
coefficients, pitch period, voicing decision, gain) 1300 parameters/secondfor coding versus 8000 samples/sec for the waveform
=>
84
-
8/12/2019 Lecture 16 (104 Slides)
85/104
dont use predictor coefficients (large dynamic range,=>
poles, PARCOR coefficients, etc. code LP parameters optimally using estimated pdfs for
1. V/UV-1 bit 100 bps
2. Pitch Period-6 bits uniform 600 b s3. Gain-5 bits (non-uniform) 500 bps4. LPC poles-10 bits (non-uniform)-5 bitsor an s or o eac o po es ps
Total required bit rate 7200 bps S5-original
85
loss from original speech quality)
quality limited by simple impulse/noise excitation modelS5-synthetic
-
8/12/2019 Lecture 16 (104 Slides)
86/104
.
2. use of PARCOR coefficients (| k i |= uniform pdf with small spectral sensitivity
=> 5-6 bits for coding can achieve 4800 bps with almost samequality as 7200 bps system above
can achieve 2400 bps with 20 msecframes => 50 frames/sec
86
--
-
8/12/2019 Lecture 16 (104 Slides)
87/104
87
--
-
8/12/2019 Lecture 16 (104 Slides)
88/104
the ke roblems with s eech coders based on all-pole linear prediction models
production model
idealization of source as either ulse train orrandom noise lack of accountin for arameter correlation
using a one-dimensional scalar quantizationmethod => aided greatly by using VQ methods
88
VQVQ--Based LPC Coder Based LPC Coder
-
8/12/2019 Lecture 16 (104 Slides)
89/104
on PARCOR
coefficients
bottom line : todramatically improve
quality need improved
Case 1: same quality as 2400 bps LPC vocoder Case 2: same bit rate, higher quality
exc a on mo e
10-bit codebook of PARCOR vectors
44.4 frames/sec
-
22 bit codebook => 4.2 mill ioncodewords to be searched
never achieved good quality
89
, ,
2-bit for f rame synchronization
total bit rate of 800 bps
due to computation, storage,graininess of quantization at cellboundaries
-
8/12/2019 Lecture 16 (104 Slides)
90/104
network-64 Kb s PCM 8 kHz sam lin rate 8-bit log quantization)
international-32 Kbps ADPCM teleconferencing-16 Kbps LD-CELP wireless-13, 8, 6.7, 4 Kbps CELP-based coders secure telephony-4.8, 2.4 Kbps LPC-based
coders (MELP) - - storage for voice mail, answering machines,
- -
90
-
8/12/2019 Lecture 16 (104 Slides)
91/104
91
-
8/12/2019 Lecture 16 (104 Slides)
92/104
bit rate-2400 to 128,000 b s quality-subjective (MOS), objective ( SNR ,
intelligibility) - , delay-echo, reverberation; block coding delay,
rocessin dela , multi lexin dela ,transmission delay-~100 msec
telephone bandwidth-200-3200 Hz, 8kHz
wideband speech-50-7000 Hz, 16 kHz samplingrate
92
Network Speech CodingNetwork Speech Coding
d dd d
-
8/12/2019 Lecture 16 (104 Slides)
93/104
tandardstandards
G.711 compandedPCM 64 Kbps tollG.726/727 ADPCM 16-40 Kbps toll
48, 56,64 Kb s.
G.728 LD-CELP 16 Kbps toll
G.729A CS-ACELP 8 Kbps toll-
93
. .& ACELP
. .Kbps
-
8/12/2019 Lecture 16 (104 Slides)
94/104
Secure Telephony SpeechSecure Telephony Speech
di t d ddi t d d
-
8/12/2019 Lecture 16 (104 Slides)
95/104
oding tandardsoding tandards
- . ps g
FS-1016 CELP 4.8 Kbps FS-1016
95
-
8/12/2019 Lecture 16 (104 Slides)
96/104
-
8/12/2019 Lecture 16 (104 Slides)
97/104
2 types of coders waveform approximating-PCM, DPCM, ADPCM-coders which produce a
reconstructed signal which converges toward the original signal with
decreasing quantization error - - - - - , , . ,
which produce a reconstructed signal which does not converge to the originalsignal with decreasing quantization error
to quality of or iginal speech
parametric coder converges-
maximum quality (due to themodel inaccuracy ofrepresenting speech)
97
-
8/12/2019 Lecture 16 (104 Slides)
98/104
talker and language dependencytalker and language dependency - especially for parametriccoders that estimate itch that is hi hl variable across men, womenand children; language dependency related to sounds of thelanguage (e.g., clicks) that are not well reproduced by model-basedcoders
-normalized to a maximum level; when actual samples are lower thanthis level, the coder is not operating at full efficiency causing loss ofquality
- , ,and interfering talkers; levels of background noise varies, makingoptimal coding based on clean speech problematic
multiple encodingsmultiple encodings - tandem encodings in a multi-linkcommun ca on sys em, e econ erenc ng w mu p e enco ers
channel errorschannel errors - especially an issue for cellular communications;errors either random or bursty (fades)-redundancy methods ofternused
98
nonnon--speech soundsspeech sounds - e.g., music on hold, dtmf tones; sounds thatare poorly coded by the system
-
8/12/2019 Lecture 16 (104 Slides)
99/104
Measures of Speech Coder QualityMeasures of Speech Coder Quality
Intelligibility-Diagnostic Rhyme Test (DRT)
-
8/12/2019 Lecture 16 (104 Slides)
100/104
Intelligibility Diagnostic Rhyme Test (DRT) compare words that differ in leading consonant en y spo en wor as one o a pa r o c o ces high scores (~90%) obtained for all coders above 4
Kbps Subjective Quality-Mean Opinion Score (MOS)
5 excellent quality
3 fair quality 2 poor quality
MOS scores for high quality wideband speech(~4.5) and for high quality telephone bandwidth
100
speech (~4.1)
Evolution of S eech Coder Performance
E ll t North American TDMA
-
8/12/2019 Lecture 16 (104 Slides)
101/104
Excellent North American TDMA
Good 2000
Fair
Poor
ITU RecommendationsCel l ul ar St andar ds
Secure Telephony1980 Profile
1980
Bad
ro e2000 Profile
101Bit Rate (kb/s)
-
8/12/2019 Lecture 16 (104 Slides)
102/104
GOOD (4)G.723.1
G.729
IS-127 G.728 G.726G.711
FAIR (3)
IS54
FS1016MELP1995
2000
POOR (2)FS10151990
BAD (1) 1980
102BIT RATE (kb/s)
-
8/12/2019 Lecture 16 (104 Slides)
103/104
64 kbps Mu-Law PCM 32 kbps CCITT G.721 ADPCM
- 8 kbps CELP
4.8 kbps CELP for STU-3
103
. - -
d b d h d
-
8/12/2019 Lecture 16 (104 Slides)
104/104
Wideband Speech Coding a e a er
3.2 kHz-uncoded 7 kHz-uncoded 7 kHz-coded at 64 kbps (G.722) 7 kHz-coded at 32 kbps (LD-CELP)
- - Female talker
3.2 kHz-uncoded 7 kHz-uncoded 7 kHz-coded at 64 kbps (G.722) - -
104
7 kHz-coded at 16 kbps (BE-CELP)