

Page 1: [IEEE ICSIPNN '94. International Conference on Speech, Image Processing and Neural Networks - Hong Kong (13-16 April 1994)] Proceedings of ICSIPNN '94. International Conference on

1994 International Symposium on Speech, Image Processing and Neural Networks, 13-16 April 1994, Hong Kong

Improving the Performance of Long History Scalar and Vector LSP Quantizers

C. S. Xydeas and K. K. M. So

Department of Electrical Engineering, University of Manchester,

Dover Street, Manchester M13 9PL, UK.

ABSTRACT

In order to obtain high quality decoded speech in low bit rate coding systems, it is essential to quantize codec parameters accurately with a minimum number of bits. A new and efficient quantization approach for codec parameters, called Long History Quantization (LHQ), has recently been proposed and applied to the scalar and vector quantization of LSP coefficients, to achieve "transparency" at average bit rates of about 22 and 19 bits per frame respectively. This paper presents certain improvements in these quantization schemes which further reduce the average LSP bit rates to 20 and 17 bits per 20 ms frame.

1. INTRODUCTION

The line spectrum pair (LSP) coefficients have been widely accepted by the speech coding community as a powerful representation of the LPC information. Thus, quantization schemes which exploit the interframe [1,2,3] and intraframe [4,5] correlation of LSP parameters have been developed. More recently, a general approach for the adaptive quantization of codec parameters, called Long History Quantization (LHQ), has been proposed. LHQ exploits the constraints imposed on the signal by i) the speech production mechanism characteristics of individuals and ii) language and phonetic considerations [6, 7].

This paper presents certain improvements in the LHQ-Scalar (LHSQ) and LHQ-Vector (LHVQ) LSP quantization schemes reported in [7]. The quantization process is enhanced by adding a variable frame LPC analysis capability to the LHQ approach. In this case a similarity measure is defined between adjacent input speech segments. This effectively determines the range over which LPC analysis is performed to yield a set of LSP coefficients (<LSP>), see Figure 1. An index M, which is the ratio of the LPC analysis frame size to the basic input speech frame size, is transmitted separately as side information.

LHQ is then applied to the set of <LSP> coefficients to produce <QLSP> indices. Simple noiseless coding [9], which exploits any redundancy present in the long-term <QLSP> bit statistics, is also applied to yield the main bit stream at the output of the system.

0-7803-1865-X/94/$3.00 © 1994 IEEE

The general LHQ approach and its application to the quantization of LSP coefficients are briefly reviewed in Sections 2 and 3 respectively. The incorporation of multiple frame LPC analysis and noiseless coding of the quantization indices produced by LHQ schemes is described in Sections 4 and 5. Results which highlight the performance of the proposed LSP quantization system are presented in Section 6 and indicate a transparent performance at average bit rates of 20 and 17 bits per 20 ms coding frame.

Figure 1: Block Diagram of the Enhanced LHQ Scheme

2. LONG HISTORY QUANTIZATION (LHQ)

A speech production model is usually employed in low bit rate speech coding and a set of model parameters is derived on a frame-by-frame basis. Optimized scalar [5] and vector [8] quantizers, which are designed according to long term statistics of model parameters, are generally applied. However, similar or exact speech signal information is likely to be found in previous frames and such long term redundancy is exploited by LHQ schemes to achieve higher signal compression.

In general, let $E_k$ denote the kth speech frame to be encoded and $\{P_k\}_i$; i = 1, 2, ..., B be the B sets of model parameters representing $E_k$. In addition, $E'_{k-j}$ and $\{P'_{k-j}\}_i$; j = 1, 2, ..., N are regarded as the N previously decoded speech frames and the corresponding quantized parameter sets respectively. $\{P_k\}_i$ can be LHQ-quantized to one of the N entries $\{P'_{k-j}\}_i$ using an LHQ adaptive codebook which contains part of the signal's history. The LHQ approach is applicable to a single parameter or sets of parameters and can operate with fixed scalar or vector quantizers. In each coding frame a decision is usually made between LHQ and fixed quantization [6], and this decision is transmitted in the form of a binary flag, $F_{k,i}$. Thus, $\{P_k\}_i$ is first quantized, using a fixed quantizer, to $\{\hat{P}_k\}_i$, which is then compared to the entries, $\{P'_{k-j}\}_i$, of the LHQ codebook. The system then decides whether to transmit $\{P'_k\}_i = \{\hat{P}_k\}_i$ with $F_{k,i} = 0$ or the index j of the $\{P'_k\}_i = \{P'_{k-j}\}_i$ assignment with $F_{k,i} = 1$. In this case, the LHQ codebook search is based on quantized parameter values and involves only the indices of the fixed quantizer output levels. The LHQ codebook is then updated using a caching technique to maximize the coding relevance of the information stored in the codebook.
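The flag-or-index decision and the codebook update described above can be illustrated with a minimal sketch. It rests on assumptions that are not in the paper: the class name `LHQCodebook` is hypothetical, the search uses exact matching of fixed-quantizer index tuples, and the unspecified "caching technique" is modelled as a move-to-front list in which the oldest entry is discarded when the codebook is full. Positions are 0-based here, whereas the text counts j from 1.

```python
from collections import deque


class LHQCodebook:
    """Adaptive history of fixed-quantizer index tuples (hypothetical helper)."""

    def __init__(self, size):
        # Position 0 holds the most recently used entry; when the deque is
        # full, appendleft() silently drops the entry at the far end.
        self.entries = deque(maxlen=size)

    def encode(self, fixed_indices):
        """Return (flag, payload): (1, position) on a history hit,
        (0, fixed_indices) when the fixed quantizer output must be sent."""
        key = tuple(fixed_indices)
        if key in self.entries:
            j = self.entries.index(key)
            del self.entries[j]            # move-to-front update (assumed policy)
            self.entries.appendleft(key)
            return 1, j
        self.entries.appendleft(key)
        return 0, key

    def decode(self, flag, payload):
        """Mirror of encode(), so both ends keep identical codebooks."""
        if flag == 1:
            key = self.entries[payload]
            del self.entries[payload]
        else:
            key = tuple(payload)
        self.entries.appendleft(key)
        return key
```

Because the decoder applies the same update rule to the same decoded data, no codebook information needs to be transmitted; only the flag and either the index j or the fixed quantizer indices travel over the channel.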

3. LHQ AS APPLIED TO LSP COEFFICIENTS

LHQ has been applied with fixed scalar (LHSQ) or vector (LHVQ) quantizers to code a set of 10 LSP coefficients. The basic input LPC analysis frame duration is 20 ms and the LSP coefficients are initially quantized using a total of 38 bits (scalar) or 30 bits (3-split vector) per coding frame. The LHSQ and LHVQ search algorithms compare and look for a match of the output index of the fixed scalar/vector quantizer to those stored in the LHQ adaptive codebook.
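The index-matching searches and the spectral distortion test that this section describes can be sketched as follows. Several names are assumptions made only for illustration: `max_index_diff` stands in for the bound used in condition (1), whose value is not given here; `lhvq_candidates` assumes an entry qualifies when every one of its sub-vector indices appears among the best candidates for that sub-vector; and the spectra passed to `spectral_distortion_db` are taken to be LPC power spectra already sampled between 0 and pi.

```python
import math


def spectral_distortion_db(s_orig, s_quant):
    """RMS difference, in dB, between two sampled LPC power spectra."""
    diffs = [10.0 * math.log10(a) - 10.0 * math.log10(b)
             for a, b in zip(s_orig, s_quant)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def lhsq_candidates(fixed_idx, codebook, max_index_diff):
    """Positions whose stored scalar indices all lie within
    max_index_diff (assumed bound) of the fixed quantizer indices."""
    return [j for j, entry in enumerate(codebook)
            if all(abs(a - b) <= max_index_diff
                   for a, b in zip(fixed_idx, entry))]


def lhvq_candidates(candidate_sets, codebook):
    """Positions whose stored sub-vector indices each appear in the set of
    best candidate indices for that sub-vector (assumed matching rule)."""
    return [j for j, entry in enumerate(codebook)
            if all(idx in cands for idx, cands in zip(entry, candidate_sets))]


def select_entry(candidates, sd_of, t_sd):
    """Return (flag, position): keep the minimum-SD candidate only if its
    distortion is below the threshold, otherwise fall back to flag 0."""
    if not candidates:
        return 0, None
    best = min(candidates, key=sd_of)
    return (1, best) if sd_of(best) < t_sd else (0, None)
```

Here `sd_of` is any callable that maps a codebook position to the SD, in dB, between the input LPC spectrum and the spectrum of that entry, so the search itself never touches unquantized parameter values.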

Consider that $SQ(C_p)_k$ represents the index of the pth scalar-quantized LSP coefficient, $\hat{C}_{k,p}$, in the kth LPC analysis frame. In the same way, $SQ(C_p)_{k-j}$ represents the index of the pth element of the jth vector, $C'_{k-j,p}$, stored in the LHQ codebook.

The LHSQ search algorithm compares the indices of $\{\hat{C}_{k,p}\}$ to those of the codebook vectors $\{C'_{k-j,p}\}$; j = 1, 2, ..., N, and a codebook entry is identified as a possible candidate to represent $\{C_{k,p}\}$ if condition (1), an inequality on the absolute difference between corresponding indices, is satisfied for all p.

When none of the LHQ codebook vectors is identified to be useful, $\{\hat{C}_{k,p}\}$ is assigned to represent $\{C_{k,p}\}$ and $F_k$ is set to 0. On the other hand, one or several codebook entries may satisfy (1), and the spectral distortion (SD) is formed for each candidate entry. In general,

$$SD = \sqrt{\frac{1}{N_\omega}\sum_{i=1}^{N_\omega}\left[10\log_{10} S_k(\omega_i) - 10\log_{10} S'_k(\omega_i)\right]^2}\ \text{dB} \qquad (2)$$

where $S_k(\omega_i)$ and $S'_k(\omega_i)$ are the spectral values at frequency $\omega_i$ of the original and "quantized" LPC spectra respectively. $N_\omega$ denotes the number of spectral values used from 0 to $\pi$. The candidate entry with minimum SD is selected and compared to a threshold value $T_{SD}$ which effectively controls the "transparent" quantization of the input LSP vectors. $F_k$ is set to 1 only when

$$SD < T_{SD} \qquad (3)$$

A similar procedure is performed in the LHVQ search process. Let $VQ(C_p)_k$ be the index of the pth vector-quantized LSP sub-vector, $\hat{C}_{k,p}$, in the kth LPC analysis frame. In the same way, $VQ(C_p)_{k-j}$ represents the index of the pth sub-vector of the jth vector, $C'_{k-j,p}$, stored in the LHQ codebook. However, condition (1) is not applicable as the search criterion in this case, because the absolute difference between the indices of two LSP sub-vectors does not usually provide any information on their spectral discrepancy. In order to utilize the $\{VQ(C_p)_{k-j}\}$ indices, an alternative search method is employed. During the fixed vector quantization of $\{C_{k,p}\}$; p = 1, 2, 3, the $N_s$ best candidate indices, $\{VQ(C_{p,q})_k\}$; q = 1, 2, ..., $N_s$, for each sub-vector are obtained. The LHVQ search commences with the possible matching of any of the $N_s$ candidate indices, $\{VQ(C_{1,q})_k\}$, to the $\{VQ(C_1)_{k-j}\}$ elements of the LHVQ codebook. This is then repeated for the second and third sets of $N_s$ candidate indices with the corresponding $\{VQ(C_2)_{k-j}\}$ and $\{VQ(C_3)_{k-j}\}$ subsets of the LHVQ codebook. The entries of the LHVQ codebook for which the matching condition is satisfied are selected and further examined with condition (3). Only then is $F_k$ set to 1, indicating the use of LHVQ in the quantization of the input frame.

4. MULTIPLE FRAME LPC ANALYSIS

A similarity detection process has been devised to exploit the interframe correlation of LSP parameters. As a result, LPC analysis can be performed over multiples of the basic input frame size if high correlation is found in adjacent frames. A quantitative correlation measure, $\Phi$, is defined for this purpose. In general, $\Phi(m)$ is defined for $L/2 \le m \le 3L/2$, where $x(n)$ is the nth sample of the current speech frame, $\{x(n)\}_c$, and L is the size of the basic input frame. The $\{x(n)\}_{c+1}$ speech frame is considered to be "similar" to the $\{x(n)\}_c$ frame when $\Phi(m)$ is above a certain threshold, $T_\Phi$, i.e.

$$\Phi(m) \ge T_\Phi \qquad (6)$$

When $\Phi(m) < T_\Phi$, the LPC analysis is confined to the single $\{x(n)\}_c$ basic input frame. Otherwise, the above correlation test is repeated using the $\{x(n)\}_{c+1}$ and $\{x(n)\}_{c+2}$ frames. In this way, the size of the LPC analysis frame can be extended to a multiple of the basic input frame size. In addition, each correlation test is followed by a comparison of the energy values of the adjacent basic input frames, and an LPC frame is extended only if the absolute value of the difference formed between adjacent frame energy values is less than or equal to a predetermined threshold, $T_E$. This last condition, (7), effectively ensures that the prediction gain of the extended frame compares well with that of the basic frame.
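The frame-extension logic of this section can be sketched as a loop over adjacent basic frames. The similarity function below is an assumption: it uses a plain zero-lag normalized cross-correlation as a stand-in for the paper's correlation measure, whose exact definition over the range L/2 to 3L/2 is not reproduced here. The function and parameter names (`analysis_frame_multiple`, `t_phi`, `t_e`, `max_multiple`) are likewise hypothetical.

```python
import math


def energy(frame):
    """Sum of squared samples of one basic input frame."""
    return sum(s * s for s in frame)


def similarity(a, b):
    """Zero-lag normalized cross-correlation of two equal-length frames
    (assumed stand-in for the paper's correlation measure)."""
    den = math.sqrt(energy(a) * energy(b))
    return sum(x * y for x, y in zip(a, b)) / den if den > 0 else 0.0


def analysis_frame_multiple(frames, start, t_phi, t_e, max_multiple):
    """Return M, the number of basic frames merged into one LPC analysis
    frame beginning at frames[start]."""
    m = 1
    while m < max_multiple and start + m < len(frames):
        prev, nxt = frames[start + m - 1], frames[start + m]
        if similarity(prev, nxt) < t_phi:
            break                      # correlation test failed
        if abs(energy(prev) - energy(nxt)) > t_e:
            break                      # energy difference exceeds threshold
        m += 1
    return m
```

The returned value plays the role of the side-information index M: a value of 1 means LPC analysis over the single basic frame, a value of 2 means the analysis frame spans two basic frames, and so on up to `max_multiple`.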

5. NOISELESS CODING OF LHQ INDICES

A simple noiseless coding method is applied to the indices produced by the LHQ schemes, in order to remove any redundancy present in the index j of the $\{C'_{k-j,p}\}$; j = 1, 2, ..., N information. Entropy coding of the index j provides the lower limit of representing this information with a minimum average number of binary digits, H, i.e.

$$H = \sum_{j=1}^{N} p(j)\,\log_2 \frac{1}{p(j)} \quad \text{(binary digits)}$$

where $p(j)$ is the probability associated with index j.

However, it is not very practical to employ entropy coding on index j. Instead, a much simpler noiseless coding strategy has been used to take advantage of the uneven probability distribution of the index j. When a set of LSP coefficients is LHQ-quantized (i.e. $F_k = 1$), the value of the index j is examined and if $j \le V$, an extra flag bit, $F_V = 1$, is used to indicate that the length of the index codeword is $\log_2 V$. Otherwise, $F_V = 0$ and the length of the index codeword is set to $\log_2 N$. The value of V is determined experimentally.
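The two-length index code can be compared against the entropy bound with a few lines of arithmetic. This is a sketch under stated assumptions: V and N are powers of two, the flag bit $F_V$ is counted for every LHQ-quantized frame, and the function names are hypothetical.

```python
import math


def index_entropy(probabilities):
    """Entropy H of the index distribution, in binary digits."""
    return sum(p * math.log2(1.0 / p) for p in probabilities if p > 0)


def index_codeword_bits(j, v, n):
    """Bits spent on index j (1-based): one flag bit F_V plus log2(V)
    bits when j <= V, or log2(N) bits otherwise."""
    short_len = (v - 1).bit_length()   # equals log2(V) for a power of two
    long_len = (n - 1).bit_length()    # equals log2(N) for a power of two
    return 1 + (short_len if j <= v else long_len)


def average_index_bits(probabilities, v):
    """Expected bits per index when probabilities[j-1] = p(j), N = len."""
    n = len(probabilities)
    return sum(p * index_codeword_bits(j, v, n)
               for j, p in enumerate(probabilities, start=1))
```

As an illustration with N = 1024 and V = 8, a short codeword costs 1 + 3 = 4 bits and a long one 1 + 10 = 11 bits. If, say, 30% of LHQ-quantized frames had j ≤ 8, the average would be 0.3 × 4 + 0.7 × 11 = 8.9 bits, against 10 bits for a fixed-length index.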

6. EVALUATION OF ENHANCED LHSQ AND LHVQ SCHEMES

The proposed variable bit rate LHSQ and LHVQ schemes have been applied to the coding of sets of 10 LSP coefficients. Performance is measured in terms of Average Spectral Distortion (ASD), the percentage of frames (%) quantized with $F_k = 1$ and the Average Bit Rate (AvBR) per 20 ms frame. LHQ codebook sizes were varied from N = 16 to N = 1024 and ASD is measured over 12000 frames. Transparent quantization is achieved when i) ASD ≤ 1 dB, ii) less than 2% of decoded frames satisfy 2 dB ≤ SD ≤ 4 dB and iii) none of the decoded frames exhibits SD > 4 dB.

The bit allocation of the fixed scalar and vector quantizers used in the LHSQ and LHVQ schemes is 38 and 30 bits per 20 ms frame respectively. Optimum $T_{SD}$ and $N_s$ values have been determined experimentally and set to 1.6 dB and 60. Quantization performance results for the above LHQ schemes [7] are shown in Table 1 and Table 2 respectively.

Table 1: Scalar Quantization of LSP Coefficients with LHQ

The performance of the corresponding improved variable bit rate LHQ schemes is shown in Tables 3 and 4. In these results, the $T_\Phi$ value has been determined experimentally and set to $T_\Phi$ = 0.85, and the maximum number of basic input speech frames included in the LPC analysis frame is set to 2. As a result, for about 20% of the time, the LPC analysis frame used is extended to 40 ms.

Using an LHQ codebook size of N = 1024, the cumulative distribution of the index j is shown in Figure 2. It can be observed that at least 30% of the total number of frames have an index value less than or equal to 8, which effectively defines V = 8.

Figure 2: Cumulative frequency curve of index j (horizontal axis: index, 1 to 1024 on a logarithmic scale; vertical axis: cumulative percentage of frames)

Tables 3 and 4 support our overall experience that transparent LPC quantization is achieved, using these new variable bit rate LHSQ and LHVQ methodologies, at about 20 and 17 bits per 20 ms frame. These variable bit rate LPC quantization results are very close to the "optimum" compression performance which can be delivered by Backward Adaptive Codebook Quantization.
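The three transparency criteria stated in this section translate directly into a check over per-frame SD values. The sketch below is illustrative only; the function name `is_transparent` and its input format (a list of SD values in dB, one per decoded frame) are assumptions.

```python
def is_transparent(sd_values):
    """Apply the three transparency criteria to per-frame SD values (dB)."""
    n = len(sd_values)
    asd = sum(sd_values) / n                                   # criterion i
    frac_2_to_4 = sum(1 for sd in sd_values if 2.0 <= sd <= 4.0) / n
    none_above_4 = all(sd <= 4.0 for sd in sd_values)          # criterion iii
    return asd <= 1.0 and frac_2_to_4 < 0.02 and none_above_4
```

A single frame above 4 dB is enough to fail the test, which is why the average distortion alone is not used as the acceptance measure.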


Table 2: Vector Quantization of LSP Coefficients with LHQ

Table 3: Enhanced Scalar Quantization of LSP Coefficients with LHQ

Table 4: Enhanced Vector Quantization of LSP Coefficients with LHQ

7. REFERENCES

[1] C. Kuo, F. Jean and H. Wang, “Low Bit Rate Quantization of LSP Parameters using Two-dimensional Differential Coding”, IEEE Proc. ICASSP, pp. 197-100, 1992.

[2] J. Grass and P. Kabal, “Methods of improving Vector-Scalar Quantization of LPC Coefficients”, IEEE Proc. ICASSP, pp. 657-660, 1991.

[3] R. Laroia, N. Phamdo and N. Farvardin, “Robust and Efficient Quantization of Speech LSP Parameters using Structured Vector Quantizers”, IEEE Proc. ICASSP, pp. 641-644, 1991.

[4] N. Phamdo and N. Farvardin, “Coding of Speech LSP Parameters using TSVQ with Interblock Noiseless Coding”, IEEE Proc. ICASSP, pp. 193-196, 1990.

[5] N. Sugamura and N. Farvardin, “Quantizer Design in LSP Speech Analysis and Synthesis”, IEEE Proc. ICASSP, pp. 398-401, 1988.

[6] C. Xydeas and K. So, “A Long History Quantization Approach to Low Bit Rate Speech Coding”, EURASIP Proc. EUSIPCO, Vol. I, pp. 479-482, 1992.

[7] C. Xydeas and K. So, “A Long History Quantization Approach to Scalar and Vector Quantization of LSP Coefficients”, IEEE Proc. ICASSP, Vol. II, pp. 1-4, 1993.

[8] K. Paliwal and B. Atal, “Efficient Vector Quantization of LPC Parameters at 24 bits/frame”, IEEE Proc. ICASSP, pp. 661-664, 1991.

[9] B. Lathi, “An Introduction to Random Signals & Communications Theory”, International Textbook Company, pp. 428-441, 1968.