06423756

8/14/2019 06423756


WEI et al. : SPEECH QUALITY IMPROVEMENT BASED ON LIST VITERBI AND JOINT SOURCE-CHANNEL DECODING IN UMTS 547

Fig. 3. Dual-CRC scheme by using the LVA decoder. (a) The existing scheme with Viterbi decoding for Class A; (b) The proposed dual-CRC scheme with LVA decoding for Class A.
[Block diagram. Labeled elements: Class A, Class B, Class C, CRC, Viterbi decoding, LVA decoding, VA CRC, LVA CRC, VA dec bits, LVA dec bits, BFI (bad frame indicator), NodeB, RNC, power control, AMR speech codec.]


Fig. 4. The scheme by jointly utilizing the LVA and the JSCD.
[Block diagram. Labeled elements: Class C, Viterbi decoding, LVA decoding, VA CRC, LVA CRC, VA dec bits, LVA dec bits, soft-value estimation, soft values (LLRs), JSCD, JSCD CRC, PEC, feedback values, decoded bits, BFI (bad frame indicator), NodeB, RNC, power control, AMR speech codec.]

The LVA decoding result of Class A and the Viterbi decoding results of Class B and C are sent to the AMR speech codec for source decoding. The dual-CRC scheme does not require any modification of the power control process or the target BLER, and it has no impact on other system parameters such as the RM factor in the AMR speech frame. Meanwhile, as Class A is the most important class in the AMR speech frame, the scheme can significantly improve the speech quality while minimizing the impact on the whole system.
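As an illustration only, the selection logic behind the dual-CRC idea can be sketched in Python. The sketch assumes the list decoder returns its candidates ordered by path metric, so that the first candidate is the ordinary Viterbi path. It also assumes, from our reading of Fig. 3(b), that the Viterbi CRC result continues to feed power control while the LVA CRC result drives the BFI. The names `dual_crc_select`, `DualCrcResult` and `crc_ok` are hypothetical and do not come from the paper.

```python
from typing import Callable, List, NamedTuple, Sequence


class DualCrcResult(NamedTuple):
    bits: List[int]    # Class A bits passed on to the speech codec
    va_crc_ok: bool    # CRC of the best (Viterbi) path; assumed to feed power control
    lva_crc_ok: bool   # CRC after list decoding; assumed to drive the BFI


def dual_crc_select(candidates: Sequence[List[int]],
                    crc_ok: Callable[[List[int]], bool]) -> DualCrcResult:
    """Pick the first list candidate that passes the CRC and report both CRC results.

    candidates[0] is assumed to be the Viterbi path, so its CRC result is what
    a receiver without list decoding would have reported.
    """
    va_ok = crc_ok(candidates[0])
    for cand in candidates:
        if crc_ok(cand):
            return DualCrcResult(list(cand), va_ok, True)
    # No candidate passes: fall back to the Viterbi path and flag the frame.
    return DualCrcResult(list(candidates[0]), va_ok, False)
```

Because `va_crc_ok` is computed exactly as it would be without list decoding, any block that consumes it sees unchanged statistics, which is the property the paragraph above relies on.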

B. Jointly Utilizing the LVA and the JSCD Scheme

The JSCD utilizes the SSI to recover the correlated parameters in the AMR speech frames, and the LVA enhances the channel coding performance. Considering the different principles of operation governing the JSCD and the LVA, we propose a scheme to further enhance the speech quality by joint utilization of the LVA and the JSCD.

This scheme is shown in Fig. 4. After LVA decoding, we utilize the JSCD to recover the parameters of the AMR speech frame. Since the JSCD needs the soft value of each of the AMR speech bits, the LVA decoder is required to generate the log-likelihood ratio (LLR) for each decoded bit. However, it is usually not practical to generate all LLRs of every candidate in the list decoding due to hardware limitations. In our scheme, we only generate the LLRs when the LVA decoding is incorrect. If the LVA decoding result is incorrect, it is identical to the decoding result of Viterbi decoding, and the LLRs can be easily generated by a soft-in-soft-out decoder, such as the BCJR or the max-log-MAP decoder. If the LVA decoding result is correct, the LLR for each decoded bit can be set to $\pm\infty$ according to the decoded bit, e.g., bit 0 is mapped to $+\infty$ and bit 1 is mapped to $-\infty$.
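The LLR assignment described in this paragraph can be illustrated with a short Python sketch. It assumes the sign convention implied by the text (bit 0 has a positive LLR) and treats the soft values from the soft-in-soft-out decoder as an already computed input. The function names `saturated_llrs` and `llrs_for_jscd` are hypothetical.

```python
import math
from typing import List


def saturated_llrs(bits: List[int]) -> List[float]:
    """Map trusted hard decisions to infinite-magnitude LLRs (0 -> +inf, 1 -> -inf)."""
    return [math.inf if b == 0 else -math.inf for b in bits]


def llrs_for_jscd(lva_crc_ok: bool, lva_bits: List[int],
                  siso_llrs: List[float]) -> List[float]:
    """Choose the soft values handed to the JSCD.

    siso_llrs stands for the output of a soft-in-soft-out decoder (for example
    BCJR or max-log-MAP); computing it is outside the scope of this sketch.
    """
    if lva_crc_ok:
        # CRC passed: the decoded bits are treated as certain.
        return saturated_llrs(lva_bits)
    # CRC failed: fall back to the soft values of the Viterbi path.
    return list(siso_llrs)
```

In a real receiver the soft-output pass would only be run for frames whose CRC fails, which is the saving the paragraph points to.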

The detailed joint LVA and JSCD algorithm is as follows:

Algorithm 1 The joint LVA and JSCD algorithm
1: Perform LVA decoding for Class A and Viterbi decoding for Class B and C. Send out the CRC results of both Viterbi and LVA decoding for Class A bits.
2: Generate LLRs of the LVA decoded bits. If the LVA CRC is correct, set the LLRs to $\pm\infty$ in the manner described above, then go to Step 5. Otherwise, go to Step 3.
3: Perform JSCD decoding once, and output the modified decoded bits, the LLRs and the extrinsic information to be fed back to the channel decoder. The iteration number is increased by one.
4: If the iteration number is equal to the maximum allowed, perform parameter-level error concealment (PEC) as proposed in [14], and send the BFI to indicate a frame which cannot be recovered by PEC. The AMR speech codec does not need to perform frame-level error concealment for the modified frames. If the iteration number is smaller than the maximum allowed, go to Step 1.
5: Set the iteration number to zero, and send the decoding results to the AMR speech codec.
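The control flow of Algorithm 1 can be sketched in Python as follows. This is a structural sketch only, under several assumptions: it covers Class A alone, the channel decoder, the JSCD and the PEC are passed in as hypothetical callables (`channel_decode`, `jscd_step`, `conceal`), and the termination test uses `>=` so that a maximum of zero still ends after one JSCD pass. None of these names or signatures come from the paper.

```python
import math
from typing import Any, Callable, List, NamedTuple, Tuple


class ChannelResult(NamedTuple):
    bits: List[int]      # Class A hard decisions after LVA decoding
    llrs: List[float]    # soft values, used only when the CRC fails
    lva_crc_ok: bool     # CRC result after list Viterbi decoding


class FrameOutput(NamedTuple):
    bits: List[int]
    llrs: List[float]
    bfi: bool            # bad frame indicator for the speech codec
    iterations: int      # JSCD passes spent on this frame


def joint_lva_jscd(received: Any, max_iterations: int,
                   channel_decode: Callable[[Any, Any], ChannelResult],
                   jscd_step: Callable[[List[int], List[float]],
                                       Tuple[List[int], List[float], Any]],
                   conceal: Callable[[List[int], List[float]],
                                     Tuple[List[int], bool]]) -> FrameOutput:
    extrinsic = None
    iteration = 0  # local per frame, which plays the role of the reset in Step 5
    while True:
        # Step 1: channel decoding, using fed-back extrinsic information if any.
        res = channel_decode(received, extrinsic)
        # Step 2: a passing CRC means the bits are trusted; go straight to Step 5.
        if res.lva_crc_ok:
            llrs = [math.inf if b == 0 else -math.inf for b in res.bits]
            return FrameOutput(res.bits, llrs, False, iteration)
        # Step 3: one JSCD pass, producing extrinsic information for feedback.
        bits, llrs, extrinsic = jscd_step(res.bits, res.llrs)
        iteration += 1
        # Step 4: out of iterations, so apply PEC and raise the BFI if it fails.
        if iteration >= max_iterations:
            bits, bfi = conceal(bits, llrs)
            return FrameOutput(bits, llrs, bfi, iteration)
```

Passing the processing stages in as callables keeps the sketch independent of any particular decoder implementation.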

The issue of misdetection of the CRC should be taken into consideration, since an undetected bad frame of Class A may result in noise. With a 12-bit CRC, the undetected frame error rate (UER) of Viterbi decoding with a block error rate of $P_{\mathrm{bler}}$ is approximately $P_{\mathrm{bler}}/2^{12}$, and the UER of LVA decoding is less than $N$ times that of Viterbi decoding, where $N$ is the number of list candidates of the LVA decoder. When performing the ISCD with one iteration based on the LVA, the UER is approximately $2N P_{\mathrm{bler}}/2^{12}$. Therefore, with the target BLER of 1% for AMR speech, the UER is approximately 1/51200 when jointly performing the PLVA4 and the ISCD with one iteration. This means that with a 20 ms speech frame, a noisy frame would occur approximately once in every 17 minutes, which is acceptable in practice.
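The figures in this paragraph can be checked with a few lines of Python. The sketch below simply evaluates the approximations quoted above with $N = 4$ (taking the 4 in PLVA4 as the list size, which is an assumption) and a 1% target BLER; the function names are hypothetical.

```python
from fractions import Fraction

CRC_BITS = 12
FRAME_SECONDS = Fraction(20, 1000)  # 20 ms speech frame


def uer_viterbi(p_bler: Fraction) -> Fraction:
    """Approximate undetected frame error rate of Viterbi decoding: P_bler / 2^12."""
    return p_bler / 2**CRC_BITS


def uer_lva_iscd(p_bler: Fraction, n_candidates: int) -> Fraction:
    """Approximate UER of the LVA plus one ISCD iteration: 2 N P_bler / 2^12."""
    return 2 * n_candidates * p_bler / 2**CRC_BITS


def minutes_between_noisy_frames(uer: Fraction) -> float:
    """Mean time between undetected bad frames, in minutes."""
    return float(FRAME_SECONDS / uer / 60)


uer = uer_lva_iscd(Fraction(1, 100), 4)
print(uer)                                # 1/51200
print(minutes_between_noisy_frames(uer))  # about 17.07
```

One undetected bad frame per 51200 frames at 20 ms per frame corresponds to 1024 s, which matches the quoted figure of roughly 17 minutes.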

IV. SIMULATION RESULTS

The simulation is performed based on a full UMTS uplink chain, with source coding performed by the AMR 12.2k speech coder. The Viterbi algorithm and the PLVA4 are utilized for convolutional decoding, and the ISCD, operating with either 0 or 1 iterations, is applied as the JSCD. If a speech frame fails the CRC, the AMR speech codec will discard this frame in the decoding process. A MOS estimator [15] is applied to evaluate the speech quality.

The simulation results are shown in Figs. 5 to 8 for both additive white Gaussian noise (AWGN) and typical urban (TU) fading channels. The PLVA4 has about 0.3 to 0.5 dB performance gain in terms of BLER over Viterbi decoding. With the proposed dual-CRC scheme, more than 0.3 MOS gain can be obtained with the PLVA4 over Viterbi decoding in most signal-to-noise ratio (SNR) regions.

From the simulation results, we can see that the ISCD has little impact on the BLER, while it improves the


548 IEEE COMMUNICATIONS LETTERS, VOL. 17, NO. 3, MARCH 2013

Fig. 5. BLER of VA, VA+JSCD, PLVA and PLVA+JSCD in AWGN channel.
[Plot. Horizontal axis: Eb/N0 (dB), 0.5 to 3. Vertical axis: BLER of Class A bits, logarithmic scale. Curves: VA; VA+ISCD, NumIteration = 1; PLVA4; PLVA4+ISCD, NumIteration = 0; PLVA4+ISCD, NumIteration = 1.]

Fig. 6. MOS of VA, VA+JSCD, PLVA and PLVA+JSCD in AWGN channel.
[Plot. Horizontal axis: Eb/N0 (dB), 0.5 to 3. Vertical axis: Mean Opinion Score (MOS), 1 to 4.]

MOS significantly. VA+ISCD (NumIteration=1) has 0.2 to 0.3 MOS gain over Viterbi decoding in most SNR regions, but the gain is a little smaller than that of the PLVA4. The PLVA4+ISCD (NumIteration=0) and the PLVA4+ISCD (NumIteration=1) have about 0.05 to 0.1 and 0.1 to 0.2 MOS gain over the PLVA4, respectively.

V. CONCLUSION

Based on the characteristics of source and channel coding of AMR speech in UMTS, we propose a novel and practical decoding scheme to improve the speech quality, with little modification of the current system architecture. The speech quality can be significantly improved by using the LVA decoder only, or through joint use of the JSCD and LVA decoders.

REFERENCES

[1] 3GPP TS26.090v10.0.0, "Adaptive multi-rate (AMR) speech codec, transcoding functions," Apr. 2011.

[2] H. Holma, J. Melero, J. Vainio, T. Halonen, and J. Makine, "Performance of adaptive multirate (AMR) voice in GSM and WCDMA," in Proc. 2003 VTC Spring, pp. 2177-2181.

[3] A. D. Subramaniam, W. R. Gardner, and B. D. Rao, "Iterative joint source-channel decoding of speech spectrum parameters over an additive white Gaussian noise channel," IEEE Trans. Audio, Speech, and Language Processing, vol. 14, no. 1, pp. 152-162, Jan. 2006.

Fig. 7. BLER of VA, VA+JSCD, PLVA and PLVA+JSCD in TU channel.
[Plot. Horizontal axis: Eb/N0 (dB), 3.4 to 5.2. Vertical axis: BLER of Class A bits, logarithmic scale. Curves: VA; VA+ISCD; PLVA4; PLVA4+SBSD; PLVA4+ISCD.]

Fig. 8. MOS of VA, VA+JSCD, PLVA and PLVA+JSCD in TU channel.
[Plot. Horizontal axis: Eb/N0 (dB), 3.4 to 5.2. Vertical axis: Mean Opinion Score (MOS), 1 to 4.]

[4] T. Fingscheidt, T. Hindelang, R. V. Cox, and N. Seshadri, "Joint source-channel (de-)coding for mobile communications," IEEE Trans. Commun., vol. 50, no. 2, pp. 200-212, Feb. 2002.

[5] T. Lundberg, P. de Bruin, S. Bruhn, S. Hakansson, and S. Craig, "Adaptive thresholds for AMR codec mode selection," in Proc. 2005 VTC Spring, pp. 2177-2181.

[6] N. Seshadri and C. E. Sundberg, "List Viterbi decoding algorithms with applications," IEEE Trans. Commun., vol. 42, no. 2/3/4, pp. 104-120, Feb./Mar./Apr. 1994.

[7] 3GPP TS34.108v11.1.0, "Reference radio bearer configurations used in radio bearer interoperability testing," Mar. 2012.

[8] J. Hagenauer, "Source-controlled channel decoding," IEEE Trans. Commun., vol. 43, no. 9, pp. 2449-2457, Sept. 1995.

[9] T. Fingscheidt and P. Vary, "Softbit speech decoding: a new approach to error concealment," IEEE Trans. Speech Audio Process., vol. 9, no. 3, pp. 240-251, Mar. 2001.

[10] N. Görtz, "On the iterative approximation of optimal joint source-channel decoding," IEEE J. Sel. Areas Commun., vol. 19, pp. 1662-1670, Sept. 2001.

[11] R. Perkert, M. Kaindl, and T. Hindelang, "Iterative source and channel decoding for GSM," in Proc. 2001 IEEE Int. Conf. Acoustics, Speech, and Signal Processing, pp. 2649-2652.

[12] M. Adrat, U. V. Agris, and P. Vary, "Convergence behavior of iterative source-channel decoding," in Proc. 2003 IEEE Int. Conf. Acoustics, Speech, and Signal Processing, pp. 269-272.

[13] 3GPP TS25.212v10.0.0, "Channel coding and multiplexing," Oct. 2010.

[14] T. Breddermann, S. Iwelski, and P. Vary, "Bad parameter indication for error concealment in wireless multimedia communication," in Proc. 2010 VTC Fall, pp. 1-5.

[15] ITU-T, P.862.1, "Mapping function for transforming P.862 raw result scores to MOS-LQO," Nov. 2003.
