
Int J Speech Technol (2011) 14:157–165
DOI 10.1007/s10772-011-9093-5

Proposed modifications in ETSI GSM 06.10 full rate speech codec and its overall evaluation of performance using MATLAB

Ninad Bhatt · Yogeshwar Kosta

Received: 16 March 2011 / Accepted: 25 May 2011 / Published online: 29 June 2011
© Springer Science+Business Media, LLC 2011

Abstract Today, the primary constraints in wireless communication systems are limited bandwidth and power. Wireless systems that transmit speech therefore require efficient and effective methods (in terms of bandwidth usage and power) to transmit and receive it while maintaining quality of speech, especially at the receiving end. Speech coding is a technique that, since the era of digitization and computerization (computational and processing horsepower, DSP), has been a subject of research for quite some time in the scientific and academic community. Amongst all elements of the communication system (transmitter, channel and receiver), the transmission channel (the carrier of information/data, also called the medium) is the most critical and plays a key role in the transmission and reception of information/data.

This paper proposes modifications to the selection process of grid positions in the Regular Pulse Excitation section of the 13 kbps ETSI GSM 06.10 Full Rate speech coder, so that there is an overall 1.8 kbps (36 bits per 20 ms frame) reduction in bit rate, which can be utilized either for improving error detection and correction at channel coding or for hidden data embedding and transmission over the wireless link. Both the standard GSM FR and the proposed GSM FR coders are implemented in MATLAB. Subjective and objective analyses are carried out on the proposed system to evaluate its performance, and the results obtained are compared with those of the GSM 06.10 Full Rate coder using a set of tables and graphs. The results show that the PESQ and MOS scores are quite comparable for each wave file, and that a marginal degradation of both can be observed with the decrease in codec bit rate.

N. Bhatt
Veer Narmad South Gujarat University, Surat, Gujarat, India
e-mail: [email protected]

Y. Kosta
Marwadi Education Foundation, Rajkot, Gujarat, India
e-mail: [email protected]

Keywords Speech coding · GSM · ETSI · RPE-LTP coder · Subjective analysis · Objective analysis · MATLAB

1 Introduction

The Full Rate GSM 06.10 speech coder belongs to the class of hybrid coders (Analysis-by-Synthesis coders), which provide an attractive trade-off between waveform coders and vocoders, both in terms of speech quality and transmission bit rate, although generally at the price of higher complexity (Malkovic 2003). The speech encoder takes its input as a 13 bit uniform PCM signal (13 bit × 8 kHz = 104 kbps), either from the audio part of the mobile station or, on the network side, from the PSTN via an 8 bit A-law to 13 bit uniform PCM conversion as specified in GSM 06.01 (ETSI 2005–2006). The encoded speech at the output of the speech encoder is delivered to a channel encoder unit, which is specified in GSM 05.03 (ETSI 2005–2006). In the receive direction, the inverse operations take place. GSM 06.10 describes the detailed mapping between input blocks of 160 speech samples in 13 bit uniform PCM format and encoded blocks of 260 bits, and from encoded blocks of 260 bits to output blocks of 160 reconstructed speech samples. The sampling rate is 8000 samples/s, leading to an average bit rate for the encoded bit stream of 13 kbps. The coding scheme is the so-called Regular Pulse Excitation-Long Term Prediction-Linear Predictive Coder (RPE-LTP).
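As a quick check of the figures quoted above, the frame duration and the two bit rates follow directly from the numbers stated in this section (a worked calculation only, nothing here is taken from outside the text):

$$\frac{160\ \text{samples}}{8000\ \text{samples/s}} = 20\ \text{ms}, \qquad \frac{260\ \text{bits}}{20\ \text{ms}} = 13\ \text{kbps}, \qquad 13\ \text{bit} \times 8\ \text{kHz} = 104\ \text{kbps}$$

The coder therefore reduces the input bit rate by a factor of 104 / 13 = 8.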


2 GSM full rate encoder

The detailed block diagram of the GSM 06.10 speech encoder is shown in Fig. 1. The input speech frame, consisting of 160 signal samples, is first pre-processed to produce an offset-free signal, which is then subjected to a first order pre-emphasis filter. The 160 samples obtained are then analyzed to determine the coefficients for the short term analysis filter (LPC analysis). These parameters are then used for filtering the same 160 samples. The result is 160 samples of the short term residual signal. The filter parameters, termed reflection coefficients, are transformed to log area ratios (LARs) before transmission.

The speech frame is divided into 4 sub-frames with 40 samples of the short term residual signal in each. Each sub-frame is processed block-wise by the subsequent functional elements. Before the processing of each sub-block of 40 short term residual samples, the parameters of the long term analysis filter, the LTP lag and the LTP gain, are estimated and updated in the LTP analysis block on the basis of the current sub-block and a stored sequence of the 120 previous reconstructed short term residual samples. A block of 40 long term residual signal samples is obtained by subtracting 40 estimates of the short term residual signal from the short term residual signal itself. The resulting block of 40 long term residual samples is fed to the Regular Pulse Excitation analysis, which performs the basic compression function of the algorithm. As a result of the RPE analysis, the block of 40 input long term residual samples is represented by one of 4 candidate sub-sequences of 13 pulses each. The sub-sequence selected is identified by the RPE grid position (M). The 13 RPE pulses are encoded using Adaptive Pulse Code Modulation (APCM) with estimation of the sub-block amplitude, which is transmitted to the decoder as side information.

The RPE parameters are also fed to a local RPE decoding and reconstruction module, which produces a block of 40 samples of the quantized version of the long term residual signal. By adding these 40 quantized samples of the long term residual to the previous block of short term residual signal estimates, a reconstructed version of the current short term residual signal is obtained. The block of reconstructed short term residual signal samples is then fed to the long term analysis filter, which produces the new block of 40 short term residual signal estimates to be used for the next sub-block, thereby completing the feedback loop.
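The grid selection step described above can be sketched as follows. This is a minimal illustration, not the authors' MATLAB implementation: the function name `rpe_grid_select` is hypothetical, the weighting filter that precedes decimation is omitted, and the maximum-energy selection criterion is an assumption taken from the usual description of RPE coding rather than from this text.

```python
def rpe_grid_select(block, step=3, pulses=13, n_grids=4):
    """Pick one of n_grids decimated sub-sequences of a 40-sample block.

    Candidate m holds block[m + step*k] for k = 0..pulses-1.
    Assumption: the candidate with the highest energy is selected.
    Returns (grid position M, selected sub-sequence).
    """
    needed = (n_grids - 1) + step * (pulses - 1) + 1
    if len(block) < needed:
        raise ValueError("block too short for the requested grids")
    candidates = [[block[m + step * k] for k in range(pulses)]
                  for m in range(n_grids)]
    energies = [sum(v * v for v in c) for c in candidates]
    best = max(range(n_grids), key=lambda m: energies[m])
    return best, candidates[best]


# Toy long term residual: energy concentrated on sample 4 (= 1 + 3*1).
residual = [0.0] * 40
residual[4] = 2.0
residual[2] = 1.0
M, pulses_selected = rpe_grid_select(residual)
print(M, len(pulses_selected))
```

With the defaults (`step=3`, `pulses=13`) the highest index used is 3 + 3 × 12 = 39, so the four candidates together span the whole 40-sample sub-block.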

3 GSM full rate decoder

The detailed block diagram of the GSM 06.10 speech decoder is shown in Fig. 2. The decoder includes the same structure as the feedback loop of the encoder. In error-free transmission, the output of this stage will be the reconstructed short term residual samples. These samples are then applied to the short term synthesis filter followed by the de-emphasis filter, resulting in the reconstructed speech signal samples. GSM 06.10 describes the detailed mapping between input blocks of 160 speech samples in 13 bit uniform PCM format and encoded blocks of 260 bits, and from encoded blocks of 260 bits to output blocks of 160 reconstructed speech samples. The sampling rate is 8000 samples/s, leading to an average bit rate for the encoded bit stream of 13 kbps. The bit allocation for the ETSI GSM 06.10 Full Rate speech coder is shown in Table 1.

Table 1 Bit allocation for GSM Full Rate Speech Coder (ETSI 2005–2006)

| Parameter | No. per frame | Resolution (bits) | Total bits / frame |
|---|---|---|---|
| LPC | 8 | 6, 6, 5, 5, 4, 4, 3, 3 | 36 |
| Pitch Period | 4 | 7 | 28 |
| Long Term Gain | 4 | 2 | 8 |
| Grid Position | 4 | 2 | 8 |
| Peak Magnitude | 4 | 6 | 24 |
| Sample Amplitude | 4 × 13 | 3 | 156 |
| Total | | | 260 |

4 Proposed modification in GSM full rate speech coder

The GSM Full Rate coder consists of three major blocks: the Linear Predictive Coding section, the Long Term Prediction section and the Regular Pulse Excitation section. The proposed modifications are made in the RPE section, in the selection of grid positions. The selection of grid positions and samples is modified in such a way that no sample is repeated in multiple grids. This is not the case in the standard GSM Full Rate coder, where the first and fourth grids share all of their samples except sample number 0 (first grid only) and sample number 39 (fourth grid only). The proposed grid selection strategy is shown in Fig. 3. As can be seen in Fig. 3, if the weighting-filtered prediction-error sequence is down-sampled by a ratio of 4 instead of 3, it results in four interleaved sequences with regularly spaced pulses. These are defined as

$$X_m[k] = X[m + 4k]; \quad m = 0, 1, 2, 3; \quad k = 0, 1, 2, \ldots, 9 \tag{1}$$

where m is the grid index (four grids per sub-segment) and k is the sample index within a grid (ten samples per grid).
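The difference between the two decimation schemes can be checked by listing the sample indices that each grid covers. The sketch below is illustrative only; the helper names `grid_indices` and `shared` are hypothetical, and the standard grid is taken as m + 3k with 13 pulses, as described in Sect. 2.

```python
def grid_indices(step, pulses, n_grids=4):
    """Sample indices m + step*k covered by each grid m."""
    return [[m + step * k for k in range(pulses)] for m in range(n_grids)]


def shared(a, b):
    """Indices that appear in both grids."""
    return sorted(set(a) & set(b))


standard = grid_indices(step=3, pulses=13)   # standard GSM FR: 4 grids of 13
proposed = grid_indices(step=4, pulses=10)   # Eq. (1): 4 grids of 10

# Standard coder: first and fourth grids overlap on 3, 6, ..., 36.
print(shared(standard[0], standard[3]))

# Proposed coder: the four grids are disjoint and cover samples 0..39 once.
all_proposed = sorted(i for grid in proposed for i in grid)
print(all_proposed == list(range(40)))
```

Only samples 0 and 39 are unique to the first and fourth standard grids respectively, which matches the description above.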

The benefit of this sampling grid position selection is that no sample is repeated in multiple grids, while the total number of samples per grid reduces from 13 to 10. Ultimately there is a reduction in overall bit rate of 1.8 kbps (3 samples per grid × 3 bits per sample × 4 subframes = 36 bits per 20 ms frame) compared to the actual bit rate of 13 kbps for the GSM 06.10 FR coder, which can be useful for error detection and correction in each frame transmission. The proposed modification in GSM FR offers a new bit allocation, shown in Table 2, and Table 3 shows the modified encoder parameters in order of occurrence, with their bit allocations in the speech frame, with reference to standard GSM FR (ETSI 2005–2006).

Fig. 1 Detailed block diagram of Full Rate GSM 06.10 Speech Encoder (ETSI 2005–2006)

Fig. 2 Detailed block diagram of Full Rate GSM 06.10 Speech Decoder (ETSI 2005–2006)

Fig. 3 Sampling grids used in position selection for proposed GSM FR 11.2 kbps coder

Table 2 Bit allocation for proposed GSM full rate speech coder

| Parameter | No. per frame | Resolution (bits) | Total bits / frame |
|---|---|---|---|
| LPC | 8 | 6, 6, 5, 5, 4, 4, 3, 3 | 36 |
| Pitch Period | 4 | 7 | 28 |
| Long Term Gain | 4 | 2 | 8 |
| Grid Position | 4 | 2 | 8 |
| Peak Magnitude | 4 | 6 | 24 |
| Sample Amplitude | 4 × 10 | 3 | 120 |
| Total | | | 224 |
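The bit budgets of the standard and proposed coders can be reproduced from the per-parameter allocations. The following sketch uses only the figures given in Tables 1 and 2; the names `COMMON_BITS`, `frame_bits` and `kbps` are illustrative.

```python
FRAME_MS = 20  # one speech frame = 160 samples at 8000 samples/s

# Parameters whose allocation is identical in Tables 1 and 2 (bits per frame).
COMMON_BITS = {
    "LPC": 6 + 6 + 5 + 5 + 4 + 4 + 3 + 3,
    "Pitch period": 4 * 7,
    "Long term gain": 4 * 2,
    "Grid position": 4 * 2,
    "Peak magnitude": 4 * 6,
}


def frame_bits(pulses_per_subframe, bits_per_pulse=3, subframes=4):
    """Total bits per 20 ms frame for a given number of RPE pulses."""
    rpe_bits = subframes * pulses_per_subframe * bits_per_pulse
    return sum(COMMON_BITS.values()) + rpe_bits


def kbps(bits):
    """Bits per 20 ms frame converted to kbit/s (bits per millisecond)."""
    return bits / FRAME_MS


standard_bits = frame_bits(13)   # 260
proposed_bits = frame_bits(10)   # 224
saved_bits = standard_bits - proposed_bits
print(standard_bits, kbps(standard_bits))
print(proposed_bits, kbps(proposed_bits))
print(saved_bits, kbps(saved_bits))
```

The saving comes entirely from the sample amplitude row: 4 × 3 fewer pulses at 3 bits each gives 36 bits per frame, or 1.8 kbps.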

The parameters produced by the GSM FR encoder, such as the short term filter parameters, long term prediction parameters and RPE parameters, are of unequal importance with respect to the recovered speech quality. The 260 GSM FR encoded bits are rearranged according to their subjective importance as specified in GSM 05.03 (ETSI 1999). The rearranged bits are classified into classes Ia, Ib and II, which contain 50, 132 and 78 bits respectively.

The different classes are protected against errors using different methods, such as CRC and convolutional coding. Table 4 shows the modifications to Table 2 of GSM 05.03 for the proposed GSM FR 11.2 kbps coder (ETSI 1999). The saved 36 bits per frame can then be used either for better error protection at the channel coding stage or for steganographic data transmission.
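As a rough cross-check of the class sizes, the label ranges printed in Table 4 can be compared with the standard split quoted above. This sketch assumes that the labels d0 to d49, d50 to d145 and d146 to d223 mark the class boundaries of the proposed coder, which is a reading of Table 4 rather than something stated explicitly in the text.

```python
# Inclusive (first, last) label index per class, as read from Table 4.
# Assumption: these label ranges are the class boundaries.
proposed_ranges = {"Ia": (0, 49), "Ib": (50, 145), "II": (146, 223)}

# Standard GSM FR class sizes as stated in the text (50 + 132 + 78 = 260).
standard_sizes = {"Ia": 50, "Ib": 132, "II": 78}

proposed_sizes = {name: last - first + 1
                  for name, (first, last) in proposed_ranges.items()}
removed = {name: standard_sizes[name] - proposed_sizes[name]
           for name in standard_sizes}

print(proposed_sizes, sum(proposed_sizes.values()))
print(removed)
```

Under that reading the class Ia and class II sizes are unchanged, and the 36-bit reduction appears in class Ib (132 bits down to 96).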

5 Subjective and objective analysis

5.1 Subjective analysis

(1) Mean Opinion Score (MOS)

One of the important subjective analyses is the Mean Opinion Score (MOS), a statistical method of judging the quality of compressed speech. Untrained listeners are chosen at random and asked to judge the overall quality of the recovered speech signal produced after decoding. The ratings of all listeners are recorded after playing the decoded speech in a quiet environment with high quality headphones, and are then averaged to get the final MOS scores. The rating scale is given in Table 5.
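The averaging step can be sketched as below. The rating labels follow the five-point scale of Table 5; the function name `mean_opinion_score` and the listener ratings in the usage lines are hypothetical and are not data from this study.

```python
# Five-point scale from Table 5.
MOS_SCALE = {"Excellent": 5, "Good": 4, "Fair": 3, "Poor": 2, "Unacceptable": 1}


def mean_opinion_score(ratings):
    """Average the listeners' category ratings for one decoded file."""
    if not ratings:
        raise ValueError("at least one rating is required")
    scores = [MOS_SCALE[r] for r in ratings]   # KeyError on an unknown label
    return sum(scores) / len(scores)


# Hypothetical ratings from four listeners for a single decoded wave file.
example = ["Good", "Good", "Fair", "Excellent"]
print(mean_opinion_score(example))
```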

5.2 Objective analysis

To evaluate the performance of the proposed GSM FR coder, different types of objective analysis have been carried out in this paper. Objective analysis has been categorized into waveform, spectral, perceptual and composite measures (Hu and Loizou 2008).

5.2.1 Waveform based objective analysis

The following parameters are evaluated in this category of objective analysis.

(1) Signal to Noise Ratio is mathematically defined as

$$\mathrm{SNR} = 10 \log_{10} \frac{\sum |S_i|^2}{\sum |S_i - S_o|^2} \tag{2}$$

where $S_i$ = input signal, $S_o$ = decoded signal and N = total no. of frames.
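A direct implementation of this ratio is short. The sketch below is written in Python for illustration (the paper's own implementation is in MATLAB); the function name `snr_db` is hypothetical, and the sums are taken over all samples of the two signals.

```python
import math


def snr_db(s_in, s_out):
    """Overall SNR in dB between an input signal and its decoded version."""
    if len(s_in) != len(s_out):
        raise ValueError("signals must have the same length")
    signal = sum(x * x for x in s_in)
    noise = sum((x - y) ** 2 for x, y in zip(s_in, s_out))
    if noise == 0:
        return math.inf            # identical signals
    if signal == 0:
        return -math.inf           # all-zero input with a non-zero error
    return 10.0 * math.log10(signal / noise)


# A decoded signal that is 10 % low everywhere gives an error energy
# of 1 % of the signal energy.
print(snr_db([1.0, 1.0, 1.0, 1.0], [0.9, 0.9, 0.9, 0.9]))
```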

(2) Segmental SNR is mathematically given as

$$\mathrm{SNR}_{\mathrm{SEG}} = \frac{1}{M} \sum_{j=0}^{M-1} 10 \log_{10} \left[ \frac{\sum_{n=m_j-N+1}^{m_j} s^2(n)}{\sum_{n=m_j-N+1}^{m_j} [s(n) - \hat{s}(n)]^2} \right] \tag{3}$$

where $s(n)$ = input signal, $\hat{s}(n)$ = decoded signal, N = segment length, M = no. of segments and $m_j$ = end of the current segment (Hu and Loizou 2008).
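The segmental measure averages per-segment SNR values instead of taking one ratio over the whole file. The sketch below assumes non-overlapping segments of length N and skips segments whose signal or error energy is zero, since the logarithm is undefined there; that skipping rule and the name `snr_seg_db` are assumptions made for this example, not part of Eq. (3).

```python
import math


def snr_seg_db(s, s_hat, seg_len):
    """Mean of per-segment SNRs (dB) over non-overlapping segments."""
    if len(s) != len(s_hat):
        raise ValueError("signals must have the same length")
    values = []
    for start in range(0, len(s) - seg_len + 1, seg_len):
        idx = range(start, start + seg_len)
        signal = sum(s[n] ** 2 for n in idx)
        noise = sum((s[n] - s_hat[n]) ** 2 for n in idx)
        # Assumption: degenerate segments are left out of the average,
        # so M becomes the number of usable segments.
        if signal > 0 and noise > 0:
            values.append(10.0 * math.log10(signal / noise))
    if not values:
        raise ValueError("no usable segments")
    return sum(values) / len(values)


# Two segments of four samples: one at 20 dB, one at 40 dB.
clean = [1.0] * 8
decoded = [0.9] * 4 + [0.99] * 4
print(snr_seg_db(clean, decoded, seg_len=4))
```

Because each segment contributes equally, quiet segments with poor SNR pull the average down more than they would in the overall SNR of Eq. (2).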

Table 3 Proposed modifications in GSM FR encoder output parameters in order of occurrence and bit allocation within the speech frame

| Parameter | Parameter number | Parameter name | Variable name | Number of bits | Bit number (LSB-MSB) |
|---|---|---|---|---|---|
| Filter parameters | 1 | Logarithmic Area Ratio 1 to 8 | LAR 1 | 6 | b1...b6 |
| | 2 | | LAR 2 | 6 | b7...b12 |
| | 3 | | LAR 3 | 5 | b13...b17 |
| | 4 | | LAR 4 | 5 | b18...b22 |
| | 5 | | LAR 5 | 4 | b23...b26 |
| | 6 | | LAR 6 | 4 | b27...b30 |
| | 7 | | LAR 7 | 3 | b31...b33 |
| | 8 | | LAR 8 | 3 | b34...b36 |
| LTP parameters | 9 | LTP Lag | N1 | 7 | b37...b43 |
| | 10 | LTP Gain | b1 | 2 | b44...b45 |
| RPE parameters | 11 | Grid position | M1 | 2 | b46...b47 |
| | 12 | Block amplitude | Xmax1 | 6 | b48...b53 |
| | 13 | RPE pulse 1 | X1(0) | 3 | b54...b56 |
| | 14 | RPE pulse 2 | X1(1) | 3 | b57...b59 |
| | ... | ... | ... | ... | ... |
| | 22 | RPE pulse 10 | X1(9) | 3 | b81...b83 |
| LTP parameters | 23 | LTP Lag | N2 | 7 | b84...b90 |
| | 24 | LTP Gain | b2 | 2 | b91...b92 |
| RPE parameters | 25 | Grid position | M2 | 2 | b93...b94 |
| | 26 | Block amplitude | Xmax2 | 6 | b95...b100 |
| | 27 | RPE pulse 1 | X2(0) | 3 | b101...b103 |
| | 28 | RPE pulse 2 | X2(1) | 3 | b104...b106 |
| | ... | ... | ... | ... | ... |
| | 36 | RPE pulse 10 | X2(9) | 3 | b128...b130 |
| LTP parameters | 37 | LTP Lag | N3 | 7 | b131...b137 |
| | 38 | LTP Gain | b3 | 2 | b138...b139 |
| RPE parameters | 39 | Grid position | M3 | 2 | b140...b141 |
| | 40 | Block amplitude | Xmax3 | 6 | b142...b147 |
| | 41 | RPE pulse 1 | X3(0) | 3 | b148...b150 |
| | 42 | RPE pulse 2 | X3(1) | 3 | b151...b153 |
| | ... | ... | ... | ... | ... |
| | 50 | RPE pulse 10 | X3(9) | 3 | b175...b177 |
| LTP parameters | 51 | LTP Lag | N4 | 7 | b178...b184 |
| | 52 | LTP Gain | b4 | 2 | b185...b186 |
| RPE parameters | 53 | Grid position | M4 | 2 | b187...b188 |
| | 54 | Block amplitude | Xmax4 | 6 | b189...b194 |
| | 55 | RPE pulse 1 | X4(0) | 3 | b195...b197 |
| | 56 | RPE pulse 2 | X4(1) | 3 | b198...b200 |
| | ... | ... | ... | ... | ... |
| | 64 | RPE pulse 10 | X4(9) | 3 | b222...b224 |

5.2.2 Perceptual based objective analysis and composite measure

The following is the important parameter for performing perceptual based analysis.

(1) Perceptual Evaluation of Speech Quality (PESQ)

PESQ compares an original speech signal with the decoded signal that results from passing the original signal through a communication system. The output of PESQ is a prediction of the perceived quality that would be given to the decoded speech by listeners in subjective listening tests such as MOS. The PESQ score is mapped to a MOS-like scale with a range between 1.0 and 4.5 (de Lamare and Alcaim 2005). In comparison with other objective measures, the PESQ measure is the most complex to compute and is the one recommended by ITU-T P.862 for speech quality assessment of 3.2 kHz (narrow-band) handset telephony and narrow-band speech codecs (ITU-T 2001). The other benefit of PESQ is that it provides high correlation with subjective MOS analysis. The PESQ score is computed as a linear combination of the average disturbance value D_ind and the average asymmetrical disturbance value A_ind as follows

$$\mathrm{PESQ} = a_0 + a_1 D_{\mathrm{ind}} + a_2 A_{\mathrm{ind}} \tag{4}$$

where the parameters a0, a1 and a2 are determined using multiple linear regression analysis and then optimized for the required measurements. Different sets of parameters (a0, a1, a2) are chosen for establishing the correlation between PESQ scores and composite measures, in line with Hu and Loizou (2008). D_ind and A_ind are treated as independent variables in the regression analysis.
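Equation (4) is a plain linear combination, which the sketch below makes explicit. The default coefficients (4.5, -0.1, -0.0309) are the values commonly reported for ITU-T P.862 and are included here as an assumption for illustration; they are not the regression-optimized sets used in this paper, which the text does not list. The function name `pesq_from_disturbances` is hypothetical.

```python
def pesq_from_disturbances(d_ind, a_ind, a0=4.5, a1=-0.1, a2=-0.0309):
    """Linear combination of Eq. (4).

    d_ind: average (symmetric) disturbance value
    a_ind: average asymmetrical disturbance value
    a0, a1, a2: regression coefficients (defaults are assumed values
    commonly reported for ITU-T P.862, not values from this paper)
    """
    return a0 + a1 * d_ind + a2 * a_ind


# With no disturbance the score sits at the top of the scale.
print(pesq_from_disturbances(0.0, 0.0))
# Hypothetical disturbance values, for illustration only.
print(pesq_from_disturbances(10.0, 20.0))
```

Both default slopes are negative, so larger disturbances always lower the predicted score.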

(2) Composite measures

As conventional objective measures are not sufficient to provide high correlations in terms of speech/noise distortion and overall speech quality, it is necessary to combine different objective measures in order to produce a composite measure (Hu and Loizou 2008). Here, multiple linear regression analysis and Multivariate Adaptive Regression Splines (MARS) techniques are used to produce the different parameters. With reference to the ITU-T P.835 standard, the following parameters are used for evaluation of the composite measure: predicted rating of overall speech quality (Covl), rating of speech distortion (Csig) and rating of background distortion (Cbak) (Hu and Loizou 2008; ITU-T 2003).

5.2.3 Spectral based objective analysis

The following parameters are evaluated in this category of objective analysis.

(1) Frequency Weighted Segmental SNR (fwSNRseg) is expressed as follows

$$\mathrm{fwSNRseg} = \frac{10}{M} \sum_{m=0}^{M-1} \frac{\sum_{j=1}^{K} W(j,m) \log_{10} \dfrac{|X(j,m)|^2}{\left(|X(j,m)| - |\hat{X}(j,m)|\right)^2}}{\sum_{j=1}^{K} W(j,m)} \tag{5}$$

where $W(j,m)$ is the weight placed on the j-th frequency band, K is the number of bands, M is the total number of frames in the signal, $|X(j,m)|$ is the weighted (by a Gaussian-shaped window) clean signal spectrum in the j-th frequency band at the m-th frame, and $|\hat{X}(j,m)|$ is the weighted decoded signal spectrum in the same band (Hu and Loizou 2008).
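Given band magnitudes for each frame, Eq. (5) reduces to a weighted average per frame followed by a mean over frames. The sketch below starts from already-computed band magnitudes and does not implement the filterbank itself. The default weights |X(j,m)|^gamma with gamma = 0.2 and the small `eps` guard against division by zero are assumptions made for this example (a weighting of this form is described by Hu and Loizou 2008); the function name `fw_snr_seg_db` is hypothetical.

```python
import math


def fw_snr_seg_db(clean, decoded, weights=None, gamma=0.2, eps=1e-12):
    """Frequency-weighted segmental SNR in dB.

    clean, decoded: lists of M frames, each a list of K band magnitudes.
    weights: optional M x K weights; if omitted, |X|**gamma is used
             (assumed default, see lead-in).
    """
    if not clean or len(clean) != len(decoded):
        raise ValueError("need the same non-zero number of frames")
    total = 0.0
    for m, (x_frame, y_frame) in enumerate(zip(clean, decoded)):
        if weights is not None:
            w_frame = weights[m]
        else:
            w_frame = [x ** gamma for x in x_frame]
        weighted = 0.0
        for w, x, y in zip(w_frame, x_frame, y_frame):
            # eps keeps the ratio finite when a band is reproduced exactly.
            weighted += w * math.log10((x * x + eps) / ((x - y) ** 2 + eps))
        total += weighted / sum(w_frame)
    return 10.0 * total / len(clean)


# One frame, two bands, each band reproduced 10 % low (20 dB per band).
clean_bands = [[1.0, 2.0]]
decoded_bands = [[0.9, 1.8]]
print(fw_snr_seg_db(clean_bands, decoded_bands, weights=[[1.0, 1.0]]))
```

A frame whose weights sum to zero (for example an all-zero clean frame with the default weighting) is not handled here and would raise a division error.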

6 Performance evaluation of proposed GSM FR coder

Both the GSM 06.10 FR and the proposed GSM FR coders are implemented in MATLAB, and the performance of both coders is evaluated using different subjective and objective measures. First the GSM 06.10 FR coder is implemented in MATLAB, and then the proposed modifications are carried out in GSM FR to free up 36 bits per frame out of the 260 bits of each transmitted frame for better error concealment at channel coding. For the purpose of subjective and objective analysis, five different wave files have been chosen (NOIZEUS 2009). Each wave file is sampled at 8 kHz and coded with 16 bits per sample (mono), rather than the 13 bits per sample that the actual GSM FR input uses to produce 104 kbps.

6.1 Results obtained for MOS analysis

As discussed previously, MOS analysis is carried out for five different wave files. Ten untrained listeners judged the quality of speech in a quiet environment using high quality headphones. The listeners rated the decoded speech files of both the standard GSM FR coder and the proposed GSM FR coder. The resulting average MOS scores are shown in Fig. 4. As can be seen from Fig. 4, there is a small degradation in MOS score for the proposed GSM FR coder in comparison with the standard GSM FR coder, but the proposed coder still offers acceptable MOS scores when compared with its counterpart.


Table 4 Modifications in proposed GSM FR coder for channel coding (224 bits/20 ms frame). Rows marked "…" in every column were lost in extraction; a "…" in the Label column alone denotes consecutive labels.

| Parameter name | Parameter number | Bit index | Label | Class |
|---|---|---|---|---|
| Log area ratio 1 | 1 | 5 | d0 | 1 A, with parity check |
| Block amplitude | 12, 26, 40, 54 | 5 | d1, d2, d3, d4 | |
| Log area ratio 1 | 1 | 4 | … | |
| … | … | … | … | |
| LTP lag | 9, 23, 37, 51 | 6 | … | |
| Block amplitude | 12, 26, 40, 54 | 4 | … | |
| Log area ratio 2, 5, 6 | 2, 5, 6 | 3 | … | |
| LTP lag | 9, 23, 37, 51 | 5 | … | |
| LTP lag | 9, 23, 37, 51 | 4 | … | |
| LTP lag | 9, 23, 37, 51 | 3 | … | |
| LTP lag | 9, 23, 37, 51 | 2 | … | |
| … | … | … | … | |
| LTP lag | 9, 23, 37, 51 | 1 | d49 | |
| Log area ratio 5, 6 | 5, 6 | 2 | d50, d51 | 1 B, with parity check |
| LTP gain | 10, 24, 38, 52 | 1 | … | |
| LTP lag | 9, 23, 37, 51 | 0 | … | |
| Grid position | 11, 25, 39, 53 | 1 | … | |
| … | … | … | … | |
| Log area ratio 2, 3, 8, 4 | 2, 3, 8, 4 | 2 | … | |
| Log area ratio 5, 7 | 5, 7 | 1 | … | |
| LTP gain | 10, 24, 38, 52 | 0 | … | |
| Block amplitude | 12, 26, 40, 54 | 2 | … | |
| RPE pulses | 13...22 | 2 | … | |
| RPE pulses | 27...36 | 2 | … | |
| RPE pulses | 41...50 | 2 | … | |
| RPE pulses | 55...64 | 2 | … | |
| Grid position | 11, 25, 39, 53 | 0 | … | |
| Block amplitude | 12, 26, 40, 54 | 1 | … | |
| RPE pulses | 13...22 | 1 | … | |
| RPE pulses | 27...35 | 1 | d145 | |
| RPE pulses | 36 | 1 | d146 | Class 2, without error protection |
| RPE pulses | 41...50 | 1 | … | |
| RPE pulses | 55...64 | 1 | … | |
| … | … | … | … | |
| Log area ratio 2, 3, 6 | 2, 3, 6 | 1 | … | |
| … | … | … | … | |
| Log area ratio 8, 3 | 8, 3 | 0 | … | |
| … | … | … | … | |
| Log area ratio 4, 5 | 4, 5 | 0 | … | |
| … | … | … | … | |
| RPE pulses | 13...22 | 0 | … | |
| RPE pulses | 27...36 | 0 | … | |
| RPE pulses | 41...50 | 0 | … | |
| RPE pulses | 55...64 | 0 | … | |
| Log area ratio 2, 6 | 2, 6 | 0 | d222, d223 | |

Fig. 4 MOS score comparison between standard GSM FR coder and proposed GSM FR coder

Fig. 5 PESQ score comparison between standard GSM FR coder and proposed GSM FR coder

Table 5 Mean opinion score (MOS) ratings

| Sr. No. | Choice | MOS rating |
|---|---|---|
| 1 | Excellent | 5 |
| 2 | Good | 4 |
| 3 | Fair | 3 |
| 4 | Poor | 2 |
| 5 | Unacceptable | 1 |

6.2 Results obtained for objective analysis

As mentioned in the previous section, different types of objective analysis have been conducted, and their results are tabulated in Table 6. As can be observed in Table 6, all objective parameters offer acceptable values. The results show a small degradation in the value of most parameters when the performance of the proposed GSM FR coder is compared with that of the standard GSM FR coder. This small degradation is in response to the reduction in bit rate by 1.8 kbps from the standard GSM FR coder to the proposed GSM FR coder. Figure 5 shows the PESQ score comparison between the standard and proposed coders.

Table 6 Objective analysis comparison between standard GSM FR coder and proposed GSM FR coder

| Algorithm | Wave file (.wav) | PESQ (perceptual) | Csig (composite) | Cbak (composite) | Covl (composite) | SNR (waveform) | SNRseg (waveform) | fwSNRseg (spectral) |
|---|---|---|---|---|---|---|---|---|
| Standard GSM FR (13 kbps) | Ninad.wav | 3.0812 | 1.6643 | 2.1895 | 1.6737 | 3.9566 | 1.7324 | 1.1441 |
| | Ninadvoice.wav | 3.0381 | 2.7733 | 2.4256 | 2.3592 | 6.1327 | 1.8035 | 2.4729 |
| | Five.wav | 2.8038 | 2.7017 | 2.4217 | 2.1832 | 3.5587 | 3.6560 | 8.4468 |
| | Doormono.wav | 2.6190 | 2.9858 | 2.5744 | 2.4626 | 3.3471 | 3.5598 | 8.3558 |
| | Sp21.wav | 2.8598 | 2.6280 | 2.4405 | 2.3222 | 2.0206 | 2.4059 | 7.3820 |
| Proposed GSM FR (11.2 kbps) | Ninad.wav | 3.0042 | 1.6209 | 2.1597 | 1.6320 | 3.7958 | 1.7136 | 0.8179 |
| | Ninadvoice.wav | 2.8952 | 2.5618 | 2.2948 | 2.1394 | 5.0678 | 1.5015 | 1.0801 |
| | Five.wav | 2.9636 | 2.9062 | 2.4944 | 2.4357 | 3.5587 | 3.0310 | 8.0093 |
| | Doormono.wav | 2.3190 | 2.8337 | 2.3982 | 2.2509 | 2.7804 | 3.0660 | 7.6917 |
| | Sp21.wav | 2.7457 | 2.4834 | 2.3076 | 2.1393 | 1.8839 | 2.2064 | 6.5518 |

7 Discussion and concluding remarks

In order to conserve channel bandwidth, the role of a speech coder is to provide a toll quality recovered speech signal even at a comparatively low bit rate, and also with less delay and complexity. There is a trade-off between quality of speech and bit rate. The Full Rate GSM speech codec offers moderate delay and less complexity in comparison with other coders, but at the cost of a comparatively moderate bit rate (Malkovic 2003).

The idea behind the proposed GSM FR coder is to reduce the bit rate of the GSM FR coder by 1.8 kbps, so that the spared 36 bits per frame can be used either for better error concealment at channel coding or for steganographic data transmission. The proposed GSM FR coder, being implemented in line with the standard GSM FR coder, reduces overall complexity and delay (an inherent benefit of the GSM FR coder when compared to other standard GSM coders) but shows a small degradation in speech quality, as observed in Figs. 4 and 5 for MOS ratings and PESQ scores respectively. Subjective analysis provides acceptable MOS scores for the proposed coder when compared with the standard GSM FR coder. Objective analysis of the different parameters also resulted in satisfactory values when both coders are compared. Both PESQ (objective) and MOS (subjective) scores for the two coders are quite comparable, and a small reduction in their values is visible with the reduction in overall bit rate from the standard 13 kbps coder to the proposed 11.2 kbps coder.

References

de Lamare, R. C., & Alcaim, A. (2005). Strategies to improve the performance of very low bit rate speech coders and application to a variable rate 1.2 kbps codec. IEE Proceedings. Vision, Image and Signal Processing, 152(1).

ETSI (1999). Channel coding (GSM 05.03 version 8.9.0 2005-01), pp. 12–19 & 98.

ETSI (2005–2006). Digital cellular telecommunications system (Phase 2+), full rate speech, transcoding (GSM 06.10 version 8.2.0), pp. 10–59.

Hu, Y., & Loizou, P. C. (2008). Evaluation of objective quality measures for speech enhancement. IEEE Transactions on Audio, Speech and Language Processing, 16(1).

ITU-T (2001). Recommendation P.862, Perceptual evaluation of speech quality (PESQ): an objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codec, pp. 1–18.

ITU-T (2003). Subjective test methodology for evaluating speech communication systems that include noise suppression algorithm, P.835.

Malkovic, D. (2003). Speech coding methods in mobile radio communication systems. In 17th international conference on applied electromagnetics and communications, Croatia.

The NOIZEUS database (2009). Available: http://www.utdallas.edu/~loizou/speech/noize.
