
Speech Compression using LPC

Disha Modi, M.Tech (Communication)
Electronics and Communication Department, Institute of Technology - Nirma University

Adaptive Signal Processing Term Paper, 2015

Abstract—The past decade has seen steady progress toward the deployment of low-rate speech coders in public and military communications. Central to this progress is that the new speech coders achieve high-quality speech at low data rates. These coders incorporate mechanisms that exploit the spectral properties of speech, such as speech waveform matching, and tune coder performance to the human ear. Several of these coders have been adopted in cellular telephony standards. Service providers are continually challenged to accommodate more users within a limited allocated bandwidth in mobile communication services. For this reason, service providers are constantly in search of low bit-rate speech coders that deliver high-quality speech. In this paper, a low bit-rate speech coder based on Linear Predictive Coding (LPC) is simulated in MATLAB.

Index Terms—Autocorrelation, Formants, LPC, Levinson-Durbin recursion.

I. INTRODUCTION

"LPC was first introduced as a method for encoding human speech by the United States Department of Defense in Federal Standard 1015, published in 1984" [1]. Human speech is produced in the vocal tract, which can be approximated as a tube of varying diameter, and the linear predictive coding (LPC) model is based on a mathematical approximation of this tube. At a given time, a speech sample is approximated as a linear combination of the p previous samples. The key element of LPC is the linear predictive filter, which estimates the value of the next sample from a linear combination of previous samples. "In the normal scenario, speech is sampled at 8000 samples/second with 8-bit quantization. This gives a data rate of 64,000 bits/second. Linear predictive coding drops this to 2,400 bits/second" [1]. At this rate the speech has a distinctly synthetic sound and there is an obvious loss of quality; however, the speech is still easily intelligible. Hence, LPC is a lossy form of compression.
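Written out, the prediction underlying LPC takes the following standard form (the symbols s, ŝ, e and a_k are introduced here for illustration and do not appear elsewhere in the paper):

  ŝ[n] = Σ_{k=1}^{p} a_k s[n-k],    e[n] = s[n] - ŝ[n]

The encoder transmits the coefficients a_k, together with gain and voicing information, rather than the waveform itself, which is what makes the large reduction in bit rate possible.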

Lossy algorithms are often considered acceptable because the loss of quality is frequently undetectable to the human ear. Moreover, in conversation, silence takes up more than 50% of the time, so simply not transmitting silence is an easy way to save bandwidth. Another important property of speech production is that, mechanically, there is a high correlation between adjacent samples of speech.

II. LPC SYSTEM IMPLEMENTATION

The filter model used in LPC is known as the linear predictive filter. The system has two key components: analysis/encoding and synthesis/decoding.

III. LPC ANALYSIS/ENCODING

The encoding part of LPC involves analyzing the speech signal and breaking it down into segments.

Fig. 1 LPC encoder block-diagram

LP methods have long been used in control and information theory under the names system estimation and system identification, and they are used extensively in speech processing under the group of names listed below, referred from [7]:

1. covariance method

2. autocorrelation method

3. lattice method

4. inverse filter formulation

5. spectral estimation formulation

6. maximum likelihood method

7. inner product method

A. Input speech

Under normal conditions, the input signal is sampled at a rate of 8000 samples per second. The signal is broken down into segments, which are then analyzed and transmitted to the receiver. The 8000 samples in each second of speech are divided into segments of approximately 180 samples, so each segment represents 22.5 milliseconds of the input speech signal.
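As a minimal illustration of this framing step, the following MATLAB sketch splits an 8 kHz recording into 180-sample segments (the file name 'speech.wav' and all variable names are illustrative assumptions, not taken from the paper):

% Split an 8 kHz speech recording into 180-sample (22.5 ms) segments.
[x, fs] = audioread('speech.wav');      % assumed mono file sampled at fs = 8000 Hz
frameLen = 180;                         % samples per segment
nFrames  = floor(length(x) / frameLen); % discard the incomplete tail segment
frames   = reshape(x(1:nFrames*frameLen), frameLen, nFrames);  % one segment per column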

B. Voiced/Unvoiced Determination

In the LPC algorithm, before a speech segment is classified as voiced or unvoiced it is first passed through a low-pass filter with a bandwidth of 1 kHz. It is important to determine whether a segment is voiced or unvoiced because voiced sounds have a distinctly different waveform than unvoiced sounds. The LPC encoder informs the decoder whether a segment is voiced or unvoiced by sending a single bit. Voiced sounds are generally vowels and can be regarded as pulse-like, nearly periodic waveforms; these sounds have large amplitudes and high energy levels, and they exhibit distinct formant (resonant) frequencies. Unvoiced sounds are usually non-vowel or consonant sounds and often have random, noise-like waveforms, with smaller amplitudes and therefore less energy than voiced sounds. Hence, the voiced/unvoiced decision is made by counting the number of times the waveform crosses the x-axis (zero crossings) and comparing that count with the range of values (threshold values) typical of voiced and unvoiced sounds.
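A minimal MATLAB sketch of this zero-crossing test is given below; the threshold of 45 crossings per 180-sample segment is an assumed illustrative value, not one taken from the paper:

% Voiced/unvoiced decision from the zero-crossing count of one segment.
frame    = frames(:, 1);                        % any 180-sample segment (see framing sketch above)
zc       = sum(abs(diff(double(frame > 0))));   % number of zero crossings in the segment
isVoiced = (zc < 45);                           % few crossings suggest voiced speech (assumed threshold)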



C. Pitch Period Estimation

The pitch period can be thought of as the period of the vocal cord vibration that occurs during the production of voiced speech. The pitch period is therefore only required for decoding voiced segments; it is not needed for unvoiced segments, since these are produced by turbulent air flow rather than vocal cord vibration. One type of algorithm takes advantage of the fact that the autocorrelation of a periodic function, Rxx(k), has a maximum when k is equal to the pitch period. These algorithms usually detect the maximum by checking the autocorrelation value against a threshold. One problem with autocorrelation-based algorithms is that the validity of their results is susceptible to interference from other resonances in the vocal tract; when such interference occurs, the algorithm cannot guarantee accurate results. Another problem arises because voiced speech is not entirely periodic, which means the maximum will be lower than it would be for a truly periodic signal.
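The following MATLAB sketch illustrates this autocorrelation approach for a single voiced segment; the search range of 20 to 160 samples (roughly 50-400 Hz at 8 kHz) and the variable names are our assumptions:

% Pitch-period estimate from the autocorrelation maximum of one segment.
N = length(frame);
r = zeros(N, 1);
for k = 0:N-1
    r(k+1) = sum(frame(1:N-k) .* frame(1+k:N));  % Rxx(k)
end
minLag = 20;  maxLag = 160;                      % assumed pitch search range in samples
[~, idx] = max(r(minLag+1:maxLag+1));
pitchPeriod = minLag + idx - 1;                  % lag (in samples) of the autocorrelation peak

A practical coder would also compare the peak value against a threshold, as described above, before accepting the estimate.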

D. Vocal Tract Filter

The filter that is used by the decoder to reconstruct the original input signal is defined by a set of coefficients. To find the filter coefficients that best match the current segment being examined, the encoder chooses the coefficients that minimize the mean squared prediction error, as follows.

The prediction error for an order-M predictor is

  e_n = y_n - Σ_{i=1}^{M} a_i y_{n-i},

and the quantity to be minimized is E[e_n^2]. Setting the derivative with respect to each coefficient a_j to zero,

  ∂/∂a_j E[(y_n - Σ_{i=1}^{M} a_i y_{n-i})^2] = 0,

  -2 E[(y_n - Σ_{i=1}^{M} a_i y_{n-i}) y_{n-j}] = 0,

gives

  Σ_{i=1}^{M} a_i E[y_{n-i} y_{n-j}] = E[y_n y_{n-j}],   j = 1, 2, ..., M

(using the fact that the expectation operator is linear, so the derivative can be taken inside it).

Taking the derivative therefore yields a set of M equations. To solve for the filter coefficients, E[y_{n-i} y_{n-j}] has to be estimated. Autocorrelation is the approach explained here for linear predictive coding. It requires two initial assumptions about the sequence of speech samples y_n in the current segment: first, that y_n is stationary, and second, that y_n is zero outside of the current segment. Under these assumptions, each E[y_{n-i} y_{n-j}] is replaced by an autocorrelation value of the form Ryy(|i-j|). The autocorrelation function Ryy(k) can be estimated from the N samples of the segment as

  Ryy(k) = (1/N) Σ_{n=k+1}^{N} y_n y_{n-k}

(any constant scaling of Ryy cancels in the equations below).

Using Ryy(k), the M equations obtained from taking the derivative of the mean squared error can be written in matrix form as RA = P, where A contains the filter coefficients, R is the M x M matrix with entries R(i,j) = Ryy(|i-j|), and P is the vector with entries P(j) = Ryy(j). To determine the filter coefficients, the equation A = R^(-1) P must be solved, which cannot be done without first computing R^(-1). This is an easy computation once one observes that R is symmetric and that all elements along each diagonal are equal. A matrix of this type is called a Toeplitz matrix and can be inverted efficiently [1].
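As a concrete sketch (variable names are ours), the normal equations can be assembled and solved directly in MATLAB from the autocorrelation values of the current segment:

% Direct solution of the normal equations R*A = P for an order-M predictor.
M = 10;                                % predictor order used for voiced segments
r = zeros(M+1, 1);
for k = 0:M
    r(k+1) = sum(frame(1+k:end) .* frame(1:end-k)) / length(frame);  % Ryy(k)
end
R = toeplitz(r(1:M));                  % M x M symmetric Toeplitz matrix, R(i,j) = Ryy(|i-j|)
P = r(2:M+1);                          % right-hand side, P(j) = Ryy(j)
A = R \ P;                             % predictor coefficients a_1 ... a_M
% The corresponding prediction error filter is [1; -A].

In practice, the Levinson-Durbin recursion described next solves the same system without forming or inverting R explicitly.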

The Levinson-Durbin (L-D) algorithm is a recursive algorithm that is considered very computationally efficient, since it takes advantage of the properties of R when determining the filter coefficients.

L-D Algorithm [2]

The basic ideas behind the recursion are, first, that it is easy to solve the system for k = 1 and, second, that it is also very simple to solve a problem with k+1 coefficients once the problem with k coefficients has been solved. In general none of the coefficients of the different-sized problems match, so it is not a way to calculate an individual coefficient but a way to calculate the whole vector of order-(k+1) coefficients as a function of the order-k solution, its prediction error, and the autocorrelation values. Thinking about it, Levinson-Durbin induction would be a better name.

In general we look for a vector A_k = [1, a_1, ..., a_k]^T and an error E_k such that R_{k+1} A_k = [E_k, 0, ..., 0]^T, where R_{k+1} is the (k+1) x (k+1) symmetric Toeplitz matrix built from the autocorrelation values R(0), ..., R(k). (Here A_k contains the coefficients of the prediction error filter, so these a_i carry the opposite sign of the predictor coefficients solved for in RA = P above.)

Solving the size 1 Problem

We are looking for A_1 = [1, a]^T so that R_2 A_1 = [E_1, 0]^T, with R_2 = [R(0) R(1); R(1) R(0)]; E_1 is not necessary at this stage. The dot product of the second line of R_2 with A_1 gives

  R(1) + a R(0) = 0.

Therefore

  a = -R(1)/R(0)   and   E_1 = R(0) + a R(1).

Solving the size K+1 Problem

Suppose that we have solved the size k problem and have found A_k and E_k, so that

  R_{k+1} A_k = [E_k, 0, ..., 0]^T.

R_{k+2} has one more row and column than R_{k+1}, so we cannot apply it directly to A_k; however, if we extend A_k with a zero and call this vector [A_k^T, 0]^T, we can apply R_{k+2} to it and we get the following interesting result:

  R_{k+2} [A_k^T, 0]^T = [E_k, 0, ..., 0, h_k]^T,   where h_k = Σ_{i=0}^{k} a_i R(k+1-i)   (with a_0 = 1).


Since the matrix R_{k+2} is symmetric and Toeplitz, we also get something remarkable when reversing the order of the coefficients of [A_k^T, 0]^T and calling this vector V_k = [0, a_k, ..., a_1, 1]^T:

  R_{k+2} V_k = [h_k, 0, ..., 0, E_k]^T.

We can notice that a linear combination [A_k^T, 0]^T + c V_k is of the form wanted for A_{k+1}, since its first element is 1 for all values of c. Now, if there is a value of c for which the last element of the result is zero, that combination solves the size k+1 problem. Calculating R_{k+2}([A_k^T, 0]^T + c V_k) gives

  [E_k + c h_k, 0, ..., 0, h_k + c E_k]^T,

so choosing c = -h_k / E_k makes the last element zero and yields

  A_{k+1} = [A_k^T, 0]^T + c V_k   and   E_{k+1} = E_k + c h_k.
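The recursion above translates directly into code. The following MATLAB function is our own sketch (the function name and vector conventions are assumptions); the input r holds the autocorrelation values R(0), ..., R(M) in r(1), ..., r(M+1):

% Levinson-Durbin recursion following the derivation above.
% Output: A = [1; a_1; ...; a_M], the prediction error filter, and the final error E.
function [A, E] = levinson_durbin(r, M)
    r = r(:);                            % force a column vector
    A = 1;                               % order-0 solution
    E = r(1);                            % E_0 = R(0)
    for k = 0:M-1
        h = sum([A; 0] .* r(k+2:-1:1));  % h_k = sum_{i=0..k} a_i * R(k+1-i)
        c = -h / E;                      % chosen so that the last element vanishes
        A = [A; 0] + c * flipud([A; 0]); % A_{k+1}
        E = E + c * h;                   % E_{k+1} = E_k + c*h_k
    end
end

For a 10th-order voiced-segment filter this would be called as [A, E] = levinson_durbin(r, 10), with r computed as in the earlier autocorrelation sketch; the Signal Processing Toolbox also provides a built-in levinson function for the same computation.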

IV. TRANSMITTING THE PARAMETERS[1]

In its original form, speech is usually transmitted at 64,000 bits/second using 8 bits/sample and a sampling rate of 8000 Hz. LPC reduces this rate to 2,400 bits/second by breaking the speech into segments and then sending, for each segment, the voiced/unvoiced decision, the pitch period, and the coefficients of the filter that represents the vocal tract. The excitation signal used by the filter on the receiver end is determined by the classification of the speech segment as voiced or unvoiced and by the pitch period of the segment. The encoder transmits a single bit to indicate whether the current segment is voiced or unvoiced. The pitch period is quantized with 6 bits, which allows 60 distinct values to be represented.

If the segment contains voiced speech, then a 10th-order filter is used, which means that 11 values are needed: 10 reflection coefficients and the gain. If the segment contains unvoiced speech, then a 4th-order filter is used, which means that 5 values are needed: 4 reflection coefficients and the gain.

Quantization is done as follows:

1 bit     voiced/unvoiced
6 bits    pitch period (60 values)
10 bits   k1 and k2 (5 bits each)
10 bits   k3 and k4 (5 bits each)
16 bits   k5, k6, k7, k8 (4 bits each)
3 bits    k9
2 bits    k10
5 bits    gain G
1 bit     synchronization
54 bits   total per frame

Verification of the Bit Rate of LPC Speech Segments

Sample rate = 8000 samples/second
Samples per segment = 180 samples/segment
Segment rate = sample rate / samples per segment
  = (8000 samples/second) / (180 samples/segment)
  = 44.44... segments/second
Segment size = 54 bits/segment
Bit rate = segment size x segment rate
  = (54 bits/segment) x (44.44... segments/second)
  = 2400 bits/second

V. LPC SYNTHESIS/DECODING

Fig. 2 LPC synthesizer/decoder block-diagram [4]

The process of decoding a sequence of speech segments is the reverse of the encoding process. Each segment is decoded individually, and the sequence of reproduced sound segments is joined together to represent the entire input speech signal. The decoding, or synthesis, of a speech segment is based on the 54 bits of information transmitted by the encoder. Each segment of speech has a different LPC filter, produced from the reflection coefficients and the gain received from the encoder: 10 reflection coefficients are used for voiced segment filters and 4 reflection coefficients for unvoiced segments. These reflection coefficients are used to generate the vocal tract coefficients, or parameters, which are used to create the filter. The final step in decoding a segment of speech is to pass the excitation signal through the filter to produce the synthesized speech signal.
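A minimal synthesis sketch for one decoded segment is given below; it assumes that the prediction error filter A = [1; a_1; ...; a_M] and the gain G have already been recovered (for example via the Levinson-Durbin sketch above), and it ignores filter-state continuity between consecutive segments:

% Synthesis of one 180-sample segment from its decoded parameters.
N = 180;
if isVoiced
    excitation = zeros(N, 1);
    excitation(1:pitchPeriod:N) = 1;   % impulse train at the decoded pitch period
else
    excitation = randn(N, 1);          % white-noise excitation for unvoiced segments
end
synth = filter(G, A, excitation);      % all-pole vocal tract filter G / A(z)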

VI. APPLICATION

In general, the most common use of speech compression is in standard telephone systems; in fact, much of the technology used in speech compression was developed by the phone companies. Further applications of LPC and other speech compression schemes are voice mail systems, telephone answering machines, and multimedia applications. Most multimedia applications, unlike telephone applications, involve one-way communication and involve storing the data.

SIMULATION RESULTS

Low bit-rate coding of different speech signals using Linear Predictive Coding (LPC) was simulated in MATLAB.

Fig. 3 Female Original Voice

Fig. 4 Female LPC coded Voice

Fig. 5 Male Original Voice

Fig. 6 Male LPC coded Voice

Performance measurements of the LPC-compressed signals (both male and female) are shown in Table 1. Looking at the SNR values in Table 1, it is clear that both the male and female reconstructions are noisy, as they have low SNR values. It is observed that, at all levels of compression, the quality is better for the male signal than for the female signal; on the other hand, the compression factor for the female signal is larger than that of the male signal. This result is expected because the female voice contains more high-frequency content than the male voice. It is also observed that no further enhancement is achieved beyond a certain level of decomposition for either signal.

Parameter                      Male      Female
Sampling rate (Hz)             8000      8000
File length (seconds)          2.07      2.77
Length of original signal      99328     133120
Length of constructed signal   97920     132480
SNR (dB)                       17.077    14.77
Compression ratio              0.9858    0.9952

Table 1 Comparison of male and female LPC-synthesized voice
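SNR figures of this kind are typically computed by comparing the original and reconstructed waveforms over a common length; a sketch of such a computation is given below (variable names are ours, and this may not be exactly how the reported values were obtained):

% SNR (in dB) between the original signal x and the reconstructed signal y.
L = min(length(x), length(y));
e = x(1:L) - y(1:L);
snr_dB = 10 * log10(sum(x(1:L).^2) / sum(e.^2));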

CONCLUSION

Linear Predictive Coding is an analysis/synthesis technique for lossy speech compression that models the human production of sound instead of transmitting an estimate of the sound wave itself. Linear predictive coding achieves a bit rate of 2400 bits/second, which makes it well suited for use in secure telephone systems. Secure telephone systems are more concerned that the content and meaning of speech be preserved than its quality. The trade-off for LPC's low bit rate is that it has some difficulty with certain sounds and produces speech that sounds synthetic. Linear predictive coding encoders break a sound signal into segments and then send information on each segment to the decoder. The encoder sends information on whether the segment is voiced or unvoiced and, for voiced segments, the pitch period, which is used to create an excitation signal in the decoder. The encoder also sends information about the vocal tract, which is used to build a filter on the decoder side; when given the excitation signal as input, this filter can reproduce the original speech.

REFERENCES

[1] J. Bradbury, "Linear Predictive Coding," 2000.
[2] C. Collomb, "Linear Prediction and the Levinson-Durbin Algorithm," pp. 1-7, 2009.
[3] D. R. Sandeep, "Compression and Enhancement of Speech Signals," SEISCON, pp. 774-779, 2011.
[4] M. A. Osman, N. Al, H. M. Magboub, and S. A. Alfandi, "Speech compression using LPC and wavelet," pp. 92-99, 2010.
[5] V. Hardman and O. Hodson, "Internet/Mbone Audio," 2000, pp. 5-7.
[6] S. C. Douglas, "Introduction to Adaptive Filters," Digital Signal Processing Handbook, 1999, pp. 7-12.
[7] "Digital Speech Processing, Lecture 13: Linear Predictive Coding (LPC): Introduction and LPC Methods."
[8] H. V. Poor, C. G. Looney, R. J. Marks II, S. Verdu, J. A. Thomas, and T. M. Cover, "Information Theory," The Electrical Engineering Handbook, 2000, pp. 56-57.