
Upload: duongdang

Post on 17-May-2018


4. Implementation of speech compression using Wavelet Transform

4.1 Introduction

The design of the speech compression software is based on the lifting scheme and the conventional filter bank method of wavelet transforms. The software is developed in C. To provide a graphical user interface, visualise speech waveforms, record and play speech files, and simplify data analysis, it is also written in Visual C++.

4.2 Design criteria

• We implement the Haar, cubic, CDF and Daubechies-4 wavelets with the lifting scheme, and the Haar and Daubechies series (Daub-4, Daub-6, Daub-8, Daub-10, Daub-12 and Daub-16) wavelets with the filter bank method.

• If the speech file is small (less than 10 KB), it can be compressed either as a whole or in blocks (frames).

• If the speech file is large, the entire file is divided into blocks of 256, 512 or 1024 bytes.

• To simulate near-real-time speech compression, speech buffers of 256 bytes are used.

• Level-dependent and global thresholding techniques are used for lossy speech compression. Hard thresholding and soft thresholding techniques are used for de-noising.

• Programming is done in C because it is a flexible, structured and efficient language. Programs written in C generally require less execution time than programs written in other high-level languages. Most manufacturers of digital signal processors (Texas Instruments, AT&T, Motorola etc.) provide C compilers, simulators and emulators. These compilers support standard C with DSP extensions in order to generate efficient code. C combines high-level language capabilities (structures, functions, arrays etc.) with low-level assembly language capabilities (bit manipulation, direct hardware input-output, calling of assembly language macros etc.).

• The software is also developed in Visual C++ to provide a graphical interface for displaying speech waveforms and for recording and playing speech files. It displays the original file size, compressed file size, number of zeros, compression ratio and a statistical comparison between the original and reconstructed speech, which simplifies the analysis of results.

• Optimisation techniques are used to reduce computation time [50]. The program logic is optimised. In the lifting scheme, floating-point operations are converted into fixed-point operations. In the filter bank method, the fast wavelet transform algorithm is used to reduce the computational burden.
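The hard and soft thresholding rules named in the design criteria can be sketched as follows. This is a minimal illustration and not the thesis code: the function names `hard_threshold` and `soft_threshold`, the `int` coefficient type, and the choice to zero magnitudes up to and including the threshold in the soft rule are assumptions made for the example.

```c
#include <stdlib.h>

/* Hard thresholding (hypothetical helper): zero every coefficient whose
   magnitude is below t. Returns the number of zeros in the frame afterwards. */
int hard_threshold(int *coef, int n, int t)
{
    int i, zeros = 0;
    for (i = 0; i < n; i++) {
        if (abs(coef[i]) < t)
            coef[i] = 0;
        if (coef[i] == 0)
            zeros++;
    }
    return zeros;
}

/* Soft thresholding (hypothetical helper): zero small coefficients and
   shrink the remaining ones toward zero by t. */
void soft_threshold(int *coef, int n, int t)
{
    int i;
    for (i = 0; i < n; i++) {
        if (abs(coef[i]) <= t)
            coef[i] = 0;
        else if (coef[i] > 0)
            coef[i] -= t;
        else
            coef[i] += t;
    }
}
```

Hard thresholding leaves the surviving coefficients untouched, whereas soft thresholding also reduces their magnitude, which is why the two rules behave differently when used for de-noising.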

4.3 Wavelet Transform algorithms

Speech compression using the wavelet transform is performed with the following steps:

1. Read speech samples from the standard speech file frame by frame. Each

frame consists of 256 samples.

2. Apply wavelet transform on frame of 256 samples.

3. Apply thresholding (replace small wavelet coefficients by zero).

4. Encode wavelet coefficients.

5. Store encoded wavelet coefficients.

6. Read next frame and repeat steps 2 to 5 for entire speech file.

Speech is reconstructed using the inverse wavelet transform with the following steps:

1. Read the first frame of wavelet coefficients from the compressed file.

2. Decode wavelet coefficients. (Restore zeros).

3. Apply inverse wavelet transform on the frame.

4. Store the frame in the standard speech file.

5. Read the next frame of wavelet coefficients and repeat steps 2 to 4.

The wavelet transform algorithms used in this thesis, based on the Haar, Daubechies series, cubic interpolation and CDF (Cohen-Daubechies-Feauveau) wavelets, are presented here.

4.3.1 Haar Wavelet Transform algorithm

Let us consider a sequence of data samples: data[0], data[1], ..., data[n-1]. The Haar transform of this sequence can be calculated with the following steps:

• The sequence of data samples is divided into two frames: an odd frame and an even frame.

• Samples of the even frame are subtracted from the corresponding samples of the odd frame, which gives the detail coefficients.

• Even samples are replaced by the average of the even and odd samples, which gives the approximate (average) coefficients.

The approximate coefficients “A” and detail coefficients “D” overwrite the same data sequence to save memory:

data[1] = D, i.e. data[1] = data[1] - data[0]
data[0] = A, i.e. data[0] = data[0] + data[1]/2

The inverse Haar transform is calculated by reversing the order of the two steps and changing the sign of each operation:

data[0] = data[0] - data[1]/2
data[1] = data[1] + data[0]

These steps can be implemented in C as given below:

Forward Haar Wavelet Transform:

for (s = 2; s <= n; s = s << 1) {
    for (k = 0; k < n; k += s) {
        data[k + (s >> 1)] = data[k + (s >> 1)] - data[k];      /* detail */
        data[k] = data[k] + ((data[k + (s >> 1)]) >> 1);        /* approximation */
    }
}

[Figure: the input samples data[0], data[1], ..., data[n] are transformed into approximate coefficients A = (data[0] + data[1])/2 and detail coefficients D = data[1] - data[0].]

Inverse Haar Wavelet Transform:

for (s = n; s >= 2; s = s >> 1) {       /* halve the stride at each level */
    for (k = 0; k < n; k += s) {
        data[k] = data[k] - ((data[k + (s >> 1)]) >> 1);
        data[k + (s >> 1)] = data[k + (s >> 1)] + data[k];
    }
}

Example:

Let us consider the data sequence:

Data = [0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31]

After splitting the data into an even frame and an odd frame we get:

Even data = [0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30]
Odd data = [1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31]

After the first iteration of the Haar wavelet transform we get (cA0 denotes the approximation coefficients at level 0 and cD0 the detail coefficients at level 0):

Even data = cA0 = [0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30]
Odd data = cD0 = [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]

In the second iteration, the detail coefficients cD0 are kept as they are and the approximation coefficients cA0 are further decomposed into two frames. After the second iteration we get:

Even data = cA1 = [1 5 9 13 17 21 25 29]
Odd data = cD1 = [2 2 2 2 2 2 2 2]

Similarly, after the third iteration we get:

Even data = cA2 = [3 11 19 27]
Odd data = cD2 = [4 4 4 4]

If we arrange the transformed data in the sequence cA2, cD2, cD1, cD0, then the wavelet-transformed sequence is:

Data = [3 11 19 27 4 4 4 4 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]

In the inverse Haar transform, cA1 is calculated from cA2 and cD2, cA0 is calculated from cA1 and cD1, and the original data sequence is calculated from cA0 and cD0.

If we apply the inverse wavelet transform to the coefficient sequence without thresholding, the reconstructed data are the same as the original. If we truncate small wavelet coefficients to zero, we get an error between the original data and the reconstructed data. The value of the threshold depends on how much error we want to tolerate.

Let us apply thresholding to the transformed sequence with threshold = 2, i.e. all coefficients less than “2” are reduced to zero. In our example, all the values of cD0 are less than “2”, so we get the following sequence, arranged as cA2, cD2, cD1, cD0:

Data = [3 11 19 27 4 4 4 4 2 2 2 2 2 2 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]

In the first iteration, cA1 is calculated from cA2 and cD2:

cA1 = [1 5 9 13 17 21 25 29]
cD1 = [2 2 2 2 2 2 2 2]

In the second iteration, cA0 is calculated from cA1 and cD1:

cA0 = [0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30]
cD0 = [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]

In the third iteration, the data “rData” are reconstructed from cA0 and cD0:

rData = [0 0 2 2 4 4 6 6 8 8 10 10 12 12 14 14 16 16 18 18 20 20 22 22 24 24 26 26 28 28 30 30]

It can be seen that the reconstructed data “rData” are not the same as the original data because of thresholding. The reconstructed data are an approximate version of the original data. As the threshold value increases, the error between the original and reconstructed data increases. The detailed error analysis is given in chapter 5.
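The worked example above can be reproduced with a short sketch. It reuses the in-place lifting loops of this section but adds a `levels` parameter so that the transform stops after three iterations, as in the example; the function names and the helper that thresholds only the level-0 details (stored at the odd positions of the in-place layout) are assumptions for illustration. The right shift of a negative value is assumed to be an arithmetic shift, as on mainstream compilers.

```c
#include <stdlib.h>

/* Forward integer Haar lifting, in place, for `levels` decomposition levels.
   n is assumed to be a multiple of 2^levels. */
void haar_forward(int *data, int n, int levels)
{
    int lev, s, k;
    for (lev = 0, s = 2; lev < levels; lev++, s <<= 1) {
        for (k = 0; k < n; k += s) {
            data[k + (s >> 1)] -= data[k];            /* detail D = odd - even  */
            data[k] += data[k + (s >> 1)] >> 1;       /* approximation A = even + D/2 */
        }
    }
}

/* Inverse transform: undo the levels from the coarsest to the finest. */
void haar_inverse(int *data, int n, int levels)
{
    int s, k;
    for (s = 1 << levels; s >= 2; s >>= 1) {
        for (k = 0; k < n; k += s) {
            data[k] -= data[k + (s >> 1)] >> 1;
            data[k + (s >> 1)] += data[k];
        }
    }
}

/* Zero the level-0 detail coefficients (odd positions) whose magnitude is below t. */
void threshold_level0(int *data, int n, int t)
{
    int i;
    for (i = 1; i < n; i += 2)
        if (abs(data[i]) < t)
            data[i] = 0;
}
```

In the in-place layout the coefficients stay interleaved: after three levels the cA2 values sit at indices 0, 8, 16 and 24, rather than at the front of the array as in the rearranged sequence shown in the example.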


4.3.2 CDF (Cohen-Daubechies-Feauveau) Wavelet Transform

The Cohen-Daubechies-Feauveau wavelets are a family of bi-orthogonal wavelets. They have finite support, and the scaling function is always symmetric. They can be implemented in two ways:

1. Using the lifting scheme

2. Using the conventional sub-band coding method (filter bank method)

The family of Cohen-Daubechies-Feauveau wavelets is represented by CDF(n,ñ). The primal wavelet has n vanishing moments and the dual wavelet has ñ vanishing moments. Smoothness is directly related to the number of vanishing moments: as the number of vanishing moments increases, the wavelet function becomes smoother.

In CDF(2,2), the primal and dual wavelets each have two vanishing moments. It has 5 analysis filter coefficients and 3 synthesis filter coefficients.

The implementation of the CDF(2,2) wavelet with the lifting scheme is discussed here.

Let us consider a data set of “n” speech samples $\lambda_{0,k}$, where k = 0 to n-1.

• This data set is divided into two parts. The even frame consists of the even samples $\lambda_{0,0}, \lambda_{0,2}, \lambda_{0,4}$ etc. The odd frame consists of the odd samples $\lambda_{0,1}, \lambda_{0,3}, \lambda_{0,5}$ etc.

• Odd samples are predicted from even samples by a prediction function. The difference between the actual value and the predicted value represents the loss of information.

Hence, the loss of information can be represented by

\[ \gamma_{-1,k} = \lambda_{0,2k+1} - \tfrac{1}{2}\left(\lambda_{-1,k} + \lambda_{-1,k+1}\right) \]

where $\lambda_{0,2k+1}$ is the actual value of the odd sample and $\tfrac{1}{2}(\lambda_{-1,k} + \lambda_{-1,k+1})$ is the average of the neighbouring even samples.

• In the second level, the even samples are further divided into two frames and the process is repeated, as shown in the lifting process chart in Fig. 3.4.

Now let us assume that the original speech samples are stored in a data file. The following procedure is then required:

• Read the speech data file dynamically and put it in an array called data[k]. All the speech samples $\lambda_{0,k}$ are placed in this array.

• Run the wavelet transform up to the Nth level, where N = log2(n).

• Calculate the wavelet coefficients for the odd samples at each level.

• Store the wavelet coefficients in the output file called “output.wlt”.

For simplicity, let us consider a total of 16 samples. Hence N = 4.

First iteration:

data[1] = data[1] - ½(data[2] + data[0])
data[3] = data[3] - ½(data[4] + data[2])
data[5] = data[5] - ½(data[6] + data[4])
data[7] = data[7] - ½(data[8] + data[6])
:
data[15] = data[15] - ½(data[16] + data[14])   (assuming the boundary value data[16] = 0)

Second iteration:

data[2] = data[2] - ½(data[4] + data[0])
data[6] = data[6] - ½(data[8] + data[4])
data[10] = data[10] - ½(data[12] + data[8])
data[14] = data[14] - ½(data[16] + data[12])

Third iteration:

data[4] = data[4] - ½(data[8] + data[0])
data[12] = data[12] - ½(data[16] + data[8])

Fourth iteration:

data[8] = data[8] - ½(data[16] + data[0])

These iterations can be implemented by the following C code (the array holds 17 elements, with data[16] = 0 as the boundary value):

for (x = 3; x >= 0; x--) {
    step = 1 << (3 - x);                 /* 1, 2, 4, 8 */
    for (k = 0; k < 8; k += step) {
        data[2*k + step] -= (data[2*k + 2*step] + data[2*k]) >> 1;
    }
}

In general, the above loops can be written for “n” samples:

N = log(n) / log(2);
for (x = N - 1; x >= 0; x--) {
    step = 1 << (N - 1 - x);
    for (k = 0; k < (n >> 1); k += step) {
        data[2*k + step] -= (data[2*k + 2*step] + data[2*k]) >> 1;
    }
}

The inverse wavelet transform is performed to obtain the reconstructed samples from the transformed coefficients. Samples are reconstructed by adding the loss of information to the predicted value:

\[ \lambda_{0,2k+1} = \gamma_{-1,k} + \tfrac{1}{2}\left(\lambda_{-1,k} + \lambda_{-1,k+1}\right) \]

The loss of information is stored at the same location to perform in-place calculation.

If we consider n = 16 and N = 4, the inverse wavelet transform can be calculated as under:

First iteration:

data[8] = data[8] + ½(data[16] + data[0])

Second iteration:

data[4] = data[4] + ½(data[8] + data[0])
data[12] = data[12] + ½(data[16] + data[8])

Third iteration:

data[2] = data[2] + ½(data[4] + data[0])
data[6] = data[6] + ½(data[8] + data[4])
data[10] = data[10] + ½(data[12] + data[8])
data[14] = data[14] + ½(data[16] + data[12])

Fourth iteration:

data[1] = data[1] + ½(data[2] + data[0])
data[3] = data[3] + ½(data[4] + data[2])
data[5] = data[5] + ½(data[6] + data[4])
data[7] = data[7] + ½(data[8] + data[6])
:
data[15] = data[15] + ½(data[16] + data[14])

The inverse wavelet transform can be implemented with the following C code:

for (x = 0; x <= 3; x++) {
    step = 1 << (3 - x);                 /* 8, 4, 2, 1 */
    for (k = 0; k < 8; k += step) {
        data[2*k + step] += (data[2*k + 2*step] + data[2*k]) >> 1;
    }
}

In general, for n samples:

N = log(n) / log(2);
for (x = 0; x <= N - 1; x++) {
    step = 1 << (N - 1 - x);
    for (k = 0; k < (n >> 1); k += step) {
        data[2*k + step] += (data[2*k + 2*step] + data[2*k]) >> 1;
    }
}

Modified C code for the forward and inverse wavelet transforms, indexed directly by the stride s, is given below.

Forward wavelet transform:

for (s = 2; s < n; s <<= 1) {
    for (k = 0; k < n; k = k + s) {
        data[k + (s >> 1)] -= (data[k] + data[k + s]) >> 1;
    }
}

Inverse wavelet transform:

for (s = (n >> 1); s >= 2; s >>= 1) {
    for (k = 0; k < n; k = k + s) {
        data[k + s/2] += (data[k] + data[k + s]) >> 1;
    }
}

Example:

Let us consider the same data sequence as we used for the Haar wavelet:

Data = [0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31]

After splitting the data into an even frame and an odd frame we get:

Even data = [0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30]
Odd data = [1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31]

After the first iteration of the CDF(2,2) wavelet transform we get (boundary effects at the last sample are neglected in this illustration):

Even data = cA0 = [0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30]
Odd data = cD0 = [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]

Second iteration:

Even data = cA1 = [0 4 8 12 16 20 24 28]
Odd data = cD1 = [0 0 0 0 0 0 0 0]

Third iteration:

Even data = cA2 = [0 8 16 24]
Odd data = cD2 = [0 0 0 0]

Fourth iteration:

Even data = cA3 = [0 16]
Odd data = cD3 = [0 0]

If we arrange the transformed data in the sequence cA3, cD3, cD2, cD1, cD0, then the wavelet-transformed sequence is:

Data = [0 16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]

If we encode the above sequence with run-length encoding, it can be represented with four bytes, compared to the original data size of 32 bytes. Thus, the CDF wavelet gives a more compact representation of the data than the Haar wavelet. In this example we have considered linear data, which can be compressed efficiently without applying thresholding. Efficient lossless compression of linear data is therefore possible. However, real data, such as speech samples, are not linear, so we have to apply a thresholding operator for the compression.
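The behaviour on linear data can be checked with a small sketch of the predict-only lifting loops of this section. The function names are hypothetical, the loops run through s = n so that a 16-sample frame gets the four iterations listed earlier, and the frame is assumed to be stored in an array of n + 1 integers whose last element is the zero boundary value. With that boundary assumption the interior details of a ramp are zero, while the last detail at each level is not, which is the boundary effect neglected in the example above.

```c
/* Forward predict-only lifting, in place.
   data must hold n + 1 ints; data[n] is used as the zero boundary sample. */
void cdf_predict_forward(int *data, int n)
{
    int s, k;
    data[n] = 0;
    for (s = 2; s <= n; s <<= 1)
        for (k = 0; k < n; k += s)
            data[k + (s >> 1)] -= (data[k] + data[k + s]) >> 1;
}

/* Inverse: restore the coarsest level first, then the finer ones. */
void cdf_predict_inverse(int *data, int n)
{
    int s, k;
    data[n] = 0;
    for (s = n; s >= 2; s >>= 1)
        for (k = 0; k < n; k += s)
            data[k + (s >> 1)] += (data[k] + data[k + s]) >> 1;
}
```

Because every level reads only samples that the same level leaves unchanged, the inverse recomputes exactly the same predictions and the reconstruction is exact for integer data, including non-linear input.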

The filter bank implementation of the CDF(2,2) wavelet uses the following filter coefficients:

Analysis LPF h = { -0.125, 0.25, 0.75, 0.25, -0.125 }
Analysis HPF g = { 0.25, -0.5, 0.25 }
Synthesis LPF Ih = { 0.25, 0.5, 0.25 }
Synthesis HPF Ig = { 0.125, 0.25, -0.75, 0.25, 0.125 }

C code for the filter bank implementation is given in section 4.3.4.

4.3.3 Cubic interpolation in the lifting scheme of the wavelet transform

In the CDF(2,2) wavelet transform, linear interpolation is used to predict an odd sample from the neighbouring even samples. If we use cubic interpolation, odd samples are predicted from four neighbouring samples. The weighting factors associated with each sample are derived in section 3.6. We use the weighting factors given in Table 3.2. Cubic interpolating subdivision gives good prediction for the speech signal. Boundary treatment is required for the rightmost and leftmost samples.

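Prediction in four different cases:

• Two neighbouring samples on the left and two samples on the right.

\[ \gamma_{j-1,k} = \lambda_{j,k} - \left( -0.0625\,\lambda_{j,k-3} + 0.5625\,\lambda_{j,k-1} + 0.5625\,\lambda_{j,k+1} - 0.0625\,\lambda_{j,k+3} \right) \]

• One neighbouring sample on the left and three samples on the right.

\[ \gamma_{j-1,k} = \lambda_{j,k} - \left( 0.3125\,\lambda_{j,k-1} + 0.9375\,\lambda_{j,k+1} - 0.3125\,\lambda_{j,k+3} + 0.0625\,\lambda_{j,k+5} \right) \]

• Three samples on the left and one sample on the right.

\[ \gamma_{j-1,k} = \lambda_{j,k} - \left( 0.0625\,\lambda_{j,k-5} - 0.3125\,\lambda_{j,k-3} + 0.9375\,\lambda_{j,k-1} + 0.3125\,\lambda_{j,k+1} \right) \]

• Four samples on the left and no sample on the right.

\[ \gamma_{j-1,k} = \lambda_{j,k} - \left( -0.3125\,\lambda_{j,k-7} + 1.3125\,\lambda_{j,k-5} - 2.1875\,\lambda_{j,k-3} + 2.1875\,\lambda_{j,k-1} \right) \]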

Cubic interpolation prediction can be implemented with the following C code, in which k0 to k5 hold the weighting factors used above (k0 = 0.0625, k1 = 0.5625, k2 = 0.3125, k3 = 0.9375, k4 = 2.1875, k5 = 1.3125) and n is a power of two:

void CubicTransform(unsigned char data[], int n)
{
    int s, k;
    int predict;                              /* signed: a prediction may be negative */
    for (s = 2; s <= (n >> 2); s <<= 1) {     /* each level needs four even samples */
        for (k = 0; k < n; k += s) {
            if (k == 0)                       /* 1 left and 3 right samples */
                predict = k2*data[k] + k3*data[s] - k2*data[2*s] + k0*data[3*s];
            else if (k == n - 2*s)            /* 3 left samples and 1 right sample */
                predict = k0*data[k-2*s] - k2*data[k-s] + k3*data[k] + k2*data[k+s];
            else if (k == n - s)              /* 4 left samples and 0 right samples */
                predict = -k2*data[k-3*s] + k5*data[k-2*s] - k4*data[k-s] + k4*data[k];
            else                              /* 2 left and 2 right samples */
                predict = -k0*data[k-s] + k1*data[k] + k1*data[k+s] - k0*data[k+2*s];
            data[k + s/2] -= predict;
        }
    }
}

The inverse transform is given by the following C code:

void inv_CubicTransform(unsigned char data[], int n)
{
    int s, k;
    int predict;
    for (s = (n >> 2); s >= 2; s >>= 1) {     /* coarsest level first */
        for (k = 0; k < n; k += s) {
            if (k == 0)
                predict = k2*data[k] + k3*data[s] - k2*data[2*s] + k0*data[3*s];
            else if (k == n - 2*s)
                predict = k0*data[k-2*s] - k2*data[k-s] + k3*data[k] + k2*data[k+s];
            else if (k == n - s)
                predict = -k2*data[k-3*s] + k5*data[k-2*s] - k4*data[k-s] + k4*data[k];
            else
                predict = -k0*data[k-s] + k1*data[k] + k1*data[k+s] - k0*data[k+2*s];
            data[k + s/2] += predict;
        }
    }
}

4.3.4 Daubechies wavelet transforms

Daubechies wavelets are very popular because they are a good compromise between compact support and smoothness.

Lifting scheme implementation of the Daubechies wavelet transform:

The Daubechies wavelet transform can be implemented using the lifting scheme by factorising it into lifting steps, as discussed in section 3.5.3. Let us consider a data set of “n” speech samples $\lambda_{0,k}$, where k = 0 to n-1. The lifting steps for the Daubechies-4 wavelet transform are as under:

• Split the data set into even and odd frames.

\[ \lambda_{-1,k} \leftarrow \lambda_{0,2k} \ \text{(even samples)}, \qquad \gamma_{-1,k} \leftarrow \lambda_{0,2k+1} \ \text{(odd samples)} \]

• Update 1 stage: update even samples from the odd samples.

\[ \lambda_{-1,k} \leftarrow \lambda_{-1,k} + \sqrt{3}\,\gamma_{-1,k} \]

• Predict stage: predict odd samples from the even samples.

\[ \gamma_{-1,k} \leftarrow \gamma_{-1,k} - \frac{\sqrt{3}}{4}\,\lambda_{-1,k} - \frac{\sqrt{3}-2}{4}\,\lambda_{-1,k-1} \]

• Update 2 stage: update even samples from the predicted odd samples.

\[ \lambda_{-1,k} \leftarrow \lambda_{-1,k} - \gamma_{-1,k+1} \]

• Normalize stage:

\[ \lambda_{-1,k} \leftarrow \frac{\sqrt{3}-1}{\sqrt{2}}\,\lambda_{-1,k}, \qquad \gamma_{-1,k} \leftarrow \frac{\sqrt{3}+1}{\sqrt{2}}\,\gamma_{-1,k} \]

The above lifting steps are implemented with the following C code:

#include <math.h>

static const float sqrt3 = 1.7320508f;   /* square root of 3, shared by all stages */

void update1(int data[], const int N)
{
    int half = N >> 1;
    int j;
    for (j = 0; j < half; j++) {
        data[j] = data[j] + sqrt3 * data[half + j];   /* result is truncated to int */
    }
}

void predict_daub(int data[], const int N)
{
    int i;
    int half = N >> 1;
    /* the first odd sample wraps around to the last even sample */
    data[half] = data[half] - sqrt3 * data[0] / 4 - (sqrt3 - 2) * data[half - 1] / 4;
    for (i = 1; i < half; i++) {
        data[half + i] = data[half + i] - sqrt3 * data[i] / 4 - (sqrt3 - 2) * data[i - 1] / 4;
    }
}

void update2(int data[], const int N)
{
    int i;
    int half = N >> 1;
    for (i = 0; i < half - 1; i++) {
        data[i] = data[i] - data[half + 1 + i];
    }
    data[half - 1] = data[half - 1] - data[half];     /* wrap-around at the boundary */
}

void normalize(int data[], const int N)
{
    int i;
    int half = N >> 1;
    for (i = 0; i < half; i++) {
        data[i] = (sqrt3 - 1) * data[i] / (sqrt(2));
        data[i + half] = (sqrt3 + 1) * data[i + half] / (sqrt(2));
    }
}


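The Daubechies inverse wavelet transform using the lifting scheme is exactly the reverse process, as described below:

• Undo normalize:

\[ \lambda_{-1,k} \leftarrow \frac{\sqrt{3}+1}{\sqrt{2}}\,\lambda_{-1,k}, \qquad \gamma_{-1,k} \leftarrow \frac{\sqrt{3}-1}{\sqrt{2}}\,\gamma_{-1,k} \]

• Undo update 2:

\[ \lambda_{-1,k} \leftarrow \lambda_{-1,k} + \gamma_{-1,k+1} \]

• Undo predict:

\[ \gamma_{-1,k} \leftarrow \gamma_{-1,k} + \frac{\sqrt{3}}{4}\,\lambda_{-1,k} + \frac{\sqrt{3}-2}{4}\,\lambda_{-1,k-1} \]

• Undo update 1:

\[ \lambda_{-1,k} \leftarrow \lambda_{-1,k} - \sqrt{3}\,\gamma_{-1,k} \]

• Merge:

\[ \lambda_{0,2k} \leftarrow \lambda_{-1,k}, \qquad \lambda_{0,2k+1} \leftarrow \gamma_{-1,k} \]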
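The forward and inverse lifting steps listed in this section can be cross-checked with a floating-point sketch of one decomposition level. It is an illustration under stated assumptions rather than the thesis implementation: the even and odd frames are passed as separate `double` arrays `s` and `d` of length `half` (so split and merge are left out), the boundaries wrap around periodically, and the function names are hypothetical.

```c
static const double SQRT3 = 1.7320508075688772;
static const double SQRT2 = 1.4142135623730951;

/* One level of Daubechies-4 lifting: s holds the even samples, d the odd samples. */
void d4_lift_forward(double *s, double *d, int half)
{
    int k;
    for (k = 0; k < half; k++)                       /* update 1 */
        s[k] += SQRT3 * d[k];
    for (k = 0; k < half; k++)                       /* predict  */
        d[k] -= SQRT3 / 4.0 * s[k] + (SQRT3 - 2.0) / 4.0 * s[(k + half - 1) % half];
    for (k = 0; k < half; k++)                       /* update 2 */
        s[k] -= d[(k + 1) % half];
    for (k = 0; k < half; k++) {                     /* normalize */
        s[k] *= (SQRT3 - 1.0) / SQRT2;
        d[k] *= (SQRT3 + 1.0) / SQRT2;
    }
}

/* The inverse undoes the four stages in the opposite order. */
void d4_lift_inverse(double *s, double *d, int half)
{
    int k;
    for (k = 0; k < half; k++) {                     /* undo normalize */
        s[k] *= (SQRT3 + 1.0) / SQRT2;
        d[k] *= (SQRT3 - 1.0) / SQRT2;
    }
    for (k = 0; k < half; k++)                       /* undo update 2 */
        s[k] += d[(k + 1) % half];
    for (k = 0; k < half; k++)                       /* undo predict  */
        d[k] += SQRT3 / 4.0 * s[k] + (SQRT3 - 2.0) / 4.0 * s[(k + half - 1) % half];
    for (k = 0; k < half; k++)                       /* undo update 1 */
        s[k] -= SQRT3 * d[k];
}
```

Each stage is completed over the whole frame before the next one starts, because the predict stage needs the updated neighbour at k-1 and the second update needs the new detail at k+1. For a constant input the details become zero and the approximations are scaled by the square root of two.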

The Daubechies inverse wavelet transform can be implemented with the following C code (sqrt3 is the constant holding the square root of 3):

void undo_update1(int data[], int N)
{
    int half = N >> 1;
    int j;
    for (j = 0; j < half; j++) {
        data[j] = data[j] - sqrt3 * data[half + j];
    }
}

void undo_predict_daub(int data[], const int N)
{
    int i;
    int half = N >> 1;
    data[half] = data[half] + sqrt3 * data[0] / 4 + (sqrt3 - 2) * data[half - 1] / 4;
    for (i = 1; i < half; i++) {
        data[half + i] = data[half + i] + sqrt3 * data[i] / 4 + (sqrt3 - 2) * data[i - 1] / 4;
    }
}

void undo_update2(int data[], int N)
{
    int i;
    int half = N >> 1;
    for (i = 0; i < half - 1; i++) {
        data[i] = data[i] + data[half + 1 + i];
    }
    data[half - 1] = data[half - 1] + data[half];
}

void undo_normalize(int data[], int N)
{
    int i;
    int half = N >> 1;
    for (i = 0; i < half; i++) {
        data[i] = (sqrt3 + 1) * data[i] / (sqrt(2));
        data[i + half] = (sqrt3 - 1) * data[i + half] / (sqrt(2));
    }
}

Filter bank implementation of Daubechies wavelet transforms:

In the filter bank implementation, speech samples f(k) are applied at the input of the Daubechies wavelet filter bank. The input speech samples are convolved with the low-pass and high-pass analysis filters h and g. The output of each filter is down-sampled by two, yielding the transformed signals cA0 and cD0. Down-sampling is performed to keep the total number of transformed coefficients the same as the number of input samples.

The coarse information cA0 is applied again at the input of the analysis filter bank and decomposed into cA1 and cD1. This process is repeated “N” times to achieve an “N-level” decomposition.

Equations 4.1 and 4.2 describe the practical implementation of the Daubechies wavelet transform. This implementation is fast compared to the pyramidal algorithm, in which the wavelet filter coefficients are arranged in the form of a matrix and the matrix is multiplied with the input samples. In the pyramidal algorithm, half of the computation is wasted because of down-sampling.

\[ cA_j(k) = \sum_{m=0}^{N-1} h(m)\, f(m + 2k) \qquad (4.1) \]

\[ cD_j(k) = \sum_{m=0}^{N-1} g(m)\, f(m + 2k) \qquad (4.2) \]

where
cA_j(k) = coarse coefficients at level j
cD_j(k) = detail coefficients at level j
h(m) = low-pass filter coefficients
g(m) = high-pass filter coefficients
N = number of filter coefficients

C code for the Daubechies wavelet transform is given here:

for (k = 0; k < (length_data >> 1); k++) {
    cA[k] = 0.0;
    cD[k] = 0.0;
    for (m = 0; m < length_filt; m++) {
        /* the modulo wraps the input periodically at the frame boundary */
        cA[k] += h[m] * data[(m + 2*k) % length_data];
        cD[k] += g[m] * data[(m + 2*k) % length_data];
    }
}
for (int i = 0; i < (length_data >> 1); i++) {
    data[i] = cA[i];
    data[(length_data >> 1) + i] = cD[i];
}

As described in section 2.3.7, the Daubechies inverse wavelet transform gives perfect reconstruction because the synthesis filters are mirror images of the analysis filters. In the inverse wavelet transform, the coefficient sequences “cA” and “cD” are applied at the input of the synthesis filter bank after up-sampling.

C code for the Daubechies inverse wavelet transform (half_length is length_data/2):

for (m = 0; m < (length_data >> 1); m++) {
    cA[m] = data[m];
    cD[m] = data[length_data/2 + m];
}
/* even output samples use the even-indexed filter taps */
for (k = 0; k < length_data; k += 2) {
    buff[k] = 0.0;
    data[k] = 0.0;
    for (m = 0; m < length_filt; m += 2) {
        buff[k] += Ih[m] * cA[((k - m)/2 + half_length) % half_length];
        data[k] += Ig[m] * cD[((k - m)/2 + half_length) % half_length];
    }
}
/* odd output samples use the odd-indexed filter taps */
for (k = 1; k < length_data; k += 2) {
    buff[k] = 0.0;
    data[k] = 0.0;
    for (m = 1; m < length_filt; m += 2) {
        buff[k] += Ih[m] * cA[((k - m)/2 + half_length) % half_length];
        data[k] += Ig[m] * cD[((k - m)/2 + half_length) % half_length];
    }
}
for (k = 0; k < length_data; k++)
    data[k] += buff[k];

The list of Daubechies filter coefficients used in this thesis is given in Appendix D.
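The analysis step of equations 4.1 and 4.2 and a matching synthesis step can be sketched for the Daub-4 filter as follows. This is a sketch under assumptions, not the thesis code: the function names are hypothetical, the four coefficients are the standard orthonormal Daub-4 values written out numerically (Appendix D is not reproduced here), the high-pass filter is taken as g(m) = (-1)^m h(3-m), the frame is extended periodically, and the synthesis is written as the transpose of the analysis rather than with the Ih/Ig indexing of the listing above.

```c
/* Standard orthonormal Daub-4 low-pass filter and its quadrature mirror. */
static const double D4_H[4] = { 0.48296291314453416,  0.83651630373780794,
                                0.22414386804201339, -0.12940952255126037 };
static const double D4_G[4] = { -0.12940952255126037, -0.22414386804201339,
                                 0.83651630373780794, -0.48296291314453416 };

/* Analysis: cA(k) = sum_m h(m) x(m + 2k), cD(k) = sum_m g(m) x(m + 2k). */
void d4_analysis(const double *x, int n, double *cA, double *cD)
{
    int k, m;
    for (k = 0; k < n / 2; k++) {
        cA[k] = 0.0;
        cD[k] = 0.0;
        for (m = 0; m < 4; m++) {
            cA[k] += D4_H[m] * x[(m + 2 * k) % n];
            cD[k] += D4_G[m] * x[(m + 2 * k) % n];
        }
    }
}

/* Synthesis: each coefficient is spread back over the samples it was computed from. */
void d4_synthesis(const double *cA, const double *cD, int n, double *x)
{
    int i, k, m;
    for (i = 0; i < n; i++)
        x[i] = 0.0;
    for (k = 0; k < n / 2; k++)
        for (m = 0; m < 4; m++)
            x[(m + 2 * k) % n] += D4_H[m] * cA[k] + D4_G[m] * cD[k];
}
```

Only n/2 output positions are evaluated per filter, so no work is spent on samples that down-sampling would discard. Because the periodic Daub-4 analysis is orthogonal, the transpose reconstructs the input up to floating-point rounding.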

4.3.5 Encoding

After applying the wavelet transform to the speech signal we get a number of zeros and small wavelet coefficients. If we replace the small wavelet coefficients with zeros by thresholding, the number of zeros increases. Storing the wavelet coefficients directly after thresholding gives no compression, because the zeros also require memory space. The advantage of applying the wavelet transform for speech compression becomes evident only if we apply encoding that exploits the runs of zeros present in the sequence. If we apply encoding directly to the speech samples, we do not get much compression. The wavelet transform gives sparse, organised data with sequences of zeros, so encoding after the wavelet transform gives good compression. Some encoding method is therefore necessary to take advantage of the wavelet transform for speech compression.

The encoder allocates fewer bits to the zeros. To encode the wavelet-transformed speech, we have used different methods: run-length encoding, Huffman coding and LZW coding. Run-length encoding is very attractive from the point of view of real-time speech compression. Huffman coding [72] is a popular lossless coding technique that compresses a signal by mapping the signal symbols to variable-length binary codes. LZW coding [32] is a dictionary-based encoding system that gives a good compression ratio. All the results presented in this study are based on LZW encoding after the wavelet transform.
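The benefit of encoding runs of zeros can be illustrated with a very small run-length coder. The format below is a hypothetical one chosen for the example, not the coder used in the thesis: non-zero coefficients are copied unchanged, and each run of zeros is replaced by the pair (0, run length). The output buffer is assumed to hold up to 2n entries, which covers the worst case of isolated zeros.

```c
/* Encode n coefficients; returns the number of ints written to out. */
int rle_encode(const int *in, int n, int *out)
{
    int i = 0, len = 0;
    while (i < n) {
        if (in[i] != 0) {
            out[len++] = in[i++];            /* non-zero value copied as is */
        } else {
            int run = 0;
            while (i < n && in[i] == 0) {    /* count the run of zeros */
                run++;
                i++;
            }
            out[len++] = 0;                  /* marker */
            out[len++] = run;                /* run length */
        }
    }
    return len;
}

/* Decode len encoded ints; returns the number of coefficients restored. */
int rle_decode(const int *in, int len, int *out)
{
    int i = 0, n = 0;
    while (i < len) {
        if (in[i] != 0) {
            out[n++] = in[i++];
        } else {
            int run = in[i + 1];
            while (run-- > 0)
                out[n++] = 0;
            i += 2;
        }
    }
    return n;
}
```

Applied to the thresholded Haar sequence of section 4.3.1, the sixteen trailing zeros collapse into a single pair, so 32 coefficients are stored as 18 values.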


4.4 Software development in C and Visual C++

The main programs perform the following tasks:

• Read a speech file.
• Write a speech file.
• Read and write bit files.
• Apply the forward wavelet transform using different wavelets.
• Encode wavelet coefficients.
• Decode wavelet coefficients.
• Perform the inverse wavelet transform using different wavelets to reconstruct the speech file.
• Statistically compare two speech files.
• Record and play speech files (in Visual C++).

4.4.1 Functions used in the C program

The functions used to perform the above tasks in this software are explained here:

Name: split
Declaration: void split(int[], int)
Description: This function splits an input frame of 256 bytes into odd and even frames.

The given array can be split with the following code:

for (i = 0; i < n; i++) {        /* n is the number of even/odd pairs */
    even_data[i] = data[2*i];
    odd_data[i] = data[2*i + 1];
}

The problem with this code is that it requires separate arrays for the storage of the even and odd data. A better method is to split the data array in place so that no duplicate array is required. For example, if there are N data samples in total, the split function should arrange the first N/2 entries as the even samples and the remaining N/2 entries as the odd samples.

Let us consider a sample array data[8]:

Initial:        0 1 2 3 4 5 6 7
1st iteration:  0 2 1 4 3 6 5 7
2nd iteration:  0 2 4 1 6 3 5 7
3rd iteration:  0 2 4 6 1 3 5 7

This can be done with the help of the following C code:

void split(int data[], const int N)
{
    int start = 1;
    int end = N - 1;
    while (start < end) {
        int i;
        int tmp;
        /* swap neighbouring pairs inside the shrinking window */
        for (i = start; i < end; i += 2) {
            tmp = data[i];
            data[i] = data[i + 1];
            data[i + 1] = tmp;
        }
        start = start + 1;
        end = end - 1;
    }
}

Name: predict_cdf
Declaration: void predict_cdf(int data[], const int N)
Description: This function predicts the odd samples from the set of even samples. The predicted value is then subtracted from the original odd sample value in order to remove redundancy. This function is applied to the data array after the split() function.

void predict_cdf(int data[], const int N)
{
    int i;
    int half = N >> 1;
    int j = 0;
    for (i = half; i < N - 1; i++) {
        data[i] = data[i] - ((data[j] + data[j + 1]) >> 1);
        j++;
    }
}

Name: update_cdf
Declaration: void update_cdf(int data[], const int N)
Description: This function updates (lifts) the even samples (coarse signal) from the detail samples (original value minus predicted value) in order to maintain the overall average value of the signal. If the prediction is correct and all details are equal to zero, this function changes nothing. If the prediction is wrong, the even samples are modified to maintain the average of the signal.

void update_cdf(int data[], const int N)
{
    int half = N >> 1;
    int j = half;
    for (int i = 1; i < half; i++) {
        data[i] = data[i] + ((data[j] + data[j + 1]) >> 2);
        j++;
    }
}

Name: undo_update_cdf
Declaration: void undo_update_cdf(int[], const int)
Description: Undo update is the reverse of update and is used for the inverse wavelet transform. The even samples are recovered by subtracting the update information; the sign of the mathematical operation in update is reversed to obtain undo_update.

void undo_update_cdf(int data[], const int N)
{
    int half = N >> 1;
    int j = half;
    int i;
    for (i = 1; i < half; i++) {
        data[i] = data[i] - ((data[j] + data[j + 1]) >> 2);
        j++;
    }
}

Name: undo_predict_cdf
Declaration: void undo_predict_cdf(int [], const int)
Description: Undo predict is the reverse of predict and is used for the inverse wavelet transform. This function adds the loss of information to the predicted value. Odd samples are recovered from the even samples and the detail values (loss of information). The sign of the mathematical operation in the predict function is reversed to obtain undo_predict.

void undo_predict_cdf(int data[], const int N)
{
    int i;
    int half = N >> 1;
    int j = 0;
    for (i = half; i < N - 1; i++) {
        data[i] = data[i] + ((data[j] + data[j + 1]) >> 1);
        j++;
    }
}

Name: merge
Declaration: void merge(int [], const int)
Description: This function combines the odd and even samples to obtain the reconstructed original data. It is the reverse of the split function. The function takes the data array and the length of the data array as arguments. The first half of the data array holds the even samples and the second half holds the odd samples. Both are merged using the following code.

void merge(int data[], const int N)
{
    int half = N >> 1;
    int start = half - 1;
    int end = half;
    int tmp, i;
    while (start > 0) {
        for (i = start; i < end; i += 2) {
            tmp = data[i];
            data[i] = data[i + 1];
            data[i + 1] = tmp;
        }
        start = start - 1;
        end = end + 1;
    }
}

Let us consider a sample array data[8], with N = 8 and half = 4:

Initial:           0 2 4 6 1 3 5 7
First iteration:   0 2 4 1 6 3 5 7   (start = 3, end = 4)
Second iteration:  0 2 1 4 3 6 5 7   (start = 2, end = 5)
Third iteration:   0 1 2 3 4 5 6 7   (start = 1, end = 6)

Name: CdfTransform()
Declaration: void CdfTransform(int [],int)
Description: This function performs the wavelet transform on the input data samples. It calls the three basic functions of the wavelet transform using the lifting scheme: split, predict and update. The level of transformation is N=log2n (where n is the length of the data). It arranges the wavelet coefficients in the sequence cAN, cDN, cDN-1, cDN-2, ..., cD0. In the first iteration, it decomposes the input data sequence into a coarser sequence cA0 and a detail sequence cD0. In the second iteration, it further decomposes cA0 into cA1 and cD1, and so on up to cAN and cDN. The function is written so that it does not require separate arrays to store these sequences: in-place calculation allows all the values to be kept in a single array.

void CdfTransform(int data[], int n)
{
    int i;

    for (i = n; i > 1; i = i >> 1) {
        split(data, i);
        predict(data, i);
        update(data, i);
    }
}

Name: inv_CdfTransform
Declaration: void inv_CdfTransform(int [], int)
Description: This function performs the inverse wavelet transform to reconstruct the speech data from the wavelet coefficients. It calls the three basic functions of the inverse wavelet transform using the lifting scheme: undo update, undo predict and merge.

void inv_CdfTransform(int data[], int n)
{
    int i;

    for (i = 2; i <= n; i = i << 1) {
        undo_update(data, i);
        undo_predict(data, i);
        merge(data, i);
    }
}
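The forward and inverse loops can be exercised together in a short sketch. The undo steps below follow the listings of undo_update_cdf and undo_predict_cdf; the forward predict and update steps are assumed to be their mirror images with the sign reversed, and the split step is an assumed in-place form. All lift_ names are introduced here for illustration only.

```c
#include <assert.h>

static void lift_swap_pass(int d[], int start, int end)
{
    int i, t;
    for (i = start; i < end; i += 2) { t = d[i]; d[i] = d[i + 1]; d[i + 1] = t; }
}

/* Assumed split: even samples to the first half, odd samples to the second. */
static void lift_split(int d[], int n)
{
    int start = 1, end = n - 1;
    while (start < end) { lift_swap_pass(d, start, end); start++; end--; }
}

/* Merge as in the listing above: the same passes in reverse order. */
static void lift_merge(int d[], int n)
{
    int start = (n >> 1) - 1, end = n >> 1;
    while (start > 0) { lift_swap_pass(d, start, end); start--; end++; }
}

/* sign = -1: predict (forward); sign = +1: undo predict (inverse).
   Right shift of a negative int is implementation-defined, but both
   directions evaluate the same expression, so the round trip is exact. */
static void lift_predict_step(int d[], int n, int sign)
{
    int half = n >> 1, i, j = 0;
    for (i = half; i < n - 1; i++, j++)
        d[i] += sign * ((d[j] + d[j + 1]) >> 1);
}

/* sign = +1: update (forward); sign = -1: undo update (inverse). */
static void lift_update_step(int d[], int n, int sign)
{
    int half = n >> 1, i, j = half;
    for (i = 1; i < half; i++, j++)
        d[i] += sign * ((d[j] + d[j + 1]) >> 2);
}

static void lift_forward(int d[], int n)
{
    int i;
    assert(n >= 2 && (n & (n - 1)) == 0);   /* length must be a power of two */
    for (i = n; i > 1; i >>= 1) {
        lift_split(d, i);
        lift_predict_step(d, i, -1);
        lift_update_step(d, i, +1);
    }
}

static void lift_inverse(int d[], int n)
{
    int i;
    assert(n >= 2 && (n & (n - 1)) == 0);
    for (i = 2; i <= n; i <<= 1) {
        lift_update_step(d, i, -1);
        lift_predict_step(d, i, +1);
        lift_merge(d, i);
    }
}
```

Because every lifting step only adds to one half a quantity computed from the other half, applying lift_inverse after lift_forward returns the original integer samples exactly, which is the property the lossless part of the scheme relies on.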


Name: max
Declaration: int max(unsigned char[], int)
Description: This function finds the maximum value of the wavelet coefficients in a block (a block may consist of 256, 512 or more samples, depending on the block length). It is useful for level-dependent thresholding.

int max(unsigned char data[], int n)
{
    int i;
    int maxvalue = 0;

    for (i = 0; i < n; i++) {
        if (maxvalue < data[i])
            maxvalue = data[i];
    }
    return maxvalue;
}

Name: open_obfile
Declaration: BFILE *open_obfile( char * )
Description: This function opens an output bit file, that is, a file to which individual bits can be written. Sample code is given below:

BFILE *open_obfile(char *name)
{
    BFILE *b1;

    b1 = (BFILE *) calloc(1, sizeof(BFILE));
    if (b1 == NULL)
        return (b1);
    b1->file = fopen(name, "wb");
    if (b1->file == NULL) {     /* callers test the result for NULL */
        free(b1);
        return (NULL);
    }
    b1->row = 0;
    b1->mask = 0x80;
    return (b1);
}

The structure BFILE consists of the member variables file (a file pointer), mask (an unsigned char) and row (an integer). The variables mask and row are used to manage the bit read and write operations.

Name: open_ibfile
Declaration: BFILE *open_ibfile(char * )
Description: This function opens an input bit file, from which individual bits can be read. The routine is the same as open_obfile except that the file is opened in read binary mode: b1->file = fopen(name, "rb"); is used instead of b1->file = fopen(name, "wb");


Name: close_obfile
Declaration: void close_obfile( BFILE * )
Description: This function closes the output bit file.

void close_obfile(BFILE *b1)
{
    if (b1->mask != 0x80)           /* write out a partly filled row */
        putc(b1->row, b1->file);
    fclose(b1->file);
    free((char *) b1);
}

Name: close_ibfile
Declaration: void close_ibfile( BFILE * )
Description: This function closes the input bit file.

void close_ibfile(BFILE *b1)
{
    fclose(b1->file);
    free((char *) b1);
}

Name: write_bit
Declaration: void write_bit( BFILE *bit_file, int bit )
Description: The normal file functions available in C write one byte at a time and cannot write a single bit. This function writes an individual bit to the file. The variables mask and row are used to read and write bits. The variable mask is initialised to 0x80 and is shifted right after every bit read or write operation. After eight shifts the value of mask becomes zero and the row (one byte) is complete. This row is written to, or read from, the file using standard C functions. Sample code for write_bit is given below:

void write_bit(BFILE *b1, int bit)
{
    if (bit)
        b1->row = b1->row | b1->mask;
    b1->mask >>= 1;
    if (b1->mask == 0) {
        putc(b1->row, b1->file);
        b1->row = 0;
        b1->mask = 0x80;
    }
}


Name: write_bits
Declaration: void write_bits( BFILE *bit_file,unsigned long code, int count )
Description: This function writes more than one bit at a time to the specified bit file. The number of bits is passed as an argument to the function. The local variable mask is used to keep track of the bits.

void write_bits(BFILE *b1, unsigned long code, int bit_count)
{
    unsigned long mask;

    mask = 1;
    mask = mask << (bit_count - 1);
    while (mask != 0) {
        if (mask & code)
            b1->row = b1->row | b1->mask;
        b1->mask >>= 1;
        if (b1->mask == 0) {
            putc(b1->row, b1->file);
            b1->row = 0;
            b1->mask = 0x80;
        }
        mask = mask >> 1;
    }
}

Name: read_bit
Declaration: int read_bit( BFILE* bit_file )
Description: The normal file functions in C read one byte at a time and cannot read a single bit. This function reads an individual bit from the file.

int read_bit(BFILE *b1)
{
    int value;

    if (b1->mask == 0x80)
        b1->row = getc(b1->file);
    value = b1->row & b1->mask;
    b1->mask >>= 1;
    if (b1->mask == 0)
        b1->mask = 0x80;
    return (value ? 1 : 0);
}

Name: read_bits
Declaration: unsigned long read_bits( BFILE *bit_file, int bit_count )
Description: This function reads more than one bit at a time from the specified bit file.

unsigned long read_bits(BFILE *b1, int bit_count)
{
    unsigned long mask = 1;
    unsigned long value = 0;

    mask = mask << (bit_count - 1);
    while (mask != 0) {
        if (b1->mask == 0x80)
            b1->row = getc(b1->file);
        if (b1->row & b1->mask)
            value = value | mask;
        mask >>= 1;
        b1->mask = b1->mask >> 1;
        if (b1->mask == 0)
            b1->mask = 0x80;
    }
    return (value);
}
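As a rough check of the mask and row mechanism, the following sketch writes a few bit fields to a temporary file and reads them back. The type BitFile and the bf_ functions are stand-ins introduced here; they assume the same three members (file, mask, row) described for BFILE and are not the thesis code itself.

```c
#include <assert.h>
#include <stdio.h>

typedef struct {
    FILE *file;             /* underlying byte stream */
    unsigned char mask;     /* bit position inside the current row */
    int row;                /* byte being assembled or consumed */
} BitFile;

static void bf_init(BitFile *b, FILE *fp)
{
    b->file = fp;
    b->mask = 0x80;
    b->row = 0;
}

/* Write the lowest `count` bits of `code`, most significant bit first. */
static void bf_write_bits(BitFile *b, unsigned long code, int count)
{
    unsigned long m;

    assert(count >= 1 && count <= 32);
    for (m = 1UL << (count - 1); m != 0; m >>= 1) {
        if (code & m)
            b->row |= b->mask;
        b->mask >>= 1;
        if (b->mask == 0) {             /* row complete: emit one byte */
            putc(b->row, b->file);
            b->row = 0;
            b->mask = 0x80;
        }
    }
}

/* Emit a partly filled row, padded with zero bits (as close_obfile does). */
static void bf_flush(BitFile *b)
{
    if (b->mask != 0x80) {
        putc(b->row, b->file);
        b->row = 0;
        b->mask = 0x80;
    }
}

/* Read `count` bits, most significant bit first. */
static unsigned long bf_read_bits(BitFile *b, int count)
{
    unsigned long m, value = 0;

    assert(count >= 1 && count <= 32);
    for (m = 1UL << (count - 1); m != 0; m >>= 1) {
        if (b->mask == 0x80)            /* start of a row: fetch next byte */
            b->row = getc(b->file);
        if (b->row & b->mask)
            value |= m;
        b->mask >>= 1;
        if (b->mask == 0)
            b->mask = 0x80;
    }
    return value;
}
```

Writing a 3-bit field followed by a 10-bit field produces 13 bits, so two bytes reach the file after the flush; reading 3 and then 10 bits from the start of the file returns the two original values.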

Name: count_bytes
Declaration: void count_bytes( FILE *, unsigned long *);
Description: To build the Huffman tree, it is necessary to find the relative frequency of each wavelet coefficient value. This function counts the occurrences of each byte value in the specified file. The position of the file pointer is saved before counting and restored afterwards.

void count_bytes(FILE *input, unsigned long *counts)
{
    long position;
    int c;

    position = ftell(input);
    while ((c = getc(input)) != EOF)
        counts[c]++;
    fseek(input, position, SEEK_SET);
}

Name: scale_counts
Declaration: void scale_counts( unsigned long *, NODE *);
Description: This function finds the maximum count of any symbol in the file. All the count values are then scaled so that each of them fits in a single unsigned char.

void scale_counts(unsigned long *counts, NODE *nodes)
{
    unsigned long max_count;
    int i;

    max_count = 0;
    for (i = 0; i < 256; i++)
        if (counts[i] > max_count)
            max_count = counts[i];
    if (max_count == 0) {
        counts[0] = 1;
        max_count = 1;
    }
    max_count = max_count / 255;
    max_count = max_count + 1;
    for (i = 0; i < 256; i++) {
        nodes[i].count = (unsigned int) (counts[i] / max_count);
        if (nodes[i].count == 0 && counts[i] != 0)
            nodes[i].count = 1;
    }
    nodes[END_OF_STREAM].count = 1;
} //End of scale_counts()

NODE is a structure consisting of the integer variables child_0 and child_1 and the associated count.

Name: build_tree
Declaration: int build_tree( NODE* );
Description: This function builds the Huffman tree by repeatedly combining the two free nodes of lowest weight into a new internal node whose weight is the combined weight of the two nodes. This is done in a single loop, which exits when only one free node is left.

int build_tree(NODE *nodes)
{
    int next_free;
    int i;
    int min_1;
    int min_2;

    nodes[513].count = 0xffff;
    for (next_free = END_OF_STREAM + 1; ; next_free++) {
        min_1 = 513;
        min_2 = 513;
        for (i = 0; i < next_free; i++)
            if (nodes[i].count != 0) {
                if (nodes[i].count < nodes[min_1].count) {
                    min_2 = min_1;
                    min_1 = i;
                } else if (nodes[i].count < nodes[min_2].count)
                    min_2 = i;
            }
        if (min_2 == 513)
            break;
        nodes[next_free].count = nodes[min_1].count + nodes[min_2].count;
        nodes[min_1].count = 0;
        nodes[min_2].count = 0;
        nodes[next_free].child_0 = min_1;
        nodes[next_free].child_1 = min_2;
    }
    next_free--;
    return (next_free);
}

All nodes below 257 have their count set to the frequency of the corresponding symbol. A node with a non-zero count is an active node. Node 513 is used only for comparison. Its count is initially set to 65535, the maximum value. Since scale_counts limits every symbol count to at most 255, even the weight of the root (at most 256 x 255 + 1 = 65281) stays below this value.

Name: convert_tree_to_code
Declaration: void convert_tree_to_code( NODE*,CODE*,unsigned int,int,int)
Description: This function assigns a code to each symbol value. The root node of the tree is assigned zero, and a one or a zero is appended to the code each time we travel down a branch of the tree. Whenever we reach the leaf of a symbol, we store the code value and its length for that leaf in the code array and return to the previous node, from where we continue down the other side of the tree.

void convert_tree_to_code(NODE *nodes, CODE *codes,
                          unsigned int code_so_far, int bits, int node)
{
    if (node <= END_OF_STREAM) {
        codes[node].code = code_so_far;
        codes[node].code_bits = bits;
        return;
    }
    code_so_far <<= 1;
    bits++;
    convert_tree_to_code(nodes, codes, code_so_far, bits,
                         nodes[node].child_0);
    convert_tree_to_code(nodes, codes, code_so_far | 1, bits,
                         nodes[node].child_1);
}
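The tree building and code assignment steps can be tried on a small alphabet. The sketch below reuses the two-minimum scan and the recursive descent of the listings above, but with a hypothetical node type DemoNode, a sentinel placed at index 2*nsym-1 instead of 513, and an illustrative limit MAX_SYM; none of these names come from the original program.

```c
#include <assert.h>
#include <limits.h>

#define MAX_SYM 16      /* illustrative limit for this sketch */

typedef struct {
    unsigned int count;
    int child_0;
    int child_1;
} DemoNode;

/* Combine the two lightest active nodes until one is left; return the root. */
static int demo_build_tree(DemoNode nodes[], int nsym)
{
    int sentinel = 2 * nsym - 1;    /* plays the role of node 513 */
    int next_free, i, min_1, min_2;

    nodes[sentinel].count = UINT_MAX;
    for (next_free = nsym; ; next_free++) {
        min_1 = min_2 = sentinel;
        for (i = 0; i < next_free; i++) {
            if (nodes[i].count == 0)
                continue;           /* inactive: already merged */
            if (nodes[i].count < nodes[min_1].count) {
                min_2 = min_1;
                min_1 = i;
            } else if (nodes[i].count < nodes[min_2].count) {
                min_2 = i;
            }
        }
        if (min_2 == sentinel)
            break;                  /* only the root is still active */
        nodes[next_free].count = nodes[min_1].count + nodes[min_2].count;
        nodes[min_1].count = 0;
        nodes[min_2].count = 0;
        nodes[next_free].child_0 = min_1;
        nodes[next_free].child_1 = min_2;
    }
    return next_free - 1;
}

/* Walk the tree; append 0 for child_0 and 1 for child_1. */
static void demo_assign(const DemoNode nodes[], int nsym, int node,
                        unsigned int code, int bits,
                        unsigned int codes[], int lens[])
{
    if (node < nsym) {              /* leaf: record code and length */
        codes[node] = code;
        lens[node] = bits;
        return;
    }
    demo_assign(nodes, nsym, nodes[node].child_0, code << 1, bits + 1,
                codes, lens);
    demo_assign(nodes, nsym, nodes[node].child_1, (code << 1) | 1, bits + 1,
                codes, lens);
}

static void demo_huffman(const unsigned int counts[], int nsym,
                         unsigned int codes[], int lens[])
{
    DemoNode nodes[2 * MAX_SYM];
    int i;

    assert(nsym >= 2 && nsym <= MAX_SYM);
    for (i = 0; i < nsym; i++) {
        assert(counts[i] > 0);      /* sketch assumes every symbol occurs */
        nodes[i].count = counts[i];
        nodes[i].child_0 = nodes[i].child_1 = -1;
    }
    demo_assign(nodes, nsym, demo_build_tree(nodes, nsym), 0, 0, codes, lens);
}
```

For the counts 5, 2, 1, 1 the sketch assigns code lengths of 1, 2, 3 and 3 bits, so the nine symbols cost 15 bits instead of the 18 bits of a fixed 2-bit code. The most frequent symbol, which in wavelet-transformed speech is typically zero, receives the shortest code.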

Name: output_counts
Declaration: void output_counts( BFILE*, NODE*);
Description: This function stores the symbol counts in the compressed file so that the decoder can read them. Counts are not stored for all the sample values; only the sample values that are present are stored. A count run consists of the value of the first symbol in the run, followed by the value of the last symbol in the run, followed by the count values for all the symbols from first to last. A zero byte terminates the list of runs.

void output_counts(BFILE *output, NODE *nodes)
{
    int first;
    int last;
    int next;
    int i;

    first = 0;
    while (first < 255 && nodes[first].count == 0)
        first++;
    for ( ; first < 256; first = next) {
        last = first + 1;
        for ( ; ; ) {
            for ( ; last < 256; last++)
                if (nodes[last].count == 0)
                    break;
            last--;
            for (next = last + 1; next < 256; next++)
                if (nodes[next].count != 0)
                    break;
            if (next > 255)
                break;
            if ((next - last) > 3)
                break;
            last = next;
        }
        putc(first, output->file);
        putc(last, output->file);
        for (i = first; i <= last; i++) {
            putc(nodes[i].count, output->file);
        }
    }
    if (putc(0, output->file) != 0)
        fatal_error("Error writing byte counts\n");
}

Name: input_counts
Declaration: void input_counts( BFILE*, NODE*);
Description: This function is used for decoding. It reads the count values that were stored during encoding.

void input_counts(BFILE *input, NODE *nodes)
{
    int first;
    int last;
    int i;
    int c;

    for (i = 0; i < 256; i++)
        nodes[i].count = 0;
    first = getc(input->file);
    last = getc(input->file);
    for ( ; ; ) {
        for (i = first; i <= last; i++) {
            /* both statements belong to the loop body */
            c = getc(input->file);
            nodes[i].count = (unsigned int) c;
        }
        first = getc(input->file);
        if (first == 0)
            break;
        last = getc(input->file);
    }
    nodes[END_OF_STREAM].count = 1;
}

Name: compress_data
Declaration: void compress_data( FILE*, BFILE*,CODE*);
Description: This function stores the Huffman code corresponding to each wavelet coefficient in the output file. It uses the code table derived from the Huffman tree.

void compress_data(FILE *input, BFILE *output, CODE *codes)
{
    int c;

    while ((c = getc(input)) != EOF)
        write_bits(output, (unsigned long) codes[c].code,
                   codes[c].code_bits);
    write_bits(output, (unsigned long) codes[END_OF_STREAM].code,
               codes[END_OF_STREAM].code_bits);
}

Name: expand_data
Declaration: void expand_data( BFILE*, FILE *, NODE *,int);
Description: This function reads the Huffman code and decodes the wavelet coefficients. The tree is traversed starting at the root node, reading one bit at a time and taking either the child_0 or the child_1 path. Eventually the traversal reaches a leaf node and the corresponding symbol (wavelet coefficient) is output.

void expand_data(BFILE *input, FILE *output, NODE *nodes, int root_node)
{
    int node;

    for ( ; ; ) {
        node = root_node;
        do {
            if (read_bit(input))
                node = nodes[node].child_1;
            else
                node = nodes[node].child_0;
        } while (node > END_OF_STREAM);
        if (node == END_OF_STREAM)
            break;
        putc(node, output);
    }
}

Name: print_ratios
Declaration: void print_ratios( char*, char* );
Description: This function prints the compression ratio. It takes the ratio of the original file size to the compressed file size and prints it as a percentage.

void print_ratios(char *input, char *output)
{
    long input_size;
    long output_size;
    int ratio;

    input_size = file_size(input);
    if (input_size == 0)
        input_size = 1;
    output_size = file_size(output);
    printf("\nInput bytes: %ld\n", input_size);
    printf("Output bytes: %ld\n", output_size);
    if (output_size == 0)           /* guard must come before the division */
        output_size = 1;
    ratio = (int) (input_size * 100L / output_size);
    printf("Compression ratio: %d%%\n", ratio);
}
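To make the printed figure concrete, the arithmetic can be isolated in a small helper. The function names below are introduced here for illustration. The first follows the convention of print_ratios (original size over compressed size, as a percentage); the second is the alternative "space saving" convention, shown only for comparison.

```c
#include <assert.h>

/* Original size as a percentage of the compressed size: 400 means 4:1. */
static long ratio_percent(long input_size, long output_size)
{
    assert(input_size >= 0 && output_size >= 0);
    if (output_size == 0)
        output_size = 1;            /* avoid division by zero */
    return input_size * 100L / output_size;
}

/* Percentage of the original size that was saved: 75 means 4:1. */
static long saving_percent(long input_size, long output_size)
{
    assert(input_size >= 0 && output_size >= 0);
    if (input_size == 0)
        input_size = 1;
    return 100L - output_size * 100L / input_size;
}
```

A 1000-byte file compressed to 250 bytes gives a ratio of 400% under the first convention and a saving of 75% under the second.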

Name: file_size
Declaration: long file_size( char*);
Description: This function finds the size of the specified file.

long file_size(char *name)
{
    long eof_ftell;
    FILE *file;

    file = fopen(name, "rb");       /* binary mode, so that ftell counts bytes */
    if (file == NULL)
        return (0L);
    fseek(file, 0L, SEEK_END);
    eof_ftell = ftell(file);
    fclose(file);
    return (eof_ftell);
}

Name: encode
Declaration: void encode(char*,char*);
Description: This function encodes the wavelet coefficients using the Huffman algorithm [72]. It calls other functions to calculate the probabilities, build the Huffman tree, convert the tree to codes and finally allocate the bits. It assigns fewer bits to frequently occurring symbols, for example the zeros in wavelet-transformed data.

void encode(char *filein, char *fileout)
{
    BFILE *output;
    FILE *input;
    unsigned long *counts;
    NODE *nodes;
    CODE *codes;
    int root_node;

    input = fopen(filein, "rb");
    if (input == NULL)
        fatal_error("Error in opening input file: %s\n", filein);
    output = open_obfile(fileout);
    if (output == NULL)
        fatal_error("Error opening output file %s\n", fileout);
    counts = (unsigned long *) calloc(256, sizeof(unsigned long));
    if (counts == NULL)
        fatal_error("Error allocating counts array\n");
    if ((nodes = (NODE *) calloc(514, sizeof(NODE))) == NULL)
        fatal_error("Error allocating nodes array\n");
    if ((codes = (CODE *) calloc(257, sizeof(CODE))) == NULL)
        fatal_error("Error allocating codes array\n");
    count_bytes(input, counts);
    scale_counts(counts, nodes);
    output_counts(output, nodes);
    root_node = build_tree(nodes);
    convert_tree_to_code(nodes, codes, 0, 0, root_node);
    compress_data(input, output, codes);
    free((char *) counts);
    free((char *) nodes);
    free((char *) codes);
    close_obfile(output);
    fclose(input);
    print_ratios(filein, fileout);
}

Name: decode
Declaration: void decode(char*, char*);
Description: This function decodes the encoded wavelet coefficients. It calls other functions to read the counts, build the Huffman tree from them and expand the data one bit at a time. Finally the decoded wavelet coefficients are stored in the output file.

void decode(char *filein, char *fileout)
{
    FILE *output;
    BFILE *input;
    NODE *nodes;
    int root_node;

    input = open_ibfile(filein);
    if (input == NULL)
        fatal_error("Error opening input file %s\n", filein);
    output = fopen(fileout, "wb");
    if (output == NULL)
        fatal_error("Error opening output file %s\n", fileout);
    if ((nodes = (NODE *) calloc(514, sizeof(NODE))) == NULL)
        fatal_error("Error allocating nodes array\n");
    input_counts(input, nodes);
    root_node = build_tree(nodes);
    expand_data(input, output, nodes, root_node);
    free((char *) nodes);
    close_ibfile(input);
    fclose(output);
}

Name: zero_encode
Declaration: void zero_encode(char*);
Description: This function encodes the wavelet coefficients by run-length encoding, which is the simplest type of encoding. Each sequence of zeros is replaced by its count value followed by a zero. Wavelet-transformed speech contains many zeros, so this type of encoding is effective.

void zero_encode(char filename[])
{
    FILE *f1, *f2;
    int n1;                         /* int, so that EOF differs from data */
    unsigned char count = 0;

    f1 = fopen(filename, "rb");
    f2 = fopen("noutput", "wb");
    if (f1 == NULL || f2 == NULL) {
        printf("Error in opening files");
        if (f1 != NULL) fclose(f1);
        if (f2 != NULL) fclose(f2);
        return;
    }
    while ((n1 = getc(f1)) != EOF) {
        if (n1 == 0) {
            count++;
            if (count == 255) {     /* longest run that fits in one byte */
                putc(count, f2);
                putc(0x00, f2);
                count = 0;
            }
        } else {
            if (count != 0) {
                putc(count, f2);
                putc(0x00, f2);
                count = 0;
            }
            putc(n1, f2);
        }
    }
    if (count != 0) {
        putc(count, f2);
        putc(0x00, f2);
    }
    fclose(f1);
    fclose(f2);
}

Name: zero_decode
Declaration: void zero_decode(char*);
Description: This function decodes the wavelet coefficients from the encoded file. It reads the values sequentially. When it finds a zero value, it takes the previous value as a count and writes that many zeros.

void zero_decode(char filename[])
{
    FILE *f1, *f2;
    int n1, n2 = 0;
    int count = 0;
    int i;

    f1 = fopen(filename, "rb");
    if (f1 == NULL) {
        printf("Error in opening %s file", filename);
        return;
    }
    f2 = fopen("newout", "wb");
    if (f2 == NULL) {
        printf("Can not create newout file");
        fclose(f1);
        return;
    }
    while (!feof(f1)) {
        n1 = getc(f1);
        if (n1 == 0) {              /* n2 was a run length */
            count = n2;
            for (i = 1; i <= count; i++)
                putc(0x00, f2);
        } else {
            if (n2 != 0)            /* n2 was an ordinary value */
                putc((char) n2, f2);
        }
        if (feof(f1))
            break;
        n2 = getc(f1);
        if (n2 == 0) {              /* n1 was a run length */
            count = n1;
            for (i = 1; i <= count; i++)
                putc(0x00, f2);
        } else {
            if (n1 != 0)            /* n1 was an ordinary value */
                putc((char) n1, f2);
        }
    }
    fclose(f1);
    fclose(f2);
}
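The zero run-length scheme can also be tried in memory, without files. The sketch below is an illustration under stated assumptions: the names rle_zero_encode and rle_zero_decode are introduced here, runs longer than 255 are split so that a count always fits in one byte, and the output buffer for encoding is assumed to hold up to 2*n bytes (the worst case, where every zero stands alone).

```c
#include <assert.h>
#include <stddef.h>

/* Each run of zeros becomes the pair (count, 0); other bytes are copied. */
static size_t rle_zero_encode(const unsigned char *in, size_t n,
                              unsigned char *out)
{
    size_t i, j = 0;
    unsigned int count = 0;

    assert(in != NULL && out != NULL);
    for (i = 0; i < n; i++) {
        if (in[i] == 0) {
            if (++count == 255) {           /* split very long runs */
                out[j++] = 255;
                out[j++] = 0;
                count = 0;
            }
        } else {
            if (count != 0) {
                out[j++] = (unsigned char) count;
                out[j++] = 0;
                count = 0;
            }
            out[j++] = in[i];
        }
    }
    if (count != 0) {                       /* trailing run of zeros */
        out[j++] = (unsigned char) count;
        out[j++] = 0;
    }
    return j;
}

/* A byte followed by 0 is a run length; any other byte is a literal. */
static size_t rle_zero_decode(const unsigned char *in, size_t n,
                              unsigned char *out)
{
    size_t i, j = 0;
    unsigned int k;

    assert(in != NULL && out != NULL);
    for (i = 0; i < n; i++) {
        if (i + 1 < n && in[i + 1] == 0) {
            for (k = 0; k < in[i]; k++)
                out[j++] = 0;
            i++;                            /* skip the zero marker */
        } else {
            out[j++] = in[i];
        }
    }
    return j;
}
```

The eight bytes 7 0 0 0 0 0 0 5 are encoded as the four bytes 7 6 0 5. Isolated zeros cost two bytes each, so the scheme pays off only when zeros occur in runs, as they do after thresholding.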

Name: zencode
Declaration: int zencode(int[] ,int )
Description: It performs the same job as zero_encode() but it encodes the wavelet coefficients frame by frame.

int zencode(int data[], int size)
{
    int n1, count = 0;
    int i = 0, j = 0;

    /* ndata is a work buffer declared elsewhere (not shown in this listing) */
    while (i < size) {
        n1 = data[i];
        if (n1 == 0) {
            count++;
        } else {
            if (count != 0) {
                ndata[j++] = count;
                ndata[j++] = 0x00;
                count = 0;
            }
            ndata[j++] = n1;
        }
        i++;
    }
    if (count != 0) {
        ndata[j++] = count;
        ndata[j++] = 0x00;
    }
    for (i = 0; i < j; i++)
        data[i] = ndata[i];
    return (j);
}

Name: zdecode
Declaration: int zdecode(FILE *, int [], int)
Description: It performs the same job as zero_decode but it decodes the wavelet coefficients frame by frame from the file.

int zdecode(FILE *input, int data[], int size)
{
    int n1, n2 = 0;
    int count = 0;
    int i;
    int j = 0;

    /* getw() is a non-standard function that reads one int; the fseek of -2
       steps back one value and therefore assumes a 2-byte int */
    while (1) {
        if (feof(input)) { data[j] = n2; break; }
        n1 = getw(input);
        if (n1 == 0) {
            count = n2;
            for (i = 1; i <= count; i++) {
                data[j] = 0;
                j++;
                if (j == size) return (j);
            }
        } else {
            if (n2 != 0) {
                data[j] = n2;
                j++;
                if (j == size) { fseek(input, -2, SEEK_CUR); return (j); }
            }
        }
        if (feof(input)) { data[j] = n1; break; }
        n2 = getw(input);
        if (n2 == 0) {
            count = n1;
            for (i = 1; i <= count; i++) {
                data[j] = 0;
                j++;
                if (j == size) return (j);
            }
        } else {
            if (n1 != 0) {
                data[j] = n1;
                j++;
                if (j == size) { fseek(input, -2, SEEK_CUR); return (j); }
            }
        }
    }
    return (j);
}

Name: lz_encode
Declaration: void lz_encode(char*,char*);
Description: This function encodes wavelet coefficients using Lempel and Ziv (LZ78)

encoding method [72].

void lz_encode(char *file1, char *file2)
{
    BFILE *output;
    FILE *input;
    int character;
    int string_code;
    unsigned int index;

    input = fopen(file1, "rb");
    if (input == NULL)
        fatal_error("Error opening %s for input\n", file1);
    output = open_obfile(file2);
    if (output == NULL)
        fatal_error("Error opening %s for output\n", file2);
    init_storage();
    init_diction();
    if ((string_code = getc(input)) == EOF)
        string_code = END_OF_STREAM;
    while ((character = getc(input)) != EOF) {
        index = find_child_node(string_code, character);
        if (DICT(index).code_value != -1)
            string_code = DICT(index).code_value;
        else {
            DICT(index).code_value = next_code++;
            DICT(index).parent_code = string_code;
            DICT(index).character = (char) character;
            write_bits(output, (unsigned long) string_code,
                       current_code_bits);
            string_code = character;
            if (next_code > MAX_CODE) {
                write_bits(output, (unsigned long) FLUSH_CODE,
                           current_code_bits);
                init_diction();
            } else if (next_code > next_bump_code) {
                write_bits(output, (unsigned long) BUMP_CODE,
                           current_code_bits);
                current_code_bits++;
                next_bump_code <<= 1;
                next_bump_code |= 1;
            }
        }
    }
    /* Flush the pending code and mark the end of the stream. These two calls
       do not appear in the extracted listing; they are assumed here because
       lz_decode stops only when it reads END_OF_STREAM. */
    write_bits(output, (unsigned long) string_code, current_code_bits);
    write_bits(output, (unsigned long) END_OF_STREAM, current_code_bits);
    close_obfile(output);
    fclose(input);
}

Name: lz_decode
Declaration: void lz_decode( char *, char *)
Description: This function decodes wavelet coefficients using the Lempel and Ziv (LZ78) decoding method [72].

void lz_decode(char *file1, char *file2)
{
    FILE *output;
    BFILE *input;
    unsigned int new_code;
    unsigned int old_code;
    int character;
    unsigned int count;

    input = open_ibfile(file1);
    if (input == NULL)
        fatal_error("Error opening %s for input\n", file1);
    output = fopen(file2, "wb");
    if (output == NULL)
        fatal_error("Error opening %s for output\n", file2);
    init_storage();
    for ( ; ; ) {
        init_diction();
        old_code = (unsigned int) read_bits(input, current_code_bits);
        if (old_code == END_OF_STREAM)
            goto done;
        character = old_code;
        putc(old_code, output);
        for ( ; ; ) {
            new_code = (unsigned int) read_bits(input, current_code_bits);
            if (new_code == END_OF_STREAM)
                goto done;
            if (new_code == FLUSH_CODE)
                break;
            if (new_code == BUMP_CODE) {
                current_code_bits++;
                continue;
            }
            if (new_code >= next_code) {
                decode_stack[0] = (char) character;
                count = decode_string(1, old_code);
            } else
                count = decode_string(0, new_code);
            character = decode_stack[count - 1];
            while (count > 0)
                putc(decode_stack[--count], output);
            DICT(next_code).parent_code = old_code;
            DICT(next_code).character = (char) character;
            next_code++;
            old_code = new_code;
        }
    }
done:
    /* reached on END_OF_STREAM, so that both files are always closed */
    close_ibfile(input);
    fclose(output);
}

Name: read_data
Declaration: void read_data(char*);
Description: This function reads the sample values from the file and displays them in hexadecimal. It is used for testing purposes.

void read_data(char filename[])
{
    int i = 0;
    int n;
    FILE *f1;

    f1 = fopen(filename, "rb");
    if (f1 == NULL)
        fatal_error("Error in reading file %s\n", filename);
    while ((n = getc(f1)) != EOF) {
        if (i % 16 == 0)
            printf("\n%02x----->", i);
        printf("%02x ", n);
        i++;
        if (i % 256 == 0) {
            if (getch() == 0x1b)    /* Esc stops the display */
                break;
        }
    }
    printf("\n End of reading\n");
    fclose(f1);
}

Name: write_data
Declaration: void write_data(char*);
Description: This function writes sample data (dummy data) to the file. It is used to write sine wave data for testing purposes.

void write_data(char filename[])
{
    int n;
    long m, p;
    const double pi = 3.14159265358979;     /* assumed local definition */
    FILE *f1;

    f1 = fopen(filename, "wb");
    if (f1 == NULL)
        fatal_error("Error opening output file %s\n", filename);
    printf("\nEnter total no. of bytes :");
    scanf("%ld", &m);
    /* p is taken to be the sample index; one period spans 100 samples */
    for (p = 0; p < m; p++) {
        n = (int) (100 * sin(2 * pi * p / 100));
        putc((char) n, f1);
    }
    fclose(f1);
}

Name: read_wave
Declaration: void read_wave(void);
Description: This function reads the content of a speech file (wav file). It reads the header information, such as the length of the file, samples per second, bytes per sample and number of channels, and then the actual sample values stored in the file.

void read_wave()
{
    int i, n;
    int type, channels, block_size, bits;
    long sample_rate, byte_per_sec, data_size;
    char filename[20];
    FILE *fp;
    char data[17];

    data[16] = '\0';
    printf("Enter name of the file :");
    scanf("%19s", filename);
    fp = fopen(filename, "rb");
    if (fp == NULL) { printf("Error!"); return; }
    i = fread(data, 1, 16, fp);
    if (i != 16) { printf("Error!"); }
    printf("READING WAVE FILE:\n");
    printf("header %c %c %c %c %ld %8s\n",
           *data, *(data + 1), *(data + 2), *(data + 3),
           *(long *) (data + 4), data + 8);
    i = fread(data, 1, 16, fp);
    /* 16-bit fields are read as short and 32-bit fields as long; this
       assumes a 2-byte short and a 4-byte long */
    type = *((short *) (data + 4));
    channels = *((short *) (data + 6));
    sample_rate = *((long *) (data + 8));
    byte_per_sec = *((long *) (data + 12));
    printf("\n type= %d\n channels= %d\n sample rate=%ld",
           type, channels, sample_rate);
    printf("\n byte per second=%ld\n", byte_per_sec);
    i = fread(data, 1, 4, fp);
    block_size = *((short *) (data));
    bits = *((short *) (data + 2));
    printf("Block length : %d \n", block_size);
    printf("No of bits per sample : %d\n", bits);
    /* the offsets 50 and 58 assume that the "data" chunk header starts at
       byte 50 of the file */
    fseek(fp, 50, 0);
    i = fread(data, 1, 8, fp);
    printf("%c%c%c%c\n", *(data), *(data + 1), *(data + 2), *(data + 3));
    data_size = *((long *) (data + 4));
    printf("size of file %s is %ld\n", filename, data_size);
    printf("Press Enter to observe content of the file :\n");
    if (getch() == 0x0d) {
        i = 0;
        fseek(fp, 58, 0);
        while ((n = getc(fp)) != EOF) {
            if (i % 16 == 0) { printf("\n%02d --->", i); }
            printf("%02x ", n);
            i++;
            if (i % 256 == 0) { getch(); }
        }
    }
    fclose(fp);
}
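The header fields can also be decoded without pointer casts, which avoids any dependence on the size of int or long. The following sketch assumes the canonical 44-byte PCM header layout ("RIFF", size, "WAVEfmt ", 16-byte format block, "data", size); files with extra chunks, such as those implied by the offsets 50 and 58 above, would need a chunk search instead. WavInfo, rd16, rd32 and parse_wav_header are names introduced here.

```c
#include <assert.h>
#include <string.h>

typedef struct {
    unsigned int format_tag;        /* 1 = PCM */
    unsigned int channels;
    unsigned long sample_rate;      /* samples per second */
    unsigned long byte_rate;        /* bytes per second */
    unsigned int block_align;       /* bytes per sample frame */
    unsigned int bits_per_sample;
    unsigned long data_size;        /* bytes of sample data */
} WavInfo;

/* Little-endian readers: WAV stores the low byte first. */
static unsigned int rd16(const unsigned char *p)
{
    return (unsigned int) p[0] | ((unsigned int) p[1] << 8);
}

static unsigned long rd32(const unsigned char *p)
{
    return (unsigned long) p[0] | ((unsigned long) p[1] << 8) |
           ((unsigned long) p[2] << 16) | ((unsigned long) p[3] << 24);
}

/* Returns 1 if h holds a canonical 44-byte PCM header, 0 otherwise. */
static int parse_wav_header(const unsigned char h[44], WavInfo *w)
{
    assert(h != NULL && w != NULL);
    if (memcmp(h, "RIFF", 4) != 0 || memcmp(h + 8, "WAVEfmt ", 8) != 0)
        return 0;
    if (memcmp(h + 36, "data", 4) != 0)
        return 0;
    w->format_tag      = rd16(h + 20);
    w->channels        = rd16(h + 22);
    w->sample_rate     = rd32(h + 24);
    w->byte_rate       = rd32(h + 28);
    w->block_align     = rd16(h + 32);
    w->bits_per_sample = rd16(h + 34);
    w->data_size       = rd32(h + 40);
    return 1;
}
```

For 8-bit mono speech sampled at 8000 Hz the header carries a sample rate of 8000, a byte rate of 8000 and a block alignment of 1.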


Name: compare
Declaration: void compare(char*);
Description: This function performs a statistical comparison between the original speech file and the reconstructed speech file. It is used as an objective measure of speech quality. It finds the mean absolute deviation, the root mean square error (RMSE) and the signal-to-noise ratio (SNR).

void compare(char filename1[])
{
    int r, s, diff;
    long n = 0;
    long different = 0;
    long deviation = 0;
    long sum_sqrs = 0;
    long error = 0;
    double mse, rmse, mean_dev;
    double SNR;
    char filename2[25];
    FILE *f1, *f2;

    /* check() is a helper defined elsewhere (not shown in this listing) */
    f1 = fopen(filename1, "rb");
    check(filename1);
    printf("\n\nComparison with file:");
    scanf("%24s", filename2);
    f2 = fopen(filename2, "rb");
    check(filename2);
    while ((s = getc(f1)) != EOF && (r = getc(f2)) != EOF) {
        diff = s - r;               /* signal minus reconstructed signal */
        if (diff != 0)
            different++;
        sum_sqrs = sum_sqrs + (long) s * s;
        error = error + (long) diff * diff;
        deviation = deviation + abs(diff);
        n++;
    }
    fclose(f1);
    fclose(f2);
    if (n == 0) {
        printf("\n No samples to compare");
        return;
    }
    mse = (double) error / n;
    rmse = sqrt(mse);
    if (error != 0)
        SNR = 10 * log10((double) sum_sqrs / error);
    else
        SNR = 100;
    mean_dev = (double) deviation / n;
    printf("\n %ld samples are different", different);
    printf("\n Total mean absolute deviation is = %f", mean_dev);
    printf("\n Total Mean square error MSE = %f", mse);
    printf("\n Total RMSE %f", rmse);
    printf("\n SNR : %f", SNR);
} //End of comparison
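The three measures can be written compactly as follows. The notation is introduced here for illustration: s_i is the i-th sample of the original signal, r_i the corresponding reconstructed sample and n the number of sample pairs compared.

```latex
\begin{align}
\mathrm{MAD}  &= \frac{1}{n}\sum_{i=1}^{n} \lvert s_i - r_i \rvert \\
\mathrm{MSE}  &= \frac{1}{n}\sum_{i=1}^{n} (s_i - r_i)^2, \qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}} \\
\mathrm{SNR}  &= 10\,\log_{10}\!\left(
  \frac{\sum_{i=1}^{n} s_i^{2}}{\sum_{i=1}^{n} (s_i - r_i)^{2}}\right)\ \mathrm{dB}
\end{align}
```

As a small worked case, for s = (10, 20, 30, 40) and r = (10, 22, 30, 38) the differences are 0, -2, 0 and 2, so MAD = 4/4 = 1, MSE = 8/4 = 2, RMSE is about 1.41, and SNR = 10 log10(3000/8), which is about 25.7 dB.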


4.4.2 Classes and their members in the Visual C++ program
All the functions explained in section 4.4.1 are also used in the Visual C++ program as member functions of the class CSpeechDlg. The classes used in the Visual C++ software and their member functions are explained here.

Name: CSpeechDlg

Member functions:

The CSpeechDlg class encapsulates everything required for speech compression using the wavelet transform. It contains a wide variety of functions that perform the wavelet transform with Haar, Daubechies and Cohen-Daubechies-Feauveau biorthogonal wavelets, using the filter bank method and the lifting scheme. The member functions for the wavelet transform are listed here. The source code for these functions is discussed in sections 4.3 and 4.4.1. The same source code is used in the Visual C++ software.

void CSpeechDlg::split(BYTE[], const int)
void CSpeechDlg::predict(BYTE[], const int)
void CSpeechDlg::update(BYTE[], const int)
void CSpeechDlg::CdfTransform(BYTE[], const int)
void CSpeechDlg::undo_update(BYTE[], const int)
void CSpeechDlg::undo_predict(BYTE[], const int)
void CSpeechDlg::merge(BYTE[], const int)
void CSpeechDlg::inv_CdfTransform(BYTE[], const int)
void CSpeechDlg::HaarTransform(BYTE[], const int)
void CSpeechDlg::predict_Haar(BYTE[], const int)
void CSpeechDlg::update_Haar(BYTE[], const int)
void CSpeechDlg::undo_update_Haar(BYTE[], const int)
void CSpeechDlg::undo_predict_Haar(BYTE[], int n)
void CSpeechDlg::inv_HaarTransform(BYTE[], const int)
void CSpeechDlg::transform_daub4(int [], int n)
void CSpeechDlg::inv_transform_daub4(int [], int n)
void CSpeechDlg::normalize(int[], const int)
void CSpeechDlg::update_daub2(int[], const int)
void CSpeechDlg::predict_daub(int[], const int)
void CSpeechDlg::update_daub1(int[], const int)
void CSpeechDlg::undo_normalize(int[], const int)
void CSpeechDlg::undo_update_daub2(int[], const int)
void CSpeechDlg::undo_predict_daub(int[], const int)
void CSpeechDlg::undo_update_daub1(int[], const int)
void CSpeechDlg::CubicTransform(int[], const int)
void CSpeechDlg::inv_CubicTransform(int[], const int)
void CSpeechDlg::compare(char*, char*)


CSpeechDlg class consists of other member functions that are invoked by

clicking the command buttons.

void CSpeechDlg::OnPlay()
void CSpeechDlg::OnRecord()
void CSpeechDlg::OnFileFwt()
void CSpeechDlg::OnFileIwt()
void CSpeechDlg::OnCompress()
void CSpeechDlg::OnExpand()
void CSpeechDlg::OnSelectFile()
void CSpeechDlg::OnAddNoise()
void CSpeechDlg::OnDenoise()
void CSpeechDlg::OnSelendokComboWavelet()
void CSpeechDlg::OnSelendokComboBlockLength()
void CSpeechDlg::OnRadioLifting()
void CSpeechDlg::OnRadioFilter()
void CSpeechDlg::OnClear()

Name: CWave
The CWave class provides a simple encapsulation for the recording and playback of waveform audio. It is responsible for getting samples from a speech file or from an input device, for saving samples to a file and for providing a buffer for the temporary storage of samples. It consists of the following member functions.

Member functions:
SaveSamples: This function saves the samples coming from the recording device to a standard wave file. The name of the file is passed as an argument. Header information such as file size, channels, sample rate and bytes per second is stored first. The samples are then obtained from the buffer and stored in the file.

void CWave::SaveSamples(CString strFile)
{
    ASSERT(m_buffer.GetNumSamples() > 0);
    CFile *f;
    CFile myfile(strFile, CFile::modeCreate | CFile::modeWrite);
    f = &myfile;
    f->Write("RIFF", 4);
    DWORD dwFileSize =
        m_buffer.GetNumSamples() * m_pcmWaveFormat.nBlockAlign + 36;
    f->Write(&dwFileSize, sizeof(dwFileSize));
    f->Write("WAVEfmt ", 8);
    DWORD dwFmtSize = 16L;
    f->Write(&dwFmtSize, sizeof(dwFmtSize));
    f->Write(&m_pcmWaveFormat.wFormatTag,
             sizeof(m_pcmWaveFormat.wFormatTag));
    f->Write(&m_pcmWaveFormat.nChannels,
             sizeof(m_pcmWaveFormat.nChannels));
    f->Write(&m_pcmWaveFormat.nSamplesPerSec,
             sizeof(m_pcmWaveFormat.nSamplesPerSec));
    f->Write(&m_pcmWaveFormat.nAvgBytesPerSec,
             sizeof(m_pcmWaveFormat.nAvgBytesPerSec));
    f->Write(&m_pcmWaveFormat.nBlockAlign,
             sizeof(m_pcmWaveFormat.nBlockAlign));
    f->Write(&m_pcmWaveFormat.wBitsPerSample,
             sizeof(m_pcmWaveFormat.wBitsPerSample));
    f->Write("data", 4);
    DWORD dwNum = m_buffer.GetNumSamples() * m_pcmWaveFormat.nBlockAlign;
    f->Write(&dwNum, sizeof(dwNum));
    f->Write(m_buffer.GetBuffer(), dwNum);
    f->Close();     // f points to myfile, so a single Close() is enough
}

GetSamples: This function reads samples from the specified file. It reads the header information as well as the actual sample values.

void CWave::GetSamples(CString strFile)
{
    char szTmp[10];
    CFile *f;
    CFile myfile(strFile, CFile::modeRead);
    f = &myfile;
    WAVEFORMATEX pcmWaveFormat;
    ZeroMemory(szTmp, 10 * sizeof(char));
    f->Read(szTmp, 4 * sizeof(char));
    if (strncmp(szTmp, _T("RIFF"), 4) != 0)
        ::AfxThrowFileException(CFileException::invalidFile, -1,
                                f->GetFileName());
    DWORD dwFileSize = 0;           // overwritten by the value read below
    f->Read(&dwFileSize, sizeof(dwFileSize));
    ZeroMemory(szTmp, 10 * sizeof(char));
    f->Read(szTmp, 8 * sizeof(char));
    if (strncmp(szTmp, _T("WAVEfmt "), 8) != 0)
        ::AfxThrowFileException(CFileException::invalidFile, -1,
                                f->GetFileName());
    DWORD dwFmtSize;
    f->Read(&dwFmtSize, sizeof(dwFmtSize));
    f->Read(&pcmWaveFormat.wFormatTag, sizeof(pcmWaveFormat.wFormatTag));
    f->Read(&pcmWaveFormat.nChannels, sizeof(pcmWaveFormat.nChannels));
    f->Read(&pcmWaveFormat.nSamplesPerSec,
            sizeof(pcmWaveFormat.nSamplesPerSec));
    f->Read(&pcmWaveFormat.nAvgBytesPerSec,
            sizeof(pcmWaveFormat.nAvgBytesPerSec));
    f->Read(&pcmWaveFormat.nBlockAlign, sizeof(pcmWaveFormat.nBlockAlign));
    f->Read(&pcmWaveFormat.wBitsPerSample,
            sizeof(pcmWaveFormat.wBitsPerSample));
    ZeroMemory(szTmp, 10 * sizeof(char));
    f->Read(szTmp, 4 * sizeof(char));
    m_pcmWaveFormat = pcmWaveFormat;
    DWORD dwNum = dwFileSize - 36;
    DWORD dwNum1;
    f->Read(&dwNum1, sizeof(dwNum1));
    m_buffer.SetNumSamples(dwNum / pcmWaveFormat.nBlockAlign,
                           pcmWaveFormat.nBlockAlign);
    f->Read(m_buffer.GetBuffer(), dwNum);
    f->Close();     // f points to myfile, so a single Close() is enough
}

SetBuffer: This function is used to copy data buffer to internal wave data buffer.

if flag bCopy is set, then it calls member function CopyBuffer of class

CWaveBuffer. . If bCopy is reset, then it calls member function SetBuffer of class

CWaveBuffer. void CWave::SetBuffer(void* pBuffer,DWORD dwNumSample,bool bCopy) { ASSERT(pBuffer); ASSERT(dwNumSample > 0); ASSERT(m_pcmWaveFormat.nBlockAlign > 0); if (bCopy) { m_buffer.CopyBuffer(pBuffer, dwNumSample, m_pcmWaveFormat.nBlockAlign); } else { m_buffer.SetBuffer(pBuffer, dwNumSample, m_pcmWaveFormat.nBlockAlign); } }
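The header layout that the write routine and GetSamples step through field by field can also be exercised without MFC. The following is a minimal portable sketch, not the thesis code: the names WavFormat, buildHeader and parseHeader are hypothetical, and it assumes a PCM file with a 16-byte format chunk and no extra chunks before "data", i.e. a 44-byte header. Unlike GetSamples, it also checks the "data" tag and takes the data length from the data chunk itself.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the PCM fields of WAVEFORMATEX.
struct WavFormat {
    uint16_t channels;
    uint32_t sampleRate;
    uint16_t bitsPerSample;
    // Bytes per sample frame: channels * bits / 8.
    uint16_t blockAlign() const {
        return static_cast<uint16_t>(channels * bitsPerSample / 8);
    }
    // Bytes per second: sample rate * bytes per frame.
    uint32_t avgBytesPerSec() const { return sampleRate * blockAlign(); }
};

// Append the low `bytes` bytes of v in little-endian order.
static void putLE(std::vector<uint8_t>& out, uint32_t v, int bytes) {
    for (int i = 0; i < bytes; ++i)
        out.push_back(static_cast<uint8_t>((v >> (8 * i)) & 0xFF));
}

// Read a little-endian value of `bytes` bytes starting at p.
static uint32_t getLE(const uint8_t* p, int bytes) {
    uint32_t v = 0;
    for (int i = 0; i < bytes; ++i)
        v |= static_cast<uint32_t>(p[i]) << (8 * i);
    return v;
}

// Append a four-character chunk tag.
static void putTag(std::vector<uint8_t>& out, const char* tag) {
    out.insert(out.end(), tag, tag + 4);
}

// Build the 44-byte header for dataBytes bytes of PCM samples.
std::vector<uint8_t> buildHeader(const WavFormat& f, uint32_t dataBytes) {
    std::vector<uint8_t> h;
    putTag(h, "RIFF"); putLE(h, dataBytes + 36, 4);  // RIFF size = data + 36
    putTag(h, "WAVE");
    putTag(h, "fmt "); putLE(h, 16, 4);              // format chunk size
    putLE(h, 1, 2);                                  // format tag 1 = PCM
    putLE(h, f.channels, 2);
    putLE(h, f.sampleRate, 4);
    putLE(h, f.avgBytesPerSec(), 4);
    putLE(h, f.blockAlign(), 2);
    putLE(h, f.bitsPerSample, 2);
    putTag(h, "data"); putLE(h, dataBytes, 4);
    return h;
}

// Parse a header produced by buildHeader; returns false on any tag mismatch.
bool parseHeader(const std::vector<uint8_t>& h, WavFormat& f, uint32_t& dataBytes) {
    if (h.size() < 44) return false;
    if (std::memcmp(&h[0], "RIFF", 4) != 0) return false;
    if (std::memcmp(&h[8], "WAVEfmt ", 8) != 0) return false;
    if (std::memcmp(&h[36], "data", 4) != 0) return false;
    f.channels      = static_cast<uint16_t>(getLE(&h[22], 2));
    f.sampleRate    = getLE(&h[24], 4);
    f.bitsPerSample = static_cast<uint16_t>(getLE(&h[34], 2));
    dataBytes       = getLE(&h[40], 4);
    return true;
}
```

Writing the fields byte by byte keeps the sketch independent of structure padding and host byte order, which the field-by-field Read and Write calls in the listings rely on implicitly.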


The following functions are used to get the speech buffer, the number of samples and the length of the buffer from the CWaveBuffer class.

void* CWave::GetBuffer() const
{
    return m_buffer.GetBuffer();
}

DWORD CWave::GetNumSamples() const
{
    return m_buffer.GetNumSamples();
}

DWORD CWave::GetBufferLength() const
{
    // Length in bytes: number of samples times bytes per sample frame
    return (m_buffer.GetNumSamples() * m_pcmWaveFormat.nBlockAlign);
}

The following function is used to build the format of the “wav” file in which recorded speech is saved. This information is saved as the header information.

void CWave::BuildFormat(WORD nChannels, DWORD nFrequency, WORD nBits)
{
    m_pcmWaveFormat.wFormatTag = WAVE_FORMAT_PCM;
    m_pcmWaveFormat.nChannels = nChannels;
    m_pcmWaveFormat.nSamplesPerSec = nFrequency;
    m_pcmWaveFormat.nAvgBytesPerSec = nFrequency * nChannels * nBits / 8;
    m_pcmWaveFormat.nBlockAlign = nChannels * nBits / 8;
    m_pcmWaveFormat.wBitsPerSample = nBits;
    m_buffer.SetNumSamples(0L, m_pcmWaveFormat.nBlockAlign);
}

Name: CWaveBuffer

The CWaveBuffer class handles the intermediate buffer. It copies a buffer, sets the buffer, sets the number of samples in the buffer and returns the number of samples in the existing buffer. The following member functions perform these tasks.
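The difference between copying a buffer and setting (adopting) a buffer is mainly a question of who owns the memory afterwards. The sketch below illustrates the two cases with a hypothetical SampleBuffer class; it is not the thesis class, and it replaces the raw new[] and delete[] calls of the MFC code with std::unique_ptr so that ownership is explicit. The method names copyFrom and adopt are assumptions made for the illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <utility>

// Hypothetical stand-in for CWaveBuffer.
class SampleBuffer {
public:
    SampleBuffer() : num_(0), size_(0) {}

    // Allocate zero-filled storage for numSamples samples of sampleSize bytes.
    void setNumSamples(std::size_t numSamples, std::size_t sampleSize) {
        assert(sampleSize > 0);
        data_.reset(new unsigned char[numSamples * sampleSize]());
        num_ = numSamples;
        size_ = sampleSize;
    }

    // Copy semantics: the caller keeps its own block and may change it later.
    // At most the current capacity of this buffer is copied.
    void copyFrom(const void* src, std::size_t numSamples, std::size_t sampleSize) {
        if (!data_) setNumSamples(numSamples, sampleSize);
        const std::size_t capacity = num_ * size_;
        const std::size_t bytes = std::min(capacity, numSamples * sampleSize);
        if (bytes > 0) {
            std::memset(data_.get(), 0, capacity);
            std::memcpy(data_.get(), src, bytes);
        }
    }

    // Adopt semantics: ownership of the block moves into the buffer and the
    // previously held block is released.
    void adopt(std::unique_ptr<unsigned char[]> block,
               std::size_t numSamples, std::size_t sampleSize) {
        assert(sampleSize > 0);
        data_ = std::move(block);
        num_ = numSamples;
        size_ = sampleSize;
    }

    const unsigned char* data() const { return data_.get(); }
    std::size_t numSamples() const { return num_; }
    std::size_t sampleSize() const { return size_; }

private:
    std::unique_ptr<unsigned char[]> data_;
    std::size_t num_;
    std::size_t size_;
};
```

Copying costs one extra pass over the data but leaves the caller free to reuse its block; adopting avoids the copy, which is the cheaper choice when the caller has just assembled the block for this purpose.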

Member functions:

void CWaveBuffer::CopyBuffer(void* pBuffer, DWORD dwNumSamples, int nSize)
{
    ASSERT(dwNumSamples >= 0);
    ASSERT(nSize);
    if (!m_pBuffer)
        SetNumSamples(dwNumSamples, nSize);
    if (__min(m_dwNum, dwNumSamples) * nSize > 0) {
        ZeroMemory(m_pBuffer, m_dwNum * m_nSampleSize);
        CopyMemory(m_pBuffer, pBuffer, __min(m_dwNum, dwNumSamples) * nSize);
    }
}

void CWaveBuffer::SetBuffer(void *pBuffer, DWORD dwNumSamples, int nSize)
{
    ASSERT(dwNumSamples >= 0);
    ASSERT(nSize);
    // The buffer is allocated as char[] in SetNumSamples, so it is released
    // as char[]; applying delete[] to a void* is undefined behaviour.
    delete[] static_cast<char*>(m_pBuffer);
    m_pBuffer = pBuffer;
    m_dwNum = dwNumSamples;
    m_nSampleSize = nSize;
}

int CWaveBuffer::GetSampleSize() const
{
    return m_nSampleSize;
}

DWORD CWaveBuffer::GetNumSamples() const
{
    return m_dwNum;
}

void CWaveBuffer::SetNumSamples(DWORD dwNumSamples, int nSize)
{
    ASSERT(dwNumSamples >= 0);
    ASSERT(nSize > 0);
    void* pBuffer = NULL;
    pBuffer = new char[nSize * dwNumSamples];
    SetBuffer(pBuffer, dwNumSamples, nSize);
}

Name: CWaveDevice

The CWaveDevice class is used to manage the audio input/output device. It returns the device number and determines whether a given wave can be played or recorded on that device.

Member functions: GetDevice() returns the MS Windows device number. IsInputFormat() checks whether the wave can be recorded. IsOutputFormat() checks whether the wave can be played.


bool CWaveDevice::IsInputFormat(const CWave& wave)
{
    return (waveInOpen(NULL, GetDevice(), &wave.GetFormat(), NULL, NULL,
                       WAVE_FORMAT_QUERY) == MMSYSERR_NOERROR);
}

bool CWaveDevice::IsOutputFormat(const CWave& wave)
{
    return (waveOutOpen(NULL, GetDevice(), &wave.GetFormat(), NULL, NULL,
                        WAVE_FORMAT_QUERY) == MMSYSERR_NOERROR);
}

CWaveDevice::CWaveDevice(const CWaveDevice &copy)
{
    m_nDevice = copy.GetDevice();
}

inline UINT CWaveDevice::GetDevice() const
{
    return m_nDevice;
}

Name: CWaveIn

The CWaveIn class encapsulates the functionality of the input device driver. It handles recording: it sets the wave format, sets the device used for recording and adds new buffers with the help of its member functions.

Member functions:

Record() records speech from the sound card (input device). Pause() pauses the recording process. Continue() resumes the recording process. Open() opens the input device for recording. SetDevice() sets the device used for recording. MakeWave() creates a new wave buffer that can be used to store the speech samples in a file.
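During recording the samples arrive in several separate buffers, so MakeWave() has to join them into one contiguous block before the wave can be saved. The following portable sketch shows that joining step only; it uses std::list and std::vector in place of the MFC list and raw memory, and the names Chunk, totalBytes and makeContiguous are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// One recorded buffer, held as raw bytes (hypothetical stand-in for CWaveBuffer).
typedef std::vector<unsigned char> Chunk;

// Total number of bytes held in all recorded chunks.
std::size_t totalBytes(const std::list<Chunk>& chunks) {
    std::size_t total = 0;
    for (const Chunk& c : chunks)
        total += c.size();
    return total;
}

// Join the chunks, in recording order, into one contiguous block.
std::vector<unsigned char> makeContiguous(const std::list<Chunk>& chunks) {
    std::vector<unsigned char> out;
    out.reserve(totalBytes(chunks));   // single allocation for the whole wave
    for (const Chunk& c : chunks)
        out.insert(out.end(), c.begin(), c.end());
    assert(out.size() == totalBytes(chunks));
    return out;
}
```

Computing the total size first mirrors the two passes of the MFC code: one pass to size the destination and one pass to copy each chunk at its running offset.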


bool CWaveIn::Record(UINT nTaille)
{
    ASSERT(nTaille > 0);
    ASSERT(m_hWaveIn);
    if ( !Stop() ) {
        return false;
    }
    m_bResetRequired = false;
    FreeListOfBuffer();
    FreeListOfHeader();
    SetWaveFormat( m_wave.GetFormat() );
    m_nIndexWaveHdr = NUMWAVEINHDR - 1;
    m_nBufferSize = nTaille;
    for (int i = 0; i < NUMWAVEINHDR; i++) {
        if ( !AddNewHeader(m_hWaveIn) )
            return false;
    }
    if ( IsError(waveInStart(m_hWaveIn)) )
        return false;
    return true;
}

DWORD CWaveIn::GetNumSamples()
{
    // Sum the sample counts of all recorded buffers
    DWORD dwTotal = 0L;
    POSITION pos = m_listOfBuffer.GetHeadPosition();
    while (pos) {
        CWaveBuffer* p_waveBuffer = (CWaveBuffer*) m_listOfBuffer.GetNext(pos);
        dwTotal += p_waveBuffer->GetNumSamples();
    }
    return dwTotal;
}

CWave CWaveIn::MakeWave()
{
    // Join all recorded buffers into one contiguous block
    void* pBuffer = new char[GetNumSamples() * m_wave.GetFormat().nBlockAlign];
    DWORD dwPosInBuffer = 0L;
    POSITION pos = m_listOfBuffer.GetHeadPosition();
    while (pos) {
        CWaveBuffer* p_waveBuffer = (CWaveBuffer*) m_listOfBuffer.GetNext(pos);
        CopyMemory((char*)pBuffer + dwPosInBuffer, p_waveBuffer->GetBuffer(),
                   p_waveBuffer->GetNumSamples() * p_waveBuffer->GetSampleSize());
        dwPosInBuffer += p_waveBuffer->GetNumSamples() * p_waveBuffer->GetSampleSize();
    }
    m_wave.SetBuffer( pBuffer, GetNumSamples() );
    return m_wave;
}

bool CWaveIn::Continue()
{
    if (m_hWaveIn) {
        return !IsError( waveInStart(m_hWaveIn) );
    }
    return true;
}

bool CWaveIn::Open()
{
    return !IsError( waveInOpen(&m_hWaveIn, m_waveDevice.GetDevice(),
                                &m_wave.GetFormat(), (DWORD)waveInProc, NULL,
                                CALLBACK_FUNCTION) );
}

bool CWaveIn::Pause()
{
    if (m_hWaveIn) {
        return !IsError( waveInStop(m_hWaveIn) );
    }
    return true;
}

Name: CWaveInterface

The CWaveInterface class provides the interface to the input and output devices used for playing and recording speech files. Its member functions return the names of the input and output devices and the number of input and output devices.

Member functions:

GetWaveInDevice() returns the input device name. GetWaveOutDevice() returns the output device name. GetWaveInCount() returns the number of input devices detected. GetWaveOutCount() returns the number of output devices detected.

CString CWaveInterface::GetWaveInName(UINT nIndex)
{
    ASSERT(nIndex < GetWaveInCount());
    WAVEINCAPS tagCaps;
    switch (waveInGetDevCaps(nIndex, &tagCaps, sizeof(tagCaps))) {
    case MMSYSERR_NOERROR:
        return tagCaps.szPname;
        break;
    default:
        return "";
    }
}

UINT CWaveInterface::GetWaveOutCount()
{
    return waveOutGetNumDevs();
}

UINT CWaveInterface::GetWaveInCount()
{
    return waveInGetNumDevs();
}

Name: CWaveOut

The CWaveOut class encapsulates the functionality of the output device driver. It handles playback: it sets the wave format, sets the device used for playing and adds new buffers by using its member functions.

Member functions:

bool CWaveOut::Play(DWORD dwStart, DWORD dwEnd)
{
    if ( !Stop() )
        return false;
    // dwStart and dwEnd are sample indices; -1 selects the start or the end
    // of the wave. Positions are stored in bytes.
    m_dwStartPos = (dwStart == -1) ? 0L : dwStart * m_wave.GetFormat().nBlockAlign;
    // The end index is clamped to the number of samples (not to the length in
    // bytes) before it is converted to bytes.
    m_dwEndPos = (dwEnd == -1) ? m_wave.GetBufferLength()
               : __min(m_wave.GetNumSamples(), dwEnd) * m_wave.GetFormat().nBlockAlign;
    m_nIndexWaveHdr = NUMWAVEOUTHDR - 1;
    for (int i = 0; i < NUMWAVEOUTHDR; i++) {
        if ( !AddNewHeader(m_hWaveOut) )
            return false;
    }
    return true;
}

bool CWaveOut::Stop()
{
    if (m_hWaveOut != NULL) {
        m_dwStartPos = m_dwEndPos;
        if ( IsError(waveOutReset(m_hWaveOut)) ) {
            return false;
        }
    }
    return true;
}

bool CWaveOut::AddFullHeader(HWAVEOUT hwo, int nLoop)
{
    if ( GetBufferLength() == 0 ) {
        return false;
    }
    // Advance to the next header slot, wrapping around at the end
    m_nIndexWaveHdr = (m_nIndexWaveHdr == NUMWAVEOUTHDR - 1) ? 0 : m_nIndexWaveHdr + 1;
    m_tagWaveHdr[m_nIndexWaveHdr].lpData = (char*)m_wave.GetBuffer() + m_dwStartPos;
    m_tagWaveHdr[m_nIndexWaveHdr].dwBufferLength = m_dwEndPos - m_dwStartPos;
    m_tagWaveHdr[m_nIndexWaveHdr].dwFlags = WHDR_BEGINLOOP | WHDR_ENDLOOP;
    m_tagWaveHdr[m_nIndexWaveHdr].dwLoops = nLoop;
    m_tagWaveHdr[m_nIndexWaveHdr].dwUser = (DWORD)(void*)this;
    if ( IsError(waveOutPrepareHeader(hwo, &m_tagWaveHdr[m_nIndexWaveHdr], sizeof(WAVEHDR))) ) {
        return false;
    }
    if ( IsError(waveOutWrite(hwo, &m_tagWaveHdr[m_nIndexWaveHdr], sizeof(WAVEHDR))) ) {
        // Undo the preparation and release the slot
        waveOutUnprepareHeader(hwo, &m_tagWaveHdr[m_nIndexWaveHdr], sizeof(WAVEHDR));
        m_tagWaveHdr[m_nIndexWaveHdr].lpData = NULL;
        m_tagWaveHdr[m_nIndexWaveHdr].dwBufferLength = 0;
        m_tagWaveHdr[m_nIndexWaveHdr].dwFlags = 0;
        m_tagWaveHdr[m_nIndexWaveHdr].dwUser = NULL;
        m_nIndexWaveHdr--;
        return false;
    }
    m_dwStartPos = m_dwEndPos - m_dwStartPos;
    return true;
}

bool CWaveOut::AddNewHeader(HWAVEOUT hwo)
{
    if ( GetBufferLength() == 0 )
        return false;
    // Advance to the next header slot, wrapping around at the end
    m_nIndexWaveHdr = (m_nIndexWaveHdr == NUMWAVEOUTHDR - 1) ? 0 : m_nIndexWaveHdr + 1;
    m_tagWaveHdr[m_nIndexWaveHdr].lpData = (char*)m_wave.GetBuffer() + m_dwStartPos;
    m_tagWaveHdr[m_nIndexWaveHdr].dwBufferLength = GetBufferLength();
    m_tagWaveHdr[m_nIndexWaveHdr].dwFlags = 0;
    m_tagWaveHdr[m_nIndexWaveHdr].dwUser = (DWORD)(void*)this;
    if ( IsError(waveOutPrepareHeader(hwo, &m_tagWaveHdr[m_nIndexWaveHdr], sizeof(WAVEHDR))) ) {
        return false;
    }
    if ( IsError(waveOutWrite(hwo, &m_tagWaveHdr[m_nIndexWaveHdr], sizeof(WAVEHDR))) ) {
        // Undo the preparation and release the slot
        waveOutUnprepareHeader(hwo, &m_tagWaveHdr[m_nIndexWaveHdr], sizeof(WAVEHDR));
        m_tagWaveHdr[m_nIndexWaveHdr].lpData = NULL;
        m_tagWaveHdr[m_nIndexWaveHdr].dwBufferLength = 0;
        m_tagWaveHdr[m_nIndexWaveHdr].dwFlags = 0;
        m_tagWaveHdr[m_nIndexWaveHdr].dwUser = NULL;
        m_nIndexWaveHdr--;
        return false;
    }
    m_dwStartPos += GetBufferLength();
    return true;
}

DWORD CWaveOut::GetBufferLength()
{
    // Bytes to queue next: one output buffer or whatever remains, whichever is smaller
    return __min(m_dwWaveOutBufferLength, m_dwEndPos - m_dwStartPos);
}

DWORD CWaveOut::GetPosition()
{
    if (m_hWaveOut) {
        MMTIME mmt;
        mmt.wType = TIME_SAMPLES;
        if ( IsError(waveOutGetPosition(m_hWaveOut, &mmt, sizeof(MMTIME))) ) {
            return -1;
        }
        else
            return mmt.u.sample;
    }
    return -1;
}

bool CWaveOut::IsPlaying()
{
    bool bResult = false;
    if (m_nIndexWaveHdr > -1 && m_tagWaveHdr[m_nIndexWaveHdr].dwFlags != 0) {
        // Parentheses are required because == binds more tightly than &
        bResult |= !((m_tagWaveHdr[m_nIndexWaveHdr].dwFlags & WHDR_DONE) == WHDR_DONE);
    }
    return bResult;
}

bool CWaveOut::ResetRequired(CWaveOut* pWaveOut)
{
    return (pWaveOut->m_dwStartPos >= pWaveOut->m_dwEndPos);
}
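Two small pieces of arithmetic carry most of the playback logic: converting a requested sample range into a byte range, and stepping through the fixed set of wave headers as a ring. The sketch below isolates both in portable C++. The names kWholeRange, ByteRange, toByteRange and nextHeaderIndex are hypothetical, and the sketch assumes that both indices are clamped to the number of samples before they are converted to bytes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Plays the role of -1 stored in a DWORD: "from the start" or "to the end".
const uint32_t kWholeRange = 0xFFFFFFFFu;

// Start and end of the region to play, in bytes.
struct ByteRange {
    uint32_t start;
    uint32_t end;
};

// Convert a sample range to a byte range for a wave of numSamples frames,
// each blockAlign bytes long.
ByteRange toByteRange(uint32_t startSample, uint32_t endSample,
                      uint32_t numSamples, uint32_t blockAlign) {
    assert(blockAlign > 0);
    uint32_t s = (startSample == kWholeRange) ? 0 : std::min(startSample, numSamples);
    uint32_t e = (endSample == kWholeRange) ? numSamples : std::min(endSample, numSamples);
    if (s > e) s = e;                       // empty range instead of a negative length
    ByteRange r = { s * blockAlign, e * blockAlign };
    return r;
}

// Next slot in a ring of headerCount wave headers.
int nextHeaderIndex(int current, int headerCount) {
    return (current == headerCount - 1) ? 0 : current + 1;
}
```

Clamping in sample units and multiplying afterwards keeps the two quantities from being mixed, which matters because the buffer length reported by CWave is expressed in bytes while the arguments of Play() are sample indices.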

Name: CWaveformDlg

The CWaveformDlg class displays the waveforms. Its member function OnPaint() is responsible for drawing the speech waveforms.


Member function:

void CWaveformDlg::OnPaint()
{
    int Gain, Offset;
    CPaintDC dc(this);   // device context for painting
    CBrush mybrush(RGB(255,255,0));
    CRect rect(0,0,1000,500);
    dc.Rectangle(rect);
    dc.FillRect(rect, &mybrush);
    CPen mypen(PS_SOLID, 1, RGB(0,0,255));
    int m_nMapMode = MM_TWIPS;
    dc.SetMapMode(m_nMapMode);
    CSpeechDlg *pWnd = (CSpeechDlg*)GetParent();
    if (pWnd) {
        if (pWnd->cmd == 1) {
            // Short buffers use a coarser mapping mode and a smaller gain
            if (pWnd->BufferLength < 10000) {
                m_nMapMode = MM_LOMETRIC;
                dc.SetMapMode(m_nMapMode);
                Gain = 50 - pWnd->m_Gain/2;
                Offset = -pWnd->m_Offset*2;
            }
            else {
                Gain = (100 - pWnd->m_Gain)*2;
                Offset = -pWnd->m_Offset*20;
            }
            // One line segment per sample: x is the sample index,
            // y is the scaled and shifted sample value
            for (int i = 0; i < pWnd->BufferLength; i++) {
                if (i == 0)
                    dc.MoveTo(0,0);
                dc.LineTo(i, Offset - Gain*(pWnd->sample[i])/5);
            }
        }
    }
    else
        MessageBox("No Valid parent pointer");
}

4.4.3 Test Set up

The test setup for speech compression using the wavelet transform is prepared in C and Visual C++.

In the C programs, command line arguments are used to specify the file name and the type of operation.


For example, daub <operation> <filename> accepts two arguments. The first argument specifies the type of operation, i.e. forward wavelet transform, inverse wavelet transform, reading a file, writing a file etc. The second argument specifies the name of the file on which the operation is performed. If the user does not specify any argument, the program displays the following usage message:

    Speech Compression using Wavelet Transform
    Daubechies wavelet using filter bank method
    Usage: daub <operation> <filename>
    Example: daub fwt data : It performs forward wavelet transform on "data" file
    Other Operations:
    iwt : Inverse wavelet transform
    read: Read data file
    write: Write data file
    compare : compare two data files
    plotdiff: Plot the difference between two data files

The following commands are used for the different wavelets:

Lifting scheme of wavelet transform:

    Wavelet             Command
    Haar                lifthaar <operation> <filename>
    CDF                 cdf <operation> <filename>
    Daubechies          Liftdaub <operation> <filename>
    Cubic prediction    Cubic <operation> <filename>

Filter bank algorithm:

    Wavelet             Command
    Haar                Haar <operation> <filename>
    Daub-8              daub8 <operation> <filename>
    Daubechies series   daub <operation> <filename>
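The two-argument command line convention described above can be sketched as a small dispatcher. This is an illustration only: the names Operation, parseOperation and parseArgs are hypothetical, the operation keywords are taken from the usage message, and the way the original programs handle a second file for compare or plotdiff is not shown in this chapter, so it is left out.

```cpp
#include <cassert>
#include <string>

// Operations listed in the usage message.
enum Operation {
    OP_FWT, OP_IWT, OP_READ, OP_WRITE, OP_COMPARE, OP_PLOTDIFF, OP_INVALID
};

// Map an operation keyword to its enumerator.
Operation parseOperation(const std::string& s) {
    if (s == "fwt")      return OP_FWT;
    if (s == "iwt")      return OP_IWT;
    if (s == "read")     return OP_READ;
    if (s == "write")    return OP_WRITE;
    if (s == "compare")  return OP_COMPARE;
    if (s == "plotdiff") return OP_PLOTDIFF;
    return OP_INVALID;
}

// Returns true and fills op and filename when argv holds exactly
// <program> <operation> <filename>; otherwise the caller should print
// the usage message.
bool parseArgs(int argc, const char* const argv[],
               Operation& op, std::string& filename) {
    if (argc != 3) return false;
    Operation parsed = parseOperation(argv[1]);
    if (parsed == OP_INVALID) return false;
    op = parsed;
    filename = argv[2];
    return true;
}
```

A main() function would call parseArgs(argc, argv, op, filename), print the usage text when it returns false, and otherwise switch on op to run the selected transform or file operation.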


A test setup developed by us using Visual C++ is shown in Fig. 4.1. This test setup provides the following facilities for the analysis.

• Select the Wavelet.

• Record and play speech files.

• Selection of speech files.

• Threshold selection.

• Block length selection.

• Threshold variation for lossy compression.

• Display of speech waveform.

• Displays number of zeros after wavelet transforms.

• Displays original file size, compressed file size and compression ratio.

• Display parameters like SNR, Mean Absolute Deviation, Mean square error (MSE), RMSE for statistical comparison between original speech file and reconstructed speech file.
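The statistical parameters in the last item can be computed from the original and reconstructed sample sequences as in the following sketch. The definitions used here are the common textbook ones and are an assumption, since this section does not restate them: the mean absolute deviation is taken as the mean absolute difference between the two signals, MSE as the mean squared difference, RMSE as its square root, and SNR as ten times the base-10 logarithm of signal energy over error energy. The names ErrorStats, compareSignals and compressionRatio are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Comparison figures between an original and a reconstructed signal.
struct ErrorStats {
    double mad;    // mean absolute difference
    double mse;    // mean squared error
    double rmse;   // root mean squared error
    double snrDb;  // signal-to-noise ratio in dB
};

// x is the original signal, y the reconstructed one; both must have the
// same non-zero length.
ErrorStats compareSignals(const std::vector<double>& x,
                          const std::vector<double>& y) {
    assert(x.size() == y.size() && !x.empty());
    double sumAbs = 0.0, sumSq = 0.0, signalEnergy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double d = x[i] - y[i];
        sumAbs += std::fabs(d);
        sumSq += d * d;
        signalEnergy += x[i] * x[i];
    }
    const double n = static_cast<double>(x.size());
    ErrorStats s;
    s.mad = sumAbs / n;
    s.mse = sumSq / n;
    s.rmse = std::sqrt(s.mse);
    // With a perfect reconstruction the error energy is zero and the
    // ratio is unbounded.
    s.snrDb = 10.0 * std::log10(signalEnergy / sumSq);
    return s;
}

// Compression ratio as original size divided by compressed size.
double compressionRatio(double originalBytes, double compressedBytes) {
    assert(compressedBytes > 0.0);
    return originalBytes / compressedBytes;
}
```

For lossy compression these figures move in opposite directions: a higher threshold zeroes more wavelet coefficients and raises the compression ratio, while the error measures grow and the SNR falls.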

The detailed user manual for the Visual C++ based “Wavelet based speech compression” software is given in Appendix C.

Fig. 4.1 Snapshot of the test setup for speech compression using wavelet transform