development of vocoder board for speech …...bit rate vocoder board. the process includes component...

12
DOI: 10.23883/IJRTER.2017.3352.ATUFP 220 Development of vocoder board for speech compression at three different rates Lavanya krishna 1 , Rajeswari P 2 1 PG student, DSCE, Bangalore, 2 Associate Professor, DSCE, Bangalore Abstract: Compression of speech signal is an important field in digital signal processing. Because of limited bandwidth in many fields especially in the field of military speech compression has a significant importance in the present world. The other reasons for speech compression are limited transmission and storage capacity. The process of converting human speech signals into encoded representation then converting back into original signal by decoding back to produce a close approximation of the original signal. This paper presents a speech compression by designing a low bit rate Vocoder board. The process includes component selection for doing schematic followed by PCB design. The final board is tested by giving speech input of different languages and then calculating PESQ of both input and compressed speech. Index Terms— Vocoder board, TWELP, Cool edit pro, PESQ, Software configuration. I. INTRODUCTION 1 According to information theory, the minimum bitrate at which the conditions of distortionless transmission of any source signal is possible is determined by the entropy of the speech source message. The compression after the maximum level compression results in distortion and loss of signal. Various speech encoding techniques includes LPC, CELP, MELP and TWELP. Compression of speech signal results in low bit rate data which reduces the bandwidth required for transmission. Implementing better efficient compression techniques results in both quality and LBR data. Encoding, decoding and compression of speech signal can be done using VOCODER . VOCODER can be configured either by hardware or by software. In hardware configuration jumpers are used to fix the voltage for configurable pins, where as in software configurations any of the processors or the controllers can be used to configure the pins. The Blackfin device BF548 can be used to configure and also to read and write the speech signals from and to the VOCODER respectively. On top of that the read speech signal can be encrypted. The encryption of compressed speech signal is the main requirement. A large part of the researches in speech process algorithms is motivated by the need of obtaining secure military communications, to allow effective operation in a hostile environment. Since the bandwidth of the communication channel is a sensitive problem in military applications, low bit-rate speech compression methods are used. Several speech processing applications such as Mixed Excitation Linear Prediction are characterized by very strict requirements in power consumption, size, and voltage supply. These requirements are difficult to fulfill, given the complexity and number of functions to be implemented, together with the real time requirement and large dynamic range of the input signals. To meet these constraints, careful optimization should be done at all levels, ranging from algorithmic level, through system and circuit architecture, to layout and design of the cell library. The key points of this optimization are among others, the choice of the algorithms, the

Upload: others

Post on 25-Apr-2020

16 views

Category:

Documents


0 download

TRANSCRIPT

DOI: 10.23883/IJRTER.2017.3352.ATUFP 220

Development of vocoder board for speech compression

at three different rates

Lavanya krishna1, Rajeswari P

2

1PG student, DSCE, Bangalore,

2Associate Professor, DSCE, Bangalore

Abstract: Compression of speech signal is an important field in digital signal processing. Because of

limited bandwidth in many fields especially in the field of military speech compression has a

significant importance in the present world. The other reasons for speech compression are limited

transmission and storage capacity. The process of converting human speech signals into encoded

representation then converting back into original signal by decoding back to produce a close

approximation of the original signal. This paper presents a speech compression by designing a low

bit rate Vocoder board. The process includes component selection for doing schematic followed by

PCB design. The final board is tested by giving speech input of different languages and then

calculating PESQ of both input and compressed speech.

Index Terms— Vocoder board, TWELP, Cool edit pro, PESQ, Software configuration.

I. INTRODUCTION1

According to information theory, the minimum bitrate at which the conditions of distortionless

transmission of any source signal is possible is determined by the entropy of the speech source

message. The compression after the maximum level compression results in distortion and loss of

signal. Various speech encoding techniques includes LPC, CELP, MELP and TWELP. Compression

of speech signal results in low bit rate data which reduces the bandwidth required for transmission.

Implementing better efficient compression techniques results in both quality and LBR data.

Encoding, decoding and compression of speech signal can be done using VOCODER . VOCODER

can be configured either by hardware or by software. In hardware configuration jumpers are used to

fix the voltage for configurable pins, where as in software configurations any of the processors or the

controllers can be used to configure the pins. The Blackfin device BF548 can be used to configure

and also to read and write the speech signals from and to the VOCODER respectively. On top of that

the read speech signal can be encrypted. The encryption of compressed speech signal is the main

requirement.

A large part of the researches in speech process algorithms is motivated by the need of obtaining

secure military communications, to allow effective operation in a hostile environment. Since the

bandwidth of the communication channel is a sensitive problem in military applications, low bit-rate

speech compression methods are used. Several speech processing applications such as Mixed

Excitation Linear Prediction are characterized by very strict requirements in power consumption,

size, and voltage supply. These requirements are difficult to fulfill, given the complexity and number

of functions to be implemented, together with the real time requirement and large dynamic range of

the input signals. To meet these constraints, careful optimization should be done at all levels, ranging

from algorithmic level, through system and circuit architecture, to layout and design of the cell

library. The key points of this optimization are among others, the choice of the algorithms, the

International Journal of Recent Trends in Engineering & Research (IJRTER)

Volume 03, Issue 07; July - 2017 [ISSN: 2455-1457]

@IJRTER-2017, All Rights Reserved 221

modification of the algorithms to reduce computational complexity, the choice of a fixed-point

arithmetic unit, the minimization of the number of bits required at every node of the algorithm, and a

careful match between algorithms and architecture. This paper concentrates on low bit rate speech

coding technology, mainly in TWELP and solved the problem of optimizing the program of TWELP

on Digital Signal Processor platform. The algorithm was ported onto a fixed point DSP, Blackfin

537, and stage by stage optimization was performed to meet the real time requirements. The main

functions involved were analysis, parameter encoding, parameter decoding and synthesis. The fixed

point source code at the TWELP front end was also thoroughly optimized at the C Level. Memory

optimization techniques such as data placement and caching were also used to reduce the processing

time. The results were obtained show that real-time implementations of a speech Vocoder based on

the TWELP standard for low bit rate communications (2400 bps) can be successful on DSP

platforms.

1. STATEMENT OF THE PROBLEM

As already discussed in introduction bandwidth in any communication system plays a critical role.

Efficient use of bandwidth is of main concern in any kind of communication. When the data to be

transmitted is compressed then the bandwidth requirement for compressed data transmission will

also decreases. Not only the bandwidth requirement but also the capacities to store the data will also

decreases which in turn decrease the time to access the data. Some of the existing compression

technique may result in low bit rate data but may fail to produce better quality speech. In order to

achieve the above conditions compression of data which results in speech with better quality and also

of low bit rate has to be used. These requirement leads to the development of a compression

technique which results in speech with better quality and also of low bit rate.

II. MOTIVATION

In any field of communication the bandwidth is the most important requirement. In order to reduce

the bandwidth requirement in any communication system especially in the field of military speech

compression is one of the prominent methods among many other techniques to reduce the need of

bandwidth. The disadvantages like bad speech quality, less compression ratio and so on leads to this

method of speech compression using low bitrate vocoder.

III. OBJECTIVE

The main objective of this project is to reduce the bandwidth requirement in communication system

by performing a speech compression especially in military applications. Other than reduction of

bandwidth the project also aimed at keeping the speech quality as good as the input speech even after

performing compression. The compression of speech of different languages, maintaining the PESQ

of the output speech signal is also the aim of this project.

Performing the compression of different speech signal with different rates is one of the objectives of

this project. The compression of 21 different languages at three different rates such as 600bps,

1200bps and 600bps is also in the list of objectives of this project.

IV. METHODOLOGY

The speech to be compressed is given as an input to the low bit rate vocoder board where in the

speech signal is encoded and compressed by vocoder chip and the encoded and compressed speech

will be decoded and decompressed at the receiver side. The decompressed speech can be playback to

hear the speech. The output speech is tested for its quality by checking its PESQ. The PESQ of

output speech is compared with that of input speech. There are many methods to do the compression;

International Journal of Recent Trends in Engineering & Research (IJRTER)

Volume 03, Issue 07; July - 2017 [ISSN: 2455-1457]

@IJRTER-2017, All Rights Reserved 222

in this project vocoder is implemented as a speech compressor. The compression algorithm used here

is the TWELP.

1.1 VOCODER

The Vocoder is a chip of 128 pins. It has built-in FLASH and RAM and can perform real time speech

compression. The main advantage is that there will be no need for external storage which decreases

the complexity. It supports three different rates such as 600bps, 1200bps and 2400bps which are

configurable either by software or by hardware.

Vocoder chip can integrate the codec AD73311 or AIC23 seamlessly and configure it when powering

up without user involving. The board with Vocoder chip is connected to BF548 via UART. The

communication between board and BF548 unit is Asynchronous and Full duplex. There are three

ports available in BF548, among them port 3 is used for the purpose of trans reception between

vocoder board and BF548.

The board supports four different baud rates, three different coding rate, MICIN mode, LINEIN

mode, two different CODEC such as AIC23 and AD73311 which can be configurable either by

software or by hardware.

5.2 BF548

The architecture of BF548 is described in this section

Figure 1: BF548 architecture

V. BLOCK DIAGRAM

Figure 2: Block diagram of Vocoder

The above block diagram represents the functional overview of the Vocoder. Each block functions

are described below.

International Journal of Recent Trends in Engineering & Research (IJRTER)

Volume 03, Issue 07; July - 2017 [ISSN: 2455-1457]

@IJRTER-2017, All Rights Reserved 223

6.1 Algorithm block

The function of algorithm block is to implement the functions related to encoding/decoding

algorithm. This block is the core module of Vocoder chip.

During encoding the algorithm block receive speech data from the codec interface block then

compress and encode the data and then send to the BF548 interface clock to transmit.

During decoding the algorithm block receive data from the BF548 block, decodes the data and then

send to the codec interface module to playback.

6.2 CODEC interface block

The codec interface block connects to the external codec to which the speech data to be

compressed has to be send.

During encoding the codec interface block receives the speech data from external codec and then

sends it to an algorithm block to do compression encoding.

During decoding the codec interface block receives the decoded speech data from algorithm block

and then send it back to the external codec to play.

6.3 BF548 interface block

The BF548 block connects to the external BF548 and is used to transport encoded/decoded data

and also the configuration data.

During encoding the BF548 block receives data from algorithm block frame them and send to

external BF548 unit.

During decoding, the BF548 interface block receives speech data frame from external BF548 unit

decode the frames and then send to algorithm block for speech data encoding.

During configuring BF548 interface block receives configuring data frame from external BF548,

decode the frames and then send to the configuration block for parsing and configuring.

The communication between BF548 and configuration block is full duplex therefore the

configuration data, encoded data and decoded data can be send simultaneously.

6.4 Configuration block

This block configure the chip function according to configure pin status or external configure data.

When powered up, configure block samples the configure pins’ status to configure the chip

accordingly.

When in operating, the configure block accepts the data from BF548 interface block BF548 and

configure related blocks after parsing the data.

VI. BOARD COMPOSITION

This section describes the components of developed board. Figure 3.2 is the developed board. Figure

3.3 represents the detailed representation of the developed board which named every components in

the board.

International Journal of Recent Trends in Engineering & Research (IJRTER)

Volume 03, Issue 07; July - 2017 [ISSN: 2455-1457]

@IJRTER-2017, All Rights Reserved 224

Figure 3: Vocoder board

The above Figure 3 shows the final board for compression. The basic and important components of

this board are CODEC, VOCODER CHIP and other filter circuits. The CODEC performs encoding

and decoding in input and output side of the board. Converting the input analog signal into digital

signal is performed by ADC and at the receiver side DAC converts the digital signal into analog

signal as it is required to hear. VOCODER CHIP incorporated with a processor which consist

TWELP algorithm accordingly compression occur. This algorithm consists of the process to be

applied for the input speech. Other circuits includes power supply, switch to monitor the power

supply, jumpers for the hardware configuration, sockets for inputting the speech and to hear the

output speech, includes resistors and capacitors, voltage regulators, reset button and other filtering

circuits.

Figure 4: Board with component description

The above figure shows the demo board composition with description of each component. The demo

board can input analog speech signal, encode it with vocoder chip then decode in loop. With external

speaker, it will show the performance of encoding and decoding.

VII. DESCRIPTION OF JUMPERS

Pin J4 in the left side is for power supply input, connecting to a 5V DC supply. A switch near the pin

J4 turns on/off the power of the whole demo board. Switch it to the upper position in the figure to

turn on the power. Two LED will be lit on when the board starts work. LED D1 is red which

represents 5.5V power work well and LED D4 is green which represents 3.3V power work well.

The pin J1 connects to the analog speech signal. This pin connects to PC’s audio output interface.

The pin J2 is for speech output, connecting to external speaker or ear phone to play the speech after

decoding.

LINEIN mode and MICIN mode are two different modes available to give input to the board.

International Journal of Recent Trends in Engineering & Research (IJRTER)

Volume 03, Issue 07; July - 2017 [ISSN: 2455-1457]

@IJRTER-2017, All Rights Reserved 225

2. SYSTEM DESIGN

The design of the system involves the following steps. Each of the following steps is important and

place equal role in the development of this project. Any mistakes in any of the following steps will

produce error in the entire project. All the steps are related to each other and hence bright skill and

concentration is do required while carrying out these steps.

Step1: Schematic of whole system by using Orcad tool

Step2: Simulation of schematic

Step3: Drawing of the correct schematic using CADs

Step 4: Submission of drawing for PCB design

Step 5: Board testing by giving speech as an input.

Step 6: Replacement of some components if necessary.

Step 7: Checking the quality of speech signal, compression ratio, PESQ results of input and

compressed speech.

Step 8: Compression for three different rates are checked.

Step 9: Interfacing Vocoder board with BF548 for software configuration.

Step 10: Development of C code for software configuration

Step 11: Running the developed code for all kinds of configuration in visual DSP environment.

Step 12: Note down all the obtained results.

Step 13: Comparing the obtained results of PESQ with that of original signal.

9.1 PIN ASSIGNMENT

The Vocoder chip of 128 pin each pin can be configurable either by software or by hardware method.

The hardware configuration is done by jumpers and for software configuration the board need to

configure with BF548 and has to program it according to which the software configuration takes

place. The pin diagram and its details is as shown below in figure 4.1

Figure 5: Pin diagram of Vocoder chip

VOCODER

International Journal of Recent Trends in Engineering & Research (IJRTER)

Volume 03, Issue 07; July - 2017 [ISSN: 2455-1457]

@IJRTER-2017, All Rights Reserved 226

9.2 COMPLETE SCHEMATIC

Figure 6: Complete schematic of vocoder board

VIII. CONFIGURATION AND INTERFACES

10.1 Configuration:

Vocoder chip can be configured in hardware or software method. One hand, the chip samples the

voltage of the configure pins to finish the configuration. On the other hand, user can configure it via

software protocol when running.

10.2 Rate selection

Vocoder chip works at three rates: 2400bps, 1200bps and 600bps. The pins are illustrated in the

table below Table 1: Rate selection pins definition.

RATE_SEL1 RATE_SEL0 Coding Rate

0 0 2400bps

0 1 1200bps

1 0 600bps

1 1 Codec loop

Codec Loop Mode means skipping the processing of coding and decoding with playing back the

speech data directly. It is used to test or debug. Rate selection can also be configured with software

protocol.

10.3 Codec Selection

With external pin selection, Vocoder chip can connect to AD73311 and AIC23 seamless.

The pin CD_SEL is given as below:

Table 2: CODEC selection pins

CD_SEL Function

0 Select AD73311

1 Select AIC23

10.4 BF548 Interface Rate Selection (Baud rate)

The chip connect to external BF548 via asynchronous UART and the Baud rate can be selected

using the pins as following Table 3: Baud Rate selection pins definition.

BR_SEL1 BR_SEL0 Baud Rate

0 1 15200bps

0 1 9600bps

1 0 4800bps

1 1 2400bps

Baud rate of the serial port can be configured with software protocol.

International Journal of Recent Trends in Engineering & Research (IJRTER)

Volume 03, Issue 07; July - 2017 [ISSN: 2455-1457]

@IJRTER-2017, All Rights Reserved 227

IX. INTERFACES

11.1 BF548 Interface

VOCODER connects with BF548 using UART port. The speed can be selected by hardware or

software. The interface pin includes UART_TX and UART_RX. The port time sequence adopts the

standard UART time sequence.

11.2 Codec Interface

VOCODER support kinds of Codec, which is selectable by hardware or software. The interface pins

include: BCLK_IN, FSYN_IN, PCM_IN, BCLK_OUT, FSYN_OUT, and PCM_OUT. The port time

sequence can be configured by software protocol.

11.3 Configure Interface

When VOCODER is powered up, it sample the configure pins to configure the working mode. It can

also be configured with software protocol when running. The configure pins include: coding rate

select pins (RATE_SEL0, RATE_SEL1), Codec select pins (CD_SEL), BF548 port speed select pins

(BR_SEL0, BR_SEL1).

X. COMMUNICATION PROTOCOL AND FRAME STRUCTURE

12.1 communication protocol

The communication protocol between BF548 unit and internal BF548 with configuration block is

UART protocol.

A UART (Universal Asynchronous Receiver and Transmitter) is a device allowing the reception and

transmission of information, in a serial and asynchronous way.

• A UART allows the communication between a computer and several kinds of devices (printer,

modem, etc.), interconnected via an RS-232 cable. Data transmission is made by the UART in a

serial way, by 11-bit blocks:

• A 0 bit marks the starting point of the block

• Eight bits for data

• One parity bit

• A 1 bit marking the end of the block

• The transmission and reception lines should hold a 1 when no data is transmitted.

• The first transmitted bit is start bit data parity bit stop bit

• The first transmitted bit is the LSB (least significant bit)

• The parity bit is set to 1 or 0, depending on the number of 1's transmitted: if even parity is used,

this number should be even; if odd parity is used, this number should be odd. If the chosen parity is

not respected in the block, a transmission error should be detected

• The transmission speed is fixed, measured in bauds.

12.2 Frame structure

The frame length is fixed as 16 Byte and the frame structure is as below:

Table 4: Frame structure

Header CMD_TYP LEN PAYLOAD CHECKSUM

(1) HEADER

Frame head, 2 bytes length. The content is fixed as 0x4C4E

(2) CMD_TYPE

The command type, 1 byte length.

(3) LEN

The payload length. 1 byte length.

International Journal of Recent Trends in Engineering & Research (IJRTER)

Volume 03, Issue 07; July - 2017 [ISSN: 2455-1457]

@IJRTER-2017, All Rights Reserved 228

(4) PAYLOAD

The payload data. 11 bytes length.

(5) CHECKSUM

Checksum is a 1 byte length. Add the first 15 bytes in a frame (it means the total frame

excluding checksum itself) and get the low 8 bits of the sum as the checksum.

XI. RESULTS

The design procedure is discussed in the above chapters the results such as compressed output, PESQ

results is as shown below

Fig 7: Hindi input speech

Fig 8: Hindi compressed speech(1200bps)

Figure 9: English input speech

Figure 10: English compressed speech (1200bps)

International Journal of Recent Trends in Engineering & Research (IJRTER)

Volume 03, Issue 07; July - 2017 [ISSN: 2455-1457]

@IJRTER-2017, All Rights Reserved 229

Figure 11: Portuguese input speech

Figure 12: Portuguese compressed speech (1200bps)

Figure 13: Hindi compressed speech (2400bps)

Figure 14: English compressed speech (2400bps)

Figure 15: Portuguese compressed speech (2400bps)

International Journal of Recent Trends in Engineering & Research (IJRTER)

Volume 03, Issue 07; July - 2017 [ISSN: 2455-1457]

@IJRTER-2017, All Rights Reserved 230

Figure 16: PESQ results (1200bps)

Figure 17: PESQ result (2400bps)

XII. CONLUSION AND FUTURE WORK

The compression of speech signal using Vocoder is done. In this project, the inter-frame redundancy

of 2400 bps TWELP parameters was successfully exploited. The algorithm was ported onto two

Blackfin ADSP-BF548. The real time implementation is done by using BF548 EZ KIT. The results

are ideal and meet the needs of engineering applications: real-time implementation of a speech

Vocoder that is based on the TWELP standard for low bit rate communications (2400 bps and

1200bps) on hardware platforms. Test results show that the compressed speech is good enough to

implement in various field of communication. The output speech was fairly intelligible with a MOS

measure of 2.75 on the average. Moreover, it is possible to reduce the execution time per frame from

160 milliseconds to 47 milliseconds. This work can be extended to building encoders with very low

bit rates (for instance, 800 bps), by using different methods for parameter quantization and for

calculating the gain and by applying various optimizations to the code, which would allow the

implementation of such an encoder on DSP platforms in real time. Compression for still low bit rate

encoder has to be build.

Instead of interfacing the vocoder board with the BF548 DSP kit, keeping both vocoder board and

BF548 on same board has to be done as future work. In this project, the coding is done at three

different rates and the minimum rate is 600bps. Coding at still minimum rate is also the future

enhancement of this project. Improving the quality of the speech and also PESQ is also required

enhancement in future.

REFERENCE 1. T. E. Tremain, “The Government Standard Linear Predictive Coding LPC-10”, Speech Technology, pp.40-49, 2014

2. A. McCree, T. Kwan, E. B. George and V. Viswanathan, “A 2.4 kbit’s MELP coder candidate for the new U.S. Federal

Standard”, Acoustics, Speech, and Signal Processing, IEEE International Conference, vol.1, pp. 200-203, 2015.

3. L. M. Supplee, R. P. Cohn, J. S. Collura and A. McCree, “MELP: the new Federal Standard at 2400bps”, Acoustics,

Speech, and Signal Processing, IEEE International Conference, vol.2, pp. 1591-1594, 2011.

4. J. Wang, J, Zhao, J. Yang and Y. Yang, “The research for the MELP Vocoder and its real-time replementation”, in

Journal of the Institution of Engineers, vol. 44, pp.38-58, 2004.

5. ADSP-BF537 Blackfin Processor Hardware Reference manual, Revision 3.4, 2013.

International Journal of Recent Trends in Engineering & Research (IJRTER)

Volume 03, Issue 07; July - 2017 [ISSN: 2455-1457]

@IJRTER-2017, All Rights Reserved 231

6. M. Olausson and L. Dake, “The ADSP-21535 Blackfin and Speech Coding”, Proceedings of the Swedish System-on-

chip Conference (SSoCC), 2003.

7. Blackfin DSP Instruction Set Reference, 2002.

8. ADSP-21535 Blackfin DSP Hardware Reference, 2002.

9. ITU-t recommendation on g.723.1, dual rate speech coder for multimedia communications transmitting at 5.3 and 6.3

kbit/s, 2013

10. ITU-t recommendation g.729, coding of speech at 8 kbit/s using conjugate-structure algebraic-codeexcited-linear-

prediction (cs-acelp), 2014

11. ETSI GSM Fullrate Speech Codec for Analog Devices Blackfin, Bayer DSP Solutions, 2008.

12. G. Bertini, F. Fontata, D. Gonzalez, L. Grassi and M. Magrini, “Voice Transformation Algorithms with Real Time

DSP Rapid Prototyping Tools”, 2004, unpublished.

13. Y. Shaked and A. L. Cole, “Implementation of MELP based Vocoder for 1200/2400 bps”, The EE Project Contest

2000, Technion Signal and Image Processing Lab, 2000, unpublished.

14. J. M. Valin , “Speex: A Free Codec for Free Speech”, available at http://jmvalin.ca/papers/speex_lca2006.pdf (Last

Accessed: May 2015).

15. Vorbis codec, available at http://www.vorbis.com/ (Last Accessed: May 2015).

16. ITU-T Recommendation P.800 Methods for subjective determination of transmission quality, available at

http://www.itu.int/ITU-T/ recommendations/rec.aspx?rec=3638 (Last Accessed: May 2015).

17. J. M. Gibson, “Speech coding methods, standards, and applications”, in Circuits and Systems Magazine, IEEE, vol.

5, pp. 30-49, 2005.