A LOW COMPLEXITY CODE COMPRESSION BASED ON
HYBRID RLC-BM CODES
Satheesh Kumar J, M.E., (Ph.D), Assistant Professor, Dept. of ECE
Anthony Babu A, Deepika P S, Hemalatha S, Dinesh Kumar A, UG Scholars, Dept. of ECE
Hindusthan College of Engineering and Technology, Coimbatore-32
Abstract: Bit stream compression is important in reconfigurable system design since it reduces the bit stream size and the memory requirement. It also improves the communication bandwidth and thereby decreases the reconfiguration time. Dictionary-based code compression techniques are popular because they provide both a good compression ratio and a fast decompression mechanism. The efficiency of these techniques is limited by the number of bit changes used during compression, and the cost of storing the information is higher for repeating instruction sequences. The original data are compressed using the codeword-length constrained bitmask code compression (CLCBCC) algorithm with mixed-bit saving dictionary selection (MBSDS), which gives the combined advantages of both algorithms. Run length encoding (RLE) of consecutive repeating bit sequences may yield a better compression result, and no extra bits are needed to represent such encoding. The compressed words are run length encoded only if the RLE yields a shorter code length than the original bitmask encoding. This ensures faster execution and memory saving, while the decompression efficiency remains the same as that of the existing methods.
Keywords: dictionary based code compression, bit masking, run length coding, compressed data, decompression technique.
I. INTRODUCTION
Integrated circuits made it possible for semiconductor devices to perform the functions of vacuum tubes, with millions of transistors packed on a single silicon chip. While designing an integrated circuit, power, speed, and area are the three major considerations. ICs have consistently migrated to smaller feature sizes over the years, allowing more circuitry to be packed on each chip. As the feature size shrinks, almost everything improves: the cost per unit and the switching power consumption go down, and the speed goes up.

[International Journal of Pure and Applied Mathematics, Volume 118, No. 20, 2018, pp. 4753-4763, ISSN: 1314-3395 (on-line version), http://www.ijpam.eu, Special Issue]

Wolfe and Chanin [1] first proposed code compression for reducing the program size to conserve memory usage, which in turn reduces cost. Since then, many code compression algorithms have been derived to reduce code size and power consumption as well as to improve performance. As the complexity of compression techniques grows, additional memory is used and decompression also becomes more difficult; tapering both problems is a challenge for VLSI systems.
Dictionary-based code compression (DCC) [2] can be used to achieve an efficient compression ratio. It possesses relatively simple decoding hardware and provides a higher compression bandwidth. Although several compression algorithms are available, no single compression algorithm works for all kinds of benchmarks. In this paper, we combine three algorithms to increase the compression performance with smaller hardware overhead. The bitmask code compression (BCC) algorithm [3], [4] is used to compress the data after dictionary selection, and then run length coding is used for further reduction.

The remainder of the paper is organized as follows. Section II reviews related work on code compression. Section III describes dictionary-based selection and the bit masking algorithm. Section IV describes the proposed hybrid RLC-BM codes. Section V presents the experimental results obtained using ModelSim and Xilinx software. Section VI draws the conclusion.
II. RELATED WORK
Numerous data compression algorithms have been applied to compress code efficiently. Wolfe and Chanin [1] were the first to use Huffman coding, on a microprocessor without pipeline stages. A line address table maps the compressed block addresses to actual addresses when cache misses and branch instructions are encountered. Based on the same concept, Lekatsas and Wolf [5] applied arithmetic coding with a Markov model to a reduced instruction set computing processor. Larin and Conte [6] applied Huffman coding to VLIW processors. Xie et al. [7] used Tunstall coding and arithmetic coding to perform variable-to-fixed compression on VLIW processors. Lin et al. [8] proposed a code compression scheme for VLIW processors. Lin et al. [9] proposed selective code compression. Bonny and Henkel [10] used the filled buffer technique in conjunction with extended blocks.

Qin and Mishra [11] used bounded Huffman coding to compress instructions and proposed a bit stream placement that can be decompressed in parallel. Bonny and Henkel [12] mixed left-uncompressed instructions with compressed instructions, which improves the performance of the decompression engine. Lefurgy et al. [2] proposed the first DCC algorithm. Gorjiara et al. [13] used DCC with multiple dictionaries for NISC. Ros and Sutton [14] proposed a Hamming-distance-based technique. Qin et al. [15] combined BCC and run length coding for improved compression performance.
Recent research in code compression has focused on combining various compression techniques to obtain an optimized compression ratio. Wang and Lin observed that no single compression technique works efficiently for all benchmarks. Based on this observation, Wei-Jhih Wang and Chang-Hong Lin proposed the combination of CLBCC and DCC. The algorithm we have derived is applicable to several applications with smaller hardware overhead, and it is a new technique to enhance the performance of the compression and decompression engines.
III. DCC AND BITMASK
ALGORITHMS
Dictionary-based code compression techniques are popular because they provide both a good compression ratio and a fast decompression mechanism. The efficiency of these techniques is limited by the number of bit changes used during compression, and the cost of storing the information is higher for repeating instruction sequences. The input configuration bit-stream is read sequentially in reverse order. Then, the dictionary and the index are derived based on the principles of the well-known compression algorithm. The original configuration bit-stream can be reconstructed by parsing the dictionary with respect to the index in reverse order. The achieved compression ratio is the ratio of the total memory requirements (i.e., dictionary and index) to the size of the bit-stream. The frequency distributions were similar for all the benchmarks. Compressing high-frequency instructions with the same codeword length as low-frequency instructions would result in inefficient compression. To overcome this problem, the high-frequency instructions are separated into another, smaller dictionary to obtain shorter codeword lengths. Two LUTs are used for the bitmask approach: a large LUT to compress single instructions, and a small LUT to compress the extremely high-frequency instructions. The small LUT is modifiable to store either single instructions or instruction sequences.
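The bitmask matching idea described above can be sketched as follows. This is an illustrative model only: the 8-bit word width, the single 2-bit aligned mask, and the function name are our assumptions, not the paper's exact encoding.

```python
def bitmask_match(word, entry, mask_bits=2, width=8):
    """Check whether `word` can be derived from dictionary `entry` by
    flipping bits inside one aligned mask of `mask_bits` bits.
    Returns (matched, mask_position_index, mask_value)."""
    diff = word ^ entry                       # positions where the bits differ
    if diff == 0:
        return True, None, None               # exact dictionary match
    low = (1 << mask_bits) - 1                # mask_bits ones
    for pos in range(0, width, mask_bits):    # try every aligned mask position
        if diff & ~(low << pos) == 0:         # all differences inside this mask
            return True, pos // mask_bits, (diff >> pos) & low
    return False, None, None
```

For example, `0b01011101` differs from the dictionary entry `0b01010101` only in bit 3, which falls inside the second 2-bit mask slot, so a single bitmask record suffices; `0b11111111` differs in too many scattered positions and cannot be bitmask-compressed against that entry.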
An MBSDS first transforms every unique instruction into a node. A bidirectional edge between two nodes indicates that the two instructions can be matched to each other using the bitmask compression approach. The algorithm then calculates the bit saving of all nodes and inserts the most profitable node into the dictionary. The most profitable node is then removed from the graph. Since all the neighboring nodes of the most profitable node can be covered by it, the node saving of each neighboring node is reduced by the edge saving of its edge with the most profitable node. Furthermore, all the edges of the neighboring nodes are removed. These steps are repeated until the dictionary is full. The code bits for compression are given below:
00 - uncompressed
01 - compressed with small LUT
10 - compressed with big LUT
11 - bitmask
Fig 1. CLCBCC with MBSDS: vectors A-H encoded with the codes above, with small-LUT and big-LUT index fields and, for bitmask codes, mask position and mask value fields.
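The greedy MBSDS selection can be sketched as below. This is a simplified model under stated assumptions: the per-occurrence bit savings `DIRECT_SAVING` and `MASK_SAVING`, the `can_match` predicate, and the function name are ours, and we only subtract the edge saving toward the chosen node rather than modeling the full edge-removal bookkeeping.

```python
from collections import Counter

def select_dictionary(instructions, dict_size, can_match):
    """Greedy MBSDS-style dictionary selection (simplified sketch).
    `can_match(a, b)` is True when b could be bitmask-compressed against a."""
    freq = Counter(instructions)
    nodes = list(freq)
    DIRECT_SAVING, MASK_SAVING = 6, 3   # hypothetical per-occurrence bit savings
    # build the match graph: an edge means bitmask compression is possible
    neighbors = {a: {b for b in nodes if b != a and can_match(a, b)} for a in nodes}
    # node saving = direct-match saving + bitmask saving over all neighbors
    saving = {a: freq[a] * DIRECT_SAVING +
                 sum(freq[b] * MASK_SAVING for b in neighbors[a]) for a in nodes}
    dictionary = []
    while len(dictionary) < dict_size and saving:
        best = max(saving, key=saving.get)       # most profitable node
        dictionary.append(best)
        for b in neighbors.pop(best, ()):        # neighbors now covered by `best`
            if b in saving:
                saving[b] -= freq[b] * MASK_SAVING   # drop their edge saving
            neighbors.get(b, set()).discard(best)
        del saving[best]                         # remove `best` from the graph
    return dictionary
```

With, say, four copies of one instruction, two of a near-match, and one outlier, the most frequent instruction is picked first and its neighbor's saving is reduced before the second pick, mirroring the steps described above.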
IV. PROPOSED ALGORITHM
In this section, the proposed algorithm is described. A separate dictionary is used to reduce the codeword length. The process is shown in the block diagram below.

Fig 2. Block diagram for compression: original code -> dictionary selection (big LUT / small LUT) -> bitmask -> RLC -> output

In certain cases, such as low code density architectures containing a high number of unique instructions, the algorithm's characteristics mean that a large LUT requires a large chip area and additional power consumption. Thus, it is desirable to minimize the dictionary size. Two LUTs are used here, followed by bit masking; after these processes, run length encoding is executed.
Run-length encoding (RLE) is a
very simple form of data compression
in which runs of data (that is,
sequences in which the same data
value occurs in many consecutive data
elements) are stored as a single data
value and count, rather than as the
original run. This is most useful on
data that contains many such runs: for
example, simple graphic images such
as icons, line drawings, and
animations. It is not useful with files
that don't have many runs as it could
greatly increase the file size.
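The run-and-count idea above can be captured in a few lines. This is a generic sketch of RLE over symbol strings (function names are ours), not the paper's bit-level format.

```python
def rle_encode(symbols):
    """Collapse consecutive repeats into (symbol, count) pairs."""
    runs = []
    for s in symbols:
        if runs and runs[-1][0] == s:
            runs[-1][1] += 1            # extend the current run
        else:
            runs.append([s, 1])         # start a new run
    return [(s, n) for s, n in runs]

def rle_decode(runs):
    """Expand (symbol, count) pairs back into the original string."""
    return "".join(s * n for s, n in runs)
```

A string like "0001111" becomes just two pairs, while an alternating string like "0101" produces one pair per symbol and would actually grow, which is exactly the caveat noted above.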
The compressed words are run length encoded only if the RLE yields a shorter code length than the original bitmask encoding [16]. In other words, if there are R repetitions of a code with length L and the number of bits required to encode them using RLE is L' bits, RLE is used only if R*L > L' bits. Since RLE is performed independently, the bit savings calculation during dictionary selection should be modified accordingly to model the effect of RLE. The different types of encoding process are represented in Fig. 3.
Fig 3. Run length encoding variants: bit-, byte-, and pixel-level RLE schemes
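The R*L > L' decision rule above reduces to a one-line check (the helper name is ours):

```python
def should_use_rle(r, l, rle_bits):
    """Apply RLE only when R repetitions of an L-bit code would cost
    more bits than the RLE-encoded form of length L' (R*L > L')."""
    return r * l > rle_bits
```

For instance, four repetitions of a 10-bit codeword cost 40 bits, so replacing them with a 12-bit RLE code is worthwhile; a single occurrence is left as-is.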
The basic flow of all RLE algorithms is the same, as illustrated in Fig. 4.

Fig 4. Flow chart of run length encoding
We can apply the RLC algorithm either before the DCC and bitmask stages or after them; according to our analysis, it is better to apply RLC after DCC and bitmask.
Algorithm:
Input:
1. Bit streams to be compressed, as vectors
2. Small dictionary
3. Big dictionary
4. Mask types
Output: compressed code
begin
Step 1: Compare each data word with both LUTs and compress it on a match.
Step 2: If neither LUT entry suits, try the bitmask.
Step 3: The compressed data are then passed through RLC.
end
Fig 5. Flow chart for compression
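The three steps above can be sketched end to end as follows. This is a toy model under stated assumptions: the 8-bit word width, a single 2-bit aligned bitmask, the 00/01/10/11 prefix codes from Section III, and a final RLC pass over whole compressed records; the real bit layout and field widths differ.

```python
def compress(words, big_lut, small_lut, mask_bits=2, width=8):
    """Sketch of the flow: try both LUTs, then one bitmask against the
    big LUT, else emit uncompressed; finally run-length encode records."""
    out = []
    low = (1 << mask_bits) - 1                       # mask of mask_bits ones
    for w in words:
        if w in small_lut:
            out.append(("01", small_lut.index(w)))   # small-LUT hit
        elif w in big_lut:
            out.append(("10", big_lut.index(w)))     # big-LUT hit
        else:
            for i, entry in enumerate(big_lut):      # bitmask attempt
                diff = w ^ entry
                hit = next((p for p in range(0, width, mask_bits)
                            if diff & ~(low << p) == 0), None)
                if hit is not None:
                    out.append(("11", i, hit // mask_bits, (diff >> hit) & low))
                    break
            else:
                out.append(("00", w))                # uncompressed
    rlc = []                                         # final RLC pass:
    for c in out:                                    # collapse identical
        if rlc and rlc[-1][0] == c:                  # consecutive records
            rlc[-1][1] += 1                          # into [record, count]
        else:
            rlc.append([c, 1])
    return rlc
```

Running it on a stream with repeated small-LUT hits shows the RLC pass collapsing the duplicates into a single counted record, which is where the extra saving over plain CLCBCC comes from.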
The decompression method is available to reproduce the original data whenever it is needed.
V. EXPERIMENTAL RESULTS

In this section, experimental benchmarking results obtained using ModelSim and Xilinx are presented.
Table 1: Comparison of existing and proposed

S.No  Parameter                        DCC algorithm   Hybrid RLC-BM codes
1     Power consumption (mW)           390             355
2     Area (no. of slices occupied)    630             326
3     Delay (ns)                       14.19           12.958
Fig 6. Comparison of existing and proposed parameters

Hybrid RLC-BM algorithm:

Fig 7. Delay of the hybrid RLC-BM algorithm
Fig 8. Area used by the hybrid RLC-BM algorithm
Fig 9. Power consumed by the hybrid RLC-BM algorithm
VI. CONCLUSION

Compared to the DCC algorithm, the hybrid RLC-BM algorithm optimally reduces power consumption, delay, and area. The proposed method improves the compression ratio (CR) by over 40% with a slight hardware overhead. The algorithm is implemented on an FPGA [17], [18].
[Bar charts comparing the existing and proposed designs on area, power consumption, and delay.]
REFERENCES
[1] A. Wolfe and A. Chanin, "Executing compressed programs on an embedded RISC architecture," in Proc. 25th Annu. Int. Symp. Microarchitecture, Dec. 1992, pp. 81-91.
[2] C. Lefurgy, P. Bird, I-C. Chen, and T. Mudge, "Improving code density using compression techniques," in Proc. 30th Annu. ACM/IEEE Int. Symp. MICRO, Dec. 1997, pp. 194-203.
[3] S.-W. Seong and P. Mishra, "A bitmask-based code compression technique for embedded systems," in Proc. IEEE/ACM ICCAD, Nov. 2006, pp. 251-254.
[4] S.-W. Seong and P. Mishra, "An efficient code compression technique using application-aware bitmask and dictionary selection methods," in Proc. DATE, 2007, pp. 1-6.
[5] H. Lekatsas and W. Wolf, "SAMC: A code compression algorithm for embedded processors," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 18, no. 12, pp. 1689-1701, Dec. 1999.
[6] S. Y. Larin and T. M. Conte, "Compiler-driven cached code compression schemes for embedded ILP processors," in Proc. 32nd Annu. Int. Symp. Microarchitecture, Nov. 1999, pp. 82-91.
[7] Y. Xie, W. Wolf, and H. Lekatsas, "Code compression for VLIW processors using variable-to-fixed coding," in Proc. 15th ISSS, 2002, pp. 138-143.
[8] C. H. Lin, Y. Xie, and W. Wolf, "Code compression for VLIW processors using a self-generating table," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 15, no. 10, pp. 1160-1171, Oct. 2007.
[9] C.-W. Lin, C. H. Lin, and W. J. Wang, "A power-aware code-compression design for RISC/VLIW architecture," J. Zhejiang Univ.-Sci. C (Comput. Electron.), vol. 12, no. 8, pp. 629-637, Aug. 2011.
[10] T. Bonny and J. Henkel, "FBT: Filled buffer technique to reduce code size for VLIW processors," in Proc. IEEE/ACM Int. Conf. CAD (ICCAD), Nov. 2008, pp. 549-554.
[11] X. Qin and P. Mishra, "Efficient placement of compressed code for parallel decompression," in Proc. 22nd Int. Conf. VLSI Design, Jan. 2009, pp. 335-340.
[12] T. Bonny and J. Henkel, "LICT: Left-uncompressed instructions compression technique to improve the decoding performance of VLIW processors," in Proc. 46th ACM/IEEE Design Autom. Conf., Jul. 2009, pp. 903-906.
[13] B. Gorjiara, M. Reshadi, and D. Gajski, "Merged dictionary code compression for FPGA implementation of custom microcoded PEs," ACM Trans. Reconfigurable Technol. Syst., vol. 1, no. 2, Jun. 2008, Art. ID 11.
[14] M. Ros and P. Sutton, "A Hamming distance based VLIW/EPIC code compression technique," in Proc. Int. Conf. CASES, 2004, pp. 132-139.
[15] X. Qin, C. Murthy, and P. Mishra, "Decoding-aware compression of FPGA bit streams," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 19, no. 3, pp. 411-419, Mar. 2011.
[16] L. Jubairahmed, S. Satheeskumaran, and C. Venkatesan, "Contourlet transform based adaptive nonlinear diffusion filtering for speckle noise removal in ultrasound images," Clust. Comput., pp. 1-10, 2017. https://doi.org/10.1007/s10586-017-1370-x
[17] S. Satheeskumaran and M. Sabrigiriraj, "VLSI implementation of a new LMS based algorithm for noise removal in ECG signals," Int. J. Electron., vol. 103, pp. 975-984, 2016.
[18] C. Venkatesan, P. Karthigaikumar, and R. Varatharajan, "FPGA implementation of modified error normalized LMS adaptive filter for ECG noise removal," Cluster Computing, pp. 1-9, 2018. https://doi.org/10.1007/s10586-017-1602-0