A LOW COMPLEXITY CODE COMPRESSION BASED ON
HYBRID RLC-BM CODES
Satheesh Kumar J, M.E., (Ph.D), Assistant Professor, Dept. of ECE
Anthony Babu A, Deepika P S, Hemalatha S, Dinesh Kumar A, UG Scholars, Dept. of ECE
Hindusthan College of Engineering and Technology, Coimbatore-32
Abstract: Bit stream compression is important in reconfigurable system design since it reduces the bit stream size and the memory requirement. It also improves the communication bandwidth and thereby decreases the reconfiguration time. Dictionary-based code compression techniques are popular because they provide both a good compression ratio and a fast decompression mechanism. The efficiency of these techniques is limited by the number of bit changes used during compression, and the cost of storing the information is higher for repeating instruction sequences. The original data are compressed using the codeword-length constrained bitmask code compression (CLCBCC) algorithm with mixed-bit saving dictionary selection (MBSDS), which gives the combined advantages of both algorithms. Run length encoding (RLE) of consecutive repeating bit sequences may yield a better compression result, and no extra bits are needed to represent such encoding. The compressed words are run length encoded only if the RLE yields a shorter code length than the original bitmask encoding. This ensures faster execution and memory saving, while the decompression efficiency remains the same as that of the existing methods.
Keywords: dictionary based code compression, bit masking, run length coding, compressed data, decompression technique.
I. INTRODUCTION
Integrated circuits made it possible for semiconductor devices to perform the functions of vacuum tubes, with millions of transistors packed on a single silicon chip. While designing an integrated circuit, power, speed, and area are the three major considerations. ICs have consistently migrated to smaller feature sizes over the years, allowing more circuitry to be packed on each chip. As the feature size shrinks, almost everything improves: the cost per unit and the switching power consumption go down, and the speed goes up.

[International Journal of Pure and Applied Mathematics, Volume 118, No. 20, 2018, pp. 4753-4763, ISSN: 1314-3395 (on-line version), http://www.ijpam.eu, Special Issue]

Wolfe and Chanin [1] first proposed code compression for reducing the program size to conserve memory usage, which in turn reduces cost. Since then, many code compression algorithms have been derived to reduce code size and power consumption as well as to improve performance. As the complexity of compression techniques grows, additional memory is used and decompression also becomes more difficult; tapering both problems is a challenge for VLSI systems.
Dictionary-based code compression (DCC) [2] can be used to achieve an efficient compression ratio. It possesses relatively simple decoding hardware and provides a higher compression bandwidth. Although several compression algorithms are available, no single compression algorithm works for all kinds of benchmarks. In this paper, we combine three algorithms to increase the compression performance with smaller hardware overhead. The bitmask code compression (BCC) algorithm [3], [4] is used to compress the data after dictionary selection, and then run length coding is used for further reduction.

The remainder of the paper is organized as follows. Section II reviews related work on code compression. Section III describes dictionary-based selection and the bit masking algorithm. Section IV describes the proposed hybrid RLC-BM codes. Section V presents the experimental results obtained using ModelSim and Xilinx software. Section VI draws the conclusion.
II. RELATED WORK
Numerous data compression algorithms have been applied to compress code efficiently. Wolfe and Chanin [1] were the first to use Huffman coding, on a microprocessor without pipeline stages. A line address table maps the compressed block addresses to actual addresses when cache misses and branch instructions are encountered. Based on the same concept, Lekatsas and Wolf [5] applied arithmetic coding with a Markov model to a reduced instruction set computing processor. Larin and Conte [6] applied Huffman coding to VLIW processors. Xie et al. [7] used Tunstall coding and arithmetic coding to perform variable-to-fixed compression on VLIW processors. Lin et al. [8] proposed a code compression scheme for VLIW processors. Lin et al. [9] proposed selective code compression. Bonny and Henkel [10] used the filled buffer technique in conjunction with extended blocks.

Qin and Mishra [11] used bounded Huffman coding to compress instructions and proposed a bit stream placement that can be decompressed in parallel. Bonny and Henkel [12] mixed left-uncompressed instructions with compressed instructions, which improves the performance of the decompression engine. Lefurgy et al. [2] proposed the first DCC algorithm. Gorjiara et al. [13] used DCC with multiple dictionaries for NISC. Ros and Sutton [14] proposed a Hamming-distance-based technique. Qin et al. [15] combined BCC and run length coding for improved compression performance.
Recent research in code compression has focused on combining various compression techniques to obtain an optimized compression ratio. Wang and Lin observed that no single compression technique works efficiently for all benchmarks. Based on this observation, Wei-Jhih Wang and Chang-Hong Lin proposed the combination of CLBCC and DCC. The algorithm we have derived is applicable to several applications with smaller hardware overhead, and it is a new technique to enhance the performance of the compression and decompression engines.
III. DCC AND BITMASK
ALGORITHMS
Dictionary-based code compression techniques are popular because they provide both a good compression ratio and a fast decompression mechanism. The efficiency of these techniques is limited by the number of bit changes used during compression, and the cost of storing the information is higher for repeating instruction sequences. The input configuration bit-stream is read sequentially in reverse order. Then, the dictionary and the index are derived based on the principles of the well-known compression algorithm. The original configuration bit-stream can be reconstructed by parsing the dictionary with respect to the index in reverse order. The achieved compression ratio is the ratio of the total memory requirements (i.e., dictionary and index) to the size of the bit-stream. The frequency distributions were similar for all the benchmarks. Compressing high-frequency instructions with the same codeword length as low-frequency instructions would result in inefficient compression. To overcome this problem, the high-frequency instructions are separated into another, smaller dictionary to obtain shorter codeword lengths. Two LUTs are used for the bitmask approach: a large LUT to compress single instructions, and a small LUT to compress the extremely high-frequency instructions. The small LUT is modifiable to store either single instructions or instruction sequences.
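The bitmask matching idea described above can be sketched as follows. This is an illustrative model only: the 8-bit word width, the single 2-bit aligned mask, and the function name are our assumptions, not the paper's exact encoding.

```python
def bitmask_match(word, entry, mask_bits=2, width=8):
    """Check whether `word` can be derived from dictionary `entry` by
    flipping bits inside one aligned mask of `mask_bits` bits.
    Returns (matched, mask_position_index, mask_value)."""
    diff = word ^ entry                       # positions where the bits differ
    if diff == 0:
        return True, None, None               # exact dictionary match
    low = (1 << mask_bits) - 1                # mask_bits ones
    for pos in range(0, width, mask_bits):    # try every aligned mask position
        if diff & ~(low << pos) == 0:         # all differences inside this mask
            return True, pos // mask_bits, (diff >> pos) & low
    return False, None, None
```

For example, `0b01011101` differs from the dictionary entry `0b01010101` only in bit 3, which falls inside the second 2-bit mask slot, so a single bitmask record suffices; `0b11111111` differs in too many scattered positions and cannot be bitmask-compressed against that entry.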
An MBSDS first transforms every unique instruction into a node. A bidirectional edge between two nodes indicates that the two instructions can be matched to each other using the bitmask compression approach. The algorithm then calculates the bit saving of all nodes and inserts the most profitable node into the dictionary. The most profitable node is then removed from the graph. Since all the neighboring nodes of the most profitable node can be covered by it, the node saving of each neighboring node is reduced by the edge saving of its edge with the most profitable node. Furthermore, all the edges of the neighboring nodes are removed. These steps are repeated until the dictionary is full. The code bits for compression are given below:
00 - uncompressed
01 - compressed with small LUT
10 - compressed with big LUT
11 - bitmask
Fig 1. CLCBCC with MBSDS: vectors A-H encoded with the codes above, with small-LUT and big-LUT index fields and, for bitmask codes, mask position and mask value fields.
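The greedy MBSDS selection can be sketched as below. This is a simplified model under stated assumptions: the per-occurrence bit savings `DIRECT_SAVING` and `MASK_SAVING`, the `can_match` predicate, and the function name are ours, and we only subtract the edge saving toward the chosen node rather than modeling the full edge-removal bookkeeping.

```python
from collections import Counter

def select_dictionary(instructions, dict_size, can_match):
    """Greedy MBSDS-style dictionary selection (simplified sketch).
    `can_match(a, b)` is True when b could be bitmask-compressed against a."""
    freq = Counter(instructions)
    nodes = list(freq)
    DIRECT_SAVING, MASK_SAVING = 6, 3   # hypothetical per-occurrence bit savings
    # build the match graph: an edge means bitmask compression is possible
    neighbors = {a: {b for b in nodes if b != a and can_match(a, b)} for a in nodes}
    # node saving = direct-match saving + bitmask saving over all neighbors
    saving = {a: freq[a] * DIRECT_SAVING +
                 sum(freq[b] * MASK_SAVING for b in neighbors[a]) for a in nodes}
    dictionary = []
    while len(dictionary) < dict_size and saving:
        best = max(saving, key=saving.get)       # most profitable node
        dictionary.append(best)
        for b in neighbors.pop(best, ()):        # neighbors now covered by `best`
            if b in saving:
                saving[b] -= freq[b] * MASK_SAVING   # drop their edge saving
            neighbors.get(b, set()).discard(best)
        del saving[best]                         # remove `best` from the graph
    return dictionary
```

With, say, four copies of one instruction, two of a near-match, and one outlier, the most frequent instruction is picked first and its neighbor's saving is reduced before the second pick, mirroring the steps described above.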
IV. PROPOSED ALGORITHM
In this section, the proposed algorithm is described. A separate dictionary is used to reduce the codeword length. The process is shown in the block diagram below.

Fig 2. Block diagram for compression: original code -> dictionary selection (big LUT / small LUT) -> bitmask -> RLC -> output

In certain cases, such as low code density architectures containing a high number of unique instructions, the algorithm's characteristics mean that a large LUT requires a large chip area and additional power consumption. Thus, it is desirable to minimize the dictionary size. Two LUTs are used here, followed by bit masking; after these processes, run length encoding is executed.
Run-length encoding (RLE) is a
very simple form of data compression
in which runs of data (that is,
sequences in which the same data
value occurs in many consecutive data
elements) are stored as a single data
value and count, rather than as the
original run. This is most useful on
data that contains many such runs: for
example, simple graphic images such
as icons, line drawings, and
animations. It is not useful with files
that don't have many runs as it could
greatly increase the file size.
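The run-and-count idea above can be captured in a few lines. This is a generic sketch of RLE over symbol strings (function names are ours), not the paper's bit-level format.

```python
def rle_encode(symbols):
    """Collapse consecutive repeats into (symbol, count) pairs."""
    runs = []
    for s in symbols:
        if runs and runs[-1][0] == s:
            runs[-1][1] += 1            # extend the current run
        else:
            runs.append([s, 1])         # start a new run
    return [(s, n) for s, n in runs]

def rle_decode(runs):
    """Expand (symbol, count) pairs back into the original string."""
    return "".join(s * n for s, n in runs)
```

A string like "0001111" becomes just two pairs, while an alternating string like "0101" produces one pair per symbol and would actually grow, which is exactly the caveat noted above.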
The compressed words are run length encoded only if the RLE yields a shorter code length than the original bitmask encoding [16]. In other words, if there are R repetitions of a code with length L and the number of bits required to encode them using RLE is L' bits, RLE is used only if R*L > L' bits. Since RLE is performed independently, the bit savings calculation during dictionary selection should be modified accordingly to model the effect of RLE. The different types of encoding process are represented in Fig. 3.
Fig 3. Run length encoding variants: bit-, byte-, and pixel-level RLE schemes
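The R*L > L' decision rule above reduces to a one-line check (the helper name is ours):

```python
def should_use_rle(r, l, rle_bits):
    """Apply RLE only when R repetitions of an L-bit code would cost
    more bits than the RLE-encoded form of length L' (R*L > L')."""
    return r * l > rle_bits
```

For instance, four repetitions of a 10-bit codeword cost 40 bits, so replacing them with a 12-bit RLE code is worthwhile; a single occurrence is left as-is.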
The basic flow of all RLE algorithms is the same, as illustrated in Fig. 4.

Fig 4. Flow chart of run length encoding
We can apply the RLC algorithm either before the DCC and bitmask stages or after them; according to our analysis, it is better to apply RLC after DCC and bitmask.
Algorithm:
Input:
1. Bit streams to be compressed, as vectors
2. Small dictionary
3. Big dictionary
4. Mask types
Output: compressed code
begin
Step 1: Compare each data word with both LUTs and compress it on a match.
Step 2: If neither LUT entry suits, try the bitmask.
Step 3: The compressed data are then passed through RLC.
end
Fig 5. Flow chart for compression
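The three steps above can be sketched end to end as follows. This is a toy model under stated assumptions: the 8-bit word width, a single 2-bit aligned bitmask, the 00/01/10/11 prefix codes from Section III, and a final RLC pass over whole compressed records; the real bit layout and field widths differ.

```python
def compress(words, big_lut, small_lut, mask_bits=2, width=8):
    """Sketch of the flow: try both LUTs, then one bitmask against the
    big LUT, else emit uncompressed; finally run-length encode records."""
    out = []
    low = (1 << mask_bits) - 1                       # mask of mask_bits ones
    for w in words:
        if w in small_lut:
            out.append(("01", small_lut.index(w)))   # small-LUT hit
        elif w in big_lut:
            out.append(("10", big_lut.index(w)))     # big-LUT hit
        else:
            for i, entry in enumerate(big_lut):      # bitmask attempt
                diff = w ^ entry
                hit = next((p for p in range(0, width, mask_bits)
                            if diff & ~(low << p) == 0), None)
                if hit is not None:
                    out.append(("11", i, hit // mask_bits, (diff >> hit) & low))
                    break
            else:
                out.append(("00", w))                # uncompressed
    rlc = []                                         # final RLC pass:
    for c in out:                                    # collapse identical
        if rlc and rlc[-1][0] == c:                  # consecutive records
            rlc[-1][1] += 1                          # into [record, count]
        else:
            rlc.append([c, 1])
    return rlc
```

Running it on a stream with repeated small-LUT hits shows the RLC pass collapsing the duplicates into a single counted record, which is where the extra saving over plain CLCBCC comes from.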
The decompression method is available to reproduce the original data whenever it is needed.
V. EXPERIMENTAL RESULTS

In this section, experimental benchmarking results obtained using ModelSim and Xilinx are presented.
Table 1: Comparison of existing and proposed

S.No  Parameter                        DCC algorithm   Hybrid RLC-BM codes
1     Power consumption (mW)           390             355
2     Area (no. of slices occupied)    630             326
3     Delay (ns)                       14.19           12.958
Fig 6. Comparison of existing and proposed parameters

Hybrid RLC-BM algorithm:

Fig 7. Delay of the hybrid RLC-BM algorithm
Fig 8. Area used by the hybrid RLC-BM algorithm
Fig 9. Power consumed by the hybrid RLC-BM algorithm
VI. CONCLUSION

Compared to the DCC algorithm, the hybrid RLC-BM algorithm optimally reduces power consumption, delay, and area. The proposed method improves the compression ratio (CR) by over 40% with a slight hardware overhead. The algorithm is implemented on an FPGA [17], [18].
[Bar charts comparing the existing and proposed designs on area, power consumption, and delay.]
REFERENCES
[1] A. Wolfe and A. Chanin, "Executing compressed programs on an embedded RISC architecture," in Proc. 25th Annu. Int. Symp. Microarchitecture, Dec. 1992, pp. 81-91.
[2] C. Lefurgy, P. Bird, I-C. Chen, and T. Mudge, "Improving code density using compression techniques," in Proc. 30th Annu. ACM/IEEE Int. Symp. MICRO, Dec. 1997, pp. 194-203.
[3] S.-W. Seong and P. Mishra, "A bitmask-based code compression technique for embedded systems," in Proc. IEEE/ACM ICCAD, Nov. 2006, pp. 251-254.
[4] S.-W. Seong and P. Mishra, "An efficient code compression technique using application-aware bitmask and dictionary selection methods," in Proc. DATE, 2007, pp. 1-6.
[5] H. Lekatsas and W. Wolf, "SAMC: A code compression algorithm for embedded processors," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 18, no. 12, pp. 1689-1701, Dec. 1999.
[6] S. Y. Larin and T. M. Conte, "Compiler-driven cached code compression schemes for embedded ILP processors," in Proc. 32nd Annu. Int. Symp. Microarchitecture, Nov. 1999, pp. 82-91.
[7] Y. Xie, W. Wolf, and H. Lekatsas, "Code compression for VLIW processors using variable-to-fixed coding," in Proc. 15th ISSS, 2002, pp. 138-143.
[8] C. H. Lin, Y. Xie, and W. Wolf, "Code compression for VLIW processors using a self-generating table," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 15, no. 10, pp. 1160-1171, Oct. 2007.
[9] C.-W. Lin, C. H. Lin, and W. J. Wang, "A power-aware code-compression design for RISC/VLIW architecture," J. Zhejiang Univ.-Sci. C (Comput. Electron.), vol. 12, no. 8, pp. 629-637, Aug. 2011.
[10] T. Bonny and J. Henkel, "FBT: Filled buffer technique to reduce code size for VLIW processors," in Proc. IEEE/ACM Int. Conf. CAD (ICCAD), Nov. 2008, pp. 549-554.
[11] X. Qin and P. Mishra, "Efficient placement of compressed code for parallel decompression," in Proc. 22nd Int. Conf. VLSI Design, Jan. 2009, pp. 335-340.
[12] T. Bonny and J. Henkel, "LICT: Left-uncompressed instructions compression technique to improve the decoding performance of VLIW processors," in Proc. 46th ACM/IEEE Design Autom. Conf., Jul. 2009, pp. 903-906.
[13] B. Gorjiara, M. Reshadi, and D. Gajski, "Merged dictionary code compression for FPGA implementation of custom microcoded PEs," ACM Trans. Reconfigurable Technol. Syst., vol. 1, no. 2, Jun. 2008, Art. ID 11.
[14] M. Ros and P. Sutton, "A Hamming distance based VLIW/EPIC code compression technique," in Proc. Int. Conf. CASES, 2004, pp. 132-139.
[15] X. Qin, C. Murthy, and P. Mishra, "Decoding-aware compression of FPGA bit streams," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 19, no. 3, pp. 411-419, Mar. 2011.
[16] L. Jubairahmed, S. Satheeskumaran, and C. Venkatesan, "Contourlet transform based adaptive nonlinear diffusion filtering for speckle noise removal in ultrasound images," Clust. Comput., pp. 1-10, 2017. https://doi.org/10.1007/s10586-017-1370-x
[17] S. Satheeskumaran and M. Sabrigiriraj, "VLSI implementation of a new LMS based algorithm for noise removal in ECG signals," Int. J. Electron., vol. 103, pp. 975-984, 2016.
[18] C. Venkatesan, P. Karthigaikumar, and R. Varatharajan, "FPGA implementation of modified error normalized LMS adaptive filter for ECG noise removal," Cluster Computing, pp. 1-9, 2018. https://doi.org/10.1007/s10586-017-1602-0