The Journal of Systems and Software 72 (2004) 295–304
www.elsevier.com/locate/jss
Code compression by register operand dependency
Kelvin Lin *, Jean Jyh-Jiun Shann, Chung-Ping Chung
Department of Computer Science & Information Engineering, National Chiao Tung University, 1001 Ta Hsueh Road, HsinChu 300, Taiwan, ROC
Received 1 January 2002; received in revised form 26 July 2002; accepted 21 February 2003
Available online 5 December 2003
Abstract
This paper proposes a dictionary-based code compression technique that maps each source register operand to the nearest
occurrence of a matching destination register in the predecessor instructions. The key observation is that most destination registers
are very likely to be used as source registers in the following instructions. The dependent registers can be removed from the dictionary if
this information can be specified in another way. Such destination–source relationships are so common that exploiting them
results in much better code compression. After the dependent register operands are removed, the original dictionary size is reduced
significantly. As a result, the compression ratio benefits from: (a) the reduction in dictionary size due to the removal of dependent
registers, and (b) the reduction in program encoding due to the reduced number of dictionary entries.
A set of programs has been compressed using this feature. The results show an average compression ratio of
38.41% for the MediaBench benchmarks compiled for the MIPS R2000 processor, as opposed to 45% using operand
factorization.
© 2003 Elsevier Inc. All rights reserved.
1. Introduction
Most embedded systems are cost sensitive. A
small memory results in a lower cost and a lower
power requirement. Typically, programs in an embed-
ded system are stored in a ROM associated with an
ASIC, whose sizes translate directly into silicon area and
cost. Thus, memory size reduction becomes increasingly
important in the design of an embedded system. In addition, as the complexity of an embedded system
grows, programming in assembly language and optimi-
zation by hand are no longer practical or economical.
* Corresponding author. Address: No. 193, Jhongsiao N. Rd., Gueiren Township, Tainan County, 711 Taiwan (ROC). E-mail addresses: [email protected], [email protected] (K. Lin), [email protected] (J.J.-J. Shann), [email protected] (C.-P. Chung).
0164-1212/$ - see front matter © 2003 Elsevier Inc. All rights reserved. doi:10.1016/S0164-1212(03)00214-0

The programs are written in high-level languages
(HLL), such as C and C++, and compiled into execu-
tables. Direct translation from a high-level language into
machine code incurs a code-size penalty due to the
completeness of translation from each HLL statement to machine instructions. Some code optimizations, such as
redundant code removal or common sub-expression
elimination, must be applied as extra passes (Aho et al., 1986). Most compiler optimizations focus on the execution
speed rather than the code size, and this fact results in a
speed-space trade-off. Therefore, code size optimization
has great potential under such a programming envi-
ronment.
This paper proposes a code compression technique
that substantially reduces code size. This method is
based on operand factorization (Araujo et al., 1998), but separates the instruction sequence differently into
the opcode sequence, the mapping sequence, and the
residual operand sequence. The key idea of this method
is that a destination register has a great possibility of
being used as the source register in the following
instruction. We use the mapping tag to specify the rela-
tionship between source registers and destination regis-
ters so that occurrences of the same registers can be eliminated from the operand sequence used in the
operand factorization method. We find that the varia-
tions of the relations are much smaller than those of the
operands themselves. The dictionary storing the map-
ping information occupies only a small amount of space,
and the size of dictionary storing the operands is re-
duced significantly. As a result, program encoding
benefits from the reduced number of entries of the
operand sequences. Experimental results show that
the compression ratio reaches 38.41% on average for
MediaBench (Lee et al., 1997) compiled for the MIPS
R2000 processor.
This paper is organized as follows: Section 2 discusses
the related work in code compression; Section 3 proposes the detailed register dependency method; Section 4 describes the decompression engine; Section 5 presents the
simulation results; and Section 6 summarizes this work.
2. Related work
An intuitive way to reduce code size is to restrict the instruction size, as shown in Fig. 1. This is
the approach adopted in the design of the Thumb
(Turley, 1995) and MIPS16 (Kissell, 1997). Shorter
instructions are achieved primarily by restricting the
number of bits that encode opcodes, registers and
immediate values. Fewer opcodes mean that an opera-
tion might be achieved by multiple instructions and
fewer registers imply less freedom for the compiler to perform important tasks, such as global register allocation. However, the results are 30–40% smaller programs running 15–20% slower than programs using a
standard RISC instruction set.
Another way to reduce the code size is compression,
which encodes the occurrences of identical instructions
(or instruction sequences) in a program into smaller
codewords to reduce the program size. Lefurgy et al. (1997) propose a dictionary-based compression method,
which stores one copy of the common instruction se-
quences into the dictionary and replaces the occurrences
of the sequences with shorter (fixed or variable-length)
codewords than the instruction sequences themselves.
All the codewords are aligned at the byte boundary to
achieve the best compression ratio. Post-compilation
modifies all branch offsets to reflect the new compressed address space. Each branch offset contains a 3-bit offset
and a 13-bit byte address. The average compression
ratios of 61%, 66%, and 74% are reported for the
PowerPC, ARM, and i386 processors, respectively.
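As an illustration of this style of dictionary-based compression, the sketch below is a deliberate simplification, not Lefurgy et al.'s actual algorithm: it works on single instruction words rather than sequences, and the one-byte codeword and function names are our own assumptions.

```python
from collections import Counter

# Simplified sketch of dictionary-based compression: repeated instruction
# words go into a dictionary; each occurrence becomes a short index token.
# (Lefurgy et al. compress whole instruction *sequences* with variable-
# length codewords; this sketch uses single words and one-byte indices.)
def dict_compress(instructions, min_count=2):
    counts = Counter(instructions)
    # Cap the dictionary at 256 entries so an index fits in one byte.
    dictionary = [w for w, c in counts.most_common(256) if c >= min_count]
    index = {w: i for i, w in enumerate(dictionary)}
    # Tag each word: ("CW", index) for dictionary hits, ("RAW", word) otherwise.
    return dictionary, [("CW", index[w]) if w in index else ("RAW", w)
                        for w in instructions]

def dict_decompress(dictionary, encoded):
    return [dictionary[v] if kind == "CW" else v for kind, v in encoded]
```

Decompression is a plain table lookup, which is what makes this family of schemes attractive for on-the-fly decoding in embedded memory systems.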
Wolfe and Chanin (1992) propose a statistical com-
pression method in Compressed Code RISC Processor
Fig. 1. ARM and Thumb instruction formats for Add instruction.
(CCRP). CCRP performs code compression based on
Huffman-encoding (Huffman, 1952). Each 32-byte cache
line is compressed into smaller aligned bytes or words.
Line Address Table (LAT) is used to map the original
program instruction addresses into compressed code
instruction addresses. The LAT generated by the compression tool is stored in memory along with the program. The LAT size is approximately 3% of the
original program size. Wolfe et al. report an average
compression ratio of 73% on MIPS R2000.
To further improve the compression ratio, more
similarities between instructions must be explored to
reduce both the dictionary size and the program
encoding. Araujo et al. (1998) find that most instruction sequences expanded from expression trees (Aho et al.,
1986) are identical in either their opcode sequences or
their operand sequences, but rarely both. Therefore, they separate the instruction sequences into tree-patterns (opcode
sequences) and operand patterns (operand sequences) as
shown in Fig. 2. This method is called operand factor-
ization. The concept of operand factorization comes
from the idea of superoperators and was first applied as the encoding technique for intermediate representation (IR)
compression (Proebsting, 1995). The common opcode
sequences of the expression trees are stored in the tree-
pattern dictionary (TPD) and the common operand se-
quences in the operand (pattern) dictionary. Each
instruction sequence is encoded into a tree-pattern
codeword and operand-pattern codeword. The decom-
pression engine reassembles the instruction sequence by combining the entries in both dictionaries indexed by the
codeword pairs. After separating the dictionary, the
dictionary size is reduced significantly, thus the entire
program size is reduced. The average compression ratio
for this scheme is 43% using Huffman-encoding (Huff-
man, 1952) and 48% using MPEG-2 VLC encoding
(Haskell et al., 1996).
A program written in a bytecoded, stack-based instruction set can be compressed by the grammar-based
compression method (Fraser and Evans, 2001). This
method transforms the language grammar, creating an
expanded grammar that represents the same language as
the original grammar, but permits a shorter derivation
of the sampled program. A program’s derivation under
Fig. 2. (a) Expression tree, (b) tree pattern and (c) operand pattern.
Table 1
Classification of instructions

Category  Format            Example instruction
1         op                nop
2         op src            mthi $rn
3         op dst            mfhi $rn
4         op imm            j address
5         op src, src       mult $rn, $rm
6         op src, imm       bgez $rn, address
7         op dst, src       move $rn, $rm
8         op dst, imm       lhi $rn, value
9         op src, src, imm  sw $rn, offset($rm)
10        op dst, src, src  add $rn, $rm, $rk
11        op dst, src, imm  lw $rn, offset($rm)
the expanded grammar forms the compressed bytecode
representation of that program. An average compression ratio of 36% is reported.
Some compiler techniques are also proposed to re-
duce code size. Cooper et al. apply the pattern-matching
algorithm (Fraser et al., 1984) to find out the repeating
instruction sequences and then achieve code compres-
sion via procedure abstraction and cross jumping tech-
niques (Cooper and McIntosh, 1999). The procedure
abstraction turns a given code region into a procedure and replaces the other instances of this code region with
calls to the newly created procedure, while the cross jump-
ing technique replaces the occurrences of the repeated
code region with jump instructions to a unique copy of
that code region. Branch rewriting, operation pattern
abstracting and register renaming techniques are also
used to increase code region repetitions. The average
program size reductions of 7.91% and 22.48% are reported for applying code compression only and applying
code compression plus code optimization, respectively.
Ernst et al. (1997) propose two compiler techniques,
called wire code and interpretable code, for the two
scenarios in which transmission and memory, respectively,
are the bottleneck. Both techniques gather information about
the common patterns that appear in the code and divide
the stream of code into two smaller streams: one holds the operators and the other holds the literal operands
for each operator that needs a literal operand. Both
smaller streams are gzipped and the compression ratio
of 22.49% is reported for wire code. The interpretable
code, also called BRISC, scans the input program,
generates candidate instruction patterns and estimates
the most potential size reduction for all candidate pat-
terns. The potential size reduction is achieved by operand specialization and opcode combination. Operand specialization separates the operands in the candidate
instruction one by one to achieve the maximum repeti-
tions of operands, while the opcode combination tries to
combine the most used opcode to achieve the maximum
repetition of opcode patterns. The repeating opcode
patterns and operand patterns are compressed using
dictionary-based algorithm. The average compression ratio of 59% is reported.
3. Register operand dependency
Observation shows that the instruction sequences
translated from the source code have dependencies be-
tween the register operands. Experiments are carried out
to measure the number of dependent registers. On average,
27.5% of source registers depend on the
destination registers of previous instructions (true
data dependency (Johnson, 1991)) and 12.1% of destina-
tion registers depend on previous destination
registers (output dependency) for the MediaBench bench-
marks (Lee et al., 1997) compiled for the MIPS instruction
set. This reality motivates us to compress the program
by utilizing these dependencies. The size of the dictionary storing the operand sequences can be reduced after
removing the dependent registers from operand se-
quences. The number of distinct operand sequences is
reduced as a result, which in turn reduces the size of the
compressed program. The following procedures describe the
compression algorithm using this technique.
3.1. Instruction classification
This step examines the types of operands in instruc-
tion formats to find the register dependency. To build
the compression model, we are required to examine the
instruction set of the MIPS R2000 processor (Gerry,
1988). The instructions are classified into the following
categories shown in Table 1 according to the different
types of operands. In Table 1, op indicates the opcode of an
instruction, src (dst) indicates the register operand is
used as a source (destination) register for operation op,
and imm indicates the field is an immediate value. This
classification is based on the register types and the
number of registers the opcode actually uses.

Fig. 4. Representations of operand relations and residual operand sequences.

The instructions 'nop' and 'move' are pseudo instructions,
not actual assembly instructions. The pseudo instruction
'move $rn, $rm' can be implemented by either the assembly
instruction 'addu $rn, $rm, $r0' or 'or $rn, $rm, $r0'. The
opcode �addu’ for different operand sequences (�$rn,$rm, $rk’ ($rk 6¼ $r0) and �or $rn, $rm, $r0’, for example)
are encoded into different new opcodes. Encoding the
same opcode with different types of operand sequencesinto different opcodes reduces code size more than only
encoding the different operands. There are two reasons:
(a) There are only a small number of opcodes for the MIPS
R2000. Encoding the opcodes used in different pseu-
do instructions distinctly adds fewer bits than
the encoding for the distinct operand sequences them-
selves.
(b) Some opcodes are likely to use a specific operand,
such as 'lw $reg, offset($sp)'. Encoding such cases
(opcode + likely operand) into a new opcode can
shorten the operand sequence, and shorter operand
sequences are more likely to be identical to each
other.
This classification is then used to find the dependency relationship between source and destination registers.
3.2. Register operand dependency
Observation reveals that instruction sequences may
have the same dependency relations between registers
even if they have different opcode sequences or operand
sequences. Consider the two instruction sequences in Fig. 3. The
second source register of the second instruction is the
same as the destination register of the first instruction. A
mapping tag is used to indicate this dependency. Each
source register has a mapping tag. For an n-bit mapping
tag, we assign the values:
(a) '0': for a load operation that reads a register id from the operand sequence,
(b) '1' to '2^n − 1': for a relation operation giving the num-
ber of instructions from the preceding instruction
with a destination register of the same register id
to the current instruction.
Fig. 3. Example opcode sequence and operand sequences.
Note that a non-zero mapping tag value (relation
operation), also called the mapping distance, counts
only the number of instructions with a true data
dependency (Johnson, 1991). The remaining unmappable
registers and the immediate values are listed and
called the residual operand sequence. Taking a 2-bit
mapping tag as an example, value '0' indicates the load
operation and values '1', '2' and '3' indicate relation
operations. Fig. 4 shows the representation of relations
and the residual operand sequences.
We explain the representations of operand relations
for the first and the second instructions:
(a) The first instruction does not contain any source
register, so no mapping tag is needed.
All its operands are listed in the residual operand
sequence.
(b) In the second instruction, the first source register
($r2) has not appeared before, so its mapping tag
value is '0' and the register is put in the residual
operand sequence. The second source register ($r3)
is the same as the destination register of the first
instruction, so its mapping tag value '1' indicates
that this source register equals the destination
register of the instruction one position up.
The last register is a destination register and is
put in the residual operand sequence.
The other instructions are processed in the same way. Remarkably, the two instruction sequences in the
example have the same mapping value sequence although they are quite different instruction sequences.
This method not only extracts the common register
relations, but also reduces the number of operands in
the original operand sequence to increase the re-usage of
the residual operand sequences. This step attempts to locate the register dependencies for all instructions in the
basic blocks. This dependency relationship does not
cross the basic block boundary since the registers are not
guaranteed to be alive across the basic block.
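The tagging procedure described above can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation; the (dst, srcs) encoding of an instruction is our own assumption.

```python
# Illustrative sketch of mapping-tag assignment within one basic block.
# Each instruction is (dst, srcs): a destination register id (or None) and
# a list of source register ids. Tag 0 = load the id from the residual
# operand sequence; tag d in 1..2^n - 1 = same id as the destination of
# the instruction d positions back (true dependency only).
def map_registers(block, tag_bits=2):
    max_dist = (1 << tag_bits) - 1
    tags, residual = [], []
    for i, (dst, srcs) in enumerate(block):
        inst_tags = []
        for src in srcs:
            tag = 0
            # Search predecessors, nearest first, within the tag's reach.
            for d in range(1, min(i, max_dist) + 1):
                if block[i - d][0] == src:
                    tag = d
                    break
            inst_tags.append(tag)
            if tag == 0:
                residual.append(src)   # unmapped: stays in residual sequence
        if dst is not None:
            residual.append(dst)       # destinations always stay residual
        tags.append(inst_tags)
    return tags, residual
```

For the two-instruction fragment of Fig. 4 (an instruction defining $r3, followed by one reading $r2 and $r3 and defining $r2), `map_registers([(3, []), (2, [2, 3])])` yields tags `[[], [0, 1]]` and residual sequence `[3, 2, 2]`, matching the worked example.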
3.3. Register majority
After removing the dependent source registers from the operand sequence, the residual operand sequences still
contain redundancies. There are some registers appear-
ing frequently in different residual operand sequences.
We call the most frequent register among all residual
operand sequences the first (register) majority, the sec-
ond most frequent register the second majority, and so
on. Effective reduction of the dictionary size can be
achieved by removing the majorities. We borrow values from the mapping tag to denote the mapping of the
independent registers to the majorities. Assuming that
there are m majorities, for an n-bit mapping tag, we
assign tag values:
(a) �0’: reserved for load operation,
(b) �1’ to �2n � m� 1’: for relation operations, and
(c) �2n � m’ to �2n � 1’: for majority relations which indi-cate the first majority, the second majority, and so
on.
The register majorities are different from one program to
another. These majorities are stored in dedicated regis-
ters, called Majority Registers (MRs), so that these MRs
can be mapped by mapping tag. Refer to the example in
Fig. 4. Assuming that register '$r5' is the first and the only majority for a 2-bit mapping tag, tag value '3' is reserved for the majority. The instruction sequence in Fig. 4
can then be transformed into Fig. 5.
In this step, the occurrences of the mapping tags with
load operations are replaced with the majority opera-
tions if the corresponding registers are majorities.
Fig. 5. Instruction sequence after applying the majorities.
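The majority step can be sketched as a follow-on pass over the tagged slots. This is a hypothetical sketch under our own data representation, not the authors' code: `slots` is a flat list of (tag, reg) pairs, where reg is the id a tag-0 slot would load and None for non-zero tags.

```python
from collections import Counter

# Hypothetical sketch of the majority step: tag-0 slots whose register is
# one of the m most frequent residual registers are remapped onto the
# reserved tag values 2^n - m .. 2^n - 1, and those registers drop out of
# the residual sequence.
def apply_majorities(slots, tag_bits=2, m=1):
    loads = [reg for tag, reg in slots if tag == 0]
    majorities = [r for r, _ in Counter(loads).most_common(m)]
    first_maj_tag = (1 << tag_bits) - m
    new_slots, residual = [], []
    for tag, reg in slots:
        if tag == 0 and reg in majorities:
            # Replace the load operation with a majority relation.
            new_slots.append((first_maj_tag + majorities.index(reg), None))
        else:
            new_slots.append((tag, reg))
            if tag == 0:
                residual.append(reg)   # still loaded from the residual sequence
    return new_slots, residual, majorities
```

With a 2-bit tag, m = 1 and $r5 as the only majority (the Fig. 5 setting), a tag-0 load of register 5 becomes tag '3' and register 5 disappears from the residual sequence.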
3.4. Program encoding
After removing the registers with destination–source
and majority relationship from the operand sequences,
the program is divided into instruction sequences as the
basic unit for compression. An instruction sequence is
defined as a number of instructions in which each instruction (except the last one) has a destination–source
relationship to the following instructions. Then, each
instruction sequence is operand-factorized, has its
dependent registers extracted, and is partitioned into three sequences: the opcode sequence, the mapping tag sequence and the
residual operand sequence. Every instruction sequence is
independently Huffman-encoded into codewords CWOP, CWMAP and CWROD, representing the opcode sequences, mapping tag sequences and residual operand sequences, respectively. The entire program is transformed
into the form: [CWOP1, CWMAP1, CWROD1, CWOP2,
CWMAP2, CWROD2, . . . , CWOPi, CWMAPi, CWRODi],
assuming that the program has a total of i instruction
sequences. Codewords are allowed to split at byte
boundaries: bits from a split codeword spill into the
next byte. The compression ratio benefits from splitting
codewords and from Huffman-encoding, but this incurs an
execution performance penalty. We trade off performance
in exchange for a better compression ratio.
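The spilled-codeword layout above amounts to plain bit concatenation. The sketch below illustrates it; the zero padding of the final byte is our own assumption, not specified in the paper.

```python
# Sketch of the spilled-codeword layout: variable-length codewords (given
# here as bit strings) are simply concatenated, and a codeword that ends
# mid-byte spills its remaining bits into the next byte.
def pack_codewords(codewords):
    bits = "".join(codewords)
    bits += "0" * (-len(bits) % 8)          # pad the last byte (assumption)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
```

For example, packing the codewords "101", "01" and "1101" produces the two bytes 0b10101110 and 0b10000000: the third codeword contributes three bits to the first byte and spills its last bit into the second.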
3.5. Branch target address
One side effect of code compression is that it alters the
locations of instructions in the program due to the
instruction size reduction. The branch offsets must be
patched according to the change of the target addresses
so as to avoid an unnecessary translation from
uncompressed address space into compressed address
space. Since the codewords can be of any length and not
necessarily byte aligned, the branch target must be able to point at any bit location within a byte. The branch
offset is divided into two fields:
(a) byte address (13 or 23 bits depending on the branch
type), and
(b) bit offset (3 bits).
Since the original offset specifies the address at word granularity and the new offset specifies the address at
bit granularity, the overall branch distance is reduced to
1/32. Nevertheless, for the program analyzed, only a
small percentage of targets require more than 23 bits.
For those branches, a jump table is provided for storing
the target addresses. Similar to Lefurgy et al.'s (1997) approach,
the jump table addresses are patched up to reflect the
compressed addresses. For systems supporting dynamically
linked procedure calls (or dynamically linked libraries, DLLs), a separate translation table
such as LAT (Wolfe and Chanin, 1992) may be more
suitable for the operating system to dynamically allocate
memory space for these procedures at run time.
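The two-field branch offset can be sketched as a simple bit packing. The field widths (a 13- or 23-bit byte address plus a 3-bit bit offset) come from the paper; the exact packing order is our own assumption.

```python
# Hedged sketch of the two-field branch offset: a byte address plus a
# 3-bit offset addressing individual bits within that byte.
def encode_branch_offset(bit_address, byte_bits=13):
    byte_addr, bit_off = divmod(bit_address, 8)
    assert byte_addr < (1 << byte_bits), "out of range: use the jump table"
    return (byte_addr << 3) | bit_off       # [byte address | 3-bit bit offset]

def decode_branch_offset(field):
    return (field >> 3) * 8 + (field & 0b111)
```

Targets whose byte address does not fit in the field correspond to the jump-table case discussed above.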
4. Decompression engine
This section describes the decompression engine de-
signed for our compression method. The decompression
engine mainly consists of three dictionaries and an
Instruction Assembly Buffer (IAB) as shown in Fig. 6.
4.1. Dictionaries
Three dictionaries store the opcode sequences, map-
ping tag sequences and residual operand sequences,
independently. The opcode sequences are stored in the
Opcode Dictionary (OPD) with each entry containing
three fields:
(a) opcode field: the opcode bits of an instruction,
(b) mapping type field: a two-bit field indicating how
many mapping tags must be read for this instruction, and
(c) end mark: signifies the end of an instruction se-
quence.
Fig. 6. Decompression engine.
Since the maximal number of source registers for
instructions is 2, 2 bits are sufficient to distinguish zero,
one and two registers. The same opcodes with different
mapping types will be encoded into different OPD en-
tries due to their different mapping type fields.
The mapping tag sequences are stored in the Mapping
Dictionary (MAPD). We optimize the MAPD size by
letting a shorter mapping tag sequence share storage
with a longer one whose prefix sub-sequence is identical
to the shorter sequence.
The Residual Operand Dictionary (ROD), which stores
the residual operand sequences, consists of two storages:
(a) Register Generation (RGEN): stores the register lists of all residual operands and the register majorities.
(b) Immediate value Dictionary (IMD): stores the distinct immediate values in the program.
As proposed by Araujo et al. (1998), residual operand
sequence with immediate values can be used to minimize
the RGEN. The register majorities are stored into the majority registers before the compressed program begins
execution. When the residual operand sequence contains
an immediate value, the Register Bus (RegBus) signal associated
with the immediate operand is not needed and can be
treated as a don't-care. For the example in Fig. 5, the residual
operand sequences '$r3, 0xfcb08, $r2, $r2, 0x1c' and
'$r3, $r6, 0xa004, $r2, 0xa002505c' can share the com-
mon register sequence '$r3, $r6, $r2, $r2, $r?' ($r? means don't-care). When an instruction is assembled, all
registers are sent to the IAB via RegBus, but the register
with the corresponding position of an immediate value
will be ignored by IAB according to the opcode from
OPD. On the other hand, immediate values are stored in
the IMD for each distinct immediate value in the pro-
gram, no matter which residual operand sequence con-
tains it. These values are clustered into memory banks according to the number of bits consumed. Accesses to the IMD
and RGEN can proceed in parallel to accelerate the
decompression.
4.2. Instruction Assembly Buffer (IAB)
The CWOP, CWMAP and CWROD are extracted from
the compressed program to index instruction opcodes,
mapping tags and residual operands, respectively.
The fetched opcodes, mapping tags, registers and
immediate values are sent to the IAB to reassemble the
original instruction sequences. The IAB must decide:
(a) How many mapping tags it must read according to
the mapping type in opcode dictionary.
(b) Where the register operands are to be fetched (Reg-
ister Bus, Mapping Queue or Majority Registers)
depending upon the value of mapping tag.
(c) Whether the immediate value is used depending
upon the opcode.
Mapping Queue (MQ) is a register mapping queue
that stores the destination register ids of the previous
several instructions. When the tag value is a relation operation, the register id in the MQ at the position
specified by the mapping tag value is fetched for the
current source register. After assembling the instruc-
tion, the decompression engine pushes the destination
register id into the MQ so that the further mapping tag
can reference it. The number of entries in the MQ is
equal to the maximal mapping distance defined previously.
Fig. 7 shows the decompression steps for the
example in Fig. 5. The first instruction contains no
source register, so no mapping tag is needed. All its
operands come from the corresponding residual operand
sequence. This instruction defines a destination regis-
ter, so the decompression engine pushes the destina-
tion register id ($r3) into mapping queue as shown in
Fig. 7(a). The second instruction reads two mapping tags:
(a) The first mapping tag value �0’ indicates that the
source register ($r2) comes from residual operand
sequence.
(b) The second mapping tag value '1' indicates that the
second source register ($r3) is the first register id in
the MQ, which is the destination register of the previous instruction.
The last register is a destination register. This register
must be read from the residual operands and is pushed
into the MQ as shown in Fig. 7(b). The last instruction is
processed in the same way as the second instruction as
shown in Fig. 7(c) except that:
Fig. 7. Decompression steps.
(a) The first mapping tag value '3' indicates that the register id comes from the Majority Register (MR).
(b) This instruction defines no destination register, so
there is no need to push the destination register id
into the MQ.
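The walkthrough above can be condensed into a decompression sketch. This is a hypothetical reconstruction, not the engine's actual logic: the (opcode, n_srcs, has_dst) form of OPD entries is our own simplification of the opcode and mapping-type fields.

```python
from collections import deque

# Hypothetical decompression sketch mirroring Fig. 7: source registers are
# rebuilt from mapping tags using a Mapping Queue (MQ) of recent
# destination ids and the Majority Registers (MRs).
def decompress_block(opcode_seq, tag_seq, residual, majorities, tag_bits=2):
    first_maj = (1 << tag_bits) - len(majorities)
    # Relation tags span 1 .. 2^n - m - 1, so the MQ needs that many slots.
    mq = deque(maxlen=first_maj - 1)
    tags, res = iter(tag_seq), iter(residual)
    out = []
    for opcode, n_srcs, has_dst in opcode_seq:
        srcs = []
        for _ in range(n_srcs):
            tag = next(tags)
            if tag == 0:
                srcs.append(next(res))                    # residual sequence
            elif tag >= first_maj:
                srcs.append(majorities[tag - first_maj])  # majority register
            else:
                srcs.append(mq[tag - 1])                  # MQ, distance = tag
        dst = next(res) if has_dst else None
        if has_dst:
            mq.appendleft(dst)    # newest destination at the front of the MQ
        out.append((opcode, srcs, dst))
    return out
```

Running it on the Fig. 5/7 example values (tag sequence [0, 1, 3, 0], residual sequence [3, 2, 2, 6], majority $r5) reproduces the three assembled instructions, with the MQ hit for $r3 and the MR hit for $r5.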
5. Experimental results
The following section describes the experimental re-
sults of code compression by register dependency. We
use the MediaBench for analyzing this technique. The
programs are compiled for the MIPS R2000 processor
using GCC version 2.8.1 with optimization option -O2. We first examine the suitable mapping tag size to find
the best compression ratio, and then compare the
compression ratios between this method and the oper-
and factorization method.
5.1. Size of mapping tag
It is critical to determine what size of mapping tag is sufficient for compacting both the dictionaries and the
program encoding. Simulation is used to find the suit-
able tag size. The simulated tag sizes range from 1 to 5
bits (since the encoding of register in the original pro-
gram is 5 bits long), and the number of majorities ranges
from �0’ to �2tag size � 1’. Following subsections describe
the compacting effect using different sizes of mapping
tag.
5.1.1. RGEN size reduction
Fig. 8 shows the size reduction of RGEN versus the
size of mapping tag. The x-axis is the number of
majorities and the y-axis is the RGEN size ratio to the
original program size. This graph consists of six data
sets classified into two categories:
Fig. 8. Size reduction of RGEN vs. mapping tag size.
Fig. 10. Dictionary size of RGEN plus MAPD vs. mapping tag size.
Fig. 11. Final compression ratio vs. mapping tag size.
(a) The first data set: a single point indicating the size
ratio of RGEN resulted from the operand factoriza-
tion method.
(b) The other data sets: consisting of 2^n points each, for
n = 1 to 5, showing the size ratios using an n-bit mapping tag with 0, 1, ..., 2^n − 1 majorities.
In the second data set (the n = 1 case), we see
that the size reduction due to dependency is greater than
that due to majorities. Furthermore, the last point of each
curve always tilts up. This is because the number of
registers removed from the residual operand sequences by mapping distance 1 is sufficiently larger than the
number of (2^n − 1)th majority registers. From Fig. 8, a
5-bit mapping tag with 30 majorities reduces the RGEN
size the most.
5.1.2. Mapping penalty
Although the size of RGEN is reduced, we introduce
the MAPD. Fig. 9 shows the MAPD size compared to the original program. The x-axis is the number of
majorities and the y-axis is the MAPD size ratio. For-
tunately, the size of MAPD is much smaller than the size
reduced in RGEN. The overall effect is still positive for
compression. Fig. 10 shows the size of MAPD plus
RGEN. The overall dictionary size reduction is 8% when
using the register dependency compression method.
Fig. 9. MAPD size vs. mapping tag size.
5.2. Compressed code reduction
Fig. 11 shows the final compression ratios versus the
mapping tag size. Since the size of RGEN is reduced,
the size of the program encoding is reduced as a result.
The main benefit to the program encoding
is the reduced number of residual operand entries,
which is almost proportional to the size of RGEN plus
IMD. As we expected, a 5-bit mapping tag with 30
majorities results in the best compression ratio.
To detail the benefits of compression using our
method, we examine the size contributions of the dictionaries and the program encodings.

Table 2
Size reduction of dictionary and program encoding

Method                  Dictionary size  Compressed code size  Dictionary size reduction  Program encoding reduction
Operand factorization   0.1516           0.3337                N/A                        N/A
1-bit mapping tag       0.1262           0.3920                0.0254                     −0.0583
2-bit mapping tag       0.1053           0.3566                0.0463                     −0.0229
3-bit mapping tag       0.0855           0.3379                0.0661                     −0.0042
4-bit mapping tag       0.0720           0.3218                0.0796                     0.0119
5-bit mapping tag       0.0628           0.3060                0.0888                     0.0277

Fig. 12. Size ratios for all components in a compressed program.
Fig. 13. Final compression ratios for all benchmarks.

Fig. 12 shows the size contribution of each component in the compression
ratio. The x-axis shows the mapping tag size (# bits) and the y-axis shows the best compression ratio for each
size of mapping tag. There are seven components:
(a) OPD: opcode dictionary,
(b) IMD: immediate value dictionary storing distinct
values in the program,
(c) RGEN: register generation storing the register lists of the
residual operands,
(d) MAPD: mapping tag dictionary,
(e) CWOP: encoding of the opcode sequences,
(f) CWMAP: encoding of the mapping tag sequences,
and
(g) CWROD: encoding of residual operand sequences.
Fig. 12 shows that the three largest components in
operand factorization change from RGEN, CWOP and CWROD into CWOP and CWMAP. The sizes of RGEN and
CWROD are reduced the most. Table 2 shows the size reductions
when the components are classified into two major
parts:
(a) Dictionary part: consists of OPD, MAPD, RGEN
and IMD.
(b) Program encoding part: consists of the encoding of opcode sequences, mapping tag sequences and residual
operand sequences.
Table 3
Detailed component contribution for benchmark programs

Benchmark   Method    OPD (%)  IMD (%)  RGEN (%)  MAPD (%)  CWOP (%)  CWROD (%)  CWMAP (%)  Final (%)
cjpeg       Od Fact   1.01     2.37     11.78     –         12.54     20.83      –          48.53
            Reg Map   1.01     2.37     1.33      1.57      12.54     4.36       13.70      36.88
decode      Od Fact   2.02     2.14     13.46     –         13.04     18.98      –          49.64
            Reg Map   2.02     2.14     3.57      2.17      13.04     4.96       13.19      41.09
epic        Od Fact   1.19     3.35     11.72     –         13.29     19.95      –          49.50
            Reg Map   1.19     3.35     1.29      1.99      13.29     4.34       13.73      39.18
jpgtran     Od Fact   0.83     2.38     11.18     –         12.47     21.07      –          47.93
            Reg Map   0.83     2.38     1.35      1.34      12.47     4.73       13.77      32.14
mpeg2dec    Od Fact   1.48     3.69     13.15     –         13.15     20.31      –          51.78
            Reg Map   1.48     3.69     2.20      2.02      13.15     5.18       13.42      41.14
rawcaudio   Od Fact   1.89     3.20     13.30     –         12.65     17.79      –          48.83
            Reg Map   1.89     3.20     1.87      2.34      12.65     4.54       12.94      39.43
rdjpgcom    Od Fact   1.70     3.39     12.94     –         12.61     18.33      –          48.97
            Reg Map   1.70     3.39     1.67      2.11      12.61     4.47       13.07      39.02
Avg         Od Fact   1.44     2.93     12.50     –         12.82     19.60      –          49.31
            Reg Map   1.44     2.93     1.89      1.93      12.82     5.94       12.11      38.41

From Table 2, the largest factor in reducing the
compression ratio is the reduction in dictionary size
rather than in program encoding.
Fig. 13 shows the detailed compression ratios for all
benchmark programs. Each benchmark consists of two
bars, one for operand factorization (Od) and the other
for our register dependency method (Reg). The OPD, IMD and
CWOP are invariant between these two methods. The average
decrement of RGEN is 11% and the increment of
MAPD is 2%. This is the main advantage of the register
dependency method. The average decrement of CWROD is
14%, but the increment of CWMAP is 12%. Full detailed
statistics are given in Table 3.
6. Conclusion
In this paper, we propose to use register dependency to compress embedded system programs for a MIPS processor. The key idea of this method is to remove the dependent registers from the operand sequences to reduce the dictionary size and the encoded program size. The success in reducing code size is ascribed to:

(a) Encoding the opcodes together with their likely used operands into new opcodes shortens the operand sequences and increases the reuse of common operand sequences.
(b) The destination–source relationship eliminates the register specifications in a series of instructions manipulating the identical data.
(c) The majority relationship eliminates multiple occurrences of the same register in a residual operand sequence, increasing the reuse of common operand sequences.
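Point (b), the destination–source mapping, can be sketched as follows. This is our reconstruction for exposition, not the original implementation; the instruction representation and the search window size are assumptions:

```python
# Illustrative sketch of the destination-source mapping idea: each source
# register that matches the destination of a recent predecessor instruction
# is replaced by a small mapping tag (its backward distance), so the register
# name can be dropped from the residual operand sequence.

WINDOW = 7  # assumed: a 3-bit tag could encode distances 1..7 plus "no match"

def map_sources(instrs):
    """instrs: list of (opcode, dest, [sources]). For each source operand,
    return ('tag', distance) if a predecessor within WINDOW wrote that
    register, else ('reg', name) to keep it as a residual operand."""
    mapped = []
    for i, (_, _, srcs) in enumerate(instrs):
        row = []
        for s in srcs:
            dist = 0
            for back in range(1, min(i, WINDOW) + 1):  # nearest first
                if instrs[i - back][1] == s:
                    dist = back
                    break
            row.append(('tag', dist) if dist else ('reg', s))
        mapped.append(row)
    return mapped

prog = [
    ('lw',  'r2', ['r29']),
    ('add', 'r3', ['r2', 'r4']),   # r2 was produced 1 instruction earlier
    ('sw',  None, ['r3', 'r29']),  # r3 was produced 1 instruction earlier
]
print(map_sources(prog))
```

Only the untagged sources (here r29 and r4) remain in the residual operand sequence, which is what shrinks RGEN and CWROD.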
This research can be further improved in two ways:

(a) From Fig. 12, the program encodings of both the opcode sequences and the mapping tag sequences are the largest portions contributing to the compression ratio. Reducing the mapping tag size and reusing the OPD entries are the next goals for reducing both components in the compressed code.
(b) The algorithm can be improved to find more relations between operands. Such improvement may include building both the language grammar and the register allocation rules, and compressing the instruction sequences into derivations of these rules.
References
Aho, A. et al., 1986. Compilers: Principles, Techniques and Tools.
Addison-Wesley, Boston.
Araujo, G., et al., 1998. Code compression based on operand
factorization. In: 31st Annual ACM/IEEE International Sympo-
sium on Microarchitecture.
Cooper, K.D., McIntosh, N., 1999. Enhanced code compression for
embedded RISC processors. In: ACM SIGPLAN ’99 Conference
on Programming Language Design and Implementation SIG-
PLAN Notices, 34 (5), pp. 139–149.
Ernst, J. et al., 1997. Code compression. In: 1997 ACM SIGPLAN
Conference on Programming Language Design and Implementa-
tion SIGPLAN Notices, 32 (5), pp. 358–365.
Fraser, C.W., Evans, W., 2001. Bytecode compression via profiled
grammar rewriting. In: Proceedings of the International Conference on
Programming Language Design and Implementation.
Fraser, C.W. et al., 1984. Analyzing and compressing assembly code.
SIGPLAN Notices 19 (6), 117–121.
Kane, G., 1988. MIPS RISC Architecture. Prentice-Hall.
Haskell, B.G. et al., 1996. Digital Video: An Introduction to MPEG-2.
Chapman & Hall.
Huffman, D.A., 1952. A method for the construction of minimum
redundancy codes. Proc. IRE 40 (9), 1098–1101.
Johnson, M., 1991. Superscalar Microprocessor Design. Prentice-Hall.
Kissell, K., 1997. MIPS16: High-density MIPS for the Embedded
Market. Silicon Graphics MIPS Group.
Lee, C., et al., 1997. MediaBench: a tool for evaluating and
synthesizing multimedia and communications systems. In: 30th
Annual ACM/IEEE International Symposium on Microarchitec-
ture.
Lefurgy, C., et al., 1997. Improving code density using compression
techniques. In: 30th Annual International Symposium on Microar-
chitecture.
Proebsting, T.A., 1995. Optimizing an ANSI C interpreter with
superoperators. In: Proceedings of ACM Conference on Principles
of Programming Languages, pp. 322–332.
Turley, J.L., 1995. Thumb squeezes ARM code size. Microproc. Rep. 9
(4), 27.
Wolfe, A., Chanin, A., 1992. Executing compressed programs on an
embedded RISC architecture. In: 25th Annual International
Symposium on Microarchitecture.
Kelvin Lin received the M.E. and Ph.D. degrees in computer engineering from the Department of Computer Science and Information Engineering, National Chiao Tung University, Taiwan, ROC in 1995 and 2002, respectively. He is an IC design engineer at the R&D Division of VIA Technologies, Inc.
Jean Jyh-Jiun Shann received the Ph.D. degree in computer engineering from the Department of Computer Science and Information Engineering, National Chiao Tung University, Taiwan, ROC in 1994 and the M.E. degree in computer engineering from the Department of Electrical and Computer Engineering, University of Texas at Austin, USA. She is an Associate Professor at the Department of Computer Science and Information Engineering of Chiao Tung University, Taiwan, ROC. Her research interests include parallel processing, neural networks and fuzzy systems.
Chung-Ping Chung received the M.E. and Ph.D. degrees in electrical engineering from Texas A&M University, College Station, TX, USA in 1981 and 1986, respectively. He is a Professor at the Department of Computer Science and Information Engineering of Chiao Tung University, Taiwan, ROC. His research interests include computer architecture, parallel processing, VLSI system design and system simulation.