
The Journal of Systems and Software 72 (2004) 295–304

www.elsevier.com/locate/jss

Code compression by register operand dependency

Kelvin Lin *, Jean Jyh-Jiun Shann, Chung-Ping Chung

Department of Computer Science & Information Engineering, National Chiao Tung University, 1001 Ta Hsueh Road, HsinChu 300, Taiwan, ROC

Received 1 January 2002; received in revised form 26 July 2002; accepted 21 February 2003

Available online 5 December 2003

Abstract

This paper proposes a dictionary-based code compression technique that maps the source register operands to the nearest occurrence of a destination register in the predecessor instructions. The key idea is that most destination registers have a great possibility of being used as source registers in the following instructions. The dependent registers can be removed from the dictionary if this information can be specified otherwise. Such destination–source relationships are so common that exploiting them can result in much better code compression. After removing the dependent register operands, the original dictionary size can be reduced significantly. As a result, the compression ratio benefits from: (a) the reduction of dictionary size due to the removal of dependent registers, and (b) the reduction of program encoding due to the reduced number of dictionary entries.

A set of programs has been compressed using this feature. The compression results show that the compression ratio is reduced to 38.41% on average for MediaBench benchmarks compiled for the MIPS R2000 processor, as opposed to 45% using operand factorization.

© 2003 Elsevier Inc. All rights reserved.
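For reference, the compression ratio figures quoted in this paper follow the standard convention in the code-compression literature, in which a smaller value is better. A commonly used formulation is shown below; whether auxiliary tables such as dictionaries are counted in the numerator varies between studies, so that term should be read as an assumption rather than this paper's exact definition:

```latex
\text{compression ratio} = \frac{\text{compressed program size} + \text{dictionary size}}{\text{original program size}}
```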

1. Introduction

Most embedded systems are cost sensitive. A small memory results in lower cost and lower power requirements. Typically, programs in an embedded system are stored in a ROM associated with an ASIC, whose sizes translate directly into silicon area and cost. Thus, memory size reduction becomes more important in the design of an embedded system. In addition, as the complexity of an embedded system grows, programming in assembly language and optimization by hand are no longer practical or economical. Programs are written in high-level languages (HLL), such as C and C++, and compiled into executables. Direct translation from high-level languages into machine code incurs a code-size penalty, because each HLL statement is translated completely into machine instructions. Some code optimizations, such as redundant code removal or common sub-expression elimination, must be applied as extra passes (Aho et al., 1986). Most compiler optimizations focus on execution speed rather than code size, and this results in a speed-space trade-off. Therefore, code size optimization has great potential under such a programming environment.

This paper proposes a code compression technique that substantially reduces code size. The method is based on operand factorization (Araujo et al., 1998), but separates the instruction sequence differently, into the opcode sequence, the mapping sequence, and the residual operand sequence. The key idea of this method is that a destination register has a great possibility of being used as a source register in a following instruction. We use a mapping tag to specify the relationship between source registers and destination registers so that occurrences of the same registers can be eliminated from the operand sequence used in the operand factorization method. We find that the variation of these relations is much smaller than that of the operands themselves. The dictionary storing the mapping information occupies only a small amount of space, and the size of the dictionary storing the operands is reduced significantly. As a result, program encoding benefits from the reduced number of entries of the operand sequences. Experimental results show that the compression ratio reaches 38.41% on average for MediaBench (Lee et al., 1997) compiled for the MIPS R2000 processor.

This paper is organized as follows: Section 2 discusses related work in code compression; Section 3 presents the detailed register dependency method; Section 4 describes the decompression engine; Section 5 presents the simulation results; and Section 6 summarizes this work.

* Corresponding author. Address: No. 193, Jhongsiao N. Rd., Gueiren Township, Tainan County, 711 Taiwan, ROC. E-mail addresses: [email protected], [email protected] (K. Lin), [email protected] (J.J.-J. Shann), [email protected] (C.-P. Chung).

0164-1212/$ - see front matter © 2003 Elsevier Inc. All rights reserved. doi:10.1016/S0164-1212(03)00214-0

2. Related work

An intuitive way to reduce code size is to restrict the instruction size, as shown in Fig. 1. This is the approach adopted in the design of Thumb (Turley, 1995) and MIPS16 (Kissell, 1997). Shorter instructions are achieved primarily by restricting the number of bits that encode opcodes, registers and immediate values. Fewer opcodes mean that an operation might require multiple instructions, and fewer registers imply less freedom for the compiler to perform important tasks, such as global register allocation. The results are 30–40% smaller programs running 15–20% slower than programs using a standard RISC instruction set.

Another way to reduce code size is compression, which encodes occurrences of identical instructions (or instruction sequences) in a program into smaller codewords. Lefurgy et al. (1997) propose a dictionary-based compression method, which stores one copy of the common instruction sequences in a dictionary and replaces the occurrences of the sequences with codewords (fixed- or variable-length) shorter than the instruction sequences themselves. All codewords are aligned at byte boundaries to achieve the best compression ratio. Post-compilation patching modifies all branch offsets to reflect the new compressed address space; each branch offset contains a 3-bit bit offset and a 13-bit byte address. Average compression ratios of 61%, 66%, and 74% are reported for the PowerPC, ARM, and i386 processors, respectively.
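The dictionary idea above can be illustrated with a small sketch. This is not Lefurgy et al.'s actual algorithm (which uses variable-length codewords and more careful dictionary selection); it is a toy Python model with fixed-length windows, a capped dictionary, and symbolic codewords, purely to show the replace-repeats-with-indices mechanism:

```python
from collections import Counter

def dictionary_compress(instructions, seq_len=2, max_entries=256):
    """Toy dictionary compression: repeated fixed-length instruction
    sequences are replaced by short ('CW', index) codewords."""
    # Count every fixed-length window of consecutive instructions.
    windows = [tuple(instructions[i:i + seq_len])
               for i in range(len(instructions) - seq_len + 1)]
    repeats = [seq for seq, n in Counter(windows).most_common(max_entries) if n > 1]
    dictionary = {seq: idx for idx, seq in enumerate(repeats)}

    compressed, i = [], 0
    while i < len(instructions):
        seq = tuple(instructions[i:i + seq_len])
        if seq in dictionary:
            compressed.append(('CW', dictionary[seq]))   # dictionary codeword
            i += seq_len
        else:
            compressed.append(('RAW', instructions[i]))  # left uncompressed
            i += 1
    return compressed, dictionary
```

A repeated two-instruction sequence thus costs one codeword per occurrence plus a single dictionary copy, which is where the size reduction comes from.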

Wolfe and Chanin (1992) propose a statistical compression method in the Compressed Code RISC Processor (CCRP). CCRP performs code compression based on Huffman encoding (Huffman, 1952). Each 32-byte cache line is compressed into smaller aligned bytes or words. A Line Address Table (LAT) is used to map original program instruction addresses to compressed code instruction addresses. The LAT generated by the compression tool is stored in memory along with the program and amounts to approximately 3% of the original program size. Wolfe et al. report an average compression ratio of 73% on the MIPS R2000.

Fig. 1. ARM and Thumb instruction formats for the Add instruction.

To further improve the compression ratio, more similarities between instructions must be explored to reduce both the dictionary size and the program encoding. Araujo et al. (1998) find that most instruction sequences expanded from expression trees (Aho et al., 1986) are identical in either their opcode sequences or their operand sequences, but rarely both. Therefore, they separate the instruction sequences into tree patterns (opcode sequences) and operand patterns (operand sequences), as shown in Fig. 2. This method is called operand factorization. The concept of operand factorization comes from the idea of superoperators and was first applied as an encoding technique for intermediate representation (IR) compression (Proebsting, 1995). The common opcode sequences of the expression trees are stored in the tree-pattern dictionary (TPD) and the common operand sequences in the operand (pattern) dictionary. Each instruction sequence is encoded into a tree-pattern codeword and an operand-pattern codeword. The decompression engine reassembles the instruction sequence by combining the entries in both dictionaries indexed by the codeword pair. After separating the dictionaries, the dictionary size is reduced significantly, and thus the entire program size is reduced. The average compression ratio for this scheme is 43% using Huffman encoding (Huffman, 1952) and 48% using MPEG-2 VLC encoding (Haskell et al., 1996).
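Operand factorization as described above can be sketched in a few lines; the instruction syntax ('op a, b, c' strings) is a simplification assumed for illustration, not the paper's MIPS encoding:

```python
def operand_factorize(sequence):
    """Sketch of operand factorization (Araujo et al., 1998): split an
    instruction sequence into a tree pattern (opcodes only) and an
    operand pattern (operands only)."""
    tree_pattern, operand_pattern = [], []
    for insn in sequence:
        op, _, rest = insn.partition(' ')        # opcode, then operand list
        tree_pattern.append(op)
        operand_pattern.extend(o.strip() for o in rest.split(',') if o.strip())
    return tuple(tree_pattern), tuple(operand_pattern)
```

Two instruction sequences with the same opcodes but different operands then share one tree-pattern dictionary entry, and vice versa, which is the source of the savings.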

A program written in a bytecoded, stack-based instruction set can be compressed by the grammar-based compression method (Fraser and Evans, 2001). This method transforms the language grammar, creating an expanded grammar that represents the same language as the original grammar but permits a shorter derivation of the sampled program. A program's derivation under the expanded grammar forms the compressed bytecode representation of that program. An average compression ratio of 36% is reported.

Fig. 2. (a) Expression tree, (b) tree pattern and (c) operand pattern.

Table 1
Classification of instructions

Category                 Example instruction
1. op                    nop
2. op src                mthi $rn
3. op dst                mfhi $rn
4. op imm                j address

Some compiler techniques have also been proposed to reduce code size. Cooper and McIntosh apply a pattern-matching algorithm (Fraser et al., 1984) to find repeating instruction sequences and then achieve code compression via procedure abstraction and cross-jumping techniques (Cooper and McIntosh, 1999). Procedure abstraction turns a given code region into a procedure and replaces the other instances of this code region with calls to the newly made procedure, while cross jumping replaces occurrences of the repeated code region with jump instructions to a unique copy of that code region. Branch rewriting, operation pattern abstracting and register renaming techniques are also used to increase code region repetitions. Average program size reductions of 7.91% and 22.48% are reported for applying code compression only and code compression plus code optimization, respectively.

Ernst et al. (1997) propose two compiler techniques, called wire code and interpretable code, for scenarios in which transmission and memory, respectively, are the bottlenecks. Both techniques gather information about the common patterns that appear in the code and divide the stream of code into two smaller streams: one holds the operators and the other holds the literal operands for each operator that needs one. Both smaller streams are gzipped, and a compression ratio of 22.49% is reported for wire code. The interpretable code, also called BRISC, scans the input program, generates candidate instruction patterns and estimates the most promising size reduction for all candidate patterns. The size reduction is achieved by operand specialization and opcode combination. Operand specialization separates the operands in the candidate instruction one by one to achieve the maximum repetition of operands, while opcode combination tries to combine the most-used opcodes to achieve the maximum repetition of opcode patterns. The repeating opcode patterns and operand patterns are compressed using a dictionary-based algorithm. An average compression ratio of 59% is reported.

Table 1 (continued)

Category                 Example instruction
5. op src, src           mult $rn, $rm
6. op src, imm           bgez $rn, address
7. op dst, src           move $rn, $rm
8. op dst, imm           lhi $rn, value
9. op src, src, imm      sw $rn, offset($rm)
10. op dst, src, src     add $rn, $rm, $rk
11. op dst, src, imm     lw $rn, offset($rm)

3. Register operand dependency

Observation shows that the instruction sequences translated from source code have dependencies between the register operands. Experiments were carried out to measure the number of dependent registers. On average, 27.5% of source registers depend on the destination registers of previous instructions (true data dependency (Johnson, 1991)) and 12.1% of destination registers depend on previous destination registers (output dependency) for MediaBench benchmarks (Lee et al., 1997) compiled for the MIPS instruction set. This reality motivates us to compress the program by exploiting these dependencies. The size of the dictionary storing the operand sequences can be reduced after removing the dependent registers from the operand sequences. The number of distinct operand sequences is reduced as a result, which in turn reduces the size of the compressed program. The following subsections describe the compression algorithm using this technique.

3.1. Instruction classification

This step examines the types of operands in instruction formats to find the register dependency. To build the compression model, we examine the instruction set of the MIPS R2000 processor (Gerry, 1988). The instructions are classified into the categories shown in Table 1 according to the different types of operands.

In Table 1, op indicates the opcode of an instruction, src (dst) indicates a register operand used as a source (destination) register for operation op, and imm indicates that the field is an immediate value. This classification is based on the register types and the number of registers the opcode actually uses. The instructions 'nop' and 'move' are pseudo instructions, not assembly instructions. The pseudo instruction 'move $rn, $rm' can be implemented by either the assembly instruction 'addu $rn, $rm, $r0' or 'or $rn, $rm, $r0'. The opcode 'addu' used with different operand sequences ('$rn, $rm, $rk' with $rk ≠ $r0, and '$rn, $rm, $r0', for example) is encoded into different new opcodes. Encoding the same opcode with different types of operand sequences into different opcodes reduces code size more than encoding only the different operands, for two reasons:

(a) There are only a small number of opcodes in the MIPS R2000. Encoding the opcodes used in different pseudo instructions distinctly adds fewer bits than encoding the distinct operand sequences themselves.

(b) Some opcodes are likely to use a specific operand, such as 'lw $reg, offset($sp)'. Encoding such cases (opcode + likely operand) into a new opcode can shorten the operand sequence, and shorter operand sequences are more likely to be identical to each other.

This classification is then used to find the dependency relationship between source and destination registers.

Fig. 4. Representations of operand relations and residual operand sequences.

3.2. Register operand dependency

Observation reveals that instruction sequences may have the same dependency relations between registers even if they have different opcode sequences or operand sequences.

Consider the two instruction sequences in Fig. 3. The second source register of the second instruction is the same as the destination register of the first instruction. A mapping tag is used to indicate this dependency. Each source register has a mapping tag. For an n-bit mapping tag, we assign the values:

(a) '0': for a load operation, which reads a register id from the operand sequence;

(b) '1' to '2^n − 1': for a relation operation, giving the number of instructions from the preceding instruction with a destination register of the same register id to the current instruction.

Fig. 3. Example opcode sequence and operand sequences.

Note that a non-zero mapping tag value (relation operation), also called the mapping distance, counts only instructions with a true data dependency (Johnson, 1991). The remaining unmappable registers and the immediate values are listed in what is called the residual operand sequence. Using a 2-bit mapping tag as an example, value '0' indicates the load operation and values '1', '2' and '3' indicate relation operations. Fig. 4 shows the representation of relations and the residual operand sequences.

We explain the representations of operand relations for the first and second instructions:

(a) The first instruction does not contain any source register, so no mapping tag is needed. All operands are listed in the residual operand sequence.

(b) In the second instruction, the first source register ($r2) has not appeared before, so the mapping tag value is '0' and this register is put in the residual operand sequence. The second source register ($r3) is the same as the destination register of the first instruction, so the mapping tag value '1' indicates that this source register is the same as the destination register of the instruction one position up. The last register is a destination register and is put in the residual operand sequence.

The other instructions are processed in the same way. Remarkably, the two instruction sequences in the example have the same mapping value sequence although they are quite different instruction sequences. This method not only extracts the common register relations, but also reduces the number of operands in the original operand sequence, increasing the reuse of the residual operand sequences. This step attempts to locate the register dependencies for all instructions in the basic blocks. The dependency relationship does not cross basic block boundaries, since registers are not guaranteed to be live across a basic block.
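A toy model of the mapping-tag assignment described in this subsection might look like the following. The instruction representation (a pair of destination and source register tuples) is an assumption for illustration, and only load and relation tags are modeled here (the majority tags of Section 3.3 are omitted):

```python
def assign_mapping_tags(instructions, tag_bits=2):
    """Sketch of mapping-tag assignment: each source register either
    maps to the nearest preceding instruction that defined it (tag =
    distance) or is loaded from the residual operand sequence (tag 0)."""
    max_dist = (1 << tag_bits) - 1          # largest expressible distance
    tags, residual = [], []
    history = []                            # per-instruction dest sets, newest first
    for dests, sources in instructions:
        for src in sources:
            # distance (in instructions) to the nearest definition of src
            dist = next((i + 1 for i, d in enumerate(history) if src in d), None)
            if dist is not None and dist <= max_dist:
                tags.append(dist)           # relation operation
            else:
                tags.append(0)              # load operation
                residual.append(src)
        residual.extend(dests)              # destinations are always emitted
        history.insert(0, set(dests))
    return tags, residual
```

On the Fig. 4-style example, the dependent source register disappears from the residual sequence and survives only as the tag value '1'.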

3.3. Register majority

After removing the dependent source registers from the operand sequences, the residual operand sequences still contain redundancies: some registers appear frequently in different residual operand sequences. We call the most frequent register among all residual operand sequences the first (register) majority, the second most frequent the second majority, and so on. Effective reduction of the dictionary size can be achieved by removing the majorities. We borrow values from the mapping tag to denote the mapping of independent registers to the majorities. Assuming there are m majorities, for an n-bit mapping tag we assign tag values:

(a) '0': reserved for the load operation;

(b) '1' to '2^n − m − 1': for relation operations; and

(c) '2^n − m' to '2^n − 1': for majority relations, indicating the first majority, the second majority, and so on.

The register majorities differ from one program to another. The majorities are stored in dedicated registers, called Majority Registers (MRs), so that they can be referenced by mapping tags. Refer to the example in Fig. 4: assuming that register '$r5' is the first and only majority for a 2-bit mapping tag, tag value '3' is reserved for the majority. The instruction sequence in Fig. 4 can then be transformed as shown in Fig. 5.

In this step, occurrences of mapping tags with load operations are replaced with majority operations if the corresponding registers are majorities.

Fig. 5. Instruction sequence after applying the majorities.
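The majority selection could be sketched as follows, assuming residual operand sequences are lists of strings in which registers start with '$', and that the top m tag values are reserved for majorities as described above:

```python
from collections import Counter

def pick_majorities(residual_sequences, tag_bits=2, m=1):
    """Sketch: the m most frequent registers across all residual operand
    sequences become Majority Registers, and the top m mapping-tag
    values (2^n - m .. 2^n - 1) are reserved to name them."""
    counts = Counter(reg for seq in residual_sequences for reg in seq
                     if reg.startswith('$'))          # ignore immediates
    majorities = [reg for reg, _ in counts.most_common(m)]
    first_majority_tag = (1 << tag_bits) - m
    return {reg: first_majority_tag + i for i, reg in enumerate(majorities)}
```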

3.4. Program encoding

After removing the registers with destination–source and majority relationships from the operand sequences, the program is divided into instruction sequences as the basic unit for compression. An instruction sequence is defined as a number of instructions in which each instruction (except the last one) has a destination–source relationship to the following instructions. Each instruction sequence is then operand-factorized, stripped of its dependent registers and partitioned into three sequences: the opcode sequence, the mapping tag sequence and the residual operand sequence. Every instruction sequence is independently Huffman-encoded into codewords CW_OP, CW_MAP and CW_ROD representing the opcode sequence, the mapping sequence and the residual operand sequence, respectively. The entire program is transformed into the form [CW_OP1, CW_MAP1, CW_ROD1, CW_OP2, CW_MAP2, CW_ROD2, ..., CW_OPi, CW_MAPi, CW_RODi], assuming that the program has a total of i instruction sequences. Codewords are allowed to split at the end of bytes; bits from a split codeword spill into the next byte. The compression ratio benefits from the use of split codewords and Huffman encoding, but this causes an execution performance penalty. We trade off performance in exchange for a better compression ratio.
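The Huffman step can be illustrated with a minimal textbook coder. This is not the paper's tooling, just a sketch showing how the more frequent opcode, mapping or residual-operand sequences receive shorter codewords:

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Textbook Huffman coder: returns a prefix-free {symbol: bitstring}
    map in which more frequent symbols get shorter codes."""
    counts = Counter(symbols)
    if len(counts) == 1:                      # degenerate single-symbol input
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, tiebreak, {symbol: code-so-far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    n = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)       # merge the two lightest subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        n += 1
        heapq.heappush(heap, (w1 + w2, n, merged))
    return heap[0][2]
```

Running such a coder separately over the opcode, mapping and residual-operand sequences yields the three independent codeword streams described above.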

3.5. Branch target address

One side effect of code compression is that it alters the locations of instructions in the program due to the instruction size reduction. The branch offsets must be patched according to the change of the target addresses, so as to avoid an unnecessary translation from the uncompressed address space into the compressed address space. Since codewords can be of any length and are not necessarily byte aligned, a branch target must be able to point at any bit location within a byte. The branch offset is therefore divided into two fields:

(a) byte address (13 or 23 bits, depending on the branch type), and

(b) bit offset (3 bits).

Since the original offset specifies the address at word granularity and the new offset specifies the address at bit granularity, the overall branch distance is reduced to 1/32. Nevertheless, for the programs analyzed, only a small percentage of targets require more than 23 bits. For those branches, a jump table is provided to store the target addresses. As in Lefurgy et al. (1997), the jump table addresses are patched to reflect the compressed addresses. For systems supporting dynamically linked procedure calls (or dynamically linked libraries, DLLs), a separate translation table such as the LAT (Wolfe and Chanin, 1992) may be more suitable, allowing the operating system to dynamically allocate memory space for these procedures at run time.
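The two-field branch offset can be sketched as follows. The 13-bit short-branch field width is taken from the text; the function names and the jump-table fallback signalling are illustrative assumptions:

```python
def split_branch_target(bit_address, byte_field_bits=13):
    """Split an absolute bit address into the two branch-offset fields
    described above: a byte address and a 3-bit bit offset within it."""
    byte_addr, bit_off = divmod(bit_address, 8)
    if byte_addr >= (1 << byte_field_bits):
        return None          # too far for this format: use the jump table
    return byte_addr, bit_off

def join_branch_target(byte_addr, bit_off):
    """Reassemble the absolute bit address at decompression time."""
    return byte_addr * 8 + bit_off
```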

4. Decompression engine

This section describes the decompression engine designed for our compression method. The decompression engine consists mainly of three dictionaries and an Instruction Assembly Buffer (IAB), as shown in Fig. 6.

4.1. Dictionaries

Three dictionaries store the opcode sequences, mapping tag sequences and residual operand sequences independently. The opcode sequences are stored in the Opcode Dictionary (OPD), with each entry containing three fields:

(a) opcode field: the opcode bits of an instruction;

(b) mapping type field: a two-bit field indicating how many mapping tags must be read for this instruction; and

(c) end mark: signifies the end of an instruction sequence.

Fig. 6. Decompression engine.

Since the maximal number of source registers for an instruction is 2, two bits are sufficient to distinguish zero, one and two registers. The same opcodes with different mapping types are encoded into different OPD entries due to their different mapping type fields.

The mapping tag sequences are stored in the Mapping Dictionary (MAPD). We optimize the MAPD size by letting a shorter mapping tag sequence share storage with a longer one whose prefix sub-sequence is the same as the shorter one.

The Residual Operand Dictionary (ROD), which stores residual operand sequences, consists of two storages:

(a) Register Generation (RGEN): stores the register lists of all residual operands and the register majorities.

(b) Immediate value Dictionary (IMD): stores the distinct immediate values in the program.

As proposed by Araujo et al. (1998), residual operand sequences with immediate values can be used to minimize the RGEN. The register majorities are stored into the majority registers before the compressed program begins execution. When the residual operand sequence contains an immediate value, the Register Bus (RegBus) line associated with the immediate operand is not needed and can be made a don't-care. For the example in Fig. 5, the residual operand sequences '$r3, 0xfcb08, $r2, $r2, 0x1c' and '$r3, $r6, 0xa004, $r2, 0xa002505c' can share the common register sequence '$r3, $r6, $r2, $r2, $r?' ($r? means don't-care). When an instruction is assembled, all registers are sent to the IAB via the RegBus, but a register in the position corresponding to an immediate value is ignored by the IAB according to the opcode from the OPD. Immediate values, on the other hand, are stored in the IMD, one entry for each distinct immediate value in the program, no matter which residual operand sequence contains it. These values are clustered into memory banks according to the number of bits they consume. IMD and RGEN accesses can proceed in parallel to accelerate decompression.
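The RGEN sharing rule can be illustrated directly with the Fig. 5 example; the data representation (immediates as integers, registers as '$r…' strings) is an assumption for illustration:

```python
def share_register_sequence(seq_a, seq_b):
    """Sketch of RGEN sharing: two residual operand sequences can share
    one register list if they agree wherever both hold registers;
    positions holding immediates in both sequences become don't-cares."""
    if len(seq_a) != len(seq_b):
        return None
    shared = []
    for a, b in zip(seq_a, seq_b):
        a_reg, b_reg = isinstance(a, str), isinstance(b, str)
        if a_reg and b_reg:
            if a != b:
                return None                   # conflicting registers: no sharing
            shared.append(a)
        elif a_reg or b_reg:
            shared.append(a if a_reg else b)  # the register fills the slot
        else:
            shared.append('$r?')              # both immediates: don't-care
    return shared
```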

4.2. Instruction Assembly Buffer (IAB)

The CW_OP, CW_MAP and CW_ROD codewords are extracted from the compressed program to index instruction opcodes, mapping tags and residual operands, respectively. The fetched opcodes, mapping tags, registers and immediate values are sent to the IAB to be assembled into the original instruction sequences. The IAB must decide:

(a) how many mapping tags it must read, according to the mapping type in the opcode dictionary;

(b) where the register operands are to be fetched from (Register Bus, Mapping Queue or Majority Registers), depending upon the value of the mapping tag; and

(c) whether the immediate value is used, depending upon the opcode.

The Mapping Queue (MQ) is a register mapping queue that stores the destination register ids of the previous several instructions. When the tag value is a relation operation, the register id in the MQ at the position specified by the mapping tag value is fetched for the current source register. After assembling the instruction, the decompression engine pushes the destination register id into the MQ so that a further mapping tag can reference it. The number of entries in the MQ is equal to the maximal mapping distance defined previously.

Fig. 7 shows the decompression steps for the example in Fig. 5. The first instruction contains no source register, so no mapping tag is needed. All operands come from the corresponding residual operand sequence. This instruction defines a destination register, so the decompression engine pushes the destination register id ($r3) into the mapping queue, as shown in Fig. 7(a). The second instruction reads two mapping tags:

(a) The first mapping tag value '0' indicates that the source register ($r2) comes from the residual operand sequence.

(b) The second mapping tag value '1' indicates that the second source register ($r3) is the first register id in the MQ, which is the destination register of the previous instruction.

The last register is a destination register. This register is read from the residual operands and pushed into the MQ, as shown in Fig. 7(b). The last instruction is processed in the same way as the second instruction, as shown in Fig. 7(c), except that:

(a) The first mapping tag value '3' indicates that the register id comes from the Majority Register (MR).

(b) This instruction defines no destination register, so there is no need to push a destination register id into the MQ.

Fig. 7. Decompression steps.
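The IAB's source-register reconstruction can be modeled with a short sketch. The instruction descriptor format (number of sources, whether a destination is defined) and the queue policy for instructions that define no destination are simplifying assumptions, not the paper's exact hardware behavior:

```python
from collections import deque

def decompress_sources(instructions, tags, residual, majorities, tag_bits=2):
    """Sketch of IAB operand reconstruction: tag 0 loads a register id
    from the residual operands, the top tag values read the Majority
    Registers, and the remaining tags index the Mapping Queue (MQ)."""
    max_tag = (1 << tag_bits) - 1
    first_majority = max_tag - len(majorities) + 1
    tags, residual = deque(tags), deque(residual)
    mq, output = deque(), []
    for n_sources, has_dest in instructions:
        srcs = []
        for _ in range(n_sources):
            t = tags.popleft()
            if t == 0:
                srcs.append(residual.popleft())              # load operation
            elif t >= first_majority:
                srcs.append(majorities[t - first_majority])  # majority register
            else:
                srcs.append(mq[t - 1])                       # relation: MQ lookup
        dest = residual.popleft() if has_dest else None
        if has_dest:
            mq.appendleft(dest)      # newest destination at the front
        output.append((dest, tuple(srcs)))
    return output
```

Replaying the three-instruction example of Fig. 7 through this sketch reproduces the sequence of queue pushes and lookups described above.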

5. Experimental results

This section describes the experimental results of code compression by register dependency. We use MediaBench to analyze this technique. The programs are compiled for the MIPS R2000 processor using GCC version 2.8.1 with optimization option -O2. We first examine the suitable size of the mapping tag to find the best compression ratio, and then compare the compression ratios of this method and the operand factorization method.

5.1. Size of mapping tag

It is critical to determine what mapping tag size is sufficient for compacting both the dictionaries and the program encoding. Simulation is used to find a suitable tag size. The simulated tag sizes range from 1 to 5 bits (since a register is encoded in 5 bits in the original program), and the number of majorities ranges from 0 to 2^n - 1 for an n-bit tag. The following subsections describe the compacting effect of different mapping tag sizes.
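For intuition about this search space, the short sketch below enumerates the simulated configurations. It rests on our reading of the encoding, not on anything stated explicitly here: one of the 2^n tag values is reserved to mean "read the register from the residual sequence", leaving 2^n - 1 values to be shared between mapping distances and majority registers.

```python
def tag_configurations(max_bits=5):
    """Enumerate the (tag size, majority count, distance count) settings
    explored in the simulation.

    Assumption (ours): of the 2**n values an n-bit tag can take, one is
    reserved for 'residual operand'; the remaining 2**n - 1 values are
    split between mapping distances and majority registers.
    """
    configs = []
    for n in range(1, max_bits + 1):
        for majorities in range(2 ** n):           # 0 .. 2^n - 1 majorities
            distances = (2 ** n - 1) - majorities  # tag values left for distances
            configs.append((n, majorities, distances))
    return configs
```

Under this assumption, the best setting reported by the simulation, a 5-bit tag with 30 majorities, leaves exactly one tag value for a mapping distance of 1, which is consistent with distance-1 dependencies dominating the removals.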

5.1.1. RGEN size reduction

Fig. 8 shows the size reduction of RGEN versus the mapping tag size. The x-axis is the number of majorities and the y-axis is the ratio of the RGEN size to the original program size. This graph consists of six data sets classified into two categories:


Fig. 8. Size reduction of RGEN vs. mapping tag size.

Fig. 10. Dictionary size of RGEN plus MAPD vs. mapping tag size.

Fig. 11. Final compression ratio vs. mapping tag size.

302 K. Lin et al. / The Journal of Systems and Software 72 (2004) 295–304

(a) The first data set: a single point indicating the RGEN size ratio resulting from the operand factorization method.

(b) The other data sets: 2^n points each for n = 1 to 5, showing the size ratios using an n-bit mapping tag with 0, 1, ..., 2^n - 1 majorities.

In the second data set (the n = 1 case), we see that the size reduction due to dependency is larger than that due to majorities. Furthermore, the last point of each curve always tilts up. This is because the number of registers removed from the residual operand sequences by mapping distance 1 is considerably larger than the number of occurrences of the (2^n - 1)th majority register. From Fig. 8, a 5-bit mapping tag with 30 majorities reduces the RGEN size the most.

5.1.2. Mapping penalty

Although the size of RGEN is reduced, we introduce the MAPD. Fig. 9 shows the MAPD size relative to the original program. The x-axis is the number of majorities and the y-axis is the MAPD size ratio. Fortunately, the size of MAPD is much smaller than the size reduction in RGEN, so the overall effect is still positive for compression. Fig. 10 shows the size of MAPD plus RGEN. The overall dictionary size reduction is 8% using the register dependency compression method.

Fig. 9. MAPD size vs. mapping tag size.

5.2. Compressed code reduction

Fig. 11 shows the final compression ratios versus the mapping tag size. Since the size of RGEN is reduced, the size of the program encoding is also reduced. The main contribution to the reduced program encoding is the reduced number of residual operand entries, which is almost proportional to the size of RGEN plus IMD. As expected, a 5-bit mapping tag with 30 majorities results in the best compression ratio.

Fig. 12. Size ratios for all components in a compressed program.

Fig. 13. Final compression ratios for all benchmarks.

Table 2
Size reduction of dictionary and program encoding

Method                  Dictionary size  Compressed code size  Dictionary size reduction  Program encoding reduction
Operand factorization   0.1516           0.3337                N/A                        N/A
1-bit mapping tag       0.1262           0.392                 0.0254                     -0.0583
2-bit mapping tag       0.1053           0.3566                0.0463                     -0.0229
3-bit mapping tag       0.0855           0.3379                0.0661                     -0.0042
4-bit mapping tag       0.072            0.3218                0.0796                     0.0119
5-bit mapping tag       0.0628           0.306                 0.0888                     0.0277

To detail the benefits of compression using our method, we examine the size contributions of the dictionaries and the program encodings. Fig. 12 shows the size contribution of each component in the compression ratio. The x-axis shows the mapping tag size (in bits) and the y-axis shows the best compression ratio for each mapping tag size. There are seven components:

(a) OPD: opcode dictionary,

(b) IMD: immediate value dictionary, storing the distinct immediate values in the program,

(c) RGEN: register generation, storing the register lists of the residual operands,

(d) MAPD: mapping tag dictionary,

(e) CWOP: encoding of the opcode sequences,

(f) CWMAP: encoding of the mapping tag sequences, and

(g) CWROD: encoding of the residual operand sequences.

Fig. 12 shows that the three largest components change from RGEN, CWOP, and CWROD in operand factorization to CWOP and CWMAP in our method. The sizes of RGEN and CWROD are reduced the most. Table 2 shows the size reductions when the components are classified into two major parts:

(a) Dictionary part: consists of OPD, MAPD, RGEN, and IMD.

(b) Program encoding part: consists of the encodings of the opcode sequences, the mapping tag sequences, and the residual operand sequences.
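As a sanity check on this decomposition, the component shares for a benchmark should add up to its final compression ratio. Using the cjpeg row for the register mapping method in Table 3 (numbers taken directly from the table):

```python
# cjpeg, register-mapping row of Table 3: each component's size as a
# percentage of the original program size
components = {
    "OPD": 1.01, "IMD": 2.37, "RGEN": 1.33, "MAPD": 1.57,   # dictionary part
    "CWOP": 12.54, "CWROD": 4.36, "CWMAP": 13.70,           # encoding part
}

dictionary_part = sum(components[k] for k in ("OPD", "IMD", "RGEN", "MAPD"))
encoding_part = sum(components[k] for k in ("CWOP", "CWROD", "CWMAP"))
final_ratio = round(dictionary_part + encoding_part, 2)  # 36.88% for cjpeg
```

The split also makes the earlier observation concrete: the dictionary part (about 6.3%) is far smaller than the program encoding part (about 30.6%).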

Table 3
Detailed component contribution for benchmark programs

Benchmark  Method   OPD (%)  IMD (%)  RGEN (%)  MAPD (%)  CWOP (%)  CWROD (%)  CWMAP (%)  Final (%)
cjpeg      Od Fact  1.01     2.37     11.78     -         12.54     20.83      -          48.53
           Reg Map  1.01     2.37     1.33      1.57      12.54     4.36       13.70      36.88
decode     Od Fact  2.02     2.14     13.46     -         13.04     18.98      -          49.64
           Reg Map  2.02     2.14     3.57      2.17      13.04     4.96       13.19      41.09
epic       Od Fact  1.19     3.35     11.72     -         13.29     19.95      -          49.50
           Reg Map  1.19     3.35     1.29      1.99      13.29     4.34       13.73      39.18
jpgtran    Od Fact  0.83     2.38     11.18     -         12.47     21.07      -          47.93
           Reg Map  0.83     2.38     1.35      1.34      12.47     4.73       13.77      32.14
mpeg2dec   Od Fact  1.48     3.69     13.15     -         13.15     20.31      -          51.78
           Reg Map  1.48     3.69     2.20      2.02      13.15     5.18       13.42      41.14
rawcaudio  Od Fact  1.89     3.20     13.30     -         12.65     17.79      -          48.83
           Reg Map  1.89     3.20     1.87      2.34      12.65     4.54       12.94      39.43
rdjpgcom   Od Fact  1.70     3.39     12.94     -         12.61     18.33      -          48.97
           Reg Map  1.70     3.39     1.67      2.11      12.61     4.47       13.07      39.02
Avg        Od Fact  1.44     2.93     12.50     -         12.82     19.60      -          49.31
           Reg Map  1.44     2.93     1.89      1.93      12.82     5.94       12.11      38.41

From Table 2, the dominant factor in reducing the compression ratio is the reduction in dictionary size rather than in program encoding.

Fig. 13 shows the detailed compression ratios for all benchmark programs. Each benchmark consists of two bars, one for operand factorization (Od) and the other for our register dependency method (Reg). The OPD, IMD, and CWOP components are invariant between the two methods. The average decrement of RGEN is 11% and the increment of MAPD is 2%; this is the main advantage of the register dependency method. The average decrement of CWROD is 14%, while the increment of CWMAP is 12%. Detailed statistics are given in Table 3.


6. Conclusion

In this paper, we propose using register dependency to compress embedded system programs for a MIPS processor. The key idea of this method is to remove the dependent registers from the operand sequences to reduce both the dictionary size and the encoded program size. The success in reducing code size is ascribed to:

(a) Encoding the opcodes together with their likely used operands into new opcodes shortens the operand sequences and increases the reuse of common operand sequences.

(b) The destination-source relationship eliminates the register specifications in a series of instructions manipulating the same data.

(c) The majority relationship eliminates multiple occurrences of the same register in the residual operand sequence, increasing the reuse of common operand sequences.

This research can be further improved in two ways:

(a) From Fig. 12, the program encodings of both the opcode sequences and the mapping tag sequences are the largest portions contributing to the compression ratio. Reducing the mapping tag size and reusing the OPD entries are the next goals for reducing both components in the compressed code.

(b) We can improve the algorithm to find more relations between operands. Such improvements may include modeling both the language grammar and the register allocation rules, and compressing the instruction sequences into derivations of these rules.

References

Aho, A., et al., 1986. Compilers: Principles, Techniques and Tools. Addison-Wesley, Boston.

Araujo, G., et al., 1998. Code compression based on operand factorization. In: 31st Annual ACM/IEEE International Symposium on Microarchitecture.

Cooper, K.D., McIntosh, N., 1999. Enhanced code compression for embedded RISC processors. In: ACM SIGPLAN '99 Conference on Programming Language Design and Implementation, SIGPLAN Notices 34 (5), pp. 139-149.

Ernst, J., et al., 1997. Code compression. In: 1997 ACM SIGPLAN Conference on Programming Language Design and Implementation, SIGPLAN Notices 32 (5), pp. 358-365.

Fraser, C.W., Evans, W., 2001. Bytecode compression via profiled grammar rewriting. In: Proceedings of the Int'l Conference on Programming Language Design and Implementation.

Fraser, C.W., et al., 1984. Analyzing and compressing assembly code. SIGPLAN Notices 19 (6), 117-121.

Gerry, K., 1988. MIPS RISC Architecture. Prentice-Hall.

Haskell, B.G., et al., 1996. Digital Video: An Introduction to MPEG-2. Chapman & Hall.

Huffman, D.A., 1952. A method for the construction of minimum redundancy codes. Proc. IRE 40, 1098-1101.

Johnson, M., 1991. Superscalar Microprocessor Design. Prentice-Hall.

Kissell, K., 1997. MIPS16: High-density MIPS for the Embedded Market. Silicon Graphics MIPS Group.

Lee, C., et al., 1997. MediaBench: a tool for evaluating and synthesizing multimedia and communications systems. In: 30th Annual ACM/IEEE International Symposium on Microarchitecture.

Lefurgy, C., et al., 1997. Improving code density using compression techniques. In: 30th Annual International Symposium on Microarchitecture.

Proebsting, T.A., 1995. Optimizing an ANSI C interpreter with superoperators. In: Proceedings of the ACM Conference on Principles of Programming Languages, pp. 322-332.

Turley, J.L., 1995. Thumb squeezes ARM code size. Microprocessor Report 9 (4), 27.

Wolfe, A., Chanin, A., 1992. Executing compressed programs on an embedded RISC architecture. In: 25th Annual International Symposium on Microarchitecture.

Kelvin Lin received the M.E. and Ph.D. degrees in computer engineering from the Department of Computer Science and Information Engineering, National Chiao Tung University, Taiwan, ROC, in 1995 and 2002, respectively. He is an IC design engineer in the R&D Division of VIA Technologies, Inc.

Jean Jyh-Jiun Shann received the Ph.D. degree in computer engineering from the Department of Computer Science and Information Engineering, National Chiao Tung University, Taiwan, ROC, in 1994, and the M.E. degree in computer engineering from the Department of Electrical and Computer Engineering, University of Texas at Austin, USA. She is an Associate Professor in the Department of Computer Science and Information Engineering of National Chiao Tung University, Taiwan, ROC. Her research interests include parallel processing, neural networks, and fuzzy systems.

Chung-Ping Chung received the M.E. and Ph.D. degrees in electrical engineering from Texas A&M University, College Station, TX, USA, in 1981 and 1986, respectively. He is a Professor in the Department of Computer Science and Information Engineering of National Chiao Tung University, Taiwan, ROC. His research interests include computer architecture, parallel processing, VLSI system design, and system simulation.