
Page 1: SOURCE CODING ALGORITHMS FOR FAST DATA COMPRESSION

SOURCE CODING ALGORITHMS FOR FAST DATA COMPRESSION

A DISSERTATION

SUBMITTED TO THE DEPARTMENT OF ELECTRICAL ENGINEERING

AND THE COMMITTEE ON GRADUATE STUDIES

OF STANFORD UNIVERSITY

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS

FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

BY

Richard Clark Pasco

May 1976


© Copyright 1976

by

Richard Clark Pasco



ABSTRACT

Noiseless source coding, or noiseless data compression, is a one-

to-one mapping between data and a more compact representation. Invertible

arithmetic algorithms are presented which encode strings of random source

symbols with known conditional probabilities into strings of symbols for

a channel. One algorithm encodes blocks of fixed length into codewords

satisfying the prefix condition whose expected length exceeds the source

Shannon entropy by at most two symbols plus an exponentially decreasing

function of computational precision. The new process differs from pre-

vious coding algorithms, such as Huffman coding, in that computation time

grows only linearly with block length, permitting real-time compression

at rates arbitrarily close to the source entropy rate by using very long

blocks. A similar algorithm encodes variable length source strings into

fixed size codewords with comparable results. A generalized structure is

proposed to unify the new algorithms with those of Elias and Rissanen.


Acknowledgements

I wish to thank: Professor Thomas M. Cover, my advisor, whose guidance, technical insight, and direction were invaluable; Professor Robert M. Gray for his editorial assistance; Bell Telephone Laboratories for salary and expenses during my study; U.S. Air Force Contract F44620-74-C-0068 for computer time; my son, Matthew, for cheerful diversion from my work; and Ms. Katherine Adams for typing the final manuscript.


Table of Contents

1. INTRODUCTION AND NOTATION
2. FIXED-TO-VARIABLE CODES
   2.1 Problem Statement and History
   2.2 The Fixed-to-Variable Algorithms
       2.2.1 The encoding algorithm
       2.2.2 The decoding algorithm
       2.2.3 Codeword set is proper
       2.2.4 Compression rate
       2.2.5 Computational complexity
3. VARIABLE-TO-FIXED CODES
   3.1 Problem Statement and History
   3.2 The Variable-to-Fixed Algorithms
4. EXPERIMENTAL IMPLEMENTATION
5. GENERALIZATION

Appendix A  EFFECTS OF QUANTIZATION OF PROBABILITIES  65
Appendix B  MAXIMIZING VF MESSAGE LENGTH DOES NOT MINIMIZE RATE  69
Appendix C  PROGRAM LISTINGS  73
Bibliography  105


CHAPTER 1

INTRODUCTION AND NOTATION

This paper presents new algorithms for variable-rate noiseless source coding of data from discrete-time, finite-alphabet sources. The new algorithms are fast, achieving compression rates arbitrarily close to the source entropy while requiring a computational effort of two multiplications and one addition per symbol to be encoded. This speed permits coding operations to be performed in real time, eliminating the need for storage of tables of codewords.

To put the new algorithms in perspective, we explain some terms used in the previous paragraph. Source coding is a translation from redundant source data to a more compact channel representation. By noiseless, we mean that the source data may be reconstructed exactly given the correct channel data.

To facilitate the study of the new algorithms, we make some definitions. A set S of finite strings of symbols from a finite discrete alphabet A is proper [Jelinek and Schneider, 1972] if and only if no string in S is the prefix of any other string in S, and S is complete [Gilbert and Moore, 1959] if and only if every infinite string of symbols from A is prefixed by some string in S. Table 1 illustrates examples drawn from the binary alphabet A = {0,1}.

Table 1. Examples of Complete and Proper Sets of Binary Strings
[Table entries not recovered in this transcript; only the column labels "Not Complete" and "Proper" survive.]
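Both definitions can be checked mechanically. The following sketch is mine, not from the dissertation's program listings: it tests a finite set of binary strings for the prefix condition, and uses the Kraft sum as a completeness test, which for a finite proper set is equivalent to the definition above.

```python
# Illustrative check of "proper" and "complete" for finite sets of
# binary strings (not the author's code).
from fractions import Fraction

def is_proper(S):
    # No string in S is the prefix of any other string in S.
    return not any(a != b and b.startswith(a) for a in S for b in S)

def is_complete(S, alphabet_size=2):
    # For a finite proper set, completeness is equivalent to the Kraft
    # sum over the alphabet equalling exactly 1.
    return sum(Fraction(1, alphabet_size ** len(s)) for s in S) == 1

print(is_proper({"0", "10", "11"}), is_complete({"0", "10", "11"}))  # True True
print(is_proper({"0", "01"}))                                        # False
print(is_complete({"00", "01", "10"}))                               # False
```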


These concepts are well known in the theory of variable-length codes: a proper set of codewords is called a prefix or instantaneous code [Abramson, 1963]; a complete set of codewords is called an exhaustive code [Gilbert and Moore, 1959].

Cohn [1976] has described a new formal structure for source codes. The following definition is intended to agree with his use of the same term: A noiseless block-to-block code is a one-to-one mapping from a complete and proper message set of strings of source symbols into a proper codeword set of strings of channel symbols. The elements of the domain will be called messages; the elements of the range will be called codewords.

In this paper we will restrict our attention to noiseless block-to-block codes. This restriction leads to some useful properties. A one-to-one mapping is invertible; every codeword represents a unique message. A complete message set guarantees that every possible source sequence can be encoded. A proper message set allows the encoder to transmit a codeword immediately upon receipt of the last symbol of a message. Finally, a proper codeword set allows the decoder to output the decoded message instantaneously upon receipt of the last symbol of a codeword.

A performance measure associated with source codes is their rate, which we shall define by

R = EL / EN ,     (1)

where EN is the expected message length and EL is the expected codeword length. This expression for block-to-block codes was presented by Cohn as a theorem derived from a more general definition.
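As a toy illustration of the definition (my example, not the author's): take a binary source with P(0) = 0.9, P(1) = 0.1, the complete and proper message set {1, 01, 001, 000}, and fixed 2-bit codewords. Then R follows directly:

```python
# Rate R = EL/EN for a hypothetical variable-to-fixed code.
P = {"1": 0.1, "01": 0.09, "001": 0.081, "000": 0.729}  # message probabilities
EN = sum(p * len(m) for m, p in P.items())  # expected message length EN
EL = 2.0                                    # every codeword is 2 channel symbols
R = EL / EN                                 # channel symbols per source symbol
print(round(EN, 2), round(R, 3))  # 2.71 0.738
```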

Two specific types of block-to-block codes are especially easy to implement. If the message set is the set of all strings of fixed length N source symbols, the code is called fixed-to-variable (FV). Alternately, if


the codeword set is constrained to contain only codewords of fixed length L channel symbols, the code is called variable-to-fixed (VF). If neither condition is met, the code is called variable-to-variable. It is perhaps of historical interest to know that most early block-to-block codes were of the FV category [Huffman, 1952, and Gilbert and Moore, 1959]. Variable-to-variable [Golomb, 1966] and variable-to-fixed [Tunstall, 1968, Schalkwijk, 1972, and Jelinek and Schneider, 1972 and 1974] codes have appeared more recently.

We will make the notational convention that if x is a singly-infinite (hereafter shortened to infinite) sequence, x_i will denote the i-th symbol of x and x(i) = x_1 x_2 ... x_i will denote the i-symbol prefix of x.

Suppose an information source emits the infinite random sequence X of random symbols from an M-ary alphabet I_M = {0, 1, 2, ..., M-1}, and let x denote a particular M-ary sequence. Assume that the conditional probability distributions

p(x_i | x(i-1)) = P(X_i = x_i | X(i-1) = x(i-1))     (2)

are known for all i and for all x.

The probability that a particular finite string x(n) ∈ I_M^n prefixes X is given by

P(x(n)) = P(X(n) = x(n)) = Π_{i=1}^{n} p(x_i | x(i-1)) ,     (3)

where x(0) is the null string.
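Equation (3) is just a running product of conditional probabilities. A minimal sketch follows; the memoryless predictor is my illustration, not the text's:

```python
# Probability of a prefix x(n) as the product of conditionals, eq. (3).
def string_probability(x, cond_prob):
    # cond_prob(symbol, past) -> p(x_i | x(i-1))
    p = 1.0
    for i, sym in enumerate(x):
        p *= cond_prob(sym, x[:i])
    return p

# Hypothetical memoryless ternary source: P = (0.5, 0.3, 0.2) for any past.
probs = {0: 0.5, 1: 0.3, 2: 0.2}
p = string_probability([0, 1, 0, 2], lambda s, past: probs[s])
print(round(p, 4))  # 0.5 * 0.3 * 0.5 * 0.2 = 0.015
```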

Suppose that a D-ary noiseless channel is available which can transmit symbols from I_D = {0, 1, ..., D-1} with equal cost and without error. Intuitively, the average information content, measured in D-ary digits, of the i-th source symbol X_i is the conditional entropy of X_i given the past. The entropy of X_i conditioned on a specific past is

H_D(X_i | x(i-1)) = - Σ_{x_i ∈ I_M} p(x_i | x(i-1)) log_D p(x_i | x(i-1))     (4)


and the conditional entropy of X_i given the past is

H_D(X_i | X(i-1)) = Σ_{x(i-1)} P(x(i-1)) H_D(X_i | x(i-1)) .     (5)

Let B be a complete and proper message set from I_M, and let b(x) ∈ B be the unique message which prefixes x. For all b in B, let N(b) be its length and b_i for 1 ≤ i ≤ N(b) be the i-th symbol of b. The probability that message b ∈ B prefixes X, given by (3) with n = N(b) and x(N(b)) = b, is

P(b) = Π_{i=1}^{N(b)} p(b_i | b(i-1)) .     (6)

The expected length of the first message is

EN_1 = Σ_{b ∈ B} P(b) N(b) .     (7)

Let L(b) denote the length of the codeword assigned to message b ∈ B. The expected length of the first codeword is

EL_1 = Σ_{b ∈ B} P(b) L(b) .     (8)

Finally, we may define the entropy, or average information content, of the first message,

H_D(b(X)) = - Σ_{b ∈ B} P(b) log_D P(b) .     (9)


Because messages and codewords are in a one-to-one correspondence, they share a common distribution. Thus H_D(b(X)) is also the entropy of the first codeword.

In this paper, a new theory for the design of algorithms which quickly encode data with known statistics at rates approaching the theoretical minimum will be developed. In Chapter 2, shortcomings of traditional FV methods will be illustrated and used to motivate the new work. A new FV compression algorithm will be presented and analyzed in detail. In Chapter 3, a modified algorithm for VF coding will be similarly analyzed. In Chapter 4, demonstration implementations of the new algorithms will be discussed. Finally, in Chapter 5, the theory will be generalized and related to other work in the field. For an overview of the new theory, it is suggested that the reader go to Chapter 5 next, and then read Chapters 2 through 5 in sequence.


CHAPTER 2

FIXED-TO-VARIABLE CODES

2.1 Problem Statement and History

With fixed-to-variable (FV) codes, the message set is the set of all M-ary strings of length N,

B = I_M^N .     (10)

The message which prefixes the infinite source string X is simply the first N symbols from X,

b(X) = X(N) .     (11)

A FV code assigns a unique distinct codeword from a proper set to each string in I_M^N. Let L be a random variable whose value is the length of the codeword assigned to b(X) and hence to X(N). The rate expression (1) becomes for FV codes

R = EL / N .     (12)

The FV coding problem is the design of algorithms which assign codewords to messages in I_M^N with minimal rate R and reasonable computational complexity.

Shannon [1948] and Fano [1948] proved that there exists no FV code which maps message set I_M^N into a codeword set with expected codeword length less than H_D(X(N)), and that there always exists a code with expected codeword length bounded by

H_D(X(N)) ≤ EL < H_D(X(N)) + 1 ,     (13)


and Huffman [1952] presented an algorithm for finding the optimum such code. By making N arbitrarily large, compression rate EL/N approaching the per-symbol entropy H_D(X(N))/N could be achieved, but this is not practical when codes need to be computed in real time. Real-time coding is necessary when little memory is available for table storage and the dependence of the conditional distribution of the source on the past is complex. The impracticality of real-time Huffman coding with large N lies in the number of computations required to assign codewords to source strings.

Huffman's algorithm maps a probability distribution over a message set into a set of codewords with integer lengths. To make the integer-length constraint have negligible effect on the rate of the code, the messages and codewords are made very long. The number of messages n grows exponentially with the message length N. The computation time grows faster yet: Van Voorhis [1975] has shown that a straightforward implementation of Huffman's algorithm on a set of n messages requires O(n^2) calculations to assign codewords, where the notation f(x) = O(g(x)) means that the ratio f(x)/g(x) is bounded. A clever implementation of Huffman's algorithm still requires O(n log n) steps. With n = M^N this means that O(N M^N) operations per block are needed. It is clearly impractical to implement these algorithms as N grows.
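To see the scale of the problem (my numbers, purely for illustration): with a binary source, n = M^N messages and an O(n log n) construction give

```python
# Growth of table-based Huffman construction cost with block length N,
# for a binary source alphabet (M = 2): n = M**N messages, ~n*log2(n) steps.
import math

M = 2
for N in (8, 16, 32):
    n = M ** N
    print(N, n, int(n * math.log2(n)))  # block length, messages, rough step count
```

so even modest block lengths are out of reach for table-based construction, while the arithmetic approach described next is linear in N per block.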

In this paper, however, a different approach is taken. With arithmetic coding, instead of coding over the set of all source strings of length N, single source symbols are encoded individually and the code elements are combined arithmetically into a codeword. Various ways in which the code elements may be combined are discussed in Chapter 5.


Consider the communications system shown in Figure 1.

Figure 1. Communication System Block Diagram
[Diagram not recovered: source, predictor and encoder (compressor), noiseless channel, predictor and decoder (expander), user.]

The compressor, consisting of a predictor and an encoder, implements the function defined as a code in the previous section. The predictor contains the knowledge of the source statistics. By observing previous source symbols, the predictor informs the encoder of the conditional probability distribution of the next symbol to be emitted. The encoder grows a codeword by combining this information with the actual symbol emitted. By an analogy to be developed, the encoder maps source symbols into code elements whose lengths are not constrained to be integral numbers of channel symbols. A message maps into N of these elements which are packed snugly into a large codeword, necessarily of integer length for transmission. The codeword is transmitted through a noiseless channel to the expander, which inverts the work of the compressor. The expander contains a replica of the predictor in the encoder, and a decoder which uses the predictor's output to dismantle the codeword. This is possible because the decoder needs to know only the distribution for the symbol it is


working on, and it outputs symbols sequentially as it extracts them from the codeword. The integer-length constraint is made to have negligible effect on the rate by increasing the block length. This is practical because the computational complexity grows only linearly with N, the number of source symbols encoded in the block.

Arithmetic coding is a generalization of a procedure due to Elias (unpublished result, explained by Jelinek in [1968a] and [1968b]). Briefly, Elias' idea is to define a cumulative distribution function (CDF) on a lexicographic ordering of the set of all strings of length N. To encode a given message, the CDF is evaluated at the message. In our notation, we say of sequences x and y that x < y if and only if x_i < y_i for the minimum i such that x_i ≠ y_i. Define the symbol CDF

C(x_i) = Σ_{m < x_i} p(m | x(i-1))     (14)

and the string CDF

F(x(N)) = Σ_{y(N) < x(N)} P(y(N)) .     (15)

Elias observed that F(x(N)) is the sum of the probabilities of N sets of strings, where the i-th set contains all strings which first differ from x by being less in the i-th symbol, so

F(x(N)) = Σ_{i=1}^{N} P(x(i-1)) C(x_i) .     (16)

The probability of a string may be calculated from the probability of a string one symbol shorter and the conditional probability of the last symbol,

P(x(i)) = P(x(i-1)) p(x_i | x(i-1)) .     (17)
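Recursions (16) and (17) can be sketched in a few lines of exact rational arithmetic. This is my paraphrase of Elias' procedure with a hypothetical memoryless distribution; the dissertation's own listings are in Appendix C.

```python
# Elias' sequential CDF computation: F accumulates P(x(i-1))*C(x_i) as in
# (16), while P(x(i)) shrinks by each conditional probability as in (17).
from fractions import Fraction as Fr

def elias_cdf(x, p):  # p[m] = P(symbol m), memoryless for simplicity
    F, P = Fr(0), Fr(1)
    for sym in x:
        C = sum(p[m] for m in range(sym))  # symbol CDF (14)
        F += P * C                         # sum (16)
        P *= p[sym]                        # recursion (17)
    return F, P  # any point of [F, F + P) identifies x

p = [Fr(1, 2), Fr(1, 4), Fr(1, 4)]
F, P = elias_cdf([1, 0, 2], p)
print(F, P)  # 19/32 1/32
```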


Elias exhibited a sequential algorithm based on (16) and (17) for computing the message CDF F(x(N)). Elias coding is a special case of arithmetic coding because the symbol CDF (14) maps source symbols into code elements and the sum (16) combines them into a codeword. By its definition, F(x(N)) is a monotonic function of x(N) and therefore an invertible function. At each message x(N), F(x(N)) increases by a step of height P(x(N)); hence specifying any point in the interval [F(x(N)), F(x(N)) + P(x(N))) uniquely specifies x(N). Such a point may be obtained by truncating the D-ary expansion of F(x(N)) + P(x(N)) to ⌈-log_D P(x(N))⌉ digits, where ⌈z⌉ = the least integer not less than z. By (9) the expected number of digits lies in the interval [H_D(X(N)), H_D(X(N)) + 1).

Unfortunately, Jelinek's statement that the computational complexity

of Elias' algorithm grows linearly with the message length N is only correct for small N, where the product in (17) can be represented in one computer word. For large N, multiprecision techniques must be used to accurately represent P(x(i)). Suppose that the conditional probability p(x_i | x(i-1)) is expressed in J digits for all i. Then, by (17), P(x(i)) has J more digits than P(x(i-1)), or a total of iJ digits. The complexity of each of the N multiplications grows linearly toward NJ. Therefore the total complexity of encoding a message of length N is O(N^2 J); i.e., it grows as the square of the message length.

Another problem with Elias' algorithm is that the resulting codewords do not satisfy the prefix condition; the codeword set is not proper.


Therefore it is necessary either to attach a length-indicating prefix to each codeword or to insert a comma between codewords. Either technique introduces an inefficiency, lengthening the average codeword by approximately log_D N digits.

2.2 The Fixed-to-Variable Algorithms

In this section an encoding algorithm, which maps messages of N source symbols into variable-length codewords, and its inverse for decoding are presented. The length L of the codeword W depends on the message and the source statistics, with expected length bounded by

H_D(X(N)) + 1 ≤ EL < H_D(X(N)) + 2 + N V_D(K) ,

where V_D(K) is an exponentially decreasing function of the computational precision K, and can be made negligibly small with convenient values of K. The compression rate can therefore be made arbitrarily close to H_D(X(N))/N by making N large enough. This is done without penalty, for computational complexity grows linearly with N. More specifically it grows as O(NJ(K+M)), where N is the message length, J is the number of digits in the D-ary representation of the source probabilities, K is the precision in D-ary digits of internal arithmetic, and M is the source alphabet size. The algorithms require that radix-D notation be used for number representation and arithmetic. Therefore, although D is arbitrary, implementation on a digital computer is easiest when D is a power of 2.

Although the structure of the encoding algorithm presented in this section strongly resembles Elias coding, the codeword is no longer subject to interpretation as a cumulative distribution function on the set


of all messages of length N. This is because finite-precision arithmetic is used to achieve the linear growth of computational complexity with message length. It was experimentally confirmed that the obvious scheme of rounding the results of all calculations in Elias' algorithm fails; losses in precision prohibit all but the first few symbols from being decoded correctly. The significant result here is that by using the proper strategy for limiting precision in encoder and decoder, the message may be decoded without error, and only a small penalty in codeword length is paid.

Another new result is that by making the codewords just one digit longer than with Elias coding, a proper codeword set results. Therefore no commas or length-indicating prefixes with their inherent inefficiencies are needed.

2.2.1 The encoding algorithm

We will begin our study with the encoding algorithm. First we will examine the data structures in the computer memory during the execution of the algorithm. The encoding algorithm is initially presented in the structured form of a high-level computer language, then translated into a recursive notation for analysis. Following this we will develop an intuitive understanding of its operation, and finally mathematically formalize this understanding.

Four variables, Q, C, F, and T, will represent the data structures for the encoding algorithm.

Let Q be an array containing M fixed-point J-digit fractional D-ary numbers. Each number Q(m), where m ∈ I_M, has a rational value defined by

Q(m) = Σ_{j=1}^{J} q_mj D^{-j} ,


where the q_mj ∈ I_D are the digits in its D-ary expansion. Assuming that the p(x_i | x(i-1)) are multiples of D^{-J} for all x_i ∈ I_M, array Q will hold conditional probabilities for the i-th symbol to be encoded provided by the predictor,

Q_i(x_i) = p(x_i | x(i-1)) for all x_i ∈ I_M .     (18)

If for some i the p(x_i | x(i-1)) are not multiples of D^{-J} then the Q_i can be made arbitrarily close to the actual probabilities by increasing their precision J. Appendix A finds a precision J sufficient to place a given bound on the added codeword length due to this quantization. Typically, J will be on the order of a computer word length. The Q_i must satisfy

0 ≤ Q_i(x_i) < 1 for all x_i ∈ I_M ,     (19)

Q_i(x_i) is a multiple of D^{-J} for all x_i ∈ I_M ,     (20)

and

Σ_{x_i ∈ I_M} Q_i(x_i) ≤ 1 .     (21)

Let C be another array, formatted exactly like Q. Array C is filled with the cumulative distribution function given by C_i(0) = 0 and

C_i(x_i) = Σ_{m < x_i} Q_i(m) for all positive x_i ∈ I_M .     (22)

Therefore (21) and (22) imply that for all x_i ∈ I_M,

C_i(x_i) + Q_i(x_i) ≤ 1 .     (23)


Let F be a multiprecision fixed-point fractional D-ary number. While it will rarely be needed, enough storage for JN digits should be available. A pointer may be used to indicate the least significant nonzero digit. The rational value of F is defined by

F = Σ_j f_j D^{-j} ,

where the f_j ∈ I_D are the digits in its D-ary expansion. We will soon see how F is used.

Let T be a normalized floating-point number, consisting of two parts: a significand field containing K D-ary digits t_0 t_1 t_2 ... t_{K-1} satisfying t_k ∈ I_D for 0 ≤ k ≤ K-1 and t_0 ≠ 0, and an exponent field containing the nonnegative integer τ. Later we shall establish requirements for K and bounds on τ. Typically K will be on the order of a computer word length. The rational value of T is defined to be

T = D^{-τ} Σ_{k=0}^{K-1} t_k D^{-k} .

Using positional scientific notation, T would be written t_0.t_1 t_2 ... t_{K-1} D-τ. From this definition it follows that for any positive integer n

τ > n if and only if T < D^{-n} ,

τ = n if and only if D^{-n} ≤ T < D^{-n+1} ,


and

τ < n if and only if D^{-n+1} ≤ T .

The encoding algorithm follows.

Algorithm FVE (Fixed-to-Variable Encoding):

(1) Set F ← 0.
(2) Set T ← 1.
(3) For i ← 1, 2, 3, ..., N do:
    (3a) Load the Q and C arrays from the predictor.
    (3b) Get symbol x_i from the source.
    (3c) Set F ← F + T·C_i(x_i).
    (3d) Set T ← T·Q_i(x_i) truncated to K significant digits.
(4) Set L ← τ + 1.
(5) Truncate F to L digits and add D^{-L} (i.e., set F ← D^{-L} ⌊D^L F + 1⌋).
(6) Transmit the L digits of F.
(7) Go to (1) and begin encoding the next message.
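As a concrete sketch, Algorithm FVE can be transcribed into exact decimal arithmetic. This is my re-implementation, not the listing from Appendix C, and the i.i.d. distribution used at the bottom is the one inferred from the worked example below (the transcript does not state it explicitly).

```python
# Algorithm FVE in exact rational arithmetic with D = 10. The predictor is
# a fixed i.i.d. distribution here; a real predictor could reload Q and C
# on every symbol (step 3a).
from fractions import Fraction as Fr
import math

def trunc_sig(t, K, D=10):
    # Truncate 0 < t <= 1 to K significant D-ary digits (step 3d); also
    # return the exponent tau, i.e. the count of leading zero digits.
    tau = 0
    while t * D ** tau < 1:
        tau += 1
    scale = D ** (tau + K - 1)
    return Fr(math.floor(t * scale), scale), tau

def fve_encode(x, Q, K=2, D=10):
    C = [sum(Q[:m], Fr(0)) for m in range(len(Q))]  # cumulative array, C(0)=0
    F, T, tau = Fr(0), Fr(1), 0
    for sym in x:                                   # step 3
        F += T * C[sym]                             # step 3c
        T, tau = trunc_sig(T * Q[sym], K, D)        # step 3d
    L = tau + 1                                     # step 4
    W = Fr(math.floor(F * D ** L) + 1, D ** L)      # step 5
    return W, L

# Distribution inferred from the worked example (an assumption of mine).
Q = [Fr(n, 100) for n in (1, 8, 1, 80, 1, 9)]
W, L = fve_encode([3, 3, 1, 3, 3, 3, 3, 5, 3, 3], Q)
print(f"{float(W):.4f}", L)  # 0.2198 4
```

Under that assumed distribution, the call above reproduces the codeword 2198 and length L = 4 quoted in the worked example.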

For analysis it is desirable to attach a distinct label to each value taken on by the variables in Algorithm FVE. Let F_i and T_i denote the values attained by F and T respectively after the i-th iteration of step FVE-3. Thus

F_0 = 0     (27)

and

T_0 = 1 ,     (28)

and, for 1 ≤ i ≤ N,

F_i = F_{i-1} + T_{i-1} C_i(x_i)     (29)

and

T_i = T_{i-1} Q_i(x_i) truncated to K significant digits.     (30)

Note that because Q_i(x_i) < 1, (30) implies that {T_i} is a decreasing sequence. Let τ_N denote the value contained in the exponent field of


T after the N-th iteration of step FVE-3, and let W denote the result of step FVE-5. In this notation, steps FVE-4 and FVE-5 become

L = τ_N + 1     (31)

and

W = D^{-L} (⌊D^L F_N⌋ + 1) ,     (32)

where ⌊z⌋ = the greatest integer not exceeding z.

To develop an intuitive understanding of the operation of Algorithm FVE, suppose that K → ∞ and there is no truncation in step FVE-3d. Then

T_i = P(x(i))

and

F_i = F(x(i)) ,

where the right-hand quantities are as defined for Elias coding. When T is truncated to K digits, however, this interpretation is invalid. One valid interpretation is that F is an overlapped concatenation of code elements C(x_i), with successive code elements assigned less significant digits of F. T is then a pointer indicating where in accumulator F the next code element may be placed without overlapping its predecessor too much. Another interpretation is that F is a sum of scaled code elements, and T is a scale factor which places a sufficiently small weight on each code element so that it does not interfere with its predecessor. Under these interpretations, the necessity of truncation rather than rounding is clear. If T were rounded up in step FVE-3d, successive code elements might be weighted so heavily that they could interfere with preceding code elements.

In the following example, Algorithm FVE encodes the sequence 3 3 1 3 3 3 3 5 3 3 from I_6 into the decimal codeword 2198. Let M=6, D=10, N=10, J=2, and K=2. Suppose the source emits independent symbols with identical distributions

Q_i = (.01, .08, .01, .80, .01, .09) , C_i = (0, .01, .09, .10, .90, .91) .

By (4), the entropy of each symbol is H_10(X_i) = 0.319 digits/symbol for all i, and by (9) the entropy of a message of length 10 is H_10(X(10)) = 3.19 digits/message. (By (45) we have V_10(2) = .0458 digits/symbol, and the expected codeword length, by (51), is between 4.19 and 5.65 digits.) We encode the typical sequence 3 3 1 3 3 3 3 5 3 3. Step FVE-1 sets F_0 = .0 and step FVE-2 sets T_0 = 1.0D-0, where the notation a.bD-c means a.b × D^{-c}, the D being used to separate the significand field of T from its exponent field τ. Table 2 lists the results of step FVE-3.

Table 2. Example of Step FVE-3

  i   x_i   F_i      T_i
  1    3    .10      8.0D-1
  2    3    .18      6.4D-1
  3    1    .1864    5.1D-2
  4    3    .1915    4.0D-2
  5    3    .1955    3.2D-2
  6    3    .1987    2.5D-2
  7    3    .2012    2.0D-2
  8    5    .2194    1.8D-3
  9    3    .21958   1.4D-3
 10    3    .21972   1.1D-3


Working through this example may clarify the roles of F as an accumulator containing a sum of scaled code elements and T as a pointer containing a scale factor. For example, partial sum F_3 = .1864 results when the code element of the third symbol, C(x_3) = C(1) = .01, is multiplied by the scale factor from the previous iteration, T_2 = 6.4D-1 = .64, and the product .0064 is added to the previous sum F_2 = .18. The new pointer T_3 = 5.1D-2 results when the previous scale factor, T_2 = 6.4D-1, is multiplied by the probability of the third symbol, Q(x_3) = Q(1) = .08, and the product 5.12D-2 is truncated to two significant digits. After the tenth iteration, step FVE-4 sets L = 3+1 = 4. Step FVE-5 sets W = .2198 and the codeword transmitted is 2198.

Let us analyze the properties of Algorithm FVE. We begin by building the necessary theory to show that the codeword can be expressed in L digits, that the message may be decoded correctly, and that the codeword set is proper.

Boundary (27) and recursion (29) imply that F_N is a sum of scaled code elements

F_N = Σ_{i=1}^{N} T_{i-1} C_i(x_i) .     (33)

The truncation in (30) implies the bound

T_i ≤ T_{i-1} Q_i(x_i) .     (34)

A direct consequence of the final truncation and addition (32) is

F_N < W ≤ F_N + D^{-L}     (35)

and a useful relation is obtained by applying (31) and (25),

D^{-L} ≤ T_N / D .     (36)


An essential characteristic of Algorithm FVE is that the scale factor T after each iteration of step FVE-3 bounds from above the scaled sum of all subsequent code elements; i.e., if a and b are integers and 1 ≤ a ≤ b, then

T_{a-1} ≥ Σ_{i=a}^{b} T_{i-1} C_i(x_i) .

This result is established in a slightly different form by the following lemma.

Lemma: For any integers a, b, if 1 ≤ a ≤ b then

T_{a-1} (C_a(x_a) + Q_a(x_a)) ≥ T_{b-1} (C_b(x_b) + Q_b(x_b)) + Σ_{i=a}^{b-1} T_{i-1} C_i(x_i) .     (37)

Proof (by induction on b):

(1) If b = a, equality is trivial.

(2) If b > a, the inductive step is taken by the following chain of inequalities, for the reasons noted below:


where (a) is the inductive hypothesis; (b) follows from bound (23); (c) follows from the truncation inequality (34); and (d) results when the term T_{b-1} C_b(x_b) is included in the summation. Q.E.D.

We need assurance that the operations of Algorithm FVE did not cause a carry overflow out of the most significant (D-1) digit of F, and that the codeword can therefore be expressed exactly in L digits as intended.

Theorem: Codeword W can be expressed exactly in L digits.

Proof: Because W is a multiple of D^{-L} it is enough to show that 0 ≤ W < 1. That W ≥ 0 follows from (35) and the fact that all terms in (33) are nonnegative. That W < 1 is given by the chain beginning

(a) W ≤ F_N + D^{-L} ,

where (a) is the upper bound of (35); (b) follows from (36) and D ≥ 2; (c) follows from sum (33); (d) results from Lemma (37) with a = 1, b = N; (e) follows because (28) and (30) imply T_0 = 1 and T_1 = Q_1(x_1); and (f) is bound (23). Q.E.D.


2.2.2 The decoding algorithm

We have seen the encoding algorithm and some of its properties. Algorithm FVE maps messages of N source symbols into codewords of variable length L digits. Next we will investigate a decoding algorithm which maps an infinite string of channel digits into a sequence of messages.

Algorithm FVD (Fixed-to-Variable Decoding):

(1) Set F ← transmitted string.
(2) Set T ← 1.
(3) For i ← 1, 2, 3, ..., N do:
    (3a) Load the Q and C arrays from the predictor.
    (3b) Set y_i ← max {y : y ∈ I_M and C_i(y) ≤ F/T}.
    (3c) Output decoded symbol y_i.
    (3d) Set F ← F - T·C_i(y_i).
    (3e) Set T ← T·Q_i(y_i) truncated to K significant digits.
(4) Set L ← τ + 1, and begin decoding the next codeword, L channel digits after the beginning of the present codeword.
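A matching decoder sketch (again mine, with the distribution inferred from the worked example; a production decoder would read digits of F lazily, as the text explains next):

```python
# Algorithm FVD in exact rational arithmetic with D = 10: find the symbol
# by comparing the quotient F/T against the C array, subtract the decoded
# code element, and rescale T.
from fractions import Fraction as Fr
import math

def trunc_sig(t, K, D=10):
    # K significant D-ary digits, as in step FVE-3d / FVD-3e.
    tau = 0
    while t * D ** tau < 1:
        tau += 1
    scale = D ** (tau + K - 1)
    return Fr(math.floor(t * scale), scale), tau

def fvd_decode(digits, N, Q, K=2, D=10):
    C = [sum(Q[:m], Fr(0)) for m in range(len(Q))]
    # All received digits are loaded up front here, for simplicity.
    G = sum(Fr(d, D ** (j + 1)) for j, d in enumerate(digits))
    T, tau, y = Fr(1), 0, []
    for _ in range(N):
        q = G / T                                   # step 3b: quotient F/T
        sym = max(m for m in range(len(Q)) if C[m] <= q)
        y.append(sym)                               # step 3c
        G -= T * C[sym]                             # step 3d
        T, tau = trunc_sig(T * Q[sym], K, D)        # step 3e
    return y, tau + 1  # decoded message and codeword length L (step 4)

Q = [Fr(n, 100) for n in (1, 8, 1, 80, 1, 9)]       # assumed distribution
# Codeword 2198 followed by the start of the next codeword (3141...).
y, L = fvd_decode([2, 1, 9, 8, 3, 1, 4, 1, 5, 9], 10, Q)
print(y, L)  # [3, 3, 1, 3, 3, 3, 3, 5, 3, 3] 4
```

Note that the trailing channel digits 3141... do not disturb the decoded message, which is the point of the correctness proof that follows.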

In step FVD-3b, only the first J digits to the right of the radix point of the quotient F/T need be calculated to make the indicated comparison, because C is exact in J digits. A brief examination of elementary long-division algorithms shows that this can be done by examining only J digits of F beyond the least significant nonzero digit of T. Considering the formats in which F and T are represented, we see that F need only be known to τ + (K-1) + J digits. Thus in practice it will be unnecessary to fill the entire buffer at step FVD-1; rather, digits of F can be read in from the channel as they are referenced.


Let G_0 denote the value of F after the execution of step FVD-1 and let G_i denote the value of F after the i-th iteration of step FVD-3. By the argument just advanced, only the first digits of G are explicitly represented in the decoder; the remainder are still in the channel.

No distinct symbol is needed to denote the values of T during decoding, for it will be shown that y_i = x_i, so that the values of T determined by FVD-2 and FVD-3e will always equal the values determined by FVE-2 and FVE-3d. Thus T_i may also denote the contents of T after the i-th iteration of FVD-3.

Briefly, Algorithm FVD correctly decodes the message by correctly decoding each symbol one at a time. When one symbol is being decoded, the sum of subsequent scaled code elements is bounded, as shown by Lemma (37). This bound is small enough so that the current symbol is decoded correctly and its exact code element is subtracted, exposing the next code element for decoding. To clarify this, we will use Algorithm FVD to parse the codeword 2198 generated in the previous example from the beginning of a decimal channel sequence and to decode the message correctly.

Again, let M=6, D=10, N=10, J=2, and K=2. As required, the predictor provides the same independent, identical distribution as used in the previous example. We decode the channel string 2 1 9 8 3 1 4 1 5 9 .... Step FVD-1 sets G_0 = .219+, where + denotes that subsequent digits have not yet been read in from the channel. Step FVD-2 sets T_0 = 1.0D-0. Table 3 lists the results of step FVD-3.


Table 3. Example of Step FVD-3

Let us examine how the decision y_2 = 3 is made. The quotient G_1/T_1 = .1198+/.80 is computed to two places, resulting in .14 . Since this is at least as large as C(3) = .10 but not as large as C(4) = .90 , step FVD-3b sets y_2 = 3 . After the tenth iteration of step FVD-3, step FVD-4 sets L = 4 and begins decoding the next codeword with a new G_0 read from the remaining channel digits.
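The decision of step FVD-3b can be sketched as follows, a toy Python rendering using exact rational arithmetic; the cumulative array C below is hypothetical except for the two entries C(3) = .10 and C(4) = .90 quoted above:

```python
from fractions import Fraction
from math import floor

def decode_symbol(G, T, C, J, D=10):
    """Step FVD-3b: truncate G/T to J base-D digits, then take the
    largest symbol y whose cumulative probability C[y] does not exceed it."""
    q = Fraction(floor(G / T * D ** J), D ** J)
    return max(y for y in range(len(C)) if C[y] <= q)

# Hypothetical C for M = 6; only C[3] = .10 and C[4] = .90 come from the text.
C = [Fraction(c, 100) for c in (0, 2, 6, 10, 90, 95)]
y2 = decode_symbol(Fraction(1198, 10000), Fraction(80, 100), C, J=2)
print(y2)  # -> 3, since C[3] = .10 <= .14 < C[4] = .90
```

Here .1198/.80 = .14975, which truncates to the two-place quotient .14 used in the text.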

We will next formally prove that Algorithm FVD always correctly decodes the message. We will see that channel data after the codeword has too small an effect on G_0 to cause decoding errors.

The decoder sees the codeword W followed by the beginning of the next codeword. Because the decoder cannot immediately determine where codeword W ends, step FVD-1 cannot set G_0 = W exactly but can only bound it by

W ≤ G_0 < W + D^{-L} .

Nesting this into bounds (35) on W , we obtain bounds on G_0 in terms of F_N and T_N , and by (36) this becomes

F_N ≤ G_0 < F_N + (2/D)·T_N .

Since D ≥ 2 ,

F_N ≤ G_0 < F_N + T_N .     (38)

Steps FVD-3b and FVD-3d imply the decision rule

y_i = max { y : y ∈ I_M and C_i(y) ≤ G_{i-1}/T_{i-1} }     (39)

and the recursion

G_i = G_{i-1} - T_{i-1}·C_i(y_i) .     (40)

Theorem: Algorithm FVD produces an exact copy of the message,

y_i = x_i for 1 ≤ i ≤ N .     (41)

Proof (by induction on i ):

Let 1 ≤ j ≤ N and suppose that the first j-1 symbols have been correctly decoded. Given y^{(j-1)} = x^{(j-1)} , we will show that y_j = x_j . That y_1 = x_1 will follow with j = 1 . We will first compute two bounds on the partial sum G_{j-1} , then will observe that the decision rule (39) sets y_j = x_j . The reasons for each link in the chain of inequalities follow afterward. The lower bound is


where (a) follows from recursion (40) and the inductive hypothesis; (b) follows from the lower bound of (38); (c) results from cancelling terms in sum (33); and (d) results when some nonnegative terms are dropped. The upper bound is given by


where (a) follows from recursion (40) and the inductive hypothesis; (b) follows from the upper bound of (38); (c) results from cancelling terms with (33) and separating the last term from the sum; (d) follows from (23); (e) follows from the truncation inequality (34); (f) results from Lemma (37) with a = j and b = N-1 ; and (g) follows by (34). Thus we have two bounds on G_{j-1} ; dividing by T_{j-1} yields the corresponding bounds on the quotient G_{j-1}/T_{j-1} . Note that if x_j = M-1 the upper bound is 1 , by (21) and (22); otherwise the upper bound is C_j(x_j + 1) , by (22). In either case the decision rule (39) correctly sets y_j = x_j . Q.E.D.

2.2.3 Codeword set is proper

We have seen that the decoding algorithm reproduces the message correctly regardless of the channel data following the codeword, although it does examine some of that data during the decoding process. We will next compute a bound on the number of channel symbols examined beyond the codeword end, and argue that the codeword set is proper.

In the discussion following the presentation of Algorithm FVD, it was demonstrated that y_N can be determined by examining only the first τ_{N-1} + (K-1) + J digits of the channel sequence. Since τ_{N-1} ≤ τ_N and L = τ_N + 1 , this means that at most L + K + J - 2 channel digits will be read by Algorithm FVD before the message is completely decoded. L of these are the codeword; therefore at most J + K - 2 must be saved for decoding subsequent messages.

That the codeword set is proper (the code is instantaneous) is established by considering the following modification to the decoder: Each time a new digit of G is needed, but before reading any actual channel symbols, the decoder could append an arbitrary string of J + K - 2 digits and attempt to complete decoding. If the decoder determined that the codeword extended into the trial string, then another channel symbol would be read. But as soon as the codeword ended with the last symbol already read in, the decoder would immediately output the message, which would be correct by Theorem (41). Thus the fact that Algorithm FVD examines channel data after the codeword during the decoding process is only a property of the algorithm. By sketching the design of an instantaneous decoder, we have shown that the set of codewords satisfies the prefix condition and therefore is proper.

2.2.4 Compression rate

Let us develop the tools necessary to examine the expected behavior of the codeword length L .

Lemma: Let Z be a real number satisfying 0 < Z < 1 and let T be the truncation to K significant digits of the D-ary expansion of Z . Then

T/Z ≥ 1 - D^{1-K} .

Proof: Represent Z and T in D-ary normalized floating-point form; i.e., let

(a)  Z = Σ_{i=0}^{∞} t_i D^{τ-i}

and

(b)  T = Σ_{i=0}^{K-1} t_i D^{τ-i} ,

(c)  where t_i ∈ I_D and t_0 ≥ 1 .

By (a) and (c), in analogy to (25),

(d)  Z ≥ D^{τ} .

Subtracting (b) from (a),

Z - T = Σ_{i=K}^{∞} t_i D^{τ-i} ,

and thus by (c)

(e)  Z - T ≤ D^{τ-K+1} .

Dividing, applying upper bound (e) to the numerator and lower bound (d) to the denominator,

(f)  (Z - T)/Z ≤ D^{1-K} .

Subtracting (f) from 1 = 1 ,

T/Z ≥ 1 - D^{1-K} . Q.E.D.

This lemma and recursion (30) establish a lower bound complementary to the upper bound given by the truncation inequality (34),

T_{i-1} Q_i(x_i) (1 - D^{1-K}) ≤ T_i ≤ T_{i-1} Q_i(x_i) .     (43)

Define

ℓ_i = log_D T_{i-1} - log_D T_i     (44)

and

V_D(K) = -log_D (1 - D^{1-K}) .     (45)

Taking logarithms of (43) yields

-log_D Q_i(x_i) ≤ ℓ_i ≤ -log_D Q_i(x_i) + V_D(K) .

Roughly speaking, ℓ_i measures the length of the codeword dedicated to the i-th symbol from the source, and V_D(K) is an upper bound on the length of codeword wasted by each truncation in step FVE-3d.

Table 4 shows typical values of V_D(K) . For example, V_2(16) = 4.40 × 10^-5 means that with a double-precision implementation on an eight-bit binary microprocessor, less than one ten-thousandth of a codeword bit is wasted for each source symbol by using finite precision arithmetic. Similarly, V_16(6) = 3.44 × 10^-7 means that with a standard-precision 6-digit floating-point hexadecimal representation such as used on the IBM 370, less than one one-millionth of a hexadecimal channel digit is wasted per source symbol.

Table 4. Typical values of V_D(K) = -log_D (1 - D^{1-K}) , tabulated for K in bits and in hexadecimal digits.
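The two values quoted above can be checked directly from definition (45); a small Python sketch:

```python
import math

def V(D, K):
    """V_D(K) = -log_D(1 - D**(1-K)): channel digits wasted per source
    symbol by the truncation in step FVE-3d."""
    return -math.log(1 - D ** (1 - K), D)

print(f"V_2(16) = {V(2, 16):.3g}")   # about 4.40e-05 bits
print(f"V_16(6) = {V(16, 6):.3g}")   # about 3.44e-07 hexadecimal digits
```

Both values agree with the text, and increasing K by one decreases V_D(K) by roughly a factor of D, as the following theorem makes precise.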


Theorem: The function V_D(K) = -log_D (1 - D^{1-K}) decays exponentially with K . That is, for any c > 1 there exists a K_c such that K ≥ K_c implies

(1/ln D) D^{1-K} ≤ V_D(K) ≤ c (1/ln D) D^{1-K} .     (47)
       (a)                      (b)

Proof of (a): Recall the well-known bound, for all z > 0 ,

ln z ≤ z - 1 .

Thus

-ln z ≥ 1 - z .

Let z = 1 - D^{1-K} . Observe that z is positive if K > 1 , which is assumed. Substituting for z ,

(1/ln D) D^{1-K} ≤ V_D(K) . Q.E.D. (a).

Proof of (b): Let c > 1 and g(z) = 1 - z - e^{-cz} . Then g'(z) = -1 + c e^{-cz} . Note that g(0) = 0 , g(z) is continuous, and g'(0) = c - 1 > 0 . Thus g(z) is increasing as it passes through the origin. Therefore there exists a z_c > 0 such that z ∈ [0, z_c] implies g(z) ≥ 0 . By definition of g(z) this implies that for z ∈ [0, z_c] ,

1 - z ≥ e^{-cz} .

Taking logarithms to the base D ,

log_D (1 - z) ≥ -cz / ln D .

Let z = D^{1-K} and K_c = 1 - log_D z_c . If K ≥ K_c then z ∈ [0, z_c] and

log_D (1 - D^{1-K}) ≥ -c D^{1-K} / ln D .

Reversing signs, K ≥ K_c implies

V_D(K) ≤ c (1/ln D) D^{1-K} . Q.E.D. (b)

We are now in a position to compute bounds on the rate. By (18), (46) becomes (48); by (6), summing this relation over i = 1,2,3,...,N yields (49).

Codeword length L is given by

L =(a) τ_N + 1
  =(b) -⌊log_D T_N⌋ + 1
  =(c) ⌈-log_D T_N⌉ + 1
  =(d) ⌈log_D T_0 - log_D T_N⌉ + 1
  =(e) ⌈Σ_{i=1}^{N} (log_D T_{i-1} - log_D T_i)⌉ + 1
  =(f) ⌈Σ_{i=1}^{N} ℓ_i⌉ + 1     (50)

where (a) is (31); (b) follows from (25); (c) is the symmetry between ceiling and floor; (d) follows because T_0 = 1 in boundary (28); in (e) the expression is written as a telescoping sum; and (f) follows from the definition (44) of ℓ_i . Because z ≤ ⌈z⌉ < z + 1 , (50) implies

Σ_{i=1}^{N} ℓ_i + 1 ≤ L < Σ_{i=1}^{N} ℓ_i + 2 .     (51)

Nesting this into (49) and taking expectation yields bounds on EL . Dividing by N yields the desired bounds on the rate.

Theorem: Algorithm FVE achieves a compression rate bounded by

(1/N) Σ_{i=1}^{N} H_D(X_i | X^{(i-1)}) + 1/N ≤ R < (1/N) Σ_{i=1}^{N} H_D(X_i | X^{(i-1)}) + V_D(K) + 2/N ,

where N is the message length, K is the precision of the internal register T , and V_D(K) = -log_D (1 - D^{1-K}) is an exponentially decreasing function of K .

Proof: This is a direct consequence of the preceding development, and results when each term in (51) is divided by N . Q.E.D.

By making N sufficiently large, R can be made arbitrarily close to the average entropy per symbol of the first N symbols. Next we shall see that this can be done with reasonable computational complexity.


2.2.5 Computational complexity

Let us investigate how the computational complexity of Algorithms FVE and FVD grows with precisions J and K and block length N .

Step FVE-1 requires a small fixed amount of work once per block. Even though a large storage area is available for F , this step need not clear it if a pointer is kept to indicate the end of the occupied area. The memory can be cleared as Φ is advanced when step FVE-3c needs more storage, as will be investigated. Step FVE-1 need only initialize F = 0 and Φ = 1 ; the complexity of this is O(1) .

Step FVE-2 initializes the K significant digits of T and its exponent field τ ; the complexity is thus O(K) .

Step FVE-3 is executed N times. We compute the complexity of each iteration. Step FVE-3a requires that 2MJ digits be loaded, although in some applications this can be reduced. If the source outputs are independent and identically distributed (IID), step FVE-3a could be eliminated entirely, with Q and C held constant. For a finite state Markov source, several Q and C arrays could be fixed and a pointer switched to the correct arrays for the state of the source. But for this worst-case analysis, we shall assume the complexity of FVE-3a is O(MJ) . Step FVE-3b is trivial. Step FVE-3c requires two stages; the first is calculation of a J + K digit product from J-digit and K-digit factors. If J and K are the computer word size, this can usually be done in one operation. A straightforward multiprecision approach would require JK operations, although Knuth [1971] has shown more efficient procedures. We shall assume a complexity of O(JK) for this stage. The second stage requires the addition of the J + K digit product to a multiprecision number, requiring J + K operations for the initial addition and a few more to propagate the carry. That the carry propagation is negligible is established by the following argument: At worst, suppose the initial addition always generates a carry. F becomes the codeword of an efficient source code; therefore its digits are nearly uniformly distributed among I_D . Propagation of the carry through n additional places requires that n consecutive digits of F have value D-1 ; this occurs with probability D^{-n} . Hence the additional distance of carry propagation is approximately geometrically distributed with mean 1/D , which is small compared to J + K . If pointer Φ is used to keep track of the occupied area of F , it need be advanced no more than J digits from the previous iteration. This is because τ is set by FVE-3d and Q_i(x_i) is at least D^{-J} . Therefore the overall complexity of step FVE-3c is the complexity of the multiplication, O(JK) . Step FVE-3d requires that another J + K digit product be calculated; this complexity is also taken to be O(JK) . The net complexity of step FVE-3 grows as the sum of the complexities of its parts, for a total of O(MJ + JK) per iteration. For N iterations this becomes O(NJ(M+K)) . Step FVE-4 is trivial.

Step FVE-5 simply requires that pointer Φ be moved, a digit incremented, and a carry propagated. By the argument established before, this represents a negligible amount of work.

The computational complexity of Algorithm FVE is summarized below.


Step        Complexity
FVE-1       O(1)
FVE-2       O(K)
FVE-3       O(NJ(M+K))
  per iteration:  O(J(M+K))
  FVE-3a    O(MJ)
  FVE-3b    O(1)
  FVE-3c    O(JK)
  FVE-3d    O(JK)
FVE-4       O(1)
FVE-5       O(1)
Total       O(NJ(M+K))

When J and K are one computer word length, and the source model is such that the Q and C arrays do not have to be computed for each symbol, then only two single-precision multiplications and one single-precision addition per symbol perform the encoding functions of step FVE-3. Algorithm FVE is fast because these are easy operations; supportive experimental results will be discussed in Chapter 4.
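Those per-symbol operations can be sketched as follows, a toy Python rendering of steps FVE-3c and FVE-3d using exact rational arithmetic in place of the multiprecision registers; the alphabet and probabilities are made up, with D = 10 and K = 2 as in the earlier example:

```python
from fractions import Fraction
from math import floor

D, K = 10, 2
Q = [Fraction(2, 10), Fraction(3, 10), Fraction(5, 10)]  # hypothetical probabilities
C = [Fraction(0), Fraction(2, 10), Fraction(5, 10)]      # cumulative probabilities

def trunc_sig(T):
    """Truncate 0 < T <= 1 to K significant base-D digits (step FVE-3d)."""
    tau = 0                                  # exponent field of T
    while T < Fraction(1, D) ** tau:
        tau += 1
    scale = D ** (tau + K - 1)
    return Fraction(floor(T * scale), scale)

def encode(message):
    F, T = Fraction(0), Fraction(1)          # steps FVE-1 and FVE-2
    for x in message:                        # step FVE-3
        F += T * C[x]                        #   FVE-3c: one multiply, one add
        T = trunc_sig(T * Q[x])              #   FVE-3d: one multiply, then truncate
    return F, T

print(encode([2, 0, 1]))  # -> (Fraction(13, 25), Fraction(3, 100))
```

With machine-word J and K, the two multiplications here become single-precision instructions, which is the source of the algorithm's speed.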

We shall now see that decoding is no more complex than encoding. The complexity of Algorithm FVD grows as for Algorithm FVE because the only step in Algorithm FVD without a corresponding step in FVE is FVD-3b. We examine its complexity. As was noted, the division F/T only needs to be carried out to J digits and the remainder dropped. Dividing a J + K digit dividend by a K digit divisor to obtain a J digit quotient requires O(JK) operations. The search for y can be conducted as a binary search when M is large, requiring log M comparisons of at most J digits each. The search therefore requires O(J log M) operations. Since FVD-3b is iterated N times, its total complexity grows as O(NJ(K + log M)) ; this is absorbed into the complexity of the remaining steps for an overall complexity of O(NJ(K+M)) , the same as for Algorithm FVE.

The complexity of encoding, transmitting, and decoding a block is thus O(NJ(K+M)) . For constant J , K , and M , this grows linearly with N .


CHAPTER 3

VARIABLE-TO-FIXED CODES

3.1 Problem Statement and History

For variable-to-fixed (VF) codes, the rate expression (1) becomes

R = L / EN ,

where codeword length L is fixed. The optimal VF code maximizes the expected message length EN .

We shall see that the problem of finding optimal VF codes for conditional sources is much more difficult than the problem of finding optimal VF codes for memoryless sources.

Tunstall [1968, quoted in Jelinek and Schneider, 1972] showed an optimal design procedure for selecting message sets for VF codes for memoryless sources. His algorithm is in a sense a dual to Huffman's FV algorithm, and its computational complexity is comparably large when the number of messages is large.

Lynch [1966], Davisson [1966], and Schalkwijk [1972] exhibited a VF coding algorithm which does a good job with typical sequences from a memoryless source, but as pointed out by Cohn [1976] is suboptimal because of its treatment of atypical sequences. Essentially a dual to Elias coding, their algorithm suffers from the same computational complexity. A codeword is a sum of binomial coefficients. For long messages, the binomial coefficients become very large, and there are many of them in the sum. The computational complexity grows as the square of the message length, rendering the algorithm impractical.

There does not appear to have been any published solution to the problem of selecting optimal message sets for VF codes for conditional sources. (That Tunstall's procedure is not always optimal for conditional sources is shown by the example in Appendix B. Code set A was designed by Tunstall's algorithm and is suboptimal.) Jelinek and Schneider [1974] have considered VF codes for Markov sources, but their goal is minimizing the probability of error due to buffer overflow rather than maximizing expected message length. That VF codes for conditional sources have not yet been theoretically explored as fully as FV codes is understandable because their analysis is much more complicated.

Because of the fixed message length N of FV codes, the j-th message always begins at time jN + 1 . When codewords are assigned to the j-th message, it does not matter how codewords were assigned to the first j-1 messages. The code designer is constrained to a given parsing of the source string into messages and may therefore independently code each message.

With VF coding, however, the time when each message begins depends on the source string and the message sets from which previous messages were selected. A naive designer might choose a message set which maximizes the expected message length of the first message but causes the second message to begin at a time when all possible codes are inefficient. A simple example of this interaction appears in Appendix B and will be discussed when the VF algorithms of Section 3.2 are analyzed.

We will begin by heuristically modifying the FV algorithms of Chapter 2 for VF coding. Then we shall analyze their performance and argue that while not necessarily optimal they are efficient and asymptotically approach the optimal, which is possible because computational complexity grows linearly with the codeword length.


3.2 The Variable-to-Fixed Algorithms

With only minor modifications, Algorithms FVE and FVD may be converted for variable-to-fixed coding. The data structures for the VF algorithms will be the same Q , C , F , and T as were used for the FV code, and they will satisfy (18) - (26). The symbol-by-symbol encoding is unchanged; the difference lies in how the decision that a block is complete is made. In the FV algorithms a block was complete after N symbols had been encoded. But in the VF case it is desirable to encode as many symbols into a codeword of fixed length L as will fit. Because the stopping decision will be the same in both the encoder and decoder, and because the decoder only has access to previously decoded symbols, it is necessary that the encoder decide whether to accept another source symbol for encoding based only on symbols already encoded and not on the new candidate. Under the interpretation of F as an overlapped concatenation of code elements and T as a pointer, a new symbol may only be accepted if T is sufficiently far from the least-significant end of the allocated codeword so that even the longest code element will fit. Alternately, interpreting F as a sum of scaled code elements and T as a scale factor, a new symbol may only be encoded if T is large enough so that the scaled code element of any new symbol can be accurately represented within the L digits allowed.

Under either interpretation, a new symbol is accepted if T is not below some threshold. For speed, only the exponent field τ of T is examined to make the comparison. This decision is represented by the "while" clause in the encoding algorithm.

Algorithm VFE (Variable-to-Fixed Encoding):

(1) Set F ← 0 .
(2) Set T ← 1 .
(3) While τ < L - J do:
    (3a) Load the Q and C arrays from the predictor.
    (3b) Get symbol x_i from the source.
    (3c) Set F ← F + T·C_i(x_i) .
    (3d) Set T ← T·Q_i(x_i) truncated to K significant digits.
(4) Truncate F to L digits and add D^{-L} (i.e., set F ← (⌊D^L·F⌋ + 1)·D^{-L} ).
(5) Transmit the L digits of F .
(6) Go to (1) and begin encoding the next message.

As in the case of the FV algorithms, we attach distinct labels to successive values taken on by variables in Algorithm VFE, letting F_i and T_i denote the values attained by F and T respectively after the i-th iteration of step VFE-3. Let N denote the total number of iterations, and hence the message length. Now N is a function of the random source data X . Even so, relations (27) - (30) and (32) - (35) are still valid, as is Lemma (37) and its proof. The dependence of codeword length L on τ_N given by (31) no longer applies, however. Instead, the threshold L - J for τ determines N . We will investigate why this threshold guarantees correct decoding. Recall that bounds (38) were used to prove correct decoding for the FV code. In the VF case, however, we cannot control the codeword precision; it is fixed at L digits. Therefore to satisfy (38) the VF encoder must insure that T_N is not too small. Since the encoder must decide to accept the N-th symbol based on T_{N-1} , the following lemma is needed.

Lemma: The final value of T is related to the previous value by

T_N ≥ D^{-J} T_{N-1} ,     (54)

with equality if and only if Q_N(x_N) = D^{-J} .

Proof: Recall recursion (30) and that by (19) and (20) Q_N(x_N) is a positive integer multiple of D^{-J} . There are two cases depending on whether the integer is 1 or more.

Case 1: If Q_N(x_N) = D^{-J} then T_N = D^{-J} T_{N-1} exactly. The truncation in step VFE-3d has no effect because Q_N(x_N) is a power of D .

Case 2: If Q_N(x_N) ≥ 2 D^{-J} , then

T_N ≥(a) T_{N-1} Q_N(x_N) (1 - D^{1-K}) ≥(b) 2 D^{-J} T_{N-1} (1 - D^{1-K}) ≥(c) D^{-J} T_{N-1} ,

where (a) is the lower bound of (43); (b) is the case 2 premise; and (c) follows because D ≥ 2 and K > 1 imply D^{K-1} ≥ 2 and hence (1 - D^{1-K}) ≥ 1/2 . Q.E.D.

With the aid of the above lemma we may calculate absolute bounds on T_N .

Theorem: The final value of T is bounded,

D^{-L} < T_N ≤ D^{-(L-J)} .     (55)

Proof: The last iteration of step VFE-3 began because τ_{N-1} < L - J . With n = L - J in (26) this implies

T_{N-1} > D^{-(L-J)} .

Substituting this into (54) proves the lower bound. Step VFE-3 declined to begin an (N+1)st iteration because τ_N ≥ L - J . With n = L - J in (24) and (25), this implies the upper bound. Q.E.D.

The lower bound of (55) and the fact that W - F_N ≤ D^{-L} imply

W - F_N < T_N ,

which by (35) implies

F_N ≤ W < F_N + T_N .     (58)

Once again W can be expressed in L digits; the proof is unchanged except that step (b) now follows from (57).

The VF decoding algorithm is obtained by modifying Algorithm FVD in the same way as Algorithm FVE was modified to obtain Algorithm VFE.

Algorithm VFD (Variable-to-Fixed Decoding):

(1) Set F ← the first L channel digits.
(2) Set T ← 1 .
(3) While τ < L - J do:
    (3a) Load the Q and C arrays from the predictor.
    (3b) Set y_i = max { y : y ∈ I_M and C_i(y) ≤ F/T } .
    (3c) Output the decoded symbol y_i .
    (3d) Set F ← F - T·C_i(y_i) .
    (3e) Set T ← T·Q_i(y_i) truncated to K significant digits.
(4) Go to (1) and begin decoding the next codeword, L channel digits after the beginning of the present codeword.
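A toy round trip through Algorithms VFE and VFD can be sketched in Python. Exact rational arithmetic stands in for the multiprecision registers, and D, J, K, L and the source distribution below are made-up illustration values, not taken from the text:

```python
from fractions import Fraction
from math import floor

D, J, K, L = 10, 2, 2, 6
Q = [Fraction(20, 100), Fraction(30, 100), Fraction(50, 100)]  # hypothetical IID source
C = [Fraction(0), Fraction(20, 100), Fraction(50, 100)]        # cumulative probabilities

def tau(T):
    """Exponent field of T: the smallest t with T >= D**-t."""
    t = 0
    while T < Fraction(1, D) ** t:
        t += 1
    return t

def trunc_sig(T):
    """Truncate T to K significant base-D digits."""
    scale = D ** (tau(T) + K - 1)
    return Fraction(floor(T * scale), scale)

def vfe(source):
    """Algorithm VFE: accept symbols while tau < L - J, then emit L digits."""
    F, T, n = Fraction(0), Fraction(1), 0
    while tau(T) < L - J:                        # the "while" clause of step VFE-3
        x = source[n]; n += 1
        F += T * C[x]                            # VFE-3c
        T = trunc_sig(T * Q[x])                  # VFE-3d
    W = Fraction(floor(F * D ** L) + 1, D ** L)  # VFE-4: L digits plus D**-L
    return W, n

def vfd(W):
    """Algorithm VFD: recover the message from one L-digit codeword."""
    F, T, out = W, Fraction(1), []
    while tau(T) < L - J:                        # same stopping rule as the encoder
        q = Fraction(floor(F / T * D ** J), D ** J)     # quotient to J digits
        y = max(s for s in range(len(C)) if C[s] <= q)  # VFD-3b
        out.append(y)
        F -= T * C[y]                            # VFD-3d
        T = trunc_sig(T * Q[y])                  # VFD-3e
    return out

src = [2, 0, 1, 2, 2, 1, 0, 2, 1, 2]             # made-up source string
W, n = vfe(src)
print(n, vfd(W) == src[:n])  # the decoder recovers exactly the n encoded symbols
```

Because both loops test the same exponent field τ against L - J, the decoder stops after exactly the n symbols the encoder accepted, which is the point argued below.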

Let G_i denote the value of F after the i-th iteration of VFD-3. Since the codeword length is constant, step VFD-1 sets G_0 = W exactly. By (58) this means

F_N ≤ G_0 < F_N + T_N ,

which confirms (38) for the VF code. The decoding rule (39) and recursion (40) apply directly.

The correct-decoding theorem (41) and its proof hold for the VF code also, with the additional verification that the decoder outputs N symbols, the correct number. This is because the decision that a block is complete is based on the same values of τ in encoder and decoder.

The rate definition (1) becomes for a VF code

R = L / EN .

We will explore the properties of EN . As for the FV code, define ℓ_i and V_D(K) according to (44) and (45). Then (46) follows directly. For initial analysis suppose that the source is memoryless (single-state). In this case the ℓ_i are independent, identically distributed (IID) positive-real-valued random variables. We pause to develop a needed theorem.

Consider a random process {ℓ_1, ℓ_2, ℓ_3, ...} where the ℓ_i are IID positive-real-valued random variables. Define the maximum and minimum possible values of ℓ_i ,

ℓ_max = sup { z : p(ℓ_i ≥ z) > 0 }

and

ℓ_min = inf { z : p(ℓ_i ≤ z) > 0 } .

These definitions allow (but do not require) the random variables ℓ_i to be continuous. Define the cumulative independent-increment process

L_n = Σ_{i=1}^{n} ℓ_i .

The time to reach some fixed positive threshold L' ,

N = min { n : L_n > L' } ,

is an integer-valued random variable determined by the random process. We seek bounds on the expected value of N .

Lemma:

L' < E L_N ≤ L' + ℓ_max .     (60)

Proof: We prove the stronger result L' < L_N ≤ L' + ℓ_max . That L' < L_N follows by definition of N . The upper bound is proved by contradiction. Assume that for some trial L_N > L' + ℓ_max . Then L_{N-1} ≥ L_N - ℓ_max > L' , which contradicts the definition of N . Q.E.D.

Lemma:

E L_N = EN · Eℓ .     (61)

Proof: [Wald, 1947, Section 3.5] The cumulative process will take the longest to reach the threshold when all steps ℓ_i attain their minimum value. Smaller values of N result when some of the ℓ_i exceed ℓ_min . In general, let m = ⌊L'/ℓ_min⌋ + 1 so that N ≤ m . Then


where (a) is because the ℓ_i are IID; in (b) the sum is separated; (c) is by linearity of expectation; (d) is by definition of L_N ; (e) follows from conditioning on N knowing 1 ≤ N ≤ m ; (f) is because ℓ_{N+1} through ℓ_m are independent of ℓ_1 through ℓ_N ; (g) is because the ℓ_i are IID; (h) is a refactoring; and (i) follows because the first sum is Eℓ and the second sum is the definition of EN . Adding Eℓ·EN - m·Eℓ to both sides completes the proof. Q.E.D.

Theorem: For the random process described in the text, the expected value of N is bounded by

L'/Eℓ < EN ≤ (L' + ℓ_max)/Eℓ .     (62)

Proof: Substituting (61) into (60) yields

L' < EN·Eℓ ≤ L' + ℓ_max .

Dividing by Eℓ gives bounds on EN ,

L'/Eℓ < EN ≤ (L' + ℓ_max)/Eℓ . Q.E.D.
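The bounds (60) - (62) are easy to check numerically; a Monte Carlo sketch with a made-up step distribution, where `L_prime` plays the role of the threshold L':

```python
import random

random.seed(1)
steps = [0.5, 1.5, 3.0]   # hypothetical IID step lengths l_i (chosen uniformly)
L_prime = 20.0            # threshold L'

def stopping_time():
    """N = first n for which the cumulative sum L_n exceeds L'."""
    total, n = 0.0, 0
    while total <= L_prime:
        total += random.choice(steps)
        n += 1
    return n

EN = sum(stopping_time() for _ in range(20000)) / 20000
El, l_max = sum(steps) / len(steps), max(steps)   # E l = 5/3, l_max = 3
print(L_prime / El, EN, (L_prime + l_max) / El)   # (62): 12.0 < EN <= 13.8
```

The sample mean of N lands strictly between the two bounds of (62), as the theorem requires.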

We now apply this result to finding the rate of the VF code for memoryless sources. Taking expectation on (42),

H_D(X_i) ≤ Eℓ_i < H_D(X_i) + V_D(K) .     (63)

Let

ℓ_max = log_D (1/D^{-J}) + V_D(K) = J + V_D(K) .     (64)

Recall from (50) that

Σ_{i=1}^{N} ℓ_i = -log_D T_N .

With the upper bound of (55) this implies

Σ_{i=1}^{N} ℓ_i ≥ L - J .     (65)

Similarly by (56),

Σ_{i=1}^{N} ℓ_i < L .

Therefore setting L' = L - J - 1 satisfies (59). Substituting this, (63), and (64) into (62) yields

Theorem: Algorithm VFE achieves with memoryless sources a compression rate bounded by

L·H_D(X_i) / (L - 1 + V_D(K)) ≤ R < L·(H_D(X_i) + V_D(K)) / (L - J - 1) ,     (67)

where L is the codeword length, J is the number of digits in the D-ary representation of the source probabilities, H_D(X_i) is the entropy of each independent, identically distributed source symbol, K is the number of digits of precision in the internal register T , and V_D(K) = -log_D (1 - D^{1-K}) .

Proof: This is a direct consequence of the preceding development, obtained by dividing all terms in (66) by L and inverting. Q.E.D.

Because the necessary theory for calculating the rate of a VF code for a conditional source (a source with memory) is not yet fully developed, we shall not attempt to find any bounds on the rate that Algorithm VFE achieves for conditional sources. It would also be meaningless to try to compute the expected length of the first message, because as shown by the example in Appendix B, maximizing the expected message length for every history may be a suboptimal strategy. Instead, we will examine how effectively Algorithm VFE uses each codeword to contain information.


Recall that X denotes the infinite random source string to be emitted, and B denotes a complete and proper message set. Because exactly one message b(X) ∈ B prefixes X , the message set B partitions the range of X into |B| categories. We are to transmit one codeword W from a set of equal-length codewords in a one-to-one correspondence with B . Because there is a fixed cost associated with the transmission of W , we want to maximize the information contained in W about X , or I(W;X) . Because information is symmetric [Ash, 1965], this quantity equals I(X;W) which in turn equals I(X;b(X)) because codewords and messages are in one-to-one correspondence. But I(X;b(X)) = H(b(X)) - H(b(X)|X) , where H(b(X)|X) = 0 because b(x) is uniquely determined by x for all infinite strings x . Therefore we want to maximize H(b(X)) . This is done by selecting B so that all messages are as nearly equiprobable as possible.

Suppose message b of length N(b) is encoded. With (42), this implies

ℓ_i < -log_D P(b_i | b^{(i-1)}) + V_D(K) .

Summing over i = 1,2,3,...,N(b) with (6) yields

Σ_{i=1}^{N(b)} ℓ_i < -log_D P(b) + N(b)·V_D(K) ,

and by (65) this becomes


L - J - 1 < -log_D P(b) + N_max V_D(K) ,

where N_max is the length of the longest message. Exponentiating and rearranging terms yields

P(b) < D^{-(L - J - 1 - N_max V_D(K))} .     (68)

We will now show that there is no VF code with a codeword length less than L - J - 1 - N_max V_D(K) which could encode the same message set as Algorithm VFE (or any other message set which conveys as much information).

Theorem: The message entropy achieved by Algorithm VFE is bounded below by

H_D(b(X)) > L - J - 1 - N_max V_D(K) ,

where L is the codeword length, J is the number of digits in the D-ary representation of each source conditional probability, N_max is the length of the longest message, and V_D(K) = -log_D (1 - D^{1-K}) decays exponentially with the number K of significant digits in register T .

Proof: We have

H_D(b(X)) =(a) - Σ_{b∈B} P(b) log_D P(b)
          >(b) Σ_{b∈B} P(b) (L - J - 1 - N_max V_D(K))
          =(c) L - J - 1 - N_max V_D(K) ,

where (a) is the definition of message entropy (9); (b) follows from (68) and the monotonicity of the logarithm function; and (c) results from factoring and replacing the sum of the probabilities with 1 . The behavior of V_D(K) is given by (47). Q.E.D.

Note that by letting L become very large, the entropy per codeword symbol,

H_D(b(X)) / L ,

approaches 1 digit; thus Algorithm VFE is optimal in the limit.


CHAPTER 4

EXPERIMENTAL IMPLEMENTATION

To demonstrate the speed of Algorithms FVE, FVD, VFE, and VFD, they were implemented on an Interdata model 7/16 minicomputer. This is a microprogrammed machine with a 16-bit word length and average instruction execution time of about two microseconds. We will not go into detail about the machine's architecture and instruction set as these are covered in the manufacturer's documentation [Interdata, 1971]. For compatibility of nomenclature with their larger machines, Interdata calls the 16-bit word a halfword; this convention will be followed here.

Parameters chosen for the experimental implementation were selected both to simplify programming and to accurately reflect a typical application. A binary channel (D = 2) is frequently encountered in practice; since binary arithmetic is natural to the host machine this parameter was chosen. The precisions J and K of the probabilities Q and pointer T were both chosen to be 12 bits. It seems unlikely that any source would have a probability distribution known more accurately than to one part in 4096. That V_2(12) = 7.05 × 10^-4 insures that keeping 12 bits of pointer T is sufficient to limit codeword waste to under one one-thousandth of a bit per symbol. Conveniently, 12-bit numbers fit within the 16-bit halfword without occupying the most significant bit, which is used as the sign bit of two's complement notation for signed numbers. Although "Multiply Halfword Unsigned" is a standard microcoded instruction, there is no corresponding divide instruction which does not use two's complement notation.


Although Interdata specifies a floating-point hexadecimal format, the minicomputer used in this experiment has no hardware or microcode to perform floating point arithmetic. When conventional floating point operations are needed, they are normally performed by subroutines in the operating system. Because these subroutines provide many features not needed in the coding algorithms (recall that T is always positive and at most one), they are slow. Consequently a simpler format was chosen to represent pointer T . Two halfwords are allocated. The first contains the 12 significant bits of T right adjusted, and the second contains the binary representation of the exponent field.

The Q and C arrays are allocated as described in Section 2.2.

Each element occupies the 12 least significant bits of one halfword.

The multiprecision accumulator F is implemented as a large array,

packed 16 bits per halfword.
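The multiprecision accumulator F just described is manipulated by adding scaled products into the packed array and propagating carries toward the most significant end. A minimal sketch in Python, with an illustrative digit size and function name:

```python
def add_scaled(F, product, pos, base=2 ** 16):
    # F: list of base-2**16 "halfword" digits, most significant first,
    # representing a fraction.  Add integer `product` with its low
    # digit landing at index `pos`, propagating carries toward index 0
    # as the carry-propagation loop of the implementation does.
    carry = product
    i = pos
    while carry and i >= 0:
        carry, F[i] = divmod(F[i] + carry, base)
        i -= 1

F = [0, 0, 0, 0]
add_scaled(F, 0xFFFF + 2, 2)   # forces a carry into the next digit
assert F == [0, 1, 1, 0]
```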

For ease of programming, especially for communication with the ex-

perimenter, the test programs were written in FORTRAN. However, the key

steps involving arithmetic with Q , C , F , and T are not easily

coded in FORTRAN because of the format of these numbers. Special FORTRAN-

callable assembly-language subroutines were written to perform these

steps.

Program listings and results appear in Appendix C. The first four

listings are the assembly language subroutines. Subroutine NEXTT performs

the multiplication and truncation in step FVE-3d. Subroutine SCADD per-

forms the multiplication and addition in step FVE-3c. Subroutine SCGET

returns the quotient in step FVD-3b. Subroutine SCSUB performs the multi-

plication and subtraction in step FVD-3d. Then two FORTRAN-coded routines


are listed. DUMP provides a formatted hexadecimal list of the first N

halfwords beginning at a specified address. DUMPV is similar but the

number of digits allocated per halfword depends on the maximum value to

be represented.

The subroutine listings are followed by Program BFV, a demonstration

of Algorithms FVE and FVD for Bernoulli (memoryless binary) sources. The

bulk of the code provides for interactively obtaining the source statistics

and message length from the experimenter. The program then uses a system-

supplied random number generator to generate a source sequence according

to the desired statistics. Comments distinguish the steps of Algorithms

FVE and FVD. Following the program listing is a sample printout.

Program MFV is an extension of Program BFV to encode first order

Markov sources (sources whose output is a Markov chain) with arbitrary

alphabet size. A sample printout follows the program listing.

Program VF is a demonstration of Algorithms VFE and VFD for memory-

less sources with arbitrary alphabet size; again it is followed by a

sample printout.

Each of the three programs encoded several million source symbols

and decoded them without error. The execution time of the programs was

found to grow linearly with the message length, with typical time being

30 seconds to encode, decode, and compare a message of 30,000 symbols.

A speed of approximately 1000 symbols per second applied to all three

programs; however, a large alphabet slowed the latter two because a sub-

optimal linear search in step FVD-3b was selected to simplify programming.
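The search referred to here, finding which symbol's cumulative code element bounds the current quotient, can be done linearly as in the demonstration programs, or in logarithmic time by bisection. A sketch with an assumed cumulative array:

```python
from bisect import bisect_right

C = [0.0, 0.5, 0.75]           # cumulative code elements, sorted

def search_linear(ratio):
    # Suboptimal linear search, as in the demonstration programs:
    # largest x with C[x] <= ratio.
    x = 0
    while x + 1 < len(C) and C[x + 1] <= ratio:
        x += 1
    return x

def search_binary(ratio):
    # Same answer in O(log M) comparisons.
    return bisect_right(C, ratio) - 1

for r in [0.0, 0.3, 0.5, 0.6, 0.75, 0.99]:
    assert search_linear(r) == search_binary(r)
```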

The author's experience indicates that for the Interdata 7/16,

programs written in FORTRAN run much slower than programs written in

assembly language, with a speed ratio as great as ten or more. It is


suspected that a carefully engineered assembly language implementation for a specific application could yield an even greater increase in speed over this demonstration.


CHAPTER 5

GENERALIZATION

The algorithms explored in this paper may be considered special

cases of a family of algorithms in which the codeword is a sum of code

elements which have been scaled or shifted so that they do not interfere

with one another when added. There is a one-to-one correspondence between

source symbols and code elements, and each code element has associated

with it a measure of its precision or length used in computing the scale

factors.

To illustrate this structure, consider a simple classical Huffman

coding problem. Suppose a source emits symbols 0, 1, and 2 with probabilities 1/2, 1/4, and 1/4, which are assigned code elements 0, 10, and 11 respectively. The code string for the sequence 01201 is 01011010, where the code elements have been concatenated.
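The concatenation just described is mechanical; a minimal sketch using the code elements from the text:

```python
# Huffman code elements for symbols 0, 1, 2 with
# probabilities 1/2, 1/4, 1/4 (from the text).
elements = {0: "0", 1: "10", 2: "11"}

def concat_encode(symbols):
    # Concatenate the code element of each source symbol.
    return "".join(elements[s] for s in symbols)

codestring = concat_encode([0, 1, 2, 0, 1])
print(codestring)     # 01011010
assert codestring == "01011010"
```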

Any concatenation is equivalent to displacing each code element to

the right of its predecessor by a distance equal to the length of the pre-

decessor, or to displacing the codestring left by a distance equal to the

length of a new code element in order to make room for it. In the gen-

eralized case where code elements need not have integer "lengths" these

two displacement modes are not equivalent. We consider strings of D-ary

symbols to be radix-D numbers and introduce a radix point as a reference.

Shifting a number left by p places then corresponds to multiplying by D^p; negative p implies right shifts. We illustrate with a decimal (D = 10) example. Suppose that we wish to concatenate the numbers 2 and 5. If the distance p between them is to be one place, we could multiply 2 by 10^1 and add 5, giving 25. Or we could start with 2 and add


5 × 10^-1, giving 2.5. But now suppose that the numbers 2 and 5 are to be separated by p = 1½ digits. The first method yields 2 × 10^1.5 + 5 = 68.24555, but the second method yields 2 + 5 × 10^-1.5 = 2.158114, which differs markedly from 68.24555.

A machine implementation requires that, in addition to a table of

code elements and their lengths, two registers be provided. The accumu-

lator contains the running sum of scaled code elements, and the pointer

provides the scale factors indicating the displacements of the code ele-

ments in the accumulator. To illustrate these definitions, consider

Algorithm FVE. Here array C contains the code elements. The values of

Q may be called the precisions of C because C(x) + z, where 0 ≤ z < Q(x), uniquely specifies x. F is the accumulator and T is the pointer. We will represent T in fixed-point form so that its use as a pointer is clear. Set M = 3, D = 2, N = 5, J = 2, K = 1 and encode the same string 01201:


We set L = 8 + 1 = 9 and the codeword is 010110101. Coincidentally, the C(x) are the Huffman code elements and the code string is the concatenated Huffman code string with a trailing 1 attached.
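The example can be traced in a few lines of floating-point arithmetic. The sketch below assumes, consistent with the text, probabilities Q = (1/2, 1/4, 1/4) with cumulative code points C = (0, 1/2, 3/4), and ignores the fixed-precision truncation of the real algorithm:

```python
Q = [0.5, 0.25, 0.25]          # symbol probabilities (precisions)
C = [0.0, 0.5, 0.75]           # cumulative code elements

def encode(symbols):
    # Algorithm FVE without truncation: accumulator F, pointer T.
    F, T = 0.0, 1.0
    for x in symbols:
        F += T * C[x]          # add scaled code element
        T *= Q[x]              # advance the pointer
    return F, T

def decode(F, n):
    # Decode n symbols in original order, subtracting each
    # heavily weighted code element as it is identified.
    T, out = 1.0, []
    for _ in range(n):
        x = max(i for i in range(len(C)) if C[i] <= F / T + 1e-12)
        out.append(x)
        F -= T * C[x]
        T *= Q[x]
    return out

F, T = encode([0, 1, 2, 0, 1])
print(F)                        # 0.3515625 = 45/128 = 0.0101101 binary
assert F == 45 / 128
assert decode(F, 5) == [0, 1, 2, 0, 1]
# The 9-digit codeword 010110101 names a point inside [F, F + T).
cw = int("010110101", 2) / 2 ** 9
assert F <= cw < F + T
```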

It is not necessary that the displacement of code elements be accom-

plished as in algorithms FVE and VFE. In fact there are eight types of

arithmetic coding algorithms in the family as a result of three choices.

First, instead of shifting the code element, it would be possible to

shift the contents of the accumulator in the opposite direction. Second,

the code elements could be added to the left (most significant) end of

the accumulator instead of to the right. Third, the pointer can contain

either the actual scale factors as in this work, or their logarithms. In

the latter case, the pointer is moved by adding the displacement, and the

scale factors are obtained by exponentiation. Table 5 summarizes the en-

coding algorithms belonging to the family.

One algorithm of the family was developed independently by Rissanen

[1976]. It shifts the code element to the most significant end of the

accumulator, using a pointer obtained by addition and exponentiation. We

shall now compare the alternatives in the three choices, and see that it

is preferable to shift the code element rather than the accumulator, and

to add code elements to the least significant end of the accumulator.

Scaling the new code element is preferable to scaling the accumulator

because a smaller volume of data must be manipulated. Code elements are

of fixed length J digits, but the significant length of the accumulator

grows until it reaches the codeword length L . If code elements of

length J are scaled by a pointer of length K digits, each of the N

multiplications is between a J-digit and a K-digit factor. If the accu-

mulator is scaled, however, the N multiplications grow in complexity as


Table 5

Encoding Algorithm Summary

Let  p_i = probability of symbol i
     l_i = -log_D p_i
     F = accumulator
     T = scale factor pointer, or
     L = logarithmic pointer

Initialize F = 0 and T = 1 (or L = 0).

I.  Shifting the Code Element
    A. Right
       1. By multiplication (Elias, Pasco):  F = F + T*C_i ;  T = T*p_i
       2. By exponentiation:                 F = F + D^(-L)*C_i ;  L = L + l_i
    B. Left
       1. By multiplication:                 T = T/p_i ;  F = F + T*C_i
       2. By exponentiation (Rissanen):      L = L + l_i ;  F = F + D^L*C_i

II. Shifting the Accumulator
    A. Left
       1. By multiplication:                 F = F + C_i ;  F = F/p_i
       2. By exponentiation:                 F = F + C_i ;  F = F*D^(l_i)
    B. Right
       1. By multiplication:                 F = F*p_i ;  F = F + C_i
       2. By exponentiation:                 F = F*D^(-l_i) ;  F = F + C_i


the length of the accumulator increases. The last multiplication involves

L-digit and K-digit factors. Thus the complexity would grow as O(NL) ,

proportional to the square of the message length. Because of this intol-

erable complexity, we shall hereafter assume that the code element is

scaled.
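The multiplicative and exponential pointer variants of Table 5 (the two right-growing code-element entries) are equivalent in exact arithmetic, since T = D^(-L). A sketch with assumed dyadic probabilities, for which the logarithms are exact:

```python
import math

D = 2
Q = [0.5, 0.25, 0.25]          # p_i, dyadic so the logs are exact
C = [0.0, 0.5, 0.75]           # code elements

def encode_mult(symbols):
    # F = F + T*C_i ; T = T*p_i  (Elias, Pasco)
    F, T = 0.0, 1.0
    for x in symbols:
        F += T * C[x]
        T *= Q[x]
    return F

def encode_exp(symbols):
    # F = F + D^(-L)*C_i ; L = L + l_i, with l_i = -log_D p_i
    F, L = 0.0, 0.0
    for x in symbols:
        F += D ** (-L) * C[x]
        L += -math.log(Q[x], D)
    return F

msg = [0, 1, 2, 0, 1]
assert abs(encode_mult(msg) - encode_exp(msg)) < 1e-9
```

Rissanen's observation that the exponentiation can be done by table look-up (discussed below) makes the second form competitive on machines without fast multiplication.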

We say a codeword is grown to the right if successive code elements

are added to the least significant end of the accumulator, and grown to

the left if they are added to the most significant end. We will contrast

these two techniques.

With codewords grown to the right, the radix point is either at the

left end of the accumulator or a fixed distance from it. With codewords

grown to the left, the reverse is true. This difference is important at

decoding time. The decoder sees an infinite string of channel digits,

from which it must load the accumulator with a codeword correctly justi-

fied for the ensuing arithmetic operations of decoding. Fixed length

codewords present no problem if the order of transmission (e.g. most

significant digit first) and the radix point location are agreed upon.

With variable length codewords, however, the problem depends on how the

codeword was grown with respect to the radix point. If the codeword is

grown to the right and transmitted most significant digit first, then the

radix point may be placed and the decoding begun before the end of the

codeword is known, as in the case of algorithm FVD. If the codeword is

grown to the left, however, problems result. The most obvious is that

there is no way of determining the number of digits which are to be placed

to the left of the radix point. But the problem is much more subtle and

extends even to VF codes.


Decision rules for decoding involve comparing the (possibly scaled)

magnitude of the accumulator with some other quantity. The most signifi-

cant digits of the accumulator will most heavily influence the results

of the comparison. This dictates that the symbols encoded in the most

significant digits of the codeword be decoded first, so that their heavily

weighted code elements may be subtracted. Codewords grown to the left

must therefore be decoded in the reverse order from which they were en-

coded, while codewords grown to the right may be decoded in their original

order. In order that the leftmost code element may be decoded correctly,

its scale factor must be known. For codewords grown to the right this

scale factor is as initialized by the algorithm; in algorithm FVE it is 1 .

But for codewords grown to the left the scale factor of the leftmost code

element is a function of the preceding source symbols in the message,

which are unknown to the decoder. Therefore whenever a codeword grown to

the left is being used, the scale factor of the most significant code

element must be transmitted in addition to the codeword itself. In other

words, the contents of the pointer as well as the accumulator must be

transmitted. Typically, about log_D N additional digits are required for the pointer. This necessity, which Rissanen recognized, limits the

compression rate achievable by such algorithms.

There is another reason, perhaps of greater significance, for pre-

ferring algorithms which grow codewords to the right. Recall that a

necessary assumption in the discussion of the operation of the predictor

for conditional sources is that the predictor in the expander, like the

predictor in the compressor, provides the probability distribution for

the current symbol conditioned on all previous symbols. But we have seen

that codewords grown to the left must be decoded in the reverse order


from which they were encoded, and thus the previous symbols are unavailable to the predictor in the expander. This prohibits the use of algorithms

whose codewords grow to the left for compression of data from conditional

sources.

There does not appear to be any fundamental reason to select either

the multiplicative or additive-with-exponentiation technique for moving

the pointer, and practical reasons may dictate the selection. Many com-

puters have hardware which can quickly perform the multiplications used

in this paper, but some do not. Rissanen has shown that the exponentia-

tion can be quickly done by table look-up, making this technique

competitive.


APPENDIX A

EFFECTS OF QUANTIZATION OF PROBABILITIES

We are to investigate the effect on compression rate of the necessary

quantization of the source conditional probabilities. To simplify the

notation we will concentrate on some specific time index i and some specific history x^(i-1). We may then define

    q(x_i) = q(x_i | x^(i-1))                                  (A1)

and

    p(x_i) = p(x_i | x^(i-1)) .                                (A2)

Under these assumptions, if both (18) and (20) are satisfied, then (48) implies

    l_i = log_D ( 1 / p(x_i) ) .                               (A3)

Suppose now that p(x_i) is not a multiple of D^(-J) for some x_i ∈ I_M. Then (18) is not satisfied and (48) no longer applies. Let l'_i denote the contribution of x_i to the codeword length under these conditions. By (46),

    l'_i ≤ log_D ( 1 / q(x_i) ) + V_D(K) .                     (A4)

The extra codeword length because of the quantization is bounded above by subtracting (A3) from (A4):

    l'_i - l_i ≤ log_D ( p(x_i) / q(x_i) ) + V_D(K) .          (A5)

The expected value

    I_D(p,q) = Σ_{x_i ∈ I_M} p(x_i) log_D ( p(x_i) / q(x_i) )  (A6)

is the Kullback-Leibler [1951] distance from distribution p to distribution q. Kullback and Leibler have shown that I_D(p,q) ≥ 0 with


equality if and only if p(x_i) = q(x_i) for all x_i ∈ I_M.

We have already seen that V_D(K) decays exponentially with K. We will now see that the precision J of q(x_i) sufficient to bound log_D ( p(x_i) / q(x_i) ) arbitrarily closely from above grows only as the logarithm of the inverse of the bound, and then show by example that a much smaller precision is sometimes sufficient to bound the expected value I_D(p,q).

Theorem: For any probability distribution p and any real number δ

satisfying 0 < δ < 1, there exists an integer J given by

    J = ⌈ log_D ( 1 / p_min ) + log_D ( 1 / δ ) ⌉ + 1          (A8)

and a quantized approximation q to p satisfying

    q(x_i) ≥ 0  for all x_i ∈ I_M ,                            (A9)

    q(x_i) is a multiple of D^(-J)  for all x_i ∈ I_M ,        (A10)

such that

    log_D ( p(x_i) / q(x_i) ) < δ  for all x_i ∈ I_M .         (A11)

Proof: Consider the interval [ D^(-δ) p(x_i) , p(x_i) ]. The width of the interval is (1 - D^(-δ)) p(x_i). If J is given by (A8), then by Lemma (A12) appearing after this proof,

    (1 - D^(-δ)) p(x_i) ≥ (δ/D) p_min ≥ D^(-J) ,


and hence the width of the interval is at least D^(-J). Therefore there is some multiple of D^(-J) in the interval. Set q(x_i) equal to the largest such multiple. Then

    D^(-δ) p(x_i) ≤ q(x_i) ≤ p(x_i) ,

which gives (A11). In order that the q(x_i) may sum to 1, multiples of D^(-J) may now be arbitrarily added to various q(x_i) until Σ q(x_i) = 1. Note that this action preserves the bound (A11). Q.E.D.
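The construction in this proof translates directly into a short program. The sketch below chooses the precision as in the theorem and checks the bound on a small distribution; all names are illustrative, and a small tolerance guards the floating-point floor:

```python
import math

def quantize(p, D, delta):
    # Enough digits that a multiple of D^-J falls in every
    # interval [D^-delta * p_i, p_i].
    J = math.ceil(math.log(1 / min(p), D) + math.log(1 / delta, D)) + 1
    unit = D ** -J
    q = []
    for pi in p:
        # Largest multiple of D^-J not exceeding p_i (with a small
        # tolerance for floating-point representation error).
        n = round(pi / unit)
        if n * unit > pi * (1 + 1e-12):
            n -= 1
        q.append(n * unit)
    # Add multiples of D^-J until the q sum to one; adding only
    # increases q(x_i), so the bound is preserved.
    deficit = round((1 - sum(q)) / unit)
    q[q.index(max(q))] += deficit * unit
    return J, q

p = [0.015, 0.015, 0.970]
J, q = quantize(p, 10, 0.5)
print(J, q)
assert abs(sum(q) - 1) < 1e-9
assert all(math.log(pi / qi, 10) < 0.5 for pi, qi in zip(p, q))
```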

We now prove the lemma required in the previous proof.

Lemma: If D ≥ 2 and 0 < δ < 1, then

    D ( 1 - D^(-δ) ) ≥ δ .                                     (A12)

Proof: Let D ≥ 2 and 0 < δ < 1. Let f(δ) = D - D^(1-δ) - δ. Then f''(δ) = -(ln D)² D^(1-δ) < 0. Observe that f(0) = 0 and f(1) = D - 2 ≥ 0. By concavity, f(δ) ≥ 0 for all δ between 0 and 1. Thus D - D^(1-δ) ≥ δ. Taking logarithms to base D and rearranging terms yields

    log_D ( 1 - D^(-δ) ) ≥ log_D δ - 1 ,

the form used in the preceding proof. Q.E.D.


It is seldom necessary that bound (A11) be satisfied for all x_i ∈ I_M; rather it is often sufficient to bound only the expected value I_D(p,q). The following example shows that this can sometimes be done with a much smaller J. Suppose that M = 3 and the source distribution is p = (.015, .015, .970). We want the minimum J which will allow us to encode for a decimal (D = 10) channel with the average quantization loss I_10(p,q) less than 10^-3. By formula (A5), setting J = 6 would be sufficient. However, either distribution q1 = (.01, .01, .98) or q2 = (.02, .02, .96) does the job with J = 2, because I_10(p,q1) = 9.62 × 10^-4 and I_10(p,q2) = 6.17 × 10^-4.
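The two Kullback-Leibler distances in this example are quick to verify numerically; here q2's last entry is taken as .96 so that q2 sums to one, which reproduces the stated figure:

```python
import math

def I(p, q, D):
    # Kullback-Leibler distance, base D, as in (A6).
    return sum(pi * math.log(pi / qi, D) for pi, qi in zip(p, q))

p  = [0.015, 0.015, 0.970]
q1 = [0.01,  0.01,  0.98]
q2 = [0.02,  0.02,  0.96]

print(I(p, q1, 10))   # about 9.62e-4
print(I(p, q2, 10))   # about 6.17e-4
```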

The results of this appendix are summarized in the following theorem.

Theorem: Let l_i be the contribution to the codeword length of the i-th symbol x_i when the source conditional probabilities may be exactly represented in the Q array, and let l'_i be the contribution of x_i when the conditional probabilities are quantized to J digits. The codeword length wasted by quantization may be bounded arbitrarily closely above:

    l'_i - l_i ≤ δ + V_D(K) ,

where δ is an arbitrary real number satisfying 0 < δ < 1, and V_D(K) = -log_D ( 1 - D^(1-K) ) is an exponentially decreasing function of the precision K of register T. This is achieved by selecting a large enough K and by selecting the number of digits J in the quantized probabilities according to

    J = ⌈ log_D ( 1 / p_min ) + log_D ( 1 / δ ) ⌉ + 1 .

Proof: This results from Theorem (A7) when (A11) is substituted into (A5). The exponential behavior of V_D(K) is given by Theorem (47). Q.E.D.


APPENDIX B

MAXIMIZING VF MESSAGE LENGTH DOES NOT MINIMIZE RATE

Although the optimal minimum-rate VF code maximizes the steady-state

expected message length EN, it does not necessarily follow for a condi-

tional source (a source with memory) that maximizing the expected message

length for every history will achieve this optimum. This surprising re-

sult is illustrated by the following counterexample. Suppose we wish to

encode a binary Markov chain [~eller, 19681 with the transition matrix

into codewords of length one ternary symbol. Define a -- code set to be a

set of codes (as defined in Chapter 1) and a rule for selecting one of

them based on the state of the source, where the state of a source is all

aspects of its history which influence its future behavior. For a Markov

chain, the state is simply the previous symbol. Let code set A assign the three codewords to the message set {0, 10, 11} for state 0 but use the message set {00, 01, 1} for state 1, and let code set B use the message set {0, 10, 11} for either state. The expected message length given state

0 is 1.9 source symbols for either code set; given state 1 it is 1.6

for code set A and 1.4 for code set B. We shall see that despite this,

code set B is better.

We first investigate code set A. According to (6) the message probabilities are

    state 0:  p(0) = .10 ,  p(10) = .54 ,  p(11) = .36 ;
    state 1:  p(00) = .06 ,  p(01) = .54 ,  p(1) = .40 ;

from which the expected message lengths are, by (7),

    E(N_A | 0) = 1.9  and  E(N_A | 1) = 1.6 .

A code set induces a subchain of states, with one transition of the subchain defined as one message emitted by the source, and the state of the subchain defined as the last symbol of the previous message [Jelinek and Schneider, 1974]. The probability of a transition from state a to state b is thus the total probability of all messages from state a ending in symbol b. The state transition matrix under code set A is therefore

    P_A = [ .64  .36 ]
          [ .06  .94 ] .

The subchain stationary distribution is ( 1/7 , 6/7 ). Thus the expected message length is

    E(N_A) = (1/7)(1.9) + (6/7)(1.6) = 1.643 .

Similarly for code set B,

    P_B = [ .64  .36 ]
          [ .84  .16 ] ,

with stationary distribution ( .7 , .3 ) and E(N_B | 1) = 1.4 , so that

    E(N_B) = (.7)(1.9) + (.3)(1.4) = 1.75 .


The surprising result is that although E(N_A | 0) = E(N_B | 0) and E(N_A | 1) > E(N_B | 1), the unconditioned expected message lengths are related in the reverse way, E(N_A) < E(N_B), and code set B has the lower rate.

An intuitive justification for this result occurs when the message

and hence code entropies are compared. For state 1 , the message entropy

for code set A is H3(.06, .54, .4) = 0.790 ternary digits, but for code

set B the message entropy is H3(.6, .24, .16) = 0.858 ternary digits.

As argued in Chapter 3, this means that code set B packs more information

into each codeword.
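The whole counterexample can be checked end to end. The sketch below assumes the transition matrix p(0|0) = .1, p(1|0) = .9, p(0|1) = .6, p(1|1) = .4, which is the matrix consistent with the conditional expectations and entropies stated above:

```python
P = {0: {0: 0.1, 1: 0.9}, 1: {0: 0.6, 1: 0.4}}   # p(next | prev)

def message_stats(msgs, state):
    # Probability, length, and final symbol of each message
    # emitted from the given state.
    out = []
    for m in msgs:
        prob, s = 1.0, state
        for ch in m:
            prob *= P[s][int(ch)]
            s = int(ch)
        out.append((prob, len(m), int(m[-1])))
    return out

def expected_message_length(code_set):
    # Subchain transition probabilities, stationary distribution,
    # and the unconditioned expected message length EN.
    stats = {s: message_stats(code_set[s], s) for s in (0, 1)}
    t = {s: sum(pr for pr, n, e in stats[s] if e == 1) for s in (0, 1)}
    pi1 = t[0] / (t[0] + (1 - t[1]))             # stationary P(state 1)
    EN = {s: sum(pr * n for pr, n, e in stats[s]) for s in (0, 1)}
    return (1 - pi1) * EN[0] + pi1 * EN[1]

A = {0: ["0", "10", "11"], 1: ["00", "01", "1"]}
B = {0: ["0", "10", "11"], 1: ["0", "10", "11"]}

EN_A = expected_message_length(A)
EN_B = expected_message_length(B)
print(EN_A, EN_B)     # about 1.643 and 1.75
assert EN_A < EN_B    # code set B has the lower rate
```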


APPENDIX C

PROGRAM LISTINGS

**************************************************
*
*   SUBROUTINE NEXTT (T, Q)
*
*   THIS SUBROUTINE PERFORMS THE MULTIPLICATION
*   T = T * Q(X), TRUNCATED TO 12 SIGNIFICANT BITS.
*   USED BY STEPS FVE-3D, FVD-3D, VFE-3D, AND VFD.
*
*   CALLING SEQUENCE:
*       INTEGER*2 T(2), Q(ALPHABET SIZE), X
*       CALL NEXTT (T, Q(X))
*
*   FORMAT OF ARGUMENTS:
*   T    - FLOATING POINT SPECIAL FORMAT, TWO HALFWORDS.
*          T(1) CONTAINS SIGNIFICANT FIELD, 12 BITS RIGHT-
*          ADJUSTED, WITH IMPLIED RADIX POINT BETWEEN THE
*          12TH AND 13TH BIT FROM RIGHT.
*          T(2) CONTAINS EXPONENT FIELD, A POSITIVE
*          INTEGER, RIGHT-ADJUSTED.
*   Q(X) - ONE HALFWORD CONTAINING A FIXED-POINT
*          RIGHT-ADJUSTED FRACTION WITH IMPLIED RADIX
*          POINT LEFT OF MOST SIGNIFICANT BIT.
*
**************************************************
       ENTRY NEXTT
*
*   SYMBOLIC REGISTER ASSIGNMENTS:
QA     EQU 10          ADDRESS OF Q.
TA     EQU 11          ADDRESS OF T.
TH     EQU 12          MOST SIGNIFICANT HALFWORD.
TL     EQU 13          LEAST SIGNIFICANT HALFWORD.
TX     EQU 14          EXPONENT FIELD OF T.
RTN    EQU 15          RETURN LINK.
*
*   GET PARAMETERS AND MULTIPLY.
*
NEXTT  STM QA,FTNSAV   SAVE REGISTERS.
       LH  TL,0(TA)    GET SIGNIFICANT FIELD.
       [object code and several lines illegible in this copy]
NORM   SLH TH,1        SHIFT PRODUCT LEFT.
       AIS TX,1        INCREMENT EXPONENT.
       THI TH,X'F800'  NORMALIZED YET?
*
*   SAVE RESULT AND RETURN.
*
DONE   STH TH,0(TA)    SAVE SIGNIFICANT FIELD.
       LM  QA,FTNSAV   RESTORE REGISTERS.
       AH  RTN,0(RTN)  CALCULATE RETURN ADDRESS.
       BR  RTN         RETURN.
FTNSAV DS  32-QA-QA    FORTRAN REGISTER SAVE AREA.
       END

**************************************************
*
*   SUBROUTINE SCADD (F, T, C(X))
*
*   THIS SUBROUTINE PERFORMS THE MULTIPLICATION
*   AND ADDITION F = F + T * C(X).
*   USED BY STEPS FVE-3C AND VFE-3C.
*
*   FORMAT OF ARGUMENTS:
*   F    - MULTIPRECISION FRACTION, PACKED 16 BITS PER
*          HALFWORD IN ARRAY, IMPLIED RADIX POINT LEFT
*          OF MOST SIGNIFICANT BIT IN MOST SIGNIFICANT
*          (FIRST) HALFWORD.
*   T    - FLOATING POINT SPECIAL FORMAT, TWO HALFWORDS,
*          AS FOR NEXTT.
*   C(X) - ONE HALFWORD CONTAINING A FIXED-POINT
*          RIGHT-ADJUSTED FRACTION WITH IMPLIED RADIX
*          POINT LEFT OF MOST SIGNIFICANT BIT.
*
**************************************************
       ENTRY SCADD
*
*   SYMBOLIC REGISTER ASSIGNMENTS:
CA     EQU  6          ADDRESS OF C(X).
PTR    EQU  7          POINTER INTO F ARRAY.
TH     EQU  8          HIGH SIGNIFICANT BITS.
TM     EQU  9          MIDDLE SIGNIFICANT BITS.
TM1    EQU 10          ALTERNATE REGISTER FOR TM.
TL     EQU 11          LOW SIGNIFICANT BITS.
TA     EQU 12          ADDRESS OF T.
RONE   EQU 13          CONSTANT ONE.
TX     EQU 14          EXPONENT FIELD OF T.
RTN    EQU 15          RETURN LINK.
*
*   GET PARAMETERS AND MULTIPLY SIGNIFICANT FIELDS.
*
SCADD  STM CA,FTNSAV   SAVE FORTRAN REGISTERS.
       [several lines illegible in this copy]
*
*   ADD PRODUCT TO CORRECT PLACE IN F ARRAY.
*
       LH  TX,2(TA)    GET EXPONENT FIELD.
       [object code and several lines illegible in this copy]
ADD3   AHM TL,4(PTR)
ADD2   ACH TM,2(PTR)
       ACH TH,0(PTR)
*
*   PROPAGATE CARRY.
*
       LIS RONE,1
PROP   SIS PTR,2       BUMP POINTER.
       AHM RONE,0(PTR)
       [branch illegible in this copy]
DONE   LM  CA,FTNSAV   RESTORE REGISTERS.
       AH  RTN,0(RTN)  CALCULATE RETURN ADDRESS.
       BR  RTN
FTNSAV DS  32-CA-CA    FORTRAN REGISTER SAVE AREA.
       END

**************************************************
*
*   SUBROUTINE SCGET (F, T, RATIO)
*
*   THIS SUBROUTINE RETURNS THE QUOTIENT RATIO.
*   USED BY STEPS FVD-3B AND VFD-3B.
*
*   CALLING SEQUENCE:
*       INTEGER*2 F(XXX), T(2), RATIO
*       CALL SCGET (F, T, RATIO)
*
*   FORMAT OF INPUTS:
*   F     - MULTIPRECISION FRACTION, AS FOR SCADD.
*   T     - FLOATING POINT SPECIAL FORMAT, AS FOR NEXTT.
*   RATIO - ONE HALFWORD CONTAINING A FIXED-POINT
*           RIGHT-ADJUSTED FRACTION WITH IMPLIED RADIX
*           POINT LEFT OF MOST SIGNIFICANT BIT.
*
**************************************************
       ENTRY SCGET
*
*   SYMBOLIC REGISTER ASSIGNMENTS:
PTR    EQU  7          POINTER INTO F ARRAY.
FH     EQU  8          HIGH BITS FROM F.
FM     EQU  9          MIDDLE BITS FROM F.
FM1    EQU 10          ALTERNATE REGISTER FOR FM.
FL     EQU 11          LOW BITS FROM F.
TA     EQU 12          ADDRESS OF T.
CA     EQU 13          ADDRESS OF QUOTIENT.
TX     EQU 14          EXPONENT FIELD OF T.
RTN    EQU 15          RETURN LINK.
*
*   BASED ON T, GET DATA FROM CORRECT AREA IN F.
*
SCGET  STM PTR,FTNSAV  SAVE FORTRAN REGISTERS.
       LH  TX,2(TA)    GET EXPONENT FIELD.
       [object code and several lines illegible in this copy]
LEFT   LH  FL,4(PTR)   GET LOW HALFWORD.
       OHR FM,FM1      COMBINE FM BITS.
*
*   PERFORM DIVISION AND RETURN RESULT.
*
DIV    DH  FH,0(TA)    DIVIDE FH:FM BY T.
       STH FM,0(CA)    STORE RESULT.
DONE   LM  PTR,FTNSAV  RESTORE REGISTERS.
       AH  RTN,0(RTN)  CALCULATE RETURN ADDRESS.
       BR  RTN
FTNSAV DS  32-PTR-PTR  FORTRAN REGISTER SAVE AREA.
       END

**************************************************
*
*   SUBROUTINE SCSUB (F, T, C(X))
*
*   THIS SUBROUTINE PERFORMS THE MULTIPLICATION AND
*   SUBTRACTION IN STEP FVD-3D.
*
*   FORMAT OF ARGUMENTS:
*   F    - MULTIPRECISION FRACTION, AS FOR SCADD.
*   T    - FLOATING POINT SPECIAL FORMAT, AS FOR NEXTT.
*   C(X) - ONE HALFWORD CONTAINING A FIXED-POINT
*          RIGHT-ADJUSTED FRACTION WITH IMPLIED RADIX
*          POINT LEFT OF MOST SIGNIFICANT BIT.
*
**************************************************
       ENTRY SCSUB
*
*   SYMBOLIC REGISTER ASSIGNMENTS:
FL     EQU  3          LOW BITS OF PRODUCT.
FM     EQU  4          MIDDLE BITS OF PRODUCT.
FH     EQU  5          HIGH BITS OF PRODUCT.
CA     EQU  6          ADDRESS OF C.
PTR    EQU  7          POINTER INTO F ARRAY.
TH     EQU  8          HIGH BITS OF T.
TM     EQU  9          MIDDLE BITS OF T.
TM1    EQU 10          ALTERNATE REGISTER FOR TM.
TL     EQU 11          LOW BITS OF T.
TA     EQU 12          ADDRESS OF T.
RONE   EQU 13          CONSTANT ONE.
TX     EQU 14          EXPONENT FIELD OF T.
RTN    EQU 15          RETURN LINK.
*
*   GET PARAMETERS AND MULTIPLY SIGNIFICANT FIELDS.
*
SCSUB  STM FL,FTNSAV   SAVE FORTRAN REGISTERS.
       LIS TL,0        CLEAR T LOW BITS.
*
*   BASED ON EXPONENT FIELD OF T, SUBTRACT PRODUCT
*   FROM PROPER PLACE IN F ARRAY.
*
       LH  TX,2(TA)    GET EXPONENT FIELD.
       [object code and several lines illegible in this copy]
       SHR FL,TL       SUBTRACT LOW HALFWORD.
       STH FL,4(PTR)   SAVE DIFFERENCE.
       SCHR FM,TM      SUBTRACT MIDDLE HALFWORD.
       STH FM,2(PTR)
       SCHR FH,TH      SUBTRACT HIGH HALFWORD.
       STH FH,0(PTR)   SAVE HIGH HALFWORD.
       BNCS DONE
*
*   PROPAGATE BORROW.
*
       LCS RONE,1
PROP   SIS PTR,2       BUMP POINTER.
       AHM RONE,0(PTR)
       [branch illegible in this copy]
DONE   LM  FL,FTNSAV   RESTORE REGISTERS.
       AH  RTN,0(RTN)  CALCULATE RETURN ADDRESS.
       BR  RTN
FTNSAV DS  32-FL-FL    FORTRAN REGISTER SAVE AREA.
       END

Page 92: SOURCE CODING ALGORITHMS FOR FAST DATA COMPRESSION

[SUBROUTINE DUMP: this subroutine dumps the first N halfwords beginning with LOC in hexadecimal onto logical unit LU, sixteen halfwords per line; the body of the listing is illegible in this reproduction.]

SUBROUTINE DUMPV(LU, N, LOC, MALPH)

[Its header comment matches DUMP's; the body, which unpacks LOC and writes sixty-four entries to a line, is likewise illegible.]
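In modern terms, the behavior the DUMP comments describe is a fixed-width hexadecimal formatter. A hypothetical equivalent (the name and return-a-list interface are mine, not the program's):

```python
def dump_halfwords(words, n, per_line=16):
    """Format the first n 16-bit halfwords of `words` as 4-digit hexadecimal
    groups, `per_line` groups per output line, as DUMP does on unit LU."""
    lines = []
    for start in range(0, n, per_line):
        chunk = words[start:min(start + per_line, n)]
        lines.append(" ".join(f"{w & 0xFFFF:04X}" for w in chunk))
    return lines
```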


PROGRAM NAME = FV     LATEST REVISION 4/2/76
PROGRAMMER = R. C. PASCO

FIXED-TO-VARIABLE SOURCE CODING ALGORITHM DEMONSTRATION
FOR BERNOULLI SOURCES.  D = 2, J = K = 12 BITS.

0 - HUMAN INTERFACE
6 - PRINTER FOR DATA AND RESULTS LOG

VARIABLES OF MAJOR SIGNIFICANCE:
XPTR = NUMBER OF X SYMBOLS ALREADY USED.
YPTR = NUMBER OF SYMBOLS IN Y ARRAY.
L = CODEWORD LENGTH

[The body of the listing is illegible in this reproduction.  From the surviving fragments, the program prompts for a data log and for the probability of a one, entered as a numerator over 4096; computes the source entropy H; reads the message length N; and generates the source array X at random according to the probabilities.  Encoding follows steps FVE-1 through FVE-5: clear F; set T(1) = 2048, T(2) = 0; for each source symbol, call SCADD(F, T, C) when the symbol is a one, and NEXTT(T, Q(ISYM)) in every case; set L = T(2) + 1; and terminate by adding 2**(-L) — to add 2**(-L), we add (1.0 * 2**(-L+1)) * (0.5), i.e. SCADD(F, T, 2048) with T(1) = 2048, T(2) = L-1.  The program then prints the number of symbols encoded, the codeword length L, and RATE = L/N in bits per symbol; dumps the source message and the codeword; decodes, setting or clearing each Y bit in turn; and compares X with Y, reporting COMPARISON SUCCESSFUL or one of the errors RAN OUT OF SOURCE DATA, OUTPUT BUFFER FULL, or DECODING ERROR.]
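The encoding and decoding loops of this demonstration can be sketched in a modern language. The sketch below is an idealized reconstruction using exact rational arithmetic, which sidesteps the finite-precision F and T bookkeeping that the program actually implements; the function names are mine, and the codeword is taken as the shortest dyadic point inside the source interval rather than the listing's exact termination rule:

```python
from fractions import Fraction

def fv_encode(bits, p1):
    """Encode a binary message from a Bernoulli source with P(1) = p1.
    Returns (num, L): the codeword is the L-bit binary expansion of
    num / 2**L, a point inside the message's source interval."""
    q = [Fraction(1) - p1, p1]       # symbol probabilities
    c = [Fraction(0), q[0]]          # cumulative probabilities
    low, width = Fraction(0), Fraction(1)
    for b in bits:
        low += width * c[b]          # the role of SCADD
        width *= q[b]                # the role of NEXTT
    L = 0
    while Fraction(1, 2 ** L) > width:   # smallest L with 2**-L <= width
        L += 1
    num = -((-low.numerator << L) // low.denominator)  # ceil(low * 2**L)
    return num, L

def fv_decode(num, L, n, p1):
    """Recover n source bits from the code point num / 2**L."""
    q = [Fraction(1) - p1, p1]
    point = Fraction(num, 2 ** L)
    low, width = Fraction(0), Fraction(1)
    out = []
    for _ in range(n):
        split = low + width * q[0]   # boundary between the 0- and 1-intervals
        b = 0 if point < split else 1
        if b:
            low = split
        width *= q[b]
        out.append(b)
    return out
```

Encoding [1, 1, 1, 0, 1, 1, 1, 1] with P(1) = 3600/4096, for example, yields a codeword shorter than the message, and decoding recovers the message exactly.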

[Three sample runs of FV on 4096-symbol Bernoulli messages follow; their hexadecimal dumps are largely illegible in this reproduction.  In the first, a highly skewed source message is encoded into a 301-bit codeword (rate 301/4096, about 0.07 bits per symbol); in the second, a similar message yields a 707-bit codeword (about 0.17 bits per symbol); in the third, an equiprobable source message is encoded into a 4097-bit codeword at the printed RATE = 1.000 BITS/SYMBOL.  Each run ends COMPARISON SUCCESSFUL.]

PROGRAM NAME = MFV    LATEST REVISION 4/2/76
PROGRAMMER = R. C. PASCO

FIXED-TO-VARIABLE SOURCE CODING ALGORITHM DEMONSTRATION
FOR 1ST ORDER MARKOV SOURCES.
THIS IMPLEMENTATION HAS PARAMETERS
D = 2, J = K = 12 BITS, AND L <= 4096 BITS.

VARIABLES OF MAJOR SIGNIFICANCE:
MALPH = ALPHABET SIZE.  (SYMBOLS RUN FROM 1 THROUGH MALPH.)
XPTR = NUMBER OF X SYMBOLS ALREADY USED.
YPTR = NUMBER OF SYMBOLS IN Y ARRAY.
L = CODEWORD LENGTH
Q(I,J) = PROBABILITY THAT SYMBOL I IS FOLLOWED BY J

[The body of the listing is illegible in this reproduction.  From the surviving fragments, the program prompts for the alphabet size and, for each previous symbol XPREV, for the numerators over 4096 of the probabilities of the symbols that follow it, accumulating the cumulative table C(XPREV, I) and the conditional entropy H(XPREV); it then solves for the stationary distribution (using the library routine LEQT1F) and prints the average source entropy EH.  After generating a source array X by drawing each symbol from the row of C selected by the previous symbol, encoding follows the FVE steps of program FV with the conditional tables: for each symbol, SCADD(F, T, C(XPREV, ISYM)) and NEXTT(T, Q(XPREV, ISYM)), then L = T(2) + 1 and the same 2**(-L) termination.  Decoding (steps FVD) mirrors this with SCGET, a threshold search of C(YPREV, .), SCSUB(F, T, C(YPREV, ISYM)), and NEXTT.  The run ends by comparing X with Y and reporting COMPARISON SUCCESSFUL or one of the same error messages as FV.]
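The Markov demonstration differs from FV only in that the probability and cumulative tables used at each step are the row selected by the previous symbol. A sketch under the same idealization (exact rationals in place of the 12-bit Q and C tables; the names are mine):

```python
from fractions import Fraction

def markov_encode(symbols, Q, start=0):
    """Encode `symbols` (ints in 0..m-1) under a first-order Markov model:
    Q[i][j] = P(next = j | previous = i).  Returns (num, L), the L-bit
    dyadic code point num / 2**L inside the source interval."""
    low, width = Fraction(0), Fraction(1)
    prev = start
    for s in symbols:
        row = Q[prev]
        c = sum(row[:s], Fraction(0))    # conditional cumulative C(prev, s)
        low += width * c
        width *= row[s]
        prev = s
    L = 0
    while Fraction(1, 2 ** L) > width:
        L += 1
    num = -((-low.numerator << L) // low.denominator)  # ceil(low * 2**L)
    return num, L

def markov_decode(num, L, n, Q, start=0):
    """Recover n symbols, tracking the same state as the encoder."""
    point = Fraction(num, 2 ** L)
    low, width = Fraction(0), Fraction(1)
    prev, out = start, []
    for _ in range(n):
        row = Q[prev]
        target = (point - low) / width   # position within the current row
        c, s = Fraction(0), 0
        while c + row[s] <= target:      # threshold search of the C row
            c += row[s]
            s += 1
        low += width * c
        width *= row[s]
        out.append(s)
        prev = s
    return out
```

With rows built from the sample run's numerators (3276/4096 for one successor, 410/4096 for each of the other two), a short message round-trips exactly.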

[The sample run uses a three-symbol alphabet in which each state assigns probability 3276/4096 to one following symbol and 410/4096 to each of the other two.  The printed tables of PROBABILITY and CUMULATIVE values give ENTROPY = 0.923 BITS/SYMBOL in every state, STATIONARY DISTRIBUTION 0.3333 0.3333 0.3333, and AVERAGE SOURCE ENTROPY = 0.923 BITS/SYMBOL.  THE CODEWORD IS (906 BITS); its hexadecimal dump is largely illegible here.  COMPARISON SUCCESSFUL.]

PROGRAM NAME = VF     LATEST REVISION 4/2/76
PROGRAMMER = R. C. PASCO

VARIABLE-TO-FIXED SOURCE CODING ALGORITHM DEMONSTRATION
FOR INDEPENDENT, IDENTICALLY DISTRIBUTED SOURCE MODEL.
THIS IMPLEMENTATION HAS PARAMETERS
D = 2, J = K = 12 BITS, AND L <= 4096 BITS.

0 - HUMAN INTERFACE
6 - PRINTOUT OF DATA AND RESULTS LOG

VARIABLES OF MAJOR SIGNIFICANCE:
MALPH = ALPHABET SIZE.  (SYMBOLS RUN FROM 1 THROUGH MALPH.)
LW = CODEWORD LENGTH / 16
XPTR = NUMBER OF X SYMBOLS ALREADY USED.
YPTR = NUMBER OF SYMBOLS IN Y ARRAY.
L = CODEWORD LENGTH (A MULTIPLE OF 16).

[The body of the listing is illegible in this reproduction.  From the surviving fragments, the program prompts for the alphabet size (at most 99), the probability numerators over 4096, and the codeword length L, which must be a multiple of 16; it sets LW = L/16 and LMJ = L - 12, and generates the source array as before.  Encoding (steps VFE-1 through VFE-4) clears F, sets T(1) = 2048, T(2) = 0, and then, WHILE TAU < L-J, takes the next source symbol and calls SCADD(F, T, C(ISYM)) and NEXTT(T, Q(ISYM)); when T(2) reaches LMJ it terminates the codeword by clearing F(LW1) and adding 2**(-L) as in FV.  It prints the number of symbols encoded, the rate, the source message, and the codeword.  Decoding (steps VFD) repeats the same width test, using SCGET, a threshold search of C, SCSUB, and NEXTT, so that it stops after exactly the symbols the encoder packed.  The comparison reports COMPARISON SUCCESSFUL, or RAN OUT OF SOURCE DATA, OUTPUT BUFFER FULL, DECODING ERROR: LENGTH MISMATCH, or DECODING ERROR: MISMATCH AT the offending symbol.]
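The variable-to-fixed demonstration fixes the codeword length L in advance and packs source symbols until the interval width falls to 2**-(L-J). Because that test depends only on the width, the decoder can repeat it and stop after exactly the same number of symbols. An idealized sketch with exact rationals (the names are mine; it assumes every q[s] >= 2**-J, as 12-bit numerators guarantee, and that the source supplies enough symbols to fill the codeword — the condition the listing's RAN OUT OF SOURCE DATA message guards against):

```python
from fractions import Fraction

def vf_encode(symbols, q, L, J):
    """Pack a prefix of `symbols` (i.i.d., P(s) = q[s], each q[s] >= 2**-J)
    into one L-bit codeword.  Returns (num, n): the codeword num / 2**L
    and the number n of source symbols it represents."""
    c = [sum(q[:i], Fraction(0)) for i in range(len(q))]
    thr = Fraction(1, 2 ** (L - J))
    low, width = Fraction(0), Fraction(1)
    n = 0
    for s in symbols:
        if width <= thr:            # the WHILE TAU < L-J test of the listing
            break
        low += width * c[s]
        width *= q[s]
        n += 1
    num = -((-low.numerator << L) // low.denominator)  # ceil(low * 2**L)
    return num, n

def vf_decode(num, L, q, J):
    """Recover the packed symbols; the stopping rule depends only on the
    interval width, so it mirrors the encoder exactly."""
    c = [sum(q[:i], Fraction(0)) for i in range(len(q))]
    thr = Fraction(1, 2 ** (L - J))
    point = Fraction(num, 2 ** L)
    low, width = Fraction(0), Fraction(1)
    out = []
    while width > thr:
        target = (point - low) / width
        s = max(i for i in range(len(q)) if c[i] <= target)
        low += width * c[s]
        width *= q[s]
        out.append(s)
    return out
```

Since each packed symbol shrinks the width by at most 2**-J, the final width stays above 2**-L, so the ceiling point always lies inside the final interval and every codeword identifies its symbol string unambiguously.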

[Two sample runs of VF follow; the hexadecimal dumps are largely illegible in this reproduction.  Each packs a variable number of source symbols into a 2048-bit codeword, and both end COMPARISON SUCCESSFUL.  The legible statistics of one run read ENTROPY = 2.697 BITS/SYMBOL and RATE = 2.825 BITS/SYMBOL, a rate slightly above the source entropy.]

This page intentionally left blank.


Bibliography

Norman Abramson, Information Theory and Coding, McGraw-Hill, New York, 1963.

Robert Ash, Information Theory, Wiley Interscience, New York, 1965.

L.R. Bahl and H. Kobayashi, "Image Data Compression by Predictive Coding," IBM J. Res. Develop., March 1974, pp. 164-179.

H. Blasbalg and R. Van Blerkom, "Message Compression," IRE Trans. Space and Elec. Telem., Sept., 1962, pp. 228-238.

S.D. Bradley, "Optimizing a Scheme for Run Length Encoding," Proc. IEEE, Jan. 1969, pp. 108-109.

Larry Carter and John Gill, "Conjectures on Uniquely Decipherable Codes," IEEE Trans. Info. Theory, Vol. IT-20, No. 3, May 1974, pp. 394-396.

David L. Cohn, "Optimum Noiseless Source Codes with Fixed Dictionary Size," to be presented at IEEE International Symposium on Informa- tion Theory, Ronneby, Sweden, June 21-24, 1976.

Thomas M. Cover, "Enumerative Source Encoding," IEEE Trans. Info. Theory, Vol. IT-19, No. 1, Jan. 1973, pp. 73-77.

L.D. Davisson, "Comments on 'Sequence Time Coding for Data Compression"', Proc. IEEE, Vol. 54, Dec. 1966, p. 2010.

L.D. Davisson, "Comments on 'An Algorithm for Source Coding"', IEEE Trans. Info. Theory, Vol. IT-18, No. 6, Nov. 1972.

L.D. Davisson, "Universal Noiseless Coding," IEEE Trans. Info. Theory, Vol. IT-19, No. 6, Nov. 1973, pp. 783-795.

R.M. Fano, Technical Report No. 65, The Research Laboratory of Electronics, M.I.T., 1948.

William Feller, An Introduction to Probability Theory and its Applications, Vol. I, Third Edition, John Wiley and Sons, New York, 1968.

Robert G. Gallager, Information Theory and Reliable Communication, John Wiley and Sons, New York, 1968.

E.N. Gilbert, "Codes Based on Inaccurate Source Probabilities," IEEE Trans. Info. Theory, Vol. IT-17, No. 3, May 1971, pp.304-314.

E.N. Gilbert and E.F. Moore, "Variable-Length Binary Encodings," Bell System Tech. J., Vol. 38, July 1959, pp. 933-967.

Solomon W. Golomb, "Run-Length Encodings," IEEE Trans. Info. Theory, July 1966, pp. 399-401.


Thomas S. Huang, "An Upper Bound on the Entropy of Run-Length Coding," IEEE Trans. Info. Theory, Vol. IT-20, Sept. 1974, pp. 675-676.

David A. Huffman, "A Method for the Construction of Minimum-Redundancy Codes," Proc. IRE, Vol. 40, No. 9, Sept. 1952, pp. 1098-1101.

Interdata, Inc., User's Manual, Publication No. 29-261R01, Interdata, Inc., Oceanport, New Jersey, 1971.

Frederick Jelinek, Probabilistic Information Theory, McGraw Hill, 1968, pp. 476-489.

Frederick Jelinek, "Buffer Overflow in Variable Length Coding of Fixed Rate Sources," IEEE Trans. I.T., Vol. IT-14, No. 3, May 1968, pp. 490-501.

Frederick Jelinek and Kenneth S. Schneider, "On Variable-Length-to-Block Coding," IEEE Trans. Info. Theory, Vol. IT-18, No. 6, Nov. 1972, pp. 765-774.

Frederick Jelinek and Kenneth S. Schneider, "Variable-Length Encoding of Fixed-Rate Markov Sources for Fixed-Rate Channels," IEEE Trans. Info. Theory, Vol. IT-20, No. 6, Nov. 1974, pp. 750-755.

Donald E. Knuth, The Art of Computer Programming, Vol. 2, 1st. edition, Addison-Wesley, 1971.

S. Kullback and R.A. Leibler, "On Information and Sufficiency," The Annals of Mathematical Statistics, Vol. 22, No. 1, March 1951, pp. 79-86.

Thomas J. Lynch, "Sequence Time Coding for Data Compression," Proceedings IEEE, Vol. 54, Oct. 1966, pp. 1490-1491.

H. Meyr, Hans G. Rosdolsky, and Thomas S. Huang, "Optimum Run-Length Codes," IEEE Trans. on Communications, Vol. COM-22, No. 6, June 1974, pp. 826-835.

John I. Molinder, "Optimal Coding with a Single Standard Run Length," IEEE Trans. Info. Theory, Vol. IT-20, No. 3, May 1974, pp. 336-

J. Rissanen, "Generalized Kraft Inequality and Arithmetic Coding of Strings," IBM J. Res. and Dev., May 1976, (to be published).

J. Pieter M. Schalkwijk, "An Algorithm for Source Coding," IEEE Trans. Info. Theory, Vol. IT-18, No. 3, May 1972, pp. 395-399.

C.E. Shannon, "A Mathematical Theory of Communication," Bell System Tech. J., Vol. 27, July 1948, pp. 379-423 and pp. 623-656.

B.P. Tunstall, "Synthesis of Noiseless Compression Codes," Ph.D. dissertation, Georgia Inst. Tech., Atlanta, 1968 (quoted in Jelinek and Schneider, 1972).


David C. Van Voorhis, "Constructing Codes with Bounded Codeword Lengths," IEEE Trans. Info. Theory, March 1974, pp. 288-299.

David C. Van Voorhis, "Practical Noiseless Coding," talk presented at Stanford University EE-375 Information Systems Seminar, Oct. 16, 1975.

Abraham Wald, Sequential Analysis, John Wiley and Sons, 1947, and Dover Publications, New York, 1973.