3 Mathematical Preliminaries: Data Compression


Upload: shubham-jain

Post on 16-Jul-2015


Category:

Engineering



TRANSCRIPT

Page 1: 3 Mathematical Preliminaries: Data Compression

Mathematical Preliminaries


The development of data compression algorithms for a variety of data can be divided into two phases:

Modeling

Coding

In the modeling phase, we try to extract information about any redundancy that exists in the data and describe that redundancy in the form of a model.


In the coding phase, a description of the model and a “description” of how the data differ from the model are coded, generally using a binary alphabet.

The difference between the data and the model is often referred to as the residual.

In the following three examples we will look at three different ways that data can be modeled. We will then use the model to obtain compression.


If the binary representations of these numbers are to be transmitted or stored, we would need to use 5 bits per sample.

By exploiting the structure in the data, we can represent the sequence using fewer bits.

If we plot these data, as shown in the following figure, we see that they seem to fall on a straight line.

A model for the data could therefore be a straight line given by the equation:


To examine the difference between the data and the model, the difference (or residual) is computed:

E_n = A_n − Ā_n = 0, 1, 0, −1, 1, −1, 0, 1, −1, −1, 1, 1

The residual sequence consists of only three distinct values: −1, 0, and 1.

If we assign a code of 00 to −1, a code of 01 to 0, and a code of 10 to 1, we need to use 2 bits to represent each element of the residual sequence.

Therefore, we can obtain compression by transmitting or storing the parameters of the model and the residual sequence.

The encoding can be exact if the required compression is to be lossless, or approximate if the compression can be lossy.
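The model-plus-residual idea can be sketched in Python. The sample values and the line `n + 8` are assumptions, since neither the sequence nor the model equation is reproduced in this transcript; they were chosen only so that the residuals agree with the ones listed above. The names `samples`, `model`, `CODE`, and `DECODE` are illustrative.

```python
# Assumed sample values: not given in this transcript, chosen so that the
# residuals below match the ones listed in the text.
samples = [9, 11, 11, 11, 14, 13, 15, 17, 16, 17, 20, 21]


def model(n):
    """Assumed straight-line model; n is the 1-based sample index."""
    return n + 8


# Residual = data minus model prediction.
residuals = [a - model(n) for n, a in enumerate(samples, start=1)]

# 2-bit code for the three residual values, as described in the text.
CODE = {-1: "00", 0: "01", 1: "10"}
DECODE = {bits: value for value, bits in CODE.items()}

encoded = "".join(CODE[e] for e in residuals)

# Decoder: split into 2-bit codewords, then add each residual to the model.
received = [DECODE[encoded[i:i + 2]] for i in range(0, len(encoded), 2)]
decoded = [model(n) + e for n, e in enumerate(received, start=1)]

# Bits needed to store the samples directly (5 bits each for values up to 21).
raw_bits = len(samples) * max(samples).bit_length()
print(residuals)
print(raw_bits, "bits raw vs", len(encoded), "bits of residuals")
```

The residual bit count excludes the cost of the model parameters, which also have to be transmitted or stored.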


After each value is replaced by its difference from the previous value, the number of distinct values is reduced. Fewer bits are required to represent each number, and compression is achieved. The decoder adds each received value to the previous decoded value to obtain the reconstruction corresponding to the received value.
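A minimal sketch of this difference-based scheme follows. The input sequence is an assumption, since the original data are not reproduced in this transcript, and the function names `delta_encode` and `delta_decode` are illustrative. The sketch assumes a non-empty sequence and sends the first value unchanged.

```python
from itertools import accumulate

# Assumed input: the original sequence is not reproduced in this transcript.
values = [27, 28, 29, 28, 26, 27, 29, 28, 30, 32, 34, 36, 38]


def delta_encode(seq):
    """Keep the first value, then send each value minus its predecessor."""
    return [seq[0]] + [cur - prev for prev, cur in zip(seq, seq[1:])]


def delta_decode(deltas):
    """Add each received value to the previously decoded value."""
    return list(accumulate(deltas))


deltas = delta_encode(values)
print(deltas)
print(len(set(values)), "distinct values,",
      len(set(deltas[1:])), "distinct differences")
```

With this assumed input, the differences after the first value take only four distinct values, so 2 bits each would suffice for them.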


The sequence is made up of eight different symbols.

To represent them with a fixed-length code, we need to use 3 bits per symbol.

Suppose instead that we assign a codeword with only a single bit to the symbol that occurs most often, and correspondingly longer codewords to symbols that occur less often.
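The idea of giving shorter codewords to more frequent symbols can be sketched as follows. The message and the code table here are hypothetical and much smaller than the example in the slides, whose sequence and codewords are not reproduced in this transcript. The table is a prefix code (no codeword is the start of another), so the decoder can split the bit stream without separators.

```python
# Hypothetical message and prefix code, for illustration only.
message = "abaacaabad"
CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}


def encode(text, code):
    """Concatenate the codeword of every symbol."""
    return "".join(code[s] for s in text)


def decode(bits, code):
    """Read bits until they form a codeword, emit its symbol, repeat."""
    inverse = {word: sym for sym, word in code.items()}
    out, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:
            out.append(inverse[buffer])
            buffer = ""
    return "".join(out)


bits = encode(message, CODE)
fixed_length_bits = 2 * len(message)  # four symbols need 2 bits each
print(len(bits), "bits with the variable-length code vs",
      fixed_length_bits, "bits fixed-length")
```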


If we substitute the codes for each symbol, we will use 106 bits to encode the entire sequence.

As there are 41 symbols in the sequence, this works out to approximately 2.58 bits per symbol.

Compared with 3 bits per symbol for a fixed-length code, this means we have obtained a compression ratio of 1.16:1.
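The arithmetic behind these figures can be checked with a few lines of Python. The bit and symbol counts are the ones stated in the text; the variable names are illustrative.

```python
total_bits = 106            # bits used by the variable-length code (from the text)
num_symbols = 41            # symbols in the sequence (from the text)
fixed_bits_per_symbol = 3   # eight symbols need 3 bits each with a fixed-length code

avg_bits = total_bits / num_symbols
ratio = fixed_bits_per_symbol / avg_bits
print(f"{avg_bits:.3f} bits/symbol, ratio {ratio:.2f}:1")
# prints: 2.585 bits/symbol, ratio 1.16:1
```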


Information

Amount of Information

Entropy

Maximum Entropy

Condition for maximum entropy


Compression is achieved by removing data redundancy while preserving information content.

Entropy is the measure of the information content of a group of bytes (a message).

Messages with higher entropy carry more information than messages with lower entropy.

Data with low entropy permit a larger compression ratio than data with high entropy.


How to determine the entropy

Find the probability p(x) of each symbol x in the message.

The entropy contribution H(x) of the symbol x is:

H(x) = −p(x) · log2 p(x)

The average entropy over the entire message is the sum of H(x) over all n distinct symbols in the message.
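The steps above can be sketched in Python. This assumes that p(x) is estimated as the relative frequency of x in the message, which the slides do not state explicitly; the function name `entropy` is illustrative.

```python
from collections import Counter
from math import log2


def entropy(message):
    """Average bits of information per symbol, from symbol frequencies."""
    counts = Counter(message)
    total = len(message)
    # Sum -p(x) * log2 p(x) over every distinct symbol x.
    return sum(-(c / total) * log2(c / total) for c in counts.values())


print(entropy("aaaa"))      # one symbol only: no information per symbol
print(entropy("aabb"))      # two equally likely symbols
print(entropy("abcdefgh"))  # eight equally likely symbols
```

Under this estimate, a message of eight equally likely symbols comes out at 3 bits per symbol, the same as a fixed-length code, while a message dominated by one symbol has lower entropy and leaves more room for compression.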
