15-211 Fundamental Structures of Computer Science, Feb. 24, 2005, Ananda Guna: Lempel-Ziv Compression
![Page 1: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/1.jpg)
15-211 Fundamental Structures of Computer Science
Feb. 24, 2005
Ananda Guna
Lempel-Ziv Compression
![Page 2: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/2.jpg)
Recap
![Page 3: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/3.jpg)
Huffman Trees
• Huffman Trees can be used to construct an optimal prefix code.
• What does optimal mean?
• Greedy algorithm to assemble a Huffman tree.
• locally optimal steps to global optimization
• Requires symbol frequencies.
• read the file twice – counting and encoding
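As a reminder of the recap above, here is a minimal Python sketch of the greedy construction. It is an illustration rather than the course's reference code: the function name `huffman_code`, the sample string, and the tie-breaking rule are assumptions made for this example.

```python
import heapq
from collections import Counter

def huffman_code(text):
    """Return a dict mapping each symbol of `text` to a Huffman code word."""
    freq = Counter(text)                       # first pass: count symbols
    if len(freq) == 1:                         # degenerate one-symbol input
        return {sym: "0" for sym in freq}
    # Heap entries: (weight, tie_breaker, {symbol: partial code word}).
    # The integer tie_breaker keeps Python from ever comparing the dicts.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)      # greedy step: two lightest subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

text = "abracadabra"
codes = huffman_code(text)
encoded = "".join(codes[ch] for ch in text)    # second pass: encode
print(len(encoded))                            # 23
```

The two passes over the input (counting, then encoding) are visible as the `Counter` call and the final `join`.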
![Page 4: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/4.jpg)
Huffman Encoding Process
![Page 5: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/5.jpg)
Adaptive Huffman or Dynamic Huffman
• Clearly, having to read the data twice (first for frequency count, then for actual compression) is a bit cumbersome.
• Perhaps data is available in blocks (streaming data)
• Can build an adaptive Huffman tree that adjusts itself as more frequency data become available.
![Page 6: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/6.jpg)
Adaptive Huffman, ctd.
Mapping from source messages to code words based upon a running estimate of the source message probabilities
Change the tree to remain optimal for the current estimates
adaptive Huffman codes respond to locality
Requires only a single pass of the data
![Page 7: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/7.jpg)
Beating Huffman
How about beating the compression achieved by Huffman?
Impossible! It produces an optimal prefix code.
Right.
But who says we have to use a prefix code?
![Page 8: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/8.jpg)
Dictionary-Based Compression
![Page 9: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/9.jpg)
Dictionary-based methods
Here is a simple idea:
Keep track of “words” that we have seen, and replace them with a code number when we see them again.
We can maintain dictionary entries (word, code) and make additions to the dictionary as we read the input file.
![Page 10: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/10.jpg)
Lempel & Ziv (1977/78)
![Page 11: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/11.jpg)
Fred Hacker’s algorithm…
Fred now knows what to do…
( <the-whole-file>, 1 )
Transmit 1, done.
![Page 12: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/12.jpg)
Right?
Fred’s algorithm provides excellent compression, but…
…the receiver does not know what is in the dictionary!
And sending the dictionary is the same as sending the entire uncompressed file.
Thus, we can’t decompress the “1”.
![Page 13: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/13.jpg)
Hence…
…we need to build our dictionary in such a way that the receiver can rebuild the dictionary easily.
![Page 14: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/14.jpg)
LZW Compression: The Byte Version
![Page 15: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/15.jpg)
Byte method LZW
We start with a trie that contains a root and n children:
• one child for each possible character
• each child labeled 0…n-1
We compress as before, by walking down the trie,
but, after emitting a code and growing the trie, we must start from the root’s child labeled c, where c is the character that caused us to grow the trie.
![Page 16: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/16.jpg)
LZW: Byte method example
Suppose our entire character set consists only of the four letters: {a, b, c, d}
Let’s consider the compression of the string baddad
![Page 17: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/17.jpg)
Byte LZW: Compress example
Input: baddad
Dictionary: a:0 b:1 c:2 d:3
Output: (none yet)
![Page 18: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/18.jpg)
Byte LZW: Compress example
Input: baddad
Dictionary: a:0 b:1 c:2 d:3 ba:4
Output: 1
![Page 19: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/19.jpg)
Byte LZW: Compress example
Input: baddad
Dictionary: a:0 b:1 c:2 d:3 ba:4 ad:5
Output: 1 0
![Page 20: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/20.jpg)
Byte LZW: Compress example
Input: baddad
Dictionary: a:0 b:1 c:2 d:3 ba:4 ad:5 dd:6
Output: 1 0 3
![Page 21: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/21.jpg)
Byte LZW: Compress example
Input: baddad
Dictionary: a:0 b:1 c:2 d:3 ba:4 ad:5 dd:6 da:7
Output: 1 0 3 3
![Page 22: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/22.jpg)
Byte LZW: Compress example
Input: baddad
Dictionary: a:0 b:1 c:2 d:3 ba:4 ad:5 dd:6 da:7
Output: 1 0 3 3 5
![Page 23: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/23.jpg)
Byte LZW output
So, the input baddad
compresses to 1 0 3 3 5
which again can be given in bit form, just like in the binary method…
…or compressed again using Huffman
![Page 24: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/24.jpg)
Byte LZW: Uncompress example
The uncompress step for byte LZW is the most complicated part of the entire process, but is largely similar to the binary method
![Page 25: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/25.jpg)
Byte LZW: Uncompress example
Input: 1 0 3 3 5
Dictionary: a:0 b:1 c:2 d:3
Output: (none yet)
![Page 26: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/26.jpg)
Byte LZW: Uncompress example
Input: 1 0 3 3 5
Dictionary: a:0 b:1 c:2 d:3
Output: b
![Page 27: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/27.jpg)
Byte LZW: Uncompress example
Input: 1 0 3 3 5
Dictionary: a:0 b:1 c:2 d:3 ba:4
Output: ba
![Page 28: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/28.jpg)
Byte LZW: Uncompress example
Input: 1 0 3 3 5
Dictionary: a:0 b:1 c:2 d:3 ba:4 ad:5
Output: bad
![Page 29: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/29.jpg)
Byte LZW: Uncompress example
Input: 1 0 3 3 5
Dictionary: a:0 b:1 c:2 d:3 ba:4 ad:5 dd:6
Output: badd
![Page 30: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/30.jpg)
Byte LZW: Uncompress example
Input: 1 0 3 3 5
Dictionary: a:0 b:1 c:2 d:3 ba:4 ad:5 dd:6 da:7
Output: baddad
![Page 31: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/31.jpg)
LZW Byte method: An alternative presentation
![Page 32: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/32.jpg)
Getting off the ground
Suppose we want to compress a file containing only letters a, b, c and d.
It seems reasonable to start with a dictionary
a:0 b:1 c:2 d:3
At least we can then deal with the first letter.
And the receiver knows how to start.
![Page 33: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/33.jpg)
Growing pains
Now suppose the file starts like so:
a b b a b b …
We scan the a, look it up and output a 0.
After scanning the b, we have seen the word ab. So, we add it to the dictionary
a:0 b:1 c:2 d:3 ab:4
![Page 34: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/34.jpg)
Growing pains
Then we get another b. Since bb is not in the dictionary, we output a 1 for the first b
a b b a b b …
and add bb to the dictionary
a:0 b:1 c:2 d:3 ab:4 bb:5
![Page 35: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/35.jpg)
So?
Right, so far zero compression.
Next comes a: ba is not in the dictionary, so it is added as ba:6 (the output so far is 0 1 1).
But now a is followed by b, and ab is in the dictionary
a b b a b b …
so we output 4, and put abb into the dictionary
… d:3 ab:4 bb:5 ba:6 abb:7
![Page 36: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/36.jpg)
And so on
Suppose the input continues
a b b a b b b b a …
We output 5, and put bbb into the dictionary
… ab:4 bb:5 ba:6 abb:7 bbb:8
![Page 37: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/37.jpg)
More Hits
As our dictionary grows, we are able to replace longer and longer blocks by short code numbers.
a b b a b b b b a …
0 1 1 4 5 6
And we increase the dictionary at each step by adding another word.
![Page 38: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/38.jpg)
More importantly
Since we extend our dictionary in such a simple way, it can be easily reconstructed on the other end.
Start with the same initialization, then
read one code number after the other, look up each one in the dictionary, and extend the dictionary as you go along.
![Page 39: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/39.jpg)
Again: Extending
We scan a sequence of symbols
a1 a2 a3 … ak
where each prefix is in the dictionary.
We stop when we fall out of the dictionary:
a1 a2 a3 … ak b
![Page 40: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/40.jpg)
Again: Extending
We output the code for a1 a2 a3 …. ak and
put a1 a2 a3 …. ak b into the dictionary.
Then we set
a1 = b
And start all over.
![Page 41: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/41.jpg)
Sort of
Let's take a closer look at an example.
Assume alphabet {a,b,c}.
The code for aabbaabb is 0 0 1 1 3 5.
The decoding starts with dictionary
a:0, b:1, c:2
![Page 42: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/42.jpg)
Moving along
The first 4 code words are already in D.
0 0 1 1 3 5
and produce output a a b b.
As we go along, we extend D:
a:0, b:1, c:2, aa:3, ab:4, bb:5
For the rest we get
a a b b a a b b
![Page 43: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/43.jpg)
Done
We have also added to D:
ba:6, aab:7
But these entries are never used.
Everything is easy, since there is already an entry in D for each code number when we encounter it.
![Page 44: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/44.jpg)
Is this it?
Unfortunately, no.
It may happen that we run into a code word without having an appropriate entry in D.
But, it can only happen in very special circumstances, and we can manufacture the missing entry.
![Page 45: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/45.jpg)
A Bad Run
Consider input
a a b b b a a ==> 0 0 1 5 3
After reading 0 0 1, D looks like this:
a:0, b:1, c:2, aa:3, ab:4
![Page 46: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/46.jpg)
Disaster
The next code is 5, but it’s not in D.
a:0, b:1, c:2, aa:3, ab:4
How could this have happened?
Can we recover?
![Page 47: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/47.jpg)
… narrowly averted
This problem only arises when
• the input contains a substring … s w s w s …
• s w was already in the dictionary, and s w s was just added to it.
Here s is a single symbol, but w a (possibly empty) word.
![Page 48: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/48.jpg)
… narrowly averted
But then the fix is to output
x + first(x)
where x is the last decompressed word, and first(x) the first symbol of x.
And, we also update the dictionary to contain this new entry.
![Page 49: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/49.jpg)
Example
In our example we had
• s = b
• w = empty
The output and new dictionary word is bb.
![Page 50: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/50.jpg)
Another Example
aabbbaabbaaabaababb ==> 0 0 1 5 3 6 7 9 5
Decoding (dictionary size: initial 3, final 11)
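(+ : code already in D, - : code not yet in D)

| output | case | code | new entry |
|---|---|---|---|
| a | | 0 | |
| a | + | 0 | aa |
| b | + | 1 | ab |
| bb | - | 5 | bb |
| aa | + | 3 | bba |
| bba | + | 6 | aab |
| aab | + | 7 | bbaa |
| aaba | - | 9 | aaba |
| bb | + | 5 | aabab |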
![Page 51: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/51.jpg)
The problem cases
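| output | case | code | new entry | position in D |
|---|---|---|---|---|
| a | | 0 | | |
| a | + | 0 | aa | 3 |
| b | + | 1 | ab | 4 |
| bb | - | 5 | bb | 5 |
| aa | + | 3 | bba | 6 |
| bba | + | 6 | aab | 7 |
| aab | + | 7 | bbaa | 8 |
| aaba | - | 9 | aaba | 9 |
| bb | + | 5 | aabab | 10 |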
![Page 52: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/52.jpg)
Old vs. new
Ordinarily, we use an old dictionary word for the next code word.
But sometimes we immediately use what was last added to the dictionary.
But then it must be of the form s w s and we can still decompress.
![Page 53: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/53.jpg)
Pseudo Code: Compression
Initialize dictionary D to all words of length 1.
Read all input characters:
output code words from D,
extend D whenever a new word appears.
New code words: just an integer counter.
![Page 54: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/54.jpg)
Less Pseudo
initialize D;
c = nextchar; // next input character
W = c; // a string
while( c = nextchar ) {
  if( W+c is in D ) // dictionary
    W = W + c;
  else {
    output code(W); add W+c to D; W = c;
  }
}
output code(W)
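The pseudocode above translates almost line for line into Python. The following is a minimal sketch, assuming the alphabet is passed in explicitly and the dictionary is a plain `dict`; the name `lzw_compress` and its parameters are chosen for illustration only.

```python
def lzw_compress(text, alphabet):
    """Encode `text` as a list of integer codes (byte-method LZW)."""
    # D starts with all words of length 1, numbered in alphabet order.
    D = {ch: i for i, ch in enumerate(alphabet)}
    if not text:
        return []
    out = []
    W = text[0]
    for c in text[1:]:
        if W + c in D:
            W = W + c              # keep extending the current match
        else:
            out.append(D[W])       # emit the code of the longest known word
            D[W + c] = len(D)      # new code word: just the next integer
            W = c                  # restart from the character that broke the match
    out.append(D[W])               # flush the final word
    return out

print(lzw_compress("baddad", "abcd"))     # [1, 0, 3, 3, 5]
print(lzw_compress("aabbaabb", "abc"))    # [0, 0, 1, 1, 3, 5]
print(lzw_compress("aabbbaa", "abc"))     # [0, 0, 1, 5, 3]
```

The three calls reproduce the code sequences worked out by hand in the examples on the earlier slides.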
![Page 55: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/55.jpg)
Pseudo Code: Decompression
Initialize dictionary D with all words of length 1.
Read all code words and
- output corresponding words from D,
- extend D at each step.
This time the dictionary is of the form
( integer, word )
Keys are integers, values words.
![Page 56: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/56.jpg)
Less Pseudo
initialize D;
pc = nextcode; // first code word
pw = word(pc); // corresponding word
output pw;
First code word is easy: codes only a single symbol.
Remember as pc (previous code) and pw (previous word).
![Page 57: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/57.jpg)
More Less Pseudo
while( c = nextcode ) {
if( c is in D ) {
cw = word(c);
pw = word(pc);
ww = pw + first(cw);
insert ww in D;
output cw;
}
else {
![Page 58: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/58.jpg)
The hard case
else {
pw = word(pc);
cw = pw + first(pw);
insert cw in D;
output cw;
}
pc = c;
}
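Both branches of the decompression loop can be sketched in Python as follows. This is an illustrative version, not the course's code: the name `lzw_decompress`, the explicit `alphabet` parameter, and the error raised on an invalid code are assumptions. The hard case is detected by checking that the unknown code is exactly the next free dictionary position.

```python
def lzw_decompress(codes, alphabet):
    """Rebuild the text from a list of LZW codes."""
    # Keys are integers, values are words; same initialization as the encoder.
    D = {i: ch for i, ch in enumerate(alphabet)}
    if not codes:
        return ""
    pw = D[codes[0]]               # first code always names a single symbol
    out = [pw]
    for c in codes[1:]:
        if c in D:
            cw = D[c]              # ordinary case: the word is already known
        elif c == len(D):
            cw = pw + pw[0]        # hard case: manufacture x + first(x)
        else:
            raise ValueError(f"invalid code {c}")
        D[len(D)] = pw + cw[0]     # in the hard case this entry equals cw itself
        out.append(cw)
        pw = cw
    return "".join(out)

print(lzw_decompress([1, 0, 3, 3, 5], "abcd"))   # baddad
print(lzw_decompress([0, 0, 1, 5, 3], "abc"))    # aabbbaa
```

The second call exercises the hard case: code 5 arrives before the decoder has an entry 5, and the manufactured word is bb.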
![Page 59: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/59.jpg)
Implementation - Tries
Tries are a natural way to implement the LZW dictionary.
In the LZW situation, we can add the new word to the trie dictionary in O(1) steps after discovering that the string is no longer a prefix of a dictionary word.
Just add a new leaf to the last node touched.
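A trie-based encoder might look like the sketch below. The class `TrieNode` and the function `lzw_compress_trie` are hypothetical names for this illustration, and the code assumes every input character belongs to the given alphabet. Each input character costs one child lookup, and growing the dictionary is a single leaf insertion at the last node touched.

```python
class TrieNode:
    """One dictionary word; the path from the root spells the word."""
    __slots__ = ("code", "children")

    def __init__(self, code):
        self.code = code           # code number of the word ending here
        self.children = {}         # next character -> TrieNode

def lzw_compress_trie(text, alphabet):
    """LZW encoder whose dictionary is a trie (assumes text uses only `alphabet`)."""
    root = TrieNode(None)
    for i, ch in enumerate(alphabet):
        root.children[ch] = TrieNode(i)        # all words of length 1
    next_code = len(alphabet)
    out = []
    node = root
    for c in text:
        child = node.children.get(c)
        if child is not None:
            node = child                       # walk down the trie
        else:
            out.append(node.code)              # emit code of the last node touched
            node.children[c] = TrieNode(next_code)  # O(1): add one new leaf
            next_code += 1
            node = root.children[c]            # restart at the root's child for c
    if node is not root:
        out.append(node.code)                  # flush the final word
    return out

print(lzw_compress_trie("baddad", "abcd"))     # [1, 0, 3, 3, 5]
```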
![Page 60: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/60.jpg)
LZW details
In reality, one usually restricts the code words to be 12 or 16 bit integers.
Hence, one may have to flush the dictionary every so often.
But we won’t bother with this.
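To make the fixed-width idea concrete, here is a small sketch that packs a list of codes into bytes using a fixed number of bits per code. The helper names `pack_codes` and `unpack_codes`, the big-endian bit order, and the zero padding of the last byte are assumptions for illustration; flushing the dictionary when the code space fills up is not handled here.

```python
def pack_codes(codes, width=12):
    """Pack integer codes into bytes, `width` bits per code, most significant bit first."""
    bits, nbits = 0, 0
    out = bytearray()
    for code in codes:
        if not 0 <= code < (1 << width):
            raise ValueError(f"code {code} does not fit in {width} bits")
        bits = (bits << width) | code
        nbits += width
        while nbits >= 8:                      # emit every complete byte
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)
        bits &= (1 << nbits) - 1               # keep only the leftover bits
    if nbits:                                  # pad the last byte with zeros
        out.append((bits << (8 - nbits)) & 0xFF)
    return bytes(out)

def unpack_codes(data, count, width=12):
    """Inverse of pack_codes; `count` is the number of codes stored."""
    value = int.from_bytes(data, "big")
    total = len(data) * 8
    mask = (1 << width) - 1
    return [(value >> (total - (i + 1) * width)) & mask for i in range(count)]

packed = pack_codes([1, 0, 3, 3, 5], width=12)
print(len(packed))                             # 8
print(unpack_codes(packed, 5, width=12))       # [1, 0, 3, 3, 5]
```

With 12-bit codes the dictionary can hold at most 4096 entries, which is why a real implementation has to reset or stop growing it at some point.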
![Page 61: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/61.jpg)
LZW details
Lastly, LZW generates as output a stream of integers.
It makes perfect sense to try to compress these further, e.g., by Huffman.
![Page 62: 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression](https://reader035.vdocuments.us/reader035/viewer/2022062802/56649e9a5503460f94b9ce7c/html5/thumbnails/62.jpg)
Summary of LZW
LZW is an adaptive, dictionary-based compression method.
Encoding is easy in LZW, but uses a special data structure (trie).
Decoding is slightly more complicated, but requires no special data structures.