How GZIP Compression Works …in 10 minutes
Raul Fraile
About me
• PHP/Symfony2 developer at
• PHP 5.3 Zend Certified Engineer
• Symfony Certified Developer
• BS in Computer Science. Ms(Res) student in Computing Technologies.
• Open source: LadybugPHP
What is GZIP?
• GZIP is a lossless compression method: the original data is recovered exactly when it is decompressed.
• It has become the de facto lossless compression method for textual data on websites.
What is GZIP?
Request sent to the web server:
GET index.html
Accept-Encoding: gzip
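The exchange on this slide can be sketched in PHP. This is a minimal sketch, not how any particular web server does it: `negotiateGzip` is a hypothetical helper name, and it ignores quality values such as `gzip;q=0`. It compresses the body only when the client's `Accept-Encoding` header lists `gzip`, and tells the client so with `Content-Encoding: gzip`.

```php
<?php
// Hypothetical helper: returns [headers, body] for a response, compressing
// the body only when the Accept-Encoding header lists "gzip".
function negotiateGzip(string $acceptEncoding, string $body): array
{
    foreach (explode(',', strtolower($acceptEncoding)) as $token) {
        // Drop any parameter such as ";q=0.5" (q-values are not evaluated here).
        $name = trim(explode(';', $token)[0]);
        if ($name === 'gzip') {
            return [
                ['Content-Encoding: gzip', 'Vary: Accept-Encoding'],
                gzencode($body),
            ];
        }
    }
    // The client did not ask for gzip: send the body unchanged.
    return [['Vary: Accept-Encoding'], $body];
}

[$headers, $payload] = negotiateGzip('gzip, deflate', '<html>Hello</html>');
print_r($headers);
```

In practice this negotiation is usually left to the web server itself rather than to application code.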
How does it work?
• It is based on the DEFLATE algorithm, which is a combination of LZ77 and Huffman coding.
• First, the LZ77 algorithm replaces repeated occurrences of data with references.
• Second, Huffman coding assigns shorter codes to more frequent “characters”.
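The relationship between GZIP and DEFLATE can be checked from PHP. The sketch below assumes the gzip container layout described in RFC 1952 (a 10-byte header when no optional fields are set, and an 8-byte trailer holding a CRC-32 and the input size) and that `gzencode` writes no optional header fields. The sample text is made up for illustration.

```php
<?php
// Made-up sample data with plenty of repetition.
$data = str_repeat('How GZIP compression works. ', 20);

$gzip = gzencode($data, 9);   // DEFLATE stream wrapped in a gzip container
$raw  = gzdeflate($data, 9);  // raw DEFLATE stream, no container

// Magic bytes 1f 8b, followed by 08 = "compression method: DEFLATE".
printf("header starts with: %s\n", bin2hex(substr($gzip, 0, 3)));

// Assumption: 10-byte header and 8-byte trailer, so the bytes in between
// are a plain DEFLATE stream that gzinflate() can read.
$payload = substr($gzip, 10, -8);
var_dump(gzinflate($payload) === $data);

// The last 4 bytes store the uncompressed size (little-endian, modulo 2^32).
$storedSize = unpack('V', substr($gzip, -4))[1];
printf("stored size: %d, actual size: %d\n", $storedSize, strlen($data));
printf("gzip: %d bytes, raw DEFLATE: %d bytes\n", strlen($gzip), strlen($raw));
```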
How does it work?
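LZ77
This file is huge! That's because the file is not compressed
The second " file is " (9 characters, counting the spaces around it) repeats text that first appeared 33 characters earlier, so LZ77 replaces it with the reference <33, 9>:
This file is huge! That's because the<33, 9>not compressed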
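The <33, 9> reference can be reproduced with a naive search. This is only a sketch of the idea: `longestMatch` is a hypothetical helper that scans every earlier position, whereas real DEFLATE encoders use hash chains, a 32 KB window and a minimum match length of 3.

```php
<?php
// Naive longest-match search: returns [distance, length] of the longest
// earlier occurrence of the text starting at $pos, or [0, 0] if there is none.
function longestMatch(string $data, int $pos): array
{
    $best = [0, 0];
    $n = strlen($data);
    for ($start = 0; $start < $pos; $start++) {
        $length = 0;
        // The match may run past $pos (overlap), which LZ77 allows.
        while ($pos + $length < $n && $data[$start + $length] === $data[$pos + $length]) {
            $length++;
        }
        // ">=" keeps the nearest candidate among equally long matches.
        if ($length > 0 && $length >= $best[1]) {
            $best = [$pos - $start, $length];
        }
    }
    return $best;
}

$text = "This file is huge! That's because the file is not compressed";

// Offset 37 is the space right after "the".
[$distance, $length] = longestMatch($text, 37);
printf("<%d, %d> stands for \"%s\"\n", $distance, $length, substr($text, 37 - $distance, $length));
// prints: <33, 9> stands for " file is "
```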
How does it work?
Huffman coding
“compressed”
Character frequencies:
c: 1 o: 1 m: 1 p: 1 r: 1 e: 2 s: 2 d: 1
8-bit ASCII encoding (80 bits):
01100011 01101111 01101101 01110000 01110010 01100101 01110011 01110011 01100101 01100100
Huffman encoding (30 bits):
1100 011 010 000 001 111 10 10 111 1101
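The bit count for “compressed” can be verified with a small Huffman builder. This is a minimal sketch that computes code lengths only; `huffmanCodeLengths` is a hypothetical helper, and because ties between equal frequencies can be broken in different ways, the individual lengths may differ from the slide while the total stays the same.

```php
<?php
// Returns [symbol => code length in bits] for every distinct byte of $text.
function huffmanCodeLengths(string $text): array
{
    if ($text === '') {
        return [];
    }
    // Each node is [weight, [symbol => depth below this node]].
    $nodes = [];
    foreach (count_chars($text, 1) as $byte => $count) {
        $nodes[] = [$count, [chr($byte) => 0]];
    }
    // Repeatedly merge the two lightest nodes; every symbol under the
    // merged node ends up one level deeper in the tree.
    while (count($nodes) > 1) {
        usort($nodes, fn($a, $b) => $a[0] <=> $b[0]);
        [$weightA, $symbolsA] = array_shift($nodes);
        [$weightB, $symbolsB] = array_shift($nodes);
        $merged = [];
        foreach ($symbolsA + $symbolsB as $symbol => $depth) {
            $merged[$symbol] = $depth + 1;
        }
        $nodes[] = [$weightA + $weightB, $merged];
    }
    return $nodes[0][1];
}

$word = 'compressed';
$lengths = huffmanCodeLengths($word);

// Total size: each occurrence of a symbol costs its code length.
$totalBits = 0;
foreach (count_chars($word, 1) as $byte => $count) {
    $totalBits += $count * $lengths[chr($byte)];
}
printf("Huffman: %d bits, 8-bit ASCII: %d bits\n", $totalBits, 8 * strlen($word));
// prints: Huffman: 30 bits, 8-bit ASCII: 80 bits
```

A real encoder also has to transmit the code table itself, which is why DEFLATE stores only the code lengths.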
Why GZIP?
• GZIP is not the best compression method, but there are a few good reasons to use it.
• It provides a good trade-off between speed and compression ratio.
• Newer compression methods are hard to adopt: both servers and clients have to support them.
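The speed/ratio trade-off can be observed by compressing the same data at different levels. This is a rough sketch: the sample markup is generated for illustration, timings depend on the machine, and levels 1, 6 and 9 are just convenient sample points.

```php
<?php
// Generated sample data (an assumption for this sketch, not real page markup).
$data = '';
for ($i = 0; $i < 2000; $i++) {
    $data .= "<li class=\"item\" id=\"item-$i\">Item number $i</li>\n";
}

$results = [];
foreach ([1, 6, 9] as $level) {
    $start = microtime(true);
    $compressed = gzencode($data, $level);
    $results[$level] = [
        'bytes'   => strlen($compressed),
        'seconds' => microtime(true) - $start,
    ];
}

printf("original: %d bytes\n", strlen($data));
foreach ($results as $level => $result) {
    printf("level %d: %d bytes in %.5f s\n", $level, $result['bytes'], $result['seconds']);
}
```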
Implementations
• GNU GZIP
• 7-zip
• Zopfli
Different implementations, different results
GZIP + PHP
$originalFile = __DIR__ . '/jquery-1.11.0.min.js';
$gzipFile = __DIR__ . '/jquery-1.11.0.min.js.gz';

// Read the whole file and compress it at the maximum level (9).
$originalData = file_get_contents($originalFile);
$gzipData = gzencode($originalData, 9);
file_put_contents($gzipFile, $gzipData);

var_dump(filesize($originalFile)); // int(96380)
var_dump(filesize($gzipFile)); // int(33305)
Beyond GZIP
• Preprocessing the text can have an impact on the compression ratio.
• How? By optimizing matches, so that LZ77 finds more and longer repeated strings.
Beyond GZIP
{ "name": "Raul", "country": "Spain" },
{ "name": "Pablo", "country": "USA" },
{ "name": "Pedro", "country": "Spain" }
Transposing JSON
{
  "name": [ "Raul", "Pablo", "Pedro" ],
  "country": [ "Spain", "USA", "Spain" ]
}
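The transposition can be sketched in a few lines of PHP. `transposeRecords` is a hypothetical helper that assumes every record has the same keys. The script prints the gzip size of both layouts; with only three records the difference is tiny and may go either way, so any real gain has to be measured on real data.

```php
<?php
// Turns a list of records into one array per key ("column").
// Assumption: every record has the same keys.
function transposeRecords(array $records): array
{
    $columns = [];
    foreach ($records as $record) {
        foreach ($record as $key => $value) {
            $columns[$key][] = $value;
        }
    }
    return $columns;
}

$records = [
    ['name' => 'Raul',  'country' => 'Spain'],
    ['name' => 'Pablo', 'country' => 'USA'],
    ['name' => 'Pedro', 'country' => 'Spain'],
];

$rowJson    = json_encode($records);
$columnJson = json_encode(transposeRecords($records));

echo $columnJson, "\n";
printf("rows:    %d bytes, gzip %d bytes\n", strlen($rowJson), strlen(gzencode($rowJson, 9)));
printf("columns: %d bytes, gzip %d bytes\n", strlen($columnJson), strlen(gzencode($columnJson, 9)));
```

The consumer of the JSON has to reverse the transposition, so both sides need to agree on the layout.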
Beyond GZIP
Ordering XML/HTML attributes
<input id='f1' class='field' name="f1" type="text" /> <input class="field" id="f2" type="text" name="f2" />
17.76%
<input id="f1" class="field" name="f1" type="text" /> <input class="field" id="f2" type="text" name="f2" />
27.10%
<input id="f1" class="field" name="f1" type="text" /> <input id="f2" class="field" name="f2" type="text" />
38.32%
<input type="text" class="field" id="f1" name="f1" /> <input type="text" class="field" id="f2" name="f2" />
38.32%
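The effect of consistent quoting and attribute order can be measured directly. The sketch below assumes the percentages on the slide are space savings, meaning 1 minus compressed size over original size, and uses raw DEFLATE via `gzdeflate`. The exact figures depend on the compressor and container used, so they are not expected to match the slide. `deflateSavings` is a hypothetical helper name.

```php
<?php
// Space savings of raw DEFLATE output relative to the input, in percent.
function deflateSavings(string $data): float
{
    if ($data === '') {
        return 0.0;
    }
    return 100 * (1 - strlen(gzdeflate($data, 9)) / strlen($data));
}

$variants = [
    'mixed quotes, mixed order' =>
        '<input id=\'f1\' class=\'field\' name="f1" type="text" /> '
        . '<input class="field" id="f2" type="text" name="f2" />',
    'same quotes, mixed order' =>
        '<input id="f1" class="field" name="f1" type="text" /> '
        . '<input class="field" id="f2" type="text" name="f2" />',
    'same quotes, same order' =>
        '<input id="f1" class="field" name="f1" type="text" /> '
        . '<input id="f2" class="field" name="f2" type="text" />',
];

foreach ($variants as $label => $markup) {
    printf("%-26s %6.2f %%\n", $label, deflateSavings($markup));
}
```

When both tags use the same quotes and attribute order, the second tag becomes almost one long match against the first, which is what LZ77 rewards.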
Thank you!