JSZap: Compressing JavaScript Code

Post on 22-Feb-2016


Martin Burtscher, UT Austin
Ben Livshits & Ben Zorn, Microsoft Research
Gaurav Sinha, IIT Kanpur

2

A Web 2.0 Application Dissected

• 70,000+ lines of JavaScript code downloaded
• 2,855 functions
• 1+ MB code
• Talks to 14 backend services (traffic, images, directions, ads, …)

3

Lots of JavaScript being Transmitted

[Bar chart: fraction of download that is JavaScript (0% to 100%) for www.live.com, spreadsheets.google, maps.live, chi.lexigame, hotmail, gmail, dropthings, maps.google, pageflakes, bunny hunt]

Up to 85% of a Web 2.0 app is JavaScript code!

4

AJAX: Tension Headaches

• Execution can’t start without the code
• Move code to client for responsiveness

5

JavaScript on the Wire

[Pipeline diagram with stages labeled: JavaScript, crunch, gzip -d, parser, AST; and JSZap, gzip]

6

JSZap Approach

• Represent JavaScript as AST instead of source

• Serialize the compressed AST

• Decompress directly into AST on client

• Use gzip as 2nd-level (de-)compressor
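
The four bullets above can be sketched as a round trip. This is a minimal illustration, not JSZap's format: the toy AST, the node kinds, and the use of JSON as the first-level serializer are all assumptions made for the example. Only the layering (serialize the tree, then gzip; gunzip, then rebuild the tree without parsing source) follows the slide.

```python
import gzip
import json

# Toy AST for `a * b + c` as nested lists: [node_kind, children...].
# The node kinds are illustrative, not JSZap's actual node set.
ast = ["+", ["*", ["id", "a"], ["id", "b"]], ["id", "c"]]

def first_level_encode(tree) -> bytes:
    """Placeholder AST serializer (JSON is an assumption of this sketch)."""
    return json.dumps(tree, separators=(",", ":")).encode("utf-8")

def first_level_decode(data: bytes):
    """Rebuild the tree directly, without re-parsing JavaScript source."""
    return json.loads(data.decode("utf-8"))

def pack(tree) -> bytes:
    # gzip acts as the second-level compressor.
    return gzip.compress(first_level_encode(tree), compresslevel=9)

def unpack(wire: bytes):
    # gzip is undone first, then the first-level decoder yields the AST.
    return first_level_decode(gzip.decompress(wire))

wire = pack(ast)
restored = unpack(wire)
```

The client-side half is `unpack`: its output is already a tree, so no JavaScript parser runs on the receiving end.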

7

Benefits of AST-based Compression

Reduced Latency
• Compression: less to transmit
• ASTs are blasted directly into the browser

Reduced Network Bandwidth
• Reduces mobile charges
• Reduces operator network costs: better for servers

Correctness, Security, and other Benefits
• Ensures well-formedness of code
• Can use to check language subsets easily (AdSafe)
• Caching incremental updates
• Unblocking HTML parser

8

JSZap Compression

[Diagram: JavaScript, then JSZap, then gzip]

9

JSZap Compression

[Diagram: JavaScript is split into three streams (identifiers, literals, productions), numbered 1 to 3, which are then passed to gzip]

10

GZIP is a formidable opponent

11

JSZap vs. GZIP

Size in KB after compression, by stream:

Stream         JSZap    gzip
Literals         5.4     5.4
Identifiers     18.4    19.0
Productions      8.4    11.5
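
As a worked check of the chart's numbers, the sketch below totals the per-stream sizes. It assumes the three value pairs on the chart (5.4/5.4, 18.4/19.0, 8.4/11.5) belong to literals, identifiers, and productions in that order, and that the first number of each pair is JSZap's; that pairing is read off the chart labels rather than stated explicitly.

```python
# Sizes in KB as labeled on the chart: (JSZap, gzip) per stream.
# The stream-to-value pairing is an assumption of this sketch.
sizes_kb = {
    "literals": (5.4, 5.4),
    "identifiers": (18.4, 19.0),
    "productions": (8.4, 11.5),
}

jszap_total = sum(jszap for jszap, _ in sizes_kb.values())
gzip_total = sum(gz for _, gz in sizes_kb.values())

# Fraction saved overall and per stream, relative to gzip alone.
savings = 1 - jszap_total / gzip_total
per_stream_savings = {
    name: 1 - jszap / gz for name, (jszap, gz) in sizes_kb.items()
}
```

Under that reading the totals are about 32.2 KB against 35.9 KB, roughly a 10% saving, and nearly all of it comes from the production stream.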

12

Talk Outline

[Outline diagram: the identifier, literal, and production streams, numbered 1 to 3, followed by evaluation on real code]

13

Background: ASTs

Expression: a * b + c

Grammar:
1) E → E + T
2) E → T
3) T → T * F
4) T → F
5) F → id

Tree (each node labeled with its production number):

        + (1)
       /     \
    * (3)    c (5)
   /     \
a (5)    b (5)
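
To make the link between the grammar and the tree labels concrete, here is a small sketch that walks the full parse tree of `a * b + c` in pre-order and then drops the unit productions. The tuple encoding of the parse tree and the helper names are assumptions of this example.

```python
# Grammar from the slide: production number -> (left-hand side, right-hand side).
GRAMMAR = {
    1: ("E", ["E", "+", "T"]),
    2: ("E", ["T"]),
    3: ("T", ["T", "*", "F"]),
    4: ("T", ["F"]),
    5: ("F", ["id"]),
}

def leaf(name):
    """F -> id, applied to one identifier token."""
    return (5, [name])

# Parse tree of `a * b + c` as (production, children); plain strings are tokens.
parse_tree = (1, [
    (2, [(3, [(4, [leaf("a")]), "*", leaf("b")])]),
    "+",
    (4, [leaf("c")]),
])

def preorder(node):
    """Yield production numbers in a pre-order (leftmost-derivation) walk."""
    if isinstance(node, str):
        return
    production, children = node
    yield production
    for child in children:
        yield from preorder(child)

full = list(preorder(parse_tree))

# Unit productions rewrite one nonterminal into another and add no tree shape.
nonterminals = {lhs for lhs, _ in GRAMMAR.values()}
unit = {n for n, (_, rhs) in GRAMMAR.items()
        if len(rhs) == 1 and rhs[0] in nonterminals}
ast_labels = [p for p in full if p not in unit]
```

The full walk gives 1 2 3 4 5 5 4 5; without the unit productions 2 and 4 it gives 1 3 5 5 5, which are the labels on the tree's five nodes.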

14

A Simple JavaScript Example

var y = 2;
function foo () {
    var x = "jscrunch";
    var z = 3;
    z = y + y;
}
x = "jszap";

Identifier Stream

y foo x z z y y x

Literal Stream

"jscrunch" 2 3 "jszap"

Production Stream

1 3 4 ... 1 3 4 ...
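
The split into three streams can be sketched with a hand-built tree for the example program. The node kinds (`VarDecl`, `Assign`, and so on) are made-up names standing in for real production numbers, and the tree is written by hand rather than produced by a JavaScript parser.

```python
# Hand-built tree for the example; each node is (kind, children...).
# Node kinds are illustrative names, not JSZap's real production set.
program = ("Program",
    ("VarDecl", ("Ident", "y"), ("Num", 2)),
    ("FuncDecl", ("Ident", "foo"),
        ("VarDecl", ("Ident", "x"), ("Str", "jscrunch")),
        ("VarDecl", ("Ident", "z"), ("Num", 3)),
        ("Assign", ("Ident", "z"),
            ("Add", ("Ident", "y"), ("Ident", "y")))),
    ("Assign", ("Ident", "x"), ("Str", "jszap")),
)

def split_streams(node, productions, identifiers, literals):
    """Pre-order walk that routes each piece of the tree to its own stream."""
    kind, *rest = node
    productions.append(kind)
    if kind == "Ident":
        identifiers.append(rest[0])
    elif kind in ("Num", "Str"):
        literals.append(rest[0])
    else:
        for child in rest:
            split_streams(child, productions, identifiers, literals)

productions, identifiers, literals = [], [], []
split_streams(program, productions, identifiers, literals)
```

The identifier stream comes out as y foo x z z y y x. This walk emits literals in source order (2, "jscrunch", 3, "jszap"), which differs slightly from the order listed on the slide.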

15

Benchmarking JSZap

Benchmark name    Source lines    Source bytes
gmonkey                    922          17,382
getDOMHash               1,136          25,467
bing1                    3,758          77,891
bingmap1                 3,473          80,066
livemsg1                 5,307          93,982
bingmap2                 9,726         113,393
facebook1                5,886         141,469
livemsg2                 7,139         156,282
officelive1             22,016         668,051

• JavaScript files up to 22K LOC

• Variety of app types

• Both hand-generated and machine-generated code

• gzipped everything

16

Components of JavaScript Source

[Stacked bar chart: share of productions, identifiers, and literals (0% to 100%) for each benchmark, gmonkey through officelive1]

• None of the categories can be ignored

• Identifiers become more prominent with code growth

17

Compressing the Production Stream

• Frequency-based production renaming

• Differential encoding: 26 and 57 => 2 and 3

• Chain rule: eliminate predictable productions

• Tree-based prediction-by-partial-match
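
The first bullet, frequency-based renaming, can be sketched as follows. The production IDs in the sample stream are made up, and the tie-breaking rule is a choice of this example, not something the slides specify.

```python
from collections import Counter

def rename_by_frequency(stream):
    """Map each production ID to its frequency rank (0 = most common)."""
    counts = Counter(stream)
    # Descending count, then original ID, so ties are broken deterministically.
    ranked = sorted(counts, key=lambda p: (-counts[p], p))
    mapping = {old: new for new, old in enumerate(ranked)}
    return [mapping[p] for p in stream], mapping

stream = [40, 12, 40, 40, 97, 12, 40]   # hypothetical production IDs
renamed, mapping = rename_by_frequency(stream)

# A decoder needs the inverse mapping to recover the original IDs.
inverse = {new: old for old, new in mapping.items()}
decoded = [inverse[r] for r in renamed]
```

After renaming, the most frequent productions carry the smallest numbers, which tends to give a later byte-oriented compressor such as gzip a more skewed distribution to work with.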

18

PPMC

• Consider compressing: if (P) then X else X
• Should be very compressible
• if (P) then ...abc... else ...abc...

[Tree diagram: a node with children P, X, X]

• Tree context used to build a predictor

• Provides the next likely child node given context C and child position p

• Arithmetic coding: more likely=shorter IDs

• See paper for details
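
The idea of predicting a child from its tree context can be illustrated with a much simpler model than PPMC. The sketch below keeps adaptive counts keyed by (parent kind, child position) and reports the ideal code length, -log2(p), that an arithmetic coder would approach. Add-one smoothing replaces PPM's escape mechanism, and the node kinds are hypothetical, so this is a simplified stand-in rather than the predictor described in the paper.

```python
import math
from collections import defaultdict

class TreeContextModel:
    """Adaptive counts of child kinds, keyed by (parent kind, child position)."""

    def __init__(self, alphabet):
        self.alphabet = list(alphabet)
        self.counts = defaultdict(lambda: defaultdict(int))

    def probability(self, context, symbol):
        seen = self.counts[context]
        total = sum(seen.values())
        # Add-one smoothing stands in for PPM's escape mechanism.
        return (seen[symbol] + 1) / (total + len(self.alphabet))

    def cost_bits(self, context, symbol):
        """Ideal arithmetic-coding cost of `symbol`; then update the counts."""
        bits = -math.log2(self.probability(context, symbol))
        self.counts[context][symbol] += 1
        return bits

model = TreeContextModel(["If", "Call", "Assign", "Ident"])
context = ("If", 1)   # child position 1 under an `If` node
first = model.cost_bits(context, "Call")
second = model.cost_bits(context, "Call")
```

With four possible kinds and no history, the first `Call` costs 2 bits; the second costs about 1.32 bits because the model has already seen it in that context. That is the "more likely = shorter IDs" effect in miniature.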

19

Production Compression with PPMC

[Bar chart: production stream size relative to gzip (gzip = 1) for each benchmark, gmonkey through officelive1; axis 50% to 100%; labeled value 0.6772]

20

Compressing the Identifier Stream

• Symbol tables instead of identifier stream:
  – Compress redundancy: offset into table
  – Global or local symbol tables
  – Use variable-length encoding
• Other techniques:
  – Sort symbols by frequency
  – Rename local variables
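
A minimal sketch of the symbol-table idea, combined with sorting symbols by frequency: each identifier is stored once, and the stream becomes a list of table offsets. It uses the identifier stream from the earlier example slide and a single global table; the tie-breaking by first appearance is a choice made for this example.

```python
from collections import Counter

def build_symbol_table(identifiers):
    """Return (table, indices), with the table sorted by descending frequency."""
    counts = Counter(identifiers)
    # Walking the stream backwards leaves the earliest position of each name.
    first_seen = {name: i
                  for i, name in reversed(list(enumerate(identifiers)))}
    table = sorted(counts, key=lambda name: (-counts[name], first_seen[name]))
    index = {name: i for i, name in enumerate(table)}
    return table, [index[name] for name in identifiers]

stream = "y foo x z z y y x".split()
table, indices = build_symbol_table(stream)

# Decoding is a plain table lookup.
decoded = [table[i] for i in indices]
```

Frequent names such as `y` end up with the smallest offsets, which is what makes a variable-length encoding of the offsets worthwhile.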

21

Variable-length Encoding for Identifiers

[Decision tree with tests "is global?", "is renamed local", and "fits in 1 byte?"; leaf code prefixes: 00…, 01…, 11…, 10…]
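
One way such a two-bit prefix scheme could work is sketched below. The tag layout is hypothetical: the slide shows only the prefixes 00, 01, 10, and 11, so the meaning assigned to each bit here (top bit for global versus local, second bit for one-byte versus two-byte) is an assumption, as are the function names.

```python
# Assumed layout (hypothetical): bit 7 = global/local, bit 6 = fits in one byte.
# One-byte form carries a 6-bit index; two-byte form carries a 14-bit index.

def encode_index(index: int, is_global: bool) -> bytes:
    if index < 0 or index >= 1 << 14:
        raise ValueError("index does not fit in 14 bits")
    scope_bit = 0x80 if is_global else 0x00
    if index < 1 << 6:
        return bytes([scope_bit | 0x40 | index])
    return bytes([scope_bit | (index >> 8), index & 0xFF])

def decode_index(data: bytes, pos: int = 0):
    """Return (index, is_global, next_pos)."""
    first = data[pos]
    is_global = bool(first & 0x80)
    if first & 0x40:                       # one-byte form
        return first & 0x3F, is_global, pos + 1
    return ((first & 0x3F) << 8) | data[pos + 1], is_global, pos + 2

short = encode_index(5, is_global=True)    # small index: one byte
long_ = encode_index(300, is_global=False) # larger index: two bytes
```

Because a frequency-sorted table puts common symbols at small offsets, most references would take the one-byte form under a scheme like this.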

22

Variable-Length Identifier Encoding

[Stacked bar chart: breakdown of identifiers by encoding class (0% to 100%) for each benchmark, gmonkey through officelive1; legend: parent local 2byte local 1byte local builtin global 2byte global 1byte]

23

Symbol Tables: Effectiveness

[Bar chart: identifier stream size relative to no symbol table (NoST = 1) for each benchmark, gmonkey through officelive1; axis 80% to 100%; series: Global ST, VarEnc; labeled values 0.943 and 89%]

24

Compressing Literals

• Symbol tables
• Grouping literals by type
• Pre-fixes and post-fixes
• These techniques result in 5-10% savings compared to gzip
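
Grouping literals by type can be sketched as below. The explicit tag list is an assumption of this example: it stands in for whatever type information a decoder would have available, and the two-way split into strings and numbers is a simplification.

```python
def group_by_type(literals):
    """Split a literal stream into per-type streams plus a type-tag stream."""
    groups = {"str": [], "num": []}
    tags = []
    for value in literals:
        tag = "str" if isinstance(value, str) else "num"
        tags.append(tag)
        groups[tag].append(value)
    return tags, groups

def regroup(tags, groups):
    """Inverse of group_by_type: interleave the groups back into one stream."""
    iters = {tag: iter(values) for tag, values in groups.items()}
    return [next(iters[tag]) for tag in tags]

literals = [2, "jscrunch", 3, "jszap"]
tags, groups = group_by_type(literals)
restored = regroup(tags, groups)
```

Placing similar values next to each other ("jscrunch" beside "jszap", 2 beside 3) is the kind of locality a dictionary-based compressor such as gzip can exploit.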

25

Average JSZap Compression: 10%

[Bar chart: JSZap compressed size relative to gzip (gzip = 1) for each benchmark, gmonkey through officelive1; axis 80% to 100%; labeled value 0.8792]

[Pie chart: Productions 26%, Identifiers 57%, Literals 17%]

13% savings

26

Summary and Conclusions

• JSZap: AST-based compression for JavaScript
• Propose a range of techniques for compressing
  – Productions
  – Identifiers
  – Literals
• Preliminary results are encouraging: 10% savings over gzip
• Future focus
  – Latency measurements
  – Browser integration

27

• Well-formedness
• Security (AdSafe)
• AST representation
• Unblocking HTML parser
• Caching and incremental updates
• Compression with JSZap

Questions?
