web site optimization

Upload: sunil-patil

Post on 26-Dec-2014

Page 1: Web Site Optimization

Welcome

Page 2: Web Site Optimization

Web Site Optimization

Presentation By: Sunil Patil

Page 3: Web Site Optimization

Our sponsors:

Page 4: Web Site Optimization

Name of Presentation

Page 4

AGENDA

AGENDA FOR THE SESSION

What is web site optimization?

Why should you worry about web site optimization?

Suggestions for optimizing web site

o Make fewer requests

o Use Caching

o Minimize request overhead

o Minimize response size

o Optimize browser rendering

Tools

Page 5: Web Site Optimization

What is web site optimization?

WHAT IS WEB SITE OPTIMIZATION

End users care about how long a page takes to render in their browser (perceived performance) and how quickly they can move from one page to another

When you access a page, the browser performs the following steps to render it

o Make request

o Get HTML response (We focus mostly on this)

o Parse HTML response

o Find out resources (JS, CSS, Images) required on the page

o Download resources

o Parse resources

o Execute resources

During web site optimization, we try to optimize each of the above steps to improve the perceived performance of the web site

Page 6: Web Site Optimization

Connected.atech.com

Time to generate HTML: 0.9 sec

Time to render page: 40 sec

Page 7: Web Site Optimization

Advantages of web site optimization

WHY YOU SHOULD THINK ABOUT OPTIMIZING YOUR WEBSITE

Less than 15-20% of the time is spent generating and downloading HTML

o Improving this part is not easy. It might require

• Creating a new architecture

• Refactoring code, introducing caching

• Tuning the back end

o If you improve this part by, say, 50%, the overall gain will be 8-10%

More than 80% of the time is spent downloading, parsing and executing resources

o Improving this part is easier

• Configuration changes at infrastructure level

• Additional tasks and guidelines during the development and build phases

• Additional components at infrastructure level

o If you improve this part by 50%, the overall gain will be 40%

Web pages are getting richer and more complex (50+ resources, Ajax, ...)
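
The 8-10% and 40% figures above follow from simple arithmetic: the overall gain is the share of total time spent in a phase multiplied by how much that phase is improved. A minimal sketch, assuming a 20%/80% split between HTML and other resources (the function name is only illustrative):

```python
def overall_gain(share_of_total_time: float, improvement: float) -> float:
    """Fraction of total page time saved when one phase gets faster.

    share_of_total_time: portion of the total spent in that phase (0..1)
    improvement: how much that phase is sped up (0..1)
    """
    return share_of_total_time * improvement


backend = overall_gain(0.20, 0.50)   # generating and downloading HTML
frontend = overall_gain(0.80, 0.50)  # downloading, parsing, executing resources

print(f"back end:  {backend:.0%} of total page time saved")
print(f"front end: {frontend:.0%} of total page time saved")
```

The same 50% improvement is worth about four times as much when it is applied to the larger share of the page load.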

Page 8: Web Site Optimization

Lessons Learned

LESSONS LEARNED FROM WEBSITE OPTIMIZATION EXPERIENCE AT CLIENT

Load testing does not fully capture all performance-related problems

o Business users tend to use older browsers than technical users

o Location of users matters

o Network speed matters

Changing HTTP server level configuration takes time

o HTTP servers are normally shared across different teams

o Application teams may be on different release cycles, so they might not be able to make changes in their code

We underestimate the impact of web site optimization

As sites get richer and more complex, there is a greater need for web site optimization, and a lot of research is happening in this area

Page 9: Web Site Optimization

Make fewer requests

1

Page 10: Web Site Optimization

What is a parallel connection?

HOW PARALLEL CONNECTIONS IN THE BROWSER WORK

The HTTP 1.1 specification says that a browser should allow at most two parallel connections per host name. So if your web page has 50 resources, the browser will start 2 downloads and queue the rest. Once a download finishes, it starts the next download from the queue

The total number of round trips is roughly N/X, where N is the number of resources to fetch from a host and X is the number of parallel connections the browser allows per host
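
The N/X estimate can be worked through for the 50-resource page mentioned above. This is only a rough sketch: it assumes every resource takes one round trip and ignores differences in resource size.

```python
import math


def round_trips(resources: int, parallel_connections: int) -> int:
    """Rough number of sequential round trips needed to fetch all resources."""
    return math.ceil(resources / parallel_connections)


for connections in (2, 6):
    trips = round_trips(50, connections)
    print(f"{connections} parallel connections: about {trips} round trips")
```

Rounding up accounts for the last, partly filled batch of downloads.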

Page 11: Web Site Optimization

Number of parallel connections

NUMBER OF PARALLEL CONNECTIONS DEPENDS ON BROWSER

Older browsers follow the rule of 2 parallel connections per host, but newer browsers use more parallel connections.

o IE 6/7 -> 2

o IE 8 -> 6

o Firefox 2 -> 2

o Firefox 3 -> 6

o Safari 3/ 4 -> 4

o Opera -> 4

o Chrome -> 6

A browser can reduce the number of parallel connections in special cases

o IE 8 on a dial-up connection uses 2 parallel connections

Page 12: Web Site Optimization

Effect of number of parallel connections

2 parallel connections

6 parallel connections

Page 13: Web Site Optimization

Scripts block parallel downloads

SCRIPT DOWNLOAD IS A STOP-ALL EVENT IN SOME BROWSERS

When a browser encounters a <script> tag in HTML, it stops everything until it has downloaded, parsed and executed the script

o The script might call document.write(), which could change the page content, so the browser waits for the script to download and execute

• If your script performs a long-running operation on load, it holds up the rest of the page

o Scripts on a page must be executed in order

• second.js might depend on first.js, so first.js must be executed before second.js

Some newer browsers download scripts in parallel but still execute them in order

Page 14: Web Site Optimization

Effect of browsers that block everything for script

Browsers that block everything while downloading a script

Page 15: Web Site Optimization

How to improve parallelization

WHAT CAN WE DO TO ACHIEVE MORE PARALLELIZATION

Browsers limit the number of parallel connections per hostname, so the easiest way around this limit is to use multiple hostnames for downloading resources. You can use one hostname to download HTML and up to 4 hostnames for downloading other resources

o You can use www.static-atech.com for downloading resources. www.static-atech.com can actually point to the same server

Combine files of similar type

o Use tools like Dojo ShrinkSafe or YUI Compressor to combine multiple JS files

• Create a custom Dojo build with additional classes, widgets, etc.

o Use YUI Compressor to combine multiple .css files

o Use image maps, CSS sprites

Inline smaller/non-cacheable resources
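
When resources are spread over several hostnames, each resource should always map to the same hostname, otherwise the browser caches the same file under more than one URL. A minimal sketch of one way to do this, assuming two hypothetical hostnames (static1.example.com and static2.example.com are placeholders, not names from this presentation):

```python
import zlib

# Hypothetical hostnames; in practice they can all point to the same server.
STATIC_HOSTS = ["static1.example.com", "static2.example.com"]


def shard_url(path: str, hosts=STATIC_HOSTS) -> str:
    """Pick a hostname for a resource path in a stable, repeatable way."""
    # crc32 is used instead of hash() because hash() of a str can differ
    # between Python processes, which would change the URL on every restart.
    index = zlib.crc32(path.encode("utf-8")) % len(hosts)
    return f"http://{hosts[index]}{path}"


for path in ("/js/app.js", "/css/site.css", "/images/logo.gif"):
    print(shard_url(path))
```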

Page 16: Web Site Optimization

Use caching

2

Page 17: Web Site Optimization

Expiry based caching

WHAT IS EXPIRY BASED CACHING?

Setting an expiry caching header instructs the browser to load a resource from disk instead of the network. You let the browser know that it can cache the response for a certain period of time

o The HTTP 1.1 specification introduced the Cache-Control header. You can set Cache-Control: max-age=<noofseconds> and the browser will cache the resource for <noofseconds>. If it needs the resource again during that time, it uses the copy on disk

o The HTTP 1.0 specification had the Expires header. You can set Expires: Fri, 01 Oct 2010 12:00:00 GMT (an HTTP date, always in GMT). The browser will cache the resource and use it until 1st October

If you set both Cache-Control and Expires headers, Cache-Control takes precedence. Setting Expires as well still helps, because older HTTP clients don’t understand Cache-Control

A resource might still get purged from the cache if the browser’s cache size limit is reached
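
The two expiry headers described above can be generated together. The following is a sketch using only the Python standard library; the function name and the 30-day lifetime are illustrative assumptions.

```python
import calendar
import time
from email.utils import formatdate
from typing import Dict, Optional


def expiry_headers(max_age_seconds: int, now: Optional[float] = None) -> Dict[str, str]:
    """Build Cache-Control (HTTP 1.1) and Expires (HTTP 1.0) for one lifetime."""
    now = time.time() if now is None else now
    return {
        "Cache-Control": f"max-age={max_age_seconds}",
        # formatdate(..., usegmt=True) produces an HTTP date ending in "GMT"
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
    }


# Example: a response served on 1 Sep 2010 12:00:00 GMT, cacheable for 30 days.
served_at = calendar.timegm((2010, 9, 1, 12, 0, 0))
headers = expiry_headers(30 * 24 * 3600, now=served_at)
for name, value in headers.items():
    print(f"{name}: {value}")
```

Sending both headers covers HTTP 1.1 clients through max-age and older clients through Expires.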

Page 18: Web Site Optimization

What happens if you don’t set caching headers

HOW BROWSERS AND CACHES DEAL WITH ABSENCE OF EXPIRY RELATED HEADERS

If you don’t want the browser to reuse a cached resource without checking with the server, set Cache-Control: no-cache. If you don’t want it stored at all, set Cache-Control: no-store

If you don’t set either the Expires or the Cache-Control header, browsers and caching proxies can use heuristic expiration

o HTTP clients read the value of Last-Modified and, if the resource has not changed for 10 months, cache it for 1 month (Expiration Time = Now + 0.1 * (Time since Last-Modified))

• Firefox

• IE 7

• Caching proxies

o The basic idea is that a resource that has not changed for a long time is less likely to change in the near future

Different clients might use different algorithms to come up with the expiration time, so the result can be unpredictable
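
The heuristic formula above can be checked with a small calculation. This sketch uses the 0.1 factor from the slide; as noted, real clients may use a different factor or algorithm, and the dates are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def heuristic_expiry(now: datetime, last_modified: datetime,
                     fraction: float = 0.1) -> datetime:
    """Expiration Time = Now + fraction * (time since Last-Modified)."""
    return now + fraction * (now - last_modified)


now = datetime(2010, 11, 1, tzinfo=timezone.utc)
last_modified = now - timedelta(days=300)   # unchanged for roughly 10 months

expires = heuristic_expiry(now, last_modified)
print("cached for", (expires - now).days, "days")
```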

Page 19: Web Site Optimization

What can you do to improve caching?

USE AGGRESSIVE CACHING OF STATIC RESOURCES

If you don’t know when a resource will be updated, configure your site so that HTML never gets cached and other resources get cached for a long time (months or years)

o The HTML document has references to all the resources on the page, so when a resource changes, change its reference/URL in the HTML

• Change the file name, e.g. from test.js to test_v1.js

• Change the folder, e.g. from test.js to v1/test.js

• Create a mod_rewrite rule, e.g. v1/test.js, v2/test.js, v3/test.js all get mapped to test.js

If you know precisely when a resource will be updated, set Expires to that date
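
Renaming files by hand (test.js to test_v1.js) is easy to forget. A common variation, which is not what the slide describes but serves the same purpose, is to derive the version from a hash of the file content so the name changes exactly when the content does. A sketch, assuming the build step rewrites the HTML references with the returned name:

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(path: str, content: bytes, digest_chars: int = 8) -> str:
    """Return the path with a short content hash inserted before the extension."""
    digest = hashlib.sha256(content).hexdigest()[:digest_chars]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}_{digest}{p.suffix}"))


old = versioned_name("js/test.js", b"function hello() { return 1; }")
new = versioned_name("js/test.js", b"function hello() { return 2; }")
print(old)
print(new)  # different content, so a different URL and no stale cache hit
```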

Page 20: Web Site Optimization

Caching static resources

HOW TO CONFIGURE CACHING AT HTTP SERVER LEVEL

Apache HTTP Server has the mod_expires module, which can be used to generate expiry based caching headers in the response

o Sets both Cache-Control and Expires headers

o Can set headers for static content served by the HTTP server as well as static content returned by WebSphere’s File Serving Servlet

o Granular control: can set headers globally or at URL or directory level

o Can set different expiry rules based on response content type, file extension, ...

o This configuration says that GIF images should be cached for 3 months and other resources for 1 month

ExpiresActive On

ExpiresDefault "access plus 1 month"

ExpiresByType image/gif "access plus 3 month"

Page 21: Web Site Optimization

Caching dynamic resources

HOW TO CONFIGURE CACHING OF RESOURCES SERVED BY WEBSPHERE

The file serving servlet (used for serving static files) does not set Expires/Cache-Control headers. You can add a servlet filter to your web application to set them

You can also set Expires/Cache-Control headers directly in a servlet

WebSphere Portal Server has a navigatorservice.properties file that lets you configure overall portal level caching and caching for ATOM feeds

You can configure WPS to make anonymous pages cacheable, but the process is complicated

The Portlet Specification 2.0 has the concept of an expiration cache, which you can use for setting Cache-Control max-age and public/private

o Set expiration-cache and cache-scope in portlet.xml

o Use ResourceResponse.getCacheControl() to get a javax.portlet.CacheControl object and call its setExpirationTime() and setPublicScope() methods

o Use ResourceURL.setCacheability() so that WPS generates cache friendly URLs

Page 22: Web Site Optimization

Validation based caching

WHAT IS VALIDATION BASED CACHING?

When a static HTML file is served (by Apache HTTP Server or WebSphere’s File Serving Servlet), the server sends a Last-Modified header with a value equal to the date when the file was last modified (OS date)

Apache HTTP Server can generate an ETag for static files based on modification time, size, ...

Unless you set Cache-Control: no-store, the browser will store the response in its cache

But every time the resource is requested and the cached copy has no expiry information or is stale, the browser sends a conditional GET request with If-Modified-Since and If-None-Match headers

The server checks whether the resource has actually been modified. If not, it returns HTTP 304 with no body (about 250 bytes on average) to indicate that the browser can use its cached response

Validation based caching is better than getting a full HTTP 200 response with the full body, but worse than a cached resource that does not require an HTTP request at all

Page 23: Web Site Optimization

Validation based caching

HOW CACHE VALIDATION WORKS

The HTTP specification has the concept of a conditional GET, which helps the client avoid downloading the same resource repeatedly

The server can send Last-Modified and ETag headers in the response

The HTTP client (browser, caching proxy) stores the resource in its disk cache along with the headers

The next time you request that resource, the client adds If-Modified-Since and If-None-Match headers to the request with the values it has on disk

The server compares these values to the version it has. If the resource has changed, it sends HTTP 200 OK with the full resource in the body of the response. If the resource has not changed, it sends HTTP 304 Not Modified with only headers

o The original resource could be, say, 100 KB, but the HTTP 304 response will be 200-250 bytes, so you save on download size

o The client still has to make a request, using one of the connections from the parallel connection pool
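
The server side of the conditional GET described above can be sketched in a few lines. This is a simplified illustration, not how any particular server implements it: validators are compared as plain strings, whereas a real server parses the date and handles lists of ETags. The function name and sample values are assumptions.

```python
def respond(request_headers: dict, etag: str, last_modified: str, body: bytes):
    """Return (status, headers, body) for a GET that may be conditional."""
    validators = {"ETag": etag, "Last-Modified": last_modified}
    if_none_match = request_headers.get("If-None-Match")
    if_modified_since = request_headers.get("If-Modified-Since")

    sent_validator = if_none_match is not None or if_modified_since is not None
    # Every validator the client sent must still match the current resource.
    etag_ok = if_none_match is None or if_none_match == etag
    date_ok = if_modified_since is None or if_modified_since == last_modified

    if sent_validator and etag_ok and date_ok:
        return 304, validators, b""      # headers only, no body
    return 200, validators, body         # full resource


ETAG = '"v42"'
MODIFIED = "Fri, 01 Oct 2010 12:00:00 GMT"
BODY = b"x" * 100_000                    # a 100 KB resource

first = respond({}, ETAG, MODIFIED, BODY)
repeat = respond({"If-None-Match": ETAG, "If-Modified-Since": MODIFIED},
                 ETAG, MODIFIED, BODY)
print(first[0], len(first[2]))
print(repeat[0], len(repeat[2]))
```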

Page 24: Web Site Optimization

How validation caching works

Page 25: Web Site Optimization

Configure ETag

WHY YOU SHOULD CONSIDER DISABLING ETAG

ETags were introduced as a more precise validator than Last-Modified, but they need care in an environment with multiple HTTP servers

The HTTP server can generate an ETag (similar to a version number) for static resources. It is enabled by default. The default format of the ETag is INode MTime Size

Apache HTTP Server sends both Last-Modified and ETag headers. You can’t disable Last-Modified. The browser will send both If-Modified-Since and If-None-Match headers to check if the resource is still valid

As per the HTTP specification, both the If-Modified-Since (IMS) and If-None-Match (INM) conditions must be met for the server to return HTTP 304 (the desired behavior, with a smaller response)

If your request goes to an HTTP server where the file has a different inode but the same date, the ETag will not match and the server will return HTTP 200 instead of HTTP 304

You can disable ETag by adding FileETag None to httpd.conf, or at least configure it to FileETag MTime
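
The multi-server problem can be illustrated with a toy ETag generator. This sketch is only loosely modelled on the INode MTime Size idea; the exact string Apache produces may differ, and the inode, time and size values are made up.

```python
def file_etag(inode: int, mtime: int, size: int,
              parts=("INode", "MTime", "Size")) -> str:
    """Join the selected file attributes, in hex, into a quoted ETag value."""
    values = {"INode": inode, "MTime": mtime, "Size": size}
    return '"' + "-".join(format(values[name], "x") for name in parts) + '"'


# The same file deployed to two servers: same date and size, different inode.
server_a = dict(inode=0x2A7F1, mtime=1285934400, size=10240)
server_b = dict(inode=0x91C03, mtime=1285934400, size=10240)

print(file_etag(**server_a), file_etag(**server_b))        # differ
print(file_etag(**server_a, parts=("MTime", "Size")),
      file_etag(**server_b, parts=("MTime", "Size")))      # identical
```

Leaving the inode out of the ETag lets either server validate a copy that was originally served by the other.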

Page 26: Web Site Optimization

Leverage proxy caching

HOW TO CACHE RESOURCES ACROSS USERS

A big portion of internet traffic goes through caching proxies

o Proxy provided by the ISP

o Proxy provided by a corporate network for outbound connections

o Proxy in front of your web server for inbound connections

Enabling public caching in the HTTP headers for static resources allows the browser to download resources from a nearby proxy server rather than from a remote origin server

o The proxy will share cached resources across users

Use the Cache-Control: public header to indicate that a resource can be cached by public web proxies in addition to the browser that issued the request

Set an appropriate Vary header (Vary: Accept-Encoding, User-Agent)

Page 27: Web Site Optimization

Minimize request overhead

3

Page 28: Web Site Optimization

HTTP Request

WHAT HAPPENS WHEN BROWSER REQUESTS A RESOURCE

When you access a resource in your browser, it performs the following steps

o DNS resolution

o Establish TCP connection

o Send request

o Receive response

You should try to reduce the overhead of each of these steps

Page 29: Web Site Optimization

Reduce DNS resolution time

REDUCE DNS RESOLUTION TIME

Before a browser establishes a connection with a server, it must resolve the host name into an IP address. This value is cached by

o Operating System

o Browser

Cached DNS records have a short lifetime, and on a cache miss the resolver might have to traverse the DNS hierarchy to get the record

Reducing the number of unique hostnames from which resources are served cuts down on the number of DNS resolutions that the browser has to make

Don't use more than 1 host for fewer than 5 resources; balance resources across host names

Serve early-loaded JavaScript from the same hostname as the main document

o Browsers block parallel downloads while downloading JavaScript, so it should load as fast as possible

Page 30: Web Site Optimization

Use HTTP Persistent Connection

WHAT IS HTTP PERSISTENT CONNECTION AND WHY YOU SHOULD CARE

Web clients often open connections to the same site for downloading HTML and related resources. HTTP 1.1 (and Keep-Alive in HTTP 1.0) allows HTTP devices to keep the TCP connection open after a transaction completes and to reuse it for future HTTP requests. Connections that are kept open after a transaction are called persistent connections

o You avoid slow connection setup

o You avoid the slow-start congestion adaptation phase

Persistent connections are more efficient when used in conjunction with parallel connections

Starting from HTTP 1.1, connections are persistent by default unless you set Connection: close

You can set KeepAlive On in Apache to turn on persistent connections

Page 31: Web Site Optimization

Persistent Connection

Page 32: Web Site Optimization

Size of HTTP Request

WHY SIZE OF HTTP REQUEST MATTERS?

Most users have an asymmetric connection, with upload to download speed in a ratio of 1:4 to 1:20. At 1:20, uploading 500 bytes takes as long as downloading 10 KB. We can’t compress the data in an HTTP request. You should try to keep your request small enough to fit in one packet of 1500 bytes

The initial HTTP request suffers from startup throttling (TCP slow start)

An HTTP request is made up of the following things

o Request headers set by the browser

o URL, Referrer URL

o Cookies

You should try to reduce the size of each of the request components
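
To see how quickly these components add up, the size of a request can be estimated by assembling it as text. This is a rough sketch: the header values are invented, and the 1500-byte figure from the slide is used as-is, without subtracting TCP/IP header overhead.

```python
PACKET_BYTES = 1500  # packet size quoted above


def request_size(method: str, url_path: str, headers: dict) -> int:
    """Bytes on the wire for the request line plus headers (no body)."""
    lines = [f"{method} {url_path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Lines are separated by CRLF and the header block ends with a blank line.
    return len("\r\n".join(lines).encode("ascii")) + len("\r\n\r\n")


base_headers = {
    "Host": "www.example.com",
    "User-Agent": "ExampleBrowser/1.0",
    "Accept-Encoding": "gzip",
    "Referer": "http://www.example.com/home",
}
with_cookie = dict(base_headers, Cookie="session=" + "x" * 1400)

for label, headers in (("no cookie", base_headers), ("big cookie", with_cookie)):
    size = request_size("GET", "/images/logo.gif", headers)
    verdict = "fits in one packet" if size <= PACKET_BYTES else "needs more than one packet"
    print(f"{label}: {size} bytes, {verdict}")
```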

Page 33: Web Site Optimization

Request for static resource

Page 34: Web Site Optimization

Minimize cookie size

HOW YOU CAN REDUCE COOKIE SIZE

Enterprise applications need at least a few big cookies that we can’t avoid

o LTPA Token, JSessionId, SSO related cookies

Every time a client sends an HTTP request, it has to send all the cookies that have been set for that domain and path along with it

o Keep most of the cookie payload in server side storage and send only a key in the cookie

o Serving static resources from a cookieless domain reduces the total size of requests made for a page

• Static resources do not need cookies

• A typical static file is less than 10 KB, so more time is spent making the request than getting the response
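
The idea of keeping the payload on the server and sending only a key can be sketched as follows. The in-memory dictionary stands in for whatever server side storage is actually used, and the class and field names are hypothetical.

```python
import secrets


class SessionStore:
    """Keeps the payload on the server; only a short random key goes in the cookie."""

    def __init__(self):
        self._data = {}

    def save(self, payload: dict) -> str:
        key = secrets.token_urlsafe(16)   # unguessable key, 22 characters
        self._data[key] = payload
        return key

    def load(self, key: str):
        return self._data.get(key)


store = SessionStore()
payload = {"user": "jdoe", "roles": ["admin", "editor"], "preferences": "x" * 500}
cookie_value = store.save(payload)

print("cookie carries", len(cookie_value), "bytes instead of", len(str(payload)))
```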

Page 35: Web Site Optimization

Minimize response size

4

Page 36: Web Site Optimization

Compress response

USE GZIP FOR COMPRESSING RESPONSE

Compressing text resources with GZip typically reduces their size by about 70%

Most modern browsers support compressed data. The browser sends an Accept-Encoding header to specify which encodings it supports

You can configure the HTTP server to compress both the static files that it serves and dynamic content that goes through it

o You should compress only text files such as HTML, JavaScript, CSS

o You should not compress binary files such as images or PDFs. They are already compressed and their size might increase after GZip

o You should not compress resources smaller than 150 bytes
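
The three guidelines above are easy to observe with Python's gzip module. In this sketch the sample data is artificial: the HTML is far more repetitive than a real page, so it shrinks more than the typical figure, and random bytes stand in for already compressed content such as images.

```python
import gzip
import os


def gzip_ratio(data: bytes) -> float:
    """Compressed size divided by original size (below 1.0 means it shrank)."""
    return len(gzip.compress(data)) / len(data)


samples = {
    "repetitive HTML": b"<li class='item'>Web Site Optimization</li>\n" * 200,
    "random bytes (stand-in for compressed binary)": os.urandom(4096),
    "tiny response": b"ok",
}

for label, data in samples.items():
    print(f"{label}: {len(data)} -> {len(gzip.compress(data))} bytes")
```

Text shrinks considerably, while incompressible or very small payloads come out larger because of the fixed GZip header and trailer.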

Page 37: Web Site Optimization

Configure GZip on Apache HTTP Server

HOW TO CONFIGURE APACHE HTTP SERVER FOR GZIP

Apache HTTP Server has a mod_deflate module that you can use to GZip the response

You can use it to GZip both static files served by Apache and dynamic responses that are proxied through Apache HTTP Server

o It checks whether the browser supports GZip and compresses the response only if it does

o It allows you to configure GZip by content type

• LoadModule deflate_module modules/mod_deflate.so

AddOutputFilterByType DEFLATE text/html text/plain text/xml

o Make sure that you set Vary: Accept-Encoding so that proxies can correctly deal with clients that do not support GZip

Page 38: Web Site Optimization

Minification

MINIFY TEXT FILES

Minification is the practice of removing unnecessary characters from the code to

reduce its size

o Extra spaces

o Line breaks

o Indentation

o Comments

You can use tools to minify

o JavaScript

o CSS

o HTML
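
As an illustration of what a minifier removes, here is a deliberately naive CSS minifier. It is a sketch only: it does not protect quoted strings or url(...) values and can break selectors that rely on a space before a colon, so a real tool should be used in a build.

```python
import re


def minify_css(css: str) -> str:
    """Strip comments, line breaks, indentation and extra spaces from CSS."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.S)   # comments
    css = re.sub(r"\s+", " ", css)                    # line breaks, indentation
    css = re.sub(r"\s*([{}:;,])\s*", r"\1", css)      # spaces around punctuation
    return css.strip()


source = """
/* page header */
h1 {
    color: red;
    margin: 0 auto;
}
"""
minified = minify_css(source)
print(minified)
print(len(source), "->", len(minified), "characters")
```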

Page 39: Web Site Optimization

Minify JavaScript

WHY MINIFY JAVASCRIPT

Compacting JavaScript code can save many bytes of data and speed up

downloading, parsing, and execution time.

Minification will reduce size by up to 30 %

There are several tools that you can use for minifying JavaScript

o Dojo ShrinkSafe

o YUI Compressor

o Google’s Closure Compiler

A task to minify JavaScript should be part of your build script

You can also minify JavaScript on the fly using a servlet filter

Page 40: Web Site Optimization

Minify CSS

WHY MINIFY CSS

Compacting CSS code can save many bytes of data and speed up downloading,

parsing, and execution time.

Minifying CSS has the same advantages as minifying JavaScript

There are several tools for minifying CSS

o YUI Compressor

o Cssmin.js

You can add a task to minify CSS to the build script

You can minify CSS on the fly using a servlet filter

Page 41: Web Site Optimization

Minify HTML

WHY COMPACT/MINIFY HTML

Compacting HTML code, including any inline JavaScript and CSS contained in it, can

save many bytes of data and speed up downloading, parsing, and execution time.

There are YUI tag libraries that you can use to compress inline JavaScript and CSS

WebSphere generates quite a few blank lines and white spaces in HTML

o Set the com.ibm.wsspi.jsp.usecdatatrim property to true in the Web Container custom settings to bring down the size of generated HTML by up to 15%

Page 42: Web Site Optimization

Optimize Images

WHY OPTIMIZE IMAGES

Properly formatting and compressing images can save many bytes of data

Images saved from programs like Fireworks can contain kilobytes of extra comments,

and use too many colors, even though a reduction in the color palette may not

perceptibly reduce image quality

Choose an appropriate Image file format

o PNGs are almost always superior to GIFs and are usually the best choice

o Use GIFs for very small or simple graphics and for images which contain animation.

o Use JPGs for all photographic-style images.

o Do not use BMPs or TIFFs.

Use an image compressor

Page 43: Web Site Optimization

Optimize browser rendering

5

Page 44: Web Site Optimization

What is optimizing browser rendering

OPTIMIZE BROWSER RENDERING

Once resources have been downloaded to the client, the browser still needs to load, interpret, and render HTML, CSS, and JavaScript code. By simply formatting your code and pages in ways that exploit the characteristics of current browsers, you can enhance performance on the client side.

o Put CSS at the top of the document

o Always specify content type encoding

• Specifying a character set early for your HTML documents allows the browser to begin executing scripts immediately

o Put JavaScript at the end of the document

o Avoid CSS expressions

Page 45: Web Site Optimization

Tools

6

Page 46: Web Site Optimization

Testing tools

WHAT TOOLS SHOULD YOU USE FOR TESTING

Traditional load testing tools like LoadRunner are not well suited for capturing browser performance data

o They take a simplistic view of an HTTP transaction

o The browser has a lot of logic and variations

Use load testing tools that run inside the browser

o iOpus iMacros

o Selenium

o Gomez

Page 47: Web Site Optimization

Yahoo YSlow

Page 48: Web Site Optimization

Google Page speed

Page 49: Web Site Optimization

Charles Web Debugging Proxy

Page 50: Web Site Optimization

Reference

MORE INFORMATION

My Blog (http://wpcertification.blogspot.com/search/label/clientsideperformance)

High Performance Web Sites, O'Reilly

Even Faster Web Sites, O'Reilly

Page 51: Web Site Optimization

THANK YOU FOR WATCHING

CONTACT INFO:

ASCENDANT TECHNOLOGY, LLC

8601 Ranch Road 2222

Building I, Suite 205

Austin, TX 78730

Phone (512) 346-9580

Thank You


April 10, 2023