Web Protocols and Practice, Chapter 10: Web Workload Characterization

Upload: oscar-watkins

Post on 18-Dec-2015

Topics

- Web Workload Definition
- Workload Characterization
- Statistics and Probability Distributions
- HTTP Message Characteristics
- Web Resource Characteristics
- User Behavior Characteristics
- Applying Workload Models

Web Workload Definition

Important performance metrics, such as user-perceived latency and server throughput, depend on the interaction of numerous protocols and software components.

A workload consists of the set of all inputs a system receives over a period of time.

Web workload models are used to generate request traffic for comparing the performance of different proxy and server implementations.

Developing a workload model involves three main steps:

- Identifying the important workload parameters
- Analyzing measurement data to quantify these parameters
- Validating the model against reality

Constructing a workload model requires an understanding of statistical techniques for analyzing measurement data and representing the key properties of Web traffic.

Key properties of Web workloads are:

- HTTP message characteristics
- Resource characteristics
- User behavior

Workload Characterization

A workload model consists of a collection of parameters that represent the key features of the workload that affect the resource allocation and system performance.

A workload model can be applied to a variety of performance evaluation tasks, such as the following:

- Identifying performance problems
- Benchmarking Web components
- Capacity planning

There are several approaches to generating a workload:

- Trace-driven workload
  » Constructs requests directly from an existing log or trace
  » Reproduces a known workload
  » Avoids the intermediate step of analyzing the traffic
  » Does not provide flexibility for experimenting with changes to the workload
  » Provides no clear separation between the load and the performance

- Stress testing
  » Sends requests as fast as possible to evaluate a proxy or a server under heavy load
  » May not represent realistic traffic patterns

- Synthetic workload
  » Derives from an explicit mathematical model that can be inspected, analyzed, and criticized
  » Represents the key properties of real Web traffic
  » Explores system performance in a controlled manner by changing the parameters associated with each probability distribution

To ensure that a workload model is representative of real workloads, the parameters of the model should have certain properties:

- Decoupling from the underlying system
- Proper level of detail
- Independence from other parameters

(Table 10.1)

Table 10.1. Examples of Web workload parameters

Category   Parameter
Protocol   Request method
           Response code
Resource   Content type
           Resource size
           Response size
           Popularity
           Modification frequency
           Temporal locality
           Number of embedded resources
Users      Session interarrival times
           Number of clicks per session
           Request interarrival times

Statistics and Probability Distributions

Statistics such as the mean, median, and variance capture the basic properties of many workload parameters.

- The mean is the average value of the parameter.
- The median is the middle value of the parameter: half of the values are smaller than the median and the other half are larger.
- The variance, or its square root the standard deviation, quantifies how much the parameter varies from the average value.

For a sequence of 4100, 4700, 4200, 20,000, 4000 bytes:

- mean size = 7400 bytes
- median size = 4200 bytes

For a sequence of 4100, 4700, 4200, 4800, 4000 bytes:

- mean size = 4360 bytes
- median size = 4200 bytes
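The two sequences above can be checked with a few lines of Python. This is only a sketch using the standard `statistics` module; the helper name `summarize` is made up for illustration. It shows how one large value pulls the mean far from the median while the median stays put.

```python
from statistics import mean, median

with_outlier = [4100, 4700, 4200, 20000, 4000]     # sizes in bytes, from the slide
without_outlier = [4100, 4700, 4200, 4800, 4000]


def summarize(sizes):
    """Return (mean, median) of a list of sizes in bytes."""
    return mean(sizes), median(sizes)


for label, sizes in [("with outlier", with_outlier),
                     ("without outlier", without_outlier)]:
    avg, mid = summarize(sizes)
    print(f"{label}: mean = {avg} bytes, median = {mid} bytes")
```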

Probability distributions capture how a parameter varies over a wide range of values.

For a sequence of 4100, 4700, 4200, 20,000, 4000 bytes, the cumulative distribution function (CDF) is

F(x) = P(X <= x)

Figure: Example of a cumulative distribution function (CDF)

For a sequence of 4100, 4700, 4200, 20,000, 4000 bytes, the complementary cumulative distribution function (CCDF) is

Fc(x) = P(X > x) = 1 − F(x)

Figure 10.1. Example of a complementary cumulative distribution function (CCDF)
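As a rough illustration of F(x) and Fc(x), the empirical versions can be computed directly from the five sample sizes. The function names `ecdf` and `eccdf` are invented for this sketch.

```python
def ecdf(samples, x):
    """Empirical CDF: fraction of samples with value <= x."""
    return sum(1 for s in samples if s <= x) / len(samples)


def eccdf(samples, x):
    """Empirical CCDF: fraction of samples with value > x."""
    return 1.0 - ecdf(samples, x)


sizes = [4100, 4700, 4200, 20000, 4000]   # bytes, from the slide

for x in (4000, 4200, 5000, 20000):
    print(f"x = {x:>5}: F(x) = {ecdf(sizes, x):.1f}, Fc(x) = {eccdf(sizes, x):.1f}")
```

Only one of the five values lies above 5000 bytes, so Fc(5000) is 0.2, yet that single value dominates the mean.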

Several probability distributions have been widely applied to workload characterization.

One of the most popular probability distributions is the exponential distribution, with the form

$f(x) = \lambda e^{-\lambda x}, \quad x \ge 0$

and mean

$E(x) = 1/\lambda$
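One way to draw values from the exponential distribution is inverse-transform sampling: solve F(x) = 1 − e^{−λx} = u for x, with u uniform on [0, 1). The sketch below assumes an arbitrary rate λ = 0.5 (not a measured value) and compares the sample mean with 1/λ.

```python
import math
import random


def sample_exponential(rate, rng):
    """Inverse-transform sample from f(x) = rate * exp(-rate * x)."""
    u = rng.random()                      # uniform in [0, 1)
    return -math.log(1.0 - u) / rate      # 1 - u is in (0, 1], so log is defined


rate = 0.5                                # illustrative value of lambda
rng = random.Random(42)                   # fixed seed for repeatability
samples = [sample_exponential(rate, rng) for _ in range(100_000)]
sample_mean = sum(samples) / len(samples)

print("model mean 1/lambda:", 1.0 / rate)
print("sample mean        :", round(sample_mean, 3))
```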

Relating a measured distribution to an equation requires justifying the hypothesis that the equation is capable of accurately representing the measured data.

Justifying this hypothesis consists of two key steps:

- The measured data is fitted with the equation to determine the value of each variable.
- Statistical tests are performed to compare the resulting equation with the measured data.
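The two steps can be sketched for the exponential distribution. The choice of a maximum-likelihood fit and of the Kolmogorov-Smirnov distance as the comparison is an assumption made for this example; the slides do not name a specific fitting method or test. The "measured" data here is generated, purely as a stand-in for a real trace.

```python
import math
import random


def fit_exponential(samples):
    """Step 1: maximum-likelihood fit; the rate is 1 / sample mean."""
    return len(samples) / sum(samples)


def ks_statistic(samples, cdf):
    """Step 2: largest gap between the empirical CDF and the fitted CDF."""
    xs = sorted(samples)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        gap = max(gap, (i + 1) / n - f, f - i / n)
    return gap


rng = random.Random(1)
measured = [rng.expovariate(0.25) for _ in range(5000)]   # stand-in for measured data

rate = fit_exponential(measured)
distance = ks_statistic(measured, lambda x: 1.0 - math.exp(-rate * x))
print("fitted rate:", round(rate, 4))
print("KS distance:", round(distance, 4))
```

A small distance supports the hypothesis; applying the same fit to heavy-tailed data gives a much larger distance, which signals that a different equation is needed.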

In some cases, no single well-known distribution matches the measured data.

It may be necessary to represent different parts of the measured distribution with different equations.

HTTP Message Characteristics

- HTTP Request Methods
- HTTP Response Codes

HTTP Request Methods

Knowing which request methods arise in practice is useful for optimizing server implementations and developing realistic benchmarks for evaluating Web proxies and servers.

Traffic characteristics:

- The overwhelming majority of Web requests use the GET method to fetch resources and invoke scripts.
- A small fraction of HTTP requests use the POST method to submit data in forms.

Measurements show a small number of HEAD requests, typically used to test whether a Web server is operational.

Web Distributed Authoring and Versioning (WebDAV) uses the PUT and DELETE methods frequently.

The emergence of tools for testing and debugging Web components may increase the use of the TRACE method.

The exact distribution of request methods varies from site to site.

HTTP Response Codes

Knowing how servers respond to client requests is an important part of constructing a realistic model of Web workloads.

Traffic characteristics:

- 200 OK: accounts for 75% to 90% of responses
- 304 Not Modified: accounts for 10% to 30% of responses
- Of the remaining responses, the other redirection (3xx) codes and the client error (4xx) codes are the most common
- 206 Partial Content: returned when the server sends a range of bytes from the requested resource; may become more common
- 302 Found: is used for redirection responses, and its frequency varies from site to site

Web Resource Characteristics

- Content Type
- Resource Size
- Response Size
- Resource Popularity
- Modification Frequency (Resource Changes)
- Temporal Locality
- Number of Embedded Resources

Understanding the characteristics of Web resources is an important part of modeling Web workload.

Resources vary in terms of:

- How big they are
- How popular they are
- How often they change

Content type

Content type has a direct relationship to other key workload parameters, such as resource size and modification frequency.

Traffic characteristics:

- The overwhelming majority of resources are text content (plain text and HTML) and images (JPEG and GIF).
- The remaining content types include documents such as PostScript and PDF, software such as JavaScript or Java applets, and audio and video data.

The emergence of new applications can influence the distribution of content types.

Resource Sizes

The sizes of Web resources affect:

- The storage requirements at the origin server
- The overhead of caching resources at browsers and proxies
- The load on the network
- The latency in delivering the response message

Traffic characteristics:

- The average resource size is relatively small
  » Average size of an HTML file: 4 to 8 KB
  » Median size of an HTML file: 2 KB
  » Average size of an image: 14 KB

Knowing the distribution of resource sizes at Web sites is useful for deciding how to allocate memory or disk space at a server or proxy.

The high variability in resource size is captured by the Pareto distribution

$f(x) = \alpha k^{\alpha} / x^{\alpha + 1}, \quad x \ge k$

with mean

$E(x) = \alpha k / (\alpha - 1), \quad \alpha > 1$

where α is a shape parameter and k is a scale parameter.
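The Pareto distribution can also be sampled by inverse transform, since its CDF is F(x) = 1 − (k/x)^α for x ≥ k. The values α = 1.2 and k = 1000 bytes below are illustrative assumptions, not measurements. Because the tail is so heavy, the sample mean converges slowly, so the sketch also prints the median, k·2^{1/α}, which is far more stable.

```python
import random
from statistics import mean, median


def sample_pareto(alpha, k, rng):
    """Inverse-transform sample from f(x) = alpha * k**alpha / x**(alpha + 1), x >= k."""
    u = rng.random()                          # uniform in [0, 1)
    return k * (1.0 - u) ** (-1.0 / alpha)


alpha, k = 1.2, 1000.0                        # illustrative shape and scale
rng = random.Random(42)
sizes = [sample_pareto(alpha, k, rng) for _ in range(100_000)]

print("model mean   :", alpha * k / (alpha - 1))
print("model median :", k * 2 ** (1 / alpha))
print("sample mean  :", mean(sizes))
print("sample median:", median(sizes))
```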

Figure 10.2. Exponential and Pareto distributions (with mean of 1)

Figure 10.3. Exponential and Pareto distributions on a logarithmic scale

Figure 10.4. Lognormal distribution

Response Sizes

In analyzing server and network performance, the size of response messages is a more important factor than the size of resources.

Traffic characteristics:

- Response sizes may differ from resource sizes for a variety of reasons:
  » Some HTTP response messages do not have a message body.
  » Some Web resources are never requested and do not contribute to the set of response messages.
  » Some responses are aborted before they complete, resulting in shorter transfers.

The median of the response size distribution is several hundred bytes smaller than the median resource size.

Response sizes can be represented by a combination of the lognormal and Pareto distributions.

The response size distribution has a heavy tail.

Some factors suggest that the distribution of response sizes is not the same as the distribution of resource sizes.
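A combined lognormal and Pareto model can be sketched as a two-part sampler: a lognormal body below some cutoff and a Pareto tail above it. Every number here (`mu`, `sigma`, `cutoff`, `alpha`, `tail_prob`) is a hypothetical placeholder rather than a fitted value, and this mixture construction is only one of several ways to join the two distributions.

```python
import random


def sample_response_size(rng, mu=8.5, sigma=1.3, cutoff=100_000.0,
                         alpha=1.2, tail_prob=0.05):
    """Draw one response size in bytes from a lognormal body with a Pareto tail."""
    if rng.random() < tail_prob:
        # Tail: Pareto with scale equal to the cutoff, so values are >= cutoff.
        return cutoff * rng.paretovariate(alpha)
    while True:
        # Body: lognormal, redrawn in the rare case it lands above the cutoff.
        size = rng.lognormvariate(mu, sigma)
        if size < cutoff:
            return size


rng = random.Random(5)
responses = [sample_response_size(rng) for _ in range(20_000)]
tail_fraction = sum(1 for s in responses if s >= 100_000.0) / len(responses)
print("fraction of responses in the tail:", tail_fraction)
print("largest response (bytes):", round(max(responses)))
```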

Resource Popularity

The popularity of the various resources at a Web site has important performance implications.

The most popular resources are likely to reside in main memory at the origin server, obviating the need to fetch the data from disk.

Traffic characteristics:

- Popularity is measured in terms of the proportion of requests that access a particular resource.
- The probability mass function (pmf) P(r) captures the proportion of requests directed to each resource.

The proportion of requests for a resource follows Zipf’s law:

$P(r) = k r^{-1}$

where r is the rank of an object and k is a constant that ensures that P(r) sums to 1 (Figure 10.5).

Figure 10.5. Zipf’s law

More generally, a Zipf-like distribution has the form

$P(r) = k r^{-c}$

for some constant c.

  » The extreme case of c = 0 corresponds to all resources having equal popularity.
  » Early studies of requests to Web servers found c values close to 1.
  » More recent studies show values for c in the range of 0.75 to 0.95.
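The normalizing constant k and the resulting pmf are easy to compute for a finite set of resources. The sketch below assumes a hypothetical site with 1000 resources and c = 0.85 (a value inside the range quoted above), then draws a few requests by rank.

```python
import random


def zipf_like_pmf(num_resources, c):
    """P(r) = k * r**(-c) for ranks 1..num_resources, with k chosen so the pmf sums to 1."""
    weights = [r ** -c for r in range(1, num_resources + 1)]
    k = 1.0 / sum(weights)
    return [k * w for w in weights]


pmf = zipf_like_pmf(1000, c=0.85)          # hypothetical site with 1000 resources
print("share of requests for the top resource:", round(pmf[0], 4))
print("share of requests for the top 10      :", round(sum(pmf[:10]), 4))

rng = random.Random(7)
ranks = rng.choices(range(1, 1001), weights=pmf, k=10)
print("ranks of 10 sampled requests:", ranks)
```

Setting c = 0 in the same function gives every resource an equal share, matching the extreme case noted above.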

Resource Changes

Web resources change over time as a result of modifications at the origin server.

Modifications to resources affect the performance of Web caching.

Resources that change less often may be given preference in caching or revalidated with the origin server less frequently.

Traffic characteristics:

- Images do not change very often.
- Text and HTML files change more often than images.

Some resources, such as news stories, change in a periodic fashion.

The Expires header could indicate the next time that a cached resource would change.

Accurate timing information in the HTTP response message can reduce the load on the origin server as well as the user-perceived latency for accessing the resource.

An accurate model of Web workloads needs to consider the frequency of resource changes.

Temporal Locality

The time between successive requests for the same resource has a significant effect on Web traffic.

Resource popularity indicates the frequency of requests without indicating the spacing between the requests.

Temporal locality captures the likelihood that a requested resource will be requested again in the near future.

Testing a server with a benchmark that has low temporal locality would underestimate the potential throughput.

High temporal locality also increases the likelihood that a request is satisfied by a browser or proxy cache and reduces the likelihood that a resource has changed since the previous access.

Traffic characteristics:

- Temporal locality can be measured by sequencing through the stream of requests, putting each request at the top of a stack, and noting the position in the stack (the stack distance) of the previous access to each resource.
- A small stack distance suggests high temporal locality.
- The stack distances for requests for a resource follow a lognormal distribution.
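The stack-distance measurement described above can be sketched in a few lines. The function name and the tiny request stream are invented for illustration; a first access has no previous position, so it is reported as `None`. The list-based stack is quadratic in the worst case, which is fine for a sketch but not for a long trace.

```python
def stack_distances(requests):
    """Return the stack distance of each request (None for a first access)."""
    stack = []                 # index 0 is the top: the most recently requested resource
    distances = []
    for resource in requests:
        if resource in stack:
            depth = stack.index(resource) + 1   # 1 means it was already on top
            stack.remove(resource)
        else:
            depth = None
        stack.insert(0, resource)               # move (or add) the resource to the top
        distances.append(depth)
    return distances


trace = ["a", "b", "a", "c", "b", "b"]          # hypothetical request stream
print(stack_distances(trace))                   # [None, None, 2, None, 3, 1]
```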

Number of Embedded Resources

Embedded resources include images, JavaScript programs, and other HTML files that appear as frames in the containing Web page.

The number of embedded references in a Web page has significant impact on the server and network load.

Traffic characteristics:

- Web pages have a median of 8 to 20 embedded resources.
- The distribution has high variability, following the Pareto distribution.

The number of embedded images has tended to increase over time as more users have high-bandwidth connections to the Internet.

A large number of embedded resources does not necessarily translate into a large number of requests to the Web server:

- A cached copy of an embedded resource may be available.
- Some embedded images do not reside at the same Web server as the containing Web page.

User Behavior Characteristics

Web workload characteristics depend on the behavior of users as they download Web pages from various sites.

The workload introduced by a single user can be modeled at three levels:

- Session
  » The series of requests by a single user to a single Web site could be viewed as a logical session.
- Click
  » A user performs one or more clicks to request Web pages.
- Request
  » Each click triggers the browser to issue an HTTP request for a resource.

Each session arrival brings a new user to the site.

The client may establish a new TCP connection for a request or send a request on an existing TCP connection.

Session arrivals can be studied by considering the time between the start of one user session and the start of the next user session.

Session interarrival times follow an exponential distribution.

Exponential interarrival times correspond to a Poisson process, in which users arrive independently of one another.

The exponential distribution is not an accurate model of interarrival times of TCP connections and HTTP requests.
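For session arrivals, where the exponential model does apply, start times can be generated by accumulating exponential interarrival times. The rate of 2 sessions per second and the one-hour window are arbitrary assumptions for this sketch.

```python
import random


def session_arrival_times(rate_per_sec, duration_sec, rng):
    """Start times (in seconds) of sessions arriving as a Poisson process."""
    t = 0.0
    times = []
    while True:
        t += rng.expovariate(rate_per_sec)    # exponential interarrival time
        if t >= duration_sec:
            return times
        times.append(t)


rng = random.Random(3)
arrivals = session_arrival_times(rate_per_sec=2.0, duration_sec=3600.0, rng=rng)
print("sessions in one hour:", len(arrivals))  # expected value is 2.0 * 3600 = 7200
```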

A workload model that assumes that HTTP requests arrive as a Poisson process would underestimate the possibility of the heavy-load periods and would overestimate the potential performance of the Web server.

The number of clicks associated with user sessions has considerable influence on the load on a server.

The number of clicks follows a Pareto distribution, suggesting that some sessions involve a much larger number of clicks than others.

The time between successive requests (request interarrival time) by each user has important implications on the server and network load.

The time between the downloading of one page and its embedded images and the user’s next click is referred to as think time or quiet time.

The characteristics of user think times influence the effectiveness of policies for closing persistent connections.

Most interrequest times are less than 60 seconds.

The think times follow a Pareto distribution with a heavy tail, with α around 1.5.

Heavy-tailed distributions apply to numerous properties of Web traffic:

- Resource sizes
- Response sizes
- The number of embedded references in a Web page
- The number of clicks per session
- The time between successive clicks

A Web session can be modeled as a sequence of on/off periods, in which each on period corresponds to downloading a Web page and its embedded images and each off period corresponds to the user’s think time.

The durations of both the on periods and the off periods follow heavy-tailed distributions.

The load on Web servers and the network exhibits a phenomenon known as self similarity, in which the traffic varies dramatically on a variety of time scales from microseconds to several minutes.
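The on/off view of a session can be sketched as a small generator. Only the think-time shape α ≈ 1.5 comes from the slides; the other shapes and all of the minimum durations are hypothetical placeholders, as are the function names.

```python
import random


def pareto(rng, alpha, scale):
    """Pareto sample with the given shape and minimum value."""
    return scale * rng.paretovariate(alpha)


def generate_session(rng, clicks_alpha=1.5, on_alpha=1.2, on_min_sec=0.1,
                     think_alpha=1.5, think_min_sec=1.0):
    """Return a list of (kind, start, end) periods for one user session."""
    num_clicks = int(pareto(rng, clicks_alpha, 1.0))     # always at least one click
    t = 0.0
    periods = []
    for _ in range(num_clicks):
        on = pareto(rng, on_alpha, on_min_sec)           # downloading page and images
        periods.append(("on", t, t + on))
        t += on
        off = pareto(rng, think_alpha, think_min_sec)    # user think time
        periods.append(("off", t, t + off))
        t += off
    return periods


rng = random.Random(9)
session = generate_session(rng)
for kind, start, end in session[:4]:
    print(f"{kind:>3}: {start:8.2f}s to {end:8.2f}s")
```

Superimposing many such sessions is one common way to produce synthetic traffic that stays bursty across time scales.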

Applying Workload Models

A deeper understanding of Web workload characteristics can drive the creation of a workload model for evaluating Web protocols and software components.

Generating synthetic traffic involves sampling the probability distribution associated with each workload parameter.

(Table 10.2)

Table 10.2. Probability distributions in Web workload models

Distribution   Workload parameter
Exponential    Session interarrival times
Pareto         Response sizes (tail of distribution)
               Resource sizes (tail of distribution)
               Number of embedded images
               Request interarrival times
Lognormal      Response sizes (body of distribution)
               Resource sizes (body of distribution)
               Temporal locality
Zipf-like      Resource popularity

Generating synthetic traffic that accurately represents a real workload is very challenging.

Validation of the synthetic workload model is an important step in constructing and using a workload model.

Validation is different from verification:

- Verification involves testing that the synthetic traffic has the statistical properties embodied in the workload model.
- Validation requires demonstrating that the performance of a system subjected to the synthetic workload matches the performance of the same system under a real workload, according to some predefined performance metrics.
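Verification, unlike validation, needs no running server: it only checks the generated traffic against the model. A minimal sketch, assuming the model specifies exponential session interarrival times with a rate of 2 per second and that a 5% relative tolerance on the mean is acceptable (both numbers are made up for the example):

```python
import random
from statistics import mean


def verify_mean(samples, expected_mean, tolerance=0.05):
    """Check that the sample mean is within a relative tolerance of the model mean."""
    return abs(mean(samples) - expected_mean) <= tolerance * expected_mean


model_rate = 2.0                                   # sessions per second in the model
rng = random.Random(11)
interarrivals = [rng.expovariate(model_rate) for _ in range(50_000)]

print("matches the model mean 1/rate:", verify_mean(interarrivals, 1.0 / model_rate))
print("matches a wrong mean of 1.0  :", verify_mean(interarrivals, 1.0))
```

A fuller verification would compare whole distributions rather than a single statistic, and validation would still require measuring a real system under both workloads.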

Synthetic workload models are also used to test servers over a range of scenarios that might not have happened in practice.

Generating synthetic traffic provides an opportunity to evaluate a proxy or server in a controlled manner.

Web performance depends on the interaction between user behavior, resource characteristics, server load, and network dynamics.

Synthetic workloads help address the need to evaluate and compare Web software components in a controlled manner.
