Caching Characteristics of Internet and Intranet Web Proxy Traces

Arthur Goldberg, Ilya Pevzner, Robert Buff

Courant Institute of Mathematical Sciences, New York University

Clients, Servers and Proxy

[Figure: clients C1, C2, ..., CM connect through caching proxy P to servers S1, S2, ..., SN]

Clients Ci are configured to use caching proxy P to access N servers Si. A server may use a proxy itself.

HTTP Through a Proxy

[Figure: message flow between Browser, Proxy and Server, showing a cache miss and a cache hit]

Potential Web Caching Benefits

• Reduce response time by delivering the document from a closer and/or less loaded server than the origin server

• Save bandwidth costs between proxy and origin server

Goals

• Study large internet and intranet traces

• Evaluate caching opportunities and problems

• Examine cache size needs and document residence times

Part 1

Proxy trace sources and proxy configurations

Data Sources

ISP Usage

• 450,000 users

• Load
  – Peak
    • 500 unique clients
    • 30 requests per second
  – Average
    • 1M requests per day

ISP hardware details

• IBM RS/6000 system

• 256 MB RAM

• Three 4 GB disks

ISP proxy configuration details

• 8 proxies nationwide

• Netscape 2.5 proxy

• 5.5 GB cache size

• Netscape extended-2 log format

• Parameters
  – max-uncheck: 6 hours
  – lm-factor: 0.1
  – term-percent: 80%
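The slide lists the parameter values without explaining them. The sketch below shows one plausible reading, stated as an assumption rather than as Netscape's documented behaviour: lm-factor scales the document's age at the last up-to-date check into a freshness estimate, and max-uncheck caps the time the proxy may serve the document without checking again. The function names are hypothetical.

```python
from datetime import datetime, timedelta

LM_FACTOR = 0.1                    # value from the slide
MAX_UNCHECK = timedelta(hours=6)   # value from the slide


def estimated_fresh_for(last_modified: datetime, last_checked: datetime) -> timedelta:
    """Assumed heuristic: a fraction of the document's age at the last check,
    never longer than MAX_UNCHECK."""
    age_at_check = last_checked - last_modified
    return min(age_at_check * LM_FACTOR, MAX_UNCHECK)


def needs_check(last_modified: datetime, last_checked: datetime, now: datetime) -> bool:
    """True if the cached copy should be revalidated before being served."""
    return now - last_checked > estimated_fresh_for(last_modified, last_checked)
```

Under this reading, a document that was 10 hours old when last checked is served unchecked for 1 hour, while a document that was 100 days old is served unchecked for at most 6 hours.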

Intranet Usage

• 8,000 employees

• Load
  – Peak
    • Varies
  – Average
    • 500K requests per day, over 10 hours
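To put the two average loads on the same scale, the daily totals can be converted to requests per second. This is a small worked calculation, assuming the ISP's 1M requests are spread evenly over 24 hours (the slides do not say so) and the intranet's 500K over the stated 10 hours. The helper name is hypothetical.

```python
def average_rate(requests_per_day: float, active_hours: float) -> float:
    """Average requests per second over the active part of the day."""
    return requests_per_day / (active_hours * 3600)


isp_avg = average_rate(1_000_000, 24)      # about 11.6 requests/s
intranet_avg = average_rate(500_000, 10)   # about 13.9 requests/s
isp_peak_to_avg = 30 / isp_avg             # slide gives a 30 requests/s peak
```

Under these assumptions the ISP peak of 30 requests per second is roughly 2.6 times its average rate.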

Intranet hardware details

• Sun Microsystems Ultra 1 server

• 1 GB RAM

• Seven 4 GB disks

Intranet proxy configuration details

• 2 proxies

• Squid 1.1.21 proxy

• 12 GB disk cache size

• 750 MB memory cache size

• Extended log format

Part 2

Analysis of ISP and Intranet traces assuming unlimited cache storage

Key Cache Metrics

• Hit Ratio (HR)

• Fractional Bandwidth Savings (BT)

$$\mathrm{HR} = \frac{\text{number of documents served from the cache}}{\text{number of documents served}}$$

$$\mathrm{BT} = \frac{\text{number of bytes served with documents in the cache}}{\text{number of bytes served}}$$
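The two ratios on this slide can be computed directly from a trace. The following is a minimal sketch, assuming each trace entry records the response size in bytes and whether it was served from the cache. The `Entry` type and its field names are illustrative, not the authors' log format.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Entry:
    size: int    # bytes delivered to the client (assumed field)
    hit: bool    # True if the document was served from the cache (assumed field)


def hit_ratio(trace: Sequence[Entry]) -> float:
    """HR: documents served from the cache / documents served."""
    if not trace:
        return 0.0
    return sum(1 for e in trace if e.hit) / len(trace)


def bandwidth_savings(trace: Sequence[Entry]) -> float:
    """BT: bytes served from the cache / bytes served."""
    total = sum(e.size for e in trace)
    if total == 0:
        return 0.0
    return sum(e.size for e in trace if e.hit) / total
```

HR and BT can differ widely on the same trace: when the hits are mostly small documents, HR is high while BT stays low.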

Analyzing Caching Properties

Documents cached                      Analysis name (hit rate)    Analysis name (bandwidth savings)
All                                   HR(ALL)                     BT(ALL)
Cachable as per HTTP specification    HR(-RFC)                    BT(-RFC)
Cached by operating proxy             HR(-ACTUAL)                 BT(-ACTUAL)

ISP documents that cannot be cached, as per HTTP specification

Reason for non-cacheability    % of entries in ISP trace
Expires                        3.2%
Cache control                  0.8%
Pragma: no-cache               0.4%
Request method                 0.0%
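The four reasons in this table can be turned into a simple classifier. The sketch below uses a deliberately simplified rule set that is an assumption, not the authors' analysis code: only GET and HEAD are treated as cacheable methods, any `no-cache`, `no-store` or `private` directive counts under "Cache control", and an `Expires` value that is unparsable or not later than `Date` counts under "Expires". The function name is hypothetical.

```python
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def non_cacheable_reason(method: str, headers: Mapping[str, str]) -> Optional[str]:
    """Return the table's reason label, or None if the entry looks cacheable."""
    h = {k.lower(): v for k, v in headers.items()}
    if method.upper() not in ("GET", "HEAD"):
        return "Request method"
    if "no-cache" in h.get("pragma", "").lower():
        return "Pragma: no-cache"
    directives = {
        d.strip().lower().split("=")[0]
        for d in h.get("cache-control", "").split(",")
    }
    if directives & {"no-cache", "no-store", "private"}:
        return "Cache control"
    if "expires" in h:
        try:
            expires = parsedate_to_datetime(h["expires"])
            if "date" in h and expires <= parsedate_to_datetime(h["date"]):
                return "Expires"
        except (TypeError, ValueError):
            # Assumed rule: an unparsable or incomparable date means already expired.
            return "Expires"
    return None
```

Running such a function over every trace entry and counting the labels would give a breakdown of the same shape as the table, although the exact percentages depend on the rules chosen.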

Comment about “cookies”

• For Prodigy, RFC figures assume that the Netscape proxy follows the RFC

• In reality, the Netscape proxy does not cache documents with cookies

• Documents with cookies account for 2% of responses in the Prodigy trace

• It follows that RFC figures for Prodigy may be up to 2% higher than shown

ISP Hit Ratio vs. Trace Length

[Figure: hit ratio (axis 0.0 to 0.6) against trace length (axis 0 to 4.2E+06); curves HR(ALL), HR(-RFC) and HR(-ACTUAL)]

ISP BT vs. Trace Length

[Figure: fraction of bandwidth saved (axis 0 to 0.4) against trace length (axis 0 to 5.1E+06); curves BT(ALL), BT(-RFC) and BT(-ACTUAL)]

Intranet HR vs. Trace Length

[Figure: "Hit Ratios"; hit ratio (axis 0.0 to 1.0) against trace length (axis 0 to 1.6E+06); curves HR(ALL), HR(-RFC) and HR(-ACTUAL)]

Intranet BT vs. Trace Length

[Figure: "Fractions of Bytes Transferred Saved by Caching"; fraction (axis 0 to 0.8) against trace length (axis 0 to 1.6E+06); curves BT(ALL), BT(-RFC) and BT(-ACTUAL)]

Part 3

Analysis of ISP trace with finite cache sizes

Prophetic Cache Replacement Algorithm

• A Prophetic cache stores exactly the set of documents that will be referenced in the future

• An on-line prophetic cache algorithm cannot be built

• However, given a trace, prophetic caching decisions can be determined off-line
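The off-line determination mentioned in the last bullet can be sketched with one backward pass over the trace. This is an illustration under stated assumptions, not the authors' implementation: a request is a (document id, size in bytes) pair, and the function names are hypothetical. A document is kept after a request only if it appears again later, so the cache never holds a document that will not be referenced.

```python
from typing import List, Tuple

Request = Tuple[str, int]  # (document id, size in bytes); assumed trace shape


def prophetic_decisions(trace: List[Request]) -> List[bool]:
    """keep[i] is True if the document of request i is referenced again later."""
    seen_later = set()
    keep = [False] * len(trace)
    for i in range(len(trace) - 1, -1, -1):   # scan the trace backwards
        doc, _ = trace[i]
        keep[i] = doc in seen_later
        seen_later.add(doc)
    return keep


def prophetic_cache_sizes(trace: List[Request]) -> List[int]:
    """Bytes held by the prophetic cache after each request."""
    keep = prophetic_decisions(trace)
    cached = {}
    total = 0
    sizes = []
    for (doc, size), k in zip(trace, keep):
        total -= cached.pop(doc, 0)   # drop any older copy of this document
        if k:                          # store only if it will be needed again
            cached[doc] = size
            total += size
        sizes.append(total)
    return sizes
```

The maximum of `prophetic_cache_sizes(trace)` is the largest amount of space this idealised cache ever needs for the given trace.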

Prophetic Cache Replacement Algorithm (continued)

• Cache space used by a prophetic cache is the minimum size needed to avoid cache misses
  – notes:
    • true for any maximum residence time
    • analyses make cyclical traces

Maximum Hit Rate as a function of residence time

[Figure: hit rate (axis 0.1 to 0.6) against residence time in hours (axis 20 to 140)]
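A curve of this kind can be derived from a trace by counting, for each residence-time limit, the requests whose document was last referenced within that limit. The sketch below assumes that residence time is measured from the most recent reference to the document and that the cache is otherwise unlimited; the slides do not spell out either point. The function name and the (time in hours, document id) trace shape are hypothetical.

```python
from typing import List, Tuple


def max_hit_rate(trace: List[Tuple[float, str]], max_residence_hours: float) -> float:
    """Fraction of requests whose previous reference to the same document
    lies within max_residence_hours (assumed definition of a possible hit)."""
    if not trace:
        return 0.0
    last_seen = {}
    hits = 0
    for t, doc in trace:   # trace must be sorted by time
        prev = last_seen.get(doc)
        if prev is not None and t - prev <= max_residence_hours:
            hits += 1
        last_seen[doc] = t
    return hits / len(trace)
```

Evaluating `max_hit_rate` for a range of limits, for example every 10 hours up to 140, yields the points of a hit rate versus residence time curve for that trace.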

Maximum Hit Rate as a function of residence time, by document size

[Figure: hit rate (axis 0.1 to 0.6) against residence time in hours (axis 20 to 140); one curve per document size class: 1-10, 11-100, 101-1K, 1K-10K, 10K-100K, 100K-1M]

Conclusions

• We analyze very long Web proxy traces from an ISP and an intranet

• We propose a new method to evaluate a proxy by comparing the actual hit rate with the potential hit rate

• We show that it is important to keep the cache residence time above one day

Addresses

• E-mail: {artg,pevzner,buff}@cs.nyu.edu

• WWW: www.cs.nyu.edu/{artg,pevzner,buff}

• Paper and presentation are available at www.cs.nyu.edu/artg