![Page 1: Dr. Yingwu Zhu Summary Cache : A Scalable Wide- Area Web Cache Sharing Protocol](https://reader035.vdocuments.us/reader035/viewer/2022062409/56649f045503460f94c18510/html5/thumbnails/1.jpg)
Dr. Yingwu Zhu
Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol
![Page 2: Dr. Yingwu Zhu Summary Cache : A Scalable Wide- Area Web Cache Sharing Protocol](https://reader035.vdocuments.us/reader035/viewer/2022062409/56649f045503460f94c18510/html5/thumbnails/2.jpg)
Contents
Background and Problems
Cache Cooperation
ICP
Summary Cache
  Bloom Filters – the math
  Bloom Filters as Summaries
Evaluation
Conclusions
![Page 3: Dr. Yingwu Zhu Summary Cache : A Scalable Wide- Area Web Cache Sharing Protocol](https://reader035.vdocuments.us/reader035/viewer/2022062409/56649f045503460f94c18510/html5/thumbnails/3.jpg)
Web Caching Picture
[Figure labels: Users, Proxy Caches, Regional Network, Rest of Internet, Bottleneck]
![Page 4: Dr. Yingwu Zhu Summary Cache : A Scalable Wide- Area Web Cache Sharing Protocol](https://reader035.vdocuments.us/reader035/viewer/2022062409/56649f045503460f94c18510/html5/thumbnails/4.jpg)
What We Have and What Needs to Be Done
A single proxy cache serves one population of users
Many such proxy caches are out there!
Can they cooperate as a single “large cache” that serves the aggregated population, to increase the hit ratio and reduce latency and bandwidth cost?
  Proxy caches should cooperate and serve each other’s misses
If so, how do we know which caches hold a requested object that misses in the current proxy cache?
  This is the problem of content location among cooperating caches
![Page 5: Dr. Yingwu Zhu Summary Cache : A Scalable Wide- Area Web Cache Sharing Protocol](https://reader035.vdocuments.us/reader035/viewer/2022062409/56649f045503460f94c18510/html5/thumbnails/5.jpg)
ICP – What we have!
Internet Cache Protocol (ICP)
  First proposed in the context of the Harvest project
  Supports discovery and retrieval of documents from neighboring caches
![Page 6: Dr. Yingwu Zhu Summary Cache : A Scalable Wide- Area Web Cache Sharing Protocol](https://reader035.vdocuments.us/reader035/viewer/2022062409/56649f045503460f94c18510/html5/thumbnails/6.jpg)
Cache Cooperation via ICP
When a proxy has a cache miss, it sends a query to all siblings (and parents): “Do you have the URL?”
The proxy then fetches the file from whichever cache responds first with “Yes”
If no “Yes” response arrives within a certain time limit, the proxy sends the request to the Web server
Parent Cache (optional)
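The query flow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `icp_resolve`, the `neighbors` mapping, and the `"origin-server"` label are hypothetical names, the time limit is not modeled, and "responds first" is approximated by the first neighbor in iteration order that holds the URL.

```python
from typing import Dict, Set, Tuple


def icp_resolve(url: str, neighbors: Dict[str, Set[str]]) -> Tuple[str, int]:
    """Handle a local miss ICP-style: query every neighbor, then pick a source.

    Returns (source, queries_sent). `neighbors` maps a hypothetical proxy
    name to the set of URLs that proxy currently caches.
    """
    # Every sibling/parent receives one query, whether or not it has the URL.
    queries_sent = len(neighbors)
    # Neighbors that would answer "Yes".
    holders = [name for name, cached in neighbors.items() if url in cached]
    # Assumption: the first holder in iteration order stands in for the
    # first responder; with no "Yes" the request goes to the Web server.
    source = holders[0] if holders else "origin-server"
    return source, queries_sent
```

Note that the number of queries does not depend on whether any neighbor actually holds the document, which is the cost the later slides quantify.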
![Page 7: Dr. Yingwu Zhu Summary Cache : A Scalable Wide- Area Web Cache Sharing Protocol](https://reader035.vdocuments.us/reader035/viewer/2022062409/56649f045503460f94c18510/html5/thumbnails/7.jpg)
Cache Cooperation Types
No cache sharing
  Proxies do not collaborate
Simple cache sharing (ICP-style)
  A proxy locally caches the documents it fetches from other proxies
  Proxies do not coordinate cache replacement (each runs its own LRU)
Single-copy cache sharing (global LRU)
  A proxy does not cache documents fetched from other proxies
  Instead, the proxy that served the document marks it as most recently accessed
Global cache
  Proxies fully coordinate in servicing misses and in cache replacement
![Page 8: Dr. Yingwu Zhu Summary Cache : A Scalable Wide- Area Web Cache Sharing Protocol](https://reader035.vdocuments.us/reader035/viewer/2022062409/56649f045503460f94c18510/html5/thumbnails/8.jpg)
What are the benefits of cache cooperation?
Let the experiments showcase the benefits.
The experiments are driven by data traces.
![Page 9: Dr. Yingwu Zhu Summary Cache : A Scalable Wide- Area Web Cache Sharing Protocol](https://reader035.vdocuments.us/reader035/viewer/2022062409/56649f045503460f94c18510/html5/thumbnails/9.jpg)
Traces and Simulations (1/2)
Five sets of HTTP request traces are used:
  DEC: Digital Equipment Corporation Web Proxy Server traces
  UCB: traces of HTTP requests from the University of California at Berkeley Dial-IP service
  UPisa: traces of HTTP requests made by users in the Computer Science Department, University of Pisa, Italy
  Questnet: logs of HTTP GET requests seen by the parent proxies at Questnet, a regional network in Australia
  NLANR: one-day log of HTTP requests to the four major parent proxies, “bo”, “pb”, “sd”, and “uc”, in the National Web Cache hierarchy run by the National Laboratory for Applied Network Research
![Page 10: Dr. Yingwu Zhu Summary Cache : A Scalable Wide- Area Web Cache Sharing Protocol](https://reader035.vdocuments.us/reader035/viewer/2022062409/56649f045503460f94c18510/html5/thumbnails/10.jpg)
Traces and Simulations (2/2)
Statistics about the traces
  The maximum cache hit ratio and byte hit ratio are achieved with the infinite cache
  Infinite cache size: the total size in bytes of the unique documents in a trace
  The other hit ratios are calculated assuming a cache size that is 10% of the infinite cache size
All simulations use LRU as the cache replacement algorithm
Documents larger than 250KB are not cached
Each trace is split into groups, and each group is assumed to have its own proxy
Cache sharing among the proxies is then simulated
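A minimal sketch of such a trace-driven LRU simulation is shown below. It assumes a trace is a list of `(url, size_in_bytes)` pairs; the function names and this trace format are hypothetical, and cooperation between proxies is not modeled here.

```python
from collections import OrderedDict


def infinite_cache_size(trace):
    """Total size in bytes of the unique documents in a trace."""
    return sum({url: size for url, size in trace}.values())


def simulate_lru(trace, capacity_bytes, max_doc_bytes=250 * 1024):
    """Replay a trace against one LRU cache and return the hit ratio."""
    cache = OrderedDict()            # url -> size, least recently used first
    used = 0
    hits = 0
    for url, size in trace:
        if url in cache:
            hits += 1
            cache.move_to_end(url)   # mark as most recently used
            continue
        # Documents above the size limit (or larger than the cache) are not cached.
        if size > max_doc_bytes or size > capacity_bytes:
            continue
        # Evict least recently used documents until the new one fits.
        while used + size > capacity_bytes:
            _, evicted_size = cache.popitem(last=False)
            used -= evicted_size
        cache[url] = size
        used += size
    return hits / len(trace) if trace else 0.0
```

Under these assumptions, the 10% configuration from the slide would correspond to calling `simulate_lru(trace, infinite_cache_size(trace) // 10)`.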
![Page 11: Dr. Yingwu Zhu Summary Cache : A Scalable Wide- Area Web Cache Sharing Protocol](https://reader035.vdocuments.us/reader035/viewer/2022062409/56649f045503460f94c18510/html5/thumbnails/11.jpg)
Benefits of Cache Cooperation
![Page 12: Dr. Yingwu Zhu Summary Cache : A Scalable Wide- Area Web Cache Sharing Protocol](https://reader035.vdocuments.us/reader035/viewer/2022062409/56649f045503460f94c18510/html5/thumbnails/12.jpg)
What Have We Got So Far?
Cache cooperation is good, even in its simple form!
So we need collaboration among proxy caches!
Now we really need to address the issue:
  How can a proxy know that its misses can be served by other cooperating proxies?
  We know ICP can do that, so we turn our attention to ICP again!
![Page 13: Dr. Yingwu Zhu Summary Cache : A Scalable Wide- Area Web Cache Sharing Protocol](https://reader035.vdocuments.us/reader035/viewer/2022062409/56649f045503460f94c18510/html5/thumbnails/13.jpg)
ICP: Good but …
ICP discovers cache hits in other proxies by having the proxy multicast a query message
  Thus, as the number of proxies increases, both the communication and the processing overhead increase quadratically
T = N * (N-1) * (1-H) * R
  T: total number of ICP messages
  N: number of proxies
  H: hit rate
  R: average number of requests per proxy
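To see the quadratic growth, the formula can be evaluated directly. The function name and the sample values for H and R below are illustrative assumptions, not measurements from the paper.

```python
def icp_total_messages(n_proxies: int, hit_rate: float, requests: float) -> float:
    """Evaluate T = N * (N - 1) * (1 - H) * R from the slide."""
    return n_proxies * (n_proxies - 1) * (1 - hit_rate) * requests


# Hypothetical workload: 50% hit rate, 1000 requests on average.
for n in (4, 8, 16, 32):
    print(n, icp_total_messages(n, hit_rate=0.5, requests=1000))
```

Because of the N * (N-1) term, doubling the number of proxies roughly quadruples the total number of messages.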
![Page 14: Dr. Yingwu Zhu Summary Cache : A Scalable Wide- Area Web Cache Sharing Protocol](https://reader035.vdocuments.us/reader035/viewer/2022062409/56649f045503460f94c18510/html5/thumbnails/14.jpg)
Overhead of ICP
Though the Internet Cache Protocol (ICP) has been successful at encouraging Web cache sharing around the world, it is not a scalable protocol
  It relies on multicasting query messages to find remote cache hits
  Every time one proxy has a cache miss, every other proxy receives and processes a query message
  As the number of collaborating proxies increases, the overhead quickly becomes prohibitive
![Page 15: Dr. Yingwu Zhu Summary Cache : A Scalable Wide- Area Web Cache Sharing Protocol](https://reader035.vdocuments.us/reader035/viewer/2022062409/56649f045503460f94c18510/html5/thumbnails/15.jpg)
Overhead of ICP
Overhead of ICP in the four-proxy case
Environment
  10 Sun Sparc-20 workstations connected with 100Mbps Ethernet
  Four workstations act as four proxy systems running Squid 1.1.14, each with 75MB of cache space
  Four workstations run 120 client processes, 30 processes on each workstation
  The client processes on each workstation connect to one of the proxies
  Document sizes follow the Pareto distribution with α=1.1 and k=3.0
  Two workstations act as servers, each with 15 servers listening on different ports
![Page 16: Dr. Yingwu Zhu Summary Cache : A Scalable Wide- Area Web Cache Sharing Protocol](https://reader035.vdocuments.us/reader035/viewer/2022062409/56649f045503460f94c18510/html5/thumbnails/16.jpg)
ICP: Overhead!
Quantifying the overhead of the ICP protocol with 4 proxies, ICP increases:
  Interproxy traffic by a factor of 70 to 90
  CPU overhead by over 15%
  Average user latency by up to 11%
![Page 17: Dr. Yingwu Zhu Summary Cache : A Scalable Wide- Area Web Cache Sharing Protocol](https://reader035.vdocuments.us/reader035/viewer/2022062409/56649f045503460f94c18510/html5/thumbnails/17.jpg)
Overhead of ICP
The results highlight the dilemma faced by cache administrators
  There are clear benefits of cache cooperation
  But the overhead of ICP is high
To address the problem, the authors proposed a new scalable protocol: “Summary Cache”
![Page 18: Dr. Yingwu Zhu Summary Cache : A Scalable Wide- Area Web Cache Sharing Protocol](https://reader035.vdocuments.us/reader035/viewer/2022062409/56649f045503460f94c18510/html5/thumbnails/18.jpg)
Alternatives to ICP
Force all users to go through the same cache or the same array of caches
  Partition URLs among cooperating caches
  Difficult in a wide-area environment
Central directory server
  Directory server can be a bottleneck
Ideally, one wants a protocol that:
  keeps the total cache hit ratio high
  minimizes inter-proxy traffic
  scales to a large number of proxies
![Page 19: Dr. Yingwu Zhu Summary Cache : A Scalable Wide- Area Web Cache Sharing Protocol](https://reader035.vdocuments.us/reader035/viewer/2022062409/56649f045503460f94c18510/html5/thumbnails/19.jpg)
Summary Cache
Basic idea:
  Let each proxy keep a directory of the URLs cached in every other proxy, and use the directory as a filter to reduce the number of queries
Problem 1: keeping the directory up-to-date
  Solution: delay and batch the updates => the directory can be slightly out-of-date (trading freshness for fewer messages)
Problem 2: DRAM requirement
  Solution: compress the directory => an imprecise but inclusive directory
![Page 20: Dr. Yingwu Zhu Summary Cache : A Scalable Wide- Area Web Cache Sharing Protocol](https://reader035.vdocuments.us/reader035/viewer/2022062409/56649f045503460f94c18510/html5/thumbnails/20.jpg)
Summary Cache
Each proxy keeps a compact summary of the cache directory of every other proxy
  The cache directory is the list of URLs of cached documents
Query messages are sent only to the proxies whose summaries indicate a hit
  When a cache miss occurs, a proxy first probes all the summaries
The summaries do not need to be accurate at all times
  For a false hit, the penalty is a wasted query message
  For a false miss, the penalty is a higher miss ratio
Two key questions in the design of the protocol
  Frequency of summary updates
  Representation of the summary
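The lookup on a local miss, and the two penalties, can be illustrated with a small Python sketch. Plain sets stand in for the summaries here, and `summary_lookup`, `summaries`, `actual_caches`, and `"origin-server"` are hypothetical names chosen for the example.

```python
from typing import Dict, Set, Tuple


def summary_lookup(url: str,
                   summaries: Dict[str, Set[str]],
                   actual_caches: Dict[str, Set[str]]) -> Tuple[str, int]:
    """Resolve a local miss using (possibly stale) summaries of the peers.

    Returns (source, queries_sent). `summaries` is this proxy's view of each
    peer; `actual_caches` is what each peer really holds right now.
    """
    # Probe all summaries locally; this costs no network messages.
    candidates = [peer for peer, summary in summaries.items() if url in summary]
    queries_sent = 0
    for peer in candidates:
        queries_sent += 1                  # one query per promising peer
        if url in actual_caches[peer]:
            return peer, queries_sent      # remote hit
        # Otherwise this was a false hit: the query was wasted.
    # No summary matched, or every match was a false hit.
    return "origin-server", queries_sent
```

With `summaries = {"p2": {"a", "b"}}` and `actual_caches = {"p2": {"a", "d"}}`, a request for `"b"` is a false hit (one wasted query) and a request for `"d"` is a false miss (no query is sent, so the remote hit is lost).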
![Page 21: Dr. Yingwu Zhu Summary Cache : A Scalable Wide- Area Web Cache Sharing Protocol](https://reader035.vdocuments.us/reader035/viewer/2022062409/56649f045503460f94c18510/html5/thumbnails/21.jpg)
Summary Cache
Scalability of summaries
  The key to the scalability of the scheme is that summaries do not have to be up-to-date or accurate
  A summary does not have to be updated every time the cache directory changes
  Updates can occur at regular time intervals
  Summaries are small, so they do not consume much memory
![Page 22: Dr. Yingwu Zhu Summary Cache : A Scalable Wide- Area Web Cache Sharing Protocol](https://reader035.vdocuments.us/reader035/viewer/2022062409/56649f045503460f94c18510/html5/thumbnails/22.jpg)
Summary Cache
Two kinds of errors are tolerated
False misses
  The requested document is cached at some other proxy, but that proxy’s summary does not reflect the fact
  In this case
    A remote cache hit is lost
    The total hit ratio within the collection of caches is reduced
False hits
  The requested document is not cached at some other proxy, but that proxy’s summary indicates that it is
  The proxy will send a query message to the other proxy, only to be notified that the document is not cached there
  In this case
    A query message is wasted
![Page 23: Dr. Yingwu Zhu Summary Cache : A Scalable Wide- Area Web Cache Sharing Protocol](https://reader035.vdocuments.us/reader035/viewer/2022062409/56649f045503460f94c18510/html5/thumbnails/23.jpg)
Summary Cache
A third kind of error: remote stale hits
  Occur in both summary cache and ICP
  A document is cached at another proxy, but the cached copy is stale
Two factors limit the scalability of summary cache
  Network overhead
    Inter-proxy traffic
  Memory required to store the summaries
    For performance reasons, the summaries should be stored in DRAM, not on disk
![Page 24: Dr. Yingwu Zhu Summary Cache : A Scalable Wide- Area Web Cache Sharing Protocol](https://reader035.vdocuments.us/reader035/viewer/2022062409/56649f045503460f94c18510/html5/thumbnails/24.jpg)
Impact of Update Delays
ICP: no update delay
exact_dir: increasing update delay (updates are delayed until a certain percentage of the cached documents are “new”)
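The threshold-based delay can be sketched as follows. This is an assumption-laden illustration: `DelayedSummary` and its fields are hypothetical, the summary is an exact set of URLs, and evictions are ignored for brevity.

```python
class DelayedSummary:
    """Publish a directory update only when enough cached documents are new."""

    def __init__(self, threshold, initial=()):
        self.threshold = threshold       # e.g. 0.01 for a 1% threshold
        self.cache = set(initial)        # what this proxy really caches
        self.published = set(initial)    # what the other proxies currently see
        self.updates_sent = 0

    def insert(self, url):
        self.cache.add(url)
        # Documents cached locally but not yet reflected in the published summary.
        new_docs = self.cache - self.published
        if len(new_docs) / len(self.cache) >= self.threshold:
            self.published = set(self.cache)   # batch all pending changes
            self.updates_sent += 1
```

Between two updates, any document in `cache` but not in `published` is a potential false miss for the other proxies, which is the price paid for sending fewer update messages.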
![Page 25: Dr. Yingwu Zhu Summary Cache : A Scalable Wide- Area Web Cache Sharing Protocol](https://reader035.vdocuments.us/reader035/viewer/2022062409/56649f045503460f94c18510/html5/thumbnails/25.jpg)
Summary Representation
In practice, proxies typically have 8GB to 20GB of cache space
Assume 16 proxies, each with an 8GB cache, an average file size of 8KB, and a 16-byte MD5 signature per URL
  An exact-directory summary would consume (16-1) * 16 bytes * (8GB/8KB) = 240MB per proxy
Requirements for an ideal summary representation
  Small size, inclusive, and low false hit ratio
Solution: “Bloom Filters”
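The 240MB figure can be reproduced with a short calculation. The helper names below are hypothetical, and the Bloom filter estimate is only a rough comparison that assumes a chosen number of bits per cached document and ignores any per-bit counters the owning proxy keeps for its own filter.

```python
GB, MB, KB = 2**30, 2**20, 2**10


def exact_directory_bytes(n_proxies, cache_bytes, avg_doc_bytes, digest_bytes=16):
    """Memory one proxy needs to hold exact directories of all other proxies."""
    docs_per_proxy = cache_bytes // avg_doc_bytes
    return (n_proxies - 1) * digest_bytes * docs_per_proxy


def bloom_summary_bytes(n_proxies, cache_bytes, avg_doc_bytes, bits_per_doc):
    """Same estimate when each peer is summarized by a plain bit array."""
    docs_per_proxy = cache_bytes // avg_doc_bytes
    return (n_proxies - 1) * docs_per_proxy * bits_per_doc // 8


print(exact_directory_bytes(16, 8 * GB, 8 * KB) / MB)                 # 240.0
print(bloom_summary_bytes(16, 8 * GB, 8 * KB, bits_per_doc=8) / MB)   # 15.0
```

Under these assumptions, replacing each 16-byte signature with 8 bits per document shrinks the per-proxy memory from 240MB to about 15MB.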
![Page 26: Dr. Yingwu Zhu Summary Cache : A Scalable Wide- Area Web Cache Sharing Protocol](https://reader035.vdocuments.us/reader035/viewer/2022062409/56649f045503460f94c18510/html5/thumbnails/26.jpg)
Alternatives vs. Bloom Filters
First try: keep only the server part of each URL
  Problem: too many false hits, leading to too many messages between proxies
Exact directory
  Problem: memory consumption and message size make it impractical!
Bloom Filters: the best of both!
![Page 27: Dr. Yingwu Zhu Summary Cache : A Scalable Wide- Area Web Cache Sharing Protocol](https://reader035.vdocuments.us/reader035/viewer/2022062409/56649f045503460f94c18510/html5/thumbnails/27.jpg)
Bloom Filters – the math
To test whether b is in the set A, check the bits at positions h1(b), …, hk(b) of the bit vector
If any of them is 0, then certainly b is not in the set A
Otherwise we conjecture that b is in the set, although there is a certain probability that we are wrong
  This is a false positive (=> false hit)
  There is a clear tradeoff between m/n and the probability of a false positive
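The membership test above can be sketched as a tiny Bloom filter in Python. The class and method names are illustrative, and the hash family used here (SHA-256 over an index-prefixed key, reduced modulo m) is an assumption made only to keep the example self-contained.

```python
import hashlib


class BloomFilter:
    """Bit vector of m bits with k hash functions (illustrative sketch)."""

    def __init__(self, m: int, k: int):
        self.m = m
        self.k = k
        self.bits = [0] * m

    def _positions(self, key: str):
        # Assumed hash family: the i-th hash is SHA-256 of "i:key", mod m.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # Any 0 bit proves absence; all 1s means "probably present".
        return all(self.bits[pos] for pos in self._positions(key))
```

A key that was added always tests positive, so the filter is inclusive; a key that was never added can still test positive when other keys happen to set all of its bits.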
![Page 28: Dr. Yingwu Zhu Summary Cache : A Scalable Wide- Area Web Cache Sharing Protocol](https://reader035.vdocuments.us/reader035/viewer/2022062409/56649f045503460f94c18510/html5/thumbnails/28.jpg)
Bloom Filters: the Math
Given n keys, how to choose m and k?
[Figure: bit vector v of m bits, with some bits set to 1]
• Suppose m is fixed (>2n), choose k: k is optimal when exactly half of the bits are 0 => optimal $k = \ln 2 \cdot m/n$
• False positive ratio under optimal k is $(1/2)^k$
=> false positive ratio $= (1/2)^{\ln 2 \cdot m/n} \approx (0.6185)^{m/n}$
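These relations are easy to check numerically. The sketch below uses the standard approximation $(1 - e^{-kn/m})^k$ for the false positive ratio, which reduces to $(1/2)^k$ at the optimal k; the function names are hypothetical.

```python
import math


def optimal_k(m: int, n: int) -> float:
    """Number of hash functions that minimizes the false positive ratio."""
    return math.log(2) * m / n


def false_positive_ratio(m: int, n: int, k: float) -> float:
    """Standard approximation (1 - e^(-k*n/m))^k for n keys in m bits."""
    return (1.0 - math.exp(-k * n / m)) ** k


# Load factors m/n shown in the evaluation slides.
for load in (8, 16, 32):
    k = optimal_k(load, 1)
    print(load, round(k, 2), false_positive_ratio(load, 1, k))
```

In practice k must be an integer, so a value near the optimum is chosen; for example, k = 4 at m/n = 8 gives a ratio of about 2.4% under this approximation.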
![Page 29: Dr. Yingwu Zhu Summary Cache : A Scalable Wide- Area Web Cache Sharing Protocol](https://reader035.vdocuments.us/reader035/viewer/2022062409/56649f045503460f94c18510/html5/thumbnails/29.jpg)
Bloom Filters: the Practice
Choosing hash functions
  Bits from the MD5 signatures of URLs
Maintaining the summary
  The proxy maintains an array of counters; for each bit, the counter records how many times the bit has been set to 1
Updating the summary
  Send either the whole bit array or the positions of the changed bits (delta encoding)
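The three practical points can be combined in one hedged sketch: hash positions taken from the MD5 signature, a counter per bit so that evicted documents can be removed, and a delta listing the bit positions that changed. The class name, the choice of four 32-bit words of the digest, and the unbounded Python integers used as counters are assumptions of this example.

```python
import hashlib


class CountingSummary:
    """Counting Bloom filter for one proxy's cache directory (illustrative)."""

    def __init__(self, m_bits: int):
        self.m = m_bits
        self.counters = [0] * m_bits   # one counter per bit position
        self.changed = set()           # bits flipped since the last update

    def _positions(self, url: str):
        digest = hashlib.md5(url.encode("utf-8")).digest()   # 128-bit signature
        # Assumption: four 32-bit words of the digest, each reduced modulo m.
        return [int.from_bytes(digest[i:i + 4], "big") % self.m
                for i in range(0, 16, 4)]

    def add(self, url: str) -> None:
        for pos in self._positions(url):
            self.counters[pos] += 1
            if self.counters[pos] == 1:    # bit went 0 -> 1
                self.changed ^= {pos}

    def remove(self, url: str) -> None:
        # Assumes the URL was added earlier (e.g. the document is being evicted).
        for pos in self._positions(url):
            self.counters[pos] -= 1
            if self.counters[pos] == 0:    # bit went 1 -> 0
                self.changed ^= {pos}

    def bits(self):
        return [1 if c > 0 else 0 for c in self.counters]

    def take_delta(self):
        """Return the positions whose bit flipped since the last call."""
        delta, self.changed = sorted(self.changed), set()
        return delta
```

A peer that holds a copy of the bit array can apply a delta by flipping each listed position; only the bits, not the counters, need to leave the owning proxy.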
![Page 30: Dr. Yingwu Zhu Summary Cache : A Scalable Wide- Area Web Cache Sharing Protocol](https://reader035.vdocuments.us/reader035/viewer/2022062409/56649f045503460f94c18510/html5/thumbnails/30.jpg)
Bloom Filters as Summaries
Total hit ratio under different summary representations
m/n = 8, 16, 32
![Page 31: Dr. Yingwu Zhu Summary Cache : A Scalable Wide- Area Web Cache Sharing Protocol](https://reader035.vdocuments.us/reader035/viewer/2022062409/56649f045503460f94c18510/html5/thumbnails/31.jpg)
Bloom Filters as Summaries
Ratio of false hits under different summary representations
![Page 32: Dr. Yingwu Zhu Summary Cache : A Scalable Wide- Area Web Cache Sharing Protocol](https://reader035.vdocuments.us/reader035/viewer/2022062409/56649f045503460f94c18510/html5/thumbnails/32.jpg)
Bloom Filters as Summaries
Storage requirement of the summary representations
  Expressed as a percentage of the proxy cache size (relative to the infinite cache size)
![Page 33: Dr. Yingwu Zhu Summary Cache : A Scalable Wide- Area Web Cache Sharing Protocol](https://reader035.vdocuments.us/reader035/viewer/2022062409/56649f045503460f94c18510/html5/thumbnails/33.jpg)
Bloom Filters as Summaries
Number of network messages per user request under different summary forms
![Page 34: Dr. Yingwu Zhu Summary Cache : A Scalable Wide- Area Web Cache Sharing Protocol](https://reader035.vdocuments.us/reader035/viewer/2022062409/56649f045503460f94c18510/html5/thumbnails/34.jpg)
Bloom Filters as Summaries
Bytes of network messages per user request under different summary forms
![Page 35: Dr. Yingwu Zhu Summary Cache : A Scalable Wide- Area Web Cache Sharing Protocol](https://reader035.vdocuments.us/reader035/viewer/2022062409/56649f045503460f94c18510/html5/thumbnails/35.jpg)
Enhancing ICP with Summary Cache
Prototype implemented in Squid 1.1.14
Repeating the 4-proxy experiments, the new ICP:
  Reduces UDP messages by a factor of 12 to 50 compared with the old ICP
  Adds little increase in network packets over no cache sharing
  Increases CPU time by 2 - 7%
  Reduces user latency by up to 4% with remote cache hits
![Page 36: Dr. Yingwu Zhu Summary Cache : A Scalable Wide- Area Web Cache Sharing Protocol](https://reader035.vdocuments.us/reader035/viewer/2022062409/56649f045503460f94c18510/html5/thumbnails/36.jpg)
Conclusion
Summary Cache enhanced ICP
Evaluated using trace-driven simulations and measurements
Reduces the number of interproxy protocol messages by a factor of 25 to 60
Reduces the bandwidth consumption by over 50%
Causes almost no degradation in the cache hit ratios
Reduces CPU overhead by 30% to 95%
![Page 37: Dr. Yingwu Zhu Summary Cache : A Scalable Wide- Area Web Cache Sharing Protocol](https://reader035.vdocuments.us/reader035/viewer/2022062409/56649f045503460f94c18510/html5/thumbnails/37.jpg)
Questions