World Wide Web Caching: Trends and Technologies
Greg Barish & Katia Obraczka
USC Information Sciences Institute, USA, 2000
Plan
Introduction
The expected gains
Desirable properties of a Web caching system
Caching architectures
Cache deployment options
Design techniques
Summary
Future work
Introduction
What is Web caching?
– Introducing proxy servers at certain points in the network to cache Web documents for faster client access.
– Comparable to cache memory in a computer system.
Why is it needed?
– Rapid growth in HTTP traffic, which forms the largest part of Internet traffic and causes more network congestion and server unavailability.
– The number of static Web pages almost doubles every year.
The expected gains:
Bandwidth savings
Improved content availability
Improved Web server availability
Reduced network latency
Server load balancing
Improved user perception of network performance
Desirable properties:
Fast access
Transparency
Scalability
Efficiency
Adaptivity
Stability
Load balancing
Simplicity
Caching Architectures
Proxy Caching
– Deployed at the edges of the network
– An unavailable cache makes the network appear unavailable
– Single point of failure
– Users must manually reconfigure their browsers in times of failure
– Browser auto-reconfiguration is a recent trend
[Figure (a): standalone proxy caching (clients, cache, router, Web)]
Caching Architectures
Reverse Proxy Caching
– Places proxies near the content provider
Transparent Caching
– Eliminates the need to manually configure Web browsers
– Router-based transparent proxy caching
– Switch-based transparent proxy caching
[Figure: (b) router-based transparent proxy caching (clients, routers, caches, Web); (c) switch-based transparent proxy caching (clients, L4 switch, caches, Web)]
Caching Architectures
Adaptive Web Caching
– Uses distributed cache meshes to solve the hot spot problem
– Caches dynamically join and leave the groups based on content demand
– Adaptive and self-organizing
– Cache Group Management Protocol (CGMP)
– Content Routing Protocol (CRP)
– Administrative boundaries must be relaxed
Caching Architectures
[Figure: overlapping multicast groups of Web caches]
[Figure: self-organization of Web caches]
Caching Architectures
Push Caching
– Keeps data close to the clients requesting it
– Assumption: we are able to launch caches that may cross administrative boundaries
– Incurs cost (storage and transmission)
Active Caching
– Applies caching to dynamic documents
– 30% of client HTTP requests contain cookies
– Cache applets
– The server provides the cache with the objects and any associated cache applets
Cache Deployment Options
Near the content consumer (consumer-oriented)
– Better response time
– Local service of requests
Near the content provider (provider-oriented)
– Improves access to logical sets of data
– Improves the scalability and availability of content
– The problem is critical for delay-sensitive content (audio, video)
At strategic points in the network
– Based on user access patterns and on network topology and conditions
– Problem with administrative control
Design Techniques
Main concerns:
– Speed
– Reliability
– Scalability
Design techniques:
– Hierarchical caching
– Intercache communication
– Hash-based request routing
– Optimized disk I/O
– Microkernel operating system
– Content prefetching
– Cache consistency methods
Hierarchical Caching
Caches are arranged in a tree-like structure.
A child cache can query parent caches and other siblings.
A parent cache can never query its children.
This keeps information gradually filtering down to the leaves.
To avoid swamping parents with information, clustering may be applied to hierarchies.
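The lookup rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Cache` class, its fields, and the dictionary standing in for the origin server are hypothetical, and a real deployment would exchange queries over a protocol such as ICP rather than read a sibling's store directly.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cache:
    """Hypothetical cache node in a tree-like hierarchy."""
    name: str
    parent: Cache | None = None
    siblings: list[Cache] = field(default_factory=list)
    store: dict[str, bytes] = field(default_factory=dict)

    def lookup(self, url: str, origin: dict[str, bytes]) -> tuple[bytes, str]:
        """Return (body, name of the node that supplied it)."""
        # 1. Local hit.
        if url in self.store:
            return self.store[url], self.name
        # 2. Ask siblings; a sibling answers only from its own store.
        for sibling in self.siblings:
            if url in sibling.store:
                body = sibling.store[url]
                self.store[url] = body
                return body, sibling.name
        # 3. Go up to the parent, or to the origin server at the root.
        #    Requests only travel upward: a parent never queries a child.
        if self.parent is not None:
            body, source = self.parent.lookup(url, origin)
        else:
            body, source = origin[url], "origin"
        # Keep a copy on the way back down, so content filters toward the leaves.
        self.store[url] = body
        return body, source
```

Because each level keeps a copy on the way back, a second request from a neighbouring institutional cache can be answered by a sibling without touching the upper levels.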
Hierarchical Caching
Caches are placed at multiple levels of the network.
Bottom level: client/browser caches.
[Figure: cache levels from top to bottom (national, regional, institutional, bottom), with misses labeled "web page not found"]
Hierarchical Caching
Advantages:
– Bandwidth efficient, especially when cache servers are slow.
– Allows popular Web pages to be diffused efficiently toward the demand.
Disadvantages:
– Cache servers need to be placed at key access points of the network, which requires coordination among caches.
– Each level adds a delay.
– High levels are bottlenecks.
– Multiple copies are kept at different cache levels.
Distributed Caching
Multiple distributed caches arranged in meshes.
Caches at the bottom level only; no other intermediate caching levels.
Improves scalability, availability, and physical locality.
Each cache server contains metadata about the data stored on other servers.
A hierarchy is used only for distributing information about the location of copies.
No copying of actual documents.
Intercache Communication
Composed of multiple distributed caches.
Protocols:
– ICP (Internet Cache Protocol) [Squid]: caches issue queries to other caches to determine the best location from which to retrieve an object. The main problem is message overhead.
– CRP (Content Routing Protocol): ICP with a multicast feature to query cache meshes.
– Cache digests [Squid]: summarize the objects held by a cache.
– WCCP (Web Cache Communication Protocol) [Cisco]: enables transparent redirection of HTTP traffic to the Cisco Cache Engine.
– CARP (Cache Array Routing Protocol) [Microsoft]: uses a hashing function to determine which proxy holds the requested information.
Point the local cache in the direction of other caches that have the object or can get it.
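One way to picture a cache digest is as a Bloom filter over the URLs a cache holds, which is the general idea behind Squid's digests. The sketch below is illustrative only: the `CacheDigest` class, its bit-array size, and its MD5-derived hash positions are assumptions, not Squid's actual digest format.

```python
import hashlib


class CacheDigest:
    """Hypothetical Bloom-filter summary of the URLs held by one cache."""

    def __init__(self, bits: int = 1024, hashes: int = 4) -> None:
        self.bits = bits
        self.hashes = hashes
        self.array = bytearray((bits + 7) // 8)

    def _positions(self, url: str):
        # Derive several bit positions per URL by salting an MD5 hash.
        for i in range(self.hashes):
            digest = hashlib.md5(f"{i}:{url}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, url: str) -> None:
        for pos in self._positions(url):
            self.array[pos // 8] |= 1 << (pos % 8)

    def may_contain(self, url: str) -> bool:
        # False means "definitely not cached"; True means "probably cached".
        return all(
            self.array[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(url)
        )
```

A cache that holds a neighbour's digest can check `may_contain(url)` locally instead of sending a query per request, at the cost of occasional false positives.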
Hash-Based Request Routing
– Uses a hash function to map a key (such as the URL) to a cache within a cluster
– Reduces (or eliminates) the need for caches to query each other
– Examples: NetCache (MD5-indexed URL hash function), CARP
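A minimal sketch of hash-based request routing follows. It scores every (cache, URL) pair with MD5 and sends the request to the highest-scoring cache, which resembles the highest-score idea used by CARP but is not the hash function that CARP or NetCache actually specify. The cache names and the `route` helper are hypothetical.

```python
import hashlib


def _score(url: str, cache: str) -> int:
    """Combine the cache name and the URL into one deterministic score."""
    digest = hashlib.md5(f"{cache}|{url}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def route(url: str, caches: list[str]) -> str:
    """Pick the cache responsible for this URL; no intercache query is needed."""
    if not caches:
        raise ValueError("at least one cache is required")
    return max(caches, key=lambda cache: _score(url, cache))


# Hypothetical cluster members, for illustration only.
CACHES = ["cache-a", "cache-b", "cache-c"]
owner = route("http://example.org/index.html", CACHES)
```

Every client that knows the same member list computes the same owner for a URL, and removing a cache only remaps the URLs that cache owned.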
Optimized disk I/O:
Treat the object cache as a high-performance database.
Determine whether an object has been cached by consulting an in-memory data structure.
Disk operations then locate where on disk the content is placed.
Costly I/O operations can be avoided.
Microkernel operating system:
How resources are managed.
Improves resource allocation.
Optimizes cache performance.
Content Prefetching
– Local based
– Server-hint based: uses data accumulated by the server, such as historical information
Implementation:
– Between clients and servers
– Between clients and proxies
– Between proxies and servers
Improvements:
– Lower latency (improvements from 26% to 57%)
– Improved access time
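Local-based prefetching can be illustrated with a simple predictor that learns which URL tends to follow which in a client's own request history. This is a sketch under assumptions: the `LocalPrefetcher` class and its first-order "what came next" model are hypothetical and are not taken from the systems surveyed in the talk.

```python
from collections import Counter, defaultdict


class LocalPrefetcher:
    """Hypothetical predictor built from one client's own access history."""

    def __init__(self) -> None:
        # followers[a][b] counts how often b was requested right after a.
        self.followers: dict[str, Counter] = defaultdict(Counter)
        self.last: str | None = None

    def record(self, url: str) -> None:
        """Note a request and update the follow-up counts."""
        if self.last is not None:
            self.followers[self.last][url] += 1
        self.last = url

    def predict(self, url: str, k: int = 1) -> list[str]:
        """Return up to k URLs most often requested right after `url`."""
        counts = self.followers.get(url, Counter())
        return [candidate for candidate, _ in counts.most_common(k)]
```

After each request, a client or proxy could fetch `predict(url)` in the background; a server-hint variant would fill the same table from statistics gathered at the server instead.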
Cache Coherency (Consistency)
Ensures that a cached object does not reflect stale or defunct data.
Consistency techniques:
– Client polling: the cache compares the cached object with the original object.
– Invalidation callbacks: the server contacts the proxies when objects change.
– TTL and adaptive TTL
– If-Modified-Since: caches revalidate objects only when they are requested and their expiration date has been reached.
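The TTL-based techniques above can be sketched without any networking. The example assumes a common adaptive-TTL heuristic, in which an object's lifetime is a fraction of its age when fetched; the 0.1 factor, the TTL bounds, and the names `Entry`, `adaptive_ttl`, and `get` are illustrative assumptions. The comparison of modification times stands in for an HTTP If-Modified-Since request.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Entry:
    body: bytes
    last_modified: float  # server-side modification time, in seconds
    fetched_at: float     # when the cache stored or last validated the copy


def adaptive_ttl(entry: Entry, factor: float = 0.1,
                 min_ttl: float = 60.0, max_ttl: float = 86400.0) -> float:
    """Objects that had not changed for a long time get a longer TTL."""
    age_at_fetch = max(0.0, entry.fetched_at - entry.last_modified)
    return min(max_ttl, max(min_ttl, factor * age_at_fetch))


def get(entry: Entry, now: float, origin_last_modified: float,
        fetch: Callable[[], bytes]) -> tuple[Entry, str]:
    """Serve from cache while fresh; otherwise revalidate on request."""
    if now - entry.fetched_at < adaptive_ttl(entry):
        return entry, "hit"
    if origin_last_modified <= entry.last_modified:
        # Equivalent to a "304 Not Modified" answer: keep the body, reset the clock.
        return Entry(entry.body, entry.last_modified, now), "revalidated"
    # The object changed at the origin: download the new version.
    return Entry(fetch(), origin_last_modified, now), "refetched"
```

The origin is contacted only when an expired object is actually requested, which keeps validation traffic proportional to demand rather than to cache size.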
Summary:
Cache designs differ, but some issues are common among them.
Advantages:
1. Improves content availability.
2. Reduces network latencies.
3. Addresses increasing bandwidth demands.
4. Can hide network problems.
5. Reduces server burden.
Disadvantages:
1. Stale pages.
2. Information retained in caches.
Open Future Work (Trends):
Content security
1. NetCache
2. CacheFlow
– The appliance is deployed in parallel to the firewall.
– The appliance can be used to control who accesses a Web site.
– Virus scanning for all incoming content.
– Content filtering added to its caches.
Handling more complex objects and real-time data
– RTEE (Real Time Event Engine): captures, caches, and queries data at speeds greater than 12,000 events/s.
Web caching based on ontology?
– User access pattern prediction
– Prefetching
– Cache placement/replacement