
World Wide Web Caching: Trends and Technologies

Greg Barish & Katia Obraczka

USC Information Sciences Institute, USA, 2000

Plan

– Introduction
– The expected gains
– Desirable properties of a Web caching system
– Caching architectures
– Cache deployment options
– Design techniques
– Summary
– Future work

Introduction

What is Web caching?
– Introducing proxy servers at certain points in the network that cache Web documents for faster client access.
– Comparable to the cache memory in a computer system.

Why is it needed?
– HTTP traffic has grown rapidly to form the largest part of Internet traffic, which causes more network congestion and server unavailability.
– The number of static Web pages almost doubles every year.

The expected gains:

– Bandwidth savings
– Improving content availability
– Improving Web server availability
– Reducing network latency
– Server load balancing
– Improving the user's perception of the network's performance

Desirable properties:

– Fast access
– Transparency
– Scalability
– Efficiency
– Adaptivity
– Stability
– Load balancing
– Simplicity

Caching Architectures

Proxy Caching
– Deployed at the edges of the network
– An unavailable cache means an unavailable network
– Single point of failure
– Users must manually reconfigure their browsers in times of failure
– Browser auto-reconfiguration is a recent trend

[Figure (a): standalone proxy caching, showing clients, a cache, a router, and the Web]

Caching Architectures

Reverse Proxy Caching
– Placing proxies near the content provider

Transparent Caching
– Eliminates the need to manually configure Web browsers
– Router-based transparent proxy caching
– Switch-based transparent proxy caching

[Figure (b): router-based transparent proxy caching, showing clients, routers, caches, and the Web]

[Figure (c): switch-based transparent proxy caching, showing clients, an L4 switch, caches, and the Web]


Adaptive Web Caching
– Uses distributed cache meshes to solve the hot spot problem
– Caches dynamically join and leave the groups based on content demand
– Adaptivity and self-organization
– Cache Group Management Protocol (CGMP)
– Content Routing Protocol (CRP)
– Administrative boundaries must be relaxed


Overlapping multicast groups of web caches

Self-organization of web caches

Caching Architectures

Push Caching
– Keeps data close to the clients requesting it
– Assumption: we are able to launch caches that may cross administrative boundaries
– Incurs cost (storage and transmission)

Active Caching
– Applies caching to dynamic documents
– 30% of client HTTP requests contain cookies
– Cache applets
– The server provides the cache with the objects and any associated cache applets

Cache Deployment Options

Near the content consumer (consumer-oriented)
– Better response time
– Local service of requests

Near the content provider (provider-oriented)
– Improves access to logical sets of data
– Improves the scalability and availability of content
– A critical problem for delay-sensitive content (audio, video)

At strategic points in the network
– Based on user access patterns, network topology, and network conditions
– Problem with administrative control

Design Techniques

Main concerns:
– Speed
– Reliability
– Scalability

Design techniques:
– Hierarchical caching
– Intercache communication
– Hash-based request routing
– Optimized disk I/O
– Microkernel operating system
– Content prefetching
– Cache consistency methods

Hierarchical Caching

– Caches are arranged in a tree-like structure.
– A child cache can query parent caches and other siblings.
– A parent cache can never query its children.
– This keeps information gradually filtering down to the leaves.
– To avoid swamping parents with information, clustering may be applied to hierarchies.
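The query rules above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the `Cache` class, its `lookup` method, the cache names, and the dictionary standing in for the origin server are all hypothetical, and siblings are assumed to answer only from their own store.

```python
class Cache:
    """One node in a cache hierarchy (all names here are illustrative)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # a child may query its parent
        self.siblings = []        # a child may query its siblings
        self.store = {}           # url -> body held locally

    def lookup(self, url, origin):
        """Return (body, source), where source names where the object was found."""
        # 1. Local hit.
        if url in self.store:
            return self.store[url], self.name
        # 2. Ask siblings, but only for copies they already hold.
        for sibling in self.siblings:
            if url in sibling.store:
                body, source = sibling.store[url], sibling.name
                break
        else:
            # 3. Go up to the parent, or to the origin server at the root.
            #    Note that a parent never calls lookup() on a child.
            if self.parent is not None:
                body, source = self.parent.lookup(url, origin)
            else:
                body, source = origin[url], "origin"
        # Keep a copy on the way down so objects filter toward the leaves.
        self.store[url] = body
        return body, source


origin = {"http://example.org/a": "page A"}   # stand-in for the Web server
regional = Cache("regional")
inst1 = Cache("institutional-1", parent=regional)
inst2 = Cache("institutional-2", parent=regional)
inst1.siblings.append(inst2)
inst2.siblings.append(inst1)

first = inst1.lookup("http://example.org/a", origin)    # misses everywhere
second = inst2.lookup("http://example.org/a", origin)   # answered by a sibling
```

In this sketch the first request travels up to the origin and leaves copies at every level it passes through, so the second request is answered by a sibling without touching the parent.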


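Caches are placed at multiple levels of the network: national, regional, institutional, and, at the bottom, client/browser caches.

[Figure: cache hierarchy with national, regional, institutional, and bottom levels; arrows between levels are labeled "web page not found"]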


Advantages:
– Bandwidth efficient, especially when cache servers are slow.
– Efficiently diffuses popular Web pages toward the demand.

Disadvantages:
– Cache servers need to be placed at key access points of the network, which requires coordination among caches.
– Each level adds a delay.
– High levels are bottlenecks.
– Multiple copies of a document are kept at different cache levels.

Distributed Caching

– Multiple distributed caches arranged in meshes
– Caches at the bottom level only; no other intermediate caching levels
– Improves scalability, availability, and physical locality
– Each cache server contains metadata on the data stored on other servers
– A hierarchy is used only for distributing information about the location of a copy; the actual documents are not copied
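A minimal Python sketch of the metadata idea, assuming each cache keeps a directory that maps a URL to the name of the peer holding a copy. The `Peer` class and the `publish` and `fetch` helpers are hypothetical names chosen for illustration; only location information is distributed, never the documents themselves.

```python
class Peer:
    """A bottom-level cache that also knows where other peers keep copies."""

    def __init__(self, name):
        self.name = name
        self.store = {}       # url -> body held locally
        self.directory = {}   # url -> name of the peer holding a copy


def publish(peers, holder, url, body):
    """Store the document at one peer and spread only its location."""
    holder.store[url] = body
    for peer in peers:
        peer.directory[url] = holder.name


def fetch(peers_by_name, local, url):
    """Return (body, source) using local data first, then the directory."""
    if url in local.store:
        return local.store[url], local.name
    holder_name = local.directory.get(url)
    if holder_name is not None:
        return peers_by_name[holder_name].store[url], holder_name
    return None, None   # unknown everywhere: the origin server would be contacted


a, b, c = Peer("a"), Peer("b"), Peer("c")
peers = [a, b, c]
peers_by_name = {p.name: p for p in peers}
publish(peers, b, "http://example.org/x", "page X")

remote_hit = fetch(peers_by_name, a, "http://example.org/x")
unknown = fetch(peers_by_name, a, "http://example.org/y")
```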

Intercache Communication

Composed of multiple distributed caches.

Protocols:
– ICP (Internet Cache Protocol) [Squid]: caches issue queries to other caches to determine the best location from which to retrieve an object. The main problem is the message overhead.
– CRP (Content Routing Protocol): ICP with a multicast feature to query cache meshes.
– Cache digests [Squid]: summarize the objects held by a cache.
– WCCP (Web Cache Communication Protocol) [Cisco]: enables transparent redirection of HTTP traffic to the Cisco Cache Engine.
– CARP (Cache Array Routing Protocol) [Microsoft]: uses hashing schemes to determine which proxy holds the requested information.

Hashing function: points the local cache in the direction of other caches that have the object or can get it.
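A cache digest from the protocol list above is commonly built as a Bloom filter: a compact bit array that a cache can send to its peers so they can check membership without per-request queries. The sketch below is a generic Bloom filter, not Squid's actual digest format; the `CacheDigest` class, its sizes, and the way bit positions are derived from MD5 are assumptions made for illustration.

```python
import hashlib


class CacheDigest:
    """Bloom-filter summary of the URLs a cache holds (illustrative sketch)."""

    def __init__(self, num_bits=1024, num_hashes=4):
        if not 1 <= num_hashes <= 4:
            raise ValueError("this sketch derives at most 4 positions from one MD5")
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, url):
        # Split the 16-byte MD5 digest into 4-byte chunks, one per hash.
        digest = hashlib.md5(url.encode("utf-8")).digest()
        for i in range(self.num_hashes):
            chunk = digest[4 * i: 4 * i + 4]
            yield int.from_bytes(chunk, "big") % self.num_bits

    def add(self, url):
        for pos in self._positions(url):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, url):
        # False means "definitely not cached"; True may be a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(url))


digest = CacheDigest()
digest.add("http://example.org/a")
```

A peer that receives this digest would ask the owning cache only for URLs where `might_contain` returns true, which trades a small false-positive rate for far fewer query messages than ICP.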

Hash-Based Request Routing
– Uses a hash function to map a key (such as the URL) to a cache within a cluster
– Reduces (or eliminates) the need for caches to query each other
– Examples: NetCache (MD5-indexed URL hash function), CARP
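The mapping from URL to cache can be sketched as highest-score hashing, which is the general idea behind CARP: every member of the cluster is scored against the URL and the highest score wins. This is a sketch only. The `pick_cache` function, the cache names, and the use of MD5 over a joined string are assumptions and do not reproduce CARP's or NetCache's actual hash functions.

```python
import hashlib


def pick_cache(url, caches):
    """Map a URL to one cache by scoring every (cache, url) pair."""
    def score(cache):
        digest = hashlib.md5((cache + "|" + url).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")
    # Every client computes the same scores, so no intercache query is needed.
    return max(caches, key=score)


cluster = ["cache-1", "cache-2", "cache-3", "cache-4"]
url = "http://example.org/index.html"
chosen = pick_cache(url, cluster)

# Removing a cache that was not chosen leaves the mapping for this URL unchanged.
survivors = [c for c in cluster if c != chosen]
without_one = [c for c in cluster if c != survivors[0]]
still_chosen = pick_cache(url, without_one)
```

One reason to score each pair instead of taking the hash modulo the cluster size is that removing a cache only remaps the URLs that cache was responsible for.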

Optimized I/O:
– Treat the object cache as a high-performance database.
– Determine whether an object has been cached by consulting an in-memory data structure.
– Disk operations locate where on the disk the content is placed.
– Costly I/O operations can be avoided.
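A sketch of the in-memory lookup idea, assuming all objects are appended to a single file and a dictionary maps each URL to its offset and length. The `DiskCache` class and the file name are hypothetical; real caches use more elaborate on-disk layouts.

```python
import os
import tempfile


class DiskCache:
    """Objects live in one append-only file; the index lives in memory."""

    def __init__(self, path):
        self.path = path
        self.index = {}                  # url -> (offset, length)
        open(path, "ab").close()         # create the file if it is missing

    def put(self, url, body):
        with open(self.path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)   # position where the body will start
            f.write(body)
        self.index[url] = (offset, len(body))

    def get(self, url):
        entry = self.index.get(url)
        if entry is None:
            return None                  # miss decided in memory, no disk access
        offset, length = entry
        with open(self.path, "rb") as f:
            f.seek(offset)               # one seek straight to the object
            return f.read(length)


with tempfile.TemporaryDirectory() as tmp:
    cache = DiskCache(os.path.join(tmp, "objects.dat"))
    cache.put("http://example.org/a", b"page A")
    cache.put("http://example.org/b", b"page B")
    hit = cache.get("http://example.org/b")
    miss = cache.get("http://example.org/missing")
```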

Microkernel Operating System:
– Determines how resources are managed
– Improves resource allocation
– Optimizes cache performance

Content Prefetching

– Local based
– Server-hint based: uses data accumulated by the server, such as historical information

Implementation:
– Between clients and servers
– Between clients and proxies
– Between proxies and servers

Improvements:
– Lower latency (improvements from 26% to 57%)
– Improved access time
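Local-based prefetching can be sketched as a predictor that learns, from the requests it has seen, which URL most often follows another. This is one possible reading of "local based", stated as an assumption; the `LocalPrefetcher` class and the sample paths are hypothetical.

```python
from collections import Counter, defaultdict


class LocalPrefetcher:
    """Predicts the next request from locally observed request pairs."""

    def __init__(self):
        self.followers = defaultdict(Counter)   # url -> Counter of next urls
        self.last_url = None

    def record(self, url):
        """Update the statistics with one observed request."""
        if self.last_url is not None:
            self.followers[self.last_url][url] += 1
        self.last_url = url

    def predict(self, url):
        """Return the URL most often requested right after url, or None."""
        counts = self.followers.get(url)
        if not counts:
            return None
        return counts.most_common(1)[0][0]


prefetcher = LocalPrefetcher()
for requested in ["/index", "/news", "/index", "/news", "/index", "/sports"]:
    prefetcher.record(requested)

candidate = prefetcher.predict("/index")     # "/news" followed "/index" twice
no_history = prefetcher.predict("/sports")   # nothing has followed "/sports" yet
```

A cache using such a predictor would fetch the candidate in the background after serving the current request, at the cost of extra bandwidth whenever the prediction is wrong.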

Cache Coherency (Consistency)

Ensures that the cached object does not reflect stale or defunct data.

Consistency techniques:
– Client polling: compare the cached object with the original object.
– Invalidation callbacks: the server contacts the proxies when objects change.
– TTL and adaptive TTL
– If-Modified-Since: objects are revalidated only when they are requested and their expiration date has been reached.
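Adaptive TTL and If-Modified-Since can be combined as in the sketch below. It assumes the common heuristic that an object unchanged for a long time is likely to stay unchanged, so its TTL is a fraction of its age at fetch time. The function names, the 0.2 factor, and the one-day cap are illustrative assumptions, not values from the slides.

```python
from email.utils import formatdate


def adaptive_ttl(fetched_at, last_modified, factor=0.2, max_ttl=86400.0):
    """TTL in seconds: a fraction of the object's age when it was fetched."""
    age = max(0.0, fetched_at - last_modified)
    return min(max_ttl, factor * age)


def needs_revalidation(now, fetched_at, last_modified):
    """True once the cached copy has outlived its adaptive TTL."""
    return now - fetched_at >= adaptive_ttl(fetched_at, last_modified)


def conditional_headers(last_modified):
    """Headers for a conditional GET; the server replies 304 if unchanged."""
    return {"If-Modified-Since": formatdate(last_modified, usegmt=True)}


# Object last modified at t=0 and fetched at t=1000 gets a TTL of about 200 s.
fresh = needs_revalidation(now=1100, fetched_at=1000, last_modified=0)
expired = needs_revalidation(now=1300, fetched_at=1000, last_modified=0)
headers = conditional_headers(0)
```

Timestamps here are seconds since the Unix epoch. Only when `needs_revalidation` returns true on a request would the cache send the conditional headers to the server.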

Summary:

Caches are designed in different ways, but some issues are common among them.

Advantages:
1. Improve content availability.
2. Reduce network latencies.
3. Address increasing bandwidth demands.
4. Can hide network problems.
5. Reduce server burden.

Disadvantages:
1. Stale pages.
2. Information retained in caches.

Open Future Work (trends):

Content security:
1. NetCache
2. CacheFlow
– The appliance is deployed in parallel to the firewall.
– The appliance can be used to control who accesses a Web site.
– Virus scanning for all incoming content.
– Added content filtering to its caches.

Handling more complex objects and real-time data:
– RTEE (real-time event engine): captures, caches, and queries data at speeds greater than 12,000 events/s.

Web caching based on ontology?
– User access pattern prediction
– Prefetching
– Cache placement/replacement
