
Analysis of Web Caching Architectures: Hierarchical and Distributed Caching

Pablo Rodriguez, Christian Spanner, and Ernst W. Biersack

IEEE/ACM Transactions on Networking, Vol. 9, No. 4, August 2001

Abstract

Caching architectures: hierarchical, distributed, hybrid

Analytical models

Performance: connection time, transmission time, total latency, bandwidth, cache load

Caching architectures

Hierarchical caching: institutional cache, intermediate cache, national cache

Distributed caching: institutional cache

Network topology

The model

Network model: full O-ary tree

Document model: requests follow a Poisson distribution; popularity follows a Zipf distribution

Hierarchical caching: caches are placed at the access points between two different networks.

Distributed caching: caches are placed at the institutional network.

Network model

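Document model

\[ \lambda_I = \sum_{i=1}^{N} \lambda_{I,i}, \qquad \lambda_{I,i} = \frac{\lambda_I}{i^{\alpha}} \left( \sum_{j=1}^{N} \frac{1}{j^{\alpha}} \right)^{-1} \]

where

$\alpha$ is a constant that determines how skewed the Zipf distribution is

$\lambda_{I,i}$ is the request rate of the $i$-th most popular document

$\lambda_I$ is the request rate from an institutional cache for all $N$ documents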
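The Zipf document model can be illustrated numerically. The following is a minimal sketch, assuming the per-document rate is proportional to 1 / i^alpha and normalized so that the rates sum to the total institutional request rate. The function name and the parameter values (100 requests per unit time, 1000 documents, alpha = 0.8) are hypothetical and are not taken from the slides.

```python
def zipf_request_rates(total_rate, num_docs, alpha):
    """Split total_rate over num_docs documents, rate_i proportional to 1 / i**alpha.

    total_rate plays the role of the institutional request rate,
    num_docs the role of N, and alpha the Zipf skew constant.
    """
    weights = [1.0 / (i ** alpha) for i in range(1, num_docs + 1)]
    norm = sum(weights)  # normalization so the rates add up to total_rate
    return [total_rate * w / norm for w in weights]


# Illustrative values only (assumptions, not figures from the paper).
rates = zipf_request_rates(total_rate=100.0, num_docs=1000, alpha=0.8)
```

A larger alpha concentrates more of the total rate on the few most popular documents; alpha = 0 would spread the rate evenly.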

Properties and limitations of the model

O-ary trees are good models. Other networks can be modeled easily by modifying the height or the number of tiers of the tree.

The model assumes homogeneous client communities, but heterogeneous client communities can also be modeled easily.

Simulation results in this paper should be regarded as relative results.

Connection time

Depends on the number of network links from the client to the cache.

$d$: the per-hop propagation delay

$L_i$: number of links that a request for document $i$ travels

$l$: the level of the tree

$z$: number of links between a server and the root node

$H$: number of links between root nodes

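Connection time (cont’d)

Hierarchical caching:

\[ E[T_c^h] = 4d \sum_{l \in \{0,\, H,\, 2H,\, 2H+z\}} P(L_i = l)\,(l+1) \]

Distributed caching:

\[ E[T_c^d] = \sum_{l=0}^{2H} P(L_i = l)\,(2l+1)\,4d \;+\; P(L_i = 2H+z)\,(2H+z+1)\,4d \]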

Distance of transmission

A request first travels up then down

TCP three-way handshake

Server
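As a rough numerical illustration of the connection-time model, the sketch below assumes that a hit at tree level l costs 4d per link over l + 1 links in the hierarchical case and over 2l + 1 links in the distributed case (the request travels up and then down), and that the origin server sits at level 2H + z in both cases. The function name, the hit probabilities, and the unit delay d = 1 are hypothetical values chosen only to make the arithmetic visible.

```python
def expected_connection_time(p_level, d, H, z, architecture):
    """Expected connection time for a distribution over hit levels.

    p_level maps a tree level l to P(L_i = l); d is the per-hop delay.
    architecture is "hierarchical" or "distributed".
    """
    if architecture not in ("hierarchical", "distributed"):
        raise ValueError("unknown architecture: " + architecture)
    server_level = 2 * H + z
    total = 0.0
    for level, prob in p_level.items():
        if architecture == "hierarchical" or level == server_level:
            links = level + 1        # request only travels up
        else:
            links = 2 * level + 1    # request travels up, then down
        total += prob * 4 * d * links
    return total


# Made-up hit probabilities for H = 3, z = 10 (assumption, not paper data).
p = {0: 0.5, 3: 0.2, 6: 0.1, 16: 0.2}
t_hier = expected_connection_time(p, d=1.0, H=3, z=10, architecture="hierarchical")
t_dist = expected_connection_time(p, d=1.0, H=3, z=10, architecture="distributed")
```

With the same hit distribution the distributed figure is larger, because every hit above level 0 pays for the extra downward path. In the analytical model the two architectures would generally have different hit distributions, so this comparison is only indicative.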

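Transmission time

Caches operate in a cut-through mode.

Distributed caching:

\[ E[T_t^d] = \sum_{l=0}^{2H+z} E[T_t^d \mid L_i = l]\, P(L_i = l) \]

Hierarchical caching:

\[ E[T_t^h] = \sum_{l \in \{0,\, H,\, 2H,\, 2H+z\}} E[T_t^h \mid L_i = l]\, P(L_i = l) \]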

[Equations: request rate at each tree level $l$, in terms of $O$, $H$, $z$, and the cache hit rates]

Comparison

Parameters: O = 4, H = 3, z = 10, N = 250 million

Connection time

Network traffic at every tree level

Expected transmission time

(a) Non-congested national network

(b) Congested national network

Total latency

Heterogeneous client communities

(a) Expected connection time

(b) Expected transmission time

Bandwidth usage

The expected number of links traversed to distribute one packet to the clients.

(a) Regional network

(b) National network

Cache load

The filtered request rate

Disk space

The average Web document size S times the average number of copies present in the caching infrastructure.

The average number of copies present in the caching infrastructure can be calculated using the probability that a new document copy is created at every cache level.
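The disk-space estimate can be sketched as simple arithmetic. This example assumes that the average number of copies of a document is the sum, over cache levels, of the number of caches at that level times the probability that a copy is created there. That decomposition, the function name, and all numbers below are illustrative assumptions rather than values from the paper.

```python
def expected_disk_space(avg_doc_size, caches_per_level, p_copy_per_level):
    """Average document size S times the expected number of copies.

    caches_per_level[k] is the number of caches at level k and
    p_copy_per_level[k] the probability that one of them holds a copy.
    """
    if len(caches_per_level) != len(p_copy_per_level):
        raise ValueError("both lists need one entry per cache level")
    copies = sum(n * p for n, p in zip(caches_per_level, p_copy_per_level))
    return avg_doc_size * copies


# Hypothetical three-level infrastructure; sizes are in arbitrary units.
space = expected_disk_space(10.0, [4096, 64, 1], [0.1, 0.5, 1.0])
```

The result is a per-document figure; a total would additionally require summing over the document population.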

Disk space (cont’d)

A hybrid caching scheme

A certain number of caches, k, cooperate at every network level.

When a document cannot be found in a cache, the cache checks whether the document resides in any of the cooperating caches. If multiple caches have a copy of the document, the neighbor cache with the lowest latency is selected.

Otherwise, the request is forwarded to the immediate parent cache or to the server.
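The lookup order of the hybrid scheme can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Cache` class, the `resolve` function, the cache names, and the latency values are all hypothetical, and each cache is assumed to know the latency to its cooperating neighbors.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Cache:
    name: str
    documents: set = field(default_factory=set)
    # Cooperating caches at the same level, stored as (latency, cache) pairs.
    neighbors: list = field(default_factory=list)
    parent: Optional["Cache"] = None


def resolve(cache, doc):
    """Return the name of the node that ends up serving doc."""
    while cache is not None:
        if doc in cache.documents:
            return cache.name                      # local hit
        holders = [(lat, nb) for lat, nb in cache.neighbors
                   if doc in nb.documents]
        if holders:
            # Several neighbors hold a copy: pick the lowest latency.
            return min(holders, key=lambda pair: pair[0])[1].name
        cache = cache.parent                       # miss: go up one level
    return "origin-server"                         # no cache had the document


regional = Cache("regional", documents={"b"})
n1 = Cache("inst-2", documents={"a"})
n2 = Cache("inst-3", documents={"a"})
local = Cache("inst-1", neighbors=[(5.0, n1), (2.0, n2)], parent=regional)
```

Here `resolve(local, "a")` selects `inst-3` because it is the lower-latency neighbor, `resolve(local, "b")` falls through to the parent, and a document held nowhere is fetched from the origin server.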

Connection time

Connection time (cont’d)

Transmission time

Transmission time (cont’d)

Total latency

Bandwidth usage

(a) National network

(b) Regional network

Cache load

Conclusions

Hierarchical caching architecture: reduces the expected distance to hit a document; decreases bandwidth usage; reduces administrative concerns; needs powerful intermediate caches or load-balancing algorithms.

Distributed caching architecture: large network distances; high bandwidth usage; administrative issues.

The hybrid scheme is the best.
