
Collaborative Web Caching Based on Proxy Affinities

Jiong Yang, Wei Wang (T. J. Watson Research Center)

Richard Muntz (Computer Science Department, UCLA)

Proceedings of the International Conference on Measurement and Modeling of Computer Systems, 2000, pages 78-89

Outline

1. Introduction

2. Related Work

3. Objective Model

4. Page Cluster

5. Information Group Maintenance

6. Web Page Retrieval

7. Experimental Results

8. Estimation of Information Group Size

1. Introduction

Recent research on improving Internet performance falls into three categories

--server load balancing

--intra-net collaborative caching (summary cache)

--inter-net collaborative caching

1. A nearby proxy is faster than a distant server

2. A proxy with an up-to-date page can serve as a server

Drawback: bursts of network traffic

Introduction(cont.)

Each request falls into one of three categories

--The proxy locally cached the up-to-date version of web page

--The up-to-date web page exists on nearby proxies

--The requested web page has to be obtained from the content server
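The three cases above can be sketched as a lookup that tries each source in order. This is a minimal illustration, not the paper's retrieval protocol: the function name `retrieve`, the dictionary-based caches, and the `fetch_from_server` callback are hypothetical, and every cached copy is assumed to be up to date.

```python
from typing import Callable, Dict, List, Tuple


def retrieve(url: str,
             local_cache: Dict[str, str],
             nearby_proxies: List[Dict[str, str]],
             fetch_from_server: Callable[[str], str]) -> Tuple[str, str]:
    """Return (source, page), where source is 'local', 'remote' or 'server'.

    Assumption: caches are plain dicts holding only up-to-date pages.
    """
    # Case 1: the proxy has the page in its own cache.
    if url in local_cache:
        return "local", local_cache[url]

    # Case 2: a nearby proxy has the page; copy it into the local cache.
    for peer_cache in nearby_proxies:
        if url in peer_cache:
            page = peer_cache[url]
            local_cache[url] = page
            return "remote", page

    # Case 3: fall back to the content server.
    page = fetch_from_server(url)
    local_cache[url] = page
    return "server", page
```

Scanning every nearby proxy, as done here, is the naive approach; the information groups introduced later exist to narrow down which proxies are worth asking.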

Introduction(cont.)

How to discover which proxy has cached a web page?

--pull (longer response time)

--push (more messages exchanged)

In this paper

--dynamic distributed collaborative caching infrastructure

--information group (web page clusters) & proxy profile (list of URLs)

--GOAL: reduce messages (among proxies & for updates) while maintaining cache hit rate & latency

2. Related Work

Caching in Harvest

--caches organized in a hierarchy

Adaptive Web Caching

--caches self-organize to form a tight mesh

Summary Cache

--each proxy keeps a summary (using a cache sharing protocol)

Related Work(cont.)

Web Caching Based on Dynamic Access Patterns

--A local caching algorithm flexibly adapts its parameters

Server Volumes and Proxy Filters

--piggyback

3. Objective Model

γ: local cache hit ratio

w: remote cache hit ratio

Local-cost

Remote-cost

Server-cost

Locating-cost: cost of finding which proxy caches the page

Push-cost: cost of the multicasts incurred by cache changes

Search-cost: Push-cost + Locating-cost

Cost=
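One plausible way to combine these terms, offered as an assumption rather than as the paper's formula, is an expected per-request cost: each of the three retrieval costs is weighted by how often it applies, and the search cost is added on top. The function name `expected_cost` and the choice to charge the search cost to every request are illustrative only.

```python
def expected_cost(gamma: float, w: float,
                  local_cost: float, remote_cost: float, server_cost: float,
                  locating_cost: float, push_cost: float) -> float:
    """Expected cost per request under an ASSUMED weighted-sum model.

    gamma: local cache hit ratio, w: remote cache hit ratio.
    Requests that hit neither cache (fraction 1 - gamma - w) go to the server.
    """
    if gamma < 0 or w < 0 or gamma + w > 1:
        raise ValueError("hit ratios must be non-negative and sum to at most 1")

    # Search-cost is defined on the slide as Push-cost + Locating-cost.
    search_cost = push_cost + locating_cost
    miss_ratio = 1.0 - gamma - w

    return (gamma * local_cost
            + w * remote_cost
            + miss_ratio * server_cost
            + search_cost)
```

Under this reading, enlarging a collaboration raises w (more remote hits) but also raises the search cost, which is the trade-off the objective model is meant to capture.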

Objective Model(cont.)

[Imax, Imin]: the number of proxies in a collaboration

m: cache hit ratio > search cost

4. Page Cluster

Frequency: accesses to this web page / total page accesses

Threshold (β)

Grouping web pages into clusters

--Each proxy sends its profile to a central site S

--An optimal or near-optimal partition of the frequently accessed web pages is generated
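The frequency and threshold defined above suggest how a proxy could build the profile it sends to site S. The sketch below is an assumption about that step, not the paper's procedure: `build_profile` is a hypothetical name, the access log is a plain list of URLs, and a page is kept when its frequency is strictly greater than β.

```python
from collections import Counter
from typing import List


def build_profile(access_log: List[str], beta: float) -> List[str]:
    """Return the URLs whose access frequency exceeds the threshold beta.

    Frequency of a page = accesses to that page / total accesses in the log.
    """
    total = len(access_log)
    if total == 0:
        return []

    counts = Counter(access_log)
    # Sorted only to make the result deterministic.
    return sorted(url for url, n in counts.items() if n / total > beta)
```

For example, with a log of six requests where page `a` is requested three times, `b` twice and `c` once, a threshold of 0.2 keeps `a` and `b` and drops `c`.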

Page Cluster(cont.)

The number of clusters

We need an additional data structure:

Page Cluster(cont.)

The action on a page:

--move to another cluster

--replicate in another cluster

--remove replica from this cluster
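The three actions listed above can be pictured as operations on a mapping from cluster id to a set of pages. This is only an illustrative data model: the paper's own data structure is not shown on the slides, and the names `move`, `replicate` and `remove_replica` are hypothetical.

```python
from typing import Dict, Set

Clusters = Dict[str, Set[str]]  # cluster id -> pages in that cluster


def move(clusters: Clusters, page: str, src: str, dst: str) -> None:
    """Move a page from cluster src to cluster dst."""
    clusters[src].remove(page)  # raises KeyError if the page is not in src
    clusters[dst].add(page)


def replicate(clusters: Clusters, page: str, src: str, dst: str) -> None:
    """Keep the page in src and add a replica to dst."""
    if page not in clusters[src]:
        raise KeyError(page)
    clusters[dst].add(page)


def remove_replica(clusters: Clusters, page: str, src: str) -> None:
    """Drop the page from src, but only if another cluster still holds it."""
    holders = [cid for cid, pages in clusters.items() if page in pages]
    if src not in holders or len(holders) < 2:
        raise ValueError("page must remain in at least one cluster")
    clusters[src].remove(page)
```

Refusing to remove the last copy is a design choice made here so that every frequently accessed page stays assigned to some cluster.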

Page Cluster(cont.)

Choose a server to be the coordinator of each information group

The contents of all page clusters and their coordinators are broadcast to all proxies

5. Information Group Maintenance

Each information group is associated with one page cluster.

A proxy joins the information group that contains the most pages from its profile, then finds another ... until it has joined the information groups for all web pages in its profile.

Local reorganization
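The join rule above reads like a greedy covering procedure: repeatedly pick the group whose page cluster covers the most not-yet-covered pages of the profile. The sketch below follows that reading as an assumption; `choose_groups` is a hypothetical name, clusters are plain sets, and ties are broken by dictionary order.

```python
from typing import Dict, Iterable, List, Set, Tuple


def choose_groups(profile: Iterable[str],
                  clusters: Dict[str, Set[str]]) -> Tuple[List[str], Set[str]]:
    """Greedily pick information groups until the profile is covered.

    Returns (groups joined in order, profile pages no cluster contains).
    """
    remaining = set(profile)
    joined: List[str] = []

    while remaining and clusters:
        # Group whose cluster contains the most still-uncovered profile pages.
        best = max(clusters, key=lambda g: len(clusters[g] & remaining))
        if not clusters[best] & remaining:
            break  # no cluster covers any of the remaining pages
        joined.append(best)
        remaining -= clusters[best]

    return joined, remaining
```

With clusters `g1 = {a, b, c}`, `g2 = {c, d}`, `g3 = {e}` and a profile of `a, b, d, x`, the proxy joins `g1` first, then `g2`, and page `x` is left uncovered because no cluster contains it.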

Information Group Maintenance(cont.)

A proxy wants to join an information group

--it sends a message to the coordinator of the information group

--the coordinator sends back the list of the members

--the new proxy sends the intersection of its cache content (with the group's page cluster) to all members of this information group

A proxy wants to withdraw from an information group

--it multicasts to all members

If a proxy’s cache for a page cluster changes by more than the 10% threshold, it multicasts the change to all members (at the lowest priority)
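The 10% rule above needs some measure of how much a proxy's cached subset of a page cluster has changed. The slides do not say how the change is measured, so the sketch below assumes one option: the number of pages added or removed since the last announcement, relative to the size of the last announced set. The names `change_ratio` and `should_multicast` are hypothetical.

```python
from typing import Set


def change_ratio(announced: Set[str], current: Set[str]) -> float:
    """Fraction of change relative to the last announced cache content.

    ASSUMPTION: change = pages added or removed (symmetric difference).
    """
    if not announced:
        return 1.0 if current else 0.0
    return len(announced ^ current) / len(announced)


def should_multicast(announced: Set[str], current: Set[str],
                     threshold: float = 0.10) -> bool:
    """True when the change is strictly more than the threshold."""
    return change_ratio(announced, current) > threshold
```

A proxy would call `should_multicast` after each cache update and, when it returns true, send the new intersection to the group and record it as the announced set.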

6. Web Page Retrieval

7. Experimental Results

8. Estimation of Information Group Size

Cost=

9. Conclusion

--Dynamically adaptable structure

--Good scalability

--Maintains a high hit ratio with lower latency and fewer messages
