
Squirrel: A decentralized peer-to-peer web cache

Paul Burstein, 10/27/2003

Outline

- Overview
- Design
- Evaluation
- Discussion

Traditional Web Caching

Goals:
- Reduce browser latency
- Reduce aggregate bandwidth
- Reduce load on web servers

Deployment:
- Dedicated centralized machines
- Placed at local network boundaries

Squirrel Web Caching

- Decentralized caching
- Desktops cooperate in a peer-to-peer fashion
- Mutual sharing between hosts
- Hosts both browse and cache

Pros

Centralized:
- Dedicated hardware
- Cost, administration
- Handling load bursts
- Single point of failure

Decentralized:
- No additional hardware
- More users bring more resources
- Automatic scaling
- Self-organizing
- Easy deployment

Assumptions

- Cooperative hosts, so no security issues
- Link and node failures
- Nodes are in a single geographic location
- Low internal network latencies

Outline

- Overview
- Design
- Evaluation
- Discussion

Design Goals

- Target environment: 100 to 100,000 machines
- Goal: achieve performance comparable to a centralized cache

Design Overview

- Built on top of Pastry
- Objects have 128-bit objectIds: the SHA-1 hash of the URL
- Each object is mapped to the home node whose nodeId is closest to its objectId (see the sketch after this list)
- Requests: GET (new request) and cGET (conditional GET)
- Two schemes: home-store and directory
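The URL-to-home-node mapping is simple to sketch. Here is a minimal Python illustration, assuming a flat list of nodeIds in place of Pastry's actual routing tables; the function names and the truncation choice are assumptions, not from the paper:

```python
import hashlib

def object_id(url: str) -> int:
    # Squirrel derives a 128-bit objectId from the SHA-1 hash of the URL;
    # SHA-1 yields 160 bits, so keeping the top 128 is an assumption here.
    digest = hashlib.sha1(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:16], "big")

def home_node(obj_id: int, node_ids: list[int]) -> int:
    # Pastry delivers a request to the live node whose nodeId is numerically
    # closest to the objectId; this linear scan stands in for the real
    # O(log N) overlay routing.
    return min(node_ids, key=lambda n: abs(n - obj_id))
```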

Home-store

- Objects are stored at both the client cache and the home node
- External requests come through the home node
- Cache replacement: all objects are considered
- Request cases (see the sketch below): (a) the home node's copy is fresh; (b) the home node's copy is stale
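To make the flow concrete, here is a minimal sketch of a home-store lookup. The CachedObject and Node helpers and fetch_from_origin are hypothetical stand-ins, not the paper's implementation:

```python
import time
from dataclasses import dataclass, field

@dataclass
class CachedObject:
    body: bytes
    expires: float  # absolute expiry time derived from HTTP headers

    def fresh(self) -> bool:
        return time.time() < self.expires

@dataclass
class Node:
    cache: dict[str, CachedObject] = field(default_factory=dict)

def fetch_from_origin(url: str) -> CachedObject:
    # Stand-in for an HTTP GET (or cGET revalidation) against the origin
    # server; a real client would honor the response's cache headers.
    return CachedObject(body=b"...", expires=time.time() + 60)

def home_store_get(url: str, client: Node, home: Node) -> bytes:
    obj = client.cache.get(url)
    if obj and obj.fresh():
        return obj.body                  # served locally, zero overlay hops
    obj = home.cache.get(url)            # GET/cGET routed to the home node
    if obj and obj.fresh():
        client.cache[url] = obj          # case (a): home node copy is fresh
        return obj.body
    obj = fetch_from_origin(url)         # case (b): home node misses or is stale
    home.cache[url] = obj                # object stored at home node and client
    client.cache[url] = obj
    return obj.body
```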

Directory

- The home node keeps a directory of pointers to delegates
- Requests are randomly redirected to a delegate
- Cases (see the sketch below):
  (a) no directory entry: add a new delegate
  (b) cGET, not modified
  (c) delegate copy is fresh: get the object from the delegate
  (d) cGET and stale: update
  (e) GET and stale: update
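A corresponding sketch for the directory scheme, reusing the hypothetical Node, CachedObject, and fetch_from_origin helpers from the home-store sketch above; the cGET revalidation paths (b) and (d) are folded into a single re-fetch for brevity:

```python
import random

def directory_get(url: str, client: Node,
                  directory: dict[str, list[Node]]) -> bytes:
    delegates = directory.get(url)
    if not delegates:
        obj = fetch_from_origin(url)     # case (a): no directory entry yet;
        client.cache[url] = obj          # the requesting client becomes the
        directory[url] = [client]        # first delegate
        return obj.body
    delegate = random.choice(delegates)  # randomly redirect to a delegate
    obj = delegate.cache.get(url)
    if obj and obj.fresh():
        return obj.body                  # cases (b)/(c): delegate copy still valid
    obj = fetch_from_origin(url)         # cases (d)/(e): stale, so revalidate or
    delegate.cache[url] = obj            # re-fetch, then update the delegate
    return obj.body
```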

Outline

- Overview
- Design
- Evaluation
- Discussion

Evaluation Characteristics

- Compare the two schemes against a dedicated cache
- Performance: latency, external bandwidth, hit ratio
- Overhead: load, storage
- Fault tolerance

Trace Characteristics

Bandwidth and Hit Ratio

- Bytes transferred to the origin servers and back correlate with the hit rate
- Baseline: a centralized cache with infinite storage
- A 100MB cache per node achieves the optimal rates; a 10MB in-memory cache is reasonable
- Directory scheme: active nodes suffer from eviction; distributed LRU is worse than centralized
- Home-store: more total storage required
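The eviction behavior hinges on each node running an LRU cache with a byte budget (the 10MB and 100MB figures above). A minimal per-node LRU sketch, with hypothetical names:

```python
from collections import OrderedDict

class LRUCache:
    # Hypothetical per-node LRU cache bounded by a byte budget;
    # an illustration, not the paper's implementation.
    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.used = 0
        self.entries: OrderedDict[str, bytes] = OrderedDict()

    def get(self, url: str) -> bytes | None:
        if url not in self.entries:
            return None
        self.entries.move_to_end(url)    # mark as most recently used
        return self.entries[url]

    def put(self, url: str, body: bytes) -> None:
        if url in self.entries:
            self.used -= len(self.entries.pop(url))
        self.entries[url] = body
        self.used += len(body)
        while self.used > self.capacity: # evict least recently used entries
            _, evicted = self.entries.popitem(last=False)
            self.used -= len(evicted)
```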

Latency

- User-perceived time for a response
- With comparable hit ratios, only internal hops need to be compared
- Many requests can be satisfied locally, with 0 hops
- Directory scheme latency is up to one hop greater
- Some requests can be satisfied by the home node

Squirrel Latency

- On a cache hit, latency is determined by the number of Pastry routing hops
- On a cache miss, overlay latency is overshadowed by the fetch from the origin server
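On a hit, the dominant cost is the overlay path length: Pastry routes in at most roughly ceil(log base 2^b of N) hops. A quick back-of-the-envelope for the target environment sizes, assuming the b = 4 configuration commonly used in the Pastry papers:

```python
import math

def max_pastry_hops(n_nodes: int, b: int = 4) -> int:
    # Pastry routes in at most ceil(log_{2^b} N) hops; b = 4 is an
    # assumed configuration, not stated on the slide.
    return math.ceil(math.log(n_nodes, 2 ** b))

for n in (100, 10_000, 100_000):
    print(n, max_pastry_hops(n))  # -> 2, 4, and 5 hops respectively
```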

Load on Nodes (1/2)

- Bursty behavior observations
- Max objects served per second: up to 48 and 55 objects per second for the two traces
- Directory scheme: one delegate can get bombarded with requests from many home nodes
- Home-store scheme: replicate objects once a request threshold is exceeded (sketched below)
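The burst-control idea for home-store can be sketched as threshold-triggered replication. Everything here (the HomeNode class, the push to neighbors) is a hypothetical illustration of the idea, not the paper's exact mechanism:

```python
from collections import Counter

class HomeNode:
    # Hypothetical sketch: once an object passes a request threshold,
    # push replicas to nearby nodes so they can absorb further bursts.
    def __init__(self, threshold: int = 10):
        self.cache: dict[int, bytes] = {}
        self.hits: Counter = Counter()
        self.replicated: set[int] = set()
        self.threshold = threshold

    def serve(self, obj_id: int, neighbors: list["HomeNode"]) -> bytes:
        self.hits[obj_id] += 1
        if self.hits[obj_id] > self.threshold and obj_id not in self.replicated:
            for peer in neighbors:       # e.g. the Pastry leaf set
                peer.cache[obj_id] = self.cache[obj_id]
            self.replicated.add(obj_id)
        return self.cache[obj_id]
```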

Load on Nodes (2/2)

- Sustained load measurements: max objects served per minute
- Average load in any second or minute: 0.31 objects/minute on the Redmond trace, for both models

Fault Tolerance

- Internet connection loss, internal partitioning, individual node failures
- Desktop shutdown or reboot; graceful shutdown
- Pastry-aided content transfer
- The directory scheme is more vulnerable to failures

Results

- The home-store model seems to outperform the directory model in hit ratio, load balancing, and internal network latency
- How does it compare to a centralized cache?

Outline

- Overview
- Design
- Evaluation
- Discussion

Discussion

Would this be deployed in a corporate network?